Great course. I checked it out this evening and the real value in the course is in the system design problems in my opinion. For that alone, it’s worth it, even for an experienced professional.
For more experienced people, the challenge can be that they already know exactly how to solve these problems. I found myself wondering why the Twitter problem didn’t mention “ingester” or “fan out” as part of the solution, for example. What the interviewer wants is collaborative communication and a demonstration of the problem-solving process. That organization of thought alone makes the course worth buying: it teaches you to “scale down” the answer for the appropriate audience, even if you actually do know how to solve the problems. ;-)
FEEDBACK:
The “System Design Basics” section is tailored for less experienced people and doesn’t go deep into any particular topic. It’s a high-level introduction to each, some more elaborate than others. I would suggest adding links in each section to resources where people can study those concepts in more depth.
EXAMPLE OF MORE DEPTH NEEDED:
The database section could use more examples. One thing that jumped out at me was the statement that changing or adding a SQL column will bring the system offline. In a properly designed system this need not happen (for example, if requests are served from cache with a high hit rate), and vendors like Percona have addressed it with tools such as pt-online-schema-change. It would also have been nice to see storage engine considerations mentioned, such as MyISAM vs. InnoDB: how they suit read- or write-intensive systems, and the tradeoffs of row vs. table locking and transactional capabilities.
The indexes section is very light. It would be nice to see how to design or optimize an index by running EXPLAIN on queries to understand how the database executes them. It should also cover the pros and cons of compound indices and how they relate to the WHERE clauses in the student’s queries. There is more information written about consistent hashing than database optimization; that may be relevant for interviewing with one or two companies in particular, but a more well-rounded understanding would better serve the student. My fear is that people will take the course and be able to “hack” the interview, but not truly have the expected level of experience when they face those issues in real life.
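To make the EXPLAIN and compound-index point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a self-contained stand-in; MySQL's EXPLAIN output looks different, and the table name `tweets`, the index name `idx_user_created`, and the helper `query_plan` are all made up for illustration.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable plan SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER, body TEXT)"
)
# Compound index: user_id is the leading column, created_at the second.
conn.execute("CREATE INDEX idx_user_created ON tweets (user_id, created_at)")

# WHERE clause constrains the leading column, so the index can be used.
plan_leading = query_plan(
    conn, "SELECT body FROM tweets WHERE user_id = 42 AND created_at > 1000"
)

# WHERE clause skips the leading column, so the planner falls back to a scan.
plan_trailing = query_plan(
    conn, "SELECT body FROM tweets WHERE created_at > 1000"
)

print(plan_leading)
print(plan_trailing)
```

The takeaway is the one a student should be able to explain in an interview: column order in a compound index matters, and EXPLAIN is how you check whether a given WHERE clause can actually use it.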
In the redundancy section, it’s important to speak about the number of nodes and perhaps whether the count should be even or odd. You could perhaps touch on RAID so the student understands the SQL vs. NoSQL style of redundancy. I didn’t see anything about master/slave or master/master configuration either. Sharding/partitioning is covered in depth, but this seems to be missing.
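On the even-vs-odd question, a small worked example may help. This sketch assumes a majority-quorum cluster (the usual model for leader election and consensus); that assumption is mine, and the function names are hypothetical.

```python
def quorum(n_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return n_nodes // 2 + 1


def tolerated_failures(n_nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return n_nodes - quorum(n_nodes)


for n in range(3, 7):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Under this model, going from 3 to 4 nodes adds cost without adding fault tolerance (both tolerate one failure), which is why odd counts are the common recommendation.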
ADDING PROS/CONS OF ALTERNATIVES OR CASE STUDIES:
It would be nice to see the pros/cons of the systems mentioned, perhaps in a tabular format, so the student has a better understanding. I liken this to understanding the Big O notation of algorithms. If students are expected to understand the time and space complexity of algorithms, then it’s reasonable to expect them to understand the pros and cons of the different tools.
An example for load balancing: a browser-based mobile app launch behind a hardware load balancer (HWLB). The HWLB tried to balance based on IP address, but the mobile carriers’ NOCs grouped requests behind a single IP, so an entire metro market was assigned to a single server, which took down the system at 50k+ concurrent users. Changing to a software load balancing strategy that leveraged cookies/tokens distributed the load properly and solved the issue.
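The failure mode in that load balancing example is easy to reproduce in a toy simulation. The sketch below is only illustrative: the server count, the single carrier egress IP, and the `session-N` token format are assumptions, and real load balancers use their own hashing schemes.

```python
import hashlib
from collections import Counter

N_SERVERS = 8
N_USERS = 50_000
CARRIER_IP = "198.51.100.7"  # hypothetical shared egress IP for a whole metro market


def pick_server(key: str, n_servers: int) -> int:
    """Map a routing key to a server index with a stable hash."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


# Balancing on source IP: every user appears to come from the same address.
by_ip = Counter(pick_server(CARRIER_IP, N_SERVERS) for _ in range(N_USERS))

# Balancing on a per-user cookie/token: each user has a distinct key.
by_token = Counter(pick_server(f"session-{i}", N_SERVERS) for i in range(N_USERS))

print("servers used (IP hash):   ", len(by_ip), "max load:", max(by_ip.values()))
print("servers used (token hash):", len(by_token), "max load:", max(by_token.values()))
```

With IP-based balancing one server takes all 50,000 users, while token-based balancing spreads them roughly evenly across all eight.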
An example for queues: RabbitMQ and the overhead cost of the AMQP protocol. It’s great for fault tolerance but poor on write performance, and may max out (depending on message size and infrastructure) around 2,000 writes/second. By switching to Redis, using a LIST and BRPOPLPUSH on the client, you can build a fault-tolerant queue with 180,000 writes/second. Eventually, this may be replaced by Kafka at 2–3 million writes/sec on a simple 3-node cluster if using a small message size (1KB).
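For readers unfamiliar with the Redis pattern, here is a rough in-memory model of the two-list "reliable queue" idea. It is not a Redis client and makes no performance claims; the class and method names are hypothetical, and the comments name the Redis commands each step stands in for (newer Redis versions offer BLMOVE as the replacement for the blocking pop-and-push command).

```python
from collections import deque


class ReliableQueue:
    """Toy model: a 'pending' list plus a 'processing' list per consumer."""

    def __init__(self):
        self.pending = deque()
        self.processing = deque()

    def push(self, msg):
        # Stands in for: LPUSH pending msg
        self.pending.appendleft(msg)

    def claim(self):
        # Stands in for the non-blocking form: RPOPLPUSH pending processing
        # The message is moved, never dropped, so a crash cannot lose it.
        if not self.pending:
            return None
        msg = self.pending.pop()
        self.processing.appendleft(msg)
        return msg

    def ack(self, msg):
        # Stands in for: LREM processing 1 msg
        self.processing.remove(msg)

    def recover(self):
        # After a consumer crash, move unacknowledged messages back so that
        # the oldest one is delivered first again.
        while self.processing:
            self.pending.append(self.processing.popleft())


q = ReliableQueue()
for m in ("a", "b", "c"):
    q.push(m)

first = q.claim()      # "a"
q.ack(first)           # work finished, message removed for good
second = q.claim()     # "b" is claimed, then the worker "crashes" before ack
q.recover()            # "b" goes back to pending
redelivered = q.claim()
print(first, second, redelivered)
```

The fault tolerance comes from the fact that a claimed message always lives in one of the two lists until it is explicitly acknowledged.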
ADDITIONAL SYSTEM DESIGN BASICS CONSIDERATION:
I would also suggest adding a newer architecture, Stream Pipeline Architecture. Solutions like Kafka or Amazon Kinesis, which provide an append-only, distributed, redundant log with pub/sub, are becoming more popular than traditional enterprise message bus architectures. This is a paradigm shift in system design because the database becomes a function of the application instead of the application a function of the database, making the DB choice irrelevant and “hot swappable” as business needs change over time. Make sure to describe the two strategies for publishing to topics (Stream [changelog], Table [snapshots]). Lastly, be sure to describe serialization dos/don’ts and format considerations like Avro (recommended for Kafka, for example) or Protobuf.
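The stream-vs-table distinction can be shown in a few lines. This is a minimal sketch with made-up keys and values; treating a `None` value as a delete marker (a tombstone) is a convention assumed here for illustration.

```python
def materialize(changelog):
    """Fold an append-only changelog (stream) into a current snapshot (table)."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone: the key was deleted
        else:
            table[key] = value    # later events overwrite earlier ones
    return table


# Stream view: every change, in order.
changelog = [
    ("user:1", {"name": "Ann"}),
    ("user:2", {"name": "Bob"}),
    ("user:1", {"name": "Anna"}),  # update
    ("user:2", None),              # delete
]

# Table view: the latest state per key, derived entirely from the log.
snapshot = materialize(changelog)
print(snapshot)
```

Because the table is derived from the log, any store that can replay the log can rebuild it, which is the sense in which the database becomes swappable.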
Another section I would add is “Source of bottlenecks”, describing the various choke points in a few systems and how they can be diagnosed and solved, or better yet, anticipated and designed around. These may not need to be covered during an interview, but having that understanding while the interviewee is “talking through the problem” will demonstrate that they understand how to apply the gathered requirements to formulate their proposed solution. A bottleneck, for example, could be the network interface card (NIC) and whether the boxes are connected by Gigabit Ethernet. There are also case studies where startups scaled in the cloud, found that the cost and performance of VPS weren’t worth it, and migrated to bare metal. Perhaps a way to compare hypervisor vs. bare metal performance would be cool.
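As an example of the back-of-the-envelope reasoning such a section could teach, here is a rough upper bound on message throughput over a single network link. It deliberately ignores protocol overhead, replication traffic, and batching, so real numbers will be lower; the function name is hypothetical.

```python
def max_messages_per_second(link_gbps: float, message_bytes: int) -> float:
    """Theoretical ceiling on messages/sec through one link, ignoring overhead."""
    bytes_per_second = link_gbps * 1e9 / 8  # bits per second to bytes per second
    return bytes_per_second / message_bytes


for gbps in (1, 10):
    ceiling = max_messages_per_second(gbps, 1024)
    print(f"{gbps} Gbit/s link, 1 KiB messages: at most {ceiling:,.0f} msgs/sec")
```

A Gigabit link tops out at roughly 122,000 one-kilobyte messages per second, so a design that needs more than that per box has a NIC problem before it has a software problem.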
Cheers.