What to learn as a backend dev
In the past few weeks I’ve been collating a list of topics that I plan to explore on my path of becoming a better software engineer. I’m sharing this here in the hope that it will be useful for others as well. I believe that a solid understanding of most of these topics provides a strong base knowledge. I’m going to use this as a guide, and I’ll be updating this post on occasion.
Note
- this is purely a technical list, and doesn’t include the other aspects of becoming a good engineer, which are plentiful and just as important: communication, conflict handling, domain knowledge, prioritisation skills, social skills, and many others
- this list is quite subjective and not necessarily comprehensive
- it’s written from the point of view of a backend developer - there are many, many other fields of software, but I mainly focus on backend web development and distributed systems.
# Table of contents
- Databases
- Programming
- Tools
- APIs
- QA
- Operating systems
- Hardware
- Infrastructure
- Distributed systems, software architecture
- Monitoring and Alerting
- Networking
- Security
- Basic math and statistics
- Resources
# Databases
Databases are the bread and butter of a backend developer. Even if you are mostly dealing with business logic and writing application code, knowing how the underlying datastore works can help immensely.
# Relational databases (RDBMS)
Relational databases are the primary persistence layer of most organisations. Therefore it makes sense to place a heavy emphasis on getting to know them well. Amongst others, you might want to:
- Be able to come up with and draw an entity-relationship model for a given business requirement. It’s a very useful skill in software, and it’s also a common interview question: you are asked to design a database schema for a certain scenario (e.g. for an online bookstore). I suggest searching for “database schema modelling exercise” or “entity relationship modelling”, and you’ll find lots of good resources.
- Know about concurrency control: this fantastic article by IT Hare can help: ACID, MVCC vs Locks, Transaction Isolation Levels, and Concurrency control.
  - know what ACID is,
  - be able to explain the basics of transactions,
  - know about MVCC (and here),
  - know that even with MVCC, there are locks and potential deadlocks.
- Read “Chapter 7. Transactions” from Designing Data Intensive Applications (a highly recommended book!)
- Know about indexes and their different types, including the most common index data structure (the B-tree). Know when indexes are useful and when they are not, and know their drawbacks (e.g. the extra write and storage overhead they create). Here is a good read about this topic.
- Read up about schema migrations: here is a promising article about doing schema migrations at scale in Postgres. Also a good resource is this article from Benchling.
- Be able to talk about ORMs and their pros and cons.
- Sharding is an important concept, about which you can read in this Digital Ocean article, this post by Jeeyoung Kim and on Wikipedia too.
- Know about materialized views
- This post from Jeeyoung Kim on this general topic of databases looks very promising.
- B-trees and write-ahead logging (WAL)
- Data cubes (aka OLAP cubes)
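To make ACID and transactions a little more concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table and the `transfer` function are invented for illustration. The point is atomicity: either both updates are committed, or, if something fails halfway through, neither is. Databases differ in isolation levels and locking behaviour, so read this as a demonstration of atomicity only.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` between two (hypothetical) accounts: both updates happen or neither does."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds and is committed
try:
    transfer(conn, 1, 2, 500)  # fails halfway and is rolled back as a whole
except ValueError:
    pass

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# → [(1, 70), (2, 30)]
```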
# NoSQL DBs
- Know the fundamental differences between SQL and NoSQL databases (also know that the term non-relational for NoSQL might be a misnomer, as NoSQL DBs still have relations - according to this and this source).
- Know the pros and cons of each, and know when you would prefer one over the other. Questions like these can help you decide which one you need: Do you need session storage? Do you need atomicity? Do you need transactions? Is vertical scalability essential? Do you need flexible schemas?
- Read up about MongoDB, DynamoDB, Redis, Neo4J and/or Cassandra.
- There is a great and fast-paced talk about DynamoDB by Rick Houlihan here at AWS re:Invent.
# Graph Databases
- Read a bit about Graph databases, like Neo4j. This article on TowardsDataScience seems like a good introduction too.
# Data warehouses
- Read up about column stores on Wikipedia and on docs.microsoft.com (or a similar Microsoft blog post here)
- Read up about star-schema queries
- Snowflake is an elastic data warehouse with amazing scalability. Read the white paper here and slides here
- This is a good article about best practices when working with ETL pipelines and Snowflake
- Redshift is the data warehouse offering of AWS. It would be useful to read a comparison of it with Snowflake (like this or this)
- An interesting challenge when working with data warehouses is loading large datasets into them. This looks like a good article about moving data into Snowflake.
- How SIMD helps crunch large amounts of data on modern CPUs: Implementing Database Operations Using SIMD Instructions
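Here is a rough illustration of a star-schema query, using `sqlite3` and three invented tables: `fact_sales`, `dim_product` and `dim_date`. SQLite is a row store, so the sketch only shows the shape of the schema and of the query: a central fact table joined to small dimension tables, then aggregated. A columnar warehouse such as Snowflake or Redshift would run the same kind of query much more efficiently over large volumes of data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used for filtering and grouping
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- Fact table: one row per sale, holding measures plus foreign keys to the dimensions
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (1, 2022), (2, 2023);
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 2, 40.0), (1, 2, 5.0);
""")

# Typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
print(rows)  # → [(2022, 'books', 10.0), (2023, 'books', 20.0), (2023, 'games', 40.0)]
```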
# Message queues
While not strictly a persistence layer, queues and message brokers are used heavily to pass messages between the components of a system. I plan to look into RabbitMQ, Amazon SQS and Apache Kafka in depth, and I would suggest you do the same. This IBM article about queues looks good, and this one about message brokers is worth a read as well.
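The core idea behind a queue is decoupling a producer from a consumer. That idea can be sketched in-process with Python's standard `queue.Queue` and two threads. Treat it as an analogy only: brokers like RabbitMQ, SQS or Kafka add persistence, acknowledgements, retries and network transport, and none of those appear here. Using `None` as a "stop" sentinel is just one common convention.

```python
import queue
import threading


def producer(q: queue.Queue, n: int) -> None:
    for i in range(n):
        q.put(f"message-{i}")  # blocks while the queue is full (a simple form of backpressure)
    q.put(None)  # sentinel: no more messages


def consumer(q: queue.Queue, received: list) -> None:
    while True:
        msg = q.get()
        if msg is None:  # producer is done
            q.task_done()
            break
        received.append(msg)  # "process" the message
        q.task_done()


q: queue.Queue = queue.Queue(maxsize=10)  # bounded, so a slow consumer slows the producer down
received: list = []
threads = [
    threading.Thread(target=producer, args=(q, 100)),
    threading.Thread(target=consumer, args=(q, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(received), received[0], received[-1])  # → 100 message-0 message-99
```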
# Programming
I’m a Python developer, so this is going to be opinionated.
This is a huge topic, but there are still some general guidelines that I would like to adhere to.
I will strive to know at least 2-3 programming languages at a comfortable level. The choice of language shouldn’t be very important in theory, but it is in practice, because I prefer languages with a thriving ecosystem, good documentation and promising employment prospects.
Python, NodeJS and Go are all popular nowadays and are fun to learn. Rust, C#, C++, Java, PHP, Ruby and many others are also fine choices in their own way. I will probably limit myself to Python, Go, Node and Ruby.
# Language agnostic topics
- algorithms - besides helping you come up with efficient solutions to real-world problems, a good grasp of algorithms helps in a traditional coding interview, so it makes sense to know them well: sorting, searching, tree traversals, complexity, Big O notation.
- Look up SSTables, LSM trees, B-trees and DAGs (data structures related to storage technology, among others)
- data structures - similar to the point above, the most common data structures are worth refreshing once in a while. Stack, queue, deque, list, linked list, hashmap, tree, graph, and some more.
For the two topics above, the Problem Solving with Algorithms and Data Structures using Python material is a good one.
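As a small refresher on complexity, here is binary search. It finds an item in a sorted list in O(log n) comparisons, compared with the O(n) of a linear scan. In real Python code the standard `bisect` module is usually the better choice. This version is written out by hand only to show the idea, and it needs Python 3.9+ for the `list[int]` annotation.

```python
def binary_search(items: list[int], target: int) -> int:
    """Return the index of `target` in the sorted list `items`, or -1 if it is absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1  # search space is empty: not found


data = [2, 3, 5, 7, 11, 13, 17]
print(binary_search(data, 11))  # → 4
print(binary_search(data, 4))   # → -1
```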
- Concurrent programming - here’s the best article I’ve read so far on this topic.
- design patterns - less common in interviews, but useful for creating good abstractions in code, design patterns are worth reading up on every now and then. The wiki article is a pretty good read. Higher-level languages already support some of these patterns (e.g. Python's built-in decorator syntax covers many uses of the decorator pattern).
- testing at different levels (unit, integration, contract, etc.): here is an article about unit testing (see also the QA section below)
- analyzing the performance of your code: profiling, tracing, etc.
- character encodings, Unicode and character sets
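Here is a short Python sketch of the difference between characters and their encoded bytes. A `str` is a sequence of Unicode code points, and `encode` turns it into bytes using a specific encoding (UTF-8 here). Non-ASCII characters take more than one byte in UTF-8. Decoding with the wrong codec can silently produce garbled text (mojibake) rather than an error.

```python
text = "caf\u00e9 \u20ac"  # "café €": 6 code points

encoded = text.encode("utf-8")  # str (code points) -> bytes
print(len(text), len(encoded))  # → 6 9 (é takes 2 bytes, € takes 3)
print(encoded)                  # → b'caf\xc3\xa9 \xe2\x82\xac'
assert encoded.decode("utf-8") == text  # the right codec gives a lossless round trip

# The wrong codec raises no error here, but the text comes out garbled.
mojibake = encoded.decode("cp1252")
assert mojibake != text

# Invalid UTF-8 does raise an error under the default "strict" error handler.
try:
    b"\xff".decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)
```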
A good resource for this part is the book “Cracking the Coding Interview” by Gayle Laakmann McDowell. It has plenty of good advice for tackling programming tasks in interviews, and it includes lots of example questions on different topics, with their solutions (including code).
# Python
Assuming you choose Python as your main language, you should definitely know about the following topics:
- the GIL, concurrency and async Python: multithreading, multiprocessing, asyncio. Read up on the built-in concurrency primitives at docs.python.org/3/library/concurrency.html, and read this great Real Python post about this topic.
- pre-emptive and cooperative multitasking (explained in the resources above)
- being thread-safe and re-entrant
- new features in recent releases (Python 3.6, 3.7 and 3.8 at the time of writing)
- the importance of generators and the `yield` keyword
- decorators, including writing a decorator from scratch
- context managers (used via the `with` statement)
- inheritance, with its advantages and its pitfalls (`super`, multiple inheritance, MRO) - here is a good RealPython article for this.
- know a web framework or two, like Flask, Django or Tornado
- know how to use an ORM library like SQLAlchemy or the Django ORM
- to be safe, review the typical interview questions about Python.
- have a read of these fun and mind-blowing quirks of Python
- know how to manage dependencies when working with Python code, including how to resolve dependency conflicts (another source)
- know how to package and distribute projects
- know how to use virtual environments, e.g. with virtualenvwrapper
- know how to manage dependencies: poetry, pip, pipenv - pros and cons
- use test frameworks and mocks, follow guidelines and good patterns for testing.
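Several of the items above fit into one small sketch: generators and `yield`, a decorator written from scratch, and a context manager. `count_calls`, `squares` and `timed` are names invented for illustration. `functools.wraps` and `contextlib.contextmanager` are the standard-library helpers usually used for these patterns.

```python
import functools
import time
from contextlib import contextmanager
from typing import Callable, Iterator


def count_calls(func: Callable) -> Callable:
    """A decorator written from scratch: counts how many times `func` is called."""

    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def squares(n: int) -> Iterator[int]:
    """A generator: each square is computed lazily, only when it is requested."""
    for i in range(n):
        yield i * i


@contextmanager
def timed(label: str, results: dict) -> Iterator[None]:
    """A context manager: the `finally` block runs even if the `with` body raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results: dict = {}
with timed("sum", results):
    total = sum(squares(1000))  # the squares are never all held in memory at once

print(total, squares.calls, squares.__name__)  # → 332833500 1 squares
```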
# Tools
- Version control: knowing git is essential. This might sound obvious, but it’s worth perfecting your git skills, including the less frequently used rebase and bisect commands, using Git hooks, knowing the reflog, and a few more (e.g. cherry-picking).
- Linux command line tools, including:
  - manipulating text with tools like `sed` and `awk`
  - processing JSON files with `jq`
  - searching and sorting (`grep`, `sort`, `uniq`)
- Automating your local workflows with Makefiles. See how using makefiles can benefit you as a Python developer. See this project for a more in-depth tutorial on Make.
# APIs
- know RESTful principles in and out (some good resources here and here)
- HTTP verbs (methods) and which one is used for what operation
- know some of the most used HTTP headers: good sources here, here and here
- know the most frequent HTTP status codes
- be able to write an API that
  - filters, sorts, paginates results, and
  - rate limits requests. Read up here, here and here about rate limiting.
- have a look at the OpenAPI Specification
- know about API security best practices
- know a bit about GraphQL
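- know about streamed responses (e.g. chunked transfer encoding in HTTP/1.1; HTTP/2 does not support chunked transfer encoding and streams data through its own framing layer instead)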
- know a bit about HTTP/2
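Rate limiting can be done in several ways: fixed windows, sliding windows, token buckets, and others. Below is a minimal in-memory token-bucket sketch. The class name and parameters are my own, and it is not thread-safe. A real API would typically keep one bucket per client (e.g. per API key) in a shared store instead of in process memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: allows bursts of up to `capacity` requests
    and refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if one is available; False means reject the request."""
        now = self.clock()
        # Refill in proportion to the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # an HTTP API would typically respond with 429 Too Many Requests


# Usage with a fake clock, so the example does not depend on real time:
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # → [True, True, True, False]
fake_now[0] += 1.0  # one simulated second later, one token has been refilled
print(bucket.allow())  # → True
```

The token bucket permits short bursts (up to `capacity`) while enforcing a long-run average of `rate` requests per second. That is often friendlier to clients than a hard fixed-window counter.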
# QA
- Martin Fowler has good articles about testing: see the practical test pyramid and this software testing guide
- See this post about acceptance criteria
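The base of the test pyramid is made of fast unit tests that replace slow or external dependencies with test doubles. The sketch below uses the standard `unittest` and `unittest.mock` modules. `greeting_for` and the client's `get_user` method are hypothetical stand-ins for your own code and for an HTTP or database client.

```python
import unittest
from unittest import mock


def greeting_for(client, user_id: int) -> str:
    """Code under test: depends on an external client (hypothetical interface)."""
    user = client.get_user(user_id)
    return f"Hello, {user['name']}!"


class GreetingTest(unittest.TestCase):
    def test_uses_name_from_client(self) -> None:
        # Arrange: a mock stands in for the real HTTP/DB client.
        client = mock.Mock()
        client.get_user.return_value = {"name": "Ada"}
        # Act
        result = greeting_for(client, 42)
        # Assert: check both the output and the interaction with the dependency.
        self.assertEqual(result, "Hello, Ada!")
        client.get_user.assert_called_once_with(42)
```

You can run it with `python -m unittest`. pytest also collects `unittest.TestCase` classes.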
# Operating systems
- It may sound obvious, but know the operating system you are working with. In most cases this means understanding some Unix/Linux concepts. The Linux Bible is a fantastic book for this. Some other Linux topics to check out:
# Hardware
It is questionable whether it is essential to know any details about the hardware on which your code runs - it’s probably not. However, it’s still an interesting topic worth reading about, and it might make you a more well-rounded engineer.
- What every programmer should know about SSDs
- What every programmer should know about memory (or if you prefer the PDF format)
- CPUs are an even broader topic. The Wikipedia article is a good summary of some of the details. If you are interested in the lower-level details with more context, I suggest reading Andrew Tanenbaum’s Structured Computer Organization. The book is summarised in a series of slides here (check the PDFs under the “Folien” section).
# Infrastructure
- Docker: the docker-curriculum seems neat
- Container orchestration (e.g. Kubernetes)
  - basic concepts and entities
  - scaling, ReplicaSets, affinity, health checks, networking, horizontal and vertical pod autoscalers (the official docs are excellent and extensive)
  - see this post by Julia Evans
  - see this repo for Kubernetes internals
- be familiar with at least one cloud provider (AWS/Google Cloud/Azure)
  - if AWS, then know the basics of AWS services: IAM users, roles, EC2, load balancers, VPCs, Lambda, ElastiCache, etc.
- fault tolerance: the Wiki article, and an article by Imperva
- 12-factor applications
- logging best practices. Log aggregation, log collection. Structured logging. Piping and storing logs.
- AWS Reference Architecture Diagrams
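Here is a small sketch of structured logging that uses only the standard `logging` module. Each record is written to stdout as one JSON object per line. That fits the 12-factor idea of treating logs as an event stream, and it lets a log aggregator parse fields instead of grepping free text. The `JsonFormatter` class and the `fields` key passed through `extra` are conventions I made up for this example. Libraries such as structlog or python-json-logger offer more complete solutions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (our own convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)  # 12-factor style: logs go to stdout
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"fields": {"order_id": 123, "amount": 9.99}})
# → {"level": "INFO", "logger": "orders", "message": "order placed", "order_id": 123, "amount": 9.99}
```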
# Distributed systems, software architecture
- Learn system design: find some amazing resources here, here and in Gaurav Sen’s YouTube playlist on the topic.
- Read this AWS article on the challenges of distributed systems
- Read the Dynamo paper, which is a great example of many key concepts used together, ultimately enabling high availability and scalability.
- Read about the Paxos consensus protocol on Wikipedia and on this blog or on Medium, or watch John Ousterhout’s lecture on YouTube.
- Backpressure
- Testing distributed systems
- Bugs in distributed systems
- A promising paper about verifying distributed systems and one on the taxonomy of bugs
- Read tons about microservices, know about their pros and cons
- The following architectural patterns are also useful to know:
  - Model-View-Controller pattern
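One of the key techniques in the Dynamo paper is consistent hashing. It spreads keys across nodes in such a way that adding or removing a node only moves the keys that belonged to that node. Below is a minimal sketch with virtual nodes. The `HashRing` class, the use of MD5 for placement, and the default of 100 virtual nodes per server are assumptions for illustration, not Dynamo's actual parameters. It requires Python 3.9+.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly and deterministically, not for security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hashing with virtual nodes (a sketch, not Dynamo's exact scheme)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns many points on the ring, which evens out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("node-c")
moved = sum(before[k] != ring.node_for(k) for k in keys)
# Only the keys that lived on node-c move; keys on node-a and node-b stay put.
print(moved == sum(node == "node-c" for node in before.values()))  # → True
```

With naive `hash(key) % number_of_nodes` placement, removing one node would remap most keys. That is the problem consistent hashing avoids.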
# Monitoring and Alerting
- USE and RED methods (read about them here and here, and here and here)
- Read as many blog posts from RobustPerception as you can. Filtering for their instrumentation tag yields a great set of articles, as do their resources about alerting.
- The corresponding chapters from the Google SRE book are must-reads: Monitoring Distributed Systems and Practical Alerting
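The RED method tracks three things per service: the Rate of requests, the Errors, and the Duration of requests. The minimal sketch below computes all three from a window of request records. The `Request` type, counting only 5xx responses as errors, and the nearest-rank percentile are all assumptions. In production, a metrics library and Prometheus would normally compute these from counters and histograms instead. It requires Python 3.9+.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def red_metrics(requests: list[Request], window_s: float) -> dict[str, float]:
    """Rate, Errors and Duration for the requests seen in one time window."""
    if not requests:
        raise ValueError("need at least one request")
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: only 5xx count

    def percentile(p: float) -> float:
        # Nearest-rank method: smallest value with at least p% of samples at or below it.
        rank = math.ceil(p * len(durations) / 100)
        return durations[max(rank, 1) - 1]

    return {
        "rate_per_s": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }


# 100 requests over 10 seconds, taking 1..100 ms; two of them fail with a 500.
reqs = [Request(float(ms), 500 if ms % 50 == 0 else 200) for ms in range(1, 101)]
print(red_metrics(reqs, window_s=10.0))
# → {'rate_per_s': 10.0, 'error_ratio': 0.02, 'p50_ms': 50.0, 'p99_ms': 99.0}
```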
# Networking
- DNS
- HTTP/S
- TCP basics
- SSH
- provider-specific things (e.g. AWS VPCs): read AWS answers about networking, or another AWS networking post
- Linux networking tools poster by Julia Evans
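To see TCP basics in action, here is a sketch of a one-shot echo server and client over the loopback interface, using only the standard `socket` module. TCP is a byte stream with no message boundaries. So both sides read in a loop until the peer closes its sending half (`shutdown(SHUT_WR)`), instead of assuming that a single `recv` returns the whole message. Everything here is illustrative, and it requires Python 3.8+ for `socket.create_server` and the walrus operator.

```python
import socket
import threading


def echo_upper_once(server: socket.socket) -> None:
    """Accept one connection, read until the client half-closes, reply in upper case."""
    conn, _addr = server.accept()  # the kernel has already completed the TCP handshake
    with conn:
        data = b""
        while chunk := conn.recv(1024):  # b"" means the peer sent FIN (end of stream)
            data += chunk
        conn.sendall(data.upper())


server = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
port = server.getsockname()[1]
thread = threading.Thread(target=echo_upper_once, args=(server,))
thread.start()

with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello over tcp")
    client.shutdown(socket.SHUT_WR)  # half-close: "I'm done sending"
    reply = b""
    while chunk := client.recv(1024):
        reply += chunk

thread.join()
server.close()
print(reply)  # → b'HELLO OVER TCP'
```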
# Security
- be able to talk about encryption and its most widely used implementations
- know the difference between encoding, encryption, and hashing
- know about man-in-the-middle attacks
- be able to talk about the OWASP Top Ten
  - injection, cross-site scripting, etc.
- already mentioned in the section about APIs, but this article is worth a read
- authentication methods
  - API Keys vs OAuth Tokens vs JSON Web Tokens
- This blog about REST APIs, including their security aspects, seems promising, for example: Top 5 OWASP Security Tips for REST APIs and 4 Most Used REST API Authentication Methods
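The standard library alone is enough to show the difference between encoding, hashing and password hashing. Encryption is deliberately left out: it needs a key and a vetted library (for example, the Fernet recipe in the `cryptography` package), and rolling your own is a bad idea. The 600,000 PBKDF2 iterations below are an assumption based on OWASP's published guidance at the time of writing, so check the current recommendation. You could also use a dedicated algorithm such as Argon2 or bcrypt through a maintained library.

```python
import base64
import hashlib
import hmac
import os

secret = b"correct horse battery staple"

# Encoding: a reversible change of representation. No key, so no confidentiality.
encoded = base64.b64encode(secret)
assert base64.b64decode(encoded) == secret

# Hashing: one-way and deterministic. The same input always gives the same digest.
digest = hashlib.sha256(secret).hexdigest()
print(len(digest))  # → 64 (hex characters, i.e. 256 bits)

# Password storage: a salted, deliberately slow key-derivation function, not a bare hash.
ITERATIONS = 600_000  # assumption: check current guidance before relying on this value
salt = os.urandom(16)  # a random per-password salt defeats precomputed lookup tables
stored = hashlib.pbkdf2_hmac("sha256", secret, salt, ITERATIONS)


def check_password(candidate: bytes) -> bool:
    attempt = hashlib.pbkdf2_hmac("sha256", candidate, salt, ITERATIONS)
    return hmac.compare_digest(attempt, stored)  # constant-time comparison


print(check_password(secret), check_password(b"wrong"))  # → True False
```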
# Basic math and statistics
- A blog post about statistics for programmers by Julia Evans
- Learn about probability with Python by Peter Norvig
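In the spirit of Norvig's notebook (though not taken from it), here is a tiny sketch that computes probabilities by counting equally likely outcomes. It uses `fractions.Fraction` so the results are exact rather than floating-point approximations. The `probability` helper is a name I made up for illustration.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Tuple

Outcome = Tuple[int, int]

# Sample space: all 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def probability(event: Callable[[Outcome], bool]) -> Fraction:
    """P(event) = favourable outcomes / all outcomes (assumes equally likely outcomes)."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


print(probability(lambda o: sum(o) == 7))  # → 1/6
print(probability(lambda o: 6 in o))       # → 11/36
```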
# Resources
Most of the resources are inlined in this post, so I’ll just use this section to list some books that are considered classics and that I highly recommend reading.
- The Pragmatic Programmer by Andrew Hunt & David Thomas is a book about good programming practices.
- A Philosophy of Software Design by John Ousterhout is also a great book, similar to the one above.
- Designing Data Intensive Applications by Martin Kleppmann is an amazing book covering a lot of topics in a very enjoyable manner.
- Clean Code by Robert Martin (see some examples in Javascript here) is a well-known classic.