What to learn as a backend dev - 2025 edition

Note: This is an updated version of a post I wrote five years ago. It has been rewritten to cover technologies from the past five years, and it adds an emphasis on the tools and technologies around LLMs.

In the past few weeks, I’ve been collating a list of topics that I plan to explore on my path to becoming a better software engineer. I’m sharing this here in the hope that it will be useful for others as well. I believe that a solid understanding of most of these topics provides a strong base knowledge. I’m going to use this as a guide, and I’ll be updating this post on occasion.

Note

this is purely a technical list, and doesn’t include any other aspects of becoming a good engineer, which are plentiful and just as important: communication, conflict handling, domain knowledge, prioritisation skills, social skills, and many others

this list is quite subjective and not necessarily comprehensive

it’s written from the point of view of a backend developer - there are many, many other fields of software, but I mainly focus on backend web development and distributed systems.

with the rise of AI and LLMs, understanding these systems has become crucial for backend developers to effectively integrate and work with these technologies

# Table of contents

Databases
Programming
Tools
APIs
QA
Operating systems
Hardware
Infrastructure
Distributed systems, software architecture
Monitoring and Alerting
Networking
Security
Basic math and statistics
Resources

# Databases

Databases are the bread and butter of a backend developer - even if you are mostly dealing with business logic and writing application code, knowing how the underlying datastore works can help immensely.

# Relational databases (RDBMS)

Relational databases are the primary persistence layer of most organisations. Therefore, it makes sense to place a heavy emphasis on getting to know them well. Amongst others, you might want to:

Be able to come up with and draw an entity-relationship model for a given business requirement. It’s a very useful skill in software, and it’s also a common interview question, where you are asked to design a database schema for a certain scenario (e.g. for an online bookstore). I suggest searching for “database schema modelling exercise” or “entity relationship modelling” and you’ll find lots of good resources.
Know about concurrency control: this fantastic article by the IT Hare can help: ACID, MVCC vs Locks, Transaction Isolation Levels, and Concurrency control.
- know what ACID is,
- be able to explain the basics of transactions,
- know about MVCC (and here),
- know that even with MVCC, there are locks and potential deadlocks.
Read “Chapter 7. Transactions” from Designing Data-Intensive Applications (a highly recommended book!)
Know about indexes, their different types, the most well-known indexing algorithm; know when they are useful, and when they are not. Know about their drawbacks (i.e. the overhead they create). Here is a good read about this topic.
Read up about schema migrations: here is a promising article about doing schema migrations at scale in Postgres. Also a good resource is this article from Benchling.
Be able to talk about ORM, and its pros and cons.
Sharding is an important concept, about which you can read in this Digital Ocean article, this post by Jeeyoung Kim and on Wikipedia too.
Know about materialized views
This post from Jeeyoung Kim on this general topic of databases looks very promising.
B-tree, WAL
Data cubes aka OLAP cube
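
To make the points about transactions and indexes a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer helper and the index name are made up for illustration. SQLite is not an MVCC database, so this only shows atomic rollback and how an index changes a query plan, not the isolation-level behaviour of Postgres or MySQL.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 0)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates happen or neither does."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def balances(conn):
    """Return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


def query_plan(conn, sql, params=()):
    """Return the human-readable part of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = make_db()
transfer(conn, 1, 2, 30)
try:
    transfer(conn, 1, 2, 500)  # fails, so the debit is rolled back
except ValueError:
    pass
print(balances(conn))  # {1: 70, 2: 30}

lookup = "SELECT * FROM accounts WHERE owner = ?"
print(query_plan(conn, lookup, ("alice",)))  # a full table scan
conn.execute("CREATE INDEX idx_accounts_owner ON accounts (owner)")
print(query_plan(conn, lookup, ("alice",)))  # now mentions idx_accounts_owner
```

The exact wording of the query plan differs between SQLite versions, so I would only rely on whether the index name shows up in it.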

# NoSQL DBs

Know the fundamental differences between SQL and NoSQL databases (also know that the term non-relational for NoSQL might be a misnomer, as NoSQL DBs still have relations - according to this and this source).
Know the pros and cons of each, and know where you want to prefer one over the other (do you need a session storage? Do you need atomicity? Do you need transactions? Is vertical scalability essential? Do you need flexible schemas? - questions like these can help to determine which one you need)
Read up about MongoDB, DynamoDB, Redis, Neo4j and/or Cassandra.
- There is a great and fast-paced talk about DynamoDB by Rick Houlihan here at AWS re:Invent.

# Graph Databases

Read a bit about Graph databases, like Neo4j. This article on TowardsDataScience seems like a good introduction too.

# Vector Databases

As LLMs have become more prevalent, vector databases have emerged as a critical technology for managing and querying high-dimensional vector data.
Read up on popular vector databases like Pinecone, Weaviate, and Milvus.
Understand concepts like vector indexing, similarity search, and approximate nearest neighbor (ANN) algorithms.
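
As a rough illustration of similarity search, here is a brute-force (exact) nearest-neighbour lookup using cosine similarity in plain Python. The three-dimensional "embeddings" and document names are invented for the example. Real vector databases avoid scanning every vector by using approximate nearest neighbor indexes, which this sketch does not attempt.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical, hand-written embeddings; real ones have hundreds of dimensions.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "cars": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['cats', 'dogs']
```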

# Data warehouses

Read up about column stores on Wikipedia and on docs.microsoft.com (or a similar Microsoft blog post here)
Read up about star-schema queries
Snowflake is an elastic data warehouse with amazing scalability. Read the white paper here and slides here
This is a good article about best practices when working with ETL pipelines and Snowflake
Redshift is the data warehouse offering of AWS. It would be useful to read a comparison of it with Snowflake (like this or this)
There is an interesting task when working with data warehouses, which is the challenge of loading large datasets into a warehouse. This looks like a good article about moving data into Snowflake.
How SIMD helps crunching large amounts of data on modern CPUs: Implementing Database Operations Using SIMD Instructions

# Message queues

While not strictly a persistence layer, queues and message brokers are used heavily to pass messages between different components of a system. I would like to (and would suggest to) look into RabbitMQ, Amazon SQS and Apache Kafka in depth. This IBM article about queues looks good, and this one about message brokers is worthy of a read as well.
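
The producer/consumer idea behind these systems can be sketched in-process with Python's standard queue and threading modules. This is only an analogy: queue.Queue stands in for a broker here, and it has none of the persistence, acknowledgement or delivery guarantees that RabbitMQ, SQS or Kafka provide. The run_pipeline name and the upper-casing "work" are placeholders.

```python
import queue
import threading


def run_pipeline(messages, worker_count=2):
    """Push messages through a queue to a pool of workers and collect results."""
    q = queue.Queue()
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            msg = q.get()
            if msg is None:  # sentinel: no more work for this worker
                q.task_done()
                break
            with lock:
                processed.append(msg.upper())  # stand-in for real processing
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for msg in messages:  # the "producer" side
        q.put(msg)
    for _ in threads:  # one sentinel per worker
        q.put(None)
    q.join()
    for t in threads:
        t.join()
    return sorted(processed)


print(run_pipeline(["order.created", "order.paid", "order.shipped"]))
```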

# Programming

I’m a Python developer, so this is going to be opinionated.

This is a huge topic, but there are still some general guidelines that I would like to adhere to.

I will strive to know at least 2-3 programming languages at a comfortable level. The choice of language shouldn’t be very important in theory, but it is in practice, because I prefer languages with a thriving ecosystem, good documentation and promising employment prospects.

Python, Node.js and Go are all popular nowadays and are fun to learn. Rust, C#, C++, Java, PHP, Ruby and many others are also fine choices in their own way. I will probably limit myself to Python, Go, Node.js and Ruby.

# Language agnostic topics

algorithms - as well as being useful for coming up with efficient solutions to real-world problems, having a good grasp of algorithms can help in a traditional coding interview too, so it makes sense to know them well. Sorting, searching, tree traversals, complexity, big-O notation.
Look up SST, LSM, B-tree, DAG (data structures related to storage technology, among others)
data structures - similar to the point above, the most common data structures are worth refreshing once in a while. Stack, queue, deque, list, linked list, hashmap, tree, graph, and some more.

For the two topics above, the Problem Solving with Algorithms and Data Structures using Python material is a good one.
Concurrent programming - here’s the best article I’ve read so far on this topic.

design patterns - less popular in interviews, but useful for being able to create good abstractions in code, design patterns are worth reading up about every now and then. See the wiki article for a pretty good read. Some of these have direct language support in higher-level languages (e.g. Python has built-in decorator syntax, which is related to, though not the same thing as, the classic decorator pattern).
testing (different levels of testing: unit, integration, contract, etc) an article about unit testing (see also the QA section below)
analyzing the performance of your code: profiling, tracing, etc.
character encodings, Unicode and character sets
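
For the character encoding item, a small sketch of the distinctions that tend to cause bugs: code points versus bytes, decoding with the wrong character set, and Unicode normalization. The sizes and same_text helpers are just names I picked for the example.

```python
import unicodedata


def sizes(text):
    """How long is a string, depending on what you count?"""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


def same_text(a, b):
    """Compare two strings after normalizing them to the same Unicode form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


text = "na\u00efve caf\u00e9"  # "naive cafe" with two accented letters
print(sizes(text))  # {'code_points': 10, 'utf8_bytes': 12, 'utf16_bytes': 20}

# Decoding UTF-8 bytes with the wrong character set gives garbled text.
mojibake = text.encode("utf-8").decode("latin-1")
print(mojibake == text)  # False

# The same visible character can be one code point or two.
composed = "\u00e9"      # e with acute accent, precomposed
decomposed = "e\u0301"   # plain e followed by a combining accent
print(composed == decomposed, same_text(composed, decomposed))  # False True
```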

A good resource for this part is the book called “Cracking the Coding Interview” by Gayle Laakmann McDowell. It has tons of good advice for tackling programming tasks in interviews, and it includes tons of example questions and their solutions (with code) for different topics.
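
To give a flavour of the kind of exercise that combines several of the topics above (a graph, a queue and a hashmap), here is a sketch of breadth-first search for a shortest path in an unweighted graph. The graph and the shortest_path name are my own example, not taken from the book.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest path from start to goal, or None if unreachable.

    graph maps each node to a list of its neighbours. Runs in O(V + E).
    """
    parents = {start: None}  # also serves as the "visited" set
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                frontier.append(neighbour)
    return None


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(shortest_path(graph, "a", "e"))  # ['a', 'b', 'd', 'e']
print(shortest_path(graph, "e", "a"))  # None
```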

# Python

Assuming you choose Python as your main language, then definitely know about the following topics:
- the GIL, concurrency and async Python - multithreading, multiprocessing, asyncio. Have a read of the built-in concurrency primitives: docs.python.org/3/library/concurrency.html and read this great Real Python post about this topic.
  - pre-emptive and cooperative multitasking (explained in the resources above)
  - being thread-safe and re-entrant
- new features in recent releases - Python 3.6 through 3.12.
- the importance of generators and the yield keyword
- decorators, including writing a decorator from scratch
- context managers (i.e. the with statement)
- inheritance, with its advantages and its pitfalls (super, multiple inheritance, MRO) - here is a good RealPython article for this.
- know a web framework or two - like Flask, Django or Tornado
- know how to use an ORM library like SQLAlchemy or DjangoORM
- to be safe, review the typical interview questions about Python.
- have a read at these fun and mind-blowing quirks of Python
- know how to manage dependencies when working with Python code. How to resolve dependency conflicts, another source
- know how to package and distribute projects
- know how to use virtual environments, and use the virtualenv wrapper
- know how to manage dependencies: poetry, pip, pipenv - pros and cons
- use test frameworks and mocks, follow guidelines and good patterns for testing.
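
Three of the items above (a decorator written from scratch, a generator, and a context manager) fit into one short sketch. All names here (count_calls, chunks, recorded) are invented for the example.

```python
import functools
from contextlib import contextmanager


def count_calls(func):
    """Decorator that counts how often the wrapped function is called."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


def chunks(items, size):
    """Generator: lazily yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


@contextmanager
def recorded(events, name):
    """Context manager that records entry and exit, even if the body raises."""
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


@count_calls
def square(x):
    return x * x


print(list(chunks([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]

events = []
with recorded(events, "job"):
    square(4)
print(events, square.calls)  # ['enter job', 'exit job'] 1
```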

# LLMs and AI Systems

As artificial intelligence and particularly large language models (LLMs) become increasingly integrated into backend systems, understanding these technologies is becoming essential for backend developers. Here are key areas to focus on:

# Fundamentals

understand the basics of machine learning and neural networks
learn about transformer architecture and attention mechanisms
familiarize yourself with key concepts like tokens, embeddings, and context windows
understand the differences between base models, instruction-tuned models, and fine-tuned models

# Development and integration

learn popular LLM frameworks like LangChain and LlamaIndex
understand prompt engineering principles and best practices
know how to implement retrieval-augmented generation (RAG)
be familiar with vector databases and similarity search
understand token usage, rate limiting, and cost optimization
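
The shape of retrieval-augmented generation can be shown without any framework: find the most relevant documents, then put them into the prompt. The sketch below scores relevance by plain word overlap purely to stay self-contained; a real setup would use embeddings and a vector store, and would then send the prompt to a model. The function names and the prompt wording are assumptions of mine.

```python
import re


def tokenize(text):
    """Lower-case a text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Assemble a prompt that grounds the answer in the retrieved documents."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


knowledge_base = [
    "Redis is an in-memory key-value store.",
    "PostgreSQL supports MVCC and transactions.",
    "Kafka is a distributed log.",
]
print(build_prompt("How does PostgreSQL handle transactions?", knowledge_base, k=1))
```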

# Limitations and considerations

understand hallucinations and their implications
be aware of context window limitations
know about prompt injection and security concerns
understand latency and performance implications
be familiar with caching strategies for LLM responses
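
One simple caching strategy is an exact-match cache keyed on a hash of the model name, the prompt and the generation parameters. The sketch below assumes a hypothetical generate(model, prompt, **params) callable standing in for whatever client you use. It is in-memory only and has no expiry, and it usually only makes sense when repeated identical requests are acceptable to answer identically.

```python
import hashlib
import json


class ResponseCache:
    """Exact-match cache in front of a (hypothetical) LLM call."""

    def __init__(self, generate):
        self._generate = generate  # callable: (model, prompt, **params) -> str
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, prompt, params):
        # Serialise deterministically so equal requests hash to the same key.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model, prompt, **params):
        key = self._key(model, prompt, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._generate(model, prompt, **params)
        self._store[key] = result
        return result


def fake_generate(model, prompt, **params):
    """Stand-in for a real model call, used only for the demonstration."""
    return f"[{model}] reply to: {prompt}"


cache = ResponseCache(fake_generate)
cache.complete("some-model", "What is MVCC?", temperature=0)
cache.complete("some-model", "What is MVCC?", temperature=0)
print(cache.hits, cache.misses)  # 1 1
```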

# Ethical considerations

understand bias in training data and model outputs
be aware of privacy implications when handling user data
know about content filtering and safety measures
understand attribution and copyright considerations
be familiar with responsible AI principles

# Deployment and operations

understand model deployment strategies
know how to monitor LLM-based systems
implement proper error handling and fallbacks
understand cost management and optimization
be familiar with A/B testing for prompt engineering
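
For error handling and fallbacks, a common shape is "retry the primary model with exponential backoff, then fall back to something cheaper or simpler". Here is a sketch under the assumption that primary and fallback are plain callables taking a prompt; both are hypothetical stand-ins. Production code should catch specific errors (timeouts, rate limits) rather than every Exception as done here.

```python
import time


def call_with_fallback(primary, fallback, prompt,
                       retries=2, base_delay=0.5, sleep=time.sleep):
    """Try primary up to retries + 1 times, then use fallback.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    The sleep function is injectable so tests do not have to wait.
    """
    for attempt in range(retries + 1):
        try:
            return primary(prompt)
        except Exception:  # assumption: any failure is worth retrying
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)
    return fallback(prompt)


def always_down(prompt):
    raise TimeoutError("model unavailable")


def canned_answer(prompt):
    return "Sorry, I cannot answer right now."


waits = []
print(call_with_fallback(always_down, canned_answer, "hi", sleep=waits.append))
print(waits)  # [0.5, 1.0]
```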

# Evaluation and testing

know how to evaluate model outputs
understand metrics for measuring performance
implement automated testing for LLM-based features
know how to handle edge cases and failures
understand how to maintain quality over time
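
One cheap form of automated testing is validating the structure of model output before anything downstream uses it, and tracking how often it passes. The sketch below assumes a hypothetical classification feature where the model is asked to reply with JSON containing a label and a confidence. It checks format only and says nothing about whether the label is actually correct.

```python
import json


def validate_reply(raw, allowed_labels):
    """Return the parsed reply if it is well-formed, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    label = data.get("label")
    confidence = data.get("confidence")
    if not isinstance(label, str) or label not in allowed_labels:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0 <= confidence <= 1:
        return None
    return {"label": label, "confidence": float(confidence)}


def pass_rate(replies, allowed_labels):
    """Fraction of replies that pass validation (a metric to track over time)."""
    valid = sum(validate_reply(r, allowed_labels) is not None for r in replies)
    return valid / len(replies)


labels = {"spam", "ham"}
sample_replies = [
    '{"label": "spam", "confidence": 0.9}',
    "Sure! The answer is spam.",
    '{"label": "other", "confidence": 0.5}',
    '{"label": "ham", "confidence": 1}',
]
print(pass_rate(sample_replies, labels))  # 0.5
```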

# Tools

Version control: knowing git is essential - this might sound obvious, but it’s worth perfecting your git skills, including the less frequently used rebase and bisect commands, using Git hooks, knowing the reflog, and some more (e.g. cherry-picking).
Linux command line tools, including:
- manipulating text with tools like sed and awk
- processing JSON files with jq
- searching and sorting (grep, sort, uniq)
Automating your local workflows with Makefiles. See how using makefiles can benefit you as a Python developer. See this project for a more in-depth tutorial on Make.

# APIs

know RESTful principles inside out (some good resources here and here)
HTTP verbs (methods) and which one is used for what operation
know some of the most used HTTP headers: good sources here, here and here
know the most frequent HTTP status codes
be able to write an API that
- filters, sorts, paginates results, and
- rate limits requests. Read up here, here and here about rate limiting.
have a look at the OpenAPI Specification
know about API security best practices
know a bit about GraphQL
know about streamed responses (chunked transfer encoding in HTTP/1.1; HTTP/2 does not use chunked encoding and streams response bodies through its own DATA frames instead)
know a bit about HTTP/2
know that gRPC has gained popularity as a high-performance, open-source universal RPC framework.
know that WebSockets provide full-duplex communication channels over a single TCP connection, enabling real-time, bidirectional communication between clients and servers.
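
Two of the items above, pagination and rate limiting, are easy to sketch independently of any web framework. The token bucket below is one of several common rate limiting algorithms, and the response shape returned by paginate is just an assumption. In a real API both would sit behind your framework's request handling, with one bucket per client.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Return True and consume a token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def paginate(items, page=1, per_page=10):
    """Return one page of results plus the metadata a client needs."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "total": len(items),
        "has_next": start + per_page < len(items),
    }


bucket = TokenBucket(capacity=2, refill_per_second=1)
print([bucket.allow() for _ in range(3)])  # [True, True, False]
print(paginate(list(range(25)), page=3, per_page=10)["items"])  # [20, 21, 22, 23, 24]
```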

# QA

Martin Fowler has good articles about testing: see the practical test pyramid and this software testing guide
See this post about acceptance criteria

# Operating systems

It sounds evident: know the operating system you are working with. In most cases this means understanding some Unix/Linux concepts. The Linux Bible is a fantastic book for this. Some other Linux topics to check out:
- Filesystems
- boot sequence

# Hardware

It is questionable whether it is essential to know any details about the hardware on which your code runs - it’s probably not. However, it’s still an interesting topic worth reading about, and it might make you a more well-rounded engineer.

What every programmer should know about SSDs
What every programmer should know about memory (or if you prefer the PDF format)
CPUs are an even broader topic - the Wikipedia article is a good summary of some of the details. If you are interested in the lower-level details with more context, I suggest reading Andrew Tanenbaum’s Structured Computer Organization. The book is summarised in a series of slides here (check the pdfs under the “Folien” section).

# Infrastructure

Docker: the docker-curriculum seems neat
Container orchestration (e.g. Kubernetes)
- basic concepts and entities
- scaling, replicasets, affinity, health checks, networking, horizontal and vertical pod autoscalers - their docs are great and plenty
- see this post by Julia Evans
- see this repo for Kubernetes internals
be familiar with at least one cloud provider (AWS/Google Cloud/Azure)
- if AWS, then
  - know the basics of AWS services - IAM users, roles, EC2, load balancers, VPCs, Lambda, ElastiCache, etc.
fault tolerance: Wiki article, and an article by Imperva
12 factor applications
logging best practices. Log aggregation, log collection. Structured logging. Piping and storing logs.
AWS Reference Architecture Diagrams
Infrastructure as Code (IaC) has become a standard practice for managing and provisioning infrastructure.
know that Terraform is a popular open-source IaC tool that allows you to define and provision infrastructure using a declarative configuration language.
know that Serverless computing, popularized by services like AWS Lambda, allows you to run code without provisioning or managing servers.
know that Service meshes like Istio and Linkerd provide a dedicated infrastructure layer for handling service-to-service communication in microservices architectures.
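
Structured logging, mentioned in the list above, mostly means emitting one machine-parseable record per event instead of free-form text. A minimal sketch with the standard logging module follows. The JsonFormatter class, the "context" key passed via extra, and the field names are my own conventions, not a standard.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields passed as extra={"context": {...}} end up on the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers.clear()
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger("orders", buffer)
log.info("order created", extra={"context": {"order_id": 42, "user": "alice"}})
print(buffer.getvalue().strip())
```

Because every line is valid JSON, a log aggregator can filter on order_id or level without regular expressions.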

# Distributed systems, software architecture

Learn system design: find some amazing resources here, here and Gaurav Sen’s Youtube playlist on the topic.
Read this AWS writing on the challenges of distributed systems
Read the Dynamo paper, which is a great example of many key concepts used together, ultimately enabling high availability and scalability.
Read about the Paxos consensus protocol on Wikipedia, and on this blog or on Medium or watch John Ousterhout’s lecture on Youtube.
Backpressure
Testing distributed systems
Bugs in distributed systems
A promising paper about verifying distributed systems and one on the taxonomy of bugs
Read tons about microservices, know about their pros and cons
The following architectural patterns are also useful to know:
Model-View-Controller pattern
The concept of a “data mesh” has emerged as a decentralized approach to data management, where data is treated as a product and owned by domains.
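
Backpressure, listed above, is easiest to see with a bounded queue: when consumers fall behind, the producer is told "no" instead of the backlog growing without limit. The sketch below rejects new work when full (load shedding), which is only one possible response; blocking the producer is another. BoundedInbox is a made-up name.

```python
import queue


class BoundedInbox:
    """Accept jobs until max_pending are waiting, then push back."""

    def __init__(self, max_pending):
        self._queue = queue.Queue(maxsize=max_pending)
        self.rejected = 0

    def submit(self, job):
        """Return True if the job was accepted, False if the inbox is full."""
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            self.rejected += 1  # the caller should slow down or retry later
            return False

    def take(self):
        """Remove and return the oldest pending job."""
        return self._queue.get_nowait()


inbox = BoundedInbox(max_pending=2)
print([inbox.submit(job) for job in ("a", "b", "c")])  # [True, True, False]
inbox.take()                # a consumer catches up
print(inbox.submit("c"))    # True
```

In an HTTP service the rejected case would typically surface as a 429 or 503 response, so that the pressure propagates to the caller.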

# Monitoring and Alerting

USE and RED methods (read about them here and here, and here and here)
Read as many blog posts from RobustPerception as you can - filtering for their instrumentation tag yields a great set of articles, as well as their resources about alerting
The corresponding chapters from the Google SRE book are must-reads: Monitoring Distributed Systems and Practical Alerting
Prometheus has become a leading open-source monitoring system, particularly in cloud-native environments.
OpenTelemetry is an emerging standard for observability, providing a set of APIs and SDKs for generating and collecting telemetry data.
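
The RED method boils down to three numbers per service: request Rate, Errors and Duration. As a sketch of what those mean, here they are computed from a list of (duration, status code) samples in plain Python. In practice a monitoring system such as Prometheus would derive them from counters and histograms. The input format and the nearest-rank percentile are my own choices.

```python
import math


def red_metrics(requests, window_seconds):
    """Compute RED metrics from (duration_seconds, status_code) samples."""
    total = len(requests)
    errors = sum(1 for _, status in requests if status >= 500)
    durations = sorted(duration for duration, _ in requests)

    def percentile(p):
        # Nearest-rank percentile; None when there is no data.
        if not durations:
            return None
        rank = max(math.ceil(p / 100 * len(durations)), 1)
        return durations[rank - 1]

    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p50_seconds": percentile(50),
        "p95_seconds": percentile(95),
    }


samples = [(0.1, 200), (0.2, 200), (0.3, 500), (0.4, 200), (1.0, 200)]
print(red_metrics(samples, window_seconds=10))
```

Note how the single slow request dominates the p95 while barely moving the median, which is why durations are usually tracked as percentiles rather than averages.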

# Networking

DNS
HTTP/S
TCP basics
SSH
provider-specific things (e.g. AWS VPCs), read AWS answers about networking, or another AWS networking post
Linux networking tools poster by Julia Evans

# Security

be able to talk about encryption and its most widely used implementations
know the difference between encoding, encryption, and hashing
know about man-in-the-middle attacks
be able to talk about OWASP top ten
- injection, cross-site scripting, etc
already mentioned at the section about APIs, but this article is worth a read
authentication methods
API Keys vs OAuth Tokens vs JSON Web Tokens
This blog about REST APIs, including their security aspect, seems promising - for example: Top 5 OWASP Security Tips for REST APIs and 4 Most Used REST API Authentication Methods
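
The difference between encoding and hashing from the list above can be shown with the standard library alone: encoding is reversible by anyone, hashing is one-way. Encryption (reversible only with a key) is left out on purpose, since the standard library has no general-purpose encryption API and you would want a vetted third-party library for that. The helper names and the iteration count are assumptions for the example, not recommendations.

```python
import base64
import hashlib
import hmac
import os


def encode(data):
    """Encoding: a reversible change of representation, no secret involved."""
    return base64.b64encode(data).decode("ascii")


def decode(text):
    return base64.b64decode(text)


def fingerprint(data):
    """Hashing: a fixed-size, one-way digest of the input."""
    return hashlib.sha256(data).hexdigest()


def hash_password(password, salt=None, iterations=200_000):
    """Password hashing: deliberately slow and salted (PBKDF2-HMAC-SHA256)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected, iterations=200_000):
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


print(encode(b"secret"))            # c2VjcmV0
print(decode("c2VjcmV0"))           # b'secret'
print(len(fingerprint(b"secret")))  # 64

salt, stored = hash_password("hunter2")
print(verify_password("hunter2", salt, stored), verify_password("wrong", salt, stored))
```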

# Basic math and statistics

A blog post about statistics for programmers by Julia Evans
Learn about probability with Python by Peter Norvig
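
As a small taste of both topics, here is a sketch that computes an exact probability by counting outcomes in a finite sample space, and then shows why the median is often more informative than the mean for skewed data such as latencies. The probability helper and the sample numbers are invented for illustration.

```python
import statistics
from fractions import Fraction
from itertools import product


def probability(event, space):
    """Exact probability of an event when all outcomes are equally likely."""
    favourable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favourable), len(space))


# All 36 outcomes of rolling two dice.
two_dice = list(product(range(1, 7), repeat=2))
print(probability(lambda roll: sum(roll) == 7, two_dice))  # 1/6

# One slow request drags the mean far away from the typical value.
latencies_ms = [10, 12, 11, 13, 250]
print(statistics.mean(latencies_ms))    # 59.2
print(statistics.median(latencies_ms))  # 12
```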

# Resources

Most of the resources are inlined in this post, so I’ll just use this section to list some books that are considered classics and that I highly recommend reading.

The Pragmatic Programmer by Andrew Hunt & David Thomas is a book about good programming practices.
A Philosophy of Software Design by John Ousterhout is also a great book, similar to the one above.
Designing Data-Intensive Applications by Martin Kleppmann is an amazing book covering a lot of topics in a very enjoyable manner.
Clean Code by Robert Martin (see some examples in Javascript here) is a well-known classic.
Building Microservices by Sam Newman provides a comprehensive guide to designing, building, and deploying microservices.
Site Reliability Engineering by Google is a collection of essays and best practices for running reliable, scalable systems.
The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford is a novel that explores DevOps principles and practices through a fictional story.
Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, Jez Humble, and Gene Kim presents research-backed insights into the factors that drive high-performing software teams.
Kubernetes Patterns by Bilgin Ibryam and Roland Huß provides a catalog of reusable patterns for designing and implementing cloud-native applications on Kubernetes.
Fundamentals of Data Engineering by Joe Reis and Matt Housley covers the end-to-end process of building and operating data systems.
Machine Learning Design Patterns by Valliappa Lakshmanan, Sara Robinson, and Michael Munn offers solutions to common challenges in designing and deploying machine learning systems.

Written on January 14, 2025

If you notice anything wrong with this post (factual error, rude tone, bad grammar, typo, etc.), and you feel like giving feedback, please do so by contacting me at hello@samu.space. Thank you!
