What to learn as a backend dev - 2025 edition

Note: This is an updated version of a post I wrote five years ago. It has been rewritten to cover technologies from the past five years, and it puts extra emphasis on the tools and technologies around LLMs.

In the past few weeks, I’ve been collating a list of topics that I plan to explore on my path to becoming a better software engineer. I’m sharing this here in the hope that it will be useful for others as well. I believe that a solid understanding of most of these topics provides a strong base knowledge. I’m going to use this as a guide, and I’ll be updating this post on occasion.

Note

  • this is purely a technical list, and doesn’t include any other aspects of becoming a good engineer, which are plentiful and just as important: communication, conflict handling, domain knowledge, prioritisation skills, social skills, and many others
  • this list is quite subjective and not necessarily comprehensive
  • it’s written from the point of view of a backend developer - there are many, many other fields of software, but I mainly focus on backend web development and distributed systems.
  • with the rise of AI and LLMs, understanding these systems has become crucial for backend developers to effectively integrate and work with these technologies

# Databases

Databases are the bread and butter of a backend developer - even if you are mostly dealing with business logic and writing application code, knowing how the underlying datastore works can help immensely.

# Relational databases (RDBMS)

Relational databases are the primary persistence layer of most organisations. Therefore, it makes sense to place a heavy emphasis on getting to know them well. Amongst others, you might want to:

# NoSQL DBs

  • Know the fundamental differences between SQL and NoSQL databases (also know that the term non-relational for NoSQL might be a misnomer, as NoSQL DBs still have relations - according to this and this source).
  • Know the pros and cons of each, and know when to prefer one over the other (do you need a session storage? Do you need atomicity? Do you need transactions? Is vertical scalability essential? Do you need flexible schemas? - questions like these can help to determine which one you need)
  • Read up about MongoDB, DynamoDB, Redis, Neo4J and/or Cassandra.
    • There is a great and fast-paced talk about DynamoDB by Rick Houlihan here at AWS re:Invent.

# Graph Databases

# Vector Databases

  • As LLMs have become more prevalent, vector databases have emerged as a critical technology for managing and querying high-dimensional vector data.
  • Read up on popular vector databases like Pinecone, Weaviate, and Milvus.
  • Understand concepts like vector indexing, similarity search, and approximate nearest neighbor (ANN) algorithms.
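
To make similarity search a bit more concrete, here is a minimal sketch of exact (brute-force) nearest-neighbour search using cosine similarity. The document names and the tiny 3-dimensional vectors are made up for illustration. As far as I understand it, vector databases avoid scoring every vector by using ANN indexes, trading a little accuracy for speed, so treat this as the baseline those indexes approximate rather than how any particular product works.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Exact nearest neighbours: score every stored vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical "embeddings"; real ones have hundreds or thousands of dimensions.
documents = {
    "postgres-indexes": [0.9, 0.1, 0.0],
    "redis-caching": [0.1, 0.9, 0.1],
    "btree-internals": [0.8, 0.2, 0.1],
}

print(top_k([1.0, 0.0, 0.0], documents))  # ['postgres-indexes', 'btree-internals']
```

The brute-force scan is linear in the number of stored vectors, which is exactly the cost that ANN indexes are meant to reduce.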

# Data warehouses

# Message queues

While not strictly a persistence layer, queues and message brokers are used heavily to pass messages between different components of a system. I would like to look into RabbitMQ, Amazon SQS and Apache Kafka in depth, and I would suggest you do the same. This IBM article about queues looks good, and this one about message brokers is worth a read as well.
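
The producer/consumer idea behind all of these can be sketched with nothing but the Python standard library. This is an in-process stand-in, not the API of RabbitMQ, SQS or Kafka: the function names and the `None` sentinel are my own choices for the example, and a real broker adds persistence, acknowledgements and delivery guarantees on top.

```python
import queue
import threading


def producer(q, items):
    """Put each item on the queue, then a sentinel to signal 'no more work'."""
    for item in items:
        q.put(item)  # blocks when the queue is full (simple backpressure)
    q.put(None)


def consumer(q, results):
    """Take messages off the queue until the sentinel arrives."""
    while True:
        message = q.get()
        if message is None:
            q.task_done()
            break
        results.append(message.upper())  # stand-in for real processing
        q.task_done()


def run_pipeline(items):
    q = queue.Queue(maxsize=2)  # small bound so the producer can outrun the consumer
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, items)
    q.join()       # wait until every message has been marked done
    worker.join()
    return results


print(run_pipeline(["order-created", "order-paid"]))  # ['ORDER-CREATED', 'ORDER-PAID']
```

The point of the sketch is the decoupling: the producer never calls the consumer directly, it only knows about the queue.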

# Programming

I’m a Python developer, so this is going to be opinionated.

This is a huge topic, but there are still some general guidelines that I would like to adhere to.

I will strive to know at least 2-3 programming languages at a comfortable level. The choice of language shouldn’t be very important in theory, but it is in practice, because I prefer languages with a thriving ecosystem, good documentation and promising employment prospects.

Python, Node.js and Go are all popular nowadays and are fun to learn. Rust, C#, C++, Java, PHP, Ruby and many others are also fine choices in their own way. I will probably limit myself to Python, Go, Node.js and Ruby.

# Language agnostic topics

  • algorithms - as well as being useful for coming up with efficient solutions to real-world problems, having a good grasp of algorithms can help in a traditional coding interview too, so it makes sense to know them well. Sorting, searching, tree traversals, complexity, big-O notation.
  • Look up SST, LSM, B-tree, DAG (data structures related to storage technology, among others)
  • data structures - similar to the point above, the most common data structures are worth refreshing once in a while. Stack, queue, deque, list, linked list, hashmap, tree, graph, and some more.

    For the two topics above, the Problem Solving with Algorithms and Data Structures using Python material is a good one.

  • Concurrent programming - here’s the best article I’ve read so far on this topic.

  • design patterns - less popular in interviews, but useful for being able to create good abstractions in code, design patterns are worth reading up about every now and then. See the wiki article for a pretty good read. Some of these are already supported by higher level languages (e.g. Python has decorators built into the language).
  • testing - the different levels of testing (unit, integration, contract, etc.); here's an article about unit testing (see also the QA section below)
  • analyzing the performance of your code: profiling, tracing, etc.
  • character encodings, Unicode and character sets
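
To make the algorithms and data structures items in the list above a little more concrete, here is a small sketch of breadth-first search using a `deque` as the queue. The service dependency graph is a made-up example; the function finds the smallest number of hops between two nodes in an unweighted graph and runs in O(V + E) time.

```python
from collections import deque


def shortest_path_length(graph, start, goal):
    """Return the fewest edges from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    visited = {start}
    frontier = deque([(start, 0)])  # (node, distance from start)
    while frontier:
        node, distance = frontier.popleft()  # O(1) pop from the left end
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return distance + 1
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, distance + 1))
    return None


# Hypothetical service dependency graph (adjacency lists).
services = {
    "api": ["auth", "db"],
    "auth": ["db", "cache"],
    "db": [],
    "cache": ["queue"],
    "queue": [],
}

print(shortest_path_length(services, "api", "queue"))  # 3
```

Swapping the `deque` for a plain list with `pop(0)` would still work, but each pop would then be O(n), which is a nice example of a data structure choice changing the complexity of an algorithm.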

A good resource for this part is the book called “Cracking the Coding Interview” by Gayle Laakmann McDowell. It has tons of good advice for tackling programming tasks in interviews, and it includes tons of example questions and their solutions (with code) for different topics.

# Python

# LLMs and AI Systems

As artificial intelligence, and particularly large language models (LLMs), become increasingly integrated into backend systems, understanding these technologies is becoming essential for backend developers. Here are key areas to focus on:

# Fundamentals

  • understand the basics of machine learning and neural networks
  • learn about transformer architecture and attention mechanisms
  • familiarize yourself with key concepts like tokens, embeddings, and context windows
  • understand the differences between base models, instruction-tuned models, and fine-tuned models
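
As a rough illustration of why tokens and context windows matter for backend code, here is a sketch that trims a chat history to a token budget by keeping only the most recent messages. The whitespace-based `count_tokens` is a deliberate simplification and an assumption of this example: real tokenizers split text into subword units, so the counts will differ from any actual model.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_context(messages, max_tokens):
    """Keep the newest messages whose combined token count fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this is dropped as well
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "hello there",
    "how do indexes work",
    "explain b-trees briefly",
]

print(fit_to_context(history, max_tokens=7))
# ['how do indexes work', 'explain b-trees briefly']
```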

# Development and integration

  • learn popular LLM frameworks like LangChain and LlamaIndex
  • understand prompt engineering principles and best practices
  • know how to implement retrieval-augmented generation (RAG)
  • be familiar with vector databases and similarity search
  • understand token usage, rate limiting, and cost optimization
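
The shape of retrieval-augmented generation can be sketched without any framework: retrieve the most relevant documents, then put them into the prompt. Everything below is a toy under stated assumptions. Retrieval is plain word overlap instead of embeddings and a vector database, the knowledge base and prompt wording are invented, and the final call to a model is left out entirely.

```python
def tokenize(text):
    """Lowercased set of words; a stand-in for real embeddings."""
    return set(text.lower().split())


def retrieve(question, documents, k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Assemble the prompt that would be sent to the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base.
knowledge_base = [
    "Kafka stores messages in partitioned logs",
    "Redis keeps data in memory",
    "B-trees keep keys sorted on disk",
]

print(build_prompt("how does kafka store messages", knowledge_base))
```

In a real system the `retrieve` step is where the vector database and similarity search come in; the prompt assembly stays roughly this simple.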

# Limitations and considerations

  • understand hallucinations and their implications
  • be aware of context window limitations
  • know about prompt injection and security concerns
  • understand latency and performance implications
  • be familiar with caching strategies for LLM responses
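
One simple caching strategy is an exact-match cache keyed by a hash of everything that influences the response. The sketch below assumes a hypothetical `generate(model, prompt, temperature)` callable standing in for whatever client you use. It keeps entries in a plain dict, whereas a real deployment would more likely use something like Redis with an expiry.

```python
import hashlib
import json


class ResponseCache:
    """Exact-match cache in front of a (hypothetical) text generation function."""

    def __init__(self, generate):
        self._generate = generate
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model, prompt, temperature):
        # Hash every input that changes the output, in a stable order.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "temperature": temperature},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model, prompt, temperature=0.0):
        key = self._key(model, prompt, temperature)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._generate(model, prompt, temperature)
        self._store[key] = response
        return response


def fake_generate(model, prompt, temperature):
    """Placeholder for a real model call; just echoes the prompt."""
    return f"[{model}] {prompt}"


cache = ResponseCache(fake_generate)
cache.complete("demo-model", "What is a B-tree?")
cache.complete("demo-model", "What is a B-tree?")
print(cache.hits, cache.misses)  # 1 1
```

Exact-match caching only pays off when identical prompts recur, and it makes the most sense for deterministic settings; with sampling enabled, a cached answer hides the variation the caller may have wanted.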

# Ethical considerations

  • understand bias in training data and model outputs
  • be aware of privacy implications when handling user data
  • know about content filtering and safety measures
  • understand attribution and copyright considerations
  • be familiar with responsible AI principles

# Deployment and operations

  • understand model deployment strategies
  • know how to monitor LLM-based systems
  • implement proper error handling and fallbacks
  • understand cost management and optimization
  • be familiar with A/B testing for prompt engineering
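
For the error handling and fallbacks item, a common shape is: retry the primary model a few times with exponential backoff, then fall back to something cheaper or simpler. This is a sketch under assumptions: `primary` and `fallback` are hypothetical callables taking a prompt, and the `sleep` parameter exists only so the delays can be replaced in tests.

```python
import time


def call_with_fallback(primary, fallback, prompt, retries=3, base_delay=0.5, sleep=time.sleep):
    """Try primary up to `retries` times, then return fallback's answer."""
    for attempt in range(retries):
        try:
            return primary(prompt)
        except Exception:
            # Broad on purpose for the sketch; real code should catch only the
            # client's retryable errors (timeouts, rate limits, and so on).
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return fallback(prompt)


def always_failing_model(prompt):
    """Placeholder for a model endpoint that is currently down."""
    raise TimeoutError("model endpoint timed out")


def canned_answer(prompt):
    """Placeholder fallback: a static response instead of a second model."""
    return "Sorry, I can't answer that right now."


print(call_with_fallback(always_failing_model, canned_answer, "hi", base_delay=0.01))
# Sorry, I can't answer that right now.
```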

# Evaluation and testing

  • know how to evaluate model outputs
  • understand metrics for measuring performance
  • implement automated testing for LLM-based features
  • know how to handle edge cases and failures
  • understand how to maintain quality over time
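
A very small evaluation harness can already catch regressions: run a fixed set of prompts through the model and check each output against simple expectations. The keyword check, the case format and the stubbed model below are all assumptions made for this sketch; real evaluations often add similarity metrics, rubric-based grading or human review on top.

```python
def evaluate(model, cases):
    """Run every case through `model`; return (pass_rate, failures)."""
    failures = []
    for case in cases:
        output = model(case["prompt"])
        missing = [
            keyword
            for keyword in case["must_contain"]
            if keyword.lower() not in output.lower()
        ]
        if missing:
            failures.append({"prompt": case["prompt"], "missing": missing})
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures


def stub_model(prompt):
    """Placeholder for a real model call, so the harness runs offline."""
    return "A B-tree keeps keys sorted and supports range scans."


eval_cases = [
    {"prompt": "What is a B-tree?", "must_contain": ["sorted"]},
    {"prompt": "What is an LSM tree?", "must_contain": ["compaction"]},
]

rate, failed = evaluate(stub_model, eval_cases)
print(rate, failed)
# 0.5 [{'prompt': 'What is an LSM tree?', 'missing': ['compaction']}]
```

Running the same cases on every prompt or model change, and tracking the pass rate over time, is one way to approach the "maintain quality over time" item.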

# Tools

  • Version control: knowing Git is essential - this might sound obvious, but it’s worth perfecting your Git skills, including the less frequently used rebase and bisect commands, using Git hooks, knowing the reflog, and some more (e.g. cherry-picking).
  • Linux command line tools, including:
    • manipulating text with tools like sed and awk
    • processing JSON files with jq
    • searching and sorting (grep, sort, uniq)
  • Automating your local workflows with Makefiles. See how using makefiles can benefit you as a Python developer. See this project for a more in-depth tutorial on Make.

# APIs

# QA

# Operating systems

  • It sounds evident: know the operating system you are working with. In most cases this means understanding some Unix/Linux concepts. The Linux Bible is a fantastic book for this. Some other Linux topics to check out:

# Hardware

It is questionable whether it is essential to know any details about the hardware on which your code runs - it’s probably not. However, it’s still an interesting topic worth reading about, and it might make you a more well-rounded engineer.

# Infrastructure

  • Docker: the docker-curriculum seems neat
  • Container orchestration (e.g. Kubernetes)
    • basic concepts and entities
    • scaling, replicasets, affinity, health checks, networking, horizontal and vertical pod autoscalers - the docs are great and plentiful
    • see this post by Julia Evans
    • see this repo for Kubernetes internals
  • be familiar with at least one cloud provider (AWS/Google Cloud/Azure)
    • if AWS, then
      • know the basics of AWS services - IAM users, roles, EC2, load balancers, VPCs, Lambda, ElastiCache, etc.
  • fault tolerance: the Wiki article, and an article by Imperva
  • 12 factor applications
  • logging best practices. Log aggregation, log collection. Structured logging. Piping and storing logs.
  • AWS Reference Architecture Diagrams
  • Infrastructure as Code (IaC) has become a standard practice for managing and provisioning infrastructure.
  • know that Terraform is a popular open-source IaC tool that allows you to define and provision infrastructure using a declarative configuration language.
  • know that Serverless computing, popularized by services like AWS Lambda, allows you to run code without provisioning or managing servers.
  • know that Service meshes like Istio and Linkerd provide a dedicated infrastructure layer for handling service-to-service communication in microservices architectures.
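
To illustrate the structured logging item from the list above, here is a sketch that emits one JSON object per log line using only the standard `logging` module. The field names (`request_id`, `user_id`) and the logger name are assumptions for the example; in practice you might reach for an existing library instead of writing your own formatter.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("request_id", "user_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(stream, name="app"):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stands in for stdout or a log file
logger = build_logger(buffer)
logger.info("order created", extra={"request_id": "abc123"})
print(buffer.getvalue().strip())
# {"level": "INFO", "logger": "app", "message": "order created", "request_id": "abc123"}
```

Because every line is valid JSON, a log aggregator can filter on `request_id` or `level` without fragile text parsing.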

# Distributed systems, software architecture

# Monitoring and Alerting

  • USE and RED methods (read about them here and here, and here and here)
  • Read as many blog posts from RobustPerception as you can - filtering for their instrumentation tag yields a great set of articles, as well as their resources about alerting
  • The corresponding chapters from the Google SRE book are must-reads: Monitoring Distributed Systems and Practical Alerting
  • Prometheus has become a leading open-source monitoring system, particularly in cloud-native environments.
  • OpenTelemetry is an emerging standard for observability, providing a set of APIs and SDKs for generating and collecting telemetry data.
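
The RED method boils down to three numbers per service: Rate, Errors and Duration. Here is a sketch that computes them from a list of request records. The record format, the choice of counting 5xx responses as errors, and the nearest-rank 95th percentile are assumptions of this example; with Prometheus you would normally derive these from counters and histograms instead of raw records.

```python
def red_metrics(requests, window_seconds):
    """Compute Rate, Errors, Duration from (status_code, duration_seconds) pairs."""
    total = len(requests)
    errors = sum(1 for status, _ in requests if status >= 500)
    durations = sorted(duration for _, duration in requests)
    if durations:
        # Nearest-rank percentile, in integer arithmetic to avoid float rounding.
        rank = (95 * total + 99) // 100  # ceil(0.95 * total)
        p95 = durations[rank - 1]
    else:
        p95 = 0.0
    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_duration_seconds": p95,
    }


# Hypothetical traffic: 18 fast successes and 2 slow server errors in 10 seconds.
sample = [(200, 0.1)] * 18 + [(500, 0.4), (503, 2.0)]

print(red_metrics(sample, window_seconds=10))
# {'rate_per_second': 2.0, 'error_ratio': 0.1, 'p95_duration_seconds': 0.4}
```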

# Networking

# Security

# Basic math and statistics

# Resources

Most of the resources are inlined in this post, so I’ll just use this section to list some books that are considered classics and that I highly recommend reading.

  • The Pragmatic Programmer by Andrew Hunt & David Thomas is a book about good programming practices.
  • A Philosophy of Software Design by John Ousterhout is also a great book, similar to the one above.
  • Designing Data-Intensive Applications by Martin Kleppmann is an amazing book covering a lot of topics in a very enjoyable manner.
  • Clean Code by Robert Martin (see some examples in JavaScript here) is a well-known classic.
  • Building Microservices by Sam Newman provides a comprehensive guide to designing, building, and deploying microservices.
  • Site Reliability Engineering by Google is a collection of essays and best practices for running reliable, scalable systems.
  • The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford is a novel that explores DevOps principles and practices through a fictional story.
  • Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, Jez Humble, and Gene Kim presents research-backed insights into the factors that drive high-performing software teams.
  • Kubernetes Patterns by Bilgin Ibryam and Roland Huß provides a catalog of reusable patterns for designing and implementing cloud-native applications on Kubernetes.
  • Fundamentals of Data Engineering by Joe Reis and Matt Housley covers the end-to-end process of building and operating data systems.
  • Machine Learning Design Patterns by Valliappa Lakshmanan, Sara Robinson, and Michael Munn offers solutions to common challenges in designing and deploying machine learning systems.
Written on January 14, 2025

If you notice anything wrong with this post (factual error, rude tone, bad grammar, typo, etc.), and you feel like giving feedback, please do so by contacting me at hello@samu.space. Thank you!