ai news aggregator
I recently built a small, opinionated AI news aggregator site (ai.samu.space) to help filter through the noise and surface meaningful AI-related news and research. The project took about 3-4 hours to put together, built with FastAPI, React, news APIs, and OpenAI’s API.
# Architecture
The application consists of three main components:
- A FastAPI backend that fetches and processes news articles, storing them in SQLite (sketched below)
- A React frontend for displaying the news in a clean interface
- An LLM service that helps filter and summarize the content
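For concreteness, here is a minimal sketch of how the backend and storage pieces might hang together. The table schema, column names, and endpoint path are my illustrative assumptions, not the actual code:

```python
import sqlite3

from fastapi import FastAPI

app = FastAPI()
# A single connection is fine for a sketch; FastAPI may serve requests from a threadpool,
# hence check_same_thread=False.
db = sqlite3.connect("articles.db", check_same_thread=False)

# One table is enough for a feed like this.
db.execute(
    """CREATE TABLE IF NOT EXISTS articles (
        id INTEGER PRIMARY KEY,
        title TEXT,
        url TEXT UNIQUE,
        summary TEXT,
        source TEXT,
        fetched_at TEXT
    )"""
)
db.commit()

@app.get("/api/news")
def get_news(limit: int = 50):
    rows = db.execute(
        "SELECT title, url, summary, source FROM articles "
        "ORDER BY fetched_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [dict(zip(("title", "url", "summary", "source"), row)) for row in rows]
```

The React frontend only needs to call an endpoint like this one and render the rows; the LLM service shows up in the filtering step below.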
# News Processing Pipeline
The most interesting part of the system is how it processes incoming news articles:
- Articles are fetched from multiple sources (NewsAPI, academic papers)
- Each article goes through content filtering:
  - Blocked terms filter (to avoid celebrity drama and ignore people I don’t want to hear about)
  - Similarity check to prevent duplicates
  - AI-relevance check using an LLM
- Valid articles are stored in the SQLite database
- A summary is generated for the latest batch of news
The LLM service acts as a smart filter, determining whether an article is truly AI-related rather than just sprinkled with buzzwords. This keeps the feed’s signal-to-noise ratio high.
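In practice the relevance check can be a single yes/no classification call. The prompt and model name below are my assumptions, not necessarily what the site uses:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_ai_relevant(title: str, description: str) -> bool:
    """Ask the model for a strict yes/no on whether the article is substantively about AI."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any small, cheap chat model works here
        temperature=0,
        max_tokens=3,
        messages=[
            {
                "role": "system",
                "content": (
                    "You judge whether a news article is substantively about AI research, "
                    "products, or policy, not just name-dropping AI. Answer YES or NO."
                ),
            },
            {"role": "user", "content": f"Title: {title}\n\nDescription: {description}"},
        ],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```

Keeping the answer to a single token makes the call cheap enough to run on every candidate article.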
# Frontend Experience
The frontend is intentionally minimal, focusing on readability and quick access to information:
- Clean card-based layout for articles
- Tabs for switching between news and academic content
- AI-generated synthesis of latest developments
- Mobile-responsive design
# Technical Decisions
Some interesting technical choices I made:
## Background Processing
Rather than making users wait, article fetching happens in the background using FastAPI’s BackgroundTasks. When a user requests news, they immediately get cached results while a refresh is triggered if needed.
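A sketch of that pattern (the in-memory cache here is a stand-in for the SQLite store, and the helper names are mine):

```python
import time

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

REFRESH_INTERVAL = 6 * 60 * 60  # seconds; matches the 6-hour cache described below
_cache: dict = {"articles": [], "fetched_at": 0.0}

def refresh_articles() -> None:
    # Stand-in for the real fetch -> filter -> store pipeline.
    _cache["articles"] = [{"title": "example article"}]
    _cache["fetched_at"] = time.time()

@app.get("/api/news")
def get_news(background_tasks: BackgroundTasks):
    # Return whatever is cached immediately...
    if time.time() - _cache["fetched_at"] > REFRESH_INTERVAL:
        # ...and queue a refresh that runs only after the response has been sent.
        background_tasks.add_task(refresh_articles)
    return _cache["articles"]
```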
## Caching Strategy
Articles are cached for 6 hours to balance freshness against API rate limits. The cache is checked on each news request and refreshed in the background when it has gone stale.
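One way to implement that staleness check against the SQLite store (column name and timestamp format are assumptions): the feed counts as stale when the newest stored article is older than the TTL.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(hours=6)

def cache_is_stale(db: sqlite3.Connection) -> bool:
    # Assumes fetched_at is stored as a timezone-aware ISO 8601 string in UTC.
    row = db.execute("SELECT MAX(fetched_at) FROM articles").fetchone()
    if row is None or row[0] is None:
        return True  # nothing stored yet: definitely fetch
    newest = datetime.fromisoformat(row[0])
    return datetime.now(timezone.utc) - newest > CACHE_TTL
```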
## Content Deduplication
To avoid duplicate content, the service uses a combination of title matching and content similarity checking. This handles cases where the same news appears on different sites with slightly different titles.
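A minimal version of that two-step check (the threshold and normalization are my guesses; the real service may use something more robust):

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # assumption: tune against real headlines

def normalize(title: str) -> str:
    return " ".join(title.lower().split())

def is_duplicate(new_title: str, existing_titles: list[str]) -> bool:
    candidate = normalize(new_title)
    for existing in existing_titles:
        other = normalize(existing)
        # Exact match after normalization catches straightforward reposts...
        if candidate == other:
            return True
        # ...and fuzzy similarity catches lightly reworded headlines.
        if SequenceMatcher(None, candidate, other).ratio() >= SIMILARITY_THRESHOLD:
            return True
    return False
```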
# Learnings
Building this highlighted some interesting challenges:
- LLM APIs can be slow - I had to balance filtering thoroughness against response times
- News APIs have varying quality - some return duplicates or irrelevant content requiring additional filtering
- Background tasks need careful error handling to prevent failed tasks from affecting the main application
- Summarization requires finding the right prompt balance between conciseness and informativeness
The project was a fun exercise in putting together different services (news APIs, LLMs, frontend) into a cohesive application. While simple, it serves its purpose of helping me stay updated on meaningful AI developments without getting lost in the noise.