ai news aggregator

I recently built a small and opinionated AI news aggregator site (ai.samu.space) to help filter through the noise and find meaningful AI-related news and research. The project took about 3-4 hours to put together, leveraging FastAPI, React, news APIs and OpenAI’s API.

ai-news-aggregator

# Architecture

The application consists of three main components:

  • A FastAPI backend that fetches and processes news articles, storing them in sqlite
  • A React frontend for displaying the news in a clean interface
  • An LLM service that helps filter and summarize the content

# News Processing Pipeline

The most interesting part of the system is how it processes incoming news articles:

  1. Articles are fetched from multiple sources (NewsAPI, academic papers)
  2. Each article goes through content filtering:
    • Blocked terms filter (to avoid celebrity drama and ignore people I don’t want to hear about)
    • Similarity check to prevent duplicates
    • AI-relevance check using LLM
  3. Valid articles are stored in a sqlite database
  4. A summary is generated for the latest batch of news

The LLM service acts as a smart filter, determining if an article is truly AI-related rather than just containing buzzwords. This helps maintain signal-to-noise ratio in the feed.

# Frontend Experience

The frontend is intentionally minimal, focusing on readability and quick access to information:

  • Clean card-based layout for articles
  • Tabs for switching between news and academic content
  • AI-generated synthesis of latest developments
  • Mobile-responsive design

# Technical Decisions

Some interesting technical choices I made:

# Background Processing

Rather than making users wait, article fetching happens in the background using FastAPI’s BackgroundTasks. When a user requests news, they immediately get cached results while a refresh is triggered if needed.

# Caching Strategy

Articles are cached for 6 hours to balance freshness with API rate limits. The cache invalidation is handled automatically when users request news.

# Content Deduplication

To avoid duplicate content, the service uses a combination of title matching and content similarity checking. This handles cases where the same news appears on different sites with slightly different titles.

# Learnings

Building this highlighted some interesting challenges:

  • LLM APIs can be slow - had to carefully balance thoroughness of filtering with response times
  • News APIs have varying quality - some return duplicates or irrelevant content requiring additional filtering
  • Background tasks need careful error handling to prevent failed tasks from affecting the main application
  • Summarization requires finding the right prompt balance between conciseness and informativeness

The project was a fun exercise in putting together different services (news APIs, LLMs, frontend) into a cohesive application. While simple, it serves its purpose of helping me stay updated on meaningful AI developments without getting lost in the noise.

Written on January 27, 2025

If you notice anything wrong with this post (factual error, rude tone, bad grammar, typo, etc.), and you feel like giving feedback, please do so by contacting me at hello@samu.space. Thank you!