Cookie3

Data Collection Engines

A set of scalable scrapers feeding the company's data ecosystem with content from YouTube, Telegram and news platforms.

What I used
.NET, MongoDB, Docker, GitHub Actions

What I did

  • Built fault-tolerant data pipelines for many different sources.
  • Normalized diverse formats into one consistent model.
  • Automated deployment and maintenance with Docker and GitHub Actions.
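As an illustration of normalizing diverse formats into one consistent model, here is a minimal sketch in Python. It is not the project's .NET code: the `ContentItem` fields, the payload shapes, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical shared model that every source is mapped into."""
    source: str
    external_id: str
    text: str
    published_at: datetime


def from_youtube(raw: dict) -> ContentItem:
    # Assumed payload shape: ISO 8601 timestamp with an explicit UTC offset.
    return ContentItem(
        source="youtube",
        external_id=raw["videoId"],
        text=raw["title"],
        published_at=datetime.fromisoformat(raw["publishedAt"]),
    )


def from_telegram(raw: dict) -> ContentItem:
    # Assumed payload shape: integer message id and Unix timestamp in seconds.
    return ContentItem(
        source="telegram",
        external_id=str(raw["message_id"]),
        text=raw["text"],
        published_at=datetime.fromtimestamp(raw["date"], tz=timezone.utc),
    )


# One normalizer per source; adding a source means adding one entry here.
NORMALIZERS = {"youtube": from_youtube, "telegram": from_telegram}


def normalize(source: str, raw: dict) -> ContentItem:
    """Dispatch a raw payload to the normalizer registered for its source."""
    return NORMALIZERS[source](raw)
```

Keeping each mapping in its own small function means a change in one source's structure touches only that function, while everything downstream keeps working against `ContentItem`.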

What I learned

  • Working with different APIs and handling rate limiting.
  • Building pipelines that survive a single source going down.
  • Thinking of data as a product, from raw stream to finished insight.
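The rate-limiting and "survive a single source going down" points can be sketched as retries with exponential backoff plus per-source failure isolation. This is a Python illustration under assumptions, not the project's .NET implementation: `fetch_with_retry`, `run_all`, and their parameters are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple


def fetch_with_retry(fetch: Callable[[], List[dict]], attempts: int = 3,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Call fetch(), waiting base_delay * 2**attempt between failed attempts."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller record the failure
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def run_all(sources: Dict[str, Callable[[], List[dict]]],
            **retry_kwargs) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Run every source; a failing source is recorded, not propagated."""
    results: Dict[str, List[dict]] = {}
    failures: Dict[str, str] = {}
    for name, fetch in sources.items():
        try:
            results[name] = fetch_with_retry(fetch, **retry_kwargs)
        except Exception as exc:
            failures[name] = repr(exc)
    return results, failures
```

A real scraper would likely catch narrower exception types and honor any retry hints the API returns; the broad `except Exception` here only keeps the sketch short.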

Challenges

  • Sources changed structure often, so scrapers had to be flexible and easy to fix.
  • Keeping data fresh within external API constraints.
  • Scale: what worked for one source had to generalize to dozens.