

Professional datasets, dataset-specific AI enrichment, instant delivery. No enterprise sales calls. No surprise fees.
Reddit, YouTube, GitHub, Medium, LinkedIn and 15 more: 100 active datasets across 20 platforms, covering finance, AI, gaming, jobs, crypto and more.
100-row samples from every dataset — no signup needed. Datasets under 1,000 records are completely free to download.
Download in CSV, JSON, JSONL, or Parquet. Every dataset supports all formats on both sample and full endpoints.
Automated pipeline runs every day across all 20 platforms. Data lands in S3 and is live within hours.
Each dataset carries only the enrichment fields that matter for its topic — no bloat, no irrelevant columns.
Deduplicated, schema-validated records. Missing fields handled gracefully. Ready to load directly into your pipeline.
Every dataset carries only the columns that make sense for its platform and topic. Reddit and Medium datasets get full sentiment analysis. GitHub datasets skip it entirely — stars, forks, and community health matter more. Financial datasets add ticker extraction and directional signals. YouTube adds virality scoring and topic classification.
sentiment_label: positive / neutral / negative
sentiment_score: VADER compound −1.0 → 1.0
sentiment_positive: positive proportion 0→1
sentiment_negative: negative proportion 0→1
sentiment_neutral: neutral proportion 0→1
sentiment_confidence: model certainty 0→1
sentiment_method: vader_lexicon | blended
emotion_primary: joy / anger / fear / sadness …
sarcasm_flag: bool (/s, quoting, incongruence)
sarcasm_score: sarcasm confidence 0→1
language_detected: ISO 639-1 code en / es / fr …
text_quality_score: spam / noise signal 0→1
subjectivity_score: objective ↔ subjective 0→1
intensity_score: expression strength 0→1
financial_signal: bullish / bearish / neutral (financial / crypto datasets) [finance]
financial_score: directional score −1→1 (financial / crypto datasets) [finance]
ticker_mentions: $TICKER list extracted from text (financial / crypto) [finance]
aspect_sentiment: per-aspect label + score dict (brand / sarcasm datasets) [brand]
virality_score: engagement-weighted score (YouTube datasets) [youtube]
topic_category: yt_topic content bucket (YouTube datasets) [youtube]
is_trending: bool, virality > 70 (YouTube datasets) [youtube]
has_code: code block detected (AI / dev / ML datasets) [dev]
models_mentioned: GPT / LLaMA / Claude / Gemini … (ML/AI datasets) [dev]
cve_mentioned: CVE-YYYY-NNNN regex match (cyber-security dataset) [security]
has_scientific_ref: study / research / evidence cited (science / health) [health]
content_format: article / listicle / case_study (Medium datasets) [medium]

curl, Python, or any HTTP client. No SDK required.
    # Free sample (no auth needed)
    curl "https://api.socialintel.io/api/datasets/sample/financial-sentiment?format=csv"

    # Free full dataset (no auth needed)
    curl "https://api.socialintel.io/api/datasets/download/gaming-streams?format=parquet"

    # Paid dataset (API key required)
    curl -H "X-API-Key: sk_live_..." \
      "https://api.socialintel.io/api/datasets/download/financial-sentiment?format=json"
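The curl calls above translate directly to Python's standard library. What follows is a minimal sketch, not an official client: the helper names (`build_url`, `build_request`, `fetch`) are hypothetical, and the lowercase `jsonl` format value is an assumption, since the examples above only show `csv`, `parquet`, and `json`.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Base URL taken from the curl examples above.
BASE_URL = "https://api.socialintel.io/api/datasets"
# csv, json, and parquet appear in the curl examples; "jsonl" is an assumed spelling.
FORMATS = ("csv", "json", "jsonl", "parquet")
ENDPOINTS = ("sample", "download")


def build_url(endpoint: str, slug: str, fmt: str = "csv") -> str:
    """Build a sample or download URL for a dataset slug."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"endpoint must be one of {ENDPOINTS}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    path = urllib.parse.quote(slug, safe="")
    query = urllib.parse.urlencode({"format": fmt})
    return f"{BASE_URL}/{endpoint}/{path}?{query}"


def build_request(
    endpoint: str, slug: str, fmt: str = "csv", api_key: Optional[str] = None
) -> urllib.request.Request:
    """Attach the X-API-Key header only when a key is supplied (paid datasets)."""
    headers = {"X-API-Key": api_key} if api_key else {}
    return urllib.request.Request(build_url(endpoint, slug, fmt), headers=headers)


def fetch(
    endpoint: str, slug: str, fmt: str = "csv", api_key: Optional[str] = None
) -> bytes:
    """Perform the HTTP GET and return the raw file bytes."""
    request = build_request(endpoint, slug, fmt, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

For instance, `fetch("sample", "financial-sentiment", "csv")` would request the same file as the first curl command, and passing `api_key=` mirrors the `-H "X-API-Key: ..."` flag used for paid datasets.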
Data streams directly — hit the endpoint, get the file
Datasets stream directly from S3 via the API. Files are ready the moment you hit the endpoint.
Every dataset has a /sample endpoint returning 100 rows. Inspect schema, enrichment fields and quality before buying.
$59/month gets you unlimited downloads across all 100 datasets, all formats, every day.
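Inspecting a 100-row sample before buying can be as simple as listing its columns and counting empty cells. The sketch below assumes the sample was fetched as CSV text; `summarize_sample` is a hypothetical helper, and the two rows in `MADE_UP_SAMPLE` (including the `id` column) are invented purely to show the shape of the output, not real records.

```python
import csv
import io
from typing import Dict, List, Union

Summary = Dict[str, Union[List[str], int, Dict[str, int]]]


def summarize_sample(csv_text: str) -> Summary:
    """Report column names, row count, and empty-cell counts per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    # A cell counts as missing when it is absent or blank after stripping.
    missing = {
        col: sum(1 for row in rows if not (row.get(col) or "").strip())
        for col in columns
    }
    return {"columns": columns, "row_count": len(rows), "missing": missing}


# Invented rows for illustration only; field names come from the list above,
# except "id", which is a placeholder.
MADE_UP_SAMPLE = (
    "id,sentiment_label,sentiment_score,ticker_mentions\n"
    "1,positive,0.62,$AAPL\n"
    "2,negative,-0.41,\n"
)

summary = summarize_sample(MADE_UP_SAMPLE)
print(summary["columns"])
print(summary["missing"])
```

Comparing the reported columns against the enrichment fields you expect is a quick way to confirm a dataset carries what you need.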
Built secure from day one — not bolted on
X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, HSTS, and CSP all enforced.
All inputs sanitized. CSRF tokens required on state-changing requests. Output escaped at render.
100 requests/minute per IP on the API. Edge-level protection on frontend routes via Vercel.
API keys and env vars never sent to the client. .env files blocked. Keys stored server-side only.
Hosted on Railway + Vercel. TLS 1.3 in transit, AES-256 at rest, certificates auto-renewed. No plaintext transport.
GDPR and CCPA compliant. Only public content collected. No personal data sold or shared.
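The 100 requests/minute per-IP limit listed above is easy to respect on the client side. Below is a rough sketch of a sliding-window throttle. The class name and its parameters are hypothetical, and how the API responds once the limit is exceeded is not stated here, so this only spaces out calls and does not handle rejections.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of period seconds."""

    def __init__(
        self,
        max_calls: int = 100,   # from the stated 100 requests/minute limit
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()

    def acquire(self) -> float:
        """Block until a call is allowed; return the seconds spent waiting."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget calls that have aged out of the window.
            while self._calls and now - self._calls[0] >= self._period:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return waited
            # Sleep until the oldest recorded call leaves the window.
            delay = self._period - (now - self._calls[0])
            self._sleep(delay)
            waited += delay
```

Calling `limiter.acquire()` before each request keeps a batch job under the limit; the injectable `clock` and `sleep` arguments exist only so the behaviour can be checked without real waiting.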
Browse 100 datasets with enrichment fields tailored to each platform and topic. Download free samples instantly — no account needed.