Documentation Index
Fetch the complete documentation index at: https://jmail.world/docs/llms.txt
Use this file to discover all available pages before exploring further.
DuckDB can query Parquet files directly over HTTP with zero download or setup. This is the fastest way to explore the data interactively.
Setup
Install DuckDB, then query directly:
SELECT sender, COUNT(*) as n
FROM read_parquet('https://data.jmail.world/v1/emails-slim.parquet')
GROUP BY sender ORDER BY n DESC LIMIT 20;
Example Queries
Count emails by sender
SELECT sender, COUNT(*) as n
FROM read_parquet('https://data.jmail.world/v1/emails-slim.parquet')
GROUP BY sender ORDER BY n DESC LIMIT 20;
Emails sent by Epstein
SELECT subject, sent_at, to_recipients
FROM read_parquet('https://data.jmail.world/v1/emails-slim.parquet')
WHERE epstein_is_sender = true
ORDER BY sent_at DESC LIMIT 20;
Search documents
SELECT original_filename, document_description, page_count
FROM read_parquet('https://data.jmail.world/v1/documents.parquet')
WHERE document_description ILIKE '%flight%'
LIMIT 20;
Join photos with people
SELECT p.name, COUNT(*) as appearances
FROM read_parquet('https://data.jmail.world/v1/photo_faces.parquet') pf
JOIN read_parquet('https://data.jmail.world/v1/people.parquet') p
ON pf.person_id = p.id
GROUP BY p.name ORDER BY appearances DESC;
Star counts by entity type
SELECT entity_type, SUM(count) as total_stars
FROM read_parquet('https://data.jmail.world/v1/star_counts.parquet')
GROUP BY entity_type ORDER BY total_stars DESC;
Using with Polars
Polars can also read Parquet over HTTP:
import polars as pl
df = pl.read_parquet("https://data.jmail.world/v1/emails-slim.parquet")
print(df.group_by("sender").len().sort("len", descending=True).head(20))
Using with Pandas
import pandas as pd
df = pd.read_parquet("https://data.jmail.world/v1/emails-slim.parquet")
print(df.groupby("sender").size().sort_values(ascending=False).head(20))