Datasets - Jmail Data API

Available Datasets

All files are served from https://data.jmail.world/v1/. The file names in the table below are relative to this base URL.

Dataset	Parquet URL	NDJSON URL
Emails (full)	`emails.parquet`	`emails.ndjson.gz`
Emails (slim)	`emails-slim.parquet`	`emails-slim.ndjson.gz`
Documents	`documents.parquet`	`documents.ndjson.gz`
Photos	`photos.parquet`	`photos.ndjson.gz`
People	`people.parquet`	`people.ndjson.gz`
Photo Faces	`photo_faces.parquet`	`photo_faces.ndjson.gz`
iMessage Conversations	`imessage_conversations.parquet`	`imessage_conversations.ndjson.gz`
iMessage Messages	`imessage_messages.parquet`	`imessage_messages.ndjson.gz`
Star Counts	`star_counts.parquet`	`star_counts.ndjson.gz`
Release Batches	`release_batches.parquet`	`release_batches.ndjson.gz`
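
As a minimal sketch of how the table maps to download URLs, the helpers below join the base URL with a dataset file name and decode a gzip-compressed NDJSON payload using only the standard library. The function names `dataset_url` and `parse_ndjson_gz` are illustrative and not part of any official client.

```python
import gzip
import json

BASE_URL = "https://data.jmail.world/v1/"  # base URL stated in this section


def dataset_url(name: str, fmt: str = "parquet") -> str:
    """Build the URL for a dataset file, e.g. name="emails-slim"."""
    suffix = {"parquet": ".parquet", "ndjson": ".ndjson.gz"}[fmt]
    return BASE_URL + name + suffix


def parse_ndjson_gz(raw: bytes) -> list:
    """Decode gzip-compressed NDJSON bytes into a list of dict records."""
    text = gzip.decompress(raw).decode("utf-8")
    # NDJSON holds one JSON object per line; skip blank lines.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

To fetch a real file, the bytes returned by `urllib.request.urlopen(dataset_url("people", "ndjson")).read()` could be passed to `parse_ndjson_gz`. For the larger datasets the Parquet files are likely the more practical choice.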

Emails

The primary dataset. Contains all released emails from the Epstein archive.

`emails.parquet`: Full dataset with body text (`content_markdown`), sender, recipients, subject, dates, and metadata.

`emails-slim.parquet`: Same emails but without body text columns. Much smaller download, ideal for network analysis, sender/recipient graphs, and timeline visualizations.

Key Columns (slim)

Column	Type	Description
`id`	int	Unique email ID
`doc_id`	string	Thread grouping ID
`sender`	string	Sender email/name
`subject`	string	Email subject line
`to_recipients`	json	To recipients
`cc_recipients`	json	CC recipients
`bcc_recipients`	json	BCC recipients
`sent_at`	timestamp	Send date
`account_email`	string	Source account
`email_drop_id`	string	Source identifier
`epstein_is_sender`	bool	Whether Epstein sent this email
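
The slim columns are enough for a sender/recipient graph. The sketch below counts directed (sender, recipient) pairs from rows shaped like the table above. It assumes each recipient column holds either a JSON-encoded list of strings or an already decoded list. The exact encoding is not specified here, so check a few real rows first. The sample rows are made up.

```python
import json
from collections import Counter


def recipient_list(value):
    """Return recipients as a list; accepts a JSON string, a list, or None."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        value = json.loads(value)  # assumption: JSON array of strings
    return [str(r) for r in value]


def edge_counts(rows):
    """Count (sender, recipient) pairs across the To, CC, and BCC columns."""
    edges = Counter()
    for row in rows:
        for field in ("to_recipients", "cc_recipients", "bcc_recipients"):
            for recipient in recipient_list(row.get(field)):
                edges[(row["sender"], recipient)] += 1
    return edges


# Made-up sample rows, not real archive data.
sample_rows = [
    {"sender": "a@example.com", "to_recipients": '["b@example.com"]',
     "cc_recipients": "[]", "bcc_recipients": None},
    {"sender": "a@example.com",
     "to_recipients": '["b@example.com", "c@example.com"]',
     "cc_recipients": None, "bcc_recipients": None},
]
print(edge_counts(sample_rows).most_common(1))
```

The resulting counter can be passed to a graph library as weighted edges.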

Additional Columns (full)

Column	Type	Description
`content_markdown`	string	Email body as Markdown
`content_html`	string	Email body as HTML
`attachments`	int	Attachment count

Documents

Metadata for all documents in the archive (DOJ releases, House Oversight, court records).

Column	Type	Description
`id`	int	Unique document ID
`source`	string	Source (`doj`, `house_oversight`)
`release_batch`	string	Volume/batch identifier
`original_filename`	string	Original filename
`page_count`	int	Number of pages
`size`	int	File size in bytes
`document_description`	string	AI-generated description
`has_thumbnail`	bool	Whether a thumbnail exists

Document Full-Text Shards

Full extracted text is too large for a single file. Use the sharded files:

Shard	URL	Contents
VOL00008	`documents-full/VOL00008.parquet`	DOJ Volume 8
VOL00009	`documents-full/VOL00009.parquet`	DOJ Volume 9
VOL00010	`documents-full/VOL00010.parquet`	DOJ Volume 10
DataSet11	`documents-full/DataSet11.parquet`	DOJ Dataset 11
other	`documents-full/other.parquet`	House Oversight, court records, etc.

The Python client handles shard concatenation automatically via `client.documents(include_text=True)`.
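
Without the Python client, the shards can be concatenated by hand. This is a rough sketch rather than the client's actual implementation. It assumes every shard shares the same schema, and it uses pandas (with a Parquet engine such as pyarrow installed) instead of the client. The names `shard_urls` and `load_all_shards` are illustrative.

```python
BASE_URL = "https://data.jmail.world/v1/"  # base URL stated in this document
SHARDS = ["VOL00008", "VOL00009", "VOL00010", "DataSet11", "other"]


def shard_urls():
    """Return the full URL of every full-text shard listed in the table."""
    return [f"{BASE_URL}documents-full/{name}.parquet" for name in SHARDS]


def load_all_shards():
    """Download each shard and stack them into one DataFrame.

    Assumption: all shards have identical columns.
    """
    import pandas as pd  # third-party; imported lazily so the URL helper stays stdlib-only

    frames = [pd.read_parquet(url) for url in shard_urls()]
    return pd.concat(frames, ignore_index=True)
```

Loading one shard at a time keeps memory use lower when only a single volume is needed.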

Photos

Photo metadata from government releases with AI-generated descriptions.

Column	Type	Description
`id`	int	Unique photo ID
`source`	string	Source identifier
`release_batch`	string	Volume/batch
`original_filename`	string	Original filename
`content_type`	string	MIME type
`width`	int	Image width in pixels
`height`	int	Image height in pixels
`image_description`	string	AI-generated description

People

People identified via AWS Rekognition facial recognition.

Column	Type	Description
`id`	int	Unique person ID
`name`	string	Identified name
`source`	string	Detection source
`photo_count`	int	Number of photos containing this person

Photo Faces

Bounding boxes linking detected faces in photos to identified people.

Column	Type	Description
`id`	int	Unique face ID
`photo_id`	int	FK to photos
`person_id`	int	FK to people
`bbox_left`	float	Bounding box left edge
`bbox_top`	float	Bounding box top edge
`bbox_width`	float	Bounding box width
`bbox_height`	float	Bounding box height
`confidence`	float	Detection confidence
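
To draw a face box on an image, the bounding box has to be combined with the photo's `width` and `height`. The sketch below assumes the `bbox_*` values are fractions of the image dimensions, which is how AWS Rekognition reports bounding boxes. This page does not confirm that the dataset keeps that convention, so verify it against a few rows. The sample values are made up.

```python
def bbox_to_pixels(face, photo):
    """Convert a relative bounding box to (left, top, right, bottom) pixels.

    Assumption: bbox_* values are ratios of the image width/height.
    Results are clamped to the image bounds.
    """
    width, height = photo["width"], photo["height"]

    def clamp(value, upper):
        return max(0, min(upper, round(value)))

    left = clamp(face["bbox_left"] * width, width)
    top = clamp(face["bbox_top"] * height, height)
    right = clamp((face["bbox_left"] + face["bbox_width"]) * width, width)
    bottom = clamp((face["bbox_top"] + face["bbox_height"]) * height, height)
    return left, top, right, bottom


# Made-up example rows joined on photo_id == id.
photo = {"id": 1, "width": 1000, "height": 800}
face = {"photo_id": 1, "bbox_left": 0.25, "bbox_top": 0.5,
        "bbox_width": 0.5, "bbox_height": 0.25}
print(bbox_to_pixels(face, photo))
```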

iMessage Conversations

Metadata for iMessage conversations recovered from the archive.

Column	Type	Description
`id`	int	Unique conversation ID
`slug`	string	URL-safe conversation identifier
`name`	string	Contact name
`bio`	string	Contact bio/description
`photo`	string	Contact photo URL
`last_message`	string	Preview of the last message
`last_message_time`	string	Timestamp of last message
`pinned`	bool	Whether the conversation was pinned
`confirmed`	bool	Whether the contact identity is confirmed
`source_files`	json	Source files this conversation was extracted from
`message_count`	int	Total messages in this conversation

iMessage Messages

Individual iMessage text messages with sender info and timestamps.

Column	Type	Description
`id`	string	Unique message ID (`{slug}#{index}`)
`conversation_slug`	string	FK to conversations (slug)
`message_index`	int	Message position within conversation
`text`	string	Message text content
`sender`	string	`me` (Epstein) or `them` (contact)
`time`	string	Original timestamp string
`timestamp`	timestamp	Parsed timestamp
`source_file`	string	Source file this message was extracted from
`sender_name`	string	Display name of sender

Star Counts

Crowd-sourced star/interest counts from jmail.world users.

Column	Type	Description
`entity_type`	string	Type (`email_message`, `email_thread`, `photo`, `document`)
`entity_id`	int	Entity ID
`count`	int	Number of stars
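
Because `entity_id` is only meaningful together with `entity_type`, lookups should key on both. The sketch below assumes that `entity_id` refers to the `id` column of the matching dataset, which this page does not state explicitly. The rows are made up.

```python
def index_star_counts(rows):
    """Map (entity_type, entity_id) to its star count."""
    return {(r["entity_type"], r["entity_id"]): r["count"] for r in rows}


def top_starred(rows, entity_type, n=3):
    """Return the n most-starred rows of one entity type, highest first."""
    matching = [r for r in rows if r["entity_type"] == entity_type]
    return sorted(matching, key=lambda r: r["count"], reverse=True)[:n]


# Made-up sample rows, not real archive data.
sample_stars = [
    {"entity_type": "photo", "entity_id": 7, "count": 12},
    {"entity_type": "document", "entity_id": 7, "count": 3},
    {"entity_type": "photo", "entity_id": 9, "count": 40},
]
stars = index_star_counts(sample_stars)
print(stars.get(("photo", 7), 0))
print([r["entity_id"] for r in top_starred(sample_stars, "photo")])
```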

Release Batches

Metadata about each release batch.

Column	Type	Description
`id`	int	Batch ID
`name`	string	Batch name
`description`	string	Batch description
`released_at`	timestamp	Public release date
