
Available Datasets

All files are served from https://data.jmail.world/v1/.

Emails

The primary dataset. Contains all released emails from the Epstein archive.

emails.parquet: Full dataset with body text (content_markdown), sender, recipients, subject, dates, and metadata.

emails-slim.parquet: Same emails but without body text columns. Much smaller download, ideal for network analysis, sender/recipient graphs, and timeline visualizations.
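
As a minimal sketch of how the two email files might be fetched, the helper below joins a filename onto the base URL given at the top of this page. The function names `dataset_url` and `load_slim_emails` are hypothetical and not part of any official client. The loader assumes pandas with a Parquet engine such as pyarrow is installed.

```python
from urllib.parse import urljoin

# Base URL as stated at the top of this page.
BASE_URL = "https://data.jmail.world/v1/"


def dataset_url(filename: str) -> str:
    """Return the absolute URL for a file served under BASE_URL."""
    return urljoin(BASE_URL, filename)


def load_slim_emails():
    """Download emails-slim.parquet into a pandas DataFrame.

    pandas is imported lazily so the URL helper has no third-party
    dependencies; a Parquet engine such as pyarrow must be installed.
    """
    import pandas as pd

    return pd.read_parquet(dataset_url("emails-slim.parquet"))
```

Starting with the slim file keeps the download small. Switching to `emails.parquet` is only needed when the body text columns are required.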

Key Columns (slim)

| Column | Type | Description |
| --- | --- | --- |
| id | int | Unique email ID |
| doc_id | string | Thread grouping ID |
| sender | string | Sender email/name |
| subject | string | Email subject line |
| to_recipients | json | To recipients |
| cc_recipients | json | CC recipients |
| bcc_recipients | json | BCC recipients |
| sent_at | timestamp | Send date |
| account_email | string | Source account |
| email_drop_id | string | Source identifier |
| epstein_is_sender | bool | Whether Epstein sent this email |
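
For sender/recipient graphs, the recipient columns can be flattened into weighted edges. The sketch below assumes each recipient column holds a JSON array of strings, either as JSON text or already decoded. The page does not specify the exact JSON shape, so adjust the parsing if the real values differ. `recipient_edges` and the sample rows are hypothetical.

```python
import json
from collections import Counter


def recipient_edges(rows):
    """Count (sender, recipient) pairs across the To, CC, and BCC columns."""
    edges = Counter()
    for row in rows:
        for field in ("to_recipients", "cc_recipients", "bcc_recipients"):
            raw = row.get(field)
            # Assumption: a JSON array of strings; missing or empty
            # values are skipped.
            if not raw:
                continue
            recipients = json.loads(raw) if isinstance(raw, str) else raw
            for recipient in recipients:
                edges[(row["sender"], recipient)] += 1
    return edges


# Hypothetical rows for illustration only, not taken from the dataset.
sample = [
    {"sender": "a@example.com", "to_recipients": '["b@example.com"]',
     "cc_recipients": '["c@example.com"]', "bcc_recipients": None},
    {"sender": "a@example.com", "to_recipients": '["b@example.com"]',
     "cc_recipients": "[]", "bcc_recipients": None},
]
edges = recipient_edges(sample)
```

The resulting counter maps each directed pair to the number of emails, which can be fed directly into a graph library as edge weights.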

Additional Columns (full)

| Column | Type | Description |
| --- | --- | --- |
| content_markdown | string | Email body as Markdown |
| content_html | string | Email body as HTML |
| attachments | int | Attachment count |

Documents

Metadata for all documents in the archive (DOJ releases, House Oversight, court records).

| Column | Type | Description |
| --- | --- | --- |
| id | int | Unique document ID |
| source | string | Source (doj, house_oversight) |
| release_batch | string | Volume/batch identifier |
| original_filename | string | Original filename |
| page_count | int | Number of pages |
| size | int | File size in bytes |
| document_description | string | AI-generated description |
| has_thumbnail | bool | Whether a thumbnail exists |

Document Full-Text Shards

Full extracted text is too large for a single file. Use the sharded files:

| Shard | URL | Contents |
| --- | --- | --- |
| VOL00008 | documents-full/VOL00008.parquet | DOJ Volume 8 |
| VOL00009 | documents-full/VOL00009.parquet | DOJ Volume 9 |
| VOL00010 | documents-full/VOL00010.parquet | DOJ Volume 10 |
| DataSet11 | documents-full/DataSet11.parquet | DOJ Dataset 11 |
| other | documents-full/other.parquet | House Oversight, court records, etc. |

The Python client handles shard concatenation automatically via client.documents(include_text=True).
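
Without the Python client, the shards can be combined by hand. The following is a rough sketch of that idea, not the client's actual implementation: `shard_urls` and `load_all_shards` are hypothetical names, and the reader is passed in as a callable so any Parquet library can be used.

```python
from urllib.parse import urljoin

# Base URL as stated at the top of this page.
BASE_URL = "https://data.jmail.world/v1/"

# Shard names as listed in the table above.
SHARDS = ["VOL00008", "VOL00009", "VOL00010", "DataSet11", "other"]


def shard_urls():
    """Return the absolute URL of every full-text shard."""
    return [urljoin(BASE_URL, f"documents-full/{name}.parquet") for name in SHARDS]


def load_all_shards(read_shard):
    """Read each shard with `read_shard(url)` and concatenate the rows.

    `read_shard` is any callable that returns an iterable of rows for
    one shard URL.
    """
    rows = []
    for url in shard_urls():
        rows.extend(read_shard(url))
    return rows
```

With pandas, a comparable result could be obtained by reading each URL with `pandas.read_parquet` and combining the frames with `pandas.concat`.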

Photos

Photo metadata from government releases with AI-generated descriptions.

| Column | Type | Description |
| --- | --- | --- |
| id | int | Unique photo ID |
| source | string | Source identifier |
| release_batch | string | Volume/batch |
| original_filename | string | Original filename |
| content_type | string | MIME type |
| width | int | Image width in pixels |
| height | int | Image height in pixels |
| image_description | string | AI-generated description |

People

People identified via AWS Rekognition facial recognition.

| Column | Type | Description |
| --- | --- | --- |
| id | int | Unique person ID |
| name | string | Identified name |
| source | string | Detection source |
| photo_count | int | Number of photos containing this person |

Photo Faces

Bounding boxes linking detected faces in photos to identified people.

| Column | Type | Description |
| --- | --- | --- |
| id | int | Unique face ID |
| photo_id | int | FK to photos |
| person_id | int | FK to people |
| bbox_left | float | Bounding box left edge |
| bbox_top | float | Bounding box top edge |
| bbox_width | float | Bounding box width |
| bbox_height | float | Bounding box height |
| confidence | float | Detection confidence |
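
To draw or crop a face, the box has to be expressed in pixels. The sketch below assumes the bbox_* values are ratios of the image dimensions, which is how AWS Rekognition reports bounding boxes. This page does not confirm that the dataset keeps that convention, so check a few rows before relying on it. The width and height arguments come from the matching photos row, and `bbox_to_pixels` is a hypothetical helper.

```python
def bbox_to_pixels(bbox_left, bbox_top, bbox_width, bbox_height, width, height):
    """Convert a ratio-based box to integer pixel (left, top, right, bottom).

    Assumption: bbox_* values are fractions of the image width/height.
    Results are clamped to the image bounds, since detected boxes can
    extend slightly past the edges.
    """
    left = round(bbox_left * width)
    top = round(bbox_top * height)
    right = round((bbox_left + bbox_width) * width)
    bottom = round((bbox_top + bbox_height) * height)
    return (max(0, left), max(0, top), min(width, right), min(height, bottom))


# Hypothetical values for illustration: an 800x600 photo.
box = bbox_to_pixels(0.25, 0.5, 0.5, 0.25, 800, 600)
```

The (left, top, right, bottom) ordering matches what most imaging libraries expect for crop rectangles.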

iMessage Conversations

Metadata for iMessage conversations recovered from the archive.

| Column | Type | Description |
| --- | --- | --- |
| id | int | Unique conversation ID |
| slug | string | URL-safe conversation identifier |
| name | string | Contact name |
| bio | string | Contact bio/description |
| photo | string | Contact photo URL |
| last_message | string | Preview of the last message |
| last_message_time | string | Timestamp of last message |
| pinned | bool | Whether the conversation was pinned |
| confirmed | bool | Whether the contact identity is confirmed |
| source_files | json | Source files this conversation was extracted from |
| message_count | int | Total messages in this conversation |

iMessage Messages

Individual iMessage text messages with sender info and timestamps.

| Column | Type | Description |
| --- | --- | --- |
| id | string | Unique message ID ({slug}#{index}) |
| conversation_slug | string | FK to conversations (slug) |
| message_index | int | Message position within conversation |
| text | string | Message text content |
| sender | string | me (Epstein) or them (contact) |
| time | string | Original timestamp string |
| timestamp | timestamp | Parsed timestamp |
| source_file | string | Source file this message was extracted from |
| sender_name | string | Display name of sender |
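
The message ID format {slug}#{index} can be split back into its parts when only the id column is at hand. This is a small sketch under the assumption that the index is a non-negative decimal integer. Splitting on the last `#` keeps the parse unambiguous even if a slug were to contain one. Both function names and the sample slug are hypothetical.

```python
def make_message_id(slug: str, index: int) -> str:
    """Build a message ID in the {slug}#{index} format."""
    return f"{slug}#{index}"


def parse_message_id(message_id: str):
    """Split a message ID into (conversation_slug, message_index)."""
    # Split on the last '#', so everything before it is the slug.
    slug, sep, index = message_id.rpartition("#")
    if not sep or not index.isdecimal():
        raise ValueError(f"not a {{slug}}#{{index}} message ID: {message_id!r}")
    return slug, int(index)


# Hypothetical slug for illustration only.
slug, index = parse_message_id("jane-doe#12")
```

The two parts correspond to the conversation_slug and message_index columns, so parsing is only needed when those columns have been dropped.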

Star Counts

Crowd-sourced star/interest counts from jmail.world users.

| Column | Type | Description |
| --- | --- | --- |
| entity_type | string | Type (email_message, email_thread, photo, document) |
| entity_id | int | Entity ID |
| count | int | Number of stars |
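
Because one table holds counts for several entity types, a typical first step is to filter by entity_type before ranking. The sketch below does this on plain dictionaries. `top_starred` and the sample rows are hypothetical, and the page does not say which column of each dataset entity_id refers to.

```python
def top_starred(rows, entity_type, n=10):
    """Return the n most-starred rows of one entity type, highest first."""
    matching = [row for row in rows if row["entity_type"] == entity_type]
    return sorted(matching, key=lambda row: row["count"], reverse=True)[:n]


# Hypothetical rows for illustration only, not taken from the dataset.
sample = [
    {"entity_type": "photo", "entity_id": 1, "count": 3},
    {"entity_type": "document", "entity_id": 7, "count": 9},
    {"entity_type": "photo", "entity_id": 2, "count": 5},
]
best_photos = top_starred(sample, "photo", n=1)
```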

Release Batches

Metadata about each release batch.

| Column | Type | Description |
| --- | --- | --- |
| id | int | Batch ID |
| name | string | Batch name |
| description | string | Batch description |
| released_at | timestamp | Public release date |