Available Datasets
All files are served from https://data.jmail.world/v1/.
| Dataset | Parquet URL | NDJSON URL |
|---|---|---|
| Emails (full) | emails.parquet | emails.ndjson.gz |
| Emails (slim) | emails-slim.parquet | emails-slim.ndjson.gz |
| Documents | documents.parquet | documents.ndjson.gz |
| Photos | photos.parquet | photos.ndjson.gz |
| People | people.parquet | people.ndjson.gz |
| Photo Faces | photo_faces.parquet | photo_faces.ndjson.gz |
| iMessage Conversations | imessage_conversations.parquet | imessage_conversations.ndjson.gz |
| iMessage Messages | imessage_messages.parquet | imessage_messages.ndjson.gz |
| Star Counts | star_counts.parquet | star_counts.ndjson.gz |
| Release Batches | release_batches.parquet | release_batches.ndjson.gz |
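As a minimal sketch of how the table maps to download URLs, the helper below joins the base URL with a dataset file name. The function names `dataset_url` and `parse_ndjson_gz` are illustrative, not part of any published client, and the decoder assumes the `.ndjson.gz` files are ordinary gzip-compressed, newline-delimited JSON.

```python
import gzip
import json

BASE_URL = "https://data.jmail.world/v1/"


def dataset_url(name: str, fmt: str = "parquet") -> str:
    """Build the download URL for a dataset listed in the table above.

    `name` is the file stem (for example "emails-slim"); `fmt` is
    "parquet" or "ndjson". Both parameters are hypothetical conveniences.
    """
    suffix = {"parquet": ".parquet", "ndjson": ".ndjson.gz"}[fmt]
    return f"{BASE_URL}{name}{suffix}"


def parse_ndjson_gz(payload: bytes) -> list[dict]:
    """Decode gzip-compressed NDJSON bytes into a list of records.

    Assumption: one JSON object per line, UTF-8 encoded.
    """
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

The bytes passed to `parse_ndjson_gz` can come from any HTTP client; the download step is left out here so the sketch stays free of network assumptions.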
Emails
The primary dataset. Contains all released emails from the Epstein archive.
emails.parquet — Full dataset with body text (content_markdown), sender, recipients, subject, dates, and metadata.
emails-slim.parquet — Same emails but without body text columns. Much smaller download, ideal for network analysis, sender/recipient graphs, and timeline visualizations.
Key Columns (slim)
| Column | Type | Description |
|---|---|---|
| id | int | Unique email ID |
| doc_id | string | Thread grouping ID |
| sender | string | Sender email/name |
| subject | string | Email subject line |
| to_recipients | json | To recipients |
| cc_recipients | json | CC recipients |
| bcc_recipients | json | BCC recipients |
| sent_at | timestamp | Send date |
| account_email | string | Source account |
| email_drop_id | string | Source identifier |
| epstein_is_sender | bool | Whether Epstein sent this email |
Additional Columns (full)
| Column | Type | Description |
|---|---|---|
| content_markdown | string | Email body as Markdown |
| content_html | string | Email body as HTML |
| attachments | int | Attachment count |
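For the sender/recipient graph use case, the slim columns are enough. The sketch below counts directed (sender, recipient) pairs from rows that have already been loaded as dictionaries. It assumes each `json` recipient column holds a JSON array of strings, either still serialized or already decoded; the actual encoding is not specified here, so check a few rows before relying on it. The sample rows are invented for illustration.

```python
import json
from collections import Counter

RECIPIENT_COLUMNS = ("to_recipients", "cc_recipients", "bcc_recipients")


def recipient_edges(rows):
    """Count (sender, recipient) pairs across the To, CC, and BCC columns."""
    edges = Counter()
    for row in rows:
        for column in RECIPIENT_COLUMNS:
            raw = row.get(column)
            if not raw:
                continue  # missing, null, or empty value
            # Assumption: a JSON array of strings, possibly still a JSON string.
            recipients = json.loads(raw) if isinstance(raw, str) else raw
            for recipient in recipients:
                edges[(row["sender"], recipient)] += 1
    return edges


# Hypothetical rows, not taken from the archive.
sample_rows = [
    {"sender": "a@example.com", "to_recipients": '["b@example.com"]',
     "cc_recipients": None, "bcc_recipients": None},
    {"sender": "a@example.com", "to_recipients": ["b@example.com", "c@example.com"],
     "cc_recipients": "[]", "bcc_recipients": None},
]
edges = recipient_edges(sample_rows)
```

The resulting `Counter` can be handed to a graph library as a weighted edge list.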
Documents
Metadata for all documents in the archive (DOJ releases, House Oversight, court records).
| Column | Type | Description |
|---|---|---|
| id | int | Unique document ID |
| source | string | Source (doj, house_oversight) |
| release_batch | string | Volume/batch identifier |
| original_filename | string | Original filename |
| page_count | int | Number of pages |
| size | int | File size in bytes |
| document_description | string | AI-generated description |
| has_thumbnail | bool | Whether a thumbnail exists |
Document Full-Text Shards
Full extracted text is too large for a single file. Use the sharded files:
| Shard | URL | Contents |
|---|---|---|
| VOL00008 | documents-full/VOL00008.parquet | DOJ Volume 8 |
| VOL00009 | documents-full/VOL00009.parquet | DOJ Volume 9 |
| VOL00010 | documents-full/VOL00010.parquet | DOJ Volume 10 |
| DataSet11 | documents-full/DataSet11.parquet | DOJ Dataset 11 |
| other | documents-full/other.parquet | House Oversight, court records, etc. |
client.documents(include_text=True).
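When fetching shards directly, a small helper can pick the shard file for a document. This is a sketch under one loud assumption: that a document's `release_batch` value matches the shard name for the four named shards, and that everything else lives in `other.parquet`. The table does not confirm that mapping, so verify it against the metadata before use. `shard_url` is a hypothetical name.

```python
BASE_URL = "https://data.jmail.world/v1/"

# Shard names as listed in the table above.
NAMED_SHARDS = ("VOL00008", "VOL00009", "VOL00010", "DataSet11")


def shard_url(release_batch: str) -> str:
    """Return the full-text shard URL for a release batch.

    Assumption: release_batch equals the shard name for named shards;
    any other value falls back to the "other" shard.
    """
    shard = release_batch if release_batch in NAMED_SHARDS else "other"
    return f"{BASE_URL}documents-full/{shard}.parquet"
```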
Photos
Photo metadata from government releases with AI-generated descriptions.
| Column | Type | Description |
|---|---|---|
| id | int | Unique photo ID |
| source | string | Source identifier |
| release_batch | string | Volume/batch |
| original_filename | string | Original filename |
| content_type | string | MIME type |
| width | int | Image width in pixels |
| height | int | Image height in pixels |
| image_description | string | AI-generated description |
People
People identified via AWS Rekognition facial recognition.
| Column | Type | Description |
|---|---|---|
| id | int | Unique person ID |
| name | string | Identified name |
| source | string | Detection source |
| photo_count | int | Number of photos containing this person |
Photo Faces
Bounding boxes linking detected faces in photos to identified people.
| Column | Type | Description |
|---|---|---|
| id | int | Unique face ID |
| photo_id | int | FK to photos |
| person_id | int | FK to people |
| bbox_left | float | Bounding box left edge |
| bbox_top | float | Bounding box top edge |
| bbox_width | float | Bounding box width |
| bbox_height | float | Bounding box height |
| confidence | float | Detection confidence |
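To crop or draw a face, the bounding box has to be combined with the matching photo's `width` and `height`. The sketch below assumes the `bbox_*` values are fractions of the image dimensions, which is the convention AWS Rekognition uses for its bounding boxes; the table does not state the units, so confirm that values fall between 0 and 1 before applying it. `bbox_to_pixels` is an illustrative name.

```python
def bbox_to_pixels(face: dict, photo_width: int, photo_height: int) -> tuple:
    """Convert a fractional bounding box to (left, top, right, bottom) pixels.

    Assumption: bbox_left/bbox_width are fractions of the photo width and
    bbox_top/bbox_height are fractions of the photo height.
    """
    left = round(face["bbox_left"] * photo_width)
    top = round(face["bbox_top"] * photo_height)
    right = round((face["bbox_left"] + face["bbox_width"]) * photo_width)
    bottom = round((face["bbox_top"] + face["bbox_height"]) * photo_height)
    return left, top, right, bottom


# Hypothetical face row joined to a photo of 800 x 600 pixels.
face = {"bbox_left": 0.25, "bbox_top": 0.5, "bbox_width": 0.5, "bbox_height": 0.25}
box = bbox_to_pixels(face, photo_width=800, photo_height=600)
```

The (left, top, right, bottom) order matches what most imaging libraries expect for a crop rectangle.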
iMessage Conversations
Metadata for iMessage conversations recovered from the archive.
| Column | Type | Description |
|---|---|---|
| id | int | Unique conversation ID |
| slug | string | URL-safe conversation identifier |
| name | string | Contact name |
| bio | string | Contact bio/description |
| photo | string | Contact photo URL |
| last_message | string | Preview of the last message |
| last_message_time | string | Timestamp of last message |
| pinned | bool | Whether the conversation was pinned |
| confirmed | bool | Whether the contact identity is confirmed |
| source_files | json | Source files this conversation was extracted from |
| message_count | int | Total messages in this conversation |
iMessage Messages
Individual iMessage text messages with sender info and timestamps.
| Column | Type | Description |
|---|---|---|
| id | string | Unique message ID ({slug}#{index}) |
| conversation_slug | string | FK to conversations (slug) |
| message_index | int | Message position within conversation |
| text | string | Message text content |
| sender | string | me (Epstein) or them (contact) |
| time | string | Original timestamp string |
| timestamp | timestamp | Parsed timestamp |
| source_file | string | Source file this message was extracted from |
| sender_name | string | Display name of sender |
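The `{slug}#{index}` ID format can be split back into its parts, which is handy when only message IDs are at hand. This sketch assumes the index portion is a non-negative decimal integer corresponding to `message_index`, and splits on the last `#` in case a slug ever contains one. `split_message_id` is a hypothetical helper.

```python
def split_message_id(message_id: str) -> tuple:
    """Split a "{slug}#{index}" message ID into (slug, index).

    Assumption: the index is a non-negative decimal integer.
    """
    slug, _, index = message_id.rpartition("#")
    if not slug or not index.isdigit():
        raise ValueError(f"unexpected message ID format: {message_id!r}")
    return slug, int(index)
```

For example, a hypothetical ID `jane-doe#12` would split into the slug `jane-doe` and the index `12`.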
Star Counts
Crowd-sourced star/interest counts from jmail.world users.
| Column | Type | Description |
|---|---|---|
| entity_type | string | Type (email_message, email_thread, photo, document) |
| entity_id | int | Entity ID |
| count | int | Number of stars |
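Because `entity_id` refers to a different dataset depending on `entity_type`, it is safest to filter by type before ranking or joining. The sketch below returns the most-starred entities of one type from rows loaded as dictionaries; the function name and sample rows are invented for illustration.

```python
def top_starred(rows, entity_type: str, limit: int = 3):
    """Return (entity_id, count) pairs for one entity type, most starred first."""
    matching = [row for row in rows if row["entity_type"] == entity_type]
    ranked = sorted(matching, key=lambda row: row["count"], reverse=True)
    return [(row["entity_id"], row["count"]) for row in ranked[:limit]]


# Hypothetical rows: the same entity_id appears under two different types.
star_rows = [
    {"entity_type": "photo", "entity_id": 7, "count": 3},
    {"entity_type": "document", "entity_id": 7, "count": 9},
    {"entity_type": "photo", "entity_id": 2, "count": 5},
]
top_photos = top_starred(star_rows, "photo")
```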
Release Batches
Metadata about each release batch.
| Column | Type | Description |
|---|---|---|
| id | int | Batch ID |
| name | string | Batch name |
| description | string | Batch description |
| released_at | timestamp | Public release date |
