
OK everyone, I have finished indexing the first 13,000 pages of DOJ Data Set 9.
KEY FINDING: 3 files are listed but INACCESSIBLE
These appear in DOJ pagination but return error pages - potential evidence of removal:
EFTA00326497
EFTA00326501
EFTA00534391
You can try them yourself (they all fail):
https://www.justice.gov/epstein/files/DataSet 9/EFTA00326497.pdf
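If you would rather script the check than click the links, here is a minimal Python sketch. It only assumes the URL pattern shown above; `file_url` and `probe` are names I made up for illustration, and I have not confirmed that the server answers HEAD requests, so treat `probe` as a starting point rather than a verified tool.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

# Base path copied from the example URL above (note the space in "DataSet 9").
BASE = "https://www.justice.gov/epstein/files/DataSet 9/"

def file_url(efta_id: str) -> str:
    """Build a request-safe URL: the space becomes %20, ':' and '/' are kept."""
    return quote(BASE + efta_id + ".pdf", safe=":/")

def probe(url: str, timeout: int = 30):
    """Return (HTTP status, Content-Type) for a URL.

    Assumption: the server accepts HEAD requests. A 200 status with an HTML
    content type could still be an error page, so check both values.
    """
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Content-Type", "")

missing = ["EFTA00326497", "EFTA00326501", "EFTA00534391"]
urls = [file_url(efta_id) for efta_id in missing]

# To actually hit the site, uncomment:
# for url in urls:
#     print(url, probe(url))
```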
The 86 GB torrent contains roughly 7x as many files as the DOJ website
DOJ website exposes: 77,766 files
Torrent contains: 531,256 files
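For reference, the ratio and the share come straight from the two counts above. A quick Python check (the variable names are mine):

```python
doj_files = 77_766       # files exposed through the DOJ website pagination
torrent_files = 531_256  # files in the torrent

ratio = torrent_files / doj_files  # how many times larger the torrent is
share = doj_files / torrent_files  # fraction of torrent files the site serves

print(f"{ratio:.2f}x, {share:.1%}")  # prints "6.83x, 14.6%"
```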
Page Range     Min EFTA       Max EFTA       New Files
0-499          EFTA00039025   EFTA00267311      21,842
500-999        EFTA00267314   EFTA00337032      18,983
1000-1499      EFTA00067524   EFTA00380774      14,396
1500-1999      EFTA00092963   EFTA00413050       2,709
2000-2499      EFTA00083599   EFTA00426736       4,432
2500-2999      EFTA00218527   EFTA00423620       4,515
3000-3499      EFTA00203975   EFTA00539216       2,692
3500-3999      EFTA00137295   EFTA00313715         329
4000-4499      EFTA00078217   EFTA00338754         706
4500-4999      EFTA00338134   EFTA00384534       2,825
5000-5499      EFTA00377742   EFTA00415182       1,353
5500-5999      EFTA00416356   EFTA00432673       1,214
6000-6499      EFTA00213187   EFTA00270156         501
6500-6999      EFTA00068280   EFTA00281003         554
7000-7499      EFTA00154989   EFTA00425720         106
7500-7999      (no new files - all wraps/redundant)
8000-8499      (no new files - all wraps/redundant)
8500-8999      EFTA00168409   EFTA00169291          10
9000-9499      EFTA00154873   EFTA00154974          35
9500-9999      EFTA00139661   EFTA00377759         324
10000-10499    EFTA00140897   EFTA01262781         240
10500-12999    (no new files - all wraps/redundant)
TOTAL UNIQUE FILES: 77,766
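As a sanity check, the per-range "New Files" counts add up to the total. A small Python sketch with the numbers copied from the table:

```python
# "New Files" per page range, in table order (ranges with no new files omitted).
new_files = [
    21_842, 18_983, 14_396, 2_709, 4_432, 4_515, 2_692, 329, 706, 2_825,
    1_353, 1_214, 501, 554, 106, 10, 35, 324, 240,
]

total = sum(new_files)
print(total)  # prints 77766
```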
Pagination limit discovered: page 184,467,440,737,095,516 (2^64/100, rounded down)
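The page number matches 2^64 divided by 100 with the remainder dropped. Why the site would cap there is my guess (an unsigned 64-bit value somewhere in the pagination math), not something I have confirmed. The arithmetic itself is easy to check in Python:

```python
# Integer division of 2^64 by 100; Python ints are arbitrary precision.
limit = 2**64 // 100

print(f"{limit:,}")  # prints 184,467,440,737,095,516
```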
I sampled random pages between 13,000 and this limit and found NO new documents. The pagination wraps around in an infinite loop. All work at: https://github.com/degenai/Dataset9
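To show how "new files per page range" can be counted when pages wrap and repeat, here is a minimal Python sketch. It is an illustration only, not necessarily the code in the repo: `new_files_per_bucket` and the toy IDs are hypothetical, and it assumes each page has already been scraped into a list of EFTA IDs.

```python
def new_files_per_bucket(pages, bucket_size=500):
    """Count first-time IDs per page range.

    pages: iterable of (page_number, list_of_efta_ids), in crawl order.
    Returns ({bucket_start: {"new", "min", "max"}}, total_unique_ids).
    """
    seen = set()
    buckets = {}
    for page_no, ids in pages:
        start = (page_no // bucket_size) * bucket_size
        stats = buckets.setdefault(start, {"new": 0, "min": None, "max": None})
        for efta in ids:
            if efta in seen:
                continue  # wrapped or repeated listing, already counted
            seen.add(efta)
            stats["new"] += 1
            # IDs are fixed-width and zero-padded, so string comparison
            # orders them the same way as their numbers.
            if stats["min"] is None or efta < stats["min"]:
                stats["min"] = efta
            if stats["max"] is None or efta > stats["max"]:
                stats["max"] = efta
    return buckets, len(seen)

# Toy data (made up): page 1 repeats an ID from page 0, page 500 is all repeats.
toy_pages = [
    (0, ["EFTA00000002", "EFTA00000001"]),
    (1, ["EFTA00000001", "EFTA00000003"]),
    (500, ["EFTA00000002"]),
]
buckets, total_unique = new_files_per_bucket(toy_pages)
```

A range where every page repeats earlier IDs ends up with `new == 0`, which is what the "no new files - all wraps/redundant" rows in the table mean.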
My question is: why is the total download size so large and the range of displayed documents so small? Only about 15% of the known documents are individually served on the site, and some aren't seen until page 10,000.