Commit Graph

64 Commits

SHA1 Message Date
3151fce353 Update warc_wat_url_processor.py 2024-02-08 00:39:09 +00:00
486a68a796 Removed multithread compression and added force overwrite for compression files 2024-01-28 22:50:09 +00:00
ebc07a6974 Update zstd to overwrite conflicts 2024-01-28 11:39:00 +00:00
29d24e9826 Reverting 2024-01-28 11:36:17 +00:00
6d591ef0d0 Added in error logging 2024-01-28 11:29:53 +00:00
54747b64f6 Update warc_wat_url_processor.py 2024-01-28 11:23:27 +00:00
b6a9c68140 Update warc_wat_url_processor.py 2024-01-28 09:14:12 +00:00
d0fa7c84f4 Update warc_wat_url_processor.py 2024-01-28 09:10:17 +00:00
875082e8d9 Update prerequisites.sh 2024-01-27 06:31:36 +00:00
ea8aa8f755 Update commoncrawl_local_to_share_move.ps1 2024-01-27 06:28:44 +00:00
54d85523b4 Update commoncrawl_transfer.ps1 2024-01-27 06:28:16 +00:00
0014256679 Update commoncrawl_transfer.ps1 2024-01-27 06:03:49 +00:00
4f46c841b3 Updated to include disk space checking 2024-01-27 03:02:54 +00:00
488571b4c0 Added in Lock file support 2024-01-27 02:42:11 +00:00
674a7ae450 Update to zst extension 2024-01-27 02:26:27 +00:00
7eafadc2fe Update to zst file extension 2024-01-27 02:24:36 +00:00
8244b9241a Update concurrency 2024-01-26 08:01:39 +00:00
f76dbb13fd Update README.md 2024-01-26 07:17:23 +00:00
5661ce44b2 Updated loop issue 2024-01-26 07:14:25 +00:00
2bee6db85a Updated error processing for zstd compression errors 2024-01-25 01:33:11 +00:00
097ec759ab Update prerequisites.sh 2024-01-24 06:00:42 +00:00
295c3daba4 Update warc_wat_url_processor.py 2024-01-23 10:45:32 +00:00
6edffba451 Update warc_wat_url_processor.py 2024-01-23 04:43:15 +00:00
513b32e80a Update warc_wat_url_processor.py 2024-01-20 12:33:28 +00:00
06e3399861 Update warc_wat_url_processor.py 2024-01-20 12:28:23 +00:00
3ce09f46d7 Update prerequisites.sh 2024-01-20 11:30:43 +00:00
e98f80aec4 Updated to use command line zstd 2024-01-20 11:29:57 +00:00
78f6b69cdf Updated with higher compression 2024-01-20 11:25:59 +00:00
d1cfd0178f Update prerequisites.sh 2024-01-20 11:14:42 +00:00
17b4ce6077 Update warc_wat_url_processor.py 2024-01-20 03:26:03 +00:00
99c2f07498 Add warc_wat_url_processor.py 2024-01-20 03:25:50 +00:00
b881641c69 Commented out URL Extractions (to be done post downloading of files) 2024-01-12 04:02:06 +00:00
cba96e96e7 Rollback of change 2023-12-22 01:06:30 +00:00
bfc13cb6ef Updated script to keep regenerating a list of files to download 2023-12-21 09:35:39 +00:00
50e89b9de2 Add in checking for new files once list is depleted 2023-12-21 01:58:55 +00:00
0aad853966 Update README.md 2023-12-20 04:09:47 +00:00
5f152307f2 Add commoncrawl_local_to_share_move.ps1 2023-12-20 04:09:13 +00:00
fd9376cbe0 Updated to extract Pastebin URL's 2023-12-19 00:23:55 +00:00
1036de64a7 Documentation Update 2023-12-18 04:34:20 +00:00
727d2c3187 Update commoncrawl_transfer.ps1 2023-12-18 04:27:30 +00:00
171d3e2d2d Upload files to "/" 2023-12-18 04:27:03 +00:00
65757f8cc4 Update README.md 2023-12-12 11:04:52 +00:00
1e25ef86fe Upload files to "/" 2023-12-12 10:23:58 +00:00
b24287ef6f Upload files to "/" 2023-12-12 10:23:37 +00:00
b7ce7aa4b0 Upload files to "/" 2023-12-12 10:22:31 +00:00
7f9480bc40 Update urlextractor_archiveteam.sh 2023-12-12 10:22:22 +00:00
ac0f299269 Update commoncrawl_url_processor.py 2023-12-12 10:05:46 +00:00
ba11c6af9f Update commoncrawl_url_processor.py 2023-12-12 10:05:23 +00:00
d97491b4f0 Update urlextractor_archiveteam.sh 2023-12-10 05:10:10 +00:00
6a90fc7f5e Update multithread_script_5.0.py 2023-12-06 10:59:41 +00:00