datechnoman
  • Joined on 2023-03-10
datechnoman renamed repository from ArchiveOrg_CDX_Stats_Processor to ArchiveTeam/Migrated_ArchiveOrg_CDX_Stats_... 2024-08-10 09:48:13 +00:00
datechnoman renamed repository from ArchiveOrg_URL_Processor to ArchiveTeam/Ignored_ArchiveOrg_URL_Processor 2024-07-08 03:18:40 +00:00
datechnoman renamed repository from Bulk_Temporary_Docker_Project_Containers to ArchiveTeam/Ignored_Bulk_Temporary_Docker_... 2024-07-08 03:18:27 +00:00
datechnoman renamed repository from Keyword_URL_Extractor to ArchiveTeam/Migrated_Keyword_URL_Extractor 2024-07-08 03:17:59 +00:00
datechnoman renamed repository from Archived_CommonCrawl_WAT_Path_Comparer to ArchiveTeam/Migrated_CommonCrawl_WAT_Path_... 2024-07-05 01:18:20 +00:00
datechnoman renamed repository from CommonCrawl_WAT_Path_Comparer to ArchiveTeam/Migrated_CommonCrawl_WAT_Path_... 2024-07-05 01:18:01 +00:00
datechnoman renamed repository from Blogger_URL_Extractors to ArchiveTeam/Migrated_Blogger_URL_Extractors 2024-06-30 22:58:38 +00:00
datechnoman renamed repository from ArchiveTeam_Project_URL_Extractor to ArchiveTeam/Migrated_ArchiveTeam_Project_U... 2024-06-30 22:58:11 +00:00
datechnoman pushed to main at ArchiveTeam/Migrated_Keyword_URL_Extractor 2024-06-22 13:16:35 +00:00
74d32a7702 Stream to RAM and process in RAM
datechnoman pushed to main at ArchiveTeam/Migrated_ArchiveTeam_Project_U... 2024-06-22 12:47:53 +00:00
46d8e2e718 Stream to RAM + regex at same time
datechnoman pushed to main at ArchiveTeam/Migrated_ArchiveOrg_CDX_Stats_... 2024-06-06 11:00:12 +00:00
44127f7724 Add archivebot_automated_cdx_processor.py
datechnoman pushed to main at ArchiveTeam/Migrated_ArchiveTeam_Project_U... 2024-06-02 08:13:27 +00:00
b4f357090f Updated to include doi.org extraction
datechnoman pushed to main at ArchiveTeam/Migrated_ArchiveTeam_Project_U... 2024-04-02 11:33:04 +00:00
39afb55120 Updated to include pdf extraction
datechnoman pushed to main at ArchiveTeam/Migrated_ArchiveTeam_Project_U... 2024-04-02 11:31:23 +00:00
029d171d91 Update archiveteam_project_url_extractor.py
datechnoman pushed to main at ArchiveTeam/Migrated_ArchiveTeam_Project_U... 2024-04-02 11:30:30 +00:00
cc91b10e04 Updated to add pdf extraction
datechnoman pushed to main at ArchiveTeam/CommonCrawl_URL_Processor 2024-03-31 11:55:04 +00:00
319d65735c Update README.md
datechnoman pushed to main at ArchiveTeam/CommonCrawl_URL_Processor 2024-03-31 11:45:36 +00:00
7c157c5a48 Update warc_wat_url_processor.py
datechnoman pushed to main at ArchiveTeam/CommonCrawl_URL_Processor 2024-03-31 11:44:03 +00:00
f95513b9fd Update README.md
datechnoman pushed to main at ArchiveTeam/CommonCrawl_URL_Processor 2024-03-31 11:40:24 +00:00
511629b012 Update README.md
datechnoman pushed to main at ArchiveTeam/CommonCrawl_URL_Processor 2024-03-31 11:39:24 +00:00
c2d5ad43b8 Delete urlextractor_archiveteam.sh