/r/Archiveteam

Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever.



Feel free to join us on the IRC channel! We're on the hackint network in a channel called #archiveteam-bs, where we say truly awful things. Connect with your client of choice or use hackint's online chat.

13,874 Subscribers

19

YouTuber being forced to delete all his content by employer

I can't get yt-dlp to archive it; is anybody familiar enough with that tool to assist?

It's not a lot, but it is valuable to flight enthusiasts.

https://www.youtube.com/@jonpirotte
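
Not sure which part is failing for you, but here's a minimal sketch using yt-dlp's Python API to mirror the whole channel; the output template and the other option choices are illustrative defaults, not requirements:

import yt_dlp

opts = {
    'outtmpl': '%(upload_date)s - %(title)s [%(id)s].%(ext)s',
    'download_archive': 'downloaded.txt',  # lets an interrupted run resume
    'writeinfojson': True,                 # keep per-video metadata
    'writethumbnail': True,
    'ignoreerrors': True,                  # skip broken/members-only entries
}

with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(['https://www.youtube.com/@jonpirotte'])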

8 Comments
2024/05/03
21:31 UTC

1

Archiving forum pages that have posts from a specific user

Is there any good way to archive forum threads, or specific pages of threads, that contain posts by a specific user? Keep in mind I have no real programming experience, so making my own script is off the table. Also, I want to save these to my own storage, not upload them to the Internet Archive.

Will I have to do this the long way without that expertise?
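
Since scripting is off the table for you, maybe someone can run something like this on your behalf; a minimal sketch, assuming the thread/page URLs (e.g. from the forum's "posts by user" search) are pasted one per line into a urls.txt file, with arbitrary output filenames:

import os
import requests

os.makedirs('saved_pages', exist_ok=True)

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls):
    html = requests.get(url, timeout=30).text
    # Number the files so the page order is preserved
    name = f'{i:04d}.html'
    with open(os.path.join('saved_pages', name), 'w', encoding='utf-8') as out:
        out.write(html)
    print('saved', url)

Note this saves bare HTML only (no images or CSS); for self-contained copies with zero code, a browser extension like SingleFile does the same job page by page.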

0 Comments
2024/05/03
20:43 UTC

0

Will the reddit archives ever be unlocked on IA?

0 Comments
2024/05/03
13:47 UTC

1

Wayback Machine - calculate deleted pages

Hi, just discovered this. Is there a way to determine how many items (or products) have been deleted between snapshots?
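
Not directly, as far as I know, but the CDX API can list every URL the Wayback Machine captured for a site in a given window, so you can diff two windows yourself. A hedged sketch (the domain and dates are placeholders, and a URL missing from the later window only suggests deletion; the crawler may simply not have revisited it):

import requests

def captured_urls(site, start, end):
    # URLs captured for `site` between two YYYYMMDD dates
    # (widen the window if a narrow one returns nothing)
    resp = requests.get('https://web.archive.org/cdx/search/cdx', params={
        'url': site + '/*',
        'output': 'json',
        'fl': 'original',
        'from': start,
        'to': end,
        'collapse': 'urlkey',  # one row per unique URL
    })
    rows = resp.json() if resp.text.strip() else []
    return {row[0] for row in rows[1:]}  # row 0 is the header

old = captured_urls('example-shop.com', '20230101', '20230107')
new = captured_urls('example-shop.com', '20240101', '20240107')

deleted = old - new
print(len(deleted), 'URLs captured in the old window but absent from the new one')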

5 Comments
2024/05/03
11:06 UTC

0

Old YouTube account

Could someone help me get back old YouTube videos? I have the YouTube account, but I deleted all of my videos in 2013 or 2014. I made a bunch of videos with my friends in middle school and elementary school. So sad they're all gone. Is there any way to get them back at all? I've tried the Wayback Machine, but nothing came up. If anyone could help or set me in the right direction, that'd be amazing.

3 Comments
2024/05/02
00:08 UTC

0

Roblox warrior script not working(?)

I’m seeing no new items coming in on the leaderboard and my warrior just says the number of items is being limited. Is something wrong?

2 Comments
2024/05/01
07:00 UTC

8

Wrote a working Python script for decompressing the imgur archives on Windows

import os
import struct
import subprocess
import sys


def get_dict(fp):
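    # The archives start with a zstd skippable frame (magic 0x184D2A5D)
    # whose payload is the custom dictionary the WARC was compressed with.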
    magic = fp.read(4)
    assert magic == b'\x5D\x2A\x4D\x18', 'not a valid warc.zst with a custom dictionary'
    dictSize = fp.read(4)
    assert len(dictSize) == 4, 'missing dict size'
    dictSize = struct.unpack('<I', dictSize)[0]
    assert dictSize >= 4, 'dict too small'
    assert dictSize < 100 * 1024**2, 'dict too large'
    ds = []
    dlen = 0
    while dlen < dictSize:
        c = fp.read(dictSize - dlen)
        if c is None or c == b'': # EOF
            break
        ds.append(c)
        dlen += len(c)
    d = b''.join(ds)
    assert len(d) == dictSize, f'could not read dict fully: expected {dictSize}, got {len(d)}'
    assert d.startswith(b'\x28\xB5\x2F\xFD') or d.startswith(b'\x37\xA4\x30\xEC'), 'not a valid dict'
    if d.startswith(b'\x28\xB5\x2F\xFD'): # Compressed dict
        # Decompress with zstd -d
        p = subprocess.Popen(['zstd', '-d'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate(d)
        assert p.returncode == 0, f'zstd -d exited non-zero: return code {p.returncode}, stderr: {err!r}'
        d = out
    return d


# Take the input path from the command line, or fall back to a default name
input_file = sys.argv[1] if len(sys.argv) > 1 else 'imgur-2023-01.warc.zst'

if not os.path.exists(input_file):
    print(f'Input file "{input_file}" not found.', file=sys.stderr)
    sys.exit(1)

with open(input_file, 'rb') as fp:
    d = get_dict(fp)

# Write the dictionary to a file so zstd can load it via -D
with open('dict.txt', 'wb') as dict_file:
    dict_file.write(d)

# Decompress the archive using the extracted dictionary
# (zstd skips the skippable frame on its own)
output_file = 'output.warc'

subprocess.run(['zstd', '-d', input_file, '-D', 'dict.txt', '-o', output_file], check=True)

# Delete the dictionary file
os.remove('dict.txt')

I kept having to use a Linux VM to decompress the archives, which was disrupting my workflow, so I finally figured out a way to make this Linux script work on Windows. My implementation is a little different, but I find it to be a lot faster (might just be due to VM I/O issues though). This one-year-old question finally has a solution.

5 Comments
2024/04/30
21:20 UTC

4

Was there an issue with the original imgur warc that was later corrected?

I've been using the script I posted about here to extract the contents of the imgur warcs and noticed something: a random archive from late 2023 was fine, but in the first few warcs that were released (the 10 GB ones), a lot of images have tons of repeats in slightly different resolutions and aspect ratios. Is this an issue with my parsing code, or was a correction made to the WARC creation at some point that prevented all these duplicates from being stored?

2 Comments
2024/04/30
05:31 UTC

1

Wee 3 songs via Treehouse TV's Toons n' Tunes player

Please take a minute to watch these:

https://www.youtube.com/watch?v=WkifGLX8pi8

https://www.youtube.com/watch?v=NOhnaRp_NiA

https://www.youtube.com/watch?v=pL3iqpf_Gh0

They managed to recover the prologues of the Scooby-Doo game series Horror On The High Seas and Mayan Mayhem! I too thought those .swf cutscene/cinematic files, along with the game Carrot Season (not to be confused with Carrot Sweeper), were lost for good!

It's beyond my capabilities at this point; I really have tried everything I could think of on my end, but to no avail. So I need someone whose IT and recovery skills far exceed my own to help me resolve this, for both me and everyone else frustratedly searching for this specific piece of our lost childhood. I'm convinced it's still out there... somewhere, and I seem to be the only one who has this one critical piece of information retained firmly in my brain.

I need your help recovering every audio file relating to the kids' show Wee 3 from treehousetv.com and its Toons n' Tunes music and video player. Look for every Wee 3 song from the period November 13, 2006 - August 24, 2007.

The images I've enclosed below will help you to better understand what to click on, search through, and extract from.

Toons n' Tunes was a feature available on treehousetv.com from November 13, 2006 to August 24, 2007, around the time YouTube was just getting started. Throughout that period the songs on the player kept changing, but it's only about ten months of data to dig through.

A decade has passed and YouTube still has yet to publish all the Wee 3 episodes. There are YouTube channels like https://www.youtube.com/@licketysplit7505, https://www.youtube.com/@TheBigComfyCouch, https://www.youtube.com/@TreehouseDirect and https://www.youtube.com/@treehousetv, but the jigsaw is still only partly complete. I even tried emailing Treehouse TV several times, only to get no replies, so it's all down to you.

What makes this more complicated is that Treehouse TV's Toons n' Tunes is INDEED a .swf program. So I'm convinced the .swf audio files are still secure in the Wayback Machine.

The song audio files may not work via the Toons n' Tunes player anymore, but maybe - just MAYBE they'll be playable via a .swf decompiler!

Please reply back when you make a breakthrough. This is a RELIC and I'm convinced it's still out there!

Although some of these shows were weird, they're still childhood material for people born in the '90s!

EACH DAY WE DON'T LISTEN TO OR VIEW THEM AFTER A DECADE, WE MISS THEM DEARLY!!!

And cross my heart, I WILL credit whoever finds the Wee 3 songs with a "Special Thanks" when I render them into a YouTube video via Vegas Pro.

https://preview.redd.it/5qqmfjw9ljxc1.jpg?width=1024&format=pjpg&auto=webp&s=fd6b34adda4a03c8fbb36c1451027fd59469f35f

https://preview.redd.it/tgiealw9ljxc1.jpg?width=1065&format=pjpg&auto=webp&s=85748d0edf73ecb9bd37b01b00d865aa9aa96a0c

https://preview.redd.it/8p0rfo6cljxc1.jpg?width=1702&format=pjpg&auto=webp&s=aa04ac1aea441b7b0fb41ca4c1c4655035ccf100

https://preview.redd.it/jsu5ql6cljxc1.jpg?width=3840&format=pjpg&auto=webp&s=751f83e82ce83a48f0b15e30e41bbc8d3d41702c

0 Comments
2024/04/30
04:14 UTC

0

is the internet archive really in danger?

Just saw this video: https://www.youtube.com/watch?v=vdMT-x7CbdU

If the Internet Archive ever goes up for sale, hopefully it goes to the right owner. Google has been nerfing its own free products to push more ads and subscriptions. Microsoft would probably throw willy-nilly, tit-for-tat copyright claims over all of its old software and games as a plan to boost future sales. Amazon has a propaganda machine around it, but doesn't really claim loyalty to anything except sales and customers -- and they actually did put on the big-boy pants and go after publishers; since the publishers suing over the archive's books also sell through Amazon, some of those suits might get retracted if Amazon owned the Wayback Machine. I know people feel a certain way about how Amazon treats its own employees, but it literally bends over backwards for the customer and may be the best fit to acquire the Wayback Machine.

2 Comments
2024/04/29
16:34 UTC

3

Help trying to view web archive of Purevolume

So I am new to website archives and Python, and this has been hours of struggle. I'm going to try to explain the issue I'm having the best I can; please bear with me if I don't use the correct terms.

I grabbed the website archive here: https://archive.org/details/archiveteam_purevolume_20180814174904 and was able to install pywb after much banging my head against the wall with Python. I used glogg to get the URLs from the cdxj file, but when I set up the localhost in my browser I keep getting an error with any URL I try. Example:

http://localhost:8080/my-web-archive/http://www.purevolume.com/3penguinsuk
Pywb Error
http://www.purevolume.com/3penguinsuk
Error Details:

{'args': {'coll': 'my-web-archive', 'type': 'replay', 'metadata': {}}, 'error': '{"message": "archiveteam_purevolume_20180814174904/archiveteam_purevolume_20180814174904.megawarc.warc.gz: \'NoneType\' object is not subscriptable", "errors": {"WARCPathLoader": "archiveteam_purevolume_20180814174904/archiveteam_purevolume_20180814174904.megawarc.warc.gz: \'NoneType\' object is not subscriptable"}}'}

I'm an absolute noob who just wants to preserve and archive pop-punk bands from the 2000s-10s; I'd really appreciate any help. I'd love to be able to see these old bands' Purevolume profiles again.
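
That 'NoneType' error suggests the index points at a WARC that pywb can't open at the recorded path; the usual route is to import the file with "wb-manager add my-web-archive yourfile.warc.gz" rather than placing it by hand. A hedged diagnostic sketch, assuming the default layout that "wb-manager init my-web-archive" creates and an index file named index.cdxj:

import json
import os

coll = 'collections/my-web-archive'
archive_dir = os.path.join(coll, 'archive')
index_file = os.path.join(coll, 'indexes', 'index.cdxj')

missing = set()
with open(index_file) as f:
    for line in f:
        # cdxj lines look like: "<urlkey> <timestamp> <json>"
        parts = line.split(' ', 2)
        if len(parts) < 3:
            continue
        record = json.loads(parts[2])
        warc = record.get('filename', '')
        if warc and not os.path.exists(os.path.join(archive_dir, warc)):
            missing.add(warc)

print('WARCs referenced by the index but not on disk:', missing or 'none')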

1 Comment
2024/04/29
00:07 UTC

8

Archiving TikTok

So the bill to ban TikTok just got passed in the US, which I like; however, it does mean there's a high chance that all the content may never be saved. And ByteDance said they'd rather delete the app than sell it (https://www.theguardian.com/technology/2024/apr/25/bytedance-shut-down-tiktok-than-sell). Inactive accounts typically get deleted on TikTok, so are we going to archive all the American TikTok pages?

4 Comments
2024/04/26
15:37 UTC

10

Archiving the Rooster Teeth website

The Rooster Teeth website will shut down on May 15 of this year. Are there plans to archive it before that happens? Or has that already been done?

1 Comment
2024/04/25
23:17 UTC

2

Best way to store a website?

Hey, I need to make sure we don't lose a website. It's not especially urgent, just a hobby thing; we use that stuff a lot, that's all. I tried making a script using waybackpy, going over the webpages one by one after making a list, but after leaving it overnight it spits out an error no matter what I do. Today I stopped the script, waited for an hour, restarted it, and from the get-go I'm getting rate-limit errors.

On second look, waybackpy was last updated two years ago, so I'm going to guess it has gathered some technical debt and the Internet Archive's endpoints may have changed. Anyone got any advice, preferably something I can automate? I'm talking about around 20,000-30,000 pages here, and I expect roughly 2.5 GB (it's a retro-looking forum running software from the late '90s).

I could just download the whole forum to my computer and keep a local backup, but I'd rather avoid that if at all possible - it would be best if it were open for everyone on the internet to look at. Any advice?
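
If you end up rolling your own, a bare-bones throttled loop against the Save Page Now endpoint is one option. A sketch under stated assumptions: urls.txt holds one URL per line, and the 15-second delay is a guess at a polite anonymous rate, not a documented limit (a free IA account and the authenticated SPN2 API should allow higher throughput):

import time
import requests

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        r = requests.get('https://web.archive.org/save/' + url, timeout=120)
        print(r.status_code, url)
    except requests.RequestException as e:
        print('failed:', url, e)
    time.sleep(15)  # stay well under the anonymous rate limit

At 15 seconds per page, 20,000-30,000 pages works out to roughly three to five days of wall-clock time, so logging completed URLs and skipping them on restart is worth adding.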

6 Comments
2024/04/22
19:00 UTC

7

How can I find all videos by a specific YouTube channel archived on the Wayback Machine?

I want to make a video about a YouTuber's career, but they've deleted most of their old videos. Their channel page has been archived on the Wayback Machine, but the captures don't include all of the uploads, so I can't check every video or even see their titles.

I thought maybe a tool existed that could search through subpages of a website with HTML snippets as the search query, but I couldn't find anything.

I found I could use the CDX API to search through urls with filters, but since urls for YouTube videos don't include any information about the channel it's from, I got stuck here too.

Does anyone know a tool for this, or another solution?
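
I don't know of a ready-made tool either, but the usual workaround is scriptable: list the archived captures of the channel's /videos page through the CDX API, fetch each snapshot, and regex out the 11-character video IDs. A sketch with a placeholder channel path:

import re
import requests

# List archived captures of the channel's videos page (placeholder path)
resp = requests.get('https://web.archive.org/cdx/search/cdx', params={
    'url': 'youtube.com/@examplechannel/videos',
    'output': 'json',
    'fl': 'timestamp,original',
    'filter': 'statuscode:200',
    'collapse': 'digest',   # skip byte-identical captures
})
rows = resp.json()[1:] if resp.text.strip() else []

video_ids = set()
for timestamp, original in rows:
    snapshot = f'https://web.archive.org/web/{timestamp}/{original}'
    html = requests.get(snapshot).text
    # YouTube video IDs: 11 chars of letters, digits, '-' and '_'
    video_ids.update(re.findall(r'watch\?v=([\w-]{11})', html))

print(len(video_ids), 'unique video IDs found')

Each recovered watch URL can then be looked up in the Wayback Machine individually; even when the video itself wasn't captured, the archived watch page usually preserves the title.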

2 Comments
2024/04/22
12:04 UTC

9

How best to help archive sources linked from a website?

floodlit.org is a website about abuse cases. I'm not running that site, but I have been manually archiving the sources they link to. However, there are a lot of them, and the list will continue to grow.

I'm curious if there is a better way to do this. I'm trying to make sure both archive.org and archive.today have copies before the links succumb to rot. Sadly, some pages have already disappeared, and at the speed I can do this, many more will be gone before I get to them.

14 Comments
2024/04/19
19:17 UTC

2

Downloading Twitter Videos with the tweet embedded

I've been archiving tweets before the platform eventually implodes, but I've realized part of the fun is the funny caption/commentary preceding the video. Obviously I could just screen-record, grab the audio, and put it all together, or manually insert the text in editing, but that's a lot of work, and I was curious if there are any tools out there!
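
yt-dlp handles tweet URLs, and while I don't know of a tool that burns the caption into the frame automatically, writing the metadata alongside each download should at least preserve the text for your edit. A sketch with a hypothetical tweet URL:

import yt_dlp

opts = {
    'outtmpl': '%(uploader_id)s - %(id)s.%(ext)s',
    # The tweet text should land in the .info.json metadata file
    'writeinfojson': True,
}

with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(['https://twitter.com/example_user/status/1234567890'])  # hypothetical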

1 Comment
2024/04/16
15:17 UTC

14

Is it just me or is neither IA nor archive.is properly saving Twitter pages currently?

Over the past week I tried checking some Twitter posts that have already been archived on archive.org (example), but they appear completely white, and the page source suggests the content isn't in it at all: JS is meant to load it, but doesn't in the archived version.

Meanwhile on archive.is, when trying to archive some pages, it remains in a continual loop (I've waited about 30 minutes on one and seen multiple loops occur), which is unusual.

Have others encountered this? As it's not a great outlook for pages being crawled/archived during this time. (Only tangentially related to AT, I know, but still concerning.)

1 Comment
2024/04/14
07:53 UTC

6

Has anyone attempted to archive Vocaroos?

0 Comments
2024/04/13
06:28 UTC

1

Trying to find old show

I'm trying to locate an old show called Sex Rehab with Dr. Drew. It aired in 2006, and I can only find one episode. Has anyone come across it?

0 Comments
2024/04/12
23:20 UTC

1

Looking for an archive of tracks by the music artist Jasson Fransis.

Here are the two links I managed to find for his songs:

Younger years : https://www.youtube.com/watch?v=iYcGwVE6VTI

You beautiful https://www.youtube.com/watch?v=ZdX5HMxnWe8&pp=QAFIAQ%3D%3D

My trouble now is that the artist's account was hacked, and all his tracks are gone from every platform: Amazon Music, Apple Music, YouTube, Spotify, everything you could think of. There is an archive on IA, but that's just a screen capture, not the video. There might be a slight chance that someone could find a Vietnamese or Thai website and use a VPN to gain access to the videos; I also saw an AI website that actually has 10 seconds of footage, but that's from a remix. I'd greatly appreciate it if anyone has clues on how to find them.

0 Comments
2024/04/12
14:46 UTC

2

Really need to find something! Please help!

Hello. I am an undergraduate student researching a piece of colonial legislation (passed in 1868, for India), but I am not able to find it online.

Any clues? Also, if I go to the Delhi State Archives, what do I ask for? (The Act itself? The debates?)

I'm very new to all of this and need someone to help me out.

0 Comments
2024/04/09
05:04 UTC

6

Most 000webhost sites will be/are closing.

000webhost was bought by Hostinger a while back, and they have begun shutting down sites unless their owners pay for the "premium plan", which is effectively just moving to Hostinger. They seem to be shutting down newer sites first, so we still have a bit of time to grab what we can.

2 Comments
2024/04/08
10:31 UTC

9

Pakapaka, an Argentine television channel and website, will close on April 7th.

https://en.wikipedia.org/wiki/Pakapaka

https://www.elciudadanoweb.com/cierra-paka-paka-desde-la-libertad-avanza-celebran-su-final

Pakapaka is owned by the Argentine government. Its YouTube channel contains many episodes of the cartoons it has broadcast over the years.

https://www.youtube.com/@CanalPakapaka/videos

0 Comments
2024/04/06
17:20 UTC
