/r/DataHoarder
This is a sub that aims to bring data hoarders together to share their passion with like-minded people.
Who are we?
We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time™). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.
We are one. We are legion. And we're trying really hard not to forget.
-- /u/5-4-3-2-1-bang from this thread
Links!!
Free Post Friday
On Fridays we'll allow posts that don't normally fit in the usual data-hoarding theme, including posts that would usually be removed by rule 4: “No memes or 'look at this [thing]'”
Just make sure to tag the post with the flair [Free-Post Friday!] and give a little background info/context.
Related Subreddits
Data Hoarding/Curation:
Servers and Homelabs:
Tech Support:
Sales & Marketplace:
I have zero knowledge about this brand/website FYI
I’m looking to upgrade my current storage setup, which right now is just a single external drive. I’d like to expand to around 4 drives. Over the past few days, I’ve been researching solutions, but I’m still unsure about the best option.
Part of me wants to build my own NAS or repurpose an old office PC, but since I’m moving in about a year, I’m not in a position to commit to that right now. I’m considering getting a 4-bay DAS for the time being. If I go that route, would I be able to later connect it to a NUC, old PC, or a future NAS build without any major limitations? I’ve read a lot about RAID arrays over USB not being ideal. If I choose this setup, are there other connection options that would make it faster or more reliable?
Workers at NASA Told to ‘Drop Everything’ to Scrub Mentions of Indigenous People, Women from Its Websites - 404media.co
Does anyone know of a backup of the NASA site? I saw that the NTRS (NASA Technical Reports Server) is backed up.
I'm talking about technical documentation or videos, precise enough to replicate the steps and finished product, for things like:
Sort of like the Doomsday Vault in Svalbard, but with the knowledge distributed across many communities, because Svalbard is likely to be the last place that people will be able to get to in a collapse of civilization.
I would like to export my Twitter/X followers to a CSV list. Is there a free way to do so? I am not subscribed to Twitter Blue.
Hey all, hope this kind of question is allowed (I think it follows the sub rules but I'm new here). I use a lot of NCES data (nces.ed.gov), and given the administration's removal of Census data and threats to the Department of Education, I'm wondering if anyone is backing up NCES data. There's a lot that they produce about the number of students in K-12, higher education, and beyond; these data are used in so, so many reports about the state of education in the US. I'm happy to contribute to ongoing efforts but didn't see anything else in this sub, and I wanted to ask before spending a lot of time duplicating efforts.
I went looking for some more ST16000NM001G Seagate 16TB drives and was pretty shocked at the price for the refurbs from goharddrive/serverpartdeals currently sitting at $209-220. I have 6 others of these and they were all purchased for $135-140 a few months ago. This has to be a fairly recent increase. What gives?
Thanks to those archiving data and protecting information! I've begun backing up content that's important to me and other critical sources.
A few thoughts:
I have a few TB of space to spend and NIST has standards for dozens of different disciplines that may be worth saving. Standard reference materials? NVD? Research publications?
I've been casually trying to get into data archiving, saving information from things like the Emursive/Punchdrunk show "Sleep No More" that recently closed. However, with the recent CDC website scrubbing of data on anything queer/LGBT, I wanted to start helping with the effort of preserving that which is being erased.
I've just been going through the "banned" terms on the CDC website, downloading any PDFs and saving any of the pages I can as PDFs, as well as attempting to save links to the Wayback Machine and using it for any CDC pages that are already down/scrubbed.
Does anybody have any tips for methods/tools to make this more efficient than just panic downloading whatever I can? Any tips on places to post these for others who may want to access this information?
Thank y'all in advance!
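A rough sketch of how the manual Wayback Machine step described above could be scripted. It assumes the public Save Page Now endpoint (https://web.archive.org/save/ followed by the page URL) and the Wayback availability API (https://archive.org/wayback/available). The function names and the 10-second delay are hypothetical choices for illustration, not official limits, and nothing here is the method any archiving group actually uses.

```python
import json
import time
import urllib.parse
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the query URL that asks whether a snapshot of page_url exists."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def save_url(page_url):
    """Build the Save Page Now URL for page_url."""
    return SAVE_ENDPOINT + page_url


def dedupe(urls):
    """Drop blank entries and duplicates while keeping the original order."""
    seen = set()
    result = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


def has_snapshot(page_url, timeout=30.0):
    """Return True if the availability API reports an archived snapshot."""
    with urllib.request.urlopen(availability_url(page_url), timeout=timeout) as resp:
        data = json.load(resp)
    return bool(data.get("archived_snapshots", {}).get("closest"))


def archive_missing(urls, delay_seconds=10.0):
    """Request a capture for every URL that has no snapshot yet."""
    for url in dedupe(urls):
        if has_snapshot(url):
            continue
        urllib.request.urlopen(save_url(url), timeout=120).close()
        time.sleep(delay_seconds)  # assumed polite delay, not an official limit
```

Calling `archive_missing(list_of_page_urls)` would only request captures for pages that have no snapshot at all, so pages that were archived before being changed are left alone.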
I don't live in the US, but will be traveling there soon, so I intend to buy 4x HDDs (maybe on serverpartdeals, but their price is now absurd, maybe Amazon to shuck - anyway it is cheaper than back home).
I intend to take an HDD USB case on my trip, so I would use this to test them. Is there any way to test the drives having only a notebook and this case? I'd prefer a thorough test, so I can RMA them in the few days I'll be there. Notebook is dual boot macOS and Windows 10.
Please advise me on tools and how to test the disks in a very complete way.
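One possible file-level check, sketched under assumptions: write random data to a file on the mounted drive, read it back, and compare SHA-256 digests. The function name and sizes are hypothetical. This is not a substitute for a SMART extended self-test or a full surface scan, and the read-back may be served from the operating system's cache unless the file is much larger than RAM, so treat a pass as weak evidence only.

```python
import hashlib
import os


def write_and_verify(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write random data to path, read it back, and compare SHA-256 digests."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_bytes, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and OS write buffers

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            read_back.update(chunk)
    # True means every byte read back matched what was written
    return written.digest() == read_back.digest()
```

Pointing `path` at a file on the new drive and choosing `total_bytes` close to the drive's capacity would touch most of the disk, at the cost of many hours per drive over USB.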
Looking back I've always been a data hoarder, but I never knew it was an actual thing. I just thought I had an unhealthy obsession with cataloging and trying to archive random interesting things I found on the internet. I didn't even know data hoarding was a real hobby till I stumbled across this subreddit, but I'm already in love with all of it lol.
I'd love some advice on how to get started and learn more about the technical aspects of everything. I'm not exactly a whiz with computers, so I barely know a lot of basic things, like what zip files are, using an external hard drive, etc. So far my setup just consists of me screenshotting things, making things into PDFs, and downloading it all onto a USB drive lol. I'd love to start doing things legit. I'd also like to learn about the cyber security aspect of things, keep me and my data safe, and make sure nothing gets corrupted.
Thanks for the help!
Now that Panasonic no longer produce optical media, I'm looking for a reliable brand for 50GB optical discs. I'm from the UK and have access to Verbatim 43748 BD-R DL 50GB 6x - 5 Pack Jewel Case for around £22.00; that's all I really have access to. I thought I'd have a quick look on Amazon Japan and found these Verbatim Blank Blu-ray BD-R DL 50GB 1-6x 50 Discs VBR260RP50SV1 Printable Inkjet. They retail at around £70 delivered, which is reasonable, and they also do packs of 100 for around £130. Can anyone give me any advice on what discs to go for, please?
Lynda M. Kellam, the Director of Research Data and Digital Scholarship at the University of Pennsylvania Libraries, has compiled a list of groups working on data rescue or guerilla archiving of U.S. federal government data.
The live document is here and it's being continuously updated: https://docs.google.com/document/d/15ZRxHqbhGDHCXo7Hqi_Vcy4Q50ZItLblIFaY3s7LBLw/
Here's a PDF version of the Google Doc I downloaded (on 2025-02-04 at 06:39 UTC) for those who prefer a PDF: https://archive.org/details/data-rescue-efforts
She posted the document on Bluesky.
Hey everyone,
Given how the number of SATA ports on motherboards keeps dwindling, the time has come for me to seriously consider moving to an external NAS drive, but I am a little lost as to which NAS system or what type of filesystem I should use.
Basically, my requirements are:
At least 30TB of starting effective capacity with enough redundancy to survive 2 drive failures, mostly to give me room to replace drives when one drive fails.
Preferably a ready-made NAS enclosure.
Data must be able to survive and be retrieved when the NAS hardware or OS fails. This is mandatory.
Being able to upgrade capacity without needing to do a lengthy or complete migration would be nice.
Thanks!
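The capacity requirement above comes down to simple arithmetic. A small sketch, assuming a dual-parity layout where two drives' worth of space goes to redundancy and ignoring filesystem overhead and the TB vs TiB difference; the drive sizes in the loop are hypothetical examples, not recommendations.

```python
import math


def usable_tb(drive_count, drive_tb, parity_drives=2):
    """Raw usable capacity of a dual-parity layout, ignoring filesystem overhead."""
    if drive_count <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drive_count - parity_drives) * drive_tb


def min_drives(target_tb, drive_tb, parity_drives=2):
    """Smallest drive count whose usable capacity reaches target_tb."""
    return math.ceil(target_tb / drive_tb) + parity_drives


# Hypothetical drive sizes, all checked against the 30 TB target from the post
for size in (8, 12, 16, 20):
    n = min_drives(30, size)
    print(f"{size} TB drives: {n} drives -> {usable_tb(n, size):.0f} TB usable")
```

With 16 TB drives, for example, four bays already reach 32 TB usable, while 8 TB drives would need six bays for the same figure.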
I was copying terabytes of data to and from an external SSD, and the temperature of my internal NVMe drive reached 75°C and the drive disconnected from my PC. Could this event have damaged my data?
Archive Team is a collective of volunteer digital archivists led by Jason Scott (u/textfiles), who holds the job title of Free Range Archivist and Software Curator at the Internet Archive.
Archive Team has a special relationship with the Internet Archive and is able to upload captures of web pages to the Wayback Machine.
Currently, Archive Team is running a US Government project focused on webpages belonging to the U.S. federal government.
Here's how you can contribute.
Step 1. Download Oracle VirtualBox: https://www.virtualbox.org/wiki/Downloads
Step 2. Install it.
Step 3. Download the ArchiveTeam Warrior appliance: https://warriorhq.archiveteam.org/downloads/warrior4/archiveteam-warrior-v4.1-20240906.ova
Step 4. Run Oracle VirtualBox. Select "File" → "Import Appliance..." and select the .ova file you downloaded in Step 3.
Step 5. Click "Next" and "Finish". The default settings are fine.
Step 6. Click on "archiveteam-warrior-4.1" and click the "Start" button. (Note: If you get an error message when attempting to start the Warrior, restarting your computer might fix the problem. Seriously.)
Step 7. Wait a few moments for the ArchiveTeam Warrior software to boot up. When it's ready, it will display a message telling you to go to a certain address in your web browser. (It will be a bunch of numbers.)
Step 8. Go to that address in your web browser or you can just try going to http://localhost:8001/
Step 9. Choose a nickname (it could be your Reddit username or any other name).
Step 10. Select your project. Next to "US Government", click "Work on this project".
Step 11. Confirm that things are happening by clicking on "Current project" and seeing that a bunch of inscrutable log messages are filling up the screen.
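For anyone who prefers a terminal, Steps 4 to 6 can probably be approximated with VirtualBox's `VBoxManage` command-line tool. This is a sketch, not part of the official Warrior instructions: it assumes `VBoxManage` is on your PATH (it often is not on Windows) and that the imported VM keeps the name shown in Step 6. The helper function names are made up for illustration.

```python
import subprocess

# Assumption: the imported appliance keeps the name shown in Step 6
VM_NAME = "archiveteam-warrior-4.1"


def import_command(ova_path):
    """Command equivalent to File -> Import Appliance with default settings."""
    return ["VBoxManage", "import", ova_path]


def start_command(vm_name=VM_NAME, headless=True):
    """Command that starts the VM, with or without a console window."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless" if headless else "gui"]


def run_steps(ova_path):
    """Import the appliance and start it; raises if either command fails."""
    for cmd in (import_command(ova_path), start_command()):
        subprocess.run(cmd, check=True)
```

In headless mode there is no console window to show the address from Step 7, so you would go straight to http://localhost:8001/ as in Step 8.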
For more documentation on ArchiveTeam Warrior, check the Archive Team wiki: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior
You can see live statistics and a leaderboard for the US Government project here: https://tracker.archiveteam.org/usgovernment/
More information about the US Government project: https://wiki.archiveteam.org/index.php/US_Government
For technical support, go to the #warrior channel on Hackint's IRC network.
To ask questions about the US Government project, go to #UncleSamsArchive on Hackint's IRC network.
Please note that using IRC reveals your IP address to everyone else on the IRC server.
You can somewhat (but not fully) mitigate this by getting a cloak on the Hackint network by following the instructions here: https://hackint.org/faq
To use IRC, you can use the web chat here: https://chat.hackint.org/#/connect
You can also download one of these IRC clients: https://libera.chat/guides/clients
For Windows, I recommend KVIrc: https://github.com/kvirc/KVIrc/releases
Archive Team also has a subreddit at r/Archiveteam
I'm trying to use the Wayback Machine, but I'm hearing it doesn't work well for YouTube videos - is that right? I'm a total newbie on this stuff. I really want to make sure all of the Census videos don't get removed and lost. Looks like most, if not all, of Census' videos are on YouTube.
Any ideas on how to save these, please? https://www.youtube.com/@uscensusbureau/featured
Hi, so I recently bought a 7-season TV show on DVD and don't intend on watching it just yet. That being said, I'd like to check the discs for errors in case I need to return it. Is there a way to do so?
Historically I've used MacBook Pros backed up to an external drive using Time Machine, + external SSDs holding a few TB of various media. I'm buying a newer ThinkPad P1 and moving to Linux, and now is a good time to take a look at creating a more reliable backup routine. It seems simple, but to be honest I can read for a few hours and feel like I haven't learned anything.
I have my data on my laptop and the external drives I can restore data from, but after I experienced what an SSD failure looks like, I decided I want to have an additional HDD I can back everything up to. When I started looking at 3.5" enclosures, I came across RAID enclosures like the OWC Mercury Elite Pro Dual and it got me thinking about setting up a RAID 1 array as surely having my backups mirrored to a 2nd drive can't be worse than backing up to a single HDD. I have since learned that is not a good plan because of Reasons™ but I do plan to mirror my backup to a 2nd drive manually. I understand this doesn't protect me in case of a fire, but it does greatly reduce the risk from drive failure which is my main concern.
I should note the drives will only be powered up and attached during backups. After giving up on the idea of a RAID, I planned to buy a plain dual bay enclosure (or a RAID enclosure but use the drives individually) for the 2x 8TB UltraStar HDDs I've already bought. But, basically every enclosure out there has reviews saying the drives started disconnecting randomly and that BOTH drives were suddenly corrupted. This is true for $50 on up to several hundred dollar enclosures and too common to ignore when the whole point is to help me rest easy.
My question is: what is going on with all these failures? Shouldn't it be harder to make a mistake so bad that all your data gets corrupted when you're just trying to make a backup? I haven't been able to find any good answers about this. I'd prefer a single enclosure to avoid double the cords and power supplies plus I imagine the speed is better transferring between the drives inside, but if 2 separate enclosures is safer I'm good with that. My needs are simple but I know a lot of the same 4/5/6/10 bay enclosures come in a dual bay version so hopefully someone has some good experience - is it that all enclosures use crappy controllers? Is there a reliable one out there?
I've been told you should always have a backup to be safe, but come on - this is the backup that I'm already making to be safe. It's not reasonable to need a backup for my backup for my backup, with the expectation that whole drives being corrupted is a normal contingency. I think I've planned out a solution that is better than average, and I'm confident there is a method that is "pretty darn good" even if I don't run my own data center deep within a mountain or something. So I'd appreciate any info/tips from those with experience!
Hey Team! I’m looking for a quiet-ish solution to add additional 3.5” drives.
I have a 12 Bay JBOD right now, but the PSUs are very loud.
I’m not opposed to normal fan noise, but I can’t do enterprise grade high pitched PSUs or fans.
Are there any decent Dell / Supermicro chassis that I can make quiet, or a custom JBOD solution?
Okay so I've mostly got LFF drives in my current setup, but I'm looking at upgrading to a 4 blade system that has 24 SFF bays.
I'll absolutely be keeping my primary storage running, but I'd like to look into actually using the SFF bays and not just having them as decoration.
However, in my (albeit quick cursory) research I'm seeing that M.2 form factor drives seem to be more available than SFF drives.
Does anyone have any experience with using SFF adapters to run M.2 drives, and are the adapter cards that carry multiple M.2 drives into a single SFF bay actually worth it?
(Or would I be better off replacing the backplane with something that I can throw M.2 drives straight into?)
Hi there,
I am having trouble accessing the CDC Wonder data query (and the internet archive version doesn’t work for downloading data) and I was wondering if anyone on this thread knows of an archive of cause of death data that is stratified by year/location/age/etc.
If not I will keep trying to get the website to work and start collecting it myself!
I am a Sociologist and Criminologist and I was just wondering if anyone had archived the Bureau of Justice Statistics and/or the FBI Uniform Crime Reports/NIBRS (National Incident-Based Reporting System)? It hasn't disappeared yet but I fear it will.
I am at the beginning stages of organizing home media (ripping discs and scanning photos), backups for several computers, etc. I have been reading about 3-2-1 backup method and filesystems and data integrity. I have a couple of questions for which I would appreciate advice.
Current HDD available: 16 TB internal, 16 TB external, 2 x 8 TB internal, 8 TB external, 2 x 4 TB external, 3 TB external, and various assorted 2 TB and 1 TB externals. I have various internal and external SSD for boot drives and other uses.
Regarding the 3-2-1 backup method, I have a question about the 2. I get that you shouldn't put a “second backup” on your Time Machine drive, that there are potential problems with two copies on different drives in one NAS on the same power supply, etc. Would copies of data on two devices on different electrical circuit breakers in one room generally count as 2 copies of data? We get lightning storms where the devices are used.
Regarding filesystems and data integrity, I have a Debian 12 computer using ext4, a Windows 11 Pro computer using NTFS, and macOS Sequoia computers using APFS. The computers communicate fine over the ethernet and wireless network. Besides a cloud copy of my data (10-12 TB), each of the three filesystems is likely to hold a copy of my data. I am willing to change the Debian 12 computer FS if needed and incorporate external drives into the backup method. What is the best way to maximize and monitor data integrity among copies of the data on these different devices with different filesystems? I have had a few files go bad over the years before I knew about bit rot, and I would like to not repeat it.
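One filesystem-independent way to monitor copies like these is a checksum manifest: hash every file in each copy and compare the results. The sketch below is only an illustration with hypothetical function names. It works on file contents, so it does not care whether a copy lives on ext4, NTFS, or APFS. A mismatch shows that two copies differ, not which one is correct, so keeping an older manifest to compare against helps.

```python
import hashlib
import os


def file_sha256(path, chunk_bytes=1024 * 1024):
    """Hash one file in chunks so large media files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root (with '/' separators) to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_sha256(full)
    return manifest


def compare(reference, other):
    """Return (missing, extra, mismatched) relative paths of other vs reference."""
    missing = sorted(set(reference) - set(other))
    extra = sorted(set(other) - set(reference))
    mismatched = sorted(
        p for p in reference.keys() & other.keys() if reference[p] != other[p]
    )
    return missing, extra, mismatched
```

Running `build_manifest` on each machine's copy and feeding two results to `compare` lists files that are absent, unexpected, or silently different between the copies.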