/r/DataHoarder


This is a sub that aims to bring data hoarders together to share their passion with like-minded people.

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time™). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- /u/5-4-3-2-1-bang from this thread



Rule(s)

  1. Search the Internet, this subreddit and our wiki before posting.
  2. Keep it about datahoarding.
  3. Be excellent to each other.
  4. No memes or 'look at this old storage medium/connection speed/purchase' (except on Free Post Fridays).
  5. Posts must include context/detail.
  6. No unapproved sale threads, advertisement posts, or giveaways. Companies must get prior approval from mod team before posting.
  7. No cryptocurrency posts.
  8. We are not your personal archival army.
  9. r/techsupport exists.
  10. No requests, use r/DHExchange

Free Post Friday
On Fridays we'll allow posts that don't normally fit in the usual data-hoarding theme, including posts that would usually be removed by rule 4: “No memes or 'look at this [thing]'”
Just make sure to tag the post with the flair [Free-Post Friday!] and give a little background info/context.




812,834 Subscribers

60

Does Internet Archive have any plans to move their data off U.S. soil?

With the way things are going, I wouldn't be surprised if Internet Archive became a target for censorship. Does anyone know if there are backups hosted in other countries or plans to move their data?

In a 2016 blog post, they mentioned that they were planning to host a copy of the archive in Canada and that they have partial copies hosted in Egypt and the Netherlands. Is that still relevant information?

16 Comments
2025/02/01
09:51 UTC

1

Looking for m.2 to mini SAS options, or other solutions?

Bit of a niche case here, but I have an Asus ROG STRIX X870E-E GAMING WIFI mobo in a system that has a lot of HDDs, and the board only has 4 SATA ports.

I'm trying to avoid using my 2nd PCIe slot so that the primary doesn't go to x8.

Is there an m.2 to mini SAS (x2) option out there that's reliable?

I found this but I don't know if it would work. Plus I don't know how to tell if my M.2 slots are SATA or NVMe slots... that's important, right?

Any opinions or suggestions are welcome!

Thank you.

2 Comments
2025/02/01
09:23 UTC

5

Archiving or scraping Brickshelf before it shuts down

https://brickshelf.com/ is shutting down March 1st.

I’m not well versed in scraping, but it would be sad to see so many Lego albums deleted, and there are lots of custom instructions on there too.

11 Comments
2025/02/01
08:40 UTC

2

Saving just a part of a website?

First of all, it sounds like a lot of you are doing really important work in light of what's going on.

This is a really simple task for you guys, but it's currently way beyond my skill set.

I ran a website that used a template from a company called Zenfolio. I still have it, but just want to download all of the blog entries, ideally with pics.

I haven't been sure who to ask until I saw this sub mentioned a lot today. An ELI10 format would be very much appreciated.

3 Comments
2025/02/01
08:39 UTC

4

Expanding old Storage or replacing it with newer drives.

Hey,
I currently run 2x12TB and want to add more storage.

My main options are:

Buy 2x16TB and make two mirrors, for a total of 28TB

OR

Buy 2x12TB and make a raidz1 out of all four 12TB drives, for a total of 36TB

Obviously the second option is not only cheaper but also provides more storage.
The problem is that the second option locks me further into 12TB drives, while the first makes it easier to expand with 16TB drives in the future.

Is it still worth it to go with 12TB drives or will prices of higher capacity drives drop quickly enough to already start with a 16TB array?
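
The two totals in this post can be checked with a quick calculation. This is only a rough sketch in Python, assuming the usual rules of thumb: a mirror gives you the capacity of its smallest drive, and a raidz1 vdev gives you roughly (n - 1) times its smallest drive. The function names are made up for illustration, and a real ZFS pool will show somewhat less because of metadata and padding overhead.

```python
def mirror_usable_tb(drives_tb):
    """Usable capacity of one mirror vdev: the size of its smallest drive."""
    return min(drives_tb)


def raidz1_usable_tb(drives_tb):
    """Approximate usable capacity of one raidz1 vdev: (n - 1) * smallest drive."""
    if len(drives_tb) < 2:
        raise ValueError("raidz1 needs at least two drives")
    return (len(drives_tb) - 1) * min(drives_tb)


# Option 1: keep the existing 2x12TB mirror and add a second mirror of 2x16TB.
option_1 = mirror_usable_tb([12, 12]) + mirror_usable_tb([16, 16])

# Option 2: four 12TB drives in a single raidz1 vdev.
option_2 = raidz1_usable_tb([12, 12, 12, 12])

print(option_1, option_2)  # 28 36
```

Note that this only compares raw space. It says nothing about the other trade-off raised in the post, which is how easy each layout is to expand later.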

1 Comment
2025/02/01
08:21 UTC

14

Thank you

Never thought I'd have to think this, much less say it, but to all those of you who save humanity's data, I salute you.

You all are heroes in a super weird world.

1 Comment
2025/02/01
07:50 UTC

1

How much storage do you have? and how do you get so much?

Growing up, I always heard that storage was really expensive and that higher-capacity mechanical drives were more likely to give out with a lot of use. How are storage prices and failure rates in the current era? I'm still using about 4TB between two drives.

14 Comments
2025/02/01
06:46 UTC

2

Record live news tv feeds on a 2-3 day loop

I had this idea tonight of recording live TV news channels (digitally) on a 2-3 day loop. Could this be done with a PC or an SBC? I have a Raspberry Pi 5. Or would IPTV be the way to go?
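
The "loop" part of this idea can be sketched independently of how the video gets captured. The Python below is a minimal sketch, assuming some recorder (not specified in the post) is already writing fixed-length `.ts` segment files into a single directory; the function name, the `.ts` extension, and the `recordings` directory are hypothetical. It deletes segments older than the retention window so the disk never fills up.

```python
import time
from pathlib import Path


def prune_old_segments(directory, max_age_days=3, now=None):
    """Delete *.ts segments whose modification time is older than max_age_days.

    Returns the names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # retention window in seconds
    removed = []
    for path in sorted(Path(directory).glob("*.ts")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


# Example (hypothetical directory): run this periodically, e.g. once an hour.
# prune_old_segments("recordings", max_age_days=3)
```

Keeping the pruning step separate from the recorder means the same cleanup would work whether the segments come from a tuner, an IPTV stream, or something else.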

4 Comments
2025/02/01
05:12 UTC

7

Tiktok archive

I've been downloading TikToks for the past few weeks, and my archive has grown to over 400k videos and almost 2 TB of space. I've still got another 300ish creators to download, so I've got a bit of time, but what would be the best practices for ensuring that I never lose my archive? I know that RAID exists and that RAID 6 has dual parity, so you can lose up to 2 drives without losing any data, but that's about as much as I know. I have a bit of money, but I'd like to do it fairly cheaply if at all possible without adding too much labor or too many failure points.

Also, I've only ever used Windows machines regularly, so Linux will be a challenge, but I'm up to it if that's the best choice.

4 Comments
2025/02/01
04:29 UTC

71

This is the first time I’m in the sub

Y’all probably feel so justified right now… it’s like being a survivalist/doomsday prepper and the zombie apocalypse just happens.

Appreciate y’all

(And of course this is ignoring the genuine fear, insecurity, and worries people are experiencing)

5 Comments
2025/02/01
04:18 UTC

11

US Census Bureau ftp

Hi fellow hoarders, I noticed the detailed data downloads from the Census Bureau (the FTP site) are down right now. Is this a coincidence or just routine maintenance?

https://www2.census.gov/geo/tiger/TIGER2024/

I would like to save all of this, as I use it for a lot of personal and professional work. And it's just cool.

4 Comments
2025/02/01
04:00 UTC

15

all Instagram story savers are capped at 720p!! no more Full HD.

The last 1080p story I saved was on January 6, and all 35 stories I've ripped since then are 720p. Very disappointing, because if I had known, I would have screen recorded them. Has Instagram blocked apps from ripping stories at max bitrate?

What apps or websites are you guys using?

1 Comment
2025/02/01
02:24 UTC

4

SSD drive integrity after accidental exposure to weather elements

Hello,

I recently found 2 Samsung SSDs (980 and 990) that I lost 6 months ago. Apparently, someone from the household was "cleaning" and accidentally left these SSDs outside. Oops ...

I live in the PNW. It's humid here and has been freezing for a week, with temps dropping to 25F. These are/were brand new drives, with nothing stored on them.

Do you think the drives survived? Would this type of memory "survive" freezing temperatures? How can I double-check the integrity of these drives?

Thank you

3 Comments
2025/02/01
02:17 UTC

10

Organized continuity effort

Is there any group organizing an effort to create a shadow instance of "vital sites and information"? I would be willing to bet that many of us have at least some spare space and the ability to host things like cdc.screwfascists.com or whatever to make sure that things are continued. Maybe this could be the beginning of a trusted decentralized register of scientific and historical data. Not to step on Wikipedia's toes.

4 Comments
2025/02/01
01:03 UTC

13

Data.gov currently being scrubbed

1 Comment
2025/01/31
23:49 UTC

148

Score!

21 Comments
2025/01/31
23:46 UTC

37

Gov YouTube channels to get?

Given the news, I'm planning on using my TubeArchivist instance for good. I don't think these are in the EOT archives, but if they are, feel free to ignore me.

So far I'm collecting:

  • CDC
  • HHS
  • Census Department
  • Department of State (large channel, will take time)

I'm sure there are more, but the first two are my highest priority right now, since I've had a handful of videos removed already.

8 Comments
2025/01/31
23:34 UTC

0

Does anyone have the BRFSS places 500 data?

Hello All,

I was going to use the BRFSS Places data for a project, but the site is down.

Does anyone have the BRFSS places 500 data by census tract for the latest year?

I would really appreciate it.

12 Comments
2025/01/31
22:51 UTC

0

Rate my setup!

I've got a QNAP 873A 8-bay NAS. Four Samsung QVO 870 8TB drives in a 32TB RAID0 array. Four 16TB EXOS X16s in a RAID6 configuration. HBS3 is set to real-time sync data from the RAID0 array to the RAID6. The SSD array is also backed up to Backblaze B2 nightly. The NAS has 64GB of RAM and a QUADRO T1000 8GB GPU, stock fans have been replaced with Noctua NF-P12's, all ethernet ports have been fitted with surge protection.

I mean, is that good enough, already? How much more can I do lol

7 Comments
2025/01/31
22:29 UTC

21

DHS data access going down

1 Comment
2025/01/31
21:30 UTC

151

How can I help archiving public US Government stuff to the Internet Archive? As a European...

I just wanted to ask if there's a way to help your efforts to save and archive public data from Trump's actions.

I got an Unraid setup at home and I want to do something to help you all out, because knowledge is so damn important.

Is there a simple Docker container I could set up? Can I lend a hand somehow?

I hope this is the right sub...

Thanks in advance xxo

13 Comments
2025/01/31
19:51 UTC

0

Any $8 per TB deals out there? I'm seeking a refurb 14-16 TB internal SATA drive

I'm seeking a refurb 14-16 TB internal SATA drive for backup and offline (not connected) storage. So it's not critical that it works perfectly every day for the next 5 years, if you get what I'm saying.

It appears all the deals have dried up after Christmas. How long must I wait until these types of deals reappear?

3 Comments
2025/01/30
21:12 UTC

1

yt-dlp vs jdownloader

Is there any benefit to using yt-dlp on the command line as opposed to just JDownloader?

1 Comment
2025/01/30
21:36 UTC

1

1 Drive or 2 ?

I need more storage, but I’m on a limited budget. I need people’s opinions.

Two 8TB drives, mirrored, or one 20TB drive now and another added later to mirror it?

Is it safe-ish to run 1 drive for, say, 6-8 months before I can get another?

2 Comments
2025/01/31
02:16 UTC

2

SOSSE v1.12.0 Released – Web Archiving, Crawling & Search Engine

Hey everyone! We're excited to announce the release of SOSSE v1.12.0, the latest version of our open-source web archiving software, crawler, and search engine.

For those unfamiliar, SOSSE (Selenium Open Source Search Engine) lets you:

  • 🔍 Search web page content, including JavaScript-rendered pages
  • 🕵️ Crawl sites at regular intervals & detect content updates
  • 📥 Download files in bulk from web pages
  • 📑 Archive pages with local assets for offline access
  • 🔔 Monitor websites and generate Atom feeds for new content
  • 🔒 Authenticate to access private content

📖 Full docs: https://sosse.readthedocs.io/
🐙 GitHub: https://github.com/biolds/sosse
🦊 GitLab: https://gitlab.com/biolds1/sosse
💬 Join us on Discord: https://discord.gg/Vt9cMf7BGK

📢 We Need Your Input!

We're running a short survey to help prioritize new features and gauge interest in professional support. If you've used SOSSE or are interested, please take a moment to fill it out:
➡️ https://framaforms.org/202502-sosse-survey-1738309561

Your feedback is invaluable! Let us know what you think about v1.12.0! 🚀

0 Comments
2025/01/31
08:34 UTC
