/r/DataHoarder

This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time™). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- /u/5-4-3-2-1-bang from this thread


Rule(s)

  1. Search the Internet, this subreddit and our wiki before posting.
  2. Keep it about datahoarding.
  3. Be excellent to each other.
  4. No memes or 'look at this old storage medium/connection speed/purchase' (except on Free Post Fridays).
  5. Posts must include context/detail.
  6. No unapproved sale threads, advertisement posts, or giveaways. Companies must get prior approval from mod team before posting.
  7. No cryptocurrency posts.
  8. We are not your personal archival army.
  9. r/techsupport exists.
  10. No requests, use r/DHExchange

Free Post Friday
On Fridays we'll allow posts that don't normally fit in the usual data-hoarding theme, including posts that would usually be removed by rule 4: “No memes or 'look at this [thing]'”
Just make sure to tag the post with the flair [Free-Post Friday!] and give a little background info/context.


747,520 Subscribers

0

Will a 4TB external hard drive have a shorter lifespan than a 2TB one?

I want to buy the WD Elements My Passport Ultra 4TB hard drive, but I heard that higher capacity drives are more prone to failing than lower capacity ones.

Because if this is true, I would need to go for a 2TB version.

Not sure if this is a problem with WD's newer hard drive models (like passport ultra), but would really appreciate tips/suggestions from you guys.

1 Comment
2024/04/28
09:03 UTC

2

Best off-site cold storage backup method for rarely accessed data

I have a client who needs around 2PB of data backed up off site. I was going to help them get set up in the cloud (Glacier/cold blob) but, after a few calls, junked that idea based on cost (yep, not cheap).

Source data stays on several external hard drives shelved on site in a secure, temperature-controlled room.

Client is more likely to pull data from the external drives as needed. We would hope they would never have to access the off-site backup, and if they did, it wouldn't be urgent.

With this amount of data, which would make more sense?

A. Would it make more sense to keep the drives spinning on a couple high density storage servers in a datacenter? Including integrity checks/audits, and drive replacements every few years.

B. True cold storage, having drives in pelican cases and stored in a temp controlled off site storage facility (a bit more work).

I have also looked into tape backup as a 3rd backup depending on client's needs/risk tolerance.

If high density storage servers is the answer, I'm looking at Storinator, Synology, and Supermicro. If anyone has any suggestions regarding the job in relation to the hardware I would be interested to hear your experience. I've deployed a Synology HD6500 without any issues, haven't tried the others. (funny as this is sort of a reverse cold storage solution).

I think this is the sort of project that's up everyone's alley, hoping for the best.

Also, I appreciate everyone's time reading this.

2 Comments
2024/04/28
05:28 UTC

0

Would a brand new SSD still in the box with no data on it be vulnerable to electron tunneling through a barrier?

I imagine that an SSD still new in the box has firmware data on it at least. Does only having a little data lower the chances of electron tunneling because there are fewer electrons to potentially tunnel? I'm just curious.

Thank you.

6 Comments
2024/04/28
02:20 UTC

0

Which archive format(s) do you tend to use?

There seems to be this odd problem that most programs still process files sequentially, quite often using synchronous I/O, being bound by the latency of storage and single CPU core performance. While an HDD to SSD migration where applicable is a significant drop in latency, neither option progressed much lately latency-wise, and single CPU core improvements are quite limited too.

Given these limitations, storage size and somewhat relatedly file count scaling significantly higher than processing performance means that keeping a ton of loose files around is not just still a pain in the ass, but it became relatively worse as our hoarding habits are allowed to get more out of hand with storage size improvements.

The usual solution for this problem is archiving, optionally with compression, a field which still seems to be quite fragmented and is apparently not converging towards a universal solution covering most problems.

7z still seems to be the go-to solution in the Windows world, where it mostly performs okay. It seems rather Windows-focused, though, which doesn't work well with Linux becoming more and more popular (even if sometimes in the form of WSL and Docker Desktop), so the limits on the information stored in the archive require careful consideration of what's being processed. There's also the issue of LZMA2 being slow and memory hungry, which is once again a scaling issue, especially with maximum (desktop) memory capacity barely increasing lately. The addition of Zstandard may be a good solution for this latter problem, but the adoption process seems to be quite slow.

Tar is still the primary pick in the Linux world, but the lack of a file index mostly limits it to distributing packages and making "cold" archives that are not expected to be used any time soon. While the bandwidth race of SSDs can offset the need to go through the whole archive to do practically anything with it, the scaling of HDD bandwidth didn't keep up at all, and the scaling of typical user network bandwidth is even worse, making it painful to use on a NAS. Storing enough information to back up a whole system, and having great, well-supported compression options, often makes it shine, but the lack of a file index is a serious drawback.
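To make the "no file index" point concrete, here is a minimal sketch of a sidecar index for an uncompressed tar: one full scan records where each member's data starts, and later reads seek straight to it. The helper names (`build_index`, `read_member`, `make_demo_tar`) are hypothetical, and the approach assumes an uncompressed archive, since compressed streams such as tar.zst or tar.xz are not seekable this way. It relies on the `offset_data` attribute that CPython's `tarfile` sets on each `TarInfo`.

```python
import io
import tarfile


def build_index(fileobj):
    """Scan an uncompressed tar once and map member name -> (data offset, size)."""
    index = {}
    # mode "r:" opens the archive without any decompression layer
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index, name):
    """Read one member's bytes directly, without walking the archive again."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


def make_demo_tar(files):
    """Build a small in-memory tar from a {name: bytes} mapping (demo only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = make_demo_tar({"a.txt": b"hello", "dir/b.bin": b"\x00\x01\x02"})
    idx = build_index(demo)
    print(read_member(demo, idx, "dir/b.bin"))
```

The index itself still costs one sequential pass, so this only helps when the same archive is browsed repeatedly and the index is stored next to it.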

Looked at other options too, but there doesn't seem to be much else out there. ZIP is mostly used where compatibility is more important than compression, and RAR just seems to have a small fan base holding onto it for the error correction capability. Everything else is either considered really niche, or not even considered to be an archiving format even if looking somewhat suitable.

For example, SquashFS looks like a modern candidate at first sight, even boasting file deduplication instead of just hoping that the same content will be found within the same block. But the block size is significantly limited to favor low memory usage and quick random access, and tooling like the usual libarchive-backed transparent browsing and file I/O is just not around.

I'm well aware that solutions below the file level, like Btrfs/ZFS snapshots, are not bothered by the file count. But as explained, tools operating on the file level haven't kept up well, so I still deem archive files an important way of keeping hoarded data organized and easy to work with. I'm interested in how others handle data that's not hot enough to escape the desire to be packed away into an archive file, but also not so cold that it can go into a file that isn't feasible to browse.

Painfully long 7-Zip LZMA2 compression sessions for simple file structures, tar with zstd (or xz) for "complex" structures, or am I behind the times? I'm already using Btrfs with deduplication and transparent compression, but a directory with a 6-7 digit file count tends to get in the way of operations occasionally on local SSDs, and even a 5 digit count tends to significantly slow down the NAS use case, with HDDs still being rather slow.

28 Comments
2024/04/28
02:10 UTC

0

Storage setup advice for NAS, VM/Container, and Proxmox

Hello - I'd like to see if the following storage setup makes sense for Bare Metal Proxmox. I have a H11SSL-NC motherboard for reference and will virtualize unraid w/ 3x 8TB data drives and 1x 8TB parity drive.

  1. Proxmox install - I'm thinking RAIDZ2 for redundancy with 4 SATA 512GB NVMe SSDs on an LSI 9500-8i HBA, but I'm open to other implementations/hardware. Not sure if I would need to have it in IT mode for this configuration.

  2. Virtualized Unraid - 9500-8i HBA w/ 3x 8TB SATA data drives and 1x 8TB SATA parity drive (WD Red Pros). Will be passing through the HBA to this VM. Not sure what to do for cache drives here.

  3. Container/VM storage - I was thinking of maybe setting up my Unraid storage as a share to use for VMs/containers, but I'm open to feedback here. Not sure how this would work though.

Please let me know what you think!

1 Comment
2024/04/28
00:42 UTC

1

Hoping for help with Redgifs, not sure where else to ask. I can no longer browse and download individual gifs with ease. I was using an extension called "Allow Right-Click", but it no longer works. I've figured out how to add a link to JDownloader using share, but that is tedious to organize.

Is there still an easy way to save individual videos from Redgifs? If not, what is your recommended way?

2 Comments
2024/04/28
00:16 UTC

0

2 Dell EqualLogic PS6000 SANs - Any value?

Recently moved out of our office suite and cleaned out our server room. In it were 2 Dell EqualLogic PS6000 Serial Storage Arrays filled with 300GB 15K drives (32 total). They were not hooked up and I assume had not been used for a very long time.

Would these machines be of value to anyone these days, or should I bring them to be recycled? Trying to understand if it is worth the time and effort to wipe the drives in order to sell/donate to someone that wants it...or if the chassis by themselves would be of any value to someone and I should recycle the drives?

As an aside, I am in the Dallas area if anyone is interested in these machines. They are sitting in my garage.

3 Comments
2024/04/28
00:08 UTC

0

WaybackMachine for promotional emails/newsletters?

Anyone know if there is someone keeping an archive.org type thing for promotional email blasts/newsletters?

Essentially it would crawl webpages similar to how archive.org works, but without storing any webpage data. Instead it would just subscribe to every email newsletter pop-up it comes across and archive the emails by date and sender, so they would be browsable similar to the Wayback Machine...

2 Comments
2024/04/27
23:30 UTC

0

Does storage media begin to age from its manufacture date or the date that the drive is first used?

Probably a silly question, but just wondering how to best determine the age and health of various media.

5 Comments
2024/04/27
21:01 UTC

0

Offline NAS Server?

A NAS is on the network, but if I'm running off-grid on a laptop I have no network. If I built a NAS server with, I guess, TrueNAS and hooked it up to a switch, couldn't I connect the laptop to that switch and then transfer files at a good speed? I use a laptop over my desktop now due to the significantly lower power consumption, but this means I can't have a bunch of drives internally.

11 Comments
2024/04/27
20:38 UTC

0

What to do with like 40 small drives?

I have a pile of like 40 drives, probably 250GB to 1TB, mostly 500GB, 600GB and 750GB I think. Anything I could do with these, or does it make sense to just sell them for a few bucks each as originally planned? Power probably isn't an issue, since if anything I'd like to use them to back up my main storage, so they would only be on during manual backups every few days.

25 Comments
2024/04/27
20:36 UTC

0

Question regarding adding SATA ports and RAID 5 array to an existing gaming PC

I've done hours of research and I want to make sure I am understanding the information I have correctly.

My goal: Create a 4 drive RAID-5 setup within an existing gaming PC (3080TI, 9900K, 32GB RAM, ASRock Z390 Taichi Ultimate, Windows 10).

Data route: PCIe lanes via a SAS HBA controller, connecting four SATA drives (likely 16TB Seagate Exos) using Mini SAS to SATA cables.

Links to the things I am thinking of buying:

(SAS HBA controller) https://www.amazon.com/9300-16I-12GB-Adapter-03-25600-01B-LSI00447/dp/B0B49KWPQV

(Mini SAS to SATA cables) https://www.amazon.com/dp/B012BPLYJC?starsLeft=1&ref_=cm_sw_r_cso_cp_apin_dp_QX018CHSETCNMBY1R8W2&th=1

(Hard drive cage) https://www.amazon.com/dp/B0854QRSC2/?coliid=I3NMA56SIA7B92&colid=16IZISTCMIYCH&psc=1&ref_=list_c_wl_lv_ov_lig_dp_it

Overall, I do not want to touch the SATA/M.2 ports on my motherboard for various reasons. I want the RAID setup constructed purely on the SAS HBA controller. Does anyone know of any incompatibilities, or have any insight or warnings for the path I'm looking at here?

Edit: One concern is overall data bandwidth between my GPU, the SAS HBA controller, and two M.2 drives I have. I do not think I understand the relationship between the ports well, but the 3080Ti supposedly is an x8 card? And the SAS HBA controller is also x8, so I think I should be okay there? I can get rid of the M.2 drives if needed.

Thanks in advance

11 Comments
2024/04/27
19:07 UTC

3

Thinking about storing encryption keys

I have a Vaultwarden instance running on my server.

I have finally set up a backup service using Kopia and Backblaze.

Of course, my backup is encrypted by Kopia with a long random key, which is stored in Vaultwarden.

But if my server is gone and I lose access to Vaultwarden, how can I restore anything if I don't even know the key to decrypt my backup?

How do you handle such a case? Where do you store your random keys?

7 Comments
2024/04/27
17:28 UTC

1

I am not sure how to back up my 72TB (Usable) NAS

I just put together a NAS with a Synology 1821+ that, after my SHR2 (RAID 6) config, has 83TB of usable space. I plan to leave about 10% overhead, but this leaves me with the task of backing up about 72TB worth of data. I am honestly not sure of the best way to do this. I am thinking of the following choices, but each has its downside. What would you do, and what makes the most sense to back this up?

-No RAID, Manual Backup (4x 20TB Drives)

This option would be a little tedious, but would not have the risk of a RAID failure. However, it also would not be able to benefit from Data Scrubbing to check for bit rot.

-RAID 0, Auto Backup (4x 20TB Drives)

This option would allow for an easy backup solution with automatic backups and would save the cost of an additional drive for a RAID 5. However, it runs the risk of a RAID failure with no parity drive and also would not be able to benefit from Data Scrubbing to check for bit rot.

-RAID 5, Auto Backup (5x 20TB Drives)

This option would allow for an easy backup solution and provides a parity drive for HDD failure protection and data scrubbing for bit rot checks. However, it is the most expensive option as it would require a 5th hard drive.

Update: Just for some clarification, the question about the RAID is strictly for the backup. Setting up another RAID for the back up of my main RAID.
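The capacity side of the three options above can be sketched with simple arithmetic. This is a rough illustration only: the `usable_tb` helper and its layout names are hypothetical, and it ignores filesystem overhead and the TB versus TiB difference, so real usable space will come out somewhat lower.

```python
# Number of drives' worth of capacity given up to parity, per layout.
PARITY_DRIVES = {"jbod": 0, "raid0": 0, "raid5": 1, "raid6": 2}
# Minimum drive counts for each layout to make sense.
MIN_DRIVES = {"jbod": 1, "raid0": 2, "raid5": 3, "raid6": 4}


def usable_tb(layout, drives, drive_tb):
    """Nominal usable capacity in TB for equally sized drives."""
    if layout not in PARITY_DRIVES:
        raise ValueError(f"unknown layout: {layout}")
    if drives < MIN_DRIVES[layout]:
        raise ValueError(f"{layout} needs at least {MIN_DRIVES[layout]} drives")
    return (drives - PARITY_DRIVES[layout]) * drive_tb


if __name__ == "__main__":
    needed_tb = 72
    options = [("jbod", 4), ("raid0", 4), ("raid5", 5)]
    for layout, drives in options:
        capacity = usable_tb(layout, drives, 20)
        print(f"{layout} with {drives} x 20TB: {capacity} TB nominal, "
              f"{capacity - needed_tb} TB headroom over {needed_tb} TB")
```

All three layouts land on the same nominal 80 TB, so the choice comes down to failure behaviour and cost rather than capacity.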

27 Comments
2024/04/27
16:48 UTC

0

BackBlaze backup prices for photos.

Does anybody have an estimate for how much it would cost to keep 20GB of data stored in backblaze per month? The only info I can find is per TB and a calculator I used says it would cost $1 a year - something I find very hard to believe. I imagine Backblaze is used for larger data sets than my photo collection but I'm still interested.
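Per-TB pricing scales down linearly, so the estimate is a one-line calculation. In the sketch below, the rate of $6 per TB per month is an assumption for illustration only, not a quoted price; check the provider's current pricing page and substitute the real figure. It also ignores download fees, minimum charges, and any free allowance.

```python
def monthly_cost_usd(data_gb, usd_per_tb_month):
    """Storage cost per month, using decimal units (1 TB = 1000 GB)."""
    return data_gb / 1000 * usd_per_tb_month


if __name__ == "__main__":
    assumed_rate = 6.0  # ASSUMPTION: USD per TB per month, verify before relying on it
    per_month = monthly_cost_usd(20, assumed_rate)
    print(f"20 GB: about ${per_month:.2f}/month, ${per_month * 12:.2f}/year")
```

At that assumed rate, a result in the region of a dollar or two per year for 20 GB is plausible rather than a calculator error.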

13 Comments
2024/04/27
15:43 UTC

57

I thought DataHoarder might enjoy this video I put together

9 Comments
2024/04/27
14:52 UTC

0

NAS First Timer

NAS Build - First timer

I have never built a NAS before. I'm mostly a pure gamer, so the server side of it all is very new to me. What advice would people give me? I’m just looking for a nice simple network drive to back up things like bills, house contracts, photos etc. for me and my partner.

I currently have a spare Intel i3-8100T CPU and 2 x 4GB DDR4 SODIMM RAM.

7 Comments
2024/04/27
12:48 UTC

4

Should I Buy HBAs from "Art of Server"?

Hi, does anyone have any experience getting HBAs from them? I am thinking of getting internal PCIe 3.0 HBAs and the relevant cables from them, since it appears that they know what they are doing, but I would like to know what the experience was like for those who have bought from them before.

Thanks!

33 Comments
2024/04/27
10:16 UTC

0

Adding CRC32 to filename based on data section of audio & video files

Anyone still adding CRC32 markers to media filenames for visual comparison? It seems to be a good balance of speed and manageability for a media library of 300,000 random files.

I want to extend the idea of "rapidcrc unicode": put the CRC in the filename, but instead of computing the CRC over the whole file, compute it over just the data section without the metadata.

.\ffmpeg.exe -i "myfile.mp3" -map 0:a -codec copy -hide_banner -loglevel warning -f streamhash -hash crc32 -
myfile[%crc32%].mp3

Are there any tools or scripts that already do this? Any experience?
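A minimal sketch of how the command above could be wrapped in a script. It reuses the same ffmpeg options from the post; the helper names (`parse_streamhash`, `tagged_name`, `audio_crc32`) are hypothetical, and the parser assumes the streamhash muxer prints one line per stream in the form `index,type,ALGO=hex`, so verify that against the output of your ffmpeg build before renaming anything.

```python
import re
import subprocess
from pathlib import Path

# ASSUMPTION: streamhash lines look like "0,a,CRC32=1a2b3c4d"
STREAMHASH_LINE = re.compile(r"^(\d+),(\w),(\w+)=([0-9a-fA-F]+)$")


def parse_streamhash(output):
    """Return the hash strings found in streamhash output, in stream order."""
    hashes = []
    for line in output.splitlines():
        match = STREAMHASH_LINE.match(line.strip())
        if match:
            hashes.append(match.group(4))
    return hashes


def tagged_name(path, crc):
    """Build 'name[CRC].ext' next to the original file, without renaming it."""
    source = Path(path)
    return source.with_name(f"{source.stem}[{crc.upper()}]{source.suffix}")


def audio_crc32(path, ffmpeg="ffmpeg"):
    """Hash only the audio stream data (no container metadata) via ffmpeg."""
    command = [
        ffmpeg, "-hide_banner", "-loglevel", "warning",
        "-i", str(path), "-map", "0:a", "-codec", "copy",
        "-f", "streamhash", "-hash", "crc32", "-",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    hashes = parse_streamhash(result.stdout)
    if not hashes:
        raise RuntimeError(f"no stream hash found for {path}")
    return hashes[0]  # first mapped audio stream


if __name__ == "__main__":
    # Dry run on a literal sample line; no ffmpeg call, no rename.
    sample = "0,a,CRC32=1a2b3c4d\n"
    print(tagged_name("myfile.mp3", parse_streamhash(sample)[0]))
```

Keeping the rename as a separate, explicit step makes it easy to review the proposed names for a large library before touching any files.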

3 Comments
2024/04/27
09:55 UTC

17

Australia's War Memorial now transcribing their Anzac letter collections publicly

Hi! You may remember me from when I posted about the Brisbane State Archives and learnt about the data loss they suffered. I find national and international archive efforts fascinating and want to share what Australia's War Memorial (AWM) is now up to in their archiving efforts. Hopefully some of you will find this interesting as well.

The new project started during Covid, with a select few manually transcribing the ANZAC letters during isolation. Now they've received a sponsorship and are able to offer a web tool for the public to transcribe as well.

A vast number of letters are being made available. With the help of transcribers we may be able to do deep searches in the near future. I like this type of stuff because it offers a first-hand glimpse into history that seems to be getting further and further away.

For those who want to listen to what Robyn Van Dyk (head of research for AWM) has to say, a podcast is here.

0 Comments
2024/04/27
09:40 UTC

0

Instagram + Artstation batch downloader

4 Comments
2024/04/27
08:07 UTC

4

Automating WMV to MP4 Conversion (Free & Easy Software Recommendations?)

I'm drowning in a sea of WMV files with all sorts of different specs - bitrates, frame rates, resolutions, you name it. I'm looking to convert them all to MP4 for better compatibility, but the manual work in Handbrake is killing me.

Ideally, I'd love some free software that can analyze these WMV files and automatically choose the best MP4 settings to minimize quality loss while keeping the file size reasonable. Any recommendations from the video conversion gurus out there?
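One way to avoid choosing settings per file is a constant-quality encode, which keeps each source's resolution and frame rate and lets the encoder pick the bitrate. The sketch below only builds ffmpeg command lines and prints them by default. The function names are hypothetical, and the CRF, preset, and audio bitrate values are assumptions to tune, not recommended settings.

```python
import subprocess
from pathlib import Path


def build_command(src, crf=20, preset="medium", audio_bitrate="160k"):
    """ffmpeg arguments for one WMV -> MP4 conversion (H.264 video, AAC audio)."""
    source = Path(src)
    target = source.with_suffix(".mp4")
    return [
        "ffmpeg", "-n",            # -n: never overwrite an existing output
        "-i", str(source),
        "-c:v", "libx264", "-crf", str(crf), "-preset", preset,
        "-c:a", "aac", "-b:a", audio_bitrate,
        "-movflags", "+faststart",
        str(target),
    ]


def convert_tree(root, dry_run=True):
    """Walk a folder for .wmv files; print commands, or run them if dry_run is False."""
    commands = [build_command(path) for path in sorted(Path(root).rglob("*.wmv"))]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    convert_tree(".", dry_run=True)
```

Note that the `*.wmv` pattern is case sensitive on some platforms, so files ending in `.WMV` may need a second pass.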

Thanks in advance!

11 Comments
2024/04/27
04:51 UTC

0

Replacing LSI HBA Card Query

Howdy, so I'm currently using 2x of these LSI SAS 9207-8e cards in my setup:

https://www.amazon.co.uk/dp/B00AENN19K

I'd like to free up a slot on my motherboard and potentially replace them with a single card, such as this LSI SAS 9201-16e card:

https://www.amazon.co.uk/dp/B00AENN16S

Would that work alright, or am I possibly going to run into problems? Any input is appreciated.

I'm aware the 9207-8e is 3.0 and the 9201-16e is 2.0. I'm currently using WD Gold 20TB drives and don't run any RAID configuration (the setup suits me fine), so I don't think there should be any speed issue from it, but you never know.

Also, I have fans positioned underneath the cards, so cooling isn't an issue.

3 Comments
2024/04/27
04:15 UTC

0

Is there a way to FTP into the-eye?

Is there a way to FTP into the-eye.eu? I’ve tried JDownloader, but I can’t get it to download recursively.

3 Comments
2024/04/27
03:45 UTC

0

Application/script to download Patreon Text content?

Is there software that can download various articles from Patreon? I've found ones that will pull media / images / videos but I'm more interested in text.

2 Comments
2024/04/27
02:51 UTC

0

HOME NAS SYSTEM

Hi, people of data hoarding. I need guidance in choosing and procuring a NAS server. My basic needs are 4 bays, the cheaper the better; I wish I could find it for 0 cash, but a guy can dream. I need 4 bays because I already have a 4 TB HDD filled to 90% and it will snowball from there. It will be a Frankenstein with different brands of HDD, though I'll try to keep it Seagate or WD. Its first job is mass storage, as much as I can get, so 4 by 4 TB is what I'll go for. RAID will be needed, so I need advice on which RAID level is best for maximum capacity and redundancy.

The next stage, once I'm confident it will live without bricking itself within 3 months, is some form of streaming. So the next question: are second-hand NAS units actually capable of it? I would try to go for some form of media server and would need a lot of advice on it. My country does not care about piracy.

I will add some stuff I found online. You are welcome to add to it, but only if it is in Austria or Croatia; I can't order from other places because of shipping. My budget is 100 euros flat. HDDs are welcome but optional.

Want to share:

https://www.willhaben.at/iad/kaufen-und-verkaufen/d/qnap-ts-419p-ii-nas-3-x-1-tb-hdd-wd-1250662821/

Want to share:

https://www.willhaben.at/iad/kaufen-und-verkaufen/d/synology-ds413j-nas-mit-2x2-tb-festplatten-1208083021/

Want to share:

https://www.willhaben.at/iad/kaufen-und-verkaufen/d/nas-thecus-4100-pro-1gb-ram-2x1gb-lan-1299293987/

Thank you for every response, it is appreciated.

5 Comments
2024/04/27
01:54 UTC

0

Maxing Out MPG B760M EDGE TI WIFI for Plex: Need Advice on PCI Slot Flexibility & Ideal GPU/NIC Setup 🖥️🚀

I have a question about the MPG B760M EDGE TI WIFI.

I know that it has 1 exposed PCI slot, however I’m looking to use this in a Plex Server type of configuration. And it has to be a certain specification for the case I plan to use.

So my question is if any of those other PCI slots that are technically addressing their M.2 configuration can be pulled out and utilized for something else?

Ideally I’d love to have an RTX A400, RTX A1000 or RTX A2000 to help with Plex, along with a possible 10G NIC for future proofing.

So, with this MoBo, what’s the flexibility of this board for my intended use case? I’ll be running TrueNAS Core/Scale for the Plex Media Server. The other one (yes, there will be 2) will be used for downloads/Docker containers/VMs. Also, yes, I have a 10G network in my house and an 8Gbps ISP connection.

Hopefully someone can answer this with all of the information I need.

6 Comments
2024/04/27
00:44 UTC

3

Software to try to copy files from unstable devices

I have some old HDDs and I now need to get their files backed up onto decent, recent media. Some of the HDDs are "working", but they are absurdly slow and making odd sounds...

I have an old spare PC that I will put these HDDs in, and I hope to leave some software running (I know, I don't care if it takes days).

Are there good recommendations in this area? Free/OSS is good, of course, but competent paid software will work too. I don't mind if the software is for Linux (though all the HDDs have Windows installed).

Thanks!

7 Comments
2024/04/27
00:31 UTC
