3-2-1 Rule storage concept
When storing data, the 3-2-1 rule is a generally accepted best practice. It means that you should have:
- 3 copies of data (1 production, 2 backups)
- 2 types of storage media
- 1 offsite copy
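The three conditions can be written down as a small check. This is only a minimal Python sketch: the `Copy` class, the media labels and the example entries are illustrative assumptions, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str      # hypothetical label for this copy
    medium: str    # assumed media category, e.g. "cloud", "ssd", "object-storage"
    offsite: bool  # True if the copy lives away from the primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Return True if the given copies meet all three parts of the rule."""
    enough_copies = len(copies) >= 3                    # 3 copies of data
    enough_media = len({c.medium for c in copies}) >= 2  # 2 types of storage media
    has_offsite = any(c.offsite for c in copies)         # 1 offsite copy
    return enough_copies and enough_media and has_offsite


# Example inventory (assumed values for illustration only).
copies = [
    Copy("production", "cloud", offsite=True),
    Copy("local server", "ssd", offsite=False),
    Copy("archive bucket", "object-storage", offsite=True),
]
print(satisfies_3_2_1(copies))
```

Dropping any one of the three entries makes the check fail, because only two copies remain.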
Even in times of modern cloud services this diversification makes sense, as it protects you in case something happens to your data. Such events are rare, but they do happen from time to time, and who would want to lose, for example, all personal photos of the last few years to one of them? Below I outline the basic layers of my personal storage concept, which satisfies the 3-2-1 rule almost naturally in almost all cases.
My personal storage concept is simple and universal in the sense that it's applicable to almost every service or type of data. Maintenance effort and running costs are very low.
1. Production data
My personal production data is almost always with some cloud service. This allows convenient access from everywhere in the world, and the data is protected and backed up by the service provider. So far I've never run into any significant problems, but relying on a single cloud service provider still leaves a concentration of risk.
2. Local storage layer
The local storage layer is basically a
- Debian Linux server with 10 GbE NIC
- 3 x 1 TB SSDs (each one of a different brand from a different vendor) on
- ZFS in RAID-Z1 mode.
While RAID-Z1 may not be perfect, it's comparable to RAID 5 (single parity, so one disk may fail without data loss), which is already pretty good for a non-professional environment. The beauty of ZFS is that it's all software and very reliable, and it doesn't have any special hardware requirements in terms of controller, hard disk, protocols, etc. It just needs a few gigabytes of RAM.
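As a rough back-of-the-envelope calculation, the usable capacity of such a pool is the disk count minus the parity disks, times the disk size. The sketch below assumes equally sized disks and ignores ZFS metadata and padding overhead, so real numbers will be somewhat lower; the function name is made up for illustration.

```python
def raidz_usable_tb(disk_count: int, disk_size_tb: float, parity: int = 1) -> float:
    """Approximate usable capacity of a RAID-Z vdev, ignoring ZFS overhead."""
    if disk_count <= parity:
        raise ValueError("a RAID-Z vdev needs more disks than parity devices")
    # One disk's worth of space per parity level is spent on redundancy.
    return (disk_count - parity) * disk_size_tb


# 3 x 1 TB SSDs in RAID-Z1 (parity=1), as in the setup described above.
print(raidz_usable_tb(3, 1.0))  # 2.0
```

So the three 1 TB SSDs yield roughly 2 TB of usable space while tolerating the failure of any single disk.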
This foundational local storage layer is also where all the VMs and containers keep their persistent data, so all local data is stored here.
ZFS has proven to be very smooth and reliable and has already handled a failed SSD seamlessly and gracefully.
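A failed disk is only handled gracefully if somebody notices it and replaces it. A minimal sketch of a health check, assuming the `zpool` command is on the PATH: `zpool status -x` prints "all pools are healthy" when nothing is wrong, and details of the affected pool otherwise. The function names are hypothetical.

```python
import subprocess

# Message printed by `zpool status -x` when every pool is fine.
HEALTHY_MESSAGE = "all pools are healthy"


def pools_are_healthy(status_output: str) -> bool:
    """Interpret the text printed by `zpool status -x`."""
    return status_output.strip() == HEALTHY_MESSAGE


def check_pools() -> bool:
    """Run `zpool status -x` and report whether all pools are healthy."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=True,
    )
    return pools_are_healthy(result.stdout)
```

Calling `check_pools()` from a scheduled job and sending a notification when it returns `False` would be one way to find out about a degraded pool early.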
3. Distributed storage layer
For all the files I need to have on different machines I use Resilio Sync. Resilio Sync emerged from the BitTorrent protocol and is therefore by its very nature a distributed data sharing tool that does not follow a strict client/server paradigm. This is brilliant, since every node in this private sharing network is a fully capable peer that can sync from and to other peers. I do use a server as a kind of always-on node, but technically it's just another peer.
Compared with the typical cloud services that offer comparable features, I found Resilio Sync to be much faster when moving files around locally between peers, and the support on all the OSes I use (Linux, macOS, Windows, iOS) is solid. It also solves the need to have a local copy on my local storage layer almost as a side effect.
One drawback, though, is that the iOS file integration is still not fully functional. Apart from the fact that it is not open source, that's almost all I have to complain about.
4. Client side encryption
For local encryption of sensitive content I use Cryptomator. Cryptomator is open source (please donate) and easy to handle. Again, it works well on Linux, macOS, Windows and recently also iOS. Cryptomator-encrypted containers can easily be stored on any normal file system, which makes them very portable.
5. iOS Photo backup
Photos on iOS is a use case where I would in general trust the cloud service to handle storage and backup. However, the content that has accumulated here over time has become just too precious to me to risk losing it for lack of a backup. Backing up Apple Photos is unfortunately not that straightforward and requires you to have a device with enough storage to download all photos in original quality. If you don't have a device (mobile phone or computer) with enough storage, you are out of luck.
To automatically sync new photos in original quality to my local storage I use the PhotoSync app. It has some limitations because iOS does not allow apps to compare and keep track of what has already been synced, but the developers found ways to mitigate this and it works really well. Whenever I am at the defined location, new photos and videos are synchronized automatically to my distributed and local storage layer.
6. Offsite backup
Last but not least, the offsite backup. Even though it's somewhat unlikely, an unfortunate chain of events might at some point make this worthwhile, and since the effort to set it up and its running cost are minimal, I think this is a good investment.
Basically, defined folders of the local storage layer are regularly backed up to a Google Cloud Storage bucket. Data is encrypted client side before it is transferred to the GCP bucket. I chose the Archive storage class since it is extremely cheap at $0.0012 per GB per month in europe-west, plus a little for the data transfer, and I intend never to download the data except in the worst case.
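To put the price into perspective, here is a small calculation using the per-GB figure quoted above. The 1000 GB backup size is an assumed example value, and transfer, operation and retrieval charges are deliberately left out.

```python
# USD per GB per month for Archive storage, as quoted in the text above.
ARCHIVE_PRICE_PER_GB_MONTH = 0.0012


def monthly_storage_cost(size_gb: float, price_per_gb: float = ARCHIVE_PRICE_PER_GB_MONTH) -> float:
    """Storage-only cost per month; excludes transfer and retrieval fees."""
    return size_gb * price_per_gb


def yearly_storage_cost(size_gb: float, price_per_gb: float = ARCHIVE_PRICE_PER_GB_MONTH) -> float:
    """Storage-only cost per year."""
    return 12 * monthly_storage_cost(size_gb, price_per_gb)


backup_size_gb = 1000  # assumed example size, not a measured value
print(f"{monthly_storage_cost(backup_size_gb):.2f} USD per month")  # 1.20 USD per month
print(f"{yearly_storage_cost(backup_size_gb):.2f} USD per year")    # 14.40 USD per year
```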
I use Restic for backing up data incrementally to the GCP bucket. While it's not perfect, and I would prefer key-based GPG encryption over password-based encryption, it is just super easy to handle and has so far been very reliable. Automation can be achieved with a simple cron job.
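A minimal sketch of what such an automated run could look like, written as a small Python wrapper around the `restic` command line. The bucket name, folder paths and file locations are placeholders, and the sketch assumes the repository has already been initialised with `restic init` and that a service account key file for the bucket exists.

```python
import os
import subprocess


def build_backup_command(repository: str, folders: list[str]) -> list[str]:
    """Assemble the restic call that backs up the given folders."""
    return ["restic", "--repo", repository, "backup", *folders]


def run_backup(
    repository: str,
    folders: list[str],
    password_file: str,
    project_id: str,
    credentials_file: str,
) -> None:
    """Run one incremental backup; raises CalledProcessError on failure."""
    env = dict(
        os.environ,
        RESTIC_PASSWORD_FILE=password_file,               # repository password
        GOOGLE_PROJECT_ID=project_id,                     # GCP project of the bucket
        GOOGLE_APPLICATION_CREDENTIALS=credentials_file,  # service account key
    )
    subprocess.run(build_backup_command(repository, folders), env=env, check=True)


# Placeholder values for illustration only.
print(build_backup_command("gs:my-backup-bucket:/", ["/tank/photos", "/tank/documents"]))
```

With a call to `run_backup(...)` added at the bottom and the script saved under a hypothetical path such as /opt/backup/offsite_backup.py, a crontab entry like `0 3 * * * /usr/bin/python3 /opt/backup/offsite_backup.py` would run the backup every night at 03:00.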
7. What next
At this point in time my storage and backup concept has proven to be simple and low effort to handle while fulfilling the 3-2-1 rule in almost every case. The costs are negligible, as it uses existing hardware and the running costs for cloud storage are very low anyway.
I hope that at some point I'll be able to embed Proton Drive (which is currently in beta) into my storage concept, as I like the concept of zero access and full end-to-end encryption very much. That will likely depend on how good the Linux server integration turns out to be.