I'm planning on burning time up front on building a full failure-recovery solution: records are snapshotted at least daily, any single node/service can die entirely, and for each permutation there is either an exact, tested manual recovery checklist or an automatic failover option in place.
This runs counter to the more cavalier "release early, polish later" advice I keep seeing. Maybe I am doubly freaked out because the things I'm storing cannot easily be recovered or re-imported by the users themselves, or regenerated by any kind of algorithm or redo.
I want not only to have a backup scheme but also to make sure it's restore-tested. Maybe I wasn't totally clear: I'm not planning on a beautiful failover in every place at the beginning (though I am planning failover for the DB at least), just a tested restore procedure, even if manual, for each situation.