s3fs filesystems are really slow. We measured around 10 MB/s for file uploads. Where it really struggles is when you have a lot of files in a folder. Try doing an 'ls' on a folder with hundreds of files to see it break.
As always, it depends on your use case. Just because it can be slow doesn't mean it's not a viable (and in some cases superior) option.
We use it to store petabytes of large video files, and our system is structured so that no folder ever holds more than a handful of files (more than 20 is rare). With properly tuned caching this works fantastically well for our use case, and I would take the simpler code and reduced points of failure over NFS nonsense any day.
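As an illustration of the kind of cache tuning I mean, here is a hypothetical s3fs-fuse mount; the bucket name and paths are placeholders, not our actual setup. `use_cache` gives a local on-disk object cache, `ensure_diskfree` (in MB) keeps the cache disk from filling up, and `stat_cache_expire` (in seconds) plus `enable_noobj_cache` cut down on repeated metadata requests to S3.

```shell
# Hypothetical s3fs-fuse mount with cache tuning (placeholder bucket/paths).
s3fs my-video-bucket /mnt/videos \
  -o use_cache=/var/cache/s3fs,ensure_diskfree=10240 \
  -o stat_cache_expire=900,enable_noobj_cache
```

With few files per directory, the stat cache stays hot and directory listings stay cheap, which is exactly why this layout works for us.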
That of course doesn't mean s3fs is the solution to every problem; it simply means it's good to have options, and you shouldn't write something off because it "might be slow."
Know your data, know your use case, and know your tools. You can make smart decisions on your own rather than being driven by anecdotal comments on HN.
I agree with your points. It is indeed a viable, if somewhat clunky solution.
For getting the data into S3, we found dramatic improvements using the AWS CLI, as I believe it splits large files into multipart uploads and transfers the parts in parallel.
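A sketch of that upload path, assuming a placeholder bucket: the CLI's S3 transfer settings let you raise the parallelism explicitly (`max_concurrent_requests` defaults to 10) and control the multipart chunk size.

```shell
# Tune the AWS CLI's parallel transfer settings, then upload recursively.
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB
aws s3 cp ./assets s3://my-bucket/assets --recursive
```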
s3fs turned out to be viable for our use case: storing Magento Enterprise content assets that are then served directly from S3. The app's upload features rely on s3fs, as do the app's own file checks (which are indeed quite slow).
I've always wanted to do this natively, either by mounting EBS volumes on more than one instance (which is not currently possible) or with a native NFS service like the one AWS just released.
All in all, it is a happy day for me. More options make us more powerful.
The FUSE-based ones that I've tried were riddled with problems and poor error handling. Hangs and truncated files were the rule rather than the exception.
s3fs-fuse has its share of problems, but master has fixes for some of the error handling and truncated files issues. Please report any bugs you encounter on GitHub!
> Q: What data consistency model does Amazon S3 employ?
> Amazon S3 buckets in the US Standard region provide eventual consistency. Amazon S3 buckets in all other regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
I'd be very interested to know what kind of consistency guarantees EFS provides. The history of NFS is plagued by syscalls whose docs have a variation of the phrase "this operation is atomic (except on NFS)".
> Finally, note that, for NFS version 3 protocol requests, a subsequent commit request from the NFS client at file close time, or at fsync() time, will force the server to write any previously unwritten data/metadata to the disk, and the server will not reply to the client until this has been completed, as long as sync behavior is followed. If async is used, the commit is essentially a no-op, since the server once again lies to the client, telling the client that the data has been sent to stable storage. This again exposes the client and server to data corruption, since cached data may be discarded on the client due to its belief that the server now has the data maintained in stable storage.
I am not certain how this works in NFSv4, which is what EFS will be. The safe solution is to use the sync option when mounting the NFS volume, at the cost of performance.
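For concreteness, a mount along those lines might look like the following; the EFS hostname and export path are placeholders. `sync` forces writes through to the server before write() returns rather than trusting async commits, and `hard` makes the client retry indefinitely instead of silently erroring.

```shell
# Safer (slower) NFSv4 mount with synchronous writes; hostname is a placeholder.
sudo mount -t nfs4 -o sync,hard,vers=4.1 \
  fs-example.efs.us-east-1.amazonaws.com:/ /mnt/efs
```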