
May I ask why? My experience with NFS is pretty bad, performance-wise.


Not sure why NFS gets such a bad rap. On a low-latency network, properly tuned NFS has very few, if any, performance issues.

I've personally seen read/write rates exceeding 800 MByte/sec on more or less white-box hardware, at which point it was limited by the underlying storage infrastructure (8 Gbit fiber), not the NFS protocol.

Dell has a 2013 white paper (I'm not affiliated with them, fwiw) about their fairly white-box setup that achieved > 100,000 IOPS, 2.5 GByte/s sequential write, and 3.5 GByte/s sequential read: http://en.community.dell.com/techcenter/high-performance-com...


(me: Working in the Messaging Business for ISPs)

Not sure how it would ever be technically possible for a networked filesystem to get even near directly attached storage. But, for sure, the typical carrier-grade EMC or NetApp is MUCH slower than a good SAN. I'm talking about petabytes of very small (average maybe 20kB) files with lots of _random_ sync writes and reads. NFS has a lot of other benefits, but it surely is not super high performance in every use case. Regardless of what a theoretical marketing whitepaper has shown in some lab setup.

Someone who thinks that you can put a network protocol around a filesystem without _any_ performance impact is nuts.

BUT if your use case fits NFS, you might well get very good performance out of it. As always, pick the right technology for your specific case.


Petabytes of 20k files?

I think you might want to use your filesystem more effectively.


Well, how would that help in terms of NFS? You'd still have to tell NFS to read 20kB; whether it's 20kB out of one big file or a single 20kB file doesn't matter much. It's common to have one file per email, and the usual filesystem has no problem with that.


Is it really that good?

My only test case has been a VMware virtual machine, mounting an NFS share from the host so I could work on my local filesystem and execute within the VM. I switched to a filesystem watcher + rsync combo after struggling with poor random read performance. Maybe it was due to bad configuration, but I always thought it would be a poor choice for anything serious.


That might be an issue with scheduling of the virtual kernel and the host kernel.

I've found nfs to be much faster and more reliable than sshfs or smbfs for VMs, using either qemu-kvm on linux or virtualbox on OS X.


That very much depends on what you're using and how you tune things. With NFSv4.1, you can use parallel NFS (pNFS), which essentially stripes reads and writes over multiple NFS servers.

http://www.pnfs.com/

If you're using modern servers and clients, it is as fast as you can imagine a cluster of SSD-backed NFS servers to be.
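As a sketch of what this looks like from a Linux client (server name and paths here are hypothetical), a pNFS mount is just an NFSv4.1 mount; if the server offers layouts, the client negotiates them automatically:

```shell
# Mount with NFSv4.1 so the client can use pNFS layouts when the
# server supports them. Server and paths are made up for illustration.
mount -t nfs -o vers=4.1 mds.example.com:/export /mnt/pnfs

# Show the mounted NFS filesystems and the negotiated options/version:
nfsstat -m
```

No application changes are needed; striping across data servers happens below the filesystem interface.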


Depending on how you tune it, it can be a monster. Several years ago I was managing a cluster with ~5K linux instances all mounted to ~4PB of spinning disk served with NFS. Worked very well.


Can you speak to any stale NFS handle problems?

I've used NFS at home and have had NFS file handle problems but IIRC that was only when there were problems like kernel faults or network partitions.

However several of my colleagues at work have many NFS horror stories and are adamant that NFS does not scale well.

Is NFS stability at scale simply a function of your underlying network and infrastructure stability in your opinion?


Unlike typical local Unix file systems, NFS does not support "delete on last close" semantics.

Ordinarily, even if you unlink a file, the operating system keeps the inode around until the last filehandle referencing it goes away. But an NFS mount cannot know when all filehandles on all networked systems have closed. When you attempt to read from an NFS file handle whose underlying file has been deleted out from under you, BOOM -- `ESTALE`.
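The local-filesystem behavior being described can be demonstrated in a few lines of shell; on NFS, where the server can't see remote opens, the same pattern is what risks `ESTALE`:

```shell
# "Delete on last close" on a local Unix filesystem: an unlinked file
# stays readable through an already-open descriptor until the last
# descriptor referencing it is closed.
tmp=$(mktemp)
echo "still here" > "$tmp"
exec 3< "$tmp"        # hold an open read descriptor on the file
rm "$tmp"             # unlink; the inode survives while fd 3 is open
cat <&3               # prints: still here
exec 3<&-             # last close: now the inode is actually freed
```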

The solution is typically to guard against file deletion using read locks... which are extremely annoying to implement on NFS because of portability issues and cache coherency problems.

I'm not sure I'd describe that as a "scaling problem" per se, because it gets bad quickly and stays bad. It's more of a severe limitation on how applications and libraries can design their interaction with the file system.


Doesn't NFS rename deleted files to some temp name so other clients can still read and write to it using their existing file handle?


Some implementations of NFS use "silly rename" to get delete-on-last-close semantics: http://nfs.sourceforge.net/#faq_d2

I think that's limited to v2/v3 and not fully general or reliable.


Having a low latency network is key (e.g. LAN across an office or maybe a city should be fine, but WAN across country would be bad news).

Hard-coding the fsids on the server side can help if you change your exports a lot.

Using the automounter can help, although it does not scale to large numbers of mounts well.
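To illustrate the fsid point (paths and client ranges below are made up), pinning explicit `fsid=` values in /etc/exports means that adding or reordering exports later doesn't change the filehandles clients already hold:

```shell
# Append illustrative entries to /etc/exports with pinned fsid values,
# then re-export without restarting the NFS server.
cat >> /etc/exports <<'EOF'
/srv/nfs/home  192.168.0.0/24(rw,sync,fsid=1)
/srv/nfs/data  192.168.0.0/24(rw,sync,fsid=2)
EOF
exportfs -ra
```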


It very much depends on your workload, particularly with NFSv3 and earlier. We were able to reliably handle multiple gigabit streams as far back as 2005, but that was writing to huge files (backing up a ~2-3Gbps data acquisition system being processed by 4 Mac or Linux clients).

Small files were much worse because they require a server round-trip every time something calls stat(), unless you know that all of the software in use reliably follows Maildir-style practices to avoid contention. That meant that e.g. /var/mail could be mounted with the various attribute-cache values (see acregmin / acdirmin in http://linux.die.net/man/5/nfs), but general-purpose volumes had to be safe and slow.
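As a hedged sketch of that kind of mount (server, path, and the 60-second value are illustrative, not tuned advice), lengthening the attribute cache cuts stat() round-trips on volumes where slightly stale attributes are acceptable:

```shell
# actimeo=60 sets all four attribute-cache timeouts (acregmin/acregmax
# for files, acdirmin/acdirmax for directories) to 60 seconds at once.
# Suitable for Maildir-style data where clients never rewrite files in
# place; see nfs(5) for the individual knobs.
mount -t nfs -o actimeo=60 mail.example.com:/var/mail /var/mail
```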

If you read through the somewhat ponderous NFSv4 docs, there are a number of design decisions which are clearly aimed at making that use-case less painful. I haven't done benchmarks in years but I'd assume it's improved significantly.


Yeah. I'm wondering how you'd back, say, a PostgreSQL instance over NFS? It's just different needs that require different solutions.


You run PostgreSQL or any other DB server with its DB data dir on the NFS mount.

Oracle supports this, and they even wrote a user-space NFS client to "get the highest level of performance" (because they thought the kernel NFS implementation sucked).

The important bit is to ensure the NFS client and server implementation handle whatever POSIX features are required by the DB server.
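As a sketch of what that looks like in practice (server and paths are hypothetical; check your DB vendor's documentation for the options it actually requires), the commonly cited ingredients are a `hard` mount so I/O retries rather than surfacing errors the database can't handle, and NFSv4 for its locking semantics:

```shell
# Illustrative mount for a database data directory on NFS.
# 'hard' = retry indefinitely instead of returning EIO to the DB;
# vers=4.1 = NFSv4.1 with integrated, stateful locking.
mount -t nfs -o vers=4.1,hard,noatime \
    nfs.example.com:/pgdata /var/lib/postgresql
```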


Why would you want to? You can't share across two instances at the same time anyway, it's going to be slower and more edge-case-y, and the cost with Amazon is higher.


Is anyone suggesting that? It is widely considered best practice to back your databases with SSDs.


Well, he's pointing out that EFS is not a replacement for EBS. You can run Postgres on EBS volumes with PIOPS (provisioned IOPS).


If EFS offers PIOPS, couldn't you run Postgres on it, too?


EFS is backed by SSDs.


Yeah, if you need a performant file system you use an instance with a lot of local storage or you get out of EC2.



