Not sure why NFS gets such a bad rap. On a low-latency network, properly tuned NFS has very few, if any, performance issues.
I've personally seen read/write rates exceeding 800MByte/sec on more or less white-box hardware, at which point it was limited by the underlying storage infrastructure (8Gbit fiber), not the NFS protocols.
Dell has a 2013 white paper (I'm not affiliated with them, fwiw) about their fairly white-box setup that achieved over 100,000 IOPS, 2.5 GByte/s sequential write, and 3.5 GByte/s sequential read:
http://en.community.dell.com/techcenter/high-performance-com...
Not sure how it would ever be technically possible for a networked filesystem to come even close to directly attached storage.
But, for sure, the typical carrier-grade EMC or NetApp is MUCH slower than a good SAN. I'm talking about petabytes of very small (average maybe 20kB) files with lots of _random_ sync writes and reads. NFS has a lot of other benefits, but it surely is not super high performance in every use case, regardless of what a theoretical marketing white paper has shown in some lab setup.
Someone who thinks that you can put a network protocol around a filesystem without _any_ performance impact is nuts.
BUT if your use case fits NFS, you may well get very good performance out of it. As always, pick the right technology for your specific case.
Well, how would that help in terms of NFS? You'd still have to tell NFS to read 20kB. Whether it's 20kB from one big file or one 20kB file doesn't matter much. It's common to have one file per email, and the usual filesystem has no problem with that.
My only test case has been a VMware virtual machine, mounting an NFS share from the host so I could work on my local filesystem and execute within the VM. I switched to a filesystem watcher + rsync combo after struggling with poor random read performance. Maybe it was due to bad configuration, but I always thought it would be a poor choice for anything serious.
That very much depends on what you're using and how you tune things. With NFSv4.1 you can use parallel NFS (pNFS), which essentially stripes reads and writes across multiple NFS servers.
Depending on how you tune it, it can be a monster. Several years ago I was managing a cluster of ~5K Linux instances all mounting ~4PB of spinning disk served over NFS. Worked very well.
Unlike typical local Unix file systems, NFS does not support "delete on last close" semantics.
Ordinarily, even if you unlink a file, the operating system keeps the inode around until the last filehandle referencing it goes away. But an NFS mount cannot know when all filehandles on all networked systems have closed. When you attempt to read from an NFS file handle whose underlying file has been deleted out from under you, BOOM -- `ESTALE`.
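A rough sketch of the idiom in Python (the mount path is made up) -- it works fine on a local filesystem but blows up on NFS once another client removes the file:

    import errno
    import os

    # Hypothetical NFS path, just for illustration.
    path = "/mnt/nfs/shared/scratch.dat"

    fd = os.open(path, os.O_RDONLY)
    # ... meanwhile, some other NFS client unlinks the file ...
    try:
        data = os.read(fd, 4096)  # local FS: fine, the inode lives until the last close
    except OSError as e:
        if e.errno == errno.ESTALE:
            # NFS: the server has forgotten this filehandle
            print("stale NFS file handle -- file deleted out from under us")
        else:
            raise
    finally:
        os.close(fd)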
The solution is typically to guard against file deletion using read locks... which are extremely annoying to implement on NFS because of portability issues and cache coherency problems.
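For what it's worth, a minimal sketch of that guard (Python again, path made up) -- a shared advisory lock that a cooperating deleter would have to respect before unlinking:

    import fcntl
    import os

    path = "/mnt/nfs/shared/scratch.dat"  # hypothetical

    fd = os.open(path, os.O_RDONLY)
    try:
        # Advisory shared (read) lock; on NFSv3 this goes through lockd/NLM,
        # on NFSv4 locking is part of the protocol itself.
        fcntl.lockf(fd, fcntl.LOCK_SH)
        data = os.read(fd, 4096)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)

Of course it only helps if the deleter takes an exclusive lock first, which is exactly the coordination that's painful to get right across NFS implementations.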
I'm not sure I'd describe that as a "scaling problem" per se, because it gets bad quickly and stays bad. It's more of a severe limitation on how applications and libraries can design their interaction with the file system.
It very much depends on your workload, particularly with NFSv3 and earlier. We were reliably handling multiple gigabit streams as far back as 2005, but that was writing to huge files (backing up a ~2-3Gbps data acquisition system being processed by 4 Mac or Linux clients).
Small files were much worse because they require a server round trip every time something calls stat(), unless you know that all of the software in use reliably follows Maildir-style practices to avoid contention. That meant that e.g. /var/mail could be mounted with the attribute-cache values turned up (see acregmin / acdirmin in http://linux.die.net/man/5/nfs), but general-purpose volumes had to stay safe and slow.
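To make the round-trip cost concrete, here's a trivial illustration (Python, hypothetical mount) -- every uncached stat() is a GETATTR over the wire, so with tens of thousands of small files the wall-clock time is dominated by network latency rather than disk:

    import os
    import time

    mailbox = "/mnt/nfs/var/mail"  # assumed NFS mount

    start = time.time()
    names = os.listdir(mailbox)
    for name in names:
        # One GETATTR round trip per file if attributes aren't cached.
        os.stat(os.path.join(mailbox, name))
    print("stat()ed %d files in %.2fs" % (len(names), time.time() - start))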
If you read through the somewhat ponderous NFSv4 docs, there are a number of design decisions which are clearly aimed at making that use-case less painful. I haven't done benchmarks in years but I'd assume it's improved significantly.
You run PostgreSQL or any other DB server with its DB data dir on the NFS mount.
Oracle supports this -- they even wrote a user-space NFS client to "get the highest level of performance" (because they thought the kernel NFS implementation sucked).
The important bit is to ensure the NFS client and server implementations handle whatever POSIX features the DB server requires.
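If you go that route, it's worth sanity-checking the mount first. A quick sketch (Python, paths are assumptions) that exercises two things a DB server typically leans on -- POSIX byte-range locking and a durable fsync():

    import fcntl
    import os

    datadir = "/mnt/nfs/pgdata"  # hypothetical NFS-mounted data directory
    probe = os.path.join(datadir, ".nfs_probe")

    fd = os.open(probe, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fails if NFS locking is broken
        os.write(fd, b"probe")
        os.fsync(fd)  # should not return until the server has it on stable storage
        print("locking and fsync look OK")
    finally:
        os.close(fd)
        os.unlink(probe)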
Why would you want to? You can't share across two instances at the same time anyway, it's going to be slower/more edge case-y, and the cost with Amazon is higher?