It's useful if you want to log to a centralized logging server. It puts all your logs in one place, and it also keeps the logs safe if someone breaks into your server.
That's a pretty weak argument considering that syslog is entirely UDP and is bound to drop log data, sometimes en masse, and most likely silently. Not a good idea.
Why not use something like multilog or svlogd and wire up a tiny processor for it to kick logging data over someplace using something like rsync?
To boot, syslog is annoying to tune, depending on your particular implementation. rsyslog has a default buffer limit of 2k, whereas other syslog implementations (IIRC, syslog-ng and Solaris syslog at the very least) have default buffer limits of 1k, and this might not be obvious until you're running up against that and make the shocking discovery that you're losing data.
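For rsyslog specifically, that limit is controlled by the `$MaxMessageSize` directive. A sketch of raising it (the value and placement here are assumptions; check your version's docs):

```
# rsyslog.conf -- $MaxMessageSize must appear before any input
# modules are loaded, or those inputs keep the old limit.
$MaxMessageSize 8k
```

Anything longer than the limit is truncated, not rejected, which is exactly why the loss is easy to miss.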
On an nginx server that services 2TB/mo worth of transit (which is distinctly possible since I've got infrastructure in production that does this), there's a good chance that you'll be stretching some of these limits a bit.
> That's a pretty weak argument considering that syslog is entirely UDP and is bound to drop log data, sometimes en masse, most likely even silently. Not a good idea.
rsyslog and syslog-ng have support for TCP.
> Why not use something like multilog or svlogd
Additional point of failure.
> and wire up a tiny processor for it to kick logging data over someplace using something like rsync?
Additional point of failure (processor); additional point of failure (rsync/ssh); non-realtime log replication (which is bad for breaking/progressive system failure/etc).
> To boot, syslog is annoying to tune, depending on your particular implementation.
All of the examples you list are easier to learn about, tweak, and monitor than the suggestions you've proposed, however.
> On an nginx server that services 2TB/mo worth of transit (which is distinctly possible since I've got infrastructure in production that does this), there's a good chance that you'll be stretching some of these limits a bit.
If you're dealing with 2 TB/mo in transit, you're probably capable enough to understand the risks with centralized log management and mitigate/monitor them ahead of time.
In addition to what danudey says, modern sysloggers also support Unix domain sockets, which are reliable. Typically this is /dev/log; on Linux, I believe the GNU C Library's syslog() uses this by default.
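The mechanism is just an AF_UNIX datagram socket bound to a filesystem path. A minimal sketch, using a temp path as a stand-in for /dev/log (creating the real one needs root), with the client side mimicking what glibc's syslog(3) sends:

```python
import os
import socket
import tempfile

# A temp path standing in for /dev/log (hypothetical; the real path
# is owned by the system syslog daemon).
sock_path = os.path.join(tempfile.mkdtemp(), "log")

# The "syslogd" end: bind a Unix datagram socket at that path.
daemon = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
daemon.bind(sock_path)

# The client end: send a syslog-formatted record, as syslog(3) would.
client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
client.sendto(b"<13>myapp: hello over a local socket", sock_path)

record = daemon.recv(2048)
print(record.decode())  # -> <13>myapp: hello over a local socket
```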
On Linux, sending UDP to localhost is very reliable and fast, essentially going through kernel buffers with very little overhead. You will only see dropped data if the system is extremely overloaded. I did some testing, a few years back, and was not able to induce packet loss on localhost.
The usual way to set up centralized logging with syslog is to have each node run a local syslog daemon (eg., RSyslog), which then buffers the data and streams it to a central syslog daemon using a more reliable protocol such as RELP [1] over TCP.
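The node-side forwarding for that setup, sketched in rsyslog's newer config syntax (module and parameter names per the omrelp docs; hostname and port are placeholders):

```
# node-side rsyslog.conf: queue locally, stream to the central
# server over RELP so records are acknowledged, not fire-and-forget.
module(load="omrelp")
action(type="omrelp"
       target="logs.example.com"
       port="2514"
       queue.type="LinkedList"        # in-memory queue absorbs spikes
       queue.filename="relp_fwd"      # spill the queue to disk if it grows
       action.resumeRetryCount="-1")  # retry forever if the server is down
```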
While I agree with you on the whole, one minor point about UDP/datagrams: they are not reliable even on localhost under some circumstances. The point of datagrams is that they are allowed to be lost without a trace if the consumer (syslog) is not consuming fast enough. For example, if process A starts spewing 10,000 log records (UDP or Unix datagram socket packets) a second at syslog, and syslog can only handle 5,000, then the other 5,000 records will be lost. Any other process logging at the same time will lose records too, since nothing guarantees its datagrams get processed. The rate of loss is controlled by how large a datagram buffer the consumer's kernel keeps. Moreover, the loss will not be uniform: the buffer is FIFO, so older records will be processed while newer ones will be lost.
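This is easy to reproduce. A hypothetical demo of silent loss on localhost, assuming Linux semantics (sendto() reports success and the kernel silently drops whatever no longer fits in the receiver's buffer): a "syslogd" that never reads, and a producer blasting datagrams at it.

```python
import socket

# The "syslogd" end: bound but never reading, with a small receive buffer.
consumer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
consumer.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)
consumer.bind(("127.0.0.1", 0))  # kernel picks a free port
addr = consumer.getsockname()

producer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
attempts = 10_000
for i in range(attempts):
    # No error, no feedback: the producer has no idea anything was lost.
    producer.sendto(b"<13>record %d" % i, addr)

# Drain whatever actually survived in the kernel buffer.
consumer.setblocking(False)
received = 0
try:
    while True:
        consumer.recv(2048)
        received += 1
except BlockingIOError:
    pass

print(f"sent {attempts}, delivered {received}, silently lost {attempts - received}")
```

On a typical Linux box only the first few hundred records survive; everything after the buffer fills vanishes without any error on either side.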
On the other hand if you use stream sockets, the producer will either block or be told that the consumer is not ready to read any more data (beauty of TCP). In either case, TCP produces enough overhead compared to UDP to slow down the actual useful part of your application, which is often not desirable.
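The stream-socket counterpart, as a sketch: when the consumer stops reading, the kernel buffers fill and the producer is blocked (or, in non-blocking mode, told to back off) rather than having its data discarded.

```python
import socket

# A connected stream pair: "consumer" never reads, "producer" writes
# until the kernel buffers on both ends are full.
consumer, producer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
producer.setblocking(False)

written = 0
try:
    while True:
        written += producer.send(b"x" * 4096)
except BlockingIOError:
    # Buffers are full and the consumer hasn't read a byte:
    # backpressure, not silent loss.
    pass

print(f"wrote {written} bytes before hitting backpressure")
```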
Neither one of these is a good solution, since either your consumer or your producer needs to keep a very large buffer of its own to accommodate spikes in traffic. Ideally, you do this anyway to ensure that you hold onto every packet you receive.
Having said that, I don't know exactly what rsyslog does so I cannot say if this would actually be a problem for it.