It's useful if you want to log to a centralized logging server. It helps to have all your logs in one place, and it also keeps the logs safe if someone breaks into your server.
That's a pretty weak argument considering that syslog is entirely UDP and is bound to drop log data, sometimes en masse, most likely even silently. Not a good idea.
Why not use something like multilog or svlogd and wire up a tiny processor for it to kick logging data over someplace using something like rsync?
To boot, syslog is annoying to tune, depending on your particular implementation. rsyslog has a default buffer limit of 2k, whereas other syslog implementations (IIRC, syslog-ng and Solaris syslog at the very least) have default buffer limits of 1k, and this might not be obvious until you run up against it and make the shocking discovery that you're losing data.
On an nginx server that services 2TB/mo worth of transit (which is distinctly possible since I've got infrastructure in production that does this), there's a good chance that you'll be stretching some of these limits a bit.
> That's a pretty weak argument considering that syslog is entirely UDP and is bound to drop log data, sometimes en masse, most likely even silently. Not a good idea.
rsyslog and syslog-ng have support for TCP.
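For example, a one-line rsyslog rule is enough to forward over TCP; the hostname here is hypothetical:

    # /etc/rsyslog.conf: "@@" forwards over TCP, a single "@" over UDP
    *.* @@logs.example.com:514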
> Why not use something like multilog or svlogd
Additional point of failure.
> and wire up a tiny processor for it to kick logging data over someplace using something like rsync?
Additional point of failure (processor); additional point of failure (rsync/ssh); non-realtime log replication (which is bad for breaking/progressive system failure/etc).
> To boot, syslog is annoying to tune, depending on your particular implementation.
All of the examples you list are easier to learn about, tweak, and monitor than the suggestions you've proposed, however.
> On an nginx server that services 2TB/mo worth of transit (which is distinctly possible since I've got infrastructure in production that does this), there's a good chance that you'll be stretching some of these limits a bit.
If you're dealing with 2 TB/mo in transit, you're probably capable enough to understand the risks with centralized log management and mitigate/monitor them ahead of time.
In addition to what danudey says, modern sysloggers also support Unix domain sockets, which are reliable. Typically this is /dev/log; on Linux, I believe the GNU C Library's syslog() uses this by default.
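As a quick illustration, Python's syslog module wraps glibc's syslog(3), so on Linux a sketch like this ends up writing to /dev/log (the ident string is made up):

    # a minimal sketch; on Linux, glibc routes this through /dev/log,
    # a Unix domain socket, not UDP
    import syslog

    syslog.openlog("myapp", syslog.LOG_PID, syslog.LOG_DAEMON)
    syslog.syslog(syslog.LOG_INFO, "hello from /dev/log")
    syslog.closelog()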
On Linux, sending UDP to localhost is very reliable and fast, essentially going through kernel buffers with very little overhead. You will only see dropped data if the system is extremely overloaded. I did some testing a few years back and was not able to induce packet loss on localhost.
The usual way to set up centralized logging with syslog is to have each node run a local syslog daemon (e.g., RSyslog), which then buffers the data and streams it to a central syslog daemon using a more reliable protocol such as RELP [1] over TCP.
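A sketch of what that looks like on a node, assuming rsyslog with the omrelp output module installed (the target name is hypothetical):

    # /etc/rsyslog.conf on each node: buffer locally, stream via RELP
    module(load="omrelp")
    *.* action(type="omrelp" target="logs.example.com" port="2514")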
While I agree with you on the whole, one minor point about UDP/datagrams: they are not reliable even on localhost under some circumstances. The point of datagrams is that they are allowed to be lost without a trace if the consumer (syslog) is not consuming fast enough. For example, if process A starts spewing 10,000 log records (UDP or datagram UNIX socket packets) a second at syslog, and syslog can only handle 5,000, then the other 5,000 records will be lost. Any other process will also lose records, since its packets aren't guaranteed to be processed either. The rate of loss is controlled by how large a datagram buffer the consumer's kernel keeps. Moreover, the loss will not be uniform: the buffer is FIFO, so older records already queued will be processed while newer arrivals are dropped.
On the other hand if you use stream sockets, the producer will either block or be told that the consumer is not ready to read any more data (beauty of TCP). In either case, TCP produces enough overhead compared to UDP to slow down the actual useful part of your application, which is often not desirable.
Neither one of these is a good solution, as either your consumer or your producer needs to keep its own very large buffers to accommodate spikes in traffic. Ideally you do this anyway, to ensure that you hold onto all the packets you received.
Having said that, I don't know exactly what rsyslog does so I cannot say if this would actually be a problem for it.
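For what it's worth, the drop behavior is easy to demonstrate on Linux with a sketch like this: the sender's sendto() keeps succeeding even while the kernel silently discards datagrams that don't fit in the unread receiver's buffer.

    import socket

    # receiver with a deliberately small buffer that never reads in time
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)
    addr = rx.getsockname()

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 100_000
    for i in range(sent):
        tx.sendto(str(i).encode(), addr)  # succeeds even when rx's queue is full

    # drain whatever actually made it into the buffer
    rx.setblocking(False)
    received = 0
    try:
        while True:
            rx.recv(64)
            received += 1
    except BlockingIOError:
        pass

    print(f"sent {sent}, received {received}, silently dropped {sent - received}")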
This only really helps people who have:
- A large enough deployment to want centralized logging
but are:
- Cheap enough not to buy nginx (for good reason or not)
and
- Too lazy to maintain a patchset against distro packages
and
- Too bad at linux administration to use the file pipe trick to log to syslog anyway
So, yeah, syslog is nice, but this change does have quite a narrow use case. What it did have were vocal complainers who knew the right places to complain online to be noticed.
This is kind of a cheap shot at people who just wanted built-in syslog support. There are plenty of reasons someone might not be able to patch or upgrade a binary, and using a fifo is a pretty bad kludge considering the whole thing can hang if they're started/stopped in the wrong order (meaning your startup scripts now have to be rewritten). Building in syslog support means just getting the damn thing working without special hacks and kludges.
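For reference, the fifo trick being replaced looks roughly like this (paths and tag are illustrative), and the ordering problem is visible in it: nginx's open() of the pipe blocks until a reader exists.

    # create the pipe and attach a reader BEFORE starting nginx
    mkfifo /var/log/nginx/access.pipe
    logger -t nginx -p local7.info < /var/log/nginx/access.pipe &
    # nginx.conf then points at the pipe:
    #   access_log /var/log/nginx/access.pipe;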
But to answer OP's question, syslog is better than just appending to a file because syslog does a lot of things for you, like filtering your logs in real time and splitting them into new files, logging remotely to industry-standard aggregation devices, access control, (somewhat) standardized formatting, log rolling, etc.
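The real-time filtering/splitting, for instance, is only a few lines of rsyslog configuration (file path illustrative):

    # split nginx messages into their own file and stop processing them further
    if $programname == 'nginx' then {
        action(type="omfile" file="/var/log/nginx/syslog.log")
        stop
    }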
The simplest answer I could think of is getting remote syslogging for free. This also makes it easier to process nginx logs in realtime without named pipe trickery. I'm pretty excited to clean up some log ingestion code.
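With the built-in support (nginx 1.7.1+), that's just a log target, no pipes involved; the server address and tag here are hypothetical:

    error_log  syslog:server=logs.example.com,facility=local7,tag=nginx error;
    access_log syslog:server=logs.example.com,facility=local7,tag=nginx,severity=info combined;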
Yes, I'm saying it's really about time they added it to the core, especially because it's been available as a patch for a long time.
I get that they need to differentiate their paid version, I just wish it was in areas where the community hadn't already provided something that works.
That's an implementation-defined limit in practice. rsyslog for example defaults to 2K (apparently to be compatible with upcoming RFCs), but it's configurable to higher values. If you want to be compatible with other software, though, it may need to stay at 1K.
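In rsyslog that's the $MaxMessageSize directive; note it has to appear near the top of rsyslog.conf, before any input modules are loaded, or it won't take effect:

    # raise rsyslog's per-message limit from the 2k default
    $MaxMessageSize 8k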
Have a look at this[0] blog post. Also, I'm just guessing, but once kdbus is in the kernel, direct calls to journald will (or could potentially) be more efficient.
It's UDP only. They also hard-code the port instead of looking it up with getservbyname, which is a little weird. If you want to use TCP, send syslog messages to localhost, and have a local rsyslog/RELP forwarder send messages remotely.
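That pattern is just the two earlier snippets glued together (addresses hypothetical): nginx speaks UDP only to the loopback, and the local daemon relays reliably.

    # nginx.conf: log over UDP to the local daemon only
    access_log syslog:server=127.0.0.1,tag=nginx combined;
    # rsyslog.conf: relay to the central server over TCP (or RELP, as above)
    *.* @@logs.example.com:514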
getservbyname under glibc uses dlopen (via NSS), effectively working around static compilation. While this is a non-issue on Ubuntu, the nginx devs are aware of the number of people building against uClibc or musl to produce static binaries for embedded use. I've seen nginx running on bare metal -- just it and libc, no kernel underneath. Kudos to them for making this easier than they have to.
Did anyone manage to set up remote logging using the buffer option? I get "parameter "buffer=32k" is not supported by syslog", but I'm pretty sure I'm doing something wrong.