Two months of feed reader behavior analysis

quectophoton · on Aug 4, 2024

How many of the "polling issues" can be explained by clients doing heuristic caching[1][2] due to a lack of a `Cache-Control` header in the responses?

If the feed was last updated (for example) 1 hour ago, and the response lacks a `Cache-Control` header, in that case the response would be cached for 6 minutes (10% of 1 hour; the 10% is mentioned in RFC 9111 section 4.2.2).

[1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#he...

[2]: https://datatracker.ietf.org/doc/html/rfc9111#name-calculati...

kkfx · on Aug 3, 2024

Personally I choose tt-RSS for a reason: human quickness going through posts. I can have all my feeds arranged, just a line per entry, scrolling mark them as read, opening one open a side panel without touching visually the current post and threads list, being on a desktop with a 21:9 screen it's a perfect mach. I've used Miniflux before but it's less quick skimming posts, opening one, come back to the compact list etc.

Before, but very before in the past, I've tried few desktops one, and the final choice was a Java/SWIG one RSSOwl, heavy but effective enough compared to others. I've also used for a period elfeed (Emacs) but it's too slow to read feeds for me, Emacs offer UI for focused read, while for a large amount of low importance posts I need something allowing a very quick pass even if something get lost.

A thing I miss in ALL feed readers I've tried so far is real-world filtering abilities like fuzzy-matching titles to reasonably show only one per kind of news (let's say you follow many newspapers and nearly all report the same news about an earthquake somewhere, there is no point in seeing let's say 12 posts on the same event), eventually offering a button to show all the matched if I want to go through them. Another is historical analysis, let's say every years there are wildfires around the world, I'd like to see from news I've read if they are almost the same of the last year or more, if they start to appear earlier and last longer etc, it's still fuzzy keywords matching, nothing so hard, but still absent in all, I imaging I'm one of the very few interested in such automation to use feeds as a personal aggregator. Gnus with scoring offer something similar, but well, it's too slow to really skim things, and easy to break as well.

trekz · on Aug 3, 2024

Not surprised to see Reeder in there. It’s a great app for Apple users. But that app can bring a website to its knees with how aggressive it is.

I can see in my logs that it constantly makes over ~20 requests to different RSS feeds on my domain, all in the exact same millisecond. Happens multiple times a day. And it appears to rotate IPs. Scary… Tried reaching out to the developer about it twice, but they never responded.

latexr · on Aug 4, 2024

> Not surprised to see Reeder in there. It’s a great app for Apple users.

I agree. Until you find a bug or have a feature request.

> Tried reaching out to the developer about it twice, but they never responded.

And this is exactly why. The developer is the most unresponsive I’ve ever seen. I don’t know why they bother with a “Support / Feedback / Contact” form on the website. And it’s not just you or me, I’ve seen the same commentary from other people.

So if you want to use the app, you better like it as it is. Especially since the developer is working on something else which overlaps in functionality, so I doubt Reeder will get much love going forward. It’s a shame, because it’s the best feed reader I’ve tried, and its small annoyances could be easily solved.

radicality · on Aug 3, 2024

It also queries from external servers? I was under the impression it’s all from the IP of the users themselves. I have Reeder on iOS, and all the feed storage set to iCloud, and afaik whenever I open the app and it’s syncing, I imagine it’s going via whatever network I’m currently connected to.

rcarmo · on Aug 4, 2024

That's probably Apple's Private Relay feature

latexr · on Aug 4, 2024

iCloud Private Relay only affects Safari.

https://support.apple.com/en-us/102602

rcarmo · on Aug 4, 2024

That's not what it says on that article. According to their PDF, Private Relay also covers apps:

https://www.apple.com/privacy/docs/iCloud_Private_Relay_Over...

latexr · on Aug 5, 2024

> That's not what it says on that article.

I don’t see any mention of anything else but Safari on the page.

> According to their PDF, Private Relay also covers apps

Only if the app’s traffic is unencrypted, which is a an important caveat. In practice, I doubt that affects many.

Still, thank you for the correction. I was under the impression there was another small case in addition to Safari but wasn’t finding it so thought I misremembered.

And it is relevant in this case since it is plausible someone added a non-HTTPS feed URL as a feed and never updated it.

aquova · on Aug 3, 2024

Can someone explain how to parse this article? There's a number of similar entries for each client. I can't tell how to interpret these complaints.

slyall · on Aug 4, 2024

I assume they are reading down a list of unique clients they have identified. Pretend their is a hidden column containing IP addresses.

Some clients have the multiple entries because multiple people/companies will have installed the same software.

jerlam · on Aug 3, 2024

I'm disappointed in Feedly here, as I've been on their platform since Google Reader was shut down - over ten years. And their script is still at 1.0.

Rachel's complaint about Feedly's overzealous polling also contrasts the experience of author John Scalzi, where his blog was simply getting ignored by Feedly:

https://whatever.scalzi.com/2024/08/02/the-feedly-issue-appa...

8organicbits · on Aug 4, 2024

Does anyone have experience with WebSub? It's designed to solve the frequent polling problem while still giving readers immediate access to content.

https://en.m.wikipedia.org/wiki/WebSub

JustARandomGuy · on Aug 3, 2024

Unread RSS Reader. Godawful poll timing. 6103 requests in 52 days is about one poll every 736 seconds _on average_, but they're hugely spread out. WTF? Put it this way: the list of unique intervals (nn seconds, nn minutes, ...) is four pages tall on my web browser.

Not entirely sure what the criticism here is other than polling on average every 12 minutes seems a little excessive at best. Why does in matter it the intervals are a bit wonky? I could think of many reasons why: maybe the poll intervals are smaller during the daytime and more spread out over the night to optimize for reading conditions, etc

quectophoton · on Aug 3, 2024

Last time I checked, she wasn't including `Cache-Control` header in the responses. That means a well-behaved client would be doing heuristic caching.

Quoting from https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#de...

> As mentioned above, the default behavior for caching (that is, for a response without Cache-Control) is not simply "don't cache" but implicit caching according to so-called "heuristic caching".

And then, quoting from https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#he...

> How long to reuse is up to the implementation, but the specification recommends about 10% [...] of the time after storing.

So assuming I'm reading this correctly, if she still hasn't added `Cache-Control` headers to the feed responses, and her last post was (for example) 24 hours ago, that means a well-behaved client will check the feed again in 2.4h from now, and then again 2.64h after that, and then 2.9h after that, and so on.

If the last feed update was 1 hour ago, and no `Cache-Control` header is present, a well-behaved client would cache the response for 6 minutes (10% of 1 hour).

Quoting the next paragraph (right after my last quote):

> Heuristic caching is a workaround that came before Cache-Control support became widely adopted, and basically all responses should explicitly specify a Cache-Control header.

So a possibility is that Unread RSS Reader might be detecting the lack of `Cache-Control` (which means Rachel is not bothering to follow caching recommendations), which then might be causing Unread RSS Reader to be doing heuristic caching as per the recommendations mentioned above, and the problem is just that Rachel doesn't like this heuristic caching but also doesn't want to include `Cache-Control` in her responses.

And if I'm understanding all this correctly, the `If-Modified-Since` and `If-None-Match` headers have nothing to do with request rate (what she's complaining about), they are only used to let the server decide if it should return a full response or if a 304 is enough.

trekz · on Aug 4, 2024

Yeah, relying too much on modified headers from requests to send the appropriate response isn't ideal, because request headers can be incorrect and largely unreliable.

Responding with a Cache-Control header with a max-age seems like a much more superior option for these cases.

butz · on Aug 3, 2024

Sad to see that so many feed readers are unable to solve polling problems. It should be in their interest to make least requests possible. Especially the ones hosting multiple tenants that are very likely subscribing to same feeds.

kibwen · on Aug 3, 2024

> rawdog/2.24rc1. Behavior is spot on. More like this, please.

I'd like to see a description of what the proper behavior is in this context. The OP uses terms like timing, pacing, conditionals, and unconditionals in a way that makes me think that these must be well-defined jargon in the context of RSS, but I don't see these in the RSS spec.

gnabgib · on Aug 3, 2024

See Rachel's past posts, there's been several on the topic (although the service/scoring doesn't seem to be linked anywhere).

[The feed reader score service is now online](http://rachelbythebay.com/w/2024/05/30/fs/)

8organicbits · on Aug 4, 2024

Looking at (a fork?) of the code,[1] I see a static polling interval.

Rawdog is a planet[2], which is fairly different from a normal feed reader.

[1] https://github.com/chirlu/rawdog/blob/75c742cb33fe968526af0c...

[2] https://indieweb.org/planet

impure · on Aug 3, 2024

Looks like I need to update my reader to use the If-Modified-Since and If-None-Match headers

Twirrim · on Aug 3, 2024

Disappointing to see feedly in the misbehaving bucket. It's an online service, you'd have expected to get this right

pipingdog · on Aug 3, 2024

It's important to consider that an online service's poor behavior is amortized over it's user base (or all of the users who subscribe via the service's local copy of a given feed), so it _could_ be worse, if all those users were fetching the feed themselves.

It's also the case that a service wants to ensure they have the freshest copy or impatient users could bail somewhere else, or just do it themselves.

But, there's certainly an opportunity for a service to perform analysis on feeds to see the rate at which they're likely to have more content, as well as take cues from metadata.

At the end of the day, RSS isn't a protocol, and feed providers are just as wild west as consumers.