Hello, Hynek!
Thanks for attrs, wonderful library!
I'd suggest marketing the "serious business aliases" more prominently. It's not about aesthetics but expectations: people like me are immediately puzzled by "attr.ib()", thinking "why ib?".
The reason is that after reading a lot of Python code, our brains are already trained to recognize attribute and method names after the dot.
Also, it's a reasonable expectation that you can import a function or submodule and the code still makes sense, but this won't:
    from attr import s, ib

    @s
    class Thing(object):
        x = ib()
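For contrast, here is roughly what the "serious business aliases" version looks like (a minimal sketch, assuming the attrs library is installed; `attr.attrs` and `attr.attrib` are the documented aliases for `attr.s` and `attr.ib`):

```python
import attr  # the attrs library

# attr.attrs / attr.attrib are aliases for attr.s / attr.ib,
# but read as ordinary names after the dot.
@attr.attrs
class Thing(object):
    x = attr.attrib()

t = Thing(42)
print(t.x)  # 42
```

The generated class is identical either way; only the spelling at the definition site changes.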
I understand it looks like a small thing to you and to others already used to this DSL, and that it's not the most important technical aspect of the library. However, I do think it's an important human aspect of it.
I have the feeling that making the "no-nonsense" alternative more prominent in the examples and documentation would reduce the cognitive load for new users, and maybe make your own life easier by not having to explain and paste this link every time someone finds it odd (which will probably keep happening).
Hello, thank you for the rare nice words in this thread!
The submission of this article to HN took me by surprise (it’s like two weeks old now), so it caught me off guard right before a new release, and there’s some inconsistency between the GitHub README and the docs on RTD.
However, as you said, it'd be a major change and it would affect the whole ecosystem (plugins and extensions), so it's complicated. We'll see what happens. :)
Well, missing data can happen from problems at several different levels:
1) site changes caused the scraped items to be incomplete (missing fields) -- for this, one approach is to use an Item Validation Pipeline in Scrapy, perhaps using a JSON schema or something similar, logging errors or rejecting an item if it doesn't pass validation.
2) site changes caused the scraping of the items itself to fail: one solution is to store the page sources and monitor the spider errors -- when there are errors, you can rescrape from the stored sources (it can get a bit expensive to store sources for big crawls). Scrapy doesn't have a complete solution for this out of the box; you have to build your own. You could use the HTTP cache mechanism and build a custom cache policy: http://doc.scrapy.org/en/latest/topics/downloader-middleware...
3) the site changed its navigation structure, so the pages to be scraped were never reached: this is the worst one. It's similar to the previous case, but it's one you want to detect earlier -- saving the sources doesn't help much, since it happens early in the crawl, so you want to be monitoring for it.
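The validation pipeline in point 1 could be sketched roughly like this (a minimal stand-alone sketch: the schema, field names, and the locally defined DropItem are all hypothetical -- in a real project you'd use scrapy.exceptions.DropItem and probably a real JSON-schema validator):

```python
# Stand-in for scrapy.exceptions.DropItem, so the sketch is self-contained.
class DropItem(Exception):
    pass

# Hypothetical "schema": required fields and their expected types.
ITEM_SCHEMA = {"title": str, "price": float, "url": str}

class ValidationPipeline(object):
    """Reject scraped items that are missing fields or have wrong types."""

    def process_item(self, item, spider=None):
        for field, expected_type in ITEM_SCHEMA.items():
            if field not in item:
                raise DropItem("missing field: %s" % field)
            if not isinstance(item[field], expected_type):
                raise DropItem("bad type for %s: %r" % (field, item[field]))
        return item  # item passed validation; hand it to the next pipeline
```

In an actual Scrapy project you would register such a class via the ITEM_PIPELINES setting and let Scrapy call process_item for every scraped item.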
One good practice is to split the crawl in two: one spider does the navigation and pushes the links of the pages to be scraped into a queue or something, and another spider reads the URLs from that queue and just scrapes the data.
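The two-phase split could be sketched like this (everything here is illustrative: the deque stands in for a real shared queue such as Redis or a message broker, and fetch() is a placeholder for the download-and-parse step):

```python
from collections import deque

# Stand-in for a shared queue; in production this would be Redis,
# a message broker, or a database table shared by the two spiders.
url_queue = deque()

def navigation_phase(item_links):
    """First spider: walk the site structure, only collecting item URLs."""
    for link in item_links:
        url_queue.append(link)

def scraping_phase(fetch):
    """Second spider: consume queued URLs and extract data from each page."""
    items = []
    while url_queue:
        url = url_queue.popleft()
        items.append(fetch(url))  # fetch() is a placeholder: download + parse
    return items
```

The benefit is that a navigation breakage (case 3) shows up as an empty or shrunken queue, which is easy to monitor independently of the scraping itself.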
Agreed, that can be truly challenging.
It's important to store the timestamp of when the page source was downloaded, together with the timezone if available.
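For example, with the standard library this can be as simple as attaching an aware UTC timestamp to each stored record (the record fields here are hypothetical):

```python
from datetime import datetime, timezone

# Record when the page source was downloaded, as a timezone-aware UTC
# timestamp; ISO 8601 keeps the offset (+00:00) together with the value.
downloaded_at = datetime.now(timezone.utc)
record = {
    "url": "http://example.com/page",  # hypothetical page
    "downloaded_at": downloaded_at.isoformat(),
}
```

Storing aware timestamps avoids ambiguity later when you compare crawls made from machines in different timezones.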
Well, it is meant to be very forgiving. Right now it outputs the same thing for both "1 week ago" and "1 weeks ago" (even though the latter is grammatically incorrect).
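This isn't the library's actual implementation, but a minimal sketch of how such forgiving parsing could work -- the plural "s" is simply made optional in the pattern (units and conversion factors here are simplified assumptions):

```python
import re
from datetime import timedelta

_DAYS_PER_UNIT = {"week": 7, "day": 1}  # simplified for the sketch

def parse_ago(text):
    """Accept "1 week ago" and "1 weeks ago" alike: the trailing
    plural 's' is optional in the regular expression."""
    m = re.match(r"(\d+)\s+(week|day)s?\s+ago$", text.strip())
    if not m:
        raise ValueError("unrecognized: %r" % text)
    count, unit = int(m.group(1)), m.group(2)
    return timedelta(days=count * _DAYS_PER_UNIT[unit])
```

Both the grammatical and the ungrammatical form map to the same timedelta, which matches the "very forgiving" behavior described above.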
Can you elaborate what you mean by "proper grammar for singular values"?
Cheers!