sighansen's comments

sighansen · 2025-10-03T15:00:51 1759503651

Really love litestream. Easy to use and never crashes on me. I still recommend using it as a systemd unit service. I'm not only using it as a backup tool but also to mirror databases. Looking forward to their read-replica feature.

sighansen · 2025-08-24T21:01:08 1756069268

As long as iceberg and delta lake won't support v2, adoption will be really hard. I'm working aot with parquet and wasn't even aware that there is a version 2.0.

lolive · 2025-08-24T21:24:47 1756070687

Why wouldn't they adopt the v2.0?

mr_toad · 2025-08-25T01:33:26 1756085606

Version 1 took about ten years before it became de rigueur. Version 2 is hot off the press.

lolive · 2025-08-25T15:46:07 1756136767

From my memories, when Unicode arrived [i.e ages ago], I bet 10$ it would never succeed . Now that it is reasonably supported everywhere [and I lost my 10$], I am more confident that sometimes good ideas eventually win.

#callMeOptimist

sighansen · on Feb 3, 2024

I don't understand why you wouldn't just use plain s3. There is no comparison in the readme and I would love to understand what the benefits are. Also I would have expected a comparison to maybe Apache Iceberg, but this might be more specialized for relational data lake data?

sighansen · on Jan 23, 2024

That is just plain untrue. They deport criminal migrants or migrants that are not under international asylum law. And we are talking about few thousands per year.

Throw84949 · on Jan 23, 2024

[flagged]

jansan · on Jan 23, 2024

> I bet those poor people are also disproportionately non-white!

Yes, simply based on the fact that only non-European citizens can be deported. But what is interesting is that they are also disproportionately non-east-asian.

Throw84949 · on Jan 23, 2024

Not necessarily. Many refugees are from Ukraine, they can freely move to Germany, work there and apply for social benefits. I think Germany really needs similar rules for other countries, particularly in Africa and Middle East!

That would prove they are not far right once for all!

sighansen · on Jan 14, 2024

This looks really helpful! I'm working a lot on graph databases and am wondering, if there are similar projects working with say neo4j. I guess because you don't have a schema, the complexity goes up.

jazzyjackson · on Jan 14, 2024

neo4j advertises such an integration on their landing page

https://neo4j.com/generativeai/

sighansen · on Dec 26, 2023

The commit messages are terrible. In my opinion, conventional commit messages [0] should be used for a clean commit history.

[0] https://www.conventionalcommits.org/en/v1.0.0/

ltbarcly3 · on Dec 26, 2023

I question the value of commit messages at all. Sure, at some level you need a summary of what a change is trying to do, but we have that at 5 levels now and they are completely redundant. Generally there is a ticket in some system for tracking changes, whether it's Jira or Github itself or some other system. Then you have a PR/MR that is attached to a branch which you are trying to have merged. Then there are the commit messages themselves. These are all completely redundant to each other, and nobody in their right mind should want all of these at the same time. It's too many places to look for the same exact information, there's no reason to maintain it in more than one place.

Some truly awful standard for formatting commit messages, how to do something that has at best dubious value to begin with, is a fantastic way to give the appearance of work without the need for skill or ability or spending time trying to get useful work done, a true boon to incompetents and hangers on. It's also a great way to snipe someone's amazing work and put yourself in a position to critique them with 1/1000th of the effort of accomplishing something useful.

etra0 · on Dec 26, 2023

I value the commit messages in the context where you develop a tool that has to be run widely.

In particular, I had experience with Wine. Having useful commit messages allows you to do bisects and trace down regressions with more ease than cross-checking messages with some external ticket system, and when you have a lot of people contributing to a project it's easier to see what they're doing when they try to do a patch.

I also believe though, that it is good practice to help your colleagues when they do need to find an issue in a project where a lot of different people can work on.

ltbarcly3 · on Dec 26, 2023

This would boil down to merge level messages. Not all projects squash commits, so what you actually care about here are the mr/pr level messages, which might be approximated by the commit message on the merge commit, but actually probably will just say merge {branchname}. When you bisect with parent=0 you only see merge commits or ff directly against main branch.

Putting what im saying another way, in a project with pull requests, commit messages are redundant with the text typed into the pr and the comments on same. We should just carbon copy those onto the merge commit and forget per commit messages.

wavemode · on Dec 26, 2023

I think Jira messages are generally written from a product perspective ("here's what we want") whereas PR's are written from an implementation perspective ("here's how we did it"). And then the PR description ought to just become the squashed commit message (at least that's how my current company does it).

cardanome · on Dec 26, 2023

Code tends to live longer than projects management tools like JIRA. The version history should always be understandable from itself without access to external software. Not to mention that JIRA tickets should be based on concrete user stories while commit messages describe implementation details, they are different layers of concern. As for individual commit messages, you need to so the poor soul reviewing your MR knows what the hell you are doing.

Seriously, please think of the poor soul having to maintain you legacy code when the JIRA is long gone or the external contractor who doesn't even get access to it in the first place.

ltbarcly3 · on Dec 26, 2023

Then tooling should be set up to maintain the same information in multiple systems. I'm not saying that its not important to remember what people were trying to do, I'm saying the original theory for commit messages is completely nonexistent in modern development. People don't make single self contained commits directly to main branch. Since everyone uses some kind of pull request as the unit of merge, just keep the information on the mr and copuly it to the merge request message.

theyinwhy · on Dec 26, 2023

Something more descriptive than "Grrr" would be nice I guess.

seba_dos1 · on Dec 26, 2023

They are really bad and I would feel bad if I presented something like that to someone for a review (not just messages - most of those commits shouldn't exist at all). That said, the MR in question is marked as a draft, so anything goes at that point.

Also, Conventional Commits are mostly pointless. Linux-style commit message conventions are enough.

Kwpolska · on Dec 26, 2023

CPython seems to use squash merges, which means only one commit will end up on the main branch after merging this PR. The history on branches is irrelevant and can be completely messy, full of merges and other experiments; the main branch has one commit per actual feature/change.

And eh, conventional commits seem like pointless bureaucracy to me.

hnfong · on Dec 26, 2023

With only +1,722 lines added, even if the commits were eventually squashed upon landing, I'd consider it good etiquette to tidy up changes to maybe a handful of logical commits instead of pushing 404 raw commits.

Or maybe it's another weird pun on 404 Not Found? I can't tell by now...

CapsAdmin · on Dec 26, 2023

The end result of doing this is good, but I find it really difficult to cleanly do this before I have something that's 100% complete.

I don't code linearly like "first I need feature A, then I code feature B which is needed for feature C, and so on"

It's usually a bit all over the place and it's not clear what depends on what until I start reaching the end.

So to do this properly I'd need to spend a day or two rewriting or making a new branch that cleanly adds everything in order. Hopefully in a way that doesn't leave master in a broken state when reverting tail commits.

In addition, when doing multiple pull requests for a single high level feature, you might get some comments about pull request "C" that would require changes in pull request "A"

cardanome · on Dec 26, 2023

How the hell is someone supposed to review your pull request if you don't take the time to clean it up?

I normally go through every single individual commit when reviewing something and find the commit messages extremely helpful to understand what some change is supposed to do.

Yes, cleaning up your commits takes some time butt I don't see an alternative if you don't work alone and want your code to stay maintainable.

Kwpolska · on Dec 26, 2023

I review the pull request as a whole, looking at the diff between main and the latest commit on the branch (i.e. what GitHub/etc show by default). Reading commit-by-commit means you’d read code that the author knows is wrong and had already fixed it, but you’re cluttering your mind with it. During re-reviews, I usually look at the diff between the last commit I reviewed and the newest commit.

cardanome · on Dec 26, 2023

> Reading commit-by-commit means you’d read code that the author knows is wrong and had already fixed it

If the commit is wrong, it shouldn't be there. I expect every commit in a Pull Request to be functional on its own or I am not going to approve it in the first place. Git has tools to rewrite your commit history and you should use them.

The whole point is that I should be able to revert individual commits without code breaking. At least that is the ideal. A clean version history matters a lot of the people maintaining your code down the line.

fhreviewable · on Dec 27, 2023

In my experience, this is very team-specific. Some teams want squash merges and ignore individual commits and only look at the latest version, while others care about the "history" and will tidy it up in the PR and then merge all the commits from the PR. Though I've found the latter to be much more rare, that's why some tools (like Reviewable https://docs.reviewable.io/reviews.html?highlight=commit-by-...) have a commit-by-commit option but the default is to combine them for review.

wirrbel · on Dec 26, 2023

Yes and no.

I think what you say is definitely the goal for day-to-day contributions.

However, there are changes to a code base that are more "Manhattan project" in nature where not all changes can be neatly packaged into their own commits, OR the PR author kind of needs to re-do their coding on a clean room branch. Which is significant overhead.

Being able to undo a commit is a means to an end, not the ultimative goal.

seba_dos1 · on Dec 26, 2023

> I find it really difficult to cleanly do this before I have something that's 100% complete

That's what a DVCS like git makes easy to do, it's really worth learning.

meowface · on Dec 26, 2023

Yes, but for such a significant contribution to a huge project it's good etiquette to squash on your own before submitting the PR. (Not that it means the PR shouldn't be reviewed and accepted.)

Honestly, I frequently do this for my own personal projects since I'm lazy, but if I'm submitting something to a big open source project I always clean it up first.

da39a3ee · on Dec 26, 2023

Being honest though, the guy's commit messages influence my prior on how reliable and well-designed his code will be.

japanman185 · on Dec 26, 2023

“My prior”. Give it a rest.

da39a3ee · on Dec 27, 2023

Just saying that if I were working with this person it wouldn't make me think highly of him, and in my fairly extensive experience I can report that there's a strong correlation between silly commit messages and not great code. I didn't mean to imply that I was qualified or skilled enough to evaluate the JIT compiler for Python.

I think I missed what your point was?

austinjp · on Dec 27, 2023

I understood the point to be about using accessible language rather than jargon.

da39a3ee · on Dec 27, 2023

You think "my prior" is jargon? I'm not sure what to say to you then. You realize Bayesian thinking in statistics has been around since the 1950s if not 1930s? https://en.wikipedia.org/wiki/History_of_statistics#Bayesian...

austinjp · on Dec 27, 2023

Bear in mind I'm interpreting the previous comment.

I'm familiar with many things Bayesian thanks :) However, I wouldn't assume everyone else is, even here.

Are you familiar with the meaning(s) of the word jargon? https://dictionary.cambridge.org/dictionary/english/jargon https://en.wikipedia.org/wiki/Jargon

Just to provide an example, your previous comment could have been written something like this: "Being honest though, the guy's commit messages changed my preconceptions about how reliable and well-designed his code will be."

No knowledge of statistics required, Bayesian or otherwise.

da39a3ee · on Dec 28, 2023

OK, fair enough, your suggestion is totally reasonable. However I've been referring to people's "priors" though in informal conversation for about 25 years, to friends, romantic partners, and family as well as academics and programmers, and I know several other people who do the same. Apart from anything else it's a nice non-technical sounding word. I'm not a Bayesian statistics zealot (I don't even work in statistics any longer). But I definitely think all educated people should be familiar with the _idea_ of Bayesian inference. I think that goes without saying. I'm no expert on such matters but clearly our own perception/cognition has some sort of Bayesian flavour to it (you think a mammal dimly perceived on the horizon is probably a dog etc). What I'm saying is -- it sound like perhaps you also have had some involvement with the academic subject -- I think you don't need to push that word quite so far away from mainsteam culture. It's perhaps even a little patronizing to mainstream culture? And I think that if we are ever going to overcome CP Snow's Two Cultures problem then making little gestures like this in the right direction is actually important; especially from people like you and me.

sighansen · on Dec 6, 2023

Are your transformations written in SQL?

nmachado · on Dec 6, 2023

They are written in python but SQL is in the roadmap.

paddy_m · on Dec 6, 2023

Check out generating ibis, which can output SQL and many dataframe formats (pandas, polars, modin,...)

nmachado · on Dec 7, 2023

Sounds very useful, thank you for sharing.

sighansen · on Nov 4, 2023

We just ditched slack in favour of teams at our company, because slack wasn't "secure" enough. I feel like I see a headline like this twice a month. I can't ever remember seeing a similar headline for slack.

chollida1 · on Nov 4, 2023

Is this related to Teams?

I thought this was for on premise exchange installs that are directly facing the internet, which is an extremely rare setup these days.

Most companies use hosted exchange or if exchange is on premise, it sits behind a firewall of some kind.

blakes · on Nov 4, 2023

Yeah this is completely unrelated to Teams.

Exchange is actually still fairly prevalent, even among smaller companies. Although many of the smaller orgs that still have on-prem Exchange tend to have a migration plan to M365.

merb · on Nov 5, 2023

> Exchange is actually still fairly prevalent, even among smaller companies. Although many of the smaller orgs that still have on-prem Exchange tend to have a migration plan to M365.

and I hope they do. most of these smaller companies are sometimes sitting on really really old versions. "it works" is mostly the argument. updating exchange sometimes can be painful. most of the time everything works, but sometimes things just break.

aksss · on Nov 5, 2023

Let’s not ignore that if you’re a company self-hosting a highly available Exchange installation (plus backup infrastructure and maybe near-line storage solutions for mail), it’s almost certainly comprised of very expensive capital and > an FTE of labor, all which are entirely a waste of time and resources at this point.

There are vanishingly few circumstances where it makes sense for an organization to be funding deep expertise for the direct management of an Exchange environment. This has been clear for nearly a decade.

The capex to refresh that hardware is a ridiculous waste, so yeah, it wouldn’t surprise me if the people still running those setups have very aged installations (e.g. WinSrvr 2008-12), which are as great a risk as the Exchange Server software they’re running.

The gating factor is often the expertise to plan and execute a migration with minimal disruption and loss. It’s not simple, and it’s nothing like an exchange upgrade project. It’s a downright UGLY project if a company has been abusing their mail system for years (e.g. using their mail system as a document management platform since ‘99, allowing distributed PSTs, etc.). Seen it.

fulafel · on Nov 5, 2023

Teams is half way to the null position in the continuum - if it doesn't do anything and/or people don't want to use it, it exposes you less to vulnerabilities.

gpvos · on Nov 5, 2023

What do you mean?

fulafel · on Nov 5, 2023

I mean that Teams compared to Slack is terrible and people don't want to use it.

baz00 · on Nov 5, 2023

Can confirm. We have an outsourcing op that uses only Teams. We tend to just avoid talking to them so we don't have to use it.

0xDEAFBEAD · on Nov 5, 2023

Can anyone recommend a solid website which aggregates CVE data in order to generate security scores for companies, platforms, open source projects, etc.? I know CVE data has a lot of problems, but I still suspect that this would be more objectively accurate than making security decisions based on gut feel.

dcsommer · on Nov 5, 2023

I don't know of one, and making this judgement based on CVE data alone will not answer your question. Factors ignored include codebase size, customer count, internal CVE filing standards/criteria, etc.

The only signal I would conclude from CVE data by itself, is that I bias towards a preference for companies that regularly publish CVEs. The ones that don't publish CVEs regularly are hiding, ignorant, or actually secure (and the first two are more likely).

You can't look at CVE in isolation.

MattPalmer1086 · on Nov 5, 2023

Aggregating cve data is probably not a useful signal. Products with more cves are not necessarily less secure than ones with fewer ones.

Possibly if a product consistently has high cves over a long period of time that might tell you something about poor security practices over that period (or before it). It might also mean that their security is now quite good!

You have to interpret the data I'm afraid. I can't think of any useful statistical measures you could use to compare aggregate data across multiple products.

pelasaco · on Nov 5, 2023

https://www.cvedetails.com/ ?

liquidpele · on Nov 4, 2023

Ha, bullshit. It’s always because they’re giving teams away for free with office 365.

faeriechangling · on Nov 4, 2023

I really have no idea how Microsoft has gotten away with this clear monopolistic abuse for so long. It's classic tying.

Lorin · on Nov 5, 2023

It's kinda sad that they went through the whole monopoly suit over a century ago and here we are in Windows 11 getting OneDrive notices crammed down our throats even when there's an active work Office 365 subscription on the damn machines. (... and now Teams ad notifications in the Office suite)

liquidpele · on Nov 6, 2023

… how long do you think a century is lol

Lorin · on Nov 11, 2023

Sorry for forgetting the word 'decade', my point still stands.

kzrdude · on Nov 5, 2023

EU antitrust already forced them to unbundle teams, in the EU. I'm not sure how much effect it will have in practice.

taspeotis · on Nov 5, 2023

> free with office 365

Give Microsoft money and receive Teams for free?

trelane · on Nov 5, 2023

> Give Microsoft money and receive Teams for free?

Yes. As opposed to giving Microsoft the same money and then also giving Slack some money too.