Hacker News | ricketycricket's favorites

Same:

    google-chrome --headless --disable-gpu --no-pdf-header-footer --hide-scrollbars --print-to-pdf-margins="0,0,0,0" --print-to-pdf --window-size=1280,720 https://example.com

I ended up using headless Chrome specifically to make sure JavaScript-rendered content showed up properly.
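If you're scripting this, here's a minimal Python sketch that builds the same argument list and runs it with subprocess. It assumes the executable is on PATH as google-chrome (the name differs across platforms), it reuses the flags exactly as written above without vouching for each one, and the helper names chrome_pdf_args and render_pdf are hypothetical.

```python
import shutil
import subprocess


def chrome_pdf_args(url: str, binary: str = "google-chrome") -> list[str]:
    """Hypothetical helper: the headless-Chrome command above as an argument list."""
    return [
        binary,
        "--headless",
        "--disable-gpu",
        "--no-pdf-header-footer",
        "--hide-scrollbars",
        # The shell strips the quotes in --print-to-pdf-margins="0,0,0,0",
        # so this is the exact string Chrome receives.
        "--print-to-pdf-margins=0,0,0,0",
        "--print-to-pdf",
        "--window-size=1280,720",
        url,
    ]


def render_pdf(url: str, binary: str = "google-chrome") -> None:
    """Hypothetical helper: run the command, failing early if Chrome isn't found."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    # A list (no shell=True) avoids quoting issues; check=True raises on a non-zero exit.
    subprocess.run(chrome_pdf_args(url, binary), check=True)


if __name__ == "__main__":
    print(" ".join(chrome_pdf_args("https://example.com")))
```

Passing a list instead of a shell string sidesteps quoting entirely, which matters once the URL comes from user input.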


RAG is: take a bunch of docs and chunk them into text blocks of a certain length (how best to do this is up for debate). Create a search API that takes a query (like a Google search) and compares it to the document chunks (very much how you're describing). Take the returned chunks, ignore the score from the vector search, and feed those chunks into a re-ranker along with the original query (this step is important; vector search mostly sucks). Filter the re-ranked results down to the top 1-2, then format a prompt like:

The user asked 'long query'; we fetched some docs (see below). Answer the query based on the docs (reference the docs if you feel like it).

Doc1.pdf - Chunk N: Eat cheese

Doc2.pdf - Chunk Y: Don't eat cheese
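A toy end-to-end sketch of the flow described above (chunk, search, re-rank, keep the top 1-2, format the prompt). Everything here is a stand-in assumed for illustration: bag-of-words cosine replaces a real embedding search, and the hypothetical rerank_score (fraction of query terms present) replaces a real re-ranker such as a cross-encoder model. Chunk size and overlap are arbitrary.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, standing in for embedding vector search."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * vb[t] for t, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def rerank_score(query: str, chunk: str) -> float:
    """Hypothetical re-ranker: fraction of distinct query terms found in the chunk."""
    q = set(tokens(query))
    return len(q & set(tokens(chunk))) / len(q) if q else 0.0


def retrieve(query: str, docs: dict[str, str], k_search: int = 10, k_final: int = 2):
    # 1. Chunk every document, remembering which doc and chunk index each came from.
    chunks = [(name, i, c) for name, text in docs.items() for i, c in enumerate(chunk_words(text))]
    # 2. First-stage search: keep the k_search most similar chunks.
    candidates = sorted(chunks, key=lambda ch: bow_cosine(query, ch[2]), reverse=True)[:k_search]
    # 3. Ignore the first-stage scores and re-rank the candidates against the query.
    reranked = sorted(candidates, key=lambda ch: rerank_score(query, ch[2]), reverse=True)
    # 4. Keep only the top one or two.
    return reranked[:k_final]


def build_prompt(query: str, hits) -> str:
    lines = [
        f"The user asked '{query}'; we fetched some docs (see below). "
        "Answer the query based on the docs (reference the docs if you feel like it).",
        "",
    ]
    lines += [f"{name} - Chunk {idx}: {text}" for name, idx, text in hits]
    return "\n".join(lines)


if __name__ == "__main__":
    docs = {
        "Doc1.pdf": "Eat cheese every day.",
        "Doc2.pdf": "Don't eat cheese.",
        "Doc3.pdf": "The latest football scores.",
    }
    query = "Should I eat cheese?"
    print(build_prompt(query, retrieve(query, docs)))
```

In a real system, steps 2 and 3 are separate services (a vector index and a re-ranking model); the point of the sketch is the shape: a wide first pass, a narrow second pass, and only a couple of chunks reaching the prompt.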

You then expose the search API as a "tool" for the LLM to call, slightly reformatting the prompt above into a multi-turn convo, and suddenly you're in ze money.
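Exposing search as a tool mostly means describing it to the model and executing whatever call it asks for. A minimal sketch, assuming an OpenAI-style function-calling shape (a JSON Schema for the parameters, with the model's arguments arriving as a JSON string); the tool name search_docs and the tiny in-memory index are hypothetical, and other providers use similar but not identical formats.

```python
import json

# Hypothetical tool definition in the OpenAI-style function-calling shape.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the internal document index and return the most relevant chunks.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search query"}},
            "required": ["query"],
        },
    },
}

# Stand-in index; in practice this would call your search + re-rank API.
_INDEX = [
    ("Doc1.pdf", "N", "Eat cheese"),
    ("Doc2.pdf", "Y", "Don't eat cheese"),
]


def search_docs(query: str) -> list[dict]:
    """Return chunks sharing at least one word with the query (toy matching)."""
    terms = set(query.lower().split())
    return [
        {"doc": doc, "chunk": chunk, "text": text}
        for doc, chunk, text in _INDEX
        if terms & set(text.lower().split())
    ]


# Registry of callable tools; a web tool would be one more entry here.
TOOLS = {"search_docs": search_docs}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a tool call requested by the model and return a JSON result string."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool {name}"})
    args = json.loads(arguments_json)
    return json.dumps(TOOLS[name](**args))


if __name__ == "__main__":
    print(run_tool_call("search_docs", '{"query": "cheese"}'))
```

The string returned by run_tool_call goes back to the model as the tool result in the next turn, which is where the multi-turn reformatting comes in.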

But once your users are happy with those results, they'll want something dumb like the latest football scores. Then you need a web tool, and then it never ends.

To be fair, though, it's pretty powerful once you've got it in place.


> but then I’d have to relieve the horror of being forced to submit my email address to Microsoft to install the damn OS.

All it takes to install Windows 11 without an account is to press Shift+F10 on the network-connection screen and run this command: OOBE\BYPASSNRO

Never in my life have I linked a Microsoft account to my personal Windows install; I always use a local account.


There are a few, actually :)

CCBoot is the Windows Server-based diskless solution I mentioned, and they also provide CCDisk, which can do a "hybrid" mode: every PC has a small SSD with the base OS pre-installed and pre-configured, and it then mounts an iSCSI game drive.

GGRock is a fantastic product, in my opinion. It is pricey, but whereas CCBoot relies heavily on you knowing its inner workings, GGRock is pretty much a turnkey solution.

There is also CCu Cloud Update, which I have heard of but haven't tried myself, since, from what I remember, they only sell licenses in Asia.

LANGAME Premium is an add-on for a LAN centre ERP system, and is basically an ITaaS solution based on TrueNAS. Of all the paid offerings, that one is my favourite so far, but you have to use their ERP and actually run a business for it to be cost-effective.

NetX provides an all-in-one, NUC-like server (router, traffic filter, and iSCSI target) with pre-configured software on a subscription basis. I'm most skeptical of that one, simply because, from my research, two NVMe drives can't really handle the load from a fully occupied LAN centre of 40+ machines. Not for long, at least.

...and homebrew, of course. I myself am running a homebrew ZFS-based system which I'm extremely happy with

In your case, I'd build my own thing too. It doesn't take a lot of time if you know the inner workings, and you have no additional OPEX for your room :)


I do hiring in the US, not the UK, but I'm (I'm guessing) only a few years older than you, so you probably had the "keep it to one page" rule pounded into your head.

I get a lot of resumes, and that is definitely not the rule anymore. Two to three pages is perfectly fine. I would suggest:

1. Add a list of skills and/or technologies you're particularly good at. You can include things you've used but aren't current in, but make sure they're annotated as such.

2. Add a high-level summary that hits the aggregated highlights of your career. This is especially important for making sure the reader quickly understands your specialties and focus, depth and breadth. Your Blender work is badass and should definitely show up in that summary.

3. If you have formal education, I'd add a short section about it at the bottom. For smaller orgs it doesn't matter, but for bigger ones it will make a difference. Ditto if you have any certifications.

Overall, though, I heartily agree that it's a vicious hiring market right now, so don't take it personally. You look like a fascinating candidate! If we had open slots and our markets lined up more, I'd be shooting you an email. I'm a little weird in this regard, but I'm particularly impressed by the open source work people do. It shows passion, motivation, and a willingness to make the world a better place, which are three things I really appreciate.


You might try Magnet. It's $10 on the Mac App Store. I've had zero issues with it, and it handles some things other tools don't (or haven't always), namely tiling correctly across monitors. I used to use Rectangle, which is open source and frankly pretty good, but at the time I was using it, it had some warts. macOS window management doesn't seem to be entirely transparent. I don't know all the details, but this is my workstation: I need(ed) it to work now, and if it's a system that might change a bit over time or has some weird idiosyncrasies, I'm happily willing to pay a low one-time fee for a tool like this.

I agree, and I do this via Power BI. If you import data into a Power BI report, create a data model with calculated measures (in DAX, not MDX), and publish it to the online service, users can click "Analyze in Excel" and download an Excel workbook with a pivot table connected to that data model. I provide this to the PMs for the product I work on, and they're able to answer a lot of their questions just by pivoting instead of having to write bespoke SQL.

Porkbun pricing:

.com (9.73 USD): https://porkbun.com/tld/com

.xyz (9.92 USD): https://porkbun.com/tld/xyz


In addition to the choices for how to chunk (i.e. defining chunk size, chunk boundaries, chunk overlap, etc.), there's also the question of what actually gets returned once you've found the matching chunks. For example, perhaps I have a document with 100 one-page sections, where each section is broken into roughly 5 chunks. I may get optimal performance in my RAG application not by retrieving the top K chunks from the index, but by returning the top K sections from the document, where sections are scored based on the number and scores of their child chunks. It might also be useful to incorporate section summaries, etc., in the retrieval process.
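One way to do section-level retrieval, sketched in Python: group matched chunks by parent section and score each section by the sum of its best few chunk scores, so a section with several good hits can outrank one with a single strong hit. The aggregation rule (sum of the top per_section scores) and the function name are assumptions; max, mean, or a weighted mix are equally plausible choices.

```python
import heapq
from collections import defaultdict


def top_sections(chunk_hits, k: int = 3, per_section: int = 3):
    """chunk_hits: iterable of (section_id, score) pairs for matched chunks.

    Returns up to k (section_id, section_score) pairs, best first.
    """
    by_section = defaultdict(list)
    for section_id, score in chunk_hits:
        by_section[section_id].append(score)
    # Sum each section's best `per_section` chunk scores, so several decent
    # matches in one section add up, but a long tail of weak ones doesn't.
    section_scores = {
        sid: sum(heapq.nlargest(per_section, scores)) for sid, scores in by_section.items()
    }
    return heapq.nlargest(k, section_scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    hits = [("s1", 0.9), ("s2", 0.6), ("s2", 0.55), ("s2", 0.5), ("s3", 0.4)]
    print(top_sections(hits, k=2))
```

You would then hand the LLM the full text (or summary) of each winning section rather than the individual chunks.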

To summarise the article, if you're skipping to the comments: pgvector lets you create a "vector" column type in your database:

    create extension if not exists vector;

    create table documents (
      id bigserial primary key,
      content text,
      embedding vector(1536)
    );
 
Then you can use OpenAI's Embeddings API[0] to convert text blocks into 1536-dimensional vectors, which you store in the database. From there you can use pgvector's cosine distance operator (<=>) to search for related documents.
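For intuition, pgvector's <=> operator returns cosine distance, i.e. 1 minus cosine similarity. This Python sketch computes the same quantity and ranks rows the way an ORDER BY embedding <=> ... LIMIT query would. The 3-dimensional toy vectors stand in for real 1536-dimensional embeddings, and the function names are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def nearest(query_embedding: list[float], rows, limit: int = 5):
    """rows: list of (id, content, embedding). Roughly mirrors:

        SELECT id, content FROM documents ORDER BY embedding <=> $1 LIMIT $2;
    """
    ranked = sorted(rows, key=lambda r: cosine_distance(query_embedding, r[2]))
    return [(r[0], r[1]) for r in ranked[:limit]]


if __name__ == "__main__":
    rows = [(1, "a", [1.0, 0.0, 0.0]), (2, "b", [0.0, 1.0, 0.0]), (3, "c", [0.9, 0.1, 0.0])]
    print(nearest([1.0, 0.0, 0.0], rows, limit=2))
```

In practice the database does this ranking (ideally with an index), so only the top rows ever leave Postgres.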

You can combine the search results into a prompt and send it to GPT for a "ChatGPT-like" interface, where it generates an answer from the documents provided.

[0] https://platform.openai.com/docs/guides/embeddings


At the risk of sounding like a salesperson, I'd like to mention a feature of basic/classic Dualit toasters - namely, the optional and elegantly simple sandwich cage. [1] The cage is designed to take two slices of bread with a filling (in my usage, always heavy on the cheese). You put this into the toaster slot, which is vertical, and safely make cheese/whatever toasties. So, your 12-year-old had the right idea, but not the right toaster.

FWIW, and now going full salesperson, Dualit toasters are low-tech and minimalist. The timer is clockwork. There is no automatic pop-up function; you can depress a lever to raise the toast during cooking to check its colour, if you like. All parts are serviceable/replaceable. I've had my bog-standard Dualit (plus two cages) for 20+ years, and so far I've not had to replace even a heating element; and this with usage at least five days a week.

[1] https://www.dualit.com/products/original-sandwich-cage


Anybody claiming that anything has "solved all the pain" is definitely selling you something after having bought a load of it themselves.

State is suffering, and life is stateful, even after you're long dead and garbage collected.


99% of the time the problem is pixel density. That is, you have a 400x400 canvas and a 400x400 CSS-pixel space you are drawing it into, but your device's pixel ratio is higher than 1, so it looks kind of blurry. This is because devices squeeze more than one hardware pixel into each CSS pixel, often at a ratio of 1.5, 2, or higher.

The solution is to make a larger canvas, say 800x800, and display it in that 400x400 space.

Here is an example, using that MDN code, with a 400x400 canvas (red) next to an 800x800 canvas (blue). CSS forces both to appear at the same 400x400 size. The blue one should look sharper on most devices.

Note how the 800x800 canvas's context needs to be scaled by 2 with ctx2.scale(2,2) so that the drawing appears at the correct size.

https://codepen.io/simonsarris/pen/eYexbOb

The pixel ratio varies from device to device (window.devicePixelRatio), so you'll want to set this canvas pixel density programmatically for each user.
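The arithmetic, sketched in Python to match the other examples here (in the browser this is JavaScript, shown in the docstring): multiply the CSS size by window.devicePixelRatio to get the canvas's backing-store size, then scale the context by the same ratio so you can keep drawing in CSS-pixel coordinates. The helper name, the floor rounding, and the fallback to 1 for a missing ratio are assumptions.

```python
import math


def hidpi_canvas_size(css_width: float, css_height: float, device_pixel_ratio: float):
    """Return (canvas.width, canvas.height, scale) for a canvas displayed at css_width x css_height.

    Applied in the browser, roughly:
        canvas.style.width = cssWidth + "px"; canvas.style.height = cssHeight + "px";
        canvas.width = width; canvas.height = height;
        ctx.scale(scale, scale);
    """
    # Treat a missing or non-positive ratio as a standard-density display.
    scale = device_pixel_ratio if device_pixel_ratio > 0 else 1.0
    width = math.floor(css_width * scale)
    height = math.floor(css_height * scale)
    return width, height, scale


if __name__ == "__main__":
    # The 800x800-in-400x400 case from the example above.
    print(hidpi_canvas_size(400, 400, 2))
```

With a ratio of 2 this reproduces the blue canvas from the example: an 800x800 backing store shown at 400x400, with the context scaled by 2.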

