Hacker News | fithisux's comments

Excellent

Of course it is not working. PDF and images are supposed to be tamper resistant. OCR tries to reverse engineer them.

Since when is tamper resistance a part of PDF or any common image format?

PDF files can be signed, that is tamper resistance. Tamper resistance doesn't have to make any difference to the readability of the document.

So can any type of file -- that doesn't have any relevance to the supposed design of every file type in existence. Now, later versions of PDF do have explicit support for signatures, but what does this have to do with preventing OCR? OCR reads a file, it doesn't change the original file.

Some OCR solutions do change the original file, like OCRmyPDF. They take layers that were just images before and replace them with text layers so that you can search the document.

That isn't OCR, but an application of the resulting output of OCR. Again, a signature on a PDF or any type of file doesn't prevent you from reading it. (It also doesn't technically prevent you from changing it, it just enables the detection of changes to a particular file.)
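To illustrate the point that a signature detects changes but does not stop anyone from reading the data, here is a minimal sketch. It is not how PDF signing works in practice: real PDF signatures use public-key certificates over byte ranges of the file, while this sketch swaps in an HMAC from the Python standard library as a stand-in. The key, the sample text, and the function names `sign` and `verify` are all made up for illustration.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over data (stand-in for a real signature)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(data: bytes, key: bytes, signature: bytes) -> bool:
    """Check whether signature matches data; constant-time comparison."""
    return hmac.compare_digest(sign(data, key), signature)


key = b"demo-key"  # hypothetical key, for illustration only
original = b"Invoice total: 100 EUR"
signature = sign(original, key)

# Reading the signed data needs no key and no verification step.
readable = original.decode("ascii")

# Nothing prevents a modification; the signature only makes it detectable.
tampered = original.replace(b"100", b"900")

original_ok = verify(original, key, signature)
tampered_ok = verify(tampered, key, signature)
```

Here `original_ok` is true and `tampered_ok` is false, while `readable` is available either way, which is the distinction the comment above draws.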

There's nothing about PDFs or image formats that prevents anyone from doing OCR. The reason construction documents are difficult to OCR is that OCR models are not well trained for them, and they're very technical documents where small details are significant. It doesn't have anything to do with the file format.


True, but you can make modified copies if you reverse engineer it with OCR.

That's not really what I would call reverse engineering. If you read a PDF and type it into Word, is that reverse engineering? Either way, whatever you get is in no way going to convince anybody that it is the original.

Can't one just remove the signature and re-sign it with anything else after tampering? Who verifies PDFs that hard?

If you're performing OCR, you're, almost by definition, disregarding the source file. The whole point of OCR is to be transformative.

You can't change a PDF; it is designed not to be easy to OCR.

PDFs are merely a collection of objects that can be read directly from the file. Some of those are straight-up plain text that doesn't even need to be OCR'd; it can simply be extracted. It is also possible to embed image objects in PDFs (this is common for scanned files), which might be what you are thinking of. But this is not a design feature of PDF; it is the output format of a scanner: an image. Editing a PDF is a matter of editing the file, which you can do as you would with any other.

It is not by design! PDFs that are made from scanned documents or collections of images would require OCRing but that is true of any format that the scans/images are put into. These days the vast majority of PDFs do not need to be OCRed as the pages are just made up of text, line drawings and images. And although it can get tricky you can edit those text, line and image commands as much as you want.

For example: add this to the content stream of a PDF page and it'll put "Hello World" on the page

  BT
    /myfont 50 Tf
    100 200 Td
    (Hello World) Tj
  ET

(Note: a bit more is required to select the font, e.g. the name /myfont has to be defined in the page's resources.)
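To make the "bit more" concrete, here is a rough sketch in Python that wraps the same content stream in the smallest file structure I would expect a reader to accept: a catalog, a page tree, one page, the stream itself, and a font object that `/myfont` points to. The function name `build_minimal_pdf`, the choice of Helvetica, the US Letter page size, and the object numbering are my own assumptions for illustration, not something from the comment above.

```python
def build_minimal_pdf(text: str = "Hello World") -> bytes:
    """Assemble a one-page, uncompressed PDF whose content stream draws text."""
    # Backslashes and parentheses are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = (
        "BT\n"
        "  /myfont 50 Tf\n"
        "  100 200 Td\n"
        f"  ({escaped}) Tj\n"
        "ET"
    ).encode("latin-1")

    # Objects 1..5; each "N 0 R" below refers to a position in this list.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /myfont 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # Assumption: Helvetica, one of the standard fonts, so nothing is embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing the returned bytes to a file (for example with `open("hello.pdf", "wb")`) should give something a PDF viewer can open. Because the stream is stored uncompressed, the `(Hello World) Tj` line is visible in the raw bytes and can be changed with an ordinary text editor, as long as the `/Length` value and the offsets are kept consistent.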

Completely false.

Congratulations guys!!!

Keep up the good work.


Lack of MinGW support keeps me away from it, and Odin.

Just donated my 6 month amount.

We need more projects like this.


"Astral to Join OpenAI" (astral.sh) vs. "OpenAI to Acquire Astral" (https://openai.com/index/openai-to-acquire-astral/)

what can I say?


You can say "acquire", as the URL you linked to does. It's more honest than the mealy-mouthed "join".


They could have rebuilt it on top of osFree and have 64bit support.


That would kill half the point.

OS/2 is amazing as a 32bit pmode OS that can still run DOS and Win16 software while being far more stable than Windows (of the time).


And have an unstable base for the supposed commercial applications they sell to?


Is Zig able to target Windows XP?


Yes! Zig's cross-compilation story is one of the reasons I chose it for this project.

Zig can target x86-windows-gnu and lets you set os_version_min to .xp in the build config. This tells the linker to use only APIs available on XP SP3.

In practice there were a few things I had to deal with:

- RtlGetSystemTimePrecise doesn't exist on XP, so I wrote a compatibility shim that redirects to GetSystemTimeAsFileTime at startup

- The build uses -OReleaseSmall, strips symbols, and is single-threaded to keep the binary small (~750 KB) and compatible with XP's threading model

- There's a dedicated build-xp.zig that sets all the right flags, but you can also just do: zig build -Dtarget=x86-windows-gnu

- No UCRT or MSVC runtime dependency — this was critical because XP doesn't ship with the Universal CRT, and you can't install it reliably on minimal XP systems

The same codebase cross-compiles to Linux x86/x64/ARM with "zig build cross" — that's where Zig really shines compared to writing this in C with manual cross-compilation toolchains.

One thing I should mention: Zig 0.15 (currently in use) dropped some older target support, but x86-windows with XP compat still works. I'd recommend testing with the exact version in the repo if you want to reproduce it.


Valuable information. Thanks!


Then they will "fix" it.


They want to replace software engineers because they want exclusive power and because they are not team players.

Narcissism

