
It should be possible with a good screen reader. It can read the icons and text on the page to you. It might be hard to know which boxes are clickable, but that applies to using it normally too.


A screenreader doesn’t literally read the screen, it reads the accessibility tree that apps build for their interfaces. If your user interface kit doesn’t create an accessibility tree then your users’ screenreaders are completely lost at sea.


>A screenreader doesn’t literally read the screen

Why not? The approach of creating an accessibility tree can take extra work from developers instead of it just working. It's convenient to be able to just use an image without writing alt text for it. For example in a group chat.


An image on its own: sure, our tech is getting pretty good at recognising the contents. An image in context? Almost impossible to tell whether it’s meant to be decorative, inspirational, factual, or to be OCRed for text without some sort of hint.

Or what about a chart, or an assembly or measurement diagram? Can current image recognition reliably reproduce that information?

At the end of the day, the extra work by developers is part of what it means to be a developer. If you’re not doing that work then is the end product really meeting your users’ needs?
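The "hint" mentioned above is often just a single attribute. A minimal sketch, using hypothetical helper functions, of the standard convention for the two image cases:

```typescript
// Sketch: the hint a screen reader needs is one attribute on the image.
// These helper names are illustrative, not from any real framework.
function decorativeImg(src: string): string {
  // alt="" marks the image as purely decorative: screen readers skip it.
  return `<img src="${src}" alt="">`;
}

function informativeImg(src: string, description: string): string {
  // Non-empty alt text is read aloud in place of the image.
  return `<img src="${src}" alt="${description}">`;
}
```

Without that attribute, nothing in the pixels distinguishes a decorative background from a chart the user needs explained.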


A screen reader could be made to describe those things. Even if it failed I don't see how it could be worse than it just saying that there is an image there.


> The approach of creating an accessibility tree can take extra work from developers instead of it just working.

Because this isn’t true unless you’re using a non-native framework like Flutter. If you write your apps in HTML or native frameworks, the tree is built automatically. You only have to fiddle with it if you’re doing really custom stuff (which almost no one is).
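A minimal sketch of that difference (plain markup strings rather than a real framework): a native element carries its semantics for free, while a "really custom" widget has to re-declare everything the accessibility tree would otherwise infer.

```typescript
// Native element: role, accessible name, focusability, and keyboard
// activation all come for free from the browser.
const nativeButton = `<button>Save</button>`;

// Custom widget: a styled div contributes nothing to the accessibility
// tree until the developer re-adds each piece by hand (and this still
// omits the Enter/Space key handling a real button provides).
const customButton =
  `<div role="button" tabindex="0" aria-label="Save">Save</div>`;
```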


The project that is being linked to renders everything into a canvas. It's extra work to create a separate tree for another program to digest.
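For canvas UIs there is a standard escape hatch: children of a `<canvas>` element are never painted, but they are exposed to the accessibility tree. A hypothetical sketch of that extra work, mirroring drawn items into fallback content:

```typescript
// Hypothetical item type: whatever the app paints onto the canvas.
interface DrawnItem {
  label: string;
  clickable: boolean;
}

function canvasWithFallback(items: DrawnItem[]): string {
  const fallback = items
    .map(item =>
      item.clickable ? `<button>${item.label}</button>` : `<p>${item.label}</p>`)
    .join("");
  // The canvas pixels carry no semantics; the fallback children are what
  // screen readers can read and operate.
  return `<canvas>${fallback}</canvas>`;
}
```

The cost the comment points at is exactly this mirroring: every change drawn to the canvas has to be reflected in the fallback tree as well.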


You asked why not. That's the answer -- why should platforms create an overcomplicated ML-based solution to try to screen read (assuming such a thing is even possible, which I don't believe it is), when existing solutions using standard frameworks work fine and _don't_ require devs to spend extra time?


Considering that years-old mobile phones can perform on-the-fly language translation in photos, even through awful cameras, I find it weird that screen readers still rely on such hints.


Delete every CSS declaration (both inline and in stylesheets) from every website and see how easy it is to read them. Not very, huh? Same deal with accessibility.

You can't "just OCR stuff" without losing all the visual meaning in a page. Just like we use borders, padding, and colors to arrange information into a hierarchy, screen readers rely on an information hierarchy too, so users can navigate around conveniently.


I wasn't referring to just OCRing stuff (or even just web stuff) though; my point was that there is enough information on the screen to make out detail. Computer vision is a much broader subject than just scanning text. ~12 years ago I was working on getting a computer to figure out where 2D boxes were in a feed from a camera (for augmented reality, not accessibility), and my algorithm was quite naive and primitive, while the source was some awful web camera, not something "pristine" like a screen's contents.

Of course I don't know that it is possible; it could be impossible. I just have the impression that there hasn't been much effort towards that approach. And TBH it feels like it'd be much better to have a solution that works with "everything" without that "everything" knowing about it (or at least with very little participation from it).

Also FWIW, I often use a "simple" web browser like Dillo or ELinks to read articles, since it bypasses all the cruft, and the usual suspect for making things unreadable isn't CSS but JavaScript.


OCR is relatively easy, but accessibility information is not only text. There are the types of elements, the possible interactions, and the changes on the screen. It also gives the user the ability to skip unnecessary information. Using ML for all of that is taxing and probably not very practical until the invention of AGI.
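"Changes on the screen" in particular are conveyed through explicit hints rather than by diffing pixels. A minimal sketch of a live region, the standard hint that tells a screen reader to announce updates:

```typescript
// Sketch: a live region. aria-live="polite" asks the screen reader to
// announce new content in this element when convenient, without moving
// focus; role="status" is the matching built-in role.
function liveStatus(message: string): string {
  return `<div role="status" aria-live="polite">${message}</div>`;
}
```

An ML screen reader would have to guess which of the many pixels that changed are worth interrupting the user for; the attribute states it outright.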


I wasn't referring to just OCR, check my other comment.



