
> I've tried using qwen and deepseek but they can't even output documents

What agent harness did you use? Usually, "write_file" and "shell_exec" (or similar) are two of the first tools you add to an agent harness, after read_file/list_files. If it doesn't have those tools, I'm not sure you could even call it an agent harness in the first place.
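For readers who haven't seen one, here is a rough sketch of what those four tools might look like inside a harness. The tool names come from the comment above; the `TOOLS` table and the `dispatch` helper are hypothetical names made up for illustration, not the API of any particular harness.

```python
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    """Return the text content of a file."""
    return Path(path).read_text(encoding="utf-8")


def list_files(directory: str = ".") -> list[str]:
    """Return the sorted entry names in a directory."""
    return sorted(p.name for p in Path(directory).iterdir())


def write_file(path: str, content: str) -> str:
    """Write text to a file and report what was written."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def shell_exec(command: str, timeout: int = 30) -> str:
    """Run a shell command and return stdout followed by stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


# Hypothetical registry: maps the tool name the model asks for to a function.
TOOLS = {
    "read_file": read_file,
    "list_files": list_files,
    "write_file": write_file,
    "shell_exec": shell_exec,
}


def dispatch(name: str, arguments: dict) -> str:
    """Look up a tool by name and call it with the model-supplied arguments."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return str(TOOLS[name](**arguments))
```

A real harness would wrap this in a loop that sends tool results back to the model, and would normally restrict which paths and commands are allowed; none of that is shown here.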



Sorry for the confusion, I was actually talking about their web-based chat. Since most of my work is governance and docs, I just use their web chats, and they just refuse to output proper documents like Claude or ChatGPT do.


Aha... Well, I let Codex (Claude Code would work too) manage and troubleshoot .xlsx files, and it seems to handle them just fine (it tends to un-archive them and browse the resulting XML files without issues). I've seen it do similar things for .app and .docx files too, so maybe give that a try with other harnesses/models; they might get it :)
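The un-archive trick works because .xlsx and .docx files are ZIP containers holding XML parts. A minimal sketch of doing the same by hand with Python's standard `zipfile` module follows; the helper names `list_archive_parts` and `read_archive_part` are made up for this example.

```python
import zipfile


def list_archive_parts(path: str) -> list[str]:
    """Return the names of the parts stored inside a ZIP-based document."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_archive_part(path: str, part: str) -> str:
    """Return one part of the document decoded as UTF-8 text."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(part).decode("utf-8")
```

For a typical workbook, `list_archive_parts("report.xlsx")` (a placeholder filename) would include entries such as `xl/workbook.xml`, which can then be passed to `read_archive_part` to inspect the raw XML.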


Yeah, it's just way easier to do via the web/mobile app but I'll give using it via the CLI a try. Thanks :)


There are things like Open WebUI that let you easily get a chat UI for an open-source model.


You're not giving an AI command line access to your work computer? How do you expect to keep up? /s


You give it command line access in a VM...


Yeah, fine... but almost daily a non-tech-savvy friend of mine tells me they just installed some shiny "harness" on their laptop to organize their emails, and they "just put it in one folder" and "8n8 says"... What does it say on the tin, Dave? "It says it's highly unlikely it will escape from the folder." Your work computer? "Yeah, but it's a real company. They're all about security."

So telling someone who just wants to upload an .xlsx file to a bot that they should just find a harness to give CLI access to their work computer - right after they say they work in a regulatory capacity - is just freakin malpractice.


I give it access on real Ubuntu, no VM, no Docker. As long as I don't ask it to organize files, it behaves. It has not screwed me so far.


Godspeed


I only run it with --dangerously-skip-permissions. YOLO!


You mean a VM, like the kind that has a sandbox-escape 0day found in it every year at Pwn2Own?


Presumably you’re also using a browser to view this web page. There have also been vulnerabilities in that. You have to draw a line somewhere.


I run mine as a separate unprivileged user. (No VM.) Am I pwned?


Maybe, but the sort of 0days you're talking about aren't exploited in any meaningful way for almost all developers.


"Seatbelts don't save the life of everyone who gets into an accident, so why bother wearing one?"


You can make a harness fully functional with just the "shell_exec" tool if you give it access to a Linux/Unix environment + the Playwright CLI.
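To illustrate the single-tool idea, here is a hedged sketch of a lone `shell_exec` tool, assuming a POSIX shell is available. The `cwd` and `timeout` parameters and the returned dict shape are choices made for this example, not taken from any specific harness.

```python
import subprocess


def shell_exec(command: str, cwd: str = ".", timeout: int = 30) -> dict:
    """Run one shell command and return its exit code and combined output."""
    try:
        result = subprocess.run(
            command,
            shell=True,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Report the timeout to the model instead of crashing the harness.
        return {"exit_code": None, "output": f"timed out after {timeout}s"}
    return {"exit_code": result.returncode, "output": result.stdout + result.stderr}
```

With this one tool, reading a file becomes `cat`, listing becomes `ls`, and writing becomes a shell redirect, so the model composes everything from ordinary commands. Browser automation would be one more command-line program invoked the same way (not shown here).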



