Hacker News | carls's comments

I've seen this being posted all over, but people rarely seem to realize that the data here is from 2023.


> data here is from 2023

It's from February 2025[1]:

> Latest Release: February 20, 2025

> Labor Market Outcomes of College Graduates by Major

> Art History Unemployment Rate: 3.0%

> Computer Engineering Unemployment Rate: 7.5%

> Computer Science Unemployment Rate: 6.1%

[1] https://www.newyorkfed.org/research/college-labor-market#--:...


> It's from February 2025

Unless you keep reading...[1]

> Notes: Figures are for 2023.

[1] https://www.newyorkfed.org/research/college-labor-market#--:...


Thank you for the correction - must admit to having missed the fine print at the bottom - mea culpa!


This project is a research project out of the University of Washington, led by several of the professors there. I believe the lab is the Healthy Aging and Longevity Institute.

They share a list of academic publications that have resulted from the project, and their Team page lists the full names of a sizable number of people.

Their FAQ indicates that the cost of the DNA Kit and other things are covered by the project funding. [1]

What made you think that it's engaging in fraud? I'm genuinely curious.

I'm not involved in the project but just from looking at the site for several minutes, it seems to be a fairly reasonable research project.

Or did you say "fraud" less to mean "these are people who are stealing money and e.g., hoarding it away" and more to mean "these are people engaging in a research project I disapprove of"?

[1] https://dogagingproject.zendesk.com/hc/en-us/articles/441699...


Honestly, it does not look like a research institute at all. It has the styling that the owners of People magazine or the National Enquirer would use to market their 'research nonprofit', where the actual research contribution is 3% of revenues while supporting fat salaries for an executive staff. It looks too consumer-oriented, not academic or serious. It is simply too feel-good: it has the trappings of respectability, but it's too slick. I also never spent the time to look deeper; the loud, consumer-targeted presentation drove me away.


Probably because creating a nice-looking website is more likely to drive engagement with today's users than something that looks like it was made by some academic stuck in the 90s.


I think it's reasonable to be turned off by a slick-looking website, but I imagine it's because the intended audience of the website is the general (dog-owning) public, likely for the purposes of soliciting participants.

Interestingly, through engaging with you I discovered that this is a cognitive bias called the "horn effect" and is the reverse of the more common "Halo effect": https://en.wikipedia.org/wiki/Horn_effect#:~:text=The%20horn....


If only you could check your claims with an investment of 20 seconds and 2 clicks.

Here, now it's one click:

https://dogagingproject.org/our-team


Several people have corrected you, and yet you keep going, claiming that this is some sort of fraud. What's your deal? Why are you spreading FUD?


"several people"? I engaged with two people and moved on.


Maybe the person just really loves it when their dog dies early. :(


There are probably several, but one of the ones that made the rounds a while ago is "The Man Who Killed Google Search" by Edward Zitron: https://www.wheresyoured.at/the-men-who-killed-google/


Prabhakar Raghavan got fired (= promoted out of the job) last week.


I skimmed the technical report: https://cosine.sh/blog/genie-technical-report

At the bottom, they noted the following:

> SWE-Bench has recently modified their submission requirements, now asking for the full working process of our AI model in addition to the final results - their condition to have us appear on the official leaderboard. This change poses a significant challenge for us, as our proprietary methodology is evident in these internal processes. Publicly sharing this information would essentially open-source our approach, undermining the competitive advantage we’ve worked hard to develop. For now, we’ve decided to keep our model’s internal workings confidential. However, we’ve made the model’s final outputs publicly available on GitHub for independent verification. These outputs clearly demonstrate our model’s 30% success rate on the SWE-Bench tasks.

Their model outputs are here: https://github.com/CosineAI/experiments/tree/cos/swe-bench-s...


> However we’ve made the model’s final outputs publicly available on GitHub for independent verification.

Sounds legit


It seems to me that there's a natural tension in emerging fields of science and engineering between establishing clear guidelines and regulations early on to minimize harms, and instead allowing practitioners to experiment, tinker, build, and create outcomes that may be potentially harmful.

What are some frameworks for how to think about navigating this tension in emerging scientific or engineering fields?

Some I'm mulling over:

1. Rate of innovation: In rapidly evolving fields, imposing strict regulations too early can hinder innovation and progress. In such cases, it might be better to minimize restrictions early on to allow practitioners to explore new ideas. Then, as the field matures, regulations and standards can be gradually introduced.

2. Adaptive regulation: Implement a flexible regulatory framework that can be updated as new information becomes available.

3. Self-regulation: In some cases, maybe we should expect and encourage industry to self-regulate by developing guidelines and codes of conduct. This may be one way to encourage responsible innovation while minimizing bureaucratic obstacles.

What do others think?


I think Nassim Taleb does a good job of identifying the core issue, which is “skin in the game”. A key problem with regulation is that it diffuses responsibility, in the sense that an engineer adhering to some regulatory scheme is partially absolved of personal responsibility for the outcome as long as they check the regulatory boxes. This point is raised by Brunel as well.

In other words, the harm of regulation is not that it trades progress for safety, but that it stifles both. See also Frederic Bastiat, The Seen and the Unseen.

The modern counterargument might be that engineered systems have become so dense and complex, and in some cases capable of catastrophic consequences for failure, that we simply can’t afford to let practitioners figure it out on their own.


I’d nominate degree of irreversibility, and the timescale for damage to become apparent, as drivers for differences in how fields need to be regulated.


I think if I want to write some software, and I have the knowledge and compute power to do what I want, then I really don't care whether you or some authoritarian committee tells me I'm allowed to run my software. I'm going to do it anyway.


Now think of synthesizing and distributing novel chemical compounds, or using novel medical devices.


Or performing gain-of-function research on pathogens.


What kind of chemical compounds? Drugs? Fine. You can only hurt yourself with those. Same goes for medical devices. Explosives? Now that's a different matter.


I took a class with Professor Ousterhout. He would end every Friday's lecture with a "Thought for the Weekend", such as this one.

It was very entertaining and charming to hear him discuss his personal and professional life, and the lessons he's learned from them often had very little to do with computer science.

I don't remember all of his "Thoughts for the Weekend", but I do remember one story he told about wishing he had apologized sooner to resolve some conflict he was in. That was a bit of wisdom that stuck with me from the class, beyond any of the computer science topics we covered.



I've thought very deeply on the subject of my personal relationships and what causes them to "wear out" as Ousterhout put it. My conclusion differs, and it's because Ousterhout puts the cart before the horse:

> So, the solution is if you want a relationship to last a long time, somehow you have to keep the scar tissue from building up.

The key here is "if you want the relationship to last." In many relationships, people lose the desire for the relationship to last. For instance, in his contractor anecdote, he cares more about the outcome of the construction project than he cares about prolonging his relationship with the contractor. Or in the case of a business relationship, business partners want the business to be run in their own way more than they want their relationship to stay strong. Everything comes down to the desire to keep the relationship going.


> The key here is "if you want the relationship to last." In many relationships, people lose the desire for the relationship to last. For instance, in his contractor anecdote, he cares more about the outcome of the construction project than he cares about prolonging his relationship with the contractor. Or in the case of a business relationship, business partners want the business to be run in their own way more than they want their relationship to stay strong. Everything comes down to the desire to keep the relationship going.

There is also another aspect to the "desire to keep the relationship going": culture. It's unfortunate that he used a business relationship to drive home his point, because Western business culture greatly emphasizes short-termism, binary outcomes, and litigious behavior, none of which are conducive to long-term relationships.

With personal relationships, the same is true: consistency of behavior, personal autonomy and personal goals are all emphasized over collective concerns. These values all make it difficult to value or sustain a 'long-term' relationship that doesn't involve any direct personal benefit.


He doesn't ignore that "people lose the desire for the relationship to last," it sounds like that's exactly what he's saying the "scar tissue" causes: "and then somebody decides they just don't care anymore," as he puts it.


Some relationships aren't meant to last. You're not likely going to have a lasting relationship with the contractor who built your home. Or the lawyer that represented you in some real estate transaction.

One of the reasons relationships wear out is that you can't have so many well-maintained relationships because there is not enough time to maintain them all. Some have to fall by the wayside, or you have to find a way to maintain them with much less frequent contact than when the relationships were fresh.

At the end of the day your longest-lasting relationships will be with the people nearest to you. Parents, siblings, spouses, children, close friends. All the others are at risk merely because you can't give them enough time (and they can't give you enough time). You can make some number of non-core relationships last, but you really have to choose to, and the choice has to be mutual.


My point is that this "scar tissue" only formed because he had no desire to prolong the relationship with the contractor. Imagine a good friend was doing the work instead of a random contractor. Do you think he'd greet his friend every morning by going over every single thing that was done imperfectly the day prior?


That sort of behavior happens often in marriages. The "accounting of flaws" there isn't the first sort of "scar tissue", it's something that happens after a bunch has built up. It then layers up and makes it progressively harder and harder to repair the relationship.


I like that this essay frames up this issue, but it's ultimately kind of disappointing in its conclusions. Relationships wear out because they develop scar tissue, and they develop scar tissue... because they do. And there is no clear way to prevent that from happening except to try hard. He doesn't even go into any of his strategies, meaning you're just as on your own as you were before you started reading the essay! It feels to me that a lot more could be said or conjectured on the topic.


I talked about this with my now wife when we started dating. We need to be open and honest with each other, and if there are issues we need to talk about them ASAP instead of thinking "it will somehow be fine".

Once you can safely establish that, it's not really hard work. Just need to be able to feel comfortable enough with the person to say your real honest thoughts and feelings.

I understand that for some people it is really hard to express what they are thinking and feeling, to anyone, even themselves, but if you work on that, then the rest becomes easier.

It was hard for me, this last part, and I had to find some good books and resources to help me understand myself first. The books that helped the most were Nicomachean Ethics by Aristotle, and Before You Know It: The Unconscious Reasons We Do What We Do by John A. Bargh.

Aristotle shows you that there is a way to find the middle in any context, that there is no real "best" or "right way" in anything, and that it depends on the person. This allowed me to see others' perspectives, empathise better, and not feel too bad when there are conflicting opinions, since none of us are the same.

"Before You Know It" allowed me to see how we think, subconsciously and consciously, and how some things are in our control and some aren't.

I hope this helps.


You also need to be comfortable enough with yourself to hear the other person's real honest thoughts and feelings.


That's very accurate, I agree.


Interesting. Is there a list of thoughts which he provided?



Nice! Too bad it is just the titles...


The article does in fact mention this event, in the paragraph below:

"The hypothesis took another hit last July when a bombshell article in Science revealed that data in the influential 2006 Nature paper linking amyloid plaques to cognitive symptoms of Alzheimer’s disease may have been fabricated. The connection claimed by the paper had convinced many researchers to keep pursuing amyloid theories at the time. For many of them, the new exposé created a “big dent” in the amyloid theory, Patira said."


I think the situation is more complex than just "we must assume they don't care about learning ... just fail the bastards." It's more important to first understand the social, environmental, cultural (and other) causes of this behavior.

Specifically, different systems of incentives and permissiveness will produce different behavior. I taught high school computer science for 4 years, and I can attest that cheating occurred in the classes I taught. I've also been enrolled part-time in Stanford's MS in CS, and have taken a number of the core undergraduate courses for CS majors.

I also went to a hypercompetitive US public high school with a number of brilliant classmates, many of whom also cheated.

My experiences have shown me that there is a wide spectrum of "cheating," ranging from students sharing things like, "I was at office hours and heard from the TA, who heard from the professor, that topic X is going to be really emphasized on the exam, so you better study for it!" to outright blatant copying of others' code or answers.

What I've noticed as qualities of a learning environment that seems to increase the likelihood of cheating are:

1. The technological ease of cheating: it's easier to cheat on an asynchronous online exam than when you're taking it synchronously in a large classroom.

2. How "high stakes" the course is for students: for students at institutions like Stanford, where they may be used to a certain level of academic success, failing a course isn't just a blow to their transcript -- it's a psychological blow to their identity as a "smart student." They may find it easier to cheat and maintain their self-image (and the image they project to family and friends) as a great student than to take the honest hit to their GPA and give up that identity.

3. How "legitimate" the course feels: classes where the instructor is widely perceived as "unfair" or "incompetent" seem to have more cheating. Students who feel disrespected ("How could she put X on the exam? We barely covered it!") or unvalued ("He doesn't even bother giving clear instructions on the homework assignments. Why should we respect his test?") may try to 'retaliate' by cheating.

4. How permissive the academic culture is around cheating: if cheating is widely perceived to carry little to no consequences, or is seen as "well, everyone does it," then you will have a lot more cheating.

I'm sure the above is not an exhaustive list. My broader point is that in order to address the issues around cheating, we need to be more encompassing than simply punishing the cheaters. If the stakes are high enough, and the incentives strong enough, cheaters will still exist even if they are aware of the severity of the punishment.


> 1. The technological ease of cheating: it's easier to cheat on an asynchronous online exam than when you're taking it synchronously in a large classroom.

I went to school pre-2000, so the Internet existed but was not as prevalent as it is today. What struck me most in the article is how easy it is to cheat today. Real-time group chats, easy sharing of screenshots and quizzes, the volume of easily copied content on the Internet, and tools available 100% of the time simply put the fraternity's list of historical quizzes and copied texts in the library to shame. The ease of cheating today is one less barrier that people have to cross to compromise their morals.

I think if a goal of post-secondary education is to prepare their populations for professional success, then you're right, simply punishing cheaters does not achieve that goal. But our world today is full of examples where it's easy to take the less moral or ethical road and suffer little to no consequences. Hopefully, schools will not succumb to that too much.


I recently took a CS class at Stanford with an interesting policy on cheating. While cheating almost certainly happened during the course, at the end of the quarter the course staff made a public post allowing any student who had cheated to send a private message to the staff admitting they had done so.

If a student admitted to cheating, while they would face academic disciplinary action (i.e., receiving a failing or low grade), they would not be reported to the administrative office that deals with issues of academic integrity, and therefore would not face consequences like expulsion or official academic probation.

However, if a cheating student decided to risk it and not admit their guilt, they were at risk of a potentially even greater degree of punishment. The course staff would run all students' code through a piece of software to detect similarities with each other's submissions, as well as with online solutions. Students flagged by this software would then have their code hand-checked by at least one member of the course staff, who would make a judgment call as to whether it seemed like cheating.

I found this policy quite interesting. As a former high school teacher, I've certainly encountered cheating in my own classes, and have historically oscillated between taking a very harsh stance and a perhaps overly permissive one.

The one taken by the lecturers of this course offered a "second chance" to cheaters in a way I hadn't seen before.


That sounds great and all, but I honestly have doubts about this software that detects similarities… there are only so many ways to solve the bland questions that professors lift from books; kind of ironic. I'm assuming it's basically doing AST analysis and is no smarter than eliminating things like renamed variables.

They are basically stating that this "software" is 100% accurate. Furthermore, it's then left to the whims of some TAs?

No algorithm can detect cheating unless the number of permutations is very, very large (i.e., odds like being struck by lightning). Maybe one way to offset this would be to use data captured as the student enters the solution, but that was never the case for us; we just uploaded the source code to their custom-made Windows app.


Speaking from experience using similar software on students' assignments, it is often blatantly obvious when cheating is occurring.

To start with, at an undergrad level most students had fairly distinct coding styles, usually with quirks of not-quite-"proper" coding. Some cheaters had the exact same quirks across multiple students' assignments.

Also, some cheaters had the exact same mistakes in their code, on top of the same code style.

Yes, the software picks up people who write correct solutions with perfect syntax, but those are the ones you just toss out because there isn't any proof there.

The people who get caught cheating generally don't know what correct solutions and good code look like, so they don't understand how obvious it is when they copy-paste their friends' mediocre code.


I agree with you. I run a data science department in a corporation, and when I'm doing code review for a junior, I can tell what was original and what came from somewhere else. Fortunately, in the workplace context that just means trying to get people to paste the SO URL as a comment above the appropriate code block.


Assuming that the software detects a similarity between two or more students' submissions, how do you know which students cheated? What if one of the students (the one who actually did the work) had their program stolen or copied somehow (e.g., left their screen open in the lab, or a printout of the code)?


I teach some courses with coding assignments and we just tell the students very clearly and repeatedly, at the beginning of the course and before each submission deadline, that submitting duplicate material means failing. It doesn't matter if A copied from B, B from A, both copied from an external source, or even A stole B's password and downloaded their data. The penalty is the same. We cannot go into such details because we just don't have the means to find out, and some students are amazing at lying with a poker face.

It's a pity to have to fail students sometimes because they failed to secure their accounts and someone stole their code, but they have been warned and hey, securing your stuff is not the worst bitter lesson you can learn if you're going to devote your career to CS, I guess...


A cheating student enters the lab, turns on the video camera on their phone, walks casually behind other students recording their screens, then reviews the video for useful information. The other students fail. That seems like a plausible but poor outcome, unfair to the student whose info was stolen through no fault of their own.


Indeed, it's plausible enough that I've actually caught students trying to do that.

The problem is: what's the realistic alternative? Just letting cheating happen is also unfair (to students who fail while the cheater passes). And finding out what exactly happened is not viable because students lie. We used to try to do that in the past, but the majority of the time all parties involved act outraged and say they wrote the code and don't know what happened. Some students are very good actors, many others aren't, but even when you face the latter, your impression that they are lying is not proof that you can use in a formal evaluation process or that would withstand an appeal.

So yes, it can be unfair, but it's the lesser evil among the solutions I know.


Ask the students how their code works and how they came up with it. It shouldn't be hard to tell who actually wrote it.


On the one hand, as we know from the P vs. NP problem (at least if we assume the majority opinion), explaining a solution is much easier than coming up with it... and even easier if they copy from a good student who not only writes good code, but also documents it.

On the other hand, even if I am very confident that a student didn't write the code because they clearly don't understand it (which is often the case), this is difficult to uphold if the student appeals. For better or for worse, the greater accountability in grading and the availability of appeal processes means that you need to have some kind of objective evidence. "It was written in the rules that duplicate code would not be accepted, and this is clearly duplicate code" is objective. "I questioned both students and I found that this one couldn't correctly explain how the code works, so I'm sure he didn't write it" is not.

Note that I do this kind of questioning routinely (not only when cheating is involved) and take it into account in grades, because it of course makes sense to evaluate comprehension of the code... but outright failing a student on the grounds of an oral interview can easily get a professor into trouble.


> On the one hand, as we know from the P vs. NP problem (at least if we assume the majority opinion), explaining a solution is much easier than coming up with it... and even easier if they copy from a good student who not only writes good code, but also documents it.

You can ask “tricky” questions that someone who understands the material shouldn't have a problem answering, such as “if the problem required you to also do this, how would you change your code?”.

> "I questioned both students and I found that this one couldn't correctly explain how the code works, so I'm sure he didn't write it" is not.

Fair enough. But at least you can give a bad grade for not understanding the course material.


I would let 100 people cheat if it meant I was sure 1 innocent student wasn’t punished unjustly.

People that don’t cheat may benefit in the future for not doing so.

I say "may" here because I generally found university education to be useless for myself. Instead, I wish I had met the folks I consider mentors at work earlier in my life.


> I would let 100 people cheat if it meant I was sure 1 innocent student wasn’t punished unjustly.

This makes sense in the justice system, but there you often can find proof of what happened, so the system still acts as a deterrent even if a fraction of criminals get away with no punishment. In university assignments, most of the time it's practically impossible to find evidence of who copied from whom, so applying that principle would basically mean no enforcement: everyone would be free to cheat, and assignments would just not make sense at all.

Also, failing a course is far from such a big deal as going to jail or paying a fine. At least in my country, you can take the course again next year and the impact on your GPA is zero or negligible. You will have an entry in your academic record saying that you failed in the first attempt, but it won't be any different from that of someone who failed due to, e.g., illness.

If the consequences were harsher (e.g. being expelled from the institution, or something like that) then I would agree with you.


Put a security camera in the lab. Catch a student doing something like that and you have grounds for expulsion.


When I was a TA checking "Intro to programming" HW assignments, my brain was the similarity check software.

Anyway, when I detected two basically-identical submissions, I would call in both students to my office. I would chide them, explain to them that learning to code happens with your fingers, and that if they don't do it themselves, then even though they might sneak past the TA, they'll just not know programming, and would be stuck in future courses.

Then I would tell them this:

"Look, I have a single assignment here, with a grade, on its own, of X% (out of a total of 100%), and two people. I'm going to let you decide how you want to divide the credit for the assignment among yourselves, and will not second-guess you. Please take a few minutes to talk about it outside and let me know who gets what."

Most times, one person would confess to cheating and one person got their grade. For various reasons I would not report these cases further up the official ladder, and left it at that.


It becomes obvious when you ask them to explain the code. At my university I once overheard a boy and a girl presenting some code "they" had written to a TA. The TA asked them some basic questions on while-loops and function calls. It became obvious that the boy had written all the code and the girl had no clue. So the TA decided that the boy had passed but that the girl had to come back and present the code herself on the next session.


It doesn't matter; both violated academic integrity by letting the copying happen. (Submissions are never simply stolen.) If you think letting copying happen is less severe, you ask them and rebalance the credit based on the work. Most of the time, 'they made it together'.


Curves are the lesser evil. There are professors who don't give good grades at all. If you take a course run by them, you can't get more than a C. Meanwhile, there are other professors where everyone gets an A easily.

Most students will probably pick the easy professors who give only As, because for them the degree is just a ticket to a job.

In fact, those "tough" professors can have an adverse effect on those who picked the harder route. If you don't get good grades, you will have a lower GPA, and that dream company will not even invite you for an interview. Some automated HR system will reject your application. They don't care that you went to a professor who taught you a lot -> they only see the low grade.

Same for scholarships: a tough professor already makes it difficult to get good grades, but if you are graded without a curve, you get a bad grade -> and can lose your scholarship.

Nobody cares about you as a person, or about your knowledge; they measure you by your grades.

This is a tragedy of the commons in some ways: professors are supposed to give good grades, otherwise students won't choose them. Those who want to know more are punished for it in multiple ways (first of all, they need to study more, but then they get a lower grade, which means a lower GPA, which can lead to worse job offers, no scholarships, etc.).

If you want to be a "popular" professor, just pass everyone?

On a side note, at those great universities, don't they pass everyone anyway? I think the front page had an article some time ago saying that when you get into the Ivy League, you will get a B or C even if you are bad; they generally don't kick out students who try to study but aren't particularly good.

Curves wouldn't be needed if every course had an objective list of material that should be learned - but even this is difficult, and not comparable between professors at the same university, not to mention different ones, despite standards and various efforts (not to mention measuring whether students really know the whole list).


How is sharing knowledge "violating academic integrity"? Unless given specific and explicit instructions not to reveal working solutions, sharing your code is literally just "helping" others; it's up to them either to study it and produce their own versions, or to just blatantly copy and cheat.


Because each university has university-wide rules forbidding sharing assignment solutions. It is forbidden by default, before the course even starts, unless the syllabus or professor directs otherwise. You can't "help" others with their own assignments by giving them your solution. You can't receive direct "help" either.

Edit: here's the text from my alma mater: "Any behavior of individual students by which they make it impossible, or attempt to make it impossible, to make a correct judgment about the knowledge, insight and/or skills of themselves or of other students, in whole or in part, is considered an irregularity that may give rise to an adjusted sanction."

"A special form of such irregularity is plagiarism, i.e. the copying, without adequate source reference, of the work (ideas, texts, structures, designs, images, plans, code, ...) of others, or of previous work of one's own, in identical or slightly modified form."

[https://www.kuleuven.be/onderwijs/oer/2021/?faculteit=500004... translated with Google]


In my time in college I helped a lot of fellow students work through a lot of assignments. I sat down with them and helped them to think through the problem and find examples to learn from that weren't full solutions to the assignment. I helped them find difficult bugs in their implementation by pointing them in the right direction or showing them debugging tricks I found helpful.

What I didn't do was show them my implementation or even talk about how I solved it. Yeah, doing it the long way takes a bit more effort, but the result is that the students I helped actually understood the code they submitted and were better equipped to solve the next assignment without help.


You ask them to do it again in front of you


I implemented the widely used MOSS algorithm (mentioned by a sibling) for my CS department in my senior year. That algorithm doesn't do AST analysis; it just looks at the plain text in a way that is resistant to most small refactorings. MOSS compares sets of k-grams (strings of k characters) between every pair of projects that are under test and produces the number of shared k-grams for each pair of projects. On any given assignment in a given semester, there's a baseline amount of similarity that is "normal". You then test for outliers, and that gives you the projects that need closer scrutiny.

On the test data we were given (anonymized assignments from prior semesters together with known public git repos), we never had a false positive. On the flip side, small refactorings like variable renames or method re-ordering still turned up above the "suspicious" threshold because there would be enough remaining matching k-grams to make that pair of projects an outlier.

Our school explicitly did not use the algorithm's numbers as evidence of cheating and did not involve the TAs--the numbers were used only to point the professor in the right direction. We excluded all k-grams that featured in the professor's materials (slides, examples, boilerplate code). It also helped that they only used it on the more complex assignments that should have had unique source code (our test data was a client and server for an Android app).

My sense was that this was a pretty good system. Cheaters stood out in the outliers test by several orders of magnitude, so false positives are extremely unlikely. At the same time, the k-gram approach means that if you actually manage to mangle your project enough that it's not detected as copied, you had to perform refactorings in the process that clearly show you know how the program works--anything less still leaves you above the safe zone of shared k-grams.
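To make the idea concrete, here's a minimal sketch of the k-gram set comparison described above. This is not the actual MOSS implementation (which also hashes and winnows the k-grams); the function names and the choice of k are illustrative:

```python
def kgrams(text: str, k: int = 5) -> set:
    """Return the set of all k-character substrings of the text.

    Whitespace is stripped first so trivial reformatting
    (indentation, line breaks) doesn't change the fingerprint.
    """
    s = "".join(text.split())
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def shared_kgrams(a: str, b: str, k: int = 5) -> int:
    """Number of k-grams two submissions have in common.

    A pair of projects whose count is far above the class
    baseline is an outlier worth manual review.
    """
    return len(kgrams(a, k) & kgrams(b, k))
```

In practice you'd compute `shared_kgrams` for every pair of submissions, subtract out k-grams that appear in the professor's boilerplate, and flag pairs several standard deviations above the mean. Renaming a variable kills only the k-grams that overlap the renamed identifier, so most of the shared fingerprint survives small refactorings.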


From doing some cursory research, it appears the software in question is called MOSS (Measure of Software Similarity) and is currently being provided as a service [0].

Since it is intended to be used by instructors and staff, the source is restricted (though "anyone may create a MOSS account"). According to the paper describing how it's used [1], "False positives have never been reported, and all false negatives were quickly traced back to the source, which was either an implementation or a user misunderstanding."

Sources:

[0]: https://theory.stanford.edu/~aiken/moss

[1]: http://theory.stanford.edu/~aiken/publications/papers/sigmod...


I used something similar when I was a TA 20 years ago and while your assumption seems reasonable, there are actually a lot of different ways to solve even quite simple tasks and most cheating is very obvious on manual inspection.


Yep... If you're going to go through the effort of completely rewriting a piece of code to try and dodge an AST analysis algorithm, you've effectively just done 70% of the work and put your grade/position at the institution on the line. It's not worth it, and so people don't tend to do that. It's the same thing with plagiarism—students could very well resynthesize a stolen work in their own words. It would still be plagiarism, sure, but it's also putting in a large amount of effort while still being risky.


If you rewrite everything you steal (e.g. never copy/paste), it’s no different from using an especially well written source.


Well, no. It's still plagiarized if you fail to communicate that it isn't your original work. You can't just steal ideas from someone else's paper, even if you rewrite everything. If you rely on another paper for inspiration, you have to cite it. If a student submitted a paper that was just another paper entirely rephrased, that would not be acceptable in the least, even if they cited their source, because the expectation of writing a paper is that you contribute something novel and not just regurgitate someone else's argument.


If the problem is large enough, I do submit that there are multiple (even many) ways of solving it.

I will also say that there are problems where that is not the case. For example, we were told to write simulators for scheduling schemes (RR, MLFQ). Other than using different data structures (even that's a bit of a stretch), I'm not sure how much variance there will be.

Using the right tool for the right job is important.

Just above your post another author posted/cited results of a system that “never produced false positives”.

I think the number that other author cited is probably correct, but presumably the tool is only used in cases where problems are big enough to warrant it.


The problems we had were way way simpler than anything deserving an acronym. You'd think there was only one way to do it and yet it was not hard to distinguish plagiarism.


Do you happen to have a few examples? I’m super curious! How many students were taking the course?


I noticed a swap in your prose (still comprehensible), but just realized that cheating and teaching are semi-spoonerisms (swapping the sound order of a single word)... how apropos!


We have the same policy at my uni in Poland. Admit to cheating without being called out? Depending on the professor's mood, they either allowed you to retake the exam (though the best grade you'd get was the lowest passing one), or you just failed the course and tried again next year.

That way they had less paperwork, and, well, they wouldn't have to report that person.


> I've certain encountered teaching in my own classes,

This kind of mistyping reminds me of those examples of people whose names fit their job, but it's rare to find such an apropos example.

Sorry to derail your point but the juxtaposition of "ch" and "t" here is perfect.


Not juxtaposition, transposition. https://en.wiktionary.org/wiki/juxtaposition


Yes, my mistake, thanks.


The results of checking against existing and other test takers' solutions must be weighed with strong human judgment. Programming problems such as would be asked on tests are essentially like mathematical formulas/algorithms, and there isn't much variation in how a given formula or algorithm can be implemented.


I don't think these techniques are often applied to problems in tests - there are other, simpler ways of catching cheaters there.

They are much more likely to be applied to homework assignments, where the opportunity for copying is large, but the chance of two students producing the exact same 500-1000+ line program is slim to none. Perhaps once in a while a critical function will be copied and no one will realize it, or similarities in a trivial function will be unnecessarily flagged, but this will be relatively rare and quickly discovered in manual review.


There is a lot of syntactic variation possible, both for formulas and algorithms. Even for something as simple as quicksort there is enough natural variation for a class of 30, maybe even 100 (if no references can be used). Anything more complex, and even with references it should be unique.


It's not _just_ trying to be lenient and offer a second chance - it's a way to catch more cheaters. "Turn yourself in and we'll go easy on you... because we might not catch you."


I think a fundamental mistake you (and other commenters on this article) are making is judging the value of the "plain" English version by whether you think it's good writing.

However, the article's intention is to use plain language to be accessible to individuals with intellectual disabilities or other difficulties in language (i.e. recent immigrants).

As a fairly well-read and educated person, I also find the "plain" version dull and uninspiring. However, I accept that the article is trying to make the point that such writing may be more broadly understandable.

For example, I recently had some relatives immigrate from a non-English speaking country. I helped them set up internet and noticed there were multiple points during the company's signup and payment process their lack of English fluency created huge hurdles.

