Hacker News
“Close to 10% of the papers we receive show some sign of academic misconduct” (retractionwatch.wordpress.com)
66 points by tokenadult on Sept 20, 2013 | 48 comments


On a vaguely related note, I know of a crystallographer who would send out Christmas cards every year that looked like postcards, with a partial, yet-to-be-published crystal structure as the image. One year, one of the crystallographers he sent the card to figured out which protein it was, reconstructed the density map (partially from his own data on the same molecule, partially from the solved structure on the card) and published the first PDB entry for it to the RCSB. The ensuing tiff and yelling about academic misconduct was pretty funny. They wound up taking down the PDB entry, but the guy has never sent out another postcard-like Christmas card.


sounds like a good fairy tale. are you claiming he phased data he already had using the model on the postcard? seems unlikely.


Self-plagiarism is a tricky thing. Some redundancy in publications is virtually unavoidable. If you are reporting new results from an experiment that has been previously published, you still need to provide a brief summary of how things work, but, of course, you reference your previous work. This is as transparent and innocent as things get, and isn't self-plagiarism. However, what if you reproduce a figure published in another journal exactly? Then things can get ugly (depending on a lot of legalese no researcher should have to worry about)!

To make matters worse, the same experimental setup is frequently used to run experiments that result in several publications. When you have several papers authored by different overlapping groups of researchers in various stages of review simultaneously, things can get very confusing. The first paper you submit is frequently the last one published. The last paper you submit will then wind up referencing arxiv pre-prints instead of published articles, and if you did plagiarize yourself by mistake, the publisher of the paper submitted last might have absolutely no chance of detecting it. On top of it all, there is frequently interference from editors and reviewers. A well-written and coherent paper on a theory and an experiment testing that theory might be considered "too long" or "too confusing" by a referee, and the editor will jump on that comment and demand the paper be split into two, sometimes only one of which they feel like publishing in their journal. If the researcher had done this of his/her own initiative, many would consider it CV-padding!

Yes, self-plagiarism can be a means to pump up your publication list, but it also happens innocently because publishing is such a confusing counter-intuitive mess! It's probably an idiotic idea, but some kind of pan-journal version control system, even just for figures, would be a tremendous headache reducer if done right!


Half the plagiarism violations are "self-plagiarism", which I assume means copying from your own previously published works. Hardly seems like such an ethical problem to copy from yourself except of course if you're paid by the word.


It's funny that self-plagiarism is the offense he cites most. One of the things that drives me crazy about the current system is how much time is wasted on that largish section called the Introduction. On the one hand, it's my favorite section to write since it provides the most leeway for prose over the dry, wanky obfuscation that makes up the remainder of most papers… but still, every time I write one, I wish I could just include a damn link to any one of the N review papers out there, or even a wikipedia page.


It's an ethical problem when you certify that you haven't done exactly that.


It would also be bad if you get your Bachelor with one of the papers, the Master with the next and they are the same.


Why? If the quality is sufficient to be a Master's thesis, surely you should be awarded the degree.


It increases the amount of crap out there, plus it pads the CV


This is a huge issue. When you slog through 5 papers someone has written and keep getting a deja vu feeling it really sucks.


Not to mention when your deja vu is of deja vu.


We have to keep in mind what the academic definition of plagiarism is, because it isn't the same as the lay person definition. Reusing the same fragment of as few as 3 or 4 words without direct attribution is plagiarism.

For fun, try googling "It increases the amount of crap", and you will find that your very own post meets the academic definition of plagiarism, both for containing a single plagiarised phrase and percentage-wise (46% plagiarised!).

I've seen post grads spend weeks using software to look for identical phrases of 4 words in their final work and previous work and then slightly rephrasing a sentence to avoid duplication of a common phrase and the dreaded self-plagiarism. Then running the thing again and inevitably finding the new wording was a duplication as well. I couldn't see it as anything other than a colossal waste of time. If you are writing about the same topic it's inevitable that you will phrase things in the same way at times.
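The check those post-grads were running amounts to comparing word n-grams ("shingles") between two documents. As a minimal sketch of that idea (my own illustration, not any particular product's algorithm, with the 4-word window and the scoring both being assumptions):

```python
import re

def shingles(text, n=4):
    """Extract the set of lowercase n-word phrases ('shingles') from a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(new_text, old_text, n=4):
    """Fraction of the new text's n-word phrases that also appear in the old text."""
    new, old = shingles(new_text, n), shingles(old_text, n)
    if not new:
        return 0.0
    return len(new & old) / len(new)
```

Any common 4-word run pushes the score up, which is exactly why rephrasing one sentence so often just creates a different duplicated shingle: the surrounding common phrases still match.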

All of this does nothing to combat the CV padding you worry about because it only concerns reusing the same exact language, not talking about the same ideas over and over in different ways.

As an outsider it seems that academia has descended into a kind of madness about plagiarism and self plagiarism. I think this is largely thanks to software companies trying to sell their plagiarism detection systems.


Sure but crap as such isn't an ethical problem. In fact, theoretically it would be the job of publications, not authors, to prevent crap from being published.

Plus, if the competition weren't so gruesome and the evaluation criteria weren't so shallow, the pressure to pad one's CV wouldn't be so great.

"Holy ethical violation batman, the slaves are slipping out of their manacles and they certified on their honor they would not do that"


Yes, it is. It's an ethical problem because making researchers sift through more crap just so that you can pad up your CV is slowing down further research into your field.

I know it's an oversimplification, but the essence of it is true: curing cancer is already hard enough. It shouldn't be made any harder by ambitiously exceeding your dead tree quota.

Sauce: abandoned an academic career :-)


Uh, by that token, a badly designed desk which makes things harder for researchers is an ethical problem.

I wouldn't minimize the problem of having a bunch of crap in research or what-all else. But the thing is that, as you know, the academic world puts extreme pressure on people both to follow the rules and to make themselves look good. When any violation of a rule-set can be called an "ethical violation" when it's really not that, it cheapens the whole concept of ethics. Which isn't surprising given how monumentally unethical it is to set up such a system.

Essentially, I can't help seeing people with a lot of power engaging in seriously unethical things and happy to blur the lines between that and people cutting corners in response to pressure (and that stuff winds up putting more pressure on people, which doesn't actually stop the corner-cutting).


I would say intentionally designing a desk that is uncomfortable and will slow people down, and then passing it off as comfortable, is unethical.


Parse the sentence above in the context of being a reply to another post, darn it.


I felt a little uneasy reading this in the article. I write CS conference papers that are all in the same area, so obviously, some of the background information (which is a pretty small part of the overall paper) is basically copied from paper to paper. And that part is the most polished.

I don't feel uneasy because I'm doing anything remotely unethical---I'm not. I feel uneasy because it sounded a bit like the description in this article was over-broadly defining self-plagiarism.


Is this an issue with conferences? My guess is that conferences would prefer to have interesting topics, and probably wouldn't care as much if the material was partially recycled. I have known people who presented early results of a study at one conference, and the final results at another. They were upfront about it, so it wasn't unethical, but I guess you could accuse them of resume padding. Personally, I would prefer a bunch of really interesting presentations at a conference with some recycled material, rather than a bunch of boring presentations that are all original (although if you attend a lot of conferences with the same people, you might prefer the latter I guess).


The conferences I submit to don't care if some of the background material (which is sort of boilerplate---but it is nonetheless necessary to state your assumptions) is similar across papers. Really, you couldn't do it any other way.

I will say that academic (i.e., scientific) conferences should never include "recycled" or "old but interesting" material. There has to be some new research results. Otherwise, it's not science. The point of these conferences is not to entertain, it's to push forward the scope of knowledge.


I really don't view this as resume padding. Think of it in terms of the agile methodology - you don't want to spend a year of your life working on an idea that the rest of your community thinks is irrelevant or has a major flaw. So you publish your preliminary findings in a workshop, garner feedback, and then publish the full work later on in a conference. Seems like a sensible approach to me.


I agree, that is the smart approach.


Exactly. A research group will build an experimental setup and then do several experiments with it. In each paper you need to describe the experimental setup, so it makes sense to write that description once and use it in several papers while only changing the parts of the setup that you changed for the new experiment. Another case in which this often occurs is in the introduction. Each paper has a one or two paragraph introduction to the field. For instance papers about breast cancer maybe have an introduction that goes "Breast cancer occurs in X% of women over N years old, etc etc". Since a researcher's papers are most times in the same field, it makes sense to reuse those paragraphs in several papers. They may call this "self-plagiarism" but in my opinion there is nothing unethical about this.


Agreed, you write the same stuff over and over again. This isn't "plagiarism", it's "I've been working on the same topic for 20 years and there is nothing new to report in this portion of the paper, but here is something new". That is the point of a paper.

Either way, I agree with everyone else in this thread as well. The system of publish or perish is 1. dumb 2. old 3. not moving science forward. It's become more of a "I need xyz papers published this year" instead of "I'm curious about xyz and how it solves abc". That isn't science, it's a career/ego metric.

Good news is that there are groups (like myself, science exchange, figshare etc) who are working to fix that. It's a matter of time, but a social change to science is starting.


A major issue is that you assign copyright to many publications, meaning that it's infringement to reuse an image in another publication. The rest of the problem is basically one of passing off already published results as new ones. Of course, because reviews take for fucking ever and can be hit by unexpected obstacles, you will not know the publication order of your articles in advance, so properly referencing parts of the work as already published while it's in pre-pub limbo is difficult. Arxiv helps a lot there, but there are still many traps.


I've said as much in the article comments, but it's worth repeating here: there's really no surprise that this sort of thing happens when more than a million articles are published every year.

There is so much pressure on researchers to publish. I really do think the way to solve this sort of problem is to find ways to give researchers credit for other forms of contributions.

This is what figshare is doing with datasets, and what we're doing with peer-review: http://blog.publons.com/post/61380784056/announcing-doi-supp...


For Computer Science Ph.D.'s, I think you should get credit for writing code.

Either production-quality open source code, or pedagogical code. I'm looking at the Stanford Pintos kernel and MIT xv6 kernel. While there were minor papers from those projects, I think they were more like a labor of love. When you consider the coding effort, those projects probably took 10x the effort of a typical paper.

But yeah it would be better if a little more time was spent on code vs. papers.

I actually attended a talk from an Adobe researcher talking about software abstractions some years ago. He advocated that you should be able to get a Ph.D. for finding a good abstraction, e.g. for say modeling a paint brush or something. There are lots of bad ways to write code but only a few good ones. Even better would be to write it in a way so that other people can actually learn from it.


Part of being a scientist is learning how to communicate your results. And anyway, writing is required for good thinking.

As a PhD student implementor, yes, my theory colleagues get way more publications than I hypothetically could even if I were a better student than I am. That makes it hard to get an academic job, but otherwise doesn't matter too much.


Sure, I didn't say you shouldn't write. There are plenty of people who write great code and then write about it. The author of Redis has a great blog, where he explains various design and implementation issues. That kind of work may not qualify for most academic journals. But it is a lot more valuable than a lot of academic research and should be a valid way to get a Ph.D.


Is what he is doing scientific research in computer science? Possibly, if he also writes up his results so that they can go into the scientific literature. If he does that, he can get a PhD.

A PhD in CS does not mean "extremely good software engineer," it means "scientist."


I'm not sure you can really write production quality code in a Ph.D., because that requires a production environment, which just isn't available in academia, and also because writing such code conflicts with research goals.


Sure you can. Plenty of 16 year olds have written, say, window managers that thousands of people use -- that's production quality code.

Your second argument is circular -- I'm saying a valid research goal should be to write solid and useful code, and then explain it.

For example, you could write an OS kernel or kernel subsystem to fill some particular part of the design space. A great but rare example is what the authors of Lua have done.


All I mean is that fixing non-essential bugs, making your code pretty, worrying about small performance regressions, writing thorough documentation, polishing the user interface, porting your code to N platforms, dealing with copyrights and patents and trademarks, ensuring good distribution of your code either by selling it or making packages for major distros, and taking care of your users generally doesn't help you to get papers published.

Academia wants new ideas. Production (industry) generally revolves around making a new idea useful to a broad range of people, by which point it's an old idea. You can take something written in academia and put it into production, but if you do that in your Ph.D. it almost certainly isn't helping you to finish.


You're still arguing circularly. I know that what you mention is exactly the reason why academics don't write more code.

I'm saying it would be better for society if the academic culture emphasized the craft of coding, rather than solely "new ideas". The whole point of this article is that the emphasis on "new ideas" incentivizes fraud.

Academic culture changes faster than you think. I expect that the structural changes caused by online courses will have a big effect in the near future.


In my experience, it's already at the point where a focus on trade and craft actually hampers innovation in an academic context, because so much of the funding comes from industry. Most papers today are about safe, incremental improvements rather than bold new ideas, because there's no obvious money in significant breaks from tradition. I think we need to let academia and industry each do their job: academia can produce great, wild ideas and discover fundamental truths, and industry can refine them into something profitable.


Nah. You can write great code inside Racket for a PhD using tooling that's light years ahead of almost any development platform on the planet.


ImpactStory monitors github, which is pretty cool. Example: http://impactstory.org/CarlBoettiger


The publish-or-perish rat race is one of many factors degrading academia. Couple that with steadily declining numbers of tenure-track faculty, making the glut of PhD students and early-career faculty ever more desperate to pull out all the stops to secure a meaningful career, and it really shouldn't be surprising to see a lot of misconduct. People are desperate.


You can't enforce such a brutal publish or perish pressure on someone's livelihood without selecting to some extent for people who will abuse the system. I personally think the whole system is fundamentally flawed, given the extent to which it relies upon trust. You're operating a very, very competitive market, where the integrity of the 'product' is almost entirely self-verified.


Kind of ironic that we have lawyers from elsevier talking about misconduct when they have this http://www.the-scientist.com/?articles.view/articleNo/27383/... in their closet as well as sundry other things.

On the other hand, maybe they really are trying to improve themselves and I am being too judgemental.


Important issue. It is good to see Elsevier, a publisher often thought to be more concerned with profits, highlighting the problem.


I don't see why that matters. This is an issue with absolutely every journal, and Elsevier should die regardless. Ask the editors of any open access journal what percentage of submissions show problems like this; you'll likely get the same response. This in no way excuses or mitigates the evil that Elsevier has done to the world.


And those are the ones who do a sloppy enough job to get caught.


I can speak for Computer Science and the problem is "quid pro quo". Irrespective of what the conferences/journals say, reviews are NEVER blind and the people know each other in the community. No one "dares" to cross lines or report in fear of being censured.


Exactly. I've experienced this in mathematics. Just knowing the topic of a paper will typically narrow down the possible list of authors to about 5 people. From there, the writing style or other clues typically give it away.


I thought this was going to be a study of papers written by undergrads for class. I thought 10% sounded extremely low.


Elsevier talks about ethical problems. There is a joke in here somewhere.


What happens to them, are they blacklisted from future publications?



