Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You can probably tokenize the names so they become irrelevant. You can ignore non-functional whitespace, so that code C remains. Maybe one can hash all the training data D such that hash(C) is in hash(D). Some sort of Bloom filter...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: