More

Aurel1us · on Nov 1, 2014

Is it normal that the sound continues after I close a Tab (Windows 8.1 with Chrome 38.02)?

Aurel1us · on Nov 1, 2014

Looks very promising. Did a quick set up but unfortunately some counters couldn't be initialized on my VM.

Aurel1us · on Aug 20, 2014

Just as a reminder: "So, malloc on Linux only fails if there isn’t enough memory for its control structures. It does not fail if there isn’t enough memory to fulfill the request." - http://scvalex.net/posts/6/

quotemstr · on Aug 20, 2014

Malloc can also fail if you're out of address space without necessarily being out of memory.

NT does a much better job of separating these concepts than Unix-family operating systems do. Conceptually, setting aside a region of your process's address space and guaranteeing that the OS will be able to serve you a given number of pages are completely different operations. I wish more programs would use MAP_NORESERVE when they want the former without the latter. (I'm looking at you, Java.)

One day, perhaps when I am old and frail, we will achieve sanity and turn overcommit off by default. But we're a long way from being able to do that now.

klodolph · on Aug 20, 2014

These days, I describe malloc() as "a function which allocates address space", to avoid confusion. Which means it makes sense that malloc() returns NULL if you are out of address space, even if you have lots of memory. (But so many people don't check malloc()'s return anyway...)

stonogo · on Aug 20, 2014

This is a mostly-untrue statement because it makes unreliable assumptions about the host system. It depends on the vm.overcommit_memory setting and the programmer should never make assumptions about why or when malloc might fail. Read more on Rich Felker's excellent blog post here: http://ewontfix.com/3/

acdha · on Aug 20, 2014

That page is incomplete: malloc() will fail on Linux depending on the `vm_overcommit` setting:

https://www.kernel.org/doc/Documentation/vm/overcommit-accou...

jjnoakes · on Aug 20, 2014

malloc() can fail in other ways.

One is if you set a lower process limit.

Another is if you are allocating lots of memory with alternating mprotect() permissions. On some systems (AIX for example) this uses up all of the memory for control structures WAY before hitting the address space limit (I've seen it fail after just a couple of GB).

TheCoelacanth · on Aug 20, 2014

It will also fail if there's not enough address space to fulfill the request.

Aurel1us · on May 11, 2014

Does anyone have additional links for that field of research? Favored without charge PDFs.

Aurel1us · on April 12, 2014

An identifier must start with $, _, or any character in the Unicode categories “Uppercase letter (Lu)”, “Lowercase letter (Ll)”, “Titlecase letter (Lt)”, “Modifier letter (Lm)”, “Other letter (Lo)”, or “Letter number (Nl)”.

From: http://stackoverflow.com/questions/1661197/valid-characters-...

Aurel1us · on May 19, 2013

Short answer: \d includes all the Unicode characters from http://www.fileformat.info/info/unicode/category/Nd/list.htm

ars · on May 19, 2013

Is that actually a good thing? If I'm using \d to validate numbers (for example to check before string to int conversion, or IP address, phone number, or any other use), other unicode digits are not helpful to me.

It's great to support unicode, but I don't think the \d should have been extended this way. Add a \ud or something.

Tuna-Fish · on May 19, 2013

Given that the category is specifically "decimal digit", I think it's good, so long as the number parsing code accepts them all too.

dllthomas · on May 20, 2013

Yes. Assuming that, it's good. I think that assumption is likely to be invalid in many cases, though.

rmc · on May 20, 2013

Yes it's a good thing. There are other places in the world that don't just use ascii. If you want European style numbers just use [0-9]

bellbind · on May 19, 2013

If you use a preg engine you can add the /a modifier which excludes unicode chars from matches.

chebucto · on May 20, 2013

Maybe specify the subset of unicode you're expecting in the headers, and have the compiler do the nitty gritty?

wging · on May 19, 2013

...at least in C# regexes.

ars · on May 19, 2013

Anyone know if this happens in other languages?

yahelc · on May 19, 2013

Doesn't appear to in JavaScript:

    "੧".match(/\d/); //null

(Incidentally, this may explain the finding from http://stackoverflow.com/a/16622773/172322, as to why adding the RegexOptions.ECMAScript flag in the C# code eliminates the performance gap)

deskglass · on May 19, 2013

Nor in python:

print re.match(r'\d','੧')

None

wulczer · on May 19, 2013

it does when using the re.U flag

  re.match(r'\d', u'੧', re.U)
  <_sre.SRE_Match at 0x3070ac0>

  sys.version
  2.7.3 (default, Mar  4 2013, 14:57:34) \n[GCC 4.7.2]

tcas · on May 20, 2013

Also, when using Python 3.2 it seems to be the default behavior

  Python 3.2.3 (default, Oct 19 2012, 20:10:41) 
  [GCC 4.6.3] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import re
  >>> re.match(r'\d', '੧')
  <_sre.SRE_Match object at 0x7f188f6d4850>

Falling3 · on May 20, 2013

Yes, but not by default.

jrabone · on May 20, 2013

Not true for Java. Docs even say:

  \d         A digit: [0-9]
  \p{Digit}  A decimal digit: [0-9]

which is actually somewhat depressing. I'd expect the named class to include the full Unicode digit set. It's surprising to see:

  ab1234567890cd matched 1234567890
  ab𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫𝟢cd no match

from code using Pattern.compile("(\\p{Digit}+)");

EDIT: and perhaps more surprising to see in the logs:

  Exception in thread "main" java.lang.NumberFormatException: For input string: "𝟤𝟥𝟦𝟧"
  	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
  	at java.lang.Integer.parseInt(Integer.java:449)

That'll keep someone guessing for a while...

nspragmatic · on May 20, 2013

It happens in Objective-C:

    NSString *pattern = @"\\d", *string = @"੧";
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:nil];

    NSUInteger numMatches = [regex numberOfMatchesInString:string
                                                   options:0
                                                     range:NSMakeRange(0, [string length])];

    numMatches ? NSLog(@"%@ found by %@", string, pattern) : NSLog(@"%@ not found", string);

    // 2013-05-20 09:38:42.650 Regexperiment[17848:c07] ੧ found by \d

LawnGnome · on May 19, 2013

Happens in PHP only if you enable Unicode regex handling via the /u modifier and are running libpcre 8.10 or later (which corresponds to PHP 5.3.4 and later, assuming you're using the bundled libpcre): http://3v4l.org/QD3k0

bodyfour · on May 19, 2013

If you're using pcre directly from C code, this is controlled by specifying the PCRE_UCP flag to pcre_compile(). By default, \d and friends only match ASCII characters even if the PCRE_UTF8 flag is set.

xudongz · on May 19, 2013

Not true for Go

http://play.golang.org/p/ls96RxJxpz

nknighthb · on May 19, 2013

I would be reluctant to rely on this until the Go documentation is clearer on the intended behavior. Right now it's very poorly specified. The regex doc[1] talks about "same general syntax" as Perl, but points to [2], which doesn't seem to understand what it's saying, describing '\d' in terms of its "Perl" meaning, but then saying that it's [0-9].

[1] http://golang.org/pkg/regexp/

[2] https://code.google.com/p/re2/wiki/Syntax

laumars · on May 19, 2013

As a Perl developer that's been making the switch to Go, I've been caught out a few times with Go's no-so-Perl-like regular expression syntax. In fact I wish I knew about your 2nd link before now, because that could have saved me a few hours over recent months.

jamesmiller5 · on May 20, 2013

Considering go's vocal support for UTF8 I'm surprised at this behavior and curious to the reason for excluding it.

masklinn · on May 20, 2013

Supporting UTF8 and correctly handling unicode are very, very different beasts. The former is absolutely trivial, the latter is extremely difficult.

Go is vocal about the former, but seems to not give a shit about the latter.

snogglethorpe · on May 20, 2013

I'm not particularly fond of go, but "correctly handling unicode" can be subjective and case-dependent... I think making only minimal guarantees and punting to the application is often the only sane course.

masklinn · on May 20, 2013

> "correctly handling unicode" can be subjective and case-dependent...

So is correctly handling integers.

> I think making only minimal guarantees and punting to the application is often the only sane course.

That is completely and utterly crazy, the average developer has neither the knowledge nor the resources to make anything but a mess out of it without proper tools and APIs. Even with these (including a complete implementation of the unicode standard and its technical reports) unicode is already complex enough to deal with.

snogglethorpe · on May 20, 2013

Of course it's not "completely and utterly crazy."

Not every app needs to deal with the enormous complexities implied by "full unicode support", and given the huge cost of that, there's a real place for a minimalist approach. If all I do with unicode is input strings from the user, store them in a database, and then later spit them out, I don't need to be able to do Turkish case-conversion, and I may not want to pay the cost of making it possible.

Certainly tools and APIs help for those cases where an app needs to do the sort of complicated text-processing that warrants "full" unicode support, but it's not at all clear that the proper place for such support is in the base language libraries. It's quite reasonable for the language implementors to say "if you want to do X, we'll support that, but if you want to do Y and Z, please use external library L."

masklinn · on May 20, 2013

> Not every app needs to deal with the enormous complexities implied by "full unicode support", and given the huge cost of that, there's a real place for a minimalist approach.

Not sure what point you're trying to make, I never said all applications had to make full use of all possible Unicode APIs, I said the language must expose them. Because if it doesn't, those who should use them will never become aware of them let alone use them.

> If all I do with unicode is input strings from the user, store them in a database, and then later spit them out, I don't need to be able to do Turkish case-conversion, and I may not want to pay the cost of making it possible.

So?

> It's quite reasonable for the language implementors to say "if you want to add numbers, we'll support that, but if you want to subtract or divide them, please use external library L."

Really?

Then again, considering Go's embedded contempt for non-US locales (see: datetime patterns) I'm not even sure why we're having this discussion, and since it's obvious they don't care for a non-US world it make sense that they wouldn't care for processing text.

And at the end of the day, you agree that Go has no provision for unicode handling, you just think it's all fine and dandy.

cwmma · on May 20, 2013

All the same speed in JavaScrip http://jsperf.com/regexcwm/2

jeltz · on May 19, 2013

Happens in Perl but not ruby or PostgreSQL.

pfedor · on May 20, 2013

Doesn't happen in Perl for me:

  pfedor@Pawels-iMac:~$ perl -ne 'print "Digit!\n" if /\d/'
  af
  3
  Digit!
  23fa3
  Digit!
  asdf
  ١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹
  ৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫
  ୧୨୩୪୫୬୭୮
  ౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓
  234
  Digit!

(perl from Macports and perl from /usr/bin/perl behave the same in this respect.)

xonea · on May 20, 2013

You have to tell to interpret stdin as UTF-8 (flag -C) - then it works: https://news.ycombinator.com/item?id=5734641

pfedor · on May 21, 2013

Good to know, thanks.

I'd argue that perl gets it right--as the default behavior, this behavior would gravely violate the principle of least surprise, but for the 0.01% of people who want \d to match ੧, there's no harm to making it available as an option you need to specifically request.

Falling3 · on May 20, 2013

Exactly what I was thinking.

Doesn't in Ruby:

/\d/.match "੧" #=> nil

Argorak · on May 20, 2013

Just for reference:

  /\p{Digit}/.match "੧" => #<MatchData "੧">

hkmurakami · on May 19, 2013

oh wow I had no idea that "full width digits" can actually be handled properly. (U+FF10 ~ U+FF19)

coldtea · on May 19, 2013

Or improperly. If you expect \d to be a shorthand for 0-9, your string can also contain junk.

Aurel1us · on May 13, 2013

Thanks a lot - wonderful collection!

Aurel1us · on May 12, 2013

I iz from da (#)

Aurel1us · on May 8, 2013

Anyone know something similar for C/C++ or Python ?

judah · on May 9, 2013

It's based on the Z3 solver, here's Z3Py: http://rise4fun.com/z3py

monksy · on May 8, 2013

Or for Java.

grdvnl · on May 9, 2013

Pex is based on the Daikon dynamic invariant. This document explains some results and examples in certain languages.

http://groups.csail.mit.edu/pag/daikon/download/doc/daikon.p...

rainforest · on May 9, 2013

Is it? I was under the impression it just uses symbolic execution rather than invariant detection.

grdvnl · on May 19, 2013

Your are right. I mixed up dynamic symbolic execution with invariant detection. I know there is something called DySy which does do invariant detection using dynamic symbolic execution.

danbruc · on May 11, 2013

Pex is based on the Daikon dynamic invariant.

That's not true.

danbruc · on May 8, 2013

https://news.ycombinator.com/item?id=5676010

Aurel1us · on April 18, 2013

Came here to post that link. One of the best TED talks ever!

civilian · on April 18, 2013

Well I beat you to it :-P is your username inspired by Marcus Aurelius? (I'm almost done with Meditations and it's great.)

Aurel1us · on April 19, 2013

Right! Love the presence of emperors' names.