SSE4.2 and the new CRC32 instruction (byteworm.com)
48 points by rbranson on Oct 13, 2010 | 18 comments


VIA has had hardware AES encryption, SHA hashing, and a hardware RNG in its CPUs for quite some time now.

http://www.via.com.tw/en/initiatives/padlock/hardware.jsp


http://en.wikipedia.org/wiki/AES_instruction_set It seems Intel also has a similar instruction set. It could be useful if you need a lot of encryption (TrueCrypt partition encryption, etc.).


I've read varying comments regarding this with respect to the Intel Core CPU models that have it active. Early on, a review comment or two claimed to observe little effect, e.g. with TrueCrypt. More recent comments I've seen have claimed to observe a much greater effect; IIRC TrueCrypt 7, recently released, is the first version to use the AES instructions when available.

P.S. Note, if you are interested in this feature, that not all levels of the Core product line have it. For example, in the current mobile line, I believe it's present/activated only at the 520M level and above (although my knowledge is some months out of date at this point).

Ah, I see this information is echoed in the Wikipedia reference:

http://en.wikipedia.org/wiki/AES_instruction_set#CPUs_with_A...


The IBM System Z also has a crypto coprocessor. It's pretty neat stuff if you ask me.


also: http://www.strchr.com/strcmp_and_strlen_using_sse_4.2

SSE 4.2 introduces four instructions (PcmpEstrI, PcmpEstrM, PcmpIstrI, and PcmpIstrM) that can be used to speed up text processing code (including strcmp, memcmp, strstr, and strspn functions).


Does anyone know any systems where generating CRC32s is a bottleneck?


Perhaps not a bottleneck, but for very high network throughput systems, offloading the TCP checksumming can make more CPU available for other tasks. Often the offload engines built into network cards are either slow or have very poorly written drivers.


I feel like hardware TCP/IP stacks would probably massively improve performance by removing all details of the communications from the OS and processor. In saner OSes, this would just mean that one of the higher levels of abstraction would deal directly with the hardware instead of going through a few more layers of abstraction.

Does this make sense? These communications seem so central to modern computing that they deserve specialized hardware support.


Many network cards offload some of the low-level protocol work, typically Ethernet and IP checksums, and some go further, doing deeper inspection and preparing packets conveniently for the driver if the OS supports it.


I forgot about TCP's use of CRC32. Still, I wonder if the time spent in CRC32 is a noticeable fraction of the total time. I guess it must be in some situations, otherwise why would they put it in the chip...


That might be because TCP does not use CRC, but its own checksum algorithm (faster, less reliable).


...and very often it is calculated by the network card itself and left blank by the OS. (See the checksum errors in Wireshark.)
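
For reference, the checksum TCP uses is the 16-bit ones' complement sum described in RFC 1071, not a CRC. Below is a minimal sketch of that sum in Python. The function name is illustrative, and it only covers the arithmetic: a real TCP stack also feeds a pseudo-header (addresses, protocol, length) into the same sum, which is not shown here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071 (sketch)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF  # ones' complement of the folded sum


# Example bytes from RFC 1071: the folded sum is 0xDDF2, so the checksum is 0x220D.
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
```

A receiver can verify by running the same sum over the data plus the checksum field; a result of 0 means the check passed. It is only additions, which is why it is cheaper than a CRC and also why it misses some errors a CRC would catch (e.g. reordered 16-bit words).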


iSCSI with Data/Header validation is the main motivation for including CRC32.
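
To make this concrete: the iSCSI digests use CRC32C (the Castagnoli polynomial), which is also the polynomial the SSE4.2 CRC32 instruction implements, so it is not the same CRC-32 that zlib or Ethernet use. Here is a bit-at-a-time software sketch in Python of what the hardware computes. The function name is illustrative, and real software implementations would normally be table-driven rather than this slow loop.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reflected


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32C; pass a previous result as `crc` to continue over more data."""
    crc ^= 0xFFFFFFFF  # initial inversion (undoes the final inversion when chaining)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final inversion


# Standard check value for CRC32C over the ASCII string "123456789".
check = crc32c(b"123456789")  # 0xE3069283
```

The inner loop runs eight shift/XOR steps per byte, which is the work the hardware instruction collapses into a single operation on up to a full word of input.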


Although I can imagine a significant number of CRC32 operations in high-capacity gateways (LTE gateways, IP core routers, and so forth), Intel processors are not so prevalent in those areas. (Edit: this is anecdotal, of course; it's just my perception of the area, and I haven't looked up concrete numbers. MIPS-based multicore processors seem to be more commonly used.) Still, there may be a significant benefit from hardware implementations of those operations (checksumming, hashing, Triple-DES/AES for IPsec, etc.).


Yes, they all use Adler32 instead because it's faster.
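
For anyone who hasn't looked at it, Adler-32 (the checksum zlib uses) is just two running sums modulo 65521, which is where the speed in software comes from. A minimal, untuned sketch in Python follows. The function name is illustrative, and the result is compared against the standard library's `zlib.adler32` as a sanity check.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    """Plain Adler-32: a is the byte sum, b is the sum of the running a values."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


# Sanity check against the zlib implementation shipped with Python.
matches_zlib = adler32(b"hello world") == zlib.adler32(b"hello world")
```

The trade-off is weaker error detection than a CRC, especially on short inputs, where the sums do not get a chance to spread over all 32 bits.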


A SHA-256 instruction would be handy for block-level deduplication.
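
A rough sketch of why: block-level dedup hashes every fixed-size block and stores each distinct block only once, keyed by its digest, so the hash is computed over every byte written. The Python below is a toy illustration under assumed names (`dedup_blocks`, `rebuild`) and an assumed 4096-byte block size; it is not how any particular filesystem does it.

```python
import hashlib


def dedup_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy per SHA-256 digest."""
    store = {}   # digest -> block contents (each unique block stored once)
    recipe = []  # ordered digests needed to reconstruct the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # only the first copy is kept
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the block store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Three identical blocks plus one different block: only two blocks get stored.
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = dedup_blocks(data)
```

Here `len(store)` is 2 while `len(recipe)` is 4, and `rebuild(store, recipe)` returns the original bytes. The hashing step dominates the CPU cost, which is where a hardware SHA-256 would help.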


He probably should have compared one assembler program with another, but CRC computation is so simple that the C program probably generates optimal assembly code.


Even though the C program might generate optimal assembly, it doesn't generate optimal assembly for calculating a CRC32 checksum -- that's the whole point of the article: new Intel processors have a single instruction to do this.



