Intel's “cripple AMD” function (2019) (agner.org)
500 points by arto on Aug 28, 2020 | 104 comments


> Never rely on benchmark tests unless the benchmarking code is known to be open source and compiled without using any Intel tools.

A serious question: how many common benchmark packages are compiled with ICC or use Intel MKL? I hope the number is limited; otherwise all the benchmarks published by mainstream PC reviewers are potentially biased. If there is a serious ICC-in-benchmark problem, then only Phoronix's Linux benchmarks are trustworthy: the majority of benchmarks on Phoronix use free and open source compilers and test suites, with known versions, build parameters, and optimization levels. Thanks to Michael Larabel for his service to the community.


This concern really only applies to synthetic benchmarks (stuff like SPEC CPU). If you're testing a commercially available application or game as delivered to consumers, this issue does not invalidate the benchmark, it just makes the software vendor a bit of an Intel stooge.


Of course it doesn't matter for home consumers, because the real target audience for MKL isn't desktop user applications or games. Try using VASP or COMSOL or Mathematica on an AMD CPU. Both MKL and CUDA are major "issues" in HPC that constrain decisions when purchasing clusters.


I don't know about VASP because it's proprietary, but I have compared other DFT codes on 64-core Bulldozer nodes against 12-core Sandybridge nodes. I don't remember the numbers, but the high core count was rather effective at reducing the communication costs, all with free software.

https://www.archer2.ac.uk/ will run a lot of that sort of thing. I think at least cp2k and CASTEP are included in the benchmarks, but they're listed somewhere.


When you're actually part of conversations on purchasing a cluster (which, from the way you talk, I gather you haven't been), one that costs $400k to $1M for a smallish to mid-sized system, arguments like "I don't remember the numbers", "they're listed somewhere", and "there's this other random DFT code" aren't effective. The hard fact (which I hate) is that you get results faster with MKL on Intel than with any other alternative. This is even more so with the proprietary software packages that are the gold standards.

> I have compared other DFT code on 64-core Bulldozer with 12-core Sandybridge nodes

And what's that comparison supposed to tell us, aside from the obvious fact that MPI introduces latency? That's just about the number of cores, not the performance of each core. You need to compare a 64-core AMD node against a 64-core Intel node.


I don't remember, because the measurements for the £1M purchase were maybe five years ago, but they taught a useful lesson. I didn't see figures in what I was responding to. If I'd had more influence on the purchase, as opposed to observing the process, we wouldn't have ended up with a pure Sandybridge system, which was a mistake. Anyhow, my all-free-software build of cp2k was faster on it than an all-Intel build on slightly faster CPUs in an otherwise equivalent cluster. I measured and paid attention to the MPI, which benefited everything using alltoallv. The large-core-count AMD boxes were simply a better bet for the range of work on a university HPC system. It's not as if most codes were topping out on arithmetic intensity, or as if there was a serious problem with serial performance, even if MKL had been significantly better than the free libraries, which it wasn't.

For a recent exercise spending rather more money on AMD CPUs for the UK Tier 1 system, look at the Archer2 references and benchmarking. It's expected to run large amounts of VASP-like code; www.archer.ac.uk publishes usage figures for the current system. Circumstances differ, and I'm pointing out contrary experience, with an understanding of the measurements and what determined them.


The go-to Intel-crushing benchmark these days is Cinebench R20, which is built on Intel's Embree raytracer.


So Cinebench R20 is a suspect?! That's not good...


But is that compiled by ICC?


Well, it depends. If you are a big enough outlet or a hyperscaler, you should care about raw performance, because if a sufficient number of people make the same choice as you, the software vendor will adapt and you will end up with the competitive advantage.


Most Python math libraries? So a majority of machine learning folks are affected.


Most Python math and ML libraries are compiled with GCC or LLVM and are linked against cBLAS or OpenBLAS[1]. The latter is highly performant on both Intel and AMD (and other platforms). Some libraries are optionally compiled against MKL, in particular those distributed with the Anaconda Python distribution.

[1] https://numpy.org/doc/stable/reference/routines.linalg.html?...
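
If you want to check which BLAS your own NumPy build is linked against, NumPy can print its build configuration directly:

    import numpy as np

    # Prints the BLAS/LAPACK libraries NumPy was built against
    # (e.g. openblas, mkl), along with library paths.
    np.show_config()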


PyTorch uses MKL (https://www.google.com/amp/s/amp.reddit.com/r/MachineLearnin...) and the workaround for AMD has been disabled by Intel.


I think MKL actually fixed Zen performance. That is, the workaround no longer makes any difference because it is no longer needed.

Small matrix multiply benchmarks on a Zen2 (Ryzen 7 4700U), featuring MKL 2020.1.216+0, OpenBLAS, and Eigen: https://gist.github.com/stillyslalom/bd916e3d26b4531364676ac...

MKL's benchmark performance requires AVX + FMA. 3.5 GHz * (4 add + 4 multiply) * 2 fma/cycle = 56 peak GFLOPS. To exceed 50 GFLOPS without them would imply the CPU ran at 12.5 GHz.

OpenBLAS, on the other hand, actually performed poorly because it was limited to SSE thanks to a bug preventing the CPU from being recognized as Zen2.
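
For reference, here is the back-of-the-envelope arithmetic behind that 56 GFLOPS figure, written out (the same numbers as above, nothing new):

    clock_ghz = 3.5        # boost clock of the 4700U
    flops_per_fma = 4 + 4  # one 256-bit AVX FMA = 4 multiplies + 4 adds (doubles)
    fmas_per_cycle = 2     # two FMA pipes

    peak_gflops = clock_ghz * flops_per_fma * fmas_per_cycle
    print(peak_gflops)     # 56.0 -- so >50 GFLOPS is impossible without AVX+FMA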


> I think MKL actually fixed Zen performance. That is, the workaround no longer makes any difference because it is no longer needed.

Odd. I am trying on my 3700X and it is definitely not using AVX, FMA or AVX2 code paths. Intel MKL 2020 update 2:

     ldd  ~/git/sticker2/target/release/sticker2  | grep mkl_intel
     libmkl_intel_lp64.so => /nix/store/jpjwkkv1dqk4nn8swjzr5qqzp0dpzk2f-mkl-2020.2.254/lib/libmkl_intel_lp64.so (0x00007fe786862000)
I checked the instructions with perf and it is using an SSE code path. Also, as reported elsewhere, MKL_DEBUG_CPU_TYPE=5 no longer enables AVX2 support as it used to.


Comparing OpenBLAS and MKL with `peakflops` in Julia, there's definitely an advantage for MKL:

    julia> using LinearAlgebra

    julia> BLAS.vendor()
    :openblas64

    julia> BLAS.set_num_threads(1)

    julia> peakflops()
    3.9023447970402664e10


    julia> using LinearAlgebra
    
    julia> BLAS.vendor()
    :mkl
    
    julia> BLAS.set_num_threads(1)
    
    julia> peakflops()
    4.8113846984735275e10
That's close to the ~50 Gflops I saw in @celrod's benchmarks.


The plot thickens. As I reported elsewhere in the thread, the slow code paths were selected on my machine, unless I override the mkl_serv_intel_cpu_true function to always return true. However, this was with PyTorch.

I have now also compiled the ACE DGEMM benchmark and linked against MKL iomp:

    $ ./mt-dgemm 1000 | grep GFLOP
    GFLOP/s rate:         69.124168 GF/s
The most-used function is

    mt-dgemm  libmkl_def.so       [.] mkl_blas_def_dgemm_kernel_zen
So, it is clearly using the Zen GEMM kernel. Now I wonder what is different between PyTorch and this simple benchmark, causing PyTorch to end up on the slow SSE code path.


Found the discrepancy. I use single precision in PyTorch. When I benchmark sgemm, the SSE code path is selected.

Conclusion: MKL detects Zen now, but currently only implements a Zen code path for dgemm and not for sgemm. To get good performance for sgemm, you have to fake being an Intel CPU.
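
A quick way to see this from Python, as a sketch assuming a NumPy build linked against MKL: time a float64 and a float32 matrix multiply and compare the implied GFLOP/s. If sgemm hits the SSE fallback, it comes out far below dgemm instead of roughly 2x faster:

    import time
    import numpy as np

    def gemm_gflops(dtype, n=2000, reps=5):
        # An n-by-n matrix multiply costs ~2*n^3 floating point operations.
        a = np.random.rand(n, n).astype(dtype)
        b = np.random.rand(n, n).astype(dtype)
        a @ b  # warm-up
        t0 = time.perf_counter()
        for _ in range(reps):
            a @ b
        return 2 * n**3 * reps / (time.perf_counter() - t0) / 1e9

    print("dgemm:", gemm_gflops(np.float64))  # gets the Zen kernel
    print("sgemm:", gemm_gflops(np.float32))  # hits the SSE fallback, per the above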

Edit, longer description: https://github.com/pytorch/builder/issues/504


Hmm.

FWIW, on my [Skylake/Cascadelake]-X Intel systems, Intel's compilers performed well, almost always outperforming GCC and Clang. But on Zen, their performance was terrible. So I was happy to see that MKL, unlike the compilers, did not appear to gimp AMD.

It's disappointing that MKL doesn't use optimized code paths on the 3700X.

I messaged the person who actually ran the benchmarks and owns the laptop, asking them to chime in with more information. I'm just the person who wrote that benchmark suite.


It seems I have found the issue. We were both right. MKL now uses a Zen-optimized kernel for dgemm, but not (yet?) for sgemm. More details:

https://github.com/pytorch/builder/issues/504


If OpenBLAS' CPU detection fails, you can force it with an environment variable, but why omit AMD's implementation?


Do you know if this applies to epyc as well?


No, I don't. The 32-core AWS systems must be Epyc, so I'll try benchmarking there.

When OpenBLAS identifies the arch, it is competitive with MKL in single threaded performance, at least for matrices with a couple hundred rows and columns or more. But MKL truly shines with multiple threads, so scaling on a 32 core system would be interesting to look at.
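
A rough way to measure that scaling from Python, as a sketch assuming the third-party threadpoolctl package and a BLAS-backed NumPy:

    import time
    import numpy as np
    from threadpoolctl import threadpool_limits  # pip install threadpoolctl

    a = np.random.rand(4000, 4000)
    a @ a  # warm-up

    def matmul_seconds():
        t0 = time.perf_counter()
        a @ a
        return time.perf_counter() - t0

    with threadpool_limits(limits=1):  # pin the BLAS to one thread
        t1 = matmul_seconds()
    tn = matmul_seconds()  # default: all cores

    print(f"1 thread: {t1:.2f}s, all threads: {tn:.2f}s, scaling: {t1 / tn:.1f}x")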


You can see BLIS on Intel's home turf at https://github.com/flame/blis/blob/master/docs/Performance.m... (52-core SKX) and compare with OpenBLAS on 32-core Zen1. (Multithreaded BLAS isn't typically used in HPC, where the parallelism is elsewhere.)


I'd have to disagree: whenever I install new pip/conda packages, an MKL version is downloaded by default (I've never seen any *BLAS version by default in my life).


NumPy binary wheels on PyPI (i.e. from pip) are built with OpenBLAS. NumPy from the official Anaconda, Inc. conda channel defaults to MKL. The conda-forge channel defaults to OpenBLAS.

Explained in the NumPy docs here: https://numpy.org/install/
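
To see which BLAS actually got loaded at runtime (rather than trusting the packaging), one option is the third-party threadpoolctl package:

    import numpy as np  # importing NumPy pulls its BLAS into the process
    from threadpoolctl import threadpool_info

    for lib in threadpool_info():
        # internal_api is e.g. "openblas", "mkl", or "blis"
        print(lib["internal_api"], lib["filepath"])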


> Most Python math and ML libraries are compiled with GCC or LLVM and are linked against cBLAS or OpenBLAS[1]

This is definitely understating MKL's market share in BLAS. It's an extremely common BLAS backend for Python libraries.


MKL is a real problem in HPC, and it does play a significant role in decisions when purchasing clusters.

In addition, software like Mathematica and Matlab, which use only MKL, can affect decisions even for office workstations.


Money quote: "... on an AMD computer then you may set the environment variable MKL_DEBUG_CPU_TYPE=5."

When run on an AMD CPU, any program built with Intel's compiler should have that environment variable set. I don't think there is any downside to leaving it on all the time, unless you are measuring how badly Intel has tried to cripple your AMD performance.
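
One caveat if you set it from Python rather than from the shell: MKL reads the variable when the library is loaded, so it has to be set before the first import that pulls MKL in. A sketch, assuming an MKL-backed NumPy:

    import os

    # Must be set before MKL is loaded; changing it after the first
    # numpy import has no effect on the already-initialized library.
    os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

    import numpy as np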


My understanding is that that flag is gone, as of a couple of months ago. Intel "fixed" it.


Yes, starting with MKL 2020.01 release. The Wikipedia page has more information and references:

https://en.wikipedia.org/wiki/Math_Kernel_Library#Performanc...

This is quite bad, since a lot of software relies on Intel MKL as the default BLAS implementation (e.g. PyTorch binaries).
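
For PyTorch specifically, you can check whether your build is backed by MKL by printing its compile-time configuration:

    import torch

    # Prints the build configuration, including whether MKL and
    # MKL-DNN/oneDNN support were compiled in.
    print(torch.__config__.show())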


Why not patch out the CPUID check as a post compilation step?


That's definitely possible (it probably checks that the manufacturer ID is GenuineIntel), but nobody wants to distribute patched MKL versions, because it most likely violates the MKL license.

It may even be easier to replace the function altogether with LD_PRELOAD.


It indeed works. A simple trace reveals that the function is called mkl_serv_intel_cpu_true().

Make a file with the following content:

    int mkl_serv_intel_cpu_true() {
      return 1;
    }
Compile

    gcc -shared -o libfake.so fake.c
Run

    LD_PRELOAD=libfake.so yourprogram
And it uses the optimized AVX codepaths.

Disclaimer: may not be legal in your country. I take no responsibility.


By the way, if you want to make this permanent in a binary, there is no need to set LD_PRELOAD all the time. You can just add a DT_NEEDED entry to the dynamic section, e.g. something like:

    patchelf --add-needed libfakeintel.so yourbinary


Wow. I wasn't quite expecting something as simple as "if CPU is not intel, make everything worse."


I'm sure their justification is that (1) they have no obligation to help AMD, and (2) how could they guarantee AMD implements CPUID the same way as Intel (as in: what if AMD implements a feature bit differently?).

Of course, the second one makes no sense, as x86 programs run just as well on AMD as on Intel with the same feature set (albeit at different speeds).


You distribute a binary patch for a given MKL release, have your package download the official MKL release and then patch it using the binary patch. Nobody suffers, everyone wins.


No need to patch MKL, just your own binaries post compile.


Exactly what I was thinking. For libs like MKL it should even be feasible to maintain a database of known binary releases with patch offsets, so you can speed up your scientific application with a little patch tool. But even for arbitrary executables, my guess is that it should be relatively easy to programmatically find the relevant check and patch it, unless Intel starts to deliberately obfuscate it, like copy protection checks in games.


How is the end user supposed to know to do that, know when to do that, or know what to do when the next update to Intel's compiler puts cpu-type-5 on the pessimal code path?

Is there something I can add to my bashrc to handle that?


If the environment variable still works, it could be set by a distribution (especially scientific ones) or in your .bashrc, e.g. `export MKL_DEBUG_CPU_TYPE=5`.

If that fails, as OP implies, you can still override the function by creating a tiny library with it always returning true. On GNU/Linux systems, you do that using LD_PRELOAD. Perhaps someone's already done that so you just need to download, compile and set it.

Sorry for the lack of specifics, but I do not deal with these libraries, yet I was still hoping to point you in the right direction.


Not any program built using ICC; rather, any program using Intel's MKL, a set of basic linear algebra (BLAS) libraries. This is typically limited to scientific computing applications and libraries.


A lot of software depends on a BLAS as a dependency somewhere.


The statement you were responding to is only referring to the Intel MKL, though. There are many other BLAS libraries. Were you making a more general statement about some set of BLAS implementations? Or the BLAS interface in general, perhaps?


I work on CFD software. We're well aware of this where I work, but the reality is that all our big corporate clients use Intel hardware. We already tell people to set those environment variables in our documentation.

> Avoid the Intel compiler. There are other compilers with similar or better performance.

This is not really true IMO, but even as an aside, the Intel compiler has the enormous advantage of being available cross-platform. So we can use it on Linux and Windows, and it provides MPI cross-platform. We upgrade fairly regularly, and that means less work for us.

My own tests found that PGI compiler performance was worse than Intel's for C++, and PGI now appears to have been discontinued on Windows anyway, with NVidia's new HPC compiler suite replacing it. GCC can run everywhere, but performance is around 2.5x worse on Linux for our application use case, because it doesn't perform many of the optimisations that Intel does. We use MSVC on Windows just because everyone can have a license, and its performance is much worse.

The other thing is that MKL is pretty stable and gets updated. If I use an open source BLAS/LAPACK implementation - sure, it works, and it may even give better performance! But it's not guaranteed to get updates beyond a couple of years, and plenty of implementations are also only partial. We pay Intel a lot of money for the lack of hassle, basically.


So, one has to ask, which optimizations does the Intel compiler do that GCC can't? I could guess at the reason for a factor of two, but what does detailed profiling say with equivalent compiler flags? I can also say that GCC is a factor of two better on SKX on one Fortran benchmark, and came out about the same over the whole collection when profile-directed. The usual reason for the Intel compiler appearing to win by much is its incorrect-by-default maths optimization allowing more vectorization.

I don't know about MKL stability, but reliability definitely isn't something I associate with the Intel Fortran compiler (or MPI) in research computing support.


I found that its common subexpression elimination was significantly better than GCC's, for one thing.


> the Intel compiler has the enormous advantage of being available cross platform

How does this advantage not apply to gcc? Isn't gcc the most cross-platform compiler ever?


I think it’s a matter of features and performance. The poster says he can use gcc but the performance is 2.5x slower.


That's an incredible margin, and it sounds suspiciously like they didn't enable optimizations on GCC, or set ICC to optimize for a specific processor and GCC for generic, or something like that.


Hmm, OpenMP is a wildcard here.

At which point, it's not just the compiler (which GCC is pretty good at), but also the threading implementation (and I can believe that GCC has an inferior Windows-threading OpenMP implementation).

I don't really use either tool. But OpenMP + GCC on Windows doesn't sound like it'd be fast to me.

--------

MSVC only has OpenMP 2.0 support (OpenMP is all the way up to 5.0 now).

OpenMP, despite being a common interface, is also pretty reliant on implementation details for performance. One way of doing things could be faster on GCC, while the opposite holds on ICC. It's quite possible that their codebase is tailored for ICC, and that recompiling it under GCC (with a different OpenMP implementation) results in weaker performance.

I wouldn't expect a 250% performance difference in normal code, however. GCC and ICC aren't that far off under typical circumstances.


The mythology surrounding the Intel tools and libraries really ought to die. It's bizarre seeing people deciding they must use MKL rather than the linear algebra libraries that AMD has been working hard to optimize for their hardware (and possibly other hardware incidentally). Similarly for compiler code generation.

Free BLASes are pretty much on a par with MKL, at least for large-dimension level 3 in BLIS's case, even on Haswell. For small matrices, MKL only became fast after libxsmm showed the way. (I don't know about libxsmm on current AMD hardware, but it's free software you can work on if necessary, like AMD has done with BLIS.) OpenBLAS and BLIS are infinitely better performing than MKL in the general case, because they can run on all CPU architectures (and BLIS's plain C gets about 75% of the hand-written DGEMM kernel's performance).

The differences between the implementations are comparable with the noise in typical HPC jobs, even if performance were entirely dominated by, say, DGEMM (and getting close to peak floating point intensity is atypical). On the other hand, you can see a factor of several difference in MPI performance in some cases.



Even if the compilers are biased, isn't it reflective of what users would experience because most software is made with biased compilers?


Not really. No one outside of specialized areas like HPC will use Intel's compiler for their software. The general public, seeing SPEC benchmark figures comparing gcc-on-AMD against icc-on-Intel, may be surprised when their Intel CPU doesn't perform as well as expected against AMD when running generic code.


5-10 years ago the Intel C compiler produced significantly faster code than gcc (and clang was even worse back then), so there was a bigger reason to use it back then.


That was the story 10 years ago as well, yet I never managed to find an open source program where the Intel compiler produced faster code than gcc back then either.

gcc has produced faster code for at least 15 years. In fact, it is the Intel compiler that has caught up in its most recent versions.


I got faster (10-20%) results with icc on an abstract game minimax AI bot back then (i.e. something similar to a chess engine). Even more so when taking advantage of PGO. Over time GCC caught up.

By nature, this code had no usage of floating point in its critical path.

I haven't bothered with icc in years though.


For what sort of application? I ran benchmarks of my own scientific code for doing particle-particle calculations, and with -march=native I could get 2.5x better performance with Intel vs GCC.

One thing I found you do have to be careful with, though, is ensuring that Intel uses IEEE floating point semantics, because by default it's less accurate than GCC. This sometimes causes issues in Eigen; we ran into a problem recently after upgrading the compiler where the results suddenly changed, and it turned out someone had forgotten to set 'fp-model' to 'strict'.


If Intel is using floating point math shortcuts, you can replicate that with -Ofast when using gcc.

It goes without saying that you should use -O3 (or -O2 in some rare cases) otherwise. I mention it just in case, because 2.5x slower sounds so exotic to me that my first intuition is that you're omitting important optimization flags when using GCC. GCC was faster than Intel on everything I tried in the past.


Once upon a time, Oracle used Intel C Compiler (ICC) to compile Oracle RDBMS on some platforms [1].

I don't know if Oracle is still using ICC for that or not. (If you download Oracle RDBMS, and check the binaries, you will be able to work it out. I can't be bothered.)

[1] https://www.businesswire.com/news/home/20030507005238/en/Ora...


How can you tell from a binary what compiler was used to produce it?


There can be various traces left in strings, the symbol table, etc

Many compilers statically link implementations of various built-in functions into the resulting executable, and that can result in different symbol table entries


...and that despite not being anywhere near as aggressive in exploiting UB as gcc or clang, which shows that backend optimisations like instruction selection, scheduling, and register allocation are far more valuable (and predictable).


I don't think anyone disputes that? Most optimizing-compiler literature doesn't even mention language semantics; the gains there are very much last-ditch rather than necessary.

I can't even find benchmarks of ICC against a current GCC, but they were pretty even the best part of a decade ago. GCC is a mess compared to LLVM, but it's quick.


I'd be curious to know what compiler Unity and Unreal Engine are using.


I've never used Unity, but Unreal Engine is heavily tied into the Visual Studio (proper, not Code) workflow, including the Microsoft C++ compiler toolchain and all 30GB+ of its friends.

I'd suspect the same from Unity.


Both engines support platforms where Visual Studio is not available, right?


Unreal uses the native compiler for the target platform. On Windows this is MSVC. Modern consoles are all clang forks. Linux is the only exception, where I think they depend on clang, not gcc.


nitpick: Maybe Xbox One is built by MSVC?


Is there major software that uses the Intel compiler?


Most high performance software on supercomputers uses the Intel C and Fortran compilers, and much engineering and scientific software on workstations uses the Intel Math Kernel Library (MKL) for high performance linear algebra.

Now that AMD EPYC processors are powering a lot of next-generation supercomputer clusters, we're going to have to figure out some workarounds!


I just compiled TensorFlow on an AMD EPYC and had no idea https://github.com/oneapi-src/oneDNN was actually MKL... now I'm wondering if I'm even getting all that performance.


The actual cpuid checking code can be traced from here: https://github.com/oneapi-src/oneDNN/blob/master/src/cpu/x64...

to here: https://github.com/oneapi-src/oneDNN/blob/master/src/cpu/x64...

It's using feature-flag checks, not family checks, so you shouldn't be affected if you're using oneDNN.


thank you!


I took a reversing course some years ago, and during the first part we learned how to identify the compiler from common patterns. Long story short, the Intel compiler did a phenomenal job optimizing. This was 10 years ago, so things may be different now.


10 years ago LLVM was a baby and GCC was still on version 4. Intel probably have an advantage in areas where people pay them for it but GCC and LLVM are excellent compilers today.

Anecdotally (ignoring that I'm still not sure whether to trust it or not), Intel stopped developing IACA and suggested (but did not recommend) LLVM's MCA, which does suggest a changing of the guard in some way.


I think this: https://developer.amd.com/amd-aocl/amd-math-library-libm/ is supposed to be the alternative to MKL for those applications.


No need to develop an alternative when you can trick MKL into not crippling AMD: https://www.pugetsystems.com/labs/hpc/How-To-Use-MKL-with-AM...

Edit: the link I posted follows Agner's advice from the bottom of the OP's link. However, I think the extra information it adds is that Zen 2 Threadrippers outpaced Intel's then-current top contender. Once Zen 3 and Intel's 11th gen become available, repeating these benchmarks would be very valuable.


Thank you! I wasn't aware of this. But this is only a replacement for libm (i.e. basic trig and exp functions), not the matrix-orientated BLAS, LAPACK, and ScaLAPACK routines in which scientific codes spend >90% of their time.


I'm not personally familiar with those, but it seems BLAS, ScaLAPACK, and others are also available:

https://developer.amd.com/amd-aocl/


I think you meant the Intel compiler? Yes. The Intel compiler consistently produces the highest performing binaries on Intel processors, often by a big margin. Intel MKL used to be the highest performing math library, and may still be. As a result, much performance-critical software, such as scientific applications, is compiled with ICC.


This is an overstatement. ICC consistently compiles the slowest and produces the largest binaries. It also defaults to something close to -ffast-math, which may or may not be appropriate. If your app benefits from aggressive inlining and vectorization at the expense of potentially huge increases in code size, ICC is likely to do well for you. However, I've seen lots of cases where well-vectorized code is faster with GCC or Clang, including some very important cases using Intel intrinsics. (Several such cases reported to/acknowledged by Intel; some have been fixed over the years, but these observations are not uncommon.)

BLIS is used by AMD and is a good open alternative to MKL (for BLAS) across many platforms. https://github.com/flame/blis/blob/master/docs/Performance.m...


I have been hearing about the superiority of Intel's compiler for a couple of decades now. Back when GCC was a tiny baby compared to what it is now, and when Clang/LLVM didn't even exist.

I wonder if this Intel compiler 'superiority' is still the case today, or if this is just a meme at this point.


For matrix-manipulation-heavy Fortran scientific codes, ifort/MKL can give +30% compared to gfortran. It's difficult to disentangle where the speedup comes from, but certainly, as jedbrown alludes to, the Intel compilers seem to make a better go of poorly optimised or badly written code.

For C-based software, it's a much closer-run thing, and often sticking with GCC avoids weird segfaults when mixing Intel- and GCC-compiled Linux libraries.


> This is an overstatement.

To be generous...

Where do you typically see a lack of inlining and vectorization with GCC? I'm curious, because most times people have said GCC wouldn't vectorize some code and I've been able to try it, it would, at least when allowed -ffast-math a la Intel (as in BLIS now).


Can you explain "BLIS is used by AMD"? In what way do they use it?


It has been their official BLAS [1] since 2015, when they moved away from their proprietary ACML implementation [2].

[1] https://developer.amd.com/amd-aocl/blas-library/

[2] https://developer.amd.com/open-source-strikes-again-accelera...


Amusingly, OpenBLAS significantly beat the bought-in ACML on DGEMM over the six(?) generations of Opteron I had available. AMD learnt.


The fact that MKL is the highest performing library has nothing to do with the quality of icc's output.

It is a myth that icc produces faster binaries; it may have been true 25 years ago.


So what are they compiled with for non-Intel processors?


The HPC codes I worked on we would compile with gcc, clang, icc, and whatever vendor compiler was installed (Cray, PGI, something even worse). Then we'd benchmark the resulting binaries and make a recommendation based on speed (assuming the compiled binaries gave correct results, which would sometimes fail and trigger further debugging to find out whether we had undefined (or implementation-defined) behavior or had managed to find a compiler bug). For codes that are memory-bandwidth dominated, the results are pretty much a toss-up. For compute-bound codes, Intel would often win.

You can do the same when your machine has non-Intel CPUs that are supported by a lot of compilers. If you are on POWER9 or ARM, the compiler list gets shorter. And a lot of supercomputers now contain accelerators (often, but not always, Nvidia GPUs), in which case there is often only one supported compiler, and you have to rewrite code until that compiler is happy and produces fast code.


It's a depressingly common choice for educational installations. Folks are trained to use ICC instead of GCC and then they keep using ICC when they leave school.


Unsure, but I think that's the implication of this bit in the OP:

"The same effect was documented with some of the most popular mathematical software packages, including Mathematica, Mathcad, and Matlab."


A lot of scientific/HPC code running on supercomputers is compiled with icc.


Wow. How is there not already a lawsuit over this?


Toward the end of the article, the several lawsuits and FTC actions are discussed. The end result of them is a disclaimer on the Intel compiler that it's not optimized for non-Intel processors and that Intel can't artificially hurt AMD performance (but it apparently has no obligation to support unique AMD optimizations either).


> (but it apparently has no obligation to support unique AMD optimizations either)

It's a bit worse than that. Intel has no obligation to support optimizations that aren't unique to AMD; they're allowed to disable SIMD extensions that AMD processors declare support for, while at the same time using all of those SIMD extensions on Intel CPUs. They just have to include the disclaimer that their compiler and libraries may be doing this.


Why is it just the compiler maker's job to report that it may (read: will) underperform on AMD, and not also the program developer's? If I paid for software that performed worse on AMD because it deliberately hobbled itself (and I was not informed), I'd want a refund.

It's straight-up anti-competitive, but consumers aren't smart enough to understand that it's a problem; a consumer just sees biased benchmarks that show Intel outperforming AMD, and then chooses Intel.


> that Intel can't artificially hurt AMD performance (but it apparently has no obligation to support unique AMD optimizations either).

As far as I understand, it's quite the opposite: the disclaimer explicitly says that Intel may not apply "optimizations that are not unique to Intel" to other processors. It won't select the optimal code path unless the CPU vendor ID is GenuineIntel, and it falls back to the worst path your compile settings include.


All companies are evil if you let them...


Business is war, they say.


Except when discussing free trade.



