I used to benchmark a little with ATT's now-obsolete R-benchmark-25.R script.
gcc + BLAS vs. icc + MKL, compiled under Ubuntu x64 (shorter is better):
According to its authors, OpenBLAS performs about on par with MKL (higher is better):

So this could be it. That said, matrix calculation is only part of the whole picture: the performance of control flow and miscellaneous building blocks is also critical, and Julia did better there too.
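To see how much of a given workload a faster BLAS could help with, it is useful to time the matrix part and the control-flow part separately. Below is a minimal sketch of that idea in plain Python. The function names (`time_it`, `matmul`, `fib`, `run_benchmark`) and the problem sizes are made up for illustration and are not taken from R-benchmark-25.R. The naive `matmul` only stands in for the kind of work a tuned BLAS would take over; it does not call any BLAS itself.

```python
import time


def time_it(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def matmul(a, b):
    """Naive dense matrix product (stand-in for BLAS-bound work)."""
    cols_b = list(zip(*b))  # columns of b, so each entry is a row-by-column dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def fib(n):
    """Recursive Fibonacci: pure control flow, no linear algebra involved."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run_benchmark(n=60, fib_n=18):
    """Time one matrix workload and one control-flow workload separately."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    return {
        "matrix": time_it(lambda: matmul(a, a)),
        "control_flow": time_it(lambda: fib(fib_n)),
    }


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name:>12}: {seconds:.4f} s")
```

Swapping in a faster BLAS would only be expected to move the first number. The second one depends on the language runtime itself, which is where the control-flow difference comes in.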