Just playing with clang/llvm (1.1/2.7) [memcpy 65 bytes string]
clang/llvm 1.1/2.7
llvmc -x c++ -O3
I refs: 28,117,480
I1 misses: 718
L2i misses: 711
I1 miss rate: 0.00%
L2i miss rate: 0.00%
D refs: 25,731,989 (13,528,145 rd + 12,203,844 wr)
D1 misses: 6,903 ( 6,512 rd + 391 wr)
L2d misses: 4,108 ( 3,840 rd + 268 wr)
D1 miss rate: 0.0% ( 0.0% + 0.0% )
L2d miss rate: 0.0% ( 0.0% + 0.0% )
L2 refs: 7,621 ( 7,230 rd + 391 wr)
L2 misses: 4,819 ( 4,551 rd + 268 wr)
L2 miss rate: 0.0% ( 0.0% + 0.0% )
gcc 4.5
g++ -x c++ -O3
I refs: 14,482,419
I1 misses: 713
L2i misses: 706
I1 miss rate: 0.00%
L2i miss rate: 0.00%
D refs: 18,389,586 (9,332,878 rd + 9,056,708 wr)
D1 misses: 6,823 ( 6,487 rd + 336 wr)
L2d misses: 4,066 ( 3,833 rd + 233 wr)
D1 miss rate: 0.0% ( 0.0% + 0.0% )
L2d miss rate: 0.0% ( 0.0% + 0.0% )
L2 refs: 7,536 ( 7,200 rd + 336 wr)
L2 misses: 4,772 ( 4,539 rd + 233 wr)
L2 miss rate: 0.0% ( 0.0% + 0.0% )
And here is the disassembly of the LLVM-generated main:
0x08048430 <+0>: push %edi
0x08048431 <+1>: push %esi
0x08048432 <+2>: sub $0x100c,%esp
0x08048438 <+8>: lea 0xc(%esp),%esi
0x0804843c <+12>: mov $0x7ffff,%edi
0x08048441 <+17>: mov %esi,(%esp)
0x08048444 <+20>: movl $0x1000,0x8(%esp)
0x0804844c <+28>: movl $0x0,0x4(%esp)
0x08048454 <+36>: call 0x804833c <memset@plt>
0x08048459 <+41>: lea 0x0(%esi,%eiz,1),%esi
0x08048460 <+48>: mov %esi,(%esp)
0x08048463 <+51>: movl $0x3f,0x8(%esp)
0x0804846b <+59>: movl $0x8048560,0x4(%esp)
0x08048473 <+67>: call 0x804835c <memcpy@plt>
0x08048478 <+72>: dec %edi
0x08048479 <+73>: jne 0x8048460
0x0804847b <+75>: xor %eax,%eax
0x0804847d <+77>: add $0x100c,%esp
0x08048483 <+83>: pop %esi
0x08048484 <+84>: pop %edi
0x08048485 <+85>: ret
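For reference, the loop in that dump can be read back into source roughly as follows. This is only a sketch reconstructed from the disassembly, not the original test file: the sizes (a 0x1000-byte buffer, 0x7ffff iterations, 0x3f bytes per copy) come from the dump, while the names `kPayload` and `run_copy_loop` and the string contents are made up, since the data at 0x8048560 is not shown. The code is wrapped in a function that returns the result of a final `memcmp` so it can be checked; the original was presumably just the body of `main` returning 0.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical payload: 62 characters plus the terminating NUL = 63 (0x3f)
// bytes, matching the length passed to memcpy in the dump. The real string
// contents are unknown.
static const char kPayload[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Same shape as the disassembled main: one memset of a 0x1000-byte stack
// buffer, then 0x7ffff memcpy calls of sizeof(kPayload) bytes each.
int run_copy_loop() {
    char buf[0x1000];
    std::memset(buf, 0, sizeof buf);

    for (int i = 0; i < 0x7ffff; ++i) {
        std::memcpy(buf, kPayload, sizeof kPayload);
    }

    // Returns 0 when the buffer holds the payload. The original main
    // returned a constant 0 instead (xor %eax,%eax).
    return std::memcmp(buf, kPayload, sizeof kPayload);
}
```

Note that a current optimizer may collapse or hoist the repeated copies, so rebuilding this today will not necessarily produce the call-per-iteration loop shown above.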
Frankly, I thought LLVM would perform better: in this test it executes nearly twice as many instructions as GCC (28.1M vs. 14.5M I refs), unless I'm doing something wrong.