While freshening up my assembly skills, I played around with the MMX and SSE(2) data transfer instructions on my new Atom N570. Below is the fastest memory zeroing routine I could come up with:

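void zeromemory(long* addr, long size)
{
    /* size is in bytes; only whole 128-byte blocks are cleared.
       addr must be 16-byte aligned, otherwise movaps faults. */
    __asm__ __volatile__ (
        "shr $7,%1\n\t"              /* bytes -> number of 128-byte blocks */
        "jz 2f\n\t"                  /* zero blocks: skip, loop would wrap around */
        "pxor %%xmm0,%%xmm0\n\t"
        "pxor %%xmm1,%%xmm1\n\t"
        "pxor %%xmm2,%%xmm2\n\t"
        "pxor %%xmm3,%%xmm3\n\t"
        "pxor %%xmm4,%%xmm4\n\t"
        "pxor %%xmm5,%%xmm5\n\t"
        "pxor %%xmm6,%%xmm6\n\t"
        "pxor %%xmm7,%%xmm7\n\t"
        "1:\n\t"
        "movaps %%xmm0,(%0)\n\t"
        "movaps %%xmm1,16(%0)\n\t"
        "movaps %%xmm2,32(%0)\n\t"
        "movaps %%xmm3,48(%0)\n\t"
        "movaps %%xmm4,64(%0)\n\t"
        "movaps %%xmm5,80(%0)\n\t"
        "movaps %%xmm6,96(%0)\n\t"
        "movaps %%xmm7,112(%0)\n\t"
        "add $128,%0\n\t"
        "loop 1b\n"                  /* decrements the count register (ecx/rcx) */
        "2:"
        /* both registers are modified, so they are read-write operands */
        : "+b"(addr), "+c"(size)
        :
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "memory", "cc");
}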
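
For comparison, here is a rough sketch of the same unrolled loop written with SSE2 intrinsics instead of inline assembly. It is not the routine measured above, and nothing here says how it performs on the Atom. The name zeromemory_intrin and the memset handling of the leftover bytes are assumptions made for illustration; the fallback branch only exists so the sketch still compiles where SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical name. Zeroes `size` bytes starting at `addr`.
   Assumption: `addr` is 16-byte aligned, as the aligned stores require. */
static void zeromemory_intrin(void *addr, size_t size)
{
    unsigned char *p = addr;
    size_t blocks = size >> 7;              /* whole 128-byte blocks */

    assert(((uintptr_t)addr & 15) == 0);

#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (; blocks != 0; --blocks, p += 128) {
        /* eight aligned 16-byte stores per block, like the movaps run */
        for (int i = 0; i < 8; ++i)
            _mm_store_si128((__m128i *)(p + 16 * i), zero);
    }
#else
    /* portable fallback so the sketch builds without SSE2 */
    memset(p, 0, blocks * 128);
    p += blocks * 128;
#endif

    /* assumption: clear the remaining 0..127 bytes with plain memset */
    memset(p, 0, size & 127);
}
```

Unlike the assembly routine, this sketch also clears a tail shorter than 128 bytes, so the size does not have to be a multiple of the block length.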

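Some remarks:
1. As usual with my code, the zeromemory snippet is subject to the zlib license.
2. addr has to point to 16-byte-aligned memory, e.g. addr = memalign(16, size);
3. size is a byte count and should be a positive multiple of 128, because the loop clears whole 128-byte blocks.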
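
The alignment requirement can be met in more than one way. The sketch below uses C11 aligned_alloc instead of memalign and pads the request to whole 128-byte blocks, since the loop works in 128-byte steps. The helper names round_up_128 and alloc_for_zeromemory are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round a byte count up to the next multiple of 128. */
static size_t round_up_128(size_t bytes)
{
    return (bytes + 127) & ~(size_t)127;
}

/* Hypothetical helper: allocate a 16-byte-aligned buffer whose length is a
   whole number of 128-byte blocks. The padded length is stored in *padded.
   Returns NULL on failure; release the buffer with free(). */
static void *alloc_for_zeromemory(size_t bytes, size_t *padded)
{
    size_t n = round_up_128(bytes);
    if (n < bytes)
        return NULL;                 /* the rounding overflowed */
    if (n == 0)
        n = 128;                     /* always hand out at least one block */

    /* C11: the size passed here is a multiple of the alignment */
    void *p = aligned_alloc(16, n);
    if (p == NULL)
        return NULL;

    assert(((uintptr_t)p & 15) == 0); /* sanity check on the alignment */
    *padded = n;
    return p;
}
```

A caller would then pass the returned pointer and the padded length (cast to long) to zeromemory, and free the buffer afterwards.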

#memset #zeromemory #assembly #sse #mmx #c++
