public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Fast 64-bit atomic writes (SSE?)
@ 2004-03-20  0:39 Roland Dreier
  0 siblings, 0 replies; 4+ messages in thread
From: Roland Dreier @ 2004-03-20  0:39 UTC (permalink / raw)
  To: linux-kernel

Hi, I'm trying to find the best (fastest) way to write a 64-bit
quantity atomically to a device (on i386).  Taking a spinlock around
two 32-bit accesses seems to be rather expensive.  I'm mostly
concerned about newish CPUs, so I'm willing to use SSE or SSE2
instructions (of course falling back to a slower locked path if the
kernel is built for an old CPU).

I guess I have two questions, a general one and a specific one.

General question:
  What's the best way to do this?

Specific question:
  I've come up with the below function.  However, you may notice that
I'm forced to use movdqu (instead of movdqa, which is presumably
faster).  This is because even with the __attribute__((aligned(16))),
my xmmsave array is not guaranteed to be aligned to 16 bytes.  I could
just allocate 31 bytes for xmmsave and align it to 16 bytes by hand,
but that seems a little ugly.  Is there some magic I'm missing?

Thanks,
  Roland

static inline void mywrite64(u32 *val, void *dest)
{
        u8 xmmsave[16] __attribute__((aligned(16)));

/* The #ifs are a hack to deal with 2.4 kernels without preempt. */
#if CONFIG_PREEMPT
        preempt_disable();
#endif

        /* We use movdqu for the moment, because even
           __attribute__((aligned(16))) doesn't seem to guarantee
           xmmsave is aligned to a 16 byte boundary. */
        __asm__ __volatile__ (
                "movdqu %%xmm0,(%0); \n\t"
                "movq   (%1),%%xmm0; \n\t"
                "movq   %%xmm0,(%2); \n\t"
                "movdqu (%0),%%xmm0; \n\t"
                :
                : "r" (xmmsave), "r" (val), "r" (dest)
                : "memory" );

#if CONFIG_PREEMPT
        preempt_enable();
#endif
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-03-20 14:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <874qskl5ca.fsf@love-shack.home.digitalvampire.org.suse.lists.linux.kernel>
2004-03-20  7:37 ` Fast 64-bit atomic writes (SSE?) Andi Kleen
2004-03-20 14:43   ` Roland Dreier
2004-03-20 14:48   ` Roland Dreier
2004-03-20  0:39 Roland Dreier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox