From: Andrea Arcangeli <andrea@suse.de>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Andi Kleen <ak@suse.de>, Brian Gerst <bgerst@didntduck.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, jh@suse.cz
Subject: Re: [PATCH] Re: SSE related security hole
Date: Sun, 21 Apr 2002 04:08:10 +0200
Message-ID: <20020421040810.P1291@dualathlon.random>
In-Reply-To: <20020420232818.N1291@dualathlon.random> <Pine.LNX.4.44.0204201619170.3643-100000@home.transmeta.com>

On Sat, Apr 20, 2002 at 04:23:50PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> >
> > pxor+xorps is definitely faster than fxrestor on athlon-mp.
> 
> Andrea, that's not the _comparison_.
> 
> The "fxrestor" replaces the "fninit" too, so you have to take that into
> account.

Note that I already took it into account:

	rdtscl(before);

	__asm__("fninit");
	^^^^^^^^^^^^^^^^^

	asm volatile(
		     "pxor %mm0, %mm0	\n"
		     "xorps %xmm0, %xmm0	\n"
		     "pxor %mm1, %mm1	\n"
		     "xorps %xmm1, %xmm1	\n"
		     "pxor %mm2, %mm2	\n"
		     "xorps %xmm2, %xmm2	\n"
		     "pxor %mm3, %mm3	\n"
		     "xorps %xmm3, %xmm3	\n"
		     "pxor %mm4, %mm4	\n"
		     "xorps %xmm4, %xmm4	\n"
		     "pxor %mm5, %mm5	\n"
		     "xorps %xmm5, %xmm5	\n"
		     "pxor %mm6, %mm6	\n"
		     "xorps %xmm6, %xmm6	\n"
		     "pxor %mm7, %mm7	\n"
		     "xorps %xmm7, %xmm7	\n"
		     "emms			\n");
	load_mxcsr(0x1f80);
	rdtscl(after);

> 
> > fxrestor on athlon-mp 1600, on cold cache (the "default fpu state" will
> > be cold most of the time, it's only ever used at the first math fault of
> > a task):
> 
> Except it's _never_ cold-cache the way it's coded now. In fact it's always
> hot-cache - which are exactly the numbers I posted.

In the common case, fork/clone run infrequently enough that the default
fpu state is cache-cold by the time it is needed.

Anyway, since you are apparently forking off a new task every 10
milliseconds on your machine, here are the hot-cache numbers. On the
PIII it's 97 cycles for fxrstor and 90 cycles for pxor/xorps/fninit, so
clearing the registers is still faster, but not by much anymore. On the
athlon-mp it's 120 cycles for xorps/pxor/fninit and 85 cycles for
fxrstor, but that's only because fninit is very slow on the athlon-mp,
which is probably not the case for x86-64 (while xorps/pxor is much
faster on the athlon-mp than on the PIII). The pxor/xorps alone takes
only 17 cycles on the athlon-mp! The fninit alone takes 99 cycles
instead.

Reading 512 bytes of ram on x86-64 (around half that on x86) cannot be
optimized away by a new cpu; the bus will take the hit regardless, while
fninit has the potential to be much faster on a new cpu (just as it's
much faster on the PIII, where in fact it's a win). I guess the P4 will
give hot-cache results similar to the PIII, and in any case there's no
doubt about the huge speedup in the _common_ cold-cache case, even more
so on x86-64, which really will have to read the _whole_ 512 bytes (not
yet the case for x86). I'm astonished you prefer to take the hit on the
ram bus.
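
As a rough sanity check of the hot-cache figures quoted above, the
comparison can be tabulated with a minimal sketch like the one below.
The struct and function names are hypothetical and exist only for
illustration; the cycle counts are the ones from this mail, nothing is
measured here.

```c
#include <stdio.h>

/* Hot-cache cycle counts as quoted in this mail (not measured here). */
struct fpu_init_cost {
	const char *cpu;
	int fxrstor;	/* fxrstor of the default fpu state */
	int clear_regs;	/* fninit + pxor/xorps sequence */
};

const struct fpu_init_cost hot_cache[] = {
	{ "PIII",      97,  90 },
	{ "athlon-mp", 85, 120 },
};

/* Positive: clearing the registers is cheaper. Negative: fxrstor is. */
int cycles_saved_by_clearing(const struct fpu_init_cost *c)
{
	return c->fxrstor - c->clear_regs;
}

/* Print one line per cpu with the difference between the two paths. */
void print_hot_cache_table(void)
{
	size_t i;

	for (i = 0; i < sizeof(hot_cache) / sizeof(hot_cache[0]); i++)
		printf("%-10s fxrstor %3d  clear %3d  saved %+d\n",
		       hot_cache[i].cpu, hot_cache[i].fxrstor,
		       hot_cache[i].clear_regs,
		       cycles_saved_by_clearing(&hot_cache[i]));
}
```

Calling print_hot_cache_table() from a main() prints both rows. Note
also that the two athlon-mp components quoted separately (17 cycles for
pxor/xorps, 99 for fninit) add up to 116, close to the 120 measured for
the whole sequence.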

Here is the hot-cache benchmark I used, so you can test it yourself:

#include <stdio.h>	/* printf */
#include <string.h>	/* memset */
#include <sys/mman.h>
#include <asm/msr.h>	/* rdtscl */

struct i387_fxsave_struct {
	unsigned short	cwd;
	unsigned short	swd;
	unsigned short	twd;
	unsigned short	fop;
	long	fip;
	long	fcs;
	long	foo;
	long	fos;
	long	mxcsr;
	long	reserved;
	long	st_space[32];	/* 8*16 bytes for each FP-reg = 128 bytes */
	long	xmm_space[32];	/* 8*16 bytes for each XMM-reg = 128 bytes */
	long	padding[56];
} __attribute__ ((aligned (16)));

#define LOOPS 200

#define load_mxcsr( val ) do { \
	unsigned long __mxcsr = ((unsigned long)(val) & 0xffbf); \
	asm volatile( "ldmxcsr %0" : : "m" (__mxcsr) ); \
} while (0)

struct i387_fxsave_struct i387;
char buf[1024*1024*40];

static void cold_dcache(void)
{
	memset(buf, 0, 1024*1024*40);
}

int main(void)
{
	unsigned long before, after;
	int i;

#if 1
	for (i = 0; i < LOOPS; i++) {
		rdtscl(before);

 		__asm__("fninit");

#if 1
		asm volatile("pxor %mm0, %mm0	\n"
			     "pxor %mm1, %mm1	\n"
			     "pxor %mm2, %mm2	\n"
			     "pxor %mm3, %mm3	\n"
			     "pxor %mm4, %mm4	\n"
			     "pxor %mm5, %mm5	\n"
			     "pxor %mm6, %mm6	\n"
			     "pxor %mm7, %mm7	\n"
			     "xorps %xmm0, %xmm0	\n"
			     "xorps %xmm1, %xmm1	\n"
			     "xorps %xmm2, %xmm2	\n"
			     "xorps %xmm3, %xmm3	\n"
			     "xorps %xmm4, %xmm4	\n"
			     "xorps %xmm5, %xmm5	\n"
			     "xorps %xmm6, %xmm6	\n"
			     "xorps %xmm7, %xmm7	\n"
			     "emms			\n");
		load_mxcsr(0x1f80);
#endif
		rdtscl(after);
	}
#else
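	/* fxsave writes the save area, so it is an output operand */
	asm volatile("fxsave %0" : "=m" (i387));

	for (i = 0; i < LOOPS; i++) {
		rdtscl(before);

		/* fxrstor only reads the save area, so it is an input operand */
		asm volatile("fxrstor %0" : : "m" (i387));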

		rdtscl(after);
	}
#endif

	/* only the last iteration is reported, i.e. the hot-cache case */
	printf("cycles %lu\n", after-before);
	return 0;
}
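
One caveat about the struct in the benchmark above: it uses `long`
fields, which only add up to the 512-byte save area where `long` is 4
bytes (32-bit x86). The following is a minimal sketch of the same layout
using fixed-width types, so that the size holds regardless of the width
of `long`. The name `fxsave_area` and the two helper functions are
hypothetical; the field order is copied from the struct above.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as i387_fxsave_struct above, with fixed-width types. */
struct fxsave_area {
	uint16_t cwd;
	uint16_t swd;
	uint16_t twd;
	uint16_t fop;
	uint32_t fip;
	uint32_t fcs;
	uint32_t foo;
	uint32_t fos;
	uint32_t mxcsr;
	uint32_t reserved;
	uint32_t st_space[32];	/* 8*16 bytes for each FP-reg = 128 bytes */
	uint32_t xmm_space[32];	/* 8*16 bytes for each XMM-reg = 128 bytes */
	uint32_t padding[56];
} __attribute__ ((aligned (16)));

/* Compile-time checks: 512-byte area, MXCSR at byte offset 24. */
_Static_assert(sizeof(struct fxsave_area) == 512, "save area must be 512 bytes");
_Static_assert(offsetof(struct fxsave_area, mxcsr) == 24, "mxcsr at offset 24");

size_t fxsave_area_size(void)
{
	return sizeof(struct fxsave_area);
}

size_t fxsave_mxcsr_offset(void)
{
	return offsetof(struct fxsave_area, mxcsr);
}
```

With 8-byte `long` the original struct would come out larger than 512
bytes and the fields would no longer line up with what fxsave/fxrstor
expect, so a check like this is worth having before running the
benchmark on a 64-bit build.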

> It may have high precision, but since it's testing something that has
> nothing to do with the problem at hand, it's basically 100% useless.

Andrea
