public inbox for linux-kernel@vger.kernel.org
From: "J.A. Magallon" <jamagallon@able.es>
To: root@chaos.analogic.com
Cc: "Martin J. Bligh" <mbligh@aracnet.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	lse-tech <lse-tech@lists.sourceforge.net>
Subject: Re: gcc 2.95 vs 3.21 performance
Date: Tue, 4 Feb 2003 01:43:21 +0100
Message-ID: <20030204004321.GA12038@werewolf.able.es>
In-Reply-To: <Pine.LNX.3.95.1030203182417.7651A-100000@chaos.analogic.com>; from root@chaos.analogic.com on Tue, Feb 04, 2003 at 00:31:56 +0100


On 2003.02.04 Richard B. Johnson wrote:
> On Mon, 3 Feb 2003, Martin J. Bligh wrote:
> 
> > People keep extolling the virtues of gcc 3.2 to me, which I'm
> > reluctant to switch to, since it compiles so much slower. But
> > it supposedly generates better code, so I thought I'd compile
> > the kernel with both and compare the results. This is gcc 2.95
> > and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
> > tests still use 2.95 for the compile-time stuff.
> >
> [SNIPPED tests...]
> 
> Don't let this get out, but egcs-2.91.66 compiled FFT code
> works about 50 percent of the speed of whatever M$ uses for
> Visual C++ Version 6.0  I was awfully disheartened when I
> found that identical code executed twice as fast on M$ than
> it does on Linux. I tried to isolate what was causing the
> difference. So I replaced 'hypot()' with some 'C' code that
> does sqrt(x^2 + y^2) just to see if it was the 'C' library.
> It didn't help. When I find out what type (section) of code
> is running slower, I'll report. In the meantime, it's fast
> enough, but I don't like being beat by M$.
> 

I face a similar problem. Since everybody says that SSE is so marvelous,
we are trying to put some SSE code in our render engine to speed it up.
But look at the results of the code below (the box is a P4 Xeon at 1.8 GHz
with HT):
annwn:~/sse> ss-g
Proc std:
      5020 kticks
Proc std inline:
      4320 kticks
Proc sse:
      4290 kticks
Proc sse inline:
      3890 kticks

So what? Just around 500 kticks gained by moving to SSE? As the Computer
Architecture people at school say, this is something called 'spill code' (did
I write it OK?). In short: too much SSE but too few registers, so Intel ia32
turns into crap when you need some indexes, runs out of registers, and has to
copy to and from the stack.

#include <stdlib.h>
#include <time.h>
#include <stdio.h>
#if defined(__INTEL_COMPILER)
#include <xmmintrin.h>
#endif

#define LOOPS	1000
#define SZ		100000

#if defined(__GNUC__) && defined(__SSE__)
typedef void __ve_reg __attribute__((__mode__(V4SF)));
#endif

typedef struct point point;
struct point { 
	float v[4];
};

void mulp_std(const point* a,const point* b,point* r)
{
	int i;
	for (i=0; i<4; i++)
		r->v[i] = a->v[i] * b->v[i];
}

/* static: a plain 'inline' definition may leave no out-of-line copy under C99 rules */
static inline void mulpi_std(const point* a,const point* b,point* r)
{
	int i;
	for (i=0; i<4; i++)
		r->v[i] = a->v[i] * b->v[i];
}

void mulp_sse(const point* a,const point* b,point* r)
{
#if defined(__GNUC__) && defined(__SSE__)
	__ve_reg xmm0,xmm1,xmm2;
	xmm0 = __builtin_ia32_loadups((float*)a->v);
	xmm1 = __builtin_ia32_loadups((float*)b->v);
	xmm2 = __builtin_ia32_mulps(xmm0,xmm1);
	__builtin_ia32_storeups(r->v,xmm2);
#endif
#if defined(__INTEL_COMPILER)
	__m128 xmm0,xmm1,xmm2;
	xmm0 = _mm_loadu_ps((float*)a->v);
	xmm1 = _mm_loadu_ps((float*)b->v);
	xmm2 = _mm_mul_ps(xmm0,xmm1);
	_mm_storeu_ps(r->v,xmm2);
#endif
}

/* static: a plain 'inline' definition may leave no out-of-line copy under C99 rules */
static inline void mulpi_sse(const point* a,const point* b,point* r)
{
#if defined(__GNUC__) && defined(__SSE__)
	__ve_reg xmm0,xmm1,xmm2;
	xmm0 = __builtin_ia32_loadups((float*)a->v);
	xmm1 = __builtin_ia32_loadups((float*)b->v);
	xmm2 = __builtin_ia32_mulps(xmm0,xmm1);
	__builtin_ia32_storeups(r->v,xmm2);
#endif
#if defined(__INTEL_COMPILER)
	__m128 xmm0,xmm1,xmm2;
	xmm0 = _mm_loadu_ps((float*)a->v);
	xmm1 = _mm_loadu_ps((float*)b->v);
	xmm2 = _mm_mul_ps(xmm0,xmm1);
	_mm_storeu_ps(r->v,xmm2);
#endif
}

int main(int argc, char** argv)
{
	point *a;
	point *b;
	point *c;
	int i,j;
	unsigned long t0,t1;

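	a = malloc(SZ*sizeof(point));
	b = malloc(SZ*sizeof(point));
	c = malloc(SZ*sizeof(point));
	if (!a || !b || !c)
		return 1;
	/* give the inputs defined values; malloc'ed memory is indeterminate */
	for (j=0; j<SZ; j++)
		for (i=0; i<4; i++)
			a[j].v[i] = b[j].v[i] = 1.0f;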

	printf("Proc std:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulp_std(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulp_std(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	printf("Proc std inline:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulpi_std(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulpi_std(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	printf("Proc sse:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulp_sse(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulp_sse(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	printf("Proc sse inline:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulpi_sse(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulpi_sse(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	free(c);
	free(b);
	free(a);

	return 0;
}


-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.21-pre4-jam1 (gcc 3.2.1 (Mandrake Linux 9.1 3.2.1-5mdk))
