linux-kernel.vger.kernel.org archive mirror
From: Joe Perches <joe@perches.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Neil Horman <nhorman@tuxdriver.com>,
	linux-kernel@vger.kernel.org, sebastien.dugue@bull.net,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
Date: Tue, 15 Oct 2013 09:21:04 -0700
Message-ID: <1381854064.22110.16.camel@joe-AO722>
In-Reply-To: <20131015074123.GB25493@gmail.com>

On Tue, 2013-10-15 at 09:41 +0200, Ingo Molnar wrote:
> * Joe Perches <joe@perches.com> wrote:
> 
> > On Mon, 2013-10-14 at 15:44 -0700, Eric Dumazet wrote:
> > > On Mon, 2013-10-14 at 15:37 -0700, Joe Perches wrote:
> > > > On Mon, 2013-10-14 at 15:18 -0700, Eric Dumazet wrote:
> > > > > attached patch brings much better results
> > > > > 
> > > > > lpq83:~# ./netperf -H 7.7.8.84 -l 10 -Cc
> > > > > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET
> > > > > Recv   Send    Send                          Utilization       Service Demand
> > > > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > > > > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > > > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> > > > > 
> > > > >  87380  16384  16384    10.00      8043.82   2.32     5.34     0.566   1.304  
> > > > > 
> > > > > diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
> > > > []
> > > > > @@ -68,7 +68,8 @@ static unsigned do_csum(const unsigned char *buff, unsigned len)
> > > > >  			zero = 0;
> > > > >  			count64 = count >> 3;
> > > > >  			while (count64) { 
> > > > > -				asm("addq 0*8(%[src]),%[res]\n\t"
> > > > > +				asm("prefetch 5*64(%[src])\n\t"
> > > > 
> > > > Might the prefetch size be too big here?
> > > 
> > > To be effective, you need to prefetch well ahead of time.
> > 
> > No doubt.
> 
> So why did you ask then?
> 
> > > 5*64 seems common practice (check arch/x86/lib/copy_page_64.S)
> > 
> > 5 cachelines for some processors seems like a lot.
> 
> What processors would that be?

The ones where conservatism in L1 cache use is good
because there are multiple threads running concurrently.

> Most processors have hundreds of cachelines even in their L1 cache. 

And sometimes that many executable processes too.

> Thousands in the L2 cache, up to hundreds of thousands.

Irrelevant because prefetch doesn't apply there.

Ingo, Eric _showed_ that the prefetch is good here.
How about a little optimization: finding the minimal
prefetch distance that gives that level of performance?

You could argue that prefetching PAGE_SIZE or larger
would be better still otherwise.

I suspect that using a smaller multiple of
L1_CACHE_BYTES like 2 or 3 would perform the same.
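
One way to test that suspicion outside the kernel is a user-space
sketch where the prefetch distance is a parameter. Everything below is
an assumption made for illustration: the helper name sum_with_prefetch(),
the 64-byte CACHE_LINE constant (the kernel would use L1_CACHE_BYTES),
and the GCC/Clang __builtin_prefetch() builtin standing in for the
"prefetch" instruction in Eric's inline asm. It is not the do_csum()
code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64	/* assumed line size, stand-in for L1_CACHE_BYTES */

/*
 * Hypothetical helper: add up a buffer as 64-bit words with end-around
 * carry, prefetching lines_ahead cache lines in front of the current
 * read position. lines_ahead == 0 disables prefetching.
 */
static uint64_t sum_with_prefetch(const unsigned char *buf, size_t len,
				  unsigned int lines_ahead)
{
	size_t ahead = (size_t)lines_ahead * CACHE_LINE;
	uint64_t sum = 0;
	uint64_t w;
	size_t i;

	for (i = 0; i + 8 <= len; i += 8) {
		/* One prefetch per cache line, never past the buffer end. */
		if (ahead && (i % CACHE_LINE) == 0 && i + ahead < len)
			__builtin_prefetch(buf + i + ahead, 0, 3);

		memcpy(&w, buf + i, 8);	/* alignment-safe load */
		sum += w;
		if (sum < w)		/* carry out: wrap it back in */
			sum++;
	}

	if (i < len) {			/* zero-padded tail word */
		w = 0;
		memcpy(&w, buf + i, len - i);
		sum += w;
		if (sum < w)
			sum++;
	}
	return sum;
}
```

The prefetch is only a hint, so the result is the same for any
lines_ahead; only the timing can differ. Timing calls with lines_ahead
set to 2, 3 and 5 over cache-cold buffers would be the comparison,
though a user-space number says nothing certain about the netperf
result quoted above.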

The last time the distance was looked at for copy_page_64.S
was quite a while ago.  It looks like maybe 2003.



