From: leroy christophe <christophe.leroy@c-s.fr>
To: Segher Boessenkool <segher@kernel.crashing.org>,
Scott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras <paulus@samba.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop
Date: Mon, 17 Aug 2015 15:05:40 +0200
Message-ID: <55D1DC24.2020407@c-s.fr>
In-Reply-To: <55D1BED4.4040808@c-s.fr>
On 17/08/2015 13:00, leroy christophe wrote:
>
>
> On 17/08/2015 12:56, leroy christophe wrote:
>>
>>
>> On 07/08/2015 01:25, Segher Boessenkool wrote:
>>> On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
>>>> If this makes performance non-negligibly worse on other 32-bit chips,
>>>> and is an important improvement on 8xx, then we can use an ifdef since
>>>> 8xx already requires its own kernel build. I'd prefer to see a
>>>> benchmark showing that it actually does make things worse on those
>>>> chips, though.
>>> And I'd like to see a benchmark that shows it *does not* hurt
>>> performance on most chips, and does improve things on 8xx, and by how
>>> much. But it isn't *me* who has to show that, it is not my patch.
>> Ok, following this discussion I made some additional measurements and
>> it looks like:
>> * There is almost no change on the 885
>> * There is a non-negligible degradation on the 8323 (19.5 tb ticks
>> instead of 15.3)
>>
>> Thanks for pointing this out, I think my patch is therefore not good.
>>
> Oops, I was talking about my other patch, the one that was to optimise
> ip_fast_csum().
> I still have to measure csum_partial().
>
Now I have the results for csum_partial(). Each call is timed by reading
mftbl() before and after the function, with IRQs off to get a stable
measurement. The workload is a transfer of the vmlinux file to the target
via scp, repeated 3 times. This gives approximately 50000 calls to
csum_partial().
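For reference, the bookkeeping behind such a measurement can be sketched
as below. This is only an illustration, not the instrumentation actually
used: struct tb_stats, tb_stats_record() and tb_stats_mean() are
hypothetical names, and the kernel-side call sequence is shown only in a
comment, on the assumption that local_irq_save()/local_irq_restore()
bracket the two mftbl() reads.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical accumulator for timebase samples (not from the patch).
 *
 * Assumed kernel-side use, shown for context only:
 *
 *	local_irq_save(flags);
 *	t0 = mftbl();
 *	sum = csum_partial(buff, len, sum);
 *	t1 = mftbl();
 *	local_irq_restore(flags);
 *	tb_stats_record(&stats, t0, t1);
 */
struct tb_stats {
	uint64_t total_ticks;	/* sum of all measured durations */
	uint32_t calls;		/* number of samples recorded */
};

/*
 * Record one sample from two reads of the lower 32 bits of the timebase.
 * Unsigned subtraction still gives the right duration if the 32-bit
 * counter wrapped once between the two reads.
 */
static inline void tb_stats_record(struct tb_stats *s, uint32_t before,
				   uint32_t after)
{
	assert(s);
	s->total_ticks += (uint32_t)(after - before);
	s->calls++;
}

/* Mean duration in ticks, rounded down; 0 if nothing was recorded. */
static inline uint32_t tb_stats_mean(const struct tb_stats *s)
{
	assert(s);
	return s->calls ? (uint32_t)(s->total_ticks / s->calls) : 0;
}
```

Keeping the sum in 64 bits avoids overflow over tens of thousands of
calls, while each individual sample only needs the lower timebase word.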
On MPC885:
1/ Without the patchset, mean time spent in csum_partial() is 167 tb ticks.
2/ With the patchset, mean time is 150 tb ticks.
On MPC8323:
1/ Without the patchset, mean time is 287 tb ticks.
2/ With the patchset, mean time is 256 tb ticks.
The improvement is approximately 10% in both cases.
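As a quick check of that figure, the relative gain can be recomputed from
the mean times above. The helper below is a hypothetical name used only
for this illustration. It works in integers, in tenths of a percent, and
rounds down.

```c
#include <assert.h>

/*
 * Relative improvement in tenths of a percent when the mean time drops
 * from 'before' to 'after' ticks. Hypothetical helper, integer-only,
 * rounded down.
 */
static inline unsigned int improvement_permille(unsigned int before,
						unsigned int after)
{
	assert(before > 0 && after <= before);
	return (before - after) * 1000u / before;
}
```

With the means reported here, improvement_permille(167, 150) gives 101
(10.1%) on MPC885 and improvement_permille(287, 256) gives 108 (10.8%)
on MPC8323.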
So, unlike my patch on ip_fast_csum(), this one is worth it.
Christophe