Re: [9/9] powerpc: optimise csum_partial() call when len is constant

From: Scott Wood <oss@buserror.net>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	scottwood@freescale.com, netdev@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [9/9] powerpc: optimise csum_partial() call when len is constant
Date: Fri, 4 Mar 2016 23:29:00 -0600
Message-ID: <20160305052900.GA5742@home.buserror.net>
In-Reply-To: <df593a56e52ccd24b758b9d40ae3b414cb4e5372.1442876807.git.christophe.leroy@c-s.fr>

On Tue, Sep 22, 2015 at 04:34:36PM +0200, Christophe Leroy wrote:
> +/*
> + * computes the checksum of a memory block at buff, length len,
> + * and adds in "sum" (32-bit)
> + *
> + * returns a 32-bit number suitable for feeding into itself
> + * or csum_tcpudp_magic
> + *
> + * this function must be called with even lengths, except
> + * for the last fragment, which may be odd
> + *
> + * it's best to have buff aligned on a 32-bit boundary
> + */
> +__wsum __csum_partial(const void *buff, int len, __wsum sum);
> +
> +static inline __wsum csum_partial(const void *buff, int len, __wsum sum)
> +{
> +	if (__builtin_constant_p(len) && len == 0)
> +		return sum;
> +
> +	if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
> +		__wsum sum1;
> +
> +		if (len == 2)
> +			sum1 = (__force u32)*(u16 *)buff;
> +		if (len >= 4)
> +			sum1 = *(u32 *)buff;
> +		if (len == 6)
> +			sum1 = csum_add(sum1, (__force u32)*(u16 *)(buff + 4));
> +		if (len >= 8)
> +			sum1 = csum_add(sum1, *(u32 *)(buff + 4));
> +		if (len == 10)
> +			sum1 = csum_add(sum1, (__force u32)*(u16 *)(buff + 8));
> +		if (len >= 12)
> +			sum1 = csum_add(sum1, *(u32 *)(buff + 8));
> +		if (len == 14)
> +			sum1 = csum_add(sum1, (__force u32)*(u16 *)(buff + 12));
> +		if (len >= 16)
> +			sum1 = csum_add(sum1, *(u32 *)(buff + 12));
> +
> +		sum = csum_add(sum1, sum);

Why the final csum_add instead of s/sum1/sum/ and putting csum_add in the
"len == 2" and "len >= 4" cases?
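
As a rough userspace sketch of what I mean (wsum_t, load16(), load32()
and csum_small_even() are made-up names for illustration, csum_add() here
is a plain C model of the generic helper rather than the powerpc asm
version, and the loads go through memcpy() only to keep the standalone
build well-defined):

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t wsum_t;	/* stand-in for __wsum (assumption) */

/* 32-bit ones' complement add: wrap the carry back into bit 0 */
static inline wsum_t csum_add(wsum_t csum, wsum_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* hypothetical helpers: loads that are safe at any alignment */
static inline uint32_t load16(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static inline uint32_t load32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* even len, 0..16: every piece is folded straight into sum */
static inline wsum_t csum_small_even(const void *buff, int len, wsum_t sum)
{
	const unsigned char *p = buff;

	if (len == 2)
		sum = csum_add(sum, load16(p));
	if (len >= 4)
		sum = csum_add(sum, load32(p));
	if (len == 6)
		sum = csum_add(sum, load16(p + 4));
	if (len >= 8)
		sum = csum_add(sum, load32(p + 4));
	if (len == 10)
		sum = csum_add(sum, load16(p + 8));
	if (len >= 12)
		sum = csum_add(sum, load32(p + 8));
	if (len == 14)
		sum = csum_add(sum, load16(p + 12));
	if (len >= 16)
		sum = csum_add(sum, load32(p + 12));

	return sum;
}
```

With that shape there is no temporary that could be read before being
written, and the trailing csum_add() goes away.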

The (__force u32) casts are unnecessary.  Or rather, it should be
(__force __wsum) -- on all of them, not just the 16-bit ones.

The pointer casts should be to const-qualified types, since buff is a
const void *.
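
To illustrate both points, here is a minimal standalone sketch.  The
__force/__bitwise/__wsum definitions below only imitate the kernel's
sparse annotations so that the snippet builds outside the tree, and
load_half()/load_word() are hypothetical helper names, not something in
the patch:

```c
#include <stdint.h>

/* Imitation of the kernel annotations: only sparse gives them meaning */
#ifdef __CHECKER__
#define __force		__attribute__((force))
#define __bitwise	__attribute__((bitwise))
#else
#define __force
#define __bitwise
#endif

typedef uint32_t __bitwise __wsum;

/*
 * hypothetical helper: read 16 bits at byte offset off, keeping the
 * pointer const and forcing the value to __wsum rather than to u32
 */
static inline __wsum load_half(const void *buff, int off)
{
	return (__force __wsum)*(const uint16_t *)((const char *)buff + off);
}

/* hypothetical helper: the 32-bit load gets the same treatment */
static inline __wsum load_word(const void *buff, int off)
{
	return (__force __wsum)*(const uint32_t *)((const char *)buff + off);
}
```

The casts change nothing in the generated code; they only keep sparse
quiet about the integer to __wsum conversion and avoid dropping the
const qualifier from buff.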

> +	} else if (__builtin_constant_p(len) && (len & 3) == 0) {
> +		sum = csum_add(ip_fast_csum_nofold(buff, len >> 2), sum);

It may not make a functional difference, but based on the csum_add()
argument names and other csum_add() usage, sum should come first
and the new content second.
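
Something like the following userspace sketch, where wsum_t,
words_csum_nofold() and csum_words() are invented names for illustration
(words_csum_nofold() is only a simple stand-in for ip_fast_csum_nofold(),
and csum_add() is a plain C model of the generic helper):

```c
#include <stdint.h>

typedef uint32_t wsum_t;	/* stand-in for __wsum (assumption) */

/* 32-bit ones' complement add: wrap the carry back into bit 0 */
static inline wsum_t csum_add(wsum_t csum, wsum_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* hypothetical stand-in for ip_fast_csum_nofold(): sum nwords words */
static inline wsum_t words_csum_nofold(const uint32_t *words,
				       unsigned int nwords)
{
	wsum_t acc = 0;

	for (unsigned int i = 0; i < nwords; i++)
		acc = csum_add(acc, words[i]);

	return acc;
}

/* len is a multiple of 4: running sum first, new content second */
static inline wsum_t csum_words(const uint32_t *words, int len, wsum_t sum)
{
	return csum_add(sum, words_csum_nofold(words, len >> 2));
}
```

In this C model csum_add() is commutative, so the result is the same
either way; the ordering is about matching the parameter names and the
other callers.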

-Scott

Thread overview: 22+ messages
2015-09-22 14:34 [PATCH 0/9] powerpc32: set of optimisation of network checksum functions Christophe Leroy
2015-09-22 14:34 ` [PATCH 1/9] powerpc: unexport csum_tcpudp_magic Christophe Leroy
2015-09-22 14:34 ` [PATCH 2/9] powerpc: mark xer clobbered in csum_add() Christophe Leroy
2015-09-22 14:34 ` [PATCH 3/9] powerpc32: checksum_wrappers_64 becomes checksum_wrappers Christophe Leroy
2015-10-23  3:26   ` Scott Wood
2015-10-28 11:11     ` Anton Blanchard
2015-09-22 14:34 ` [PATCH 4/9] powerpc: inline ip_fast_csum() Christophe Leroy
2015-09-23  5:43   ` Denis Kirjanov
2016-02-29  7:25     ` Christophe Leroy
2016-03-05  3:50   ` [4/9] " Scott Wood
2015-09-22 14:34 ` [PATCH 5/9] powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user() Christophe Leroy
2015-09-22 14:34 ` [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial() Christophe Leroy
2015-10-23  3:30   ` Scott Wood
2016-02-29 12:53     ` Christophe Leroy
2015-09-22 14:34 ` [PATCH 7/9] powerpc32: optimise csum_partial() loop Christophe Leroy
2015-09-22 14:34 ` [PATCH 8/9] powerpc: simplify csum_add(a, b) in case a or b is constant 0 Christophe Leroy
2015-10-23  3:33   ` Scott Wood
2016-02-29  7:26     ` Christophe Leroy
2015-09-22 14:34 ` [PATCH 9/9] powerpc: optimise csum_partial() call when len is constant Christophe Leroy
2015-10-23  3:32   ` Scott Wood
2016-03-05  5:29   ` Scott Wood [this message]
2015-09-23 22:38 ` [PATCH 0/9] powerpc32: set of optimisation of network checksum functions David Miller
