Re: [PATCHv2 2/3] lib: vsprintf: optimised put_dec() for 32-bit machines

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Denys Vlasenko <vda.linux@googlemail.com>
To: Michal Nazarewicz <mina86@mina86.com>
Cc: linux-kernel@vger.kernel.org, m.nazarewicz@samsung.com,
	"Douglas W. Jones" <jones@cs.uiowa.edu>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCHv2 2/3] lib: vsprintf: optimised put_dec() for 32-bit machines
Date: Tue, 10 Aug 2010 06:15:52 +0200	[thread overview]
Message-ID: <201008100615.52510.vda.linux@googlemail.com> (raw)
In-Reply-To: <419733287a42072a658f0f74d0b1901132178b75.1281295424.git.mina86@mina86.com>

On Sunday 08 August 2010 21:29, Michal Nazarewicz wrote:
> Compared to previous version: the code is used only if:
> 1. if long long is 64-bit (ie. ULLONG_MAX == 2**64-1), and
> 2. user did not select optimisation for size with Kconfig.

I measured the size and it does not seem to make sense
to exclude it on -Os. On x86:

put_dec_full change: 0x93 -> 0x47 bytes
put_dec      change: 0x12c -> 0x137 bytes

IOW, there is net code size reduction (compared to current kernel,
it may be a slight growth compared to patch 1).

So, please use the optimized code even for CONFIG_CC_OPTIMIZE_FOR_SIZE.


> Here are the results (normalised to the fastest/smallest):
>                     :        ARM       Atom
> -- Speed ----------------------------------
> orig_put_dec        :   9.333822   2.083110  Original
> mod1_put_dec        :   9.282045   1.904564
> mod2_put_dec        :   9.260409   1.910302
> mod3_put_dec        :   9.320053   1.905689  Proposed by previous patch
> mod4_put_dec        :   9.297146   1.933971
> mod5_put_dec        :  13.034318   2.434942
> mod6_put_dec        :   1.000000   1.000000  Proposed by this patch
> mod7_put_dec        :   1.009574   1.014147
> mod8_put_dec        :   7.226004   1.953460
> -- Size -----------------------------------
> orig_put_dec        :   1.000000   1.000000  Original
> mod1_put_dec        :   1.000000   1.000000
> mod2_put_dec        :   1.361111   1.403226
> mod3_put_dec        :   1.000000   1.000000  Proposed by previous patch
> mod4_put_dec        :   1.361111   1.403226
> mod5_put_dec        :   1.000000   1.000000
> mod6_put_dec        :   2.555556   3.508065  Proposed by this patch
> mod7_put_dec        :   2.833333   3.911290
> mod8_put_dec        :   2.027778   2.258065

I believe these are old results? Size growth is just too big.


> As it can be obsevred, proposed version of the put_dec function is
> twice as fast as the original version on Atom and almost 10 times
> faster on ARM.  I imagine that it may be similar on other "embedded"
> processors.
> 
> This may be skewed by the fact that the benchmark is using GCC's
> 64-bit division operator instead of kernel's do_div but it would
> appear that by avoiding 64-bit division something can be gained.

Re speed: on Phenom II in 32-bit mode, I see ~x3.3 speedup
on conversions involving large integers (might be skewed
by gcc's full-blown 64-bit division in "old" code - kernel's
div is smarter).


> PS. From Mr. Jones site: "Nonetheless, before relying on the material
> here, it would be prudent to check the arithmetic!" hence I checked
> all the calculations myself and everything seemed fine.  I've also run
> test applitacion several times so it tested a few 64-bit numbers..."

I tested [0, 100 million] and [2^64-100 million, 2^64-1] ranges.
No errors.


> +#if BITS_PER_LONG != 32 || defined CONFIG_CC_OPTIMIZE_FOR_SIZE || \
> +	ULLONG_MAX != 18446744073709551615ULL

I think it's better to say "if BITS_PER_LONG > 32 and ULLONG_MAX > 2^64-1",
since it expresses your intent better. Also, add comments explaining
what case you optimize for:

#if BITS_PER_LONG > 32 || ULLONG_MAX > 18446744073709551615ULL

/* Generic code */
...

#else /* BITS_PER_LONG <= 32 && ULLONG_MAX <= 2^64-1 */

/* Optimized code for arches with 64-bit long longs */
...


> +static noinline_for_stack
> +char *put_dec(char *buf, unsigned long long n)
> +{
> +	uint32_t d3, d2, d1, q;
> +
> +	if (!n) {
> +		*buf++ = '0';
> +		return buf;
> +	}

You may as well use the above shortcut for n <= 9, not only for 0.

> +	buf = put_dec_full4(buf, q % 10000);
> +	q   = q / 10000;
> +
> +	d1  = q + 7671 * d3 + 9496 * d2 + 6 * d1;
> +	buf = put_dec_full4(buf, d1 % 10000);
> +	q   = d1 / 10000;

I experimented with moving division up, before put_dec_full4:
q   = d1 / 10000;
buf = put_dec_full4(buf, d1 % 10000);
but gcc appears to be smart emough to do this transformation
itself. But you may still do it for older (dumber) gcc's.

-- 
vda

next prev parent reply	other threads:[~2010-08-10  4:16 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-08 19:29 [PATCHv2 1/3] lib: vsprintf: optimised put_dec_trunc() and put_dec_full() Michal Nazarewicz
2010-08-08 19:29 ` [PATCHv2 2/3] lib: vsprintf: optimised put_dec() for 32-bit machines Michal Nazarewicz
2010-08-08 19:29   ` [PATCHv2 3/3] lib: vsprintf: added a put_dec() test and benchmark tool Michal Nazarewicz
2010-08-10  4:15   ` Denys Vlasenko [this message]
2010-08-10  7:42     ` [PATCHv2 2/3] lib: vsprintf: optimised put_dec() for 32-bit machines Michał Nazarewicz
2010-08-10 16:10       ` Denys Vlasenko
2010-08-10  3:17 ` [PATCHv2 1/3] lib: vsprintf: optimised put_dec_trunc() and put_dec_full() Denys Vlasenko
2010-08-10  7:39   ` Michał Nazarewicz
2010-08-10 16:08     ` Denys Vlasenko
2010-08-10 22:42       ` Michal Nazarewicz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201008100615.52510.vda.linux@googlemail.com \
    --to=vda.linux@googlemail.com \
    --cc=akpm@linux-foundation.org \
    --cc=jones@cs.uiowa.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.nazarewicz@samsung.com \
    --cc=mina86@mina86.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.