Re: [PATCH v3 1/4] lib/string_helpers: change blk_size to u32 for string_get_size() interface

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: James Bottomley <jbottomley@odin.com>
Cc: "ulf.hansson\@linaro.org" <ulf.hansson@linaro.org>,
	"linux\@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>,
	"andriy.shevchenko\@linux.intel.com" 
	<andriy.shevchenko@linux.intel.com>,
	"keescook\@chromium.org" <keescook@chromium.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm\@linux-foundation.org" <akpm@linux-foundation.org>
Subject: Re: [PATCH v3 1/4] lib/string_helpers: change blk_size to u32 for string_get_size() interface
Date: Tue, 03 Nov 2015 14:13:08 +0100	[thread overview]
Message-ID: <87egg7fcpn.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <1446522039.2189.49.camel@Odin.com> (James Bottomley's message of "Tue, 3 Nov 2015 03:40:43 +0000")

James Bottomley <jbottomley@odin.com> writes:

> On Mon, 2015-11-02 at 16:58 +0100, Vitaly Kuznetsov wrote:
>> James Bottomley <jbottomley@odin.com> writes:
>> 
>> > On Fri, 2015-10-30 at 11:46 +0100, Vitaly Kuznetsov wrote:
>> >> James Bottomley <jbottomley@odin.com> writes:
>> >> 
>> >> > On Thu, 2015-10-29 at 17:30 +0100, Vitaly Kuznetsov wrote:
>> >> >> string_get_size() can't really handle huge block sizes, especially
>> >> >> blk_size > U32_MAX but string_get_size() interface states the opposite.
>> >> >> Change blk_size from u64 to u32 to reflect the reality.
>> >> >
>> >> > What is the actual evidence for this?  The calculation is designed to be
>> >> > a symmetric 128 bit multiply.  When I wrote and tested it, it worked
>> >> > fine for huge block sizes.
>> >> 
>> >> We have 'u32 remainder' and then we do:
>> >> 
>> >> exp = divisor[units] / (u32)blk_size;
>> >> ...
>> >> remainder = do_div(size, divisor[units]);
>> >> remainder *= blk_size;
>> >> 
>> >> I'm pretty sure it will overflow for some inputs.
>> >
>> > It shouldn't; the full code snippet does this:
>> >
>> >         	while (blk_size >= divisor[units]) {
>> >         		remainder = do_div(blk_size, divisor[units]);
>> >         		i++;
>> >         	}
>> >
>> >         	exp = divisor[units] / (u32)blk_size;
>> >
>> > So by the time it reaches the statement you complain about, blk_size is
>> > already less than or equal to the divisor (which is 1000 or 1024) so
>> > truncating to 32 bits is always correct.
>> >
>> 
>> I overlooked, sorry!
>> 
>> > I'm sort of getting the impression you don't quite understand the
>> > mathematics:  i is the logarithm to the base divisor[units].  We reduce
>> > both operands to exponents of the logarithm base (adding the two bases
>> > together in i), which means they are by definition in a range between
>> > zero and the base and then multiply the remaining exponents correcting
>> > the result for a base overflow (so the result is always a correct
>> > exponent and i is the logarithm to the base).  It's actually simply
>> > Napier's algorithm.
>> >
>> > The reason we're getting the up to 2.5% rounding errors you complain
>> > about is because at each logarithm until the last one, we throw away the
>> > remainder (it's legitimate because it's always 1000x smaller than the
>> > exponent), but in the case of a large remainder it provides a small
>> > correction to the final operation which we don't account for.  If you
>> > want to make a true correction, you save the penultimate residue in each
>> > case, multiply each by the *other* exponent add them together, divide by
>> > the base and increment the final result by the remainder.
>> 
>> My assumption was that we don't really need to support blk_sizes >
>> U32_MAX and we can simplify string_get_size() instead of adding
>> additional complexity. Apparently, the assumption was wrong.
>> 
>> >
>> > However, for 2.5% the physicist in me says the above is way overkill.
>> >
>> 
>> It is getting was over 2.5% if blk_size is not a power of 2. While it is
>> probably never the case for block subsystem the function is in lib and
>> pretends to be general-enough. I'll try to make proper correction and
>> let's see if it's worth the effort. 
>
> OK, this is the full calculation.  It also includes an arithmetic
> rounding to the final figure print.  I suppose it's not that much more
> complexity than the original, and it does make the algorithm easier to
> understand.
>
> We could do with running the comments by some other non-mathematician,
> now I've explained it in detail to you two, to see if they actually give
> an understanding of the algorithm.

Thanks, to me they look great! One nitpick below ...

>
> James
>
> ---
>
> diff --git a/lib/string_helpers.c b/lib/string_helpers.c
> index 5939f63..1ec7e77a 100644
> --- a/lib/string_helpers.c
> +++ b/lib/string_helpers.c
> @@ -44,7 +44,7 @@ void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
>  		[STRING_UNITS_2] = 1024,
>  	};
>  	int i, j;
> -	u32 remainder = 0, sf_cap, exp;
> +	u32 remainder = 0, sf_cap, r1 = 0, r2 = 0, round;
>  	char tmp[8];
>  	const char *unit;
>
> @@ -53,27 +53,46 @@ void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
>  	if (!size)
>  		goto out;
>
> +	/* This is napier's algorithm.  Reduce the original block size to
> +	 *
> +	 * co * divisor[units]^i
> +	 *
> +	 * where co = blk_size + r1/divisor[units];
> +	 *
> +	 * and the same for size.  We simply add to the exponent i, because
> +	 * the final calculation we're looking for is
> +	 *
> +	 * (co1 * co2) * divisor[units]^i
> +	 */
> +
> +
>  	while (blk_size >= divisor[units]) {
> -		remainder = do_div(blk_size, divisor[units]);
> +		r1 = do_div(blk_size, divisor[units]);
>  		i++;
>  	}
>
> -	exp = divisor[units] / (u32)blk_size;
> -	/*
> -	 * size must be strictly greater than exp here to ensure that remainder
> -	 * is greater than divisor[units] coming out of the if below.
> -	 */
> -	if (size > exp) {
> -		remainder = do_div(size, divisor[units]);
> -		remainder *= blk_size;
> +	while (size >= divisor[units]) {
> +		r2 = do_div(size, divisor[units]);
>  		i++;
> -	} else {
> -		remainder *= size;
>  	}
>
> -	size *= blk_size;
> -	size += remainder / divisor[units];
> -	remainder %= divisor[units];
> +	/* here's the magic.  co1 * co2 may be > divisor[i], so correct for
> +	 * that in the exponent and make sure that the additional corrections
> +	 * from the remainders is added in.
> +	 *
> +	 * co1 *co2 = (blk_size + r1/divisor[units])*(size + r2/divisor[units])
> +	 *
> +	 * therefore
> +	 *
> +	 * co1*co2*divisor[units] = blk_size*size*divisor[units] +
> +	 *          r1*size + r2*size + r1*r2/divisor[units]
> +	 *
> +	 * drop the last term because it's too small and perform the
> +	 * calculation cleverly by decremeting i to be automatically dealing
> +	 * with everything multiplied by divisor[units] */
> +
> +	--i;
> +	size = size * blk_size * divisor[units] + r1 * size + r2 *
> blk_size;

The last term is actually not that small. Here is an example:

size = 8192  blk_size = 1024

'As is' the algorithm gives us '8.38 MB', and if we add "+ r1 * r1 /
divisor[units]" we get '8.39 MB' (the correct answer is 8192 * 1024 =
8388608 which is 8.39).

Both r1 and r2 are < divisor[units] here so r1 * r2 won't overflow u32,
I suggest we add this term.

>
>  	while (size >= divisor[units]) {
>  		remainder = do_div(size, divisor[units]);
> @@ -81,8 +100,15 @@ void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
>  	}
>
>  	sf_cap = size;
> -	for (j = 0; sf_cap*10 < 1000; j++)
> +	round = 500;
> +	for (j = 0; sf_cap*10 < 1000; j++) {
>  		sf_cap *= 10;
> +		round /= 10;
> +	}
> +
> +	/* add a 5 to the digit below what will be printed to ensure
> +	 * an arithmetical round up */
> +	remainder += round;
>
>  	if (j) {
>  		remainder *= 1000;

Can I post this solution with your Suggested-by or do you plan to do it
yourself?

Thanks,

-- 
  Vitaly

next prev parent reply	other threads:[~2015-11-03 13:13 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-29 16:30 [PATCH v3 0/4] lib/string_helpers: fix precision issues and introduce tests Vitaly Kuznetsov
2015-10-29 16:30 ` [PATCH v3 1/4] lib/string_helpers: change blk_size to u32 for string_get_size() interface Vitaly Kuznetsov
2015-10-29 22:27   ` James Bottomley
2015-10-29 23:19     ` Rasmus Villemoes
2015-10-29 23:24       ` Rasmus Villemoes
2015-10-30  3:33       ` James Bottomley
2015-10-30 10:46     ` Vitaly Kuznetsov
2015-10-31  0:21       ` James Bottomley
2015-11-02 15:58         ` Vitaly Kuznetsov
2015-11-03  3:40           ` James Bottomley
2015-11-03 13:13             ` Vitaly Kuznetsov [this message]
2015-11-03 17:02               ` James Bottomley
2015-11-03 20:57                 ` Rasmus Villemoes
2015-11-03 21:19                   ` James Bottomley
2015-10-29 16:30 ` [PATCH v3 2/4] lib/string_helpers.c: protect string_get_size() against blk_size=0 Vitaly Kuznetsov
2015-10-29 21:24   ` Andy Shevchenko
2015-10-29 23:00   ` James Bottomley
2015-10-29 23:32     ` Andy Shevchenko
2015-10-30  3:34       ` James Bottomley
2015-10-30 10:41         ` Vitaly Kuznetsov
2015-10-31  0:07           ` James Bottomley
2015-10-29 16:30 ` [PATCH v3 3/4] lib/string_helpers.c: don't lose precision in string_get_size() Vitaly Kuznetsov
2015-10-29 21:22   ` Andy Shevchenko
2015-10-29 16:30 ` [PATCH v3 4/4] lib/test-string_helpers.c: add string_get_size() tests Vitaly Kuznetsov
2015-10-29 21:33   ` Andy Shevchenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87egg7fcpn.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=jbottomley@odin.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).