public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH mmotm] fix broken bootup on 32-bit
@ 2011-03-05 12:42 Hugh Dickins
  2011-03-05 19:41 ` [PATCH] lib: vsprintf: 32-bit put_dec() fixes Michal Nazarewicz
       [not found] ` <AANLkTikDssqCD-f1E=i-HvKEAMnBMtD5qJq-k5Q3FJKC@mail.gmail.com>
  0 siblings, 2 replies; 3+ messages in thread
From: Hugh Dickins @ 2011-03-05 12:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Nazarewicz, Douglas W. Jones, Denis Vlasenko, linux-kernel

mmotm 2011-03-02-16-52 is utterly broken on 32-bit: panics at boot
with "Couldn't register console driver", and preceding warnings don't
even print their line numbers... which leads to the vsprintf changes.

Comparing the 32-bit version of put_dec() against mod7_put_dec()
in tools/put-dec/put-dec-test.c (ah, it is useful after all!) shows
that someone has decided to invert the ordering of pairs of lines
in the kernel version of the algorithm, making a nonsense of it.

Switch them back, and put in the conditionals from mod7_put_dec()
(absent from mod6_put_dec??): my guess is that less work is better.

My testing has gone no further than booting up, and checking that
the numbers in /proc/partitions come out right (which they didn't
at my first attempt, when I thought just a one-liner was needed).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
Diffed to go on the top of mmotm, but can be applied after
lib-vsprintf-optimised-put_dec-function-fix.patch

 lib/vsprintf.c |   33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

--- mmotm/lib/vsprintf.c	2011-03-03 15:43:57.000000000 -0800
+++ linux/lib/vsprintf.c	2011-03-05 03:37:14.000000000 -0800
@@ -310,22 +310,29 @@ char *put_dec(char *buf, unsigned long l
 
 	q  = 656 * d3 + 7296 * d2 + 5536 * d1 + (n & 0xFFFF);
 
-	q  = q / 10000;
 	buf = put_dec_full4(buf, q % 10000);
+	q  = q / 10000;
 
 	d1 = q + 7671 * d3 + 9496 * d2 + 6 * d1;
-	q  = d1 / 10000;
-	buf = put_dec_full4(buf, d1 % 10000);
-
-	d2 = q + 4749 * d3 + 42 * d2;
-	q  = d2 / 10000;
-	buf = put_dec_full4(buf, d2 % 10000);
-
-	d3 = q + 281 * d3;
-	q  = d3 / 10000;
-	buf = put_dec_full4(buf, d3 % 10000);
-
-	buf = put_dec_full4(buf, q);
+	if (d1) {
+		buf = put_dec_full4(buf, d1 % 10000);
+		q  = d1 / 10000;
+
+		d2 = q + 4749 * d3 + 42 * d2;
+		if (d2) {
+			buf = put_dec_full4(buf, d2 % 10000);
+			q  = d2 / 10000;
+
+			d3 = q + 281 * d3;
+			if (d3) {
+				buf = put_dec_full4(buf, d3 % 10000);
+				q  = d3 / 10000;
+
+				if (q)
+					buf = put_dec_full4(buf, q);
+			}
+		}
+	}
 
 	while (buf[-1] == '0')
 		--buf;

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] lib: vsprintf: 32-bit put_dec() fixes
  2011-03-05 12:42 [PATCH mmotm] fix broken bootup on 32-bit Hugh Dickins
@ 2011-03-05 19:41 ` Michal Nazarewicz
       [not found] ` <AANLkTikDssqCD-f1E=i-HvKEAMnBMtD5qJq-k5Q3FJKC@mail.gmail.com>
  1 sibling, 0 replies; 3+ messages in thread
From: Michal Nazarewicz @ 2011-03-05 19:41 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Douglas W. Jones, Denis Vlasenko, linux-kernel

This commit fixes the 32-bit put_dec() function.

I have submitted by mistake an older version of the put_dec()
patch with a bug in it (which had been spotted by Denys and
fixed in subsequent version), which resulted in Hugh having to
find the bug once again (after experiencing boot failure).

This commit fixes the bug once and for all and introduces some
additional optimisations and comments (which were present in
the fixed version of put_dec() patch).

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Hugh Dickins <hughd@google.com>
---
 lib/vsprintf.c |   49 ++++++++++++++++++++++++-------------------------
 1 files changed, 24 insertions(+), 25 deletions(-)

Hugh Dickins <hughd@google.com> writes:
> mmotm 2011-03-02-16-52 is utterly broken on 32-bit: panics
> at boot with "Couldn't register console driver", and
> preceding warnings don't even print their line
> numbers... which leads to the vsprintf changes.

As I suspected, I sent by mistake the v3 instead of v4 of the
patch.  I haven't spotted any problems because I was building
kernel for my 32-bit machine from a branch with a fix but than
sent patch from a branch for 64-bit machine (which was not
affected by the bug).

Hugh proposed version with cascades of ifs.  I think this is
better version because it's smaller and benchmark are not
clear which version is faster (on Intel version with ifs was
faster but on ARM version without ifs was).

I pushed an unified version of the whole put_dec patch,
rebased on v2.6.38-rc7, to github:

        git://github.com/mina86/linux-2.6.git v2.6.38-rc7+put-dec

The attached patch is rebased on top of -mm (ie. just a delta).

For the sake of elegance, I would probably recommend taking
the unified patch instead of what's already in -mm plus this
or Hugh's fix.

Again, sorry about all the problems.

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 344c03f..daa9209 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -157,7 +157,7 @@ static noinline_for_stack char *put_dec_full5(char *buf, unsigned q)
 	 * without any branches.
 	 */
 
-	r      = (q * (uint64_t)0xcccd) >> 19;
+	r = (q * (uint64_t)0xcccd) >> 19;
 	*buf++ = (q - 10 * r) + '0';
 
 	/*
@@ -226,7 +226,14 @@ static noinline_for_stack char *put_dec_trunc5(char *buf, unsigned q)
 	return buf;
 }
 
-/* No inlining helps gcc to use registers better */
+/*
+ * This function formats all integers correctly, however on 32-bit
+ * processors function below is used (not this one) which handles only
+ * non-zero integers.  So be advised never to call this function with
+ * num == 0.
+ *
+ * No inlining helps gcc to use registers better
+ */
 static noinline_for_stack
 char *put_dec(char *buf, unsigned long long num)
 {
@@ -293,36 +300,36 @@ char *put_dec_8bit(char *buf, unsigned q)
  * permission from the author).  This performs no 64-bit division and
  * hence should be faster on 32-bit machines then the version of the
  * function above.
+ *
+ * This function formats correctly all NON-ZERO integers.  Passing
+ * zero makes daemons come out of your closet.  This is OK, since
+ * number(), which calls this function, has a special case for zero
+ * anyways.
  */
 static noinline_for_stack
 char *put_dec(char *buf, unsigned long long n)
 {
 	uint32_t d3, d2, d1, q;
 
-	if (n < 10) {
-		*buf++ = '0' + (unsigned)n;
-		return buf;
-	}
-
 	d1 = (n >> 16) & 0xFFFF;
 	d2 = (n >> 32) & 0xFFFF;
 	d3 = (n >> 48) & 0xFFFF;
 
-	q  = 656 * d3 + 7296 * d2 + 5536 * d1 + (n & 0xFFFF);
+	q = 656 * d3 + 7296 * d2 + 5536 * d1 + (n & 0xFFFF);
 
-	q  = q / 10000;
 	buf = put_dec_full4(buf, q % 10000);
+	q = q / 10000;
 
 	d1 = q + 7671 * d3 + 9496 * d2 + 6 * d1;
-	q  = d1 / 10000;
+	q = d1 / 10000;
 	buf = put_dec_full4(buf, d1 % 10000);
 
 	d2 = q + 4749 * d3 + 42 * d2;
-	q  = d2 / 10000;
+	q = d2 / 10000;
 	buf = put_dec_full4(buf, d2 % 10000);
 
 	d3 = q + 281 * d3;
-	q  = d3 / 10000;
+	q = d3 / 10000;
 	buf = put_dec_full4(buf, d3 % 10000);
 
 	buf = put_dec_full4(buf, q);
@@ -407,22 +414,14 @@ char *number(char *buf, char *end, unsigned long long num,
 			spec.field_width--;
 		}
 	}
-	if (need_pfx) {
-		spec.field_width--;
-		if (spec.base == 16)
-			spec.field_width--;
-	}
+	if (need_pfx)
+		spec.field_width -= spec.base / 8;
 
 	/* generate full string in tmp[], in reverse order */
 	i = 0;
-	if (num == 0)
-		tmp[i++] = '0';
-	/* Generic code, for any base:
-	else do {
-		tmp[i++] = (digits[do_div(num,base)] | locase);
-	} while (num != 0);
-	*/
-	else if (spec.base != 10) { /* 8 or 16 */
+	if (num < 8) {
+		tmp[i++] = '0' + (char)num;
+	} else if (spec.base != 10) { /* 8 or 16 */
 		int mask = spec.base - 1;
 		int shift = 3;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH mmotm] fix broken bootup on 32-bit
       [not found] ` <AANLkTikDssqCD-f1E=i-HvKEAMnBMtD5qJq-k5Q3FJKC@mail.gmail.com>
@ 2011-03-05 19:49   ` Hugh Dickins
  0 siblings, 0 replies; 3+ messages in thread
From: Hugh Dickins @ 2011-03-05 19:49 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: linux-kernel, Douglas W. Jones, Andrew Morton, Denis Vlasenko, e

On Sat, 5 Mar 2011, Michal Nazarewicz wrote:
> On Mar 5, 2011 1:43 PM, "Hugh Dickins" <hughd@google.com> wrote:
> >
> > mmotm 2011-03-02-16-52 is utterly broken on 32-bit: panics at boot
> > with "Couldn't register console driver", and preceding warnings don't
> > even print their line numbers... which leads to the vsprintf changes.
> 
> Sorry. I must have submitted the wrong version of the patch. I don't know
> how that could happen.

I don't know by what route they got into mmotm:
I can't find them on LKML since last August.

> 
> > Comparing the 32-bit version of put_dec() against mod7_put_dec()
> > in tools/put-dec/put-dec-test.c (ah, it is useful after all!) shows
> > that someone has decided to invert the ordering of pairs of lines
> > in the kernel version of the algorithm, making a nonsense of it.
> 
> Again, I'm at loss of how that happened. I remember Denis pointing this out
> though and me fixing it. In some strange turn of events I must have
> submitted version without the fix.

And what other surprises may have crept in there?

I think you need to get more signoffs from competent (not me!) people,
and (at least for initial submission) a configurable test of the code
that can actually be run at startup in the kernel.

I nearly called the tools/put-dec testing worthless, since what it's
testing is divorced from and different from what's being run in the
kernel.  But that's unfair, it surely helped me to a booting system.

> 
> > Switch them back, and put in the conditionals from mod7_put_dec()
> > (absent from mod6_put_dec??): my guess is that less work is better.
> >
> > My testing has gone no further than booting up, and checking that
> > the numbers in /proc/partitions come out right (which they didn't
> > at my first attempt, when I thought just a one-liner was needed).
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > Diffed to go on the top of mmotm, but can be applied after
> > lib-vsprintf-optimised-put_dec-function-fix.patch
> >
> >  lib/vsprintf.c |   33 ++++++++++++++++++++-------------
> >  1 file changed, 20 insertions(+), 13 deletions(-)
> >
> > --- mmotm/lib/vsprintf.c        2011-03-03 15:43:57.000000000 -0800
> > +++ linux/lib/vsprintf.c        2011-03-05 03:37:14.000000000 -0800
> > @@ -310,22 +310,29 @@ char *put_dec(char *buf, unsigned long l
> >
> >        q  = 656 * d3 + 7296 * d2 + 5536 * d1 + (n & 0xFFFF);
> >
> > -       q  = q / 10000;
> >        buf = put_dec_full4(buf, q % 10000);
> > +       q  = q / 10000;
> 
> This should be enough.

Looking at it again this morning, yes, that's the only reordering that
matters, the rest just makes it more recognizably like what's tested.

> 
> >
> >        d1 = q + 7671 * d3 + 9496 * d2 + 6 * d1;
> > -       q  = d1 / 10000;
> > -       buf = put_dec_full4(buf, d1 % 10000);
> > -
> > -       d2 = q + 4749 * d3 + 42 * d2;
> > -       q  = d2 / 10000;
> > -       buf = put_dec_full4(buf, d2 % 10000);
> > -
> > -       d3 = q + 281 * d3;
> > -       q  = d3 / 10000;
> > -       buf = put_dec_full4(buf, d3 % 10000);
> > -
> > -       buf = put_dec_full4(buf, q);
> > +       if (d1) {
> > +               buf = put_dec_full4(buf, d1 % 10000);
> > +               q  = d1 / 10000;
> > +
> > +               d2 = q + 4749 * d3 + 42 * d2;
> > +               if (d2) {
> > +                       buf = put_dec_full4(buf, d2 % 10000);
> > +                       q  = d2 / 10000;
> > +
> > +                       d3 = q + 281 * d3;
> > +                       if (d3) {
> > +                               buf = put_dec_full4(buf, d3 % 10000);
> > +                               q  = d3 / 10000;
> > +
> > +                               if (q)
> > +                                       buf = put_dec_full4(buf, q);
> > +                       }
> > +               }
> > +       }
> >
> >        while (buf[-1] == '0')
> >                --buf;
> 
> Note that this loop handles zeros at the beginning so it's not necessary to
> add the ifs. As I've said, swapping the two lines at the beginning will be
> enough, and I would recommend doing only that.

I realize that zeroes are handled, but I was imagining that one branch
taken (for numbers up to 9999) is cheaper than four out-of-line function
calls, six divisions-or-modulos by constant 10000, three multiplications
by constants; oh, and a lot more once I look inside put_dec_full4().

Is that not the case?  Isn't performance the justification for this magic?

> 
> Once again, sorry about the trouble.

Hugh

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-03-05 19:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-05 12:42 [PATCH mmotm] fix broken bootup on 32-bit Hugh Dickins
2011-03-05 19:41 ` [PATCH] lib: vsprintf: 32-bit put_dec() fixes Michal Nazarewicz
     [not found] ` <AANLkTikDssqCD-f1E=i-HvKEAMnBMtD5qJq-k5Q3FJKC@mail.gmail.com>
2011-03-05 19:49   ` [PATCH mmotm] fix broken bootup on 32-bit Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox