Re: [PATCH] Make div64_u64() precise on 32bit platforms

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Brian Behlendorf <behlendorf1@llnl.gov>
To: Oleg Nesterov <oleg@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] Make div64_u64() precise on 32bit platforms
Date: Thu, 21 Oct 2010 10:46:42 -0700	[thread overview]
Message-ID: <1287683202.16680.44.camel@pip> (raw)
In-Reply-To: <20101014121159.GA407@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1995 bytes --]

> > > +	if (divisor >> 32 == 0) {
> > > +		if (dividend >> 32 < divisor) {
> > > +			return div_u64_rem(dividend, divisor, &rem);
> > > +		} else {
> > > +			u0 = dividend & 0xFFFFFFFF;
> > > +			quot1 = div_u64_rem(dividend >> 32, divisor, &rem);
> > > +			u0 += ((u64)rem << 32);
> > > +			quot0 = div_u64_rem(u0, divisor, &rem);
> > > +			return (quot1 << 32) + quot0;
> > > +		}
> >
> > Looks correct... but I can't understand these complications.
> > Looks like we can just do
> >
> > 	if ((divisor >> 32) == 0) {
> > 		div_u64(dividend, divisor);
> > 	} else {
> > 	...
> >
> > No?

The idea here, as described in the formal proof, is to cleanly handle
the overflow case.  When I implemented this I assumed the overflow case
would in fact be a problem.  To my surprise your right it doesn't seem
to be causing any trouble.  In practice I can't find any cases where it
is a problem on i386.

> > I can't understand this "dividend >> 1". It seems to me that
> >
> > 		quot1 = div_u64(dividend, (divisor << n) >> 32);
> > 		quot0 = (quot1 << n) >> 32;
> >
> > should be equally correct. Or I missed some overflow?
> 
> Thinking more about this with a fresh head, we don't event need quot1,
> unless I missed something. We can do
> 
> 		quot0 = div_u64((dividend << n) >> 32, (divisor << n) >> 32);
> 
> instead. Or, better,
> 
> 		n = 32 - __builtin_clzll(divisor);
> 		quot0 = div_u64(dividend >> n, divisor >> n);
> 
> And 32 - clzll == fls.

Once again, the extra complexity was only there to handle to overflow
case.

> So, I think it can be really trivial, see the test-case below,
> seems to work (you need 64bit machine to test).
> 
> What do you think? I do not trust my math skills.

I think we should use your simpler version.  There's no good reason to
make this more complicated than it needs to be.  I haven't been able to
find a test case where your changes get the wrong result.

The updated patch is against linux-2.6.35 and passes all the previous
test cases.

Thanks,
Brian

[-- Attachment #2: Type: text/x-patch, Size: 3572 bytes --]

>From 35755c57cb45f7d30abcd88a8e2fc1ccc5beecfd Mon Sep 17 00:00:00 2001
From: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Thu, 5 Aug 2010 14:59:11 -0700
Subject: [PATCH] Fix div64_u64 for 32bit platforms

The current implementation of div64_u64 for 32bit systems returns
an approximately correct result when the divisor exceeds 32bits.
Since doing 64bit division using 32bit hardware is a long since
solved problem we just use one of the existing proven methods.

Additionally, add a div64_s64 function to correctly handle doing
signed 64bit division.
---
 include/linux/kernel.h |    5 ++++
 include/linux/math64.h |   12 +++++++++++
 lib/div64.c            |   52 ++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 8317ec4..ed6371c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -162,6 +162,11 @@ extern int _cond_resched(void);
 		(__x < 0) ? -__x : __x;		\
 	})
 
+#define abs64(x) ({				\
+		s64 __x = (x);			\
+		(__x < 0) ? -__x : __x;		\
+	})
+
 #ifdef CONFIG_PROVE_LOCKING
 void might_fault(void);
 #else
diff --git a/include/linux/math64.h b/include/linux/math64.h
index c87f152..23fcdfc 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -35,6 +35,14 @@ static inline u64 div64_u64(u64 dividend, u64 divisor)
 	return dividend / divisor;
 }
 
+/**
+ * div64_s64 - signed 64bit divide with 64bit divisor
+ */
+static inline s64 div64_s64(s64 dividend, s64 divisor)
+{
+	return dividend / divisor;
+}
+
 #elif BITS_PER_LONG == 32
 
 #ifndef div_u64_rem
@@ -53,6 +61,10 @@ extern s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
 extern u64 div64_u64(u64 dividend, u64 divisor);
 #endif
 
+#ifndef div64_s64
+extern s64 div64_s64(s64 dividend, s64 divisor);
+#endif
+
 #endif /* BITS_PER_LONG */
 
 /**
diff --git a/lib/div64.c b/lib/div64.c
index a111eb8..5b49191 100644
--- a/lib/div64.c
+++ b/lib/div64.c
@@ -77,26 +77,58 @@ s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder)
 EXPORT_SYMBOL(div_s64_rem);
 #endif
 
-/* 64bit divisor, dividend and result. dynamic precision */
+/**
+ * div64_u64 - unsigned 64bit divide with 64bit divisor
+ * @dividend:	64bit dividend
+ * @divisor:	64bit divisor
+ *
+ * This implementation is a modified version of the algorithm proposed
+ * by the book 'Hacker's Delight'.  The original source and full proof
+ * can be found here and is available for use without restriction.
+ *
+ * 'http://www.hackersdelight.org/HDcode/newCode/divDouble.c'
+ */
 #ifndef div64_u64
 u64 div64_u64(u64 dividend, u64 divisor)
 {
-	u32 high, d;
+	u32 high = divisor >> 32;
+	u64 quot;
 
-	high = divisor >> 32;
-	if (high) {
-		unsigned int shift = fls(high);
+	if (high == 0) {
+		quot = div_u64(dividend, divisor);
+	} else {
+		int n = 1 + fls(high);
+		quot = div_u64(dividend >> n, divisor >> n);
 
-		d = divisor >> shift;
-		dividend >>= shift;
-	} else
-		d = divisor;
+		if (quot != 0)
+			quot--;
+		if ((dividend - quot * divisor) >= divisor)
+			quot++;
+	}
 
-	return div_u64(dividend, d);
+	return quot;
 }
 EXPORT_SYMBOL(div64_u64);
 #endif
 
+/**
+ * div64_s64 - signed 64bit divide with 64bit divisor
+ * @dividend:	64bit dividend
+ * @divisor:	64bit divisor
+ */
+#ifndef div64_s64
+s64 div64_s64(s64 dividend, s64 divisor)
+{
+	s64 quot, t;
+
+	quot = div64_u64(abs64(dividend), abs64(divisor));
+	t = (dividend ^ divisor) >> 63;
+
+	return (quot ^ t) - t;
+}
+EXPORT_SYMBOL(div64_s64);
+#endif
+
 #endif /* BITS_PER_LONG == 32 */
 
 /*
-- 
1.7.1

next prev parent reply	other threads:[~2010-10-21 17:46 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-12 19:26 [PATCH] Make div64_u64() precise on 32bit platforms Brian Behlendorf
2010-10-13 21:37 ` Oleg Nesterov
2010-10-14 12:11   ` Oleg Nesterov
2010-10-21 17:46     ` Brian Behlendorf [this message]
2010-10-21 18:12       ` Oleg Nesterov
2010-10-21 19:22         ` Andrew Morton
2010-10-21 19:49           ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2010-08-02 16:09 [PATCH] trivial, document that div64_u64() is not " Oleg Nesterov
2010-08-03 22:28 ` Andrew Morton
2010-08-09 16:30   ` [PATCH] Make div64_u64() " Brian Behlendorf
2010-09-17  0:00     ` Oleg Nesterov

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:8317ec4 dfblob:ed6371c dfblob:c87f152 dfblob:23fcdfc
dfblob:a111eb8 dfblob:5b49191 )
 OR (
bs:"Fix div64_u64 for 32bit platforms" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1287683202.16680.44.camel@pip \
    --to=behlendorf1@llnl.gov \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.