* [PATCH] lib: fix 842 build on 32-bit architectures
@ 2015-05-13 20:56 Arnd Bergmann
2015-05-13 23:52 ` Dan Streetman
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Arnd Bergmann @ 2015-05-13 20:56 UTC (permalink / raw)
To: linux-arm-kernel
Building the 842 code on 32-bit ARM currently results in this link
error:
ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!
The reason is that the __do_index function performs a 64-bit
division by a power-of-two number, but it has no insight into
the function arguments.
By marking that function inline, the fsize argument is always
known at the time that do_index is called, and the compiler is
able to replace the extremely expensive 64-bit division with
a cheap constant shift operation.
Aside from fixing that link error, this approach should also improve
both code size and performance on 32-bit architectures significantly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
Found while building arm32 allmodconfig with gcc-5.0
diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
index 6b2b45aecde3..285bf6b6959c 100644
--- a/lib/842/842_decompress.c
+++ b/lib/842/842_decompress.c
@@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
return 0;
}
-static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
+static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
{
u64 index, offset, total = round_down(p->out - p->ostart, 8);
int ret;
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH] lib: fix 842 build on 32-bit architectures 2015-05-13 20:56 [PATCH] lib: fix 842 build on 32-bit architectures Arnd Bergmann @ 2015-05-13 23:52 ` Dan Streetman 2015-05-14 8:54 ` Dan Streetman 2015-05-14 3:06 ` Herbert Xu 2015-05-14 10:03 ` Russell King - ARM Linux 2 siblings, 1 reply; 7+ messages in thread From: Dan Streetman @ 2015-05-13 23:52 UTC (permalink / raw) To: linux-arm-kernel > Building the 842 code on 32-bit ARM currently results in this link > error: > > ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined! Oops! Guess I should build/test on 32 bit more. > > The reason is that the __do_index function performs a 64-bit > division by a power-of-two number, but it has no insight into > the function arguments. > > By marking that function inline, the fsize argument is always > known at the time that do_index is called, and the compiler is > able to replace the extremely expensive 64-bit division with > a cheap constant shift operation. alternately, we know that fsize will always be less than 64 bits, at most it's 4<<9 or 8<<8 (both == 1<<11). So we could just change its type to u16. diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c index 6b2b45aecde3..285bf6b6959c 100644 --- a/lib/842/842_decompress.c +++ b/lib/842/842_decompress.c @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n) return 0; } -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) +static int __do_index(struct sw842_param *p, u8 size, u8 bits, u16 fsize) { u64 index, offset, total = round_down(p->out - p->ostart, 8); int ret; Or, we could inline it and change the type to u16. In any case, Acked-by: Dan Streetman <ddstreet@ieee.org> > > Aside from fixing that link error, this approach should also improve > both code size and performance on 32-bit architectures significantly. > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > Found while building arm32 allmodconfig with gcc-5.0 > > diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c > index 6b2b45aecde3..285bf6b6959c 100644 > --- a/lib/842/842_decompress.c > +++ b/lib/842/842_decompress.c > @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n) > return 0; > } > > -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) > +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) > { > u64 index, offset, total = round_down(p->out - p->ostart, 8); > int ret; > ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH] lib: fix 842 build on 32-bit architectures 2015-05-13 23:52 ` Dan Streetman @ 2015-05-14 8:54 ` Dan Streetman 2015-05-14 10:49 ` Arnd Bergmann 0 siblings, 1 reply; 7+ messages in thread From: Dan Streetman @ 2015-05-14 8:54 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 13, 2015 at 7:52 PM, Dan Streetman <ddstreet@ieee.org> wrote: >> Building the 842 code on 32-bit ARM currently results in this link >> error: >> >> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined! > > Oops! Guess I should build/test on 32 bit more. > >> >> The reason is that the __do_index function performs a 64-bit >> division by a power-of-two number, but it has no insight into >> the function arguments. wait, do you mean the 64 bit mod, total % fsize? That should already be fixed in Herbert's tree, I changed it to subtraction instead. In any case, I looked at the code again and I think the fsize parameter can be removed, and just simply calculated in the function, it's just a shift. I'll send a patch. >> >> By marking that function inline, the fsize argument is always >> known at the time that do_index is called, and the compiler is >> able to replace the extremely expensive 64-bit division with >> a cheap constant shift operation. > > alternately, we know that fsize will always be less than 64 bits, > at most it's 4<<9 or 8<<8 (both == 1<<11). So we could just change > its type to u16. > > diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c > index 6b2b45aecde3..285bf6b6959c 100644 > --- a/lib/842/842_decompress.c > +++ b/lib/842/842_decompress.c > @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n) > return 0; > } > > -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) > +static int __do_index(struct sw842_param *p, u8 size, u8 bits, u16 fsize) > { > u64 index, offset, total = round_down(p->out - p->ostart, 8); > int ret; > > Or, we could inline it and change the type to u16. In any case, > > Acked-by: Dan Streetman <ddstreet@ieee.org> > >> >> Aside from fixing that link error, this approach should also improve >> both code size and performance on 32-bit architectures significantly. >> >> Signed-off-by: Arnd Bergmann <arnd@arndb.de> >> --- >> Found while building arm32 allmodconfig with gcc-5.0 >> >> diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c >> index 6b2b45aecde3..285bf6b6959c 100644 >> --- a/lib/842/842_decompress.c >> +++ b/lib/842/842_decompress.c >> @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n) >> return 0; >> } >> >> -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) >> +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) >> { >> u64 index, offset, total = round_down(p->out - p->ostart, 8); >> int ret; >> ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] lib: fix 842 build on 32-bit architectures 2015-05-14 8:54 ` Dan Streetman @ 2015-05-14 10:49 ` Arnd Bergmann 2015-05-14 11:03 ` Dan Streetman 0 siblings, 1 reply; 7+ messages in thread From: Arnd Bergmann @ 2015-05-14 10:49 UTC (permalink / raw) To: linux-arm-kernel On Thursday 14 May 2015 04:54:07 Dan Streetman wrote: > On Wed, May 13, 2015 at 7:52 PM, Dan Streetman <ddstreet@ieee.org> wrote: > >> Building the 842 code on 32-bit ARM currently results in this link > >> error: > >> > >> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined! > > > > Oops! Guess I should build/test on 32 bit more. > > > >> > >> The reason is that the __do_index function performs a 64-bit > >> division by a power-of-two number, but it has no insight into > >> the function arguments. > > wait, do you mean the 64 bit mod, total % fsize? That should already > be fixed in Herbert's tree, I changed it to subtraction instead. Yes, that's the one. I was looking at yesterday's linux-next which still had the bug, but your fix has made it into today's release, so my patch is no longer needed. > In any case, I looked at the code again and I think the fsize > parameter can be removed, and just simply calculated in the function, > it's just a shift. I'll send a patch. Not necessary for this problem any more, but it could still make sense if you think that improves the code. You can also try to see if marking the function inline has any effect on code size or performance if either of them matters. Arnd ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] lib: fix 842 build on 32-bit architectures 2015-05-14 10:49 ` Arnd Bergmann @ 2015-05-14 11:03 ` Dan Streetman 0 siblings, 0 replies; 7+ messages in thread From: Dan Streetman @ 2015-05-14 11:03 UTC (permalink / raw) To: linux-arm-kernel On Thu, May 14, 2015 at 6:49 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Thursday 14 May 2015 04:54:07 Dan Streetman wrote: >> On Wed, May 13, 2015 at 7:52 PM, Dan Streetman <ddstreet@ieee.org> wrote: >> >> Building the 842 code on 32-bit ARM currently results in this link >> >> error: >> >> >> >> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined! >> > >> > Oops! Guess I should build/test on 32 bit more. >> > >> >> >> >> The reason is that the __do_index function performs a 64-bit >> >> division by a power-of-two number, but it has no insight into >> >> the function arguments. >> >> wait, do you mean the 64 bit mod, total % fsize? That should already >> be fixed in Herbert's tree, I changed it to subtraction instead. > > Yes, that's the one. I was looking at yesterday's linux-next which still > had the bug, but your fix has made it into today's release, so my patch > is no longer needed. > >> In any case, I looked at the code again and I think the fsize >> parameter can be removed, and just simply calculated in the function, >> it's just a shift. I'll send a patch. > > Not necessary for this problem any more, but it could still make sense > if you think that improves the code. You can also try to see if marking > the function inline has any effect on code size or performance if either > of them matters. Yep, I'll run some performance tests both ways and send a patch if it does improve things. Thanks! > > Arnd ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] lib: fix 842 build on 32-bit architectures 2015-05-13 20:56 [PATCH] lib: fix 842 build on 32-bit architectures Arnd Bergmann 2015-05-13 23:52 ` Dan Streetman @ 2015-05-14 3:06 ` Herbert Xu 2015-05-14 10:03 ` Russell King - ARM Linux 2 siblings, 0 replies; 7+ messages in thread From: Herbert Xu @ 2015-05-14 3:06 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 13, 2015 at 10:56:39PM +0200, Arnd Bergmann wrote: > Building the 842 code on 32-bit ARM currently results in this link > error: > > ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined! > > The reason is that the __do_index function performs a 64-bit > division by a power-of-two number, but it has no insight into > the function arguments. > > By marking that function inline, the fsize argument is always > known at the time that do_index is called, and the compiler is > able to replace the extremely expensive 64-bit division with > a cheap constant shift operation. > > Aside from fixing that link error, this approach should also improve > both code size and performance on 32-bit architectures significantly. > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > Found while building arm32 allmodconfig with gcc-5.0 > > diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c > index 6b2b45aecde3..285bf6b6959c 100644 > --- a/lib/842/842_decompress.c > +++ b/lib/842/842_decompress.c > @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n) > return 0; > } > > -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) > +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) Ugh, relying on inlining to work is fragile. I'm not against making this inline but please make it work even when it is out- of-line. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] lib: fix 842 build on 32-bit architectures 2015-05-13 20:56 [PATCH] lib: fix 842 build on 32-bit architectures Arnd Bergmann 2015-05-13 23:52 ` Dan Streetman 2015-05-14 3:06 ` Herbert Xu @ 2015-05-14 10:03 ` Russell King - ARM Linux 2 siblings, 0 replies; 7+ messages in thread From: Russell King - ARM Linux @ 2015-05-14 10:03 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 13, 2015 at 10:56:39PM +0200, Arnd Bergmann wrote: > Building the 842 code on 32-bit ARM currently results in this link > error: > > ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined! > > The reason is that the __do_index function performs a 64-bit > division by a power-of-two number, but it has no insight into > the function arguments. > > By marking that function inline, the fsize argument is always > known at the time that do_index is called, and the compiler is > able to replace the extremely expensive 64-bit division with > a cheap constant shift operation. > > Aside from fixing that link error, this approach should also improve > both code size and performance on 32-bit architectures significantly. > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > Found while building arm32 allmodconfig with gcc-5.0 > > diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c > index 6b2b45aecde3..285bf6b6959c 100644 > --- a/lib/842/842_decompress.c > +++ b/lib/842/842_decompress.c > @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n) > return 0; > } > > -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) > +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize) This had better get a comment to say why this is done, to stop the "don't do static inline in a .c" brigade reverting this change. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-05-14 11:03 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-05-13 20:56 [PATCH] lib: fix 842 build on 32-bit architectures Arnd Bergmann 2015-05-13 23:52 ` Dan Streetman 2015-05-14 8:54 ` Dan Streetman 2015-05-14 10:49 ` Arnd Bergmann 2015-05-14 11:03 ` Dan Streetman 2015-05-14 3:06 ` Herbert Xu 2015-05-14 10:03 ` Russell King - ARM Linux
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox