* Fwd: PPC bn_div_words routine rewrite [not found] ` <4dd15d1805070508312427a0ba@mail.gmail.com> @ 2005-07-05 15:45 ` David Ho 2005-07-05 16:36 ` David Ho 0 siblings, 1 reply; 7+ messages in thread From: David Ho @ 2005-07-05 15:45 UTC (permalink / raw) To: appro, linuxppc-embedded This is the second confirmed report of the same problem on the ppc8xx. After reading my email. I must say I was the unfriendly one, I apologize for that. More debugging evidence to come. ---------- Forwarded message ---------- From: Murch, Christopher <cmurch@mrv.com> Date: Jul 1, 2005 9:46 AM Subject: RE: PPC bn_div_words routine rewrite To: David Ho <davidkwho@gmail.com> David, I had observed the same issue on ppc 8xx machines after upgrading to the as= m version of the BN routines. Thank you very much for your work for the fix. My question is, do you have high confidence in the other new asm ppc BN routines after observing this issue or do you think they might have similia= r problems? Thanks. Chris -----Original Message----- From: David Ho [mailto:davidkwho@gmail.com] Sent: Thursday, June 30, 2005 6:22 PM To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org Subject: Re: PPC bn_div_words routine rewrite The reason I had to redo this routine, in case anyone is wondering, is because ssh-keygen segfaults when this assembly routine returns junk to the BN_div_word function. On a ppc, if you issue the command ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N "" The program craps out when it tries to write the public key in ascii decimal. Regards, David On 6/30/05, David Ho <davidkwho@gmail.com> wrote: > Hi all, > > This is a rewrite of the bn_div_words routine for the PowerPC arch, > tested on a MPC8xx processor. > I initially thought there is maybe a small mistake in the code that > requires a one-liner change but it turns out I have to redo the > routine. > I guess this routine is not called very often as I see that most other > routines are hand-crafted, whereas this routine is compiled from a C > function that apparently has not gone through a whole lot of testing. > > I wrote a C function to confirm correctness of the code. > > unsigned long div_words (unsigned long h, > unsigned long l, > unsigned long d) > { > unsigned long i_h; /* intermediate dividend */ > unsigned long i_q; /* quotient of i_h/d */ > unsigned long i_r; /* remainder of i_h/d */ > > unsigned long i_cntr; > unsigned long i_carry; > > unsigned long ret_q; /* return quotient */ > > /* cannot divide by zero */ > if (d =3D=3D 0) return 0xffffffff; > > /* do simple 32-bit divide */ > if (h =3D=3D 0) return l/d; > > i_q =3D h/d; > i_r =3D h - (i_q*d); > ret_q =3D i_q; > > i_cntr =3D 32; > > while (i_cntr--) > { > i_carry =3D (l & 0x80000000) ? 1:0; > l =3D l << 1; > > i_h =3D (i_r << 1) | i_carry; > i_q =3D i_h/d; > i_r =3D i_h - (i_q*d); > > ret_q =3D (ret_q << 1) | i_q; > } > > return ret_q; > } > > > Then I handcrafted the routine in PPC assembly. > The result is a 26 line assembly that is easy to understand and > predictable as opposed to a 81liner that I am still trying to > decipher... > If anyone is interested in incorporating this routine to the openssl > code I'll be happy to assist. > At this point I think I will be taking a bit of a break from this 3 > day debugging/fixing marathon. > > Regards, > David Ho > > > # > # Handcrafted version of bn_div_words > # > # r3 =3D h > # r4 =3D l > # r5 =3D d > > cmplwi 0,r5,0 # compare r5 and 0 > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div1 # proceed if d!=3D0 > li r3,-1 # d=3D0 return -1 > bclr BO_ALWAYS,CR0_LT > .Lppcasm_div1: > cmplwi 0,r3,0 # compare r3 and 0 > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div2 # proceed if h !=3D 0 > divwu r3,r4,r5 # ret_q =3D l/d > bclr BO_ALWAYS,CR0_LT # return result in r3 > .Lppcasm_div2: > divwu r9,r3,r5 # i_q =3D h/d > mullw r10,r9,r5 # i_r =3D h - (i_q*d) > subf r10,r10,r3 > mr r3,r9 # req_q =3D i_q > .Lppcasm_set_ctr: > li r12,32 # ctr =3D bitsizeof(d) > mtctr r12 > .Lppcasm_div_loop: > addc r4,r4,r4 # l =3D l << 1 -> i_carry > adde r11,r10,r10 # i_h =3D (i_r << 1) | i_carry > divwu r9,r11,r5 # i_q =3D i_h/d > mullw r10,r9,r5 # i_r =3D i_h - (i_q*d) > subf r10,r10,r11 > add r3,r3,r3 # ret_q =3D ret_q << 1 | i_q > add r3,r3,r9 > bc BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop > .Lppc_div_end: > bclr BO_ALWAYS,CR0_LT # return result in r3 > .long 0x00000000 > _______________________________________________ Linuxppc-embedded mailing list Linuxppc-embedded@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-embedded ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PPC bn_div_words routine rewrite 2005-07-05 15:45 ` Fwd: PPC bn_div_words routine rewrite David Ho @ 2005-07-05 16:36 ` David Ho 2005-07-05 17:01 ` David Ho 0 siblings, 1 reply; 7+ messages in thread From: David Ho @ 2005-07-05 16:36 UTC (permalink / raw) To: appro, linuxppc-embedded, openssl-dev First pass debugging results from gdb on ppc8xx. Executing ssh-keygen with following arguments. (gdb) show args Argument list to give program being debugged when it is started is "-t rsa1 -f /etc/ssh/ssh_host_key -N """. Program received signal SIGSEGV, Segmentation fault. BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136 136 *lp=3DBN_div_word(t,BN_DEC_CONV); (gdb) i r r0 0x0 0 r1 0x7fffd580 2147472768 r2 0x30012868 805382248 r3 0x80000000 2147483648 r4 0xfef33fc 267334652 r5 0x25 37 r6 0xfccdef8 265084664 r7 0x7fffd4c0 2147472576 r8 0xfbad2887 4222429319 r9 0x84044022 2214871074 r10 0x0 0 r11 0x2 2 r12 0xfef2054 267329620 r13 0x10030bc8 268635080 r14 0x0 0 r15 0x0 0 r16 0x0 0 r17 0x0 0 r18 0x0 0 r19 0x0 0 r20 0x0 0 r21 0x0 0 r22 0x0 0 r23 0x64 100 r24 0x5 5 r25 0x1002d438 268620856 r26 0x1002d9f0 268622320 r27 0x1002c578 268617080 r28 0x1 1 r29 0x10031000 268636160 r30 0xffbf7d0 268171216 r31 0x1002d9f0 268622320 pc 0xfef2058 267329624 ps 0xd032 53298 cr 0x24044022 604258338 lr 0xfef2054 267329620 ctr 0xfccefa0 265088928 xer 0x20000000 536870912 fpscr 0x0 0 vscr 0x0 0 vrsave 0x0 0 (gdb) p/x $pc $1 =3D 0xfef2058 0x0fef2058 <BN_bn2dec+472>: stw r3,0(r29) (gdb) x 0x10031000 0x10031000: Cannot access memory at address 0x10031000 On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > This is the second confirmed report of the same problem on the ppc8xx. >=20 > After reading my email. I must say I was the unfriendly one, I > apologize for that. >=20 > More debugging evidence to come. >=20 > ---------- Forwarded message ---------- > From: Murch, Christopher <cmurch@mrv.com> > Date: Jul 1, 2005 9:46 AM > Subject: RE: PPC bn_div_words routine rewrite > To: David Ho <davidkwho@gmail.com> >=20 >=20 > David, > I had observed the same issue on ppc 8xx machines after upgrading to the = asm > version of the BN routines. Thank you very much for your work for the fi= x. > My question is, do you have high confidence in the other new asm ppc BN > routines after observing this issue or do you think they might have simil= iar > problems? > Thanks. > Chris >=20 > -----Original Message----- > From: David Ho [mailto:davidkwho@gmail.com] > Sent: Thursday, June 30, 2005 6:22 PM > To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org > Subject: Re: PPC bn_div_words routine rewrite >=20 >=20 > The reason I had to redo this routine, in case anyone is wondering, is > because ssh-keygen segfaults when this assembly routine returns junk > to the BN_div_word function. On a ppc, if you issue the command >=20 > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N "" >=20 > The program craps out when it tries to write the public key in ascii > decimal. >=20 > Regards, > David >=20 > On 6/30/05, David Ho <davidkwho@gmail.com> wrote: > > Hi all, > > > > This is a rewrite of the bn_div_words routine for the PowerPC arch, > > tested on a MPC8xx processor. > > I initially thought there is maybe a small mistake in the code that > > requires a one-liner change but it turns out I have to redo the > > routine. > > I guess this routine is not called very often as I see that most other > > routines are hand-crafted, whereas this routine is compiled from a C > > function that apparently has not gone through a whole lot of testing. > > > > I wrote a C function to confirm correctness of the code. > > > > unsigned long div_words (unsigned long h, > > unsigned long l, > > unsigned long d) > > { > > unsigned long i_h; /* intermediate dividend */ > > unsigned long i_q; /* quotient of i_h/d */ > > unsigned long i_r; /* remainder of i_h/d */ > > > > unsigned long i_cntr; > > unsigned long i_carry; > > > > unsigned long ret_q; /* return quotient */ > > > > /* cannot divide by zero */ > > if (d =3D=3D 0) return 0xffffffff; > > > > /* do simple 32-bit divide */ > > if (h =3D=3D 0) return l/d; > > > > i_q =3D h/d; > > i_r =3D h - (i_q*d); > > ret_q =3D i_q; > > > > i_cntr =3D 32; > > > > while (i_cntr--) > > { > > i_carry =3D (l & 0x80000000) ? 1:0; > > l =3D l << 1; > > > > i_h =3D (i_r << 1) | i_carry; > > i_q =3D i_h/d; > > i_r =3D i_h - (i_q*d); > > > > ret_q =3D (ret_q << 1) | i_q; > > } > > > > return ret_q; > > } > > > > > > Then I handcrafted the routine in PPC assembly. > > The result is a 26 line assembly that is easy to understand and > > predictable as opposed to a 81liner that I am still trying to > > decipher... > > If anyone is interested in incorporating this routine to the openssl > > code I'll be happy to assist. > > At this point I think I will be taking a bit of a break from this 3 > > day debugging/fixing marathon. > > > > Regards, > > David Ho > > > > > > # > > # Handcrafted version of bn_div_words > > # > > # r3 =3D h > > # r4 =3D l > > # r5 =3D d > > > > cmplwi 0,r5,0 # compare r5 and 0 > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div1 # proceed if d!=3D0 > > li r3,-1 # d=3D0 return -1 > > bclr BO_ALWAYS,CR0_LT > > .Lppcasm_div1: > > cmplwi 0,r3,0 # compare r3 and 0 > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div2 # proceed if h !=3D 0 > > divwu r3,r4,r5 # ret_q =3D l/d > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > .Lppcasm_div2: > > divwu r9,r3,r5 # i_q =3D h/d > > mullw r10,r9,r5 # i_r =3D h - (i_q*d) > > subf r10,r10,r3 > > mr r3,r9 # req_q =3D i_q > > .Lppcasm_set_ctr: > > li r12,32 # ctr =3D bitsizeof(d) > > mtctr r12 > > .Lppcasm_div_loop: > > addc r4,r4,r4 # l =3D l << 1 -> i_carry > > adde r11,r10,r10 # i_h =3D (i_r << 1) | i_carry > > divwu r9,r11,r5 # i_q =3D i_h/d > > mullw r10,r9,r5 # i_r =3D i_h - (i_q*d) > > subf r10,r10,r11 > > add r3,r3,r3 # ret_q =3D ret_q << 1 | i_q > > add r3,r3,r9 > > bc BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop > > .Lppc_div_end: > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > .long 0x00000000 > > > _______________________________________________ > Linuxppc-embedded mailing list > Linuxppc-embedded@ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc-embedded > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PPC bn_div_words routine rewrite 2005-07-05 16:36 ` David Ho @ 2005-07-05 17:01 ` David Ho 2005-07-05 20:21 ` David Ho 0 siblings, 1 reply; 7+ messages in thread From: David Ho @ 2005-07-05 17:01 UTC (permalink / raw) To: appro, linuxppc-embedded, openssl-dev I can tell you with certainty, with reference to the function BN_bn2dec, that since lp is a pointer, and within the while loop around bn_print.c:136 lp is being incremented. Because the test BN_is_zero(t) is always false, you have a pointer that is going off into the stratosphere, hence the segfault on ppc8xx. More analysis to come. On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > First pass debugging results from gdb on ppc8xx. Executing ssh-keygen > with following arguments. >=20 > (gdb) show args > Argument list to give program being debugged when it is started is > "-t rsa1 -f /etc/ssh/ssh_host_key -N """. >=20 > Program received signal SIGSEGV, Segmentation fault. > BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136 > 136 *lp=3DBN_div_word(t,BN_DEC_CONV); >=20 > (gdb) i r > r0 0x0 0 > r1 0x7fffd580 2147472768 > r2 0x30012868 805382248 > r3 0x80000000 2147483648 > r4 0xfef33fc 267334652 > r5 0x25 37 > r6 0xfccdef8 265084664 > r7 0x7fffd4c0 2147472576 > r8 0xfbad2887 4222429319 > r9 0x84044022 2214871074 > r10 0x0 0 > r11 0x2 2 > r12 0xfef2054 267329620 > r13 0x10030bc8 268635080 > r14 0x0 0 > r15 0x0 0 > r16 0x0 0 > r17 0x0 0 > r18 0x0 0 > r19 0x0 0 > r20 0x0 0 > r21 0x0 0 > r22 0x0 0 > r23 0x64 100 > r24 0x5 5 > r25 0x1002d438 268620856 > r26 0x1002d9f0 268622320 > r27 0x1002c578 268617080 > r28 0x1 1 > r29 0x10031000 268636160 > r30 0xffbf7d0 268171216 > r31 0x1002d9f0 268622320 > pc 0xfef2058 267329624 > ps 0xd032 53298 > cr 0x24044022 604258338 > lr 0xfef2054 267329620 > ctr 0xfccefa0 265088928 > xer 0x20000000 536870912 > fpscr 0x0 0 > vscr 0x0 0 > vrsave 0x0 0 >=20 > (gdb) p/x $pc > $1 =3D 0xfef2058 >=20 > 0x0fef2058 <BN_bn2dec+472>: stw r3,0(r29) >=20 > (gdb) x 0x10031000 > 0x10031000: Cannot access memory at address 0x10031000 >=20 >=20 >=20 >=20 >=20 >=20 >=20 >=20 >=20 >=20 > On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > > This is the second confirmed report of the same problem on the ppc8xx. > > > > After reading my email. I must say I was the unfriendly one, I > > apologize for that. > > > > More debugging evidence to come. > > > > ---------- Forwarded message ---------- > > From: Murch, Christopher <cmurch@mrv.com> > > Date: Jul 1, 2005 9:46 AM > > Subject: RE: PPC bn_div_words routine rewrite > > To: David Ho <davidkwho@gmail.com> > > > > > > David, > > I had observed the same issue on ppc 8xx machines after upgrading to th= e asm > > version of the BN routines. Thank you very much for your work for the = fix. > > My question is, do you have high confidence in the other new asm ppc BN > > routines after observing this issue or do you think they might have sim= iliar > > problems? > > Thanks. > > Chris > > > > -----Original Message----- > > From: David Ho [mailto:davidkwho@gmail.com] > > Sent: Thursday, June 30, 2005 6:22 PM > > To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org > > Subject: Re: PPC bn_div_words routine rewrite > > > > > > The reason I had to redo this routine, in case anyone is wondering, is > > because ssh-keygen segfaults when this assembly routine returns junk > > to the BN_div_word function. On a ppc, if you issue the command > > > > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N "" > > > > The program craps out when it tries to write the public key in ascii > > decimal. > > > > Regards, > > David > > > > On 6/30/05, David Ho <davidkwho@gmail.com> wrote: > > > Hi all, > > > > > > This is a rewrite of the bn_div_words routine for the PowerPC arch, > > > tested on a MPC8xx processor. > > > I initially thought there is maybe a small mistake in the code that > > > requires a one-liner change but it turns out I have to redo the > > > routine. > > > I guess this routine is not called very often as I see that most othe= r > > > routines are hand-crafted, whereas this routine is compiled from a C > > > function that apparently has not gone through a whole lot of testing. > > > > > > I wrote a C function to confirm correctness of the code. > > > > > > unsigned long div_words (unsigned long h, > > > unsigned long l, > > > unsigned long d) > > > { > > > unsigned long i_h; /* intermediate dividend */ > > > unsigned long i_q; /* quotient of i_h/d */ > > > unsigned long i_r; /* remainder of i_h/d */ > > > > > > unsigned long i_cntr; > > > unsigned long i_carry; > > > > > > unsigned long ret_q; /* return quotient */ > > > > > > /* cannot divide by zero */ > > > if (d =3D=3D 0) return 0xffffffff; > > > > > > /* do simple 32-bit divide */ > > > if (h =3D=3D 0) return l/d; > > > > > > i_q =3D h/d; > > > i_r =3D h - (i_q*d); > > > ret_q =3D i_q; > > > > > > i_cntr =3D 32; > > > > > > while (i_cntr--) > > > { > > > i_carry =3D (l & 0x80000000) ? 1:0; > > > l =3D l << 1; > > > > > > i_h =3D (i_r << 1) | i_carry; > > > i_q =3D i_h/d; > > > i_r =3D i_h - (i_q*d); > > > > > > ret_q =3D (ret_q << 1) | i_q; > > > } > > > > > > return ret_q; > > > } > > > > > > > > > Then I handcrafted the routine in PPC assembly. > > > The result is a 26 line assembly that is easy to understand and > > > predictable as opposed to a 81liner that I am still trying to > > > decipher... > > > If anyone is interested in incorporating this routine to the openssl > > > code I'll be happy to assist. > > > At this point I think I will be taking a bit of a break from this 3 > > > day debugging/fixing marathon. > > > > > > Regards, > > > David Ho > > > > > > > > > # > > > # Handcrafted version of bn_div_words > > > # > > > # r3 =3D h > > > # r4 =3D l > > > # r5 =3D d > > > > > > cmplwi 0,r5,0 # compare r5 and 0 > > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div1 # proceed if d!=3D0 > > > li r3,-1 # d=3D0 return -1 > > > bclr BO_ALWAYS,CR0_LT > > > .Lppcasm_div1: > > > cmplwi 0,r3,0 # compare r3 and 0 > > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div2 # proceed if h !=3D 0 > > > divwu r3,r4,r5 # ret_q =3D l/d > > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > > .Lppcasm_div2: > > > divwu r9,r3,r5 # i_q =3D h/d > > > mullw r10,r9,r5 # i_r =3D h - (i_q*d) > > > subf r10,r10,r3 > > > mr r3,r9 # req_q =3D i_q > > > .Lppcasm_set_ctr: > > > li r12,32 # ctr =3D bitsizeof(d) > > > mtctr r12 > > > .Lppcasm_div_loop: > > > addc r4,r4,r4 # l =3D l << 1 -> i_carry > > > adde r11,r10,r10 # i_h =3D (i_r << 1) | i_carr= y > > > divwu r9,r11,r5 # i_q =3D i_h/d > > > mullw r10,r9,r5 # i_r =3D i_h - (i_q*d) > > > subf r10,r10,r11 > > > add r3,r3,r3 # ret_q =3D ret_q << 1 | i_q > > > add r3,r3,r9 > > > bc BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop > > > .Lppc_div_end: > > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > > .long 0x00000000 > > > > > _______________________________________________ > > Linuxppc-embedded mailing list > > Linuxppc-embedded@ozlabs.org > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PPC bn_div_words routine rewrite 2005-07-05 17:01 ` David Ho @ 2005-07-05 20:21 ` David Ho 2005-07-05 21:22 ` Andy Polyakov 2005-07-05 21:25 ` David Ho 0 siblings, 2 replies; 7+ messages in thread From: David Ho @ 2005-07-05 20:21 UTC (permalink / raw) To: appro, linuxppc-embedded, openssl-dev Let's take first call to BN_div_word for example from BN_bn2dec, the parameter being passed to BN_div_word is (a=3D35, w=3D1000000000) (decimal numbers). It then calls the bn_div_words with (h=3D0, l=3D35, d=3D1000000000) if you examine the code in linux_ppc32.s it will exit early on because h is 0. the routine returns a divide by 0, which is undefined according to the manual. In the case of ppc8xx the result is 0x80000000. So this is the return value from bn_div_words, as seen in register R3. So what happens next is BN_div_word modifies "a" (1st parameter) with the result (0x80000000) and returns 23 as the remainder of the division. So "a" is never zero as a result and hence the test for BN_is_zero is always false. The problem fails the very first time it uses bn_div_words. The next thing I did naturally was to fix the case when you have h=3D0, which you can quite easy do it with the native divwu instruction. Lo and behold I was once again disappointed when h is not equal to 0. More to come... On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > I can tell you with certainty, with reference to the function > BN_bn2dec, that since lp is a pointer, and within the while loop > around bn_print.c:136 lp is being incremented. Because the test > BN_is_zero(t) is always false, you have a pointer that is going off > into the stratosphere, hence the segfault on ppc8xx. >=20 > More analysis to come. >=20 > On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > > First pass debugging results from gdb on ppc8xx. Executing ssh-keygen > > with following arguments. > > > > (gdb) show args > > Argument list to give program being debugged when it is started is > > "-t rsa1 -f /etc/ssh/ssh_host_key -N """. > > > > Program received signal SIGSEGV, Segmentation fault. > > BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136 > > 136 *lp=3DBN_div_word(t,BN_DEC_CONV); > > > > (gdb) i r > > r0 0x0 0 > > r1 0x7fffd580 2147472768 > > r2 0x30012868 805382248 > > r3 0x80000000 2147483648 > > r4 0xfef33fc 267334652 > > r5 0x25 37 > > r6 0xfccdef8 265084664 > > r7 0x7fffd4c0 2147472576 > > r8 0xfbad2887 4222429319 > > r9 0x84044022 2214871074 > > r10 0x0 0 > > r11 0x2 2 > > r12 0xfef2054 267329620 > > r13 0x10030bc8 268635080 > > r14 0x0 0 > > r15 0x0 0 > > r16 0x0 0 > > r17 0x0 0 > > r18 0x0 0 > > r19 0x0 0 > > r20 0x0 0 > > r21 0x0 0 > > r22 0x0 0 > > r23 0x64 100 > > r24 0x5 5 > > r25 0x1002d438 268620856 > > r26 0x1002d9f0 268622320 > > r27 0x1002c578 268617080 > > r28 0x1 1 > > r29 0x10031000 268636160 > > r30 0xffbf7d0 268171216 > > r31 0x1002d9f0 268622320 > > pc 0xfef2058 267329624 > > ps 0xd032 53298 > > cr 0x24044022 604258338 > > lr 0xfef2054 267329620 > > ctr 0xfccefa0 265088928 > > xer 0x20000000 536870912 > > fpscr 0x0 0 > > vscr 0x0 0 > > vrsave 0x0 0 > > > > (gdb) p/x $pc > > $1 =3D 0xfef2058 > > > > 0x0fef2058 <BN_bn2dec+472>: stw r3,0(r29) > > > > (gdb) x 0x10031000 > > 0x10031000: Cannot access memory at address 0x10031000 > > > > > > > > > > > > > > > > > > > > > > On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > > > This is the second confirmed report of the same problem on the ppc8xx= . > > > > > > After reading my email. I must say I was the unfriendly one, I > > > apologize for that. > > > > > > More debugging evidence to come. > > > > > > ---------- Forwarded message ---------- > > > From: Murch, Christopher <cmurch@mrv.com> > > > Date: Jul 1, 2005 9:46 AM > > > Subject: RE: PPC bn_div_words routine rewrite > > > To: David Ho <davidkwho@gmail.com> > > > > > > > > > David, > > > I had observed the same issue on ppc 8xx machines after upgrading to = the asm > > > version of the BN routines. Thank you very much for your work for th= e fix. > > > My question is, do you have high confidence in the other new asm ppc = BN > > > routines after observing this issue or do you think they might have s= imiliar > > > problems? > > > Thanks. > > > Chris > > > > > > -----Original Message----- > > > From: David Ho [mailto:davidkwho@gmail.com] > > > Sent: Thursday, June 30, 2005 6:22 PM > > > To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org > > > Subject: Re: PPC bn_div_words routine rewrite > > > > > > > > > The reason I had to redo this routine, in case anyone is wondering, i= s > > > because ssh-keygen segfaults when this assembly routine returns junk > > > to the BN_div_word function. On a ppc, if you issue the command > > > > > > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N "" > > > > > > The program craps out when it tries to write the public key in ascii > > > decimal. > > > > > > Regards, > > > David > > > > > > On 6/30/05, David Ho <davidkwho@gmail.com> wrote: > > > > Hi all, > > > > > > > > This is a rewrite of the bn_div_words routine for the PowerPC arch, > > > > tested on a MPC8xx processor. > > > > I initially thought there is maybe a small mistake in the code that > > > > requires a one-liner change but it turns out I have to redo the > > > > routine. > > > > I guess this routine is not called very often as I see that most ot= her > > > > routines are hand-crafted, whereas this routine is compiled from a = C > > > > function that apparently has not gone through a whole lot of testin= g. > > > > > > > > I wrote a C function to confirm correctness of the code. > > > > > > > > unsigned long div_words (unsigned long h, > > > > unsigned long l, > > > > unsigned long d) > > > > { > > > > unsigned long i_h; /* intermediate dividend */ > > > > unsigned long i_q; /* quotient of i_h/d */ > > > > unsigned long i_r; /* remainder of i_h/d */ > > > > > > > > unsigned long i_cntr; > > > > unsigned long i_carry; > > > > > > > > unsigned long ret_q; /* return quotient */ > > > > > > > > /* cannot divide by zero */ > > > > if (d =3D=3D 0) return 0xffffffff; > > > > > > > > /* do simple 32-bit divide */ > > > > if (h =3D=3D 0) return l/d; > > > > > > > > i_q =3D h/d; > > > > i_r =3D h - (i_q*d); > > > > ret_q =3D i_q; > > > > > > > > i_cntr =3D 32; > > > > > > > > while (i_cntr--) > > > > { > > > > i_carry =3D (l & 0x80000000) ? 1:0; > > > > l =3D l << 1; > > > > > > > > i_h =3D (i_r << 1) | i_carry; > > > > i_q =3D i_h/d; > > > > i_r =3D i_h - (i_q*d); > > > > > > > > ret_q =3D (ret_q << 1) | i_q; > > > > } > > > > > > > > return ret_q; > > > > } > > > > > > > > > > > > Then I handcrafted the routine in PPC assembly. > > > > The result is a 26 line assembly that is easy to understand and > > > > predictable as opposed to a 81liner that I am still trying to > > > > decipher... > > > > If anyone is interested in incorporating this routine to the openss= l > > > > code I'll be happy to assist. > > > > At this point I think I will be taking a bit of a break from this 3 > > > > day debugging/fixing marathon. > > > > > > > > Regards, > > > > David Ho > > > > > > > > > > > > # > > > > # Handcrafted version of bn_div_words > > > > # > > > > # r3 =3D h > > > > # r4 =3D l > > > > # r5 =3D d > > > > > > > > cmplwi 0,r5,0 # compare r5 and 0 > > > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div1 # proceed if d!=3D0 > > > > li r3,-1 # d=3D0 return -1 > > > > bclr BO_ALWAYS,CR0_LT > > > > .Lppcasm_div1: > > > > cmplwi 0,r3,0 # compare r3 and 0 > > > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div2 # proceed if h !=3D= 0 > > > > divwu r3,r4,r5 # ret_q =3D l/d > > > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > > > .Lppcasm_div2: > > > > divwu r9,r3,r5 # i_q =3D h/d > > > > mullw r10,r9,r5 # i_r =3D h - (i_q*d) > > > > subf r10,r10,r3 > > > > mr r3,r9 # req_q =3D i_q > > > > .Lppcasm_set_ctr: > > > > li r12,32 # ctr =3D bitsizeof(d) > > > > mtctr r12 > > > > .Lppcasm_div_loop: > > > > addc r4,r4,r4 # l =3D l << 1 -> i_carry > > > > adde r11,r10,r10 # i_h =3D (i_r << 1) | i_ca= rry > > > > divwu r9,r11,r5 # i_q =3D i_h/d > > > > mullw r10,r9,r5 # i_r =3D i_h - (i_q*d) > > > > subf r10,r10,r11 > > > > add r3,r3,r3 # ret_q =3D ret_q << 1 | i_= q > > > > add r3,r3,r9 > > > > bc BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop > > > > .Lppc_div_end: > > > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > > > .long 0x00000000 > > > > > > > _______________________________________________ > > > Linuxppc-embedded mailing list > > > Linuxppc-embedded@ozlabs.org > > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded > > > > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PPC bn_div_words routine rewrite 2005-07-05 20:21 ` David Ho @ 2005-07-05 21:22 ` Andy Polyakov 2005-07-05 21:25 ` David Ho 1 sibling, 0 replies; 7+ messages in thread From: Andy Polyakov @ 2005-07-05 21:22 UTC (permalink / raw) To: openssl-dev; +Cc: linuxppc-embedded > Let's take first call to BN_div_word for example from BN_bn2dec, the > parameter being passed to BN_div_word is (a=35, w=1000000000) (decimal > numbers). It then calls the bn_div_words with (h=0, l=35, > d=1000000000) if you examine the code in linux_ppc32.s it will exit > early on because h is 0. the routine returns a divide by 0, which is > undefined according to the manual. In the case of ppc8xx the result > is 0x80000000. And on the PPC machine I have access to it returns 0. This is explanation why I never experienced any SEGV, but a sparse decimal output. And it does explain why BN_is_zero condition never met on your system and you hit sbrk(0) limit and suffer the penalty. However! Note that updated routine, http://cvs.openssl.org/getfile/openssl/crypto/bn/asm/ppc.pl?v=1.4 never issues divide by 0 [it traps instead, but the condition is never met now when called from BN_div_words] and it does return correct answer to me. Can you really confirm that updated subroutine doesn't work for you? And if so, how does problem manifest? Still SEGV? At same point? It should pointed out that bug in ppc.pl is encountered only in 0.9.7 context, as 0.9.8 avoids it by normalizing divisor [and adjusting dividend accordingly]. BTW, I can confirm that 0.9.7 produces correct decimal ASCII with your routine [but no luck with make test_bn], but in 0.9.8 context decimal printout comes out truncated [not sparse with some significant digits there and there, but truncated] if your routine is pasted in. A. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PPC bn_div_words routine rewrite 2005-07-05 20:21 ` David Ho 2005-07-05 21:22 ` Andy Polyakov @ 2005-07-05 21:25 ` David Ho 2005-07-05 21:49 ` Andy Polyakov 1 sibling, 1 reply; 7+ messages in thread From: David Ho @ 2005-07-05 21:25 UTC (permalink / raw) To: appro, linuxppc-embedded, openssl-dev Okay, having actually did what Andy suggested, i.e. the one liner fix in the assembly code, bn_div_words returns the correct results. At this point, my conclusion is, up to openssl-0.9.8-beta6, the ppc32 bn_div_words routine generated from crypto/bn/ppc.pl is still busted.=20 Your solution is either to use the one liner fix andy provided in his reply or my routine. My solution is a complete rewrite of the routine, which is tested to work on ppc8xx. I personally do not know why it would not work on other ppc32 systems but if you have time to spare you are welcome to give it a try on your ppc32, because it is by far the most straight-forward algorithm for division. If you have done long division on paper from your primary school days, this function mimics exactly the long division steps in binary (as opposed to decimal). Way easier to fix if you know what the algorithm is trying to do. Comments I have for the old routine are: Why do you signal an overflow condition when it appears functions that call bn_div_words do not check for overflow conditions? That's why in my routine I just let the register overflow. Why the complexity, does it offer performance advantage? Obviously code size is huge compared to mine. David On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > Let's take first call to BN_div_word for example from BN_bn2dec, the > parameter being passed to BN_div_word is (a=3D35, w=3D1000000000) (decima= l > numbers). It then calls the bn_div_words with (h=3D0, l=3D35, > d=3D1000000000) if you examine the code in linux_ppc32.s it will exit > early on because h is 0. the routine returns a divide by 0, which is > undefined according to the manual. In the case of ppc8xx the result > is 0x80000000. So this is the return value from bn_div_words, as seen > in register R3. >=20 > So what happens next is BN_div_word modifies "a" (1st parameter) with > the result (0x80000000) and returns 23 as the remainder of the > division. So "a" is never zero as a result and hence the test for > BN_is_zero is always false. The problem fails the very first time it > uses bn_div_words. >=20 > The next thing I did naturally was to fix the case when you have h=3D0, > which you can quite easy do it with the native divwu instruction. Lo > and behold I was once again disappointed when h is not equal to 0. >=20 > More to come... >=20 >=20 > On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > > I can tell you with certainty, with reference to the function > > BN_bn2dec, that since lp is a pointer, and within the while loop > > around bn_print.c:136 lp is being incremented. Because the test > > BN_is_zero(t) is always false, you have a pointer that is going off > > into the stratosphere, hence the segfault on ppc8xx. > > > > More analysis to come. > > > > On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > > > First pass debugging results from gdb on ppc8xx. Executing ssh-keyge= n > > > with following arguments. > > > > > > (gdb) show args > > > Argument list to give program being debugged when it is started is > > > "-t rsa1 -f /etc/ssh/ssh_host_key -N """. > > > > > > Program received signal SIGSEGV, Segmentation fault. > > > BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136 > > > 136 *lp=3DBN_div_word(t,BN_DEC_CONV); > > > > > > (gdb) i r > > > r0 0x0 0 > > > r1 0x7fffd580 2147472768 > > > r2 0x30012868 805382248 > > > r3 0x80000000 2147483648 > > > r4 0xfef33fc 267334652 > > > r5 0x25 37 > > > r6 0xfccdef8 265084664 > > > r7 0x7fffd4c0 2147472576 > > > r8 0xfbad2887 4222429319 > > > r9 0x84044022 2214871074 > > > r10 0x0 0 > > > r11 0x2 2 > > > r12 0xfef2054 267329620 > > > r13 0x10030bc8 268635080 > > > r14 0x0 0 > > > r15 0x0 0 > > > r16 0x0 0 > > > r17 0x0 0 > > > r18 0x0 0 > > > r19 0x0 0 > > > r20 0x0 0 > > > r21 0x0 0 > > > r22 0x0 0 > > > r23 0x64 100 > > > r24 0x5 5 > > > r25 0x1002d438 268620856 > > > r26 0x1002d9f0 268622320 > > > r27 0x1002c578 268617080 > > > r28 0x1 1 > > > r29 0x10031000 268636160 > > > r30 0xffbf7d0 268171216 > > > r31 0x1002d9f0 268622320 > > > pc 0xfef2058 267329624 > > > ps 0xd032 53298 > > > cr 0x24044022 604258338 > > > lr 0xfef2054 267329620 > > > ctr 0xfccefa0 265088928 > > > xer 0x20000000 536870912 > > > fpscr 0x0 0 > > > vscr 0x0 0 > > > vrsave 0x0 0 > > > > > > (gdb) p/x $pc > > > $1 =3D 0xfef2058 > > > > > > 0x0fef2058 <BN_bn2dec+472>: stw r3,0(r29) > > > > > > (gdb) x 0x10031000 > > > 0x10031000: Cannot access memory at address 0x10031000 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 7/5/05, David Ho <davidkwho@gmail.com> wrote: > > > > This is the second confirmed report of the same problem on the ppc8= xx. > > > > > > > > After reading my email. I must say I was the unfriendly one, I > > > > apologize for that. > > > > > > > > More debugging evidence to come. > > > > > > > > ---------- Forwarded message ---------- > > > > From: Murch, Christopher <cmurch@mrv.com> > > > > Date: Jul 1, 2005 9:46 AM > > > > Subject: RE: PPC bn_div_words routine rewrite > > > > To: David Ho <davidkwho@gmail.com> > > > > > > > > > > > > David, > > > > I had observed the same issue on ppc 8xx machines after upgrading t= o the asm > > > > version of the BN routines. Thank you very much for your work for = the fix. > > > > My question is, do you have high confidence in the other new asm pp= c BN > > > > routines after observing this issue or do you think they might have= similiar > > > > problems? > > > > Thanks. > > > > Chris > > > > > > > > -----Original Message----- > > > > From: David Ho [mailto:davidkwho@gmail.com] > > > > Sent: Thursday, June 30, 2005 6:22 PM > > > > To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org > > > > Subject: Re: PPC bn_div_words routine rewrite > > > > > > > > > > > > The reason I had to redo this routine, in case anyone is wondering,= is > > > > because ssh-keygen segfaults when this assembly routine returns ju= nk > > > > to the BN_div_word function. On a ppc, if you issue the command > > > > > > > > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N "" > > > > > > > > The program craps out when it tries to write the public key in asci= i > > > > decimal. > > > > > > > > Regards, > > > > David > > > > > > > > On 6/30/05, David Ho <davidkwho@gmail.com> wrote: > > > > > Hi all, > > > > > > > > > > This is a rewrite of the bn_div_words routine for the PowerPC arc= h, > > > > > tested on a MPC8xx processor. > > > > > I initially thought there is maybe a small mistake in the code th= at > > > > > requires a one-liner change but it turns out I have to redo the > > > > > routine. > > > > > I guess this routine is not called very often as I see that most = other > > > > > routines are hand-crafted, whereas this routine is compiled from = a C > > > > > function that apparently has not gone through a whole lot of test= ing. > > > > > > > > > > I wrote a C function to confirm correctness of the code. > > > > > > > > > > unsigned long div_words (unsigned long h, > > > > > unsigned long l, > > > > > unsigned long d) > > > > > { > > > > > unsigned long i_h; /* intermediate dividend */ > > > > > unsigned long i_q; /* quotient of i_h/d */ > > > > > unsigned long i_r; /* remainder of i_h/d */ > > > > > > > > > > unsigned long i_cntr; > > > > > unsigned long i_carry; > > > > > > > > > > unsigned long ret_q; /* return quotient */ > > > > > > > > > > /* cannot divide by zero */ > > > > > if (d =3D=3D 0) return 0xffffffff; > > > > > > > > > > /* do simple 32-bit divide */ > > > > > if (h =3D=3D 0) return l/d; > > > > > > > > > > i_q =3D h/d; > > > > > i_r =3D h - (i_q*d); > > > > > ret_q =3D i_q; > > > > > > > > > > i_cntr =3D 32; > > > > > > > > > > while (i_cntr--) > > > > > { > > > > > i_carry =3D (l & 0x80000000) ? 1:0; > > > > > l =3D l << 1; > > > > > > > > > > i_h =3D (i_r << 1) | i_carry; > > > > > i_q =3D i_h/d; > > > > > i_r =3D i_h - (i_q*d); > > > > > > > > > > ret_q =3D (ret_q << 1) | i_q; > > > > > } > > > > > > > > > > return ret_q; > > > > > } > > > > > > > > > > > > > > > Then I handcrafted the routine in PPC assembly. > > > > > The result is a 26 line assembly that is easy to understand and > > > > > predictable as opposed to a 81liner that I am still trying to > > > > > decipher... > > > > > If anyone is interested in incorporating this routine to the open= ssl > > > > > code I'll be happy to assist. > > > > > At this point I think I will be taking a bit of a break from this= 3 > > > > > day debugging/fixing marathon. > > > > > > > > > > Regards, > > > > > David Ho > > > > > > > > > > > > > > > # > > > > > # Handcrafted version of bn_div_words > > > > > # > > > > > # r3 =3D h > > > > > # r4 =3D l > > > > > # r5 =3D d > > > > > > > > > > cmplwi 0,r5,0 # compare r5 and 0 > > > > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div1 # proceed if d!= =3D0 > > > > > li r3,-1 # d=3D0 return -1 > > > > > bclr BO_ALWAYS,CR0_LT > > > > > .Lppcasm_div1: > > > > > cmplwi 0,r3,0 # compare r3 and 0 > > > > > bc BO_IF_NOT,CR0_EQ,.Lppcasm_div2 # proceed if h != =3D 0 > > > > > divwu r3,r4,r5 # ret_q =3D l/d > > > > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > > > > .Lppcasm_div2: > > > > > divwu r9,r3,r5 # i_q =3D h/d > > > > > mullw r10,r9,r5 # i_r =3D h - (i_q*d) > > > > > subf r10,r10,r3 > > > > > mr r3,r9 # req_q =3D i_q > > > > > .Lppcasm_set_ctr: > > > > > li r12,32 # ctr =3D bitsizeof(d) > > > > > mtctr r12 > > > > > .Lppcasm_div_loop: > > > > > addc r4,r4,r4 # l =3D l << 1 -> i_carry > > > > > adde r11,r10,r10 # i_h =3D (i_r << 1) | i_= carry > > > > > divwu r9,r11,r5 # i_q =3D i_h/d > > > > > mullw r10,r9,r5 # i_r =3D i_h - (i_q*d) > > > > > subf r10,r10,r11 > > > > > add r3,r3,r3 # ret_q =3D ret_q << 1 | = i_q > > > > > add r3,r3,r9 > > > > > bc BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop > > > > > .Lppc_div_end: > > > > > bclr BO_ALWAYS,CR0_LT # return result in r3 > > > > > .long 0x00000000 > > > > > > > > > _______________________________________________ > > > > Linuxppc-embedded mailing list > > > > Linuxppc-embedded@ozlabs.org > > > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded > > > > > > > > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PPC bn_div_words routine rewrite 2005-07-05 21:25 ` David Ho @ 2005-07-05 21:49 ` Andy Polyakov 0 siblings, 0 replies; 7+ messages in thread From: Andy Polyakov @ 2005-07-05 21:49 UTC (permalink / raw) To: openssl-dev; +Cc: linuxppc-embedded > Okay, having actually did what Andy suggested, i.e. the one liner fix > in the assembly code, bn_div_words returns the correct results. Note that the final version, one committed to all relevant OpenSSL branches since couple of days ago and one which actually made to just released 0.9.8, is a bit different from originally suggested one-line fix, see for example http://cvs.openssl.org/chngview?cn=14199. > At this point, my conclusion is, up to openssl-0.9.8-beta6, the ppc32 > bn_div_words routine generated from crypto/bn/ppc.pl is still busted. Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the bug condition. Recall that original problem report was for 0.9.7. > Why do you signal an overflow condition when it appears functions that > call bn_div_words do not check for overflow conditions? That's question to IBM. By the time they submitted the code, I've explicitly asked what would be appropriate way to generate *fatal* condition at that point, i.e. one which would result in a core dump, and it came out as division by 0 instruction. By that time I had no access to any PPC machine and had to just go with it. Now it actually came as surprise that division by 0 does not raise an exception, but silently returns implementation-specific value... A. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-07-05 21:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <19EE6EC66973A5408FBE4CB7772F6F0A02C8770E@ltnmail.xyplex.com>
[not found] ` <4dd15d1805070508312427a0ba@mail.gmail.com>
2005-07-05 15:45 ` Fwd: PPC bn_div_words routine rewrite David Ho
2005-07-05 16:36 ` David Ho
2005-07-05 17:01 ` David Ho
2005-07-05 20:21 ` David Ho
2005-07-05 21:22 ` Andy Polyakov
2005-07-05 21:25 ` David Ho
2005-07-05 21:49 ` Andy Polyakov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).