linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: PPC bn_div_words routine rewrite
       [not found] <4dd15d1805063003587276af7e@mail.gmail.com>
@ 2005-06-30 22:22 ` David Ho
  2005-07-01 17:36   ` Andy Polyakov
  0 siblings, 1 reply; 11+ messages in thread
From: David Ho @ 2005-06-30 22:22 UTC (permalink / raw)
  To: openssl-dev, linuxppc-embedded

The reason I had to redo this routine, in case anyone is wondering, is
because ssh-keygen  segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

The program craps out when it tries to write the public key in ascii decima=
l.

Regards,
David=20

On 6/30/05, David Ho <davidkwho@gmail.com> wrote:
> Hi all,
>=20
> This is a rewrite of the bn_div_words routine for the PowerPC arch,
> tested on a MPC8xx processor.
> I initially thought there is maybe a small mistake in the code that
> requires a one-liner change but it turns out I have to redo the
> routine.
> I guess this routine is not called very often as I see that most other
> routines are hand-crafted, whereas this routine is compiled from a C
> function that apparently has not gone through a whole lot of testing.
>=20
> I wrote a C function to confirm correctness of the code.
>=20
> unsigned long div_words (unsigned long h,
>                          unsigned long l,
>                          unsigned long d)
> {
>   unsigned long i_h; /* intermediate dividend */
>   unsigned long i_q; /* quotient of i_h/d */
>   unsigned long i_r; /* remainder of i_h/d */
>=20
>   unsigned long i_cntr;
>   unsigned long i_carry;
>=20
>   unsigned long ret_q; /* return quotient */
>=20
>   /* cannot divide by zero */
>   if (d =3D=3D 0) return 0xffffffff;
>=20
>   /* do simple 32-bit divide */
>   if (h =3D=3D 0) return l/d;
>=20
>   i_q =3D h/d;
>   i_r =3D h - (i_q*d);
>   ret_q =3D i_q;
>=20
>   i_cntr =3D 32;
>=20
>   while (i_cntr--)
>   {
>     i_carry =3D (l & 0x80000000) ? 1:0;
>     l =3D l << 1;
>=20
>     i_h =3D (i_r << 1) | i_carry;
>     i_q =3D i_h/d;
>     i_r =3D i_h - (i_q*d);
>=20
>     ret_q =3D (ret_q << 1) | i_q;
>   }
>=20
>   return ret_q;
> }
>=20
>=20
> Then I handcrafted the routine in PPC assembly.
> The result is a 26 line assembly that is easy to understand and
> predictable as opposed to a 81liner that I am still trying to
> decipher...
> If anyone is interested in incorporating this routine to the openssl
> code I'll be happy to assist.
> At this point I think I will be taking a bit of a break from this 3
> day debugging/fixing marathon.
>=20
> Regards,
> David Ho
>=20
>=20
> #
> #       Handcrafted version of bn_div_words
> #
> #       r3 =3D h
> #       r4 =3D l
> #       r5 =3D d
>=20
>         cmplwi  0,r5,0                  # compare r5 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=3D0
>         li      r3,-1                   # d=3D0 return -1
>         bclr    BO_ALWAYS,CR0_LT
> .Lppcasm_div1:
>         cmplwi  0,r3,0                  # compare r3 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h !=3D 0
>         divwu   r3,r4,r5                # ret_q =3D l/d
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> .Lppcasm_div2:
>         divwu   r9,r3,r5                # i_q =3D h/d
>         mullw   r10,r9,r5               # i_r =3D h - (i_q*d)
>         subf    r10,r10,r3
>         mr      r3,r9                   # req_q =3D i_q
> .Lppcasm_set_ctr:
>         li      r12,32                  # ctr =3D bitsizeof(d)
>         mtctr   r12
> .Lppcasm_div_loop:
>         addc    r4,r4,r4                # l =3D l << 1 -> i_carry
>         adde    r11,r10,r10             # i_h =3D (i_r << 1) | i_carry
>         divwu   r9,r11,r5               # i_q =3D i_h/d
>         mullw   r10,r9,r5               # i_r =3D i_h - (i_q*d)
>         subf    r10,r10,r11
>         add     r3,r3,r3                # ret_q =3D ret_q << 1 | i_q
>         add     r3,r3,r9
>         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> .Lppc_div_end:
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
>         .long   0x00000000
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-06-30 22:22 ` David Ho
@ 2005-07-01 17:36   ` Andy Polyakov
  2005-07-04 14:35     ` David Ho
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Polyakov @ 2005-07-01 17:36 UTC (permalink / raw)
  To: openssl-dev; +Cc: linuxppc-embedded

> The reason I had to redo this routine, in case anyone is wondering, is
> because ssh-keygen  segfaults when this assembly routine returns junk
> to the BN_div_word function. On a ppc, if you issue the command
> 
> ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
> 
> The program craps out when it tries to write the public key in ascii decimal.

If would help if you provide evidence such as debugger stack trace and 
program output. Provided description makes no sense. "seg-faults when 
routine returns junk to BN_div_word"? Seg-fault [segmentation violation] 
can occur when you write something to memory and nothing gets written to 
memory upon result return. BN_div_word does write to memory, but I fail 
to see how a bogus value could possibly trigger seg-fault. The only 
possibility is that assembler doesn't follow ABI convention and corrupts 
registers, which caller is using/expects to be preserved by callee. 
There're several PPC ABI flavors in use, but OpenSSL routines were 
designed ABI-neutral, Well, "neutrality" really means "common 
denominator for ABI specs examined at the moment of coding," so there is 
a window of opportunity that it won't be "neutral" to future ABI, but is 
it really case? That your system uses some newly designed PPC ABI? You 
never mentioned what's your system...

But you're apparently right about a bug being present in PPC assembler. 
I too have got insane [with very few significant digits] decimal 
printout of public key generated by ssh-keygen...

>>This is a rewrite of the bn_div_words routine for the PowerPC arch,
>>tested on a MPC8xx processor.

Well, suggested routine apparently sends ssh-keygen on the PPC-based 
32-bit system I have access to to an end-less loop... And (cd test; make 
test_bn) fails early in BN_sqr... And test/exptest fails miserably with 
"bad reciprocal"...

>>I initially thought there is maybe a small mistake in the code that
>>requires a one-liner change

But apparently this appears to be the case! Please verify following:

--- crypto/bn/asm/ppc.pl.orig        2004-04-28 00:05:50.000000000 +0200
+++ crypto/bn/asm/ppc.pl      2005-07-01 18:58:21.105656512 +0200
@@ -1717,7 +1717,7 @@
         li      r9,1                    # r9=1
         $SHL    r10,r9,r8               # r9<<=r8
         $UCMP   0,r3,r10                #
-       bc      BO_IF,CR0_GT,Lppcasm_div2       #or if (h > (1<<r8))
+       bc      BO_IF_NOT,CR0_GT,Lppcasm_div2   #or if (h > (1<<r8))
         $UDIV   r3,r3,r0                #if not assert(0) divide by 0!
                                         #that's how we signal overflow
         bclr    BO_ALWAYS,CR0_LT        #return. NEVER REACHED.

A.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-01 17:36   ` Andy Polyakov
@ 2005-07-04 14:35     ` David Ho
  2005-07-05 15:00       ` Andy Polyakov
  0 siblings, 1 reply; 11+ messages in thread
From: David Ho @ 2005-07-04 14:35 UTC (permalink / raw)
  To: openssl-dev; +Cc: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 5108 bytes --]

Good morning gentleman,  

Let's start the week off with less hostility and more productive
criticism on this topic.  As I see it, the popular Linux distributions
are not that interested on ppc32 since neither Novell nor Redhat
openly support this arch.  We will have to put our heads together to
make ppc32 a stable Linux platform.

On 7/1/05, Andy Polyakov <appro@fy.chalmers.se> wrote:

> BN_div_word does write to memory, but I fail
> to see how a bogus value could possibly trigger seg-fault. 

Assuming that you are a maintainer of the openssl code, which I gather
you are, would you even care to take the time to examine the code in
suspect, or find someone who is capable of doing that, I'm sure you
must have a lot of friends, at least one who is knowledgeable in the
ppc32 arch.

> The only
> possibility is that assembler doesn't follow ABI convention and corrupts
> registers, which caller is using/expects to be preserved by callee.
> There're several PPC ABI flavors in use, but OpenSSL routines were
> designed ABI-neutral, Well, "neutrality" really means "common
> denominator for ABI specs examined at the moment of coding," so there is
> a window of opportunity that it won't be "neutral" to future ABI, but is
> it really case? 

I don't understand the terminology you use here.  As I understand it
ABI is there so binaries can inter-operate in the case of dynamic
loading, when the ABI is consistent you can use any compiler that
conforms to the ABI and those binaries can work together.  As I see it
I'm building openssl and openssh with the same compiler so what is the
real issue here?  Or is it an issue at all?

If you are referring to C calling convention, then I can only guess
you have figured it out or otherwise none of the assembly routine will
work.

> That your system uses some newly designed PPC ABI? You
> never mentioned what's your system...

In my very first email, I have already said quote "tested on a MPC8xx
processor"  and I am using 2.4.24 which is nothing close to the
bleeding edge so I reckon the ABI is fairly standard.

Do you have any experience with the PowerPC arch?  

> But you're apparently right about a bug being present in PPC assembler.

So you are saying there is a bug in the GCC assembler?  How confident
are you in that?  Is the first correct step to examine the assembly
code for errors before jumping to any conclusion that the GCC
assembler is bad?

> >>This is a rewrite of the bn_div_words routine for the PowerPC arch,
> >>tested on a MPC8xx processor.
> 
> Well, suggested routine apparently sends ssh-keygen on the PPC-based
> 32-bit system I have access to to an end-less loop... 

If you care to read the c function I supplied or if you don't believe
it:  If you understand ppc 32-bit instructions, as specified in the
PowerPC Microprocessor Family: Programming Environments for 32-Bit
Microprocessors.  My routine would not be able to find a condition
that will make it go into an end-less loop,unless you messed up bad
somewhere.

> And (cd test; make
> test_bn) fails early in BN_sqr... And test/exptest fails miserably with
> "bad reciprocal"...

I don't know what you did but (make test_bn) works for me.  So I
question how diligently you are in setting up the tests.  If you are
busy, please get one of your ppc friends to do it.

> >>I initially thought there is maybe a small mistake in the code that
> >>requires a one-liner change
> 
> But apparently this appears to be the case! Please verify following:
> 
> --- crypto/bn/asm/ppc.pl.orig        2004-04-28 00:05:50.000000000 +0200
> +++ crypto/bn/asm/ppc.pl      2005-07-01 18:58:21.105656512 +0200
> @@ -1717,7 +1717,7 @@
>          li      r9,1                    # r9=1
>          $SHL    r10,r9,r8               # r9<<=r8
>          $UCMP   0,r3,r10                #
> -       bc      BO_IF,CR0_GT,Lppcasm_div2       #or if (h > (1<<r8))
> +       bc      BO_IF_NOT,CR0_GT,Lppcasm_div2   #or if (h > (1<<r8))
>          $UDIV   r3,r3,r0                #if not assert(0) divide by 0!
>                                          #that's how we signal overflow
>          bclr    BO_ALWAYS,CR0_LT        #return. NEVER REACHED.
> 

You don't think I have gone in and fix that before realizing it's
worse than that?  I am sure you didn't think I spend 3 days out in the
beach and somehow come up with an idea of clobbering some useful
routine because I think my routine is better?

In summary, what I am trying to provide the community is an
alternative to do something, the current implementation of which is
very questionable.  You might resist change which is understandable. 
But I were a diligent maintainer and I realize something is broken in
my code, then I will put the time to investigate the problem.  If I
don't have the time, I will delagate it to a friend.  If I don't have
a friend with that expertise then, I will try to make friends with the
reporter of the bug since he will most likely have spent hours fixing
the problem.

Regards,
David Ho.

[-- Attachment #2: test_bn.txt --]
[-- Type: text/plain, Size: 3397 bytes --]

bash-2.05b# make test_bn
starting big number library test, could take a while...
test BN_add
test BN_sub
test BN_lshift1
test BN_lshift (fixed)
test BN_lshift
test BN_rshift1
test BN_rshift
test BN_sqr
test BN_mul
test BN_div
test BN_div_recp
test BN_mod
test BN_mod_mul
test BN_mont
test BN_mod_exp
test BN_exp
test BN_kronecker
...++++++
....................................................................................................
test BN_mod_sqrt
.....
.....
.....
.....
.....
.....
.....
.....
.......++++++++++++
.....
......++++++++++++
.....
...++++++++++++
.....
..++++++++++++
.....
...++++++++++++
.....
........++++++++++++
.....
.................++++++++++++
.....
...........++++++++++++
.....
running bc

verify BN_add....................................................................................................
verify BN_sub......................................................................................................................................................
verify BN_lshift1....................................................................................................
verify BN_lshift (fixed)....................................................................................................
verify BN_lshift....................................................................................................
verify BN_rshift1....................................................................................................
verify BN_rshift....................................................................................................
verify BN_sqr....................................................................................................
verify BN_mul......................................................................................................................................................
verify BN_div............................................................................................................................................................................................................................................................................................................
verify BN_div_recp............................................................................................................................................................................................................................................................................................................
verify BN_mod....................................................................................................
verify BN_mod_mul............................................................................................................................................................................................................................................................................................................
verify BN_mont.....
verify BN_mod_exp.....
verify BN_exp.....
verify BN_kronecker
verify BN_mod_sqrt
2015 tests passed
test a^b%c implementations
../util/shlib_wrap.sh ./exptest
........................................................................................................................................................................................................ done

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-04 14:35     ` David Ho
@ 2005-07-05 15:00       ` Andy Polyakov
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Polyakov @ 2005-07-05 15:00 UTC (permalink / raw)
  To: openssl-dev; +Cc: linuxppc-embedded

> Let's start the week off with less hostility and more productive
> criticism on this topic.

If you want productivity, then provide real evidence in form of stack 
backtrace at segmentation violation point, disassemble output in the 
vicinity of segmentation violation point and 'info registers' output at 
the same point. As for hostility I leave it without comment, as you're 
apparently can outrank anybody in that area:-)

>>But you're apparently right about a bug being present in PPC assembler.
> 
> 
> So you are saying there is a bug in the GCC assembler? How confident
> are you in that?  Is the first correct step to examine the assembly
> code for errors before jumping to any conclusion that the GCC
> assembler is bad?

Did I say GCC assembler? I said PPC assembler, which refers to 
crypto/bn/asm/ppc.pl.

>>>>This is a rewrite of the bn_div_words routine for the PowerPC arch,
>>>>tested on a MPC8xx processor.
>>
>>Well, suggested routine apparently sends ssh-keygen on the PPC-based
>>32-bit system I have access to to an end-less loop... 
> 
> 
> If you care to read the c function I supplied or if you don't believe
> it:  If you understand ppc 32-bit instructions, as specified in the
> PowerPC Microprocessor Family: Programming Environments for 32-Bit
> Microprocessors.  My routine would not be able to find a condition
> that will make it go into an end-less loop,unless you messed up bad
> somewhere.

I didn't say that suggested routine goes into an end-less loop, but that 
it "*apparently* sends ssh-keygen into end-less loop." I made no claims 
about which routine exactly loops, and I even admit that I don't know 
for sure if it was in fact end-less loop, because I've chosen to kill 
the process after couple of minutes. Note that normally it takes just 
few seconds on the machine I've tested on, so that couple of minutes is 
essentially unacceptable and by all means *appears* as end-less loop.

> In summary, what I am trying to provide the community is an
> alternative to ... the current implementation of which is
> very questionable.

crypto/bn/asm/ppc.pl distributed with OpenSSL was designed for and 
explicitly tested by IBM under 32- and 64-bit PPC Linux, 32- and 64-bit 
AIX, as well as 32-bit MacOS X. Special care was taken to make sure that 
neither of ABIs/calling conventions used by above listed platforms are 
violated, so that module can be safely invoked by compiler-generated 
code for above mentioned OSes. Afterwards there were reports that it was 
successfully used on unspecified [in report] embedded PPC-based 
platform. Despite this on Friday I could personally confirm on 32-bit 
MacOS X that there admittedly was a bug in ppc.pl, which manifests as 
failure to generate sane decimal ASCII presentation of a BIGNUM, which 
is exactly the kind of operation taking place when you run ssh-keygen -t 
rsa1 [it should be noted decimal ASCII is unfortunately not covered by 
'make test_bn' suite]. But under no circumstances segmentation violation 
was observed. At the same time I could personally confirm that if pasted 
into osx32_ppc.s, suggested implementation induces 'make test_bn' 
failure on 32-bit MacOS X. In particular test/bntest terminates with

print "test BN_sqr\n"
-FF554CAEAE * -FF554CAEAE - FEAB0B30019BBA80FE44
Square test failed!
1

while test/exptest:

BN_mod_exp_recp() problems
14482:error:03082065:bignum routines:BN_div_recp:bad 
reciprocal:bn_recp.c:194:

For me it's enough reasons to become sceptical to submission and conduct 
trouble-shooting of my own. Currently available ppc.pl was verified to 
pass 'make test_bn' on 32-bit MacOS X, 32- and 64-bit AIX [tested by 
IBM], as well as to generate correct decimal ASCII presentation on the 
mentioned platforms. If it doesn't work for you, then submit information 
listed in the beginning of the letter. A.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Fwd: PPC bn_div_words routine rewrite
       [not found] ` <4dd15d1805070508312427a0ba@mail.gmail.com>
@ 2005-07-05 15:45   ` David Ho
  2005-07-05 16:36     ` David Ho
  0 siblings, 1 reply; 11+ messages in thread
From: David Ho @ 2005-07-05 15:45 UTC (permalink / raw)
  To: appro, linuxppc-embedded

This is the second confirmed report of the same problem on the ppc8xx.

After reading my email.  I must say I was the unfriendly one, I
apologize for that.

More debugging evidence to come.

---------- Forwarded message ----------
From: Murch, Christopher <cmurch@mrv.com>
Date: Jul 1, 2005 9:46 AM
Subject: RE: PPC bn_div_words routine rewrite
To: David Ho <davidkwho@gmail.com>


David,
I had observed the same issue on ppc 8xx machines after upgrading to the as=
m
version of the BN routines.  Thank you very much for your work for the fix.
My question is, do you have high confidence in the other new asm ppc BN
routines after observing this issue or do you think they might have similia=
r
problems?
Thanks.
Chris

-----Original Message-----
From: David Ho [mailto:davidkwho@gmail.com]
Sent: Thursday, June 30, 2005 6:22 PM
To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org
Subject: Re: PPC bn_div_words routine rewrite


The reason I had to redo this routine, in case anyone is wondering, is
because ssh-keygen  segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

The program craps out when it tries to write the public key in ascii
decimal.

Regards,
David

On 6/30/05, David Ho <davidkwho@gmail.com> wrote:
> Hi all,
>
> This is a rewrite of the bn_div_words routine for the PowerPC arch,
> tested on a MPC8xx processor.
> I initially thought there is maybe a small mistake in the code that
> requires a one-liner change but it turns out I have to redo the
> routine.
> I guess this routine is not called very often as I see that most other
> routines are hand-crafted, whereas this routine is compiled from a C
> function that apparently has not gone through a whole lot of testing.
>
> I wrote a C function to confirm correctness of the code.
>
> unsigned long div_words (unsigned long h,
>                          unsigned long l,
>                          unsigned long d)
> {
>   unsigned long i_h; /* intermediate dividend */
>   unsigned long i_q; /* quotient of i_h/d */
>   unsigned long i_r; /* remainder of i_h/d */
>
>   unsigned long i_cntr;
>   unsigned long i_carry;
>
>   unsigned long ret_q; /* return quotient */
>
>   /* cannot divide by zero */
>   if (d =3D=3D 0) return 0xffffffff;
>
>   /* do simple 32-bit divide */
>   if (h =3D=3D 0) return l/d;
>
>   i_q =3D h/d;
>   i_r =3D h - (i_q*d);
>   ret_q =3D i_q;
>
>   i_cntr =3D 32;
>
>   while (i_cntr--)
>   {
>     i_carry =3D (l & 0x80000000) ? 1:0;
>     l =3D l << 1;
>
>     i_h =3D (i_r << 1) | i_carry;
>     i_q =3D i_h/d;
>     i_r =3D i_h - (i_q*d);
>
>     ret_q =3D (ret_q << 1) | i_q;
>   }
>
>   return ret_q;
> }
>
>
> Then I handcrafted the routine in PPC assembly.
> The result is a 26 line assembly that is easy to understand and
> predictable as opposed to a 81liner that I am still trying to
> decipher...
> If anyone is interested in incorporating this routine to the openssl
> code I'll be happy to assist.
> At this point I think I will be taking a bit of a break from this 3
> day debugging/fixing marathon.
>
> Regards,
> David Ho
>
>
> #
> #       Handcrafted version of bn_div_words
> #
> #       r3 =3D h
> #       r4 =3D l
> #       r5 =3D d
>
>         cmplwi  0,r5,0                  # compare r5 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=3D0
>         li      r3,-1                   # d=3D0 return -1
>         bclr    BO_ALWAYS,CR0_LT
> .Lppcasm_div1:
>         cmplwi  0,r3,0                  # compare r3 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h !=3D 0
>         divwu   r3,r4,r5                # ret_q =3D l/d
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> .Lppcasm_div2:
>         divwu   r9,r3,r5                # i_q =3D h/d
>         mullw   r10,r9,r5               # i_r =3D h - (i_q*d)
>         subf    r10,r10,r3
>         mr      r3,r9                   # req_q =3D i_q
> .Lppcasm_set_ctr:
>         li      r12,32                  # ctr =3D bitsizeof(d)
>         mtctr   r12
> .Lppcasm_div_loop:
>         addc    r4,r4,r4                # l =3D l << 1 -> i_carry
>         adde    r11,r10,r10             # i_h =3D (i_r << 1) | i_carry
>         divwu   r9,r11,r5               # i_q =3D i_h/d
>         mullw   r10,r9,r5               # i_r =3D i_h - (i_q*d)
>         subf    r10,r10,r11
>         add     r3,r3,r3                # ret_q =3D ret_q << 1 | i_q
>         add     r3,r3,r9
>         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> .Lppc_div_end:
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
>         .long   0x00000000
>
_______________________________________________
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-05 15:45   ` Fwd: PPC bn_div_words routine rewrite David Ho
@ 2005-07-05 16:36     ` David Ho
  2005-07-05 17:01       ` David Ho
  0 siblings, 1 reply; 11+ messages in thread
From: David Ho @ 2005-07-05 16:36 UTC (permalink / raw)
  To: appro, linuxppc-embedded, openssl-dev

First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
with following arguments.

(gdb) show args
Argument list to give program being debugged when it is started is
    "-t rsa1 -f /etc/ssh/ssh_host_key -N """.

Program received signal SIGSEGV, Segmentation fault.
BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136
136                             *lp=3DBN_div_word(t,BN_DEC_CONV);

(gdb) i r
r0             0x0      0
r1             0x7fffd580       2147472768
r2             0x30012868       805382248
r3             0x80000000       2147483648
r4             0xfef33fc        267334652
r5             0x25     37
r6             0xfccdef8        265084664
r7             0x7fffd4c0       2147472576
r8             0xfbad2887       4222429319
r9             0x84044022       2214871074
r10            0x0      0
r11            0x2      2
r12            0xfef2054        267329620
r13            0x10030bc8       268635080
r14            0x0      0
r15            0x0      0
r16            0x0      0
r17            0x0      0
r18            0x0      0
r19            0x0      0
r20            0x0      0
r21            0x0      0
r22            0x0      0
r23            0x64     100
r24            0x5      5
r25            0x1002d438       268620856
r26            0x1002d9f0       268622320
r27            0x1002c578       268617080
r28            0x1      1
r29            0x10031000       268636160
r30            0xffbf7d0        268171216
r31            0x1002d9f0       268622320
pc             0xfef2058        267329624
ps             0xd032   53298
cr             0x24044022       604258338
lr             0xfef2054        267329620
ctr            0xfccefa0        265088928
xer            0x20000000       536870912
fpscr          0x0      0
vscr           0x0      0
vrsave         0x0      0

(gdb) p/x $pc
$1 =3D 0xfef2058

0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)

(gdb) x 0x10031000
0x10031000:     Cannot access memory at address 0x10031000










On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> This is the second confirmed report of the same problem on the ppc8xx.
>=20
> After reading my email.  I must say I was the unfriendly one, I
> apologize for that.
>=20
> More debugging evidence to come.
>=20
> ---------- Forwarded message ----------
> From: Murch, Christopher <cmurch@mrv.com>
> Date: Jul 1, 2005 9:46 AM
> Subject: RE: PPC bn_div_words routine rewrite
> To: David Ho <davidkwho@gmail.com>
>=20
>=20
> David,
> I had observed the same issue on ppc 8xx machines after upgrading to the =
asm
> version of the BN routines.  Thank you very much for your work for the fi=
x.
> My question is, do you have high confidence in the other new asm ppc BN
> routines after observing this issue or do you think they might have simil=
iar
> problems?
> Thanks.
> Chris
>=20
> -----Original Message-----
> From: David Ho [mailto:davidkwho@gmail.com]
> Sent: Thursday, June 30, 2005 6:22 PM
> To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org
> Subject: Re: PPC bn_div_words routine rewrite
>=20
>=20
> The reason I had to redo this routine, in case anyone is wondering, is
> because ssh-keygen  segfaults when this assembly routine returns junk
> to the BN_div_word function. On a ppc, if you issue the command
>=20
> ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
>=20
> The program craps out when it tries to write the public key in ascii
> decimal.
>=20
> Regards,
> David
>=20
> On 6/30/05, David Ho <davidkwho@gmail.com> wrote:
> > Hi all,
> >
> > This is a rewrite of the bn_div_words routine for the PowerPC arch,
> > tested on a MPC8xx processor.
> > I initially thought there is maybe a small mistake in the code that
> > requires a one-liner change but it turns out I have to redo the
> > routine.
> > I guess this routine is not called very often as I see that most other
> > routines are hand-crafted, whereas this routine is compiled from a C
> > function that apparently has not gone through a whole lot of testing.
> >
> > I wrote a C function to confirm correctness of the code.
> >
> > unsigned long div_words (unsigned long h,
> >                          unsigned long l,
> >                          unsigned long d)
> > {
> >   unsigned long i_h; /* intermediate dividend */
> >   unsigned long i_q; /* quotient of i_h/d */
> >   unsigned long i_r; /* remainder of i_h/d */
> >
> >   unsigned long i_cntr;
> >   unsigned long i_carry;
> >
> >   unsigned long ret_q; /* return quotient */
> >
> >   /* cannot divide by zero */
> >   if (d =3D=3D 0) return 0xffffffff;
> >
> >   /* do simple 32-bit divide */
> >   if (h =3D=3D 0) return l/d;
> >
> >   i_q =3D h/d;
> >   i_r =3D h - (i_q*d);
> >   ret_q =3D i_q;
> >
> >   i_cntr =3D 32;
> >
> >   while (i_cntr--)
> >   {
> >     i_carry =3D (l & 0x80000000) ? 1:0;
> >     l =3D l << 1;
> >
> >     i_h =3D (i_r << 1) | i_carry;
> >     i_q =3D i_h/d;
> >     i_r =3D i_h - (i_q*d);
> >
> >     ret_q =3D (ret_q << 1) | i_q;
> >   }
> >
> >   return ret_q;
> > }
> >
> >
> > Then I handcrafted the routine in PPC assembly.
> > The result is a 26 line assembly that is easy to understand and
> > predictable as opposed to a 81liner that I am still trying to
> > decipher...
> > If anyone is interested in incorporating this routine to the openssl
> > code I'll be happy to assist.
> > At this point I think I will be taking a bit of a break from this 3
> > day debugging/fixing marathon.
> >
> > Regards,
> > David Ho
> >
> >
> > #
> > #       Handcrafted version of bn_div_words
> > #
> > #       r3 =3D h
> > #       r4 =3D l
> > #       r5 =3D d
> >
> >         cmplwi  0,r5,0                  # compare r5 and 0
> >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=3D0
> >         li      r3,-1                   # d=3D0 return -1
> >         bclr    BO_ALWAYS,CR0_LT
> > .Lppcasm_div1:
> >         cmplwi  0,r3,0                  # compare r3 and 0
> >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h !=3D 0
> >         divwu   r3,r4,r5                # ret_q =3D l/d
> >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > .Lppcasm_div2:
> >         divwu   r9,r3,r5                # i_q =3D h/d
> >         mullw   r10,r9,r5               # i_r =3D h - (i_q*d)
> >         subf    r10,r10,r3
> >         mr      r3,r9                   # req_q =3D i_q
> > .Lppcasm_set_ctr:
> >         li      r12,32                  # ctr =3D bitsizeof(d)
> >         mtctr   r12
> > .Lppcasm_div_loop:
> >         addc    r4,r4,r4                # l =3D l << 1 -> i_carry
> >         adde    r11,r10,r10             # i_h =3D (i_r << 1) | i_carry
> >         divwu   r9,r11,r5               # i_q =3D i_h/d
> >         mullw   r10,r9,r5               # i_r =3D i_h - (i_q*d)
> >         subf    r10,r10,r11
> >         add     r3,r3,r3                # ret_q =3D ret_q << 1 | i_q
> >         add     r3,r3,r9
> >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > .Lppc_div_end:
> >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> >         .long   0x00000000
> >
> _______________________________________________
> Linuxppc-embedded mailing list
> Linuxppc-embedded@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-05 16:36     ` David Ho
@ 2005-07-05 17:01       ` David Ho
  2005-07-05 20:21         ` David Ho
  0 siblings, 1 reply; 11+ messages in thread
From: David Ho @ 2005-07-05 17:01 UTC (permalink / raw)
  To: appro, linuxppc-embedded, openssl-dev

I can tell you with certainty, with reference to the function
BN_bn2dec, that since lp is a pointer, and within the while loop
around bn_print.c:136 lp is being incremented.  Because the test
BN_is_zero(t) is always false, you have a pointer that is going off
into the stratosphere, hence the segfault on ppc8xx.

More analysis to come.

On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
> with following arguments.
>=20
> (gdb) show args
> Argument list to give program being debugged when it is started is
>     "-t rsa1 -f /etc/ssh/ssh_host_key -N """.
>=20
> Program received signal SIGSEGV, Segmentation fault.
> BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136
> 136                             *lp=3DBN_div_word(t,BN_DEC_CONV);
>=20
> (gdb) i r
> r0             0x0      0
> r1             0x7fffd580       2147472768
> r2             0x30012868       805382248
> r3             0x80000000       2147483648
> r4             0xfef33fc        267334652
> r5             0x25     37
> r6             0xfccdef8        265084664
> r7             0x7fffd4c0       2147472576
> r8             0xfbad2887       4222429319
> r9             0x84044022       2214871074
> r10            0x0      0
> r11            0x2      2
> r12            0xfef2054        267329620
> r13            0x10030bc8       268635080
> r14            0x0      0
> r15            0x0      0
> r16            0x0      0
> r17            0x0      0
> r18            0x0      0
> r19            0x0      0
> r20            0x0      0
> r21            0x0      0
> r22            0x0      0
> r23            0x64     100
> r24            0x5      5
> r25            0x1002d438       268620856
> r26            0x1002d9f0       268622320
> r27            0x1002c578       268617080
> r28            0x1      1
> r29            0x10031000       268636160
> r30            0xffbf7d0        268171216
> r31            0x1002d9f0       268622320
> pc             0xfef2058        267329624
> ps             0xd032   53298
> cr             0x24044022       604258338
> lr             0xfef2054        267329620
> ctr            0xfccefa0        265088928
> xer            0x20000000       536870912
> fpscr          0x0      0
> vscr           0x0      0
> vrsave         0x0      0
>=20
> (gdb) p/x $pc
> $1 =3D 0xfef2058
>=20
> 0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)
>=20
> (gdb) x 0x10031000
> 0x10031000:     Cannot access memory at address 0x10031000
>=20
>=20
>=20
>=20
>=20
>=20
>=20
>=20
>=20
>=20
> On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> > This is the second confirmed report of the same problem on the ppc8xx.
> >
> > After reading my email.  I must say I was the unfriendly one, I
> > apologize for that.
> >
> > More debugging evidence to come.
> >
> > ---------- Forwarded message ----------
> > From: Murch, Christopher <cmurch@mrv.com>
> > Date: Jul 1, 2005 9:46 AM
> > Subject: RE: PPC bn_div_words routine rewrite
> > To: David Ho <davidkwho@gmail.com>
> >
> >
> > David,
> > I had observed the same issue on ppc 8xx machines after upgrading to th=
e asm
> > version of the BN routines.  Thank you very much for your work for the =
fix.
> > My question is, do you have high confidence in the other new asm ppc BN
> > routines after observing this issue or do you think they might have sim=
iliar
> > problems?
> > Thanks.
> > Chris
> >
> > -----Original Message-----
> > From: David Ho [mailto:davidkwho@gmail.com]
> > Sent: Thursday, June 30, 2005 6:22 PM
> > To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org
> > Subject: Re: PPC bn_div_words routine rewrite
> >
> >
> > The reason I had to redo this routine, in case anyone is wondering, is
> > because ssh-keygen  segfaults when this assembly routine returns junk
> > to the BN_div_word function. On a ppc, if you issue the command
> >
> > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
> >
> > The program craps out when it tries to write the public key in ascii
> > decimal.
> >
> > Regards,
> > David
> >
> > On 6/30/05, David Ho <davidkwho@gmail.com> wrote:
> > > Hi all,
> > >
> > > This is a rewrite of the bn_div_words routine for the PowerPC arch,
> > > tested on a MPC8xx processor.
> > > I initially thought there is maybe a small mistake in the code that
> > > requires a one-liner change but it turns out I have to redo the
> > > routine.
> > > I guess this routine is not called very often as I see that most othe=
r
> > > routines are hand-crafted, whereas this routine is compiled from a C
> > > function that apparently has not gone through a whole lot of testing.
> > >
> > > I wrote a C function to confirm correctness of the code.
> > >
> > > unsigned long div_words (unsigned long h,
> > >                          unsigned long l,
> > >                          unsigned long d)
> > > {
> > >   unsigned long i_h; /* intermediate dividend */
> > >   unsigned long i_q; /* quotient of i_h/d */
> > >   unsigned long i_r; /* remainder of i_h/d */
> > >
> > >   unsigned long i_cntr;
> > >   unsigned long i_carry;
> > >
> > >   unsigned long ret_q; /* return quotient */
> > >
> > >   /* cannot divide by zero */
> > >   if (d =3D=3D 0) return 0xffffffff;
> > >
> > >   /* do simple 32-bit divide */
> > >   if (h =3D=3D 0) return l/d;
> > >
> > >   i_q =3D h/d;
> > >   i_r =3D h - (i_q*d);
> > >   ret_q =3D i_q;
> > >
> > >   i_cntr =3D 32;
> > >
> > >   while (i_cntr--)
> > >   {
> > >     i_carry =3D (l & 0x80000000) ? 1:0;
> > >     l =3D l << 1;
> > >
> > >     i_h =3D (i_r << 1) | i_carry;
> > >     i_q =3D i_h/d;
> > >     i_r =3D i_h - (i_q*d);
> > >
> > >     ret_q =3D (ret_q << 1) | i_q;
> > >   }
> > >
> > >   return ret_q;
> > > }
> > >
> > >
> > > Then I handcrafted the routine in PPC assembly.
> > > The result is a 26 line assembly that is easy to understand and
> > > predictable as opposed to a 81liner that I am still trying to
> > > decipher...
> > > If anyone is interested in incorporating this routine to the openssl
> > > code I'll be happy to assist.
> > > At this point I think I will be taking a bit of a break from this 3
> > > day debugging/fixing marathon.
> > >
> > > Regards,
> > > David Ho
> > >
> > >
> > > #
> > > #       Handcrafted version of bn_div_words
> > > #
> > > #       r3 =3D h
> > > #       r4 =3D l
> > > #       r5 =3D d
> > >
> > >         cmplwi  0,r5,0                  # compare r5 and 0
> > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=3D0
> > >         li      r3,-1                   # d=3D0 return -1
> > >         bclr    BO_ALWAYS,CR0_LT
> > > .Lppcasm_div1:
> > >         cmplwi  0,r3,0                  # compare r3 and 0
> > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h !=3D 0
> > >         divwu   r3,r4,r5                # ret_q =3D l/d
> > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > .Lppcasm_div2:
> > >         divwu   r9,r3,r5                # i_q =3D h/d
> > >         mullw   r10,r9,r5               # i_r =3D h - (i_q*d)
> > >         subf    r10,r10,r3
> > >         mr      r3,r9                   # req_q =3D i_q
> > > .Lppcasm_set_ctr:
> > >         li      r12,32                  # ctr =3D bitsizeof(d)
> > >         mtctr   r12
> > > .Lppcasm_div_loop:
> > >         addc    r4,r4,r4                # l =3D l << 1 -> i_carry
> > >         adde    r11,r10,r10             # i_h =3D (i_r << 1) | i_carr=
y
> > >         divwu   r9,r11,r5               # i_q =3D i_h/d
> > >         mullw   r10,r9,r5               # i_r =3D i_h - (i_q*d)
> > >         subf    r10,r10,r11
> > >         add     r3,r3,r3                # ret_q =3D ret_q << 1 | i_q
> > >         add     r3,r3,r9
> > >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > > .Lppc_div_end:
> > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > >         .long   0x00000000
> > >
> > _______________________________________________
> > Linuxppc-embedded mailing list
> > Linuxppc-embedded@ozlabs.org
> > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> >
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-05 17:01       ` David Ho
@ 2005-07-05 20:21         ` David Ho
  2005-07-05 21:22           ` Andy Polyakov
  2005-07-05 21:25           ` David Ho
  0 siblings, 2 replies; 11+ messages in thread
From: David Ho @ 2005-07-05 20:21 UTC (permalink / raw)
  To: appro, linuxppc-embedded, openssl-dev

Let's take first call to BN_div_word for example from BN_bn2dec, the
parameter being passed to BN_div_word is (a=3D35, w=3D1000000000) (decimal
numbers).  It then calls the bn_div_words with (h=3D0, l=3D35,
d=3D1000000000)  if you examine the code in linux_ppc32.s it will exit
early on because h is 0.  the routine returns a divide by 0, which is
undefined according to the manual.  In the case of ppc8xx the result
is 0x80000000.  So this is the return value from bn_div_words, as seen
in register R3.

So what happens next is BN_div_word modifies "a" (1st parameter) with
the result (0x80000000) and returns 23 as the remainder of the
division. So "a" is never zero as a result and hence the test for
BN_is_zero is always false.  The problem fails the very first time it
uses bn_div_words.

The next thing I did naturally was to fix the case when you have h=3D0,
which you can quite easy do it with the native divwu instruction.  Lo
and behold I was once again disappointed when h is not equal to 0.

More to come...


On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> I can tell you with certainty, with reference to the function
> BN_bn2dec, that since lp is a pointer, and within the while loop
> around bn_print.c:136 lp is being incremented.  Because the test
> BN_is_zero(t) is always false, you have a pointer that is going off
> into the stratosphere, hence the segfault on ppc8xx.
>=20
> More analysis to come.
>=20
> On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> > First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
> > with following arguments.
> >
> > (gdb) show args
> > Argument list to give program being debugged when it is started is
> >     "-t rsa1 -f /etc/ssh/ssh_host_key -N """.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136
> > 136                             *lp=3DBN_div_word(t,BN_DEC_CONV);
> >
> > (gdb) i r
> > r0             0x0      0
> > r1             0x7fffd580       2147472768
> > r2             0x30012868       805382248
> > r3             0x80000000       2147483648
> > r4             0xfef33fc        267334652
> > r5             0x25     37
> > r6             0xfccdef8        265084664
> > r7             0x7fffd4c0       2147472576
> > r8             0xfbad2887       4222429319
> > r9             0x84044022       2214871074
> > r10            0x0      0
> > r11            0x2      2
> > r12            0xfef2054        267329620
> > r13            0x10030bc8       268635080
> > r14            0x0      0
> > r15            0x0      0
> > r16            0x0      0
> > r17            0x0      0
> > r18            0x0      0
> > r19            0x0      0
> > r20            0x0      0
> > r21            0x0      0
> > r22            0x0      0
> > r23            0x64     100
> > r24            0x5      5
> > r25            0x1002d438       268620856
> > r26            0x1002d9f0       268622320
> > r27            0x1002c578       268617080
> > r28            0x1      1
> > r29            0x10031000       268636160
> > r30            0xffbf7d0        268171216
> > r31            0x1002d9f0       268622320
> > pc             0xfef2058        267329624
> > ps             0xd032   53298
> > cr             0x24044022       604258338
> > lr             0xfef2054        267329620
> > ctr            0xfccefa0        265088928
> > xer            0x20000000       536870912
> > fpscr          0x0      0
> > vscr           0x0      0
> > vrsave         0x0      0
> >
> > (gdb) p/x $pc
> > $1 =3D 0xfef2058
> >
> > 0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)
> >
> > (gdb) x 0x10031000
> > 0x10031000:     Cannot access memory at address 0x10031000
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> > > This is the second confirmed report of the same problem on the ppc8xx=
.
> > >
> > > After reading my email.  I must say I was the unfriendly one, I
> > > apologize for that.
> > >
> > > More debugging evidence to come.
> > >
> > > ---------- Forwarded message ----------
> > > From: Murch, Christopher <cmurch@mrv.com>
> > > Date: Jul 1, 2005 9:46 AM
> > > Subject: RE: PPC bn_div_words routine rewrite
> > > To: David Ho <davidkwho@gmail.com>
> > >
> > >
> > > David,
> > > I had observed the same issue on ppc 8xx machines after upgrading to =
the asm
> > > version of the BN routines.  Thank you very much for your work for th=
e fix.
> > > My question is, do you have high confidence in the other new asm ppc =
BN
> > > routines after observing this issue or do you think they might have s=
imiliar
> > > problems?
> > > Thanks.
> > > Chris
> > >
> > > -----Original Message-----
> > > From: David Ho [mailto:davidkwho@gmail.com]
> > > Sent: Thursday, June 30, 2005 6:22 PM
> > > To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org
> > > Subject: Re: PPC bn_div_words routine rewrite
> > >
> > >
> > > The reason I had to redo this routine, in case anyone is wondering, i=
s
> > > because ssh-keygen  segfaults when this assembly routine returns junk
> > > to the BN_div_word function. On a ppc, if you issue the command
> > >
> > > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
> > >
> > > The program craps out when it tries to write the public key in ascii
> > > decimal.
> > >
> > > Regards,
> > > David
> > >
> > > On 6/30/05, David Ho <davidkwho@gmail.com> wrote:
> > > > Hi all,
> > > >
> > > > This is a rewrite of the bn_div_words routine for the PowerPC arch,
> > > > tested on a MPC8xx processor.
> > > > I initially thought there is maybe a small mistake in the code that
> > > > requires a one-liner change but it turns out I have to redo the
> > > > routine.
> > > > I guess this routine is not called very often as I see that most ot=
her
> > > > routines are hand-crafted, whereas this routine is compiled from a =
C
> > > > function that apparently has not gone through a whole lot of testin=
g.
> > > >
> > > > I wrote a C function to confirm correctness of the code.
> > > >
> > > > unsigned long div_words (unsigned long h,
> > > >                          unsigned long l,
> > > >                          unsigned long d)
> > > > {
> > > >   unsigned long i_h; /* intermediate dividend */
> > > >   unsigned long i_q; /* quotient of i_h/d */
> > > >   unsigned long i_r; /* remainder of i_h/d */
> > > >
> > > >   unsigned long i_cntr;
> > > >   unsigned long i_carry;
> > > >
> > > >   unsigned long ret_q; /* return quotient */
> > > >
> > > >   /* cannot divide by zero */
> > > >   if (d =3D=3D 0) return 0xffffffff;
> > > >
> > > >   /* do simple 32-bit divide */
> > > >   if (h =3D=3D 0) return l/d;
> > > >
> > > >   i_q =3D h/d;
> > > >   i_r =3D h - (i_q*d);
> > > >   ret_q =3D i_q;
> > > >
> > > >   i_cntr =3D 32;
> > > >
> > > >   while (i_cntr--)
> > > >   {
> > > >     i_carry =3D (l & 0x80000000) ? 1:0;
> > > >     l =3D l << 1;
> > > >
> > > >     i_h =3D (i_r << 1) | i_carry;
> > > >     i_q =3D i_h/d;
> > > >     i_r =3D i_h - (i_q*d);
> > > >
> > > >     ret_q =3D (ret_q << 1) | i_q;
> > > >   }
> > > >
> > > >   return ret_q;
> > > > }
> > > >
> > > >
> > > > Then I handcrafted the routine in PPC assembly.
> > > > The result is a 26 line assembly that is easy to understand and
> > > > predictable as opposed to a 81liner that I am still trying to
> > > > decipher...
> > > > If anyone is interested in incorporating this routine to the openss=
l
> > > > code I'll be happy to assist.
> > > > At this point I think I will be taking a bit of a break from this 3
> > > > day debugging/fixing marathon.
> > > >
> > > > Regards,
> > > > David Ho
> > > >
> > > >
> > > > #
> > > > #       Handcrafted version of bn_div_words
> > > > #
> > > > #       r3 =3D h
> > > > #       r4 =3D l
> > > > #       r5 =3D d
> > > >
> > > >         cmplwi  0,r5,0                  # compare r5 and 0
> > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=3D0
> > > >         li      r3,-1                   # d=3D0 return -1
> > > >         bclr    BO_ALWAYS,CR0_LT
> > > > .Lppcasm_div1:
> > > >         cmplwi  0,r3,0                  # compare r3 and 0
> > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h !=3D=
 0
> > > >         divwu   r3,r4,r5                # ret_q =3D l/d
> > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > > .Lppcasm_div2:
> > > >         divwu   r9,r3,r5                # i_q =3D h/d
> > > >         mullw   r10,r9,r5               # i_r =3D h - (i_q*d)
> > > >         subf    r10,r10,r3
> > > >         mr      r3,r9                   # req_q =3D i_q
> > > > .Lppcasm_set_ctr:
> > > >         li      r12,32                  # ctr =3D bitsizeof(d)
> > > >         mtctr   r12
> > > > .Lppcasm_div_loop:
> > > >         addc    r4,r4,r4                # l =3D l << 1 -> i_carry
> > > >         adde    r11,r10,r10             # i_h =3D (i_r << 1) | i_ca=
rry
> > > >         divwu   r9,r11,r5               # i_q =3D i_h/d
> > > >         mullw   r10,r9,r5               # i_r =3D i_h - (i_q*d)
> > > >         subf    r10,r10,r11
> > > >         add     r3,r3,r3                # ret_q =3D ret_q << 1 | i_=
q
> > > >         add     r3,r3,r9
> > > >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > > > .Lppc_div_end:
> > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > >         .long   0x00000000
> > > >
> > > _______________________________________________
> > > Linuxppc-embedded mailing list
> > > Linuxppc-embedded@ozlabs.org
> > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> > >
> >
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-05 20:21         ` David Ho
@ 2005-07-05 21:22           ` Andy Polyakov
  2005-07-05 21:25           ` David Ho
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Polyakov @ 2005-07-05 21:22 UTC (permalink / raw)
  To: openssl-dev; +Cc: linuxppc-embedded

> Let's take first call to BN_div_word for example from BN_bn2dec, the
> parameter being passed to BN_div_word is (a=35, w=1000000000) (decimal
> numbers).  It then calls the bn_div_words with (h=0, l=35,
> d=1000000000)  if you examine the code in linux_ppc32.s it will exit
> early on because h is 0.  the routine returns a divide by 0,  which is
> undefined according to the manual.  In the case of ppc8xx the result
> is 0x80000000.

And on the PPC machine I have access to it returns 0. This is 
explanation why I never experienced any SEGV, but a sparse decimal 
output. And it does explain why BN_is_zero condition never met on your 
system and you hit sbrk(0) limit and suffer the penalty. However! Note 
that updated routine, 
http://cvs.openssl.org/getfile/openssl/crypto/bn/asm/ppc.pl?v=1.4 never 
issues divide by 0 [it traps instead, but the condition is never met now 
when called from BN_div_words] and it does return correct answer to me. 
Can you really confirm that updated subroutine doesn't work for you? And 
if so, how does problem manifest? Still SEGV? At same point?

It should pointed out that bug in ppc.pl is encountered only in 0.9.7 
context, as 0.9.8 avoids it by normalizing divisor [and adjusting 
dividend accordingly]. BTW, I can confirm that 0.9.7 produces correct 
decimal ASCII with your routine [but no luck with make test_bn], but in 
0.9.8 context decimal printout comes out truncated [not sparse with some 
significant digits there and there, but truncated] if your routine is 
pasted in. A.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-05 20:21         ` David Ho
  2005-07-05 21:22           ` Andy Polyakov
@ 2005-07-05 21:25           ` David Ho
  2005-07-05 21:49             ` Andy Polyakov
  1 sibling, 1 reply; 11+ messages in thread
From: David Ho @ 2005-07-05 21:25 UTC (permalink / raw)
  To: appro, linuxppc-embedded, openssl-dev

Okay, having actually did what Andy suggested, i.e. the one liner fix
in the assembly code, bn_div_words returns the correct results.

At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
bn_div_words routine generated from crypto/bn/ppc.pl is still busted.=20
Your solution is either to use the one liner fix andy provided in his
reply or my routine.  My solution is a complete rewrite of the
routine, which is tested to work on ppc8xx.  I personally do not know
why it would not work on other ppc32 systems but if you have time to
spare you are welcome to give it a try on your ppc32, because it is by
far the most straight-forward algorithm for division.  If you have
done long division on paper from your primary school days, this
function mimics exactly the long division steps in binary (as opposed
to decimal).  Way easier to fix if you know what the algorithm is
trying to do.

Comments I have for the old routine are:

Why do you signal an overflow condition when it appears functions that
call bn_div_words do not check for overflow conditions?  That's why in
my routine I just let the register overflow.

Why the complexity, does it offer performance advantage?  Obviously
code size is huge compared to mine.

David










On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> Let's take first call to BN_div_word for example from BN_bn2dec, the
> parameter being passed to BN_div_word is (a=3D35, w=3D1000000000) (decima=
l
> numbers).  It then calls the bn_div_words with (h=3D0, l=3D35,
> d=3D1000000000)  if you examine the code in linux_ppc32.s it will exit
> early on because h is 0.  the routine returns a divide by 0, which is
> undefined according to the manual.  In the case of ppc8xx the result
> is 0x80000000.  So this is the return value from bn_div_words, as seen
> in register R3.
>=20
> So what happens next is BN_div_word modifies "a" (1st parameter) with
> the result (0x80000000) and returns 23 as the remainder of the
> division. So "a" is never zero as a result and hence the test for
> BN_is_zero is always false.  The problem fails the very first time it
> uses bn_div_words.
>=20
> The next thing I did naturally was to fix the case when you have h=3D0,
> which you can quite easy do it with the native divwu instruction.  Lo
> and behold I was once again disappointed when h is not equal to 0.
>=20
> More to come...
>=20
>=20
> On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> > I can tell you with certainty, with reference to the function
> > BN_bn2dec, that since lp is a pointer, and within the while loop
> > around bn_print.c:136 lp is being incremented.  Because the test
> > BN_is_zero(t) is always false, you have a pointer that is going off
> > into the stratosphere, hence the segfault on ppc8xx.
> >
> > More analysis to come.
> >
> > On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> > > First pass debugging results from gdb on ppc8xx.  Executing ssh-keyge=
n
> > > with following arguments.
> > >
> > > (gdb) show args
> > > Argument list to give program being debugged when it is started is
> > >     "-t rsa1 -f /etc/ssh/ssh_host_key -N """.
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > BN_bn2dec (a=3D0x1002d9f0) at bn_print.c:136
> > > 136                             *lp=3DBN_div_word(t,BN_DEC_CONV);
> > >
> > > (gdb) i r
> > > r0             0x0      0
> > > r1             0x7fffd580       2147472768
> > > r2             0x30012868       805382248
> > > r3             0x80000000       2147483648
> > > r4             0xfef33fc        267334652
> > > r5             0x25     37
> > > r6             0xfccdef8        265084664
> > > r7             0x7fffd4c0       2147472576
> > > r8             0xfbad2887       4222429319
> > > r9             0x84044022       2214871074
> > > r10            0x0      0
> > > r11            0x2      2
> > > r12            0xfef2054        267329620
> > > r13            0x10030bc8       268635080
> > > r14            0x0      0
> > > r15            0x0      0
> > > r16            0x0      0
> > > r17            0x0      0
> > > r18            0x0      0
> > > r19            0x0      0
> > > r20            0x0      0
> > > r21            0x0      0
> > > r22            0x0      0
> > > r23            0x64     100
> > > r24            0x5      5
> > > r25            0x1002d438       268620856
> > > r26            0x1002d9f0       268622320
> > > r27            0x1002c578       268617080
> > > r28            0x1      1
> > > r29            0x10031000       268636160
> > > r30            0xffbf7d0        268171216
> > > r31            0x1002d9f0       268622320
> > > pc             0xfef2058        267329624
> > > ps             0xd032   53298
> > > cr             0x24044022       604258338
> > > lr             0xfef2054        267329620
> > > ctr            0xfccefa0        265088928
> > > xer            0x20000000       536870912
> > > fpscr          0x0      0
> > > vscr           0x0      0
> > > vrsave         0x0      0
> > >
> > > (gdb) p/x $pc
> > > $1 =3D 0xfef2058
> > >
> > > 0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)
> > >
> > > (gdb) x 0x10031000
> > > 0x10031000:     Cannot access memory at address 0x10031000
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 7/5/05, David Ho <davidkwho@gmail.com> wrote:
> > > > This is the second confirmed report of the same problem on the ppc8=
xx.
> > > >
> > > > After reading my email.  I must say I was the unfriendly one, I
> > > > apologize for that.
> > > >
> > > > More debugging evidence to come.
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Murch, Christopher <cmurch@mrv.com>
> > > > Date: Jul 1, 2005 9:46 AM
> > > > Subject: RE: PPC bn_div_words routine rewrite
> > > > To: David Ho <davidkwho@gmail.com>
> > > >
> > > >
> > > > David,
> > > > I had observed the same issue on ppc 8xx machines after upgrading t=
o the asm
> > > > version of the BN routines.  Thank you very much for your work for =
the fix.
> > > > My question is, do you have high confidence in the other new asm pp=
c BN
> > > > routines after observing this issue or do you think they might have=
 similiar
> > > > problems?
> > > > Thanks.
> > > > Chris
> > > >
> > > > -----Original Message-----
> > > > From: David Ho [mailto:davidkwho@gmail.com]
> > > > Sent: Thursday, June 30, 2005 6:22 PM
> > > > To: openssl-dev@openssl.org; linuxppc-embedded@ozlabs.org
> > > > Subject: Re: PPC bn_div_words routine rewrite
> > > >
> > > >
> > > > The reason I had to redo this routine, in case anyone is wondering,=
 is
> > > > because ssh-keygen  segfaults when this assembly routine returns ju=
nk
> > > > to the BN_div_word function. On a ppc, if you issue the command
> > > >
> > > > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
> > > >
> > > > The program craps out when it tries to write the public key in asci=
i
> > > > decimal.
> > > >
> > > > Regards,
> > > > David
> > > >
> > > > On 6/30/05, David Ho <davidkwho@gmail.com> wrote:
> > > > > Hi all,
> > > > >
> > > > > This is a rewrite of the bn_div_words routine for the PowerPC arc=
h,
> > > > > tested on a MPC8xx processor.
> > > > > I initially thought there is maybe a small mistake in the code th=
at
> > > > > requires a one-liner change but it turns out I have to redo the
> > > > > routine.
> > > > > I guess this routine is not called very often as I see that most =
other
> > > > > routines are hand-crafted, whereas this routine is compiled from =
a C
> > > > > function that apparently has not gone through a whole lot of test=
ing.
> > > > >
> > > > > I wrote a C function to confirm correctness of the code.
> > > > >
> > > > > unsigned long div_words (unsigned long h,
> > > > >                          unsigned long l,
> > > > >                          unsigned long d)
> > > > > {
> > > > >   unsigned long i_h; /* intermediate dividend */
> > > > >   unsigned long i_q; /* quotient of i_h/d */
> > > > >   unsigned long i_r; /* remainder of i_h/d */
> > > > >
> > > > >   unsigned long i_cntr;
> > > > >   unsigned long i_carry;
> > > > >
> > > > >   unsigned long ret_q; /* return quotient */
> > > > >
> > > > >   /* cannot divide by zero */
> > > > >   if (d =3D=3D 0) return 0xffffffff;
> > > > >
> > > > >   /* do simple 32-bit divide */
> > > > >   if (h =3D=3D 0) return l/d;
> > > > >
> > > > >   i_q =3D h/d;
> > > > >   i_r =3D h - (i_q*d);
> > > > >   ret_q =3D i_q;
> > > > >
> > > > >   i_cntr =3D 32;
> > > > >
> > > > >   while (i_cntr--)
> > > > >   {
> > > > >     i_carry =3D (l & 0x80000000) ? 1:0;
> > > > >     l =3D l << 1;
> > > > >
> > > > >     i_h =3D (i_r << 1) | i_carry;
> > > > >     i_q =3D i_h/d;
> > > > >     i_r =3D i_h - (i_q*d);
> > > > >
> > > > >     ret_q =3D (ret_q << 1) | i_q;
> > > > >   }
> > > > >
> > > > >   return ret_q;
> > > > > }
> > > > >
> > > > >
> > > > > Then I handcrafted the routine in PPC assembly.
> > > > > The result is a 26 line assembly that is easy to understand and
> > > > > predictable as opposed to a 81liner that I am still trying to
> > > > > decipher...
> > > > > If anyone is interested in incorporating this routine to the open=
ssl
> > > > > code I'll be happy to assist.
> > > > > At this point I think I will be taking a bit of a break from this=
 3
> > > > > day debugging/fixing marathon.
> > > > >
> > > > > Regards,
> > > > > David Ho
> > > > >
> > > > >
> > > > > #
> > > > > #       Handcrafted version of bn_div_words
> > > > > #
> > > > > #       r3 =3D h
> > > > > #       r4 =3D l
> > > > > #       r5 =3D d
> > > > >
> > > > >         cmplwi  0,r5,0                  # compare r5 and 0
> > > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=
=3D0
> > > > >         li      r3,-1                   # d=3D0 return -1
> > > > >         bclr    BO_ALWAYS,CR0_LT
> > > > > .Lppcasm_div1:
> > > > >         cmplwi  0,r3,0                  # compare r3 and 0
> > > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h !=
=3D 0
> > > > >         divwu   r3,r4,r5                # ret_q =3D l/d
> > > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > > > .Lppcasm_div2:
> > > > >         divwu   r9,r3,r5                # i_q =3D h/d
> > > > >         mullw   r10,r9,r5               # i_r =3D h - (i_q*d)
> > > > >         subf    r10,r10,r3
> > > > >         mr      r3,r9                   # req_q =3D i_q
> > > > > .Lppcasm_set_ctr:
> > > > >         li      r12,32                  # ctr =3D bitsizeof(d)
> > > > >         mtctr   r12
> > > > > .Lppcasm_div_loop:
> > > > >         addc    r4,r4,r4                # l =3D l << 1 -> i_carry
> > > > >         adde    r11,r10,r10             # i_h =3D (i_r << 1) | i_=
carry
> > > > >         divwu   r9,r11,r5               # i_q =3D i_h/d
> > > > >         mullw   r10,r9,r5               # i_r =3D i_h - (i_q*d)
> > > > >         subf    r10,r10,r11
> > > > >         add     r3,r3,r3                # ret_q =3D ret_q << 1 | =
i_q
> > > > >         add     r3,r3,r9
> > > > >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > > > > .Lppc_div_end:
> > > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > > >         .long   0x00000000
> > > > >
> > > > _______________________________________________
> > > > Linuxppc-embedded mailing list
> > > > Linuxppc-embedded@ozlabs.org
> > > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PPC bn_div_words routine rewrite
  2005-07-05 21:25           ` David Ho
@ 2005-07-05 21:49             ` Andy Polyakov
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Polyakov @ 2005-07-05 21:49 UTC (permalink / raw)
  To: openssl-dev; +Cc: linuxppc-embedded

> Okay, having actually did what Andy suggested, i.e. the one liner fix
> in the assembly code, bn_div_words returns the correct results.

Note that the final version, one committed to all relevant OpenSSL 
branches since couple of days ago and one which actually made to just 
released 0.9.8, is a bit different from originally suggested one-line 
fix, see for example http://cvs.openssl.org/chngview?cn=14199.

> At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
> bn_div_words routine generated from crypto/bn/ppc.pl is still busted.

Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the 
bug condition. Recall that original problem report was for 0.9.7.

> Why do you signal an overflow condition when it appears functions that
> call bn_div_words do not check for overflow conditions?

That's question to IBM. By the time they submitted the code, I've 
explicitly asked what would be appropriate way to generate *fatal* 
condition at that point, i.e. one which would result in a core dump, and 
it came out as division by 0 instruction. By that time I had no access 
to any PPC machine and had to just go with it. Now it actually came as 
surprise that division by 0 does not raise an exception, but silently 
returns implementation-specific value... A.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-07-05 21:50 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <19EE6EC66973A5408FBE4CB7772F6F0A02C8770E@ltnmail.xyplex.com>
     [not found] ` <4dd15d1805070508312427a0ba@mail.gmail.com>
2005-07-05 15:45   ` Fwd: PPC bn_div_words routine rewrite David Ho
2005-07-05 16:36     ` David Ho
2005-07-05 17:01       ` David Ho
2005-07-05 20:21         ` David Ho
2005-07-05 21:22           ` Andy Polyakov
2005-07-05 21:25           ` David Ho
2005-07-05 21:49             ` Andy Polyakov
     [not found] <4dd15d1805063003587276af7e@mail.gmail.com>
2005-06-30 22:22 ` David Ho
2005-07-01 17:36   ` Andy Polyakov
2005-07-04 14:35     ` David Ho
2005-07-05 15:00       ` Andy Polyakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).