Linux PARISC architecture development
 help / color / mirror / Atom feed
* in ccio_io_pdir_entry(), BUG_ON() seems to break gcc-4.2 optimization?
@ 2008-06-15 12:37 rubisher
  2008-06-16 11:37 ` in ccio_io_pdir_entry(),BUG_ON() " rubisher
  2008-06-19 16:04 ` in ccio_io_pdir_entry(), BUG_ON() " Grant Grundler
  0 siblings, 2 replies; 17+ messages in thread
From: rubisher @ 2008-06-15 12:37 UTC (permalink / raw)
  To: linux-parisc

Hello all,

looking at this hunk:
void CCIO_INLINE
ccio_io_pdir_entry(u64 *pdir_ptr, space_t sid, unsigned long vba,
                    unsigned long hints)
{
         register unsigned long pa;
         register unsigned long ci; /* coherent index */

         /* We currently only support kernel addresses */
         BUG_ON(sid != KERNEL_SPACE);

         mtsp(sid,1);

[snip]
         pa = virt_to_phys(vba);
         asm volatile("depw  %1,31,12,%0" : "+r" (pa) : "r" (hints));
         ((u32 *)pdir_ptr)[1] = (u32) pa;

[snip]
         pa = 0;
[snip]
         asm volatile ("lci %%r0(%%sr1, %1), %0" : "=r" (ci) : "r" (vba));
         asm volatile ("extru %1,19,12,%0" : "+r" (ci) : "r" (ci));
         asm volatile ("depw  %1,15,12,%0" : "+r" (pa) : "r" (ci));

         ((u32 *)pdir_ptr)[0] = (u32) pa;

[snip]
         asm volatile("fdc %%r0(%0)" : : "r" (pdir_ptr));
         asm volatile("sync");
}
(I just remove comments and 64bit stuff)

and I noticed that resulting code looks like:
    0:   cb 39 a0 60     movb,<> r25,r25,38 <ccio_io_pdir_entry+0x38>
    4:   34 1c 00 00     ldi 0,ret0
    8:   00 1c 58 20     mtsp ret0,sr1
    c:   22 60 0e 01     ldil L%-10000000,r19
   10:   0a 78 0a 13     add,l r24,r19,r19
   14:   d6 77 0c 14     depw r23,31,12,r19
   18:   0f 53 12 88     stw r19,4(r26)
   1c:   07 00 53 1c     lci r0(sr1,r24),ret0
   20:   d3 9c 1a 74     extrw,u ret0,19,12,ret0
   24:   d7 3c 0e 14     depw ret0,15,12,r25
   28:   0f 59 12 80     stw r25,0(r26)
   2c:   07 40 12 80     fdc r0(r26)
   30:   00 00 04 00     sync
   34:   e8 40 c0 02     bv,n r0(rp)
   38:   03 ff e0 1f     break 1f,1fff
   3c:   e8 1f 1f f7     b,l,n 3c <ccio_io_pdir_entry+0x3c>,r0
Disassembly of section .text.ccio_proc_bitmap_open:

And my worry was about lines 4: and 8:.
According to the C code, I don't understand why optimization want to initialize sr1 to 0 while it should be set to r25 (i.e. 
arg1)?

Otoh, the sba botherhood code didn't showing the same behaviour:
    0:   22 a0 0e 01     ldil L%-10000000,r21
    4:   34 1c 00 00     ldi 0,ret0
    8:   34 1d 20 01     ldi -1000,ret1
    c:   0a b8 0a 15     add,l r24,r21,r21
   10:   08 15 02 56     copy r21,r22
   14:   34 15 00 00     ldi 0,r21
   18:   0b 95 02 15     and r21,ret0,r21
   1c:   0b b6 02 16     and r22,ret1,r22
   20:   00 19 58 20     mtsp r25,sr1
   24:   07 00 53 13     lci r0(sr1,r24),r19
   28:   d2 73 1a 6c     extrw,u r19,19,20,r19
   2c:   23 80 00 01     ldil L%-80000000,ret0
   30:   34 1d 00 00     ldi 0,ret1

but didn't start with BUG_ON(), I simply try to remove this from ccio code and get a better result:
00000000 <ccio_io_pdir_entry>:
    0:   00 19 58 20     mtsp r25,sr1
    4:   23 80 0e 01     ldil L%-10000000,ret0
    8:   0b 98 0a 1c     add,l r24,ret0,ret0
    c:   d7 97 0c 14     depw r23,31,12,ret0
   10:   0f 5c 12 88     stw ret0,4(r26)
   14:   07 00 53 18     lci r0(sr1,r24),r24
   18:   d3 18 1a 74     extrw,u r24,19,12,r24
   1c:   34 1c 00 00     ldi 0,ret0
   20:   d7 98 0e 14     depw r24,15,12,ret0
   24:   0f 5c 12 80     stw ret0,0(r26)
   28:   07 40 12 80     fdc r0(r26)
   2c:   00 00 04 00     sync
   30:   e8 40 c0 02     bv,n r0(rp)
Disassembly of section .init.text:

But this time, it seems not consider assembly:
         asm volatile ("lci %%r0(%%sr1, %1), %0" : "=r" (ci) : "r" (vba));
         asm volatile ("extru %1,19,12,%0" : "+r" (ci) : "r" (ci));
         asm volatile ("depw  %1,15,12,%0" : "+r" (pa) : "r" (ci));

as a 'volatile' block and insert line 1c:
This could may be solved by re-write as an one 'volatile' asm block like:
	asm volatile (
	"lci %%r0(%%sr1, %1), %1"
	"\textru        %1,19,12,%1\n"
	"\tdepw         %1,15,12,%0\n"
	: "=r" (pa)
	: "r" (vba));

and even add a clobber 'memory'
	asm volatile (
	"lci %%r0(%%sr1, %1), %1"
	"\textru        %1,19,12,%1\n"
	"\tdepw         %1,15,12,%0\n"
	: "=r" (pa)
	: "r" (vba)
	: "memory");

But I have no clue how to restore BUG_ON() and avoid wrong optimization?

Any idea?

Tia,
	r.


^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: in ccio_io_pdir_entry(),BUG_ON() seems to break gcc-4.2 optimization?
@ 2008-06-20  6:37 Joel Soete
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Soete @ 2008-06-20  6:37 UTC (permalink / raw)
  To: dave; +Cc: grundler, rubisher, linux-parisc

> > > Does the BUG_ON explaination make sense to you?
> > > 
> > Well I nerver imagine that a compiler can make proof of this kind of
induction spirit ;-) but that's give the sense, tx.
> 
> Yes, the current compiler eliminates a lot of dead code based on induction.
> This occurs a lot when functions are inlined, etc.  Makes debugging a real
> problem.
> 
Thanks Dave (after a while away from this parisc hobby, it's nice to learn
fresh info).

Cheers,
    J.

> Dave
> -- 
> J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
> National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: in ccio_io_pdir_entry(),BUG_ON() seems to break gcc-4.2 optimization?
@ 2008-06-26  6:28 Joel Soete
  2008-06-28 20:23 ` Grant Grundler
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Soete @ 2008-06-26  6:28 UTC (permalink / raw)
  To: Grant Grundler; +Cc: linux-parisc

Just for remind:

Joel Soete wrote:
 >
 >
[snip]
 >>> But this time, it seems not consider assembly:
 >>>         asm volatile ("lci %%r0(%%sr1, %1), %0" : "=r" (ci) : "r"
 >>> (vba));
 >>>         asm volatile ("extru %1,19,12,%0" : "+r" (ci) : "r" (ci));
 >>>         asm volatile ("depw  %1,15,12,%0" : "+r" (pa) : "r" (ci));
 >>>
 >>> as a 'volatile' block and insert line 1c:

This seems to make better what I want:
        __asm__ __volatile__ (
        "lci    %%r0(%%sr1, %2), %0\n"
        "\textru        %0,19,12,%0\n"
        "\tdepw         %0,15,12,%1\n"
        : "+r" (ci), "+r" (pa)
        : "r" (vba)
        : "memory"
        );


in <ccio_map_sg>
  200:	06 80 53 13 	lci r0(sr1,r20),r19		      |	 200:	08 1a 02 54 	copy r26,r20
  204:	d2 73 1a 74 	extrw,u r19,19,12,r19		      |	 204:	06 a0 53 14 	lci
r0(sr1,r21),r20
  208:	08 1a 02 5c 	copy r26,ret0			      |	 208:	d2 94 1a 74 	extrw,u
r20,19,12,r20
  20c:	d7 93 0e 14 	depw r19,15,12,ret0		      |	 20c:	d7 94 0e 14 	depw
r20,15,12,ret0
  210:	0e dc 12 80 	stw ret0,0(r22)				 210:	0e dc 12 80 	stw ret0,0(r22)
  214:	06 c0 12 80 	fdc r0(r22)				 214:	06 c0 12 80 	fdc r0(r22)
  218:	00 00 04 00 	sync					 218:	00 00 04 00 	sync

J.

PS: I don't yet understand why kernel panicing when I try to get rid to
reserve 'ci' variable and use in place a gr as r19 (even if clobbered), the
produced object seems ok but resulting kernel panicing (even after a full
rebuild after a make distclean???)



^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: in ccio_io_pdir_entry(),BUG_ON() seems to break gcc-4.2 optimization?
@ 2008-07-08  9:04 Joel Soete
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Soete @ 2008-07-08  9:04 UTC (permalink / raw)
  To: grundler; +Cc: grundler, linux-parisc

[...]
> 
> In "New Stuff", r20 gets clobbered and values copied from
> r26 are lost. That's not the case in the original code.
> "depw" only replaces 12 bits of "ret0" in the original code.
> 
Good catch, in fact I would just have to save the read+write constraint on pa;
i.e this seems to make the drill:

        __asm__ __volatile__ (
        "lci    %%r0(%%sr1, %2), %0\n"
        "\textru        %0,19,12,%0\n"
        "\tdepw         %0,15,12,%1\n"
        : "=r" (ci), "+r" (pa)
        : "r" (vba)
        );

comparing produced objdump:
00000000 <ccio_io_pdir_entry>:           00000000 <ccio_io_pdir_entry>:
 0: 00 19 58 20  mtsp r25,sr1             0: 00 19 58 20  mtsp r25,sr1
 4: 23 80 0e 01  ldil L%-10000000,ret0    4: 23 80 0e 01  ldil L%-10000000,ret0
 8: 0b 98 0a 1c  add,l r24,ret0,ret0      8: 0b 98 0a 1c  add,l r24,ret0,ret0
 c: d7 97 0c 14  depw r23,31,12,ret0      c: d7 97 0c 14  depw r23,31,12,ret0
10: 0f 5c 12 88  stw ret0,4(r26)       | 10: 34 13 00 00  ldi 0,r19
14: 07 00 53 18  lci r0(sr1,r24),r24   | 14: 0f 5c 12 88  stw ret0,4(r26)
18: d3 18 1a 74  extrw,u r24,19,12,r24 | 18: 07 00 53 1c  lci r0(sr1,r24),ret0
1c: 34 1c 00 00  ldi 0,ret0            | 1c: d3 9c 1a 74  extrw,u ret0,19,12,ret0
20: d7 98 0e 14  depw r24,15,12,ret0   | 20: d6 7c 0e 14  depw ret0,15,12,r19
24: 0f 5c 12 80  stw ret0,0(r26)       | 24: 0f 53 12 80  stw r19,0(r26)
28: 07 40 12 80  fdc r0(r26)             28: 07 40 12 80  fdc r0(r26)
2c: 00 00 04 00  sync                    2c: 00 00 04 00  sync
30: e8 40 c0 02  bv,n r0(rp)             30: e8 40 c0 02  bv,n r0(rp)
Disassembly of section .text.ccio_free_consistent:Disassembly of section ...

This time r19 is well 'initialized' with 0 (as per C code pa = 0);
the coherent index in r28 (ret0) and after extrw, depw replaces 12 bits of r19.

Many thanks for re-reading ;-)
    J.

> thanks,
> grant
> --
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-07-08  9:04 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-15 12:37 in ccio_io_pdir_entry(), BUG_ON() seems to break gcc-4.2 optimization? rubisher
2008-06-16 11:37 ` in ccio_io_pdir_entry(),BUG_ON() " rubisher
2008-06-19 16:04 ` in ccio_io_pdir_entry(), BUG_ON() " Grant Grundler
2008-06-19 19:44   ` Joel Soete
2008-06-19 22:48     ` John David Anglin
2008-06-19 22:41   ` in ccio_io_pdir_entry(), BUG_ON() seems to break gcc-4.2 John David Anglin
  -- strict thread matches above, loose matches on Subject: below --
2008-06-20  6:37 in ccio_io_pdir_entry(),BUG_ON() seems to break gcc-4.2 optimization? Joel Soete
2008-06-26  6:28 Joel Soete
2008-06-28 20:23 ` Grant Grundler
2008-06-28 22:26   ` Joel Soete
2008-06-28 22:45     ` John David Anglin
2008-06-29 20:52     ` Grant Grundler
2008-06-30 18:28       ` Joel Soete
2008-07-02  4:28         ` Grant Grundler
2008-07-02 18:01           ` Joel Soete
2008-07-07 15:28             ` Grant Grundler
2008-07-08  9:04 Joel Soete

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox