* wrong SP restored after DBE exception
From: Dave Johnson @ 2006-09-27 19:53 UTC
To: linux-mips

I'm running into an odd problem with the DBE exception handler.

I've got an IO device that on some error conditions causes a bus fault
on access. The driver I have that accesses this device directs all
reads/writes through a wrapper function to handle potential bus faults
(see the read version below).

If the device causes a DBE on access, do_be() looks up the EPC in the
DBE table and successfully corrects the PC to handle the fault.

This works most of the time, however on about 1 out of 100 faults the
SP register is saved and restored incorrectly. When control returns to
the faulting function SP is 304 bytes less than where it should be and,
as expected, things go downhill from there.

304 bytes is PT_SIZE (the amount of space for saved registers).

I suspect something is wrong with except_vec3_generic() or handle_dbe()
but the only thing that comes to mind is potential nested
interrupts/exceptions that would clobber K0/K1. The fact that SP is off
by 304 bytes seems to indicate it saved twice but only restored once.

CPU: SiByte BCM1250 (both A8 and B2 stepping tested)
Kernel: linux 2.6.12 (yes, I know it's old), 64bit kernel
Config: Occurs with and without SMP and with and without PREEMPT

I took a quick look to see if this area has changed between 2.6.12 and
2.6.17 and the only part I see is get_saved_sp() and that should only
affect faults from userspace. All the faults I'm getting are from a
kernel-mode driver.

I've walked through one (successful) DBE fault from this driver using a
JTAG debugger and everything looks to run exactly as expected. I have
yet to catch a failing one with the debugger except for after the
restore is finished, but that's too late.

Anyone have any thoughts on this issue?
------------------------
read wrapper function is:

	.set noreorder

/* int sb_io_trap_readb(unsigned char *value, const volatile void *addrs); */
LEAF(sb_io_trap_readb)
	/* do the read, handle error */
8:	lb	t0, (a1)
9:	add	t0, t0, zero	/* consume read */
	.section __dbe_table,"a"
	PTR	8b, 1f
	PTR	9b, 1f
	.previous

	/*
	 * write out to the caller's pointer, if this fails it's a bug
	 * and we should fault as normal
	 */
	sb	t0, (a0)

	/* all good, return success */
	jr	ra
	 move	v0, zero

	/* fault handler, return -EIO */
1:	jr	ra
	 li	v0, -EIO
END(sb_io_trap_readb)

-- 
Dave Johnson
Starent Networks
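The fixup step described in this message (do_be() finds the faulting EPC in the DBE table and resumes at the paired address) can be sketched in C as follows. This is a simplified model and not the kernel's code: `struct dbe_entry` and `dbe_fixup_for()` are hypothetical names, and a linear search is assumed purely for clarity. Each `PTR 8b, 1f` line in the wrapper above corresponds to one (insn, fixup) pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one __dbe_table entry: the address of an
 * instruction that may take a bus error, and where to resume if it does. */
struct dbe_entry {
    unsigned long insn;
    unsigned long fixup;
};

/* Return the fixup address registered for epc, or 0 if epc is not in the
 * table (the fault would then be treated as a genuine bus error). */
unsigned long dbe_fixup_for(const struct dbe_entry *table, size_t n,
                            unsigned long epc)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].insn == epc)
            return table[i].fixup;
    }
    return 0;
}
```

The wrapper registers both the load (label 8) and the instruction that consumes the loaded value (label 9), presumably because the fault may be reported at either one, so in this model both addresses map to the same fixup.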
* Re: wrong SP restored after DBE exception
From: Ralf Baechle @ 2006-09-28 13:09 UTC
To: Dave Johnson; +Cc: linux-mips

On Wed, Sep 27, 2006 at 03:53:55PM -0400, Dave Johnson wrote:

> I'm running into an odd problem with the DBE exception handler.

There is a fundamental problem with the way unmaskable exceptions other
than cache errors and NMI are handled. This is the disassembly of the
kernel's exception entry path starting at the general exception vector:

a8000000003bae20 <except_vec3_generic>:
a8000000003bae20:  401b6800  mfc0    k1,$13
a8000000003bae24:  337b007c  andi    k1,k1,0x7c
a8000000003bae28:  001bd878  dsll    k1,k1,0x1
a8000000003bae2c:  3c1a003f  lui     k0,0x3f
a8000000003bae30:  035bd02d  daddu   k0,k0,k1
a8000000003bae34:  df5ad2d8  ld      k0,-11560(k0)
a8000000003bae38:  03400008  jr      k0
a8000000003bae3c:  00000000  nop
[...]

A few types of exceptions will be handled by just using $k0 and $k1;
most will save the registers right away:

[...]
a800000000020440:  401a6000  mfc0    k0,$12
a800000000020444:  001ad0c0  sll     k0,k0,0x3
a800000000020448:  0740000a  bltz    k0,0xa800000000020474
a80000000002044c:  03a0d82d  move    k1,sp
a800000000020450:  403b2000  dmfc0   k1,$4
a800000000020454:  3c1aa800  lui     k0,0xa800
a800000000020458:  001bddfa  dsrl    k1,k1,0x17
a80000000002045c:  675a0000  daddiu  k0,k0,0
a800000000020460:  001ad438  dsll    k0,k0,0x10
a800000000020464:  675a0043  daddiu  k0,k0,67
a800000000020468:  001ad438  dsll    k0,k0,0x10
a80000000002046c:  037ad82d  daddu   k1,k1,k0
a800000000020470:  df7bf008  ld      k1,-4088(k1)
a800000000020474:  03a0d02d  move    k0,sp
a800000000020478:  677dfed0  daddiu  sp,k1,-304
a80000000002047c:  ffba00e8  sd      k0,232(sp)
(c0_status.exl is cleared a mile further down)
[...]

If we take a DBE exception in this code we're in trouble and I've seen
systems delivering DBEs highly asynchronously.
AFAIR the Broadcom SOCs fall into that class.

So the interesting part is if we take a data bus exception after the
stack pointer adjustment and before EXL is cleared. We're taking a
nested exception so c0_epc and c0_cause.bd will not be updated. So the
bus error handler will save the $sp value it saw on entry but will
return to the EPC of the first exception; that is, only one stack frame
will be popped. Whoops ...

  Ralf
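The arithmetic behind the 304-byte offset can be sketched with a small C model. It assumes only what the thread states: every exception entry stores the incoming $sp and then lowers $sp by PT_SIZE (304), and the exception return reloads $sp from the innermost frame. The helper names are hypothetical, and EPC and EXL are not modelled; the point is just that two frames are pushed while one is popped.

```c
#include <assert.h>

#define PT_SIZE 304UL  /* size of the saved-register frame, per the thread */

/* Exception entry: remember the incoming $sp (the value stored in the
 * frame), then carve a PT_SIZE frame below it. */
static unsigned long frame_push(unsigned long *sp)
{
    unsigned long saved = *sp;
    assert(*sp >= PT_SIZE);
    *sp -= PT_SIZE;
    return saved;
}

/* Normal case: one entry, one restore, $sp comes back unchanged. */
unsigned long sp_after_single_dbe(unsigned long sp0)
{
    unsigned long sp = sp0;
    unsigned long saved = frame_push(&sp);
    sp = saved;  /* restore from the only frame */
    return sp;
}

/* Failing case: a DBE arrives after the first exception has adjusted $sp
 * but before EXL is cleared. The DBE handler restores its own saved $sp
 * and returns straight to the original EPC, so the outer frame is never
 * popped. */
unsigned long sp_after_nested_dbe(unsigned long sp0)
{
    unsigned long sp = sp0;
    unsigned long outer = frame_push(&sp);  /* first exception's entry */
    unsigned long inner = frame_push(&sp);  /* nested DBE entry */
    (void)outer;                            /* lost: never restored */
    sp = inner;                             /* only the inner frame is popped */
    return sp;
}
```

In this model `sp_after_nested_dbe()` returns a value exactly PT_SIZE below its argument, which matches the offset reported in the original message.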
* Re: wrong SP restored after DBE exception
From: Maciej W. Rozycki @ 2006-09-28 13:56 UTC
To: Ralf Baechle; +Cc: Dave Johnson, linux-mips

On Thu, 28 Sep 2006, Ralf Baechle wrote:

> If we take a DBE exception in this code we're in trouble and I've seen
> systems delivering DBEs highly asynchronously. AFAIR the Broadcom SOCs
> fall into that class.
>
> So the interesting part is if we take a data bus exception after
> the stack pointer adjustment and before EXL is cleared. We're taking
> a nested exception so c0_epc and c0_cause.bd will not be updated. So
> the bus error handler will save the $sp value it saw on entry but
> will return to the EPC of the first exception; that is, only one stack
> frame will be popped. Whoops ...

It looks like a design issue -- further asynchronous bus error
exceptions should be blocked till the one currently being handled has
been acked. In fact if they are asynchronous, then it really makes no
sense to use the exception and a general interrupt should be used
instead -- the whole point of using an exception here is the ability to
stop a data corrupting transaction, as unlike an interrupt, an exception
can be precise.

  Maciej
* Re: wrong SP restored after DBE exception
From: Ralf Baechle @ 2006-09-28 14:28 UTC
To: Maciej W. Rozycki; +Cc: Dave Johnson, linux-mips

On Thu, Sep 28, 2006 at 02:56:29PM +0100, Maciej W. Rozycki wrote:

> > If we take a DBE exception in this code we're in trouble and I've seen
> > systems delivering DBEs highly asynchronously. AFAIR the Broadcom SOCs
> > fall into that class.
> >
> > So the interesting part is if we take a data bus exception after
> > the stack pointer adjustment and before EXL is cleared. We're taking
> > a nested exception so c0_epc and c0_cause.bd will not be updated. So
> > the bus error handler will save the $sp value it saw on entry but
> > will return to the EPC of the first exception; that is, only one stack
> > frame will be popped. Whoops ...
>
> It looks like a design issue -- further asynchronous bus error
> exceptions should be blocked till the one currently being handled has
> been acked. In fact if they are asynchronous, then it really makes no
> sense to use the exception and a general interrupt should be used
> instead -- the whole point of using an exception here is the ability to
> stop a data corrupting transaction, as unlike an interrupt, an
> exception can be precise.

I would suggest disabling interrupts around accesses that could
potentially result in DB exceptions and, to make sure he is not getting
trapped by a non-blocking load, making some use of any value read from
the device. Writes could be posted depending on bus type, so a read
from the same device would force the write to complete.

  Ralf
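The suggestion above, masking interrupts for the duration of a potentially faulting access, can be sketched as a user-space C model. Everything here is an assumption made for illustration: `irqs_enabled`, `model_irq_save()` and `model_irq_restore()` stand in for the CPU interrupt state and for the kernel's local_irq_save()/local_irq_restore(), `guarded_readb()` is a hypothetical wrapper, and `fake_readb()` replaces the assembly routine sb_io_trap_readb() so that the sketch runs outside a kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the CPU interrupt-enable state. */
bool irqs_enabled = true;

/* Records what the fake device read observed, for checking the model. */
bool read_saw_irqs_enabled = true;

/* Model of local_irq_save(): return the old state and disable interrupts. */
bool model_irq_save(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}

/* Model of local_irq_restore(): put the saved state back. */
void model_irq_restore(bool flags)
{
    irqs_enabled = flags;
}

/* Same signature as sb_io_trap_readb() in the thread. */
typedef int (*trap_readb_fn)(unsigned char *value, const volatile void *addr);

/* Perform the trapped read with interrupts masked, so that no interrupt
 * entry can be in progress when a bus error is delivered. */
int guarded_readb(trap_readb_fn readb, unsigned char *value,
                  const volatile void *addr)
{
    assert(readb != NULL);
    bool flags = model_irq_save();
    int ret = readb(value, addr);
    model_irq_restore(flags);
    return ret;
}

/* Fake device read: notes the interrupt state and returns the byte. */
int fake_readb(unsigned char *value, const volatile void *addr)
{
    read_saw_irqs_enabled = irqs_enabled;
    *value = *(const volatile unsigned char *)addr;
    return 0;
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling, keeps the wrapper safe to call from code that already runs with interrupts off.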
* Re: wrong SP restored after DBE exception
From: Dave Johnson @ 2006-09-28 14:34 UTC
To: Ralf Baechle; +Cc: Maciej W. Rozycki, linux-mips

Ralf Baechle writes:
> I would suggest disabling interrupts around accesses that could
> potentially result in DB exceptions and, to make sure he is not getting
> trapped by a non-blocking load, making some use of any value read from
> the device. Writes could be posted depending on bus type, so a read
> from the same device would force the write to complete.
>
>   Ralf

Ya, I was about to try that. I could be getting an interrupt between the
time the read is issued and the time the timeout occurs on the GBus.

Also, to force ordering, doing a dummy read on the GBus to a device that
shouldn't fault, prior to (for reads) or after (for writes) the
potentially faulting access, seems like a good idea too.

-- 
Dave Johnson
Starent Networks
* Re: wrong SP restored after DBE exception
From: Dave Johnson @ 2006-09-28 17:55 UTC
To: Ralf Baechle, Maciej W. Rozycki, linux-mips

Dave Johnson <djohnson+linux-mips@sw.starentnetworks.com> writes:
> Ralf Baechle writes:
> > I would suggest disabling interrupts around accesses that could
> > potentially result in DB exceptions and, to make sure he is not getting
> > trapped by a non-blocking load, making some use of any value read from
> > the device. Writes could be posted depending on bus type, so a read
> > from the same device would force the write to complete.

Disabling interrupts around the accesses works ok. My test program has
caused about 400000 DBEs so far with no problem.

Thanks.

-- 
Dave Johnson
Starent Networks