public inbox for linux-kernel@vger.kernel.org
* 2.6.17-rc1-mm2: badness in 3w_xxxx driver
@ 2006-04-09 18:23 Nick Orlov
  2006-04-09 18:32 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Nick Orlov @ 2006-04-09 18:23 UTC
  To: linux-kernel; +Cc: Andrew Morton, Jens Axboe, James Bottomley

The following patch: x86-kmap_atomic-debugging.patch exposed badness
in the 3w_xxxx driver. I'm getting a lot of these:

Apr  9 13:00:04 nickolas kernel: kmap_atomic: local irqs are enabled while using KM_IRQn
Apr  9 13:00:04 nickolas kernel:  <c0104103> show_trace+0x13/0x20   <c010412e> dump_stack+0x1e/0x20
Apr  9 13:00:04 nickolas kernel:  <c01159c9> kmap_atomic+0x79/0xe0   <c028b885> tw_transfer_internal+0x85/0xa0
Apr  9 13:00:04 nickolas kernel:  <c028ca7e> tw_interrupt+0x3fe/0x820   <c0143b9e> handle_IRQ_event+0x3e/0x80
Apr  9 13:00:04 nickolas kernel:  <c0143c70> __do_IRQ+0x90/0x100   <c01057a6> do_IRQ+0x26/0x40
Apr  9 13:00:04 nickolas kernel:  <c010396e> common_interrupt+0x1a/0x20   <c0101cdd> cpu_idle+0x4d/0xb0
Apr  9 13:00:04 nickolas kernel:  <c010f2cc> start_secondary+0x24c/0x4b0   <00000000> 0x0
Apr  9 13:00:04 nickolas kernel:  <c214ffb4> 0xc214ffb4  

I'm running a 32-bit kernel on an AMD64 X2 with HIGHMEM enabled.
I think this is an old bug, since 3w-xxxx.c has not been changed for
a long time (at least since 2.6.16-rc1-mm4).

Please let me know if you want me to try some patches.

-- 
With best wishes,
	Nick Orlov.



* Re: 2.6.17-rc1-mm2: badness in 3w_xxxx driver
  2006-04-09 18:23 2.6.17-rc1-mm2: badness in 3w_xxxx driver Nick Orlov
@ 2006-04-09 18:32 ` Andrew Morton
  2006-04-09 19:12   ` Jeff Garzik
  2006-04-09 19:12   ` Nick Orlov
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2006-04-09 18:32 UTC
  To: Nick Orlov; +Cc: linux-kernel, axboe, James.Bottomley

Nick Orlov <bugfixer@list.ru> wrote:
>
> The following patch: x86-kmap_atomic-debugging.patch exposed badness
> in the 3w_xxxx driver.

Sweet, thanks.

> I'm getting a lot of:
> 
> Apr  9 13:00:04 nickolas kernel: kmap_atomic: local irqs are enabled while using KM_IRQn
> Apr  9 13:00:04 nickolas kernel:  <c0104103> show_trace+0x13/0x20   <c010412e> dump_stack+0x1e/0x20
> Apr  9 13:00:04 nickolas kernel:  <c01159c9> kmap_atomic+0x79/0xe0   <c028b885> tw_transfer_internal+0x85/0xa0
> Apr  9 13:00:04 nickolas kernel:  <c028ca7e> tw_interrupt+0x3fe/0x820   <c0143b9e> handle_IRQ_event+0x3e/0x80
> Apr  9 13:00:04 nickolas kernel:  <c0143c70> __do_IRQ+0x90/0x100   <c01057a6> do_IRQ+0x26/0x40
> Apr  9 13:00:04 nickolas kernel:  <c010396e> common_interrupt+0x1a/0x20   <c0101cdd> cpu_idle+0x4d/0xb0
> Apr  9 13:00:04 nickolas kernel:  <c010f2cc> start_secondary+0x24c/0x4b0   <00000000> 0x0
> Apr  9 13:00:04 nickolas kernel:  <c214ffb4> 0xc214ffb4  
> 
> I'm running a 32-bit kernel on an AMD64 X2 with HIGHMEM enabled.
> I think this is an old bug, since 3w-xxxx.c has not been changed for
> a long time (at least since 2.6.16-rc1-mm4).
> 
> Please let me know if you want me to try some patches.
> 


From: Andrew Morton <akpm@osdl.org>

We must disable local IRQs while holding KM_IRQ0 or KM_IRQ1.  Otherwise, an
IRQ handler could use those kmap slots while this code is using them,
resulting in memory corruption.

Thanks to Nick Orlov <bugfixer@list.ru> for reporting.

Cc: <linuxraid@amcc.com>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/scsi/3w-xxxx.c |    3 +++
 1 files changed, 3 insertions(+)

diff -puN drivers/scsi/3w-xxxx.c~3ware-kmap_atomic-fix drivers/scsi/3w-xxxx.c
--- devel/drivers/scsi/3w-xxxx.c~3ware-kmap_atomic-fix	2006-04-09 11:28:08.000000000 -0700
+++ devel-akpm/drivers/scsi/3w-xxxx.c	2006-04-09 11:29:21.000000000 -0700
@@ -1508,10 +1508,12 @@ static void tw_transfer_internal(TW_Devi
 	struct scsi_cmnd *cmd = tw_dev->srb[request_id];
 	void *buf;
 	unsigned int transfer_len;
+	unsigned long flags = 0;
 
 	if (cmd->use_sg) {
 		struct scatterlist *sg =
 			(struct scatterlist *)cmd->request_buffer;
+		local_irq_save(flags);
 		buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
 		transfer_len = min(sg->length, len);
 	} else {
@@ -1526,6 +1528,7 @@ static void tw_transfer_internal(TW_Devi
 
 		sg = (struct scatterlist *)cmd->request_buffer;
 		kunmap_atomic(buf - sg->offset, KM_IRQ0);
+		local_irq_restore(flags);
 	}
 }
 
_
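
The reasoning in the changelog above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code: `slot`, `irqs_enabled`, `other_irq_handler()`, `maybe_deliver_irq()`
and `transfer()` are made-up names standing in for the per-CPU KM_IRQ0
mapping, the CPU interrupt flag, a second IRQ handler, interrupt delivery,
and tw_transfer_internal() respectively.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these names are kernel APIs. */
static const char *slot;          /* stands in for the per-CPU KM_IRQ0 mapping */
static bool irqs_enabled = true;  /* stands in for the CPU interrupt flag */

/* Model of another IRQ handler that also maps a page into the same slot. */
static void other_irq_handler(void)
{
	slot = "handler page";
}

/* Model of the hardware: an interrupt is delivered only if IRQs are enabled. */
static void maybe_deliver_irq(void)
{
	if (irqs_enabled)
		other_irq_handler();
}

/*
 * Model of the transfer path: map a page, get interrupted mid-copy, then
 * look at the mapping again. Returns true if the mapping survived.
 */
static bool transfer(bool disable_irqs)
{
	bool saved = irqs_enabled;    /* plays the role of local_irq_save(flags) */
	bool intact;

	if (disable_irqs)
		irqs_enabled = false;

	slot = "driver page";         /* plays the role of kmap_atomic(..., KM_IRQ0) */
	maybe_deliver_irq();          /* an interrupt arrives while the slot is in use */
	intact = strcmp(slot, "driver page") == 0;
	slot = NULL;                  /* plays the role of kunmap_atomic() */

	irqs_enabled = saved;         /* plays the role of local_irq_restore(flags) */
	return intact;
}
```

In this model transfer(false) returns false, because the simulated handler
overwrote the slot while it was in use, and transfer(true) returns true.
The real kernel failure mode is a stale or wrong mapping rather than a
string mismatch, so treat this as an analogy only.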



* Re: 2.6.17-rc1-mm2: badness in 3w_xxxx driver
  2006-04-09 18:32 ` Andrew Morton
@ 2006-04-09 19:12   ` Jeff Garzik
  2006-04-09 19:21     ` Arjan van de Ven
  2006-04-09 19:12   ` Nick Orlov
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2006-04-09 19:12 UTC
  To: Andrew Morton; +Cc: Nick Orlov, linux-kernel, axboe, James.Bottomley

Andrew Morton wrote:
> Nick Orlov <bugfixer@list.ru> wrote:
>> The following patch: x86-kmap_atomic-debugging.patch exposed badness
>> in the 3w_xxxx driver.
> 
> Sweet, thanks.
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> We must disable local IRQs while holding KM_IRQ0 or KM_IRQ1.  Otherwise, an
> IRQ handler could use those kmap slots while this code is using them,
> resulting in memory corruption.
> 
> Thanks to Nick Orlov <bugfixer@list.ru> for reporting.
> 
> Cc: <linuxraid@amcc.com>
> Cc: James Bottomley <James.Bottomley@SteelEye.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/scsi/3w-xxxx.c |    3 +++
>  1 files changed, 3 insertions(+)
> 
> diff -puN drivers/scsi/3w-xxxx.c~3ware-kmap_atomic-fix drivers/scsi/3w-xxxx.c
> --- devel/drivers/scsi/3w-xxxx.c~3ware-kmap_atomic-fix	2006-04-09 11:28:08.000000000 -0700
> +++ devel-akpm/drivers/scsi/3w-xxxx.c	2006-04-09 11:29:21.000000000 -0700
> @@ -1508,10 +1508,12 @@ static void tw_transfer_internal(TW_Devi
>  	struct scsi_cmnd *cmd = tw_dev->srb[request_id];
>  	void *buf;
>  	unsigned int transfer_len;
> +	unsigned long flags = 0;
>  
>  	if (cmd->use_sg) {
>  		struct scatterlist *sg =
>  			(struct scatterlist *)cmd->request_buffer;
> +		local_irq_save(flags);
>  		buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
>  		transfer_len = min(sg->length, len);
>  	} else {
> @@ -1526,6 +1528,7 @@ static void tw_transfer_internal(TW_Devi
>  
>  		sg = (struct scatterlist *)cmd->request_buffer;
>  		kunmap_atomic(buf - sg->offset, KM_IRQ0);
> +		local_irq_restore(flags);

ACK.

Though please make sure the active maintainer is CC'd on this...  There 
is even a helpful MAINTAINERS entry for this driver.

	Jeff




* Re: 2.6.17-rc1-mm2: badness in 3w_xxxx driver
  2006-04-09 18:32 ` Andrew Morton
  2006-04-09 19:12   ` Jeff Garzik
@ 2006-04-09 19:12   ` Nick Orlov
  2006-04-09 19:43     ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Nick Orlov @ 2006-04-09 19:12 UTC
  To: Andrew Morton; +Cc: linux-kernel

On Sun, Apr 09, 2006 at 11:32:40AM -0700, Andrew Morton wrote:
> Nick Orlov <bugfixer@list.ru> wrote:
> >
> > The following patch: x86-kmap_atomic-debugging.patch exposed badness
> > in the 3w_xxxx driver.
> 
> Sweet, thanks.
> 
[[ skipped ]]
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> We must disable local IRQs while holding KM_IRQ0 or KM_IRQ1.  Otherwise, an
> IRQ handler could use those kmap slots while this code is using them,
> resulting in memory corruption.
> 
> Thanks to Nick Orlov <bugfixer@list.ru> for reporting.
> 
> Cc: <linuxraid@amcc.com>
> Cc: James Bottomley <James.Bottomley@SteelEye.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/scsi/3w-xxxx.c |    3 +++
>  1 files changed, 3 insertions(+)
> 
> diff -puN drivers/scsi/3w-xxxx.c~3ware-kmap_atomic-fix drivers/scsi/3w-xxxx.c
> --- devel/drivers/scsi/3w-xxxx.c~3ware-kmap_atomic-fix	2006-04-09 11:28:08.000000000 -0700
> +++ devel-akpm/drivers/scsi/3w-xxxx.c	2006-04-09 11:29:21.000000000 -0700
> @@ -1508,10 +1508,12 @@ static void tw_transfer_internal(TW_Devi
>  	struct scsi_cmnd *cmd = tw_dev->srb[request_id];
>  	void *buf;
>  	unsigned int transfer_len;
> +	unsigned long flags = 0;
>  
>  	if (cmd->use_sg) {
>  		struct scatterlist *sg =
>  			(struct scatterlist *)cmd->request_buffer;
> +		local_irq_save(flags);
>  		buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
>  		transfer_len = min(sg->length, len);
>  	} else {
> @@ -1526,6 +1528,7 @@ static void tw_transfer_internal(TW_Devi
>  
>  		sg = (struct scatterlist *)cmd->request_buffer;
>  		kunmap_atomic(buf - sg->offset, KM_IRQ0);
> +		local_irq_restore(flags);
>  	}
>  }
>  
> _

Confirmed, this patch solves the "badness" problem for me.
I'm still experiencing weird hangs, though (the box just hangs, with no
messages on console/syslog, nothing). I'll try to nail it down.

2.6.16-mm2 works like a charm with the same config.
Do you know which patches I should try reverting first?

-- 
With best wishes,
	Nick Orlov.



* Re: 2.6.17-rc1-mm2: badness in 3w_xxxx driver
  2006-04-09 19:12   ` Jeff Garzik
@ 2006-04-09 19:21     ` Arjan van de Ven
  0 siblings, 0 replies; 7+ messages in thread
From: Arjan van de Ven @ 2006-04-09 19:21 UTC
  To: Jeff Garzik
  Cc: Andrew Morton, Nick Orlov, linux-kernel, axboe, James.Bottomley


> > Cc: <linuxraid@amcc.com>
> > ---
> > 
> ACK.
> 
> Though please make sure the active maintainer is CC'd on this...  There 
> is even a helpful MAINTAINERS entry for this driver.

I'd say it is ;-)




* Re: 2.6.17-rc1-mm2: badness in 3w_xxxx driver
  2006-04-09 19:12   ` Nick Orlov
@ 2006-04-09 19:43     ` Andrew Morton
  2006-04-09 21:23       ` Nick Orlov
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2006-04-09 19:43 UTC
  To: Nick Orlov; +Cc: linux-kernel

Nick Orlov <bugfixer@list.ru> wrote:
>
> Confirmed, this patch solves the "badness" problem for me.

yup, thanks.

>  I'm still experiencing weird hangs, though (the box just hangs, with no
>  messages on console/syslog, nothing). I'll try to nail it down.
> 
>  2.6.16-mm2 works like a charm with the same config.
>  Do you know which patches I should try reverting first?

Gee, 2.6.16-mm2 was a long time ago.

Tried sysrq?

	echo 1 > /proc/sys/kernel/sysrq
	<wait for hang>
	ALT-SYSRQ-P or ALT-SYSRQ-T

Is the NMI watchdog enabled?  Boot with `nmi_watchdog=1' and make sure that
the NMI counts are incrementing in /proc/interrupts.
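
The NMI-count check above can also be done programmatically. The following
is a minimal sketch, assuming the /proc/interrupts layout in which a line
labelled "NMI:" carries one counter per CPU; `nmi_total()` and
`nmi_watchdog_ticking()` are hypothetical helper names, and reading the
file into a string twice, a few seconds apart, is left to the caller.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on the "NMI:" line of a /proc/interrupts
 * snapshot passed in as text. Returns -1 if no such line is found.
 */
static long long nmi_total(const char *text)
{
	const char *line = text;

	while (line && *line) {
		const char *p = line;

		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, "NMI:", 4) == 0) {
			long long total = 0;

			p += 4;
			for (;;) {
				char *end;

				while (*p == ' ' || *p == '\t')
					p++;
				if (!isdigit((unsigned char)*p))
					break;  /* trailing description or end of line */
				total += strtoll(p, &end, 10);
				p = end;
			}
			return total;
		}
		line = strchr(line, '\n');
		if (line)
			line++;         /* step past the newline to the next line */
	}
	return -1;
}

/* The watchdog looks alive if the NMI total grew between two snapshots. */
static int nmi_watchdog_ticking(const char *before, const char *after)
{
	long long a = nmi_total(before);
	long long b = nmi_total(after);

	return a >= 0 && b > a;
}
```

If nmi_watchdog_ticking() reports no growth between two snapshots, the
watchdog is probably not running and will not catch a hard hang.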

Failing all that, testing 2.6.17-rc1 would be interesting.


* Re: 2.6.17-rc1-mm2: badness in 3w_xxxx driver
  2006-04-09 19:43     ` Andrew Morton
@ 2006-04-09 21:23       ` Nick Orlov
  0 siblings, 0 replies; 7+ messages in thread
From: Nick Orlov @ 2006-04-09 21:23 UTC
  To: Andrew Morton; +Cc: linux-kernel

On Sun, Apr 09, 2006 at 12:43:01PM -0700, Andrew Morton wrote:
> Nick Orlov <bugfixer@list.ru> wrote:
> >
> > Confirmed, this patch solves the "badness" problem for me.
> 
> yup, thanks.
> 
> >  I'm still experiencing weird hangs, though (the box just hangs, with no
> >  messages on console/syslog, nothing). I'll try to nail it down.
> > 
> >  2.6.16-mm2 works like a charm with the same config.
> >  Do you know which patches I should try reverting first?
> 
> Gee, 2.6.16-mm2 was a long time ago.
> 
> Tried sysrq?
> 
> 	echo 1 > /proc/sys/kernel/sysrq
> 	<wait for hang>
> 	ALT-SYSRQ-P or ALT-SYSRQ-T
> 
> Is the NMI watchdog enabled?  Boot with `nmi_watchdog=1' and make sure that
> the NMI counts are incrementing in /proc/interrupts.
> 
> Failing all that, testing 2.6.17-rc1 would be interesting.

2.6.17-rc1 fails in the same fashion: it hangs "randomly".
The good news is that I've found the pattern and a solution:
it always happens when two applications open /dev/dsp simultaneously.

Applying the following patches published by Takashi Iwai solves the
problem:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114423578508165&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=114424198614019&w=2

I'm not sure whether the first one alone is enough.

I would recommend putting them into the hot-fixes,
since many people could be frustrated by this.

-- 
With best wishes,
	Nick Orlov.

