public inbox for linux-kernel@vger.kernel.org
* OOPS when copying data from local to an external drive (ieee1394)
@ 2004-03-07  6:39 Dmitry Torokhov
  2004-03-08  6:50 ` Zwane Mwaikambo
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Torokhov @ 2004-03-07  6:39 UTC (permalink / raw)
  To: Ben Collins; +Cc: linux-kernel

Hi,

I started getting oopses when copying data from local IDE to an external
Firewire drive. Not always, but quite often. The kernel is a bk pull from
the day before 2.6.4-rc2 was released; I do not see any ieee1394 updates
since.

Unfortunately the oops was not saved in the logs, so here is what I managed
to write down:

Oops: 00002 [#1]
PREEMPT
CPU: 0
EIP: 0060 [<c0243d087>] Tainted: P
EFLAGS: 00010047
EIP is at hpsb_packet_sent+0x86/0x90
eax: 00100100 ebx: dfd74000 ecx: dd6edfb0 edx: 00200200
esi: 00000001 edi: dd6cdf60 ebp: c03e3ee0 esp: c03c3edc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0; threadinfo=c03c2000, task=c034a800)
....
Call trace:
[<c025306e>] dma_trm_tasklet+0xae/0x1b0
recalc_task_prio+0xb4/0x1f0
tasklet_action
do_softirq
do_IRQ
common_interrupt
acpi_process_idle
default_idle
rest_init
default_init
rest_init
cpu_idle
start_kernel
unknown_bootparam

Code: ...
Kernel panic: Fatal exception in interrupt
In interrupt handler - not synching


This oops is with the NVIDIA module loaded, but I have seen exactly the
same trace without the module loaded.
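
For reference, the number after "Oops:" is the x86 page-fault error code.
Below is a minimal sketch of decoding it, assuming the standard i386 bit
layout (bit 0 = protection violation, bit 1 = write access, bit 2 = user
mode). The names decode_oops_code, struct oops_code and demo_oops_0002 are
hypothetical helpers for illustration, not kernel API.

```c
#include <assert.h>

/* x86 page-fault error code bits, as shown in the "Oops:" line. */
#define PF_PROT  0x1u  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE 0x2u  /* 0 = read access,      1 = write access */
#define PF_USER  0x4u  /* 0 = kernel mode,      1 = user mode */

/* Decoded view of the error code (hypothetical helper type). */
struct oops_code {
    int protection_fault;
    int write_access;
    int user_mode;
};

/* Split the raw error code into its three low flag bits. */
struct oops_code decode_oops_code(unsigned int code)
{
    struct oops_code d;

    d.protection_fault = (code & PF_PROT) != 0;
    d.write_access = (code & PF_WRITE) != 0;
    d.user_mode = (code & PF_USER) != 0;
    return d;
}

/* Usage: decode the value 2 from the report above. */
int demo_oops_0002(void)
{
    struct oops_code d = decode_oops_code(0x2);

    assert(d.write_access && !d.protection_fault && !d.user_mode);
    return 0;
}
```

Under that assumption, the value 2 reads as a kernel-mode write to a page
that was not present.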

-- 
Dmitry


* Re: OOPS when copying data from local to an external drive (ieee1394)
  2004-03-07  6:39 OOPS when copying data from local to an external drive (ieee1394) Dmitry Torokhov
@ 2004-03-08  6:50 ` Zwane Mwaikambo
  2004-03-09  7:11   ` Dmitry Torokhov
  0 siblings, 1 reply; 5+ messages in thread
From: Zwane Mwaikambo @ 2004-03-08  6:50 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Ben Collins, Linux Kernel

On Sun, 7 Mar 2004, Dmitry Torokhov wrote:

> I started getting oopses when copying data from local IDE to an external
> Firewire drive. Not always, but quite often. The kernel is a bk pull from
> the day before 2.6.4-rc2 was released; I do not see any ieee1394 updates
> since.
>
> Unfortunately the oops was not saved in the logs, so here is what I managed
> to write down:

> Oops: 00002 [#1]
> PREEMPT
> CPU: 0
> EIP: 0060 [<c0243d087>] Tainted: P
> EFLAGS: 00010047
> EIP is at hpsb_packet_sent+0x86/0x90
> eax: 00100100 ebx: dfd74000 ecx: dd6edfb0 edx: 00200200

A spot of linked list corruption.
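
The eax and edx values above (00100100 and 00200200) match LIST_POISON1 and
LIST_POISON2, which list_del() stores in an entry's next/prev pointers after
unlinking it. That is consistent with list_del() running on a packet that
had already been removed from its list. The following is a userspace model
of that behaviour, not the kernel code; list_del_checked,
list_entry_is_poisoned and demo_double_delete are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Poison values written by list_del() in 2.6 kernels. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* Make an empty circular list. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* True if the entry carries the markers left by an earlier delete. */
int list_entry_is_poisoned(const struct list_head *entry)
{
    return entry->next == LIST_POISON1 || entry->prev == LIST_POISON2;
}

/*
 * Unlink entry and poison it. Returns 0 on success, -1 if the entry was
 * already deleted. The kernel's list_del() has no such check: a second
 * call dereferences the poison pointers and faults.
 */
int list_del_checked(struct list_head *entry)
{
    if (list_entry_is_poisoned(entry))
        return -1;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
    return 0;
}

/* Usage: the first delete succeeds, the second is refused (-1). */
int demo_double_delete(void)
{
    struct list_head head, pkt;

    list_init(&head);
    list_add(&pkt, &head);
    assert(list_del_checked(&pkt) == 0);
    return list_del_checked(&pkt);
}
```

In this model the second delete is caught by the check; in the kernel it
would write through the poison pointer instead, which fits a write fault
with those two values in registers.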

> esi: 00000001 edi: dd6cdf60 ebp: c03e3ee0 esp: c03c3edc
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0; threadinfo=c03c2000, task=c034a800)
> ....
> Call trace:
> [<c025306e>] dma_trm_tasklet+0xae/0x1b0

Does this patch help any?

Index: linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 ieee1394_core.c
--- linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c	4 Mar 2004 04:12:44 -0000	1.1.1.1
+++ linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c	8 Mar 2004 06:47:04 -0000
@@ -403,6 +403,8 @@ void hpsb_selfid_complete(struct hpsb_ho
 void hpsb_packet_sent(struct hpsb_host *host, struct hpsb_packet *packet,
                       int ackcode)
 {
+	unsigned long flags;
+
 	packet->ack_code = ackcode;

 	if (packet->no_waiter) {
@@ -413,7 +415,9 @@ void hpsb_packet_sent(struct hpsb_host *

 	if (ackcode != ACK_PENDING || !packet->expect_response) {
 		atomic_dec(&packet->refcnt);
+		spin_lock_irqsave(&host->pending_pkt_lock, flags);
 		list_del(&packet->list);
+		spin_unlock_irqrestore(&host->pending_pkt_lock, flags);
 		packet->state = hpsb_complete;
 		queue_packet_complete(packet);
 		return;


* Re: OOPS when copying data from local to an external drive (ieee1394)
  2004-03-08  6:50 ` Zwane Mwaikambo
@ 2004-03-09  7:11   ` Dmitry Torokhov
  2004-03-09 15:16     ` Zwane Mwaikambo
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Torokhov @ 2004-03-09  7:11 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Ben Collins, Linux Kernel

On Monday 08 March 2004 01:50 am, Zwane Mwaikambo wrote:
> On Sun, 7 Mar 2004, Dmitry Torokhov wrote:
> 
> > I started getting oopses when copying data from local IDE to an external
> > Firewire drive. Not always, but quite often. The kernel is a bk pull from
> > the day before 2.6.4-rc2 was released; I do not see any ieee1394 updates
> > since.
> >
> > Unfortunately the oops was not saved in the logs, so here is what I managed
> > to write down:
> 
> > Oops: 00002 [#1]
> > PREEMPT
> > CPU: 0
> > EIP: 0060 [<c0243d087>] Tainted: P
> > EFLAGS: 00010047
> > EIP is at hpsb_packet_sent+0x86/0x90
> > eax: 00100100 ebx: dfd74000 ecx: dd6edfb0 edx: 00200200
> 
> A spot of linked list corruption.
> 
> > esi: 00000001 edi: dd6cdf60 ebp: c03e3ee0 esp: c03c3edc
> > ds: 007b es: 007b ss: 0068
> > Process swapper (pid: 0; threadinfo=c03c2000, task=c034a800)
> > ....
> > Call trace:
> > [<c025306e>] dma_trm_tasklet+0xae/0x1b0
> 
> Does this patch help any?
> 

Unfortunately I am still getting oopses with exactly the same call trace.
On top of that I am now seeing the following in the logs:

Mar  9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:21 core kernel: Write (10) 00 11 27 de 17 00 00 f8 00
Mar  9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:21 core kernel: Write (10) 00 11 27 df 0f 00 00 f8 00
Mar  9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:21 core kernel: Write (10) 00 11 27 e0 07 00 00 f8 00
Mar  9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:21 core kernel: Write (10) 00 11 27 e0 ff 00 00 f8 00
Mar  9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:21 core kernel: Write (10) 00 11 27 e1 f7 00 00 f8 00
Mar  9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:21 core kernel: Write (10) 00 11 27 e3 e7 00 00 f8 00
Mar  9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:22 core kernel: Write (10) 00 11 27 e4 df 00 00 f8 00
Mar  9 01:41:23 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:23 core kernel: Write (10) 00 11 27 e5 d7 00 00 f8 00
Mar  9 01:41:26 core kernel: ieee1394: sbp2: sbp2util_node_write_no_wait failed
Mar  9 01:41:28 core last message repeated 8 times
Mar  9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:56 core kernel: Write (10) 00 11 2a f8 ff 00 00 f8 00
Mar  9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:56 core kernel: Write (10) 00 11 2a f9 f7 00 00 f8 00
Mar  9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:56 core kernel: Write (10) 00 11 2a fa ef 00 00 f8 00
Mar  9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:56 core kernel: Write (10) 00 11 2a fb e7 00 00 f8 00
Mar  9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:56 core kernel: Write (10) 00 11 2a fc df 00 00 f8 00
Mar  9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar  9 01:41:56 core kernel: Write (10) 00 11 2a fe cf 00 00 f8 00

I did not have these messages before. The kernel was pulled today
from bkbits plus your patch (and some of my patches but they only
affect input drivers).
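
For reading the log lines above: each "Write (10)" line appears to print the
SCSI WRITE(10) command with the opcode byte replaced by its name, followed
by the remaining nine CDB bytes. Below is a minimal sketch of pulling out
the logical block address and transfer length under that assumption;
struct write10, decode_write10_tail and demo_first_two_aborts are
hypothetical names, not part of sbp2 or the SCSI layer.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of interest from a WRITE(10) command (hypothetical helper type). */
struct write10 {
    uint32_t lba;     /* starting logical block address */
    uint16_t blocks;  /* transfer length in blocks */
};

/*
 * tail holds CDB bytes 1..9, i.e. the nine bytes printed after
 * "Write (10)": flags, LBA (4 bytes, big-endian), group number,
 * transfer length (2 bytes, big-endian), control.
 */
struct write10 decode_write10_tail(const uint8_t tail[9])
{
    struct write10 w;

    w.lba = ((uint32_t)tail[1] << 24) | ((uint32_t)tail[2] << 16) |
            ((uint32_t)tail[3] << 8) | (uint32_t)tail[4];
    w.blocks = (uint16_t)(((unsigned int)tail[6] << 8) | tail[7]);
    return w;
}

/* Usage: the first two aborted commands cover adjacent block ranges. */
int demo_first_two_aborts(void)
{
    const uint8_t first[9]  = {0x00, 0x11, 0x27, 0xde, 0x17, 0x00, 0x00, 0xf8, 0x00};
    const uint8_t second[9] = {0x00, 0x11, 0x27, 0xdf, 0x0f, 0x00, 0x00, 0xf8, 0x00};
    struct write10 a = decode_write10_tail(first);
    struct write10 b = decode_write10_tail(second);

    assert(a.lba + a.blocks == b.lba);
    return 0;
}
```

Read this way, the aborted commands are 248-block (0xf8) writes to
consecutive regions of the disk, as expected for one large sequential copy.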

-- 
Dmitry


* Re: OOPS when copying data from local to an external drive (ieee1394)
  2004-03-09  7:11   ` Dmitry Torokhov
@ 2004-03-09 15:16     ` Zwane Mwaikambo
  2004-03-09 15:41       ` Ben Collins
  0 siblings, 1 reply; 5+ messages in thread
From: Zwane Mwaikambo @ 2004-03-09 15:16 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Ben Collins, Linux Kernel

On Tue, 9 Mar 2004, Dmitry Torokhov wrote:

> > Does this patch help any?
> >
>
> Unfortunately I am still getting oopses with exactly the same call trace.
> On top of that I am now seeing the following in the logs:

Thanks for testing it. The messages below look like they may be due to
something else.

> I did not have these messages before. The kernel was pulled today
> from bkbits plus your patch (and some of my patches but they only
> affect input drivers).

Just to reconfirm, could you back out my patch from that and retry?

Thanks,
	Zwane



* Re: OOPS when copying data from local to an external drive (ieee1394)
  2004-03-09 15:16     ` Zwane Mwaikambo
@ 2004-03-09 15:41       ` Ben Collins
  0 siblings, 0 replies; 5+ messages in thread
From: Ben Collins @ 2004-03-09 15:41 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Dmitry Torokhov, Linux Kernel

On Tue, Mar 09, 2004 at 10:16:40AM -0500, Zwane Mwaikambo wrote:
> On Tue, 9 Mar 2004, Dmitry Torokhov wrote:
> 
> > > Does this patch help any?
> > >
> >
> > Unfortunately I am still getting oopses with exactly the same call trace.
> > On top of that I am now seeing the following in the logs:
> 
> Thanks for testing it. The messages below look like they may be due to
> something else.

No, that's exactly from your patch. The locking your patch added seems
to be wrong. I'm looking into the issue already.


-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/

