public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PANIC] ohci1394 & copy large files
  2004-04-04 14:16 [PANIC] ohci1394 & copy large files Marcel Lanz
@ 2004-04-04 14:13 ` Ben Collins
  2004-04-04 23:00   ` Benjamin Herrenschmidt
  2004-04-05  5:03   ` Dmitry Torokhov
  0 siblings, 2 replies; 10+ messages in thread
From: Ben Collins @ 2004-04-04 14:13 UTC (permalink / raw)
  To: Marcel Lanz; +Cc: linux-kernel

On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote:
> Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
> to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.
> 
> Has anyone similar problems ?

Known issue, fixed in our repo. I still need to sync with Linus once I
iron one more issue and merge some more patches.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PANIC] ohci1394 & copy large files
@ 2004-04-04 14:16 Marcel Lanz
  2004-04-04 14:13 ` Ben Collins
  0 siblings, 1 reply; 10+ messages in thread
From: Marcel Lanz @ 2004-04-04 14:16 UTC (permalink / raw)
  To: linux-kernel

Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.

Has anyone similar problems ?

Marcel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-04 14:13 ` Ben Collins
@ 2004-04-04 23:00   ` Benjamin Herrenschmidt
  2004-04-04 23:17     ` Ben Collins
  2004-04-05  5:03   ` Dmitry Torokhov
  1 sibling, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2004-04-04 23:00 UTC (permalink / raw)
  To: Ben Collins; +Cc: Marcel Lanz, Linux Kernel list

On Mon, 2004-04-05 at 00:13, Ben Collins wrote:
> On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote:
> > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
> > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.
> > 
> > Has anyone similar problems ?
> 
> Known issue, fixed in our repo. I still need to sync with Linus once I
> iron one more issue and merge some more patches.

Hi Ben !

I don't want to be too critical or harsh or whatever, but why don't you
just send such fixes right upstream instead of stacking patches for a
while in your repo ? From my experience, such "batching" of patches is
the _wrong_ thing to do, and typically, there is a major useability
issue with sbp2 that could have been "right" in 2.6.5 final and will not
be (so we'll have to wait what ? 1 or 2 monthes more now to have a
release kernel with a reliable sbp2)

Ben.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-04 23:00   ` Benjamin Herrenschmidt
@ 2004-04-04 23:17     ` Ben Collins
  2004-04-04 23:28       ` Randy.Dunlap
  2004-04-05  0:07       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 10+ messages in thread
From: Ben Collins @ 2004-04-04 23:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Marcel Lanz, Linux Kernel list

On Mon, Apr 05, 2004 at 09:00:24AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2004-04-05 at 00:13, Ben Collins wrote:
> > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote:
> > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
> > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.
> > > 
> > > Has anyone similar problems ?
> > 
> > Known issue, fixed in our repo. I still need to sync with Linus once I
> > iron one more issue and merge some more patches.
> 
> Hi Ben !
> 
> I don't want to be too critical or harsh or whatever, but why don't you
> just send such fixes right upstream instead of stacking patches for a
> while in your repo ? From my experience, such "batching" of patches is
> the _wrong_ thing to do, and typically, there is a major useability
> issue with sbp2 that could have been "right" in 2.6.5 final and will not
> be (so we'll have to wait what ? 1 or 2 monthes more now to have a
> release kernel with a reliable sbp2)

Because the fix was pretty extensive and needed testing. It was
potentially more broken that the problem it was fixing. Sending untested
patches to Linus is far worse than batching a few up and pushing to him.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-04 23:17     ` Ben Collins
@ 2004-04-04 23:28       ` Randy.Dunlap
  2004-04-05  0:09         ` Ben Collins
  2004-04-05  0:07       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 10+ messages in thread
From: Randy.Dunlap @ 2004-04-04 23:28 UTC (permalink / raw)
  To: Ben Collins; +Cc: benh, marcel.lanz, linux-kernel

On Sun, 4 Apr 2004 19:17:46 -0400 Ben Collins <bcollins@debian.org> wrote:

| On Mon, Apr 05, 2004 at 09:00:24AM +1000, Benjamin Herrenschmidt wrote:
| > On Mon, 2004-04-05 at 00:13, Ben Collins wrote:
| > > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote:
| > > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
| > > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.
| > > > 
| > > > Has anyone similar problems ?
| > > 
| > > Known issue, fixed in our repo. I still need to sync with Linus once I
| > > iron one more issue and merge some more patches.
| > 
| > Hi Ben !
| > 
| > I don't want to be too critical or harsh or whatever, but why don't you
| > just send such fixes right upstream instead of stacking patches for a
| > while in your repo ? From my experience, such "batching" of patches is
| > the _wrong_ thing to do, and typically, there is a major useability
| > issue with sbp2 that could have been "right" in 2.6.5 final and will not
| > be (so we'll have to wait what ? 1 or 2 monthes more now to have a
| > release kernel with a reliable sbp2)
| 
| Because the fix was pretty extensive and needed testing. It was
| potentially more broken that the problem it was fixing. Sending untested
| patches to Linus is far worse than batching a few up and pushing to him.

Was (is) it already being tested more extensively in the -mm patches
before going to Linus?  Should/could be.  E.g., that's what gregkh does,
and ACPI, etc.

--
~Randy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-04 23:17     ` Ben Collins
  2004-04-04 23:28       ` Randy.Dunlap
@ 2004-04-05  0:07       ` Benjamin Herrenschmidt
  2004-04-05  0:12         ` Ben Collins
  1 sibling, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2004-04-05  0:07 UTC (permalink / raw)
  To: Ben Collins; +Cc: Marcel Lanz, Linux Kernel list

On Mon, 2004-04-05 at 09:17, Ben Collins wrote:

> Because the fix was pretty extensive and needed testing. It was
> potentially more broken that the problem it was fixing. Sending untested
> patches to Linus is far worse than batching a few up and pushing to him.

Ok, makes sense, it wasn't just a 1-liner quick fix then ;)

Still, from my experience, _very few_ people actually test things in
trees like ieee1394, fbdev, etc...  Even my tree isn't what it used
to be for pmacs now that I'm fully in sync upstream.

Even -mm lately haven't been as tested as it used to be (possibly
because of upstream getting better). I find it's quite ok to send
a fix that needs a bit more testing to a Linus -rc1 (but not later),

Ben.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-04 23:28       ` Randy.Dunlap
@ 2004-04-05  0:09         ` Ben Collins
  0 siblings, 0 replies; 10+ messages in thread
From: Ben Collins @ 2004-04-05  0:09 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: benh, marcel.lanz, linux-kernel

On Sun, Apr 04, 2004 at 04:28:18PM -0700, Randy.Dunlap wrote:
> On Sun, 4 Apr 2004 19:17:46 -0400 Ben Collins <bcollins@debian.org> wrote:
> 
> | On Mon, Apr 05, 2004 at 09:00:24AM +1000, Benjamin Herrenschmidt wrote:
> | > On Mon, 2004-04-05 at 00:13, Ben Collins wrote:
> | > > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote:
> | > > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
> | > > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.
> | > > > 
> | > > > Has anyone similar problems ?
> | > > 
> | > > Known issue, fixed in our repo. I still need to sync with Linus once I
> | > > iron one more issue and merge some more patches.
> | > 
> | > Hi Ben !
> | > 
> | > I don't want to be too critical or harsh or whatever, but why don't you
> | > just send such fixes right upstream instead of stacking patches for a
> | > while in your repo ? From my experience, such "batching" of patches is
> | > the _wrong_ thing to do, and typically, there is a major useability
> | > issue with sbp2 that could have been "right" in 2.6.5 final and will not
> | > be (so we'll have to wait what ? 1 or 2 monthes more now to have a
> | > release kernel with a reliable sbp2)
> | 
> | Because the fix was pretty extensive and needed testing. It was
> | potentially more broken that the problem it was fixing. Sending untested
> | patches to Linus is far worse than batching a few up and pushing to him.
> 
> Was (is) it already being tested more extensively in the -mm patches
> before going to Linus?  Should/could be.  E.g., that's what gregkh does,
> and ACPI, etc.

That's what generally happens in our own repo, and if I get the chance
to sync them to my bk tree, then that's what happens with -mm too.
Wasn't the case here since I've been swamped for a little over a week.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-05  0:07       ` Benjamin Herrenschmidt
@ 2004-04-05  0:12         ` Ben Collins
  0 siblings, 0 replies; 10+ messages in thread
From: Ben Collins @ 2004-04-05  0:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Marcel Lanz, Linux Kernel list

On Mon, Apr 05, 2004 at 10:07:56AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2004-04-05 at 09:17, Ben Collins wrote:
> 
> > Because the fix was pretty extensive and needed testing. It was
> > potentially more broken that the problem it was fixing. Sending untested
> > patches to Linus is far worse than batching a few up and pushing to him.
> 
> Ok, makes sense, it wasn't just a 1-liner quick fix then ;)
> 
> Still, from my experience, _very few_ people actually test things in
> trees like ieee1394, fbdev, etc...  Even my tree isn't what it used
> to be for pmacs now that I'm fully in sync upstream.
> 
> Even -mm lately haven't been as tested as it used to be (possibly
> because of upstream getting better). I find it's quite ok to send
> a fix that needs a bit more testing to a Linus -rc1 (but not later),

People that experience problems that I have a fix for in the repo,
usually get pointed to that repo for testing.

In this case I did the fix after -rc2, wasn't comfortable enough to send
it for -rc3 and I think I may wait for 2.6.6-rc1 since there's a harder
to reproduce bug now that I have to fix (I haven't seen it, but a few
people that tested our repo have reported it).

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-04 14:13 ` Ben Collins
  2004-04-04 23:00   ` Benjamin Herrenschmidt
@ 2004-04-05  5:03   ` Dmitry Torokhov
  2004-04-05 13:44     ` Ben Collins
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Torokhov @ 2004-04-05  5:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ben Collins, Marcel Lanz

On Sunday 04 April 2004 09:13 am, Ben Collins wrote:
> On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote:
> > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
> > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.
> > 
> > Has anyone similar problems ?
> 
> Known issue, fixed in our repo. I still need to sync with Linus once I
> iron one more issue and merge some more patches.
> 

I have some concerns that it is completely fixed in your tree - there is still
a race - if hpsb_packet_received arrives before hosb_packet_sent then there is
a chance that the code will try to put the same packet in the completion queue
twice. With SVN tree it will cause kernel BUG in skb code, in BK tree kernel
will just oops.

I wonder what was the reason to convert the code to abuse skbs aside for using
skbs queues and their locking?

Anyway, below is a backport of my patch from SVN to BK tree, I would like to
know if it works for others...

-- 
Dmitry


===================================================================


ChangeSet@1.1784, 2004-04-04 23:52:29-05:00, dtor_core@ameritech.net
  IEEE1394: Make sure that a packet submitted into completion
            queue just once; race between hpsb_packet_sent and
            hpsb_packet_received


 ieee1394_core.c |   26 ++++++++++++++++++++++----
 ieee1394_core.h |    1 +
 2 files changed, 23 insertions(+), 4 deletions(-)


===================================================================



diff -Nru a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c
--- a/drivers/ieee1394/ieee1394_core.c	Sun Apr  4 23:53:45 2004
+++ b/drivers/ieee1394/ieee1394_core.c	Sun Apr  4 23:53:45 2004
@@ -96,7 +96,7 @@
 	WARN_ON(packet->complete_routine != NULL);
 	packet->complete_routine = routine;
 	packet->complete_data = data;
-	return;
+	atomic_set(&packet->completion_armed, 1);
 }
 
 /**
@@ -412,8 +412,26 @@
 	}
 
 	if (ackcode != ACK_PENDING || !packet->expect_response) {
+		struct hpsb_packet *p;
+		struct list_head *lh;
+		unsigned long flags;
+
 		atomic_dec(&packet->refcnt);
-		list_del(&packet->list);
+
+		/*
+		 * Remove packet from host's pending queue
+		 * (and not from any other queue)
+		 */
+		spin_lock_irqsave(&host->pending_pkt_lock, flags);
+	        list_for_each(lh, &host->pending_packets) {
+                	p = list_entry(lh, struct hpsb_packet, list);
+                	if (p == packet) {
+				list_del(&packet->list);
+				break;
+                	}
+        	}
+		spin_unlock_irqrestore(&host->pending_pkt_lock, flags);
+
 		packet->state = hpsb_complete;
 		queue_packet_complete(packet);
 		return;
@@ -1003,7 +1021,8 @@
 
 static void queue_packet_complete(struct hpsb_packet *packet)
 {
-	if (packet->complete_routine != NULL) {
+	if (packet->complete_routine != NULL &&
+	    atomic_dec_and_test(&packet->completion_armed)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&khpsbpkt_lock, flags);
@@ -1013,7 +1032,6 @@
 		/* Signal the kernel thread to handle this */
 		up(&khpsbpkt_sig);
 	}
-	return;
 }
 
 static int hpsbpkt_thread(void *__hi)
diff -Nru a/drivers/ieee1394/ieee1394_core.h b/drivers/ieee1394/ieee1394_core.h
--- a/drivers/ieee1394/ieee1394_core.h	Sun Apr  4 23:53:45 2004
+++ b/drivers/ieee1394/ieee1394_core.h	Sun Apr  4 23:53:45 2004
@@ -66,6 +66,7 @@
 	 * packet is completed.  */
 	void (*complete_routine)(void *);
 	void *complete_data;
+	atomic_t completion_armed;
 
         /* Store jiffies for implementing bus timeouts. */
         unsigned long sendtime;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PANIC] ohci1394 & copy large files
  2004-04-05  5:03   ` Dmitry Torokhov
@ 2004-04-05 13:44     ` Ben Collins
  0 siblings, 0 replies; 10+ messages in thread
From: Ben Collins @ 2004-04-05 13:44 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: linux-kernel, Marcel Lanz

On Mon, Apr 05, 2004 at 12:03:10AM -0500, Dmitry Torokhov wrote:
> On Sunday 04 April 2004 09:13 am, Ben Collins wrote:
> > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote:
> > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try
> > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2.
> > > 
> > > Has anyone similar problems ?
> > 
> > Known issue, fixed in our repo. I still need to sync with Linus once I
> > iron one more issue and merge some more patches.
> > 
> 
> I have some concerns that it is completely fixed in your tree - there is still
> a race - if hpsb_packet_received arrives before hosb_packet_sent then there is
> a chance that the code will try to put the same packet in the completion queue
> twice. With SVN tree it will cause kernel BUG in skb code, in BK tree kernel
> will just oops.
> 
> I wonder what was the reason to convert the code to abuse skbs aside for using
> skbs queues and their locking?
> 
> Anyway, below is a backport of my patch from SVN to BK tree, I would like to
> know if it works for others...

That's the other problem I was talking about. I'm reviewing your patch
today.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2004-04-05 13:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-04 14:16 [PANIC] ohci1394 & copy large files Marcel Lanz
2004-04-04 14:13 ` Ben Collins
2004-04-04 23:00   ` Benjamin Herrenschmidt
2004-04-04 23:17     ` Ben Collins
2004-04-04 23:28       ` Randy.Dunlap
2004-04-05  0:09         ` Ben Collins
2004-04-05  0:07       ` Benjamin Herrenschmidt
2004-04-05  0:12         ` Ben Collins
2004-04-05  5:03   ` Dmitry Torokhov
2004-04-05 13:44     ` Ben Collins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox