* [PANIC] ohci1394 & copy large files @ 2004-04-04 14:16 Marcel Lanz 2004-04-04 14:13 ` Ben Collins 0 siblings, 1 reply; 10+ messages in thread From: Marcel Lanz @ 2004-04-04 14:16 UTC (permalink / raw) To: linux-kernel Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. Has anyone similar problems ? Marcel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-04 14:16 [PANIC] ohci1394 & copy large files Marcel Lanz @ 2004-04-04 14:13 ` Ben Collins 2004-04-04 23:00 ` Benjamin Herrenschmidt 2004-04-05 5:03 ` Dmitry Torokhov 0 siblings, 2 replies; 10+ messages in thread From: Ben Collins @ 2004-04-04 14:13 UTC (permalink / raw) To: Marcel Lanz; +Cc: linux-kernel On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote: > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. > > Has anyone similar problems ? Known issue, fixed in our repo. I still need to sync with Linus once I iron one more issue and merge some more patches. -- Debian - http://www.debian.org/ Linux 1394 - http://www.linux1394.org/ Subversion - http://subversion.tigris.org/ WatchGuard - http://www.watchguard.com/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-04 14:13 ` Ben Collins @ 2004-04-04 23:00 ` Benjamin Herrenschmidt 2004-04-04 23:17 ` Ben Collins 2004-04-05 5:03 ` Dmitry Torokhov 1 sibling, 1 reply; 10+ messages in thread From: Benjamin Herrenschmidt @ 2004-04-04 23:00 UTC (permalink / raw) To: Ben Collins; +Cc: Marcel Lanz, Linux Kernel list On Mon, 2004-04-05 at 00:13, Ben Collins wrote: > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote: > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. > > > > Has anyone similar problems ? > > Known issue, fixed in our repo. I still need to sync with Linus once I > iron one more issue and merge some more patches. Hi Ben ! I don't want to be too critical or harsh or whatever, but why don't you just send such fixes right upstream instead of stacking patches for a while in your repo ? From my experience, such "batching" of patches is the _wrong_ thing to do, and typically, there is a major useability issue with sbp2 that could have been "right" in 2.6.5 final and will not be (so we'll have to wait what ? 1 or 2 monthes more now to have a release kernel with a reliable sbp2) Ben. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-04 23:00 ` Benjamin Herrenschmidt @ 2004-04-04 23:17 ` Ben Collins 2004-04-04 23:28 ` Randy.Dunlap 2004-04-05 0:07 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 10+ messages in thread From: Ben Collins @ 2004-04-04 23:17 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Marcel Lanz, Linux Kernel list On Mon, Apr 05, 2004 at 09:00:24AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2004-04-05 at 00:13, Ben Collins wrote: > > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote: > > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try > > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. > > > > > > Has anyone similar problems ? > > > > Known issue, fixed in our repo. I still need to sync with Linus once I > > iron one more issue and merge some more patches. > > Hi Ben ! > > I don't want to be too critical or harsh or whatever, but why don't you > just send such fixes right upstream instead of stacking patches for a > while in your repo ? From my experience, such "batching" of patches is > the _wrong_ thing to do, and typically, there is a major useability > issue with sbp2 that could have been "right" in 2.6.5 final and will not > be (so we'll have to wait what ? 1 or 2 monthes more now to have a > release kernel with a reliable sbp2) Because the fix was pretty extensive and needed testing. It was potentially more broken that the problem it was fixing. Sending untested patches to Linus is far worse than batching a few up and pushing to him. -- Debian - http://www.debian.org/ Linux 1394 - http://www.linux1394.org/ Subversion - http://subversion.tigris.org/ WatchGuard - http://www.watchguard.com/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-04 23:17 ` Ben Collins @ 2004-04-04 23:28 ` Randy.Dunlap 2004-04-05 0:09 ` Ben Collins 2004-04-05 0:07 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 10+ messages in thread From: Randy.Dunlap @ 2004-04-04 23:28 UTC (permalink / raw) To: Ben Collins; +Cc: benh, marcel.lanz, linux-kernel On Sun, 4 Apr 2004 19:17:46 -0400 Ben Collins <bcollins@debian.org> wrote: | On Mon, Apr 05, 2004 at 09:00:24AM +1000, Benjamin Herrenschmidt wrote: | > On Mon, 2004-04-05 at 00:13, Ben Collins wrote: | > > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote: | > > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try | > > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. | > > > | > > > Has anyone similar problems ? | > > | > > Known issue, fixed in our repo. I still need to sync with Linus once I | > > iron one more issue and merge some more patches. | > | > Hi Ben ! | > | > I don't want to be too critical or harsh or whatever, but why don't you | > just send such fixes right upstream instead of stacking patches for a | > while in your repo ? From my experience, such "batching" of patches is | > the _wrong_ thing to do, and typically, there is a major useability | > issue with sbp2 that could have been "right" in 2.6.5 final and will not | > be (so we'll have to wait what ? 1 or 2 monthes more now to have a | > release kernel with a reliable sbp2) | | Because the fix was pretty extensive and needed testing. It was | potentially more broken that the problem it was fixing. Sending untested | patches to Linus is far worse than batching a few up and pushing to him. Was (is) it already being tested more extensively in the -mm patches before going to Linus? Should/could be. E.g., that's what gregkh does, and ACPI, etc. -- ~Randy ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-04 23:28 ` Randy.Dunlap @ 2004-04-05 0:09 ` Ben Collins 0 siblings, 0 replies; 10+ messages in thread From: Ben Collins @ 2004-04-05 0:09 UTC (permalink / raw) To: Randy.Dunlap; +Cc: benh, marcel.lanz, linux-kernel On Sun, Apr 04, 2004 at 04:28:18PM -0700, Randy.Dunlap wrote: > On Sun, 4 Apr 2004 19:17:46 -0400 Ben Collins <bcollins@debian.org> wrote: > > | On Mon, Apr 05, 2004 at 09:00:24AM +1000, Benjamin Herrenschmidt wrote: > | > On Mon, 2004-04-05 at 00:13, Ben Collins wrote: > | > > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote: > | > > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try > | > > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. > | > > > > | > > > Has anyone similar problems ? > | > > > | > > Known issue, fixed in our repo. I still need to sync with Linus once I > | > > iron one more issue and merge some more patches. > | > > | > Hi Ben ! > | > > | > I don't want to be too critical or harsh or whatever, but why don't you > | > just send such fixes right upstream instead of stacking patches for a > | > while in your repo ? From my experience, such "batching" of patches is > | > the _wrong_ thing to do, and typically, there is a major useability > | > issue with sbp2 that could have been "right" in 2.6.5 final and will not > | > be (so we'll have to wait what ? 1 or 2 monthes more now to have a > | > release kernel with a reliable sbp2) > | > | Because the fix was pretty extensive and needed testing. It was > | potentially more broken that the problem it was fixing. Sending untested > | patches to Linus is far worse than batching a few up and pushing to him. > > Was (is) it already being tested more extensively in the -mm patches > before going to Linus? Should/could be. E.g., that's what gregkh does, > and ACPI, etc. That's what generally happens in our own repo, and if I get the chance to sync them to my bk tree, then that's what happens with -mm too. Wasn't the case here since I've been swamped for a little over a week. -- Debian - http://www.debian.org/ Linux 1394 - http://www.linux1394.org/ Subversion - http://subversion.tigris.org/ WatchGuard - http://www.watchguard.com/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-04 23:17 ` Ben Collins 2004-04-04 23:28 ` Randy.Dunlap @ 2004-04-05 0:07 ` Benjamin Herrenschmidt 2004-04-05 0:12 ` Ben Collins 1 sibling, 1 reply; 10+ messages in thread From: Benjamin Herrenschmidt @ 2004-04-05 0:07 UTC (permalink / raw) To: Ben Collins; +Cc: Marcel Lanz, Linux Kernel list On Mon, 2004-04-05 at 09:17, Ben Collins wrote: > Because the fix was pretty extensive and needed testing. It was > potentially more broken that the problem it was fixing. Sending untested > patches to Linus is far worse than batching a few up and pushing to him. Ok, makes sense, it wasn't just a 1-liner quick fix then ;) Still, from my experience, _very few_ people actually test things in trees like ieee1394, fbdev, etc... Even my tree isn't what it used to be for pmacs now that I'm fully in sync upstream. Even -mm lately haven't been as tested as it used to be (possibly because of upstream getting better). I find it's quite ok to send a fix that needs a bit more testing to a Linus -rc1 (but not later), Ben. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-05 0:07 ` Benjamin Herrenschmidt @ 2004-04-05 0:12 ` Ben Collins 0 siblings, 0 replies; 10+ messages in thread From: Ben Collins @ 2004-04-05 0:12 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Marcel Lanz, Linux Kernel list On Mon, Apr 05, 2004 at 10:07:56AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2004-04-05 at 09:17, Ben Collins wrote: > > > Because the fix was pretty extensive and needed testing. It was > > potentially more broken that the problem it was fixing. Sending untested > > patches to Linus is far worse than batching a few up and pushing to him. > > Ok, makes sense, it wasn't just a 1-liner quick fix then ;) > > Still, from my experience, _very few_ people actually test things in > trees like ieee1394, fbdev, etc... Even my tree isn't what it used > to be for pmacs now that I'm fully in sync upstream. > > Even -mm lately haven't been as tested as it used to be (possibly > because of upstream getting better). I find it's quite ok to send > a fix that needs a bit more testing to a Linus -rc1 (but not later), People that experience problems that I have a fix for in the repo, usually get pointed to that repo for testing. In this case I did the fix after -rc2, wasn't comfortable enough to send it for -rc3 and I think I may wait for 2.6.6-rc1 since there's a harder to reproduce bug now that I have to fix (I haven't seen it, but a few people that tested our repo have reported it). -- Debian - http://www.debian.org/ Linux 1394 - http://www.linux1394.org/ Subversion - http://subversion.tigris.org/ WatchGuard - http://www.watchguard.com/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-04 14:13 ` Ben Collins 2004-04-04 23:00 ` Benjamin Herrenschmidt @ 2004-04-05 5:03 ` Dmitry Torokhov 2004-04-05 13:44 ` Ben Collins 1 sibling, 1 reply; 10+ messages in thread From: Dmitry Torokhov @ 2004-04-05 5:03 UTC (permalink / raw) To: linux-kernel; +Cc: Ben Collins, Marcel Lanz On Sunday 04 April 2004 09:13 am, Ben Collins wrote: > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote: > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. > > > > Has anyone similar problems ? > > Known issue, fixed in our repo. I still need to sync with Linus once I > iron one more issue and merge some more patches. > I have some concerns that it is completely fixed in your tree - there is still a race - if hpsb_packet_received arrives before hosb_packet_sent then there is a chance that the code will try to put the same packet in the completion queue twice. With SVN tree it will cause kernel BUG in skb code, in BK tree kernel will just oops. I wonder what was the reason to convert the code to abuse skbs aside for using skbs queues and their locking? Anyway, below is a backport of my patch from SVN to BK tree, I would like to know if it works for others... -- Dmitry =================================================================== ChangeSet@1.1784, 2004-04-04 23:52:29-05:00, dtor_core@ameritech.net IEEE1394: Make sure that a packet submitted into completion queue just once; race between hpsb_packet_sent and hpsb_packet_received ieee1394_core.c | 26 ++++++++++++++++++++++---- ieee1394_core.h | 1 + 2 files changed, 23 insertions(+), 4 deletions(-) =================================================================== diff -Nru a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c --- a/drivers/ieee1394/ieee1394_core.c Sun Apr 4 23:53:45 2004 +++ b/drivers/ieee1394/ieee1394_core.c Sun Apr 4 23:53:45 2004 @@ -96,7 +96,7 @@ WARN_ON(packet->complete_routine != NULL); packet->complete_routine = routine; packet->complete_data = data; - return; + atomic_set(&packet->completion_armed, 1); } /** @@ -412,8 +412,26 @@ } if (ackcode != ACK_PENDING || !packet->expect_response) { + struct hpsb_packet *p; + struct list_head *lh; + unsigned long flags; + atomic_dec(&packet->refcnt); - list_del(&packet->list); + + /* + * Remove packet from host's pending queue + * (and not from any other queue) + */ + spin_lock_irqsave(&host->pending_pkt_lock, flags); + list_for_each(lh, &host->pending_packets) { + p = list_entry(lh, struct hpsb_packet, list); + if (p == packet) { + list_del(&packet->list); + break; + } + } + spin_unlock_irqrestore(&host->pending_pkt_lock, flags); + packet->state = hpsb_complete; queue_packet_complete(packet); return; @@ -1003,7 +1021,8 @@ static void queue_packet_complete(struct hpsb_packet *packet) { - if (packet->complete_routine != NULL) { + if (packet->complete_routine != NULL && + atomic_dec_and_test(&packet->completion_armed)) { unsigned long flags; spin_lock_irqsave(&khpsbpkt_lock, flags); @@ -1013,7 +1032,6 @@ /* Signal the kernel thread to handle this */ up(&khpsbpkt_sig); } - return; } static int hpsbpkt_thread(void *__hi) diff -Nru a/drivers/ieee1394/ieee1394_core.h b/drivers/ieee1394/ieee1394_core.h --- a/drivers/ieee1394/ieee1394_core.h Sun Apr 4 23:53:45 2004 +++ b/drivers/ieee1394/ieee1394_core.h Sun Apr 4 23:53:45 2004 @@ -66,6 +66,7 @@ * packet is completed. */ void (*complete_routine)(void *); void *complete_data; + atomic_t completion_armed; /* Store jiffies for implementing bus timeouts. */ unsigned long sendtime; ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PANIC] ohci1394 & copy large files 2004-04-05 5:03 ` Dmitry Torokhov @ 2004-04-05 13:44 ` Ben Collins 0 siblings, 0 replies; 10+ messages in thread From: Ben Collins @ 2004-04-05 13:44 UTC (permalink / raw) To: Dmitry Torokhov; +Cc: linux-kernel, Marcel Lanz On Mon, Apr 05, 2004 at 12:03:10AM -0500, Dmitry Torokhov wrote: > On Sunday 04 April 2004 09:13 am, Ben Collins wrote: > > On Sun, Apr 04, 2004 at 04:16:00PM +0200, Marcel Lanz wrote: > > > Since 2.6.4 and still in 2.6.5 I get regurarly a Kernel panic if I try > > > to backup large files (10-35GB) to an external attached disc (200GB/JFS) via ieee1394/sbp2. > > > > > > Has anyone similar problems ? > > > > Known issue, fixed in our repo. I still need to sync with Linus once I > > iron one more issue and merge some more patches. > > > > I have some concerns that it is completely fixed in your tree - there is still > a race - if hpsb_packet_received arrives before hosb_packet_sent then there is > a chance that the code will try to put the same packet in the completion queue > twice. With SVN tree it will cause kernel BUG in skb code, in BK tree kernel > will just oops. > > I wonder what was the reason to convert the code to abuse skbs aside for using > skbs queues and their locking? > > Anyway, below is a backport of my patch from SVN to BK tree, I would like to > know if it works for others... That's the other problem I was talking about. I'm reviewing your patch today. -- Debian - http://www.debian.org/ Linux 1394 - http://www.linux1394.org/ Subversion - http://subversion.tigris.org/ WatchGuard - http://www.watchguard.com/ ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2004-04-05 13:57 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-04-04 14:16 [PANIC] ohci1394 & copy large files Marcel Lanz 2004-04-04 14:13 ` Ben Collins 2004-04-04 23:00 ` Benjamin Herrenschmidt 2004-04-04 23:17 ` Ben Collins 2004-04-04 23:28 ` Randy.Dunlap 2004-04-05 0:09 ` Ben Collins 2004-04-05 0:07 ` Benjamin Herrenschmidt 2004-04-05 0:12 ` Ben Collins 2004-04-05 5:03 ` Dmitry Torokhov 2004-04-05 13:44 ` Ben Collins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox