* Crash in ide_do_request() on card removal
@ 2005-07-29 12:01 Steven Scholz
2005-08-02 9:57 ` Steven Scholz
0 siblings, 1 reply; 23+ messages in thread
From: Steven Scholz @ 2005-07-29 12:01 UTC (permalink / raw)
To: linux-ide
Hi there,
when I unexpectedly remove a CF ATA card (without unmounting it first), I sometimes
get kernel crashes in ide_do_request() (linux-2.6.13-rc4 on ARM):
cardmgr[194]: shutting down socket 0
cardmgr[194]: executing: './ide stop hda'
cardmgr[194]: + umount -v /dev/hda1
Assertion '(hwgroup->drive)' failed in drivers/ide/ide-io.c:ide_do_request(1130)
Assertion '(drive)' failed in drivers/ide/ide-io.c:choose_drive(1035)
Unable to handle kernel NULL pointer dereference at virtual address 00000010
pgd = c0e34000
[00000010] *pgd=20eb0031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1]
Modules linked in: ide_cs pcmcia at91_cf pcmcia_core
CPU: 0
PC is at ide_do_request+0x100/0x480
LR is at 0x1
pc : [<c00f9980>] lr : [<00000001>] Not tainted
...
As the assertions show, "drive" is NULL (due to the card removal?), and thus the
kernel crashes ...
Upon card removal, the pcmcia cardmgr tries to unmount the drive that has disappeared.
("Sometimes" above means that the rest of the time the kernel does not crash,
but the umount process hangs forever.)
Is this a kernel bug?
--
Steven
* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 9:57 UTC
To: linux-ide, linux-kernel

Steven Scholz wrote:
> when I unexpectedly remove a CF ATA card (without unmounting it first), I
> sometimes get kernel crashes in ide_do_request() (linux-2.6.13-rc4 on ARM):
[...]
> ("Sometimes" above means that the rest of the time the kernel does not
> crash, but the umount process hangs forever.)

(I think) I found the reason for this behaviour:

Upon card removal the functions

~ # cardctl eject
ide_release(398)
ide_unregister(585): index=0
blk_unregister_queue(3603)
elv_unregister_queue(549)
ide_unregister(698)
ide_detach(164)

are called. Thus the request queue for the drive is discarded, which is fair
enough. But disk->queue would still point to a (now invalid) request_queue_t
structure. Thus if I/O requests (e.g. "umount") are started _after_ the drive
was removed, bad things can happen! So I think we should explicitly remove the
reference to that queue by doing

void blk_unregister_queue(struct gendisk *disk)
{
	request_queue_t *q = disk->queue;

	if (q && q->request_fn) {
		elv_unregister_queue(q);
		kobject_unregister(&q->kobj);
+		disk->queue = NULL;
		kobject_put(&disk->kobj);
	}
}

in drivers/block/ll_rw_blk.c

Then instead of a crash or hang one would get

~ # umount /mnt/pcmcia/
...
generic_shutdown_super(249) calling sop->put_super @ c00ac734
fat_clusters_flush(49)
generic_make_request: Trying to access nonexistent block-device hda1 (1)
FAT: bread failed in fat_clusters_flush

Thanks a million.

--
Steven
* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 10:48 UTC
To: Steven Scholz; +Cc: linux-ide, linux-kernel

On Tue, Aug 02 2005, Steven Scholz wrote:
> are called. Thus the request queue for the drive is discarded, which is fair
> enough. But disk->queue would still point to a (now invalid) request_queue_t
> structure. Thus if I/O requests (e.g. "umount") are started _after_ the drive
> was removed, bad things can happen! So I think we should explicitly remove the
> reference to that queue by doing
>
> void blk_unregister_queue(struct gendisk *disk)
> {
> 	request_queue_t *q = disk->queue;
>
> 	if (q && q->request_fn) {
> 		elv_unregister_queue(q);
> 		kobject_unregister(&q->kobj);
> +		disk->queue = NULL;
> 		kobject_put(&disk->kobj);
> 	}
> }

That's not quite true: q is not invalid after this call. It only becomes
invalid when it is freed, which doesn't happen here but rather in the
blk_cleanup_queue() call, once the reference count drops to 0.

This is still not perfect, but a lot better. Does it work for you?

--- linux-2.6.12/drivers/ide/ide-disk.c~	2005-08-02 12:48:16.000000000 +0200
+++ linux-2.6.12/drivers/ide/ide-disk.c	2005-08-02 12:48:32.000000000 +0200
@@ -1054,6 +1054,7 @@
 	drive->driver_data = NULL;
 	drive->devfs_name[0] = '\0';
 	g->private_data = NULL;
+	g->disk = NULL;
 	put_disk(g);
 	kfree(idkp);
 }

--
Jens Axboe
* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 11:10 UTC
To: Jens Axboe; +Cc: linux-ide

Jens Axboe wrote:
> This is still not perfect, but a lot better. Does it work for you?
>
> --- linux-2.6.12/drivers/ide/ide-disk.c~	2005-08-02 12:48:16.000000000 +0200
> +++ linux-2.6.12/drivers/ide/ide-disk.c	2005-08-02 12:48:32.000000000 +0200
> @@ -1054,6 +1054,7 @@
>  	drive->driver_data = NULL;
>  	drive->devfs_name[0] = '\0';
>  	g->private_data = NULL;
> +	g->disk = NULL;
>  	put_disk(g);
>  	kfree(idkp);
>  }

No.

drivers/ide/ide-disk.c: In function `ide_disk_release':
drivers/ide/ide-disk.c:1057: error: structure has no member named `disk'

--
Steven

--
Steven Scholz

imc Measurement & Control          imc Meßsysteme GmbH
Voltastr. 5                        Voltastr. 5
13355 Berlin                       13355 Berlin
Germany                            Deutschland
* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 11:13 UTC
To: Steven Scholz; +Cc: linux-ide

On Tue, Aug 02 2005, Steven Scholz wrote:
> No.
> drivers/ide/ide-disk.c: In function `ide_disk_release':
> drivers/ide/ide-disk.c:1057: error: structure has no member named `disk'

Eh, typo, should be g->queue of course :-)

--- linux-2.6.12/drivers/ide/ide-disk.c~	2005-08-02 12:48:16.000000000 +0200
+++ linux-2.6.12/drivers/ide/ide-disk.c	2005-08-02 13:12:54.000000000 +0200
@@ -1054,6 +1054,7 @@
 	drive->driver_data = NULL;
 	drive->devfs_name[0] = '\0';
 	g->private_data = NULL;
+	g->queue = NULL;
 	put_disk(g);
 	kfree(idkp);
 }

--
Jens Axboe
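Both Steven's `disk->queue = NULL` change and the `g->queue = NULL` patch rest on the same idea: once the disk's queue pointer is cleared, the NULL check in `generic_make_request()` turns a late submission into an error message instead of a call into a driver whose drive is gone. The user-space model below is only a sketch of that idea. `struct toy_queue`, `struct toy_disk`, `toy_unregister_queue()` and `toy_submit()` are hypothetical stand-ins, not kernel APIs, and the model assumes every submission looks the queue up through the disk's pointer, which is exactly the assumption the next message calls into question.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for request_queue_t. */
struct toy_queue {
	int registered;            /* models the sysfs/elevator registration */
};

/* Hypothetical stand-in for struct gendisk. */
struct toy_disk {
	const char *name;
	struct toy_queue *queue;   /* models gendisk->queue */
};

/* Models the proposed change: after tearing down the registration,
 * also clear the disk's pointer to the queue. */
static void toy_unregister_queue(struct toy_disk *disk)
{
	struct toy_queue *q;

	assert(disk != NULL);
	q = disk->queue;
	if (q && q->registered) {
		q->registered = 0;
		disk->queue = NULL;    /* the added line */
	}
}

/* Models the NULL-queue check at the top of generic_make_request():
 * with no queue, report and refuse the I/O instead of calling a driver. */
static int toy_submit(const struct toy_disk *disk, unsigned long long sector)
{
	assert(disk != NULL);
	if (!disk->queue) {
		fprintf(stderr, "toy_submit: Trying to access nonexistent "
			"block-device %s (%llu)\n", disk->name, sector);
		return -ENODEV;
	}
	/* A real implementation would hand the I/O to the driver here. */
	return 0;
}
```

The model only shows why a cleared pointer gives a clean error; it says nothing about whether the queue memory itself is still valid, which is the separate lifetime question Jens raised.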
* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 11:17 UTC
To: Jens Axboe; +Cc: linux-ide

Jens Axboe wrote:
> Eh, typo, should be g->queue of course :-)

No. That does not work:

~ # umount /mnt/pcmcia/
generic_make_request(2859) q=c02d3040
__generic_unplug_device(1447) calling q->request_fn() @ c00f97ec

do_ide_request(1281) HWIF=c01dee8c (0), HWGROUP=c089cea0 (1038681856), drive=c01def1c (0, 0), queue=c02d3040 (00000000)
do_ide_request(1287) HWIF is not present anymore!!!
do_ide_request(1291) DRIVE is not present anymore. SKIPPING REQUEST!!!

As you can see, generic_make_request() still has the pointer to that queue!
It gets it with

	q = bdev_get_queue(bio->bi_bdev);

So the pointer is still stored somewhere else...

--
Steven
* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 11:28 UTC
To: Steven Scholz; +Cc: linux-ide

On Tue, Aug 02 2005, Steven Scholz wrote:
> As you can see, generic_make_request() still has the pointer to that queue!
> It gets it with
>
> 	q = bdev_get_queue(bio->bi_bdev);
>
> So the pointer is still stored somewhere else...

Hmm, perhaps just letting ide end requests where the drive has been
removed might be better. The disconnection between the queue cleanup and
the gendisk cleanup makes it harder to do it properly. SCSI deals with it
the same way, basically.

--
Jens Axboe
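The disconnection Jens mentions follows from his earlier point: unregistering a queue does not free it, because the queue is only released by blk_cleanup_queue() when the reference count drops to 0. A minimal user-space sketch of that reference-counting rule follows. The names (`struct rc_queue`, `rc_queue_alloc()`, `rc_queue_get()`, `rc_queue_put()`, `rc_cleanup_queue()`) are hypothetical, and the plain `int` counter ignores the atomicity and locking a real kernel counter needs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted queue; not the kernel's request_queue_t. */
struct rc_queue {
	int refcount;
};

/* Allocate with one reference, owned by the driver that created it. */
static struct rc_queue *rc_queue_alloc(void)
{
	struct rc_queue *q = calloc(1, sizeof(*q));

	if (q)
		q->refcount = 1;
	return q;
}

/* Any other user (e.g. something still holding the disk open) takes a reference. */
static void rc_queue_get(struct rc_queue *q)
{
	assert(q->refcount > 0);
	q->refcount++;
}

/* Drop a reference and free only when the last one goes away.
 * Returns 1 if this call freed the queue, 0 otherwise. */
static int rc_queue_put(struct rc_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0) {
		free(q);
		return 1;
	}
	return 0;
}

/* Driver teardown in the spirit of blk_cleanup_queue(): it drops only the
 * driver's own reference, so the queue survives while others hold one. */
static int rc_cleanup_queue(struct rc_queue *q)
{
	return rc_queue_put(q);
}
```

With a second holder present, `rc_cleanup_queue()` returns 0 and the memory stays valid; only the holder's final `rc_queue_put()` frees it. That is why a pointer obtained elsewhere (as with `bdev_get_queue()` above) can still reach a queue whose driver has already gone.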
* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 11:30 UTC
To: Jens Axboe; +Cc: linux-ide

Jens Axboe wrote:
> Hmm, perhaps just letting ide end requests where the drive has been
> removed might be better.

I don't understand what you mean.

If requests are issued (e.g. by calling umount) after the drive is gone,
then I get either a kernel crash, or umount hangs because it waits in
__wait_on_buffer() ...

--
Steven
* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 11:33 UTC
To: Steven Scholz; +Cc: linux-ide

On Tue, Aug 02 2005, Steven Scholz wrote:
> Jens Axboe wrote:
> > Hmm, perhaps just letting ide end requests where the drive has been
> > removed might be better.
>
> I don't understand what you mean.
>
> If requests are issued (e.g. by calling umount) after the drive is gone,
> then I get either a kernel crash, or umount hangs because it waits in
> __wait_on_buffer() ...

No, those waiters will be woken up when ide does an end_request for
requests coming in for a device which no longer exists.

--
Jens Axboe
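What Jens describes can be modeled as a request function that checks whether the device is still present and, if not, completes each incoming request with an error instead of touching hardware state. Completing the request is what releases a waiter such as the one sleeping in `__wait_on_buffer()`. The user-space sketch below is hypothetical throughout: `struct toy_request`, `struct toy_drive`, `toy_end_request()` and `toy_request_fn()` are invented for illustration, the `done` flag stands in for the real wake-up, and `-EIO` is simply the error chosen for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request: a waiter polls 'done' and then reads 'error'. */
struct toy_request {
	int done;
	int error;     /* 0 on success, negative errno on failure */
};

/* Hypothetical per-drive state; 'present' is cleared on removal. */
struct toy_drive {
	int present;
};

/* Complete one request and record its status, releasing its waiter. */
static void toy_end_request(struct toy_request *rq, int error)
{
	assert(rq != NULL && !rq->done);
	rq->error = error;
	rq->done = 1;
}

/* Process queued requests. For a drive that has gone away, fail each
 * request right here instead of dereferencing stale hwif/hwgroup state. */
static void toy_request_fn(const struct toy_drive *drive,
			   struct toy_request **queue, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!drive->present) {
			toy_end_request(queue[i], -EIO);
			continue;
		}
		/* Real I/O would be started here; the model succeeds at once. */
		toy_end_request(queue[i], 0);
	}
}
```

The design point is where the check sits: at the very start of request processing, before any per-drive structure is dereferenced, so a removed device costs one flag test per request rather than a crash or an indefinite wait.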
* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 12:09 UTC
To: Jens Axboe; +Cc: linux-ide

Jens Axboe wrote:
> No, those waiters will be woken up when ide does an end_request for
> requests coming in for a device which no longer exists.

But that would mean generating requests for devices, drives and hwifs that
no longer exist. But exactly there it will crash! In do_ide_request() and
ide_do_request().

ide_unregister() restores some old hwif structure. drive and queue are set
to NULL. When I wait "long enough" between "cardctl eject" and "umount", it
looks like this:

~ # cardctl eject
ide_release(398)
ide_unregister(585): index=0
ide_unregister(698) old HWIF restored!
hwif=c01dee8c (0), hwgroup=c0fac2a0, drive=00000000, queue=00000000
ide_detach(164)
cardmgr[253]: shutting down socket 0
cardmgr[253]: executing: './ide stop hda'
cardmgr[253]: executing: 'modprobe -r ide-cs'
exit_ide_cs(514)

~ # umount /mnt/pcmcia/
sys_umount(494)
generic_make_request(2859) q=c02d3040
__generic_unplug_device(1447) calling q->request_fn() @ c00f97e4
do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0fac2a0 (738987520), drive=c01def1c (0, 0), queue=c02d3040 (00000000)
Assertion '(hwif->present)' failed in drivers/ide/ide-io.c:do_ide_request(1284)
Assertion '(drive->present)' failed in drivers/ide/ide-io.c:do_ide_request(1290)
ide_do_request(1133) hwgroup is busy!
ide_do_request(1135) hwif=01000406

The "738987520" above is hwgroup->busy! Obviously completely wrong. This
seems to be a hint that an invalid pointer is dereferenced! The pointer
hwif=01000406 also does not look very healthy! drive=c01def1c is the result
of

	drive = choose_drive(hwgroup);

but that can't be right, as it was set to NULL before.

If I don't wait "long enough" between "cardctl eject" and "umount", the
kernel crashes with:

~ # cardctl eject; umount /mnt/pcmcia
ide_release(398)
ide_unregister(585): index=0
ide_unregister(698) old HWIF restored!
hwif=c01dee8c (0), hwgroup=c0268080, drive=00000000, queue=00000000
ide_detach(164)
cardmgr[253]: shutting down socket 0
cardmgr[253]: executing: './ide stop hda'
sys_umount(494) retval=0
generic_make_request(2859) q=c02d3040
__generic_unplug_device(1447) calling q->request_fn() @ c00f97e4
do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0268080 (0), drive=c01def1c (0, 0), queue=c02d3040 (00000000)
Assertion '(hwif->present)' failed in drivers/ide/ide-io.c:do_ide_request(1284)
Assertion '(drive->present)' failed in drivers/ide/ide-io.c:do_ide_request(1290)
Assertion '(hwgroup->drive)' failed in drivers/ide/ide-io.c:ide_do_request(1124)
ide_do_request(1127) hwgroup->drive=00000000 !!!!!!!!!!!
Unable to handle kernel NULL pointer dereference at virtual address 00000010
...
Internal error: Oops: 17 [#1]
Modules linked in: ide_cs pcmcia at91_cf pcmcia_core
CPU: 0
PC is at ide_do_request+0xe0/0x4f4

It crashes in choose_drive()...

So how could you generate requests (and handle them sanely) for devices
that were removed?

If the drive had only suffered a hardware failure, a timeout would probably
occur and some error handling would take place. But when the drive was
officially unregistered, no more requests should be generated! I think
that's why generic_make_request() checks

	q = bdev_get_queue(bio->bi_bdev);
	if (!q) {
		printk(KERN_ERR
		       "generic_make_request: Trying to access "
		       "nonexistent block-device %s (%Lu)\n",
		       bdevname(bio->bi_bdev, b),
		       (long long) bio->bi_sector);

(You have probably noticed that I am not too deep into the IDE/block device
business...)

--
Steven
* Re: Crash in ide_do_request() on card removal 2005-08-02 12:09 ` Steven Scholz @ 2005-08-02 12:26 ` Jens Axboe 2005-08-02 12:40 ` Steven Scholz 0 siblings, 1 reply; 23+ messages in thread From: Jens Axboe @ 2005-08-02 12:26 UTC (permalink / raw) To: Steven Scholz; +Cc: linux-ide, bzolnier On Tue, Aug 02 2005, Steven Scholz wrote: > Jens Axboe wrote: > > >On Tue, Aug 02 2005, Steven Scholz wrote: > > > >>Jens Axboe wrote: > >> > >> > >>>On Tue, Aug 02 2005, Steven Scholz wrote: > >>> > >>> > >>>>Jens Axboe wrote: > >>>> > >>>> > >>>> > >>>>>On Tue, Aug 02 2005, Steven Scholz wrote: > >>>>> > >>>>> > >>>>> > >>>>>>Jens Axboe wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>That's not quite true, q is not invalid after this call. It will > >>>>>>>only be > >>>>>>>invalid when it is freed (which doesn't happen from here but rather > >>>>>>>from > >>>>>>>the blk_cleanup_queue() call when the reference count drops to 0). > >>>>>>> > >>>>>>>This is still not perfect, but a lot better. Does it work for you? > >>>>>>> > >>>>>>>--- linux-2.6.12/drivers/ide/ide-disk.c~ 2005-08-02 > >>>>>>>12:48:16.000000000 +0200 > >>>>>>>+++ linux-2.6.12/drivers/ide/ide-disk.c 2005-08-02 > >>>>>>>12:48:32.000000000 +0200 > >>>>>>>@@ -1054,6 +1054,7 @@ > >>>>>>> drive->driver_data = NULL; > >>>>>>> drive->devfs_name[0] = '\0'; > >>>>>>> g->private_data = NULL; > >>>>>>>+ g->disk = NULL; > >>>>>>> put_disk(g); > >>>>>>> kfree(idkp); > >>>>>>>} > >>>>>> > >>>>>>No. 
> >>>>>>drivers/ide/ide-disk.c: In function `ide_disk_release': > >>>>>>drivers/ide/ide-disk.c:1057: error: structure has no member named > >>>>>>`disk' > >>>>> > >>>>> > >>>>>Eh, typo, should be g->queue of course :-) > >>>>> > >>>>>--- linux-2.6.12/drivers/ide/ide-disk.c~ 2005-08-02 > >>>>>12:48:16.000000000 +0200 > >>>>>+++ linux-2.6.12/drivers/ide/ide-disk.c 2005-08-02 > >>>>>13:12:54.000000000 +0200 > >>>>>@@ -1054,6 +1054,7 @@ > >>>>> drive->driver_data = NULL; > >>>>> drive->devfs_name[0] = '\0'; > >>>>> g->private_data = NULL; > >>>>>+ g->queue = NULL; > >>>>> put_disk(g); > >>>>> kfree(idkp); > >>>>>} > >>>> > >>>>No. That does not work: > >>>> > >>>>~ # umount /mnt/pcmcia/ > >>>>generic_make_request(2859) q=c02d3040 > >>>>__generic_unplug_device(1447) calling q->request_fn() @ c00f97ec > >>>> > >>>>do_ide_request(1281) HWIF=c01dee8c (0), HWGROUP=c089cea0 (1038681856), > >>>>drive=c01def1c (0, 0), queue=c02d3040 (00000000) > >>>>do_ide_request(1287) HWIF is not present anymore!!! > >>>>do_ide_request(1291) DRIVE is not present anymore. SKIPPING REQUEST!!! > >>>> > >>>>As you can see generic_make_request() still has the pointer to that > >>>>queue! > >>>>It gets it with > >>>> > >>>> q = bdev_get_queue(bio->bi_bdev); > >>>> > >>>>So the pointer is still stored soemwhere else... > >>> > >>> > >>>Hmmm, perhaps just let ide end requests where the drive has been > >>>removed might be better. > >> > >>I don't understand what you mean. > >> > >>If requests are issued (e.g calling umount) after the drive is gone, then > >>I get either a kernel crash or umount hangs cause it waits in > >>__wait_on_buffer() ... > > > > > >No, those waiters will be woken up when ide does an end_request for > >requests coming in for a device which no longer exists. > > But that would mean generating requests for devices, drives and hwifs that > no longer exists. But exactly there it will crash! In do_ide_request() and > ide_do_request(). 
ide doesn't generate the requests, it just receives them for processing. And you want to halt that at the earliest stage possible. Basically the problem you are trying to solve is hacking around the missing hotplug support in drivers/ide. And that will never be pretty. The correct solution would of course be to improve the hotplug support, I think Bart was/is working on that (cc'ing him). > ide_unregister() restores some old hwif structure. drive and queue are set > to NULL. When I wait "long enough" between "cardctl eject" and "umount" it > looks like this: > > ~ # cardctl eject > ide_release(398) > ide_unregister(585): index=0 > ide_unregister(698) old HWIF restored! > hwif=c01dee8c (0), hwgroup=c0fac2a0, drive=00000000, queue=00000000 > ide_detach(164) > cardmgr[253]: shutting down socket 0 > cardmgr[253]: executing: './ide stop hda' > cardmgr[253]: executing: 'modprobe -r ide-cs' > exit_ide_cs(514) > > ~ # umount /mnt/pcmcia/ > sys_umount(494) > generic_make_request(2859) q=c02d3040 > __generic_unplug_device(1447) calling q->request_fn() @ c00f97e4 > do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0fac2a0 (738987520), > drive=c01def1c (0, 0), queue=c02d3040 (00000000) I don't understand what values you are dumping above, please explain. Is HWIF c01dee8c or 0? > Assertion '(hwif->present)' failed in > drivers/ide/ide-io.c:do_ide_request(1284) > Assertion '(drive->present)' failed in > drivers/ide/ide-io.c:do_ide_request(1290) > ide_do_request(1133) hwgroup is busy! > ide_do_request(1135) hwif=01000406 > > The "738987520" above is hwgroup->busy! Obviously completly wrong. This > seems to be a hint that an invalid pointer is dereferenced! The pointer > hwif=01000406 also does not look very healthy! drive=c01def1c is the result > of Yeah it looks very bad. Same thing with the reference counting, ide should not be freeing various structures that the block layer still holds a reference to. 
> drive = choose_drive(hwgroup); > > but can't be as it was set to NULL before. > > If I don't wait "long enough" between "cardctl eject" and "umount" the > kernel crashes with: > > ~ # cardctl eject; umount /mnt/pcmcia > ide_release(398) > ide_unregister(585): index=0 > ide_unregister(698) old HWIF restored! > hwif=c01dee8c (0), hwgroup=c0268080, drive=00000000, queue=00000000 > ide_detach(164) > cardmgr[253]: shutting down socket 0 > cardmgr[253]: executing: './ide stop hda' > sys_umount(494) retval=0 > generic_make_request(2859) q=c02d3040 > __generic_unplug_device(1447) calling q->request_fn() @ c00f97e4 > do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0268080 (0), > drive=c01def1c (0, 0), queue=c02d3040 (00000000) > Assertion '(hwif->present)' failed in > drivers/ide/ide-io.c:do_ide_request(1284) > Assertion '(drive->present)' failed in > drivers/ide/ide-io.c:do_ide_request(1290) > Assertion '(hwgroup->drive)' failed in > drivers/ide/ide-io.c:ide_do_request(1124) > ide_do_request(1127) hwgroup->drive=00000000 !!!!!!!!!!! > Unable to handle kernel NULL pointer dereference at virtual address 00000010 > ... > Internal error: Oops: 17 [#1] > Modules linked in: ide_cs pcmcia at91_cf pcmcia_core > CPU: 0 > PC is at ide_do_request+0xe0/0x4f4 > > It crashes in choose_drive()... > > So how could you generate requests (and handle them sanely) for devices > that where removed? Generation is not a problem, that happens outside of your scope. The job of the driver is just to make sure that it plays by the rule and at least makes sure it doesn't crash on its own for an active queue. > If the drive would only had a hardware failure then probably a timeout > would occure and some error handling would take place. > But when the drive was officially unregistered then no more requests should > be generated! 
> I think that's why generic_make_request() checks
>
> 	q = bdev_get_queue(bio->bi_bdev);
> 	if (!q) {
> 		printk(KERN_ERR
> 		       "generic_make_request: Trying to access "
> 		       "nonexistent block-device %s (%Lu)\n",
> 		       bdevname(bio->bi_bdev, b),
> 		       (long long) bio->bi_sector);
>
> (You probably noted that I am not too deep into the IDE/block devices
> business...)

There's no one thing the above checks for. A queue can be "dead" but
still be around, imagine a device going away with io pending already -
you can't just kill the queue immediately or driver associated data
structures.

I suggest you take it up with Bart how best to solve this. He might even
already have patches.

--
Jens Axboe

* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 12:40 UTC
To: Jens Axboe; +Cc: linux-ide, bzolnier

Jens Axboe wrote:
>>>No, those waiters will be woken up when ide does an end_request for
>>>requests coming in for a device which no longer exists.
>>
>>But that would mean generating requests for devices, drives and hwifs that
>>no longer exist. But exactly there it will crash! In do_ide_request() and
>>ide_do_request().
>
>
> ide doesn't generate the requests, it just receives them for processing.

I know.

> And you want to halt that at the earliest stage possible.

Agreed.

The problem seems to be:

A reference to the request queue is stored in struct gendisk. Thus if you
unregister a block device you should make sure that no one can still try to
access that request queue, right?

>>~ # umount /mnt/pcmcia/
>>sys_umount(494)
>>generic_make_request(2859) q=c02d3040
>>__generic_unplug_device(1447) calling q->request_fn() @ c00f97e4
>>do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0fac2a0 (738987520),
>>drive=c01def1c (0, 0), queue=c02d3040 (00000000)
>
>
> I don't understand what values you are dumping above, please explain. Is
> HWIF c01dee8c or 0?

printk("%s(%d) HWIF=%p (%d), HWGROUP=%p (%d), drive=%p (%d, %d), queue=%p (%p)\n",
       __FUNCTION__, __LINE__, hwif, hwif->present, hwgroup, hwgroup->busy,
       drive, drive->present, drive->dead, q, drive->queue);

So HWIF is c01dee8c and hwif->present=0.

>>Assertion '(hwif->present)' failed in
>>drivers/ide/ide-io.c:do_ide_request(1284)
>>Assertion '(drive->present)' failed in
>>drivers/ide/ide-io.c:do_ide_request(1290)
>>ide_do_request(1133) hwgroup is busy!
>>ide_do_request(1135) hwif=01000406
>>
>>The "738987520" above is hwgroup->busy! Obviously completely wrong.
>>This seems to be a hint that an invalid pointer is dereferenced! The pointer
>>hwif=01000406 also does not look very healthy! drive=c01def1c is the result
>>of
>
>
> Yeah it looks very bad. Same thing with the reference counting, ide
> should not be freeing various structures that the block layer still
> holds a reference to.

Well or better tell the block layer that the drive is gone and it makes no
sense to make any requests ...

>>So how could you generate requests (and handle them sanely) for devices
>>that were removed?
>
> Generation is not a problem, that happens outside of your scope. The job
> of the driver is just to make sure that it plays by the rule and at
> least makes sure it doesn't crash on its own for an active queue.

do_ide_request() could check hwif->present and/or drive->present.
BUT: at this point the request is already made and the low level block
layer is sleeping and waiting for its completion.
I could not figure out how to kill a request in do_ide_request() and wake
up the block layer (sleeping in __wait_on_buffer()).
That's why I thought preventing the generation of such requests would be
the right way.

> I suggest you take it up with Bart how best to solve this. He might even
> already have patches.

Bart? Are you there?

--
Steven

* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 12:54 UTC
To: Steven Scholz; +Cc: linux-ide, bzolnier

On Tue, Aug 02 2005, Steven Scholz wrote:
> Jens Axboe wrote:
>
> >>>No, those waiters will be woken up when ide does an end_request for
> >>>requests coming in for a device which no longer exists.
> >>
> >>But that would mean generating requests for devices, drives and hwifs
> >>that no longer exist. But exactly there it will crash! In
> >>do_ide_request() and ide_do_request().
> >
> >
> >ide doesn't generate the requests, it just receives them for processing.
> I know.
>
> >And you want to halt that at the earliest stage possible.
> Agreed.
>
> The problem seems to be:
>
> A reference to the request queue is stored in struct gendisk. Thus if you
> unregister a block device you should make sure that no one can still try to
> access that request queue, right?

Well the problem is that ide-cs/ide doesn't handle unplug gracefully.
You are trying to fix it in the wrong location, the fix belongs in ide.

> >>~ # umount /mnt/pcmcia/
> >>sys_umount(494)
> >>generic_make_request(2859) q=c02d3040
> >>__generic_unplug_device(1447) calling q->request_fn() @ c00f97e4
> >>do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0fac2a0 (738987520),
> >>drive=c01def1c (0, 0), queue=c02d3040 (00000000)
> >
> >
> >I don't understand what values you are dumping above, please explain. Is
> >HWIF c01dee8c or 0?
>
> printk("%s(%d) HWIF=%p (%d), HWGROUP=%p (%d), drive=%p (%d, %d), queue=%p (%p)\n",
>        __FUNCTION__, __LINE__, hwif, hwif->present, hwgroup, hwgroup->busy,
>        drive, drive->present, drive->dead, q, drive->queue);
>
> So HWIF is c01dee8c and hwif->present=0.

Ok, so you could kill any request arriving for a !hwif->present hardware
interface.
> >>Assertion '(hwif->present)' failed in
> >>drivers/ide/ide-io.c:do_ide_request(1284)
> >>Assertion '(drive->present)' failed in
> >>drivers/ide/ide-io.c:do_ide_request(1290)
> >>ide_do_request(1133) hwgroup is busy!
> >>ide_do_request(1135) hwif=01000406
> >>
> >>The "738987520" above is hwgroup->busy! Obviously completely wrong. This
> >>seems to be a hint that an invalid pointer is dereferenced! The pointer
> >>hwif=01000406 also does not look very healthy! drive=c01def1c is the
> >>result of
> >
> >
> >Yeah it looks very bad. Same thing with the reference counting, ide
> >should not be freeing various structures that the block layer still
> >holds a reference to.
>
> Well or better tell the block layer that the drive is gone and it makes no
> sense to make any requests ...

It's not enough! What if requests are already on the queue waiting to be
serviced? Again, forget request generation.

> >>So how could you generate requests (and handle them sanely) for devices
> >>that were removed?
> >
> >Generation is not a problem, that happens outside of your scope. The job
> >of the driver is just to make sure that it plays by the rule and at
> >least makes sure it doesn't crash on its own for an active queue.
>
> do_ide_request() could check hwif->present and/or drive->present.

Precisely.

> BUT: at this point the request is already made and the low level block
> layer is sleeping and waiting for its completion.

Which will complete when you error the request, as I wrote a few mails
ago.

> I could not figure out how to kill a request in do_ide_request() and wake
> up the block layer (sleeping in __wait_on_buffer()).
> That's why I thought preventing the generation of such requests would be
> the right way.

It's not the right way, it only solves a little part of the problem.
Killing a request with an error usually looks like this:

	blkdev_dequeue_request(rq);
	end_that_request_first(rq, 0, rq->hard_nr_sectors);
	end_that_request_last(rq);

--
Jens Axboe
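Jens's point is that failing a request is itself what releases the task sleeping on it. The C below is a user-space model of that idea only, written under stated assumptions: `struct model_request`, `model_end_request()` and `model_wake_waiter()` are hypothetical names invented for illustration, not kernel interfaces. The real 2.6.13 sequence is the `blkdev_dequeue_request()` / `end_that_request_first()` / `end_that_request_last()` triple quoted above, and the real sleeper sits in `__wait_on_buffer()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued block request (not a kernel type). */
struct model_request {
    unsigned int nr_sectors;  /* sectors still outstanding */
    int error;                /* 0 on success, negative errno on failure */
    bool waiter_woken;        /* set once the submitter has been released */
    void (*end_io)(struct model_request *rq); /* submitter's completion hook */
};

/* Submitter side: in this model "waking the sleeper" is just a flag.
 * In the kernel the equivalent sleeper is the task in __wait_on_buffer(). */
void model_wake_waiter(struct model_request *rq)
{
    rq->waiter_woken = true;
}

/* Driver side: account for every outstanding sector at once with the
 * given status, then run the completion hook. This loosely mirrors
 * end_that_request_first(rq, 0, rq->hard_nr_sectors) followed by
 * end_that_request_last(rq); the model does not dequeue anything. */
void model_end_request(struct model_request *rq, int error)
{
    assert(error <= 0);       /* 0 or a negative errno, kernel-style */
    rq->nr_sectors = 0;
    rq->error = error;
    if (rq->end_io != NULL)
        rq->end_io(rq);
}
```

For example, a request with 8 outstanding sectors failed with `-EIO` ends up with `nr_sectors == 0`, `error == -EIO` and its waiter released. Whatever the status, it is the completion callback that unblocks the submitter, which is why erroring the request is enough.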

* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 13:03 UTC
To: Jens Axboe; +Cc: linux-ide, bzolnier

Jens Axboe wrote:
> It's not the right way, it only solves a little part of the problem.
> Killing a request with an error usually looks like this:
>
> 	blkdev_dequeue_request(rq);
> 	end_that_request_first(rq, 0, rq->hard_nr_sectors);
> 	end_that_request_last(rq);

How do I get the request? do_ide_request() only gets the complete
request_queue_t *q. Shall I use elv_next_request()?

--
Steven

* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 13:06 UTC
To: Steven Scholz; +Cc: linux-ide, bzolnier

On Tue, Aug 02 2005, Steven Scholz wrote:
> Jens Axboe wrote:
>
> >It's not the right way, it only solves a little part of the problem.
> >Killing a request with an error usually looks like this:
> >
> >	blkdev_dequeue_request(rq);
> >	end_that_request_first(rq, 0, rq->hard_nr_sectors);
> >	end_that_request_last(rq);
>
> How do I get the request? do_ide_request() only gets the complete
> request_queue_t *q. Shall I use elv_next_request()?

Yes.

--
Jens Axboe

* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 13:38 UTC
To: Jens Axboe; +Cc: linux-ide, bzolnier

Jens Axboe wrote:
> On Tue, Aug 02 2005, Steven Scholz wrote:
>
>>Jens Axboe wrote:
>>
>>
>>>It's not the right way, it only solves a little part of the problem.
>>>Killing a request with an error usually looks like this:
>>>
>>>	blkdev_dequeue_request(rq);
>>>	end_that_request_first(rq, 0, rq->hard_nr_sectors);
>>>	end_that_request_last(rq);
>>
>>How do I get the request? do_ide_request() only gets the complete
>>request_queue_t *q. Shall I use elv_next_request()?
>
> Yes.

So my workaround for now would be

--- linux-2.6.13-rc5/drivers/ide/ide-io.c
+++ linux-2.6.13-rc4-at91-multiIO/drivers/ide/ide-io.c
@@ -1230,7 +1264,18 @@ void do_ide_request(request_queue_t *q)
 {
 	ide_drive_t *drive = q->queuedata;
 
-	ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
+	if (drive->present)
+		ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
+	else {
+		struct request *rq;
+		printk("%s() drive is not present anymore! Kill request.\n", __FUNCTION__);
+		rq = elv_next_request(q);
+		if (rq) {
+			blkdev_dequeue_request(rq);
+			end_that_request_first(rq, 0, rq->hard_nr_sectors);
+			end_that_request_last(rq);
+		}
+	}
 }
 
 /*

--
Steven

* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 13:45 UTC
To: Steven Scholz; +Cc: linux-ide, bzolnier

On Tue, Aug 02 2005, Steven Scholz wrote:
> Jens Axboe wrote:
>
> >On Tue, Aug 02 2005, Steven Scholz wrote:
> >
> >>Jens Axboe wrote:
> >>
> >>
> >>>It's not the right way, it only solves a little part of the problem.
> >>>Killing a request with an error usually looks like this:
> >>>
> >>>	blkdev_dequeue_request(rq);
> >>>	end_that_request_first(rq, 0, rq->hard_nr_sectors);
> >>>	end_that_request_last(rq);
> >>
> >>How do I get the request? do_ide_request() only gets the complete
> >>request_queue_t *q. Shall I use elv_next_request()?
> >
> >Yes.
>
> So my workaround for now would be
>
> --- linux-2.6.13-rc5/drivers/ide/ide-io.c
> +++ linux-2.6.13-rc4-at91-multiIO/drivers/ide/ide-io.c
> @@ -1230,7 +1264,18 @@ void do_ide_request(request_queue_t *q)
>  {
>  	ide_drive_t *drive = q->queuedata;
>  
> -	ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
> +	if (drive->present)
> +		ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
> +	else {
> +		struct request *rq;
> +		printk("%s() drive is not present anymore! Kill request.\n", __FUNCTION__);
> +		rq = elv_next_request(q);
> +		if (rq) {
> +			blkdev_dequeue_request(rq);
> +			end_that_request_first(rq, 0, rq->hard_nr_sectors);
> +			end_that_request_last(rq);
> +		}
> +	}

Pretty close. Make the killing a loop:

	while ((rq = elv_next_request(q)) != NULL) {
		blkdev_dequeue_request(rq);
		end_that_request_first(rq, 0, rq->hard_nr_sectors);
		end_that_request_last(rq);
	}

and it looks ok to me. Change the printk to something a little more
appropriate as well, a la

	printk(KERN_WARNING "%s: not present, killing requests\n", drive->name);

--
Jens Axboe
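The shape Jens asks for (dispatch normally while the drive is present, otherwise drain and fail the whole queue) can be modelled in plain C to show why the `while` matters: with the earlier one-shot `if`, every request after the first would stay queued and its submitter would keep sleeping. This is a sketch under assumptions, not the kernel code: the FIFO, `model_dequeue()` and `model_do_request()` are hypothetical stand-ins for the elevator, for `elv_next_request()` + `blkdev_dequeue_request()` and for `do_ide_request()`, and no locking is modelled (the real `request_fn` is called with the queue lock held).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 16

/* Hypothetical request: only the completion status matters here. */
struct model_request {
    int error;        /* 0 until failed, then -EIO */
    bool completed;
};

/* Hypothetical FIFO standing in for the elevator's dispatch list. */
struct model_queue {
    struct model_request *slots[MODEL_QUEUE_MAX];
    size_t head, tail;   /* head: next to hand out, tail: next free slot */
    bool drive_present;  /* stand-in for drive->present */
    size_t dispatched;   /* requests passed to the normal I/O path */
};

bool model_enqueue(struct model_queue *q, struct model_request *rq)
{
    assert(q != NULL && rq != NULL);
    if (q->tail == MODEL_QUEUE_MAX)
        return false;
    q->slots[q->tail++] = rq;
    return true;
}

/* Plays the part of elv_next_request() + blkdev_dequeue_request():
 * take the next request off the queue, or NULL if it is empty. */
struct model_request *model_dequeue(struct model_queue *q)
{
    if (q->head == q->tail)
        return NULL;
    return q->slots[q->head++];
}

/* Same shape as the patched do_ide_request(): normal dispatch while the
 * drive is present; otherwise fail every queued request, not just the
 * first one, so that no submitter is left sleeping. */
void model_do_request(struct model_queue *q)
{
    struct model_request *rq;

    if (q->drive_present) {
        /* Simplification: the real ide_do_request() starts one command
         * and continues from the interrupt handler. */
        while ((rq = model_dequeue(q)) != NULL) {
            rq->completed = true;
            q->dispatched++;
        }
        return;
    }

    while ((rq = model_dequeue(q)) != NULL) {
        rq->error = -EIO;
        rq->completed = true;
    }
}
```

In the model, three requests queued against an absent drive all come back completed with `-EIO` and the queue is empty afterwards; with the drive present, the same call hands them to the normal path instead.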

* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-02 13:54 UTC
To: Jens Axboe; +Cc: linux-ide, bzolnier

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

Jens Axboe wrote:
> On Tue, Aug 02 2005, Steven Scholz wrote:
>
>>Jens Axboe wrote:
>>
>>
>>>On Tue, Aug 02 2005, Steven Scholz wrote:
>>>
>>>
>>>>Jens Axboe wrote:
>>>>
>>>>
>>>>
>>>>>It's not the right way, it only solves a little part of the problem.
>>>>>Killing a request with an error usually looks like this:
>>>>>
>>>>>	blkdev_dequeue_request(rq);
>>>>>	end_that_request_first(rq, 0, rq->hard_nr_sectors);
>>>>>	end_that_request_last(rq);
>>>>
>>>>How do I get the request? do_ide_request() only gets the complete
>>>>request_queue_t *q. Shall I use elv_next_request()?
>>>
>>>Yes.
>>
>>So my workaround for now would be
>>
>>--- linux-2.6.13-rc5/drivers/ide/ide-io.c
>>+++ linux-2.6.13-rc4-at91-multiIO/drivers/ide/ide-io.c
>>@@ -1230,7 +1264,18 @@ void do_ide_request(request_queue_t *q)
>> {
>> 	ide_drive_t *drive = q->queuedata;
>> 
>>-	ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
>>+	if (drive->present)
>>+		ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
>>+	else {
>>+		struct request *rq;
>>+		printk("%s() drive is not present anymore! Kill request.\n", __FUNCTION__);
>>+		rq = elv_next_request(q);
>>+		if (rq) {
>>+			blkdev_dequeue_request(rq);
>>+			end_that_request_first(rq, 0, rq->hard_nr_sectors);
>>+			end_that_request_last(rq);
>>+		}
>>+	}
>
>
> Pretty close. Make the killing a loop:
...

[PATCH] ide: kill requests when drive is not present anymore

Signed-off-by: Steven Scholz <steven.scholz@imc-berlin.de>

Ok?

Change the CHANGELOG at your will.

--
Steven

[-- Attachment #2: ide_kill_rq.patch --]
[-- Type: text/plain, Size: 682 bytes --]

--- linux-2.6.13-rc5/drivers/ide/ide-io.c	2005-06-17 21:48:29.000000000 +0200
+++ linux-2.6.13-rc4-at91-multiIO/drivers/ide/ide-io.c	2005-08-02 15:46:53.000000000 +0200
@@ -1230,7 +1257,17 @@ void do_ide_request(request_queue_t *q)
 {
 	ide_drive_t *drive = q->queuedata;
 
-	ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
+	if (drive->present)
+		ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
+	else {
+		struct request *rq;
+		printk(KERN_WARNING "%s: not present, killing requests\n", drive->name);
+		while ((rq = elv_next_request(q)) != NULL) {
+			blkdev_dequeue_request(rq);
+			end_that_request_first(rq, 0, rq->hard_nr_sectors);
+			end_that_request_last(rq);
+		}
+	}
 }
 
 /*

* Re: Crash in ide_do_request() on card removal
From: Jens Axboe @ 2005-08-02 14:11 UTC
To: Steven Scholz; +Cc: linux-ide, bzolnier

On Tue, Aug 02 2005, Steven Scholz wrote:
> [PATCH] ide: kill requests when drive is not present anymore
>
> Signed-off-by: Steven Scholz <steven.scholz@imc-berlin.de>
>
> Ok?
>
> Change the CHANGELOG at your will.
>
> --
> Steven

> --- linux-2.6.13-rc5/drivers/ide/ide-io.c	2005-06-17 21:48:29.000000000 +0200
> +++ linux-2.6.13-rc4-at91-multiIO/drivers/ide/ide-io.c	2005-08-02 15:46:53.000000000 +0200
> @@ -1230,7 +1257,17 @@ void do_ide_request(request_queue_t *q)
>  {
>  	ide_drive_t *drive = q->queuedata;
>  
> -	ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
> +	if (drive->present)
> +		ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
> +	else {
> +		struct request *rq;
> +		printk(KERN_WARNING "%s: not present, killing requests\n", drive->name);
> +		while ((rq = elv_next_request(q)) != NULL) {
> +			blkdev_dequeue_request(rq);
> +			end_that_request_first(rq, 0, rq->hard_nr_sectors);
> +			end_that_request_last(rq);
> +		}
> +	}
>  }
>  
>  /*

Looks good to me now, that's one item off Bart's list :-)

--
Jens Axboe

* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-08 9:00 UTC
To: Jens Axboe; +Cc: linux-ide, bzolnier

Jens Axboe wrote:
> On Tue, Aug 02 2005, Steven Scholz wrote:
>
>>[PATCH] ide: kill requests when drive is not present anymore
>>
>>Signed-off-by: Steven Scholz <steven.scholz@imc-berlin.de>
>>
>>Ok?
>>
>>Change the CHANGELOG at your will.
>>
>>--
>>Steven
>
>
>>--- linux-2.6.13-rc5/drivers/ide/ide-io.c	2005-06-17 21:48:29.000000000 +0200
>>+++ linux-2.6.13-rc4-at91-multiIO/drivers/ide/ide-io.c	2005-08-02 15:46:53.000000000 +0200
>>@@ -1230,7 +1257,17 @@ void do_ide_request(request_queue_t *q)
>> {
>> 	ide_drive_t *drive = q->queuedata;
>> 
>>-	ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
>>+	if (drive->present)
>>+		ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
>>+	else {
>>+		struct request *rq;
>>+		printk(KERN_WARNING "%s: not present, killing requests\n", drive->name);
>>+		while ((rq = elv_next_request(q)) != NULL) {
>>+			blkdev_dequeue_request(rq);
>>+			end_that_request_first(rq, 0, rq->hard_nr_sectors);
>>+			end_that_request_last(rq);
>>+		}
>>+	}
>> }
>> 
>> /*
>
>
> Looks good to me now, that's one item off Bart's list :-)

Will it get into 2.6.13? It's not in -rc6 though.

--
Steven

--
Steven Scholz

imc Measurement & Control      imc Meßsysteme GmbH
Voltastr. 5                    Voltastr. 5
13355 Berlin                   13355 Berlin
Germany                        Deutschland

* Re: Crash in ide_do_request() on card removal
From: Bartlomiej Zolnierkiewicz @ 2005-08-02 13:28 UTC
To: Steven Scholz; +Cc: Jens Axboe, linux-ide

On 8/2/05, Steven Scholz <steven.scholz@imc-berlin.de> wrote:
> Jens Axboe wrote:

> do_ide_request() could check hwif->present and/or drive->present.
> BUT: at this point the request is already made and the low level block layer is
> sleeping and waiting for its completion.
> I could not figure out how to kill a request in do_ide_request() and wake up the
> block layer (sleeping in __wait_on_buffer()).
> That's why I thought preventing the generation of such requests would be the
> right way.
>
> > I suggest you take it up with Bart how best to solve this. He might even
> > already have patches.
> Bart? Are you there?

IDE device unplug TODO :)
* add ide_device_get() helper which will check for drive->present
  + increase reference count on drive->gendev and ide_device_put()
  helper which will decrease reference count on drive->gendev
* propagate usage of these helpers to device drivers (ide_disk_get() etc.)
  so there won't be _new_ requests after removal of the device
* if !drive->present fail _old_ requests (as already mentioned by Jens)
* add proper locking around drive->present
* ...

The first three points should be relatively easy.

Bartlomiej
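The first two TODO items amount to a get/put pair that refuses new users once the drive is gone and keeps the structure alive until the last user lets go. Below is a minimal user-space sketch of that idea, assuming a plain counter in place of the reference count on `drive->gendev` (in the kernel that would presumably be `get_device()`/`put_device()`). `struct model_drive` and the `model_device_*()` functions are hypothetical; `ide_device_get()`/`ide_device_put()` are only proposed names at this point in the thread. The fourth item, locking around `drive->present`, is deliberately left out, so this sketch is not safe against concurrent removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ide_drive_t with only the fields the TODO
 * mentions; 'refcount' plays the role of the drive->gendev refcount. */
struct model_drive {
    bool present;
    unsigned int refcount;
};

/* Sketch of the proposed ide_device_get(): refuse new users once the
 * drive is gone, otherwise pin it. NOT atomic: real code needs the
 * present check and the increment under the same lock as removal. */
struct model_drive *model_device_get(struct model_drive *drive)
{
    if (drive == NULL || !drive->present)
        return NULL;
    drive->refcount++;
    return drive;
}

/* Sketch of ide_device_put(): drop one reference; the last put frees.
 * Returns true if this call released the structure. */
bool model_device_put(struct model_drive *drive)
{
    assert(drive->refcount > 0);  /* a put without a matching get is a bug */
    if (--drive->refcount == 0) {
        free(drive);
        return true;
    }
    return false;
}

/* Removal path: mark the drive absent and drop the driver's own
 * reference. Existing users keep the memory alive until they put it. */
bool model_device_remove(struct model_drive *drive)
{
    drive->present = false;
    return model_device_put(drive);
}
```

The ordering is the point: removal clears `present` and drops only the driver's own reference, so a user that already holds the drive can still finish (and fail its I/O) against valid memory, while any later `model_device_get()` returns NULL.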

* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2005-08-18 12:59 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: Jens Axboe, linux-ide

Bartlomiej Zolnierkiewicz wrote:
> On 8/2/05, Steven Scholz <steven.scholz@imc-berlin.de> wrote:
>
>>Jens Axboe wrote:
>
>
>>do_ide_request() could check hwif->present and/or drive->present.
>>BUT: at this point the request is already made and the low level block layer is
>>sleeping and waiting for its completion.
>>I could not figure out how to kill a request in do_ide_request() and wake up the
>>block layer (sleeping in __wait_on_buffer()).
>>That's why I thought preventing the generation of such requests would be the
>>right way.
>>
>>
>>>I suggest you take it up with Bart how best to solve this. He might even
>>>already have patches.
>>
>>Bart? Are you there?
>
>
> IDE device unplug TODO :)
> * add ide_device_get() helper which will check for drive->present
>   + increase reference count on drive->gendev and ide_device_put()
>   helper which will decrease reference count on drive->gendev
> * propagate usage of these helpers to device drivers (ide_disk_get() etc.)
>   so there won't be _new_ requests after removal of the device
> * if !drive->present fail _old_ requests (as already mentioned by Jens)
> * add proper locking around drive->present
> * ...
>
> first three points should be relatively easy

What's the status here?
Although patching do_ide_request() (see mail from 2.8.2005) helped a bit,
I've seen a crash in elv_queue_empty():

cardmgr[220]: shutting down socket 0
cardmgr[220]: executing: './ide stop hda'
cardmgr[220]: + umount -v /dev/hda1
Unable to handle kernel paging request at virtual address 6a202f20
pgd = c09d0000
[6a202f20] *pgd=00000000
Internal error: Oops: 0 [#1]
Modules linked in: ide_cs pcmcia at91_cf pcmcia_core imcdevif imcdevd imcevents
CPU: 0
PC is at 0x6a202f20
LR is at elv_queue_empty+0x28/0x40
...
Process umount (pid: 339, stack limit = 0xc094a194)
...
Backtrace:
(elv_queue_empty+0x0/0x40) from (__make_request+0xa4/0x50c)
(__make_request+0x0/0x50c) from [<c00f0690>] (generic_make_request+0x20c/0x228)
(generic_make_request+0x0/0x228) from [<c00f0780>] (submit_bio+0xd4/0xf4)
(submit_bio+0x0/0xf4) from [<c006c038>] (submit_bh+0x164/0x190)
(submit_bh+0x0/0x190) from [<c0069a94>] (__bread_slow+0x7c/0xc4)
(__bread_slow+0x0/0xc4) from [<c0069d98>] (__bread+0x24/0x30)
(__bread+0x0/0x30) from [<c00adf2c>] (fat_clusters_flush+0x30/0xd0)
(fat_clusters_flush+0x0/0xd0) from [<c00ac994>] (fat_put_super+0x24/0x94)
(fat_put_super+0x0/0x94) from [<c006dfa0>] (generic_shutdown_super+0xdc/0x188)
(generic_shutdown_super+0x0/0x188) from (kill_block_super+0x28/0x3c)
(kill_block_super+0x0/0x3c) from (deactivate_super+0x58/0x6c)
(deactivate_super+0x0/0x6c) from (__mntput+0x2c/0x30)
(__mntput+0x0/0x30) from [<c0074e70>] (path_release_on_umount+0x4c/0x50)
(path_release_on_umount+0x0/0x50) from [<c00846b0>] (sys_umount+0x98/0xa0)
(sys_umount+0x0/0xa0) from [<c00846cc>] (sys_oldumount+0x14/0x18)
(sys_oldumount+0x0/0x18) from [<c0019c60>] (ret_fast_syscall+0x0/0x2c)
Code: bad PC value.
Badness in do_exit at kernel/exit.c:787
(dump_stack+0x0/0x14) from [<c0032184>] (do_exit+0x40/0x3b4)
(do_exit+0x0/0x3b4) from [<c001f090>] (die+0xf8/0x10c)
(die+0x0/0x10c) from [<c0020894>] (__do_kernel_fault+0x6c/0x7c)
(__do_kernel_fault+0x0/0x7c) from [<c0020bbc>] (do_page_fault+0x104/0x118)
(do_page_fault+0x0/0x118) from [<c0020bfc>] (do_translation_fault+0x2c/0xac)
(do_translation_fault+0x0/0xac) from [<c0020d90>] (do_PrefetchAbort+0x18/0x1c)
(do_PrefetchAbort+0x0/0x1c) from [<c00199e0>] (__pabt_svc+0x40/0x80)
(elv_queue_empty+0x0/0x40) from [<c00efd44>] (__make_request+0xa4/0x50c)
(__make_request+0x0/0x50c) from [<c00f0690>] (generic_make_request+0x20c/0x228)
(generic_make_request+0x0/0x228) from [<c00f0780>] (submit_bio+0xd4/0xf4)
(submit_bio+0x0/0xf4) from [<c006c038>] (submit_bh+0x164/0x190)
(submit_bh+0x0/0x190) from [<c0069a94>] (__bread_slow+0x7c/0xc4)
(__bread_slow+0x0/0xc4) from [<c0069d98>] (__bread+0x24/0x30)
(__bread+0x0/0x30) from [<c00adf2c>] (fat_clusters_flush+0x30/0xd0)
(fat_clusters_flush+0x0/0xd0) from [<c00ac994>] (fat_put_super+0x24/0x94)
(fat_put_super+0x0/0x94) from [<c006dfa0>] (generic_shutdown_super+0xdc/0x188)
(generic_shutdown_super+0x0/0x188) from (kill_block_super+0x28/0x3c)
(kill_block_super+0x0/0x3c) from [<c006de24>] (deactivate_super+0x58/0x6c)
(deactivate_super+0x0/0x6c) from [<c00840c4>] (__mntput+0x2c/0x30)
(__mntput+0x0/0x30) from [<c0074e70>] (path_release_on_umount+0x4c/0x50)
(path_release_on_umount+0x0/0x50) from [<c00846b0>] (sys_umount+0x98/0xa0)
(sys_umount+0x0/0xa0) from [<c00846cc>] (sys_oldumount+0x14/0x18)
(sys_oldumount+0x0/0x18) from [<c0019c60>] (ret_fast_syscall+0x0/0x2c)
cardmgr[220]: + Segmentation fault
cardmgr[220]: stop cmd exited with status 1
cardmgr[220]: executing: 'modprobe -r ide-cs'
cardmgr[220]: BEEP_OK

* Re: Crash in ide_do_request() on card removal
From: Steven Scholz @ 2006-01-31 14:28 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: Jens Axboe, linux-ide

Hi all,

>>> I suggest you take it up with Bart how best to solve this. He might even
>>> already have patches.
>> Bart? Are you there?
>
> IDE device unplug TODO :)
> * add ide_device_get() helper which will check for drive->present
>   + increase reference count on drive->gendev and ide_device_put()
>   helper which will decrease reference count on drive->gendev
> * propagate usage of these helpers to device drivers (ide_disk_get() etc.)
>   so there won't be _new_ requests after removal of the device
> * if !drive->present fail _old_ requests (as already mentioned by Jens)
> * add proper locking around drive->present
> * ...

Has this "IDE device unplug" stuff been added to the mainline kernel by now?
If so, would it be possible to backport it to linux-2.6.14?

Thanks.

--
Steven