linux-ide.vger.kernel.org archive mirror
From: Jens Axboe <axboe@suse.de>
To: Steven Scholz <steven.scholz@imc-berlin.de>
Cc: linux-ide@vger.kernel.org, bzolnier@gmail.com
Subject: Re: Crash in ide_do_request() on card removal
Date: Tue, 2 Aug 2005 14:26:14 +0200
Message-ID: <20050802122609.GM22569@suse.de>
In-Reply-To: <42EF626B.6090103@imc-berlin.de>

On Tue, Aug 02 2005, Steven Scholz wrote:
> Jens Axboe wrote:
> 
> >On Tue, Aug 02 2005, Steven Scholz wrote:
> >
> >>Jens Axboe wrote:
> >>
> >>
> >>>On Tue, Aug 02 2005, Steven Scholz wrote:
> >>>
> >>>
> >>>>Jens Axboe wrote:
> >>>>
> >>>>
> >>>>
> >>>>>On Tue, Aug 02 2005, Steven Scholz wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Jens Axboe wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>That's not quite true, q is not invalid after this call. It will only be
> >>>>>>>invalid when it is freed (which doesn't happen from here but rather from
> >>>>>>>the blk_cleanup_queue() call when the reference count drops to 0).
> >>>>>>>
> >>>>>>>This is still not perfect, but a lot better. Does it work for you?
> >>>>>>>
> >>>>>>>--- linux-2.6.12/drivers/ide/ide-disk.c~	2005-08-02 12:48:16.000000000 +0200
> >>>>>>>+++ linux-2.6.12/drivers/ide/ide-disk.c	2005-08-02 12:48:32.000000000 +0200
> >>>>>>>@@ -1054,6 +1054,7 @@
> >>>>>>>	drive->driver_data = NULL;
> >>>>>>>	drive->devfs_name[0] = '\0';
> >>>>>>>	g->private_data = NULL;
> >>>>>>>+	g->disk = NULL;
> >>>>>>>	put_disk(g);
> >>>>>>>	kfree(idkp);
> >>>>>>>}
> >>>>>>
> >>>>>>No.
> >>>>>>drivers/ide/ide-disk.c: In function `ide_disk_release':
> >>>>>>drivers/ide/ide-disk.c:1057: error: structure has no member named `disk'
> >>>>>
> >>>>>
> >>>>>Eh, typo, should be g->queue of course :-)
> >>>>>
> >>>>>--- linux-2.6.12/drivers/ide/ide-disk.c~	2005-08-02 12:48:16.000000000 +0200
> >>>>>+++ linux-2.6.12/drivers/ide/ide-disk.c	2005-08-02 13:12:54.000000000 +0200
> >>>>>@@ -1054,6 +1054,7 @@
> >>>>>	drive->driver_data = NULL;
> >>>>>	drive->devfs_name[0] = '\0';
> >>>>>	g->private_data = NULL;
> >>>>>+	g->queue = NULL;
> >>>>>	put_disk(g);
> >>>>>	kfree(idkp);
> >>>>>}
> >>>>
> >>>>No. That does not work:
> >>>>
> >>>>~ # umount /mnt/pcmcia/
> >>>>generic_make_request(2859) q=c02d3040
> >>>>__generic_unplug_device(1447) calling q->request_fn() @ c00f97ec
> >>>>
> >>>>do_ide_request(1281) HWIF=c01dee8c (0), HWGROUP=c089cea0 (1038681856), drive=c01def1c (0, 0), queue=c02d3040 (00000000)
> >>>>do_ide_request(1287) HWIF is not present anymore!!!
> >>>>do_ide_request(1291) DRIVE is not present anymore. SKIPPING REQUEST!!!
> >>>>
> >>>>As you can see, generic_make_request() still has the pointer to that queue!
> >>>>It gets it with
> >>>>
> >>>>	q = bdev_get_queue(bio->bi_bdev);
> >>>>
> >>>>So the pointer is still stored somewhere else...
> >>>
> >>>
> >>>Hmmm, perhaps just letting ide end requests where the drive has been
> >>>removed might be better.
> >>
> >>I don't understand what you mean.
> >>
> >>If requests are issued (e.g. by calling umount) after the drive is gone, then
> >>I get either a kernel crash or umount hangs because it waits in
> >>__wait_on_buffer() ...
> >
> >
> >No, those waiters will be woken up when ide does an end_request for
> >requests coming in for a device which no longer exists.
> 
> But that would mean generating requests for devices, drives and hwifs that
> no longer exist. But that is exactly where it will crash! In do_ide_request()
> and ide_do_request().

ide doesn't generate the requests, it just receives them for processing.
And you want to halt that at the earliest stage possible.
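
To illustrate the idea of halting at the earliest stage, here is a minimal
userspace sketch (not kernel code) of a request handler that ends a request
with an error as soon as it sees that the drive is gone, so that anything
waiting on the request is woken up instead of hanging. All names (toy_drive,
toy_queue, toy_request, toy_end_request, toy_request_fn, TOY_EIO) are
hypothetical stand-ins; they are not the drivers/ide structures or the 2.6.12
block layer API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EIO (-5)            /* hypothetical error code for "device gone" */

/* Hypothetical stand-ins; these are not the kernel's types. */
struct toy_drive   { int present; };
struct toy_request { int done; int error; };

struct toy_queue {
    struct toy_drive *drive;    /* NULL once the drive has been unregistered */
    int dispatched;             /* requests that actually reached the "hardware" */
};

/* Complete a request; in a real driver this is what wakes up waiters. */
static void toy_end_request(struct toy_request *rq, int error)
{
    assert(rq != NULL);
    rq->error = error;
    rq->done = 1;
}

/* Request handler: refuse to touch hardware state if the drive is gone. */
static void toy_request_fn(struct toy_queue *q, struct toy_request *rq)
{
    if (q->drive == NULL || !q->drive->present) {
        /* Fail early, before any drive or hwgroup pointer is dereferenced. */
        toy_end_request(rq, TOY_EIO);
        return;
    }
    q->dispatched++;            /* stand-in for real command issue */
    toy_end_request(rq, 0);
}
```

The important property is only that every request handed to the handler is
completed one way or another, and that the presence check comes before
anything else is dereferenced.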

Basically the problem you are trying to solve is hacking around the
missing hotplug support in drivers/ide. And that will never be pretty.
The correct solution would of course be to improve the hotplug support,
I think Bart was/is working on that (cc'ing him).

> ide_unregister() restores some old hwif structure. drive and queue are set 
> to NULL. When I wait "long enough" between "cardctl eject" and "umount" it 
> looks like this:
> 
> ~ # cardctl eject
> ide_release(398)
> ide_unregister(585): index=0
> ide_unregister(698) old HWIF restored!
> hwif=c01dee8c (0), hwgroup=c0fac2a0, drive=00000000, queue=00000000
> ide_detach(164)
> cardmgr[253]: shutting down socket 0
> cardmgr[253]: executing: './ide stop hda'
> cardmgr[253]: executing: 'modprobe -r ide-cs'
> exit_ide_cs(514)
> 
> ~ # umount /mnt/pcmcia/
> sys_umount(494)
> generic_make_request(2859) q=c02d3040
> __generic_unplug_device(1447) calling q->request_fn() @ c00f97e4
> do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0fac2a0 (738987520), drive=c01def1c (0, 0), queue=c02d3040 (00000000)

I don't understand what values you are dumping above, please explain. Is
HWIF c01dee8c or 0?

> Assertion '(hwif->present)' failed in drivers/ide/ide-io.c:do_ide_request(1284)
> Assertion '(drive->present)' failed in drivers/ide/ide-io.c:do_ide_request(1290)
> ide_do_request(1133) hwgroup is busy!
> ide_do_request(1135) hwif=01000406
> 
> The "738987520" above is hwgroup->busy! Obviously completely wrong. This
> seems to be a hint that an invalid pointer is dereferenced! The pointer 
> hwif=01000406 also does not look very healthy! drive=c01def1c is the result 
> of

Yeah it looks very bad. Same thing with the reference counting, ide
should not be freeing various structures that the block layer still
holds a reference to.

> 	drive = choose_drive(hwgroup);
> 
> but can't be as it was set to NULL before.
> 
> If I don't wait "long enough"  between "cardctl eject" and "umount" the 
> kernel crashes with:
> 
> ~ # cardctl eject; umount /mnt/pcmcia
> ide_release(398)
> ide_unregister(585): index=0
> ide_unregister(698) old HWIF restored!
> hwif=c01dee8c (0), hwgroup=c0268080, drive=00000000, queue=00000000
> ide_detach(164)
> cardmgr[253]: shutting down socket 0
> cardmgr[253]: executing: './ide stop hda'
> sys_umount(494) retval=0
> generic_make_request(2859) q=c02d3040
> __generic_unplug_device(1447) calling q->request_fn() @ c00f97e4
> do_ide_request(1279) HWIF=c01dee8c (0), HWGROUP=c0268080 (0), drive=c01def1c (0, 0), queue=c02d3040 (00000000)
> Assertion '(hwif->present)' failed in drivers/ide/ide-io.c:do_ide_request(1284)
> Assertion '(drive->present)' failed in drivers/ide/ide-io.c:do_ide_request(1290)
> Assertion '(hwgroup->drive)' failed in drivers/ide/ide-io.c:ide_do_request(1124)
> ide_do_request(1127) hwgroup->drive=00000000 !!!!!!!!!!!
> Unable to handle kernel NULL pointer dereference at virtual address 00000010
> ...
> Internal error: Oops: 17 [#1]
> Modules linked in: ide_cs pcmcia at91_cf pcmcia_core
> CPU: 0
> PC is at ide_do_request+0xe0/0x4f4
> 
> It crashes in choose_drive()...
> 
> So how could you generate requests (and handle them sanely) for devices
> that were removed?

Generation is not a problem, that happens outside of your scope. The job
of the driver is just to make sure that it plays by the rules and, at the
very least, doesn't crash on its own for an active queue.

> If the drive had only had a hardware failure, then a timeout would
> probably occur and some error handling would take place.
> But when the drive was officially unregistered then no more requests should
> be generated! I think that's why generic_make_request() checks
> 
> 		q = bdev_get_queue(bio->bi_bdev);
> 		if (!q) {
> 			printk(KERN_ERR
> 			       "generic_make_request: Trying to access "
> 				"nonexistent block-device %s (%Lu)\n",
> 				bdevname(bio->bi_bdev, b),
> 				(long long) bio->bi_sector);
> 
> (You probably noticed that I am not too deep into the IDE/block devices
> business...)

There's no one thing the above checks for. A queue can be "dead" but
still be around. Imagine a device going away with io already pending:
you can't just kill the queue or the driver's associated data structures
immediately.
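
As a rough illustration of "dead but still around", here is a small userspace
sketch (again, not kernel code) of a reference-counted queue with a dead flag.
Marking the queue dead stops new submissions, but the memory is only freed
when the last holder drops its reference. The names (toy_rq_queue and the
toy_rq_* helpers) are hypothetical, and the sketch is single-threaded; a real
implementation would need locking or atomic counters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request queue; not the kernel's structure. */
struct toy_rq_queue {
    int refcount;   /* number of holders (driver, block layer, ...) */
    int dead;       /* set once the device has gone away */
    int pending;    /* requests accepted but not yet completed */
};

/* Allocate a queue with one reference held by the caller. */
static struct toy_rq_queue *toy_rq_alloc(void)
{
    struct toy_rq_queue *q = calloc(1, sizeof(*q));
    if (q != NULL)
        q->refcount = 1;
    return q;
}

/* Take an additional reference. */
static void toy_rq_get(struct toy_rq_queue *q)
{
    assert(q->refcount > 0);
    q->refcount++;
}

/* Drop a reference; returns 1 if this call freed the queue, else 0. */
static int toy_rq_put(struct toy_rq_queue *q)
{
    assert(q->refcount > 0);
    if (--q->refcount > 0)
        return 0;
    free(q);
    return 1;
}

/* Device removal: the queue stays allocated but accepts no new work. */
static void toy_rq_mark_dead(struct toy_rq_queue *q)
{
    q->dead = 1;
}

/* Returns 0 if the request was accepted, -1 if the queue is dead. */
static int toy_rq_submit(struct toy_rq_queue *q)
{
    if (q->dead)
        return -1;
    q->pending++;
    return 0;
}
```

With this shape, whoever still holds a pointer to the queue after removal can
safely look at it and gets a clean failure, rather than dereferencing freed
memory.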

I suggest you take it up with Bart how best to solve this. He might even
already have patches.

-- 
Jens Axboe

