* [PATCH] genhd fix or ide workaround -- choose one
@ 2006-10-18 22:15 Anton Vorontsov
2006-10-19 12:05 ` Alan Cox
0 siblings, 1 reply; 6+ messages in thread
From: Anton Vorontsov @ 2006-10-18 22:15 UTC (permalink / raw)
To: kernel-discuss; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]
Hi all,
I've caught deadlock inside IDE layer using IDE-CS: after accessing
to IDE disk placed in PCMCIA (CF card really), it will never probe
again after pulling it from PCMCIA/CF.
The kernel is stuck at
drivers/ide/ide.c:ide_unregister():604:
602 spin_unlock_irq(&ide_lock);
603 device_unregister(&drive->gendev);
604 wait_for_completion(&drive->gendev_rel_comp);
605 spin_lock_irq(&ide_lock);
ide_unregister() assumes that device_unregister() will call
ide-probe.c:drive_release_dev():
1282 static void drive_release_dev (struct device *dev)
1283 {
.... [...]
1302 complete(&drive->gendev_rel_comp);
1303 }
1311 static void init_gendisk (ide_hwif_t *hwif)
1312 {
....
1323 drive->gendev.release = drive_release_dev;
....
1333 }
But release() function will not called because drive->gendev is still
referenced inside genhd layer.
I'm attaching two patches: the fix and the workaround. I assume that
first is a candidate to -mm, and workaround is a temporary solution.
There are two set of patches: for the Linus's git tree and for the
handhelds.org's 2.6.17-hh1 (prefixed with hh-).
Note: this is cross-posting, thus please keep To: and CC: headers, to
keep us all in sync.
Thanks,
-- Anton (irc: bd2)
[-- Attachment #2: genhd-fix.patch --]
[-- Type: text/plain, Size: 1524 bytes --]
From: Anton Vorontsov <cbou@mail.ru>
No need to get_device(driverfs_dev), because it's already gotten by
add_disk(). And no function will call put_device() second time.
This is the source of deadlock[1] in IDE layer. SCSI workaround it by
calling put_device() themself after put_disk(), thus it should reflect
this change by removing put_device().
[1] deadlock in ide_unregister()
at wait_for_completion(&drive->gendev_rel_comp) after
device_unregister(&drive->gendev). That happens by reason of
ide-probe.c:drive_release_dev() not called because of driverfs_dev
(which is drive->gendev) still referenced by add_disk().
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 6fb4b61..8371adf 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -378,7 +378,7 @@ static char *make_block_name(struct gend
static int disk_sysfs_symlinks(struct gendisk *disk)
{
- struct device *target = get_device(disk->driverfs_dev);
+ struct device *target = disk->driverfs_dev;
int err;
char *disk_name = NULL;
@@ -414,9 +414,8 @@ err_out_dev_link:
sysfs_remove_link(&disk->kobj, "device");
err_out_disk_name:
kfree(disk_name);
-err_out:
- put_device(target);
}
+err_out:
return err;
}
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 84ff203..03dca98 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1760,7 +1760,6 @@ static void scsi_disk_release(struct cla
disk->private_data = NULL;
put_disk(disk);
- put_device(&sdkp->device->sdev_gendev);
kfree(sdkp);
}
[-- Attachment #3: ide-genhd-workaround.patch --]
[-- Type: text/plain, Size: 1501 bytes --]
From: Anton Vorontsov <cbou@mail.ru>
Because add_disk() getting driverfs_dev twice but putting it only once,
ide-{disk,cd,floppy} should put_device() themselfs. This is workaround
for the bug found in genhd, which causes deadlock in ide_unregister()
at wait_for_completion(&drive->gendev_rel_comp) after
device_unregister(&drive->gendev). That happens by reason of
ide-probe.c:drive_release_dev() not called because of driverfs_dev
(which is drive->gendev) still referenced by add_disk().
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index bddfebd..59121a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -3307,6 +3307,7 @@ static void ide_cd_release(struct kref *
blk_queue_prep_rq(drive->queue, NULL);
g->private_data = NULL;
put_disk(g);
+ put_device(&drive->gendev);
kfree(info);
}
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index 0a05a37..d5d02a1 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -1020,6 +1020,7 @@ static void ide_disk_release(struct kref
drive->driver_data = NULL;
g->private_data = NULL;
put_disk(g);
+ put_device(&drive->gendev);
kfree(idkp);
}
diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
index 8ccee9c..7a60d8c 100644
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -1887,6 +1887,7 @@ static void ide_floppy_release(struct kr
drive->driver_data = NULL;
g->private_data = NULL;
put_disk(g);
+ put_device(&drive->gendev);
kfree(floppy);
}
[-- Attachment #4: hh-genhd-fix.patch --]
[-- Type: text/plain, Size: 1759 bytes --]
From: Anton Vorontsov <cbou@mail.ru>
No need to get_device(driverfs_dev), because it's already gotten by
add_disk(). And no function will call put_device() second time.
This is the source of deadlock[1] in IDE layer. SCSI workaround it by
calling put_device() themself after put_disk(), thus it should reflect
this change by removing put_device().
[1] deadlock in ide_unregister()
at wait_for_completion(&drive->gendev_rel_comp) after
device_unregister(&drive->gendev). That happens by reason of
ide-probe.c:drive_release_dev() not called because of driverfs_dev
(which is drive->gendev) still referenced by add_disk().
Index: fs/partitions/check.c
===================================================================
RCS file: /cvs/linux/kernel26/fs/partitions/check.c,v
retrieving revision 1.17
diff -u -p -b -B -r1.17 check.c
--- fs/partitions/check.c 23 Aug 2006 18:40:39 -0000 1.17
+++ fs/partitions/check.c 18 Oct 2006 21:13:26 -0000
@@ -389,7 +389,7 @@ static char *make_block_name(struct gend
static void disk_sysfs_symlinks(struct gendisk *disk)
{
- struct device *target = get_device(disk->driverfs_dev);
+ struct device *target = disk->driverfs_dev;
if (target) {
char *disk_name = make_block_name(disk);
sysfs_create_link(&disk->kobj,&target->kobj,"device");
Index: drivers/scsi/sd.c
===================================================================
RCS file: /cvs/linux/kernel26/drivers/scsi/sd.c,v
retrieving revision 1.18
diff -u -p -b -B -r1.18 sd.c
--- drivers/scsi/sd.c 23 Aug 2006 18:38:06 -0000 1.18
+++ drivers/scsi/sd.c 18 Oct 2006 21:14:54 -0000
@@ -1756,7 +1756,6 @@ static void scsi_disk_release(struct cla
disk->private_data = NULL;
put_disk(disk);
- put_device(&sdkp->device->sdev_gendev);
kfree(sdkp);
}
[-- Attachment #5: hh-ide-genhd-workaround.patch --]
[-- Type: text/plain, Size: 2034 bytes --]
From: Anton Vorontsov <cbou@mail.ru>
Because add_disk() getting driverfs_dev twice but putting it only once,
ide-{disk,cd,floppy} should put_device() themselfs. This is workaround
for the bug found in genhd, which causes deadlock in ide_unregister()
at wait_for_completion(&drive->gendev_rel_comp) after
device_unregister(&drive->gendev). That happens by reason of
ide-probe.c:drive_release_dev() not called because of driverfs_dev
(which is drive->gendev) still referenced by add_disk().
Index: drivers/ide/ide-cd.c
===================================================================
RCS file: /cvs/linux/kernel26/drivers/ide/ide-cd.c,v
retrieving revision 1.21
diff -u -p -b -B -r1.21 ide-cd.c
--- drivers/ide/ide-cd.c 23 Aug 2006 18:35:05 -0000 1.21
+++ drivers/ide/ide-cd.c 18 Oct 2006 21:18:52 -0000
@@ -3240,6 +3240,7 @@ static void ide_cd_release(struct kref *
blk_queue_prep_rq(drive->queue, NULL);
g->private_data = NULL;
put_disk(g);
+ put_device(&drive->gendev);
kfree(info);
}
Index: drivers/ide/ide-disk.c
===================================================================
RCS file: /cvs/linux/kernel26/drivers/ide/ide-disk.c,v
retrieving revision 1.19
diff -u -p -b -B -r1.19 ide-disk.c
--- drivers/ide/ide-disk.c 23 Aug 2006 18:35:05 -0000 1.19
+++ drivers/ide/ide-disk.c 18 Oct 2006 21:18:52 -0000
@@ -1021,6 +1021,7 @@ static void ide_disk_release(struct kref
drive->devfs_name[0] = '\0';
g->private_data = NULL;
put_disk(g);
+ put_device(&drive->gendev);
kfree(idkp);
}
Index: drivers/ide/ide-floppy.c
===================================================================
RCS file: /cvs/linux/kernel26/drivers/ide/ide-floppy.c,v
retrieving revision 1.16
diff -u -p -b -B -r1.16 ide-floppy.c
--- drivers/ide/ide-floppy.c 23 Aug 2006 18:35:05 -0000 1.16
+++ drivers/ide/ide-floppy.c 18 Oct 2006 21:18:52 -0000
@@ -1893,6 +1893,7 @@ static void ide_floppy_release(struct kr
drive->driver_data = NULL;
g->private_data = NULL;
put_disk(g);
+ put_device(&drive->gendev);
kfree(floppy);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] genhd fix or ide workaround -- choose one
2006-10-18 22:15 [PATCH] genhd fix or ide workaround -- choose one Anton Vorontsov
@ 2006-10-19 12:05 ` Alan Cox
2006-10-19 12:25 ` Anton Vorontsov
0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2006-10-19 12:05 UTC (permalink / raw)
To: cbou; +Cc: kernel-discuss, linux-kernel
Ar Iau, 2006-10-19 am 02:15 +0400, ysgrifennodd Anton Vorontsov:
> Hi all,
>
> I've caught deadlock inside IDE layer using IDE-CS: after accessing
> to IDE disk placed in PCMCIA (CF card really), it will never probe
> again after pulling it from PCMCIA/CF.
Works for me, and has done for a long time and I've also had no other
reports of this in the past couple of years so I'm curious what trips it
in your case ?
> The kernel is stuck at
> drivers/ide/ide.c:ide_unregister():604:
>
> 602 spin_unlock_irq(&ide_lock);
> 603 device_unregister(&drive->gendev);
> 604 wait_for_completion(&drive->gendev_rel_comp);
> 605 spin_lock_irq(&ide_lock);
We need to know all the references have gone away, if someone has a disk
referenced and you pull it out we can't clean up immediately so this is
the right place to get stuck both on a refcounting bug and if you've got
something holding a reference for real (eg HAL)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] genhd fix or ide workaround -- choose one
2006-10-19 12:05 ` Alan Cox
@ 2006-10-19 12:25 ` Anton Vorontsov
2006-10-19 13:08 ` Alan Cox
0 siblings, 1 reply; 6+ messages in thread
From: Anton Vorontsov @ 2006-10-19 12:25 UTC (permalink / raw)
To: Alan Cox; +Cc: kernel-discuss, linux-kernel
On Thu, Oct 19, 2006 at 01:05:52PM +0100, Alan Cox wrote:
> Ar Iau, 2006-10-19 am 02:15 +0400, ysgrifennodd Anton Vorontsov:
> > Hi all,
> >
> > I've caught deadlock inside IDE layer using IDE-CS: after accessing
> > to IDE disk placed in PCMCIA (CF card really), it will never probe
> > again after pulling it from PCMCIA/CF.
>
> Works for me, and has done for a long time and I've also had no other
> reports of this in the past couple of years so I'm curious what trips it
> in your case ?
It just happens every time on HP iPaq hx4700. hx4700 have internal CF
slot, which is working via pxa2xx pcmcia driver.
Have you tried it recently or about six months ago? Because at that time
it was working for me too, but some kernel change
(I suppose semaphores->completion conversion) unearthed that bug, but I'm
not sure.
> > The kernel is stuck at
> > drivers/ide/ide.c:ide_unregister():604:
> >
> > 602 spin_unlock_irq(&ide_lock);
> > 603 device_unregister(&drive->gendev);
> > 604 wait_for_completion(&drive->gendev_rel_comp);
> > 605 spin_lock_irq(&ide_lock);
>
> We need to know all the references have gone away, if someone has a disk
> referenced and you pull it out we can't clean up immediately so this is
> the right place to get stuck both on a refcounting bug and if you've got
> something holding a reference for real (eg HAL)
No HAL used. No udev used. Bare kernel + static /dev + shell. CF is not
even mounted once.
Have you read comments inside -fix patch? Imho it's obvious that nobody
putting driverfs_device second time, but got it twice.
Thanks,
-- Anton (irc: bd2)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] genhd fix or ide workaround -- choose one
2006-10-19 12:25 ` Anton Vorontsov
@ 2006-10-19 13:08 ` Alan Cox
2006-10-19 13:33 ` Anton Vorontsov
0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2006-10-19 13:08 UTC (permalink / raw)
To: cbou; +Cc: kernel-discuss, linux-kernel
Ar Iau, 2006-10-19 am 16:25 +0400, ysgrifennodd Anton Vorontsov:
> It just happens every time on HP iPaq hx4700. hx4700 have internal CF
> slot, which is working via pxa2xx pcmcia driver.
I can't duplicate this with the ide_cs driver and a laptop.
> Have you read comments inside -fix patch? Imho it's obvious that nobody
> putting driverfs_device second time, but got it twice.
Its also obvious that it currently works on millions of PC systems and
that also needs explaining before any change is made.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] genhd fix or ide workaround -- choose one
2006-10-19 13:08 ` Alan Cox
@ 2006-10-19 13:33 ` Anton Vorontsov
2006-10-19 14:16 ` Anton Vorontsov
0 siblings, 1 reply; 6+ messages in thread
From: Anton Vorontsov @ 2006-10-19 13:33 UTC (permalink / raw)
To: Alan Cox; +Cc: kernel-discuss, linux-kernel
On Thu, Oct 19, 2006 at 02:08:32PM +0100, Alan Cox wrote:
> Ar Iau, 2006-10-19 am 16:25 +0400, ysgrifennodd Anton Vorontsov:
> > It just happens every time on HP iPaq hx4700. hx4700 have internal CF
> > slot, which is working via pxa2xx pcmcia driver.
>
> I can't duplicate this with the ide_cs driver and a laptop.
>
> > Have you read comments inside -fix patch? Imho it's obvious that nobody
> > putting driverfs_device second time, but got it twice.
>
> Its also obvious that it currently works on millions of PC systems and
> that also needs explaining before any change is made.
You're right, it needs explanation. Unfortunately I don't have any other
PCMCIAable devices to find it out. :-/ Though, I'll try to find answers
in the code.
-- Anton (irc: bd2)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] genhd fix or ide workaround -- choose one
2006-10-19 13:33 ` Anton Vorontsov
@ 2006-10-19 14:16 ` Anton Vorontsov
0 siblings, 0 replies; 6+ messages in thread
From: Anton Vorontsov @ 2006-10-19 14:16 UTC (permalink / raw)
To: Alan Cox; +Cc: kernel-discuss, linux-kernel
On Thu, Oct 19, 2006 at 05:33:07PM +0400, Anton Vorontsov wrote:
> On Thu, Oct 19, 2006 at 02:08:32PM +0100, Alan Cox wrote:
> > Ar Iau, 2006-10-19 am 16:25 +0400, ysgrifennodd Anton Vorontsov:
> > > It just happens every time on HP iPaq hx4700. hx4700 have internal CF
> > > slot, which is working via pxa2xx pcmcia driver.
> >
> > I can't duplicate this with the ide_cs driver and a laptop.
> >
> > > Have you read comments inside -fix patch? Imho it's obvious that nobody
> > > putting driverfs_device second time, but got it twice.
> >
> > Its also obvious that it currently works on millions of PC systems and
> > that also needs explaining before any change is made.
>
> You're right, it needs explanation. Unfortunately I don't have any other
> PCMCIAable devices to find it out. :-/ Though, I'll try to find answers
> in the code.
Okay. Wild guess: you're using 32 bit CardBus (yenta socket?), but hx4700
is using 16 bit PCMCIA (drivers/pcmcia/ds.c). And indeed ds.c calling
device_unregister() which triggers that sequece:
ide_cs.c:ide_detach()
ide_cs.c:ide_release()
ide.c:ide_unregister() <- hang here
I've grep'ed drivers/pcmcia/ for the device_unregister and seems nobody
calling it except drivers/pcmcia/ds.c.
-- Anton (irc: bd2)
p.s. drivers/pcmcia/cardbus.c states:
* cardbus.c -- 16-bit PCMCIA core support
typo?
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-10-19 14:15 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-18 22:15 [PATCH] genhd fix or ide workaround -- choose one Anton Vorontsov
2006-10-19 12:05 ` Alan Cox
2006-10-19 12:25 ` Anton Vorontsov
2006-10-19 13:08 ` Alan Cox
2006-10-19 13:33 ` Anton Vorontsov
2006-10-19 14:16 ` Anton Vorontsov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox