linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* blk-tag.c: 89 BUG() triggering + initial analysis
@ 2013-11-14 11:09 Hans de Goede
  2013-12-13 20:32 ` James Bottomley
  0 siblings, 1 reply; 2+ messages in thread
From: Hans de Goede @ 2013-11-14 11:09 UTC (permalink / raw)
  To: SCSI development list; +Cc: Hans de Goede, Sarah Sharp

Hi All,

I hope linux-scsi is the right list for this, if not let me know.

I've been working on getting the uas (Usb Attached Scsi) driver into
working shape for the last 3 weeks, so that it can be enabled in 3.14 .

My latest tests where performed on top of 3.11 + a bunch of xhci and
of course uas fixes.

I can reliable trigger the BUG() in blk-tag.c line 89:

void blk_free_tags(struct blk_queue_tag *bqt)
{
         if (unlikely(!__blk_free_tags(bqt)))
                 BUG();
}

I believe this is not an uas driver bug, but rather a bug which
any scsi host which uses scsi_init_shared_tag_map() can trigger,
which is likely not seen before because almost no hosts actually
use scsi_init_shared_tag_map().

The above test triggering the BUG() assumes that blk_free_tags()
caller holds the last reference to the bqt. For scsi hosts using
scsi_init_shared_tag_map() this assumes that the release of the
block_queue through blk-sysfs.c: blk_release_queue() happens before
the release of the host through scsi/hosts.c: scsi_host_dev_release()

I've added some strategic debug printk-s to debug this problem
(and removed the BUG()) and in some cases this is not true.

Here is the output of my debug scripts on a normal unplug of
the uas usb-device:

[ 7678.202540] blk-sysfs.c: blk_release_queue queue_tags ffff88022d4d59e0
[ 7678.202551] blk-tag.c: __blk_queue_free_tags bqt ffff88022d4d59e0
[ 7678.202553] blk-tag.c: __blk_free_tags refcnt before dec: 2
[ 7678.202626] scsi/hosts.c: scsi_host_dev_release bqt ffff88022d4d59e0
[ 7678.202654] blk-tag.c: blk_free_tags bqt ffff88022d4d59e0
[ 7678.202655] blk-tag.c: __blk_free_tags refcnt before dec: 1
[ 7678.202657] blk-tag.c: __blk_free_tags free-ed: ffff88022d4d59e0

Which does not trigger the BUG().

If however I do the following:
1) plug in uas usb-device
2) let udisks auto-mount it under:
    /run/media/hans/4e82585c-3c40-48ac-81ad-11d2a7bad0fc
3) cd into that dir to keep it busy
4) unplug
5) cd out of the directory, at which points udisks will umount it


Then with an unpatched kernel I hit the BUG() at step 5, and with
a kernel with the BUG() removed I get the following debug trace:

[ 9089.808021] scsi/hosts.c: scsi_host_dev_release bqt ffff88022c02ae40
[ 9089.808040] blk-tag.c: blk_free_tags bqt ffff88022c02ae40
[ 9089.808041] blk-tag.c: __blk_free_tags refcnt before dec: 2
[ 9089.808046] blk-sysfs.c: blk_release_queue queue_tags ffff88022c02ae40
[ 9089.808057] blk-tag.c: __blk_queue_free_tags bqt ffff88022c02ae40
[ 9089.808058] blk-tag.c: __blk_free_tags refcnt before dec: 1
[ 9089.808059] blk-tag.c: __blk_free_tags free-ed: ffff88022c02ae40

Notice how in this case scsi_host_dev_release() runs before
blk_release_queue(), breaking the assumption the BUG() tests for.

I think this may be caused by userspace holding a reference to the
kobj which has blk_release_queue as release callback when doing the
umount. But I simply don't know the code in question well enough to do
a more detailed analysis of the problem.

A naive fix, which seems to work, would be to simply remove the BUG()
but I'm not sure if that is the right solution...

Regards,

Hans


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: blk-tag.c: 89 BUG() triggering + initial analysis
  2013-11-14 11:09 blk-tag.c: 89 BUG() triggering + initial analysis Hans de Goede
@ 2013-12-13 20:32 ` James Bottomley
  0 siblings, 0 replies; 2+ messages in thread
From: James Bottomley @ 2013-12-13 20:32 UTC (permalink / raw)
  To: Hans de Goede; +Cc: SCSI development list, Sarah Sharp

On Thu, 2013-11-14 at 12:09 +0100, Hans de Goede wrote:
> Hi All,
> 
> I hope linux-scsi is the right list for this, if not let me know.
> 
> I've been working on getting the uas (Usb Attached Scsi) driver into
> working shape for the last 3 weeks, so that it can be enabled in 3.14 .
> 
> My latest tests where performed on top of 3.11 + a bunch of xhci and
> of course uas fixes.
> 
> I can reliable trigger the BUG() in blk-tag.c line 89:
> 
> void blk_free_tags(struct blk_queue_tag *bqt)
> {
>          if (unlikely(!__blk_free_tags(bqt)))
>                  BUG();
> }
> 
> I believe this is not an uas driver bug, but rather a bug which
> any scsi host which uses scsi_init_shared_tag_map() can trigger,
> which is likely not seen before because almost no hosts actually
> use scsi_init_shared_tag_map().

Um, well, no, stex, qla4xxx and fnic do.  Of those, I think fnic and
qla4xxx have quite large install bases.

> The above test triggering the BUG() assumes that blk_free_tags()
> caller holds the last reference to the bqt. For scsi hosts using
> scsi_init_shared_tag_map() this assumes that the release of the
> block_queue through blk-sysfs.c: blk_release_queue() happens before
> the release of the host through scsi/hosts.c: scsi_host_dev_release()

The devices all hold parent references to the host.  The last action of
a device release is a put on its parent (this is after the last put of
the device queue).  The host release function shouldn't be called until
every device has released its reference ... unless something has gone
wrong with the device model, of course.

> I've added some strategic debug printk-s to debug this problem
> (and removed the BUG()) and in some cases this is not true.
> 
> Here is the output of my debug scripts on a normal unplug of
> the uas usb-device:
> 
> [ 7678.202540] blk-sysfs.c: blk_release_queue queue_tags ffff88022d4d59e0
> [ 7678.202551] blk-tag.c: __blk_queue_free_tags bqt ffff88022d4d59e0
> [ 7678.202553] blk-tag.c: __blk_free_tags refcnt before dec: 2
> [ 7678.202626] scsi/hosts.c: scsi_host_dev_release bqt ffff88022d4d59e0
> [ 7678.202654] blk-tag.c: blk_free_tags bqt ffff88022d4d59e0
> [ 7678.202655] blk-tag.c: __blk_free_tags refcnt before dec: 1
> [ 7678.202657] blk-tag.c: __blk_free_tags free-ed: ffff88022d4d59e0
> 
> Which does not trigger the BUG().
> 
> If however I do the following:
> 1) plug in uas usb-device
> 2) let udisks auto-mount it under:
>     /run/media/hans/4e82585c-3c40-48ac-81ad-11d2a7bad0fc
> 3) cd into that dir to keep it busy
> 4) unplug
> 5) cd out of the directory, at which points udisks will umount it
> 
> 
> Then with an unpatched kernel I hit the BUG() at step 5, and with
> a kernel with the BUG() removed I get the following debug trace:
> 
> [ 9089.808021] scsi/hosts.c: scsi_host_dev_release bqt ffff88022c02ae40
> [ 9089.808040] blk-tag.c: blk_free_tags bqt ffff88022c02ae40
> [ 9089.808041] blk-tag.c: __blk_free_tags refcnt before dec: 2
> [ 9089.808046] blk-sysfs.c: blk_release_queue queue_tags ffff88022c02ae40
> [ 9089.808057] blk-tag.c: __blk_queue_free_tags bqt ffff88022c02ae40
> [ 9089.808058] blk-tag.c: __blk_free_tags refcnt before dec: 1
> [ 9089.808059] blk-tag.c: __blk_free_tags free-ed: ffff88022c02ae40
> 
> Notice how in this case scsi_host_dev_release() runs before
> blk_release_queue(), breaking the assumption the BUG() tests for.
> 
> I think this may be caused by userspace holding a reference to the
> kobj which has blk_release_queue as release callback when doing the
> umount. But I simply don't know the code in question well enough to do
> a more detailed analysis of the problem.
> 
> A naive fix, which seems to work, would be to simply remove the BUG()
> but I'm not sure if that is the right solution...

So this looks like an induced failure caused by the filesystem holding a
queue reference which you force by holding the directory, meaning that
the blk_queue_put in the sdev release function isn't the last put.  The
fix for this is probably to release the tags once the queue is dead and
can accept no I/O.  That would be this, does that fix it?

This isn't the whole fix, we can likely now remove the queue free from
last put as well ... I'm just not sure if there's any block devices that
don't call blk_cleanup_queue, which keeping it in would catch.

James

---

diff --git a/block/blk-core.c b/block/blk-core.c
index 8bdd012..03c0bf9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -506,6 +506,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
 	blk_sync_queue(q);
 
+	__blk_queue_free_tags(q);
+
 	spin_lock_irq(lock);
 	if (q->queue_lock != &q->__queue_lock)
 		q->queue_lock = &q->__queue_lock;



^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2013-12-13 20:32 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-14 11:09 blk-tag.c: 89 BUG() triggering + initial analysis Hans de Goede
2013-12-13 20:32 ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).