Re: UAS hangs khubd on USB disconnect

linux-scsi.vger.kernel.org archive mirror

* Re: UAS hangs khubd on USB disconnect
       [not found] <20131212000715.GA3181@xanatos>
@ 2013-12-12 13:13 ` Hans de Goede
  2013-12-12 22:04 ` [usb-storage] " Alan Stern
  1 sibling, 0 replies; 26+ messages in thread
From: Hans de Goede @ 2013-12-12 13:13 UTC (permalink / raw)
  To: Sarah Sharp; +Cc: linux-usb, linux-scsi, USB Storage List

Hi,

On 12/12/2013 01:07 AM, Sarah Sharp wrote:
> Hi Hans,
>
> I've been testing the UAS code you sent a pull request for against
> 3.13-rc1, and I've run into a rather nasty issue with USB disconnect.
>
> I ran some tests with a USB 3.0 storage device under xHCI.  The disk has
> three 10GB partitions: ext3 (sdb1), ext4 (sdb2), and fat32 (sdb4).
> There was a btrfs partition on sdb3, but I deleted it.
>
> If I start to play a movie on the ext4 partition, and then yank the USB
> cable, the uas driver is unbound from the device.  It looks like
> something goes wrong in the SCSI layer shortly after that, causing an
> oops in sysfs_remove_group().
>
> If I plug in the device again, the uas driver loads and sdb is
> recognized, but the partitions aren't mounted.  If I disconnect the
> device, the USB core hub thread (khubd) tries to unbind the interface,
> and the uas driver's disconnect function hangs.  At that point, USB
> device hotplug events on any port (including EHCI ports) are not recognized,
> because khubd is hung.  Submitting and completing URBs still work, since
> that doesn't involve khubd.
>
> The end result is that disconnecting the UAS device causes USB hotplug
> to be lost on unrelated ports, while other USB devices that were
> attached at the time of the disconnect still work.
>
> I can reproduce this behavior when the UAS device is attached to the
> EHCI port only, so it's not an xHCI specific bug.
>
> If I use the "no UAS" quirk to make the usb-storage driver bind to the
> device, I can trigger the oops from the SCSI layer, but the partitions
> are always mounted and khubd doesn't hang.

Hmm, interesting. I have experienced a similar bug myself during testing,
see: http://www.spinics.net/lists/linux-scsi/msg70002.html

I haven't received any useful reply to that report yet.

It seems that the SCSI subsystem and/or the sd driver does not tear
things down in the proper order when a disk is still in use while the
SCSI host it is attached to gets removed.

As you can see in the reproduction instructions in my mail, the BUG
does not happen until I cd out of the mountpoint. At that point the
kernel tries to clean things up, and the cleanup happens in a different
order than it does when the disk is unplugged while the mountpoint is
not busy.

Regards,

Hans

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [usb-storage] UAS hangs khubd on USB disconnect
       [not found] <20131212000715.GA3181@xanatos>
  2013-12-12 13:13 ` UAS hangs khubd on USB disconnect Hans de Goede
@ 2013-12-12 22:04 ` Alan Stern
       [not found]   ` <Pine.LNX.4.44L0.1312121632470.849-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-12 22:04 UTC (permalink / raw)
  To: Sarah Sharp, Tejun Heo, James Bottomley
  Cc: Hans de Goede, USB list, SCSI development list, USB Storage List

On Wed, 11 Dec 2013, Sarah Sharp wrote:

> Hi Hans,
> 
> I've been testing the UAS code you sent a pull request for against
> 3.13-rc1, and I've run into a rather nasty issue with USB disconnect.
> 
> I ran some tests with a USB 3.0 storage device under xHCI.  The disk has
> three 10GB partitions: ext3 (sdb1), ext4 (sdb2), and fat32 (sdb4).
> There was a btrfs partition on sdb3, but I deleted it.
> 
> If I start to play a movie on the ext4 partition, and then yank the USB
> cable, the uas driver is unbound from the device.  It looks like
> something goes wrong in the SCSI layer shortly after that, causing an
> oops in sysfs_remove_group().

I did a little testing.  It turns out this WARN (not an oops) is the 
result of recent changes to sysfs, combined with the peculiar way the 
SCSI layer handles targets.

In the new kernel, when you call device_del for some object, the 
object's directory and everything beneath it get removed from sysfs.  
This wasn't true in the past.

When a USB drive is unplugged, almost everything below it gets
unregistered.  But not the SCSI target -- it remains registered until
the number of "reap references" drops to 0.  This doesn't happen until
all the devices beneath it are released, which happens when all the
open file references are closed and the filesystem is unmounted.

So scsi_target_reap_usercontext ends up calling device_del for the
target after everything else has been removed from sysfs.  As part of
normal device_del processing, attribute groups get removed.  In
particular the power/ subdirectory is removed from the target's sysfs
directory.  But the sysfs directories are long gone by this time, so
sysfs_remove_group complains that it was asked to remove a non-existent
subdirectory.
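To make the failure mode above concrete, here is a toy userspace model in C. It is a sketch under stated assumptions, not sysfs code: add_node(), del_subtree() and remove_group() are hypothetical names. del_subtree() mimics the newer behaviour in which deleting an object removes its whole directory tree, and remove_group() counts a warning (instead of calling WARN()) when the group directory, power/ here, is already gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model of a sysfs-like tree; not the kernel API. */
#define MAX_NODES 16

struct node {
	const char *name;
	int parent;	/* index of the parent node, -1 for a root */
	bool present;	/* still visible in the tree? */
};

static struct node tree[MAX_NODES];
static int nr_nodes;
static int warnings;	/* "asked to remove a non-existent directory" reports */

static int add_node(const char *name, int parent)
{
	assert(nr_nodes < MAX_NODES);
	tree[nr_nodes] = (struct node){ name, parent, true };
	return nr_nodes++;
}

/* Deleting a node hides everything beneath it, recursively. */
static void del_subtree(int idx)
{
	tree[idx].present = false;
	for (int i = 0; i < nr_nodes; i++)
		if (tree[i].parent == idx)
			del_subtree(i);
}

/* Late removal of a group directory that is already gone: just report it. */
static void remove_group(int idx)
{
	if (!tree[idx].present) {
		warnings++;
		fprintf(stderr, "WARN: %s already removed\n", tree[idx].name);
		return;
	}
	tree[idx].present = false;
}
```

With a host0 -> target0:0:0 -> power chain, calling del_subtree() on the host at unplug time and remove_group() on power later (standing in for the late target teardown) produces exactly one warning, while nothing is actually left in an inconsistent state, which matches the "harmless WARN" reading.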

Given the way things work now, I suspect these warnings are truly 
harmless.  We could simply get rid of the WARN in sysfs_remove_group.

The alternative is to call device_del for SCSI targets earlier on, such 
as when their hosts are unregistered.  I don't know how James would 
feel about this approach.  It would be difficult because targets use 
their own reference counts instead of relying on the usual device 
refcounting mechanism.

Alan Stern


* Re: [usb-storage] UAS hangs khubd on USB disconnect
       [not found]   ` <Pine.LNX.4.44L0.1312121632470.849-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2013-12-13 18:09     ` Sarah Sharp
  2013-12-13 18:19       ` Alan Stern
  0 siblings, 1 reply; 26+ messages in thread
From: Sarah Sharp @ 2013-12-13 18:09 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, James Bottomley, Hans de Goede, USB list,
	SCSI development list, USB Storage List

On Thu, Dec 12, 2013 at 05:04:31PM -0500, Alan Stern wrote:
> On Wed, 11 Dec 2013, Sarah Sharp wrote:
> 
> > Hi Hans,
> > 
> > I've been testing the UAS code you sent a pull request for against
> > 3.13-rc1, and I've run into a rather nasty issue with USB disconnect.
> > 
> > I ran some tests with a USB 3.0 storage device under xHCI.  The disk has
> > three 10GB partitions: ext3 (sdb1), ext4 (sdb2), and fat32 (sdb4).
> > There was a btrfs partition on sdb3, but I deleted it.
> > 
> > If I start to play a movie on the ext4 partition, and then yank the USB
> > cable, the uas driver is unbound from the device.  It looks like
> > something goes wrong in the SCSI layer shortly after that, causing an
> > oops in sysfs_remove_group().
> 
> I did a little testing.  It turns out this WARN (not an oops) is the 
> result of recent changes to sysfs, combined with the peculiar way the 
> SCSI layer handles targets.
> 
> In the new kernel, when you call device_del for some object, the 
> object's directory and everything beneath it get removed from sysfs.  
> This wasn't true in the past.
> 
> When a USB drive is unplugged, almost everything below it gets
> unregistered.  But not the SCSI target -- it remains registered until
> the number of "reap references" drops to 0.  This doesn't happen until
> all the devices beneath it are released, which happens when all the
> open file references are closed and the filesystem is unmounted.
> 
> So scsi_target_reap_usercontext ends up calling device_del for the
> target after everything else has been removed from sysfs.  As part of
> normal device_del processing, attribute groups get removed.  In
> particular the power/ subdirectory is removed from the target's sysfs
> directory.  But the sysfs directories are long gone by this time, so
> sysfs_remove_group complains that it was asked to remove a non-existent
> subdirectory.
> 
> Given the way things work now, I suspect these warnings are truly 
> harmless.  We could simply get rid of the WARN in sysfs_remove_group.
> 
> The alternative is to call device_del for SCSI targets earlier on, such 
> as when their hosts are unregistered.  I don't know how James would 
> feel about this approach.  It would be difficult because targets use 
> their own reference counts instead of relying on the usual device 
> refcounting mechanism.

Thanks for looking into this.  I think just getting rid of the WARN
would be sufficient.  Can you make a patch for that?

The patch still won't help with the UAS issues with
scsi_init_shared_tag_map though.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-13 18:09     ` Sarah Sharp
@ 2013-12-13 18:19       ` Alan Stern
       [not found]         ` <Pine.LNX.4.44L0.1312131316470.1185-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-13 18:19 UTC (permalink / raw)
  To: Sarah Sharp
  Cc: Tejun Heo, James Bottomley, Hans de Goede, USB list,
	SCSI development list, USB Storage List

On Fri, 13 Dec 2013, Sarah Sharp wrote:

> > Given the way things work now, I suspect these warnings are truly 
> > harmless.  We could simply get rid of the WARN in sysfs_remove_group.
> > 
> > The alternative is to call device_del for SCSI targets earlier on, such 
> > as when their hosts are unregistered.  I don't know how James would 
> > feel about this approach.  It would be difficult because targets use 
> > their own reference counts instead of relying on the usual device 
> > refcounting mechanism.
> 
> Thanks for looking into this.  I think just getting rid of the WARN
> would be sufficient.  Can you make a patch for that?

Easily.  The downside is that there would no longer be any warning 
when someone tries to remove a wrong subdirectory by mistake.

> The patch still won't help with the UAS issues with
> scsi_init_shared_tag_map though.

I wasn't clear on the reason for that problem.  Does it also arise from 
late device_del for scsi_target?  I could try to change the way that 
works, if anybody (Hans?) would like to test it.

Alan Stern



* Re: [usb-storage] UAS hangs khubd on USB disconnect
       [not found]         ` <Pine.LNX.4.44L0.1312131316470.1185-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2013-12-13 18:33           ` Tejun Heo
  2013-12-13 19:18             ` James Bottomley
  2013-12-13 19:07           ` Sarah Sharp
  1 sibling, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2013-12-13 18:33 UTC (permalink / raw)
  To: Alan Stern
  Cc: Sarah Sharp, James Bottomley, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

Hello, guys.

(cc'ing Greg)

On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
> On Fri, 13 Dec 2013, Sarah Sharp wrote:
> 
> > > Given the way things work now, I suspect these warnings are truly 
> > > harmless.  We could simply get rid of the WARN in sysfs_remove_group.
> > > 
> > > The alternative is to call device_del for SCSI targets earlier on, such 
> > > as when their hosts are unregistered.  I don't know how James would 
> > > feel about this approach.  It would be difficult because targets use 
> > > their own reference counts instead of relying on the usual device 
> > > refcounting mechanism.
> > 
> > Thanks for looking into this.  I think just getting rid of the WARN
> > would be sufficient.  Can you make a patch for that?
> 
> Easily.  The downside is that there would no longer be any warning 
> when someone tries to remove a wrong subdirectory by mistake.
> 
> > The patch still won't help with the UAS issues with
> > scsi_init_shared_tag_map though.
> 
> I wasn't clear on the reason for that problem.  Does it also arise from 
> late device_del for scsi_target?  I could try to change the way that 
> works, if anybody (Hans?) would like to test it.

While the recent sysfs changes made this issue more visible, Greg
wants to make sure that devices are removed from leaf up in all cases
and keep the warning to ensure that.  Would there be a way to fix SCSI
removal ordering?
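The invariant Greg wants can be phrased as a check over a removal log: no object may be removed while any of its children is still present. A minimal sketch in plain C follows; removal_is_leaf_up() and the parent-index encoding are hypothetical and are not part of the driver core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical checker.  parent[i] is the index of node i's parent (-1 for
 * a root); order[] lists node indices in the order they were removed.
 * Returns true if every node was removed only after all of its children.
 */
static bool removal_is_leaf_up(const int *parent, size_t nr_nodes,
			       const int *order, size_t nr_removed)
{
	bool removed[64] = { false };

	assert(nr_nodes <= 64);
	for (size_t k = 0; k < nr_removed; k++) {
		int victim = order[k];

		assert(victim >= 0 && (size_t)victim < nr_nodes);
		/* A child of the victim that is still present breaks leaf-up. */
		for (size_t i = 0; i < nr_nodes; i++)
			if (parent[i] == victim && !removed[i])
				return false;
		removed[victim] = true;
	}
	return true;
}
```

For a host (0), target (1), scsi_device (2) chain, the order 2, 1, 0 passes, while 2, 0, 1 (host removed before its target) fails.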

Thanks.

-- 
tejun

* Re: [usb-storage] UAS hangs khubd on USB disconnect
       [not found]         ` <Pine.LNX.4.44L0.1312131316470.1185-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
  2013-12-13 18:33           ` Tejun Heo
@ 2013-12-13 19:07           ` Sarah Sharp
  1 sibling, 0 replies; 26+ messages in thread
From: Sarah Sharp @ 2013-12-13 19:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, James Bottomley, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Jens Axboe

On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
> On Fri, 13 Dec 2013, Sarah Sharp wrote:
> 
> > > Given the way things work now, I suspect these warnings are truly 
> > > harmless.  We could simply get rid of the WARN in sysfs_remove_group.
> > > 
> > > The alternative is to call device_del for SCSI targets earlier on, such 
> > > as when their hosts are unregistered.  I don't know how James would 
> > > feel about this approach.  It would be difficult because targets use 
> > > their own reference counts instead of relying on the usual device 
> > > refcounting mechanism.
> > 
> > Thanks for looking into this.  I think just getting rid of the WARN
> > would be sufficient.  Can you make a patch for that?
> 
> Easily.  The downside is that there would no longer be any warning 
> when someone tries to remove a wrong subdirectory by mistake.
> 
> > The patch still won't help with the UAS issues with
> > scsi_init_shared_tag_map though.
> 
> I wasn't clear on the reason for that problem.  Does it also arise from 
> late device_del for scsi_target?  I could try to change the way that 
> works, if anybody (Hans?) would like to test it.

I can certainly test it with my UAS device as well.  I don't know if the
issue arises from the late device_del.  Looking at Hans' stack trace,
the BUG in blk_free_tags gets triggered when the scsi_host is released
before the block request queue is released.  So I don't think moving the scsi_target
delete sooner would help?  I really don't know anything about the SCSI
or block layer though.
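The ordering problem Sarah describes can be illustrated with a simplified model of a reference-counted tag map shared between a host and its request queues. This is a sketch under assumptions: struct tag_map, tag_map_get(), tag_map_put() and host_release() are hypothetical stand-ins, not the block layer's structures. Following the description above, the host's release path expects its put to be the last one (blk_free_tags() BUG()s otherwise); here that case is returned as -1 so it can be observed instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tag map shared by a host and its queues. */
struct tag_map {
	int refcnt;
};

static struct tag_map *tag_map_new(void)
{
	struct tag_map *map = malloc(sizeof(*map));

	assert(map != NULL);
	map->refcnt = 1;	/* reference held by the host */
	return map;
}

/* A request queue starts sharing the map. */
static void tag_map_get(struct tag_map *map)
{
	map->refcnt++;
}

/* Drop one reference; returns true if this put freed the map. */
static bool tag_map_put(struct tag_map *map)
{
	if (--map->refcnt == 0) {
		free(map);
		return true;
	}
	return false;
}

/*
 * Host release path.  In the thread, the BUG() fires when this put is not
 * the last one, i.e. a request queue still holds the map; modelled as -1.
 */
static int host_release(struct tag_map *map)
{
	return tag_map_put(map) ? 0 : -1;
}
```

Releasing the queue first and the host second frees the map cleanly; releasing the host while a queue still holds a reference hits the BUG() case, and only the late queue release actually frees the map.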

I can confirm that simply removing the BUG() call in blk_free_tags
allows the partitions on the UAS device to be mounted after it was
hot-removed in the middle of video playback.  Hans, maybe in order to
get an answer to your question[1], you should submit a patch to the
block layer maintainer, Jens Axboe?

Sarah Sharp

[1] http://www.spinics.net/lists/linux-scsi/msg70002.html

* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-13 18:33           ` Tejun Heo
@ 2013-12-13 19:18             ` James Bottomley
       [not found]               ` <1386962327.2055.54.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: James Bottomley @ 2013-12-13 19:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote:
> Hello, guys.
> 
> (cc'ing Greg)
> 
> On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
> > On Fri, 13 Dec 2013, Sarah Sharp wrote:
> > 
> > > > Given the way things work now, I suspect these warnings are truly 
> > > > harmless.  We could simply get rid of the WARN in sysfs_remove_group.
> > > > 
> > > > The alternative is to call device_del for SCSI targets earlier on, such 
> > > > as when their hosts are unregistered.  I don't know how James would 
> > > > feel about this approach.  It would be difficult because targets use 
> > > > their own reference counts instead of relying on the usual device 
> > > > refcounting mechanism.
> > > 
> > > Thanks for looking into this.  I think just getting rid of the WARN
> > > would be sufficient.  Can you make a patch for that?
> > 
> > Easily.  The downside is that there would no longer be any warning 
> > when someone tries to remove a wrong subdirectory by mistake.
> > 
> > > The patch still won't help with the UAS issues with
> > > scsi_init_shared_tag_map though.
> > 
> > I wasn't clear on the reason for that problem.  Does it also arise from 
> > late device_del for scsi_target?  I could try to change the way that 
> > works, if anybody (Hans?) would like to test it.
> 
> While the recent sysfs changes made this issue more visible, Greg
> wants to make sure that devices are removed from leaf up in all cases
> and keep the warning to ensure that.  Would there be a way to fix SCSI
> removal ordering?

Could someone analyse the actual problem?  We're quite careful even on
host remove to iterate and remove all the devices, then targets, then
host (and allied transport objects).  Which removal is inverted?

James




* Re: [usb-storage] UAS hangs khubd on USB disconnect
       [not found]               ` <1386962327.2055.54.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
@ 2013-12-13 20:03                 ` James Bottomley
       [not found]                   ` <1386964999.2055.59.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
                                     ` (3 more replies)
  2013-12-13 20:05                 ` Alan Stern
  1 sibling, 4 replies; 26+ messages in thread
From: James Bottomley @ 2013-12-13 20:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 2013-12-13 at 11:18 -0800, James Bottomley wrote:
> On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote:
> > Hello, guys.
> > 
> > (cc'ing Greg)
> > 
> > On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
> > > On Fri, 13 Dec 2013, Sarah Sharp wrote:
> > > 
> > > > > Given the way things work now, I suspect these warnings are truly 
> > > > > harmless.  We could simply get rid of the WARN in sysfs_remove_group.
> > > > > 
> > > > > The alternative is to call device_del for SCSI targets earlier on, such 
> > > > > as when their hosts are unregistered.  I don't know how James would 
> > > > > feel about this approach.  It would be difficult because targets use 
> > > > > their own reference counts instead of relying on the usual device 
> > > > > refcounting mechanism.
> > > > 
> > > > Thanks for looking into this.  I think just getting rid of the WARN
> > > > would be sufficient.  Can you make a patch for that?
> > > 
> > > Easily.  The downside is that there would no longer be any warning 
> > > when someone tries to remove a wrong subdirectory by mistake.
> > > 
> > > > The patch still won't help with the UAS issues with
> > > > scsi_init_shared_tag_map though.
> > > 
> > > I wasn't clear on the reason for that problem.  Does it also arise from 
> > > late device_del for scsi_target?  I could try to change the way that 
> > > works, if anybody (Hans?) would like to test it.
> > 
> > While the recent sysfs changes made this issue more visible, Greg
> > wants to make sure that devices are removed from leaf up in all cases
> > and keep the warning to ensure that.  Would there be a way to fix SCSI
> > removal ordering?
> 
> Could someone analyse the actual problem?  We're quite careful even on
> host remove to iterate and remove all the devices, then targets, then
> host (and allied transport objects).  Which removal is inverted?

Actually, I think I have this figured out.  There's a thinko in one of
the scsi_target_reap() cases.  The original (and still existing) problem
with targets is that nothing creates them and nothing destroys them, so,
while we could rely on the refcounting of the device model to preserve
the actual target object, we had no idea when to remove it from
visibility.  That was the job of the reap reference, to track
visibility.  It looks like the reap on device last put is occurring too
late.  I think we should reap immediately after doing the sdev
device_del, so does this fix the warn on? (I'm not sure because no-one
has actually posted a backtrace, but it sounds like this is the
problem).

James

---

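diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8ff62c2..98d4eb3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
 
-	scsi_target_reap(scsi_target(sdev));
-
 	kfree(sdev->inquiry);
 	kfree(sdev);
 
@@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	} else
 		put_device(&sdev->sdev_dev);
 
+	scsi_target_reap(scsi_target(sdev));
+
 	/*
 	 * Stop accepting new requests and wait until all queuecommand() and
 	 * scsi_run_queue() invocations have finished before tearing down the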


* Re: [usb-storage] UAS hangs khubd on USB disconnect
       [not found]               ` <1386962327.2055.54.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
  2013-12-13 20:03                 ` James Bottomley
@ 2013-12-13 20:05                 ` Alan Stern
  1 sibling, 0 replies; 26+ messages in thread
From: Alan Stern @ 2013-12-13 20:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 13 Dec 2013, James Bottomley wrote:

> > > I wasn't clear on the reason for that problem.  Does it also arise from 
> > > late device_del for scsi_target?  I could try to change the way that 
> > > works, if anybody (Hans?) would like to test it.
> > 
> > While the recent sysfs changes made this issue more visible, Greg
> > wants to make sure that devices are removed from leaf up in all cases
> > and keep the warning to ensure that.  Would there be a way to fix SCSI
> > removal ordering?
> 
> Could someone analyse the actual problem?  We're quite careful even on
> host remove to iterate and remove all the devices, then targets, then
> host (and allied transport objects).  Which removal is inverted?

The scsi_host is removed before the scsi_target.

The reason is that scsi_remove_host() calls
device_del(&shost->shost_gendev) directly, but
scsi_target_reap_usercontext() doesn't call device_del(&starget->dev)
until it gets invoked (indirectly) from
scsi_device_dev_release_usercontext(), by way of scsi_target_reap().

Thus, the host gets removed from visibility at the appropriate time,
but the target remains until all the scsi_devices beneath it are not
only removed from visibility but also released (their refcounts drop to
0).

This can occur much later if, for example, a scsi_device holds a
mounted filesystem.  The scsi_host and scsi_device are removed when the
underlying USB device is unplugged.  But the scsi_device isn't
released, and hence the scsi_target isn't removed, until the filesystem
is unmounted.

Broadly speaking, the correct approach would be to call
scsi_target_reap() from __scsi_remove_device() instead of from
scsi_device_dev_release_usercontext().
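The sequence Alan describes can be replayed as a tiny event simulation. Assumptions: one host, one target and one LUN; "unplug" removes the scsi_device and then the host from visibility, "unmount" drops the last scsi_device reference. The only knob is where the target is hidden: at scsi_device release (the behaviour described above) or at scsi_device removal (the suggested move). enum obj, struct timeline, hide() and replay() are hypothetical names, not SCSI midlayer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the three objects in the scenario. */
enum obj { SDEV, TARGET, HOST, NR_OBJS };

/* Order in which objects disappear from sysfs (device_del time). */
struct timeline {
	enum obj order[NR_OBJS];
	int n;
};

static void hide(struct timeline *t, enum obj o)
{
	assert(t->n < NR_OBJS);
	t->order[t->n++] = o;
}

/*
 * Replay "unplug while a filesystem is mounted, then unmount".
 * reap_at_remove == false: target hidden only when the sdev is released.
 * reap_at_remove == true:  target hidden as soon as the sdev is removed.
 */
static void replay(struct timeline *t, bool reap_at_remove)
{
	memset(t, 0, sizeof(*t));

	/* Unplug: the host removal path first removes the scsi_device ... */
	hide(t, SDEV);
	if (reap_at_remove)
		hide(t, TARGET);
	/* ... and then calls device_del() on the host directly. */
	hide(t, HOST);

	/* Unmount: the last sdev reference goes away and it is released. */
	if (!reap_at_remove)
		hide(t, TARGET);
}
```

With reaping at release time the recorded order is scsi_device, host, target (the inversion); with reaping at removal time it becomes scsi_device, target, host.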

Alan Stern


* Re: [usb-storage] UAS hangs khubd on USB disconnect
       [not found]                   ` <1386964999.2055.59.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
@ 2013-12-13 20:22                     ` Hans de Goede
  0 siblings, 0 replies; 26+ messages in thread
From: Hans de Goede @ 2013-12-13 20:22 UTC (permalink / raw)
  To: James Bottomley, Tejun Heo
  Cc: Alan Stern, Sarah Sharp, USB list, SCSI development list,
	USB Storage List, Greg Kroah-Hartman

Hi James,

On 12/13/2013 09:03 PM, James Bottomley wrote:
> On Fri, 2013-12-13 at 11:18 -0800, James Bottomley wrote:
>> On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote:
>>> Hello, guys.
>>>
>>> (cc'ing Greg)
>>>
>>> On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
>>>> On Fri, 13 Dec 2013, Sarah Sharp wrote:
>>>>
>>>>>> Given the way things work now, I suspect these warnings are truly
>>>>>> harmless.  We could simply get rid of the WARN in sysfs_remove_group
>>>>>>
>>>>>> The alternative is to call device_del for SCSI targets earlier on, such
>>>>>> as when their hosts are unregistered.  I don't know how James would
>>>>>> feel about this approach.  It would be difficult because targets use
>>>>>> their own reference counts instead of relying on the usual device
>>>>>> refcounting mechanism.
>>>>>
>>>>> Thanks for looking into this.  I think just getting rid of the WARN
>>>>> would be sufficient.  Can you make a patch for that?
>>>>
>>>> Easily.  The downside is that there would no longer be any warning
>>>> when someone tries to remove a wrong subdirectory by mistake.
>>>>
>>>>> The patch still won't help with the UAS issues with
>>>>> scsi_init_shared_tag_map though.
>>>>
>>>> I wasn't clear on the reason for that problem.  Does it also arise from
>>>> late device_del for scsi_target?  I could try to change the way that
>>>> works, if anybody (Hans?) would like to test it.
>>>
>>> While the recent sysfs changes made this issue more visible, Greg
>>> wants to make sure that devices are removed from leaf up in all cases
>>> and keep the warning to ensure that.  Would there be a way to fix SCSI
>>> removal ordering?
>>
>> Could someone analyse the actual problem?  We're quite careful even on
>> host remove to iterate and remove all the devices, then targets, then
>> host (and allied transport objects).  Which removal is inverted?
>
> Actually, I think I have this figured out.  There's a thinko in one of
> the scsi_target_reap() cases.  The original (and still existing) problem
> with targets is that nothing creates them and nothing destroys them, so,
> while we could rely on the refcounting of the device model to preserve
> the actual target object, we had no idea when to remove it from
> visibility.  That was the job of the reap reference, to track
> visibility.  It looks like the reap on device last put is occurring too
> late.  I think we should reap immediately after doing the sdev
> device_del, so does this fix the warn on? (I'm not sure because no-one
> has actually posted a backtrace, but it sounds like this is the
> problem).

Thanks, I'll give this patch a try. As for backtraces, I've posted some
(partial) backtraces as well as reproduction instructions here:
http://www.spinics.net/lists/linux-scsi/msg70002.html

Regards,

Hans

* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-13 20:03                 ` James Bottomley
       [not found]                   ` <1386964999.2055.59.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
@ 2013-12-13 21:06                   ` Alan Stern
  2013-12-13 21:18                     ` James Bottomley
  2013-12-13 21:13                   ` [usb-storage] UAS hangs khubd on USB disconnect Sarah Sharp
  2013-12-13 21:24                   ` Hans de Goede
  3 siblings, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-13 21:06 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 13 Dec 2013, James Bottomley wrote:

> Actually, I think I have this figured out.  There's a thinko in one of
> the scsi_target_reap() cases.  The original (and still existing) problem
> with targets is that nothing creates them and nothing destroys them, so,
> while we could rely on the refcounting of the device model to preserve
> the actual target object, we had no idea when to remove it from
> visibility.  That was the job of the reap reference, to track
> visibility.  It looks like the reap on device last put is occurring too
> late.  I think we should reap immediately after doing the sdev
> device_del, so does this fix the warn on? (I'm not sure because no-one
> has actually posted a backtrace, but it sounds like this is the
> problem).
> 
> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 8ff62c2..98d4eb3 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>  	/* NULL queue means the device can't be used */
>  	sdev->request_queue = NULL;
>  
> -	scsi_target_reap(scsi_target(sdev));
> -
>  	kfree(sdev->inquiry);
>  	kfree(sdev);
>  
> @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	} else
>  		put_device(&sdev->sdev_dev);
>  
> +	scsi_target_reap(scsi_target(sdev));
> +
>  	/*
>  	 * Stop accepting new requests and wait until all queuecommand() and
>  	 * scsi_run_queue() invocations have finished before tearing down the

This is not right.  The problem is that you don't keep track explicitly 
of the number of references to a target; you rely implicitly on 
starget->devices being non-empty.  starget->reap_ref is only a count of 
local operations that should block removal.

Consider, for example, what would happen if there is more than one LUN.  
What if one of them is removed while the other remains?

A more invasive change is needed.
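Alan's multi-LUN point suggests that the target needs an explicit count of the children that keep it visible. The sketch below shows one hypothetical way to express that: track the number of still-visible LUNs per target and hide the target only when that count reaches zero at removal time (not at release time). struct toy_target and its helpers are invented for illustration; this is neither the midlayer's struct scsi_target nor a patch from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target that counts how many of its LUNs are still visible. */
struct toy_target {
	int visible_luns;
	bool visible;
};

static void toy_target_add_lun(struct toy_target *t)
{
	assert(t->visible);
	t->visible_luns++;
}

/*
 * Called when a LUN is removed from visibility, not when its memory is
 * released.  The target is hidden only after its last LUN is gone, so
 * removing one LUN of a two-LUN target leaves the target in place.
 */
static void toy_target_remove_lun(struct toy_target *t)
{
	assert(t->visible_luns > 0);
	if (--t->visible_luns == 0)
		t->visible = false;	/* the real code would device_del() here */
}
```

Removing one of two LUNs keeps the target visible; removing the second hides it, so the target still disappears before its host does.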

Alan Stern



* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-13 20:03                 ` James Bottomley
       [not found]                   ` <1386964999.2055.59.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
  2013-12-13 21:06                   ` Alan Stern
@ 2013-12-13 21:13                   ` Sarah Sharp
  2013-12-13 21:24                   ` Hans de Goede
  3 siblings, 0 replies; 26+ messages in thread
From: Sarah Sharp @ 2013-12-13 21:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Alan Stern, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 1792 bytes --]

On Fri, Dec 13, 2013 at 12:03:19PM -0800, James Bottomley wrote:
> Actually, I think I have this figured out.  There's a thinko in one of
> the scsi_target_reap() cases.  The original (and still existing) problem
> with targets is that nothing creates them and nothing destroys them, so,
> while we could rely on the refcounting of the device model to preserve
> the actual target object, we had no idea when to remove it from
> visibility.  That was the job of the reap reference, to track
> visibility.  It looks like the reap on device last put is occurring too
> late.  I think we should reap immediately after doing the sdev
> device_del, so does this fix the warn on? (I'm not sure because no-one
> has actually posted a backtrace, but it sounds like this is the
> problem).

I can confirm that this patch fixes both the sysfs warning and the
issue with USB storage disconnect during video playback.  I did trigger
a new (possibly unrelated?) mutex deadlock warning.  dmesg is attached.

Sarah Sharp

> ---
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 8ff62c2..98d4eb3 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>  	/* NULL queue means the device can't be used */
>  	sdev->request_queue = NULL;
>  
> -	scsi_target_reap(scsi_target(sdev));
> -
>  	kfree(sdev->inquiry);
>  	kfree(sdev);
>  
> @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	} else
>  		put_device(&sdev->sdev_dev);
>  
> +	scsi_target_reap(scsi_target(sdev));
> +
>  	/*
>  	 * Stop accepting new requests and wait until all queuecommand() and
>  	 * scsi_run_queue() invocations have finished before tearing down the
> 
> 
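The ordering question behind James's patch can be modelled in a few lines. This simplified sketch makes two assumptions: each target carries a single reap reference, and the device refcount can be held past removal by an opener (for example an open block device). model_target_reap(), sdev_remove() and sdev_put() are hypothetical stand-ins for scsi_target_reap(), __scsi_remove_device() and the final put_device(). The real functions also deal with locking and target state, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of where the target reap happens; not kernel code. */
struct model_target {
	int reap_ref;   /* one reference per registered device in this model */
	bool visible;   /* stands in for "target still in sysfs" */
};

struct model_sdev {
	int refcount;                 /* stands in for the driver-model refcount */
	struct model_target *target;
	bool reap_on_release;         /* true: pre-patch placement, false: patched */
};

static void model_target_reap(struct model_target *t)
{
	if (--t->reap_ref == 0)
		t->visible = false;
}

/* Drop one device reference; the pre-patch code reaps on the final put. */
static void sdev_put(struct model_sdev *s)
{
	if (--s->refcount == 0 && s->reap_on_release)
		model_target_reap(s->target);
}

/* Removal path; the patched code reaps right after the device_del() step. */
static void sdev_remove(struct model_sdev *s)
{
	/* device_del() of the sdev would happen here */
	if (!s->reap_on_release)
		model_target_reap(s->target);
	sdev_put(s);   /* drop the reference held by registration */
}
```

With reap_on_release set (the pre-patch placement), the target stays visible until the last outstanding reference is dropped. With the patched placement, the target disappears as soon as the device is removed, even while an opener still holds a reference.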

[-- Attachment #2: uas-james-fix-2013-12-13-13-02.txt --]
[-- Type: text/plain, Size: 30771 bytes --]

Dec 13 13:02:02 xanatos kernel: [    7.029300] usb usb4: bus auto-suspend, wakeup 1
Dec 13 13:02:02 xanatos kernel: [    7.040327] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input11
Dec 13 13:02:02 xanatos kernel: [    7.112065] btusb 3-1.4:1.0: usb_probe_interface
Dec 13 13:02:02 xanatos kernel: [    7.112070] btusb 3-1.4:1.0: usb_probe_interface - got id
Dec 13 13:02:02 xanatos kernel: [    7.122731] usbcore: registered new interface driver btusb
Dec 13 13:02:02 xanatos kernel: [    7.167710] Linux video capture interface: v2.00
Dec 13 13:02:02 xanatos kernel: [    7.235181] uvcvideo 3-1.6:1.0: usb_probe_interface
Dec 13 13:02:02 xanatos kernel: [    7.235187] uvcvideo 3-1.6:1.0: usb_probe_interface - got id
Dec 13 13:02:02 xanatos kernel: [    7.235293] uvcvideo: Found UVC 1.00 device Integrated Camera (04f2:b2ea)
Dec 13 13:02:02 xanatos kernel: [    7.242661] input: Integrated Camera as /devices/pci0000:00/0000:00:1a.0/usb3/3-1/3-1.6/3-1.6:1.0/input/input20
Dec 13 13:02:02 xanatos kernel: [    7.244470] usbcore: registered new interface driver uvcvideo
Dec 13 13:02:02 xanatos kernel: [    7.244473] USB Video Class driver (1.1.1)
Dec 13 13:02:03 xanatos kernel: [    8.044806] bio: create slab <bio-2> at 2
Dec 13 13:02:03 xanatos kernel: [    8.261355] Adding 4085756k swap on /dev/mapper/cryptswap1.  Priority:-1 extents:1 across:4085756k SSFS
Dec 13 13:02:03 xanatos kernel: [    8.407323] e1000e 0000:00:19.0: irq 44 for MSI/MSI-X
Dec 13 13:02:03 xanatos kernel: [    8.510442] e1000e 0000:00:19.0: irq 44 for MSI/MSI-X
Dec 13 13:02:03 xanatos kernel: [    8.510945] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
Dec 13 13:02:03 xanatos kernel: [    8.516037] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
Dec 13 13:02:03 xanatos kernel: [    8.517364] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
Dec 13 13:02:03 xanatos kernel: [    8.785685] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
Dec 13 13:02:03 xanatos kernel: [    8.792724] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
Dec 13 13:02:04 xanatos kernel: [    8.876409] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
Dec 13 13:02:04 xanatos kernel: [    9.787586] usb 3-1.6: usb auto-suspend, wakeup 0
Dec 13 13:02:05 xanatos kernel: [    9.910341] psmouse serio2: alps: Unknown ALPS touchpad: E7=10 00 64, EC=10 00 64
Dec 13 13:02:06 xanatos kernel: [   11.169530] psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3
Dec 13 13:02:06 xanatos kernel: [   11.375342] input: TPPS/2 IBM TrackPoint as /devices/platform/i8042/serio1/serio2/input/input19
Dec 13 13:02:07 xanatos kernel: [   11.917809] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 13 13:02:07 xanatos kernel: [   11.917863] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Dec 13 13:02:12 xanatos kernel: [   17.125215] 
Dec 13 13:02:12 xanatos kernel: [   17.125218] ======================================================
Dec 13 13:02:12 xanatos kernel: [   17.125219] [ INFO: possible circular locking dependency detected ]
Dec 13 13:02:12 xanatos kernel: [   17.125221] 3.13.0-rc1+ #140 Not tainted
Dec 13 13:02:12 xanatos kernel: [   17.125221] -------------------------------------------------------
Dec 13 13:02:12 xanatos kernel: [   17.125222] lightdm/1764 is trying to acquire lock:
Dec 13 13:02:12 xanatos kernel: [   17.125223]  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff812b5c53>] ecryptfs_getxattr_lower+0x43/0x80
Dec 13 13:02:12 xanatos kernel: [   17.125230] 
Dec 13 13:02:12 xanatos kernel: [   17.125230] but task is already holding lock:
Dec 13 13:02:12 xanatos kernel: [   17.125231]  (&isp->smk_lock){+.+.+.}, at: [<ffffffff812eda3e>] smack_d_instantiate+0x5e/0x2e0
Dec 13 13:02:12 xanatos kernel: [   17.125236] 
Dec 13 13:02:12 xanatos kernel: [   17.125236] which lock already depends on the new lock.
Dec 13 13:02:12 xanatos kernel: [   17.125236] 
Dec 13 13:02:12 xanatos kernel: [   17.125237] 
Dec 13 13:02:12 xanatos kernel: [   17.125237] the existing dependency chain (in reverse order) is:
Dec 13 13:02:12 xanatos kernel: [   17.125238] 
Dec 13 13:02:12 xanatos kernel: [   17.125238] -> #2 (&isp->smk_lock){+.+.+.}:
Dec 13 13:02:12 xanatos kernel: [   17.125240]        [<ffffffff8109a683>] lock_acquire+0x93/0x120
Dec 13 13:02:12 xanatos kernel: [   17.125242]        [<ffffffff8165d70a>] mutex_lock_nested+0x6a/0x390
Dec 13 13:02:12 xanatos kernel: [   17.125245]        [<ffffffff812eda3e>] smack_d_instantiate+0x5e/0x2e0
Dec 13 13:02:12 xanatos kernel: [   17.125247]        [<ffffffff812e91fb>] security_d_instantiate+0x1b/0x30
Dec 13 13:02:12 xanatos kernel: [   17.125249]        [<ffffffff811bddc0>] d_instantiate+0x50/0x70
Dec 13 13:02:12 xanatos kernel: [   17.125251]        [<ffffffff81257a7e>] ext4_add_nondir+0x6e/0x80
Dec 13 13:02:12 xanatos kernel: [   17.125254]        [<ffffffff81257d04>] ext4_create+0x104/0x170
Dec 13 13:02:12 xanatos kernel: [   17.125256]        [<ffffffff811b53ed>] vfs_create+0xcd/0x130
Dec 13 13:02:12 xanatos kernel: [   17.125257]        [<ffffffff811b65f6>] do_last+0x11a6/0x13d0
Dec 13 13:02:12 xanatos kernel: [   17.125259]        [<ffffffff811b68db>] path_openat+0xbb/0x680
Dec 13 13:02:12 xanatos kernel: [   17.125260]        [<ffffffff811b76ba>] do_filp_open+0x3a/0x90
Dec 13 13:02:12 xanatos kernel: [   17.125262]        [<ffffffff811a524e>] do_sys_open+0x12e/0x210
Dec 13 13:02:12 xanatos kernel: [   17.125264]        [<ffffffff811a534e>] SyS_open+0x1e/0x20
Dec 13 13:02:12 xanatos kernel: [   17.125266]        [<ffffffff81669d96>] system_call_fastpath+0x1a/0x1f
Dec 13 13:02:12 xanatos kernel: [   17.125268] 
Dec 13 13:02:12 xanatos kernel: [   17.125268] -> #1 (jbd2_handle){+.+.+.}:
Dec 13 13:02:12 xanatos kernel: [   17.125270]        [<ffffffff8109a683>] lock_acquire+0x93/0x120
Dec 13 13:02:12 xanatos kernel: [   17.125272]        [<ffffffff8129a4ce>] start_this_handle+0x21e/0x5f0
Dec 13 13:02:12 xanatos kernel: [   17.125274]        [<ffffffff8129aa7b>] jbd2__journal_start+0xcb/0x1b0
Dec 13 13:02:12 xanatos kernel: [   17.125276]        [<ffffffff812788cd>] __ext4_journal_start_sb+0x6d/0x130
Dec 13 13:02:12 xanatos kernel: [   17.125278]        [<ffffffff81251a6b>] ext4_setattr+0x3bb/0x6f0
Dec 13 13:02:12 xanatos kernel: [   17.125279]        [<ffffffff811c3fe9>] notify_change+0x279/0x3d0
Dec 13 13:02:12 xanatos kernel: [   17.125281]        [<ffffffff811a3d3f>] do_truncate+0x6f/0xa0
Dec 13 13:02:12 xanatos kernel: [   17.125283]        [<ffffffff811b5f02>] do_last+0xab2/0x13d0
Dec 13 13:02:12 xanatos kernel: [   17.125284]        [<ffffffff811b68db>] path_openat+0xbb/0x680
Dec 13 13:02:12 xanatos kernel: [   17.125285]        [<ffffffff811b76ba>] do_filp_open+0x3a/0x90
Dec 13 13:02:12 xanatos kernel: [   17.125287]        [<ffffffff811a524e>] do_sys_open+0x12e/0x210
Dec 13 13:02:12 xanatos kernel: [   17.125288]        [<ffffffff811a534e>] SyS_open+0x1e/0x20
Dec 13 13:02:12 xanatos kernel: [   17.125290]        [<ffffffff81669d96>] system_call_fastpath+0x1a/0x1f
Dec 13 13:02:12 xanatos kernel: [   17.125292] 
Dec 13 13:02:12 xanatos kernel: [   17.125292] -> #0 (&sb->s_type->i_mutex_key#12){+.+.+.}:
Dec 13 13:02:12 xanatos kernel: [   17.125294]        [<ffffffff8109995e>] __lock_acquire+0x148e/0x1a10
Dec 13 13:02:12 xanatos kernel: [   17.125295]        [<ffffffff8109a683>] lock_acquire+0x93/0x120
Dec 13 13:02:12 xanatos kernel: [   17.125297]        [<ffffffff8165d70a>] mutex_lock_nested+0x6a/0x390
Dec 13 13:02:12 xanatos kernel: [   17.125298]        [<ffffffff812b5c53>] ecryptfs_getxattr_lower+0x43/0x80
Dec 13 13:02:12 xanatos kernel: [   17.125300]        [<ffffffff812b5ca9>] ecryptfs_getxattr+0x19/0x20
Dec 13 13:02:12 xanatos kernel: [   17.125302]        [<ffffffff812ed9af>] smk_fetch.isra.22+0x5f/0x90
Dec 13 13:02:12 xanatos kernel: [   17.125304]        [<ffffffff812edb1f>] smack_d_instantiate+0x13f/0x2e0
Dec 13 13:02:12 xanatos kernel: [   17.125305]        [<ffffffff812e91fb>] security_d_instantiate+0x1b/0x30
Dec 13 13:02:12 xanatos kernel: [   17.125307]        [<ffffffff811bddc0>] d_instantiate+0x50/0x70
Dec 13 13:02:12 xanatos kernel: [   17.125309]        [<ffffffff812b4b7d>] ecryptfs_lookup+0x13d/0x350
Dec 13 13:02:12 xanatos kernel: [   17.125311]        [<ffffffff811afded>] lookup_real+0x1d/0x50
Dec 13 13:02:12 xanatos kernel: [   17.125313]        [<ffffffff811b5f73>] do_last+0xb23/0x13d0
Dec 13 13:02:12 xanatos kernel: [   17.125315]        [<ffffffff811b68db>] path_openat+0xbb/0x680
Dec 13 13:02:12 xanatos kernel: [   17.125316]        [<ffffffff811b76ba>] do_filp_open+0x3a/0x90
Dec 13 13:02:12 xanatos kernel: [   17.125317]        [<ffffffff811a524e>] do_sys_open+0x12e/0x210
Dec 13 13:02:12 xanatos kernel: [   17.125319]        [<ffffffff811a534e>] SyS_open+0x1e/0x20
Dec 13 13:02:12 xanatos kernel: [   17.125321]        [<ffffffff81669d96>] system_call_fastpath+0x1a/0x1f
Dec 13 13:02:12 xanatos kernel: [   17.125322] 
Dec 13 13:02:12 xanatos kernel: [   17.125322] other info that might help us debug this:
Dec 13 13:02:12 xanatos kernel: [   17.125322] 
Dec 13 13:02:12 xanatos kernel: [   17.125324] Chain exists of:
Dec 13 13:02:12 xanatos kernel: [   17.125324]   &sb->s_type->i_mutex_key#12 --> jbd2_handle --> &isp->smk_lock
Dec 13 13:02:12 xanatos kernel: [   17.125324] 
Dec 13 13:02:12 xanatos kernel: [   17.125326]  Possible unsafe locking scenario:
Dec 13 13:02:12 xanatos kernel: [   17.125326] 
Dec 13 13:02:12 xanatos kernel: [   17.125327]        CPU0                    CPU1
Dec 13 13:02:12 xanatos kernel: [   17.125328]        ----                    ----
Dec 13 13:02:12 xanatos kernel: [   17.125329]   lock(&isp->smk_lock);
Dec 13 13:02:12 xanatos kernel: [   17.125330]                                lock(jbd2_handle);
Dec 13 13:02:12 xanatos kernel: [   17.125331]                                lock(&isp->smk_lock);
Dec 13 13:02:12 xanatos kernel: [   17.125332]   lock(&sb->s_type->i_mutex_key#12);
Dec 13 13:02:12 xanatos kernel: [   17.125334] 
Dec 13 13:02:12 xanatos kernel: [   17.125334]  *** DEADLOCK ***
Dec 13 13:02:12 xanatos kernel: [   17.125334] 
Dec 13 13:02:12 xanatos kernel: [   17.125335] 2 locks held by lightdm/1764:
Dec 13 13:02:12 xanatos kernel: [   17.125336]  #0:  (&type->i_mutex_dir_key#3){+.+.+.}, at: [<ffffffff811b57d0>] do_last+0x380/0x13d0
Dec 13 13:02:12 xanatos kernel: [   17.125339]  #1:  (&isp->smk_lock){+.+.+.}, at: [<ffffffff812eda3e>] smack_d_instantiate+0x5e/0x2e0
Dec 13 13:02:12 xanatos kernel: [   17.125342] 
Dec 13 13:02:12 xanatos kernel: [   17.125342] stack backtrace:
Dec 13 13:02:12 xanatos kernel: [   17.125344] CPU: 2 PID: 1764 Comm: lightdm Not tainted 3.13.0-rc1+ #140
Dec 13 13:02:12 xanatos kernel: [   17.125345] Hardware name: LENOVO 2325AP7/2325AP7, BIOS G2ET82WW (2.02 ) 09/11/2012
Dec 13 13:02:12 xanatos kernel: [   17.125346]  ffffffff82275c10 ffff8800b3e07988 ffffffff81658ace ffffffff82275f70
Dec 13 13:02:12 xanatos kernel: [   17.125349]  ffff8800b3e079c8 ffffffff81654f8d ffff8800b3e07a20 ffff8800b9a327e0
Dec 13 13:02:12 xanatos kernel: [   17.125351]  0000000000000001 0000000000000002 ffff8800b9a32090 ffff8800b9a327e0
Dec 13 13:02:12 xanatos kernel: [   17.125353] Call Trace:
Dec 13 13:02:12 xanatos kernel: [   17.125356]  [<ffffffff81658ace>] dump_stack+0x4d/0x66
Dec 13 13:02:12 xanatos kernel: [   17.125358]  [<ffffffff81654f8d>] print_circular_bug+0x200/0x20f
Dec 13 13:02:12 xanatos kernel: [   17.125360]  [<ffffffff8109995e>] __lock_acquire+0x148e/0x1a10
Dec 13 13:02:12 xanatos kernel: [   17.125363]  [<ffffffff8106a958>] ? __kernel_text_address+0x58/0x80
Dec 13 13:02:12 xanatos kernel: [   17.125364]  [<ffffffff8109a683>] lock_acquire+0x93/0x120
Dec 13 13:02:12 xanatos kernel: [   17.125366]  [<ffffffff812b5c53>] ? ecryptfs_getxattr_lower+0x43/0x80
Dec 13 13:02:12 xanatos kernel: [   17.125368]  [<ffffffff8165d70a>] mutex_lock_nested+0x6a/0x390
Dec 13 13:02:12 xanatos kernel: [   17.125370]  [<ffffffff812b5c53>] ? ecryptfs_getxattr_lower+0x43/0x80
Dec 13 13:02:12 xanatos kernel: [   17.125372]  [<ffffffff812b5c53>] ecryptfs_getxattr_lower+0x43/0x80
Dec 13 13:02:12 xanatos kernel: [   17.125374]  [<ffffffff812b5ca9>] ecryptfs_getxattr+0x19/0x20
Dec 13 13:02:12 xanatos kernel: [   17.125376]  [<ffffffff812ed9af>] smk_fetch.isra.22+0x5f/0x90
Dec 13 13:02:12 xanatos kernel: [   17.125378]  [<ffffffff812edb1f>] smack_d_instantiate+0x13f/0x2e0
Dec 13 13:02:12 xanatos kernel: [   17.125380]  [<ffffffff812e91fb>] security_d_instantiate+0x1b/0x30
Dec 13 13:02:12 xanatos kernel: [   17.125382]  [<ffffffff811bddc0>] d_instantiate+0x50/0x70
Dec 13 13:02:12 xanatos kernel: [   17.125384]  [<ffffffff812b4b7d>] ecryptfs_lookup+0x13d/0x350
Dec 13 13:02:12 xanatos kernel: [   17.125386]  [<ffffffff811afded>] lookup_real+0x1d/0x50
Dec 13 13:02:12 xanatos kernel: [   17.125388]  [<ffffffff811b5f73>] do_last+0xb23/0x13d0
Dec 13 13:02:12 xanatos kernel: [   17.125389]  [<ffffffff811b1d78>] ? inode_permission+0x18/0x50
Dec 13 13:02:12 xanatos kernel: [   17.125391]  [<ffffffff811b2656>] ? link_path_walk+0x246/0x860
Dec 13 13:02:12 xanatos kernel: [   17.125392]  [<ffffffff81098100>] ? trace_hardirqs_on_caller+0xd0/0x1c0
Dec 13 13:02:12 xanatos kernel: [   17.125394]  [<ffffffff811b68db>] path_openat+0xbb/0x680
Dec 13 13:02:12 xanatos kernel: [   17.125396]  [<ffffffff8109812d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
Dec 13 13:02:12 xanatos kernel: [   17.125397]  [<ffffffff810981fd>] ? trace_hardirqs_on+0xd/0x10
Dec 13 13:02:12 xanatos kernel: [   17.125399]  [<ffffffff811b76ba>] do_filp_open+0x3a/0x90
Dec 13 13:02:12 xanatos kernel: [   17.125401]  [<ffffffff81660f77>] ? _raw_spin_unlock+0x27/0x40
Dec 13 13:02:12 xanatos kernel: [   17.125403]  [<ffffffff811c5527>] ? __alloc_fd+0xa7/0x130
Dec 13 13:02:12 xanatos kernel: [   17.125405]  [<ffffffff811a524e>] do_sys_open+0x12e/0x210
Dec 13 13:02:12 xanatos kernel: [   17.125407]  [<ffffffff811a534e>] SyS_open+0x1e/0x20
Dec 13 13:02:12 xanatos kernel: [   17.125409]  [<ffffffff81669d96>] system_call_fastpath+0x1a/0x1f
Dec 13 13:03:40 xanatos kernel: [  105.233448] usb usb2: usb wakeup-resume
Dec 13 13:03:40 xanatos kernel: [  105.233460] usb usb2: usb auto-resume
Dec 13 13:03:40 xanatos kernel: [  105.233482] hub 2-0:1.0: hub_resume
Dec 13 13:03:40 xanatos kernel: [  105.233873] hub 2-0:1.0: port 2: status 0203 change 0001
Dec 13 13:03:40 xanatos kernel: [  105.337785] hub 2-0:1.0: state 7 ports 4 chg 0004 evt 0000
Dec 13 13:03:40 xanatos kernel: [  105.337930] hub 2-0:1.0: port 2, status 0203, change 0000, 5.0 Gb/s
Dec 13 13:03:40 xanatos kernel: [  105.450162] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
Dec 13 13:03:40 xanatos kernel: [  105.466512] usb 2-2: skipped 1 descriptor after endpoint
Dec 13 13:03:40 xanatos kernel: [  105.466519] usb 2-2: skipped 1 descriptor after endpoint
Dec 13 13:03:40 xanatos kernel: [  105.466530] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:03:40 xanatos kernel: [  105.466534] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:03:40 xanatos kernel: [  105.466537] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:03:40 xanatos kernel: [  105.466540] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:03:40 xanatos kernel: [  105.466686] usb 2-2: default language 0x0409
Dec 13 13:03:40 xanatos kernel: [  105.467139] usb 2-2: udev 2, busnum 2, minor = 129
Dec 13 13:03:40 xanatos kernel: [  105.467143] usb 2-2: New USB device found, idVendor=174c, idProduct=55aa
Dec 13 13:03:40 xanatos kernel: [  105.467145] usb 2-2: New USB device strings: Mfr=2, Product=3, SerialNumber=1
Dec 13 13:03:40 xanatos kernel: [  105.467148] usb 2-2: Product: Plugable USB3-SATA-UASP1
Dec 13 13:03:40 xanatos kernel: [  105.467150] usb 2-2: Manufacturer: ASM1053E
Dec 13 13:03:40 xanatos kernel: [  105.467151] usb 2-2: SerialNumber: 123456789045
Dec 13 13:03:40 xanatos kernel: [  105.467641] usb 2-2: usb_probe_device
Dec 13 13:03:40 xanatos kernel: [  105.467646] usb 2-2: configuration #1 chosen from 1 choice
Dec 13 13:03:40 xanatos kernel: [  105.468564] usb 2-2: adding 2-2:1.0 (config #1, interface 0)
Dec 13 13:03:40 xanatos kernel: [  105.469546] hub 2-0:1.0: state 7 ports 4 chg 0000 evt 0004
Dec 13 13:03:40 xanatos kernel: [  105.499291] usb-storage 2-2:1.0: usb_probe_interface
Dec 13 13:03:40 xanatos kernel: [  105.499299] usb-storage 2-2:1.0: usb_probe_interface - got id
Dec 13 13:03:40 xanatos kernel: [  105.500040] usbcore: registered new interface driver usb-storage
Dec 13 13:03:40 xanatos kernel: [  105.503635] uas 2-2:1.0: usb_probe_interface
Dec 13 13:03:40 xanatos kernel: [  105.503640] uas 2-2:1.0: usb_probe_interface - got id
Dec 13 13:03:40 xanatos kernel: [  105.507154] scsi6 : uas
Dec 13 13:03:40 xanatos kernel: [  105.508103] usbcore: registered new interface driver uas
Dec 13 13:03:40 xanatos kernel: [  105.508567] scsi 6:0:0:0: Direct-Access     ASM1053E Plugable USB3-SA 0    PQ: 0 ANSI: 6
Dec 13 13:03:40 xanatos kernel: [  105.509797] sd 6:0:0:0: Attached scsi generic sg1 type 0
Dec 13 13:03:40 xanatos kernel: [  105.510582] sd 6:0:0:0: [sdb] 117231408 512-byte logical blocks: (60.0 GB/55.8 GiB)
Dec 13 13:03:40 xanatos kernel: [  105.511360] sd 6:0:0:0: [sdb] Write Protect is off
Dec 13 13:03:40 xanatos kernel: [  105.511363] sd 6:0:0:0: [sdb] Mode Sense: 43 00 00 00
Dec 13 13:03:40 xanatos kernel: [  105.511724] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 13 13:03:40 xanatos kernel: [  105.515268]  sdb: sdb1 sdb2 sdb4
Dec 13 13:03:40 xanatos kernel: [  105.518270] sd 6:0:0:0: [sdb] Attached SCSI disk
Dec 13 13:03:41 xanatos kernel: [  106.194085] FAT-fs (sdb4): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Dec 13 13:03:41 xanatos kernel: [  106.197181] kjournald starting.  Commit interval 5 seconds
Dec 13 13:03:41 xanatos kernel: [  106.198121] EXT3-fs (sdb1): using internal journal
Dec 13 13:03:41 xanatos kernel: [  106.198128] EXT3-fs (sdb1): recovery complete
Dec 13 13:03:41 xanatos kernel: [  106.198130] EXT3-fs (sdb1): mounted filesystem with ordered data mode
Dec 13 13:03:41 xanatos kernel: [  106.200429] EXT4-fs (sdb2): recovery complete
Dec 13 13:03:41 xanatos kernel: [  106.203643] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)
Dec 13 13:03:46 xanatos kernel: [  110.914386] hub 2-0:1.0: state 7 ports 4 chg 0000 evt 0004
Dec 13 13:03:46 xanatos kernel: [  110.914591] hub 2-0:1.0: warm reset port 2
Dec 13 13:03:46 xanatos kernel: [  110.969514] hub 2-0:1.0: port 2 not warm reset yet, waiting 50ms
Dec 13 13:03:46 xanatos kernel: [  111.025947] hub 2-0:1.0: port 2, status 02c0, change 0041, 5.0 Gb/s
Dec 13 13:03:46 xanatos kernel: [  111.025964] usb 2-2: USB disconnect, device number 2
Dec 13 13:03:46 xanatos kernel: [  111.025967] usb 2-2: unregistering device
Dec 13 13:03:46 xanatos kernel: [  111.025971] usb 2-2: unregistering interface 2-2:1.0
Dec 13 13:03:46 xanatos kernel: [  111.026263] usb 2-2: usb_set_device_initiated_lpm: Can't disable U1 state for unconfigured device.
Dec 13 13:03:46 xanatos kernel: [  111.026305] usb 2-2: usb_set_device_initiated_lpm: Can't disable U2 state for unconfigured device.
Dec 13 13:03:46 xanatos kernel: [  111.040389] JBD2: Error -5 detected when updating journal superblock for sdb2-8.
Dec 13 13:03:46 xanatos kernel: [  111.040440] Aborting journal on device sdb2-8.
Dec 13 13:03:46 xanatos kernel: [  111.040458] JBD2: Error -5 detected when updating journal superblock for sdb2-8.
Dec 13 13:03:46 xanatos kernel: [  111.040468] journal commit I/O error
Dec 13 13:03:46 xanatos kernel: [  111.043848] sd 6:0:0:0: [sdb] Synchronizing SCSI cache
Dec 13 13:03:46 xanatos kernel: [  111.093454] EXT3-fs (sdb1): I/O error while writing superblock
Dec 13 13:03:46 xanatos kernel: [  111.157506] sd 6:0:0:0: [sdb]  
Dec 13 13:03:46 xanatos kernel: [  111.157513] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec 13 13:03:46 xanatos kernel: [  111.158093] usb 2-2: usb_set_device_initiated_lpm: Can't enable U1 state for unconfigured device.
Dec 13 13:03:46 xanatos kernel: [  111.158143] usb 2-2: usb_set_device_initiated_lpm: Can't enable U2 state for unconfigured device.
Dec 13 13:03:46 xanatos kernel: [  111.158238] usb 2-2: usb_set_device_initiated_lpm: Can't disable U1 state for unconfigured device.
Dec 13 13:03:46 xanatos kernel: [  111.158267] usb 2-2: usb_set_device_initiated_lpm: Can't disable U2 state for unconfigured device.
Dec 13 13:03:46 xanatos kernel: [  111.158289] usb 2-2: usb_disable_device nuking all URBs
Dec 13 13:03:46 xanatos kernel: [  111.285629] hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x2a0
Dec 13 13:03:46 xanatos kernel: [  111.285636] hub 2-0:1.0: state 7 ports 4 chg 0000 evt 0004
Dec 13 13:03:46 xanatos kernel: [  111.285780] hub 2-0:1.0: hub_suspend
Dec 13 13:03:46 xanatos kernel: [  111.285792] usb usb2: bus auto-suspend, wakeup 1
Dec 13 13:03:46 xanatos kernel: [  111.750203] EXT4-fs error (device sdb2): ext4_put_super:791: Couldn't clean up the journal
Dec 13 13:03:46 xanatos kernel: [  111.750209] EXT4-fs (sdb2): Remounting filesystem read-only
Dec 13 13:04:04 xanatos kernel: [  129.385279] usb usb2: usb wakeup-resume
Dec 13 13:04:04 xanatos kernel: [  129.385292] usb usb2: usb auto-resume
Dec 13 13:04:04 xanatos kernel: [  129.385313] hub 2-0:1.0: hub_resume
Dec 13 13:04:04 xanatos kernel: [  129.385538] hub 2-0:1.0: port 2: status 0203 change 0001
Dec 13 13:04:04 xanatos kernel: [  129.489626] hub 2-0:1.0: state 7 ports 4 chg 0004 evt 0000
Dec 13 13:04:04 xanatos kernel: [  129.489770] hub 2-0:1.0: port 2, status 0203, change 0000, 5.0 Gb/s
Dec 13 13:04:04 xanatos kernel: [  129.601984] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
Dec 13 13:04:04 xanatos kernel: [  129.618343] usb 2-2: skipped 1 descriptor after endpoint
Dec 13 13:04:04 xanatos kernel: [  129.618350] usb 2-2: skipped 1 descriptor after endpoint
Dec 13 13:04:04 xanatos kernel: [  129.618359] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:04:04 xanatos kernel: [  129.618362] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:04:04 xanatos kernel: [  129.618365] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:04:04 xanatos kernel: [  129.618368] usb 2-2: skipped 2 descriptors after endpoint
Dec 13 13:04:04 xanatos kernel: [  129.618497] usb 2-2: default language 0x0409
Dec 13 13:04:04 xanatos kernel: [  129.618944] usb 2-2: udev 3, busnum 2, minor = 130
Dec 13 13:04:04 xanatos kernel: [  129.618949] usb 2-2: New USB device found, idVendor=174c, idProduct=55aa
Dec 13 13:04:04 xanatos kernel: [  129.618952] usb 2-2: New USB device strings: Mfr=2, Product=3, SerialNumber=1
Dec 13 13:04:04 xanatos kernel: [  129.618955] usb 2-2: Product: Plugable USB3-SATA-UASP1
Dec 13 13:04:04 xanatos kernel: [  129.618957] usb 2-2: Manufacturer: ASM1053E
Dec 13 13:04:04 xanatos kernel: [  129.618960] usb 2-2: SerialNumber: 123456789045
Dec 13 13:04:04 xanatos kernel: [  129.619422] usb 2-2: usb_probe_device
Dec 13 13:04:04 xanatos kernel: [  129.619428] usb 2-2: configuration #1 chosen from 1 choice
Dec 13 13:04:04 xanatos kernel: [  129.620213] usb 2-2: adding 2-2:1.0 (config #1, interface 0)
Dec 13 13:04:04 xanatos kernel: [  129.620489] usb-storage 2-2:1.0: usb_probe_interface
Dec 13 13:04:04 xanatos kernel: [  129.620495] usb-storage 2-2:1.0: usb_probe_interface - got id
Dec 13 13:04:04 xanatos kernel: [  129.621367] uas 2-2:1.0: usb_probe_interface
Dec 13 13:04:04 xanatos kernel: [  129.621374] uas 2-2:1.0: usb_probe_interface - got id
Dec 13 13:04:04 xanatos kernel: [  129.626512] scsi7 : uas
Dec 13 13:04:04 xanatos kernel: [  129.628228] scsi 7:0:0:0: Direct-Access     ASM1053E Plugable USB3-SA 0    PQ: 0 ANSI: 6
Dec 13 13:04:04 xanatos kernel: [  129.630030] sd 7:0:0:0: Attached scsi generic sg1 type 0
Dec 13 13:04:04 xanatos kernel: [  129.640361] sd 7:0:0:0: [sdb] 117231408 512-byte logical blocks: (60.0 GB/55.8 GiB)
Dec 13 13:04:04 xanatos kernel: [  129.641198] sd 7:0:0:0: [sdb] Write Protect is off
Dec 13 13:04:04 xanatos kernel: [  129.641201] sd 7:0:0:0: [sdb] Mode Sense: 43 00 00 00
Dec 13 13:04:04 xanatos kernel: [  129.641608] sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 13 13:04:04 xanatos kernel: [  129.645650]  sdb: sdb1 sdb2 sdb4
Dec 13 13:04:04 xanatos kernel: [  129.650160] sd 7:0:0:0: [sdb] Attached SCSI disk
Dec 13 13:04:05 xanatos kernel: [  130.205047] kjournald starting.  Commit interval 5 seconds
Dec 13 13:04:05 xanatos kernel: [  130.205381] EXT3-fs (sdb1): using internal journal
Dec 13 13:04:05 xanatos kernel: [  130.205387] EXT3-fs (sdb1): recovery complete
Dec 13 13:04:05 xanatos kernel: [  130.205389] EXT3-fs (sdb1): mounted filesystem with ordered data mode
Dec 13 13:04:05 xanatos kernel: [  130.228222] FAT-fs (sdb4): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Dec 13 13:04:05 xanatos kernel: [  130.239468] EXT4-fs (sdb2): recovery complete
Dec 13 13:04:05 xanatos kernel: [  130.239478] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)
Dec 13 13:04:17 xanatos kernel: [  142.112588] hub 2-0:1.0: state 7 ports 4 chg 0000 evt 0004
Dec 13 13:04:17 xanatos kernel: [  142.112682] hub 2-0:1.0: warm reset port 2
Dec 13 13:04:17 xanatos kernel: [  142.136957] sd 7:0:0:0: [sdb] uas_cmd_cmplt ffff88008bc1a100 tag 0, inflight: CMD IN
Dec 13 13:04:17 xanatos kernel: [  142.136963] sd 7:0:0:0: [sdb] cmd cmplt err -71
Dec 13 13:04:17 xanatos kernel: [  142.147967] sd 7:0:0:0: [sdb] uas_cmd_cmplt ffff88008bc1a400 tag 1, inflight: CMD IN
Dec 13 13:04:17 xanatos kernel: [  142.147974] sd 7:0:0:0: [sdb] cmd cmplt err -71
Dec 13 13:04:17 xanatos kernel: [  142.165840] hub 2-0:1.0: port 2 not warm reset yet, waiting 50ms
Dec 13 13:04:17 xanatos kernel: [  142.222174] hub 2-0:1.0: port 2, status 02c0, change 0041, 5.0 Gb/s
Dec 13 13:04:17 xanatos kernel: [  142.222183] usb 2-2: USB disconnect, device number 3
Dec 13 13:04:17 xanatos kernel: [  142.222185] usb 2-2: unregistering device
Dec 13 13:04:17 xanatos kernel: [  142.222187] usb 2-2: unregistering interface 2-2:1.0
Dec 13 13:04:17 xanatos kernel: [  142.222386] usb 2-2: usb_set_device_initiated_lpm: Can't disable U1 state for unconfigured device.
Dec 13 13:04:17 xanatos kernel: [  142.222416] usb 2-2: usb_set_device_initiated_lpm: Can't disable U2 state for unconfigured device.
Dec 13 13:04:17 xanatos kernel: [  142.222463] xhci_hcd 0000:00:14.0: shutdown urb ffff88010be0fb40 ep1in-bulk
Dec 13 13:04:17 xanatos kernel: [  142.222466] xhci_hcd 0000:00:14.0: shutdown urb ffff88010be0ff00 ep1in-bulk
Dec 13 13:04:17 xanatos kernel: [  142.222515] sd 7:0:0:0: [sdb] uas_data_cmplt ffff88008bc1a100 tag 0, inflight: CMD
Dec 13 13:04:17 xanatos kernel: [  142.222520] sd 7:0:0:0: [sdb] data cmplt err -108 stream 2
Dec 13 13:04:17 xanatos kernel: [  142.222535] sd 7:0:0:0: [sdb] uas_data_cmplt ffff88008bc1a400 tag 1, inflight: CMD
Dec 13 13:04:17 xanatos kernel: [  142.222537] sd 7:0:0:0: [sdb] data cmplt err -108 stream 3
Dec 13 13:04:17 xanatos kernel: [  142.222550] xhci_hcd 0000:00:14.0: shutdown urb ffff88010be0f300 ep3in-bulk
Dec 13 13:04:17 xanatos kernel: [  142.222555] xhci_hcd 0000:00:14.0: shutdown urb ffff88010be0f540 ep3in-bulk
Dec 13 13:04:17 xanatos kernel: [  142.222564] usb 2-2: stat urb: status -108
Dec 13 13:04:17 xanatos kernel: [  142.222575] usb 2-2: stat urb: status -108
Dec 13 13:04:17 xanatos kernel: [  142.222588] sd 7:0:0:0: [sdb] uas_disconnect ffff88008bc1a100 tag 0, inflight: CMD
Dec 13 13:04:17 xanatos kernel: [  142.222590] sd 7:0:0:0: [sdb] uas_disconnect ffff88008bc1a400 tag 1, inflight: CMD
Dec 13 13:04:17 xanatos kernel: [  142.222592] sd 7:0:0:0: [sdb] uas_zap_dead ffff88008bc1a100 tag 0, inflight: CMD abort
Dec 13 13:04:17 xanatos kernel: [  142.222594] sd 7:0:0:0: [sdb] abort completed
Dec 13 13:04:17 xanatos kernel: [  142.222597] sd 7:0:0:0: [sdb] uas_zap_dead ffff88008bc1a400 tag 1, inflight: CMD abort
Dec 13 13:04:17 xanatos kernel: [  142.222599] sd 7:0:0:0: [sdb] abort completed
Dec 13 13:04:17 xanatos kernel: [  142.222633] sd 7:0:0:0: [sdb] Unhandled error code
Dec 13 13:04:17 xanatos kernel: [  142.222635] sd 7:0:0:0: [sdb]  
Dec 13 13:04:17 xanatos kernel: [  142.222637] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 13 13:04:17 xanatos kernel: [  142.222641] sd 7:0:0:0: [sdb] CDB: 
Dec 13 13:04:17 xanatos kernel: [  142.222646] Read(10): 28 00 01 49 c9 90 00 01 00 00
Dec 13 13:04:17 xanatos kernel: [  142.222659] end_request: I/O error, dev sdb, sector 21612944
Dec 13 13:04:17 xanatos kernel: [  142.222686] sd 7:0:0:0: [sdb] Unhandled error code
Dec 13 13:04:17 xanatos kernel: [  142.222689] sd 7:0:0:0: [sdb]  
Dec 13 13:04:17 xanatos kernel: [  142.222690] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 13 13:04:17 xanatos kernel: [  142.222692] sd 7:0:0:0: [sdb] CDB: 
Dec 13 13:04:17 xanatos kernel: [  142.222693] Read(10): 28 00 01 49 ca 90 00 01 00 00
Dec 13 13:04:17 xanatos kernel: [  142.222702] end_request: I/O error, dev sdb, sector 21613200
Dec 13 13:04:17 xanatos kernel: [  142.239231] end_request: I/O error, dev sdb, sector 0
Dec 13 13:04:17 xanatos kernel: [  142.245075] sd 7:0:0:0: [sdb] Synchronizing SCSI cache
Dec 13 13:04:17 xanatos kernel: [  142.301917] end_request: I/O error, dev sdb, sector 0
Dec 13 13:04:17 xanatos kernel: [  142.307596] JBD2: Error -5 detected when updating journal superblock for sdb2-8.
Dec 13 13:04:17 xanatos kernel: [  142.338630] EXT3-fs (sdb1): I/O error while writing superblock
Dec 13 13:04:17 xanatos kernel: [  142.354202] sd 7:0:0:0: [sdb]  
Dec 13 13:04:17 xanatos kernel: [  142.354224] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec 13 13:04:17 xanatos kernel: [  142.356394] usb 2-2: usb_set_device_initiated_lpm: Can't enable U1 state for unconfigured device.
Dec 13 13:04:17 xanatos kernel: [  142.356448] usb 2-2: usb_set_device_initiated_lpm: Can't enable U2 state for unconfigured device.
Dec 13 13:04:17 xanatos kernel: [  142.356534] usb 2-2: usb_set_device_initiated_lpm: Can't disable U1 state for unconfigured device.
Dec 13 13:04:17 xanatos kernel: [  142.356566] usb 2-2: usb_set_device_initiated_lpm: Can't disable U2 state for unconfigured device.
Dec 13 13:04:17 xanatos kernel: [  142.356587] usb 2-2: usb_disable_device nuking all URBs
Dec 13 13:04:17 xanatos kernel: [  142.482058] hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x2a0
Dec 13 13:04:17 xanatos kernel: [  142.482064] hub 2-0:1.0: state 7 ports 4 chg 0000 evt 0004
Dec 13 13:04:17 xanatos kernel: [  142.482188] hub 2-0:1.0: hub_suspend
Dec 13 13:04:17 xanatos kernel: [  142.482196] usb usb2: bus auto-suspend, wakeup 1

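The lockdep splat in the attached log reports a problem with the recorded lock order. i_mutex was held while jbd2_handle was taken (do_truncate into ext4_setattr), and jbd2_handle was held while smk_lock was taken (ext4_create into d_instantiate). lightdm then tried to take i_mutex while holding smk_lock, which would close a cycle. The report comes from lightdm at about 17 seconds after boot, on ecryptfs, Smack and ext4 lock classes, well before the UAS device is attached at about 105 seconds.

Here is a toy version of that cycle check. It assumes an adjacency matrix over the three lock classes named in the report and uses a depth-first search. The names are illustrative only, and lockdep's real implementation tracks far more, including per-class dependency chains, held-lock stacks and irq state.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes taken from the lockdep report; illustrative only. */
enum lock_class { I_MUTEX, JBD2_HANDLE, SMK_LOCK, NCLASSES };

/* order[a][b] == true means "b was acquired while a was held". */
static bool order[NCLASSES][NCLASSES];

static void record_order(enum lock_class held, enum lock_class acquired)
{
	order[held][acquired] = true;
}

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NCLASSES; next++)
		if (order[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Adding held->acquired closes a cycle if held is reachable from acquired. */
static bool would_deadlock(enum lock_class held, enum lock_class acquired)
{
	bool seen[NCLASSES] = { false };
	return reachable(acquired, held, seen);
}
```

After recording the two existing dependencies, taking smk_lock while holding i_mutex is consistent with the established order. Taking i_mutex while holding smk_lock is flagged, which matches the CPU0/CPU1 scenario lockdep prints.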

* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-13 21:06                   ` Alan Stern
@ 2013-12-13 21:18                     ` James Bottomley
  0 siblings, 1 reply; 26+ messages in thread
From: James Bottomley @ 2013-12-13 21:18 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote:
> On Fri, 13 Dec 2013, James Bottomley wrote:
> 
> > Actually, I think I have this figured out.  There's a thinko in one of
> > the scsi_target_reap() cases.  The original (and still existing) problem
> > with targets is that nothing creates them and nothing destroys them, so,
> > while we could rely on the refcounting of the device model to preserve
> > the actual target object, we had no idea when to remove it from
> > visibility.  That was the job of the reap reference, to track
> > visibility.  It looks like the reap on device last put is occurring too
> > late.  I think we should reap immediately after doing the sdev
> > device_del, so does this fix the warn on? (I'm not sure because no-one
> > has actually posted a backtrace, but it sounds like this is the
> > problem).
> > 
> > James
> > 
> > ---
> > 
> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index 8ff62c2..98d4eb3 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> >  	/* NULL queue means the device can't be used */
> >  	sdev->request_queue = NULL;
> >  
> > -	scsi_target_reap(scsi_target(sdev));
> > -
> >  	kfree(sdev->inquiry);
> >  	kfree(sdev);
> >  
> > @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >  	} else
> >  		put_device(&sdev->sdev_dev);
> >  
> > +	scsi_target_reap(scsi_target(sdev));
> > +
> >  	/*
> >  	 * Stop accepting new requests and wait until all queuecommand() and
> >  	 * scsi_run_queue() invocations have finished before tearing down the
> 
> This is not right.  The problem is that you don't keep track explicitly 
> of the number of references to a target; you rely implicitly on 
> starget->devices being non-empty.  starget->reap_ref is only a count of 
> local operations that should block removal.

No, it was supposed explicitly to be a visibility counter to answer the
question when can we delete the target.  It's incremented every time we
add a device to the target (and when we do an operation that may remove
one to keep an atomic context before we blow it away) and decremented
every time we remove one.

> Consider, for example, what would happen if there is more than one LUN.  
> What if one of them is removed while the other remains?

Then the reap reference remains above zero and the target stays.

> A more invasive change is needed.

I think you might be right in that we need to kill the list_empty check,
but I think that should be it.

James
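A minimal userspace sketch of the visibility-counter semantics James describes. It assumes a simplified target with no locking: each visible LUN holds one reference, and the target is hidden only when the last one is dropped. The names `model_target`, `model_add_lun` and `model_remove_lun` are hypothetical illustrations, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct scsi_target (not kernel code). */
struct model_target {
	unsigned int reap_ref;	/* one reference per visible LUN */
	bool visible;		/* stands in for the target's sysfs presence */
};

/* A LUN under the target became visible: take a visibility reference. */
static void model_add_lun(struct model_target *t)
{
	t->reap_ref++;
	t->visible = true;
}

/* A LUN was removed: drop its reference and hide the target on the last one. */
static void model_remove_lun(struct model_target *t)
{
	assert(t->reap_ref > 0);	/* an unbalanced put would be a bug */
	if (--t->reap_ref == 0)
		t->visible = false;
}
```

With two LUNs, removing one leaves `reap_ref` at 1 and the target visible; only the second removal hides it. Alan's objection in the next reply is that the real code did not actually take one reference per device.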





* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-13 20:03                 ` James Bottomley
                                     ` (2 preceding siblings ...)
  2013-12-13 21:13                   ` [usb-storage] UAS hangs khubd on USB disconnect Sarah Sharp
@ 2013-12-13 21:24                   ` Hans de Goede
  3 siblings, 0 replies; 26+ messages in thread
From: Hans de Goede @ 2013-12-13 21:24 UTC (permalink / raw)
  To: James Bottomley, Tejun Heo
  Cc: Alan Stern, Sarah Sharp, USB list, SCSI development list,
	USB Storage List, Greg Kroah-Hartman

Hi,

On 12/13/2013 09:03 PM, James Bottomley wrote:

<snip>

> Actually, I think I have this figured out.  There's a thinko in one of
> the scsi_target_reap() cases.  The original (and still existing) problem
> with targets is that nothing creates them and nothing destroys them, so,
> while we could rely on the refcounting of the device model to preserve
> the actual target object, we had no idea when to remove it from
> visibility.  That was the job of the reap reference, to track
> visibility.  It looks like the reap on device last put is occurring too
> late.  I think we should reap immediately after doing the sdev
> device_del, so does this fix the warn on? (I'm not sure because no-one
> has actually posted a backtrace, but it sounds like this is the
> problem).
>
> James
>
> ---
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 8ff62c2..98d4eb3 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>   	/* NULL queue means the device can't be used */
>   	sdev->request_queue = NULL;
>
> -	scsi_target_reap(scsi_target(sdev));
> -
>   	kfree(sdev->inquiry);
>   	kfree(sdev);
>
> @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>   	} else
>   		put_device(&sdev->sdev_dev);
>
> +	scsi_target_reap(scsi_target(sdev));
> +
>   	/*
>   	 * Stop accepting new requests and wait until all queuecommand() and
>   	 * scsi_run_queue() invocations have finished before tearing down the

I've given this patch a try and it fixes the blk-tag.c: 89 BUG() I was seeing.

As for the other patch you (James) have sent for that problem:

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8ff62c2..98d4eb3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
  	/* NULL queue means the device can't be used */
  	sdev->request_queue = NULL;

-	scsi_target_reap(scsi_target(sdev));
-
  	kfree(sdev->inquiry);
  	kfree(sdev);

@@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
  	} else
  		put_device(&sdev->sdev_dev);

+	scsi_target_reap(scsi_target(sdev));
+
  	/*
  	 * Stop accepting new requests and wait until all queuecommand() and
  	 * scsi_run_queue() invocations have finished before tearing down the

That too fixes the blk-tag.c: 89 BUG() I was seeing. Either patch by itself
seems to be enough to fix this issue for me.

Thanks & Regards,

Hans


* Re: [usb-storage] UAS hangs khubd on USB disconnect
@ 2013-12-14  0:48                         ` Alan Stern
  2013-12-14  1:27                           ` James Bottomley
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-14  0:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 13 Dec 2013, James Bottomley wrote:

> On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote:
> > On Fri, 13 Dec 2013, James Bottomley wrote:
> > 
> > > Actually, I think I have this figured out.  There's a thinko in one of
> > > the scsi_target_reap() cases.  The original (and still existing) problem
> > > with targets is that nothing creates them and nothing destroys them, so,
> > > while we could rely on the refcounting of the device model to preserve
> > > the actual target object, we had no idea when to remove it from
> > > visibility.  That was the job of the reap reference, to track
> > > visibility.  It looks like the reap on device last put is occurring too
> > > late.  I think we should reap immediately after doing the sdev
> > > device_del, so does this fix the warn on? (I'm not sure because no-one
> > > has actually posted a backtrace, but it sounds like this is the
> > > problem).
> > > 
> > > James
> > > 
> > > ---
> > > 
> > > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > > index 8ff62c2..98d4eb3 100644
> > > --- a/drivers/scsi/scsi_sysfs.c
> > > +++ b/drivers/scsi/scsi_sysfs.c
> > > @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> > >  	/* NULL queue means the device can't be used */
> > >  	sdev->request_queue = NULL;
> > >  
> > > -	scsi_target_reap(scsi_target(sdev));
> > > -
> > >  	kfree(sdev->inquiry);
> > >  	kfree(sdev);
> > >  
> > > @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
> > >  	} else
> > >  		put_device(&sdev->sdev_dev);
> > >  
> > > +	scsi_target_reap(scsi_target(sdev));
> > > +
> > >  	/*
> > >  	 * Stop accepting new requests and wait until all queuecommand() and
> > >  	 * scsi_run_queue() invocations have finished before tearing down the
> > 
> > This is not right.  The problem is that you don't keep track explicitly 
> > of the number of references to a target; you rely implicitly on 
> > starget->devices being non-empty.  starget->reap_ref is only a count of 
> > local operations that should block removal.
> 
> No, it was supposed explicitly to be a visibility counter to answer the
> question when can we delete the target.  It's incremented every time we
> add a device to the target (and when we do an operation that may remove
> one to keep an atomic context before we blow it away) and decremented
> every time we remove one.

Sorry, but you're wrong.  starget->reap_ref is _not_ incremented every 
time we add a device to the target.  That's one of the things we need to 
fix.

> > Consider, for example, what would happen if there is more than one LUN.  
> > What if one of them is removed while the other remains?
> 
> Then the reap reference remains above zero and the target stays.
> 
> > A more invasive change is needed.
> 
> I think you might be right in that we need to kill the list_empty check,
> but I think that should be it.

That, plus one or two other things.  Look over the patch below.

Alan Stern



Index: usb-3.13/drivers/scsi/scsi_scan.c
===================================================================
--- usb-3.13.orig/drivers/scsi/scsi_scan.c
+++ usb-3.13/drivers/scsi/scsi_scan.c
@@ -334,6 +334,7 @@ static void scsi_target_dev_release(stru
 	struct device *parent = dev->parent;
 	struct scsi_target *starget = to_scsi_target(dev);
 
+	WARN_ON(!list_empty(&starget->devices));
 	kfree(starget);
 	put_device(parent);
 }
@@ -481,7 +482,7 @@ void scsi_target_reap(struct scsi_target
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	state = starget->state;
-	if (--starget->reap_ref == 0 && list_empty(&starget->devices)) {
+	if (--starget->reap_ref == 0) {
 		empty = 1;
 		starget->state = STARGET_DEL;
 	}
Index: usb-3.13/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-3.13.orig/drivers/scsi/scsi_sysfs.c
+++ usb-3.13/drivers/scsi/scsi_sysfs.c
@@ -369,17 +369,13 @@ static void scsi_device_dev_release_user
 {
 	struct scsi_device *sdev;
 	struct device *parent;
-	struct scsi_target *starget;
 	struct list_head *this, *tmp;
 	unsigned long flags;
 
 	sdev = container_of(work, struct scsi_device, ew.work);
-
 	parent = sdev->sdev_gendev.parent;
-	starget = to_scsi_target(parent);
 
 	spin_lock_irqsave(sdev->host->host_lock, flags);
-	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
 	list_del(&sdev->starved_entry);
@@ -399,13 +395,10 @@ static void scsi_device_dev_release_user
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
 
-	scsi_target_reap(scsi_target(sdev));
-
 	kfree(sdev->inquiry);
 	kfree(sdev);
 
-	if (parent)
-		put_device(parent);
+	put_device(parent);
 }
 
 static void scsi_device_dev_release(struct device *dev)
@@ -1044,6 +1037,8 @@ void __scsi_remove_device(struct scsi_de
 	} else
 		put_device(&sdev->sdev_dev);
 
+	scsi_target_reap(scsi_target(sdev));
+
 	/*
 	 * Stop accepting new requests and wait until all queuecommand() and
 	 * scsi_run_queue() invocations have finished before tearing down the
@@ -1200,6 +1195,7 @@ void scsi_sysfs_device_initialize(struct
 	sdev->scsi_level = starget->scsi_level;
 	transport_setup_device(&sdev->sdev_gendev);
 	spin_lock_irqsave(shost->host_lock, flags);
+	++starget->reap_ref;
 	list_add_tail(&sdev->same_target_siblings, &starget->devices);
 	list_add_tail(&sdev->siblings, &shost->__devices);
 	spin_unlock_irqrestore(shost->host_lock, flags);



* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-14  0:48                         ` Alan Stern
@ 2013-12-14  1:27                           ` James Bottomley
  2013-12-14  3:00                             ` Alan Stern
  2013-12-14  3:03                             ` [RFC] fix our current target reap infrastructure James Bottomley
  0 siblings, 2 replies; 26+ messages in thread
From: James Bottomley @ 2013-12-14  1:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 2013-12-13 at 19:48 -0500, Alan Stern wrote:
> On Fri, 13 Dec 2013, James Bottomley wrote:
> 
> > On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote:
> > > On Fri, 13 Dec 2013, James Bottomley wrote:
> > > 
> > > > Actually, I think I have this figured out.  There's a thinko in one of
> > > > the scsi_target_reap() cases.  The original (and still existing) problem
> > > > with targets is that nothing creates them and nothing destroys them, so,
> > > > while we could rely on the refcounting of the device model to preserve
> > > > the actual target object, we had no idea when to remove it from
> > > > visibility.  That was the job of the reap reference, to track
> > > > visibility.  It looks like the reap on device last put is occurring too
> > > > late.  I think we should reap immediately after doing the sdev
> > > > device_del, so does this fix the warn on? (I'm not sure because no-one
> > > > has actually posted a backtrace, but it sounds like this is the
> > > > problem).
> > > > 
> > > > James
> > > > 
> > > > ---
> > > > 
> > > > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > > > index 8ff62c2..98d4eb3 100644
> > > > --- a/drivers/scsi/scsi_sysfs.c
> > > > +++ b/drivers/scsi/scsi_sysfs.c
> > > > @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> > > >  	/* NULL queue means the device can't be used */
> > > >  	sdev->request_queue = NULL;
> > > >  
> > > > -	scsi_target_reap(scsi_target(sdev));
> > > > -
> > > >  	kfree(sdev->inquiry);
> > > >  	kfree(sdev);
> > > >  
> > > > @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
> > > >  	} else
> > > >  		put_device(&sdev->sdev_dev);
> > > >  
> > > > +	scsi_target_reap(scsi_target(sdev));
> > > > +
> > > >  	/*
> > > >  	 * Stop accepting new requests and wait until all queuecommand() and
> > > >  	 * scsi_run_queue() invocations have finished before tearing down the
> > > 
> > > This is not right.  The problem is that you don't keep track explicitly 
> > > of the number of references to a target; you rely implicitly on 
> > > starget->devices being non-empty.  starget->reap_ref is only a count of 
> > > local operations that should block removal.
> > 
> > No, it was supposed explicitly to be a visibility counter to answer the
> > question when can we delete the target.  It's incremented every time we
> > add a device to the target (and when we do an operation that may remove
> > one to keep an atomic context before we blow it away) and decremented
> > every time we remove one.
> 
> Sorry, but you're wrong.  starget->reap_ref is _not_ incremented every 
> time we add a device to the target.  That's one of the things we need to 
> fix.

Well, then we would have a pretty astonishing cockup in the code.  The
found case of scsi_alloc_target increments the reference each time it's
called, so scsi_add_device() definitely behaves like this.  I suppose
it's possible the list_empty() check is covering a miscount in some of
the other probing routines, but that would mean we have stale targets
for a lot of our use cases.  I'll audit the code.

James




* Re: [usb-storage] UAS hangs khubd on USB disconnect
  2013-12-14  1:27                           ` James Bottomley
@ 2013-12-14  3:00                             ` Alan Stern
  2013-12-14  3:03                             ` [RFC] fix our current target reap infrastructure James Bottomley
  1 sibling, 0 replies; 26+ messages in thread
From: Alan Stern @ 2013-12-14  3:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 13 Dec 2013, James Bottomley wrote:

> > Sorry, but you're wrong.  starget->reap_ref is _not_ incremented every 
> > time we add a device to the target.  That's one of the things we need to 
> > fix.
> 
> Well, then we would have a pretty astonishing cockup in the code.  The
> found case of scsi_alloc_target increments the reference each time it's
> called, so scsi_add_device() definitely behaves like this.

You forgot that __scsi_add_device() calls scsi_target_reap() at the 
end.  So the reference count is incremented and then decremented again.

It's easy enough to check that the scsi_probe_and_add_lun pathway
doesn't elevate the refcount.  Print out the value of starget->reap_ref
just after __scsi_add_device() calls scsi_alloc_target() and just
before it calls scsi_target_reap().

>  I suppose
> it's possible the list_empty() check is covering a miscount in some of
> the other probing routines, but that would mean we have stale targets
> for a lot of our use cases.  I'll audit the code.

That's probably right; whenever a target has more than one LUN we must 
end up leaking the target.  In the common case of one LUN it works out, 
because the list is empty by the time the scsi_device is released.

Alan Stern
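A simplified model of the counting Alan describes for the `__scsi_add_device()` path, assuming a single-threaded scan and ignoring everything except the counter. The lookup (the "found" case of `scsi_alloc_target()`) takes a reference and the `scsi_target_reap()` at the end of the scan drops it, so the net change is zero unless adding the device takes its own reference, as in Alan's proposed `++starget->reap_ref` in `scsi_sysfs_device_initialize()`. The type and function names are hypothetical, and the starting count is arbitrary; only the net change matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a target seen by the scan path (not kernel code). */
struct scan_target {
	int reap_ref;
	int ndevices;
};

/* Models the "found" case of scsi_alloc_target(): take a reference. */
static void scan_find_target(struct scan_target *t)
{
	t->reap_ref++;
}

/*
 * Models __scsi_add_device(): look up the target, add one LUN, then drop
 * the lookup reference.  count_device selects whether adding the device
 * also takes a reference of its own (Alan's proposed change).
 */
static void scan_add_device(struct scan_target *t, bool count_device)
{
	scan_find_target(t);
	t->ndevices++;
	if (count_device)
		t->reap_ref++;
	t->reap_ref--;	/* the scsi_target_reap() at the end of the scan */
}
```

Without the per-device increment the counter ends where it started, so the target's survival rests on the `starget->devices` list rather than on `reap_ref`, which is Alan's point. With it, each LUN leaves one reference for the removal path to drop.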


* [RFC] fix our current target reap infrastructure.
  2013-12-14  1:27                           ` James Bottomley
  2013-12-14  3:00                             ` Alan Stern
@ 2013-12-14  3:03                             ` James Bottomley
  2013-12-14  3:32                               ` Alan Stern
  1 sibling, 1 reply; 26+ messages in thread
From: James Bottomley @ 2013-12-14  3:03 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

This patch eliminates the reap_ref and replaces it with a proper kref.
On last put of this kref, the target is removed from visibility in
sysfs.  The final call to scsi_target_reap() for the device is done from
__scsi_remove_device() and only if the device was made visible.  This
ensures that the target disappears as soon as the last device is gone
rather than waiting until final release of the device (which is often
too long).

---

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 307a811..d966e36 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -371,6 +371,31 @@ static struct scsi_target *__scsi_find_target(struct device *parent,
 }
 
 /**
+ * scsi_target_reap_ref_release - remove target from visibility
+ * @kref: the reap_ref in the target being released
+ *
+ * Called on last put of reap_ref, which is the indication that no device
+ * under this target is visible anymore, so render the target invisible in
+ * sysfs.  Note: we have to be in user context here because the target reaps
+ * should be done in places where the scsi device visibility is being removed.
+ */
+static void scsi_target_reap_ref_release(struct kref *kref)
+{
+	struct scsi_target *starget
+		= container_of(kref, struct scsi_target, reap_ref);
+
+	transport_remove_device(&starget->dev);
+	device_del(&starget->dev);
+	starget->state = STARGET_DEL;
+	scsi_target_destroy(starget);
+}
+
+static void scsi_target_reap_ref_put(struct scsi_target *starget)
+{
+	kref_put(&starget->reap_ref, scsi_target_reap_ref_release);
+}
+
+/**
  * scsi_alloc_target - allocate a new or find an existing target
  * @parent:	parent of the target (need not be a scsi host)
  * @channel:	target channel number (zero if no channels)
@@ -401,7 +426,7 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	}
 	dev = &starget->dev;
 	device_initialize(dev);
-	starget->reap_ref = 1;
+	kref_init(&starget->reap_ref);
 	dev->parent = get_device(parent);
 	dev_set_name(dev, "target%d:%d:%d", shost->host_no, channel, id);
 	dev->bus = &scsi_bus_type;
@@ -441,29 +466,26 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	return starget;
 
  found:
-	found_target->reap_ref++;
+	kref_get(&found_target->reap_ref);
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	if (found_target->state != STARGET_DEL) {
 		put_device(dev);
 		return found_target;
 	}
-	/* Unfortunately, we found a dying target; need to
-	 * wait until it's dead before we can get a new one */
+	/*
+	 * Unfortunately, we found a dying target; need to wait until it's
+	 * dead before we can get a new one.  There is an anomaly here.  We
+	 * *should* call scsi_target_reap() to balance the kref_get() of the
+	 * reap_ref above.  However, since the target is in state STARGET_DEL,
+	 * it's already invisible and the reap_ref is irrelevant.  If we call
+	 * scsi_target_reap() we might spuriously do another device_del() on
+	 * an already invisible target.
+	 */
 	put_device(&found_target->dev);
 	flush_scheduled_work();
 	goto retry;
 }
 
-static void scsi_target_reap_usercontext(struct work_struct *work)
-{
-	struct scsi_target *starget =
-		container_of(work, struct scsi_target, ew.work);
-
-	transport_remove_device(&starget->dev);
-	device_del(&starget->dev);
-	scsi_target_destroy(starget);
-}
-
 /**
  * scsi_target_reap - check to see if target is in use and destroy if not
  * @starget: target to be checked
@@ -474,28 +496,11 @@ static void scsi_target_reap_usercontext(struct work_struct *work)
  */
 void scsi_target_reap(struct scsi_target *starget)
 {
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
-	unsigned long flags;
-	enum scsi_target_state state;
-	int empty = 0;
-
-	spin_lock_irqsave(shost->host_lock, flags);
-	state = starget->state;
-	if (--starget->reap_ref == 0 && list_empty(&starget->devices)) {
-		empty = 1;
-		starget->state = STARGET_DEL;
-	}
-	spin_unlock_irqrestore(shost->host_lock, flags);
-
-	if (!empty)
-		return;
-
-	BUG_ON(state == STARGET_DEL);
-	if (state == STARGET_CREATED)
+	BUG_ON(starget->state == STARGET_DEL);
+	if (starget->state == STARGET_CREATED)
 		scsi_target_destroy(starget);
 	else
-		execute_in_process_context(scsi_target_reap_usercontext,
-					   &starget->ew);
+		scsi_target_reap_ref_put(starget);
 }
 
 /**
@@ -1532,6 +1537,10 @@ struct scsi_device *__scsi_add_device(struct Scsi_Host *shost, uint channel,
 	}
 	mutex_unlock(&shost->scan_mutex);
 	scsi_autopm_put_target(starget);
+	/*
+	 * paired with scsi_alloc_target().  Target will be destroyed unless
+	 * scsi_probe_and_add_lun made an underlying device visible
+	 */
 	scsi_target_reap(starget);
 	put_device(&starget->dev);
 
@@ -1612,8 +1621,10 @@ static void __scsi_scan_target(struct device *parent, unsigned int channel,
 
  out_reap:
 	scsi_autopm_put_target(starget);
-	/* now determine if the target has any children at all
-	 * and if not, nuke it */
+	/*
+	 * paired with scsi_alloc_target(): determine if the target has
+	 * any children at all and if not, nuke it
+	 */
 	scsi_target_reap(starget);
 
 	put_device(&starget->dev);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 9117d0b..7b3770b 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -393,7 +393,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget = to_scsi_target(parent);
 
 	spin_lock_irqsave(sdev->host->host_lock, flags);
-	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
 	list_del(&sdev->starved_entry);
@@ -413,8 +412,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
 
-	scsi_target_reap(scsi_target(sdev));
-
 	kfree(sdev->inquiry);
 	kfree(sdev);
 
@@ -1001,6 +998,7 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 		return error;
 	}
 	transport_add_device(&sdev->sdev_gendev);
+	kref_get(&starget->reap_ref); /* device now visible, so target is held */
 	sdev->is_visible = 1;
 
 	/* create queue files, which may be writable, depending on the host */
@@ -1055,6 +1053,13 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		device_unregister(&sdev->sdev_dev);
 		transport_remove_device(dev);
 		device_del(dev);
+		/*
+		 * Paired with the kref_get() in scsi_sysfs_add_sdev().  We're
+		 * removing sysfs visibility from the device, so make the
+		 * target invisible if this was the last device underneath it.
+		 */
+		scsi_target_reap(scsi_target(sdev));
+
 	} else
 		put_device(&sdev->sdev_dev);
 
@@ -1133,7 +1138,7 @@ void scsi_remove_target(struct device *dev)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
 			/* assuming new targets arrive at the end */
-			starget->reap_ref++;
+			kref_get(&starget->reap_ref);
 			spin_unlock_irqrestore(shost->host_lock, flags);
 			if (last)
 				scsi_target_reap(last);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d65fbec..24b9e06 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -257,7 +257,7 @@ struct scsi_target {
 	struct list_head	siblings;
 	struct list_head	devices;
 	struct device		dev;
-	unsigned int		reap_ref; /* protected by the host lock */
+	struct kref		reap_ref; /* last put renders device invisible */
 	unsigned int		channel;
 	unsigned int		id; /* target id ... replace
 				     * scsi_device.id eventually */
@@ -284,7 +284,6 @@ struct scsi_target {
 #define SCSI_DEFAULT_TARGET_BLOCKED	3
 
 	char			scsi_level;
-	struct execute_work	ew;
 	enum scsi_target_state	state;
 	void 			*hostdata; /* available to low-level driver */
 	unsigned long		starget_data[0]; /* for the transport */
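The RFC relies on the usual kref pattern: the counter is embedded in the target, and the release callback recovers the enclosing object with `container_of()` and makes it invisible on the last put. The sketch below is a non-atomic, single-threaded userspace stand-in for `kref_init()`/`kref_get()`/`kref_put()`; `model_kref`, `model_starget` and the other names are hypothetical, and the `visible` flag stands in for `transport_remove_device()` plus `device_del()`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct kref (non-atomic). */
struct model_kref {
	unsigned int refcount;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void model_kref_init(struct model_kref *k) { k->refcount = 1; }
static void model_kref_get(struct model_kref *k) { k->refcount++; }

/* Returns 1 if this put dropped the last reference and release() ran. */
static int model_kref_put(struct model_kref *k,
			  void (*release)(struct model_kref *))
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical target with reap_ref embedded, as in the RFC's struct scsi_target. */
struct model_starget {
	int visible;
	struct model_kref reap_ref;
};

/* Analogue of scsi_target_reap_ref_release(): hide the target on last put. */
static void model_reap_ref_release(struct model_kref *k)
{
	struct model_starget *t =
		model_container_of(k, struct model_starget, reap_ref);

	t->visible = 0;
}
```

In this model the scan holds the initial reference and each visible device takes one more, so the end-of-scan put leaves the target visible and the put from the device-removal path hides it.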




* Re: [RFC] fix our current target reap infrastructure.
  2013-12-14  3:03                             ` [RFC] fix our current target reap infrastructure James Bottomley
@ 2013-12-14  3:32                               ` Alan Stern
  2013-12-14 23:55                                 ` James Bottomley
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-14  3:32 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 13 Dec 2013, James Bottomley wrote:

> This patch eliminates the reap_ref and replaces it with a proper kref.
> On last put of this kref, the target is removed from visibility in
> sysfs.  The final call to scsi_target_reap() for the device is done from
> __scsi_remove_device() and only if the device was made visible.  This
> ensures that the target disappears as soon as the last device is gone
> rather than waiting until final release of the device (which is often
> too long).


> @@ -474,28 +496,11 @@ static void scsi_target_reap_usercontext(struct work_struct *work)
>   */
>  void scsi_target_reap(struct scsi_target *starget)
>  {
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> -	unsigned long flags;
> -	enum scsi_target_state state;
> -	int empty = 0;
> -
> -	spin_lock_irqsave(shost->host_lock, flags);
> -	state = starget->state;
> -	if (--starget->reap_ref == 0 && list_empty(&starget->devices)) {
> -		empty = 1;
> -		starget->state = STARGET_DEL;
> -	}
> -	spin_unlock_irqrestore(shost->host_lock, flags);
> -
> -	if (!empty)
> -		return;
> -
> -	BUG_ON(state == STARGET_DEL);
> -	if (state == STARGET_CREATED)
> +	BUG_ON(starget->state == STARGET_DEL);
> +	if (starget->state == STARGET_CREATED)
>  		scsi_target_destroy(starget);
>  	else
> -		execute_in_process_context(scsi_target_reap_usercontext,
> -					   &starget->ew);
> +		scsi_target_reap_ref_put(starget);

The refcount test and state change race with scsi_alloc_target().  
Maybe the race won't occur in practice, but to be safe you should hold
shost->host_lock throughout that time interval, as the original code
here does.

This means the kref approach won't work so easily.  You might as well
leave reap_ref as an ordinary int.
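A sketch of the locking discipline Alan asks for, assuming a single mutex standing in for `shost->host_lock`: the decrement, the zero test and the transition to a dead state happen inside one critical section, and the lookup checks the state under the same lock before taking a reference. This illustrates the rule, not what `scsi_alloc_target()` currently does; all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum model_state { MODEL_CREATED, MODEL_RUNNING, MODEL_DEL };

/* Hypothetical target guarded by one lock (stand-in for shost->host_lock). */
struct locked_target {
	pthread_mutex_t lock;
	unsigned int reap_ref;
	enum model_state state;
};

/*
 * Drop a reference.  Decrement, zero test and state change share one
 * critical section, so a lookup cannot revive a target mid-teardown.
 * Returns true if the caller now owns the teardown.
 */
static bool locked_target_put(struct locked_target *t)
{
	bool dead = false;

	pthread_mutex_lock(&t->lock);
	assert(t->reap_ref > 0);
	if (--t->reap_ref == 0) {
		t->state = MODEL_DEL;
		dead = true;
	}
	pthread_mutex_unlock(&t->lock);
	return dead;
}

/* Lookup path: only take a reference on a target that is not dying. */
static bool locked_target_get(struct locked_target *t)
{
	bool ok = false;

	pthread_mutex_lock(&t->lock);
	if (t->state != MODEL_DEL) {
		t->reap_ref++;
		ok = true;
	}
	pthread_mutex_unlock(&t->lock);
	return ok;
}
```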

> @@ -393,7 +393,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>  	starget = to_scsi_target(parent);
>  
>  	spin_lock_irqsave(sdev->host->host_lock, flags);
> -	starget->reap_ref++;
>  	list_del(&sdev->siblings);
>  	list_del(&sdev->same_target_siblings);
>  	list_del(&sdev->starved_entry);

starget is now an unused local variable.  It can be eliminated.

Alan Stern



* Re: [RFC] fix our current target reap infrastructure.
  2013-12-14  3:32                               ` Alan Stern
@ 2013-12-14 23:55                                 ` James Bottomley
  2013-12-15 21:32                                   ` Alan Stern
  0 siblings, 1 reply; 26+ messages in thread
From: James Bottomley @ 2013-12-14 23:55 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Fri, 2013-12-13 at 22:32 -0500, Alan Stern wrote:
> On Fri, 13 Dec 2013, James Bottomley wrote:
> 
> > This patch eliminates the reap_ref and replaces it with a proper kref.
> > On last put of this kref, the target is removed from visibility in
> > sysfs.  The final call to scsi_target_reap() for the device is done from
> > __scsi_remove_device() and only if the device was made visible.  This
> > ensures that the target disappears as soon as the last device is gone
> > rather than waiting until final release of the device (which is often
> > too long).
> 
> 
> > @@ -474,28 +496,11 @@ static void scsi_target_reap_usercontext(struct work_struct *work)
> >   */
> >  void scsi_target_reap(struct scsi_target *starget)
> >  {
> > -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> > -	unsigned long flags;
> > -	enum scsi_target_state state;
> > -	int empty = 0;
> > -
> > -	spin_lock_irqsave(shost->host_lock, flags);
> > -	state = starget->state;
> > -	if (--starget->reap_ref == 0 && list_empty(&starget->devices)) {
> > -		empty = 1;
> > -		starget->state = STARGET_DEL;
> > -	}
> > -	spin_unlock_irqrestore(shost->host_lock, flags);
> > -
> > -	if (!empty)
> > -		return;
> > -
> > -	BUG_ON(state == STARGET_DEL);
> > -	if (state == STARGET_CREATED)
> > +	BUG_ON(starget->state == STARGET_DEL);
> > +	if (starget->state == STARGET_CREATED)
> >  		scsi_target_destroy(starget);
> >  	else
> > -		execute_in_process_context(scsi_target_reap_usercontext,
> > -					   &starget->ew);
> > +		scsi_target_reap_ref_put(starget);
> 
> The refcount test and state change race with scsi_alloc_target().  
> Maybe the race won't occur in practice, but to be safe you should hold
> shost->host_lock throughout that time interval, as the original code
> here does.

You mean the fact that using a state model to indicate whether we should
destroy a target without bothering to refcount isn't robust against two
threads of execution running through a scan on the same target?  Yes, it
could be construed as a bug, but it's a bug in the old code as well.

> This means the kref approach won't work so easily.  You might as well
> leave reap_ref as an ordinary int.

Actually, no, we can better fix it using krefs.  We just force
everything through the kref put instead of special casing the not made
visible destruction case.  We can then make the release routine check the state to fix
this, like below.  I suppose since this is a separate bug I'll keep it
as a separate patch.
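As a rough illustration of that approach, here is a userspace sketch (not the kernel code). Every final drop goes through a single put, and the release routine uses the target state to decide whether the object was ever made visible. `struct target`, `target_put()`, the `TGT_*` states and the two counters are hypothetical stand-ins for `scsi_target`, `scsi_target_reap_ref_put()`, `STARGET_*`, `device_del()` and `scsi_target_destroy()`. C11 atomics stand in for the kernel's kref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for STARGET_CREATED / STARGET_RUNNING / STARGET_DEL. */
enum tgt_state { TGT_CREATED, TGT_RUNNING, TGT_DEL };

struct target {
	atomic_int ref;        /* plays the role of the reap_ref kref */
	enum tgt_state state;
	int unregistered;      /* counts simulated device_del() calls */
	int destroyed;         /* counts simulated scsi_target_destroy() calls */
};

static void target_init(struct target *t)
{
	atomic_init(&t->ref, 1);
	t->state = TGT_CREATED;
	t->unregistered = 0;
	t->destroyed = 0;
}

static void target_get(struct target *t)
{
	atomic_fetch_add(&t->ref, 1);
}

/* Release routine: only undo visibility if the target was made visible. */
static void target_release(struct target *t)
{
	if (t->state == TGT_RUNNING)
		t->unregistered++;   /* transport_remove_device() + device_del() */
	t->state = TGT_DEL;      /* the patch sets this in scsi_target_destroy() */
	t->destroyed++;
}

/* Every final drop goes through here; returns true if it released. */
static bool target_put(struct target *t)
{
	int old = atomic_fetch_sub(&t->ref, 1);

	assert(old > 0);         /* a put on a released target is a bug */
	if (old == 1) {
		target_release(t);
		return true;
	}
	return false;
}
```

A target that never left `TGT_CREATED` is destroyed without being unregistered. This mirrors the `STARGET_RUNNING` check in the release routine of the patch below. The sketch only covers the release side. It does nothing about a concurrent lookup taking a new reference.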

> > @@ -393,7 +393,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> >  	starget = to_scsi_target(parent);
> >  
> >  	spin_lock_irqsave(sdev->host->host_lock, flags);
> > -	starget->reap_ref++;
> >  	list_del(&sdev->siblings);
> >  	list_del(&sdev->same_target_siblings);
> >  	list_del(&sdev->starved_entry);
> 
> starget is now an unused local variable.  It can be eliminated.

True, dumped them, thanks.

James

---

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index d966e36..327c0e92 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -320,6 +320,7 @@ static void scsi_target_destroy(struct scsi_target *starget)
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
 	unsigned long flags;
 
+	starget->state = STARGET_DEL;
 	transport_destroy_device(dev);
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (shost->hostt->target_destroy)
@@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref)
 	struct scsi_target *starget
 		= container_of(kref, struct scsi_target, reap_ref);
 
-	transport_remove_device(&starget->dev);
-	device_del(&starget->dev);
-	starget->state = STARGET_DEL;
+	/*
+	 * if we get here and the target is still in the CREATED state that
+	 * means it was allocated but never made visible (because a scan
+	 * turned up no LUNs), so don't call device_del() on it.
+	 */
+	if (starget->state == STARGET_RUNNING) {
+		transport_remove_device(&starget->dev);
+		device_del(&starget->dev);
+	}
 	scsi_target_destroy(starget);
 }
 
@@ -496,11 +503,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
  */
 void scsi_target_reap(struct scsi_target *starget)
 {
+	/*
+	 * serious problem if this triggers: STARGET_DEL is only set in the
+	 * kref release routine, so we're doing another final put on an
+	 * already released kref
+	 */
 	BUG_ON(starget->state == STARGET_DEL);
-	if (starget->state == STARGET_CREATED)
-		scsi_target_destroy(starget);
-	else
-		scsi_target_reap_ref_put(starget);
+	scsi_target_reap_ref_put(starget);
 }
 
 /**




* Re: [RFC] fix our current target reap infrastructure.
  2013-12-14 23:55                                 ` James Bottomley
@ 2013-12-15 21:32                                   ` Alan Stern
       [not found]                                     ` <Pine.LNX.4.44L0.1312151550380.32133-100000-pYrvlCTfrz9XsRXLowluHWD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-15 21:32 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Sat, 14 Dec 2013, James Bottomley wrote:

> > The refcount test and state change race with scsi_alloc_target().  
> > Maybe the race won't occur in practice, but to be safe you should hold
> > shost->host_lock throughout that time interval, as the original code
> > here does.
> 
> You mean the fact that using a state model to indicate whether we should
> destroy a target without bothering to refcount isn't robust against two
> threads of execution running through a scan on the same target?

I meant that the patch you posted suffers from a race when one thread
is adding a device to a target and another thread is removing an
existing device below that target at the same time.  Suppose the
target's reap_ref count is initially equal to 1:

	Thread 0			Thread 1
	--------			--------
	In scsi_alloc_target():		In scsi_target_reap():
	lock host_lock			scsi_target_reap_ref_put():
	find existing starget		starget->reap_ref drops to 0
	incr starget->reap_ref		In scsi_target_reap_ref_release():
	unlock host_lock		device_del(&starget->dev);
	starget->state == STARGET_DEL?
	No => okay to use starget	set starget->state = STARGET_DEL

Result: We end up using starget _and_ removing it.  The only way to
avoid this race would be to guarantee that we never add and remove
devices below the same target at the same time.  In theory this is 
feasible, but I don't know if you want to do it.
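The schedule in the table can be replayed step by step in a single thread to show the outcome. This is a hypothetical userspace sketch: the helper names and the `deleted` flag are invented for illustration. The lock is left out because, as in the table, only thread 0 takes `host_lock`, so the lock cannot order thread 0's state check against thread 1's release.

```c
#include <assert.h>
#include <stdbool.h>

enum tgt_state { TGT_RUNNING, TGT_DEL };

struct target {
	int ref;              /* reap_ref */
	enum tgt_state state;
	bool deleted;         /* simulated device_del() has run */
};

/* Thread 1: final put, done without host_lock. */
static bool t1_put(struct target *t) { return --t->ref == 0; }

/* Thread 0: find the target and take a reference (under host_lock in
 * the real code, which does not help because thread 1 never takes it). */
static void t0_get(struct target *t) { t->ref++; }

/* Thread 1: first half of the release routine, device_del(). */
static void t1_release_del(struct target *t) { t->deleted = true; }

/* Thread 0: state check, done after host_lock is dropped. */
static bool t0_ok_to_use(const struct target *t) { return t->state != TGT_DEL; }

/* Thread 1: second half of the release routine, mark the target dead. */
static void t1_release_mark(struct target *t) { t->state = TGT_DEL; }

/* Replays the table one step at a time. Returns true if thread 0
 * decided to use a target that thread 1 tore down. */
static bool replay(void)
{
	struct target t = { .ref = 1, .state = TGT_RUNNING, .deleted = false };

	bool released = t1_put(&t);     /* reap_ref drops to 0 */
	t0_get(&t);                     /* 0 -> 1: resurrects a dying target */
	assert(t.ref == 1);
	if (released)
		t1_release_del(&t);         /* device_del(&starget->dev) */
	bool use = t0_ok_to_use(&t);    /* STARGET_DEL? not yet */
	if (released)
		t1_release_mark(&t);        /* state = STARGET_DEL, too late */
	return use && t.deleted && t.state == TGT_DEL;
}
```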

This doesn't seem to be what you are talking about above.  In any case, 
it is a bug.

>  Yes, it
> could be construed as a bug, but it's a bug in the old code as well.

The old code is immune to the bug I just described, because the
existing scsi_target_reap() holds the host_lock while manipulating
starget->state.

	Case A: Thread 0 acquires the host_lock first.  Then thread 0
	increments reap_ref while holding the lock.  Later on, thread 1
	acquires the lock and decrements reap_ref, but the value 
	doesn't drop to 0 (because of the prior increment).  Thus the
	target doesn't get reaped.

	Case B: Thread 1 acquires the host_lock first.  Then thread 1
	decrements reap_ref, sees that it drops to 0, and sets the state
	to STARGET_DEL, all before releasing the lock.  Later on, 
	thread 0 acquires the lock and increments reap_ref futilely, 
	but sees that the state has already been changed to 
	STARGET_DEL.  Thus, it delays and retries instead of using 
	starget.
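Here is a sketch of the two cases. It assumes a pthread mutex in place of `host_lock` and uses hypothetical function names that loosely mirror the lookup in `scsi_alloc_target()` and the old `scsi_target_reap()`. The `list_empty(&starget->devices)` test is omitted. Because the count and the state change share one critical section, whichever thread takes the lock first decides the outcome. For simplicity, the finder reads the state while still holding the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum tgt_state { TGT_RUNNING, TGT_DEL };

struct target {
	pthread_mutex_t *host_lock;
	int reap_ref;              /* plain int: only touched under host_lock */
	enum tgt_state state;
};

/* Finder (thread 0): returns true if the target may be used, false if
 * it is already dying and the caller should back off and retry. */
static bool target_find_and_get(struct target *t)
{
	bool usable;

	pthread_mutex_lock(t->host_lock);
	t->reap_ref++;
	usable = (t->state != TGT_DEL);
	pthread_mutex_unlock(t->host_lock);
	return usable;
}

/* Reaper (thread 1): returns true if this call reaped the target.
 * The decrement and the switch to TGT_DEL happen together under the lock. */
static bool target_reap(struct target *t)
{
	bool empty = false;

	pthread_mutex_lock(t->host_lock);
	if (--t->reap_ref == 0) {
		assert(t->state != TGT_DEL);   /* cf. the BUG_ON in the old code */
		t->state = TGT_DEL;
		empty = true;
	}
	pthread_mutex_unlock(t->host_lock);
	/* the real teardown would follow here, outside the lock */
	return empty;
}
```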

> > This means the kref approach won't work so easily.  You might as well
> > leave reap_ref as an ordinary int.
> 
> Actually, no, we can better fix it using krefs.  We just force
> everything through the kref put instead of special casing the not made
> visible destruction case.  We can then make the release routine check the state to fix
> this, like below.  I suppose since this is a separate bug I'll keep it
> as a separate patch.

It looks like you misunderstood the problem; the description in my
previous email was perhaps excessively brief.  Hopefully the
explanation above is sufficiently explicit to make everything clear.  
This new, separate patch doesn't fix it.

To fix this bug while using krefs would require that you hold the
host_lock while doing the kref_put().  The release routine could set
the state to STARGET_DEL, but then you would have to use the
execute_in_process_context() mechanism to do the rest of the release work.

That's doable, but it seems simpler to keep the code the way it is.  
In which case there's no need to change reap_ref into an atomic kref,
because it is never modified outside the scope of the host_lock.
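That alternative would look roughly like the following userspace sketch. The put happens under `host_lock`. The release callback only marks the target `TGT_DEL`, and the teardown that may sleep runs after the lock is dropped. All names are hypothetical. A pthread mutex stands in for `host_lock`, and calling a saved function pointer after unlocking stands in for `execute_in_process_context()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum tgt_state { TGT_RUNNING, TGT_DEL };

struct target {
	int ref;                            /* only modified under host_lock */
	enum tgt_state state;
	void (*deferred)(struct target *);  /* work to run once the lock is dropped */
	bool torn_down;
};

static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;

/* Part of the release that may sleep (device_del() and friends). */
static void target_teardown(struct target *t)
{
	t->torn_down = true;
}

/* Release callback: runs with host_lock held, so it only marks the
 * target dead and schedules the teardown. */
static void target_release_locked(struct target *t)
{
	t->state = TGT_DEL;
	t->deferred = target_teardown;
}

static void target_put(struct target *t)
{
	void (*work)(struct target *);

	pthread_mutex_lock(&host_lock);
	assert(t->ref > 0);
	if (--t->ref == 0)
		target_release_locked(t);
	work = t->deferred;
	t->deferred = NULL;
	pthread_mutex_unlock(&host_lock);

	if (work)
		work(t);       /* "process context" part, outside the lock */
}

/* Lookup side: the state check and the get share one critical section. */
static bool target_get_if_alive(struct target *t)
{
	bool alive;

	pthread_mutex_lock(&host_lock);
	alive = (t->state != TGT_DEL);
	if (alive)
		t->ref++;
	pthread_mutex_unlock(&host_lock);
	return alive;
}
```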

Alan Stern


* Re: [RFC] fix our current target reap infrastructure.
       [not found]                                     ` <Pine.LNX.4.44L0.1312151550380.32133-100000-pYrvlCTfrz9XsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2013-12-15 22:14                                       ` James Bottomley
       [not found]                                         ` <1387145674.2284.60.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
  2013-12-16  2:49                                         ` Alan Stern
  0 siblings, 2 replies; 26+ messages in thread
From: James Bottomley @ 2013-12-15 22:14 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Sun, 2013-12-15 at 16:32 -0500, Alan Stern wrote:
> On Sat, 14 Dec 2013, James Bottomley wrote:
> 
> > > The refcount test and state change race with scsi_alloc_target().  
> > > Maybe the race won't occur in practice, but to be safe you should hold
> > > shost->host_lock throughout that time interval, as the original code
> > > here does.
> > 
> > You mean the fact that using a state model to indicate whether we should
> > destroy a target without bothering to refcount isn't robust against two
> > threads of execution running through a scan on the same target?
> 
> I meant that the patch you posted suffers from a race when one thread
> is adding a device to a target and another thread is removing an
> existing device below that target at the same time.  Suppose the
> target's reap_ref count is initially equal to 1:
> 
> 	Thread 0			Thread 1
> 	--------			--------
> 	In scsi_alloc_target():		In scsi_target_reap():
> 	lock host_lock			scsi_target_reap_ref_put():
> 	find existing starget		starget->reap_ref drops to 0
> 	incr starget->reap_ref		In scsi_target_reap_ref_release():
> 	unlock host_lock		device_del(&starget->dev);
> 	starget->state == STARGET_DEL?
> 	No => okay to use starget	set starget->state = STARGET_DEL
> 
> Result: We end up using starget _and_ removing it.  The only way to
> avoid this race would be to guarantee that we never add and remove
> devices below the same target at the same time.  In theory this is 
> feasible, but I don't know if you want to do it.
> 
> This doesn't seem to be what you are talking about above.  In any case, 
> it is a bug.

No, I was thinking of the two thread scan bug (i.e. two scan threads)
not one scan and one remove, which is a bug in the old code.  This is a
race between put and get when the kref is incremented from zero (an
illegal operation which triggers a warn on).

The way to mediate this is to check whether the kref is already zero
before taking the reference, like below.

James

---
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 327c0e92..303d471 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -473,7 +473,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	return starget;
 
  found:
-	kref_get(&found_target->reap_ref);
+	if (!kref_get_unless_zero(&found_target->reap_ref))
+		/*
+		 * release routine already fired.  Target is dead, but
+		 * STARGET_DEL may not yet be set (set in the release
+		 * routine), so set here as well, just in case
+		 */
+		found_target->state = STARGET_DEL;
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	if (found_target->state != STARGET_DEL) {
 		put_device(dev);
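`kref_get_unless_zero()` refuses to take a reference once the count has reached zero, rather than resurrecting the object. This is what lets the lookup detect a target whose release has already fired. Below is a userspace sketch of that primitive using a C11 compare-and-swap loop. `ref_get_unless_zero()` and `ref_put()` are hypothetical names, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count is still non-zero.
 * Returns false if the object's release has already been triggered. */
static bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value; retry. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_put(atomic_int *ref)
{
	int old = atomic_fetch_sub(ref, 1);

	assert(old > 0);
	return old == 1;
}
```

If the get fails, the patch treats the target as `STARGET_DEL` and the caller backs off, much as in Case B of the old code.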



--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC] fix our current target reap infrastructure.
       [not found]                                         ` <1387145674.2284.60.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
@ 2013-12-16  2:44                                           ` Alan Stern
  2013-12-16  3:32                                             ` James Bottomley
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-16  2:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Sun, 15 Dec 2013, James Bottomley wrote:

> No, I was thinking of the two thread scan bug (i.e. two scan threads)
> not one scan and one remove, which is a bug in the old code.  This is a
> race between put and get when the kref is incremented from zero (an
> illegal operation which triggers a warn on).
> 
> The way to mediate this is to check whether the kref is already zero
> before taking the reference, like below.

Yes, that seems reasonable.  Consider now: Having done this, to what
extent do starget->reap_ref and starget->state really need to be
protected by the host_lock?  Maybe only the linked lists require
protection.  (I haven't checked.)

Can you post a single, combined patch incorporating all your proposed
changes?  It's a little hard to review them in pieces...

Alan Stern

P.S.: Would you agree that the phrase "pretty astonishing cockup" did
indeed turn out to be appropriate?  :-)


* Re: [RFC] fix our current target reap infrastructure.
  2013-12-15 22:14                                       ` James Bottomley
       [not found]                                         ` <1387145674.2284.60.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
@ 2013-12-16  2:49                                         ` Alan Stern
  2013-12-16  3:33                                           ` James Bottomley
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Stern @ 2013-12-16  2:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Sun, 15 Dec 2013, James Bottomley wrote:

> No, I was thinking of the two thread scan bug (i.e. two scan threads)
> not one scan and one remove, which is a bug in the old code.

By the way, the existing code doesn't allow two threads to scan a
target at the same time.  They would both have to hold the host's
scan_mutex.

On the other hand, as far as I can see there's nothing to prevent two
threads from removing a device at the same time.  But that wouldn't 
cause any problems.

Alan Stern



* Re: [RFC] fix our current target reap infrastructure.
  2013-12-16  2:44                                           ` Alan Stern
@ 2013-12-16  3:32                                             ` James Bottomley
  0 siblings, 0 replies; 26+ messages in thread
From: James Bottomley @ 2013-12-16  3:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Sun, 2013-12-15 at 21:44 -0500, Alan Stern wrote:
> On Sun, 15 Dec 2013, James Bottomley wrote:
> 
> > No, I was thinking of the two thread scan bug (i.e. two scan threads)
> > not one scan and one remove, which is a bug in the old code.  This is a
> > race between put and get when the kref is incremented from zero (an
> > illegal operation which triggers a warn on).
> > 
> > The way to mediate this is to check whether the kref is already zero
> > before taking the reference, like below.
> 
> Yes, that seems reasonable.  Consider now: Having done this, to what
> extent do starget->reap_ref and starget->state really need to be
> protected by the host_lock?  Maybe only the linked lists require
> protection.  (I haven't checked.)

Yes, I think so, but that can be done as an enhancement patch after the
fact.

> Can you post a single, combined patch incorporating all your proposed
> changes?  It's a little hard to review them in pieces...

Sure, I'll repost what I have.

> Alan Stern
> 
> P.S.: Would you agree that the phrase "pretty astonishing cockup" did
> indeed turn out to be appropriate?  :-)

Objection, m'lud, my learned friend is leading the witness ...

James





* Re: [RFC] fix our current target reap infrastructure.
  2013-12-16  2:49                                         ` Alan Stern
@ 2013-12-16  3:33                                           ` James Bottomley
  0 siblings, 0 replies; 26+ messages in thread
From: James Bottomley @ 2013-12-16  3:33 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Sarah Sharp, Hans de Goede, USB list,
	SCSI development list, USB Storage List, Greg Kroah-Hartman

On Sun, 2013-12-15 at 21:49 -0500, Alan Stern wrote:
> On Sun, 15 Dec 2013, James Bottomley wrote:
> 
> > No, I was thinking of the two thread scan bug (i.e. two scan threads)
> > not one scan and one remove, which is a bug in the old code.
> 
> By the way, the existing code doesn't allow two threads to scan a
> target at the same time.  They would both have to hold the host's
> scan_mutex.

I thought of that, but scan_mutex is dropped too early.  That makes the race
almost untriggerable because the racing thread starts way behind, but
it's not theoretically impossible.

> On the other hand, as far as I can see there's nothing to prevent two
> threads from removing a device at the same time.  But that wouldn't 
> cause any problems.

Right.

Thanks,

James





end of thread, other threads:[~2013-12-16  3:33 UTC | newest]

Thread overview: 26+ messages
     [not found] <20131212000715.GA3181@xanatos>
2013-12-12 13:13 ` UAS hangs khubd on USB disconnect Hans de Goede
2013-12-12 22:04 ` [usb-storage] " Alan Stern
     [not found]   ` <Pine.LNX.4.44L0.1312121632470.849-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
2013-12-13 18:09     ` Sarah Sharp
2013-12-13 18:19       ` Alan Stern
     [not found]         ` <Pine.LNX.4.44L0.1312131316470.1185-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
2013-12-13 18:33           ` Tejun Heo
2013-12-13 19:18             ` James Bottomley
     [not found]               ` <1386962327.2055.54.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
2013-12-13 20:03                 ` James Bottomley
     [not found]                   ` <1386964999.2055.59.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
2013-12-13 20:22                     ` Hans de Goede
2013-12-13 21:06                   ` Alan Stern
2013-12-13 21:18                     ` James Bottomley
     [not found]                       ` <1386969529.2055.79.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
2013-12-14  0:48                         ` Alan Stern
2013-12-14  1:27                           ` James Bottomley
2013-12-14  3:00                             ` Alan Stern
2013-12-14  3:03                             ` [RFC] fix our current target reap infrastructure James Bottomley
2013-12-14  3:32                               ` Alan Stern
2013-12-14 23:55                                 ` James Bottomley
2013-12-15 21:32                                   ` Alan Stern
     [not found]                                     ` <Pine.LNX.4.44L0.1312151550380.32133-100000-pYrvlCTfrz9XsRXLowluHWD2FQJk+8+b@public.gmane.org>
2013-12-15 22:14                                       ` James Bottomley
     [not found]                                         ` <1387145674.2284.60.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
2013-12-16  2:44                                           ` Alan Stern
2013-12-16  3:32                                             ` James Bottomley
2013-12-16  2:49                                         ` Alan Stern
2013-12-16  3:33                                           ` James Bottomley
2013-12-13 21:13                   ` [usb-storage] UAS hangs khubd on USB disconnect Sarah Sharp
2013-12-13 21:24                   ` Hans de Goede
2013-12-13 20:05                 ` Alan Stern
2013-12-13 19:07           ` Sarah Sharp
