linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-04-04  3:57 Kento.A.Kobayashi
  2019-04-04 19:33 ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Kento.A.Kobayashi @ 2019-04-04  3:57 UTC (permalink / raw)
  To: stern
  Cc: oneukum, gregkh, usb-storage, Jacky.Cao, linux-kernel, linux-scsi,
	linux-usb, Kento.A.Kobayashi

Hi,

>> Root Cause
>> - Block layer timeout happens after power off UAS USB device which is accessed as reproduce step. During timeout error handler process, scsi host state becomes SHOST_CANCEL_RECOVERY that causes IO hangs up and lock cannot be released. And in final, usb subsystem hangs up.
>> Follow is function call:
>> blk_mq_timeout_work 
>>   …->scsi_times_out  (… means some functions are not listed before this function.)
>>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>>       … -> scsi_error_handler
>>         …-> uas_eh_device_reset_handler
>>             -> usb_lock_device_for_reset  <- take lock
>>               -> usb_reset_device
>>                 …-> rebind = uas_post_reset (return 1 since ENODEV) 
>>                 …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
>>                    …-> uas_disconnect  (scsi_host_set_state to SHOST_CANCEL_RECOVERY)
>>                         … -> scsi_queue_rq
>
>How does scsi_queue_rq get called here?  As far as I can see, this shouldn't happen.

We confirmed the function call path on linux 4.9 when this problem occured since we are working on it. In linux 4.9, the last function is scsi_request_fn instead of scsi_queue_rq. In staging.git, we think the scsi_queue_rq is called by follow path.
uas_disconnect
|- scsi_remove_host
 |- scsi_forget_host
  |- __scsi_remove_device
   |- device_del
    |- bus_remove_device
     |- device_release_driver
      |- device_release_driver_internal
       |- __device_release_driver
        |- drv->remove(dev) (sd_remove)  
         |- sd_shutdown
          |- sd_sync_cache
           |- scsi_execute
            |- __scsi_execute
             |- blk_execute_rq
              |- blk_execute_rq_nowait
               |- blk_mq_sched_insert_request
                |- blk_mq_run_hw_queue
                 |- __blk_mq_delay_run_hw_queue
                  |- __blk_mq_run_hw_queue
                   |- blk_mq_sched_dispatch_requests
                    |- blk_mq_dispatch_rq_list
                     |- q->mq_ops->queue_rq (scsi_queue_rq)

>> Countermeasure
>> - Make uas_post_reset doesn’t return 1 when ENODEV returns from uas_configure_endpoints since usb_unbind_and_rebind_marded_interfaces doesn’t need to do unbind/rebind operations in this situation.
>> blk_mq_timeout_work
>>   …->scsi_times_out  (… means some functions are not listed before this function.)
>>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>>       … -> scsi_error_handler
>>        …-> uas_eh_device_reset_handler (*1)
>>            -> usb_lock_device_for_reset  <- take lock
>>              -> usb_reset_device
>>                -> usb_reset_and_verify_device (return ENODEV and FAILED will be reported to *1)
>>                -> uas_post_reset returns 0 when ENODEV => rebind=0 
>>                -> usb_unbind_and_rebind_marked_interfaces (rebind=0)
>
>The difference is that uas_disconnect wasn't called here.  But that routine should not cause any problems -- you're always supposed to be able to unbind a driver from a device.  So it looks like this is not the right way to solve the problem.

We confirmed usb_driver_release_interface will call usb_unbind_interface when this problem occurs.
So usb_unbind_interface will call driver disconnect callbak.

Regards,
Kento Kobayashi

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-04-02 14:38 Alan Stern
  0 siblings, 0 replies; 39+ messages in thread
From: Alan Stern @ 2019-04-02 14:38 UTC (permalink / raw)
  To: Kento.A.Kobayashi
  Cc: oneukum, gregkh, usb-storage, Jacky.Cao, linux-kernel, linux-scsi,
	linux-usb

On Tue, 2 Apr 2019 Kento.A.Kobayashi@sony.com wrote:

> Hi,
> 
> >> Hi,
> >> 
> >> > Sorry,
> >> > 
> >> > I thought this was clear. Your patch is making the assumption that the reset is triggered by the SCSI layer. You cannot make that assumption, as there is an ioctl for resetting a USB device.
> >> > In case we are getting an error during the reset (our endpoints vanish), the device driver must report that to the USB layer, so the driver will always be disconnected.
> >> > We cannot drop errors.
> >> > 
> >> > 	Regards
> >> > 		Oliver
> >> 
> >> This patch modified uas_post_reset to skip rebind operation to avoid exception while -ENODEV happens not drop error.
> >> If uas_post_reset happens -ENODEV, usb_reset_and_verify_device must happen error.
> >> So,when we use ioctl(USBDEVFS_RESET) to reset device, if usb_reset_and_verify_device happens error, the error will be reported through ioctl return value. 
> >
> >OK, It is possible that I am stupid. We must rebind if uas_post_reset() fails. The driver will crash without the endpoints. Can you please explain again in greater detail, what you are trying to achieve?
> 
> Follow is details for this patch.
> 
> Issue
> - USB subsystem hangs if power off the hub port connecting UAS USB3.0/3.1 device by calling ioctl(USBDEVFS_CONTROL) to do Hub Class Request(CLEAR_FEATURE:PORT_POWER) while the device is being accessed. 
> - Status of the process that is accessing the device becomes DEAD and cannot be killed.
> 
> Root Cause
> - Block layer timeout happens after power off UAS USB device which is accessed as reproduce step. During timeout error handler process, scsi host state becomes SHOST_CANCEL_RECOVERY that causes IO hangs up and lock cannot be released. And in final, usb subsystem hangs up.
> Follow is function call:
> blk_mq_timeout_work 
>   …->scsi_times_out  (… means some functions are not listed before this function.)
>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>       … -> scsi_error_handler
>         …-> uas_eh_device_reset_handler
>             -> usb_lock_device_for_reset  <- take lock
>               -> usb_reset_device
>                 …-> rebind = uas_post_reset (return 1 since ENODEV) 
>                 …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
>                    …-> uas_disconnect  (scsi_host_set_state to SHOST_CANCEL_RECOVERY)
>                         … -> scsi_queue_rq

How does scsi_queue_rq get called here?  As far as I can see, this 
shouldn't happen.

>                              -> scsi_host_queue_ready(return 0 causes IO hangs up.)
>             -> usb_unlock_device          <- lock cannot be release since usb_reset_device not finish.
> 
> 
> Countermeasure
> - Make uas_post_reset doesn’t return 1 when ENODEV returns from uas_configure_endpoints since usb_unbind_and_rebind_marded_interfaces doesn’t need to do unbind/rebind operations in this situation.
> blk_mq_timeout_work
>   …->scsi_times_out  (… means some functions are not listed before this function.)
>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>       … -> scsi_error_handler
>        …-> uas_eh_device_reset_handler (*1)
>            -> usb_lock_device_for_reset  <- take lock
>              -> usb_reset_device
>                -> usb_reset_and_verify_device (return ENODEV and FAILED will be reported to *1)
>                -> uas_post_reset returns 0 when ENODEV => rebind=0 
>                -> usb_unbind_and_rebind_marked_interfaces (rebind=0)

The difference is that uas_disconnect wasn't called here.  But that
routine should not cause any problems -- you're always supposed to be
able to unbind a driver from a device.  So it looks like this is not
the right way to solve the problem.

Alan Stern

>            -> usb_unlock_device          <- release lock
> 
> 
> We can get error(-ENODEV) at uas_eh_device_reset_handler from usb_reset_and_verify_device.
> 
> Regards,
> Kento Kobayashi
>

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-04-02  0:28 Kento.A.Kobayashi
  0 siblings, 0 replies; 39+ messages in thread
From: Kento.A.Kobayashi @ 2019-04-02  0:28 UTC (permalink / raw)
  To: oneukum, gregkh, stern
  Cc: usb-storage, Jacky.Cao, linux-kernel, linux-scsi, linux-usb,
	Kento.A.Kobayashi

Hi,

>> Hi,
>> 
>> > Sorry,
>> > 
>> > I thought this was clear. Your patch is making the assumption that the reset is triggered by the SCSI layer. You cannot make that assumption, as there is an ioctl for resetting a USB device.
>> > In case we are getting an error during the reset (our endpoints vanish), the device driver must report that to the USB layer, so the driver will always be disconnected.
>> > We cannot drop errors.
>> > 
>> > 	Regards
>> > 		Oliver
>> 
>> This patch modified uas_post_reset to skip rebind operation to avoid exception while -ENODEV happens not drop error.
>> If uas_post_reset happens -ENODEV, usb_reset_and_verify_device must happen error.
>> So,when we use ioctl(USBDEVFS_RESET) to reset device, if usb_reset_and_verify_device happens error, the error will be reported through ioctl return value. 
>
>OK, It is possible that I am stupid. We must rebind if uas_post_reset() fails. The driver will crash without the endpoints. Can you please explain again in greater detail, what you are trying to achieve?

Follow is details for this patch.

Issue
- USB subsystem hangs if power off the hub port connecting UAS USB3.0/3.1 device by calling ioctl(USBDEVFS_CONTROL) to do Hub Class Request(CLEAR_FEATURE:PORT_POWER) while the device is being accessed. 
- Status of the process that is accessing the device becomes DEAD and cannot be killed.

Root Cause
- Block layer timeout happens after power off UAS USB device which is accessed as reproduce step. During timeout error handler process, scsi host state becomes SHOST_CANCEL_RECOVERY that causes IO hangs up and lock cannot be released. And in final, usb subsystem hangs up.
Follow is function call:
blk_mq_timeout_work 
  …->scsi_times_out  (… means some functions are not listed before this function.)
    …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
      … -> scsi_error_handler
        …-> uas_eh_device_reset_handler
            -> usb_lock_device_for_reset  <- take lock
              -> usb_reset_device
                …-> rebind = uas_post_reset (return 1 since ENODEV) 
                …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
                   …-> uas_disconnect  (scsi_host_set_state to SHOST_CANCEL_RECOVERY)
                        … -> scsi_queue_rq
                             -> scsi_host_queue_ready(return 0 causes IO hangs up.)
            -> usb_unlock_device          <- lock cannot be release since usb_reset_device not finish.


Countermeasure
- Make uas_post_reset doesn’t return 1 when ENODEV returns from uas_configure_endpoints since usb_unbind_and_rebind_marded_interfaces doesn’t need to do unbind/rebind operations in this situation.
blk_mq_timeout_work
  …->scsi_times_out  (… means some functions are not listed before this function.)
    …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
      … -> scsi_error_handler
       …-> uas_eh_device_reset_handler (*1)
           -> usb_lock_device_for_reset  <- take lock
             -> usb_reset_device
               -> usb_reset_and_verify_device (return ENODEV and FAILED will be reported to *1)
               -> uas_post_reset returns 0 when ENODEV => rebind=0 
               -> usb_unbind_and_rebind_marked_interfaces (rebind=0)
           -> usb_unlock_device          <- release lock


We can get error(-ENODEV) at uas_eh_device_reset_handler from usb_reset_and_verify_device.

Regards,
Kento Kobayashi

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-29 14:13 Alan Stern
  0 siblings, 0 replies; 39+ messages in thread
From: Alan Stern @ 2019-03-29 14:13 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: gregkh, usb-storage, Jacky.Cao, Kento.A.Kobayashi, linux-kernel,
	linux-scsi, linux-usb

On Thu, 28 Mar 2019, Oliver Neukum wrote:

> On Do, 2019-03-28 at 11:57 -0400, Alan Stern wrote:
> > On Thu, 28 Mar 2019, Oliver Neukum wrote:
> > 
> > > On Do, 2019-03-28 at 07:53 +0000, Kento.A.Kobayashi@sony.com wrote:
> > > > Hi,
> > > > 
> > > > > Sorry,
> > > > > 
> > > > > I thought this was clear. Your patch is making the assumption that the reset is triggered by the SCSI layer. You cannot make that assumption, as there is an ioctl for resetting a USB device.
> > > > > In case we are getting an error during the reset (our endpoints vanish), the device driver must report that to the USB layer, so the driver will always be disconnected.
> > > > > We cannot drop errors.
> > > > > 
> > > > >   Regards
> > > > >           Oliver
> > > > 
> > > > This patch modified uas_post_reset to skip rebind operation to avoid exception while -ENODEV happens not drop error.
> > > > If uas_post_reset happens -ENODEV, usb_reset_and_verify_device must happen error.
> > > > So,when we use ioctl(USBDEVFS_RESET) to reset device, if usb_reset_and_verify_device happens error, the error will be reported through ioctl return value. 
> > > 
> > > OK, It is possible that I am stupid. We must rebind if uas_post_reset()
> > > fails. The driver will crash without the endpoints. Can you please
> > > explain again in greater detail, what you are trying to achieve?
> > 
> > Actually no, the driver doesn't have to do anything if the post-reset 
> > method fails.  usbcore will take care of rebinding automatically.
> 
> Yes, but the rebinding is not optional. Hence, either the post_reset()
> must fail to trigger it, or it will be triggered anyway.
> So if the rebinding hangs the machine, I cannot see how alter
> changing the return value of uas_post_reset() would help.
> 
> It looks to me like the state transitions of SCSI are too
> restrictive.

Maybe I don't understand the problem correctly.  It sounds like the
problem occurs because uas tries to rebind (Where, how, and why does it
do this?  I don't see anything special in uas_post_reset) using the
same scsi_host structure as before instead of creating a new one.  
Trying to reuse a defunct host is definitely not allowed.

But if uas didn't try to do anything special, it would be unbound from
the device, causing the scsi_host to be destroyed.  Then when usbcore
automatically attempted a rebind, uas would create a new scsi_host and
the rebind would have a chance to succeed, without hanging anything.

If this description is wrong, please explain more fully.

Alan Stern

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-28 16:49 Oliver Neukum
  0 siblings, 0 replies; 39+ messages in thread
From: Oliver Neukum @ 2019-03-28 16:49 UTC (permalink / raw)
  To: Alan Stern
  Cc: gregkh, usb-storage, Jacky.Cao, Kento.A.Kobayashi, linux-kernel,
	linux-scsi, linux-usb

On Do, 2019-03-28 at 11:57 -0400, Alan Stern wrote:
> On Thu, 28 Mar 2019, Oliver Neukum wrote:
> 
> > On Do, 2019-03-28 at 07:53 +0000, Kento.A.Kobayashi@sony.com wrote:
> > > Hi,
> > > 
> > > > Sorry,
> > > > 
> > > > I thought this was clear. Your patch is making the assumption that the reset is triggered by the SCSI layer. You cannot make that assumption, as there is an ioctl for resetting a USB device.
> > > > In case we are getting an error during the reset (our endpoints vanish), the device driver must report that to the USB layer, so the driver will always be disconnected.
> > > > We cannot drop errors.
> > > > 
> > > >   Regards
> > > >           Oliver
> > > 
> > > This patch modified uas_post_reset to skip rebind operation to avoid exception while -ENODEV happens not drop error.
> > > If uas_post_reset happens -ENODEV, usb_reset_and_verify_device must happen error.
> > > So,when we use ioctl(USBDEVFS_RESET) to reset device, if usb_reset_and_verify_device happens error, the error will be reported through ioctl return value. 
> > 
> > OK, It is possible that I am stupid. We must rebind if uas_post_reset()
> > fails. The driver will crash without the endpoints. Can you please
> > explain again in greater detail, what you are trying to achieve?
> 
> Actually no, the driver doesn't have to do anything if the post-reset 
> method fails.  usbcore will take care of rebinding automatically.

Yes, but the rebinding is not optional. Hence, either the post_reset()
must fail to trigger it, or it will be triggered anyway.
So if the rebinding hangs the machine, I cannot see how alter
changing the return value of uas_post_reset() would help.

It looks to me like the state transitions of SCSI are too
restrictive.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-28 15:57 Alan Stern
  0 siblings, 0 replies; 39+ messages in thread
From: Alan Stern @ 2019-03-28 15:57 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Kento.A.Kobayashi, gregkh, usb-storage, Jacky.Cao, linux-kernel,
	linux-scsi, linux-usb

On Thu, 28 Mar 2019, Oliver Neukum wrote:

> On Do, 2019-03-28 at 07:53 +0000, Kento.A.Kobayashi@sony.com wrote:
> > Hi,
> > 
> > > Sorry,
> > > 
> > > I thought this was clear. Your patch is making the assumption that the reset is triggered by the SCSI layer. You cannot make that assumption, as there is an ioctl for resetting a USB device.
> > > In case we are getting an error during the reset (our endpoints vanish), the device driver must report that to the USB layer, so the driver will always be disconnected.
> > > We cannot drop errors.
> > > 
> > > 	Regards
> > > 		Oliver
> > 
> > This patch modified uas_post_reset to skip rebind operation to avoid exception while -ENODEV happens not drop error.
> > If uas_post_reset happens -ENODEV, usb_reset_and_verify_device must happen error.
> > So,when we use ioctl(USBDEVFS_RESET) to reset device, if usb_reset_and_verify_device happens error, the error will be reported through ioctl return value. 
> 
> OK, It is possible that I am stupid. We must rebind if uas_post_reset()
> fails. The driver will crash without the endpoints. Can you please
> explain again in greater detail, what you are trying to achieve?

Actually no, the driver doesn't have to do anything if the post-reset 
method fails.  usbcore will take care of rebinding automatically.

Alan Stern

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-28 15:15 Oliver Neukum
  0 siblings, 0 replies; 39+ messages in thread
From: Oliver Neukum @ 2019-03-28 15:15 UTC (permalink / raw)
  To: Kento.A.Kobayashi, gregkh, stern
  Cc: usb-storage, Jacky.Cao, linux-kernel, linux-scsi, linux-usb

On Do, 2019-03-28 at 07:53 +0000, Kento.A.Kobayashi@sony.com wrote:
> Hi,
> 
> > Sorry,
> > 
> > I thought this was clear. Your patch is making the assumption that the reset is triggered by the SCSI layer. You cannot make that assumption, as there is an ioctl for resetting a USB device.
> > In case we are getting an error during the reset (our endpoints vanish), the device driver must report that to the USB layer, so the driver will always be disconnected.
> > We cannot drop errors.
> > 
> > 	Regards
> > 		Oliver
> 
> This patch modified uas_post_reset to skip rebind operation to avoid exception while -ENODEV happens not drop error.
> If uas_post_reset happens -ENODEV, usb_reset_and_verify_device must happen error.
> So,when we use ioctl(USBDEVFS_RESET) to reset device, if usb_reset_and_verify_device happens error, the error will be reported through ioctl return value. 

OK, It is possible that I am stupid. We must rebind if uas_post_reset()
fails. The driver will crash without the endpoints. Can you please
explain again in greater detail, what you are trying to achieve?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-28  7:53 Kento.A.Kobayashi
  0 siblings, 0 replies; 39+ messages in thread
From: Kento.A.Kobayashi @ 2019-03-28  7:53 UTC (permalink / raw)
  To: oneukum, gregkh, stern
  Cc: usb-storage, Jacky.Cao, linux-kernel, linux-scsi, linux-usb,
	Kento.A.Kobayashi

Hi,

>Sorry,
>
>I thought this was clear. Your patch is making the assumption that the reset is triggered by the SCSI layer. You cannot make that assumption, as there is an ioctl for resetting a USB device.
>In case we are getting an error during the reset (our endpoints vanish), the device driver must report that to the USB layer, so the driver will always be disconnected.
>We cannot drop errors.
>
>	Regards
>		Oliver

This patch modified uas_post_reset to skip rebind operation to avoid exception while -ENODEV happens not drop error.
If uas_post_reset happens -ENODEV, usb_reset_and_verify_device must happen error.
So,when we use ioctl(USBDEVFS_RESET) to reset device, if usb_reset_and_verify_device happens error, the error will be reported through ioctl return value. 

Regards,
Kento Kobayashi

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-25 10:34 Oliver Neukum
  0 siblings, 0 replies; 39+ messages in thread
From: Oliver Neukum @ 2019-03-25 10:34 UTC (permalink / raw)
  To: Kento.A.Kobayashi, gregkh, stern
  Cc: usb-storage, Jacky.Cao, linux-kernel, linux-scsi, linux-usb

On Mo, 2019-03-25 at 10:21 +0000, Kento.A.Kobayashi@sony.com wrote:
> Hi,
> 
> > > > And the return value(-ENODEV) will be returned to error handler.
> > > > uas_eh_device_reset_handler->usb_reset_device -> usb_reset_and_verify_device (return -ENODEV) Then I wrote that commit message that we think even if we ignore "ENODEV" in post reset to avoid hang issue but the error will also be reported to error handler.
> > > > #If ignore an error and the error will not be reported then it is not good.
> > > 
> > > Well, what do we do now? Are you saying that the state model SCSI is using is wrong?
> > No, it is not wrong.
> > But I think we may want to prevend hang by sleep peocess with continuing to take a lock at blk_execute_rq() from >usb_unbind_and_rebind_marded_interfaces().
> 
> Could you please you can apply this patch for mainline or not?
> If not, could you please tell me why you can't allow to apply this patch?

Sorry,

I thought this was clear. Your patch is making the assumption that
the reset is triggered by the SCSI layer. You cannot make that
assumption, as there is an ioctl for resetting a USB device.
In case we are getting an error during the reset (our endpoints
vanish), the device driver must report that to the USB layer,
so the driver will always be disconnected.
We cannot drop errors.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-25 10:21 Kento.A.Kobayashi
  0 siblings, 0 replies; 39+ messages in thread
From: Kento.A.Kobayashi @ 2019-03-25 10:21 UTC (permalink / raw)
  To: oneukum, gregkh, stern
  Cc: usb-storage, Jacky.Cao, linux-kernel, linux-scsi, linux-usb

Hi,

>>> And the return value(-ENODEV) will be returned to error handler.
>>> uas_eh_device_reset_handler->usb_reset_device -> usb_reset_and_verify_device (return -ENODEV) Then I wrote that commit message that we think even if we ignore "ENODEV" in post reset to avoid hang issue but the error will also be reported to error handler.
>>> #If ignore an error and the error will not be reported then it is not good.
>>
>>Well, what do we do now? Are you saying that the state model SCSI is using is wrong?

>No, it is not wrong.
>But I think we may want to prevend hang by sleep peocess with continuing to take a lock at blk_execute_rq() from >usb_unbind_and_rebind_marded_interfaces().

Could you please you can apply this patch for mainline or not?
If not, could you please tell me why you can't allow to apply this patch?

Regards,
Kento Kobayashi

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-15  2:28 Kento.A.Kobayashi
  0 siblings, 0 replies; 39+ messages in thread
From: Kento.A.Kobayashi @ 2019-03-15  2:28 UTC (permalink / raw)
  To: oneukum, gregkh, stern
  Cc: usb-storage, Jacky.Cao, linux-kernel, linux-scsi, linux-usb

Hi,

>> And the return value(-ENODEV) will be returned to error handler.
>> uas_eh_device_reset_handler->usb_reset_device -> usb_reset_and_verify_device (return -ENODEV) Then I wrote that commit message that we think even if we ignore "ENODEV" in post reset to avoid hang issue but the error will also be reported to error handler.
>> #If ignore an error and the error will not be reported then it is not good.
>
>Well, what do we do now? Are you saying that the state model SCSI is using is wrong?

No, it is not wrong.
But I think we may want to prevend hang by sleep peocess with continuing to take a lock at blk_execute_rq() from usb_unbind_and_rebind_marded_interfaces().

Regards,
Kento Kobayashi

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-12 15:37 Oliver Neukum
  0 siblings, 0 replies; 39+ messages in thread
From: Oliver Neukum @ 2019-03-12 15:37 UTC (permalink / raw)
  To: Kento.A.Kobayashi, gregkh, stern
  Cc: usb-storage, Jacky.Cao, Kan.Iibuchi, No.Tanaka, linux-kernel,
	linux-scsi, linux-usb

On Mo, 2019-03-11 at 08:36 +0000, Kento.A.Kobayashi@sony.com wrote:
> Hi,
> 
> > no I am sorry, that is an assumption you just cannot make.
> > Anything can trigger a reset. That being SCSI is the common case certainly, but not the only case. And in those cases we cannot depend on upper layers doing the right thing, if we just ignore an error.
> 
> While we investigate this issue, we debugged and found usb_reset_and_verify_device will return -NODEV before enter post_reset operation.

Yes, this can happen.

> And the return value(-ENODEV) will be returned to error handler.
> uas_eh_device_reset_handler->usb_reset_device -> usb_reset_and_verify_device (return -ENODEV) Then I wrote that commit message that we think even if we ignore "ENODEV" in post reset to avoid hang issue but the error will also be reported to error handler.
> #If ignore an error and the error will not be reported then it is not good.

Well, what do we do now? Are you saying that the state model SCSI is
using is wrong?

> Additional information about usb-storage driver(usb/storage/usb.c) in usb_stor_post_reset() function, it always return 0 that means rebind will not be execute and this issue doesn't happen.

I am afraid this is only partially correct. The device descriptors can
still fail to match.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-11  8:36 Kento.A.Kobayashi
  0 siblings, 0 replies; 39+ messages in thread
From: Kento.A.Kobayashi @ 2019-03-11  8:36 UTC (permalink / raw)
  To: oneukum, gregkh, stern
  Cc: usb-storage, linux-kernel, linux-scsi, linux-usb, Jacky.Cao,
	No.Tanaka, Kan.Iibuchi

Hi,

>no I am sorry, that is an assumption you just cannot make.
>Anything can trigger a reset. That being SCSI is the common case certainly, but not the only case. And in those cases we cannot depend on upper layers doing the right thing, if we just ignore an error.
While we investigate this issue, we debugged and found usb_reset_and_verify_device will return -NODEV before enter post_reset operation.
And the return value(-ENODEV) will be returned to error handler.
uas_eh_device_reset_handler->usb_reset_device -> usb_reset_and_verify_device (return -ENODEV) Then I wrote that commit message that we think even if we ignore "ENODEV" in post reset to avoid hang issue but the error will also be reported to error handler.
#If ignore an error and the error will not be reported then it is not good.

Additional information about usb-storage driver(usb/storage/usb.c) in usb_stor_post_reset() function, it always return 0 that means rebind will not be execute and this issue doesn't happen.

Regards,
Kento Kobayashi

-----Original Message-----
From: Oliver Neukum <oneukum@suse.com> 
Sent: Saturday, March 9, 2019 1:52 AM
To: Kobayashi, Kento (Sony) <Kento.A.Kobayashi@sony.com>; gregkh@linuxfoundation.org; stern@rowland.harvard.edu
Cc: usb-storage@lists.one-eyed-alien.net; linux-kernel@vger.kernel.org; linux-scsi@vger.kernel.org; linux-usb@vger.kernel.org
Subject: Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

On Fr, 2019-03-08 at 09:13 +0000, Kento.A.Kobayashi@sony.com wrote:
> The usb_reset_and_verify_device included in usb_reset_device fails 
> with -ENODEV after power off hub port, and the -ENODEV error will be 
> reported to uas_eh_bus_reset_handler and upper layer, so it doesn't 
> need to do rebind if -ENODEV happens.

Hi,

no I am sorry, that is an assumption you just cannot make.
Anything can trigger a reset. That being SCSI is the common case certainly, but not the only case. And in those cases we cannot depend on upper layers doing the right thing, if we just ignore an error.

NACK

	Sorry
		Oliver

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-08 17:33 Alan Stern
  0 siblings, 0 replies; 39+ messages in thread
From: Alan Stern @ 2019-03-08 17:33 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Kento.A.Kobayashi, gregkh, usb-storage, linux-kernel, linux-scsi,
	linux-usb

On Fri, 8 Mar 2019, Oliver Neukum wrote:

> On Fr, 2019-03-08 at 09:13 +0000, Kento.A.Kobayashi@sony.com wrote:
> > The usb_reset_and_verify_device included in usb_reset_device fails
> > with -ENODEV after power off hub port, and the -ENODEV error will
> > be reported to uas_eh_bus_reset_handler and upper layer, so it
> > doesn't need to do rebind if -ENODEV happens.
> 
> Hi,
> 
> no I am sorry, that is an assumption you just cannot make.
> Anything can trigger a reset. That being SCSI is the common
> case certainly, but not the only case. And in those cases
> we cannot depend on upper layers doing the right thing, if
> we just ignore an error.
> 
> NACK
> 
> 	Sorry
> 		Oliver

Note that the reset routines in usb-storage do not make any exceptions
for -ENODEV.

Alan Stern

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-08 16:52 Oliver Neukum
  0 siblings, 0 replies; 39+ messages in thread
From: Oliver Neukum @ 2019-03-08 16:52 UTC (permalink / raw)
  To: Kento.A.Kobayashi, gregkh, stern
  Cc: usb-storage, linux-kernel, linux-scsi, linux-usb

On Fr, 2019-03-08 at 09:13 +0000, Kento.A.Kobayashi@sony.com wrote:
> The usb_reset_and_verify_device included in usb_reset_device fails
> with -ENODEV after power off hub port, and the -ENODEV error will
> be reported to uas_eh_bus_reset_handler and upper layer, so it
> doesn't need to do rebind if -ENODEV happens.

Hi,

no I am sorry, that is an assumption you just cannot make.
Anything can trigger a reset. That being SCSI is the common
case certainly, but not the only case. And in those cases
we cannot depend on upper layers doing the right thing, if
we just ignore an error.

NACK

	Sorry
		Oliver

^ permalink raw reply	[flat|nested] 39+ messages in thread
* usb: uas: fix usb subsystem hang after power off hub port
@ 2019-03-08  9:13 Kento.A.Kobayashi
  0 siblings, 0 replies; 39+ messages in thread
From: Kento.A.Kobayashi @ 2019-03-08  9:13 UTC (permalink / raw)
  To: oneukum, stern, gregkh; +Cc: linux-usb, linux-scsi, usb-storage, linux-kernel

The issue happens with following steps:
Access usb3.0/3.1 device that uses uas driver.
Power off hub port connecting device by ioctl(USBDEVFS_CONTROL).
Wait longer than 30s(scsi layer timeout period is 30s).
Execute commands like lsusb, no response and usb subsytem hangs.

After scsi layer timeout, uas_eh_bus_reset_handler works and enter
usb_reset_device. During reset, current uas_post_reset returns 1 if
uas_configure_endpoints fails with -ENODEV, then it tries to rebind
uas driver. The unbind request cannot complete because program goes
to host_not_ready in scsi_request_fn. As result, it stops at device
reset process and the lock took before reset will not be released.

The usb_reset_and_verify_device included in usb_reset_device fails
with -ENODEV after power off hub port, and the -ENODEV error will
be reported to uas_eh_bus_reset_handler and upper layer, so it
doesn't need to do rebind if -ENODEV happens.

Signed-off-by: Kento Kobayashi <Kento.A.Kobayashi@sony.com>
---
drivers/usb/storage/uas.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 36742e8..24b09fd 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -1116,6 +1116,9 @@ static int uas_post_reset(struct usb_interface *intf)

        scsi_unblock_requests(shost);

+       if (err == -ENODEV)
+               return 0;
+
        return err ? 1 : 0;
 }


^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2019-04-16  2:32 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-04-04  3:57 usb: uas: fix usb subsystem hang after power off hub port Kento.A.Kobayashi
2019-04-04 19:33 ` Alan Stern
2019-04-09  0:28   ` Kento.A.Kobayashi
2019-04-09  0:28     ` [PATCH] " Kento.A.Kobayashi
2019-04-09  1:21     ` Alan Stern
2019-04-09  1:21       ` [PATCH] " Alan Stern
2019-04-09  2:10   ` Martin K. Petersen
2019-04-09  2:10     ` [PATCH] " Martin K. Petersen
2019-04-09 14:44     ` Alan Stern
2019-04-09 14:44       ` [PATCH] " Alan Stern
2019-04-09 15:16       ` Bart Van Assche
2019-04-09 15:16         ` [PATCH] " Bart Van Assche
2019-04-09 16:45         ` Alan Stern
2019-04-09 16:45           ` [PATCH] " Alan Stern
2019-04-15  0:27           ` Kento.A.Kobayashi
2019-04-15  0:27             ` [PATCH] " Kento.A.Kobayashi
2019-04-15 15:18             ` Alan Stern
2019-04-15 15:18               ` [PATCH] " Alan Stern
2019-04-15 15:32               ` Alan Stern
2019-04-15 15:32                 ` [PATCH] " Alan Stern
2019-04-16  2:31                 ` Kento.A.Kobayashi
2019-04-16  2:31                   ` [PATCH] " Kento.A.Kobayashi
2019-04-10  2:11         ` Martin K. Petersen
2019-04-10  2:11           ` [PATCH] " Martin K. Petersen
  -- strict thread matches above, loose matches on Subject: below --
2019-04-02 14:38 Alan Stern
2019-04-02  0:28 Kento.A.Kobayashi
2019-03-29 14:13 Alan Stern
2019-03-28 16:49 Oliver Neukum
2019-03-28 15:57 Alan Stern
2019-03-28 15:15 Oliver Neukum
2019-03-28  7:53 Kento.A.Kobayashi
2019-03-25 10:34 Oliver Neukum
2019-03-25 10:21 Kento.A.Kobayashi
2019-03-15  2:28 Kento.A.Kobayashi
2019-03-12 15:37 Oliver Neukum
2019-03-11  8:36 Kento.A.Kobayashi
2019-03-08 17:33 Alan Stern
2019-03-08 16:52 Oliver Neukum
2019-03-08  9:13 Kento.A.Kobayashi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).