public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Wenchao Hao <haowenchao22@gmail.com>
To: Hannes Reinecke <hare@suse.de>,
	"James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 1/3] scsi: core: Add new helper to iterate all devices of host
Date: Fri, 12 Jul 2024 10:20:40 +0800	[thread overview]
Message-ID: <af8d4040-6940-41a6-a889-fb687adbaf9a@gmail.com> (raw)
In-Reply-To: <cb9b64a1-4c51-481a-ae5a-c20df70360ea@gmail.com>

On 2024/6/13 15:10, Wenchao Hao wrote:
> On 2024/6/13 14:27, Hannes Reinecke wrote:
>> On 6/12/24 17:06, Wenchao Hao wrote:
>>> On 6/12/24 4:33 PM, Hannes Reinecke wrote:
>>>> On 6/5/24 11:17, Wenchao Hao wrote:
>>>>> shost_for_each_device() would skip devices which is in SDEV_CANCEL or
>>>>> SDEV_DEL state, for some scenarios, we donot want to skip these devices,
>>>>> so add a new macro shost_for_each_device_include_deleted() to handle it.
>>>>>
>>>>> Following changes are introduced:
>>>>>
>>>>> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which
>>>>>      determine if skip deleted scsi_device by parameter "skip_deleted".
>>>>> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
>>>>>      is used when calling __scsi_device_get()
>>>>> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with
>>>>>      "skip_deleted" true
>>>>> 4. Add new macro shost_for_each_device_include_deleted() which call
>>>>>      __scsi_iterate_devices() with "skip_deleted" false
>>>>>
>>>>> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
>>>>> ---
>>>>>    drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>>>>>    include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>>>>>    2 files changed, 54 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>>>> index 3e0c0381277a..5913de543d93 100644
>>>>> --- a/drivers/scsi/scsi.c
>>>>> +++ b/drivers/scsi/scsi.c
>>>>> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
>>>>>        return 0;
>>>>>    }
>>>>>    -/**
>>>>> - * scsi_device_get  -  get an additional reference to a scsi_device
>>>>> +/*
>>>>> + * __scsi_device_get  -  get an additional reference to a scsi_device
>>>>>     * @sdev:    device to get a reference to
>>>>> - *
>>>>> - * Description: Gets a reference to the scsi_device and increments the use count
>>>>> - * of the underlying LLDD module.  You must hold host_lock of the
>>>>> - * parent Scsi_Host or already have a reference when calling this.
>>>>> - *
>>>>> - * This will fail if a device is deleted or cancelled, or when the LLD module
>>>>> - * is in the process of being unloaded.
>>>>> + * @skip_deleted: when true, would return failed if device is deleted
>>>>>     */
>>>>> -int scsi_device_get(struct scsi_device *sdev)
>>>>> +static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
>>>>>    {
>>>>> -    if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
>>>>> +    /*
>>>>> +     * if skip_deleted is true and device is in removing, return failed
>>>>> +     */
>>>>> +    if (skip_deleted &&
>>>>> +        (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
>>>>>            goto fail;
>>>>
>>>> Nack.
>>>> SDEV_DEL means the device is about to be deleted, so we _must not_ access it at all.
>>>>
>>>
>>> Sorry I added SDEV_DEL here at hand without understanding what it means.
>>> Actually, just include scsi_device which is in SDEV_CANCEL would fix the
>>> issues I described.
>>>
>>> The issues are because device removing concurrent with error handle.
>>> Normally, error handle would not be triggered when scsi_device is in
>>> SDEV_DEL. Below is my analysis, if it is wrong, please correct me.
>>>
>>> If there are scsi_cmnd remain unfinished when removing scsi_device,
>>> the removing process would waiting for all commands to be finished.
>>> If commands error happened and trigger error handle, the removing
>>> process would be blocked until error handle finished, because
>>> __scsi_remove_device called  del_gendisk() which would wait all
>>> requests to be finished. So now scsi_device is in SDEV_CANCEL.
>>>
>>> If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been
>>> dispatched to this scsi_device, then error handle would never triggered.
>>>
>>> I want to change the new function __scsi_device_get() as following,
>>> please help to review.
>>>
>>> /*
>>>   * __scsi_device_get  -  get an additional reference to a scsi_device
>>>   * @sdev:    device to get a reference to
>>>   * @skip_canceled: when true, would return failed if device is deleted
>>>   */
>>> static int __scsi_device_get(struct scsi_device *sdev, bool skip_canceled)
>>> {
>>>     /*
>>>      * if skip_canceled is true and device is in removing, return failed
>>>      */
>>>     if (sdev->sdev_state == SDEV_DEL ||
>>>         (sdev->sdev_state == SDEV_CANCEL && skip_canceled))
>>>         goto fail;
>>>     if (!try_module_get(sdev->host->hostt->module))
>>>         goto fail;
>>>     if (!get_device(&sdev->sdev_gendev))
>>>         goto fail_put_module;
>>>     return 0;
>>>
>>> fail_put_module:
>>>     module_put(sdev->host->hostt->module);
>>> fail:
>>>     return -ENXIO;
>>> }
>>>
>> I don't think that's required.
>> With your above analysis, wouldn't the problem be solved with:
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 775df00021e4..911fcfa80d69 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1470,6 +1470,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>          if (sdev->sdev_state == SDEV_DEL)
>>                  return;
>>
>> +       scsi_block_when_processing_errors(sdev);
>> +
>>          if (sdev->is_visible) {
>>                  /*
>>                   * If scsi_internal_target_block() is running concurrently,
>>
>> Hmm?
>>
> 
> We can not make sure no scsi_cmnd remain unfinished after scsi_block_when_processing_errors(). For example, there is a
> command has been dispatched but it's not timeouted when removing
> device, no error happened. After scsi_device is set to SDEV_CANCEL,
> the removing process would be blocked by del_gendisk() because there
> is still a request.
> 
> Then the request timeout and abort failed, error handle would be triggered, now scsi_device is SDEV_CANCEL.
> 
> The error handle would skip this scsi_device when doing device reset.
> 
>> Cheers,
>>
>> Hannes
> 
Hi Hannes,

Would you review these patches? Or do how do you suggest to fix the issues?

thanks.

  reply	other threads:[~2024-07-12  2:20 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05  9:17 [PATCH v5 0/3] SCSI: Fix issues between removing device and error handle Wenchao Hao
2024-06-05  9:17 ` [PATCH v5 1/3] scsi: core: Add new helper to iterate all devices of host Wenchao Hao
2024-06-12  8:33   ` Hannes Reinecke
2024-06-12 15:06     ` Wenchao Hao
2024-06-13  6:27       ` Hannes Reinecke
2024-06-13  7:10         ` Wenchao Hao
2024-07-12  2:20           ` Wenchao Hao [this message]
2024-06-05  9:17 ` [PATCH v5 2/3] scsi: scsi_error: Fix wrong statistic when print error info Wenchao Hao
2024-06-12  8:34   ` Hannes Reinecke
2024-06-12 15:12     ` Wenchao Hao
2024-06-05  9:17 ` [PATCH v5 3/3] scsi: scsi_error: Fix device reset is not triggered Wenchao Hao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af8d4040-6940-41a6-a889-fb687adbaf9a@gmail.com \
    --to=haowenchao22@gmail.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=hare@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox