From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Mike Christie <michael.christie@oracle.com>
Cc: <mlombard@redhat.com>, <martin.petersen@oracle.com>,
<mgurtovoy@nvidia.com>, <sagi@grimberg.me>,
<linux-scsi@vger.kernel.org>, <target-devel@vger.kernel.org>
Subject: Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
Date: Wed, 15 Mar 2023 19:13:01 +0300 [thread overview]
Message-ID: <20230315161301.GE1031@yadro.com> (raw)
In-Reply-To: <20230309223312.94595-14-michael.christie@oracle.com>
On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
> «Внимание! Данное письмо от внешнего адресата!»
>
> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
> up running commands when it hasn't. The bug was added in:
>
> commit 51ec502a3266 ("target: Delete tmr from list before processing")
>
> The problem occurs when:
>
> 1. We have N IO cmds running in the target layer spread over 2 sessions.
> 2. The initiator sends a LUN_RESET for each session.
> 3. session1's LUN_RESET loops over all the running commands from both
> sessions and moves them to its local drain_task_list.
> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
> the commit above has it remove itself. session2 also does not see any
> commands since the other reset moved them off the state lists.
> 5. sessions2's LUN_RESET will then complete with a successful response.
> 6. sessions2's inititor believes the running commands on its session are
> now cleaned up due to the successful response and cleans up the running
> commands from its side. It then restarts them.
> 7. The commands do eventually complete on the backend and the target
> starts to return aborted task statuses for them. The initiator will
> either throw a invalid ITT error or might accidentally lookup a new task
> if the ITT has been reallocated already.
>
> This fixes the bug by reverting the patch, and also serializes the
> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
> because it turns out the commit accidentally fixed a bug where if there
> are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
> put the other one on their local drain list, then end up waiting on each
> other resulting in a deadlock.
If LUN_RESET is not in TMR list anymore there is no need to serialize
core_tmr_drain_tmr_list.
>
> Fixes: 51ec502a3266 ("target: Delete tmr from list before processing")
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
> drivers/target/target_core_device.c | 15 ++++++--
> drivers/target/target_core_tmr.c | 15 ++++----
> drivers/target/target_core_transport.c | 50 ++++++++++++++++++++++++--
> include/target/target_core_base.h | 5 ++-
> 4 files changed, 74 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
> index f6e58410ec3f..c9f75ed1566b 100644
> --- a/drivers/target/target_core_device.c
> +++ b/drivers/target/target_core_device.c
> @@ -179,7 +179,16 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd)
> se_tmr->tmr_dev = rcu_dereference_raw(se_lun->lun_se_dev);
>
> spin_lock_irqsave(&se_tmr->tmr_dev->se_tmr_lock, flags);
> - list_add_tail(&se_tmr->tmr_list, &se_tmr->tmr_dev->dev_tmr_list);
> + switch (se_tmr->function) {
> + case TMR_ABORT_TASK:
> + list_add_tail(&se_tmr->tmr_list,
> + &se_tmr->tmr_dev->generic_tmr_list);
> + break;
> + case TMR_LUN_RESET:
> + list_add_tail(&se_tmr->tmr_list,
> + &se_tmr->tmr_dev->lun_reset_tmr_list);
> + break;
> + }
> spin_unlock_irqrestore(&se_tmr->tmr_dev->se_tmr_lock, flags);
>
> return 0;
> @@ -761,7 +770,8 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
> dev->hba_index = hba->hba_index;
>
> INIT_LIST_HEAD(&dev->dev_sep_list);
> - INIT_LIST_HEAD(&dev->dev_tmr_list);
> + INIT_LIST_HEAD(&dev->generic_tmr_list);
> + INIT_LIST_HEAD(&dev->lun_reset_tmr_list);
> INIT_LIST_HEAD(&dev->delayed_cmd_list);
> INIT_LIST_HEAD(&dev->qf_cmd_list);
> spin_lock_init(&dev->delayed_cmd_lock);
> @@ -782,6 +792,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
> spin_lock_init(&dev->t10_alua.lba_map_lock);
>
> INIT_WORK(&dev->delayed_cmd_work, target_do_delayed_work);
> + mutex_init(&dev->lun_reset_mutex);
>
> dev->t10_wwn.t10_dev = dev;
> /*
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 2b95b4550a63..88d2a7839876 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -184,13 +184,11 @@ static void core_tmr_drain_tmr_list(
> unsigned long flags;
> bool rc;
> /*
> - * Release all pending and outgoing TMRs aside from the received
> - * LUN_RESET tmr..
> + * Release all pending and outgoing TMRs except for LUN_RESETS.
> */
> spin_lock_irqsave(&dev->se_tmr_lock, flags);
> - if (tmr)
> - list_del_init(&tmr->tmr_list);
> - list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
> + list_for_each_entry_safe(tmr_p, tmr_pp, &dev->generic_tmr_list,
> + tmr_list) {
> cmd = tmr_p->task_cmd;
> if (!cmd) {
> pr_err("Unable to locate struct se_cmd for TMR\n");
> @@ -379,14 +377,19 @@ int core_tmr_lun_reset(
> tmr_nacl->initiatorname);
> }
> }
> +
> + /* Serialize LUN RESET TMRs and preempt and aborts */
> + mutex_lock(&dev->lun_reset_mutex);
> +
> pr_debug("LUN_RESET: %s starting for [%s], tas: %d\n",
> (preempt_and_abort_list) ? "Preempt" : "TMR",
> dev->transport->name, tas);
> -
> core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
> core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
> preempt_and_abort_list);
>
> + mutex_unlock(&dev->lun_reset_mutex);
> +
> /*
> * Clear any legacy SPC-2 reservation when called during
> * LOGICAL UNIT RESET
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 1c23079a5d7f..3c732b1b5389 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -3574,6 +3574,7 @@ static void target_tmr_work(struct work_struct *work)
> struct se_cmd *cmd = container_of(work, struct se_cmd, work);
> struct se_device *dev = cmd->se_dev;
> struct se_tmr_req *tmr = cmd->se_tmr_req;
> + bool sched_reset = false;
> int ret;
>
> if (cmd->transport_state & CMD_T_ABORTED)
> @@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
> target_dev_ua_allocate(dev, 0x29,
> ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
> }
> +
> + /*
> + * If this is the last reset the device can be freed after we
> + * run transport_cmd_check_stop_to_fabric. Figure out if there
> + * are other resets that need to be scheduled while we know we
> + * have a refcount on the device.
> + */
> + spin_lock_irq(&dev->se_tmr_lock);
tmr->tmr_list is removed from the list in the very end of se_cmd lifecycle
so any number of LUN_RESETs can be in lun_reset_tmr_list. And all of them
can be finished but not yet removed from the list.
You may delete lun_reset here with nulling tmr->tmr_dev:
+ list_del_init(&cmd->se_tmr_req->tmr_list);
+ cmd->se_tmr_req->tmr_dev = NULL;
Then the check below will be just
+ if (!list_empty(dev->lun_reset_tmr_list))
> + if (list_first_entry(&dev->lun_reset_tmr_list,
> + struct se_tmr_req, tmr_list) !=
> + list_last_entry(&dev->lun_reset_tmr_list,
> + struct se_tmr_req, tmr_list))
> + sched_reset = true;
> + else
> + dev->dev_flags &= ~DF_RESETTING_LUN;
> + spin_unlock_irq(&dev->se_tmr_lock);
> break;
> case TMR_TARGET_WARM_RESET:
> tmr->response = TMR_FUNCTION_REJECTED;
> @@ -3617,15 +3634,26 @@ static void target_tmr_work(struct work_struct *work)
>
> transport_lun_remove_cmd(cmd);
> transport_cmd_check_stop_to_fabric(cmd);
> +
> + if (!sched_reset)
> + return;
> +
> + spin_lock_irq(&dev->se_tmr_lock);
> + tmr = list_first_entry(&dev->lun_reset_tmr_list, struct se_tmr_req,
> + tmr_list);
And this list_first_entry will return the next LUN_RESET as you
expected.
> + spin_unlock_irq(&dev->se_tmr_lock);
> +
> + INIT_WORK(&tmr->task_cmd->work, target_tmr_work);
> + schedule_work(&tmr->task_cmd->work);
> return;
>
> aborted:
> target_handle_abort(cmd);
> }
>
> -int transport_generic_handle_tmr(
> - struct se_cmd *cmd)
> +int transport_generic_handle_tmr(struct se_cmd *cmd)
> {
> + struct se_device *dev = cmd->se_dev;
> unsigned long flags;
> bool aborted = false;
>
> @@ -3646,8 +3674,26 @@ int transport_generic_handle_tmr(
> return 0;
> }
>
> + spin_lock_irqsave(&dev->se_tmr_lock, flags);
> + if (cmd->se_tmr_req->function == TMR_LUN_RESET) {
> + /*
> + * We only allow one reset to execute at a time to prevent
> + * one reset waiting on another, and to make sure one reset
> + * does not claim all the cmds causing the other reset to
> + * return early.
> + */
> + if (dev->dev_flags & DF_RESETTING_LUN) {
> + spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> + goto done;
> + }
> +
> + dev->dev_flags |= DF_RESETTING_LUN;
Not good choise of flag variable. It is used at configuration time and
not under a lock. Configfs file dev/alias can be changed in any time
and could race with LUN_RESET.
> + }
> + spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> +
> INIT_WORK(&cmd->work, target_tmr_work);
> schedule_work(&cmd->work);
> +done:
> return 0;
> }
> EXPORT_SYMBOL(transport_generic_handle_tmr);
> diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
> index bd299790e99c..0a5b51f8e5e8 100644
> --- a/include/target/target_core_base.h
> +++ b/include/target/target_core_base.h
> @@ -804,6 +804,7 @@ struct se_device {
> #define DF_USING_UDEV_PATH 0x00000008
> #define DF_USING_ALIAS 0x00000010
> #define DF_READ_ONLY 0x00000020
> +#define DF_RESETTING_LUN 0x00000040
> u8 transport_flags;
> /* Physical device queue depth */
> u32 queue_depth;
> @@ -840,7 +841,8 @@ struct se_device {
> /* Used for SPC-3 Persistent Reservations */
> struct t10_pr_registration *dev_pr_res_holder;
> struct list_head dev_sep_list;
> - struct list_head dev_tmr_list;
> + struct list_head generic_tmr_list;
> + struct list_head lun_reset_tmr_list;
> struct work_struct qf_work_queue;
> struct work_struct delayed_cmd_work;
> struct list_head delayed_cmd_list;
> @@ -872,6 +874,7 @@ struct se_device {
> struct rcu_head rcu_head;
> int queue_cnt;
> struct se_device_queue *queues;
> + struct mutex lun_reset_mutex;
> };
>
> struct target_opcode_descriptor {
> --
> 2.31.1
>
>
next prev parent reply other threads:[~2023-03-15 16:14 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
2023-03-09 22:32 ` [PATCH 01/18] scsi: target: Move sess cmd counter to new struct Mike Christie
2023-03-09 22:32 ` [PATCH 02/18] scsi: target: Move cmd counter allocation Mike Christie
2023-03-09 22:32 ` [PATCH 03/18] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
2023-03-09 22:32 ` [PATCH 04/18] scsi: target: iscsit/isert: Alloc per conn cmd counter Mike Christie
2023-03-09 22:32 ` [PATCH 05/18] scsi: target: iscsit: stop/wait on cmds during conn close Mike Christie
2023-03-09 22:33 ` [PATCH 06/18] scsi: target: Drop t_state_lock use in compare_and_write_post Mike Christie
2023-03-09 22:33 ` [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP Mike Christie
2023-03-15 10:47 ` Dmitry Bogdanov
2023-03-15 22:54 ` Mike Christie
2023-03-16 0:01 ` michael.christie
2023-03-09 22:33 ` [PATCH 08/18] scsi: target: iscsit: Add helper to check when cmd has failed Mike Christie
2023-03-09 22:33 ` [PATCH 09/18] scsi: target: iscsit: Cleanup isert commands at conn closure Mike Christie
2023-03-09 22:33 ` [PATCH 10/18] IB/isert: Fix hang in target_wait_for_cmds Mike Christie
2023-03-09 22:33 ` [PATCH 11/18] IB/isert: Fix use after free during conn cleanup Mike Christie
2023-03-15 15:21 ` Sagi Grimberg
2023-03-09 22:33 ` [PATCH 12/18] scsi: target: iscsit: free cmds before session free Mike Christie
2023-03-09 22:33 ` [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling Mike Christie
2023-03-15 16:13 ` Dmitry Bogdanov [this message]
2023-03-15 16:44 ` Mike Christie
2023-03-15 19:11 ` Dmitry Bogdanov
2023-03-15 21:42 ` Mike Christie
2023-03-16 10:39 ` Dmitry Bogdanov
2023-03-16 16:03 ` Mike Christie
2023-03-16 16:07 ` Mike Christie
2023-03-09 22:33 ` [PATCH 14/18] scsi: target: Don't set CMD_T_FABRIC_STOP for aborted tasks Mike Christie
2023-03-09 22:33 ` [PATCH 15/18] scsi: target: iscsit: Fix TAS handling during conn cleanup Mike Christie
2023-03-09 22:33 ` [PATCH 16/18] scsi: target: drop tas arg from __transport_wait_for_tasks Mike Christie
2023-03-09 22:33 ` [PATCH 17/18] scsi: target: Remove sess_cmd_lock Mike Christie
2023-03-09 22:33 ` [PATCH 18/18] scsi: target: Move tag pr_debug to before we do a put on the cmd Mike Christie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230315161301.GE1031@yadro.com \
--to=d.bogdanov@yadro.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mgurtovoy@nvidia.com \
--cc=michael.christie@oracle.com \
--cc=mlombard@redhat.com \
--cc=sagi@grimberg.me \
--cc=target-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.