From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Mike Christie <michael.christie@oracle.com>
Cc: <mlombard@redhat.com>, <martin.petersen@oracle.com>,
<linux-scsi@vger.kernel.org>, <target-devel@vger.kernel.org>
Subject: Re: [PATCH 5/7] scsi: target: iscsit/isert: stop/wait on cmds during conn close
Date: Fri, 9 Dec 2022 15:32:02 +0300 [thread overview]
Message-ID: <20221209123202.GD15327@yadro.com> (raw)
In-Reply-To: <20221208031002.106700-6-michael.christie@oracle.com>
On Wed, Dec 07, 2022 at 09:10:00PM -0600, Mike Christie wrote:
>
> This fixes 2 bugs added in:
>
> commit f36199355c64 ("scsi: target: iscsi: Fix cmd abort fabric stop
> race")
>
> If we have multiple sessions to the same se_device we can hit a race where
> a LUN_RESET on one session cleans up the se_cmds from under another
> session which is being closed. This results in the closing session freeing
> its conn/session structs while they are still in use.
>
> The bug is:
>
> 1. Session1 has IO se_cmd1.
> 2. Session2 can also have se_cmds for IO and optionally TMRs for ABORTS
> but then gets a LUN_RESET.
> 3. The LUN_RESET on session2 sees the se_cmds on session1 and during
> the drain stages marks them all with CMD_T_ABORTED.
> 4. session1 is now closed so iscsit_release_commands_from_conn only sees
> se_cmds with the CMD_T_ABORTED bit set and returns immediately even
> though we have outstanding commands.
> 5. session1's connection and session are freed.
> 6. The backend request for se_cmd1 completes and it accesses the freed
> connection/session.
>
> If session1 was executing only IO se_cmds and TAS is set on the se_cmd,
> then we need to do a iscsit_free_cmd on those commands, so we wait on
> their completion from LIO core and the backend.
>
> If session1 was waiting on tmr se_cmds or TAS is not set then we need to
> wait for those outstanding se_cmds to have their last put done so we
> know no user is still accessing them when we free the session/conn.
>
> This fixes the TAS set case, by adding a check so if we hit it we now call
> iscsit_free_cmd. To handle the tmr se_cd and non TAS case, it hooks the
> iscsit layer into the cmd counter code, so we can wait for all outstanding
> commands before freeing the connection and possibly the session.
>
> Fixes: f36199355c64 ("scsi: target: iscsi: Fix cmd abort fabric stop race")
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
> drivers/infiniband/ulp/isert/ib_isert.c | 13 +------------
> drivers/target/iscsi/iscsi_target.c | 13 ++++++++++++-
> drivers/target/target_core_transport.c | 6 ++++--
> include/target/target_core_fabric.h | 2 ++
> 4 files changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index b360a1527cd1..600059d8a3a7 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -2501,17 +2501,6 @@ isert_wait4logout(struct isert_conn *isert_conn)
> }
> }
>
> -static void
> -isert_wait4cmds(struct iscsit_conn *conn)
> -{
> - isert_info("iscsit_conn %p\n", conn);
> -
> - if (conn->sess) {
> - target_stop_session(conn->sess->se_sess);
> - target_wait_for_sess_cmds(conn->sess->se_sess);
> - }
> -}
> -
> /**
> * isert_put_unsol_pending_cmds() - Drop commands waiting for
> * unsolicitate dataout
> @@ -2559,7 +2548,7 @@ static void isert_wait_conn(struct iscsit_conn *conn)
>
> ib_drain_qp(isert_conn->qp);
> isert_put_unsol_pending_cmds(conn);
> - isert_wait4cmds(conn);
> + target_wait_for_cmds(conn->cmd_cnt);
> isert_wait4logout(isert_conn);
>
> queue_work(isert_release_wq, &isert_conn->release_work);
> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index 7a8ffdf33bee..1c3470e4b50c 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -4221,7 +4221,8 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
>
> if (se_cmd->se_tfo != NULL) {
> spin_lock_irq(&se_cmd->t_state_lock);
> - if (se_cmd->transport_state & CMD_T_ABORTED) {
> + if (se_cmd->transport_state & CMD_T_ABORTED &&
> + !(se_cmd->transport_state & CMD_T_TAS)) {
> /*
> * LIO's abort path owns the cleanup for this,
> * so put it back on the list and let
Could you please extract ths snippet (fix of the hanged commands with
TAS) to a separate patch? It looks good.
> @@ -4244,6 +4245,14 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
> iscsit_free_cmd(cmd, true);
>
> }
> +
> + /*
> + * Wait on commands that were cleaned up via the aborted_task path.
> + * LLDs that implement iscsit_wait_conn will already have waited for
> + * commands.
> + */
> + if (!conn->conn_transport->iscsit_wait_conn)
> + target_wait_for_cmds(conn->cmd_cnt);
> }
>
> static void iscsit_stop_timers_for_cmds(
> @@ -4304,6 +4313,8 @@ int iscsit_close_connection(
> iscsit_stop_nopin_response_timer(conn);
> iscsit_stop_nopin_timer(conn);
>
> + target_stop_cmd_counter(conn->cmd_cnt);
> +
> if (conn->conn_transport->iscsit_wait_conn)
> conn->conn_transport->iscsit_wait_conn(conn);
I strongly believe that waiting for commands complete before decreasing
the command refcounter is useless and leads to hangings.
There was a several tries to wait for the commands complete in the
session. But all of them were eventually reverted due to iSER [1].
[1] https://lore.kernel.org/all/CH2PR12MB4005D671F3D274C4D5FA0BAEDD1C0@CH2PR12MB4005.namprd12.prod.outlook.com/
Let's try it one more time - move conn->conn_transport->iscsit_wait_conn(conn)
to the end of iscsit_release_commands_from_conn() to align iser with other
iscsi transports.
Probably, to have target_wait_for_cmds as a default .iscsit_wait_conn
implementation would be the best way.
>
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 90e3b1aef1f1..8bbf0c834b74 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -3174,13 +3174,14 @@ static void target_stop_cmd_counter_confirm(struct percpu_ref *ref)
> * target_stop_cmd_counter - Stop new IO from being added to the counter.
> * @cmd_cnt: counter to stop
> */
> -static void target_stop_cmd_counter(struct target_cmd_counter *cmd_cnt)
> +void target_stop_cmd_counter(struct target_cmd_counter *cmd_cnt)
> {
> pr_debug("Stopping command counter.\n");
> if (!atomic_cmpxchg(&cmd_cnt->stopped, 0, 1))
> percpu_ref_kill_and_confirm(&cmd_cnt->refcnt,
> target_stop_cmd_counter_confirm);
> }
> +EXPORT_SYMBOL_GPL(target_stop_cmd_counter);
>
> /**
> * target_stop_session - Stop new IO from being queued on the session.
> @@ -3196,7 +3197,7 @@ EXPORT_SYMBOL(target_stop_session);
> * target_wait_for_cmds - Wait for outstanding cmds.
> * @cmd_cnt: counter to wait for active I/O for.
> */
> -static void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt)
> +void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt)
> {
> int ret;
>
> @@ -3212,6 +3213,7 @@ static void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt)
> wait_for_completion(&cmd_cnt->stop_done);
> pr_debug("Waiting for cmds done.\n");
> }
> +EXPORT_SYMBOL_GPL(target_wait_for_cmds);
>
> /**
> * target_wait_for_sess_cmds - Wait for outstanding commands
> diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
> index 4cbfb532a431..b188b1e90e1e 100644
> --- a/include/target/target_core_fabric.h
> +++ b/include/target/target_core_fabric.h
> @@ -133,6 +133,8 @@ struct se_session *target_setup_session(struct se_portal_group *,
> struct se_session *, void *));
> void target_remove_session(struct se_session *);
>
> +void target_stop_cmd_counter(struct target_cmd_counter *cmd_cnt);
> +void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt);
> struct target_cmd_counter *target_alloc_cmd_counter(void);
> void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt);
>
> --
> 2.25.1
>
>
next prev parent reply other threads:[~2022-12-09 12:32 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-08 3:09 [PATCH 0/7] scsi target task management fixes Mike Christie
2022-12-08 3:09 ` [PATCH 1/7] scsi: target: Move sess cmd counter to new struct Mike Christie
2022-12-08 3:09 ` [PATCH 2/7] scsi: target: Move cmd counter allocation Mike Christie
2022-12-08 3:09 ` [PATCH 3/7] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
2022-12-08 3:09 ` [PATCH 4/7] scsi: target: iscsit: Alloc per conn cmd counter Mike Christie
2022-12-08 3:10 ` [PATCH 5/7] scsi: target: iscsit/isert: stop/wait on cmds during conn close Mike Christie
2022-12-09 12:32 ` Dmitry Bogdanov [this message]
2022-12-10 18:48 ` Mike Christie
2022-12-11 1:38 ` Mike Christie
2023-01-12 3:11 ` Mike Christie
2022-12-11 1:20 ` Mike Christie
2022-12-08 3:10 ` [PATCH 6/7] scsi: target: drop tas arg from __transport_wait_for_tasks Mike Christie
2022-12-08 9:24 ` Dmitry Bogdanov
2022-12-08 3:10 ` [PATCH 7/7] scsi: target: Fix multiple LUN_RESET handling Mike Christie
2022-12-08 9:21 ` Dmitry Bogdanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221209123202.GD15327@yadro.com \
--to=d.bogdanov@yadro.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=michael.christie@oracle.com \
--cc=mlombard@redhat.com \
--cc=target-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox