Re: iSCSI Abort Task and WRITE PENDING

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Konstantin Shelekhin <k.shelekhin@yadro.com>
To: Mike Christie <michael.christie@oracle.com>
Cc: <target-devel@vger.kernel.org>, <linux@yadro.com>,
	Maurizio Lombardi <mlombard@redhat.com>
Subject: Re: iSCSI Abort Task and WRITE PENDING
Date: Wed, 13 Oct 2021 21:58:32 +0300	[thread overview]
Message-ID: <YWcsWKIn5RyN+UbD@yadro.com> (raw)
In-Reply-To: <6059cfab-5cbb-984b-7efc-eb193bddea7a@oracle.com>

On Wed, Oct 13, 2021 at 01:30:32PM -0500, Mike Christie wrote:
> On 10/13/21 1:24 PM, Mike Christie wrote:
> > On 10/13/21 1:08 PM, Konstantin Shelekhin wrote:
> >> On Wed, Oct 13, 2021 at 12:51:32PM -0500, Mike Christie wrote:
> >>> On 10/13/21 8:21 AM, Konstantin Shelekhin wrote:
> >>>> Hi,
> >>>>
> >>>> I really need the collective wisdom.
> >>>>
> >>>> Not long ago we've uncovered the problem with iSCSI and ABORT TASK
> >>>> handling. Currently it's not possible to abort a WRITE_10 command in
> >>>> TRANSPORT_WRITE_PENDING state, because ABORT TASK  will hang itself in
> >>>> the process:
> >>>>
> >>>>   # dmesg | tail -2
> >>>>   [   83.563505] ABORT_TASK: Found referenced iSCSI task_tag: 3372979269
> >>>>   [   84.593545] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus <nexus>
> >>>>
> >>>>   # ps aux | awk '$8 ~/D/'
> >>>>   root        32  0.0  0.0      0     0 ?        D    15:19   0:00 [kworker/0:1+events]
> >>>>   root      1187  0.0  0.0      0     0 ?        D    15:20   0:00 [iscsi_ttx]
> >>>>
> >>>>   # cat /proc/32/stack
> >>>>   [<0>] target_put_cmd_and_wait+0x68/0xa0
> >>>>   [<0>] core_tmr_abort_task.cold+0x16b/0x192
> >>>>   [<0>] target_tmr_work+0x9e/0xe0
> >>>>   [<0>] process_one_work+0x1d4/0x370
> >>>>   [<0>] worker_thread+0x48/0x3d0
> >>>>   [<0>] kthread+0x122/0x140
> >>>>   [<0>] ret_from_fork+0x22/0x30
> >>>>
> >>>>   # cat /proc/1187/stack
> >>>>   [<0>] __transport_wait_for_tasks+0xaf/0x100
> >>>>   [<0>] transport_generic_free_cmd+0xe9/0x180
> >>>>   [<0>] iscsit_free_cmd+0x50/0xb0
> >>>>   [<0>] iscsit_close_connection+0x47d/0x8c0
> >>>>   [<0>] iscsit_take_action_for_connection_exit+0x6f/0xf0
> >>>>   [<0>] iscsi_target_tx_thread+0x184/0x200
> >>>>   [<0>] kthread+0x122/0x140
> >>>>   [<0>] ret_from_fork+0x22/0x30
> >>>>
> >>>> What happens:
> >>>>
> >>>>   1. Initiator sends WRITE_10 CDB
> >>>>   2. Target parses the CDB and sends R2T
> >>>>   3. Target starts the Data-Out timer
> >>>>   4. Initiator sends ABORT TASK; no new data from Initiator after this
> >>>>   5. Target starts aborting WRITE_10, gets into core_tmr_abort_task()
> >>>>      and starts waiting for the request completion
> >>>>   6. Nothing happens
> >>>>   7. The Data-Out timers expires, connection teardown starts and gets
> >>>>      stuck waiting for ABORT TASK that waits for WRITE_10
> >>>>
> >>>> The ABORT TASK processing looks roughly like this:
> >>>>
> >>>>   iscsi_rx_opcode
> >>>>     iscsi_handle_task_mgt_cmd
> >>>>       iscsi_tmr_abort_task
> >>>>       transport_generic_handle_tmr
> >>>>         if (tmr_cmd->transport_state & CMD_T_ABORTED)
> >>>>           target_handle_abort
> >>>>         else
> >>>>           target_tmr_work
> >>>>             if (tmr_cmd->transport_state & CMD_T_ABORTED)
> >>>>               target_handle_abort
> >>>>             else
> >>>>               core_tmr_abort_task
> >>>>                 ret = __target_check_io_state
> >>>>                   if (write_cmd->transport_state & CMD_T_STOP)
> >>>>                     return -1
> >>>>                   write_cmd->transport_state |= CMD_T_ABORTED
> >>>>                   return 0
> >>>>                 if (!ret)
> >>>>                   list_move_tail(&write_cmd->state_list, &aborted)
> >>>>                   target_put_cmd_and_wait(&write_cmd)
> >>>>
> >>>> As I see it, the main problem is that the abort path can't initiate the
> >>>> command termination, it simply waits for the request to handle this on
> >>>> the execution path like in target_execute_cmd():
> >>>>
> >>>>   target_execute_cmd
> >>>>     target_cmd_interrupted
> >>>>       INIT_WORK(&cmd->work, target_abort_work)
> >>>>
> >>>> However, in this case the request is not going to be executed because
> >>>> Initiator will not send the Data-Out buffer.
> >>>>
> >>>> I have a couple of ideas on how to fix this, but they all look kinda
> >>>> ugly. The one that currently works around this for me:
> >>>>
> >>>>   core_tmr_abort_task():
> >>>>
> >>>>     [...]
> >>>>
> >>>>     spin_lock_irqsave(&se_cmd->t_state_lock, flags);
> >>>>     write_pending = se_cmd->t_state == TRANSPORT_WRITE_PENDING;
> >>>>     spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> >>>>     
> >>>>     if (write_pending && se_cmd->se_tfo->abort_write_pending)
> >>>>             se_cmd->se_tfo->abort_write_pending(se_cmd);
> >>>>     
> >>>>     target_put_cmd_and_wait(se_cmd);
> >>>>
> >>>>     [...]
> >>>>
> >>>> The new method abort_write_pending() is defined only for iSCSI and calls
> >>>> target_handle_abort(). However, this opens up another can of worms
> >>>> because this code heavily races with R2T sending and requires a couple
> >>>> of checks to "work most of the time". Not ideal, by far.
> >>>>
> >>>> I can make this one better by introducing R2T list draining that ensures
> >>>> the proper order during cleanup, but maybe there is a much easier way
> >>>> that I'm not seeing here.
> >>>
> >>> Ccing Maurizio to make sure I don't add his original bug back.
> >>>
> >>> If I understand you, I think I added this bug in:
> >>>
> >>> commit f36199355c64a39fe82cfddc7623d827c7e050da
> >>> Author: Mike Christie <michael.christie@oracle.com>
> >>> Date:   Fri Nov 13 19:46:18 2020 -0600
> >>>
> >>>     scsi: target: iscsi: Fix cmd abort fabric stop race
> >>>
> >>> With that patch if the abort or a lun reset has got to lio core then we
> >>> are going to be stuck waiting for the data which won't come because we
> >>> killed the iscsi threads.
> >>>
> >>> Can go back to always having the iscsi target clean up the cmd, but if
> >>> LIO has started to abort the cmd we take an extra ref so we don't free
> >>> the cmd from under each other.
> >>>
> >>> This patch is completely untested:
> >>>
> >>>
> >>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> >>> index 2c54c5d8412d..d221e9be7468 100644
> >>> --- a/drivers/target/iscsi/iscsi_target.c
> >>> +++ b/drivers/target/iscsi/iscsi_target.c
> >>> @@ -4090,12 +4090,13 @@ static void 	(struct iscsi_conn *conn)
> >>>  			spin_lock_irq(&se_cmd->t_state_lock);
> >>>  			if (se_cmd->transport_state & CMD_T_ABORTED) {
> >>>  				/*
> >>> -				 * LIO's abort path owns the cleanup for this,
> >>> -				 * so put it back on the list and let
> >>> -				 * aborted_task handle it.
> >>> +				 * The LIO TMR handler owns the cmd but if
> >>> +				 * we were waiting for data from the initiator
> >>> +				 * then we need to internally cleanup to be
> >>> +				 * able to complete it. Get an extra ref so
> >>> +				 * we don't free the cmd from under LIO core.
> >>>  				 */
> >>> -				list_move_tail(&cmd->i_conn_node,
> >>> -					       &conn->conn_cmd_list);
> >>> +				target_get_sess_cmd(se_cmd, false);
> >>>  			} else {
> >>>  				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
> >>>  			}
> >>
> >> The bug was there before. I had to backport this patch in order to
> >> introduce my fix. I can revert it and check what is different, but it's
> >> there in some form.
> >>
> > 
> > Don't waste your time. It's because iscsit_free_cmd's call to
> > transport_generic_free_cmd has wait_for_tasks=true.
> > 
> > We then do transport_generic_free_cmd -> target_wait_free_cmd ->
> > __transport_wait_for_tasks like you posted above.
> 
> That's wrong, it's the transport_generic_free_cmd wait:
> 
>         if (aborted) {
>                 pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
>                 wait_for_completion(&compl);
>                 ret = 1;

I'm not sure I'm following you. The wait on master comes from this:

 core_tmr_abort_task():
  [...]
  target_put_cmd_and_wait(se_cmd);
  [...]

IMO it's kinda legit wait. And we can't just drop the references and
call it a day, because a request has to go through the
target_handle_abort() because it (at least) does some TAS magic. And
currently there is no code that calls target_handle_abort() for
WRITE_PENDING requests.

next prev parent reply	other threads:[~2021-10-13 18:59 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-13 13:21 iSCSI Abort Task and WRITE PENDING Konstantin Shelekhin
2021-10-13 14:22 ` Hannes Reinecke
2021-10-13 14:53   ` Konstantin Shelekhin
2021-10-13 14:56     ` Konstantin Shelekhin
2021-10-14  7:09     ` Hannes Reinecke
2021-10-14  7:52       ` Konstantin Shelekhin
2021-10-13 17:51 ` Mike Christie
2021-10-13 18:05   ` Mike Christie
2021-10-13 18:11     ` Konstantin Shelekhin
2021-10-13 18:08   ` Konstantin Shelekhin
2021-10-13 18:24     ` Mike Christie
2021-10-13 18:30       ` Mike Christie
2021-10-13 18:58         ` Konstantin Shelekhin [this message]
2021-10-13 19:01           ` Konstantin Shelekhin
2021-10-13 20:21             ` Mike Christie
2021-10-14 23:12               ` Konstantin Shelekhin
2021-10-15  3:18                 ` michael.christie
2021-10-18 11:56                   ` Konstantin Shelekhin
2021-10-18 16:29                     ` Mike Christie
2021-10-18 17:08                       ` Mike Christie
2021-10-26 10:59                         ` Konstantin Shelekhin
2021-10-18 17:32                       ` Konstantin Shelekhin
2021-10-18 20:20                         ` Mike Christie
2021-10-18 20:34                           ` Mike Christie
2021-10-18 21:50                             ` Konstantin Shelekhin
2021-10-18 21:48                           ` Konstantin Shelekhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YWcsWKIn5RyN+UbD@yadro.com \
    --to=k.shelekhin@yadro.com \
    --cc=linux@yadro.com \
    --cc=michael.christie@oracle.com \
    --cc=mlombard@redhat.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.