public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <michael.christie@oracle.com>
To: Dmitry Bogdanov <d.bogdanov@yadro.com>,
	Martin Petersen <martin.petersen@oracle.com>,
	target-devel@vger.kernel.org
Cc: linux-scsi@vger.kernel.org, linux@yadro.com,
	Konstantin Shelekhin <k.shelekhin@yadro.com>,
	Roman Bolshakov <r.bolshakov@yadro.com>
Subject: Re: [PATCH v3] target: core: remove from tmr_list at lun unlink
Date: Fri, 17 Sep 2021 11:57:59 -0500	[thread overview]
Message-ID: <40b321b4-76bd-8eb4-84bd-c7378ad2bbc7@oracle.com> (raw)
In-Reply-To: <20210915141719.1484-1-d.bogdanov@yadro.com>

On 9/15/21 9:17 AM, Dmitry Bogdanov wrote:
> Currently TMF commands are removed from de_device.dev_tmf_list at
> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
> up on a command status (response) is queued in transport layer.
> It means that LUN and backend device can be deleted meantime and at
> the moment of repsonse completion a panic is occured:
> 
> target_tmr_work()
> 	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
> 	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
> - // - // - // -
> <<<--- lun remove
> <<<--- core backend device remove
> - // - // - // -
> qlt_handle_abts_completion()
>   tfo->free_mcmd()
>     transport_generic_free_cmd()
>       target_put_sess_cmd()
>         core_tmr_release_req() {
>           if (dev) { // backend device, can not be null
>             spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
> 
> Call Trace:
> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
> Call Trace:
> (unreliable)
> 0x0
> target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
> process_one_work+0x2c4/0x5c0
> worker_thread+0x88/0x690
> 
> For FC protocol it is a race condition, but for iSCSI protocol it is
> easyly reproduced by manual sending iSCSI commands:
> - Send some SCSI sommand
> - Send Abort of that command over iSCSI
> - Remove LUN on target
> - Send next iSCSI command to acknowledge the Abort_Response
> - target panics
> 
> There is no sense to keep the command in tmr_list until response
> completion, so move the removal from tmr_list from the response
> completion to the response queueing when lun is unlinked.
> Move the removal from state list too as it is a subject to the same
> race condition.
> 
> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
> 
> ---
> v3:
>  remove iscsi fix as not related to the issue
>  avoid double removal from tmr_list
> v2:
>  fix stuck in tmr list in error case
> 
> The issue exists from the very begining.
> I uploaded a scapy script that helps to reproduce the issue at
> https://gist.github.com/logost/cb93df41dd2432454324449b390403c4
> ---
>  drivers/target/target_core_tmr.c       | 10 +--------
>  drivers/target/target_core_transport.c | 30 ++++++++++++++++++++------
>  2 files changed, 24 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index e7fcbc09f9db..84ae2fe456ec 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -50,15 +50,6 @@ EXPORT_SYMBOL(core_tmr_alloc_req);
>  
>  void core_tmr_release_req(struct se_tmr_req *tmr)
>  {
> -	struct se_device *dev = tmr->tmr_dev;
> -	unsigned long flags;
> -
> -	if (dev) {
> -		spin_lock_irqsave(&dev->se_tmr_lock, flags);
> -		list_del_init(&tmr->tmr_list);
> -		spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> -	}
> -
>  	kfree(tmr);
>  }
>  
> @@ -234,6 +225,7 @@ static void core_tmr_drain_tmr_list(
>  		}
>  
>  		list_move_tail(&tmr_p->tmr_list, &drain_tmr_list);
> +		tmr_p->tmr_dev = NULL;

Is this patch now adding a way to hit:

if (!tmr->tmr_dev)
	WARN_ON_ONCE(transport_lookup_tmr_lun(tmr->task_cmd) < 0);                      

in core_tmr_abort_task?

You have the abort and lun reset works running on different CPUs.
The lun reset hits the above code first and clears tmr_dev.
The abort then hits the tmr->tmr_dev check and tries to do
transport_lookup_tmr_lun.

For the case where the lun is not removed, it looks like
transport_lookup_tmr_lun will add the tmr to the dev_tmr_list
but it would also be on the drain_tmr_list above so we would
hit list corruption.

For the case where the lun is getting removed, percpu_ref_tryget_live
would fail in transport_lookup_tmr_lun and we hit the WARN_ON_ONCE.
I think though with your patch, we would be ok and don't want
the WARN_ON_ONCE, right? The lun reset would just wait for the
abort. When it completes the abort and reset complete as expected.

  reply	other threads:[~2021-09-17 16:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-15 14:17 [PATCH v3] target: core: remove from tmr_list at lun unlink Dmitry Bogdanov
2021-09-17 16:57 ` Mike Christie [this message]
2021-09-20 16:40   ` Dmitriy Bogdanov
2021-09-22 16:00     ` Mike Christie
2021-09-22 16:43       ` Dmitriy Bogdanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=40b321b4-76bd-8eb4-84bd-c7378ad2bbc7@oracle.com \
    --to=michael.christie@oracle.com \
    --cc=d.bogdanov@yadro.com \
    --cc=k.shelekhin@yadro.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux@yadro.com \
    --cc=martin.petersen@oracle.com \
    --cc=r.bolshakov@yadro.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox