From: Mike Christie <mchristi@redhat.com>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>,
	target-devel <target-devel@vger.kernel.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	"Bryant G. Ly" <bryantly@linux.vnet.ibm.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Bart Van Assche <bvanassche@acm.org>
Subject: Re: [PATCH 2/2] target: Fix target_wait_for_sess_cmds breakage with active signals
Date: Wed, 10 Oct 2018 11:58:39 -0500
Message-ID: <5BBE2FBF.7080804@redhat.com>
In-Reply-To: <1539141790-13557-3-git-send-email-nab@linux-iscsi.org>

On 10/09/2018 10:23 PM, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no
> signals will be pending for task_struct executing the normal session shutdown
> and I/O quiesce code-path.
> 
> For example, iscsi-target and iser-target issue SIGINT to all kthreads as
> part of session shutdown.  This has been the behaviour since day one.
> 
> As-is when signals are pending with se_cmds active in se_sess->sess_cmd_list,
> wait_event_interruptible_lock_irq_timeout() returns a negative number and
> immediately kills the machine because of the do while (ret <= 0) loop that
> was added in commit 00d909a107 to spin while backend I/O is taking any
> amount of extended time (say 30 seconds) to complete.
> 
> Here's what it looks like in action with debug plus delayed backend I/O
> completion:
> 
> [ 4951.909951] se_sess: 000000003e7e08fa before target_wait_for_sess_cmds
> [ 4951.914600] target_wait_for_sess_cmds: signal_pending: 1
> [ 4951.918015] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 0
> [ 4951.921639] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 1
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 2
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 3
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 4
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 5
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 6
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 7
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 8
> [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 9
> 
> ... followed by the usual RCU CPU stalls and deadlock.
> 
> There was never a case pre commit 00d909a107 where wait_for_completion(&se_cmd->cmd_wait_comp)
> was able to be interrupted, so to address this for v4.19+ moving forward go ahead and
> use wait_event_lock_irq_timeout() instead so new code works with all fabric drivers.
> 
> Also for commit 00d909a107, fix a minor regression in target_release_cmd_kref()
> to wake_up the new se_sess->cmd_list_wq only when shutdown has actually
> been triggered via se_sess->sess_tearing_down.
> 
> Fixes: 00d909a107 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted")
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Mike Christie <mchristi@redhat.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
> Tested-by: Nicholas Bellinger <nab@linux-iscsi.org>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/target/target_core_transport.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 86c0156..fc3093d2 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2754,7 +2754,7 @@ static void target_release_cmd_kref(struct kref *kref)
>  	if (se_sess) {
>  		spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
>  		list_del_init(&se_cmd->se_cmd_list);
> -		if (list_empty(&se_sess->sess_cmd_list))
> +		if (se_sess->sess_tearing_down && list_empty(&se_sess->sess_cmd_list))

I think there is another issue with 00d909a107 and ibmvscsi_tgt.

The problem is that ibmvscsi_tgt never called
target_sess_cmd_list_set_waiting. It only called
target_wait_for_sess_cmds. So before 00d909a107 there was a bug in that
driver and target_wait_for_sess_cmds never did what was intended because
sess_wait_list would always be empty.

With 00d909a107, we no longer need to call
target_sess_cmd_list_set_waiting to wait for outstanding commands, so
ibmvscsi_tgt will now wait for commands as intended. However, the
commit added a WARN_ON that fires if target_sess_cmd_list_set_waiting
was not called first, so ibmvscsi_tgt could trigger it.

So I think we need to add a target_sess_cmd_list_set_waiting call in
ibmvscsi_tgt to go along with your patch chunk above and make sure we do
not trigger the WARN_ON.
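
As a rough userspace model of that ordering (not the actual driver
change): the model_* names below are made up for illustration, the
WARN_ON is reduced to a counter, and the wait itself is assumed to
complete every outstanding command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct se_session; only what the sketch needs. */
struct model_sess {
	bool sess_tearing_down;	/* set by the *_set_waiting step */
	int outstanding_cmds;	/* stands in for the length of sess_cmd_list */
	int warn_count;		/* counts would-be WARN_ON hits */
};

/* Models target_sess_cmd_list_set_waiting(): flag the session as shutting down. */
static void model_set_waiting(struct model_sess *sess)
{
	assert(sess != NULL);
	sess->sess_tearing_down = true;
}

/*
 * Models target_wait_for_sess_cmds(): record a warning if shutdown was
 * never flagged, then pretend the backend completes every command.
 */
static void model_wait_for_cmds(struct model_sess *sess)
{
	assert(sess != NULL);
	if (!sess->sess_tearing_down)
		sess->warn_count++;	/* where the real code would WARN_ON */
	sess->outstanding_cmds = 0;
}

/* Hypothetical driver teardown using the order suggested above. */
static void model_driver_close_session(struct model_sess *sess)
{
	model_set_waiting(sess);
	model_wait_for_cmds(sess);
}
```

Calling model_wait_for_cmds() on its own bumps warn_count, which is the
case ibmvscsi_tgt would be in today; going through
model_driver_close_session() does not.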

>  			wake_up(&se_sess->cmd_list_wq);
>  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
>  	}
> @@ -2907,7 +2907,7 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
>  
>  	spin_lock_irq(&se_sess->sess_cmd_lock);
>  	do {
> -		ret = wait_event_interruptible_lock_irq_timeout(
> +		ret = wait_event_lock_irq_timeout(
>  				se_sess->cmd_list_wq,
>  				list_empty(&se_sess->sess_cmd_list),
>  				se_sess->sess_cmd_lock, 180 * HZ);
> 
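
For illustration, the busy loop described in the quoted commit message
can be modeled in userspace. This is only a sketch under stated
assumptions: model_wait() is a hypothetical stand-in for the
wait_event*_lock_irq_timeout() macros, each "sleep" is assumed to let
the backend complete one command, and max_iters exists only so the
model terminates where the real loop would keep spinning.

```c
#include <stdbool.h>

#define MODEL_ERESTARTSYS 512	/* matches the "ret: -512" in the quoted log */

struct model_waiter {
	bool signal_pending;	/* e.g. SIGINT sent during session shutdown */
	int cmds_left;		/* commands still on the session list */
};

/*
 * Returns >0 once the list is empty, 0 on a timeout with commands still
 * outstanding, or -MODEL_ERESTARTSYS when an interruptible wait sees a
 * pending signal and returns without sleeping.
 */
static long model_wait(struct model_waiter *w, bool interruptible)
{
	if (w->cmds_left == 0)
		return 1;
	if (interruptible && w->signal_pending)
		return -MODEL_ERESTARTSYS;
	w->cmds_left--;		/* assumed: one command completes per sleep */
	return w->cmds_left == 0 ? 1 : 0;
}

/* Runs the do/while (ret <= 0) loop; returns how many passes did not sleep. */
static int model_shutdown_loop(struct model_waiter *w, bool interruptible,
			       int max_iters)
{
	int spins = 0;
	int iters = 0;
	long ret;

	do {
		ret = model_wait(w, interruptible);
		if (ret < 0)
			spins++;
		iters++;
	} while (ret <= 0 && iters < max_iters);

	return spins;
}
```

With a signal pending, the interruptible variant never sleeps, so no
command gets a chance to complete and every pass is a spin; the
uninterruptible variant ignores the signal and drains the list.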
