From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "jejb@linux.vnet.ibm.com" <jejb@linux.vnet.ibm.com>,
	Bart Van Assche <Bart.VanAssche@wdc.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"hare@suse.com" <hare@suse.com>,
	"jthumshirn@suse.de" <jthumshirn@suse.de>,
	"ptikhomirov@virtuozzo.com" <ptikhomirov@virtuozzo.com>,
	"ncopa@alpinelinux.org" <ncopa@alpinelinux.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] Avoid that ATA error handling hangs
Date: Thu, 22 Feb 2018 02:23:25 +0000
Message-ID: <1519266202.16203.5.camel@wdc.com>
In-Reply-To: <20180221172316.11884-1-bart.vanassche@wdc.com>

Bart,

On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:
> Avoid that the recently introduced call_rcu() call in the SCSI core
> causes the RCU core to complain about double call_rcu() calls.
> 
> Reported-by: Natanael Copa <ncopa@alpinelinux.org>
> Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
> Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets
> woken up")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Natanael Copa <ncopa@alpinelinux.org>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/scsi/scsi_error.c | 5 +++--
>  include/scsi/scsi_cmnd.h  | 3 +++
>  include/scsi/scsi_host.h  | 2 --
>  3 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index ae325985eac1..ac9ce099530e 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -229,7 +229,8 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
>  
>  static void scsi_eh_inc_host_failed(struct rcu_head *head)
>  {
> -	struct Scsi_Host *shost = container_of(head, typeof(*shost), rcu);
> +	struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu);
> +	struct Scsi_Host *shost = scmd->device->host;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(shost->host_lock, flags);
> @@ -265,7 +266,7 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>  	 * Ensure that all tasks observe the host state change before the
>  	 * host_failed change.
>  	 */
> -	call_rcu(&shost->rcu, scsi_eh_inc_host_failed);
> +	call_rcu(&scmd->rcu, scsi_eh_inc_host_failed);
>  }
>  
>  /**
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index d8d4a902a88d..2280b2351739 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -68,6 +68,9 @@ struct scsi_cmnd {
>  	struct list_head list;  /* scsi_cmnd participates in queue lists */
>  	struct list_head eh_entry; /* entry for the host eh_cmd_q */
>  	struct delayed_work abort_work;
> +
> +	struct rcu_head rcu;
> +
>  	int eh_eflags;		/* Used by error handlr */
>  
>  	/*
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 1a1df0d21ee3..a8b7bf879ced 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -571,8 +571,6 @@ struct Scsi_Host {
>  		struct blk_mq_tag_set	tag_set;
>  	};
>  
> -	struct rcu_head rcu;
> -
>  	atomic_t host_busy;		   /* commands actually active
> on low-level */
>  	atomic_t host_blocked;

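This does not compile: the init_rcu_head() and destroy_rcu_head() calls in
hosts.c still reference the shost->rcu field that the patch removes, and the
new scmd->rcu field has no matching calls. Adding this: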

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 57bf43e34863..dd9464920456 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -328,8 +328,6 @@ static void scsi_host_dev_release(struct device *dev)
        if (shost->work_q)
                destroy_workqueue(shost->work_q);
 
-       destroy_rcu_head(&shost->rcu);
-
        if (shost->shost_state == SHOST_CREATED) {
                /*
                 * Free the shost_dev device name here if scsi_host_alloc()
@@ -404,7 +402,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
        INIT_LIST_HEAD(&shost->starved_list);
        init_waitqueue_head(&shost->host_wait);
        mutex_init(&shost->scan_mutex);
-       init_rcu_head(&shost->rcu);
 
        index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
        if (index < 0)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a86df9ca7d1c..488e5c9acedf 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -590,6 +590,8 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
                if (drv->uninit_command)
                        drv->uninit_command(cmd);
        }
+
+       destroy_rcu_head(&cmd->rcu);
 }
 
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
@@ -1153,6 +1155,7 @@ static void scsi_initialize_rq(struct request *rq)
        scsi_req_init(&cmd->req);
        cmd->jiffies_at_alloc = jiffies;
        cmd->retries = 0;
+       init_rcu_head(&cmd->rcu);
 }
 
 /* Add a command to the list used by the aacraid and dpt_i2o drivers */

And it compiles.

Testing this, the rcu hang is now gone.
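
To illustrate why moving the rcu_head from the host to the command avoids the
double call_rcu() complaint, here is a minimal userspace sketch. It is not
kernel code: toy_rcu_head, toy_call_rcu(), toy_host, toy_cmnd and the two
eh_add_*() helpers are hypothetical stand-ins, and the "queued" flag only
models the rule that an rcu_head must not be passed to call_rcu() again
before its callback has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct rcu_head: only tracks whether it is queued. */
struct toy_rcu_head {
	bool queued;
};

/*
 * Toy stand-in for call_rcu(): returns false if the head is already
 * queued, which models the "double call_rcu()" complaint.
 */
static bool toy_call_rcu(struct toy_rcu_head *head)
{
	assert(head != NULL);
	if (head->queued)
		return false;
	head->queued = true;
	return true;
}

struct toy_host {
	struct toy_rcu_head rcu;	/* old layout: one head per host */
};

struct toy_cmnd {
	struct toy_host *host;
	struct toy_rcu_head rcu;	/* new layout: one head per command */
};

/* Old scheme: every failing command queues the host's single head. */
static bool eh_add_per_host(struct toy_cmnd *cmd)
{
	return toy_call_rcu(&cmd->host->rcu);
}

/* New scheme: each failing command queues its own head. */
static bool eh_add_per_cmnd(struct toy_cmnd *cmd)
{
	return toy_call_rcu(&cmd->rcu);
}
```

In this model, if two commands on the same host fail before any callback has
run, the second eh_add_per_host() call is rejected because the shared head is
already queued, while eh_add_per_cmnd() succeeds for both commands.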

However, the error recovery behavior still differs from what I see in 4.15
and 4.14. In my test case, an unaligned write to a sequential zone on a ZAC
drive connected to an AHCI port, the report zones command issued during disk
revalidation after the write error fails with a timeout. That causes a
capacity change to 0, followed by a port reset and another round of recovery.
Eventually, everything comes back up OK, but it takes some time.

I am investigating whether I am hitting a device FW bug, to confirm whether
this is a kernel problem.

Best regards.

-- 
Damien Le Moal
Western Digital
