Re: [PATCH] Avoid that ATA error handling hangs

From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "jejb@linux.vnet.ibm.com" <jejb@linux.vnet.ibm.com>,
	Bart Van Assche <Bart.VanAssche@wdc.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"hare@suse.com" <hare@suse.com>,
	"jthumshirn@suse.de" <jthumshirn@suse.de>,
	"ptikhomirov@virtuozzo.com" <ptikhomirov@virtuozzo.com>,
	"ncopa@alpinelinux.org" <ncopa@alpinelinux.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] Avoid that ATA error handling hangs
Date: Thu, 22 Feb 2018 02:23:25 +0000
Message-ID: <1519266202.16203.5.camel@wdc.com>
In-Reply-To: <20180221172316.11884-1-bart.vanassche@wdc.com>

Bart,

On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:
> Avoid that the recently introduced call_rcu() call in the SCSI core
> causes the RCU core to complain about double call_rcu() calls.
> 
> Reported-by: Natanael Copa <ncopa@alpinelinux.org>
> Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
> Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets woken up")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Natanael Copa <ncopa@alpinelinux.org>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/scsi/scsi_error.c | 5 +++--
>  include/scsi/scsi_cmnd.h  | 3 +++
>  include/scsi/scsi_host.h  | 2 --
>  3 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index ae325985eac1..ac9ce099530e 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -229,7 +229,8 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
>  
>  static void scsi_eh_inc_host_failed(struct rcu_head *head)
>  {
> -	struct Scsi_Host *shost = container_of(head, typeof(*shost), rcu);
> +	struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu);
> +	struct Scsi_Host *shost = scmd->device->host;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(shost->host_lock, flags);
> @@ -265,7 +266,7 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>  	 * Ensure that all tasks observe the host state change before the
>  	 * host_failed change.
>  	 */
> -	call_rcu(&shost->rcu, scsi_eh_inc_host_failed);
> +	call_rcu(&scmd->rcu, scsi_eh_inc_host_failed);
>  }
>  
>  /**
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index d8d4a902a88d..2280b2351739 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -68,6 +68,9 @@ struct scsi_cmnd {
>  	struct list_head list;  /* scsi_cmnd participates in queue lists */
>  	struct list_head eh_entry; /* entry for the host eh_cmd_q */
>  	struct delayed_work abort_work;
> +
> +	struct rcu_head rcu;
> +
>  	int eh_eflags;		/* Used by error handlr */
>  
>  	/*
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 1a1df0d21ee3..a8b7bf879ced 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -571,8 +571,6 @@ struct Scsi_Host {
>  		struct blk_mq_tag_set	tag_set;
>  	};
>  
> -	struct rcu_head rcu;
> -
>  	atomic_t host_busy;		   /* commands actually active
> on low-level */
>  	atomic_t host_blocked;

This does not compile. The patch removes shost->rcu, but the init_rcu_head()
and destroy_rcu_head() calls in hosts.c still reference it. I moved them to
the command with the following change:

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 57bf43e34863..dd9464920456 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -328,8 +328,6 @@ static void scsi_host_dev_release(struct device *dev)
        if (shost->work_q)
                destroy_workqueue(shost->work_q);
 
-       destroy_rcu_head(&shost->rcu);
-
        if (shost->shost_state == SHOST_CREATED) {
                /*
                 * Free the shost_dev device name here if scsi_host_alloc()
@@ -404,7 +402,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
        INIT_LIST_HEAD(&shost->starved_list);
        init_waitqueue_head(&shost->host_wait);
        mutex_init(&shost->scan_mutex);
-       init_rcu_head(&shost->rcu);
 
        index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
        if (index < 0)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a86df9ca7d1c..488e5c9acedf 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -590,6 +590,8 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
                if (drv->uninit_command)
                        drv->uninit_command(cmd);
        }
+
+       destroy_rcu_head(&cmd->rcu);
 }
 
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
@@ -1153,6 +1155,7 @@ static void scsi_initialize_rq(struct request *rq)
        scsi_req_init(&cmd->req);
        cmd->jiffies_at_alloc = jiffies;
        cmd->retries = 0;
+       init_rcu_head(&cmd->rcu);
 }
 
 /* Add a command to the list used by the aacraid and dpt_i2o drivers */

With this added, the patch compiles.

In my testing, the RCU hang is now gone.
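
For reference, here is a minimal userspace sketch of why one rcu_head per
command avoids the double call_rcu() problem that a single rcu_head per host
runs into when two commands fail back to back. This is a toy model under
stated assumptions, not kernel code: every toy_* name is made up for
illustration, the pending list only imitates callback queuing, and the
"queued" flag merely stands in for the RCU core's complaint about a head that
is queued twice.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct rcu_head: an intrusive callback list node. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
	int queued;	/* toy-only: set while the head sits on the list */
};

/* Same idea as the kernel's container_of(), written out for userspace. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_host {
	int host_failed;
};

struct toy_cmnd {
	struct toy_host *host;
	struct toy_rcu_head rcu;	/* one head per command, as in the patch */
};

static struct toy_rcu_head *pending;	/* callbacks waiting to run */

/* Queue a callback. Returns -1 if this head is already queued. */
static int toy_call_rcu(struct toy_rcu_head *head,
			void (*func)(struct toy_rcu_head *))
{
	if (head->queued)
		return -1;
	head->queued = 1;
	head->func = func;
	head->next = pending;
	pending = head;
	return 0;
}

/* Run every pending callback, imitating the end of a grace period. */
static void toy_run_callbacks(void)
{
	while (pending) {
		struct toy_rcu_head *head = pending;

		pending = head->next;
		head->queued = 0;
		head->func(head);
	}
}

/* Recover the command from its head, then reach the host through it. */
static void toy_inc_host_failed(struct toy_rcu_head *head)
{
	struct toy_cmnd *cmd = toy_container_of(head, struct toy_cmnd, rcu);

	cmd->host->host_failed++;
}

/* Two commands fail on the same host before any callback has run. */
static int toy_demo(void)
{
	struct toy_host host = { 0 };
	struct toy_cmnd a = { .host = &host };
	struct toy_cmnd b = { .host = &host };

	pending = NULL;
	if (toy_call_rcu(&a.rcu, toy_inc_host_failed))
		return -1;
	if (toy_call_rcu(&b.rcu, toy_inc_host_failed))
		return -1;
	/* Reusing a head that is still queued is what a shared head does. */
	assert(toy_call_rcu(&a.rcu, toy_inc_host_failed) == -1);
	toy_run_callbacks();
	return host.host_failed;
}
```

In this model toy_demo() returns 2: both failures are counted because each
command brings its own head. With a single head shared by the host, the second
queuing attempt would hit the rejected case instead.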

However, the error recovery behavior still differs from what I see in 4.14
and 4.15. My test case is an unaligned write to a sequential zone on a ZAC
drive connected to an AHCI port. The report zones command issued during disk
revalidation after the write error fails with a timeout, which causes a
capacity change to 0, followed by a port reset and another round of recovery.
Eventually, everything comes back up OK, but it takes some time.

I am investigating whether I am hitting a device firmware bug or whether this
is a kernel problem.

Best regards.

-- 
Damien Le Moal
Western Digital