linux-scsi.vger.kernel.org archive mirror
From: Hannes Reinecke <hare@suse.de>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Anatol Pomozov <anatol.pomozov@gmail.com>,
	Linux SCSI List <linux-scsi@vger.kernel.org>,
	hch@lst.de, Tejun Heo <tj@kernel.org>
Subject: Re: IO failures with SMR drives at latest kernel versions
Date: Thu, 27 Aug 2015 08:18:24 +0200
Message-ID: <55DEABB0.5070806@suse.de>
In-Reply-To: <1440629555.2196.92.camel@HansenPartnership.com>

On 08/27/2015 12:52 AM, James Bottomley wrote:
> On Wed, 2015-08-26 at 08:40 +0200, Hannes Reinecke wrote:
>> On 08/26/2015 06:53 AM, Anatol Pomozov wrote:
>>> Hi
>>>
>>> On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
>>>>> I looked at this commit and it actually adds SMR support to the SCSI
>>>>> layer. Reverting ATA_DEV_ZAC means going back to zone-unaware
>>>>> algorithms. It is suboptimal but still much better than IO failures
>>>>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
>>>>> on my computer.
>>>>>
>>>>> If this SMR support is considered unstable, can we at least get a
>>>>> kernel boot (or config) option that disables ZAC?
>>>>>
>>>> Again: Has anybody actually _tested_ that reverting this patch fixes
>>>> this issue?
>>>
>>> Yes I tested it.
>>>
>>> This error happens only under heavy load with a lot of read/writes
>>> (like btrfs rebalance).
>>>
>>> With current Linux-4.1.6, 'btrfs balance' fails ~10 minutes after it
>>> starts. I reverted ZAC related changes and then ran the balance again.
>>> The operation finished successfully after 3 hours.
>>>
>> Can you be a bit more specific about the 'ZAC related changes'?
>> There have been several patches, and we really would need to know
>> which one was the offending one.
>> Can you try to bisect things here?
> 
> OK, let's stop shooting the messenger here.  There are multiple reports
> of this problem.  The pattern seems to be that some type of error causes
> everything to die.
> 
> There looks to be an obvious bug in
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb in that there's no
> ATA_DEV_ZAC_UNSUP class, which means that any attempt to disable the
> device pushes it up to ATA_DEV_NONE.  I'm not sure ... don't have time
> to follow the code ... but doesn't this interfere with the speed
> dropping routines, which seem to disable then re-enable the device?
> Does adding ATA_DEV_ZAC_UNSUP fix this problem?  Patch (compile tested
> only) below.
> 
> James
> 
> ---
> 
> diff --git a/drivers/ata/libata-transport.c b/drivers/ata/libata-transport.c
> index d6c37bc..fa83320 100644
> --- a/drivers/ata/libata-transport.c
> +++ b/drivers/ata/libata-transport.c
> @@ -144,6 +144,7 @@ static struct {
>  	{ ATA_DEV_SEMB,			"semb" },
>  	{ ATA_DEV_SEMB_UNSUP,		"semb" },
>  	{ ATA_DEV_ZAC,			"zac" },
> +	{ ATA_DEV_ZAC_UNSUP,		"zac" },
>  	{ ATA_DEV_NONE,			"none" }
>  };
>  ata_bitfield_name_search(class, ata_class_names)
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 36ce37b..49c5b98 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -191,7 +191,8 @@ enum {
>  	ATA_DEV_SEMB		= 7,	/* SEMB */
>  	ATA_DEV_SEMB_UNSUP	= 8,	/* SEMB (unsupported) */
>  	ATA_DEV_ZAC		= 9,	/* ZAC device */
> -	ATA_DEV_NONE		= 10,	/* no device */
> +	ATA_DEV_ZAC_UNSUP	= 10,	/* ZAC (unsupported) */
> +	ATA_DEV_NONE		= 11,	/* no device */
>  
>  	/* struct ata_link flags */
>  	ATA_LFLAG_NO_HRST	= (1 << 1), /* avoid hardreset */
> @@ -1517,7 +1518,8 @@ static inline unsigned int ata_class_enabled(unsigned int class)
>  static inline unsigned int ata_class_disabled(unsigned int class)
>  {
>  	return class == ATA_DEV_ATA_UNSUP || class == ATA_DEV_ATAPI_UNSUP ||
> -		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP;
> +		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP ||
> +		class == ATA_DEV_ZAC_UNSUP;
>  }
>  
>  static inline unsigned int ata_class_absent(unsigned int class)
> 
> 
Yes, you are correct. Even if this does not fix this particular
issue, it looks like a valid fix.
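
To make the failure mode concrete, here is a minimal standalone sketch
(not kernel code). It assumes, as described in the quoted mail, that
disabling a device moves its class to the next enum value. The sk_*
names are made up for illustration, the numeric values mirror the enum
in the patch, and the other *_UNSUP classes are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants only; the values mirror the enum in the patch. */
enum {
	SK_SEMB_UNSUP    = 8,
	SK_ZAC           = 9,
	SK_OLD_NONE      = 10,	/* before the patch: ZAC + 1 means "no device" */
	SK_NEW_ZAC_UNSUP = 10,	/* after the patch: ZAC + 1 means "ZAC (unsupported)" */
	SK_NEW_NONE      = 11,
};

/* Assumption: disabling a device bumps its class to the next value. */
static inline unsigned int sk_disable(unsigned int cls)
{
	return cls + 1;
}

/* Pre-patch predicate: has no entry for a disabled ZAC device. */
static inline bool sk_disabled_old(unsigned int cls)
{
	return cls == SK_SEMB_UNSUP;	/* other *_UNSUP classes omitted */
}

/* Post-patch predicate: also recognises the new ZAC_UNSUP class. */
static inline bool sk_disabled_new(unsigned int cls)
{
	return cls == SK_SEMB_UNSUP || cls == SK_NEW_ZAC_UNSUP;
}

/* Checks both outcomes for a ZAC device that gets disabled. */
static inline void sk_selfcheck(void)
{
	unsigned int cls = sk_disable(SK_ZAC);

	/* Old layout: the device is indistinguishable from an empty slot. */
	assert(cls == SK_OLD_NONE && !sk_disabled_old(cls));
	/* New layout: the same value now reads as "disabled", not "absent". */
	assert(cls != SK_NEW_NONE && sk_disabled_new(cls));
}
```

Calling sk_selfcheck() from a small main() exercises both layouts. The
point is only that, without a dedicated *_UNSUP slot, a disabled ZAC
device ends up with the same value as "no device".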

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
