From: Hannes Reinecke <hare@suse.de>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Anatol Pomozov <anatol.pomozov@gmail.com>,
    Linux SCSI List <linux-scsi@vger.kernel.org>,
    hch@lst.de, Tejun Heo <tj@kernel.org>
Subject: Re: IO failures with SMR drives at latest kernel versions
Date: Thu, 27 Aug 2015 08:18:24 +0200
Message-ID: <55DEABB0.5070806@suse.de>
In-Reply-To: <1440629555.2196.92.camel@HansenPartnership.com>

On 08/27/2015 12:52 AM, James Bottomley wrote:
> On Wed, 2015-08-26 at 08:40 +0200, Hannes Reinecke wrote:
>> On 08/26/2015 06:53 AM, Anatol Pomozov wrote:
>>> Hi
>>>
>>> On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
>>>>> I looked at this commit and it actually adds SMR support to SCSI
>>>>> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
>>>>> algorithms. It is suboptimal but still much better than IO failures
>>>>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
>>>>> at my computer.
>>>>>
>>>>> If this SMR support is considered as non-stable, can we at least get a
>>>>> kernel boot (or config) option that disables ZAC?
>>>>>
>>>> Again: Has anybody actually _tested_ that reverting this patch fixes
>>>> this issue?
>>>
>>> Yes I tested it.
>>>
>>> This error happens only under heavy load with a lot of read/writes
>>> (like btrfs rebalance).
>>>
>>> With current Linux-4.1.6 'btrfs balance' fails after ~10 minutes after
>>> start. I reverted ZAC related changes and then ran rebalancing. The
>>> operation finished successfully after 3 hours of running.
>>>
>> Can you be a bit more specific about the 'ZAC related changes'?
>> There have been several patches, and we really would need to know
>> which one was the offending one.
>> Can you try to bisect things here?
>
> OK, let's stop shooting the messenger here. There are multiple reports
> of this problem. The pattern seems to be some type of error causes
> everything to die.
>
> There looks to be an obvious bug in
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb in that there's no
> ATA_DEV_ZAC_UNSUP class which means that any attempt to disable the
> device pushes it up to ATA_DEV_NONE. I'm not sure ... don't have time
> to follow the code ... but doesn't this interfere with the speed
> dropping routines which seems to disable then re-enable the device?
> Does adding ATA_DEV_ZAC_UNSUP fix this problem? patch (compile tested
> only) below.
>
> James
>
> ---
>
> diff --git a/drivers/ata/libata-transport.c b/drivers/ata/libata-transport.c
> index d6c37bc..fa83320 100644
> --- a/drivers/ata/libata-transport.c
> +++ b/drivers/ata/libata-transport.c
> @@ -144,6 +144,7 @@ static struct {
>  	{ ATA_DEV_SEMB, "semb" },
>  	{ ATA_DEV_SEMB_UNSUP, "semb" },
>  	{ ATA_DEV_ZAC, "zac" },
> +	{ ATA_DEV_ZAC_UNSUP, "zac" },
>  	{ ATA_DEV_NONE, "none" }
>  };
>  ata_bitfield_name_search(class, ata_class_names)
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 36ce37b..49c5b98 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -191,7 +191,8 @@ enum {
>  	ATA_DEV_SEMB = 7, /* SEMB */
>  	ATA_DEV_SEMB_UNSUP = 8, /* SEMB (unsupported) */
>  	ATA_DEV_ZAC = 9, /* ZAC device */
> -	ATA_DEV_NONE = 10, /* no device */
> +	ATA_DEV_ZAC_UNSUP = 10, /* ZAC (unsupported) */
> +	ATA_DEV_NONE = 11, /* no device */
>
>  	/* struct ata_link flags */
>  	ATA_LFLAG_NO_HRST = (1 << 1), /* avoid hardreset */
> @@ -1517,7 +1518,8 @@ static inline unsigned int ata_class_enabled(unsigned int class)
>  static inline unsigned int ata_class_disabled(unsigned int class)
>  {
>  	return class == ATA_DEV_ATA_UNSUP || class == ATA_DEV_ATAPI_UNSUP ||
> -		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP;
> +		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP ||
> +		class == ATA_DEV_ZAC_UNSUP;
>  }
>
>  static inline unsigned int ata_class_absent(unsigned int class)
>
>
Yes, you are correct. Even if this does not fix this particular
issue, it looks like a valid fix.
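
To spell out why the missing class matters, here is a minimal userspace
sketch, not kernel code. It assumes that disabling a device simply bumps
its class value by one, which is how I read "pushes it up" above. The
OLD_/NEW_ prefixes and disable_class() are names made up for this
illustration, and the *_absent() helpers assume that "absent" means the
class equals ATA_DEV_NONE. Only the numeric values are taken from the
patch.

```c
#include <assert.h>

/* Class values before the patch: ZAC has no "unsupported" twin. */
enum {
	OLD_ATA_DEV_SEMB       = 7,
	OLD_ATA_DEV_SEMB_UNSUP = 8,
	OLD_ATA_DEV_ZAC        = 9,
	OLD_ATA_DEV_NONE       = 10,
};

/* Class values after the patch. */
enum {
	NEW_ATA_DEV_SEMB       = 7,
	NEW_ATA_DEV_SEMB_UNSUP = 8,
	NEW_ATA_DEV_ZAC        = 9,
	NEW_ATA_DEV_ZAC_UNSUP  = 10,
	NEW_ATA_DEV_NONE       = 11,
};

/* The invariant the patch restores: every class is followed by its _UNSUP. */
static_assert(NEW_ATA_DEV_ZAC + 1 == NEW_ATA_DEV_ZAC_UNSUP,
	      "ZAC must be followed by ZAC_UNSUP");

/* ASSUMPTION: disabling a device increments its class by one. */
static inline unsigned int disable_class(unsigned int class)
{
	return class + 1;
}

/* Reduced to the classes modelled above; the real helper checks more. */
static inline unsigned int old_class_disabled(unsigned int class)
{
	return class == OLD_ATA_DEV_SEMB_UNSUP;
}

static inline unsigned int new_class_disabled(unsigned int class)
{
	return class == NEW_ATA_DEV_SEMB_UNSUP ||
		class == NEW_ATA_DEV_ZAC_UNSUP;
}

/* ASSUMPTION: a device counts as absent when its class is ATA_DEV_NONE. */
static inline unsigned int old_class_absent(unsigned int class)
{
	return class == OLD_ATA_DEV_NONE;
}

static inline unsigned int new_class_absent(unsigned int class)
{
	return class == NEW_ATA_DEV_NONE;
}
```

Under these assumptions, disable_class(OLD_ATA_DEV_ZAC) yields
OLD_ATA_DEV_NONE, so the device looks absent rather than disabled. With
the new values the same step yields NEW_ATA_DEV_ZAC_UNSUP, which
new_class_disabled() recognises.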

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)