IO failures with SMR drives at latest kernel versions

linux-scsi.vger.kernel.org archive mirror

* IO failures with SMR drives at latest kernel versions
@ 2015-08-22  5:37 Anatol Pomozov
  2015-08-22 16:35 ` Tejun Heo
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Anatol Pomozov @ 2015-08-22  5:37 UTC (permalink / raw)
  To: Hannes Reinecke, Linux SCSI List; +Cc: hch, Tejun Heo

Hi

I recently got two Seagate 8 TB drives. A 'dd' over the whole disk ran fine.
Then I added them to my RAID and started rebalancing. I got the
following errors almost immediately:



ata5.00: failed command: WRITE FPDMA QUEUED
ata5.00: cmd 61/00:e0:80:e7:75/1d:00:9f:00:00/40 tag 28 ncq 3801088 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5.00: failed command: WRITE FPDMA QUEUED
ata5.00: cmd 61/80:e8:80:04:76/1d:00:9f:00:00/40 tag 29 ncq 3866624 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5: hard resetting link
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
ata5.00: configured for UDMA/133
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
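
For anyone reading the "cmd" lines above, here is a minimal C sketch that
decodes them. It assumes the bytes are printed in the order
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device,
that NCQ commands carry the tag in bits 7:3 of the sector-count byte and the
transfer length in the feature bytes, and that a sector is 512 bytes. The
struct and function names are made up for illustration and are not libata
identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes of one "cmd" line, in the order they are printed (assumed layout). */
struct tf_fields {
	uint8_t command, feature, nsect, lbal, lbam, lbah;
	uint8_t hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
	uint8_t device;
};

/* Parse e.g. "61/00:e0:80:e7:75/1d:00:9f:00:00/40"; returns 0 on success. */
static int parse_tf(const char *s, struct tf_fields *tf)
{
	unsigned int v[12];

	assert(s != NULL && tf != NULL);
	if (sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
		   &v[6], &v[7], &v[8], &v[9], &v[10], &v[11]) != 12)
		return -1;

	tf->command     = (uint8_t)v[0];
	tf->feature     = (uint8_t)v[1];
	tf->nsect       = (uint8_t)v[2];
	tf->lbal        = (uint8_t)v[3];
	tf->lbam        = (uint8_t)v[4];
	tf->lbah        = (uint8_t)v[5];
	tf->hob_feature = (uint8_t)v[6];
	tf->hob_nsect   = (uint8_t)v[7];
	tf->hob_lbal    = (uint8_t)v[8];
	tf->hob_lbam    = (uint8_t)v[9];
	tf->hob_lbah    = (uint8_t)v[10];
	tf->device      = (uint8_t)v[11];
	return 0;
}

/* NCQ tag: bits 7:3 of the sector-count byte. */
static unsigned int ncq_tag(const struct tf_fields *tf)
{
	return tf->nsect >> 3;
}

/* NCQ transfer length in sectors, taken from the feature bytes (0 means 65536). */
static unsigned int ncq_sectors(const struct tf_fields *tf)
{
	unsigned int n = ((unsigned int)tf->hob_feature << 8) | tf->feature;

	return n ? n : 65536;
}

/* 48-bit starting LBA assembled from the six LBA bytes. */
static uint64_t tf_lba48(const struct tf_fields *tf)
{
	return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
	       ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
	       ((uint64_t)tf->lbam << 8) | tf->lbal;
}
```

Under those assumptions, the two failed commands decode to tags 28 and 29
and to lengths of 7424 and 7552 sectors, which at 512 bytes per sector match
the 3801088 and 3866624 byte counts in the log. The second write starts
exactly where the first one ends, so these look like two halves of one large
sequential write.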



The disks are from different batches, so I was a bit surprised to see
identical failures at the same time. I was ready to send the drives back
for RMA, but then I discovered this thread:
https://bbs.archlinux.org/viewtopic.php?id=199351 A lot of people
report the same problem as mine. They found that the problem appears
only starting with kernel 3.19; with kernel 3.18 or Windows these drives
work fine. The thread recommends reverting commit
9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb ("libata: Implement
ATA_DEV_ZAC"), which seems to fix the issue.

The same issue is also reported in the kernel.org Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=93581


This issue is extremely annoying and makes for a poor user experience
with SMR drives. Has anybody looked at this bug?


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-22  5:37 IO failures with SMR drives at latest kernel versions Anatol Pomozov
@ 2015-08-22 16:35 ` Tejun Heo
  2015-08-22 17:23 ` Anatol Pomozov
  2015-08-24  6:11 ` Hannes Reinecke
  2 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2015-08-22 16:35 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: Hannes Reinecke, Linux SCSI List, hch

Hello,

On Fri, Aug 21, 2015 at 10:37:44PM -0700, Anatol Pomozov wrote:
...
> https://bbs.archlinux.org/viewtopic.php?id=199351 A lot of people
> report the same problem as mine. It was found that the problem appears
> only starting from 3.19, with kernel 3.18 or Windows these drives work
> fine. It is recommended to revert this change
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb "libata: Implement
> ATA_DEV_ZAC" that seems fixes the issue.

Hannes?

-- 
tejun


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-22  5:37 IO failures with SMR drives at latest kernel versions Anatol Pomozov
  2015-08-22 16:35 ` Tejun Heo
@ 2015-08-22 17:23 ` Anatol Pomozov
  2015-08-24  6:15   ` Hannes Reinecke
  2015-08-24  6:11 ` Hannes Reinecke
  2 siblings, 1 reply; 11+ messages in thread
From: Anatol Pomozov @ 2015-08-22 17:23 UTC (permalink / raw)
  To: Hannes Reinecke, Linux SCSI List; +Cc: hch, Tejun Heo

Hi

On Fri, Aug 21, 2015 at 10:37 PM, Anatol Pomozov
<anatol.pomozov@gmail.com> wrote:
> Discs are from different batches and I was a bit surprised to see
> identical failures at the same time. I was ready send the drives to
> RMA but then I discovered this thread
> https://bbs.archlinux.org/viewtopic.php?id=199351 A lot of people
> report the same problem as mine. It was found that the problem appears
> only starting from 3.19, with kernel 3.18 or Windows these drives work
> fine. It is recommended to revert this change
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb "libata: Implement
> ATA_DEV_ZAC" that seems fixes the issue.

I looked at this commit, and it actually adds SMR support to the SCSI
layer. Reverting ATA_DEV_ZAC means going back to zone-unaware
algorithms. That is suboptimal but still much better than the IO failures
and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
on my computer.

If this SMR support is not considered stable, can we at least get a
kernel boot (or config) option that disables ZAC?


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-22  5:37 IO failures with SMR drives at latest kernel versions Anatol Pomozov
  2015-08-22 16:35 ` Tejun Heo
  2015-08-22 17:23 ` Anatol Pomozov
@ 2015-08-24  6:11 ` Hannes Reinecke
  2 siblings, 0 replies; 11+ messages in thread
From: Hannes Reinecke @ 2015-08-24  6:11 UTC (permalink / raw)
  To: Anatol Pomozov, Linux SCSI List; +Cc: hch, Tejun Heo

On 08/22/2015 07:37 AM, Anatol Pomozov wrote:
> Hi
> 
> I recently got 2 Seagate 8Tb drives. 'dd' over whole disc ran fine.
> Then I inserted into my RAID and started rebalancing. I've got
> following error almost immediately:
> 
> 
> 
> ata5.00: failed command: WRITE FPDMA QUEUED
> ata5.00: cmd 61/00:e0:80:e7:75/1d:00:9f:00:00/40 tag 28 ncq 3801088 out
>                                        res
> 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata5.00: status: { DRDY }
> ata5.00: failed command: WRITE FPDMA QUEUED
> ata5.00: cmd 61/80:e8:80:04:76/1d:00:9f:00:00/40 tag 29 ncq 3866624 out
>                                        res
> 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata5.00: status: { DRDY }
> ata5: hard resetting link
> ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
> ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
> ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY)
> filtered out
> ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
> ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
> ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY)
> filtered out
> ata5.00: configured for UDMA/133
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> 
> 
> 
> Discs are from different batches and I was a bit surprised to see
> identical failures at the same time. I was ready send the drives to
> RMA but then I discovered this thread
> https://bbs.archlinux.org/viewtopic.php?id=199351 A lot of people
> report the same problem as mine. It was found that the problem appears
> only starting from 3.19, with kernel 3.18 or Windows these drives work
> fine. It is recommended to revert this change
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb "libata: Implement
> ATA_DEV_ZAC" that seems fixes the issue.
> 
Please clarify: Have you _checked_ that it fixes this issue?

The above patch just adds a new ATA device type, and the necessary
changes to have it detected correctly.
I find it extremely unlikely that it is the culprit here.

> There is also the same issue reported at kernel.org
> https://bugzilla.kernel.org/show_bug.cgi?id=93581
> 
> 
> This issue is extremely annoying and makes negative SMR drives user
> experience. Had anybody look at this bug?
> 
Did you attempt a bisect to find out the offending patch?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-22 17:23 ` Anatol Pomozov
@ 2015-08-24  6:15   ` Hannes Reinecke
  2015-08-24  7:21     ` Anatol Pomozov
  2015-08-26  4:53     ` Anatol Pomozov
  0 siblings, 2 replies; 11+ messages in thread
From: Hannes Reinecke @ 2015-08-24  6:15 UTC (permalink / raw)
  To: Anatol Pomozov, Linux SCSI List; +Cc: hch, Tejun Heo

On 08/22/2015 07:23 PM, Anatol Pomozov wrote:
> Hi
> 
> On Fri, Aug 21, 2015 at 10:37 PM, Anatol Pomozov
> <anatol.pomozov@gmail.com> wrote:
>> Discs are from different batches and I was a bit surprised to see
>> identical failures at the same time. I was ready send the drives to
>> RMA but then I discovered this thread
>> https://bbs.archlinux.org/viewtopic.php?id=199351 A lot of people
>> report the same problem as mine. It was found that the problem appears
>> only starting from 3.19, with kernel 3.18 or Windows these drives work
>> fine. It is recommended to revert this change
>> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb "libata: Implement
>> ATA_DEV_ZAC" that seems fixes the issue.
> 
> I looked at this commit and it actually adds SMR support to SCSI
> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
> algorithms. It is suboptimal but still much better than IO failures
> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
> at my computer.
> 
> If this SMR support is considered as non-stable, can we at least get a
> kernel boot (or config) option that disables ZAC?
> 
Again: Has anybody actually _tested_ that reverting this patch fixes
this issue?
Or even a proper bisect?

Needless to say, my drives work without issues, so I can't really
reproduce this issue.

Which disks are these?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-24  6:15   ` Hannes Reinecke
@ 2015-08-24  7:21     ` Anatol Pomozov
  2015-08-26  4:53     ` Anatol Pomozov
  1 sibling, 0 replies; 11+ messages in thread
From: Anatol Pomozov @ 2015-08-24  7:21 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Linux SCSI List, hch, Tejun Heo

Hi

On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
> On 08/22/2015 07:23 PM, Anatol Pomozov wrote:
>> Hi
>>
>> On Fri, Aug 21, 2015 at 10:37 PM, Anatol Pomozov
>> <anatol.pomozov@gmail.com> wrote:
>>> Discs are from different batches and I was a bit surprised to see
>>> identical failures at the same time. I was ready send the drives to
>>> RMA but then I discovered this thread
>>> https://bbs.archlinux.org/viewtopic.php?id=199351 A lot of people
>>> report the same problem as mine. It was found that the problem appears
>>> only starting from 3.19, with kernel 3.18 or Windows these drives work
>>> fine. It is recommended to revert this change
>>> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb "libata: Implement
>>> ATA_DEV_ZAC" that seems fixes the issue.
>>
>> I looked at this commit and it actually adds SMR support to SCSI
>> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
>> algorithms. It is suboptimal but still much better than IO failures
>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
>> at my computer.
>>
>> If this SMR support is considered as non-stable, can we at least get a
>> kernel boot (or config) option that disables ZAC?
>>
> Again: Has anybody actually _tested_ that reverting this patch fixes
> this issue?
> Or even a proper bisect?
>
> Needless to say, my drives work without issues, so I can't really
> reproduce this issue.
>
> Which disks are these?

I have ST8000AS0002
http://www.newegg.com/Product/Product.aspx?Item=N82E16822178748

Here is smartctl info https://gist.github.com/anatol/66f5368208c903133f24


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-24  6:15   ` Hannes Reinecke
  2015-08-24  7:21     ` Anatol Pomozov
@ 2015-08-26  4:53     ` Anatol Pomozov
  2015-08-26  6:40       ` Hannes Reinecke
  1 sibling, 1 reply; 11+ messages in thread
From: Anatol Pomozov @ 2015-08-26  4:53 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Linux SCSI List, hch, Tejun Heo

Hi

On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
>> I looked at this commit and it actually adds SMR support to SCSI
>> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
>> algorithms. It is suboptimal but still much better than IO failures
>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
>> at my computer.
>>
>> If this SMR support is considered as non-stable, can we at least get a
>> kernel boot (or config) option that disables ZAC?
>>
> Again: Has anybody actually _tested_ that reverting this patch fixes
> this issue?

Yes, I tested it.

This error happens only under heavy load with a lot of reads and writes
(like a btrfs rebalance).

With the current Linux 4.1.6, 'btrfs balance' fails about 10 minutes after
it starts. I reverted the ZAC-related changes and then ran the rebalance
again. The operation finished successfully after 3 hours.


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-26  4:53     ` Anatol Pomozov
@ 2015-08-26  6:40       ` Hannes Reinecke
  2015-08-26 21:13         ` Anatol Pomozov
  2015-08-26 22:52         ` James Bottomley
  0 siblings, 2 replies; 11+ messages in thread
From: Hannes Reinecke @ 2015-08-26  6:40 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: Linux SCSI List, hch, Tejun Heo

On 08/26/2015 06:53 AM, Anatol Pomozov wrote:
> Hi
> 
> On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
>>> I looked at this commit and it actually adds SMR support to SCSI
>>> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
>>> algorithms. It is suboptimal but still much better than IO failures
>>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
>>> at my computer.
>>>
>>> If this SMR support is considered as non-stable, can we at least get a
>>> kernel boot (or config) option that disables ZAC?
>>>
>> Again: Has anybody actually _tested_ that reverting this patch fixes
>> this issue?
> 
> Yes I tested it.
> 
> This error happens only under heavy load with a lot of read/writes
> (like btrfs rebalance).
> 
> With current Linux-4.1.6 'btrfs balance' fails after ~10 minutes after
> start. I reverted ZAC related changes and then ran rebalancing. The
> operation finished successfully after 3 hours of running.
> 
Can you be a bit more specific about the 'ZAC related changes'?
There have been several patches, and we really would need to know
which one was the offending one.
Can you try to bisect things here?

Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-26  6:40       ` Hannes Reinecke
@ 2015-08-26 21:13         ` Anatol Pomozov
  2015-08-26 22:52         ` James Bottomley
  1 sibling, 0 replies; 11+ messages in thread
From: Anatol Pomozov @ 2015-08-26 21:13 UTC (permalink / raw)
  To: Hannes Reinecke, adrian.palmer; +Cc: Linux SCSI List, hch, Tejun Heo

Hi

The same issue was reported here
http://www.spinics.net/lists/linux-ide/msg50641.html

Adding Adrian from Seagate to help track down this issue.

On Tue, Aug 25, 2015 at 11:40 PM, Hannes Reinecke <hare@suse.de> wrote:
> On 08/26/2015 06:53 AM, Anatol Pomozov wrote:
>> Hi
>>
>> On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
>>>> I looked at this commit and it actually adds SMR support to SCSI
>>>> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
>>>> algorithms. It is suboptimal but still much better than IO failures
>>>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
>>>> at my computer.
>>>>
>>>> If this SMR support is considered as non-stable, can we at least get a
>>>> kernel boot (or config) option that disables ZAC?
>>>>
>>> Again: Has anybody actually _tested_ that reverting this patch fixes
>>> this issue?
>>
>> Yes I tested it.
>>
>> This error happens only under heavy load with a lot of read/writes
>> (like btrfs rebalance).
>>
>> With current Linux-4.1.6 'btrfs balance' fails after ~10 minutes after
>> start. I reverted ZAC related changes and then ran rebalancing. The
>> operation finished successfully after 3 hours of running.
>>
> Can you be a bit more specific about the 'ZAC related changes'?
> There have been several patches, and we really would need to know
> which one was the offending one.

I reverted these changes:

9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb
f9ca5ab832e7ac5bc2b6fe0e82ad46d536f436f9


* Re: IO failures with SMR drives at latest kernel versions
  2015-08-26  6:40       ` Hannes Reinecke
  2015-08-26 21:13         ` Anatol Pomozov
@ 2015-08-26 22:52         ` James Bottomley
  2015-08-27  6:18           ` Hannes Reinecke
  1 sibling, 1 reply; 11+ messages in thread
From: James Bottomley @ 2015-08-26 22:52 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Anatol Pomozov, Linux SCSI List, hch, Tejun Heo

On Wed, 2015-08-26 at 08:40 +0200, Hannes Reinecke wrote:
> On 08/26/2015 06:53 AM, Anatol Pomozov wrote:
> > Hi
> > 
> > On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
> >>> I looked at this commit and it actually adds SMR support to SCSI
> >>> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
> >>> algorithms. It is suboptimal but still much better than IO failures
> >>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
> >>> at my computer.
> >>>
> >>> If this SMR support is considered as non-stable, can we at least get a
> >>> kernel boot (or config) option that disables ZAC?
> >>>
> >> Again: Has anybody actually _tested_ that reverting this patch fixes
> >> this issue?
> > 
> > Yes I tested it.
> > 
> > This error happens only under heavy load with a lot of read/writes
> > (like btrfs rebalance).
> > 
> > With current Linux-4.1.6 'btrfs balance' fails after ~10 minutes after
> > start. I reverted ZAC related changes and then ran rebalancing. The
> > operation finished successfully after 3 hours of running.
> > 
> Can you be a bit more specific about the 'ZAC related changes'?
> There have been several patches, and we really would need to know
> which one was the offending one.
> Can you try to bisect things here?

OK, let's stop shooting the messenger here.  There are multiple reports
of this problem.  The pattern seems to be that some type of error causes
everything to die.

There looks to be an obvious bug in
9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb in that there's no
ATA_DEV_ZAC_UNSUP class, which means that any attempt to disable the
device pushes it up to ATA_DEV_NONE.  I'm not sure ... don't have time
to follow the code ... but doesn't this interfere with the speed-dropping
routines, which seem to disable then re-enable the device?
Does adding ATA_DEV_ZAC_UNSUP fix this problem?  Patch (compile tested
only) below.

James

---

diff --git a/drivers/ata/libata-transport.c b/drivers/ata/libata-transport.c
index d6c37bc..fa83320 100644
--- a/drivers/ata/libata-transport.c
+++ b/drivers/ata/libata-transport.c
@@ -144,6 +144,7 @@ static struct {
 	{ ATA_DEV_SEMB,			"semb" },
 	{ ATA_DEV_SEMB_UNSUP,		"semb" },
 	{ ATA_DEV_ZAC,			"zac" },
+	{ ATA_DEV_ZAC_UNSUP,		"zac" },
 	{ ATA_DEV_NONE,			"none" }
 };
 ata_bitfield_name_search(class, ata_class_names)
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 36ce37b..49c5b98 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -191,7 +191,8 @@ enum {
 	ATA_DEV_SEMB		= 7,	/* SEMB */
 	ATA_DEV_SEMB_UNSUP	= 8,	/* SEMB (unsupported) */
 	ATA_DEV_ZAC		= 9,	/* ZAC device */
-	ATA_DEV_NONE		= 10,	/* no device */
+	ATA_DEV_ZAC_UNSUP	= 10,	/* ZAC (unsupported) */
+	ATA_DEV_NONE		= 11,	/* no device */
 
 	/* struct ata_link flags */
 	ATA_LFLAG_NO_HRST	= (1 << 1), /* avoid hardreset */
@@ -1517,7 +1518,8 @@ static inline unsigned int ata_class_enabled(unsigned int class)
 static inline unsigned int ata_class_disabled(unsigned int class)
 {
 	return class == ATA_DEV_ATA_UNSUP || class == ATA_DEV_ATAPI_UNSUP ||
-		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP;
+		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP ||
+		class == ATA_DEV_ZAC_UNSUP;
 }
 
 static inline unsigned int ata_class_absent(unsigned int class)
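
The numbering problem described above can be modelled in a few lines of
standalone C. This is only a sketch and not part of the patch: it assumes
that disabling a device is done by bumping its class to the next number
(which is what "pushes it up to ATA_DEV_NONE" suggests), it copies only the
class values visible in the diff, and every MODEL_/model_ name is made up
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Class numbers copied from the diff; the names are local to this sketch. */
enum {
	MODEL_SEMB_UNSUP     = 8,
	MODEL_ZAC            = 9,
	MODEL_NONE_UNPATCHED = 10,	/* ATA_DEV_NONE before the patch */
	MODEL_ZAC_UNSUP      = 10,	/* class added by the patch */
	MODEL_NONE_PATCHED   = 11,	/* ATA_DEV_NONE after the patch */
};

/* Assumption: disabling moves an enabled class to the next number. */
static unsigned int model_disable(unsigned int cls)
{
	assert(cls == MODEL_ZAC);	/* the sketch only models the ZAC case */
	return cls + 1;
}

/*
 * Simplified ata_class_disabled(): only the SEMB and ZAC cases are modelled;
 * the ATA, ATAPI and PMP "unsupported" classes are left out.
 */
static bool model_class_disabled(unsigned int cls, bool patched)
{
	return cls == MODEL_SEMB_UNSUP ||
	       (patched && cls == MODEL_ZAC_UNSUP);
}

/* Simplified ata_class_absent(): true when the class means "no device". */
static bool model_class_absent(unsigned int cls, bool patched)
{
	unsigned int none = patched ? MODEL_NONE_PATCHED : MODEL_NONE_UNPATCHED;

	return cls == none;
}
```

With `patched` set to false, `model_disable(MODEL_ZAC)` yields 10, which the
model reports as absent rather than disabled. With `patched` set to true, the
same value is reported as disabled and not absent, which is the state a
later re-enable would presumably need.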




* Re: IO failures with SMR drives at latest kernel versions
  2015-08-26 22:52         ` James Bottomley
@ 2015-08-27  6:18           ` Hannes Reinecke
  0 siblings, 0 replies; 11+ messages in thread
From: Hannes Reinecke @ 2015-08-27  6:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: Anatol Pomozov, Linux SCSI List, hch, Tejun Heo

On 08/27/2015 12:52 AM, James Bottomley wrote:
> On Wed, 2015-08-26 at 08:40 +0200, Hannes Reinecke wrote:
>> On 08/26/2015 06:53 AM, Anatol Pomozov wrote:
>>> Hi
>>>
>>> On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@suse.de> wrote:
>>>>> I looked at this commit and it actually adds SMR support to SCSI
>>>>> layer. Reverting ATA_DEV_ZAC means going back to zones-unaware
>>>>> algorithms. It is suboptimal but still much better than IO failures
>>>>> and "BTRFS: lost page write due to I/O error on /dev/sdc" errors I see
>>>>> at my computer.
>>>>>
>>>>> If this SMR support is considered as non-stable, can we at least get a
>>>>> kernel boot (or config) option that disables ZAC?
>>>>>
>>>> Again: Has anybody actually _tested_ that reverting this patch fixes
>>>> this issue?
>>>
>>> Yes I tested it.
>>>
>>> This error happens only under heavy load with a lot of read/writes
>>> (like btrfs rebalance).
>>>
>>> With current Linux-4.1.6 'btrfs balance' fails after ~10 minutes after
>>> start. I reverted ZAC related changes and then ran rebalancing. The
>>> operation finished successfully after 3 hours of running.
>>>
>> Can you be a bit more specific about the 'ZAC related changes'?
>> There have been several patches, and we really would need to know
>> which one was the offending one.
>> Can you try to bisect things here?
> 
> OK, let's stop shooting the messenger here.  There are multiple reports
> of this problem.  The pattern seems to be some type of error causes
> everything to die.
> 
> There looks to be an obvious bug in
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb in that there's no
> ATA_DEV_ZAC_UNSUP class which means that any attempt to disable the
> device pushes it up to ATA_DEV_NONE.  I'm not sure ... don't have time
> to follow the code ... but doesn't this interfere with the speed
> dropping routines which seems to disable then re-enable the device?
> Does adding ATA_DEV_ZAC_UNSUP fix this problem? patch (compile tested
> only) below.
> 
> James
> 
> ---
> 
> diff --git a/drivers/ata/libata-transport.c b/drivers/ata/libata-transport.c
> index d6c37bc..fa83320 100644
> --- a/drivers/ata/libata-transport.c
> +++ b/drivers/ata/libata-transport.c
> @@ -144,6 +144,7 @@ static struct {
>  	{ ATA_DEV_SEMB,			"semb" },
>  	{ ATA_DEV_SEMB_UNSUP,		"semb" },
>  	{ ATA_DEV_ZAC,			"zac" },
> +	{ ATA_DEV_ZAC_UNSUP,		"zac" },
>  	{ ATA_DEV_NONE,			"none" }
>  };
>  ata_bitfield_name_search(class, ata_class_names)
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 36ce37b..49c5b98 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -191,7 +191,8 @@ enum {
>  	ATA_DEV_SEMB		= 7,	/* SEMB */
>  	ATA_DEV_SEMB_UNSUP	= 8,	/* SEMB (unsupported) */
>  	ATA_DEV_ZAC		= 9,	/* ZAC device */
> -	ATA_DEV_NONE		= 10,	/* no device */
> +	ATA_DEV_ZAC_UNSUP	= 10,	/* ZAC (unsupported) */
> +	ATA_DEV_NONE		= 11,	/* no device */
>  
>  	/* struct ata_link flags */
>  	ATA_LFLAG_NO_HRST	= (1 << 1), /* avoid hardreset */
> @@ -1517,7 +1518,8 @@ static inline unsigned int ata_class_enabled(unsigned int class)
>  static inline unsigned int ata_class_disabled(unsigned int class)
>  {
>  	return class == ATA_DEV_ATA_UNSUP || class == ATA_DEV_ATAPI_UNSUP ||
> -		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP;
> +		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP ||
> +		class == ATA_DEV_ZAC_UNSUP;
>  }
>  
>  static inline unsigned int ata_class_absent(unsigned int class)
> 
> 
Yes, you are correct. Even if this does not fix up this particular
issue it looks like a valid fix.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
