Linux RAID subsystem development
 help / color / mirror / Atom feed
* force remapping a pending sector in sw raid5 array
@ 2018-02-06 18:14 Marc MERLIN
  2018-02-06 18:59 ` Reindl Harald
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-06 18:14 UTC (permalink / raw)
  To: linux-raid

So, I have 2 drives on a 5x6TB array that have respectively 1 and 8
pending sectors in smart.

Currently, I have a check running, but it will take a while...

echo check > /sys/block/md7/md/sync_action 
md7 : active raid5 sdf1[0] sdg1[5] sdd1[3] sdh1[2] sde1[1]
      23441561600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      [==>..................]  check = 10.5% (615972996/5860390400) finish=4822.1min speed=18125K/sec
      bitmap: 3/44 pages [12KB], 65536KB chunk

My understanding is that eventually it will find the bad sectors that can't be read
and rewrite them (triggering block remapping) after reading the remaining 4 drives.

But that may take up to 3 days, just due to how long the check will take and size of the drives
(they are on a SATA port multiplier, so I don't get a lot of speed).

Now, I was trying to see if I could just manually remap the block if I can read it at
least once.
Smart shows:
# 3  Extended offline    Completed: read failure       90%       289         1287409520
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1

So, trying to read the block until it reads ok and gets remapped would be great,
but that didn't work:
myth:/mnt/btrfs_bigbackup/DS2/backup# dd if=/dev/sdh skip=1287409520 count=1 of=- | more  
dd: reading `/dev/sdh': Input/output error  
0+0 records in  
0+0 records out  
0 bytes (0 B) copied, 9.79192 s, 0.0 kB/s  
myth:/mnt/btrfs_bigbackup/DS2/backup# dd if=/dev/sdh skip=1287409520 count=1 of=- | more  
dd: reading `/dev/sdh': Input/output error  
0+0 records in  
0+0 records out  
0 bytes (0 B) copied, 4.54204 s, 0.0 kB/s  
ata5.04: exception Emask 0x0 SAct 0x1c000 SErr 0x0 action 0x0
ata5.04: failed command: READ FPDMA QUEUED
ata5.04: cmd 60/08:80:70:4f:bc/00:00:4c:00:00/40 tag 16 ncq dma 4096 in
         res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
ata5.04: status: { DRDY ERR }
ata5.04: error: { UNC }
ata5.04: configured for UDMA/133
sd 4:4:0:0: [sdh] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 4:4:0:0: [sdh] tag#16 Sense Key : Medium Error [current] 
sd 4:4:0:0: [sdh] tag#16 Add. Sense: Unrecovered read error - auto reallocate failed
sd 4:4:0:0: [sdh] tag#16 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
print_req_error: I/O error, dev sdh, sector 1287409520
Buffer I/O error on dev sdh, logical block 160926190, async page read
ata5: EH complete

That's not unexpected.
However, I can get 
hdparm --read-sector 1287409520 /dev/sdh
to work sometimes.
I've gotten garbage, sometimes 0, and sometimes what seems like good data (and gotten
the same data more than once).


hdparm --read-sector 1287409520 /dev/sdh

/dev/sdh:
reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00
succeeded

6843 7261 6361 6574 2072 6564 6976 6563
3a73 200a 3120 6d20 6d65 200a 3420 2f20
6564 2f76 6376 302f 200a 3420 7420 7974
200a 3420 7420 7974 0a53 2020 2035 642f
7665 742f 7974 200a 3520 2f20 6564 2f76
6f63 736e 6c6f 0a65 2020 2035 642f 7665
702f 6d74 0a78 2020 2036 706c 200a 3720
7620 7363 200a 3031 6d20 7369 0a63 3120
2033 6e69 7570 0a74 3120 2034 6f73 6e75
2f64 696d 6578 0a72 3120 2034 6f73 6e75
2f64 7364 0a70 3120 2034 6f73 6e75 2f64
7561 6964 0a6f 3120 2034 6f73 6e75 2f64
6461 7073 200a 3132 7320 0a67 3220 2039
6266 200a 3138 7620 6469 6f65 6c34 6e69
7875 310a 3631 6120 736c 0a61 3231 2038
7470 0a6d 3331 2036 7470 0a73 3831 2030
7375 0a62 3831 2039 7375 5f62 6564 6976
6563 320a 3632 6420 6d72 320a 3234 6d20
6465 6169 320a 3334 6820 6469 6172 0a77
3432 2034 6966 6572 6977 6572 320a 3534
6e20 6d76 0a65 3432 2036 656d 0a69 3432
2037 7561 0a78 3432 2038 7362 0a67 3432
2039 6177 6374 6468 676f 320a 3035 7220
6374 320a 3135 6420 7861 320a 3235 6420
6d69 636d 6c74 320a 3335 6e20 6364 6c74
320a 3435 6720 6970 636f 6968 0a70 420a
6f6c 6b63 6420 7665 6369 7365 0a3a 2020
2032 6466 200a 3820 7320 0a64 2020 2039
646d 200a 3131 7320 0a72 3620 2035 6473
200a 3636 7320 0a64 3620 2037 6473 200a
3836 7320 0a64 3620 2039 6473 200a 3037
7320 0a64 3720 2031 6473 310a 3832 7320


Should I stick this data into a 512 byte file and write it back with dd in the right place?
(sadly hdparm --write-sector does not seem to take input and just writes 0s instead)

Does that sound like a good plan, or is there another better way to fix my issue?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 18:14 force remapping a pending sector in sw raid5 array Marc MERLIN
@ 2018-02-06 18:59 ` Reindl Harald
  2018-02-06 19:36   ` Marc MERLIN
  2018-02-06 20:03 ` Andreas Klauer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Reindl Harald @ 2018-02-06 18:59 UTC (permalink / raw)
  To: Marc MERLIN, linux-raid



Am 06.02.2018 um 19:14 schrieb Marc MERLIN:
> So, I have 2 drives on a 5x6TB array that have respectively 1 and 8
> pending sectors in smart.
> 
> Currently, I have a check running, but it will take a while...
> 
> echo check > /sys/block/md7/md/sync_action
> md7 : active raid5 sdf1[0] sdg1[5] sdd1[3] sdh1[2] sde1[1]
>        23441561600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>        [==>..................]  check = 10.5% (615972996/5860390400) finish=4822.1min speed=18125K/sec
>        bitmap: 3/44 pages [12KB], 65536KB chunk
> 
> My understanding is that eventually it will find the bad sectors that can't be read
> and rewrite new ones (block remapping) after reading the remaining 4 drives.
> 
> But that may take up to 3 days, just due to how long the check will take and size of the drives
> (they are on a SATA port multiplier, so I don't get a lot of speed)

but 18125K/sec is a joke given that you should run a scrub every week

did you try to play around with sysctl.conf?

adjusting the vars below and running "sysctl -p" should make a difference 
after a few seconds if the hardware is capable of more performance than 
that

dev.raid.speed_limit_min = 25000
dev.raid.speed_limit_max = 1000000

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 18:59 ` Reindl Harald
@ 2018-02-06 19:36   ` Marc MERLIN
  0 siblings, 0 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-06 19:36 UTC (permalink / raw)
  To: Reindl Harald; +Cc: linux-raid

On Tue, Feb 06, 2018 at 07:59:32PM +0100, Reindl Harald wrote:
> but 18125K/sec is a joke given that you should run a scrub every week
 
I know it's bad. Right now it's a bit slower than normal because I'm also
copying data to the drives.

> did you try to play around with sysctl.conf?

Yes, I set it to 300,000, but obviously it won't make the hardware go faster
than it can.

I totally understand the performance is crap, but it's a backup array that I
only bring up and power on once a week and scrub once a month, so it's ok
enough for the use in question.

For now, it's more about me learning how to manually force a block remap,
not because I absolutely have to, but because it's always good to know and
learn low level tools and how things work.

I have used hdrecover in the past, which reads all the blocks at a low level
and re-reads a bad block many times to force a successful read and auto remap,
but sadly it doesn't take a block offset, so it would only work if I let it
run on the whole drive, which would be slow.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 18:14 force remapping a pending sector in sw raid5 array Marc MERLIN
  2018-02-06 18:59 ` Reindl Harald
@ 2018-02-06 20:03 ` Andreas Klauer
  2018-02-06 21:51 ` Adam Goryachev
  2018-02-07  9:42 ` Kay Diederichs
  3 siblings, 0 replies; 32+ messages in thread
From: Andreas Klauer @ 2018-02-06 20:03 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-raid

On Tue, Feb 06, 2018 at 10:14:16AM -0800, Marc MERLIN wrote:
> echo check > /sys/block/md7/md/sync_action 
> md7 : active raid5 sdf1[0] sdg1[5] sdd1[3] sdh1[2] sde1[1]
>       23441561600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>       [==>..................]  check = 10.5% (615972996/5860390400) finish=4822.1min speed=18125K/sec
>       bitmap: 3/44 pages [12KB], 65536KB chunk
> 
> My understanding is that eventually it will find the bad sectors that can't be read
> and rewrite new ones (block remapping) after reading the remaining 4 drives.

You can do selective area checks, with /sys/block/mdX/md/sync_{min,max}.

Documented here:

https://www.kernel.org/doc/html/latest/admin-guide/md.html#md-devices-in-sysfs

Also see if increasing the stripe cache size helps with speeds.
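Concretely, a selective-check window for this thread's array might look like the
sketch below (commands shown commented out since they need root and act on a live
array; the sector numbers are illustrative, and note that sync_min/sync_max are in
512-byte sectors of the array, not the 1K blocks /proc/mdstat displays):

```shell
# Hedged sketch: check only a window of md7 instead of the whole array.
#
#   echo 5148000000 > /sys/block/md7/md/sync_min    # window start (512-byte sectors)
#   echo 5152000000 > /sys/block/md7/md/sync_max    # window end (512-byte sectors)
#   echo check      > /sys/block/md7/md/sync_action
#
#   ... watch /proc/mdstat, then restore the defaults:
#   echo 0   > /sys/block/md7/md/sync_min
#   echo max > /sys/block/md7/md/sync_max
#
# A larger stripe cache (in entries per device; costs memory) can also
# raise check speed:
#   echo 8192 > /sys/block/md7/md/stripe_cache_size
```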

> Should I stick this data into a 512 byte file and write it back with dd in the right place?

Most hard drives have 4K sectors these days. Writing 512 bytes into a bad 
physical 4K sector is probably not a good idea. So at minimum, write 4K 
(aligned). If in doubt, write more...
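For illustration, here is the 512-byte LBA from the SMART report converted to its
4K physical block; the result matches the "logical block 160926190" in the kernel
log earlier in the thread. The overwrite itself is left commented out since it
destroys the sector's contents (a sketch, not a recommendation):

```shell
# The bad LBA reported by SMART, in 512-byte logical sectors:
lba=1287409520
blk4k=$((lba / 8))     # 8 logical sectors per 4K physical sector
ofs=$((lba % 8))       # offset within the 4K sector; 0 means 4K-aligned
echo "$blk4k $ofs"
# A 4K-aligned overwrite would then be (destructive, run only if sure):
# dd if=/dev/zero of=/dev/sdh bs=4096 seek=$blk4k count=1 oflag=direct
```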

However, leaving it to the md check should be the much safer option. 
Easy to make mistakes when messing with drives directly.

> Does that sound like a good plan, or is there another better way to fix my issue?

Is replacing the drive entirely with a new one via mdadm --replace not an option?

Then you could do a full badblocks write test on the removed/faulty drive 
and make a more informed decision as to whether it can be trusted at all.

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 18:14 force remapping a pending sector in sw raid5 array Marc MERLIN
  2018-02-06 18:59 ` Reindl Harald
  2018-02-06 20:03 ` Andreas Klauer
@ 2018-02-06 21:51 ` Adam Goryachev
  2018-02-06 22:02   ` Marc MERLIN
  2018-02-07  4:29   ` Marc MERLIN
  2018-02-07  9:42 ` Kay Diederichs
  3 siblings, 2 replies; 32+ messages in thread
From: Adam Goryachev @ 2018-02-06 21:51 UTC (permalink / raw)
  To: Marc MERLIN, linux-raid

On 07/02/18 05:14, Marc MERLIN wrote:
> So, I have 2 drives on a 5x6TB array that have respectively 1 and 8
> pending sectors in smart.
>
> Currently, I have a check running, but it will take a while...
>
> echo check > /sys/block/md7/md/sync_action
> md7 : active raid5 sdf1[0] sdg1[5] sdd1[3] sdh1[2] sde1[1]
>        23441561600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>        [==>..................]  check = 10.5% (615972996/5860390400) finish=4822.1min speed=18125K/sec
>        bitmap: 3/44 pages [12KB], 65536KB chunk
>
> My understanding is that eventually it will find the bad sectors that can't be read
> and rewrite new ones (block remapping) after reading the remaining 4 drives.
>
> But that may take up to 3 days, just due to how long the check will take and size of the drives
> (they are on a SATA port multiplier, so I don't get a lot of speed).
>
> Now, I was trying to see if I could just manually remap the block if I can read it at
> least once.
> Smart shows:
> # 3  Extended offline    Completed: read failure       90%       289         1287409520
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
>
> So, trying to read the block until it reads ok and gets remapped, would be great
> but that didn't work:
>
> Does that sound like a good plan, or is there another better way to fix my issue?

I think that instead of reading the sector from the drive and relying on the 
drive to determine the correct data (it's already telling you it can't), 
what you need to do is find out where on md7 drive x sector y maps to 
and read that sector from md7, which will get md to (possibly) notice 
the read error, read the data from the other drives, and then 
re-write the faulty sector with the correct calculated data (or do the 
resync on that area of md7 only).

You could probably take a rough guess as follows (note, my math is 
probably totally bogus as I don't really know the physical / logical 
mapping for raid5, but I'm guessing)
You have 5 drives in raid5, and we know one drive's worth of capacity is 
used for parity, so four drives of data. So sector 1287409520 of one drive 
would be approx 4 x sector 1287409520 of the md array.

So try setting something like 1287000000 * 4 as the start of the resync 
up to 1288000000 * 4 and see if that finds and fixes it for you.
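The rough arithmetic above can be sketched as follows (an approximation only: the
real RAID5 mapping also depends on the data offset, chunk size and layout, as the
caveat above says):

```shell
drive_sector=1287409520           # bad 512-byte LBA on the component drive
data_drives=4                     # 5-drive RAID5: one drive's worth is parity
md_sector=$((drive_sector * data_drives))
echo "$md_sector"                 # approximate array sector to search around
```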

If nothing else, it should finish fairly quickly. You might need to 
start earlier, but you could just keep reducing the "window" until you 
find the right spot. Or, someone who knows a lot more about this mapping 
might jump in and answer the question, though they might need to see the 
raid details to see the actual physical layout/order of drives/etc.

Hope that helps anyway....

Regards,
Adam

-- 
Adam Goryachev Website Managers www.websitemanagers.com.au
-- 
The information in this e-mail is confidential and may be legally privileged.
It is intended solely for the addressee. Access to this e-mail by anyone else
is unauthorised. If you are not the intended recipient, any disclosure,
copying, distribution or any action taken or omitted to be taken in reliance
on it, is prohibited and may be unlawful. If you have received this message
in error, please notify us immediately. Please also destroy and delete the
message from your computer. Viruses - Any loss/damage incurred by receiving
this email is not the sender's responsibility.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 21:51 ` Adam Goryachev
@ 2018-02-06 22:02   ` Marc MERLIN
  2018-02-06 22:31     ` Roger Heflin
  2018-02-07  4:29   ` Marc MERLIN
  1 sibling, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2018-02-06 22:02 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: linux-raid

On Wed, Feb 07, 2018 at 08:51:15AM +1100, Adam Goryachev wrote:
> I think instead of reading the sector from the drive and relying on the 
> drive to determine the correct data (it's already telling you it can't). 

Just on that point, it's not that simple. A drive will only try to read the
data a few times before giving up and marking the sector as pending a
re-write with new data (so that it can be re-mapped).
You can however re-read it in different ways and sometimes get the data
back, which _should_ then cause an immediate re-writing of the data on a new
block and turn the pending sector into a reallocated block.
However, this does not seem to have happened on my drive, either because the
bad data didn't really get read by hdparm --read-sector, or because the
firmware isn't doing its remapping job, or something else I don't understand.

> What you need to do is find out where on md7 drive x sector y maps to 
> and read that sector from md7, which will get md to (possibly) notice 
> the read error, and then read the data from the other drives, and then 
> re-write the faulty sector with correct calculated data (or do the 
> resync on that area of md7 only).

Yeah, I got that part.

> So try setting something like 1287000000 * 4 as the start of the resync 
> up to 1288000000 * 4 and see if that finds and fixes it for you.
> 
> If nothing else, it should finish fairly quickly. You might need to 
> start earlier, but you could just keep reducing the "window" until you 
> find the right spot. Or, someone who knows a lot more about this mapping 
> might jump in and answer the question, though they might need to see the 
> raid details to see the actual physical layout/order of drives/etc.

I did however (indeed) miss that I can narrow the check range, so I'll try
playing with that until I can narrow it down to the right bit.

I'm still curious as to why the hdparm bit didn't work, but oh well at this
point.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 22:02   ` Marc MERLIN
@ 2018-02-06 22:31     ` Roger Heflin
  2018-02-06 22:46       ` Marc MERLIN
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Heflin @ 2018-02-06 22:31 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Adam Goryachev, Linux RAID

What kind of drive is it?   I have had good luck getting Seagates to
remap; on my 3TB WD Red drive with bad sectors, the drive does not seem
to remap them as easily.

So far I have a lot of repeat bad sectors, but only 1 has remapped,
even though I have given the drive a lot of chances to remap the
sectors.

On Tue, Feb 6, 2018 at 4:02 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Wed, Feb 07, 2018 at 08:51:15AM +1100, Adam Goryachev wrote:
>> I think instead of reading the sector from the drive and relying on the
>> drive to determine the correct data (it's already telling you it can't).
>
> Just on that point, it's not that simple. A drive will only try to read the
> data a few times before giving up and marking the sector as pending a
> re-write with new data (so that it can be re-mapped).
> You can however re-read it in different ways and sometimes get the data
> back, which _should_ then cause an immediate re-writing of the data on a new
> block and turn the pending into a reallocated block
> However, this does not seem to have happened on my drive, either because the
> bad data didn't really get read by hdparm --read-sector, or because the
> firmware isn't doing its remapping job, or something else I don't understand
>
>> What you need to do is find out where on md7 drive x sector y maps to
>> and read that sector from md7, which will get md to (possibly) notice
>> the read error, and then read the data from the other drives, and then
>> re-write the faulty sector with correct calculated data (or do the
>> resync on that area of md7 only).
>
> Yeah, I got that part.
>
>> So try setting something like 1287000000 * 4 as the start of the resync
>> up to 1288000000 * 4 and see if that finds and fixes it for you.
>>
>> If nothing else, it should finish fairly quickly. You might need to
>> start earlier, but you could just keep reducing the "window" until you
>> find the right spot. Or, someone who knows a lot more about this mapping
>> might jump in and answer the question, though they might need to see the
>> raid details to see the actual physical layout/order of drives/etc.
>
> I did however (indeed) miss that I can narrow the check range, so I'll try
> playing with that until I can narrow it down to the right bit.
>
> I'm still curious as to why the hdparm bit didn't work, but oh well at this
> point.
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 22:31     ` Roger Heflin
@ 2018-02-06 22:46       ` Marc MERLIN
  0 siblings, 0 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-06 22:46 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Adam Goryachev, Linux RAID

On Tue, Feb 06, 2018 at 04:31:58PM -0600, Roger Heflin wrote:
> What kind of drive is it?   I have had good luck getting seagates to
> remap, on my 3tb WD Red drive with bad sectors the drive does not seem
> to remap them as easily.
 
Device Model:     WL6000GSA6457
Serial Number:    WOL240367065
LU WWN Device Id: 5 0014ee 05932b834
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   9
ATA Standard is:  Not recognized. Minor revision code: 0x001f

> So far I have a lot of repeat bad sectors, but only 1 has remapped,
> even thought I am given the drive a lot of chances to remap the
> sectors.

Yeah, it seems that things don't work like they should.

Glad to know that it's not just me, then :)

I'll probably return these drives because that behaviour is not ok, but
at the same time it's interesting to learn about failure cases on data
that I could afford to lose (mostly it's the time lost to re-sync a very
big backup, i.e. 1 to 2 weeks).

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 21:51 ` Adam Goryachev
  2018-02-06 22:02   ` Marc MERLIN
@ 2018-02-07  4:29   ` Marc MERLIN
  1 sibling, 0 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-07  4:29 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: linux-raid

On Wed, Feb 07, 2018 at 08:51:15AM +1100, Adam Goryachev wrote:
> On 07/02/18 05:14, Marc MERLIN wrote:
> > So, I have 2 drives on a 5x6TB array that have respectively 1 and 8
> > pending sectors in smart.
> > 
> > Currently, I have a check running, but it will take a while...
> > 
> > echo check > /sys/block/md7/md/sync_action
> > md7 : active raid5 sdf1[0] sdg1[5] sdd1[3] sdh1[2] sde1[1]
> >        23441561600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
> >        [==>..................]  check = 10.5% (615972996/5860390400) finish=4822.1min speed=18125K/sec
> >        bitmap: 3/44 pages [12KB], 65536KB chunk

So, I'm a bit confused.
First, I had
      [====>................]  check = 22.5% (1321310068/5860390400) finish=3442.7min speed=21973K/sec
and to resume from that mark, I have to 
echo 2642620136 > /sys/block/md7/md/sync_min

In other words, 1321310068 is not the number you feed to sync_min, you
have to double it.
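The unit conversion behind that observation: /proc/mdstat reports positions in
1 KiB blocks, while sync_min/sync_max take 512-byte sectors, so the mdstat figure
is doubled before being written to sync_min:

```shell
mdstat_pos=1321310068            # position from /proc/mdstat, in 1K blocks
sync_min=$((mdstat_pos * 2))     # same position in 512-byte sectors
echo "$sync_min"                 # the value fed to md/sync_min
```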

Then, you said I should take my LBA from 
# 2  Short offline       Completed: read failure       90%       293         1287409520
and multiply it by 4.

Does it really mean I should have used 8?

I used:
1287000000 * 4 = 5148000000
1288000000 * 4 = 5152000000
echo 5144000000 > /sys/block/md7/md/sync_min
echo 5160000000 > /sys/block/md7/md/sync_max

And the sync ran without tripping the bad block.
Worse (kinda), the resync just hung once it reached 5160000000. I had to
force idle to stop it.
For what it's worth, the finish counter is also based on the last block
of the drive, and not the value of sync_max.
Minor bugs/problems?

Ok, so I tried again by doubling the value:
echo 10296000000 > /sys/block/md7/md/sync_min
echo 10304000000 > /sys/block/md7/md/sync_max
echo check > /sys/block/md7/md/sync_action

This does not seem to have helped either. I'm now stuck on:
Personalities : [linear] [raid0] [raid1] [raid10] [multipath] [raid6] [raid5] [raid4]
md7 : active raid5 sdf1[0] sdg1[5] sdd1[3] sdh1[2] sde1[1]
      23441561600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      [=================>...]  check = 87.9% (5152000000/5860390400) finish=1977.2min speed=5970K/sec
      bitmap: 1/44 pages [4KB], 65536KB chunk

Sync has reached max and is hung there, but without triggering the bad
block.

Mmmh, hitting the LBA reported in smart is proving harder than it looked.
I've just reset the check to run over the whole disk and hope it'll hit
the bad block eventually. 

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-06 18:14 force remapping a pending sector in sw raid5 array Marc MERLIN
                   ` (2 preceding siblings ...)
  2018-02-06 21:51 ` Adam Goryachev
@ 2018-02-07  9:42 ` Kay Diederichs
  2018-02-09 19:29   ` Marc MERLIN
  3 siblings, 1 reply; 32+ messages in thread
From: Kay Diederichs @ 2018-02-07  9:42 UTC (permalink / raw)
  To: linux-raid

On 02/06/2018 07:14 PM, Marc MERLIN wrote:
> So, I have 2 drives on a 5x6TB array that have respectively 1 and 8
> pending sectors in smart.
> 
...
> # 3  Extended offline    Completed: read failure       90%       289         1287409520

I have successfully used "badblocks" to overwrite bad sectors.

The variety of badblocks that comes with RHEL (there may be others!)
could be used with e.g.
 badblocks -svnb512 /dev/sdh 1287409599 1287409400
where
-n     Use non-destructive read-write mode.  By default only a
non-destructive read-only test is done.  This option  must  not  be
combined  with  the  -w option, as they are mutually exclusive.

I've adjusted the last-block and first-block numbers in the command
above so that they
a) encompass the known bad blocks
b) start and end on 4k-boundaries

This command leaves those blocks intact that still can be read.

After that, use a destructive-write badblocks e.g.

badblocks -sfvwb512 /dev/sdh <x> <y>
You'll have to adjust x and y to match just those blocks that cannot be
read, based on the output of the first badblocks run.
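A sketch of how such a 4K-aligned window around the suspect LBA could be picked
(the margins here are arbitrary, chosen so the results match the example command
given earlier; note badblocks takes last-block before first-block):

```shell
lba=1287409520                     # suspect 512-byte LBA from SMART
first=$(( (lba/8 - 15) * 8 ))      # 15 4K-blocks of margin before, 4K-aligned
last=$((  (lba/8 + 10) * 8 - 1 ))  # 10 4K-blocks after, inclusive last block
echo "$last $first"
# badblocks -svnb512 /dev/sdh $last $first   # non-destructive first pass
```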

Afterwards, "smartctl -t short /dev/sdh" may clean up the SMART statistics.

HTH,

Kay

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-07  9:42 ` Kay Diederichs
@ 2018-02-09 19:29   ` Marc MERLIN
  2018-02-09 19:57     ` Kay Diederichs
                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-09 19:29 UTC (permalink / raw)
  To: Kay Diederichs, Andreas Klauer, Adam Goryachev, Roger Heflin; +Cc: linux-raid

On Wed, Feb 07, 2018 at 10:42:39AM +0100, Kay Diederichs wrote:
> I've adjusted the last-block and first-block numbers in the command
> above so that they
> a) encompass the known bad blocks
> b) start and end on 4k-boundaries
> 
> This command leaves those blocks intact that still can be read.
> 
> After that, use a destructive-write badblocks e.g.
> 
> badblocks -sfvwb512 /dev/sdh <x> <y>
> You'll have to adjust x and y to match just those blocks that cannot be
> read, based on the output of the first badblocks run.

I will try this next, thanks (still, for learning purposes).

But, I'm confused by what happened. The md check ran to completion.
It found things and supposedly fixed them:
[240351.053406] md/raid:md7: read error corrected (8 sectors at 9159374528 on sdf1)

Strangely, it did nothing with this:
[287271.959779] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed

The full resync/check is here:
[89601.694910] md: data-check of RAID array md7
[240342.514062] ata5.02: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[240342.514073] ata5.02: failed command: READ FPDMA QUEUED
[240342.514081] ata5.02: cmd 60/60:30:70:fc:f0/02:00:21:02:00/40 tag 6 ncq dma 311296 in
[240342.514086] ata5.02: status: { DRDY ERR }
[240342.514089] ata5.02: error: { UNC }
[240342.515351] ata5.02: configured for UDMA/133
[240342.515470] ata5.02: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
[240342.515578] sd 4:2:0:0: [sdf] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[240342.515585] sd 4:2:0:0: [sdf] tag#6 Sense Key : Medium Error [current] 
[240342.515590] sd 4:2:0:0: [sdf] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
[240342.515596] sd 4:2:0:0: [sdf] tag#6 CDB: Read(16) 88 00 00 00 00 02 21 f0 fc 70 00 00 02 60 00 00
[240342.515600] print_req_error: I/O error, dev sdf, sector 9159375984
[240342.515726] ata5: EH complete
[240350.486141] ata5.02: exception Emask 0x0 SAct 0x30 SErr 0x0 action 0x0
[240350.486153] ata5.02: failed command: READ FPDMA QUEUED
[240350.486160] ata5.02: cmd 60/08:20:c0:fe:f0/00:00:21:02:00/40 tag 4 ncq dma 4096 in
[240350.486166] ata5.02: status: { DRDY ERR }
[240350.486169] ata5.02: error: { UNC }
[240350.487403] ata5.02: configured for UDMA/133
[240350.487450] sd 4:2:0:0: [sdf] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[240350.487454] sd 4:2:0:0: [sdf] tag#4 Sense Key : Medium Error [current] 
[240350.487458] sd 4:2:0:0: [sdf] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
[240350.487462] sd 4:2:0:0: [sdf] tag#4 CDB: Read(16) 88 00 00 00 00 02 21 f0 fe c0 00 00 00 08 00 00
[240350.487466] print_req_error: I/O error, dev sdf, sector 9159376576
[240350.487493] ata5: EH complete
[240351.053406] md/raid:md7: read error corrected (8 sectors at 9159374528 on sdf1)
[287271.958430] ata5.04: exception Emask 0x0 SAct 0xffc0 SErr 0x0 action 0x0
[287271.958442] ata5.04: failed command: READ FPDMA QUEUED
[287271.958449] ata5.04: cmd 60/40:30:f0:d7:64/05:00:86:02:00/40 tag 6 ncq dma 688128 in
[287271.958454] ata5.04: status: { DRDY ERR }
[287271.958457] ata5.04: error: { UNC }
[287271.959691] ata5.04: configured for UDMA/133
[287271.959770] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[287271.959775] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current] 
[287271.959779] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
[287271.959783] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 02 86 64 d7 f0 00 00 05 40 00 00
[287271.959785] print_req_error: I/O error, dev sdh, sector 10844690416
[287271.959889] ata5: EH complete
[315132.651910] md: md7: data-check done.

Now, the sync is complete, and my bad blocks are still there?
myth:~# smartctl -A /dev/sdh
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2

myth:~# smartctl -A /dev/sdf
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7

The pending sectors should have been re-written and moved to Reallocated_Event_Count, no?

Reading 
myth:~# hdparm --read-sector 1287409520 /dev/sdh
still gives me what looks like non-garbage data (but it could be), and
[315411.087451] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[315411.087462] ata5.04: failed command: READ SECTOR(S) EXT
[315411.087469] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 0 pio 512 in
[315411.087469]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
[315411.087474] ata5.04: status: { DRDY ERR }
[315411.087478] ata5.04: error: { UNC }
[315411.108028] ata5.04: configured for UDMA/133
[315411.108075] ata5: EH complete

So, mdadm is happy allegedly, but my drives still have the same bad sectors they had
(more or less).

Yes, I know I should trash (return) those drives, but I still want to
understand why I can't get basic block remapping working.
Any idea what went wrong?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 19:29   ` Marc MERLIN
@ 2018-02-09 19:57     ` Kay Diederichs
  2018-02-09 20:02     ` Roger Heflin
  2018-02-09 20:13     ` Phil Turmel
  2 siblings, 0 replies; 32+ messages in thread
From: Kay Diederichs @ 2018-02-09 19:57 UTC (permalink / raw)
  Cc: linux-raid

On 02/09/2018 08:29 PM, Marc MERLIN wrote:
> On Wed, Feb 07, 2018 at 10:42:39AM +0100, Kay Diederichs wrote:
>> I've adjusted the last-block and first-block numbers in the command
>> above so that they
>> a) encompass the known bad blocks
>> b) start and end on 4k-boundaries
>>
>> This command leaves those blocks intact that still can be read.
>>
>> After that, use a destructive-write badblocks e.g.
>>
>> badblocks -sfvwb512 /dev/sdh <x> <y>
>> You'll have to adjust x and y to match just those blocks that cannot be
>> read, based on the output of the first badblocks run.
> 
> I will try this next, thanks (still, for learning purposes).
> 
> But, I'm confused by what happened. The md check ran to completion.
> It found things and supposedly fixed them:
> [240351.053406] md/raid:md7: read error corrected (8 sectors at 9159374528 on sdf1)
> 
> Strangely, it did nothing with this:
> [287271.959779] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
> 
> The full resync/check is here:
> [89601.694910] md: data-check of RAID array md7
> [240342.514062] ata5.02: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
> [240342.514073] ata5.02: failed command: READ FPDMA QUEUED
> [240342.514081] ata5.02: cmd 60/60:30:70:fc:f0/02:00:21:02:00/40 tag 6 ncq dma 311296 in
> [240342.514086] ata5.02: status: { DRDY ERR }
> [240342.514089] ata5.02: error: { UNC }
> [240342.515351] ata5.02: configured for UDMA/133
> [240342.515470] ata5.02: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
> [240342.515578] sd 4:2:0:0: [sdf] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [240342.515585] sd 4:2:0:0: [sdf] tag#6 Sense Key : Medium Error [current] 
> [240342.515590] sd 4:2:0:0: [sdf] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
> [240342.515596] sd 4:2:0:0: [sdf] tag#6 CDB: Read(16) 88 00 00 00 00 02 21 f0 fc 70 00 00 02 60 00 00
> [240342.515600] print_req_error: I/O error, dev sdf, sector 9159375984
> [240342.515726] ata5: EH complete
> [240350.486141] ata5.02: exception Emask 0x0 SAct 0x30 SErr 0x0 action 0x0
> [240350.486153] ata5.02: failed command: READ FPDMA QUEUED
> [240350.486160] ata5.02: cmd 60/08:20:c0:fe:f0/00:00:21:02:00/40 tag 4 ncq dma 4096 in
> [240350.486166] ata5.02: status: { DRDY ERR }
> [240350.486169] ata5.02: error: { UNC }
> [240350.487403] ata5.02: configured for UDMA/133
> [240350.487450] sd 4:2:0:0: [sdf] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [240350.487454] sd 4:2:0:0: [sdf] tag#4 Sense Key : Medium Error [current] 
> [240350.487458] sd 4:2:0:0: [sdf] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
> [240350.487462] sd 4:2:0:0: [sdf] tag#4 CDB: Read(16) 88 00 00 00 00 02 21 f0 fe c0 00 00 00 08 00 00
> [240350.487466] print_req_error: I/O error, dev sdf, sector 9159376576
> [240350.487493] ata5: EH complete
> [240351.053406] md/raid:md7: read error corrected (8 sectors at 9159374528 on sdf1)
> [287271.958430] ata5.04: exception Emask 0x0 SAct 0xffc0 SErr 0x0 action 0x0
> [287271.958442] ata5.04: failed command: READ FPDMA QUEUED
> [287271.958449] ata5.04: cmd 60/40:30:f0:d7:64/05:00:86:02:00/40 tag 6 ncq dma 688128 in
> [287271.958454] ata5.04: status: { DRDY ERR }
> [287271.958457] ata5.04: error: { UNC }
> [287271.959691] ata5.04: configured for UDMA/133
> [287271.959770] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [287271.959775] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current] 
> [287271.959779] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
> [287271.959783] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 02 86 64 d7 f0 00 00 05 40 00 00
> [287271.959785] print_req_error: I/O error, dev sdh, sector 10844690416
> [287271.959889] ata5: EH complete
> [315132.651910] md: md7: data-check done.
> 
> Now the sync is complete, and my bad blocks are still there?
> myth:~# smartctl -A /dev/sdh
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
> 
> myth:~# smartctl -A /dev/sdf
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
> 
> The pending sectors should have been re-written and become Reallocated_Event_Count, no?
> 
> Reading 
> myth:~# hdparm --read-sector 287409520 /dev/sdh
> still gives me what looks like non garbage data (but it could be) and
> [315411.087451] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [315411.087462] ata5.04: failed command: READ SECTOR(S) EXT
> [315411.087469] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 0 pio 512 in
> [315411.087469]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
> [315411.087474] ata5.04: status: { DRDY ERR }
> [315411.087478] ata5.04: error: { UNC }
> [315411.108028] ata5.04: configured for UDMA/133
> [315411.108075] ata5: EH complete
> 
> So, mdadm is happy allegedly, but my drives still have the same bad sectors they had
> (more or less).
> 
> Yes, I know I should trash (return) those drives, but I still want to
> understand why I can't get basic block remapping working
> Any idea what went wrong?
> 
> Thanks,
> Marc
> 

In my experience, drives do not behave the same.
You could try two things:
a) run smartctl -t short (or even -t long) on the drives - that might
update their internal bad block lists
b) run the "badblocks" commands above - possibly more than once
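For concreteness, the two steps might look something like the sketch below. This is hypothetical: DEVICE and BAD_LBA are placeholders taken from earlier in this thread, and the destructive badblocks pass should only ever run on sectors whose data is already unreadable, ideally with the array stopped or the member removed first.

```shell
# Hypothetical sketch of steps a) and b).  DEVICE and BAD_LBA are
# placeholders from this thread; adjust to your drive and to the sector
# smartctl reported.  badblocks -w OVERWRITES data, so stop the array
# (or fail/remove the member) before the destructive pass.
DEVICE=/dev/sdh
BAD_LBA=1287409520

# Widen to a 4k-aligned window (8 x 512-byte sectors) around the bad
# sector, as suggested above:
FIRST=$(( (BAD_LBA / 8 - 1) * 8 ))
LAST=$(( (BAD_LBA / 8 + 2) * 8 - 1 ))
echo "window: sectors $FIRST to $LAST"

# a) ask the drive to refresh its internal bad-sector bookkeeping:
# smartctl -t long "$DEVICE"

# b) destructive write pass over just the unreadable window
#    (note that badblocks takes last-block before first-block):
# badblocks -sfvwb512 "$DEVICE" "$LAST" "$FIRST"
```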

Pls report what you find.

HTH
Kay



* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 19:29   ` Marc MERLIN
  2018-02-09 19:57     ` Kay Diederichs
@ 2018-02-09 20:02     ` Roger Heflin
  2018-02-09 20:13     ` Phil Turmel
  2 siblings, 0 replies; 32+ messages in thread
From: Roger Heflin @ 2018-02-09 20:02 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Kay Diederichs, Andreas Klauer, Adam Goryachev, Linux RAID

I would not count on it with the WDs. I have several; only one has bad
blocks, but some of those blocks have been re-written many times and the
disk firmware still won't relocate them.

On some of mine I can read a block and get a failure, force a rewrite,
and then it fails again on the next read pass a few hours later; the
data gets re-written to the same block, which goes bad again shortly
after.

Whatever the firmware is doing, it either has too high a threshold or is
too stupid to reliably relocate sectors even when they are obviously
bad.

On Fri, Feb 9, 2018 at 1:29 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Wed, Feb 07, 2018 at 10:42:39AM +0100, Kay Diederichs wrote:
>> I've adjusted the last-block and first-block numbers in the command
>> above so that they
>> a) encompass the known bad blocks
>> b) start and end on 4k-boundaries
>>
>> This command leaves those blocks intact that still can be read.
>>
>> After that, use a destructive-write badblocks e.g.
>>
>> badblocks -sfvwb512 /dev/sdh <x> <y>
>> You'll have to adjust x and y to match just those blocks that cannot be
>> read, based on the output of the first badblocks run.
>
> I will try this next, thanks (still, for learning purposes).
>
> But, I'm confused by what happened. The md check ran to completion.
> It found things and supposedly fixed them:
> [240351.053406] md/raid:md7: read error corrected (8 sectors at 9159374528 on sdf1)
>
> Strangely, it did nothing with this:
> [287271.959779] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
>
> The full resync/check is here:
> [89601.694910] md: data-check of RAID array md7
> [240342.514062] ata5.02: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
> [240342.514073] ata5.02: failed command: READ FPDMA QUEUED
> [240342.514081] ata5.02: cmd 60/60:30:70:fc:f0/02:00:21:02:00/40 tag 6 ncq dma 311296 in
> [240342.514086] ata5.02: status: { DRDY ERR }
> [240342.514089] ata5.02: error: { UNC }
> [240342.515351] ata5.02: configured for UDMA/133
> [240342.515470] ata5.02: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
> [240342.515578] sd 4:2:0:0: [sdf] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [240342.515585] sd 4:2:0:0: [sdf] tag#6 Sense Key : Medium Error [current]
> [240342.515590] sd 4:2:0:0: [sdf] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
> [240342.515596] sd 4:2:0:0: [sdf] tag#6 CDB: Read(16) 88 00 00 00 00 02 21 f0 fc 70 00 00 02 60 00 00
> [240342.515600] print_req_error: I/O error, dev sdf, sector 9159375984
> [240342.515726] ata5: EH complete
> [240350.486141] ata5.02: exception Emask 0x0 SAct 0x30 SErr 0x0 action 0x0
> [240350.486153] ata5.02: failed command: READ FPDMA QUEUED
> [240350.486160] ata5.02: cmd 60/08:20:c0:fe:f0/00:00:21:02:00/40 tag 4 ncq dma 4096 in
> [240350.486166] ata5.02: status: { DRDY ERR }
> [240350.486169] ata5.02: error: { UNC }
> [240350.487403] ata5.02: configured for UDMA/133
> [240350.487450] sd 4:2:0:0: [sdf] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [240350.487454] sd 4:2:0:0: [sdf] tag#4 Sense Key : Medium Error [current]
> [240350.487458] sd 4:2:0:0: [sdf] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
> [240350.487462] sd 4:2:0:0: [sdf] tag#4 CDB: Read(16) 88 00 00 00 00 02 21 f0 fe c0 00 00 00 08 00 00
> [240350.487466] print_req_error: I/O error, dev sdf, sector 9159376576
> [240350.487493] ata5: EH complete
> [240351.053406] md/raid:md7: read error corrected (8 sectors at 9159374528 on sdf1)
> [287271.958430] ata5.04: exception Emask 0x0 SAct 0xffc0 SErr 0x0 action 0x0
> [287271.958442] ata5.04: failed command: READ FPDMA QUEUED
> [287271.958449] ata5.04: cmd 60/40:30:f0:d7:64/05:00:86:02:00/40 tag 6 ncq dma 688128 in
> [287271.958454] ata5.04: status: { DRDY ERR }
> [287271.958457] ata5.04: error: { UNC }
> [287271.959691] ata5.04: configured for UDMA/133
> [287271.959770] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [287271.959775] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current]
> [287271.959779] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
> [287271.959783] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 02 86 64 d7 f0 00 00 05 40 00 00
> [287271.959785] print_req_error: I/O error, dev sdh, sector 10844690416
> [287271.959889] ata5: EH complete
> [315132.651910] md: md7: data-check done.
>
> Now the sync is complete, and my bad blocks are still there?
> myth:~# smartctl -A /dev/sdh
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
>
> myth:~# smartctl -A /dev/sdf
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
>
> The pending sectors should have been re-written and become Reallocated_Event_Count, no?
>
> Reading
> myth:~# hdparm --read-sector 287409520 /dev/sdh
> still gives me what looks like non garbage data (but it could be) and
> [315411.087451] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [315411.087462] ata5.04: failed command: READ SECTOR(S) EXT
> [315411.087469] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 0 pio 512 in
> [315411.087469]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
> [315411.087474] ata5.04: status: { DRDY ERR }
> [315411.087478] ata5.04: error: { UNC }
> [315411.108028] ata5.04: configured for UDMA/133
> [315411.108075] ata5: EH complete
>
> So, mdadm is happy allegedly, but my drives still have the same bad sectors they had
> (more or less).
>
> Yes, I know I should trash (return) those drives, but I still want to
> understand why I can't get basic block remapping working
> Any idea what went wrong?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 19:29   ` Marc MERLIN
  2018-02-09 19:57     ` Kay Diederichs
  2018-02-09 20:02     ` Roger Heflin
@ 2018-02-09 20:13     ` Phil Turmel
  2018-02-09 20:29       ` Marc MERLIN
  2018-02-10 21:43       ` Mateusz Korniak
  2 siblings, 2 replies; 32+ messages in thread
From: Phil Turmel @ 2018-02-09 20:13 UTC (permalink / raw)
  To: Marc MERLIN, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin
  Cc: linux-raid

Hi Marc,

On 02/09/2018 02:29 PM, Marc MERLIN wrote:

> But, I'm confused by what happened. The md check ran to completion.
> It found things and supposedly fixed them:
> [240351.053406] md/raid:md7: read error corrected (8 sectors at
> 9159374528 on sdf1)

> Strangely, it did nothing with this:
> [287271.959779] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read
> error - auto reallocate failed

> Now the sync is complete, and my bad blocks are still there?
> myth:~# smartctl -A /dev/sdh
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age
> Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age
> Always       -       2
> 
> myth:~# smartctl -A /dev/sdf
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age
> Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age
> Always       -       7
> 

> The pending sectors should have been re-written and become
> Reallocated_Event_Count, no?

Yes, and not necessarily.  Pending sectors can be non-permanent errors
-- the drive firmware will test a pending sector immediately after write
to see if the write is readable.  If not, it will re-allocate while it
still has the write data in its buffers.  Otherwise, it'll clear the
pending sector.
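That write-then-verify behaviour is why a pending sector can only be cleared from the host side by writing it. A hypothetical sketch of forcing that path per sector follows; the device and sector numbers are placeholders from this thread, and hdparm's --write-sector zeroes the sector, destroying whatever was there.

```shell
# Hypothetical sketch: force the firmware's write-then-verify path by
# rewriting each pending sector.  DEVICE and BAD_SECTORS are
# placeholders from this thread.  --write-sector zeroes the sector
# (data loss!), so only use it on sectors already known unreadable.
DEVICE=/dev/sdh
BAD_SECTORS="1287409520 1287409521 1287409522"

for lba in $BAD_SECTORS; do
    echo "would rewrite sector $lba on $DEVICE"
    # hdparm --yes-i-know-what-i-am-doing --write-sector "$lba" "$DEVICE"
done

# Each sector then either reads back fine (pending count drops) or gets
# reallocated (Reallocated_Event_Count climbs):
# smartctl -A "$DEVICE" | grep -E 'Reallocated|Pending'
```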

> So, mdadm is happy allegedly, but my drives still have the same bad
> sectors they had (more or less).

If you have bad block lists enabled in your array, MD will *never* try
to fix the underlying sectors.  Please show your mdadm -E reports for
these devices.  If necessary, stop the array and re-assemble with the
options to disable bad block lists.  { How this misfeature got into the
kernel and enabled by default baffles me. }
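A rough sketch of checking for, and dropping, the bad-block log follows. The member names /dev/sd[defgh]1 are an assumption based on this thread, and --update=force-no-bbl is the mdadm(8) assembly option that discards even a non-empty list (plain no-bbl only works when the list is empty).

```shell
# Sketch only -- the member names are assumptions from this thread.
MEMBERS="/dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1"

for dev in $MEMBERS; do
    echo "== $dev =="
    # A "Bad Block Log" line in the -E output means the feature is
    # enabled; recorded entries stop MD from rewriting those sectors:
    # mdadm -E "$dev" | grep -i 'bad block'
done

# To disable it, stop the array and re-assemble with the update flag:
# mdadm --stop /dev/md7
# mdadm --assemble /dev/md7 --update=force-no-bbl $MEMBERS
```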

Also, pending sectors that are in dead zones between metadata and array
data will not be accessed by a check scrub, and will therefore persist.

> Yes, I know I should trash (return) those drives,

Well, non-permanent read errors are not considered warranty failures.
They are in the drive specs.  When pending is zero and actual
re-allocations are climbing (my threshold is double digits), *then* it's
time to replace.

> but I still want to understand why I can't get basic block remapping
> working Any idea what went wrong?

Invalid expectations, perhaps.

Phil



* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 20:13     ` Phil Turmel
@ 2018-02-09 20:29       ` Marc MERLIN
  2018-02-09 20:44         ` Phil Turmel
                           ` (2 more replies)
  2018-02-10 21:43       ` Mateusz Korniak
  1 sibling, 3 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-09 20:29 UTC (permalink / raw)
  To: Phil Turmel
  Cc: Kay Diederichs, Andreas Klauer, Adam Goryachev, Roger Heflin,
	linux-raid

On Fri, Feb 09, 2018 at 03:13:26PM -0500, Phil Turmel wrote:
> > The pending sectors should have been re-written and become
> > Reallocated_Event_Count, no?
> 
> Yes, and not necessarily.  Pending sectors can be non-permanent errors
> -- the drive firmware will test a pending sector immediately after write
> to see if the write is readable.  If not, it will re-allocate while it
> still has the write data in its buffers.  Otherwise, it'll clear the
> pending sector.

This shows the sector is still bad though, right? 

myth:~# hdparm --read-sector 1287409520 /dev/sdh
/dev/sdh:
reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 succeeded
7000 0b54 92c4 ffff 0000 0000 01fe 0000
(...)

[ 2572.139404] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 2572.139419] ata5.04: failed command: READ SECTOR(S) EXT
[ 2572.139427] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 28 pio 512 in
[ 2572.139427]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
[ 2572.139431] ata5.04: status: { DRDY ERR }
[ 2572.139435] ata5.04: error: { UNC }
[ 2572.162369] ata5.04: configured for UDMA/133
[ 2572.162414] ata5: EH complete

mdadm also said it found 6 bad sectors and rewrote them (or something like that)
and it's happy. So allegedly it did something, but SMART does not agree (yet?).

I'm now running a long smart test on all drives, will see if numbers change.

Mmmh, and I just ran 
myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
below, and I don't quite understand what's going on.

> > So, mdadm is happy allegedly, but my drives still have the same bad
> > sectors they had (more or less).
> 
> If you have bad block lists enabled in your array, MD will *never* try
> to fix the underlying sectors.  Please show your mdadm -E reports for
> these devices.  If necessary, stop the array and re-assemble with the
> options to disable bad block lists.  { How this misfeature got into the
> kernel and enabled by default baffles me. }

This means I don't have bad block lists?
myth:~# mdadm -E /dev/sdd e f g h all return
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

> Also, pending sectors that are in dead zones between metadata and array
> data will not be accessed by a check scrub, and will therefore persist.
 
That's a good point, but then I would never have discovered those blocks
while initializing the array.

> > Yes, I know I should trash (return) those drives,
> 
> Well, non-permanent read errors are not considered warranty failures.
> They are in the drive specs.  When pending is zero and actual
> re-allocations are climbing (my threshold is double digits), *then* it's
> time to replace.

I think it's worse here. Read errors are not being cleared by block rewrites?
Those are brand "new" (but really remanufactured) drives. 
So far I'm not liking what I'm seeing and I'm very close to just
returning them all and getting some less dodgy ones.

Sad because the last set of 5 I got from a similar source, have worked
beautifully.

Let's see what a full smart scan does.
I may also use hdparm --write-sector to just fill those bad blocks with 0's
now that it seems that mdadm isn't caring about/using them anymore?

Now, badblocks perplexes me even more. Shouldn't -n re-write blocks?

myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
/dev/sdh is apparently in use by the system; badblocks forced anyway.
Checking for bad blocks in non-destructive read-write mode
From block 1287409400 to 1287409599
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: 1287409520ne, 0:14 elapsed. (0/0/0 errors)
1287409521ne, 0:18 elapsed. (1/0/0 errors)
1287409522ne, 0:23 elapsed. (2/0/0 errors)
1287409523ne, 0:27 elapsed. (3/0/0 errors)
1287409524ne, 0:31 elapsed. (4/0/0 errors)
1287409525ne, 0:36 elapsed. (5/0/0 errors)
1287409526ne, 0:40 elapsed. (6/0/0 errors)
1287409527ne, 0:44 elapsed. (7/0/0 errors)
done                                                 
Pass completed, 8 bad blocks found. (8/0/0 errors)

Badblocks found 8 bad blocks, but didn't rewrite them, or failed to, or
succeeded but that did nothing anyway?

Do I understand that
1) badblocks got read errors
2) it's supposed to rewrite the blocks with new data (or not?)
3) auto reallocate failed


[ 3171.717001] ata5.04: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
[ 3171.717012] ata5.04: failed command: READ FPDMA QUEUED 
[ 3171.717019] ata5.04: cmd 60/08:30:70:4f:bc/00:00:4c:00:00/40 tag 6 ncq dma 4096 in
[ 3171.717019]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
[ 3171.717031] ata5.04: status: { DRDY ERR } 
[ 3171.717034] ata5.04: error: { UNC }
[ 3171.718293] ata5.04: configured for UDMA/133
[ 3171.718342] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3171.718349] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current] 
[ 3171.718354] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed 
[ 3171.718360] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
[ 3171.718364] print_req_error: I/O error, dev sdh, sector 1287409520 
[ 3171.718369] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3171.718393] ata5: EH complete
[ 3176.092946] ata5.04: exception Emask 0x0 SAct 0x400000 SErr 0x0 action 0x0
[ 3176.092958] ata5.04: failed command: READ FPDMA QUEUED
[ 3176.092973] ata5.04: cmd 60/08:b0:70:4f:bc/00:00:4c:00:00/40 tag 22 ncq dma 4096 in 
[ 3176.092973]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
[ 3176.092978] ata5.04: status: { DRDY ERR }
[ 3176.092981] ata5.04: error: { UNC } 
[ 3176.094237] ata5.04: configured for UDMA/133
[ 3176.094285] sd 4:4:0:0: [sdh] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
[ 3176.094291] sd 4:4:0:0: [sdh] tag#22 Sense Key : Medium Error [current] 
[ 3176.094296] sd 4:4:0:0: [sdh] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
[ 3176.094302] sd 4:4:0:0: [sdh] tag#22 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00 
[ 3176.094306] print_req_error: I/O error, dev sdh, sector 1287409520
[ 3176.094310] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3176.094324] ata5: EH complete
[ 3180.488899] ata5.04: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
[ 3180.488909] ata5.04: failed command: READ FPDMA QUEUED 
[ 3180.488916] ata5.04: cmd 60/08:40:70:4f:bc/00:00:4c:00:00/40 tag 8 ncq dma 4096 in
[ 3180.488916]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F> 
[ 3180.488928] ata5.04: status: { DRDY ERR }
[ 3180.488931] ata5.04: error: { UNC }
[ 3180.490193] ata5.04: configured for UDMA/133
[ 3180.490243] sd 4:4:0:0: [sdh] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3180.490249] sd 4:4:0:0: [sdh] tag#8 Sense Key : Medium Error [current]  
[ 3180.490254] sd 4:4:0:0: [sdh] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
[ 3180.490259] sd 4:4:0:0: [sdh] tag#8 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
[ 3180.490263] print_req_error: I/O error, dev sdh, sector 1287409520 
[ 3180.490268] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3180.490290] ata5: EH complete 
[ 3184.873146] ata5.04: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
[ 3184.873161] ata5.04: failed command: READ FPDMA QUEUED
[ 3184.873175] ata5.04: cmd 60/08:c0:70:4f:bc/00:00:4c:00:00/40 tag 24 ncq dma 4096 in 
[ 3184.873175]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
[ 3184.873181] ata5.04: status: { DRDY ERR }
[ 3184.873184] ata5.04: error: { UNC }
[ 3184.874437] ata5.04: configured for UDMA/133
[ 3184.874488] sd 4:4:0:0: [sdh] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3184.874495] sd 4:4:0:0: [sdh] tag#24 Sense Key : Medium Error [current] 
[ 3184.874500] sd 4:4:0:0: [sdh] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
[ 3184.874506] sd 4:4:0:0: [sdh] tag#24 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
[ 3184.874510] print_req_error: I/O error, dev sdh, sector 1287409520
[ 3184.874515] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3184.874555] ata5: EH complete

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 20:29       ` Marc MERLIN
@ 2018-02-09 20:44         ` Phil Turmel
  2018-02-09 21:22           ` Marc MERLIN
  2018-02-09 20:52         ` Kay Diederichs
  2018-02-09 21:17         ` Kay Diederichs
  2 siblings, 1 reply; 32+ messages in thread
From: Phil Turmel @ 2018-02-09 20:44 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Kay Diederichs, Andreas Klauer, Adam Goryachev, Roger Heflin,
	linux-raid

On 02/09/2018 03:29 PM, Marc MERLIN wrote:
> On Fri, Feb 09, 2018 at 03:13:26PM -0500, Phil Turmel wrote:
>>> The pending sectors should have been re-written and become
>>> Reallocated_Event_Count, no?
>>
>> Yes, and not necessarily.  Pending sectors can be non-permanent errors
>> -- the drive firmware will test a pending sector immediately after write
>> to see if the write is readable.  If not, it will re-allocate while it
>> still has the write data in its buffers.  Otherwise, it'll clear the
>> pending sector.
> 
> This shows the sector is still bad though, right? 
> 
> myth:~# hdparm --read-sector 1287409520 /dev/sdh
> /dev/sdh:
> reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 succeeded
> 7000 0b54 92c4 ffff 0000 0000 01fe 0000
> (...)
> 
> [ 2572.139404] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 2572.139419] ata5.04: failed command: READ SECTOR(S) EXT
> [ 2572.139427] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 28 pio 512 in
> [ 2572.139427]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
> [ 2572.139431] ata5.04: status: { DRDY ERR }
> [ 2572.139435] ata5.04: error: { UNC }
> [ 2572.162369] ata5.04: configured for UDMA/133
> [ 2572.162414] ata5: EH complete

Yes.  Those sectors are still pending.

> mdadm also said it found 6 bad sectors and rewrote them (or something like that)
> and it's happy. So allegedly it did something, but SMART does not agree (yet?).

Like I said, mdadm "check" won't fix sectors that it has recorded as
bad, and doesn't even look at sectors outside its data area.

> I'm now running a long smart test on all drives, will see if numbers change.

Self tests in the drives don't fix pending sectors, as they don't have
the correct data to write.  That's why they can only be fixed by an
upper layer providing the data (during write).

> Mmmh, and I just ran 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> below, and I don't quite understand what's going on.

I'm not talking about the badblocks command.  I'm talking about the bad
block logging feature of MD.

> This means I don't have bad block lists?
> myth:~# mdadm -E /dev/sdd e f g h all return
> /dev/sdd:
>    MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)

This means nothing.  Please run mdadm -E on the *member devices*.  That
means include the partition number if you are using partitions.  See the
output of mdadm -D /dev/mdX for an array's list of *members*.
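For illustration, extracting the member list from -D output might look like the sketch below. The sample device-table lines in the here-document are fabricated for this example; on the live system, pipe the output of `mdadm -D /dev/md7` instead.

```shell
# Sketch: pull member device paths out of the device table that ends
# `mdadm -D` output, then run `mdadm -E` on each member.  The
# here-document below is fabricated sample output, for illustration.
members=$(awk '$NF ~ /^\/dev\//{print $NF}' <<'EOF'
    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       5       8       97        1      active sync   /dev/sdg1
EOF
)
printf '%s\n' "$members"

# On the real system:
# for m in $(mdadm -D /dev/md7 | awk '$NF ~ /^\/dev\//{print $NF}'); do
#     mdadm -E "$m"
# done
```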

>> Well, non-permanent read errors are not considered warranty failures.
>> They are in the drive specs.  When pending is zero and actual
>> re-allocations are climbing (my threshold is double digits), *then* it's
>> time to replace.
> 
> I think it's worse here. Read errors are not being cleared by block rewrites?
> Those are brand "new" (but really remanufactured) drives. 
> So far I'm not liking what I'm seeing and I'm very close to just
> returning them all and getting some less dodgy ones.

How do you know that these sectors have been re-written?  Let me repeat:
MD will *not* write to blocks that it has recorded as bad in *its* bad
block list, and doesn't even read non-data-area blocks during a check.

> Sad because the last set of 5 I got from a similar source, have worked
> beautifully.

I'm not convinced these drives aren't working beautifully.

> Let's see what a full smart scan does.
> I may also use hdparm --write-sector to just fill those bad blocks with 0's
> now that it seems that mdadm isn't caring about/using them anymore?
> 
> Now, badblocks perplexes me even more. Shouldn't -n re-write blocks?
> 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> /dev/sdh is apparently in use by the system; badblocks forced anyway.

This should have been a hint that you shouldn't be using the badblocks
utility on a running array's devices.

> Badblocks found 8 bad blocks, but didn't rewrite them, or failed to, or
> succeeded but that did nothing anyway?
> 
> Do I understand that
> 1) badblocks got read errors

Yes.

> 2) it's supposed to rewrite the blocks with new data (or not?)

No.

> 3) auto reallocate failed

Don't know.  You haven't provided the information needed to say.

Phil



* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 20:29       ` Marc MERLIN
  2018-02-09 20:44         ` Phil Turmel
@ 2018-02-09 20:52         ` Kay Diederichs
  2018-02-11 20:52           ` Roger Heflin
  2018-02-09 21:17         ` Kay Diederichs
  2 siblings, 1 reply; 32+ messages in thread
From: Kay Diederichs @ 2018-02-09 20:52 UTC (permalink / raw)
  To: Marc MERLIN, Phil Turmel
  Cc: Andreas Klauer, Adam Goryachev, Roger Heflin, linux-raid




Am 09/02/18 um 21:29 schrieb Marc MERLIN:
> On Fri, Feb 09, 2018 at 03:13:26PM -0500, Phil Turmel wrote:
>>> The pending sectors should have been re-written and become
>>> Reallocated_Event_Count, no?
>>
>> Yes, and not necessarily.  Pending sectors can be non-permanent errors
>> -- the drive firmware will test a pending sector immediately after write
>> to see if the write is readable.  If not, it will re-allocate while it
>> still has the write data in its buffers.  Otherwise, it'll clear the
>> pending sector.
> 
> This shows the sector is still bad though, right? 
> 
> myth:~# hdparm --read-sector 1287409520 /dev/sdh
> /dev/sdh:
> reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 succeeded
> 7000 0b54 92c4 ffff 0000 0000 01fe 0000
> (...)
> 
> [ 2572.139404] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 2572.139419] ata5.04: failed command: READ SECTOR(S) EXT
> [ 2572.139427] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 28 pio 512 in
> [ 2572.139427]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
> [ 2572.139431] ata5.04: status: { DRDY ERR }
> [ 2572.139435] ata5.04: error: { UNC }
> [ 2572.162369] ata5.04: configured for UDMA/133
> [ 2572.162414] ata5: EH complete
> 
> mdadm also said it found 6 bad sectors and rewrote them (or something like that)
> and it's happy. So allegedly it did something, but smart does not agree (yet?)
> 
> I'm now running a long smart test on all drives, will see if numbers change.
> 
> Mmmh, and I just ran 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> below, and I don't quite understand what's going on.
> 
>>> So, mdadm is happy allegedly, but my drives still have the same bad
>>> sectors they had (more or less).
>>
>> If you have bad block lists enabled in your array, MD will *never* try
>> to fix the underlying sectors.  Please show your mdadm -E reports for
>> these devices.  If necessary, stop the array and re-assemble with the
>> options to disable bad block lists.  { How this misfeature got into the
>> kernel and enabled by default baffles me. }
> 
> This means I don't have bad block lists?
> myth:~# mdadm -E /dev/sdd e f g h all return
> /dev/sdd:
>    MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> 
>> Also, pending sectors that are in dead zones between metadata and array
>> data will not be accessed by a check scrub, and will therefore persist.
>  
> That's a good point, but then I would never have discovered those blocks
> while initializing the array.
> 
>>> Yes, I know I should trash (return) those drives,
>>
>> Well, non-permanent read errors are not considered warranty failures.
>> They are in the drive specs.  When pending is zero and actual
>> re-allocations are climbing (my threshold is double digits), *then* it's
>> time to replace.
> 
> I think it's worse here. Read errors are not being cleared by block rewrites?
> Those are brand "new" (but really remanufactured) drives. 
> So far I'm not liking what I'm seeing and I'm very close to just
> returning them all and getting some less dodgy ones.
> 
> Sad because the last set of 5 I got from a similar source, have worked
> beautifully.
> 
> Let's see what a full smart scan does.
> I may also use hdparm --write-sector to just fill those bad blocks with 0's
> now that it seems that mdadm isn't caring about/using them anymore?
> 
> Now, badblocks perplexes me even more. Shouldn't -n re-write blocks?
> 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> /dev/sdh is apparently in use by the system; badblocks forced anyway.
> Checking for bad blocks in non-destructive read-write mode
> From block 1287409400 to 1287409599
> Checking for bad blocks (non-destructive read-write test)
> Testing with random pattern: 1287409520ne, 0:14 elapsed. (0/0/0 errors)
> 1287409521ne, 0:18 elapsed. (1/0/0 errors)
> 1287409522ne, 0:23 elapsed. (2/0/0 errors)
> 1287409523ne, 0:27 elapsed. (3/0/0 errors)
> 1287409524ne, 0:31 elapsed. (4/0/0 errors)
> 1287409525ne, 0:36 elapsed. (5/0/0 errors)
> 1287409526ne, 0:40 elapsed. (6/0/0 errors)
> 1287409527ne, 0:44 elapsed. (7/0/0 errors)
> done                                                 
> Pass completed, 8 bad blocks found. (8/0/0 errors)
> 
> Badblocks found 8 bad blocks, but didn't rewrite them, or failed to, or
> succeeded but that did nothing anyway?
> 
> Do I understand that
> 1) badblocks got read errors
> 2) it's supposed to rewrite the blocks with new data (or not?)
> 3) auto reallocate failed
> 
> 
> [ 3171.717001] ata5.04: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
> [ 3171.717012] ata5.04: failed command: READ FPDMA QUEUED 
> [ 3171.717019] ata5.04: cmd 60/08:30:70:4f:bc/00:00:4c:00:00/40 tag 6 ncq dma 4096 in
> [ 3171.717019]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3171.717031] ata5.04: status: { DRDY ERR } 
> [ 3171.717034] ata5.04: error: { UNC }
> [ 3171.718293] ata5.04: configured for UDMA/133
> [ 3171.718342] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3171.718349] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current] 
> [ 3171.718354] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed 
> [ 3171.718360] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3171.718364] print_req_error: I/O error, dev sdh, sector 1287409520 
> [ 3171.718369] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3171.718393] ata5: EH complete
> [ 3176.092946] ata5.04: exception Emask 0x0 SAct 0x400000 SErr 0x0 action 0x0
> [ 3176.092958] ata5.04: failed command: READ FPDMA QUEUED
> [ 3176.092973] ata5.04: cmd 60/08:b0:70:4f:bc/00:00:4c:00:00/40 tag 22 ncq dma 4096 in 
> [ 3176.092973]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3176.092978] ata5.04: status: { DRDY ERR }
> [ 3176.092981] ata5.04: error: { UNC } 
> [ 3176.094237] ata5.04: configured for UDMA/133
> [ 3176.094285] sd 4:4:0:0: [sdh] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
> [ 3176.094291] sd 4:4:0:0: [sdh] tag#22 Sense Key : Medium Error [current] 
> [ 3176.094296] sd 4:4:0:0: [sdh] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3176.094302] sd 4:4:0:0: [sdh] tag#22 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00 
> [ 3176.094306] print_req_error: I/O error, dev sdh, sector 1287409520
> [ 3176.094310] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3176.094324] ata5: EH complete
> [ 3180.488899] ata5.04: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
> [ 3180.488909] ata5.04: failed command: READ FPDMA QUEUED 
> [ 3180.488916] ata5.04: cmd 60/08:40:70:4f:bc/00:00:4c:00:00/40 tag 8 ncq dma 4096 in
> [ 3180.488916]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F> 
> [ 3180.488928] ata5.04: status: { DRDY ERR }
> [ 3180.488931] ata5.04: error: { UNC }
> [ 3180.490193] ata5.04: configured for UDMA/133
> [ 3180.490243] sd 4:4:0:0: [sdh] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3180.490249] sd 4:4:0:0: [sdh] tag#8 Sense Key : Medium Error [current]  
> [ 3180.490254] sd 4:4:0:0: [sdh] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3180.490259] sd 4:4:0:0: [sdh] tag#8 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3180.490263] print_req_error: I/O error, dev sdh, sector 1287409520 
> [ 3180.490268] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3180.490290] ata5: EH complete 
> [ 3184.873146] ata5.04: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
> [ 3184.873161] ata5.04: failed command: READ FPDMA QUEUED
> [ 3184.873175] ata5.04: cmd 60/08:c0:70:4f:bc/00:00:4c:00:00/40 tag 24 ncq dma 4096 in 
> [ 3184.873175]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3184.873181] ata5.04: status: { DRDY ERR }
> [ 3184.873184] ata5.04: error: { UNC }
> [ 3184.874437] ata5.04: configured for UDMA/133
> [ 3184.874488] sd 4:4:0:0: [sdh] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3184.874495] sd 4:4:0:0: [sdh] tag#24 Sense Key : Medium Error [current] 
> [ 3184.874500] sd 4:4:0:0: [sdh] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3184.874506] sd 4:4:0:0: [sdh] tag#24 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3184.874510] print_req_error: I/O error, dev sdh, sector 1287409520
> [ 3184.874515] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3184.874555] ata5: EH complete
> 

What you write about the result of
badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
is the expected behavior. -n means that it will _not_ write sectors that
it cannot read (because that would remove the possibility that data from
these sectors could be recovered by more tries).

As I wrote, you have to use the -w option instead of -n, and use x and y
of 1287409527 1287409520
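
For the record, a hedged sketch of that destructive rewrite, using the device
and LBA range from this thread. -w overwrites the sectors with test patterns,
so the range must be right; it is left as a dry run here on purpose:

```shell
# Destructive rewrite of the 8 pending sectors with badblocks -w.
# Note: badblocks takes its range as LAST FIRST, hence the "reversed" order.
DEV=/dev/sdh
FIRST=1287409520
LAST=1287409527
# Dry run: drop the leading 'echo' to actually overwrite the sectors.
echo badblocks -fsvwb512 "$DEV" "$LAST" "$FIRST"
```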

HTH
Kay



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5049 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 20:29       ` Marc MERLIN
  2018-02-09 20:44         ` Phil Turmel
  2018-02-09 20:52         ` Kay Diederichs
@ 2018-02-09 21:17         ` Kay Diederichs
  2 siblings, 0 replies; 32+ messages in thread
From: Kay Diederichs @ 2018-02-09 21:17 UTC (permalink / raw)
  To: Marc MERLIN, Phil Turmel
  Cc: Andreas Klauer, Adam Goryachev, Roger Heflin, linux-raid

On 02/09/2018 09:29 PM, Marc MERLIN wrote:

...
> 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> /dev/sdh is apparently in use by the system; badblocks forced anyway.
> Checking for bad blocks in non-destructive read-write mode

badblocks gives a warning that /dev/sdh is in use. You should not use it
that way (needing the -f option), because essentially you are messing
with the drive behind md's back.
Remove /dev/sdh from the array before you use badblocks, or hdparm, or
dd or the like on a member device.
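
A sketch of that sequence, assuming the array and member names from this
thread (kept as a dry run so nothing is touched by accident):

```shell
# Take the member out of the array before any low-level poking, then
# give it back. 'run' only prints; change its body to "$@" to execute.
MD=/dev/md7
PART=/dev/sdh1
run() { echo "$@"; }

run mdadm "$MD" --fail "$PART"
run mdadm "$MD" --remove "$PART"
# ... badblocks / hdparm / dd on the raw disk goes here ...
run mdadm "$MD" --re-add "$PART"   # with a bitmap, the resync is incremental
```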

Kay


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 20:44         ` Phil Turmel
@ 2018-02-09 21:22           ` Marc MERLIN
  2018-02-09 22:07             ` Wol's lists
  0 siblings, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2018-02-09 21:22 UTC (permalink / raw)
  To: Phil Turmel, Kay Diederichs
  Cc: Andreas Klauer, Adam Goryachev, Roger Heflin, linux-raid

On Fri, Feb 09, 2018 at 03:44:56PM -0500, Phil Turmel wrote:
> > myth:~# mdadm -E /dev/sdd e f g h all return
> > /dev/sdd:
> >    MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> 
> This means nothing.  Please run mdadm -E on the *member devices*.  That
> means include the partition number if you are using partitions.  See the
> output of mdadm -D /dev/mdX for an array's list of *members*.
 
Ooops, I knew better, sorry about that (I use --examine usually)

As you guessed, there it is:
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

So it knows about the bad blocks, skips over them during check/rewrite and
that's why they never got rewritten.
I can see why this could be helpful in some way, but yeah, that confused me
until now. Thanks for pointing that out to me.
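
For anyone following along, the recorded entries can be listed directly; a
sketch, assuming the member names from this thread (dry run):

```shell
# List md's per-member bad-block log. --examine-badblocks prints the
# recorded sector ranges; the leading 'echo' keeps this a dry run.
members="/dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1"
for part in $members; do
    echo mdadm --examine-badblocks "$part"
done
# On a running array the same list is exported via sysfs, e.g.:
#   cat /sys/block/md7/md/dev-sdh1/bad_blocks
```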

> > I think it's worse here. Read errors are not being cleared by block rewrites?
> > Those are brand "new" (but really remanufactured) drives. 
> > So far I'm not liking what I'm seeing and I'm very close to just
> > returning them all and getting some less dodgy ones.
> 
> How do you know that these sectors have been re-written?  Let me repeat:
> MD will *not* write to blocks that it has recorded as bad in *its* bad
> block list, and doesn't even read non-data-area blocks during a check.
 
Right, got it.

> > Sad because the last set of 5 I got from a similar source, have worked
> > beautifully.
> 
> I'm not convinced these drives aren't working beautifully.

Would you say it's acceptable for a drive nowadays to come with pending sectors 
as soon as you use it?
Yes, I understand I can get them re-allocated, and that once too many get reallocated, 
things get incrementally bad, but my bar so far has been that by the time a drive
is starting to re-allocate sectors, I should start watching it closely.
If it does this out of the box, then it shouldn't have passed QA and been shipped
to me in the first place.
Maybe it's like the question of how many dead pixels are acceptable on a 4K LCD?

> > myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> > /dev/sdh is apparently in use by the system; badblocks forced anyway.
> 
> This should have been a hint that you shouldn't be using the badblocks
> utility on a running array's devices.
 
I knew I was doing that, we already established that those blocks are not being
used by the array itself because they're in the md bad block skip list, no?
But ok, point taken, bad practice, I'll stop the array first next time.

On Fri, Feb 09, 2018 at 09:52:38PM +0100, Kay Diederichs wrote:
> > From block 1287409400 to 1287409599
> > Checking for bad blocks (non-destructive read-write test)
> > Testing with random pattern: 1287409520ne, 0:14 elapsed. (0/0/0 errors)
> > 1287409521ne, 0:18 elapsed. (1/0/0 errors)
> > 1287409522ne, 0:23 elapsed. (2/0/0 errors)
> > 1287409523ne, 0:27 elapsed. (3/0/0 errors)
> > 1287409524ne, 0:31 elapsed. (4/0/0 errors)
> > 1287409525ne, 0:36 elapsed. (5/0/0 errors)
> > 1287409526ne, 0:40 elapsed. (6/0/0 errors)
> > 1287409527ne, 0:44 elapsed. (7/0/0 errors)
> > done                                                 
> > Pass completed, 8 bad blocks found. (8/0/0 errors)
> 
> What you write about the result of
> badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> is the expected behavior. -n means that it will _not_ write sectors that
> it cannot read (because that would remove the possibility that data from
> these sectors could be recovered by more tries).
> 
> As I wrote, you have to use the -w option instead of -n, and use x and y
> of 1287409527 1287409520

Right. Just had a very short night, so I'm not doing my best thinking right now :)

myth:~# badblocks -fsvwb512 /dev/sdh 1287409527 1287409520
/dev/sdh is apparently in use by the system; badblocks forced anyway.
Checking for bad blocks in read-write mode
From block 1287409520 to 1287409527
Testing with pattern 0xaa: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0x55: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0xff: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0x00: done                                                 
Reading and comparing: done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

I'm a bit confused as to why badblocks takes its range in reverse sector
order (the arguments are last-block then first-block), but it worked.

Before:
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                               
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2    
After:
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1

So, that fixed one sector, and somehow the drive decided it didn't need to be re-allocated.

Interesting. I figured once a sector went pending, it would not actually be re-used 
but would be remapped on the next write. Seems like that didn't happen here.

Either way, thanks all for your help, let me poke at it a bit more.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 21:22           ` Marc MERLIN
@ 2018-02-09 22:07             ` Wol's lists
  2018-02-09 22:36               ` Marc MERLIN
  0 siblings, 1 reply; 32+ messages in thread
From: Wol's lists @ 2018-02-09 22:07 UTC (permalink / raw)
  To: Marc MERLIN, Phil Turmel, Kay Diederichs
  Cc: Andreas Klauer, Adam Goryachev, Roger Heflin, linux-raid

On 09/02/18 21:22, Marc MERLIN wrote:
> Interesting. I figured once a sector went pending once, it would not actually be re-used and
> be remapped on the next write. Seems like it didn't happen here.

Because there's all sorts of reasons a sector can go pending.

My favourite example is to compare it to DRAM. DRAM needs refreshing 
every couple of seconds, otherwise it loses its contents and cannot be 
read, but it's perfectly okay to rewrite and re-use it.

Likewise, the magnetism in a drive can decay such that the data is 
unreadable, but there's nothing actually wrong with the drive. (If the 
data next door is repeatedly rewritten, the rewrite can "leak" and trash 
nearby data ...) The decay time for that should be years.

The problem of course is when the problem has a decay time measured in 
minutes or hours. The rewrite succeeds, so the sector doesn't get 
remapped, but when you next read it it has died :-(

Cheers,
Wol

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 22:07             ` Wol's lists
@ 2018-02-09 22:36               ` Marc MERLIN
  0 siblings, 0 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-09 22:36 UTC (permalink / raw)
  To: Wol's lists
  Cc: Phil Turmel, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On Fri, Feb 09, 2018 at 10:07:57PM +0000, Wol's lists wrote:
> On 09/02/18 21:22, Marc MERLIN wrote:
> >Interesting. I figured once a sector went pending once, it would not 
> >actually be re-used and
> >be remapped on the next write. Seems like it didn't happen here.
> 
> Because there's all sorts of reasons a sector can go pending.
> 
> My favourite example is to compare it to DRAM. DRAM needs refreshing 
> every couple of seconds, otherwise it loses its contents and cannot be 
> read, but it's perfectly okay to rewrite and re-use it.
 
You're correct. The density of drives is so high now that writing a block
affects the ones around it.

> Likewise, the magnetism in a drive can decay such that the data is 
> unreadable, but there's nothing actually wrong with the drive. (If the 
> data next door is repeatedly rewritten, the rewrite can "leak" and trash 
> nearby data ...) The decay time for that should be years.

Right. That's why I'm unhappy that it happened within a week of unpacking
the drives and 2 out of 5 had problems already.

> The problem of course is when the problem has a decay time measured in 
> minutes or hours. The rewrite succeeds, so the sector doesn't get 
> remapped, but when you next read it it has died :-(

Speaking of this, I still haven't gotten the drive to actually remap
anything yet.
On that 2nd drive, I'm seeing 7 pending sectors, and can't trigger any error
or remapping on them:
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always -       7

# 1  Short offline       Completed: read failure       90%       519         569442000
# 2  Short offline       Completed: read failure       90%       519         569442000
# 3  Extended offline    Completed: read failure       90%       518         569442000
# 4  Short offline       Completed without error       00%       508         -
# 5  Short offline       Completed without error       00%       484         -
# 6  Short offline       Completed without error       00%       460         -
# 7  Short offline       Completed without error       00%       436         -
# 8  Short offline       Completed: read failure       90%       413         569441985
# 9  Extended offline    Completed: read failure       90%       409         569441990
#10  Extended offline    Completed: read failure       90%       409         569441985
#11  Extended offline    Completed: read failure       90%       409         569441991
#12  Extended offline    Completed: read failure       90%       409         569441985

So, running badblocks over that range should help, right?

But no, I get nothing:
myth:~# badblocks -fsvn -b512 /dev/sdf  569942000 569001000
/dev/sdf is apparently in use by the system; badblocks forced anyway.
Checking for bad blocks in non-destructive read-write mode
From block 569001000 to 569942000
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

In other words, unless I'm reading the wrong blocks, that would mean the blocks are good again?

But smart still shows 
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7

and a short offline test immediately shows
# 1  Short offline       Completed: read failure       90%       519         569442000

Clearly, I still have some things to learn.
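
One thing worth trying here (a sketch using the values from this thread):
read exactly the LBA that SMART reports, with O_DIRECT, so the page cache
cannot satisfy the read and the drive is really hit:

```shell
# Hit the exact sector from the self-test log, bypassing the page cache.
DEV=/dev/sdf
LBA=569442000    # LBA_of_first_error from the SMART self-test log
# Dry run: drop the leading 'echo' to issue the read for real.
echo dd if="$DEV" of=/dev/null bs=512 skip="$LBA" count=1 iflag=direct
```

An O_DIRECT failure here, while a buffered scan passes, would point at
caching rather than a healed sector.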

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 20:13     ` Phil Turmel
  2018-02-09 20:29       ` Marc MERLIN
@ 2018-02-10 21:43       ` Mateusz Korniak
  2018-02-11 15:41         ` Marc MERLIN
  2018-02-11 17:13         ` Phil Turmel
  1 sibling, 2 replies; 32+ messages in thread
From: Mateusz Korniak @ 2018-02-10 21:43 UTC (permalink / raw)
  To: Phil Turmel
  Cc: Marc MERLIN, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On Friday 09 of February 2018 15:13:26 Phil Turmel wrote:
> If you have bad block lists enabled in your array, MD will *never* try
> to fix the underlying sectors

As far as I was able to find, a failed write marks the sector in the BBL. 
Is the data then saved to a different location for later reads when such a 
write fails?
Does a failed read mark the sector in the BBL too? 

I am surprised to notice that I have plenty of sectors in the BBL on some 
arrays that SMART reports to be quite healthy, with all members passing 
short/long SMART tests ...  
 
-- 
Mateusz Korniak
"(...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite, a pious type,
 	in short - a pillar of society."
 				Nikos Kazantzakis - "Zorba the Greek"


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-10 21:43       ` Mateusz Korniak
@ 2018-02-11 15:41         ` Marc MERLIN
  2018-02-11 16:41           ` Marc MERLIN
  2018-02-11 17:13         ` Phil Turmel
  1 sibling, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2018-02-11 15:41 UTC (permalink / raw)
  To: Mateusz Korniak
  Cc: Phil Turmel, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

As a last update on those drives, sadly they seem to have real problems
with SMART, which is why I was confused when using them.

myth:~# badblocks -fsvn -b512 /dev/sdf
/dev/sdf is apparently in use by the system; badblocks forced anyway.
Checking for bad blocks in non-destructive read-write mode
From block 0 to 3131110575
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

That means a full read/write scan ran ok.
Yet:
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7

7 sectors still marked as pending. This makes no sense...

As far as I can tell, SMART is just broken on those drives, and they're
going back to where I got them from.

Thanks all for the replies and helping me confirm this.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-11 15:41         ` Marc MERLIN
@ 2018-02-11 16:41           ` Marc MERLIN
  0 siblings, 0 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-11 16:41 UTC (permalink / raw)
  To: Mateusz Korniak
  Cc: Phil Turmel, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On Sun, Feb 11, 2018 at 07:41:58AM -0800, Marc MERLIN wrote:
> As a last update on those drives, sadly they seem to have real problems
> with SMART, which is why I was confused when using them.
> 
> myth:~# badblocks -fsvn -b512 /dev/sdf
> /dev/sdf is apparently in use by the system; badblocks forced anyway.
> Checking for bad blocks in non-destructive read-write mode
> From block 0 to 3131110575
> Checking for bad blocks (non-destructive read-write test)
> Testing with random pattern: done                                                 
> Pass completed, 0 bad blocks found. (0/0/0 errors)
> 
> That means a full read/write scan ran ok.
> Yet:
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
> 
> 7 sectors still marked as pending. This makes no sense...
 
And it gets "better": I just re-ran a long self-test, and still got:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       561         569442000

So, the disk sees bad blocks, SMART says there are bad blocks, and yet
badblocks, run over the entire drive in read/write mode, no longer finds
anything.

Anyway, those drives are going back in the box and into the mail tomorrow,
but that sure is/was weird...

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-10 21:43       ` Mateusz Korniak
  2018-02-11 15:41         ` Marc MERLIN
@ 2018-02-11 17:13         ` Phil Turmel
  2018-02-11 18:02           ` Wols Lists
  2018-02-12 10:43           ` Mateusz Korniak
  1 sibling, 2 replies; 32+ messages in thread
From: Phil Turmel @ 2018-02-11 17:13 UTC (permalink / raw)
  To: Mateusz Korniak
  Cc: Marc MERLIN, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On 02/10/2018 04:43 PM, Mateusz Korniak wrote:
> On Friday 09 of February 2018 15:13:26 Phil Turmel wrote:
>> If you have bad block lists enabled in your array, MD will *never* try
>> to fix the underlying sectors
> 
> As far I was able to find, failed write marks sector in BBL. 
> Does data is saved under different location when such write fails for later 
> reads?

No.  That is why this is a misfeature that should never have been turned
on by default.

Phil

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-11 17:13         ` Phil Turmel
@ 2018-02-11 18:02           ` Wols Lists
  2018-02-12 10:43           ` Mateusz Korniak
  1 sibling, 0 replies; 32+ messages in thread
From: Wols Lists @ 2018-02-11 18:02 UTC (permalink / raw)
  To: Phil Turmel, Mateusz Korniak
  Cc: Marc MERLIN, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On 11/02/18 17:13, Phil Turmel wrote:
> On 02/10/2018 04:43 PM, Mateusz Korniak wrote:
>> On Friday 09 of February 2018 15:13:26 Phil Turmel wrote:
>>> If you have bad block lists enabled in your array, MD will *never* try
>>> to fix the underlying sectors

I've just been reading the man pages. This is exactly what IS supposed
to happen (that is, MD is *supposed* to fix the underlying sectors).
>>
>> As far I was able to find, failed write marks sector in BBL. 
>> Does data is saved under different location when such write fails for later 
>> reads?
> 
> No.  That is why this is a misfeature that should never have been turned
> on by default.
> 
I'm not going to argue about whether the feature should or should not
have been turned on - I think the reality is that the feature is
confused, and almost certainly buggy as a result, but imho it is a
feature that *should* be enabled - by default - if only it worked :-(

For a normal, properly functioning array, bad-blocks should be both
enabled, AND EMPTY. That it has entries you can't get rid of implies
it's buggy, as far as I can tell.
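
As a sanity check, a running array's lists can be confirmed empty from
sysfs; a sketch, assuming the md7 array from this thread:

```shell
# Print each member's bad-block list; empty output means an empty list.
sysdir=/sys/block/md7/md
for f in "$sysdir"/dev-*/bad_blocks; do
    [ -e "$f" ] || continue   # glob unmatched: array not assembled here
    echo "== $f =="
    cat "$f"
done
```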

Cheers,
Wol


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-09 20:52         ` Kay Diederichs
@ 2018-02-11 20:52           ` Roger Heflin
  0 siblings, 0 replies; 32+ messages in thread
From: Roger Heflin @ 2018-02-11 20:52 UTC (permalink / raw)
  To: Kay Diederichs
  Cc: Marc MERLIN, Phil Turmel, Andreas Klauer, Adam Goryachev,
	Linux RAID

On my WD I have some code that does an hdparm --read-sector and, if
that fails, an hdparm --write-sector back, once I know the bad
sectors and where they are.  I started this after I got the drive
removed from mdraid, and I have had --write-sector successfully
write and verify the data and then, minutes or hours later, had
--read-sector on that block fail again.

So from what I can tell my drive is useless trash because the
firmware is not able to decide reasonably that the sector is not
recoverable; luckily this drive is still under warranty.

I have several other 3TB WD Reds, several of which are out of warranty
and have no issues as far as I can tell.  That is at least better luck
than I had with my 1.5TB drives (about 80% of the drives had replaced
bad sectors and run out of spares, or were otherwise useless because of
randomly succeeding at reading bad blocks while pausing everything for
just under the 7-second timeout).
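
The loop described above would look roughly like this (a sketch; the device
and LBAs are hypothetical examples, and it should only ever be run on a
drive already removed from the array):

```shell
# Read each suspect sector; on failure, rewrite it and re-check.
DEV=/dev/sdh
run() { echo "$@"; }   # dry run; change the body to "$@" to execute

for lba in 1287409520 1287409521 1287409522; do
    if ! run hdparm --read-sector "$lba" "$DEV" >/dev/null; then
        run hdparm --write-sector "$lba" --yes-i-know-what-i-am-doing "$DEV"
        run hdparm --read-sector "$lba" "$DEV" >/dev/null \
            || echo "sector $lba still unreadable after rewrite"
    fi
done
```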

On Fri, Feb 9, 2018 at 2:52 PM, Kay Diederichs
<kay.diederichs@uni-konstanz.de> wrote:
>
>
> Am 09/02/18 um 21:29 schrieb Marc MERLIN:
>> On Fri, Feb 09, 2018 at 03:13:26PM -0500, Phil Turmel wrote:
>>>> The pending sectors should have been re-written and become
>>>> Reallocated_Event_Count, no?
>>>
>>> Yes, and not necessarily.  Pending sectors can be non-permanent errors
>>> -- the drive firmware will test a pending sector immediately after write
>>> to see if the write is readable.  If not, it will re-allocate while it
>>> still has the write data in its buffers.  Otherwise, it'll clear the
>>> pending sector.
>>
>> This shows the sector is still bad though, right?
>>
>> myth:~# hdparm --read-sector 1287409520 /dev/sdh
>> /dev/sdh:
>> reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 succeeded
>> 7000 0b54 92c4 ffff 0000 0000 01fe 0000
>> (...)
>>
>> [ 2572.139404] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> [ 2572.139419] ata5.04: failed command: READ SECTOR(S) EXT
>> [ 2572.139427] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 28 pio 512 in
>> [ 2572.139427]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
>> [ 2572.139431] ata5.04: status: { DRDY ERR }
>> [ 2572.139435] ata5.04: error: { UNC }
>> [ 2572.162369] ata5.04: configured for UDMA/133
>> [ 2572.162414] ata5: EH complete
>>
>> mdadm also said it found 6 bad sectors and rewrote them (or something like that)
>> and it's happy. So allegedly it did something, but smart does not agree (yet?)
>>
>> I'm now running a long smart test on all drives, will see if numbers change.
>>
>> Mmmh, and I just ran
>> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
>> below, and I don't quite understand what's going on.
>>
>>>> So, mdadm is happy allegedly, but my drives still have the same bad
>>>> sectors they had (more or less).
>>>
>>> If you have bad block lists enabled in your array, MD will *never* try
>>> to fix the underlying sectors.  Please show your mdadm -E reports for
>>> these devices.  If necessary, stop the array and re-assemble with the
>>> options to disable bad block lists.  { How this misfeature got into the
>>> kernel and enabled by default baffles me. }
>>
>> This means I don't have bad block lists?
>> myth:~# mdadm -E /dev/sd[defgh]   (all return)
>> /dev/sdd:
>>    MBR Magic : aa55
>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>
>>> Also, pending sectors that are in dead zones between metadata and array
>>> data will not be accessed by a check scrub, and will therefore persist.
>>
>> That's a good point, but then I would never have discovered those blocks
>> while initializing the array.
>>
>>>> Yes, I know I should trash (return) those drives,
>>>
>>> Well, non-permanent read errors are not considered warranty failures.
>>> They are in the drive specs.  When pending is zero and actual
>>> re-allocations are climbing (my threshold is double digits), *then* it's
>>> time to replace.
>>
>> I think it's worse here. Read errors are not being cleared by block rewrites?
>> Those are brand "new" (but really remanufactured) drives.
>> So far I'm not liking what I'm seeing and I'm very close to just
>> returning them all and getting some less dodgy ones.
>>
>> Sad because the last set of 5 I got from a similar source, have worked
>> beautifully.
>>
>> Let's see what a full smart scan does.
>> I may also use hdparm --write-sector to just fill those bad blocks with 0's
>> now that it seems that mdadm isn't caring about/using them anymore?
>>
>> Now, badblocks perplexes me even more. Shouldn't -n re-write blocks?
>>
>> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
>> /dev/sdh is apparently in use by the system; badblocks forced anyway.
>> Checking for bad blocks in non-destructive read-write mode
>> From block 1287409400 to 1287409599
>> Checking for bad blocks (non-destructive read-write test)
>> Testing with random pattern: 1287409520ne, 0:14 elapsed. (0/0/0 errors)
>> 1287409521ne, 0:18 elapsed. (1/0/0 errors)
>> 1287409522ne, 0:23 elapsed. (2/0/0 errors)
>> 1287409523ne, 0:27 elapsed. (3/0/0 errors)
>> 1287409524ne, 0:31 elapsed. (4/0/0 errors)
>> 1287409525ne, 0:36 elapsed. (5/0/0 errors)
>> 1287409526ne, 0:40 elapsed. (6/0/0 errors)
>> 1287409527ne, 0:44 elapsed. (7/0/0 errors)
>> done
>> Pass completed, 8 bad blocks found. (8/0/0 errors)
>>
>> Badblocks found 8 bad blocks, but didn't rewrite them, or failed to, or
>> succeeded but that did nothing anyway?
>>
>> Do I understand that
>> 1) badblocks got read errors
>> 2) it's supposed to rewrite the blocks with new data (or not?)
>> 3) auto reallocate failed
>>
>>
>> [ 3171.717001] ata5.04: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
>> [ 3171.717012] ata5.04: failed command: READ FPDMA QUEUED
>> [ 3171.717019] ata5.04: cmd 60/08:30:70:4f:bc/00:00:4c:00:00/40 tag 6 ncq dma 4096 in
>> [ 3171.717019]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
>> [ 3171.717031] ata5.04: status: { DRDY ERR }
>> [ 3171.717034] ata5.04: error: { UNC }
>> [ 3171.718293] ata5.04: configured for UDMA/133
>> [ 3171.718342] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> [ 3171.718349] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current]
>> [ 3171.718354] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 3171.718360] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
>> [ 3171.718364] print_req_error: I/O error, dev sdh, sector 1287409520
>> [ 3171.718369] Buffer I/O error on dev sdh, logical block 160926190, async page read
>> [ 3171.718393] ata5: EH complete
>> [ 3176.092946] ata5.04: exception Emask 0x0 SAct 0x400000 SErr 0x0 action 0x0
>> [ 3176.092958] ata5.04: failed command: READ FPDMA QUEUED
>> [ 3176.092973] ata5.04: cmd 60/08:b0:70:4f:bc/00:00:4c:00:00/40 tag 22 ncq dma 4096 in
>> [ 3176.092973]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
>> [ 3176.092978] ata5.04: status: { DRDY ERR }
>> [ 3176.092981] ata5.04: error: { UNC }
>> [ 3176.094237] ata5.04: configured for UDMA/133
>> [ 3176.094285] sd 4:4:0:0: [sdh] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> [ 3176.094291] sd 4:4:0:0: [sdh] tag#22 Sense Key : Medium Error [current]
>> [ 3176.094296] sd 4:4:0:0: [sdh] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 3176.094302] sd 4:4:0:0: [sdh] tag#22 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
>> [ 3176.094306] print_req_error: I/O error, dev sdh, sector 1287409520
>> [ 3176.094310] Buffer I/O error on dev sdh, logical block 160926190, async page read
>> [ 3176.094324] ata5: EH complete
>> [ 3180.488899] ata5.04: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
>> [ 3180.488909] ata5.04: failed command: READ FPDMA QUEUED
>> [ 3180.488916] ata5.04: cmd 60/08:40:70:4f:bc/00:00:4c:00:00/40 tag 8 ncq dma 4096 in
>> [ 3180.488916]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
>> [ 3180.488928] ata5.04: status: { DRDY ERR }
>> [ 3180.488931] ata5.04: error: { UNC }
>> [ 3180.490193] ata5.04: configured for UDMA/133
>> [ 3180.490243] sd 4:4:0:0: [sdh] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> [ 3180.490249] sd 4:4:0:0: [sdh] tag#8 Sense Key : Medium Error [current]
>> [ 3180.490254] sd 4:4:0:0: [sdh] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 3180.490259] sd 4:4:0:0: [sdh] tag#8 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
>> [ 3180.490263] print_req_error: I/O error, dev sdh, sector 1287409520
>> [ 3180.490268] Buffer I/O error on dev sdh, logical block 160926190, async page read
>> [ 3180.490290] ata5: EH complete
>> [ 3184.873146] ata5.04: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
>> [ 3184.873161] ata5.04: failed command: READ FPDMA QUEUED
>> [ 3184.873175] ata5.04: cmd 60/08:c0:70:4f:bc/00:00:4c:00:00/40 tag 24 ncq dma 4096 in
>> [ 3184.873175]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
>> [ 3184.873181] ata5.04: status: { DRDY ERR }
>> [ 3184.873184] ata5.04: error: { UNC }
>> [ 3184.874437] ata5.04: configured for UDMA/133
>> [ 3184.874488] sd 4:4:0:0: [sdh] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> [ 3184.874495] sd 4:4:0:0: [sdh] tag#24 Sense Key : Medium Error [current]
>> [ 3184.874500] sd 4:4:0:0: [sdh] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 3184.874506] sd 4:4:0:0: [sdh] tag#24 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
>> [ 3184.874510] print_req_error: I/O error, dev sdh, sector 1287409520
>> [ 3184.874515] Buffer I/O error on dev sdh, logical block 160926190, async page read
>> [ 3184.874555] ata5: EH complete
>>
>
> What you write about the result of
> badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> is the expected behavior. -n means that it will _not_ write sectors that
> it cannot read (because that would remove the possibility that data from
> these sectors could be recovered by more tries).
>
> As I wrote, you have to use the -w option instead of -n, with last-block and
> first-block arguments of 1287409527 1287409520
>
> HTH
> Kay
>
>
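Kay's -w suggestion, spelled out as a sketch. This is destructive: it overwrites the listed sectors with zeros, so only use it on sectors that hold no data you need. The device and sector range are the ones reported earlier in this thread; adjust for your case. To keep it reviewable, the script prints the hdparm commands instead of running them:

```shell
#!/bin/sh
# Destructive-operation sketch: print (not run) one hdparm command per
# pending sector. hdparm --write-sector zero-fills a single 512-byte
# sector, which forces the drive firmware to either clear the pending
# flag or reallocate the sector. Review the output, then pipe it to sh.
# DEV and the sector range are the ones from this thread.
DEV=/dev/sdh
FIRST=1287409520
LAST=1287409527

for sec in $(seq "$FIRST" "$LAST"); do
    echo "hdparm --yes-i-know-what-i-am-doing --write-sector $sec $DEV"
done

# Equivalent with badblocks in destructive write mode; note that
# badblocks takes the *last* block before the *first* block:
#   badblocks -fsvwb512 $DEV $LAST $FIRST
```

Afterwards, re-checking with hdparm --read-sector and smartctl -A should show whether Current_Pending_Sector actually dropped.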

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-11 17:13         ` Phil Turmel
  2018-02-11 18:02           ` Wols Lists
@ 2018-02-12 10:43           ` Mateusz Korniak
  2018-02-12 15:29             ` Phil Turmel
  1 sibling, 1 reply; 32+ messages in thread
From: Mateusz Korniak @ 2018-02-12 10:43 UTC (permalink / raw)
  To: Phil Turmel
  Cc: Marc MERLIN, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On Sunday 11 of February 2018 12:13:45 Phil Turmel wrote:
> On 02/10/2018 04:43 PM, Mateusz Korniak wrote:
>
> > data is saved under different location when such write fails for
> > later reads?
> 
> No. (...)

So an array with non-empty BBL members is in fact degraded (for a tiny
part, but still), right?

Is there any option for mdadm --monitor to send a warning e-mail when a BBL
entry is added? (I can't see anything regarding the BBL in the mdadm --monitor
section.)

-- 
Mateusz Korniak
"(...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite,
 	sanctimonious - in short, a pillar of society."
				Nikos Kazantzakis - "Zorba the Greek"


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-12 10:43           ` Mateusz Korniak
@ 2018-02-12 15:29             ` Phil Turmel
  2018-02-12 16:49               ` Marc MERLIN
  0 siblings, 1 reply; 32+ messages in thread
From: Phil Turmel @ 2018-02-12 15:29 UTC (permalink / raw)
  To: Mateusz Korniak
  Cc: Marc MERLIN, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On 02/12/2018 05:43 AM, Mateusz Korniak wrote:
> On Sunday 11 of February 2018 12:13:45 Phil Turmel wrote:
>> On 02/10/2018 04:43 PM, Mateusz Korniak wrote:
>>
>>> data is saved under different location when such write fails for
>>> later reads?
>>
>> No. (...)
> 
> So having arrays having non-empty BBL members means that array is in fact 
> degraded (for tiny part, but still), right?

Yes, it's degraded wherever there's a BBL entry.  To my knowledge, *no*
upper layer, whether device mapper or any filesystem, uses the
information to avoid allocations in the degraded area or to rescue the
data precariously living there.  Last I looked, mdadm --detail did not
report whether the array has degraded regions.  You must inspect the
output of mdadm --examine for every member.
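That per-member inspection could look like the sketch below. The member list is assumed from earlier in this thread (md7 = sd[defgh]1), and note that mdadm -E must be pointed at the member partitions, not the whole disks — which is presumably why the bare-disk -E run earlier in the thread only printed a protective MBR:

```shell
#!/bin/sh
# Sketch: report the bad-block-log status of every array member.
# Member list assumed from this thread; mdadm -E on a 1.x superblock
# prints a "Bad Block Log" line when a BBL is present.
for dev in /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1; do
    printf '%s: ' "$dev"
    mdadm -E "$dev" | grep -i 'bad block' || echo 'no bad-block-log line'
done
```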

> Is there any option for mdadm --monitor to send warning e-mail when bbl entry 
> is added? (I can't see anything regarding bbl in mdadm --monitor section) ?

No.  The feature is incomplete.  The only mitigation is to turn it off.

Phil

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-12 15:29             ` Phil Turmel
@ 2018-02-12 16:49               ` Marc MERLIN
  2018-02-12 17:16                 ` Phil Turmel
  0 siblings, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2018-02-12 16:49 UTC (permalink / raw)
  To: Phil Turmel
  Cc: Mateusz Korniak, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On Mon, Feb 12, 2018 at 10:29:20AM -0500, Phil Turmel wrote:
> > Is there any option for mdadm --monitor to send warning e-mail when bbl entry 
> > is added? (I can't see anything regarding bbl in mdadm --monitor section) ?
> 
> No.  The feature is incomplete.  The only mitigation is to turn it off.

I had a quick look but didn't really find how to turn it off after the fact
(not at array creation time, but after it's already been created).

Can you suggest how?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-12 16:49               ` Marc MERLIN
@ 2018-02-12 17:16                 ` Phil Turmel
  2018-02-12 17:30                   ` Marc MERLIN
  0 siblings, 1 reply; 32+ messages in thread
From: Phil Turmel @ 2018-02-12 17:16 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Mateusz Korniak, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On 02/12/2018 11:49 AM, Marc MERLIN wrote:
> On Mon, Feb 12, 2018 at 10:29:20AM -0500, Phil Turmel wrote:
>>> Is there any option for mdadm --monitor to send warning e-mail when bbl entry 
>>> is added? (I can't see anything regarding bbl in mdadm --monitor section) ?
>>
>> No.  The feature is incomplete.  The only mitigation is to turn it off.
> 
> I had a quick look but didn't really find how to turn it off after the fact
> (not at array creation time, but after it's already been created).
> 
> Can you suggest how?

mdadm --assemble --update=no-bbl

There's another (undocumented?) option required when there are entries
in the list -- you'll have to dig that out for your situation.
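The documented half of that, as a sketch. The array name and member list are assumed from earlier in this thread; --update only takes effect at assembly time, so the array has to be stopped first (which means unmounting whatever sits on top of it):

```shell
#!/bin/sh
# Sketch: disable the bad block list on an existing array by stopping
# it and reassembling with --update=no-bbl. Array name and members are
# the ones from this thread. Note that no-bbl can be refused while a
# member still has entries recorded in its list -- that is the extra
# case Phil alludes to above.
mdadm --stop /dev/md7
mdadm --assemble /dev/md7 --update=no-bbl \
    /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
```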

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: force remapping a pending sector in sw raid5 array
  2018-02-12 17:16                 ` Phil Turmel
@ 2018-02-12 17:30                   ` Marc MERLIN
  0 siblings, 0 replies; 32+ messages in thread
From: Marc MERLIN @ 2018-02-12 17:30 UTC (permalink / raw)
  To: Phil Turmel
  Cc: Mateusz Korniak, Kay Diederichs, Andreas Klauer, Adam Goryachev,
	Roger Heflin, linux-raid

On Mon, Feb 12, 2018 at 12:16:15PM -0500, Phil Turmel wrote:
> On 02/12/2018 11:49 AM, Marc MERLIN wrote:
> > On Mon, Feb 12, 2018 at 10:29:20AM -0500, Phil Turmel wrote:
> >>> Is there any option for mdadm --monitor to send warning e-mail when bbl entry 
> >>> is added? (I can't see anything regarding bbl in mdadm --monitor section) ?
> >>
> >> No.  The feature is incomplete.  The only mitigation is to turn it off.
> > 
> > I had a quick look but didn't really find how to turn it off after the fact
> > (not at array creation time, but after it's already been created).
> > 
> > Can you suggest how?
> 
> mdadm --assemble --update=no-bbl
 
Thanks.

> There's another (undocumented?) option required when there are entries
> in the list -- you'll have to dig that out for your situation.

That situation is gone: I was not able to clear the pending sectors even by
re-writing every block of the drive with badblocks, so I returned the drives
and got some better ones.
Bad blocks on a "new" drive are bad enough, but then having the drive refuse
to remap them, or in my case apparently fail to update the SMART counters
once the blocks were overwritten with known-good data, is not OK.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-02-12 17:30 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-06 18:14 force remapping a pending sector in sw raid5 array Marc MERLIN
2018-02-06 18:59 ` Reindl Harald
2018-02-06 19:36   ` Marc MERLIN
2018-02-06 20:03 ` Andreas Klauer
2018-02-06 21:51 ` Adam Goryachev
2018-02-06 22:02   ` Marc MERLIN
2018-02-06 22:31     ` Roger Heflin
2018-02-06 22:46       ` Marc MERLIN
2018-02-07  4:29   ` Marc MERLIN
2018-02-07  9:42 ` Kay Diederichs
2018-02-09 19:29   ` Marc MERLIN
2018-02-09 19:57     ` Kay Diederichs
2018-02-09 20:02     ` Roger Heflin
2018-02-09 20:13     ` Phil Turmel
2018-02-09 20:29       ` Marc MERLIN
2018-02-09 20:44         ` Phil Turmel
2018-02-09 21:22           ` Marc MERLIN
2018-02-09 22:07             ` Wol's lists
2018-02-09 22:36               ` Marc MERLIN
2018-02-09 20:52         ` Kay Diederichs
2018-02-11 20:52           ` Roger Heflin
2018-02-09 21:17         ` Kay Diederichs
2018-02-10 21:43       ` Mateusz Korniak
2018-02-11 15:41         ` Marc MERLIN
2018-02-11 16:41           ` Marc MERLIN
2018-02-11 17:13         ` Phil Turmel
2018-02-11 18:02           ` Wols Lists
2018-02-12 10:43           ` Mateusz Korniak
2018-02-12 15:29             ` Phil Turmel
2018-02-12 16:49               ` Marc MERLIN
2018-02-12 17:16                 ` Phil Turmel
2018-02-12 17:30                   ` Marc MERLIN

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox