Linux RAID subsystem development


* Re: recovering failed raid5
From: Andreas Klauer @ 2016-10-28 13:33 UTC
  To: Alexander Shenkin; +Cc: linux-raid
In-Reply-To: <715b259f-1e56-9606-edc4-3e5c4d57744b@shenkin.org>

On Fri, Oct 28, 2016 at 01:22:31PM +0100, Alexander Shenkin wrote:
> One remaining question: is sdc definitely toast?

In my opinion a drive is toast starting from the very first reallocated/
pending/uncorrectable sector. Your drive has several of those, and those
are only the ones the drive already knows about; there may be more.
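The rule of thumb above can be written as a small C predicate. This is a minimal sketch, assuming the raw values of SMART attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) have already been read from smartctl output; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the raw values of three SMART attributes. */
struct smart_raw {
    uint64_t reallocated;    /* attribute 5:   Reallocated_Sector_Ct  */
    uint64_t pending;        /* attribute 197: Current_Pending_Sector */
    uint64_t offline_uncorr; /* attribute 198: Offline_Uncorrectable  */
};

/*
 * A drive counts as "toast" from the very first bad sector it knows about.
 * It cannot report sectors it has not discovered yet (no full surface scan).
 */
bool drive_is_toast(const struct smart_raw *s)
{
    return s->reallocated > 0 || s->pending > 0 || s->offline_uncorr > 0;
}
```

With the sdc values quoted later in this thread (197 and 198 both at 8), the predicate returns true. It only sees what the drive already knows, which is why regular selftests matter.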

> Or, is it possible that the Timeout Mismatch (as mentioned by Robin Hill; 
> thanks Robin) is flagging the drive as failed, when something else is at 
> play and perhaps the drive is actually fine?

I don't believe in timeout mismatches, either. The timeouts are generous. 
Waiting for a disk to wake from standby is not a problem, and that takes 
ages already. If a disk gets stuck even longer in error correction limbo 
and it gets kicked because of it - IMHO that's the right call.

A disk that is unable to read its data, a disk that refuses to write data, 
a disk that needs help from the RAID layer to correct its errors, 
should be kicked because it's not able to pull its own weight.

You need drives that work without errors, without outside help, because 
during a rebuild, when the RAID is already degraded, there won't be any 
outside help. Either the disks work or your RAID is dead.

RAID redundancy is supposed to allow disks to be replaced (mdadm --replace).
If you use it instead to keep fixing errors on other disks, there is not
any real redundancy left. In a RAID, if one of your disks has errors,
you get rid of it as soon as possible.

Whether or not timeouts played a part in your RAID failing is not
important. It failed because you didn't notice broken disks in time and
you had two of them. Testing, monitoring, and actually acting on the
first error are what matter.

People have different opinions on this, and someone might argue otherwise.
It's up to you what risks to take.

Regards
Andreas Klauer


* [PATCH] raid1: handle read error also in readonly mode
From: Tomasz Majchrzak @ 2016-10-28 12:45 UTC
  To: linux-raid; +Cc: shli, Tomasz Majchrzak

If a write is the first operation on a disk and it happens not to be
aligned to the page size, the block layer sends a read request first. If
the read fails, the disk is set as failed, because no attempt to fix the
error is made while the array is in auto-readonly mode. Similarly, the
disk is set as failed for a read-only array.

Take the same approach as in raid10. Don't fail the disk if the array is
in readonly or auto-readonly mode. Try to redirect the request first and,
if unsuccessful, return a read error.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
---
 drivers/md/raid1.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index db536a6..29e2df5 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2297,17 +2297,23 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	 * This is all done synchronously while the array is
 	 * frozen
 	 */
+
+	bio = r1_bio->bios[r1_bio->read_disk];
+	bdevname(bio->bi_bdev, b);
+	bio_put(bio);
+	r1_bio->bios[r1_bio->read_disk] = NULL;
+
 	if (mddev->ro == 0) {
 		freeze_array(conf, 1);
 		fix_read_error(conf, r1_bio->read_disk,
 			       r1_bio->sector, r1_bio->sectors);
 		unfreeze_array(conf);
-	} else
-		md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev);
+	} else {
+		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
+	}
+
 	rdev_dec_pending(conf->mirrors[r1_bio->read_disk].rdev, conf->mddev);
 
-	bio = r1_bio->bios[r1_bio->read_disk];
-	bdevname(bio->bi_bdev, b);
 read_more:
 	disk = read_balance(conf, r1_bio, &max_sectors);
 	if (disk == -1) {
@@ -2318,11 +2324,6 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	} else {
 		const unsigned long do_sync
 			= r1_bio->master_bio->bi_opf & REQ_SYNC;
-		if (bio) {
-			r1_bio->bios[r1_bio->read_disk] =
-				mddev->ro ? IO_BLOCKED : NULL;
-			bio_put(bio);
-		}
 		r1_bio->read_disk = disk;
 		bio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
 		bio_trim(bio, r1_bio->sector - bio->bi_iter.bi_sector,
-- 
1.8.3.1
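The behavioural change for read-only arrays can be modelled in a few lines of C. This is a hypothetical, simplified sketch, not the kernel's r1bio or read_balance(): it picks the first usable mirror instead of balancing, and it covers only the read-only branch, where the patch replaces md_error() with IO_BLOCKED for the failing slot.

```c
/* Hypothetical per-request view of each mirror (not the kernel's r1bio). */
enum slot_state {
    SLOT_USABLE,     /* may be chosen for this read                   */
    SLOT_IO_BLOCKED, /* skipped for this request only                 */
    SLOT_FAULTY      /* device failed in the array (what md_error did) */
};

/*
 * Read-only path after the patch: a read error on mirror `failed` does not
 * mark the device faulty; its slot is blocked for this request and another
 * mirror is tried. Returns the mirror to retry on, or -1 meaning the read
 * error is returned to the caller.
 */
int redirect_readonly_read(enum slot_state *slots, int nslots, int failed)
{
    slots[failed] = SLOT_IO_BLOCKED;
    for (int i = 0; i < nslots; i++)
        if (slots[i] == SLOT_USABLE)
            return i;
    return -1;
}
```

When no other mirror is usable, the caller gets a read error, but the failing disk is still not marked faulty, which is the point of the change.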



* Re: recovering failed raid5
From: Alexander Shenkin @ 2016-10-28 12:22 UTC
  To: linux-raid; +Cc: Andreas Klauer, rm, robin
In-Reply-To: <20161027160400.GA21042@metamorpher.de>

Thanks Andreas, much appreciated.  Your points about selftests and SMART
are well taken, and I'll implement them once I get this back up.  I'll
buy yet another new drive, not the drive from hell (yes Roman, I did buy
the same damn drive again; I'll try to return it, thanks for the heads
up...), and follow your instructions below.

One remaining question: is sdc definitely toast?  Or, is it possible 
that the Timeout Mismatch (as mentioned by Robin Hill; thanks Robin) is 
flagging the drive as failed, when something else is at play and perhaps 
the drive is actually fine?

To everyone: sorry for the multiple posts.  Was having majordomo issues...

On 10/27/2016 5:04 PM, Andreas Klauer wrote:
> On Thu, Oct 27, 2016 at 04:06:14PM +0100, Alexander Shenkin wrote:
>> md2: raid5 mounted on /, via sd[abcd]3
>
> Two failed disks...
>
>> md0: raid1 mounted on /boot, via sd[abcd]1
>
> Actually only two disks active in that one, the other two are spares.
> It hardly matters for /boot, but you could grow it to a 4 disk raid1.
> Spares are not useful.
>
>> My sdb was recently reporting problems.  Instead of second guessing
>> those problems, I just got a new disk, replaced it, and added it to
>> the arrays.
>
> Replacing right away is the right thing to do.
> Unfortunately it seems you have another disk that is broken too.
>
>> 2) smartctl (disabled on drives - can enable once back up.  should I?)
>> note: SMART only enabled after problems started cropping up.
>
> But... why? Why disable smart? And if you do, is it a surprise that you
> only notice disk failures when it's already too late?

Yeah, I asked myself that same question.  There was probably some reason
I did it, but I don't remember what it was.  I'll keep SMART enabled from
now on...

> You should enable smart, and not only that, also run regular selftests,
> and have smartd running, and have it send you mail when something happens.
> Same with RAID checks: they are at least something, but they won't
> tell you how many reallocated sectors your drive has.

will do

>> root@machinename:/home/username# smartctl --xall /dev/sda
>
> Looks fine but never ran a selftest.
>
>> root@machinename:/home/username# smartctl --xall /dev/sdb
>
> Looks new. (New drives need selftests too.)
>
>> root@machinename:/home/username# smartctl --xall /dev/sdc
>> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-39-generic] (local build)
>> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     Seagate Barracuda 7200.14 (AF)
>> Device Model:     ST3000DM001-1CH166
>> Serial Number:    W1F1N909
>>
>> 197 Current_Pending_Sector  -O--C-   100   100   000    -    8
>> 198 Offline_Uncorrectable   ----C-   100   100   000    -    8
>
> This one is faulty and probably the reason why your resync failed.
> You have no redundancy left, so an option here would be to get a
> new drive and ddrescue it over.
>
> That's exactly the kind of thing you should be notified instantly
> about via mail. And it should be discovered when running selftests.
> Without a full surface scan of the media, the disk itself won't know.
>
>> ==> WARNING: A firmware update for this drive may be available,
>> see the following Seagate web pages:
>> http://knowledge.seagate.com/articles/en_US/FAQ/207931en
>> http://knowledge.seagate.com/articles/en_US/FAQ/223651en
>
> About this, *shrug*
> I don't have these drives, you might want to check that out.
> But it probably won't fix bad sectors.
>
>> root@machinename:/home/username# smartctl --xall /dev/sdd
>
> Some strange things in the error log here, but old.
> Still, same as for all others - selftest.
>
>> ################### mdadm --examine ###########################
>>
>> /dev/sda1:
>>      Raid Level : raid1
>>    Raid Devices : 2
>
> A RAID 1 with two drives, could be four.
>
>> /dev/sdb1:
>> /dev/sdc1:
>
> So these would also have data instead of being spare.
>
>> /dev/sda3:
>>      Raid Level : raid5
>>    Raid Devices : 4
>>
>>     Update Time : Mon Oct 24 09:02:52 2016
>>          Events : 53547
>>
>>    Device Role : Active device 0
>>    Array State : A..A ('A' == active, '.' == missing)
>
> RAID-5 with two failed disks.
>
>> /dev/sdc3:
>>      Raid Level : raid5
>>    Raid Devices : 4
>>
>>     Update Time : Mon Oct 24 08:53:57 2016
>>          Events : 53539
>>
>>    Device Role : Active device 2
>>    Array State : AAAA ('A' == active, '.' == missing)
>
> This one failed, 8:53.
>
>> ############ /proc/mdstat ############################################
>>
>> md2 : active raid5 sda3[0] sdc3[2](F) sdd3[3]
>>       8760565248 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2]
>> [U__U]
>
> [U__U] refers to device roles as in [0123],
> so device roles 0 and 3 are okay, and 1 and 2 are missing.
>
>> md0 : active raid1 sdb1[4](S) sdc1[2](S) sda1[0] sdd1[3]
>>       1950656 blocks super 1.2 [2/2] [UU]
>
> Those two spares again, could be [UUUU] instead.
>
> tl;dr
> stop it all,
> ddrescue /dev/sdc to your new disk,
> try your luck with --assemble --force (not using /dev/sdc!),
> get yet another new disk, add, sync, cross fingers.
>
> There's also mdadm --replace instead of --remove and --add,
> which sometimes helps if there are only a few bad sectors
> on each disk. If the disk you replaced had not already been
> kicked from the array by the time you replaced it, using
> --replace might have avoided this problem.
>
> But good disk monitoring and testing is even more important.

thanks a bunch, Andreas.  I'll monitor and test from now on...

> Regards
> Andreas Klauer
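Andreas's reading of [U__U] can be made mechanical. A minimal sketch, assuming the bracketed status string has already been cut out of the corresponding /proc/mdstat line; the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Parse a /proc/mdstat role-status string such as "[U__U]".
 * Position i corresponds to device role i: 'U' = up, '_' = missing.
 * Stores up to `cap` missing role numbers in `missing` and returns how
 * many roles are missing, or -1 if the string is malformed.
 */
int mdstat_missing_roles(const char *status, int *missing, int cap)
{
    size_t len = strlen(status);
    int count = 0;

    if (len < 2 || status[0] != '[' || status[len - 1] != ']')
        return -1;
    for (size_t i = 1; i + 1 < len; i++) {
        if (status[i] == '_') {
            if (count < cap)
                missing[count] = (int)(i - 1); /* role number */
            count++;
        } else if (status[i] != 'U') {
            return -1;
        }
    }
    return count;
}
```

For md2 this reports roles 1 and 2 as missing. For md0's [UU] it reports none: the string covers only the raid devices ([2/2]), while spares are flagged with (S) in the device list instead.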



* [PATCH] Increase buffer for sysfs path
From: Tomasz Majchrzak @ 2016-10-28  8:35 UTC
  To: linux-raid; +Cc: Jes.Sorensen, Tomasz Majchrzak

'unacknowledged_bad_blocks' is a long name for a sysfs property and it
makes the sysfs path over 50 characters long. Increase the buffer to
double the length of the longest path currently available in sysfs.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
---
 sysfs.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/sysfs.c b/sysfs.c
index c7a8e66..fc1f01e 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -27,6 +27,8 @@
 #include	<dirent.h>
 #include	<ctype.h>
 
+#define MAX_SYSFS_PATH_LEN	120
+
 int load_sys(char *path, char *buf, int len)
 {
 	int fd = open(path, O_RDONLY);
@@ -63,15 +65,15 @@ void sysfs_free(struct mdinfo *sra)
 
 int sysfs_open(char *devnm, char *devname, char *attr)
 {
-	char fname[50];
+	char fname[MAX_SYSFS_PATH_LEN];
 	int fd;
 
-	sprintf(fname, "/sys/block/%s/md/", devnm);
+	snprintf(fname, MAX_SYSFS_PATH_LEN, "/sys/block/%s/md/", devnm);
 	if (devname) {
-		strcat(fname, devname);
-		strcat(fname, "/");
+		strncat(fname, devname, MAX_SYSFS_PATH_LEN - strlen(fname));
+		strncat(fname, "/", MAX_SYSFS_PATH_LEN - strlen(fname));
 	}
-	strcat(fname, attr);
+	strncat(fname, attr, MAX_SYSFS_PATH_LEN - strlen(fname));
 	fd = open(fname, O_RDWR);
 	if (fd < 0 && errno == EACCES)
 		fd = open(fname, O_RDONLY);
@@ -396,15 +398,17 @@ unsigned long long get_component_size(int fd)
 	 * This returns in units of sectors.
 	 */
 	struct stat stb;
-	char fname[50];
+	char fname[MAX_SYSFS_PATH_LEN];
 	int n;
 	if (fstat(fd, &stb))
 		return 0;
 	if (major(stb.st_rdev) != (unsigned)get_mdp_major())
-		sprintf(fname, "/sys/block/md%d/md/component_size",
+		snprintf(fname, MAX_SYSFS_PATH_LEN,
+			"/sys/block/md%d/md/component_size",
 			(int)minor(stb.st_rdev));
 	else
-		sprintf(fname, "/sys/block/md_d%d/md/component_size",
+		snprintf(fname, MAX_SYSFS_PATH_LEN,
+			"/sys/block/md_d%d/md/component_size",
 			(int)minor(stb.st_rdev)>>MdpMinorShift);
 	fd = open(fname, O_RDONLY);
 	if (fd < 0)
@@ -420,11 +424,11 @@ unsigned long long get_component_size(int fd)
 int sysfs_set_str(struct mdinfo *sra, struct mdinfo *dev,
 		  char *name, char *val)
 {
-	char fname[50];
+	char fname[MAX_SYSFS_PATH_LEN];
 	unsigned int n;
 	int fd;
 
-	sprintf(fname, "/sys/block/%s/md/%s/%s",
+	snprintf(fname, MAX_SYSFS_PATH_LEN, "/sys/block/%s/md/%s/%s",
 		sra->sys_name, dev?dev->sys_name:"", name);
 	fd = open(fname, O_WRONLY);
 	if (fd < 0)
@@ -457,11 +461,11 @@ int sysfs_set_num_signed(struct mdinfo *sra, struct mdinfo *dev,
 
 int sysfs_uevent(struct mdinfo *sra, char *event)
 {
-	char fname[50];
+	char fname[MAX_SYSFS_PATH_LEN];
 	int n;
 	int fd;
 
-	sprintf(fname, "/sys/block/%s/uevent",
+	snprintf(fname, MAX_SYSFS_PATH_LEN, "/sys/block/%s/uevent",
 		sra->sys_name);
 	fd = open(fname, O_WRONLY);
 	if (fd < 0)
@@ -478,10 +482,10 @@ int sysfs_uevent(struct mdinfo *sra, char *event)
 
 int sysfs_attribute_available(struct mdinfo *sra, struct mdinfo *dev, char *name)
 {
-	char fname[50];
+	char fname[MAX_SYSFS_PATH_LEN];
 	struct stat st;
 
-	sprintf(fname, "/sys/block/%s/md/%s/%s",
+	snprintf(fname, MAX_SYSFS_PATH_LEN, "/sys/block/%s/md/%s/%s",
 		sra->sys_name, dev?dev->sys_name:"", name);
 
 	return stat(fname, &st) == 0;
@@ -490,10 +494,10 @@ int sysfs_attribute_available(struct mdinfo *sra, struct mdinfo *dev, char *name
 int sysfs_get_fd(struct mdinfo *sra, struct mdinfo *dev,
 		       char *name)
 {
-	char fname[50];
+	char fname[MAX_SYSFS_PATH_LEN];
 	int fd;
 
-	sprintf(fname, "/sys/block/%s/md/%s/%s",
+	snprintf(fname, MAX_SYSFS_PATH_LEN, "/sys/block/%s/md/%s/%s",
 		sra->sys_name, dev?dev->sys_name:"", name);
 	fd = open(fname, O_RDWR);
 	if (fd < 0)
-- 
1.8.3.1



* Re: Resync issue in RAID1
From: V @ 2016-10-28  6:07 UTC
  To: NeilBrown; +Cc: linux-raid
In-Reply-To: <87r371rp0d.fsf@notabene.neil.brown.name>

Is there any reason why this happens in the resync flow? Normally the
upper-layer driver tries to align requests with the device block size.
So could there be an issue in this path?

Thanks,
V

On Thu, Oct 27, 2016 at 11:01 PM, NeilBrown <neilb@suse.com> wrote:
> On Fri, Oct 28 2016, V wrote:
>
>> Hi Neil,
>>
>> Thanks for the response. But during this phase, why is the SCSI driver
>> complaining about a bad block number?
>>
>> Oct 18 03:52:56  kernel: [  52.869378] sd 0:0:0:0: [sda] Bad block
>> number requested
>
> Because md is asking to read blocks at offsets which are not a multiple
> of 8 sectors.
>
> NeilBrown
>
>
>> Oct 18 03:52:56  kernel: [  52.869414] sd 0:0:0:0: [sda] Bad block
>> number requested
>> Oct 18 03:52:56  kernel: [  52.869436] sd 0:0:0:0: [sda] Bad block
>> number requested
>> Oct 18 03:52:56  kernel: [  52.869465] sd 0:0:0:0: [sda] Bad block
>> number requested
>> Oct 18 03:52:56  kernel: [  52.869503] sd 0:0:1:0: [sdb] Bad block
>> number requested
>>
>> Thanks,
>> V
>>
>> On Thu, Oct 27, 2016 at 9:01 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Sat, Oct 22 2016, V wrote:
>>>
>>>> Hi,
>>>>
>>>> I am facing an issue during RAID1 resync. I have Ubuntu
>>>> 4.4.0-31-generic running with RAID1 configured with 2 disks as active
>>>> and 2 as spares. On the first power cycle after installing RAID, I see
>>>> the following messages in kern.log
>>>>
>>>>
>>>> My disks are configured with 4K sector size (both logical and
>>>> physical) (sda and sdb are active disks for this raid)
>>>>
>>>>
>>>> ===========
>>>> Oct 18 03:52:56  kernel: [   52.869113] md: using 128k window, over a
>>>> total of 51167104k.
>>>> Oct 18 03:52:56  kernel: [   52.869114] md: resuming resync of md2 from checkpoint.
>>>
>>> This line (above) combined with ...
>>>
>>>> Oct 18 03:52:56  kernel: [   52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3
>>>
>>> this line suggests that when you shut down, md had already started a
>>> resync, and it had checkpointed at block '3'.
>>>
>>> The subsequent errors are:
>>>
>>>> Oct 18 03:52:56  kernel: [   52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
>>>> Oct 18 03:52:56  kernel: [   52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
>>>> Oct 18 03:52:56  kernel: [   52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387
>>>
>>> which are every 128 blocks (aka sectors) from '3'.
>>> I know what caused that.  The patch below will stop it happening again.
>>>
>>> You might be able to get your array working again by stopping it
>>> and assembling with --update=resync.
>>> That will reset the checkpoint to 0.
>>>
>>> NeilBrown
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 2cf0e1c00b9a..aa2ca23463f4 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
>>>             mddev->curr_resync > 2) {
>>>                 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>>>                         if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>>> -                               if (mddev->curr_resync >= mddev->recovery_cp) {
>>> +                               if (mddev->curr_resync >= mddev->recovery_cp &&
>>> +                                   mddev->curr_resync > 3) {
>>>                                         printk(KERN_INFO
>>>                                                "md: checkpointing %s of %s.\n",
>>>                                                desc, mdname(mddev));
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Resync issue in RAID1
From: NeilBrown @ 2016-10-28  6:01 UTC
  To: V; +Cc: linux-raid
In-Reply-To: <CAF9xHmQfUDOYFNa1uehqsz=arx-x24DuavX1sdBY313mdt0GEw@mail.gmail.com>


On Fri, Oct 28 2016, V wrote:

> Hi Neil,
>
> Thanks for the response. But during this phase, why is the SCSI driver
> complaining about a bad block number?
>
> Oct 18 03:52:56  kernel: [  52.869378] sd 0:0:0:0: [sda] Bad block
> number requested

Because md is asking to read blocks at offsets which are not a multiple
of 8 sectors.

NeilBrown
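A worked check of this explanation, as a C sketch (the function name is hypothetical). With 4K logical blocks a request must start at a multiple of 4096 / 512 = 8 sectors. The resync resumed from the bogus checkpoint 3 and advanced in 128-sector steps, so every start offset in the log (3, 131, 259, 387) leaves remainder 3 modulo 8 and is rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* md addresses member devices in 512-byte sectors. */
#define SECTOR_SIZE 512u

/*
 * True if a request starting at `sector` begins on a logical-block
 * boundary. Assumes logical_block_size is a multiple of 512.
 */
bool sector_is_aligned(uint64_t sector, unsigned logical_block_size)
{
    unsigned sectors_per_block = logical_block_size / SECTOR_SIZE; /* 8 for 4K */
    return sector % sectors_per_block == 0;
}
```

On a 512-byte-sector disk the same offsets would be valid, which is why the problem only shows up on the 4K-native drives.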


> Oct 18 03:52:56  kernel: [  52.869414] sd 0:0:0:0: [sda] Bad block
> number requested
> Oct 18 03:52:56  kernel: [  52.869436] sd 0:0:0:0: [sda] Bad block
> number requested
> Oct 18 03:52:56  kernel: [  52.869465] sd 0:0:0:0: [sda] Bad block
> number requested
> Oct 18 03:52:56  kernel: [  52.869503] sd 0:0:1:0: [sdb] Bad block
> number requested
>
> Thanks,
> V
>
> On Thu, Oct 27, 2016 at 9:01 PM, NeilBrown <neilb@suse.com> wrote:
>> On Sat, Oct 22 2016, V wrote:
>>
>>> Hi,
>>>
>>> I am facing an issue during RAID1 resync. I have Ubuntu
>>> 4.4.0-31-generic running with RAID1 configured with 2 disks as active
>>> and 2 as spares. On the first power cycle after installing RAID, I see
>>> the following messages in kern.log
>>>
>>>
>>> My disks are configured with 4K sector size (both logical and
>>> physical) (sda and sdb are active disks for this raid)
>>>
>>>
>>> ===========
>>> Oct 18 03:52:56  kernel: [   52.869113] md: using 128k window, over a
>>> total of 51167104k.
>>> Oct 18 03:52:56  kernel: [   52.869114] md: resuming resync of md2 from checkpoint.
>>
>> This line (above) combined with ...
>>
>>> Oct 18 03:52:56  kernel: [   52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3
>>
>> this line suggests that when you shut down, md had already started a
>> resync, and it had checkpointed at block '3'.
>>
>> The subsequent errors are:
>>
>>> Oct 18 03:52:56  kernel: [   52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
>>> Oct 18 03:52:56  kernel: [   52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
>>> Oct 18 03:52:56  kernel: [   52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387
>>
>> which are every 128 blocks (aka sectors) from '3'.
>> I know what caused that.  The patch below will stop it happening again.
>>
>> You might be able to get your array working again by stopping it
>> and assembling with --update=resync.
>> That will reset the checkpoint to 0.
>>
>> NeilBrown
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 2cf0e1c00b9a..aa2ca23463f4 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
>>             mddev->curr_resync > 2) {
>>                 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>>                         if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>> -                               if (mddev->curr_resync >= mddev->recovery_cp) {
>> +                               if (mddev->curr_resync >= mddev->recovery_cp &&
>> +                                   mddev->curr_resync > 3) {
>>                                         printk(KERN_INFO
>>                                                "md: checkpointing %s of %s.\n",
>>                                                desc, mdname(mddev));



* Re: Bitmap in RAM?
From: NeilBrown @ 2016-10-28  5:58 UTC
  To: Dark Penguin, linux-raid
In-Reply-To: <57F8EC82.9010804@yandex.ru>


On Sat, Oct 08 2016, Dark Penguin wrote:

> After researching write-intent bitmaps for a while, my understanding is 
> that they are used only to speed up "re-adding" drives by avoiding a 
> full resync, and to enable --write-mostly --write-behind. However, it 

This is not correct.  Speeding up re-adding is certainly one benefit.
The other benefit is speeding up resync after a system crash.

> does introduce some pretty heavy load on whatever device it's on, 
> especially if it's an internal bitmap, because the head would have to 
> fly all the way to the superblock twice for each write. If it's an
> external bitmap, then the device it's on would be too busy just serving 
> it to do anything else.

It doesn't update the bitmap immediately before and after every write.
Writes are batched, and the bitmap is updated once before each batch of
writes.
There does need to be another update to record that the write has
completed, but that is delayed and usually merged with the update at the
start of the next batch of writes.
So the bitmap is usually updated once per batch of writes.
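A back-of-the-envelope comparison in C, under an idealized assumption (the function name is hypothetical): the naive model pays two bitmap updates per data write, while the batched model pays one update per batch, with each "write completed" update always merged into the next batch's update, so only the final batch needs an extra one. Real md behaviour depends on timing, so the numbers are illustrative only.

```c
#include <stddef.h>

/* Idealized count of bitmap updates for a sequence of write batches. */
unsigned long bitmap_updates(const unsigned *batch_sizes, size_t nbatches,
                             int naive)
{
    unsigned long writes = 0;

    if (nbatches == 0)
        return 0;
    if (!naive)
        /* one update before each batch + one final "completed" update */
        return (unsigned long)nbatches + 1;
    for (size_t i = 0; i < nbatches; i++)
        writes += batch_sizes[i];
    /* naive: one update before and one after every data write */
    return 2 * writes;
}
```

For three batches of four writes each, the naive model gives 24 bitmap updates and the batched model gives 4.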

>
> So if I were to place it on a tmpfs, I could eliminate this problem only 
> at the expense of being unable to re-add drives after a reboot, right?.. 
> I've read somewhere that bitmaps only work correctly on ext2 or ext3 
> filesystems, but that probably means that it's not a good idea to put it 
> on a filesystem with delayed allocation like ext4 or zfs, otherwise I 
> don't understand why - and so I don't know if there would be any problem 
> with it running on tmpfs. Is there?..

You could create a ramdisk, create an ext2 filesystem on that, and put
the bitmap file there.

It probably would make sense to support in-memory bitmaps which never
get written to disk.  It would be fairly easy to do, and would allow
expedited re-add.  It just hasn't been done.
(patches welcome :-)

NeilBrown



* Re: query re: resync not persisting over reboot in rescue mode
From: Dan Kortschak @ 2016-10-28  5:47 UTC
  To: NeilBrown, linux-raid
In-Reply-To: <87zilprq55.fsf@notabene.neil.brown.name>

Thank you.

On Fri, 2016-10-28 at 16:36 +1100, NeilBrown wrote:
> On Mon, Oct 24 2016, Dan Kortschak wrote:
> 
> > 
> > First, I'll start with an apology - I have little (~nothing) in the way
> > of hard data to back up this question, but it is now just a matter of
> > personal interest as the problem appears to be fixed.
> > 
> > 
> > Background:
> > 
> > Last week I tried to upgrade a kernel (ubuntu distro 16.04) but that
> > failed due to lack of space (the system was originally built when
> > kernels were much smaller and I have now got to the point where it is
> > not possible to have two kernels on the /boot partition).
> > 
> > On reboot the boot failed with a great deal of disk noise which
> > persisted. Booting into recovery worked, and the noise was now
> > identifiable
> Disk noise shouldn't cause the boot to fail....

No. That was not the suggestion :).

Rather, I was able to tell what was causing the noise: the RAID is
quite noisy when resyncing, so the resyncing explained the noise,
rather than the noise explaining the failure.

> > 
> > as coming from a RAID resync of /dev/md0 (RAID10) after I dropped
> > into a root shell. I allowed this to complete and then went back to
> > resume the normal boot. This failed as before.
> > 
> > Repeating the process above, I found that the RAID was again unsynced
> > and doing a resync.
> > 
> > This has now resolved - I again allowed the resync to complete and then
> > did a /sbin/reboot from the CLI rather than going back to the recovery
> > menu and continuing the boot from the menu.
> > 
> > 
> > Question:
> > 
> > What is a/are likely cause(s) for a resync to not persist over a
> > reboot?
> If /sbin/reboot works and the menu-thing doesn't then the menu-thing
> must be doing something very seriously wrong.
> You might be able to get the resync status to be forgotten if you
> write something to the array and then quickly run "reboot -f -n", or
> maybe "echo b > /proc/sysrq-trigger".
> 
> Given that /sbin/reboot works, I suggest you report this to ubuntu as a
> bug.
> 

This is probably true, though I doubt there is much that can be done
from a report given the absence of data that I could provide to help
resolve the issue. I think this may just be "one of those things".

Thanks for your answer.


Dan




* Re: growing a RAID-10 array with mdadm 3.3.1+ ?
From: NeilBrown @ 2016-10-28  5:47 UTC
  To: moft, Anthony Youngman, linux-raid
In-Reply-To: <1476211037.1004490.752738017.68CF1E3E@webmail.messagingengine.com>


On Wed, Oct 12 2016, moft@fmailbox.com wrote:

> On Tue, Oct 11, 2016, at 11:29 AM, Anthony Youngman wrote:
>> Growing an array is pretty safe, but like anything here, it does have its dangers.
>> 
>> Second, what distro are you running? Is it a systemd-based distro?
>
> Opensuse. Yes.

Specifically, which openSUSE? What version of mdadm?

>
>> feeling is that systemd is "to blame". 
>
> I have no idea why that'd be the case.  That's the first time I've heard anybody suggest that.

Debian bug 840743 helped me see a possible reason.

In some cases mdadm needs to remain running in the background to monitor
the reshape.  Systemd doesn't like us to do that (it likes to kill
background tasks when you log off).
So on systemd installs we use a systemd service to run
  mdadm --grow --continue
There was a bug in mdadm 3.3.x, fixed in 3.4, which caused that mdadm to
fail (at least in some circumstances).
Then the reshape froze at the start.

>
>> > *CAN* I safely grow/expand it?
>> 
>> Bugs excepted - yes you should be able to, without problems.
>
> So growing 'far' layouts are now supported?  Do you have a
> reference/source for that?

No, 'far' layout RAID10 cannot be reshaped.  There are some messy issues
with making that work sensibly which I never bothered to resolve.

NeilBrown



* Re: query re: resync not persisting over reboot in rescue mode
From: NeilBrown @ 2016-10-28  5:36 UTC
  To: Dan Kortschak, linux-raid
In-Reply-To: <1477266640.7137.13.camel@adelaide.edu.au>


On Mon, Oct 24 2016, Dan Kortschak wrote:

> First, I'll start with an apology - I have little (~nothing) in the way
> of hard data to back up this question, but it is now just a matter of
> personal interest as the problem appears to be fixed.
>
>
> Background:
>
> Last week I tried to upgrade a kernel (ubuntu distro 16.04) but that
> failed due to out of space (the system was originally built when
> kernels were much smaller and I have now got to the point where it is
> not possible to have two kernels on the /boot partition).
>
> On reboot the boot failed with a great deal of disk noise which
> persisted. Booting into recovery worked, and the noise was now identifiable

Disk noise shouldn't cause the boot to fail....


> as coming from a RAID resync of /dev/md0 (RAID10) after I dropped
> into a root shell. I allowed this to complete and then went back to
> resume the normal boot. This failed as before.
>
> Repeating the process above, I found that the RAID was again unsynced
> and doing a resync.
>
> This has now resolved - I again allowed the resync to complete and then
> did a /sbin/reboot from the CLI rather than going back to the recovery
> menu and continuing the boot from the menu.
>
>
> Question:
>
> What is a/are likely cause(s) for a resync to not persist over a
> reboot?

If /sbin/reboot works and the menu-thing doesn't then the menu-thing
must be doing something very seriously wrong.
You might be able to get the resync status to be forgotten if you
write something to the array and then quickly run "reboot -f -n", or
maybe "echo b > /proc/sysrq-trigger".

Given that /sbin/reboot works, I suggest you report this to ubuntu as a
bug.

NeilBrown



* Re: Resync issue in RAID1
From: V @ 2016-10-28  5:33 UTC
  To: NeilBrown; +Cc: linux-raid
In-Reply-To: <8760odt93j.fsf@notabene.neil.brown.name>

Hi Neil,

Thanks for the response. But during this phase, why is the SCSI driver
complaining about a bad block number?

Oct 18 03:52:56  kernel: [  52.869378] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56  kernel: [  52.869414] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56  kernel: [  52.869436] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56  kernel: [  52.869465] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56  kernel: [  52.869503] sd 0:0:1:0: [sdb] Bad block
number requested

Thanks,
V

On Thu, Oct 27, 2016 at 9:01 PM, NeilBrown <neilb@suse.com> wrote:
> On Sat, Oct 22 2016, V wrote:
>
>> Hi,
>>
>> I am facing an issue during RAID1 resync. I have Ubuntu
>> 4.4.0-31-generic running with RAID1 configured with 2 disks as active
>> and 2 as spares. On the first power cycle after installing RAID, I see
>> the following messages in kern.log
>>
>>
>> My disks are configured with 4K sector size (both logical and
>> physical) (sda and sdb are active disks for this raid)
>>
>>
>> ===========
>> Oct 18 03:52:56  kernel: [   52.869113] md: using 128k window, over a
>> total of 51167104k.
>> Oct 18 03:52:56  kernel: [   52.869114] md: resuming resync of md2 from checkpoint.
>
> This line (above) combined with ...
>
>> Oct 18 03:52:56  kernel: [   52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3
>
> this line suggests that when you shut down, md had already started a
> resync, and it had checkpointed at block '3'.
>
> The subsequent errors are:
>
>> Oct 18 03:52:56  kernel: [   52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
>> Oct 18 03:52:56  kernel: [   52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
>> Oct 18 03:52:56  kernel: [   52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387
>
> which are every 128 blocks (aka sectors) from '3'.
> I know what caused that.  The patch below will stop it happening again.
>
> You might be able to get your array working again by stopping it
> and assembling with --update=resync.
> That will reset the checkpoint to 0.
>
> NeilBrown
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2cf0e1c00b9a..aa2ca23463f4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
>             mddev->curr_resync > 2) {
>                 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>                         if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
> -                               if (mddev->curr_resync >= mddev->recovery_cp) {
> +                               if (mddev->curr_resync >= mddev->recovery_cp &&
> +                                   mddev->curr_resync > 3) {
>                                         printk(KERN_INFO
>                                                "md: checkpointing %s of %s.\n",
>                                                desc, mdname(mddev));

* [PATCH] md: be careful not to leak internal curr_resync value into metadata.
From: NeilBrown @ 2016-10-28  4:59 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Linux-RAID, Viswesh

mddev->curr_resync usually records where the current resync is up to,
but during the starting phase it has some "magic" values.

 1 - means that the array is trying to start a resync, but has yielded
     to another array which shares physical devices, and also needs to
     start a resync
 2 - means the array is trying to start resync, but has found another
     array which shares physical devices and has already started resync.

 3 - means that resync has commenced, but it is possible that nothing
     has actually been resynced yet.

It is important that this value not be visible to user-space and
particularly that it doesn't get written to the metadata, as the
resync or recovery checkpoint.  In part, this is because it may be
slightly higher than the correct value, though this is very rare.
In part, this is because it is not a multiple of 4K, and some devices
only support 4K-aligned accesses.

There are two places where this value is propagated into either
->curr_resync_completed or ->recovery_cp or ->recovery_offset.
These currently avoid the propagation of values 1 and 2, but will
allow 3 to leak through.

Change them to only propagate the value if it is > 3.

As this can cause an array to fail, the patch is suitable for -stable.

Cc: stable@vger.kernel.org
Reported-by: Viswesh <viswesh.vichu@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/md.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index eac84d8ff724..18d0c4adbd7b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8144,14 +8144,14 @@ void md_do_sync(struct md_thread *thread)
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
-	    mddev->curr_resync > 2) {
+	    mddev->curr_resync > 3) {
 		mddev->curr_resync_completed = mddev->curr_resync;
 		sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 	}
 	mddev->pers->sync_request(mddev, max_sectors, &skipped);
 
 	if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
-	    mddev->curr_resync > 2) {
+	    mddev->curr_resync > 3) {
 		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
 			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
 				if (mddev->curr_resync >= mddev->recovery_cp) {
-- 
2.10.1

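The effect of the one-character change on the start-up values can be written as two predicates: the old `> 2` test already kept 1 and 2 internal but let 3 through, while `> 3` keeps all three internal. This is a minimal sketch; the enum names and functions are hypothetical, and in md.c the checks are inline comparisons on `mddev->curr_resync`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the start-up "magic" values of curr_resync. */
enum {
    RESYNC_YIELDED   = 1, /* yielded to another array sharing devices */
    RESYNC_WAITING   = 2, /* another array sharing devices is already resyncing */
    RESYNC_COMMENCED = 3, /* started, but possibly nothing synced yet */
};

/* Test before the patch: filters 1 and 2, so 3 can reach the metadata. */
static bool may_propagate_old(uint64_t curr_resync)
{
    return curr_resync > RESYNC_WAITING;
}

/* Test after the patch: every magic value (1, 2, 3) stays internal. */
static bool may_propagate_new(uint64_t curr_resync)
{
    return curr_resync > RESYNC_COMMENCED;
}
```

Only a real position (4 or more) now reaches `curr_resync_completed`, `recovery_cp` or `recovery_offset`.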

* Re: Resync issue in RAID1
From: NeilBrown @ 2016-10-28  4:01 UTC (permalink / raw)
  To: V, linux-raid
In-Reply-To: <CAF9xHmRCKzBXE5_tdqNY9BJJn7LyC8bOv8OaFXpxc1BvGUTCeQ@mail.gmail.com>

On Sat, Oct 22 2016, V wrote:

> Hi,
>
> I am facing an issue during RAID1 resync. I have Ubuntu
> 4.4.0-31-generic running with raid1 configured with 2 disks as active
> and 2 as spares. On the first power cycle after installing RAID, I see
> the following messages in kern.log:
>
>
> My disks are configured with 4K sector size (both logical and
> physical) (sda and sdb are active disks for this raid)
>
>
> ===========
> Oct 18 03:52:56  kernel: [   52.869113] md: using 128k window, over a total of 51167104k.
> Oct 18 03:52:56  kernel: [   52.869114] md: resuming resync of md2 from checkpoint.

This line (above) combined with ...

> Oct 18 03:52:56  kernel: [   52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3

this line suggests that when you shut down, md had already started a
resync, and it had checkpointed at block '3'.

The subsequent errors are:

> Oct 18 03:52:56  kernel: [   52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
> Oct 18 03:52:56  kernel: [   52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
> Oct 18 03:52:56  kernel: [   52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387

which are every 128 blocks (aka sectors) from '3'.
I know what caused that.  The patch below will stop it happening again.
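The spacing of the failing blocks in the log quoted above is simple arithmetic. Assuming raid1 resyncs in 128-sector (64 KiB) windows, which matches the 128-block spacing of the errors, window i starts at checkpoint + 128 * i. Because 128 is a multiple of 8 (the number of 512-byte sectors in a 4K block), a checkpoint of 3 leaves every window misaligned on a 4K device, so every read fails, not just the first. The names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed resync window: 128 sectors of 512 bytes (64 KiB), matching the
 * 128-block spacing of the errors in the log. */
#define RESYNC_SECTORS 128u

/* First sector of the i-th resync window after resuming from 'checkpoint'. */
static uint64_t resync_window_start(uint64_t checkpoint, unsigned i)
{
    return checkpoint + (uint64_t)i * RESYNC_SECTORS;
}

/* True if that window starts on a logical-block boundary, where 'per_block'
 * is the number of 512-byte sectors per logical block (8 for 4K devices).
 * Since RESYNC_SECTORS % 8 == 0, the answer for a 4K device depends only on
 * 'checkpoint', never on 'i'. */
static bool window_aligned(uint64_t checkpoint, unsigned i, unsigned per_block)
{
    assert(per_block > 0);
    return resync_window_start(checkpoint, i) % per_block == 0;
}
```

For checkpoint 3 the first four windows start at 3, 131, 259 and 387, exactly the blocks in the error messages.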

You might be able to get your array working again by stopping it
and assembling with --update=resync.
That will reset the checkpoint to 0.

NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2cf0e1c00b9a..aa2ca23463f4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
 	    mddev->curr_resync > 2) {
 		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
 			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
-				if (mddev->curr_resync >= mddev->recovery_cp) {
+				if (mddev->curr_resync >= mddev->recovery_cp &&
+				    mddev->curr_resync > 3) {
 					printk(KERN_INFO
 					       "md: checkpointing %s of %s.\n",
 					       desc, mdname(mddev));

* Re: [PATCH] md/raid5: write an empty meta-block when creating log super-block
From: Shaohua Li @ 2016-10-27 22:21 UTC (permalink / raw)
  To: Zhengyuan Liu; +Cc: Song Liu, linux-raid, liuzhengyuang521
In-Reply-To: <tencent_647E309D0106DEBE08875B26@qq.com>

On Thu, Oct 27, 2016 at 10:05:06PM +0800, Zhengyuan Liu wrote:
> Sorry for the unclear explanation.
> 
> The log might look like this before we do a recovery:
> | mb1 | mb2 | mb3  |           |            |           | 
>  last_checkpoint = mb1's position,    last_cp_seq = mb1's seq
> After we do a recovery (we write an empty meta block, emb, at the log tail):
> | mb1 | mb2 | mb3 |  emb |            |           | 
> last_checkpoint = emb's position,  last_cp_seq = mb1's seq + 11
> Then we write two meta blocks and suppose a crash happens:
> | mb1 | mb2 | mb3 |  emb |  mb4 | mb5 | 
> last_checkpoint = emb's position,  last_cp_seq = mb1's seq + 11
> Now we do another recovery after restart and suppose mb4 is invalid: 
> | mb1 | mb2 | mb3 |  emb |  mb4 | mb5 | 
> last_checkpoint = emb's position,  last_cp_seq = mb1's seq + 11
> Since mb4 is invalid, we stop before recovering mb5, which should be discarded.
> After recovery, log_start points to mb4, and we don't write an empty meta block
> because the condition "ctx.seq > log->last_cp_seq + 1" isn't satisfied. If we then
> write a valid meta block and a crash happens again, the new meta block will land in
> mb4's position, and the recovery process may then recover mb5, since its seq
> matches.
> 
> What I am trying to say is that if the first meta block we wrote (not only a middle
> one) is invalid, log recovery can cause problems too. I think the condition for
> writing an empty meta block should look like this:
>     - if (ctx.seq > log->last_cp_seq + 1) {
>     + if (ctx.seq > log->last_cp_seq) {

Got it, thanks! That's correct: as long as we recover one block, we should
rewrite the empty meta block. I'll queue a patch for this.

Thanks,
Shaohua
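Zhengyuan's scenario can be replayed in a toy model of the journal. This is a sketch under stated assumptions, not raid5-cache.c: recovery is assumed to start at the checkpoint slot expecting `last_cp_seq`, to accept a meta block only if it is valid and carries the next expected sequence number, and to stop at the first mismatch. The struct, the slot layout and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the journal: one meta block per slot in a ring. */
struct meta_block {
    bool valid;              /* checksum verified */
    unsigned long long seq;  /* sequence number stored in the block */
};

/* Replay from the checkpoint slot while each block is valid and carries the
 * next expected sequence number. Returns the first sequence number that was
 * NOT found (the model's ctx.seq) and stores the slot where replay stopped. */
static unsigned long long recover(const struct meta_block *ring, int nslots,
                                  int start, unsigned long long cp_seq,
                                  int *stop_slot)
{
    assert(nslots > 0 && start >= 0 && start < nslots);
    unsigned long long seq = cp_seq;
    int slot = start;
    for (int i = 0; i < nslots; i++) {
        if (!ring[slot].valid || ring[slot].seq != seq)
            break;
        seq++;
        slot = (slot + 1) % nslots;
    }
    *stop_slot = slot;
    return seq;
}

/* Condition before the fix: requires at least two replayed blocks. */
static bool needs_empty_meta_old(unsigned long long ctx_seq, unsigned long long cp_seq)
{
    return ctx_seq > cp_seq + 1;
}

/* Condition after the fix: one replayed block is enough. */
static bool needs_empty_meta_new(unsigned long long ctx_seq, unsigned long long cp_seq)
{
    return ctx_seq > cp_seq;
}
```

With slot 0 holding emb (seq S, valid), slot 1 a torn mb4 and slot 2 a stale but valid mb5 (seq S+2), recovery returns S+1 and stops at slot 1: the old test says no empty block is needed, the new one says it is. If the next valid block (seq S+1) is then written into slot 1 and recovery runs again, the model replays straight through the stale mb5 and returns S+3, which is the failure described above.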


* Re: [PATCH v2] raid5: revert commit 11367799f3d1
From: Shaohua Li @ 2016-10-27 22:05 UTC (permalink / raw)
  To: Tomasz Majchrzak; +Cc: linux-raid
In-Reply-To: <1477466439-1797-1-git-send-email-tomasz.majchrzak@intel.com>

On Wed, Oct 26, 2016 at 09:20:39AM +0200, Tomasz Majchrzak wrote:
> Revert commit 11367799f3d1 ("md: Prevent IO hold during accessing to faulty
> raid5 array") as it doesn't comply with commit c3cce6cda162 ("md/raid5:
> ensure device failure recorded before write request returns."). That change
> is not required anymore as the problem is resolved by commit 16f889499a52
> ("md: report 'write_pending' state when array in sync") - read request is
> stuck as array state is not reported correctly via sysfs attribute.
> 
> Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>

Applied.

* Re: [PATCH] md: wake up personality thread after array state update
From: Shaohua Li @ 2016-10-27 22:02 UTC (permalink / raw)
  To: Tomasz Majchrzak; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <20161027085206.GA11138@proton.igk.intel.com>

On Thu, Oct 27, 2016 at 10:52:06AM +0200, Tomasz Majchrzak wrote:
> On Wed, Oct 26, 2016 at 12:14:55PM -0700, Shaohua Li wrote:
> > On Tue, Oct 25, 2016 at 05:07:08PM +0200, Tomasz Majchrzak wrote:
> > > When a raid1/raid10 array fails to write to one of the drives, the request
> > > is added to bio_end_io_list and finished by the personality thread. The
> > > thread doesn't handle it as long as the MD_CHANGE_PENDING flag is set. In
> > > the case of external metadata this flag is cleared, but the thread is
> > > not woken up. This causes the request to be blocked for a few seconds (until
> > > another action on the array wakes up the thread) or to get stuck
> > > indefinitely.
> > > 
> > > Wake up the personality thread once MD_CHANGE_PENDING has been cleared.
> > > Moving the 'restart_array' call after the flag is cleared is not a solution
> > > because in read-write mode the call doesn't wake up the thread.
> > 
> > The patch looks good. However, can you elaborate on how userspace handles this case?
> > I'd like to understand what the user interface should be to support external
> > metadata arrays.
> 
> 1. The kernel encounters a new bad block that needs to be acknowledged.
> 
> 	sysfs array state == "write-pending" (as MD_CHANGE_PENDING set)
> 	sysfs rdev state == "blocked" (as unacked_exists + external_bbl set)
> 
> 2. mdmon wakes up because there is an update to the sysfs array state and the
> unacknowledged bad blocks list.
> 
> 3. mdmon checks the state of each disk. If any is 'blocked' and the metadata
> supports bad blocks, it reads the unacknowledged bad block list and records
> the new bad blocks in the metadata. If successful, it acknowledges the bad
> blocks by writing to the sysfs bad block file. If all bad blocks have been
> acknowledged, it schedules a disk unblock.
> 
> As soon as the kernel marks all bad blocks as acknowledged, it clears the
> unacked_exists flag.
> 
> 4. mdmon checks the 'faulty' flag for each disk. If it is set, the disk is
> removed from the array and an unblock is scheduled.
> 
> 5. mdmon requests an unblock by writing '-blocked' to the sysfs disk
> state.
> 
> Requests awaiting bad block confirmation are woken up in the kernel.

Why this step? Step 3 writes the bad block file, which already wakes up threads
waiting for bad block confirmation.

> 6. mdmon writes 'active' to the sysfs array state.
> 
> The MD_CHANGE_PENDING flag is cleared by this step, but the personality thread
> is not woken up. The patch resolves this problem.
> 
> I hope this answers your question.

This is clear, thanks! I applied this patch.

Thanks,
Shaohua
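Seen from mdmon's side, the last two steps of the sequence described above are plain sysfs writes. The sketch below covers only steps 5 and 6, assuming the usual md sysfs layout (`/sys/block/<md>/md/array_state` and `/sys/block/<md>/md/dev-<member>/state`); the helper names are hypothetical and this is not mdmon's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the path of an md sysfs attribute. With dev == NULL it is an array
 * attribute such as "array_state"; otherwise a per-member attribute such as
 * "state" under dev-<dev>/. Returns 0 on success, -1 if 'buf' is too small. */
static int md_attr_path(char *buf, size_t len, const char *md,
                        const char *dev, const char *attr)
{
    assert(buf != NULL && md != NULL && attr != NULL);
    int n = dev ? snprintf(buf, len, "/sys/block/%s/md/dev-%s/%s", md, dev, attr)
                : snprintf(buf, len, "/sys/block/%s/md/%s", md, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one value to a sysfs file. Returns 0 on success, -1 on any error. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    size_t n = strlen(value);
    int ok = fwrite(value, 1, n, f) == n;
    /* stdio buffers the write, so a value rejected by sysfs may only show up
     * as an fclose() failure. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Step 5: unblock the member. Step 6: mark the array 'active', the write that
 * clears MD_CHANGE_PENDING and, with the patch, wakes the personality thread. */
static int unblock_and_activate(const char *md, const char *dev)
{
    char path[256];
    if (md_attr_path(path, sizeof path, md, dev, "state") != 0 ||
        write_attr(path, "-blocked") != 0)
        return -1;
    if (md_attr_path(path, sizeof path, md, NULL, "array_state") != 0 ||
        write_attr(path, "active") != 0)
        return -1;
    return 0;
}
```

For example, `unblock_and_activate("md127", "sda")` would write to `/sys/block/md127/md/dev-sda/state` and then `/sys/block/md127/md/array_state`; it needs root and an existing array.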

* Re: recovering failed raid5
From: Robin Hill @ 2016-10-27 20:34 UTC (permalink / raw)
  To: Alexander Shenkin; +Cc: linux-raid
In-Reply-To: <CAM97BgQLPUN=t7VKHaVG-=SrJmS_tvaxGDbo3yqMwHm8B-do_Q@mail.gmail.com>

On Thu Oct 27, 2016 at 04:06:14PM +0100, Alexander Shenkin wrote:

> Hello all,
> 
> A RAID newbie here - apologies in advance for any bonehead questions.
> I have 4 3TB disks that participate in 3 raid arrays.  md2 is failing.
> I'm hoping someone here might be able to give me pointers for the
> right direction to take to avoid data loss, if possible, and recover
> the array.
> 
<- snip ->
> SCT Error Recovery Control command not supported
> SCT Error Recovery Control command not supported
> SCT Error Recovery Control command not supported
> SCT Error Recovery Control command not supported

Others have already chimed in about the recovery process. I just wanted
to make sure you were aware that none of your drives support
TLER/SCTERC. See https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
for details on why this is an issue and how to work around it. You'll
need to do this before attempting any sort of recovery or you're likely
to run into further problems.

Cheers,
    Robin
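The Timeout_Mismatch argument, as I understand the wiki page, reduces to one comparison: the drive must give up on a bad sector before the kernel gives up on the drive. The kernel's default SCSI command timeout is 30 seconds (the value in `/sys/block/<dev>/device/timeout`). Drives with SCT ERC are commonly set to 7 seconds (70 deciseconds), while drives without it may retry for minutes, for which the wiki suggests raising the kernel timeout to around 180 seconds. The 180-second figure and the function below are assumptions for illustration, not code from the wiki.

```c
#include <assert.h>
#include <stdbool.h>

/* Default kernel SCSI command timeout, in seconds. */
#define KERNEL_DEFAULT_TIMEOUT_S 30u
/* Kernel timeout assumed safe for drives without SCT ERC, in seconds. */
#define NO_ERC_SAFE_TIMEOUT_S 180u

/* erc_ds:    the drive's SCT ERC read limit in deciseconds (0 = unsupported
 *            or disabled, e.g. "SCT Error Recovery Control command not
 *            supported").
 * timeout_s: the kernel's command timeout for the device, in seconds.
 * Returns true if the kernel may time out before the drive stops retrying. */
static bool timeout_mismatch(unsigned erc_ds, unsigned timeout_s)
{
    if (erc_ds == 0) /* the drive may retry internally for minutes */
        return timeout_s < NO_ERC_SAFE_TIMEOUT_S;
    return erc_ds >= timeout_s * 10u; /* compare both limits in deciseconds */
}
```

All four drives in this thread report no SCT ERC support, so with the default 30-second timeout each of them counts as a mismatch under this model.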

* Re: recovering failed raid5
From: Roman Mamedov @ 2016-10-27 16:26 UTC (permalink / raw)
  To: Alexander Shenkin; +Cc: linux-raid
In-Reply-To: <CAM97BgQLPUN=t7VKHaVG-=SrJmS_tvaxGDbo3yqMwHm8B-do_Q@mail.gmail.com>

On Thu, 27 Oct 2016 16:06:14 +0100
Alexander Shenkin <al@shenkin.org> wrote:

> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda 7200.14 (AF)
> Device Model:     ST3000DM001-9YN166

That's the horror drive of doom with 30% failure rates within a couple of
years https://www.backblaze.com/blog/3tb-hard-drive-failure/

It's even got its own Wikipedia article by now:
https://en.wikipedia.org/wiki/ST3000DM001

This Russian article dissects what actually causes the failures -- poor
dust-proofing of the platters area: https://habrahabr.ru/post/251941/

I hope you didn't seriously go out and buy one more of that same model to
replace the failed one.

-- 
With respect,
Roman

* Re: recovering failed raid5
From: Andreas Klauer @ 2016-10-27 16:04 UTC (permalink / raw)
  To: Alexander Shenkin; +Cc: linux-raid
In-Reply-To: <CAM97BgQLPUN=t7VKHaVG-=SrJmS_tvaxGDbo3yqMwHm8B-do_Q@mail.gmail.com>

On Thu, Oct 27, 2016 at 04:06:14PM +0100, Alexander Shenkin wrote:
> md2: raid5 mounted on /, via sd[abcd]3

Two failed disks...

> md0: raid1 mounted on /boot, via sd[abcd]1

Actually only two disks are active in that one; the other two are spares.
It hardly matters for /boot, but you could grow it to a 4-disk raid1.
Spares are not useful.

> My sdb was recently reporting problems.  Instead of second guessing
> those problems, I just got a new disk, replaced it, and added it to
> the arrays.

Replacing right away is the right thing to do.
Unfortunately it seems you have another disk that is broken too.

> 2) smartctl (disabled on drives - can enable once back up.  should I?)
> note: SMART only enabled after problems started cropping up.

But... why? Why disable SMART? And if you do, is it a surprise that you
only notice disk failures when it's already too late?

You should enable SMART, and not only that: also run regular selftests,
have smartd running, and have it send you mail when something happens.
The same goes for RAID checks: they are at least something, but they
won't tell you how many reallocated sectors your drive has.

> root@machinename:/home/username# smartctl --xall /dev/sda

Looks fine but never ran a selftest.

> root@machinename:/home/username# smartctl --xall /dev/sdb

Looks new. (New drives need selftests too.)

> root@machinename:/home/username# smartctl --xall /dev/sdc
> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-39-generic] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda 7200.14 (AF)
> Device Model:     ST3000DM001-1CH166
> Serial Number:    W1F1N909
>
> 197 Current_Pending_Sector  -O--C-   100   100   000    -    8
> 198 Offline_Uncorrectable   ----C-   100   100   000    -    8

This one is faulty and probably the reason why your resync failed.
You have no redundancy left, so one option here would be to get a
new drive and ddrescue it over.

That's exactly the kind of thing you should be notified about instantly
via mail. And it should be discovered when running selftests:
without a full surface scan of the media, the disk itself won't know.
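Andreas' rule of thumb in this thread (replace a drive at its first reallocated, pending or offline-uncorrectable sector) can be checked mechanically against the `smartctl --xall` attribute table. A minimal sketch, assuming the column layout shown in this thread (ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE) and keeping only the leading number of RAW_VALUE; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One row of the smartctl attribute table (only the columns used here). */
struct smart_attr {
    int id;
    char name[64];
    unsigned long long raw; /* leading number of RAW_VALUE */
};

/* Parse a row such as
 * "197 Current_Pending_Sector  -O--C-   100   100   000    -    8".
 * Returns true if all eight columns were read. */
static bool parse_smart_row(const char *line, struct smart_attr *out)
{
    assert(line != NULL && out != NULL);
    char flags[16], fail[16];
    int value, worst, thresh;
    int n = sscanf(line, "%d %63s %15s %d %d %d %15s %llu",
                   &out->id, out->name, flags, &value, &worst, &thresh,
                   fail, &out->raw);
    return n == 8;
}

/* Reallocated (5), pending (197) or offline uncorrectable (198) sectors:
 * any non-zero raw count condemns the drive under this rule of thumb. */
static bool attr_condemns_drive(const struct smart_attr *a)
{
    return (a->id == 5 || a->id == 197 || a->id == 198) && a->raw > 0;
}
```

Fed the sdc rows above (197 and 198 both at 8), the check flags the drive; sda's rows (all zero) pass.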

> ==> WARNING: A firmware update for this drive may be available,
> see the following Seagate web pages:
> http://knowledge.seagate.com/articles/en_US/FAQ/207931en
> http://knowledge.seagate.com/articles/en_US/FAQ/223651en

About this, *shrug*
I don't have these drives, you might want to check that out.
But it probably won't fix bad sectors.

> root@machinename:/home/username# smartctl --xall /dev/sdd

Some strange things in the error log here, but old.
Still, same as for all others - selftest.

> ################### mdadm --examine ###########################
> 
> /dev/sda1:
>      Raid Level : raid1
>    Raid Devices : 2

A RAID 1 with two drives, could be four.

> /dev/sdb1:
> /dev/sdc1:

So these would also have data instead of being spare.

> /dev/sda3:
>      Raid Level : raid5
>    Raid Devices : 4
> 
>     Update Time : Mon Oct 24 09:02:52 2016
>          Events : 53547
> 
>    Device Role : Active device 0
>    Array State : A..A ('A' == active, '.' == missing)

RAID-5 with two failed disks.

> /dev/sdc3:
>      Raid Level : raid5
>    Raid Devices : 4
> 
>     Update Time : Mon Oct 24 08:53:57 2016
>          Events : 53539
> 
>    Device Role : Active device 2
>    Array State : AAAA ('A' == active, '.' == missing)

This one failed, 8:53.

> ############ /proc/mdstat ############################################
> 
> md2 : active raid5 sda3[0] sdc3[2](F) sdd3[3]
>       8760565248 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2]
> [U__U]

[U__U] refers to device roles as in [0123],
so device roles 0 and 3 are okay, and 1 and 2 are missing.
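Reading the role map is mechanical: each position inside the brackets is a device role, 'U' means up and '_' means missing. A sketch with hypothetical helpers that extracts the missing roles and applies the RAID5 rule that at most one member may be absent:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Given a /proc/mdstat role map such as "[U__U]", store the indices of the
 * missing roles ('_') in 'missing' (up to 'max' entries) and return how many
 * roles are missing, or -1 if the string is malformed. */
static int missing_roles(const char *status, int *missing, int max)
{
    assert(status != NULL);
    if (status[0] != '[')
        return -1;
    int count = 0;
    for (int role = 0; status[role + 1] != ']'; role++) {
        char c = status[role + 1];
        if (c == '\0')
            return -1; /* no closing bracket */
        if (c == '_') {
            if (count < max)
                missing[count] = role;
            count++;
        } else if (c != 'U') {
            return -1;
        }
    }
    return count;
}

/* RAID5 keeps running with at most one missing member. */
static bool raid5_can_run(int n_missing)
{
    return n_missing >= 0 && n_missing <= 1;
}
```

For "[U__U]" this reports roles 1 and 2 as missing, and two missing members are more than RAID5 can survive.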

> md0 : active raid1 sdb1[4](S) sdc1[2](S) sda1[0] sdd1[3]
>       1950656 blocks super 1.2 [2/2] [UU]

Those two spares again, could be [UUUU] instead.

tl;dr
stop it all,
ddrescue /dev/sdc to your new disk,
try your luck with --assemble --force (not using /dev/sdc!),
get yet another new disk, add, sync, cross fingers.

There's also mdadm --replace instead of --remove and --add,
which sometimes helps if there are only a few bad sectors
on each disk. If the disk you removed had not already been
kicked from the array by the time you replaced it,
--replace might have avoided this problem.

But good disk monitoring and testing is even more important.

Regards
Andreas Klauer

* recovering failed raid5
From: Alexander Shenkin @ 2016-10-27 15:06 UTC (permalink / raw)
  To: linux-raid

Hello all,

A RAID newbie here - apologies in advance for any bonehead questions.
I have 4 3TB disks that participate in 3 raid arrays.  md2 is failing.
I'm hoping someone here might be able to give me pointers for the
right direction to take to avoid data loss, if possible, and recover
the array.

md2: raid5 mounted on /, via sd[abcd]3
md3: raid10 are swap, via sd[abcd]4
md0: raid1 mounted on /boot, via sd[abcd]1
sd[abcd]2 are used for bios_grub

My sdb was recently reporting problems.  Instead of second guessing
those problems, I just got a new disk, replaced it, and added it to
the arrays.  /dev/md3 synced in the new device just fine.  But md0 and
md2 were showing it as spare (S).  When I tried to remove and re-add
sdb3 to md2, sdc3 started acting up, leading to an error message when
adding to the md2 array:

username@machinename:~$ sudo mdadm --manage /dev/md2 --add /dev/sdb3
sudo: unable to open /var/lib/sudo/username/1: No such file or directory
mdadm: /dev/md2 has failed so using --add cannot work and might destroy
mdadm: data on /dev/sdb3.  You should stop the array and re-assemble it.

Following the wiki
(https://raid.wiki.kernel.org/index.php/Linux_Raid), I've included
relevant information below.  My system has marked / as read-only at
this point, so I'm not able to install lsdrv.

The Event counts in the raid5 array (md2) are all quite similar:
         Events : 53547
         Events : 53539
         Events : 53547

Perhaps that means I can try "mdadm --assemble --force /dev/md2
/dev/sd[acd]3"?  I wanted to check with the experts, however, before
moving forward.
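The event counts above can be compared mechanically: a member whose Events value is below the highest one stopped receiving superblock updates earlier. A small sketch with hypothetical helpers (mdadm's own --force logic is more involved and is not modelled here), using the three counts in the order listed:

```c
#include <assert.h>
#include <stddef.h>

/* Highest event count among n members (n > 0). */
static unsigned long max_events(const unsigned long *events, size_t n)
{
    assert(events != NULL && n > 0);
    unsigned long max = events[0];
    for (size_t i = 1; i < n; i++)
        if (events[i] > max)
            max = events[i];
    return max;
}

/* Store the indices of members that lag the freshest one in 'lagging' (room
 * for n entries) and return how many there are. */
static size_t lagging_members(const unsigned long *events, size_t n, size_t *lagging)
{
    unsigned long max = max_events(events, n);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (events[i] < max)
            lagging[count++] = i;
    return count;
}
```

For 53547, 53539 and 53547 this reports one lagging member, eight events behind; according to the --examine output quoted in Andreas' reply, the member at 53539 is sdc3.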

Thanks,
Allie


table of contents:
1) mdadm.conf
2) smartctl (disabled on drives - can enable once back up.  should I?)
3) mdadm --examine
4) cat /proc/mdstat
5) parted
6) dmesg output is available here: http://pastebin.com/YRz2Lxr1

################## mdadm.conf ###############################

root@machinename:/home/username# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR machinename@shenkin.org

# definitions of existing MD arrays
ARRAY /dev/md/0 metadata=1.2 UUID=437e4abb:c7ac46f1:ef8b2976:94921060 name=arrayname:0
   spares=2
ARRAY /dev/md/2 metadata=1.2 UUID=6426779d:5a08badf:9958e59e:2ded49d5 name=arrayname:2
ARRAY /dev/md/3 metadata=1.2 UUID=dd78a43e:92699c27:3dc5489d:91d93bb2 name=arrayname:3

# This file was auto-generated on Sat, 12 Dec 2015 09:37:39 +0000
# by mkconf $Id$


########################## smartctl ##########################

note: SMART only enabled after problems started cropping up.

root@machinename:/home/username# smartctl --xall /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-39-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-9YN166
Serial Number:    Z1F13FBA
LU WWN Device Id: 5 000c50 04e444ab1
Firmware Version: CC4B
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct 26 11:14:12 2016 BST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) XXX Error Recovery Control Command failed: scsi error aborted command
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  592) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 335) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   114   099   006    -    81641176
  3 Spin_Up_Time            PO----   092   092   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    32
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   072   060   030    -    17221506
  9 Power_On_Hours          -O--CK   092   092   000    -    7180
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    49
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0 0 0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   051   049   045    -    49 (Min/Max 48/50)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    43
193 Load_Cycle_Count        -O--CK   100   100   000    -    70
194 Temperature_Celsius     -O---K   049   051   000    -    49 (0 15 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    - 1783h+53m+16.383s
241 Total_LBAs_Written      ------   100   253   000    -    837415739719
242 Total_LBAs_Read         ------   100   253   000    -    121956855490474
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    4496  Device vendor specific log
0xa8       GPL,SL  VS      20  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    5067  Device vendor specific log
0xbd       GPL     VS     512  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

root@machinename:/home/username# smartctl --xall /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-39-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z502SGLG
LU WWN Device Id: 5 000c50 090cfc6d8
Firmware Version: CC26
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct 26 11:14:16 2016 BST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) XXX Error Recovery Control Command failed: scsi error aborted command
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   80) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 318) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   100   100   006    -    155392
  3 Spin_Up_Time            PO----   100   100   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    1
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   100   253   030    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    50
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    1
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0 0 0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   050   050   045    -    50 (Min/Max 17/50)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    1
193 Load_Cycle_Count        -O--CK   100   100   000    -    1
194 Temperature_Celsius     -O---K   050   050   000    -    50 (0 17 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    50h+33m+15.470s
241 Total_LBAs_Written      ------   100   253   000    -    15979162
242 Total_LBAs_Read         ------   100   253   000    -    19700
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    4496  Device vendor specific log
0xa8       GPL,SL  VS     129  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    5176  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      10  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

root@machinename:/home/username# smartctl --xall /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-39-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F1N909
LU WWN Device Id: 5 000c50 05ce3c3a2
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct 26 11:14:18 2016 BST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) XXX Error Recovery Control Command failed: scsi error aborted command
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  592) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 324) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   107   099   006    -    13439162
  3 Spin_Up_Time            PO----   093   093   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    30
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   060   060   030    -    42962048780
  9 Power_On_Hours          -O--CK   092   092   000    -    7356
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    47
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   098   098   000    -    2
188 Command_Timeout         -O--CK   100   100   000    -    0 0 0
189 High_Fly_Writes         -O-RCK   098   098   000    -    2
190 Airflow_Temperature_Cel -O---K   049   049   045    -    51 (Min/Max 47/51)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    42
193 Load_Cycle_Count        -O--CK   100   100   000    -    89
194 Temperature_Celsius     -O---K   051   051   000    -    51 (0 12 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    8
198 Offline_Uncorrectable   ----C-   100   100   000    -    8
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    - 1793h+59m+25.776s
241 Total_LBAs_Written      ------   100   253   000    -    5375768743
242 Total_LBAs_Read         ------   100   253   000    -    80575126377
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    4496  Device vendor specific log
0xa8       GPL,SL  VS     129  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    5176  Device vendor specific log
0xbd       GPL     VS     512  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      10  Device vendor specific log
0xc4       GPL,SL  VS       5  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 2
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 [1] occurred at disk power-on lifetime: 7306 hours (304 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 3f ad 38 00 00  Error: UNC at LBA = 0x003fad38 = 4173112

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 00 38 00 00 00 3f af 30 40 00     00:13:13.200  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 3f ae b0 40 00     00:13:13.200  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 3f ae 30 40 00     00:13:13.200  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 3f ad b0 40 00     00:13:13.200  READ FPDMA QUEUED
  60 00 00 04 c0 00 00 00 3f b0 00 40 00     00:13:13.200  READ FPDMA QUEUED

Error 1 [0] occurred at disk power-on lifetime: 7306 hours (304 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 3f ad 38 00 00  Error: UNC at LBA = 0x003fad38 = 4173112

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 02 a8 00 00 00 3f ac c0 40 00     00:13:09.845  READ FPDMA QUEUED
  60 00 00 04 f0 00 00 00 3f a7 d0 40 00     00:13:09.842  READ FPDMA QUEUED
  60 00 00 00 b8 00 00 00 3f a7 18 40 00     00:13:09.812  READ FPDMA QUEUED
  60 00 00 05 40 00 00 00 3f a1 d8 40 00     00:13:09.812  READ FPDMA QUEUED
  60 00 00 00 08 00 00 00 3f a1 c8 40 00     00:13:09.812  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

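Both UNC entries in sdc's error log point at the same sector, which smartctl prints once as raw register bytes and once as hex and decimal. Judging from the entries in this log, the LBA is simply the six bytes of the LBA_48, LH, LM and LL columns read left to right, most significant byte first. A minimal POSIX shell sketch of that conversion follows; `lba_from_regs` is a hypothetical helper, not a smartmontools command, and the byte order is inferred from this output rather than taken from the smartctl documentation.

```shell
#!/bin/sh
# Sketch (hypothetical helper): rebuild the failing LBA from the register
# row of a smartctl "Error: UNC" entry.
# Assumption, inferred from this log: the three LBA_48 bytes followed by
# LH, LM and LL, read left to right, form the LBA most significant byte first.
lba_from_regs() {
    if [ "$#" -ne 6 ]; then
        echo "usage: lba_from_regs LBA48_1 LBA48_2 LBA48_3 LH LM LL" >&2
        return 1
    fi
    # Join the hex bytes into one number; printf accepts 0x-prefixed values.
    hex="0x$1$2$3$4$5$6"
    # Print in the same "0x........ = decimal" form smartctl uses.
    printf '0x%08x = %d\n' "$hex" "$hex"
}
```

For example, `lba_from_regs 00 00 00 3f ad 38`, taken from the register row of both sdc errors, prints `0x003fad38 = 4173112`, matching what smartctl reports.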
root@machinename:/home/username# smartctl --xall /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-39-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-9YN166
Serial Number:    S1F0HLY4
LU WWN Device Id: 5 000c50 0513b85c1
Firmware Version: CC9F
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct 26 11:14:19 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) XXX Error Recovery Control Command failed: scsi error aborted command
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  575) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 331) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3081) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   114   099   006    -    76568208
  3 Spin_Up_Time            PO----   092   092   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    37
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   071   060   030    -    13538199
  9 Power_On_Hours          -O--CK   092   092   000    -    7083
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    54
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   053   053   000    -    47
188 Command_Timeout         -O--CK   100   100   000    -    0 0 0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   058   058   045    -    42 (Min/Max 41/42)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    45
193 Load_Cycle_Count        -O--CK   100   100   000    -    77
194 Temperature_Celsius     -O---K   042   042   000    -    42 (0 14 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    - 1775h+53m+38.823s
241 Total_LBAs_Written      ------   100   253   000    -    828557670290
242 Total_LBAs_Read         ------   100   253   000    -    52298143648302
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    4496  Device vendor specific log
0xa8       GPL,SL  VS      20  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    5067  Device vendor specific log
0xbd       GPL     VS     512  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 47 (device log contains only the most recent 20 errors)
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 47 [6] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df c2 c0 00 00  Error: UNC at LBA = 0xf4dfc2c0 = 4108305088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 00 08 00 00 cf 15 5c d0 40 00 21d+03:57:37.019  READ FPDMA QUEUED
  61 00 00 00 08 00 00 cd 15 33 d8 40 00 21d+03:57:37.018  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 cd 15 33 a8 40 00 21d+03:57:37.018  WRITE FPDMA QUEUED
  60 00 00 00 28 00 00 ae 41 4c a0 40 00 21d+03:57:37.018  READ FPDMA QUEUED
  60 00 00 00 20 00 00 f4 df c4 40 40 00 21d+03:57:37.018  READ FPDMA QUEUED

Error 46 [5] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df c2 c0 00 00  Error: UNC at LBA = 0xf4dfc2c0 = 4108305088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 00 80 00 00 f4 df b0 a0 40 00 21d+03:57:33.737  READ FPDMA QUEUED
  61 00 00 00 80 00 00 f4 df b0 a0 40 00 21d+03:57:33.736  WRITE FPDMA QUEUED
  60 00 00 02 38 00 00 f4 df c4 60 40 00 21d+03:57:33.736  READ FPDMA QUEUED
  60 00 00 05 40 00 00 f4 df bf 20 40 00 21d+03:57:33.736  READ FPDMA QUEUED
  61 00 00 00 80 00 00 f4 df b0 a0 40 00 21d+03:57:33.734  WRITE FPDMA QUEUED

Error 45 [4] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df b0 a0 00 00  Error: UNC at LBA = 0xf4dfb0a0 = 4108300448

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 03 d0 00 00 f4 df b7 10 40 00 21d+03:57:30.735  READ FPDMA QUEUED
  60 00 00 00 30 00 00 f4 df b4 20 40 00 21d+03:57:30.733  READ FPDMA QUEUED
  60 00 00 00 78 00 00 ae 41 4c 20 40 00 21d+03:57:30.733  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df b3 a0 40 00 21d+03:57:30.732  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df b3 20 40 00 21d+03:57:30.732  READ FPDMA QUEUED

Error 44 [3] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df b0 a0 00 00  Error: UNC at LBA = 0xf4dfb0a0 = 4108300448

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 00 a8 00 00 f4 df b4 50 40 00 21d+03:57:27.426  READ FPDMA QUEUED
  61 00 00 00 50 00 00 cd 15 33 88 40 00 21d+03:57:27.403  WRITE FPDMA QUEUED
  60 00 00 02 18 00 00 f4 df ac f8 40 00 21d+03:57:27.388  READ FPDMA QUEUED
  60 00 00 05 40 00 00 f4 df a7 b8 40 00 21d+03:57:27.387  READ FPDMA QUEUED
  60 00 00 00 58 00 00 f4 df 95 68 40 00 21d+03:57:27.384  READ FPDMA QUEUED

Error 43 [2] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df 95 68 00 00  Error: UNC at LBA = 0xf4df9568 = 4108293480

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 00 80 00 00 f4 df 91 50 40 00 21d+03:57:24.175  READ FPDMA QUEUED
  61 00 00 00 80 00 00 f4 df 91 50 40 00 21d+03:57:24.175  WRITE FPDMA QUEUED
  60 00 00 01 c0 00 00 f4 df 9e 78 40 00 21d+03:57:24.175  READ FPDMA QUEUED
  60 00 00 05 40 00 00 f4 df 99 38 40 00 21d+03:57:24.174  READ FPDMA QUEUED
  61 00 00 00 80 00 00 f4 df 91 50 40 00 21d+03:57:24.173  WRITE FPDMA QUEUED

Error 42 [1] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df 95 68 00 00  Error: UNC at LBA = 0xf4df9568 = 4108293480

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 01 00 00 00 f4 df 94 c0 40 00 21d+03:57:21.120  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df 91 d0 40 00 21d+03:57:21.119  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df 92 50 40 00 21d+03:57:21.119  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df 92 d0 40 00 21d+03:57:21.119  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df 93 50 40 00 21d+03:57:21.119  READ FPDMA QUEUED

Error 41 [0] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df 91 50 00 00  Error: UNC at LBA = 0xf4df9150 = 4108292432

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 03 78 00 00 f4 df 95 c0 40 00 21d+03:57:18.237  READ FPDMA QUEUED
  60 00 00 00 70 00 00 f4 df 94 50 40 00 21d+03:57:18.235  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df 93 d0 40 00 21d+03:57:18.235  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df 93 50 40 00 21d+03:57:18.235  READ FPDMA QUEUED
  60 00 00 00 80 00 00 f4 df 92 d0 40 00 21d+03:57:18.235  READ FPDMA QUEUED

Error 40 [19] occurred at disk power-on lifetime: 4698 hours (195 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 f4 df 91 50 00 00  Error: UNC at LBA = 0xf4df9150 = 4108292432

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  --------------- --------------------
  60 00 00 00 08 00 00 ca 32 4a 70 40 00 21d+03:57:15.237  READ FPDMA QUEUED
  60 00 00 00 08 00 00 ca 32 4a 40 40 00 21d+03:57:15.237  READ FPDMA QUEUED
  60 00 00 00 20 00 00 ca 31 9e a0 40 00 21d+03:57:15.237  READ FPDMA QUEUED
  60 00 00 01 00 00 00 f4 df 94 c0 40 00 21d+03:57:15.236  READ FPDMA QUEUED
  60 00 00 05 40 00 00 f4 df 8f 80 40 00 21d+03:57:15.236  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

root@machinename:/home/username#


################### mdadm --examine ###########################

/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 437e4abb:c7ac46f1:ef8b2976:94921060
           Name : arrayname:0
  Creation Time : Mon Dec  7 08:31:31 2015
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 1950656 (1905.26 MiB 1997.47 MB)
  Used Dev Size : 3901312 (1905.26 MiB 1997.47 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 9cb1890b:ad675b3b:7517467f:0780ec8e

    Update Time : Mon Oct 24 09:19:37 2016
       Checksum : 65eadceb - correct
         Events : 92


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing)
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 437e4abb:c7ac46f1:ef8b2976:94921060
           Name : arrayname:0
  Creation Time : Mon Dec  7 08:31:31 2015
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 1950656 (1905.26 MiB 1997.47 MB)
  Used Dev Size : 3901312 (1905.26 MiB 1997.47 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 7acd8648:d34b69f3:f85b524d:dedb6b19

    Update Time : Mon Oct 24 08:53:41 2016
       Checksum : b3817738 - correct
         Events : 91


   Device Role : spare
   Array State : AA ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 437e4abb:c7ac46f1:ef8b2976:94921060
           Name : arrayname:0
  Creation Time : Mon Dec  7 08:31:31 2015
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 1950656 (1905.26 MiB 1997.47 MB)
  Used Dev Size : 3901312 (1905.26 MiB 1997.47 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f162eae5:19f8926b:f5bb6a2a:8adbbefd

    Update Time : Mon Oct 24 08:53:41 2016
       Checksum : 8a7a189d - correct
         Events : 91


   Device Role : spare
   Array State : AA ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 437e4abb:c7ac46f1:ef8b2976:94921060
           Name : arrayname:0
  Creation Time : Mon Dec  7 08:31:31 2015
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 1950656 (1905.26 MiB 1997.47 MB)
  Used Dev Size : 3901312 (1905.26 MiB 1997.47 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 6b10139e:ab5da9d0:665b17ee:daf63719

    Update Time : Mon Oct 24 09:19:37 2016
       Checksum : 86deec80 - correct
         Events : 92


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6426779d:5a08badf:9958e59e:2ded49d5
           Name : arrayname:2
  Creation Time : Mon Dec  7 08:32:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 8760565248 (8354.73 GiB 8970.82 GB)
  Used Dev Size : 5840376832 (2784.91 GiB 2990.27 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c6136963:6c04bbd8:436bda87:2ad19433

    Update Time : Mon Oct 24 09:02:52 2016
       Checksum : 2ec5936f - correct
         Events : 53547

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A..A ('A' == active, '.' == missing)
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6426779d:5a08badf:9958e59e:2ded49d5
           Name : arrayname:2
  Creation Time : Mon Dec  7 08:32:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 8760565248 (8354.73 GiB 8970.82 GB)
  Used Dev Size : 5840376832 (2784.91 GiB 2990.27 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 657c6955:1cdcfdaf:eb6c2aed:f5a4ed1f

    Update Time : Mon Oct 24 08:53:57 2016
       Checksum : 49afa71b - correct
         Events : 53539

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6426779d:5a08badf:9958e59e:2ded49d5
           Name : arrayname:2
  Creation Time : Mon Dec  7 08:32:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 8760565248 (8354.73 GiB 8970.82 GB)
  Used Dev Size : 5840376832 (2784.91 GiB 2990.27 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1f7d54bf:e8b8a81e:898d9255:d2683cd7

    Update Time : Mon Oct 24 09:02:52 2016
       Checksum : 41fe73ed - correct
         Events : 53547

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : A..A ('A' == active, '.' == missing)
/dev/sda4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dd78a43e:92699c27:3dc5489d:91d93bb2
           Name : arrayname:3
  Creation Time : Mon Dec  7 08:33:17 2015
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 15976448 (7.62 GiB 8.18 GB)
     Array Size : 15975424 (15.24 GiB 16.36 GB)
  Used Dev Size : 15975424 (7.62 GiB 8.18 GB)
    Data Offset : 8192 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : cba9cc59:55cccd59:aecb8e98:4de1814a

    Update Time : Mon Oct 24 08:56:30 2016
       Checksum : dbe91e70 - correct
         Events : 75

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdb4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dd78a43e:92699c27:3dc5489d:91d93bb2
           Name : arrayname:3
  Creation Time : Mon Dec  7 08:33:17 2015
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 15976448 (7.62 GiB 8.18 GB)
     Array Size : 15975424 (15.24 GiB 16.36 GB)
  Used Dev Size : 15975424 (7.62 GiB 8.18 GB)
    Data Offset : 8192 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f3c3c943:303bd216:bb42b1aa:5ac65a19

    Update Time : Mon Oct 24 08:56:30 2016
       Checksum : 63e60391 - correct
         Events : 75

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdc4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dd78a43e:92699c27:3dc5489d:91d93bb2
           Name : arrayname:3
  Creation Time : Mon Dec  7 08:33:17 2015
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 15976448 (7.62 GiB 8.18 GB)
     Array Size : 15975424 (15.24 GiB 16.36 GB)
  Used Dev Size : 15975424 (7.62 GiB 8.18 GB)
    Data Offset : 8192 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1fece6ad:ac89b95f:e861553e:d507b3d6

    Update Time : Mon Oct 24 08:56:30 2016
       Checksum : 67e6dae0 - correct
         Events : 75

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdd4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dd78a43e:92699c27:3dc5489d:91d93bb2
           Name : arrayname:3
  Creation Time : Mon Dec  7 08:33:17 2015
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 15976448 (7.62 GiB 8.18 GB)
     Array Size : 15975424 (15.24 GiB 16.36 GB)
  Used Dev Size : 15975424 (7.62 GiB 8.18 GB)
    Data Offset : 8192 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 33cae133:63d86892:7011553c:2068b2cc

    Update Time : Mon Oct 24 08:56:30 2016
       Checksum : 1490177f - correct
         Events : 75

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing)


############ /proc/mdstat ############################################

root@machinename:/home/username# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid10 sdb4[4] sdd4[3] sda4[0] sdc4[2]
      15975424 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

md2 : active raid5 sda3[0] sdc3[2](F) sdd3[3]
      8760565248 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [U__U]

md0 : active raid1 sdb1[4](S) sdc1[2](S) sda1[0] sdd1[3]
      1950656 blocks super 1.2 [2/2] [UU]

unused devices: <none>


################ PARTITIONS ##################

root@machinename:/home/username# cat sd.parted
BYT;
/dev/sda:3001GB:scsi:512:4096:gpt:ATA ST3000DM001-9YN1;
1:1049kB:2000MB:1999MB::boot:raid;
2:2000MB:2001MB:1049kB::grubbios:bios_grub;
3:2001MB:2992GB:2990GB:ext4:main:raid;
4:2992GB:3001GB:8184MB:linux-swap(v1):swap:raid;

BYT;
/dev/sdb:3001GB:scsi:512:4096:gpt:ATA ST3000DM001-1CH1;
1:1049kB:2000MB:1999MB::boot:raid;
2:2000MB:2001MB:1049kB::grubbios:bios_grub;
3:2001MB:2992GB:2990GB::main:raid;
4:2992GB:3001GB:8184MB::swap:raid;

BYT;
/dev/sdc:3001GB:scsi:512:4096:gpt:ATA ST3000DM001-1CH1;
1:1049kB:2000MB:1999MB::boot:raid;
2:2000MB:2001MB:1049kB::grubbios:bios_grub;
3:2001MB:2992GB:2990GB::main:raid;
4:2992GB:3001GB:8184MB::swap:raid;

BYT;
/dev/sdd:3001GB:scsi:512:4096:gpt:ATA ST3000DM001-9YN1;
1:1049kB:2000MB:1999MB::boot:raid;
2:2000MB:2001MB:1049kB::grubbios:bios_grub;
3:2001MB:2992GB:2990GB::main:raid;
4:2992GB:3001GB:8184MB::swap:raid;

BYT;
/dev/md0:1997MB:md:512:4096:loop:Linux Software RAID Array;
1:0.00B:1997MB:1997MB:ext4::;

BYT;
/dev/md2:8971GB:md:512:4096:loop:Linux Software RAID Array;
1:0.00B:8971GB:8971GB:ext4::;

BYT;
/dev/md3:16.4GB:md:512:4096:loop:Linux Software RAID Array;
1:0.00B:16.4GB:16.4GB:linux-swap(v1)::;


* Re: Fail to assemble raid4 with replaced disk
From: Santiago DIEZ @ 2016-10-27 14:11 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux Raid LIST
In-Reply-To: <580F9B68.9010105@youngman.org.uk>

Hi,

Indeed, here is what I had in terms of event counts:
/dev/sda10: 81589
/dev/sdb10: 81626
/dev/sdc10: 81589

Then the following procedure worked quite straightforwardly:
--------------------------------------------------------------------------------
# mdadm --assemble /dev/md10 --verbose --force /dev/sda10 /dev/sdb10 /dev/sdc10
# mdadm --manage /dev/md10 --add /dev/sdd10
--------------------------------------------------------------------------------

And 6h+ later:
--------------------------------------------------------------------------------
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md10 : active raid5 sdd10[3] sda10[0] sdc10[2] sdb10[1]
      5778741888 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
--------------------------------------------------------------------------------

Then I ran:
--------------------------------------------------------------------------------
# e2fsck -f -n -t -v /dev/md10
e2fsck 1.42.5 (29-Jul-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

    15675837 inodes used (4.34%, out of 361177088)
      188798 non-contiguous files (1.2%)
       14751 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 15626455/47037/15
  1281308341 blocks used (88.69%, out of 1444685472)
           0 bad blocks
         101 large files

    15311457 regular files
      361754 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
        2607 symbolic links (2310 fast symbolic links)
          10 sockets
------------
    15675828 files
Memory used: 50976k/1912k (20541k/30436k), time: 1304.00/334.06/ 8.00
I/O read: 4891MB, write: 0MB, rate: 3.75MB/s
--------------------------------------------------------------------------------

Does it look OK enough to mount?

Regards and thanks for your help
-------------------------
Santiago DIEZ
Quark Systems & CAOBA
23 rue du Buisson Saint-Louis, 75010 Paris
-------------------------


* Re: [PATCH] md/raid5: write an empty meta-block when creatinglogsuper-block
From: Zhengyuan Liu @ 2016-10-27 14:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Song Liu, linux-raid, liuzhengyuang521

Sorry for the unclear explanation.

Before a recovery, the log might look like this:
| mb1 | mb2 | mb3  |           |            |           |
 last_checkpoint = mb1's position,    last_cp_seq = mb1's seq
After a recovery (which writes an empty meta block, emb, at the log tail):
| mb1 | mb2 | mb3 |  emb |            |           |
last_checkpoint = emb's position,  last_cp_seq = mb1's seq + 11
Then we write two meta blocks and suppose a crash happens:
| mb1 | mb2 | mb3 |  emb |  mb4 | mb5 |
last_checkpoint = emb's position,  last_cp_seq = mb1's seq + 11
Now we do another recovery after restart, and suppose mb4 is invalid:
| mb1 | mb2 | mb3 |  emb |  mb4 | mb5 |
last_checkpoint = emb's position,  last_cp_seq = mb1's seq + 11
Since mb4 is invalid, recovery stops there and does not recover mb5, which should be discarded.
After recovery, log_start points to mb4, and we don't write an empty meta block
because the condition "ctx.seq > log->last_cp_seq + 1" is not satisfied. If we then
write a valid meta block and a crash happens again, the new meta block will land in
mb4's position, and the next recovery may go on to recover mb5, since its seq
matches.

What I am trying to say is that if the first meta block we write (not only a middle
one) is invalid, log recovery can cause problems here too. I think the condition for
writing an empty meta block should be:
    - if (ctx.seq > log->last_cp_seq + 1) {
    + if (ctx.seq > log->last_cp_seq) {

------------------ Original ------------------
From:  "Shaohua Li"<shli@kernel.org>;
Date:  Thu, Oct 27, 2016 02:35 AM
To:  "Zhengyuan Liu"<liuzhengyuan@kylinos.cn>;
Cc:  "Song Liu"<songliubraving@fb.com>; "linux-raid"<linux-raid@vger.kernel.org>; "liuzhengyuang521"<liuzhengyuang521@gmail.com>;
Subject:  Re: [PATCH] md/raid5: write an empty meta-block when creatinglogsuper-block
 
On Tue, Oct 25, 2016 at 08:43:50PM +0800, Zhengyuan Liu wrote:
> After discussion with my colleague, I think there is still a problem that
> may happen, though it is very unlikely. The superblock should point to the last
> meta block we have written after log reclaim, or to the empty meta block
> after log recovery. Consider that we write some meta blocks behind the
> superblock position and suppose a crash happens. If the first meta block we
> have written next to the superblock position is invalid, ctx.seq would
> also equal last_cp_seq+1 after we do a recovery. So the safest way is to
> always write an empty meta block at ctx.pos, no matter how much
> ctx.req is more than last_cp_seq after we did a recovery.
> What do you think, Shaohua? If it is necessary, I'd revert this patch and
> resend one.

I didn't get the point. Could you please elaborate it again?

Thanks,
Shaohua

> 
> ------------------ Original ------------------
> From:  "Shaohua Li"<shli@kernel.org>;
> Date:  Tue, Oct 25, 2016 05:23 AM
> To:  "Zhengyuan Liu"<liuzhengyuan@kylinos.cn>;
> Cc:  "shli"<shli@fb.com>; "Song Liu"<songliubraving@fb.com>; "linux-raid"<linux-raid@vger.kernel.org>; "liuzhengyuang521"<liuzhengyuang521@gmail.com>;
> Subject:  Re: [PATCH] md/raid5: write an empty meta-block when creating logsuper-block
>  
> On Mon, Oct 24, 2016 at 04:15:59PM +0800, Zhengyuan Liu wrote:
> > If the superblock points to an invalid meta block, r5l_load_log will set
> > create_super to true and create a new superblock. This runtime path
> > would always be taken if we have done no write I/O to this array since
> > it was created. Writing an empty meta block avoids this unnecessary
> > action the first time we create the log superblock.
> > 
> > Another reason is the correctness of log recovery. Currently we have the
> > code below to guarantee that log recovery is correct.
> > 
> >         if (ctx.seq > log->last_cp_seq + 1) {
> >                 int ret;
> > 
> >                 ret = r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq + 10);
> >                 if (ret)
> >                         return ret;
> >                 log->seq = ctx.seq + 11;
> >                 log->log_start = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
> >                 r5l_write_super(log, ctx.pos);
> >         } else {
> >                 log->log_start = ctx.pos;
> >                 log->seq = ctx.seq;
> >         }
> > 
> > If we just created an array with a journal device, log->log_start and
> > log->last_checkpoint should both be 0. Then we write three meta blocks,
> > all valid except the middle one, and suppose a crash happens. ctx.seq
> > would equal log->last_cp_seq + 1, and log->log_start would be set to the
> > position of the invalid middle meta block after recovery. This will
> > lead to problems which can be avoided with this patch.
> 
> This would be very unlikely, but better to fix. Applied, thanks!


* [PATCH] Increase buffer for sysfs disk state
From: Tomasz Majchrzak @ 2016-10-27  9:34 UTC (permalink / raw)
  To: linux-raid; +Cc: Jes.Sorensen, Tomasz Majchrzak

Bad block support added a new sysfs disk state value reported by the kernel
("external_bbl"), so the state string can now be longer than 20 bytes. This
causes reshape to fail, as it reads a truncated entry from sysfs.

Increase the buffer so it can hold the string even when all state values
currently implemented in the kernel are reported at the same time.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
---
 Grow.c        | 6 ++++--
 monitor.c     | 4 ++--
 super-intel.c | 5 +++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/Grow.c b/Grow.c
index 7402597..9f1570e 100755
--- a/Grow.c
+++ b/Grow.c
@@ -4017,8 +4017,10 @@ static int grow_backup(struct mdinfo *sra,
 			if (sd->disk.state & (1<<MD_DISK_FAULTY))
 				continue;
 			if (sd->disk.state & (1<<MD_DISK_SYNC)) {
-				char sbuf[20];
-				if (sysfs_get_str(sra, sd, "state", sbuf, 20) < 0 ||
+				char sbuf[100];
+
+				if (sysfs_get_str(sra, sd, "state",
+						  sbuf, sizeof(sbuf)) < 0 ||
 				    strstr(sbuf, "faulty") ||
 				    strstr(sbuf, "in_sync") == NULL) {
 					/* this device is dead */
diff --git a/monitor.c b/monitor.c
index 1704a59..15181ce 100644
--- a/monitor.c
+++ b/monitor.c
@@ -136,8 +136,8 @@ static enum sync_action read_action( int fd)
 
 int read_dev_state(int fd)
 {
-	char buf[60];
-	int n = read_attr(buf, 60, fd);
+	char buf[100];
+	int n = read_attr(buf, sizeof(buf), fd);
 	char *cp;
 	int rv = 0;
 
diff --git a/super-intel.c b/super-intel.c
index 4bb9059..f9d0a04 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -11408,9 +11408,10 @@ int check_degradation_change(struct mdinfo *info,
 			if (sd->disk.state & (1<<MD_DISK_FAULTY))
 				continue;
 			if (sd->disk.state & (1<<MD_DISK_SYNC)) {
-				char sbuf[20];
+				char sbuf[100];
+
 				if (sysfs_get_str(info,
-					sd, "state", sbuf, 20) < 0 ||
+					sd, "state", sbuf, sizeof(sbuf)) < 0 ||
 					strstr(sbuf, "faulty") ||
 					strstr(sbuf, "in_sync") == NULL) {
 					/* this device is dead */
-- 
1.8.3.1



* Re: [PATCH 1/4] mdadm: bad block support for external metadata - initialization
From: Tomasz Majchrzak @ 2016-10-27  9:26 UTC (permalink / raw)
  To: Jes Sorensen, shli; +Cc: linux-raid
In-Reply-To: <wrfjvawekhn0.fsf@redhat.com>

On Wed, Oct 26, 2016 at 03:57:39PM -0400, Jes Sorensen wrote:
> Shaohua Li <shli@kernel.org> writes:
> > On Wed, Oct 26, 2016 at 02:00:47PM -0400, Jes Sorensen wrote:
> >> Tomasz Majchrzak <tomasz.majchrzak@intel.com> writes:
> >> > I cannot see how badblocks program is related to this patch. It is a generic
> >> > code for bad blocks support in IMSM metadata. It introduces 64-bit value for
> >> > sector address, the same size as in kernel. All it does is syncing
> >> > kernel bad
> >> > block list with raid metadata.
> >> >
> >> > Tomek
> >> 
> >> I was waiting for this response, but you cut me off the CC list so
> >> missed it.
> >> 
> >> In this case I'll go ahead and apply these patches to mdadm.
> >
> > Thomasz,
> >
> > So your original kernel patch to support bad block for external metadata writes
> > '-blocked' to state. We agreed it's not required later and the applied kernel
> > patches don't support that interface. Don't you need change of the mdadm
> > patches?
> 
> Well I'll wait until this is resolved then :)

I have explained the process in detail in the other email. I haven't made any
change to the '-blocked' action. It is still requested by mdmon, as the disk is
in blocked state while a bad block is awaiting confirmation. However, my accepted
patch stopped reporting the disk as faulty if there are unacknowledged bad blocks.
I have realized that a disk should be shown as faulty only for an unrecoverable
state. An unacknowledged bad block can still be handled, so this state is not
adequate. My first mdadm patch set ignored this flag if all bad blocks had
been successfully acknowledged. That was not fully correct, as it would not work
if a bad block and an unrecoverable error happened at the same time.

I have resent the patches that no longer ignore the faulty state after
acknowledging bad blocks.

Tomek


* [PATCH 4/4 v4] mdmon: bad block support for external metadata - clear bad blocks
From: Tomasz Majchrzak @ 2016-10-27  8:53 UTC (permalink / raw)
  To: linux-raid; +Cc: Jes.Sorensen, Tomasz Majchrzak
In-Reply-To: <1477558425-13332-1-git-send-email-tomasz.majchrzak@intel.com>

When mdmon is notified of an update to the acknowledged bad blocks file, read
the entire bad block list from sysfs and compare it against the local list of
bad blocks. If any obsolete entries are found, remove them from metadata.

As mdmon cannot perform any memory allocation, the new superswitch method
get_bad_blocks is expected to return a list of bad blocks in metadata
without allocating memory. It is up to the metadata handler to allocate all
required memory in advance.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
---
 mdadm.h   |  7 ++++++
 monitor.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/mdadm.h b/mdadm.h
index 05a2e3e..7f1a1b8 100755
--- a/mdadm.h
+++ b/mdadm.h
@@ -1054,6 +1054,13 @@ extern struct superswitch {
 	int (*record_bad_block)(struct active_array *a, int n,
 					unsigned long long sector, int length);
 
+	/* clears bad block from metadata */
+	int (*clear_bad_block)(struct active_array *a, int n,
+					unsigned long long sector, int length);
+
+	/* get list of bad blocks from metadata */
+	struct md_bb *(*get_bad_blocks)(struct active_array *a, int n);
+
 	int swapuuid; /* true if uuid is bigending rather than hostendian */
 	int external;
 	const char *name; /* canonical metadata name */
diff --git a/monitor.c b/monitor.c
index 9de9c8d..3d60fef 100644
--- a/monitor.c
+++ b/monitor.c
@@ -33,6 +33,7 @@ static char *sync_actions[] = {
 
 enum bb_action {
 	RECORD_BB = 1,
+	COMPARE_BB,
 };
 
 static int write_attr(char *attr, int fd)
@@ -184,6 +185,49 @@ int process_ubb(struct active_array *a, struct mdinfo *mdi, const unsigned long
 	return -1;
 }
 
+int compare_bb(struct active_array *a, struct mdinfo *mdi, const unsigned long
+	       long sector, const unsigned int length, void *arg)
+{
+	struct superswitch *ss = a->container->ss;
+	struct md_bb *bb = (struct md_bb *) arg;
+	int record = 1;
+	int i;
+
+	for (i = 0; i < bb->count; i++) {
+		unsigned long long start = bb->entries[i].sector;
+		unsigned long long len = bb->entries[i].length;
+
+		/*
+		 * bad block in metadata exactly matches bad block in kernel
+		 * list, just remove it from a list
+		 */
+		if ((start == sector) && (len == length)) {
+			if (i < bb->count - 1)
+				bb->entries[i] = bb->entries[bb->count - 1];
+			bb->count -= 1;
+			record = 0;
+			break;
+		}
+		/*
+		 * bad block in metadata spans bad block in kernel list,
+		 * clear it and record new bad block
+		 */
+		if ((sector >= start) && (sector + length <= start + len)) {
+			ss->clear_bad_block(a, mdi->disk.raid_disk, start, len);
+			break;
+		}
+	}
+
+	/* record all bad blocks not in metadata list */
+	if (record && (ss->record_bad_block(a, mdi->disk.raid_disk, sector,
+					     length) <= 0)) {
+		sysfs_set_str(&a->info, mdi, "state", "-external_bbl");
+		return -1;
+	}
+
+	return 1;
+}
+
 static int read_bb_file(int fd, struct active_array *a, struct mdinfo *mdi,
 			enum bb_action action, void *arg)
 {
@@ -242,6 +286,8 @@ static int read_bb_file(int fd, struct active_array *a, struct mdinfo *mdi,
 			if (action == RECORD_BB)
 				rc = process_ubb(a, mdi, sector, length,
 						  buf + off, consumed);
+			else if (action == COMPARE_BB)
+				rc = compare_bb(a, mdi, sector, length, arg);
 			else
 				rc = -1;
 
@@ -260,6 +306,34 @@ static int process_dev_ubb(struct active_array *a, struct mdinfo *mdi)
 	return read_bb_file(mdi->ubb_fd, a, mdi, RECORD_BB, NULL);
 }
 
+static int check_for_cleared_bb(struct active_array *a, struct mdinfo *mdi)
+{
+	struct superswitch *ss = a->container->ss;
+	struct md_bb *bb;
+	int i;
+
+	/*
+	 * Get a list of bad blocks for an array, then read list of
+	 * acknowledged bad blocks from kernel and compare it against metadata
+	 * list, clear all bad blocks remaining in metadata list
+	 */
+	bb = ss->get_bad_blocks(a, mdi->disk.raid_disk);
+	if (!bb)
+		return -1;
+
+	if (read_bb_file(mdi->bb_fd, a, mdi, COMPARE_BB, bb) < 0)
+		return -1;
+
+	for (i = 0; i < bb->count; i++) {
+		unsigned long long sector = bb->entries[i].sector;
+		int length = bb->entries[i].length;
+
+		ss->clear_bad_block(a, mdi->disk.raid_disk, sector, length);
+	}
+
+	return 0;
+}
+
 static void signal_manager(void)
 {
 	/* tgkill(getpid(), mon_tid, SIGUSR1); */
@@ -326,7 +400,7 @@ static void signal_manager(void)
 
 #define ARRAY_DIRTY 1
 #define ARRAY_BUSY 2
-static int read_and_act(struct active_array *a)
+static int read_and_act(struct active_array *a, fd_set *fds)
 {
 	unsigned long long sync_completed;
 	int check_degraded = 0;
@@ -368,6 +442,8 @@ static int read_and_act(struct active_array *a)
 		    (process_dev_ubb(a, mdi) > 0)) {
 			mdi->next_state |= DS_UNBLOCK;
 		}
+		if (FD_ISSET(mdi->bb_fd, fds))
+			check_for_cleared_bb(a, mdi);
 	}
 
 	gettimeofday(&tv, NULL);
@@ -754,6 +830,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 		if (rv == -1) {
 			if (errno == EINTR) {
 				rv = 0;
+				FD_ZERO(&rfds);
 				dprintf("monitor: caught signal\n");
 			} else
 				dprintf("monitor: error %d in pselect\n",
@@ -795,7 +872,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 			signal_manager();
 		}
 		if (a->container && !a->to_remove) {
-			int ret = read_and_act(a);
+			int ret = read_and_act(a, &rfds);
 			rv |= 1;
 			dirty_arrays += !!(ret & ARRAY_DIRTY);
 			/* when terminating stop manipulating the array after it
-- 
1.8.3.1


