* [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device
@ 2015-04-16 3:43 Lidong Zhong
2015-04-16 3:43 ` [PATCH 1/3] dm-raid1: fix the parameter passed into the kernel Lidong Zhong
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Lidong Zhong @ 2015-04-16 3:43 UTC (permalink / raw)
To: dm-devel; +Cc: heinzm
Hi List/Heinz,
These three patches are based on the last patch series, which was sent as a reply on April 8.
Below are the tests I ran for this feature. My test environment:
linux-klqg:~ # dmsetup ls --tree
vg-lv (253:4)
├─vg-lv_mimage_2 (253:3)
│ └─ (8:48)
├─vg-lv_mimage_1 (253:2)
│ └─ (8:32)
├─vg-lv_mimage_0 (253:1)
│ └─ (8:16)
└─vg-lv_mlog (253:0)
└─ (8:64)
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
linux-klqg:~ # dmsetup table
vg-lv_mimage_2: 0 614400 linear 8:48 2048
vg-lv: 0 614400 mirror disk 2 253:0 1024 3 253:1 0 253:2 0 253:3 0 2 handle_errors keep_log
vg-lv_mimage_1: 0 614400 linear 8:32 2048
vg-lv_mimage_0: 0 614400 linear 8:16 2048
vg-lv_mlog: 0 8192 linear 8:64 2048
1, single data device failure
After failing one of the data legs, write data to the first three regions:
linux-klqg:~ # echo "a" |dd of=/dev/vg/lv bs=1K count=1 seek=0
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.0103211 s, 0.2 kB/s
linux-klqg:~ # echo "b" |dd of=/dev/vg/lv bs=1K count=1 seek=512
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.00428962 s, 0.5 kB/s
linux-klqg:~ # echo "c" |dd of=/dev/vg/lv bs=1K count=1 seek=1024
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.00282482 s, 0.7 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 597/600 1 ADA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
Now the failed device comes back. Its major/minor number may change, so replace the table as needed.
(The devices I tested on are iSCSI devices, and the minor number changed after each attach/detach.)
Then start the recovery:
linux-klqg:~ # dmsetup suspend vg-lv
linux-klqg:~ # dmsetup resume vg-lv
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
We can see that all the regions are in sync now.
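The region numbers touched by the dd commands above follow from the table line: the disk log is created with a region size of 1024 sectors, and the mirror is 614400 sectors long, which gives the 600 regions reported by dmsetup status. A minimal sketch of that arithmetic, assuming 512-byte sectors; the helper names are hypothetical and are not part of dm-raid1:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: dmsetup counts 512-byte sectors */

/* Number of regions needed to cover a device (the last one may be partial). */
static uint64_t region_count(uint64_t dev_sectors, uint64_t region_sectors)
{
	assert(region_sectors > 0);
	return (dev_sectors + region_sectors - 1) / region_sectors;
}

/*
 * Region hit by "dd bs=<bs> seek=<seek>", i.e. by a write that starts
 * at byte offset bs * seek on the mirror device.
 */
static uint64_t region_of_dd_seek(uint64_t bs, uint64_t seek,
				  uint64_t region_sectors)
{
	assert(region_sectors > 0);
	return (bs * seek / SECTOR_SIZE) / region_sectors;
}
```

With the values from the table, region_count(614400, 1024) is 600, and bs=1K with seek=0, 512 and 1024 lands in regions 0, 1 and 2, which matches the 597/600 status after the three writes.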
2, two or more data device failures
After detaching the first device (mine is /dev/sdb), write data to the first and second regions:
linux-klqg:~ # echo "1111111" | dd of=/dev/vg/lv bs=1K count=1 seek=0
0+1 records in
0+1 records out
8 bytes (8 B) copied, 0.00209451 s, 3.8 kB/s
linux-klqg:~ # echo "222222" | dd of=/dev/vg/lv bs=1K count=1 seek=512
0+1 records in
0+1 records out
7 bytes (7 B) copied, 0.00259999 s, 2.7 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 598/600 1 ADA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
Now the first and second regions are marked as out of sync. Then detach the second device
(mine is /dev/sdd) and write data to the third and fourth regions:
linux-klqg:~ # echo "333333" | dd of=/dev/vg/lv bs=1K count=1 seek=1024
0+1 records in
0+1 records out
7 bytes (7 B) copied, 0.00178031 s, 3.9 kB/s
linux-klqg:~ # echo "444444" | dd of=/dev/vg/lv bs=1K count=1 seek=1536
0+1 records in
0+1 records out
7 bytes (7 B) copied, 0.00256491 s, 2.7 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 596/600 1 DDA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
Now 4 regions are marked as out of sync. Then the first failed device comes back, and we try to
do the recovery:
linux-klqg:~ # dmsetup suspend vg-lv
linux-klqg:~ # dmsetup resume vg-lv
linux-klqg:~ #
linux-klqg:~ #
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 596/600 1 DDA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
It shows that 4 regions are still marked as out of sync, because one device is still
missing. We keep writing, this time to the fifth region:
linux-klqg:~ # echo "5555555" | dd of=/dev/vg/lv bs=1K count=1 seek=2048
0+1 records in
0+1 records out
8 bytes (8 B) copied, 0.00213449 s, 3.7 kB/s
Now the second missing device comes back, and we try to do the recovery:
linux-klqg:~ # dmsetup suspend vg-lv
linux-klqg:~ # dmsetup resume vg-lv
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
It shows that all the legs are in sync now. We read the data from each leg and got the
same result.
3, log device failure
After failing the log device, we tried to write to this LV:
linux-klqg:~ # echo "test" |dd of=/dev/vg/lv bs=1K count=1 seek=0
0+1 records in
0+1 records out
21 bytes (21 B) copied, 0.00470523 s, 4.5 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 D
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
We can see that the log device is marked as failed.
The bio is not written to the data legs, because we cannot read the new data out of
the leg.
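The checks in the three tests above all read the same fields of the mirror status line: the synced/total region counts, one health character per leg, and the health character of the log device. A rough sketch of pulling those fields out of a line, assuming the layout seen in this thread (a disk log, 'A' for a healthy device and 'D' for a failed one); the struct and function names are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEGS   8	/* arbitrary limits for this sketch */
#define MAX_TOKENS 32

struct mirror_status {
	unsigned nr_legs;
	unsigned long long synced, total;
	char leg_health[MAX_LEGS + 1];	/* e.g. "ADA" */
	char log_health;		/* e.g. 'A' or 'D' */
};

/*
 * Parse a line such as
 *   "0 614400 mirror 3 253:1 253:2 253:3 597/600 1 ADA 3 disk 253:0 A"
 * Returns 0 on success, -1 if the line does not have that layout.
 */
static int parse_mirror_status(const char *line, struct mirror_status *st)
{
	char buf[256];
	char *tok[MAX_TOKENS];
	size_t n = 0, i;
	char *t;

	assert(line != NULL && st != NULL);
	if (strlen(line) >= sizeof(buf))
		return -1;
	strcpy(buf, line);

	/* Split the copy into whitespace-separated tokens. */
	for (t = strtok(buf, " \t\n"); t != NULL && n < MAX_TOKENS;
	     t = strtok(NULL, " \t\n"))
		tok[n++] = t;

	if (n < 4 || strcmp(tok[2], "mirror") != 0)
		return -1;
	if (sscanf(tok[3], "%u", &st->nr_legs) != 1 ||
	    st->nr_legs == 0 || st->nr_legs > MAX_LEGS)
		return -1;

	i = 4 + st->nr_legs;	/* index of the "synced/total" token */
	if (n < i + 4)
		return -1;
	if (sscanf(tok[i], "%llu/%llu", &st->synced, &st->total) != 2)
		return -1;
	/* tok[i + 1] is a count field ("1" in the output above). */
	if (strlen(tok[i + 2]) != st->nr_legs)
		return -1;
	strcpy(st->leg_health, tok[i + 2]);
	st->log_health = tok[n - 1][0];	/* last token: log device health */
	return 0;
}
```

For the status line after the first test this yields 597/600 with leg health "ADA" and log health 'A'; a line for a linear target is rejected.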
Is this testing sufficient, or are there corner cases that the patches do not cover?
Any advice is appreciated.
Regards,
Lidong
Lidong Zhong (3):
dm-raid1: fix the parameter passed into the kernel
dm-raid1: remove the error flags in the mirror set when it's in sync
dm-raid1: change default mirror when it's not in sync
drivers/md/dm-raid1.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)
--
1.8.1.4
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
* [PATCH 1/3] dm-raid1: fix the parameter passed into the kernel
From: Lidong Zhong @ 2015-04-16 3:43 UTC (permalink / raw)
To: dm-devel; +Cc: heinzm
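If userspace does not pass the new feature parameter, parse_features() still
compares argv[0] against "keep_log" after consuming the first feature, which
leads to a kernel crash. Parse exactly num_features arguments in a loop instead.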
Signed-off-by: Lidong Zhong <lzhong@suse.com>
---
drivers/md/dm-raid1.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 00b1fbd..8e32c4e 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -990,6 +990,7 @@ static int parse_features(struct mirror_set *ms, unsigned argc, char **argv,
 	unsigned num_features;
 	struct dm_target *ti = ms->ti;
 	char dummy;
+	int i;
 
 	*args_used = 0;
 
@@ -1010,19 +1011,18 @@ static int parse_features(struct mirror_set *ms, unsigned argc, char **argv,
 		return -EINVAL;
 	}
 
-	if (!strcmp("handle_errors", argv[0]))
-		ms->features |= DM_RAID1_HANDLE_ERRORS;
-	else {
-		ti->error = "Unrecognised feature requested";
-		return -EINVAL;
-	}
-
-	argc--;
-	argv++;
-	(*args_used)++;
+	for (i = 0; i < num_features; i++) {
+		if (!strcmp("handle_errors", argv[0]))
+			ms->features |= DM_RAID1_HANDLE_ERRORS;
+		else if (!strcmp("keep_log", argv[0]))
+			ms->features |= DM_RAID1_KEEP_LOG;
+		else {
+			ti->error = "Unrecognised feature requested";
+			return -EINVAL;
+		}
 
-	if (!strcmp("keep_log", argv[0])) {
-		ms->features |= DM_RAID1_KEEP_LOG;
+		argc--;
+		argv++;
 		(*args_used)++;
 	}
 
--
1.8.1.4
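For illustration, the parsing loop in this patch can be modelled in userspace. This is a minimal sketch and not the kernel code: the flag values, the function name and the return convention are assumptions made for the example, and the explicit bounds check stands in for whatever argument-count validation parse_features() performs before the loop.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the kernel's feature bits; the values here are arbitrary. */
#define HANDLE_ERRORS 0x1u
#define KEEP_LOG      0x2u

/*
 * Consume num_features feature names from argv and OR them into *features.
 * Returns the number of arguments consumed, or -1 on an unknown name or
 * when fewer than num_features arguments are available.
 */
static int parse_feature_names(unsigned num_features, unsigned argc,
			       const char *const *argv, unsigned *features)
{
	unsigned i;

	assert(features != NULL);
	if (num_features > argc)
		return -1;	/* never read past the end of argv */

	*features = 0;
	for (i = 0; i < num_features; i++) {
		if (!strcmp("handle_errors", argv[i]))
			*features |= HANDLE_ERRORS;
		else if (!strcmp("keep_log", argv[i]))
			*features |= KEEP_LOG;
		else
			return -1;
	}
	return (int)num_features;
}
```

The point of the loop is that a table with only "1 handle_errors" is parsed without ever looking at an argument that was not supplied, while "2 handle_errors keep_log" sets both bits.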
* [PATCH 2/3] dm-raid1: remove the error flags in the mirror set when it's in sync
From: Lidong Zhong @ 2015-04-16 3:43 UTC (permalink / raw)
To: dm-devel; +Cc: heinzm
Once the mirror set is back in sync, all error flags can be cleared at the
end of the recovery. Otherwise, the stale flags affect the valid mirror legs
and subsequent write bios.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
---
drivers/md/dm-raid1.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 8e32c4e..95ec822 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -372,6 +372,17 @@ static int recover(struct mirror_set *ms, struct dm_region *reg)
 	return r;
 }
 
+static void reset_ms_flags(struct mirror_set *ms)
+{
+	unsigned int m;
+
+	ms->leg_failure = 0;
+	for (m = 0; m < ms->nr_mirrors; m++) {
+		atomic_set(&(ms->mirror[m].error_count), 0);
+		ms->mirror[m].error_type = 0;
+	}
+}
+
 static void do_recovery(struct mirror_set *ms)
 {
 	struct dm_region *reg;
@@ -400,6 +411,7 @@ static void do_recovery(struct mirror_set *ms)
 		/* the sync is complete */
 		dm_table_event(ms->ti->table);
 		ms->in_sync = 1;
+		reset_ms_flags(ms);
 	}
 }
 
--
1.8.1.4
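The effect of this patch can be sketched outside the kernel as follows. The structures below are simplified stand-ins that keep only the fields the patch touches, C11 atomics replace the kernel's atomic_t, and recovery_progress() is a hypothetical name for the point in do_recovery() where the synced region count is compared with the total.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_MIRRORS 8	/* arbitrary limit for this sketch */

/* Simplified stand-ins for struct mirror and struct mirror_set. */
struct leg_model {
	atomic_int error_count;
	unsigned long error_type;
};

struct mirror_set_model {
	int in_sync;
	int leg_failure;
	unsigned nr_mirrors;
	struct leg_model mirror[MAX_MIRRORS];
};

/* Clear the set-wide failure flag and every per-leg error record. */
static void reset_flags(struct mirror_set_model *ms)
{
	unsigned m;

	assert(ms->nr_mirrors <= MAX_MIRRORS);
	ms->leg_failure = 0;
	for (m = 0; m < ms->nr_mirrors; m++) {
		atomic_store(&ms->mirror[m].error_count, 0);
		ms->mirror[m].error_type = 0;
	}
}

/* Flags are only reset at the moment the last region becomes synced. */
static void recovery_progress(struct mirror_set_model *ms,
			      unsigned long synced, unsigned long total)
{
	if (!ms->in_sync && synced == total) {
		ms->in_sync = 1;
		reset_flags(ms);
	}
}
```

In this model a set at 596/600 keeps its error records, and they are cleared only when the count reaches 600/600, which mirrors the DDA to AAA transition in the second test of the cover letter.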
* [PATCH 3/3] dm-raid1: change default mirror when it's not in sync
From: Lidong Zhong @ 2015-04-16 3:43 UTC (permalink / raw)
To: dm-devel; +Cc: heinzm
When keep_log is set, change the default mirror to a valid leg even when the
mirror set is not in sync. Otherwise the mirror set will recover from a leg
whose data is not intact.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
---
drivers/md/dm-raid1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 95ec822..db0ccb9 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -231,7 +231,7 @@ static void fail_mirror(struct mirror *m, enum dm_raid1_error error_type)
 	if (m != get_default_mirror(ms))
 		goto out;
 
-	if (!ms->in_sync) {
+	if (!ms->in_sync && !keep_log(ms)) {
 		/*
 		 * Better to issue requests to same failing device
 		 * than to risk returning corrupt data.
--
1.8.1.4
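The decision changed by this one-line patch can be modelled in a few lines. This is a simplified sketch under stated assumptions, not fail_mirror() itself: it leaves out event scheduling and the other early returns, treats "valid" as "no recorded errors", and all names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MIRRORS 8	/* arbitrary limit for this sketch */

struct leg_model {
	int error_count;
};

struct set_model {
	int in_sync;
	int keep_log;		/* stands in for the keep_log feature flag */
	size_t nr_mirrors;
	size_t default_leg;
	struct leg_model leg[MAX_MIRRORS];
};

/* Index of the first leg without recorded errors, or nr_mirrors if none. */
static size_t first_valid_leg(const struct set_model *ms)
{
	size_t m;

	assert(ms->nr_mirrors <= MAX_MIRRORS);
	for (m = 0; m < ms->nr_mirrors; m++)
		if (ms->leg[m].error_count == 0)
			return m;
	return ms->nr_mirrors;
}

/* Record a failure on leg m; returns the default leg afterwards. */
static size_t fail_leg(struct set_model *ms, size_t m)
{
	size_t v;

	assert(m < ms->nr_mirrors);
	ms->leg[m].error_count++;

	if (m != ms->default_leg)
		return ms->default_leg;

	/* Patched condition: stay on the failing default leg only when
	 * the set is out of sync and keep_log is not requested. */
	if (!ms->in_sync && !ms->keep_log)
		return ms->default_leg;

	v = first_valid_leg(ms);
	if (v < ms->nr_mirrors)
		ms->default_leg = v;
	return ms->default_leg;
}
```

With keep_log set, a failure of the default leg on an out-of-sync set moves the default to the next error-free leg; without it, the old behaviour is kept and the default stays where it is.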
* Re: [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device
From: Heinz Mauelshagen @ 2015-04-16 9:58 UTC (permalink / raw)
To: Lidong Zhong, dm-devel
Lidong,
tests need to happen under heavy load, i.e. worst-case
failure scenarios.
E.g. a filesystem is mounted and being updated whilst
you're taking mirror legs offline and bringing them back
to cause them to get resynchronized.
Heinz
* Re: [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device
From: Lidong Zhong @ 2015-04-22 3:39 UTC (permalink / raw)
To: dm-devel
Hi Heinz,
The test procedure is basically the same as before, except that the LV is mounted with
an ext3 filesystem and iozone runs on it using the following script.
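#!/bin/sh
# Run the iozone workload ten times, printing a timestamp before and after each run.
var=0
while [ "$var" -lt 10 ]
do
	# POSIX arithmetic; "let" is a bash builtin and is not available in plain sh
	var=$((var + 1))
	date
	iozone -i 0 -i 1 -i 2 -l 3 -u 3 -r 4k -s 100m -F /mnt/lv/f1 /mnt/lv/f2 /mnt/lv/f3
	date
done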
My machine is a KVM virtual machine with 2 GB of memory and 4 CPUs.
The test result is the same. After the failed device comes back, dm-raid1 performs
the recovery and becomes fully synced (`dmsetup status vg-lv` shows that all the regions are in sync).
During the recovery, iostat shows data being read from the default mirror and written to
the other legs.
We are still working on the userspace part. Once it's done, we will also send a
test case to cover this feature.
Regards,
Lidong
>>> On 4/16/2015 at 05:58 PM, in message <552F87AE.2010405@redhat.com>, Heinz
Mauelshagen <heinzm@redhat.com> wrote:
> Lidong,
>
> tests need to happen under heavy load, i.e. worst-case
> failure scenarios.
>
> E.g. a filesystem is mounted and being updated whilst
> you're taking mirror legs offline and bringing them back
> to cause them to get resynchronized.
>
> Heinz
* Re: [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device
From: Mike Snitzer @ 2015-05-11 18:55 UTC (permalink / raw)
To: Lidong Zhong; +Cc: heinzm, dm-devel
On Wed, Apr 15 2015 at 11:43pm -0400,
Lidong Zhong <lzhong@suse.com> wrote:
> Hi List/Heinz,
>
> These three patches are done based on last patch series that replied on April 8.
Unless the current upstream code can crash (and there is an isolated fix
for that crash that should cc stable): I'd prefer to see all related
patches folded into a single patch and the patch header updated to
clearly document what is being fixed with these changes. Also is there
a userspace dependency that is required for this kernel patch to work?
If so please say as much.
So my current understanding is you should be able to fold all 5 of these
patches together and resubmit a single patch:
https://patchwork.kernel.org/patch/6179141/
https://patchwork.kernel.org/patch/6179151/
https://patchwork.kernel.org/patch/6223631/
https://patchwork.kernel.org/patch/6223651/
https://patchwork.kernel.org/patch/6223641/
I'd be OK taking numerous patches but we aren't talking a lot of code
here... not seeing a need to make things more complex than needed.
Mike
end of thread, other threads:[~2015-05-13 6:01 UTC | newest]
Thread overview: 8+ messages
2015-04-16 3:43 [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device Lidong Zhong
2015-04-16 3:43 ` [PATCH 1/3] dm-raid1: fix the parameter passed into the kernel Lidong Zhong
2015-04-16 3:43 ` [PATCH 2/3] dm-raid1: remove the error flags in the mirror set when it's in sync Lidong Zhong
2015-04-16 3:43 ` [PATCH 3/3] dm-raid1: change default mirror when it's not " Lidong Zhong
2015-04-16 9:58 ` [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device Heinz Mauelshagen
2015-04-22 3:39 ` Lidong Zhong
2015-05-11 18:55 ` Mike Snitzer
2015-05-13 6:01 ` Lidong Zhong