* [bug] its messy when missing device reappears after its been replaced in RAID1
From: Anand Jain @ 2014-01-06 16:56 UTC
To: linux-btrfs
Test case:
Make a disk disappear, replace the missing disk (RAID1),
and then make the original disk reappear.
----
mkfs.btrfs -f -m raid1 -d raid1 /dev/sdc /dev/sdd
mount /dev/sdc /btrfs
dd if=/dev/zero of=/btrfs/tf1 count=1
btrfs fi sync /btrfs
---
devmgt [1] makes it easy to attach or detach a disk.
--
devmgt show
devmgt detach /dev/sdc
--
btrfs is still unaware that the device is missing.
--
btrfs fi show -m
Label: none uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
Total devices 2 FS bytes used 32.00KiB
devid 1 size 958.94MiB used 115.88MiB path /dev/sdc <--
devid 2 size 958.94MiB used 103.88MiB path /dev/sdd
btrfs rep start -f 1 /dev/sde /btrfs
Label: none uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
Total devices 2 FS bytes used 32.00KiB
devid 1 size 958.94MiB used 115.88MiB path /dev/sde
devid 2 size 958.94MiB used 103.88MiB path /dev/sdd
--
So far so good. Now the missing /dev/sdc comes back.
---
devmgt attach host2
btrfs fi show -m shows sdc
Label: none uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
Total devices 2 FS bytes used 32.00KiB
devid 1 size 958.94MiB used 115.88MiB path /dev/sdc <- Wrong.
devid 2 size 958.94MiB used 103.88MiB path /dev/sdd
---
This is wrong; it should be sde. This happens because when the
disk comes back, device_list_add() is called, which unconditionally
replaces the path of the existing device with that of the given disk
having the same fsid/devid.
But the actual I/O is still going to sde, not to sdc.
Further, when we start fresh (modprobe -r btrfs), unless the scan
is carefully managed using btrfs dev scan <dev>, the filesystem
may pair with the wrong disk.
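To illustrate the failure mode outside the kernel, here is a minimal userspace sketch. It is not btrfs code: struct toy_device and toy_scan_unconditional() are hypothetical names made up for this example, and the only thing modelled is that a scan of any disk carrying the same devid overwrites the recorded path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a registered device entry. */
struct toy_device {
	uint64_t devid;
	char path[32];
};

/*
 * Models the behaviour described above: when a scanned disk has the
 * same devid, its path replaces the recorded one without any further
 * check, even if the recorded disk is the one actually in use.
 */
void toy_scan_unconditional(struct toy_device *dev, uint64_t devid,
			    const char *path)
{
	if (dev->devid != devid)
		return;
	if (strcmp(dev->path, path) != 0)
		snprintf(dev->path, sizeof(dev->path), "%s", path);
}
```

With an entry recorded as devid 1 on /dev/sde, a later scan of /dev/sdc that also claims devid 1 switches the recorded path to /dev/sdc, which matches the misleading fi show output above.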
Please review the following proposed fix. This patch
compares the transid before the disk is substituted.
----------------------------------------------------
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2ca91fc..b226284 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -496,14 +496,39 @@ static noinline int device_list_add(const char *path,
device->fs_devices = fs_devices;
} else if (!device->name || strcmp(device->name->str, path)) {
- name = rcu_string_strdup(path, GFP_NOFS);
- if (!name)
- return -ENOMEM;
- rcu_string_free(device->name);
- rcu_assign_pointer(device->name, name);
- if (device->missing) {
- fs_devices->missing_devices--;
- device->missing = 0;
+
+ struct buffer_head *bh;
+ struct btrfs_super_block *cur_disk_super;
+ u64 cur_transid;
+
+ if (!device->missing) {
+ bh = btrfs_read_dev_super(device->bdev);
+ if (!bh)
+ return -EINVAL;
+
+ cur_disk_super = (struct btrfs_super_block *)
+ bh->b_data;
+ cur_transid = btrfs_super_generation(cur_disk_super);
+ } else
+ cur_transid = 0;
+
+ if (found_transid > cur_transid) {
+
+ name = rcu_string_strdup(path, GFP_NOFS);
+ if (!name)
+ return -ENOMEM;
+
+ rcu_string_free(device->name);
+ rcu_assign_pointer(device->name, name);
+
+ if (device->missing) {
+ fs_devices->missing_devices--;
+ device->missing = 0;
+ }
+
+ printk_in_rcu(KERN_INFO "%s tran %llu replaced %s tran %llu\n",
+ path, found_transid,
+ rcu_str_deref(device->name), cur_transid);
}
}
---------------------------------------
Thanks Anand
[1] github.com/anajain/devmgt.git
* Re: [bug] its messy when missing device reappears after its been replaced in RAID1
From: Wang Shilong @ 2014-01-13 10:05 UTC
To: Anand Jain; +Cc: linux-btrfs
Hello Anand,
It seems that other developers did not notice such an important thread. :-)
Here are some of my opinions about this issue.
See more below.
On 01/07/2014 12:56 AM, Anand Jain wrote:
>
> test case:
[snip]
....
> ----------------------------------------------------
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 2ca91fc..b226284 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -496,14 +496,39 @@ static noinline int device_list_add(const char
> *path,
>
> device->fs_devices = fs_devices;
> } else if (!device->name || strcmp(device->name->str, path)) {
> - name = rcu_string_strdup(path, GFP_NOFS);
> - if (!name)
> - return -ENOMEM;
> - rcu_string_free(device->name);
> - rcu_assign_pointer(device->name, name);
> - if (device->missing) {
> - fs_devices->missing_devices--;
> - device->missing = 0;
> +
> + struct buffer_head *bh;
> + struct btrfs_super_block *cur_disk_super;
> + u64 cur_transid;
> +
> + if (!device->missing) {
> + bh = btrfs_read_dev_super(device->bdev);
> + if (!bh)
> + return -EINVAL;
> +
> + cur_disk_super = (struct btrfs_super_block *)
> + bh->b_data;
> + cur_transid = btrfs_super_generation(cur_disk_super);
> + } else
> + cur_transid = 0;
> +
> + if (found_transid > cur_transid) {
I agree that we should use the transid to find the most suitable device,
but this check is not right.
Here @found_transid is the largest generation, so a valid candidate
device's transid should be @found_transid - 1 (after a power-off, for
example) or the same as @found_transid.
Anyway, I think the right way is to compare the two devices with the same
devid, and only replace the existing one if the newly found one's transid
is greater than the previous one's.
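As a rough userspace illustration of that rule (hypothetical struct and function names, not the actual btrfs structures), the comparison could be sketched as follows. Treating a missing device as having transid 0 is an assumption carried over from the proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a registered device entry. */
struct scanned_dev {
	uint64_t devid;
	uint64_t transid;	/* generation read from this disk's superblock */
	bool missing;
	char path[32];
};

/*
 * Switch the entry to the newly found disk only if that disk carries a
 * strictly newer transid than the one currently recorded.
 * Returns true if the entry was switched.
 */
bool toy_scan_checked(struct scanned_dev *dev, uint64_t devid,
		      uint64_t found_transid, const char *path)
{
	/* Assumption: a missing device counts as transid 0. */
	uint64_t cur_transid = dev->missing ? 0 : dev->transid;

	if (dev->devid != devid)
		return false;
	if (found_transid <= cur_transid)
		return false;	/* stale or equal copy: keep the current disk */

	snprintf(dev->path, sizeof(dev->path), "%s", path);
	dev->transid = found_transid;
	dev->missing = false;
	return true;
}
```

In the test case above, the reappearing sdc would carry an older transid than sde, so the entry would keep pointing at sde.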
Please correct me if I missed something here. ^_^
Thanks,
Wang
> +
> + name = rcu_string_strdup(path, GFP_NOFS);
> + if (!name)
> + return -ENOMEM;
> +
> + rcu_string_free(device->name);
> + rcu_assign_pointer(device->name, name);
> +
> + if (device->missing) {
> + fs_devices->missing_devices--;
> + device->missing = 0;
> + }
> +
> + printk_in_rcu(KERN_INFO "%s tran %llu replaced %s tran %llu\n",
> + path, found_transid,
> + rcu_str_deref(device->name), cur_transid);
> }
> }
>
> ---------------------------------------
>
>
> Thanks Anand
>
>
> [1] github.com/anajain/devmgt.git
* Re: [bug] its messy when missing device reappears after its been replaced in RAID1
From: Anand Jain @ 2014-01-14 11:43 UTC
To: Wang Shilong; +Cc: linux-btrfs
Hi Wang,
> I agree that we should use the transid to find the most suitable device,
> but this check is not right.
> Here @found_transid is the largest generation, so a valid candidate
> device's transid should be @found_transid - 1 (after a power-off, for
> example) or the same as @found_transid.
>
> Anyway, I think the right way is to compare the two devices with the same
> devid, and only replace the existing one if the newly found one's transid
> is greater than the previous one's.
>
> Please correct me if I missed something here. ^_^
Thanks for the review comments! I am writing the fix.
Anand