linux-btrfs.vger.kernel.org archive mirror
* [bug] its messy when missing device reappears after its been replaced in RAID1
@ 2014-01-06 16:56 Anand Jain
From: Anand Jain @ 2014-01-06 16:56 UTC
  To: linux-btrfs


Test case:
make a disk disappear, replace the missing disk (RAID1),
and then make the missing disk reappear.

----
  mkfs.btrfs -f -m raid1 -d raid1 /dev/sdc /dev/sdd
  mount /dev/sdc /btrfs
  dd if=/dev/zero of=/btrfs/tf1 count=1
  btrfs fi sync /btrfs
---

devmgt [1] makes it easy to attach or detach a disk.

--
  devmgt show
  devmgt detach /dev/sdc
--

btrfs is still unaware that the device is missing:
--
  btrfs fi show -m
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
         Total devices 2 FS bytes used 32.00KiB
         devid    1 size 958.94MiB used 115.88MiB path /dev/sdc <--
         devid    2 size 958.94MiB used 103.88MiB path /dev/sdd

  btrfs rep start -f 1 /dev/sde /btrfs
  btrfs fi show -m
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
         Total devices 2 FS bytes used 32.00KiB
         devid    1 size 958.94MiB used 115.88MiB path /dev/sde
         devid    2 size 958.94MiB used 103.88MiB path /dev/sdd
--

So far so good. Now the missing /dev/sdc comes back.

---
  devmgt attach host2

btrfs fi show -m shows sdc:
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
         Total devices 2 FS bytes used 32.00KiB
         devid    1 size 958.94MiB used 115.88MiB path /dev/sdc <- Wrong.
         devid    2 size 958.94MiB used 103.88MiB path /dev/sdd
---

This is wrong; it should be sde. It happens because when the disk comes
back, device_list_add() is called, and it unconditionally replaces the
existing device's path with that of the newly scanned disk carrying the
same fsid/devid. But the actual IO is still going to sde, not to sdc.
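To make the failure mode concrete, here is a small user-space model of the behavior described above. It is only a sketch under stated assumptions: `struct toy_device` and `toy_scan_unconditional()` are hypothetical names, fsid matching is assumed to have happened already, and the model keeps the path reported by `btrfs fi show` and the device that actually receives IO as two separate strings (the kernel keeps these as `device->name` and `device->bdev`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of one device-list entry (not kernel code). */
struct toy_device {
	unsigned long long devid;
	char path[32];      /* what "btrfs fi show" would report */
	char io_target[32]; /* block device actually used for IO */
};

/*
 * Mimics the unconditional update: any scanned disk with a matching devid
 * overwrites the recorded path, while the already-open IO target is left
 * untouched. fsid matching is assumed to have been done by the caller.
 */
static void toy_scan_unconditional(struct toy_device *dev,
				   unsigned long long devid, const char *path)
{
	assert(dev != NULL && path != NULL);
	if (dev->devid != devid)
		return;
	if (strcmp(dev->path, path) != 0)
		snprintf(dev->path, sizeof(dev->path), "%s", path);
}
```

Starting from the post-replace state (path and IO target both /dev/sde), scanning /dev/sdc with devid 1 leaves `path` set to /dev/sdc while `io_target` stays /dev/sde, which is the mismatch that `btrfs fi show -m` exposes above.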

Furthermore, when we start fresh (after modprobe -r btrfs), the filesystem
may be paired with the wrong disk unless scanning is carefully managed
using btrfs dev scan <dev>.
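The same unconditional rule explains why a fresh start can pair the filesystem with the wrong disk: with nothing open yet, whichever matching disk is scanned last wins. The hypothetical helper below models only that ordering effect under this assumption; it is not how btrfs builds its device list internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one scanned disk (not a kernel structure). */
struct toy_scan {
	const char *path;
	unsigned long long devid;
};

/*
 * Returns the path recorded for `devid` after processing `n` scans in
 * order with the unconditional "last scan wins" rule, or NULL if no
 * scanned disk carries that devid.
 */
static const char *toy_last_scan_wins(const struct toy_scan *scans, size_t n,
				      unsigned long long devid)
{
	const char *recorded = NULL;
	size_t i;

	assert(scans != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (scans[i].devid == devid)
			recorded = scans[i].path;
	return recorded;
}
```

Scanning sde then sdc records /dev/sdc, while scanning sdc then sde records /dev/sde, so the result depends purely on scan order.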

Please review the following proposed fix. Before substituting the disk,
this patch compares the transid (superblock generation) of the newly
scanned disk with that of the current one.

----------------------------------------------------
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2ca91fc..b226284 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -496,14 +496,39 @@ static noinline int device_list_add(const char *path,

                 device->fs_devices = fs_devices;
         } else if (!device->name || strcmp(device->name->str, path)) {
-               name = rcu_string_strdup(path, GFP_NOFS);
-               if (!name)
-                       return -ENOMEM;
-               rcu_string_free(device->name);
-               rcu_assign_pointer(device->name, name);
-               if (device->missing) {
-                       fs_devices->missing_devices--;
-                       device->missing = 0;
+
+               struct buffer_head *bh;
+               struct btrfs_super_block *cur_disk_super;
+               u64 cur_transid;
+
+               if (!device->missing && device->bdev) {
+                       bh = btrfs_read_dev_super(device->bdev);
+                       if (!bh)
+                               return -EINVAL;
+
+                       cur_disk_super = (struct btrfs_super_block *)bh->b_data;
+                       cur_transid = btrfs_super_generation(cur_disk_super);
+                       brelse(bh);
+               } else
+                       cur_transid = 0;
+
+               if (found_transid > cur_transid) {
+
+                       name = rcu_string_strdup(path, GFP_NOFS);
+                       if (!name)
+                               return -ENOMEM;
+
+                       rcu_string_free(device->name);
+                       rcu_assign_pointer(device->name, name);
+
+                       if (device->missing) {
+                               fs_devices->missing_devices--;
+                               device->missing = 0;
+                       }
+
+                       printk_in_rcu(KERN_INFO
+                               "btrfs: %s transid %llu replaced a device with transid %llu\n",
+                               path, found_transid, cur_transid);
                 }
         }

---------------------------------------
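The rule the patch applies can be modeled in user space as follows. This is a hedged sketch: `struct toy_device`, its `generation` field and `toy_scan_guarded()` are hypothetical, and `generation` stands in for the superblock generation that the patch reads through `btrfs_read_dev_super()` and `btrfs_super_generation()`. As in the patch, a missing device counts as generation 0, and a newly scanned disk replaces the recorded one only if its generation is strictly greater.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model; field names loosely follow the patch. */
struct toy_device {
	unsigned long long devid;
	char path[32];
	bool missing;
	unsigned long long generation; /* generation of the recorded disk */
};

/* Returns true if the recorded path was replaced by `path`. */
static bool toy_scan_guarded(struct toy_device *dev, const char *path,
			     unsigned long long found_transid)
{
	unsigned long long cur_transid;

	assert(dev != NULL && path != NULL);
	if (strcmp(dev->path, path) == 0)
		return false; /* same path: nothing to do */

	/* As in the patch: a missing device is treated as generation 0. */
	cur_transid = dev->missing ? 0 : dev->generation;
	if (found_transid <= cur_transid)
		return false; /* stale or equal disk: keep the current one */

	snprintf(dev->path, sizeof(dev->path), "%s", path);
	dev->generation = found_transid; /* model now points at the new disk */
	dev->missing = false;
	return true;
}
```

With the recorded device at a higher generation than the stale disk (the concrete numbers in any test are illustrative), scanning the stale /dev/sdc returns false and the path stays /dev/sde, while a device marked missing is still taken over by any scanned disk with a non-zero generation.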


Thanks Anand


[1] github.com/anajain/devmgt.git





* Re: [bug] its messy when missing device reappears after its been replaced in RAID1
@ 2014-01-13 10:05 Wang Shilong
From: Wang Shilong @ 2014-01-13 10:05 UTC
  To: Anand Jain; +Cc: linux-btrfs

Hello Anand,

It seems that other developers have not noticed this important thread. :-)
Here are my thoughts on this issue.

See below.

On 01/07/2014 12:56 AM, Anand Jain wrote:
>
> test case:
[snip]
....
> ----------------------------------------------------
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 2ca91fc..b226284 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -496,14 +496,39 @@ static noinline int device_list_add(const char *path,
>
>                 device->fs_devices = fs_devices;
>         } else if (!device->name || strcmp(device->name->str, path)) {
> -               name = rcu_string_strdup(path, GFP_NOFS);
> -               if (!name)
> -                       return -ENOMEM;
> -               rcu_string_free(device->name);
> -               rcu_assign_pointer(device->name, name);
> -               if (device->missing) {
> -                       fs_devices->missing_devices--;
> -                       device->missing = 0;
> +
> +               struct buffer_head *bh;
> +               struct btrfs_super_block *cur_disk_super;
> +               u64 cur_transid;
> +
> +               if (!device->missing && device->bdev) {
> +                       bh = btrfs_read_dev_super(device->bdev);
> +                       if (!bh)
> +                               return -EINVAL;
> +
> +                       cur_disk_super = (struct btrfs_super_block *)bh->b_data;
> +                       cur_transid = btrfs_super_generation(cur_disk_super);
> +                       brelse(bh);
> +               } else
> +                       cur_transid = 0;
> +
> +               if (found_transid > cur_transid) {

I agree that we should use the transid to find the most appropriate device,
but this check is not right. Here @found_transid is the largest generation,
so a valid candidate device's transid should be either @found_transid - 1
(after a power failure, for example) or the same as @found_transid.

Anyway, I think the right way is to compare the two devices with the same
devid, and only replace the existing one if the newly found one's transid
is greater than the existing one's.

Please correct me if I am missing something here. ^_^

Thanks,
Wang
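One possible reading of this suggestion, sketched in user space: among all scanned disks that claim the same devid, keep the one with the highest generation, and treat a disk as a plausible member only if its generation equals the newest one or is one behind it. The names are hypothetical, and the plausibility window is an interpretation of the review comment above, not a rule btrfs is known to implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one scanned disk (not a kernel structure). */
struct toy_candidate {
	const char *path;
	unsigned long long devid;
	unsigned long long generation;
};

/*
 * Interpretation of the review: a plausible member's generation equals
 * the newest generation seen, or is one behind it (e.g. after power loss).
 */
static bool toy_generation_plausible(unsigned long long gen,
				     unsigned long long newest)
{
	return gen == newest || (newest > 0 && gen == newest - 1);
}

/*
 * Among candidates for `devid`, keep the one with the highest generation.
 * A later candidate replaces the kept one only if its generation is
 * strictly greater. Returns NULL if no candidate carries `devid`.
 */
static const struct toy_candidate *
toy_pick_newest(const struct toy_candidate *c, size_t n,
		unsigned long long devid)
{
	const struct toy_candidate *best = NULL;
	size_t i;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (c[i].devid != devid)
			continue;
		if (best == NULL || c[i].generation > best->generation)
			best = &c[i];
	}
	return best;
}
```

Because a candidate replaces the kept one only when its generation is strictly greater, both scan orders of a newer sde and an older sdc pick sde; only candidates with equal generations still depend on scan order.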



* Re: [bug] its messy when missing device reappears after its been replaced in RAID1
@ 2014-01-14 11:43 Anand Jain
From: Anand Jain @ 2014-01-14 11:43 UTC
  To: Wang Shilong; +Cc: linux-btrfs

Hi Wang,


> I agree that we should use the transid to find the most appropriate device,
> but this check is not right. Here @found_transid is the largest generation,
> so a valid candidate device's transid should be either @found_transid - 1
> (after a power failure, for example) or the same as @found_transid.
>
> Anyway, I think the right way is to compare the two devices with the same
> devid, and only replace the existing one if the newly found one's transid
> is greater than the existing one's.
>
> Please correct me if I am missing something here. ^_^

  Thanks for the review comments! I am writing the fix.

Anand

