* [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem
@ 2014-07-24 3:37 Miao Xie
2014-07-24 3:37 ` [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub Miao Xie
` (11 more replies)
0 siblings, 12 replies; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
The seed filesystem was destroyed by the device replace, the reproduce
method is:
# mkfs.btrfs -f <dev0>
# btrfstune -S 1 <dev0>
# mount <dev0> <mnt>
# btrfs device add <dev1> <mnt>
# umount <mnt>
# mount <dev1> <mnt>
# btrfs replace start -f <dev0> <dev2> <mnt>
# umount <mnt>
# mount <dev0> <mnt>
It is because we erase the super block on the seed device. It is wrong,
we should not change anything on the seed device.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/volumes.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2776070f..19188df 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1853,8 +1853,12 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
if (srcdev->bdev) {
fs_info->fs_devices->open_devices--;
- /* zero out the old super */
- btrfs_scratch_superblock(srcdev);
+ /*
+ * zero out the old super if it is not writable
+ * (e.g. seed device)
+ */
+ if (srcdev->writeable)
+ btrfs_scratch_superblock(srcdev);
}
call_rcu(&srcdev->rcu, free_device);
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 13:19 ` David Sterba
2014-07-25 9:39 ` Anand Jain
2014-07-24 3:37 ` [PATCH 03/10] Btrfs: fix wrong fsid check of scrub Miao Xie
` (10 subsequent siblings)
11 siblings, 2 replies; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
We should not write data into a readonly device especially seed device when
doing scrub, skip those devices.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/scrub.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b6d198f..23d3f6e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2904,6 +2904,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
struct scrub_ctx *sctx;
int ret;
struct btrfs_device *dev;
+ struct rcu_string *name;
if (btrfs_fs_closing(fs_info))
return -EINVAL;
@@ -2965,6 +2966,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
return -ENODEV;
}
+ if (!is_dev_replace && !readonly && !dev->writeable) {
+ mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+ rcu_read_lock();
+ name = rcu_dereference(dev->name);
+ btrfs_err(fs_info, "scrub: device %s is not writable",
+ name->str);
+ rcu_read_unlock();
+ return -EROFS;
+ }
+
mutex_lock(&fs_info->scrub_lock);
if (!dev->in_fs_metadata || dev->is_tgtdev_for_dev_replace) {
mutex_unlock(&fs_info->scrub_lock);
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 03/10] Btrfs: fix wrong fsid check of scrub
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
2014-07-24 3:37 ` [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 13:24 ` David Sterba
2014-07-24 3:37 ` [PATCH 04/10] Btrfs: fix wrong generation check of super block on a seed device Miao Xie
` (9 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
All the metadata in the seed devices has the same fsid as the fsid
of the seed filesystem which is on the seed device, so we should check
them by the current filesystem. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/scrub.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 23d3f6e..9a81874e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1361,6 +1361,16 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
return;
}
+static inline int scrub_check_fsid(u8 fsid[],
+ struct scrub_page *spage)
+{
+ struct btrfs_fs_devices *fs_devices = spage->dev->fs_devices;
+ int ret;
+
+ ret = memcmp(fsid, fs_devices->fsid, BTRFS_UUID_SIZE);
+ return !ret;
+}
+
static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info,
struct scrub_block *sblock,
int is_metadata, int have_csum,
@@ -1380,7 +1390,7 @@ static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info,
h = (struct btrfs_header *)mapped_buffer;
if (sblock->pagev[0]->logical != btrfs_stack_header_bytenr(h) ||
- memcmp(h->fsid, fs_info->fsid, BTRFS_UUID_SIZE) ||
+ !scrub_check_fsid(h->fsid, sblock->pagev[0]) ||
memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid,
BTRFS_UUID_SIZE)) {
sblock->header_error = 1;
@@ -1750,7 +1760,7 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
if (sblock->pagev[0]->generation != btrfs_stack_header_generation(h))
++fail;
- if (memcmp(h->fsid, fs_info->fsid, BTRFS_UUID_SIZE))
+ if (!scrub_check_fsid(h->fsid, sblock->pagev[0]))
++fail;
if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid,
@@ -1790,8 +1800,6 @@ static int scrub_checksum_super(struct scrub_block *sblock)
{
struct btrfs_super_block *s;
struct scrub_ctx *sctx = sblock->sctx;
- struct btrfs_root *root = sctx->dev_root;
- struct btrfs_fs_info *fs_info = root->fs_info;
u8 calculated_csum[BTRFS_CSUM_SIZE];
u8 on_disk_csum[BTRFS_CSUM_SIZE];
struct page *page;
@@ -1816,7 +1824,7 @@ static int scrub_checksum_super(struct scrub_block *sblock)
if (sblock->pagev[0]->generation != btrfs_super_generation(s))
++fail_gen;
- if (memcmp(s->fsid, fs_info->fsid, BTRFS_UUID_SIZE))
+ if (!scrub_check_fsid(s->fsid, sblock->pagev[0]))
++fail_cor;
len = BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE;
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 04/10] Btrfs: fix wrong generation check of super block on a seed device
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
2014-07-24 3:37 ` [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub Miao Xie
2014-07-24 3:37 ` [PATCH 03/10] Btrfs: fix wrong fsid check of scrub Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 13:25 ` David Sterba
2014-07-24 3:37 ` [PATCH 05/10] Btrfs: make the device lock and its protected data in the same cacheline Miao Xie
` (8 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
The super block generation of the seed devices is not the same as the
filesystem which sprouted from them because we don't update the super
block on the seed devices when we change that new filesystem. So we
should not use the generation of that new filesystem to check the super
block generation on the seed devices, Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/scrub.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9a81874e..98b9f8e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2835,7 +2835,11 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
return -EIO;
- gen = root->fs_info->last_trans_committed;
+ /* Seed devices of a new filesystem has their own generation. */
+ if (scrub_dev->fs_devices != root->fs_info->fs_devices)
+ gen = scrub_dev->generation;
+ else
+ gen = root->fs_info->last_trans_committed;
for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
bytenr = btrfs_sb_offset(i);
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 05/10] Btrfs: make the device lock and its protected data in the same cacheline
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (2 preceding siblings ...)
2014-07-24 3:37 ` [PATCH 04/10] Btrfs: fix wrong generation check of super block on a seed device Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 3:37 ` [PATCH 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared Miao Xie
` (7 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
The lock in btrfs_device structure was far away from its protected data, it would
make CPU load the cache line twice when we accessed them, move them together.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/volumes.h | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2aaa00c..6fcc8ea 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -36,30 +36,31 @@ struct btrfs_device {
struct list_head dev_list;
struct list_head dev_alloc_list;
struct btrfs_fs_devices *fs_devices;
+
struct btrfs_root *dev_root;
+ struct rcu_string *name;
+
+ u64 generation;
+
+ spinlock_t io_lock ____cacheline_aligned;
+ int running_pending;
/* regular prio bios */
struct btrfs_pending_bios pending_bios;
/* WRITE_SYNC bios */
struct btrfs_pending_bios pending_sync_bios;
- u64 generation;
- int running_pending;
+ struct block_device *bdev;
+
+ /* the mode sent to blkdev_get */
+ fmode_t mode;
+
int writeable;
int in_fs_metadata;
int missing;
int can_discard;
int is_tgtdev_for_dev_replace;
- spinlock_t io_lock;
- /* the mode sent to blkdev_get */
- fmode_t mode;
-
- struct block_device *bdev;
-
-
- struct rcu_string *name;
-
/* the internal btrfs device id */
u64 devid;
@@ -83,7 +84,6 @@ struct btrfs_device {
/* minimal io size for this device */
u32 sector_size;
-
/* physical drive uuid (or lvm uuid) */
u8 uuid[BTRFS_UUID_SIZE];
@@ -107,7 +107,6 @@ struct btrfs_device {
struct radix_tree_root reada_zones;
struct radix_tree_root reada_extents;
-
/* disk I/O failure stats. For detailed description refer to
* enum btrfs_dev_stat_values in ioctl.h */
int dev_stats_valid;
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (3 preceding siblings ...)
2014-07-24 3:37 ` [PATCH 05/10] Btrfs: make the device lock and its protected data in the same cacheline Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 13:45 ` David Sterba
2014-07-24 3:37 ` [PATCH 07/10] Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_devie Miao Xie
` (6 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
The io error might happen during writing out the device stats, and the
device stats information and dirty flag would be update at that time,
but the current code didn't consider this case, just clear the dirty
flag, it would cause that we forgot to write out the new device stats
information. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/volumes.c | 7 +++++--
fs/btrfs/volumes.h | 19 +++++++++++++++----
2 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19188df..0d37746 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -159,6 +159,7 @@ static struct btrfs_device *__alloc_device(void)
spin_lock_init(&dev->reada_lock);
atomic_set(&dev->reada_in_flight, 0);
+ atomic_set(&dev->dev_stats_ccnt, 0);
INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
@@ -6398,16 +6399,18 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle *trans,
struct btrfs_root *dev_root = fs_info->dev_root;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_device *device;
+ int stats_cnt;
int ret = 0;
mutex_lock(&fs_devices->device_list_mutex);
list_for_each_entry(device, &fs_devices->devices, dev_list) {
- if (!device->dev_stats_valid || !device->dev_stats_dirty)
+ if (!device->dev_stats_valid || !btrfs_dev_stats_dirty(device))
continue;
+ stats_cnt = atomic_read(&device->dev_stats_ccnt);
ret = update_dev_stat_item(trans, dev_root, device);
if (!ret)
- device->dev_stats_dirty = 0;
+ atomic_sub(stats_cnt, &device->dev_stats_ccnt);
}
mutex_unlock(&fs_devices->device_list_mutex);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6fcc8ea..0defd23 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -110,7 +110,9 @@ struct btrfs_device {
/* disk I/O failure stats. For detailed description refer to
* enum btrfs_dev_stat_values in ioctl.h */
int dev_stats_valid;
- int dev_stats_dirty; /* counters need to be written to disk */
+
+ /* Counter to record the change of device stats */
+ atomic_t dev_stats_ccnt;
atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
};
@@ -359,11 +361,18 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root *root,
int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
struct btrfs_root *extent_root,
u64 chunk_offset, u64 chunk_size);
+
+static inline int btrfs_dev_stats_dirty(struct btrfs_device *dev)
+{
+ return atomic_read(&dev->dev_stats_ccnt);
+}
+
static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
int index)
{
atomic_inc(dev->dev_stat_values + index);
- dev->dev_stats_dirty = 1;
+ smp_mb__before_atomic();
+ atomic_inc(&dev->dev_stats_ccnt);
}
static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
@@ -378,7 +387,8 @@ static inline int btrfs_dev_stat_read_and_reset(struct btrfs_device *dev,
int ret;
ret = atomic_xchg(dev->dev_stat_values + index, 0);
- dev->dev_stats_dirty = 1;
+ smp_mb__before_atomic();
+ atomic_inc(&dev->dev_stats_ccnt);
return ret;
}
@@ -386,7 +396,8 @@ static inline void btrfs_dev_stat_set(struct btrfs_device *dev,
int index, unsigned long val)
{
atomic_set(dev->dev_stat_values + index, val);
- dev->dev_stats_dirty = 1;
+ smp_mb__before_atomic();
+ atomic_inc(&dev->dev_stats_ccnt);
}
static inline void btrfs_dev_stat_reset(struct btrfs_device *dev,
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 07/10] Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_devie
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (4 preceding siblings ...)
2014-07-24 3:37 ` [PATCH 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 3:37 ` [PATCH 08/10] Btrfs: Fix wrong device size when we are resizing the device Miao Xie
` (5 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/volumes.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 0defd23..90d4fa8 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -64,10 +64,10 @@ struct btrfs_device {
/* the internal btrfs device id */
u64 devid;
- /* size of the device */
+ /* size of the device in memory */
u64 total_bytes;
- /* size of the disk */
+ /* size of the device on disk */
u64 disk_total_bytes;
/* bytes used */
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 08/10] Btrfs: Fix wrong device size when we are resizing the device
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (5 preceding siblings ...)
2014-07-24 3:37 ` [PATCH 07/10] Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_devie Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 3:37 ` [PATCH 09/10] Btrfs: don't consider the missing device when allocating new chunks Miao Xie
` (4 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
total_bytes of device is just a in-memory variant which is used to record
the size of the device, and it might be changed before we resize a device,
if the resize operation fails, it will be fallbacked. But some code used it
to update on-disk metadata of the device, it would cause the problem that
on-disk metadata of the devices was not consistent. We should use the other
variant named disk_total_bytes to update the on-disk metadata of device,
because that variant is updated only when the resize operation is successful.
Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/volumes.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..2a9560c1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3452,7 +3452,8 @@ static int write_all_supers(struct btrfs_root *root, int max_mirrors)
btrfs_set_stack_device_generation(dev_item, 0);
btrfs_set_stack_device_type(dev_item, dev->type);
btrfs_set_stack_device_id(dev_item, dev->devid);
- btrfs_set_stack_device_total_bytes(dev_item, dev->total_bytes);
+ btrfs_set_stack_device_total_bytes(dev_item,
+ dev->disk_total_bytes);
btrfs_set_stack_device_bytes_used(dev_item, dev->bytes_used);
btrfs_set_stack_device_io_align(dev_item, dev->io_align);
btrfs_set_stack_device_io_width(dev_item, dev->io_width);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0d37746..0140bff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1489,7 +1489,7 @@ static int btrfs_add_device(struct btrfs_trans_handle *trans,
btrfs_set_device_io_align(leaf, dev_item, device->io_align);
btrfs_set_device_io_width(leaf, dev_item, device->io_width);
btrfs_set_device_sector_size(leaf, dev_item, device->sector_size);
- btrfs_set_device_total_bytes(leaf, dev_item, device->total_bytes);
+ btrfs_set_device_total_bytes(leaf, dev_item, device->disk_total_bytes);
btrfs_set_device_bytes_used(leaf, dev_item, device->bytes_used);
btrfs_set_device_group(leaf, dev_item, 0);
btrfs_set_device_seek_speed(leaf, dev_item, 0);
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 09/10] Btrfs: don't consider the missing device when allocating new chunks
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (6 preceding siblings ...)
2014-07-24 3:37 ` [PATCH 08/10] Btrfs: Fix wrong device size when we are resizing the device Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 3:37 ` [PATCH 10/10] Btrfs: cleanup unused latest_devid and latest_trans in fs_devices Miao Xie
` (3 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
The original code allocated new chunks by the number of the writable devices
and missing devices to make sure that any RAID levels on a degraded FS continue
to be honored, but it introduced a problem that it stopped us to allocating
new chunks, the steps to reproduce is following:
# mkfs.btrfs -m raid1 -d raid1 -f <dev0> <dev1>
# mkfs.btrfs -f <dev1> //Removing <dev1> from the original fs
# mount -o degraded <dev0> <mnt>
# dd if=/dev/null of=<mnt>/tmpfile bs=1M
It is because we allocate new chunks only on the writable devices, if we take
the number of missing devices into account, and want to allocate new chunks
with higher RAID level, we will fail becaue we don't have enough writable
device. Fix it by ignoring the number of missing devices when allocating
new chunks.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/extent-tree.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 813537f..f778035 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3586,13 +3586,7 @@ static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
*/
static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
{
- /*
- * we add in the count of missing devices because we want
- * to make sure that any RAID levels on a degraded FS
- * continue to be honored.
- */
- u64 num_devices = root->fs_info->fs_devices->rw_devices +
- root->fs_info->fs_devices->missing_devices;
+ u64 num_devices = root->fs_info->fs_devices->rw_devices;
u64 target;
u64 tmp;
@@ -8181,13 +8175,7 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
if (stripped)
return extended_to_chunk(stripped);
- /*
- * we add in the count of missing devices because we want
- * to make sure that any RAID levels on a degraded FS
- * continue to be honored.
- */
- num_devices = root->fs_info->fs_devices->rw_devices +
- root->fs_info->fs_devices->missing_devices;
+ num_devices = root->fs_info->fs_devices->rw_devices;
stripped = BTRFS_BLOCK_GROUP_RAID0 |
BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/10] Btrfs: cleanup unused latest_devid and latest_trans in fs_devices
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (7 preceding siblings ...)
2014-07-24 3:37 ` [PATCH 09/10] Btrfs: don't consider the missing device when allocating new chunks Miao Xie
@ 2014-07-24 3:37 ` Miao Xie
2014-07-24 13:13 ` [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem David Sterba
` (2 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Miao Xie @ 2014-07-24 3:37 UTC (permalink / raw)
To: linux-btrfs
The member variants - latest_devid and latest_trans - of fs_devices structure
are set, but no one use them to do anything. so remove them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/volumes.c | 42 +++++++++++-------------------------------
fs/btrfs/volumes.h | 3 ---
2 files changed, 11 insertions(+), 34 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0140bff..b1ed417 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -475,14 +475,13 @@ static noinline int device_list_add(const char *path,
return PTR_ERR(fs_devices);
list_add(&fs_devices->list, &fs_uuids);
- fs_devices->latest_devid = devid;
- fs_devices->latest_trans = found_transid;
device = NULL;
} else {
device = __find_device(&fs_devices->devices, devid,
disk_super->dev_item.uuid);
}
+
if (!device) {
if (fs_devices->opened)
return -EBUSY;
@@ -567,10 +566,6 @@ static noinline int device_list_add(const char *path,
if (!fs_devices->opened)
device->generation = found_transid;
- if (found_transid > fs_devices->latest_trans) {
- fs_devices->latest_devid = devid;
- fs_devices->latest_trans = found_transid;
- }
*fs_devices_ret = fs_devices;
return ret;
@@ -586,8 +581,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
if (IS_ERR(fs_devices))
return fs_devices;
- fs_devices->latest_devid = orig->latest_devid;
- fs_devices->latest_trans = orig->latest_trans;
fs_devices->total_devices = orig->total_devices;
/* We have held the volume lock, it is safe to get the devices. */
@@ -631,10 +624,7 @@ void btrfs_close_extra_devices(struct btrfs_fs_info *fs_info,
struct btrfs_fs_devices *fs_devices, int step)
{
struct btrfs_device *device, *next;
-
- struct block_device *latest_bdev = NULL;
- u64 latest_devid = 0;
- u64 latest_transid = 0;
+ struct btrfs_device *latest_dev = NULL;
mutex_lock(&uuid_mutex);
again:
@@ -642,11 +632,9 @@ again:
list_for_each_entry_safe(device, next, &fs_devices->devices, dev_list) {
if (device->in_fs_metadata) {
if (!device->is_tgtdev_for_dev_replace &&
- (!latest_transid ||
- device->generation > latest_transid)) {
- latest_devid = device->devid;
- latest_transid = device->generation;
- latest_bdev = device->bdev;
+ (!latest_dev ||
+ device->generation > latest_dev->generation)) {
+ latest_dev = device;
}
continue;
}
@@ -688,9 +676,7 @@ again:
goto again;
}
- fs_devices->latest_bdev = latest_bdev;
- fs_devices->latest_devid = latest_devid;
- fs_devices->latest_trans = latest_transid;
+ fs_devices->latest_bdev = latest_dev->bdev;
mutex_unlock(&uuid_mutex);
}
@@ -805,11 +791,9 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
struct block_device *bdev;
struct list_head *head = &fs_devices->devices;
struct btrfs_device *device;
- struct block_device *latest_bdev = NULL;
+ struct btrfs_device *latest_dev = NULL;
struct buffer_head *bh;
struct btrfs_super_block *disk_super;
- u64 latest_devid = 0;
- u64 latest_transid = 0;
u64 devid;
int seeding = 1;
int ret = 0;
@@ -837,11 +821,9 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
goto error_brelse;
device->generation = btrfs_super_generation(disk_super);
- if (!latest_transid || device->generation > latest_transid) {
- latest_devid = devid;
- latest_transid = device->generation;
- latest_bdev = bdev;
- }
+ if (!latest_dev ||
+ device->generation > latest_dev->generation)
+ latest_dev = device;
if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) {
device->writeable = 0;
@@ -884,9 +866,7 @@ error_brelse:
}
fs_devices->seeding = seeding;
fs_devices->opened = 1;
- fs_devices->latest_bdev = latest_bdev;
- fs_devices->latest_devid = latest_devid;
- fs_devices->latest_trans = latest_transid;
+ fs_devices->latest_bdev = latest_dev->bdev;
fs_devices->total_rw_bytes = 0;
out:
return ret;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 90d4fa8..e894ac6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -119,9 +119,6 @@ struct btrfs_device {
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
- /* the device with this id has the most recent copy of the super */
- u64 latest_devid;
- u64 latest_trans;
u64 num_devices;
u64 open_devices;
u64 rw_devices;
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (8 preceding siblings ...)
2014-07-24 3:37 ` [PATCH 10/10] Btrfs: cleanup unused latest_devid and latest_trans in fs_devices Miao Xie
@ 2014-07-24 13:13 ` David Sterba
2014-07-25 7:56 ` Anand Jain
2014-07-25 12:33 ` [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING Anand Jain
11 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2014-07-24 13:13 UTC (permalink / raw)
To: Miao Xie; +Cc: linux-btrfs
On Thu, Jul 24, 2014 at 11:37:06AM +0800, Miao Xie wrote:
> The seed filesystem was destroyed by the device replace, the reproduce
> method is:
> # mkfs.btrfs -f <dev0>
> # btrfstune -S 1 <dev0>
> # mount <dev0> <mnt>
> # btrfs device add <dev1> <mnt>
> # umount <mnt>
> # mount <dev1> <mnt>
> # btrfs replace start -f <dev0> <dev2> <mnt>
> # umount <mnt>
> # mount <dev0> <mnt>
>
> It is because we erase the super block on the seed device. It is wrong,
> we should not change anything on the seed device.
>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub
2014-07-24 3:37 ` [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub Miao Xie
@ 2014-07-24 13:19 ` David Sterba
2014-07-25 9:39 ` Anand Jain
1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2014-07-24 13:19 UTC (permalink / raw)
To: Miao Xie; +Cc: linux-btrfs
On Thu, Jul 24, 2014 at 11:37:07AM +0800, Miao Xie wrote:
> We should not write data into a readonly device especially seed device when
> doing scrub, skip those devices.
>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
One minor comment below.
> @@ -2904,6 +2904,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> struct scrub_ctx *sctx;
> int ret;
> struct btrfs_device *dev;
> + struct rcu_string *name;
>
> + if (!is_dev_replace && !readonly && !dev->writeable) {
You can define 'name' within the block.
> + mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> + rcu_read_lock();
> + name = rcu_dereference(dev->name);
> + btrfs_err(fs_info, "scrub: device %s is not writable",
> + name->str);
> + rcu_read_unlock();
> + return -EROFS;
> + }
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 03/10] Btrfs: fix wrong fsid check of scrub
2014-07-24 3:37 ` [PATCH 03/10] Btrfs: fix wrong fsid check of scrub Miao Xie
@ 2014-07-24 13:24 ` David Sterba
2014-09-03 6:58 ` [PATCH v2 " Miao Xie
0 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2014-07-24 13:24 UTC (permalink / raw)
To: Miao Xie; +Cc: linux-btrfs
On Thu, Jul 24, 2014 at 11:37:08AM +0800, Miao Xie wrote:
> All the metadata in the seed devices has the same fsid as the fsid
> of the seed filesystem which is on the seed device, so we should check
> them by the current filesystem. Fix it.
>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
> ---
> fs/btrfs/scrub.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 23d3f6e..9a81874e 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1361,6 +1361,16 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
> return;
> }
>
> +static inline int scrub_check_fsid(u8 fsid[],
Please use 'const u8 *fsid' type.
> + struct scrub_page *spage)
> +{
> + struct btrfs_fs_devices *fs_devices = spage->dev->fs_devices;
> + int ret;
> +
> + ret = memcmp(fsid, fs_devices->fsid, BTRFS_UUID_SIZE);
ret is not necessary
> + return !ret;
> +}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 04/10] Btrfs: fix wrong generation check of super block on a seed device
2014-07-24 3:37 ` [PATCH 04/10] Btrfs: fix wrong generation check of super block on a seed device Miao Xie
@ 2014-07-24 13:25 ` David Sterba
0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2014-07-24 13:25 UTC (permalink / raw)
To: Miao Xie; +Cc: linux-btrfs
On Thu, Jul 24, 2014 at 11:37:09AM +0800, Miao Xie wrote:
> The super block generation of the seed devices is not the same as the
> filesystem which sprouted from them because we don't update the super
> block on the seed devices when we change that new filesystem. So we
> should not use the generation of that new filesystem to check the super
> block generation on the seed devices, Fix it.
>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Good catch.
Reviewed-by: David Sterba <dsterba@suse.cz>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared
2014-07-24 3:37 ` [PATCH 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared Miao Xie
@ 2014-07-24 13:45 ` David Sterba
2014-09-03 6:59 ` [PATCH v2 " Miao Xie
0 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2014-07-24 13:45 UTC (permalink / raw)
To: Miao Xie; +Cc: linux-btrfs
On Thu, Jul 24, 2014 at 11:37:11AM +0800, Miao Xie wrote:
> The io error might happen during writing out the device stats, and the
> device stats information and dirty flag would be update at that time,
> but the current code didn't consider this case, just clear the dirty
> flag, it would cause that we forgot to write out the new device stats
> information. Fix it.
>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
> fs/btrfs/volumes.c | 7 +++++--
> fs/btrfs/volumes.h | 19 +++++++++++++++----
> 2 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 19188df..0d37746 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -159,6 +159,7 @@ static struct btrfs_device *__alloc_device(void)
>
> spin_lock_init(&dev->reada_lock);
> atomic_set(&dev->reada_in_flight, 0);
> + atomic_set(&dev->dev_stats_ccnt, 0);
> INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
> INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
>
> @@ -6398,16 +6399,18 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle *trans,
> struct btrfs_root *dev_root = fs_info->dev_root;
> struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> struct btrfs_device *device;
> + int stats_cnt;
> int ret = 0;
>
> mutex_lock(&fs_devices->device_list_mutex);
> list_for_each_entry(device, &fs_devices->devices, dev_list) {
> - if (!device->dev_stats_valid || !device->dev_stats_dirty)
> + if (!device->dev_stats_valid || !btrfs_dev_stats_dirty(device))
The helper btrfs_dev_stats_dirty is used only once and IMHO not
necessary.
> continue;
>
> + stats_cnt = atomic_read(&device->dev_stats_ccnt);
Here it is opencoded anyway.
> ret = update_dev_stat_item(trans, dev_root, device);
> if (!ret)
> - device->dev_stats_dirty = 0;
> + atomic_sub(stats_cnt, &device->dev_stats_ccnt);
> }
> mutex_unlock(&fs_devices->device_list_mutex);
>
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 6fcc8ea..0defd23 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -110,7 +110,9 @@ struct btrfs_device {
> /* disk I/O failure stats. For detailed description refer to
> * enum btrfs_dev_stat_values in ioctl.h */
> int dev_stats_valid;
> - int dev_stats_dirty; /* counters need to be written to disk */
> +
> + /* Counter to record the change of device stats */
> + atomic_t dev_stats_ccnt;
dev_stats_dirty is more descriptive, please keep it. The counter
semantics can be documented here.
> atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
> };
>
> @@ -359,11 +361,18 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root *root,
> int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
> struct btrfs_root *extent_root,
> u64 chunk_offset, u64 chunk_size);
> +
> +static inline int btrfs_dev_stats_dirty(struct btrfs_device *dev)
> +{
> + return atomic_read(&dev->dev_stats_ccnt);
IMHO too trivial, not necessary.
> +}
> +
> static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
> int index)
> {
> atomic_inc(dev->dev_stat_values + index);
> - dev->dev_stats_dirty = 1;
> + smp_mb__before_atomic();
> + atomic_inc(&dev->dev_stats_ccnt);
Please put the two lines into a wrapper, 3 times repeating the same is
worth it.
> @@ -378,7 +387,8 @@ static inline int btrfs_dev_stat_read_and_reset(struct btrfs_device *dev,
> int ret;
>
> ret = atomic_xchg(dev->dev_stat_values + index, 0);
> - dev->dev_stats_dirty = 1;
> + smp_mb__before_atomic();
> + atomic_inc(&dev->dev_stats_ccnt);
> @@ -386,7 +396,8 @@ static inline void btrfs_dev_stat_set(struct btrfs_device *dev,
> int index, unsigned long val)
> {
> atomic_set(dev->dev_stat_values + index, val);
> - dev->dev_stats_dirty = 1;
> + smp_mb__before_atomic();
> + atomic_inc(&dev->dev_stats_ccnt);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
` (9 preceding siblings ...)
2014-07-24 13:13 ` [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem David Sterba
@ 2014-07-25 7:56 ` Anand Jain
2014-07-25 12:33 ` [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING Anand Jain
11 siblings, 0 replies; 23+ messages in thread
From: Anand Jain @ 2014-07-25 7:56 UTC (permalink / raw)
To: Miao Xie, linux-btrfs
Thanks for nailing down most of the seed related bugs, scratching
off few from my list.
On 07/24/2014 11:37 AM, Miao Xie wrote:
> The seed filesystem was destroyed by the device replace, the reproduce
> method is:
> # mkfs.btrfs -f <dev0>
> # btrfstune -S 1 <dev0>
> # mount <dev0> <mnt>
> # btrfs device add <dev1> <mnt>
> # umount <mnt>
> # mount <dev1> <mnt>
> # btrfs replace start -f <dev0> <dev2> <mnt>
> # umount <mnt>
> # mount <dev0> <mnt>
>
> It is because we erase the super block on the seed device. It is wrong,
> we should not change anything on the seed device.
nice fix.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
> fs/btrfs/volumes.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 2776070f..19188df 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1853,8 +1853,12 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
> if (srcdev->bdev) {
> fs_info->fs_devices->open_devices--;
>
> - /* zero out the old super */
> - btrfs_scratch_superblock(srcdev);
> + /*
> + * zero out the old super if it is not writable
> + * (e.g. seed device)
> + */
> + if (srcdev->writeable)
> + btrfs_scratch_superblock(srcdev);
> }
>
> call_rcu(&srcdev->rcu, free_device);
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub
2014-07-24 3:37 ` [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub Miao Xie
2014-07-24 13:19 ` David Sterba
@ 2014-07-25 9:39 ` Anand Jain
1 sibling, 0 replies; 23+ messages in thread
From: Anand Jain @ 2014-07-25 9:39 UTC (permalink / raw)
To: Miao Xie, linux-btrfs
On 07/24/2014 11:37 AM, Miao Xie wrote:
> We should not write data into a readonly device especially seed device when
> doing scrub, skip those devices.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
> fs/btrfs/scrub.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index b6d198f..23d3f6e 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2904,6 +2904,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> struct scrub_ctx *sctx;
> int ret;
> struct btrfs_device *dev;
> + struct rcu_string *name;
>
> if (btrfs_fs_closing(fs_info))
> return -EINVAL;
> @@ -2965,6 +2966,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> return -ENODEV;
> }
>
> + if (!is_dev_replace && !readonly && !dev->writeable) {
just reading the commit message would ask question what
about readonly scrub anyway. Nice readonly scrub case
is taken care as well.
> + mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> + rcu_read_lock();
> + name = rcu_dereference(dev->name);
> + btrfs_err(fs_info, "scrub: device %s is not writable",
> + name->str);
> + rcu_read_unlock();
> + return -EROFS;
> + }
> +
> mutex_lock(&fs_info->scrub_lock);
> if (!dev->in_fs_metadata || dev->is_tgtdev_for_dev_replace) {
> mutex_unlock(&fs_info->scrub_lock);
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING
@ 2014-07-25 12:33 ` Anand Jain
2014-07-30 7:42 ` Miao Xie
0 siblings, 1 reply; 23+ messages in thread
From: Anand Jain @ 2014-07-25 12:33 UTC (permalink / raw)
To: linux-btrfs; +Cc: miaox
After the seed device has been replaced the new target device
is no more a seed device. So we need to bring that state in
the fs_devices.
reproducer:
mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs
WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
::
__btrfs_close_devices()
::
WARN_ON(fs_devices->open_devices);
WARN_ON(fs_devices->rw_devices);
per the btrfs-devlist tool (to dump fs_devices and
btrfs_device from the kernel) the num_device, open_devices,
rw_devices are still at 1 but the total_device is at 2,
even after the seed device has been replaced in the above example.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/dev-replace.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..a144bb1 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -569,6 +569,19 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
btrfs_rm_dev_replace_blocked(fs_info);
+ /*
+ * if we are replacing a seed device with a writable device
+ * then FS won't be a seeding FS any more.
+ */
+ if (src_device->fs_devices->seeding && !src_device->writeable) {
+ fs_info->fs_devices->rw_devices++;
+ fs_info->fs_devices->num_devices++;
+ fs_info->fs_devices->open_devices++;
+
+ fs_info->fs_devices->seeding = 0;
+ fs_info->fs_devices->seed = NULL;
+ }
+
btrfs_rm_dev_replace_srcdev(fs_info, src_device);
btrfs_rm_dev_replace_unblocked(fs_info);
--
2.0.0.153.g79dcccc
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING
2014-07-25 12:33 ` [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING Anand Jain
@ 2014-07-30 7:42 ` Miao Xie
2014-07-31 8:45 ` Anand Jain
0 siblings, 1 reply; 23+ messages in thread
From: Miao Xie @ 2014-07-30 7:42 UTC (permalink / raw)
To: Anand Jain, linux-btrfs
On Fri, 25 Jul 2014 20:33:34 +0800, Anand Jain wrote:
> After the seed device has been replaced the new target device
> is no more a seed device. So we need to bring that state in
> the fs_devices.
>
> reproducer:
> mount /dev/sdb /btrfs
> btrfs dev add /dev/sdc /btrfs
> btrfs rep start -B /dev/sdb /dev/sdd /btrfs
> umount /btrfs
>
> WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
> ::
>
> __btrfs_close_devices()
> ::
> WARN_ON(fs_devices->open_devices);
> WARN_ON(fs_devices->rw_devices);
>
> per the btrfs-devlist tool (to dump fs_devices and
> btrfs_device from the kernel) the num_device, open_devices,
> rw_devices are still at 1 but the total_device is at 2,
> even after the seed device has been replaced in the above example.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/dev-replace.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index eea26e1..a144bb1 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -569,6 +569,19 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>
> btrfs_rm_dev_replace_blocked(fs_info);
>
> + /*
> + * if we are replacing a seed device with a writable device
> + * then FS won't be a seeding FS any more.
> + */
> + if (src_device->fs_devices->seeding && !src_device->writeable) {
First, why not move this code into btrfs_rm_dev_replace_srcdev()?
Then if the first condition is true, the second one(!src_device->writeable) must be true
because all the devices in the seed fs_device must be read-only. so only the first
check is enough.
> + fs_info->fs_devices->rw_devices++;
If src is missing dev, we would increase it twice.
> + fs_info->fs_devices->num_devices++;
> + fs_info->fs_devices->open_devices++;
> +
> + fs_info->fs_devices->seeding = 0;
> + fs_info->fs_devices->seed = NULL;
In fact, we may have several seed fs_devices in one fs, and the seed fs_device
which includes src might not the first one, so assign seed to be NULL would break
the seed fs_device list.
Thanks
Miao
> + }
> +
> btrfs_rm_dev_replace_srcdev(fs_info, src_device);
>
> btrfs_rm_dev_replace_unblocked(fs_info);
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING
2014-07-30 7:42 ` Miao Xie
@ 2014-07-31 8:45 ` Anand Jain
2014-08-11 9:46 ` Anand Jain
0 siblings, 1 reply; 23+ messages in thread
From: Anand Jain @ 2014-07-31 8:45 UTC (permalink / raw)
To: miaox; +Cc: linux-btrfs
On 30/07/2014 15:42, Miao Xie wrote:
> On Fri, 25 Jul 2014 20:33:34 +0800, Anand Jain wrote:
>> After the seed device has been replaced the new target device
>> is no more a seed device. So we need to bring that state in
>> the fs_devices.
>>
>> reproducer:
>> mount /dev/sdb /btrfs
>> btrfs dev add /dev/sdc /btrfs
>> btrfs rep start -B /dev/sdb /dev/sdd /btrfs
>> umount /btrfs
>>
>> WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
>> ::
>>
>> __btrfs_close_devices()
>> ::
>> WARN_ON(fs_devices->open_devices);
>> WARN_ON(fs_devices->rw_devices);
>>
>> per the btrfs-devlist tool (to dump fs_devices and
>> btrfs_device from the kernel) the num_device, open_devices,
>> rw_devices are still at 1 but the total_device is at 2,
>> even after the seed device has been replaced in the above example.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> fs/btrfs/dev-replace.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index eea26e1..a144bb1 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -569,6 +569,19 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>>
>> btrfs_rm_dev_replace_blocked(fs_info);
>>
>> + /*
>> + * if we are replacing a seed device with a writable device
>> + * then FS won't be a seeding FS any more.
>> + */
>> + if (src_device->fs_devices->seeding && !src_device->writeable) {
>
> First, why not move this code into btrfs_rm_dev_replace_srcdev()?
>
> Then if the first condition is true, the second one(!src_device->writeable) must be true
> because all the devices in the seed fs_device must be read-only. so only the first
> check is enough.
>
>> + fs_info->fs_devices->rw_devices++;
>
> If src is missing dev, we would increase it twice.
>
>> + fs_info->fs_devices->num_devices++;
>> + fs_info->fs_devices->open_devices++;
>> +
>> + fs_info->fs_devices->seeding = 0;
>> + fs_info->fs_devices->seed = NULL;
>
> In fact, we may have several seed fs_devices in one fs, and the seed fs_device
> which includes src might not the first one, so assign seed to be NULL would break
> the seed fs_device list.
Yep I had question when writing this patch but later decided
to reset seed and seeding. if I am not wrong don't reset
seeding and seed will do as well.
Thanks for reviewing.
Anand
> Thanks
> Miao
>
>> + }
>> +
>> btrfs_rm_dev_replace_srcdev(fs_info, src_device);
>>
>> btrfs_rm_dev_replace_unblocked(fs_info);
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING
2014-07-31 8:45 ` Anand Jain
@ 2014-08-11 9:46 ` Anand Jain
0 siblings, 0 replies; 23+ messages in thread
From: Anand Jain @ 2014-08-11 9:46 UTC (permalink / raw)
To: linux-btrfs; +Cc: miaox
I have sent out the patch-set
[PATCH 1/4] btrfs: preparatory to make btrfs_rm_dev_replace_srcdev()
seed aware
in replacement for this patch. Kindly use/review the above patch set.
Thanks. Anand
On 31/07/2014 16:45, Anand Jain wrote:
>
>
> On 30/07/2014 15:42, Miao Xie wrote:
>> On Fri, 25 Jul 2014 20:33:34 +0800, Anand Jain wrote:
>>> After the seed device has been replaced the new target device
>>> is no more a seed device. So we need to bring that state in
>>> the fs_devices.
>>>
>>> reproducer:
>>> mount /dev/sdb /btrfs
>>> btrfs dev add /dev/sdc /btrfs
>>> btrfs rep start -B /dev/sdb /dev/sdd /btrfs
>>> umount /btrfs
>>>
>>> WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891
>>> __btrfs_close_devices+0x1b0/0x200 [btrfs]()
>>> ::
>>>
>>> __btrfs_close_devices()
>>> ::
>>> WARN_ON(fs_devices->open_devices);
>>> WARN_ON(fs_devices->rw_devices);
>>>
>>> per the btrfs-devlist tool (to dump fs_devices and
>>> btrfs_device from the kernel) the num_device, open_devices,
>>> rw_devices are still at 1 but the total_device is at 2,
>>> even after the seed device has been replaced in the above example.
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>> fs/btrfs/dev-replace.c | 13 +++++++++++++
>>> 1 file changed, 13 insertions(+)
>>>
>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>> index eea26e1..a144bb1 100644
>>> --- a/fs/btrfs/dev-replace.c
>>> +++ b/fs/btrfs/dev-replace.c
>>> @@ -569,6 +569,19 @@ static int btrfs_dev_replace_finishing(struct
>>> btrfs_fs_info *fs_info,
>>>
>>> btrfs_rm_dev_replace_blocked(fs_info);
>>>
>>> + /*
>>> + * if we are replacing a seed device with a writable device
>>> + * then FS won't be a seeding FS any more.
>>> + */
>>> + if (src_device->fs_devices->seeding && !src_device->writeable) {
>>
>> First, why not move this code into btrfs_rm_dev_replace_srcdev()?
>>
>> Then if the first condition is true, the second
>> one(!src_device->writeable) must be true
>> because all the devices in the seed fs_device must be read-only. so
>> only the first
>> check is enough.
>>
>>> + fs_info->fs_devices->rw_devices++;
>>
>> If src is missing dev, we would increase it twice.
>>
>>> + fs_info->fs_devices->num_devices++;
>>> + fs_info->fs_devices->open_devices++;
>>> +
>>> + fs_info->fs_devices->seeding = 0;
>>> + fs_info->fs_devices->seed = NULL;
>>
>> In fact, we may have several seed fs_devices in one fs, and the seed
>> fs_device
>> which includes src might not the first one, so assign seed to be NULL
>> would break
>> the seed fs_device list.
>
> Yep I had question when writing this patch but later decided
> to reset seed and seeding. if I am not wrong don't reset
> seeding and seed will do as well.
>
> Thanks for reviewing.
> Anand
>
>> Thanks
>> Miao
>>
>>> + }
>>> +
>>> btrfs_rm_dev_replace_srcdev(fs_info, src_device);
>>>
>>> btrfs_rm_dev_replace_unblocked(fs_info);
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 03/10] Btrfs: fix wrong fsid check of scrub
2014-07-24 13:24 ` David Sterba
@ 2014-09-03 6:58 ` Miao Xie
0 siblings, 0 replies; 23+ messages in thread
From: Miao Xie @ 2014-09-03 6:58 UTC (permalink / raw)
To: linux-btrfs
All the metadata in the seed devices has the same fsid as the fsid
of the seed filesystem which is on the seed device, so we should check
them by the current filesystem. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
Changelog v1 -> v2:
- Use const keyword to restrict the fsid.
- Remove unnecessary the variant.
---
fs/btrfs/scrub.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index dfb92a2..12a6801 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1361,6 +1361,14 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
return;
}
+static inline int scrub_check_fsid(const u8 *fsid,
+ struct scrub_page *spage)
+{
+ struct btrfs_fs_devices *fs_devices = spage->dev->fs_devices;
+
+ return !memcmp(fsid, fs_devices->fsid, BTRFS_UUID_SIZE);
+}
+
static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info,
struct scrub_block *sblock,
int is_metadata, int have_csum,
@@ -1380,7 +1388,7 @@ static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info,
h = (struct btrfs_header *)mapped_buffer;
if (sblock->pagev[0]->logical != btrfs_stack_header_bytenr(h) ||
- memcmp(h->fsid, fs_info->fsid, BTRFS_UUID_SIZE) ||
+ !scrub_check_fsid(h->fsid, sblock->pagev[0]) ||
memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid,
BTRFS_UUID_SIZE)) {
sblock->header_error = 1;
@@ -1750,7 +1758,7 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
if (sblock->pagev[0]->generation != btrfs_stack_header_generation(h))
++fail;
- if (memcmp(h->fsid, fs_info->fsid, BTRFS_UUID_SIZE))
+ if (!scrub_check_fsid(h->fsid, sblock->pagev[0]))
++fail;
if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid,
@@ -1790,8 +1798,6 @@ static int scrub_checksum_super(struct scrub_block *sblock)
{
struct btrfs_super_block *s;
struct scrub_ctx *sctx = sblock->sctx;
- struct btrfs_root *root = sctx->dev_root;
- struct btrfs_fs_info *fs_info = root->fs_info;
u8 calculated_csum[BTRFS_CSUM_SIZE];
u8 on_disk_csum[BTRFS_CSUM_SIZE];
struct page *page;
@@ -1816,7 +1822,7 @@ static int scrub_checksum_super(struct scrub_block *sblock)
if (sblock->pagev[0]->generation != btrfs_super_generation(s))
++fail_gen;
- if (memcmp(s->fsid, fs_info->fsid, BTRFS_UUID_SIZE))
+ if (!scrub_check_fsid(s->fsid, sblock->pagev[0]))
++fail_cor;
len = BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE;
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared
2014-07-24 13:45 ` David Sterba
@ 2014-09-03 6:59 ` Miao Xie
0 siblings, 0 replies; 23+ messages in thread
From: Miao Xie @ 2014-09-03 6:59 UTC (permalink / raw)
To: linux-btrfs
The io error might happen during writing out the device stats, and the
device stats information and dirty flag would be update at that time,
but the current code didn't consider this case, just clear the dirty
flag, it would cause that we forgot to write out the new device stats
information. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v2:
- Change the variant name and make some cleanup by David's comment
---
fs/btrfs/volumes.c | 8 ++++++--
fs/btrfs/volumes.h | 16 ++++++++++++----
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19188df..4ea73c8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -159,6 +159,7 @@ static struct btrfs_device *__alloc_device(void)
spin_lock_init(&dev->reada_lock);
atomic_set(&dev->reada_in_flight, 0);
+ atomic_set(&dev->dev_stats_dirty, 0);
INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
@@ -6398,16 +6399,19 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle *trans,
struct btrfs_root *dev_root = fs_info->dev_root;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_device *device;
+ int dirtied;
int ret = 0;
mutex_lock(&fs_devices->device_list_mutex);
list_for_each_entry(device, &fs_devices->devices, dev_list) {
- if (!device->dev_stats_valid || !device->dev_stats_dirty)
+ dirtied = atomic_read(&device->dev_stats_dirty);
+
+ if (!device->dev_stats_valid || !dirtied)
continue;
ret = update_dev_stat_item(trans, dev_root, device);
if (!ret)
- device->dev_stats_dirty = 0;
+ atomic_sub(dirtied, &device->dev_stats_dirty);
}
mutex_unlock(&fs_devices->device_list_mutex);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6fcc8ea..9a1eff3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -110,7 +110,8 @@ struct btrfs_device {
/* disk I/O failure stats. For detailed description refer to
* enum btrfs_dev_stat_values in ioctl.h */
int dev_stats_valid;
- int dev_stats_dirty; /* counters need to be written to disk */
+
+ atomic_t dev_stats_dirty; /* counters need to be written to disk */
atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
};
@@ -359,11 +360,18 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root *root,
int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
struct btrfs_root *extent_root,
u64 chunk_offset, u64 chunk_size);
+
+static inline void btrfs_dev_dirty_stat(struct btrfs_device *dev)
+{
+ smp_mb__before_atomic();
+ atomic_inc(&dev->dev_stats_dirty);
+}
+
static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
int index)
{
atomic_inc(dev->dev_stat_values + index);
- dev->dev_stats_dirty = 1;
+ btrfs_dev_dirty_stat(dev);
}
static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
@@ -378,7 +386,7 @@ static inline int btrfs_dev_stat_read_and_reset(struct btrfs_device *dev,
int ret;
ret = atomic_xchg(dev->dev_stat_values + index, 0);
- dev->dev_stats_dirty = 1;
+ btrfs_dev_dirty_stat(dev);
return ret;
}
@@ -386,7 +394,7 @@ static inline void btrfs_dev_stat_set(struct btrfs_device *dev,
int index, unsigned long val)
{
atomic_set(dev->dev_stat_values + index, val);
- dev->dev_stats_dirty = 1;
+ btrfs_dev_dirty_stat(dev);
}
static inline void btrfs_dev_stat_reset(struct btrfs_device *dev,
--
1.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
end of thread, other threads:[~2014-09-03 6:58 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-24 3:37 [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem Miao Xie
2014-07-24 3:37 ` [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub Miao Xie
2014-07-24 13:19 ` David Sterba
2014-07-25 9:39 ` Anand Jain
2014-07-24 3:37 ` [PATCH 03/10] Btrfs: fix wrong fsid check of scrub Miao Xie
2014-07-24 13:24 ` David Sterba
2014-09-03 6:58 ` [PATCH v2 " Miao Xie
2014-07-24 3:37 ` [PATCH 04/10] Btrfs: fix wrong generation check of super block on a seed device Miao Xie
2014-07-24 13:25 ` David Sterba
2014-07-24 3:37 ` [PATCH 05/10] Btrfs: make the device lock and its protected data in the same cacheline Miao Xie
2014-07-24 3:37 ` [PATCH 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared Miao Xie
2014-07-24 13:45 ` David Sterba
2014-09-03 6:59 ` [PATCH v2 " Miao Xie
2014-07-24 3:37 ` [PATCH 07/10] Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_devie Miao Xie
2014-07-24 3:37 ` [PATCH 08/10] Btrfs: Fix wrong device size when we are resizing the device Miao Xie
2014-07-24 3:37 ` [PATCH 09/10] Btrfs: don't consider the missing device when allocating new chunks Miao Xie
2014-07-24 3:37 ` [PATCH 10/10] Btrfs: cleanup unused latest_devid and latest_trans in fs_devices Miao Xie
2014-07-24 13:13 ` [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem David Sterba
2014-07-25 7:56 ` Anand Jain
2014-07-25 12:33 ` [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING Anand Jain
2014-07-30 7:42 ` Miao Xie
2014-07-31 8:45 ` Anand Jain
2014-08-11 9:46 ` Anand Jain
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).