* [PATCH 00/21] [RFC] Btrfs: restriper
@ 2011-08-23 20:01 Ilya Dryomov
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Hello,
This patch series adds an initial implementation of the restriper (a
clever name for a relocation framework that allows selective profile
changing and selective balancing, with some goodies like pausing/resuming
and reporting progress to the user).
Profile changing is global (per-FS) so far; per-subvolume profiles
require some discussion and can be implemented in the future. This is an
RFC, so some features/problems are not yet implemented/resolved. The
current TODO list is as follows:
1) do pause/cancel via trans commit instead of waiting for the current
chunk to be fully relocated
2) fix problems with left-over chunks (the ones we were relocating to
when the crash occurred); this is going to become a bigger problem when
item 1 is done
3) fix remount problems (get rid of deadlocks that occur on remount
while relocating, stop restriper on remounts)
4) issue a discard on removed chunks - 1 GiB+ discards can be a big deal
There are also a couple of problems related to profile changing and
resuming that I'm working on right now. But the basic infrastructure is
all there and is ready for reviewing.
This patchset deprecates Hugo's "Balance management" patch series.
Originally this was supposed to be just a profile changing thing merged
with those patches, but the merge turned out to be a complete rewrite.
The filters part was rewritten to be per-chunk-type and the management
part was thrown away because we now store an item on disk, there is a
difference between pausing and cancelling, locking is different, etc.
I'm happy to integrate any ideas that got dropped as a result of this.
Thanks to Arne who did an early review and Chris for overall guidance.
Any comments/suggestions are appreciated. The series is on top of
3.1-rc3, available at:
git://github.com/idryomov/btrfs-unstable.git restriper-rfc
Thanks,
Ilya
Ilya Dryomov (21):
Btrfs: get rid of *_alloc_profile fields
Btrfs: introduce masks for chunk type and profile
Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
Btrfs: make avail_*_alloc_bits fields dynamic
Btrfs: add basic restriper infrastructure
Btrfs: implement online profile changing
Btrfs: add basic infrastructure for selective balancing
Btrfs: soft profile changing mode (aka soft convert)
Btrfs: profiles filter
Btrfs: usage filter
Btrfs: devid filter
Btrfs: devid subset filter
Btrfs: virtual address space subset filter
Btrfs: save restripe parameters to disk
Btrfs: recover restripe on mount
Btrfs: allow for cancelling restriper
Btrfs: allow for pausing restriper
Btrfs: allow for resuming restriper after it was paused
Btrfs: add skip_restripe mount option
Btrfs: get rid of btrfs_balance() function
Btrfs: add restripe progress reporting
fs/btrfs/ctree.h | 156 ++++++++++-
fs/btrfs/disk-io.c | 15 +-
fs/btrfs/extent-tree.c | 118 ++++++--
fs/btrfs/ioctl.c | 214 ++++++++++++-
fs/btrfs/ioctl.h | 44 +++
fs/btrfs/super.c | 8 +-
fs/btrfs/volumes.c | 780 +++++++++++++++++++++++++++++++++++++++++++++---
fs/btrfs/volumes.h | 57 ++++-
8 files changed, 1304 insertions(+), 88 deletions(-)
--
1.7.5.4
* [PATCH 01/21] Btrfs: get rid of *_alloc_profile fields
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
{data,metadata,system}_alloc_profile fields have been unused for a long
time now. Get rid of them.
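
Why the removal should not change behaviour can be sketched in userspace C. This assumes the removed fields still held the all-ones value assigned in open_ctree() at every use site; the function names old_alloc_bits() and new_alloc_bits() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time illustration: ANDing with all-ones leaves the bits unchanged. */
static_assert((UINT64_C(0x58) & UINT64_MAX) == UINT64_C(0x58),
	      "AND with all-ones is the identity");

/*
 * Old form: available bits masked with an *_alloc_profile field.
 * Assumption: the field is still (u64)-1, as initialised in open_ctree().
 */
uint64_t old_alloc_bits(uint64_t avail_bits, uint64_t alloc_profile)
{
	return avail_bits & alloc_profile;
}

/* New form after this patch: the available bits are used directly. */
uint64_t new_alloc_bits(uint64_t avail_bits)
{
	return avail_bits;
}
```

For any value of bits, assert(old_alloc_bits(bits, UINT64_MAX) == new_alloc_bits(bits)) holds, which is why the mask can be dropped.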
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 3 ---
fs/btrfs/disk-io.c | 3 ---
fs/btrfs/extent-tree.c | 10 ++++------
fs/btrfs/volumes.c | 6 ++----
4 files changed, 6 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 03912c5..dcf2fd7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1095,9 +1095,6 @@ struct btrfs_fs_info {
u64 avail_data_alloc_bits;
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
- u64 data_alloc_profile;
- u64 metadata_alloc_profile;
- u64 system_alloc_profile;
unsigned data_chunk_allocations;
unsigned metadata_ratio;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b3ac6..46d0412 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1997,9 +1997,6 @@ struct btrfs_root *open_ctree(struct super_block *sb,
fs_info->generation = generation;
fs_info->last_trans_committed = generation;
- fs_info->data_alloc_profile = (u64)-1;
- fs_info->metadata_alloc_profile = (u64)-1;
- fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
ret = btrfs_init_space_info(fs_info);
if (ret) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f5be06a..4e1b763 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2998,14 +2998,12 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
{
if (flags & BTRFS_BLOCK_GROUP_DATA)
- flags |= root->fs_info->avail_data_alloc_bits &
- root->fs_info->data_alloc_profile;
+ flags |= root->fs_info->avail_data_alloc_bits;
else if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
- flags |= root->fs_info->avail_system_alloc_bits &
- root->fs_info->system_alloc_profile;
+ flags |= root->fs_info->avail_system_alloc_bits;
else if (flags & BTRFS_BLOCK_GROUP_METADATA)
- flags |= root->fs_info->avail_metadata_alloc_bits &
- root->fs_info->metadata_alloc_profile;
+ flags |= root->fs_info->avail_metadata_alloc_bits;
+
return btrfs_reduce_alloc_profile(root, flags);
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f2a4cc7..ed96275 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2711,8 +2711,7 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans,
return ret;
alloc_profile = BTRFS_BLOCK_GROUP_METADATA |
- (fs_info->metadata_alloc_profile &
- fs_info->avail_metadata_alloc_bits);
+ fs_info->avail_metadata_alloc_bits;
alloc_profile = btrfs_reduce_alloc_profile(root, alloc_profile);
ret = __btrfs_alloc_chunk(trans, extent_root, &map, &chunk_size,
@@ -2722,8 +2721,7 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans,
sys_chunk_offset = chunk_offset + chunk_size;
alloc_profile = BTRFS_BLOCK_GROUP_SYSTEM |
- (fs_info->system_alloc_profile &
- fs_info->avail_system_alloc_bits);
+ fs_info->avail_system_alloc_bits;
alloc_profile = btrfs_reduce_alloc_profile(root, alloc_profile);
ret = __btrfs_alloc_chunk(trans, extent_root, &sys_map,
--
1.7.5.4
* [PATCH 02/21] Btrfs: introduce masks for chunk type and profile
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
A chunk's type and profile are encoded in its u64 flags field. Introduce
masks to easily access them.
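
A minimal userspace sketch of how the two masks split a flags value is given here. Only the RAID10 bit value is visible in this patch's context; the other bit positions are taken from the ctree.h of that era and should be treated as assumptions of the sketch. The helpers chunk_type() and chunk_profile() are hypothetical and are not added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA	(1ULL << 2)
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)

#define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |	\
					 BTRFS_BLOCK_GROUP_SYSTEM |	\
					 BTRFS_BLOCK_GROUP_METADATA)

#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |	\
					 BTRFS_BLOCK_GROUP_RAID1 |	\
					 BTRFS_BLOCK_GROUP_DUP |	\
					 BTRFS_BLOCK_GROUP_RAID10)

/* The two masks must never share a bit, or the split would be ambiguous. */
static_assert((BTRFS_BLOCK_GROUP_TYPE_MASK &
	       BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0,
	      "type and profile bits must not overlap");

/* Hypothetical helper: keep only the data/system/metadata bits. */
uint64_t chunk_type(uint64_t flags)
{
	return flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
}

/* Hypothetical helper: keep only the RAID/DUP bits; 0 means SINGLE. */
uint64_t chunk_profile(uint64_t flags)
{
	return flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
}
```

A metadata chunk with RAID1 redundancy would yield BTRFS_BLOCK_GROUP_METADATA from chunk_type() and BTRFS_BLOCK_GROUP_RAID1 from chunk_profile().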
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 8 ++++++++
fs/btrfs/extent-tree.c | 12 +++---------
fs/btrfs/volumes.c | 11 ++---------
3 files changed, 13 insertions(+), 18 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dcf2fd7..b882c95 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -717,6 +717,14 @@ struct btrfs_csum_item {
#define BTRFS_BLOCK_GROUP_RAID10 (1 << 6)
#define BTRFS_NR_RAID_TYPES 5
+#define BTRFS_BLOCK_GROUP_TYPE_MASK (BTRFS_BLOCK_GROUP_DATA | \
+ BTRFS_BLOCK_GROUP_SYSTEM | \
+ BTRFS_BLOCK_GROUP_METADATA)
+
+#define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
+ BTRFS_BLOCK_GROUP_RAID1 | \
+ BTRFS_BLOCK_GROUP_DUP | \
+ BTRFS_BLOCK_GROUP_RAID10)
struct btrfs_block_group_item {
__le64 used;
__le64 chunk_objectid;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4e1b763..de4c639 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -559,8 +559,7 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
struct list_head *head = &info->space_info;
struct btrfs_space_info *found;
- flags &= BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_SYSTEM |
- BTRFS_BLOCK_GROUP_METADATA;
+ flags &= BTRFS_BLOCK_GROUP_TYPE_MASK;
rcu_read_lock();
list_for_each_entry_rcu(found, head, list) {
@@ -2924,9 +2923,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
INIT_LIST_HEAD(&found->block_groups[i]);
init_rwsem(&found->groups_sem);
spin_lock_init(&found->lock);
- found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
- BTRFS_BLOCK_GROUP_SYSTEM |
- BTRFS_BLOCK_GROUP_METADATA);
+ found->flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
found->total_bytes = total_bytes;
found->disk_total = total_bytes * factor;
found->bytes_used = bytes_used;
@@ -2947,10 +2944,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
{
- u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_DUP);
+ u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
if (extra_flags) {
if (flags & BTRFS_BLOCK_GROUP_DATA)
fs_info->avail_data_alloc_bits |= extra_flags;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ed96275..af4bf56 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2908,12 +2908,8 @@ again:
}
}
if (rw & REQ_DISCARD) {
- if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_DUP |
- BTRFS_BLOCK_GROUP_RAID10)) {
+ if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK)
stripes_required = map->num_stripes;
- }
}
if (multi_ret && (rw & (REQ_WRITE | REQ_DISCARD)) &&
stripes_allocated < stripes_required) {
@@ -2937,10 +2933,7 @@ again:
if (rw & REQ_DISCARD)
*length = min_t(u64, em->len - offset, *length);
- else if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_DUP)) {
+ else if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
/* we limit the length of each bio to what fits in a stripe */
*length = min_t(u64, em->len - offset,
map->stripe_len - stripe_offset);
--
1.7.5.4
* [PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Right now the on-disk BTRFS_BLOCK_GROUP_* profile bits are used for the
avail_{data,metadata,system}_alloc_bits fields, which tell us which
allocation profiles are available in the fs. When a chunk is created,
its profile is OR'ed into the respective avail_alloc_bits field.
Since SINGLE is denoted by 0 in the on-disk format, there is currently
no way to tell when such chunks become available. Restriper needs that
information, so add a separate bit for the SINGLE profile.
This bit is going to be in-memory only; it should never be written out
to disk, so it's not a disk format change. However, to avoid remappings
in the future, reserve the corresponding on-disk bit.
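
The two translations can be sketched in userspace C as follows. The profile bit values other than RAID10 come from the ctree.h of that era and are assumptions here; the helper names profile_to_avail_bits() and avail_bits_to_disk() are hypothetical and only mirror the "on-disk -> in-memory" and "in-memory -> on-disk" steps of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |	\
					 BTRFS_BLOCK_GROUP_RAID1 |	\
					 BTRFS_BLOCK_GROUP_DUP |	\
					 BTRFS_BLOCK_GROUP_RAID10)

/* In-memory only: marks that SINGLE chunks exist. */
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 7)

static_assert((BTRFS_AVAIL_ALLOC_BIT_SINGLE &
	       BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0,
	      "SINGLE bit must not collide with an on-disk profile bit");

/* on-disk -> in-memory: an empty profile field means SINGLE. */
uint64_t profile_to_avail_bits(uint64_t flags)
{
	uint64_t extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;

	return extra_flags ? extra_flags : BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}

/* in-memory -> on-disk: the SINGLE bit must never reach the disk. */
uint64_t avail_bits_to_disk(uint64_t flags)
{
	return flags & ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}
```

With this mapping a plain data chunk (no profile bits set) contributes BTRFS_AVAIL_ALLOC_BIT_SINGLE to the in-memory field, and stripping the bit again gives back flags that are valid on disk.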
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 12 ++++++++++++
fs/btrfs/extent-tree.c | 22 ++++++++++++++--------
2 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b882c95..5b00eb8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -725,6 +725,17 @@ struct btrfs_csum_item {
BTRFS_BLOCK_GROUP_RAID1 | \
BTRFS_BLOCK_GROUP_DUP | \
BTRFS_BLOCK_GROUP_RAID10)
+/*
+ * We need a bit for restriper to be able to tell when chunks of type
+ * SINGLE are available. It is used in avail_*_alloc_bits.
+ */
+#define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7)
+
+/*
+ * To avoid troubles or remappings, reserve on-disk bit.
+ */
+#define BTRFS_BLOCK_GROUP_RESERVED (1 << 7)
+
struct btrfs_block_group_item {
__le64 used;
__le64 chunk_objectid;
@@ -1100,6 +1111,7 @@ struct btrfs_fs_info {
spinlock_t ref_cache_lock;
u64 total_ref_cache_size;
+ /* SINGLE has its own bit for these three */
u64 avail_data_alloc_bits;
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de4c639..ed35eb5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2945,14 +2945,17 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
{
u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
- if (extra_flags) {
- if (flags & BTRFS_BLOCK_GROUP_DATA)
- fs_info->avail_data_alloc_bits |= extra_flags;
- if (flags & BTRFS_BLOCK_GROUP_METADATA)
- fs_info->avail_metadata_alloc_bits |= extra_flags;
- if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
- fs_info->avail_system_alloc_bits |= extra_flags;
- }
+
+ /* on-disk -> in-memory */
+ if (extra_flags == 0)
+ extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+ if (flags & BTRFS_BLOCK_GROUP_DATA)
+ fs_info->avail_data_alloc_bits |= extra_flags;
+ if (flags & BTRFS_BLOCK_GROUP_METADATA)
+ fs_info->avail_metadata_alloc_bits |= extra_flags;
+ if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ fs_info->avail_system_alloc_bits |= extra_flags;
}
u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
@@ -2986,6 +2989,9 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
(flags & BTRFS_BLOCK_GROUP_RAID10) |
(flags & BTRFS_BLOCK_GROUP_DUP)))
flags &= ~BTRFS_BLOCK_GROUP_RAID0;
+
+ /* in-memory -> on-disk */
+ flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
return flags;
}
--
1.7.5.4
* [PATCH 04/21] Btrfs: make avail_*_alloc_bits fields dynamic
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Currently, when new chunks are created, the respective avail_alloc_bits
field is updated to reflect the profiles of all chunks present in the
system. However, when chunks are removed, the corresponding profile bits
are never cleared.
This patch clears the corresponding bit of the avail_alloc_bits field
when the last chunk of that type goes away. Restriper needs this to
operate properly when "downgrading".
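
The set/clear behaviour can be modelled with a toy userspace sketch. The patch itself detects the last chunk with list_empty() on the per-profile block group list; the sketch swaps in a simple per-profile counter instead. struct toy_space_info and the toy_* functions are hypothetical, model a single chunk type only, and assume each chunk carries at most one profile bit.

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |	\
					 BTRFS_BLOCK_GROUP_RAID1 |	\
					 BTRFS_BLOCK_GROUP_DUP |	\
					 BTRFS_BLOCK_GROUP_RAID10)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 7)

/* Hypothetical stand-in for one space_info plus its avail_*_alloc_bits. */
struct toy_space_info {
	unsigned int nr_chunks[8];	/* indexed by profile bit number, 3..7 */
	uint64_t avail_alloc_bits;
};

/* on-disk -> in-memory, as in set/clear_avail_alloc_bits() */
uint64_t toy_avail_bit(uint64_t flags)
{
	uint64_t extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;

	return extra_flags ? extra_flags : BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}

/* Bit number of a single set bit, e.g. (1 << 4) -> 4. */
int toy_bit_index(uint64_t bit)
{
	int i = 0;

	while (bit >>= 1)
		i++;
	return i;
}

void toy_add_chunk(struct toy_space_info *si, uint64_t flags)
{
	uint64_t bit = toy_avail_bit(flags);

	si->nr_chunks[toy_bit_index(bit)]++;
	si->avail_alloc_bits |= bit;
}

void toy_remove_chunk(struct toy_space_info *si, uint64_t flags)
{
	uint64_t bit = toy_avail_bit(flags);
	int index = toy_bit_index(bit);

	assert(si->nr_chunks[index] > 0);
	/* clear the bit only when the last chunk of this profile goes away */
	if (--si->nr_chunks[index] == 0)
		si->avail_alloc_bits &= ~bit;
}
```

Removing one of two RAID1 chunks leaves the RAID1 bit set; removing the second clears it, which is the information restriper relies on when converting to a less redundant profile.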
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/extent-tree.c | 20 ++++++++++++++++++++
1 files changed, 20 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ed35eb5..a04f99b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7197,6 +7197,22 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
return 0;
}
+static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
+{
+ u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+ /* on-disk -> in-memory */
+ if (extra_flags == 0)
+ extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+ if (flags & BTRFS_BLOCK_GROUP_DATA)
+ fs_info->avail_data_alloc_bits &= ~extra_flags;
+ if (flags & BTRFS_BLOCK_GROUP_METADATA)
+ fs_info->avail_metadata_alloc_bits &= ~extra_flags;
+ if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ fs_info->avail_system_alloc_bits &= ~extra_flags;
+}
+
int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 group_start)
{
@@ -7207,6 +7223,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
struct btrfs_key key;
struct inode *inode;
int ret;
+ int index;
int factor;
root = root->fs_info->extent_root;
@@ -7222,6 +7239,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
free_excluded_extents(root, block_group);
memcpy(&key, &block_group->key, sizeof(key));
+ index = get_block_group_index(block_group);
if (block_group->flags & (BTRFS_BLOCK_GROUP_DUP |
BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_RAID10))
@@ -7296,6 +7314,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
* are still on the list after taking the semaphore
*/
list_del_init(&block_group->list);
+ if (list_empty(&block_group->space_info->block_groups[index]))
+ clear_avail_alloc_bits(root->fs_info, block_group->flags);
up_write(&block_group->space_info->groups_sem);
if (block_group->cached == BTRFS_CACHE_STARTED)
--
1.7.5.4
* [PATCH 05/21] Btrfs: add basic restriper infrastructure
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Add basic restriper infrastructure: an ioctl to start a restripe, all
restripe ioctl data structures, and a data structure in fs_info for
tracking restriper's state. Duplicate the balancing code for restriper;
btrfs_balance() will be removed when restriper is implemented.
Explicitly disallow any volume operations while restriper is running.
(Previously this restriction relied on volume_mutex being held during
the execution of any volume operation.)
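
As a rough userspace illustration of the ioctl argument layout, the sketch below mirrors the structures from the ioctl.h hunk of this patch with uint64_t in place of __u64 and checks the "pad to 1k" comment at compile time. The helpers restripe_request_init() and restripe_request_convert() are hypothetical, the RAID1 bit value is taken from ctree.h of that era, and no ioctl is actually issued; how userspace tools will fill these fields is not specified by this RFC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_RESTRIPE_ARGS_CONVERT	(1ULL << 8)

/* Per-chunk-type filter and conversion arguments (copied from the patch). */
struct btrfs_restripe_args {
	uint64_t profiles;
	uint64_t usage;
	uint64_t devid;
	uint64_t pstart;
	uint64_t pend;
	uint64_t vstart;
	uint64_t vend;

	uint64_t target;

	uint64_t flags;

	uint64_t unused[8];
} __attribute__ ((__packed__));

struct btrfs_restripe_progress {
	uint64_t expected;
	uint64_t considered;
	uint64_t completed;
};

struct btrfs_ioctl_restripe_args {
	uint64_t flags;
	uint64_t state;

	struct btrfs_restripe_args data;
	struct btrfs_restripe_args sys;
	struct btrfs_restripe_args meta;

	struct btrfs_restripe_progress stat;

	uint64_t unused[72];	/* pad to 1k */
};

/* 16 + 3 * 136 + 24 + 576 bytes */
static_assert(sizeof(struct btrfs_ioctl_restripe_args) == 1024,
	      "ioctl argument must be exactly 1k");

/* Hypothetical helper: start from an all-zero request. */
void restripe_request_init(struct btrfs_ioctl_restripe_args *rargs)
{
	memset(rargs, 0, sizeof(*rargs));
}

/* Hypothetical helper: ask for conversion of one chunk type to 'target'. */
void restripe_request_convert(struct btrfs_restripe_args *args,
			      uint64_t target)
{
	args->target = target;
	args->flags |= BTRFS_RESTRIPE_ARGS_CONVERT;
}
```

A caller would initialise the request, call restripe_request_convert(&rargs.meta, BTRFS_BLOCK_GROUP_RAID1) to ask for RAID1 metadata, and then pass the structure to BTRFS_IOC_RESTRIPE; the kernel side copies data, sys and meta into restripe_control as shown in btrfs_ioctl_restripe().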
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 5 +
fs/btrfs/disk-io.c | 4 +
fs/btrfs/ioctl.c | 107 ++++++++++++++++++++++----
fs/btrfs/ioctl.h | 37 +++++++++
fs/btrfs/volumes.c | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/btrfs/volumes.h | 18 ++++
6 files changed, 369 insertions(+), 21 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5b00eb8..65d7562 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -895,6 +895,7 @@ struct btrfs_block_group_cache {
};
struct reloc_control;
+struct restripe_control;
struct btrfs_device;
struct btrfs_fs_devices;
struct btrfs_delayed_root;
@@ -1116,6 +1117,10 @@ struct btrfs_fs_info {
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
+ spinlock_t restripe_lock;
+ struct mutex restripe_mutex;
+ struct restripe_control *restripe_ctl;
+
unsigned data_chunk_allocations;
unsigned metadata_ratio;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 46d0412..fa2301b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1700,6 +1700,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
init_rwsem(&fs_info->scrub_super_lock);
fs_info->scrub_workers_refcnt = 0;
+ spin_lock_init(&fs_info->restripe_lock);
+ mutex_init(&fs_info->restripe_mutex);
+ fs_info->restripe_ctl = NULL;
+
sb->s_blocksize = 4096;
sb->s_blocksize_bits = blksize_bits(4096);
sb->s_bdi = &fs_info->bdi;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 970977a..9dfc686 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1165,13 +1165,21 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
+ mutex_lock(&root->fs_info->volume_mutex);
+ if (root->fs_info->restripe_ctl) {
+ printk(KERN_INFO "btrfs: restripe in progress\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
vol_args = memdup_user(arg, sizeof(*vol_args));
- if (IS_ERR(vol_args))
- return PTR_ERR(vol_args);
+ if (IS_ERR(vol_args)) {
+ ret = PTR_ERR(vol_args);
+ goto out;
+ }
vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
- mutex_lock(&root->fs_info->volume_mutex);
sizestr = vol_args->name;
devstr = strchr(sizestr, ':');
if (devstr) {
@@ -1188,7 +1196,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
printk(KERN_INFO "resizer unable to find device %llu\n",
(unsigned long long)devid);
ret = -EINVAL;
- goto out_unlock;
+ goto out_free;
}
if (!strcmp(sizestr, "max"))
new_size = device->bdev->bd_inode->i_size;
@@ -1203,7 +1211,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
new_size = memparse(sizestr, NULL);
if (new_size == 0) {
ret = -EINVAL;
- goto out_unlock;
+ goto out_free;
}
}
@@ -1212,7 +1220,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
if (mod < 0) {
if (new_size > old_size) {
ret = -EINVAL;
- goto out_unlock;
+ goto out_free;
}
new_size = old_size - new_size;
} else if (mod > 0) {
@@ -1221,11 +1229,11 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
if (new_size < 256 * 1024 * 1024) {
ret = -EINVAL;
- goto out_unlock;
+ goto out_free;
}
if (new_size > device->bdev->bd_inode->i_size) {
ret = -EFBIG;
- goto out_unlock;
+ goto out_free;
}
do_div(new_size, root->sectorsize);
@@ -1238,7 +1246,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
- goto out_unlock;
+ goto out_free;
}
ret = btrfs_grow_device(trans, device, new_size);
btrfs_commit_transaction(trans, root);
@@ -1246,9 +1254,10 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
ret = btrfs_shrink_device(device, new_size);
}
-out_unlock:
- mutex_unlock(&root->fs_info->volume_mutex);
+out_free:
kfree(vol_args);
+out:
+ mutex_unlock(&root->fs_info->volume_mutex);
return ret;
}
@@ -2014,14 +2023,25 @@ static long btrfs_ioctl_add_dev(struct btrfs_root *root, void __user *arg)
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
+ mutex_lock(&root->fs_info->volume_mutex);
+ if (root->fs_info->restripe_ctl) {
+ printk(KERN_INFO "btrfs: restripe in progress\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
vol_args = memdup_user(arg, sizeof(*vol_args));
- if (IS_ERR(vol_args))
- return PTR_ERR(vol_args);
+ if (IS_ERR(vol_args)) {
+ ret = PTR_ERR(vol_args);
+ goto out;
+ }
vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
ret = btrfs_init_new_device(root, vol_args->name);
kfree(vol_args);
+out:
+ mutex_unlock(&root->fs_info->volume_mutex);
return ret;
}
@@ -2036,14 +2056,25 @@ static long btrfs_ioctl_rm_dev(struct btrfs_root *root, void __user *arg)
if (root->fs_info->sb->s_flags & MS_RDONLY)
return -EROFS;
+ mutex_lock(&root->fs_info->volume_mutex);
+ if (root->fs_info->restripe_ctl) {
+ printk(KERN_INFO "btrfs: restripe in progress\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
vol_args = memdup_user(arg, sizeof(*vol_args));
- if (IS_ERR(vol_args))
- return PTR_ERR(vol_args);
+ if (IS_ERR(vol_args)) {
+ ret = PTR_ERR(vol_args);
+ goto out;
+ }
vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
ret = btrfs_rm_device(root, vol_args->name);
kfree(vol_args);
+out:
+ mutex_unlock(&root->fs_info->volume_mutex);
return ret;
}
@@ -2833,6 +2864,50 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
return ret;
}
+static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
+{
+ struct btrfs_ioctl_restripe_args *rargs;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct restripe_control *rctl;
+ int ret;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (fs_info->sb->s_flags & MS_RDONLY)
+ return -EROFS;
+
+ mutex_lock(&fs_info->restripe_mutex);
+
+ rargs = memdup_user(arg, sizeof(*rargs));
+ if (IS_ERR(rargs)) {
+ ret = PTR_ERR(rargs);
+ goto out;
+ }
+
+ rctl = kzalloc(sizeof(*rctl), GFP_NOFS);
+ if (!rctl) {
+ kfree(rargs);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ rctl->fs_info = fs_info;
+ rctl->flags = rargs->flags;
+
+ memcpy(&rctl->data, &rargs->data, sizeof(rctl->data));
+ memcpy(&rctl->meta, &rargs->meta, sizeof(rctl->meta));
+ memcpy(&rctl->sys, &rargs->sys, sizeof(rctl->sys));
+
+ ret = btrfs_restripe(rctl);
+
+ /* rctl freed in unset_restripe_control */
+ kfree(rargs);
+out:
+ mutex_unlock(&fs_info->restripe_mutex);
+ return ret;
+}
+
long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
{
@@ -2905,6 +2980,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_scrub_cancel(root, argp);
case BTRFS_IOC_SCRUB_PROGRESS:
return btrfs_ioctl_scrub_progress(root, argp);
+ case BTRFS_IOC_RESTRIPE:
+ return btrfs_ioctl_restripe(root, argp);
}
return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index ad1ea78..798f1d4 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -109,6 +109,41 @@ struct btrfs_ioctl_fs_info_args {
__u64 reserved[124]; /* pad to 1k */
};
+struct btrfs_restripe_args {
+ __u64 profiles;
+ __u64 usage;
+ __u64 devid;
+ __u64 pstart;
+ __u64 pend;
+ __u64 vstart;
+ __u64 vend;
+
+ __u64 target;
+
+ __u64 flags;
+
+ __u64 unused[8];
+} __attribute__ ((__packed__));
+
+struct btrfs_restripe_progress {
+ __u64 expected;
+ __u64 considered;
+ __u64 completed;
+};
+
+struct btrfs_ioctl_restripe_args {
+ __u64 flags;
+ __u64 state;
+
+ struct btrfs_restripe_args data;
+ struct btrfs_restripe_args sys;
+ struct btrfs_restripe_args meta;
+
+ struct btrfs_restripe_progress stat;
+
+ __u64 unused[72]; /* pad to 1k */
+};
+
#define BTRFS_INO_LOOKUP_PATH_MAX 4080
struct btrfs_ioctl_ino_lookup_args {
__u64 treeid;
@@ -248,4 +283,6 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_dev_info_args)
#define BTRFS_IOC_FS_INFO _IOR(BTRFS_IOCTL_MAGIC, 31, \
struct btrfs_ioctl_fs_info_args)
+#define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \
+ struct btrfs_ioctl_restripe_args)
#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index af4bf56..0e4a276 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1262,7 +1262,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
bool clear_super = false;
mutex_lock(&uuid_mutex);
- mutex_lock(&root->fs_info->volume_mutex);
all_avail = root->fs_info->avail_data_alloc_bits |
root->fs_info->avail_system_alloc_bits |
@@ -1427,7 +1426,6 @@ error_close:
if (bdev)
blkdev_put(bdev, FMODE_READ | FMODE_EXCL);
out:
- mutex_unlock(&root->fs_info->volume_mutex);
mutex_unlock(&uuid_mutex);
return ret;
error_undo:
@@ -1604,7 +1602,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
}
filemap_write_and_wait(bdev->bd_inode->i_mapping);
- mutex_lock(&root->fs_info->volume_mutex);
devices = &root->fs_info->fs_devices->devices;
/*
@@ -1728,8 +1725,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
ret = btrfs_relocate_sys_chunks(root);
BUG_ON(ret);
}
-out:
- mutex_unlock(&root->fs_info->volume_mutex);
+
return ret;
error:
blkdev_put(bdev, FMODE_EXCL);
@@ -1737,7 +1733,7 @@ error:
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
}
- goto out;
+ return ret;
}
static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
@@ -2155,6 +2151,217 @@ error:
}
/*
+ * Should be called with both restripe and volume mutexes held to
+ * serialize other volume operations (add_dev/rm_dev/resize) wrt
+ * restriper. Same goes for unset_restripe_control().
+ */
+static void set_restripe_control(struct restripe_control *rctl)
+{
+ struct btrfs_fs_info *fs_info = rctl->fs_info;
+
+ spin_lock(&fs_info->restripe_lock);
+ fs_info->restripe_ctl = rctl;
+ spin_unlock(&fs_info->restripe_lock);
+}
+
+static void unset_restripe_control(struct btrfs_fs_info *fs_info)
+{
+ struct restripe_control *rctl = fs_info->restripe_ctl;
+
+ spin_lock(&fs_info->restripe_lock);
+ fs_info->restripe_ctl = NULL;
+ spin_unlock(&fs_info->restripe_lock);
+
+ kfree(rctl);
+}
+
+static int __btrfs_restripe(struct btrfs_root *dev_root)
+{
+ struct list_head *devices;
+ struct btrfs_device *device;
+ u64 old_size;
+ u64 size_to_free;
+ struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
+ struct btrfs_path *path;
+ struct btrfs_key key;
+ struct btrfs_key found_key;
+ struct btrfs_trans_handle *trans;
+ int ret;
+ int enospc_errors = 0;
+
+ /* step one make some room on all the devices */
+ devices = &dev_root->fs_info->fs_devices->devices;
+ list_for_each_entry(device, devices, dev_list) {
+ old_size = device->total_bytes;
+ size_to_free = div_factor(old_size, 1);
+ size_to_free = min(size_to_free, (u64)1 * 1024 * 1024);
+ if (!device->writeable ||
+ device->total_bytes - device->bytes_used > size_to_free)
+ continue;
+
+ ret = btrfs_shrink_device(device, old_size - size_to_free);
+ if (ret == -ENOSPC)
+ break;
+ BUG_ON(ret);
+
+ trans = btrfs_start_transaction(dev_root, 0);
+ BUG_ON(IS_ERR(trans));
+
+ ret = btrfs_grow_device(trans, device, old_size);
+ BUG_ON(ret);
+
+ btrfs_end_transaction(trans, dev_root);
+ }
+
+ /* step two, relocate all the chunks */
+ path = btrfs_alloc_path();
+ if (!path) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+ key.offset = (u64)-1;
+ key.type = BTRFS_CHUNK_ITEM_KEY;
+
+ while (1) {
+ ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
+ if (ret < 0)
+ goto error;
+
+ /*
+ * this shouldn't happen, it means the last relocate
+ * failed
+ */
+ if (ret == 0)
+ BUG_ON(1); /* DIS - break ? */
+
+ ret = btrfs_previous_item(chunk_root, path, 0,
+ BTRFS_CHUNK_ITEM_KEY);
+ if (ret)
+ BUG_ON(1); /* DIS - break ? */
+
+ btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+ path->slots[0]);
+ if (found_key.objectid != key.objectid)
+ break;
+
+ /* chunk zero is special */
+ if (found_key.offset == 0)
+ break;
+
+ btrfs_release_path(path);
+ ret = btrfs_relocate_chunk(chunk_root,
+ chunk_root->root_key.objectid,
+ found_key.objectid,
+ found_key.offset);
+ if (ret && ret != -ENOSPC)
+ goto error;
+ if (ret == -ENOSPC)
+ enospc_errors++;
+ key.offset = found_key.offset - 1;
+ }
+
+error:
+ btrfs_free_path(path);
+ if (enospc_errors) {
+ printk(KERN_INFO "btrfs: restripe finished with %d enospc "
+ "error(s)\n", enospc_errors);
+ ret = -ENOSPC;
+ }
+
+ return ret;
+}
+
+/*
+ * Should be called with restripe_mutex held
+ */
+int btrfs_restripe(struct restripe_control *rctl)
+{
+ struct btrfs_fs_info *fs_info = rctl->fs_info;
+ u64 allowed;
+ int ret;
+
+ mutex_lock(&fs_info->volume_mutex);
+
+ /*
+ * Profile changing sanity checks
+ */
+ allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+ if (fs_info->fs_devices->num_devices == 1)
+ allowed |= BTRFS_BLOCK_GROUP_DUP;
+ else if (fs_info->fs_devices->num_devices < 4)
+ allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
+ else
+ allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
+ BTRFS_BLOCK_GROUP_RAID10);
+
+ if (rctl->data.target & ~allowed) {
+ printk(KERN_ERR "btrfs: unable to start restripe with target "
+ "data profile %llu\n",
+ (unsigned long long)rctl->data.target);
+ ret = -EINVAL;
+ goto out;
+ }
+ if (rctl->sys.target & ~allowed) {
+ printk(KERN_ERR "btrfs: unable to start restripe with target "
+ "system profile %llu\n",
+ (unsigned long long)rctl->sys.target);
+ ret = -EINVAL;
+ goto out;
+ }
+ if (rctl->meta.target & ~allowed) {
+ printk(KERN_ERR "btrfs: unable to start restripe with target "
+ "metadata profile %llu\n",
+ (unsigned long long)rctl->meta.target);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
+ printk(KERN_ERR "btrfs: dup for data is not allowed\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* allow to reduce meta or sys integrity only if force set */
+ allowed = BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 |
+ BTRFS_BLOCK_GROUP_RAID10;
+ if (((rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+ (fs_info->avail_system_alloc_bits & allowed) &&
+ !(rctl->sys.target & allowed)) ||
+ ((rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+ (fs_info->avail_metadata_alloc_bits & allowed) &&
+ !(rctl->meta.target & allowed))) {
+ if (rctl->flags & BTRFS_RESTRIPE_FORCE) {
+ printk(KERN_INFO "btrfs: force reducing metadata "
+ "integrity\n");
+ } else {
+ printk(KERN_ERR "btrfs: can't reduce metadata "
+ "integrity\n");
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ set_restripe_control(rctl);
+ mutex_unlock(&fs_info->volume_mutex);
+
+ ret = __btrfs_restripe(fs_info->dev_root);
+
+ mutex_lock(&fs_info->volume_mutex);
+ unset_restripe_control(fs_info);
+ mutex_unlock(&fs_info->volume_mutex);
+
+ return ret;
+
+out:
+ mutex_unlock(&fs_info->volume_mutex);
+ kfree(rctl);
+ return ret;
+}
+
+/*
* shrinking a device means finding all of the device extents past
* the new size, and then following the back refs to the chunks.
* The chunk relocation code actually frees the device extent
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6d866db..8804c5c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -168,6 +168,23 @@ struct map_lookup {
#define map_lookup_size(n) (sizeof(struct map_lookup) + \
(sizeof(struct btrfs_bio_stripe) * (n)))
+#define BTRFS_RESTRIPE_FORCE (1ULL << 3)
+
+/*
+ * Profile changing flags
+ */
+#define BTRFS_RESTRIPE_ARGS_CONVERT (1ULL << 8)
+
+struct btrfs_restripe_args;
+struct restripe_control {
+ struct btrfs_fs_info *fs_info;
+ u64 flags;
+
+ struct btrfs_restripe_args data;
+ struct btrfs_restripe_args sys;
+ struct btrfs_restripe_args meta;
+};
+
int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
u64 end, u64 *length);
@@ -211,6 +228,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
int btrfs_init_new_device(struct btrfs_root *root, char *path);
int btrfs_balance(struct btrfs_root *dev_root);
+int btrfs_restripe(struct restripe_control *rctl);
int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
int find_free_dev_extent(struct btrfs_trans_handle *trans,
struct btrfs_device *device, u64 num_bytes,
--
1.7.5.4
* [PATCH 06/21] Btrfs: implement online profile changing
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (4 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 05/21] Btrfs: add basic restriper infrastructure Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:01 ` [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing Ilya Dryomov
` (17 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Profile changing is done by initializing the target field in the
respective btrfs_restripe_args structs and launching a balance. In this
mode the reducing code picks the restriper's target profile, if it is
available, instead of doing a blind reduce. If the target profile is not
yet available, it falls back to plain reducing.
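As a rough userspace illustration of the selection described above (not
the kernel code itself), the sketch below models only the data case: when
a convert is in progress and the target profile is already among the
available bits, the target wins; a return value of 0 means "fall back to
plain reducing". The model_ names and the struct are made up for this
example; the bit values mirror the ones used in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel flag bits. */
#define MODEL_BG_DATA       (1ULL << 0)
#define MODEL_BG_RAID0      (1ULL << 3)
#define MODEL_BG_RAID1      (1ULL << 4)
#define MODEL_AVAIL_SINGLE  (1ULL << 7)
#define MODEL_ARGS_CONVERT  (1ULL << 8)

/* Hypothetical, cut-down version of the per-type restripe arguments. */
struct model_args {
	uint64_t flags;   /* MODEL_ARGS_* */
	uint64_t target;  /* profile to convert to, in-memory format */
};

/*
 * Returns the data profile to allocate with, or 0 when the caller
 * should fall back to plain reducing.
 */
uint64_t model_pick_data_target(uint64_t flags, const struct model_args *data)
{
	assert(data);

	if ((flags & MODEL_BG_DATA) &&
	    (data->flags & MODEL_ARGS_CONVERT) &&
	    (flags & data->target)) {
		uint64_t t = MODEL_BG_DATA | data->target;

		/* in-memory -> on-disk: SINGLE has no on-disk bit */
		return t & ~MODEL_AVAIL_SINGLE;
	}

	return 0;
}
```

With a RAID1 target, a flags value that already contains the RAID1 bit
yields DATA|RAID1, while a flags value with only RAID0 available yields 0.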
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/extent-tree.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 53 insertions(+), 1 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a04f99b..05e55d1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2968,6 +2968,34 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
u64 num_devices = root->fs_info->fs_devices->rw_devices +
root->fs_info->fs_devices->missing_devices;
+ /* pick restriper's target profile if it's available */
+ spin_lock(&root->fs_info->restripe_lock);
+ if (root->fs_info->restripe_ctl) {
+ struct restripe_control *rctl = root->fs_info->restripe_ctl;
+ u64 t = 0;
+
+ if ((flags & BTRFS_BLOCK_GROUP_DATA) &&
+ (rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+ (flags & rctl->data.target)) {
+ t = BTRFS_BLOCK_GROUP_DATA | rctl->data.target;
+ } else if ((flags & BTRFS_BLOCK_GROUP_SYSTEM) &&
+ (rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+ (flags & rctl->sys.target)) {
+ t = BTRFS_BLOCK_GROUP_SYSTEM | rctl->sys.target;
+ } else if ((flags & BTRFS_BLOCK_GROUP_METADATA) &&
+ (rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+ (flags & rctl->meta.target)) {
+ t = BTRFS_BLOCK_GROUP_METADATA | rctl->meta.target;
+ }
+
+ if (t) {
+ spin_unlock(&root->fs_info->restripe_lock);
+ t &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+ return t;
+ }
+ }
+ spin_unlock(&root->fs_info->restripe_lock);
+
if (num_devices == 1)
flags &= ~(BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID0);
if (num_devices < 4)
@@ -2987,8 +3015,9 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
if ((flags & BTRFS_BLOCK_GROUP_RAID0) &&
((flags & BTRFS_BLOCK_GROUP_RAID1) |
(flags & BTRFS_BLOCK_GROUP_RAID10) |
- (flags & BTRFS_BLOCK_GROUP_DUP)))
+ (flags & BTRFS_BLOCK_GROUP_DUP))) {
flags &= ~BTRFS_BLOCK_GROUP_RAID0;
+ }
/* in-memory -> on-disk */
flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
@@ -6519,6 +6548,29 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
u64 stripped = BTRFS_BLOCK_GROUP_RAID0 |
BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10;
+ if (root->fs_info->restripe_ctl) {
+ struct restripe_control *rctl = root->fs_info->restripe_ctl;
+ u64 t = 0;
+
+ /* pick restriper's target profile and return */
+ if (flags & BTRFS_BLOCK_GROUP_DATA &&
+ rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT) {
+ t = BTRFS_BLOCK_GROUP_DATA | rctl->data.target;
+ } else if (flags & BTRFS_BLOCK_GROUP_SYSTEM &&
+ rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) {
+ t = BTRFS_BLOCK_GROUP_SYSTEM | rctl->sys.target;
+ } else if (flags & BTRFS_BLOCK_GROUP_METADATA &&
+ rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) {
+ t = BTRFS_BLOCK_GROUP_METADATA | rctl->meta.target;
+ }
+
+ if (t) {
+ /* in-memory -> on-disk */
+ t &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+ return t;
+ }
+ }
+
/*
* we add in the count of missing devices because we want
* to make sure that any RAID levels on a degraded FS
--
1.7.5.4
* [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (5 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 06/21] Btrfs: implement online profile changing Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-09-27 13:02 ` David Sterba
2011-08-23 20:01 ` [PATCH 08/21] Btrfs: soft profile changing mode (aka soft convert) Ilya Dryomov
` (16 subsequent siblings)
23 siblings, 1 reply; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
This allows a separate set of filters for each chunk type
(data, meta, sys). The code, however, is generic and the switch on chunk
type is done only once.
This commit also adds a type filter: it allows balancing, for example,
meta and system chunks without touching data ones.
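The type filter boils down to one mask test, because the RESTRIPE type
bits deliberately share values with the chunk type bits. A minimal
userspace sketch of that test follows; the model_ names are hypothetical
and the helper returns 1 for "selected", which is only a convention of
this example.

```c
#include <assert.h>
#include <stdint.h>

/* Chunk type bits and the restriper "type" filter bits share values. */
#define MODEL_TYPE_DATA      (1ULL << 0)
#define MODEL_TYPE_SYSTEM    (1ULL << 1)
#define MODEL_TYPE_METADATA  (1ULL << 2)
#define MODEL_TYPE_MASK      (MODEL_TYPE_DATA | MODEL_TYPE_SYSTEM | \
			      MODEL_TYPE_METADATA)

/* Returns 1 if a chunk of this type passes the type filter, 0 otherwise. */
int model_type_selected(uint64_t chunk_type, uint64_t restripe_flags)
{
	/* a chunk always carries at least one type bit */
	assert(chunk_type & MODEL_TYPE_MASK);

	return ((chunk_type & MODEL_TYPE_MASK) &
		(restripe_flags & MODEL_TYPE_MASK)) != 0;
}
```

Passing MODEL_TYPE_SYSTEM | MODEL_TYPE_METADATA as the flags selects meta
and system chunks and leaves data chunks alone, whatever profile bits the
chunk type also carries.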
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/volumes.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++--
fs/btrfs/volumes.h | 12 +++++++++
2 files changed, 76 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e4a276..95c6310 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2175,6 +2175,30 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
kfree(rctl);
}
+static int should_restripe_chunk(struct btrfs_root *root,
+ struct extent_buffer *leaf,
+ struct btrfs_chunk *chunk, u64 chunk_offset)
+{
+ struct restripe_control *rctl = root->fs_info->restripe_ctl;
+ u64 chunk_type = btrfs_chunk_type(leaf, chunk);
+ struct btrfs_restripe_args *rargs = NULL;
+
+ /* type filter */
+ if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
+ (rctl->flags & BTRFS_RESTRIPE_TYPE_MASK))) {
+ return 0;
+ }
+
+ if (chunk_type & BTRFS_BLOCK_GROUP_DATA)
+ rargs = &rctl->data;
+ else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
+ rargs = &rctl->sys;
+ else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
+ rargs = &rctl->meta;
+
+ return 1;
+}
+
static int __btrfs_restripe(struct btrfs_root *dev_root)
{
struct list_head *devices;
@@ -2182,10 +2206,13 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
u64 old_size;
u64 size_to_free;
struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
+ struct btrfs_chunk *chunk;
struct btrfs_path *path;
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_trans_handle *trans;
+ struct extent_buffer *leaf;
+ int slot;
int ret;
int enospc_errors = 0;
@@ -2241,8 +2268,10 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
if (ret)
BUG_ON(1); /* DIS - break ? */
- btrfs_item_key_to_cpu(path->nodes[0], &found_key,
- path->slots[0]);
+ leaf = path->nodes[0];
+ slot = path->slots[0];
+ btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
if (found_key.objectid != key.objectid)
break;
@@ -2250,6 +2279,14 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
if (found_key.offset == 0)
break;
+ chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
+
+ if (!should_restripe_chunk(chunk_root, leaf, chunk,
+ found_key.offset)) {
+ btrfs_release_path(path);
+ goto loop;
+ }
+
btrfs_release_path(path);
ret = btrfs_relocate_chunk(chunk_root,
chunk_root->root_key.objectid,
@@ -2259,6 +2296,7 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
goto error;
if (ret == -ENOSPC)
enospc_errors++;
+loop:
key.offset = found_key.offset - 1;
}
@@ -2285,8 +2323,30 @@ int btrfs_restripe(struct restripe_control *rctl)
mutex_lock(&fs_info->volume_mutex);
/*
- * Profile changing sanity checks
+ * In case of mixed groups both data and meta should be picked,
+ * and identical options should be given for both of them.
*/
+ allowed = btrfs_super_incompat_flags(&fs_info->super_copy);
+ if ((allowed & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
+ (rctl->flags & (BTRFS_RESTRIPE_DATA | BTRFS_RESTRIPE_METADATA))) {
+ if (!(rctl->flags & BTRFS_RESTRIPE_DATA) ||
+ !(rctl->flags & BTRFS_RESTRIPE_METADATA) ||
+ memcmp(&rctl->data, &rctl->meta, sizeof(rctl->data))) {
+ printk(KERN_ERR "btrfs: with mixed groups data and "
+ "metadata restripe options must be the same\n");
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ /*
+ * Profile changing sanity checks. Skip them if a simple
+ * balance is requested.
+ */
+ if (!((rctl->data.flags | rctl->sys.flags | rctl->meta.flags) &
+ BTRFS_RESTRIPE_ARGS_CONVERT))
+ goto do_restripe;
+
allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
if (fs_info->fs_devices->num_devices == 1)
allowed |= BTRFS_BLOCK_GROUP_DUP;
@@ -2344,6 +2404,7 @@ int btrfs_restripe(struct restripe_control *rctl)
}
}
+do_restripe:
set_restripe_control(rctl);
mutex_unlock(&fs_info->volume_mutex);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 8804c5c..f40227e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -168,6 +168,18 @@ struct map_lookup {
#define map_lookup_size(n) (sizeof(struct map_lookup) + \
(sizeof(struct btrfs_bio_stripe) * (n)))
+/*
+ * Restriper's general "type" filter. Shares bits with chunk type for
+ * simplicity, RESTRIPE prefix is used to avoid confusion.
+ */
+#define BTRFS_RESTRIPE_DATA (1ULL << 0)
+#define BTRFS_RESTRIPE_SYSTEM (1ULL << 1)
+#define BTRFS_RESTRIPE_METADATA (1ULL << 2)
+
+#define BTRFS_RESTRIPE_TYPE_MASK (BTRFS_RESTRIPE_DATA | \
+ BTRFS_RESTRIPE_SYSTEM | \
+ BTRFS_RESTRIPE_METADATA)
+
#define BTRFS_RESTRIPE_FORCE (1ULL << 3)
/*
--
1.7.5.4
* [PATCH 08/21] Btrfs: soft profile changing mode (aka soft convert)
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (6 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:01 ` [PATCH 09/21] Btrfs: profiles filter Ilya Dryomov
` (15 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
When converting from one profile to another with soft mode on, the
restriper won't touch chunks that already have the profile we are
converting to. This is useful if, e.g., half of the fs was converted
earlier.
The soft mode switch is per-type (like everything else). This means
that we can, for example, convert meta chunks the "hard" way while
converting data chunks selectively with the soft switch.
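To make the soft check concrete, here is a small userspace model of it.
It assumes the profile mask covers RAID0, RAID1, DUP and RAID10, and that
a chunk with none of those bits set is SINGLE; the model_ names are
invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel profile bits. */
#define MODEL_BG_RAID0      (1ULL << 3)
#define MODEL_BG_RAID1      (1ULL << 4)
#define MODEL_BG_DUP        (1ULL << 5)
#define MODEL_BG_RAID10     (1ULL << 6)
#define MODEL_AVAIL_SINGLE  (1ULL << 7)
#define MODEL_PROFILE_MASK  (MODEL_BG_RAID0 | MODEL_BG_RAID1 | \
			     MODEL_BG_DUP | MODEL_BG_RAID10)

/*
 * Returns 1 if the chunk already has the target profile and should be
 * skipped in soft mode, 0 if it still needs converting.
 */
int model_soft_skip(uint64_t chunk_type, uint64_t target)
{
	uint64_t profile = chunk_type & MODEL_PROFILE_MASK;

	/* a convert must name a target profile */
	assert(target != 0);

	/* no profile bit on disk means SINGLE */
	if (profile == 0)
		profile = MODEL_AVAIL_SINGLE;

	return (target & profile) != 0;
}
```

A RAID1 chunk is skipped when the target is RAID1, a RAID0 chunk is not,
and a chunk without profile bits is skipped only for a SINGLE target.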
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/volumes.c | 26 ++++++++++++++++++++++++++
fs/btrfs/volumes.h | 5 ++++-
2 files changed, 30 insertions(+), 1 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 95c6310..ff252ef 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2175,6 +2175,26 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
kfree(rctl);
}
+/*
+ * Restripe filters. Return 1 if chunk should be 'filtered out',
+ * ie should not be restriped.
+ */
+static int chunk_soft_convert_filter(u64 chunk_profile,
+ struct btrfs_restripe_args *rargs)
+{
+ BUG_ON(!(rargs->flags & BTRFS_RESTRIPE_ARGS_CONVERT));
+
+ chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+ if (chunk_profile == 0)
+ chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+ if (rargs->target & chunk_profile)
+ return 1;
+
+ return 0;
+}
+
static int should_restripe_chunk(struct btrfs_root *root,
struct extent_buffer *leaf,
struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2196,6 +2216,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
rargs = &rctl->meta;
+ /* soft profile changing mode */
+ if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
+ chunk_soft_convert_filter(chunk_type, rargs)) {
+ return 0;
+ }
+
return 1;
}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f40227e..1852f69 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -183,9 +183,12 @@ struct map_lookup {
#define BTRFS_RESTRIPE_FORCE (1ULL << 3)
/*
- * Profile changing flags
+ * Profile changing flags. When SOFT is set we won't relocate chunk if
+ * it already has the target profile (even though it may be
+ * half-filled).
*/
#define BTRFS_RESTRIPE_ARGS_CONVERT (1ULL << 8)
+#define BTRFS_RESTRIPE_ARGS_SOFT (1ULL << 9)
struct btrfs_restripe_args;
struct restripe_control {
--
1.7.5.4
* [PATCH 09/21] Btrfs: profiles filter
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (7 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 08/21] Btrfs: soft profile changing mode (aka soft convert) Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:01 ` [PATCH 10/21] Btrfs: usage filter Ilya Dryomov
` (14 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Select chunks based on a given profile mask.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/volumes.c | 20 ++++++++++++++++++++
fs/btrfs/volumes.h | 5 +++++
2 files changed, 25 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ff252ef..f045615 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2179,6 +2179,20 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
* Restripe filters. Return 1 if chunk should be 'filtered out',
* ie should not be restriped.
*/
+static int chunk_profiles_filter(u64 chunk_profile,
+ struct btrfs_restripe_args *rargs)
+{
+ chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+ if (chunk_profile == 0)
+ chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+ if (rargs->profiles & chunk_profile)
+ return 0;
+
+ return 1;
+}
+
static int chunk_soft_convert_filter(u64 chunk_profile,
struct btrfs_restripe_args *rargs)
{
@@ -2216,6 +2230,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
rargs = &rctl->meta;
+ /* profiles filter */
+ if ((rargs->flags & BTRFS_RESTRIPE_ARGS_PROFILES) &&
+ chunk_profiles_filter(chunk_type, rargs)) {
+ return 0;
+ }
+
/* soft profile changing mode */
if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1852f69..9f96ad8 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -183,6 +183,11 @@ struct map_lookup {
#define BTRFS_RESTRIPE_FORCE (1ULL << 3)
/*
+ * Restripe filters
+ */
+#define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
+
+/*
* Profile changing flags. When SOFT is set we won't relocate chunk if
* it already has the target profile (even though it may be
* half-filled).
--
1.7.5.4
* [PATCH 10/21] Btrfs: usage filter
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (8 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 09/21] Btrfs: profiles filter Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-09-27 13:22 ` David Sterba
2011-11-01 10:18 ` Arne Jansen
2011-08-23 20:01 ` [PATCH 11/21] Btrfs: devid filter Ilya Dryomov
` (13 subsequent siblings)
23 siblings, 2 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Select chunks that are less than X percent full.
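The threshold arithmetic can be tried out in userspace. The sketch below
replaces do_div() with plain 64-bit division and takes the chunk size and
used bytes as arguments instead of looking up a block group; the model_
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns percent% of num, rounding down. */
uint64_t model_percent_of(uint64_t num, int percent)
{
	assert(percent >= 0 && percent <= 100);

	if (percent == 100)
		return num;

	return num * (uint64_t)percent / 100;
}

/*
 * Returns 1 if the chunk is less than 'usage' percent full and is
 * therefore selected, 0 otherwise.
 */
int model_usage_selected(uint64_t chunk_size, uint64_t chunk_used, int usage)
{
	return chunk_used < model_percent_of(chunk_size, usage);
}
```

Note that the comparison is strict: a chunk that is exactly at the
threshold is not selected.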
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/volumes.c | 33 +++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
2 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f045615..b49ecfa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2193,6 +2193,33 @@ static int chunk_profiles_filter(u64 chunk_profile,
return 1;
}
+static u64 div_factor_fine(u64 num, int factor)
+{
+ if (factor == 100)
+ return num;
+ num *= factor;
+ do_div(num, 100);
+ return num;
+}
+
+static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
+ struct btrfs_restripe_args *rargs)
+{
+ struct btrfs_block_group_cache *cache;
+ u64 chunk_used, user_thresh;
+ int ret = 1;
+
+ cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+ chunk_used = btrfs_block_group_used(&cache->item);
+
+ user_thresh = div_factor_fine(cache->key.offset, rargs->usage);
+ if (chunk_used < user_thresh)
+ ret = 0;
+
+ btrfs_put_block_group(cache);
+ return ret;
+}
+
static int chunk_soft_convert_filter(u64 chunk_profile,
struct btrfs_restripe_args *rargs)
{
@@ -2236,6 +2263,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
return 0;
}
+ /* usage filter */
+ if ((rargs->flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+ chunk_usage_filter(rctl->fs_info, chunk_offset, rargs)) {
+ return 0;
+ }
+
/* soft profile changing mode */
if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9f96ad8..c6baf4b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -186,6 +186,7 @@ struct map_lookup {
* Restripe filters
*/
#define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
+#define BTRFS_RESTRIPE_ARGS_USAGE (1ULL << 1)
/*
* Profile changing flags. When SOFT is set we won't relocate chunk if
--
1.7.5.4
* [PATCH 11/21] Btrfs: devid filter
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (9 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 10/21] Btrfs: usage filter Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:01 ` [PATCH 12/21] Btrfs: devid subset filter Ilya Dryomov
` (12 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Relocate chunks which have at least one stripe located on a device with
devid X.
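Stripped of the extent buffer accessors, the check is a linear scan over
the devids of the chunk's stripes. A userspace sketch, with the stripes
reduced to a plain array of devids (an assumption of this example, as is
the model_ name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if at least one stripe of the chunk lives on 'devid',
 * 0 otherwise.
 */
int model_chunk_on_devid(const uint64_t *stripe_devids, size_t num_stripes,
			 uint64_t devid)
{
	size_t i;

	assert(stripe_devids || num_stripes == 0);

	for (i = 0; i < num_stripes; i++) {
		if (stripe_devids[i] == devid)
			return 1;
	}

	return 0;
}
```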
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/volumes.c | 23 +++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b49ecfa..ce2a9e0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2220,6 +2220,23 @@ static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
return ret;
}
+static int chunk_devid_filter(struct extent_buffer *leaf,
+ struct btrfs_chunk *chunk,
+ struct btrfs_restripe_args *rargs)
+{
+ struct btrfs_stripe *stripe;
+ int num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
+ int i;
+
+ for (i = 0; i < num_stripes; i++) {
+ stripe = btrfs_stripe_nr(chunk, i);
+ if (btrfs_stripe_devid(leaf, stripe) == rargs->devid)
+ return 0;
+ }
+
+ return 1;
+}
+
static int chunk_soft_convert_filter(u64 chunk_profile,
struct btrfs_restripe_args *rargs)
{
@@ -2269,6 +2286,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
return 0;
}
+ /* devid filter */
+ if ((rargs->flags & BTRFS_RESTRIPE_ARGS_DEVID) &&
+ chunk_devid_filter(leaf, chunk, rargs)) {
+ return 0;
+ }
+
/* soft profile changing mode */
if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c6baf4b..1b8dc3e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -187,6 +187,7 @@ struct map_lookup {
*/
#define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
#define BTRFS_RESTRIPE_ARGS_USAGE (1ULL << 1)
+#define BTRFS_RESTRIPE_ARGS_DEVID (1ULL << 2)
/*
* Profile changing flags. When SOFT is set we won't relocate chunk if
--
1.7.5.4
* [PATCH 12/21] Btrfs: devid subset filter
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (10 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 11/21] Btrfs: devid filter Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:01 ` [PATCH 13/21] Btrfs: virtual address space " Ilya Dryomov
` (11 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.
This filter only works when the devid filter is turned on.
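The range test is a half-open interval overlap applied to every stripe on
the given device. The userspace sketch below models it with a plain
stripe array; the per-stripe length is the chunk length divided by the
number of stripes per copy, with mirrored profiles (DUP, RAID1, RAID10)
storing two copies. The struct and the model_ names are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat view of one stripe of a chunk. */
struct model_stripe {
	uint64_t devid;   /* device the stripe lives on */
	uint64_t offset;  /* physical start offset on that device */
};

/*
 * Returns 1 if any stripe on 'devid' overlaps [pstart, pend),
 * 0 otherwise.
 */
int model_drange_selected(const struct model_stripe *stripes,
			  size_t num_stripes, uint64_t chunk_length,
			  int mirrored, uint64_t devid,
			  uint64_t pstart, uint64_t pend)
{
	/* stripes per copy: mirrored profiles keep two copies */
	uint64_t factor = num_stripes / (mirrored ? 2 : 1);
	uint64_t stripe_length;
	size_t i;

	assert(factor > 0);
	stripe_length = chunk_length / factor;

	for (i = 0; i < num_stripes; i++) {
		if (stripes[i].devid != devid)
			continue;

		/* half-open overlap: [offset, offset + len) vs [pstart, pend) */
		if (stripes[i].offset < pend &&
		    stripes[i].offset + stripe_length > pstart)
			return 1;
	}

	return 0;
}
```

Because the range is half-open, a stripe that starts exactly at pend does
not match.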
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/volumes.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
2 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ce2a9e0..4393f6d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2237,6 +2237,45 @@ static int chunk_devid_filter(struct extent_buffer *leaf,
return 1;
}
+/* [pstart, pend) */
+static int chunk_drange_filter(struct extent_buffer *leaf,
+ struct btrfs_chunk *chunk,
+ u64 chunk_offset,
+ struct btrfs_restripe_args *rargs)
+{
+ struct btrfs_stripe *stripe;
+ int num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
+ u64 stripe_offset;
+ u64 stripe_length;
+ int factor;
+ int i;
+
+ BUG_ON(!(rargs->flags & BTRFS_RESTRIPE_ARGS_DEVID));
+
+ if (btrfs_chunk_type(leaf, chunk) & (BTRFS_BLOCK_GROUP_DUP |
+ BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10))
+ factor = 2;
+ else
+ factor = 1;
+ factor = num_stripes / factor;
+
+ for (i = 0; i < num_stripes; i++) {
+ stripe = btrfs_stripe_nr(chunk, i);
+ if (btrfs_stripe_devid(leaf, stripe) != rargs->devid)
+ continue;
+
+ stripe_offset = btrfs_stripe_offset(leaf, stripe);
+ stripe_length = btrfs_chunk_length(leaf, chunk);
+ do_div(stripe_length, factor);
+
+ if (stripe_offset < rargs->pend &&
+ stripe_offset + stripe_length > rargs->pstart)
+ return 0;
+ }
+
+ return 1;
+}
+
static int chunk_soft_convert_filter(u64 chunk_profile,
struct btrfs_restripe_args *rargs)
{
@@ -2292,6 +2331,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
return 0;
}
+ /* drange filter, makes sense only with devid filter */
+ if ((rargs->flags & BTRFS_RESTRIPE_ARGS_DRANGE) &&
+ chunk_drange_filter(leaf, chunk, chunk_offset, rargs)) {
+ return 0;
+ }
+
/* soft profile changing mode */
if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1b8dc3e..8d4bbcb 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -188,6 +188,7 @@ struct map_lookup {
#define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
#define BTRFS_RESTRIPE_ARGS_USAGE (1ULL << 1)
#define BTRFS_RESTRIPE_ARGS_DEVID (1ULL << 2)
+#define BTRFS_RESTRIPE_ARGS_DRANGE (1ULL << 3)
/*
* Profile changing flags. When SOFT is set we won't relocate chunk if
--
1.7.5.4
* [PATCH 13/21] Btrfs: virtual address space subset filter
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (11 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 12/21] Btrfs: devid subset filter Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:01 ` [PATCH 14/21] Btrfs: save restripe parameters to disk Ilya Dryomov
` (10 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/volumes.c | 20 ++++++++++++++++++++
fs/btrfs/volumes.h | 3 ++-
2 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4393f6d..eccd458 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2276,6 +2276,20 @@ static int chunk_drange_filter(struct extent_buffer *leaf,
return 1;
}
+/* [vstart, vend) */
+static int chunk_vrange_filter(struct extent_buffer *leaf,
+ struct btrfs_chunk *chunk,
+ u64 chunk_offset,
+ struct btrfs_restripe_args *rargs)
+{
+ if (chunk_offset < rargs->vend &&
+ chunk_offset + btrfs_chunk_length(leaf, chunk) > rargs->vstart)
+ /* at least part of the chunk is inside this vrange */
+ return 0;
+
+ return 1;
+}
+
static int chunk_soft_convert_filter(u64 chunk_profile,
struct btrfs_restripe_args *rargs)
{
@@ -2337,6 +2351,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
return 0;
}
+ /* vrange filter */
+ if ((rargs->flags & BTRFS_RESTRIPE_ARGS_VRANGE) &&
+ chunk_vrange_filter(leaf, chunk, chunk_offset, rargs)) {
+ return 0;
+ }
+
/* soft profile changing mode */
if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 8d4bbcb..9726180 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -188,7 +188,8 @@ struct map_lookup {
#define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
#define BTRFS_RESTRIPE_ARGS_USAGE (1ULL << 1)
#define BTRFS_RESTRIPE_ARGS_DEVID (1ULL << 2)
-#define BTRFS_RESTRIPE_ARGS_DRANGE (1ULL << 3)
+#define BTRFS_RESTRIPE_ARGS_DRANGE (1ULL << 3)
+#define BTRFS_RESTRIPE_ARGS_VRANGE (1ULL << 4)
/*
* Profile changing flags. When SOFT is set we won't relocate chunk if
--
1.7.5.4
* [PATCH 14/21] Btrfs: save restripe parameters to disk
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (12 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 13/21] Btrfs: virtual address space " Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-09-27 13:43 ` David Sterba
2011-11-01 10:29 ` Arne Jansen
2011-08-23 20:01 ` [PATCH 15/21] Btrfs: recover restripe on mount Ilya Dryomov
` (9 subsequent siblings)
23 siblings, 2 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Introduce a new btree objectid for storing the restripe item. The reason
is to be able to resume the restriper after a crash with the same
parameters. The restripe item has a very high objectid and goes into the
tree of tree roots.
The key for the new item is as follows:
[ BTRFS_RESTRIPE_OBJECTID ; 0 ; 0 ]
Older kernels simply ignore it, so it's safe to mount with an older
kernel and then go back to the newer one.
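For readers wondering what "very high" means here: -4ULL wraps around to
a value just below the top of the 64-bit objectid space. The userspace
sketch below spells that out and builds the key from the description
above; the struct is a simplified stand-in for a btrfs key and the model_
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RESTRIPE_OBJECTID  ((uint64_t)-4)  /* 0xfffffffffffffffc */
#define MODEL_ORPHAN_OBJECTID    ((uint64_t)-5)  /* 0xfffffffffffffffb */

/* Simplified stand-in for a btrfs key: (objectid, type, offset). */
struct model_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Builds the key of the restripe item: [ RESTRIPE_OBJECTID ; 0 ; 0 ]. */
struct model_key model_restripe_key(void)
{
	struct model_key key = { MODEL_RESTRIPE_OBJECTID, 0, 0 };

	/* the restripe objectid sorts right above the orphan objectid */
	assert(key.objectid == MODEL_ORPHAN_OBJECTID + 1);

	return key;
}
```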
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/btrfs/volumes.c | 105 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 228 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 65d7562..b524034 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -85,6 +85,9 @@ struct btrfs_ordered_sum;
/* holds checksums of all the data extents */
#define BTRFS_CSUM_TREE_OBJECTID 7ULL
+/* for storing restripe params in the root tree */
+#define BTRFS_RESTRIPE_OBJECTID -4ULL
+
/* orhpan objectid for tracking unlinked/truncated files */
#define BTRFS_ORPHAN_OBJECTID -5ULL
@@ -649,6 +652,47 @@ struct btrfs_root_ref {
__le16 name_len;
} __attribute__ ((__packed__));
+/*
+ * Restriper stuff
+ */
+struct btrfs_disk_restripe_args {
+ /* profiles to touch, in-memory format */
+ __le64 profiles;
+
+ /* usage filter */
+ __le64 usage;
+
+ /* devid filter */
+ __le64 devid;
+
+ /* devid subset filter [pstart..pend) */
+ __le64 pstart;
+ __le64 pend;
+
+ /* btrfs virtual address space subset filter [vstart..vend) */
+ __le64 vstart;
+ __le64 vend;
+
+ /* profile to convert to, in-memory format */
+ __le64 target;
+
+ /* BTRFS_RESTRIPE_ARGS_* */
+ __le64 flags;
+
+ __le64 unused[8];
+} __attribute__ ((__packed__));
+
+struct btrfs_restripe_item {
+ /* BTRFS_RESTRIPE_* */
+ __le64 flags;
+
+ struct btrfs_disk_restripe_args data;
+ struct btrfs_disk_restripe_args sys;
+ struct btrfs_disk_restripe_args meta;
+
+ __le64 unused[4];
+} __attribute__ ((__packed__));
+
#define BTRFS_FILE_EXTENT_INLINE 0
#define BTRFS_FILE_EXTENT_REG 1
#define BTRFS_FILE_EXTENT_PREALLOC 2
@@ -727,7 +771,8 @@ struct btrfs_csum_item {
BTRFS_BLOCK_GROUP_RAID10)
/*
* We need a bit for restriper to be able to tell when chunks of type
- * SINGLE are available. It is used in avail_*_alloc_bits.
+ * SINGLE are available. It is used in avail_*_alloc_bits and restripe
+ * item fields.
*/
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7)
@@ -2000,8 +2045,86 @@ static inline bool btrfs_root_readonly(struct btrfs_root *root)
return root->root_item.flags & BTRFS_ROOT_SUBVOL_RDONLY;
}
-/* struct btrfs_super_block */
+/* struct btrfs_restripe_item */
+BTRFS_SETGET_FUNCS(restripe_flags, struct btrfs_restripe_item, flags, 64);
+
+static inline void btrfs_restripe_data(struct extent_buffer *eb,
+ struct btrfs_restripe_item *ri,
+ struct btrfs_disk_restripe_args *ra)
+{
+ read_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
+}
+static inline void btrfs_set_restripe_data(struct extent_buffer *eb,
+ struct btrfs_restripe_item *ri,
+ struct btrfs_disk_restripe_args *ra)
+{
+ write_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
+}
+
+static inline void btrfs_restripe_meta(struct extent_buffer *eb,
+ struct btrfs_restripe_item *ri,
+ struct btrfs_disk_restripe_args *ra)
+{
+ read_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
+}
+
+static inline void btrfs_set_restripe_meta(struct extent_buffer *eb,
+ struct btrfs_restripe_item *ri,
+ struct btrfs_disk_restripe_args *ra)
+{
+ write_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
+}
+
+static inline void btrfs_restripe_sys(struct extent_buffer *eb,
+ struct btrfs_restripe_item *ri,
+ struct btrfs_disk_restripe_args *ra)
+{
+ read_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
+}
+
+static inline void btrfs_set_restripe_sys(struct extent_buffer *eb,
+ struct btrfs_restripe_item *ri,
+ struct btrfs_disk_restripe_args *ra)
+{
+ write_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
+}
+
+static inline void
+btrfs_disk_restripe_args_to_cpu(struct btrfs_restripe_args *cpu,
+ struct btrfs_disk_restripe_args *disk)
+{
+ memset(cpu, 0, sizeof(*cpu));
+
+ cpu->profiles = le64_to_cpu(disk->profiles);
+ cpu->usage = le64_to_cpu(disk->usage);
+ cpu->devid = le64_to_cpu(disk->devid);
+ cpu->pstart = le64_to_cpu(disk->pstart);
+ cpu->pend = le64_to_cpu(disk->pend);
+ cpu->vstart = le64_to_cpu(disk->vstart);
+ cpu->vend = le64_to_cpu(disk->vend);
+ cpu->target = le64_to_cpu(disk->target);
+ cpu->flags = le64_to_cpu(disk->flags);
+}
+
+static inline void
+btrfs_cpu_restripe_args_to_disk(struct btrfs_disk_restripe_args *disk,
+ struct btrfs_restripe_args *cpu)
+{
+ memset(disk, 0, sizeof(*disk));
+
+ disk->profiles = cpu_to_le64(cpu->profiles);
+ disk->usage = cpu_to_le64(cpu->usage);
+ disk->devid = cpu_to_le64(cpu->devid);
+ disk->pstart = cpu_to_le64(cpu->pstart);
+ disk->pend = cpu_to_le64(cpu->pend);
+ disk->vstart = cpu_to_le64(cpu->vstart);
+ disk->vend = cpu_to_le64(cpu->vend);
+ disk->target = cpu_to_le64(cpu->target);
+ disk->flags = cpu_to_le64(cpu->flags);
+}
+
+/* struct btrfs_super_block */
BTRFS_SETGET_STACK_FUNCS(super_bytenr, struct btrfs_super_block, bytenr, 64);
BTRFS_SETGET_STACK_FUNCS(super_flags, struct btrfs_super_block, flags, 64);
BTRFS_SETGET_STACK_FUNCS(super_generation, struct btrfs_super_block,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index eccd458..1057ad3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2150,6 +2150,97 @@ error:
return ret;
}
+static int insert_restripe_item(struct btrfs_root *root,
+ struct restripe_control *rctl)
+{
+ struct btrfs_trans_handle *trans;
+ struct btrfs_restripe_item *item;
+ struct btrfs_disk_restripe_args disk_rargs;
+ struct btrfs_path *path;
+ struct extent_buffer *leaf;
+ struct btrfs_key key;
+ int ret, err;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ trans = btrfs_start_transaction(root, 0);
+ if (IS_ERR(trans)) {
+ btrfs_free_path(path);
+ return PTR_ERR(trans);
+ }
+
+ key.objectid = BTRFS_RESTRIPE_OBJECTID;
+ key.type = 0;
+ key.offset = 0;
+
+ ret = btrfs_insert_empty_item(trans, root, path, &key,
+ sizeof(*item));
+ if (ret)
+ goto out;
+
+ leaf = path->nodes[0];
+ item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item);
+
+ memset_extent_buffer(leaf, 0, (unsigned long)item, sizeof(*item));
+
+ btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->data);
+ btrfs_set_restripe_data(leaf, item, &disk_rargs);
+ btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->meta);
+ btrfs_set_restripe_meta(leaf, item, &disk_rargs);
+ btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->sys);
+ btrfs_set_restripe_sys(leaf, item, &disk_rargs);
+
+ btrfs_set_restripe_flags(leaf, item, rctl->flags);
+
+ btrfs_mark_buffer_dirty(leaf);
+out:
+ btrfs_free_path(path);
+ err = btrfs_commit_transaction(trans, root);
+ if (err && !ret)
+ ret = err;
+ return ret;
+}
+
+static int del_restripe_item(struct btrfs_root *root)
+{
+ struct btrfs_trans_handle *trans;
+ struct btrfs_path *path;
+ struct btrfs_key key;
+ int ret, err;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ trans = btrfs_start_transaction(root, 0);
+ if (IS_ERR(trans)) {
+ btrfs_free_path(path);
+ return PTR_ERR(trans);
+ }
+
+ key.objectid = BTRFS_RESTRIPE_OBJECTID;
+ key.type = 0;
+ key.offset = 0;
+
+ ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+ if (ret < 0)
+ goto out;
+ if (ret > 0) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ ret = btrfs_del_item(trans, root, path);
+out:
+ btrfs_free_path(path);
+ err = btrfs_commit_transaction(trans, root);
+ if (err && !ret)
+ ret = err;
+ return ret;
+}
+
/*
* Should be called with both restripe and volume mutexes held to
* serialize other volume operations (add_dev/rm_dev/resize) wrt
@@ -2485,6 +2576,7 @@ int btrfs_restripe(struct restripe_control *rctl)
{
struct btrfs_fs_info *fs_info = rctl->fs_info;
u64 allowed;
+ int err;
int ret;
mutex_lock(&fs_info->volume_mutex);
@@ -2572,16 +2664,25 @@ int btrfs_restripe(struct restripe_control *rctl)
}
do_restripe:
+ ret = insert_restripe_item(fs_info->tree_root, rctl);
+ if (ret && ret != -EEXIST)
+ goto out;
+ BUG_ON(ret == -EEXIST);
+
set_restripe_control(rctl);
mutex_unlock(&fs_info->volume_mutex);
- ret = __btrfs_restripe(fs_info->dev_root);
+ err = __btrfs_restripe(fs_info->dev_root);
mutex_lock(&fs_info->volume_mutex);
+
unset_restripe_control(fs_info);
+ ret = del_restripe_item(fs_info->tree_root);
+ BUG_ON(ret);
+
mutex_unlock(&fs_info->volume_mutex);
- return ret;
+ return err;
out:
mutex_unlock(&fs_info->volume_mutex);
--
1.7.5.4
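A note on the cpu <-> disk helpers in the patch above: they zero the destination and then convert each 64-bit field to or from little-endian. Below is a minimal userspace sketch of the same round trip, assuming a cut-down three-field structure. The demo_* names and the byte-array layout are made up for illustration and are not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-side args; only three of the real fields are modelled. */
struct demo_args {
	uint64_t profiles;
	uint64_t usage;
	uint64_t flags;
};

/* Hypothetical on-disk image: 8 little-endian bytes per field, no padding. */
struct demo_disk_args {
	uint8_t profiles[8];
	uint8_t usage[8];
	uint8_t flags[8];
};

/* Store v as little-endian bytes, independent of host byte order. */
static void put_le64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/* Load a little-endian 64-bit value, independent of host byte order. */
static uint64_t get_le64(const uint8_t *src)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)src[i] << (8 * i);
	return v;
}

static void demo_cpu_to_disk(struct demo_disk_args *disk,
			     const struct demo_args *cpu)
{
	assert(disk && cpu);
	memset(disk, 0, sizeof(*disk));

	put_le64(disk->profiles, cpu->profiles);
	put_le64(disk->usage, cpu->usage);
	put_le64(disk->flags, cpu->flags);
}

static void demo_disk_to_cpu(struct demo_args *cpu,
			     const struct demo_disk_args *disk)
{
	assert(cpu && disk);
	memset(cpu, 0, sizeof(*cpu));

	cpu->profiles = get_le64(disk->profiles);
	cpu->usage = get_le64(disk->usage);
	cpu->flags = get_le64(disk->flags);
}
```

Converting with demo_cpu_to_disk() and then demo_disk_to_cpu() should give back the original values on both little- and big-endian hosts, which is the property the in-kernel helpers rely on.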
* [PATCH 15/21] Btrfs: recover restripe on mount
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (13 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 14/21] Btrfs: save restripe parameters to disk Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-11-01 10:57 ` Arne Jansen
2011-08-23 20:01 ` [PATCH 16/21] Btrfs: allow for cancelling restriper Ilya Dryomov
` (8 subsequent siblings)
23 siblings, 1 reply; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
On mount, if a restripe item is found, resume the restripe in a separate
kernel thread.
Try to continue roughly where the previous balance (or convert) was
interrupted. For chunk types that were being converted to some profile
we turn on soft convert; for a simple balance we turn on the usage
filter and relocate only less-than-90%-full chunks of that type.
These are just heuristics, but they help quite a bit and can be improved
in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/disk-io.c | 3 +
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/volumes.c | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/btrfs/volumes.h | 3 +-
4 files changed, 127 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index fa2301b..b3950f2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2103,6 +2103,9 @@ struct btrfs_root *open_ctree(struct super_block *sb,
if (!err)
err = btrfs_orphan_cleanup(fs_info->tree_root);
up_read(&fs_info->cleanup_work_sem);
+
+ err = btrfs_recover_restripe(fs_info->tree_root);
+
if (err) {
close_ctree(tree_root);
return ERR_PTR(err);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9dfc686..f371edd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2899,7 +2899,7 @@ static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
memcpy(&rctl->meta, &rargs->meta, sizeof(rctl->meta));
memcpy(&rctl->sys, &rargs->sys, sizeof(rctl->sys));
- ret = btrfs_restripe(rctl);
+ ret = btrfs_restripe(rctl, 0);
/* rctl freed in unset_restripe_control */
kfree(rargs);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1057ad3..4490124 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -23,6 +23,7 @@
#include <linux/random.h>
#include <linux/iocontext.h>
#include <linux/capability.h>
+#include <linux/kthread.h>
#include <asm/div64.h>
#include "compat.h"
#include "ctree.h"
@@ -2242,16 +2243,58 @@ out:
}
/*
+ * This is a heuristic used to reduce the number of chunks restriped on
+ * resume after balance was interrupted.
+ */
+static void update_restripe_args(struct restripe_control *rctl)
+{
+ /*
+ * Turn on soft mode for chunk types that were being converted.
+ */
+ if (rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT)
+ rctl->data.flags |= BTRFS_RESTRIPE_ARGS_SOFT;
+ if (rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT)
+ rctl->sys.flags |= BTRFS_RESTRIPE_ARGS_SOFT;
+ if (rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT)
+ rctl->meta.flags |= BTRFS_RESTRIPE_ARGS_SOFT;
+
+ /*
+ * Turn on usage filter if it is not already used. The idea is
+ * that chunks that we have already balanced should be
+ * reasonably full. Don't do it for chunks that are being
+ * converted - that will keep us from relocating unconverted
+ * (albeit full) chunks.
+ */
+ if (!(rctl->data.flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+ !(rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT)) {
+ rctl->data.flags |= BTRFS_RESTRIPE_ARGS_USAGE;
+ rctl->data.usage = 90;
+ }
+ if (!(rctl->sys.flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+ !(rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT)) {
+ rctl->sys.flags |= BTRFS_RESTRIPE_ARGS_USAGE;
+ rctl->sys.usage = 90;
+ }
+ if (!(rctl->meta.flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+ !(rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT)) {
+ rctl->meta.flags |= BTRFS_RESTRIPE_ARGS_USAGE;
+ rctl->meta.usage = 90;
+ }
+}
+
+/*
* Should be called with both restripe and volume mutexes held to
* serialize other volume operations (add_dev/rm_dev/resize) wrt
* restriper. Same goes for unset_restripe_control().
*/
-static void set_restripe_control(struct restripe_control *rctl)
+static void set_restripe_control(struct restripe_control *rctl, int update)
{
struct btrfs_fs_info *fs_info = rctl->fs_info;
spin_lock(&fs_info->restripe_lock);
fs_info->restripe_ctl = rctl;
+ if (update)
+ update_restripe_args(rctl);
spin_unlock(&fs_info->restripe_lock);
}
@@ -2572,7 +2615,7 @@ error:
/*
* Should be called with restripe_mutex held
*/
-int btrfs_restripe(struct restripe_control *rctl)
+int btrfs_restripe(struct restripe_control *rctl, int resume)
{
struct btrfs_fs_info *fs_info = rctl->fs_info;
u64 allowed;
@@ -2667,9 +2710,9 @@ do_restripe:
ret = insert_restripe_item(fs_info->tree_root, rctl);
if (ret && ret != -EEXIST)
goto out;
- BUG_ON(ret == -EEXIST);
+ BUG_ON(ret == -EEXIST && !resume);
- set_restripe_control(rctl);
+ set_restripe_control(rctl, resume);
mutex_unlock(&fs_info->volume_mutex);
err = __btrfs_restripe(fs_info->dev_root);
@@ -2690,6 +2733,80 @@ out:
return ret;
}
+static int restriper_kthread(void *data)
+{
+ struct restripe_control *rctl = (struct restripe_control *)data;
+ struct btrfs_fs_info *fs_info = rctl->fs_info;
+ int ret;
+
+ mutex_lock(&fs_info->restripe_mutex);
+
+ printk(KERN_INFO "btrfs: continuing restripe\n");
+ ret = btrfs_restripe(rctl, 1);
+
+ mutex_unlock(&fs_info->restripe_mutex);
+ return ret;
+}
+
+int btrfs_recover_restripe(struct btrfs_root *tree_root)
+{
+ struct task_struct *tsk;
+ struct restripe_control *rctl;
+ struct btrfs_restripe_item *item;
+ struct btrfs_disk_restripe_args disk_rargs;
+ struct btrfs_path *path;
+ struct extent_buffer *leaf;
+ struct btrfs_key key;
+ int ret;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ rctl = kzalloc(sizeof(*rctl), GFP_NOFS);
+ if (!rctl) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ key.objectid = BTRFS_RESTRIPE_OBJECTID;
+ key.type = 0;
+ key.offset = 0;
+
+ ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
+ if (ret < 0)
+ goto out_free;
+ if (ret > 0) { /* ret = -ENOENT; */
+ ret = 0;
+ goto out_free;
+ }
+
+ leaf = path->nodes[0];
+ item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item);
+
+ rctl->fs_info = tree_root->fs_info;
+ rctl->flags = btrfs_restripe_flags(leaf, item);
+
+ btrfs_restripe_data(leaf, item, &disk_rargs);
+ btrfs_disk_restripe_args_to_cpu(&rctl->data, &disk_rargs);
+ btrfs_restripe_meta(leaf, item, &disk_rargs);
+ btrfs_disk_restripe_args_to_cpu(&rctl->meta, &disk_rargs);
+ btrfs_restripe_sys(leaf, item, &disk_rargs);
+ btrfs_disk_restripe_args_to_cpu(&rctl->sys, &disk_rargs);
+
+ tsk = kthread_run(restriper_kthread, rctl, "btrfs-restriper");
+ if (IS_ERR(tsk))
+ ret = PTR_ERR(tsk);
+ else
+ goto out;
+
+out_free:
+ kfree(rctl);
+out:
+ btrfs_free_path(path);
+ return ret;
+}
+
/*
* shrinking a device means finding all of the device extents past
* the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9726180..6fcb4a5 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -252,7 +252,8 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
int btrfs_init_new_device(struct btrfs_root *root, char *path);
int btrfs_balance(struct btrfs_root *dev_root);
-int btrfs_restripe(struct restripe_control *rctl);
+int btrfs_restripe(struct restripe_control *rctl, int resume);
+int btrfs_recover_restripe(struct btrfs_root *tree_root);
int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
int find_free_dev_extent(struct btrfs_trans_handle *trans,
struct btrfs_device *device, u64 num_bytes,
--
1.7.5.4
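A note on update_restripe_args() above: the same two rules are applied to data, sys and meta in turn. The sketch below models the per-type rule in userspace as a single helper, purely as an illustration. The CONVERT and SOFT bit values are the ones from volumes.h in this series; the USAGE bit value and all demo_* names are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ARGS_USAGE		(1ULL << 1)	/* assumed value, not from the series */
#define DEMO_ARGS_CONVERT	(1ULL << 8)	/* as in volumes.h */
#define DEMO_ARGS_SOFT		(1ULL << 9)	/* as in volumes.h */

/* Hypothetical cut-down per-chunk-type arguments. */
struct demo_args {
	uint64_t flags;
	uint64_t usage;
};

/*
 * Resume heuristic for one chunk type:
 *  - a conversion is resumed in soft mode, without a usage filter;
 *  - a plain balance gets a 90% usage filter unless the user set one.
 */
static void demo_update_one(struct demo_args *a)
{
	assert(a);

	if (a->flags & DEMO_ARGS_CONVERT) {
		a->flags |= DEMO_ARGS_SOFT;
		return;
	}

	if (!(a->flags & DEMO_ARGS_USAGE)) {
		a->flags |= DEMO_ARGS_USAGE;
		a->usage = 90;
	}
}
```

Calling such a helper once per chunk type would behave the same as the three open-coded blocks, since the usage filter is only ever added when CONVERT is not set.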
* [PATCH 16/21] Btrfs: allow for cancelling restriper
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (14 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 15/21] Btrfs: recover restripe on mount Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:01 ` [PATCH 17/21] Btrfs: allow for pausing restriper Ilya Dryomov
` (7 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Implement an ioctl for cancelling restriper. Currently we wait until
relocation of the current block group is finished; in the future this
can be done by triggering a commit. The restripe item is deleted and no
record of the interrupted restripe is kept.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 2 +
fs/btrfs/disk-io.c | 2 +
fs/btrfs/ioctl.c | 20 +++++++++++++++++
fs/btrfs/ioctl.h | 3 ++
fs/btrfs/volumes.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++----
fs/btrfs/volumes.h | 7 ++++++
6 files changed, 90 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b524034..8e764d9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1165,6 +1165,8 @@ struct btrfs_fs_info {
spinlock_t restripe_lock;
struct mutex restripe_mutex;
struct restripe_control *restripe_ctl;
+ unsigned long restripe_state;
+ wait_queue_head_t restripe_wait;
unsigned data_chunk_allocations;
unsigned metadata_ratio;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b3950f2..662a6e6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1703,6 +1703,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->restripe_lock);
mutex_init(&fs_info->restripe_mutex);
fs_info->restripe_ctl = NULL;
+ fs_info->restripe_state = 0;
+ init_waitqueue_head(&fs_info->restripe_wait);
sb->s_blocksize = 4096;
sb->s_blocksize_bits = blksize_bits(4096);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f371edd..d8bdb67 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2878,6 +2878,10 @@ static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
return -EROFS;
mutex_lock(&fs_info->restripe_mutex);
+ if (fs_info->restripe_ctl) {
+ ret = -EINPROGRESS;
+ goto out;
+ }
rargs = memdup_user(arg, sizeof(*rargs));
if (IS_ERR(rargs)) {
@@ -2908,6 +2912,20 @@ out:
return ret;
}
+static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
+ int cmd)
+{
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ switch (cmd) {
+ case BTRFS_RESTRIPE_CTL_CANCEL:
+ return btrfs_cancel_restripe(root->fs_info);
+ }
+
+ return -EINVAL;
+}
+
long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
{
@@ -2982,6 +3000,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_scrub_progress(root, argp);
case BTRFS_IOC_RESTRIPE:
return btrfs_ioctl_restripe(root, argp);
+ case BTRFS_IOC_RESTRIPE_CTL:
+ return btrfs_ioctl_restripe_ctl(root, arg);
}
return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 798f1d4..4f6ead5 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -109,6 +109,8 @@ struct btrfs_ioctl_fs_info_args {
__u64 reserved[124]; /* pad to 1k */
};
+#define BTRFS_RESTRIPE_CTL_CANCEL 1
+
struct btrfs_restripe_args {
__u64 profiles;
__u64 usage;
@@ -285,4 +287,5 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_fs_info_args)
#define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \
struct btrfs_ioctl_restripe_args)
+#define BTRFS_IOC_RESTRIPE_CTL _IOW(BTRFS_IOCTL_MAGIC, 33, int)
#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4490124..cd43368 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2553,6 +2553,13 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
key.type = BTRFS_CHUNK_ITEM_KEY;
while (1) {
+ struct btrfs_fs_info *fs_info = dev_root->fs_info;
+
+ if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+ ret = -ECANCELED;
+ goto error;
+ }
+
ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
if (ret < 0)
goto error;
@@ -2715,16 +2722,25 @@ do_restripe:
set_restripe_control(rctl, resume);
mutex_unlock(&fs_info->volume_mutex);
+ set_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
+ mutex_unlock(&fs_info->restripe_mutex);
+
err = __btrfs_restripe(fs_info->dev_root);
- mutex_lock(&fs_info->volume_mutex);
+ mutex_lock(&fs_info->restripe_mutex);
+ clear_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
- unset_restripe_control(fs_info);
- ret = del_restripe_item(fs_info->tree_root);
- BUG_ON(ret);
+ if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+ mutex_lock(&fs_info->volume_mutex);
- mutex_unlock(&fs_info->volume_mutex);
+ unset_restripe_control(fs_info);
+ ret = del_restripe_item(fs_info->tree_root);
+ BUG_ON(ret);
+
+ mutex_unlock(&fs_info->volume_mutex);
+ }
+ wake_up(&fs_info->restripe_wait);
return err;
out:
@@ -2807,6 +2823,41 @@ out:
return ret;
}
+int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info)
+{
+ int ret = 0;
+
+ mutex_lock(&fs_info->restripe_mutex);
+ if (!fs_info->restripe_ctl) {
+ ret = -ENOTCONN;
+ goto out;
+ }
+
+ if (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+ set_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state);
+ while (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+ mutex_unlock(&fs_info->restripe_mutex);
+ wait_event(fs_info->restripe_wait,
+ !test_bit(RESTRIPE_RUNNING,
+ &fs_info->restripe_state));
+ mutex_lock(&fs_info->restripe_mutex);
+ }
+ clear_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state);
+ } else {
+ mutex_lock(&fs_info->volume_mutex);
+
+ unset_restripe_control(fs_info);
+ ret = del_restripe_item(fs_info->tree_root);
+ BUG_ON(ret);
+
+ mutex_unlock(&fs_info->volume_mutex);
+ }
+
+out:
+ mutex_unlock(&fs_info->restripe_mutex);
+ return ret;
+}
+
/*
* shrinking a device means finding all of the device extents past
* the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6fcb4a5..dd1fa7f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -199,6 +199,12 @@ struct map_lookup {
#define BTRFS_RESTRIPE_ARGS_CONVERT (1ULL << 8)
#define BTRFS_RESTRIPE_ARGS_SOFT (1ULL << 9)
+/*
+ * Restripe state bits
+ */
+#define RESTRIPE_RUNNING 0
+#define RESTRIPE_CANCEL_REQ 1
+
struct btrfs_restripe_args;
struct restripe_control {
struct btrfs_fs_info *fs_info;
@@ -254,6 +260,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *path);
int btrfs_balance(struct btrfs_root *dev_root);
int btrfs_restripe(struct restripe_control *rctl, int resume);
int btrfs_recover_restripe(struct btrfs_root *tree_root);
+int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
int find_free_dev_extent(struct btrfs_trans_handle *trans,
struct btrfs_device *device, u64 num_bytes,
--
1.7.5.4
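A note on the cancel path above: the request bit is only tested at the top of the relocation loop, so the chunk being relocated when the request arrives is always finished first. The single-threaded sketch below models that behaviour. It is a simplification (no locking, no wait queue), and the demo_* names and the cancel_at parameter are invented for the illustration.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_CANCEL_REQ	1	/* same bit number as RESTRIPE_CANCEL_REQ */

static int demo_test_bit(int nr, const unsigned long *state)
{
	return (int)((*state >> nr) & 1UL);
}

static void demo_set_bit(int nr, unsigned long *state)
{
	*state |= 1UL << nr;
}

/*
 * Relocate up to nchunks chunks. cancel_at simulates a cancel request
 * arriving while chunk number cancel_at is being relocated (-1: never).
 * *done receives the number of chunks that were fully relocated.
 */
static int demo_relocate(unsigned long *state, int nchunks, int cancel_at,
			 int *done)
{
	assert(state && done);
	*done = 0;

	for (int i = 0; i < nchunks; i++) {
		/* the request is only noticed between chunks */
		if (demo_test_bit(DEMO_CANCEL_REQ, state))
			return -ECANCELED;

		if (i == cancel_at)
			demo_set_bit(DEMO_CANCEL_REQ, state);

		(*done)++;	/* the current chunk always completes */
	}
	return 0;
}
```

With five chunks and a request arriving during chunk 1, two chunks are relocated before the loop returns -ECANCELED. That latency is what the "trigger a commit" idea in the changelog would reduce.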
* [PATCH 17/21] Btrfs: allow for pausing restriper
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (15 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 16/21] Btrfs: allow for cancelling restriper Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-11-01 11:46 ` Arne Jansen
2011-08-23 20:01 ` [PATCH 18/21] Btrfs: allow for resuming restriper after it was paused Ilya Dryomov
` (6 subsequent siblings)
23 siblings, 1 reply; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Implement an ioctl for pausing restriper. This pauses the relocation,
but restripe is still considered to be "in progress": the restripe item
is not deleted, other volume operations cannot be started, etc. If
paused in the middle of a profile changing operation, we will continue
making allocations with the target profile.
Add a hook to close_ctree() to be able to pause restriper and free its
data structures on unmount. (It's safe to unmount when restriper is in
the 'paused' state; we will resume with the same parameters on the next
mount.)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/disk-io.c | 3 +++
fs/btrfs/ioctl.c | 2 ++
fs/btrfs/ioctl.h | 1 +
fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
fs/btrfs/volumes.h | 2 ++
5 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 662a6e6..7db5c50 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2542,6 +2542,9 @@ int close_ctree(struct btrfs_root *root)
fs_info->closing = 1;
smp_mb();
+ /* pause restriper and free restripe_ctl */
+ btrfs_pause_restripe(root->fs_info, 1);
+
btrfs_scrub_cancel(root);
/* wait for any defraggers to finish */
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d8bdb67..61978ac 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2921,6 +2921,8 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
switch (cmd) {
case BTRFS_RESTRIPE_CTL_CANCEL:
return btrfs_cancel_restripe(root->fs_info);
+ case BTRFS_RESTRIPE_CTL_PAUSE:
+ return btrfs_pause_restripe(root->fs_info, 0);
}
return -EINVAL;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 4f6ead5..e468d5b 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -110,6 +110,7 @@ struct btrfs_ioctl_fs_info_args {
};
#define BTRFS_RESTRIPE_CTL_CANCEL 1
+#define BTRFS_RESTRIPE_CTL_PAUSE 2
struct btrfs_restripe_args {
__u64 profiles;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cd43368..65deaa7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2555,7 +2555,8 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
while (1) {
struct btrfs_fs_info *fs_info = dev_root->fs_info;
- if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+ if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
+ test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state)) {
ret = -ECANCELED;
goto error;
}
@@ -2730,7 +2731,9 @@ do_restripe:
mutex_lock(&fs_info->restripe_mutex);
clear_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
- if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+ if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
+ (!test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state) &&
+ !test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state))) {
mutex_lock(&fs_info->volume_mutex);
unset_restripe_control(fs_info);
@@ -2858,6 +2861,43 @@ out:
return ret;
}
+int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset)
+{
+ int ret = 0;
+
+ mutex_lock(&fs_info->restripe_mutex);
+ if (!fs_info->restripe_ctl) {
+ ret = -ENOTCONN;
+ goto out;
+ }
+
+ /* only running restripe can be paused */
+ if (!test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+ ret = -ENOTCONN;
+ goto out_unset;
+ }
+
+ set_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
+ while (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+ mutex_unlock(&fs_info->restripe_mutex);
+ wait_event(fs_info->restripe_wait,
+ !test_bit(RESTRIPE_RUNNING,
+ &fs_info->restripe_state));
+ mutex_lock(&fs_info->restripe_mutex);
+ }
+ clear_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
+
+out_unset:
+ if (unset) {
+ mutex_lock(&fs_info->volume_mutex);
+ unset_restripe_control(fs_info);
+ mutex_unlock(&fs_info->volume_mutex);
+ }
+out:
+ mutex_unlock(&fs_info->restripe_mutex);
+ return ret;
+}
+
/*
* shrinking a device means finding all of the device extents past
* the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index dd1fa7f..b8c234a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -204,6 +204,7 @@ struct map_lookup {
*/
#define RESTRIPE_RUNNING 0
#define RESTRIPE_CANCEL_REQ 1
+#define RESTRIPE_PAUSE_REQ 2
struct btrfs_restripe_args;
struct restripe_control {
@@ -261,6 +262,7 @@ int btrfs_balance(struct btrfs_root *dev_root);
int btrfs_restripe(struct restripe_control *rctl, int resume);
int btrfs_recover_restripe(struct btrfs_root *tree_root);
int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
+int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset);
int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
int find_free_dev_extent(struct btrfs_trans_handle *trans,
struct btrfs_device *device, u64 num_bytes,
--
1.7.5.4
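A note on the cleanup condition in btrfs_restripe() above: the restripe item is deleted on cancel or on normal completion, and kept only when a pause was requested without a cancel. The sketch below spells the condition out as written and in a shorter form that appears to be equivalent, then checks all four input combinations. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the patch above. */
static bool demo_cleanup_as_written(bool cancel_req, bool pause_req)
{
	return cancel_req || (!pause_req && !cancel_req);
}

/* Shorter form: clean up unless this was a pure pause. */
static bool demo_cleanup_simplified(bool cancel_req, bool pause_req)
{
	return cancel_req || !pause_req;
}

/* Exhaustively compare both forms over the four possible inputs. */
static void demo_check_equivalence(void)
{
	for (int c = 0; c < 2; c++)
		for (int p = 0; p < 2; p++)
			assert(demo_cleanup_as_written(c, p) ==
			       demo_cleanup_simplified(c, p));
}
```

The only combination for which the item survives is pause_req set with cancel_req clear, which matches the changelog: a paused restripe is still "in progress".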
* [PATCH 18/21] Btrfs: allow for resuming restriper after it was paused
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (16 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 17/21] Btrfs: allow for pausing restriper Ilya Dryomov
@ 2011-08-23 20:01 ` Ilya Dryomov
2011-08-23 20:02 ` [PATCH 19/21] Btrfs: add skip_restripe mount option Ilya Dryomov
` (5 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Implement an ioctl for resuming restriper. We use the same heuristics
as when recovering restripe after a crash to try to start where we left
off last time. If needed, those parameters can be made configurable
through the userspace "resume" command in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ioctl.c | 2 ++
fs/btrfs/ioctl.h | 1 +
fs/btrfs/volumes.c | 25 +++++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
4 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 61978ac..cb2f420 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2923,6 +2923,8 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
return btrfs_cancel_restripe(root->fs_info);
case BTRFS_RESTRIPE_CTL_PAUSE:
return btrfs_pause_restripe(root->fs_info, 0);
+ case BTRFS_RESTRIPE_CTL_RESUME:
+ return btrfs_resume_restripe(root->fs_info);
}
return -EINVAL;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index e468d5b..365d06c 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -111,6 +111,7 @@ struct btrfs_ioctl_fs_info_args {
#define BTRFS_RESTRIPE_CTL_CANCEL 1
#define BTRFS_RESTRIPE_CTL_PAUSE 2
+#define BTRFS_RESTRIPE_CTL_RESUME 3
struct btrfs_restripe_args {
__u64 profiles;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 65deaa7..bfe2b03 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2898,6 +2898,31 @@ out:
return ret;
}
+int btrfs_resume_restripe(struct btrfs_fs_info *fs_info)
+{
+ int ret;
+
+ if (fs_info->sb->s_flags & MS_RDONLY)
+ return -EROFS;
+
+ mutex_lock(&fs_info->restripe_mutex);
+ if (!fs_info->restripe_ctl) {
+ ret = -ENOTCONN;
+ goto out;
+ }
+
+ if (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+ ret = -EINPROGRESS;
+ goto out;
+ }
+
+ ret = btrfs_restripe(fs_info->restripe_ctl, 1);
+
+out:
+ mutex_unlock(&fs_info->restripe_mutex);
+ return ret;
+}
+
/*
* shrinking a device means finding all of the device extents past
* the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b8c234a..c0652c9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -263,6 +263,7 @@ int btrfs_restripe(struct restripe_control *rctl, int resume);
int btrfs_recover_restripe(struct btrfs_root *tree_root);
int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset);
+int btrfs_resume_restripe(struct btrfs_fs_info *fs_info);
int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
int find_free_dev_extent(struct btrfs_trans_handle *trans,
struct btrfs_device *device, u64 num_bytes,
--
1.7.5.4
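A note on the checks in btrfs_resume_restripe() above: they are ordered read-only first, then "nothing to resume", then "already running". The sketch below is a userspace model of that ordering only. The struct and the demo_* names are invented, and unlike the real function, which calls btrfs_restripe() and blocks until it returns, the model just flips a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of fs_info that resume looks at. */
struct demo_fs {
	bool read_only;	/* models MS_RDONLY */
	bool have_ctl;	/* models fs_info->restripe_ctl != NULL */
	bool running;	/* models the RESTRIPE_RUNNING bit */
};

static int demo_resume(struct demo_fs *fs)
{
	assert(fs);

	if (fs->read_only)
		return -EROFS;
	if (!fs->have_ctl)
		return -ENOTCONN;	/* no paused restripe to resume */
	if (fs->running)
		return -EINPROGRESS;	/* not paused, nothing to do */

	fs->running = true;		/* the real code runs the restripe here */
	return 0;
}
```

A second resume on the same state returns -EINPROGRESS, which is how userspace can tell "already running" apart from "nothing to resume" (-ENOTCONN).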
* [PATCH 19/21] Btrfs: add skip_restripe mount option
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (17 preceding siblings ...)
2011-08-23 20:01 ` [PATCH 18/21] Btrfs: allow for resuming restriper after it was paused Ilya Dryomov
@ 2011-08-23 20:02 ` Ilya Dryomov
2011-08-23 20:02 ` [PATCH 20/21] Btrfs: get rid of btrfs_balance() function Ilya Dryomov
` (4 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:02 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Since the restriper kthread starts automatically on mount and can
consume a lot of CPU and memory bandwidth, add a mount option to
forcefully skip it. In that case the restriper stays in the paused state
and can be resumed from userspace when it's convenient.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 1 +
fs/btrfs/super.c | 8 +++++++-
fs/btrfs/volumes.c | 15 +++++++++++++--
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8e764d9..0eaa08d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1432,6 +1432,7 @@ struct btrfs_ioctl_defrag_range_args {
#define BTRFS_MOUNT_ENOSPC_DEBUG (1 << 15)
#define BTRFS_MOUNT_AUTO_DEFRAG (1 << 16)
#define BTRFS_MOUNT_INODE_MAP_CACHE (1 << 17)
+#define BTRFS_MOUNT_SKIP_RESTRIPE (1 << 18)
#define btrfs_clear_opt(o, opt) ((o) &= ~BTRFS_MOUNT_##opt)
#define btrfs_set_opt(o, opt) ((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 15634d4..1ef8c33 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -162,7 +162,7 @@ enum {
Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
Opt_enospc_debug, Opt_subvolrootid, Opt_defrag,
- Opt_inode_cache, Opt_err,
+ Opt_inode_cache, Opt_skip_restripe, Opt_err,
};
static match_table_t tokens = {
@@ -195,6 +195,7 @@ static match_table_t tokens = {
{Opt_subvolrootid, "subvolrootid=%d"},
{Opt_defrag, "autodefrag"},
{Opt_inode_cache, "inode_cache"},
+ {Opt_skip_restripe, "skip_restripe"},
{Opt_err, NULL},
};
@@ -381,6 +382,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
printk(KERN_INFO "btrfs: enabling auto defrag");
btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
break;
+ case Opt_skip_restripe:
+ btrfs_set_opt(info->mount_opt, SKIP_RESTRIPE);
+ break;
case Opt_err:
printk(KERN_INFO "btrfs: unrecognized mount option "
"'%s'\n", p);
@@ -729,6 +733,8 @@ static int btrfs_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",autodefrag");
if (btrfs_test_opt(root, INODE_MAP_CACHE))
seq_puts(seq, ",inode_cache");
+ if (btrfs_test_opt(root, SKIP_RESTRIPE))
+ seq_puts(seq, ",skip_restripe");
return 0;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bfe2b03..d8958e2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2756,13 +2756,24 @@ static int restriper_kthread(void *data)
{
struct restripe_control *rctl = (struct restripe_control *)data;
struct btrfs_fs_info *fs_info = rctl->fs_info;
- int ret;
+ int ret = 0;
mutex_lock(&fs_info->restripe_mutex);
- printk(KERN_INFO "btrfs: continuing restripe\n");
+ if (btrfs_test_opt(fs_info->tree_root, SKIP_RESTRIPE)) {
+ mutex_lock(&fs_info->volume_mutex);
+ set_restripe_control(rctl, 0);
+ mutex_unlock(&fs_info->volume_mutex);
+
+ printk(KERN_INFO "btrfs: force skipping restripe\n");
+ goto out;
+ } else {
+ printk(KERN_INFO "btrfs: continuing restripe\n");
+ }
+
ret = btrfs_restripe(rctl, 1);
+out:
mutex_unlock(&fs_info->restripe_mutex);
return ret;
}
--
1.7.5.4
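A rough userspace model of the control flow this patch adds to restriper_kthread(), as I read it: with skip_restripe set, the restripe control is still installed in memory, but btrfs_restripe() is never called and the thread returns 0. The struct, the function name and the bit value below are made up for illustration; the real flag is presumably BTRFS_MOUNT_SKIP_RESTRIPE, whose value is not shown in this part of the patch.

```c
#include <stdbool.h>

#define MOUNT_SKIP_RESTRIPE	(1UL << 0)	/* hypothetical bit value */

/* Hypothetical model of the state the kthread touches. */
struct fs_state {
	unsigned long mount_opt;
	bool ctl_installed;	/* stands in for set_restripe_control(rctl, 0) */
	bool resumed;		/* stands in for btrfs_restripe(rctl, 1) running */
};

/* Returns 0 in both cases; the real code returns btrfs_restripe()'s result. */
static int restriper_resume(struct fs_state *fs)
{
	if (fs->mount_opt & MOUNT_SKIP_RESTRIPE) {
		/* keep the control around, but do not relocate anything */
		fs->ctl_installed = true;
		return 0;
	}

	/* normal path: the interrupted restripe is continued */
	fs->ctl_installed = true;
	fs->resumed = true;
	return 0;
}
```

In this model the only difference between the two paths is whether relocation starts; the saved parameters stay available either way.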
* [PATCH 20/21] Btrfs: get rid of btrfs_balance() function
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (18 preceding siblings ...)
2011-08-23 20:02 ` [PATCH 19/21] Btrfs: add skip_restripe mount option Ilya Dryomov
@ 2011-08-23 20:02 ` Ilya Dryomov
2011-08-23 20:02 ` [PATCH 21/21] Btrfs: add restripe progress reporting Ilya Dryomov
` (3 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:02 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Remove btrfs_balance(). The old balancing ioctl now uses restriper
infrastructure, just w/o using any filters.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ioctl.c | 38 +++++++++++++++++-
fs/btrfs/volumes.c | 115 ++++-----------------------------------------------
fs/btrfs/volumes.h | 1 -
3 files changed, 46 insertions(+), 108 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cb2f420..4f29149 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2864,6 +2864,42 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
return ret;
}
+static long btrfs_ioctl_balance(struct btrfs_root *root)
+{
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct restripe_control *rctl;
+ int ret;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (fs_info->sb->s_flags & MS_RDONLY)
+ return -EROFS;
+
+ mutex_lock(&fs_info->restripe_mutex);
+ if (fs_info->restripe_ctl) {
+ ret = -EINPROGRESS;
+ goto out;
+ }
+
+ rctl = kzalloc(sizeof(*rctl), GFP_NOFS);
+ if (!rctl) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ rctl->fs_info = fs_info;
+ /* relocate everything - no filters */
+ rctl->flags |= BTRFS_RESTRIPE_TYPE_MASK;
+
+ ret = btrfs_restripe(rctl, 0);
+
+ /* rctl freed in unset_restripe_control */
+out:
+ mutex_unlock(&fs_info->restripe_mutex);
+ return ret;
+}
+
static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
{
struct btrfs_ioctl_restripe_args *rargs;
@@ -2974,7 +3010,7 @@ long btrfs_ioctl(struct file *file, unsigned int
case BTRFS_IOC_DEV_INFO:
return btrfs_ioctl_dev_info(root, argp);
case BTRFS_IOC_BALANCE:
- return btrfs_balance(root->fs_info->dev_root);
+ return btrfs_ioctl_balance(root);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d8958e2..ead4996 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2045,112 +2045,6 @@ error:
return ret;
}
-static u64 div_factor(u64 num, int factor)
-{
- if (factor == 10)
- return num;
- num *= factor;
- do_div(num, 10);
- return num;
-}
-
-int btrfs_balance(struct btrfs_root *dev_root)
-{
- int ret;
- struct list_head *devices = &dev_root->fs_info->fs_devices->devices;
- struct btrfs_device *device;
- u64 old_size;
- u64 size_to_free;
- struct btrfs_path *path;
- struct btrfs_key key;
- struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
- struct btrfs_trans_handle *trans;
- struct btrfs_key found_key;
-
- if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
- return -EROFS;
-
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
-
- mutex_lock(&dev_root->fs_info->volume_mutex);
- dev_root = dev_root->fs_info->dev_root;
-
- /* step one make some room on all the devices */
- list_for_each_entry(device, devices, dev_list) {
- old_size = device->total_bytes;
- size_to_free = div_factor(old_size, 1);
- size_to_free = min(size_to_free, (u64)1 * 1024 * 1024);
- if (!device->writeable ||
- device->total_bytes - device->bytes_used > size_to_free)
- continue;
-
- ret = btrfs_shrink_device(device, old_size - size_to_free);
- if (ret == -ENOSPC)
- break;
- BUG_ON(ret);
-
- trans = btrfs_start_transaction(dev_root, 0);
- BUG_ON(IS_ERR(trans));
-
- ret = btrfs_grow_device(trans, device, old_size);
- BUG_ON(ret);
-
- btrfs_end_transaction(trans, dev_root);
- }
-
- /* step two, relocate all the chunks */
- path = btrfs_alloc_path();
- if (!path) {
- ret = -ENOMEM;
- goto error;
- }
- key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
- key.offset = (u64)-1;
- key.type = BTRFS_CHUNK_ITEM_KEY;
-
- while (1) {
- ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
- if (ret < 0)
- goto error;
-
- /*
- * this shouldn't happen, it means the last relocate
- * failed
- */
- if (ret == 0)
- break;
-
- ret = btrfs_previous_item(chunk_root, path, 0,
- BTRFS_CHUNK_ITEM_KEY);
- if (ret)
- break;
-
- btrfs_item_key_to_cpu(path->nodes[0], &found_key,
- path->slots[0]);
- if (found_key.objectid != key.objectid)
- break;
-
- /* chunk zero is special */
- if (found_key.offset == 0)
- break;
-
- btrfs_release_path(path);
- ret = btrfs_relocate_chunk(chunk_root,
- chunk_root->root_key.objectid,
- found_key.objectid,
- found_key.offset);
- if (ret && ret != -ENOSPC)
- goto error;
- key.offset = found_key.offset - 1;
- }
- ret = 0;
-error:
- btrfs_free_path(path);
- mutex_unlock(&dev_root->fs_info->volume_mutex);
- return ret;
-}
-
static int insert_restripe_item(struct btrfs_root *root,
struct restripe_control *rctl)
{
@@ -2500,6 +2394,15 @@ static int should_restripe_chunk(struct btrfs_root *root,
return 1;
}
+static u64 div_factor(u64 num, int factor)
+{
+ if (factor == 10)
+ return num;
+ num *= factor;
+ do_div(num, 10);
+ return num;
+}
+
static int __btrfs_restripe(struct btrfs_root *dev_root)
{
struct list_head *devices;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c0652c9..20da71f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -258,7 +258,6 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
u8 *uuid, u8 *fsid);
int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
int btrfs_init_new_device(struct btrfs_root *root, char *path);
-int btrfs_balance(struct btrfs_root *dev_root);
int btrfs_restripe(struct restripe_control *rctl, int resume);
int btrfs_recover_restripe(struct btrfs_root *tree_root);
int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
--
1.7.5.4
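A side note on the div_factor() helper, which this patch moves rather than deletes: it scales a value by tenths, so the "make some room" step in the removed btrfs_balance() asks for min(10% of the device, 1 MiB). A minimal userspace sketch, assuming plain 64-bit division in place of the kernel's do_div(); size_to_free() is a hypothetical wrapper for illustration, not a function in the patch.

```c
#include <stdint.h>

/* Scale num by factor/10; userspace stand-in for the kernel helper. */
static uint64_t div_factor(uint64_t num, int factor)
{
	if (factor == 10)
		return num;
	num *= factor;
	return num / 10;	/* the kernel code uses do_div(num, 10) */
}

/* Hypothetical wrapper: how much space step one tries to free per device. */
static uint64_t size_to_free(uint64_t total_bytes)
{
	uint64_t want = div_factor(total_bytes, 1);	/* 10% of the device */
	uint64_t cap = (uint64_t)1 * 1024 * 1024;	/* 1 MiB upper bound */

	return want < cap ? want : cap;
}
```

For any device larger than 10 MiB the cap wins, so in practice step one frees at most 1 MiB per device.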
* [PATCH 21/21] Btrfs: add restripe progress reporting
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (19 preceding siblings ...)
2011-08-23 20:02 ` [PATCH 20/21] Btrfs: get rid of btrfs_balance() function Ilya Dryomov
@ 2011-08-23 20:02 ` Ilya Dryomov
2011-09-27 12:47 ` [PATCH 00/21] [RFC] Btrfs: restriper David Sterba
` (2 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-08-23 20:02 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Hugo Mills, idryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ioctl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/ioctl.h | 2 ++
fs/btrfs/volumes.c | 40 ++++++++++++++++++++++++++++++++++------
fs/btrfs/volumes.h | 3 +++
4 files changed, 84 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4f29149..a342544 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2966,6 +2966,49 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
return -EINVAL;
}
+static long btrfs_ioctl_restripe_progress(struct btrfs_root *root,
+ void __user *arg)
+{
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct btrfs_ioctl_restripe_args *rargs;
+ struct restripe_control *rctl;
+ int ret = 0;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ mutex_lock(&fs_info->restripe_mutex);
+ if (!(rctl = fs_info->restripe_ctl)) {
+ ret = -ENOTCONN;
+ goto out;
+ }
+
+ rargs = kzalloc(sizeof(*rargs), GFP_NOFS);
+ if (!rargs) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ rargs->flags = rctl->flags;
+ rargs->state = fs_info->restripe_state;
+
+ memcpy(&rargs->data, &rctl->data, sizeof(rargs->data));
+ memcpy(&rargs->sys, &rctl->sys, sizeof(rargs->sys));
+ memcpy(&rargs->meta, &rctl->meta, sizeof(rargs->meta));
+
+ spin_lock(&fs_info->restripe_lock);
+ memcpy(&rargs->stat, &rctl->stat, sizeof(rargs->stat));
+ spin_unlock(&fs_info->restripe_lock);
+
+ if (copy_to_user(arg, rargs, sizeof(*rargs)))
+ ret = -EFAULT;
+
+ kfree(rargs);
+out:
+ mutex_unlock(&fs_info->restripe_mutex);
+ return ret;
+}
+
long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
{
@@ -3042,6 +3085,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_restripe(root, argp);
case BTRFS_IOC_RESTRIPE_CTL:
return btrfs_ioctl_restripe_ctl(root, arg);
+ case BTRFS_IOC_RESTRIPE_PROGRESS:
+ return btrfs_ioctl_restripe_progress(root, argp);
}
return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 365d06c..2154816 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -290,4 +290,6 @@ struct btrfs_ioctl_space_args {
#define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \
struct btrfs_ioctl_restripe_args)
#define BTRFS_IOC_RESTRIPE_CTL _IOW(BTRFS_IOCTL_MAGIC, 33, int)
+#define BTRFS_IOC_RESTRIPE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 34, \
+ struct btrfs_ioctl_restripe_args)
#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ead4996..9a248b9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2187,8 +2187,10 @@ static void set_restripe_control(struct restripe_control *rctl, int update)
spin_lock(&fs_info->restripe_lock);
fs_info->restripe_ctl = rctl;
- if (update)
+ if (update) {
update_restripe_args(rctl);
+ memset(&rctl->stat, 0, sizeof(rctl->stat));
+ }
spin_unlock(&fs_info->restripe_lock);
}
@@ -2419,6 +2421,7 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
int slot;
int ret;
int enospc_errors = 0;
+ bool counting_only = true;
/* step one make some room on all the devices */
devices = &dev_root->fs_info->fs_devices->devices;
@@ -2451,12 +2454,14 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
goto error;
}
+again:
key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
key.offset = (u64)-1;
key.type = BTRFS_CHUNK_ITEM_KEY;
while (1) {
struct btrfs_fs_info *fs_info = dev_root->fs_info;
+ struct restripe_control *rctl = fs_info->restripe_ctl;
if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state)) {
@@ -2493,25 +2498,48 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
- if (!should_restripe_chunk(chunk_root, leaf, chunk,
- found_key.offset)) {
- btrfs_release_path(path);
- goto loop;
+ if (!counting_only) {
+ spin_lock(&fs_info->restripe_lock);
+ rctl->stat.considered++;
+ spin_unlock(&fs_info->restripe_lock);
}
+ ret = should_restripe_chunk(chunk_root, leaf, chunk,
+ found_key.offset);
btrfs_release_path(path);
+ if (!ret)
+ goto loop;
+
+ if (counting_only) {
+ spin_lock(&fs_info->restripe_lock);
+ rctl->stat.expected++;
+ spin_unlock(&fs_info->restripe_lock);
+ goto loop;
+ }
+
ret = btrfs_relocate_chunk(chunk_root,
chunk_root->root_key.objectid,
found_key.objectid,
found_key.offset);
if (ret && ret != -ENOSPC)
goto error;
- if (ret == -ENOSPC)
+ if (ret == -ENOSPC) {
enospc_errors++;
+ } else {
+ spin_lock(&fs_info->restripe_lock);
+ rctl->stat.completed++;
+ spin_unlock(&fs_info->restripe_lock);
+ }
loop:
key.offset = found_key.offset - 1;
}
+ if (counting_only) {
+ btrfs_release_path(path);
+ counting_only = false;
+ goto again;
+ }
+
error:
btrfs_free_path(path);
if (enospc_errors) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 20da71f..5ca3b3b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -207,6 +207,7 @@ struct map_lookup {
#define RESTRIPE_PAUSE_REQ 2
struct btrfs_restripe_args;
+struct btrfs_restripe_progress;
struct restripe_control {
struct btrfs_fs_info *fs_info;
u64 flags;
@@ -214,6 +215,8 @@ struct restripe_control {
struct btrfs_restripe_args data;
struct btrfs_restripe_args sys;
struct btrfs_restripe_args meta;
+
+ struct btrfs_restripe_progress stat;
};
int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
--
1.7.5.4
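The two-pass scheme this patch adds to __btrfs_restripe() can be sketched in userspace as follows: a first pass only counts the chunks that pass the filters (stat.expected), then the loop restarts and the second pass bumps stat.considered for every chunk and stat.completed for each one relocated without -ENOSPC. This is a simplified model, not the kernel code: struct progress, struct chunk and restripe() are hypothetical names, and locking, the cancel/pause checks and the chunk tree walk are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters kept in rctl->stat. */
struct progress {
	uint64_t expected;	/* chunks that pass the filters (pass one) */
	uint64_t considered;	/* chunks looked at (pass two) */
	uint64_t completed;	/* chunks relocated successfully (pass two) */
};

/*
 * Hypothetical chunk model: 'wanted' mimics should_restripe_chunk(),
 * 'enospc' mimics btrfs_relocate_chunk() returning -ENOSPC.
 */
struct chunk {
	bool wanted;
	bool enospc;
};

static void restripe(const struct chunk *chunks, size_t n, struct progress *st)
{
	bool counting_only = true;
	size_t i;

again:
	for (i = 0; i < n; i++) {
		if (!counting_only)
			st->considered++;
		if (!chunks[i].wanted)
			continue;
		if (counting_only) {
			st->expected++;	/* pass one: count, do not relocate */
			continue;
		}
		if (!chunks[i].enospc)
			st->completed++;
	}
	if (counting_only) {
		counting_only = false;
		goto again;
	}
}
```

With counters laid out like this, userspace can presumably report completed against expected as a rough percentage, at the cost of walking the chunk tree twice.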
* Re: [PATCH 00/21] [RFC] Btrfs: restriper
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (20 preceding siblings ...)
2011-08-23 20:02 ` [PATCH 21/21] Btrfs: add restripe progress reporting Ilya Dryomov
@ 2011-09-27 12:47 ` David Sterba
2011-11-14 23:59 ` Phillip Susi
2011-11-17 3:13 ` Phillip Susi
23 siblings, 0 replies; 42+ messages in thread
From: David Sterba @ 2011-09-27 12:47 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: chris.mason, hugo, linux-btrfs
Hi,
I've hit a problem with restriper, but under rather unclear conditions:
[12308.210636] ------------[ cut here ]------------
[12308.214185] kernel BUG at fs/btrfs/relocation.c:2047!
[12308.214185] invalid opcode: 0000 [#1] SMP
[12308.214185] CPU 0
[12308.214185] Modules linked in: loop btrfs aoe
[12308.214185]
[12308.214185] Pid: 31102, comm: btrfs Not tainted 3.1.0-rc7-default+ #32 Intel Corporation Santa Rosa platform/Matanzas
[12308.214185] RIP: 0010:[<ffffffffa0084af5>] [<ffffffffa0084af5>] merge_reloc_root+0x5d5/0x600 [btrfs]
[12308.214185] RSP: 0018:ffff88003e0159f8 EFLAGS: 00010293
[12308.214185] RAX: 00000000ffffffe4 RBX: ffff880051bc1c70 RCX: 0000000000000000
[12308.214185] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880053a9ccb8
[12308.214185] RBP: ffff88003e015ae8 R08: 0000000000000000 R09: 0000000000000000
[12308.214185] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880075041000
[12308.214185] R13: ffff8800585bb198 R14: ffff880000000000 R15: ffff880026e04000
[12308.214185] FS: 00007fda377f3740(0000) GS:ffff88007e400000(0000) knlGS:0000000000000000
[12308.214185] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[12308.214185] CR2: 00007f049bb1c000 CR3: 0000000026c97000 CR4: 00000000000006f0
[12308.214185] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12308.214185] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[12308.214185] Process btrfs (pid: 31102, threadinfo ffff88003e014000, task ffff880040d549c0)
[12308.214185] Stack:
[12308.214185] 000000003e015a08 ffff880057b98070 ffff880026fc30fc 0000000000000246
[12308.214185] ffff880057b98070 ffff880057b98058 000000000000e000 ffff880026fc3000
[12308.214185] ffff880057b98058 ffff880057b98058 ffff88003e015a68 ffffffff81c2835b
[12308.214185] Call Trace:
[12308.214185] [<ffffffff81c2835b>] ? _raw_spin_unlock+0x2b/0x50
[12308.214185] [<ffffffffa00364ad>] ? btrfs_read_fs_root_no_name+0x1fd/0x310 [btrfs]
[12308.214185] [<ffffffffa0084c44>] merge_reloc_roots+0x124/0x150 [btrfs]
[12308.214185] [<ffffffffa0085258>] relocate_block_group+0x398/0x610 [btrfs]
[12308.214185] [<ffffffffa003bcf7>] ? btrfs_clean_old_snapshots+0x197/0x1c0 [btrfs]
[12308.214185] [<ffffffffa0085680>] btrfs_relocate_block_group+0x1b0/0x2e0 [btrfs]
[12308.214185] [<ffffffffa0060b7b>] btrfs_relocate_chunk+0x8b/0x6c0 [btrfs]
[12308.214185] [<ffffffff810e0e10>] ? trace_hardirqs_on_caller+0x20/0x1d0
[12308.214185] [<ffffffff81089383>] ? __wake_up+0x53/0x70
[12308.214185] [<ffffffffa006ef80>] ? btrfs_tree_read_unlock_blocking+0x40/0x60 [btrfs]
[12308.214185] [<ffffffffa0064ca9>] btrfs_restripe+0x689/0xb00 [btrfs]
[12308.214185] [<ffffffff811858e4>] ? __kmalloc+0x234/0x260
[12308.214185] [<ffffffffa006e871>] btrfs_ioctl+0x14e1/0x1560 [btrfs]
[12308.214185] [<ffffffff81c2c660>] ? do_page_fault+0x2d0/0x580
[12308.214185] [<ffffffff811a4568>] do_vfs_ioctl+0x98/0x560
[12308.214185] [<ffffffff810da369>] ? trace_hardirqs_off_caller+0x29/0xc0
[12308.214185] [<ffffffff81c28bd9>] ? retint_swapgs+0x13/0x1b
[12308.214185] [<ffffffff81192a6b>] ? fget_light+0x17b/0x3c0
[12308.214185] [<ffffffff811a4a7f>] sys_ioctl+0x4f/0x80
[12308.214185] [<ffffffff81c312c2>] system_call_fastpath+0x16/0x1b
[12308.214185] Code: ff ff 41 bd f4 ff ff ff eb b9 48 8d 95 70 ff ff ff 48 8d 75 90 4c 89 ff e8 a9 9f ff ff eb a4 48 89 df e8 cf 28 f9 ff eb 9a 0f 0b <0f> 0b 0f 0b 0f 0b be ef 07 00 00 48 c7 c7 b4 49 09 a0 e8 54 99
[12308.214185] RIP [<ffffffffa0084af5>] merge_reloc_root+0x5d5/0x600 [btrfs]
[12308.214185] RSP <ffff88003e0159f8>
[12308.652440] ---[ end trace a106d7cf9f82a8ff ]---
steps before the crash
- data: a freshly created raid10, 5 devices with about 4 gigs of data, lots of
chained snapshots, lots of them deleted (both numbers are on the order of 10)
- device remove
- restripe
- device add
- restriper start [blocked]
- restripe cancel [blocked]
- *crash*
- subsequent mount is ok
- rebalance continues, can be started/cancelled without problems
the error is ENOSPC from snapshot cleanup.
one thing that was visible only on the disk activity monitor was a steady
stream of writes, several megs, performed by the freespace thread. I've seen
this before, but I'm not able to reproduce it reliably.
the tree is from my experimental integration branch
http://repo.or.cz/w/linux-2.6/btrfs-unstable.git integration/btrfs-next-experimental
(linus+josef+mark+janosch+restriper+hotfixes from mailinglist)
apart from that, basic switching of raids works nicely.
david
* Re: [PATCH 01/21] Btrfs: get rid of *_alloc_profile fields
2011-08-23 20:01 ` [PATCH 01/21] Btrfs: get rid of *_alloc_profile fields Ilya Dryomov
@ 2011-09-27 12:51 ` David Sterba
0 siblings, 0 replies; 42+ messages in thread
From: David Sterba @ 2011-09-27 12:51 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On Tue, Aug 23, 2011 at 11:01:42PM +0300, Ilya Dryomov wrote:
> {data,metadata,system}_alloc_profile fields have been unused for a long
> time now. Get rid of them.
a good cleanup which could be sent separately.
d/
* Re: [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing
2011-08-23 20:01 ` [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing Ilya Dryomov
@ 2011-09-27 13:02 ` David Sterba
2011-09-27 17:28 ` Ilya Dryomov
0 siblings, 1 reply; 42+ messages in thread
From: David Sterba @ 2011-09-27 13:02 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On Tue, Aug 23, 2011 at 11:01:48PM +0300, Ilya Dryomov wrote:
> This allows to have a separate set of filters for each chunk type
> (data,meta,sys). The code however is generic and switch on chunk type
> is only done once.
>
> This commit also adds a type filter: it allows to balance for example
> meta and system chunks w/o touching data ones.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/volumes.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++--
> fs/btrfs/volumes.h | 12 +++++++++
> 2 files changed, 76 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e4a276..95c6310 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2175,6 +2175,30 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
> kfree(rctl);
> }
>
> +static int should_restripe_chunk(struct btrfs_root *root,
> + struct extent_buffer *leaf,
> + struct btrfs_chunk *chunk, u64 chunk_offset)
> +{
> + struct restripe_control *rctl = root->fs_info->restripe_ctl;
> + u64 chunk_type = btrfs_chunk_type(leaf, chunk);
> + struct btrfs_restripe_args *rargs = NULL;
> +
> + /* type filter */
> + if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
> + (rctl->flags & BTRFS_RESTRIPE_TYPE_MASK))) {
> + return 0;
> + }
> +
> + if (chunk_type & BTRFS_BLOCK_GROUP_DATA)
> + rargs = &rctl->data;
> + else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
> + rargs = &rctl->sys;
> + else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
> + rargs = &rctl->meta;
what's the point of setting local variable 'rargs' without using or
returning it?
> +
> + return 1;
> +}
> +
> static int __btrfs_restripe(struct btrfs_root *dev_root)
> {
> struct list_head *devices;
> @@ -2182,10 +2206,13 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
> u64 old_size;
> u64 size_to_free;
> struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
> + struct btrfs_chunk *chunk;
> struct btrfs_path *path;
> struct btrfs_key key;
> struct btrfs_key found_key;
> struct btrfs_trans_handle *trans;
> + struct extent_buffer *leaf;
> + int slot;
> int ret;
> int enospc_errors = 0;
>
> @@ -2241,8 +2268,10 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
> if (ret)
> BUG_ON(1); /* DIS - break ? */
>
> - btrfs_item_key_to_cpu(path->nodes[0], &found_key,
> - path->slots[0]);
> + leaf = path->nodes[0];
> + slot = path->slots[0];
> + btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> if (found_key.objectid != key.objectid)
> break;
>
> @@ -2250,6 +2279,14 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
> if (found_key.offset == 0)
> break;
>
> + chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
> +
> + if (!should_restripe_chunk(chunk_root, leaf, chunk,
> + found_key.offset)) {
> + btrfs_release_path(path);
> + goto loop;
> + }
> +
> btrfs_release_path(path);
> ret = btrfs_relocate_chunk(chunk_root,
> chunk_root->root_key.objectid,
> @@ -2259,6 +2296,7 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
> goto error;
> if (ret == -ENOSPC)
> enospc_errors++;
> +loop:
> key.offset = found_key.offset - 1;
> }
>
> @@ -2285,8 +2323,30 @@ int btrfs_restripe(struct restripe_control *rctl)
> mutex_lock(&fs_info->volume_mutex);
>
> /*
> - * Profile changing sanity checks
> + * In case of mixed groups both data and meta should be picked,
> + * and identical options should be given for both of them.
> */
> + allowed = btrfs_super_incompat_flags(&fs_info->super_copy);
> + if ((allowed & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
> + (rctl->flags & (BTRFS_RESTRIPE_DATA | BTRFS_RESTRIPE_METADATA))) {
> + if (!(rctl->flags & BTRFS_RESTRIPE_DATA) ||
> + !(rctl->flags & BTRFS_RESTRIPE_METADATA) ||
> + memcmp(&rctl->data, &rctl->meta, sizeof(rctl->data))) {
> + printk(KERN_ERR "btrfs: with mixed groups data and "
> + "metadata restripe options must be the same\n");
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + /*
> + * Profile changing sanity checks. Skip them if a simple
> + * balance is requested.
> + */
> + if (!((rctl->data.flags | rctl->sys.flags | rctl->meta.flags) &
> + BTRFS_RESTRIPE_ARGS_CONVERT))
> + goto do_restripe;
> +
> allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
> if (fs_info->fs_devices->num_devices == 1)
> allowed |= BTRFS_BLOCK_GROUP_DUP;
> @@ -2344,6 +2404,7 @@ int btrfs_restripe(struct restripe_control *rctl)
> }
> }
>
> +do_restripe:
> set_restripe_control(rctl);
> mutex_unlock(&fs_info->volume_mutex);
>
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 8804c5c..f40227e 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -168,6 +168,18 @@ struct map_lookup {
> #define map_lookup_size(n) (sizeof(struct map_lookup) + \
> (sizeof(struct btrfs_bio_stripe) * (n)))
>
> +/*
> + * Restriper's general "type" filter. Shares bits with chunk type for
> + * simplicity, RESTRIPE prefix is used to avoid confusion.
> + */
> +#define BTRFS_RESTRIPE_DATA (1ULL << 0)
> +#define BTRFS_RESTRIPE_SYSTEM (1ULL << 1)
> +#define BTRFS_RESTRIPE_METADATA (1ULL << 2)
> +
> +#define BTRFS_RESTRIPE_TYPE_MASK (BTRFS_RESTRIPE_DATA | \
> + BTRFS_RESTRIPE_SYSTEM | \
> + BTRFS_RESTRIPE_METADATA)
> +
> #define BTRFS_RESTRIPE_FORCE (1ULL << 3)
>
> /*
> --
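For readers following along, a userspace sketch of the type filter from the quoted patch. The RESTRIPE_* values are the ones in the patch; the BLOCK_GROUP_* values are assumed to mirror them, as the patch comment says they do, and SOME_PROFILE_BIT and type_filter_passes() are made-up names for illustration.

```c
#include <stdint.h>

/* Values from the quoted patch. */
#define RESTRIPE_DATA		(1ULL << 0)
#define RESTRIPE_SYSTEM		(1ULL << 1)
#define RESTRIPE_METADATA	(1ULL << 2)
#define RESTRIPE_TYPE_MASK	(RESTRIPE_DATA | RESTRIPE_SYSTEM | \
				 RESTRIPE_METADATA)

/* Assumed to share bits with the filter flags above. */
#define BLOCK_GROUP_DATA	(1ULL << 0)
#define BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BLOCK_GROUP_METADATA	(1ULL << 2)
#define BLOCK_GROUP_TYPE_MASK	(BLOCK_GROUP_DATA | BLOCK_GROUP_SYSTEM | \
				 BLOCK_GROUP_METADATA)

/* Stand-in for a profile bit such as RAID1; the position is an assumption. */
#define SOME_PROFILE_BIT	(1ULL << 4)

/* Returns 1 if the chunk survives the type filter, 0 if it is skipped. */
static int type_filter_passes(uint64_t chunk_type, uint64_t rctl_flags)
{
	return ((chunk_type & BLOCK_GROUP_TYPE_MASK) &
		(rctl_flags & RESTRIPE_TYPE_MASK)) != 0;
}
```

Profile bits and non-type flags are masked off before the comparison, so only the data/system/metadata bits decide the outcome; setting all three lets every chunk through.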
* Re: [PATCH 10/21] Btrfs: usage filter
2011-08-23 20:01 ` [PATCH 10/21] Btrfs: usage filter Ilya Dryomov
@ 2011-09-27 13:22 ` David Sterba
2011-11-01 10:18 ` Arne Jansen
1 sibling, 0 replies; 42+ messages in thread
From: David Sterba @ 2011-09-27 13:22 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On Tue, Aug 23, 2011 at 11:01:51PM +0300, Ilya Dryomov wrote:
> Select chunks that are less than X percent full.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/volumes.c | 33 +++++++++++++++++++++++++++++++++
> fs/btrfs/volumes.h | 1 +
> 2 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f045615..b49ecfa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2193,6 +2193,33 @@ static int chunk_profiles_filter(u64 chunk_profile,
> return 1;
> }
>
> +static u64 div_factor_fine(u64 num, int factor)
> +{
factor is obtained from userspace via btrfs_restripe_args and should
IMHO be checked for safety.
> + if (factor == 100)
something like this (if the type is really 'int')
if (factor < 0 || factor >= 100)
> + return num;
> + num *= factor;
> + do_div(num, 100);
> + return num;
> +}
> +
> +static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
> + struct btrfs_restripe_args *rargs)
> +{
> + struct btrfs_block_group_cache *cache;
> + u64 chunk_used, user_thresh;
> + int ret = 1;
> +
> + cache = btrfs_lookup_block_group(fs_info, chunk_offset);
> + chunk_used = btrfs_block_group_used(&cache->item);
> +
> + user_thresh = div_factor_fine(cache->key.offset, rargs->usage);
^^^^^^^^^^^^
does not seem right, but AFAICS is harmless, if an overflow occurs
> + if (chunk_used < user_thresh)
> + ret = 0;
will result in ret = 1, and the code below will not continue restriping
> +
> + btrfs_put_block_group(cache);
> + return ret;
> +}
> +
> static int chunk_soft_convert_filter(u64 chunk_profile,
> struct btrfs_restripe_args *rargs)
> {
> @@ -2236,6 +2263,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
> return 0;
> }
>
> + /* usage filter */
> + if ((rargs->flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
> + chunk_usage_filter(rctl->fs_info, chunk_offset, rargs)) {
^^^
will skip restriping the chunk (if the above holds).
> + return 0;
> + }
> +
> /* soft profile changing mode */
> if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
> chunk_soft_convert_filter(chunk_type, rargs)) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 9f96ad8..c6baf4b 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -186,6 +186,7 @@ struct map_lookup {
> * Restripe filters
> */
> #define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
> +#define BTRFS_RESTRIPE_ARGS_USAGE (1ULL << 1)
>
> /*
> * Profile changing flags. When SOFT is set we won't relocate chunk if
> --
> 1.7.5.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
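Putting the review comments above together, a userspace sketch of the usage filter with the suggested range check on factor might look like this. It assumes the used bytes and the chunk length are passed in directly (the patch looks them up through the block group cache), so the chunk_usage_filter() signature here is hypothetical; the return convention (1 = skip, 0 = restripe) follows the patch.

```c
#include <stdint.h>

/*
 * Percentage scaling with the range check suggested in the review:
 * anything outside [0, 100) leaves num unchanged.
 */
static uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor < 0 || factor >= 100)
		return num;
	num *= factor;
	return num / 100;	/* the kernel code uses do_div(num, 100) */
}

/* Returns 0 if the chunk is below the usage threshold, 1 to skip it. */
static int chunk_usage_filter(uint64_t chunk_used, uint64_t chunk_size,
			      int usage)
{
	uint64_t user_thresh = div_factor_fine(chunk_size, usage);

	return chunk_used < user_thresh ? 0 : 1;
}
```

With the check in place, a negative or oversized usage value makes the threshold equal to the full chunk size instead of being multiplied through as-is.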
* Re: [PATCH 14/21] Btrfs: save restripe parameters to disk
2011-08-23 20:01 ` [PATCH 14/21] Btrfs: save restripe parameters to disk Ilya Dryomov
@ 2011-09-27 13:43 ` David Sterba
2011-11-01 10:29 ` Arne Jansen
1 sibling, 0 replies; 42+ messages in thread
From: David Sterba @ 2011-09-27 13:43 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On Tue, Aug 23, 2011 at 11:01:55PM +0300, Ilya Dryomov wrote:
> Introduce a new btree objectid for storing restripe item. The reason is
> to be able to resume restriper after a crash with the same parameters.
> Restripe item has a very high objectid and goes into tree of tree roots.
>
> The key for the new item is as follows:
>
> [ BTRFS_RESTRIPE_OBJECTID ; 0 ; 0 ]
>
> Older kernels simply ignore it so it's safe to mount with an older
> kernel and then go back to the newer one.
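For readers unfamiliar with the convention: -4ULL is the unsigned value 0xfffffffffffffffc, so with keys ordered by objectid first the restripe item sorts after the orphan objectid (-5ULL) and far behind the regular tree objectids. A minimal userspace sketch, assuming the usual (objectid, type, offset) key comparison; struct key and key_cmp() are illustrative names, not the kernel's.

```c
#include <stdint.h>

#define RESTRIPE_OBJECTID	((uint64_t)-4)	/* -4ULL in the patch */
#define ORPHAN_OBJECTID		((uint64_t)-5)
#define CSUM_TREE_OBJECTID	7ULL

/* Illustrative CPU-order key, compared field by field. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Returns <0, 0 or >0, ordering by objectid, then type, then offset. */
static int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Because objectid is compared first, the type and offset of other items never move them past the [ BTRFS_RESTRIPE_OBJECTID ; 0 ; 0 ] key unless their objectid is higher still.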
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/ctree.h | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> fs/btrfs/volumes.c | 105 ++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 228 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 65d7562..b524034 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -85,6 +85,9 @@ struct btrfs_ordered_sum;
> /* holds checksums of all the data extents */
> #define BTRFS_CSUM_TREE_OBJECTID 7ULL
>
> +/* for storing restripe params in the root tree */
> +#define BTRFS_RESTRIPE_OBJECTID -4ULL
> +
> /* orhpan objectid for tracking unlinked/truncated files */
> #define BTRFS_ORPHAN_OBJECTID -5ULL
>
> @@ -649,6 +652,47 @@ struct btrfs_root_ref {
> __le16 name_len;
> } __attribute__ ((__packed__));
>
> +/*
> + * Restriper stuff
> + */
> +struct btrfs_disk_restripe_args {
> + /* profiles to touch, in-memory format */
> + __le64 profiles;
> +
> + /* usage filter */
> + __le64 usage;
> +
> + /* devid filter */
> + __le64 devid;
> +
> + /* devid subset filter [pstart..pend) */
> + __le64 pstart;
> + __le64 pend;
> +
> + /* btrfs virtual address space subset filter [vstart..vend) */
> + __le64 vstart;
> + __le64 vend;
> +
> + /* profile to convert to, in-memory format */
> + __le64 target;
> +
> + /* BTRFS_RESTRIPE_ARGS_* */
> + __le64 flags;
> +
> + __le64 unused[8];
> +} __attribute__ ((__packed__));
> +
> +struct btrfs_restripe_item {
> + /* BTRFS_RESTRIPE_* */
> + __le64 flags;
> +
> + struct btrfs_disk_restripe_args data;
> + struct btrfs_disk_restripe_args sys;
> + struct btrfs_disk_restripe_args meta;
> +
> + __le64 unused[4];
> +} __attribute__ ((__packed__));
> +
> #define BTRFS_FILE_EXTENT_INLINE 0
> #define BTRFS_FILE_EXTENT_REG 1
> #define BTRFS_FILE_EXTENT_PREALLOC 2
> @@ -727,7 +771,8 @@ struct btrfs_csum_item {
> BTRFS_BLOCK_GROUP_RAID10)
> /*
> * We need a bit for restriper to be able to tell when chunks of type
> - * SINGLE are available. It is used in avail_*_alloc_bits.
> + * SINGLE are available. It is used in avail_*_alloc_bits and restripe
> + * item fields.
> */
> #define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7)
>
> @@ -2000,8 +2045,86 @@ static inline bool btrfs_root_readonly(struct btrfs_root *root)
> return root->root_item.flags & BTRFS_ROOT_SUBVOL_RDONLY;
> }
>
> -/* struct btrfs_super_block */
> +/* struct btrfs_restripe_item */
> +BTRFS_SETGET_FUNCS(restripe_flags, struct btrfs_restripe_item, flags, 64);
> +
> +static inline void btrfs_restripe_data(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + read_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
> +}
>
> +static inline void btrfs_set_restripe_data(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + write_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
> +}
> +
> +static inline void btrfs_restripe_meta(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + read_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
> +}
> +
> +static inline void btrfs_set_restripe_meta(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + write_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
> +}
> +
> +static inline void btrfs_restripe_sys(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + read_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
> +}
> +
> +static inline void btrfs_set_restripe_sys(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + write_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
> +}
> +
> +static inline void
> +btrfs_disk_restripe_args_to_cpu(struct btrfs_restripe_args *cpu,
> + struct btrfs_disk_restripe_args *disk)
> +{
> + memset(cpu, 0, sizeof(*cpu));
> +
> + cpu->profiles = le64_to_cpu(disk->profiles);
> + cpu->usage = le64_to_cpu(disk->usage);
> + cpu->devid = le64_to_cpu(disk->devid);
> + cpu->pstart = le64_to_cpu(disk->pstart);
> + cpu->pend = le64_to_cpu(disk->pend);
> + cpu->vstart = le64_to_cpu(disk->vstart);
> + cpu->vend = le64_to_cpu(disk->vend);
> + cpu->target = le64_to_cpu(disk->target);
> + cpu->flags = le64_to_cpu(disk->flags);
> +}
> +
> +static inline void
> +btrfs_cpu_restripe_args_to_disk(struct btrfs_disk_restripe_args *disk,
> + struct btrfs_restripe_args *cpu)
> +{
> + memset(disk, 0, sizeof(*disk));
> +
> + disk->profiles = cpu_to_le64(cpu->profiles);
> + disk->usage = cpu_to_le64(cpu->usage);
> + disk->devid = cpu_to_le64(cpu->devid);
> + disk->pstart = cpu_to_le64(cpu->pstart);
> + disk->pend = cpu_to_le64(cpu->pend);
> + disk->vstart = cpu_to_le64(cpu->vstart);
> + disk->vend = cpu_to_le64(cpu->vend);
> + disk->target = cpu_to_le64(cpu->target);
> + disk->flags = cpu_to_le64(cpu->flags);
> +}
> +
> +/* struct btrfs_super_block */
> BTRFS_SETGET_STACK_FUNCS(super_bytenr, struct btrfs_super_block, bytenr, 64);
> BTRFS_SETGET_STACK_FUNCS(super_flags, struct btrfs_super_block, flags, 64);
> BTRFS_SETGET_STACK_FUNCS(super_generation, struct btrfs_super_block,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index eccd458..1057ad3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2150,6 +2150,97 @@ error:
> return ret;
> }
>
> +static int insert_restripe_item(struct btrfs_root *root,
> + struct restripe_control *rctl)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_restripe_item *item;
> + struct btrfs_disk_restripe_args disk_rargs;
> + struct btrfs_path *path;
> + struct extent_buffer *leaf;
> + struct btrfs_key key;
> + int ret, err;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + trans = btrfs_start_transaction(root, 0);
> + if (IS_ERR(trans)) {
> + btrfs_free_path(path);
> + return PTR_ERR(trans);
> + }
> +
> + key.objectid = BTRFS_RESTRIPE_OBJECTID;
> + key.type = 0;
> + key.offset = 0;
> +
> + ret = btrfs_insert_empty_item(trans, root, path, &key,
> + sizeof(*item));
> + if (ret)
> + goto out;
> +
> + leaf = path->nodes[0];
> + item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item);
> +
> + memset_extent_buffer(leaf, 0, (unsigned long)item, sizeof(*item));
> +
> + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->data);
> + btrfs_set_restripe_data(leaf, item, &disk_rargs);
> + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->meta);
> + btrfs_set_restripe_meta(leaf, item, &disk_rargs);
> + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->sys);
> + btrfs_set_restripe_sys(leaf, item, &disk_rargs);
> +
> + btrfs_set_restripe_flags(leaf, item, rctl->flags);
> +
> + btrfs_mark_buffer_dirty(leaf);
> +out:
> + btrfs_free_path(path);
> + err = btrfs_commit_transaction(trans, root);
> + if (err && !ret)
> + ret = err;
> + return ret;
> +}
> +
> +static int del_restripe_item(struct btrfs_root *root)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> + int ret, err;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + trans = btrfs_start_transaction(root, 0);
> + if (IS_ERR(trans)) {
> + btrfs_free_path(path);
> + return PTR_ERR(trans);
> + }
> +
> + key.objectid = BTRFS_RESTRIPE_OBJECTID;
> + key.type = 0;
> + key.offset = 0;
> +
> + ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> + if (ret < 0)
> + goto out;
> + if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + ret = btrfs_del_item(trans, root, path);
> +out:
> + btrfs_free_path(path);
> + err = btrfs_commit_transaction(trans, root);
> + if (err && !ret)
> + ret = err;
> + return ret;
> +}
> +
> /*
> * Should be called with both restripe and volume mutexes held to
> * serialize other volume operations (add_dev/rm_dev/resize) wrt
> @@ -2485,6 +2576,7 @@ int btrfs_restripe(struct restripe_control *rctl)
> {
> struct btrfs_fs_info *fs_info = rctl->fs_info;
> u64 allowed;
> + int err;
> int ret;
>
> mutex_lock(&fs_info->volume_mutex);
> @@ -2572,16 +2664,25 @@ int btrfs_restripe(struct restripe_control *rctl)
> }
>
> do_restripe:
> + ret = insert_restripe_item(fs_info->tree_root, rctl);
> + if (ret && ret != -EEXIST)
> + goto out;
> + BUG_ON(ret == -EEXIST);
> +
> set_restripe_control(rctl);
> mutex_unlock(&fs_info->volume_mutex);
>
> - ret = __btrfs_restripe(fs_info->dev_root);
> + err = __btrfs_restripe(fs_info->dev_root);
>
> mutex_lock(&fs_info->volume_mutex);
> +
> unset_restripe_control(fs_info);
> + ret = del_restripe_item(fs_info->tree_root);
> + BUG_ON(ret);
Is it necessary to BUG_ON here? This can fire e.g. during mount. If
the old restriper state is left in place, the return value from
insert_restripe_item above needs to be checked as well. My idea is some
kind of checkpointing of the restriper state, e.g. recording the
transaction number when the restriper successfully finishes (at which
point all restriper state can be cleaned up).
> +
> mutex_unlock(&fs_info->volume_mutex);
>
> - return ret;
> + return err;
>
> out:
> mutex_unlock(&fs_info->volume_mutex);
> --
> 1.7.5.4
>
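One way to read the review comment above is that each step should report
its error to the caller rather than trip BUG_ON(). The following is only a
userspace sketch of that idea: struct restripe_steps and restripe_result()
are hypothetical names and are not part of the patch.

```c
#include <errno.h>

/*
 * Hypothetical record of the three steps btrfs_restripe() performs in the
 * quoted patch. Each field holds 0 or a negative errno value.
 */
struct restripe_steps {
	int insert_ret;	/* storing the restripe item */
	int run_ret;	/* relocating the chunks */
	int del_ret;	/* deleting the restripe item */
};

/*
 * Return the first failure instead of calling BUG_ON().  A stale item
 * left behind by an earlier run shows up as insert_ret == -EEXIST and is
 * passed back like any other error, so the caller can decide whether to
 * resume or clean up.
 */
static int restripe_result(const struct restripe_steps *s)
{
	if (s->insert_ret)
		return s->insert_ret;
	if (s->run_ret)
		return s->run_ret;
	return s->del_ret;
}
```

With this shape a failed delete no longer takes the machine down; it leaves
the item in place, which is exactly the case where the next insert has to
cope with -EEXIST.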
* Re: [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing
2011-09-27 13:02 ` David Sterba
@ 2011-09-27 17:28 ` Ilya Dryomov
0 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-09-27 17:28 UTC (permalink / raw)
To: linux-btrfs, Chris Mason, Hugo Mills
On Tue, Sep 27, 2011 at 03:02:41PM +0200, David Sterba wrote:
> On Tue, Aug 23, 2011 at 11:01:48PM +0300, Ilya Dryomov wrote:
> > This allows to have a separate set of filters for each chunk type
> > (data,meta,sys). The code however is generic and switch on chunk type
> > is only done once.
> >
> > This commit also adds a type filter: it allows to balance for example
> > meta and system chunks w/o touching data ones.
> >
> > Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> > ---
> > fs/btrfs/volumes.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > fs/btrfs/volumes.h | 12 +++++++++
> > 2 files changed, 76 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 0e4a276..95c6310 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2175,6 +2175,30 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
> > kfree(rctl);
> > }
> >
> > +static int should_restripe_chunk(struct btrfs_root *root,
> > + struct extent_buffer *leaf,
> > + struct btrfs_chunk *chunk, u64 chunk_offset)
> > +{
> > + struct restripe_control *rctl = root->fs_info->restripe_ctl;
> > + u64 chunk_type = btrfs_chunk_type(leaf, chunk);
> > + struct btrfs_restripe_args *rargs = NULL;
> > +
> > + /* type filter */
> > + if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
> > + (rctl->flags & BTRFS_RESTRIPE_TYPE_MASK))) {
> > + return 0;
> > + }
> > +
> > + if (chunk_type & BTRFS_BLOCK_GROUP_DATA)
> > + rargs = &rctl->data;
> > + else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
> > + rargs = &rctl->sys;
> > + else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
> > + rargs = &rctl->meta;
>
> what's the point of setting local variable 'rargs' without using or
> returning it?
rargs is used later in the series: it is passed to every filter
function. That makes this patch a bit harder to review on its own, but
this way I can break the series into logical chunks and describe each of
them.
Thanks,
Ilya
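For readers following the thread, the type filter and the per-type argument
selection discussed above can be sketched in standalone C. This is an
illustration only: the names are shortened, struct filter_args is a
placeholder, and the sketch assumes the type bits in the control flags sit
at the same positions as the block group type bits (the quoted code ANDs
the two directly, which suggests as much).

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative chunk type bits; the real ones live in fs/btrfs/ctree.h. */
#define BLOCK_GROUP_DATA	(1ULL << 0)
#define BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BLOCK_GROUP_METADATA	(1ULL << 2)
#define BLOCK_GROUP_TYPE_MASK	(BLOCK_GROUP_DATA | BLOCK_GROUP_SYSTEM | \
				 BLOCK_GROUP_METADATA)

/* Placeholder for the per-type filter arguments. */
struct filter_args {
	uint64_t flags;
};

struct restripe_ctl {
	uint64_t flags;		/* assumed: low bits select chunk types */
	struct filter_args data;
	struct filter_args sys;
	struct filter_args meta;
};

/*
 * Return the filter arguments that apply to a chunk, or NULL when the
 * type filter excludes it.  The switch on chunk type happens once here;
 * every later filter would just take the returned pointer.
 */
static const struct filter_args *
pick_filter_args(const struct restripe_ctl *ctl, uint64_t chunk_type)
{
	if (!(chunk_type & BLOCK_GROUP_TYPE_MASK & ctl->flags))
		return NULL;

	if (chunk_type & BLOCK_GROUP_DATA)
		return &ctl->data;
	if (chunk_type & BLOCK_GROUP_SYSTEM)
		return &ctl->sys;
	if (chunk_type & BLOCK_GROUP_METADATA)
		return &ctl->meta;
	return NULL;
}
```

Setting only the system and metadata bits in ctl->flags gives the "balance
meta and system chunks w/o touching data ones" case from the commit message.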
* Re: [PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
2011-08-23 20:01 ` [PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit Ilya Dryomov
@ 2011-11-01 7:56 ` Arne Jansen
0 siblings, 0 replies; 42+ messages in thread
From: Arne Jansen @ 2011-11-01 7:56 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
> avail_{data,metadata,system}_alloc_bits fields, which are there to tell
> us about available allocation profiles in the fs. When a chunk is
> created, its profile is OR'ed with the respective avail_alloc_bits field.
> Since SINGLE is denoted by 0 in the on-disk format, currently there is
> no way to tell when such chunks become available. Restriper needs that
> information, so add a separate bit for SINGLE profile.
>
> This bit is going to be in-memory only, it should never be written out
> to disk, so it's not a disk format change. However to avoid remappings
> in future, reserve corresponding on-disk bit.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/ctree.h | 12 ++++++++++++
> fs/btrfs/extent-tree.c | 22 ++++++++++++++--------
> 2 files changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b882c95..5b00eb8 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -725,6 +725,17 @@ struct btrfs_csum_item {
> BTRFS_BLOCK_GROUP_RAID1 | \
> BTRFS_BLOCK_GROUP_DUP | \
> BTRFS_BLOCK_GROUP_RAID10)
> +/*
> + * We need a bit for restriper to be able to tell when chunks of type
> + * SINGLE are available. It is used in avail_*_alloc_bits.
> + */
> +#define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7)
> +
> +/*
> + * To avoid troubles or remappings, reserve on-disk bit.
> + */
> +#define BTRFS_BLOCK_GROUP_RESERVED (1 << 7)
Can you move this define up to where the other BTRFS_BLOCK_GROUP_* bits
are defined? Otherwise it is easy to overlook.
> +
> struct btrfs_block_group_item {
> __le64 used;
> __le64 chunk_objectid;
> @@ -1100,6 +1111,7 @@ struct btrfs_fs_info {
> spinlock_t ref_cache_lock;
> u64 total_ref_cache_size;
>
> + /* SINGLE has it's own bit for these three */
While this comment is easy to understand in the context of this patch,
it is not enough when just reading the resulting code without the commit
message. It would be good if you could duplicate more of the commit message
in the code comments.
> u64 avail_data_alloc_bits;
> u64 avail_metadata_alloc_bits;
> u64 avail_system_alloc_bits;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index de4c639..ed35eb5 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2945,14 +2945,17 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
> static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
> {
> u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
> - if (extra_flags) {
> - if (flags & BTRFS_BLOCK_GROUP_DATA)
> - fs_info->avail_data_alloc_bits |= extra_flags;
> - if (flags & BTRFS_BLOCK_GROUP_METADATA)
> - fs_info->avail_metadata_alloc_bits |= extra_flags;
> - if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
> - fs_info->avail_system_alloc_bits |= extra_flags;
> - }
> +
> + /* on-disk -> in-memory */
> + if (extra_flags == 0)
> + extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
> +
> + if (flags & BTRFS_BLOCK_GROUP_DATA)
> + fs_info->avail_data_alloc_bits |= extra_flags;
> + if (flags & BTRFS_BLOCK_GROUP_METADATA)
> + fs_info->avail_metadata_alloc_bits |= extra_flags;
> + if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
> + fs_info->avail_system_alloc_bits |= extra_flags;
> }
>
> u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
> @@ -2986,6 +2989,9 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
> (flags & BTRFS_BLOCK_GROUP_RAID10) |
> (flags & BTRFS_BLOCK_GROUP_DUP)))
> flags &= ~BTRFS_BLOCK_GROUP_RAID0;
> +
> + /* in-memory -> on-disk */
> + flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
> return flags;
> }
>
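The mapping the patch introduces can be tried out in userspace. The sketch
below is a minimal illustration, not the kernel code: the bit values are
written out by hand (the SINGLE bit follows the quoted define, the rest are
assumed to mirror ctree.h), and struct avail_bits and strip_in_memory_bits()
are hypothetical stand-ins for the fs_info fields and the masking done in
btrfs_reduce_alloc_profile().

```c
#include <stdint.h>

/* Assumed to mirror the on-disk block group bits. */
#define BG_DATA		(1ULL << 0)
#define BG_SYSTEM	(1ULL << 1)
#define BG_METADATA	(1ULL << 2)
#define BG_RAID0	(1ULL << 3)
#define BG_RAID1	(1ULL << 4)
#define BG_DUP		(1ULL << 5)
#define BG_RAID10	(1ULL << 6)
#define BG_PROFILE_MASK	(BG_RAID0 | BG_RAID1 | BG_DUP | BG_RAID10)

/* In-memory only: SINGLE is 0 on disk, so it gets its own bit here. */
#define AVAIL_ALLOC_BIT_SINGLE	(1ULL << 7)

/* Stand-in for the three avail_*_alloc_bits fields. */
struct avail_bits {
	uint64_t data;
	uint64_t metadata;
	uint64_t system;
};

/* on-disk -> in-memory: a chunk with no profile bits counts as SINGLE */
static void set_avail_alloc_bits(struct avail_bits *a, uint64_t flags)
{
	uint64_t extra = flags & BG_PROFILE_MASK;

	if (extra == 0)
		extra = AVAIL_ALLOC_BIT_SINGLE;

	if (flags & BG_DATA)
		a->data |= extra;
	if (flags & BG_METADATA)
		a->metadata |= extra;
	if (flags & BG_SYSTEM)
		a->system |= extra;
}

/* in-memory -> on-disk: the SINGLE bit must never reach the disk */
static uint64_t strip_in_memory_bits(uint64_t flags)
{
	return flags & ~AVAIL_ALLOC_BIT_SINGLE;
}
```

Before the patch a SINGLE data chunk left avail_data_alloc_bits untouched;
in this sketch it sets bit 7, so code that inspects the field can tell that
SINGLE chunks exist.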
* Re: [PATCH 05/21] Btrfs: add basic restriper infrastructure
2011-08-23 20:01 ` [PATCH 05/21] Btrfs: add basic restriper infrastructure Ilya Dryomov
@ 2011-11-01 10:08 ` Arne Jansen
2011-11-01 11:07 ` David Sterba
0 siblings, 1 reply; 42+ messages in thread
From: Arne Jansen @ 2011-11-01 10:08 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Add basic restriper infrastructure: ioctl to start restripe, all
> restripe ioctl data structures, add data structure for tracking
> restriper's state to fs_info. Duplicate balancing code for restriper,
> btrfs_balance() will be removed when restriper is implemented.
>
> Explicitly disallow any volume operations when restriper is running.
> (previously this restriction relied on volume_mutex being held during
> the execution of any volume operation)
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/ctree.h | 5 +
> fs/btrfs/disk-io.c | 4 +
> fs/btrfs/ioctl.c | 107 ++++++++++++++++++++++----
> fs/btrfs/ioctl.h | 37 +++++++++
> fs/btrfs/volumes.c | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> fs/btrfs/volumes.h | 18 ++++
> 6 files changed, 369 insertions(+), 21 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 5b00eb8..65d7562 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -895,6 +895,7 @@ struct btrfs_block_group_cache {
> };
>
> struct reloc_control;
> +struct restripe_control;
> struct btrfs_device;
> struct btrfs_fs_devices;
> struct btrfs_delayed_root;
> @@ -1116,6 +1117,10 @@ struct btrfs_fs_info {
> u64 avail_metadata_alloc_bits;
> u64 avail_system_alloc_bits;
>
> + spinlock_t restripe_lock;
> + struct mutex restripe_mutex;
> + struct restripe_control *restripe_ctl;
> +
Can you please add some comments on the usage of the locks and
how to protect the restripe_ctl pointer and the access to its
data structures?
> unsigned data_chunk_allocations;
> unsigned metadata_ratio;
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 46d0412..fa2301b 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1700,6 +1700,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
> init_rwsem(&fs_info->scrub_super_lock);
> fs_info->scrub_workers_refcnt = 0;
>
> + spin_lock_init(&fs_info->restripe_lock);
> + mutex_init(&fs_info->restripe_mutex);
> + fs_info->restripe_ctl = NULL;
> +
> sb->s_blocksize = 4096;
> sb->s_blocksize_bits = blksize_bits(4096);
> sb->s_bdi = &fs_info->bdi;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 970977a..9dfc686 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1165,13 +1165,21 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + mutex_lock(&root->fs_info->volume_mutex);
> + if (root->fs_info->restripe_ctl) {
> + printk(KERN_INFO "btrfs: restripe in progress\n");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> vol_args = memdup_user(arg, sizeof(*vol_args));
> - if (IS_ERR(vol_args))
> - return PTR_ERR(vol_args);
> + if (IS_ERR(vol_args)) {
> + ret = PTR_ERR(vol_args);
> + goto out;
> + }
>
> vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
>
> - mutex_lock(&root->fs_info->volume_mutex);
> sizestr = vol_args->name;
> devstr = strchr(sizestr, ':');
> if (devstr) {
> @@ -1188,7 +1196,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
> printk(KERN_INFO "resizer unable to find device %llu\n",
> (unsigned long long)devid);
> ret = -EINVAL;
> - goto out_unlock;
> + goto out_free;
> }
> if (!strcmp(sizestr, "max"))
> new_size = device->bdev->bd_inode->i_size;
> @@ -1203,7 +1211,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
> new_size = memparse(sizestr, NULL);
> if (new_size == 0) {
> ret = -EINVAL;
> - goto out_unlock;
> + goto out_free;
> }
> }
>
> @@ -1212,7 +1220,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
> if (mod < 0) {
> if (new_size > old_size) {
> ret = -EINVAL;
> - goto out_unlock;
> + goto out_free;
> }
> new_size = old_size - new_size;
> } else if (mod > 0) {
> @@ -1221,11 +1229,11 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
>
> if (new_size < 256 * 1024 * 1024) {
> ret = -EINVAL;
> - goto out_unlock;
> + goto out_free;
> }
> if (new_size > device->bdev->bd_inode->i_size) {
> ret = -EFBIG;
> - goto out_unlock;
> + goto out_free;
> }
>
> do_div(new_size, root->sectorsize);
> @@ -1238,7 +1246,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
> trans = btrfs_start_transaction(root, 0);
> if (IS_ERR(trans)) {
> ret = PTR_ERR(trans);
> - goto out_unlock;
> + goto out_free;
> }
> ret = btrfs_grow_device(trans, device, new_size);
> btrfs_commit_transaction(trans, root);
> @@ -1246,9 +1254,10 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
> ret = btrfs_shrink_device(device, new_size);
> }
>
> -out_unlock:
> - mutex_unlock(&root->fs_info->volume_mutex);
> +out_free:
> kfree(vol_args);
> +out:
> + mutex_unlock(&root->fs_info->volume_mutex);
> return ret;
> }
>
> @@ -2014,14 +2023,25 @@ static long btrfs_ioctl_add_dev(struct btrfs_root *root, void __user *arg)
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + mutex_lock(&root->fs_info->volume_mutex);
> + if (root->fs_info->restripe_ctl) {
> + printk(KERN_INFO "btrfs: restripe in progress\n");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> vol_args = memdup_user(arg, sizeof(*vol_args));
> - if (IS_ERR(vol_args))
> - return PTR_ERR(vol_args);
> + if (IS_ERR(vol_args)) {
> + ret = PTR_ERR(vol_args);
> + goto out;
> + }
>
> vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
> ret = btrfs_init_new_device(root, vol_args->name);
>
> kfree(vol_args);
> +out:
> + mutex_unlock(&root->fs_info->volume_mutex);
> return ret;
> }
>
> @@ -2036,14 +2056,25 @@ static long btrfs_ioctl_rm_dev(struct btrfs_root *root, void __user *arg)
> if (root->fs_info->sb->s_flags & MS_RDONLY)
> return -EROFS;
>
> + mutex_lock(&root->fs_info->volume_mutex);
> + if (root->fs_info->restripe_ctl) {
> + printk(KERN_INFO "btrfs: restripe in progress\n");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> vol_args = memdup_user(arg, sizeof(*vol_args));
> - if (IS_ERR(vol_args))
> - return PTR_ERR(vol_args);
> + if (IS_ERR(vol_args)) {
> + ret = PTR_ERR(vol_args);
> + goto out;
> + }
>
> vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
> ret = btrfs_rm_device(root, vol_args->name);
>
> kfree(vol_args);
> +out:
> + mutex_unlock(&root->fs_info->volume_mutex);
> return ret;
> }
>
> @@ -2833,6 +2864,50 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
> return ret;
> }
>
> +static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
> +{
> + struct btrfs_ioctl_restripe_args *rargs;
> + struct btrfs_fs_info *fs_info = root->fs_info;
> + struct restripe_control *rctl;
> + int ret;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + if (fs_info->sb->s_flags & MS_RDONLY)
> + return -EROFS;
> +
> + mutex_lock(&fs_info->restripe_mutex);
> +
> + rargs = memdup_user(arg, sizeof(*rargs));
> + if (IS_ERR(rargs)) {
> + ret = PTR_ERR(rargs);
> + goto out;
> + }
> +
> + rctl = kzalloc(sizeof(*rctl), GFP_NOFS);
> + if (!rctl) {
> + kfree(rargs);
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + rctl->fs_info = fs_info;
> + rctl->flags = rargs->flags;
> +
> + memcpy(&rctl->data, &rargs->data, sizeof(rctl->data));
> + memcpy(&rctl->meta, &rargs->meta, sizeof(rctl->meta));
> + memcpy(&rctl->sys, &rargs->sys, sizeof(rctl->sys));
> +
> + ret = btrfs_restripe(rctl);
> +
> + /* rctl freed in unset_restripe_control */
> + kfree(rargs);
> +out:
> + mutex_unlock(&fs_info->restripe_mutex);
> + return ret;
> +}
> +
> long btrfs_ioctl(struct file *file, unsigned int
> cmd, unsigned long arg)
> {
> @@ -2905,6 +2980,8 @@ long btrfs_ioctl(struct file *file, unsigned int
> return btrfs_ioctl_scrub_cancel(root, argp);
> case BTRFS_IOC_SCRUB_PROGRESS:
> return btrfs_ioctl_scrub_progress(root, argp);
> + case BTRFS_IOC_RESTRIPE:
> + return btrfs_ioctl_restripe(root, argp);
> }
>
> return -ENOTTY;
> diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
> index ad1ea78..798f1d4 100644
> --- a/fs/btrfs/ioctl.h
> +++ b/fs/btrfs/ioctl.h
> @@ -109,6 +109,41 @@ struct btrfs_ioctl_fs_info_args {
> __u64 reserved[124]; /* pad to 1k */
> };
>
> +struct btrfs_restripe_args {
> + __u64 profiles;
> + __u64 usage;
> + __u64 devid;
> + __u64 pstart;
> + __u64 pend;
> + __u64 vstart;
> + __u64 vend;
> +
> + __u64 target;
> +
> + __u64 flags;
> +
> + __u64 unused[8];
> +} __attribute__ ((__packed__));
> +
> +struct btrfs_restripe_progress {
> + __u64 expected;
> + __u64 considered;
> + __u64 completed;
> +};
> +
> +struct btrfs_ioctl_restripe_args {
> + __u64 flags;
> + __u64 state;
> +
> + struct btrfs_restripe_args data;
> + struct btrfs_restripe_args sys;
> + struct btrfs_restripe_args meta;
> +
> + struct btrfs_restripe_progress stat;
> +
> + __u64 unused[72]; /* pad to 1k */
> +};
> +
> #define BTRFS_INO_LOOKUP_PATH_MAX 4080
> struct btrfs_ioctl_ino_lookup_args {
> __u64 treeid;
> @@ -248,4 +283,6 @@ struct btrfs_ioctl_space_args {
> struct btrfs_ioctl_dev_info_args)
> #define BTRFS_IOC_FS_INFO _IOR(BTRFS_IOCTL_MAGIC, 31, \
> struct btrfs_ioctl_fs_info_args)
> +#define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \
> + struct btrfs_ioctl_restripe_args)
> #endif
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index af4bf56..0e4a276 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1262,7 +1262,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
> bool clear_super = false;
>
> mutex_lock(&uuid_mutex);
> - mutex_lock(&root->fs_info->volume_mutex);
>
> all_avail = root->fs_info->avail_data_alloc_bits |
> root->fs_info->avail_system_alloc_bits |
> @@ -1427,7 +1426,6 @@ error_close:
> if (bdev)
> blkdev_put(bdev, FMODE_READ | FMODE_EXCL);
> out:
> - mutex_unlock(&root->fs_info->volume_mutex);
> mutex_unlock(&uuid_mutex);
> return ret;
> error_undo:
> @@ -1604,7 +1602,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
> }
>
> filemap_write_and_wait(bdev->bd_inode->i_mapping);
> - mutex_lock(&root->fs_info->volume_mutex);
>
> devices = &root->fs_info->fs_devices->devices;
> /*
> @@ -1728,8 +1725,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
> ret = btrfs_relocate_sys_chunks(root);
> BUG_ON(ret);
> }
> -out:
> - mutex_unlock(&root->fs_info->volume_mutex);
> +
> return ret;
> error:
> blkdev_put(bdev, FMODE_EXCL);
> @@ -1737,7 +1733,7 @@ error:
> mutex_unlock(&uuid_mutex);
> up_write(&sb->s_umount);
> }
> - goto out;
> + return ret;
> }
>
> static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
> @@ -2155,6 +2151,217 @@ error:
> }
>
> /*
> + * Should be called with both restripe and volume mutexes held to
> + * serialize other volume operations (add_dev/rm_dev/resize) wrt
> + * restriper. Same goes for unset_restripe_control().
> + */
> +static void set_restripe_control(struct restripe_control *rctl)
> +{
> + struct btrfs_fs_info *fs_info = rctl->fs_info;
> +
> + spin_lock(&fs_info->restripe_lock);
> + fs_info->restripe_ctl = rctl;
> + spin_unlock(&fs_info->restripe_lock);
> +}
> +
> +static void unset_restripe_control(struct btrfs_fs_info *fs_info)
> +{
> + struct restripe_control *rctl = fs_info->restripe_ctl;
> +
> + spin_lock(&fs_info->restripe_lock);
> + fs_info->restripe_ctl = NULL;
> + spin_unlock(&fs_info->restripe_lock);
> +
> + kfree(rctl);
> +}
> +
> +static int __btrfs_restripe(struct btrfs_root *dev_root)
> +{
> + struct list_head *devices;
> + struct btrfs_device *device;
> + u64 old_size;
> + u64 size_to_free;
> + struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> + struct btrfs_key found_key;
> + struct btrfs_trans_handle *trans;
> + int ret;
> + int enospc_errors = 0;
> +
> + /* step one make some room on all the devices */
> + devices = &dev_root->fs_info->fs_devices->devices;
> + list_for_each_entry(device, devices, dev_list) {
> + old_size = device->total_bytes;
> + size_to_free = div_factor(old_size, 1);
> + size_to_free = min(size_to_free, (u64)1 * 1024 * 1024);
> + if (!device->writeable ||
> + device->total_bytes - device->bytes_used > size_to_free)
> + continue;
> +
> + ret = btrfs_shrink_device(device, old_size - size_to_free);
> + if (ret == -ENOSPC)
> + break;
> + BUG_ON(ret);
> +
> + trans = btrfs_start_transaction(dev_root, 0);
> + BUG_ON(IS_ERR(trans));
> +
> + ret = btrfs_grow_device(trans, device, old_size);
> + BUG_ON(ret);
> +
> + btrfs_end_transaction(trans, dev_root);
> + }
> +
> + /* step two, relocate all the chunks */
> + path = btrfs_alloc_path();
> + if (!path) {
> + ret = -ENOMEM;
> + goto error;
> + }
> +
> + key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> + key.offset = (u64)-1;
> + key.type = BTRFS_CHUNK_ITEM_KEY;
> +
> + while (1) {
> + ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
> + if (ret < 0)
> + goto error;
> +
> + /*
> + * this shouldn't happen, it means the last relocate
> + * failed
> + */
> + if (ret == 0)
> + BUG_ON(1); /* DIS - break ? */
> +
> + ret = btrfs_previous_item(chunk_root, path, 0,
> + BTRFS_CHUNK_ITEM_KEY);
> + if (ret)
> + BUG_ON(1); /* DIS - break ? */
> +
> + btrfs_item_key_to_cpu(path->nodes[0], &found_key,
> + path->slots[0]);
> + if (found_key.objectid != key.objectid)
> + break;
> +
> + /* chunk zero is special */
> + if (found_key.offset == 0)
> + break;
> +
> + btrfs_release_path(path);
> + ret = btrfs_relocate_chunk(chunk_root,
> + chunk_root->root_key.objectid,
> + found_key.objectid,
> + found_key.offset);
> + if (ret && ret != -ENOSPC)
> + goto error;
> + if (ret == -ENOSPC)
> + enospc_errors++;
> + key.offset = found_key.offset - 1;
> + }
> +
> +error:
> + btrfs_free_path(path);
> + if (enospc_errors) {
> + printk(KERN_INFO "btrfs: restripe finished with %d enospc "
> + "error(s)\n", enospc_errors);
> + ret = -ENOSPC;
> + }
> +
> + return ret;
> +}
> +
> +/*
> + * Should be called with restripe_mutex held
> + */
> +int btrfs_restripe(struct restripe_control *rctl)
> +{
> + struct btrfs_fs_info *fs_info = rctl->fs_info;
> + u64 allowed;
> + int ret;
> +
> + mutex_lock(&fs_info->volume_mutex);
> +
> + /*
> + * Profile changing sanity checks
> + */
> + allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
> + if (fs_info->fs_devices->num_devices == 1)
> + allowed |= BTRFS_BLOCK_GROUP_DUP;
> + else if (fs_info->fs_devices->num_devices < 4)
> + allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
> + else
> + allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
> + BTRFS_BLOCK_GROUP_RAID10);
> +
> + if (rctl->data.target & ~allowed) {
> + printk(KERN_ERR "btrfs: unable to start restripe with target "
> + "data profile %llu\n",
> + (unsigned long long)rctl->data.target);
> + ret = -EINVAL;
> + goto out;
> + }
> + if (rctl->sys.target & ~allowed) {
> + printk(KERN_ERR "btrfs: unable to start restripe with target "
> + "system profile %llu\n",
> + (unsigned long long)rctl->sys.target);
> + ret = -EINVAL;
> + goto out;
> + }
> + if (rctl->meta.target & ~allowed) {
> + printk(KERN_ERR "btrfs: unable to start restripe with target "
> + "metadata profile %llu\n",
> + (unsigned long long)rctl->meta.target);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
> + printk(KERN_ERR "btrfs: dup for data is not allowed\n");
> + ret = -EINVAL;
> + goto out;
> + }
It would be good to get these error messages somehow to the user,
or at least give the user a hint to look in dmesg.
> +
> + /* allow to reduce meta or sys integrity only if force set */
> + allowed = BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 |
> + BTRFS_BLOCK_GROUP_RAID10;
> + if (((rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
> + (fs_info->avail_system_alloc_bits & allowed) &&
> + !(rctl->sys.target & allowed)) ||
> + ((rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
> + (fs_info->avail_metadata_alloc_bits & allowed) &&
> + !(rctl->meta.target & allowed))) {
> + if (rctl->flags & BTRFS_RESTRIPE_FORCE) {
> + printk(KERN_INFO "btrfs: force reducing metadata "
> + "integrity\n");
> + } else {
> + printk(KERN_ERR "btrfs: can't reduce metadata "
> + "integrity\n");
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + set_restripe_control(rctl);
> + mutex_unlock(&fs_info->volume_mutex);
> +
> + ret = __btrfs_restripe(fs_info->dev_root);
> +
> + mutex_lock(&fs_info->volume_mutex);
> + unset_restripe_control(fs_info);
> + mutex_unlock(&fs_info->volume_mutex);
> +
> + return ret;
> +
> +out:
> + mutex_unlock(&fs_info->volume_mutex);
> + kfree(rctl);
> + return ret;
> +}
> +
> +/*
> * shrinking a device means finding all of the device extents past
> * the new size, and then following the back refs to the chunks.
> * The chunk relocation code actually frees the device extent
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 6d866db..8804c5c 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -168,6 +168,23 @@ struct map_lookup {
> #define map_lookup_size(n) (sizeof(struct map_lookup) + \
> (sizeof(struct btrfs_bio_stripe) * (n)))
>
> +#define BTRFS_RESTRIPE_FORCE (1ULL << 3)
> +
> +/*
> + * Profile changing flags
> + */
> +#define BTRFS_RESTRIPE_ARGS_CONVERT (1ULL << 8)
> +
> +struct btrfs_restripe_args;
> +struct restripe_control {
> + struct btrfs_fs_info *fs_info;
> + u64 flags;
> +
> + struct btrfs_restripe_args data;
> + struct btrfs_restripe_args sys;
> + struct btrfs_restripe_args meta;
> +};
> +
> int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
> u64 end, u64 *length);
>
> @@ -211,6 +228,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
> int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
> int btrfs_init_new_device(struct btrfs_root *root, char *path);
> int btrfs_balance(struct btrfs_root *dev_root);
> +int btrfs_restripe(struct restripe_control *rctl);
> int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
> int find_free_dev_extent(struct btrfs_trans_handle *trans,
> struct btrfs_device *device, u64 num_bytes,
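The profile sanity checks in btrfs_restripe() can be condensed into a small
standalone function. This is a sketch under assumptions: the bit values are
illustrative, and the reason out-parameter is a hypothetical way of handing
the message back to the caller (in the spirit of the comment above about
getting errors to the user); the patch itself only calls printk().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative profile bits; SINGLE is the in-memory bit from patch 03. */
#define PROFILE_RAID0	(1ULL << 3)
#define PROFILE_RAID1	(1ULL << 4)
#define PROFILE_DUP	(1ULL << 5)
#define PROFILE_RAID10	(1ULL << 6)
#define PROFILE_SINGLE	(1ULL << 7)

/* Which target profiles make sense for a given number of devices. */
static uint64_t allowed_targets(uint64_t num_devices)
{
	uint64_t allowed = PROFILE_SINGLE;

	if (num_devices == 1)
		allowed |= PROFILE_DUP;
	else if (num_devices < 4)
		allowed |= PROFILE_RAID0 | PROFILE_RAID1;
	else
		allowed |= PROFILE_RAID0 | PROFILE_RAID1 | PROFILE_RAID10;

	return allowed;
}

/*
 * Validate a data target profile.  Returns 0 or -EINVAL; on failure
 * *reason points at a static message the caller could pass on.
 */
static int check_data_target(uint64_t target, uint64_t num_devices,
			     const char **reason)
{
	*reason = NULL;

	if (target & ~allowed_targets(num_devices)) {
		*reason = "target profile not allowed for this device count";
		return -EINVAL;
	}
	if (target & PROFILE_DUP) {
		*reason = "dup for data is not allowed";
		return -EINVAL;
	}
	return 0;
}
```

The metadata and system checks in the patch use the same allowed mask but
skip the DUP restriction, so they would only need the first test.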
* Re: [PATCH 10/21] Btrfs: usage filter
2011-08-23 20:01 ` [PATCH 10/21] Btrfs: usage filter Ilya Dryomov
2011-09-27 13:22 ` David Sterba
@ 2011-11-01 10:18 ` Arne Jansen
1 sibling, 0 replies; 42+ messages in thread
From: Arne Jansen @ 2011-11-01 10:18 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Select chunks that are less than X percent full.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/volumes.c | 33 +++++++++++++++++++++++++++++++++
> fs/btrfs/volumes.h | 1 +
> 2 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f045615..b49ecfa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2193,6 +2193,33 @@ static int chunk_profiles_filter(u64 chunk_profile,
> return 1;
> }
>
> +static u64 div_factor_fine(u64 num, int factor)
> +{
> + if (factor == 100)
> + return num;
You have already changed this to a range check that always returns
num when factor is < 0 or >= 100, but I'd find it more consistent to
return 0 when factor < 0.
> + num *= factor;
> + do_div(num, 100);
> + return num;
> +}
> +
> +static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
> + struct btrfs_restripe_args *rargs)
> +{
> + struct btrfs_block_group_cache *cache;
> + u64 chunk_used, user_thresh;
> + int ret = 1;
> +
> + cache = btrfs_lookup_block_group(fs_info, chunk_offset);
> + chunk_used = btrfs_block_group_used(&cache->item);
> +
> + user_thresh = div_factor_fine(cache->key.offset, rargs->usage);
> + if (chunk_used < user_thresh)
> + ret = 0;
> +
> + btrfs_put_block_group(cache);
> + return ret;
> +}
> +
> static int chunk_soft_convert_filter(u64 chunk_profile,
> struct btrfs_restripe_args *rargs)
> {
> @@ -2236,6 +2263,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
> return 0;
> }
>
> + /* usage filter */
> + if ((rargs->flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
> + chunk_usage_filter(rctl->fs_info, chunk_offset, rargs)) {
> + return 0;
> + }
> +
> /* soft profile changing mode */
> if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
> chunk_soft_convert_filter(chunk_type, rargs)) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 9f96ad8..c6baf4b 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -186,6 +186,7 @@ struct map_lookup {
> * Restripe filters
> */
> #define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
> +#define BTRFS_RESTRIPE_ARGS_USAGE (1ULL << 1)
>
> /*
> * Profile changing flags. When SOFT is set we won't relocate chunk if
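The clamping suggested in the review, together with the usage filter, can be
sketched in userspace as follows. This is not the code from the series: the
kernel version uses do_div() and looks up the block group, whereas this
sketch takes the chunk size and used bytes as plain parameters.

```c
#include <stdint.h>

/*
 * Percentage of num, clamped as suggested above: a negative factor
 * yields 0, and anything at or above 100 yields num unchanged.
 * Like the patch, this multiplies first, so a very large num could
 * overflow for factors below 100.
 */
static uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor < 0)
		return 0;
	if (factor >= 100)
		return num;

	num *= (uint64_t)factor;
	return num / 100;
}

/*
 * Same convention as the quoted filter: returns 0 when the chunk should
 * be relocated (it is less than usage_pct percent full) and 1 when it
 * should be skipped.
 */
static int chunk_usage_filter(uint64_t chunk_size, uint64_t chunk_used,
			      int usage_pct)
{
	uint64_t user_thresh = div_factor_fine(chunk_size, usage_pct);

	return chunk_used < user_thresh ? 0 : 1;
}
```

With a negative percentage the threshold becomes 0, so no chunk passes the
filter, which seems a safer reading of bad input than selecting everything.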
* Re: [PATCH 14/21] Btrfs: save restripe parameters to disk
2011-08-23 20:01 ` [PATCH 14/21] Btrfs: save restripe parameters to disk Ilya Dryomov
2011-09-27 13:43 ` David Sterba
@ 2011-11-01 10:29 ` Arne Jansen
1 sibling, 0 replies; 42+ messages in thread
From: Arne Jansen @ 2011-11-01 10:29 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Introduce a new btree objectid for storing restripe item. The reason is
> to be able to resume restriper after a crash with the same parameters.
> Restripe item has a very high objectid and goes into tree of tree roots.
>
> The key for the new item is as follows:
>
> [ BTRFS_RESTRIPE_OBJECTID ; 0 ; 0 ]
>
> Older kernels simply ignore it so it's safe to mount with an older
> kernel and then go back to the newer one.
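As an illustration of the "very high objectid" remark: -4ULL wraps around to 0xFFFFFFFFFFFFFFFC, so the restripe key sorts after ordinary tree objectids and right after the orphan objectid (-5ULL). The sketch below uses a simplified stand-in struct and comparison function (demo_key, demo_key_cmp are hypothetical names) and assumes keys are ordered by objectid, then type, then offset.

```c
#include <stdint.h>

#define DEMO_RESTRIPE_OBJECTID  ((uint64_t)-4)	/* same value as -4ULL */
#define DEMO_ORPHAN_OBJECTID    ((uint64_t)-5)	/* same value as -5ULL */
#define DEMO_CSUM_TREE_OBJECTID 7ULL

/* Simplified in-memory key: [ objectid ; type ; offset ] */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Returns <0, 0 or >0, comparing objectid first, then type, then offset. */
static int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```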
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/ctree.h | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> fs/btrfs/volumes.c | 105 ++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 228 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 65d7562..b524034 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -85,6 +85,9 @@ struct btrfs_ordered_sum;
> /* holds checksums of all the data extents */
> #define BTRFS_CSUM_TREE_OBJECTID 7ULL
>
> +/* for storing restripe params in the root tree */
> +#define BTRFS_RESTRIPE_OBJECTID -4ULL
> +
> /* orhpan objectid for tracking unlinked/truncated files */
> #define BTRFS_ORPHAN_OBJECTID -5ULL
>
> @@ -649,6 +652,47 @@ struct btrfs_root_ref {
> __le16 name_len;
> } __attribute__ ((__packed__));
>
> +/*
> + * Restriper stuff
> + */
> +struct btrfs_disk_restripe_args {
> + /* profiles to touch, in-memory format */
> + __le64 profiles;
> +
> + /* usage filter */
> + __le64 usage;
> +
> + /* devid filter */
> + __le64 devid;
> +
> + /* devid subset filter [pstart..pend) */
> + __le64 pstart;
> + __le64 pend;
> +
> + /* btrfs virtual address space subset filter [vstart..vend) */
> + __le64 vstart;
> + __le64 vend;
> +
> + /* profile to convert to, in-memory format */
> + __le64 target;
> +
> + /* BTRFS_RESTRIPE_ARGS_* */
> + __le64 flags;
> +
> + __le64 unused[8];
> +} __attribute__ ((__packed__));
> +
> +struct btrfs_restripe_item {
> + /* BTRFS_RESTRIPE_* */
> + __le64 flags;
> +
> + struct btrfs_disk_restripe_args data;
> + struct btrfs_disk_restripe_args sys;
> + struct btrfs_disk_restripe_args meta;
> +
> + __le64 unused[4];
> +} __attribute__ ((__packed__));
what are those unused fields for? As I understand it, the restripe_item
is only temporary and gets removed after restripe finished, so I don't
see much point in leaving space for future expansions. You have the size
of the struct anyway, or can determine which fields to access through the
flags field.
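A minimal sketch of the size-based alternative, assuming the reader knows how many bytes were actually stored: copy at most that many bytes and zero the rest, so a longer future layout can still read an older item without reserved padding. demo_item_v1, demo_item_v2 and demo_read_item are hypothetical names, and endianness conversion is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout written by an older kernel (illustrative only). */
struct demo_item_v1 {
	uint64_t flags;
	uint64_t usage;
};

/* Newer layout that appends one field (illustrative only). */
struct demo_item_v2 {
	uint64_t flags;
	uint64_t usage;
	uint64_t limit;
};

/*
 * Copies min(stored_size, sizeof(*out)) bytes from the stored item.
 * Fields that were not present on disk read back as 0.
 */
static void demo_read_item(struct demo_item_v2 *out, const void *stored,
			   size_t stored_size)
{
	size_t n = stored_size < sizeof(*out) ? stored_size : sizeof(*out);

	memset(out, 0, sizeof(*out));
	memcpy(out, stored, n);
}
```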
> +
> #define BTRFS_FILE_EXTENT_INLINE 0
> #define BTRFS_FILE_EXTENT_REG 1
> #define BTRFS_FILE_EXTENT_PREALLOC 2
> @@ -727,7 +771,8 @@ struct btrfs_csum_item {
> BTRFS_BLOCK_GROUP_RAID10)
> /*
> * We need a bit for restriper to be able to tell when chunks of type
> - * SINGLE are available. It is used in avail_*_alloc_bits.
> + * SINGLE are available. It is used in avail_*_alloc_bits and restripe
> + * item fields.
> */
> #define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7)
>
> @@ -2000,8 +2045,86 @@ static inline bool btrfs_root_readonly(struct btrfs_root *root)
> return root->root_item.flags & BTRFS_ROOT_SUBVOL_RDONLY;
> }
>
> -/* struct btrfs_super_block */
> +/* struct btrfs_restripe_item */
> +BTRFS_SETGET_FUNCS(restripe_flags, struct btrfs_restripe_item, flags, 64);
> +
> +static inline void btrfs_restripe_data(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + read_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
> +}
>
> +static inline void btrfs_set_restripe_data(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + write_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
> +}
> +
> +static inline void btrfs_restripe_meta(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + read_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
> +}
> +
> +static inline void btrfs_set_restripe_meta(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + write_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
> +}
> +
> +static inline void btrfs_restripe_sys(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + read_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
> +}
> +
> +static inline void btrfs_set_restripe_sys(struct extent_buffer *eb,
> + struct btrfs_restripe_item *ri,
> + struct btrfs_disk_restripe_args *ra)
> +{
> + write_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
> +}
> +
> +static inline void
> +btrfs_disk_restripe_args_to_cpu(struct btrfs_restripe_args *cpu,
> + struct btrfs_disk_restripe_args *disk)
> +{
> + memset(cpu, 0, sizeof(*cpu));
> +
> + cpu->profiles = le64_to_cpu(disk->profiles);
> + cpu->usage = le64_to_cpu(disk->usage);
> + cpu->devid = le64_to_cpu(disk->devid);
> + cpu->pstart = le64_to_cpu(disk->pstart);
> + cpu->pend = le64_to_cpu(disk->pend);
> + cpu->vstart = le64_to_cpu(disk->vstart);
> + cpu->vend = le64_to_cpu(disk->vend);
> + cpu->target = le64_to_cpu(disk->target);
> + cpu->flags = le64_to_cpu(disk->flags);
> +}
> +
> +static inline void
> +btrfs_cpu_restripe_args_to_disk(struct btrfs_disk_restripe_args *disk,
> + struct btrfs_restripe_args *cpu)
> +{
> + memset(disk, 0, sizeof(*disk));
> +
> + disk->profiles = cpu_to_le64(cpu->profiles);
> + disk->usage = cpu_to_le64(cpu->usage);
> + disk->devid = cpu_to_le64(cpu->devid);
> + disk->pstart = cpu_to_le64(cpu->pstart);
> + disk->pend = cpu_to_le64(cpu->pend);
> + disk->vstart = cpu_to_le64(cpu->vstart);
> + disk->vend = cpu_to_le64(cpu->vend);
> + disk->target = cpu_to_le64(cpu->target);
> + disk->flags = cpu_to_le64(cpu->flags);
> +}
> +
> +/* struct btrfs_super_block */
> BTRFS_SETGET_STACK_FUNCS(super_bytenr, struct btrfs_super_block, bytenr, 64);
> BTRFS_SETGET_STACK_FUNCS(super_flags, struct btrfs_super_block, flags, 64);
> BTRFS_SETGET_STACK_FUNCS(super_generation, struct btrfs_super_block,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index eccd458..1057ad3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2150,6 +2150,97 @@ error:
> return ret;
> }
>
> +static int insert_restripe_item(struct btrfs_root *root,
> + struct restripe_control *rctl)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_restripe_item *item;
> + struct btrfs_disk_restripe_args disk_rargs;
> + struct btrfs_path *path;
> + struct extent_buffer *leaf;
> + struct btrfs_key key;
> + int ret, err;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + trans = btrfs_start_transaction(root, 0);
> + if (IS_ERR(trans)) {
> + btrfs_free_path(path);
> + return PTR_ERR(trans);
> + }
> +
> + key.objectid = BTRFS_RESTRIPE_OBJECTID;
> + key.type = 0;
> + key.offset = 0;
> +
> + ret = btrfs_insert_empty_item(trans, root, path, &key,
> + sizeof(*item));
> + if (ret)
> + goto out;
> +
> + leaf = path->nodes[0];
> + item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item);
> +
> + memset_extent_buffer(leaf, 0, (unsigned long)item, sizeof(*item));
> +
> + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->data);
> + btrfs_set_restripe_data(leaf, item, &disk_rargs);
> + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->meta);
> + btrfs_set_restripe_meta(leaf, item, &disk_rargs);
> + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->sys);
> + btrfs_set_restripe_sys(leaf, item, &disk_rargs);
> +
> + btrfs_set_restripe_flags(leaf, item, rctl->flags);
> +
> + btrfs_mark_buffer_dirty(leaf);
> +out:
> + btrfs_free_path(path);
> + err = btrfs_commit_transaction(trans, root);
> + if (err && !ret)
> + ret = err;
> + return ret;
> +}
> +
> +static int del_restripe_item(struct btrfs_root *root)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> + int ret, err;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + trans = btrfs_start_transaction(root, 0);
> + if (IS_ERR(trans)) {
> + btrfs_free_path(path);
> + return PTR_ERR(trans);
> + }
> +
> + key.objectid = BTRFS_RESTRIPE_OBJECTID;
> + key.type = 0;
> + key.offset = 0;
> +
> + ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> + if (ret < 0)
> + goto out;
> + if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + ret = btrfs_del_item(trans, root, path);
> +out:
> + btrfs_free_path(path);
> + err = btrfs_commit_transaction(trans, root);
> + if (err && !ret)
> + ret = err;
> + return ret;
> +}
> +
> /*
> * Should be called with both restripe and volume mutexes held to
> * serialize other volume operations (add_dev/rm_dev/resize) wrt
> @@ -2485,6 +2576,7 @@ int btrfs_restripe(struct restripe_control *rctl)
> {
> struct btrfs_fs_info *fs_info = rctl->fs_info;
> u64 allowed;
> + int err;
> int ret;
>
> mutex_lock(&fs_info->volume_mutex);
> @@ -2572,16 +2664,25 @@ int btrfs_restripe(struct restripe_control *rctl)
> }
>
> do_restripe:
> + ret = insert_restripe_item(fs_info->tree_root, rctl);
> + if (ret && ret != -EEXIST)
> + goto out;
> + BUG_ON(ret == -EEXIST);
> +
> set_restripe_control(rctl);
> mutex_unlock(&fs_info->volume_mutex);
>
> - ret = __btrfs_restripe(fs_info->dev_root);
> + err = __btrfs_restripe(fs_info->dev_root);
>
> mutex_lock(&fs_info->volume_mutex);
> +
> unset_restripe_control(fs_info);
> + ret = del_restripe_item(fs_info->tree_root);
> + BUG_ON(ret);
> +
> mutex_unlock(&fs_info->volume_mutex);
>
> - return ret;
> + return err;
>
> out:
> mutex_unlock(&fs_info->volume_mutex);
* Re: [PATCH 15/21] Btrfs: recover restripe on mount
2011-08-23 20:01 ` [PATCH 15/21] Btrfs: recover restripe on mount Ilya Dryomov
@ 2011-11-01 10:57 ` Arne Jansen
0 siblings, 0 replies; 42+ messages in thread
From: Arne Jansen @ 2011-11-01 10:57 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On 23.08.2011 22:01, Ilya Dryomov wrote:
> On mount, if restripe item is found, resume restripe in a separate
> kernel thread.
>
> Try to be smart to continue roughly where previous balance (or convert)
> was interrupted. For chunk types that were being converted to some
> profile we turn on soft convert, in case of a simple balance we turn on
> usage filter and relocate only less-than-90%-full chunks of that type.
> These are just heuristics but they help quite a bit, and can be improved
> in future.
>
Instead of trying to find out where you left off, can't you just save a
pointer in your restripe_item every time a chunk finished?
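Something along these lines, as a rough user-space sketch. It assumes chunks are visited in a fixed, ascending order of logical offsets and that the progress field is updated after each chunk completes; demo_progress, demo_chunk_finished and demo_resume_index are invented for illustration and nothing like them exists in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical progress record that would live in the restripe item. */
struct demo_progress {
	int valid;		/* nonzero once at least one chunk finished */
	uint64_t last_done;	/* logical offset of the last finished chunk */
};

/* Record that the chunk at chunk_offset has been fully relocated. */
static void demo_chunk_finished(struct demo_progress *p, uint64_t chunk_offset)
{
	p->last_done = chunk_offset;
	p->valid = 1;
}

/*
 * Returns the index of the first chunk to process on resume, given the
 * chunk offsets in ascending order. Without a valid record, start at 0.
 */
static size_t demo_resume_index(const struct demo_progress *p,
				const uint64_t *offsets, size_t n)
{
	size_t i = 0;

	if (!p->valid)
		return 0;
	while (i < n && offsets[i] <= p->last_done)
		i++;
	return i;
}
```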
* Re: [PATCH 05/21] Btrfs: add basic restriper infrastructure
2011-11-01 10:08 ` Arne Jansen
@ 2011-11-01 11:07 ` David Sterba
2011-11-01 11:08 ` Arne Jansen
0 siblings, 1 reply; 42+ messages in thread
From: David Sterba @ 2011-11-01 11:07 UTC (permalink / raw)
To: Arne Jansen; +Cc: Ilya Dryomov, linux-btrfs, Chris Mason, Hugo Mills
On Tue, Nov 01, 2011 at 11:08:38AM +0100, Arne Jansen wrote:
> > +/*
> > + * Should be called with restripe_mutex held
> > + */
> > +int btrfs_restripe(struct restripe_control *rctl)
> > +{
...
> > + if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
> > + printk(KERN_ERR "btrfs: dup for data is not allowed\n");
> > + ret = -EINVAL;
> > + goto out;
> > + }
>
> It would be good to get these error messages somehow to the user,
> or at least give the user a hint to look in dmesg.
the restriper command ends with EINVAL, which in most cases is returned
as the result of the ioctl, and the progs counterpart will then do:
    fprintf(stderr, "ERROR: error during restriping '%s' "
            "- %s\n", path, strerror(e));
    return 19;
the hint should go there imho.
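A sketch of what such a hint could look like on the progs side. The helper name format_restripe_error and the wording of the hint are assumptions; only the first part of the message is taken from the progs snippet above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats the restripe error message into buf. For EINVAL a hint to
 * check the kernel log is appended, since the kernel side only reports
 * the precise reason via printk(). Returns the snprintf() result.
 */
static int format_restripe_error(char *buf, size_t len, const char *path,
				 int e)
{
	return snprintf(buf, len,
			"ERROR: error during restriping '%s' - %s%s\n",
			path, strerror(e),
			e == EINVAL ? " (see dmesg for details)" : "");
}
```

The caller would print buf to stderr and keep the existing return code.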
david
* Re: [PATCH 05/21] Btrfs: add basic restriper infrastructure
2011-11-01 11:07 ` David Sterba
@ 2011-11-01 11:08 ` Arne Jansen
0 siblings, 0 replies; 42+ messages in thread
From: Arne Jansen @ 2011-11-01 11:08 UTC (permalink / raw)
To: Ilya Dryomov, linux-btrfs, Chris Mason, Hugo Mills
On 01.11.2011 12:07, David Sterba wrote:
> On Tue, Nov 01, 2011 at 11:08:38AM +0100, Arne Jansen wrote:
>>> +/*
>>> + * Should be called with restripe_mutex held
>>> + */
>>> +int btrfs_restripe(struct restripe_control *rctl)
>>> +{
> ...
>>> + if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
>>> + printk(KERN_ERR "btrfs: dup for data is not allowed\n");
>>> + ret = -EINVAL;
>>> + goto out;
>>> + }
>>
>> It would be good to get these error messages somehow to the user,
>> or at least give the user a hint to look in dmesg.
>
> the restriper command ends with EINVAL which is in most cases returned
> as a result of the ioctl and progs counterpart will
>
>     fprintf(stderr, "ERROR: error during restriping '%s' "
>             "- %s\n", path, strerror(e));
>     return 19;
>
> the hint should go there imho.
Though it would still be much nicer to get a proper error message to the
user directly.
>
>
> david
* Re: [PATCH 17/21] Btrfs: allow for pausing restriper
2011-08-23 20:01 ` [PATCH 17/21] Btrfs: allow for pausing restriper Ilya Dryomov
@ 2011-11-01 11:46 ` Arne Jansen
0 siblings, 0 replies; 42+ messages in thread
From: Arne Jansen @ 2011-11-01 11:46 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Implement an ioctl for pausing restriper. This pauses the relocation,
> but restripe is still considered to be "in progress": restriper item is
> not deleted, other volume operations cannot be started, etc. If paused
> in the middle of profile changing operation we will continue making
> allocations with the target profile.
>
> Add a hook to close_ctree() to be able to pause restriper and free its
> data structures on unmount. (It's safe to unmount when restriper is in
> 'paused' state, we will resume with the same parameters on the next
> mount)
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> fs/btrfs/disk-io.c | 3 +++
> fs/btrfs/ioctl.c | 2 ++
> fs/btrfs/ioctl.h | 1 +
> fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
> fs/btrfs/volumes.h | 2 ++
> 5 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 662a6e6..7db5c50 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2542,6 +2542,9 @@ int close_ctree(struct btrfs_root *root)
> fs_info->closing = 1;
> smp_mb();
>
> + /* pause restriper and free restripe_ctl */
> + btrfs_pause_restripe(root->fs_info, 1);
> +
> btrfs_scrub_cancel(root);
>
> /* wait for any defraggers to finish */
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index d8bdb67..61978ac 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2921,6 +2921,8 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
> switch (cmd) {
> case BTRFS_RESTRIPE_CTL_CANCEL:
> return btrfs_cancel_restripe(root->fs_info);
> + case BTRFS_RESTRIPE_CTL_PAUSE:
> + return btrfs_pause_restripe(root->fs_info, 0);
> }
>
> return -EINVAL;
> diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
> index 4f6ead5..e468d5b 100644
> --- a/fs/btrfs/ioctl.h
> +++ b/fs/btrfs/ioctl.h
> @@ -110,6 +110,7 @@ struct btrfs_ioctl_fs_info_args {
> };
>
> #define BTRFS_RESTRIPE_CTL_CANCEL 1
> +#define BTRFS_RESTRIPE_CTL_PAUSE 2
>
> struct btrfs_restripe_args {
> __u64 profiles;
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index cd43368..65deaa7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2555,7 +2555,8 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
> while (1) {
> struct btrfs_fs_info *fs_info = dev_root->fs_info;
>
> - if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
> + if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
> + test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state)) {
> ret = -ECANCELED;
> goto error;
> }
> @@ -2730,7 +2731,9 @@ do_restripe:
> mutex_lock(&fs_info->restripe_mutex);
> clear_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
>
> - if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
> + if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
> + (!test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state) &&
> + !test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state))) {
> mutex_lock(&fs_info->volume_mutex);
>
> unset_restripe_control(fs_info);
> @@ -2858,6 +2861,43 @@ out:
> return ret;
> }
I don't see a difference in what CANCEL_REQ and PAUSE_REQ do, so it seems
one of them would be enough.
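The one place in this hunk where the two bits are treated differently is the cleanup condition after the relocation loop. Written out as a stand-alone sketch (bit numbers copied from the patch, helper names hypothetical), the condition reduces to "tear down unless only a pause was requested":

```c
#define RESTRIPE_CANCEL_REQ 1
#define RESTRIPE_PAUSE_REQ  2

/* Minimal stand-in for the kernel's test_bit() on a single word. */
static int demo_test_bit(int nr, unsigned long state)
{
	return (int)((state >> nr) & 1UL);
}

/* The condition exactly as written in the patch. */
static int teardown_as_written(unsigned long state)
{
	return demo_test_bit(RESTRIPE_CANCEL_REQ, state) ||
	       (!demo_test_bit(RESTRIPE_PAUSE_REQ, state) &&
		!demo_test_bit(RESTRIPE_CANCEL_REQ, state));
}

/* Logically equivalent shorter form: CANCEL || !PAUSE. */
static int teardown_simplified(unsigned long state)
{
	return demo_test_bit(RESTRIPE_CANCEL_REQ, state) ||
	       !demo_test_bit(RESTRIPE_PAUSE_REQ, state);
}
```

So a pause on its own skips the branch that calls unset_restripe_control(), while a cancel or a normal finish takes it.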
>
> +int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset)
> +{
> + int ret = 0;
> +
> + mutex_lock(&fs_info->restripe_mutex);
> + if (!fs_info->restripe_ctl) {
> + ret = -ENOTCONN;
> + goto out;
> + }
> +
> + /* only running restripe can be paused */
> + if (!test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
> + ret = -ENOTCONN;
> + goto out_unset;
> + }
> +
> + set_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
> + while (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
> + mutex_unlock(&fs_info->restripe_mutex);
> + wait_event(fs_info->restripe_wait,
> + !test_bit(RESTRIPE_RUNNING,
> + &fs_info->restripe_state));
> + mutex_lock(&fs_info->restripe_mutex);
> + }
> + clear_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
> +
> +out_unset:
> + if (unset) {
> + mutex_lock(&fs_info->volume_mutex);
> + unset_restripe_control(fs_info);
> + mutex_unlock(&fs_info->volume_mutex);
> + }
> +out:
> + mutex_unlock(&fs_info->restripe_mutex);
> + return ret;
> +}
> +
This looks very similar to cancel_restripe. It should be easy to
merge them to one function without making a mess out of it.
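A possible shape for the merged helper, reduced to the state handling only. Locking and the wait_event() loop are deliberately left out, the struct and function names are hypothetical, and the sketch assumes both requests are only valid while restripe is running, which may not match what the cancel patch actually does.

```c
#include <errno.h>

#define RESTRIPE_RUNNING    0
#define RESTRIPE_CANCEL_REQ 1
#define RESTRIPE_PAUSE_REQ  2

/* Simplified stand-in for the relevant parts of btrfs_fs_info. */
struct demo_fs_info {
	int has_ctl;		/* stands in for fs_info->restripe_ctl != NULL */
	unsigned long state;	/* stands in for fs_info->restripe_state */
};

/*
 * Common path for pause and cancel: validate the state, then set the
 * request bit. The real code would wait for RESTRIPE_RUNNING to clear
 * and then clear the request bit again.
 */
static int demo_request_stop(struct demo_fs_info *fs, int req_bit)
{
	if (!fs->has_ctl)
		return -ENOTCONN;
	if (!(fs->state & (1UL << RESTRIPE_RUNNING)))
		return -ENOTCONN;
	fs->state |= 1UL << req_bit;
	return 0;
}

static int demo_pause_restripe(struct demo_fs_info *fs)
{
	return demo_request_stop(fs, RESTRIPE_PAUSE_REQ);
}

static int demo_cancel_restripe(struct demo_fs_info *fs)
{
	return demo_request_stop(fs, RESTRIPE_CANCEL_REQ);
}
```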
> /*
> * shrinking a device means finding all of the device extents past
> * the new size, and then following the back refs to the chunks.
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index dd1fa7f..b8c234a 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -204,6 +204,7 @@ struct map_lookup {
> */
> #define RESTRIPE_RUNNING 0
> #define RESTRIPE_CANCEL_REQ 1
> +#define RESTRIPE_PAUSE_REQ 2
>
> struct btrfs_restripe_args;
> struct restripe_control {
> @@ -261,6 +262,7 @@ int btrfs_balance(struct btrfs_root *dev_root);
> int btrfs_restripe(struct restripe_control *rctl, int resume);
> int btrfs_recover_restripe(struct btrfs_root *tree_root);
> int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
> +int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset);
> int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
> int find_free_dev_extent(struct btrfs_trans_handle *trans,
> struct btrfs_device *device, u64 num_bytes,
* Re: [PATCH 00/21] [RFC] Btrfs: restriper
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (21 preceding siblings ...)
2011-09-27 12:47 ` [PATCH 00/21] [RFC] Btrfs: restriper David Sterba
@ 2011-11-14 23:59 ` Phillip Susi
2011-11-15 9:22 ` Ilya Dryomov
2011-11-17 3:13 ` Phillip Susi
23 siblings, 1 reply; 42+ messages in thread
From: Phillip Susi @ 2011-11-14 23:59 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I have a fs that started with the default policy of metadata=dup. I
added a second device and rebalanced, and so the metadata chunks were
converted to raid1. Now I can not remove the second device because
raid1 requires at least two devices.
If I understand this patch series correctly, I can use it to manually
convert those raid1 chunks back to dup, and then remove the second
device. It occurs to me though, that in the restripe process, the
newly created dup chunks can be allocated from either disk still, and
any that are allocated on the second disk will then need to be
relocated in order to remove that disk. This seems inefficient, so I
was wondering if there is a way to make sure that during the restripe,
only the disk I intend to keep is allocated from to create the dup
chunks, and thus avoid the need to relocate when I remove the second disk?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk7Bq0wACgkQJ4UciIs+XuLoUACeMkb4Pd0zshDDKmVzibYtxmvX
GewAnAwKcsCaCaAX2XK6oMWxK6FvZQFc
=UxDl
-----END PGP SIGNATURE-----
* Re: [PATCH 00/21] [RFC] Btrfs: restriper
2011-11-14 23:59 ` Phillip Susi
@ 2011-11-15 9:22 ` Ilya Dryomov
2011-11-15 14:33 ` Phillip Susi
0 siblings, 1 reply; 42+ messages in thread
From: Ilya Dryomov @ 2011-11-15 9:22 UTC (permalink / raw)
To: Phillip Susi; +Cc: linux-btrfs
On Mon, Nov 14, 2011 at 06:59:14PM -0500, Phillip Susi wrote:
> I have a fs that started with the default policy of metadata=dup. I
> added a second device and rebalanced, and so the metadata chunks were
> converted to raid1. Now I can not remove the second device because
> raid1 requires at least two devices.
>
> If I understand this patch series correctly, I can use it to manually
> convert those raid1 chunks back to dup, and then remove the second
> device. It occurs to me though, that in the restripe process, the
> newly created dup chunks can be allocated from either disk still, and
> any that are allocated on the second disk will then need to be
> relocated in order to remove that disk. This seems inefficient, so I
> was wondering if there is a way to make sure that during the restripe,
> only the disk I intend to keep is allocated from to create the dup
> chunks, and thus avoid the need to relocate when I remove the second disk?
Restriper won't let you do a raid1 -> dup transition because dup is only
allowed for a single-spindle FS, so you'll end up with the error "btrfs:
unable to start restripe ...".
There is no way to prioritize disks during restripe. To get dup back
you'll have to convert everything to single, remove the second drive and
then convert metadata from single to dup.
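The constraint can be sketched as a small check. This is an illustration built only from the two rules stated in this thread (dup needs a single-device FS, raid1 needs at least two devices); demo_target_allowed is a hypothetical helper, and the real validation in the series covers more profiles.

```c
#include <stdint.h>

/* Profile bits as defined in ctree.h; 0 denotes the single profile. */
#define BTRFS_BLOCK_GROUP_RAID1 (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP   (1ULL << 5)
#define DEMO_PROFILE_SINGLE     0ULL

/*
 * Returns nonzero if converting to target is acceptable for a
 * filesystem with num_devices devices. Only dup, raid1 and single
 * are modelled here.
 */
static int demo_target_allowed(uint64_t target, unsigned int num_devices)
{
	if (target & BTRFS_BLOCK_GROUP_DUP)
		return num_devices == 1;
	if (target & BTRFS_BLOCK_GROUP_RAID1)
		return num_devices >= 2;
	return 1;	/* single is always acceptable in this sketch */
}
```

On a two-device FS this rejects dup and accepts single, which is why the conversion has to go through single and a device removal first.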
Thanks,
Ilya
* Re: [PATCH 00/21] [RFC] Btrfs: restriper
2011-11-15 9:22 ` Ilya Dryomov
@ 2011-11-15 14:33 ` Phillip Susi
2011-11-15 15:06 ` Ilya Dryomov
0 siblings, 1 reply; 42+ messages in thread
From: Phillip Susi @ 2011-11-15 14:33 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs
On 11/15/2011 4:22 AM, Ilya Dryomov wrote:
> Restriper won't let you do raid1 -> dup transition because dup is only
> allowed for a single-spindle FS, so you'll end up with error "btrfs:
> unable to start restripe ...".
>
> There is no way to prioritize disks during restripe. To get dup back
> you'll have to convert everything to single, remove the second drive and
> then convert metadata from single to dup.
So there is no way to put a disk into read only mode and prevent
allocations of new chunks there?
It seems like both of these limitations are highly undesirable when
trying to recover from a failing disk. You don't want any more data
being written to the failing disk while you are trying to remove it, and
you certainly don't want to drop back to a single copy of data that is
then written to the failing disk.
* Re: [PATCH 00/21] [RFC] Btrfs: restriper
2011-11-15 14:33 ` Phillip Susi
@ 2011-11-15 15:06 ` Ilya Dryomov
0 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2011-11-15 15:06 UTC (permalink / raw)
To: Phillip Susi; +Cc: linux-btrfs
On Tue, Nov 15, 2011 at 09:33:14AM -0500, Phillip Susi wrote:
> On 11/15/2011 4:22 AM, Ilya Dryomov wrote:
> >Restriper won't let you do raid1 -> dup transition because dup is only
> >allowed for a single-spindle FS, so you'll end up with error "btrfs:
> >unable to start restripe ...".
> >
> >There is no way to prioritize disks during restripe. To get dup back
> >you'll have to convert everything to single, remove the second drive and
> >then convert metadata from single to dup.
>
> So there is no way to put a disk into read only mode and prevent
> allocations of new chunks there?
>
> It seems like both of these limitations are highly undesirable when
> trying to recover from a failing disk. You don't want any more data
> being written to the failing disk while you are trying to remove it,
> and you certainly don't want to drop back to a single copy of data
> that is then written to the failing disk.
If you have a failing disk in a raid setup, you don't need to downgrade
your raid, you can add a third drive and remove the failing one. But
that's inconvenient and most of the time you'll have to do a full
balance.
So another thing I'm working on is drive swap; when it's done it will
take care of the failing disk scenario. If you have a raid setup and
one of the disks has gone bad you'll be able to say
btrfs device replace FAILED NEW <mountpoint>
and it will put a valid copy onto the fresh drive, basically doing a raid
rebuild.
Thanks,
Ilya
* Re: [PATCH 00/21] [RFC] Btrfs: restriper
2011-08-23 20:01 [PATCH 00/21] [RFC] Btrfs: restriper Ilya Dryomov
` (22 preceding siblings ...)
2011-11-14 23:59 ` Phillip Susi
@ 2011-11-17 3:13 ` Phillip Susi
23 siblings, 0 replies; 42+ messages in thread
From: Phillip Susi @ 2011-11-17 3:13 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: linux-btrfs, Chris Mason, Hugo Mills
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 08/23/2011 04:01 PM, Ilya Dryomov wrote:
> Hello,
>
> This patch series adds an initial implementation of restriper (it's
> a clever name for relocation framework that allows to do selective
> profile changing and selective balancing with some goodies like
> pausing/resuming and reporting progress to the user.
>
> Profile changing is global (per-FS) so far, per-subvolume profiles
> require some discussion and can be implemented in future. This is
> a RFC so some features/problems are not yet implemented/resolved.
> The current TODO list is as follows:
I managed to use these patches to convert the raid1 system and
metadata chunks back to single and drop the second disk from a two
disk array. In doing so I noticed that the restriper required a force
switch to downgrade raid1 to single. This seems completely
unnecessary to me. A force switch to btrfs device delete might make
sense since delete may or may not force a downgrade, but with
restripe, the request to convert from raid1 to single is already quite
explicit with no room for ambiguity, so there should be no need for an
additional confirmation switch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk7Ee+oACgkQJ4UciIs+XuIGIQCdFx9cP7cPQPslE9IcFNDg/6Ns
LQYAn2l2ykGwiJt/yZNvuqePyMj3sxYH
=P+HR
-----END PGP SIGNATURE-----
* [PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
2012-01-06 14:30 [PATCH 00/21] " Ilya Dryomov
@ 2012-01-06 14:30 ` Ilya Dryomov
0 siblings, 0 replies; 42+ messages in thread
From: Ilya Dryomov @ 2012-01-06 14:30 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, idryomov
Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
avail_{data,metadata,system}_alloc_bits fields, which gather info about
available allocation profiles in the FS. When a chunk is created or read
from disk, its profile is OR'ed with the corresponding avail_alloc_bits
field. Since SINGLE is denoted by 0 in the on-disk format, currently
there is no way to tell when such chunks become available. Restriper
needs that information, so add a separate bit for SINGLE profile.
This bit is going to be in-memory only, it should never be written out
to disk, so it's not a disk format change. However to avoid remappings
in future, reserve corresponding on-disk bit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/btrfs/ctree.h | 15 +++++++++++++++
fs/btrfs/extent-tree.c | 30 +++++++++++++++++++++---------
2 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4370a56..3f8f11e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -758,6 +758,7 @@ struct btrfs_csum_item {
#define BTRFS_BLOCK_GROUP_RAID1 (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP (1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6)
+#define BTRFS_BLOCK_GROUP_RESERVED BTRFS_AVAIL_ALLOC_BIT_SINGLE
#define BTRFS_NR_RAID_TYPES 5
#define BTRFS_BLOCK_GROUP_TYPE_MASK (BTRFS_BLOCK_GROUP_DATA | \
@@ -768,6 +769,15 @@ struct btrfs_csum_item {
BTRFS_BLOCK_GROUP_RAID1 | \
BTRFS_BLOCK_GROUP_DUP | \
BTRFS_BLOCK_GROUP_RAID10)
+/*
+ * We need a bit for restriper to be able to tell when chunks of type
+ * SINGLE are available. This "extended" profile format is used in
+ * fs_info->avail_*_alloc_bits (in-memory) and balance item fields
+ * (on-disk). The corresponding on-disk bit in chunk.type is reserved
+ * to avoid remappings between two formats in future.
+ */
+#define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1ULL << 48)
+
struct btrfs_block_group_item {
__le64 used;
__le64 chunk_objectid;
@@ -1140,6 +1150,11 @@ struct btrfs_fs_info {
spinlock_t ref_cache_lock;
u64 total_ref_cache_size;
+ /*
+ * these three are in extended format (availability of single
+ * chunks is denoted by BTRFS_AVAIL_ALLOC_BIT_SINGLE bit, other
+ * types are denoted by corresponding BTRFS_BLOCK_GROUP_* bits)
+ */
u64 avail_data_alloc_bits;
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a8d8204..15a2294 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3014,16 +3014,24 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
{
u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
- if (extra_flags) {
- if (flags & BTRFS_BLOCK_GROUP_DATA)
- fs_info->avail_data_alloc_bits |= extra_flags;
- if (flags & BTRFS_BLOCK_GROUP_METADATA)
- fs_info->avail_metadata_alloc_bits |= extra_flags;
- if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
- fs_info->avail_system_alloc_bits |= extra_flags;
- }
+
+ /* chunk -> extended profile */
+ if (extra_flags == 0)
+ extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+ if (flags & BTRFS_BLOCK_GROUP_DATA)
+ fs_info->avail_data_alloc_bits |= extra_flags;
+ if (flags & BTRFS_BLOCK_GROUP_METADATA)
+ fs_info->avail_metadata_alloc_bits |= extra_flags;
+ if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ fs_info->avail_system_alloc_bits |= extra_flags;
}
+/*
+ * @flags: available profiles in extended format (see ctree.h)
+ *
+ * Returns reduced profile in chunk format.
+ */
u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
{
/*
@@ -3053,8 +3061,12 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
if ((flags & BTRFS_BLOCK_GROUP_RAID0) &&
((flags & BTRFS_BLOCK_GROUP_RAID1) |
(flags & BTRFS_BLOCK_GROUP_RAID10) |
- (flags & BTRFS_BLOCK_GROUP_DUP)))
+ (flags & BTRFS_BLOCK_GROUP_DUP))) {
flags &= ~BTRFS_BLOCK_GROUP_RAID0;
+ }
+
+ /* extended -> chunk profile */
+ flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
return flags;
}
--
1.7.6.3
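The two conversions this patch adds can be exercised in isolation. The sketch below is a user-space approximation: the RAID1, DUP, RAID10 and SINGLE constants are copied from the diff above, BTRFS_BLOCK_GROUP_DATA and BTRFS_BLOCK_GROUP_RAID0 are assumed to be (1ULL << 0) and (1ULL << 3), and the helper names are invented for illustration.

```c
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA   (1ULL << 0)	/* assumed value */
#define BTRFS_BLOCK_GROUP_RAID0  (1ULL << 3)	/* assumed value */
#define BTRFS_BLOCK_GROUP_RAID1  (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP    (1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
					BTRFS_BLOCK_GROUP_RAID1 | \
					BTRFS_BLOCK_GROUP_DUP | \
					BTRFS_BLOCK_GROUP_RAID10)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1ULL << 48)

/* chunk -> extended profile: a profile of 0 becomes the SINGLE bit. */
static uint64_t demo_chunk_to_extended(uint64_t chunk_flags)
{
	uint64_t profile = chunk_flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;

	return profile ? profile : BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}

/* extended -> chunk profile: the in-memory SINGLE bit is dropped. */
static uint64_t demo_extended_to_chunk(uint64_t extended)
{
	return extended & ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}
```

OR-ing the extended values of a single chunk and a raid1 chunk leaves both visible in the avail bits, which is the information the old format lost.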
2011-11-14 23:59 ` Phillip Susi
2011-11-15 9:22 ` Ilya Dryomov
2011-11-15 14:33 ` Phillip Susi
2011-11-15 15:06 ` Ilya Dryomov
2011-11-17 3:13 ` Phillip Susi
-- strict thread matches above, loose matches on Subject: below --
2012-01-06 14:30 [PATCH 00/21] " Ilya Dryomov
2012-01-06 14:30 ` [PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit Ilya Dryomov