From: Zhao Lei <zhaolei@cn.fujitsu.com>
To: <fdmanana@gmail.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: RE: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg
Date: Wed, 30 Sep 2015 16:26:15 +0800 [thread overview]
Message-ID: <00ff01d0fb59$acaa56d0$05ff0470$@cn.fujitsu.com> (raw)
In-Reply-To: <CAL3q7H6uShZiaj7wbVSopDWJ3gWDBypzcOuwGzM32oj8N2bFiA@mail.gmail.com>
Hi, Filipe Manana
> -----Original Message-----
> From: Filipe Manana [mailto:fdmanana@gmail.com]
> Sent: Wednesday, September 30, 2015 3:41 PM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg
>
> On Wed, Sep 30, 2015 at 5:20 AM, Zhao Lei <zhaolei@cn.fujitsu.com> wrote:
> > Hi, Filipe Manana
> >
> > Thanks for reviewing.
> >
> >> -----Original Message-----
> >> From: Filipe Manana [mailto:fdmanana@gmail.com]
> >> Sent: Tuesday, September 29, 2015 11:48 PM
> >> To: Zhao Lei <zhaolei@cn.fujitsu.com>
> >> Cc: linux-btrfs@vger.kernel.org
> >> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by
> >> balance bg
> >>
> >> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei <zhaolei@cn.fujitsu.com> wrote:
> >> > Reproduce:
> >> > (In integration-4.3 branch)
> >> >
> >> > TEST_DEV=(/dev/vdg /dev/vdh)
> >> > TEST_DIR=/mnt/tmp
> >> >
> >> > umount "$TEST_DEV" >/dev/null
> >> > mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"
> >> >
> >> > mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
> >> > btrfs balance start -dusage=0 $TEST_DIR btrfs filesystem usage
> >> > $TEST_DIR
> >> >
> >> > dd if=/dev/zero of="$TEST_DIR"/file count=100 btrfs filesystem
> >> > usage $TEST_DIR
> >> >
> >> > Result:
> >> > We can see "no data chunk" in first "btrfs filesystem usage":
> >> > # btrfs filesystem usage $TEST_DIR
> >> > Overall:
> >> > ...
> >> > Metadata,single: Size:8.00MiB, Used:0.00B
> >> > /dev/vdg 8.00MiB
> >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
> >> > /dev/vdg 122.88MiB
> >> > /dev/vdh 122.88MiB
> >> > System,single: Size:4.00MiB, Used:0.00B
> >> > /dev/vdg 4.00MiB
> >> > System,RAID1: Size:8.00MiB, Used:16.00KiB
> >> > /dev/vdg 8.00MiB
> >> > /dev/vdh 8.00MiB
> >> > Unallocated:
> >> > /dev/vdg 1.06GiB
> >> > /dev/vdh 1.07GiB
> >> >
> >> > And "data chunks changed from raid1 to single" in second "btrfs
> >> > filesystem usage":
> >> > # btrfs filesystem usage $TEST_DIR
> >> > Overall:
> >> > ...
> >> > Data,single: Size:256.00MiB, Used:0.00B
> >> > /dev/vdh 256.00MiB
> >> > Metadata,single: Size:8.00MiB, Used:0.00B
> >> > /dev/vdg 8.00MiB
> >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
> >> > /dev/vdg 122.88MiB
> >> > /dev/vdh 122.88MiB
> >> > System,single: Size:4.00MiB, Used:0.00B
> >> > /dev/vdg 4.00MiB
> >> > System,RAID1: Size:8.00MiB, Used:16.00KiB
> >> > /dev/vdg 8.00MiB
> >> > /dev/vdh 8.00MiB
> >> > Unallocated:
> >> > /dev/vdg 1.06GiB
> >> > /dev/vdh 841.92MiB
> >> >
> >> > Reason:
> >> > btrfs balance delete last data chunk in case of no data in the
> >> > filesystem, then we can see "no data chunk" by "fi usage"
> >> > command.
> >> >
> >> > And when we do write operation to fs, the only available data
> >> > profile is 0x0, result is all new chunks are allocated single type.
> >> >
> >> > Fix:
> >> > Allocate a data chunk explicitly in balance operation, to reserve
> >> > at least one data chunk in the filesystem.
> >>
> >> Allocate a data chunk explicitly to ensure we don't lose the raid profile for
> data.
> >>
> >
> > Thanks, will change in v2.
> >
> >> >
> >> > Test:
> >> > Test by above script, and confirmed the logic by debug output.
> >> >
> >> > Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
> >> > ---
> >> > fs/btrfs/volumes.c | 19 +++++++++++++++++++
> >> > 1 file changed, 19 insertions(+)
> >> >
> >> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index
> >> > 6fc73586..3d5e41e 100644
> >> > --- a/fs/btrfs/volumes.c
> >> > +++ b/fs/btrfs/volumes.c
> >> > @@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct
> >> > btrfs_fs_info
> >> *fs_info)
> >> > u64 limit_data = bctl->data.limit;
> >> > u64 limit_meta = bctl->meta.limit;
> >> > u64 limit_sys = bctl->sys.limit;
> >> > + int chunk_reserved = 0;
> >> >
> >> > /* step one make some room on all the devices */
> >> > devices = &fs_info->fs_devices->devices; @@ -3387,6
> >> > +3388,24 @@ again:
> >> > goto loop;
> >> > }
> >> >
> >> > + if (!chunk_reserved) {
> >> > + trans = btrfs_start_transaction(chunk_root,
> 0);
> >> > + if (IS_ERR(trans)) {
> >> > +
> >> mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> >> > + ret = PTR_ERR(trans);
> >> > + goto error;
> >> > + }
> >> > +
> >> > + ret = btrfs_force_chunk_alloc(trans,
> >> > + chunk_root, 1);
> >>
> >> Can we please use the symbol BTRFS_BLOCK_GROUP_DATA instead of 1?
> >>
> > Thanks, will change in v2.
> >
> >
> >> > + if (ret < 0) {
> >> > +
> >> mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> >> > + goto error;
> >> > + }
> >> > +
> >> > + btrfs_end_transaction(trans, chunk_root);
> >> > + chunk_reserved = 1;
> >> > + }
> >>
> >> Can we do this logic only if the block group is a data one? I.e. no
> >> point allocating a data block group if our balance only touches
> >> metadata ones (due to some filter for e.g.).
> >>
> > Thanks for point out it, will change in v2.
>
> I find it somewhat awkward that we always allocate a new data block group no
> matter what. Some balance operations that used to succeed in the past may
> now fail with -ENOSPC for example...
>
> How about making this simpler and not so special for data block groups, like the
> following (compile tested only):
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 644e070..067b1eb
> 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2774,6 +2774,8 @@ static int btrfs_relocate_chunk(struct btrfs_root
> *root, u64 chunk_offset)
> struct btrfs_root *extent_root;
> struct btrfs_trans_handle *trans;
> int ret;
> + struct btrfs_block_group_cache *bg;
> + bool remove = true;
>
> root = root->fs_info->chunk_root;
> extent_root = root->fs_info->extent_root; @@ -2803,6 +2805,23
> @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
> if (ret)
> return ret;
>
> + bg = btrfs_lookup_block_group(root->fs_info, chunk_offset);
> + ASSERT(bg);
> + down_read(&bg->space_info->groups_sem);
> + /*
> + * Do not remove the last block group of a kind to prevent losing
> + * raid profile information for future chunk allocations.
> + */
> + if (bg->list.next == bg->list.prev)
> + remove = false;
> + up_read(&bg->space_info->groups_sem);
> + if (!remove)
> + btrfs_dec_block_group_ro(extent_root, bg);
> + btrfs_put_block_group(bg);
> +
> + if (!remove)
> + return 0;
> +
> trans = btrfs_start_transaction(root, 0);
> if (IS_ERR(trans)) {
> ret = PTR_ERR(trans);
>
Thanks for your detailed review!
Reason of creating new bg is:
Balance operation may be used to change raid-profile of the filesystem.
If we want to change filesystem form raid1 to raid5, we need:
1: delete all raid1 bgs
2: move all data to raid5 bg if data exist in filesystem
3: reserve only one blank raid5 bg if no data in filesystem
If we use similar operation as patch 1/2(not delete lattest blockgroup),
we'll have following problem:
1: the old raid1 blockgroups are not all-deleted
2: if no data in filesystem, the profile will not changed from raid1 to raid5.
I understand your worry of "NO_SPACE" when create additional bg,
and I also considered it in making patch, but I choose to use this way because:
1: balance operation had check free space before operation
(In front of __btrfs_balance)
2: for filesystem with data, we have to create target-chunk in balance operation,
this patch only make "creating-chunk" earlier, and the created chunk will be used
to save data in balance operation, and reduce one chunk-allocate in balance operation.
3: for a blank filesystem, it is necessary to create a blank chunk with of target raid profile.
the old code hadn't create it, and this patch created. It is right operation even if
cause no-space. Actually, create a chunk in empty filesystem rarely cause no-space,
plus we have free-space check in 1.
4: 1~3 ensure this patch rarely cause additional no-space problem in logic.
And if it really caused additional no-space(if out of my consideration),
We can change code of 1 to avoid it.
Thanks
Zhaolei
> (also at https://friendpaste.com/5IeAIIzBv3oKhureKfvjwm)
>
> thanks
>
>
> >
> >> Also, can't this be ineffective when the data block group we allocate
> >> ends up with a key in the chunk_root that is lower than the key we
> >> found in the current iteration? I.e., key found in current iteration
> >> has object id B, we allocate a new block group which gets a key with
> >> object id A, where A < B and the next iteration of our loop sees key
> >> A, deletes the respective block group A if it's empty. In the end we
> >> have no data block groups, assuming that before A there are no other
> non-empty data block groups.
> >> Your example works because there's no free space before the offset
> >> where the chunk starts in the device.
> >>
> > New chunk will always use increased logic address, even if there are
> > "hole" before, so A's logic address will always >B.
> > And current code of balance also use this feature to avoid balance
> > new-created chunks(which was created by balance operation itself).
> >
> > Thanks
> > Zhaolei
> >
> >> thanks
> >>
> >> > +
> >> > ret = btrfs_relocate_chunk(chunk_root,
> >> > found_key.offset);
> >> > mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> >> > --
> >> > 1.8.5.1
> >> >
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
> >> > in the body of a message to majordomo@vger.kernel.org More
> >> > majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >>
> >>
> >> --
> >> Filipe David Manana,
> >>
> >> "Reasonable men adapt themselves to the world.
> >> Unreasonable men adapt the world to themselves.
> >> That's why all progress depends on unreasonable men."
> >
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
next prev parent reply other threads:[~2015-09-30 8:26 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-29 13:51 [PATCH 1/2] btrfs: Fix lost-data-profile caused by auto removing bg Zhao Lei
2015-09-29 13:51 ` [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg Zhao Lei
2015-09-29 15:47 ` Filipe Manana
2015-09-30 4:20 ` Zhao Lei
2015-09-30 7:41 ` Filipe Manana
2015-09-30 8:26 ` Zhao Lei [this message]
2015-09-30 9:44 ` Filipe Manana
2015-09-30 7:42 ` [PATCH 1/2] btrfs: Fix lost-data-profile caused by auto removing bg Filipe Manana
2015-09-30 10:06 ` Zhao Lei
2015-09-30 10:26 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='00ff01d0fb59$acaa56d0$05ff0470$@cn.fujitsu.com' \
--to=zhaolei@cn.fujitsu.com \
--cc=fdmanana@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).