From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <fdmanana@gmail.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
Date: Fri, 10 Oct 2014 10:39:04 +0800 [thread overview]
Message-ID: <543746C8.9070601@cn.fujitsu.com> (raw)
In-Reply-To: <CAL3q7H54BTVtBO8Pa1DXuC+Pk7xmndc-nCGSz2tTsQE+YGBJNA@mail.gmail.com>
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to
insert best fitted extent map
From: Filipe David Manana <fdmanana@gmail.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2014年10月09日 18:27
> On Thu, Oct 9, 2014 at 1:28 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> -------- Original Message --------
>> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert
>> best fitted extent map
>> From: Filipe David Manana <fdmanana@gmail.com>
>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Date: 2014年10月08日 20:08
>>> On Fri, Sep 19, 2014 at 1:31 AM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> wrote:
>>>> -------- Original Message --------
>>>> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to
>>>> insert
>>>> best fitted extent map
>>>> From: Filipe David Manana <fdmanana@gmail.com>
>>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> Date: 2014年09月18日 21:16
>>>>> On Wed, Sep 17, 2014 at 4:53 AM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>> wrote:
>>>>>> The following commit enhanced the merge_extent_mapping() to reduce
>>>>>> fragment in extent map tree, but it can't handle case which existing
>>>>>> lies before map_start:
>>>>>> 51f39 btrfs: Use right extent length when inserting overlap extent map.
>>>>>>
>>>>>> [BUG]
>>>>>> When existing extent map's start is before map_start,
>>>>>> the em->len will be minus, which will corrupt the extent map and fail
>>>>>> to
>>>>>> insert the new extent map.
>>>>>> This will happen when someone get a large extent map, but when it is
>>>>>> going to insert it into extent map tree, some one has already commit
>>>>>> some write and split the huge extent into small parts.
>>>>> This sounds like very deterministic to me.
>>>>> Any reason to not add tests to the sanity tests that exercise
>>>>> this/these case/cases?
>>>> Yes, thanks for the informing.
>>>> Will add the test case for it soon.
>>> Hi Qu,
>>>
>>> Any progress on the test?
>>>
>>> This is a very important one IMHO, not only because of the bad
>>> consequences of the bug (extent map corruption, leading to all sorts
>>> of chaos), but also because this problem was not found by the full
>>> xfstests suite on several developer machines.
>>>
>>> thanks
>> Still trying to reproduce it under xfstest framework.
> That's the problem, wasn't apparently reproducible (or detectable at
> least) by anyone with xfstests.
I'll try to build a C program to behave the same of filebench and to see
if it works.
At least with filebench, it can be triggered in 60s with 100%
possibility to reproduce.
>
>> But even followiiing the FileBench randomrw behavior(1 thread random read 1
>> thread random write on preallocated space),
>> I still failed to reproduce it.
>>
>> Still investigating how to reproduce it.
>> Worst case may be add a new C program into src of xfstests?
> How about the sanity tests (fs/btrfs/tests/*.c)? Create an empty map
> tree, add some extent maps, then try to merge some new extent maps
> that used to fail before this fix. Seems simple, no?
>
> thanks Qu
It needs concurrency read and write(commit) to trigger it, I am not sure
it can be reproduced in sanity tests
since it seems not commit things and lacks multithread facility.
I'll give it a try on the filebench-behavior C program first, and then
sanity tests if former doesn't work at all
Thanks,
Qu
>
>
>> Thanks,
>> Qu
>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>> Thanks
>>>>>
>>>>>> [REPRODUCER]
>>>>>> It is very easy to tiger using filebench with randomrw personality.
>>>>>> It is about 100% to reproduce when using 8G preallocated file in 60s
>>>>>> randonrw test.
>>>>>>
>>>>>> [FIX]
>>>>>> This patch can now handle any existing extent position.
>>>>>> Since it does not directly use existing->start, now it will find the
>>>>>> previous and next extent around map_start.
>>>>>> So the old existing->start < map_start bug will never happen again.
>>>>>>
>>>>>> [ENHANCE]
>>>>>> This patch will insert the best fitted extent map into extent map tree,
>>>>>> other than the oldest [map_start, map_start + sectorsize) or the
>>>>>> relatively newer but not perfect [map_start, existing->start).
>>>>>>
>>>>>> The patch will first search existing extent that does not intersects
>>>>>> with
>>>>>> the desired map range [map_start, map_start + len).
>>>>>> The existing extent will be either before or behind map_start, and
>>>>>> based
>>>>>> on the existing extent, we can find out the previous and next extent
>>>>>> around map_start.
>>>>>>
>>>>>> So the best fitted extent would be [prev->end, next->start).
>>>>>> For prev or next is not found, em->start would be prev->end and em->end
>>>>>> wold be next->start.
>>>>>>
>>>>>> With this patch, the fragment in extent map tree should be reduced much
>>>>>> more than the 51f39 commit and reduce an unneeded extent map tree
>>>>>> search.
>>>>>>
>>>>>> Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
>>>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>>> ---
>>>>>> fs/btrfs/inode.c | 79
>>>>>> ++++++++++++++++++++++++++++++++++++++++----------------
>>>>>> 1 file changed, 57 insertions(+), 22 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>>> index 016c403..8039021 100644
>>>>>> --- a/fs/btrfs/inode.c
>>>>>> +++ b/fs/btrfs/inode.c
>>>>>> @@ -6191,21 +6191,60 @@ out_fail_inode:
>>>>>> goto out_fail;
>>>>>> }
>>>>>>
>>>>>> +/* Find next extent map of a given extent map, caller needs to ensure
>>>>>> locks */
>>>>>> +static struct extent_map *next_extent_map(struct extent_map *em)
>>>>>> +{
>>>>>> + struct rb_node *next;
>>>>>> +
>>>>>> + next = rb_next(&em->rb_node);
>>>>>> + if (!next)
>>>>>> + return NULL;
>>>>>> + return container_of(next, struct extent_map, rb_node);
>>>>>> +}
>>>>>> +
>>>>>> +static struct extent_map *prev_extent_map(struct extent_map *em)
>>>>>> +{
>>>>>> + struct rb_node *prev;
>>>>>> +
>>>>>> + prev = rb_prev(&em->rb_node);
>>>>>> + if (!prev)
>>>>>> + return NULL;
>>>>>> + return container_of(prev, struct extent_map, rb_node);
>>>>>> +}
>>>>>> +
>>>>>> /* helper for btfs_get_extent. Given an existing extent in the
>>>>>> tree,
>>>>>> + * the existing extent is the nearest extent to map_start,
>>>>>> * and an extent that you want to insert, deal with overlap and
>>>>>> insert
>>>>>> - * the new extent into the tree.
>>>>>> + * the best fitted new extent into the tree.
>>>>>> */
>>>>>> static int merge_extent_mapping(struct extent_map_tree *em_tree,
>>>>>> struct extent_map *existing,
>>>>>> struct extent_map *em,
>>>>>> u64 map_start)
>>>>>> {
>>>>>> + struct extent_map *prev;
>>>>>> + struct extent_map *next;
>>>>>> + u64 start;
>>>>>> + u64 end;
>>>>>> u64 start_diff;
>>>>>>
>>>>>> BUG_ON(map_start < em->start || map_start >=
>>>>>> extent_map_end(em));
>>>>>> - start_diff = map_start - em->start;
>>>>>> - em->start = map_start;
>>>>>> - em->len = existing->start - em->start;
>>>>>> +
>>>>>> + if (existing->start > map_start) {
>>>>>> + next = existing;
>>>>>> + prev = prev_extent_map(next);
>>>>>> + } else {
>>>>>> + prev = existing;
>>>>>> + next = next_extent_map(prev);
>>>>>> + }
>>>>>> +
>>>>>> + start = prev ? extent_map_end(prev) : em->start;
>>>>>> + start = max_t(u64, start, em->start);
>>>>>> + end = next ? next->start : extent_map_end(em);
>>>>>> + end = min_t(u64, end, extent_map_end(em));
>>>>>> + start_diff = start - em->start;
>>>>>> + em->start = start;
>>>>>> + em->len = end - start;
>>>>>> if (em->block_start < EXTENT_MAP_LAST_BYTE &&
>>>>>> !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
>>>>>> em->block_start += start_diff;
>>>>>> @@ -6482,25 +6521,21 @@ insert:
>>>>>>
>>>>>> ret = 0;
>>>>>>
>>>>>> - existing = lookup_extent_mapping(em_tree, start, len);
>>>>>> - if (existing && (existing->start > start ||
>>>>>> - existing->start + existing->len <= start)) {
>>>>>> + existing = search_extent_mapping(em_tree, start, len);
>>>>>> + /*
>>>>>> + * existing will always be non-NULL, since there must
>>>>>> be
>>>>>> + * extent causing the -EEXIST.
>>>>>> + */
>>>>>> + if (start >= extent_map_end(existing) ||
>>>>>> + start + len <= existing->start) {
>>>>>> + /*
>>>>>> + * The existing extent map is the one nearest
>>>>>> to
>>>>>> + * the [start, start + len) range which
>>>>>> overlaps
>>>>>> + */
>>>>>> + err = merge_extent_mapping(em_tree, existing,
>>>>>> + em, start);
>>>>>> free_extent_map(existing);
>>>>>> - existing = NULL;
>>>>>> - }
>>>>>> - if (!existing) {
>>>>>> - existing = lookup_extent_mapping(em_tree,
>>>>>> em->start,
>>>>>> - em->len);
>>>>>> - if (existing) {
>>>>>> - err = merge_extent_mapping(em_tree,
>>>>>> existing,
>>>>>> - em, start);
>>>>>> - free_extent_map(existing);
>>>>>> - if (err) {
>>>>>> - free_extent_map(em);
>>>>>> - em = NULL;
>>>>>> - }
>>>>>> - } else {
>>>>>> - err = -EIO;
>>>>>> + if (err) {
>>>>>> free_extent_map(em);
>>>>>> em = NULL;
>>>>>> }
>>>>>> --
>>>>>> 2.1.0
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>>>>>> in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>
>
>
next prev parent reply other threads:[~2014-10-10 2:38 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-17 3:53 [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map Qu Wenruo
2014-09-18 4:21 ` Liu Bo
2014-09-18 5:36 ` Qu Wenruo
2014-09-18 5:40 ` Qu Wenruo
2014-09-18 7:33 ` Liu Bo
2014-09-18 7:58 ` Qu Wenruo
2014-09-18 8:20 ` Liu Bo
2014-09-18 8:24 ` Qu Wenruo
2014-09-18 9:01 ` Liu Bo
2014-09-18 13:16 ` Filipe David Manana
2014-09-19 0:31 ` Qu Wenruo
2014-10-08 12:08 ` Filipe David Manana
2014-10-09 0:28 ` Qu Wenruo
2014-10-09 10:27 ` Filipe David Manana
2014-10-10 2:39 ` Qu Wenruo [this message]
2014-10-10 8:08 ` Filipe David Manana
2014-10-13 2:47 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=543746C8.9070601@cn.fujitsu.com \
--to=quwenruo@cn.fujitsu.com \
--cc=fdmanana@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).