From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cn.fujitsu.com ([59.151.112.132]:6169 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751064AbaJJCiN convert rfc822-to-8bit (ORCPT ); Thu, 9 Oct 2014 22:38:13 -0400 Message-ID: <543746C8.9070601@cn.fujitsu.com> Date: Fri, 10 Oct 2014 10:39:04 +0800 From: Qu Wenruo MIME-Version: 1.0 To: CC: "linux-btrfs@vger.kernel.org" Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map References: <1410926015-26820-1-git-send-email-quwenruo@cn.fujitsu.com> <541B7967.4050907@cn.fujitsu.com> <5435D695.40803@cn.fujitsu.com> In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: -------- Original Message -------- Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map From: Filipe David Manana To: Qu Wenruo Date: 2014年10月09日 18:27 > On Thu, Oct 9, 2014 at 1:28 AM, Qu Wenruo wrote: >> -------- Original Message -------- >> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert >> best fitted extent map >> From: Filipe David Manana >> To: Qu Wenruo >> Date: 2014年10月08日 20:08 >>> On Fri, Sep 19, 2014 at 1:31 AM, Qu Wenruo >>> wrote: >>>> -------- Original Message -------- >>>> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to >>>> insert >>>> best fitted extent map >>>> From: Filipe David Manana >>>> To: Qu Wenruo >>>> Date: 2014年09月18日 21:16 >>>>> On Wed, Sep 17, 2014 at 4:53 AM, Qu Wenruo >>>>> wrote: >>>>>> The following commit enhanced the merge_extent_mapping() to reduce >>>>>> fragment in extent map tree, but it can't handle case which existing >>>>>> lies before map_start: >>>>>> 51f39 btrfs: Use right extent length when inserting overlap extent map. >>>>>> >>>>>> [BUG] >>>>>> When existing extent map's start is before map_start, >>>>>> the em->len will be minus, which will corrupt the extent map and fail >>>>>> to >>>>>> insert the new extent map. >>>>>> This will happen when someone get a large extent map, but when it is >>>>>> going to insert it into extent map tree, some one has already commit >>>>>> some write and split the huge extent into small parts. >>>>> This sounds like very deterministic to me. >>>>> Any reason to not add tests to the sanity tests that exercise >>>>> this/these case/cases? >>>> Yes, thanks for the informing. >>>> Will add the test case for it soon. >>> Hi Qu, >>> >>> Any progress on the test? >>> >>> This is a very important one IMHO, not only because of the bad >>> consequences of the bug (extent map corruption, leading to all sorts >>> of chaos), but also because this problem was not found by the full >>> xfstests suite on several developer machines. >>> >>> thanks >> Still trying to reproduce it under xfstest framework. > That's the problem, wasn't apparently reproducible (or detectable at > least) by anyone with xfstests. I'll try to build a C program to behave the same of filebench and to see if it works. At least with filebench, it can be triggered in 60s with 100% possibility to reproduce. > >> But even followiiing the FileBench randomrw behavior(1 thread random read 1 >> thread random write on preallocated space), >> I still failed to reproduce it. >> >> Still investigating how to reproduce it. >> Worst case may be add a new C program into src of xfstests? > How about the sanity tests (fs/btrfs/tests/*.c)? Create an empty map > tree, add some extent maps, then try to merge some new extent maps > that used to fail before this fix. Seems simple, no? > > thanks Qu It needs concurrency read and write(commit) to trigger it, I am not sure it can be reproduced in sanity tests since it seems not commit things and lacks multithread facility. I'll give it a try on the filebench-behavior C program first, and then sanity tests if former doesn't work at all Thanks, Qu > > >> Thanks, >> Qu >> >>>> Thanks, >>>> Qu >>>> >>>>> Thanks >>>>> >>>>>> [REPRODUCER] >>>>>> It is very easy to tiger using filebench with randomrw personality. >>>>>> It is about 100% to reproduce when using 8G preallocated file in 60s >>>>>> randonrw test. >>>>>> >>>>>> [FIX] >>>>>> This patch can now handle any existing extent position. >>>>>> Since it does not directly use existing->start, now it will find the >>>>>> previous and next extent around map_start. >>>>>> So the old existing->start < map_start bug will never happen again. >>>>>> >>>>>> [ENHANCE] >>>>>> This patch will insert the best fitted extent map into extent map tree, >>>>>> other than the oldest [map_start, map_start + sectorsize) or the >>>>>> relatively newer but not perfect [map_start, existing->start). >>>>>> >>>>>> The patch will first search existing extent that does not intersects >>>>>> with >>>>>> the desired map range [map_start, map_start + len). >>>>>> The existing extent will be either before or behind map_start, and >>>>>> based >>>>>> on the existing extent, we can find out the previous and next extent >>>>>> around map_start. >>>>>> >>>>>> So the best fitted extent would be [prev->end, next->start). >>>>>> For prev or next is not found, em->start would be prev->end and em->end >>>>>> wold be next->start. >>>>>> >>>>>> With this patch, the fragment in extent map tree should be reduced much >>>>>> more than the 51f39 commit and reduce an unneeded extent map tree >>>>>> search. >>>>>> >>>>>> Reported-by: Tsutomu Itoh >>>>>> Signed-off-by: Qu Wenruo >>>>>> --- >>>>>> fs/btrfs/inode.c | 79 >>>>>> ++++++++++++++++++++++++++++++++++++++++---------------- >>>>>> 1 file changed, 57 insertions(+), 22 deletions(-) >>>>>> >>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>>>>> index 016c403..8039021 100644 >>>>>> --- a/fs/btrfs/inode.c >>>>>> +++ b/fs/btrfs/inode.c >>>>>> @@ -6191,21 +6191,60 @@ out_fail_inode: >>>>>> goto out_fail; >>>>>> } >>>>>> >>>>>> +/* Find next extent map of a given extent map, caller needs to ensure >>>>>> locks */ >>>>>> +static struct extent_map *next_extent_map(struct extent_map *em) >>>>>> +{ >>>>>> + struct rb_node *next; >>>>>> + >>>>>> + next = rb_next(&em->rb_node); >>>>>> + if (!next) >>>>>> + return NULL; >>>>>> + return container_of(next, struct extent_map, rb_node); >>>>>> +} >>>>>> + >>>>>> +static struct extent_map *prev_extent_map(struct extent_map *em) >>>>>> +{ >>>>>> + struct rb_node *prev; >>>>>> + >>>>>> + prev = rb_prev(&em->rb_node); >>>>>> + if (!prev) >>>>>> + return NULL; >>>>>> + return container_of(prev, struct extent_map, rb_node); >>>>>> +} >>>>>> + >>>>>> /* helper for btfs_get_extent. Given an existing extent in the >>>>>> tree, >>>>>> + * the existing extent is the nearest extent to map_start, >>>>>> * and an extent that you want to insert, deal with overlap and >>>>>> insert >>>>>> - * the new extent into the tree. >>>>>> + * the best fitted new extent into the tree. >>>>>> */ >>>>>> static int merge_extent_mapping(struct extent_map_tree *em_tree, >>>>>> struct extent_map *existing, >>>>>> struct extent_map *em, >>>>>> u64 map_start) >>>>>> { >>>>>> + struct extent_map *prev; >>>>>> + struct extent_map *next; >>>>>> + u64 start; >>>>>> + u64 end; >>>>>> u64 start_diff; >>>>>> >>>>>> BUG_ON(map_start < em->start || map_start >= >>>>>> extent_map_end(em)); >>>>>> - start_diff = map_start - em->start; >>>>>> - em->start = map_start; >>>>>> - em->len = existing->start - em->start; >>>>>> + >>>>>> + if (existing->start > map_start) { >>>>>> + next = existing; >>>>>> + prev = prev_extent_map(next); >>>>>> + } else { >>>>>> + prev = existing; >>>>>> + next = next_extent_map(prev); >>>>>> + } >>>>>> + >>>>>> + start = prev ? extent_map_end(prev) : em->start; >>>>>> + start = max_t(u64, start, em->start); >>>>>> + end = next ? next->start : extent_map_end(em); >>>>>> + end = min_t(u64, end, extent_map_end(em)); >>>>>> + start_diff = start - em->start; >>>>>> + em->start = start; >>>>>> + em->len = end - start; >>>>>> if (em->block_start < EXTENT_MAP_LAST_BYTE && >>>>>> !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) { >>>>>> em->block_start += start_diff; >>>>>> @@ -6482,25 +6521,21 @@ insert: >>>>>> >>>>>> ret = 0; >>>>>> >>>>>> - existing = lookup_extent_mapping(em_tree, start, len); >>>>>> - if (existing && (existing->start > start || >>>>>> - existing->start + existing->len <= start)) { >>>>>> + existing = search_extent_mapping(em_tree, start, len); >>>>>> + /* >>>>>> + * existing will always be non-NULL, since there must >>>>>> be >>>>>> + * extent causing the -EEXIST. >>>>>> + */ >>>>>> + if (start >= extent_map_end(existing) || >>>>>> + start + len <= existing->start) { >>>>>> + /* >>>>>> + * The existing extent map is the one nearest >>>>>> to >>>>>> + * the [start, start + len) range which >>>>>> overlaps >>>>>> + */ >>>>>> + err = merge_extent_mapping(em_tree, existing, >>>>>> + em, start); >>>>>> free_extent_map(existing); >>>>>> - existing = NULL; >>>>>> - } >>>>>> - if (!existing) { >>>>>> - existing = lookup_extent_mapping(em_tree, >>>>>> em->start, >>>>>> - em->len); >>>>>> - if (existing) { >>>>>> - err = merge_extent_mapping(em_tree, >>>>>> existing, >>>>>> - em, start); >>>>>> - free_extent_map(existing); >>>>>> - if (err) { >>>>>> - free_extent_map(em); >>>>>> - em = NULL; >>>>>> - } >>>>>> - } else { >>>>>> - err = -EIO; >>>>>> + if (err) { >>>>>> free_extent_map(em); >>>>>> em = NULL; >>>>>> } >>>>>> -- >>>>>> 2.1.0 >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" >>>>>> in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>> > >