From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: "Stéphane Lesimple" <stephane_btrfs@lesimple.fr>,
"Qu Wenruo" <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Date: Wed, 23 Sep 2015 18:13:49 +0800
Message-ID: <56027B5D.1070404@gmx.com>
In-Reply-To: <eb465f0cf1fe18eee3f0a0627c2ec0e2@all.all>
On 2015-09-23 17:40, Stéphane Lesimple wrote:
> On 2015-09-23 09:03, Qu Wenruo wrote:
>> Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:
>>> On 2015-09-22 10:51, Qu Wenruo wrote:
>>>>>>>> [92098.842261] Call Trace:
>>>>>>>> [92098.842277] [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110 [btrfs]
>>>>>>>> [92098.842304] [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
>>>>>>>> [92098.842329] [<ffffffffc039af3d>] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>>>>
>>>>>>> Would you please show the code of it?
>>>>>>> This one seems to be another stupid bug I made when rewriting the
>>>>>>> framework.
>>>>>>> Maybe I forgot to reinit some variables or I'm corrupting memory...
>>>>>>
>>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>>>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>>>>> 2755
>>>>>> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
>>>>>> 2757                                          struct btrfs_disk_key *disk)
>>>>>> 2758 {
>>>>>> 2759         cpu->offset = le64_to_cpu(disk->offset);
>>>>>> 2760         cpu->type = disk->type;
>>>>>> 2761         cpu->objectid = le64_to_cpu(disk->objectid);
>>>>>> 2762 }
>>>>>> 2763
>>>>>> 2764 static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key *disk,
>>>>>> (gdb)
>>>>>>
>>>>>>
>>>>>> Does it make sense?
>>>>> So it seems that the memory of the cpu key is being corrupted...
>>>>>
>>>>> The code shown is just a thin inline function, so what about the other
>>>>> stack frames?
>>>>> Like btrfs_qgroup_rescan_helper+0x12?
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>> Oh, I forgot that you can just change the offset in
>>>> btrfs_qgroup_rescan_worker+0x28d to a smaller value.
>>>> Try +0x280 for example, which steps back 13 bytes of asm code,
>>>> which may jump out of the inline function range, and may give you a
>>>> good hint.
>>>>
>>>> Or gdb may have a better mode for inline function, but I don't know...
>>>
>>> Actually, "list -" is our friend here (it shows the 10 lines before the
>>> last src output)
>> No, that's not the case.
>>
>> List - will only show lines around the source code.
>>
>> What I need is to get the higher caller stack.
>> If debugging a running program, it's quite easy to just use frame
>> command.
>>
>> But in this situation, we don't have a call stack, so I'd like to move
>> the +0x28d offset back a few bytes at a time, until we leave the inline
>> function call and see meaningful code.
>
> Ah, you're right.
> I had a hard time finding a value where I wouldn't end up in another inline
> function or entirely somewhere else in the kernel code, but here it is :
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
> 0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
> 2232         memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
> 2233         slot = path->slots[0];
> 2234         btrfs_release_path(path);
> 2235         mutex_unlock(&fs_info->qgroup_rescan_lock);
> 2236
> 2237         for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
> 2238                 btrfs_item_key_to_cpu(scratch_leaf, &found, slot); <== here
> 2239                 if (found.type != BTRFS_EXTENT_ITEM_KEY &&
> 2240                     found.type != BTRFS_METADATA_ITEM_KEY)
> 2241                         continue;
>
> the btrfs_item_key_to_cpu() inline func calls 2 other inline funcs:
>
> static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
>                                          struct btrfs_key *key, int nr)
> {
>         struct btrfs_disk_key disk_key;
>         btrfs_item_key(eb, &disk_key, nr);
>         btrfs_disk_key_to_cpu(key, &disk_key); <== this is 0x28d
> }
>
> btrfs_disk_key_to_cpu() is the inline referenced by 0x28d and this is where
> the GPF happens.
Thanks, now things are much clearer.
I'm not completely sure, but scratch_leaf seems to be invalid and to cause
the bug (found is in stack memory, so I don't think it's the cause).
This is less related to the qgroup rework, though, as that's existing code.
A quick glance already shows a dirty and possibly deadly hack: copying the
whole extent buffer, which includes pages and all kinds of locks.
I'm not 100% sure that's the problem, but I'll create a patch for you to
test in the next few days.
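
To illustrate the concern about copying the whole extent buffer, here is a
minimal user-space sketch. It is not kernel code: struct demo_buffer and the
demo_* helpers are hypothetical names made up for this example, and the real
struct extent_buffer is considerably more complex. The point is only that a
memcpy() of a struct holding a pointer duplicates the pointer, not the data
behind it, so the copy keeps looking at storage that its owner may change or
release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an extent buffer: a few header fields plus a
 * pointer to backing storage that is owned by someone else.
 */
struct demo_buffer {
	unsigned long start;
	unsigned long len;
	int refs;
	unsigned char *pages;
};

/* Copy only the struct: the page pointer is duplicated, the data is not. */
static void demo_shallow_copy(struct demo_buffer *dst,
			      const struct demo_buffer *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Copy the struct and give it private backing storage. Returns 0 or -1. */
static int demo_deep_copy(struct demo_buffer *dst,
			  const struct demo_buffer *src)
{
	unsigned char *data = malloc(src->len ? src->len : 1);

	if (!data)
		return -1;
	memcpy(data, src->pages, src->len);
	*dst = *src;
	dst->pages = data;	/* private copy of the data */
	dst->refs = 1;		/* the copy starts with its own reference */
	return 0;
}

/* Self-check: returns 0 when both copies behave as described above. */
static int demo_self_check(void)
{
	unsigned char backing[4] = { 1, 2, 3, 4 };
	struct demo_buffer orig = { 0, sizeof(backing), 1, backing };
	struct demo_buffer shallow, deep;

	demo_shallow_copy(&shallow, &orig);
	if (demo_deep_copy(&deep, &orig) != 0)
		return -1;

	backing[0] = 9;	/* the owner modifies (or recycles) the storage */

	assert(shallow.pages == orig.pages);	/* aliases the original */
	assert(shallow.pages[0] == 9);		/* sees the owner's change */
	assert(deep.pages != orig.pages);	/* has private storage */
	assert(deep.pages[0] == 1);		/* unaffected by the change */

	free(deep.pages);
	return 0;
}
```

Calling demo_self_check() from a small main() exercises both paths. Whether
the rescan worker hits exactly this pattern is still an assumption at this
point; the sketch only shows why a struct-level copy is fragile.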
>
>
>> BTW, did you try the following patch?
>> https://patchwork.kernel.org/patch/7114321/
>> btrfs: qgroup: exit the rescan worker during umount
>>
>> The problem seems somewhat related to the bug you encountered, so I'd
>> recommend giving it a try.
>
> Not yet, but I've come across this bug too during my tests: starting a
> rescan and umounting gets you a crash. I didn't mention it because I was
> sure this was an already known bug. Nice to see it has been fixed though!
> I'll certainly give it a try, but I'm not really sure it'll fix the specific
> bug we're talking about.
> However, the group of patches posted by Mark should fix the qgroup count
> discrepancies as I understand it, right? It might be of interest to try
> them all at once.
Yes, his patches should fix the qgroup count mismatch problem for
subvolume removal.
If I read the code correctly, after remove and sync, the accounting
numbers for the qgroup of the deleted subvolume should be:
rfer = 0 and excl = 0.
Thanks,
Qu
>
> Thanks,
>