linux-btrfs.vger.kernel.org archive mirror
From: "Stéphane Lesimple" <stephane_btrfs@lesimple.fr>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Date: Fri, 18 Sep 2015 09:36:19 +0200	[thread overview]
Message-ID: <2ce9b35f73732b145e0f80b18f230a52@all.all> (raw)
In-Reply-To: <55FB61E9.4000300@cn.fujitsu.com>

On 2015-09-18 02:59, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/17 20:47 +0200:
>> On 2015-09-17 12:41, Qu Wenruo wrote:
>>>> In the meantime, I've reactivated quotas, unmounted the filesystem,
>>>> and ran a btrfsck on it: as you would expect, there's no qgroup
>>>> problem reported so far.
>>> 
>>> At least, the rescan code is working without problems.
>>> 
>>>> I'll clear all my snapshots, run a quota rescan, then re-create them
>>>> one by one by rsyncing from the ext4 system I still have. Maybe I'll
>>>> run into the issue again.
>>>> 
>>> 
>>> Would you mind doing the following check for each subvolume rsync?
>>> 
>>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 2) Create the needed snapshot
>>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 4) Avoid doing IO if possible until step 6)
>>> 5) Do 'btrfs quota rescan -w' and save it
>>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 7) Rsync data from ext4 to the newly created snapshot
>>> 
>>> The point is, as you mentioned, rescan is working fine, so we can
>>> compare the output from 3), 6) and 1) to see which qgroup accounting
>>> numbers change.
>>> 
>>> And if they differ, which would mean the qgroup update at write time
>>> OR at snapshot creation has something wrong, we can at least narrow
>>> the problem down to the qgroup update routine or to snapshot creation.
>> 
>> I was about to do that, but first there's something that sounds
>> strange: I began by trashing all my snapshots, then ran a quota rescan
>> and waited for it to complete, to start from a sane base.
>> However, this is the output of qgroup show now:
> 
> By "trashing", did you mean deleting all the files inside the 
> subvolume?
> Or "btrfs subv del"?

Sorry for the confusion here, yes, I meant btrfs subvolume del.

>> qgroupid            rfer                  excl  max_rfer  max_excl  parent  child
>> --------            ----                  ----  --------  --------  ------  -----
>> 0/5                16384                 16384      none      none     ---    ---
>> 0/1906     1657848029184         1657848029184      none      none     ---    ---
>> 0/1909      124950921216          124950921216      none      none     ---    ---
>> 0/1911     1054587293696         1054587293696      none      none     ---    ---
>> 0/3270       23727300608           23727300608      none      none     ---    ---
>> 0/3314       23206055936           23206055936      none      none     ---    ---
>> 0/3317       18472996864                     0      none      none     ---    ---
>> 0/3318       22235709440  18446744073708421120      none      none     ---    ---
>> 0/3319       22240333824                     0      none      none     ---    ---
>> 0/3320       22289608704                     0      none      none     ---    ---
>> 0/3321       22289608704                     0      none      none     ---    ---
>> 0/3322       18461151232                     0      none      none     ---    ---
>> 0/3323       18423902208                     0      none      none     ---    ---
>> 0/3324       18423902208                     0      none      none     ---    ---
>> 0/3325       18463506432                     0      none      none     ---    ---
>> 0/3326       18463506432                     0      none      none     ---    ---
>> 0/3327       18463506432                     0      none      none     ---    ---
>> 0/3328       18463506432                     0      none      none     ---    ---
>> 0/3329       18585427968                     0      none      none     ---    ---
>> 0/3330       18621472768  18446744073251348480      none      none     ---    ---
>> 0/3331       18621472768                     0      none      none     ---    ---
>> 0/3332       18621472768                     0      none      none     ---    ---
>> 0/3333       18783076352                     0      none      none     ---    ---
>> 0/3334       18799804416                     0      none      none     ---    ---
>> 0/3335       18799804416                     0      none      none     ---    ---
>> 0/3336       18816217088                     0      none      none     ---    ---
>> 0/3337       18816266240                     0      none      none     ---    ---
>> 0/3338       18816266240                     0      none      none     ---    ---
>> 0/3339       18816266240                     0      none      none     ---    ---
>> 0/3340       18816364544                     0      none      none     ---    ---
>> 0/3341        7530119168            7530119168      none      none     ---    ---
>> 0/3342        4919283712                     0      none      none     ---    ---
>> 0/3343        4921724928                     0      none      none     ---    ---
>> 0/3344        4921724928                     0      none      none     ---    ---
>> 0/3345        6503317504  18446744073690902528      none      none     ---    ---
>> 0/3346        6503452672                     0      none      none     ---    ---
>> 0/3347        6509514752                     0      none      none     ---    ---
>> 0/3348        6515793920                     0      none      none     ---    ---
>> 0/3349        6515793920                     0      none      none     ---    ---
>> 0/3350        6518685696                     0      none      none     ---    ---
>> 0/3351        6521511936                     0      none      none     ---    ---
>> 0/3352        6521511936                     0      none      none     ---    ---
>> 0/3353        6521544704                     0      none      none     ---    ---
>> 0/3354        6597963776                     0      none      none     ---    ---
>> 0/3355        6598275072                     0      none      none     ---    ---
>> 0/3356        6635880448                     0      none      none     ---    ---
>> 0/3357        6635880448                     0      none      none     ---    ---
>> 0/3358        6635880448                     0      none      none     ---    ---
>> 0/3359        6635880448                     0      none      none     ---    ---
>> 0/3360        6635880448                     0      none      none     ---    ---
>> 0/3361        6635880448                     0      none      none     ---    ---
>> 0/3362        6635880448                     0      none      none     ---    ---
>> 0/3363        6635880448                     0      none      none     ---    ---
>> 0/3364        6635880448                     0      none      none     ---    ---
>> 0/3365        6635880448                     0      none      none     ---    ---
>> 0/3366        6635896832                     0      none      none     ---    ---
>> 0/3367       24185790464           24185790464      none      none     ---    ---
>> 
> 
> Nooooo!! What a weird result here!
> Qgroup 3345 has a negative number again, even after a qgroup rescan...
> IIRC, from the code, rescan just passes old_roots as NULL, and uses the
> correct new_roots to build up "rfer" and "excl".
> So in theory it should never go below zero during a rescan.
> 
> My only hope is that it's an orphan qgroup (mentioned below).
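
(Side note: if those huge "excl" values are u64 wraparounds, which is
only my assumption, then for 0/3345 for instance the counter would have
gone about 18MB below zero; using python3 from the shell just as a
calculator here:

  $ python3 -c 'print(2**64 - 18446744073690902528)'
  18649088

i.e. roughly -17.8MiB of "excl" if read as a signed value.)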
> 
>> I would have expected all these qgroupids to have been removed along
>> with the snapshots, but apparently not. It reminded me of the bug you
>> were talking about, where deleted snapshots don't always clear their
>> qgroup correctly, but since these don't disappear after a rescan
>> either... I'm a bit surprised.
> 
> If you mean you "btrfs qgroup del" the subvolume, then it's known that
> the qgroup won't be deleted, and won't be associated with any subvolume.
> (It's possible that a later created subvolume reuses the old subvolid
> and gets associated with the qgroup again.)
> 
> If the above qgroups with 0 or even negative "excl" numbers are
> orphans, I'll be much relieved, as that would be a minor orphan qgroup
> bug rather than another possible qgroup rework (or at least a huge
> review).

The only qgroup subcommand I use is qgroup show; I never deleted a
qgroup directly using qgroup del... I guess this is not good news :(
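
In case it helps anyone else hitting this: I assume such orphan entries
could also be dropped one by one with "btrfs qgroup destroy", something
along the lines of the untested sketch below (the mount point and the
qgroup ids are just placeholders; I went the quota disable / quota
enable route instead, as described below):

  # untested sketch, placeholders only
  for qg in 0/3317 0/3318 0/3345; do
      btrfs qgroup destroy "$qg" /mnt/btrfs
  done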

>> I've just tried quota disable / quota enable, and not it seems OK. 
>> Just
>> wanted to let you know, in case it's not known behavior ...

There's a typo above: I meant "and *now* it seems OK".
I'm sure you read it that way, I just want to be sure there's no
possibility of misinterpretation.

> Thanks a lot for your info, which indeed exposes something we hadn't
> taken into much consideration.
> 
> And if the qgroups match the above description, would you mind
> removing these qgroups?

Sure, I did a quota disable / quota enable before running the snapshot
debug procedure, so the qgroups were clean again when I started:

qgroupid            rfer           excl  max_rfer  max_excl  parent  child
--------            ----           ----  --------  --------  ------  -----
0/5                16384          16384      none      none     ---    ---
0/1906     1657848029184  1657848029184      none      none     ---    ---
0/1909      124950921216   124950921216      none      none     ---    ---
0/1911     1054587293696  1054587293696      none      none     ---    ---
0/3270       23727300608    23727300608      none      none     ---    ---
0/3314       23221784576    23221784576      none      none     ---    ---
0/3341        7479275520     7479275520      none      none     ---    ---
0/3367       24185790464    24185790464      none      none     ---    ---

The test is running, I expect to post the results within an hour or two.
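
For reference, here is roughly the kind of wrapper I'm using for steps
1) to 6) of your procedure (the mount point, subvolume and log file
names below are placeholders, not my real paths):

  #!/bin/bash
  # rough sketch of the per-snapshot check procedure, paths are placeholders
  MNT=/mnt/btrfs
  SUBVOL=$MNT/data
  SNAP=$MNT/snapshots/data-$(date +%Y%m%d-%H%M%S)
  LOG=/root/qgroup-debug.log

  {
      echo "=== 1) before snapshot ==="
      sync; btrfs qgroup show -prce --raw "$MNT"

      # 2) create the snapshot (writable, so it can be rsynced into later)
      btrfs subvolume snapshot "$SUBVOL" "$SNAP"

      echo "=== 3) after snapshot ==="
      sync; btrfs qgroup show -prce --raw "$MNT"

      # 4) no other IO on the filesystem until the rescan below finishes
      # 5) rescan and wait for it to complete
      btrfs quota rescan -w "$MNT"

      echo "=== 6) after rescan ==="
      sync; btrfs qgroup show -prce --raw "$MNT"
  } >>"$LOG" 2>&1

Step 7), the rsync from ext4 into the new snapshot, then runs separately
afterwards.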

-- 
Stéphane.

