From: "Stéphane Lesimple" <stephane_btrfs@lesimple.fr>
To: "Stéphane Lesimple" <stephane_btrfs@lesimple.fr>
Cc: Qu Wenruo <quwenruo@cn.fujitsu.com>,
Qu Wenruo <quwenruo.btrfs@gmx.com>,
linux-btrfs@vger.kernel.org
Subject: Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Date: Fri, 18 Sep 2015 12:26:16 +0200
Message-ID: <762ec73d5389b5057be4d3c17f74e1f9@all.all>
In-Reply-To: <c605d4d156f9a880b216e89ca0705269@all.all>
On 2015-09-18 12:15, Stéphane Lesimple wrote:
> On 2015-09-18 09:36, Stéphane Lesimple wrote:
>> Sure, I did a quota disable / quota enable before running the snapshot
>> debug procedure, so the qgroups were clean again when I started :
>>
>> qgroupid          rfer          excl max_rfer max_excl parent child
>> --------          ----          ---- -------- -------- ------ -----
>> 0/5              16384         16384     none     none    ---   ---
>> 0/1906   1657848029184 1657848029184     none     none    ---   ---
>> 0/1909    124950921216  124950921216     none     none    ---   ---
>> 0/1911   1054587293696 1054587293696     none     none    ---   ---
>> 0/3270     23727300608   23727300608     none     none    ---   ---
>> 0/3314     23221784576   23221784576     none     none    ---   ---
>> 0/3341      7479275520    7479275520     none     none    ---   ---
>> 0/3367     24185790464   24185790464     none     none    ---   ---
>>
>> The test is running, I expect to post the results within an hour or
>> two.
>
> Well, my system crashed twice while running the procedure...
> By "crashed" I mean: the machine no longer responds to ping, and
> unfortunately nothing is logged in kern.log:
>
> [ 7096.735731] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
> [ 7172.614851] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
> [ 7242.870259] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
> [ 7321.466931] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
> [ 0.000000] Initializing cgroup subsys cpuset
>
> Even stranger, the last two stdout dump files exist but are empty:
>
> -rw-r--r-- 1 root root 21 Sep 18 10:29 snap32.step5
> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
> -rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
> -rw-r--r-- 1 root root 21 Sep 18 10:30 snap33.step5
> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
> -rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step3 <==
> -rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step5 <==
>
> The mentioned steps are as follows :
>
> 0) Rsync data from the next ext4 "snapshot" to the subvolume
> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
> 2) Create the needed readonly snapshot on btrfs
> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
> 4) Avoid doing IO if possible until step 6)
> 5) Do 'btrfs quota rescan -w' and save it <==
> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
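For reference, the steps listed above could be scripted roughly as follows. This is only a sketch, not the script actually used for these runs: the `plan` and `run_iteration` helpers, and the mountpoint, subvolume and snapshot paths passed to them, are hypothetical, and step 0 (the rsync) is left out because its source path is not given here. Only the steps whose output was saved (1, 3, 5 and 6) write a file, matching the `snapNN.stepM` naming seen in the listing.

```python
import subprocess
from pathlib import Path

QGROUP_SHOW = ["btrfs", "qgroup", "show", "-prce", "--raw"]


def plan(snap_id, mountpoint, subvol, snapdir):
    """Return (step, commands, save_output) tuples for one iteration.

    Nothing is executed here. mountpoint, subvol and snapdir are
    placeholders, not the paths used in the actual runs.
    """
    show = [["sync"], QGROUP_SHOW + [mountpoint]]
    snapshot = ["btrfs", "subvolume", "snapshot", "-r",
                subvol, f"{snapdir}/snap{snap_id}"]
    rescan = ["btrfs", "quota", "rescan", "-w", mountpoint]
    return [
        (1, show, True),
        (2, [snapshot], False),
        (3, show, True),
        # step 4 is "avoid doing IO", so there is nothing to run
        (5, [rescan], True),
        (6, show, True),
    ]


def run_iteration(snap_id, mountpoint, subvol, snapdir, outdir):
    """Run one iteration, writing stdout to <outdir>/snap<N>.step<M>."""
    for step, commands, save in plan(snap_id, mountpoint, subvol, snapdir):
        output = b""
        for argv in commands:
            result = subprocess.run(argv, check=True, stdout=subprocess.PIPE)
            output += result.stdout
        if save:
            Path(outdir, f"snap{snap_id}.step{step}").write_bytes(output)
```

Calling `plan(...)` alone only builds the command lists, which makes it easy to review what would be run before touching the filesystem.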
>
> The resulting files are available here:
> http://speed47.net/tmp2/qgroup.tar.gz
> Of the two, run2 is the more complete one: during run1 the machine
> crashed even sooner.
> It's interesting to note, however, that both runs seem to have crashed
> the same way and at the same step in the process.
Actually, about that: I forgot that I had set up netconsole before
starting the second run, after the first "silent" crash, and it worked.
Even though kern.log has nothing, netconsole managed to send the
messages to my other machine before the box went down, so here they are:
---
[ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at
00000000000001f0
[ 5738.172702] IP: [<ffffffffc03150db>] start_transaction+0x1b/0x580
[btrfs]
[ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
[ 5738.172723] Oops: 0000 [#1] SMP
[ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp
nf_conntrack_sane iptable_security iptable_filter iptable_mangle
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs
cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc
intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal
intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel
snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap
btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi
dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc
snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd
ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp
mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e
i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt
fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded:
netconsole]
[ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted
4.3.0-rc1 #1
[ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[ 5738.172845] task: ffff8800c7010000 ti: ffff88006acf4000 task.ti:
ffff88006acf4000
[ 5738.172847] RIP: 0010:[<ffffffffc03150db>] [<ffffffffc03150db>]
start_transaction+0x1b/0x580 [btrfs]
[ 5738.172855] RSP: 0018:ffff88006acf7ca8 EFLAGS: 00010282
[ 5738.172856] RAX: 0000000000000004 RBX: 0000000000000201 RCX:
0000000000000002
[ 5738.172857] RDX: 0000000000000201 RSI: 0000000000000001 RDI:
0000000000000000
[ 5738.172858] RBP: ffff88006acf7cf0 R08: ffff88010990eab0 R09:
00000001801c0017
[ 5738.172860] R10: 000000000990e701 R11: ffffea0004264380 R12:
0000000000000000
[ 5738.172861] R13: ffff8800c73a6e08 R14: ffff880027963800 R15:
0000160000000000
[ 5738.172862] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
knlGS:0000000000000000
[ 5738.172863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5738.172864] CR2: 00000000000001f0 CR3: 0000000027a65000 CR4:
00000000000406e0
[ 5738.172866] Stack:
[ 5738.172867] ffff8800c73a6e08 ffff880027963800 0000160000000000
ffff88006acf7ce8
[ 5738.172871] 00000000000000be 00000000fffffffc ffff8800c73a6e08
ffff880027963800
[ 5738.172875] 0000160000000000 ffff88006acf7d00 ffffffffc031565b
ffff88006acf7dc0
[ 5738.172879] Call Trace:
[ 5738.172887] [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
[btrfs]
[ 5738.172896] [<ffffffffc0378038>]
btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
[ 5738.172904] [<ffffffffc03444e0>] normal_work_helper+0xc0/0x270
[btrfs]
[ 5738.172912] [<ffffffffc03448a2>]
btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
[ 5738.172915] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
[ 5738.172917] [<ffffffff8109192a>] worker_thread+0x11a/0x470
[ 5738.172919] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
[ 5738.172921] [<ffffffff81097059>] kthread+0xc9/0xe0
[ 5738.172923] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172926] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
[ 5738.172928] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172929] Code: 49 c1 e9 5c ff ff ff 66 0f 1f 84 00 00 00 00 00 0f
1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 89 d3 48 83
ec 20 <48> 8b 87 f0 01 00 00 48 8b 90 60 0e 00 00 83 e2 01 0f 85 86 00
[ 5738.172973] RIP [<ffffffffc03150db>] start_transaction+0x1b/0x580
[btrfs]
[ 5738.172981] RSP <ffff88006acf7ca8>
[ 5738.172982] CR2: 00000000000001f0
[ 5738.172984] ---[ end trace 9feb85def1327ee9 ]---
[ 5738.173010] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[ 5738.173012] IP: [<ffffffff810977d0>] kthread_data+0x10/0x20
[ 5738.173015] PGD 1c13067 PUD 1c15067 PMD 0
[ 5738.173019] Oops: 0000 [#2] SMP
[ 5738.173021] Modules linked in: netconsole configfs xts gf128mul drbg
---
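Clearly this happened during a qgroup rescan: the oops is in
start_transaction(), reached from btrfs_qgroup_rescan_worker() on the
btrfs-qgroup-rescan workqueue.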
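To make the trace a bit easier to read, here is a small sketch that pulls the relevant numbers out of the oops text. The helper names are made up for illustration, and the `OOPS` string holds only the three excerpts it needs, unwrapped from the trace above. The marked instruction bytes `48 8b 87 f0 01 00 00` are an x86-64 load from `0x1f0(%rdi)` into `%rax`; with RDI equal to zero the load hits address 0x1f0, which matches CR2. As far as I can tell that means the first argument of start_transaction() was NULL, but that last step is my reading, not something the trace states.

```python
import re

# Excerpts copied from the oops above, unwrapped onto single lines.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 00000000000001f0
RDX: 0000000000000201 RSI: 0000000000000001 RDI: 0000000000000000
Code: ec 20 <48> 8b 87 f0 01 00 00 48 8b 90 60 0e 00 00
"""


def fault_address(text):
    """Address the kernel reports as the faulting access."""
    match = re.search(r"dereference at\s+([0-9a-f]{16})", text)
    return int(match.group(1), 16)


def register(text, name):
    """Value of one register from the register dump."""
    match = re.search(rf"\b{name}:\s+([0-9a-f]{{16}})", text)
    return int(match.group(1), 16)


def faulting_bytes(text):
    """Instruction bytes starting at the byte marked <xx> in the Code: line."""
    code = re.search(r"Code:(.*)", text, re.S).group(1)
    tail = code[code.index("<"):].replace("<", "").replace(">", "")
    return bytes.fromhex("".join(tail.split()))


def explain(text):
    insn = faulting_bytes(text)
    # 48 8b 87 <disp32>: REX.W mov with a 32-bit displacement off %rdi
    is_load_from_rdi = insn[:3] == bytes([0x48, 0x8B, 0x87])
    displacement = int.from_bytes(insn[3:7], "little")
    return {
        "fault_address": fault_address(text),
        "rdi": register(text, "RDI"),
        "is_load_from_rdi": is_load_from_rdi,
        "displacement": displacement,
    }


if __name__ == "__main__":
    info = explain(OOPS)
    print({key: hex(value) if isinstance(value, int) and not isinstance(value, bool)
           else value for key, value in info.items()})
```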
> As the machine is now, qgroups seem OK:
>
> ~# btrfs qgroup show -pcre --raw /tank/
> qgroupid          rfer          excl max_rfer max_excl parent child
> --------          ----          ---- -------- -------- ------ -----
> 0/5              32768         32768     none     none    ---   ---
> 0/1906   3315696058368 3315696058368     none     none    ---   ---
> 0/1909    249901842432  249901842432     none     none    ---   ---
> 0/1911   2109174587392 2109174587392     none     none    ---   ---
> 0/3270     47454601216   47454601216     none     none    ---   ---
> 0/3314     46408499200         32768     none     none    ---   ---
> 0/3341     14991097856         32768     none     none    ---   ---
> 0/3367     48371580928   48371580928     none     none    ---   ---
> 0/5335     56523751424     280592384     none     none    ---   ---
> 0/5336     60175253504    2599960576     none     none    ---   ---
> 0/5337     45751746560     250888192     none     none    ---   ---
> 0/5338     45804650496     186531840     none     none    ---   ---
> 0/5339     45875167232     190521344     none     none    ---   ---
> 0/5340     45933486080        327680     none     none    ---   ---
> 0/5341     45933502464        344064     none     none    ---   ---
> 0/5342     46442815488      35454976     none     none    ---   ---
> 0/5343     46442520576      30638080     none     none    ---   ---
> 0/5344     46448312320      36495360     none     none    ---   ---
> 0/5345     46425235456      86204416     none     none    ---   ---
> 0/5346     46081941504     119398400     none     none    ---   ---
> 0/5347     46402715648      55615488     none     none    ---   ---
> 0/5348     46403534848      50528256     none     none    ---   ---
> 0/5349     45486301184      91463680     none     none    ---   ---
> 0/5351     46414635008        393216     none     none    ---   ---
> 0/5352     46414667776        294912     none     none    ---   ---
> 0/5353     46414667776        294912     none     none    ---   ---
> 0/5354     46406148096      24829952     none     none    ---   ---
> 0/5355     46415986688      33103872     none     none    ---   ---
> 0/5356     46406262784      23216128     none     none    ---   ---
> 0/5357     46408245248      17408000     none     none    ---   ---
> 0/5358     46416052224      25280512     none     none    ---   ---
> 0/5359     46406336512      23158784     none     none    ---   ---
> 0/5360     46408335360      25157632     none     none    ---   ---
> 0/5361     46406402048      24395776     none     none    ---   ---
> 0/5362     46415273984      32260096     none     none    ---   ---
> 0/5363     46408499200         32768     none     none    ---   ---
> 0/5364     14949441536     139812864     none     none    ---   ---
> 0/5365     14996299776     176889856     none     none    ---   ---
> 0/5366     14958616576     143065088     none     none    ---   ---
> 0/5367     14919172096     100171776     none     none    ---   ---
> 0/5368     14945968128     142409728     none     none    ---   ---
> 0/5369     14991097856         32768     none     none    ---   ---
>
>
> But I'm pretty sure I can get that (u64)-1 value again by deleting
> snapshots. Shall I? Or do you have something else for me to run
> before that?
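As an aside, a quick way to spot such values in saved `btrfs qgroup show --raw` output is sketched below. The helper names are hypothetical, and the last row of `SAMPLE` (qgroup 0/9999) is made up purely to illustrate what a (u64)-1 "excl" looks like in raw output; the other two rows are taken from the listing above. A counter that went to -1 and is printed as an unsigned 64-bit number shows up as 18446744073709551615.

```python
import re

U64_MAX = 2**64 - 1  # how (u64)-1 prints in --raw output

SAMPLE = """\
> 0/3314     46408499200         32768     none     none    ---   ---
> 0/5335     56523751424     280592384     none     none    ---   ---
> 0/9999     46408499200 18446744073709551615 none  none    ---   ---
"""  # the 0/9999 row is hypothetical, for illustration only


def as_signed(value):
    """Reinterpret an unsigned 64-bit counter as a signed one."""
    return value - 2**64 if value >= 2**63 else value


def parse_qgroups(text):
    """Map qgroupid -> (rfer, excl), ignoring headers and quote prefixes."""
    rows = {}
    for line in text.splitlines():
        fields = line.lstrip("> ").split()
        if len(fields) >= 3 and re.fullmatch(r"\d+/\d+", fields[0]):
            rows[fields[0]] = (int(fields[1]), int(fields[2]))
    return rows


def suspicious(rows):
    """Qgroups whose excl is zero, negative as s64, or larger than rfer."""
    return sorted(
        qgroupid
        for qgroupid, (rfer, excl) in rows.items()
        if as_signed(excl) <= 0 or excl > rfer
    )


if __name__ == "__main__":
    print(suspicious(parse_qgroups(SAMPLE)))
```

The "excl larger than rfer" check is an extra sanity test of my own; exclusive bytes should never exceed referenced bytes for the same qgroup.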
>
> So, as a quick summary of this big thread, it seems I've been hitting
> 3 bugs, all reproducible:
> - kernel BUG on balance (this original thread)
> - negative or zero "excl" qgroups
> - hard freezes without kernel trace when playing with snapshots and
> quota
>
> Still available to dig deeper where needed.