All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Nikolay Borisov <nborisov@suse.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, wqu@suse.com
Subject: Re: BUG at fs/btrfs/relocation.c:794! Still happening on misc-next and 5.8.3
Date: Tue, 1 Sep 2020 18:53:42 -0400	[thread overview]
Message-ID: <20200901225341.GY5890@hungrycats.org> (raw)
In-Reply-To: <20200828204255.GV5890@hungrycats.org>

On Fri, Aug 28, 2020 at 04:42:55PM -0400, Zygo Blaxell wrote:
> On Fri, Aug 28, 2020 at 09:34:31AM +0300, Nikolay Borisov wrote:
> > On 28.08.20 г. 3:08 ч., Zygo Blaxell wrote:
> > > On Thu, Aug 27, 2020 at 08:03:13PM -0400, Zygo Blaxell wrote:
> > >> On Tue, Aug 04, 2020 at 12:16:26PM -0400, Zygo Blaxell wrote:
> > 
> > <snip>
> > 
> > Since you can repro reliably could you modify the code in
> > create_reloc_root so it prints what's the returned error value, I'd
> > speculate it's EEXIST from
> > 
> > btrfs_insert_root
> >   btrfs_insert_item
> >    btrfs_insert_empty_item
> >      btrfs_insert_empty_items
> >        btrfs_search_slot
> > 
> > But better be sure.
> 
> Here you go, EEXIST == 17:
> 
> 	Aug 28 15:30:55 regress kernel: [18452.845182][T31546] BTRFS info (device dm-0): balance: start -dlimit=9
> 	Aug 28 15:30:55 regress kernel: [18452.874627][T31546] BTRFS info (device dm-0): relocating block group 15873413742592 flags data
> 	Aug 28 15:30:55 regress kernel: [18453.097516][ T2100] btrfs_search_slot ret = 0
> 	Aug 28 15:30:55 regress kernel: [18453.104865][ T2100] btrfs_search_slot ret = 0
> 	Aug 28 15:30:55 regress kernel: [18453.109751][ T2100] btrfs_search_slot ret = 0
> 	Aug 28 15:30:56 regress kernel: [18454.453135][ T2100] btrfs_search_slot ret = 0
> 	Aug 28 15:30:56 regress kernel: [18454.453955][ T2100] btrfs_insert_empty_item ret = -17
> 	Aug 28 15:30:56 regress kernel: [18454.455022][ T2100] btrfs_insert_root ret = -17
> 	Aug 28 15:30:56 regress kernel: [18454.456229][ T2100] ------------[ cut here ]------------
> 	Aug 28 15:30:56 regress kernel: [18454.457622][ T2100] kernel BUG at fs/btrfs/relocation.c:795!

I did a low-resolution bisect for this issue.  I dug up 5.4, 5.5, 5.6,
and 5.7 kernel sources, backported btrfs fixes from 5.4 to the obsolete
kernels, and ran the tests on each kernel.  Results:

	5.8:  kernel BUG at fs/btrfs/relocation.c:794

	5.7:  kernel BUG (same code but different line number)

	5.6:  kernel BUG (same as the others)

	5.5:  assertion failure (stack trace below)

	5.4:  kernel BUG (!)

The 5.4 result is interesting--I've been running 5.4 for some time and
not hit this before.  So there are 3 possible theories:

	1.  It's because of sending signals to balance.  That has been
	added to my test workload after 5.7 was released, so earlier
	tests on 5.4 would not have triggered it.

	2.  There's a regression in 5.4-stable, which I've cherry-picked
	to all the other kernels during my test setup.	(On the other
	hand, if I don't backport some fixes, kernels 5.5..5.7 crash
	before they get to this bug.)

	3.  There's something rotten in my test filesystem, and the
	BUG will go away for a while if I do a mkfs.  Qu asked for
	a dump earlier in this thread, and I provided one.

All three of these theories are testable to some extent, so I'll have
my test VM run some variations.

The full test workload is:

	balance metadata or data at random intervals

	scrub, scrub cancel at random intervals

	20x rsync updating files

	snapshot create, delete at random intervals

	bees dedupe (lots of EXTENT_SAME and LOGICAL_INO calls)

	balance cancel at random intervals

	kill -9 $(pidof btrfs balance) at random intervals (NEW - added
	when 5.7 came out)

This is the 5.5 root assertion failure:

	Sep  1 04:48:48 regress kernel: [10642.537825][T24161] assertion failed: root, in fs/btrfs/relocation.c:837
	Sep  1 04:48:48 regress kernel: [10642.538911][T24161] ------------[ cut here ]------------
	Sep  1 04:48:48 regress kernel: [10642.539704][T24161] kernel BUG at fs/btrfs/ctree.h:3125!
	Sep  1 04:48:48 regress kernel: [10642.540621][T24161] invalid opcode: 0000 [#1] SMP KASAN PTI
	Sep  1 04:48:48 regress kernel: [10642.540624][ T4639] irq event stamp: 49626809
	Sep  1 04:48:48 regress kernel: [10642.540632][ T4639] hardirqs last  enabled at (49626809): [<ffffffff8a00481a>] trace_hardirqs_on_thunk+0x1a/0x1c
	Sep  1 04:48:48 regress kernel: [10642.541451][T24161] CPU: 1 PID: 24161 Comm: btrfs Tainted: G        W         5.5.19-76348822ab91+ #14
	Sep  1 04:48:48 regress kernel: [10642.542114][ T4639] hardirqs last disabled at (49626808): [<ffffffff8a004836>] trace_hardirqs_off_thunk+0x1a/0x1c
	Sep  1 04:48:48 regress kernel: [10642.543693][T24161] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
	Sep  1 04:48:48 regress kernel: [10642.545017][ T4639] softirqs last  enabled at (49626726): [<ffffffff8bc0048b>] __do_softirq+0x48b/0x5be
	Sep  1 04:48:48 regress kernel: [10642.545023][ T4639] softirqs last disabled at (49626715): [<ffffffff8a1258a2>] irq_exit+0x112/0x120
	Sep  1 04:48:48 regress kernel: [10642.546536][T24161] RIP: 0010:assertfail.constprop.42+0x1c/0x1e
	Sep  1 04:48:49 regress kernel: [10642.551589][T24161] Code: 48 c7 c6 c0 90 03 8c e8 89 29 f1 ff 0f 0b 55 89 f1 48 c7 c2 40 82 03 8c 48 89 fe 48 c7 c7 60 83 03 8c 48 89 e5 e8 41 b0 90 ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 f4 53 48 89 fb 48 83
	Sep  1 04:48:49 regress kernel: [10642.554495][T24161] RSP: 0018:ffffc90002327150 EFLAGS: 00010282
	Sep  1 04:48:49 regress kernel: [10642.555371][T24161] RAX: 0000000000000034 RBX: 000004513701c000 RCX: ffffffff8a264242
	Sep  1 04:48:49 regress kernel: [10642.556515][T24161] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8881f580004c
	Sep  1 04:48:49 regress kernel: [10642.557680][T24161] RBP: ffffc90002327150 R08: ffffed103eb017e1 R09: ffffed103eb017e1
	Sep  1 04:48:49 regress kernel: [10642.558895][T24161] R10: ffffed103eb017e0 R11: ffff8881f580bf07 R12: ffff88800d1ea5c0
	Sep  1 04:48:49 regress kernel: [10642.560139][T24161] R13: ffffc900023273e0 R14: 0000000000000000 R15: ffff8880bbf238f0
	Sep  1 04:48:49 regress kernel: [10642.561391][T24161] FS:  00007f03f48488c0(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000
	Sep  1 04:48:49 regress kernel: [10642.562779][T24161] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	Sep  1 04:48:49 regress kernel: [10642.563801][T24161] CR2: 00007f1cab76f718 CR3: 0000000189a5e004 CR4: 00000000001606e0
	Sep  1 04:48:49 regress kernel: [10642.565046][T24161] Call Trace:
	Sep  1 04:48:49 regress kernel: [10642.565565][T24161]  build_backref_tree+0x186b/0x2590
	Sep  1 04:48:49 regress kernel: [10642.566389][T24161]  ? relocate_data_extent+0x1a0/0x1a0
	Sep  1 04:48:49 regress kernel: [10642.567295][T24161]  ? lock_downgrade+0x3d0/0x3d0
	Sep  1 04:48:49 regress kernel: [10642.568142][T24161]  ? match_held_lock+0x20/0x260
	Sep  1 04:48:49 regress kernel: [10642.568925][T24161]  ? do_raw_spin_unlock+0xa8/0x140
	Sep  1 04:48:49 regress kernel: [10642.569765][T24161]  ? _raw_spin_trylock_bh+0x60/0x80
	Sep  1 04:48:49 regress kernel: [10642.570605][T24161]  ? release_extent_buffer+0x13b/0x230
	Sep  1 04:48:49 regress kernel: [10642.571480][T24161]  ? free_extent_buffer.part.45+0xd7/0x140
	Sep  1 04:48:49 regress kernel: [10642.572406][T24161]  relocate_tree_blocks+0x204/0xa50
	Sep  1 04:48:49 regress kernel: [10642.573244][T24161]  ? build_backref_tree+0x2590/0x2590
	Sep  1 04:48:49 regress kernel: [10642.574103][T24161]  ? rb_insert_color+0x3af/0x400
	Sep  1 04:48:49 regress kernel: [10642.574896][T24161]  ? kmem_cache_alloc_trace+0x5af/0x740
	Sep  1 04:48:49 regress kernel: [10642.575785][T24161]  ? tree_insert+0x90/0xb0
	Sep  1 04:48:49 regress kernel: [10642.576495][T24161]  ? add_tree_block.isra.38+0x1d6/0x230
	Sep  1 04:48:49 regress kernel: [10642.577387][T24161]  relocate_block_group+0x528/0x9d0
	Sep  1 04:48:49 regress kernel: [10642.578220][T24161]  ? merge_reloc_roots+0x470/0x470
	Sep  1 04:48:49 regress kernel: [10642.579047][T24161]  btrfs_relocate_block_group+0x26e/0x4c0
	Sep  1 04:48:49 regress kernel: [10642.579968][T24161]  btrfs_relocate_chunk+0x52/0xf0
	Sep  1 04:48:49 regress kernel: [10642.580773][T24161]  btrfs_balance+0xe5b/0x1800
	Sep  1 04:48:49 regress kernel: [10642.581542][T24161]  ? btrfs_relocate_chunk+0xf0/0xf0
	Sep  1 04:48:49 regress kernel: [10642.582381][T24161]  ? kmem_cache_alloc_trace+0x5af/0x740
	Sep  1 04:48:49 regress kernel: [10642.583270][T24161]  ? _copy_from_user+0xaa/0xd0
	Sep  1 04:48:49 regress kernel: [10642.584022][T24161]  btrfs_ioctl_balance+0x3de/0x4c0
	Sep  1 04:48:49 regress kernel: [10642.584819][T24161]  btrfs_ioctl+0x3122/0x4470
	Sep  1 04:48:49 regress kernel: [10642.585540][T24161]  ? __asan_loadN+0xf/0x20
	Sep  1 04:48:49 regress kernel: [10642.586229][T24161]  ? __asan_loadN+0xf/0x20
	Sep  1 04:48:49 regress kernel: [10642.586920][T24161]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	Sep  1 04:48:49 regress kernel: [10642.587935][T24161]  ? __asan_loadN+0xf/0x20
	Sep  1 04:48:49 regress kernel: [10642.588649][T24161]  ? pvclock_clocksource_read+0xeb/0x190
	Sep  1 04:48:49 regress kernel: [10642.589566][T24161]  ? __asan_loadN+0xf/0x20
	Sep  1 04:48:49 regress kernel: [10642.590254][T24161]  ? pvclock_clocksource_read+0xeb/0x190
	Sep  1 04:48:49 regress kernel: [10642.591128][T24161]  ? __kasan_check_read+0x11/0x20
	Sep  1 04:48:49 regress kernel: [10642.591913][T24161]  ? check_chain_key+0x1e6/0x2e0
	Sep  1 04:48:49 regress kernel: [10642.592707][T24161]  ? __asan_loadN+0xf/0x20
	Sep  1 04:48:49 regress kernel: [10642.593409][T24161]  ? pvclock_clocksource_read+0xeb/0x190
	Sep  1 04:48:49 regress kernel: [10642.594312][T24161]  ? kvm_sched_clock_read+0x18/0x30
	Sep  1 04:48:49 regress kernel: [10642.595139][T24161]  ? check_chain_key+0x1e6/0x2e0
	Sep  1 04:48:49 regress kernel: [10642.595929][T24161]  ? sched_clock_cpu+0x1b/0x120
	Sep  1 04:48:49 regress kernel: [10642.596712][T24161]  do_vfs_ioctl+0x13e/0xad0
	Sep  1 04:48:49 regress kernel: [10642.597432][T24161]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	Sep  1 04:48:49 regress kernel: [10642.598455][T24161]  ? do_vfs_ioctl+0x13e/0xad0
	Sep  1 04:48:49 regress kernel: [10642.599202][T24161]  ? compat_ioctl_preallocate+0x170/0x170
	Sep  1 04:48:49 regress kernel: [10642.600128][T24161]  ? __kasan_check_write+0x14/0x20
	Sep  1 04:48:49 regress kernel: [10642.600949][T24161]  ? up_read+0x176/0x4f0
	Sep  1 04:48:49 regress kernel: [10642.601648][T24161]  ? down_write_nested+0x2d0/0x2d0
	Sep  1 04:48:49 regress kernel: [10642.602476][T24161]  ? handle_mm_fault+0x211/0x480
	Sep  1 04:48:49 regress kernel: [10642.603263][T24161]  ? __kasan_check_read+0x11/0x20
	Sep  1 04:48:49 regress kernel: [10642.604062][T24161]  ? __fget_light+0xb2/0x110
	Sep  1 04:48:49 regress kernel: [10642.604805][T24161]  ksys_ioctl+0x67/0x90
	Sep  1 04:48:49 regress kernel: [10642.605471][T24161]  __x64_sys_ioctl+0x43/0x50
	Sep  1 04:48:49 regress kernel: [10642.606203][T24161]  do_syscall_64+0x77/0x2d0
	Sep  1 04:48:49 regress kernel: [10642.606933][T24161]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	Sep  1 04:48:49 regress kernel: [10642.607875][T24161] RIP: 0033:0x7f03f493b427
	Sep  1 04:48:49 regress kernel: [10642.608588][T24161] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
	Sep  1 04:48:49 regress kernel: [10642.611697][T24161] RSP: 002b:00007ffd6bd78fb8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
	Sep  1 04:48:49 regress kernel: [10642.613035][T24161] RAX: ffffffffffffffda RBX: 00007ffd6bd79058 RCX: 00007f03f493b427
	Sep  1 04:48:49 regress kernel: [10642.614313][T24161] RDX: 00007ffd6bd79058 RSI: 00000000c4009420 RDI: 0000000000000003
	Sep  1 04:48:49 regress kernel: [10642.615605][T24161] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
	Sep  1 04:48:49 regress kernel: [10642.616877][T24161] R10: fffffffffffff5ab R11: 0000000000000206 R12: 0000000000000001
	Sep  1 04:48:49 regress kernel: [10642.618132][T24161] R13: 0000000000000000 R14: 00007ffd6bd7aa46 R15: 0000000000000001
	Sep  1 04:48:49 regress kernel: [10642.619378][T24161] Modules linked in:
	Sep  1 04:48:49 regress kernel: [10642.620153][T24161] ---[ end trace a33c17a7d43dd973 ]---


  reply	other threads:[~2020-09-01 22:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-30 22:10 BUG at fs/btrfs/relocation.c:794! David Sterba
2020-07-23 21:56 ` Zygo Blaxell
2020-07-24  0:19   ` Qu Wenruo
2020-08-04 16:16     ` Zygo Blaxell
2020-08-28  0:03       ` BUG at fs/btrfs/relocation.c:794! Still happening on misc-next and 5.8.3 Zygo Blaxell
2020-08-28  0:08         ` Zygo Blaxell
2020-08-28  6:34           ` Nikolay Borisov
2020-08-28 20:42             ` Zygo Blaxell
2020-09-01 22:53               ` Zygo Blaxell [this message]
2020-09-01 23:33                 ` Qu Wenruo
2020-09-02  0:14                   ` Zygo Blaxell
2020-09-02  1:46                     ` Qu Wenruo
2020-09-04 15:54                       ` Zygo Blaxell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200901225341.GY5890@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=dsterba@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.