linux-btrfs.vger.kernel.org archive mirror

* WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode()
@ 2016-05-10 23:19 Eric Biggers
  2016-05-10 23:57 ` Chris Mason
  2016-05-14  0:56 ` WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode() Chris Mason
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Biggers @ 2016-05-10 23:19 UTC
  To: linux-btrfs

Hello,

The following warning has been triggering for me since about v4.6-rc3:

	WARN_ON(BTRFS_I(inode)->csum_bytes);

On one machine the warning has occurred 657 times since v4.6-rc5.  On another it
has occurred 3 times since v4.6-rc3.  Both are now on v4.6-rc7, where I have
still observed the warning.  The warnings occur in groups, and do_unlinkat() and
evict() are always in the call stack.

Is this a known issue?  Here is the first occurrence:

Apr 17 10:31:19 zzz kernel: ------------[ cut here ]------------
Apr 17 10:31:19 zzz kernel: WARNING: CPU: 0 PID: 4092 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x23c/0x2b0
Apr 17 10:31:19 zzz kernel: Modules linked in: fuse vhost_net vhost tun bridge stp llc ccm iptable_filter iptable_nat nf_conntrack_ipv4 
Apr 17 10:31:19 zzz kernel: CPU: 0 PID: 4092 Comm: rm Tainted: G        W       4.6.0-rc3 #178
Apr 17 10:31:19 zzz kernel: Hardware name: Dell Inc. Inspiron 15-7568/0M5YMV, BIOS 01.00.00 08/07/2015
Apr 17 10:31:19 zzz kernel:  0000000000000286 0000000031dfd09a ffff8801b9277dc0 ffffffff813320d8
Apr 17 10:31:19 zzz kernel:  0000000000000000 0000000000000000 ffff8801b9277e00 ffffffff81074221
Apr 17 10:31:19 zzz kernel:  0000242d85cec000 ffff88019408c9a8 ffff88019408c9a8 ffff880085cec000
Apr 17 10:31:19 zzz kernel: Call Trace:
Apr 17 10:31:19 zzz kernel:  [<ffffffff813320d8>] dump_stack+0x4d/0x65
Apr 17 10:31:19 zzz kernel:  [<ffffffff81074221>] __warn+0xc1/0xe0
Apr 17 10:31:19 zzz kernel:  [<ffffffff81074348>] warn_slowpath_null+0x18/0x20
Apr 17 10:31:19 zzz kernel:  [<ffffffff8126e77c>] btrfs_destroy_inode+0x23c/0x2b0
Apr 17 10:31:19 zzz kernel:  [<ffffffff81185296>] destroy_inode+0x36/0x60
Apr 17 10:31:19 zzz kernel:  [<ffffffff811853e4>] evict+0x124/0x180
Apr 17 10:31:19 zzz kernel:  [<ffffffff81185b68>] iput+0x148/0x1d0
Apr 17 10:31:19 zzz kernel:  [<ffffffff8117b884>] do_unlinkat+0x194/0x2b0
Apr 17 10:31:19 zzz kernel:  [<ffffffff8117c0e6>] SyS_unlinkat+0x16/0x30
Apr 17 10:31:19 zzz kernel:  [<ffffffff817031db>] entry_SYSCALL_64_fastpath+0x13/0x8f
Apr 17 10:31:19 zzz kernel: ---[ end trace 87f6a0e10f4df484 ]---
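
Occurrence counts like the ones quoted above can be obtained by scanning a saved kernel log for the WARNING header line.  The following is a minimal sketch, assuming the log is available as a plain text stream; the helper name count_warnings() is hypothetical and this is not necessarily how the numbers in this report were gathered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count log lines that carry a "WARNING: " header and mention `func`
 * (plain substring match, e.g. "btrfs_destroy_inode").  Call-trace lines
 * that merely list the function are not counted, because they lack the
 * "WARNING: " header.  Lines longer than the buffer are read in pieces.
 */
static long count_warnings(FILE *log, const char *func)
{
	char line[1024];
	long hits = 0;

	assert(log != NULL && func != NULL);

	while (fgets(line, sizeof(line), log)) {
		if (strstr(line, "WARNING: ") && strstr(line, func))
			hits++;
	}
	return hits;
}
```

Passing a stream opened on saved log output together with "btrfs_destroy_inode" gives one hit per warning, regardless of how many stack frames follow it.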


* Re: WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode()
  2016-05-10 23:19 WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode() Eric Biggers
@ 2016-05-10 23:57 ` Chris Mason
  2016-05-11 18:25   ` Adam Borowski
  2016-05-14  0:56 ` WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode() Chris Mason
  1 sibling, 1 reply; 6+ messages in thread
From: Chris Mason @ 2016-05-10 23:57 UTC
  To: Eric Biggers; +Cc: linux-btrfs

On Tue, May 10, 2016 at 06:19:20PM -0500, Eric Biggers wrote:
> Hello,
> 
> The following warning has been triggering for me since about v4.6-rc3:
> 
> 	WARN_ON(BTRFS_I(inode)->csum_bytes);
> 
> On one machine the warning has occurred 657 times since v4.6-rc5.  On another it
> has occurred 3 times since v4.6-rc3.  Both are now on v4.6-rc7, where I have
> still observed the warning.  The warnings occur in groups, and do_unlinkat() and
> evict() are always in the call stack.
> 
> Is this a known issue?  Here is the first occurrence:

It is a known issue, but I'm having a very hard time triggering it
quickly enough to track it down.  I do know csum_bytes is only 4096 or
8192 when it hits, but beyond that I haven't been able to trigger it
consistently enough to test a patch.

But, I'm trying ;)

-chris
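
For readers unfamiliar with the check: csum_bytes is a per-inode counter of outstanding bytes that still need checksums, and the WARN_ON fires when it is nonzero at inode destruction.  The values 4096 and 8192 would correspond to one or two 4 KiB blocks of accounting that were never released, assuming a 4 KiB block size.  The following is a toy userspace model of that kind of imbalance, not kernel code; every name in it (toy_inode, toy_reserve, toy_release, toy_destroy) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for the per-inode counter; NOT the kernel's struct btrfs_inode. */
struct toy_inode {
	uint64_t csum_bytes;	/* bytes still accounted as needing checksums */
};

/* Account for bytes that a write is about to dirty. */
static void toy_reserve(struct toy_inode *ino, uint64_t bytes)
{
	ino->csum_bytes += bytes;
}

/* Drop accounting once the bytes are written back or the write is undone. */
static void toy_release(struct toy_inode *ino, uint64_t bytes)
{
	assert(bytes <= ino->csum_bytes);
	ino->csum_bytes -= bytes;
}

/*
 * Same spirit as WARN_ON(csum_bytes) at destroy time: complain if anything
 * is still accounted.  Returns 1 if accounting leaked, 0 if it balanced.
 */
static int toy_destroy(const struct toy_inode *ino)
{
	if (ino->csum_bytes) {
		fprintf(stderr, "leftover csum_bytes: %llu\n",
			(unsigned long long)ino->csum_bytes);
		return 1;
	}
	return 0;
}
```

Reserving 8192 bytes and releasing only 4096 leaves 4096 accounted, so toy_destroy() reports a leak; releasing the full 8192 makes it return 0.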


* Re: WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode()
  2016-05-10 23:57 ` Chris Mason
@ 2016-05-11 18:25   ` Adam Borowski
  2016-05-11 18:29     ` [PATCH for 4.6] btrfs: disable a spurious WARN_ON in btrfs_destroy_inode Adam Borowski
  0 siblings, 1 reply; 6+ messages in thread
From: Adam Borowski @ 2016-05-11 18:25 UTC
  To: Chris Mason, linux-btrfs, Eric Biggers

Chris Mason wrote:
> On Tue, May 10, 2016 at 06:19:20PM -0500, Eric Biggers wrote:
> > The following warning has been triggering for me since about v4.6-rc3:
> > 
> >       WARN_ON(BTRFS_I(inode)->csum_bytes);
> > 
> > On one machine the warning has occurred 657 times since v4.6-rc5.  On another it
> > has occurred 3 times since v4.6-rc3.  Both are now on v4.6-rc7, where I have
> > still observed the warning.  The warnings occur in groups, and do_unlinkat() and
> > evict() are always in the call stack.
>
> It is a known issue, but I'm having a very hard time triggering it
> quickly enough to track it down.  I do know csum_bytes is only 4096 or
> 8192 when it hits, but beyond that I haven't been able to trigger it
> consistently enough to test a patch.

It happens for me after no more than a few minutes of regular use that
includes unlink, destructive rename or snapshot deletion, even without
deliberately trying to trigger it.  On the other hand, repeating the same
operation that triggered the warning doesn't trigger it again (on live fs, I
did not try a whole-device snapshot of the filesystem).

Effective mount options for this fs:
/dev/sda1 on / type btrfs (rw,noatime,compress=lzo,ssd,space_cache,subvolid=4996,subvol=/sys)
/dev/sda1 on /var/cache type btrfs (rw,noatime,compress=lzo,ssd,space_cache,subvolid=263,subvol=/syscache)
/dev/sda1 on /home type btrfs (rw,noatime,compress=lzo,ssd,space_cache,subvolid=258,subvol=/home)
/dev/sda1 on /mnt/btr1 type btrfs (rw,noatime,compress=lzo,ssd,space_cache,subvolid=5,subvol=/)
/dev/sda1 on /home/kilobyte/.cache type btrfs (rw,noatime,compress=lzo,ssd,space_cache,subvolid=259,subvol=/kb-cache)
/dev/sda1 on /home/kilobyte/tmp type btrfs (rw,noatime,compress=lzo,ssd,space_cache,subvolid=260,subvol=/kb-tmp)
/dev/sda1 on /srv/chroots type btrfs (rw,noatime,compress=lzo,ssd,space_cache,subvolid=5,subvol=/chroots)

For almost two weeks I've been running with asserts etc. enabled, plus a
s/WARN_ON/printk/ patch that I'll send after this mail, and have seen no
data loss or inconsistency.  My use is mostly sbuild: snapshot + lots of
dpkg + lots of gcc + snapshot deletion.

Could you confirm my understanding?  I don't know btrfs internals, but it
appears to me that unsaved pending checksums attached to an inode don't
matter if the inode is being deleted anyway.  Is this correct?

If so, please apply the patch to silence the warning in non-ASSERTS builds;
people rightfully panic when they see a WARN with tainted kernel and so on,
believing the filesystem needs an immediate fsck + possibly restore.
Having such pending checksums means badness elsewhere -- they should have
been already saved at this point, right?  But no matter the cause, it
doesn't break btrfs_destroy_inode.

Meow!
-- 
How to exploit the Bible for weight loss:
Pr28:25: he that putteth his trust in the ʟᴏʀᴅ shall be made fat.


* [PATCH for 4.6] btrfs: disable a spurious WARN_ON in btrfs_destroy_inode.
  2016-05-11 18:25   ` Adam Borowski
@ 2016-05-11 18:29     ` Adam Borowski
  2016-05-11 20:17       ` Josef Bacik
  0 siblings, 1 reply; 6+ messages in thread
From: Adam Borowski @ 2016-05-11 18:29 UTC
  To: Chris Mason, linux-btrfs, Eric Biggers; +Cc: Josef Bacik, David Sterba

This happens a lot on real-world loads.  The issue is apparently benign,
as unsaved pending checksums are moot when the ship^Winode is going down
anyway.  Thus, no need to cause panic in users.

I've retained the warning in CONFIG_BTRFS_ASSERT builds, as this shouldn't
happen.  I've replaced the no longer helpful register+stack dump with a
printk that mentions the device affected.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
---
 fs/btrfs/inode.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2aaba58..ed78104 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9258,7 +9258,10 @@ void btrfs_destroy_inode(struct inode *inode)
 	WARN_ON(BTRFS_I(inode)->outstanding_extents);
 	WARN_ON(BTRFS_I(inode)->reserved_extents);
 	WARN_ON(BTRFS_I(inode)->delalloc_bytes);
-	WARN_ON(BTRFS_I(inode)->csum_bytes);
+#ifdef CONFIG_BTRFS_ASSERT
+	if (BTRFS_I(inode)->csum_bytes)
+		btrfs_info(root->fs_info, "btrfs_destroy_inode: leftover csum_bytes");
+#endif
 	WARN_ON(BTRFS_I(inode)->defrag_bytes);
 
 	/*
--
2.8.1


* Re: [PATCH for 4.6] btrfs: disable a spurious WARN_ON in btrfs_destroy_inode.
  2016-05-11 18:29     ` [PATCH for 4.6] btrfs: disable a spurious WARN_ON in btrfs_destroy_inode Adam Borowski
@ 2016-05-11 20:17       ` Josef Bacik
  0 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2016-05-11 20:17 UTC
  To: Adam Borowski, Chris Mason, linux-btrfs, Eric Biggers; +Cc: David Sterba

On 05/11/2016 11:29 AM, Adam Borowski wrote:
> This happens a lot on real-world loads.  The issue is apparently benign,
> as unsaved pending checksums are moot when the ship^Winode is going down
> anyway.  Thus, no need to cause panic in users.
>
> I've retained the warning in CONFIG_BTRFS_ASSERT builds, as this shouldn't
> happen.  I've replaced the no longer helpful register+stack dump with a
> printk that mentions the device affected.
>

Chris is working on fixing the problem, and these are real issues; we
should not hide them behind config options.  Thanks,

Josef


* Re: WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode()
  2016-05-10 23:19 WARNING at fs/btrfs/inode.c:9261 btrfs_destroy_inode() Eric Biggers
  2016-05-10 23:57 ` Chris Mason
@ 2016-05-14  0:56 ` Chris Mason
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Mason @ 2016-05-14  0:56 UTC
  To: Eric Biggers; +Cc: linux-btrfs

On Tue, May 10, 2016 at 06:19:20PM -0500, Eric Biggers wrote:
> Hello,
> 
> The following warning has been triggering for me since about v4.6-rc3:
> 
> 	WARN_ON(BTRFS_I(inode)->csum_bytes);
> 
> On one machine the warning has occurred 657 times since v4.6-rc5.  On another it
> has occurred 3 times since v4.6-rc3.  Both are now on v4.6-rc7, where I have
> still observed the warning.  The warnings occur in groups, and do_unlinkat() and
> evict() are always in the call stack.
> 
> Is this a known issue?  Here is the first occurrence:

I finally tracked this down today: it's a bug in how we deal with page
faults in the middle of a write.  We're testing the fix here and I'll
have a patch on Monday.

Strictly speaking, this is a regression in one of Chandan's setup patches
for PAGE_SIZE > sectorsize, but the code was much too subtle to blame
him.  It should be possible to trigger with a non-aligned multi-page
write; I'm trying to nail down a reliable test program so we don't make
the same mistake again.

-chris
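
A minimal sketch of the kind of test program described above.  It is not the test that was eventually written and it is not known to reproduce the warning; it only shows the shape of a non-aligned multi-page write whose source pages may need to be faulted in during the copy.  The helper name, the offsets (57 and 123), the length, and the sparse source file are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SRC_PAGES 5	/* size of the source file in pages (arbitrary) */

/*
 * Hypothetical helper: create a sparse source file, map it, and issue one
 * pwrite() to dst_path whose source pointer, destination offset and length
 * are all unaligned and span several pages.  The mapping is fresh, so the
 * kernel may have to fault source pages in while copying.  Both files are
 * closed and then unlinked, so the destination inode is evicted through
 * the unlink path.  Returns 0 if the whole range was written, else -1.
 */
static int unaligned_multipage_write(const char *src_path, const char *dst_path)
{
	long page = sysconf(_SC_PAGESIZE);
	char *map = MAP_FAILED;
	size_t map_len, len;
	ssize_t n;
	int src, dst = -1;
	int ret = -1;

	assert(page > 0);
	map_len = (size_t)page * SRC_PAGES;
	len = (size_t)page * 3 + 100;	/* three pages plus an unaligned tail */

	src = open(src_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (src < 0)
		return -1;
	if (ftruncate(src, (off_t)map_len) != 0)
		goto out;

	map = mmap(NULL, map_len, PROT_READ, MAP_PRIVATE, src, 0);
	if (map == MAP_FAILED)
		goto out;

	dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (dst < 0)
		goto out;

	/* Source starts 57 bytes into the mapping; destination at offset 123. */
	n = pwrite(dst, map + 57, len, 123);
	if (n == (ssize_t)len)
		ret = 0;
out:
	if (map != MAP_FAILED)
		munmap(map, map_len);
	if (dst >= 0) {
		close(dst);
		unlink(dst_path);
	}
	close(src);
	unlink(src_path);
	return ret;
}
```

To exercise btrfs, both paths would need to point at a btrfs mount; on any other filesystem the call simply performs an ordinary write and cleans up after itself.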

