Re: [PATCH v2] xfs: test inode allocation state missmatch corruption


From: Dave Chinner <david@fromorbit.com>
To: Zorro Lang <zlang@redhat.com>
Cc: fstests@vger.kernel.org
Subject: Re: [PATCH v2] xfs: test inode allocation state missmatch corruption
Date: Fri, 18 May 2018 13:59:39 +1000
Message-ID: <20180518035939.GI23861@dastard>
In-Reply-To: <20180516081832.GA4893@hp-dl360g9-06.rhts.eng.pek2.redhat.com>

On Wed, May 16, 2018 at 04:18:32PM +0800, Zorro Lang wrote:
> On Sat, May 12, 2018 at 09:32:39AM +1000, Dave Chinner wrote:
> > On Sat, May 12, 2018 at 12:11:27AM +0800, Zorro Lang wrote:
> > > There's a situation where the directory structure and the inobt
> > > thinks the inode is free, but the inode on disk thinks it is still
> > > in use. XFS should detect it and prevent the kernel from oopsing
> > > on lookup.
> > 
> > Isn't this testing the same thing that I recently posted "xfs: test
> > inobt/on disk free state mismatches" for?
> 
> Hi Dave,
> 
> Last week I replied you that we wrote test case for same bug. But I just found
> I can't reproduce any bugs on rhel-7.4 kernel and upstream 4.16 kernel by
> your case [1], is there anything I misunderstood?
> 
> But the case which I wrote for this bug [3] can trigger failures [2]. Even
> ignore that dmesg error (which I don't know if it's related with this bug),
> it still can trigger an error. And this error can't be triggered after merged
> your patch.
> 
> Thanks,
> Zorro
> 
> 
> [1]
> # ./check xfs/132
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/x86_64 ibm-x3650m4-10 4.16.7-200.fc27.x86_64
> MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/fedora-scratchdev
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/fedora-scratchdev /mnt/scratch
> 
> xfs/132  3s
> Ran: xfs/132
> Passed all 1 tests

What did it dump in dmesg? It may be that the filesystem shut down
because of a cascading failure, but I never ran the test on
non-debug builds so I've got no idea what behaviour to expect there.
Whoever runs xfstests without debug being enabled?  :)
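
As a rough sketch of answering the dmesg question, the kernel log can be
filtered for the kinds of messages that matter here. scan_dmesg is a
hypothetical helper, not part of fstests, and its pattern list is an
assumption (a kernel BUG, a KASAN report, or an XFS corruption or
shutdown message), not the list that _check_dmesg uses.

```shell
#!/bin/sh
# scan_dmesg: hypothetical helper, not part of fstests.
# Reads kernel log text on stdin and prints lines that look like a BUG,
# a KASAN report, or an XFS corruption/shutdown message.
# The pattern list is an assumption, not what _check_dmesg uses.
scan_dmesg() {
    grep -E -i 'kernel BUG at|BUG: KASAN|XFS.*(corruption|internal error|shutdown|shutting down)'
}

# Usage: filter the current kernel log (dmesg may need root).
dmesg 2>/dev/null | scan_dmesg || echo "no suspicious kernel messages found"
```

An empty result only means none of these patterns matched; it does not
show that the kernel is unaffected.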

It may be that you need KASAN turned on to catch the failure - this
version of the test corrupts dentry cache memory and KASAN was
reliably catching that on unfixed kernels. Memory corruption may not
be immediately visible on non-debug kernels - it's guaranteed to
kill the machine sooner or later, though.
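
A minimal sketch of checking whether the running kernel was built with
the relevant debug options, assuming the distro installs its config as
/boot/config-$(uname -r) and that "debug" here corresponds to
CONFIG_XFS_DEBUG. has_kconfig is a hypothetical helper, not an fstests
function.

```shell
#!/bin/sh
# has_kconfig OPTION [CONFIG_FILE]
# Hypothetical helper: succeeds if OPTION is set to y or m in the given
# kernel config file. Defaults to /boot/config-$(uname -r), which is an
# assumption about where the distro installs the config.
has_kconfig() {
    opt=$1
    cfg=${2:-/boot/config-$(uname -r)}
    [ -r "$cfg" ] || return 2
    grep -Eq "^${opt}=(y|m)\$" "$cfg"
}

# Report on the two options discussed above.
for opt in CONFIG_KASAN CONFIG_XFS_DEBUG; do
    if has_kconfig "$opt"; then
        echo "$opt: enabled"
    else
        echo "$opt: not enabled (or config file not readable)"
    fi
done
```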

> [2]
> # ./check xfs/999
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/x86_64 ibm-x3650m4-10 4.16.7-200.fc27.x86_64+debug
> MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/fedora-scratchdev
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/fedora-scratchdev /mnt/scratch
> 
> xfs/999 4s ... - output mismatch (see /root/git/xfstests-zlang/results//xfs/999.out.bad)
>     --- tests/xfs/999.out       2018-05-11 11:12:21.129590901 -0400
>     +++ /root/git/xfstests-zlang/results//xfs/999.out.bad       2018-05-16 04:04:39.958393813 -0400
>     @@ -1,2 +1,2 @@
>      QA output created by 999
>     -SCRATCH_MNT/dir/newfile: Structure needs cleaning
>     +_check_dmesg: something found in dmesg (see /root/git/xfstests-zlang/results//xfs/999.dmesg)
>     ...
>     (Run 'diff -u tests/xfs/999.out /root/git/xfstests-zlang/results//xfs/999.out.bad'  to see the entire diff)
> 
> Ran: xfs/999
> Failures: xfs/999
> Failed 1 of 1 tests
> 
> # dmesg
> [ 1160.076247] ------------[ cut here ]------------
> [ 1160.081403] kernel BUG at lib/list_debug.c:31!
> [ 1160.086399] invalid opcode: 0000 [#1] SMP PTI
> [ 1160.091274] Modules linked in: sunrpc intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate igb intel_uncore ptp iTCO_wdt iTCO_vendor_support pps_core intel_rapl_perf wmi tpm_tis ipmi_ssif tpm_tis_core cdc_ether usbnet mii tpm i2c_i801 ipmi_si ipmi_devintf lpc_ich ipmi_msghandler shpchp ioatdma dca xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper ttm drm crc32c_intel megaraid_sas
> [ 1160.138576] CPU: 21 PID: 2746 Comm: xfs_io Not tainted 4.16.7-200.fc27.x86_64+debug #1
> [ 1160.147412] Hardware name: IBM System x3650 M4 -[7915ON3]-/00J6520, BIOS -[VVE124AUS-1.30]- 11/21/2012
> [ 1160.157809] RIP: 0010:__list_add_valid+0x61/0x70
> [ 1160.162963] RSP: 0018:ffffb9a28709b970 EFLAGS: 00010282
> [ 1160.168796] RAX: 0000000000000058 RBX: ffff8c6da4a7da30 RCX: 0000000000000000
> [ 1160.176760] RDX: 0000000000000000 RSI: ffff8c6db71d6c48 RDI: ffff8c6db71d6c48
> [ 1160.184723] RBP: ffff8c6db1fe3000 R08: 0000000000000001 R09: 0000000000000000
> [ 1160.192687] R10: ffffb9a28709b8f0 R11: 0000000000000000 R12: ffff8c6da4a7dc10
> [ 1160.200651] R13: ffff8c6da4a7dc10 R14: ffff8c6db1fe3a88 R15: ffff8c6daff70360
> [ 1160.208616] FS:  00007f5911fad840(0000) GS:ffff8c6db7000000(0000) knlGS:0000000000000000
> [ 1160.217646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1160.224058] CR2: 0000564ff49e3290 CR3: 0000000467b92001 CR4: 00000000000606e0
> [ 1160.232023] Call Trace:
> [ 1160.234756]  inode_sb_list_add+0x47/0x80
> [ 1160.239200]  xfs_setup_inode+0x28/0x160 [xfs]
> [ 1160.244093]  xfs_ialloc+0x30d/0x520 [xfs]
> [ 1160.248600]  xfs_dir_ialloc+0x74/0x240 [xfs]

So this has allocated an inode that is already cached in memory,
which is a different symptom of the same problem. 

i.e. there are two fixes for the problem. The initial cold-cache
test fixes were in commit ee457001ed6c ("xfs: catch inode allocation
state mismatch corruption"). The hot-cache fixes were in commit
afca6c5b2595 ("xfs: validate cached inodes are free when
allocated"), and the commit message says:

	We recently fixed a similar inode allocation issue caused by
	inobt record corruption problem in xfs_iget_cache_miss() in
	commit ee457001ed6c ("xfs: catch inode allocation state
	mismatch corruption"). This change adds similar checks to
	the cache-hit path to catch it, and turns the reproducer
	into a corruption shutdown situation.

IOWs, the tests are exercising the same corruption, just through
different code paths. So it would seem that we need both tests...
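
Since each test depends on a different fix, it can help to know which
of the two commits a given kernel tree contains. A sketch, assuming an
upstream git checkout: has_commit is a hypothetical helper and
KERNEL_TREE is an assumed variable pointing at that checkout.

```shell
#!/bin/sh
# has_commit TREE COMMIT
# Hypothetical helper: succeeds if COMMIT is an ancestor of HEAD in the
# git tree at TREE.
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}

# KERNEL_TREE is an assumed variable naming a kernel git checkout.
tree=${KERNEL_TREE:-.}
for fix in ee457001ed6c afca6c5b2595; do
    if has_commit "$tree" "$fix"; then
        echo "$fix: present"
    else
        echo "$fix: missing (or $tree is not a kernel git tree)"
    fi
done
```

This only works for trees that share upstream history. A distro kernel
that backported the fixes carries them under different commit IDs, so a
"missing" result there is not conclusive.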

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 10+ messages
2018-05-11 16:11 [PATCH v2] xfs: test inode allocation state missmatch corruption Zorro Lang
2018-05-11 23:32 ` Dave Chinner
2018-05-12 13:18   ` Zorro Lang
2018-05-16  8:18   ` Zorro Lang
2018-05-18  3:59     ` Dave Chinner [this message]
2018-05-18  5:26       ` Zorro Lang
  -- strict thread matches above, loose matches on Subject: below --
2018-05-21  1:58 Zorro Lang
2018-05-21  9:04 ` Eryu Guan
2018-05-21 15:55   ` Darrick J. Wong
2018-05-21 22:50     ` Dave Chinner