From: Dave Chinner <david@fromorbit.com>
To: Oliver Sang <oliver.sang@intel.com>
Cc: Guo Xuenan <guoxuenan@huawei.com>,
lkp@lists.01.org, lkp@intel.com, Hou Tao <houtao1@huawei.com>,
linux-xfs@vger.kernel.org
Subject: Re: [xfs] a1df10d42b: xfstests.generic.31*.fail
Date: Mon, 10 Oct 2022 11:07:40 +1100
Message-ID: <20221010000740.GU3600936@dread.disaster.area>
In-Reply-To: <Y0J1oxBFwW53udvJ@xsang-OptiPlex-9020>
On Sun, Oct 09, 2022 at 03:17:55PM +0800, Oliver Sang wrote:
> Hi Dave,
>
> On Thu, Oct 06, 2022 at 08:35:43AM +1100, Dave Chinner wrote:
> > On Wed, Oct 05, 2022 at 09:45:12PM +0800, kernel test robot wrote:
> > >
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with gcc-11):
> > >
> > > commit: a1df10d42ba99c946f6a574d4d31951bc0a57e33 ("xfs: fix exception caused by unexpected illegal bestcount in leaf dir")
> > > url: https://github.com/intel-lab-lkp/linux/commits/UPDATE-20220929-162751/Guo-Xuenan/xfs-fix-uaf-when-leaf-dir-bestcount-not-match-with-dir-data-blocks/20220831-195920
> > >
> > > in testcase: xfstests
> > > version: xfstests-x86_64-5a5e419-1_20220927
> > > with following parameters:
> > >
> > > disk: 4HDD
> > > fs: xfs
> > > test: generic-group-15
> > >
> > > test-description: xfstests is a regression test suite for xfs and other filesystems.
> > > test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> > >
> > >
> > > on test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz (Ivy Bridge) with 8G memory
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> > The attached dmesg ends at:
> >
> > [...]
> > [ 102.727610][ T315] generic/309 IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
> > [ 102.727630][ T315]
> > [ 103.884498][ T7407] XFS (sda1): EXPERIMENTAL online scrub feature in use. Use at your own risk!
> > [ 103.993962][ T7431] XFS (sda1): Unmounting Filesystem
> > [ 104.193659][ T7580] XFS (sda1): Mounting V5 Filesystem
> > [ 104.221178][ T7580] XFS (sda1): Ending clean mount
> > [ 104.223821][ T7580] xfs filesystem being mounted at /fs/sda1 supports timestamps until 2038 (0x7fffffff)
> > [ 104.285615][ T315] 2s
> > [ 104.285629][ T315]
> > [ 104.339232][ T1469] run fstests generic/310 at 2022-10-01 13:36:36
> > (END)
> >
> > The start of the failed test. Do you have the logs from generic/310
> > so we might have some idea what corruption/shutdown event occurred
> > during that test run?
>
> sorry for that. I attached dmesg for another run.
[ 109.424124][ T1474] run fstests generic/310 at 2022-10-01 10:14:01
[ 169.865043][ T7563] XFS (sda1): Metadata corruption detected at xfs_dir3_leaf_check_int+0x381/0x600 [xfs], xfs_dir3_leafn block 0x4000088
[ 169.865406][ T7563] XFS (sda1): Unmount and run xfs_repair
[ 169.865510][ T7563] XFS (sda1): First 128 bytes of corrupted metadata buffer:
[ 169.865639][ T7563] 00000000: 00 80 00 01 00 00 00 00 3d ff 00 00 00 00 00 00 ........=.......
[ 169.865793][ T7563] 00000010: 00 00 00 00 04 00 00 88 00 00 00 00 00 00 00 00 ................
[ 169.865945][ T7563] 00000020: 27 64 dd b1 81 61 45 2b 86 66 64 67 56 f2 40 58 'd...aE+.fdgV.@X
[ 169.866122][ T7563] 00000030: 00 00 00 00 00 00 00 87 00 fc 00 00 00 00 00 00 ................
[ 169.866293][ T7563] 00000040: 00 00 00 2e 00 00 00 08 00 00 00 31 00 00 00 0c ...........1....
[ 169.866467][ T7563] 00000050: 00 00 00 32 00 00 00 0e 00 00 00 33 00 00 00 10 ...2.......3....
[ 169.866640][ T7563] 00000060: 00 00 00 34 00 00 00 12 00 00 00 35 00 00 00 14 ...4.......5....
[ 169.866816][ T7563] 00000070: 00 00 00 36 00 00 00 16 00 00 00 37 00 00 00 18 ...6.......7....
[ 169.867002][ T7563] XFS (sda1): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x508/0x600 [xfs] (fs/xfs/xfs_buf.c:1552). Shutting down filesystem.

I don't see any corruption in the leafn header or the first few hash
entries there. It does say it has 0xfc entries in the block, which
is correct for a full leaf of hash pointers. It has no stale
entries, which is correct according to what the test does (it
does not remove directory entries at all). It has a forward pointer
but no backwards pointer, which is expected as the hash values tell
me this should be the left-most leaf block in the tree.
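
For readers who want to check those header fields themselves, here is a
minimal Python sketch that decodes the 128 bytes from the dmesg dump
above. The field layout is an assumption based on struct
xfs_da3_blkinfo and struct xfs_dir3_leaf_hdr as found in
fs/xfs/libxfs/xfs_da_format.h (all fields big-endian, 64-byte header,
then 8-byte hashval/address pairs); verify it against the tree you are
actually building. The names DUMP, LEAF_HDR, FIELDS, hdr and entries
are illustrative only.

```python
import struct

# First 128 bytes of the buffer, copied from the dmesg hex dump.
DUMP = """
00 80 00 01 00 00 00 00 3d ff 00 00 00 00 00 00
00 00 00 00 04 00 00 88 00 00 00 00 00 00 00 00
27 64 dd b1 81 61 45 2b 86 66 64 67 56 f2 40 58
00 00 00 00 00 00 00 87 00 fc 00 00 00 00 00 00
00 00 00 2e 00 00 00 08 00 00 00 31 00 00 00 0c
00 00 00 32 00 00 00 0e 00 00 00 33 00 00 00 10
00 00 00 34 00 00 00 12 00 00 00 35 00 00 00 14
00 00 00 36 00 00 00 16 00 00 00 37 00 00 00 18
"""
buf = bytes.fromhex("".join(DUMP.split()))

# Assumed on-disk layout (big-endian, no padding):
#   forw, back (be32), magic, pad (be16), crc (be32),
#   blkno, lsn (be64), uuid (16 bytes), owner (be64),
#   count, stale (be16), pad (be32)
LEAF_HDR = struct.Struct(">IIHHIQQ16sQHHI")
FIELDS = ("forw", "back", "magic", "pad", "crc", "blkno", "lsn",
          "uuid", "owner", "count", "stale", "pad2")

hdr = dict(zip(FIELDS, LEAF_HDR.unpack_from(buf, 0)))

# Everything after the header is (hashval, address) pairs.
entries = list(struct.iter_unpack(">II", buf[LEAF_HDR.size:]))

for name in ("magic", "forw", "back", "blkno", "owner", "count", "stale"):
    print(f"{name:6} = {hdr[name]:#x}")
for hashval, address in entries:
    print(f"hashval {hashval:#04x} -> address {address:#04x}")
```

Under that layout the dump decodes to magic 0x3dff, count 0xfc, stale 0,
a non-zero forw with back 0, and blkno 0x4000088, which matches the
block number in the corruption message. The visible hash values are in
ascending order starting at 0x2e.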

The error has been detected at write time, which means the problem
was detected before it got written to disk. But I don't see what
code in xfs_dir3_leaf_check_int() is even triggering a warning on a
leafn block here - what line of code does
xfs_dir3_leaf_check_int+0x381/0x600 actually resolve to?

.....

<nnngggghhh>

No wonder I can't reproduce this locally.

commit a1df10d42ba99c946f6a574d4d31951bc0a57e33 *does not exist in
the upstream xfs-dev tree*. The URL provided pointing to the commit
above resolves to a "404 page not found" error, so I have no idea
what code was even being tested here.

AFAICT, the patch being tested is this one (based on the github url
matching the patch title):

https://lore.kernel.org/linux-xfs/20220831121639.3060527-1-guoxuenan@huawei.com/

Which I NACKed almost a whole month ago! The latest revision of the
patch was posted 2 days ago here:

https://lore.kernel.org/linux-xfs/20221008033624.1237390-1-guoxuenan@huawei.com/

Intel kernel robot maintainers: I've just wasted the best part of 2
hours trying to reproduce and track down a corruption bug that this
report led me to believe was in the upstream XFS tree.

You need to make it very clear that your bug report is for a commit
that *hasn't been merged into an upstream tree*. The CI robot
noticed a bug in an *old* NACKed patch, not a bug in a new upstream
commit. Please make it *VERY CLEAR* where the code the CI robot is
testing has come from.

Not happy.

--
Dave Chinner
david@fromorbit.com