public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Yujie Liu <yujie.liu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	fengwei.yin@intel.com, ying.huang@intel.com,
	fstests@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs]
Date: Thu, 27 Oct 2022 09:20:46 -0700	[thread overview]
Message-ID: <Y1qv3ku6ULHy8hOF@magnolia> (raw)
In-Reply-To: <2543dfb1-d9dc-0888-dbea-e420a19d732c@intel.com>

On Thu, Oct 27, 2022 at 10:24:51AM +0800, Yujie Liu wrote:
> On 10/27/2022 10:09, Yujie Liu wrote:
> > On Wed, Oct 26, 2022 at 02:06:17PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 26, 2022 at 11:51:14AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 26, 2022 at 05:10:28PM +0800, kernel test robot wrote:
> > > > > Hi Peter,
> > > > > 
> > > > > For below patch, we couldn't find any connection between the code
> > > > > change of kallsyms and xfstests testcase, but we got very stable test
> > > > > results. We tested commit f1389181622a and its parent commit
> > > > > 7825451fa4dc for 12 runs each, parent was 100% good while this commit
> > > > > was 100% bad, so we still send out this report FYI though we don't have
> > > > > clear clue of the root cause. Please check the details below if
> > > > > interested in further investigation. Thanks.
> > > > 
> > > > *groan* I'll go have a poke.
> > > > 
> > > > Also; I've had these patches in my queue.git for over a month now and
> > > > this is the first report, how comes?
> > 
> > Hi Peter,
> > 
> > Sorry about late report. We did catch this problem on
> > call-depth-tracking branch of queue.git in Semtember, but we used to
> > think it may be a false alarm since we cannot connect the code change
> > with xfstests, so didn't report at that time. After merged to x86 tip,
> > we did more tests and reviewed this problem with more developers, and
> > sent out the report finally.
> > 
> > > dmesg starts at 42 seconds, you don't happen to have a complete one?
> > 
> > Sorry for the incomplete dmesg. Please check if attached kmsg.xz helps?
> > The time may vary a little since it is captured from another run.
> > Thanks.
> > 
> > Not sure if this has anything to do with xfs or testcases, so add
> > xfs folks to consult.
> > 
> > Hi xfs folks,
> > 
> > Could you please help check the original report at
> > https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
> > to see if can find any clue for the problem? Thanks.
> 
> Sorry, wrong link. Should be:
> 
> https://lore.kernel.org/all/202210241614.2ae4c1f5-yujie.liu@intel.com/

xfs/439 corrupts the log and checks that log recovery will notice the
corruption and abort the mount.

Excerpted dmesg from the link above:

[   75.708690][ T3668] ------------[ cut here ]------------
[   75.714041][ T3668] WARNING: CPU: 1 PID: 3668 at fs/xfs/xfs_message.c:104 xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs]
[   75.779828][ T3668] CPU: 1 PID: 3668 Comm: mount Not tainted 6.1.0-rc1-00053-gf1389181622a #1
[   75.788389][ T3668] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
[   75.796864][ T3668] RIP: 0010:xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs]
[ 75.898077][ T3668] xlog_find_verify_log_record (fs/xfs/xfs_log_recover.c:431) xfs
[ 75.910856][ T3668] xlog_find_head (fs/xfs/xfs_log_recover.c:705) xfs
[ 75.925847][ T3668] xlog_find_tail (fs/xfs/xfs_log_recover.c:1256) xfs
[ 75.951249][ T3668] xlog_recover (fs/xfs/xfs_log_recover.c:3355) xfs
[ 75.961324][ T3668] xfs_log_mount (fs/xfs/xfs_log.c:739) xfs
[ 75.966486][ T3668] xfs_mountfs (fs/xfs/xfs_mount.c:805 (discriminator 4)) xfs
[ 75.998070][ T3668] xfs_fs_fill_super (fs/xfs/xfs_super.c:1666) xfs

So yes, this is xlog_find_verify_log_record behaving as intended:

		if (i < start_blk) {
			/* valid log record not found */
			xfs_warn(log->l_mp,
		"Log inconsistent (didn't find previous header)");
			ASSERT(0);  <------------------ here
			error = -EFSCORRUPTED;
			goto out;
		}

However, I noticed that the callsite in the WARNING: message has changed
from the usual 'asswarn' (which is the caller of WARN_ON) to
'xfs_buf_alert_ratelimited', which seems totally wrong since XFS log
recovery doesn't touch xfs_buf objects at all.

The reason this turned into a fstests regression is that xfs/439 uses
_filter_assert_dmesg to check for warnings in dmesg that are not the
result of an ASSERT failing (common/rc in fstests):

	# We generate assert related WARNINGs on purpose and make sure
	# test doesn't fail because of these warnings. This is a helper
	# for _check_dmesg() to filter out them.
	_filter_assert_dmesg()
	{
		local warn1="WARNING:.*fs/xfs/xfs_message\.c:.*asswarn.*"
		local warn2="WARNING:.*fs/xfs/xfs_message\.c:.*assfail.*"
		sed -e "s#$warn1#Intentional warnings in asswarn#" \
		    -e "s#$warn2#Intentional warnings in assfail#"
	}

As for why that commit reports the wrong caller ... I don't know.  The
function name and line number are correct.

--D

> 
> --
> Best Regards,
> Yujie

  reply	other threads:[~2022-10-27 16:23 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <202210241614.2ae4c1f5-yujie.liu@intel.com>
     [not found] ` <Y1kDEmLeRA2UGeF8@hirez.programming.kicks-ass.net>
     [not found]   ` <Y1kiuTIYobR4nexS@hirez.programming.kicks-ass.net>
2022-10-27  2:09     ` [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs] Yujie Liu
2022-10-27  2:24       ` Yujie Liu
2022-10-27 16:20         ` Darrick J. Wong [this message]
2022-10-27 16:51           ` Peter Zijlstra
2022-10-28 19:52             ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y1qv3ku6ULHy8hOF@magnolia \
    --to=djwong@kernel.org \
    --cc=fengwei.yin@intel.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=oe-lkp@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    --cc=yujie.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox