* Re: [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs]
[not found] ` <Y1kiuTIYobR4nexS@hirez.programming.kicks-ass.net>
@ 2022-10-27 2:09 ` Yujie Liu
2022-10-27 2:24 ` Yujie Liu
0 siblings, 1 reply; 5+ messages in thread
From: Yujie Liu @ 2022-10-27 2:09 UTC (permalink / raw)
To: Peter Zijlstra
Cc: oe-lkp, lkp, Thomas Gleixner, linux-kernel, x86, fengwei.yin,
ying.huang, fstests, Darrick J. Wong, linux-xfs
[-- Attachment #1: Type: text/plain, Size: 1768 bytes --]
On Wed, Oct 26, 2022 at 02:06:17PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 26, 2022 at 11:51:14AM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 26, 2022 at 05:10:28PM +0800, kernel test robot wrote:
> > > Hi Peter,
> > >
> > > For below patch, we couldn't find any connection between the code
> > > change of kallsyms and xfstests testcase, but we got very stable test
> > > results. We tested commit f1389181622a and its parent commit
> > > 7825451fa4dc for 12 runs each, parent was 100% good while this commit
> > > was 100% bad, so we still send out this report FYI though we don't have
> > > clear clue of the root cause. Please check the details below if
> > > interested in further investigation. Thanks.
> >
> > *groan* I'll go have a poke.
> >
> > Also; I've had these patches in my queue.git for over a month now and
> > this is the first report, how comes?
Hi Peter,
Sorry about late report. We did catch this problem on
call-depth-tracking branch of queue.git in Semtember, but we used to
think it may be a false alarm since we cannot connect the code change
with xfstests, so didn't report at that time. After merged to x86 tip,
we did more tests and reviewed this problem with more developers, and
sent out the report finally.
> dmesg starts at 42 seconds, you don't happen to have a complete one?
Sorry for the incomplete dmesg. Please check if attached kmsg.xz helps?
The time may vary a little since it is captured from another run.
Thanks.
Not sure if this has anything to do with xfs or testcases, so add
xfs folks to consult.
Hi xfs folks,
Could you please help check the original report at
https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
to see if can find any clue for the problem? Thanks.
--
Best Regards,
Yujie
[-- Attachment #2: kmsg.xz --]
[-- Type: application/x-xz, Size: 30300 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs]
2022-10-27 2:09 ` [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs] Yujie Liu
@ 2022-10-27 2:24 ` Yujie Liu
2022-10-27 16:20 ` Darrick J. Wong
0 siblings, 1 reply; 5+ messages in thread
From: Yujie Liu @ 2022-10-27 2:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: oe-lkp, lkp, Thomas Gleixner, linux-kernel, x86, fengwei.yin,
ying.huang, fstests, Darrick J. Wong, linux-xfs
On 10/27/2022 10:09, Yujie Liu wrote:
> On Wed, Oct 26, 2022 at 02:06:17PM +0200, Peter Zijlstra wrote:
>> On Wed, Oct 26, 2022 at 11:51:14AM +0200, Peter Zijlstra wrote:
>>> On Wed, Oct 26, 2022 at 05:10:28PM +0800, kernel test robot wrote:
>>>> Hi Peter,
>>>>
>>>> For below patch, we couldn't find any connection between the code
>>>> change of kallsyms and xfstests testcase, but we got very stable test
>>>> results. We tested commit f1389181622a and its parent commit
>>>> 7825451fa4dc for 12 runs each, parent was 100% good while this commit
>>>> was 100% bad, so we still send out this report FYI though we don't have
>>>> clear clue of the root cause. Please check the details below if
>>>> interested in further investigation. Thanks.
>>>
>>> *groan* I'll go have a poke.
>>>
>>> Also; I've had these patches in my queue.git for over a month now and
>>> this is the first report, how comes?
>
> Hi Peter,
>
> Sorry about late report. We did catch this problem on
> call-depth-tracking branch of queue.git in Semtember, but we used to
> think it may be a false alarm since we cannot connect the code change
> with xfstests, so didn't report at that time. After merged to x86 tip,
> we did more tests and reviewed this problem with more developers, and
> sent out the report finally.
>
>> dmesg starts at 42 seconds, you don't happen to have a complete one?
>
> Sorry for the incomplete dmesg. Please check if attached kmsg.xz helps?
> The time may vary a little since it is captured from another run.
> Thanks.
>
> Not sure if this has anything to do with xfs or testcases, so add
> xfs folks to consult.
>
> Hi xfs folks,
>
> Could you please help check the original report at
> https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
> to see if can find any clue for the problem? Thanks.
Sorry, wrong link. Should be:
https://lore.kernel.org/all/202210241614.2ae4c1f5-yujie.liu@intel.com/
--
Best Regards,
Yujie
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs]
2022-10-27 2:24 ` Yujie Liu
@ 2022-10-27 16:20 ` Darrick J. Wong
2022-10-27 16:51 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2022-10-27 16:20 UTC (permalink / raw)
To: Yujie Liu
Cc: Peter Zijlstra, oe-lkp, lkp, Thomas Gleixner, linux-kernel, x86,
fengwei.yin, ying.huang, fstests, linux-xfs
On Thu, Oct 27, 2022 at 10:24:51AM +0800, Yujie Liu wrote:
> On 10/27/2022 10:09, Yujie Liu wrote:
> > On Wed, Oct 26, 2022 at 02:06:17PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 26, 2022 at 11:51:14AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 26, 2022 at 05:10:28PM +0800, kernel test robot wrote:
> > > > > Hi Peter,
> > > > >
> > > > > For below patch, we couldn't find any connection between the code
> > > > > change of kallsyms and xfstests testcase, but we got very stable test
> > > > > results. We tested commit f1389181622a and its parent commit
> > > > > 7825451fa4dc for 12 runs each, parent was 100% good while this commit
> > > > > was 100% bad, so we still send out this report FYI though we don't have
> > > > > clear clue of the root cause. Please check the details below if
> > > > > interested in further investigation. Thanks.
> > > >
> > > > *groan* I'll go have a poke.
> > > >
> > > > Also; I've had these patches in my queue.git for over a month now and
> > > > this is the first report, how comes?
> >
> > Hi Peter,
> >
> > Sorry about late report. We did catch this problem on
> > call-depth-tracking branch of queue.git in Semtember, but we used to
> > think it may be a false alarm since we cannot connect the code change
> > with xfstests, so didn't report at that time. After merged to x86 tip,
> > we did more tests and reviewed this problem with more developers, and
> > sent out the report finally.
> >
> > > dmesg starts at 42 seconds, you don't happen to have a complete one?
> >
> > Sorry for the incomplete dmesg. Please check if attached kmsg.xz helps?
> > The time may vary a little since it is captured from another run.
> > Thanks.
> >
> > Not sure if this has anything to do with xfs or testcases, so add
> > xfs folks to consult.
> >
> > Hi xfs folks,
> >
> > Could you please help check the original report at
> > https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
> > to see if can find any clue for the problem? Thanks.
>
> Sorry, wrong link. Should be:
>
> https://lore.kernel.org/all/202210241614.2ae4c1f5-yujie.liu@intel.com/
xfs/439 corrupts the log and checks that log recovery will notice the
corruption and abort the mount.
Excerpted dmesg from the link above:
[ 75.708690][ T3668] ------------[ cut here ]------------
[ 75.714041][ T3668] WARNING: CPU: 1 PID: 3668 at fs/xfs/xfs_message.c:104 xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs]
[ 75.779828][ T3668] CPU: 1 PID: 3668 Comm: mount Not tainted 6.1.0-rc1-00053-gf1389181622a #1
[ 75.788389][ T3668] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
[ 75.796864][ T3668] RIP: 0010:xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs]
[ 75.898077][ T3668] xlog_find_verify_log_record (fs/xfs/xfs_log_recover.c:431) xfs
[ 75.910856][ T3668] xlog_find_head (fs/xfs/xfs_log_recover.c:705) xfs
[ 75.925847][ T3668] xlog_find_tail (fs/xfs/xfs_log_recover.c:1256) xfs
[ 75.951249][ T3668] xlog_recover (fs/xfs/xfs_log_recover.c:3355) xfs
[ 75.961324][ T3668] xfs_log_mount (fs/xfs/xfs_log.c:739) xfs
[ 75.966486][ T3668] xfs_mountfs (fs/xfs/xfs_mount.c:805 (discriminator 4)) xfs
[ 75.998070][ T3668] xfs_fs_fill_super (fs/xfs/xfs_super.c:1666) xfs
So yes, this is xlog_find_verify_log_record behaving as intended:
if (i < start_blk) {
/* valid log record not found */
xfs_warn(log->l_mp,
"Log inconsistent (didn't find previous header)");
ASSERT(0); <------------------ here
error = -EFSCORRUPTED;
goto out;
}
However, I noticed that the callsite in the WARNING: message has changed
from the usual 'asswarn' (which is the caller of WARN_ON) to
'xfs_buf_alert_ratelimited', which seems totally wrong since XFS log
recovery doesn't touch xfs_buf objects at all.
The reason this turned into a fstests regression is that xfs/439 uses
_filter_assert_dmesg to check for warnings in dmesg that are not the
result of an ASSERT failing (common/rc in fstests):
# We generate assert related WARNINGs on purpose and make sure
# test doesn't fail because of these warnings. This is a helper
# for _check_dmesg() to filter out them.
_filter_assert_dmesg()
{
local warn1="WARNING:.*fs/xfs/xfs_message\.c:.*asswarn.*"
local warn2="WARNING:.*fs/xfs/xfs_message\.c:.*assfail.*"
sed -e "s#$warn1#Intentional warnings in asswarn#" \
-e "s#$warn2#Intentional warnings in assfail#"
}
As for why that commit reports the wrong caller ... I don't know. The
function name and line number are correct.
--D
>
> --
> Best Regards,
> Yujie
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs]
2022-10-27 16:20 ` Darrick J. Wong
@ 2022-10-27 16:51 ` Peter Zijlstra
2022-10-28 19:52 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2022-10-27 16:51 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Yujie Liu, oe-lkp, lkp, Thomas Gleixner, linux-kernel, x86,
fengwei.yin, ying.huang, fstests, linux-xfs
> However, I noticed that the callsite in the WARNING: message has changed
> from the usual 'asswarn' (which is the caller of WARN_ON) to
> 'xfs_buf_alert_ratelimited', which seems totally wrong since XFS log
> recovery doesn't touch xfs_buf objects at all.
Yeah; and I've meanwhile found more cases where it goes sideways.
I'll revert this patch and try an alternative approach. I'll post
patches tomorrow or something.
Sorry about this.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs]
2022-10-27 16:51 ` Peter Zijlstra
@ 2022-10-28 19:52 ` Peter Zijlstra
0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2022-10-28 19:52 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Yujie Liu, oe-lkp, lkp, Thomas Gleixner, linux-kernel, x86,
fengwei.yin, ying.huang, fstests, linux-xfs
On Thu, Oct 27, 2022 at 06:51:09PM +0200, Peter Zijlstra wrote:
>
> > However, I noticed that the callsite in the WARNING: message has changed
> > from the usual 'asswarn' (which is the caller of WARN_ON) to
> > 'xfs_buf_alert_ratelimited', which seems totally wrong since XFS log
> > recovery doesn't touch xfs_buf objects at all.
>
> Yeah; and I've meanwhile found more cases where it goes sideways.
>
> I'll revert this patch and try an alternative approach. I'll post
> patches tomorrow or something.
Now posted:
https://lkml.kernel.org/r/20221028194022.388521751@infradead.org
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-10-28 19:52 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <202210241614.2ae4c1f5-yujie.liu@intel.com>
[not found] ` <Y1kDEmLeRA2UGeF8@hirez.programming.kicks-ass.net>
[not found] ` <Y1kiuTIYobR4nexS@hirez.programming.kicks-ass.net>
2022-10-27 2:09 ` [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs] Yujie Liu
2022-10-27 2:24 ` Yujie Liu
2022-10-27 16:20 ` Darrick J. Wong
2022-10-27 16:51 ` Peter Zijlstra
2022-10-28 19:52 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox