From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Guenter Roeck" <linux@roeck-us.net>,
"Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: david@redhat.com, Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, wsa+renesas@sang-engineering.com,
nicholas@linux.ibm.com, windhl@126.com, cuigaosheng1@huawei.com,
mikey@neuling.org, paul@paul-moore.com, haren@linux.ibm.com,
Ingo Molnar <mingo@kernel.org>,
joel@jms.id.au, lukas.bulwahn@gmail.com, nathanl@linux.ibm.com,
ajd@linux.ibm.com, ye.xingchen@zte.com.cn, nathan@kernel.org,
rmclure@linux.ibm.com, hbathini@linux.ibm.com,
atrajeev@linux.vnet.ibm.com, yuanjilin@cdjrlc.com,
pali@kernel.org, farosas@linux.ibm.com, geoff@infradead.org,
Linus Torvalds <torvalds@linux-foundation.org>,
gustavoars@kernel.org, lihuafei1@huawei.com,
aneesh.kumar@linux.ibm.com, zhengyongjun3@huawei.com,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
Date: Thu, 13 Oct 2022 15:14:08 +1000
Message-ID: <CNKJES19WP6K.LOS0TA0Q4MRO@bobo>
In-Reply-To: <bba714ce-4af7-a7ea-21b5-10e5578b6db8@roeck-us.net>

On Thu Oct 13, 2022 at 2:43 PM AEST, Guenter Roeck wrote:
> On 10/12/22 10:20, Jason A. Donenfeld wrote:
> > On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> >> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> >>> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> >>>> NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300
> >>>> LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0
> >>>> Call Trace:
> >>>> [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> >>>> [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50
> >>>> [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140
> >>>> [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174
> >>>> [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140
> >>>> [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0
> >>>> [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130
> >>>> [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330
> >>>> [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258
> >>>
> >>> Obviously the first couple lines of this concern me a bit. But I think
> >>> actually this might just be a catalyst for another bug. You could view
> >>> that function as basically just:
> >>>
> >>> 	while (something)
> >>> 		schedule();
> >>>
> >>> And I guess in the process of calling the scheduler a lot, which toggles
> >>> interrupts a lot, something got wedged.
> >>>
> >>> Curious, though, I did try to reproduce this, to no avail. My .config is
> >>> https://xn--4db.cc/rBvHWfDZ . What's yours?
> >>>
> >>
> >> Attached. My qemu command line is
> >
> > Okay, thanks, I reproduced it. In this case, I suspect
> > try_to_generate_entropy() is just the messenger. There's an earlier
> > problem:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
> > caller is .__flush_tlb_pending+0x40/0xf0
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
> > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> > Call Trace:
> > [c0000000044c3540] [c000000000f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
> > [c0000000044c35d0] [c000000000fc9550] .check_preemption_disabled+0x140/0x150
> > [c0000000044c3660] [c000000000073dd0] .__flush_tlb_pending+0x40/0xf0
> > [c0000000044c36f0] [c000000000334434] .__apply_to_page_range+0x764/0xa30
> > [c0000000044c3840] [c00000000006cad0] .change_memory_attr+0xf0/0x160
> > [c0000000044c38d0] [c0000000002a1d70] .bpf_prog_select_runtime+0x150/0x230
> > [c0000000044c3970] [c000000000d405d4] .bpf_prepare_filter+0x504/0x6f0
> > [c0000000044c3a30] [c000000000d4085c] .bpf_prog_create+0x9c/0x140
> > [c0000000044c3ac0] [c000000002051d9c] .ptp_classifier_init+0x44/0x78
> > [c0000000044c3b50] [c000000002050f3c] .sock_init+0xe0/0x100
> > [c0000000044c3bd0] [c000000000010bd4] .do_one_initcall+0xa4/0x438
> > [c0000000044c3cc0] [c000000002005008] .kernel_init_freeable+0x378/0x428
> > [c0000000044c3da0] [c0000000000113d8] .kernel_init+0x28/0x1a0
> > [c0000000044c3e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60
> >
> > This in turn is because __flush_tlb_pending() calls:
> >
> > static inline int mm_is_thread_local(struct mm_struct *mm)
> > {
> > 	return cpumask_equal(mm_cpumask(mm),
> > 			     cpumask_of(smp_processor_id()));
> > }
> >
> > __flush_tlb_pending() has a comment about this:
> >
> >  * Must be called from within some kind of spinlock/non-preempt region...
> >  */
> > void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
> >
> > So I guess that didn't happen for some reason? Maybe this is indicative
> > of some lock imbalance that then gets hit later?
>
> I managed to bisect that problem. Unfortunately it points to the
> scheduler merge. No idea what to do about that. Any ideas?
> I am copying Peter and Ingo for comments.
>
> # first bad commit: [30c999937f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

This might be a red herring, because I can reproduce it without that
merge.

I think we can fix this with some preempt-disabled critical sections;
they don't look like too much of a problem.

I don't know why it didn't show up before this release. I'll look
into it a bit more.

Thanks,
Nick
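
[Annotation, not part of the original message] The "preempt critical sections" idea above can be pictured with a small userspace model. This is only a sketch under stated assumptions: every model_* name below is hypothetical and stands in for a kernel facility (the preempt count, the smp_processor_id() debug check, a flush routine that expects a non-preemptible caller). It is not the real kernel code and not the fix that was eventually applied.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only: none of these are the real kernel definitions. */
static int model_preempt_count; /* stands in for the per-task preempt count */
static int model_warnings;      /* how many times the debug check fired */

void model_preempt_disable(void)
{
	model_preempt_count++;
}

void model_preempt_enable(void)
{
	/* An unbalanced enable would be a bug in the caller. */
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Mimics the debug check: asking for the CPU id while preemptible is reported. */
int model_smp_processor_id(void)
{
	if (model_preempt_count == 0) {
		model_warnings++;
		fprintf(stderr, "BUG: using smp_processor_id() in preemptible code\n");
	}
	return 0; /* the model has a single CPU */
}

/* Stand-in for a routine that must be called from a non-preempt region. */
void model_flush_tlb_pending(void)
{
	(void)model_smp_processor_id();
}

/* Caller that forgets the critical section: trips the check. */
void model_flush_unprotected(void)
{
	model_flush_tlb_pending();
}

/* Caller that wraps the flush in a preempt-disabled critical section. */
void model_flush_protected(void)
{
	model_preempt_disable();
	model_flush_tlb_pending();
	model_preempt_enable();
}
```

In this model, model_flush_protected() leaves model_warnings untouched, while model_flush_unprotected() increments it once per call, which mirrors the shape of the "BUG: using smp_processor_id() in preemptible" report quoted earlier.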