From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: david@redhat.com, linux-kernel@vger.kernel.org,
	wsa+renesas@sang-engineering.com, nicholas@linux.ibm.com,
	windhl@126.com, cuigaosheng1@huawei.com, mikey@neuling.org,
	paul@paul-moore.com, haren@linux.ibm.com, joel@jms.id.au,
	lukas.bulwahn@gmail.com, nathanl@linux.ibm.com,
	ajd@linux.ibm.com, ye.xingchen@zte.com.cn, npiggin@gmail.com,
	nathan@kernel.org, rmclure@linux.ibm.com, hbathini@linux.ibm.com,
	atrajeev@linux.vnet.ibm.com, yuanjilin@cdjrlc.com,
	pali@kernel.org, farosas@linux.ibm.com, geoff@infradead.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	gustavoars@kernel.org, lihuafei1@huawei.com,
	aneesh.kumar@linux.ibm.com, zhengyongjun3@huawei.com,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
Date: Wed, 12 Oct 2022 11:20:38 -0600
Message-ID: <Y0b3ZsTRHWG6jGK8@zx2c4.com>
In-Reply-To: <20221012164452.GA2990467@roeck-us.net>

On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > > NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300
> > > LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0
> > > Call Trace:
> > > [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > > [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50
> > > [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140
> > > [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174
> > > [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140
> > > [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0
> > > [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130
> > > [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330
> > > [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258
> > 
> > Obviously the first couple lines of this concern me a bit. But I think
> > actually this might just be a catalyst for another bug. You could view
> > that function as basically just:
> > 
> >     while (something)
> >     	schedule();
> > 
> > And I guess in the process of calling the scheduler a lot, which toggles
> > interrupts a lot, something got wedged.
> > 
> > Curious, though, I did try to reproduce this, to no avail. My .config is
> > https://xn--4db.cc/rBvHWfDZ . What's yours?
> > 
> 
> Attached. My qemu command line is

Okay, thanks, I reproduced it. In this case, I suspect
try_to_generate_entropy() is just the messenger. There's an earlier
problem:

BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is .__flush_tlb_pending+0x40/0xf0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
Call Trace:
[c0000000044c3540] [c000000000f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
[c0000000044c35d0] [c000000000fc9550] .check_preemption_disabled+0x140/0x150
[c0000000044c3660] [c000000000073dd0] .__flush_tlb_pending+0x40/0xf0
[c0000000044c36f0] [c000000000334434] .__apply_to_page_range+0x764/0xa30
[c0000000044c3840] [c00000000006cad0] .change_memory_attr+0xf0/0x160
[c0000000044c38d0] [c0000000002a1d70] .bpf_prog_select_runtime+0x150/0x230
[c0000000044c3970] [c000000000d405d4] .bpf_prepare_filter+0x504/0x6f0
[c0000000044c3a30] [c000000000d4085c] .bpf_prog_create+0x9c/0x140
[c0000000044c3ac0] [c000000002051d9c] .ptp_classifier_init+0x44/0x78
[c0000000044c3b50] [c000000002050f3c] .sock_init+0xe0/0x100
[c0000000044c3bd0] [c000000000010bd4] .do_one_initcall+0xa4/0x438
[c0000000044c3cc0] [c000000002005008] .kernel_init_freeable+0x378/0x428
[c0000000044c3da0] [c0000000000113d8] .kernel_init+0x28/0x1a0
[c0000000044c3e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60

This in turn is because __flush_tlb_pending() calls:

static inline int mm_is_thread_local(struct mm_struct *mm)
{
        return cpumask_equal(mm_cpumask(mm),
                              cpumask_of(smp_processor_id()));
}

__flush_tlb_pending() has a comment about this:

 * Must be called from within some kind of spinlock/non-preempt region...
 */
void __flush_tlb_pending(struct ppc64_tlb_batch *batch)

So I guess that didn't happen for some reason? Maybe this is indicative
of some lock imbalance that then gets hit later?
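
As a rough illustration of why the check fires, here is a small userspace
model of the situation. It is only a sketch under stated assumptions: every
model_* name, the single modelled CPU, and the plain unsigned long standing in
for a cpumask are hypothetical and are not the kernel's real definitions. It
shows that asking for the current CPU id is only meaningful while preemption
is disabled, which is what the comment on __flush_tlb_pending() requires of
its callers.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only; none of this is real kernel code. */
static unsigned int preempt_count; /* stand-in for the task's preempt count */
static int warnings;               /* number of times the modelled check fired */

static void model_preempt_disable(void)
{
	preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(preempt_count > 0); /* catch unbalanced enable calls */
	preempt_count--;
}

/*
 * Models the debug check: the CPU id is only stable while preemption is
 * disabled, so complain if the count is zero.
 */
static int model_smp_processor_id(const char *caller)
{
	if (preempt_count == 0) {
		warnings++;
		printf("BUG: using smp_processor_id() in preemptible [%08x] code\n",
		       preempt_count);
		printf("caller is %s\n", caller);
	}
	return 0; /* assumption: one modelled CPU, always id 0 */
}

/* Same shape as mm_is_thread_local(), with a bitmask instead of a cpumask. */
static int model_mm_is_thread_local(unsigned long mm_cpumask, const char *caller)
{
	return mm_cpumask == (1UL << model_smp_processor_id(caller));
}
```

Calling model_mm_is_thread_local() on its own increments warnings, while the
same call placed between model_preempt_disable() and model_preempt_enable()
does not. In the trace above, the path coming from change_memory_attr()
corresponds to the first case.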

I've also managed to not hit this bug a few times. When it triggers,
after "kprobes: kprobe jump-optimization is enabled. All kprobes are
optimized if possible.", there's a long hang - tens of seconds before it
continues. When it doesn't trigger, there's no hang at that point in the
boot process.

Jason
