Re: Fwd: RCU indicates stalls with iwlwifi, causing boot failures

From: Hugh Dickins <hughd@google.com>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Lai Jiangshan <laijs@linux.alibaba.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Gregory Greenman <gregory.greenman@intel.com>,
	Ben Greear <greearb@candelatech.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Networking <netdev@vger.kernel.org>,
	Linux Wireless <linux-wireless@vger.kernel.org>,
	Linux RCU <rcu@vger.kernel.org>
Subject: Re: Fwd: RCU indicates stalls with iwlwifi, causing boot failures
Date: Fri, 1 Sep 2023 23:59:26 -0700 (PDT)
Message-ID: <f0f6a6ec-e968-a91c-dc46-357566d8811@google.com>
In-Reply-To: <c3f9b35c-087d-0e34-c251-e249f2c058d3@candelatech.com>

Hi Dave,

On Fri, 1 Sep 2023, Ben Greear wrote:
> On 9/1/23 5:29 PM, Bagas Sanjaya wrote:
> > Hi,
> > 
> > I notice a bug report on Bugzilla [1]. Quoting from it:
> 
> Try booting with pci=noaer ?
> 
> That fixes the only known iwlwifi bug we have found in 6.5, but we are mostly
> using the backports iwlwifi driver...
> 
> Thanks,
> Ben
> 
> > 
> >> I'm seeing RCU warnings in Linus's current tree (like
> >> 87dfd85c38923acd9517e8df4afc908565df0961) that come from RCU:
> >>
> >> WARNING: CPU: 0 PID: 0 at kernel/rcu/tree_exp.h:787
> >> rcu_exp_handler+0x35/0xe0
> >>
> >> But they *ONLY* occur on a system with a newer iwlwifi device:
> >>
> >> aa:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411
> >> 160MHz (rev 1a)
> >>
> >> and never in a VM or on an older device (like an 8260).  During a bisect
> >> they only seem to occur with the "83" version of the firmware.
> >>
> >> iwlwifi 0000:aa:00.0: loaded firmware version 83.e8f84e98.0
> >> ty-a0-gf-a0-83.ucode op_mode iwlmvm
> >>
> >> The first warning gets spit out within a millisecond of the last printk()
> >> from the iwlwifi driver.  They eventually result in a big spew of RCU
> >> messages like this:
> >>
> >> [   27.124796] rcu: INFO: rcu_preempt detected expedited stalls on
> >> CPUs/tasks: { 0-...D } 125 jiffies s: 193 root: 0x1/.
> >> [   27.126466] rcu: blocking rcu_node structures (internal RCU debug):
> >> [   27.128114] Sending NMI from CPU 3 to CPUs 0:
> >> [   27.128122] NMI backtrace for cpu 0 skipped: idling at
> >> intel_idle+0x5f/0xb0
> >> [   27.159757] loop30: detected capacity change from 0 to 8
> >> [   27.204967] rcu: INFO: rcu_preempt detected expedited stalls on
> >> CPUs/tasks: { 0-...D } 145 jiffies s: 193 root: 0x1/.
> >> [   27.206353] rcu: blocking rcu_node structures (internal RCU debug):
> >> [   27.207751] Sending NMI from CPU 3 to CPUs 0:
> >> [   27.207825] NMI backtrace for cpu 0 skipped: idling at
> >> intel_idle+0x5f/0xb0
> >>
> >> I usually see them at boot.  In that case, they usually hang the system and
> >> keep it from booting.  I've also encountered them at reboots and also seen
> >> them *not* be fatal at boot.  I suspect it has to do with which CPU gets
> >> wedged.
> > 
> > See Bugzilla for the full thread and attached full dmesg output.
> > 
> > Thanks.
> > 
> > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217856

I just took a look at your dmesg in bugzilla: I see lots of page tables
dumped, including "ESPfix Area", and think you're hitting my screwup: see

https://lore.kernel.org/linux-mm/CABXGCsNi8Tiv5zUPNXr6UJw6qV1VdaBEfGqEAMkkXE3QPvZuAQ@mail.gmail.com/

Please give the patch from the end of that thread a try:

[PATCH] mm/pagewalk: fix bootstopping regression from extra pte_unmap()

[ Commit message now written, but let's see if Dave can confirm it too ]

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/pagewalk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 2022333805d3..9e7d0276c38a 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,7 +58,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			pte = pte_offset_map(pmd, addr);
 		if (pte) {
 			err = walk_pte_range_inner(pte, addr, end, walk);
-			if (walk->mm != &init_mm)
+			if (walk->mm != &init_mm && addr < TASK_SIZE)
 				pte_unmap(pte);
 		}
 	} else {
-- 
2.35.3
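
For readers following along, here is a small userspace sketch of what the one-line change appears to do. It is not kernel code. It assumes (the hunk above shows only the pte_offset_map() branch) that the pte is obtained with pte_offset_map() only for a user address in an mm other than init_mm, and with a non-mapping lookup otherwise. DEMO_TASK_SIZE and all function names below are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TASK_SIZE; not the kernel's real value. */
#define DEMO_TASK_SIZE UINT64_C(0x00007ffffffff000)

/*
 * Assumption: pte_offset_map() is used only for a user address in an mm
 * other than init_mm; every other case uses a lookup that needs no unmap.
 */
static inline bool took_map_path(bool is_init_mm, uint64_t addr)
{
	return !is_init_mm && addr < DEMO_TASK_SIZE;
}

/* Unmap condition before the patch: only the mm is checked. */
static inline bool unmaps_before_patch(bool is_init_mm, uint64_t addr)
{
	(void)addr;	/* the old test ignored the address */
	return !is_init_mm;
}

/* Unmap condition after the patch: the address is checked as well. */
static inline bool unmaps_after_patch(bool is_init_mm, uint64_t addr)
{
	return !is_init_mm && addr < DEMO_TASK_SIZE;
}

/* 0 means balanced; -1 means one pte_unmap() without a matching map. */
static inline int imbalance(bool mapped, bool unmapped)
{
	return (int)mapped - (int)unmapped;
}

/* With the patched condition, map and unmap always pair up in this model. */
static inline void check_balanced_after_patch(bool is_init_mm, uint64_t addr)
{
	assert(imbalance(took_map_path(is_init_mm, addr),
			 unmaps_after_patch(is_init_mm, addr)) == 0);
}
```

In this model, a kernel-range address walked under an mm other than init_mm yields an imbalance of -1 with the old condition and 0 with the new one, which would match the "extra pte_unmap()" named in the patch title.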
