From: Brian Norris <briannorris@chromium.org>
To: Kalle Valo <kvalo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>,
	linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Lin <yu-hao.lin@nxp.com>,
	Francesco Dolcini <francesco@dolcini.it>
Subject: Re: [PATCH] [RFC] mwifiex: Fix NULL pointer deref
Date: Thu, 20 Jun 2024 12:48:01 -0700
Message-ID: <ZnSHcZttq79cJS3l@google.com>
In-Reply-To: <87wmmll5mf.fsf@kernel.org>

Hi Sascha,

On Wed, Jun 19, 2024 at 11:05:28AM +0300, Kalle Valo wrote:
> Sascha Hauer <s.hauer@pengutronix.de> writes:
> 
> > When an Access Point is repeatedly started it happens that the
> > interrupts handler is called with priv->wdev.wiphy being NULL, but
> > dereferenced in mwifiex_parse_single_response_buf() resulting in:
> >
> > | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000140
...
> > | pc : mwifiex_get_cfp+0xd8/0x15c [mwifiex]
> > | lr : mwifiex_get_cfp+0x34/0x15c [mwifiex]
> > | sp : ffff8000818b3a70
> > | x29: ffff8000818b3a70 x28: ffff000006bfd8a5 x27: 0000000000000004
> > | x26: 000000000000002c x25: 0000000000001511 x24: 0000000002e86bc9
> > | x23: ffff000006bfd996 x22: 0000000000000004 x21: ffff000007bec000
> > | x20: 000000000000002c x19: 0000000000000000 x18: 0000000000000000
> > | x17: 000000040044ffff x16: 00500072b5503510 x15: ccc283740681e517
> > | x14: 0201000101006d15 x13: 0000000002e8ff43 x12: 002c01000000ffb1
> > | x11: 0100000000000000 x10: 02e8ff43002c0100 x9 : 0000ffb100100157
> > | x8 : ffff000003d20000 x7 : 00000000000002f1 x6 : 00000000ffffe124
> > | x5 : 0000000000000001 x4 : 0000000000000003 x3 : 0000000000000000
> > | x2 : 0000000000000000 x1 : 0001000000011001 x0 : 0000000000000000
> > | Call trace:
> > |  mwifiex_get_cfp+0xd8/0x15c [mwifiex]
> > |  mwifiex_parse_single_response_buf+0x1d0/0x504 [mwifiex]
> > |  mwifiex_handle_event_ext_scan_report+0x19c/0x2f8 [mwifiex]
> > |  mwifiex_process_sta_event+0x298/0xf0c [mwifiex]
> > |  mwifiex_process_event+0x110/0x238 [mwifiex]
> > |  mwifiex_main_process+0x428/0xa44 [mwifiex]
> > |  mwifiex_sdio_interrupt+0x64/0x12c [mwifiex_sdio]
> > |  process_sdio_pending_irqs+0x64/0x1b8
> > |  sdio_irq_work+0x4c/0x7c
> > |  process_one_work+0x148/0x2a0
> > |  worker_thread+0x2fc/0x40c
> > |  kthread+0x110/0x114
> > |  ret_from_fork+0x10/0x20
> > | Code: a94153f3 a8c37bfd d50323bf d65f03c0 (f940a000)
> > | ---[ end trace 0000000000000000 ]---
> >
> > Fix this by adding a NULL check before dereferencing this pointer.
> >
> > Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
> >
> > ---
> >
> > This is the most obvious fix for this problem, but I am not sure if we
> > might want to catch priv->wdev.wiphy being NULL earlier in the call
> > chain.
> 
> I haven't looked at the call but the symptoms sound like that either we
> are enabling the interrupts too early or there's some kind of locking
> problem so that an other cpu doesn't see the change.

I agree with Kalle that there's a different underlying bug involved, and
(my conclusion:) we shouldn't whack-a-mole the NULL pointer without
addressing the underlying problem.

Looking a bit closer (and without much other context to go on): I believe
that one potential underlying problem is the complete lack of locking
between cfg80211 entry points (such as mwifiex_add_virtual_intf() or
mwifiex_cfg80211_change_virtual_intf()) and most stuff in the main loop
(mwifiex_main_process()). The former call sites only hold the wiphy
lock, and the latter tends to ... mostly not hold any locks, but relies
on being serialized with itself, and uses its |main_proc_lock| for setup
and teardown. It's all really bad and ready to fall down like a house of
cards at any moment. Unfortunately, no one has spent time on
rearchitecting this driver.

So it's possible that mwifiex_process_event() (mwifiex_get_priv_by_id()
/ mwifiex_get_priv()) is getting a hold of a not-fully-initialized
'priv' structure.
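
To make the suspected race concrete, here is a minimal userspace sketch
(pthreads, not kernel code, and not a proposed patch). Every name in it
(demo_priv, demo_wiphy, demo_setup(), demo_teardown(),
demo_handle_event()) is hypothetical and does not exist in mwifiex. It
only illustrates the general idea that the configuration path and the
event path need to agree on one lock, so the event path never observes
a half-initialized structure:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in mwifiex. */
struct demo_wiphy {
	int n_bands;
};

struct demo_priv {
	pthread_mutex_t lock;     /* taken by BOTH the config and event paths */
	struct demo_wiphy *wiphy; /* NULL until setup has completed */
	bool ready;
};

static void demo_priv_init(struct demo_priv *priv)
{
	pthread_mutex_init(&priv->lock, NULL);
	priv->wiphy = NULL;
	priv->ready = false;
}

/* Config path (think "interface add"): publish state under the lock. */
static void demo_setup(struct demo_priv *priv, struct demo_wiphy *wiphy)
{
	pthread_mutex_lock(&priv->lock);
	priv->wiphy = wiphy;
	priv->ready = true;
	pthread_mutex_unlock(&priv->lock);
}

/* Config path (think "interface del"): retract state under the lock. */
static void demo_teardown(struct demo_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	priv->ready = false;
	priv->wiphy = NULL;
	pthread_mutex_unlock(&priv->lock);
}

/*
 * Event path (think "main loop"): returns 0 if the event was handled,
 * -1 if it was dropped because the structure is not usable right now.
 */
static int demo_handle_event(struct demo_priv *priv, int *bands_out)
{
	int ret = -1;

	pthread_mutex_lock(&priv->lock);
	if (priv->ready && priv->wiphy) {
		*bands_out = priv->wiphy->n_bands;
		ret = 0;
	}
	pthread_mutex_unlock(&priv->lock);

	return ret;
}
```

The check inside demo_handle_event() is only meaningful because it is
made under the same lock the setup and teardown paths take. A bare NULL
check without that shared lock would still race. Which lock would play
this role in the real driver is an open question here. Build with
-pthread.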

BTW, in case I can reproduce and poke at your scenario, what exactly
is your test case? Are you just starting / killing / restarting hostapd
in a loop? Are you running a full network manager stack that's doing
something more complex (e.g., initiating scans)? Can you reproduce with
some more targeted set of `iw` commands? (`iw phy ... interface add ...;
iw dev ... del`) Is there anything else interesting in the dmesg logs?
(Some of the worst behaviors in this driver come when we see command
timeouts and mwifiex_reinit_sw(), for example.)

Or barring that, can you get some kind of trace of the nl80211 command
sequence, so it's clearer which command(s) are involved leading up to
the problem?

Brian
