From: Martin Habets <habetsm.xilinx@gmail.com>
To: Ding Hui <dinghui@sangfor.com.cn>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, ecree.xilinx@gmail.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
pengdonglin@sangfor.com.cn, huangcun@sangfor.com.cn
Subject: Re: [RFC PATCH net] sfc: Fix use-after-free due to selftest_work
Date: Fri, 14 Apr 2023 13:33:49 +0100 [thread overview]
Message-ID: <ZDlILcEDU16YgTT/@gmail.com> (raw)
In-Reply-To: <a7cb0d9e-7519-f90c-bd58-eab9ee82b3dc@sangfor.com.cn>
On Fri, Apr 14, 2023 at 07:03:41PM +0800, Ding Hui wrote:
> On 2023/4/14 17:44, Martin Habets wrote:
> > On Thu, Apr 13, 2023 at 04:35:08PM +0800, Ding Hui wrote:
> > > On 2023/4/13 15:37, Martin Habets wrote:
> > > > On Wed, Apr 12, 2023 at 08:50:13AM +0800, Ding Hui wrote:
> > > > > There is a use-after-free scenario that is:
> > > > >
> > > > > When netif_running() is false, user set mac address or vlan tag to VF,
> > > > > the xxx_set_vf_mac() or xxx_set_vf_vlan() will invoke efx_net_stop()
> > > > > and efx_net_open(), since netif_running() is false, the port will not
> > > > > start and keep port_enabled false, but selftest_worker is scheduled
> > > > > in efx_net_open().
> > > > >
> > > > > If we remove the device before selftest_worker run, the efx is freed,
> > > > > then we will get a UAF in run_timer_softirq() like this:
> > > > >
> > > > > [ 1178.907941] ==================================================================
> > > > > [ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90
> > > > > [ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0
> > > > > [ 1178.907950]
> > > > > [ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1
> > > > > [ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022
> > > > > [ 1178.907955] Call Trace:
> > > > > [ 1178.907956] <IRQ>
> > > > > [ 1178.907960] dump_stack+0x71/0xab
> > > > > [ 1178.907963] print_address_description+0x6b/0x290
> > > > > [ 1178.907965] ? run_timer_softirq+0xdea/0xe90
> > > > > [ 1178.907967] kasan_report+0x14a/0x2b0
> > > > > [ 1178.907968] run_timer_softirq+0xdea/0xe90
> > > > > [ 1178.907971] ? init_timer_key+0x170/0x170
> > > > > [ 1178.907973] ? hrtimer_cancel+0x20/0x20
> > > > > [ 1178.907976] ? sched_clock+0x5/0x10
> > > > > [ 1178.907978] ? sched_clock_cpu+0x18/0x170
> > > > > [ 1178.907981] __do_softirq+0x1c8/0x5fa
> > > > > [ 1178.907985] irq_exit+0x213/0x240
> > > > > [ 1178.907987] smp_apic_timer_interrupt+0xd0/0x330
> > > > > [ 1178.907989] apic_timer_interrupt+0xf/0x20
> > > > > [ 1178.907990] </IRQ>
> > > > > [ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370
> > > > >
> > > > > I am thinking about several ways to fix the issue:
> > > > >
> > > > > [1] In this RFC, I cancel the selftest_worker unconditionally in
> > > > > efx_pci_remove().
> > > > >
> > > > > [2] Add a test condition, only invoke efx_selftest_async_start() when
> > > > > efx->port_enabled is true in efx_net_open().
> > > > >
> > > > > [3] Move invoking efx_selftest_async_start() from efx_net_open() to
> > > > > efx_start_all() or efx_start_port(), that matching cancel action in
> > > > > efx_stop_port().
> > > >
> > > > I think moving this to efx_start_port() is best, as you say to match
> > > > the cancel in efx_stop_port().
> > > >
> > >
> > > If moving to efx_start_port(), should we worry about that IRQ_TIMEOUT
> > > is still enough?
> >
> > 1 second is a long time for a machine running code, so it does not worry me.
> >
> > > I'm not sure if there is a long time waiting from starting of schedule
> > > selftest_work to the ending of efx_net_open().
> >
> > I see your point. Looking at efx_start_all() there is the call to
> > efx_start_datapath() after the call to efx_net_open(), which takes a
> ^^^^^^^^^^^^
> Do you mean efx_start_port()?
Woops, yes that what I meant.
> > relatively long time (well under 200ms though).
> > Logically it would be better to move efx_selftest_async_start() after this
> > call. What do you think?
>
> Agree with you.
>
> > The point here is that efx_start_all() calls efx_start_port() early, and
> > efx_stop_all() also calls efx_stop_port() early. The calling sequence is
> > correct but they are not the strict inverse of each other.
> >
>
> Yeah, that is what I noticed monitor_work does.
> Then I'll move efx_selftest_async_start() into efx_start_all(), follows
> the monitor_work.
Sounds good.
Thanks,
Martin
> --
> Thanks,
> - Ding Hui
prev parent reply other threads:[~2023-04-14 12:33 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-12 0:50 [RFC PATCH net] sfc: Fix use-after-free due to selftest_work Ding Hui
2023-04-12 22:34 ` Jacob Keller
2023-04-13 1:12 ` Ding Hui
2023-04-13 7:52 ` Martin Habets
2023-04-13 8:11 ` Ding Hui
2023-04-13 7:37 ` Martin Habets
2023-04-13 8:35 ` Ding Hui
2023-04-14 9:44 ` Martin Habets
2023-04-14 11:03 ` Ding Hui
2023-04-14 12:33 ` Martin Habets [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZDlILcEDU16YgTT/@gmail.com \
--to=habetsm.xilinx@gmail.com \
--cc=davem@davemloft.net \
--cc=dinghui@sangfor.com.cn \
--cc=ecree.xilinx@gmail.com \
--cc=edumazet@google.com \
--cc=huangcun@sangfor.com.cn \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pengdonglin@sangfor.com.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.