From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Juergen Gross <jgross@suse.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Antoine Tenart <atenart@kernel.org>,
	"moderated list:XEN HYPERVISOR INTERFACE" 
	<xen-devel@lists.xenproject.org>,
	"open list:NETWORKING DRIVERS" <netdev@vger.kernel.org>
Subject: Re: [PATCH] xen/netfront: destroy queues before real_num_tx_queues is zeroed
Date: Wed, 23 Feb 2022 22:14:48 +0100
Message-ID: <YhajyDmotYNQLx9J@mail-itl>
In-Reply-To: <20220222120301.10af2737@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>


On Tue, Feb 22, 2022 at 12:03:01PM -0800, Jakub Kicinski wrote:
> On Mon, 21 Feb 2022 07:27:32 +0100 Juergen Gross wrote:
> > On 20.02.22 14:42, Marek Marczykowski-Górecki wrote:
> > > xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
> > > delete queues. Since d7dac083414eb5bb99a6d2ed53dc2c1b405224e5
> > > ("net-sysfs: update the queue counts in the unregistration path"),
> > > unregister_netdev() indirectly sets real_num_tx_queues to 0. Those two
> > > facts together mean that xennet_destroy_queues() called from
> > > xennet_remove() cannot do its job, because it's called after
> > > unregister_netdev(). This results in kfree-ing queues that are still
> > > linked in napi, which ultimately crashes:
> > > 
> > >      BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >      #PF: supervisor read access in kernel mode
> > >      #PF: error_code(0x0000) - not-present page
> > >      PGD 0 P4D 0
> > >      Oops: 0000 [#1] PREEMPT SMP PTI
> > >      CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
> > >      RIP: 0010:free_netdev+0xa3/0x1a0
> > >      Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
> > >      RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
> > >      RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
> > >      RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
> > >      RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
> > >      R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
> > >      R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
> > >      FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
> > >      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >      CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
> > >      Call Trace:
> > >       <TASK>
> > >       xennet_remove+0x13d/0x300 [xen_netfront]
> > >       xenbus_dev_remove+0x6d/0xf0
> > >       __device_release_driver+0x17a/0x240
> > >       device_release_driver+0x24/0x30
> > >       bus_remove_device+0xd8/0x140
> > >       device_del+0x18b/0x410
> > >       ? _raw_spin_unlock+0x16/0x30
> > >       ? klist_iter_exit+0x14/0x20
> > >       ? xenbus_dev_request_and_reply+0x80/0x80
> > >       device_unregister+0x13/0x60
> > >       xenbus_dev_changed+0x18e/0x1f0
> > >       xenwatch_thread+0xc0/0x1a0
> > >       ? do_wait_intr_irq+0xa0/0xa0
> > >       kthread+0x16b/0x190
> > >       ? set_kthread_struct+0x40/0x40
> > >       ret_from_fork+0x22/0x30
> > >       </TASK>
> > > 
> > > Fix this by calling xennet_destroy_queues() from xennet_close() too,
> > > when real_num_tx_queues is still available. This ensures that queues are
> > > destroyed when real_num_tx_queues is set to 0, regardless of how
> > > unregister_netdev() was called.
> > > 
> > > Originally reported at
> > > https://github.com/QubesOS/qubes-issues/issues/7257
> > > 
> > > Fixes: d7dac083414eb5bb9 ("net-sysfs: update the queue counts in the unregistration path")
> > > Cc: stable@vger.kernel.org # 5.16+
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > 
> > > ---
> > > While this fixes the issue, I'm not sure if that is the correct thing
> > > to do. xennet_remove() calls xennet_destroy_queues() under rtnl_lock,
> > > which may be important here? Just moving xennet_destroy_queues() before  
> > 
> > I checked some of the call paths leading to xennet_close(), and all of
> > those contained an ASSERT_RTNL(), so it seems the rtnl_lock is already
> > taken here. Could you test with adding an ASSERT_RTNL() in
> > xennet_destroy_queues()?
> > 
> > > unregister_netdev() in xennet_remove() did not help - it crashed in
> > > another way (use-after-free in xennet_close()).  
> > 
> > Yes, this would need to basically do the xennet_close() handling in
> > xennet_destroy() instead, which I believe is not really an option.
> 
> I think the patch makes open/close asymmetric, tho. After ifup ; ifdown;
> the next ifup will fail because queues are already destroyed, no?
> IOW xennet_open() expects the queues were created at an earlier stage.

Right.

> Maybe we can move the destroy to ndo_uninit? (and create to ndo_init?)

It looks like talk_to_netback(), which currently creates the queues, needs
them for quite a bit of its work. It is also called when reconnecting (and
the netdev is _not_ re-registered in this case), so that would be a
significant refactor.
But moving the destroy to ndo_uninit() should be fine. It works, including
in the ifup;ifdown;ifup case. I'll send v2 shortly.
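
To illustrate the ordering problem discussed above, here is a toy userspace
model in C. It is only a sketch, not driver code: every toy_* name is
hypothetical, and it assumes that the uninit hook runs during unregistration
before the queue count is reset to 0.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_QUEUES 4

/* Toy stand-in for the netdev state; not a real kernel type. */
struct toy_netdev {
    unsigned int real_num_tx_queues;   /* mirrors the field discussed in the thread */
    bool queue_linked[TOY_MAX_QUEUES]; /* true while a queue is still linked */
    bool teardown_in_uninit;           /* true models destroying queues in an uninit hook */
};

/* Like xennet_destroy_queues(): only unlinks the queues it can count. */
void toy_destroy_queues(struct toy_netdev *dev)
{
    for (unsigned int i = 0; i < dev->real_num_tx_queues; i++)
        dev->queue_linked[i] = false;
}

/* Models unregistration: optional uninit hook first, then the count is zeroed. */
void toy_unregister(struct toy_netdev *dev)
{
    if (dev->teardown_in_uninit)
        toy_destroy_queues(dev);  /* assumption: the hook runs before the reset */
    dev->real_num_tx_queues = 0;  /* the effect described for d7dac083414e */
}

/* Models the remove path; returns how many queues are still linked at free time. */
unsigned int toy_remove(unsigned int nqueues, bool teardown_in_uninit)
{
    struct toy_netdev dev = {
        .real_num_tx_queues = nqueues,
        .teardown_in_uninit = teardown_in_uninit,
    };
    unsigned int leaked = 0;

    assert(nqueues <= TOY_MAX_QUEUES);
    for (unsigned int i = 0; i < nqueues; i++)
        dev.queue_linked[i] = true;

    toy_unregister(&dev);
    if (!teardown_in_uninit)
        toy_destroy_queues(&dev); /* original ordering: count is already 0 */

    for (unsigned int i = 0; i < TOY_MAX_QUEUES; i++)
        if (dev.queue_linked[i])
            leaked++;
    return leaked;
}
```

In this model toy_remove(2, false) reports two queues still linked when the
device would be freed, while toy_remove(2, true) reports none.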

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



Thread overview: 5+ messages
2022-02-20 13:42 [PATCH] xen/netfront: destroy queues before real_num_tx_queues is zeroed Marek Marczykowski-Górecki
2022-02-21  6:27 ` Juergen Gross
2022-02-22  0:15   ` Marek Marczykowski-Górecki
2022-02-22 20:03   ` Jakub Kicinski
2022-02-23 21:14     ` Marek Marczykowski-Górecki [this message]