All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislav Fomichev <stfomichev@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>,
	Taehee Yoo <ap420073@gmail.com>,
	davem@davemloft.net, pabeni@redhat.com, edumazet@google.com,
	andrew+netdev@lunn.ch, horms@kernel.org, asml.silence@gmail.com,
	dw@davidwei.uk, sdf@fomichev.me, skhawaja@google.com,
	simona.vetter@ffwll.ch, kaiyuanz@google.com,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] net: devmem: fix kernel panic when socket close after module unload
Date: Wed, 16 Apr 2025 07:40:07 -0700	[thread overview]
Message-ID: <Z__BRyblHNHhnui7@mini-arch> (raw)
In-Reply-To: <20250415195926.1c3f8aff@kernel.org>

On 04/15, Jakub Kicinski wrote:
> On Tue, 15 Apr 2025 11:59:40 -0700 Stanislav Fomichev wrote:
> > > commit 42f342387841 ("net: fix use-after-free in the
> > > netdev_nl_sock_priv_destroy()") and rolling back a few fixes, it's
> > > really introduced by commit 1d22d3060b9b ("net: drop rtnl_lock for
> > > queue_mgmt operations").
> > > 
> > > My first question, does this issue still reproduce if you remove the
> > > per netdev locking and go back to relying on rtnl_locking? Or do we
> > > crash somewhere else in net_devmem_unbind_dmabuf? If so, where?
> > > Looking through the rest of the unbinding code, it's not clear to me
> > > any of it actually uses dev, so it may just be the locking...  
> >  
> > A proper fix, most likely, will involve resetting binding->dev to NULL
> > when the device is going away.
> 
> Right, tho a bit of work and tricky handling will be necessary to get
> that right. We're not holding a ref on binding->dev.
> 
> I think we need to invert the socket mutex vs instance lock ordering.
> Make the priv mutex protect the binding->list and binding->dev.
> For that to work the binding needs to also store a pointer to its
> owning socket?
> 
> Then in both uninstall paths (from socket and from netdev unreg) we can
> take the socket mutex, delete from list, clear the ->dev pointer,
> unlock, release the ref on the binding.
> 
> The socket close path would probably need to lock the socket, look at 
> the first entry, if entry has ->dev call netdev_hold(), release the
> socket, lock the netdev, lock the socket again, look at the ->dev, if
> NULL we raced - done. If not NULL release the socket, call unbind.
> netdev_put(). Restart this paragraph.
> 
> I can't think of an easier way.

An alternative might be to have a new extra lock to just protect
the binding->bound_rxq? And we can move the netdev_lock/unlock inside
the xa_for_each loop in net_devmem_unbind_dmabuf. This will make sure
we don't touch the outdated 'dev'. But I think you're right, the same
lock ordering issue is gonna happen in this case as well.

> > Replacing rtnl with dev lock exposes the fact that we can't assume
> > that the binding->dev is still valid by the time we do unbind.
> 
> Note that binding->dev is never accessed by net_devmem_unbind_dmabuf().
> So if the device was unregistered and its queues flushed, the only thing
> we touch the netdev pointer for is the instance lock :(

I was assuming that bound_rxq is also protected by the instance lock.
But as you were saying earlier, xa has its own lock, so I might
be wrong with that assumption..

  reply	other threads:[~2025-04-16 14:40 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-15  9:24 [PATCH net] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
2025-04-15 17:33 ` Stanislav Fomichev
2025-04-15 18:22 ` Mina Almasry
2025-04-15 18:59   ` Stanislav Fomichev
2025-04-16  2:59     ` Jakub Kicinski
2025-04-16 14:40       ` Stanislav Fomichev [this message]
2025-04-17  0:15         ` Jakub Kicinski
2025-04-16 15:01       ` Taehee Yoo
2025-04-17  0:35         ` Jakub Kicinski
2025-04-17  6:57           ` Taehee Yoo
2025-04-17 14:09             ` Jakub Kicinski
2025-04-18 10:46               ` Taehee Yoo
2025-04-16 15:47       ` Mina Almasry
2025-04-17  0:27         ` Jakub Kicinski
2025-04-17 21:07           ` Mina Almasry
2025-04-18 10:52             ` Taehee Yoo
2025-05-05 17:34               ` Jakub Kicinski
2025-05-06 11:41                 ` Taehee Yoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z__BRyblHNHhnui7@mini-arch \
    --to=stfomichev@gmail.com \
    --cc=almasrymina@google.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=ap420073@gmail.com \
    --cc=asml.silence@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dw@davidwei.uk \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kaiyuanz@google.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=simona.vetter@ffwll.ch \
    --cc=skhawaja@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.