netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Lin Ma <linma@zju.edu.cn>
Cc: Duoming Zhou <duoming@zju.edu.cn>,
	krzysztof.kozlowski@linaro.org, pabeni@redhat.com,
	linux-kernel@vger.kernel.org, davem@davemloft.net,
	alexander.deucher@amd.com, akpm@linux-foundation.org,
	broonie@kernel.org, netdev@vger.kernel.org,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: [PATCH net v4] nfc: ... device_is_registered() is data race-able
Date: Thu, 28 Apr 2022 15:37:06 +0200	[thread overview]
Message-ID: <YmqYgu++0OuhfFxy@kroah.com> (raw)
In-Reply-To: <f51aa1.41ae.180705614b5.Coremail.linma@zju.edu.cn>

On Thu, Apr 28, 2022 at 09:22:11PM +0800, Lin Ma wrote:
> Hello there,
> 
> > 
> > Yes, that looks better, 
> 
> Cool, thanks again for giving comments. :)
> 
> > but what is the root problem here that you are
> > trying to solve?  Why does NFC need this when no other subsystem does?
> > 
> 
> Well, in fact, me and Duoming are keep finding concurrency bugs that happen 
> between the device cleanup/detach routine and other undergoing routines.
> 
> That is to say, when a device, no matter real or virtual, is detached from 
> the machine, the kernel awake cleanup routine to reclaim the resource. 
> In current case, the cleanup routine will call nfc_unregister_device().
> 
> Other routines, mainly from user-space system calls, need to be careful of 
> the cleanup event. In another word, the kernel need to synchronize these 
> routines to avoid race bugs.
> 
> In our practice, we find that many subsystems are prone to this type of bug.
> 
> For example, in bluetooth we fix
> 
> BT subsystem
> * e2cb6b891ad2 ("bluetooth: eliminate the potential race condition when removing
> the HCI controller")
> * fa78d2d1d64f ("Bluetooth: fix data races in smp_unregister(), smp_del_chan()")
> ..
> 
> AX25 subsystem
> * 1ade48d0c27d ("ax25: NPD bug when detaching AX25 device")
> ..
> 
> we currently focus on the net relevant subsystems and we now is auditing the NFC 
> code.
> 
> In another word, all subsystems need to take care of the synchronization issues.
> But seems that the solutions are varied between different subsystem. 
> 
> Empirically speaking, most of them use specific flags + specific locks to prevent
> the race. 
> 
> In such cases, if the cleanup routine first hold the lock, the other routines will
> wait on the locks. Since the cleanup routine write the specific flag, the other
> routine, after check the specific flag, will be aware of the cleanup stuff and just
> abort their tasks.
> If the other routines first hold the lock, the cleanup routine just wait them to 
> finish.
> 
> NFC here is special because it uses device_is_registered. I thought the author may
> believe this macro is race free. However, it is not. So we need to replace this check
> to make sure the netlink functions will 100 percent be aware of the cleanup routine
> and abort the task if they grab the device_lock lately. Otherwise, the nelink routine
> will call sub-layer code and possilby dereference resources that already freed.
> 
> For example, one of my recent fix 3e3b5dfcd16a ("NFC: reorder the logic in 
> nfc_{un,}register_device") takes the suggestion from maintainer as he thought the 
> device_is_registered is enough. And for now we find out this device_is_registered
> is not enough.

How do you physically remove a NFC device from a system?  What types of
busses are they on?  If non-removable one, then odds are there's not
going to be many races so this is a low-priority thing.  If they can be
on removable busses (USB, PCI, etc.), then that's a bigger thing.

But again, the NFC bus code should handle all of this for the drivers,
there's nothing special about NFC that should warrant a special need for
this type of thing.

thanks,

greg k-h

  reply	other threads:[~2022-04-28 13:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-27  1:14 [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
2022-04-28  0:45 ` Jakub Kicinski
2022-04-28  4:03   ` duoming
2022-04-28  7:15   ` [PATCH net v4] nfc: ... device_is_registered() is data race-able Lin Ma
2022-04-28  7:38     ` Greg KH
2022-04-28  7:55       ` Lin Ma
2022-04-28  8:16         ` Greg KH
2022-04-28  8:49           ` Lin Ma
2022-04-28  9:20             ` Greg KH
2022-04-28 13:06               ` Jakub Kicinski
2022-04-28 13:22                 ` Lin Ma
2022-04-28 13:37                   ` Greg KH [this message]
2022-04-28 13:53                     ` Lin Ma
2022-04-28 14:12                       ` Greg KH
2022-04-29  3:17                         ` Lin Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YmqYgu++0OuhfFxy@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=broonie@kernel.org \
    --cc=davem@davemloft.net \
    --cc=duoming@zju.edu.cn \
    --cc=krzysztof.kozlowski@linaro.org \
    --cc=kuba@kernel.org \
    --cc=linma@zju.edu.cn \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).