From: Greg KH <gregkh@linuxfoundation.org>
To: Lin Ma <linma@zju.edu.cn>
Cc: Duoming Zhou <duoming@zju.edu.cn>,
krzysztof.kozlowski@linaro.org, pabeni@redhat.com,
linux-kernel@vger.kernel.org, davem@davemloft.net,
alexander.deucher@amd.com, akpm@linux-foundation.org,
broonie@kernel.org, netdev@vger.kernel.org,
Jakub Kicinski <kuba@kernel.org>
Subject: Re: [PATCH net v4] nfc: ... device_is_registered() is data race-able
Date: Thu, 28 Apr 2022 15:37:06 +0200 [thread overview]
Message-ID: <YmqYgu++0OuhfFxy@kroah.com> (raw)
In-Reply-To: <f51aa1.41ae.180705614b5.Coremail.linma@zju.edu.cn>
On Thu, Apr 28, 2022 at 09:22:11PM +0800, Lin Ma wrote:
> Hello there,
>
> >
> > Yes, that looks better,
>
> Cool, thanks again for giving comments. :)
>
> > but what is the root problem here that you are
> > trying to solve? Why does NFC need this when no other subsystem does?
> >
>
> Well, in fact, me and Duoming are keep finding concurrency bugs that happen
> between the device cleanup/detach routine and other undergoing routines.
>
> That is to say, when a device, no matter real or virtual, is detached from
> the machine, the kernel awake cleanup routine to reclaim the resource.
> In current case, the cleanup routine will call nfc_unregister_device().
>
> Other routines, mainly from user-space system calls, need to be careful of
> the cleanup event. In another word, the kernel need to synchronize these
> routines to avoid race bugs.
>
> In our practice, we find that many subsystems are prone to this type of bug.
>
> For example, in bluetooth we fix
>
> BT subsystem
> * e2cb6b891ad2 ("bluetooth: eliminate the potential race condition when removing
> the HCI controller")
> * fa78d2d1d64f ("Bluetooth: fix data races in smp_unregister(), smp_del_chan()")
> ..
>
> AX25 subsystem
> * 1ade48d0c27d ("ax25: NPD bug when detaching AX25 device")
> ..
>
> we currently focus on the net relevant subsystems and we now is auditing the NFC
> code.
>
> In another word, all subsystems need to take care of the synchronization issues.
> But seems that the solutions are varied between different subsystem.
>
> Empirically speaking, most of them use specific flags + specific locks to prevent
> the race.
>
> In such cases, if the cleanup routine first hold the lock, the other routines will
> wait on the locks. Since the cleanup routine write the specific flag, the other
> routine, after check the specific flag, will be aware of the cleanup stuff and just
> abort their tasks.
> If the other routines first hold the lock, the cleanup routine just wait them to
> finish.
>
> NFC here is special because it uses device_is_registered. I thought the author may
> believe this macro is race free. However, it is not. So we need to replace this check
> to make sure the netlink functions will 100 percent be aware of the cleanup routine
> and abort the task if they grab the device_lock lately. Otherwise, the nelink routine
> will call sub-layer code and possilby dereference resources that already freed.
>
> For example, one of my recent fix 3e3b5dfcd16a ("NFC: reorder the logic in
> nfc_{un,}register_device") takes the suggestion from maintainer as he thought the
> device_is_registered is enough. And for now we find out this device_is_registered
> is not enough.
How do you physically remove a NFC device from a system? What types of
busses are they on? If non-removable one, then odds are there's not
going to be many races so this is a low-priority thing. If they can be
on removable busses (USB, PCI, etc.), then that's a bigger thing.
But again, the NFC bus code should handle all of this for the drivers,
there's nothing special about NFC that should warrant a special need for
this type of thing.
thanks,
greg k-h
next prev parent reply other threads:[~2022-04-28 13:37 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-27 1:14 [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
2022-04-28 0:45 ` Jakub Kicinski
2022-04-28 4:03 ` duoming
2022-04-28 7:15 ` [PATCH net v4] nfc: ... device_is_registered() is data race-able Lin Ma
2022-04-28 7:38 ` Greg KH
2022-04-28 7:55 ` Lin Ma
2022-04-28 8:16 ` Greg KH
2022-04-28 8:49 ` Lin Ma
2022-04-28 9:20 ` Greg KH
2022-04-28 13:06 ` Jakub Kicinski
2022-04-28 13:22 ` Lin Ma
2022-04-28 13:37 ` Greg KH [this message]
2022-04-28 13:53 ` Lin Ma
2022-04-28 14:12 ` Greg KH
2022-04-29 3:17 ` Lin Ma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YmqYgu++0OuhfFxy@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=alexander.deucher@amd.com \
--cc=broonie@kernel.org \
--cc=davem@davemloft.net \
--cc=duoming@zju.edu.cn \
--cc=krzysztof.kozlowski@linaro.org \
--cc=kuba@kernel.org \
--cc=linma@zju.edu.cn \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).