From: Greg KH <gregkh@linuxfoundation.org>
To: Lin Ma <linma@zju.edu.cn>
Cc: Duoming Zhou <duoming@zju.edu.cn>,
krzysztof.kozlowski@linaro.org, pabeni@redhat.com,
linux-kernel@vger.kernel.org, davem@davemloft.net,
alexander.deucher@amd.com, akpm@linux-foundation.org,
broonie@kernel.org, netdev@vger.kernel.org,
Jakub Kicinski <kuba@kernel.org>
Subject: Re: [PATCH net v4] nfc: ... device_is_registered() is data race-able
Date: Thu, 28 Apr 2022 15:37:06 +0200 [thread overview]
Message-ID: <YmqYgu++0OuhfFxy@kroah.com> (raw)
In-Reply-To: <f51aa1.41ae.180705614b5.Coremail.linma@zju.edu.cn>
On Thu, Apr 28, 2022 at 09:22:11PM +0800, Lin Ma wrote:
> Hello there,
>
> >
> > Yes, that looks better,
>
> Cool, thanks again for giving comments. :)
>
> > but what is the root problem here that you are
> > trying to solve? Why does NFC need this when no other subsystem does?
> >
>
> Well, in fact, me and Duoming are keep finding concurrency bugs that happen
> between the device cleanup/detach routine and other undergoing routines.
>
> That is to say, when a device, no matter real or virtual, is detached from
> the machine, the kernel awake cleanup routine to reclaim the resource.
> In current case, the cleanup routine will call nfc_unregister_device().
>
> Other routines, mainly from user-space system calls, need to be careful of
> the cleanup event. In another word, the kernel need to synchronize these
> routines to avoid race bugs.
>
> In our practice, we find that many subsystems are prone to this type of bug.
>
> For example, in bluetooth we fix
>
> BT subsystem
> * e2cb6b891ad2 ("bluetooth: eliminate the potential race condition when removing
> the HCI controller")
> * fa78d2d1d64f ("Bluetooth: fix data races in smp_unregister(), smp_del_chan()")
> ..
>
> AX25 subsystem
> * 1ade48d0c27d ("ax25: NPD bug when detaching AX25 device")
> ..
>
> we currently focus on the net relevant subsystems and we now is auditing the NFC
> code.
>
> In another word, all subsystems need to take care of the synchronization issues.
> But seems that the solutions are varied between different subsystem.
>
> Empirically speaking, most of them use specific flags + specific locks to prevent
> the race.
>
> In such cases, if the cleanup routine first hold the lock, the other routines will
> wait on the locks. Since the cleanup routine write the specific flag, the other
> routine, after check the specific flag, will be aware of the cleanup stuff and just
> abort their tasks.
> If the other routines first hold the lock, the cleanup routine just wait them to
> finish.
>
> NFC here is special because it uses device_is_registered. I thought the author may
> believe this macro is race free. However, it is not. So we need to replace this check
> to make sure the netlink functions will 100 percent be aware of the cleanup routine
> and abort the task if they grab the device_lock lately. Otherwise, the nelink routine
> will call sub-layer code and possilby dereference resources that already freed.
>
> For example, one of my recent fix 3e3b5dfcd16a ("NFC: reorder the logic in
> nfc_{un,}register_device") takes the suggestion from maintainer as he thought the
> device_is_registered is enough. And for now we find out this device_is_registered
> is not enough.
How do you physically remove a NFC device from a system? What types of
busses are they on? If non-removable one, then odds are there's not
going to be many races so this is a low-priority thing. If they can be
on removable busses (USB, PCI, etc.), then that's a bigger thing.
But again, the NFC bus code should handle all of this for the drivers,
there's nothing special about NFC that should warrant a special need for
this type of thing.
thanks,
greg k-h
next prev parent reply other threads:[~2022-04-28 13:37 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-27 1:14 [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
2022-04-28 0:45 ` Jakub Kicinski
2022-04-28 4:03 ` duoming
2022-04-28 7:15 ` [PATCH net v4] nfc: ... device_is_registered() is data race-able Lin Ma
2022-04-28 7:38 ` Greg KH
2022-04-28 7:55 ` Lin Ma
2022-04-28 8:16 ` Greg KH
2022-04-28 8:49 ` Lin Ma
2022-04-28 9:20 ` Greg KH
2022-04-28 13:06 ` Jakub Kicinski
2022-04-28 13:22 ` Lin Ma
2022-04-28 13:37 ` Greg KH [this message]
2022-04-28 13:53 ` Lin Ma
2022-04-28 14:12 ` Greg KH
2022-04-29 3:17 ` Lin Ma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YmqYgu++0OuhfFxy@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=alexander.deucher@amd.com \
--cc=broonie@kernel.org \
--cc=davem@davemloft.net \
--cc=duoming@zju.edu.cn \
--cc=krzysztof.kozlowski@linaro.org \
--cc=kuba@kernel.org \
--cc=linma@zju.edu.cn \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.