netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Lin Ma <linma@zju.edu.cn>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Duoming Zhou <duoming@zju.edu.cn>,
	krzysztof.kozlowski@linaro.org, pabeni@redhat.com,
	linux-kernel@vger.kernel.org, davem@davemloft.net,
	alexander.deucher@amd.com, akpm@linux-foundation.org,
	broonie@kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH net v4] nfc: ... device_is_registered() is data race-able
Date: Thu, 28 Apr 2022 09:38:13 +0200	[thread overview]
Message-ID: <YmpEZQ7EnOIWlsy8@kroah.com> (raw)
In-Reply-To: <38929d91.237b.1806f05f467.Coremail.linma@zju.edu.cn>

On Thu, Apr 28, 2022 at 03:15:02PM +0800, Lin Ma wrote:
> Hello Jakub,
> 
> and hello there maintainers, when we tried to fix this race problem, we found another very weird issue as below
> 
> > 
> > You can't use a single global variable, there can be many devices 
> > each with their own lock.
> > 
> > Paolo suggested adding a lock, if spin lock doesn't fit the bill
> > why not add a mutex?
> 
> The lock patch can be added to nfcmrvl code but we prefer to fix this in the NFC core layer hence every other driver that supports the firmware downloading task will be free of such a problem.
> 
> But when we analyze the race between the netlink tasks and the cleanup routine, we find some *condition checks* fail to fulfill their responsibility.
> 
> For example, we once though that the device_lock + device_is_registered check can help to fix the race as below.
> 
>   netlink task              |  cleanup routine
>                             |
> nfc_genl_fw_download        | nfc_unregister_device 
>   nfc_fw_download           |   device_del 
>     device_lock             |     device_lock
>     // wait lock            |       kobject_del
>     // ...                  |         ...
>     device_is_registered    |     device_unlock    
>       rc = -ENODEV          |
> 
> However, by dynamic debugging this issue, we find out that **even after the device_del**, the device_is_registered check still returns TRUE!

You should not be making these types of checks outside of the driver
core.

> This is by no means matching our expectations as one of our previous patch relies on the device_is_registered code.

Please do not do that.

> 
> -> the patch: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
> 
> To find out why, we find out the device_is_registered is implemented like below:
> 
> static inline int device_is_registered(struct device *dev)
> {
> 	return dev->kobj.state_in_sysfs;
> }
> 
> By debugging, we find out in normal case, this kobj.state_in_sysfs will be clear out like below
> 
> [#0] 0xffffffff81f0743a → __kobject_del(kobj=0xffff888009ca7018)
> [#1] 0xffffffff81f07882 → kobject_del(kobj=0xffff888009ca7018)
> [#2] 0xffffffff81f07882 → kobject_del(kobj=0xffff888009ca7018)
> [#3] 0xffffffff827708db → device_del(dev=0xffff888009ca7018)
> [#4] 0xffffffff8396496f → nfc_unregister_device(dev=0xffff888009ca7000)
> [#5] 0xffffffff839850a9 → nci_unregister_device(ndev=0xffff888009ca3000)
> [#6] 0xffffffff82811308 → nfcmrvl_nci_unregister_dev(priv=0xffff88800c805c00)
> [#7] 0xffffffff83990c4f → nci_uart_tty_close(tty=0xffff88800b450000)
> [#8] 0xffffffff820f6bd3 → tty_ldisc_kill(tty=0xffff88800b450000)
> [#9] 0xffffffff820f7fb1 → tty_ldisc_hangup(tty=0xffff88800b450000, reinit=0x0)
> 
> The clear out is in function __kobject_del
> 
> static void __kobject_del(struct kobject *kobj)
> {
>    // ...
> 
> 	kobj->state_in_sysfs = 0;
> 	kobj_kset_leave(kobj);
> 	kobj->parent = NULL;
> }
> 
> The structure of device_del is like below
> 
> void device_del(struct device *dev)
> {
> 	struct device *parent = dev->parent;
> 	struct kobject *glue_dir = NULL;
> 	struct class_interface *class_intf;
> 	unsigned int noio_flag;
> 
> 	device_lock(dev);
> 	kill_device(dev);
> 	device_unlock(dev);
>         
>         // ...
>         kobject_del(&dev->kobj);
> 	cleanup_glue_dir(dev, glue_dir);
> 	memalloc_noio_restore(noio_flag);
> 	put_device(parent);
> }
> 
> In another word, the device_del -> kobject_del -> __kobject_del is not protected by the device_lock.

Nor should it be.

> This means the device_lock + device_is_registered is still prone to the data race. And this is not just the problem with firmware downloading. The all relevant netlink tasks that use the device_lock + device_is_registered is possible to be raced.
> 
> To this end, we will come out with two patches, one for fixing this device_is_registered by using another status variable instead. The other is the patch that reorders the code in nci_unregister_device.

Why is this somehow unique to these devices?  Why do no other buses have
this issue?  Are you somehow allowing a code path that should not be
happening?

thanks,

greg k-h

  reply	other threads:[~2022-04-28  7:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-27  1:14 [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
2022-04-28  0:45 ` Jakub Kicinski
2022-04-28  4:03   ` duoming
2022-04-28  7:15   ` [PATCH net v4] nfc: ... device_is_registered() is data race-able Lin Ma
2022-04-28  7:38     ` Greg KH [this message]
2022-04-28  7:55       ` Lin Ma
2022-04-28  8:16         ` Greg KH
2022-04-28  8:49           ` Lin Ma
2022-04-28  9:20             ` Greg KH
2022-04-28 13:06               ` Jakub Kicinski
2022-04-28 13:22                 ` Lin Ma
2022-04-28 13:37                   ` Greg KH
2022-04-28 13:53                     ` Lin Ma
2022-04-28 14:12                       ` Greg KH
2022-04-29  3:17                         ` Lin Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YmpEZQ7EnOIWlsy8@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=broonie@kernel.org \
    --cc=davem@davemloft.net \
    --cc=duoming@zju.edu.cn \
    --cc=krzysztof.kozlowski@linaro.org \
    --cc=kuba@kernel.org \
    --cc=linma@zju.edu.cn \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).