From: David Hildenbrand <david@redhat.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Saravana Kannan <saravanak@google.com>,
	Heikki Krogerus <heikki.krogerus@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v1] driver core: check for dead devices before onlining/offlining
Date: Fri, 24 Jan 2020 10:09:03 +0100
Message-ID: <6dd0ea5a-e5c3-c4b6-2b2e-93537571d7d6@redhat.com>
In-Reply-To: <20200124090052.GA2958140@kroah.com>

On 24.01.20 10:00, Greg Kroah-Hartman wrote:
> On Mon, Jan 20, 2020 at 11:49:09AM +0100, David Hildenbrand wrote:
>> We can have rare cases where the removal of a device races with
>> somebody trying to online it (esp. via sysfs). We can simply check
>> if the device is already removed or getting removed under the dev->lock.
>>
>> E.g., right now, if memory block devices are removed (remove_memory()),
>> we do a:
>>
>> remove_memory() -> lock_device_hotplug() -> mem_hotplug_begin() ->
>> device_lock() -> dev->dead = true
>>
>> Somebody coming via sysfs (/sys/devices/system/memory/memoryX/online)
>> triggers a:
>>
>> lock_device_hotplug_sysfs() -> device_online() -> device_lock() ...
>>
>> So if we made it just before the lock_device_hotplug_sysfs() but get
>> delayed until remove_memory() released all locks, we will continue
>> taking locks and trying to online the device - which is then a zombie
>> device.
>>
>> Note that at least the memory onlining path seems to be protected by
>> checking if all memory sections are still present (something we can then
>> get rid of). We do have other sysfs attributes
>> (e.g., /sys/devices/system/memory/memoryX/valid_zones) that don't do any
>> such locking yet and might race with memory removal in a similar way. For
>> these users, we can then do a
>>
>> device_lock(dev);
>> if (!device_is_dead(dev)) {
>> 	/* magic */
>> }
>> device_unlock(dev);
>>
>> Introduce and use device_is_dead() right away.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Cc: Saravana Kannan <saravanak@google.com>
>> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>
>> Am I missing any obvious mechanism in the device core that handles
>> something like this already? (especially also for other sysfs attributes?)
> 
> So is a sysfs attribute causing the device itself to go away?  We have

Nope, removal is triggered via the driver, not via a sysfs attribute.

Regarding this patch: is there anything that prevents the scenario I
describe above? IOW, is this patch applicable, or is there another way
to fence it properly (e.g., the "specific call" you mentioned)?
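
To make the scenario concrete, here is a minimal userspace sketch (not
kernel code) of the ordering I have in mind. Everything in it is made up
for illustration: struct fake_device, fake_remove(), fake_online(),
replay_late_online() and hotplug_lock are hypothetical names, and plain
pthread mutexes stand in for the device lock and the device hotplug lock.
It only shows that a writer who arrives after removal has dropped all
locks gets -ENODEV once the dead check is done under the device lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device; not the kernel type. */
struct fake_device {
	pthread_mutex_t lock;	/* plays the role of the device lock */
	bool dead;		/* set once removal has started */
	bool online;
};

/* Plays the role of the device hotplug lock. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Removal path: marks the device dead while holding both locks. */
void fake_remove(struct fake_device *dev)
{
	pthread_mutex_lock(&hotplug_lock);
	pthread_mutex_lock(&dev->lock);
	dev->dead = true;
	dev->online = false;
	pthread_mutex_unlock(&dev->lock);
	pthread_mutex_unlock(&hotplug_lock);
}

/* Online path with the proposed check: refuse to touch a dead device. */
int fake_online(struct fake_device *dev)
{
	int ret = 0;

	pthread_mutex_lock(&hotplug_lock);
	pthread_mutex_lock(&dev->lock);
	if (dev->dead)
		ret = -ENODEV;
	else
		dev->online = true;
	pthread_mutex_unlock(&dev->lock);
	pthread_mutex_unlock(&hotplug_lock);
	return ret;
}

/* Replays the ordering: removal completes, then the delayed writer runs. */
int replay_late_online(void)
{
	struct fake_device dev = { .dead = false, .online = false };
	int ret;

	pthread_mutex_init(&dev.lock, NULL);
	assert(fake_online(&dev) == 0);	/* device alive: onlining works */
	fake_remove(&dev);		/* removal released all locks */
	ret = fake_online(&dev);	/* delayed sysfs writer arrives */
	assert(!dev.online);		/* the zombie was not onlined */
	pthread_mutex_destroy(&dev.lock);
	return ret;
}
```

Without the dev->dead test in fake_online(), the late call would happily
set online = true on a device that is already gone, which is the zombie
case described above.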

-- 
Thanks,

David / dhildenb



Thread overview: 13+ messages
2020-01-20 10:49 [PATCH v1] driver core: check for dead devices before onlining/offlining David Hildenbrand
2020-01-24  9:00 ` Greg Kroah-Hartman
2020-01-24  9:09   ` David Hildenbrand [this message]
2020-01-24  9:12     ` Greg Kroah-Hartman
2020-01-24 13:31       ` David Hildenbrand
2020-01-24 13:48         ` David Hildenbrand
2020-01-24 16:31           ` David Hildenbrand
2020-01-24  9:38     ` Rafael J. Wysocki
2020-01-24 13:29       ` David Hildenbrand
2020-01-24 17:14 ` David Hildenbrand
2020-01-24 18:40   ` Dan Williams
2020-01-24 18:58     ` David Hildenbrand
2020-01-24 22:36       ` Dan Williams
