From: Alex Chiang <achiang@hp.com>
To: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
Ingo Molnar <mingo@elte.hu>,
jbarnes@virtuousgeek.org, gregkh@suse.de, tj@kernel.org,
linux-kernel@vger.kernel.org
Subject: [PATCH, RFC] sysfs: only allow one scheduled removal callback per kobj
Date: Tue, 10 Mar 2009 17:20:27 -0600 [thread overview]
Message-ID: <20090310232027.GC25665@ldl.fc.hp.com> (raw)
Hi Vegard, sysfs folks,
Vegard was nice enough to test my PCI remove/rescan patches under
kmemcheck. Maybe "torture" is a more appropriate term. ;)
My patch series introduces a sysfs "remove" attribute for PCI
devices, which will remove that device (and child devices).
http://thread.gmane.org/gmane.linux.kernel.pci/3495
Vegard decided that he wanted to do something like:
# while true ; do echo 1 > /sys/bus/pci/devices/.../remove ; done
which caused a nasty oops in my code. You can see the results of
his testing in the thread I referenced above.
After looking at my code for a bit, I decided that maybe it
wasn't completely my fault. ;) See, I'm using device_schedule_callback()
which really is a wrapper around sysfs_schedule_callback() which
is the way that a sysfs attribute is supposed to remove itself to
prevent deadlock.
The problem that Vegard's test exposed is that if you repeatedly
call a sysfs attribute that's supposed to remove itself using
device_schedule_callback, we'll keep scheduling work queue tasks
with a kobj that we really want to release.
[nb, I bet that /sys/bus/scsi/devices/.../delete will exhibit the
same problems]
This is very racy, and at some point, whatever remove handler
we've scheduled with device_schedule_callback will end up
referencing a freed kobj.
I came up with the below patch which changes the semantics of
device/sysfs_schedule_callback. We now only allow one in-flight
callback per kobj, and return -EBUSY if that kobj already has a
callback scheduled for it.
This patch, along with my updated 07/11 patch in my series,
prevents at least the first oops that Vegard reported, and I
suspect it prevents the second kmemcheck error too, although I
haven't tested under kmemcheck (yet*).
I'm looking for comments on the approach I took, specifically:
- are we ok with the new restriction I imposed?
- is it ok to return -EBUSY to our callers?
- is the simple linked list proof of concept
implementation going to scale too poorly?
To answer my own first two questions, I checked for callers of
both device_ and sysfs_schedule_callback, and it looks like
everyone is using it the same way: to schedule themselves for
removal. That is, although the interface could be used to
schedule any sort of callback, the only use case is the removal
use case. I don't think it will be a problem to limit ourselves
to one remove callback per kobj.
Maybe this patch really wants to be a new interface called
sysfs_schedule_callback_once or _single, where we check for an
already-scheduled callback for a kobj, and if we pass, then we
simply continue on to the existing, unchanged
sysfs_schedule_callback. I don't feel too strongly about creating
a new interface, but my belief is that changing the semantics of
the existing interface is probably the better solution.
My opinion on my own third question is that removing a device is
not in the performance path, so a simple linked list is
sufficient.
Depending on the feedback here, I'll resend this patch with a
full changelog (and giving credit to Vegard/kmemcheck as Ingo
requested I do) or I can rework it.
Thanks.
/ac
*: I googled for kmemcheck and the most recent tree I found was
from 2008. Is that really true? Do the patches apply to current
upstream?
---
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 1f4a3f8..e05a172 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -659,13 +659,16 @@ void sysfs_remove_file_from_group(struct kobject *kobj,
EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
struct sysfs_schedule_callback_struct {
- struct kobject *kobj;
+ struct list_head workq_list;
+ struct kobject *kobj;
void (*func)(void *);
void *data;
struct module *owner;
struct work_struct work;
};
+static DEFINE_MUTEX(sysfs_workq_mutex);
+static LIST_HEAD(sysfs_workq);
static void sysfs_schedule_callback_work(struct work_struct *work)
{
struct sysfs_schedule_callback_struct *ss = container_of(work,
@@ -674,6 +677,9 @@ static void sysfs_schedule_callback_work(struct work_struct *work)
(ss->func)(ss->data);
kobject_put(ss->kobj);
module_put(ss->owner);
+ mutex_lock(&sysfs_workq_mutex);
+ list_del(&ss->workq_list);
+ mutex_unlock(&sysfs_workq_mutex);
kfree(ss);
}
@@ -700,10 +706,19 @@ static void sysfs_schedule_callback_work(struct work_struct *work)
int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
void *data, struct module *owner)
{
- struct sysfs_schedule_callback_struct *ss;
+ struct sysfs_schedule_callback_struct *ss, *tmp;
if (!try_module_get(owner))
return -ENODEV;
+
+ mutex_lock(&sysfs_workq_mutex);
+ list_for_each_entry_safe(ss, tmp, &sysfs_workq, workq_list)
+ if (ss->kobj == kobj) {
+ mutex_unlock(&sysfs_workq_mutex);
+ return -EBUSY;
+ }
+ mutex_unlock(&sysfs_workq_mutex);
+
ss = kmalloc(sizeof(*ss), GFP_KERNEL);
if (!ss) {
module_put(owner);
@@ -715,6 +730,10 @@ int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
ss->data = data;
ss->owner = owner;
INIT_WORK(&ss->work, sysfs_schedule_callback_work);
+ INIT_LIST_HEAD(&ss->workq_list);
+ mutex_lock(&sysfs_workq_mutex);
+ list_add_tail(&ss->workq_list, &sysfs_workq);
+ mutex_unlock(&sysfs_workq_mutex);
schedule_work(&ss->work);
return 0;
}
next reply other threads:[~2009-03-10 23:20 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-10 23:20 Alex Chiang [this message]
2009-03-11 4:41 ` [PATCH, RFC] sysfs: only allow one scheduled removal callback per kobj Greg KH
2009-03-11 7:03 ` Alex Chiang
2009-03-11 7:20 ` Tejun Heo
2009-03-12 0:27 ` Alex Chiang
2009-03-12 3:22 ` Greg KH
2009-03-12 22:02 ` Alex Chiang
2009-03-13 12:03 ` Cornelia Huck
2009-03-13 18:08 ` Alex Chiang
2009-03-11 15:32 ` Greg KH
2009-03-11 17:47 ` Cornelia Huck
2009-03-11 18:14 ` Alex Chiang
2009-03-11 18:19 ` Greg KH
2009-03-11 18:42 ` Alex Chiang
2009-03-12 10:25 ` Cornelia Huck
2009-03-12 21:33 ` Alex Chiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090310232027.GC25665@ldl.fc.hp.com \
--to=achiang@hp.com \
--cc=gregkh@suse.de \
--cc=jbarnes@virtuousgeek.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=penberg@cs.helsinki.fi \
--cc=tj@kernel.org \
--cc=vegard.nossum@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox