* [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
@ 2013-02-18 6:12 Li Zefan
From: Li Zefan @ 2013-02-18 6:12 UTC
To: Tejun Heo; +Cc: Cgroups, LKML, Kirill A. Shutemov
Commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
warning for event_control") solved a deadlock by introducing a new
bug.

Moving cgrp->event_list to a temporary list does not mean you can traverse
that list locklessly, because cgroup_event_wake() can be called at the
same time and remove the event from the list. The result of this race
is disastrous.

We adopt the approach the kvm irqfd code uses for race-free event removal,
which is now described in the comments in cgroup_event_wake().

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 50 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 26c071c..65c8101 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -217,6 +217,10 @@ struct cgroup_event {
 	 */
 	struct list_head list;
 	/*
+	 * Need to notify userspace when this event is removed?
+	 */
+	bool signal_on_remove;
+	/*
 	 * All fields below needed to unregister event when
 	 * userspace closes eventfd.
 	 */
@@ -3833,8 +3837,17 @@ static void cgroup_event_remove(struct work_struct *work)
 			remove);
 	struct cgroup *cgrp = event->cgrp;
 
+	remove_wait_queue(event->wqh, &event->wait);
+
 	event->cft->unregister_event(cgrp, event->cft, event->eventfd);
 
+	/*
+	 * If this event is to be removed due to cgroup removal,
+	 * we notify userspace.
+	 */
+	if (event->signal_on_remove)
+		eventfd_signal(event->eventfd, 1);
+
 	eventfd_ctx_put(event->eventfd);
 	kfree(event);
 	dput(cgrp->dentry);
@@ -3854,15 +3867,25 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 	unsigned long flags = (unsigned long)key;
 
 	if (flags & POLLHUP) {
-		__remove_wait_queue(event->wqh, &event->wait);
-		spin_lock(&cgrp->event_list_lock);
-		list_del_init(&event->list);
-		spin_unlock(&cgrp->event_list_lock);
 		/*
-		 * We are in atomic context, but cgroup_event_remove() may
-		 * sleep, so we have to call it in workqueue.
+		 * If the event has been detached at cgroup removal, we
+		 * can simply return knowing the other side will cleanup
+		 * for us.
+		 *
+		 * We can't race against event freeing since the other
+		 * side will require wqh->lock via remove_wait_queue(),
+		 * which we hold.
 		 */
-		schedule_work(&event->remove);
+		spin_lock(&cgrp->event_list_lock);
+		if (!list_empty(&event->list)) {
+			list_del_init(&event->list);
+			/*
+			 * We are in atomic context, but cgroup_event_remove()
+			 * may sleep, so we have to call it in workqueue.
+			 */
+			schedule_work(&event->remove);
+		}
+		spin_unlock(&cgrp->event_list_lock);
 	}
 
 	return 0;
@@ -4428,20 +4451,15 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	/*
 	 * Unregister events and notify userspace.
 	 * Notify userspace about cgroup removing only after rmdir of cgroup
-	 * directory to avoid race between userspace and kernelspace. Use
-	 * a temporary list to avoid a deadlock with cgroup_event_wake(). Since
-	 * cgroup_event_wake() is called with the wait queue head locked,
-	 * remove_wait_queue() cannot be called while holding event_list_lock.
+	 * directory to avoid race between userspace and kernelspace.
 	 */
 	spin_lock(&cgrp->event_list_lock);
-	list_splice_init(&cgrp->event_list, &tmp_list);
-	spin_unlock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &tmp_list, list) {
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		event->signal_on_remove = true;
 		list_del_init(&event->list);
-		remove_wait_queue(event->wqh, &event->wait);
-		eventfd_signal(event->eventfd, 1);
 		schedule_work(&event->remove);
 	}
+	spin_unlock(&cgrp->event_list_lock);
 
 	return 0;
 }
--
1.8.0.2
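
The removal protocol described in the patch can be modelled in userspace. What follows is a minimal sketch and not kernel code: struct model_event, model_event_init(), model_wake_pollhup(), model_rmdir() and model_demo() are hypothetical names, a pthread mutex stands in for cgrp->event_list_lock, a boolean stands in for list membership, and a counter stands in for schedule_work(). It only illustrates the idea that whichever path detaches the event under the lock is the one that schedules removal, so removal is scheduled once whichever side gets there first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct cgroup_event. */
struct model_event {
	pthread_mutex_t list_lock;	/* stands in for cgrp->event_list_lock */
	bool on_list;			/* stands in for !list_empty(&event->list) */
	bool signal_on_remove;		/* set only by the rmdir path */
	int removals_scheduled;		/* counts stand-in schedule_work() calls */
};

static void model_event_init(struct model_event *ev)
{
	pthread_mutex_init(&ev->list_lock, NULL);
	ev->on_list = true;
	ev->signal_on_remove = false;
	ev->removals_scheduled = 0;
}

/* Models the POLLHUP branch of cgroup_event_wake(): the close(eventfd) side. */
static bool model_wake_pollhup(struct model_event *ev)
{
	bool scheduled = false;

	pthread_mutex_lock(&ev->list_lock);
	if (ev->on_list) {		/* not yet detached by rmdir */
		ev->on_list = false;	/* list_del_init() */
		ev->removals_scheduled++;
		scheduled = true;
	}
	pthread_mutex_unlock(&ev->list_lock);
	return scheduled;
}

/* Models one iteration of the loop in cgroup_destroy_locked(): the rmdir side. */
static bool model_rmdir(struct model_event *ev)
{
	bool scheduled = false;

	pthread_mutex_lock(&ev->list_lock);
	if (ev->on_list) {		/* event is still on the cgroup's list */
		ev->signal_on_remove = true;
		ev->on_list = false;	/* list_del_init() */
		ev->removals_scheduled++;
		scheduled = true;
	}
	pthread_mutex_unlock(&ev->list_lock);
	return scheduled;
}

/* Runs both orderings; returns 0 if removal was scheduled once in each. */
static int model_demo(void)
{
	struct model_event a, b;

	model_event_init(&a);
	model_wake_pollhup(&a);		/* close(eventfd) gets there first */
	model_rmdir(&a);
	assert(a.removals_scheduled == 1 && !a.signal_on_remove);
	pthread_mutex_destroy(&a.list_lock);

	model_event_init(&b);
	model_rmdir(&b);		/* rmdir gets there first */
	model_wake_pollhup(&b);
	assert(b.removals_scheduled == 1 && b.signal_on_remove);
	pthread_mutex_destroy(&b.list_lock);
	return 0;
}
```

The model runs the two paths one after the other in each order rather than on real threads, which keeps it deterministic; the mutex is there to mark which state the kernel code protects with event_list_lock.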
* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
@ 2013-02-18 10:36 Kirill A. Shutemov
From: Kirill A. Shutemov @ 2013-02-18 10:36 UTC
To: Li Zefan; +Cc: Tejun Heo, Cgroups, LKML

On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
> commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
> warning for event_control") solved a deadlock by introducing a new
> bug.
>
> Move cgrp->event_list to a temporary list doesn't mean you can traverse
> this list locklessly, because at the same time cgroup_event_wake() can
> be called and remove the event from the list. The result of this race
> is disastrous.
>
> We adopt the way how kvm irqfd code implements race-free event removal,
> which is now described in the comments in cgroup_event_wake().
>
> Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/cgroup.c | 50 ++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 34 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 26c071c..65c8101 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -217,6 +217,10 @@ struct cgroup_event {
>  	 */
>  	struct list_head list;
>  	/*
> +	 * Need to notify userspace when this event is removed?
> +	 */
> +	bool signal_on_remove;
> +	/*
>  	 * All fields below needed to unregister event when
>  	 * userspace closes eventfd.
>  	 */
> @@ -3833,8 +3837,17 @@ static void cgroup_event_remove(struct work_struct *work)
>  			remove);
>  	struct cgroup *cgrp = event->cgrp;
>
> +	remove_wait_queue(event->wqh, &event->wait);
> +
>  	event->cft->unregister_event(cgrp, event->cft, event->eventfd);
>
> +	/*
> +	 * If this event is to be removed due to cgroup removal,
> +	 * we notify userspace.
> +	 */
> +	if (event->signal_on_remove)
> +		eventfd_signal(event->eventfd, 1);

It's safe to notify anyway, isn't it? Let's just drop signal_on_remove.

Otherwise, looks good.

Acked-by: Kirill A. Shutemov <kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>

--
 Kirill A. Shutemov
* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
@ 2013-02-18 10:39 Li Zefan
From: Li Zefan @ 2013-02-18 10:39 UTC
To: Kirill A. Shutemov; +Cc: Tejun Heo, Cgroups, LKML

On 2013/2/18 18:36, Kirill A. Shutemov wrote:
> On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
>> commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
>> warning for event_control") solved a deadlock by introducing a new
>> bug.
>>
>> Move cgrp->event_list to a temporary list doesn't mean you can traverse
>> this list locklessly, because at the same time cgroup_event_wake() can
>> be called and remove the event from the list. The result of this race
>> is disastrous.
>>
>> We adopt the way how kvm irqfd code implements race-free event removal,
>> which is now described in the comments in cgroup_event_wake().
>>
>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>> ---
>>  kernel/cgroup.c | 50 ++++++++++++++++++++++++++++++++++----------------
>>  1 file changed, 34 insertions(+), 16 deletions(-)
>>
>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> index 26c071c..65c8101 100644
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -217,6 +217,10 @@ struct cgroup_event {
>>  	 */
>>  	struct list_head list;
>>  	/*
>> +	 * Need to notify userspace when this event is removed?
>> +	 */
>> +	bool signal_on_remove;
>> +	/*
>>  	 * All fields below needed to unregister event when
>>  	 * userspace closes eventfd.
>>  	 */
>> @@ -3833,8 +3837,17 @@ static void cgroup_event_remove(struct work_struct *work)
>>  			remove);
>>  	struct cgroup *cgrp = event->cgrp;
>>
>> +	remove_wait_queue(event->wqh, &event->wait);
>> +
>>  	event->cft->unregister_event(cgrp, event->cft, event->eventfd);
>>
>> +	/*
>> +	 * If this event is to be removed due to cgroup removal,
>> +	 * we notify userspace.
>> +	 */
>> +	if (event->signal_on_remove)
>> +		eventfd_signal(event->eventfd, 1);
>
> It's safe to notify anyway, isn't it? Let's just drop signal_on_remove.
>

It should be. I just tried to be conservative, to make sure I fix the bug
without changing any behavior.

> Otherwise, looks good.
>
> Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
>
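
From the userspace side, what a notification does to an eventfd can be checked directly. The sketch below is an illustration only and is not part of the patch: eventfd_two_signals() is a hypothetical helper, and it assumes a Linux system and an eventfd created without EFD_SEMAPHORE. Each write adds to the counter, and a read returns the accumulated value and resets it, so an additional signal shows up to a listener as a larger counter value and not as an error.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal an eventfd twice from userspace, then read
 * the counter back. Returns the value read, or 0 if any call failed.
 */
static uint64_t eventfd_two_signals(void)
{
	uint64_t one = 1;
	uint64_t seen = 0;
	int fd = eventfd(0, 0);		/* initial count 0, default (counter) mode */

	if (fd < 0)
		return 0;

	/* Two writes of 1 accumulate in the counter; the read drains it. */
	if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
	    write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
	    read(fd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
		seen = 0;

	close(fd);
	return seen;
}
```

This says nothing about the kernel-side locking discussed in the thread; it only shows the counter semantics a userspace listener relies on.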
* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
@ 2013-02-18 17:16 Tejun Heo
From: Tejun Heo @ 2013-02-18 17:16 UTC
To: Li Zefan; +Cc: Cgroups, LKML, Kirill A. Shutemov

On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
> commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
> warning for event_control") solved a deadlock by introducing a new
> bug.
>
> Move cgrp->event_list to a temporary list doesn't mean you can traverse
> this list locklessly, because at the same time cgroup_event_wake() can
> be called and remove the event from the list. The result of this race
> is disastrous.
>
> We adopt the way how kvm irqfd code implements race-free event removal,
> which is now described in the comments in cgroup_event_wake().
>
> Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Applied to cgroup/for-3.9.

Thanks.

--
tejun
* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
@ 2013-02-18 17:18 Tejun Heo
From: Tejun Heo @ 2013-02-18 17:18 UTC
To: Li Zefan; +Cc: Cgroups, LKML, Kirill A. Shutemov

On Mon, Feb 18, 2013 at 09:16:47AM -0800, Tejun Heo wrote:
> On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
> > commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
> > warning for event_control") solved a deadlock by introducing a new
> > bug.
> >
> > Move cgrp->event_list to a temporary list doesn't mean you can traverse
> > this list locklessly, because at the same time cgroup_event_wake() can
> > be called and remove the event from the list. The result of this race
> > is disastrous.
> >
> > We adopt the way how kvm irqfd code implements race-free event removal,
> > which is now described in the comments in cgroup_event_wake().
> >
> > Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>
> Applied to cgroup/for-3.9.

Never mind. Just spotted v3 and applied that one instead.

--
tejun