* epoll: possible bug from wakeup_source activation
@ 2013-03-07 11:26 Eric Wong
2013-03-08 1:30 ` Eric Wong
0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2013-03-07 11:26 UTC (permalink / raw)
To: Arve Hjønnevåg
Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
Alexander Viro, linux-fsdevel, Andrew Morton
Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
(epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...)
I think the reason for using ep->ws instead of epi->ws in the unlikely
ovflist case applies to the likely rdllist case, too. Since epi->ws is
only protected by ep->mtx, it can also be deactivated while inside
ep_poll_callback.
So something like the following patch might be necessary
(shown here with extra context):
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
if (epi->next == EP_UNACTIVE_PTR) {
epi->next = ep->ovflist;
ep->ovflist = epi;
if (epi->ws) {
/*
* Activate ep->ws since epi->ws may get
* deactivated at any time.
*/
__pm_stay_awake(ep->ws);
}
}
goto out_unlock;
}
/* If this file is already in the ready list we exit soon */
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
- __pm_stay_awake(epi->ws);
+ if (epi->ws) {
+ /*
+ * Activate ep->ws since epi->ws may get
+ * deactivated at any time.
+ */
+ __pm_stay_awake(ep->ws);
+ }
}
-----------------------------------------------------------------------
However, I think my proposed patch will also cause breakage with the
way epi->ws is handled in ep_send_events_proc:
/*
* Activate ep->ws before deactivating epi->ws to prevent
* triggering auto-suspend here (in case we reactive epi->ws
* below).
*
* This could be rearranged to delay the deactivation of epi->ws
* instead, but then epi->ws would temporarily be out of sync
* with ep_is_linked().
*/
if (epi->ws && epi->ws->active)
__pm_stay_awake(ep->ws);
__pm_relax(epi->ws);
list_del_init(&epi->rdllink);
I'm not sure, but maybe only using ep->ws is the easiest way to go.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: epoll: possible bug from wakeup_source activation 2013-03-07 11:26 epoll: possible bug from wakeup_source activation Eric Wong @ 2013-03-08 1:30 ` Eric Wong 2013-03-08 4:56 ` Arve Hjønnevåg 0 siblings, 1 reply; 14+ messages in thread From: Eric Wong @ 2013-03-08 1:30 UTC (permalink / raw) To: Arve Hjønnevåg Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton Eric Wong <normalperson@yhbt.net> wrote: > Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238 > (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...) > > I think the reason for using ep->ws instead of epi->ws in the unlikely > ovflist case applies to the likely rdllist case, too. Since epi->ws is > only protected by ep->mtx, it can also be deactivated while inside > ep_poll_callback. > > So something like the following patch might be necessary > (shown here with extra context): > > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k > if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) { > if (epi->next == EP_UNACTIVE_PTR) { > epi->next = ep->ovflist; > ep->ovflist = epi; > if (epi->ws) { > /* > * Activate ep->ws since epi->ws may get > * deactivated at any time. > */ > __pm_stay_awake(ep->ws); > } > > } Thinking about this more, it looks like the original ep->ovflist case of using ep->ws is unnecessary. ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws. ep_poll_callback holding ep->lock means ep_poll_callback prevents ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and releasing ep->mtx. > goto out_unlock; > } > > /* If this file is already in the ready list we exit soon */ > if (!ep_is_linked(&epi->rdllink)) { > list_add_tail(&epi->rdllink, &ep->rdllist); > - __pm_stay_awake(epi->ws); > + if (epi->ws) { > + /* > + * Activate ep->ws since epi->ws may get > + * deactivated at any time. > + */ > + __pm_stay_awake(ep->ws); > + } > } I still think ep->ws needs to be used in the common ep->rdllist case. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-08 1:30 ` Eric Wong @ 2013-03-08 4:56 ` Arve Hjønnevåg 2013-03-08 20:49 ` Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Arve Hjønnevåg @ 2013-03-08 4:56 UTC (permalink / raw) To: Eric Wong Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton On Thu, Mar 7, 2013 at 5:30 PM, Eric Wong <normalperson@yhbt.net> wrote: > Eric Wong <normalperson@yhbt.net> wrote: >> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238 >> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...) >> >> I think the reason for using ep->ws instead of epi->ws in the unlikely >> ovflist case applies to the likely rdllist case, too. Since epi->ws is >> only protected by ep->mtx, it can also be deactivated while inside >> ep_poll_callback. >> >> So something like the following patch might be necessary >> (shown here with extra context): >> >> --- a/fs/eventpoll.c >> +++ b/fs/eventpoll.c >> @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k >> if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) { >> if (epi->next == EP_UNACTIVE_PTR) { >> epi->next = ep->ovflist; >> ep->ovflist = epi; >> if (epi->ws) { >> /* >> * Activate ep->ws since epi->ws may get >> * deactivated at any time. >> */ >> __pm_stay_awake(ep->ws); >> } >> >> } > > Thinking about this more, it looks like the original ep->ovflist case of > using ep->ws is unnecessary. > > ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in > ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws. > The callback function in ep_scan_ready_list can call __pm_relax on it though. > ep_poll_callback holding ep->lock means ep_poll_callback prevents > ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and > releasing ep->mtx. This code is reached when ep_scan_ready_list has set ep->ovflist to NULL before releasing ep->lock. Since the callback function can call __pm_relax on epi->ws without holding ep->lock we call __pm_stay_awake in ep->ws here (the callback does not call __pm_relax on that). > >> goto out_unlock; >> } >> >> /* If this file is already in the ready list we exit soon */ >> if (!ep_is_linked(&epi->rdllink)) { >> list_add_tail(&epi->rdllink, &ep->rdllist); >> - __pm_stay_awake(epi->ws); >> + if (epi->ws) { >> + /* >> + * Activate ep->ws since epi->ws may get >> + * deactivated at any time. >> + */ >> + __pm_stay_awake(ep->ws); >> + } >> } > > I still think ep->ws needs to be used in the common ep->rdllist case. ep_scan_ready_list calls __pm_relax on ep->ws when it is done, so this will not work. ep->ws is not a "ep->rdllist not empty wakeup_source is is a "ep_scan_ready_list is running" wakeup_source. -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-08 4:56 ` Arve Hjønnevåg @ 2013-03-08 20:49 ` Eric Wong 2013-03-09 4:09 ` Arve Hjønnevåg 0 siblings, 1 reply; 14+ messages in thread From: Eric Wong @ 2013-03-08 20:49 UTC (permalink / raw) To: Arve Hjønnevåg Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton Arve Hjønnevåg <arve@android.com> wrote: > On Thu, Mar 7, 2013 at 5:30 PM, Eric Wong <normalperson@yhbt.net> wrote: > > Eric Wong <normalperson@yhbt.net> wrote: > >> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238 > >> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...) > >> > >> I think the reason for using ep->ws instead of epi->ws in the unlikely > >> ovflist case applies to the likely rdllist case, too. Since epi->ws is > >> only protected by ep->mtx, it can also be deactivated while inside > >> ep_poll_callback. > >> > >> So something like the following patch might be necessary > >> (shown here with extra context): > >> > >> --- a/fs/eventpoll.c > >> +++ b/fs/eventpoll.c > >> @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k > >> if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) { > >> if (epi->next == EP_UNACTIVE_PTR) { > >> epi->next = ep->ovflist; > >> ep->ovflist = epi; > >> if (epi->ws) { > >> /* > >> * Activate ep->ws since epi->ws may get > >> * deactivated at any time. > >> */ > >> __pm_stay_awake(ep->ws); > >> } > >> > >> } > > > > Thinking about this more, it looks like the original ep->ovflist case of > > using ep->ws is unnecessary. > > > > ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in > > ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws. > > > > The callback function in ep_scan_ready_list can call __pm_relax on it though. > > > ep_poll_callback holding ep->lock means ep_poll_callback prevents > > ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and > > releasing ep->mtx. > > This code is reached when ep_scan_ready_list has set ep->ovflist to > NULL before releasing ep->lock. Since the callback function can call > __pm_relax on epi->ws without holding ep->lock we call __pm_stay_awake > in ep->ws here (the callback does not call __pm_relax on that). Thanks for the explanation. I got "deactivate" and "destroy" mixed up. However, I'm still concerned about the "destroy" case: > > > >> goto out_unlock; > >> } > >> > >> /* If this file is already in the ready list we exit soon */ > >> if (!ep_is_linked(&epi->rdllink)) { > >> list_add_tail(&epi->rdllink, &ep->rdllist); > >> - __pm_stay_awake(epi->ws); > >> + if (epi->ws) { > >> + /* > >> + * Activate ep->ws since epi->ws may get > >> + * deactivated at any time. > >> + */ > >> + __pm_stay_awake(ep->ws); > >> + } > >> } > > > > I still think ep->ws needs to be used in the common ep->rdllist case. > > ep_scan_ready_list calls __pm_relax on ep->ws when it is done, so this > will not work. ep->ws is not a "ep->rdllist not empty wakeup_source is > is a "ep_scan_ready_list is running" wakeup_source. What happens if ep_modify calls ep_destroy_wakeup_source while __pm_stay_awake is running on the same epi->ws? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-08 20:49 ` Eric Wong @ 2013-03-09 4:09 ` Arve Hjønnevåg 2013-03-09 7:10 ` Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Arve Hjønnevåg @ 2013-03-09 4:09 UTC (permalink / raw) To: Eric Wong Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote: > Arve Hjønnevåg <arve@android.com> wrote: >> On Thu, Mar 7, 2013 at 5:30 PM, Eric Wong <normalperson@yhbt.net> wrote: >> > Eric Wong <normalperson@yhbt.net> wrote: >> >> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238 >> >> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...) >> >> >> >> I think the reason for using ep->ws instead of epi->ws in the unlikely >> >> ovflist case applies to the likely rdllist case, too. Since epi->ws is >> >> only protected by ep->mtx, it can also be deactivated while inside >> >> ep_poll_callback. >> >> >> >> So something like the following patch might be necessary >> >> (shown here with extra context): >> >> >> >> --- a/fs/eventpoll.c >> >> +++ b/fs/eventpoll.c >> >> @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k >> >> if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) { >> >> if (epi->next == EP_UNACTIVE_PTR) { >> >> epi->next = ep->ovflist; >> >> ep->ovflist = epi; >> >> if (epi->ws) { >> >> /* >> >> * Activate ep->ws since epi->ws may get >> >> * deactivated at any time. >> >> */ >> >> __pm_stay_awake(ep->ws); >> >> } >> >> >> >> } >> > >> > Thinking about this more, it looks like the original ep->ovflist case of >> > using ep->ws is unnecessary. >> > >> > ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in >> > ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws. >> > >> >> The callback function in ep_scan_ready_list can call __pm_relax on it though. >> >> > ep_poll_callback holding ep->lock means ep_poll_callback prevents >> > ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and >> > releasing ep->mtx. >> >> This code is reached when ep_scan_ready_list has set ep->ovflist to >> NULL before releasing ep->lock. Since the callback function can call >> __pm_relax on epi->ws without holding ep->lock we call __pm_stay_awake >> in ep->ws here (the callback does not call __pm_relax on that). > > Thanks for the explanation. I got "deactivate" and "destroy" > mixed up. However, I'm still concerned about the "destroy" case: > >> > >> >> goto out_unlock; >> >> } >> >> >> >> /* If this file is already in the ready list we exit soon */ >> >> if (!ep_is_linked(&epi->rdllink)) { >> >> list_add_tail(&epi->rdllink, &ep->rdllist); >> >> - __pm_stay_awake(epi->ws); >> >> + if (epi->ws) { >> >> + /* >> >> + * Activate ep->ws since epi->ws may get >> >> + * deactivated at any time. >> >> + */ >> >> + __pm_stay_awake(ep->ws); >> >> + } >> >> } >> > >> > I still think ep->ws needs to be used in the common ep->rdllist case. >> >> ep_scan_ready_list calls __pm_relax on ep->ws when it is done, so this >> will not work. ep->ws is not a "ep->rdllist not empty wakeup_source is >> is a "ep_scan_ready_list is running" wakeup_source. > > What happens if ep_modify calls ep_destroy_wakeup_source > while __pm_stay_awake is running on the same epi->ws? Yes, that looks like a problem. I think calling ep_destroy_wakeup_source with ep->lock held should fix that. It is not clear how useful changing EPOLLWAKEUP in ep_modify is, so alternatively we could remove that feature and instead only allow it to be set in ep_insert. -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-09 4:09 ` Arve Hjønnevåg @ 2013-03-09 7:10 ` Eric Wong 2013-03-10 1:11 ` Eric Wong 2013-03-11 23:37 ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg 0 siblings, 2 replies; 14+ messages in thread From: Eric Wong @ 2013-03-09 7:10 UTC (permalink / raw) To: Arve Hjønnevåg Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton Arve Hjønnevåg <arve@android.com> wrote: > On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote: > > What happens if ep_modify calls ep_destroy_wakeup_source > > while __pm_stay_awake is running on the same epi->ws? > > Yes, that looks like a problem. I think calling > ep_destroy_wakeup_source with ep->lock held should fix that. It is not > clear how useful changing EPOLLWAKEUP in ep_modify is, so > alternatively we could remove that feature and instead only allow it > to be set in ep_insert. ep->lock would work, but ep->lock is already a source of heavy contention in my multithreaded+epoll webservers. Perhaps RCU can be used? I've no experience with RCU, but I've been meaning to get acquainted with RCU. Another possible solution is to only use ep->ws and add an atomic counter to ep; so __pm_relax(ep->ws) is only called when the atomic counter reaches zero. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-09 7:10 ` Eric Wong @ 2013-03-10 1:11 ` Eric Wong 2013-03-10 4:59 ` Eric Dumazet 2013-03-11 23:37 ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg 1 sibling, 1 reply; 14+ messages in thread From: Eric Wong @ 2013-03-10 1:11 UTC (permalink / raw) To: Arve Hjønnevåg Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton Eric Wong <normalperson@yhbt.net> wrote: > Arve Hjønnevåg <arve@android.com> wrote: > > On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote: > > > What happens if ep_modify calls ep_destroy_wakeup_source > > > while __pm_stay_awake is running on the same epi->ws? > > > > Yes, that looks like a problem. I think calling > > ep_destroy_wakeup_source with ep->lock held should fix that. It is not > > clear how useful changing EPOLLWAKEUP in ep_modify is, so > > alternatively we could remove that feature and instead only allow it > > to be set in ep_insert. > > ep->lock would work, but ep->lock is already a source of heavy > contention in my multithreaded+epoll webservers. > > Perhaps RCU can be used? I've no experience with RCU, but I've been > meaning to get acquainted with RCU. The following is lightly tested, at least I haven't gotten lockdep to complain. --------------------------------- 8< ---------------------------- From 2bcd549893aa204bd858cc1500a70f20b28e47c1 Mon Sep 17 00:00:00 2001 From: Eric Wong <normalperson@yhbt.net> Date: Sun, 10 Mar 2013 01:06:50 +0000 Subject: [PATCH] epoll: RCU protect wakeup_source in epitem This prevents wakeup_source destruction when a user hits the item with EPOLL_CTL_MOD while ep_poll_callback is running. Signed-off-by: Eric Wong <normalperson@yhbt.net> --- fs/eventpoll.c | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 9fec183..e008d54 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -158,7 +158,7 @@ struct epitem { struct list_head fllink; /* wakeup_source used when EPOLLWAKEUP is set */ - struct wakeup_source *ws; + struct wakeup_source __rcu *ws; /* The structure that describe the interested events and the source fd */ struct epoll_event event; @@ -536,6 +536,17 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi) } } +static inline void ep_pm_stay_awake(struct epitem *epi) +{ + struct wakeup_source *ws; + + rcu_read_lock(); + ws = rcu_dereference(epi->ws); + if (ws) + __pm_stay_awake(ws); + rcu_read_unlock(); +} + /** * ep_scan_ready_list - Scans the ready list in a way that makes possible for * the scan code, to call f_op->poll(). Also allows for @@ -984,7 +995,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k /* If this file is already in the ready list we exit soon */ if (!ep_is_linked(&epi->rdllink)) { list_add_tail(&epi->rdllink, &ep->rdllist); - __pm_stay_awake(epi->ws); + ep_pm_stay_awake(epi); } /* @@ -1146,6 +1157,7 @@ static int reverse_path_check(void) static int ep_create_wakeup_source(struct epitem *epi) { const char *name; + struct wakeup_source *ws; if (!epi->ep->ws) { epi->ep->ws = wakeup_source_register("eventpoll"); @@ -1154,17 +1166,22 @@ static int ep_create_wakeup_source(struct epitem *epi) } name = epi->ffd.file->f_path.dentry->d_name.name; - epi->ws = wakeup_source_register(name); - if (!epi->ws) + ws = wakeup_source_register(name); + + if (!ws) return -ENOMEM; + rcu_assign_pointer(epi->ws, ws); return 0; } static void ep_destroy_wakeup_source(struct epitem *epi) { - wakeup_source_unregister(epi->ws); - epi->ws = NULL; + struct wakeup_source *ws = epi->ws; + + rcu_assign_pointer(epi->ws, NULL); + synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */ + wakeup_source_unregister(ws); } /* @@ -1199,7 +1216,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event, if (error) goto error_create_wakeup_source; } else { - epi->ws = NULL; + RCU_INIT_POINTER(epi->ws, NULL); } /* Initialize the poll table using the queue callback */ -- Eric Wong ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-10 1:11 ` Eric Wong @ 2013-03-10 4:59 ` Eric Dumazet 2013-03-10 11:50 ` [PATCH] epoll: use RCU protect wakeup_source in epitem Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2013-03-10 4:59 UTC (permalink / raw) To: Eric Wong Cc: Arve Hjønnevåg, NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton On Sun, 2013-03-10 at 01:11 +0000, Eric Wong wrote: > > static void ep_destroy_wakeup_source(struct epitem *epi) > { > - wakeup_source_unregister(epi->ws); > - epi->ws = NULL; > + struct wakeup_source *ws = epi->ws; > + > + rcu_assign_pointer(epi->ws, NULL); There is no need to use a memory barrier before setting epi->ws to NULL You should use RCU_POINTER_INIT() > + synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */ Wont this add a significant slowdown ? > + wakeup_source_unregister(ws); > } Please add the following in your .config CONFIG_SPARSE_RCU_POINTER=y and try : make C=2 fs/eventpoll.o And fix all warnings ;) ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] epoll: use RCU protect wakeup_source in epitem 2013-03-10 4:59 ` Eric Dumazet @ 2013-03-10 11:50 ` Eric Wong 2013-03-14 3:09 ` [PATCH mm] epoll: lock ep->mtx in ep_free to silence lockdep Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Eric Wong @ 2013-03-10 11:50 UTC (permalink / raw) To: Eric Dumazet Cc: Arve Hjønnevåg, NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Sun, 2013-03-10 at 01:11 +0000, Eric Wong wrote: > > > > static void ep_destroy_wakeup_source(struct epitem *epi) > > { > > - wakeup_source_unregister(epi->ws); > > - epi->ws = NULL; > > + struct wakeup_source *ws = epi->ws; > > + > > + rcu_assign_pointer(epi->ws, NULL); > > There is no need to use a memory barrier before setting epi->ws to NULL > > You should use RCU_POINTER_INIT() Thanks for looking at this patch. Using RCU_POINTER_INIT in my updated patch below. > > + synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */ > > Wont this add a significant slowdown ? Maybe, but this is a very rare code path (using EPOLL_CTL_MOD to remove a wakeup source), and synchronize_rcu gets called later, too: wakeup_source_unregister wakeup_source_remove synchronize_rcu > > + wakeup_source_unregister(ws); > > } > > Please add the following in your .config > > CONFIG_SPARSE_RCU_POINTER=y > > and try : > > make C=2 fs/eventpoll.o > > And fix all warnings ;) Done. ---------------------------------8<------------------------------ From 24e5ee6222cc91208119973248fc4f8a6d1a31da Mon Sep 17 00:00:00 2001 From: Eric Wong <normalperson@yhbt.net> Date: Sun, 10 Mar 2013 01:06:50 +0000 Subject: [PATCH] epoll: use RCU protect wakeup_source in epitem This prevents wakeup_source destruction when a user hits the item with EPOLL_CTL_MOD while ep_poll_callback is running. Tested with CONFIG_SPARSE_RCU_POINTER=y and "make fs/eventpoll.o C=2" Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: NeilBrown <neilb@suse.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Eric Wong <normalperson@yhbt.net> --- fs/eventpoll.c | 92 ++++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 71 insertions(+), 21 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 1326409..15d0f27 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -158,7 +158,7 @@ struct epitem { struct list_head fllink; /* wakeup_source used when EPOLLWAKEUP is set */ - struct wakeup_source *ws; + struct wakeup_source __rcu *ws; /* The structure that describe the interested events and the source fd */ struct epoll_event event; @@ -536,6 +536,38 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi) } } +/* call only when ep->mtx is held */ +static inline struct wakeup_source *ep_wakeup_source(struct epitem *epi) +{ + return rcu_dereference_check(epi->ws, lockdep_is_held(&epi->ep->mtx)); +} + +/* call only when ep->mtx is held */ +static inline void ep_pm_stay_awake(struct epitem *epi) +{ + struct wakeup_source *ws = ep_wakeup_source(epi); + + if (ws) + __pm_stay_awake(ws); +} + +static inline bool ep_has_wakeup_source(struct epitem *epi) +{ + return rcu_access_pointer(epi->ws) ? true : false; +} + +/* call when ep->mtx cannot be held (ep_poll_callback) */ +static inline void ep_pm_stay_awake_rcu(struct epitem *epi) +{ + struct wakeup_source *ws; + + rcu_read_lock(); + ws = rcu_dereference(epi->ws); + if (ws) + __pm_stay_awake(ws); + rcu_read_unlock(); +} + /** * ep_scan_ready_list - Scans the ready list in a way that makes possible for * the scan code, to call f_op->poll(). Also allows for @@ -599,7 +631,7 @@ static int ep_scan_ready_list(struct eventpoll *ep, */ if (!ep_is_linked(&epi->rdllink)) { list_add_tail(&epi->rdllink, &ep->rdllist); - __pm_stay_awake(epi->ws); + ep_pm_stay_awake(epi); } } /* @@ -668,7 +700,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi) list_del_init(&epi->rdllink); spin_unlock_irqrestore(&ep->lock, flags); - wakeup_source_unregister(epi->ws); + wakeup_source_unregister(ep_wakeup_source(epi)); /* At this point it is safe to free the eventpoll item */ kmem_cache_free(epi_cache, epi); @@ -752,7 +784,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head, * callback, but it's not actually ready, as far as * caller requested events goes. We can remove it here. */ - __pm_relax(epi->ws); + __pm_relax(ep_wakeup_source(epi)); list_del_init(&epi->rdllink); } } @@ -984,7 +1016,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k /* If this file is already in the ready list we exit soon */ if (!ep_is_linked(&epi->rdllink)) { list_add_tail(&epi->rdllink, &ep->rdllist); - __pm_stay_awake(epi->ws); + ep_pm_stay_awake(epi); } /* @@ -1146,6 +1178,7 @@ static int reverse_path_check(void) static int ep_create_wakeup_source(struct epitem *epi) { const char *name; + struct wakeup_source *ws; if (!epi->ep->ws) { epi->ep->ws = wakeup_source_register("eventpoll"); @@ -1154,17 +1187,29 @@ static int ep_create_wakeup_source(struct epitem *epi) } name = epi->ffd.file->f_path.dentry->d_name.name; - epi->ws = wakeup_source_register(name); - if (!epi->ws) + ws = wakeup_source_register(name); + + if (!ws) return -ENOMEM; + rcu_assign_pointer(epi->ws, ws); return 0; } -static void ep_destroy_wakeup_source(struct epitem *epi) +/* rare code path, only used when EPOLL_CTL_MOD removes a wakeup source */ +static noinline void ep_destroy_wakeup_source(struct epitem *epi) { - wakeup_source_unregister(epi->ws); - epi->ws = NULL; + struct wakeup_source *ws = ep_wakeup_source(epi); + + rcu_assign_pointer(epi->ws, NULL); + + /* + * wait for ep_pm_stay_awake_rcu to finish, synchronize_rcu is + * used internally by wakeup_source_remove, too (called by + * wakeup_source_unregister), so we cannot use call_rcu + */ + synchronize_rcu(); + wakeup_source_unregister(ws); } /* @@ -1199,7 +1244,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event, if (error) goto error_create_wakeup_source; } else { - epi->ws = NULL; + RCU_INIT_POINTER(epi->ws, NULL); } /* Initialize the poll table using the queue callback */ @@ -1247,7 +1292,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event, /* If the file is already "ready" we drop it inside the ready list */ if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) { list_add_tail(&epi->rdllink, &ep->rdllist); - __pm_stay_awake(epi->ws); + ep_pm_stay_awake(epi); /* Notify waiting tasks that events are available */ if (waitqueue_active(&ep->wq)) @@ -1288,7 +1333,7 @@ error_unregister: list_del_init(&epi->rdllink); spin_unlock_irqrestore(&ep->lock, flags); - wakeup_source_unregister(epi->ws); + wakeup_source_unregister(ep_wakeup_source(epi)); error_create_wakeup_source: kmem_cache_free(epi_cache, epi); @@ -1317,9 +1362,9 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even pt._key = event->events; epi->event.data = event->data; /* protected by mtx */ if (epi->event.events & EPOLLWAKEUP) { - if (!epi->ws) + if (!ep_has_wakeup_source(epi)) ep_create_wakeup_source(epi); - } else if (epi->ws) { + } else if (ep_has_wakeup_source(epi)) { ep_destroy_wakeup_source(epi); } @@ -1357,7 +1402,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even spin_lock_irq(&ep->lock); if (!ep_is_linked(&epi->rdllink)) { list_add_tail(&epi->rdllink, &ep->rdllist); - __pm_stay_awake(epi->ws); + ep_pm_stay_awake(epi); /* Notify waiting tasks that events are available */ if (waitqueue_active(&ep->wq)) @@ -1383,6 +1428,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head, unsigned int revents; struct epitem *epi; struct epoll_event __user *uevent; + struct wakeup_source *ws; poll_table pt; init_poll_funcptr(&pt, NULL); @@ -1405,9 +1451,13 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head, * instead, but then epi->ws would temporarily be out of sync * with ep_is_linked(). */ - if (epi->ws && epi->ws->active) - __pm_stay_awake(ep->ws); - __pm_relax(epi->ws); + ws = ep_wakeup_source(epi); + if (ws) { + if (ws->active) + __pm_stay_awake(ep->ws); + __pm_relax(ws); + } + list_del_init(&epi->rdllink); pt._key = epi->event.events; @@ -1424,7 +1474,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head, if (__put_user(revents, &uevent->events) || __put_user(epi->event.data, &uevent->data)) { list_add(&epi->rdllink, head); - __pm_stay_awake(epi->ws); + ep_pm_stay_awake(epi); return eventcnt ? eventcnt : -EFAULT; } eventcnt++; @@ -1444,7 +1494,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head, * poll callback will queue them in ep->ovflist. */ list_add_tail(&epi->rdllink, &ep->rdllist); - __pm_stay_awake(epi->ws); + ep_pm_stay_awake(epi); } } } -- 1.8.2.rc3.2.g89ce8d6 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH mm] epoll: lock ep->mtx in ep_free to silence lockdep 2013-03-10 11:50 ` [PATCH] epoll: use RCU protect wakeup_source in epitem Eric Wong @ 2013-03-14 3:09 ` Eric Wong 0 siblings, 0 replies; 14+ messages in thread From: Eric Wong @ 2013-03-14 3:09 UTC (permalink / raw) To: Andrew Morton Cc: Arve Hjønnevåg, NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Al Viro, linux-fsdevel, Eric Dumazet Technically we do not need to hold ep->mtx during ep_free since we are certain there are no other users of ep at that point. However, lockdep complains with a "suspicious rcu_dereference_check() usage!" message; so lock the mutex before ep_remove to silence the warning. Signed-off-by: Eric Wong <normalperson@yhbt.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: NeilBrown <neilb@suse.de>, Cc: Rafael J. Wysocki <rjw@sisk.pl> --- I considered using lockdep_off()/lockdep_on(), but I figure that may hide other bugs... fs/eventpoll.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 633e69f..d71b754 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -745,11 +745,15 @@ static void ep_free(struct eventpoll *ep) * point we are sure no poll callbacks will be lingering around, and also by * holding "epmutex" we can be sure that no file cleanup code will hit * us during this operation. So we can avoid the lock on "ep->lock". + * We do not need to lock ep->mtx, either, we only do it to prevent + * a lockdep warning. */ + mutex_lock(&ep->mtx); while ((rbp = rb_first(&ep->rbr)) != NULL) { epi = rb_entry(rbp, struct epitem, rbn); ep_remove(ep, epi); } + mutex_unlock(&ep->mtx); mutex_unlock(&epmutex); mutex_destroy(&ep->mtx); -- Eric Wong ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-09 7:10 ` Eric Wong 2013-03-10 1:11 ` Eric Wong @ 2013-03-11 23:37 ` Arve Hjønnevåg 2013-03-12 0:17 ` Eric Wong 1 sibling, 1 reply; 14+ messages in thread From: Arve Hjønnevåg @ 2013-03-11 23:37 UTC (permalink / raw) To: Eric Wong Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote: > Arve Hjønnevåg <arve@android.com> wrote: >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote: >> > What happens if ep_modify calls ep_destroy_wakeup_source >> > while __pm_stay_awake is running on the same epi->ws? >> >> Yes, that looks like a problem. I think calling >> ep_destroy_wakeup_source with ep->lock held should fix that. It is not >> clear how useful changing EPOLLWAKEUP in ep_modify is, so >> alternatively we could remove that feature and instead only allow it >> to be set in ep_insert. > > ep->lock would work, but ep->lock is already a source of heavy > contention in my multithreaded+epoll webservers. > This should not have any significant impact on that since you would be adding a lock to a code path that is, as far as I know, unused. > Perhaps RCU can be used? I've no experience with RCU, but I've been > meaning to get acquainted with RCU. > That adds code to the common path however. The wakeup_source is not touch without holding one of the locks so holding both locks before deleting it seems like a simpler solution. > Another possible solution is to only use ep->ws and add an atomic > counter to ep; so __pm_relax(ep->ws) is only called when the atomic > counter reaches zero. -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-11 23:37 ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg @ 2013-03-12 0:17 ` Eric Wong 2013-03-12 0:29 ` Arve Hjønnevåg 0 siblings, 1 reply; 14+ messages in thread From: Eric Wong @ 2013-03-12 0:17 UTC (permalink / raw) To: Arve Hjønnevåg Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton Arve Hjønnevåg <arve@android.com> wrote: > On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote: > > Arve Hjønnevåg <arve@android.com> wrote: > >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote: > >> > What happens if ep_modify calls ep_destroy_wakeup_source > >> > while __pm_stay_awake is running on the same epi->ws? > >> > >> Yes, that looks like a problem. I think calling > >> ep_destroy_wakeup_source with ep->lock held should fix that. It is not > >> clear how useful changing EPOLLWAKEUP in ep_modify is, so > >> alternatively we could remove that feature and instead only allow it > >> to be set in ep_insert. > > > > ep->lock would work, but ep->lock is already a source of heavy > > contention in my multithreaded+epoll webservers. > > This should not have any significant impact on that since you would be > adding a lock to a code path that is, as far as I know, unused. > > > Perhaps RCU can be used? I've no experience with RCU, but I've been > > meaning to get acquainted with RCU. > > That adds code to the common path however. The wakeup_source is not > touch without holding one of the locks so holding both locks before > deleting it seems like a simpler solution. True. However, I've been looking into eliminating ep->lock in more places (maybe entirely)[1]. I don't think the current overhead of RCU in epoll is significant, either. [1] I'll be testing Mathieu's wait-free concurrent queue soon: http://mid.gmane.org/20130311213602.GB9829@Krystal ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-12 0:17 ` Eric Wong @ 2013-03-12 0:29 ` Arve Hjønnevåg 2013-03-12 0:44 ` Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Arve Hjønnevåg @ 2013-03-12 0:29 UTC (permalink / raw) To: Eric Wong Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton On Mon, Mar 11, 2013 at 5:17 PM, Eric Wong <normalperson@yhbt.net> wrote: > Arve Hjønnevåg <arve@android.com> wrote: >> On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote: >> > Arve Hjønnevåg <arve@android.com> wrote: >> >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote: >> >> > What happens if ep_modify calls ep_destroy_wakeup_source >> >> > while __pm_stay_awake is running on the same epi->ws? >> >> >> >> Yes, that looks like a problem. I think calling >> >> ep_destroy_wakeup_source with ep->lock held should fix that. It is not >> >> clear how useful changing EPOLLWAKEUP in ep_modify is, so >> >> alternatively we could remove that feature and instead only allow it >> >> to be set in ep_insert. >> > >> > ep->lock would work, but ep->lock is already a source of heavy >> > contention in my multithreaded+epoll webservers. >> >> This should not have any significant impact on that since you would be >> adding a lock to a code path that is, as far as I know, unused. >> >> > Perhaps RCU can be used? I've no experience with RCU, but I've been >> > meaning to get acquainted with RCU. >> >> That adds code to the common path however. The wakeup_source is not >> touch without holding one of the locks so holding both locks before >> deleting it seems like a simpler solution. > > True. However, I've been looking into eliminating ep->lock in more > places (maybe entirely)[1]. > > I don't think the current overhead of RCU in epoll is significant, > either. > > > [1] I'll be testing Mathieu's wait-free concurrent queue soon: > http://mid.gmane.org/20130311213602.GB9829@Krystal OK, but is there any way you could use the same locking scheme for the wakeup_source and the queue? -- Arve Hjønnevåg -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: epoll: possible bug from wakeup_source activation 2013-03-12 0:29 ` Arve Hjønnevåg @ 2013-03-12 0:44 ` Eric Wong 0 siblings, 0 replies; 14+ messages in thread From: Eric Wong @ 2013-03-12 0:44 UTC (permalink / raw) To: Arve Hjønnevåg Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel, Andrew Morton Arve Hjønnevåg <arve@android.com> wrote: > On Mon, Mar 11, 2013 at 5:17 PM, Eric Wong <normalperson@yhbt.net> wrote: > > Arve Hjønnevåg <arve@android.com> wrote: > >> On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote: > >> > Arve Hjønnevåg <arve@android.com> wrote: > >> >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote: > >> >> > What happens if ep_modify calls ep_destroy_wakeup_source > >> >> > while __pm_stay_awake is running on the same epi->ws? > >> >> > >> >> Yes, that looks like a problem. I think calling > >> >> ep_destroy_wakeup_source with ep->lock held should fix that. It is not > >> >> clear how useful changing EPOLLWAKEUP in ep_modify is, so > >> >> alternatively we could remove that feature and instead only allow it > >> >> to be set in ep_insert. > >> > > >> > ep->lock would work, but ep->lock is already a source of heavy > >> > contention in my multithreaded+epoll webservers. > >> > >> This should not have any significant impact on that since you would be > >> adding a lock to a code path that is, as far as I know, unused. > >> > >> > Perhaps RCU can be used? I've no experience with RCU, but I've been > >> > meaning to get acquainted with RCU. > >> > >> That adds code to the common path however. The wakeup_source is not > >> touch without holding one of the locks so holding both locks before > >> deleting it seems like a simpler solution. > > > > True. However, I've been looking into eliminating ep->lock in more > > places (maybe entirely)[1]. > > > > I don't think the current overhead of RCU in epoll is significant, > > either. > > > > > > [1] I'll be testing Mathieu's wait-free concurrent queue soon: > > http://mid.gmane.org/20130311213602.GB9829@Krystal > > OK, but is there any way you could use the same locking scheme for the > wakeup_source and the queue? Probably, yes. I think I can just use ep->mtx and ignore the mutex included with wfcq_head, need to protect the rbtree while dequeueing. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-03-14 3:09 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-03-07 11:26 epoll: possible bug from wakeup_source activation Eric Wong 2013-03-08 1:30 ` Eric Wong 2013-03-08 4:56 ` Arve Hjønnevåg 2013-03-08 20:49 ` Eric Wong 2013-03-09 4:09 ` Arve Hjønnevåg 2013-03-09 7:10 ` Eric Wong 2013-03-10 1:11 ` Eric Wong 2013-03-10 4:59 ` Eric Dumazet 2013-03-10 11:50 ` [PATCH] epoll: use RCU protect wakeup_source in epitem Eric Wong 2013-03-14 3:09 ` [PATCH mm] epoll: lock ep->mtx in ep_free to silence lockdep Eric Wong 2013-03-11 23:37 ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg 2013-03-12 0:17 ` Eric Wong 2013-03-12 0:29 ` Arve Hjønnevåg 2013-03-12 0:44 ` Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).