linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* epoll: possible bug from wakeup_source activation
@ 2013-03-07 11:26 Eric Wong
  2013-03-08  1:30 ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2013-03-07 11:26 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
(epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...)

I think the reason for using ep->ws instead of epi->ws in the unlikely
ovflist case applies to the likely rdllist case, too.  Since epi->ws is
only protected by ep->mtx, it can also be deactivated while inside
ep_poll_callback.

So something like the following patch might be necessary
(shown here with extra context):

--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
 		if (epi->next == EP_UNACTIVE_PTR) {
 			epi->next = ep->ovflist;
 			ep->ovflist = epi;
 			if (epi->ws) {
 				/*
 				 * Activate ep->ws since epi->ws may get
 				 * deactivated at any time.
 				 */
 				__pm_stay_awake(ep->ws);
 			}
 
 		}
 		goto out_unlock;
 	}
 
 	/* If this file is already in the ready list we exit soon */
 	if (!ep_is_linked(&epi->rdllink)) {
 		list_add_tail(&epi->rdllink, &ep->rdllist);
-		__pm_stay_awake(epi->ws);
+		if (epi->ws) {
+			/*
+			 * Activate ep->ws since epi->ws may get
+			 * deactivated at any time.
+			 */
+			__pm_stay_awake(ep->ws);
+		}
 	}
-----------------------------------------------------------------------
However, I think my proposed patch will also cause breakage with the
way epi->ws is handled in ep_send_events_proc:

		/*
		 * Activate ep->ws before deactivating epi->ws to prevent
		 * triggering auto-suspend here (in case we reactive epi->ws
		 * below).
		 *
		 * This could be rearranged to delay the deactivation of epi->ws
		 * instead, but then epi->ws would temporarily be out of sync
		 * with ep_is_linked().
		 */
		if (epi->ws && epi->ws->active)
			__pm_stay_awake(ep->ws);
		__pm_relax(epi->ws);
		list_del_init(&epi->rdllink);

I'm not sure, but maybe only using ep->ws is the easiest way to go.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-07 11:26 epoll: possible bug from wakeup_source activation Eric Wong
@ 2013-03-08  1:30 ` Eric Wong
  2013-03-08  4:56   ` Arve Hjønnevåg
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2013-03-08  1:30 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

Eric Wong <normalperson@yhbt.net> wrote:
> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...)
> 
> I think the reason for using ep->ws instead of epi->ws in the unlikely
> ovflist case applies to the likely rdllist case, too.  Since epi->ws is
> only protected by ep->mtx, it can also be deactivated while inside
> ep_poll_callback.
> 
> So something like the following patch might be necessary
> (shown here with extra context):
> 
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>  	if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>  		if (epi->next == EP_UNACTIVE_PTR) {
>  			epi->next = ep->ovflist;
>  			ep->ovflist = epi;
>  			if (epi->ws) {
>  				/*
>  				 * Activate ep->ws since epi->ws may get
>  				 * deactivated at any time.
>  				 */
>  				__pm_stay_awake(ep->ws);
>  			}
>  
>  		}

Thinking about this more, it looks like the original ep->ovflist case of
using ep->ws is unnecessary.

ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in
ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws.

ep_poll_callback holding ep->lock means ep_poll_callback prevents
ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and
releasing ep->mtx.

>  		goto out_unlock;
>  	}
>  
>  	/* If this file is already in the ready list we exit soon */
>  	if (!ep_is_linked(&epi->rdllink)) {
>  		list_add_tail(&epi->rdllink, &ep->rdllist);
> -		__pm_stay_awake(epi->ws);
> +		if (epi->ws) {
> +			/*
> +			 * Activate ep->ws since epi->ws may get
> +			 * deactivated at any time.
> +			 */
> +			__pm_stay_awake(ep->ws);
> +		}
>  	}

I still think ep->ws needs to be used in the common ep->rdllist case.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-08  1:30 ` Eric Wong
@ 2013-03-08  4:56   ` Arve Hjønnevåg
  2013-03-08 20:49     ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Arve Hjønnevåg @ 2013-03-08  4:56 UTC (permalink / raw)
  To: Eric Wong
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

On Thu, Mar 7, 2013 at 5:30 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Eric Wong <normalperson@yhbt.net> wrote:
>> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
>> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...)
>>
>> I think the reason for using ep->ws instead of epi->ws in the unlikely
>> ovflist case applies to the likely rdllist case, too.  Since epi->ws is
>> only protected by ep->mtx, it can also be deactivated while inside
>> ep_poll_callback.
>>
>> So something like the following patch might be necessary
>> (shown here with extra context):
>>
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>>       if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>>               if (epi->next == EP_UNACTIVE_PTR) {
>>                       epi->next = ep->ovflist;
>>                       ep->ovflist = epi;
>>                       if (epi->ws) {
>>                               /*
>>                                * Activate ep->ws since epi->ws may get
>>                                * deactivated at any time.
>>                                */
>>                               __pm_stay_awake(ep->ws);
>>                       }
>>
>>               }
>
> Thinking about this more, it looks like the original ep->ovflist case of
> using ep->ws is unnecessary.
>
> ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in
> ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws.
>

The callback function in ep_scan_ready_list can call __pm_relax on it though.

> ep_poll_callback holding ep->lock means ep_poll_callback prevents
> ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and
> releasing ep->mtx.

This code is reached when ep_scan_ready_list has set ep->ovflist to
NULL before releasing ep->lock. Since the callback function can call
__pm_relax on epi->ws without holding ep->lock we call __pm_stay_awake
in ep->ws here (the callback does not call __pm_relax on that).

>
>>               goto out_unlock;
>>       }
>>
>>       /* If this file is already in the ready list we exit soon */
>>       if (!ep_is_linked(&epi->rdllink)) {
>>               list_add_tail(&epi->rdllink, &ep->rdllist);
>> -             __pm_stay_awake(epi->ws);
>> +             if (epi->ws) {
>> +                     /*
>> +                      * Activate ep->ws since epi->ws may get
>> +                      * deactivated at any time.
>> +                      */
>> +                     __pm_stay_awake(ep->ws);
>> +             }
>>       }
>
> I still think ep->ws needs to be used in the common ep->rdllist case.

ep_scan_ready_list calls __pm_relax on ep->ws when it is done, so this
will not work. ep->ws is not a "ep->rdllist not empty wakeup_source is
is a "ep_scan_ready_list is running" wakeup_source.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-08  4:56   ` Arve Hjønnevåg
@ 2013-03-08 20:49     ` Eric Wong
  2013-03-09  4:09       ` Arve Hjønnevåg
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2013-03-08 20:49 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

Arve Hjønnevåg <arve@android.com> wrote:
> On Thu, Mar 7, 2013 at 5:30 PM, Eric Wong <normalperson@yhbt.net> wrote:
> > Eric Wong <normalperson@yhbt.net> wrote:
> >> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
> >> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...)
> >>
> >> I think the reason for using ep->ws instead of epi->ws in the unlikely
> >> ovflist case applies to the likely rdllist case, too.  Since epi->ws is
> >> only protected by ep->mtx, it can also be deactivated while inside
> >> ep_poll_callback.
> >>
> >> So something like the following patch might be necessary
> >> (shown here with extra context):
> >>
> >> --- a/fs/eventpoll.c
> >> +++ b/fs/eventpoll.c
> >> @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
> >>       if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
> >>               if (epi->next == EP_UNACTIVE_PTR) {
> >>                       epi->next = ep->ovflist;
> >>                       ep->ovflist = epi;
> >>                       if (epi->ws) {
> >>                               /*
> >>                                * Activate ep->ws since epi->ws may get
> >>                                * deactivated at any time.
> >>                                */
> >>                               __pm_stay_awake(ep->ws);
> >>                       }
> >>
> >>               }
> >
> > Thinking about this more, it looks like the original ep->ovflist case of
> > using ep->ws is unnecessary.
> >
> > ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in
> > ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws.
> >
> 
> The callback function in ep_scan_ready_list can call __pm_relax on it though.
> 
> > ep_poll_callback holding ep->lock means ep_poll_callback prevents
> > ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and
> > releasing ep->mtx.
>
> This code is reached when ep_scan_ready_list has set ep->ovflist to
> NULL before releasing ep->lock. Since the callback function can call
> __pm_relax on epi->ws without holding ep->lock we call __pm_stay_awake
> in ep->ws here (the callback does not call __pm_relax on that).

Thanks for the explanation.  I got "deactivate" and "destroy"
mixed up.  However, I'm still concerned about the "destroy" case:

> >
> >>               goto out_unlock;
> >>       }
> >>
> >>       /* If this file is already in the ready list we exit soon */
> >>       if (!ep_is_linked(&epi->rdllink)) {
> >>               list_add_tail(&epi->rdllink, &ep->rdllist);
> >> -             __pm_stay_awake(epi->ws);
> >> +             if (epi->ws) {
> >> +                     /*
> >> +                      * Activate ep->ws since epi->ws may get
> >> +                      * deactivated at any time.
> >> +                      */
> >> +                     __pm_stay_awake(ep->ws);
> >> +             }
> >>       }
> >
> > I still think ep->ws needs to be used in the common ep->rdllist case.
> 
> ep_scan_ready_list calls __pm_relax on ep->ws when it is done, so this
> will not work. ep->ws is not a "ep->rdllist not empty wakeup_source is
> is a "ep_scan_ready_list is running" wakeup_source.

What happens if ep_modify calls ep_destroy_wakeup_source
while __pm_stay_awake is running on the same epi->ws?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-08 20:49     ` Eric Wong
@ 2013-03-09  4:09       ` Arve Hjønnevåg
  2013-03-09  7:10         ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Arve Hjønnevåg @ 2013-03-09  4:09 UTC (permalink / raw)
  To: Eric Wong
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Arve Hjønnevåg <arve@android.com> wrote:
>> On Thu, Mar 7, 2013 at 5:30 PM, Eric Wong <normalperson@yhbt.net> wrote:
>> > Eric Wong <normalperson@yhbt.net> wrote:
>> >> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
>> >> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...)
>> >>
>> >> I think the reason for using ep->ws instead of epi->ws in the unlikely
>> >> ovflist case applies to the likely rdllist case, too.  Since epi->ws is
>> >> only protected by ep->mtx, it can also be deactivated while inside
>> >> ep_poll_callback.
>> >>
>> >> So something like the following patch might be necessary
>> >> (shown here with extra context):
>> >>
>> >> --- a/fs/eventpoll.c
>> >> +++ b/fs/eventpoll.c
>> >> @@ -968,39 +968,45 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>> >>       if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>> >>               if (epi->next == EP_UNACTIVE_PTR) {
>> >>                       epi->next = ep->ovflist;
>> >>                       ep->ovflist = epi;
>> >>                       if (epi->ws) {
>> >>                               /*
>> >>                                * Activate ep->ws since epi->ws may get
>> >>                                * deactivated at any time.
>> >>                                */
>> >>                               __pm_stay_awake(ep->ws);
>> >>                       }
>> >>
>> >>               }
>> >
>> > Thinking about this more, it looks like the original ep->ovflist case of
>> > using ep->ws is unnecessary.
>> >
>> > ep->ovflist != EP_UNACTIVE_PTR can only happen while ep->mtx is held (in
>> > ep_scan_ready_list); which means ep_modify+friends cannot remove epi->ws.
>> >
>>
>> The callback function in ep_scan_ready_list can call __pm_relax on it though.
>>
>> > ep_poll_callback holding ep->lock means ep_poll_callback prevents
>> > ep_scan_ready_list from setting ep->ovflist = EP_UNACTIVE_PTR and
>> > releasing ep->mtx.
>>
>> This code is reached when ep_scan_ready_list has set ep->ovflist to
>> NULL before releasing ep->lock. Since the callback function can call
>> __pm_relax on epi->ws without holding ep->lock we call __pm_stay_awake
>> in ep->ws here (the callback does not call __pm_relax on that).
>
> Thanks for the explanation.  I got "deactivate" and "destroy"
> mixed up.  However, I'm still concerned about the "destroy" case:
>
>> >
>> >>               goto out_unlock;
>> >>       }
>> >>
>> >>       /* If this file is already in the ready list we exit soon */
>> >>       if (!ep_is_linked(&epi->rdllink)) {
>> >>               list_add_tail(&epi->rdllink, &ep->rdllist);
>> >> -             __pm_stay_awake(epi->ws);
>> >> +             if (epi->ws) {
>> >> +                     /*
>> >> +                      * Activate ep->ws since epi->ws may get
>> >> +                      * deactivated at any time.
>> >> +                      */
>> >> +                     __pm_stay_awake(ep->ws);
>> >> +             }
>> >>       }
>> >
>> > I still think ep->ws needs to be used in the common ep->rdllist case.
>>
>> ep_scan_ready_list calls __pm_relax on ep->ws when it is done, so this
>> will not work. ep->ws is not a "ep->rdllist not empty wakeup_source is
>> is a "ep_scan_ready_list is running" wakeup_source.
>
> What happens if ep_modify calls ep_destroy_wakeup_source
> while __pm_stay_awake is running on the same epi->ws?

Yes, that looks like a problem. I think calling
ep_destroy_wakeup_source with ep->lock held should fix that. It is not
clear how useful changing EPOLLWAKEUP in ep_modify is, so
alternatively we could remove that feature and instead only allow it
to be set in ep_insert.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-09  4:09       ` Arve Hjønnevåg
@ 2013-03-09  7:10         ` Eric Wong
  2013-03-10  1:11           ` Eric Wong
  2013-03-11 23:37           ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Wong @ 2013-03-09  7:10 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

Arve Hjønnevåg <arve@android.com> wrote:
> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote:
> > What happens if ep_modify calls ep_destroy_wakeup_source
> > while __pm_stay_awake is running on the same epi->ws?
> 
> Yes, that looks like a problem. I think calling
> ep_destroy_wakeup_source with ep->lock held should fix that. It is not
> clear how useful changing EPOLLWAKEUP in ep_modify is, so
> alternatively we could remove that feature and instead only allow it
> to be set in ep_insert.

ep->lock would work, but ep->lock is already a source of heavy
contention in my multithreaded+epoll webservers.

Perhaps RCU can be used?  I've no experience with RCU, but I've been
meaning to get acquainted with RCU.

Another possible solution is to only use ep->ws and add an atomic
counter to ep; so __pm_relax(ep->ws) is only called when the atomic
counter reaches zero.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-09  7:10         ` Eric Wong
@ 2013-03-10  1:11           ` Eric Wong
  2013-03-10  4:59             ` Eric Dumazet
  2013-03-11 23:37           ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Wong @ 2013-03-10  1:11 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

Eric Wong <normalperson@yhbt.net> wrote:
> Arve Hjønnevåg <arve@android.com> wrote:
> > On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote:
> > > What happens if ep_modify calls ep_destroy_wakeup_source
> > > while __pm_stay_awake is running on the same epi->ws?
> > 
> > Yes, that looks like a problem. I think calling
> > ep_destroy_wakeup_source with ep->lock held should fix that. It is not
> > clear how useful changing EPOLLWAKEUP in ep_modify is, so
> > alternatively we could remove that feature and instead only allow it
> > to be set in ep_insert.
> 
> ep->lock would work, but ep->lock is already a source of heavy
> contention in my multithreaded+epoll webservers.
> 
> Perhaps RCU can be used?  I've no experience with RCU, but I've been
> meaning to get acquainted with RCU.

The following is lightly tested, at least I haven't gotten lockdep
to complain.

--------------------------------- 8< ----------------------------
From 2bcd549893aa204bd858cc1500a70f20b28e47c1 Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Sun, 10 Mar 2013 01:06:50 +0000
Subject: [PATCH] epoll: RCU protect wakeup_source in epitem

This prevents wakeup_source destruction when a user hits the
item with EPOLL_CTL_MOD while ep_poll_callback is running.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 fs/eventpoll.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9fec183..e008d54 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -158,7 +158,7 @@ struct epitem {
 	struct list_head fllink;
 
 	/* wakeup_source used when EPOLLWAKEUP is set */
-	struct wakeup_source *ws;
+	struct wakeup_source __rcu *ws;
 
 	/* The structure that describe the interested events and the source fd */
 	struct epoll_event event;
@@ -536,6 +536,17 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
 	}
 }
 
+static inline void ep_pm_stay_awake(struct epitem *epi)
+{
+	struct wakeup_source *ws;
+
+	rcu_read_lock();
+	ws = rcu_dereference(epi->ws);
+	if (ws)
+		__pm_stay_awake(ws);
+	rcu_read_unlock();
+}
+
 /**
  * ep_scan_ready_list - Scans the ready list in a way that makes possible for
  *                      the scan code, to call f_op->poll(). Also allows for
@@ -984,7 +995,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	/* If this file is already in the ready list we exit soon */
 	if (!ep_is_linked(&epi->rdllink)) {
 		list_add_tail(&epi->rdllink, &ep->rdllist);
-		__pm_stay_awake(epi->ws);
+		ep_pm_stay_awake(epi);
 	}
 
 	/*
@@ -1146,6 +1157,7 @@ static int reverse_path_check(void)
 static int ep_create_wakeup_source(struct epitem *epi)
 {
 	const char *name;
+	struct wakeup_source *ws;
 
 	if (!epi->ep->ws) {
 		epi->ep->ws = wakeup_source_register("eventpoll");
@@ -1154,17 +1166,22 @@ static int ep_create_wakeup_source(struct epitem *epi)
 	}
 
 	name = epi->ffd.file->f_path.dentry->d_name.name;
-	epi->ws = wakeup_source_register(name);
-	if (!epi->ws)
+	ws = wakeup_source_register(name);
+
+	if (!ws)
 		return -ENOMEM;
+	rcu_assign_pointer(epi->ws, ws);
 
 	return 0;
 }
 
 static void ep_destroy_wakeup_source(struct epitem *epi)
 {
-	wakeup_source_unregister(epi->ws);
-	epi->ws = NULL;
+	struct wakeup_source *ws = epi->ws;
+
+	rcu_assign_pointer(epi->ws, NULL);
+	synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */
+	wakeup_source_unregister(ws);
 }
 
 /*
@@ -1199,7 +1216,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 		if (error)
 			goto error_create_wakeup_source;
 	} else {
-		epi->ws = NULL;
+		RCU_INIT_POINTER(epi->ws, NULL);
 	}
 
 	/* Initialize the poll table using the queue callback */
-- 
Eric Wong

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-10  1:11           ` Eric Wong
@ 2013-03-10  4:59             ` Eric Dumazet
  2013-03-10 11:50               ` [PATCH] epoll: use RCU protect wakeup_source in epitem Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2013-03-10  4:59 UTC (permalink / raw)
  To: Eric Wong
  Cc: Arve Hjønnevåg, NeilBrown, Rafael J. Wysocki,
	linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel,
	Andrew Morton

On Sun, 2013-03-10 at 01:11 +0000, Eric Wong wrote:
>  
>  static void ep_destroy_wakeup_source(struct epitem *epi)
>  {
> -	wakeup_source_unregister(epi->ws);
> -	epi->ws = NULL;
> +	struct wakeup_source *ws = epi->ws;
> +
> +	rcu_assign_pointer(epi->ws, NULL);

There is no need to use a memory barrier before setting epi->ws to NULL

You should use RCU_POINTER_INIT()

> +	synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */

Wont this add a significant slowdown ?

> +	wakeup_source_unregister(ws);
>  }

Please add the following in your .config

CONFIG_SPARSE_RCU_POINTER=y

and try :

make C=2 fs/eventpoll.o

And fix all warnings ;)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] epoll: use RCU protect wakeup_source in epitem
  2013-03-10  4:59             ` Eric Dumazet
@ 2013-03-10 11:50               ` Eric Wong
  2013-03-14  3:09                 ` [PATCH mm] epoll: lock ep->mtx in ep_free to silence lockdep Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2013-03-10 11:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arve Hjønnevåg, NeilBrown, Rafael J. Wysocki,
	linux-kernel, Davide Libenzi, Alexander Viro, linux-fsdevel,
	Andrew Morton

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2013-03-10 at 01:11 +0000, Eric Wong wrote:
> >  
> >  static void ep_destroy_wakeup_source(struct epitem *epi)
> >  {
> > -	wakeup_source_unregister(epi->ws);
> > -	epi->ws = NULL;
> > +	struct wakeup_source *ws = epi->ws;
> > +
> > +	rcu_assign_pointer(epi->ws, NULL);
> 
> There is no need to use a memory barrier before setting epi->ws to NULL
> 
> You should use RCU_POINTER_INIT()

Thanks for looking at this patch.  Using RCU_POINTER_INIT in my
updated patch below.

> > +	synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */
> 
> Wont this add a significant slowdown ?

Maybe, but this is a very rare code path (using EPOLL_CTL_MOD to remove
a wakeup source), and synchronize_rcu gets called later, too:

	wakeup_source_unregister
		wakeup_source_remove
			synchronize_rcu

> > +	wakeup_source_unregister(ws);
> >  }
> 
> Please add the following in your .config
> 
> CONFIG_SPARSE_RCU_POINTER=y
> 
> and try :
> 
> make C=2 fs/eventpoll.o
> 
> And fix all warnings ;)

Done.

---------------------------------8<------------------------------
From 24e5ee6222cc91208119973248fc4f8a6d1a31da Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Sun, 10 Mar 2013 01:06:50 +0000
Subject: [PATCH] epoll: use RCU protect wakeup_source in epitem

This prevents wakeup_source destruction when a user hits the item with
EPOLL_CTL_MOD while ep_poll_callback is running.

Tested with CONFIG_SPARSE_RCU_POINTER=y and "make fs/eventpoll.o C=2"

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: NeilBrown <neilb@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 fs/eventpoll.c | 92 ++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 21 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1326409..15d0f27 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -158,7 +158,7 @@ struct epitem {
 	struct list_head fllink;
 
 	/* wakeup_source used when EPOLLWAKEUP is set */
-	struct wakeup_source *ws;
+	struct wakeup_source __rcu *ws;
 
 	/* The structure that describe the interested events and the source fd */
 	struct epoll_event event;
@@ -536,6 +536,38 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
 	}
 }
 
+/* call only when ep->mtx is held */
+static inline struct wakeup_source *ep_wakeup_source(struct epitem *epi)
+{
+	return rcu_dereference_check(epi->ws, lockdep_is_held(&epi->ep->mtx));
+}
+
+/* call only when ep->mtx is held */
+static inline void ep_pm_stay_awake(struct epitem *epi)
+{
+	struct wakeup_source *ws = ep_wakeup_source(epi);
+
+	if (ws)
+		__pm_stay_awake(ws);
+}
+
+static inline bool ep_has_wakeup_source(struct epitem *epi)
+{
+	return rcu_access_pointer(epi->ws) ? true : false;
+}
+
+/* call when ep->mtx cannot be held (ep_poll_callback) */
+static inline void ep_pm_stay_awake_rcu(struct epitem *epi)
+{
+	struct wakeup_source *ws;
+
+	rcu_read_lock();
+	ws = rcu_dereference(epi->ws);
+	if (ws)
+		__pm_stay_awake(ws);
+	rcu_read_unlock();
+}
+
 /**
  * ep_scan_ready_list - Scans the ready list in a way that makes possible for
  *                      the scan code, to call f_op->poll(). Also allows for
@@ -599,7 +631,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
 		 */
 		if (!ep_is_linked(&epi->rdllink)) {
 			list_add_tail(&epi->rdllink, &ep->rdllist);
-			__pm_stay_awake(epi->ws);
+			ep_pm_stay_awake(epi);
 		}
 	}
 	/*
@@ -668,7 +700,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 		list_del_init(&epi->rdllink);
 	spin_unlock_irqrestore(&ep->lock, flags);
 
-	wakeup_source_unregister(epi->ws);
+	wakeup_source_unregister(ep_wakeup_source(epi));
 
 	/* At this point it is safe to free the eventpoll item */
 	kmem_cache_free(epi_cache, epi);
@@ -752,7 +784,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
 			 * callback, but it's not actually ready, as far as
 			 * caller requested events goes. We can remove it here.
 			 */
-			__pm_relax(epi->ws);
+			__pm_relax(ep_wakeup_source(epi));
 			list_del_init(&epi->rdllink);
 		}
 	}
@@ -984,7 +1016,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	/* If this file is already in the ready list we exit soon */
 	if (!ep_is_linked(&epi->rdllink)) {
 		list_add_tail(&epi->rdllink, &ep->rdllist);
-		__pm_stay_awake(epi->ws);
+		ep_pm_stay_awake(epi);
 	}
 
 	/*
@@ -1146,6 +1178,7 @@ static int reverse_path_check(void)
 static int ep_create_wakeup_source(struct epitem *epi)
 {
 	const char *name;
+	struct wakeup_source *ws;
 
 	if (!epi->ep->ws) {
 		epi->ep->ws = wakeup_source_register("eventpoll");
@@ -1154,17 +1187,29 @@ static int ep_create_wakeup_source(struct epitem *epi)
 	}
 
 	name = epi->ffd.file->f_path.dentry->d_name.name;
-	epi->ws = wakeup_source_register(name);
-	if (!epi->ws)
+	ws = wakeup_source_register(name);
+
+	if (!ws)
 		return -ENOMEM;
+	rcu_assign_pointer(epi->ws, ws);
 
 	return 0;
 }
 
-static void ep_destroy_wakeup_source(struct epitem *epi)
+/* rare code path, only used when EPOLL_CTL_MOD removes a wakeup source */
+static noinline void ep_destroy_wakeup_source(struct epitem *epi)
 {
-	wakeup_source_unregister(epi->ws);
-	epi->ws = NULL;
+	struct wakeup_source *ws = ep_wakeup_source(epi);
+
+	rcu_assign_pointer(epi->ws, NULL);
+
+	/*
+	 * wait for ep_pm_stay_awake_rcu to finish, synchronize_rcu is
+	 * used internally by wakeup_source_remove, too (called by
+	 * wakeup_source_unregister), so we cannot use call_rcu
+	 */
+	synchronize_rcu();
+	wakeup_source_unregister(ws);
 }
 
 /*
@@ -1199,7 +1244,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 		if (error)
 			goto error_create_wakeup_source;
 	} else {
-		epi->ws = NULL;
+		RCU_INIT_POINTER(epi->ws, NULL);
 	}
 
 	/* Initialize the poll table using the queue callback */
@@ -1247,7 +1292,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 	/* If the file is already "ready" we drop it inside the ready list */
 	if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
 		list_add_tail(&epi->rdllink, &ep->rdllist);
-		__pm_stay_awake(epi->ws);
+		ep_pm_stay_awake(epi);
 
 		/* Notify waiting tasks that events are available */
 		if (waitqueue_active(&ep->wq))
@@ -1288,7 +1333,7 @@ error_unregister:
 		list_del_init(&epi->rdllink);
 	spin_unlock_irqrestore(&ep->lock, flags);
 
-	wakeup_source_unregister(epi->ws);
+	wakeup_source_unregister(ep_wakeup_source(epi));
 
 error_create_wakeup_source:
 	kmem_cache_free(epi_cache, epi);
@@ -1317,9 +1362,9 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	pt._key = event->events;
 	epi->event.data = event->data; /* protected by mtx */
 	if (epi->event.events & EPOLLWAKEUP) {
-		if (!epi->ws)
+		if (!ep_has_wakeup_source(epi))
 			ep_create_wakeup_source(epi);
-	} else if (epi->ws) {
+	} else if (ep_has_wakeup_source(epi)) {
 		ep_destroy_wakeup_source(epi);
 	}
 
@@ -1357,7 +1402,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 		spin_lock_irq(&ep->lock);
 		if (!ep_is_linked(&epi->rdllink)) {
 			list_add_tail(&epi->rdllink, &ep->rdllist);
-			__pm_stay_awake(epi->ws);
+			ep_pm_stay_awake(epi);
 
 			/* Notify waiting tasks that events are available */
 			if (waitqueue_active(&ep->wq))
@@ -1383,6 +1428,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
 	unsigned int revents;
 	struct epitem *epi;
 	struct epoll_event __user *uevent;
+	struct wakeup_source *ws;
 	poll_table pt;
 
 	init_poll_funcptr(&pt, NULL);
@@ -1405,9 +1451,13 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
 		 * instead, but then epi->ws would temporarily be out of sync
 		 * with ep_is_linked().
 		 */
-		if (epi->ws && epi->ws->active)
-			__pm_stay_awake(ep->ws);
-		__pm_relax(epi->ws);
+		ws = ep_wakeup_source(epi);
+		if (ws) {
+			if (ws->active)
+				__pm_stay_awake(ep->ws);
+			__pm_relax(ws);
+		}
+
 		list_del_init(&epi->rdllink);
 
 		pt._key = epi->event.events;
@@ -1424,7 +1474,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
 			if (__put_user(revents, &uevent->events) ||
 			    __put_user(epi->event.data, &uevent->data)) {
 				list_add(&epi->rdllink, head);
-				__pm_stay_awake(epi->ws);
+				ep_pm_stay_awake(epi);
 				return eventcnt ? eventcnt : -EFAULT;
 			}
 			eventcnt++;
@@ -1444,7 +1494,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
 				 * poll callback will queue them in ep->ovflist.
 				 */
 				list_add_tail(&epi->rdllink, &ep->rdllist);
-				__pm_stay_awake(epi->ws);
+				ep_pm_stay_awake(epi);
 			}
 		}
 	}
-- 
1.8.2.rc3.2.g89ce8d6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-09  7:10         ` Eric Wong
  2013-03-10  1:11           ` Eric Wong
@ 2013-03-11 23:37           ` Arve Hjønnevåg
  2013-03-12  0:17             ` Eric Wong
  1 sibling, 1 reply; 14+ messages in thread
From: Arve Hjønnevåg @ 2013-03-11 23:37 UTC (permalink / raw)
  To: Eric Wong
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Arve Hjønnevåg <arve@android.com> wrote:
>> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote:
>> > What happens if ep_modify calls ep_destroy_wakeup_source
>> > while __pm_stay_awake is running on the same epi->ws?
>>
>> Yes, that looks like a problem. I think calling
>> ep_destroy_wakeup_source with ep->lock held should fix that. It is not
>> clear how useful changing EPOLLWAKEUP in ep_modify is, so
>> alternatively we could remove that feature and instead only allow it
>> to be set in ep_insert.
>
> ep->lock would work, but ep->lock is already a source of heavy
> contention in my multithreaded+epoll webservers.
>

This should not have any significant impact on that since you would be
adding a lock to a code path that is, as far as I know, unused.

> Perhaps RCU can be used?  I've no experience with RCU, but I've been
> meaning to get acquainted with RCU.
>

That adds code to the common path however. The wakeup_source is not
touch without holding one of the locks so holding both locks before
deleting it seems like a simpler solution.

> Another possible solution is to only use ep->ws and add an atomic
> counter to ep; so __pm_relax(ep->ws) is only called when the atomic
> counter reaches zero.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-11 23:37           ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg
@ 2013-03-12  0:17             ` Eric Wong
  2013-03-12  0:29               ` Arve Hjønnevåg
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2013-03-12  0:17 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

Arve Hjønnevåg <arve@android.com> wrote:
> On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote:
> > Arve Hjønnevåg <arve@android.com> wrote:
> >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote:
> >> > What happens if ep_modify calls ep_destroy_wakeup_source
> >> > while __pm_stay_awake is running on the same epi->ws?
> >>
> >> Yes, that looks like a problem. I think calling
> >> ep_destroy_wakeup_source with ep->lock held should fix that. It is not
> >> clear how useful changing EPOLLWAKEUP in ep_modify is, so
> >> alternatively we could remove that feature and instead only allow it
> >> to be set in ep_insert.
> >
> > ep->lock would work, but ep->lock is already a source of heavy
> > contention in my multithreaded+epoll webservers.
> 
> This should not have any significant impact on that since you would be
> adding a lock to a code path that is, as far as I know, unused.
> 
> > Perhaps RCU can be used?  I've no experience with RCU, but I've been
> > meaning to get acquainted with RCU.
> 
> That adds code to the common path however. The wakeup_source is not
> touch without holding one of the locks so holding both locks before
> deleting it seems like a simpler solution.

True.  However, I've been looking into eliminating ep->lock in more
places (maybe entirely)[1].

I don't think the current overhead of RCU in epoll is significant,
either.


[1] I'll be testing Mathieu's wait-free concurrent queue soon:
    http://mid.gmane.org/20130311213602.GB9829@Krystal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-12  0:17             ` Eric Wong
@ 2013-03-12  0:29               ` Arve Hjønnevåg
  2013-03-12  0:44                 ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Arve Hjønnevåg @ 2013-03-12  0:29 UTC (permalink / raw)
  To: Eric Wong
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

On Mon, Mar 11, 2013 at 5:17 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Arve Hjønnevåg <arve@android.com> wrote:
>> On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote:
>> > Arve Hjønnevåg <arve@android.com> wrote:
>> >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote:
>> >> > What happens if ep_modify calls ep_destroy_wakeup_source
>> >> > while __pm_stay_awake is running on the same epi->ws?
>> >>
>> >> Yes, that looks like a problem. I think calling
>> >> ep_destroy_wakeup_source with ep->lock held should fix that. It is not
>> >> clear how useful changing EPOLLWAKEUP in ep_modify is, so
>> >> alternatively we could remove that feature and instead only allow it
>> >> to be set in ep_insert.
>> >
>> > ep->lock would work, but ep->lock is already a source of heavy
>> > contention in my multithreaded+epoll webservers.
>>
>> This should not have any significant impact on that since you would be
>> adding a lock to a code path that is, as far as I know, unused.
>>
>> > Perhaps RCU can be used?  I've no experience with RCU, but I've been
>> > meaning to get acquainted with RCU.
>>
>> That adds code to the common path however. The wakeup_source is not
>> touch without holding one of the locks so holding both locks before
>> deleting it seems like a simpler solution.
>
> True.  However, I've been looking into eliminating ep->lock in more
> places (maybe entirely)[1].
>
> I don't think the current overhead of RCU in epoll is significant,
> either.
>
>
> [1] I'll be testing Mathieu's wait-free concurrent queue soon:
>     http://mid.gmane.org/20130311213602.GB9829@Krystal

OK, but is there any way you could use the same locking scheme for the
wakeup_source and the queue?

-- 
Arve Hjønnevåg
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: epoll: possible bug from wakeup_source activation
  2013-03-12  0:29               ` Arve Hjønnevåg
@ 2013-03-12  0:44                 ` Eric Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2013-03-12  0:44 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: NeilBrown, Rafael J. Wysocki, linux-kernel, Davide Libenzi,
	Alexander Viro, linux-fsdevel, Andrew Morton

Arve Hjønnevåg <arve@android.com> wrote:
> On Mon, Mar 11, 2013 at 5:17 PM, Eric Wong <normalperson@yhbt.net> wrote:
> > Arve Hjønnevåg <arve@android.com> wrote:
> >> On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong <normalperson@yhbt.net> wrote:
> >> > Arve Hjønnevåg <arve@android.com> wrote:
> >> >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong <normalperson@yhbt.net> wrote:
> >> >> > What happens if ep_modify calls ep_destroy_wakeup_source
> >> >> > while __pm_stay_awake is running on the same epi->ws?
> >> >>
> >> >> Yes, that looks like a problem. I think calling
> >> >> ep_destroy_wakeup_source with ep->lock held should fix that. It is not
> >> >> clear how useful changing EPOLLWAKEUP in ep_modify is, so
> >> >> alternatively we could remove that feature and instead only allow it
> >> >> to be set in ep_insert.
> >> >
> >> > ep->lock would work, but ep->lock is already a source of heavy
> >> > contention in my multithreaded+epoll webservers.
> >>
> >> This should not have any significant impact on that since you would be
> >> adding a lock to a code path that is, as far as I know, unused.
> >>
> >> > Perhaps RCU can be used?  I've no experience with RCU, but I've been
> >> > meaning to get acquainted with RCU.
> >>
> >> That adds code to the common path however. The wakeup_source is not
> >> touch without holding one of the locks so holding both locks before
> >> deleting it seems like a simpler solution.
> >
> > True.  However, I've been looking into eliminating ep->lock in more
> > places (maybe entirely)[1].
> >
> > I don't think the current overhead of RCU in epoll is significant,
> > either.
> >
> >
> > [1] I'll be testing Mathieu's wait-free concurrent queue soon:
> >     http://mid.gmane.org/20130311213602.GB9829@Krystal
> 
> OK, but is there any way you could use the same locking scheme for the
> wakeup_source and the queue?

Probably, yes.  I think I can just use ep->mtx and ignore the mutex
included with wfcq_head, need to protect the rbtree while dequeueing.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH mm] epoll: lock ep->mtx in ep_free to silence lockdep
  2013-03-10 11:50               ` [PATCH] epoll: use RCU protect wakeup_source in epitem Eric Wong
@ 2013-03-14  3:09                 ` Eric Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2013-03-14  3:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arve Hjønnevåg, NeilBrown, Rafael J. Wysocki,
	linux-kernel, Davide Libenzi, Al Viro, linux-fsdevel,
	Eric Dumazet

Technically we do not need to hold ep->mtx during ep_free since we are
certain there are no other users of ep at that point.  However, lockdep
complains with a "suspicious rcu_dereference_check() usage!" message;
so lock the mutex before ep_remove to silence the warning.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: NeilBrown <neilb@suse.de>,
Cc: Rafael J. Wysocki <rjw@sisk.pl>
---
  I considered using lockdep_off()/lockdep_on(), but I figure that may
  hide other bugs...

 fs/eventpoll.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 633e69f..d71b754 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -745,11 +745,15 @@ static void ep_free(struct eventpoll *ep)
 	 * point we are sure no poll callbacks will be lingering around, and also by
 	 * holding "epmutex" we can be sure that no file cleanup code will hit
 	 * us during this operation. So we can avoid the lock on "ep->lock".
+	 * We do not need to lock ep->mtx, either, we only do it to prevent
+	 * a lockdep warning.
 	 */
+	mutex_lock(&ep->mtx);
 	while ((rbp = rb_first(&ep->rbr)) != NULL) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		ep_remove(ep, epi);
 	}
+	mutex_unlock(&ep->mtx);
 
 	mutex_unlock(&epmutex);
 	mutex_destroy(&ep->mtx);
-- 
Eric Wong

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-03-14  3:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-07 11:26 epoll: possible bug from wakeup_source activation Eric Wong
2013-03-08  1:30 ` Eric Wong
2013-03-08  4:56   ` Arve Hjønnevåg
2013-03-08 20:49     ` Eric Wong
2013-03-09  4:09       ` Arve Hjønnevåg
2013-03-09  7:10         ` Eric Wong
2013-03-10  1:11           ` Eric Wong
2013-03-10  4:59             ` Eric Dumazet
2013-03-10 11:50               ` [PATCH] epoll: use RCU protect wakeup_source in epitem Eric Wong
2013-03-14  3:09                 ` [PATCH mm] epoll: lock ep->mtx in ep_free to silence lockdep Eric Wong
2013-03-11 23:37           ` epoll: possible bug from wakeup_source activation Arve Hjønnevåg
2013-03-12  0:17             ` Eric Wong
2013-03-12  0:29               ` Arve Hjønnevåg
2013-03-12  0:44                 ` Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).