* [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds
@ 2009-02-24 17:26 Tony Battersby
2009-02-24 18:12 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Tony Battersby @ 2009-02-24 17:26 UTC (permalink / raw)
To: Andrew Morton, Davide Libenzi; +Cc: Jonathan Corbet, linux-kernel
>From the epoll manpage:
Q: Will closing a file descriptor cause it to be removed from all
epoll sets automatically?
A: Yes
sys_close() calls filp_close(), which calls fput(). If no one else
holds a reference to the file, then fput() calls __fput(), and __fput()
calls eventpoll_release(), which prevents epoll_wait() from returning
events on the fd to userspace. In the rare case that sys_close()
doesn't call __fput() because someone else has a reference to the file,
a subsequent epoll_wait() may still unexpectedly return events on the
fd after it has been closed. This can end up confusing or crashing
a userspace program that doesn't do epoll_ctl(EPOLL_CTL_DEL) before
closing the fd. I have reports of this actually happening under
heavy load with a program using epoll with network sockets on 2.6.27.
This patch fixes the problem by calling eventpoll_release_file()
from filp_close() instead of from __fput().
The locking in eventpoll_release() and eventpoll_release_file() needs
to be changed because previously it relied on the fact that no one
else could have a reference to the file when called from __fput(),
and this is no longer true. The new locking is admittedly ugly,
but I believe it works.
ep_insert() now also needs to check if the file has been closed
to avoid races in multi-threaded programs where one thread is doing
epoll_ctl(EPOLL_CTL_ADD) and another thread is closing the fd. This is
done by checking that fget_light still returns the same struct file *
as before.
Note that the list_del_init(&epi->fllink) previously done in
eventpoll_release_file() was unnecessary because it is also done
by ep_remove().
Userspace programs that might run on kernels with this bug can work
around the problem by doing epoll_ctl(EPOLL_CTL_DEL) before close().
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
CC: <stable@kernel.org>
---
This patch is against 2.6.29-rc6. I would like it to go into 2.6.29
and -stable; however, it will create conflicts with the patches
currently in -mm and linux-next.
diff -urpN a/fs/eventpoll.c b/fs/eventpoll.c
--- a/fs/eventpoll.c 2009-02-23 17:44:51.000000000 -0500
+++ b/fs/eventpoll.c 2009-02-24 09:32:42.000000000 -0500
@@ -538,27 +538,40 @@ void eventpoll_release_file(struct file
struct epitem *epi;
/*
- * We don't want to get "file->f_ep_lock" because it is not
- * necessary. It is not necessary because we're in the "struct file"
- * cleanup path, and this means that noone is using this file anymore.
- * So, for example, epoll_ctl() cannot hit here sicne if we reach this
- * point, the file counter already went to zero and fget() would fail.
- * The only hit might come from ep_free() but by holding the mutex
- * will correctly serialize the operation. We do need to acquire
- * "ep->mtx" after "epmutex" because ep_remove() requires it when called
- * from anywhere but ep_free().
+ * epmutex makes it safe to use *ep by preventing ep_free() from
+ * running.
*/
mutex_lock(&epmutex);
+ spin_lock(&file->f_ep_lock);
while (!list_empty(lsthead)) {
- epi = list_first_entry(lsthead, struct epitem, fllink);
+ ep = list_first_entry(lsthead, struct epitem, fllink)->ep;
+ spin_unlock(&file->f_ep_lock);
- ep = epi->ep;
- list_del_init(&epi->fllink);
+ /*
+ * ep->mtx protects against epoll_ctl() and is required for
+ * calling ep_remove().
+ */
mutex_lock(&ep->mtx);
- ep_remove(ep, epi);
+
+ spin_lock(&file->f_ep_lock);
+ epi = list_first_entry(lsthead, struct epitem, fllink);
+ spin_unlock(&file->f_ep_lock);
+
+ /*
+ * epoll_ctl(EPOLL_CTL_DEL) may have modified the list before
+ * we locked ep->mtx, so it is necessary to check that the
+ * first entry still exists and belongs to the ep whose mtx we
+ * are holding.
+ */
+ if (epi && epi->ep == ep)
+ ep_remove(ep, epi);
+
mutex_unlock(&ep->mtx);
+
+ spin_lock(&file->f_ep_lock);
}
+ spin_unlock(&file->f_ep_lock);
mutex_unlock(&epmutex);
}
@@ -736,6 +749,22 @@ static void ep_rbtree_insert(struct even
}
/*
+ * Returns true if the fd has been closed or false if not.
+ */
+static bool ep_is_file_closed(struct file *file, int fd)
+{
+ struct file *file2;
+ int needs_fput;
+ bool closed;
+
+ file2 = fget_light(fd, &needs_fput);
+ closed = (file2 != file);
+ if (file2)
+ fput_light(file2, needs_fput);
+ return closed;
+}
+
+/*
* Must be called with "mtx" held.
*/
static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
@@ -786,6 +815,17 @@ static int ep_insert(struct eventpoll *e
/* Add the current item to the list of active epoll hook for this file */
spin_lock(&tfile->f_ep_lock);
+ /*
+ * Don't add the fd to the epoll set if it was closed in between the
+ * initial fget() and the time we get here. This check must be done
+ * while holding f_ep_lock to prevent races with
+ * eventpoll_release_file().
+ */
+ if (ep_is_file_closed(tfile, fd)) {
+ spin_unlock(&tfile->f_ep_lock);
+ error = -EBADF;
+ goto error_unregister;
+ }
list_add_tail(&epi->fllink, &tfile->f_ep_links);
spin_unlock(&tfile->f_ep_lock);
diff -urpN a/fs/file_table.c b/fs/file_table.c
--- a/fs/file_table.c 2009-02-23 17:44:51.000000000 -0500
+++ b/fs/file_table.c 2009-02-24 09:31:35.000000000 -0500
@@ -265,11 +265,6 @@ void __fput(struct file *file)
might_sleep();
fsnotify_close(file);
- /*
- * The function eventpoll_release() should be the first called
- * in the file cleanup chain.
- */
- eventpoll_release(file);
locks_remove_flock(file);
if (unlikely(file->f_flags & FASYNC)) {
diff -urpN a/fs/open.c b/fs/open.c
--- a/fs/open.c 2009-02-23 17:44:51.000000000 -0500
+++ b/fs/open.c 2009-02-24 09:31:35.000000000 -0500
@@ -29,6 +29,7 @@
#include <linux/rcupdate.h>
#include <linux/audit.h>
#include <linux/falloc.h>
+#include <linux/eventpoll.h>
int vfs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
@@ -1104,6 +1105,7 @@ int filp_close(struct file *filp, fl_own
dnotify_flush(filp, id);
locks_remove_posix(filp, id);
+ eventpoll_release(filp);
fput(filp);
return retval;
}
diff -urpN a/include/linux/eventpoll.h b/include/linux/eventpoll.h
--- a/include/linux/eventpoll.h 2008-12-24 18:26:37.000000000 -0500
+++ b/include/linux/eventpoll.h 2009-02-24 09:31:35.000000000 -0500
@@ -69,23 +69,25 @@ static inline void eventpoll_init_file(s
void eventpoll_release_file(struct file *file);
/*
- * This is called from inside fs/file_table.c:__fput() to unlink files
+ * This is called from inside fs/open.c:filp_close() to unlink files
* from the eventpoll interface. We need to have this facility to cleanup
* correctly files that are closed without being removed from the eventpoll
* interface.
*/
static inline void eventpoll_release(struct file *file)
{
+ bool empty;
/*
- * Fast check to avoid the get/release of the semaphore. Since
- * we're doing this outside the semaphore lock, it might return
+ * Fast check to avoid the get/release of the mutex. Since
+ * we're doing this outside the mutex lock, it might return
* false negatives, but we don't care. It'll help in 99.99% of cases
- * to avoid the semaphore lock. False positives simply cannot happen
- * because the file in on the way to be removed and nobody ( but
- * eventpoll ) has still a reference to this file.
+ * to avoid the mutex lock.
*/
- if (likely(list_empty(&file->f_ep_links)))
+ spin_lock(&file->f_ep_lock);
+ empty = list_empty(&file->f_ep_links);
+ spin_unlock(&file->f_ep_lock);
+ if (likely(empty))
return;
/*
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds
2009-02-24 17:26 [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds Tony Battersby
@ 2009-02-24 18:12 ` Eric Dumazet
2009-02-24 18:44 ` Tony Battersby
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2009-02-24 18:12 UTC (permalink / raw)
To: Tony Battersby
Cc: Andrew Morton, Davide Libenzi, Jonathan Corbet, linux-kernel
Tony Battersby a écrit :
>>From the epoll manpage:
> Q: Will closing a file descriptor cause it to be removed from all
> epoll sets automatically?
> A: Yes
>
> sys_close() calls filp_close(), which calls fput(). If no one else
> holds a reference to the file, then fput() calls __fput(), and __fput()
> calls eventpoll_release(), which prevents epoll_wait() from returning
> events on the fd to userspace. In the rare case that sys_close()
> doesn't call __fput() because someone else has a reference to the file,
> a subsequent epoll_wait() may still unexpectedly return events on the
> fd after it has been closed. This can end up confusing or crashing
> a userspace program that doesn't do epoll_ctl(EPOLL_CTL_DEL) before
> closing the fd. I have reports of this actually happening under
> heavy load with a program using epoll with network sockets on 2.6.27.
>
> This patch fixes the problem by calling eventpoll_release_file()
> from filp_close() instead of from __fput().
>
> The locking in eventpoll_release() and eventpoll_release_file() needs
> to be changed because previously it relied on the fact that no one
> else could have a reference to the file when called from __fput(),
> and this is no longer true. The new locking is admittedly ugly,
> but I believe it works.
>
> ep_insert() now also needs to check if the file has been closed
> to avoid races in multi-threaded programs where one thread is doing
> epoll_ctl(EPOLL_CTL_ADD) and another thread is closing the fd. This is
> done by checking that fget_light still returns the same struct file *
> as before.
>
> Note that the list_del_init(&epi->fllink) previously done in
> eventpoll_release_file() was unnecessary because it is also done
> by ep_remove().
>
> Userspace programs that might run on kernels with this bug can work
> around the problem by doing epoll_ctl(EPOLL_CTL_DEL) before close().
>
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> CC: <stable@kernel.org>
Your patch may solve part of the problem. In your programs, maybe you
have one thread doing all epoll_wait() and close() syscalls, but what
of other programs ?
What prevents a thread doing close(fd) right after an other thread
got this fd from epoll_wait() ?
Nothing, and application may strangely react.
The moment you have several threads doing read()/write()/close() syscalls
on the same fd at the same time, you obviously get problems, not
only with epoll.
In a typical epoll driven application, with a pool of N worker threads all doing :
while (1) {
fd = epoll_wait(epoll_fd);
work_on_fd(fd); /* possibly calling close(fd); */
}
Then, you must be prepared to get a *false* event, ie an fd that another worker
already closed (and eventually reopened)
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds
2009-02-24 18:12 ` Eric Dumazet
@ 2009-02-24 18:44 ` Tony Battersby
2009-02-24 19:52 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Tony Battersby @ 2009-02-24 18:44 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andrew Morton, Davide Libenzi, Jonathan Corbet, linux-kernel
Eric Dumazet wrote:
> Your patch may solve part of the problem. In your programs, maybe you
> have one thread doing all epoll_wait() and close() syscalls, but what
> of other programs ?
>
> What prevents a thread doing close(fd) right after an other thread
> got this fd from epoll_wait() ?
> Nothing, and application may strangely react.
>
> The moment you have several threads doing read()/write()/close() syscalls
> on the same fd at the same time, you obviously get problems, not
> only with epoll.
>
> In a typical epoll driven application, with a pool of N worker threads all doing :
>
> while (1) {
> fd = epoll_wait(epoll_fd);
> work_on_fd(fd); /* possibly calling close(fd); */
> }
>
> Then, you must be prepared to get a *false* event, ie an fd that another worker
> already closed (and eventually reopened)
>
>
>
>
Yes, I agree that userspace threads do need synchronization to prevent
one thread from stomping on another thread's data. If userspace can't
prove that close() returned before the call to epoll_wait(), then
epoll_wait() may legitimately return an event on a closed fd. That's
why my test program did close() and then epoll_wait() from the same
thread - to prove that they were serialized.
I am not actually using epoll in any of my programs right now; I was
just investigating a bug reported to me by another programmer at my
company. So my test program isn't intented to reflect anything other
than a way to reproduce the problem reliably. However, I can imagine
that a network program might want to spawn separate rx/tx threads on the
same socket, in which case it might make sense to have separate threads
accessing the same file descriptor. As you say, the two threads would
have to use proper locking, but that is purely a userspace issue that
kernel developers need not worry about. So I am only concerned with the
case that userspace can prove that close() and epoll_wait() were
properly serialized, and epoll_wait() still returned an event on the
closed fd.
Tony
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds
2009-02-24 18:44 ` Tony Battersby
@ 2009-02-24 19:52 ` Eric Dumazet
2009-02-24 20:36 ` Tony Battersby
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2009-02-24 19:52 UTC (permalink / raw)
To: Tony Battersby
Cc: Andrew Morton, Davide Libenzi, Jonathan Corbet, linux-kernel
Tony Battersby a écrit :
> Eric Dumazet wrote:
>> Your patch may solve part of the problem. In your programs, maybe you
>> have one thread doing all epoll_wait() and close() syscalls, but what
>> of other programs ?
>>
>> What prevents a thread doing close(fd) right after an other thread
>> got this fd from epoll_wait() ?
>> Nothing, and application may strangely react.
>>
>> The moment you have several threads doing read()/write()/close() syscalls
>> on the same fd at the same time, you obviously get problems, not
>> only with epoll.
>>
>> In a typical epoll driven application, with a pool of N worker threads all doing :
>>
>> while (1) {
>> fd = epoll_wait(epoll_fd);
>> work_on_fd(fd); /* possibly calling close(fd); */
>> }
>>
>> Then, you must be prepared to get a *false* event, ie an fd that another worker
>> already closed (and eventually reopened)
>>
>>
>>
>>
> Yes, I agree that userspace threads do need synchronization to prevent
> one thread from stomping on another thread's data. If userspace can't
> prove that close() returned before the call to epoll_wait(), then
> epoll_wait() may legitimately return an event on a closed fd. That's
> why my test program did close() and then epoll_wait() from the same
> thread - to prove that they were serialized.
>
> I am not actually using epoll in any of my programs right now; I was
> just investigating a bug reported to me by another programmer at my
> company. So my test program isn't intented to reflect anything other
> than a way to reproduce the problem reliably. However, I can imagine
> that a network program might want to spawn separate rx/tx threads on the
> same socket, in which case it might make sense to have separate threads
> accessing the same file descriptor. As you say, the two threads would
> have to use proper locking, but that is purely a userspace issue that
> kernel developers need not worry about. So I am only concerned with the
> case that userspace can prove that close() and epoll_wait() were
> properly serialized, and epoll_wait() still returned an event on the
> closed fd.
I believe Davide already said that close() meant the real close, not
the file descriptor freeing.
And current documentations clearly states it, maybe your man page is too old ?
---
Q6 Will closing a file descriptor cause it to be removed from all epoll sets automatically?
A6 Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see
open(2)). Whenever a descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is created. An open file description continues to exist until all
file descriptors referring to it have been closed. A file descriptor is removed from an epoll set only after all the
file descriptors referring to the underlying open file description have been closed (or before if the descriptor is
explicitly removed using epoll_ctl() EPOLL_CTL_DEL). This means that even after a file descriptor that is part of an
epoll set has been closed, events may be reported for that file descriptor if other file descriptors referring to the
same underlying file description remain open.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds
2009-02-24 19:52 ` Eric Dumazet
@ 2009-02-24 20:36 ` Tony Battersby
2009-02-24 20:45 ` Davide Libenzi
0 siblings, 1 reply; 7+ messages in thread
From: Tony Battersby @ 2009-02-24 20:36 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andrew Morton, Davide Libenzi, Jonathan Corbet, linux-kernel
Eric Dumazet wrote:
> I believe Davide already said that close() meant the real close, not
> the file descriptor freeing.
>
> And current documentations clearly states it, maybe your man page is too old ?
>
> ---
>
> Q6 Will closing a file descriptor cause it to be removed from all epoll sets automatically?
>
> A6 Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see
> open(2)). Whenever a descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is created. An open file description continues to exist until all
> file descriptors referring to it have been closed. A file descriptor is removed from an epoll set only after all the
> file descriptors referring to the underlying open file description have been closed (or before if the descriptor is
> explicitly removed using epoll_ctl() EPOLL_CTL_DEL). This means that even after a file descriptor that is part of an
> epoll set has been closed, events may be reported for that file descriptor if other file descriptors referring to the
> same underlying file description remain open.
>
>
>
>
>
Ugh. The manpage on Ubuntu 8.04 that I am using doesn't contain this
lengthy caveat; it just says "A6 Yes." To me it sounds like the
(newer) manpage is just documenting away a bug/misfeature with "It's
supposed to do that. Gives it character." rather than expecting the
kernel developers to make the API behave in a more reasonable way. Just
my opinion; others may agree or disagree.
If the officially-sanctioned epoll interface is supposed to report
events on fds that were added, duped, and then the original fd closed,
then my patch will probably break that behavior. In that case, it will
be more difficult to fix problem that my patch is trying to fix (which
doesn't involve dup, dup2, fcntl, fork, etc.). So before I try to think
of another way, let me ask: do we want to fix the problem that I am
reporting, or just document it away in the manpage as (apparently) has
been done before?
Tony
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds
2009-02-24 20:36 ` Tony Battersby
@ 2009-02-24 20:45 ` Davide Libenzi
2009-02-24 20:52 ` Tony Battersby
0 siblings, 1 reply; 7+ messages in thread
From: Davide Libenzi @ 2009-02-24 20:45 UTC (permalink / raw)
To: Tony Battersby
Cc: Eric Dumazet, Andrew Morton, Jonathan Corbet,
Linux Kernel Mailing List
On Tue, 24 Feb 2009, Tony Battersby wrote:
> Ugh. The manpage on Ubuntu 8.04 that I am using doesn't contain this
> lengthy caveat; it just says "A6 Yes." To me it sounds like the
> (newer) manpage is just documenting away a bug/misfeature with "It's
> supposed to do that. Gives it character." rather than expecting the
> kernel developers to make the API behave in a more reasonable way. Just
> my opinion; others may agree or disagree.
>
> If the officially-sanctioned epoll interface is supposed to report
> events on fds that were added, duped, and then the original fd closed,
> then my patch will probably break that behavior. In that case, it will
> be more difficult to fix problem that my patch is trying to fix (which
> doesn't involve dup, dup2, fcntl, fork, etc.). So before I try to think
> of another way, let me ask: do we want to fix the problem that I am
> reporting, or just document it away in the manpage as (apparently) has
> been done before?
No. Like it has been explained, MT applications can report events for
closed files anyway. It is a matter of where the close() happen in time
(pretty easy to figure out if you make a time/thread chart).
As for the other patches, some could be applied, but I didn't look at them
deeply and I need time to review them. Will take time to review them
tomorrow or the day after.
- Davide
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds
2009-02-24 20:45 ` Davide Libenzi
@ 2009-02-24 20:52 ` Tony Battersby
0 siblings, 0 replies; 7+ messages in thread
From: Tony Battersby @ 2009-02-24 20:52 UTC (permalink / raw)
To: Davide Libenzi
Cc: Eric Dumazet, Andrew Morton, Jonathan Corbet,
Linux Kernel Mailing List
Davide Libenzi wrote:
> No. Like it has been explained, MT applications can report events for
> closed files anyway. It is a matter of where the close() happen in time
> (pretty easy to figure out if you make a time/thread chart).
> As for the other patches, some could be applied, but I didn't look at them
> deeply and I need time to review them. Will take time to review them
> tomorrow or the day after.
>
>
>
> - Davide
>
>
>
>
OK then. Not the answer I wanted, but that's life. Good news for
everyone else though - there is no need to remake the patches already in
-mm. So please ignore the patches marked [2.6.29] and use the patches
marked [-mm] instead (excluding patch #1, which has now been rejected).
Tony
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-02-24 20:52 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-24 17:26 [PATCH 1/6] [2.6.29] epoll: fix for epoll_wait sometimes returning events on closed fds Tony Battersby
2009-02-24 18:12 ` Eric Dumazet
2009-02-24 18:44 ` Tony Battersby
2009-02-24 19:52 ` Eric Dumazet
2009-02-24 20:36 ` Tony Battersby
2009-02-24 20:45 ` Davide Libenzi
2009-02-24 20:52 ` Tony Battersby
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox