All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] epoll fixes and cleanups
@ 2009-02-24 17:25 Tony Battersby
  0 siblings, 0 replies; only message in thread
From: Tony Battersby @ 2009-02-24 17:25 UTC (permalink / raw)
  To: Andrew Morton, Davide Libenzi, Jonathan Corbet, linux-kernel

[forgot to CC linux-kernel]

Patch #1 fixes an important bug where epoll_wait can incorrectly
return events for closed fds.  Below is a test program that can
reliably test for the problem.  This bug has the potential to confuse
or crash userspace programs, so I consider it to be high-priority.
For example, a program could crash if it closes a fd, frees the
associated event.data.ptr, calls epoll_wait, and then gets back an
event with the freed event.data.ptr.

The test program uses a thread blocked in read() to keep the struct
file refcount from dropping to 0.  Then a different thread does close()
and then epoll_wait(), and epoll_wait() incorrectly returns an event
on the closed fd.  There are many more ways to trigger the bug with
a multi-threaded userspace program.  A single-threaded userspace
program may also trigger the bug if the kernel ever calls fput() from
outside a system call (e.g. from keventd or some other kernel thread).
It is difficult to know all the potential ways that the bug could be
triggered without doing an audit of a lot of kernel code.

Patches #2 and #3 fix some minor issues; the rest are just cleanups.

I would like to have patch #1 included in 2.6.29 and -stable.
Assuming that everyone agrees that the patch is indeed important,
it will unfortunately conflict with some of the patches currently
in -mm and linux-next that are queued for 2.6.30.  Some of my other
5 patches also conflict with the epoll patches currently in -mm, but
they are lower priority and do not need to go into 2.6.29 or -stable.
So I will post two versions of my patchset - one against 2.6.29 and
one against the current epoll patches in -mm.  I will need some help
from the authors of the other patches in -mm to resolve the conflicts
against my patch #1; my patches #2 - #6 hopefully won't conflict if
you use the versions for -mm and apply after the other patches in -mm.
I have never handled this situation before, so my apologies if I am
stepping on anyone's toes.

Other things I noticed while auditing epoll code:

The following functions could use spin_lock_irq() instead
of spin_lock_irqsave(): ep_remove, ep_insert, ep_modify,
ep_scan_ready_list, and ep_poll.  One of my patches changes
ep_modify to use spin_lock_irq while making some other changes, but
I haven't done anything with the others.  Some people prefer to use
spin_lock_irqsave() always because it is harder to use incorrectly,
so it is just a matter of preference.

epoll uses wake_up_locked() instead of wake_up() for ep->wq although
it never uses eq->wq.lock; presumably ep->lock outside of the
wait_queue_head_t protects the wait queue internals, but I don't see
this documented anywhere.

---

/*
This program tests for a bug in the kernel's implementation of epoll where
epoll_wait can incorrectly return an event on a fd that has been closed.
You can work around the kernel bug by calling epoll_ctl(EPOLL_CTL_DEL) before
closing the fd.

Compile:
   gcc -D_REENTRANT -o epoll_wait_bug epoll_wait_bug.c -lpthread
*/

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>

#define EPOLL_DATA_MAGIC 0x1234567890123456ULL

static int fd[2];

static void *thread_func(void *arg)
{
   for (;;)
      {
      char ch;

      if (read(fd[0], &ch, sizeof(ch)) <= 0)
         {
         break;
         }
      }

   return NULL;
}

int main(int argc, char *argv[])
{
   pthread_attr_t detached_thread_attr;
   pthread_t thread;
   struct epoll_event evt;
   int epfd;
   int ret;
   int exit_status = EXIT_FAILURE;

   epfd = epoll_create(1);
   if (epfd == -1)
      {
      perror("epoll_create");
      exit(EXIT_FAILURE);
      }

   if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd))
      {
      perror("socketpair");
      exit(EXIT_FAILURE);
      }

   evt.events   = EPOLLOUT;
   evt.data.u64 = EPOLL_DATA_MAGIC;
   if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd[0], &evt))
      {
      perror("epoll_ctl EPOLL_CTL_ADD");
      exit(EXIT_FAILURE);
      }

   pthread_attr_init(&detached_thread_attr);
   pthread_attr_setdetachstate(&detached_thread_attr,
                               PTHREAD_CREATE_DETACHED);
   if (pthread_create(&thread,
                      &detached_thread_attr,
                      &thread_func,
                      NULL))
      {
      perror("pthread_create");
      exit(EXIT_FAILURE);
      }

   sleep(1);

#if 0
   /* This works around the kernel bug. */
   if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd[0], &evt))
      {
      perror("epoll_ctl EPOLL_CTL_DEL");
      exit(EXIT_FAILURE);
      }
#endif

   close(fd[0]);

   memset(&evt, 0, sizeof(evt));
   ret = epoll_wait(epfd, &evt, 1, 100);
   switch (ret)
      {
      case -1 :
         perror("epoll_wait");
         break;

      case 0 : /* timeout */
         printf("This kernel does not have the epoll_wait bug.\n");
         exit_status = EXIT_SUCCESS;
         break;

      case 1 :
         if (evt.data.u64 == EPOLL_DATA_MAGIC)
            {
            /* This is what happens for kernels with the bug. */
            printf("KERNEL BUG DETECTED: epoll_wait returned an event on a "
                   "closed fd\n");
            }
         else
            {
            printf("KERNEL BUG DETECTED: epoll_wait returned a bad event\n");
            }
         break;

      default :
         printf("KERNEL BUG DETECTED: epoll_wait returned unexpected value "
                "%d\n", ret);
      }

   return exit_status;
}




^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2009-02-24 17:26 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-24 17:25 [PATCH 0/6] epoll fixes and cleanups Tony Battersby

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.