public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tony Battersby <tonyb@cybernetics.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Davide Libenzi <davidel@xmailserver.org>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-kernel@vger.kernel.org
Subject: [PATCH 0/6] epoll fixes and cleanups
Date: Tue, 24 Feb 2009 12:25:54 -0500	[thread overview]
Message-ID: <49A42DA2.5080101@cybernetics.com> (raw)

[forgot to CC linux-kernel]

Patch #1 fixes an important bug where epoll_wait can incorrectly
return events for closed fds.  Below is a test program that can
reliably test for the problem.  This bug has the potential to confuse
or crash userspace programs, so I consider it to be high-priority.
For example, a program could crash if it closes a fd, frees the
associated event.data.ptr, calls epoll_wait, and then gets back an
event with the freed event.data.ptr.

The test program uses a thread blocked in read() to keep the struct
file refcount from dropping to 0.  Then a different thread does close()
and then epoll_wait(), and epoll_wait() incorrectly returns an event
on the closed fd.  There are many more ways to trigger the bug with
a multi-threaded userspace program.  A single-threaded userspace
program may also trigger the bug if the kernel ever calls fput() from
outside a system call (e.g. from keventd or some other kernel thread).
It is difficult to know all the potential ways that the bug could be
triggered without doing an audit of a lot of kernel code.

Patches #2 and #3 fix some minor issues; the rest are just cleanups.

I would like to have patch #1 included in 2.6.29 and -stable.
Assuming that everyone agrees that the patch is indeed important,
it will unfortunately conflict with some of the patches currently
in -mm and linux-next that are queued for 2.6.30.  Some of my other
5 patches also conflict with the epoll patches currently in -mm, but
they are lower priority and do not need to go into 2.6.29 or -stable.
So I will post two versions of my patchset - one against 2.6.29 and
one against the current epoll patches in -mm.  I will need some help
from the authors of the other patches in -mm to resolve the conflicts
against my patch #1; my patches #2 - #6 hopefully won't conflict if
you use the versions for -mm and apply after the other patches in -mm.
I have never handled this situation before, so my apologies if I am
stepping on anyone's toes.

Other things I noticed while auditing epoll code:

The following functions could use spin_lock_irq() instead
of spin_lock_irqsave(): ep_remove, ep_insert, ep_modify,
ep_scan_ready_list, and ep_poll.  One of my patches changes
ep_modify to use spin_lock_irq while making some other changes, but
I haven't done anything with the others.  Some people prefer to use
spin_lock_irqsave() always because it is harder to use incorrectly,
so it is just a matter of preference.

epoll uses wake_up_locked() instead of wake_up() for ep->wq although
it never uses eq->wq.lock; presumably ep->lock outside of the
wait_queue_head_t protects the wait queue internals, but I don't see
this documented anywhere.

---

/*
This program tests for a bug in the kernel's implementation of epoll where
epoll_wait can incorrectly return an event on a fd that has been closed.
You can work around the kernel bug by calling epoll_ctl(EPOLL_CTL_DEL) before
closing the fd.

Compile:
   gcc -D_REENTRANT -o epoll_wait_bug epoll_wait_bug.c -lpthread
*/

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>

#define EPOLL_DATA_MAGIC 0x1234567890123456ULL

static int fd[2];

static void *thread_func(void *arg)
{
   for (;;)
      {
      char ch;

      if (read(fd[0], &ch, sizeof(ch)) <= 0)
         {
         break;
         }
      }

   return NULL;
}

int main(int argc, char *argv[])
{
   pthread_attr_t detached_thread_attr;
   pthread_t thread;
   struct epoll_event evt;
   int epfd;
   int ret;
   int exit_status = EXIT_FAILURE;

   epfd = epoll_create(1);
   if (epfd == -1)
      {
      perror("epoll_create");
      exit(EXIT_FAILURE);
      }

   if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd))
      {
      perror("socketpair");
      exit(EXIT_FAILURE);
      }

   evt.events   = EPOLLOUT;
   evt.data.u64 = EPOLL_DATA_MAGIC;
   if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd[0], &evt))
      {
      perror("epoll_ctl EPOLL_CTL_ADD");
      exit(EXIT_FAILURE);
      }

   pthread_attr_init(&detached_thread_attr);
   pthread_attr_setdetachstate(&detached_thread_attr,
                               PTHREAD_CREATE_DETACHED);
   if (pthread_create(&thread,
                      &detached_thread_attr,
                      &thread_func,
                      NULL))
      {
      perror("pthread_create");
      exit(EXIT_FAILURE);
      }

   sleep(1);

#if 0
   /* This works around the kernel bug. */
   if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd[0], &evt))
      {
      perror("epoll_ctl EPOLL_CTL_DEL");
      exit(EXIT_FAILURE);
      }
#endif

   close(fd[0]);

   memset(&evt, 0, sizeof(evt));
   ret = epoll_wait(epfd, &evt, 1, 100);
   switch (ret)
      {
      case -1 :
         perror("epoll_wait");
         break;

      case 0 : /* timeout */
         printf("This kernel does not have the epoll_wait bug.\n");
         exit_status = EXIT_SUCCESS;
         break;

      case 1 :
         if (evt.data.u64 == EPOLL_DATA_MAGIC)
            {
            /* This is what happens for kernels with the bug. */
            printf("KERNEL BUG DETECTED: epoll_wait returned an event on a "
                   "closed fd\n");
            }
         else
            {
            printf("KERNEL BUG DETECTED: epoll_wait returned a bad event\n");
            }
         break;

      default :
         printf("KERNEL BUG DETECTED: epoll_wait returned unexpected value "
                "%d\n", ret);
      }

   return exit_status;
}




                 reply	other threads:[~2009-02-24 17:26 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49A42DA2.5080101@cybernetics.com \
    --to=tonyb@cybernetics.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=davidel@xmailserver.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox