public inbox for linux-kernel@vger.kernel.org
* PROBLEM: Bug in __pollwait() can cause select() and poll() to hang in 2.4.22-pre2 -- second try
@ 2003-06-27 18:19 Ray Bryant
  2003-06-30  4:34 ` Rusty Russell
From: Ray Bryant @ 2003-06-27 18:19 UTC
  To: linux-kernel; +Cc: Andrew Morton, Manfred Spraul, Andi Kleen, trivial, alan

[1.] One line summary of the problem:

      In low memory situations, a process that issues a call to select()
or poll() can sleep forever in the kernel.

[2.] Full description of the problem/report:

      select() and poll() call a common routine: __pollwait().  On the
first call to __pollwait(), it calls __get_free_page(GFP_KERNEL) to
allocate a table to hold wait queues.  In the natural course of things,
this calls into __alloc_pages().  In low memory situations, the process
can then end up in the rebalance code at the bottom of __alloc_pages()
where there is a call to yield().  If the process makes this call, this
is a bad thing [tm], since the process state at that point is
TASK_INTERRUPTIBLE.  There is no wait queue yet for the process (that is
done later in __pollwait()) and no schedule timeout event has yet been
created (that is done later in select()) so the process will never
return from the call to yield().
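
The sequence described above can be illustrated with a small user-space toy model. This is only a sketch, not kernel code: every `toy_` name is hypothetical, and the model assumes, as the report describes for these 2.4 kernels, that yield() ends up in the scheduler without resetting the task state, and that the scheduler dequeues any task whose state is not TASK_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task states; the names mirror the kernel's, the values are arbitrary. */
enum toy_state { TOY_TASK_RUNNING, TOY_TASK_INTERRUPTIBLE };

struct toy_task {
    enum toy_state state;
    bool on_runqueue;   /* can the scheduler pick this task again? */
    bool on_waitqueue;  /* would a wake-up on some fd reach it? */
    bool has_timer;     /* would a timeout wake it? */
};

/* Toy schedule(): a task that is not TASK_RUNNING leaves the run queue. */
static void toy_schedule(struct toy_task *t)
{
    assert(t != NULL);
    if (t->state != TOY_TASK_RUNNING)
        t->on_runqueue = false;
}

/* Toy yield(): gives up the CPU and, in this model, leaves the state alone. */
static void toy_yield(struct toy_task *t)
{
    toy_schedule(t);
}

/* True if nothing is left that could make the task runnable again. */
static bool toy_sleeps_forever(const struct toy_task *t)
{
    assert(t != NULL);
    return !t->on_runqueue && !t->on_waitqueue && !t->has_timer;
}

/*
 * The path from the report: the task is already TASK_INTERRUPTIBLE when the
 * first __pollwait() allocation falls into the rebalance code and yields.
 * No wait queue entry and no timeout exist yet at that point.
 */
static struct toy_task toy_first_pollwait_under_pressure(void)
{
    struct toy_task t = { TOY_TASK_RUNNING, true, false, false };

    t.state = TOY_TASK_INTERRUPTIBLE;  /* set before polling starts */
    toy_yield(&t);                     /* reached via __alloc_pages() */
    return t;
}
```

In this model the task returned by `toy_first_pollwait_under_pressure()` is off the run queue with no wait queue entry and no timer, which is the "sleeps forever" condition the report describes.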

[3.] Keywords (i.e., modules, networking, kernel):

      Kernel

[4.] Kernel version (from /proc/version):

      This bug appears to be present in every 2.4 kernel from (at least)
2.4.13 thru 2.4.22-pre2.  It is not present in 2.5.70, since a different
method of waiting for memory to free up is used there (in
__alloc_pages()).

[5.] Output of Oops.. message (if applicable) with symbolic information
      resolved (see Documentation/oops-tracing.txt)

      N/A.

[6.] A small shell script or example program which triggers the
      problem (if possible)

      We encountered this whilst running batch queue tests that are too
complex to include here.

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here)

[7.2.] Processor information (from /proc/cpuinfo):

       We encountered this on ia64.  However, the code involved is
machine independent, and we believe the bug is present on all 2.4.21
platforms.

[7.3.] Module information (from /proc/modules):

[7.4.] Loaded driver and hardware information (/proc/ioports,
/proc/iomem)

[7.5.] PCI information ('lspci -vvv' as root)

[7.6.] SCSI information (from /proc/scsi/scsi)

[7.7.] Other information that might be relevant to the problem
        (please look in /proc and include all information that you
        think to be relevant):

[X.] Other notes, patches, fixes, workarounds:

      The simplest fix (as suggested by Manfred Spraul) is to set
current->state to TASK_RUNNING just before the call to yield() in
__alloc_pages().  I have tested this enough to believe that it does
not change the user-level semantics of select().  My concern was that
once state is set to TASK_RUNNING, the syscall could return before any
fds are ready or the select() timeout has expired, but this does not
appear to be the case.

Here is a trivial patch against 2.4.22-pre2:

--- linux-2.4.22-pre2.orig/mm/page_alloc.c      Thu Nov 28 17:53:15 2002
+++ linux-2.4.22-pre2/mm/page_alloc.c   Fri Jun 27 13:47:49 2003
@@ -418,6 +418,7 @@
                 return NULL;

         /* Yield for kswapd, and try again */
+        set_current_state(TASK_RUNNING);
         yield();
         goto rebalance;
  }
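
As a rough illustration of what the one-line patch changes, here is a user-space toy model of the yield-and-retry loop. It is a sketch under stated assumptions, not the kernel's allocator: all `sk_` names and the `yields_needed` parameter are hypothetical, and the model assumes that yielding while not in TASK_RUNNING removes the task from the run queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task states; the names mirror the kernel's, the values are arbitrary. */
enum sk_state { SK_TASK_RUNNING, SK_TASK_INTERRUPTIBLE };

struct sk_task {
    enum sk_state state;
    bool on_runqueue;   /* can the scheduler pick this task again? */
};

/* Toy yield(): a task that is not TASK_RUNNING is dequeued by the scheduler. */
static void sk_yield(struct sk_task *t)
{
    assert(t != NULL);
    if (t->state != SK_TASK_RUNNING)
        t->on_runqueue = false;
}

/*
 * Toy version of the "yield for kswapd, and try again" loop.
 * yields_needed is a made-up stand-in for "memory becomes available after
 * this many yields".  Returns the number of yields performed, or -1 if the
 * task fell off the run queue and would never come back to retry.
 */
static int sk_alloc_with_retry(struct sk_task *t, int yields_needed,
                               bool apply_fix)
{
    int yields = 0;

    assert(t != NULL);
    while (yields < yields_needed) {
        if (apply_fix)
            t->state = SK_TASK_RUNNING;  /* the line added by the patch */
        sk_yield(t);
        if (!t->on_runqueue)
            return -1;                   /* models the hang */
        yields++;
    }
    return yields;
}
```

With `apply_fix` false and a task that starts in the interruptible state, the model reports the hang on the first yield. With `apply_fix` true the loop completes, and the task comes back in the running state, which is the side effect whose impact on select() semantics is discussed above.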

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
Jun 23-Jul 18 I will be at: 970-513-4743
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------



Thread overview: 8+ messages
2003-06-27 18:19 PROBLEM: Bug in __pollwait() can cause select() and poll() to hang in 2.4.22-pre2 -- second try Ray Bryant
2003-06-30  4:34 ` Rusty Russell
2003-06-30 16:24   ` Manfred Spraul
2003-07-01  1:17     ` Rusty Russell
2003-07-01  4:17       ` Linus Torvalds
2003-07-01  5:08         ` Rusty Russell
2003-07-02 18:06           ` Ray Bryant
2003-07-03  0:56             ` Rusty Russell
