From: Ray Bryant <raybry@sgi.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@digeo.com>,
	Manfred Spraul <manfred@colorfullife.com>,
	Andi Kleen <ak@suse.de>,
	trivial@rustcorp.com.au, alan@lxorguk.ukuu.org.uk
Subject: PROBLEM: Bug in __pollwait() can cause select() and poll() to hang in 2.4.22-pre2 -- second try
Date: Fri, 27 Jun 2003 13:19:20 -0500
Message-ID: <3EFC8AA8.7000501@sgi.com>

[1.] One line summary of the problem:

      In low memory situations, a process that issues a call to select()
or poll() can sleep forever in the kernel.

[2.] Full description of the problem/report:

      select() and poll() call a common routine: __pollwait().  On the
first call to __pollwait(), it calls __get_free_page(GFP_KERNEL) to
allocate a table to hold wait queues.  In the natural course of things,
this calls into __alloc_pages().  In low memory situations, the process
can then end up in the rebalance code at the bottom of __alloc_pages()
where there is a call to yield().  If the process makes this call, this
is a bad thing [tm], since the process state at that point is
TASK_INTERRUPTIBLE.  There is no wait queue yet for the process (that is
done later in __pollwait()) and no schedule timeout event has yet been
created (that is done later in select()) so the process will never
return from the call to yield().
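
The sequence described above can be sketched as a toy user-space model.
This is not kernel code: the struct, its fields, and the function name
returns_from_yield() are hypothetical, and the scheduler rule in the
comment is a simplifying assumption made only to illustrate why the task
has nothing left to wake it.

```c
/* Toy user-space model of the hang; not kernel code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task {
    enum task_state state;
    int on_wait_queue;  /* set once __pollwait() has queued the task */
    int timer_armed;    /* set once select() has armed its timeout */
};

/*
 * ASSUMPTION (simplified scheduler rule): a task that gives up the CPU
 * while not TASK_RUNNING runs again only if something can wake it.
 * Signals, which can also wake a TASK_INTERRUPTIBLE task, are left out.
 */
static int returns_from_yield(const struct task *t)
{
    if (t->state == TASK_RUNNING)
        return 1;
    return t->on_wait_queue || t->timer_armed;
}
```

In this model, a task in TASK_INTERRUPTIBLE with on_wait_queue and
timer_armed both zero (the situation on the first __pollwait() call)
makes returns_from_yield() return 0; a TASK_RUNNING task gets 1.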

[3.] Keywords (i.e., modules, networking, kernel):

      Kernel

[4.] Kernel version (from /proc/version):

      This bug appears to be present in every 2.4 kernel from (at least)
2.4.13 thru 2.4.22-pre2.  It is not present in 2.5.70, since a different
method of waiting for memory to free up is used there (in
__alloc_pages()).

[5.] Output of Oops.. message (if applicable) with symbolic information
      resolved (see Documentation/oops-tracing.txt)

      N/A.

[6.] A small shell script or example program which triggers the
      problem (if possible)

      We encountered this whilst running batch queue tests that are too
complex to include here.

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here)

[7.2.] Processor information (from /proc/cpuinfo):

       We encountered this on ia64.  However, this is machine-independent
code, and we believe the bug is present on all 2.4.21 platforms.

[7.3.] Module information (from /proc/modules):

[7.4.] Loaded driver and hardware information (/proc/ioports,
/proc/iomem)

[7.5.] PCI information ('lspci -vvv' as root)

[7.6.] SCSI information (from /proc/scsi/scsi)

[7.7.] Other information that might be relevant to the problem
        (please look in /proc and include all information that you
        think to be relevant):

[X.] Other notes, patches, fixes, workarounds:

      The simplest fix (as suggested by Manfred Spraul) is to set
current->state to TASK_RUNNING just before the call to yield() in
__alloc_pages().  I have tested this enough to believe that it does
not change the user-level semantics of select().  My concern was that,
with the state set to TASK_RUNNING, the syscall could return before
any fds are ready or before the select() timeout has expired, but this
does not appear to be the case.
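
One possible explanation for why the semantics are preserved can be
sketched with a toy user-space model.  It rests on an assumption that
the report does not state: that the select() wait loop sets
TASK_INTERRUPTIBLE again at the top of every pass and exits only when an
fd is ready.  The timeout and signals are left out, and model_select(),
struct wait_result, and both parameters are hypothetical names.

```c
/* Toy user-space model of a select()-style wait loop; not kernel code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct wait_result {
    int returned_on_pass;  /* pass on which the loop exited */
    int sleeps;            /* passes that really slept */
    int spurious_passes;   /* passes that fell through without sleeping */
};

/*
 * ASSUMPTION: each pass re-arms TASK_INTERRUPTIBLE before polling, and
 * the loop exits only when an fd is ready.  ready_on_pass is the first
 * pass on which an fd is ready; low_memory_on_first_pass selects the
 * patched yield() path during the first pass.
 */
static struct wait_result model_select(int ready_on_pass,
                                       int low_memory_on_first_pass)
{
    struct wait_result r = { 0, 0, 0 };
    enum task_state state;
    int pass;

    for (pass = 1; ; pass++) {
        state = TASK_INTERRUPTIBLE;       /* re-armed on every pass */

        if (pass == 1 && low_memory_on_first_pass)
            state = TASK_RUNNING;         /* patched path: set before yield() */

        if (pass >= ready_on_pass) {      /* an fd is ready: leave the loop */
            r.returned_on_pass = pass;
            return r;
        }

        /* Models the sleep step: a TASK_RUNNING task comes straight
         * back, a TASK_INTERRUPTIBLE one sleeps until woken. */
        if (state == TASK_RUNNING)
            r.spurious_passes++;
        else
            r.sleeps++;
    }
}
```

Under these assumptions, model_select(3, 1) still exits on pass 3; the
only difference from model_select(3, 0) is that the first pass falls
through without sleeping, so the cost of the fix is one extra trip
around the loop rather than an early return.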

Here is a trivial patch against 2.4.22-pre2:

--- linux-2.4.22-pre2.orig/mm/page_alloc.c      Thu Nov 28 17:53:15 2002
+++ linux-2.4.22-pre2/mm/page_alloc.c   Fri Jun 27 13:47:49 2003
@@ -418,6 +418,7 @@
                 return NULL;

         /* Yield for kswapd, and try again */
+        set_current_state(TASK_RUNNING);
         yield();
         goto rebalance;
  }

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
Jun 23-Jul 18 I will be at: 970-513-4743
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------


Thread overview: 8+ messages
2003-06-27 18:19 Ray Bryant [this message]
2003-06-30  4:34 ` PROBLEM: Bug in __pollwait() can cause select() and poll() to hang in 2.4.22-pre2 -- second try Rusty Russell
2003-06-30 16:24   ` Manfred Spraul
2003-07-01  1:17     ` Rusty Russell
2003-07-01  4:17       ` Linus Torvalds
2003-07-01  5:08         ` Rusty Russell
2003-07-02 18:06           ` Ray Bryant
2003-07-03  0:56             ` Rusty Russell
