* [PATCH] poll(2) timeout values
@ 2005-11-10 16:31 Peter Staubach
2005-11-10 17:15 ` Alan Cox
0 siblings, 1 reply; 8+ messages in thread
From: Peter Staubach @ 2005-11-10 16:31 UTC (permalink / raw)
To: Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 2393 bytes --]
Hi.
On 32 bit platforms, the poll(2) system call support can not correctly
handle the range of timeout values which are possible to call it with.
This is due to some limitations based on the clock speed, but also due
to concerns about integer arithmetic overflow. The former is due to,
the number of jiffies which correspond to the number of milliseconds in
the timeout value, being larger than allowed. Clock speeds of 1000 HZ
or less do not have this problem, so this is not a very common situation
to find. The latter is due to problems in the ordering of the arithmetic
operations used to convert milliseconds to jiffies.
The problem being reported is that despite the man page descriptions, on
a typical 32 bit system with the clock rate set at 1000 HZ, the poll system
call can not handle timeout values larger than about 2,147 seconds. On
such a system, the limit should be about 2,147,483 seconds.
Another problem was also discovered in the definition of the arguments to
the kernel poll implementation. The poll(2) system takes an int as its
third argument. The kernel implementation was using a long.
Once this problem is corrected, then the first problem can also be
corrected because the length of the timeout value from the user level
is limited to 31 bits. 64 bit arithmetic can be used to correctly do
the conversion between milliseconds and clock jiffies without loss of
precision.
These changes were tested by running a program which issues a poll(2)
call with a timeout value of 2,200 seconds. Without these changes, the
poll call in this program hangs forever. With the changes, the poll
call times out after approximately 2,200 seconds, matching the semantics
as described in the man page.
Concerns have been expressed regarding the change in the arguments to
the sys_poll() routine. This is not an exported symbol, so there should
be compatibility issues with existing callers. The routine implements
the system call interface underneath the glibc poll(2) call, so, if there
are architectures for which this would be a problem, then these changes
are not good enough.
Clearly, the timeout calculations problem can be fixed without changing
the arguments to the sys_poll() routine. However, it is cleaner to fix
it this way by ensuring the sizes and types of arguments match.
Thanx...
ps
Signed-off-by: Peter Staubach <staubach@redhat.com>
[-- Attachment #2: poll.devel --]
[-- Type: text/plain, Size: 2093 bytes --]
--- linux-2.6.14/fs/select.c.org
+++ linux-2.6.14/fs/select.c
@@ -25,6 +25,7 @@
#include <linux/rcupdate.h>
#include <asm/uaccess.h>
+#include <asm/div64.h>
#define ROUND_UP(x,y) (((x)+(y)-1)/(y))
#define DEFAULT_POLLMASK (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM)
@@ -464,7 +465,7 @@ static int do_poll(unsigned int nfds, s
return count;
}
-asmlinkage long sys_poll(struct pollfd __user * ufds, unsigned int nfds, long timeout)
+asmlinkage long sys_poll(struct pollfd __user * ufds, unsigned int nfds, int timeout_msecs)
{
struct poll_wqueues table;
int fdcount, err;
@@ -473,6 +474,8 @@ asmlinkage long sys_poll(struct pollfd _
struct poll_list *walk;
struct fdtable *fdt;
int max_fdset;
+ long timeout;
+ int64_t lltimeout;
/* Do a sanity check on nfds ... */
rcu_read_lock();
@@ -482,13 +485,20 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;
- if (timeout) {
- /* Careful about overflow in the intermediate values */
- if ((unsigned long) timeout < MAX_SCHEDULE_TIMEOUT / HZ)
- timeout = (unsigned long)(timeout*HZ+999)/1000+1;
- else /* Negative or overflow */
+ if (timeout_msecs) {
+ if (timeout_msecs < 0)
timeout = MAX_SCHEDULE_TIMEOUT;
- }
+ else {
+ lltimeout = (int64_t)timeout_msecs * HZ + 999;
+ do_div(lltimeout, 1000);
+ lltimeout++;
+ if (lltimeout > MAX_SCHEDULE_TIMEOUT)
+ timeout = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout = (long)lltimeout;
+ }
+ } else
+ timeout = 0;
poll_initwait(&table);
--- linux-2.6.14/include/linux/syscalls.h.org
+++ linux-2.6.14/include/linux/syscalls.h
@@ -420,7 +420,7 @@ asmlinkage long sys_socketpair(int, int,
asmlinkage long sys_socketcall(int call, unsigned long __user *args);
asmlinkage long sys_listen(int, int);
asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
- long timeout);
+ int timeout_msecs);
asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timeval __user *tvp);
asmlinkage long sys_epoll_create(int size);
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] poll(2) timeout values
2005-11-10 16:31 [PATCH] poll(2) timeout values Peter Staubach
@ 2005-11-10 17:15 ` Alan Cox
2005-11-10 21:02 ` Willy Tarreau
[not found] ` <a36005b50511101049vf20cde5m9385c433e18dcd2d@mail.gmail.com>
0 siblings, 2 replies; 8+ messages in thread
From: Alan Cox @ 2005-11-10 17:15 UTC (permalink / raw)
To: Peter Staubach; +Cc: Linux Kernel Mailing List
On Iau, 2005-11-10 at 11:31 -0500, Peter Staubach wrote:
> Clearly, the timeout calculations problem can be fixed without changing
> the arguments to the sys_poll() routine. However, it is cleaner to fix
> it this way by ensuring the sizes and types of arguments match.
There really is no need for the kernel API to match the userspace one,
many of our others differ between the syscall interface which is most
definitely 'exported' in one sense and the POSIX interface which is
defined by libc, posix and the LSB etc
No argument about the timeout fix.
Alan
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] poll(2) timeout values
2005-11-10 17:15 ` Alan Cox
@ 2005-11-10 21:02 ` Willy Tarreau
2005-11-11 20:19 ` Peter Staubach
[not found] ` <a36005b50511101049vf20cde5m9385c433e18dcd2d@mail.gmail.com>
1 sibling, 1 reply; 8+ messages in thread
From: Willy Tarreau @ 2005-11-10 21:02 UTC (permalink / raw)
To: Alan Cox; +Cc: Peter Staubach, Linux Kernel Mailing List
On Thu, Nov 10, 2005 at 05:15:56PM +0000, Alan Cox wrote:
> On Iau, 2005-11-10 at 11:31 -0500, Peter Staubach wrote:
> > Clearly, the timeout calculations problem can be fixed without changing
> > the arguments to the sys_poll() routine. However, it is cleaner to fix
> > it this way by ensuring the sizes and types of arguments match.
>
> There really is no need for the kernel API to match the userspace one,
> many of our others differ between the syscall interface which is most
> definitely 'exported' in one sense and the POSIX interface which is
> defined by libc, posix and the LSB etc
>
> No argument about the timeout fix.
I posted a different fix here about a month ago (but I sent it 3 times,
as it was twice wrong). Andrew was about to merge it in his tree but I
have not checked yet. It was different in the sense that it used
msecs_to_jiffies() to do the arithmetic in the best possible way depending
on the HZ value and the ints size. Most of the time (when 1000 % HZ == 0),
it will simplify the operations to a single divide by a constant and
correctly check for integer overflows. Eg, with HZ=250, a simple 2 bits
right shift will replace a multiply followed by an divide.
I'll check whether 2.6.14-mm1 has it, otherwise I can repost it.
Cheers,
Willy
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] poll(2) timeout values
[not found] ` <a36005b50511101049vf20cde5m9385c433e18dcd2d@mail.gmail.com>
@ 2005-11-10 22:33 ` Alan Cox
[not found] ` <a36005b50511101649l744f78c1i76133434be7304e8@mail.gmail.com>
0 siblings, 1 reply; 8+ messages in thread
From: Alan Cox @ 2005-11-10 22:33 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Peter Staubach, Linux Kernel Mailing List
On Iau, 2005-11-10 at 10:49 -0800, Ulrich Drepper wrote:
> On 11/10/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > There really is no need for the kernel API to match the userspace one,
>
> But if in this case the argument is not changed you would have to add
> an explicit & 0xffffffff before using the parameter.
No. The poll POSIX libc call takes an int. What the kernel ones does
with the top bits is irrelevant to applications. It is however highly
relevant to things like syscall sequences and what the syscall interface
places on the stack.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] poll(2) timeout values
[not found] ` <a36005b50511101649l744f78c1i76133434be7304e8@mail.gmail.com>
@ 2005-11-11 13:16 ` Alan Cox
2005-11-11 13:17 ` Peter Staubach
0 siblings, 1 reply; 8+ messages in thread
From: Alan Cox @ 2005-11-11 13:16 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Peter Staubach, Linux Kernel Mailing List
On Iau, 2005-11-10 at 16:49 -0800, Ulrich Drepper wrote:
> On 11/10/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > No. The poll POSIX libc call takes an int. What the kernel ones does
> > with the top bits is irrelevant to applications.
>
> The issue is that if the high bits are not handled special then
> somebody might cause problems. E.g., overflowing the division or so.
> Therefore the kernel has to sanitize the argument and then why not use
> the easiest way to do this?
Why does the kernel have to sanitize the input. Last time I checked
undefined inputs gave undefined outputs in the standards. fopen(NULL,
NULL) seems to crash glibc for example.
The kernel has to behave correctly given valid sensible inputs. Masking
the top bits is not behaving correctly
"sleep ages"
"no I'll sleep a short time"
Surely it would be far better to do
if((timeout >> 31) >> 1)
return -EINVAL;
for 64bit systems
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] poll(2) timeout values
2005-11-11 13:16 ` Alan Cox
@ 2005-11-11 13:17 ` Peter Staubach
0 siblings, 0 replies; 8+ messages in thread
From: Peter Staubach @ 2005-11-11 13:17 UTC (permalink / raw)
To: Alan Cox; +Cc: Ulrich Drepper, Linux Kernel Mailing List
Alan Cox wrote:
>On Iau, 2005-11-10 at 16:49 -0800, Ulrich Drepper wrote:
>
>
>>On 11/10/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>>
>>
>>>No. The poll POSIX libc call takes an int. What the kernel ones does
>>>with the top bits is irrelevant to applications.
>>>
>>>
>>The issue is that if the high bits are not handled special then
>>somebody might cause problems. E.g., overflowing the division or so.
>>Therefore the kernel has to sanitize the argument and then why not use
>>the easiest way to do this?
>>
>>
>
>Why does the kernel have to sanitize the input. Last time I checked
>undefined inputs gave undefined outputs in the standards. fopen(NULL,
>NULL) seems to crash glibc for example.
>
>The kernel has to behave correctly given valid sensible inputs. Masking
>the top bits is not behaving correctly
>
> "sleep ages"
> "no I'll sleep a short time"
>
>Surely it would be far better to do
>
> if((timeout >> 31) >> 1)
> return -EINVAL;
>
>for 64bit systems
>
Given that the current, sole purpose of sys_poll() is to implement the
poll(2)
system call, and that currently, that system call takes a 32 bit signed
integer
third argument, then it seems safe to me to just ignore the top 32
bits. They
can be either masked off explicitly or implicitly via the use of a 32 bit
signed integer.
Is there a problem with making this change other than it is a change? Is
there a risk of something breaking?
Thanx...
ps
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] poll(2) timeout values
2005-11-10 21:02 ` Willy Tarreau
@ 2005-11-11 20:19 ` Peter Staubach
2005-11-11 22:02 ` Willy Tarreau
0 siblings, 1 reply; 8+ messages in thread
From: Peter Staubach @ 2005-11-11 20:19 UTC (permalink / raw)
To: Willy Tarreau; +Cc: Alan Cox, Linux Kernel Mailing List
Willy Tarreau wrote:
>On Thu, Nov 10, 2005 at 05:15:56PM +0000, Alan Cox wrote:
>
>
>>On Iau, 2005-11-10 at 11:31 -0500, Peter Staubach wrote:
>>
>>
>>>Clearly, the timeout calculations problem can be fixed without changing
>>>the arguments to the sys_poll() routine. However, it is cleaner to fix
>>>it this way by ensuring the sizes and types of arguments match.
>>>
>>>
>>There really is no need for the kernel API to match the userspace one,
>>many of our others differ between the syscall interface which is most
>>definitely 'exported' in one sense and the POSIX interface which is
>>defined by libc, posix and the LSB etc
>>
>>No argument about the timeout fix.
>>
>>
>
>I posted a different fix here about a month ago (but I sent it 3 times,
>as it was twice wrong). Andrew was about to merge it in his tree but I
>have not checked yet. It was different in the sense that it used
>msecs_to_jiffies() to do the arithmetic in the best possible way depending
>on the HZ value and the ints size. Most of the time (when 1000 % HZ == 0),
>it will simplify the operations to a single divide by a constant and
>correctly check for integer overflows. Eg, with HZ=250, a simple 2 bits
>right shift will replace a multiply followed by an divide.
>
>I'll check whether 2.6.14-mm1 has it, otherwise I can repost it.
>
Yes, I remember the conversation. I hadn't seen the final patch included
anywhere, so I posted this one.
That said, I think that msecs_to_jiffies() can still suffer from overflows
if (HZ % MSECS_PER_SEC) != 0 && (MSECS_PER_SEC % HZ) != 0. I'd like to
see the rest of your patch to see how this was worked around.
The patch that I posted does not suffer from overflow issues.
Thanx...
ps
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] poll(2) timeout values
2005-11-11 20:19 ` Peter Staubach
@ 2005-11-11 22:02 ` Willy Tarreau
0 siblings, 0 replies; 8+ messages in thread
From: Willy Tarreau @ 2005-11-11 22:02 UTC (permalink / raw)
To: Peter Staubach; +Cc: Alan Cox, Linux Kernel Mailing List
On Fri, Nov 11, 2005 at 03:19:41PM -0500, Peter Staubach wrote:
> >I posted a different fix here about a month ago (but I sent it 3 times,
> >as it was twice wrong). Andrew was about to merge it in his tree but I
> >have not checked yet. It was different in the sense that it used
> >msecs_to_jiffies() to do the arithmetic in the best possible way
> >depending
> >on the HZ value and the ints size. Most of the time (when 1000 % HZ ==
> >0),
> >it will simplify the operations to a single divide by a constant and
> >correctly check for integer overflows. Eg, with HZ=250, a simple 2 bits
> >right shift will replace a multiply followed by an divide.
> >
> >I'll check whether 2.6.14-mm1 has it, otherwise I can repost it.
> >
>
> Yes, I remember the conversation. I hadn't seen the final patch included
> anywhere, so I posted this one.
>
> That said, I think that msecs_to_jiffies() can still suffer from overflows
> if (HZ % MSECS_PER_SEC) != 0 && (MSECS_PER_SEC % HZ) != 0. I'd like to
> see the rest of your patch to see how this was worked around.
The original msecs_to_jiffies() had lots of overflow cases. That's why I
started by fixing it. Here it comes, along with the timeout change for
poll() and epoll(). However, I remember there was one thing to fix in the
epoll() and poll() patch below : it does not replace MAX_JIFFY_OFFSET with
MAX_SCHEDULE_TIMEOUT while it should because this is what indicates the
overflow. I believe I have another version somewhere, but I'll have to
search on my other machine.
> The patch that I posted does not suffer from overflow issues.
Agreed, but it will require some heavy operations that can be optimized away
in most cases (eg: HZ=1000 or HZ=250).
Cheers,
Willy
--- linux-2.6.14-rc2-mm1/include/linux/jiffies.h Thu Sep 29 23:04:49 2005
+++ linux-2.6.14-rc2-mm1-jiffies2/include/linux/jiffies.h Sat Oct 1 19:12:13 2005
@@ -246,6 +246,37 @@
#endif
+
+/*
+ * We define MAX_MSEC_OFFSET and MAX_USEC_OFFSET as maximal values that can be
+ * accepted by msecs_to_jiffies() and usec_to_jiffies() respectively, without
+ * risking a multiply overflow. Those functions return MAX_JIFFY_OFFSET for
+ * arguments above those values.
+ */
+
+#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
+# define MAX_MSEC_OFFSET \
+ (ULONG_MAX - (MSEC_PER_SEC / HZ) + 1)
+#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
+# define MAX_MSEC_OFFSET \
+ (ULONG_MAX / (HZ / MSEC_PER_SEC))
+#else
+# define MAX_MSEC_OFFSET \
+ ((ULONG_MAX - (MSEC_PER_SEC - 1)) / HZ)
+#endif
+
+#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
+# define MAX_USEC_OFFSET \
+ (ULONG_MAX - (USEC_PER_SEC / HZ) + 1)
+#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
+# define MAX_USEC_OFFSET \
+ (ULONG_MAX / (HZ / USEC_PER_SEC))
+#else
+# define MAX_USEC_OFFSET \
+ ((ULONG_MAX - (USEC_PER_SEC - 1)) / HZ)
+#endif
+
+
/*
* Convert jiffies to milliseconds and back.
*
@@ -276,27 +307,29 @@
static inline unsigned long msecs_to_jiffies(const unsigned int m)
{
- if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+ if (MAX_MSEC_OFFSET < UINT_MAX && m > (unsigned int)MAX_MSEC_OFFSET)
return MAX_JIFFY_OFFSET;
#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
- return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
+ return ((unsigned long)m + (MSEC_PER_SEC / HZ) - 1) /
+ (MSEC_PER_SEC / HZ);
#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
- return m * (HZ / MSEC_PER_SEC);
+ return (unsigned long)m * (HZ / MSEC_PER_SEC);
#else
- return (m * HZ + MSEC_PER_SEC - 1) / MSEC_PER_SEC;
+ return ((unsigned long)m * HZ + MSEC_PER_SEC - 1) / MSEC_PER_SEC;
#endif
}
static inline unsigned long usecs_to_jiffies(const unsigned int u)
{
- if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
+ if (MAX_USEC_OFFSET < UINT_MAX && u > (unsigned int)MAX_USEC_OFFSET)
return MAX_JIFFY_OFFSET;
#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
- return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
+ return ((unsigned long)u + (USEC_PER_SEC / HZ) - 1) /
+ (USEC_PER_SEC / HZ);
#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
- return u * (HZ / USEC_PER_SEC);
+ return (unsigned long)u * (HZ / USEC_PER_SEC);
#else
- return (u * HZ + USEC_PER_SEC - 1) / USEC_PER_SEC;
+ return ((unsigned long)u * HZ + USEC_PER_SEC - 1) / USEC_PER_SEC;
#endif
}
[ poll() patch - not to be merged ]
diff -purN linux-2.6.14-rc2-mm1/fs/select.c linux-2.6.14-rc2-mm1-poll/fs/select.c
--- linux-2.6.14-rc2-mm1/fs/select.c Sat Sep 24 21:12:36 2005
+++ linux-2.6.14-rc2-mm1-poll/fs/select.c Sat Sep 24 21:23:21 2005
@@ -469,7 +469,6 @@ asmlinkage long sys_poll(struct pollfd _
{
struct poll_wqueues table;
int fdcount, err;
- int overflow;
unsigned int i;
struct poll_list *head;
struct poll_list *walk;
@@ -486,22 +485,11 @@ asmlinkage long sys_poll(struct pollfd _
return -EINVAL;
/*
- * We compare HZ with 1000 to work out which side of the
- * expression needs conversion. Because we want to avoid
- * converting any value to a numerically higher value, which
- * could overflow.
- */
-#if HZ > 1000
- overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
-#else
- overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
-#endif
-
- /*
* If we would overflow in the conversion or a negative timeout
- * is requested, sleep indefinitely.
+ * is requested, sleep indefinitely. Note: msecs_to_jiffies checks
+ * for the overflow.
*/
- if (overflow || timeout_msecs < 0)
+ if (timeout_msecs < 0)
timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
else
timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
[ epoll() patch - not to be merged ]
diff -purN linux-2.6.14-rc2-mm1/fs/eventpoll.c linux-2.6.14-rc2-mm1-epoll/fs/eventpoll.c
--- linux-2.6.14-rc2-mm1/fs/eventpoll.c Sat Sep 24 21:11:49 2005
+++ linux-2.6.14-rc2-mm1-epoll/fs/eventpoll.c Sat Sep 24 21:20:10 2005
@@ -1502,12 +1502,11 @@ static int ep_poll(struct eventpoll *ep,
wait_queue_t wait;
/*
- * Calculate the timeout by checking for the "infinite" value ( -1 )
- * and the overflow condition. The passed timeout is in milliseconds,
- * that why (t * HZ) / 1000.
+ * Calculate the timeout by checking for the "infinite" value ( < 0 )
+ * and the overflow condition. The passed timeout is in milliseconds.
*/
- jtimeout = timeout == -1 || timeout > (MAX_SCHEDULE_TIMEOUT - 1000) / HZ ?
- MAX_SCHEDULE_TIMEOUT: (timeout * HZ + 999) / 1000;
+ jtimeout = (timeout < 0) ?
+ MAX_SCHEDULE_TIMEOUT : msecs_to_jiffies(timeout);
retry:
write_lock_irqsave(&ep->lock, flags);
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2005-11-11 22:18 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-10 16:31 [PATCH] poll(2) timeout values Peter Staubach
2005-11-10 17:15 ` Alan Cox
2005-11-10 21:02 ` Willy Tarreau
2005-11-11 20:19 ` Peter Staubach
2005-11-11 22:02 ` Willy Tarreau
[not found] ` <a36005b50511101049vf20cde5m9385c433e18dcd2d@mail.gmail.com>
2005-11-10 22:33 ` Alan Cox
[not found] ` <a36005b50511101649l744f78c1i76133434be7304e8@mail.gmail.com>
2005-11-11 13:16 ` Alan Cox
2005-11-11 13:17 ` Peter Staubach
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox