public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Willy Tarreau <willy@w.ods.org>
Cc: Nish Aravamudan <nish.aravamudan@gmail.com>,
	Davide Libenzi <davidel@xmailserver.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>
Subject: Re: [patch] sys_epoll_wait() timeout saga ...
Date: Sat, 24 Sep 2005 10:19:28 -0700	[thread overview]
Message-ID: <20050924171928.GF3950@us.ibm.com> (raw)
In-Reply-To: <20050924061500.GA24628@alpha.home.local>

On 24.09.2005 [08:15:00 +0200], Willy Tarreau wrote:
> On Fri, Sep 23, 2005 at 09:44:10PM -0700, Nish Aravamudan wrote:
> > > >        * that why (t * HZ) / 1000.
> > > >        */
> > > > -     jtimeout = timeout == -1 || timeout > (MAX_SCHEDULE_TIMEOUT - 1000) / HZ ?
> > > > +     jtimeout = timeout < 0 || (timeout / 1000) >= (MAX_SCHEDULE_TIMEOUT / HZ) ?
> > > >               MAX_SCHEDULE_TIMEOUT: (timeout * HZ + 999) / 1000;
> > >
> > > Here, I'm not certain that gcc will optimize the divide. It would be better
> > > anyway to write this which is equivalent, and a pure integer comparison :
> > >
> > > +       jtimeout = timeout < 0 || timeout >= 1000 * MAX_SCHEDULE_TIMEOUT / HZ ?
> > > >               MAX_SCHEDULE_TIMEOUT: (timeout * HZ + 999) / 1000;
> > 
> > Just a question here, maybe it's dumb.
> 
> Your question is not dumb, this code is not trivial at all !
> 
> > * and / have the same priority in the order of operations, yes? If so,
> > won't the the 1000 * MAX_SCHEDULE_TIMEOUT overflow
> > (MAX_SCHEDULE_TIMEOUT is LONG_MAX)?
> 
> Yes it can, and that's why I said that gcc should send a warning when
> comparing an int with something too large for an int. But I should have
> forced the constant to be evaluated as long long. At the moment, the
> constant cannot overflow, but it can reach a value so high that
> timeout/1000 will never reach it. Example :

Ok, makes sense!

<snip>

> > In any case, this code is approaching unreadable with lots of jiffies
> > <--> human-time units manipulations done in non-standard ways, which
> > the updated sys_poll() also tries to avoid.
> 
> I've not checked sys_poll(), but I agree with you that it's rather
> difficult to imagine all corner cases this way.

Here is the patch I was referring to, adapted to the epoll use of
schedule_timeout(). A similar fix is in -mm for almost identical code in
sys_poll().

Description: The @timeout parameter to ep_poll() is in milliseconds but
we compare it to (MAX_SCHEDULE_TIMEOUT - 1000 / HZ), which is
(jiffies/jiffies-per-sec) or seconds. That seems blatantly broken.  This
led to improper overflow checking for @timeout. As Andrew Morton pointed
out in a similar fix for sys_poll(), the best solution is to to check
for potential overflow first, then either select an indefinite value or
convert @timeout.

To achieve this and clean-up the code, change the prototype of the
ep_poll() to make it clear that the parameter is in milliseconds and
rename jtimeout to timeout_jiffies to hold the corresonding jiffies
value.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

---

 fs/eventpoll.c |   46 ++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 38 insertions(+), 8 deletions(-)

diff -urpN 2.6.14-rc2-mm1/fs/eventpoll.c 2.6.14-rc2-mm1-dev/fs/eventpoll.c
--- 2.6.14-rc2-mm1/fs/eventpoll.c	2005-09-21 23:50:28.000000000 -0700
+++ 2.6.14-rc2-mm1-dev/fs/eventpoll.c	2005-09-24 10:07:51.000000000 -0700
@@ -260,7 +260,7 @@ static int ep_events_transfer(struct eve
 			      struct epoll_event __user *events,
 			      int maxevents);
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
-		   int maxevents, long timeout);
+		   int maxevents, long timeout_msecs);
 static int eventpollfs_delete_dentry(struct dentry *dentry);
 static struct inode *ep_eventpoll_inode(void);
 static struct super_block *eventpollfs_get_sb(struct file_system_type *fs_type,
@@ -1494,11 +1494,12 @@ static int ep_events_transfer(struct eve
 
 
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
-		   int maxevents, long timeout)
+		   int maxevents, long timeout_msecs)
 {
 	int res, eavail;
+	int overflow;
 	unsigned long flags;
-	long jtimeout;
+	long timeout_jiffies;
 	wait_queue_t wait;
 
 	/*
@@ -1506,8 +1507,36 @@ static int ep_poll(struct eventpoll *ep,
 	 * and the overflow condition. The passed timeout is in milliseconds,
 	 * that why (t * HZ) / 1000.
 	 */
-	jtimeout = timeout == -1 || timeout > (MAX_SCHEDULE_TIMEOUT - 1000) / HZ ?
-		MAX_SCHEDULE_TIMEOUT: (timeout * HZ + 999) / 1000;
+
+	if (timeout_msecs) {
+		/*
+		 * We compare HZ with 1000 to work out which side of the
+		 * expression needs conversion.  Because we want to
+		 * avoid converting any value to a numerically higher
+		 * value, which could overflow.
+		 */
+#if HZ > 1000
+		overflow = timeout_msecs >=
+				jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+#else
+		overflow = msecs_to_jiffies(timeout_msecs) >=
+				MAX_SCHEDULE_TIMEOUT;
+#endif
+
+		/*
+		 * If we would overflow in the conversion or a negative
+		 * timeout is requested, sleep indefinitely.
+		 */
+		if (overflow || timeout_msecs < 0)
+			timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+		else
+			timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+	} else {
+		/*
+		 * 0 millisecond requests become 0 jiffy requests
+		 */
+		timeout_jiffies = 0;
+	}
 
 retry:
 	write_lock_irqsave(&ep->lock, flags);
@@ -1529,7 +1558,7 @@ retry:
 			 * to TASK_INTERRUPTIBLE before doing the checks.
 			 */
 			set_current_state(TASK_INTERRUPTIBLE);
-			if (!list_empty(&ep->rdllist) || !jtimeout)
+			if (!list_empty(&ep->rdllist) || !timeout_jiffies)
 				break;
 			if (signal_pending(current)) {
 				res = -EINTR;
@@ -1537,7 +1566,7 @@ retry:
 			}
 
 			write_unlock_irqrestore(&ep->lock, flags);
-			jtimeout = schedule_timeout(jtimeout);
+			timeout_jiffies = schedule_timeout(timeout_jiffies);
 			write_lock_irqsave(&ep->lock, flags);
 		}
 		remove_wait_queue(&ep->wq, &wait);
@@ -1556,7 +1585,8 @@ retry:
 	 * more luck.
 	 */
 	if (!res && eavail &&
-	    !(res = ep_events_transfer(ep, events, maxevents)) && jtimeout)
+	    !(res = ep_events_transfer(ep, events, maxevents)) &&
+	    timeout_jiffies)
 		goto retry;
 
 	return res;

  parent reply	other threads:[~2005-09-24 17:19 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-23 18:13 [patch] sys_epoll_wait() timeout saga Davide Libenzi
2005-09-23 18:24 ` Nish Aravamudan
2005-09-24  4:05 ` Willy Tarreau
2005-09-24  4:44   ` Nish Aravamudan
2005-09-24  6:15     ` Willy Tarreau
2005-09-24  7:33       ` Vadim Lobanov
2005-09-24  7:51         ` Willy Tarreau
2005-09-24 15:10       ` Davide Libenzi
2005-09-24 17:20         ` Willy Tarreau
2005-09-24 18:19           ` Davide Libenzi
2005-09-25  6:05             ` Andrew Morton
2005-09-25  6:20               ` Willy Tarreau
2005-09-25  6:32                 ` Andrew Morton
2005-09-25  7:08               ` Vadim Lobanov
2005-09-25  8:03                 ` Willy Tarreau
2005-09-24 17:19       ` Nishanth Aravamudan [this message]
2005-09-24 18:25         ` Davide Libenzi
2005-09-24 19:38           ` [PATCH 0/3] fixes for overflow in poll(), epoll(), and msec_to_jiffies() Willy Tarreau
2005-09-24 19:44             ` [PATCH 1/3] 2.6.14-rc2-mm1: fixes for overflow msec_to_jiffies() Willy Tarreau
2005-09-29  9:43               ` Andrew Morton
2005-09-29 19:41                 ` Willy Tarreau
2005-09-29 19:52                   ` Andrew Morton
2005-09-29 20:55                     ` Willy Tarreau
2005-10-01 17:39                     ` Willy Tarreau
2005-09-24 19:47             ` [PATCH 2/3] 2.6.14-rc2-mm1: fixes for overflow in epoll() Willy Tarreau
2005-09-24 19:52             ` [PATCH 3/3] 2.6.14-rc2-mm1 : fixes for overflow in sys_poll() Willy Tarreau
2005-10-01 20:39               ` Willy Tarreau
2005-09-24 20:08             ` [PATCH 0/3] fixes for overflow in poll(), epoll(), and msec_to_jiffies() Davide Libenzi
2005-09-24 20:21               ` Willy TARREAU
2005-09-25 20:55             ` Nishanth Aravamudan
2005-09-25 22:06               ` Willy Tarreau
2005-09-24 21:25           ` [patch] sys_epoll_wait() timeout saga Vadim Lobanov
2005-09-24 18:30         ` Willy Tarreau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050924171928.GF3950@us.ibm.com \
    --to=nacc@us.ibm.com \
    --cc=akpm@osdl.org \
    --cc=davidel@xmailserver.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nish.aravamudan@gmail.com \
    --cc=willy@w.ods.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox