* max_user_watches overflows on 16TB system.
@ 2010-10-01 20:01 Robin Holt
2010-10-01 20:37 ` Davide Libenzi
0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2010-10-01 20:01 UTC (permalink / raw)
To: Davide Libenzi, Eric W. Biederman, Pekka Enberg; +Cc: linux-kernel
Following a boot of a 16TB system, we noticed that the max_user_watches
sysctl was negative. Is there any downside to converting that to a
static long and handling the fallout from the change? I believe that
fallout includes changing the definition of epoll_watches over to an
atomic_long_t as well.
Alternatively should we just limit max_user_watches to INT_MAX?
Thanks,
Robin Holt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: max_user_watches overflows on 16TB system.
2010-10-01 20:01 max_user_watches overflows on 16TB system Robin Holt
@ 2010-10-01 20:37 ` Davide Libenzi
2010-10-02 3:04 ` Eric W. Biederman
0 siblings, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2010-10-01 20:37 UTC (permalink / raw)
To: Robin Holt; +Cc: Eric W. Biederman, Pekka Enberg, linux-kernel
On Fri, 1 Oct 2010, Robin Holt wrote:
>
> Following a boot of a 16TB system, we noticed that the max_user_watches
> sysctl was negative. Is there any downside to converting that to a
> static long and handling the fallout from the change? I believe that
> fallout includes changing the definition of epoll_watches over to an
> atomic_long_t as well.
>
> Alternatively should we just limit max_user_watches to INT_MAX?
2B watches looks like an acceptable limit to me, at least for now.
Nobody has complained about not having enough of them so far.
- Davide
* Re: max_user_watches overflows on 16TB system.
2010-10-01 20:37 ` Davide Libenzi
@ 2010-10-02 3:04 ` Eric W. Biederman
2010-10-02 14:04 ` Davide Libenzi
0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2010-10-02 3:04 UTC (permalink / raw)
To: Davide Libenzi; +Cc: Robin Holt, Pekka Enberg, linux-kernel
Davide Libenzi <davidel@xmailserver.org> writes:
> On Fri, 1 Oct 2010, Robin Holt wrote:
>
>>
>> Following a boot of a 16TB system, we noticed that the max_user_watches
>> sysctl was negative. Is there any downside to converting that to a
>> static long and handling the fallout from the change? I believe that
>> fallout includes changing the definition of epoll_watches over to an
>> atomic_long_t as well.
>>
>> Alternatively should we just limit max_user_watches to INT_MAX?
>
> 2B watches looks like an acceptable limit to me, at least for now.
> Nobody has complained about not having enough of them so far.
Which suggests that we need to force the boot time calculation to not
exceed 2B.
From the sysctl interface perspective, now that all of it is exported as
ASCII strings, I don't see a problem there.
Eric
* Re: max_user_watches overflows on 16TB system.
2010-10-02 3:04 ` Eric W. Biederman
@ 2010-10-02 14:04 ` Davide Libenzi
2010-10-04 19:44 ` [Patch] Convert max_user_watches to long Robin Holt
0 siblings, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2010-10-02 14:04 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Robin Holt, Pekka Enberg, Linux Kernel Mailing List
On Fri, 1 Oct 2010, Eric W. Biederman wrote:
> Davide Libenzi <davidel@xmailserver.org> writes:
>
> > On Fri, 1 Oct 2010, Robin Holt wrote:
> >
> >>
> >> Following a boot of a 16TB system, we noticed that the max_user_watches
> >> sysctl was negative. Is there any downside to converting that to a
> >> static long and handling the fallout from the change? I believe that
> >> fallout includes changing the definition of epoll_watches over to an
> >> atomic_long_t as well.
> >>
> >> Alternatively should we just limit max_user_watches to INT_MAX?
> >
> > 2B watches looks like an acceptable limit to me, at least for now.
> > Nobody has complained about not having enough of them so far.
>
> Which suggests that we need to force the boot time calculation to not
> exceed 2B.
>
> From the sysctl interface perspective now that all of it is exported as
> ascii strings I don't see a problem there.
Thinking it through with actual code at hand, you're right. The 'long'
conversion just makes more sense, even though INT_MAX watches is a
huge number today.
Robin, sorry for the confusion. Can you post the 'long' conversion patch?
- Davide
* [Patch] Convert max_user_watches to long.
2010-10-02 14:04 ` Davide Libenzi
@ 2010-10-04 19:44 ` Robin Holt
2010-10-06 2:21 ` Davide Libenzi
0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2010-10-04 19:44 UTC (permalink / raw)
To: Davide Libenzi
Cc: Eric W. Biederman, Robin Holt, Pekka Enberg,
Linux Kernel Mailing List
On a 16TB machine, max_user_watches has an integer overflow. Convert it
to use a long and handle the associated fallout.
Signed-off-by: Robin Holt <holt@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
To: Davide Libenzi <davidel@xmailserver.org>
To: linux-kernel@vger.kernel.org
To: Pekka Enberg <penberg@cs.helsinki.fi>
---
Davide, I changed the logic a bit in ep_insert. It looked to me like
there was a window between when the epoll_watches is checked and when it
is incremented where multiple epoll_insert callers could be adding watches
at the same time and allow epoll_watches to exceed max_user_watches.
Not sure of the case where this could happen, but I assume something
like that must be possible or we would not be using atomics. If that
is not to your liking, I will happily remove it.
fs/eventpoll.c | 22 +++++++++++++---------
include/linux/sched.h | 2 +-
2 files changed, 14 insertions(+), 10 deletions(-)
Index: pv1010933/fs/eventpoll.c
===================================================================
--- pv1010933.orig/fs/eventpoll.c 2010-10-02 06:38:15.000000000 -0500
+++ pv1010933/fs/eventpoll.c 2010-10-04 11:05:21.643823297 -0500
@@ -220,7 +220,7 @@ struct ep_send_events_data {
* Configuration options available inside /proc/sys/fs/epoll/
*/
/* Maximum number of epoll watched descriptors, per user */
-static int max_user_watches __read_mostly;
+static long max_user_watches __read_mostly;
/*
* This mutex is used to serialize ep_free() and eventpoll_release_file().
@@ -243,16 +243,18 @@ static struct kmem_cache *pwq_cache __re
#include <linux/sysctl.h>
-static int zero;
+static long zero;
+static long long_max = LONG_MAX;
ctl_table epoll_table[] = {
{
.procname = "max_user_watches",
.data = &max_user_watches,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(max_user_watches),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &zero,
+ .extra2 = &long_max,
},
{ }
};
@@ -564,7 +566,7 @@ static int ep_remove(struct eventpoll *e
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);
- atomic_dec(&ep->user->epoll_watches);
+ atomic_long_dec(&ep->user->epoll_watches);
return 0;
}
@@ -900,12 +902,15 @@ static int ep_insert(struct eventpoll *e
{
int error, revents, pwake = 0;
unsigned long flags;
+ long user_watches;
struct epitem *epi;
struct ep_pqueue epq;
- if (unlikely(atomic_read(&ep->user->epoll_watches) >=
- max_user_watches))
+ user_watches = atomic_long_inc_return(&ep->user->epoll_watches);
+ if (unlikely(user_watches > max_user_watches)) {
+ atomic_long_dec(&ep->user->epoll_watches);
return -ENOSPC;
+ }
if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
return -ENOMEM;
@@ -968,8 +973,6 @@ static int ep_insert(struct eventpoll *e
spin_unlock_irqrestore(&ep->lock, flags);
- atomic_inc(&ep->user->epoll_watches);
-
/* We have to call this outside the lock */
if (pwake)
ep_poll_safewake(&ep->poll_wait);
@@ -1422,6 +1425,7 @@ static int __init eventpoll_init(void)
*/
max_user_watches = (((si.totalram - si.totalhigh) / 25) << PAGE_SHIFT) /
EP_ITEM_COST;
+ BUG_ON(max_user_watches < 0);
/* Initialize the structure used to perform safe poll wait head wake ups */
ep_nested_calls_init(&poll_safewake_ncalls);
Index: pv1010933/include/linux/sched.h
===================================================================
--- pv1010933.orig/include/linux/sched.h 2010-10-01 10:27:07.000000000 -0500
+++ pv1010933/include/linux/sched.h 2010-10-04 10:44:11.287823312 -0500
@@ -666,7 +666,7 @@ struct user_struct {
atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
#endif
#ifdef CONFIG_EPOLL
- atomic_t epoll_watches; /* The number of file descriptors currently watched */
+ atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
#endif
#ifdef CONFIG_POSIX_MQUEUE
/* protected by mq_lock */
* Re: [Patch] Convert max_user_watches to long.
2010-10-04 19:44 ` [Patch] Convert max_user_watches to long Robin Holt
@ 2010-10-06 2:21 ` Davide Libenzi
2010-10-09 7:50 ` Robin Holt
0 siblings, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2010-10-06 2:21 UTC (permalink / raw)
To: Robin Holt; +Cc: Eric W. Biederman, Pekka Enberg, Linux Kernel Mailing List
On Mon, 4 Oct 2010, Robin Holt wrote:
> On a 16TB machine, max_user_watches has an integer overflow. Convert it
> to use a long and handle the associated fallout.
>
>
> Signed-off-by: Robin Holt <holt@sgi.com>
> To: "Eric W. Biederman" <ebiederm@xmission.com>
> To: Davide Libenzi <davidel@xmailserver.org>
> To: linux-kernel@vger.kernel.org
> To: Pekka Enberg <penberg@cs.helsinki.fi>
>
> ---
>
> Davide, I changed the logic a bit in ep_insert. It looked to me like
> there was a window between when the epoll_watches is checked and when it
> is incremented where multiple epoll_insert callers could be adding watches
> at the same time and allow epoll_watches to exceed max_user_watches.
> Not sure of the case where this could happen, but I assume something
> like that must be possible or we would not be using atomics. If that
> is not to your liking, I will happily remove it.
The case can happen, but the effect is not something we should be too
worried about.
You seem to be leaking a count in case kmem_cache_alloc() or the steps
following it fail.
I'd rather not have that code there, and have the patch cover the 'long'
conversion only. Or you need a proper cleanup goto target.
>
> fs/eventpoll.c | 22 +++++++++++++---------
> include/linux/sched.h | 2 +-
> 2 files changed, 14 insertions(+), 10 deletions(-)
>
> Index: pv1010933/fs/eventpoll.c
> ===================================================================
> --- pv1010933.orig/fs/eventpoll.c 2010-10-02 06:38:15.000000000 -0500
> +++ pv1010933/fs/eventpoll.c 2010-10-04 11:05:21.643823297 -0500
> @@ -220,7 +220,7 @@ struct ep_send_events_data {
> * Configuration options available inside /proc/sys/fs/epoll/
> */
> /* Maximum number of epoll watched descriptors, per user */
> -static int max_user_watches __read_mostly;
> +static long max_user_watches __read_mostly;
>
> /*
> * This mutex is used to serialize ep_free() and eventpoll_release_file().
> @@ -243,16 +243,18 @@ static struct kmem_cache *pwq_cache __re
>
> #include <linux/sysctl.h>
>
> -static int zero;
> +static long zero;
> +static long long_max = LONG_MAX;
>
> ctl_table epoll_table[] = {
> {
> .procname = "max_user_watches",
> .data = &max_user_watches,
> - .maxlen = sizeof(int),
> + .maxlen = sizeof(max_user_watches),
> .mode = 0644,
> - .proc_handler = proc_dointvec_minmax,
> + .proc_handler = proc_doulongvec_minmax,
> .extra1 = &zero,
> + .extra2 = &long_max,
> },
> { }
> };
> @@ -564,7 +566,7 @@ static int ep_remove(struct eventpoll *e
> /* At this point it is safe to free the eventpoll item */
> kmem_cache_free(epi_cache, epi);
>
> - atomic_dec(&ep->user->epoll_watches);
> + atomic_long_dec(&ep->user->epoll_watches);
>
> return 0;
> }
> @@ -900,12 +902,15 @@ static int ep_insert(struct eventpoll *e
> {
> int error, revents, pwake = 0;
> unsigned long flags;
> + long user_watches;
> struct epitem *epi;
> struct ep_pqueue epq;
>
> - if (unlikely(atomic_read(&ep->user->epoll_watches) >=
> - max_user_watches))
> + user_watches = atomic_long_inc_return(&ep->user->epoll_watches);
> + if (unlikely(user_watches > max_user_watches)) {
> + atomic_long_dec(&ep->user->epoll_watches);
> return -ENOSPC;
> + }
> if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
> return -ENOMEM;
>
> @@ -968,8 +973,6 @@ static int ep_insert(struct eventpoll *e
>
> spin_unlock_irqrestore(&ep->lock, flags);
>
> - atomic_inc(&ep->user->epoll_watches);
> -
> /* We have to call this outside the lock */
> if (pwake)
> ep_poll_safewake(&ep->poll_wait);
> @@ -1422,6 +1425,7 @@ static int __init eventpoll_init(void)
> */
> max_user_watches = (((si.totalram - si.totalhigh) / 25) << PAGE_SHIFT) /
> EP_ITEM_COST;
> + BUG_ON(max_user_watches < 0);
>
> /* Initialize the structure used to perform safe poll wait head wake ups */
> ep_nested_calls_init(&poll_safewake_ncalls);
> Index: pv1010933/include/linux/sched.h
> ===================================================================
> --- pv1010933.orig/include/linux/sched.h 2010-10-01 10:27:07.000000000 -0500
> +++ pv1010933/include/linux/sched.h 2010-10-04 10:44:11.287823312 -0500
> @@ -666,7 +666,7 @@ struct user_struct {
> atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
> #endif
> #ifdef CONFIG_EPOLL
> - atomic_t epoll_watches; /* The number of file descriptors currently watched */
> + atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
> #endif
> #ifdef CONFIG_POSIX_MQUEUE
> /* protected by mq_lock */
>
* Re: [Patch] Convert max_user_watches to long.
2010-10-06 2:21 ` Davide Libenzi
@ 2010-10-09 7:50 ` Robin Holt
2010-10-10 19:05 ` Randy Dunlap
2010-10-11 4:49 ` Davide Libenzi
0 siblings, 2 replies; 15+ messages in thread
From: Robin Holt @ 2010-10-09 7:50 UTC (permalink / raw)
To: Davide Libenzi
Cc: Robin Holt, Eric W. Biederman, Pekka Enberg,
Linux Kernel Mailing List
On Tue, Oct 05, 2010 at 07:21:09PM -0700, Davide Libenzi wrote:
> On Mon, 4 Oct 2010, Robin Holt wrote:
>
> > On a 16TB machine, max_user_watches has an integer overflow. Convert it
> > to use a long and handle the associated fallout.
> >
> >
> > Signed-off-by: Robin Holt <holt@sgi.com>
> > To: "Eric W. Biederman" <ebiederm@xmission.com>
> > To: Davide Libenzi <davidel@xmailserver.org>
> > To: linux-kernel@vger.kernel.org
> > To: Pekka Enberg <penberg@cs.helsinki.fi>
> >
> > ---
> >
> > Davide, I changed the logic a bit in ep_insert. It looked to me like
> > there was a window between when the epoll_watches is checked and when it
> > is incremented where multiple epoll_insert callers could be adding watches
> > at the same time and allow epoll_watches to exceed max_user_watches.
> > Not sure of the case where this could happen, but I assume something
> > like that must be possible or we would not be using atomics. If that
> > is not to your liking, I will happily remove it.
>
> The case can happen, but the effect is not something we should be too
> worried about.
> You seem to be leaking a count in case kmem_cache_alloc() and following
> fail.
> I'd rather not have that code there, and have the patch cover the 'long'
> conversion only. Or you need a proper cleanup goto target.
Bah. Too rushed when I made that. Here is the conversion only patch. If
this is acceptable, what is the normal submission path for fs/eventpoll.c?
Robin
------------------------------------------------------------------------
On a 16TB machine, max_user_watches has an integer overflow. Convert it
to use a long and handle the associated fallout.
Signed-off-by: Robin Holt <holt@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
To: Davide Libenzi <davidel@xmailserver.org>
To: linux-kernel@vger.kernel.org
To: Pekka Enberg <penberg@cs.helsinki.fi>
---
fs/eventpoll.c | 20 ++++++++++++--------
include/linux/sched.h | 2 +-
2 files changed, 13 insertions(+), 9 deletions(-)
Index: pv1010933/fs/eventpoll.c
===================================================================
--- pv1010933.orig/fs/eventpoll.c 2010-10-04 14:41:59.000000000 -0500
+++ pv1010933/fs/eventpoll.c 2010-10-09 02:40:07.360573988 -0500
@@ -220,7 +220,7 @@ struct ep_send_events_data {
* Configuration options available inside /proc/sys/fs/epoll/
*/
/* Maximum number of epoll watched descriptors, per user */
-static int max_user_watches __read_mostly;
+static long max_user_watches __read_mostly;
/*
* This mutex is used to serialize ep_free() and eventpoll_release_file().
@@ -243,16 +243,18 @@ static struct kmem_cache *pwq_cache __re
#include <linux/sysctl.h>
-static int zero;
+static long zero;
+static long long_max = LONG_MAX;
ctl_table epoll_table[] = {
{
.procname = "max_user_watches",
.data = &max_user_watches,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(max_user_watches),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &zero,
+ .extra2 = &long_max,
},
{ }
};
@@ -564,7 +566,7 @@ static int ep_remove(struct eventpoll *e
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);
- atomic_dec(&ep->user->epoll_watches);
+ atomic_long_dec(&ep->user->epoll_watches);
return 0;
}
@@ -900,11 +902,12 @@ static int ep_insert(struct eventpoll *e
{
int error, revents, pwake = 0;
unsigned long flags;
+ long user_watches;
struct epitem *epi;
struct ep_pqueue epq;
- if (unlikely(atomic_read(&ep->user->epoll_watches) >=
- max_user_watches))
+ user_watches = atomic_long_read(&ep->user->epoll_watches);
+ if (user_watches >= max_user_watches)
return -ENOSPC;
if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
return -ENOMEM;
@@ -968,7 +971,7 @@ static int ep_insert(struct eventpoll *e
spin_unlock_irqrestore(&ep->lock, flags);
- atomic_inc(&ep->user->epoll_watches);
+ atomic_long_inc(&ep->user->epoll_watches);
/* We have to call this outside the lock */
if (pwake)
@@ -1422,6 +1425,7 @@ static int __init eventpoll_init(void)
*/
max_user_watches = (((si.totalram - si.totalhigh) / 25) << PAGE_SHIFT) /
EP_ITEM_COST;
+ BUG_ON(max_user_watches < 0);
/* Initialize the structure used to perform safe poll wait head wake ups */
ep_nested_calls_init(&poll_safewake_ncalls);
Index: pv1010933/include/linux/sched.h
===================================================================
--- pv1010933.orig/include/linux/sched.h 2010-10-04 14:41:59.000000000 -0500
+++ pv1010933/include/linux/sched.h 2010-10-04 14:42:01.123824797 -0500
@@ -666,7 +666,7 @@ struct user_struct {
atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
#endif
#ifdef CONFIG_EPOLL
- atomic_t epoll_watches; /* The number of file descriptors currently watched */
+ atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
#endif
#ifdef CONFIG_POSIX_MQUEUE
/* protected by mq_lock */
* Re: [Patch] Convert max_user_watches to long.
2010-10-09 7:50 ` Robin Holt
@ 2010-10-10 19:05 ` Randy Dunlap
2010-10-11 4:49 ` Davide Libenzi
1 sibling, 0 replies; 15+ messages in thread
From: Randy Dunlap @ 2010-10-10 19:05 UTC (permalink / raw)
To: Robin Holt, akpm
Cc: Davide Libenzi, Eric W. Biederman, Pekka Enberg,
Linux Kernel Mailing List
On Sat, 9 Oct 2010 02:50:02 -0500 Robin Holt wrote:
> On Tue, Oct 05, 2010 at 07:21:09PM -0700, Davide Libenzi wrote:
> > On Mon, 4 Oct 2010, Robin Holt wrote:
> >
> > > On a 16TB machine, max_user_watches has an integer overflow. Convert it
> > > to use a long and handle the associated fallout.
> > >
> > >
> > > Signed-off-by: Robin Holt <holt@sgi.com>
> > > To: "Eric W. Biederman" <ebiederm@xmission.com>
> > > To: Davide Libenzi <davidel@xmailserver.org>
> > > To: linux-kernel@vger.kernel.org
> > > To: Pekka Enberg <penberg@cs.helsinki.fi>
> > >
> > > ---
> > >
> > > Davide, I changed the logic a bit in ep_insert. It looked to me like
> > > there was a window between when the epoll_watches is checked and when it
> > > is incremented where multiple epoll_insert callers could be adding watches
> > > at the same time and allow epoll_watches to exceed max_user_watches.
> > > Not sure of the case where this could happen, but I assume something
> > > like that must be possible or we would not be using atomics. If that
> > > is not to your liking, I will happily remove it.
> >
> > The case can happen, but the effect is not something we should be too
> > worried about.
> > You seem to be leaking a count in case kmem_cache_alloc() and following
> > fail.
> > I'd rather not have that code there, and have the patch cover the 'long'
> > conversion only. Or you need a proper cleanup goto target.
>
> Bah. Too rushed when I made that. Here is the conversion only patch. If
> this is acceptable, what is the normal submission path for fs/eventpoll.c?
git log suggests patches go mostly either through Andrew or straight to Linus.
> Robin
>
> ------------------------------------------------------------------------
>
> On a 16TB machine, max_user_watches has an integer overflow. Convert it
> to use a long and handle the associated fallout.
>
>
> Signed-off-by: Robin Holt <holt@sgi.com>
> To: "Eric W. Biederman" <ebiederm@xmission.com>
> To: Davide Libenzi <davidel@xmailserver.org>
> To: linux-kernel@vger.kernel.org
> To: Pekka Enberg <penberg@cs.helsinki.fi>
>
> ---
>
> fs/eventpoll.c | 20 ++++++++++++--------
> include/linux/sched.h | 2 +-
> 2 files changed, 13 insertions(+), 9 deletions(-)
>
> Index: pv1010933/fs/eventpoll.c
> ===================================================================
> --- pv1010933.orig/fs/eventpoll.c 2010-10-04 14:41:59.000000000 -0500
> +++ pv1010933/fs/eventpoll.c 2010-10-09 02:40:07.360573988 -0500
> @@ -220,7 +220,7 @@ struct ep_send_events_data {
> * Configuration options available inside /proc/sys/fs/epoll/
> */
> /* Maximum number of epoll watched descriptors, per user */
> -static int max_user_watches __read_mostly;
> +static long max_user_watches __read_mostly;
>
> /*
> * This mutex is used to serialize ep_free() and eventpoll_release_file().
> @@ -243,16 +243,18 @@ static struct kmem_cache *pwq_cache __re
>
> #include <linux/sysctl.h>
>
> -static int zero;
> +static long zero;
> +static long long_max = LONG_MAX;
>
> ctl_table epoll_table[] = {
> {
> .procname = "max_user_watches",
> .data = &max_user_watches,
> - .maxlen = sizeof(int),
> + .maxlen = sizeof(max_user_watches),
> .mode = 0644,
> - .proc_handler = proc_dointvec_minmax,
> + .proc_handler = proc_doulongvec_minmax,
> .extra1 = &zero,
> + .extra2 = &long_max,
> },
> { }
> };
> @@ -564,7 +566,7 @@ static int ep_remove(struct eventpoll *e
> /* At this point it is safe to free the eventpoll item */
> kmem_cache_free(epi_cache, epi);
>
> - atomic_dec(&ep->user->epoll_watches);
> + atomic_long_dec(&ep->user->epoll_watches);
>
> return 0;
> }
> @@ -900,11 +902,12 @@ static int ep_insert(struct eventpoll *e
> {
> int error, revents, pwake = 0;
> unsigned long flags;
> + long user_watches;
> struct epitem *epi;
> struct ep_pqueue epq;
>
> - if (unlikely(atomic_read(&ep->user->epoll_watches) >=
> - max_user_watches))
> + user_watches = atomic_long_read(&ep->user->epoll_watches);
> + if (user_watches >= max_user_watches)
> return -ENOSPC;
> if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
> return -ENOMEM;
> @@ -968,7 +971,7 @@ static int ep_insert(struct eventpoll *e
>
> spin_unlock_irqrestore(&ep->lock, flags);
>
> - atomic_inc(&ep->user->epoll_watches);
> + atomic_long_inc(&ep->user->epoll_watches);
>
> /* We have to call this outside the lock */
> if (pwake)
> @@ -1422,6 +1425,7 @@ static int __init eventpoll_init(void)
> */
> max_user_watches = (((si.totalram - si.totalhigh) / 25) << PAGE_SHIFT) /
> EP_ITEM_COST;
> + BUG_ON(max_user_watches < 0);
>
> /* Initialize the structure used to perform safe poll wait head wake ups */
> ep_nested_calls_init(&poll_safewake_ncalls);
> Index: pv1010933/include/linux/sched.h
> ===================================================================
> --- pv1010933.orig/include/linux/sched.h 2010-10-04 14:41:59.000000000 -0500
> +++ pv1010933/include/linux/sched.h 2010-10-04 14:42:01.123824797 -0500
> @@ -666,7 +666,7 @@ struct user_struct {
> atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
> #endif
> #ifdef CONFIG_EPOLL
> - atomic_t epoll_watches; /* The number of file descriptors currently watched */
> + atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
> #endif
> #ifdef CONFIG_POSIX_MQUEUE
> /* protected by mq_lock */
> --
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
* Re: [Patch] Convert max_user_watches to long.
2010-10-09 7:50 ` Robin Holt
2010-10-10 19:05 ` Randy Dunlap
@ 2010-10-11 4:49 ` Davide Libenzi
2010-10-14 17:15 ` Robin Holt
1 sibling, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2010-10-11 4:49 UTC (permalink / raw)
To: Robin Holt; +Cc: Eric W. Biederman, Pekka Enberg, Linux Kernel Mailing List
On Sat, 9 Oct 2010, Robin Holt wrote:
> @@ -900,11 +902,12 @@ static int ep_insert(struct eventpoll *e
> {
> int error, revents, pwake = 0;
> unsigned long flags;
> + long user_watches;
> struct epitem *epi;
> struct ep_pqueue epq;
>
> - if (unlikely(atomic_read(&ep->user->epoll_watches) >=
> - max_user_watches))
> + user_watches = atomic_long_read(&ep->user->epoll_watches);
> + if (user_watches >= max_user_watches)
> return -ENOSPC;
Is there a particular reason for adding an extra, otherwise unused,
user_watches variable?
- Davide
* Re: [Patch] Convert max_user_watches to long.
2010-10-11 4:49 ` Davide Libenzi
@ 2010-10-14 17:15 ` Robin Holt
0 siblings, 0 replies; 15+ messages in thread
From: Robin Holt @ 2010-10-14 17:15 UTC (permalink / raw)
To: Davide Libenzi
Cc: Robin Holt, Eric W. Biederman, Pekka Enberg,
Linux Kernel Mailing List
On Sun, Oct 10, 2010 at 09:49:58PM -0700, Davide Libenzi wrote:
> On Sat, 9 Oct 2010, Robin Holt wrote:
>
> > @@ -900,11 +902,12 @@ static int ep_insert(struct eventpoll *e
> > {
> > int error, revents, pwake = 0;
> > unsigned long flags;
> > + long user_watches;
> > struct epitem *epi;
> > struct ep_pqueue epq;
> >
> > - if (unlikely(atomic_read(&ep->user->epoll_watches) >=
> > - max_user_watches))
> > + user_watches = atomic_long_read(&ep->user->epoll_watches);
> > + if (user_watches >= max_user_watches)
> > return -ENOSPC;
>
> Is there a particular reason for adding an extra, otherwise unused,
> user_watches variable?
Keeps the line length and readability complaints down. No other
reason.
Robin
* [Patch] Convert max_user_watches to long.
[not found] <20101027190914.146006767@gulag1.americas.sgi.com>
@ 2010-10-27 19:09 ` Robin Holt
2010-10-27 19:31 ` Davide Libenzi
2010-10-27 23:45 ` Andrew Morton
0 siblings, 2 replies; 15+ messages in thread
From: Robin Holt @ 2010-10-27 19:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric W. Biederman, Davide Libenzi, linux-kernel, Pekka Enberg
On a 16TB machine, max_user_watches has an integer overflow. Convert it
to use a long and handle the associated fallout.
Signed-off-by: Robin Holt <holt@sgi.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: linux-kernel@vger.kernel.org
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
---
fs/eventpoll.c | 20 ++++++++++++--------
include/linux/sched.h | 2 +-
2 files changed, 13 insertions(+), 9 deletions(-)
Index: pv1010933/fs/eventpoll.c
===================================================================
--- pv1010933.orig/fs/eventpoll.c 2010-10-27 14:04:44.000000000 -0500
+++ pv1010933/fs/eventpoll.c 2010-10-27 14:07:50.211457551 -0500
@@ -220,7 +220,7 @@ struct ep_send_events_data {
* Configuration options available inside /proc/sys/fs/epoll/
*/
/* Maximum number of epoll watched descriptors, per user */
-static int max_user_watches __read_mostly;
+static long max_user_watches __read_mostly;
/*
* This mutex is used to serialize ep_free() and eventpoll_release_file().
@@ -243,16 +243,18 @@ static struct kmem_cache *pwq_cache __re
#include <linux/sysctl.h>
-static int zero;
+static long zero;
+static long long_max = LONG_MAX;
ctl_table epoll_table[] = {
{
.procname = "max_user_watches",
.data = &max_user_watches,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(max_user_watches),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &zero,
+ .extra2 = &long_max,
},
{ }
};
@@ -564,7 +566,7 @@ static int ep_remove(struct eventpoll *e
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);
- atomic_dec(&ep->user->epoll_watches);
+ atomic_long_dec(&ep->user->epoll_watches);
return 0;
}
@@ -900,11 +902,12 @@ static int ep_insert(struct eventpoll *e
{
int error, revents, pwake = 0;
unsigned long flags;
+ long user_watches;
struct epitem *epi;
struct ep_pqueue epq;
- if (unlikely(atomic_read(&ep->user->epoll_watches) >=
- max_user_watches))
+ user_watches = atomic_long_read(&ep->user->epoll_watches);
+ if (unlikely(user_watches >= max_user_watches))
return -ENOSPC;
if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
return -ENOMEM;
@@ -968,7 +971,7 @@ static int ep_insert(struct eventpoll *e
spin_unlock_irqrestore(&ep->lock, flags);
- atomic_inc(&ep->user->epoll_watches);
+ atomic_long_inc(&ep->user->epoll_watches);
/* We have to call this outside the lock */
if (pwake)
@@ -1422,6 +1425,7 @@ static int __init eventpoll_init(void)
*/
max_user_watches = (((si.totalram - si.totalhigh) / 25) << PAGE_SHIFT) /
EP_ITEM_COST;
+ BUG_ON(max_user_watches < 0);
/* Initialize the structure used to perform safe poll wait head wake ups */
ep_nested_calls_init(&poll_safewake_ncalls);
Index: pv1010933/include/linux/sched.h
===================================================================
--- pv1010933.orig/include/linux/sched.h 2010-10-27 14:04:44.000000000 -0500
+++ pv1010933/include/linux/sched.h 2010-10-27 14:04:47.119415143 -0500
@@ -666,7 +666,7 @@ struct user_struct {
atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
#endif
#ifdef CONFIG_EPOLL
- atomic_t epoll_watches; /* The number of file descriptors currently watched */
+ atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
#endif
#ifdef CONFIG_POSIX_MQUEUE
/* protected by mq_lock */
* Re: [Patch] Convert max_user_watches to long.
2010-10-27 19:09 ` Robin Holt
@ 2010-10-27 19:31 ` Davide Libenzi
2010-10-27 23:45 ` Andrew Morton
1 sibling, 0 replies; 15+ messages in thread
From: Davide Libenzi @ 2010-10-27 19:31 UTC (permalink / raw)
To: Robin Holt
Cc: Andrew Morton, Eric W. Biederman, Linux Kernel Mailing List,
Pekka Enberg
On Wed, 27 Oct 2010, Robin Holt wrote:
> On a 16TB machine, max_user_watches has an integer overflow. Convert it
> to use a long and handle the associated fallout.
I can't say I like adding an otherwise unused variable just to keep
things pretty-looking, but the compiler will take care of it.
Acked-by: Davide Libenzi <davidel@xmailserver.org>
> Signed-off-by: Robin Holt <holt@sgi.com>
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Davide Libenzi <davidel@xmailserver.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: Pekka Enberg <penberg@cs.helsinki.fi>
>
> ---
>
> fs/eventpoll.c | 20 ++++++++++++--------
> include/linux/sched.h | 2 +-
> 2 files changed, 13 insertions(+), 9 deletions(-)
>
> Index: pv1010933/fs/eventpoll.c
> ===================================================================
> --- pv1010933.orig/fs/eventpoll.c 2010-10-27 14:04:44.000000000 -0500
> +++ pv1010933/fs/eventpoll.c 2010-10-27 14:07:50.211457551 -0500
> @@ -220,7 +220,7 @@ struct ep_send_events_data {
> * Configuration options available inside /proc/sys/fs/epoll/
> */
> /* Maximum number of epoll watched descriptors, per user */
> -static int max_user_watches __read_mostly;
> +static long max_user_watches __read_mostly;
>
> /*
> * This mutex is used to serialize ep_free() and eventpoll_release_file().
> @@ -243,16 +243,18 @@ static struct kmem_cache *pwq_cache __re
>
> #include <linux/sysctl.h>
>
> -static int zero;
> +static long zero;
> +static long long_max = LONG_MAX;
>
> ctl_table epoll_table[] = {
> {
> .procname = "max_user_watches",
> .data = &max_user_watches,
> - .maxlen = sizeof(int),
> + .maxlen = sizeof(max_user_watches),
> .mode = 0644,
> - .proc_handler = proc_dointvec_minmax,
> + .proc_handler = proc_doulongvec_minmax,
> .extra1 = &zero,
> + .extra2 = &long_max,
> },
> { }
> };
> @@ -564,7 +566,7 @@ static int ep_remove(struct eventpoll *e
> /* At this point it is safe to free the eventpoll item */
> kmem_cache_free(epi_cache, epi);
>
> - atomic_dec(&ep->user->epoll_watches);
> + atomic_long_dec(&ep->user->epoll_watches);
>
> return 0;
> }
> @@ -900,11 +902,12 @@ static int ep_insert(struct eventpoll *e
> {
> int error, revents, pwake = 0;
> unsigned long flags;
> + long user_watches;
> struct epitem *epi;
> struct ep_pqueue epq;
>
> - if (unlikely(atomic_read(&ep->user->epoll_watches) >=
> - max_user_watches))
> + user_watches = atomic_long_read(&ep->user->epoll_watches);
> + if (unlikely(user_watches >= max_user_watches))
> return -ENOSPC;
> if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
> return -ENOMEM;
> @@ -968,7 +971,7 @@ static int ep_insert(struct eventpoll *e
>
> spin_unlock_irqrestore(&ep->lock, flags);
>
> - atomic_inc(&ep->user->epoll_watches);
> + atomic_long_inc(&ep->user->epoll_watches);
>
> /* We have to call this outside the lock */
> if (pwake)
> @@ -1422,6 +1425,7 @@ static int __init eventpoll_init(void)
> */
> max_user_watches = (((si.totalram - si.totalhigh) / 25) << PAGE_SHIFT) /
> EP_ITEM_COST;
> + BUG_ON(max_user_watches < 0);
>
> /* Initialize the structure used to perform safe poll wait head wake ups */
> ep_nested_calls_init(&poll_safewake_ncalls);
> Index: pv1010933/include/linux/sched.h
> ===================================================================
> --- pv1010933.orig/include/linux/sched.h 2010-10-27 14:04:44.000000000 -0500
> +++ pv1010933/include/linux/sched.h 2010-10-27 14:04:47.119415143 -0500
> @@ -666,7 +666,7 @@ struct user_struct {
> atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
> #endif
> #ifdef CONFIG_EPOLL
> - atomic_t epoll_watches; /* The number of file descriptors currently watched */
> + atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
> #endif
> #ifdef CONFIG_POSIX_MQUEUE
> /* protected by mq_lock */
>
>
- Davide
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch] Convert max_user_watches to long.
2010-10-27 19:09 ` Robin Holt
2010-10-27 19:31 ` Davide Libenzi
@ 2010-10-27 23:45 ` Andrew Morton
2010-10-28 2:03 ` Davide Libenzi
1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2010-10-27 23:45 UTC (permalink / raw)
To: Robin Holt; +Cc: Eric W. Biederman, Davide Libenzi, linux-kernel, Pekka Enberg
On Wed, 27 Oct 2010 14:09:15 -0500
Robin Holt <holt@sgi.com> wrote:
> On a 16TB machine, max_user_watches has an integer overflow. Convert it
> to use a long and handle the associated fallout.
>
hand-wavy reality check:
Are the existing defaults sane? How well does the code perform with a
few billion watches?
Is the expected use case one-watch-per-user-per-fd? If so, then
perhaps the max number of user_watches should have some relationship
with the max number of fds?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch] Convert max_user_watches to long.
2010-10-27 23:45 ` Andrew Morton
@ 2010-10-28 2:03 ` Davide Libenzi
2010-10-28 4:08 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2010-10-28 2:03 UTC (permalink / raw)
To: Andrew Morton
Cc: Robin Holt, Eric W. Biederman, Linux Kernel Mailing List,
Pekka Enberg
On Wed, 27 Oct 2010, Andrew Morton wrote:
> On Wed, 27 Oct 2010 14:09:15 -0500
> Robin Holt <holt@sgi.com> wrote:
>
> > On a 16TB machine, max_user_watches has an integer overflow. Convert it
> > to use a long and handle the associated fallout.
> >
>
> hand-wavy reality check:
>
> Are the existing defaults sane? How well does the code perform with a
> few billion watches?
Algorithmically, wakeup events are O(1) and add/mod/del are O(log2(N)).
Tell you what though, if you give me a 16TB RAM box, I will tell you how
it performs in reality ;)
> Is the expected use case one-watch-per-user-per-fd? If so, then
> perhaps the max number of user_watches should have some relationship
> with the max number of fds?
Yes, the expected case is one watch per fd, but the reason the watch limit
went in in the first place was the DoS potential of someone not
playing nicely.
- Davide
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch] Convert max_user_watches to long.
2010-10-28 2:03 ` Davide Libenzi
@ 2010-10-28 4:08 ` Andrew Morton
0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2010-10-28 4:08 UTC (permalink / raw)
To: Davide Libenzi
Cc: Robin Holt, Eric W. Biederman, Linux Kernel Mailing List,
Pekka Enberg
On Wed, 27 Oct 2010 19:03:55 -0700 (PDT) Davide Libenzi <davidel@xmailserver.org> wrote:
> > Is the expected use case one-watch-per-user-per-fd? If so, then
> > perhaps the max number of user_watches should have some relationship
> > with the max number of fds?
>
> Yes, the expected case is one watch per fd, but the reason the watch limit
> went in in the first place was the DoS potential of someone not
> playing nicely.
Sometimes DoS's are accidental. It only takes 25 people to be running
the same buggy (eg, slowly-leaky) app (or shared lib) at the same time...
There are surely plenty of ways of that sort of thing happening, of
course. Not that this fact actually improves anything.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2010-10-28 4:09 UTC | newest]
Thread overview: 15+ messages
-- links below jump to the message on this page --
2010-10-01 20:01 max_user_watches overflows on 16TB system Robin Holt
2010-10-01 20:37 ` Davide Libenzi
2010-10-02 3:04 ` Eric W. Biederman
2010-10-02 14:04 ` Davide Libenzi
2010-10-04 19:44 ` [Patch] Convert max_user_watches to long Robin Holt
2010-10-06 2:21 ` Davide Libenzi
2010-10-09 7:50 ` Robin Holt
2010-10-10 19:05 ` Randy Dunlap
2010-10-11 4:49 ` Davide Libenzi
2010-10-14 17:15 ` Robin Holt
[not found] <20101027190914.146006767@gulag1.americas.sgi.com>
2010-10-27 19:09 ` Robin Holt
2010-10-27 19:31 ` Davide Libenzi
2010-10-27 23:45 ` Andrew Morton
2010-10-28 2:03 ` Davide Libenzi
2010-10-28 4:08 ` Andrew Morton