public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 0/2] Change default MSGMNI tunable to scale with system memory
@ 2007-12-11 15:38 Nadia.Derbey
  2007-12-11 15:38 ` [RFC PATCH 1/2] Scaling msgmni to the " Nadia.Derbey
  2007-12-11 15:38 ` [RFC PATCH 2/2] Scaling msgmni to the number of ipc namespaces Nadia.Derbey
  0 siblings, 2 replies; 5+ messages in thread
From: Nadia.Derbey @ 2007-12-11 15:38 UTC
  To: linux-kernel; +Cc: matthltc


On large systems we'd like to allow a larger number of message queues,
in some cases up to 32K. However, simply setting MSGMNI to a larger value may
cause problems for smaller systems.

The first patch of this series introduces a default maximum number of message
queue ids that scales with the total amount of memory.

Since msgmni is per namespace and there is no amount of memory dedicated to
each namespace so far, the second patch of this series scales msgmni to
the number of ipc namespaces.
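
As a rough illustration of the arithmetic in the two patches, here is a
minimal userspace sketch. It is not the kernel code: scaled_msgmni() is a
hypothetical helper, it takes the total memory in bytes rather than pages,
and it assumes nb_ns is at least 1. The constants are the ones from
include/linux/msg.h and include/linux/ipc.h.

```c
#include <stdint.h>

#define MSGMNI        16     /* historical default, kept as the floor */
#define MSGMNB        16384  /* default max bytes per queue */
#define IPCMNI        32768  /* upper bound on ipc identifiers */
#define MSG_MEM_SCALE 32     /* queues may use 1/32 of memory */

/*
 * Hypothetical userspace helper (not kernel code): approximates the
 * arithmetic of the two patches. total_bytes is total RAM in bytes;
 * nb_ns is the number of ipc namespaces and must be at least 1.
 */
static unsigned long scaled_msgmni(uint64_t total_bytes, unsigned int nb_ns)
{
	uint64_t allowed = total_bytes / MSG_MEM_SCALE / MSGMNB;
	uint64_t ceiling = IPCMNI / nb_ns;

	allowed /= nb_ns;
	if (allowed < MSGMNI)
		allowed = MSGMNI;	/* floor at the old default first... */
	if (allowed > ceiling)
		allowed = ceiling;	/* ...then apply the per-namespace cap */
	return (unsigned long)allowed;
}
```

For instance, scaled_msgmni(4ULL << 30, 1) yields 8192, which matches the
"4 GB : msgmni = 8K" line in the patch comment, and the same machine with
4 ipc namespaces gets 2048 per namespace.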

I still have 2 issues that I'll try to solve next:
  . use hotplug_memory_notifier() with a callback routine that would recompute
    msgmni each time memory is brought offline / online. The issue here is
    that I couldn't find a simple way to walk through all the nsproxy
    structures (without walking through the task structures).
  . add a new notification mechanism that would recompute all the msg_ctlmni
    tunables each time an ipc namespace is created / removed.
    

These patches should be applied to 2.6.24-rc4, in the following order:

[PATCH 1/2]: ipc_scale_msgmni_with_totalram.patch
[PATCH 2/2]: ipc_scale_msgmni_with_namespaces.patch

Note: many thanks to Matt Helsley for his help!

Regards,
Nadia

--


* [RFC PATCH 1/2] Scaling msgmni to the system memory
  2007-12-11 15:38 [RFC PATCH 0/2] Change default MSGMNI tunable to scale with system memory Nadia.Derbey
@ 2007-12-11 15:38 ` Nadia.Derbey
  2007-12-19  0:06   ` Andrew Morton
  2007-12-11 15:38 ` [RFC PATCH 2/2] Scaling msgmni to the number of ipc namespaces Nadia.Derbey
  1 sibling, 1 reply; 5+ messages in thread
From: Nadia.Derbey @ 2007-12-11 15:38 UTC
  To: linux-kernel; +Cc: matthltc, Nadia Derbey

[-- Attachment #1: ipc_scale_msgmni_with_totalram.patch --]
[-- Type: text/plain, Size: 3465 bytes --]

[PATCH 01/02]

This patch computes msg_ctlmni to make it scale with system memory.
msg_ctlmni is now set to make the message queues occupy 1/32 of the available
memory.

Some cleaning has also been done in the MSGXXX constants:
  . MSGPOOL: the msgctl man page says it's not used, but it also defines it as
             a size in bytes (the code expresses it in Kbytes).
  . MSGSEG definition has been removed since it is used only once in msgctl().

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/msg.h |    6 +++---
 ipc/msg.c           |   20 ++++++++++++++++++--
 2 files changed, 21 insertions(+), 5 deletions(-)

Index: linux-2.6.24-rc4/include/linux/msg.h
===================================================================
--- linux-2.6.24-rc4.orig/include/linux/msg.h	2007-12-11 11:57:53.000000000 +0100
+++ linux-2.6.24-rc4/include/linux/msg.h	2007-12-11 12:10:01.000000000 +0100
@@ -49,17 +49,17 @@ struct msginfo {
 	unsigned short  msgseg; 
 };
 
+#define MSG_MEM_SCALE 32    /* Scaling factor to compute msgmni */
+
 #define MSGMNI    16   /* <= IPCMNI */     /* max # of msg queue identifiers */
 #define MSGMAX  8192   /* <= INT_MAX */   /* max size of message (bytes) */
 #define MSGMNB 16384   /* <= INT_MAX */   /* default max size of a message queue */
 
 /* unused */
-#define MSGPOOL (MSGMNI*MSGMNB/1024)  /* size in kilobytes of message pool */
+#define MSGPOOL (MSGMNI * MSGMNB) /* size in bytes of message pool */
 #define MSGTQL  MSGMNB            /* number of system message headers */
 #define MSGMAP  MSGMNB            /* number of entries in message map */
 #define MSGSSZ  16                /* message segment size */
-#define __MSGSEG ((MSGPOOL*1024)/ MSGSSZ) /* max no. of segments */
-#define MSGSEG (__MSGSEG <= 0xffff ? __MSGSEG : 0xffff)
 
 #ifdef __KERNEL__
 #include <linux/list.h>
Index: linux-2.6.24-rc4/ipc/msg.c
===================================================================
--- linux-2.6.24-rc4.orig/ipc/msg.c	2007-12-11 11:57:58.000000000 +0100
+++ linux-2.6.24-rc4/ipc/msg.c	2007-12-11 12:12:32.000000000 +0100
@@ -27,6 +27,7 @@
 #include <linux/msg.h>
 #include <linux/spinlock.h>
 #include <linux/init.h>
+#include <linux/mm.h>
 #include <linux/proc_fs.h>
 #include <linux/list.h>
 #include <linux/security.h>
@@ -81,10 +82,25 @@ static int sysvipc_msg_proc_show(struct 
 
 static void __msg_init_ns(struct ipc_namespace *ns, struct ipc_ids *ids)
 {
+	struct sysinfo i;
+	unsigned long allowed;
+
 	ns->ids[IPC_MSG_IDS] = ids;
 	ns->msg_ctlmax = MSGMAX;
 	ns->msg_ctlmnb = MSGMNB;
-	ns->msg_ctlmni = MSGMNI;
+
+	/*
+	 * Scale msgmni with the available memory size: the memory dedicated
+	 * to msg queues should occupy 1/32 of the available memory:
+	 * up to 8MB       : msgmni = 16 (MSGMNI)
+	 * 4 GB            : msgmni = 8K
+	 * more than 16 GB : msgmni = 32K (IPCMNI)
+	 */
+	si_meminfo(&i);
+	allowed = ((i.totalram / MSG_MEM_SCALE) * i.mem_unit) / MSGMNB;
+	ns->msg_ctlmni = min((unsigned long) IPCMNI,
+			max((unsigned long) MSGMNI, allowed));
+
 	atomic_set(&ns->msg_bytes, 0);
 	atomic_set(&ns->msg_hdrs, 0);
 	ipc_init_ids(ids);
@@ -458,7 +474,7 @@ asmlinkage long sys_msgctl(int msqid, in
 		msginfo.msgmax = ns->msg_ctlmax;
 		msginfo.msgmnb = ns->msg_ctlmnb;
 		msginfo.msgssz = MSGSSZ;
-		msginfo.msgseg = MSGSEG;
+		msginfo.msgseg = min(MSGPOOL / MSGSSZ, 0xffff);
 		down_read(&msg_ids(ns).rw_mutex);
 		if (cmd == MSG_INFO) {
 			msginfo.msgpool = msg_ids(ns).in_use;

--


* [RFC PATCH 2/2] Scaling msgmni to the number of ipc namespaces
  2007-12-11 15:38 [RFC PATCH 0/2] Change default MSGMNI tunable to scale with system memory Nadia.Derbey
  2007-12-11 15:38 ` [RFC PATCH 1/2] Scaling msgmni to the " Nadia.Derbey
@ 2007-12-11 15:38 ` Nadia.Derbey
  1 sibling, 0 replies; 5+ messages in thread
From: Nadia.Derbey @ 2007-12-11 15:38 UTC
  To: linux-kernel; +Cc: matthltc, Nadia Derbey

[-- Attachment #1: ipc_scale_msgmni_with_namespaces.patch --]
[-- Type: text/plain, Size: 2757 bytes --]

[PATCH 02/02]

Since all the namespaces see the same amount of memory (the total one)
this patch introduces a new variable that counts the ipc namespaces and divides
msg_ctlmni by this counter.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/ipc.h |    1 +
 ipc/msg.c           |    6 +++++-
 ipc/util.c          |    6 ++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc4/include/linux/ipc.h
===================================================================
--- linux-2.6.24-rc4.orig/include/linux/ipc.h	2007-12-11 11:57:53.000000000 +0100
+++ linux-2.6.24-rc4/include/linux/ipc.h	2007-12-11 12:34:53.000000000 +0100
@@ -121,6 +121,7 @@ struct ipc_namespace {
 };
 
 extern struct ipc_namespace init_ipc_ns;
+extern atomic_t nr_ipc_ns;
 
 #ifdef CONFIG_SYSVIPC
 #define INIT_IPC_NS(ns)		.ns		= &init_ipc_ns,
Index: linux-2.6.24-rc4/ipc/util.c
===================================================================
--- linux-2.6.24-rc4.orig/ipc/util.c	2007-12-11 11:57:58.000000000 +0100
+++ linux-2.6.24-rc4/ipc/util.c	2007-12-11 12:36:32.000000000 +0100
@@ -51,6 +51,8 @@ struct ipc_namespace init_ipc_ns = {
 	},
 };
 
+atomic_t nr_ipc_ns = ATOMIC_INIT(1);
+
 static struct ipc_namespace *clone_ipc_ns(struct ipc_namespace *old_ns)
 {
 	int err;
@@ -61,6 +63,8 @@ static struct ipc_namespace *clone_ipc_n
 	if (ns == NULL)
 		goto err_mem;
 
+	atomic_inc(&nr_ipc_ns);
+
 	err = sem_init_ns(ns);
 	if (err)
 		goto err_sem;
@@ -80,6 +84,7 @@ err_msg:
 	sem_exit_ns(ns);
 err_sem:
 	kfree(ns);
+	atomic_dec(&nr_ipc_ns);
 err_mem:
 	return ERR_PTR(err);
 }
@@ -109,6 +114,7 @@ void free_ipc_ns(struct kref *kref)
 	msg_exit_ns(ns);
 	shm_exit_ns(ns);
 	kfree(ns);
+	atomic_dec(&nr_ipc_ns);
 }
 
 /**
Index: linux-2.6.24-rc4/ipc/msg.c
===================================================================
--- linux-2.6.24-rc4.orig/ipc/msg.c	2007-12-11 12:12:32.000000000 +0100
+++ linux-2.6.24-rc4/ipc/msg.c	2007-12-11 14:20:28.000000000 +0100
@@ -84,6 +84,7 @@ static void __msg_init_ns(struct ipc_nam
 {
 	struct sysinfo i;
 	unsigned long allowed;
+	int nb_ns;
 
 	ns->ids[IPC_MSG_IDS] = ids;
 	ns->msg_ctlmax = MSGMAX;
@@ -95,10 +96,13 @@ static void __msg_init_ns(struct ipc_nam
 	 * up to 8MB       : msgmni = 16 (MSGMNI)
 	 * 4 GB            : msgmni = 8K
 	 * more than 16 GB : msgmni = 32K (IPCMNI)
+	 * Also take into account the number of ipc namespaces created so far.
 	 */
 	si_meminfo(&i);
 	allowed = ((i.totalram / MSG_MEM_SCALE) * i.mem_unit) / MSGMNB;
-	ns->msg_ctlmni = min((unsigned long) IPCMNI,
+	nb_ns = atomic_read(&nr_ipc_ns);
+	allowed /= nb_ns;
+	ns->msg_ctlmni = min((unsigned long) (IPCMNI / nb_ns),
 			max((unsigned long) MSGMNI, allowed));
 
 	atomic_set(&ns->msg_bytes, 0);

--


* Re: [RFC PATCH 1/2] Scaling msgmni to the system memory
  2007-12-11 15:38 ` [RFC PATCH 1/2] Scaling msgmni to the " Nadia.Derbey
@ 2007-12-19  0:06   ` Andrew Morton
  2007-12-19  2:20     ` Matt Helsley
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2007-12-19  0:06 UTC
  To: Nadia.Derbey; +Cc: linux-kernel, matthltc, Nadia.Derbey

On Tue, 11 Dec 2007 16:38:46 +0100
Nadia.Derbey@bull.net wrote:

> [PATCH 01/02]
> 
> This patch computes msg_ctlmni to make it scale with system memory.
> msg_ctlmni is now set to make the message queues occupy 1/32 of the available
> memory.
> 
> Some cleaning has also been done in the MSGXXX constants:
>   . MSGPOOL: the msgctl man page says it's not used, but it also defines it as
>              a size in bytes (the code expresses it in Kbytes).
>   . MSGSEG definition has been removed since it is used only once in msgctl().
> 

The objective seems reasonable.

> 
> ===================================================================
> --- linux-2.6.24-rc4.orig/include/linux/msg.h	2007-12-11 11:57:53.000000000 +0100
> +++ linux-2.6.24-rc4/include/linux/msg.h	2007-12-11 12:10:01.000000000 +0100
> @@ -49,17 +49,17 @@ struct msginfo {
>  	unsigned short  msgseg; 
>  };
>  
> +#define MSG_MEM_SCALE 32    /* Scaling factor to compute msgmni */
> +
>  #define MSGMNI    16   /* <= IPCMNI */     /* max # of msg queue identifiers */
>  #define MSGMAX  8192   /* <= INT_MAX */   /* max size of message (bytes) */
>  #define MSGMNB 16384   /* <= INT_MAX */   /* default max size of a message queue */
>  
>  /* unused */
> -#define MSGPOOL (MSGMNI*MSGMNB/1024)  /* size in kilobytes of message pool */
> +#define MSGPOOL (MSGMNI * MSGMNB) /* size in bytes of message pool */
>  #define MSGTQL  MSGMNB            /* number of system message headers */
>  #define MSGMAP  MSGMNB            /* number of entries in message map */
>  #define MSGSSZ  16                /* message segment size */
> -#define __MSGSEG ((MSGPOOL*1024)/ MSGSSZ) /* max no. of segments */
> -#define MSGSEG (__MSGSEG <= 0xffff ? __MSGSEG : 0xffff)
>  
>  #ifdef __KERNEL__
>  #include <linux/list.h>
> Index: linux-2.6.24-rc4/ipc/msg.c
> ===================================================================
> --- linux-2.6.24-rc4.orig/ipc/msg.c	2007-12-11 11:57:58.000000000 +0100
> +++ linux-2.6.24-rc4/ipc/msg.c	2007-12-11 12:12:32.000000000 +0100
> @@ -27,6 +27,7 @@
>  #include <linux/msg.h>
>  #include <linux/spinlock.h>
>  #include <linux/init.h>
> +#include <linux/mm.h>
>  #include <linux/proc_fs.h>
>  #include <linux/list.h>
>  #include <linux/security.h>
> @@ -81,10 +82,25 @@ static int sysvipc_msg_proc_show(struct 
>  
>  static void __msg_init_ns(struct ipc_namespace *ns, struct ipc_ids *ids)
>  {
> +	struct sysinfo i;
> +	unsigned long allowed;
> +
>  	ns->ids[IPC_MSG_IDS] = ids;
>  	ns->msg_ctlmax = MSGMAX;
>  	ns->msg_ctlmnb = MSGMNB;
> -	ns->msg_ctlmni = MSGMNI;
> +
> +	/*
> +	 * Scale msgmni with the available memory size: the memory dedicated
> +	 * to msg queues should occupy 1/32 of the available memory:
> +	 * up to 8MB       : msgmni = 16 (MSGMNI)
> +	 * 4 GB            : msgmni = 8K
> +	 * more than 16 GB : msgmni = 32K (IPCMNI)
> +	 */
> +	si_meminfo(&i);
> +	allowed = ((i.totalram / MSG_MEM_SCALE) * i.mem_unit) / MSGMNB;
> +	ns->msg_ctlmni = min((unsigned long) IPCMNI,
> +			max((unsigned long) MSGMNI, allowed));

The space after the (typecast) isn't useful IMO.

Please use min_t rather than the open-coded casts.

Even better would be to sort out the types so that neither casts nor min_t
are needed.
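
A minimal userspace sketch of the min_t form, for illustration only. The
min_t()/max_t() macros below are simplified stand-ins for the kernel helpers
of the same name (they evaluate their arguments more than once), and
clamp_msgmni() is a hypothetical name, not something from the patch.

```c
#define MSGMNI 16      /* previous fixed default */
#define IPCMNI 32768   /* upper bound on ipc identifiers */

/*
 * Simplified userspace stand-ins for the kernel's min_t()/max_t().
 * Unlike the kernel versions, these evaluate their arguments twice.
 */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical helper: clamp a computed value into [MSGMNI, IPCMNI]. */
static unsigned long clamp_msgmni(unsigned long allowed)
{
	return min_t(unsigned long, IPCMNI,
		     max_t(unsigned long, MSGMNI, allowed));
}
```

Because the inner max_t() floors the value at MSGMNI before the cap is
applied, the result of this single-namespace form cannot drop below the old
default of 16.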

What about highmem machines?  For those we usually want to scale data
structures according to the amount of direct-addressable memory (ie:
lowmem) rather than according to total physical memory.  I haven't a clue
how this consideration would be addressed when ipc-namespaces is taken into
consideration.
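
One possible way to express the lowmem idea, sketched in userspace under
stated assumptions: struct sysinfo carries totalram, totalhigh and mem_unit,
so lowmem can be taken as totalram - totalhigh. The struct mem_snapshot type
and lowmem_msgmni() below are hypothetical and only mirror those three
fields; this is not what the patch does.

```c
#define MSGMNI        16     /* previous fixed default, used as the floor */
#define MSGMNB        16384  /* default max bytes per queue */
#define IPCMNI        32768  /* upper bound on ipc identifiers */
#define MSG_MEM_SCALE 32     /* queues may use 1/32 of memory */

/* Hypothetical snapshot of the three struct sysinfo fields used here. */
struct mem_snapshot {
	unsigned long totalram;   /* total usable RAM, in mem_unit units */
	unsigned long totalhigh;  /* highmem portion, in mem_unit units */
	unsigned int  mem_unit;   /* unit size in bytes */
};

/* Scale msgmni on lowmem (totalram - totalhigh) instead of total RAM. */
static unsigned long lowmem_msgmni(const struct mem_snapshot *m)
{
	unsigned long long low_bytes =
		(unsigned long long)(m->totalram - m->totalhigh) * m->mem_unit;
	unsigned long long allowed = low_bytes / MSG_MEM_SCALE / MSGMNB;

	if (allowed < MSGMNI)
		allowed = MSGMNI;
	if (allowed > IPCMNI)
		allowed = IPCMNI;
	return (unsigned long)allowed;
}
```

With 4 GB of RAM in 4 KB pages, of which 3200 MB is highmem, this gives
1792 rather than the 8192 obtained from total RAM.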

I'd suggest the addition of a printk telling people what value the kernel
calculated.

We should ensure that the calculated value is never _less_ than what the
kernel was previously giving - to avoid breaking existing things.

It's a bit of a concern that a change like this can cause an application to
work OK on machine A but then fail when it is taken over to (smaller)
machine B.



* Re: [RFC PATCH 1/2] Scaling msgmni to the system memory
  2007-12-19  0:06   ` Andrew Morton
@ 2007-12-19  2:20     ` Matt Helsley
  0 siblings, 0 replies; 5+ messages in thread
From: Matt Helsley @ 2007-12-19  2:20 UTC
  To: Andrew Morton; +Cc: Nadia.Derbey, LKML, CKRM-Tech

On Tue, 2007-12-18 at 16:06 -0800, Andrew Morton wrote:
> On Tue, 11 Dec 2007 16:38:46 +0100
> Nadia.Derbey@bull.net wrote:
> 
> > [PATCH 01/02]
> > 
> > This patch computes msg_ctlmni to make it scale with system memory.
> > msg_ctlmni is now set to make the message queues occupy 1/32 of the available
> > memory.
> > 
> > Some cleaning has also been done in the MSGXXX constants:
> >   . MSGPOOL: the msgctl man page says it's not used, but it also defines it as
> >              a size in bytes (the code expresses it in Kbytes).
> >   . MSGSEG definition has been removed since it is used only once in msgctl().
> > 
> 
> The objective seems reasonable.
> 
> > 
> > ===================================================================
> > --- linux-2.6.24-rc4.orig/include/linux/msg.h	2007-12-11 11:57:53.000000000 +0100
> > +++ linux-2.6.24-rc4/include/linux/msg.h	2007-12-11 12:10:01.000000000 +0100
> > @@ -49,17 +49,17 @@ struct msginfo {
> >  	unsigned short  msgseg; 
> >  };
> >  
> > +#define MSG_MEM_SCALE 32    /* Scaling factor to compute msgmni */
> > +
> >  #define MSGMNI    16   /* <= IPCMNI */     /* max # of msg queue identifiers */
> >  #define MSGMAX  8192   /* <= INT_MAX */   /* max size of message (bytes) */
> >  #define MSGMNB 16384   /* <= INT_MAX */   /* default max size of a message queue */
> >  
> >  /* unused */
> > -#define MSGPOOL (MSGMNI*MSGMNB/1024)  /* size in kilobytes of message pool */
> > +#define MSGPOOL (MSGMNI * MSGMNB) /* size in bytes of message pool */
> >  #define MSGTQL  MSGMNB            /* number of system message headers */
> >  #define MSGMAP  MSGMNB            /* number of entries in message map */
> >  #define MSGSSZ  16                /* message segment size */
> > -#define __MSGSEG ((MSGPOOL*1024)/ MSGSSZ) /* max no. of segments */
> > -#define MSGSEG (__MSGSEG <= 0xffff ? __MSGSEG : 0xffff)
> >  
> >  #ifdef __KERNEL__
> >  #include <linux/list.h>
> > Index: linux-2.6.24-rc4/ipc/msg.c
> > ===================================================================
> > --- linux-2.6.24-rc4.orig/ipc/msg.c	2007-12-11 11:57:58.000000000 +0100
> > +++ linux-2.6.24-rc4/ipc/msg.c	2007-12-11 12:12:32.000000000 +0100
> > @@ -27,6 +27,7 @@
> >  #include <linux/msg.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/init.h>
> > +#include <linux/mm.h>
> >  #include <linux/proc_fs.h>
> >  #include <linux/list.h>
> >  #include <linux/security.h>
> > @@ -81,10 +82,25 @@ static int sysvipc_msg_proc_show(struct 
> >  
> >  static void __msg_init_ns(struct ipc_namespace *ns, struct ipc_ids *ids)
> >  {
> > +	struct sysinfo i;
> > +	unsigned long allowed;
> > +
> >  	ns->ids[IPC_MSG_IDS] = ids;
> >  	ns->msg_ctlmax = MSGMAX;
> >  	ns->msg_ctlmnb = MSGMNB;
> > -	ns->msg_ctlmni = MSGMNI;
> > +
> > +	/*
> > +	 * Scale msgmni with the available memory size: the memory dedicated
> > +	 * to msg queues should occupy 1/32 of the available memory:
> > +	 * up to 8MB       : msgmni = 16 (MSGMNI)
> > +	 * 4 GB            : msgmni = 8K
> > +	 * more than 16 GB : msgmni = 32K (IPCMNI)
> > +	 */
> > +	si_meminfo(&i);
> > +	allowed = ((i.totalram / MSG_MEM_SCALE) * i.mem_unit) / MSGMNB;
> > +	ns->msg_ctlmni = min((unsigned long) IPCMNI,
> > +			max((unsigned long) MSGMNI, allowed));
> 
> The space after the (typecast) isn't useful IMO.
> 
> Please use min_t rather than the open-coded casts.
> 
> Even better would be to sort out the types so that neither casts nor min_t
> are needed.
> 
> What about highmem machines?  For those we usually want to scale data
> structures according to the amount of direct-addressable memory (ie:
> lowmem) rather than according to total physical memory.  I haven't a clue

An excellent point.

> how this consideration would be addressed when ipc-namespaces is taken into
> consideration.

Nadia's second patch divides totalram by the number of IPC namespaces.
That would also need to be changed in response to your point about
highmem machines.

It might be reasonable to have new namespaces simply copy the old
namespace's msgmni rather than recalculate msgmni. Or keep a flag and
only recalculate rather than copy if userspace has not explicitly set
msgmni.

If something more complicated seems necessary, perhaps an IPC controller
for control groups could replace that heuristic.
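
To make the copy-or-recalculate idea concrete, here is a small userspace
sketch. Everything in it is hypothetical: struct ns_limits stands in for the
relevant part of struct ipc_namespace, msgmni_user_set is the proposed flag,
and the recalculated value is passed in by the caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the msgmni-related state of an ipc namespace. */
struct ns_limits {
	unsigned long msgmni;
	bool msgmni_user_set;	/* true once userspace has written msgmni */
};

/*
 * Hypothetical helper: a new namespace copies the parent's msgmni if
 * userspace set it explicitly, and otherwise takes the freshly
 * recalculated value supplied by the caller.
 */
static void inherit_msgmni(struct ns_limits *child,
			   const struct ns_limits *parent,
			   unsigned long recalculated)
{
	child->msgmni_user_set = parent->msgmni_user_set;
	child->msgmni = parent->msgmni_user_set ? parent->msgmni
						 : recalculated;
}
```

The flag travels with the copied value, so a namespace cloned from the
child would keep the administrator's setting as well.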

> I'd suggest the addition of a printk telling people what value the kernel
> calculated.
> 
> We should ensure that the calculated value is never _less_ than what the
> kernel was previously giving - to avoid breaking existing things.

Agreed. I believe Nadia's "min(IPCMNI, max(MSGMNI,...))" line should
take care of this since IPCMNI (32768) > MSGMNI and MSGMNI (16) is what
the kernel was previously giving.

> It's a bit of a concern that a change like this can cause an application to
> work OK on machine A but then fail when it is taken over to (smaller)
> machine B.

Yes, it's a bit of a concern. I think we can do something when ipc
namespaces and/or control groups are involved. In those cases I think
these patches are no worse than what we have currently and do not
preclude future patches from making further improvements. Finally, in
the simple case of different machines, what can anyone reasonably do for
a user who is unprepared for failures when moving an application to a
smaller machine?

Cheers,
	-Matt Helsley

