* [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus
@ 2008-06-06 6:09 Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 1/6] sysv ipc: scale msgmnb to " Solofo.Ramangalahy
` (7 more replies)
0 siblings, 8 replies; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 6:09 UTC
To: linux-kernel
The size in bytes of a SysV IPC message queue, msgmnb, is too small
for large machines, but we don't want to bloat small machines.
Several methods are already used to modify (mainly increase) msgmnb:
. distribution specific patch
. system wide sysctl.conf
. application specific tuning via /proc/sys/kernel/msgmnb
Integrating this series would:
. reflect the evolution and diversity of hardware and software,
. reduce configuration/tuning for the applications.
Here is the timeline of the evolution of MSG* #defines:
Year 1994 1999 1999 2008
Version 1.0 2.3.27 2.3.30 2.6.24
#define MSGMNI 128 128 16 16
#define MSGMAX 4056 8192 8192 8192
#define MSGMNB 16384 16384 16384 16384
This patch series scales msgmnb with the number of cpus/cores on
larger machines. On uniprocessor machines the value does not
increase.
This series is similar to (and depends on) the series which scales
msgmni, the number of IPC message queue identifiers, to the amount of
low memory.
Nadia's previous series scaled msgmni, and hence the message pool
(msgmni x msgmnb), along the memory axis; this series adds a second
axis: the number of online CPUs.
Besides covering the (cpu, memory) space of machine sizes, this
reflects the parallelism allowed by lockless send/receive of
in-flight messages in a queue (msgmnb / msgmax messages).
The initial scaling is done when the ipc namespace is initialized.
After that, the value is kept up to date on cpu hotplug.
The msgmni and msgmnb values become dependent, as msgmni is
computed from the value of msgmnb.
The series is as follows:
. patch 1 introduces the scaling function
. patch 2 deals with cpu hotplug
. patch 3 allows user space to disable the scaling mechanism
. patch 4 allows user space to re-enable the scaling mechanism
. patch 5 makes disabling/re-enabling of the scaling mechanism finer
  grained (decouples msgmnb and msgmni)
. patch 6 adds documentation
---
The series applies to 2.6.26-rc2-mm1 plus the patch suppressing KERN_INFO
messages discussed at:
http://article.gmane.org/gmane.linux.kernel/686229
"[PATCH 1/1] Only output msgmni value at boot time"
(in mmotm: ipc-only-output-msgmni-value-at-boot-time.patch)
The plan would be to have this ready for the 2.6.27 merge window if
there are no objections.
Documentation/sysctl/kernel.txt | 27 ++++++++++++++++++++++
include/linux/ipc_namespace.h | 4 ++-
include/linux/msg.h | 5 ++++
ipc/ipc_sysctl.c | 48 ++++++++++++++++++++++++++++++----------
ipc/ipcns_notifier.c | 23 +++++++------------
ipc/msg.c | 25 +++++++++++++++++---
ipc/util.c | 28 +++++++++++++++++++++++
ipc/util.h | 1
8 files changed, 131 insertions(+), 30 deletions(-)
--
Solofo Ramangalahy
Bull SA.
* [RFC -mm 1/6] sysv ipc: scale msgmnb to the number of cpus
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
@ 2008-06-06 6:09 ` Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 2/6] sysv ipc: recompute msgmnb (and msgmni) on cpu hotplug addition and removal Solofo.Ramangalahy
` (6 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 6:09 UTC
To: linux-kernel; +Cc: Solofo Ramangalahy
[-- Attachment #1: ipc-scale-msgmnb-with-the-number-of-cpus.patch --]
[-- Type: text/plain, Size: 2893 bytes --]
From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Initialize msgmnb to
min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE)
to increase the default value on larger machines.
The MSG_CPU_SCALE scaling factor is defined as 4, since
16384 x 4 = 65536 is a value that is already used and recommended.
msgmni is made dependent on msgmnb, to keep the memory dedicated to
message queues within the bound of 1/MSG_MEM_SCALE of lowmem.
For simplicity, and unlike msgmni, msgmnb is not scaled down with
the number of ipc namespaces.
Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
---
include/linux/msg.h | 5 +++++
ipc/msg.c | 19 +++++++++++++++----
2 files changed, 20 insertions(+), 4 deletions(-)
Index: b/ipc/msg.c
===================================================================
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -38,6 +38,7 @@
#include <linux/rwsem.h>
#include <linux/nsproxy.h>
#include <linux/ipc_namespace.h>
+#include <linux/cpumask.h>
#include <asm/current.h>
#include <asm/uaccess.h>
@@ -92,7 +93,7 @@ void recompute_msgmni(struct ipc_namespa
si_meminfo(&i);
allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
- / MSGMNB;
+ / ns->msg_ctlmnb;
nb_ns = atomic_read(&nr_ipc_ns);
allowed /= nb_ns;
@@ -108,11 +109,21 @@ void recompute_msgmni(struct ipc_namespa
ns->msg_ctlmni = allowed;
}
+/*
+ * Scale msgmnb with the number of online cpus, up to 4x MSGMNB.
+ * ns->msg_ctlmnb cannot be assigned zero because of division in
+ * recompute_msgmni()
+ */
+void recompute_msgmnb(struct ipc_namespace *ns)
+{
+ ns->msg_ctlmnb =
+ min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE);
+}
void msg_init_ns(struct ipc_namespace *ns)
{
ns->msg_ctlmax = MSGMAX;
- ns->msg_ctlmnb = MSGMNB;
+ recompute_msgmnb(ns);
recompute_msgmni(ns);
@@ -132,8 +143,8 @@ void __init msg_init(void)
{
msg_init_ns(&init_ipc_ns);
- printk(KERN_INFO "msgmni has been set to %d\n",
- init_ipc_ns.msg_ctlmni);
+ printk(KERN_INFO "msgmni has been set to %d, msgmnb to %d\n",
+ init_ipc_ns.msg_ctlmni, init_ipc_ns.msg_ctlmnb);
ipc_init_proc_interface("sysvipc/msg",
" key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n",
Index: b/include/linux/msg.h
===================================================================
--- a/include/linux/msg.h
+++ b/include/linux/msg.h
@@ -58,6 +58,11 @@ struct msginfo {
* more than 16 GB : msgmni = 32K (IPCMNI)
*/
#define MSG_MEM_SCALE 32
+/*
+ * Scaling factor to compute msgmnb:
+ * ns->msg_ctlmnb is between MSGMNB and MSGMNB * MSG_CPU_SCALE
+ */
+#define MSG_CPU_SCALE 4
#define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */
#define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */
--
Solofo Ramangalahy
Bull SA.
* [RFC -mm 2/6] sysv ipc: recompute msgmnb (and msgmni) on cpu hotplug addition and removal
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 1/6] sysv ipc: scale msgmnb to " Solofo.Ramangalahy
@ 2008-06-06 6:09 ` Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 3/6] sysv ipc: do not recompute msgmni anymore if explicitely set by user Solofo.Ramangalahy
` (5 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 6:09 UTC
To: linux-kernel; +Cc: Solofo Ramangalahy
[-- Attachment #1: ipc-recompute-msgmnb-and-msgmni-on-cpu-hotplug-addition-removal.patch --]
[-- Type: text/plain, Size: 3798 bytes --]
From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
As msgmnb is scaled with the number of online cpus, cpu hotplug events
should grow or shrink the value.
Just as ipc_memory_callback() does for msgmni, the new
ipc_cpu_callback() function triggers msgmnb recomputation.
Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
---
include/linux/ipc_namespace.h | 1 +
ipc/ipcns_notifier.c | 14 +++++++++-----
ipc/util.c | 28 ++++++++++++++++++++++++++++
ipc/util.h | 1 +
4 files changed, 39 insertions(+), 5 deletions(-)
Index: b/include/linux/ipc_namespace.h
===================================================================
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -12,6 +12,7 @@
#define IPCNS_MEMCHANGED 0x00000001 /* Notify lowmem size changed */
#define IPCNS_CREATED 0x00000002 /* Notify new ipc namespace created */
#define IPCNS_REMOVED 0x00000003 /* Notify ipc namespace removed */
+#define IPCNS_CPUCHANGED 0x00000004 /* Notify cpu hotplug addition/removal */
#define IPCNS_CALLBACK_PRI 0
Index: b/ipc/util.c
===================================================================
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -34,6 +34,7 @@
#include <linux/nsproxy.h>
#include <linux/rwsem.h>
#include <linux/memory.h>
+#include <linux/cpu.h>
#include <linux/ipc_namespace.h>
#include <asm/unistd.h>
@@ -96,6 +97,32 @@ static int ipc_memory_callback(struct no
#endif /* CONFIG_MEMORY_HOTPLUG */
+#ifdef CONFIG_HOTPLUG_CPU
+
+static void ipc_cpu_notifier(struct work_struct *work)
+{
+ ipcns_notify(IPCNS_CPUCHANGED);
+}
+
+static DECLARE_WORK(ipc_cpu_wq, ipc_cpu_notifier);
+
+static int __cpuinit ipc_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ schedule_work(&ipc_cpu_wq);
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+#endif /* CONFIG_HOTPLUG_CPU */
/**
* ipc_init - initialise IPC subsystem
*
@@ -112,6 +139,7 @@ static int __init ipc_init(void)
msg_init();
shm_init();
hotplug_memory_notifier(ipc_memory_callback, IPC_CALLBACK_PRI);
+ hotcpu_notifier(ipc_cpu_callback, IPC_CALLBACK_PRI);
register_ipcns_notifier(&init_ipc_ns);
return 0;
}
Index: b/ipc/ipcns_notifier.c
===================================================================
--- a/ipc/ipcns_notifier.c
+++ b/ipc/ipcns_notifier.c
@@ -26,16 +26,20 @@ static int ipcns_callback(struct notifie
unsigned long action, void *arg)
{
struct ipc_namespace *ns;
-
+ ns = container_of(self, struct ipc_namespace, ipcns_nb);
switch (action) {
+ case IPCNS_CPUCHANGED:
+ /*
+ * Fall through.
+ * We do not scale msgmnb with IPC namespace
+ * add/remove for simplicity (adjustment of the
+ * message pool is done indirectly via msgmni).
+ */
+ recompute_msgmnb(ns);
case IPCNS_MEMCHANGED: /* amount of lowmem has changed */
case IPCNS_CREATED:
case IPCNS_REMOVED:
/*
- * It's time to recompute msgmni
- */
- ns = container_of(self, struct ipc_namespace, ipcns_nb);
- /*
* No need to get a reference on the ns: the 1st job of
* free_ipc_ns() is to unregister the callback routine.
* blocking_notifier_chain_unregister takes the wr lock to do
Index: b/ipc/util.h
===================================================================
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -122,6 +122,7 @@ extern struct msg_msg *load_msg(const vo
extern int store_msg(void __user *dest, struct msg_msg *msg, int len);
extern void recompute_msgmni(struct ipc_namespace *);
+extern void recompute_msgmnb(struct ipc_namespace *);
static inline int ipc_buildid(int id, int seq)
{
--
Solofo Ramangalahy
Bull SA.
* [RFC -mm 3/6] sysv ipc: do not recompute msgmni anymore if explicitely set by user
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 1/6] sysv ipc: scale msgmnb to " Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 2/6] sysv ipc: recompute msgmnb (and msgmni) on cpu hotplug addition and removal Solofo.Ramangalahy
@ 2008-06-06 6:09 ` Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 4/6] sysv ipc: re-enable msgmnb automatic recomputing if set to negative Solofo.Ramangalahy
` (4 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 6:09 UTC
To: linux-kernel; +Cc: Solofo Ramangalahy
[-- Attachment #1: ipc-do-not-recompute-msgmnb-anymore-if-explicitely-set-by-user.patch --]
[-- Type: text/plain, Size: 995 bytes --]
From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
To disable recomputation when the user explicitly sets a value,
reuse for msgmnb the sysctl handlers already defined for msgmni.
As msgmni and msgmnb are correlated, a user setting of either of the
two disables recomputation of both, for now. This will be refined in
a later patch.
Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
---
ipc/ipc_sysctl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: b/ipc/ipc_sysctl.c
===================================================================
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -210,8 +210,8 @@ static struct ctl_table ipc_kern_table[]
.data = &init_ipc_ns.msg_ctlmnb,
.maxlen = sizeof (init_ipc_ns.msg_ctlmnb),
.mode = 0644,
- .proc_handler = proc_ipc_dointvec,
- .strategy = sysctl_ipc_data,
+ .proc_handler = proc_ipc_callback_dointvec,
+ .strategy = sysctl_ipc_registered_data,
},
{
.ctl_name = KERN_SEM,
--
Solofo Ramangalahy
Bull SA.
* [RFC -mm 4/6] sysv ipc: re-enable msgmnb automatic recomputing if set to negative
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
` (2 preceding siblings ...)
2008-06-06 6:09 ` [RFC -mm 3/6] sysv ipc: do not recompute msgmni anymore if explicitely set by user Solofo.Ramangalahy
@ 2008-06-06 6:09 ` Solofo.Ramangalahy
2008-06-06 6:10 ` [RFC -mm 5/6] sysv ipc: deconnect msgmnb and msgmni deactivation and reactivation Solofo.Ramangalahy
` (3 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 6:09 UTC
To: linux-kernel; +Cc: Solofo Ramangalahy
[-- Attachment #1: ipc-re-enable-msgmnb-automatic-recomputing-msgmnb-if-set-to-negative.patch --]
[-- Type: text/plain, Size: 749 bytes --]
From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
When a negative value is written to /proc/sys/kernel/msgmnb,
automatic recomputation is re-enabled.
Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
---
ipc/ipc_sysctl.c | 1 +
1 file changed, 1 insertion(+)
Index: b/ipc/ipc_sysctl.c
===================================================================
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -42,6 +42,7 @@ static void tunable_set_callback(int val
* Re-enable automatic recomputing only if not already
* enabled.
*/
+ recompute_msgmnb(current->nsproxy->ipc_ns);
recompute_msgmni(current->nsproxy->ipc_ns);
cond_register_ipcns_notifier(current->nsproxy->ipc_ns);
}
--
Solofo Ramangalahy
Bull SA.
* [RFC -mm 5/6] sysv ipc: deconnect msgmnb and msgmni deactivation and reactivation
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
` (3 preceding siblings ...)
2008-06-06 6:09 ` [RFC -mm 4/6] sysv ipc: re-enable msgmnb automatic recomputing if set to negative Solofo.Ramangalahy
@ 2008-06-06 6:10 ` Solofo.Ramangalahy
2008-06-10 7:05 ` Nadia Derbey
2008-06-06 6:10 ` [RFC -mm 6/6] sysv ipc: documentation for msgmnb scaling wrt. cpus Solofo.Ramangalahy
` (2 subsequent siblings)
7 siblings, 1 reply; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 6:10 UTC
To: linux-kernel; +Cc: Solofo Ramangalahy
[-- Attachment #1: ipc-deconnect-msgmni-msgmnb-deactivation-reactivation.patch --]
[-- Type: text/plain, Size: 5440 bytes --]
From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Deactivation and reactivation of the automatic computation are
currently coupled for msgmnb and msgmni.
Decoupling msgmn{b,i} for deactivation/reactivation of the
recomputation adds flexibility and testability.
. Flexibility was discussed during the development of the msgmni
  series, which ended up with reactivation by writing a negative
  value to /proc.
. Testability makes it possible to experiment with the automatic
  computation of msgmn{b,i} values, for example if the current
  algorithm does not fit application needs.
Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
---
include/linux/ipc_namespace.h | 3 +-
ipc/ipc_sysctl.c | 45 ++++++++++++++++++++++++++++++++----------
ipc/ipcns_notifier.c | 9 --------
ipc/msg.c | 6 +++++
4 files changed, 43 insertions(+), 20 deletions(-)
Index: b/include/linux/ipc_namespace.h
===================================================================
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -34,7 +34,9 @@ struct ipc_namespace {
int msg_ctlmax;
int msg_ctlmnb;
+ bool msg_ctlmnb_activated; /* recompute_msgmnb activation */
int msg_ctlmni;
+ bool msg_ctlmni_activated; /* recompute_msgmni activation */
atomic_t msg_bytes;
atomic_t msg_hdrs;
@@ -53,7 +55,6 @@ extern atomic_t nr_ipc_ns;
#define INIT_IPC_NS(ns) .ns = &init_ipc_ns,
extern int register_ipcns_notifier(struct ipc_namespace *);
-extern int cond_register_ipcns_notifier(struct ipc_namespace *);
extern int unregister_ipcns_notifier(struct ipc_namespace *);
extern int ipcns_notify(unsigned long);
Index: b/ipc/msg.c
===================================================================
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -91,6 +91,8 @@ void recompute_msgmni(struct ipc_namespa
unsigned long allowed;
int nb_ns;
+ if (!ns->msg_ctlmni_activated)
+ return;
si_meminfo(&i);
allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
/ ns->msg_ctlmnb;
@@ -116,6 +118,8 @@ void recompute_msgmni(struct ipc_namespa
*/
void recompute_msgmnb(struct ipc_namespace *ns)
{
+ if (!ns->msg_ctlmnb_activated)
+ return;
ns->msg_ctlmnb =
min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE);
}
@@ -123,6 +127,8 @@ void recompute_msgmnb(struct ipc_namespa
void msg_init_ns(struct ipc_namespace *ns)
{
ns->msg_ctlmax = MSGMAX;
+ ns->msg_ctlmnb_activated = true;
+ ns->msg_ctlmni_activated = true;
recompute_msgmnb(ns);
recompute_msgmni(ns);
Index: b/ipc/ipc_sysctl.c
===================================================================
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -33,18 +33,42 @@ static void *get_ipc(ctl_table *table)
* add/remove or ipc namespace creation/removal.
* They can come back to a recomputable state by being set to a <0 value.
*/
-static void tunable_set_callback(int val)
+static void tunable_set_callback(int val, ctl_table *table)
{
- if (val >= 0)
- unregister_ipcns_notifier(current->nsproxy->ipc_ns);
- else {
+ int tunable = table->ctl_name;
+
+ if (val >= 0) {
+ switch (tunable) {
+ case KERN_MSGMNB:
+ current->nsproxy->ipc_ns->msg_ctlmnb_activated = false;
+ break;
+ case KERN_MSGMNI:
+ current->nsproxy->ipc_ns->msg_ctlmni_activated = false;
+ break;
+ default:
+ printk(KERN_ERR "ipc: unexpected value %s\n",
+ table->procname);
+ break;
+ }
+ } else {
/*
* Re-enable automatic recomputing only if not already
* enabled.
*/
- recompute_msgmnb(current->nsproxy->ipc_ns);
- recompute_msgmni(current->nsproxy->ipc_ns);
- cond_register_ipcns_notifier(current->nsproxy->ipc_ns);
+ switch (tunable) {
+ case KERN_MSGMNB:
+ current->nsproxy->ipc_ns->msg_ctlmnb_activated = true;
+ recompute_msgmnb(current->nsproxy->ipc_ns);
+ /* fall through */
+ case KERN_MSGMNI:
+ current->nsproxy->ipc_ns->msg_ctlmni_activated = true;
+ recompute_msgmni(current->nsproxy->ipc_ns);
+ break;
+ default:
+ printk(KERN_ERR "ipc: unexpected value %s\n",
+ table->procname);
+ break;
+ }
}
}
@@ -72,7 +96,8 @@ static int proc_ipc_callback_dointvec(ct
rc = proc_dointvec(&ipc_table, write, filp, buffer, lenp, ppos);
if (write && !rc && lenp_bef == *lenp)
- tunable_set_callback(*((int *)(ipc_table.data)));
+ BUG_ON(table == NULL);
+ tunable_set_callback(*((int *)(ipc_table.data)), table);
return rc;
}
@@ -148,8 +173,8 @@ static int sysctl_ipc_registered_data(ct
* Tunable has successfully been changed from userland
*/
int *data = get_ipc(table);
-
- tunable_set_callback(*data);
+ BUG_ON(table == NULL);
+ tunable_set_callback(*data, table);
}
return rc;
Index: b/ipc/ipcns_notifier.c
===================================================================
--- a/ipc/ipcns_notifier.c
+++ b/ipc/ipcns_notifier.c
@@ -65,15 +65,6 @@ int register_ipcns_notifier(struct ipc_n
return blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb);
}
-int cond_register_ipcns_notifier(struct ipc_namespace *ns)
-{
- memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
- ns->ipcns_nb.notifier_call = ipcns_callback;
- ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
- return blocking_notifier_chain_cond_register(&ipcns_chain,
- &ns->ipcns_nb);
-}
-
int unregister_ipcns_notifier(struct ipc_namespace *ns)
{
return blocking_notifier_chain_unregister(&ipcns_chain,
--
Solofo Ramangalahy
Bull SA.
* [RFC -mm 6/6] sysv ipc: documentation for msgmnb scaling wrt. cpus
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
` (4 preceding siblings ...)
2008-06-06 6:10 ` [RFC -mm 5/6] sysv ipc: deconnect msgmnb and msgmni deactivation and reactivation Solofo.Ramangalahy
@ 2008-06-06 6:10 ` Solofo.Ramangalahy
2008-06-06 8:23 ` [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Nick Piggin
2008-06-10 6:56 ` Nadia Derbey
7 siblings, 0 replies; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 6:10 UTC
To: linux-kernel; +Cc: Solofo Ramangalahy
[-- Attachment #1: ipc-documentation-scale-msgmnb-with-the-number-of-cpus.patch --]
[-- Type: text/plain, Size: 1592 bytes --]
From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Add documentation explaining how to disable and re-enable the
automatic computation mechanism.
Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
---
Documentation/sysctl/kernel.txt | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
Index: b/Documentation/sysctl/kernel.txt
===================================================================
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -179,6 +179,33 @@ kernel stack.
==============================================================
+msgmnb
+
+Maximum size in bytes, not in message count, of a single SystemV IPC
+message queue (b stands for bytes).
+
+This value is dynamic and depends on the online cpu count of the
+machine (taking cpu hotplug into account).
+
+Computed values are between MSGMNB and MSGMNB*MSG_CPU_SCALE #define
+constants (currently [16384,65536]).
+
+The exact value is automatically (re)computed, but:
+. If the value is set from user space (via procfs or sysctl())
+ to a non-negative value, automatic recomputation is
+ disabled. This leaves control to user space. E.g.
+
+ # echo 16384 > /proc/sys/kernel/msgmnb
+
+. If the value is set from user space to a negative value, then
+ automatic recomputation is re-enabled. E.g.
+
+ # echo -1 > /proc/sys/kernel/msgmnb
+
+See recompute_msgmnb() in ipc/msg.c for details.
+
+==============================================================
+
osrelease, ostype & version:
# cat osrelease
--
Solofo Ramangalahy
Bull SA.
* Re: [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
` (5 preceding siblings ...)
2008-06-06 6:10 ` [RFC -mm 6/6] sysv ipc: documentation for msgmnb scaling wrt. cpus Solofo.Ramangalahy
@ 2008-06-06 8:23 ` Nick Piggin
2008-06-06 10:20 ` Solofo.Ramangalahy
2008-06-10 6:56 ` Nadia Derbey
7 siblings, 1 reply; 11+ messages in thread
From: Nick Piggin @ 2008-06-06 8:23 UTC
To: Solofo.Ramangalahy; +Cc: linux-kernel
On Friday 06 June 2008 16:09, Solofo.Ramangalahy@bull.net wrote:
> The size in bytes of a SysV IPC message queue, msgmnb, is too small
> for large machines, but we don't want to bloat small machines
What's your evidence for this? Can you provide before / after
performance numbers?
Also, when scaling things like this, it is probably more usual
to use a log scale rather than linear, so that's a thought.
* Re: [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus
2008-06-06 8:23 ` [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Nick Piggin
@ 2008-06-06 10:20 ` Solofo.Ramangalahy
0 siblings, 0 replies; 11+ messages in thread
From: Solofo.Ramangalahy @ 2008-06-06 10:20 UTC
To: Nick Piggin; +Cc: linux-kernel
Hi Nick,
Nick Piggin writes:
> On Friday 06 June 2008 16:09, Solofo.Ramangalahy@bull.net wrote:
> > The size in bytes of a SysV IPC message queue, msgmnb, is too small
> > for large machines, but we don't want to bloat small machines
>
> What's your evidence for this? Can you provide before / after
> performance numbers?
Maybe I was not clear enough that this is not directly about
performance, but rather about changing default values. So "scale" in
the title may be misleading.
The evidence would be that these default values are changed either
by a patch or "manually":
> > Several methods are used already to modify (mainly increase) msgmnb:
> > . distribution specific patch
> > . system wide sysctl.conf
> > . application specific tuning via /proc/sys/kernel/msgmnb
Further "evidence" can be found by googling for "linux msgmnb 65536":
tuning for benchmarks and recommended application configurations
increase the value.
This is just setting default values. Performance test results would
not differ from those obtained by setting the values before running
the tests.
So here:
> > Here is the timeline of the evolution of MSG* #defines:
> > Year 1994 1999 1999 2008
> > Version 1.0 2.3.27 2.3.30 2.6.24
> > #define MSGMNI 128 128 16 16
> > #define MSGMAX 4056 8192 8192 8192
> > #define MSGMNB 16384 16384 16384 16384
I get 65536 instead of 16384 for msgmnb
(1982 instead of 16 for msgmni)
on my 4-cpu/4GB x86_64 machine.
Some results with pmsg, used in recent discussions about performance:
16384/16:
./pmsg 4 10 |grep Total
Total: 9795993
65536/1982:
./pmsg 4 10 |grep Total
Total: 9829590
> Also, when scaling things like this, it is probably more usual to
> use a log scale rather than linear, so that's a thought.
Agreed, in general.
Here, there are only 4 values, so I do not think it is worth using a
log scale.
If different values are desirable (finer grained, bigger, ...), the
formula can easily be refined:
min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE);
For the formula, I simply took the old value, a known modified value
and the intermediate values.
I hope this answers your questions,
--
solofo
* Re: [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
` (6 preceding siblings ...)
2008-06-06 8:23 ` [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Nick Piggin
@ 2008-06-10 6:56 ` Nadia Derbey
7 siblings, 0 replies; 11+ messages in thread
From: Nadia Derbey @ 2008-06-10 6:56 UTC
To: Solofo.Ramangalahy; +Cc: linux-kernel
Solofo.Ramangalahy@bull.net wrote:
> The size in bytes of a SysV IPC message queue, msgmnb, is too small
> for large machines, but we don't want to bloat small machines
>
> [...]
>
> The series is as follows:
> . patch 1 introduces the scaling function
> . patch 2 deals with cpu hotplug
> . patch 3 allows user space to disable the scaling mechanism
> . patch 4 allows user space to reenable the scaling mechanism
> . patch 5 finer grain disabling/reenabling scaling mechanism
> (disconnect msgmnb and msgmni)
> . patch 6 adds documentation
>
Solofo,
Patches 3 and 4 are useless imho. If you really really want to keep
them, you should at least merge them.
Regards,
Nadia
* Re: [RFC -mm 5/6] sysv ipc: deconnect msgmnb and msgmni deactivation and reactivation
2008-06-06 6:10 ` [RFC -mm 5/6] sysv ipc: deconnect msgmnb and msgmni deactivation and reactivation Solofo.Ramangalahy
@ 2008-06-10 7:05 ` Nadia Derbey
0 siblings, 0 replies; 11+ messages in thread
From: Nadia Derbey @ 2008-06-10 7:05 UTC
To: Solofo.Ramangalahy; +Cc: linux-kernel
Solofo.Ramangalahy@bull.net wrote:
> From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
>
> The msgmnb and msgmni values are coupled for deactivation and
> reactivation of value computation.
>
> The uncoupling of msgmn{b,i} for deactivation/reactivation of
> recomputation adds flexibility and testability.
>
> . Flexibility was discussed during the msgmni series development and
> ended up with reactivation by writing a negative value to /proc.
>
> . Testability makes it possible to experiment with the automatic
> computation of msgmn{b,i} values, for example if the current algorithm
> does not fit application needs.
>
>
> Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
>
> ---
> include/linux/ipc_namespace.h | 3 +-
> ipc/ipc_sysctl.c | 45 ++++++++++++++++++++++++++++++++----------
> ipc/ipcns_notifier.c | 9 --------
> ipc/msg.c | 6 +++++
> 4 files changed, 43 insertions(+), 20 deletions(-)
>
> Index: b/include/linux/ipc_namespace.h
> ===================================================================
> --- a/include/linux/ipc_namespace.h
> +++ b/include/linux/ipc_namespace.h
> @@ -34,7 +34,9 @@ struct ipc_namespace {
>
> int msg_ctlmax;
> int msg_ctlmnb;
> + bool msg_ctlmnb_activated; /* recompute_msgmnb activation */
> int msg_ctlmni;
> + bool msg_ctlmni_activated; /* recompute_msgmni activation */
> atomic_t msg_bytes;
> atomic_t msg_hdrs;
>
> @@ -53,7 +55,6 @@ extern atomic_t nr_ipc_ns;
> #define INIT_IPC_NS(ns) .ns = &init_ipc_ns,
>
> extern int register_ipcns_notifier(struct ipc_namespace *);
> -extern int cond_register_ipcns_notifier(struct ipc_namespace *);
> extern int unregister_ipcns_notifier(struct ipc_namespace *);
> extern int ipcns_notify(unsigned long);
>
> Index: b/ipc/msg.c
> ===================================================================
> --- a/ipc/msg.c
> +++ b/ipc/msg.c
> @@ -91,6 +91,8 @@ void recompute_msgmni(struct ipc_namespa
> unsigned long allowed;
> int nb_ns;
>
> + if (!ns->msg_ctlmni_activated)
> + return;
> si_meminfo(&i);
> allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
> / ns->msg_ctlmnb;
> @@ -116,6 +118,8 @@ void recompute_msgmni(struct ipc_namespa
> */
> void recompute_msgmnb(struct ipc_namespace *ns)
> {
> + if (!ns->msg_ctlmnb_activated)
> + return;
> ns->msg_ctlmnb =
> min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE);
> }
> @@ -123,6 +127,8 @@ void recompute_msgmnb(struct ipc_namespa
> void msg_init_ns(struct ipc_namespace *ns)
> {
> ns->msg_ctlmax = MSGMAX;
> + ns->msg_ctlmnb_activated = true;
> + ns->msg_ctlmni_activated = true;
> recompute_msgmnb(ns);
>
> recompute_msgmni(ns);
> Index: b/ipc/ipc_sysctl.c
> ===================================================================
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -33,18 +33,42 @@ static void *get_ipc(ctl_table *table)
> * add/remove or ipc namespace creation/removal.
> * They can come back to a recomputable state by being set to a <0 value.
> */
> -static void tunable_set_callback(int val)
> +static void tunable_set_callback(int val, ctl_table *table)
> {
> - if (val >= 0)
> - unregister_ipcns_notifier(current->nsproxy->ipc_ns);
> - else {
> + int tunable = table->ctl_name;
> +
> + if (val >= 0) {
> + switch (tunable) {
> + case KERN_MSGMNB:
> + current->nsproxy->ipc_ns->msg_ctlmnb_activated = false;
> + break;
> + case KERN_MSGMNI:
> + current->nsproxy->ipc_ns->msg_ctlmni_activated = false;
> + break;
> + default:
> + printk(KERN_ERR "ipc: unexpected value %s\n",
> + table->procname);
> + break;
> + }
> + } else {
> /*
> * Re-enable automatic recomputing only if not already
> * enabled.
> */
> - recompute_msgmnb(current->nsproxy->ipc_ns);
> - recompute_msgmni(current->nsproxy->ipc_ns);
> - cond_register_ipcns_notifier(current->nsproxy->ipc_ns);
> + switch (tunable) {
> + case KERN_MSGMNB:
> + current->nsproxy->ipc_ns->msg_ctlmnb_activated = true;
> + recompute_msgmnb(current->nsproxy->ipc_ns);
> + /* fall through */
You shouldn't be falling through here: if you do that, re-enabling msgmnb
will re-enable msgmni too.
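Dropping the fall-through keeps each activation flag under the control of its own sysctl. A user-space sketch of that reading follows, assuming (as one possible design, not the author's code) that re-enabling msgmnb still refreshes msgmni, because msgmni is computed from msgmnb, but only through the recompute routine's own flag check and without touching `msg_ctlmni_activated`. The struct, enum and function names are hypothetical stand-ins for `ipc_namespace`, `KERN_MSGMNB`/`KERN_MSGMNI` and `tunable_set_callback()`. The hypothetical `on_write()` also keeps the sanity check and the callback inside one braced block: in the `proc_ipc_callback_dointvec()` hunk quoted below, `BUG_ON()` is added under the `if` without braces, so only `BUG_ON()` is conditional and `tunable_set_callback()` runs on every call, including reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for KERN_MSGMNB / KERN_MSGMNI. */
enum tunable { TUNABLE_MSGMNB, TUNABLE_MSGMNI };

/* Hypothetical stand-in for the ipc_namespace fields touched by patch 5. */
struct ns_model {
	bool mnb_active;	/* msg_ctlmnb_activated */
	bool mni_active;	/* msg_ctlmni_activated */
	int mnb_recomputed;	/* times msgmnb was actually recomputed */
	int mni_recomputed;	/* times msgmni was actually recomputed */
};

/* Like the patched recompute routines: bail out if deactivated. */
static void recompute_mnb(struct ns_model *ns)
{
	if (!ns->mnb_active)
		return;
	ns->mnb_recomputed++;
}

static void recompute_mni(struct ns_model *ns)
{
	if (!ns->mni_active)
		return;
	ns->mni_recomputed++;
}

/* val >= 0 pins the tunable; val < 0 re-enables automatic recomputation
 * of that tunable only. No case falls through into the other. */
static void tunable_set(struct ns_model *ns, enum tunable t, int val)
{
	bool enable = val < 0;

	switch (t) {
	case TUNABLE_MSGMNB:
		ns->mnb_active = enable;
		if (enable) {
			recompute_mnb(ns);
			/* msgmni depends on msgmnb: refresh it only if it is
			 * still automatic; its flag stays as the user set it. */
			recompute_mni(ns);
		}
		break;
	case TUNABLE_MSGMNI:
		ns->mni_active = enable;
		if (enable)
			recompute_mni(ns);
		break;
	}
}

/* Write path: run the callback only after a successful write, with the
 * sanity check and the call in the same braced block. */
static void on_write(struct ns_model *ns, enum tunable t, int val,
		     bool write, int rc)
{
	if (write && rc == 0) {
		assert(ns != NULL);	/* plays the role of BUG_ON() */
		tunable_set(ns, t, val);
	}
}
```

With this shape, pinning msgmni and then writing -1 to msgmnb leaves msgmni pinned, which is the behaviour the fall-through breaks.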
> + case KERN_MSGMNI:
> + current->nsproxy->ipc_ns->msg_ctlmni_activated = true;
> + recompute_msgmni(current->nsproxy->ipc_ns);
> + break;
> + default:
> + printk(KERN_ERR "ipc: unexpected value %s\n",
> + table->procname);
> + break;
> + }
> }
> }
>
> @@ -72,7 +96,8 @@ static int proc_ipc_callback_dointvec(ct
> rc = proc_dointvec(&ipc_table, write, filp, buffer, lenp, ppos);
>
> if (write && !rc && lenp_bef == *lenp)
> - tunable_set_callback(*((int *)(ipc_table.data)));
> + BUG_ON(table == NULL);
> + tunable_set_callback(*((int *)(ipc_table.data)), table);
>
> return rc;
> }
> @@ -148,8 +173,8 @@ static int sysctl_ipc_registered_data(ct
> * Tunable has successfully been changed from userland
> */
> int *data = get_ipc(table);
> -
> - tunable_set_callback(*data);
> + BUG_ON(table == NULL);
> + tunable_set_callback(*data, table);
> }
>
> return rc;
> Index: b/ipc/ipcns_notifier.c
> ===================================================================
> --- a/ipc/ipcns_notifier.c
> +++ b/ipc/ipcns_notifier.c
> @@ -65,15 +65,6 @@ int register_ipcns_notifier(struct ipc_n
> return blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb);
> }
>
> -int cond_register_ipcns_notifier(struct ipc_namespace *ns)
> -{
> - memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
> - ns->ipcns_nb.notifier_call = ipcns_callback;
> - ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
> - return blocking_notifier_chain_cond_register(&ipcns_chain,
> - &ns->ipcns_nb);
> -}
> -
> int unregister_ipcns_notifier(struct ipc_namespace *ns)
> {
> return blocking_notifier_chain_unregister(&ipcns_chain,
>
Doing this, we are completely losing the benefits of the notification
chains: the notifier blocks remain registered and we are
unconditionally adding a test at the top of each recompute routine. But
the other choice would have been to define another notifier chain
dedicated to msgmnb. I'm not sure which is the best solution.
Regards,
Nadia
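The trade-off described in this reply can be illustrated with a toy model in which an array of namespaces stands in for the ipcns notifier chain. It is a sketch under stated assumptions, not kernel code: `notify_all()`, `total_recomputes()` and `struct ns_model` are hypothetical, and one loop iteration per registered entry stands in for one callback invoked by `blocking_notifier_call_chain()`. With per-namespace flags only (patch 5), every namespace stays registered and is visited even when both of its values are pinned; unregistering namespaces with nothing left to recompute (closer to what patches 3 and 4 did with `unregister_ipcns_notifier()`) performs the same recomputations with fewer callback invocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace state for the model. */
struct ns_model {
	bool registered;	/* notifier block on the chain */
	bool mnb_active;	/* msgmnb still recomputed automatically */
	bool mni_active;	/* msgmni still recomputed automatically */
	int recomputes;		/* recomputations actually performed */
};

/* One event (e.g. CPU hotplug): invoke the callback of every registered
 * namespace. Returns the number of callback invocations. */
static int notify_all(struct ns_model ns[], size_t n)
{
	int calls = 0;

	assert(ns != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!ns[i].registered)
			continue;
		calls++;
		/* The flag tests at the top of each recompute routine. */
		if (ns[i].mnb_active)
			ns[i].recomputes++;
		if (ns[i].mni_active)
			ns[i].recomputes++;
	}
	return calls;
}

/* Total recomputations performed across all namespaces. */
static int total_recomputes(const struct ns_model ns[], size_t n)
{
	int total = 0;

	assert(ns != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += ns[i].recomputes;
	return total;
}
```

In this model both configurations do the same useful work per event; the difference is the number of callbacks that only test a flag and return, which grows with the number of namespaces that have pinned both values.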
Thread overview: 11+ messages
2008-06-06 6:09 [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 1/6] sysv ipc: scale msgmnb to " Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 2/6] sysv ipc: recompute msgmnb (and msgmni) on cpu hotplug addition and removal Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 3/6] sysv ipc: do not recompute msgmni anymore if explicitely set by user Solofo.Ramangalahy
2008-06-06 6:09 ` [RFC -mm 4/6] sysv ipc: re-enable msgmnb automatic recomputing if set to negative Solofo.Ramangalahy
2008-06-06 6:10 ` [RFC -mm 5/6] sysv ipc: deconnect msgmnb and msgmni deactivation and reactivation Solofo.Ramangalahy
2008-06-10 7:05 ` Nadia Derbey
2008-06-06 6:10 ` [RFC -mm 6/6] sysv ipc: documentation for msgmnb scaling wrt. cpus Solofo.Ramangalahy
2008-06-06 8:23 ` [RFC -mm 0/6] sysv ipc: scale msgmnb with the number of cpus Nick Piggin
2008-06-06 10:20 ` Solofo.Ramangalahy
2008-06-10 6:56 ` Nadia Derbey