Netdev List
* [v2 046/115] sysctl: remove .child from pm/ (frv)
From: Lucian Adrian Grijincu @ 2011-05-08 22:38 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 arch/frv/kernel/pm.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/frv/kernel/pm.c b/arch/frv/kernel/pm.c
index 5fa3889..bcef945 100644
--- a/arch/frv/kernel/pm.c
+++ b/arch/frv/kernel/pm.c
@@ -329,13 +329,9 @@ static struct ctl_table pm_table[] =
 	{ }
 };
 
-static struct ctl_table pm_dir_table[] =
+static const __initdata struct ctl_path pm_path[] =
 {
-	{
-		.procname	= "pm",
-		.mode		= 0555,
-		.child		= pm_table,
-	},
+	{ .procname = "pm" },
 	{ }
 };
 
@@ -344,7 +340,7 @@ static struct ctl_table pm_dir_table[] =
  */
 static int __init pm_init(void)
 {
-	register_sysctl_table(pm_dir_table);
+	register_sysctl_paths(pm_path, pm_table);
 	return 0;
 }
 
-- 
1.7.5.134.g1c08b



* [v2 049/115] sysctl: delete unused register_sysctl_table function
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 include/linux/sysctl.h |    3 +--
 kernel/sysctl.c        |   26 ++------------------------
 2 files changed, 3 insertions(+), 26 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 11684d9..470e06a 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -985,7 +985,7 @@ extern int proc_do_large_bitmap(struct ctl_table *, int,
 				void __user *, size_t *, loff_t *);
 
 /*
- * Register a set of sysctl names by calling register_sysctl_table
+ * Register a set of sysctl names by calling __register_sysctl_paths
  * with an initialised array of struct ctl_table's.  An entry with 
  * NULL procname terminates the table.  table->de will be
  * set up by the registration and need not be initialised in advance.
@@ -1065,7 +1065,6 @@ void register_sysctl_root(struct ctl_table_root *root);
 struct ctl_table_header *__register_sysctl_paths(
 	struct ctl_table_root *root, struct nsproxy *namespaces,
 	const struct ctl_path *path, struct ctl_table *table);
-struct ctl_table_header *register_sysctl_table(struct ctl_table * table);
 struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						struct ctl_table *table);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..b813724 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1905,7 +1905,7 @@ struct ctl_table_header *__register_sysctl_paths(
 }
 
 /**
- * register_sysctl_table_path - register a sysctl table hierarchy
+ * register_sysctl_paths - register a sysctl table hierarchy
  * @path: The path to the directory the sysctl table is in.
  * @table: the top-level table structure
  *
@@ -1922,24 +1922,8 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 }
 
 /**
- * register_sysctl_table - register a sysctl table hierarchy
- * @table: the top-level table structure
- *
- * Register a sysctl table hierarchy. @table should be a filled in ctl_table
- * array. A completely 0 filled entry terminates the table.
- *
- * See register_sysctl_paths for more details.
- */
-struct ctl_table_header *register_sysctl_table(struct ctl_table *table)
-{
-	static const struct ctl_path null_path[] = { {} };
-
-	return register_sysctl_paths(null_path, table);
-}
-
-/**
  * unregister_sysctl_table - unregister a sysctl table hierarchy
- * @header: the header returned from register_sysctl_table
+ * @header: the header returned from __register_sysctl_paths
  *
  * Unregisters the sysctl table and all children. proc entries may not
  * actually be removed until they are no longer used by anyone.
@@ -1987,11 +1971,6 @@ void setup_sysctl_set(struct ctl_table_set *p,
 }
 
 #else /* !CONFIG_SYSCTL */
-struct ctl_table_header *register_sysctl_table(struct ctl_table * table)
-{
-	return NULL;
-}
-
 struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						    struct ctl_table *table)
 {
@@ -2977,6 +2956,5 @@ EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
 EXPORT_SYMBOL(proc_dostring);
 EXPORT_SYMBOL(proc_doulongvec_minmax);
 EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
-EXPORT_SYMBOL(register_sysctl_table);
 EXPORT_SYMBOL(register_sysctl_paths);
 EXPORT_SYMBOL(unregister_sysctl_table);
-- 
1.7.5.134.g1c08b



* [v2 050/115] sysctl: remove .child from ax25 table
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Only compile tested!

I'm sorry, but I could not manage to add an ax25 interface.

Some notable changes: before this patch, each time a device went up
or down we would unregister everything under /proc/sys/net/ax25/ and
then reregister an updated table with all devices in it (BTW, the
table was allocated with GFP_ATOMIC!).

Now each device registers its own table (e.g. /proc/sys/net/ax25/ax0/)
when it goes up and unregisters it when it goes down. I'm assuming
ax25 devices cannot be renamed, but if that's possible, this can be
fixed by making a private copy of the device name for sysctl, and
unregistering/reregistering the table on device rename (see
net/ipv4/devinet.c).
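
The private-copy approach mentioned above could look roughly like the
user-space sketch below. Everything in it is an assumption made for
illustration: fake_netdev, fake_sysctl_dir and the fake_*() helpers are
hypothetical stand-ins, not kernel types or functions, and a real fix
would go through register_sysctl_paths()/unregister_sysctl_table().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_IFNAMSIZ 16

/* Hypothetical stand-in for struct net_device: only the name matters here. */
struct fake_netdev {
	char name[FAKE_IFNAMSIZ];
};

/* Hypothetical stand-in for a registered per-device sysctl directory. */
struct fake_sysctl_dir {
	char *procname;	/* private copy, owned by the registration */
};

/* "Register": take a private copy instead of pointing at dev->name. */
static int fake_register(struct fake_sysctl_dir *dir,
			 const struct fake_netdev *dev)
{
	size_t len = strlen(dev->name) + 1;

	dir->procname = malloc(len);
	if (!dir->procname)
		return -1;
	memcpy(dir->procname, dev->name, len);
	return 0;
}

/* "Unregister": drop the directory and free the private copy. */
static void fake_unregister(struct fake_sysctl_dir *dir)
{
	free(dir->procname);
	dir->procname = NULL;
}

/* On rename: unregister under the old name, re-register under the new one. */
static int fake_rename(struct fake_sysctl_dir *dir, struct fake_netdev *dev,
		       const char *newname)
{
	assert(strlen(newname) < FAKE_IFNAMSIZ);
	fake_unregister(dir);
	strcpy(dev->name, newname);
	return fake_register(dir, dev);
}
```

With a private copy, a change to the device name cannot silently alter
the name of an already registered directory; the directory only moves
when the rename path explicitly re-registers it.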

Also added an empty /proc/sys/net/ax25/ root directory. Without it,
the first device added would have been the first to create the
/proc/sys/net/ax25/ sysctl path and all other devices would have
attached to it. If the first device were removed before the other
ones, we would have gotten a harmless warning from sysctl telling us
we're unregistering the parent before the children.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 include/net/ax25.h         |   10 +++---
 net/ax25/af_ax25.c         |   23 ++++++++++++-
 net/ax25/ax25_dev.c        |   10 +-----
 net/ax25/sysctl_net_ax25.c |   76 ++++++++++++++-----------------------------
 4 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/include/net/ax25.h b/include/net/ax25.h
index 206d222..79c2d2d 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -215,7 +215,7 @@ typedef struct ax25_dev {
 	struct ax25_dev		*next;
 	struct net_device	*dev;
 	struct net_device	*forward;
-	struct ctl_table	*systable;
+	struct ctl_table_header	*ax25_sysheader;
 	int			values[AX25_MAX_VALUES];
 #if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER)
 	ax25_dama_info		dama;
@@ -441,11 +441,11 @@ extern void ax25_uid_free(void);
 
 /* sysctl_net_ax25.c */
 #ifdef CONFIG_SYSCTL
-extern void ax25_register_sysctl(void);
-extern void ax25_unregister_sysctl(void);
+extern void ax25_register_sysctl(struct ax25_dev *dev);
+extern void ax25_unregister_sysctl(struct ax25_dev *dev);
 #else
-static inline void ax25_register_sysctl(void) {};
-static inline void ax25_unregister_sysctl(void) {};
+static inline void ax25_register_sysctl(struct ax25_dev *dev) {};
+static inline void ax25_unregister_sysctl(struct ax25_dev *dev) {};
 #endif /* CONFIG_SYSCTL */
 
 #endif
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 6da5dae..965662d 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1989,6 +1989,18 @@ static struct notifier_block ax25_dev_notifier = {
 	.notifier_call =ax25_device_event,
 };
 
+
+#ifdef CONFIG_SYSCTL
+static const __initdata struct ctl_path ax25_path[] = {
+	{ .procname = "net" },
+	{ .procname = "ax25" },
+	{ }
+};
+static struct ctl_table empty;
+static struct ctl_table_header *ax25_root_header;
+#endif /* CONFIG_SYSCTL */
+
+
 static int __init ax25_init(void)
 {
 	int rc = proto_register(&ax25_proto, 0);
@@ -1999,7 +2011,11 @@ static int __init ax25_init(void)
 	sock_register(&ax25_family_ops);
 	dev_add_pack(&ax25_packet_type);
 	register_netdevice_notifier(&ax25_dev_notifier);
-	ax25_register_sysctl();
+
+	/* XXX: no error checking done in initializer */
+	#ifdef CONFIG_SYSCTL
+	ax25_root_header = register_sysctl_paths(ax25_path, &empty);
+	#endif
 
 	proc_net_fops_create(&init_net, "ax25_route", S_IRUGO, &ax25_route_fops);
 	proc_net_fops_create(&init_net, "ax25", S_IRUGO, &ax25_info_fops);
@@ -2024,7 +2040,10 @@ static void __exit ax25_exit(void)
 	ax25_uid_free();
 	ax25_dev_free();
 
-	ax25_unregister_sysctl();
+	#ifdef CONFIG_SYSCTL
+	unregister_sysctl_table(ax25_root_header);
+	#endif
+
 	unregister_netdevice_notifier(&ax25_dev_notifier);
 
 	dev_remove_pack(&ax25_packet_type);
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c1cb982..6ff1853 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -60,8 +60,6 @@ void ax25_dev_device_up(struct net_device *dev)
 		return;
 	}
 
-	ax25_unregister_sysctl();
-
 	dev->ax25_ptr     = ax25_dev;
 	ax25_dev->dev     = dev;
 	dev_hold(dev);
@@ -91,7 +89,7 @@ void ax25_dev_device_up(struct net_device *dev)
 	ax25_dev_list  = ax25_dev;
 	spin_unlock_bh(&ax25_dev_lock);
 
-	ax25_register_sysctl();
+	ax25_register_sysctl(ax25_dev);
 }
 
 void ax25_dev_device_down(struct net_device *dev)
@@ -101,7 +99,7 @@ void ax25_dev_device_down(struct net_device *dev)
 	if ((ax25_dev = ax25_dev_ax25dev(dev)) == NULL)
 		return;
 
-	ax25_unregister_sysctl();
+	ax25_unregister_sysctl(ax25_dev);
 
 	spin_lock_bh(&ax25_dev_lock);
 
@@ -121,7 +119,6 @@ void ax25_dev_device_down(struct net_device *dev)
 		spin_unlock_bh(&ax25_dev_lock);
 		dev_put(dev);
 		kfree(ax25_dev);
-		ax25_register_sysctl();
 		return;
 	}
 
@@ -131,7 +128,6 @@ void ax25_dev_device_down(struct net_device *dev)
 			spin_unlock_bh(&ax25_dev_lock);
 			dev_put(dev);
 			kfree(ax25_dev);
-			ax25_register_sysctl();
 			return;
 		}
 
@@ -139,8 +135,6 @@ void ax25_dev_device_down(struct net_device *dev)
 	}
 	spin_unlock_bh(&ax25_dev_lock);
 	dev->ax25_ptr = NULL;
-
-	ax25_register_sysctl();
 }
 
 int ax25_fwd_ioctl(unsigned int cmd, struct ax25_fwd_struct *fwd)
diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index ebe0ef3..b1181bc 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -29,17 +29,6 @@ static int min_proto[1],		max_proto[] = { AX25_PROTO_MAX };
 static int min_ds_timeout[1],		max_ds_timeout[] = {65535000};
 #endif
 
-static struct ctl_table_header *ax25_table_header;
-
-static ctl_table *ax25_table;
-static int ax25_table_size;
-
-static struct ctl_path ax25_path[] = {
-	{ .procname = "net", },
-	{ .procname = "ax25", },
-	{ }
-};
-
 static const ctl_table ax25_param_table[] = {
 	{
 		.procname	= "ip_default_mode",
@@ -159,52 +148,37 @@ static const ctl_table ax25_param_table[] = {
 	{ }	/* that's all, folks! */
 };
 
-void ax25_register_sysctl(void)
+void ax25_register_sysctl(struct ax25_dev *ax25_dev)
 {
-	ax25_dev *ax25_dev;
-	int n, k;
-
-	spin_lock_bh(&ax25_dev_lock);
-	for (ax25_table_size = sizeof(ctl_table), ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next)
-		ax25_table_size += sizeof(ctl_table);
-
-	if ((ax25_table = kzalloc(ax25_table_size, GFP_ATOMIC)) == NULL) {
-		spin_unlock_bh(&ax25_dev_lock);
+	struct ctl_table *ax25_table;
+	int i;
+
+	/* Assuming the name does not change while this sysctl
+	 * is registered. If ax25 supports device renaming
+	 * (SIOCSIFNAME), sysctl will need its own copy of
+	 * the name */
+	struct ctl_path ax25_path[] = {
+		{ .procname = "net" },
+		{ .procname = "ax25" },
+		{ .procname = ax25_dev->dev->name },
+		{ }
+	};
+
+
+	ax25_table = kmemdup(ax25_param_table, sizeof(ax25_param_table), GFP_KERNEL);
+	if (!ax25_table)
 		return;
-	}
-
-	for (n = 0, ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next) {
-		struct ctl_table *child = kmemdup(ax25_param_table,
-						  sizeof(ax25_param_table),
-						  GFP_ATOMIC);
-		if (!child) {
-			while (n--)
-				kfree(ax25_table[n].child);
-			kfree(ax25_table);
-			spin_unlock_bh(&ax25_dev_lock);
-			return;
-		}
-		ax25_table[n].child = ax25_dev->systable = child;
-		ax25_table[n].procname     = ax25_dev->dev->name;
-		ax25_table[n].mode         = 0555;
-
 
-		for (k = 0; k < AX25_MAX_VALUES; k++)
-			child[k].data = &ax25_dev->values[k];
+	for (i = 0; i < AX25_MAX_VALUES; i++)
+		ax25_table[i].data = &ax25_dev->values[i];
 
-		n++;
-	}
-	spin_unlock_bh(&ax25_dev_lock);
-
-	ax25_table_header = register_sysctl_paths(ax25_path, ax25_table);
+	ax25_dev->ax25_sysheader = register_sysctl_paths(ax25_path, ax25_table);
 }
 
-void ax25_unregister_sysctl(void)
+void ax25_unregister_sysctl(struct ax25_dev *ax25_dev)
 {
-	ctl_table *p;
-	unregister_sysctl_table(ax25_table_header);
-
-	for (p = ax25_table; p->procname; p++)
-		kfree(p->child);
+	struct ctl_table *ax25_table = ax25_dev->ax25_sysheader->ctl_table_arg;
+	unregister_sysctl_table(ax25_dev->ax25_sysheader);
+	ax25_dev->ax25_sysheader = NULL;
 	kfree(ax25_table);
 }
-- 
1.7.5.134.g1c08b



* [v2 054/115] sysctl: remove .child from net/llc tables
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/llc/sysctl_net_llc.c |   55 +++++++++++++++++++++++----------------------
 1 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
index e2ebe35..8977307 100644
--- a/net/llc/sysctl_net_llc.c
+++ b/net/llc/sysctl_net_llc.c
@@ -56,48 +56,49 @@ static struct ctl_table llc_station_table[] = {
 	{ },
 };
 
-static struct ctl_table llc2_dir_timeout_table[] = {
-	{
-		.procname	= "timeout",
-		.mode		= 0555,
-		.child		= llc2_timeout_table,
-	},
-	{ },
-};
 
-static struct ctl_table llc_table[] = {
-	{
-		.procname	= "llc2",
-		.mode		= 0555,
-		.child		= llc2_dir_timeout_table,
-	},
-	{
-		.procname       = "station",
-		.mode           = 0555,
-		.child          = llc_station_table,
-	},
-	{ },
+static const __initdata struct ctl_path llc2_timeout_path[] = {
+	{ .procname = "net", },
+	{ .procname = "llc", },
+	{ .procname = "llc2", },
+	{ .procname = "timeout", },
+	{ }
 };
 
-static struct ctl_path llc_path[] = {
+static const __initdata struct ctl_path llc_station_path[] = {
 	{ .procname = "net", },
 	{ .procname = "llc", },
+	{ .procname = "station", },
 	{ }
 };
 
-static struct ctl_table_header *llc_table_header;
+static struct ctl_table_header *llc_station_hdr;
+static struct ctl_table_header *llc2_timeout_hdr;
 
 int __init llc_sysctl_init(void)
 {
-	llc_table_header = register_sysctl_paths(llc_path, llc_table);
+	llc_station_hdr = register_sysctl_paths(llc_station_path, llc_station_table);
+	if (!llc_station_hdr)
+		return -ENOMEM;
 
-	return llc_table_header ? 0 : -ENOMEM;
+	llc2_timeout_hdr = register_sysctl_paths(llc2_timeout_path, llc2_timeout_table);
+	if (!llc2_timeout_hdr) {
+		unregister_sysctl_table(llc_station_hdr);
+		llc_station_hdr = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
 }
 
 void llc_sysctl_exit(void)
 {
-	if (llc_table_header) {
-		unregister_sysctl_table(llc_table_header);
-		llc_table_header = NULL;
+	if (llc2_timeout_hdr) {
+		unregister_sysctl_table(llc2_timeout_hdr);
+		llc2_timeout_hdr = NULL;
+	}
+	if (llc_station_hdr) {
+		unregister_sysctl_table(llc_station_hdr);
+		llc_station_hdr = NULL;
 	}
 }
-- 
1.7.5.134.g1c08b



* [v2 057/115] sysctl: no-child: manually register kernel/keys
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 include/linux/key.h    |    4 +++-
 kernel/sysctl.c        |    7 -------
 security/keys/key.c    |    1 +
 security/keys/sysctl.c |   18 +++++++++++++++++-
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/key.h b/include/linux/key.h
index b2bb017..9b3df18 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -281,7 +281,9 @@ static inline key_serial_t key_serial(struct key *key)
 				   rwsem_is_locked(&((struct key *)(KEY))->sem)))
 
 #ifdef CONFIG_SYSCTL
-extern ctl_table key_sysctls[];
+extern int __init key_register_sysctls(void);
+#else
+static inline int key_register_sysctls(void) { return 0; }
 #endif
 
 extern void key_replace_session_keyring(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b020156..33d5e2e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -905,13 +905,6 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dostring,
 	},
-#ifdef CONFIG_KEYS
-	{
-		.procname	= "keys",
-		.mode		= 0555,
-		.child		= key_sysctls,
-	},
-#endif
 #ifdef CONFIG_RCU_TORTURE_TEST
 	{
 		.procname       = "rcutorture_runnable",
diff --git a/security/keys/key.c b/security/keys/key.c
index f7f9d93..33903c2 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -1099,6 +1099,7 @@ EXPORT_SYMBOL(unregister_key_type);
  */
 void __init key_init(void)
 {
+	key_register_sysctls();
 	/* allocate a slab in which we can store keys */
 	key_jar = kmem_cache_create("key_jar", sizeof(struct key),
 			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
diff --git a/security/keys/sysctl.c b/security/keys/sysctl.c
index ee32d18..e079223 100644
--- a/security/keys/sysctl.c
+++ b/security/keys/sysctl.c
@@ -15,7 +15,7 @@
 
 static const int zero, one = 1, max = INT_MAX;
 
-ctl_table key_sysctls[] = {
+static struct ctl_table key_table[] = {
 	{
 		.procname = "maxkeys",
 		.data = &key_quota_maxkeys,
@@ -63,3 +63,19 @@ ctl_table key_sysctls[] = {
 	},
 	{ }
 };
+
+static const __initdata struct ctl_path key_path[] = {
+	{ .procname = "kernel" },
+	{ .procname = "keys" },
+	{ }
+};
+
+static struct ctl_table_header *key_header;
+
+int __init key_register_sysctls(void)
+{
+	key_header = register_sysctl_paths(key_path, key_table);
+	if (key_header == NULL)
+		return -ENOMEM;
+	return 0;
+}
-- 
1.7.5.134.g1c08b



* [v2 065/115] sysctl: delete useless grab_header function
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

There are lots of header grabbing/getting functions around. We'll
start changing them later on and this one will just make conversions
harder. It doesn't help much, so kill it!

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c |   15 +++++----------
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 93962b0..64665e0 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -62,15 +62,10 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
 	return NULL;
 }
 
-static struct ctl_table_header *grab_header(struct inode *inode)
-{
-	return sysctl_head_grab(PROC_I(inode)->sysctl);
-}
-
 static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 					struct nameidata *nd)
 {
-	struct ctl_table_header *head = grab_header(dir);
+	struct ctl_table_header *head = sysctl_head_grab(PROC_I(dir)->sysctl);
 	struct ctl_table *table = PROC_I(dir)->sysctl_entry;
 	struct ctl_table_header *h = NULL;
 	struct qstr *name = &dentry->d_name;
@@ -123,7 +118,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 		size_t count, loff_t *ppos, int write)
 {
 	struct inode *inode = filp->f_path.dentry->d_inode;
-	struct ctl_table_header *head = grab_header(inode);
+	struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
 	ssize_t error;
 	size_t res;
@@ -234,7 +229,7 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 {
 	struct dentry *dentry = filp->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
-	struct ctl_table_header *head = grab_header(inode);
+	struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
 	struct ctl_table_header *h = NULL;
 	unsigned long pos;
@@ -302,7 +297,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
 	if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))
 		return -EACCES;
 
-	head = grab_header(inode);
+	head = sysctl_head_grab(PROC_I(inode)->sysctl);
 	if (IS_ERR(head))
 		return PTR_ERR(head);
 
@@ -343,7 +338,7 @@ static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
 static int proc_sys_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 {
 	struct inode *inode = dentry->d_inode;
-	struct ctl_table_header *head = grab_header(inode);
+	struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
 
 	if (IS_ERR(head))
-- 
1.7.5.134.g1c08b



* [v2 067/115] sysctl: rename sysctl_head_grab/finish to sysctl_use_header/unuse
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

The function names are clearer and they reflect the reference counter
that is being inc/decremented. No functional change, just aesthetics.
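
The use counter can be modelled in user space roughly as follows. This
is only a sketch under stated assumptions: fake_header,
fake_use_header() and fake_unuse_header() are hypothetical stand-ins,
the sysctl_lock locking is left out, and failure is reported as NULL
where the kernel functions return ERR_PTR(-ENOENT).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two header fields that matter here. */
struct fake_header {
	int used;		/* number of callers currently using the header */
	int unregistering;	/* non-zero once unregistration has started */
};

/*
 * "use": take a use reference unless the header is already going away.
 * Returns the header on success, NULL on failure (assumed simplification).
 */
static struct fake_header *fake_use_header(struct fake_header *hdr)
{
	if (hdr->unregistering)
		return NULL;
	hdr->used++;
	return hdr;
}

/* "unuse": drop a use reference; a NULL header is silently ignored. */
static void fake_unuse_header(struct fake_header *hdr)
{
	if (!hdr)
		return;
	assert(hdr->used > 0);
	hdr->used--;
}
```

Every successful use must be paired with exactly one unuse, which is
the pairing the new names are meant to make visible at the call sites.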

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c  |   24 ++++++++++++------------
 include/linux/sysctl.h |    4 ++--
 kernel/sysctl.c        |   40 ++++++++++++++++++++--------------------
 kernel/sysctl_check.c  |    2 +-
 4 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 64665e0..b4cde14 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -65,7 +65,7 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
 static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 					struct nameidata *nd)
 {
-	struct ctl_table_header *head = sysctl_head_grab(PROC_I(dir)->sysctl);
+	struct ctl_table_header *head = sysctl_use_header(PROC_I(dir)->sysctl);
 	struct ctl_table *table = PROC_I(dir)->sysctl_entry;
 	struct ctl_table_header *h = NULL;
 	struct qstr *name = &dentry->d_name;
@@ -100,7 +100,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 	err = ERR_PTR(-ENOMEM);
 	inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
 	if (h)
-		sysctl_head_finish(h);
+		sysctl_unuse_header(h);
 
 	if (!inode)
 		goto out;
@@ -110,7 +110,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 	d_add(dentry, inode);
 
 out:
-	sysctl_head_finish(head);
+	sysctl_unuse_header(head);
 	return err;
 }
 
@@ -118,7 +118,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 		size_t count, loff_t *ppos, int write)
 {
 	struct inode *inode = filp->f_path.dentry->d_inode;
-	struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
+	struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
 	ssize_t error;
 	size_t res;
@@ -145,7 +145,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	if (!error)
 		error = res;
 out:
-	sysctl_head_finish(head);
+	sysctl_unuse_header(head);
 
 	return error;
 }
@@ -229,7 +229,7 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 {
 	struct dentry *dentry = filp->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
-	struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
+	struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
 	struct ctl_table_header *h = NULL;
 	unsigned long pos;
@@ -270,13 +270,13 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 			continue;
 		ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
 		if (ret) {
-			sysctl_head_finish(h);
+			sysctl_unuse_header(h);
 			break;
 		}
 	}
 	ret = 1;
 out:
-	sysctl_head_finish(head);
+	sysctl_unuse_header(head);
 	return ret;
 }
 
@@ -297,7 +297,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
 	if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))
 		return -EACCES;
 
-	head = sysctl_head_grab(PROC_I(inode)->sysctl);
+	head = sysctl_use_header(PROC_I(inode)->sysctl);
 	if (IS_ERR(head))
 		return PTR_ERR(head);
 
@@ -307,7 +307,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
 	else /* Use the permissions on the sysctl table entry */
 		error = sysctl_perm(head->root, table, mask);
 
-	sysctl_head_finish(head);
+	sysctl_unuse_header(head);
 	return error;
 }
 
@@ -338,7 +338,7 @@ static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
 static int proc_sys_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 {
 	struct inode *inode = dentry->d_inode;
-	struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
+	struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
 
 	if (IS_ERR(head))
@@ -348,7 +348,7 @@ static int proc_sys_getattr(struct vfsmount *mnt, struct dentry *dentry, struct
 	if (table)
 		stat->mode = (stat->mode & S_IFMT) | table->mode;
 
-	sysctl_head_finish(head);
+	sysctl_unuse_header(head);
 	return 0;
 }
 
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index fe13067..3ff0a9e 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -954,11 +954,11 @@ struct ctl_table_header;
 extern void sysctl_head_get(struct ctl_table_header *);
 extern void sysctl_head_put(struct ctl_table_header *);
 extern int sysctl_is_seen(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *);
+extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
 extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev);
 extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
 						struct ctl_table_header *prev);
-extern void sysctl_head_finish(struct ctl_table_header *prev);
+extern void sysctl_unuse_header(struct ctl_table_header *prev);
 extern int sysctl_perm(struct ctl_table_root *root,
 		struct ctl_table *table, int op);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ab242b4..5d52e7a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1506,6 +1506,26 @@ static void unuse_table(struct ctl_table_header *p)
 			complete(p->unregistering);
 }
 
+struct ctl_table_header *sysctl_use_header(struct ctl_table_header *head)
+{
+	if (!head)
+		head = &root_table_header;
+	spin_lock(&sysctl_lock);
+	if (!use_table(head))
+		head = ERR_PTR(-ENOENT);
+	spin_unlock(&sysctl_lock);
+	return head;
+}
+
+void sysctl_unuse_header(struct ctl_table_header *head)
+{
+	if (!head)
+		return;
+	spin_lock(&sysctl_lock);
+	unuse_table(head);
+	spin_unlock(&sysctl_lock);
+}
+
 /* called under sysctl_lock, will reacquire if has to wait */
 static void start_unregistering(struct ctl_table_header *p)
 {
@@ -1551,26 +1571,6 @@ void sysctl_head_put(struct ctl_table_header *head)
 	spin_unlock(&sysctl_lock);
 }
 
-struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head)
-{
-	if (!head)
-		head = &root_table_header;
-	spin_lock(&sysctl_lock);
-	if (!use_table(head))
-		head = ERR_PTR(-ENOENT);
-	spin_unlock(&sysctl_lock);
-	return head;
-}
-
-void sysctl_head_finish(struct ctl_table_header *head)
-{
-	if (!head)
-		return;
-	spin_lock(&sysctl_lock);
-	unuse_table(head);
-	spin_unlock(&sysctl_lock);
-}
-
 static struct ctl_table_set *
 lookup_header_set(struct ctl_table_root *root, struct nsproxy *namespaces)
 {
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 52f4810..a3a58b8 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -55,7 +55,7 @@ repeat:
 	}
 	ref = NULL;
 out:
-	sysctl_head_finish(head);
+	sysctl_unuse_header(head);
 	return ref;
 }
 
-- 
1.7.5.134.g1c08b



* [v2 068/115] sysctl: rename sysctl_head_next to sysctl_use_next_header
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

The new name makes it clear that this increments ctl_use_refs and
that _unuse must be used on the header. No functional change.
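
The iteration contract can be sketched in user space like this. It is
a model under assumptions, not the kernel code: fake_header and the
fake_*() helpers are hypothetical, headers live in a plain array
instead of per-namespace lists, locking is omitted, and the iterator
is assumed to drop the use reference on prev before handing out the
next header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two header fields that matter here. */
struct fake_header {
	int used;		/* number of callers currently using the header */
	int unregistering;	/* non-zero once unregistration has started */
};

/*
 * Drop the use reference on prev (if any), then return the next header
 * that can still be used, with its use count already incremented.
 * Returns NULL at the end of the table, holding no reference.
 */
static struct fake_header *fake_use_next_header(struct fake_header *tbl,
						size_t n,
						struct fake_header *prev)
{
	size_t i = 0;

	if (prev) {
		assert(prev->used > 0);
		prev->used--;
		i = (size_t)(prev - tbl) + 1;
	}
	for (; i < n; i++) {
		if (tbl[i].unregistering)
			continue;	/* going away: skip it */
		tbl[i].used++;
		return &tbl[i];
	}
	return NULL;
}

/* Needed by callers that leave the loop early while still holding h. */
static void fake_unuse_header(struct fake_header *hdr)
{
	if (!hdr)
		return;
	assert(hdr->used > 0);
	hdr->used--;
}
```

A loop that runs to completion ends up holding no reference, while a
loop that breaks out early still holds one on the current header and
has to release it with the _unuse call.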

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c  |    4 ++--
 include/linux/sysctl.h |    4 ++--
 kernel/sysctl.c        |    6 +++---
 kernel/sysctl_check.c  |    4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b4cde14..068d39c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -85,7 +85,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 
 	p = find_in_table(table, name);
 	if (!p) {
-		for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
+		for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
 			if (h->attached_to != table)
 				continue;
 			p = find_in_table(h->attached_by, name);
@@ -265,7 +265,7 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	if (ret)
 		goto out;
 
-	for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
+	for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
 		if (h->attached_to != table)
 			continue;
 		ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 3ff0a9e..4ed5235 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -955,8 +955,8 @@ extern void sysctl_head_get(struct ctl_table_header *);
 extern void sysctl_head_put(struct ctl_table_header *);
 extern int sysctl_is_seen(struct ctl_table_header *);
 extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev);
-extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
+extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev);
+extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
 						struct ctl_table_header *prev);
 extern void sysctl_unuse_header(struct ctl_table_header *prev);
 extern int sysctl_perm(struct ctl_table_root *root,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5d52e7a..e4ec23e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1587,7 +1587,7 @@ lookup_header_list(struct ctl_table_root *root, struct nsproxy *namespaces)
 	return &set->list;
 }
 
-struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
+struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
 					    struct ctl_table_header *prev)
 {
 	struct ctl_table_root *root;
@@ -1631,9 +1631,9 @@ out:
 	return NULL;
 }
 
-struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
+struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev)
 {
-	return __sysctl_head_next(current->nsproxy, prev);
+	return __sysctl_use_next_header(current->nsproxy, prev);
 }
 
 void register_sysctl_root(struct ctl_table_root *root)
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index a3a58b8..44c31f0 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -28,8 +28,8 @@ static struct ctl_table *sysctl_check_lookup(struct nsproxy *namespaces,
 	struct ctl_table *ref, *test;
 	int cur_depth;
 
-	for (head = __sysctl_head_next(namespaces, NULL); head;
-	     head = __sysctl_head_next(namespaces, head)) {
+	for (head = __sysctl_use_next_header(namespaces, NULL); head;
+	     head = __sysctl_use_next_header(namespaces, head)) {
 		cur_depth = depth;
 		ref = head->ctl_table;
 repeat:
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 070/115] sysctl: rename sysctl_head_get/put to sysctl_proc_inode_get/put
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Clarify the purpose of those references. No functional changes.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/inode.c        |    2 +-
 fs/proc/proc_sysctl.c  |    2 +-
 include/linux/sysctl.h |    7 +++++--
 kernel/sysctl.c        |    6 +++---
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index d15aa1b..08166df 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -42,7 +42,7 @@ static void proc_evict_inode(struct inode *inode)
 	head = PROC_I(inode)->sysctl;
 	if (head) {
 		rcu_assign_pointer(PROC_I(inode)->sysctl, NULL);
-		sysctl_head_put(head);
+		sysctl_proc_inode_put(head);
 	}
 }
 
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 068d39c..125b679 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -26,7 +26,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 
 	inode->i_ino = get_next_ino();
 
-	sysctl_head_get(head);
+	sysctl_proc_inode_get(head);
 	ei = PROC_I(inode);
 	ei->sysctl = head;
 	ei->sysctl_entry = table;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 0f41beb..e265880 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -951,8 +951,11 @@ extern void setup_sysctl_set(struct ctl_table_set *p,
 
 struct ctl_table_header;
 
-extern void sysctl_head_get(struct ctl_table_header *);
-extern void sysctl_head_put(struct ctl_table_header *);
+/* get/put a reference to this header that
+ * will be/was embedded in a procfs proc_inode */
+extern void sysctl_proc_inode_get(struct ctl_table_header *);
+extern void sysctl_proc_inode_put(struct ctl_table_header *);
+
 extern int sysctl_is_seen(struct ctl_table_header *);
 extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
 extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 48a1ffd..caafbb8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1551,7 +1551,7 @@ static void start_unregistering(struct ctl_table_header *p)
 	list_del_init(&p->ctl_entry);
 }
 
-void sysctl_head_get(struct ctl_table_header *head)
+void sysctl_proc_inode_get(struct ctl_table_header *head)
 {
 	spin_lock(&sysctl_lock);
 	head->ctl_procfs_refs++;
@@ -1563,7 +1563,7 @@ static void free_head(struct rcu_head *rcu)
 	kfree(container_of(rcu, struct ctl_table_header, rcu));
 }
 
-void sysctl_head_put(struct ctl_table_header *head)
+void sysctl_proc_inode_put(struct ctl_table_header *head)
 {
 	spin_lock(&sysctl_lock);
 	head->ctl_procfs_refs--;
@@ -1990,7 +1990,7 @@ void setup_sysctl_set(struct ctl_table_set *p,
 {
 }
 
-void sysctl_head_put(struct ctl_table_header *head)
+void sysctl_proc_inode_put(struct ctl_table_header *head)
 {
 }
 
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 071/115] sysctl: rename (un)use_table to __sysctl_(un)use_header
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

The former names were not semantically correct, as the use/unuse was
related to the header, not the table. Also this makes it clearer that
sysctl_use_header and __sysctl_use_header are related (one takes the
spin lock inside and the other doesn't).
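
To illustrate the naming relation described above, here is a minimal
userspace sketch, not the kernel code: a lock-held helper and a wrapper
that takes the lock itself. The names (struct header, header_use,
header_use_locked, header_lock) are hypothetical, and a pthread mutex
stands in for the kernel spin lock. The kernel spells the lock-held
variant with a leading "__" instead of a "_locked" suffix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ctl_table_header. */
struct header {
	int use_refs;		/* number of active users */
	int unregistering;	/* non-zero once unregistration has started */
};

/* Stand-in for sysctl_lock (a spin lock in the kernel). */
static pthread_mutex_t header_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must already hold header_lock. Returns 0 or -ENOENT. */
static int header_use_locked(struct header *h)
{
	assert(h != NULL);
	if (h->unregistering)
		return -ENOENT;
	h->use_refs++;
	return 0;
}

/* Caller must already hold header_lock. */
static void header_unuse_locked(struct header *h)
{
	assert(h != NULL && h->use_refs > 0);
	h->use_refs--;
}

/* Same operation, but takes and releases the lock itself. */
static int header_use(struct header *h)
{
	int ret;

	pthread_mutex_lock(&header_lock);
	ret = header_use_locked(h);
	pthread_mutex_unlock(&header_lock);
	return ret;
}

static void header_unuse(struct header *h)
{
	pthread_mutex_lock(&header_lock);
	header_unuse_locked(h);
	pthread_mutex_unlock(&header_lock);
}
```

Code that already holds the lock (a list walk, for example) calls the
_locked variants directly; everything else goes through the wrappers.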

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 kernel/sysctl.c |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index caafbb8..1281827 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1490,16 +1490,16 @@ static struct ctl_table dev_table[] = {
 static DEFINE_SPINLOCK(sysctl_lock);
 
 /* called under sysctl_lock */
-static int use_table(struct ctl_table_header *p)
+static struct ctl_table_header *__sysctl_use_header(struct ctl_table_header *head)
 {
-	if (unlikely(p->unregistering))
-		return 0;
-	p->ctl_use_refs++;
-	return 1;
+	if (unlikely(head->unregistering))
+		return ERR_PTR(-ENOENT);
+	head->ctl_use_refs++;
+	return head;
 }
 
 /* called under sysctl_lock */
-static void unuse_table(struct ctl_table_header *p)
+static void __sysctl_unuse_header(struct ctl_table_header *p)
 {
 	if (!--p->ctl_use_refs)
 		if (unlikely(p->unregistering))
@@ -1511,8 +1511,7 @@ struct ctl_table_header *sysctl_use_header(struct ctl_table_header *head)
 	if (!head)
 		head = &root_table_header;
 	spin_lock(&sysctl_lock);
-	if (!use_table(head))
-		head = ERR_PTR(-ENOENT);
+	head = __sysctl_use_header(head);
 	spin_unlock(&sysctl_lock);
 	return head;
 }
@@ -1522,7 +1521,7 @@ void sysctl_unuse_header(struct ctl_table_header *head)
 	if (!head)
 		return;
 	spin_lock(&sysctl_lock);
-	unuse_table(head);
+	__sysctl_unuse_header(head);
 	spin_unlock(&sysctl_lock);
 }
 
@@ -1600,14 +1599,14 @@ struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
 	if (prev) {
 		head = prev;
 		tmp = &prev->ctl_entry;
-		unuse_table(prev);
+		__sysctl_unuse_header(prev);
 		goto next;
 	}
 	tmp = &root_table_header.ctl_entry;
 	for (;;) {
 		head = list_entry(tmp, struct ctl_table_header, ctl_entry);
 
-		if (!use_table(head))
+		if (IS_ERR(__sysctl_use_header(head)))
 			goto next;
 		spin_unlock(&sysctl_lock);
 		return head;
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 073/115] sysctl: group root-specific operations
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

No functional change, just moved stuff around.

->lookup was not moved to _ops because we'll get rid of it later.

This makes ctl_table_set occupy less space (it loses the pointer to
is_seen), which saves N*sizeof(void*) for N network namespaces, but I
don't think that will impress anyone.
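
As a rough illustration of the grouping, the sketch below shows an ops
structure whose hooks are optional, with callers falling back to a
default when a hook is not set. It is a simplified userspace model: the
type and function names (struct group_ops, effective_mode, set_is_seen,
ro_permissions) are made up for the example and only mirror the shape
of ctl_table_group_ops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for ctl_table and ctl_table_set. */
struct table { int mode; };
struct table_set { int id; };

/* Optional hooks shared by every header that points at this ops struct. */
struct group_ops {
	int (*is_seen)(struct table_set *set);	 /* NULL: always visible */
	int (*permissions)(struct table *table); /* NULL: use table->mode */
};

/* Mode used for permission checks: the hook's answer, else the static mode. */
static int effective_mode(const struct group_ops *ops, struct table *t)
{
	assert(ops != NULL && t != NULL);
	return ops->permissions ? ops->permissions(t) : t->mode;
}

/* Visibility: the hook's answer, else visible everywhere. */
static int set_is_seen(const struct group_ops *ops, struct table_set *set)
{
	assert(ops != NULL);
	return ops->is_seen ? ops->is_seen(set) : 1;
}

/* Example hook: strip all write bits. */
static int ro_permissions(struct table *t)
{
	return t->mode & ~0222;
}

/* No hooks set: every default applies. */
static const struct group_ops default_ops = { NULL, NULL };

/* Only the permissions hook is overridden. */
static const struct group_ops ro_ops = { NULL, ro_permissions };
```

Because the ops structure is const and shared, each set no longer
needs its own copy of the function pointer.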

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c  |    4 ++--
 include/linux/sysctl.h |   26 +++++++++++++++++++-------
 kernel/sysctl.c        |   25 ++++++++++++++-----------
 net/sysctl_net.c       |   20 ++++++++++++++------
 4 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 125b679..55c9bd1 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -131,7 +131,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	 * and won't be until we finish.
 	 */
 	error = -EPERM;
-	if (sysctl_perm(head->root, table, write ? MAY_WRITE : MAY_READ))
+	if (sysctl_perm(head->root->ctl_ops, table, write ? MAY_WRITE : MAY_READ))
 		goto out;
 
 	/* if that can happen at all, it should be -EINVAL, not -EISDIR */
@@ -305,7 +305,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
 	if (!table) /* global root - r-xr-xr-x */
 		error = mask & MAY_WRITE ? -EACCES : 0;
 	else /* Use the permissions on the sysctl table entry */
-		error = sysctl_perm(head->root, table, mask);
+		error = sysctl_perm(head->root->ctl_ops, table, mask);
 
 	sysctl_unuse_header(head);
 	return error;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 1af4ed5..8209d75 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -934,22 +934,21 @@ enum
 
 /* For the /proc/sys support */
 struct ctl_table;
+struct ctl_table_header;
+struct ctl_table_group_ops;
 struct nsproxy;
 struct ctl_table_root;
 
 struct ctl_table_set {
 	struct list_head list;
 	struct ctl_table_set *parent;
-	int (*is_seen)(struct ctl_table_set *);
 };
 
 extern __init int sysctl_init(void);
 
 extern void setup_sysctl_set(struct ctl_table_set *p,
-	struct ctl_table_set *parent,
-	int (*is_seen)(struct ctl_table_set *));
+			     struct ctl_table_set *parent);
 
-struct ctl_table_header;
 
 /* get/put a reference to this header that
  * will be/was embedded in a procfs proc_inode */
@@ -962,8 +961,8 @@ extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *
 extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
 						struct ctl_table_header *prev);
 extern void sysctl_unuse_header(struct ctl_table_header *prev);
-extern int sysctl_perm(struct ctl_table_root *root,
-		struct ctl_table *table, int op);
+extern int sysctl_perm(const struct ctl_table_group_ops *ops,
+		       struct ctl_table *table, int op);
 
 typedef struct ctl_table ctl_table;
 
@@ -1029,12 +1028,25 @@ struct ctl_table
 	void *extra2;
 };
 
+struct ctl_table_group_ops {
+	/* some sysctl entries are visible only in some situations.
+	 * E.g.: /proc/sys/net/ipv4/conf/eth0/ is only visible in the
+	 * netns in which that eth0 interface lives.
+	 *
+	 * If this hook is not set, then all the sysctl entries in
+	 * this set are always visible. */
+	int (*is_seen)(struct ctl_table_set *set);
+
+	/* hook to alter permissions for some sysctl nodes at runtime */
+	int (*permissions)(struct ctl_table *table);
+};
+
 struct ctl_table_root {
 	struct list_head root_list;
 	struct ctl_table_set default_set;
 	struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
 					   struct nsproxy *namespaces);
-	int (*permissions)(struct ctl_table *table);
+	const struct ctl_table_group_ops *ctl_ops;
 };
 
 /* struct ctl_table_header is used to maintain dynamic lists of
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6e4e32b..0f00b87 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -197,6 +197,9 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,
 
 #endif
 
+/* uses default ops */
+static const struct ctl_table_group_ops root_table_group_ops = { };
+
 static struct ctl_table root_table[];
 static struct ctl_table_root sysctl_table_root;
 static struct ctl_table_header root_table_header = {
@@ -206,7 +209,9 @@ static struct ctl_table_header root_table_header = {
 	.root = &sysctl_table_root,
 	.set = &sysctl_table_root.default_set,
 };
+
 static struct ctl_table_root sysctl_table_root = {
+	.ctl_ops = &root_table_group_ops,
 	.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
 	.default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
 };
@@ -1659,12 +1664,13 @@ static int test_perm(int mode, int op)
 	return -EACCES;
 }
 
-int sysctl_perm(struct ctl_table_root *root, struct ctl_table *table, int op)
+int sysctl_perm(const struct ctl_table_group_ops *ops,
+		struct ctl_table *table, int op)
 {
 	int mode;
 
-	if (root->permissions)
-		mode = root->permissions(table);
+	if (ops->permissions)
+		mode = ops->permissions(table);
 	else
 		mode = table->mode;
 
@@ -1950,26 +1956,24 @@ void unregister_sysctl_table(struct ctl_table_header * header)
 
 int sysctl_is_seen(struct ctl_table_header *p)
 {
-	struct ctl_table_set *set = p->set;
+	const struct ctl_table_group_ops *ops = p->root->ctl_ops;
 	int res;
 	spin_lock(&sysctl_lock);
 	if (p->unregistering)
 		res = 0;
-	else if (!set->is_seen)
+	else if (!ops->is_seen)
 		res = 1;
 	else
-		res = set->is_seen(set);
+		res = ops->is_seen(p->set);
 	spin_unlock(&sysctl_lock);
 	return res;
 }
 
 void setup_sysctl_set(struct ctl_table_set *p,
-	struct ctl_table_set *parent,
-	int (*is_seen)(struct ctl_table_set *))
+		      struct ctl_table_set *parent)
 {
 	INIT_LIST_HEAD(&p->list);
 	p->parent = parent ? parent : &sysctl_table_root.default_set;
-	p->is_seen = is_seen;
 }
 
 #else /* !CONFIG_SYSCTL */
@@ -1984,8 +1988,7 @@ void unregister_sysctl_table(struct ctl_table_header * table)
 }
 
 void setup_sysctl_set(struct ctl_table_set *p,
-	struct ctl_table_set *parent,
-	int (*is_seen)(struct ctl_table_set *))
+		      struct ctl_table_set *parent)
 {
 }
 
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 1c0cb57..c0d7140 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -51,12 +51,17 @@ static int net_ctl_permissions(struct ctl_table *table)
 	return table->mode;
 }
 
+static const struct ctl_table_group_ops net_sysctl_group_ops = {
+	.is_seen = is_seen,
+	.permissions = net_ctl_permissions,
+};
+
 static struct ctl_table_root net_sysctl_root = {
 	.lookup = net_ctl_header_lookup,
-	.permissions = net_ctl_permissions,
+	.ctl_ops = &net_sysctl_group_ops,
 };
 
-static int net_ctl_ro_header_perms(ctl_table *table)
+static int net_ctl_ro_header_permissions(ctl_table *table)
 {
 	if (net_eq(current->nsproxy->net_ns, &init_net))
 		return table->mode;
@@ -64,15 +69,18 @@ static int net_ctl_ro_header_perms(ctl_table *table)
 		return table->mode & ~0222;
 }
 
+static const struct ctl_table_group_ops net_sysctl_ro_group_ops = {
+	.permissions = net_ctl_ro_header_permissions,
+};
+
 static struct ctl_table_root net_sysctl_ro_root = {
-	.permissions = net_ctl_ro_header_perms,
+	.ctl_ops = &net_sysctl_ro_group_ops,
 };
 
 static int __net_init sysctl_net_init(struct net *net)
 {
 	setup_sysctl_set(&net->sysctls,
-			 &net_sysctl_ro_root.default_set,
-			 is_seen);
+			 &net_sysctl_ro_root.default_set);
 	return 0;
 }
 
@@ -93,7 +101,7 @@ static __init int net_sysctl_init(void)
 	if (ret)
 		goto out;
 	register_sysctl_root(&net_sysctl_root);
-	setup_sysctl_set(&net_sysctl_ro_root.default_set, NULL, NULL);
+	setup_sysctl_set(&net_sysctl_ro_root.default_set, NULL);
 	register_sysctl_root(&net_sysctl_ro_root);
 out:
 	return ret;
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 074/115] sysctl: introduce ctl_table_group
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

In the future ctl_table_group will replace ctl_table_root and
ctl_table_set. For now it only takes over the ctl_ops field from
ctl_table_root.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c  |    4 ++--
 include/linux/sysctl.h |   16 ++++++++++++----
 kernel/sysctl.c        |   18 ++++++++++++------
 net/sysctl_net.c       |   13 +++++++++----
 4 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 55c9bd1..375d145 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -131,7 +131,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	 * and won't be until we finish.
 	 */
 	error = -EPERM;
-	if (sysctl_perm(head->root->ctl_ops, table, write ? MAY_WRITE : MAY_READ))
+	if (sysctl_perm(head->ctl_group, table, write ? MAY_WRITE : MAY_READ))
 		goto out;
 
 	/* if that can happen at all, it should be -EINVAL, not -EISDIR */
@@ -305,7 +305,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
 	if (!table) /* global root - r-xr-xr-x */
 		error = mask & MAY_WRITE ? -EACCES : 0;
 	else /* Use the permissions on the sysctl table entry */
-		error = sysctl_perm(head->root->ctl_ops, table, mask);
+		error = sysctl_perm(head->ctl_group, table, mask);
 
 	sysctl_unuse_header(head);
 	return error;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 8209d75..a12ab12 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -935,6 +935,7 @@ enum
 /* For the /proc/sys support */
 struct ctl_table;
 struct ctl_table_header;
+struct ctl_table_group;
 struct ctl_table_group_ops;
 struct nsproxy;
 struct ctl_table_root;
@@ -961,7 +962,7 @@ extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *
 extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
 						struct ctl_table_header *prev);
 extern void sysctl_unuse_header(struct ctl_table_header *prev);
-extern int sysctl_perm(const struct ctl_table_group_ops *ops,
+extern int sysctl_perm(struct ctl_table_group *group,
 		       struct ctl_table *table, int op);
 
 typedef struct ctl_table ctl_table;
@@ -1041,12 +1042,15 @@ struct ctl_table_group_ops {
 	int (*permissions)(struct ctl_table *table);
 };
 
+struct ctl_table_group {
+	const struct ctl_table_group_ops *ctl_ops;
+};
+
 struct ctl_table_root {
 	struct list_head root_list;
 	struct ctl_table_set default_set;
 	struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
 					   struct nsproxy *namespaces);
-	const struct ctl_table_group_ops *ctl_ops;
 };
 
 /* struct ctl_table_header is used to maintain dynamic lists of
@@ -1073,6 +1077,7 @@ struct ctl_table_header
 	struct completion *unregistering;
 	struct ctl_table *ctl_table_arg;
 	struct ctl_table_root *root;
+	struct ctl_table_group *ctl_group;
 	struct ctl_table_set *set;
 	struct ctl_table *attached_by;
 	struct ctl_table *attached_to;
@@ -1086,8 +1091,11 @@ struct ctl_path {
 
 void register_sysctl_root(struct ctl_table_root *root);
 struct ctl_table_header *__register_sysctl_paths(
-	struct ctl_table_root *root, struct nsproxy *namespaces,
-	const struct ctl_path *path, struct ctl_table *table);
+	struct ctl_table_root *root,
+	struct ctl_table_group *group,
+	struct nsproxy *namespaces,
+	const struct ctl_path *path,
+	struct ctl_table *table);
 struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						struct ctl_table *table);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0f00b87..8dde087 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -200,6 +200,10 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,
 /* uses default ops */
 static const struct ctl_table_group_ops root_table_group_ops = { };
 
+static struct ctl_table_group root_table_group = {
+	.ctl_ops = &root_table_group_ops,
+};
+
 static struct ctl_table root_table[];
 static struct ctl_table_root sysctl_table_root;
 static struct ctl_table_header root_table_header = {
@@ -207,11 +211,11 @@ static struct ctl_table_header root_table_header = {
 	.ctl_table = root_table,
 	.ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),}},
 	.root = &sysctl_table_root,
+	.ctl_group = &root_table_group,
 	.set = &sysctl_table_root.default_set,
 };
 
 static struct ctl_table_root sysctl_table_root = {
-	.ctl_ops = &root_table_group_ops,
 	.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
 	.default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
 };
@@ -1664,10 +1668,10 @@ static int test_perm(int mode, int op)
 	return -EACCES;
 }
 
-int sysctl_perm(const struct ctl_table_group_ops *ops,
-		struct ctl_table *table, int op)
+int sysctl_perm(struct ctl_table_group *group, struct ctl_table *table, int op)
 {
 	int mode;
+	const struct ctl_table_group_ops *ops = group->ctl_ops;
 
 	if (ops->permissions)
 		mode = ops->permissions(table);
@@ -1838,6 +1842,7 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
  */
 struct ctl_table_header *__register_sysctl_paths(
 	struct ctl_table_root *root,
+	struct ctl_table_group *group,
 	struct nsproxy *namespaces,
 	const struct ctl_path *path, struct ctl_table *table)
 {
@@ -1883,6 +1888,7 @@ struct ctl_table_header *__register_sysctl_paths(
 	INIT_LIST_HEAD(&header->ctl_entry);
 	header->unregistering = NULL;
 	header->root = root;
+	header->ctl_group = group;
 	header->ctl_header_refs = 1;
 #ifdef CONFIG_SYSCTL_SYSCALL_CHECK
 	if (sysctl_check_table(namespaces, header->ctl_table)) {
@@ -1923,8 +1929,8 @@ struct ctl_table_header *__register_sysctl_paths(
 struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						struct ctl_table *table)
 {
-	return __register_sysctl_paths(&sysctl_table_root, current->nsproxy,
-					path, table);
+	return __register_sysctl_paths(&sysctl_table_root, &root_table_group,
+				       current->nsproxy, path, table);
 }
 
 /**
@@ -1956,7 +1962,7 @@ void unregister_sysctl_table(struct ctl_table_header * header)
 
 int sysctl_is_seen(struct ctl_table_header *p)
 {
-	const struct ctl_table_group_ops *ops = p->root->ctl_ops;
+	const struct ctl_table_group_ops *ops = p->ctl_group->ctl_ops;
 	int res;
 	spin_lock(&sysctl_lock);
 	if (p->unregistering)
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index c0d7140..5009d4e 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -56,9 +56,12 @@ static const struct ctl_table_group_ops net_sysctl_group_ops = {
 	.permissions = net_ctl_permissions,
 };
 
+static struct ctl_table_group net_sysctl_group = {
+	.ctl_ops = &net_sysctl_group_ops,
+};
+
 static struct ctl_table_root net_sysctl_root = {
 	.lookup = net_ctl_header_lookup,
-	.ctl_ops = &net_sysctl_group_ops,
 };
 
 static int net_ctl_ro_header_permissions(ctl_table *table)
@@ -73,10 +76,12 @@ static const struct ctl_table_group_ops net_sysctl_ro_group_ops = {
 	.permissions = net_ctl_ro_header_permissions,
 };
 
-static struct ctl_table_root net_sysctl_ro_root = {
+static struct ctl_table_group net_sysctl_ro_group = {
 	.ctl_ops = &net_sysctl_ro_group_ops,
 };
 
+static struct ctl_table_root net_sysctl_ro_root = { };
+
 static int __net_init sysctl_net_init(struct net *net)
 {
 	setup_sysctl_set(&net->sysctls,
@@ -114,7 +119,7 @@ struct ctl_table_header *register_net_sysctl_table(struct net *net,
 	struct nsproxy namespaces;
 	namespaces = *current->nsproxy;
 	namespaces.net_ns = net;
-	return __register_sysctl_paths(&net_sysctl_root,
+	return __register_sysctl_paths(&net_sysctl_root, &net_sysctl_group,
 					&namespaces, path, table);
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_table);
@@ -122,7 +127,7 @@ EXPORT_SYMBOL_GPL(register_net_sysctl_table);
 struct ctl_table_header *register_net_sysctl_rotable(const
 		struct ctl_path *path, struct ctl_table *table)
 {
-	return __register_sysctl_paths(&net_sysctl_ro_root,
+	return __register_sysctl_paths(&net_sysctl_ro_root, &net_sysctl_ro_group,
 			&init_nsproxy, path, table);
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_rotable);
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 076/115] sysctl: faster tree-based sysctl implementation
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

The old implementation used inefficient algorithms both at
lookup/readdir times and at registration. This patch introduces an
improved algorithm: lower memory consumption, better time complexity
for lookup/readdir/registration. Locking is a bit heavier in this
algorithm (in this patch: reader locks for lookup/readdir, writer
locks for register/unregister; in a later patch in this series: RCU +
spin-lock). I'll address this locking issue later in this commit.

I will briefly describe the previous algorithm and the new one, and
brag at the end with a long list of improvements and new limitations.

= Old algorithm =

== Description ==
We created a ctl_table_header for each registered sysctl table. The
header's role is to maintain sysctl internal data, reference counting
and as a token to unregister the table.

All headers were put in a list in the order of registration without
regard to the position of the tables in the sysctl tree. Headers were
also 'attached' one to another to (somewhat) speed up lookup/readdir.

Attachment meant looking at every other already registered header and
comparing the paths to the tables. A newly registered header would be
attached to the first header with which it shared most of its path.

e.g. paths registered: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
     tree:
  /
  + /a/b/c
     |   + /a/b/c/d
     + /a/x
     |   + /a/x/y
     + /a/z

== Time complexity ==

- register N tables would take O(N^2) steps (see above)

- lookup: if the item searched for is not found in the current header,
  iterate the list of headers until you find another header that's
  attached to the current position in the header's table. Lookups for
  elements that are in a header registered under the current position,
  or for nonexistent elements, would take O(N) steps each.

- readdir: after searching the current header's table at the current
  position, always do an O(N) search for a header attached to the
  current table position.

== Memory ==

Each header was allocated some data and a variable-length path.
O(1) with kzalloc/kfree.

= New algorithm =

== Description ==

Reuses the 'ctl_table_header' concept but with two distinct meanings:
- as a wrapper of a table registered by the user
- as a directory entry.

Registering the paths from the above example gives this tree:
 paths: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
 tree:
     /: .subdirs = a
       a: .subdirs = b x z
         b: .subdirs = c
           c: .subdirs = d
             d:
         x: .subdirs = y
           y:
         z:

Each directory gets a header. Each header has a parent (except root)
and two lists:
 - ctl_subdirs: list of sub-directories - other headers
 - ctl_tables: list of headers that wrap a ctl_table array

Because the directory structure is now maintained as ctl_table_header
objects, we needed to remove the .child from ctl_tables (this explains
the previous patches). A ctl_table array represents a list of files.
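
The directory part of this structure can be sketched in userspace C as
below. This is only an illustration under simplifying assumptions:
struct dir, find_subdir and register_path are hypothetical names, the
sibling list is a plain singly linked list, there is no locking and no
ctl_tables list, and unlike the real registration code nothing is
preallocated, so a failed allocation leaves earlier directories in
place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical directory node; the patch reuses ctl_table_header for this. */
struct dir {
	const char *name;	/* not copied: caller keeps the string alive */
	int refs;		/* how many registrations pass through here */
	struct dir *parent;
	struct dir *subdirs;	/* first child directory */
	struct dir *next;	/* next sibling in the parent's list */
};

/* Linear scan of one directory's children: O(number of subdirs). */
static struct dir *find_subdir(const struct dir *parent, const char *name)
{
	struct dir *d;

	for (d = parent->subdirs; d; d = d->next)
		if (strcmp(d->name, name) == 0)
			return d;
	return NULL;
}

/*
 * Walk the path one component at a time. Existing directories only get
 * their refcount bumped; missing ones are created and linked into the
 * parent's subdir list. Returns the last directory, or NULL if an
 * allocation fails.
 */
static struct dir *register_path(struct dir *root,
				 const char *const *path, size_t n)
{
	struct dir *cur = root;
	size_t i;

	assert(root != NULL);
	for (i = 0; i < n; i++) {
		struct dir *d = find_subdir(cur, path[i]);

		if (d) {
			d->refs++;
		} else {
			d = calloc(1, sizeof(*d));
			if (!d)
				return NULL;
			d->name = path[i];
			d->refs = 1;
			d->parent = cur;
			d->next = cur->subdirs;
			cur->subdirs = d;
		}
		cur = d;
	}
	return cur;
}
```

Registering /a/b/c and then /a/x with this sketch creates 'a' once and
leaves it with a refcount of 2, with 'b' and 'x' as its subdirs.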

== Time complexity ==

- registration of N headers. Registration means adding new directories
  at each level or incrementing an existing directory's refcount.

  - O(N * log N) - if the paths to the headers are evenly distributed

  - O(N^2) - if most of the headers registered are children of the
    same parent directory (searching the list of subdirs takes O(N)).
    There are cases where this happens (e.g. registering sysctl
    entries for net devices under /proc/sys/net/ipv4|6/conf/device).

    A few later patches will add an optimisation, to fix locations
    that might trigger the O(N^2) issue.

- lookup: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
  - could be made better:
     - sort ctl_subdirs (for binary search)
     - replace ctl_subdirs with a hash-table (increase memory footprint)
     - sort ctl_table entries at registration time (for binary search).
    Could be done, but I'm too lazy to do it now.

- readdir: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
   - can't get any better than this :)
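
The lookup cost quoted above comes from two linear scans, which can be
sketched as follows. This is a userspace approximation with made-up
types (struct entry, struct table_hdr, struct dir, enum kind) and no
locking or netns handling; it only shows where the
len(subdirs) + sum(len(tarr)) term comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One file; a table is an array of these ending with a NULL name. */
struct entry { const char *name; };

/* Hypothetical wrapper around one registered table array. */
struct table_hdr {
	const struct entry *entries;
	struct table_hdr *next;
};

/* Hypothetical directory with its two lists. */
struct dir {
	const char *name;
	struct dir *subdirs;		/* child directories */
	struct dir *next;		/* next sibling */
	struct table_hdr *tables;	/* tables registered in this dir */
};

enum kind { FOUND_NONE, FOUND_DIR, FOUND_FILE };

/* Scan the subdirs first, then every entry of every table array. */
static enum kind lookup(const struct dir *d, const char *name)
{
	const struct dir *s;
	const struct table_hdr *t;
	const struct entry *e;

	assert(d != NULL && name != NULL);
	for (s = d->subdirs; s; s = s->next)
		if (strcmp(s->name, name) == 0)
			return FOUND_DIR;
	for (t = d->tables; t; t = t->next)
		for (e = t->entries; e->name; e++)
			if (strcmp(e->name, name) == 0)
				return FOUND_FILE;
	return FOUND_NONE;
}
```

A miss is the worst case: it visits every subdir and every table
entry of the directory exactly once.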

== Memory complexity ==

Although we create more ctl_table_header objects (one for each
directory, one for each table, and because we deleted the .child from
ctl_table there are more tables registered than before this patch), we
remove the need to store a full path (from the root to the table) as
was done in the old solution => a small O(N) memory gain with respect
to the old algorithm.

= Limitations =

== ctl_table does not have .child => some code uglification ==

Registering tables with multiple directories and files cannot be done
in a single operation: there must be at least one table registered for
each directory. This makes code that registers sysctls uglier (see the
earlier patches that remove .child from sched_domain and the root
table). Other places, e.g. the parport sysctls, look much better now
without .child: I can now read and understand that code.

== Handling of netns specific paths is weirder ==

The algorithm descriptions above are simplifications. In reality the
code needs to handle directories and files that must be visible only
in some network namespaces. E.g. the
/proc/sys/net/ipv4/conf/DEVICENAME/ directory and its files must be
visible only in the netns of that device.

The old algorithm used a secondary list that indexed all netns
specific headers. The algorithms stayed the same, except that besides
searching the global list they would also look into the current netns'
list of headers. This scales perfectly with respect to the number of
network namespaces.

The new algorithm does something similar, but a bit more complicated.
We also use netns specific lists of directories/tables and store them
in a special directory ctl_table_header (which I dubbed the
"netns-correspondent" of another directory - I'm not very pleased with
the name either).

When registering a net-ns specific table, we will create a
"netns-correspondent" to the last directory that is not net-ns
specific in that path.

E.g.: we're registering a netns specific table for 'lo':
      common path: /proc/sys/net/ipv4/
       netns path: /proc/sys/net/ipv4/conf/lo/

   We'll create an (unnamed) netns correspondent for 'ipv4' which will
   have 'conf' as its subdir.

E.g.: We're registering a netns specific file in /proc/sys/net/core/somaxconn
      common path: /proc/sys/net/core/
       netns path: /proc/sys/net/core/

We'll create an (unnamed) netns correspondent for 'core' with the
table containing 'somaxconn' in ctl_tables.

All net-ns correspondents of one netns are held in a single list, and
each netns gets its own list. This keeps the algorithm complexity
independent of the number of network namespaces (as was the old one).

However, now only a smaller part of directories are members of this
list, improving register/lookup/readdir time complexity.

There is one ugly limitation that stems from this approach.
E.g.: register these files in this order:
 - register common         /dir1/file-common1
 - register netns specific /dir1/dir2/file-netns
 - register common         /dir1/dir2/file-common2

  We'll have this tree:
   'dir1' { .subdirs = ['dir2'], .tables = ['file-common1'] }
     ^                    |
     |                    -> { .subdirs = [], .tables = ['file-common2'] }
     |
     | (unnamed netns-corresp for dir1)
     -> { .subdirs = ['dir2'] }
                        |
                        -> { .subdirs = [], .tables = ['file-netns'] }

readdir: when we list the contents of 'dir1' we'll see it has two
         sub-directories named 'dir2' each with a file in it.

lookup: lookup of /dir1/dir2/file-netns will not work because we find
        'dir2' as a subdir of 'dir1' and stick with it and never look
        into the netns correspondent of 'dir1'.
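
The shadowing can be reproduced with a small userspace model. The
sketch below is only an assumption-laden illustration of the behaviour
described here: struct dir, find_subdir, lookup_dir and has_file are
hypothetical names, files are plain string lists, and the
correspondent is a single pointer instead of a per-netns list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory node with an optional netns-correspondent. */
struct dir {
	const char *name;		/* NULL for an unnamed correspondent */
	struct dir *subdirs;		/* child directories */
	struct dir *next;		/* next sibling */
	const char *const *files;	/* NULL-terminated list, may be NULL */
	struct dir *corresp;		/* netns-correspondent, may be NULL */
};

static struct dir *find_subdir(const struct dir *d, const char *name)
{
	struct dir *s;

	for (s = d->subdirs; s; s = s->next)
		if (strcmp(s->name, name) == 0)
			return s;
	return NULL;
}

/*
 * One lookup step: a common subdir wins; the correspondent is only
 * consulted when the common directory has no subdir of that name.
 */
static struct dir *lookup_dir(const struct dir *d, const char *name)
{
	struct dir *s;

	assert(d != NULL && name != NULL);
	s = find_subdir(d, name);
	if (!s && d->corresp)
		s = find_subdir(d->corresp, name);
	return s;
}

static int has_file(const struct dir *d, const char *name)
{
	const char *const *f;

	for (f = d->files; f && *f; f++)
		if (strcmp(*f, name) == 0)
			return 1;
	return 0;
}
```

With the tree from the example, looking up 'dir2' in 'dir1' returns
the common 'dir2', so 'file-netns' in the correspondent's 'dir2' is
never reached.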

This can be fixed in two ways:

- A) by making sure to never register a netns specific directory and
  after that register that directory as a common one. From what I can
  tell there isn't such a problem in the kernel at the moment, but I
  did not study the source in detail.

- B) by increasing the complexity of the code:

  - readdir: looking at both lists and comparing if we have already
             listed a directory as common, so we don't list twice.
             -> For imbalanced trees this can make readdir O(N^2) :(

  - register: the netns 'dir2' from the example above needs to be
              connected to the common 'dir2' when 'dir2' is
              registered. I'm not even going to think of how time
              complexity/ugliness is going to explode here.

= Change summary =

* include/linux/sysctl.h
  - removed _set and _root, replaced with _group

  - netns correspondent directories are held in each netns's
    group->corresp_list

  - reused the header structure to represent directories which don't
    use ctl_table_arg, but store the directory name directly.

  - each directory header also gets two lists: subdirs and tables

* fs/proc/proc_sysctl.c
  - a proc inode has ->sysctl_entry set only for files, not
    directories as these store the dirname directly

  - lookup:
     - take the dir's read-lock and iterate through subdirs and tables
     - if nothing is found, try the dir's netns-correspondent

  - scan: list every subdir and file that was not listed before

  - readdir: scan the current dir and its netns correspondent

* kernel/sysctl.c
  - inlines the code of use_table/unuse_table as they are not used
    elsewhere (they used to be called from __register, but no longer are)

  - adds routines to get/set the netns-correspondent

  - adds routines to protect the subdirs/tables lists (rwsem)

  - __register_sysctl_paths:
    - preallocate ctl_table_header for every dir in 'path'
    - increase the ctl_header_refs of every existing directory
    - if the group needs a netns-correspondent it is created for the
      last existing directory that is part of the non-netns specific
      path.
    - all the non-existing directories are added as children of their
      parent's subdir lists.

  - unregister:
    - wait until no one uses the header
    - for normal directories and table-wrapper headers take the
      parent's write lock to be able to delete something from one of
      its lists (ctl_subdirs or ctl_tables).
    - netns-correspondent headers must take the netns group list lock
      before deleting.
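
The directory part of registration can be sketched in userspace as
below. This is a simplified model, not the patch's code: struct dir and
the helpers are hypothetical, there is no locking and no netns
correspondent, and headers are allocated on demand rather than
preallocated as __register_sysctl_paths does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a directory header. */
struct dir {
	const char *name;
	int refs;			/* registrations that use this dir */
	struct dir *parent;
	struct dir *first_child;
	struct dir *next_sibling;
};

static struct dir *find_child(struct dir *parent, const char *name)
{
	for (struct dir *d = parent->first_child; d; d = d->next_sibling)
		if (strcmp(d->name, name) == 0)
			return d;
	return NULL;
}

/* Walk @path (NULL-terminated) below @root. Existing directories get
 * their reference count increased, missing ones are created and linked
 * into their parent. Returns the last directory, or NULL on ENOMEM. */
static struct dir *mkdirs(struct dir *root, const char *const *path)
{
	struct dir *parent = root;

	for (; *path; path++) {
		struct dir *d = find_child(parent, *path);

		if (!d) {
			d = calloc(1, sizeof(*d));
			if (!d)
				return NULL;
			d->name = *path;
			d->parent = parent;
			d->next_sibling = parent->first_child;
			parent->first_child = d;
		}
		d->refs++;
		parent = d;
	}
	return parent;
}

static void free_children(struct dir *d)
{
	while (d->first_child) {
		struct dir *c = d->first_child;

		d->first_child = c->next_sibling;
		free_children(c);
		free(c);
	}
}

/* Register net/ipv4 and net/core; 'net' is shared by both paths. */
static int demo_shared_refs(void)
{
	static const char *const ipv4[] = { "net", "ipv4", NULL };
	static const char *const core[] = { "net", "core", NULL };
	struct dir root = { .name = "" };
	int refs = -1;

	if (mkdirs(&root, ipv4) && mkdirs(&root, core))
		refs = find_child(&root, "net")->refs;
	free_children(&root);
	return refs;
}
```

After both registrations the shared 'net' directory holds two
references, one per path that goes through it.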

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c       |  159 ++++++++-----
 include/linux/sysctl.h      |  120 +++++------
 include/net/net_namespace.h |    2 +-
 kernel/sysctl.c             |  533 ++++++++++++++++++++++++++----------------
 kernel/sysctl_check.c       |  168 +--------------
 net/sysctl_net.c            |   41 +---
 6 files changed, 499 insertions(+), 524 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 375d145..9337149 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -32,13 +32,14 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 	ei->sysctl_entry = table;
 
 	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
-	inode->i_mode = table->mode;
-	if (!table->child) {
-		inode->i_mode |= S_IFREG;
+
+	/* directories have table==NULL (thus ei->sysctl_entry is NULL too) */
+	if (table) {
+		inode->i_mode = S_IFREG | table->mode;
 		inode->i_op = &proc_sys_inode_operations;
 		inode->i_fop = &proc_sys_file_operations;
 	} else {
-		inode->i_mode |= S_IFDIR;
+		inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO;
 		inode->i_nlink = 0;
 		inode->i_op = &proc_sys_dir_operations;
 		inode->i_fop = &proc_sys_dir_file_operations;
@@ -66,42 +67,65 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 					struct nameidata *nd)
 {
 	struct ctl_table_header *head = sysctl_use_header(PROC_I(dir)->sysctl);
-	struct ctl_table *table = PROC_I(dir)->sysctl_entry;
-	struct ctl_table_header *h = NULL;
 	struct qstr *name = &dentry->d_name;
-	struct ctl_table *p;
+	struct ctl_table_header *h = NULL, *found_head = NULL;
+	struct ctl_table *table = NULL;
 	struct inode *inode;
 	struct dentry *err = ERR_PTR(-ENOENT);
 
+
 	if (IS_ERR(head))
 		return ERR_CAST(head);
 
-	if (table && !table->child) {
-		WARN_ON(1);
-		goto out;
+retry:
+	sysctl_read_lock_head(head);
+
+	/* first check whether a subdirectory has the searched-for name */
+	list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+		if (IS_ERR(sysctl_use_header(h)))
+			continue;
+
+		if (strcmp(name->name, h->ctl_dirname) == 0) {
+			found_head = h;
+			goto search_finished;
+		}
+		sysctl_unuse_header(h);
 	}
 
-	table = table ? table->child : head->ctl_table;
+	/* no subdir with that name, look for the file in the ctl_tables */
+	list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+		if (IS_ERR(sysctl_use_header(h)))
+			continue;
 
-	p = find_in_table(table, name);
-	if (!p) {
-		for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
-			if (h->attached_to != table)
-				continue;
-			p = find_in_table(h->attached_by, name);
-			if (p)
-				break;
+		table = find_in_table(h->ctl_table_arg, name);
+		if (table) {
+			found_head = h;
+			goto search_finished;
 		}
+		sysctl_unuse_header(h);
 	}
 
-	if (!p)
+search_finished:
+	sysctl_read_unlock_head(head);
+
+	if (!found_head) {
+		struct ctl_table_header *netns_corresp;
+		/* the item was not found in the dir's sub-directories
+		 * or tables. See if this dir has a netns
+		 * correspondent and restart the lookup in there. */
+		netns_corresp = sysctl_use_netns_corresp(head);
+		if (netns_corresp) {
+			sysctl_unuse_header(head);
+			head = netns_corresp;
+			goto retry;
+		}
+	}
+	if (!found_head)
 		goto out;
 
 	err = ERR_PTR(-ENOMEM);
-	inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
-	if (h)
-		sysctl_unuse_header(h);
-
+	inode = proc_sys_make_inode(dir->i_sb, found_head, table);
+	sysctl_unuse_header(found_head);
 	if (!inode)
 		goto out;
 
@@ -174,8 +198,8 @@ static int proc_sys_fill_cache(struct file *filp, void *dirent,
 	ino_t ino = 0;
 	unsigned type = DT_UNKNOWN;
 
-	qname.name = table->procname;
-	qname.len  = strlen(table->procname);
+	qname.name = table ? table->procname : head->ctl_dirname;
+	qname.len  = strlen(qname.name);
 	qname.hash = full_name_hash(qname.name, qname.len);
 
 	child = d_lookup(dir, &qname);
@@ -201,28 +225,56 @@ static int proc_sys_fill_cache(struct file *filp, void *dirent,
 	return !!filldir(dirent, qname.name, qname.len, filp->f_pos, ino, type);
 }
 
-static int scan(struct ctl_table_header *head, ctl_table *table,
+static int scan(struct ctl_table_header *head,
 		unsigned long *pos, struct file *file,
 		void *dirent, filldir_t filldir)
 {
+	struct ctl_table_header *h;
+	int res = 0;
 
-	for (; table->procname; table++, (*pos)++) {
-		int res;
+	sysctl_read_lock_head(head);
 
-		/* Can't do anything without a proc name */
-		if (!table->procname)
+	list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+		if (*pos < file->f_pos) {
+			(*pos)++;
 			continue;
+		}
 
-		if (*pos < file->f_pos)
+		if (IS_ERR(sysctl_use_header(h)))
 			continue;
 
-		res = proc_sys_fill_cache(file, dirent, filldir, head, table);
+		res = proc_sys_fill_cache(file, dirent, filldir, h, NULL);
+		sysctl_unuse_header(h);
 		if (res)
-			return res;
+			goto out;
 
 		file->f_pos = *pos + 1;
+		(*pos)++;
 	}
-	return 0;
+
+	list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+		ctl_table *t;
+
+		if (IS_ERR(sysctl_use_header(h)))
+			continue;
+
+		for (t = h->ctl_table_arg; t->procname; t++, (*pos)++) {
+			if (*pos < file->f_pos)
+				continue;
+
+			res = proc_sys_fill_cache(file, dirent, filldir, h, t);
+			if (res) {
+				sysctl_unuse_header(h);
+				goto out;
+			}
+			file->f_pos = *pos + 1;
+		}
+		sysctl_unuse_header(h);
+	}
+
+out:
+	sysctl_read_unlock_head(head);
+	return res;
 }
 
 static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
@@ -230,21 +282,12 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	struct dentry *dentry = filp->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 	struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
-	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
-	struct ctl_table_header *h = NULL;
 	unsigned long pos;
 	int ret = -EINVAL;
 
 	if (IS_ERR(head))
 		return PTR_ERR(head);
 
-	if (table && !table->child) {
-		WARN_ON(1);
-		goto out;
-	}
-
-	table = table ? table->child : head->ctl_table;
-
 	ret = 0;
 	/* Avoid a switch here: arm builds fail with missing __cmpdi2 */
 	if (filp->f_pos == 0) {
@@ -260,18 +303,20 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 		filp->f_pos++;
 	}
 	pos = 2;
-
-	ret = scan(head, table, &pos, filp, dirent, filldir);
-	if (ret)
-		goto out;
-
-	for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
-		if (h->attached_to != table)
-			continue;
-		ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
-		if (ret) {
-			sysctl_unuse_header(h);
-			break;
+	ret = scan(head, &pos, filp, dirent, filldir);
+	if (!ret) {
+		/* the netns-correspondent contains only those
+		 * subdirectories that are netns-specific, and not
+		 * shared with the @head directory: there is no
+		 * possibility to list the same directory twice (once
+		 * for @head and once for @netns_corresp). Sibling
+		 * tables cannot contain the entries with the same
+		 * name, no need to worry about them either. */
+		struct ctl_table_header *netns_corresp;
+		netns_corresp = sysctl_use_netns_corresp(head);
+		if (netns_corresp) {
+			ret = scan(netns_corresp, &pos, filp, dirent, filldir);
+			sysctl_unuse_header(netns_corresp);
 		}
 	}
 	ret = 1;
@@ -302,7 +347,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
 		return PTR_ERR(head);
 
 	table = PROC_I(inode)->sysctl_entry;
-	if (!table) /* global root - r-xr-xr-x */
+	if (!table) /* directory - r-xr-xr-x */
 		error = mask & MAY_WRITE ? -EACCES : 0;
 	else /* Use the permissions on the sysctl table entry */
 		error = sysctl_perm(head->ctl_group, table, mask);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index a12ab12..b626271 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -937,18 +937,12 @@ struct ctl_table;
 struct ctl_table_header;
 struct ctl_table_group;
 struct ctl_table_group_ops;
-struct nsproxy;
-struct ctl_table_root;
-
-struct ctl_table_set {
-	struct list_head list;
-	struct ctl_table_set *parent;
-};
 
 extern __init int sysctl_init(void);
 
-extern void setup_sysctl_set(struct ctl_table_set *p,
-			     struct ctl_table_set *parent);
+extern void sysctl_init_group(struct ctl_table_group *group,
+			      const struct ctl_table_group_ops *ops,
+			      int has_netns_corresp);
 
 
 /* get/put a reference to this header that
@@ -957,14 +951,23 @@ extern void sysctl_proc_inode_get(struct ctl_table_header *);
 extern void sysctl_proc_inode_put(struct ctl_table_header *);
 
 extern int sysctl_is_seen(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev);
-extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
-						struct ctl_table_header *prev);
-extern void sysctl_unuse_header(struct ctl_table_header *prev);
 extern int sysctl_perm(struct ctl_table_group *group,
 		       struct ctl_table *table, int op);
 
+/* protect the ctl_subdirs/ctl_tables lists */
+extern void sysctl_write_lock_head(struct ctl_table_header *head);
+extern void sysctl_write_unlock_head(struct ctl_table_header *head);
+extern void sysctl_read_lock_head(struct ctl_table_header *head);
+extern void sysctl_read_unlock_head(struct ctl_table_header *head);
+
+/* get/put references to this header with the purpose of using its internals.
+ * As long as the use count is not zero, there may be items accessing it,
+ * so we can't even remove it from the lists (ctl_entry). */
+extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
+extern struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *);
+extern void sysctl_unuse_header(struct ctl_table_header *prev);
+
+
 typedef struct ctl_table ctl_table;
 
 typedef int proc_handler (struct ctl_table *ctl, int write,
@@ -991,39 +994,29 @@ extern int proc_do_large_bitmap(struct ctl_table *, int,
 
 /*
  * Register a set of sysctl names by calling __register_sysctl_paths
- * with an initialised array of struct ctl_table's.  An entry with 
- * NULL procname terminates the table.  table->de will be
- * set up by the registration and need not be initialised in advance.
- *
- * sysctl names can be mirrored automatically under /proc/sys.  The
- * procname supplied controls /proc naming.
+ * with an initialised array of struct ctl_table's. An entry with a
+ * NULL procname terminates the table.
  *
  * The table's mode will be honoured both for sys_sysctl(2) and
- * proc-fs access.
+ * proc-fs access (sys_sysctl(2) uses procfs internally).
  *
- * Leaf nodes in the sysctl tree will be represented by a single file
- * under /proc; non-leaf nodes will be represented by directories.  A
- * null procname disables /proc mirroring at this node.
+ * Only files can be represented by ctl_table elements. Directories
+ * are implemented with ctl_table_header objects.
  *
- * sysctl(2) can automatically manage read and write requests through
- * the sysctl table.  The data and maxlen fields of the ctl_table
- * struct enable minimal validation of the values being written to be
- * performed, and the mode field allows minimal authentication.
- * 
- * There must be a proc_handler routine for any terminal nodes
- * mirrored under /proc/sys (non-terminals are handled by a built-in
- * directory handler).  Several default handlers are available to
- * cover common cases.
+ * The data and maxlen fields of the ctl_table struct enable minimal
+ * validation of the values being written to be performed, and the
+ * mode field allows minimal authentication.
+ *
+ * There must be a proc_handler routine for each ctl_table node.
+ * Several default handlers are available to cover common cases.
  */
 
 /* A sysctl table is an array of struct ctl_table: */
-struct ctl_table 
-{
+struct ctl_table {
 	const char *procname;		/* Text ID for /proc/sys, or zero */
 	void *data;
 	int maxlen;
 	mode_t mode;
-	struct ctl_table *child;
 	proc_handler *proc_handler;	/* Callback for text formatting */
 	void *extra1;
 	void *extra2;
@@ -1035,8 +1028,8 @@ struct ctl_table_group_ops {
 	 * netns in which that eth0 interface lives.
 	 *
 	 * If this hook is not set, then all the sysctl entries in
-	 * this set are always visible. */
-	int (*is_seen)(struct ctl_table_set *set);
+	 * this group are always visible. */
+	int (*is_seen)(struct ctl_table_group *group);
 
 	/* hook to alter permissions for some sysctl nodes at runtime */
 	int (*permissions)(struct ctl_table *table);
@@ -1044,22 +1037,24 @@ struct ctl_table_group_ops {
 
 struct ctl_table_group {
 	const struct ctl_table_group_ops *ctl_ops;
-};
-
-struct ctl_table_root {
-	struct list_head root_list;
-	struct ctl_table_set default_set;
-	struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
-					   struct nsproxy *namespaces);
+	/* A list of ctl_table_header elements that represent the
+	 * netns-specific correspondents of some sysctl directories */
+	struct list_head corresp_list;
+	/* binary: whether this group uses the @corresp_list */
+	char has_netns_corresp;
 };
 
 /* struct ctl_table_header is used to maintain dynamic lists of
    struct ctl_table trees. */
-struct ctl_table_header
-{
+struct ctl_table_header {
 	union {
 		struct {
-			struct ctl_table *ctl_table;
+			/* a header is used either as a wrapper for a
+			 * ctl_table array or as a directory entry. */
+			union {
+				struct ctl_table *ctl_table_arg;
+				const char *ctl_dirname;
+			};
 			struct list_head ctl_entry;
 			/* references to this header from contexts that
 			 * can access fields of this header */
@@ -1075,12 +1070,13 @@ struct ctl_table_header
 		struct rcu_head rcu;
 	};
 	struct completion *unregistering;
-	struct ctl_table *ctl_table_arg;
-	struct ctl_table_root *root;
 	struct ctl_table_group *ctl_group;
-	struct ctl_table_set *set;
-	struct ctl_table *attached_by;
-	struct ctl_table *attached_to;
+
+	/* Lists of other ctl_table_headers that represent either
+	 * subdirectories or ctl_tables of files. Add/remove and walk
+	 * this list holding the header's read/write lock. */
+	struct list_head ctl_tables;
+	struct list_head ctl_subdirs;
 	struct ctl_table_header *parent;
 };
 
@@ -1089,18 +1085,12 @@ struct ctl_path {
 	const char *procname;
 };
 
-void register_sysctl_root(struct ctl_table_root *root);
-struct ctl_table_header *__register_sysctl_paths(
-	struct ctl_table_root *root,
-	struct ctl_table_group *group,
-	struct nsproxy *namespaces,
-	const struct ctl_path *path,
-	struct ctl_table *table);
-struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
-						struct ctl_table *table);
-
-void unregister_sysctl_table(struct ctl_table_header * table);
-int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table);
+extern struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *g,
+							const struct ctl_path *p,
+							struct ctl_table *table);
+extern struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
+						      struct ctl_table *table);
+extern void unregister_sysctl_table(struct ctl_table_header *table);
 
 #endif /* __KERNEL__ */
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 3ae4919..871dd2b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -52,7 +52,7 @@ struct net {
 	struct proc_dir_entry 	*proc_net_stat;
 
 #ifdef CONFIG_SYSCTL
-	struct ctl_table_set	sysctls;
+	struct ctl_table_group	netns_ctl_group;
 #endif
 
 	struct sock 		*rtnl;			/* rtnetlink socket */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a863b56..cbf33b1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -56,6 +56,7 @@
 #include <linux/kprobes.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/oom.h>
+#include <linux/rwsem.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -201,23 +202,16 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,
 static const struct ctl_table_group_ops root_table_group_ops = { };
 
 static struct ctl_table_group root_table_group = {
+	.has_netns_corresp = 0,
 	.ctl_ops = &root_table_group_ops,
 };
 
-static struct ctl_table root_table[];
-static struct ctl_table_root sysctl_table_root;
 static struct ctl_table_header root_table_header = {
 	{{.ctl_header_refs = 1,
-	.ctl_table = root_table,
-	.ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),}},
-	.root = &sysctl_table_root,
-	.ctl_group = &root_table_group,
-	.set = &sysctl_table_root.default_set,
-};
-
-static struct ctl_table_root sysctl_table_root = {
-	.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
-	.default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
+	  .ctl_entry	= LIST_HEAD_INIT(root_table_header.ctl_entry),}},
+	.ctl_tables	= LIST_HEAD_INIT(root_table_header.ctl_tables),
+	.ctl_subdirs	= LIST_HEAD_INIT(root_table_header.ctl_subdirs),
+	.ctl_group	= &root_table_group,
 };
 
 #ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
@@ -226,10 +220,6 @@ int sysctl_legacy_va_layout;
 
 /* The default sysctl tables: */
 
-static struct ctl_table root_table[] = {
-	{ }
-};
-
 #ifdef CONFIG_SCHED_DEBUG
 static int min_sched_granularity_ns = 100000;		/* 100 usecs */
 static int max_sched_granularity_ns = NSEC_PER_SEC;	/* 1 second */
@@ -1575,78 +1565,76 @@ void sysctl_proc_inode_put(struct ctl_table_header *head)
 	spin_unlock(&sysctl_lock);
 }
 
-static struct ctl_table_set *
-lookup_header_set(struct ctl_table_root *root, struct nsproxy *namespaces)
-{
-	struct ctl_table_set *set = &root->default_set;
-	if (root->lookup)
-		set = root->lookup(root, namespaces);
-	return set;
-}
-
-static struct list_head *
-lookup_header_list(struct ctl_table_root *root, struct nsproxy *namespaces)
-{
-	struct ctl_table_set *set = lookup_header_set(root, namespaces);
-	return &set->list;
-}
-
-struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
-					    struct ctl_table_header *prev)
+/*
+ * Find the netns correspondent of @head. If it is not found and @dflt
+ * is != NULL, set dflt to be the netns correspondent of @head.
+ */
+static struct ctl_table_header *sysctl_use_netns_corresp_dflt(
+	struct ctl_table_group *group,
+	struct ctl_table_header *head,
+	struct ctl_table_header *dflt)
 {
-	struct ctl_table_root *root;
-	struct list_head *header_list;
-	struct ctl_table_header *head;
-	struct list_head *tmp;
+	struct ctl_table_header *h, *ret = NULL;
 
 	spin_lock(&sysctl_lock);
-	if (prev) {
-		head = prev;
-		tmp = &prev->ctl_entry;
-		__sysctl_unuse_header(prev);
-		goto next;
+	list_for_each_entry(h, &group->corresp_list, ctl_entry) {
+		if (h->parent != head)
+			continue;
+		if (IS_ERR(__sysctl_use_header(h)))
+			continue;
+		ret = h;
+		goto out;
 	}
-	tmp = &root_table_header.ctl_entry;
-	for (;;) {
-		head = list_entry(tmp, struct ctl_table_header, ctl_entry);
 
-		if (IS_ERR(__sysctl_use_header(head)))
-			goto next;
-		spin_unlock(&sysctl_lock);
-		return head;
-	next:
-		root = head->root;
-		tmp = tmp->next;
-		header_list = lookup_header_list(root, namespaces);
-		if (tmp != header_list)
-			continue;
+	if (!dflt)
+		goto out;
+
+	/* will not fail because dflt is a brand-new header that no
+	 * one has seen yet, so no one has started to unregister it */
+	dflt = __sysctl_use_header(dflt);
+	dflt->ctl_dirname = NULL; /* this marks the header as a netns-corresp */
+	dflt->parent = head;
+	list_add_tail(&dflt->ctl_entry, &group->corresp_list);
+	ret = dflt;
 
-		do {
-			root = list_entry(root->root_list.next,
-					struct ctl_table_root, root_list);
-			if (root == &sysctl_table_root)
-				goto out;
-			header_list = lookup_header_list(root, namespaces);
-		} while (list_empty(header_list));
-		tmp = header_list->next;
-	}
 out:
 	spin_unlock(&sysctl_lock);
-	return NULL;
+	return ret;
 }
 
-struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev)
+struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *h)
 {
-	return __sysctl_use_next_header(current->nsproxy, prev);
+	struct ctl_table_group *g = &current->nsproxy->net_ns->netns_ctl_group;
+	/* dflt == NULL means: if there's a netns corresp return it,
+	 *                     if there isn't, just return NULL */
+	return sysctl_use_netns_corresp_dflt(g, h, NULL);
 }
 
-void register_sysctl_root(struct ctl_table_root *root)
+
+/* This semaphore protects the ctl_subdirs and ctl_tables lists. You
+ * must also have incremented the _use_refs of the header before
+ * accessing any field of the header including these lists. If it's
+ * deemed necessary, we can create a per-header rwsem. For now a
+ * global one will do. */
+static DECLARE_RWSEM(sysctl_rwsem);
+void sysctl_write_lock_head(struct ctl_table_header *head)
 {
-	spin_lock(&sysctl_lock);
-	list_add_tail(&root->root_list, &sysctl_table_root.root_list);
-	spin_unlock(&sysctl_lock);
+	down_write(&sysctl_rwsem);
+}
+void sysctl_write_unlock_head(struct ctl_table_header *head)
+{
+	up_write(&sysctl_rwsem);
+}
+void sysctl_read_lock_head(struct ctl_table_header *head)
+{
+	down_read(&sysctl_rwsem);
+}
+void sysctl_read_unlock_head(struct ctl_table_header *head)
+{
+	up_read(&sysctl_rwsem);
 }
 
+
 /*
  * sysctl_perm does NOT grant the superuser all rights automatically, because
  * some sysctl variables are readonly even to root.
@@ -1710,10 +1698,6 @@ __init int sysctl_init(void)
 		goto fail_register_binfmt_misc;
 #endif
 
-
-#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
-	sysctl_check_table(current->nsproxy, root_table);
-#endif
 	return 0;
 
 
@@ -1734,57 +1718,214 @@ fail_register_kern:
 	return -ENOMEM;
 }
 
-static struct ctl_table *is_branch_in(struct ctl_table *branch,
-				      struct ctl_table *table)
+static void header_refs_inc(struct ctl_table_header*head)
 {
-	struct ctl_table *p;
-	const char *s = branch->procname;
+	spin_lock(&sysctl_lock);
+	head->ctl_header_refs ++;
+	spin_unlock(&sysctl_lock);
+}
 
-	/* branch should have named subdirectory as its first element */
-	if (!s || !branch->child)
-		return NULL;
+static int ctl_path_items(const struct ctl_path *path)
+{
+	int n = 0;
+	while (path->procname) {
+		path++;
+		n++;
+	}
+	return n;
+}
 
-	/* ... and nothing else */
-	if (branch[1].procname)
+
+static struct ctl_table_header *alloc_sysctl_header(struct ctl_table_group *group)
+{
+	struct ctl_table_header *h;
+
+	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	if (!h)
 		return NULL;
 
-	/* table should contain subdirectory with the same name */
-	for (p = table; p->procname; p++) {
-		if (!p->child)
-			continue;
-		if (p->procname && strcmp(p->procname, s) == 0)
-			return p;
+	h->ctl_group = group;
+	INIT_LIST_HEAD(&h->ctl_entry);
+	INIT_LIST_HEAD(&h->ctl_subdirs);
+	INIT_LIST_HEAD(&h->ctl_tables);
+	return h;
+}
+
+/* Increment the references to an existing subdir of @parent with the name
+ * @name and return that subdir. If no such subdir exists, return NULL.
+ * Called under the write lock protecting parent's ctl_subdirs. */
+static struct ctl_table_header *mkdir_existing_dir(struct ctl_table_header *parent,
+						   const char *name)
+{
+	struct ctl_table_header *h;
+	list_for_each_entry(h, &parent->ctl_subdirs, ctl_entry) {
+		spin_lock(&sysctl_lock);
+		if (likely(!h->unregistering)) {
+			if (strcmp(name, h->ctl_dirname) == 0) {
+				h->ctl_header_refs ++;
+				spin_unlock(&sysctl_lock);
+				return h;
+			}
+		}
+		spin_unlock(&sysctl_lock);
 	}
 	return NULL;
 }
 
-/* see if attaching q to p would be an improvement */
-static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
+/* Some sysctl paths are netns-specific. The last directory that is
+ * not netns specific will have a correspondent dir in the netns
+ * specific ctl_table_group. That correspondent will hold the lists of
+ * netns specific tables and subdirectories.
+ *
+ * E.g.: registering netns/interface specific directories:
+ *       common path: /proc/sys/net/ipv4/
+ *        netns path: /proc/sys/net/ipv4/conf/lo/
+ * We'll create an (unnamed) netns correspondent for 'ipv4' which will
+ * have 'conf' as its subdir.
+ *
+ * E.g.: We're registering a netns specific file in /proc/sys/net/core/somaxconn
+ *       common path: /proc/sys/net/core/
+ *        netns path: /proc/sys/net/core/
+ * We'll create an (unnamed) netns correspondent for 'core'.
+ */
+static struct ctl_table_header *mkdir_netns_corresp(
+	struct ctl_table_header *parent,
+	struct ctl_table_group *group,
+	struct ctl_table_header **__netns_corresp)
+{
+	struct ctl_table_header *ret;
+
+	ret = sysctl_use_netns_corresp_dflt(group, parent, *__netns_corresp);
+
+	/* *__netns_corresp is a pre-allocated header. If we used it
+	 * here, we have to tell the caller so it won't free it. */
+	if (*__netns_corresp == ret)
+		*__netns_corresp = NULL;
+
+	header_refs_inc(ret);
+	sysctl_unuse_header(ret);
+	return ret;
+}
+
+/* Add @dir as a subdir of @parent.
+ * Called under the write lock protecting parent's ctl_subdirs. */
+static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
+					      struct ctl_table_header *dir)
+{
+	dir->parent = parent;
+	header_refs_inc(dir);
+	list_add_tail(&dir->ctl_entry, &parent->ctl_subdirs);
+	return dir;
+}
+
+/*
+ * Attach the branch denoted by @dirs (a series of directories that
+ * are children of their predecessor in the array) to @parent.
+ *
+ * If, at some level, the parent tree already has a node with the same
+ * name as the one we're trying to add, increment that node's
+ * ctl_header_refs. If not, add that dir as a subdir of its parent.
+ *
+ * Nodes that remain non-NULL in @dirs must be freed by the caller as
+ * they were not added to the tree.
+ *
+ * Return the corresponding ctl_table_header for dirs[nr_dirs-1] from
+ * the tree (either one added by this function, or one already in the
+ * tree).
+ */
+static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
+					      struct ctl_table_group *group,
+					      const struct ctl_path *path,
+					      int nr_dirs)
 {
-	struct ctl_table *to = p->ctl_table, *by = q->ctl_table;
-	struct ctl_table *next;
-	int is_better = 0;
-	int not_in_parent = !p->attached_by;
+	struct ctl_table_header *dirs[CTL_MAXNAME];
+	struct ctl_table_header *__netns_corresp = NULL;
+	int create_first_netns_corresp = group->has_netns_corresp;
+	int i;
+
+	/* We create excess ctl_table_header for directory entries.
+	 * We do so because we may need new headers while under a lock
+	 * where we will not be able to allocate entries (sleeping).
+	 * Also, this simplifies handling of ENOMEM: no need to remove
+	 * already allocated/added directories and unlink them from
+	 * their parent directories. Stuff that is not used will be
+	 * freed at the end. */
+	for (i = 0; i < nr_dirs; i++) {
+		dirs[i] = alloc_sysctl_header(group);
+		if (!dirs[i])
+			goto err_alloc_dir;
+		dirs[i]->ctl_dirname = path[i].procname;
+	}
 
-	while ((next = is_branch_in(by, to)) != NULL) {
-		if (by == q->attached_by)
-			is_better = 1;
-		if (to == p->attached_by)
-			not_in_parent = 1;
-		by = by->child;
-		to = next->child;
+	if (create_first_netns_corresp) {
+		/* The netns correspondent for the last common path
+		 * component might exist.  However we will only know
+		 * this later while being under a lock. We
+		 * pre-allocate it just in case it might be needed and
+		 * free it at the end only if it wasn't used. */
+		__netns_corresp = alloc_sysctl_header(group);
+		if (!__netns_corresp)
+			goto err_alloc_coresp;
 	}
 
-	if (is_better && not_in_parent) {
-		q->attached_by = by;
-		q->attached_to = to;
-		q->parent = p;
+	header_refs_inc(parent);
+
+	for (i = 0; i < nr_dirs; i++) {
+		struct ctl_table_header *h;
+
+	retry:
+		sysctl_write_lock_head(parent);
+
+		h = mkdir_existing_dir(parent, dirs[i]->ctl_dirname);
+		if (h != NULL) {
+			sysctl_write_unlock_head(parent);
+			parent = h;
+			continue;
+		}
+
+		if (likely(!create_first_netns_corresp)) {
+			h = mkdir_new_dir(parent, dirs[i]);
+			sysctl_write_unlock_head(parent);
+			parent = h;
+			dirs[i] = NULL; /* I'm used, don't free me */
+			continue;
+		}
+
+		sysctl_write_unlock_head(parent);
+
+		create_first_netns_corresp = 0;
+		parent = mkdir_netns_corresp(parent, group, &__netns_corresp);
+		/* We still have to add the new subdirectory, but
+		 * instead of adding it into the common parent, add it
+		 * to its netns correspondent. */
+		goto retry;
 	}
+
+	if (create_first_netns_corresp)
+		parent = mkdir_netns_corresp(parent, group, &__netns_corresp);
+
+	if (__netns_corresp)
+		kfree(__netns_corresp);
+
+	/* free unused pre-allocated entries */
+	for (i = 0; i < nr_dirs; i++)
+		if (dirs[i])
+			kfree(dirs[i]);
+
+	return parent;
+
+err_alloc_coresp:
+	i = nr_dirs;
+err_alloc_dir:
+	for (i--; i >= 0; i--)
+		kfree(dirs[i]);
+	return NULL;
+
 }
 
 /**
  * __register_sysctl_paths - register a sysctl hierarchy
- * @root: List of sysctl headers to register on
+ * @group: Group of sysctl headers to register on
  * @namespaces: Data to compute which lists of sysctl entries are visible
  * @path: The path to the directory the sysctl table is in.
  * @table: the top-level table structure
@@ -1803,9 +1944,6 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
  *
  * mode - the file permissions for the /proc/sys file, and for sysctl(2)
  *
- * child - a pointer to the child sysctl table if this entry is a directory, or
- *         %NULL.
- *
  * proc_handler - the text handler routine (described below)
  *
  * de - for internal use by the sysctl routines
@@ -1835,78 +1973,28 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
  * This routine returns %NULL on a failure to register, and a pointer
  * to the table header on success.
  */
-struct ctl_table_header *__register_sysctl_paths(
-	struct ctl_table_root *root,
-	struct ctl_table_group *group,
-	struct nsproxy *namespaces,
+struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
 	const struct ctl_path *path, struct ctl_table *table)
 {
 	struct ctl_table_header *header;
-	struct ctl_table *new, **prevp;
-	unsigned int n, npath;
-	struct ctl_table_set *set;
-
-	/* Count the path components */
-	for (npath = 0; path[npath].procname; ++npath)
-		;
+	int nr_dirs = ctl_path_items(path);
 
-	/*
-	 * For each path component, allocate a 2-element ctl_table array.
-	 * The first array element will be filled with the sysctl entry
-	 * for this, the second will be the sentinel (procname == 0).
-	 *
-	 * We allocate everything in one go so that we don't have to
-	 * worry about freeing additional memory in unregister_sysctl_table.
-	 */
-	header = kzalloc(sizeof(struct ctl_table_header) +
-			 (2 * npath * sizeof(struct ctl_table)), GFP_KERNEL);
+	header = alloc_sysctl_header(group);
 	if (!header)
 		return NULL;
 
-	new = (struct ctl_table *) (header + 1);
-
-	/* Now connect the dots */
-	prevp = &header->ctl_table;
-	for (n = 0; n < npath; ++n, ++path) {
-		/* Copy the procname */
-		new->procname = path->procname;
-		new->mode     = 0555;
-
-		*prevp = new;
-		prevp = &new->child;
-
-		new += 2;
-	}
-	*prevp = table;
-	header->ctl_table_arg = table;
-
-	INIT_LIST_HEAD(&header->ctl_entry);
-	header->unregistering = NULL;
-	header->root = root;
-	header->ctl_group = group;
-	header->ctl_header_refs = 1;
-#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
-	if (sysctl_check_table(namespaces, header->ctl_table)) {
+	header->parent = sysctl_mkdirs(&root_table_header, group, path, nr_dirs);
+	if (!header->parent) {
 		kfree(header);
 		return NULL;
 	}
-#endif
-	spin_lock(&sysctl_lock);
-	header->set = lookup_header_set(root, namespaces);
-	header->attached_by = header->ctl_table;
-	header->attached_to = root_table;
-	header->parent = &root_table_header;
-	for (set = header->set; set; set = set->parent) {
-		struct ctl_table_header *p;
-		list_for_each_entry(p, &set->list, ctl_entry) {
-			if (p->unregistering)
-				continue;
-			try_attach(p, header);
-		}
-	}
-	header->parent->ctl_header_refs++;
-	list_add_tail(&header->ctl_entry, &header->set->list);
-	spin_unlock(&sysctl_lock);
+
+	header->ctl_table_arg = table;
+	header->ctl_header_refs = 1;
+
+	sysctl_write_lock_head(header->parent);
+	list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+	sysctl_write_unlock_head(header->parent);
 
 	return header;
 }
@@ -1924,8 +2012,7 @@ struct ctl_table_header *__register_sysctl_paths(
 struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						struct ctl_table *table)
 {
-	return __register_sysctl_paths(&sysctl_table_root, &root_table_group,
-				       current->nsproxy, path, table);
+	return __register_sysctl_paths(&root_table_group, path, table);
 }
 
 /**
@@ -1935,31 +2022,67 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
  * Unregisters the sysctl table and all children. proc entries may not
  * actually be removed until they are no longer used by anyone.
  */
-void unregister_sysctl_table(struct ctl_table_header * header)
+void unregister_sysctl_table(struct ctl_table_header *header)
 {
 	might_sleep();
 
-	if (header == NULL)
-		return;
+	while(header->parent) {
+		struct ctl_table_header *parent = header->parent;
 
-	spin_lock(&sysctl_lock);
-	start_unregistering(header);
-
-	/* after start_unregistering has finished no one holds a
-	 * ctl_use_refs or is able to acquire one => no one is going
-	 * to access internal fields of this object, so we can remove
-	 * it from the list and schedule it for deletion. */
-	list_del_init(&p->ctl_entry);
-
-	if (!--header->parent->ctl_header_refs) {
-		WARN_ON(1);
-		if (!header->parent->ctl_procfs_refs)
-			call_rcu(&header->parent->rcu, free_head);
-	}
-	if (!--header->ctl_header_refs)
+		/* the three counters (ctl_header_refs, ctl_procfs_refs
+		 * and ctl_use_refs) are protected by the spin lock. */
+		spin_lock(&sysctl_lock);
+		if (header->ctl_header_refs > 1) {
+			/* other headers need a reference to this one. Just
+			 * mark that we don't need it and leave it as it is. */
+			header->ctl_header_refs --;
+			spin_unlock(&sysctl_lock);
+
+			goto unregister_parent;
+		}
+
+		/* header->ctl_header_refs is 1. We hold the only
+		 * ctl_header_refs reference, but others may still
+		 * hold _use_refs and _procfs_refs. We first need to
+		 * wait until no one is actively using this object
+		 * (that means until ctl_use_refs==0). While waiting
+		 * no one will increase this header's refs because we
+		 * set ->unregistering. */
+		start_unregistering(header);
+		spin_unlock(&sysctl_lock);
+
+		if (!header->ctl_dirname) {
+			/* the header is a netns correspondent of it's
+			 * parent. It is a member of it's netns
+			 * specific ctl_table_group list. For not that
+			 * list is protected by sysctl_lock. */
+			spin_lock(&sysctl_lock);
+			list_del_init(&header->ctl_entry);
+			spin_unlock(&sysctl_lock);
+		} else {
+			/* ctl_entry is a member of the parent's
+			 * ctl_tables/subdirs lists which are
+			 * protected by the parent's write lock. */
+			sysctl_write_lock_head(parent);
+			list_del_init(&header->ctl_entry);
+			sysctl_write_unlock_head(parent);
+		}
+
+		spin_lock(&sysctl_lock);
+		/* something is wrong in the register/unregister code
+		 * if this BUG triggers. No one should have changed the
+		 * _header_refs of this header after start_unregistering */
+		BUG_ON(header->ctl_header_refs != 1);
+
+		header->ctl_header_refs --;
 		if (!header->ctl_procfs_refs)
 			call_rcu(&header->rcu, free_head);
-	spin_unlock(&sysctl_lock);
+
+		spin_unlock(&sysctl_lock);
+
+unregister_parent:
+		header = parent;
+	}
 }
 
 int sysctl_is_seen(struct ctl_table_header *p)
@@ -1972,16 +2095,19 @@ int sysctl_is_seen(struct ctl_table_header *p)
 	else if (!ops->is_seen)
 		res = 1;
 	else
-		res = ops->is_seen(p->set);
+		res = ops->is_seen(p->ctl_group);
 	spin_unlock(&sysctl_lock);
 	return res;
 }
 
-void setup_sysctl_set(struct ctl_table_set *p,
-		      struct ctl_table_set *parent)
+void sysctl_init_group(struct ctl_table_group *group,
+		       const struct ctl_table_group_ops *ops,
+		       int has_netns_corresp)
 {
-	INIT_LIST_HEAD(&p->list);
-	p->parent = parent ? parent : &sysctl_table_root.default_set;
+	group->ctl_ops = ops;
+	group->has_netns_corresp = has_netns_corresp;
+	if (has_netns_corresp)
+		INIT_LIST_HEAD(&group->corresp_list);
 }
 
 #else /* !CONFIG_SYSCTL */
@@ -1995,8 +2121,9 @@ void unregister_sysctl_table(struct ctl_table_header * table)
 {
 }
 
-void setup_sysctl_set(struct ctl_table_set *p,
-		      struct ctl_table_set *parent)
+void sysctl_init_group(struct ctl_table_group *group,
+		       const struct ctl_table_group_ops *ops,
+		       int has_netns_corresp)
 {
 }
 
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 44c31f0..e9a7a58 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1,167 +1 @@
-#include <linux/stat.h>
-#include <linux/sysctl.h>
-#include "../fs/xfs/linux-2.6/xfs_sysctl.h"
-#include <linux/sunrpc/debug.h>
-#include <linux/string.h>
-#include <net/ip_vs.h>
-
-
-static void sysctl_print_path(struct ctl_table *table,
-			      struct ctl_table **parents, int depth)
-{
-	struct ctl_table *p;
-	int i;
-	if (table->procname) {
-		for (i = 0; i < depth; i++) {
-			p = parents[i];
-			printk("/%s", p->procname ? p->procname : "");
-		}
-		printk("/%s", table->procname);
-	}
-	printk(" ");
-}
-
-static struct ctl_table *sysctl_check_lookup(struct nsproxy *namespaces,
-	     struct ctl_table *table, struct ctl_table **parents, int depth)
-{
-	struct ctl_table_header *head;
-	struct ctl_table *ref, *test;
-	int cur_depth;
-
-	for (head = __sysctl_use_next_header(namespaces, NULL); head;
-	     head = __sysctl_use_next_header(namespaces, head)) {
-		cur_depth = depth;
-		ref = head->ctl_table;
-repeat:
-		test = parents[depth - cur_depth];
-		for (; ref->procname; ref++) {
-			int match = 0;
-			if (cur_depth && !ref->child)
-				continue;
-
-			if (test->procname && ref->procname &&
-			    (strcmp(test->procname, ref->procname) == 0))
-					match++;
-
-			if (match) {
-				if (cur_depth != 0) {
-					cur_depth--;
-					ref = ref->child;
-					goto repeat;
-				}
-				goto out;
-			}
-		}
-	}
-	ref = NULL;
-out:
-	sysctl_unuse_header(head);
-	return ref;
-}
-
-static void set_fail(const char **fail, struct ctl_table *table,
-	     const char *str, struct ctl_table **parents, int depth)
-{
-	if (*fail) {
-		printk(KERN_ERR "sysctl table check failed: ");
-		sysctl_print_path(table, parents, depth);
-		printk(" %s\n", *fail);
-		dump_stack();
-	}
-	*fail = str;
-}
-
-static void sysctl_check_leaf(struct nsproxy *namespaces,
-			      struct ctl_table *table, const char **fail,
-			      struct ctl_table **parents, int depth)
-{
-	struct ctl_table *ref;
-
-	ref = sysctl_check_lookup(namespaces, table, parents, depth);
-	if (ref && (ref != table))
-		set_fail(fail, table, "Sysctl already exists", parents, depth);
-}
-
-
-
-#define SET_FAIL(str) set_fail(&fail, table, str, parents, depth)
-
-static int __sysctl_check_table(struct nsproxy *namespaces,
-	struct ctl_table *table, struct ctl_table **parents, int depth)
-{
-	const char *fail = NULL;
-	int error = 0;
-
-	if (depth >= CTL_MAXNAME) {
-		SET_FAIL("Sysctl tree too deep");
-		return -EINVAL;
-	}
-
-	for (; table->procname; table++) {
-		fail = NULL;
-
-
-		if (depth != 0) { /* has parent */
-			if (!parents[depth - 1]->procname)
-				SET_FAIL("Parent without procname");
-		}
-		if (table->child) {
-			if (table->data)
-				SET_FAIL("Directory with data?");
-			if (table->maxlen)
-				SET_FAIL("Directory with maxlen?");
-			if ((table->mode & (S_IRUGO|S_IXUGO)) != table->mode)
-				SET_FAIL("Writable sysctl directory");
-			if (table->proc_handler)
-				SET_FAIL("Directory with proc_handler");
-			if (table->extra1)
-				SET_FAIL("Directory with extra1");
-			if (table->extra2)
-				SET_FAIL("Directory with extra2");
-		} else {
-			if ((table->proc_handler == proc_dostring) ||
-			    (table->proc_handler == proc_dointvec) ||
-			    (table->proc_handler == proc_dointvec_minmax) ||
-			    (table->proc_handler == proc_dointvec_jiffies) ||
-			    (table->proc_handler == proc_dointvec_userhz_jiffies) ||
-			    (table->proc_handler == proc_dointvec_ms_jiffies) ||
-			    (table->proc_handler == proc_doulongvec_minmax) ||
-			    (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
-				if (!table->data)
-					SET_FAIL("No data");
-				if (!table->maxlen)
-					SET_FAIL("No maxlen");
-			}
-#ifdef CONFIG_PROC_SYSCTL
-			if (!table->proc_handler)
-				SET_FAIL("No proc_handler");
-#endif
-			parents[depth] = table;
-			sysctl_check_leaf(namespaces, table, &fail,
-					  parents, depth);
-		}
-		if (table->mode > 0777)
-			SET_FAIL("bogus .mode");
-		if (fail) {
-			SET_FAIL(NULL);
-			error = -EINVAL;
-		}
-		if (table->child) {
-			parents[depth] = table;
-			error |= __sysctl_check_table(namespaces, table->child,
-						      parents, depth + 1);
-		}
-	}
-	return error;
-}
-
-
-int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
-{
-	struct ctl_table *parents[CTL_MAXNAME];
-	/* Keep track of parents as we go down into the tree:
-	 * - the node at depth 'd' will have the parent at parents[d-1].
-	 * - the root node (depth=0) has no parent in this array.
-	 */
-	return __sysctl_check_table(namespaces, table, parents, 0);
-}
+/* will be rewritten */
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 5009d4e..f610879 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -29,15 +29,9 @@
 #include <linux/if_tr.h>
 #endif
 
-static struct ctl_table_set *
-net_ctl_header_lookup(struct ctl_table_root *root, struct nsproxy *namespaces)
+static int is_seen(struct ctl_table_group *group)
 {
-	return &namespaces->net_ns->sysctls;
-}
-
-static int is_seen(struct ctl_table_set *set)
-{
-	return &current->nsproxy->net_ns->sysctls == set;
+	return &current->nsproxy->net_ns->netns_ctl_group == group;
 }
 
 /* Return standard mode bits for table entry. */
@@ -56,14 +50,6 @@ static const struct ctl_table_group_ops net_sysctl_group_ops = {
 	.permissions = net_ctl_permissions,
 };
 
-static struct ctl_table_group net_sysctl_group = {
-	.ctl_ops = &net_sysctl_group_ops,
-};
-
-static struct ctl_table_root net_sysctl_root = {
-	.lookup = net_ctl_header_lookup,
-};
-
 static int net_ctl_ro_header_permissions(ctl_table *table)
 {
 	if (net_eq(current->nsproxy->net_ns, &init_net))
@@ -77,21 +63,22 @@ static const struct ctl_table_group_ops net_sysctl_ro_group_ops = {
 };
 
 static struct ctl_table_group net_sysctl_ro_group = {
+	.has_netns_corresp = 0,
 	.ctl_ops = &net_sysctl_ro_group_ops,
 };
 
-static struct ctl_table_root net_sysctl_ro_root = { };
-
 static int __net_init sysctl_net_init(struct net *net)
 {
-	setup_sysctl_set(&net->sysctls,
-			 &net_sysctl_ro_root.default_set);
+	int has_netns_corresp = 1;
+
+	sysctl_init_group(&net->netns_ctl_group, &net_sysctl_group_ops,
+			  has_netns_corresp);
 	return 0;
 }
 
 static void __net_exit sysctl_net_exit(struct net *net)
 {
-	WARN_ON(!list_empty(&net->sysctls.list));
+	WARN_ON(!list_empty(&net->netns_ctl_group.corresp_list));
 }
 
 static struct pernet_operations sysctl_pernet_ops = {
@@ -105,9 +92,6 @@ static __init int net_sysctl_init(void)
 	ret = register_pernet_subsys(&sysctl_pernet_ops);
 	if (ret)
 		goto out;
-	register_sysctl_root(&net_sysctl_root);
-	setup_sysctl_set(&net_sysctl_ro_root.default_set, NULL);
-	register_sysctl_root(&net_sysctl_ro_root);
 out:
 	return ret;
 }
@@ -116,19 +100,14 @@ subsys_initcall(net_sysctl_init);
 struct ctl_table_header *register_net_sysctl_table(struct net *net,
 	const struct ctl_path *path, struct ctl_table *table)
 {
-	struct nsproxy namespaces;
-	namespaces = *current->nsproxy;
-	namespaces.net_ns = net;
-	return __register_sysctl_paths(&net_sysctl_root, &net_sysctl_group,
-					&namespaces, path, table);
+	return __register_sysctl_paths(&net->netns_ctl_group, path, table);
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_table);
 
 struct ctl_table_header *register_net_sysctl_rotable(const
 		struct ctl_path *path, struct ctl_table *table)
 {
-	return __register_sysctl_paths(&net_sysctl_ro_root, &net_sysctl_ro_group,
-			&init_nsproxy, path, table);
+	return __register_sysctl_paths(&net_sysctl_ro_group, path, table);
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_rotable);
 
-- 
1.7.5.134.g1c08b



* [v2 080/115] sysctl: single subheader path: net/ipv4/conf/DEVICE-NAME/
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ipv4/devinet.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cd9ca08..e672107 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1631,7 +1631,13 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name,
 		{ .procname = "net",  },
 		{ .procname = "ipv4", },
 		{ .procname = "conf", },
-		{ /* to be set */ },
+		{
+			/* to be set below (DEVINET_CTL_PATH_DEV) */
+			.procname = NULL,
+			/* skip duplicate name check; we're registering
+			 * just one subheader for this directory */
+			.has_just_one_subheader = 1,
+		},
 		{ },
 	};
 
-- 
1.7.5.134.g1c08b



* [v2 083/115] sysctl: single subheader path: dev/parport/PORT/devices/DEVICE/
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

This patch was not tested!

Parport registers tables under these paths:
	dev/parport/default/
	dev/parport/PORT/
	dev/parport/PORT/devices/
	dev/parport/PORT/devices/DEVICE/

Nothing else is registered below dev/parport/PORT/devices/DEVICE/ and
I assume device names are unique (if they are not this patch is
invalid), so we can skip name checks for the 'DEVICE' directory.

This will have a positive performance impact when there are many
devices registered on the same port.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 drivers/parport/procfs.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 3bb5bed..9c48946 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -442,7 +442,12 @@ int parport_device_proc_register(struct pardevice *device)
 		{ .procname = "parport" },
 		{ .procname = port->name },
 		{ .procname = "devices" },
-		{ .procname = device->name },
+		{
+			.procname = device->name,
+			/* skip duplicate name check; we're registering
+			 * just one subheader for this directory */
+			.has_just_one_subheader = 1,
+		},
 		{  },
 	};
 
-- 
1.7.5.134.g1c08b



* [v2 084/115] sysctl: single subheader path: net/ax25/DEVICE
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ax25/sysctl_net_ax25.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index b1181bc..9bd49c0 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -160,7 +160,13 @@ void ax25_register_sysctl(struct ax25_dev *ax25_dev)
 	struct ctl_path ax25_path[] = {
 		{ .procname = "net" },
 		{ .procname = "ax25" },
-		{ .procname = ax25_dev->dev->name },
+		{
+			.procname = ax25_dev->dev->name,
+			/* skip duplicate name check; we're registering
+			 * just one subheader for this directory */
+			.has_just_one_subheader = 1,
+
+		},
 		{ }
 	};
 
-- 
1.7.5.134.g1c08b



* [v2 086/115] sysctl: single subheader path: net/decnet/conf/DEVNAME
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

This patch was not tested!

I assume the DN_CTL_PATH_DEV .procname names are unique. If they are
not this patch is invalid.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/decnet/dn_dev.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 0dcaa90..d83d561 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -216,7 +216,13 @@ static void dn_dev_sysctl_register(struct net_device *dev, struct dn_dev_parms *
 		{ .procname = "net",  },
 		{ .procname = "decnet",  },
 		{ .procname = "conf",  },
-		{ /* to be set */ },
+		{
+			/* to be set below (DN_CTL_PATH_DEV) */
+			.procname = NULL,
+			/* skip duplicate name check; we're registering
+			 * just one subheader for this directory */
+			.has_just_one_subheader = 1,
+		},
 		{ },
 	};
 
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 088/115] RFC: sysctl: convert read-write lock to RCU
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Apologies to reviewers who will feel insulted reading this. This patch
is just for kicks - and by kicks I mean ass-kicks for such an awful
misuse of the RCU API. I haven't done anything with RCUs until now and
I'm very unsure about the sanity of this patch.

This patch replaces the reader-writer lock protected lists ctl_subdirs
and ctl_tables with RCU protected lists.

Unlike the RCU snippets I found, where the read side only reads data
from the object and updates are done on a separate copy (hence
Read-Copy-Update), here readers do change some data in the list
elements (that access is protected by a separate spin lock), but they
do not touch the list_head.

read-side:
  - uses the list_for_each_entry_rcu traversal (needed for memory
    ordering on DEC Alpha)
  - rcu_read_(un)lock make sure the grace period is as long as needed

write-side:
  - writers are synchronized with a spin-lock
  - list adding/removing is done with list_add_tail_rcu/list_del_rcu
  - freeing of elements is done after the grace period has ended (call_rcu)
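
The ordering part of the two sides above can be sketched in userspace
C. This is not RCU and not the kernel code: it has no grace periods and
no deferred freeing, it uses C11 acquire/release atomics in place of
rcu_dereference/rcu_assign_pointer, and it inserts at the front of a
singly linked list instead of the tail of a list_head. All names
(struct node, publish_front, sum_values) are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list element; stands in for a ctl_table_header. */
struct node {
	int value;
	_Atomic(struct node *) next;
};

struct list {
	_Atomic(struct node *) head;
};

/*
 * Write side: fill in the node completely, then publish it with a
 * release store so a reader that sees the pointer also sees the fields.
 * Writers must be serialised by the caller (the patch uses a spin lock).
 */
static void publish_front(struct list *l, struct node *n, int value)
{
	assert(n != NULL);
	n->value = value;
	atomic_init(&n->next,
		    atomic_load_explicit(&l->head, memory_order_relaxed));
	atomic_store_explicit(&l->head, n, memory_order_release);
}

/*
 * Read side: follow the pointers with acquire loads. In the kernel this
 * walk would sit between rcu_read_lock() and rcu_read_unlock().
 */
static int sum_values(struct list *l)
{
	struct node *n;
	int sum = 0;

	for (n = atomic_load_explicit(&l->head, memory_order_acquire); n;
	     n = atomic_load_explicit(&n->next, memory_order_acquire))
		sum += n->value;
	return sum;
}
```

What the sketch cannot show is the hard part of the patch: deciding
when a removed element may be freed, which is what call_rcu is for.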

Also note that there may be unwanted interactions with the RCU
protected VFS routines: ctl_table_header elements are scheduled to be
freed when all references to them have disappeared. This can happen
right after removing the element from the list or at a later time
(also with call_rcu). I don't think that delaying freeing some more
would be a problem, but I may be very wrong.

Freeing of ctl_table_header is done with free_head. This is
scheduled to be called with call_rcu in two places:

- sysctl_proc_inode_put() called from the VFS by proc_evict_inode which uses
       rcu_assign_pointer(PROC_I(inode)->sysctl, NULL)
   to delete the VFS's last reference to the object

- unregister_sysctl_table (no connection to the VFS).

Each of them determines whether all references to that object have
disappeared and, if so, schedules the object to be freed with call_rcu.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c |    8 ++++----
 kernel/sysctl.c       |   29 ++++++++++++-----------------
 kernel/sysctl_check.c |    7 ++++---
 3 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 9337149..b3e2453 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -81,7 +81,7 @@ retry:
 	sysctl_read_lock_head(head);
 
 	/* first check whether a subdirectory has the searched-for name */
-	list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+	list_for_each_entry_rcu(h, &head->ctl_subdirs, ctl_entry) {
 		if (IS_ERR(sysctl_use_header(h)))
 			continue;
 
@@ -93,7 +93,7 @@ retry:
 	}
 
 	/* no subdir with that name, look for the file in the ctl_tables */
-	list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+	list_for_each_entry_rcu(h, &head->ctl_tables, ctl_entry) {
 		if (IS_ERR(sysctl_use_header(h)))
 			continue;
 
@@ -234,7 +234,7 @@ static int scan(struct ctl_table_header *head,
 
 	sysctl_read_lock_head(head);
 
-	list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+	list_for_each_entry_rcu(h, &head->ctl_subdirs, ctl_entry) {
 		if (*pos < file->f_pos) {
 			(*pos)++;
 			continue;
@@ -252,7 +252,7 @@ static int scan(struct ctl_table_header *head,
 		(*pos)++;
 	}
 
-	list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+	list_for_each_entry_rcu(h, &head->ctl_tables, ctl_entry) {
 		ctl_table *t;
 
 		if (IS_ERR(sysctl_use_header(h)))
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9e50334..26c2bc6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -56,7 +56,6 @@
 #include <linux/kprobes.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/oom.h>
-#include <linux/rwsem.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -1616,30 +1615,25 @@ struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *h)
 }
 
 
-/* This semaphore protects the ctl_subdirs and ctl_tables lists. You
- * must also have incremented the _use_refs of the header before
- * accessing any field of the header including these lists. If it's
- * deemed necessary, we can create a per-header rwsem. For now a
- * global one will do. */
-static DECLARE_RWSEM(sysctl_rwsem);
+/* protection for the headers' ctl_subdirs/ctl_tables lists */
+static DEFINE_SPINLOCK(sysctl_list_lock);
 void sysctl_write_lock_head(struct ctl_table_header *head)
 {
-	down_write(&sysctl_rwsem);
+	spin_lock(&sysctl_list_lock);
 }
 void sysctl_write_unlock_head(struct ctl_table_header *head)
 {
-	up_write(&sysctl_rwsem);
+	spin_unlock(&sysctl_list_lock);
 }
 void sysctl_read_lock_head(struct ctl_table_header *head)
 {
-	down_read(&sysctl_rwsem);
+	rcu_read_lock();
 }
 void sysctl_read_unlock_head(struct ctl_table_header *head)
 {
-	up_read(&sysctl_rwsem);
+	rcu_read_unlock();
 }
 
-
 /*
  * sysctl_perm does NOT grant the superuser all rights automatically, because
  * some sysctl variables are readonly even to root.
@@ -1777,6 +1771,7 @@ static struct ctl_table_header *alloc_sysctl_header(struct ctl_table_group *grou
 	h->ctl_table_arg = NULL;
 	h->unregistering = NULL;
 	h->ctl_group = group;
+	INIT_LIST_HEAD(&h->ctl_entry);
 
 	return h;
 }
@@ -1788,7 +1783,7 @@ static struct ctl_table_header *mkdir_existing_dir(struct ctl_table_header *pare
 						   const char *name)
 {
 	struct ctl_table_header *h;
-	list_for_each_entry(h, &parent->ctl_subdirs, ctl_entry) {
+	list_for_each_entry_rcu(h, &parent->ctl_subdirs, ctl_entry) {
 		spin_lock(&sysctl_lock);
 		if (likely(!h->unregistering)) {
 			if (strcmp(name, h->ctl_dirname) == 0) {
@@ -1844,7 +1839,7 @@ static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
 {
 	dir->parent = parent;
 	header_refs_inc(dir);
-	list_add_tail(&dir->ctl_entry, &parent->ctl_subdirs);
+	list_add_tail_rcu(&dir->ctl_entry, &parent->ctl_subdirs);
 	return dir;
 }
 
@@ -2049,7 +2044,7 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
 	failed_duplicate_check = sysctl_check_duplicates(header);
 #endif
 	if (!failed_duplicate_check)
-		list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+		list_add_tail_rcu(&header->ctl_entry, &header->parent->ctl_tables);
 
 	sysctl_write_unlock_head(header->parent);
 
@@ -2119,14 +2114,14 @@ void unregister_sysctl_table(struct ctl_table_header *header)
 			 * specific ctl_table_group list. For not that
 			 * list is protected by sysctl_lock. */
 			spin_lock(&sysctl_lock);
-			list_del_init(&header->ctl_entry);
+			list_del_rcu(&header->ctl_entry);
 			spin_unlock(&sysctl_lock);
 		} else {
 			/* ctl_entry is a member of the parent's
 			 * ctl_tables/subdirs lists which are
 			 * protected by the parent's write lock. */
 			sysctl_write_lock_head(parent);
-			list_del_init(&header->ctl_entry);
+			list_del_rcu(&header->ctl_entry);
 			sysctl_write_unlock_head(parent);
 		}
 
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 55e797a..b9573e0 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1,5 +1,6 @@
 #include <linux/sysctl.h>
 #include <linux/string.h>
+#include <linux/rculist.h>
 
 /*
  * @path: the path to the offender
@@ -124,7 +125,7 @@ int sysctl_check_duplicates(struct ctl_table_header *header)
 	struct ctl_table_header *dir = header->parent;
 	struct ctl_table_header *h;
 
-	list_for_each_entry(h, &dir->ctl_subdirs, ctl_entry) {
+	list_for_each_entry_rcu(h, &dir->ctl_subdirs, ctl_entry) {
 		if (IS_ERR(sysctl_use_header(h)))
 			continue;
 
@@ -136,7 +137,7 @@ int sysctl_check_duplicates(struct ctl_table_header *header)
 		sysctl_unuse_header(h);
 	}
 
-	list_for_each_entry(h, &dir->ctl_tables, ctl_entry) {
+	list_for_each_entry_rcu(h, &dir->ctl_tables, ctl_entry) {
 		ctl_table *t;
 
 		if (IS_ERR(sysctl_use_header(h)))
@@ -188,7 +189,7 @@ int sysctl_check_netns_correspondents(struct ctl_table_header *header,
 	/* see if the netns_correspondent has a subdir
 	 * with the same as this non-netns specific header */
 	sysctl_read_lock_head(netns_corresp);
-	list_for_each_entry(h, &netns_corresp->ctl_subdirs, ctl_entry) {
+	list_for_each_entry_rcu(h, &netns_corresp->ctl_subdirs, ctl_entry) {
 		if (IS_ERR(sysctl_use_header(h)))
 			continue;
 		if (strcmp(header->ctl_dirname, h->ctl_dirname) == 0) {
-- 
1.7.5.134.g1c08b



* [v2 089/115] RFC: sysctl: change type of ctl_procfs_refs to u8
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

255 files per registered header should be enough for everyone.
If not, either:
- register another header (and another, and another), each with at
  most 255 files
- change the type of ctl_procfs_refs to something bigger (e.g. u16)

This patch makes two assumptions:

- at most a single inode is created for each sysctl file. That means
  that the header's ctl_procfs_refs will be incremented at most once
  for each of its files. For directories the counter will be
  incremented only once (when creating an inode for the directory
  itself).

- there are no sysctl tables in the kernel with more than 255 entries.
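
The overflow guard that follows from these assumptions can be sketched
in userspace C. This is only an illustration: struct hdr, hdr_procfs_get
and hdr_procfs_put are hypothetical names, uint8_t stands in for the
kernel's u8, and the sysctl_lock protection is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one counter of ctl_table_header. */
struct hdr {
	uint8_t procfs_refs;
};

/* Take a reference; return 0 on success, 1 if the counter would wrap. */
static int hdr_procfs_get(struct hdr *h)
{
	h->procfs_refs++;
	if (h->procfs_refs == 0) {
		/* wrapped from 255 to 0: restore the old value and refuse */
		h->procfs_refs--;
		return 1;
	}
	return 0;
}

/* Drop a reference previously taken with hdr_procfs_get(). */
static void hdr_procfs_put(struct hdr *h)
{
	assert(h->procfs_refs > 0);
	h->procfs_refs--;
}
```

With an 8-bit counter the 256th get is the first one to fail, and it
leaves the counter at 255, so earlier references stay valid.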

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c  |   12 +++++++++---
 include/linux/sysctl.h |   11 +++++++----
 kernel/sysctl.c        |   10 +++++++++-
 kernel/sysctl_check.c  |   12 ++++++++++++
 4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b3e2453..9580794 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -20,13 +20,15 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 	struct inode *inode;
 	struct proc_inode *ei;
 
+	if (sysctl_proc_inode_get(head))
+		goto err_get;
+
 	inode = new_inode(sb);
 	if (!inode)
-		goto out;
+		goto err_new_inode;
 
 	inode->i_ino = get_next_ino();
 
-	sysctl_proc_inode_get(head);
 	ei = PROC_I(inode);
 	ei->sysctl = head;
 	ei->sysctl_entry = table;
@@ -44,8 +46,12 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 		inode->i_op = &proc_sys_dir_operations;
 		inode->i_fop = &proc_sys_dir_file_operations;
 	}
-out:
 	return inode;
+
+err_new_inode:
+	sysctl_proc_inode_put(head);
+err_get:
+	return NULL;
 }
 
 static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 036d1aa..d5d9b66f 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -947,7 +947,7 @@ extern void sysctl_init_group(struct ctl_table_group *group,
 
 /* get/put a reference to this header that
  * will be/was embedded in a procfs proc_inode */
-extern void sysctl_proc_inode_get(struct ctl_table_header *);
+extern int  sysctl_proc_inode_get(struct ctl_table_header *);
 extern void sysctl_proc_inode_put(struct ctl_table_header *);
 
 extern int sysctl_is_seen(struct ctl_table_header *);
@@ -1059,13 +1059,16 @@ struct ctl_table_header {
 			/* references to this header from contexts that
 			 * can access fields of this header */
 			int ctl_use_refs;
-			/* references to this header from procfs inodes.
-			 * procfs embeds a pointer to the header in proc_inode */
-			int ctl_procfs_refs;
 			/* counts references to this header from other
 			 * headers (through ->parent) plus the reference
 			 * returned by __register_sysctl_paths */
 			int ctl_header_refs;
+			/* references to this header from procfs inodes.
+			 * procfs embeds a pointer to the header in proc_inode.
+			 * If there's at max one inode created per file then
+			 * the max value of this is the number of files in the
+			 * ctl_table array, or 1 for directories. */
+			u8 ctl_procfs_refs;
 		};
 		struct rcu_head rcu;
 	};
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 26c2bc6..3e30e78 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1546,11 +1546,19 @@ static void start_unregistering(struct ctl_table_header *p)
 	}
 }
 
-void sysctl_proc_inode_get(struct ctl_table_header *head)
+int sysctl_proc_inode_get(struct ctl_table_header *head)
 {
+	int err = 0;
 	spin_lock(&sysctl_lock);
 	head->ctl_procfs_refs++;
+	if (unlikely(head->ctl_procfs_refs == 0)) {
+		/* restore old value */
+		head->ctl_procfs_refs--;
+		err = 1;
+		WARN(1, "sysctl: ctl_procfs_refs overflow\n");
+	}
 	spin_unlock(&sysctl_lock);
+	return err;
 }
 
 static void free_head(struct rcu_head *rcu)
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index b9573e0..205f721 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -29,6 +29,8 @@ int sysctl_check_table(const struct ctl_path *path,
 		       struct ctl_table *table)
 {
 	struct ctl_table *t;
+	unsigned int max_bits, max_files;
+	unsigned int nr_files = 0;
 	int error = 0;
 
 	if (nr_dirs > CTL_MAXNAME - 1) {
@@ -37,6 +39,7 @@ int sysctl_check_table(const struct ctl_path *path,
 	}
 
 	for(t = table; t->procname; t++) {
+		nr_files ++;
 		if ((t->proc_handler == proc_dostring) ||
 		    (t->proc_handler == proc_dointvec) ||
 		    (t->proc_handler == proc_dointvec_minmax) ||
@@ -58,6 +61,15 @@ int sysctl_check_table(const struct ctl_path *path,
 			FAIL("bogus .mode");
 	}
 
+	/* make sure we can increment the header's ctl_procfs_refs
+	 * counter for each file in the table. If this fails we either
+	 * need to change the type of the ctl_procfs_refs variable, or
+	 * register more tables in the same directory. */
+	max_bits = 8 * sizeof(((struct ctl_table_header *) 0)->ctl_procfs_refs);
+	max_files = 1 << max_bits;
+	if (nr_files >= max_files)
+		FAIL("too many files in registered table");
+
 	if (error)
 		dump_stack();
 
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 090/115] sysctl: warn if registration/unregistration order is not respected
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

This patch emits a warning for each sysctl unregistration that cannot
delete all the directories that it created.

For example:
- register   /existingdir/newdir/file-a
- register   /existingdir/newdir/dir3/file-b
- unregister /existingdir/newdir/file-a
- unregister /existingdir/newdir/dir3/file-b

Here the order is violated because the first unregister operation
cannot delete all the directories it has created (namely 'newdir')
because they are used by another registered path.

This rule violation can be fixed in (at least) two ways:

- enforce order of unregistration:
  - register   /existingdir/newdir/file-a
  - register   /existingdir/newdir/dir3/file-b
  - unregister /existingdir/newdir/dir3/file-b
  - unregister /existingdir/newdir/file-a

- have a third party register the common part:
  - register   /existingdir/newdir/
  - register   /existingdir/newdir/file-a
  - register   /existingdir/newdir/dir3/file-b
  - unregister /existingdir/newdir/file-a
  - unregister /existingdir/newdir/dir3/file-b
  - unregister /existingdir/newdir/

The current implementation works correctly whether or not this order
is respected. Future sysctl implementations may work only if this rule
is respected.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
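The ownership rule from the description above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions:
the toy_* names, the flat directory table and the arrays of full path
prefixes are hypothetical and do not exist in the kernel. Only the
dirs_to_delete bookkeeping follows what this patch adds to
unregister_sysctl_table().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIRS  16
#define MAX_DEPTH 8
#define NAME_LEN  64

/* One directory of the toy tree, identified by its full path. */
struct toy_dir {
	char path[NAME_LEN];
	int users;	/* registrations whose path passes through it */
	int in_use;	/* slot is occupied */
};

/* One registration: the dirs on its path and how many it created. */
struct toy_reg {
	struct toy_dir *dirs[MAX_DEPTH];	/* root-most first */
	int nr_dirs;
	int owned_dirs;	/* plays the role of ctl_owned_dirs_refs */
};

static struct toy_dir toy_dirs[MAX_DIRS];

void toy_reset(void)
{
	memset(toy_dirs, 0, sizeof(toy_dirs));
}

static struct toy_dir *toy_lookup(const char *path)
{
	for (int i = 0; i < MAX_DIRS; i++)
		if (toy_dirs[i].in_use && !strcmp(toy_dirs[i].path, path))
			return &toy_dirs[i];
	return NULL;
}

static struct toy_dir *toy_create(const char *path)
{
	for (int i = 0; i < MAX_DIRS; i++) {
		if (!toy_dirs[i].in_use) {
			toy_dirs[i].in_use = 1;
			toy_dirs[i].users = 0;
			snprintf(toy_dirs[i].path, NAME_LEN, "%s", path);
			return &toy_dirs[i];
		}
	}
	return NULL;
}

/* Register a path given as its directory prefixes, root-most first. */
void toy_register(struct toy_reg *reg, const char *const *prefixes, int n)
{
	assert(n <= MAX_DEPTH);
	reg->nr_dirs = 0;
	reg->owned_dirs = 0;
	for (int i = 0; i < n; i++) {
		struct toy_dir *d = toy_lookup(prefixes[i]);

		if (!d) {
			d = toy_create(prefixes[i]);
			assert(d != NULL);	/* toy table is full */
			reg->owned_dirs++;	/* we created this dir */
		}
		d->users++;
		reg->dirs[reg->nr_dirs++] = d;
	}
}

/* Walk from the deepest dir to the root; return the number of warnings. */
int toy_unregister(struct toy_reg *reg)
{
	int dirs_to_delete = reg->owned_dirs;
	int warnings = 0;

	for (int i = reg->nr_dirs - 1; i >= 0; i--) {
		struct toy_dir *d = reg->dirs[i];

		if (d->users > 1) {
			if (dirs_to_delete != 0) {
				fprintf(stderr, "%s still in use\n", d->path);
				warnings++;
				dirs_to_delete = 0;	/* warn only once */
			}
			d->users--;
			continue;
		}
		d->in_use = 0;		/* last user: delete the dir */
		if (dirs_to_delete)
			dirs_to_delete--;
	}
	reg->nr_dirs = 0;
	return warnings;
}
```

In this model, with a base registration that owns /existingdir,
registering the path of file-a and then the path of file-b and
unregistering them in that same order makes the first toy_unregister()
call return 1. Unregistering in the reverse order returns 0 for both
calls, as does the variant where a third registration owns
/existingdir/newdir.
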
 include/linux/sysctl.h |   15 ++++++++++++---
 kernel/sysctl.c        |   39 +++++++++++++++++++++++++++++++++++----
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index d5d9b66f..322246d 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1036,12 +1036,14 @@ struct ctl_table_group_ops {
 };
 
 struct ctl_table_group {
+	/* has initialization for this group finished? */
+	unsigned int is_initialized:1;
+	/* does this group use the @corresp_list? */
+	unsigned int has_netns_corresp:1;
+	struct list_head corresp_list;
 	const struct ctl_table_group_ops *ctl_ops;
 	/* A list of ctl_table_header elements that represent the
 	 * netns-specific correspondents of some sysctl directories */
-	struct list_head corresp_list;
-	/* binary: whether this group uses the @corresp_list */
-	char has_netns_corresp;
 };
 
 /* struct ctl_table_header is used to maintain dynamic lists of
@@ -1069,6 +1071,13 @@ struct ctl_table_header {
 			 * the max value of this is the number of files in the
 			 * ctl_table array, or 1 for directories. */
 			u8 ctl_procfs_refs;
+			/* how many dirs were created when this header was
+			 * registered. Rule: the header which created a directory
+			 * should be the one that deletes it. This counter is
+			 * used to signal violations of this rule. The counter's
+			 * max value is CTL_MAXNAME (currently=10) so we use
+			 * only 4 bits of the 8 available. */
+			u8 ctl_owned_dirs_refs;
 		};
 		struct rcu_head rcu;
 	};
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3e30e78..94fff4e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -204,8 +204,9 @@ static struct kmem_cache *sysctl_header_cachep;
 static const struct ctl_table_group_ops root_table_group_ops = { };
 
 static struct ctl_table_group root_table_group = {
-	.has_netns_corresp = 0,
-	.ctl_ops = &root_table_group_ops,
+	.is_initialized		= 1,
+	.has_netns_corresp	= 0,
+	.ctl_ops		= &root_table_group_ops,
 };
 
 static struct ctl_table_header root_table_header = {
@@ -1617,6 +1618,14 @@ out:
 struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *h)
 {
 	struct ctl_table_group *g = &current->nsproxy->net_ns->netns_ctl_group;
+
+	/* this function may be called to check whether the
+	 * netns-specific vs. non-netns-specific registration order is
+	 * respected. Those checks may be done early during init when
+	 * neither init_net nor its netns-specific group is initialized. */
+	if (!g->is_initialized)
+		return NULL;
+
 	/* dflt == NULL means: if there's a netns corresp return it,
 	 *                     if there isn't, just return NULL */
 	return sysctl_use_netns_corresp_dflt(g, h, NULL);
@@ -1869,13 +1878,14 @@ static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
 static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
 					      struct ctl_table_group *group,
 					      const struct ctl_path *path,
-					      int nr_dirs)
+					      int nr_dirs, int *p_dirs_created)
 {
 	struct ctl_table_header *dirs[CTL_MAXNAME];
 	struct ctl_table_header *__netns_corresp = NULL;
 	int create_first_netns_corresp = group->has_netns_corresp;
 	int i;
 
+	*p_dirs_created = 0;
 	/* We create excess ctl_table_header for directory entries.
 	 * We do so because we may need new headers while under a lock
 	 * where we will not be able to allocate entries (sleeping).
@@ -1929,6 +1939,7 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
 				goto err_check_netns_correspondents;
 			}
 #endif
+			(*p_dirs_created)++;
 			continue;
 		}
 
@@ -1945,8 +1956,12 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
 	if (create_first_netns_corresp)
 		parent = mkdir_netns_corresp(parent, group, &__netns_corresp);
 
+	/* if mkdir_netns_corresp used it, it's NULL */
 	if (__netns_corresp)
 		kmem_cache_free(sysctl_header_cachep, __netns_corresp);
+	else
+		(*p_dirs_created)++;
+
 
 	/* free unused pre-allocated entries */
 	for (i = 0; i < nr_dirs; i++)
@@ -2027,6 +2042,7 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
 	struct ctl_table_header *header;
 	int failed_duplicate_check = 0;
 	int nr_dirs = ctl_path_items(path);
+	int dirs_created = 0;
 
 #ifdef CONFIG_SYSCTL_SYSCALL_CHECK
 	if (sysctl_check_table(path, nr_dirs, table))
@@ -2037,7 +2053,8 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
 	if (!header)
 		return NULL;
 
-	header->parent = sysctl_mkdirs(&root_table_header, group, path, nr_dirs);
+	header->parent = sysctl_mkdirs(&root_table_header, group, path,
+				       nr_dirs, &dirs_created);
 	if (!header->parent) {
 		kmem_cache_free(sysctl_header_cachep, header);
 		return NULL;
@@ -2045,6 +2062,7 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
 
 	header->ctl_table_arg = table;
 	header->ctl_header_refs = 1;
+	header->ctl_owned_dirs_refs = dirs_created;
 
 	sysctl_write_lock_head(header->parent);
 
@@ -2089,6 +2107,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
  */
 void unregister_sysctl_table(struct ctl_table_header *header)
 {
+	int dirs_to_delete = header->ctl_owned_dirs_refs;
 	might_sleep();
 
 	while(header->parent) {
@@ -2098,6 +2117,13 @@ void unregister_sysctl_table(struct ctl_table_header *header)
 		 * and ctl_use_refs) are protected by the spin lock. */
 		spin_lock(&sysctl_lock);
 		if (header->ctl_header_refs > 1) {
+			if (WARN(dirs_to_delete != 0, "directory that we "
+				 "created is still used by another header.")) {
+				/* if one element of the path is still used its
+				 * parents will be too. Stop sending warnings */
+				dirs_to_delete = 0;
+			}
+
 			/* other headers need a reference to this one. Just
 			 * mark that we don't need it and leave it as it is. */
 			header->ctl_header_refs --;
@@ -2116,6 +2142,10 @@ void unregister_sysctl_table(struct ctl_table_header *header)
 		start_unregistering(header);
 		spin_unlock(&sysctl_lock);
 
+		/* don't go negative */
+		if (dirs_to_delete)
+			dirs_to_delete--;
+
 		if (!header->ctl_dirname) {
 			/* the header is a netns correspondent of it's
 			 * parent. It is a member of it's netns
@@ -2173,6 +2203,7 @@ void sysctl_init_group(struct ctl_table_group *group,
 	group->has_netns_corresp = has_netns_corresp;
 	if (has_netns_corresp)
 		INIT_LIST_HEAD(&group->corresp_list);
+	group->is_initialized = 1;
 }
 
 #else /* !CONFIG_SYSCTL */
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 093/115] sysctl: ax25: create empty dir with register_sysctl_dir
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ax25/af_ax25.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 965662d..d8a4ea4 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1996,7 +1996,6 @@ static const struct __initdata ctl_path ax25_path[] = {
 	{ .procname = "ax25" },
 	{ }
 };
-static struct ctl_table empty;
 static struct ctl_table_header *ax25_root_header;
 #endif /* CONFIG_SYSCTL */
 
@@ -2014,7 +2013,7 @@ static int __init ax25_init(void)
 
 	/* XXX: no error checking done in initializer */
 	#ifdef CONFIG_SYSCTL
-	ax25_root_header = register_sysctl_paths(ax25_path, &empty);
+	ax25_root_header = register_sysctl_dir(ax25_path);
 	#endif
 
 	proc_net_fops_create(&init_net, "ax25_route", S_IRUGO, &ax25_route_fops);
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 094/115] sysctl: net/core: create empty dir with register_sysctl_dir
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/core/sysctl_net_core.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 385b609..6d2fe6e 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -239,9 +239,7 @@ static __net_initdata struct pernet_operations sysctl_core_ops = {
 
 static __init int sysctl_core_init(void)
 {
-	static struct ctl_table empty[1];
-
-	register_sysctl_paths(net_core_path, empty);
+	register_sysctl_dir(net_core_path);
 	register_net_sysctl_rotable(net_core_path, net_core_table);
 	return register_pernet_subsys(&sysctl_core_ops);
 }
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* [v2 095/115] sysctl: net/ipv4/neigh: create empty dir with register_sysctl_dir
From: Lucian Adrian Grijincu @ 2011-05-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Lucian Adrian Grijincu
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ipv4/route.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 46c7b3d..092f3d1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3151,8 +3151,6 @@ static ctl_table ipv4_route_table[] = {
 	{ }
 };
 
-static struct ctl_table empty[1];
-
 static __net_initdata struct ctl_path ipv4_neigh_path[] = {
 	{ .procname = "net", },
 	{ .procname = "ipv4", },
@@ -3310,6 +3308,6 @@ int __init ip_rt_init(void)
 void __init ip_static_sysctl_init(void)
 {
 	register_sysctl_paths(ipv4_route_path, ipv4_route_table);
-	register_sysctl_paths(ipv4_neigh_path, empty);
+	register_sysctl_dir(ipv4_neigh_path);
 }
 #endif
-- 
1.7.5.134.g1c08b


^ permalink raw reply related

* Re: [PATCH net-next] ethtool: Add 20G bit definitions
From: David Miller @ 2011-05-08 22:43 UTC (permalink / raw)
  To: yanivr; +Cc: netdev, eilong, bhutchings
In-Reply-To: <1304407809.2518.1.camel@lb-tlvb-dmitry>

From: "Yaniv Rosner" <yanivr@broadcom.com>
Date: Tue, 3 May 2011 10:30:08 +0300

> Add 20G supported and advertising bit definitions.
> 20G will be supported with the 57840 chips.
> 
> 
> Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipheth: Properly distinguish length and alignment in URBs and skbs
From: David Miller @ 2011-05-08 22:46 UTC (permalink / raw)
  To: bhutchings
  Cc: david.hill, agimenez, linux-kernel, dgiagio, dborca, pmcenery,
	linux-usb, netdev
In-Reply-To: <1304444965.2873.11.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 03 May 2011 18:49:25 +0100

> The USB protocol this driver implements appears to require 2 bytes of
> padding in front of each received packet.  This used to be equal to
> the value of NET_IP_ALIGN on x86, so the driver abused that constant
> and mostly worked, but this is no longer the case.  The driver also
> mixed up the URB and packet lengths, resulting in 2 bytes of junk at
> the end of the skb.
> 
> Introduce a private constant for the 2 bytes of padding; fix this
> confusion and check for the under-length case.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Compile-tested only, as I'm not cool enough for an iPhone either.
> This is applicable to net-next-2.6 or v2.6.38.

I've applied this to net-2.6 and will conditionally queue it up for
-stable, if we need further fixups we can add relative patches.

Thanks.

^ permalink raw reply

