From: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Linux Containers
<containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Subject: Re: [PATCH 03/16] net: Basic network namespace infrastructure.
Date: Sat, 8 Sep 2007 17:33:08 -0700 [thread overview]
Message-ID: <20070909003308.GA10417@linux.vnet.ibm.com> (raw)
In-Reply-To: <m1ejh8x3ih.fsf_-_-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
On Sat, Sep 08, 2007 at 03:15:34PM -0600, Eric W. Biederman wrote:
>
> This is the basic infrastructure needed to support network
> namespaces. This infrastructure is:
> - Registration functions to support initializing per network
> namespace data when a network namespaces is created or destroyed.
>
> - struct net. The network namespace data structure.
> This structure will grow as variables are made per network
> namespace but this is the minimal starting point.
>
> - Functions to grab a reference to the network namespace.
> I provide both get/put functions that keep a network namespace
> from being freed. And hold/release functions serve as weak references
> and will warn if their count is not zero when the data structure
> is freed. Useful for dealing with more complicated data structures
> like the ipv4 route cache.
>
> - A list of all of the network namespaces so we can iterate over them.
>
> - A slab for the network namespace data structure allowing leaks
> to be spotted.
If I understand this correctly, the only way to get to a namespace is
via get_net_ns_by_pid(), which contains the rcu_read_lock() that matches
the rcu_barrier() below.
So, is the get_net() in sock_copy() in this patch adding a reference to
an element that is guaranteed to already have at least one reference?
If not, how are we preventing sock_copy() from running concurrently with
cleanup_net()? Ah, I see -- in sock_copy() we are getting a reference
to the new struct sock that no one else can get a reference to, so OK.
Ditto for the get_net() in sk_alloc().
But I still don't understand what is protecting the get_net() in
dev_seq_open(). Is there an existing reference? If so, how do we know
that it won't be removed just as we are trying to add our reference
(while at the same time cleanup_net() is running)? Ditto for the other
_open() operations in the same patch. And for netlink_seq_open().
Enlightenment?
Thanx, Paul
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
> include/net/net_namespace.h | 68 ++++++++++
> net/core/Makefile | 2 +-
> net/core/net_namespace.c | 292 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 361 insertions(+), 1 deletions(-)
> create mode 100644 include/net/net_namespace.h
> create mode 100644 net/core/net_namespace.c
>
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> new file mode 100644
> index 0000000..6344b77
> --- /dev/null
> +++ b/include/net/net_namespace.h
> @@ -0,0 +1,68 @@
> +/*
> + * Operations on the network namespace
> + */
> +#ifndef __NET_NET_NAMESPACE_H
> +#define __NET_NET_NAMESPACE_H
> +
> +#include <asm/atomic.h>
> +#include <linux/workqueue.h>
> +#include <linux/list.h>
> +
> +struct net {
> + atomic_t count; /* To decided when the network
> + * namespace should be freed.
> + */
> + atomic_t use_count; /* To track references we
> + * destroy on demand
> + */
> + struct list_head list; /* list of network namespaces */
> + struct work_struct work; /* work struct for freeing */
> +};
> +
> +extern struct net init_net;
> +extern struct list_head net_namespace_list;
> +
> +extern void __put_net(struct net *net);
> +
> +static inline struct net *get_net(struct net *net)
> +{
> + atomic_inc(&net->count);
> + return net;
> +}
> +
> +static inline void put_net(struct net *net)
> +{
> + if (atomic_dec_and_test(&net->count))
> + __put_net(net);
> +}
> +
> +static inline struct net *hold_net(struct net *net)
> +{
> + atomic_inc(&net->use_count);
> + return net;
> +}
> +
> +static inline void release_net(struct net *net)
> +{
> + atomic_dec(&net->use_count);
> +}
> +
> +extern void net_lock(void);
> +extern void net_unlock(void);
> +
> +#define for_each_net(VAR) \
> + list_for_each_entry(VAR, &net_namespace_list, list)
> +
> +
> +struct pernet_operations {
> + struct list_head list;
> + int (*init)(struct net *net);
> + void (*exit)(struct net *net);
> +};
> +
> +extern int register_pernet_subsys(struct pernet_operations *);
> +extern void unregister_pernet_subsys(struct pernet_operations *);
> +extern int register_pernet_device(struct pernet_operations *);
> +extern void unregister_pernet_device(struct pernet_operations *);
> +
> +#endif /* __NET_NET_NAMESPACE_H */
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 4751613..ea9b3f3 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -3,7 +3,7 @@
> #
>
> obj-y := sock.o request_sock.o skbuff.o iovec.o datagram.o stream.o scm.o \
> - gen_stats.o gen_estimator.o
> + gen_stats.o gen_estimator.o net_namespace.o
>
> obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> new file mode 100644
> index 0000000..f259a9b
> --- /dev/null
> +++ b/net/core/net_namespace.c
> @@ -0,0 +1,292 @@
> +#include <linux/workqueue.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/cache.h>
> +#include <linux/slab.h>
> +#include <linux/list.h>
> +#include <linux/delay.h>
> +#include <net/net_namespace.h>
> +
> +/*
> + * Our network namespace constructor/destructor lists
> + */
> +
> +static LIST_HEAD(pernet_list);
> +static struct list_head *first_device = &pernet_list;
> +static DEFINE_MUTEX(net_mutex);
> +
> +static DEFINE_MUTEX(net_list_mutex);
> +LIST_HEAD(net_namespace_list);
> +
> +static struct kmem_cache *net_cachep;
> +
> +struct net init_net;
> +EXPORT_SYMBOL_GPL(init_net);
> +
> +void net_lock(void)
> +{
> + mutex_lock(&net_list_mutex);
> +}
> +
> +void net_unlock(void)
> +{
> + mutex_unlock(&net_list_mutex);
> +}
> +
> +static struct net *net_alloc(void)
> +{
> + return kmem_cache_alloc(net_cachep, GFP_KERNEL);
> +}
> +
> +static void net_free(struct net *net)
> +{
> + if (!net)
> + return;
> +
> + if (unlikely(atomic_read(&net->use_count) != 0)) {
> + printk(KERN_EMERG "network namespace not free! Usage: %d\n",
> + atomic_read(&net->use_count));
> + return;
> + }
> +
> + kmem_cache_free(net_cachep, net);
> +}
> +
> +static void cleanup_net(struct work_struct *work)
> +{
> + struct pernet_operations *ops;
> + struct list_head *ptr;
> + struct net *net;
> +
> + net = container_of(work, struct net, work);
> +
> + mutex_lock(&net_mutex);
> +
> + /* Don't let anyone else find us. */
> + net_lock();
> + list_del(&net->list);
> + net_unlock();
> +
> + /* Run all of the network namespace exit methods */
> + list_for_each_prev(ptr, &pernet_list) {
> + ops = list_entry(ptr, struct pernet_operations, list);
> + if (ops->exit)
> + ops->exit(net);
> + }
> +
> + mutex_unlock(&net_mutex);
> +
> + /* Ensure there are no outstanding rcu callbacks using this
> + * network namespace.
> + */
> + rcu_barrier();
> +
> + /* Finally it is safe to free my network namespace structure */
> + net_free(net);
> +}
> +
> +
> +void __put_net(struct net *net)
> +{
> + /* Cleanup the network namespace in process context */
> + INIT_WORK(&net->work, cleanup_net);
> + schedule_work(&net->work);
> +}
> +EXPORT_SYMBOL_GPL(__put_net);
> +
> +/*
> + * setup_net runs the initializers for the network namespace object.
> + */
> +static int setup_net(struct net *net)
> +{
> + /* Must be called with net_mutex held */
> + struct pernet_operations *ops;
> + struct list_head *ptr;
> + int error;
> +
> + memset(net, 0, sizeof(struct net));
> + atomic_set(&net->count, 1);
> + atomic_set(&net->use_count, 0);
> +
> + error = 0;
> + list_for_each(ptr, &pernet_list) {
> + ops = list_entry(ptr, struct pernet_operations, list);
> + if (ops->init) {
> + error = ops->init(net);
> + if (error < 0)
> + goto out_undo;
> + }
> + }
> +out:
> + return error;
> +out_undo:
> + /* Walk through the list backwards calling the exit functions
> + * for the pernet modules whose init functions did not fail.
> + */
> + for (ptr = ptr->prev; ptr != &pernet_list; ptr = ptr->prev) {
> + ops = list_entry(ptr, struct pernet_operations, list);
> + if (ops->exit)
> + ops->exit(net);
> + }
> + goto out;
> +}
> +
> +static int __init net_ns_init(void)
> +{
> + int err;
> +
> + printk(KERN_INFO "net_namespace: %zd bytes\n", sizeof(struct net));
> + net_cachep = kmem_cache_create("net_namespace", sizeof(struct net),
> + SMP_CACHE_BYTES,
> + SLAB_PANIC, NULL);
> + mutex_lock(&net_mutex);
> + err = setup_net(&init_net);
> +
> + net_lock();
> + list_add_tail(&init_net.list, &net_namespace_list);
> + net_unlock();
> +
> + mutex_unlock(&net_mutex);
> + if (err)
> + panic("Could not setup the initial network namespace");
> +
> + return 0;
> +}
> +
> +pure_initcall(net_ns_init);
> +
> +static int register_pernet_operations(struct list_head *list,
> + struct pernet_operations *ops)
> +{
> + struct net *net, *undo_net;
> + int error;
> +
> + error = 0;
> + list_add_tail(&ops->list, list);
> + for_each_net(net) {
> + if (ops->init) {
> + error = ops->init(net);
> + if (error)
> + goto out_undo;
> + }
> + }
> +out:
> + return error;
> +
> +out_undo:
> + /* If I have an error cleanup all namespaces I initialized */
> + list_del(&ops->list);
> + for_each_net(undo_net) {
> + if (undo_net == net)
> + goto undone;
> + if (ops->exit)
> + ops->exit(undo_net);
> + }
> +undone:
> + goto out;
> +}
> +
> +static void unregister_pernet_operations(struct pernet_operations *ops)
> +{
> + struct net *net;
> +
> + list_del(&ops->list);
> + for_each_net(net)
> + if (ops->exit)
> + ops->exit(net);
> +}
> +
> +/**
> + * register_pernet_subsys - register a network namespace subsystem
> + * @ops: pernet operations structure for the subsystem
> + *
> + * Register a subsystem which has init and exit functions
> + * that are called when network namespaces are created and
> + * destroyed respectively.
> + *
> + * When registered all network namespace init functions are
> + * called for every existing network namespace. Allowing kernel
> + * modules to have a race free view of the set of network namespaces.
> + *
> + * When a new network namespace is created all of the init
> + * methods are called in the order in which they were registered.
> + *
> + * When a network namespace is destroyed all of the exit methods
> + * are called in the reverse of the order with which they were
> + * registered.
> + */
> +int register_pernet_subsys(struct pernet_operations *ops)
> +{
> + int error;
> + mutex_lock(&net_mutex);
> + error = register_pernet_operations(first_device, ops);
> + mutex_unlock(&net_mutex);
> + return error;
> +}
> +EXPORT_SYMBOL_GPL(register_pernet_subsys);
> +
> +/**
> + * unregister_pernet_subsys - unregister a network namespace subsystem
> + * @ops: pernet operations structure to manipulate
> + *
> + * Remove the pernet operations structure from the list to be
> + * used when network namespaces are created or destoryed. In
> + * addition run the exit method for all existing network
> + * namespaces.
> + */
> +void unregister_pernet_subsys(struct pernet_operations *module)
> +{
> + mutex_lock(&net_mutex);
> + unregister_pernet_operations(module);
> + mutex_unlock(&net_mutex);
> +}
> +EXPORT_SYMBOL_GPL(unregister_pernet_subsys);
> +
> +/**
> + * register_pernet_device - register a network namespace device
> + * @ops: pernet operations structure for the subsystem
> + *
> + * Register a device which has init and exit functions
> + * that are called when network namespaces are created and
> + * destroyed respectively.
> + *
> + * When registered all network namespace init functions are
> + * called for every existing network namespace. Allowing kernel
> + * modules to have a race free view of the set of network namespaces.
> + *
> + * When a new network namespace is created all of the init
> + * methods are called in the order in which they were registered.
> + *
> + * When a network namespace is destroyed all of the exit methods
> + * are called in the reverse of the order with which they were
> + * registered.
> + */
> +int register_pernet_device(struct pernet_operations *ops)
> +{
> + int error;
> + mutex_lock(&net_mutex);
> + error = register_pernet_operations(&pernet_list, ops);
> + if (!error && (first_device == &pernet_list))
> + first_device = &ops->list;
> + mutex_unlock(&net_mutex);
> + return error;
> +}
> +EXPORT_SYMBOL_GPL(register_pernet_device);
> +
> +/**
> + * unregister_pernet_device - unregister a network namespace netdevice
> + * @ops: pernet operations structure to manipulate
> + *
> + * Remove the pernet operations structure from the list to be
> + * used when network namespaces are created or destoryed. In
> + * addition run the exit method for all existing network
> + * namespaces.
> + */
> +void unregister_pernet_device(struct pernet_operations *ops)
> +{
> + mutex_lock(&net_mutex);
> + if (&ops->list == first_device)
> + first_device = first_device->next;
> + unregister_pernet_operations(ops);
> + mutex_unlock(&net_mutex);
> +}
> +EXPORT_SYMBOL_GPL(unregister_pernet_device);
> --
> 1.5.3.rc6.17.g1911
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2007-09-09 0:33 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-09-08 21:07 [PATCH 00/16] core network namespace support Eric W. Biederman
2007-09-08 21:07 ` Eric W. Biederman
2007-09-08 21:09 ` [PATCH 01/16] appletalk: In notifier handlers convert the void pointer to a netdevice Eric W. Biederman
2007-09-08 21:09 ` Eric W. Biederman
2007-09-08 21:13 ` [PATCH 02/16] net: Don't implement dev_ifname32 inline Eric W. Biederman
2007-09-08 21:13 ` Eric W. Biederman
2007-09-08 21:15 ` [PATCH 03/16] net: Basic network namespace infrastructure Eric W. Biederman
2007-09-08 21:15 ` Eric W. Biederman
2007-09-08 21:17 ` [PATCH 04/16] net: Add a network namespace parameter to tasks Eric W. Biederman
2007-09-08 21:17 ` Eric W. Biederman
2007-09-08 21:18 ` [PATCH 05/16] net: Add a network namespace tag to struct net_device Eric W. Biederman
2007-09-08 21:18 ` Eric W. Biederman
2007-09-08 21:20 ` [PATCH 07/16] net: Make /proc/net per network namespace Eric W. Biederman
2007-09-08 21:20 ` Eric W. Biederman
2007-09-08 21:23 ` [PATCH 08/16] net: Make socket creation namespace safe Eric W. Biederman
2007-09-08 21:23 ` Eric W. Biederman
2007-09-08 21:24 ` [PATCH 09/16] net: Initialize the network namespace of network devices Eric W. Biederman
2007-09-08 21:24 ` Eric W. Biederman
2007-09-08 21:25 ` [PATCH 10/16] net: Make packet reception network namespace safe Eric W. Biederman
2007-09-08 21:25 ` Eric W. Biederman
2007-09-08 21:27 ` [PATCH 11/16] net: Make device event notification " Eric W. Biederman
2007-09-08 21:27 ` Eric W. Biederman
2007-09-08 21:28 ` [PATCH 12/16] net: Support multiple network namespaces with netlink Eric W. Biederman
2007-09-08 21:28 ` Eric W. Biederman
2007-09-08 21:35 ` [PATCH 13/16] net: Make the device list and device lookups per namespace Eric W. Biederman
2007-09-08 21:35 ` Eric W. Biederman
2007-09-08 21:36 ` [PATCH 14/16] net: Factor out __dev_alloc_name from dev_alloc_name Eric W. Biederman
2007-09-08 21:36 ` Eric W. Biederman
2007-09-08 21:38 ` [PATCH 15/16] net: Implement network device movement between namespaces Eric W. Biederman
2007-09-08 21:38 ` Eric W. Biederman
2007-09-08 21:43 ` [PATCH 16/16] net: netlink support for moving devices between network namespaces Eric W. Biederman
2007-09-08 21:43 ` Eric W. Biederman
2007-09-08 21:47 ` [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace Eric W. Biederman
2007-09-08 21:47 ` Eric W. Biederman
2007-09-10 13:50 ` Pavel Emelyanov
[not found] ` <46E54B96.8060105-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-09-10 15:27 ` Eric W. Biederman
2007-09-12 11:59 ` David Miller
2007-09-12 12:03 ` David Miller
2007-09-12 12:16 ` Eric W. Biederman
[not found] ` <m1tzq4u92n.fsf_-_-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-09-10 19:07 ` [PATCH 16/16] net: netlink support for moving devices between network namespaces Serge E. Hallyn
2007-09-10 19:30 ` Eric W. Biederman
2007-09-11 0:54 ` Serge E. Hallyn
2007-09-12 11:57 ` David Miller
2007-09-12 11:54 ` [PATCH 15/16] net: Implement network device movement between namespaces David Miller
2007-09-12 11:49 ` [PATCH 14/16] net: Factor out __dev_alloc_name from dev_alloc_name David Miller
2007-09-12 11:39 ` [PATCH 13/16] net: Make the device list and device lookups per namespace David Miller
[not found] ` <m1bqccvock.fsf_-_-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-09-10 13:46 ` [PATCH 12/16] net: Support multiple network namespaces with netlink Pavel Emelyanov
2007-09-10 15:24 ` Eric W. Biederman
2007-09-12 11:06 ` David Miller
2007-09-12 11:02 ` [PATCH 11/16] net: Make device event notification network namespace safe David Miller
2007-09-12 11:00 ` [PATCH 10/16] net: Make packet reception " David Miller
2007-09-12 10:58 ` [PATCH 09/16] net: Initialize the network namespace of network devices David Miller
2007-09-12 10:04 ` [PATCH 08/16] net: Make socket creation namespace safe David Miller
2007-09-12 10:02 ` [PATCH 07/16] net: Make /proc/net per network namespace David Miller
2007-09-12 12:12 ` Daniel Lezcano
2007-09-12 12:19 ` David Miller
2007-09-08 21:21 ` [PATCH 06/16] net: Add a network namespace parameter to struct sock Eric W. Biederman
2007-09-08 21:21 ` Eric W. Biederman
2007-09-12 9:58 ` David Miller
2007-09-12 9:57 ` [PATCH 05/16] net: Add a network namespace tag to struct net_device David Miller
2007-09-12 9:55 ` [PATCH 04/16] net: Add a network namespace parameter to tasks David Miller
[not found] ` <m1ejh8x3ih.fsf_-_-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-09-09 0:33 ` Paul E. McKenney [this message]
2007-09-09 10:04 ` [PATCH 03/16] net: Basic network namespace infrastructure Eric W. Biederman
[not found] ` <m1fy1otarm.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-09-09 16:45 ` Paul E. McKenney
2007-09-10 6:32 ` Eric W. Biederman
2007-09-10 13:16 ` Pavel Emelyanov
[not found] ` <46E543A0.7010104-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-09-10 15:53 ` Eric W. Biederman
2007-09-09 8:44 ` Eric Dumazet
[not found] ` <46E3B281.4030105-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2007-09-09 10:18 ` Eric W. Biederman
2007-09-10 5:46 ` Krishna Kumar2
2007-09-10 5:46 ` Krishna Kumar2
[not found] ` <OF55551EA4.A3E6920C-ON65257352.001D6A3E-65257352.001FBEA7-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>
2007-09-10 6:40 ` Eric W. Biederman
2007-09-12 9:52 ` David Miller
2007-09-12 9:39 ` [PATCH 02/16] net: Don't implement dev_ifname32 inline David Miller
2007-09-12 9:27 ` [PATCH 01/16] appletalk: In notifier handlers convert the void pointer to a netdevice David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070909003308.GA10417@linux.vnet.ibm.com \
--to=paulmck-23vcf4htsmix0ybbhkvfkdbpr1lh4cv8@public.gmane.org \
--cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
--cc=davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.