Linux Container Development
From: ebiederm@xmission.com (Eric W. Biederman)
To: paulmck@linux.vnet.ibm.com
Cc: David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: [PATCH 03/16] net: Basic network namespace infrastructure.
Date: Sun, 09 Sep 2007 04:04:45 -0600
Message-ID: <m1fy1otarm.fsf@ebiederm.dsl.xmission.com>
In-Reply-To: <20070909003308.GA10417@linux.vnet.ibm.com> (Paul E. McKenney's message of "Sat, 8 Sep 2007 17:33:08 -0700")

"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:

> On Sat, Sep 08, 2007 at 03:15:34PM -0600, Eric W. Biederman wrote:
>> 
>> This is the basic infrastructure needed to support network
>> namespaces.  This infrastructure is:
>> - Registration functions to support initializing per network
>>   namespace data when a network namespace is created or destroyed.
>> 
>> - struct net.  The network namespace data structure.
>>   This structure will grow as variables are made per network
>>   namespace but this is the minimal starting point.
>> 
>> - Functions to grab a reference to the network namespace.
>>   I provide both get/put functions that keep a network namespace
>>   from being freed.  And hold/release functions serve as weak references
>>   and will warn if their count is not zero when the data structure
>>   is freed.  Useful for dealing with more complicated data structures
>>   like the ipv4 route cache.
>> 
>> - A list of all of the network namespaces so we can iterate over them.
>> 
>> - A slab for the network namespace data structure allowing leaks
>>   to be spotted.
>
> If I understand this correctly, the only way to get to a namespace is
> via get_net_ns_by_pid(), which contains the rcu_read_lock() that matches
> the rcu_barrier() below.

Not quite.  That is the convoluted case for getting a namespace someone
else is using.  current->nsproxy->net_ns works and should require no
locking to read (only the current process may modify it) and does hold
a reference to the network namespace.  Similarly for sock->sk_net.

> So, is the get_net() in sock_copy() in this patch adding a reference to
> an element that is guaranteed to already have at least one reference?

Yes.

> If not, how are we preventing sock_copy() from running concurrently with
> cleanup_net()?  Ah, I see -- in sock_copy() we are getting a reference
> to the new struct sock that no one else can get a reference to, so OK.
> Ditto for the get_net() in sk_alloc().

> But I still don't understand what is protecting the get_net() in
> dev_seq_open().  Is there an existing reference? 

Sort of.  The directories under /proc/net are created when we create
a network namespace, and they are destroyed when the network namespace
is removed.  Those directories remember which network namespace
they are for, and that is what dev_seq_open is referencing.

So the tricky case is what happens if we open a directory under
/proc/net as we are cleaning up a network namespace.

> If so, how do we know
> that it won't be removed just as we are trying to add our reference
> (while at the same time cleanup_net() is running)?  Ditto for the other
> _open() operations in the same patch.  And for netlink_seq_open().
>
> Enlightenment?

Good spotting. It looks like you have found a legitimate race.  Grr.
I thought I had a reference to the network namespace there.  I need to
step back and think about this a bit, and see if I can come up with a
legitimate idiom.

I know the network namespace exists and I have not finished
cleanup_net because I can still get to the /proc entries.

I know I cannot use get_net for the reference in /proc because
otherwise I could not release the network namespace unless I were to
unmount the filesystem, which is not a desirable property.

I think I can change the idiom to:

struct net *maybe_get_net(struct net *net)
{
	if (!atomic_inc_not_zero(&net->count))
		net = NULL;
	return net;
}

Which would make dev_seq_open be:

static int dev_seq_open(struct inode *inode, struct file *file)
{
	struct seq_file *seq;
	int res;

	res = seq_open(file, &dev_seq_ops);
	if (!res) {
		seq = file->private_data;
		seq->private = maybe_get_net(PROC_NET(inode));
		if (!seq->private) {
			res = -ENOENT;
			seq_release(inode, file);
		}
	}
	return res;
}
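
For anyone unfamiliar with the semantics the idiom above relies on,
here is a rough user-space sketch of "take a reference only if the
count is not already zero", written with C11 atomics rather than the
kernel's atomic_t.  The names net_model, inc_not_zero,
maybe_get_net_model and put_net_model are hypothetical stand-ins made
up for illustration; this is not the kernel implementation, and it
says nothing about whether the memory behind the pointer is still
valid, which is the separate locking question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct net (illustration only). */
struct net_model {
	atomic_int count;	/* strong references; 0 means cleanup has started */
};

/* Models atomic_inc_not_zero(): increment *v unless it is already 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns net with an extra reference, or NULL if the count hit zero. */
static struct net_model *maybe_get_net_model(struct net_model *net)
{
	return inc_not_zero(&net->count) ? net : NULL;
}

/* Drops one reference; the caller must actually hold one. */
static void put_net_model(struct net_model *net)
{
	int old = atomic_fetch_sub(&net->count, 1);

	assert(old > 0);
	(void)old;
}
```

The point of the compare-and-exchange loop is that an opener racing
with the final put either sees a non-zero count and bumps it, or sees
zero and backs off; it can never resurrect a count that already
dropped to zero.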

I'm still asking myself if I need any kind of locking to ensure
struct net does not go away in the meantime; if so, rcu_read_lock()
should be sufficient.

I will read through the generic proc code very carefully after
I have slept and see if the code above is sufficient,
and if so update the patchset.

Eric

