From: ebiederm@xmission.com (Eric W. Biederman)
To: Pavel Emelyanov <xemul@parallels.com>
Cc: Linux Netdev List <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>
Subject: Re: [PATCH 1/2] net: Allow to create links with given ifindex
Date: Tue, 31 Jul 2012 04:58:36 -0700
Message-ID: <878ve0dtw3.fsf@xmission.com>
In-Reply-To: <50179F66.1000604@parallels.com> (Pavel Emelyanov's message of "Tue, 31 Jul 2012 13:03:34 +0400")

Pavel Emelyanov <xemul@parallels.com> writes:

> On 07/30/2012 02:56 PM, Eric W. Biederman wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> 
>>> Pavel Emelyanov <xemul@parallels.com> writes:
>>>
>>>> Currently the RTM_NEWLINK results in -EOPNOTSUPP if the ifinfomsg->ifi_index
>>>> is not zero. I propose to allow requesting ifindices on link creation. This
>>>> is required by the checkpoint-restore to correctly restore a net namespace
>>>> (i.e. a container). The question of what to do with pre-created devices such
>>>> as lo or the sit fallback device is open, but for manually created devices
>>>> this can be solved by this patch.
>>>
>>> Have you walked through and found the locations where we still rely on
>>> ifindex being globally unique?
>>>
>>> Last time I was working in this area there were several places where
>>> things were indexed by just the interface index.
>> 
>> If it is really safe to make ifindex per network namespace at this
>> point you can make dev_new_ifindex have a per network namespace base
>> counter, and that will fix your problems with the loopback device.
>
> No, unfortunately it's not so :(
>
> First, let's imagine that on host A the loopback device got registered as
> the first device, but on host B for some reason some other device got
> registered first. In that case, after migration from A to B, the lo on B will
> have an index equal to 2. And there is no strict requirement that lo's per-net
> operations are registered first. Please correct me if I'm wrong.

Actually there is a hard requirement that the loopback device be the
last device in a network namespace to be unregistered.  We meet that
requirement by registering the loopback device first, in
net/core/dev.c:net_dev_init().
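
To illustrate why registering first implies being unregistered last, here
is a minimal user-space sketch.  It assumes per-namespace operations are
initialised in registration order and torn down in reverse order.  All
names (pernet_ops_model, model_register_pernet, model_setup_net,
model_cleanup_net, event_log) are hypothetical and are not the kernel's
own symbols.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPS 8

/* Hypothetical stand-in for a set of per-namespace operations. */
struct pernet_ops_model {
	const char *name;
};

static const struct pernet_ops_model *ops_list[MAX_OPS];
static int ops_count;

/* Text log of init/exit events so the ordering can be inspected. */
static char event_log[256];

/* Append ops to the list; registration order is preserved. */
static int model_register_pernet(const struct pernet_ops_model *ops)
{
	assert(ops != NULL);
	if (ops_count == MAX_OPS)
		return -1;
	ops_list[ops_count++] = ops;
	return 0;
}

static void log_event(const char *what, const char *name)
{
	size_t used = strlen(event_log);

	snprintf(event_log + used, sizeof(event_log) - used,
		 "%s:%s ", what, name);
}

/* Namespace creation: run every init in registration order. */
static void model_setup_net(void)
{
	for (int i = 0; i < ops_count; i++)
		log_event("init", ops_list[i]->name);
}

/* Namespace teardown: run every exit in reverse registration order. */
static void model_cleanup_net(void)
{
	for (int i = ops_count - 1; i >= 0; i--)
		log_event("exit", ops_list[i]->name);
}
```

In this model, registering "lo" and then "sit", followed by a call to
model_setup_net() and model_cleanup_net(), leaves event_log showing lo
initialised first and exited last.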

> Next. In fact, lo is not the only problem. Look at e.g. the sit versus ipgre
> fallback devices. Both get created on netns creation and obtain whatever
> ifindices are generated for them. Even if we make ifindex per netns, the
> chances that sit gets registered _strictly_ before ipgre are zero, since they
> are both modules.

True.  However those fallback devices should no longer be needed,
and even if they are I think you can delete and recreate them.

That makes lo the particularly interesting case.

>> Unless you have done the work to root out the last of the dependencies on
>> ifindex being globally unique I think you will run into some operational
>> problems.
>
> I totally agree with that. Before doing this patch I revisited the ancient
> attempt to make ifindices per netns and checked the issues Dave and you
> discussed then -- I have looked through how the ifindices are used in the
> networking code and found no places where the system-wide uniqueness is still
> required. That's why I proposed this patch for inclusion. If you know of any
> places I've missed, please let me know and I will work on them.

I took a quick look and I did not see anything.  I saw places under
net/sched/ that looked a bit suspicious, and of course there are places
where we use oif and iif in some of the routing code that make me wonder
a bit.  But if you have looked and if I have looked I think we are ok.

> Just an idea -- is it worth putting the possibility of overlapping ifindices
> under CONFIG_<SOMETHING> (EXPERT/CHECKPOINT_RESTORE) to let a wider audience
> check the code in real life?

I think the best testing we are going to get, diversity-wise, will come
from adding a per-netns counter to dev_new_ifindex when net-next opens up.

Having an ifindex that we can only set at netdevice creation time seems
reasonable.  
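
As a rough illustration of those two ideas together (a per-netns counter,
plus an ifindex that can only be chosen at creation time), here is a small
user-space model.  It is only a sketch under stated assumptions: the names
(netns_model, model_new_ifindex, model_register_netdev) and the tiny
MODEL_MAX_IFINDEX limit are hypothetical, and the in_use[] array stands in
for the kernel's per-namespace device lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_IFINDEX 31	/* illustrative limit, not a kernel value */

/* Hypothetical model of one network namespace. */
struct netns_model {
	int last_ifindex;			/* per-namespace counter */
	bool in_use[MODEL_MAX_IFINDEX + 1];	/* in_use[i]: ifindex i is taken */
};

/* Hand out the next free ifindex in this namespace, or -1 if none is left. */
static int model_new_ifindex(struct netns_model *ns)
{
	for (int tries = 0; tries < MODEL_MAX_IFINDEX; tries++) {
		if (++ns->last_ifindex > MODEL_MAX_IFINDEX)
			ns->last_ifindex = 1;	/* wrap, skipping 0 */
		if (!ns->in_use[ns->last_ifindex])
			return ns->last_ifindex;
	}
	return -1;
}

/*
 * Create a device.  requested == 0 means "pick one for me"; a positive
 * value asks for that exact ifindex and fails if it is already taken in
 * this namespace.  Returns the assigned ifindex, or -1 on failure.
 * There is deliberately no way to change the ifindex afterwards.
 */
static int model_register_netdev(struct netns_model *ns, int requested)
{
	int ifindex;

	assert(ns != NULL);
	if (requested < 0 || requested > MODEL_MAX_IFINDEX)
		return -1;

	if (requested == 0)
		ifindex = model_new_ifindex(ns);
	else if (ns->in_use[requested])
		return -1;
	else
		ifindex = requested;

	if (ifindex < 0)
		return -1;
	ns->in_use[ifindex] = true;
	return ifindex;
}
```

With two zero-initialised namespaces, the first device created in each
one gets ifindex 1, which is the property that would help the loopback
case, and a restore tool can ask for a specific index in one namespace
without regard to what is in use in the other.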

Eric
