netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	David Miller <davem@davemloft.net>,
	vyasevic@redhat.com, kstewart@linuxfoundation.org,
	pombredanne@nexb.com, Vladislav Yasevich <vyasevich@gmail.com>,
	mark.rutland@arm.com, Greg KH <gregkh@linuxfoundation.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Florian Westphal <fw@strlen.de>,
	Nicolas Dichtel <nicolas.dichtel@6wind.com>,
	roman.kapl@sysgo.com, Paul Moore <paul@paul-moore.com>,
	David Ahern <dsahern@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	lucien xin <lucien.xin@gmail.com>,
	Matthias Schiffer <mschiffer@universe-factory.net>,
	rshearma@brocade.com, LKML <linux-kernel@vger.kernel.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	avagin@virtuozzo.com, gorcunov@virtuozzo.com
Subject: Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit
Date: Fri, 17 Nov 2017 12:52:18 -0600	[thread overview]
Message-ID: <87mv3kwxel.fsf@xmission.com> (raw)
In-Reply-To: <24d2557d-5790-2a69-3758-d155e2b940be@virtuozzo.com> (Kirill Tkhai's message of "Fri, 17 Nov 2017 21:36:56 +0300")

Kirill Tkhai <ktkhai@virtuozzo.com> writes:

> On 15.11.2017 19:31, Eric W. Biederman wrote:
>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>> 
>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>>>>>
>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>
>>>>>>>>         get_user_ns(user_ns);
>>>>>>>>
>>>>>>>> -       rv = mutex_lock_killable(&net_mutex);
>>>>>>>> +       rv = down_read_killable(&net_sem);
>>>>>>>>         if (rv < 0) {
>>>>>>>>                 net_free(net);
>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>                 list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>>                 rtnl_unlock();
>>>>>>>>         }
>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>> +       up_read(&net_sem);
>>>>>>>>         if (rv < 0) {
>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>                 put_user_ns(user_ns);
>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>         list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>>         spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>
>>>>>>>> -       mutex_lock(&net_mutex);
>>>>>>>> +       down_read(&net_sem);
>>>>>>>>
>>>>>>>>         /* Don't let anyone else find us. */
>>>>>>>>         rtnl_lock();
>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>         list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>>                 ops_free_list(ops, &net_exit_list);
>>>>>>>>
>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>> +       up_read(&net_sem);
>>>>>>>
>>>>>>> After your patch setup_net() could run concurrently with cleanup_net(),
>>>>>>> given that ops_exit_list() is called on error path of setup_net() too,
>>>>>>> it means ops->exit() now could run concurrently if it doesn't have its
>>>>>>> own lock. Not sure if this breaks any existing user.
>>>>>>
>>>>>> Yes, there will be possible concurrent ops->init() for a net namespace,
>>>>>> and ops->exit() for another one. I hadn't found pernet operations, which
>>>>>> have a problem with that. If they exist, they are hidden and not clear seen.
>>>>>> The pernet operations in general do not touch someone else's memory.
>>>>>> If suddenly there is one, KASAN should show it after a while.
>>>>>
>>>>> Certainly the use of hash tables shared between multiple network
>>>>> namespaces would count.  I don't rembmer how many of these we have but
>>>>> there used to be quite a few.
>>>>
>>>> Could you please provide an example of hash tables, you mean?
>>>
>>> Ah, I see, it's dccp_hashinfo etc.
>
> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>
>> 
>> The big one used to be the route cache.  With resizable hash tables
>> things may be getting better in that regard.
>
> I've checked some fib-related things, and wasn't able to find that.
> Excuse me, could you please clarify, if it's an assumption, or
> there is exactly a problem hash table, you know? Could you please
> point it me more exactly, if it's so.

Two things.
1) Hash tables are one case I know where we access data from multiple
   network namespaces.  As such it can not be asserted that is no
   possibility for problems.

2) The responsible way to handle this is one patch for each set of
   methods explaining why those methods are safe to run in parallel.

   That ensures there is opportunity for review and people are going
   slowly enough that they actually look at these issues.

The reason I want to see this broken up is that at 200ish sets of
methods it is too much to review all at once.

I completely agree that odds are that this can be made safe and that it
is mostly likely already safe in practically every instance.    My guess
would be that if there are problems that need to be addressed they
happen in one or two places and we need to find them.  If possible I
don't want to find them after the code has shipped in a stable release.

Eric



   

  reply	other threads:[~2017-11-17 18:52 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-14 13:53 [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit Kirill Tkhai
2017-11-14 17:07 ` Eric Dumazet
2017-11-14 17:25   ` Kirill Tkhai
2017-11-14 17:44 ` Andrei Vagin
2017-11-14 18:00   ` Eric Dumazet
2017-11-14 22:15     ` Andrei Vagin
2017-11-14 18:04   ` Kirill Tkhai
2017-11-14 18:38     ` Andrei Vagin
2017-11-14 20:43       ` Kirill Tkhai
2017-11-14 18:11 ` Stephen Hemminger
2017-11-14 19:07   ` Eric Dumazet
2017-11-14 18:39 ` Cong Wang
2017-11-14 19:58   ` Kirill Tkhai
2017-11-15  3:19     ` Eric W. Biederman
2017-11-15  9:51       ` Kirill Tkhai
2017-11-15 12:36         ` Kirill Tkhai
2017-11-15 16:31           ` Eric W. Biederman
2017-11-17 18:36             ` Kirill Tkhai
2017-11-17 18:52               ` Eric W. Biederman [this message]
2017-11-17 20:16                 ` Kirill Tkhai
2017-11-15  6:25 ` Eric W. Biederman
2017-11-15  9:49   ` Kirill Tkhai
2017-11-15 16:29     ` Eric W. Biederman
2017-11-16  9:13       ` Kirill Tkhai
2017-11-17 16:46       ` Kirill Tkhai
2017-11-17 18:54         ` Eric W. Biederman
2017-11-17 20:12           ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mv3kwxel.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=adobriyan@gmail.com \
    --cc=avagin@virtuozzo.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=fw@strlen.de \
    --cc=gorcunov@virtuozzo.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kstewart@linuxfoundation.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucien.xin@gmail.com \
    --cc=mark.rutland@arm.com \
    --cc=mschiffer@universe-factory.net \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.dichtel@6wind.com \
    --cc=paul@paul-moore.com \
    --cc=pombredanne@nexb.com \
    --cc=roman.kapl@sysgo.com \
    --cc=rshearma@brocade.com \
    --cc=vyasevic@redhat.com \
    --cc=vyasevich@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).