From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S966208AbdKQSxA (ORCPT <rfc822;w@1wt.eu>);
        Fri, 17 Nov 2017 13:53:00 -0500
Received: from out01.mta.xmission.com ([166.70.13.231]:41730 "EHLO
        out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932986AbdKQSwv (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 17 Nov 2017 13:52:51 -0500
From: ebiederm@xmission.com (Eric W. Biederman)
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
        David Miller <davem@davemloft.net>, vyasevic@redhat.com,
        kstewart@linuxfoundation.org, pombredanne@nexb.com,
        Vladislav Yasevich <vyasevich@gmail.com>, mark.rutland@arm.com,
        Greg KH <gregkh@linuxfoundation.org>,
        Alexey Dobriyan <adobriyan@gmail.com>, Florian Westphal <fw@strlen.de>,
        Nicolas Dichtel <nicolas.dichtel@6wind.com>, roman.kapl@sysgo.com,
        Paul Moore <paul@paul-moore.com>, David Ahern <dsahern@gmail.com>,
        Daniel Borkmann <daniel@iogearbox.net>,
        lucien xin <lucien.xin@gmail.com>,
        Matthias Schiffer <mschiffer@universe-factory.net>,
        rshearma@brocade.com, LKML <linux-kernel@vger.kernel.org>,
        Linux Kernel Network Developers <netdev@vger.kernel.org>,
        avagin@virtuozzo.com, gorcunov@virtuozzo.com
References: <151066759055.14465.9783879083192000862.stgit@localhost.localdomain>
        <CAM_iQpU+Jv6twd7+0VWRYNqMPu6TsZk4Tsi35zmZM9cZpU6sRQ@mail.gmail.com>
        <88152c11-a5b5-90f8-be46-99ed6c722064@virtuozzo.com>
        <87shdg8bzd.fsf@xmission.com>
        <8c808278-1925-37c0-619c-87bd1802790a@virtuozzo.com>
        <06b1d740-d443-ac23-a7b0-675e7b6ff6f9@virtuozzo.com>
        <87r2sz33n4.fsf@xmission.com>
        <24d2557d-5790-2a69-3758-d155e2b940be@virtuozzo.com>
Date: Fri, 17 Nov 2017 12:52:18 -0600
In-Reply-To: <24d2557d-5790-2a69-3758-d155e2b940be@virtuozzo.com> (Kirill
        Tkhai's message of "Fri, 17 Nov 2017 21:36:56 +0300")
Message-ID: <87mv3kwxel.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-XM-SPF: eid=1eFlko-0001EK-78;;;mid=<87mv3kwxel.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=75.170.119.32;;;frm=ebiederm@xmission.com;;;spf=neutral
X-XM-AID: U2FsdGVkX18D4V7s/4aynzfhjdOvwldk1iLYO13Aqqc=
X-SA-Exim-Connect-IP: 75.170.119.32
X-SA-Exim-Mail-From: ebiederm@xmission.com
X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
        *  0.7 XMSubLong Long Subject
        *  0.0 TVD_RCVD_IP Message was received from an IP address
        *  0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available.
        *  0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5000]
        * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC
        *      [sa06 1397; Body=1 Fuz1=1 Fuz2=1]
        *  0.0 T_TooManySym_02 5+ unique symbols in subject
        *  0.0 T_TooManySym_01 4+ unique symbols in subject
        *  0.0 T_TooManySym_03 6+ unique symbols in subject
X-Spam-DCC: XMission; sa06 1397; Body=1 Fuz1=1 Fuz2=1 
X-Spam-Combo: ;Kirill Tkhai <ktkhai@virtuozzo.com>
X-Spam-Relay-Country: 
X-Spam-Timing: total 5304 ms - load_scoreonly_sql: 0.03 (0.0%),
        signal_user_changed: 3.2 (0.1%), b_tie_ro: 2.2 (0.0%), parse: 1.17 (0.0%),
        extract_message_metadata: 13 (0.2%), get_uri_detail_list: 2.8 (0.1%),
        tests_pri_-1000: 7 (0.1%), tests_pri_-950: 1.20 (0.0%), tests_pri_-900: 1.05
        (0.0%), tests_pri_-400: 31 (0.6%), check_bayes: 29 (0.6%), b_tokenize: 11
        (0.2%), b_tok_get_all: 10 (0.2%), b_comp_prob: 3.2 (0.1%), b_tok_touch_all:
        3.5 (0.1%), b_finish: 0.67 (0.0%), tests_pri_0: 270 (5.1%),
        check_dkim_signature: 0.52 (0.0%), check_dkim_adsp: 2.7 (0.1%),
        tests_pri_500: 4973 (93.8%), poll_dns_idle: 4968 (93.7%), rewrite_mail: 0.00
        (0.0%)
Subject: Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit
X-Spam-Flag: No
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Kirill Tkhai <ktkhai@virtuozzo.com> writes:

> On 15.11.2017 19:31, Eric W. Biederman wrote:
>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>> 
>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>>>>>
>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>
>>>>>>>>         get_user_ns(user_ns);
>>>>>>>>
>>>>>>>> -       rv = mutex_lock_killable(&net_mutex);
>>>>>>>> +       rv = down_read_killable(&net_sem);
>>>>>>>>         if (rv < 0) {
>>>>>>>>                 net_free(net);
>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>                 list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>>                 rtnl_unlock();
>>>>>>>>         }
>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>> +       up_read(&net_sem);
>>>>>>>>         if (rv < 0) {
>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>                 put_user_ns(user_ns);
>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>         list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>>         spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>
>>>>>>>> -       mutex_lock(&net_mutex);
>>>>>>>> +       down_read(&net_sem);
>>>>>>>>
>>>>>>>>         /* Don't let anyone else find us. */
>>>>>>>>         rtnl_lock();
>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>         list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>>                 ops_free_list(ops, &net_exit_list);
>>>>>>>>
>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>> +       up_read(&net_sem);
>>>>>>>
>>>>>>> After your patch setup_net() could run concurrently with cleanup_net(),
>>>>>>> given that ops_exit_list() is called on error path of setup_net() too,
>>>>>>> it means ops->exit() now could run concurrently if it doesn't have its
>>>>>>> own lock. Not sure if this breaks any existing user.
>>>>>>
>>>>>> Yes, there will be possible concurrent ops->init() for a net namespace,
>>>>>> and ops->exit() for another one. I hadn't found pernet operations, which
>>>>>> have a problem with that. If they exist, they are hidden and not clear seen.
>>>>>> The pernet operations in general do not touch someone else's memory.
>>>>>> If suddenly there is one, KASAN should show it after a while.
>>>>>
>>>>> Certainly the use of hash tables shared between multiple network
>>>>> namespaces would count.  I don't rembmer how many of these we have but
>>>>> there used to be quite a few.
>>>>
>>>> Could you please provide an example of hash tables, you mean?
>>>
>>> Ah, I see, it's dccp_hashinfo etc.
>
> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>
>> 
>> The big one used to be the route cache.  With resizable hash tables
>> things may be getting better in that regard.
>
> I've checked some fib-related things, and wasn't able to find that.
> Excuse me, could you please clarify, if it's an assumption, or
> there is exactly a problem hash table, you know? Could you please
> point it me more exactly, if it's so.

Two things.
1) Hash tables are one case I know where we access data from multiple
   network namespaces.  As such it can not be asserted that is no
   possibility for problems.

2) The responsible way to handle this is one patch for each set of
   methods explaining why those methods are safe to run in parallel.

   That ensures there is opportunity for review and people are going
   slowly enough that they actually look at these issues.

The reason I want to see this broken up is that at 200ish sets of
methods it is too much to review all at once.

I completely agree that odds are that this can be made safe and that it
is mostly likely already safe in practically every instance.    My guess
would be that if there are problems that need to be addressed they
happen in one or two places and we need to find them.  If possible I
don't want to find them after the code has shipped in a stable release.

Eric