Date: Thu, 21 Mar 2019 09:12:57 -0400
From: "Michael S. Tsirkin"
To: Liran Alon
Cc: Stephen Hemminger, Si-Wei Liu, Sridhar Samudrala, Alexander Duyck,
    Jakub Kicinski, Jiri Pirko, David Miller, Netdev,
    virtualization@lists.linux-foundation.org, boris.ostrovsky@oracle.com,
    vijay.balakrishna@oracle.com, jfreimann@redhat.com,
    ogerlitz@mellanox.com, vuhuong@mellanox.com
Subject: Re: [summary] virtio network device failover writeup
Message-ID: <20190321090619-mutt-send-email-mst@kernel.org>
In-Reply-To: <2939FB15-720A-4C9E-92B7-2DBA139DDE0F@oracle.com>
X-Mailing-List: netdev@vger.kernel.org

On Thu, Mar 21, 2019 at 03:04:37PM +0200, Liran Alon wrote:
>
>
> > On 21 Mar 2019, at 14:57, Michael S. Tsirkin wrote:
> >
> > On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
> >>
> >>
> >>> On 21 Mar 2019, at 14:37, Michael S. Tsirkin wrote:
> >>>
> >>> On Thu, Mar 21, 2019 at 12:07:57PM +0200, Liran Alon wrote:
> >>>>>>>> 2) It brings a non-intuitive customer experience. For example, a customer may attempt to analyse a connectivity issue by checking the connectivity
> >>>>>>>> on a net-failover slave (e.g.
> >>>>>>>> the VF) but will see no connectivity when in fact checking the connectivity on the net-failover master netdev shows correct connectivity.
> >>>>>>>>
> >>>>>>>> The set of changes I envision to fix our issues are:
> >>>>>>>> 1) Hide net-failover slaves in a different netns created and managed by the kernel. But the user can enter it and manage the netdevs there if they wish to do so explicitly.
> >>>>>>>> (E.g. Configure the net-failover VF slave in some special way).
> >>>>>>>> 2) Match the virtio-net and the VF based on a PV attribute instead of MAC. (Similar to what is done in NetVSC). E.g. Provide a virtio-net interface to get the PCI slot where the matching VF will be hot-plugged by the hypervisor.
> >>>>>>>> 3) Have an explicit virtio-net control message to command the hypervisor to switch the data-path from virtio-net to VF and vice-versa, instead of relying on intercepting the PCI master enable-bit
> >>>>>>>> as an indicator of when the VF is about to be set up. (Similar to what is done in NetVSC).
> >>>>>>>>
> >>>>>>>> Is there any clear issue we see regarding the above suggestion?
> >>>>>>>>
> >>>>>>>> -Liran
> >>>>>>>
> >>>>>>> The issue would be this: how do we avoid conflicting with namespaces
> >>>>>>> created by users?
> >>>>>>
> >>>>>> This is kinda controversial, but maybe separate netns names into 2 groups: hidden and normal.
> >>>>>> To reference a hidden netns, you need to do it explicitly.
> >>>>>> Hidden and normal netns names can collide as they will be maintained in different namespaces (Yes I’m overloading the term namespace here…).
> >>>>>
> >>>>> Maybe it's an unnamed namespace. Hidden until userspace gives it a name?
> >>>>
> >>>> This is also a good idea that will solve the issue. Yes.
> >>>>
> >>>>>
> >>>>>> Does this seem reasonable?
> >>>>>>
> >>>>>> -Liran
> >>>>>
> >>>>> Reasonable I'd say yes, easy to implement probably no. But maybe I
> >>>>> missed a trick or two.
> >>>>
> >>>> BTW, from a practical point of view, I think that even until we figure out a solution on how to implement this,
> >>>> it would be better to create a kernel auto-generated name (e.g. “kernel_net_failover_slaves”)
> >>>> that will break only userspace workloads that by a very rare chance have a netns that collides with this, than
> >>>> the breakage we have today for the various userspace components.
> >>>>
> >>>> -Liran
> >>>
> >>> It seems quite easy to supply that as a module parameter. Do we need two
> >>> namespaces though? Won't some userspace still be confused by the two
> >>> slaves sharing the MAC address?
> >>
> >> That’s one reasonable option.
> >> Another one is that we will indeed change the mechanism by which we determine a VF should be bonded with a virtio-net device.
> >> i.e. Expose a new virtio-net property that specifies the PCI slot of the VF to be bonded with.
> >>
> >> The second seems cleaner but I don’t have a strong opinion on this. Both seem reasonable to me and your suggestion is faster to implement from the current state of things.
> >>
> >> -Liran
> >
> > OK. Now what happens if master is moved to another namespace? Do we need
> > to move the slaves too?
>
> No. Why would we move the slaves?

The reason we have the 3-device model at all is so users can fine-tune the
slaves. I don't see why this applies to the root namespace but not to
a container. If it has access to failover it should have access to slaves.

> The whole point is to make most customers ignore the net-failover slaves and keep them “hidden” in their dedicated netns.

So that makes the common case easy. That is good. My worry is it might
make some uncommon cases impossible.

> We won’t prevent customers from explicitly moving the net-failover slaves out of this netns, but we will not move them out of there automatically.
>
> >
> > Also siwei's patch is then kind of extraneous right?
> > Attempts to rename a slave will now fail as it's in a namespace…
>
> I’m not sure actually.
> Isn't udev/systemd netns-aware?
> I would expect it to be able to provide names also to netdevs in a netns different from the default netns.

I think most people move devices after they are renamed.

> If that’s the case, Si-Wei's patch to be able to rename a net-failover slave when it is already open is still required, as the race condition still exists.
>
> -Liran
>
> >
> >>>
> >>> --
> >>> MST
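
To make the "hidden vs. normal netns names" idea from the thread concrete, here is a minimal sketch in Python. It is only a toy model of the naming scheme, not kernel code and not an existing API: the NetNsRegistry class, its create/lookup methods, and the netdev names are all hypothetical and exist purely for illustration. It shows the two properties discussed above: a hidden netns is only found when referenced explicitly, and a hidden name may collide with a user-chosen name without conflict.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NetNs:
    """Toy stand-in for a network namespace: just a list of netdev names."""
    devices: List[str] = field(default_factory=list)


class NetNsRegistry:
    """Hypothetical registry that keeps hidden and normal netns names apart."""

    def __init__(self) -> None:
        self._normal: Dict[str, NetNs] = {}
        self._hidden: Dict[str, NetNs] = {}

    def create(self, name: str, hidden: bool = False) -> NetNs:
        # Names only have to be unique within their own group.
        table = self._hidden if hidden else self._normal
        if name in table:
            raise ValueError(f"netns {name!r} already exists in this group")
        table[name] = NetNs()
        return table[name]

    def lookup(self, name: str, hidden: bool = False) -> NetNs:
        # A hidden netns is only returned when the caller asks for it explicitly.
        table = self._hidden if hidden else self._normal
        return table[name]


registry = NetNsRegistry()

# "Kernel" side: park the failover slaves in a hidden netns.
slaves_ns = registry.create("kernel_net_failover_slaves", hidden=True)
slaves_ns.devices += ["vf0", "virtio0_standby"]  # hypothetical netdev names

# "User" side: the same name is still free in the normal group.
user_ns = registry.create("kernel_net_failover_slaves")
```

In this model a default lookup never sees the slaves, which matches the goal of making the common case easy, while an explicit hidden=True lookup still reaches them for the fine-tuning case raised in the reply.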