Linux virtualization list
* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Andrew Morton @ 2018-05-02  0:36 UTC
  To: Mikulas Patocka
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, Matthew Wilcox,
	Michal Hocko, linux-mm, edumazet, virtualization, David Miller,
	Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804241229410.23702@file01.intranet.prod.int.rdu2.redhat.com>

On Tue, 24 Apr 2018 12:33:01 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Tue, 24 Apr 2018, Michal Hocko wrote:
> 
> > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > > 
> > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> > > > 
> > > > > Fixing __vmalloc code 
> > > > > is easy and it doesn't require cooperation with maintainers.
> > > > 
> > > > But it is a hack against the intention of the scope api.
> > > 
> > > It is not!
> > 
> > This discussion simply doesn't make much sense it seems. The scope API
> > is to document the scope of the reclaim recursion critical section. That
> > certainly is not a utility function like vmalloc.
> 
> That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel 
> developer) from converting the code to the scope API. You make nonsensical 
> excuses.
> 

Fun thread!

Winding back to the original problem, I'd state it as

- Caller uses kvmalloc() but passes the address into vmalloc-naive
  DMA API and

- Caller uses kvmalloc() but passes the address into kfree()

Yes?

If so, then...

Is there a way in which, in the kvmalloc-called-kmalloc path, we can
tag the slab-allocated memory with a "this memory was allocated with
kvmalloc()" flag?  I *think* there's extra per-object storage available
with suitable slab/slub debugging options?  Perhaps we could steal one
bit from the redzone, dunno.

If so then we can

a) set that flag in kvmalloc() if the kmalloc() call succeeded

b) check for that flag in the DMA code, WARN if it is set.

c) in kvfree(), clear that flag before calling kfree()

d) in kfree(), check for that flag and go WARN() if set.

So both potential bugs are detected all the time, dependent upon
CONFIG_SLUB_DEBUG (and perhaps other slub config options).
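
A rough userspace sketch of steps a) to d) above, purely as an illustration. It assumes a per-object debug header is available in front of each allocation; the header layout, the sim_* names and the warn_count counter are all hypothetical and do not correspond to real slab/slub internals. The vmalloc fallback path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-object debug header (stand-in for slub debug storage).
 * The long double member only keeps the object behind it aligned. */
union dbg_hdr {
	bool from_kvmalloc;	/* "this memory was allocated with kvmalloc()" */
	long double align_;
};

int warn_count;			/* counts simulated WARN() hits */

void *sim_kmalloc(size_t size)
{
	union dbg_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->from_kvmalloc = false;
	return h + 1;		/* object starts right after the header */
}

void sim_kfree(void *p)
{
	union dbg_hdr *h;

	if (!p)
		return;
	h = (union dbg_hdr *)p - 1;
	if (h->from_kvmalloc)
		warn_count++;	/* step d: kvmalloc() memory reached kfree() */
	free(h);
}

void *sim_kvmalloc(size_t size)
{
	void *p = sim_kmalloc(size);

	if (p)			/* step a: tag only if the kmalloc path succeeded */
		((union dbg_hdr *)p - 1)->from_kvmalloc = true;
	return p;
}

void sim_kvfree(void *p)
{
	if (p)			/* step c: clear the tag before the real free */
		((union dbg_hdr *)p - 1)->from_kvmalloc = false;
	sim_kfree(p);
}

void sim_dma_map(void *p)
{
	assert(p != NULL);
	if (((union dbg_hdr *)p - 1)->from_kvmalloc)
		warn_count++;	/* step b: kvmalloc() memory handed to DMA code */
}
```

In this sketch a correct kvmalloc()/kvfree() pair produces no warning, while passing the same pointer to the DMA helper or to kfree() bumps warn_count each time, regardless of whether the allocation would ever have fallen back to vmalloc.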

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-05-02  0:20 UTC
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180430072034.GH23854@nanopsycho.orion>

On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>
>>> Now I try to change mac of the failover master:
>>> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>>> RTNETLINK answers: Operation not supported
>>>
>>> That I did expect to work. I would expect this would change the mac of
>>> the master and both standby and primary slaves.
>> If a VF is untrusted, a VM will not be able to change its MAC and moreover
> Note that at this point, I have no VF. So I'm not sure why you mention
> that.
>
>
>> in this mode we are assuming that the hypervisor has assigned the MAC and
>> guest is not expected to change the MAC.
> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
> mac and all works fine. How is this different? Change mac on "failover
> instance" should work and should propagate the mac down to its slaves.
>
>
>> For the initial implementation, i would propose not allowing the guest to
>> change the MAC of failover or standby dev.
> I see no reason for such restriction.
>

It is true that a VM user can change the MAC address of a normal virtio-net interface.
However, when it is in STANDBY mode I think we should not allow this change, specifically
because we are creating a failover instance based on a MAC that is assigned by the
hypervisor.

Moreover, in a cloud environment I would expect the PF/hypervisor to assign a MAC to
the VF that cannot be changed by the guest.

So for the initial implementation, do you see any issues with having this restriction
in STANDBY mode?


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

* Re: [PATCH] vhost: make msg padding explicit
From: David Miller @ 2018-05-01 18:05 UTC
  To: mst; +Cc: kevin, kvm, netdev, linux-kernel, virtualization
In-Reply-To: <20180501201841-mutt-send-email-mst@kernel.org>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 1 May 2018 20:19:19 +0300

> On Tue, May 01, 2018 at 11:28:22AM -0400, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Fri, 27 Apr 2018 19:02:05 +0300
>> 
>> > There's a 32 bit hole just after type. It's best to
>> > give it a name, this way compiler is forced to initialize
>> > it with rest of the structure.
>> > 
>> > Reported-by: Kevin Easton <kevin@guarana.org>
>> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> 
>> Michael, will you be sending this directly to Linus or would you like
>> me to apply it to net or net-next?
>> 
>> Thanks.
> 
> I'd prefer you to apply it for net and cc stable if possible.

Ok, applied, and added to my -stable submission queue.

* Re: [PATCH] vhost: make msg padding explicit
From: Michael S. Tsirkin @ 2018-05-01 17:19 UTC
  To: David Miller; +Cc: kevin, kvm, netdev, linux-kernel, virtualization
In-Reply-To: <20180501.112822.1871426720257639849.davem@davemloft.net>

On Tue, May 01, 2018 at 11:28:22AM -0400, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Fri, 27 Apr 2018 19:02:05 +0300
> 
> > There's a 32 bit hole just after type. It's best to
> > give it a name, this way compiler is forced to initialize
> > it with rest of the structure.
> > 
> > Reported-by: Kevin Easton <kevin@guarana.org>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Michael, will you be sending this directly to Linus or would you like
> me to apply it to net or net-next?
> 
> Thanks.

I'd prefer you to apply it for net and cc stable if possible.
Thanks!

-- 
MST

* Re: [PATCH] vhost: make msg padding explicit
From: David Miller @ 2018-05-01 15:28 UTC
  To: mst; +Cc: kevin, kvm, netdev, linux-kernel, virtualization
In-Reply-To: <1524844881-178524-1-git-send-email-mst@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 27 Apr 2018 19:02:05 +0300

> There's a 32 bit hole just after type. It's best to
> give it a name, this way compiler is forced to initialize
> it with rest of the structure.
> 
> Reported-by: Kevin Easton <kevin@guarana.org>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Michael, will you be sending this directly to Linus or would you like
me to apply it to net or net-next?

Thanks.
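
For readers following the padding discussion, here is a small sketch of the idea described in the quoted commit message. The struct names and fields are hypothetical and are not the real vhost message layout; the size of the implicit hole depends on the ABI (typically 4 bytes where a 64-bit member needs 8-byte alignment, 0 where it does not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with an unnamed hole after 'type' on ABIs that
 * align uint64_t to 8 bytes. The hole's contents are not covered by
 * member-wise initialization. */
struct msg_implicit {
	int32_t type;
	uint64_t payload;
};

/* Same layout with the hole given a name, so it is an ordinary member. */
struct msg_explicit {
	int32_t type;
	uint32_t padding0;
	uint64_t payload;
};

/* Number of padding bytes the compiler placed after 'type'. */
size_t implicit_hole_bytes(void)
{
	return offsetof(struct msg_implicit, payload) - sizeof(int32_t);
}

/* Members omitted from an initializer are zero-initialized, so the named
 * padding member is guaranteed to be 0 here. */
struct msg_explicit make_msg(int32_t type, uint64_t payload)
{
	struct msg_explicit m = { .type = type, .payload = payload };

	assert(m.padding0 == 0);
	return m;
}
```

With the named member, the layout is the same 16 bytes on ABIs that would otherwise insert the hole, but every byte between type and payload now has a defined value when the structure is initialized member by member.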

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-05-01  7:33 UTC
  To: Samudrala, Sridhar
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <c965d7f2-9ba6-fc82-ce1c-4d70c3aceb6d@intel.com>

Mon, Apr 30, 2018 at 09:26:34PM CEST, sridhar.samudrala@intel.com wrote:
>On 4/30/2018 12:12 AM, Jiri Pirko wrote:
>> Mon, Apr 30, 2018 at 05:00:33AM CEST, sridhar.samudrala@intel.com wrote:
>> > On 4/28/2018 1:24 AM, Jiri Pirko wrote:
>> > > Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > > > netdev is present with the same MAC address. It allows live migration
>> > > > of a VM with a direct attached VF without the need to setup a bond/team
>> > > > between a VF and virtio net device in the guest.
>> > > > 
>> > > > The hypervisor needs to enable only one datapath at any time so that
>> > > > packets don't get looped back to the VM over the other datapath. When a VF
>> > > Why? Both datapaths could be enabled at a time. Why the loop on
>> > > hypervisor side would be a problem. This is not an issue for
>> > > bonding/team as well.
>> > Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
>> > sent over the VF datapath don't get looped back to the VM via the virtio-net datapath.
>> Why? Please see below.
>> 
>> 
>> > This can happen if both datapaths are enabled at the same time.
>> > 
>> > I would think this is an issue even with bonding/team as well when virtio-net and
>> > VF are backed by the same PF.
>> > 
>> > 
>> I believe that the scenario is the same as on an ordinary nic/switch
>> network:
>> 
>> ...................
>> 
>>    host
>>         bond0
>>        /     \
>>      eth0   eth1
>>       |       |
>> ...................
>>       |       |
>>       p1      p2
>> 
>>    switch
>> 
>> ...................
>> 
>> It is perfectly valid for p1 and p2 to be up and "bridged" together. Bond
>> has to cope with loop-backed frames. "Failover driver" should too,
>> it's the same scenario.
>
>OK. So it looks like we should be able to handle this by returning RX_HANDLER_EXACT
>for frames received on the standby device when the primary is present.

Yep.
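
A minimal sketch of the receive-side decision agreed on above, as a userspace model only. In the kernel, RX_HANDLER_EXACT forces exact delivery on the receiving device and RX_HANDLER_ANOTHER asks for another pass after skb->dev was changed; the enum, struct and function names below are hypothetical stand-ins and not the failover module's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the two kernel rx_handler results relevant here. */
enum rx_sketch_result {
	RX_SKETCH_ANOTHER,	/* retarget to the failover master, reprocess */
	RX_SKETCH_EXACT,	/* deliver only to exact matches on this device */
};

enum rx_sketch_role {
	RX_SKETCH_PRIMARY,
	RX_SKETCH_STANDBY,
};

/* Hypothetical failover state: only what the decision needs. */
struct rx_sketch_failover {
	bool primary_present;
};

/* Frames arriving on the standby while a primary exists may be loop-backed
 * copies of what the guest sent over the primary, so they are kept away
 * from the master. Everything else is passed up through the master. */
enum rx_sketch_result rx_sketch_handle_frame(const struct rx_sketch_failover *fo,
					     enum rx_sketch_role rx_role)
{
	assert(fo != NULL);
	if (rx_role == RX_SKETCH_STANDBY && fo->primary_present)
		return RX_SKETCH_EXACT;
	return RX_SKETCH_ANOTHER;
}
```

The point of the sketch is that the standby path stays usable when no primary is plugged, so the failover to virtio during migration is unaffected.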

* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Mikulas Patocka @ 2018-04-30 21:07 UTC
  To: John Stoffel
  Cc: Andrew Morton, dm-devel, eric.dumazet, mst, netdev, Randy Dunlap,
	linux-kernel, Matthew Wilcox, Michal Hocko, James Bottomley,
	edumazet, linux-mm, David Rientjes, virtualization,
	David Miller, Vlastimil Babka
In-Reply-To: <23271.24580.695738.853532@quad.stoffel.home>



On Mon, 30 Apr 2018, John Stoffel wrote:

> >>>>> "Mikulas" == Mikulas Patocka <mpatocka@redhat.com> writes:
> 
> Mikulas> On Thu, 26 Apr 2018, John Stoffel wrote:
> 
> Mikulas> I see your point - and I think the misunderstanding is this.
> 
> Thanks.
> 
> Mikulas> This patch is not really helping people to debug existing crashes. It is 
> Mikulas> not like "you get a crash" - "you google for some keywords" - "you get a 
> Mikulas> page that suggests to turn this option on" - "you turn it on and solve the 
> Mikulas> crash".
> 
> Mikulas> What this patch really does is that - it makes the kernel deliberately 
> Mikulas> crash in a situation when the code violates the specification, but it 
> Mikulas> would not crash otherwise or it would crash very rarely. It helps to 
> Mikulas> detect specification violations.
> 
> Mikulas> If the kernel developer (or tester) doesn't use this option, his buggy 
> Mikulas> code won't crash - and if it won't crash, he won't fix the bug or report 
> Mikulas> it. How is the user or developer supposed to learn about this option, if 
> Mikulas> he gets no crash at all?
> 
> So why do we make this a KConfig option at all?

Because other people see the KConfig option (so, they may enable it) and 
they don't see the kernel parameter (so, they won't enable it).

Close your eyes and say how many kernel parameters you remember :-)

> Just turn it on and let it rip.

I can't test whether all the networking drivers use kvmalloc properly, because
I don't have the hardware. You can't test it either. No one has all the
hardware that is supported by Linux.

Driver issues can only be tested by a mass of users. And if the users 
don't know about the debugging option, they won't enable it.

> >> I agree with James here.  Looking at the SLAB vs SLUB Kconfig entries
> >> tells me *nothing* about why I should pick one or the other, as an
> >> example.

BTW. You can enable slub debugging either with CONFIG_SLUB_DEBUG_ON or 
with the kernel parameter "slub_debug" - and most users who compile their 
own kernel use CONFIG_SLUB_DEBUG_ON - just because it is visible.

> Now I also think that Linus has the right idea to not just sprinkle 
> BUG_ONs into the code, just dump an oops and keep going if you can.
> If it's a filesystem or a device, turn it read only so that people 
> notice right away.

This vmalloc fallback is similar to CONFIG_DEBUG_KOBJECT_RELEASE. 
CONFIG_DEBUG_KOBJECT_RELEASE changes the behavior of kobject_put in order 
to cause deliberate crashes (that wouldn't happen otherwise) in drivers 
that misuse kobject_put. In the same sense, we want to cause deliberate 
crashes (that wouldn't happen otherwise) in drivers that misuse kvmalloc.

The crashes will only happen in debugging kernels, not in production 
kernels.

Mikulas
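
To make the argument concrete, here is a simplified model of why forcing the vmalloc path turns a rare failure into a deterministic one. It is a sketch under stated assumptions: the kv_policy struct, the force_vmalloc field (standing in for the proposed debug option) and the size-based choice are hypothetical simplifications; the real kvmalloc tries kmalloc first and falls back to vmalloc when that fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum kv_backend {
	KV_BACKEND_KMALLOC,
	KV_BACKEND_VMALLOC,
};

/* Hypothetical policy knobs for the model. */
struct kv_policy {
	bool force_vmalloc;	/* stand-in for the proposed debug option */
	size_t kmalloc_limit;	/* largest size the modelled kmalloc satisfies */
};

/* Which allocator the modelled kvmalloc() would end up using. */
enum kv_backend kv_pick_backend(const struct kv_policy *p, size_t size)
{
	assert(p != NULL);
	if (p->force_vmalloc)
		return KV_BACKEND_VMALLOC;
	return size <= p->kmalloc_limit ? KV_BACKEND_KMALLOC
					: KV_BACKEND_VMALLOC;
}

/* A driver that frees kvmalloc() memory with kfree(), or hands it to a
 * vmalloc-naive DMA API, only misbehaves when the vmalloc path was taken. */
bool kv_misuse_exposed(const struct kv_policy *p, size_t size)
{
	return kv_pick_backend(p, size) == KV_BACKEND_VMALLOC;
}
```

Without the option, a small allocation in a buggy driver almost always takes the kmalloc path and the bug stays hidden; with it, the same call exposes the bug on every run, which is the behaviour being argued for on debugging kernels.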

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-04-30 19:26 UTC
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180430071208.GG23854@nanopsycho.orion>

On 4/30/2018 12:12 AM, Jiri Pirko wrote:
> Mon, Apr 30, 2018 at 05:00:33AM CEST, sridhar.samudrala@intel.com wrote:
>> On 4/28/2018 1:24 AM, Jiri Pirko wrote:
>>> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>>>> This patch enables virtio_net to switch over to a VF datapath when a VF
>>>> netdev is present with the same MAC address. It allows live migration
>>>> of a VM with a direct attached VF without the need to setup a bond/team
>>>> between a VF and virtio net device in the guest.
>>>>
>>>> The hypervisor needs to enable only one datapath at any time so that
>>>> packets don't get looped back to the VM over the other datapath. When a VF
>>> Why? Both datapaths could be enabled at a time. Why the loop on
>>> hypervisor side would be a problem. This is not an issue for
>>> bonding/team as well.
>> Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
>> sent over the VF datapath don't get looped back to the VM via the virtio-net datapath.
> Why? Please see below.
>
>
>> This can happen if both datapaths are enabled at the same time.
>>
>> I would think this is an issue even with bonding/team as well when virtio-net and
>> VF are backed by the same PF.
>>
>>
> I believe that the scenario is the same as on an ordinary nic/switch
> network:
>
> ...................
>
>    host
>   
>         bond0
>        /     \
>      eth0   eth1
>       |       |
> ...................
>       |       |
>       p1      p2
>
>    switch
>
> ...................
>
> It is perfectly valid for p1 and p2 to be up and "bridged" together. Bond
> has to cope with loop-backed frames. "Failover driver" should too,
> it's the same scenario.

OK. So it looks like we should be able to handle this by returning RX_HANDLER_EXACT
for frames received on the standby device when the primary is present.

* Re: [PATCH net-next v9 1/4] virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
From: Samudrala, Sridhar @ 2018-04-30 19:14 UTC
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180430070340.GF23854@nanopsycho.orion>


On 4/30/2018 12:03 AM, Jiri Pirko wrote:
> Mon, Apr 30, 2018 at 04:47:03AM CEST, sridhar.samudrala@intel.com wrote:
>> On 4/28/2018 12:50 AM, Jiri Pirko wrote:
>>> Fri, Apr 27, 2018 at 07:06:57PM CEST,sridhar.samudrala@intel.com  wrote:
>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>> act as a standby for another device with the same MAC address.
>>>>
>>>> VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.
>>>>
>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>> ---
>>>> drivers/net/virtio_net.c        | 2 +-
>>>> include/uapi/linux/virtio_net.h | 3 +++
>>>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 3b5991734118..51a085b1a242 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -2999,7 +2999,7 @@ static struct virtio_device_id id_table[] = {
>>>> 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>>>> 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>>>> 	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>>>> -	VIRTIO_NET_F_SPEED_DUPLEX
>>>> +	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
>>> This is not part of current qemu master (head 6f0c4706b35dead265509115ddbd2a8d1af516c1)
>>> Where can I find the qemu code?
>>>
>>> Also, I think it makes sense to push HW (qemu HW in this case) first
>>> and only then the driver.
>> I had sent qemu patch with a couple of earlier versions of this patchset.
>> Will include it when i send out v10.
> The point was, don't you want to push it to qemu first? Did you at least
> send RFC to qemu?

Yes. Here is the link to the RFC patch.
https://patchwork.ozlabs.org/patch/859521/
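
Since the quoted patch description places the feature at bit 62, a short sketch of testing and setting such a bit may help. The helper names are hypothetical and are not the virtio API; the only value taken from the thread is the bit number. The thing to notice is that the mask must be built from a 64-bit constant, because shifting a plain int by 62 is undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number quoted in the patch description for VIRTIO_NET_F_STANDBY. */
#define SKETCH_F_STANDBY 62

/* True if 'bit' is set in a 64-bit feature word. */
bool sketch_has_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return (features >> bit) & 1;
}

/* Returns 'features' with 'bit' set; UINT64_C keeps the shift 64 bits wide. */
uint64_t sketch_set_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return features | (UINT64_C(1) << bit);
}
```

A device model that only tracks the low 32 feature bits would silently lose this flag, which is one reason the device-side (qemu) change has to land alongside the driver change.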

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-30  7:20 UTC
  To: Samudrala, Sridhar
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <638e52c7-64db-9d8d-7530-f6b000d3e292@intel.com>

Mon, Apr 30, 2018 at 06:16:58AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/28/2018 2:42 AM, Jiri Pirko wrote:
>> Fri, Apr 27, 2018 at 07:06:59PM CEST,sridhar.samudrala@intel.com  wrote:
>> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > netdev is present with the same MAC address. It allows live migration
>> > of a VM with a direct attached VF without the need to setup a bond/team
>> > between a VF and virtio net device in the guest.
>> > 
>> > The hypervisor needs to enable only one datapath at any time so that
>> > packets don't get looped back to the VM over the other datapath. When a VF
>> > is plugged, the virtio datapath link state can be marked as down. The
>> > hypervisor needs to unplug the VF device from the guest on the source host
>> > and reset the MAC filter of the VF to initiate failover of datapath to
>> > virtio before starting the migration. After the migration is completed,
>> > the destination hypervisor sets the MAC filter on the VF and plugs it back
>> > to the guest to switch over to VF datapath.
>> > 
>> > It uses the generic failover framework that provides 2 functions to create
>> > and destroy a master failover netdev. When STANDBY feature is enabled, an
>> > additional netdev(failover netdev) is created that acts as a master device
>> > and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> > is marked as 'standby' netdev and a passthru device with the same MAC is
>> > registered as 'primary' netdev.
>> > 
>> > This patch is based on the discussion initiated by Jesse on this thread.
>> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> > 
>> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
>> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>      link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>      inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>         valid_lft forever preferred_lft forever
>> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>      link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>      inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>         valid_lft forever preferred_lft forever
>> 
>> However, it seems to confuse my initscripts on Fedora:
>> [root@test1 ~]# ifup ens3
>> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
>> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
>> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>> 
>> Determining IP information for ens3
>> ens3n_sby...Cannot find device "ens3n_sby.pid"
>> Cannot find device "ens3n_sby.lease"
>>   failed.
>> 
>> I tried to change the standby device mac:
>> ip link set ens3n_sby addr 52:54:00:b2:a7:f2
>> [root@test1 ~]# ifup ens3
>> 
>> Determining IP information for ens3... done.
>> [root@test1 ~]#
>> 
>> But now the network does not work. I think that the mac change on
>> standby device should probably be refused, no?
>
>Yes. We should block changing the standby device mac.
>
>> When I change the mac back, all works fine.
>
>This is strange. So you had to change the standby device mac twice to
>get dhcp working.

Yes. First to some other mac so as not to confuse initscripts, second back
to the "failover mac" in order to make frames go through (I'm guessing
that virtio_net also has mac filter for incoming frames).


>
>I do see NetworkManager trying to get dhcp address on standby device, but

Not using NM. Just good old Fedora initscripts.


>i don't see any issue with connectivity.  To be totally transparent, we
>need to only expose one netdev.

Yep.


>
>> 
>> Now I try to change mac of the failover master:
>> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> RTNETLINK answers: Operation not supported
>> 
>> That I did expect to work. I would expect this would change the mac of
>> the master and both standby and primary slaves.
>
>If a VF is untrusted, a VM will not be able to change its MAC and moreover

Note that at this point, I have no VF. So I'm not sure why you mention
that.


>in this mode we are assuming that the hypervisor has assigned the MAC and
>guest is not expected to change the MAC.

Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
mac and all works fine. How is this different? Change mac on "failover
instance" should work and should propagate the mac down to its slaves.


>
>For the initial implementation, i would propose not allowing the guest to
>change the MAC of failover or standby dev.

I see no reason for such restriction.


>
>
>> 
>> Now I tried to add a primary pci device. I don't have any fancy VF on my
>> test setup, but I expected the good old 8139cp to work:
>> [root@test1 ~]# ethtool -i ens9
>> driver: 8139cp
>> ....
>> [root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1
>> 
>> I see no message in dmesg, so I guess the failover module did not
>> enslave this netdev. The mac change is not monitored. I would expect
>> that it is and whenever a device changes mac to the failover one, it
>> should be enslaved and whenever it changes mac back to something else,
>> it should be released - the primary one of course.
>
>Sure. That may be the best way to handle the guest changing the primary
>netdev's mac.

Yep.


>
>> 
>> 
>> 
>> [...]
>> 
>> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> > +				      size_t len)
>> > +{
>> > +	struct virtnet_info *vi = netdev_priv(dev);
>> > +	int ret;
>> > +
>> > +	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>> > +		return -EOPNOTSUPP;
>> > +
>> > +	ret = snprintf(buf, len, "_sby");
>> please avoid the "_".
>> 
>> [...]
>
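
The behaviour requested in this message (enslave a device when its MAC becomes the failover MAC, release it when the MAC changes away, and propagate a MAC change on the failover instance down to its slaves) could be modelled roughly as below. This is a userspace sketch of the expectations stated above, not the failover module; all struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

struct sketch_dev {
	unsigned char mac[SKETCH_ETH_ALEN];
	bool enslaved;
};

struct sketch_master {
	unsigned char mac[SKETCH_ETH_ALEN];
	struct sketch_dev *standby;	/* the virtio_net device */
	struct sketch_dev *primary;	/* NULL until a device is enslaved */
};

/* A candidate primary changed its MAC: enslave on match, release otherwise. */
void sketch_dev_mac_changed(struct sketch_master *m, struct sketch_dev *dev,
			    const unsigned char *new_mac)
{
	assert(m && dev && new_mac);
	memcpy(dev->mac, new_mac, SKETCH_ETH_ALEN);
	if (memcmp(dev->mac, m->mac, SKETCH_ETH_ALEN) == 0) {
		dev->enslaved = true;
		m->primary = dev;
	} else {
		dev->enslaved = false;
		if (m->primary == dev)
			m->primary = NULL;
	}
}

/* MAC change on the failover instance propagates down to both slaves. */
void sketch_master_set_mac(struct sketch_master *m, const unsigned char *new_mac)
{
	assert(m && new_mac);
	memcpy(m->mac, new_mac, SKETCH_ETH_ALEN);
	if (m->standby)
		memcpy(m->standby->mac, new_mac, SKETCH_ETH_ALEN);
	if (m->primary)
		memcpy(m->primary->mac, new_mac, SKETCH_ETH_ALEN);
}
```

Whether the guest should be allowed to change these addresses at all is exactly what is being debated in the thread; the sketch only shows what the permissive variant would have to keep consistent.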

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-30  7:12 UTC
  To: Samudrala, Sridhar
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <62f3b81b-70f6-6991-8e23-2bf650ecea2d@intel.com>

Mon, Apr 30, 2018 at 05:00:33AM CEST, sridhar.samudrala@intel.com wrote:
>
>On 4/28/2018 1:24 AM, Jiri Pirko wrote:
>> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > netdev is present with the same MAC address. It allows live migration
>> > of a VM with a direct attached VF without the need to setup a bond/team
>> > between a VF and virtio net device in the guest.
>> > 
>> > The hypervisor needs to enable only one datapath at any time so that
>> > packets don't get looped back to the VM over the other datapath. When a VF
>> Why? Both datapaths could be enabled at a time. Why the loop on
>> hypervisor side would be a problem. This is not an issue for
>> bonding/team as well.
>
>Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
>sent over the VF datapath don't get looped back to the VM via the virtio-net datapath.

Why? Please see below.


>This can happen if both datapaths are enabled at the same time.
>
>I would think this is an issue even with bonding/team as well when virtio-net and
>VF are backed by the same PF.
>
>

I believe that the scenario is the same as on an ordinary nic/switch
network:

...................

  host
 
       bond0
      /     \
    eth0   eth1
     |       |
...................
     |       |
     p1      p2

  switch

...................

It is perfectly valid for p1 and p2 to be up and "bridged" together. Bond
has to cope with loop-backed frames. "Failover driver" should too,
it's the same scenario.


>> 
>> 
>> > is plugged, the virtio datapath link state can be marked as down. The
>> > hypervisor needs to unplug the VF device from the guest on the source host
>> > and reset the MAC filter of the VF to initiate failover of datapath to
>> "reset the MAC filter of the VF" - you mean "set the VF mac"?
>
>Yes. The PF should take away the MAC address assigned to the VF so that the PF
>starts receiving those packets.

Okay, got it. Please put this in the description.


>
>> 
>> 
>> > virtio before starting the migration. After the migration is completed,
>> > the destination hypervisor sets the MAC filter on the VF and plugs it back
>> > to the guest to switch over to VF datapath.
>> > 
>> > It uses the generic failover framework that provides 2 functions to create
>> > and destroy a master failover netdev. When STANDBY feature is enabled, an
>> > additional netdev(failover netdev) is created that acts as a master device
>> > and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> > is marked as 'standby' netdev and a passthru device with the same MAC is
>> > registered as 'primary' netdev.
>> > 
>> > This patch is based on the discussion initiated by Jesse on this thread.
>> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> [...]
>> 
>

* Re: [PATCH net-next v9 1/4] virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
From: Jiri Pirko @ 2018-04-30  7:03 UTC
  To: Samudrala, Sridhar
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <9826909e-d6cd-a0a3-142f-6f7f8cf2b5ce@intel.com>

Mon, Apr 30, 2018 at 04:47:03AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/28/2018 12:50 AM, Jiri Pirko wrote:
>> Fri, Apr 27, 2018 at 07:06:57PM CEST,sridhar.samudrala@intel.com  wrote:
>> > This feature bit can be used by hypervisor to indicate virtio_net device to
>> > act as a standby for another device with the same MAC address.
>> > 
>> > VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.
>> > 
>> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > ---
>> > drivers/net/virtio_net.c        | 2 +-
>> > include/uapi/linux/virtio_net.h | 3 +++
>> > 2 files changed, 4 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index 3b5991734118..51a085b1a242 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -2999,7 +2999,7 @@ static struct virtio_device_id id_table[] = {
>> > 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>> > 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>> > 	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>> > -	VIRTIO_NET_F_SPEED_DUPLEX
>> > +	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
>> This is not part of current qemu master (head 6f0c4706b35dead265509115ddbd2a8d1af516c1)
>> Where can I find the qemu code?
>> 
>> Also, I think it makes sense to push HW (qemu HW in this case) first
>> and only then the driver.
>
>I had sent qemu patch with a couple of earlier versions of this patchset.
>Will include it when i send out v10.

The point was, don't you want to push it to qemu first? Did you at least
send RFC to qemu?

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-04-30  4:16 UTC
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180428094205.GM5632@nanopsycho.orion>


On 4/28/2018 2:42 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:59PM CEST,sridhar.samudrala@intel.com  wrote:
>> This patch enables virtio_net to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration
>> of a VM with a direct attached VF without the need to setup a bond/team
>> between a VF and virtio net device in the guest.
>>
>> The hypervisor needs to enable only one datapath at any time so that
>> packets don't get looped back to the VM over the other datapath. When a VF
>> is plugged, the virtio datapath link state can be marked as down. The
>> hypervisor needs to unplug the VF device from the guest on the source host
>> and reset the MAC filter of the VF to initiate failover of datapath to
>> virtio before starting the migration. After the migration is completed,
>> the destination hypervisor sets the MAC filter on the VF and plugs it back
>> to the guest to switch over to VF datapath.
>>
>> It uses the generic failover framework that provides 2 functions to create
>> and destroy a master failover netdev. When STANDBY feature is enabled, an
>> additional netdev(failover netdev) is created that acts as a master device
>> and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> is marked as 'standby' netdev and a passthru device with the same MAC is
>> registered as 'primary' netdev.
>>
>> This patch is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>
> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>      link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>      inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>         valid_lft forever preferred_lft forever
> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>      link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>      inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>         valid_lft forever preferred_lft forever
>
> However, it seems to confuse my initscripts on Fedora:
> [root@test1 ~]# ifup ens3
> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>
> Determining IP information for ens3
> ens3n_sby...Cannot find device "ens3n_sby.pid"
> Cannot find device "ens3n_sby.lease"
>   failed.
>
> I tried to change the standby device mac:
> ip link set ens3n_sby addr 52:54:00:b2:a7:f2
> [root@test1 ~]# ifup ens3
>
> Determining IP information for ens3... done.
> [root@test1 ~]#
>
> But now the network does not work. I think that the mac change on
> standby device should probably be refused, no?

Yes. We should block changing the standby device mac.

> When I change the mac back, all works fine.

This is strange. So you had to change the standby device mac twice to
get dhcp working.

I do see NetworkManager trying to get a dhcp address on the standby device, but
I don't see any issue with connectivity. To be totally transparent, we
need to expose only one netdev.

>
> Now I try to change mac of the failover master:
> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
> RTNETLINK answers: Operation not supported
>
> That I did expect to work. I would expect this would change the mac of
> the master and both standby and primary slaves.

If a VF is untrusted, a VM will not be able to change its MAC. Moreover,
in this mode we assume that the hypervisor has assigned the MAC and the
guest is not expected to change it.

For the initial implementation, I would propose not allowing the guest to
change the MAC of the failover or standby dev.
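
A minimal userspace sketch of that proposal, not the driver code itself.
`struct sketch_netdev`, its `is_failover`/`is_standby` flags and
`sketch_set_mac()` are hypothetical names used only for illustration; in
the kernel this check would presumably live in the device's set-MAC
handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical stand-in for a net_device; not the kernel structure. */
struct sketch_netdev {
	unsigned char addr[SKETCH_ETH_ALEN];
	bool is_failover;	/* failover master netdev */
	bool is_standby;	/* virtio_net standby slave */
};

/* Refuse MAC changes on failover and standby devices, allow them elsewhere. */
int sketch_set_mac(struct sketch_netdev *dev, const unsigned char *mac)
{
	if (dev->is_failover || dev->is_standby)
		return -EOPNOTSUPP;

	memcpy(dev->addr, mac, SKETCH_ETH_ALEN);
	return 0;
}
```

With this shape, "ip link set ens3n_sby addr ..." would fail with the same
"Operation not supported" that the failover master already returns, and
the address would be left untouched.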


>
> Now I tried to add a primary pci device. I don't have any fancy VF on my
> test setup, but I expected the good old 8139cp to work:
> [root@test1 ~]# ethtool -i ens9
> driver: 8139cp
> ....
> [root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1
>
> I see no message in dmesg, so I guess the failover module did not
> enslave this netdev. The mac change is not monitored. I would expect
> that it is and whenever a device changes mac to the failover one, it
> should be enslaved and whenever it changes mac back to something else,
> it should be released - the primary one ofcourse.

Sure, that may be the best way to handle the guest changing the primary
netdev's MAC.

>
>
>
> [...]
>
>> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> +				      size_t len)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(dev);
>> +	int ret;
>> +
>> +	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = snprintf(buf, len, "_sby");
> please avoid the "_".
>
> [...]


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-04-30  3:03 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180428090601.GL5632@nanopsycho.orion>

On 4/28/2018 2:06 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>> This provides a generic interface for paravirtual drivers to listen
>> for netdev register/unregister/link change events from pci ethernet
>> devices with the same MAC and takeover their datapath. The notifier and
>> event handling code is based on the existing netvsc implementation.
>>
>> It exposes 2 sets of interfaces to the paravirtual drivers.
>> 1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>>    the failover module provides interfaces to create/destroy additional
>>    master netdev and all the slave events are managed internally.
>>         net_failover_create()
>>         net_failover_destroy()
>>    A failover netdev is created that acts a master device and controls 2
>>    slave devices. The original virtio_net netdev is registered as 'standby'
>>    netdev and a passthru/vf device with the same MAC gets registered as
>>    'primary' netdev. Both 'standby' and 'primary' netdevs are associated
>>    with the same 'pci' device.  The user accesses the network interface via
>>    'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>>    default for transmits when it is available with link up and running.
>> 2. For existing netvsc driver that uses 2 netdev model, no master netdev
>>    is created. The paravirtual driver registers each instance of netvsc
>>    as a 'failover' netdev  along with a set of ops to manage the slave
>>    events. There is no 'standby' netdev in this model. A passthru/vf device
>>    with the same MAC gets registered as 'primary' netdev.
>>         net_failover_register()
>>         net_failover_unregister()
>>
> First of all, I like this v9 very much. Nice progress!
> Couple of notes inlined.

Thanks for the detailed reviews and all your suggestions for improvements.
I agree with all your comments and will address them in v10.

^ permalink raw reply

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-04-30  3:00 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180428082433.GK5632@nanopsycho.orion>


On 4/28/2018 1:24 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> This patch enables virtio_net to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration
>> of a VM with a direct attached VF without the need to setup a bond/team
>> between a VF and virtio net device in the guest.
>>
>> The hypervisor needs to enable only one datapath at any time so that
>> packets don't get looped back to the VM over the other datapath. When a VF
> Why? Both datapaths could be enabled at a time. Why the loop on
> hypervisor side would be a problem. This in not an issue for
> bonding/team as well.

Somehow the hypervisor needs to make sure that broadcasts/multicasts from the VM
sent over the VF datapath don't get looped back to the VM via the virtio-net datapath.
This can happen if both datapaths are enabled at the same time.

I would think this is an issue with bonding/team as well when virtio-net and
the VF are backed by the same PF.


>
>
>> is plugged, the virtio datapath link state can be marked as down. The
>> hypervisor needs to unplug the VF device from the guest on the source host
>> and reset the MAC filter of the VF to initiate failover of datapath to
> "reset the MAC filter of the VF" - you mean "set the VF mac"?

Yes. The PF should take away the MAC address assigned to the VF so that the PF
starts receiving those packets.

>
>
>> virtio before starting the migration. After the migration is completed,
>> the destination hypervisor sets the MAC filter on the VF and plugs it back
>> to the guest to switch over to VF datapath.
>>
>> It uses the generic failover framework that provides 2 functions to create
>> and destroy a master failover netdev. When STANDBY feature is enabled, an
>> additional netdev(failover netdev) is created that acts as a master device
>> and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> is marked as 'standby' netdev and a passthru device with the same MAC is
>> registered as 'primary' netdev.
>>
>> This patch is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
> [...]
>

^ permalink raw reply

* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-04-30  2:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180428081542.GJ5632@nanopsycho.orion>


On 4/28/2018 1:15 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:58PM CEST,sridhar.samudrala@intel.com  wrote:
>> This provides a generic interface for paravirtual drivers to listen
>> for netdev register/unregister/link change events from pci ethernet
>> devices with the same MAC and takeover their datapath. The notifier and
>> event handling code is based on the existing netvsc implementation.
>>
>> It exposes 2 sets of interfaces to the paravirtual drivers.
>> 1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>>    the failover module provides interfaces to create/destroy additional
>>    master netdev and all the slave events are managed internally.
>>         net_failover_create()
>>         net_failover_destroy()
>>    A failover netdev is created that acts a master device and controls 2
>>    slave devices. The original virtio_net netdev is registered as 'standby'
>>    netdev and a passthru/vf device with the same MAC gets registered as
>>    'primary' netdev. Both 'standby' and 'primary' netdevs are associated
>>    with the same 'pci' device.  The user accesses the network interface via
> 'standby' and 'primary' netdevs are not associated with the same 'pci'
> device.
> "Primary" is the VF netdevice and "standby" is virtio_net. Each
> associated with different pci device.

I meant to say that the 'standby' and 'failover' netdevs are associated with
the same 'pci' device. Will fix it in v10.


>
>>    'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>>    default for transmits when it is available with link up and running.
>> 2. For existing netvsc driver that uses 2 netdev model, no master netdev
>>    is created. The paravirtual driver registers each instance of netvsc
>>    as a 'failover' netdev  along with a set of ops to manage the slave
>>    events. There is no 'standby' netdev in this model. A passthru/vf device
>>    with the same MAC gets registered as 'primary' netdev.
>>         net_failover_register()
>>         net_failover_unregister()
> [...]


^ permalink raw reply

* Re: [PATCH net-next v9 1/4] virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
From: Samudrala, Sridhar @ 2018-04-30  2:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180428075027.GI5632@nanopsycho.orion>


On 4/28/2018 12:50 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:57PM CEST,sridhar.samudrala@intel.com  wrote:
>> This feature bit can be used by hypervisor to indicate virtio_net device to
>> act as a standby for another device with the same MAC address.
>>
>> VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.
>>
>> Signed-off-by: Sridhar Samudrala<sridhar.samudrala@intel.com>
>> ---
>> drivers/net/virtio_net.c        | 2 +-
>> include/uapi/linux/virtio_net.h | 3 +++
>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 3b5991734118..51a085b1a242 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -2999,7 +2999,7 @@ static struct virtio_device_id id_table[] = {
>> 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>> 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>> 	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>> -	VIRTIO_NET_F_SPEED_DUPLEX
>> +	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
> This is not part of current qemu master (head 6f0c4706b35dead265509115ddbd2a8d1af516c1)
> Were I can find the qemu code?
>
> Also, I think it makes sense to push HW (qemu HW in this case) first
> and only then the driver.

I had sent the qemu patch with a couple of earlier versions of this patchset.
I will include it when I send out v10.



^ permalink raw reply

* Re: [PATCH] vhost: make msg padding explicit
From: David Miller @ 2018-04-30  1:34 UTC (permalink / raw)
  To: mst; +Cc: kevin, kvm, netdev, linux-kernel, virtualization
In-Reply-To: <1524844881-178524-1-git-send-email-mst@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 27 Apr 2018 19:02:05 +0300

> There's a 32 bit hole just after type. It's best to
> give it a name, this way compiler is forced to initialize
> it with rest of the structure.
> 
> Reported-by: Kevin Easton <kevin@guarana.org>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Who applied this, me? :-)
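
For readers following along, a small sketch of what "giving the hole a
name" means. The two structs below are hypothetical and are not the vhost
UAPI layout; they only show a 32-bit field followed by a member that is
8-byte aligned on common 64-bit ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Implicit hole: on common 64-bit ABIs the compiler inserts 4 unnamed
 * bytes after 'type' so that 'payload' is 8-byte aligned. Those bytes
 * are not guaranteed to be written by a member-wise initialization.
 */
struct sketch_msg_implicit {
	int32_t type;
	uint64_t payload;
};

/* Explicit padding: the same bytes are now a named member, so a
 * designated initializer zeroes them together with every other member
 * that is not mentioned.
 */
struct sketch_msg_explicit {
	int32_t type;
	uint32_t padding;
	uint64_t payload;
};
```

Initializing with `struct sketch_msg_explicit m = { .type = 1 };` leaves
`m.padding` at zero, which is what matters when the structure is later
copied to userspace.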

^ permalink raw reply

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-29 13:45 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Michael S. Tsirkin, Jakub Kicinski,
	Sridhar Samudrala, virtualization, Netdev, aaron.f.brown,
	David Miller
In-Reply-To: <CADGSJ23A4w=MmnP7CneBwk=eouS07HfK4=1UWTk3Dz3gBX0yjg@mail.gmail.com>

Sun, Apr 29, 2018 at 10:56:30AM CEST, loseweigh@gmail.com wrote:
>On Sat, Apr 28, 2018 at 2:42 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>>>This patch enables virtio_net to switch over to a VF datapath when a VF
>>>netdev is present with the same MAC address. It allows live migration
>>>of a VM with a direct attached VF without the need to setup a bond/team
>>>between a VF and virtio net device in the guest.
>>>
>>>The hypervisor needs to enable only one datapath at any time so that
>>>packets don't get looped back to the VM over the other datapath. When a VF
>>>is plugged, the virtio datapath link state can be marked as down. The
>>>hypervisor needs to unplug the VF device from the guest on the source host
>>>and reset the MAC filter of the VF to initiate failover of datapath to
>>>virtio before starting the migration. After the migration is completed,
>>>the destination hypervisor sets the MAC filter on the VF and plugs it back
>>>to the guest to switch over to VF datapath.
>>>
>>>It uses the generic failover framework that provides 2 functions to create
>>>and destroy a master failover netdev. When STANDBY feature is enabled, an
>>>additional netdev(failover netdev) is created that acts as a master device
>>>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>>>is marked as 'standby' netdev and a passthru device with the same MAC is
>>>registered as 'primary' netdev.
>>>
>>>This patch is based on the discussion initiated by Jesse on this thread.
>>>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>
>>
>> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
>> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>        valid_lft forever preferred_lft forever
>> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> However, it seems to confuse my initscripts on Fedora:
>> [root@test1 ~]# ifup ens3
>> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
>> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
>> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>>
>You should teach Fedora and all cloud vendors to upgrade their
>initscripts and other userspace tools, no?

I just wanted to point out that the conversion from "nostandby" to
"standby" isn't always as smooth as claimed. The claim was "no change
for the current user" iirc.

^ permalink raw reply

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Siwei Liu @ 2018-04-29  8:56 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Alexander Duyck, virtio-dev, Michael S. Tsirkin, Jakub Kicinski,
	Sridhar Samudrala, virtualization, Netdev, aaron.f.brown,
	David Miller
In-Reply-To: <20180428094205.GM5632@nanopsycho.orion>

On Sat, Apr 28, 2018 at 2:42 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>>This patch enables virtio_net to switch over to a VF datapath when a VF
>>netdev is present with the same MAC address. It allows live migration
>>of a VM with a direct attached VF without the need to setup a bond/team
>>between a VF and virtio net device in the guest.
>>
>>The hypervisor needs to enable only one datapath at any time so that
>>packets don't get looped back to the VM over the other datapath. When a VF
>>is plugged, the virtio datapath link state can be marked as down. The
>>hypervisor needs to unplug the VF device from the guest on the source host
>>and reset the MAC filter of the VF to initiate failover of datapath to
>>virtio before starting the migration. After the migration is completed,
>>the destination hypervisor sets the MAC filter on the VF and plugs it back
>>to the guest to switch over to VF datapath.
>>
>>It uses the generic failover framework that provides 2 functions to create
>>and destroy a master failover netdev. When STANDBY feature is enabled, an
>>additional netdev(failover netdev) is created that acts as a master device
>>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>>is marked as 'standby' netdev and a passthru device with the same MAC is
>>registered as 'primary' netdev.
>>
>>This patch is based on the discussion initiated by Jesse on this thread.
>>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>
>
> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>        valid_lft forever preferred_lft forever
> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>        valid_lft forever preferred_lft forever
>
> However, it seems to confuse my initscripts on Fedora:
> [root@test1 ~]# ifup ens3
> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>
You should teach Fedora and all cloud vendors to upgrade their
initscripts and other userspace tools, no?

> Determining IP information for ens3
> ens3n_sby...Cannot find device "ens3n_sby.pid"
> Cannot find device "ens3n_sby.lease"
>  failed.
>
> I tried to change the standby device mac:
> ip link set ens3n_sby addr 52:54:00:b2:a7:f2
> [root@test1 ~]# ifup ens3
>
> Determining IP information for ens3... done.
> [root@test1 ~]#
>
> But now the network does not work. I think that the mac change on
> standby device should be probably refused, no?
>
> When I change the mac back, all works fine.
>
>
> Now I try to change mac of the failover master:
> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
> RTNETLINK answers: Operation not supported
>
> That I did expect to work. I would expect this would change the mac of
> the master and both standby and primary slaves.
>
>
> Now I tried to add a primary pci device. I don't have any fancy VF on my
> test setup, but I expected the good old 8139cp to work:
> [root@test1 ~]# ethtool -i ens9
> driver: 8139cp
> ....
> [root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1
>
> I see no message in dmesg, so I guess the failover module did not
> enslave this netdev. The mac change is not monitored. I would expect
> that it is and whenever a device changes mac to the failover one, it
> should be enslaved and whenever it changes mac back to something else,
> it should be released - the primary one ofcourse.
>
>
>
> [...]
>
>>+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>>+                                    size_t len)
>>+{
>>+      struct virtnet_info *vi = netdev_priv(dev);
>>+      int ret;
>>+
>>+      if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>>+              return -EOPNOTSUPP;
>>+
>>+      ret = snprintf(buf, len, "_sby");
>
> please avoid the "_".
>
> [...]

^ permalink raw reply

* Re: [PATCH net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Dmitry Vyukov via Virtualization @ 2018-04-29  8:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Easton, KVM list, netdev, syzkaller-bugs, LKML,
	virtualization
In-Reply-To: <20180427223636-mutt-send-email-mst@kernel.org>

On Fri, Apr 27, 2018 at 9:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> The struct vhost_msg within struct vhost_msg_node is copied to userspace,
>> >> so it should be allocated with kzalloc() to ensure all structure padding
>> >> is zeroed.
>> >>
>> >> Signed-off-by: Kevin Easton <kevin@guarana.org>
>> >> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
>> >
>> > Does it help if a patch naming the padding is applied,
>> > and then we init just the relevant field?
>> > Just curious.
>>
>> Yes, it would help.
>
> How about a Tested-by tag then?

I didn't test either patch.

>> >> ---
>> >>  drivers/vhost/vhost.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> >> index f3bd8e9..1b84dcff 100644
>> >> --- a/drivers/vhost/vhost.c
>> >> +++ b/drivers/vhost/vhost.c
>> >> @@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
>> >>  /* Create a new message. */
>> >>  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>> >>  {
>> >> -     struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
>> >> +     struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
>> >>       if (!node)
>> >>               return NULL;
>> >>       node->vq = vq;
>> >> --
>> >> 2.8.1
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180427185501-mutt-send-email-mst%40kernel.org.
>> > For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-28  9:42 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <1524848820-42258-4-git-send-email-sridhar.samudrala@intel.com>

Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>This patch enables virtio_net to switch over to a VF datapath when a VF
>netdev is present with the same MAC address. It allows live migration
>of a VM with a direct attached VF without the need to setup a bond/team
>between a VF and virtio net device in the guest.
>
>The hypervisor needs to enable only one datapath at any time so that
>packets don't get looped back to the VM over the other datapath. When a VF
>is plugged, the virtio datapath link state can be marked as down. The
>hypervisor needs to unplug the VF device from the guest on the source host
>and reset the MAC filter of the VF to initiate failover of datapath to
>virtio before starting the migration. After the migration is completed,
>the destination hypervisor sets the MAC filter on the VF and plugs it back
>to the guest to switch over to VF datapath.
>
>It uses the generic failover framework that provides 2 functions to create
>and destroy a master failover netdev. When STANDBY feature is enabled, an
>additional netdev(failover netdev) is created that acts as a master device
>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>is marked as 'standby' netdev and a passthru device with the same MAC is
>registered as 'primary' netdev.
>
>This patch is based on the discussion initiated by Jesse on this thread.
>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>

When I enabled the standby feature (hardcoded), I have 2 netdevices now:
4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feb2:a7f1/64 scope link 
       valid_lft forever preferred_lft forever
5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feb2:a7f1/64 scope link 
       valid_lft forever preferred_lft forever

However, it seems to confuse my initscripts on Fedora:
[root@test1 ~]# ifup ens3
./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected

Determining IP information for ens3
ens3n_sby...Cannot find device "ens3n_sby.pid"
Cannot find device "ens3n_sby.lease"
 failed.

I tried to change the standby device mac:
ip link set ens3n_sby addr 52:54:00:b2:a7:f2
[root@test1 ~]# ifup ens3

Determining IP information for ens3... done.
[root@test1 ~]#

But now the network does not work. I think that the MAC change on the
standby device should probably be refused, no?

When I change the mac back, all works fine.


Next I tried to change the MAC of the failover master:
[root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
RTNETLINK answers: Operation not supported

I did expect that to work. I would expect it to change the MAC of
the master and of both the standby and primary slaves.


Now I tried to add a primary pci device. I don't have any fancy VF on my
test setup, but I expected the good old 8139cp to work:
[root@test1 ~]# ethtool -i ens9
driver: 8139cp
....
[root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1

I see no message in dmesg, so I guess the failover module did not
enslave this netdev. The MAC change is not monitored. I would expect
it to be: whenever a device changes its MAC to the failover one, it
should be enslaved, and whenever it changes its MAC back to something
else, it should be released (the primary one, of course).
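
To make the expectation concrete, here is a rough userspace sketch of the
behaviour I have in mind. All names (`sketch_failover`, `sketch_slave`,
`sketch_on_mac_change()`) are made up for illustration; in the kernel the
hook would presumably be an address-change netdev notifier event, and the
enslave/release steps would be the module's existing slave
register/unregister paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical failover master: only its MAC matters here. */
struct sketch_failover {
	unsigned char mac[SKETCH_ETH_ALEN];
};

/* Hypothetical candidate primary device. */
struct sketch_slave {
	unsigned char mac[SKETCH_ETH_ALEN];
	bool enslaved;
};

/* Call after the candidate's MAC has changed. Enslaves the device when
 * its MAC now matches the failover MAC and releases it when it no
 * longer matches. Returns true if the enslaved state changed.
 */
bool sketch_on_mac_change(const struct sketch_failover *fo,
			  struct sketch_slave *dev)
{
	bool match = memcmp(fo->mac, dev->mac, SKETCH_ETH_ALEN) == 0;

	if (match == dev->enslaved)
		return false;

	dev->enslaved = match;
	return true;
}
```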



[...]

>+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>+				      size_t len)
>+{
>+	struct virtnet_info *vi = netdev_priv(dev);
>+	int ret;
>+
>+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>+		return -EOPNOTSUPP;
>+
>+	ret = snprintf(buf, len, "_sby");

please avoid the "_".
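
Something along these lines, shown as a standalone userspace sketch rather
than the driver function. `sketch_get_phys_port_name()` and the
`has_standby` flag (standing in for the VIRTIO_NET_F_STANDBY feature
check) are illustrative names, and the truncation check is my assumption
about how the elided part of the function continues.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes the port name suffix without a leading underscore. */
int sketch_get_phys_port_name(bool has_standby, char *buf, size_t len)
{
	int ret;

	if (!has_standby)
		return -EOPNOTSUPP;

	ret = snprintf(buf, len, "sby");
	if (ret < 0 || (size_t)ret >= len)
		return -EOPNOTSUPP;	/* name did not fit in buf */

	return 0;
}
```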

[...]

^ permalink raw reply

* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Jiri Pirko @ 2018-04-28  9:06 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <1524848820-42258-3-git-send-email-sridhar.samudrala@intel.com>

Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>This provides a generic interface for paravirtual drivers to listen
>for netdev register/unregister/link change events from pci ethernet
>devices with the same MAC and takeover their datapath. The notifier and
>event handling code is based on the existing netvsc implementation.
>
>It exposes 2 sets of interfaces to the paravirtual drivers.
>1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>   the failover module provides interfaces to create/destroy additional
>   master netdev and all the slave events are managed internally.
>        net_failover_create()
>        net_failover_destroy()
>   A failover netdev is created that acts a master device and controls 2
>   slave devices. The original virtio_net netdev is registered as 'standby'
>   netdev and a passthru/vf device with the same MAC gets registered as
>   'primary' netdev. Both 'standby' and 'primary' netdevs are associated
>   with the same 'pci' device.  The user accesses the network interface via
>   'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>   default for transmits when it is available with link up and running.
>2. For existing netvsc driver that uses 2 netdev model, no master netdev
>   is created. The paravirtual driver registers each instance of netvsc
>   as a 'failover' netdev  along with a set of ops to manage the slave
>   events. There is no 'standby' netdev in this model. A passthru/vf device
>   with the same MAC gets registered as 'primary' netdev.
>        net_failover_register()
>        net_failover_unregister()
>

First of all, I like this v9 very much. Nice progress!
Couple of notes inlined.


>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>---
> include/linux/netdevice.h  |  16 +
> include/net/net_failover.h |  62 ++++
> net/Kconfig                |  10 +
> net/core/Makefile          |   1 +
> net/core/net_failover.c    | 892 +++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 981 insertions(+)
> create mode 100644 include/net/net_failover.h
> create mode 100644 net/core/net_failover.c

[...]


>+static int net_failover_slave_register(struct net_device *slave_dev)
>+{
>+	struct net_failover_info *nfo_info;
>+	struct net_failover_ops *nfo_ops;
>+	struct net_device *failover_dev;
>+	bool slave_is_standby;
>+	u32 orig_mtu;
>+	int err;
>+
>+	ASSERT_RTNL();
>+
>+	failover_dev = net_failover_get_bymac(slave_dev->perm_addr, &nfo_ops);
>+	if (!failover_dev)
>+		goto done;
>+
>+	if (failover_dev->type != slave_dev->type)
>+		goto done;
>+
>+	if (nfo_ops && nfo_ops->slave_register)
>+		return nfo_ops->slave_register(slave_dev, failover_dev);
>+
>+	nfo_info = netdev_priv(failover_dev);
>+	slave_is_standby = (slave_dev->dev.parent == failover_dev->dev.parent);

No parentheses needed.


>+	if (slave_is_standby ? rtnl_dereference(nfo_info->standby_dev) :
>+			rtnl_dereference(nfo_info->primary_dev)) {
>+		netdev_err(failover_dev, "%s attempting to register as slave dev when %s already present\n",
>+			   slave_dev->name,
>+			   slave_is_standby ? "standby" : "primary");
>+		goto done;
>+	}
>+
>+	/* We want to allow only a direct attached VF device as a primary
>+	 * netdev. As there is no easy way to check for a VF device, restrict
>+	 * this to a pci device.
>+	 */
>+	if (!slave_is_standby && (!slave_dev->dev.parent ||
>+				  !dev_is_pci(slave_dev->dev.parent)))

Yeah, this is good for now.


>+		goto done;
>+
>+	if (failover_dev->features & NETIF_F_VLAN_CHALLENGED &&
>+	    vlan_uses_dev(failover_dev)) {
>+		netdev_err(failover_dev, "Device %s is VLAN challenged and failover device has VLAN set up\n",
>+			   failover_dev->name);
>+		goto done;
>+	}
>+
>+	/* Align MTU of slave with failover dev */
>+	orig_mtu = slave_dev->mtu;
>+	err = dev_set_mtu(slave_dev, failover_dev->mtu);
>+	if (err) {
>+		netdev_err(failover_dev, "unable to change mtu of %s to %u register failed\n",
>+			   slave_dev->name, failover_dev->mtu);
>+		goto done;
>+	}
>+
>+	dev_hold(slave_dev);
>+
>+	if (netif_running(failover_dev)) {
>+		err = dev_open(slave_dev);
>+		if (err && (err != -EBUSY)) {
>+			netdev_err(failover_dev, "Opening slave %s failed err:%d\n",
>+				   slave_dev->name, err);
>+			goto err_dev_open;
>+		}
>+	}
>+
>+	netif_addr_lock_bh(failover_dev);
>+	dev_uc_sync_multiple(slave_dev, failover_dev);
>+	dev_mc_sync_multiple(slave_dev, failover_dev);
>+	netif_addr_unlock_bh(failover_dev);
>+
>+	err = vlan_vids_add_by_dev(slave_dev, failover_dev);
>+	if (err) {
>+		netdev_err(failover_dev, "Failed to add vlan ids to device %s err:%d\n",
>+			   slave_dev->name, err);
>+		goto err_vlan_add;
>+	}
>+
>+	err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
>+					 failover_dev);
>+	if (err) {
>+		netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
>+			   err);
>+		goto err_handler_register;
>+	}
>+
>+	err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);

Please use netdev_master_upper_dev_link().



>+	if (err) {
>+		netdev_err(slave_dev, "can not set failover device %s (err = %d)\n",
>+			   failover_dev->name, err);
>+		goto err_upper_link;
>+	}
>+
>+	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
>+
>+	if (slave_is_standby) {
>+		rcu_assign_pointer(nfo_info->standby_dev, slave_dev);
>+		dev_get_stats(nfo_info->standby_dev, &nfo_info->standby_stats);
>+	} else {
>+		rcu_assign_pointer(nfo_info->primary_dev, slave_dev);
>+		dev_get_stats(nfo_info->primary_dev, &nfo_info->primary_stats);
>+		failover_dev->min_mtu = slave_dev->min_mtu;
>+		failover_dev->max_mtu = slave_dev->max_mtu;
>+	}
>+
>+	net_failover_compute_features(failover_dev);
>+
>+	call_netdevice_notifiers(NETDEV_JOIN, slave_dev);
>+
>+	netdev_info(failover_dev, "failover %s slave:%s registered\n",
>+		    slave_is_standby ? "standby" : "primary", slave_dev->name);

I wonder if noise like this is needed in dmesg...


>+
>+	goto done;
>+
>+err_upper_link:
>+	netdev_rx_handler_unregister(slave_dev);
>+err_handler_register:
>+	vlan_vids_del_by_dev(slave_dev, failover_dev);
>+err_vlan_add:
>+	dev_uc_unsync(slave_dev, failover_dev);
>+	dev_mc_unsync(slave_dev, failover_dev);
>+	dev_close(slave_dev);
>+err_dev_open:
>+	dev_put(slave_dev);
>+	dev_set_mtu(slave_dev, orig_mtu);
>+done:
>+	return NOTIFY_DONE;
>+}
>+
>+int net_failover_slave_unregister(struct net_device *slave_dev)
>+{
>+	struct net_device *standby_dev, *primary_dev;
>+	struct net_failover_info *nfo_info;
>+	struct net_failover_ops *nfo_ops;
>+	struct net_device *failover_dev;
>+	bool slave_is_standby;
>+
>+	if (!netif_is_failover_slave(slave_dev))
>+		goto done;
>+
>+	ASSERT_RTNL();
>+
>+	failover_dev = net_failover_get_bymac(slave_dev->perm_addr, &nfo_ops);
>+	if (!failover_dev)
>+		goto done;
>+
>+	if (nfo_ops && nfo_ops->slave_unregister)
>+		return nfo_ops->slave_unregister(slave_dev, failover_dev);
>+
>+	nfo_info = netdev_priv(failover_dev);
>+	primary_dev = rtnl_dereference(nfo_info->primary_dev);
>+	standby_dev = rtnl_dereference(nfo_info->standby_dev);
>+
>+	if (slave_dev != primary_dev && slave_dev != standby_dev)
>+		goto done;
>+
>+	slave_is_standby = (slave_dev->dev.parent == failover_dev->dev.parent);
>+
>+	netdev_rx_handler_unregister(slave_dev);
>+	netdev_upper_dev_unlink(slave_dev, failover_dev);
>+	vlan_vids_del_by_dev(slave_dev, failover_dev);
>+	dev_uc_unsync(slave_dev, failover_dev);
>+	dev_mc_unsync(slave_dev, failover_dev);
>+	dev_close(slave_dev);
>+	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>+
>+	nfo_info = netdev_priv(failover_dev);
>+	net_failover_get_stats(failover_dev, &nfo_info->failover_stats);
>+
>+	if (slave_is_standby) {
>+		RCU_INIT_POINTER(nfo_info->standby_dev, NULL);
>+	} else {
>+		RCU_INIT_POINTER(nfo_info->primary_dev, NULL);
>+		if (standby_dev) {
>+			failover_dev->min_mtu = standby_dev->min_mtu;
>+			failover_dev->max_mtu = standby_dev->max_mtu;
>+		}
>+	}
>+
>+	dev_put(slave_dev);
>+
>+	net_failover_compute_features(failover_dev);
>+
>+	netdev_info(failover_dev, "failover %s slave:%s unregistered\n",
>+		    slave_is_standby ? "standby" : "primary", slave_dev->name);
>+
>+done:
>+	return NOTIFY_DONE;
>+}
>+EXPORT_SYMBOL_GPL(net_failover_slave_unregister);
>+
>+static int net_failover_slave_link_change(struct net_device *slave_dev)
>+{
>+	struct net_device *failover_dev, *primary_dev, *standby_dev;
>+	struct net_failover_info *nfo_info;
>+	struct net_failover_ops *nfo_ops;
>+
>+	if (!netif_is_failover_slave(slave_dev))
>+		goto done;
>+
>+	ASSERT_RTNL();
>+
>+	failover_dev = net_failover_get_bymac(slave_dev->perm_addr, &nfo_ops);
>+	if (!failover_dev)
>+		goto done;
>+
>+	if (nfo_ops && nfo_ops->slave_link_change)
>+		return nfo_ops->slave_link_change(slave_dev, failover_dev);
>+
>+	if (!netif_running(failover_dev))
>+		return 0;
>+
>+	nfo_info = netdev_priv(failover_dev);
>+
>+	primary_dev = rtnl_dereference(nfo_info->primary_dev);
>+	standby_dev = rtnl_dereference(nfo_info->standby_dev);
>+
>+	if (slave_dev != primary_dev && slave_dev != standby_dev)
>+		goto done;
>+
>+	if ((primary_dev && net_failover_xmit_ready(primary_dev)) ||
>+	    (standby_dev && net_failover_xmit_ready(standby_dev))) {
>+		netif_carrier_on(failover_dev);
>+		netif_tx_wake_all_queues(failover_dev);
>+	} else {
>+		net_failover_get_stats(failover_dev, &nfo_info->failover_stats);
>+		netif_carrier_off(failover_dev);
>+		netif_tx_stop_all_queues(failover_dev);
>+	}
>+
>+done:
>+	return NOTIFY_DONE;
>+}
>+
>+static int
>+net_failover_event(struct notifier_block *this, unsigned long event, void *ptr)
>+{
>+	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
>+
>+	/* Skip parent events */
>+	if (netif_is_failover(event_dev))
>+		return NOTIFY_DONE;
>+
>+	switch (event) {
>+	case NETDEV_REGISTER:
>+		return net_failover_slave_register(event_dev);
>+	case NETDEV_UNREGISTER:
>+		return net_failover_slave_unregister(event_dev);
>+	case NETDEV_UP:
>+	case NETDEV_DOWN:
>+	case NETDEV_CHANGE:
>+		return net_failover_slave_link_change(event_dev);
>+	default:
>+		return NOTIFY_DONE;
>+	}
>+}
>+
>+static struct notifier_block net_failover_notifier = {
>+	.notifier_call = net_failover_event,
>+};
>+
>+static void nfo_register_existing_slave(struct net_device *failover_dev)

Please keep the same function prefix throughout the code.

Also, to be consistent with the rest of the code, have "_register" as a
suffix.


>+{
>+	struct net *net = dev_net(failover_dev);
>+	struct net_device *dev;
>+
>+	rtnl_lock();
>+	for_each_netdev(net, dev) {
>+		if (netif_is_failover(dev))
>+			continue;
>+		if (ether_addr_equal(failover_dev->perm_addr, dev->perm_addr))
>+			net_failover_slave_register(dev);
>+	}
>+	rtnl_unlock();
>+}
>+


For every exported function, please provide documentation in this format:

/**
 *	net_failover_register - Register net failover device
 *
 *	@dev: netdevice the failover is registered for
 *	@ops: failover ops
 *
 *	Describe what the function does, what the expected inputs and
 *	outputs are, etc. Don't hesitate to be verbose. Mention the 2/3 netdev
 *	model here. Then you don't need the comment in the header file
 *	for these functions.
 */

>+int net_failover_register(struct net_device *dev, struct net_failover_ops *ops,
>+			  struct net_failover **pfailover)

Just return "struct net_failover *" instead of taking a ** argument, and
use the ERR_PTR macro to propagate an error.
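
A minimal userspace sketch of that convention follows. It is not code from
the patch: ERR_PTR(), IS_ERR() and PTR_ERR() are simplified stand-ins for the
kernel helpers in <linux/err.h>, and struct demo_failover and
demo_failover_register() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the <linux/err.h> helpers.
 * Assumption: the top 4095 values of the address space never hold a
 * valid object, so they can carry a negative errno instead. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, stripped-down object; not the struct from the patch. */
struct demo_failover {
	int id;
};

/* Returns the new object on success, or ERR_PTR(-errno) on failure,
 * so no output ** argument is needed. */
static struct demo_failover *demo_failover_register(int id)
{
	struct demo_failover *failover;

	if (id < 0)
		return ERR_PTR(-EINVAL);

	failover = calloc(1, sizeof(*failover));
	if (!failover)
		return ERR_PTR(-ENOMEM);

	failover->id = id;
	return failover;
}
```

A caller then checks the result with IS_ERR() and, on failure, returns
PTR_ERR() of it, instead of testing a separate int return value.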


>+{
>+	struct net_failover *failover;
>+
>+	failover = kzalloc(sizeof(*failover), GFP_KERNEL);
>+	if (!failover)
>+		return -ENOMEM;
>+
>+	rcu_assign_pointer(failover->ops, ops);
>+	dev_hold(dev);
>+	dev->priv_flags |= IFF_FAILOVER;
>+	rcu_assign_pointer(failover->failover_dev, dev);
>+
>+	spin_lock(&net_failover_lock);
>+	list_add_tail(&failover->list, &net_failover_list);
>+	spin_unlock(&net_failover_lock);
>+
>+	netdev_info(dev, "failover master:%s registered\n", dev->name);
>+
>+	nfo_register_existing_slave(dev);
>+
>+	*pfailover = failover;
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL_GPL(net_failover_register);
>+
>+void net_failover_unregister(struct net_failover *failover)
>+{
>+	struct net_device *failover_dev;
>+
>+	failover_dev = rcu_dereference(failover->failover_dev);
>+
>+	netdev_info(failover_dev, "failover master:%s unregistered\n",
>+		    failover_dev->name);
>+
>+	failover_dev->priv_flags &= ~IFF_FAILOVER;
>+	dev_put(failover_dev);
>+
>+	spin_lock(&net_failover_lock);
>+	list_del(&failover->list);
>+	spin_unlock(&net_failover_lock);
>+
>+	kfree(failover);
>+}
>+EXPORT_SYMBOL_GPL(net_failover_unregister);
>+
>+int net_failover_create(struct net_device *standby_dev,
>+			struct net_failover **pfailover)

Same here, just return "struct net_failover *"


>+{
>+	struct device *dev = standby_dev->dev.parent;
>+	struct net_device *failover_dev;
>+	int err;
>+
>+	/* Alloc at least 2 queues, for now we are going with 16 assuming
>+	 * that VF devices being enslaved won't have too many queues.
>+	 */
>+	failover_dev = alloc_etherdev_mq(sizeof(struct net_failover_info), 16);
>+	if (!failover_dev) {
>+		dev_err(dev, "Unable to allocate failover_netdev!\n");
>+		return -ENOMEM;
>+	}
>+
>+	dev_net_set(failover_dev, dev_net(standby_dev));
>+	SET_NETDEV_DEV(failover_dev, dev);
>+
>+	failover_dev->netdev_ops = &failover_dev_ops;
>+	failover_dev->ethtool_ops = &failover_ethtool_ops;
>+
>+	/* Initialize the device options */
>+	failover_dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
>+	failover_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE |
>+				       IFF_TX_SKB_SHARING);
>+
>+	/* don't acquire failover netdev's netif_tx_lock when transmitting */
>+	failover_dev->features |= NETIF_F_LLTX;
>+
>+	/* Don't allow failover devices to change network namespaces. */
>+	failover_dev->features |= NETIF_F_NETNS_LOCAL;
>+
>+	failover_dev->hw_features = FAILOVER_VLAN_FEATURES |
>+				    NETIF_F_HW_VLAN_CTAG_TX |
>+				    NETIF_F_HW_VLAN_CTAG_RX |
>+				    NETIF_F_HW_VLAN_CTAG_FILTER;
>+
>+	failover_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
>+	failover_dev->features |= failover_dev->hw_features;
>+
>+	memcpy(failover_dev->dev_addr, standby_dev->dev_addr,
>+	       failover_dev->addr_len);
>+
>+	failover_dev->min_mtu = standby_dev->min_mtu;
>+	failover_dev->max_mtu = standby_dev->max_mtu;
>+
>+	err = register_netdev(failover_dev);
>+	if (err < 0) {

if (err)
is enough


>+		dev_err(dev, "Unable to register failover_dev!\n");
>+		goto err_register_netdev;
>+	}
>+
>+	netif_carrier_off(failover_dev);
>+
>+	err = net_failover_register(failover_dev, NULL, pfailover);
>+	if (err < 0)

if (err)
is enough


>+		goto err_failover_register;
>+
>+	return 0;
>+
>+err_failover_register:
>+	unregister_netdev(failover_dev);
>+err_register_netdev:
>+	free_netdev(failover_dev);
>+
>+	return err;
>+}
>+EXPORT_SYMBOL_GPL(net_failover_create);

[...]

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-28  8:24 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
	virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <1524848820-42258-4-git-send-email-sridhar.samudrala@intel.com>

Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>This patch enables virtio_net to switch over to a VF datapath when a VF
>netdev is present with the same MAC address. It allows live migration
>of a VM with a direct attached VF without the need to setup a bond/team
>between a VF and virtio net device in the guest.
>
>The hypervisor needs to enable only one datapath at any time so that
>packets don't get looped back to the VM over the other datapath. When a VF

Why? Both datapaths could be enabled at the same time. Why would a loop on
the hypervisor side be a problem? This is not an issue for bonding/team
either.


>is plugged, the virtio datapath link state can be marked as down. The
>hypervisor needs to unplug the VF device from the guest on the source host
>and reset the MAC filter of the VF to initiate failover of datapath to

"reset the MAC filter of the VF" - do you mean "set the VF mac"?


>virtio before starting the migration. After the migration is completed,
>the destination hypervisor sets the MAC filter on the VF and plugs it back
>to the guest to switch over to VF datapath.
>
>It uses the generic failover framework that provides 2 functions to create
>and destroy a master failover netdev. When STANDBY feature is enabled, an
>additional netdev(failover netdev) is created that acts as a master device
>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>is marked as 'standby' netdev and a passthru device with the same MAC is
>registered as 'primary' netdev.
>
>This patch is based on the discussion initiated by Jesse on this thread.
>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

[...]

* Re: [PATCH net-next v8 2/4] net: Introduce generic failover module
From: Dan Carpenter @ 2018-04-28  8:23 UTC (permalink / raw)
  To: kbuild
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, kbuild-all, sridhar.samudrala,
	aaron.f.brown, davem
In-Reply-To: <1524700768-38627-3-git-send-email-sridhar.samudrala@intel.com>

Hi Sridhar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
url:    https://github.com/0day-ci/linux/commits/Sridhar-Samudrala/Enable-virtio_net-to-act-as-a-standby-for-a-passthru-device/20180427-183842

smatch warnings:
net/core/net_failover.c:229 net_failover_change_mtu() error: we previously assumed 'primary_dev' could be null (see line 219)
net/core/net_failover.c:279 net_failover_vlan_rx_add_vid() error: we previously assumed 'primary_dev' could be null (see line 269)

# https://github.com/0day-ci/linux/commit/5a5f2e3efcb699867db79543dfebe764927b9c93
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 5a5f2e3efcb699867db79543dfebe764927b9c93
vim +/primary_dev +229 net/core/net_failover.c

5a5f2e3e Sridhar Samudrala 2018-04-25  211  
5a5f2e3e Sridhar Samudrala 2018-04-25  212  static int net_failover_change_mtu(struct net_device *dev, int new_mtu)
5a5f2e3e Sridhar Samudrala 2018-04-25  213  {
5a5f2e3e Sridhar Samudrala 2018-04-25  214  	struct net_failover_info *nfo_info = netdev_priv(dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  215  	struct net_device *primary_dev, *standby_dev;
5a5f2e3e Sridhar Samudrala 2018-04-25  216  	int ret = 0;
5a5f2e3e Sridhar Samudrala 2018-04-25  217  
5a5f2e3e Sridhar Samudrala 2018-04-25  218  	primary_dev = rcu_dereference(nfo_info->primary_dev);
5a5f2e3e Sridhar Samudrala 2018-04-25 @219  	if (primary_dev) {
5a5f2e3e Sridhar Samudrala 2018-04-25  220  		ret = dev_set_mtu(primary_dev, new_mtu);
5a5f2e3e Sridhar Samudrala 2018-04-25  221  		if (ret)
5a5f2e3e Sridhar Samudrala 2018-04-25  222  			return ret;
5a5f2e3e Sridhar Samudrala 2018-04-25  223  	}
5a5f2e3e Sridhar Samudrala 2018-04-25  224  
5a5f2e3e Sridhar Samudrala 2018-04-25  225  	standby_dev = rcu_dereference(nfo_info->standby_dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  226  	if (standby_dev) {
5a5f2e3e Sridhar Samudrala 2018-04-25  227  		ret = dev_set_mtu(standby_dev, new_mtu);
5a5f2e3e Sridhar Samudrala 2018-04-25  228  		if (ret) {
5a5f2e3e Sridhar Samudrala 2018-04-25 @229  			dev_set_mtu(primary_dev, dev->mtu);
5a5f2e3e Sridhar Samudrala 2018-04-25  230  			return ret;
5a5f2e3e Sridhar Samudrala 2018-04-25  231  		}
5a5f2e3e Sridhar Samudrala 2018-04-25  232  	}
5a5f2e3e Sridhar Samudrala 2018-04-25  233  
5a5f2e3e Sridhar Samudrala 2018-04-25  234  	dev->mtu = new_mtu;
5a5f2e3e Sridhar Samudrala 2018-04-25  235  
5a5f2e3e Sridhar Samudrala 2018-04-25  236  	return 0;
5a5f2e3e Sridhar Samudrala 2018-04-25  237  }
5a5f2e3e Sridhar Samudrala 2018-04-25  238  
5a5f2e3e Sridhar Samudrala 2018-04-25  239  static void net_failover_set_rx_mode(struct net_device *dev)
5a5f2e3e Sridhar Samudrala 2018-04-25  240  {
5a5f2e3e Sridhar Samudrala 2018-04-25  241  	struct net_failover_info *nfo_info = netdev_priv(dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  242  	struct net_device *slave_dev;
5a5f2e3e Sridhar Samudrala 2018-04-25  243  
5a5f2e3e Sridhar Samudrala 2018-04-25  244  	rcu_read_lock();
5a5f2e3e Sridhar Samudrala 2018-04-25  245  
5a5f2e3e Sridhar Samudrala 2018-04-25  246  	slave_dev = rcu_dereference(nfo_info->primary_dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  247  	if (slave_dev) {
5a5f2e3e Sridhar Samudrala 2018-04-25  248  		dev_uc_sync_multiple(slave_dev, dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  249  		dev_mc_sync_multiple(slave_dev, dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  250  	}
5a5f2e3e Sridhar Samudrala 2018-04-25  251  
5a5f2e3e Sridhar Samudrala 2018-04-25  252  	slave_dev = rcu_dereference(nfo_info->standby_dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  253  	if (slave_dev) {
5a5f2e3e Sridhar Samudrala 2018-04-25  254  		dev_uc_sync_multiple(slave_dev, dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  255  		dev_mc_sync_multiple(slave_dev, dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  256  	}
5a5f2e3e Sridhar Samudrala 2018-04-25  257  
5a5f2e3e Sridhar Samudrala 2018-04-25  258  	rcu_read_unlock();
5a5f2e3e Sridhar Samudrala 2018-04-25  259  }
5a5f2e3e Sridhar Samudrala 2018-04-25  260  
5a5f2e3e Sridhar Samudrala 2018-04-25  261  static int net_failover_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
5a5f2e3e Sridhar Samudrala 2018-04-25  262  					u16 vid)
5a5f2e3e Sridhar Samudrala 2018-04-25  263  {
5a5f2e3e Sridhar Samudrala 2018-04-25  264  	struct net_failover_info *nfo_info = netdev_priv(dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  265  	struct net_device *primary_dev, *standby_dev;
5a5f2e3e Sridhar Samudrala 2018-04-25  266  	int ret = 0;
5a5f2e3e Sridhar Samudrala 2018-04-25  267  
5a5f2e3e Sridhar Samudrala 2018-04-25  268  	primary_dev = rcu_dereference(nfo_info->primary_dev);
5a5f2e3e Sridhar Samudrala 2018-04-25 @269  	if (primary_dev) {
5a5f2e3e Sridhar Samudrala 2018-04-25  270  		ret = vlan_vid_add(primary_dev, proto, vid);
5a5f2e3e Sridhar Samudrala 2018-04-25  271  		if (ret)
5a5f2e3e Sridhar Samudrala 2018-04-25  272  			return ret;
5a5f2e3e Sridhar Samudrala 2018-04-25  273  	}
5a5f2e3e Sridhar Samudrala 2018-04-25  274  
5a5f2e3e Sridhar Samudrala 2018-04-25  275  	standby_dev = rcu_dereference(nfo_info->standby_dev);
5a5f2e3e Sridhar Samudrala 2018-04-25  276  	if (standby_dev) {
5a5f2e3e Sridhar Samudrala 2018-04-25  277  		ret = vlan_vid_add(standby_dev, proto, vid);
5a5f2e3e Sridhar Samudrala 2018-04-25  278  		if (ret)
5a5f2e3e Sridhar Samudrala 2018-04-25 @279  			vlan_vid_del(primary_dev, proto, vid);
5a5f2e3e Sridhar Samudrala 2018-04-25  280  	}
5a5f2e3e Sridhar Samudrala 2018-04-25  281  
5a5f2e3e Sridhar Samudrala 2018-04-25  282  	return ret;
5a5f2e3e Sridhar Samudrala 2018-04-25  283  }
5a5f2e3e Sridhar Samudrala 2018-04-25  284  
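
Both warnings point at the same pattern: the rollback after a standby
failure touches primary_dev without rechecking it. One possible way to
address that, shown as a userspace sketch under stated assumptions, is to
guard the rollback. struct demo_dev, demo_set_mtu() and demo_change_mtu()
are hypothetical stand-ins for struct net_device, dev_set_mtu() and
net_failover_change_mtu(); this is not necessarily the fix the author chose.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; only the MTU matters. */
struct demo_dev {
	int mtu;
	int max_mtu;
};

/* Stand-in for dev_set_mtu(): fails when the MTU exceeds the limit. */
static int demo_set_mtu(struct demo_dev *dev, int new_mtu)
{
	if (new_mtu > dev->max_mtu)
		return -1;
	dev->mtu = new_mtu;
	return 0;
}

/* Either slave may be NULL, mirroring primary_dev and standby_dev. */
static int demo_change_mtu(struct demo_dev *primary, struct demo_dev *standby,
			   int *cur_mtu, int new_mtu)
{
	int ret;

	if (primary) {
		ret = demo_set_mtu(primary, new_mtu);
		if (ret)
			return ret;
	}

	if (standby) {
		ret = demo_set_mtu(standby, new_mtu);
		if (ret) {
			/* Roll back only if a primary was actually changed. */
			if (primary)
				demo_set_mtu(primary, *cur_mtu);
			return ret;
		}
	}

	*cur_mtu = new_mtu;
	return 0;
}
```

With the guard in place, a failure on the standby device while no primary
is registered returns the error without dereferencing a NULL pointer.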

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
