Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 21:27 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622205535.c6vjhdwt5er4wc32@kafai-mbp.dhcp.thefacebook.com>

On Fri, 22 Jun 2018 13:58:52 -0700, Martin KaFai Lau wrote:
> On Fri, Jun 22, 2018 at 11:40:32AM -0700, Jakub Kicinski wrote:
> > On Thu, 21 Jun 2018 18:20:52 -0700, Martin KaFai Lau wrote:  
> > > On Thu, Jun 21, 2018 at 05:25:23PM -0700, Jakub Kicinski wrote:  
> > > > On Thu, 21 Jun 2018 16:58:15 -0700, Martin KaFai Lau wrote:    
> > > > > On Thu, Jun 21, 2018 at 04:07:19PM -0700, Jakub Kicinski wrote:    
> > > > > > On Thu, 21 Jun 2018 15:51:17 -0700, Martin KaFai Lau wrote:      
> > > > > > > On Thu, Jun 21, 2018 at 02:59:35PM -0700, Jakub Kicinski wrote:      
> > > > > > > > On Wed, 20 Jun 2018 13:30:53 -0700, Okash Khawaja wrote:        
> > > > > > > > > $ sudo bpftool map dump -p id 14
> > > > > > > > > [{
> > > > > > > > >         "key": 0
> > > > > > > > >     },{
> > > > > > > > >         "value": {
> > > > > > > > >             "m": 1,
> > > > > > > > >             "n": 2,
> > > > > > > > >             "o": "c",
> > > > > > > > >             "p": [15,16,17,18,15,16,17,18
> > > > > > > > >             ],
> > > > > > > > >             "q": [[25,26,27,28,25,26,27,28
> > > > > > > > >                 ],[35,36,37,38,35,36,37,38
> > > > > > > > >                 ],[45,46,47,48,45,46,47,48
> > > > > > > > >                 ],[55,56,57,58,55,56,57,58
> > > > > > > > >                 ]
> > > > > > > > >             ],
> > > > > > > > >             "r": 1,
> > > > > > > > >             "s": 0x7ffff6f70568,
> > > > > > > > >             "t": {
> > > > > > > > >                 "x": 5,
> > > > > > > > >                 "y": 10
> > > > > > > > >             },
> > > > > > > > >             "u": 100,
> > > > > > > > >             "v": 20,
> > > > > > > > >             "w1": 0x7,
> > > > > > > > >             "w2": 0x3
> > > > > > > > >         }
> > > > > > > > >     }
> > > > > > > > > ]        
> > > > > > > > 
> > > > > > > > I don't think this format is okay, JSON output is an API you shouldn't
> > > > > > > > break.  You can change the non-JSON output whatever way you like, but
> > > > > > > > JSON must remain backwards compatible.
> > > > > > > > 
> > > > > > > > The dump today has object per entry, e.g.:
> > > > > > > > 
> > > > > > > > {
> > > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > > >         ],
> > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > >         ]
> > > > > > > > }
> > > > > > > > 
> > > > > > > > This format must remain, you may only augment it with new fields.  E.g.:
> > > > > > > > 
> > > > > > > > {
> > > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > > >         ],
> > > > > > > > 	"key_struct":{
> > > > > > > > 		"index":0
> > > > > > > > 	},    
> > > Got a few questions.
> > > 
> > > When we support hashtab later, the key could be int
> > > but reusing the name as "index" is weird.  
> > 
> > Ugh, yes, naturally.  I just typed that out without thinking, so for
> > array maps there is usually no BTF info?...  For hashes obviously we  
> The key of array map also has BTF info which must be an int.

Perfect.

> > should just use the BTF, I'm not sure we should format indexes for
> > arrays nicely or not :S
> >   
> > > The key could also be a struct (e.g. a struct to describe ip:port).
> > > Can you suggest how the "key_struct" will look like?  
> > 
> > Hm.  I think in my mind it has only been a struct but that's not true :S
> > So the struct in the name makes very limited sense now.
> > 
> > Should we do:
> > "formatted" : {
> > 	"value" : XXX
> > }
> > 
> > Where
> > XXX == plain int for integers, e.g. "value":0
> > XXX == array for arrays, e.g. "value":[1,2,3,4]
> > XXX == object for objects, e.g. "value":{"field":XXX, "field2":XXX}  
> It is exactly how this patch is using json's {}, [] and int. ;)

Great, then just wrap that in a "formatted" object instead of
redefining fields and we're good?

> but other than that, it does not have to be json.
> In the next spin, lets stop calling this output json to avoid wrong
> user's expection and I also don't want the future readability
> improvements to be limited by that.  Lets call it something
> like plain text output with BTF.

I don't understand.  We are discussing JSON output here.  The example we
are commenting on is output of:

$ sudo bpftool map dump -p id 14

That -p means JSON.  Nobody objects to plain test output changes.  I
actually didn't realize you haven't implemented plain text in this
series, we should have both.

> How about:
> When "bpftool map dump id 1" is used, it will print the BTF plaintext output
> if a map has BTF available.  If not, it will print the existing
> plaintext output.  That should solve the concern about the user may not
> know BTF is available.
> 
> This ascii output is for human.  The script should not parse the ascii output
> because it is silly not to use the binary ABI (like what this patch is using)
> which does not suffer backward compat issue.

What binary ABI?  I'm also not 100% sure what this patch is doing as it
adds bunch of code in new files that never gets called:

 tools/bpf/bpftool/btf_dumper.c |  247 +++++++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/btf_dumper.h |   18 ++
 2 files changed, 265 insertions(+)

> The existing "-j" can be used to dump the map's binary data
> for remote debugging purpose.  The BTF is in one of the elf section of
> the bpf_prog.o.

> > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > >         ],
> > > > > > > > 	"value_struct":{
> > > > > > > > 		"src_ip":2,    
> > > If for the same map the user changes the "src_ip" to an array of int[4]
> > > later (e.g. to support ipv6), it will become "src_ip": [1, 2, 3, 4].
> > > Is it breaking backward compat?
> > > i.e.
> > > struct five_tuples {
> > > -	int src_ip;
> > > +	int src_ip[4];
> > > /* ... */
> > > };  
> > 
> > Well, it is breaking backward compat, but it's the program doing it,
> > not bpftool :)  BTF changes so does the output.  
> As we see, the key/value's btf-output is inherently not backward compat.
> Hence, "-j" and "-p" will stay as is.  The whole existing json will
> be backward compat instead of only partly backward compat.

No.  There is a difference between user of a facility changing their
input and kernel/libraries providing different output in response to
that, and the libraries suddenly changing the output on their own.

Your example is like saying if user started using IPv6 addresses
instead of IPv4 the netlink attributes in dumps will be different so
kernel didn't keep backwards compat.  While what you're doing is more
equivalent to dropping support for old ioctl interfaces because there
is a better mechanism now.

BTF in JSON is very useful, and will help people who writes simple
orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
this addition to bpftool and will start using it myself as soon as it
lands.  I'm not sure why the reluctance to slightly change the output
format?

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 21:29 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <CADGSJ22nLcz_XaJNzUUdpjQaJY8b5wZs=ohm=B7c2qf2K7_yGA@mail.gmail.com>

On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> >>> > On Wed, 20 Jun 2018 22:48:58 +0300
> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>> >
> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
> >>> >>
> >>> >> It's mostly so we can have e.g. multiple devices with same MAC
> >>> >> (which some people seem to want in order to then use
> >>> >> then with different containers).
> >>> >>
> >>> >> But it is also handy for when you assign a PF, since then you
> >>> >> can't set the MAC.
> >>> >>
> >>> >
> >>> > OK, so what about the following:
> >>> >
> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >>> >   that we have a new uuid field in the virtio-net config space
> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
> >>> > - when configuring, set the property to the group UUID of the vfio-pci
> >>> >   device
> >>>
> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
> >>> to still expose UUID in the config space on virtio-pci?
> >>
> >>
> >> Yes but guest is not supposed to read it.
> >>
> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
> >>> where the corresponding vfio-pci device attached to for a guest which
> >>> doesn't support the feature (legacy).
> >>>
> >>> -Siwei
> >>
> >> Yes but you won't add the primary behind such a bridge.
> >
> > I assume the UUID feature is a new one besides the exiting
> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
> > if UUID feature is present and supported by guest, we'll do pairing
> > based on UUID; if UUID feature present
>                                  ^^^^^^^  is NOT present

Let's go back for a bit.

What is wrong with matching by mac?

1. Does not allow multiple NICs with same mac
2. Requires MAC to be programmed by host in the PT device
   (which is often possible with VFs but isn't possible with all PT
   devices)

Both issues are up to host: if host needs them it needs the UUID
feature, if not MAC matching is sufficient.


> > or not supported by guest,
> > we'll still plug in the VF (if guest supports the _F_STANDBY feature)
> > but the pairing will be fallback to using MAC address. Are you saying
> > you don't even want to plug in the primary when the UUID feature is
> > not supported by guest? Does the feature negotiation UUID have to
> > depend on STANDBY being set, or the other way around? I thought that
> > just the absence of STANDBY will prevent primary to get exposed to the
> > guest.
> >
> > -Siwei
> >
> >>
> >>>
> >>> > - in the guest, use the uuid from the virtio-net device's config space
> >>> >   if applicable; else, fall back to matching by MAC as done today

^ permalink raw reply

* Re: [PATCH] net: drivers/net: Convert random_ether_addr to eth_random_addr
From: Jeff Kirsher @ 2018-06-22 21:30 UTC (permalink / raw)
  To: Joe Perches, Derek Chickles, Satanand Burla, Felix Manlunas,
	Raghu Vatsavayi, Hans Ulli Kroll, Linus Walleij, Yisen Zhuang,
	Salil Mehta, Bryan Whitehead, Microchip Linux Driver Support,
	Harish Patil, Manish Chopra, Dept-GELinuxNICDev,
	Solarflare linux maintainers, Edward Cree, Bert Kenward
  Cc: Kalle Valo, netdev, linux-kernel, linux-arm-kernel,
	intel-wired-lan, linux-omap, linux-ntb, linux-usb, linux-wireless,
	b.a.t.m.a.n
In-Reply-To: <c558b59181756dfe647bcdeaf659c386701c0948.1529689773.git.joe@perches.com>

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Fri, 2018-06-22 at 10:51 -0700, Joe Perches wrote:
> random_ether_addr is a #define for eth_random_addr which is
> generally preferred in kernel code by ~3:1
> 
> Convert the uses of random_ether_addr to enable removing the #define
> 
> Miscellanea:
> 
> o Convert &vfmac[0] to equivalent vfmac and avoid unnecessary line
> wrap
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

For i40e change

> ---
>  drivers/net/ethernet/cavium/liquidio/lio_main.c          | 5 ++---
>  drivers/net/ethernet/cortina/gemini.c                    | 2 +-
>  drivers/net/ethernet/hisilicon/hip04_eth.c               | 2 +-
>  drivers/net/ethernet/intel/i40e/i40e_main.c              | 2 +-
>  drivers/net/ethernet/microchip/lan743x_main.c            | 2 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 2 +-
>  drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c          | 2 +-
>  drivers/net/ethernet/sfc/ef10_sriov.c                    | 2 +-
>  drivers/net/ethernet/ti/cpsw.c                           | 2 +-
>  drivers/net/ethernet/ti/netcp_core.c                     | 4 ++--
>  drivers/net/ntb_netdev.c                                 | 2 +-
>  drivers/net/usb/lan78xx.c                                | 2 +-
>  drivers/net/wireless/ath/ath9k/hw.c                      | 2 +-
>  net/batman-adv/bridge_loop_avoidance.c                   | 2 +-
>  14 files changed, 16 insertions(+), 17 deletions(-)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 21:32 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <CADGSJ22e=Sa8S-HqM4XtOa6ngBKPL73SiQRPh8PUzQgRaie5oA@mail.gmail.com>

On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
> >> On Thu, 21 Jun 2018 21:20:13 +0300
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>
> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> >> > > OK, so what about the following:
> >> > >
> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >> > >   that we have a new uuid field in the virtio-net config space
> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >> > >   offer VIRTIO_NET_F_STANDBY_UUID if set
> >> > > - when configuring, set the property to the group UUID of the vfio-pci
> >> > >   device
> >> > > - in the guest, use the uuid from the virtio-net device's config space
> >> > >   if applicable; else, fall back to matching by MAC as done today
> >> > >
> >> > > That should work for all virtio transports.
> >> >
> >> > True. I'm a bit unhappy that it's virtio net specific though
> >> > since down the road I expect we'll have a very similar feature
> >> > for scsi (and maybe others).
> >> >
> >> > But we do not have a way to have fields that are portable
> >> > both across devices and transports, and I think it would
> >> > be a useful addition. How would this work though? Any idea?
> >>
> >> Can we introduce some kind of device-independent config space area?
> >> Pushing back the device-specific config space by a certain value if the
> >> appropriate feature is negotiated and use that for things like the uuid?
> >
> > So config moves back and forth?
> > Reminds me of the msi vector mess we had with pci.
> > I'd rather have every transport add a new config.
> >
> >> But regardless of that, I'm not sure whether extending this approach to
> >> other device types is the way to go. Tying together two different
> >> devices is creating complicated situations at least in the hypervisor
> >> (even if it's fairly straightforward in the guest). [I have not come
> >> around again to look at the "how to handle visibility in QEMU"
> >> questions due to lack of cycles, sorry about that.]
> >>
> >> So, what's the goal of this approach? Only to allow migration with
> >> vfio-pci, or also to plug in a faster device and use it instead of an
> >> already attached paravirtualized device?
> >
> > These are two sides of the same coin, I think the second approach
> > is closer to what we are doing here.
> 
> I'm leaning towards being conservative to keep the potential of doing
> both. It's the vfio migration itself that has to be addessed, not to
> make every virtio device to have an accelerated datapath, right?
> 
> -Siwei

I think that the approach we took (signal configuration
through standby) assumes standby always present and
primary appearing and disappearing.

Anything else isn't well supported and I'm not sure we
should complicate code too much to support it.

> 
> >
> >> What about migration of vfio devices that are not easily replaced by a
> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and
> >> currently only) supported device is dasd (disks) -- which can do a lot
> >> of specialized things that virtio-blk does not support (and should not
> >> or even cannot support).
> >
> > But maybe virtio-scsi can?
> >
> >> Would it be more helpful to focus on generic
> >> migration support for vfio instead of going about it device by device?
> >>
> >> This network device approach already seems far along, so it makes sense
> >> to continue with it. But I'm not sure whether we want to spend time and
> >> energy on that for other device types rather than working on a general
> >> solution for vfio migration.
> >
> > I'm inclined to say finalizing this feature would be a good first step
> > and will teach us how we can move forward.
> >
> > --
> > MST

^ permalink raw reply

* [Patch net] net_sched: remove a bogus warning in hfsc
From: Cong Wang @ 2018-06-22 21:33 UTC (permalink / raw)
  To: netdev; +Cc: pupilla, Cong Wang

In update_vf():

  cftree_remove(cl);
  update_cfmin(cl->cl_parent);

the cl_cfmin of cl->cl_parent is intentionally updated to 0
when that parent only has one child. And if this parent is
root qdisc, we could end up, in hfsc_schedule_watchdog(),
that we can't decide the next schedule time for qdisc watchdog.
But it seems safe that we can just skip it, as this watchdog is
not always scheduled anyway.

Thanks to Marco for testing all the cases, nothing is broken.

Reported-by: Marco Berizzi <pupilla@libero.it>
Tested-by: Marco Berizzi <pupilla@libero.it>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/sch_hfsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3ae9877ea205..3278a76f6861 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1385,8 +1385,8 @@ hfsc_schedule_watchdog(struct Qdisc *sch)
 		if (next_time == 0 || next_time > q->root.cl_cfmin)
 			next_time = q->root.cl_cfmin;
 	}
-	WARN_ON(next_time == 0);
-	qdisc_watchdog_schedule(&q->watchdog, next_time);
+	if (next_time)
+		qdisc_watchdog_schedule(&q->watchdog, next_time);
 }
 
 static int
-- 
2.14.4

^ permalink raw reply related

* [PATCH v2] selftests: bpf: notification about privilege required to run test_lwt_seg6local.sh testing script
From: Jeffrin Jose T @ 2018-06-22 21:40 UTC (permalink / raw)
  To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Jeffrin Jose T

This test needs root privilege for it's successful execution.

This patch is atleast used to notify the user about the privilege
the script demands for the  smooth execution of the test.

Signed-off-by: Jeffrin Jose T (Rajagiri SET) <ahiliation@gmail.com>
---
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
index 1c77994b5e71..30575577a8b2 100755
--- a/tools/testing/selftests/bpf/test_lwt_seg6local.sh
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
@@ -21,6 +21,15 @@
 # An UDP datagram is sent from fb00::1 to fb00::6. The test succeeds if this
 # datagram can be read on NS6 when binding to fb00::6.
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+msg="skip all tests:"
+if [ $UID != 0 ]; then
+    echo $msg please run this as root >&2
+    exit $ksft_skip
+fi
+
 TMP_FILE="/tmp/selftest_lwt_seg6local.txt"
 
 cleanup()
-- 
2.17.0

^ permalink raw reply related

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 21:43 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180622170955.298900c1.cohuck@redhat.com>

On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
> Would it be more helpful to focus on generic
> migration support for vfio instead of going about it device by device?

Just to note this approach is actually device by device *type*.  It's
mostly device agnostic for a given device type so you can migrate
between hosts with very different hardware.

And support for more PV device types has other advantages
such as security and forward compatibility to future hosts.

Finally, it all can happen mostly within QEMU. User is currently
required to enable it but it's pretty lightweight.

OTOH vfio migration generally requires actual device-specific work, and
only works when hosts are mostly identical. When they aren't it's easy
to blame the user, but tools for checking host compatiblity are
currently non-existent. Upper layer management will also have to learn
about host and device compatibility wrt migration. At the moment they
can't even figure it out wrt software versions of vhost in kernel and
dpdk so I won't hold my breath for all of this happening quickly.

-- 
MST

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 21:49 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622142743.2b890d0f@cakuba.netronome.com>

On Fri, 22 Jun 2018 14:27:43 -0700, Jakub Kicinski wrote:
> BTF in JSON is very useful, and will help people who writes simple
> orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
> this addition to bpftool and will start using it myself as soon as it
> lands.  I'm not sure why the reluctance to slightly change the output
> format?

Ohh, maybe that's the misunderstanding, you only implemented JSON so
you wanted it to be as readable and clean as possible.  Hence the hex
output and cutting out the old cruft!  That perspective makes sense!
But I think we should keep JSON for machines (but including BTF
formatted values) and let's make the plain text output nice and clean,
agreed.

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-22 21:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180623002628-mutt-send-email-mst@kernel.org>

On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
>> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
>> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
>> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>> >>> > On Wed, 20 Jun 2018 22:48:58 +0300
>> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >>> >
>> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
>> >>> >>
>> >>> >> It's mostly so we can have e.g. multiple devices with same MAC
>> >>> >> (which some people seem to want in order to then use
>> >>> >> then with different containers).
>> >>> >>
>> >>> >> But it is also handy for when you assign a PF, since then you
>> >>> >> can't set the MAC.
>> >>> >>
>> >>> >
>> >>> > OK, so what about the following:
>> >>> >
>> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>> >>> >   that we have a new uuid field in the virtio-net config space
>> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>> >>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
>> >>> > - when configuring, set the property to the group UUID of the vfio-pci
>> >>> >   device
>> >>>
>> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
>> >>> to still expose UUID in the config space on virtio-pci?
>> >>
>> >>
>> >> Yes but guest is not supposed to read it.
>> >>
>> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
>> >>> where the corresponding vfio-pci device attached to for a guest which
>> >>> doesn't support the feature (legacy).
>> >>>
>> >>> -Siwei
>> >>
>> >> Yes but you won't add the primary behind such a bridge.
>> >
>> > I assume the UUID feature is a new one besides the exiting
>> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
>> > if UUID feature is present and supported by guest, we'll do pairing
>> > based on UUID; if UUID feature present
>>                                  ^^^^^^^  is NOT present
>
> Let's go back for a bit.
>
> What is wrong with matching by mac?
>
> 1. Does not allow multiple NICs with same mac
> 2. Requires MAC to be programmed by host in the PT device
>    (which is often possible with VFs but isn't possible with all PT
>    devices)

Might not neccessarily be something wrong, but it's very limited to
prohibit the MAC of VF from changing when enslaved by failover. The
same as you indicated on the PT device. I suspect the same is
prohibited on even virtio-net and failover is because of the same
limitation.

>
> Both issues are up to host: if host needs them it needs the UUID
> feature, if not MAC matching is sufficient.

I know. What I'm saying is that we rely on STANDBY feature to control
the visibility of the primary, not the UUID feature.

-Siwei

>
>
>> > or not supported by guest,
>> > we'll still plug in the VF (if guest supports the _F_STANDBY feature)
>> > but the pairing will be fallback to using MAC address. Are you saying
>> > you don't even want to plug in the primary when the UUID feature is
>> > not supported by guest? Does the feature negotiation UUID have to
>> > depend on STANDBY being set, or the other way around? I thought that
>> > just the absence of STANDBY will prevent primary to get exposed to the
>> > guest.
>> >
>> > -Siwei
>> >
>> >>
>> >>>
>> >>> > - in the guest, use the uuid from the virtio-net device's config space
>> >>> >   if applicable; else, fall back to matching by MAC as done today
>

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-22 21:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, Cornelia Huck, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <20180623003022-mutt-send-email-mst@kernel.org>

On Fri, Jun 22, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote:
>> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
>> >> On Thu, 21 Jun 2018 21:20:13 +0300
>> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >>
>> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
>> >> > > OK, so what about the following:
>> >> > >
>> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>> >> > >   that we have a new uuid field in the virtio-net config space
>> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>> >> > >   offer VIRTIO_NET_F_STANDBY_UUID if set
>> >> > > - when configuring, set the property to the group UUID of the vfio-pci
>> >> > >   device
>> >> > > - in the guest, use the uuid from the virtio-net device's config space
>> >> > >   if applicable; else, fall back to matching by MAC as done today
>> >> > >
>> >> > > That should work for all virtio transports.
>> >> >
>> >> > True. I'm a bit unhappy that it's virtio net specific though
>> >> > since down the road I expect we'll have a very similar feature
>> >> > for scsi (and maybe others).
>> >> >
>> >> > But we do not have a way to have fields that are portable
>> >> > both across devices and transports, and I think it would
>> >> > be a useful addition. How would this work though? Any idea?
>> >>
>> >> Can we introduce some kind of device-independent config space area?
>> >> Pushing back the device-specific config space by a certain value if the
>> >> appropriate feature is negotiated and use that for things like the uuid?
>> >
>> > So config moves back and forth?
>> > Reminds me of the msi vector mess we had with pci.
>> > I'd rather have every transport add a new config.
>> >
>> >> But regardless of that, I'm not sure whether extending this approach to
>> >> other device types is the way to go. Tying together two different
>> >> devices is creating complicated situations at least in the hypervisor
>> >> (even if it's fairly straightforward in the guest). [I have not come
>> >> around again to look at the "how to handle visibility in QEMU"
>> >> questions due to lack of cycles, sorry about that.]
>> >>
>> >> So, what's the goal of this approach? Only to allow migration with
>> >> vfio-pci, or also to plug in a faster device and use it instead of an
>> >> already attached paravirtualized device?
>> >
>> > These are two sides of the same coin, I think the second approach
>> > is closer to what we are doing here.
>>
>> I'm leaning towards being conservative to keep the potential of doing
>> both. It's the vfio migration itself that has to be addessed, not to
>> make every virtio device to have an accelerated datapath, right?
>>
>> -Siwei
>
> I think that the approach we took (signal configuration
> through standby) assumes standby always present and
> primary appearing and disappearing.

I can imagine that's still true even for 1-netdev model. As long as we
can hide the lower devices.

That's what I said "to keep the potential to support both" models. But
we should not go further along to assume the other aspect ie. to get
PV accelerated whenever possible, but not to get VF migrated whenever
possible. That's essetially a big diveregence of the priority for the
goal we'd want to pursue.

-Siwei

>
> Anything else isn't well supported and I'm not sure we
> should complicate code too much to support it.
>
>>
>> >
>> >> What about migration of vfio devices that are not easily replaced by a
>> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and
>> >> currently only) supported device is dasd (disks) -- which can do a lot
>> >> of specialized things that virtio-blk does not support (and should not
>> >> or even cannot support).
>> >
>> > But maybe virtio-scsi can?
>> >
>> >> Would it be more helpful to focus on generic
>> >> migration support for vfio instead of going about it device by device?
>> >>
>> >> This network device approach already seems far along, so it makes sense
>> >> to continue with it. But I'm not sure whether we want to spend time and
>> >> energy on that for other device types rather than working on a general
>> >> solution for vfio migration.
>> >
>> > I'm inclined to say finalizing this feature would be a good first step
>> > and will teach us how we can move forward.
>> >
>> > --
>> > MST

^ permalink raw reply

* napi: SCHED flag set/clear
From: Krishna Chaitanya @ 2018-06-22 22:22 UTC (permalink / raw)
  To: netdev

Hi,

While debugging a NAPI related issue (calling napi_disable() twice
leading to deadlock),
I have come across below query. The issue is not documented anywhere
(except for Ben grear's  ath10k patch)

Below are my expectations:

napi_disable(): waits for SCHED (ignoring other flags)
napi_enable(): set SCHED flag

but in reality,

napi_disable: test and set SCHED flag in a while() loop?
napi_enable: clear SCHED flag

This is a bit counter-intuitive, SCHED generally means pollable (enabled).

Am i missing something?

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 22:25 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <CADGSJ20td7b6He6S1PBSJjCq=bNvgqMGbOjiho2eeQm2pgpK3g@mail.gmail.com>

On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
> >> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
> >> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
> >> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> >> >>> > On Wed, 20 Jun 2018 22:48:58 +0300
> >> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >>> >
> >> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> >> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
> >> >>> >>
> >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC
> >> >>> >> (which some people seem to want in order to then use
> >> >>> >> then with different containers).
> >> >>> >>
> >> >>> >> But it is also handy for when you assign a PF, since then you
> >> >>> >> can't set the MAC.
> >> >>> >>
> >> >>> >
> >> >>> > OK, so what about the following:
> >> >>> >
> >> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >> >>> >   that we have a new uuid field in the virtio-net config space
> >> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >> >>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
> >> >>> > - when configuring, set the property to the group UUID of the vfio-pci
> >> >>> >   device
> >> >>>
> >> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
> >> >>> to still expose UUID in the config space on virtio-pci?
> >> >>
> >> >>
> >> >> Yes but guest is not supposed to read it.
> >> >>
> >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
> >> >>> where the corresponding vfio-pci device attached to for a guest which
> >> >>> doesn't support the feature (legacy).
> >> >>>
> >> >>> -Siwei
> >> >>
> >> >> Yes but you won't add the primary behind such a bridge.
> >> >
> >> > I assume the UUID feature is a new one besides the exiting
> >> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
> >> > if UUID feature is present and supported by guest, we'll do pairing
> >> > based on UUID; if UUID feature present
> >>                                  ^^^^^^^  is NOT present
> >
> > Let's go back for a bit.
> >
> > What is wrong with matching by mac?
> >
> > 1. Does not allow multiple NICs with same mac
> > 2. Requires MAC to be programmed by host in the PT device
> >    (which is often possible with VFs but isn't possible with all PT
> >    devices)
> 
> Might not neccessarily be something wrong, but it's very limited to
> prohibit the MAC of VF from changing when enslaved by failover.

You mean guest changing MAC? I'm not sure why we prohibit that.


> The
> same as you indicated on the PT device. I suspect the same is
> prohibited on even virtio-net and failover is because of the same
> limitation.
> 
> >
> > Both issues are up to host: if host needs them it needs the UUID
> > feature, if not MAC matching is sufficient.
> 
> I know. What I'm saying is that we rely on STANDBY feature to control
> the visibility of the primary, not the UUID feature.
> 
> -Siwei

And what I'm saying is that it's up to a host policy.

> >
> >
> >> > or not supported by guest,
> >> > we'll still plug in the VF (if guest supports the _F_STANDBY feature)
> >> > but the pairing will be fallback to using MAC address. Are you saying
> >> > you don't even want to plug in the primary when the UUID feature is
> >> > not supported by guest? Does the feature negotiation UUID have to
> >> > depend on STANDBY being set, or the other way around? I thought that
> >> > just the absence of STANDBY will prevent primary to get exposed to the
> >> > guest.
> >> >
> >> > -Siwei
> >> >
> >> >>
> >> >>>
> >> >>> > - in the guest, use the uuid from the virtio-net device's config space
> >> >>> >   if applicable; else, fall back to matching by MAC as done today
> >

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 22:33 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, Cornelia Huck, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <CADGSJ22oFhP955cuTmMWeSt0UOsG5K1A--c34QUSMx5z3Up2SA@mail.gmail.com>

On Fri, Jun 22, 2018 at 02:57:39PM -0700, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote:
> >> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
> >> >> On Thu, 21 Jun 2018 21:20:13 +0300
> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >>
> >> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> >> >> > > OK, so what about the following:
> >> >> > >
> >> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >> >> > >   that we have a new uuid field in the virtio-net config space
> >> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >> >> > >   offer VIRTIO_NET_F_STANDBY_UUID if set
> >> >> > > - when configuring, set the property to the group UUID of the vfio-pci
> >> >> > >   device
> >> >> > > - in the guest, use the uuid from the virtio-net device's config space
> >> >> > >   if applicable; else, fall back to matching by MAC as done today
> >> >> > >
> >> >> > > That should work for all virtio transports.
> >> >> >
> >> >> > True. I'm a bit unhappy that it's virtio net specific though
> >> >> > since down the road I expect we'll have a very similar feature
> >> >> > for scsi (and maybe others).
> >> >> >
> >> >> > But we do not have a way to have fields that are portable
> >> >> > both across devices and transports, and I think it would
> >> >> > be a useful addition. How would this work though? Any idea?
> >> >>
> >> >> Can we introduce some kind of device-independent config space area?
> >> >> Pushing back the device-specific config space by a certain value if the
> >> >> appropriate feature is negotiated and use that for things like the uuid?
> >> >
> >> > So config moves back and forth?
> >> > Reminds me of the msi vector mess we had with pci.
> >> > I'd rather have every transport add a new config.
> >> >
> >> >> But regardless of that, I'm not sure whether extending this approach to
> >> >> other device types is the way to go. Tying together two different
> >> >> devices is creating complicated situations at least in the hypervisor
> >> >> (even if it's fairly straightforward in the guest). [I have not come
> >> >> around again to look at the "how to handle visibility in QEMU"
> >> >> questions due to lack of cycles, sorry about that.]
> >> >>
> >> >> So, what's the goal of this approach? Only to allow migration with
> >> >> vfio-pci, or also to plug in a faster device and use it instead of an
> >> >> already attached paravirtualized device?
> >> >
> >> > These are two sides of the same coin, I think the second approach
> >> > is closer to what we are doing here.
> >>
> >> I'm leaning towards being conservative to keep the potential of doing
> >> both. It's the vfio migration itself that has to be addessed, not to
> >> make every virtio device to have an accelerated datapath, right?
> >>
> >> -Siwei
> >
> > I think that the approach we took (signal configuration
> > through standby) assumes standby always present and
> > primary appearing and disappearing.
> 
> I can imagine that's still true even for 1-netdev model. As long as we
> can hide the lower devices.
> 
> That's what I said "to keep the potential to support both" models. But
> we should not go further along to assume the other aspect ie. to get
> PV accelerated whenever possible, but not to get VF migrated whenever
> possible. That's essetially a big diveregence of the priority for the
> goal we'd want to pursue.
> 
> -Siwei

I suspect the diveregence will be lost on most users though
simply because they don't even care about vfio. They just
want things to go fast.

Rephrasing requirements in terms of acceleration actually
made things implementable. So I'm not in a hurry to try
and go back to asking for a migrateable vfio - very
easy to get stuck there.

> >
> > Anything else isn't well supported and I'm not sure we
> > should complicate code too much to support it.
> >
> >>
> >> >
> >> >> What about migration of vfio devices that are not easily replaced by a
> >> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and
> >> >> currently only) supported device is dasd (disks) -- which can do a lot
> >> >> of specialized things that virtio-blk does not support (and should not
> >> >> or even cannot support).
> >> >
> >> > But maybe virtio-scsi can?
> >> >
> >> >> Would it be more helpful to focus on generic
> >> >> migration support for vfio instead of going about it device by device?
> >> >>
> >> >> This network device approach already seems far along, so it makes sense
> >> >> to continue with it. But I'm not sure whether we want to spend time and
> >> >> energy on that for other device types rather than working on a general
> >> >> solution for vfio migration.
> >> >
> >> > I'm inclined to say finalizing this feature would be a good first step
> >> > and will teach us how we can move forward.
> >> >
> >> > --
> >> > MST

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Okash Khawaja @ 2018-06-22 22:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov,
	Yonghong Song, Quentin Monnet, David S. Miller, netdev,
	kernel-team, linux-kernel
In-Reply-To: <20180622142743.2b890d0f@cakuba.netronome.com>

On Fri, Jun 22, 2018 at 02:27:43PM -0700, Jakub Kicinski wrote:

[...]

> > > should just use the BTF, I'm not sure we should format indexes for
> > > arrays nicely or not :S
> > >   
> > > > The key could also be a struct (e.g. a struct to describe ip:port).
> > > > Can you suggest how the "key_struct" will look like?  
> > > 
> > > Hm.  I think in my mind it has only been a struct but that's not true :S
> > > So the struct in the name makes very limited sense now.
> > > 
> > > Should we do:
> > > "formatted" : {
> > > 	"value" : XXX
> > > }
> > > 
> > > Where
> > > XXX == plain int for integers, e.g. "value":0
> > > XXX == array for arrays, e.g. "value":[1,2,3,4]
> > > XXX == object for objects, e.g. "value":{"field":XXX, "field2":XXX}  
> > It is exactly how this patch is using json's {}, [] and int. ;)
> 
> Great, then just wrap that in a "formatted" object instead of
> redefining fields and we're good?
Please don't let two formats of same data in a single ouput. Overloading
same output with multple formats suggests a confused design, IMO. It
will confuse users too.

> 
> > but other than that, it does not have to be json.
> > In the next spin, lets stop calling this output json to avoid wrong
> > user's expection and I also don't want the future readability
> > improvements to be limited by that.  Lets call it something
> > like plain text output with BTF.
> 
> I don't understand.  We are discussing JSON output here.  The example we
> are commenting on is output of:
> 
> $ sudo bpftool map dump -p id 14
> 
> That -p means JSON.  Nobody objects to plain test output changes.  I
> actually didn't realize you haven't implemented plain text in this
> series, we should have both.
> 
> > How about:
> > When "bpftool map dump id 1" is used, it will print the BTF plaintext output
> > if a map has BTF available.  If not, it will print the existing
> > plaintext output.  That should solve the concern about the user may not
> > know BTF is available.
> > 
> > This ascii output is for human.  The script should not parse the ascii output
> > because it is silly not to use the binary ABI (like what this patch is using)
> > which does not suffer backward compat issue.
> 
> What binary ABI?
Reading binary data and linking it with BTF information.

> I'm also not 100% sure what this patch is doing as it
> adds bunch of code in new files that never gets called:
Please take a look at the patch then :) Code in these files does get
called.

We seem to be conflating two different - and conflicting - intents here.
First is progam-readability of the output and second is human
readability of the output. I believe both are important. Let's leave
existing output for programs to read. That way we can try to keep it
backward compatible as well as JSON compatible. Let's dedicate the new
BTF format for humans to read. That way, we don't have to worry about
backward compatibility or JSON compatibility.

Let me know if this is not clear. If we agree on above then I think we
can move towards a solution.

Thanks,
Okash

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Martin KaFai Lau @ 2018-06-22 22:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622142743.2b890d0f@cakuba.netronome.com>

On Fri, Jun 22, 2018 at 02:27:43PM -0700, Jakub Kicinski wrote:
> On Fri, 22 Jun 2018 13:58:52 -0700, Martin KaFai Lau wrote:
> > On Fri, Jun 22, 2018 at 11:40:32AM -0700, Jakub Kicinski wrote:
> > > On Thu, 21 Jun 2018 18:20:52 -0700, Martin KaFai Lau wrote:  
> > > > On Thu, Jun 21, 2018 at 05:25:23PM -0700, Jakub Kicinski wrote:  
> > > > > On Thu, 21 Jun 2018 16:58:15 -0700, Martin KaFai Lau wrote:    
> > > > > > On Thu, Jun 21, 2018 at 04:07:19PM -0700, Jakub Kicinski wrote:    
> > > > > > > On Thu, 21 Jun 2018 15:51:17 -0700, Martin KaFai Lau wrote:      
> > > > > > > > On Thu, Jun 21, 2018 at 02:59:35PM -0700, Jakub Kicinski wrote:      
> > > > > > > > > On Wed, 20 Jun 2018 13:30:53 -0700, Okash Khawaja wrote:        
> > > > > > > > > > $ sudo bpftool map dump -p id 14
> > > > > > > > > > [{
> > > > > > > > > >         "key": 0
> > > > > > > > > >     },{
> > > > > > > > > >         "value": {
> > > > > > > > > >             "m": 1,
> > > > > > > > > >             "n": 2,
> > > > > > > > > >             "o": "c",
> > > > > > > > > >             "p": [15,16,17,18,15,16,17,18
> > > > > > > > > >             ],
> > > > > > > > > >             "q": [[25,26,27,28,25,26,27,28
> > > > > > > > > >                 ],[35,36,37,38,35,36,37,38
> > > > > > > > > >                 ],[45,46,47,48,45,46,47,48
> > > > > > > > > >                 ],[55,56,57,58,55,56,57,58
> > > > > > > > > >                 ]
> > > > > > > > > >             ],
> > > > > > > > > >             "r": 1,
> > > > > > > > > >             "s": 0x7ffff6f70568,
> > > > > > > > > >             "t": {
> > > > > > > > > >                 "x": 5,
> > > > > > > > > >                 "y": 10
> > > > > > > > > >             },
> > > > > > > > > >             "u": 100,
> > > > > > > > > >             "v": 20,
> > > > > > > > > >             "w1": 0x7,
> > > > > > > > > >             "w2": 0x3
> > > > > > > > > >         }
> > > > > > > > > >     }
> > > > > > > > > > ]        
> > > > > > > > > 
> > > > > > > > > I don't think this format is okay, JSON output is an API you shouldn't
> > > > > > > > > break.  You can change the non-JSON output whatever way you like, but
> > > > > > > > > JSON must remain backwards compatible.
> > > > > > > > > 
> > > > > > > > > The dump today has object per entry, e.g.:
> > > > > > > > > 
> > > > > > > > > {
> > > > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > > > >         ],
> > > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > > >         ]
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > This format must remain, you may only augment it with new fields.  E.g.:
> > > > > > > > > 
> > > > > > > > > {
> > > > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > > > >         ],
> > > > > > > > > 	"key_struct":{
> > > > > > > > > 		"index":0
> > > > > > > > > 	},    
> > > > Got a few questions.
> > > > 
> > > > When we support hashtab later, the key could be int
> > > > but reusing the name as "index" is weird.  
> > > 
> > > Ugh, yes, naturally.  I just typed that out without thinking, so for
> > > array maps there is usually no BTF info?...  For hashes obviously we  
> > The key of array map also has BTF info which must be an int.
> 
> Perfect.
> 
> > > should just use the BTF, I'm not sure we should format indexes for
> > > arrays nicely or not :S
> > >   
> > > > The key could also be a struct (e.g. a struct to describe ip:port).
> > > > Can you suggest how the "key_struct" will look like?  
> > > 
> > > Hm.  I think in my mind it has only been a struct but that's not true :S
> > > So the struct in the name makes very limited sense now.
> > > 
> > > Should we do:
> > > "formatted" : {
> > > 	"value" : XXX
> > > }
> > > 
> > > Where
> > > XXX == plain int for integers, e.g. "value":0
> > > XXX == array for arrays, e.g. "value":[1,2,3,4]
> > > XXX == object for objects, e.g. "value":{"field":XXX, "field2":XXX}  
> > It is exactly how this patch is using json's {}, [] and int. ;)
> 
> Great, then just wrap that in a "formatted" object instead of
> redefining fields and we're good?
> 
> > but other than that, it does not have to be json.
> > In the next spin, lets stop calling this output json to avoid wrong
> > user's expection and I also don't want the future readability
> > improvements to be limited by that.  Lets call it something
> > like plain text output with BTF.
> 
> I don't understand.  We are discussing JSON output here.  The example we
> are commenting on is output of:
> 
> $ sudo bpftool map dump -p id 14
> 
> That -p means JSON.  Nobody objects to plain test output changes.  I
> actually didn't realize you haven't implemented plain text in this
> series, we should have both.
> 
> > How about:
> > When "bpftool map dump id 1" is used, it will print the BTF plaintext output
> > if a map has BTF available.  If not, it will print the existing
> > plaintext output.  That should solve the concern about the user may not
> > know BTF is available.
> > 
> > This ascii output is for human.  The script should not parse the ascii output
> > because it is silly not to use the binary ABI (like what this patch is using)
> > which does not suffer backward compat issue.
> 
> What binary ABI?  I'm also not 100% sure what this patch is doing as it
> adds bunch of code in new files that never gets called:
I meant the BTF format, the new kernel API to get BTF and how it is used
on map data are stable.

Yes, there is new codes to get and consume the new BTF format but so does
any new kernel API.  and it should not drive everybody to parse ascii.

> 
>  tools/bpf/bpftool/btf_dumper.c |  247 +++++++++++++++++++++++++++++++++++++++++
>  tools/bpf/bpftool/btf_dumper.h |   18 ++
>  2 files changed, 265 insertions(+)
> 
> > The existing "-j" can be used to dump the map's binary data
> > for remote debugging purpose.  The BTF is in one of the elf section of
> > the bpf_prog.o.
> 
> > > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > > >         ],
> > > > > > > > > 	"value_struct":{
> > > > > > > > > 		"src_ip":2,    
> > > > If for the same map the user changes the "src_ip" to an array of int[4]
> > > > later (e.g. to support ipv6), it will become "src_ip": [1, 2, 3, 4].
> > > > Is it breaking backward compat?
> > > > i.e.
> > > > struct five_tuples {
> > > > -	int src_ip;
> > > > +	int src_ip[4];
> > > > /* ... */
> > > > };  
> > > 
> > > Well, it is breaking backward compat, but it's the program doing it,
> > > not bpftool :)  BTF changes so does the output.  
> > As we see, the key/value's btf-output is inherently not backward compat.
> > Hence, "-j" and "-p" will stay as is.  The whole existing json will
> > be backward compat instead of only partly backward compat.
> 
> No.  There is a difference between user of a facility changing their
> input and kernel/libraries providing different output in response to
> that, and the libraries suddenly changing the output on their own.
> 
> Your example is like saying if user started using IPv6 addresses
> instead of IPv4 the netlink attributes in dumps will be different so
> kernel didn't keep backwards compat.  While what you're doing is more
> equivalent to dropping support for old ioctl interfaces because there
> is a better mechanism now.
Sorry, I don't follow this.  I don't see netlink suffer json issue like
the one on "key" and "value".

All I can grasp is, the json should normally be backward compat but now
we are saying anything added by btf-output is an exception because
the script parsing it will treat it differently than "key" and "value"

> 
> BTF in JSON is very useful, and will help people who writes simple
> orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
Can you share what the script will do?  I want to understand why
it cannot directly use the BTF format and the map data.

> this addition to bpftool and will start using it myself as soon as it
> lands.  I'm not sure why the reluctance to slightly change the output
> format?
The initial change argument is because the json has to be backward compat.

Then we show that btf-output is inherently not backward compat, so
printing it in json does not make sense at all.

However, now it is saying part of it does not have to be backward compat.

I am fine putting it under "formatted" for "-j" or "-p" if that is the
case, other than the double output is still confusing.  Lets wait for
Okash's input.

At the same time, the same output will be used as the default plaintext
output when BTF is available.  Then the plaintext BTF output
will not be limited by the json restrictions when we want
to improve human readability later.  Apparently, the
improvements on plaintext will not be always applicable
to json output.

^ permalink raw reply

* arm64 regression bisected to "bpf: reject any prog that failed read-only lock"
From: dann frazier @ 2018-06-22 23:09 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: linux-kernel, netdev, Martin KaFai Lau, linux-arm-kernel,
	Alexei Starovoitov

fyi, I'm seeing a regression that I've bisected down to:
  9facc336876f bpf: reject any prog that failed read-only lock

A snippet of the boot messages are below. I'll be out of pocket until
Monday, so apologies in advance for delays in responding to follow-ups.

[   28.544142] Internal error: SP/PC alignment exception: 8a000000 [#21] SMP
[   28.550918] Modules linked in: btrfs zstd_compress hid_generic raid10 raid456 usbhid async_raid6_recov async_memcpy async_pq async_xor hid async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear marvell hibmc_drm aes_ce_blk ttm aes_ce_cipher drm_kms_helper crc32_ce crct10dif_ce syscopyarea ghash_ce qla2xxx sysfillrect sha2_ce sysimgblt fb_sys_fops sha256_arm64 sha1_ce hns3 mpt3sas nvme_fc hisi_sas_v3_hw nvme_fabrics drm ixgbe hisi_sas_main scsi_transport_fc hclge libsas ahci raid_class hnae3 libahci scsi_transport_sas mdio aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[   28.602352] CPU: 19 PID: 1423 Comm: systemd-udevd Tainted: G      D           4.18.0-rc1+ #19
[   28.610861] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B308 (V0.38) 06/02/2018
[   28.620674] pstate: 40400009 (nZcv daif +PAN -UAO)
[   28.625451] pc : 0xffff00000327f857
[   28.628928] lr : sk_filter_trim_cap+0x8c/0x1b8
[   28.633358] sp : ffff00002b1bba80
[   28.636659] x29: ffff00002b1bba80 x28: 0000000000000001 
[   28.641958] x27: ffff803f71e25800 x26: ffff803f6b16d000 
[   28.647257] x25: 0000000000000000 x24: 0000000000000000 
[   28.652555] x23: 0000000000000001 x22: ffff00002ab06000 
[   28.657854] x21: 0000000000000000 x20: ffff0000095c8000 
[   28.663152] x19: ffff803f73c7a700 x18: 0000fffff53ce450 
[   28.668451] x17: 0000ffffb96575f8 x16: ffff0000089c1c38 
[   28.673750] x15: 0000000000000020 x14: 5953425553003331 
[   28.679049] x13: 3a303a332d796870 x12: 2f7968705f736173 
[   28.684347] x11: 2f33313a303a332d x10: 00000000000000db 
[   28.689646] x9 : ffff00002b1bbd50 x8 : ffff803f73c7a700 
[   28.694945] x7 : ffff803f73c7be00 x6 : 0000000000001900 
[   28.700243] x5 : 0000000000000183 x4 : ffff803f80ccf9d0 
[   28.705542] x3 : 0000000000000000 x2 : ffff00000327f857 
[   28.710840] x1 : ffff00002ab06038 x0 : ffff803f73c7a700 
[   28.716140] Process systemd-udevd (pid: 1423, stack limit = 0x(____ptrval____))
[   28.723434] Call trace:
[   28.725867]  0xffff00000327f857
[   28.728995]  netlink_broadcast_filtered+0x234/0x418
[   28.733860]  netlink_sendmsg+0x354/0x360
[   28.737769]  sock_sendmsg+0x4c/0x68
[   28.741244]  ___sys_sendmsg+0x2ac/0x2f0
[   28.745067]  __sys_sendmsg+0x7c/0xd0
[   28.748629]  sys_sendmsg+0x38/0x48
[   28.752017]  el0_svc_naked+0x30/0x34
[   28.755580] Code: 202000d4 202000d4 202000d4 202000d4 (a9bf7bfd) 
[   28.761660] ---[ end trace 4451ed1a5b395c65 ]---
[   28.766267] Internal error: SP/PC alignment exception: 8a000000 [#22] SMP
[   28.773044] Modules linked in: btrfs zstd_compress hid_generic raid10 raid456 usbhid async_raid6_recov async_memcpy async_pq async_xor hid async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear marvell hibmc_drm aes_ce_blk ttm aes_ce_cipher drm_kms_helper crc32_ce crct10dif_ce syscopyarea ghash_ce qla2xxx sysfillrect sha2_ce sysimgblt fb_sys_fops sha256_arm64 sha1_ce hns3 mpt3sas nvme_fc hisi_sas_v3_hw nvme_fabrics drm ixgbe hisi_sas_main scsi_transport_fc hclge libsas ahci raid_class hnae3 libahci scsi_transport_sas mdio aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[   28.824485] CPU: 20 PID: 1424 Comm: systemd-udevd Tainted: G      D           4.18.0-rc1+ #19
[   28.832994] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B308 (V0.38) 06/02/2018
[   28.842805] pstate: 40400009 (nZcv daif +PAN -UAO)
[   28.847583] pc : 0xffff00000327f857
[   28.851060] lr : sk_filter_trim_cap+0x8c/0x1b8
[   28.855490] sp : ffff00002b1c3a80
[   28.858791] x29: ffff00002b1c3a80 x28: 0000000000000001 
[   28.864090] x27: ffff803f71e25800 x26: ffff803f6b169000 
[   28.869388] x25: 0000000000000000 x24: 0000000000000000 
[   28.874687] x23: 0000000000000001 x22: ffff00002ab06000 
[   28.879985] x21: 0000000000000000 x20: ffff0000095c8000 
[   28.885284] x19: ffff803f742c0900 x18: 0000fffff53ce450 
[   28.890583] x17: 0000ffffb96575f8 x16: ffff0000089c1c38 
[   28.895881] x15: 0000000000000020 x14: 5953425553003431 
[   28.901180] x13: 3a303a332d796870 x12: 2f7968705f736173 
[   28.906478] x11: 2f34313a303a332d x10: 00000000000000db 
[   28.911777] x9 : ffff00002b1c3d50 x8 : ffff803f742c0900 
[   28.917075] x7 : ffff803f742c0c00 x6 : 0000000000000500 
[   28.922374] x5 : 0000000000000045 x4 : ffff803f80ce89d0 
[   28.927673] x3 : 0000000000000000 x2 : ffff00000327f857 
[   28.932971] x1 : ffff00002ab06038 x0 : ffff803f742c0900 
[   28.938271] Process systemd-udevd (pid: 1424, stack limit = 0x(____ptrval____))
[   28.945565] Call trace:
[   28.947998]  0xffff00000327f857
[   28.951127]  netlink_broadcast_filtered+0x234/0x418
[   28.955991]  netlink_sendmsg+0x354/0x360
[   28.959901]  sock_sendmsg+0x4c/0x68
[   28.963377]  ___sys_sendmsg+0x2ac/0x2f0
[   28.967199]  __sys_sendmsg+0x7c/0xd0
[   28.970761]  sys_sendmsg+0x38/0x48
[   28.974150]  el0_svc_naked+0x30/0x34
[   28.977713] Code: 202000d4 202000d4 202000d4 202000d4 (a9bf7bfd) 
[   28.983793] ---[ end trace 4451ed1a5b395c66 ]---
[   28.988399] Internal error: SP/PC alignment exception: 8a000000 [#23] SMP
[   28.995175] Modules linked in: btrfs zstd_compress hid_generic raid10 raid456 usbhid async_raid6_recov async_memcpy async_pq async_xor hid async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear marvell hibmc_drm aes_ce_blk ttm aes_ce_cipher drm_kms_helper crc32_ce crct10dif_ce syscopyarea ghash_ce qla2xxx sysfillrect sha2_ce sysimgblt fb_sys_fops sha256_arm64 sha1_ce hns3 mpt3sas nvme_fc hisi_sas_v3_hw nvme_fabrics drm ixgbe hisi_sas_main scsi_transport_fc hclge libsas ahci raid_class hnae3 libahci scsi_transport_sas mdio aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[   29.046612] CPU: 22 PID: 1425 Comm: systemd-udevd Tainted: G      D           4.18.0-rc1+ #19
[   29.055121] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B308 (V0.38) 06/02/2018
[   29.064932] pstate: 40400009 (nZcv daif +PAN -UAO)
[   29.069710] pc : 0xffff00000327f857
[   29.073187] lr : sk_filter_trim_cap+0x8c/0x1b8
[   29.077616] sp : ffff00002b1cba80
[   29.080917] x29: ffff00002b1cba80 x28: 0000000000000001 
[   29.086216] x27: ffff803f71e25800 x26: ffff803f6b16b000 
[   29.091515] x25: 0000000000000000 x24: 0000000000000000 
[   29.096813] x23: 0000000000000001 x22: ffff00002ab06000 
[   29.102112] x21: 0000000000000000 x20: ffff0000095c8000 
[   29.107411] x19: ffff803f74362c00 x18: 0000fffff53ce450 
[   29.112709] x17: 0000ffffb96575f8 x16: ffff0000089c1c38 
[   29.118008] x15: 0000000000000020 x14: 5953425553003531 
[   29.123306] x13: 3a303a332d796870 x12: 2f7968705f736173 
[   29.128605] x11: 2f35313a303a332d x10: 00000000000000db 
[   29.133904] x9 : ffff00002b1cbd50 x8 : ffff803f74362c00 
[   29.139202] x7 : ffff803f74363900 x6 : 0000000000001500 
[   29.144501] x5 : 00000000000000c2 x4 : ffff803f80d1a9d0 
[   29.149799] x3 : 0000000000000000 x2 : ffff00000327f857 
[   29.155098] x1 : ffff00002ab06038 x0 : ffff803f74362c00 
[   29.160397] Process systemd-udevd (pid: 1425, stack limit = 0x(____ptrval____))
[   29.167691] Call trace:
[   29.170125]  0xffff00000327f857
[   29.173254]  netlink_broadcast_filtered+0x234/0x418
[   29.178118]  netlink_sendmsg+0x354/0x360
[   29.182027]  sock_sendmsg+0x4c/0x68
[   29.185503]  ___sys_sendmsg+0x2ac/0x2f0
[   29.189325]  __sys_sendmsg+0x7c/0xd0
[   29.192888]  sys_sendmsg+0x38/0x48
[   29.196276]  el0_svc_naked+0x30/0x34
[   29.199839] Code: 202000d4 202000d4 202000d4 202000d4 (a9bf7bfd) 
[   29.205919] ---[ end trace 4451ed1a5b395c67 ]---

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Martin KaFai Lau @ 2018-06-22 23:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622144952.353d50c0@cakuba.netronome.com>

On Fri, Jun 22, 2018 at 02:49:52PM -0700, Jakub Kicinski wrote:
> On Fri, 22 Jun 2018 14:27:43 -0700, Jakub Kicinski wrote:
> > BTF in JSON is very useful, and will help people who writes simple
> > orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
> > this addition to bpftool and will start using it myself as soon as it
> > lands.  I'm not sure why the reluctance to slightly change the output
> > format?
> 
> Ohh, maybe that's the misunderstanding, you only implemented JSON so
> you wanted it to be as readable and clean as possible.  Hence the hex
> output and cutting out the old cruft!  That perspective makes sense!
> But I think we should keep JSON for machines (but including BTF
> formatted values) and let's make the plain text output nice and clean,
> agreed.
Right, it is what my earlier comment meant on "this ascii output is
for human".  We merely call it json because we are reusing
the json's meaning on {}, [] and int since it fits nicely
on what we want to achieve, readability.  Other than that,
it does not have to follow other json's requirements.
We can call it whatever except json to avoid wrong
user expectation.  Putting it under "-j"/"-p" was a mistake.
Hence, I said this patch belongs to the 'plaintext" output.

If we really needed a similar output under "-j" or "-p", things
had to be changed and it belongs to a separate patch.

^ permalink raw reply

* [PATCH net-next] netns: get more entropy from net_hash_mix()
From: Eric Dumazet @ 2018-06-22 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

struct net are effectively allocated from order-1 pages on x86,
with one object per slab, meaning that the 13 low order bits
of their addresses are zero.

Once shifted by L1_CACHE_SHIFT, this leaves 7 zero-bits,
meaning that net_hash_mix() does not help spreading
objects on various hash tables.

For example, TCP listen table has 32 buckets, meaning that
all netns use the same bucket for port 80 or port 443.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
---
 include/net/netns/hash.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/net/netns/hash.h b/include/net/netns/hash.h
index 24c78183a4c262e086c29cde1e61b7f397d8bab0..16a842456189f2fc1a3363685b5dd4310a32b2b8 100644
--- a/include/net/netns/hash.h
+++ b/include/net/netns/hash.h
@@ -9,12 +9,7 @@ struct net;
 static inline u32 net_hash_mix(const struct net *net)
 {
 #ifdef CONFIG_NET_NS
-	/*
-	 * shift this right to eliminate bits, that are
-	 * always zeroed
-	 */
-
-	return (u32)(((unsigned long)net) >> L1_CACHE_SHIFT);
+	return (u32)(((unsigned long)net) >> ilog2(sizeof(*net)));
 #else
 	return 0;
 #endif
-- 
2.18.0.rc2.346.g013aa6912e-goog

^ permalink raw reply related

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 23:32 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622225408.jor7lpvsksnwiiec@kafai-mbp.dhcp.thefacebook.com>

On Fri, 22 Jun 2018 15:54:08 -0700, Martin KaFai Lau wrote:
> > > > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > > > >         ],
> > > > > > > > > > 	"value_struct":{
> > > > > > > > > > 		"src_ip":2,      
> > > > > If for the same map the user changes the "src_ip" to an array of int[4]
> > > > > later (e.g. to support ipv6), it will become "src_ip": [1, 2, 3, 4].
> > > > > Is it breaking backward compat?
> > > > > i.e.
> > > > > struct five_tuples {
> > > > > -	int src_ip;
> > > > > +	int src_ip[4];
> > > > > /* ... */
> > > > > };    
> > > > 
> > > > Well, it is breaking backward compat, but it's the program doing it,
> > > > not bpftool :)  BTF changes so does the output.    
> > > As we see, the key/value's btf-output is inherently not backward compat.
> > > Hence, "-j" and "-p" will stay as is.  The whole existing json will
> > > be backward compat instead of only partly backward compat.  
> > 
> > No.  There is a difference between user of a facility changing their
> > input and kernel/libraries providing different output in response to
> > that, and the libraries suddenly changing the output on their own.
> > 
> > Your example is like saying if user started using IPv6 addresses
> > instead of IPv4 the netlink attributes in dumps will be different so
> > kernel didn't keep backwards compat.  While what you're doing is more
> > equivalent to dropping support for old ioctl interfaces because there
> > is a better mechanism now.  
> Sorry, I don't follow this.  I don't see netlink suffer json issue like
> the one on "key" and "value".
> 
> All I can grasp is, the json should normally be backward compat but now
> we are saying anything added by btf-output is an exception because
> the script parsing it will treat it differently than "key" and "value"

Backward compatibility means that if I run *the same* program against
different kernels/libraries it continues to work.  If someone decides
to upgrade their program to work with IPv6 (which was your example)
obviously there is no way system as a whole will look 1:1 the same.

> > BTF in JSON is very useful, and will help people who writes simple
> > orchestration/scripts based on bpftool *a* *lot*.  I really appreciate  
> Can you share what the script will do?  I want to understand why
> it cannot directly use the BTF format and the map data.

Think about a python script which wants to read a counter in a map.
Right now it would have to get the BTF, find out which bytes are the
counter, then convert the bytes into a larger int.  With JSON BTF if
just does entry["formatted"]["value"]["counter"].

Real life example from my test code (conversion of 3 element counter
array):

def str2int(strtab):
    inttab = []
    for i in strtab:
        inttab.append(int(i, 16))
    ba = bytearray(inttab)
    if len(strtab) == 4:
        fmt = "I"
    elif len(strtab) == 8:
        fmt = "Q"
    else:
        raise Exception("String array of len %d can't be unpacked to an int" %
                        (len(strtab)))
    return struct.unpack(fmt, ba)[0]

def convert(elems, idx):
    val = []
    for i in range(3):
        part = elems[idx]["value"][i * length:(i + 1) * length]
        val.append(str2int(part))
    return val

With BTF it would be:

	elems[idx]["formatted"]["value"]

Which is fairly awesome.

> > this addition to bpftool and will start using it myself as soon as it
> > lands.  I'm not sure why the reluctance to slightly change the output
> > format?  
> The initial change argument is because the json has to be backward compat.
> 
> Then we show that btf-output is inherently not backward compat, so
> printing it in json does not make sense at all.
> 
> However, now it is saying part of it does not have to be backward compat.

Compatibility of "formatted" member is defined as -> fields broken out
according to BTF.  So it is backward compatible.  The definition of
"value" member is -> an array of unfortunately formatted array of
ugly hex strings :(

> I am fine putting it under "formatted" for "-j" or "-p" if that is the
> case, other than the double output is still confusing.  Lets wait for
> Okash's input.
>
> At the same time, the same output will be used as the default plaintext
> output when BTF is available.  Then the plaintext BTF output
> will not be limited by the json restrictions when we want
> to improve human readability later.  Apparently, the
> improvements on plaintext will not be always applicable
> to json output.

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-22 23:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180623012406-mutt-send-email-mst@kernel.org>

On Fri, Jun 22, 2018 at 3:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote:
>> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
>> >> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
>> >> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
>> >> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>> >> >>> > On Wed, 20 Jun 2018 22:48:58 +0300
>> >> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> >>> >
>> >> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>> >> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
>> >> >>> >>
>> >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC
>> >> >>> >> (which some people seem to want in order to then use
>> >> >>> >> then with different containers).
>> >> >>> >>
>> >> >>> >> But it is also handy for when you assign a PF, since then you
>> >> >>> >> can't set the MAC.
>> >> >>> >>
>> >> >>> >
>> >> >>> > OK, so what about the following:
>> >> >>> >
>> >> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>> >> >>> >   that we have a new uuid field in the virtio-net config space
>> >> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>> >> >>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
>> >> >>> > - when configuring, set the property to the group UUID of the vfio-pci
>> >> >>> >   device
>> >> >>>
>> >> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
>> >> >>> to still expose UUID in the config space on virtio-pci?
>> >> >>
>> >> >>
>> >> >> Yes but guest is not supposed to read it.
>> >> >>
>> >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
>> >> >>> where the corresponding vfio-pci device attached to for a guest which
>> >> >>> doesn't support the feature (legacy).
>> >> >>>
>> >> >>> -Siwei
>> >> >>
>> >> >> Yes but you won't add the primary behind such a bridge.
>> >> >
>> >> > I assume the UUID feature is a new one besides the exiting
>> >> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
>> >> > if UUID feature is present and supported by guest, we'll do pairing
>> >> > based on UUID; if UUID feature present
>> >>                                  ^^^^^^^  is NOT present
>> >
>> > Let's go back for a bit.
>> >
>> > What is wrong with matching by mac?
>> >
>> > 1. Does not allow multiple NICs with same mac
>> > 2. Requires MAC to be programmed by host in the PT device
>> >    (which is often possible with VFs but isn't possible with all PT
>> >    devices)
>>
>> Might not neccessarily be something wrong, but it's very limited to
>> prohibit the MAC of VF from changing when enslaved by failover.
>
> You mean guest changing MAC? I'm not sure why we prohibit that.

I think Sridhar and Jiri might be better person to answer it. My
impression was that sync'ing the MAC address change between all 3
devices is challenging, as the failover driver uses MAC address to
match net_device internally.

>
>
>> The
>> same as you indicated on the PT device. I suspect the same is
>> prohibited on even virtio-net and failover is because of the same
>> limitation.
>>
>> >
>> > Both issues are up to host: if host needs them it needs the UUID
>> > feature, if not MAC matching is sufficient.
>>
>> I know. What I'm saying is that we rely on STANDBY feature to control
>> the visibility of the primary, not the UUID feature.
>>
>> -Siwei
>
> And what I'm saying is that it's up to a host policy.

So lets say host policy explicitly requires UUID but the guest does
not support it. The guest could be a legacy guest with no STANDBY
support, or a relevately recent guest with STANDBY support but without
the UUID support. In either case, since host asked for UUID matching
explicitly, maybe because it requires different NIC using same MAC, so
it has to override the negotiation result of STANDBY, even if STANDBY
is set and supported? That will end up with inter-feature dependency
over STANDBY, as you see.

-Siwei

>
>> >
>> >
>> >> > or not supported by guest,
>> >> > we'll still plug in the VF (if guest supports the _F_STANDBY feature)
>> >> > but the pairing will be fallback to using MAC address. Are you saying
>> >> > you don't even want to plug in the primary when the UUID feature is
>> >> > not supported by guest? Does the feature negotiation UUID have to
>> >> > depend on STANDBY being set, or the other way around? I thought that
>> >> > just the absence of STANDBY will prevent primary to get exposed to the
>> >> > guest.
>> >> >
>> >> > -Siwei
>> >> >
>> >> >>
>> >> >>>
>> >> >>> > - in the guest, use the uuid from the virtio-net device's config space
>> >> >>> >   if applicable; else, fall back to matching by MAC as done today
>> >

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 23:40 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622231940.fvkxfqccvyf5uewk@kafai-mbp.dhcp.thefacebook.com>

On Fri, 22 Jun 2018 16:19:40 -0700, Martin KaFai Lau wrote:
> On Fri, Jun 22, 2018 at 02:49:52PM -0700, Jakub Kicinski wrote:
> > On Fri, 22 Jun 2018 14:27:43 -0700, Jakub Kicinski wrote:  
> > > BTF in JSON is very useful, and will help people who writes simple
> > > orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
> > > this addition to bpftool and will start using it myself as soon as it
> > > lands.  I'm not sure why the reluctance to slightly change the output
> > > format?  
> > 
> > Ohh, maybe that's the misunderstanding, you only implemented JSON so
> > you wanted it to be as readable and clean as possible.  Hence the hex
> > output and cutting out the old cruft!  That perspective makes sense!
> > But I think we should keep JSON for machines (but including BTF
> > formatted values) and let's make the plain text output nice and clean,
> > agreed.  
> Right, it is what my earlier comment meant on "this ascii output is
> for human".  We merely call it json because we are reusing
> the json's meaning on {}, [] and int since it fits nicely
> on what we want to achieve, readability.  Other than that,
> it does not have to follow other json's requirements.
> We can call it whatever except json to avoid wrong
> user expectation.  Putting it under "-j"/"-p" was a mistake.
> Hence, I said this patch belongs to the 'plaintext" output.

Yes, that were the confusion came from, right.  I'm personally not sold
on JSON as "human readable" but I'm very far from a UI guru so up to
you :)  

I think it may be good to intentionally do non-JSON things, like use
hex integers, don't put quotes around strings, or always add a comma
after value, so people can't use it as JSON even if they try.

Basic printer is trivial to write, I'm concerned that the reuse of JSON
writer to create a user-readable output will bite us on the posterior
later on...

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Martin KaFai Lau @ 2018-06-22 23:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622164048.6283b469@cakuba.netronome.com>

On Fri, Jun 22, 2018 at 04:40:48PM -0700, Jakub Kicinski wrote:
> On Fri, 22 Jun 2018 16:19:40 -0700, Martin KaFai Lau wrote:
> > On Fri, Jun 22, 2018 at 02:49:52PM -0700, Jakub Kicinski wrote:
> > > On Fri, 22 Jun 2018 14:27:43 -0700, Jakub Kicinski wrote:  
> > > > BTF in JSON is very useful, and will help people who writes simple
> > > > orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
> > > > this addition to bpftool and will start using it myself as soon as it
> > > > lands.  I'm not sure why the reluctance to slightly change the output
> > > > format?  
> > > 
> > > Ohh, maybe that's the misunderstanding, you only implemented JSON so
> > > you wanted it to be as readable and clean as possible.  Hence the hex
> > > output and cutting out the old cruft!  That perspective makes sense!
> > > But I think we should keep JSON for machines (but including BTF
> > > formatted values) and let's make the plain text output nice and clean,
> > > agreed.  
> > Right, it is what my earlier comment meant on "this ascii output is
> > for human".  We merely call it json because we are reusing
> > the json's meaning on {}, [] and int since it fits nicely
> > on what we want to achieve, readability.  Other than that,
> > it does not have to follow other json's requirements.
> > We can call it whatever except json to avoid wrong
> > user expectation.  Putting it under "-j"/"-p" was a mistake.
> > Hence, I said this patch belongs to the 'plaintext" output.
> 
> Yes, that were the confusion came from, right.  I'm personally not sold
> on JSON as "human readable" but I'm very far from a UI guru so up to
> you :)  
We have thought of a few options but nothing else shine in term
of saving the verbosity on spelling out the word like "it is a struct"
, "it is an array"... and at the same time trying to use some short form
that people are already familiar with.

> 
> I think it may be good to intentionally do non-JSON things, like use
> hex integers, don't put quotes around strings, or always add a comma
> after value, so people can't use it as JSON even if they try.
Good point.

> 
> Basic printer is trivial to write, I'm concerned that the reuse of JSON
> writer to create a user-readable output will bite us on the posterior
> later on...
If we can think of some better way later, we can also provide another
plaintext output.  I don't want to over design it at this point.  Let
see how it goes.

^ permalink raw reply

* Re: [PATCH] infiniband: i40iw, nes: don't use wall time for TCP sequence numbers
From: Shiraz Saleem @ 2018-06-23  0:01 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Latif, Faisal, Doug Ledford, Jason Gunthorpe, David S. Miller,
	y2038@lists.linaro.org, netdev@vger.kernel.org, Orosco, Henry,
	Nikolova, Tatyana E, Ismail, Mustafa, Jia-Ju Bai, Yuval Shaia,
	Bart Van Assche, Kees Cook, Reshetova, Elena,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20180618144443.137068-1-arnd@arndb.de>

On Mon, Jun 18, 2018 at 08:44:17AM -0600, Arnd Bergmann wrote:
> The nes infiniband driver uses current_kernel_time() to get a nanosecond
> granunarity timestamp to initialize its tcp sequence counters. This is
> one of only a few remaining users of that deprecated function, so we
> should try to get rid of it.
> 
> Aside from using a deprecated API, there are several problems I see here:
> 
> - Using a CLOCK_REALTIME based time source makes it predictable in
>   case the time base is synchronized.
> - Using a coarse timestamp means it only gets updated once per jiffie,
>   making it even more predictable in order to avoid having to access
>   the hardware clock source
> - The upper 2 bits are always zero because the nanoseconds are at most
>   999999999.
> 
> For the Linux TCP implementation, we use secure_tcp_seq(), which appears
> to be appropriate here as well, and solves all the above problems.
> 
> I'm doing the same change in both versions of the nes driver, with
> i40iw being a later copy of the same code.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---

Thanks Arnd for the patch!

[...]

> @@ -2164,7 +2165,6 @@ static struct i40iw_cm_node *i40iw_make_cm_node(
>  				   struct i40iw_cm_listener *listener)
>  {
>  	struct i40iw_cm_node *cm_node;
> -	struct timespec ts;
>  	int oldarpindex;
>  	int arpindex;
>  	struct net_device *netdev = iwdev->netdev;
> @@ -2214,8 +2214,10 @@ static struct i40iw_cm_node *i40iw_make_cm_node(
>  	cm_node->tcp_cntxt.rcv_wscale = I40IW_CM_DEFAULT_RCV_WND_SCALE;
>  	cm_node->tcp_cntxt.rcv_wnd =
>  			I40IW_CM_DEFAULT_RCV_WND_SCALED >> I40IW_CM_DEFAULT_RCV_WND_SCALE;
> -	ts = current_kernel_time();
> -	cm_node->tcp_cntxt.loc_seq_num = ts.tv_nsec;
> +	cm_node->tcp_cntxt.loc_seq_num = secure_tcp_seq(htonl(cm_node->loc_addr[0]),
> +							htonl(cm_node->rem_addr[0]),
> +							htons(cm_node->loc_port),
> +							htons(cm_node->rem_port));

Should we not be using secure_tcpv6_seq() when we are ipv6?

Shiraz

>  	cm_node->tcp_cntxt.mss = (cm_node->ipv4) ? (iwdev->vsi.mtu - I40IW_MTU_TO_MSS_IPV4) :
>  				 (iwdev->vsi.mtu - I40IW_MTU_TO_MSS_IPV6);
>  
> 

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-23  0:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180623012934-mutt-send-email-mst@kernel.org>

On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 22, 2018 at 02:57:39PM -0700, Siwei Liu wrote:
>> On Fri, Jun 22, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote:
>> >> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
>> >> >> On Thu, 21 Jun 2018 21:20:13 +0300
>> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> >>
>> >> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
>> >> >> > > OK, so what about the following:
>> >> >> > >
>> >> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>> >> >> > >   that we have a new uuid field in the virtio-net config space
>> >> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>> >> >> > >   offer VIRTIO_NET_F_STANDBY_UUID if set
>> >> >> > > - when configuring, set the property to the group UUID of the vfio-pci
>> >> >> > >   device
>> >> >> > > - in the guest, use the uuid from the virtio-net device's config space
>> >> >> > >   if applicable; else, fall back to matching by MAC as done today
>> >> >> > >
>> >> >> > > That should work for all virtio transports.
>> >> >> >
>> >> >> > True. I'm a bit unhappy that it's virtio net specific though
>> >> >> > since down the road I expect we'll have a very similar feature
>> >> >> > for scsi (and maybe others).
>> >> >> >
>> >> >> > But we do not have a way to have fields that are portable
>> >> >> > both across devices and transports, and I think it would
>> >> >> > be a useful addition. How would this work though? Any idea?
>> >> >>
>> >> >> Can we introduce some kind of device-independent config space area?
>> >> >> Pushing back the device-specific config space by a certain value if the
>> >> >> appropriate feature is negotiated and use that for things like the uuid?
>> >> >
>> >> > So config moves back and forth?
>> >> > Reminds me of the msi vector mess we had with pci.
>> >> > I'd rather have every transport add a new config.
>> >> >
>> >> >> But regardless of that, I'm not sure whether extending this approach to
>> >> >> other device types is the way to go. Tying together two different
>> >> >> devices is creating complicated situations at least in the hypervisor
>> >> >> (even if it's fairly straightforward in the guest). [I have not come
>> >> >> around again to look at the "how to handle visibility in QEMU"
>> >> >> questions due to lack of cycles, sorry about that.]
>> >> >>
>> >> >> So, what's the goal of this approach? Only to allow migration with
>> >> >> vfio-pci, or also to plug in a faster device and use it instead of an
>> >> >> already attached paravirtualized device?
>> >> >
>> >> > These are two sides of the same coin, I think the second approach
>> >> > is closer to what we are doing here.
>> >>
>> >> I'm leaning towards being conservative to keep the potential of doing
>> >> both. It's the vfio migration itself that has to be addessed, not to
>> >> make every virtio device to have an accelerated datapath, right?
>> >>
>> >> -Siwei
>> >
>> > I think that the approach we took (signal configuration
>> > through standby) assumes standby always present and
>> > primary appearing and disappearing.
>>
>> I can imagine that's still true even for 1-netdev model. As long as we
>> can hide the lower devices.
>>
>> That's what I said "to keep the potential to support both" models. But
>> we should not go further along to assume the other aspect ie. to get
>> PV accelerated whenever possible, but not to get VF migrated whenever
>> possible. That's essetially a big diveregence of the priority for the
>> goal we'd want to pursue.
>>
>> -Siwei
>
> I suspect the diveregence will be lost on most users though
> simply because they don't even care about vfio. They just
> want things to go fast.

Like Jason said, VF isn't faster than virtio-net in all cases. It
depends on the workload and performance metrics: throughput, latency,
or packet per second.

>
> Rephrasing requirements in terms of acceleration actually
> made things implementable. So I'm not in a hurry to try
> and go back to asking for a migrateable vfio - very
> easy to get stuck there.

Understood. When we get all ethtool_ops exposed on the failover
interface, we'll get back to attack migrateable vfio with the 1-netdev
model.

-Siwei


>
>> >
>> > Anything else isn't well supported and I'm not sure we
>> > should complicate code too much to support it.
>> >
>> >>
>> >> >
>> >> >> What about migration of vfio devices that are not easily replaced by a
>> >> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and
>> >> >> currently only) supported device is dasd (disks) -- which can do a lot
>> >> >> of specialized things that virtio-blk does not support (and should not
>> >> >> or even cannot support).
>> >> >
>> >> > But maybe virtio-scsi can?
>> >> >
>> >> >> Would it be more helpful to focus on generic
>> >> >> migration support for vfio instead of going about it device by device?
>> >> >>
>> >> >> This network device approach already seems far along, so it makes sense
>> >> >> to continue with it. But I'm not sure whether we want to spend time and
>> >> >> energy on that for other device types rather than working on a general
>> >> >> solution for vfio migration.
>> >> >
>> >> > I'm inclined to say finalizing this feature would be a good first step
>> >> > and will teach us how we can move forward.
>> >> >
>> >> > --
>> >> > MST

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-23  0:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, Cornelia Huck, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <CADGSJ211pWKYnuuzcgUqJZiHU+=7H3oQSpp1TnE=+EBjkdmCSQ@mail.gmail.com>

On Fri, Jun 22, 2018 at 4:40 PM, Siwei Liu <loseweigh@gmail.com> wrote:
> On Fri, Jun 22, 2018 at 3:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote:
>>> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> > On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
>>> >> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
>>> >> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> >> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
>>> >> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>>> >> >>> > On Wed, 20 Jun 2018 22:48:58 +0300
>>> >> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>> >> >>> >
>>> >> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>>> >> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
>>> >> >>> >>
>>> >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC
>>> >> >>> >> (which some people seem to want in order to then use
>>> >> >>> >> then with different containers).
>>> >> >>> >>
>>> >> >>> >> But it is also handy for when you assign a PF, since then you
>>> >> >>> >> can't set the MAC.
>>> >> >>> >>
>>> >> >>> >
>>> >> >>> > OK, so what about the following:
>>> >> >>> >
>>> >> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>>> >> >>> >   that we have a new uuid field in the virtio-net config space
>>> >> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>>> >> >>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
>>> >> >>> > - when configuring, set the property to the group UUID of the vfio-pci
>>> >> >>> >   device
>>> >> >>>
>>> >> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
>>> >> >>> to still expose UUID in the config space on virtio-pci?
>>> >> >>
>>> >> >>
>>> >> >> Yes but guest is not supposed to read it.
>>> >> >>
>>> >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
>>> >> >>> where the corresponding vfio-pci device attached to for a guest which
>>> >> >>> doesn't support the feature (legacy).
>>> >> >>>
>>> >> >>> -Siwei
>>> >> >>
>>> >> >> Yes but you won't add the primary behind such a bridge.
>>> >> >
>>> >> > I assume the UUID feature is a new one besides the exiting
>>> >> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
>>> >> > if UUID feature is present and supported by guest, we'll do pairing
>>> >> > based on UUID; if UUID feature present
>>> >>                                  ^^^^^^^  is NOT present
>>> >
>>> > Let's go back for a bit.
>>> >
>>> > What is wrong with matching by mac?
>>> >
>>> > 1. Does not allow multiple NICs with same mac
>>> > 2. Requires MAC to be programmed by host in the PT device
>>> >    (which is often possible with VFs but isn't possible with all PT
>>> >    devices)
>>>
>>> Might not neccessarily be something wrong, but it's very limited to
>>> prohibit the MAC of VF from changing when enslaved by failover.
>>
>> You mean guest changing MAC? I'm not sure why we prohibit that.
>
> I think Sridhar and Jiri might be better person to answer it. My
> impression was that sync'ing the MAC address change between all 3
> devices is challenging, as the failover driver uses MAC address to
> match net_device internally.
>
>>
>>
>>> The
>>> same as you indicated on the PT device. I suspect the same is
>>> prohibited on even virtio-net and failover is because of the same
>>> limitation.
>>>
>>> >
>>> > Both issues are up to host: if host needs them it needs the UUID
>>> > feature, if not MAC matching is sufficient.
>>>
>>> I know. What I'm saying is that we rely on STANDBY feature to control
>>> the visibility of the primary, not the UUID feature.
>>>
>>> -Siwei
>>
>> And what I'm saying is that it's up to a host policy.
>
> So lets say host policy explicitly requires UUID but the guest does
> not support it. The guest could be a legacy guest with no STANDBY
> support, or a relevately recent guest with STANDBY support but without
> the UUID support. In either case, since host asked for UUID matching
> explicitly, maybe because it requires different NIC using same MAC, so
> it has to override the negotiation result of STANDBY, even if STANDBY
> is set and supported? That will end up with inter-feature dependency
> over STANDBY, as you see.

I forgot to mention the above has the assumption that we expose both
STANDBY and UUID feature to QEMU user. In fact, as we're going towards
not exposing the STANDBY feature directly to user, UUID may be always
required to enable STANDBY. If not, how do we make sure QEMU can
control the visibility of primary device? Something to be confirmed
before implementing the code.

-Siwei

>
> -Siwei
>
>>
>>> >
>>> >
>>> >> > or not supported by guest,
>>> >> > we'll still plug in the VF (if guest supports the _F_STANDBY feature)
>>> >> > but the pairing will be fallback to using MAC address. Are you saying
>>> >> > you don't even want to plug in the primary when the UUID feature is
>>> >> > not supported by guest? Does the feature negotiation UUID have to
>>> >> > depend on STANDBY being set, or the other way around? I thought that
>>> >> > just the absence of STANDBY will prevent primary to get exposed to the
>>> >> > guest.
>>> >> >
>>> >> > -Siwei
>>> >> >
>>> >> >>
>>> >> >>>
>>> >> >>> > - in the guest, use the uuid from the virtio-net device's config space
>>> >> >>> >   if applicable; else, fall back to matching by MAC as done today
>>> >

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox