Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 21:49 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622142743.2b890d0f@cakuba.netronome.com>

On Fri, 22 Jun 2018 14:27:43 -0700, Jakub Kicinski wrote:
> BTF in JSON is very useful, and will help people who writes simple
> orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
> this addition to bpftool and will start using it myself as soon as it
> lands.  I'm not sure why the reluctance to slightly change the output
> format?

Ohh, maybe that's the misunderstanding, you only implemented JSON so
you wanted it to be as readable and clean as possible.  Hence the hex
output and cutting out the old cruft!  That perspective makes sense!
But I think we should keep JSON for machines (but including BTF
formatted values) and let's make the plain text output nice and clean,
agreed.

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 21:43 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180622170955.298900c1.cohuck@redhat.com>

On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
> Would it be more helpful to focus on generic
> migration support for vfio instead of going about it device by device?

Just to note this approach is actually device by device *type*.  It's
mostly device agnostic for a given device type so you can migrate
between hosts with very different hardware.

And support for more PV device types has other advantages
such as security and forward compatibility to future hosts.

Finally, it all can happen mostly within QEMU. User is currently
required to enable it but it's pretty lightweight.

OTOH vfio migration generally requires actual device-specific work, and
only works when hosts are mostly identical. When they aren't it's easy
to blame the user, but tools for checking host compatiblity are
currently non-existent. Upper layer management will also have to learn
about host and device compatibility wrt migration. At the moment they
can't even figure it out wrt software versions of vhost in kernel and
dpdk so I won't hold my breath for all of this happening quickly.

-- 
MST

^ permalink raw reply

* [PATCH v2] selftests: bpf: notification about privilege required to run test_lwt_seg6local.sh testing script
From: Jeffrin Jose T @ 2018-06-22 21:40 UTC (permalink / raw)
  To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Jeffrin Jose T

This test needs root privilege for it's successful execution.

This patch is atleast used to notify the user about the privilege
the script demands for the  smooth execution of the test.

Signed-off-by: Jeffrin Jose T (Rajagiri SET) <ahiliation@gmail.com>
---
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
index 1c77994b5e71..30575577a8b2 100755
--- a/tools/testing/selftests/bpf/test_lwt_seg6local.sh
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
@@ -21,6 +21,15 @@
 # An UDP datagram is sent from fb00::1 to fb00::6. The test succeeds if this
 # datagram can be read on NS6 when binding to fb00::6.
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+msg="skip all tests:"
+if [ $UID != 0 ]; then
+    echo $msg please run this as root >&2
+    exit $ksft_skip
+fi
+
 TMP_FILE="/tmp/selftest_lwt_seg6local.txt"
 
 cleanup()
-- 
2.17.0

^ permalink raw reply related

* [Patch net] net_sched: remove a bogus warning in hfsc
From: Cong Wang @ 2018-06-22 21:33 UTC (permalink / raw)
  To: netdev; +Cc: pupilla, Cong Wang

In update_vf():

  cftree_remove(cl);
  update_cfmin(cl->cl_parent);

the cl_cfmin of cl->cl_parent is intentionally updated to 0
when that parent only has one child. And if this parent is
root qdisc, we could end up, in hfsc_schedule_watchdog(),
that we can't decide the next schedule time for qdisc watchdog.
But it seems safe that we can just skip it, as this watchdog is
not always scheduled anyway.

Thanks to Marco for testing all the cases, nothing is broken.

Reported-by: Marco Berizzi <pupilla@libero.it>
Tested-by: Marco Berizzi <pupilla@libero.it>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/sch_hfsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3ae9877ea205..3278a76f6861 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1385,8 +1385,8 @@ hfsc_schedule_watchdog(struct Qdisc *sch)
 		if (next_time == 0 || next_time > q->root.cl_cfmin)
 			next_time = q->root.cl_cfmin;
 	}
-	WARN_ON(next_time == 0);
-	qdisc_watchdog_schedule(&q->watchdog, next_time);
+	if (next_time)
+		qdisc_watchdog_schedule(&q->watchdog, next_time);
 }
 
 static int
-- 
2.14.4

^ permalink raw reply related

* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 21:32 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <CADGSJ22e=Sa8S-HqM4XtOa6ngBKPL73SiQRPh8PUzQgRaie5oA@mail.gmail.com>

On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
> >> On Thu, 21 Jun 2018 21:20:13 +0300
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>
> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> >> > > OK, so what about the following:
> >> > >
> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >> > >   that we have a new uuid field in the virtio-net config space
> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >> > >   offer VIRTIO_NET_F_STANDBY_UUID if set
> >> > > - when configuring, set the property to the group UUID of the vfio-pci
> >> > >   device
> >> > > - in the guest, use the uuid from the virtio-net device's config space
> >> > >   if applicable; else, fall back to matching by MAC as done today
> >> > >
> >> > > That should work for all virtio transports.
> >> >
> >> > True. I'm a bit unhappy that it's virtio net specific though
> >> > since down the road I expect we'll have a very similar feature
> >> > for scsi (and maybe others).
> >> >
> >> > But we do not have a way to have fields that are portable
> >> > both across devices and transports, and I think it would
> >> > be a useful addition. How would this work though? Any idea?
> >>
> >> Can we introduce some kind of device-independent config space area?
> >> Pushing back the device-specific config space by a certain value if the
> >> appropriate feature is negotiated and use that for things like the uuid?
> >
> > So config moves back and forth?
> > Reminds me of the msi vector mess we had with pci.
> > I'd rather have every transport add a new config.
> >
> >> But regardless of that, I'm not sure whether extending this approach to
> >> other device types is the way to go. Tying together two different
> >> devices is creating complicated situations at least in the hypervisor
> >> (even if it's fairly straightforward in the guest). [I have not come
> >> around again to look at the "how to handle visibility in QEMU"
> >> questions due to lack of cycles, sorry about that.]
> >>
> >> So, what's the goal of this approach? Only to allow migration with
> >> vfio-pci, or also to plug in a faster device and use it instead of an
> >> already attached paravirtualized device?
> >
> > These are two sides of the same coin, I think the second approach
> > is closer to what we are doing here.
> 
> I'm leaning towards being conservative to keep the potential of doing
> both. It's the vfio migration itself that has to be addessed, not to
> make every virtio device to have an accelerated datapath, right?
> 
> -Siwei

I think that the approach we took (signal configuration
through standby) assumes standby always present and
primary appearing and disappearing.

Anything else isn't well supported and I'm not sure we
should complicate code too much to support it.

> 
> >
> >> What about migration of vfio devices that are not easily replaced by a
> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and
> >> currently only) supported device is dasd (disks) -- which can do a lot
> >> of specialized things that virtio-blk does not support (and should not
> >> or even cannot support).
> >
> > But maybe virtio-scsi can?
> >
> >> Would it be more helpful to focus on generic
> >> migration support for vfio instead of going about it device by device?
> >>
> >> This network device approach already seems far along, so it makes sense
> >> to continue with it. But I'm not sure whether we want to spend time and
> >> energy on that for other device types rather than working on a general
> >> solution for vfio migration.
> >
> > I'm inclined to say finalizing this feature would be a good first step
> > and will teach us how we can move forward.
> >
> > --
> > MST

^ permalink raw reply

* Re: [PATCH] net: drivers/net: Convert random_ether_addr to eth_random_addr
From: Jeff Kirsher @ 2018-06-22 21:30 UTC (permalink / raw)
  To: Joe Perches, Derek Chickles, Satanand Burla, Felix Manlunas,
	Raghu Vatsavayi, Hans Ulli Kroll, Linus Walleij, Yisen Zhuang,
	Salil Mehta, Bryan Whitehead, Microchip Linux Driver Support,
	Harish Patil, Manish Chopra, Dept-GELinuxNICDev,
	Solarflare linux maintainers, Edward Cree, Bert Kenward
  Cc: Kalle Valo, netdev, linux-kernel, linux-arm-kernel,
	intel-wired-lan, linux-omap, linux-ntb, linux-usb, linux-wireless,
	b.a.t.m.a.n
In-Reply-To: <c558b59181756dfe647bcdeaf659c386701c0948.1529689773.git.joe@perches.com>

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Fri, 2018-06-22 at 10:51 -0700, Joe Perches wrote:
> random_ether_addr is a #define for eth_random_addr which is
> generally preferred in kernel code by ~3:1
> 
> Convert the uses of random_ether_addr to enable removing the #define
> 
> Miscellanea:
> 
> o Convert &vfmac[0] to equivalent vfmac and avoid unnecessary line
> wrap
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

For i40e change

> ---
>  drivers/net/ethernet/cavium/liquidio/lio_main.c          | 5 ++---
>  drivers/net/ethernet/cortina/gemini.c                    | 2 +-
>  drivers/net/ethernet/hisilicon/hip04_eth.c               | 2 +-
>  drivers/net/ethernet/intel/i40e/i40e_main.c              | 2 +-
>  drivers/net/ethernet/microchip/lan743x_main.c            | 2 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 2 +-
>  drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c          | 2 +-
>  drivers/net/ethernet/sfc/ef10_sriov.c                    | 2 +-
>  drivers/net/ethernet/ti/cpsw.c                           | 2 +-
>  drivers/net/ethernet/ti/netcp_core.c                     | 4 ++--
>  drivers/net/ntb_netdev.c                                 | 2 +-
>  drivers/net/usb/lan78xx.c                                | 2 +-
>  drivers/net/wireless/ath/ath9k/hw.c                      | 2 +-
>  net/batman-adv/bridge_loop_avoidance.c                   | 2 +-
>  14 files changed, 16 insertions(+), 17 deletions(-)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 21:29 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <CADGSJ22nLcz_XaJNzUUdpjQaJY8b5wZs=ohm=B7c2qf2K7_yGA@mail.gmail.com>

On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> >>> > On Wed, 20 Jun 2018 22:48:58 +0300
> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>> >
> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
> >>> >>
> >>> >> It's mostly so we can have e.g. multiple devices with same MAC
> >>> >> (which some people seem to want in order to then use
> >>> >> then with different containers).
> >>> >>
> >>> >> But it is also handy for when you assign a PF, since then you
> >>> >> can't set the MAC.
> >>> >>
> >>> >
> >>> > OK, so what about the following:
> >>> >
> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >>> >   that we have a new uuid field in the virtio-net config space
> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
> >>> > - when configuring, set the property to the group UUID of the vfio-pci
> >>> >   device
> >>>
> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
> >>> to still expose UUID in the config space on virtio-pci?
> >>
> >>
> >> Yes but guest is not supposed to read it.
> >>
> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
> >>> where the corresponding vfio-pci device attached to for a guest which
> >>> doesn't support the feature (legacy).
> >>>
> >>> -Siwei
> >>
> >> Yes but you won't add the primary behind such a bridge.
> >
> > I assume the UUID feature is a new one besides the exiting
> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
> > if UUID feature is present and supported by guest, we'll do pairing
> > based on UUID; if UUID feature present
>                                  ^^^^^^^  is NOT present

Let's go back for a bit.

What is wrong with matching by mac?

1. Does not allow multiple NICs with same mac
2. Requires MAC to be programmed by host in the PT device
   (which is often possible with VFs but isn't possible with all PT
   devices)

Both issues are up to host: if host needs them it needs the UUID
feature, if not MAC matching is sufficient.


> > or not supported by guest,
> > we'll still plug in the VF (if guest supports the _F_STANDBY feature)
> > but the pairing will be fallback to using MAC address. Are you saying
> > you don't even want to plug in the primary when the UUID feature is
> > not supported by guest? Does the feature negotiation UUID have to
> > depend on STANDBY being set, or the other way around? I thought that
> > just the absence of STANDBY will prevent primary to get exposed to the
> > guest.
> >
> > -Siwei
> >
> >>
> >>>
> >>> > - in the guest, use the uuid from the virtio-net device's config space
> >>> >   if applicable; else, fall back to matching by MAC as done today

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 21:27 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622205535.c6vjhdwt5er4wc32@kafai-mbp.dhcp.thefacebook.com>

On Fri, 22 Jun 2018 13:58:52 -0700, Martin KaFai Lau wrote:
> On Fri, Jun 22, 2018 at 11:40:32AM -0700, Jakub Kicinski wrote:
> > On Thu, 21 Jun 2018 18:20:52 -0700, Martin KaFai Lau wrote:  
> > > On Thu, Jun 21, 2018 at 05:25:23PM -0700, Jakub Kicinski wrote:  
> > > > On Thu, 21 Jun 2018 16:58:15 -0700, Martin KaFai Lau wrote:    
> > > > > On Thu, Jun 21, 2018 at 04:07:19PM -0700, Jakub Kicinski wrote:    
> > > > > > On Thu, 21 Jun 2018 15:51:17 -0700, Martin KaFai Lau wrote:      
> > > > > > > On Thu, Jun 21, 2018 at 02:59:35PM -0700, Jakub Kicinski wrote:      
> > > > > > > > On Wed, 20 Jun 2018 13:30:53 -0700, Okash Khawaja wrote:        
> > > > > > > > > $ sudo bpftool map dump -p id 14
> > > > > > > > > [{
> > > > > > > > >         "key": 0
> > > > > > > > >     },{
> > > > > > > > >         "value": {
> > > > > > > > >             "m": 1,
> > > > > > > > >             "n": 2,
> > > > > > > > >             "o": "c",
> > > > > > > > >             "p": [15,16,17,18,15,16,17,18
> > > > > > > > >             ],
> > > > > > > > >             "q": [[25,26,27,28,25,26,27,28
> > > > > > > > >                 ],[35,36,37,38,35,36,37,38
> > > > > > > > >                 ],[45,46,47,48,45,46,47,48
> > > > > > > > >                 ],[55,56,57,58,55,56,57,58
> > > > > > > > >                 ]
> > > > > > > > >             ],
> > > > > > > > >             "r": 1,
> > > > > > > > >             "s": 0x7ffff6f70568,
> > > > > > > > >             "t": {
> > > > > > > > >                 "x": 5,
> > > > > > > > >                 "y": 10
> > > > > > > > >             },
> > > > > > > > >             "u": 100,
> > > > > > > > >             "v": 20,
> > > > > > > > >             "w1": 0x7,
> > > > > > > > >             "w2": 0x3
> > > > > > > > >         }
> > > > > > > > >     }
> > > > > > > > > ]        
> > > > > > > > 
> > > > > > > > I don't think this format is okay, JSON output is an API you shouldn't
> > > > > > > > break.  You can change the non-JSON output whatever way you like, but
> > > > > > > > JSON must remain backwards compatible.
> > > > > > > > 
> > > > > > > > The dump today has object per entry, e.g.:
> > > > > > > > 
> > > > > > > > {
> > > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > > >         ],
> > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > >         ]
> > > > > > > > }
> > > > > > > > 
> > > > > > > > This format must remain, you may only augment it with new fields.  E.g.:
> > > > > > > > 
> > > > > > > > {
> > > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > > >         ],
> > > > > > > > 	"key_struct":{
> > > > > > > > 		"index":0
> > > > > > > > 	},    
> > > Got a few questions.
> > > 
> > > When we support hashtab later, the key could be int
> > > but reusing the name as "index" is weird.  
> > 
> > Ugh, yes, naturally.  I just typed that out without thinking, so for
> > array maps there is usually no BTF info?...  For hashes obviously we  
> The key of array map also has BTF info which must be an int.

Perfect.

> > should just use the BTF, I'm not sure we should format indexes for
> > arrays nicely or not :S
> >   
> > > The key could also be a struct (e.g. a struct to describe ip:port).
> > > Can you suggest how the "key_struct" will look like?  
> > 
> > Hm.  I think in my mind it has only been a struct but that's not true :S
> > So the struct in the name makes very limited sense now.
> > 
> > Should we do:
> > "formatted" : {
> > 	"value" : XXX
> > }
> > 
> > Where
> > XXX == plain int for integers, e.g. "value":0
> > XXX == array for arrays, e.g. "value":[1,2,3,4]
> > XXX == object for objects, e.g. "value":{"field":XXX, "field2":XXX}  
> It is exactly how this patch is using json's {}, [] and int. ;)

Great, then just wrap that in a "formatted" object instead of
redefining fields and we're good?

> but other than that, it does not have to be json.
> In the next spin, lets stop calling this output json to avoid wrong
> user's expection and I also don't want the future readability
> improvements to be limited by that.  Lets call it something
> like plain text output with BTF.

I don't understand.  We are discussing JSON output here.  The example we
are commenting on is output of:

$ sudo bpftool map dump -p id 14

That -p means JSON.  Nobody objects to plain test output changes.  I
actually didn't realize you haven't implemented plain text in this
series, we should have both.

> How about:
> When "bpftool map dump id 1" is used, it will print the BTF plaintext output
> if a map has BTF available.  If not, it will print the existing
> plaintext output.  That should solve the concern about the user may not
> know BTF is available.
> 
> This ascii output is for human.  The script should not parse the ascii output
> because it is silly not to use the binary ABI (like what this patch is using)
> which does not suffer backward compat issue.

What binary ABI?  I'm also not 100% sure what this patch is doing as it
adds bunch of code in new files that never gets called:

 tools/bpf/bpftool/btf_dumper.c |  247 +++++++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/btf_dumper.h |   18 ++
 2 files changed, 265 insertions(+)

> The existing "-j" can be used to dump the map's binary data
> for remote debugging purpose.  The BTF is in one of the elf section of
> the bpf_prog.o.

> > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > >         ],
> > > > > > > > 	"value_struct":{
> > > > > > > > 		"src_ip":2,    
> > > If for the same map the user changes the "src_ip" to an array of int[4]
> > > later (e.g. to support ipv6), it will become "src_ip": [1, 2, 3, 4].
> > > Is it breaking backward compat?
> > > i.e.
> > > struct five_tuples {
> > > -	int src_ip;
> > > +	int src_ip[4];
> > > /* ... */
> > > };  
> > 
> > Well, it is breaking backward compat, but it's the program doing it,
> > not bpftool :)  BTF changes so does the output.  
> As we see, the key/value's btf-output is inherently not backward compat.
> Hence, "-j" and "-p" will stay as is.  The whole existing json will
> be backward compat instead of only partly backward compat.

No.  There is a difference between user of a facility changing their
input and kernel/libraries providing different output in response to
that, and the libraries suddenly changing the output on their own.

Your example is like saying if user started using IPv6 addresses
instead of IPv4 the netlink attributes in dumps will be different so
kernel didn't keep backwards compat.  While what you're doing is more
equivalent to dropping support for old ioctl interfaces because there
is a better mechanism now.

BTF in JSON is very useful, and will help people who writes simple
orchestration/scripts based on bpftool *a* *lot*.  I really appreciate
this addition to bpftool and will start using it myself as soon as it
lands.  I'm not sure why the reluctance to slightly change the output
format?

^ permalink raw reply

* [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()
From: Paul Moore @ 2018-06-22 21:18 UTC (permalink / raw)
  To: netdev; +Cc: selinux, linux-security-module

From: Paul Moore <paul@paul-moore.com>

The ipv6_renew_options_kern() function eventually called into
copy_from_user(), despite it not using any userspace buffers, which
was problematic as that ended up calling access_ok() which emited
a warning on x86 (and likely other arches as well).

  ipv6_renew_options_kern()
    ipv6_renew_options()
      ipv6_renew_option()
        copy_from_user()
          _copy_from_user()
            access_ok()

The access_ok() check inside _copy_from_user() is obviously the right
thing to do which means that calling copy_from_user() via
ipv6_renew_options_kern() is obviously the wrong thing to do.  This
patch fixes this by duplicating ipv6_renew_option() in the _kern()
variant, omitting the userspace copies and attributes.  The patch
does make an attempt at limiting the duplicated code by moving the
option allocation code into a common helper function.  I'm not in
love with this solution, but everything else I could think of seemed
worse.

The ipv6_renew_options_kern() function is an required by the
CALIPSO/RFC5570 code in net/ipv6/calipso.c.

Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 net/ipv6/exthdrs.c |  155 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 121 insertions(+), 34 deletions(-)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 5bc2bf3733ab..902748acd6fe 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -1040,36 +1040,47 @@ static int ipv6_renew_option(void *ohdr,
 	return 0;
 }
 
+static int ipv6_renew_option_kern(void *ohdr,
+				  struct ipv6_opt_hdr *newopt, int newoptlen,
+				  int inherit,
+				  struct ipv6_opt_hdr **hdr,
+				  char **p)
+{
+	if (inherit) {
+		if (ohdr) {
+			memcpy(*p, ohdr,
+			       ipv6_optlen((struct ipv6_opt_hdr *)ohdr));
+			*hdr = (struct ipv6_opt_hdr *)*p;
+			*p += CMSG_ALIGN(ipv6_optlen(*hdr));
+		}
+	} else if (newopt) {
+		memcpy(*p, newopt, newoptlen);
+		*hdr = (struct ipv6_opt_hdr *)*p;
+		if (ipv6_optlen(*hdr) > newoptlen)
+			return -EINVAL;
+		*p += CMSG_ALIGN(newoptlen);
+	}
+	return 0;
+}
+
 /**
- * ipv6_renew_options - replace a specific ext hdr with a new one.
+ * ipv6_renew_option_alloc - helper function for allocating ipv6_txoptions
  *
  * @sk: sock from which to allocate memory
  * @opt: original options
  * @newtype: option type to replace in @opt
- * @newopt: new option of type @newtype to replace (user-mem)
- * @newoptlen: length of @newopt
- *
- * Returns a new set of options which is a copy of @opt with the
- * option type @newtype replaced with @newopt.
+ * @newoptlen: length of the new option
  *
- * @opt may be NULL, in which case a new set of options is returned
- * containing just @newopt.
- *
- * @newopt may be NULL, in which case the specified option type is
- * not copied into the new set of options.
- *
- * The new set of options is allocated from the socket option memory
- * buffer of @sk.
+ * This really should only ever be called by ipv6_renew_option() or
+ * ipv6_renew_option_kern().
  */
-struct ipv6_txoptions *
-ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
-		   int newtype,
-		   struct ipv6_opt_hdr __user *newopt, int newoptlen)
+static struct ipv6_txoptions *ipv6_renew_option_alloc(struct sock *sk,
+						      struct ipv6_txoptions *opt,
+						      int newtype,
+						      int newoptlen)
 {
 	int tot_len = 0;
-	char *p;
 	struct ipv6_txoptions *opt2;
-	int err;
 
 	if (opt) {
 		if (newtype != IPV6_HOPOPTS && opt->hopopt)
@@ -1082,7 +1093,7 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
 			tot_len += CMSG_ALIGN(ipv6_optlen(opt->dst1opt));
 	}
 
-	if (newopt && newoptlen)
+	if (newoptlen)
 		tot_len += CMSG_ALIGN(newoptlen);
 
 	if (!tot_len)
@@ -1096,6 +1107,44 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
 	memset(opt2, 0, tot_len);
 	refcount_set(&opt2->refcnt, 1);
 	opt2->tot_len = tot_len;
+
+	return opt2;
+}
+
+/**
+ * ipv6_renew_options - replace a specific ext hdr with a new one.
+ *
+ * @sk: sock from which to allocate memory
+ * @opt: original options
+ * @newtype: option type to replace in @opt
+ * @newopt: new option of type @newtype to replace (user-mem)
+ * @newoptlen: length of @newopt
+ *
+ * Returns a new set of options which is a copy of @opt with the
+ * option type @newtype replaced with @newopt.
+ *
+ * @opt may be NULL, in which case a new set of options is returned
+ * containing just @newopt.
+ *
+ * @newopt may be NULL, in which case the specified option type is
+ * not copied into the new set of options.
+ *
+ * The new set of options is allocated from the socket option memory
+ * buffer of @sk.
+ */
+struct ipv6_txoptions *
+ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
+		   int newtype,
+		   struct ipv6_opt_hdr __user *newopt, int newoptlen)
+{
+	char *p;
+	struct ipv6_txoptions *opt2;
+	int err;
+
+	opt2 = ipv6_renew_option_alloc(sk, opt, newtype,
+				       newopt && newoptlen ? newoptlen : 0);
+	if (!opt2 || IS_ERR(opt2))
+		return opt2;
 	p = (char *)(opt2 + 1);
 
 	err = ipv6_renew_option(opt ? opt->hopopt : NULL, newopt, newoptlen,
@@ -1142,23 +1191,61 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
  * @newopt: new option of type @newtype to replace (kernel-mem)
  * @newoptlen: length of @newopt
  *
- * See ipv6_renew_options().  The difference is that @newopt is
- * kernel memory, rather than user memory.
+ * See ipv6_renew_options().  The difference is that @newopt is kernel memory,
+ * rather than user memory.
  */
 struct ipv6_txoptions *
 ipv6_renew_options_kern(struct sock *sk, struct ipv6_txoptions *opt,
-			int newtype, struct ipv6_opt_hdr *newopt,
-			int newoptlen)
+			int newtype,
+			struct ipv6_opt_hdr *newopt, int newoptlen)
 {
-	struct ipv6_txoptions *ret_val;
-	const mm_segment_t old_fs = get_fs();
-
-	set_fs(KERNEL_DS);
-	ret_val = ipv6_renew_options(sk, opt, newtype,
-				     (struct ipv6_opt_hdr __user *)newopt,
-				     newoptlen);
-	set_fs(old_fs);
-	return ret_val;
+	char *p;
+	struct ipv6_txoptions *opt2;
+	int err;
+
+	opt2 = ipv6_renew_option_alloc(sk, opt, newtype,
+				       newopt && newoptlen ? newoptlen : 0);
+	if (!opt2 || IS_ERR(opt2))
+		return opt2;
+	p = (char *)(opt2 + 1);
+
+	err = ipv6_renew_option_kern(opt ? opt->hopopt : NULL,
+				     newopt, newoptlen,
+				     newtype != IPV6_HOPOPTS,
+				     &opt2->hopopt, &p);
+	if (err)
+		goto out;
+
+	err = ipv6_renew_option_kern(opt ? opt->dst0opt : NULL,
+				     newopt, newoptlen,
+				     newtype != IPV6_RTHDRDSTOPTS,
+				     &opt2->dst0opt, &p);
+	if (err)
+		goto out;
+
+	err = ipv6_renew_option_kern(opt ? opt->srcrt : NULL,
+				     newopt, newoptlen,
+				     newtype != IPV6_RTHDR,
+				     (struct ipv6_opt_hdr **)&opt2->srcrt, &p);
+	if (err)
+		goto out;
+
+	err = ipv6_renew_option_kern(opt ? opt->dst1opt : NULL,
+				     newopt, newoptlen,
+				     newtype != IPV6_DSTOPTS,
+				     &opt2->dst1opt, &p);
+	if (err)
+		goto out;
+
+	opt2->opt_nflen = (opt2->hopopt ? ipv6_optlen(opt2->hopopt) : 0) +
+			  (opt2->dst0opt ? ipv6_optlen(opt2->dst0opt) : 0) +
+			  (opt2->srcrt ? ipv6_optlen(opt2->srcrt) : 0);
+	opt2->opt_flen = (opt2->dst1opt ? ipv6_optlen(opt2->dst1opt) : 0);
+
+	return opt2;
+out:
+	sock_kfree_s(sk, opt2, opt2->tot_len);
+	return ERR_PTR(err);
 }
 
 struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,

^ permalink raw reply related

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Martin KaFai Lau @ 2018-06-22 20:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622114032.162b2a76@cakuba.netronome.com>

On Fri, Jun 22, 2018 at 11:40:32AM -0700, Jakub Kicinski wrote:
> On Thu, 21 Jun 2018 18:20:52 -0700, Martin KaFai Lau wrote:
> > On Thu, Jun 21, 2018 at 05:25:23PM -0700, Jakub Kicinski wrote:
> > > On Thu, 21 Jun 2018 16:58:15 -0700, Martin KaFai Lau wrote:  
> > > > On Thu, Jun 21, 2018 at 04:07:19PM -0700, Jakub Kicinski wrote:  
> > > > > On Thu, 21 Jun 2018 15:51:17 -0700, Martin KaFai Lau wrote:    
> > > > > > On Thu, Jun 21, 2018 at 02:59:35PM -0700, Jakub Kicinski wrote:    
> > > > > > > On Wed, 20 Jun 2018 13:30:53 -0700, Okash Khawaja wrote:      
> > > > > > > > $ sudo bpftool map dump -p id 14
> > > > > > > > [{
> > > > > > > >         "key": 0
> > > > > > > >     },{
> > > > > > > >         "value": {
> > > > > > > >             "m": 1,
> > > > > > > >             "n": 2,
> > > > > > > >             "o": "c",
> > > > > > > >             "p": [15,16,17,18,15,16,17,18
> > > > > > > >             ],
> > > > > > > >             "q": [[25,26,27,28,25,26,27,28
> > > > > > > >                 ],[35,36,37,38,35,36,37,38
> > > > > > > >                 ],[45,46,47,48,45,46,47,48
> > > > > > > >                 ],[55,56,57,58,55,56,57,58
> > > > > > > >                 ]
> > > > > > > >             ],
> > > > > > > >             "r": 1,
> > > > > > > >             "s": 0x7ffff6f70568,
> > > > > > > >             "t": {
> > > > > > > >                 "x": 5,
> > > > > > > >                 "y": 10
> > > > > > > >             },
> > > > > > > >             "u": 100,
> > > > > > > >             "v": 20,
> > > > > > > >             "w1": 0x7,
> > > > > > > >             "w2": 0x3
> > > > > > > >         }
> > > > > > > >     }
> > > > > > > > ]      
> > > > > > > 
> > > > > > > I don't think this format is okay, JSON output is an API you shouldn't
> > > > > > > break.  You can change the non-JSON output whatever way you like, but
> > > > > > > JSON must remain backwards compatible.
> > > > > > > 
> > > > > > > The dump today has object per entry, e.g.:
> > > > > > > 
> > > > > > > {
> > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > >         ],
> > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > >         ]
> > > > > > > }
> > > > > > > 
> > > > > > > This format must remain, you may only augment it with new fields.  E.g.:
> > > > > > > 
> > > > > > > {
> > > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > > >         ],
> > > > > > > 	"key_struct":{
> > > > > > > 		"index":0
> > > > > > > 	},  
> > Got a few questions.
> > 
> > When we support hashtab later, the key could be int
> > but reusing the name as "index" is weird.
> 
> Ugh, yes, naturally.  I just typed that out without thinking, so for
> array maps there is usually no BTF info?...  For hashes obviously we
The key of array map also has BTF info which must be an int.

> should just use the BTF, I'm not sure we should format indexes for
> arrays nicely or not :S
> 
> > The key could also be a struct (e.g. a struct to describe ip:port).
> > Can you suggest how the "key_struct" will look like?
> 
> Hm.  I think in my mind it has only been a struct but that's not true :S
> So the struct in the name makes very limited sense now.
> 
> Should we do:
> "formatted" : {
> 	"value" : XXX
> }
> 
> Where
> XXX == plain int for integers, e.g. "value":0
> XXX == array for arrays, e.g. "value":[1,2,3,4]
> XXX == object for objects, e.g. "value":{"field":XXX, "field2":XXX}
It is exactly how this patch is using json's {}, [] and int. ;)
but other than that, it does not have to be json.
In the next spin, lets stop calling this output json to avoid wrong
user's expection and I also don't want the future readability
improvements to be limited by that.  Lets call it something
like plain text output with BTF.

How about:
When "bpftool map dump id 1" is used, it will print the BTF plaintext output
if a map has BTF available.  If not, it will print the existing
plaintext output.  That should solve the concern about the user may not
know BTF is available.

This ascii output is for human.  The script should not parse the ascii output
because it is silly not to use the binary ABI (like what this patch is using)
which does not suffer backward compat issue.
The existing "-j" can be used to dump the map's binary data
for remote debugging purpose.  The BTF is in one of the elf section of
the bpf_prog.o.

> 
> > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > >         ],
> > > > > > > 	"value_struct":{
> > > > > > > 		"src_ip":2,  
> > If for the same map the user changes the "src_ip" to an array of int[4]
> > later (e.g. to support ipv6), it will become "src_ip": [1, 2, 3, 4].
> > Is it breaking backward compat?
> > i.e.
> > struct five_tuples {
> > -	int src_ip;
> > +	int src_ip[4];
> > /* ... */
> > };
> 
> Well, it is breaking backward compat, but it's the program doing it,
> not bpftool :)  BTF changes so does the output.
As we see, the key/value's btf-output is inherently not backward compat.
Hence, "-j" and "-p" will stay as is.  The whole existing json will
be backward compat instead of only partly backward compat.


>  
> > > > > > > 		"dst_ip:0
> > > > > > > 	}
> > > > > > > }      
> > > > > > I am not sure how useful to have both "key|value" and "(key|value)_struct"
> > > > > > while most people would prefer "key_struct"/"value_struct" if it is
> > > > > > available.    
> > > > > 
> > > > > Agreed, it's not that useful, especially with the string-hex debacle :(
> > > > > It's just about the backwards compat.
> > > > >     
> > > > > > How about introducing a new option, like "-b", to print the
> > > > > > map with BTF (if available) such that it won't break the existing
> > > > > > one (-j or -p) while the "-b" output can keep using the "key"
> > > > > > and "value".
> > > > > > 
> > > > > > The existing json can be kept as is.    
> > > > > 
> > > > > That was my knee jerk reaction too, but on reflection it doesn't sound
> > > > > that great.  We expect people with new-enough bpftool to use btf, so it
> > > > > should be available in the default output, without hiding it behind a
> > > > > switch.  We could add a switch to hide the old output, but that doesn't
> > > > > give us back the names...  What about Key and Value or k and v?  Or
> > > > > key_fields and value_fields?    
> > > > I thought the current default output is "plain" ;)
> > > > Having said that, yes, the btf is currently printed in json.
> > > > 
> > > > Ideally, the default json output should do what most people want:
> > > > print btf and btf only (if it is available).
> > > > but I don't see a way out without new option if we need to
> > > > be backward compat :(
> > > > 
> > > > Agree that showing the btf in the existing json output will be useful (e.g.
> > > > to hint people that BTF is available).  If btf is showing in old json,
> > > > also agree that the names should be the same with the new json.
> > > > key_fields and value_fields may hint it has >1 fields though.
> > > > May be "formatted_key" and "formatted_value"?  
> > > 
> > > SGTM!  Or even maybe as a "formatted" object?:
> > > 
> > > {
> > >          "key":["0x00","0x00","0x00","0x00",
> > >          ],
> > >          "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > >          ],
> > > 	"formatted":{
> > > 	 	"key":{
> > >  			"index":0
> > > 	 	},
> > > 	 	"value":{
> > >  			"src_ip":2,
> > >  			"dst_ip:0
> > > 	 	}
> > > 	}  
> > hmm... that is an extra indentation (keep in mind that the "value" could
> > already have a few nested structs which itself consumes a few indentations)
> > but I guess adding another one may be ok-ish.
> 
> I'm not fussed about this, whatever JSON writer does by default is fine
> with me, really.
> 
> > > > > > > The name XYZ_struct may not be the best, perhaps you can come up with a
> > > > > > > better one?  
> > > > > > > 
> > > > > > > Does that make sense?  Am I missing what you're doing here?
> > > > > > > 
> > > > > > > One process note - please make sure you run checkpatch.pl --strict on
> > > > > > > bpftool patches before posting.
> > > > > > > 
> > > > > > > Thanks for working on this!      
> > >   
> 

^ permalink raw reply

* offload_handle issue in ipsec offload
From: Shannon Nelson @ 2018-06-22 20:30 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: netdev@vger.kernel.org, Boris Pismenny, Ilan Tayari, Don Skidmore

Hi Steffen,

While adding the ipsec-offload API to netdevsim I ran across an issue 
with the use of x->xso.offload_handle that I think needs attention, and 
would like your opinion before I try to address it.

The offload_handle is essentially an opaque magic cookie to be used by 
the driver to help track the driver's offload information.  As such, the 
driver fills it in when an SA is setup for an offload, and then the 
offload_handle value can be used in the driver's xmit handler to quickly 
find the device specific config information for setting up each packet.

Since this is driver specific information the stack code, e.g. 
xfrm_dev_offload_ok() and validate_xmit_xfrm() and a couple of esp 
functions, really shouldn't be trying to use it.  However, since they 
are checking offload_handle for == 0 to see if it has been used, the 
drivers need to be sure they have set it to non-zero.  This requirement 
is not clear in the API description at the moment.

I ran across this because my addition to netdevsim uses offload_handle 
to store the index of the array item that has the related SA 
information.  I found in testing that the Tx offload wasn't getting 
called because the array index was 0 and so xfrm_dev_offload_ok() would 
think there was no offload and bypass the offloaded Tx.  It turns out 
that the ixgbe implementation has the same bug, but I missed this in 
earlier testing.  The Mellanox driver doesn't run into this problem 
because they are storing a struct pointer rather than an array index.

Does the stack really need to check offload_handle?  I don't think it 
does, but perhaps I'm missing something.  Here's where I found it used:
  - validate_xmit_xfrm()   - added in 3dca3f38
  - xfrm_dev_offload_ok()  - added in original patch d77e38e6
  - esp4_gso_segment()     - added in 3dca3f38
  - esp_xmit()             - added in fca11ebd
  - esp6_gso_segment()     - added in 3dca3f38
  - esp6_xmit()            - added in 3dca3f38

I think in all cases it is unnecessary and could be pulled out.

If these checks need to be left in, then any client drivers need to be 
sure this is a non-zero value, possibly by adding a VALID bit to their 
opaque value.  The ixgbe and netdevsim implementations would be fine 
with using a high bit as a VALID flag, but that might not play well with 
drivers like Mellanox that store a struct address.

So, after all that... shall we remove the offload_handle references in 
the stack code?

sln

-- 
==================================================
Shannon Nelson           shannon.nelson@oracle.com
Parents can't afford to be squeamish

^ permalink raw reply

* [PATCH net] net: mvneta: fix the Rx desc DMA address in the Rx path
From: Yan Markman @ 2018-06-22 20:26 UTC (permalink / raw)
  To: Antoine Tenart, davem@davemloft.net, gregory.clement@bootlin.com,
	mw@semihalf.com
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, maxime.chevallier@bootlin.com,
	miquel.raynal@bootlin.com, Nadav Haklai, Stefan Chulski,
	Yelena Krivosheev

YANM: adding Yelena-K
--------------------------------
When using s/w buffer management, buffers are allocated and DMA mapped.
When doing so on an arm64 platform, an offset correction is applied on the DMA address, before storing it in an Rx descriptor. The issue is this DMA address is then used later in the Rx path without removing the offset correction. Thus the DMA address is wrong, which can led to various issues.

This patch fixes this by removing the offset correction from the DMA address retrieved from the Rx descriptor before using it in the Rx path.

Fixes: 8d5047cf9ca2 ("net: mvneta: Convert to be 64 bits compatible")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
 drivers/net/ethernet/marvell/mvneta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 17a904cc6a5e..0ad2f3f7da85 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1932,7 +1932,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
 		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
 		index = rx_desc - rxq->descs;
 		data = rxq->buf_virt_addr[index];
-		phys_addr = rx_desc->buf_phys_addr;
+		phys_addr = rx_desc->buf_phys_addr - pp->rx_offset_correction;
 
 		if (!mvneta_rxq_desc_is_first_last(rx_status) ||
 		    (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
--
2.17.1

^ permalink raw reply related

* Re: [PATCH 0/4] docs: e100[0] fix build errors
From: Randy Dunlap @ 2018-06-22 20:22 UTC (permalink / raw)
  To: Tobin C. Harding, Jonathan Corbet
  Cc: Jeff Kirsher, David S. Miller, linux-doc, netdev, linux-kernel
In-Reply-To: <20180622003708.31848-1-me@tobin.cc>

Hi Tobin,

On 06/21/2018 05:37 PM, Tobin C. Harding wrote:
> Hi Jonathan,
> 
> This patch set fixes current docs build failure on Linus' mainline
> 
> commit: (ba4dbdedd3ed Merge tag 'jfs-4.18' of git://github.com/kleikamp/linux-shaggy)
> 
> (FYI this is 8 commits after Linux 4.18-rc1).
> 
> And also same build errors on today's linux-next
> 
> 8439c34f07a3 (tag: next-20180621, linux-next/master, linux-next) Add linux-next specific files for 20180621
> 
> 
> I split the patches in between the two drivers to enable use of the
> 'Fixes:' tag.
> 
> Tobin C. Harding (4):
>   Documentation: e100: Use correct heading adornment
>   Documentation: e1000: Use correct heading adornment
>   Documentation: e100: Fix docs build error
>   Documentation: e1000: Fix docs build error
> 
>  Documentation/networking/e100.rst  | 112 +++++++++++++++--------------
>  Documentation/networking/e1000.rst |  76 ++++++++++----------
>  2 files changed, 96 insertions(+), 92 deletions(-)

I am still seeing a few warnings (your 4 patches applied to
linux-next-20180622):

linux-next-20180622/Documentation/networking/e100.rst:57: WARNING: Literal block expected; none found.
linux-next-20180622/Documentation/networking/e100.rst:68: WARNING: Literal block expected; none found.
linux-next-20180622/Documentation/networking/e100.rst:75: WARNING: Literal block expected; none found.
linux-next-20180622/Documentation/networking/e100.rst:84: WARNING: Literal block expected; none found.
linux-next-20180622/Documentation/networking/e100.rst:93: WARNING: Inline emphasis start-string without end-string.
linux-next-20180622/Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation.
linux-next-20180622/Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a blank line; unexpected unindent.
linux-next-20180622/Documentation/networking/e1000.rst:173: WARNING: Definition list ends without a blank line; unexpected unindent.
linux-next-20180622/Documentation/networking/e1000.rst:236: WARNING: Definition list ends without a blank line; unexpected unindent.

You didn't get these warnings?  or they weren't important?
or "perfect" was not your primary goal?   :)   [which is fine]

[I see around 50 similar doc formatting warnings/errors in the entire
Documentation build.]

Anyway, much better than it was.  Thanks.

Tested-by: Randy Dunlap <rdunlap@infradead.org>

-- 
~Randy

^ permalink raw reply

* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-22 20:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180622214259-mutt-send-email-mst@kernel.org>

On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
>> On Thu, 21 Jun 2018 21:20:13 +0300
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>
>> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
>> > > OK, so what about the following:
>> > >
>> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>> > >   that we have a new uuid field in the virtio-net config space
>> > > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>> > >   offer VIRTIO_NET_F_STANDBY_UUID if set
>> > > - when configuring, set the property to the group UUID of the vfio-pci
>> > >   device
>> > > - in the guest, use the uuid from the virtio-net device's config space
>> > >   if applicable; else, fall back to matching by MAC as done today
>> > >
>> > > That should work for all virtio transports.
>> >
>> > True. I'm a bit unhappy that it's virtio net specific though
>> > since down the road I expect we'll have a very similar feature
>> > for scsi (and maybe others).
>> >
>> > But we do not have a way to have fields that are portable
>> > both across devices and transports, and I think it would
>> > be a useful addition. How would this work though? Any idea?
>>
>> Can we introduce some kind of device-independent config space area?
>> Pushing back the device-specific config space by a certain value if the
>> appropriate feature is negotiated and use that for things like the uuid?
>
> So config moves back and forth?
> Reminds me of the msi vector mess we had with pci.
> I'd rather have every transport add a new config.
>
>> But regardless of that, I'm not sure whether extending this approach to
>> other device types is the way to go. Tying together two different
>> devices is creating complicated situations at least in the hypervisor
>> (even if it's fairly straightforward in the guest). [I have not come
>> around again to look at the "how to handle visibility in QEMU"
>> questions due to lack of cycles, sorry about that.]
>>
>> So, what's the goal of this approach? Only to allow migration with
>> vfio-pci, or also to plug in a faster device and use it instead of an
>> already attached paravirtualized device?
>
> These are two sides of the same coin, I think the second approach
> is closer to what we are doing here.

I'm leaning towards being conservative to keep the potential of doing
both. It's the vfio migration itself that has to be addessed, not to
make every virtio device to have an accelerated datapath, right?

-Siwei


>
>> What about migration of vfio devices that are not easily replaced by a
>> paravirtualized device? I'm thinking of vfio-ccw, where our main (and
>> currently only) supported device is dasd (disks) -- which can do a lot
>> of specialized things that virtio-blk does not support (and should not
>> or even cannot support).
>
> But maybe virtio-scsi can?
>
>> Would it be more helpful to focus on generic
>> migration support for vfio instead of going about it device by device?
>>
>> This network device approach already seems far along, so it makes sense
>> to continue with it. But I'm not sure whether we want to spend time and
>> energy on that for other device types rather than working on a general
>> solution for vfio migration.
>
> I'm inclined to say finalizing this feature would be a good first step
> and will teach us how we can move forward.
>
> --
> MST

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-22 20:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, Cornelia Huck, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <CADGSJ23SBEFJCD50K-F-Gt86Q1-i_6iHTmSSjdXDzNnnpWZ_oQ@mail.gmail.com>

On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
> On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
>>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>>> > On Wed, 20 Jun 2018 22:48:58 +0300
>>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>> >
>>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
>>> >>
>>> >> It's mostly so we can have e.g. multiple devices with same MAC
>>> >> (which some people seem to want in order to then use
>>> >> then with different containers).
>>> >>
>>> >> But it is also handy for when you assign a PF, since then you
>>> >> can't set the MAC.
>>> >>
>>> >
>>> > OK, so what about the following:
>>> >
>>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>>> >   that we have a new uuid field in the virtio-net config space
>>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
>>> > - when configuring, set the property to the group UUID of the vfio-pci
>>> >   device
>>>
>>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
>>> to still expose UUID in the config space on virtio-pci?
>>
>>
>> Yes but guest is not supposed to read it.
>>
>>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
>>> where the corresponding vfio-pci device attached to for a guest which
>>> doesn't support the feature (legacy).
>>>
>>> -Siwei
>>
>> Yes but you won't add the primary behind such a bridge.
>
> I assume the UUID feature is a new one besides the exiting
> VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
> if UUID feature is present and supported by guest, we'll do pairing
> based on UUID; if UUID feature present
                                 ^^^^^^^  is NOT present

> or not supported by guest,
> we'll still plug in the VF (if guest supports the _F_STANDBY feature)
> but the pairing will be fallback to using MAC address. Are you saying
> you don't even want to plug in the primary when the UUID feature is
> not supported by guest? Does the feature negotiation UUID have to
> depend on STANDBY being set, or the other way around? I thought that
> just the absence of STANDBY will prevent primary to get exposed to the
> guest.
>
> -Siwei
>
>>
>>>
>>> > - in the guest, use the uuid from the virtio-net device's config space
>>> >   if applicable; else, fall back to matching by MAC as done today
>>> >
>>> > That should work for all virtio transports.

^ permalink raw reply

* [PATCH] selftests: bpf: notification about privilege required to run test_lwt_seg6local.sh testing script
From: Jeffrin Jose T @ 2018-06-22 20:00 UTC (permalink / raw)
  To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Jeffrin Jose T

This test_lwt_seg6local.sh will execute with root privilege.

This patch is atleast used to notify the user about the privilege
required related to smooth execution of the script.

Signed-off-by: Jeffrin Jose T (Rajagiri SET) <ahiliation@gmail.com>
---
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
index 1c77994b5e71..30575577a8b2 100755
--- a/tools/testing/selftests/bpf/test_lwt_seg6local.sh
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
@@ -21,6 +21,15 @@
 # An UDP datagram is sent from fb00::1 to fb00::6. The test succeeds if this
 # datagram can be read on NS6 when binding to fb00::6.
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+msg="skip all tests:"
+if [ $UID != 0 ]; then
+    echo $msg please run this as root >&2
+    exit $ksft_skip
+fi
+
 TMP_FILE="/tmp/selftest_lwt_seg6local.txt"
 
 cleanup()
-- 
2.17.0

^ permalink raw reply related

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-22 20:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, Cornelia Huck, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <20180622053141-mutt-send-email-mst@kernel.org>

On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>> > On Wed, 20 Jun 2018 22:48:58 +0300
>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >
>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
>> >>
>> >> It's mostly so we can have e.g. multiple devices with same MAC
>> >> (which some people seem to want in order to then use
>> >> then with different containers).
>> >>
>> >> But it is also handy for when you assign a PF, since then you
>> >> can't set the MAC.
>> >>
>> >
>> > OK, so what about the following:
>> >
>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>> >   that we have a new uuid field in the virtio-net config space
>> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
>> >   offer VIRTIO_NET_F_STANDBY_UUID if set
>> > - when configuring, set the property to the group UUID of the vfio-pci
>> >   device
>>
>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
>> to still expose UUID in the config space on virtio-pci?
>
>
> Yes but guest is not supposed to read it.
>
>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
>> where the corresponding vfio-pci device attached to for a guest which
>> doesn't support the feature (legacy).
>>
>> -Siwei
>
> Yes but you won't add the primary behind such a bridge.

I assume the UUID feature is a new one besides the exiting
VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
if UUID feature is present and supported by guest, we'll do pairing
based on UUID; if UUID feature present or not supported by guest,
we'll still plug in the VF (if guest supports the _F_STANDBY feature)
but the pairing will be fallback to using MAC address. Are you saying
you don't even want to plug in the primary when the UUID feature is
not supported by guest? Does the feature negotiation UUID have to
depend on STANDBY being set, or the other way around? I thought that
just the absence of STANDBY will prevent primary to get exposed to the
guest.

-Siwei

>
>>
>> > - in the guest, use the uuid from the virtio-net device's config space
>> >   if applicable; else, fall back to matching by MAC as done today
>> >
>> > That should work for all virtio transports.

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22 19:05 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180622170955.298900c1.cohuck@redhat.com>

On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
> On Thu, 21 Jun 2018 21:20:13 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> > > OK, so what about the following:
> > > 
> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> > >   that we have a new uuid field in the virtio-net config space
> > > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> > >   offer VIRTIO_NET_F_STANDBY_UUID if set
> > > - when configuring, set the property to the group UUID of the vfio-pci
> > >   device
> > > - in the guest, use the uuid from the virtio-net device's config space
> > >   if applicable; else, fall back to matching by MAC as done today
> > > 
> > > That should work for all virtio transports.  
> > 
> > True. I'm a bit unhappy that it's virtio net specific though
> > since down the road I expect we'll have a very similar feature
> > for scsi (and maybe others).
> > 
> > But we do not have a way to have fields that are portable
> > both across devices and transports, and I think it would
> > be a useful addition. How would this work though? Any idea?
> 
> Can we introduce some kind of device-independent config space area?
> Pushing back the device-specific config space by a certain value if the
> appropriate feature is negotiated and use that for things like the uuid?

So config moves back and forth?
Reminds me of the msi vector mess we had with pci.
I'd rather have every transport add a new config.

> But regardless of that, I'm not sure whether extending this approach to
> other device types is the way to go. Tying together two different
> devices is creating complicated situations at least in the hypervisor
> (even if it's fairly straightforward in the guest). [I have not come
> around again to look at the "how to handle visibility in QEMU"
> questions due to lack of cycles, sorry about that.]
> 
> So, what's the goal of this approach? Only to allow migration with
> vfio-pci, or also to plug in a faster device and use it instead of an
> already attached paravirtualized device?

These are two sides of the same coin, I think the second approach
is closer to what we are doing here.

> What about migration of vfio devices that are not easily replaced by a
> paravirtualized device? I'm thinking of vfio-ccw, where our main (and
> currently only) supported device is dasd (disks) -- which can do a lot
> of specialized things that virtio-blk does not support (and should not
> or even cannot support).

But maybe virtio-scsi can?

> Would it be more helpful to focus on generic
> migration support for vfio instead of going about it device by device?
> 
> This network device approach already seems far along, so it makes sense
> to continue with it. But I'm not sure whether we want to spend time and
> energy on that for other device types rather than working on a general
> solution for vfio migration.

I'm inclined to say finalizing this feature would be a good first step
and will teach us how we can move forward.

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: bridge: fix potential null pointer dereference on return from br_port_get_rtnl()
From: Garry McNulty @ 2018-06-22 19:05 UTC (permalink / raw)
  To: nikolay
  Cc: davem, netdev, stephen, Jiří Pírko, bridge,
	linux-kernel
In-Reply-To: <e4fb2883-1643-a921-a138-d1255831a9bd@cumulusnetworks.com>

On Fri, 22 Jun 2018 at 00:35, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
>
> On 06/22/2018 01:20 AM, David Miller wrote:
> > From: Garry McNulty <garrmcnu@gmail.com>
> > Date: Thu, 21 Jun 2018 21:14:27 +0100
> >
> >> br_port_get_rtnl() can return NULL if the network device is not a bridge
> >> port (IFF_BRIDGE_PORT flag not set). br_port_slave_changelink() and
> >> br_port_fill_slave_info() callbacks dereference this pointer without
> >> checking. Currently this is not a problem because slave devices always
> >> set this flag. Add null check in case these conditions ever changye.
> >>
> >> Detected by CoverityScan, CID 1339613 ("Dereference null return value")
> >>
> >> Signed-off-by: Garry McNulty <garrmcnu@gmail.com>
> >
> > I don't think this is reasonable.
> >
> > The bridge code will never, ever, install a slave that doesn't have
> > that bit set.  It's the most fundamental aspect of how these objects
> > are managed.
> >
> +1
>
> This keeps coming up, here's the previous one:
> https://patchwork.ozlabs.org/patch/896046/
>
> Please do a more thorough check if these conditions can actually occur.
> In this case, as Dave said, they cannot.
>
> To be explicit as with the patch I mentioned above:
> Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
> You can find more info in my reply to the patch above.
>
> Thanks,
>  Nik

Thanks for reviewing and for the feedback.

Regards

Garry

^ permalink raw reply

* [PATCH bpf] nfp: bpf: don't stop offload if replace failed
From: Jakub Kicinski @ 2018-06-22 18:56 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski

Stopping offload completely if replace of program failed dates
back to days of transparent offload.  Back then we wanted to
silently fall back to the in-driver processing.  Today we mark
programs for offload when they are loaded into the kernel, so
the transparent offload is no longer a reality.

Flags check in the driver will only allow replace of a driver
program with another driver program or an offload program with
another offload program.

When driver program is replaced stopping offload is a no-op,
because driver program isn't offloaded.  When replacing
offloaded program if the offload fails the entire operation
will fail all the way back to user space and we should continue
using the old program.  IOW when replacing a driver program
stopping offload is unnecessary and when replacing offloaded
program - it's a bug, old program should continue to run.

In practice this bug would mean that if offload operation was to
fail (either due to FW communication error, kernel OOM or new
program being offloaded but for a different netdev) driver
would continue reporting that previous XDP program is offloaded
but in fact no program will be loaded in hardware.  The failure
is fairly unlikely (found by inspection, when working on the code)
but it's unpleasant.

Backport note: even though the bug was introduced in commit
cafa92ac2553 ("nfp: bpf: add support for XDP_FLAGS_HW_MODE"),
this fix depends on commit 441a33031fe5 ("net: xdp: don't allow
device-bound programs in driver mode"), so this fix is sufficient
only in v4.15 or newer.  Kernels v4.13.x and v4.14.x do need to
stop offload if it was transparent/opportunistic, i.e. if
XDP_FLAGS_HW_MODE was not set on running program.

Fixes: cafa92ac2553 ("nfp: bpf: add support for XDP_FLAGS_HW_MODE")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index fcdfb8e7fdea..7184af94aa53 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -81,10 +81,10 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn,

 	ret = nfp_net_bpf_offload(nn, prog, running, extack);
 	/* Stop offload if replace not possible */
-	if (ret && prog)
-		nfp_bpf_xdp_offload(app, nn, NULL, extack);
+	if (ret)
+		return ret;

-	nn->dp.bpf_offload_xdp = prog && !ret;
+	nn->dp.bpf_offload_xdp = !!prog;
 	return ret;
 }

-- 
2.17.1

^ permalink raw reply related

* [PATCH] selftests: bpf: notification about privilege required to run test_lirc_mode2.sh testing script
From: Jeffrin Jose T @ 2018-06-22 18:54 UTC (permalink / raw)
  To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Jeffrin Jose T

The test_lirc_mode2.sh script require root privilege for the successful
execution of the test.

This patch is to notify the user about the privilege the script
demands for the successful execution of the test.

Signed-off-by: Jeffrin Jose T (Rajagiri SET) <ahiliation@gmail.com>
---
 tools/testing/selftests/bpf/test_lirc_mode2.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_lirc_mode2.sh b/tools/testing/selftests/bpf/test_lirc_mode2.sh
index ce2e15e4f976..51184f8f9e64 100755
--- a/tools/testing/selftests/bpf/test_lirc_mode2.sh
+++ b/tools/testing/selftests/bpf/test_lirc_mode2.sh
@@ -1,6 +1,15 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+msg="skip all tests:"
+if [ $UID != 0 ]; then
+    echo $msg please run this as root >&2
+    exit $ksft_skip
+fi
+
 GREEN='\033[0;92m'
 RED='\033[0;31m'
 NC='\033[0m' # No Color
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 18:44 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Okash Khawaja, Daniel Borkmann, Martin KaFai Lau,
	Alexei Starovoitov, Yonghong Song, David S. Miller, netdev,
	kernel-team, linux-kernel
In-Reply-To: <a598aea7-e967-1779-5092-937565a69c0b@netronome.com>

On Fri, 22 Jun 2018 11:39:13 +0100, Quentin Monnet wrote:
> 2018-06-22 11:24 UTC+0100 ~ Okash Khawaja <osk@fb.com>
> > On Thu, Jun 21, 2018 at 11:42:59AM +0100, Quentin Monnet wrote:  
> >> Hi Okash,  
> > hi and sorry about delay in responding. the email got routed to
> > incorrect folder.  
> >>
> >> 2018-06-20 13:30 UTC-0700 ~ Okash Khawaja <osk@fb.com>  
>  [...]  
> >>
> >> Hexadecimal values, without quotes, are not valid JSON. Please stick to
> >> decimal values.  
> > ah sorry, i used a buggy json validator. this should be a quick fix.
> > which would be better:  pointers be output hex strings or integers?  
> 
> I would go for integers. Although this is harder to read for humans, it
> is easier to process for machines, which remain the primary targets for
> JSON output.

+1

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 18:43 UTC (permalink / raw)
  To: Okash Khawaja
  Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov,
	Yonghong Song, Quentin Monnet, David S. Miller, netdev,
	kernel-team, linux-kernel
In-Reply-To: <20180622111751.GB3050@w1t1fb>

On Fri, 22 Jun 2018 12:17:52 +0100, Okash Khawaja wrote:
> > > > > > > The name XYZ_struct may not be the best, perhaps you can come up with a
> > > > > > > better one?  
> > > > > > > 
> > > > > > > Does that make sense?  Am I missing what you're doing here?
> > > > > > > 
> > > > > > > One process note - please make sure you run checkpatch.pl --strict on
> > > > > > > bpftool patches before posting.
> > > > > > > 
> > > > > > > Thanks for working on this!      
> > >   
> 
> Hi,
> 
> While I agree on the point of backward compatibility, I think printing
> two overlapping pieces of information side-by-side will make the
> interface less clear. Having separate outputs for the two will keep the
> interface clear and readable.
> 
> Is there a major downside to adding a new flag for BTF output?

plain output and JSON should be more or less equivalent, just formatted
differently.  If we hide the BTF-ed JSON under some flag most users
will just run bpftool -jp map dump id X, see BTF is not there and move
on, without ever looking into the man page to discover there is a magic
switch...

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Jakub Kicinski @ 2018-06-22 18:40 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Okash Khawaja, Daniel Borkmann, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, David S. Miller, netdev, kernel-team,
	linux-kernel
In-Reply-To: <20180622012052.htkvholi674x6i4f@kafai-mbp.dhcp.thefacebook.com>

On Thu, 21 Jun 2018 18:20:52 -0700, Martin KaFai Lau wrote:
> On Thu, Jun 21, 2018 at 05:25:23PM -0700, Jakub Kicinski wrote:
> > On Thu, 21 Jun 2018 16:58:15 -0700, Martin KaFai Lau wrote:  
> > > On Thu, Jun 21, 2018 at 04:07:19PM -0700, Jakub Kicinski wrote:  
> > > > On Thu, 21 Jun 2018 15:51:17 -0700, Martin KaFai Lau wrote:    
> > > > > On Thu, Jun 21, 2018 at 02:59:35PM -0700, Jakub Kicinski wrote:    
> > > > > > On Wed, 20 Jun 2018 13:30:53 -0700, Okash Khawaja wrote:      
> > > > > > > $ sudo bpftool map dump -p id 14
> > > > > > > [{
> > > > > > >         "key": 0
> > > > > > >     },{
> > > > > > >         "value": {
> > > > > > >             "m": 1,
> > > > > > >             "n": 2,
> > > > > > >             "o": "c",
> > > > > > >             "p": [15,16,17,18,15,16,17,18
> > > > > > >             ],
> > > > > > >             "q": [[25,26,27,28,25,26,27,28
> > > > > > >                 ],[35,36,37,38,35,36,37,38
> > > > > > >                 ],[45,46,47,48,45,46,47,48
> > > > > > >                 ],[55,56,57,58,55,56,57,58
> > > > > > >                 ]
> > > > > > >             ],
> > > > > > >             "r": 1,
> > > > > > >             "s": 0x7ffff6f70568,
> > > > > > >             "t": {
> > > > > > >                 "x": 5,
> > > > > > >                 "y": 10
> > > > > > >             },
> > > > > > >             "u": 100,
> > > > > > >             "v": 20,
> > > > > > >             "w1": 0x7,
> > > > > > >             "w2": 0x3
> > > > > > >         }
> > > > > > >     }
> > > > > > > ]      
> > > > > > 
> > > > > > I don't think this format is okay, JSON output is an API you shouldn't
> > > > > > break.  You can change the non-JSON output whatever way you like, but
> > > > > > JSON must remain backwards compatible.
> > > > > > 
> > > > > > The dump today has object per entry, e.g.:
> > > > > > 
> > > > > > {
> > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > >         ],
> > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > >         ]
> > > > > > }
> > > > > > 
> > > > > > This format must remain, you may only augment it with new fields.  E.g.:
> > > > > > 
> > > > > > {
> > > > > >         "key":["0x00","0x00","0x00","0x00",
> > > > > >         ],
> > > > > > 	"key_struct":{
> > > > > > 		"index":0
> > > > > > 	},  
> Got a few questions.
> 
> When we support hashtab later, the key could be int
> but reusing the name as "index" is weird.

Ugh, yes, naturally.  I just typed that out without thinking, so for
array maps there is usually no BTF info?...  For hashes obviously we
should just use the BTF, I'm not sure we should format indexes for
arrays nicely or not :S

> The key could also be a struct (e.g. a struct to describe ip:port).
> Can you suggest how the "key_struct" will look like?

Hm.  I think in my mind it has only been a struct but that's not true :S
So the struct in the name makes very limited sense now.

Should we do:
"formatted" : {
	"value" : XXX
}

Where
XXX == plain int for integers, e.g. "value":0
XXX == array for arrays, e.g. "value":[1,2,3,4]
XXX == object for objects, e.g. "value":{"field":XXX, "field2":XXX}

> > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > >         ],
> > > > > > 	"value_struct":{
> > > > > > 		"src_ip":2,  
> If for the same map the user changes the "src_ip" to an array of int[4]
> later (e.g. to support ipv6), it will become "src_ip": [1, 2, 3, 4].
> Is it breaking backward compat?
> i.e.
> struct five_tuples {
> -	int src_ip;
> +	int src_ip[4];
> /* ... */
> };

Well, it is breaking backward compat, but it's the program doing it,
not bpftool :)  BTF changes so does the output.
 
> > > > > > 		"dst_ip:0
> > > > > > 	}
> > > > > > }      
> > > > > I am not sure how useful to have both "key|value" and "(key|value)_struct"
> > > > > while most people would prefer "key_struct"/"value_struct" if it is
> > > > > available.    
> > > > 
> > > > Agreed, it's not that useful, especially with the string-hex debacle :(
> > > > It's just about the backwards compat.
> > > >     
> > > > > How about introducing a new option, like "-b", to print the
> > > > > map with BTF (if available) such that it won't break the existing
> > > > > one (-j or -p) while the "-b" output can keep using the "key"
> > > > > and "value".
> > > > > 
> > > > > The existing json can be kept as is.    
> > > > 
> > > > That was my knee jerk reaction too, but on reflection it doesn't sound
> > > > that great.  We expect people with new-enough bpftool to use btf, so it
> > > > should be available in the default output, without hiding it behind a
> > > > switch.  We could add a switch to hide the old output, but that doesn't
> > > > give us back the names...  What about Key and Value or k and v?  Or
> > > > key_fields and value_fields?    
> > > I thought the current default output is "plain" ;)
> > > Having said that, yes, the btf is currently printed in json.
> > > 
> > > Ideally, the default json output should do what most people want:
> > > print btf and btf only (if it is available).
> > > but I don't see a way out without new option if we need to
> > > be backward compat :(
> > > 
> > > Agree that showing the btf in the existing json output will be useful (e.g.
> > > to hint people that BTF is available).  If btf is showing in old json,
> > > also agree that the names should be the same with the new json.
> > > key_fields and value_fields may hint it has >1 fields though.
> > > May be "formatted_key" and "formatted_value"?  
> > 
> > SGTM!  Or even maybe as a "formatted" object?:
> > 
> > {
> >          "key":["0x00","0x00","0x00","0x00",
> >          ],
> >          "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> >          ],
> > 	"formatted":{
> > 	 	"key":{
> >  			"index":0
> > 	 	},
> > 	 	"value":{
> >  			"src_ip":2,
> >  			"dst_ip:0
> > 	 	}
> > 	}  
> hmm... that is an extra indentation (keep in mind that the "value" could
> already have a few nested structs which itself consumes a few indentations)
> but I guess adding another one may be ok-ish.

I'm not fussed about this, whatever JSON writer does by default is fine
with me, really.

> > > > > > The name XYZ_struct may not be the best, perhaps you can come up with a
> > > > > > better one?  
> > > > > > 
> > > > > > Does that make sense?  Am I missing what you're doing here?
> > > > > > 
> > > > > > One process note - please make sure you run checkpatch.pl --strict on
> > > > > > bpftool patches before posting.
> > > > > > 
> > > > > > Thanks for working on this!      
> >   

^ permalink raw reply

* Re: bnx2x: kernel panic in the bnx2x driver
From: Vishwanath Pai @ 2018-06-22 18:27 UTC (permalink / raw)
  To: Sudarsana.Kalluru, Ariel.Elior, Dept-EngEverestLinuxL2
  Cc: davem, netdev, dbanerje, pai.vishwain

The patch below worked for me (on 4.14.51 LTS kernel):

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 1e33abd..2b3863a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -3387,14 +3387,20 @@ static int bnx2x_set_rss_flags(struct bnx2x *bp, struct ethtool_rxnfc *info)
 			DP(BNX2X_MSG_ETHTOOL,
 			   "rss re-configured, UDP 4-tupple %s\n",
 			   udp_rss_requested ? "enabled" : "disabled");
-			return bnx2x_rss(bp, &bp->rss_conf_obj, false, true);
+			if (bp->state == BNX2X_STATE_OPEN)
+				return bnx2x_rss(bp, &bp->rss_conf_obj, false, true);
+			else
+				return 0;
 		} else if ((info->flow_type == UDP_V6_FLOW) &&
 			   (bp->rss_conf_obj.udp_rss_v6 != udp_rss_requested)) {
 			bp->rss_conf_obj.udp_rss_v6 = udp_rss_requested;
 			DP(BNX2X_MSG_ETHTOOL,
 			   "rss re-configured, UDP 4-tupple %s\n",
 			   udp_rss_requested ? "enabled" : "disabled");
-			return bnx2x_rss(bp, &bp->rss_conf_obj, false, true);
+			if (bp->state == BNX2X_STATE_OPEN)
+				return bnx2x_rss(bp, &bp->rss_conf_obj, false, true);
+			else
+				return 0;
 		}
 		return 0;
 
Although I think there might be another place where we may need to fix this as well:
bnx2x_config_rss_eth()

Thanks,
Vishwanath

On 06/22/2018 10:57 AM, Vishwanath Pai wrote:
> Ah, that is great! I will test it out on my machine and let you know.
>
> Thanks,
> Vishwanath
>
> On 06/22/2018 10:21 AM, Kalluru, Sudarsana wrote:
>> Hi Vishwanath,
>>     The config will be cached in the device structure (bp->rss_conf_obj.udp_rss_v4) in this scenario, and will be applied in the load path (bnx2x_nic_load() --> bnx2x_init_rss()). Have unit tested the change on my setup.
>>
>> Thanks,
>> Sudarsana
>>
>> -----Original Message-----
>> From: Vishwanath Pai [mailto:vpai@akamai.com] 
>> Sent: 22 June 2018 18:52
>> To: Kalluru, Sudarsana <Sudarsana.Kalluru@cavium.com>; Elior, Ariel <Ariel.Elior@cavium.com>; Dept-Eng Everest Linux L2 <Dept-EngEverestLinuxL2@cavium.com>
>> Cc: davem@davemloft.net; netdev@vger.kernel.org; dbanerje@akamai.com; pai.vishwain@gmail.com
>> Subject: Re: bnx2x: kernel panic in the bnx2x driver
>>
>> Hi Sudarsana,
>>
>> Thanks for taking a look at my email. The fix you suggested would definitely fix the kernel panic, but at the same time wouldn't it also silently ignore the request by ethtool to set rx-flow-hash?
>>
>> Thanks,
>> Vishwanath
>>
>> On 06/22/2018 06:20 AM, Kalluru, Sudarsana wrote:
>>> Hi Vishwanath,
>>>     Thanks for your mail, and the analysis.
>>> The fix would be to invoke bnx2x_rss() only when the device is opened,
>>>       if (bp->state == BNX2X_STATE_OPEN)
>>>               return bnx2x_rss(bp, &bp->rss_conf_obj, false, true);
>>>       else
>>>               return 0;
>>> Ariel,
>>>    Could you please review the path (bnx2x_set_rss_flags()--> bnx2x_rss()) and confirm/correct on the above.
>>>
>>> Thanks,
>>> Sudarsana
>>>
>>> -----Original Message-----
>>> From: Vishwanath Pai [mailto:vpai@akamai.com]
>>> Sent: 22 June 2018 10:37
>>> To: Elior, Ariel <Ariel.Elior@cavium.com>; Dept-Eng Everest Linux L2 
>>> <Dept-EngEverestLinuxL2@cavium.com>
>>> Cc: davem@davemloft.net; netdev@vger.kernel.org; dbanerje@akamai.com; 
>>> pai.vishwain@gmail.com
>>> Subject: bnx2x: kernel panic in the bnx2x driver
>>>
>>> External Email
>>>
>>> Hi,
>>>
>>> We recently noticed a kernel panic in the bnx2x driver when trying to 
>>> set rx-flow-hash parameters via ethtool during if-pre-up.d. I am 
>>> running kernel
>>> v4.17.2 from ubuntu-mainline-ppa. I have added the stack trace below:
>>>
>>> [   18.280209] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>>> [   18.280212] PGD 8000000407a79067 P4D 8000000407a79067 PUD 40ce8a067 PMD 0
>>> [   18.280214] Oops: 0010 [#1] SMP PTI
>>> [   18.280215] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel joydev input_led kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc hid_eneric aesni_intel gpio_ich aes_x86_64 usbhid lpc_ich crpto_simd ie31200_edac cryptd glue_helper intel_cstate mac_hid intel_rapl_perf bnx2x mdio tcp_bbr netconsole ipmi_devintf ipmi_msghandler i2c_i801 coretemp autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear sha26_mb mcryptd sha256_ssse3 hid ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt mpt3sas fb_sys_fops drm raid_class scsi_transport_sas ahci libahci shpchp video
>>> [   18.280241] CPU: 6 PID: 1081 Comm: ethtool Not tainted 4.17.2-041702-generic #201806160433
>>> [   18.280242] Hardware name: Foxconn CangJie/CangJie, BIOS CC1F108D 02/26/2014
>>> [   18.280243] RIP: 0010:          (null)
>>> [   18.280243] RSP: 0018:ffffb84bc260b9c0 EFLAGS: 00010246
>>> [   18.280244] RAX: 0000000000000000 RBX: ffff92f987f020f0 RCX: 0000000000000000
>>> [   18.280245] RDX: 0000000000000000 RSI: ffffb84bc260b9f8 RDI: ffff92f987f020f0
>>> [   18.280245] RBP: ffffb8bc260b9e8 R08: 0000000000000001 R09: 0000000000000000
>>> [   18.280246] R10: ffffb84bc260bd20 R11: 0000000000000000 R12: ffffb84bc260b9f8
>>> [   18.280246] R13: ffff92f987f008c0 R14: 00007ffdb75bec40 R15: 0000000000000000
>>> [   18.280247] FS:  00007fc0e8798700(0000) GS:ffff92f99fd80000(0000) knlGS:0000000000000000
>>> [   18.280248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   18.280249] CR2: 0000000000000000 CR3: 0000000409b4c003 CR4: 00000000001606e0
>>> [   18.280249] Call Trace:
>>> [   18.280263]  ? bnx2x_config_rss+0x2f/0xd0 [bnx2x]
>>> [   18.280270]  bnx2x_rss+0x1d9/0x210 [bnx2x]
>>> [   18.280276]  bnx2x_set_rxnfc+0x17d/0x380 [bnx2x]
>>> [   18.280279]  ethtool_set_rxnfc+0x9b/0x110
>>> [   18.280281]  ? __do_page_cache_readahead+0x1da/0x2c0
>>> [   18.280283]  ? security_capable+0x3c/0x60
>>> [   18.280284]  dev_ethtool+0350/0x2610
>>> [   18.280286]  ? page_cache_async_readahead+0x71/0x80
>>> [   18.280288]  ? page_add_file_rmap+0x5d/0x220
>>> [   18.280290]  ? inet_ioctl+0x182/0x1a0
>>> [   18.280291]  dev_ioctl+0x203/0x3f0
>>> [   18.280293]  ? dev_ioctl+0x203/0x3f0
>>> [   18.280294]  sock_do_ioctl+0xae/0x150
>>> [   18.280296]  sock_ioctl+0x1e2/0x330
>>> [   18.280296]  ? sock_ioctl+0x1e2/0x330
>>> [   18.280299]  do_vfs_ioctl+0xa8/0x620
>>> [   18.280300]  ? dlci_ioctl_set+0x30/0x30
>>> [   18.280301]  ? do_vfs_ioctl+0xa8/0x620
>>> [   18.280302]  ? handle_mm_fault+0xe3/0x220
>>> [   18.280304]  ksys_ioctl+0x75/0x80
>>> [   18.280305]  __x64_sys_ioctl+0x1a/0x20
>>> [   18.280307]  do_syscall_64+0x5a/0x120
>>> [   18.280309]  entry_SYSCALL_64_aftr_hwframe+0x44/0xa9
>>> [   18.280310] RIP: 0033:0x7fc0e7fba107
>>> [   18.280311] RSP: 002b:00007ffdb75beb78 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
>>> [   18.280312] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc0e7fba107
>>> [   18.280312] RDX: 00007ffdb75bed60 RSI: 0000000000008946 RDI: 0000000000000003
>>> [   18.280313] RBP: 00007ffdb75bed50 R08: 00007ffdb75bed60 R09: 0000000000000001
>>> [   18.280313] R10: 0000000000000541 R11: 0000000000000206 R12: 00007ffdb75beed0
>>> [   18.280314] R13: 0000000000421020 R14: 000000000041fe28 R15: 0000000000000003
>>> [   18.280315] Code:  Bad RIP value.
>>> [   18.280317] RIP:           (null) RSP: ffffb84bc260b9c0
>>> [  18.280318] CR2: 0000000000000000
>>> [   18.280319] ---[ end trace 5f361db3fb9059f1 ]---
>>>
>>> To reproduce this I created a bash script in "/etc/network/if-pre-up.d/" with these two lines:
>>> ethtool -N $IFACE rx-flow-hash udp4 "sdfn"
>>> ethtool -N $IFACE rx-flow-hash udp6 "sdfn"
>>>
>>> The problem here is that rss_obj in bnx2x struct for the device hasn't 
>>> been initialized yet, which causes an exception in bnx2x_config_rss() 
>>> when calling "r->set_pending(r)" because r->set_pending is NULL. It 
>>> looks like a lot many things haven't been initialized at this point, 
>>> most of that happens in this
>>> function: "bnx2x_init_bp_objs()" which isn't called until ifup. Any thoughts on how this can be fixed? Would it be possible to safely move bnx2x_init_bp_objs() to maybe bnx2x_init_one() which runs much before ifup?
>>>
>>> Thanks,
>>> Vishwanath
>>>
>

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox