Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: KASAN: use-after-free Read in __wake_up_common_lock
From: Eric Dumazet @ 2019-02-06  3:42 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot, Eric Dumazet
  Cc: Karsten Keil, LKML, netdev, syzkaller-bugs
In-Reply-To: <CACT4Y+YOTSY=ZVw882tfDDPZ6Y5ZSX6HHn2M=Q2uTxjUtjb9=w@mail.gmail.com>



On 02/05/2019 07:28 PM, Dmitry Vyukov wrote:
> On Wed, Jan 30, 2019 at 10:02 PM syzbot
> <syzbot+fb065bc06d3d4054be6f@syzkaller.appspotmail.com> wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    62967898789d Merge git://git.kernel.org/pub/scm/linux/kern..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=10f0bf08c00000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
>> dashboard link: https://syzkaller.appspot.com/bug?extid=fb065bc06d3d4054be6f
>> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+fb065bc06d3d4054be6f@syzkaller.appspotmail.com
> 
> I assume this is also fixed by:
> 
> #syz fix: mISDN: fix a race in dev_expire_timer()

Yes this looks very probable, thanks.


^ permalink raw reply

* Re: Pull patches from tip/perf/core to bpf-next
From: Song Liu @ 2019-02-06  3:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, Kernel Team, Netdev
In-Reply-To: <20190206033600.6terh4xgozy5r3hs@ast-mbp>



> On Feb 5, 2019, at 7:36 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Feb 05, 2019 at 10:47:06PM +0000, Song Liu wrote:
>> Hi Alexei and Daniel, 
>> 
>> The following patches are required for BPF introspection in perf tools. 
>> Please pull them to bpf-next, so that we get all the dependencies in one
>> tree. 
>> 
>> Thanks,
>> Song
>> 
>> (from 1/10 to 10/10)
>> 76193a94522f perf, bpf: Introduce PERF_RECORD_KSYMBOL
>> d764ac646491 tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
>> 6ee52e2a3fe4 perf, bpf: Introduce PERF_RECORD_BPF_EVENT
>> df063c83aa2c tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
>> 9aa0bfa370b2 perf tools: Handle PERF_RECORD_KSYMBOL
>> 45178a928a4b perf tools: Handle PERF_RECORD_BPF_EVENT
>> 7b612e291a5a perf tools: Synthesize PERF_RECORD_* for loaded BPF programs
>> a40b95bcd30c perf top: Synthesize BPF events for pre-existing loaded BPF programs
>> 6934058d9fb6 bpf: Add module name [bpf] to ksymbols for bpf programs
>> 811184fb6977 perf bpf: Fix synthesized PERF_RECORD_KSYMBOL/BPF_EVENT
> 
> yes. we can certainly do that.
> Do you have bpf specific patches that depend on that ?
> Since it's rc5 already. Are you planning to send them within next week?

BPF introspection work depends on these patches. I have been hopping 
between perf tree and bpf-next tree. I think basing the series up on 
bpf-next plus these patches leads least conflicts. 

I do plan to send the series within next week. 

On a second thought, maybe I should send based on perf tree, and worry
about the conflicts later? It is really heavier on perf side. 

Thanks,
Song


^ permalink raw reply

* Re: Pull patches from tip/perf/core to bpf-next
From: Alexei Starovoitov @ 2019-02-06  3:47 UTC (permalink / raw)
  To: Song Liu; +Cc: Daniel Borkmann, Alexei Starovoitov, Kernel Team, Netdev
In-Reply-To: <A6EB5CAD-BF0B-4C8C-8BD8-9703DFE7E0C7@fb.com>

On Wed, Feb 06, 2019 at 03:44:29AM +0000, Song Liu wrote:
> 
> 
> > On Feb 5, 2019, at 7:36 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Tue, Feb 05, 2019 at 10:47:06PM +0000, Song Liu wrote:
> >> Hi Alexei and Daniel, 
> >> 
> >> The following patches are required for BPF introspection in perf tools. 
> >> Please pull them to bpf-next, so that we get all the dependencies in one
> >> tree. 
> >> 
> >> Thanks,
> >> Song
> >> 
> >> (from 1/10 to 10/10)
> >> 76193a94522f perf, bpf: Introduce PERF_RECORD_KSYMBOL
> >> d764ac646491 tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
> >> 6ee52e2a3fe4 perf, bpf: Introduce PERF_RECORD_BPF_EVENT
> >> df063c83aa2c tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
> >> 9aa0bfa370b2 perf tools: Handle PERF_RECORD_KSYMBOL
> >> 45178a928a4b perf tools: Handle PERF_RECORD_BPF_EVENT
> >> 7b612e291a5a perf tools: Synthesize PERF_RECORD_* for loaded BPF programs
> >> a40b95bcd30c perf top: Synthesize BPF events for pre-existing loaded BPF programs
> >> 6934058d9fb6 bpf: Add module name [bpf] to ksymbols for bpf programs
> >> 811184fb6977 perf bpf: Fix synthesized PERF_RECORD_KSYMBOL/BPF_EVENT
> > 
> > yes. we can certainly do that.
> > Do you have bpf specific patches that depend on that ?
> > Since it's rc5 already. Are you planning to send them within next week?
> 
> BPF introspection work depends on these patches. I have been hopping 
> between perf tree and bpf-next tree. I think basing the series up on 
> bpf-next plus these patches leads least conflicts. 
> 
> I do plan to send the series within next week. 
> 
> On a second thought, maybe I should send based on perf tree, and worry
> about the conflicts later? It is really heavier on perf side. 

whichever way is easier.
if bpf-next is the best use that as a base with above patches.
Once your set gets Acks from perf folks we can push above patches
from tip first and then apply your set.


^ permalink raw reply

* Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
From: Stanislav Fomichev @ 2019-02-06  3:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Willem de Bruijn, Stanislav Fomichev, Network Development,
	David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn
In-Reply-To: <20190206031215.qldeh7pfgqr3frg3@ast-mbp>

On 02/05, Alexei Starovoitov wrote:
> On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > On 02/05, Alexei Starovoitov wrote:
> > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > On 02/05, Willem de Bruijn wrote:
> > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > >
> > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > C-based flow dissector in this case.
> > > > > >
> > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > work in the skb-less case.
> > > > > >
> > > > > > The series goes like this:
> > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > >    initialize temporary skb
> > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > >    associated with an skb
> > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > >    plumb through the callers
> > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > 
> > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > for every packet in the device driver receive routine, *independent*
> > > > > from the real skb allocation and zeroing which will likely happen
> > > > > later.
> > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > 
> > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > on-stack skb).
> > > 
> > > imo extra 256 byte memset for every packet is non starter.
> > We can put pre-allocated/initialized skbs without data into percpu or even
> > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > about having multiple percpu for irq/softirq/process contexts.
> > Any concerns with that approach?
> > Any other possible concerns with the overall series?
> 
> I'm missing why the whole thing is needed.
> You're saying:
> " make BPF flow dissector programs be able to work in the skb-less case".
> What does it mean specifically?
> The only non-skb case is XDP.
> Are you saying you want flow_dissector prog to be run in XDP?
eth_get_headlen that drivers call on RX path on a chunk of data to
guesstimate the length of the headers calls flow dissector without an skb
(__skb_flow_dissect was a weird interface where it accepts skb or
data+len). Right now, there is no way to trigger BPF flow dissector
for this case (we don't have an skb to get associated namespace/etc/etc).
The patch series tries to fix that to make sure that we always trigger
BPF program if it's attached to a device's namespace.

Context:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ba38943ba190eb6a494262003e23187d1b40fb4

^ permalink raw reply

* RE: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Pankaj Bansal @ 2019-02-06  4:01 UTC (permalink / raw)
  To: Leo Li
  Cc: Shawn Guo, Andrew Lunn, Florian Fainelli, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <CADRPPNSnPoEwUFKK0=FE9RG9QfTUPKoTXkZPVEpskC13mD-R2A@mail.gmail.com>



> -----Original Message-----
> From: Li Yang [mailto:leoyang.li@nxp.com]
> Sent: Wednesday, 6 February, 2019 12:07 AM
> To: Pankaj Bansal <pankaj.bansal@nxp.com>
> Cc: Shawn Guo <shawnguo@kernel.org>; Andrew Lunn <andrew@lunn.ch>;
> Florian Fainelli <f.fainelli@gmail.com>; netdev@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> On Mon, Feb 4, 2019 at 2:53 AM Pankaj Bansal <pankaj.bansal@nxp.com>
> wrote:
> >
> > The two external MDIO buses used to communicate with phy devices that
> > are external to SOC are muxed in LX2160AQDS board.
> >
> > These buses can be routed to any one of the eight IO slots on
> > LX2160AQDS board depending on value in fpga register 0x54.
> >
> > Additionally the external MDIO1 is used to communicate to the onboard
> > RGMII phy devices.
> >
> > The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is
> > controlled by bits 0-3 of fpga register.
> >
> > Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> > ---
> >
> > Notes:
> >     V2:
> >     - removed unnecassary TODO statements
> >     - removed device_type from mdio nodes
> >     - change the case of hex number to lowercase
> >     - removed board specific comments from soc file
> 
> There are still some comments from V1 haven't been addressed.
> 
> >
> >  .../boot/dts/freescale/fsl-lx2160a-qds.dts   | 115 +++++++++++++++++
> >  .../boot/dts/freescale/fsl-lx2160a.dtsi      |  18 +++
> >  2 files changed, 133 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > index 99a22abbe725..2c3020a72d41 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > @@ -46,6 +46,121 @@
> >  &i2c0 {
> >         status = "okay";
> >
> > +       fpga@66 {
> > +               compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> > +               reg = <0x66>;
> > +               #address-cells = <1>;
> > +               #size-cells = <0>;
> > +
> > +               mdio-mux-1@54 {
> 
> Still no compatible string defined for the node.  Probably should be "mdio-mux-
> mmioreg", "mdio-mux"

it is not a specific device. MDIO mux is meant to be controlled by some registers of parent device (FPGA).
Therefore, IMO this should not be a device and there should not be any "compatible" property for it.

> 
> > +                       mdio-parent-bus = <&emdio1>;
> > +                       reg = <0x54>;            /* BRDCFG4 */
> > +                       mux-mask = <0xf8>;      /* EMI1_MDIO */
> > +                       #address-cells=<1>;
> > +                       #size-cells = <0>;
> > +
> > +                       mdio@0 {
> > +                               reg = <0x00>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@40 {
> > +                               reg = <0x40>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@c0 {
> > +                               reg = <0xc0>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@c8 {
> > +                               reg = <0xc8>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@d0 {
> > +                               reg = <0xd0>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@d8 {
> > +                               reg = <0xd8>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@e0 {
> > +                               reg = <0xe0>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@e8 {
> > +                               reg = <0xe8>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@f0 {
> > +                               reg = <0xf0>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@f8 {
> > +                               reg = <0xf8>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +               };
> > +
> > +               mdio-mux-2@54 {
> > +                       mdio-parent-bus = <&emdio2>;
> > +                       reg = <0x54>;            /* BRDCFG4 */
> > +                       mux-mask = <0x07>;      /* EMI2_MDIO */
> > +                       #address-cells=<1>;
> > +                       #size-cells = <0>;
> > +
> > +                       mdio@0 {
> > +                               reg = <0x00>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@1 {
> > +                               reg = <0x01>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@2 {
> > +                               reg = <0x02>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@3 {
> > +                               reg = <0x03>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@4 {
> > +                               reg = <0x04>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@5 {
> > +                               reg = <0x05>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@6 {
> > +                               reg = <0x06>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +                       mdio@7 {
> > +                               reg = <0x07>;
> > +                               #address-cells = <1>;
> > +                               #size-cells = <0>;
> > +                       };
> > +               };
> > +       };
> > +
> >         i2c-mux@77 {
> >                 compatible = "nxp,pca9547";
> >                 reg = <0x77>;
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > index a79f5c1ea56d..a74045ad22ad 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > @@ -762,5 +762,23 @@
> >                                      <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
> >                         dma-coherent;
> >                 };
> > +
> > +               /* WRIOP0: 0x8b8_0000, E-MDIO1: 0x1_6000 */
> > +               emdio1: mdio@8b96000 {
> > +                       compatible = "fsl,fman-memac-mdio";
> > +                       reg = <0x0 0x8b96000 0x0 0x1000>;
> > +                       #address-cells = <1>;
> > +                       #size-cells = <0>;
> > +                       little-endian;  /* force the driver in LE mode
> > + */
> 
> Still doesn't fully align with the binding at
> "Documentation/devicetree/bindings/powerpc/fsl/fman.txt".
> 
> It should either have the "interrupts" property for external MDIO or "fsl,fman-
> internal-mdio" for internal MDIO.

I see that for DPAA1 based SOCs, there was definitely an interrupt property in external MDIO node.
BUT, for DPAA2 based devices none of the SOC has this property. I am looking further into this.

> 
> > +               };
> > +
> > +               /* WRIOP0: 0x8b8_0000, E-MDIO2: 0x1_7000 */
> > +               emdio2: mdio@8b97000 {
> > +                       compatible = "fsl,fman-memac-mdio";
> > +                       reg = <0x0 0x8b97000 0x0 0x1000>;
> > +                       #address-cells = <1>;
> > +                       #size-cells = <0>;
> > +                       little-endian;  /* force the driver in LE mode */
> > +               };
> >         };
> >  };
> > --
> > 2.17.1
> >

^ permalink raw reply

* [PATCH bpf-next 0/5] net: xdp: allow offload to coexist with generic
From: Jakub Kicinski @ 2019-02-06  4:03 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski

Hi!

Offloaded and native/driver XDP programs can already coexist.
Allow offload and generic hook to coexist as well, there seem
to be no reason why not to do so.

Jakub Kicinski (5):
  selftests/bpf: fix the expected messages
  net: xdp: allow generic and driver XDP on one interface
  selftests/bpf: print traceback when test fails
  selftests/bpf: add test for mixing generic and offload XDP
  selftests/bpf: test reading the offloaded program

 net/core/dev.c                              |  10 +-
 tools/testing/selftests/bpf/test_offload.py | 135 ++++++++++++--------
 2 files changed, 85 insertions(+), 60 deletions(-)

-- 
2.19.2


^ permalink raw reply

* [PATCH bpf-next 1/5] selftests/bpf: fix the expected messages
From: Jakub Kicinski @ 2019-02-06  4:03 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20190206040324.22109-1-jakub.kicinski@netronome.com>

Recent changes added extack to program replacement path,
expect extack instead of generic messages.

Fixes: 01dde20ce04b ("xdp: Provide extack messages when prog attachment failed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/testing/selftests/bpf/test_offload.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index d59642e70f56..a564d1a5a1b8 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -842,7 +842,9 @@ netns = []
     ret, _, err = sim.set_xdp(obj, "generic", force=True,
                               fail=False, include_stderr=True)
     fail(ret == 0, "Replaced XDP program with a program in different mode")
-    fail(err.count("File exists") != 1, "Replaced driver XDP with generic")
+    check_extack(err,
+                 "native and generic XDP can't be active at the same time.",
+                 args)
     ret, _, err = sim.set_xdp(obj, "", force=True,
                               fail=False, include_stderr=True)
     fail(ret == 0, "Replaced XDP program with a program in different mode")
@@ -957,7 +959,8 @@ netns = []
 
     start_test("Test multi-attachment XDP - replace...")
     ret, _, err = sim.set_xdp(obj, "offload", fail=False, include_stderr=True)
-    fail(err.count("busy") != 1, "Replaced one of programs without -force")
+    fail(ret == 0, "Replaced one of programs without -force")
+    check_extack(err, "XDP program already attached.", args)
 
     start_test("Test multi-attachment XDP - detach...")
     ret, _, err = sim.unset_xdp("drv", force=True,
-- 
2.19.2


^ permalink raw reply related

* [PATCH bpf-next 2/5] net: xdp: allow generic and driver XDP on one interface
From: Jakub Kicinski @ 2019-02-06  4:03 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20190206040324.22109-1-jakub.kicinski@netronome.com>

Since commit a25717d2b604 ("xdp: support simultaneous driver and
hw XDP attachment") users can load an XDP program for offload and
in native driver mode simultaneously.  Allow a similar mix of
offload and SKB mode/generic XDP.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 net/core/dev.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bfa4be42afff..78c3b48392e1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7976,11 +7976,13 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	enum bpf_netdev_command query;
 	struct bpf_prog *prog = NULL;
 	bpf_op_t bpf_op, bpf_chk;
+	bool offload;
 	int err;
 
 	ASSERT_RTNL();
 
-	query = flags & XDP_FLAGS_HW_MODE ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+	offload = flags & XDP_FLAGS_HW_MODE;
+	query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
 
 	bpf_op = bpf_chk = ops->ndo_bpf;
 	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
@@ -7993,8 +7995,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		bpf_chk = generic_xdp_install;
 
 	if (fd >= 0) {
-		if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
-		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
+		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
 			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
@@ -8009,8 +8010,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
-		if (!(flags & XDP_FLAGS_HW_MODE) &&
-		    bpf_prog_is_dev_bound(prog->aux)) {
+		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
 			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
 			bpf_prog_put(prog);
 			return -EINVAL;
-- 
2.19.2


^ permalink raw reply related

* [PATCH bpf-next 3/5] selftests/bpf: print traceback when test fails
From: Jakub Kicinski @ 2019-02-06  4:03 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20190206040324.22109-1-jakub.kicinski@netronome.com>

Figuring out which exact check in test_offload.py takes more
time than it should.  Print the traceback (to the screen and
the logs).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/testing/selftests/bpf/test_offload.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index a564d1a5a1b8..c379b36a5443 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -23,6 +23,7 @@ import string
 import struct
 import subprocess
 import time
+import traceback
 
 logfile = None
 log_level = 1
@@ -78,7 +79,9 @@ netns = [] # net namespaces to be removed
     if not cond:
         return
     print("FAIL: " + msg)
-    log("FAIL: " + msg, "", level=1)
+    tb = "".join(traceback.extract_stack().format())
+    print(tb)
+    log("FAIL: " + msg, tb, level=1)
     os.sys.exit(1)
 
 def start_test(msg):
-- 
2.19.2


^ permalink raw reply related

* [PATCH bpf-next 4/5] selftests/bpf: add test for mixing generic and offload XDP
From: Jakub Kicinski @ 2019-02-06  4:03 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20190206040324.22109-1-jakub.kicinski@netronome.com>

Add simple sanity check for enabling generic and offload
XDP, simply reuse the native and offload checks.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/testing/selftests/bpf/test_offload.py | 116 +++++++++++---------
 1 file changed, 62 insertions(+), 54 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index c379b36a5443..13fe12d09704 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -603,6 +603,65 @@ def bpftool_prog_load(sample, file_name, maps=[], prog_type="xdp", dev=None,
                             include_stderr=True)
     check_no_extack(res, needle)
 
+def test_multi_prog(sim, obj, modename, modeid):
+    start_test("Test multi-attachment XDP - %s + offload..." %
+               (modename or "default", ))
+    sim.set_xdp(obj, "offload")
+    xdp = sim.ip_link_show(xdp=True)["xdp"]
+    offloaded = sim.dfs_read("bpf_offloaded_id")
+    fail("prog" not in xdp, "Base program not reported in single program mode")
+    fail(len(xdp["attached"]) != 1,
+         "Wrong attached program count with one program")
+
+    sim.set_xdp(obj, modename)
+    two_xdps = sim.ip_link_show(xdp=True)["xdp"]
+    offloaded2 = sim.dfs_read("bpf_offloaded_id")
+
+    fail(two_xdps["mode"] != 4, "Bad mode reported with multiple programs")
+    fail("prog" in two_xdps, "Base program reported in multi program mode")
+    fail(xdp["attached"][0] not in two_xdps["attached"],
+         "Offload program not reported after other activated")
+    fail(len(two_xdps["attached"]) != 2,
+         "Wrong attached program count with two programs")
+    fail(two_xdps["attached"][0]["prog"]["id"] ==
+         two_xdps["attached"][1]["prog"]["id"],
+         "Offloaded and other programs have the same id")
+    fail(offloaded != offloaded2,
+         "Offload ID changed after loading other program")
+
+    start_test("Test multi-attachment XDP - replace...")
+    ret, _, err = sim.set_xdp(obj, "offload", fail=False, include_stderr=True)
+    fail(ret == 0, "Replaced one of programs without -force")
+    check_extack(err, "XDP program already attached.", args)
+
+    if modename == "" or modename == "drv":
+        othermode = "" if modename == "drv" else "drv"
+        start_test("Test multi-attachment XDP - detach...")
+        ret, _, err = sim.unset_xdp(othermode, force=True,
+                                    fail=False, include_stderr=True)
+        fail(ret == 0, "Removed program with a bad mode")
+        check_extack(err, "program loaded with different flags.", args)
+
+    sim.unset_xdp("offload")
+    xdp = sim.ip_link_show(xdp=True)["xdp"]
+    offloaded = sim.dfs_read("bpf_offloaded_id")
+
+    fail(xdp["mode"] != modeid, "Bad mode reported after multiple programs")
+    fail("prog" not in xdp,
+         "Base program not reported after multi program mode")
+    fail(xdp["attached"][0] not in two_xdps["attached"],
+         "Offload program not reported after other activated")
+    fail(len(xdp["attached"]) != 1,
+         "Wrong attached program count with remaining programs")
+    fail(offloaded != "0", "Offload ID reported with only other program left")
+
+    start_test("Test multi-attachment XDP - device remove...")
+    sim.set_xdp(obj, "offload")
+    sim.remove()
+
+    sim = NetdevSim()
+    sim.set_ethtool_tc_offloads(True)
+    return sim
 
 # Parse command line
 parser = argparse.ArgumentParser()
@@ -936,60 +995,9 @@ netns = []
     rm(pin_file)
     bpftool_prog_list_wait(expected=0)
 
-    start_test("Test multi-attachment XDP - attach...")
-    sim.set_xdp(obj, "offload")
-    xdp = sim.ip_link_show(xdp=True)["xdp"]
-    offloaded = sim.dfs_read("bpf_offloaded_id")
-    fail("prog" not in xdp, "Base program not reported in single program mode")
-    fail(len(ipl["xdp"]["attached"]) != 1,
-         "Wrong attached program count with one program")
-
-    sim.set_xdp(obj, "")
-    two_xdps = sim.ip_link_show(xdp=True)["xdp"]
-    offloaded2 = sim.dfs_read("bpf_offloaded_id")
-
-    fail(two_xdps["mode"] != 4, "Bad mode reported with multiple programs")
-    fail("prog" in two_xdps, "Base program reported in multi program mode")
-    fail(xdp["attached"][0] not in two_xdps["attached"],
-         "Offload program not reported after driver activated")
-    fail(len(two_xdps["attached"]) != 2,
-         "Wrong attached program count with two programs")
-    fail(two_xdps["attached"][0]["prog"]["id"] ==
-         two_xdps["attached"][1]["prog"]["id"],
-         "offloaded and drv programs have the same id")
-    fail(offloaded != offloaded2,
-         "offload ID changed after loading driver program")
-
-    start_test("Test multi-attachment XDP - replace...")
-    ret, _, err = sim.set_xdp(obj, "offload", fail=False, include_stderr=True)
-    fail(ret == 0, "Replaced one of programs without -force")
-    check_extack(err, "XDP program already attached.", args)
-
-    start_test("Test multi-attachment XDP - detach...")
-    ret, _, err = sim.unset_xdp("drv", force=True,
-                                fail=False, include_stderr=True)
-    fail(ret == 0, "Removed program with a bad mode")
-    check_extack(err, "program loaded with different flags.", args)
-
-    sim.unset_xdp("offload")
-    xdp = sim.ip_link_show(xdp=True)["xdp"]
-    offloaded = sim.dfs_read("bpf_offloaded_id")
-
-    fail(xdp["mode"] != 1, "Bad mode reported after multiple programs")
-    fail("prog" not in xdp,
-         "Base program not reported after multi program mode")
-    fail(xdp["attached"][0] not in two_xdps["attached"],
-         "Offload program not reported after driver activated")
-    fail(len(ipl["xdp"]["attached"]) != 1,
-         "Wrong attached program count with remaining programs")
-    fail(offloaded != "0", "offload ID reported with only driver program left")
-
-    start_test("Test multi-attachment XDP - device remove...")
-    sim.set_xdp(obj, "offload")
-    sim.remove()
-
-    sim = NetdevSim()
-    sim.set_ethtool_tc_offloads(True)
+    sim = test_multi_prog(sim, obj, "", 1)
+    sim = test_multi_prog(sim, obj, "drv", 1)
+    sim = test_multi_prog(sim, obj, "generic", 2)
 
     start_test("Test mixing of TC and XDP...")
     sim.tc_add_ingress()
-- 
2.19.2


^ permalink raw reply related

* [PATCH bpf-next 5/5] selftests/bpf: test reading the offloaded program
From: Jakub Kicinski @ 2019-02-06  4:03 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20190206040324.22109-1-jakub.kicinski@netronome.com>

Test adding the offloaded program after the other program
is already installed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/testing/selftests/bpf/test_offload.py | 29 ++++++++++++++-------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index 13fe12d09704..84bea3985d64 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -592,6 +592,15 @@ def bpftool_prog_load(sample, file_name, maps=[], prog_type="xdp", dev=None,
             return
     fail(True, "Missing or incorrect message from netdevsim in verifier log")
 
+def check_multi_basic(two_xdps):
+    fail(two_xdps["mode"] != 4, "Bad mode reported with multiple programs")
+    fail("prog" in two_xdps, "Base program reported in multi program mode")
+    fail(len(two_xdps["attached"]) != 2,
+         "Wrong attached program count with two programs")
+    fail(two_xdps["attached"][0]["prog"]["id"] ==
+         two_xdps["attached"][1]["prog"]["id"],
+         "Offloaded and other programs have the same id")
+
 def test_spurios_extack(sim, obj, skip_hw, needle):
     res = sim.cls_bpf_add_filter(obj, prio=1, handle=1, skip_hw=skip_hw,
                                  include_stderr=True)
@@ -615,17 +624,12 @@ def bpftool_prog_load(sample, file_name, maps=[], prog_type="xdp", dev=None,
 
     sim.set_xdp(obj, modename)
     two_xdps = sim.ip_link_show(xdp=True)["xdp"]
-    offloaded2 = sim.dfs_read("bpf_offloaded_id")
 
-    fail(two_xdps["mode"] != 4, "Bad mode reported with multiple programs")
-    fail("prog" in two_xdps, "Base program reported in multi program mode")
     fail(xdp["attached"][0] not in two_xdps["attached"],
          "Offload program not reported after other activated")
-    fail(len(two_xdps["attached"]) != 2,
-         "Wrong attached program count with two programs")
-    fail(two_xdps["attached"][0]["prog"]["id"] ==
-         two_xdps["attached"][1]["prog"]["id"],
-         "Offloaded and other programs have the same id")
+    check_multi_basic(two_xdps)
+
+    offloaded2 = sim.dfs_read("bpf_offloaded_id")
     fail(offloaded != offloaded2,
          "Offload ID changed after loading other program")
 
@@ -655,8 +659,15 @@ def bpftool_prog_load(sample, file_name, maps=[], prog_type="xdp", dev=None,
          "Wrong attached program count with remaining programs")
     fail(offloaded != "0", "Offload ID reported with only other program left")
 
-    start_test("Test multi-attachment XDP - device remove...")
+    start_test("Test multi-attachment XDP - reattach...")
     sim.set_xdp(obj, "offload")
+    two_xdps = sim.ip_link_show(xdp=True)["xdp"]
+
+    fail(xdp["attached"][0] not in two_xdps["attached"],
+         "Other program not reported after offload activated")
+    check_multi_basic(two_xdps)
+
+    start_test("Test multi-attachment XDP - device remove...")
     sim.remove()
 
     sim = NetdevSim()
-- 
2.19.2


^ permalink raw reply related

* Re: Pull patches from tip/perf/core to bpf-next
From: Song Liu @ 2019-02-06  4:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, Kernel Team, Netdev
In-Reply-To: <20190206034729.ao2gifc4phbhrhib@ast-mbp>



> On Feb 5, 2019, at 7:47 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Wed, Feb 06, 2019 at 03:44:29AM +0000, Song Liu wrote:
>> 
>> 
>>> On Feb 5, 2019, at 7:36 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Tue, Feb 05, 2019 at 10:47:06PM +0000, Song Liu wrote:
>>>> Hi Alexei and Daniel, 
>>>> 
>>>> The following patches are required for BPF introspection in perf tools. 
>>>> Please pull them to bpf-next, so that we get all the dependencies in one
>>>> tree. 
>>>> 
>>>> Thanks,
>>>> Song
>>>> 
>>>> (from 1/10 to 10/10)
>>>> 76193a94522f perf, bpf: Introduce PERF_RECORD_KSYMBOL
>>>> d764ac646491 tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
>>>> 6ee52e2a3fe4 perf, bpf: Introduce PERF_RECORD_BPF_EVENT
>>>> df063c83aa2c tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
>>>> 9aa0bfa370b2 perf tools: Handle PERF_RECORD_KSYMBOL
>>>> 45178a928a4b perf tools: Handle PERF_RECORD_BPF_EVENT
>>>> 7b612e291a5a perf tools: Synthesize PERF_RECORD_* for loaded BPF programs
>>>> a40b95bcd30c perf top: Synthesize BPF events for pre-existing loaded BPF programs
>>>> 6934058d9fb6 bpf: Add module name [bpf] to ksymbols for bpf programs
>>>> 811184fb6977 perf bpf: Fix synthesized PERF_RECORD_KSYMBOL/BPF_EVENT
>>> 
>>> yes. we can certainly do that.
>>> Do you have bpf specific patches that depend on that ?
>>> Since it's rc5 already. Are you planning to send them within next week?
>> 
>> BPF introspection work depends on these patches. I have been hopping 
>> between perf tree and bpf-next tree. I think basing the series up on 
>> bpf-next plus these patches leads least conflicts. 
>> 
>> I do plan to send the series within next week. 
>> 
>> On a second thought, maybe I should send based on perf tree, and worry
>> about the conflicts later? It is really heavier on perf side. 
> 
> whichever way is easier.
> if bpf-next is the best use that as a base with above patches.
> Once your set gets Acks from perf folks we can push above patches
> from tip first and then apply your set.

I see. Let's wait until the patchset is acked. 

Thanks,
Song

^ permalink raw reply

* Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
From: Alexei Starovoitov @ 2019-02-06  4:11 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Willem de Bruijn, Stanislav Fomichev, Network Development,
	David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn
In-Reply-To: <20190206035619.GG10769@mini-arch>

On Tue, Feb 05, 2019 at 07:56:19PM -0800, Stanislav Fomichev wrote:
> On 02/05, Alexei Starovoitov wrote:
> > On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > > On 02/05, Alexei Starovoitov wrote:
> > > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > > On 02/05, Willem de Bruijn wrote:
> > > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > >
> > > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > > C-based flow dissector in this case.
> > > > > > >
> > > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > > work in the skb-less case.
> > > > > > >
> > > > > > > The series goes like this:
> > > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > > >    initialize temporary skb
> > > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > > >    associated with an skb
> > > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > > >    plumb through the callers
> > > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > > 
> > > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > > for every packet in the device driver receive routine, *independent*
> > > > > > from the real skb allocation and zeroing which will likely happen
> > > > > > later.
> > > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > > 
> > > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > > on-stack skb).
> > > > 
> > > > imo extra 256 byte memset for every packet is non starter.
> > > We can put pre-allocated/initialized skbs without data into percpu or even
> > > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > > about having multiple percpu for irq/softirq/process contexts.
> > > Any concerns with that approach?
> > > Any other possible concerns with the overall series?
> > 
> > I'm missing why the whole thing is needed.
> > You're saying:
> > " make BPF flow dissector programs be able to work in the skb-less case".
> > What does it mean specifically?
> > The only non-skb case is XDP.
> > Are you saying you want flow_dissector prog to be run in XDP?
> eth_get_headlen that drivers call on RX path on a chunk of data to
> guesstimate the length of the headers calls flow dissector without an skb
> (__skb_flow_dissect was a weird interface where it accepts skb or
> data+len). Right now, there is no way to trigger BPF flow dissector
> for this case (we don't have an skb to get associated namespace/etc/etc).
> The patch series tries to fix that to make sure that we always trigger
> BPF program if it's attached to a device's namespace.

then why not to create flow_dissector prog type that works without skb?
Why do you need to fake an skb?
XDP progs work just fine without it.


^ permalink raw reply

* Re: [net-next 00/14][pull request] Intel Wired LAN Driver Updates 2019-02-05
From: David Miller @ 2019-02-06  4:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20190206020424.12225-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue,  5 Feb 2019 18:04:10 -0800

> This series contains updates to igc, e1000e, ixgbe, fm10k and driver
> documentation.
> 
> Kai-Heng Feng fixes an e1000e issue where the Wake-On-LAN settings where
> being set incorrectly during a system suspend.
> 
> Sasha addresses community feedback on the igc driver and provides a
> number of code cleanups to remove either unreachable or unused code.  In
> addition, added basic ethtool support for the igc driver.
> 
> Mike Rapoport fixes the formatting of the kernel driver documentation so
> that the title is properly formatted and does not get lumped with the
> document sections in the HTML kernel documents generated.
> 
> Jiri Kosina updates a hard coded RAR entries value with the existing
> define IXGBE_82599_RAR_ENTRIES.
> 
> Jake fixes up whitespace in the fm10k driver.
> 
> Konstantin Khlebnikov fixes an issue where in some cases, the e1000e
> driver will continually reset during a system boot because the watchdog
> task sees items in the transmit buffer but the carrier is off (trying to
> establish link) causing the device reset to flush the buffer.  To
> resolve, just move this check/flush into the watchdog section for when
> the carrier is off.
> 
> Todd bumps the igb driver version to reflect the recent driver changes.

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH bpf-next] tools/bpf: fix a selftest test_btf failure
From: Yonghong Song @ 2019-02-06  4:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, netdev@vger.kernel.org, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team
In-Reply-To: <20190206023615.7w5q2y4xpk3acde4@ast-mbp>



On 2/5/19 6:36 PM, Alexei Starovoitov wrote:
> On Tue, Feb 05, 2019 at 02:28:44PM -0800, Yonghong Song wrote:
>> Commit 9c651127445c ("selftests/btf: add initial BTF dedup tests")
>> added dedup tests in test_btf.c.
>> It broke the raw test:
>>   BTF raw test[71] (func proto (Bad arg name_off)):
>>      btf_raw_create:2905:FAIL Error getting string #65535, strs_cnt:1
>>
>> The test itself encodes invalid func_proto parameter name
>> offset 0xffffFFFF as a negative test for the kernel.
>> The above commit changed the meaning of that offset and
>> resulted in a user space error.
>>    #define NAME_NTH(N) (0xffff0000 | N)
>>    #define IS_NAME_NTH(X) ((X & 0xffff0000) == 0xffff0000)
>>    #define GET_NAME_NTH_IDX(X) (X & 0x0000ffff)
>>
>> Currently, the kernel permits maximum name offset 0xffff.
>> Set the test name off as 0x0fffFFFF to trigger the kernel
>> verification failure.
>>
>> Cc: Andrii Nakryiko <andriin@fb.com>
>> Fixes: 9c651127445c ("selftests/btf: add initial BTF dedup tests")
>> Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> Applied, Thanks
> 
> Also I see the following new error in test_btf:
> BTF libbpf test[2] (test_btf_nokv.o): libbpf: map:btf_map container_name:____btf_map_btf_map cannot be found in BTF. Missing BPF_ANNOTATE_KV_PAIR?
> OK
> 
> A bunch of similar errors are in test_progs as well.
> I suspect it's related to the last few btf changes.
> Andrii, Yonghong, Please investigate.

It is due to one of previous patches to refector btf__get_map_kv_tids(),
accidentally using pr_warning instead of original pr_debug.
Will submit a patch soon.

> 

^ permalink raw reply

* Re: [RFC 00/14] netlink/hierarchical stats
From: Jakub Kicinski @ 2019-02-06  4:45 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, Maciek Fijałkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel
In-Reply-To: <CAJieiUh7wEjYo1pUuPCgAFTNkWs+kvWiSScBGFzhNyOc2TibOA@mail.gmail.com>

On Sat, 2 Feb 2019 15:14:44 -0800, Roopa Prabhu wrote:
> On Thu, Jan 31, 2019 at 11:31 AM Jakub Kicinski wrote:
> > On Thu, 31 Jan 2019 08:31:51 -0800, Roopa Prabhu wrote:  
> > > On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu wrote:  
> > > > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski wrote:  
> > > > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > > > >
> > > > > My thinking was that we should leave truly custom/strange stats to
> > > > > ethtool API which works quite well for that and at the same time be
> > > > > very accepting of people adding new IDs to HSTAT (only requirement is
> > > > > basically defining the meaning very clearly).  
> > > >
> > > > that sounds reasonable. But the 'defining meaning clearly' gets tricky
> > > > sometimes.
> > > > The vendor who gets their ID or meaning first wins :) and the rest
> > > > will have to live with
> > > > ethtool and explain to rest of the world that ethtool is more reliable
> > > > for their hardware :)  
> >
> > Right, that's the trade off inherent to standardization.  I don't see
> > any way to work around the fact that the definition may not fit all.
> >
> > What I want as a end user and what I want for my customers is the
> > ability to switch the NIC on their system and not spend two months
> > "integrating" into their automation :(  If the definition of statistics
> > is not solid we're back to square one.  
> 
> agree. And I am with you on standardizing them.
> 
> >  
> > > > I am also concerned that this getting the ID into common HSTAT ID
> > > > space will  slow down the process of adding new counters
> > > > for vendors. Which will lead to vendors sticking with ethtool API.  
> >
> > I feel like whatever we did here will end up looking much like the
> > ethtool interface, which is why I decided to leave that part out.
> > Ethtool -S works pretty well for custom stats.  Standard and structured
> > stats don't fit with it in any way, the two seem best left separate.  
> 
> understand the fear. My only point was getting them together in a
> single API is better so that they don't get developed separately and
> we don't end up with duplicate stats code.
> 
> >  
> > > > It would be great if people can get all stats in one place and not
> > > > rely on another API for 'more'.  
> >
> > One place in the driver or for the user?  I'm happy to add the code to
> > ethtool to also dump hstats and render them in a standard way.  In fact
> > the tool I have for testing has a "simplified" output format which
> > looks exactly like ethtool -S.
> >
> > One place for the driver to report is hard, as I said I think the
> > custom stats are best left with ethtool.  Adding an extra incentive to
> > standardize.
> >  
> > > > > For the first stab I looked at two drivers and added all the stats that
> > > > > were common.
> > > > >
> > > > > Given this set is identifying statistics by ID - how would we make that
> > > > > extensible to drivers?  Would we go back to strings or have some
> > > > > "driver specific" ID space?  
> > > >
> > > > I was looking for ideas from you really, to see if you had considered
> > > > this. agree per driver ID space seems ugly.
> > > > ethtool strings are great today...if we can control the duplication.
> > > > But thinking some more..., i did see some
> > > > patches recently for vendor specific parameter (with ID) space in
> > > > devlink. maybe something like that will be
> > > > reasonable ?  
> >
> > I thought about this for a year and I basically came to the conclusion
> > I can't find any perfect solution, if there is one.
> >
> > The devlink parameters are useful, but as anticipated they became the
> > laziest excuse of an ABI... Don't get me started ;)
> >  
> > > > > Is there any particular type of statistic you'd expect drivers to want
> > > > > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > > > > silicon ones, but I don't know much about switches :)  
> > > >
> > > > I will have to go through the list. But switch asics do support
> > > > flexible stats/counters that can be attached at various points.
> > > > And new chip versions come with more support. Having that flexibility
> > > > to expose/extend such stats incrementally is very valuable on a per
> > > > hardware/vendor basis.  
> >
> > Yes, I'm not too familiar with those counters.  Do they need to be
> > enabled to start counting?  
> 
> yes correct.
> 
> > Do they have performance impact?  
> 
> I have not heard of any performance impact...but they are not enabled
> by default because of limited counter resource pool.

I see.. I'd personally see that as something that we probably either
support via perf, or new devlink perf creation.  Those are perf events,
not stats to me.  Devlink would probably suit fixed HW better, and
perf could feel slightly more natural to certain NICs (*ekhm* perf
traces of offloaded BPF programs).

> > Can the
> > "sample" events perf-style?  
> 
> I don't think so
> 
> > How is the condition on which they trigger
> > defined?  Is it maybe just "match a packet and increment a counter"?  
> 
> yes, something like that.
> 
> > Would such counters benefit from hierarchical structure?  
> 
> hmm not sure.
> 
> 
> One thing though, for most of these flexible counters and their
> attachment points in hardware, we can count them on logical devices or
> other objects in software like per vlan, vni, route stats etc.
> 
> >
> > I was trying to cover the long standing use cases - namely the
> > IEEE/RMON stats which all MAC have had for years and per queue stats
> > which all drivers have had for years.  But if we can cater to more
> > cases I'm open.
> >  
> > > Just want to clarify that I am suggesting a nested HSTATS extension
> > > infra for drivers (just like ethtool).
> > > 'Common stats' stays at the top-level.  
> >
> > I got a concept of groups here.  The dump generally looks like this:
> >
> > [root group A (say MAC stats)]
> >   [sub group RX]
> >   [sub group TX]
> > [root group B (say PCIe stats)]
> >   [sub group RX]
> >   [sub group TX]
> > [root group C (say per-q driver stats]
> >   [sub group RX]
> >     [q1 group]
> >     [q2 group]
> >     [q3 group]
> >   [sub group TX]
> >     [q1 group]
> >     [q2 group]
> >     [q3 group]
> >
> > Each root group representing a "point in the pipeline".
> >
> > So it's not too hard to add a root group with whatever, the questions
> > are move how would it benefit over existing ethtool if the stats are
> > custom anyway?  Hm..  
> 
> It wouldn't. I am only saying that the netlink stats API is the new
> place to move stats.
> Ethtool stats will have to move to netlink some day and I don't see a
> need to draw a hardline on saying no to ethtool custom stats moving to
> the netlink based common stats API. And unless there is a good
> migration path for a new hardware stats API that is all inclusive,
> there is a higher chance of continued development on the older
> hardware stats API.
> I have no objections to having a standard set of stats (this is
> essentially what we have for software stats too).
> 
> I don't want to block your series from going forward without HW custom
> stats extensions. And IIUC your API is extensible and does not
> preclude anyone from adding the ability to include HW custom stats
> extensions in the future with enough justification. That is good for
> me.

Would you be more interested in seeing the similarity in API on the
driver side or on the netlink side?  I was hoping to leave the legacy
stats in ethtool (soon to be running over netlink as well) for the
time being.  I wish we had some form of library on the iproute2 side 
we could evolve together with the kernel libbpf-style :(

> To take a random example, we expose the following stats on our
> switches via ethtool. I have not used them personally but they
> correspond to respective hardware counters. Is there any room for such
> stats in the new HSTATS netlink API or they will have to stick to
> ethtool ?
> I believe people will need per-queue counters for this.
> 
>      HwIfOutWredDrops: 0
>      HwIfOutQ0WredDrops: 0
>      HwIfOutQ1WredDrops: 0
>      HwIfOutQ2WredDrops: 0
>      HwIfOutQ3WredDrops: 0
>      HwIfOutQ4WredDrops: 0
>      HwIfOutQ5WredDrops: 0
>      HwIfOutQ6WredDrops: 0
>      HwIfOutQ7WredDrops: 0
>      HwIfOutQ8WredDrops: 0
>      HwIfOutQ9WredDrops: 0

Well, yes, so these are clearly enough defined stats, and I'd be very
happy to add an ID for you for those... if those shouldn't be reported
in the tc qdisc red stats that should be used to configure WRED :(

^ permalink raw reply

* BUG: spinlock bad magic in __queue_work
From: syzbot @ 2019-02-06  5:01 UTC (permalink / raw)
  To: davem, kuznet, linux-kernel, netdev, syzkaller-bugs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    962c382d482a Merge tag 'mac80211-next-for-davem-2019-02-01..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=169299f8c00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=33ad02b9305759c3
dashboard link: https://syzkaller.appspot.com/bug?extid=636bcaf4b481f6b7343c
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+636bcaf4b481f6b7343c@syzkaller.appspotmail.com

netlink: 8 bytes leftover after parsing attributes in process  
`syz-executor5'.
BUG: spinlock bad magic on CPU#0, syz-executor0/1737
  lock: 0xffff8880ae82c7e0, .magic: ffffffff, .owner: %us  
workers=%d/1902867055, .owner_cpu: 0
CPU: 0 PID: 1737 Comm: syz-executor0 Not tainted 5.0.0-rc4+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  spin_dump.cold+0x81/0xe6 kernel/locking/spinlock_debug.c:67
  spin_bug kernel/locking/spinlock_debug.c:75 [inline]
  debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
  do_raw_spin_lock+0x231/0x2e0 kernel/locking/spinlock_debug.c:112
  __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
  _raw_spin_lock+0x37/0x40 kernel/locking/spinlock.c:144
  spin_lock include/linux/spinlock.h:329 [inline]
  __queue_work+0x23d/0x1180 kernel/workqueue.c:1417
  __queue_delayed_work+0x1d6/0x270 kernel/workqueue.c:1522
  mod_delayed_work_on+0xd8/0x200 kernel/workqueue.c:1596
  mod_delayed_work include/linux/workqueue.h:542 [inline]
  addrconf_mod_dad_work+0x3f/0xa0 net/ipv6/addrconf.c:328
  addrconf_dad_start+0x76/0xb0 net/ipv6/addrconf.c:4011
  addrconf_add_linklocal+0x28c/0x3c0 net/ipv6/addrconf.c:3190
  addrconf_addr_gen+0x34d/0x3a0 net/ipv6/addrconf.c:3313
  addrconf_dev_config+0x1ea/0x2c0 net/ipv6/addrconf.c:3356
  addrconf_notify+0x3c5/0x2280 net/ipv6/addrconf.c:3591
kobject: 'loop4' (00000000f520b29a): kobject_uevent_env
kobject: 'loop4' (00000000f520b29a): fill_kobj_path: path  
= '/devices/virtual/block/loop4'
  notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
  __raw_notifier_call_chain kernel/notifier.c:394 [inline]
  raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
  call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739
  call_netdevice_notifiers_extack net/core/dev.c:1751 [inline]
  call_netdevice_notifiers net/core/dev.c:1765 [inline]
  dev_open net/core/dev.c:1436 [inline]
  dev_open+0x143/0x160 net/core/dev.c:1424
  ipmr_new_tunnel+0x3d3/0x4d0 net/ipv4/ipmr.c:511
  vif_add+0x1bd/0xed0 net/ipv4/ipmr.c:873
  ip_mroute_setsockopt+0xd05/0xe10 net/ipv4/ipmr.c:1453
  do_ip_setsockopt.isra.0+0x3036/0x3e00 net/ipv4/ip_sockglue.c:638
  ip_setsockopt+0x49/0x100 net/ipv4/ip_sockglue.c:1246
  raw_setsockopt+0xe0/0x100 net/ipv4/raw.c:860
  sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3016
  __sys_setsockopt+0x180/0x280 net/socket.c:1902
  __do_sys_setsockopt net/socket.c:1913 [inline]
  __se_sys_setsockopt net/socket.c:1910 [inline]
  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fbac2bcec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000457e39
RDX: 00000000000000ca RSI: 0000000000000000 RDI: 0000000000000004
RBP: 000000000073bfa0 R08: 0000000000000010 R09: 0000000000000000
R10: 0000000020000000 R11: 0000000000000246 R12: 00007fbac2bcf6d4
R13: 00000000004c5c09 R14: 00000000004da070 R15: 00000000ffffffff
netlink: 8 bytes leftover after parsing attributes in process  
`syz-executor5'.


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* KASAN: use-after-free Read in tcp_sk_exit
From: syzbot @ 2019-02-06  5:01 UTC (permalink / raw)
  To: davem, edumazet, kuznet, linux-kernel, netdev, syzkaller-bugs,
	yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    33a0efa4baec devlink: Use DIV_ROUND_UP_ULL in DEVLINK_HEAL..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14e2e55f400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=505743eba4e4f68
dashboard link: https://syzkaller.appspot.com/bug?extid=f797267da5e5012d0920
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+f797267da5e5012d0920@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in inet_ctl_sock_destroy  
include/net/inet_common.h:56 [inline]
BUG: KASAN: use-after-free in tcp_sk_exit+0x203/0x230  
net/ipv4/tcp_ipv4.c:2588
Read of size 8 at addr ffff888064de4378 by task kworker/u4:3/131

CPU: 0 PID: 131 Comm: kworker/u4:3 Not tainted 5.0.0-rc3+ #17
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
  print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
  kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
  inet_ctl_sock_destroy include/net/inet_common.h:56 [inline]
  tcp_sk_exit+0x203/0x230 net/ipv4/tcp_ipv4.c:2588
  ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
  cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
  process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
  worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
  kthread+0x357/0x430 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Allocated by task 5864:
  save_stack+0x45/0xd0 mm/kasan/common.c:73
  set_track mm/kasan/common.c:85 [inline]
  __kasan_kmalloc mm/kasan/common.c:496 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
  kasan_kmalloc mm/kasan/common.c:504 [inline]
  kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:411
  kmem_cache_alloc+0x12d/0x710 mm/slab.c:3543
  getname_flags fs/namei.c:140 [inline]
  getname_flags+0xd6/0x5b0 fs/namei.c:129
  getname+0x1a/0x20 fs/namei.c:211
  do_sys_open+0x3a5/0x7c0 fs/open.c:1057
  __do_sys_open fs/open.c:1081 [inline]
  __se_sys_open fs/open.c:1076 [inline]
  __x64_sys_open+0x7e/0xc0 fs/open.c:1076
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 5864:
  save_stack+0x45/0xd0 mm/kasan/common.c:73
  set_track mm/kasan/common.c:85 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
  __cache_free mm/slab.c:3487 [inline]
  kmem_cache_free+0x86/0x260 mm/slab.c:3749
  putname+0xef/0x130 fs/namei.c:261
  do_sys_open+0x3f4/0x7c0 fs/open.c:1072
  __do_sys_open fs/open.c:1081 [inline]
  __se_sys_open fs/open.c:1076 [inline]
  __x64_sys_open+0x7e/0xc0 fs/open.c:1076
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888064de4180
  which belongs to the cache names_cache of size 4096
The buggy address is located 504 bytes inside of
  4096-byte region [ffff888064de4180, ffff888064de5180)
The buggy address belongs to the page:
page:ffffea0001937900 count:1 mapcount:0 mapping:ffff88821bc45380 index:0x0  
compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0001931108 ffffea0001796b88 ffff88821bc45380
raw: 0000000000000000 ffff888064de4180 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff888064de4200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888064de4280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888064de4300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                 ^
  ffff888064de4380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888064de4400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* KASAN: slab-out-of-bounds Read in ip_send_unicast_reply
From: syzbot @ 2019-02-06  5:02 UTC (permalink / raw)
  To: davem, kuznet, linux-kernel, netdev, syzkaller-bugs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    b71acb0e3721 Merge branch 'linus' of git://git.kernel.org/..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11a6994b400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b03c5892bb940c76
dashboard link: https://syzkaller.appspot.com/bug?extid=f1741fbf71635c029556
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+f1741fbf71635c029556@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in sock_net include/net/sock.h:2403 [inline]
BUG: KASAN: slab-out-of-bounds in ip_send_unicast_reply+0x14a6/0x1800  
net/ipv4/ip_output.c:1569
Read of size 8 at addr ffff888097d4c5ac by task syz-executor4/19801

CPU: 0 PID: 19801 Comm: syz-executor4 Not tainted 4.20.0+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
  print_address_description.cold+0x7c/0x20d mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report mm/kasan/report.c:412 [inline]
  kasan_report.cold+0x8c/0x2ba mm/kasan/report.c:396
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  sock_net include/net/sock.h:2403 [inline]
  ip_send_unicast_reply+0x14a6/0x1800 net/ipv4/ip_output.c:1569
  tcp_v4_send_reset+0x1496/0x2ad0 net/ipv4/tcp_ipv4.c:775
  tcp_v4_rcv+0x1e17/0x3c40 net/ipv4/tcp_ipv4.c:1936
  ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
  ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
  NF_HOOK include/linux/netfilter.h:289 [inline]
  NF_HOOK include/linux/netfilter.h:283 [inline]
  ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
  dst_input include/net/dst.h:450 [inline]
  ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
  NF_HOOK include/linux/netfilter.h:289 [inline]
  NF_HOOK include/linux/netfilter.h:283 [inline]
  ip_rcv+0xed/0x620 net/ipv4/ip_input.c:523
  __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
  __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
  process_backlog+0x206/0x750 net/core/dev.c:5923
  napi_poll net/core/dev.c:6346 [inline]
  net_rx_action+0x76d/0x1930 net/core/dev.c:6412
  __do_softirq+0x30b/0xb11 kernel/softirq.c:292
  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
  </IRQ>
  do_softirq.part.0+0x11a/0x170 kernel/softirq.c:337
  do_softirq kernel/softirq.c:329 [inline]
  __local_bh_enable_ip+0x211/0x270 kernel/softirq.c:189
  local_bh_enable include/linux/bottom_half.h:32 [inline]
  rcu_read_unlock_bh include/linux/rcupdate.h:696 [inline]
  ip_finish_output2+0xa88/0x1a00 net/ipv4/ip_output.c:231
  ip_finish_output+0x7e4/0xf60 net/ipv4/ip_output.c:317
  NF_HOOK_COND include/linux/netfilter.h:278 [inline]
  ip_output+0x226/0x880 net/ipv4/ip_output.c:405
  dst_output include/net/dst.h:444 [inline]
  ip_local_out+0xc4/0x1b0 net/ipv4/ip_output.c:124
  __ip_queue_xmit+0x99a/0x1ee0 net/ipv4/ip_output.c:505
  ip_queue_xmit+0x5a/0x70 include/net/ip.h:198
  __tcp_transmit_skb+0x1add/0x3af0 net/ipv4/tcp_output.c:1160
  tcp_transmit_skb net/ipv4/tcp_output.c:1176 [inline]
  tcp_connect+0x33ca/0x4770 net/ipv4/tcp_output.c:3535
kobject: 'rx-0' (0000000070ecd47e): kobject_uevent_env
  tcp_v4_connect+0x1598/0x1da0 net/ipv4/tcp_ipv4.c:315
  __inet_stream_connect+0x9a3/0x11b0 net/ipv4/af_inet.c:655
kobject: 'rx-0' (0000000070ecd47e): fill_kobj_path: path  
= '/devices/virtual/net/gretap0/queues/rx-0'
kobject: 'tx-0' (00000000301d47dc): kobject_add_internal: parent: 'queues',  
set: 'queues'
  inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:719
  __sys_connect+0x357/0x490 net/socket.c:1664
kobject: 'tx-0' (00000000301d47dc): kobject_uevent_env
  __do_sys_connect net/socket.c:1675 [inline]
  __se_sys_connect net/socket.c:1672 [inline]
  __x64_sys_connect+0x73/0xb0 net/socket.c:1672
kobject: 'tx-0' (00000000301d47dc): fill_kobj_path: path  
= '/devices/virtual/net/gretap0/queues/tx-0'
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
kobject: 'batman_adv' (00000000ab8b79c4): kobject_add_internal:  
parent: 'gretap0', set: '<NULL>'
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457ec9
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f116f5d1c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
kobject: 'erspan0' (000000001f020f96): kobject_add_internal: parent: 'net',  
set: 'devices'
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
RDX: 0000000000000010 RSI: 0000000020ccb000 RDI: 0000000000000007
RBP: 000000000073bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f116f5d26d4
R13: 00000000004be30a R14: 00000000004ce6d8 R15: 00000000ffffffff

Allocated by task 3862:
  save_stack+0x45/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc mm/kasan/kasan.c:553 [inline]
  kasan_kmalloc+0xce/0xf0 mm/kasan/kasan.c:531
  kasan_slab_alloc+0xf/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc+0x12e/0x700 mm/slab.c:3554
  getname_flags fs/namei.c:140 [inline]
  getname_flags+0xd6/0x5b0 fs/namei.c:129
  getname+0x1a/0x20 fs/namei.c:211
kobject: 'erspan0' (000000001f020f96): kobject_uevent_env
  do_sys_open+0x3a5/0x740 fs/open.c:1057
  __do_sys_open fs/open.c:1081 [inline]
  __se_sys_open fs/open.c:1076 [inline]
  __x64_sys_open+0x7e/0xc0 fs/open.c:1076
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3862:
  save_stack+0x45/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
kobject: 'erspan0' (000000001f020f96): fill_kobj_path: path  
= '/devices/virtual/net/erspan0'
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x86/0x260 mm/slab.c:3760
  putname+0xef/0x130 fs/namei.c:261
  do_sys_open+0x3f4/0x740 fs/open.c:1072
  __do_sys_open fs/open.c:1081 [inline]
  __se_sys_open fs/open.c:1076 [inline]
  __x64_sys_open+0x7e/0xc0 fs/open.c:1076
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888097d4cb00
  which belongs to the cache names_cache of size 4096
The buggy address is located 1364 bytes to the left of
  4096-byte region [ffff888097d4cb00, ffff888097d4db00)
The buggy address belongs to the page:
kobject: 'queues' (000000003e95a060): kobject_add_internal:  
parent: 'erspan0', set: '<NULL>'
page:ffffea00025f5300 count:1 mapcount:0 mapping:ffff88812c2bec40 index:0x0  
compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0002498108 ffffea0000cfdc88 ffff88812c2bec40
raw: 0000000000000000 ffff888097d4cb00 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff888097d4c480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff888097d4c500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
kobject: 'queues' (000000003e95a060): kobject_uevent_env
> ffff888097d4c580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                   ^
  ffff888097d4c600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff888097d4c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/2] btf: separate btf creation and loading
From: Andrii Nakryiko @ 2019-02-06  5:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Song Liu, Yonghong Song, Alexei Starovoitov,
	Martin Lau, netdev, daniel
In-Reply-To: <20190206030333.lurnxegmblbxrqjw@ast-mbp>

On Tue, Feb 5, 2019 at 7:03 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 05, 2019 at 04:29:48PM -0800, Andrii Nakryiko wrote:
> > This change splits out previous btf__new functionality of constructing
> > struct btf and loading it into kernel into two:
> > - btf__new() just creates and initializes struct btf
> > - btf__load() attempts to load existing struct btf into kernel
> >
> > btf__free will still close BTF fd, if it was ever loaded successfully
> > into kernel.
> >
> > This change allows users of libbpf to manipulate BTF using its API,
> > without the need to unnecessarily load it into kernel.
> >
> > One of the intended use cases is pahole using libbpf to do DWARF to BTF
> > conversion and deduplication using libbpf, while handling ELF sections
> > overwrites and other concerns on its own.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > Acked-by: Song Liu <songliubraving@fb.com>
>
> should it be
> Fixes: 2d3feca8c44f ("bpf: btf: print map dump and lookup with btf info")
> ?
> Even back then btf__new() was doing the loading.
> So that btf support in bpftool was silently double loading btf.
>

ok, will add that to commit message

^ permalink raw reply

* Re: [Patch net v2] xfrm: destroy xfrm_state synchronously on net exit path
From: Steffen Klassert @ 2019-02-06  5:14 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, syzbot+e9aebef558e3ed673934
In-Reply-To: <20190131210549.3114-1-xiyou.wangcong@gmail.com>

On Thu, Jan 31, 2019 at 01:05:49PM -0800, Cong Wang wrote:
> xfrm_state_put() moves struct xfrm_state to the GC list
> and schedules the GC work to clean it up. On net exit call
> path, xfrm_state_flush() is called to clean up and
> xfrm_flush_gc() is called to wait for the GC work to complete
> before exit.
> 
> However, this doesn't work because one of the ->destructor(),
> ipcomp_destroy(), schedules the same GC work again inside
> the GC work. It is hard to wait for such a nested async
> callback. This is also why syzbot still reports the following
> warning:
> 
>  WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
>  ...
>   ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
>   cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
>   process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
>   worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
>   kthread+0x357/0x430 kernel/kthread.c:246
>   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
> 
> In fact, it is perfectly fine to bypass GC and destroy xfrm_state
> synchronously on net exit call path, because it is in process context
> and doesn't need a work struct to do any blocking work.
> 
> This patch introduces xfrm_state_put_sync() which simply bypasses
> GC, and lets its callers to decide whether to use this synchronous
> version. On net exit path, xfrm_state_fini() and
> xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
> blocking, it can use xfrm_state_put_sync() directly too.
> 
> Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
> reflect this change.
> 
> Fixes: b48c05ab5d32 ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
> Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks a lot!

^ permalink raw reply

* [PATCH bpf-next] tools/bpf: silence a libbpf unnecessary warning
From: Yonghong Song @ 2019-02-06  5:38 UTC (permalink / raw)
  To: Andrii Nakryiko, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Yonghong Song

Commit 96408c43447a ("tools/bpf: implement libbpf
btf__get_map_kv_tids() API function") refactored
function bpf_map_find_btf_info() and moved bulk of
implementation into btf.c as btf__get_map_kv_tids().
This change introduced a bug such that test_btf will
print out the following warning although the test passed:
  BTF libbpf test[2] (test_btf_nokv.o): libbpf: map:btf_map
      container_name:____btf_map_btf_map cannot be found
      in BTF. Missing BPF_ANNOTATE_KV_PAIR?

Previously, the error message is guarded with pr_debug().
Commit 96408c43447a changed it to pr_warning() and
hence caused the warning.

Restoring to pr_debug() for the message fixed the issue.

Fixes: 96408c43447a ("tools/bpf: implement libbpf btf__get_map_kv_tids() API function")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/btf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 3b3a2959d03a..ab6528c935a1 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -531,8 +531,8 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
 
 	container_id = btf__find_by_name(btf, container_name);
 	if (container_id < 0) {
-		pr_warning("map:%s container_name:%s cannot be found in BTF. Missing BPF_ANNOTATE_KV_PAIR?\n",
-			   map_name, container_name);
+		pr_debug("map:%s container_name:%s cannot be found in BTF. Missing BPF_ANNOTATE_KV_PAIR?\n",
+			 map_name, container_name);
 		return container_id;
 	}
 
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v2 bpf-next 2/2] btf: expose API to work with raw btf data
From: Andrii Nakryiko @ 2019-02-06  5:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Song Liu, Yonghong Song, Alexei Starovoitov,
	Martin Lau, netdev, Daniel Borkmann
In-Reply-To: <20190206030709.hrxaorkkmgaxn5au@ast-mbp>

On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> > This patch exposes two new APIs btf__get_raw_data_size() and
> > btf__get_raw_data() that allows to get a copy of raw BTF data out of
> > struct btf. This is useful for external programs that need to manipulate
> > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> > and then writing it back to file.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > Acked-by: Song Liu <songliubraving@fb.com>
> > ---
> >  tools/lib/bpf/btf.c      | 10 ++++++++++
> >  tools/lib/bpf/btf.h      |  2 ++
> >  tools/lib/bpf/libbpf.map |  2 ++
> >  3 files changed, 14 insertions(+)
> >
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 1c2ba7182400..34bfb3641aac 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
> >       return btf->fd;
> >  }
> >
> > +__u32 btf__get_raw_data_size(const struct btf *btf)
> > +{
> > +     return btf->data_size;
> > +}
> > +
> > +void btf__get_raw_data(const struct btf *btf, char *data)
> > +{
> > +     memcpy(data, btf->data, btf->data_size);
> > +}
>
> I cannot think of any other way to use this api,
> but to call btf__get_raw_data_size() first,
> then malloc that much memory and then call btf__get_raw_data()
> to store btf into it.
>
> If so, may be api should be single call that allocates, copies,
> and returns pointer to new mem and its size?
> Probably less error prone?
>

I don't have strong preference, but providing pointer to allocated memory
seems more flexible and allows more clever/optimal use of memory from caller
side. E.g., instead of doing two mallocs, you can imagine doing something
like:

int max_size = max(btf__get_raw_data_size(btf),
                   btf_ext__get_raw_data_size(btf_ext));
char *m = malloc(max_size);
btf__get_raw_data(btf, m);
dump_btf_section_to_file(m, some_file);
btf_ext__get_raw_data(btf_ext, m);
dump_btf_ext_section_to_file(m, some_file);
free(m);

Also, pointer to memory could be mmap()'ed file, for instance. In general,
for a library it might be a good thing to not be prescriptive as to how one
gets that piece of memory.

If those examples are not convincing enough, I'm happy to go with single
btf__get_raw_data() call doing allocation and returning pointer.

^ permalink raw reply

* Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
From: Stanislav Fomichev @ 2019-02-06  5:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Willem de Bruijn, Stanislav Fomichev, Network Development,
	David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn
In-Reply-To: <20190206041112.h3ccahimp2uerdfk@ast-mbp>

On 02/05, Alexei Starovoitov wrote:
> On Tue, Feb 05, 2019 at 07:56:19PM -0800, Stanislav Fomichev wrote:
> > On 02/05, Alexei Starovoitov wrote:
> > > On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > > > On 02/05, Alexei Starovoitov wrote:
> > > > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > > > On 02/05, Willem de Bruijn wrote:
> > > > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > > >
> > > > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > > > C-based flow dissector in this case.
> > > > > > > >
> > > > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > > > work in the skb-less case.
> > > > > > > >
> > > > > > > > The series goes like this:
> > > > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > > > >    initialize temporary skb
> > > > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > > > >    associated with an skb
> > > > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > > > >    plumb through the callers
> > > > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > > > 
> > > > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > > > for every packet in the device driver receive routine, *independent*
> > > > > > > from the real skb allocation and zeroing which will likely happen
> > > > > > > later.
> > > > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > > > 
> > > > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > > > on-stack skb).
> > > > > 
> > > > > imo extra 256 byte memset for every packet is non starter.
> > > > We can put pre-allocated/initialized skbs without data into percpu or even
> > > > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > > > about having multiple percpu for irq/softirq/process contexts.
> > > > Any concerns with that approach?
> > > > Any other possible concerns with the overall series?
> > > 
> > > I'm missing why the whole thing is needed.
> > > You're saying:
> > > " make BPF flow dissector programs be able to work in the skb-less case".
> > > What does it mean specifically?
> > > The only non-skb case is XDP.
> > > Are you saying you want flow_dissector prog to be run in XDP?
> > eth_get_headlen that drivers call on RX path on a chunk of data to
> > guesstimate the length of the headers calls flow dissector without an skb
> > (__skb_flow_dissect was a weird interface where it accepts skb or
> > data+len). Right now, there is no way to trigger BPF flow dissector
> > for this case (we don't have an skb to get associated namespace/etc/etc).
> > The patch series tries to fix that to make sure that we always trigger
> > BPF program if it's attached to a device's namespace.
> 
> then why not to create flow_dissector prog type that works without skb?
> Why do you need to fake an skb?
> XDP progs work just fine without it.
What's the advantage of having another prog type? In this case we would have
to write the same flow dissector program twice: first time against __skb_buff
interface, second time against xdp_md.
By using fake skb, we make the same flow dissector __sk_buff BPF program
work in both contexts without a rewrite to an xdp interface (I don't
think users should care whether flow dissector was called form "xdp" vs skb
context; and we're sort of stuck with __sk_buff interface already).

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] tools/bpf: add log_level to bpf_load_program_attr
From: Yonghong Song @ 2019-02-06  5:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev@vger.kernel.org, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team
In-Reply-To: <20190206022640.lnhpp6ebxmjvizo6@ast-mbp>



On 2/5/19 6:26 PM, Alexei Starovoitov wrote:
> On Tue, Feb 05, 2019 at 11:48:23AM -0800, Yonghong Song wrote:
>> The kernel verifier has three levels of logs:
>>      0: no logs
>>      1: logs mostly useful
>>    > 1: verbose
>>
>> Current libbpf API functions bpf_load_program_xattr() and
>> bpf_load_program() cannot specify log_level.
>> The bcc, however, provides an interface for user to
>> specify log_level 2 for verbose output.
>>
>> This patch added log_level into structure
>> bpf_load_program_attr, so users, including bcc, can use
>> bpf_load_program_xattr() to change log_level.
>>
>> The bpf selftest test_sock.c is modified to enable log_level = 2.
>> If the "verbose" in test_sock.c is changed to true,
>> the test will output logs like below:
>>    $ ./test_sock
>>    func#0 @0
>>    0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
>>    0: (bf) r6 = r1
>>    1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
>>    1: (61) r7 = *(u32 *)(r6 +28)
>>    invalid bpf_context access off=28 size=4
>>
>>    Test case: bind4 load with invalid access: src_ip6 .. [PASS]
>>    ...
>>    Test case: bind6 allow all .. [PASS]
>>    Summary: 16 PASSED, 0 FAILED
>>
>> Some test_sock tests are negative tests and verbose verifier
>> log will be printed out as shown in the above.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   tools/lib/bpf/bpf.c                     | 4 +++-
>>   tools/lib/bpf/bpf.h                     | 1 +
>>   tools/testing/selftests/bpf/test_sock.c | 9 ++++++++-
>>   3 files changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 3defad77dc7a..78aa8c2b1a5c 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -214,6 +214,7 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>   {
>>   	void *finfo = NULL, *linfo = NULL;
>>   	union bpf_attr attr;
>> +	__u32 log_level;
>>   	__u32 name_len;
>>   	int fd;
>>   
>> @@ -292,7 +293,8 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>   	/* Try again with log */
>>   	attr.log_buf = ptr_to_u64(log_buf);
>>   	attr.log_size = log_buf_sz;
>> -	attr.log_level = 1;
>> +	log_level = load_attr->log_level;
>> +	attr.log_level = (log_level >= 2) ? log_level : 1;
>>   	log_buf[0] = 0;
>>   	fd = sys_bpf_prog_load(&attr, sizeof(attr));
>>   done:
>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index ed09eed2dc3b..15a8e22e8eae 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -76,6 +76,7 @@ struct bpf_load_program_attr {
>>   	const struct bpf_insn *insns;
>>   	size_t insns_cnt;
>>   	const char *license;
>> +	__u32 log_level;
>>   	__u32 kern_version;
>>   	__u32 prog_ifindex;
>>   	__u32 prog_btf_fd;
> 
> this will break binary compatibility in libbpf api.
> Please add new fields always to the end of *_attr structs.

I felt that we were still at 0.0.x stage and still could tweak
the api a little bit so I used the above sequence to
mimic the kernel prog_load attr sequence. I do agree that
in general we should add to the end of data structure.
I can change to the end of structure we still decided
if exposing log_level=2 is needed.

> 
> Also why not to silence bcc instead?
> Let it treat log_level > 1 as log_level=1
> I don't think anyone but the most extreme verifier hacker used level=2.

In general, that is true. I just used the feature a couple of
weeks ago to toubleshoot a verifier error... But most cases
log_level=1 should be sufficient.

> Personally I don't remember when was the last time I used it.
> It seem like a niche feature that we can safely remove in bcc.
Will discuss a little more in bcc community and then make
a decision.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox