Netdev List
* [PATCH RESEND net-next v2 0/8] net: stmmac: dwmac-sun8i: Support R40
From: Chen-Yu Tsai @ 2018-05-13 19:14 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: Chen-Yu Tsai, linux-arm-kernel, devicetree, netdev,
	Corentin Labbe, Icenowy Zheng, Maxime Ripard, Rob Herring

This is a resend of the patches for net-next split out from my R40
Ethernet support v2 series, as requested by David Miller. The arm-soc
bits will follow, once I rework the A64 system controller compatible.

Patches 1, 2, and 3 clean up the dwmac-sun8i binding.

Patch 4 adds device tree binding for Allwinner R40's Ethernet
controller.

Patch 5 converts regmap access of the syscon region in the dwmac-sun8i
driver to regmap_field, in anticipation of different field widths on
the R40.

Patch 6 introduces custom plumbing in the dwmac-sun8i driver to fetch
a regmap from another device, by looking up said device via a phandle,
then getting the regmap associated with that device.

Patch 7 adds support for different or absent TX/RX delay chain ranges
to the dwmac-sun8i driver.

Patch 8 adds support for the R40's ethernet controller.


Excerpt from original cover letter:

Changes since v1:

  - Default to fetching regmap from device pointed to by syscon phandle,
    and falling back to syscon API if that fails.

  - Dropped .syscon_from_dev field in device data as a result of the
    previous change.

  - Added a large comment block explaining the first change.

  - Simplified description of syscon property in sun8i-dwmac binding.

  - Regmap now only exposes the EMAC/GMAC register, but retains the
    offset within its address space.

  - Added patches for A64, which reuse the same sun8i-dwmac changes.

This series adds support for the DWMAC based Ethernet controller found
on the Allwinner R40 SoC. The controller is either a DWMAC clone or
DWMAC core with its registers rearranged. This is already supported by
the dwmac-sun8i driver. The glue layer control register, unlike on other
sun8i family SoCs, is not in the system controller region, but in the
clock control unit, as with the older A20 and A31 SoCs.

While we reuse the bindings for dwmac-sun8i using a syscon phandle
reference, we need some custom plumbing for the clock driver to export
a regmap that only allows access to the GMAC register to the dwmac-sun8i
driver. An alternative would be to allow drivers to register custom
syscon devices with their own regmap and locking.


Please have a look.

Regards
ChenYu

Chen-Yu Tsai (8):
  dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptions
  dt-bindings: net: dwmac-sun8i: Sort syscon compatibles by alphabetical
    order
  dt-bindings: net: dwmac-sun8i: simplify description of syscon property
  dt-bindings: net: dwmac-sun8i: Add binding for GMAC on Allwinner R40
    SoC
  net: stmmac: dwmac-sun8i: Use regmap_field for syscon register access
  net: stmmac: dwmac-sun8i: Allow getting syscon regmap from external
    device
  net: stmmac: dwmac-sun8i: Support different ranges for TX/RX delay
    chains
  net: stmmac: dwmac-sun8i: Add support for GMAC on Allwinner R40 SoC

 .../devicetree/bindings/net/dwmac-sun8i.txt   |  21 +--
 .../net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 139 +++++++++++++++---
 2 files changed, 130 insertions(+), 30 deletions(-)

-- 
2.17.0
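
The motivation for patch 5 (field-based register access, so that one code
path can cope with different field widths) can be illustrated with a small
userspace sketch. This is not the driver code and does not use the kernel
regmap_field API; the struct and function names are hypothetical, and the
bit position and widths in the usage note are chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical description of a bit field inside one 32-bit register. */
struct demo_reg_field {
	unsigned int lsb;   /* lowest bit of the field */
	unsigned int width; /* number of bits, 1..31 in this sketch */
};

/* Mask covering the field at its position in the register. */
static uint32_t demo_field_mask(struct demo_reg_field f)
{
	return ((UINT32_C(1) << f.width) - 1) << f.lsb;
}

/*
 * Return the register value with the field replaced by val.
 * Bits of val that do not fit in the field are discarded.
 */
uint32_t demo_field_write(uint32_t reg, struct demo_reg_field f, uint32_t val)
{
	uint32_t mask = demo_field_mask(f);

	return (reg & ~mask) | ((val << f.lsb) & mask);
}

/* Extract the field value from a register value. */
uint32_t demo_field_read(uint32_t reg, struct demo_reg_field f)
{
	return (reg & demo_field_mask(f)) >> f.lsb;
}
```

With this shape, a per-SoC table only has to supply a different
demo_reg_field (for example a 5-bit field versus a 3-bit field at the same
position), and the read/write helpers stay unchanged.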


* [PATCH RESEND net-next v2 1/8] dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptions
From: Chen-Yu Tsai @ 2018-05-13 19:14 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: Chen-Yu Tsai, linux-arm-kernel, devicetree, netdev,
	Corentin Labbe, Icenowy Zheng, Maxime Ripard, Rob Herring
In-Reply-To: <20180513191425.9801-1-wens@csie.org>

The clock delay chains found in the glue layer for dwmac-sun8i are only
used with RGMII PHYs. They are not intended for non-RGMII PHYs, such as
MII external PHYs or the internal PHY. Also, a recent SoC has a smaller
range of possible values for the delay chain.

This patch reformats the delay chain section of the device tree binding
to make it clear that the delay chains only apply to RGMII PHYs, and
make it easier to add the R40-specific bits later.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
---
 Documentation/devicetree/bindings/net/dwmac-sun8i.txt | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index 3d6d5fa0c4d5..e04ce75e24a3 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -28,10 +28,13 @@ Required properties:
   - allwinner,sun8i-a83t-system-controller
 
 Optional properties:
-- allwinner,tx-delay-ps: TX clock delay chain value in ps. Range value is 0-700. Default is 0)
-- allwinner,rx-delay-ps: RX clock delay chain value in ps. Range value is 0-3100. Default is 0)
-Both delay properties need to be a multiple of 100. They control the delay for
-external PHY.
+- allwinner,tx-delay-ps: TX clock delay chain value in ps.
+			 Range is 0-700. Default is 0.
+- allwinner,rx-delay-ps: RX clock delay chain value in ps.
+			 Range is 0-3100. Default is 0.
+Both delay properties need to be a multiple of 100. They control the
+clock delay for external RGMII PHY. They do not apply to the internal
+PHY or external non-RGMII PHYs.
 
 Optional properties for the following compatibles:
   - "allwinner,sun8i-h3-emac",
-- 
2.17.0


* [PATCH RESEND net-next v2 4/8] dt-bindings: net: dwmac-sun8i: Add binding for GMAC on Allwinner R40 SoC
From: Chen-Yu Tsai @ 2018-05-13 19:14 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: Chen-Yu Tsai, linux-arm-kernel, devicetree, netdev,
	Corentin Labbe, Icenowy Zheng, Maxime Ripard, Rob Herring
In-Reply-To: <20180513191425.9801-1-wens@csie.org>

The Allwinner R40 SoC has the EMAC controller supported by dwmac-sun8i.
It is named "GMAC", while EMAC refers to the 10/100 Mbps Ethernet
controller supported by sun4i-emac. The controller is the same, but
the R40 has the glue layer controls in the clock control unit (CCU),
with a reduced RX delay chain, and no TX delay chain.

This patch adds the R40 specific bits to the dwmac-sun8i binding.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
---
 Documentation/devicetree/bindings/net/dwmac-sun8i.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index 1c0906a5c02b..cfe724398a12 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -7,6 +7,7 @@ Required properties:
 - compatible: must be one of the following string:
 		"allwinner,sun8i-a83t-emac"
 		"allwinner,sun8i-h3-emac"
+		"allwinner,sun8i-r40-gmac"
 		"allwinner,sun8i-v3s-emac"
 		"allwinner,sun50i-a64-emac"
 - reg: address and length of the register for the device.
@@ -25,8 +26,10 @@ Required properties:
 Optional properties:
 - allwinner,tx-delay-ps: TX clock delay chain value in ps.
 			 Range is 0-700. Default is 0.
+			 Unavailable for allwinner,sun8i-r40-gmac
 - allwinner,rx-delay-ps: RX clock delay chain value in ps.
 			 Range is 0-3100. Default is 0.
+			 Range is 0-700 for allwinner,sun8i-r40-gmac
 Both delay properties need to be a multiple of 100. They control the
 clock delay for external RGMII PHY. They do not apply to the internal
 PHY or external non-RGMII PHYs.
-- 
2.17.0
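
The constraints stated in the binding (multiples of 100 ps, a per-SoC
maximum, and no TX delay chain on the R40) could be checked along the
following lines. This is a rough userspace sketch, not the driver's
implementation; all names are hypothetical, and treating a maximum of 0 as
"unavailable" is an assumption made for the example.

```c
#include <errno.h>

/*
 * Hypothetical per-SoC limits taken from the binding text above.
 * A maximum of 0 means the delay chain is unavailable.
 */
struct demo_delay_limits {
	unsigned int tx_max_ps;
	unsigned int rx_max_ps;
};

const struct demo_delay_limits demo_default_limits = { 700, 3100 };
const struct demo_delay_limits demo_r40_limits = { 0, 700 };

/*
 * Convert a delay in ps to 100 ps register steps.
 * Returns the step count, or -EINVAL if the value is not a multiple of
 * 100 or exceeds max_ps (a maximum of 0 only accepts the default of 0).
 */
int demo_delay_ps_to_steps(unsigned int delay_ps, unsigned int max_ps)
{
	if (delay_ps % 100 != 0)
		return -EINVAL;
	if (delay_ps > max_ps)
		return -EINVAL;
	return (int)(delay_ps / 100);
}
```

For example, an rx delay of 3100 ps is accepted against demo_default_limits
but rejected against demo_r40_limits, and any non-zero tx delay is rejected
for the R40.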


* [PATCH RESEND net-next v2 3/8] dt-bindings: net: dwmac-sun8i: simplify description of syscon property
From: Chen-Yu Tsai @ 2018-05-13 19:14 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: Chen-Yu Tsai, linux-arm-kernel, devicetree, netdev,
	Corentin Labbe, Icenowy Zheng, Maxime Ripard, Rob Herring
In-Reply-To: <20180513191425.9801-1-wens@csie.org>

The syscon property is used to point to the device that holds the glue
layer control register known as the "EMAC (or GMAC) clock register".

We do not need to explicitly list what compatible strings are needed, as
this information is readily available in the user manuals. Also the
"syscon" device type is more of an implementation detail. There are many
ways to access a register not in a device's address range, the syscon
interface being the most generic and unrestricted one.

Simplify the description so that it says what it is supposed to
describe.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
---
 Documentation/devicetree/bindings/net/dwmac-sun8i.txt | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index 1b8e33e71651..1c0906a5c02b 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -20,12 +20,7 @@ Required properties:
 - phy-handle: See ethernet.txt
 - #address-cells: shall be 1
 - #size-cells: shall be 0
-- syscon: A phandle to the syscon of the SoC with one of the following
- compatible string:
-  - allwinner,sun8i-a83t-system-controller
-  - allwinner,sun8i-h3-system-controller
-  - allwinner,sun8i-v3s-system-controller
-  - allwinner,sun50i-a64-system-controller
+- syscon: A phandle to the device containing the EMAC or GMAC clock register
 
 Optional properties:
 - allwinner,tx-delay-ps: TX clock delay chain value in ps.
-- 
2.17.0


* [PATCH RESEND net-next v2 2/8] dt-bindings: net: dwmac-sun8i: Sort syscon compatibles by alphabetical order
From: Chen-Yu Tsai @ 2018-05-13 19:14 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: devicetree, Maxime Ripard, netdev, Chen-Yu Tsai, Rob Herring,
	Corentin Labbe, linux-arm-kernel, Icenowy Zheng
In-Reply-To: <20180513191425.9801-1-wens@csie.org>

The A83T syscon compatible was appended to the syscon compatibles list,
instead of being inserted in the position that preserves the ordering.

Move it to the proper place to keep the list sorted.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
---
 Documentation/devicetree/bindings/net/dwmac-sun8i.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index e04ce75e24a3..1b8e33e71651 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -22,10 +22,10 @@ Required properties:
 - #size-cells: shall be 0
 - syscon: A phandle to the syscon of the SoC with one of the following
  compatible string:
+  - allwinner,sun8i-a83t-system-controller
   - allwinner,sun8i-h3-system-controller
   - allwinner,sun8i-v3s-system-controller
   - allwinner,sun50i-a64-system-controller
-  - allwinner,sun8i-a83t-system-controller
 
 Optional properties:
 - allwinner,tx-delay-ps: TX clock delay chain value in ps.
-- 
2.17.0


* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Thomas Gleixner @ 2018-05-13 18:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Borislav Petkov, Peter Zijlstra, Yonghong Song, Ingo Molnar,
	Linus Torvalds, Alexei Starovoitov, Daniel Borkmann, LKML, X86 ML,
	Network Development, Kernel Team
In-Reply-To: <20180513174348.eoh2lrhzqbmqb5nc@ast-mbp>

On Sun, 13 May 2018, Alexei Starovoitov wrote:
> On Sat, May 12, 2018 at 10:30:02PM +0200, Thomas Gleixner wrote:
> > But yes, the situation is slightly different here because tools which
> > create trace event magic _HAVE_ to pull in kernel headers. At the same time
> > these tools depend on a compiler which failed to implement asm_goto for
> > fricking 8 years.
> 
> As a maintainer of a piece of llvm codebase I have to say that
> this bullying tactic has the opposite effect.

I'm not bullying at all. It's a fact that the discussion about asm goto has
been dragging on for 8 years now. We've stayed away from mandating it for quite
some time, but at some point it just doesn't make sense anymore.

> The inline asm is processed by gcc and llvm very differently.  gcc is
> leaking internal backend implementation details into inline asm
> syntax. It makes little sense for llvm to do the same, since compiler
> codegen is completely different. gcc doesn't have integrated assembler
> whereas llvm not only can parse asm, but can potentially optimize it as
> well.  Instead of demanding asm-goto that matches gcc one to one it's
> better to work with the community to define the syntax that works for
> both kernel and llvm.

Come on, we surely are open for discussions, but what I've seen so far is
just 'oh we can't do this because' instead of a sane proposal how it can be
done w/o rewriting the whole ASM GOTO stuff in the kernel or even
duplicating it.

> > + * Workaround for the sake of BPF compilation which utilizes kernel
> > + * headers, but clang does not support ASM GOTO and fails the build.
> > + */
> > +#ifndef __BPF__
> > +#warning "Compiler lacks ASM_GOTO support. Add -D __BPF__ to your compiler arguments"
> > +#endif
> 
> Agree.
> The warning makes sense to me, but it has to be different macro name.
> How about -D__BPF_TRACING__ or -D__BPF_KPROBES__ or something similar ?

Fair enough.

> Such name will also make it clear that only tracing bpf programs
> need this. Networking programs shouldn't be including kernel headers.
> There was never a need, but since the tracing progs are often used
> as an example people copy paste makefiles too.
> We tried to document it as much as possible, but people still use
> 'clang -target native -I/kernel/includes bpf_prog.c -emit-llvm | llc -march=bpf'
> in their builds.
> (sometimes as a workaround for setups where clang is older version,
> but llc/llvm is new)
> Now they will see this warning and it will force them to think whether
> they actually need the kernel headers.

Makes sense.

> > +
> > +#define static_cpu_has(bit)		boot_cpu_has(bit)
> > +
> > +#else
> > +
> >  /*
> >   * Static testing of CPU features.  Used the same as boot_cpu_has().
> >   * These will statically patch the target code for additional
> > @@ -195,6 +209,7 @@ static __always_inline __pure bool _stat
> >  		boot_cpu_has(bit) :				\
> >  		_static_cpu_has(bit)				\
> >  )
> > +#endif
> >  
> >  #define cpu_has_bug(c, bit)		cpu_has(c, (bit))
> >  #define set_cpu_bug(c, bit)		set_cpu_cap(c, (bit))
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -255,7 +255,7 @@ verify_target_bpf: verify_cmds
> >  $(obj)/%.o: $(src)/%.c
> >  	$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
> >  		-I$(srctree)/tools/testing/selftests/bpf/ \
> > -		-D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
> > +		-D__KERNEL__ -D__BPF__ -Wno-unused-value -Wno-pointer-sign \
> 
> Yep. In samples/bpf and libbcc we can selectively add -D__BPF_TRACING__
> I think sysdig and other folks can live with that as well.
> Agree?

Sure. Care to send an updated patch?

Thanks,

	tglx


* [PATCH net] qede: Fix ref-cnt usage count
From: Michal Kalderon @ 2018-05-13 17:54 UTC (permalink / raw)
  To: michal.kalderon, davem
  Cc: netdev, linux-rdma, chad.dupuis, Michal Kalderon, Ariel Elior

Rebooting while qedr is loaded with a VLAN interface present
results in unregister_netdevice waiting for the usage count
to drop to zero.
The fix is to remove the rdma devices before unregistering
the netdevice, to ensure all references to ndev are released.

Fixes: cee9fbd8e2e9 ("qede: Add qedr framework")
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index a01e7d6..f6655e2 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1066,13 +1066,12 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 
 	DP_INFO(edev, "Starting qede_remove\n");
 
+	qede_rdma_dev_remove(edev);
 	unregister_netdev(ndev);
 	cancel_delayed_work_sync(&edev->sp_task);
 
 	qede_ptp_disable(edev);
 
-	qede_rdma_dev_remove(edev);
-
 	edev->ops->common->set_power_state(cdev, PCI_D0);
 
 	pci_set_drvdata(pdev, NULL);
-- 
2.9.5
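
The ordering problem described in the commit message can be modelled in a
few lines of userspace C. This is only a sketch under simplifying
assumptions: the structs and functions are hypothetical stand-ins, the RDMA
device is assumed to hold exactly one reference on the netdev, and the
unregister step reports failure instead of waiting as the kernel would.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the netdev and the RDMA device on top of it. */
struct demo_netdev {
	int refcnt;      /* references held by other subsystems */
	bool registered;
};

struct demo_rdma_dev {
	struct demo_netdev *ndev; /* non-NULL while a reference is held */
};

void demo_rdma_dev_add(struct demo_rdma_dev *rdma, struct demo_netdev *ndev)
{
	rdma->ndev = ndev;
	ndev->refcnt++;
}

void demo_rdma_dev_remove(struct demo_rdma_dev *rdma)
{
	if (rdma->ndev) {
		rdma->ndev->refcnt--;
		rdma->ndev = NULL;
	}
}

/*
 * Returns true if unregistration completes. The real unregister path
 * waits for the count to drop; this sketch returns false instead.
 */
bool demo_unregister_netdev(struct demo_netdev *ndev)
{
	if (ndev->refcnt != 0)
		return false;
	ndev->registered = false;
	return true;
}

/* Run the teardown in either order and report whether it completes. */
bool demo_remove(bool rdma_first)
{
	struct demo_netdev ndev = { 0, true };
	struct demo_rdma_dev rdma = { NULL };
	bool done;

	demo_rdma_dev_add(&rdma, &ndev);
	if (rdma_first) {
		demo_rdma_dev_remove(&rdma);
		done = demo_unregister_netdev(&ndev);
	} else {
		done = demo_unregister_netdev(&ndev);
		demo_rdma_dev_remove(&rdma);
	}
	return done;
}
```

In this model demo_remove(true) corresponds to the order after the patch and
completes, while demo_remove(false) corresponds to the old order and gets
stuck at the unregister step.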


* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Alexei Starovoitov @ 2018-05-13 17:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, Peter Zijlstra, Yonghong Song, Ingo Molnar,
	Linus Torvalds, Alexei Starovoitov, Daniel Borkmann, LKML, X86 ML,
	Network Development, Kernel Team
In-Reply-To: <alpine.DEB.2.21.1805122110230.1582@nanos.tec.linutronix.de>

On Sat, May 12, 2018 at 10:30:02PM +0200, Thomas Gleixner wrote:
> On Sat, 12 May 2018, Alexei Starovoitov wrote:
> > On Thu, May 10, 2018 at 10:58 AM, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > I see no option, but to fix the kernel.
> > > Regardless whether it's called user space breakage or kernel breakage.
> 
> There is a big difference. If you are abusing a kernel internal header in a
> user space tool, then there is absolutely ZERO excuse for requesting that
> the header in question has to be modified.
> 
> But yes, the situation is slightly different here because tools which
> create trace event magic _HAVE_ to pull in kernel headers. At the same time
> these tools depend on a compiler which failed to implement asm_goto for
> fricking 8 years.

As a maintainer of a piece of llvm codebase I have to say that
this bullying tactic has the opposite effect.
The inline asm is processed by gcc and llvm very differently.
gcc is leaking internal backend implementation details into inline asm
syntax. It makes little sense for llvm to do the same, since compiler
codegen is completely different. gcc doesn't have integrated assembler
whereas llvm not only can parse asm, but can potentially optimize it as well.
Instead of demanding asm-goto that matches gcc one to one it's better
to work with the community to define the syntax that works for both
kernel and llvm.

> So while Boris is right, that nothing has to fiddle with a kernel only
> header, I grumpily agree with you that we need a workaround in the kernel
> for this particular issue.
> 
> > could you please ack the patch or better yet take it into tip tree
> > and send to Linus asap ?
> 
> Nope. The patch is a horrible hack.
> 
> Why the heck do we need that extra fugly define? That has exactly zero
> value simply because we already have a define which denotes availability of
> ASM GOTO: CC_HAVE_ASM_GOTO.

I agree. That's why the v1 patch that was using CC_HAVE_ASM_GOTO was better:
https://patchwork.kernel.org/patch/10333829/
I'm fine on adding a warning to it though.

> In case of samples/bpf/ and libbcc the compile does not go through the
> arch/x86 Makefile which stops the build anyway when ASM_GOTO is
> missing. Those builds merely pull in the headers and have their own build
> magic, which is broken btw: Changing a kernel header which gets pulled into
> the build does not rebuild anything in samples/bpf. Qualitee..
> 
> So we can just use CC_HAVE_ASM_GOTO and be done with it.
> 
> But we also want the tools which need this to be aware of this. Peter
> requested -D __BPF__ several times which got ignored. It's not too much of
> a request to add that.

quite the opposite.
It was explained already why -D__BPF__ makes little sense.
It's like saying that -D__arm__ has to be specified in command line.

clang automatically adds -D__arm__ when '-target arm' is used
and adds -D__BPF__ when '-target bpf' is used.
For samples/bpf, libbcc and other cases we have to use -target native.
If we do '-target native -D__BPF__' that's just like trying to compile
kernel headers with '-target x86 -D__arm__'. Absurd.

> Find a patch which does exactly this for samples/bpf, but also allows other
> tools to build with a warning emitted so they get fixed.

agree

> Thanks,
> 
> 	tglx
> 
> 8<----------------
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -140,6 +140,20 @@ extern void clear_cpu_cap(struct cpuinfo
>  
>  #define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit)
>  
> +#ifndef CC_HAVE_ASM_GOTO
> +
> +/*
> + * Workaround for the sake of BPF compilation which utilizes kernel
> + * headers, but clang does not support ASM GOTO and fails the build.
> + */
> +#ifndef __BPF__
> +#warning "Compiler lacks ASM_GOTO support. Add -D __BPF__ to your compiler arguments"
> +#endif

Agree.
The warning makes sense to me, but it has to be different macro name.
How about -D__BPF_TRACING__ or -D__BPF_KPROBES__ or something similar ?
Such name will also make it clear that only tracing bpf programs
need this. Networking programs shouldn't be including kernel headers.
There was never a need, but since the tracing progs are often used
as an example people copy paste makefiles too.
We tried to document it as much as possible, but people still use
'clang -target native -I/kernel/includes bpf_prog.c -emit-llvm | llc -march=bpf'
in their builds.
(sometimes as a workaround for setups where clang is older version,
but llc/llvm is new)
Now they will see this warning and it will force them to think whether
they actually need the kernel headers.

> +
> +#define static_cpu_has(bit)		boot_cpu_has(bit)
> +
> +#else
> +
>  /*
>   * Static testing of CPU features.  Used the same as boot_cpu_has().
>   * These will statically patch the target code for additional
> @@ -195,6 +209,7 @@ static __always_inline __pure bool _stat
>  		boot_cpu_has(bit) :				\
>  		_static_cpu_has(bit)				\
>  )
> +#endif
>  
>  #define cpu_has_bug(c, bit)		cpu_has(c, (bit))
>  #define set_cpu_bug(c, bit)		set_cpu_cap(c, (bit))
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -255,7 +255,7 @@ verify_target_bpf: verify_cmds
>  $(obj)/%.o: $(src)/%.c
>  	$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
>  		-I$(srctree)/tools/testing/selftests/bpf/ \
> -		-D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
> +		-D__KERNEL__ -D__BPF__ -Wno-unused-value -Wno-pointer-sign \

Yep. In samples/bpf and libbcc we can selectively add -D__BPF_TRACING__
I think sysdig and other folks can live with that as well.
Agree?
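
For reference, the preprocessor pattern under discussion could look roughly
like the following standalone sketch. It assumes the macro ends up being
called __BPF_TRACING__, which is only one of the names floated above, and it
replaces boot_cpu_has() with a hypothetical bitmap lookup so that the
fragment compiles outside the kernel.

```c
#include <stdbool.h>

/*
 * Stand-in for passing -D__BPF_TRACING__ on the compiler command line.
 * The macro name is one of the proposals in this thread, not a settled API.
 */
#ifndef __BPF_TRACING__
#define __BPF_TRACING__ 1
#endif

/* Hypothetical replacement for the kernel's boot_cpu_has(): a plain bitmap. */
static unsigned long demo_cpu_caps = 0x5UL; /* feature bits 0 and 2 set */
#define boot_cpu_has(bit)	((demo_cpu_caps >> (bit)) & 1UL)

#ifndef CC_HAVE_ASM_GOTO
/* Only tools that did not opt in get the warning. */
# ifndef __BPF_TRACING__
#  warning "Compiler lacks ASM_GOTO support. Add -D__BPF_TRACING__ to your compiler arguments"
# endif
/* No asm goto available: fall back to the dynamic test. */
# define static_cpu_has(bit)	boot_cpu_has(bit)
#else
/* The kernel would use the patched _static_cpu_has() here; the sketch
 * keeps the dynamic test so that it stays self-contained. */
# define static_cpu_has(bit)	boot_cpu_has(bit)
#endif

/* Example caller: works the same whichever branch was selected. */
bool demo_feature_enabled(int bit)
{
	return static_cpu_has(bit) != 0;
}
```

A build that neither has CC_HAVE_ASM_GOTO nor passes the opt-in define would
still compile, but would print the warning, which is the behaviour both
sides agreed on.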


* [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
From: Alban Crequy @ 2018-05-13 17:33 UTC (permalink / raw)
  To: netdev, linux-kernel, containers, cgroups; +Cc: Alban Crequy

From: Alban Crequy <alban@kinvolk.io>

bpf_get_current_cgroup_ino() allows BPF trace programs to get the inode
of the cgroup where the current process resides.

My use case is to get statistics about syscalls done by a specific
Kubernetes container. I have a tracepoint on raw_syscalls/sys_enter and
a BPF map containing the cgroup inode that I want to trace. I use
bpf_get_current_cgroup_ino() and I quickly return from the tracepoint if
the inode is not in the BPF hash map.

Without this BPF helper, I would need to keep track of all pids in the
container. The Netlink proc connector can be used to follow process
creation and destruction but it is racy.

This patch only looks at the memory cgroup, which was enough for me
since each Kubernetes container is placed in a different mem cgroup.
For a generic implementation, I'm not sure how to proceed: it seems I
would need to use 'for_each_root(root)' (see example in
proc_cgroup_show() from kernel/cgroup/cgroup.c) but I don't know if
taking the cgroup mutex is possible in the BPF helper function. It might
be ok in the tracepoint raw_syscalls/sys_enter but could the mutex
already be taken in some other tracepoints?

Signed-off-by: Alban Crequy <alban@kinvolk.io>
---
 include/uapi/linux/bpf.h | 11 ++++++++++-
 kernel/trace/bpf_trace.c | 25 +++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec89732a8d..38ac3959cdf3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -755,6 +755,14 @@ union bpf_attr {
  *     @addr: pointer to struct sockaddr to bind socket to
  *     @addr_len: length of sockaddr structure
  *     Return: 0 on success or negative error code
+ *
+ * u64 bpf_get_current_cgroup_ino(hierarchy, flags)
+ *     Get the cgroup{1,2} inode of current task under the specified hierarchy.
+ *     @hierarchy: cgroup hierarchy
+ *     @flags: reserved for future use
+ *     Return:
+ *       == 0 error
+ *        > 0 inode of the cgroup
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -821,7 +829,8 @@ union bpf_attr {
 	FN(msg_apply_bytes),		\
 	FN(msg_cork_bytes),		\
 	FN(msg_pull_data),		\
-	FN(bind),
+	FN(bind),			\
+	FN(get_current_cgroup_ino),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 56ba0f2a01db..9bf92a786639 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -524,6 +524,29 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
+{
+	/* TODO: pick the correct hierarchy instead of the mem controller */
+	struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
+
+	if (unlikely(!cgrp))
+		return 0;
+	if (unlikely(hierarchy))
+		return 0;
+	if (unlikely(flags))
+		return 0;
+
+	return cgrp->kn->id.ino;
+}
+
+static const struct bpf_func_proto bpf_get_current_cgroup_ino_proto = {
+	.func           = bpf_get_current_cgroup_ino,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type      = ARG_DONTCARE,
+	.arg2_type      = ARG_DONTCARE,
+};
+
 static const struct bpf_func_proto *
 tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -564,6 +587,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_probe_read_str:
 		return &bpf_probe_read_str_proto;
+	case BPF_FUNC_get_current_cgroup_ino:
+		return &bpf_get_current_cgroup_ino_proto;
 	default:
 		return NULL;
 	}
-- 
2.14.3
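
The filtering described in the use case (return early unless the current
cgroup inode is in a map) can be sketched in plain userspace C as below.
This is not a BPF program and does not use BPF maps; the set type, its
capacity and all function names are hypothetical. It relies only on the
documented convention that the helper returns 0 on error and a non-zero
inode otherwise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 64 /* power of two; hypothetical capacity */

/*
 * Tiny open-addressing set of cgroup inode numbers. 0 marks an empty
 * slot, matching the convention that 0 is never a valid inode.
 */
struct demo_ino_set {
	uint64_t slot[DEMO_SLOTS];
};

/* Multiplicative hash; the top 6 bits give an index in 0..63. */
static size_t demo_hash(uint64_t ino)
{
	return (size_t)((ino * UINT64_C(0x9E3779B97F4A7C15)) >> 58);
}

/* Add an inode to the set. Returns false for inode 0 or a full set. */
bool demo_ino_set_add(struct demo_ino_set *s, uint64_t ino)
{
	size_t i, h;

	if (ino == 0)
		return false;
	h = demo_hash(ino);
	for (i = 0; i < DEMO_SLOTS; i++) {
		size_t idx = (h + i) % DEMO_SLOTS;

		if (s->slot[idx] == 0 || s->slot[idx] == ino) {
			s->slot[idx] = ino;
			return true;
		}
	}
	return false;
}

/* Decide whether an event from the cgroup with this inode is of interest. */
bool demo_should_trace(const struct demo_ino_set *s, uint64_t ino)
{
	size_t i, h;

	if (ino == 0) /* the helper reported an error */
		return false;
	h = demo_hash(ino);
	for (i = 0; i < DEMO_SLOTS; i++) {
		uint64_t v = s->slot[(h + i) % DEMO_SLOTS];

		if (v == ino)
			return true;
		if (v == 0)
			return false;
	}
	return false;
}
```

In the tracepoint handler, the value passed as ino would be the result of
bpf_get_current_cgroup_ino(), and the handler would return immediately when
demo_should_trace() is false.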


* KMSAN: uninit-value in tipc_conn_rcv_sub
From: syzbot @ 2018-05-13 16:38 UTC (permalink / raw)
  To: davem, jon.maloy, linux-kernel, netdev, syzkaller-bugs,
	tipc-discussion, ying.xue

Hello,

syzbot found the following crash on:

HEAD commit:    74ee2200b89f kmsan: bump .config.example to v4.17-rc3
git tree:       https://github.com/google/kmsan.git/master
console output: https://syzkaller.appspot.com/x/log.txt?x=12ab8637800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4ca1e57bafa8ab1f
dashboard link: https://syzkaller.appspot.com/bug?extid=8951a3065ee7fd6d6e23
compiler:       clang version 7.0.0 (trunk 329391)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=15a497f7800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=177c1907800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8951a3065ee7fd6d6e23@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KMSAN: uninit-value in tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
CPU: 0 PID: 66 Comm: kworker/u4:4 Not tainted 4.17.0-rc3+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: tipc_rcv tipc_conn_recv_work
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
  tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
  tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
  tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
  process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
  worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
  kthread+0x539/0x720 kernel/kthread.c:239
  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412

Local variable description: ----s.i@tipc_conn_recv_work
Variable was created at:
  tipc_conn_recv_work+0x65/0x560 net/tipc/topsrv.c:419
  process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
==================================================================
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 66 Comm: kworker/u4:4 Tainted: G    B             4.17.0-rc3+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: tipc_rcv tipc_conn_recv_work
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
  panic+0x39d/0x940 kernel/panic.c:184
  kmsan_report+0x238/0x240 mm/kmsan/kmsan.c:1083
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
  tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
  tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
  tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
  process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
  worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
  kthread+0x539/0x720 kernel/kthread.c:239
  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


* WARNING in iov_iter_revert
From: syzbot @ 2018-05-13 16:28 UTC (permalink / raw)
  To: aviadye, davejwatson, davem, ilyal, linux-kernel, netdev,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    427fbe89261d Merge branch 'next' of git://git.kernel.org/p..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16b33477800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fcce42b221691ff9
dashboard link: https://syzkaller.appspot.com/bug?extid=c226690f7b3126c5ee04
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=144f1997800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=141d5417800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
WARNING: CPU: 1 PID: 4542 at lib/iov_iter.c:857 iov_iter_revert+0x2ee/0xaa0 lib/iov_iter.c:857
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4542 Comm: syz-executor650 Not tainted 4.17.0-rc4+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  panic+0x22f/0x4de kernel/panic.c:184
  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:iov_iter_revert+0x2ee/0xaa0 lib/iov_iter.c:857
RSP: 0018:ffff8801ad1bf700 EFLAGS: 00010293
RAX: ffff8801ac55e6c0 RBX: 00000000ffffffff RCX: ffffffff835104a1
RDX: 0000000000000000 RSI: ffffffff8351074e RDI: 0000000000000007
RBP: ffff8801ad1bf760 R08: ffff8801ac55e6c0 R09: ffffed003b5e46c2
R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000001
R13: ffff8801ad1bfd60 R14: 0000000000000011 R15: ffff8801ae9ac040
  tls_sw_sendmsg+0xf1c/0x12d0 net/tls/tls_sw.c:448
  inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  ___sys_sendmsg+0x805/0x940 net/socket.c:2117
  __sys_sendmsg+0x115/0x270 net/socket.c:2155
  __do_sys_sendmsg net/socket.c:2164 [inline]
  __se_sys_sendmsg net/socket.c:2162 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4403a9
RSP: 002b:00007ffdcdfbd6c8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403a9
RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 000000000000001c R09: 000000000000001c
R10: 0000000020000180 R11: 0000000000000207 R12: 0000000000401cd0
R13: 0000000000401d60 R14: 0000000000000000 R15: 0000000000000000
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH v2] {net, IB}/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
From: Eric Dumazet @ 2018-05-13 16:21 UTC (permalink / raw)
  To: Christophe JAILLET, saeedm, matanb, leon, dledford, jgg, davem
  Cc: netdev, linux-rdma, linux-kernel, kernel-janitors
In-Reply-To: <20180513070041.24246-1-christophe.jaillet@wanadoo.fr>



On 05/13/2018 12:00 AM, Christophe JAILLET wrote:
> When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
> free it.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> v1 -> v2: More places to update have been added to the patch


Please add relevant Fixes: tag(s)

^ permalink raw reply

* Re: INFO: rcu detected stall in kfree_skbmem
From: Marcelo Ricardo Leitner @ 2018-05-13 16:02 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Eric Dumazet, syzbot, Vladislav Yasevich, Neil Horman, linux-sctp,
	Andrei Vagin, David Miller, Kirill Tkhai, LKML, netdev,
	syzkaller-bugs
In-Reply-To: <CACT4Y+ZG6igbhgimA=oFz4k=x_P4tSYGQgFk0EVrjznm8V8fLA@mail.gmail.com>

On Sun, May 13, 2018 at 03:52:01PM +0200, Dmitry Vyukov wrote:
> On Fri, May 11, 2018 at 10:42 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Fri, May 11, 2018 at 12:08:33PM -0700, Eric Dumazet wrote:
> >>
> >>
> >> On 05/11/2018 11:41 AM, Marcelo Ricardo Leitner wrote:
> >>
> >> > But calling ip6_xmit with rcu_read_lock is expected. tcp stack also
> >> > does it.
> >> > Thus I think this is more of an issue with IPv6 stack. If a host has
> >> > an extensive ip6tables ruleset, it probably generates this more
> >> > easily.
> >> >
> >> >>>  sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
> >> >>>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
> >> >>>  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
> >> >>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
> >> >>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
> >> >>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
> >> >>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
> >> >>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
> >> >>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
> >> >>>  expire_timers kernel/time/timer.c:1363 [inline]
> >> >
> >> > Having this call from a timer means it wasn't processing sctp stack
> >> > for too long.
> >> >
> >>
> >> I feel the problem is that this part is looping, in some infinite loop.
> >>
> >> I have seen this stack traces in other reports.
> >
> > Checked mail history now, seems at least two other reports on RCU
> > stalls had sctp_generate_heartbeat_event involved.
> >
> >>
> >> Maybe some kind of list corruption.
> >
> > Could be.
> > Do we know if it generated a flood of packets?
>
> We only know what's in the bug reports. Do the other ones have

Ok.

> reproducers? It can make sense to mark them as duplicates to not have

No.

> a plethora of open bugs about the same root cause.

They may have the same root cause, but right now I cannot tell for
sure.

^ permalink raw reply

* Re: INFO: rcu detected stall in kfree_skbmem
From: Dmitry Vyukov @ 2018-05-13 13:52 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Eric Dumazet, syzbot, Vladislav Yasevich, Neil Horman, linux-sctp,
	Andrei Vagin, David Miller, Kirill Tkhai, LKML, netdev,
	syzkaller-bugs
In-Reply-To: <20180511204228.GO4977@localhost.localdomain>

On Fri, May 11, 2018 at 10:42 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Fri, May 11, 2018 at 12:08:33PM -0700, Eric Dumazet wrote:
>>
>>
>> On 05/11/2018 11:41 AM, Marcelo Ricardo Leitner wrote:
>>
>> > But calling ip6_xmit with rcu_read_lock is expected. tcp stack also
>> > does it.
>> > Thus I think this is more of an issue with IPv6 stack. If a host has
>> > an extensive ip6tables ruleset, it probably generates this more
>> > easily.
>> >
>> >>>  sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
>> >>>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
>> >>>  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>> >>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>> >>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>> >>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>> >>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>> >>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>> >>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>> >>>  expire_timers kernel/time/timer.c:1363 [inline]
>> >
>> > Having this call from a timer means it wasn't processing sctp stack
>> > for too long.
>> >
>>
>> I feel the problem is that this part is looping, in some infinite loop.
>>
>> I have seen this stack traces in other reports.
>
> Checked mail history now, seems at least two other reports on RCU
> stalls had sctp_generate_heartbeat_event involved.
>
>>
>> Maybe some kind of list corruption.
>
> Could be.
> Do we know if it generated a flood of packets?

We only know what's in the bug reports. Do the other ones have
reproducers? It can make sense to mark them as duplicates to not have
a plethora of open bugs about the same root cause.

^ permalink raw reply

* Re: safe skb resetting after decapsulation and encapsulation
From: Jason A. Donenfeld @ 2018-05-13 13:24 UTC (permalink / raw)
  To: Md. Islam; +Cc: Netdev, brouer, sbrivio
In-Reply-To: <CAFgPn1Ba-N-UFZAGZ55ocjr_2A5zKGJw2=n-jxMsmCZTDT+Ebg@mail.gmail.com>

On Sat, May 12, 2018 at 4:07 AM, Md. Islam <mislam4@kent.edu> wrote:
> I'm not an expert on this, but it looks about right.

Really? Even zeroing between headers_start and headers_end? With the
latest RHEL 7.5 kernel's i40e driver, doing this results in a crash in
kfree. It's possible Red Hat is putting something silly between
headers_start and headers_end, so that zeroing it is bad, but I suspect
instead that blanket zeroing of that region is simply incorrect.

> look at build_skb() or __build_skb(). It shows the fields that needs to be set

These just kmalloc a new skb, with most fields set to zero. The ones
it modifies are the ones I'm modifying anyway when messing with the
data the skb contains. Doesn't look like there's much to help there.


My original post asked precisely that: which of items 1-14 are
incorrect, and is anything specific missing from the list?

^ permalink raw reply

* Re: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes
From: Leon Romanovsky @ 2018-05-13 13:24 UTC (permalink / raw)
  To: Steve Wise; +Cc: dsahern, stephen, netdev, linux-rdma
In-Reply-To: <1a0d146dffb17449aa6d8a6b6d06e865e69226de.1525709213.git.swise@opengridcomputing.com>

[-- Attachment #1: Type: text/plain, Size: 11971 bytes --]

On Mon, May 07, 2018 at 08:53:16AM -0700, Steve Wise wrote:
> This enhancement allows printing rdma device-specific state, if provided
> by the kernel.  This is done in a generic manner, so rdma tool doesn't

Double space between "." and "This".

> need to know about the details of every type of rdma device.
>
> Driver attributes for a rdma resource are in the form of <key,
> [print_type], value> tuples, where the key is a string and the value can
> be any supported driver attribute.  The print_type attribute, if present,

ditto

> provides a print format to use vs the standard print format for the type.
> For example, the default print type for a PROVIDER_S32 value is "%d ",
> but "0x%x " if the print_type of PRINT_TYPE_HEX is included in the tuple.
>
> Driver resources are only printed when the -dd flag is present.
> If -p is present, then the output is formatted to not exceed 80 columns,
> otherwise it is printed as a single row to be grep/awk friendly.
>
> Example output:
>
> # rdma resource show qp lqpn 1028 -dd -p
> link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 0 comm [nvme_rdma]
>     sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106 flush_cidx 85 in_use 0
>     size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41 wq_pidx 171 msn 44 rqt_hwaddr 0x2a8a5d00
>     rqt_size 256 in_use 128 size 130 idx 43 wr_id 0xffff881057c03408 idx 40 wr_id 0xffff881057c033f0
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>  rdma/rdma.c  |   7 ++-
>  rdma/rdma.h  |  11 ++++
>  rdma/res.c   |  30 +++------
>  rdma/utils.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 221 insertions(+), 21 deletions(-)
>
> diff --git a/rdma/rdma.c b/rdma/rdma.c
> index b43e538..0155627 100644
> --- a/rdma/rdma.c
> +++ b/rdma/rdma.c
> @@ -132,6 +132,7 @@ int main(int argc, char **argv)
>  	const char *batch_file = NULL;
>  	bool pretty_output = false;
>  	bool show_details = false;
> +	bool show_driver_details = false;

Reversed Christmas tree please.

>  	bool json_output = false;
>  	bool force = false;
>  	char *filename;
> @@ -152,7 +153,10 @@ int main(int argc, char **argv)
>  			pretty_output = true;
>  			break;
>  		case 'd':
> -			show_details = true;
> +			if (show_details)
> +				show_driver_details = true;
> +			else
> +				show_details = true;
>  			break;
>  		case 'j':
>  			json_output = true;
> @@ -180,6 +184,7 @@ int main(int argc, char **argv)
>  	argv += optind;
>
>  	rd.show_details = show_details;
> +	rd.show_driver_details = show_driver_details;
>  	rd.json_output = json_output;
>  	rd.pretty_output = pretty_output;
>
> diff --git a/rdma/rdma.h b/rdma/rdma.h
> index 1908fc4..fcaf9e6 100644
> --- a/rdma/rdma.h
> +++ b/rdma/rdma.h
> @@ -55,6 +55,7 @@ struct rd {
>  	char **argv;
>  	char *filename;
>  	bool show_details;
> +	bool show_driver_details;
>  	struct list_head dev_map_list;
>  	uint32_t dev_idx;
>  	uint32_t port_idx;
> @@ -115,4 +116,14 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void *data, uint32_t seq);
>  void rd_prepare_msg(struct rd *rd, uint32_t cmd, uint32_t *seq, uint16_t flags);
>  int rd_dev_init_cb(const struct nlmsghdr *nlh, void *data);
>  int rd_attr_cb(const struct nlattr *attr, void *data);
> +int rd_attr_check(const struct nlattr *attr, int *typep);
> +
> +/*
> + * Print helpers
> + */
> +void print_driver_table(struct rd *rd, struct nlattr *tb);
> +void newline(struct rd *rd);
> +void newline_indent(struct rd *rd);
> +#define MAX_LINE_LENGTH 80
> +
>  #endif /* _RDMA_TOOL_H_ */
> diff --git a/rdma/res.c b/rdma/res.c
> index 1a0aab6..074b992 100644
> --- a/rdma/res.c
> +++ b/rdma/res.c
> @@ -439,10 +439,8 @@ static int res_qp_parse_cb(const struct nlmsghdr *nlh, void *data)
>  		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
>  			free(comm);
>
> -		if (rd->json_output)
> -			jsonw_end_array(rd->jw);
> -		else
> -			pr_out("\n");
> +		print_driver_table(rd, nla_line[RDMA_NLDEV_ATTR_DRIVER]);
> +		newline(rd);
>  	}
>  	return MNL_CB_OK;
>  }
> @@ -678,10 +676,8 @@ static int res_cm_id_parse_cb(const struct nlmsghdr *nlh, void *data)
>  		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
>  			free(comm);
>
> -		if (rd->json_output)
> -			jsonw_end_array(rd->jw);
> -		else
> -			pr_out("\n");
> +		print_driver_table(rd, nla_line[RDMA_NLDEV_ATTR_DRIVER]);
> +		newline(rd);
>  	}
>  	return MNL_CB_OK;
>  }
> @@ -804,10 +800,8 @@ static int res_cq_parse_cb(const struct nlmsghdr *nlh, void *data)
>  		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
>  			free(comm);
>
> -		if (rd->json_output)
> -			jsonw_end_array(rd->jw);
> -		else
> -			pr_out("\n");
> +		print_driver_table(rd, nla_line[RDMA_NLDEV_ATTR_DRIVER]);
> +		newline(rd);
>  	}
>  	return MNL_CB_OK;
>  }
> @@ -919,10 +913,8 @@ static int res_mr_parse_cb(const struct nlmsghdr *nlh, void *data)
>  		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
>  			free(comm);
>
> -		if (rd->json_output)
> -			jsonw_end_array(rd->jw);
> -		else
> -			pr_out("\n");
> +		print_driver_table(rd, nla_line[RDMA_NLDEV_ATTR_DRIVER]);
> +		newline(rd);
>  	}
>  	return MNL_CB_OK;
>  }
> @@ -1004,10 +996,8 @@ static int res_pd_parse_cb(const struct nlmsghdr *nlh, void *data)
>  		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
>  			free(comm);
>
> -		if (rd->json_output)
> -			jsonw_end_array(rd->jw);
> -		else
> -			pr_out("\n");
> +		print_driver_table(rd, nla_line[RDMA_NLDEV_ATTR_DRIVER]);
> +		newline(rd);
>  	}
>  	return MNL_CB_OK;
>  }
> diff --git a/rdma/utils.c b/rdma/utils.c
> index 49c967f..452fe92 100644
> --- a/rdma/utils.c
> +++ b/rdma/utils.c
> @@ -11,6 +11,7 @@
>
>  #include "rdma.h"
>  #include <ctype.h>
> +#include <inttypes.h>
>
>  int rd_argc(struct rd *rd)
>  {
> @@ -393,8 +394,32 @@ static const enum mnl_attr_data_type nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
>  	[RDMA_NLDEV_ATTR_RES_MRLEN] = MNL_TYPE_U64,
>  	[RDMA_NLDEV_ATTR_NDEV_INDEX]		= MNL_TYPE_U32,
>  	[RDMA_NLDEV_ATTR_NDEV_NAME]		= MNL_TYPE_NUL_STRING,
> +	[RDMA_NLDEV_ATTR_DRIVER] = MNL_TYPE_NESTED,
> +	[RDMA_NLDEV_ATTR_DRIVER_ENTRY] = MNL_TYPE_NESTED,
> +	[RDMA_NLDEV_ATTR_DRIVER_STRING] = MNL_TYPE_NUL_STRING,
> +	[RDMA_NLDEV_ATTR_DRIVER_PRINT_TYPE] = MNL_TYPE_U8,
> +	[RDMA_NLDEV_ATTR_DRIVER_S32] = MNL_TYPE_U32,
> +	[RDMA_NLDEV_ATTR_DRIVER_U32] = MNL_TYPE_U32,
> +	[RDMA_NLDEV_ATTR_DRIVER_S64] = MNL_TYPE_U64,
> +	[RDMA_NLDEV_ATTR_DRIVER_U64] = MNL_TYPE_U64,
>  };
>
> +int rd_attr_check(const struct nlattr *attr, int *typep)
> +{
> +	int type;
> +
> +	if (mnl_attr_type_valid(attr, RDMA_NLDEV_ATTR_MAX) < 0)
> +		return MNL_CB_ERROR;
> +
> +	type = mnl_attr_get_type(attr);
> +
> +	if (mnl_attr_validate(attr, nldev_policy[type]) < 0)
> +		return MNL_CB_ERROR;
> +
> +	*typep = nldev_policy[type];
> +	return MNL_CB_OK;
> +}
> +
>  int rd_attr_cb(const struct nlattr *attr, void *data)
>  {
>  	const struct nlattr **tb = data;
> @@ -660,3 +685,172 @@ struct dev_map *dev_map_lookup(struct rd *rd, bool allow_port_index)
>  	free(dev_name);
>  	return dev_map;
>  }
> +
> +#define nla_type(attr) ((attr)->nla_type & NLA_TYPE_MASK)
> +
> +void newline(struct rd *rd)
> +{
> +	if (rd->json_output)
> +		jsonw_end_array(rd->jw);
> +	else
> +		pr_out("\n");
> +}
> +
> +void newline_indent(struct rd *rd)
> +{
> +	newline(rd);
> +	if (!rd->json_output)
> +		pr_out("    ");
> +}
> +
> +static int print_driver_string(struct rd *rd, const char *key_str,
> +				 const char *val_str)
> +{
> +	if (rd->json_output) {
> +		jsonw_string_field(rd->jw, key_str, val_str);
> +		return 0;
> +	} else {
> +		return pr_out("%s %s ", key_str, val_str);
> +	}
> +}
> +
> +static int print_driver_s32(struct rd *rd, const char *key_str, int32_t val,
> +			      enum rdma_nldev_print_type print_type)
> +{
> +	if (rd->json_output) {
> +		jsonw_int_field(rd->jw, key_str, val);
> +		return 0;
> +	}
> +	switch (print_type) {
> +	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
> +		return pr_out("%s %d ", key_str, val);
> +	case RDMA_NLDEV_PRINT_TYPE_HEX:
> +		return pr_out("%s 0x%x ", key_str, val);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int print_driver_u32(struct rd *rd, const char *key_str, uint32_t val,
> +			      enum rdma_nldev_print_type print_type)
> +{
> +	if (rd->json_output) {
> +		jsonw_int_field(rd->jw, key_str, val);
> +		return 0;
> +	}
> +	switch (print_type) {
> +	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
> +		return pr_out("%s %u ", key_str, val);
> +	case RDMA_NLDEV_PRINT_TYPE_HEX:
> +		return pr_out("%s 0x%x ", key_str, val);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int print_driver_s64(struct rd *rd, const char *key_str, int64_t val,
> +			      enum rdma_nldev_print_type print_type)
> +{
> +	if (rd->json_output) {
> +		jsonw_int_field(rd->jw, key_str, val);
> +		return 0;
> +	}
> +	switch (print_type) {
> +	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
> +		return pr_out("%s %" PRId64 " ", key_str, val);
> +	case RDMA_NLDEV_PRINT_TYPE_HEX:
> +		return pr_out("%s 0x%" PRIx64 " ", key_str, val);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int print_driver_u64(struct rd *rd, const char *key_str, uint64_t val,
> +			      enum rdma_nldev_print_type print_type)
> +{
> +	if (rd->json_output) {
> +		jsonw_int_field(rd->jw, key_str, val);
> +		return 0;
> +	}
> +	switch (print_type) {
> +	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
> +		return pr_out("%s %" PRIu64 " ", key_str, val);
> +	case RDMA_NLDEV_PRINT_TYPE_HEX:
> +		return pr_out("%s 0x%" PRIx64 " ", key_str, val);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int print_driver_entry(struct rd *rd, struct nlattr *key_attr,
> +				struct nlattr *val_attr,
> +				enum rdma_nldev_print_type print_type)
> +{
> +	const char *key_str = mnl_attr_get_str(key_attr);
> +	int attr_type = nla_type(val_attr);
> +
> +	switch (attr_type) {
> +	case RDMA_NLDEV_ATTR_DRIVER_STRING:
> +		return print_driver_string(rd, key_str,
> +				mnl_attr_get_str(val_attr));
> +	case RDMA_NLDEV_ATTR_DRIVER_S32:
> +		return print_driver_s32(rd, key_str,
> +				mnl_attr_get_u32(val_attr), print_type);
> +	case RDMA_NLDEV_ATTR_DRIVER_U32:
> +		return print_driver_u32(rd, key_str,
> +				mnl_attr_get_u32(val_attr), print_type);
> +	case RDMA_NLDEV_ATTR_DRIVER_S64:
> +		return print_driver_s64(rd, key_str,
> +				mnl_attr_get_u64(val_attr), print_type);
> +	case RDMA_NLDEV_ATTR_DRIVER_U64:
> +		return print_driver_u64(rd, key_str,
> +				mnl_attr_get_u64(val_attr), print_type);
> +	}
> +	return -EINVAL;
> +}
> +
> +void print_driver_table(struct rd *rd, struct nlattr *tb)
> +{
> +	int print_type = RDMA_NLDEV_PRINT_TYPE_UNSPEC;
> +	struct nlattr *tb_entry, *key = NULL, *val;
> +	int type, cc = 0;
> +
> +	if (!rd->show_driver_details || !tb)
> +		return;
> +
> +	if (rd->pretty_output)
> +		newline_indent(rd);
> +
> +	/*
> +	 * Driver attrs are tuples of {key, [print-type], value}.
> +	 * The key must be a string.  If print-type is present, it
> +	 * defines an alternate printf format type vs the native format
> +	 * for the attribute.  And the value can be any available
> +	 * driver type.
> +	 */
> +	mnl_attr_for_each_nested(tb_entry, tb) {
> +
> +		if (cc > MAX_LINE_LENGTH) {
> +			if (rd->pretty_output)
> +				newline_indent(rd);
> +			cc = 0;
> +		}
> +		if (rd_attr_check(tb_entry, &type) != MNL_CB_OK)
> +			return;
> +		if (!key) {
> +			if (type != MNL_TYPE_NUL_STRING)
> +				return;
> +			key = tb_entry;
> +		} else if (type == MNL_TYPE_U8) {
> +			print_type = mnl_attr_get_u8(tb_entry);
> +		} else {
> +			val = tb_entry;
> +			cc += print_driver_entry(rd, key, val, print_type);

I stopped reading here because of two problems:
1. print_driver_entry() can return a negative number, so it is unclear to
me what the final result of "cc += .." will be.
2. The netlink design is to ignore unknown attributes rather than return
an error. That allows new kernels to be used with old applications.
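
To illustrate both points, here is a minimal user-space sketch. It is not
the rdma tool's code: every demo_* name is a hypothetical stand-in for the
netlink attribute handling. The return value is checked before it is added
to the column count, and an entry of unknown type is skipped instead of
aborting the whole table.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical attribute kinds standing in for the netlink value types. */
enum demo_type { DEMO_U32, DEMO_UNKNOWN };

struct demo_attr {
	enum demo_type type;
	const char *key;
	unsigned int val;
};

/* Format one entry; return the columns used, or -1 for an unknown type. */
static int demo_format_entry(char *buf, size_t len, const struct demo_attr *a)
{
	switch (a->type) {
	case DEMO_U32:
		return snprintf(buf, len, "%s %u ", a->key, a->val);
	default:
		return -1;
	}
}

/*
 * Print every known entry and skip the rest.  Returns the number of
 * entries printed; *ccp receives the total column count, which never
 * has a negative error code folded into it.
 */
static int demo_print_table(const struct demo_attr *tb, size_t n, int *ccp)
{
	int printed = 0, cc = 0;
	char buf[64];
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = demo_format_entry(buf, sizeof(buf), &tb[i]);

		/* Unknown attribute or oversized entry: ignore it. */
		if (ret < 0 || (size_t)ret >= sizeof(buf))
			continue;
		fputs(buf, stdout);
		cc += ret;
		printed++;
	}
	*ccp = cc;
	return printed;
}
```

With this shape an old tool fed attributes from a newer kernel still prints
everything it understands.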

> +			if (cc < 0)
> +				return;
> +			print_type = RDMA_NLDEV_PRINT_TYPE_UNSPEC;
> +			key = NULL;
> +		}
> +	}
> +	return;
> +}
> --
> 1.8.3.1
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH v1 iproute2-next 1/3] rdma: update rdma_netlink.h to get driver attrs
From: Leon Romanovsky @ 2018-05-13 13:15 UTC (permalink / raw)
  To: Steve Wise; +Cc: dsahern, stephen, netdev, linux-rdma
In-Reply-To: <312486eb14b460e455e7b2926d7ea06e3a8411fc.1525709213.git.swise@opengridcomputing.com>

[-- Attachment #1: Type: text/plain, Size: 2423 bytes --]

On Mon, May 07, 2018 at 08:53:10AM -0700, Steve Wise wrote:
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>  rdma/include/uapi/rdma/rdma_netlink.h | 37 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 36 insertions(+), 1 deletion(-)

Please write in commit message something like: "Based on kernel commit
....", so we will be able to track changes.

>
> diff --git a/rdma/include/uapi/rdma/rdma_netlink.h b/rdma/include/uapi/rdma/rdma_netlink.h
> index 45474f1..40be0d8 100644
> --- a/rdma/include/uapi/rdma/rdma_netlink.h
> +++ b/rdma/include/uapi/rdma/rdma_netlink.h
> @@ -249,10 +249,22 @@ enum rdma_nldev_command {
>  	RDMA_NLDEV_NUM_OPS
>  };
>
> +enum {
> +	RDMA_NLDEV_ATTR_ENTRY_STRLEN = 16,
> +};
> +
> +enum rdma_nldev_print_type {
> +	RDMA_NLDEV_PRINT_TYPE_UNSPEC,
> +	RDMA_NLDEV_PRINT_TYPE_HEX,
> +};
> +
>  enum rdma_nldev_attr {
>  	/* don't change the order or add anything between, this is ABI! */
>  	RDMA_NLDEV_ATTR_UNSPEC,
>
> +	/* Pad attribute for 64b alignment */
> +	RDMA_NLDEV_ATTR_PAD = RDMA_NLDEV_ATTR_UNSPEC,
> +
>  	/* Identifier for ib_device */
>  	RDMA_NLDEV_ATTR_DEV_INDEX,		/* u32 */
>
> @@ -387,8 +399,31 @@ enum rdma_nldev_attr {
>  	RDMA_NLDEV_ATTR_RES_PD_ENTRY,		/* nested table */
>  	RDMA_NLDEV_ATTR_RES_LOCAL_DMA_LKEY,	/* u32 */
>  	RDMA_NLDEV_ATTR_RES_UNSAFE_GLOBAL_RKEY,	/* u32 */
> +	/*
> +	 * driver-specific attributes.
> +	 */
> +	RDMA_NLDEV_ATTR_DRIVER,			/* nested table */
> +	RDMA_NLDEV_ATTR_DRIVER_ENTRY,		/* nested table */
> +	RDMA_NLDEV_ATTR_DRIVER_STRING,		/* string */
> +	/*
> +	 * u8 values from enum rdma_nldev_print_type
> +	 */
> +	RDMA_NLDEV_ATTR_DRIVER_PRINT_TYPE,	/* u8 */
> +	RDMA_NLDEV_ATTR_DRIVER_S32,		/* s32 */
> +	RDMA_NLDEV_ATTR_DRIVER_U32,		/* u32 */
> +	RDMA_NLDEV_ATTR_DRIVER_S64,		/* s64 */
> +	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
>
> -	/* Netdev information for relevant protocols, like RoCE and iWARP */
> +	/*
> +	 * Provides logical name and index of netdevice which is
> +	 * connected to physical port. This information is relevant
> +	 * for RoCE and iWARP.
> +	 *
> +	 * The netdevices which are associated with containers are
> +	 * supposed to be exported together with GID table once it
> +	 * will be exposed through the netlink. Because the
> +	 * associated netdevices are properties of GIDs.
> +	 */
>  	RDMA_NLDEV_ATTR_NDEV_INDEX,		/* u32 */
>  	RDMA_NLDEV_ATTR_NDEV_NAME,		/* string */
>
> --
> 1.8.3.1
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes
From: Leon Romanovsky @ 2018-05-13 13:10 UTC (permalink / raw)
  To: David Ahern; +Cc: Steve Wise, stephen, netdev, linux-rdma
In-Reply-To: <06d8cb88-21e0-b7cc-10e2-efa453d9adc9@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1585 bytes --]

On Thu, May 10, 2018 at 08:20:51AM -0600, David Ahern wrote:
> On 5/10/18 8:19 AM, Steve Wise wrote:
> >
> > On 5/9/2018 11:08 PM, David Ahern wrote:
> >> On 5/7/18 9:53 AM, Steve Wise wrote:
> >>> @@ -152,7 +153,10 @@ int main(int argc, char **argv)
> >>>  			pretty_output = true;
> >>>  			break;
> >>>  		case 'd':
> >>> -			show_details = true;
> >>> +			if (show_details)
> >>> +				show_driver_details = true;
> >>> +			else
> >>> +				show_details = true;
> >>>  			break;
> >>>  		case 'j':
> >>>  			json_output = true;
> >> The above change should be reflected in the man page.
> >
> > I did mention it in the man page:
> >
> >        -d, --details
> >               Output detailed information.  Adding a second -d includes
> > driver-specific details.
> >
> > But I wasn't sure how to show it in the syntax.  Maybe this?
> >
> >  OPTIONS := { -V[ersion] | -d[etails] [-d[etails]] } -j[son] } -p[retty] }
>
> I should have read the second patch before commenting. Didn't it have
> first -d = details, a second -d = driver details? That should be fine.

Yes, our idea is to require "-dd" to print such driver specific
information. The level of nesting is:
 * No arguments -> info usable for most users.
 * -d -> pre-parsed flags and rarely used information.
 * -dd -> very detailed output, can be very device-specific.

Thanks

>
> >
> >
> >> Also, the set needs to be respun after I merged master where Stephen
> >> brought in updates to the uapi files.
> >
> > Will do.  Thanks for reviewing.
> >
> > Steve.
> >
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Kernel panic on kernel-3.10.0-693.21.1.el7 in ndisc.h
From: Roman Makhov @ 2018-05-13 11:35 UTC (permalink / raw)
  To: linux-wpan, netdev

Hello,

We have a problem with a kernel panic after upgrading from CentOS 7.3
(kernel-3.10.0-514.el7) to CentOS 7.4 (kernel-3.10.0-693.21.1.el7).
It occurs when there is incoming traffic from other nodes while we
are reconfiguring the IPv6 interfaces.

It is a high-availability system without 802.15.4 support.

The log of crash:
=========================================================
#10 [ffff88043fc03cf0] async_page_fault at ffffffff816b7798
    [exception RIP: ndisc_send_rs+238]
    RIP: ffffffff8166575e  RSP: ffff88043fc03da8  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffff88042caa9000  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: 0000000000000200  RDI: ffffffff816534f7
    RBP: ffff88043fc03dd0   R8: 0000000000000000   R9: ffffffff81e9f1c0
    R10: 0000000000000002  R11: ffff88043fc03da8  R12: 0000000000000008
    R13: 0000000000000006  R14: ffff88043fc03de0  R15: ffffffff81772410
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff88043fc03da0] ndisc_send_rs at ffffffff81665704
=========================================================

The crash points to ndisc.h, in the function
ndisc_ops_opt_addr_space():
=========================================================
crash> kmem ffffffff8166575e
ffffffff8166575e (T) ndisc_send_rs+238
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/include/net/ndisc.h: 251

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0000059940   1665000                0        0  1 1fffff00000400 reserved
crash>
=========================================================

I checked: the difference between the 514 and 693 kernels is in the patch
https://patchwork.kernel.org/patch/9179229/ .

Any suggestions about what I am doing wrong are welcome.

Thanks!
Roman Makhov

^ permalink raw reply

* [RFC PATCH] net: Remove a confusing comment of macro SIOCDEVPRIVATE
From: Jian-Hong Pan @ 2018-05-13  9:54 UTC (permalink / raw)
  To: Philippe Ombredanne, Greg Kroah-Hartman, Thomas Gleixner,
	Kate Stewart, David S. Miller, netdev, linux-kernel
  Cc: Jian-Hong Pan

I have been reading the NET related header files recently.  I found
there is a macro "#define SIOCDEVPRIVATE 0x89F0" defined in
include/uapi/linux/sockios.h which is useful for private controls of net
devices.  When I read this section:

/* Device private ioctl calls */

/*
 *	These 16 ioctls are available to devices via the do_ioctl() device
 *	vector. Each device should include this file and redefine these names
 *	as their own. Because these are device dependent it is a good idea
 *	_NOT_ to issue them to random objects and hope.
 *
 *	THESE IOCTLS ARE _DEPRECATED_ AND WILL DISAPPEAR IN 2.5.X -DaveM
 */

I noticed this line in the comment:
"THESE IOCTLS ARE _DEPRECATED_ AND WILL DISAPPEAR IN 2.5.X -DaveM"
which confused me, because a lot of devices and subsystems still use
this macro, for example ethernet, appletalk, usb/rtl8150, etc.

Therefore, this patch removes the confusing comment.

Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
---
 include/uapi/linux/sockios.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/uapi/linux/sockios.h b/include/uapi/linux/sockios.h
index d393e9ed3964..c166f8c6b20f 100644
--- a/include/uapi/linux/sockios.h
+++ b/include/uapi/linux/sockios.h
@@ -139,8 +139,6 @@
  *	vector. Each device should include this file and redefine these names
  *	as their own. Because these are device dependent it is a good idea
  *	_NOT_ to issue them to random objects and hope.
- *
- *	THESE IOCTLS ARE _DEPRECATED_ AND WILL DISAPPEAR IN 2.5.X -DaveM
  */
  
 #define SIOCDEVPRIVATE	0x89F0	/* to 89FF */
-- 
2.17.0
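
For readers unfamiliar with the range in question, here is a small sketch
of how the 16 device-private ioctl numbers (0x89F0 to 0x89FF) are laid out.
On Linux the base value comes from <linux/sockios.h>; the fallback define,
demo_is_dev_private() and DEMO_SIOCGETSTATS are hypothetical names used
only for illustration.

```c
#include <stdbool.h>

/* On Linux this comes from <linux/sockios.h>; fallback for the sketch. */
#ifndef SIOCDEVPRIVATE
#define SIOCDEVPRIVATE 0x89F0	/* to 89FF */
#endif

#define DEMO_DEVPRIVATE_COUNT 16

/* A driver would redefine one of the 16 slots under its own name. */
#define DEMO_SIOCGETSTATS (SIOCDEVPRIVATE + 1)

/* True if cmd falls inside the device-private ioctl range. */
static bool demo_is_dev_private(unsigned int cmd)
{
	return cmd >= SIOCDEVPRIVATE &&
	       cmd < SIOCDEVPRIVATE + DEMO_DEVPRIVATE_COUNT;
}
```

Because each driver assigns its own meaning to these numbers, the same
value means different things on different devices, which is why the header
warns against issuing them to random objects.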

^ permalink raw reply related

* Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks
From: Tariq Toukan @ 2018-05-13  9:00 UTC (permalink / raw)
  To: Qing Huang, tariqt, davem, haakon.bugge, yanjun.zhu
  Cc: netdev, linux-rdma, linux-kernel
In-Reply-To: <20180511192318.22342-1-qing.huang@oracle.com>



On 11/05/2018 10:23 PM, Qing Huang wrote:
> When a system is under memory pressure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
> 
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
> 
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
> 
> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
> 
> However in order to support smaller ICM chunk size, we need to fix
> another issue in large size kcalloc allocations.
> 
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers which can easily cause kcalloc to fail.
> 
> The solution is to use vzalloc to replace kcalloc. There is no need
> for contiguous memory pages for a driver meta data structure (no need
> of DMA ops).
> 
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> Acked-by: Daniel Jurgens <danielj@mellanox.com>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
> v1 -> v2: adjusted chunk size to reflect different architectures.
> 
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index a822f7a..ccb62b8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,12 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
> - * per chunk.
> + * We allocate in page size (default 4KB on many archs) chunks to avoid high
> + * order memory allocations in fragmented/high usage memory situation.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> -	MLX4_TABLE_CHUNK_SIZE	= 1 << 18
> +	MLX4_ICM_ALLOC_SIZE	= 1 << PAGE_SHIFT,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << PAGE_SHIFT

1 << PAGE_SHIFT is just PAGE_SIZE; please use that directly.
Also, please add a comma at the end of the last entry.
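
A minimal sketch of the enum with both comments applied. PAGE_SHIFT and
PAGE_SIZE are given stand-in 4KB definitions here only so the fragment
builds outside the kernel tree; in icm.c they would come from the arch
headers.

```c
/* Stand-ins so the fragment compiles in userspace; assumes 4KB pages. */
#ifndef PAGE_SHIFT
#define PAGE_SHIFT 12
#endif
#ifndef PAGE_SIZE
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#endif

/*
 * We allocate in page size chunks to avoid high order memory
 * allocations in fragmented/high usage memory situations.
 */
enum {
	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,	/* trailing comma on the last entry */
};
```
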

>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>   	obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>   	num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>   
> -	table->icm      = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
> +	table->icm      = vzalloc(num_icm * sizeof(*table->icm));

Why not kvzalloc()?
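
kvzalloc() tries kmalloc first and only falls back to vmalloc when that
fails, so small tables would keep a kmalloc-backed array. To put numbers
on the pointer array size, here is a rough userspace sketch that reuses
the obj_per_chunk/num_icm arithmetic from mlx4_init_icm_table(). The
helper name icm_ptr_array_bytes() is hypothetical, and the 8-byte pointer
size passed in below assumes a 64-bit arch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for the table->icm pointer array,
 * following the rounding used in mlx4_init_icm_table().
 */
static uint64_t icm_ptr_array_bytes(uint64_t nobj, uint64_t obj_size,
				    uint64_t chunk_size, uint64_t ptr_size)
{
	uint64_t obj_per_chunk = chunk_size / obj_size;
	uint64_t num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;

	return num_icm * ptr_size;
}
```

With the commit message's numbers (log_num_mtt=30, 8-byte mtt entries),
icm_ptr_array_bytes(1ULL << 30, 8, 4096, 8) gives 16MB, while the old
256KB chunk size, icm_ptr_array_bytes(1ULL << 30, 8, 1 << 18, 8), gives
256KB.
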

>   	if (!table->icm)
>   		return -ENOMEM;
>   	table->virt     = virt;
> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>   			mlx4_free_icm(dev, table->icm[i], use_coherent);
>   		}
>   
> -	kfree(table->icm);
> +	vfree(table->icm);
>   
>   	return -ENOMEM;
>   }
> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
>   			mlx4_free_icm(dev, table->icm[i], table->coherent);
>   		}
>   
> -	kfree(table->icm);
> +	vfree(table->icm);
>   }
> 

Thanks for your patch.

I need to verify that this causes no dramatic performance degradation.
You can prepare and send a v3 in the meantime.

Thanks,
Tariq


* Re: [PATCH] dt-bindings: net: ravb: Add support for r8a77990 SoC
From: Simon Horman @ 2018-05-13  7:58 UTC (permalink / raw)
  To: David Miller
  Cc: yoshihiro.shimoda.uh, netdev, linux-renesas-soc, robh+dt,
	mark.rutland, sergei.shtylyov, devicetree
In-Reply-To: <20180511.155942.16024095909155343.davem@davemloft.net>

On Fri, May 11, 2018 at 03:59:42PM -0400, David Miller wrote:
> From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Date: Fri, 11 May 2018 12:18:56 +0900
> 
> > Add documentation for r8a77990 compatible string to renesas ravb device
> > tree bindings documentation.
> > 
> > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> 
> I'm assuming this isn't targeted at one of my trees.  Just FYI.

Hi Dave,

I think this is appropriate for net-next but if not I can take it.

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>


Shimoda-san,

please use "[PATCH net-next]" for non-bugfix networking updates which
are targeted at Dave's net-next tree. Bug fixes should be for "net".
Patches should of course apply cleanly to whichever tree they are targeted at.


* Re: [PATCH] net/mlx4_core: Fix error handling in mlx4_init_port_info.
From: Tariq Toukan @ 2018-05-13  7:02 UTC (permalink / raw)
  To: Tarick Bedeir, gthelen, netdev, linux-rdma, linux-kernel
In-Reply-To: <7e5d6d30-ed89-8a8e-55c1-a25897937727@mellanox.com>



On 02/05/2018 4:31 PM, Tariq Toukan wrote:
> 
> 
> On 27/04/2018 6:20 PM, Tarick Bedeir wrote:
>> Avoid exiting the function with a lingering sysfs file (if the first
>> call to device_create_file() fails while the second succeeds), and avoid
>> calling devlink_port_unregister() twice.
>>
>> In other words, either mlx4_init_port_info() succeeds and returns zero, or
>> it fails, returns non-zero, and requires no cleanup.
>>
>> Signed-off-by: Tarick Bedeir <tarick@google.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx4/main.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>> index 4d84cab77105..e8a3a45d0b53 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>> @@ -3007,6 +3007,7 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
>>           mlx4_err(dev, "Failed to create file for port %d\n", port);
>>           devlink_port_unregister(&info->devlink_port);
>>           info->port = -1;
>> +        return err;
>>       }
>>       sprintf(info->dev_mtu_name, "mlx4_port%d_mtu", port);
>> @@ -3028,9 +3029,10 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
>>                      &info->port_attr);
>>           devlink_port_unregister(&info->devlink_port);
>>           info->port = -1;
>> +        return err;
>>       }
>> -    return err;
>> +    return 0;
>>   }
>>   static void mlx4_cleanup_port_info(struct mlx4_port_info *info)
>>
> Acked-by: Tariq Toukan <tariqt@mellanox.com>
> 
> Thanks Tarick.

Actually, you need to add a Fixes line:

Fixes: 096335b3f983 ("mlx4_core: Allow dynamic MTU configuration for IB ports")
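
For reference, the contract described in the commit message (return zero
on success, or return non-zero with nothing left to clean up) can be
modelled in plain C. This is only an illustrative userspace sketch:
struct port_model, create_file() and init_port_info_model() are
hypothetical stand-ins, not driver code, and the model assumes the first
file is removed again when the second creation fails.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the port state, with injectable failures. */
struct port_model {
	bool fail_first;	/* make the first file creation fail */
	bool fail_second;	/* make the second file creation fail */
	int files;		/* sysfs files currently present */
	int unregisters;	/* times the devlink port was unregistered */
};

/* Stand-in for device_create_file(): fails on request, else adds a file. */
static int create_file(struct port_model *m, bool fail)
{
	if (fail)
		return -ENOMEM;
	m->files++;
	return 0;
}

static int init_port_info_model(struct port_model *m)
{
	int err;

	err = create_file(m, m->fail_first);
	if (err) {
		m->unregisters++;
		return err;	/* bail out before creating the second file */
	}

	err = create_file(m, m->fail_second);
	if (err) {
		m->files--;	/* assumed: the first file is removed here */
		m->unregisters++;
		return err;
	}

	return 0;
}
```

Without the early returns, a failure of the first creation would fall
through to the second one, which is how the lingering file and the double
unregister mentioned in the commit message could arise.
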


* [PATCH v2] {net, IB}/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
From: Christophe JAILLET @ 2018-05-13  7:00 UTC (permalink / raw)
  To: saeedm, matanb, leon, dledford, jgg, davem
  Cc: netdev, linux-rdma, linux-kernel, kernel-janitors,
	Christophe JAILLET

When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
v1 -> v2: More places to update have been added to the patch
---
 drivers/infiniband/hw/mlx5/cq.c                            | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c            | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 77d257ec899b..6d52ea03574e 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -849,7 +849,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 	return 0;
 
 err_cqb:
-	kfree(*cqb);
+	kvfree(*cqb);
 
 err_db:
 	mlx5_ib_db_unmap_user(to_mucontext(context), &cq->db);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 35e256eb2f6e..b123f8a52ad8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -663,7 +663,7 @@ static int esw_create_vport_rx_group(struct mlx5_eswitch *esw)
 
 	esw->offloads.vport_rx_group = g;
 out:
-	kfree(flow_group_in);
+	kvfree(flow_group_in);
 	return err;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 177e076b8d17..719cecb182c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -511,7 +511,7 @@ int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
 	*system_image_guid = MLX5_GET64(query_nic_vport_context_out, out,
 					nic_vport_context.system_image_guid);
 
-	kfree(out);
+	kvfree(out);
 
 	return 0;
 }
@@ -531,7 +531,7 @@ int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid)
 	*node_guid = MLX5_GET64(query_nic_vport_context_out, out,
 				nic_vport_context.node_guid);
 
-	kfree(out);
+	kvfree(out);
 
 	return 0;
 }
@@ -587,7 +587,7 @@ int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev,
 	*qkey_viol_cntr = MLX5_GET(query_nic_vport_context_out, out,
 				   nic_vport_context.qkey_violation_counter);
 
-	kfree(out);
+	kvfree(out);
 
 	return 0;
 }
-- 
2.17.0


* Re: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
From: Neftin, Sasha @ 2018-05-13  6:55 UTC (permalink / raw)
  To: Keller, Jacob E, Benjamin Poirier, Kirsher, Jeffrey T
  Cc: ehabkost@redhat.com, netdev@vger.kernel.org, jayanth@goubiq.com,
	linux-kernel@vger.kernel.org, postmodern.mod3@gmail.com,
	Achim Mildenberger, intel-wired-lan@lists.osuosl.org,
	Bart.VanAssche@wdc.com, olouvignes@gmail.com
In-Reply-To: <02874ECE860811409154E81DA85FBB5882DD85D3@ORSMSX115.amr.corp.intel.com>

On 5/10/2018 21:42, Keller, Jacob E wrote:
>> -----Original Message-----
>> From: Benjamin Poirier [mailto:bpoirier@suse.com]
>> Sent: Thursday, May 10, 2018 12:29 AM
>> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
>> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
>> <admin@fph.physik.uni-karlsruhe.de>; olouvignes@gmail.com;
>> jayanth@goubiq.com; ehabkost@redhat.com; postmodern.mod3@gmail.com;
>> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
>>
>> There have been multiple reports of crashes that look like
>> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
>> [...]
>> kernel: Call Trace:
>> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
>> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
>> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
>> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
>> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
>> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>>
>> These can be traced back to the fact that e1000e_systim_reset() skips the
>> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
>> leads to a null deref in timecounter_read().
>>
>> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
>> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
>> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>>
>> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
>> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
>> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
>> finds the bit to be set. This was observed at boot (probe) but also link up
>> and link down.
>>
>> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
>> reads where SYSCFI=0. Therefore, remove this register read and
>> unconditionally set the clock parameters.
>>
>> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
>> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
>> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
>> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
>> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
>> ---
>>   drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>>   1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index ec4a9759a6f2..3afb1f3b6f91 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca)
>>   		}
>>   		break;
>>   	case e1000_pch_spt:
>> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>> -			/* Stable 24MHz frequency */
>> -			incperiod = INCPERIOD_24MHZ;
>> -			incvalue = INCVALUE_24MHZ;
>> -			shift = INCVALUE_SHIFT_24MHZ;
>> -			adapter->cc.shift = shift;
>> -			break;
>> -		}
>> -		return -EINVAL;
>> +		/* Stable 24MHz frequency */
>> +		incperiod = INCPERIOD_24MHZ;
>> +		incvalue = INCVALUE_24MHZ;
>> +		shift = INCVALUE_SHIFT_24MHZ;
>> +		adapter->cc.shift = shift;
>> +		break;
>>   	case e1000_pch_cnp:
>>   		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>>   			/* Stable 24MHz frequency */
>> --
>> 2.16.3
> 
> Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.
> 
> Thanks,
> Jake
> 
I've checked our specification; it looks like only 24MHz is used for this
product. Hopefully no platform with a different clock has been
distributed. So, let's pick up this change.


