* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Michael S. Tsirkin @ 2017-08-22 17:55 UTC (permalink / raw)
To: Jason Wang
Cc: Willem de Bruijn, Koichiro Den, virtualization,
Network Development
In-Reply-To: <64d451ae-9944-e978-5a05-54bb1a62aaad@redhat.com>
On Tue, Aug 22, 2017 at 10:50:41AM +0800, Jason Wang wrote:
> > Perhaps the descriptor pool should also be
> > revised to allow out of order completions. Then there is no need to
> > copy zerocopy packets whenever they may experience delay.
>
> Yes, but as replied in the referenced thread, windows driver may treat out
> of order completion as a bug.
That would be a windows driver bug then, but I don't think it makes this
assumption. What the referenced thread
(https://patchwork.kernel.org/patch/3787671/) is saying is that host
must use any buffers made available on a tx vq within a reasonable
timeframe otherwise windows guests panic.
Ideally we would detect that a packet is actually experiencing delay and
trigger the copy at that point e.g. by calling skb_linearize. But it
isn't easy to track these packets though and even harder to do a data
copy without races.
Which reminds me that skb_linearize in net core seems to be
fundamentally racy - I suspect that if skb is cloned, and someone is
trying to use the shared frags while another thread calls skb_linearize,
we get some use after free bugs which likely mostly go undetected
because the corrupted packets mostly go on wire and get dropped
by checksum code.
--
MST
^ permalink raw reply
* Re: [PATCH V8 net-next 00/22] Huawei HiNIC Ethernet Driver
From: David Miller @ 2017-08-22 17:58 UTC (permalink / raw)
To: aviad.krawczyk
Cc: linux-kernel, netdev, bc.y, victor.gissin, zhaochen6, tony.qu
In-Reply-To: <cover.1503330613.git.aviad.krawczyk@huawei.com>
From: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Date: Mon, 21 Aug 2017 23:55:46 +0800
> The patch-set contains the support of the HiNIC Ethernet driver for
> hinic family of PCIE Network Interface Cards.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH v12 0/8] Replace PCI pool by DMA pool API
From: Doug Ledford @ 2017-08-22 18:00 UTC (permalink / raw)
To: Romain Perier, Dan Williams, Sean Hefty, Hal Rosenstock,
jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w, David S. Miller,
stas.yakovlev-Re5JQEeQqe8AvxtiuMwx3w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <20170822114701.14907-1-romain.perier-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
On Tue, 2017-08-22 at 13:46 +0200, Romain Perier wrote:
> The current PCI pool API are simple macro functions direct expanded
> to
> the appropriate dma pool functions. The prototypes are almost the
> same
> and semantically, they are very similar. I propose to use the DMA
> pool
> API directly and get rid of the old API.
>
> This set of patches, replaces the old API by the dma pool API
> and remove the defines.
>
> Changes in v12:
> - Rebased series onto next-20170822
Hi Romain,
I've applied the three that are in my purview to my for-next branch
(mthca, mlx4, and mlx5 patches). Thanks.
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: David Miller @ 2017-08-22 18:01 UTC (permalink / raw)
To: mst; +Cc: jasowang, willemdebruijn.kernel, den, virtualization, netdev
In-Reply-To: <20170822204015-mutt-send-email-mst@kernel.org>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 22 Aug 2017 20:55:56 +0300
> Which reminds me that skb_linearize in net core seems to be
> fundamentally racy - I suspect that if skb is cloned, and someone is
> trying to use the shared frags while another thread calls skb_linearize,
> we get some use after free bugs which likely mostly go undetected
> because the corrupted packets mostly go on wire and get dropped
> by checksum code.
Indeed, it does assume that the skb from which the clone was made
never has it's geometry changed.
I don't think even the TCP retransmit queue has this guarantee.
^ permalink raw reply
* RE: [PATCH net] net: dsa: skb_put_padto() already frees nskb
From: Woojung.Huh @ 2017-08-22 18:01 UTC (permalink / raw)
To: f.fainelli, netdev; +Cc: davem, andrew, vivien.didelot
In-Reply-To: <52a0e01e-8a55-2360-ca11-b65c613a812c@gmail.com>
> > Because skb_put_padto() frees skb when it fails, below lines in
> e71cb9e00922
> > ("net: dsa: ksz: fix skb freeing") will be an issue to.
> >
> > if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
> > + if (skb_put_padto(skb, skb->len + padlen))
> > + return NULL;
> > +
> >
> > When it fails skb will be freed twice in skb_put_padto() and
> > caller of dsa_slave_xmit().
>
> You are right, I am not sure what is the best way to fix tag_ksz.c other
> than somehow open coding skb_put_padto() minus the freeing on error part?
Agree. May need to go back to 1st submission without skb_put_padto().
Will check more if any other ideas.
Thanks.
Woojung
^ permalink raw reply
* Re: XDP redirect measurements, gotchas and tracepoints
From: Michael Chan @ 2017-08-22 18:02 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: xdp-newbies@vger.kernel.org, John Fastabend, Daniel Borkmann,
Andy Gospodarek, netdev@vger.kernel.org, Paweł Staszewski
In-Reply-To: <20170821212506.1cb0d5d6@redhat.com>
On Mon, Aug 21, 2017 at 12:25 PM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> I'be been playing with the latest XDP_REDIRECT feature, that was
> accepted in net-next (for ixgbe), see merge commit[1].
> [1] https://git.kernel.org/davem/net-next/c/6093ec2dc31
>
Just catching on XDP_REDIRECT and I have a very basic question. The
ingress device passes the XDP buffer to the egress device for XDP
redirect transmission. When the egress device has transmitted the
packet, is it supposed to just free the buffer? Or is it supposed to
be recycled?
In XDP_TX, the buffer is recycled back to the rx ring.
^ permalink raw reply
* Re: [PATCH V4 net-next] net: hns3: Add support to change MTU in HNS3 hardware
From: David Miller @ 2017-08-22 18:03 UTC (permalink / raw)
To: salil.mehta
Cc: yisen.zhuang, lipeng321, mehta.salil.lnk, netdev, linux-kernel,
linux-rdma, linuxarm
In-Reply-To: <20170821160524.32760-1-salil.mehta@huawei.com>
From: Salil Mehta <salil.mehta@huawei.com>
Date: Mon, 21 Aug 2017 17:05:24 +0100
> This patch adds the following support to the HNS3 driver:
> 1. Support to change the Maximum Transmission Unit of a
> port in the HNS NIC hardware.
> 2. Initializes the supported MTU range for the netdevice.
>
> Signed-off-by: lipeng <lipeng321@huawei.com>
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 3/4] bpf/verifier: when pruning a branch, ignore its write marks
From: Edward Cree @ 2017-08-22 18:03 UTC (permalink / raw)
To: Alexei Starovoitov, davem, Alexei Starovoitov, Daniel Borkmann
Cc: netdev, iovisor-dev
In-Reply-To: <97f3da2d-abb0-ba1f-07ca-ea426f30c7fb@solarflare.com>
On 22/08/17 16:50, Edward Cree wrote:
> On 22/08/17 16:24, Alexei Starovoitov wrote:
>> Do you have a test case for this by any chance?
> I think something like
> if (cond)
> r0=0;
> if (cond)
> r0=0;
> return r0;
> might tickle the bug, but I'm not sure.
It turns out that (cond) has to be constructed not to alter our knowledge
of whatever register we're testing, but apart from that, this works.
{
"liveness pruning and write screening",
.insns = {
/* Get an unknown value */
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
/* branch conditions teach us nothing about R2 */
BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.errstr = "R0 !read_ok",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_LWT_IN,
},
This test fails on net-next, but passes with my patch.
I'll include it in the next spin of the series.
^ permalink raw reply
* Re: [PATCH net v3] ipv6: add rcu grace period before freeing fib6_node
From: David Miller @ 2017-08-22 18:03 UTC (permalink / raw)
To: weiwan; +Cc: netdev, edumazet, kafai
In-Reply-To: <20170821164710.134367-1-tracywwnj@gmail.com>
From: Wei Wang <weiwan@google.com>
Date: Mon, 21 Aug 2017 09:47:10 -0700
> From: Wei Wang <weiwan@google.com>
>
> We currently keep rt->rt6i_node pointing to the fib6_node for the route.
> And some functions make use of this pointer to dereference the fib6_node
> from rt structure, e.g. rt6_check(). However, as there is neither
> refcount nor rcu taken when dereferencing rt->rt6i_node, it could
> potentially cause crashes as rt->rt6i_node could be set to NULL by other
> CPUs when doing a route deletion.
> This patch introduces an rcu grace period before freeing fib6_node and
> makes sure the functions that dereference it takes rcu_read_lock().
>
> Note: there is no "Fixes" tag because this bug was there in a very
> early stage.
>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH] net: ethernet: make ptp_clock_info const
From: David Miller @ 2017-08-22 18:05 UTC (permalink / raw)
To: bhumirks
Cc: julia.lawall, nicolas.ferre, richardcochran, peppe.cavallaro,
alexandre.torgue, cmetcalf, adi-buildroot-devel, netdev,
linux-kernel
In-Reply-To: <1503335210-16179-1-git-send-email-bhumirks@gmail.com>
From: Bhumika Goyal <bhumirks@gmail.com>
Date: Mon, 21 Aug 2017 22:36:50 +0530
> Make these const as they are only used in a copy operation.
> Done using Coccinelle.
...
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] ptp: make ptp_clock_info const
From: David Miller @ 2017-08-22 18:05 UTC (permalink / raw)
To: bhumirks; +Cc: julia.lawall, richardcochran, netdev, linux-kernel
In-Reply-To: <1503336672-22918-1-git-send-email-bhumirks@gmail.com>
From: Bhumika Goyal <bhumirks@gmail.com>
Date: Mon, 21 Aug 2017 23:01:12 +0530
> Make these const as they are only used in a copy operation.
> Done using Coccinelle.
...
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next v2 00/10] net: mvpp2: MAC/GoP configuration
From: Andrew Lunn @ 2017-08-22 18:07 UTC (permalink / raw)
To: Antoine Tenart
Cc: davem, jason, gregory.clement, sebastian.hesselbarth,
thomas.petazzoni, nadavh, linux, mw, stefanc, netdev,
linux-arm-kernel
In-Reply-To: <20170822170830.32413-1-antoine.tenart@free-electrons.com>
On Tue, Aug 22, 2017 at 07:08:20PM +0200, Antoine Tenart wrote:
> Hi all,
>
> This is based on net-next (e2a7c34fb285).
>
> I removed the GoP interrupt and PHY optional parts in this v2 to ease
> the review process as the MAC/GoP initialization seemed to start less
> discussions :)
Hi Antoine
By that, do you mean setting the SMI PHY address?
Andrew
^ permalink raw reply
* Re: [PATCH V2 net-next 0/2] liquidio: VF driver will notify NIC firmware of MTU change
From: David Miller @ 2017-08-22 18:08 UTC (permalink / raw)
To: felix.manlunas
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
veerasenareddy.burru
In-Reply-To: <20170821193527.GA6102@felix-thinkpad.cavium.com>
From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Mon, 21 Aug 2017 12:35:27 -0700
> From: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
>
> Make VF driver notify NIC firmware of MTU change. Firmware needs this
> information for MTU propagation and enforcement.
>
> The first patch in this series moves a macro definition to a proper place
> to prevent a build error in the second patch which has the code that sends
> the notification.
>
> Change Log:
> V1 -> V2
> * Add "From:" line to patch #1 and #2 to give credit to the author.
> * In patch #2, order local variable declarations from longest to
> shortest line.
Series applied.
^ permalink raw reply
* [PATCH] net: amd: constify zorro_device_id
From: Arvind Yadav @ 2017-08-22 18:11 UTC (permalink / raw)
To: davem, jarod; +Cc: linux-kernel, netdev
zorro_device_id are not supposed to change at runtime. All functions
working with zorro_device_id provided by <linux/zorro.h> work with
const zorro_device_id. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
drivers/net/ethernet/amd/a2065.c | 2 +-
drivers/net/ethernet/amd/ariadne.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/amd/a2065.c b/drivers/net/ethernet/amd/a2065.c
index ee4b94e..e22f976 100644
--- a/drivers/net/ethernet/amd/a2065.c
+++ b/drivers/net/ethernet/amd/a2065.c
@@ -643,7 +643,7 @@ static int a2065_init_one(struct zorro_dev *z,
static void a2065_remove_one(struct zorro_dev *z);
-static struct zorro_device_id a2065_zorro_tbl[] = {
+static const struct zorro_device_id a2065_zorro_tbl[] = {
{ ZORRO_PROD_CBM_A2065_1 },
{ ZORRO_PROD_CBM_A2065_2 },
{ ZORRO_PROD_AMERISTAR_A2065 },
diff --git a/drivers/net/ethernet/amd/ariadne.c b/drivers/net/ethernet/amd/ariadne.c
index 5fd7b15..4b6a5cb 100644
--- a/drivers/net/ethernet/amd/ariadne.c
+++ b/drivers/net/ethernet/amd/ariadne.c
@@ -692,7 +692,7 @@ static void ariadne_remove_one(struct zorro_dev *z)
free_netdev(dev);
}
-static struct zorro_device_id ariadne_zorro_tbl[] = {
+static const struct zorro_device_id ariadne_zorro_tbl[] = {
{ ZORRO_PROD_VILLAGE_TRONIC_ARIADNE },
{ 0 }
};
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Corentin Labbe @ 2017-08-22 18:11 UTC (permalink / raw)
To: Florian Fainelli
Cc: Chen-Yu Tsai, Andrew Lunn, Maxime Ripard, Rob Herring,
Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <bfe784b6-5137-8fcd-dbd0-ad96974f4060@gmail.com>
On Tue, Aug 22, 2017 at 09:40:24AM -0700, Florian Fainelli wrote:
> On 08/22/2017 08:39 AM, Chen-Yu Tsai wrote:
> > On Mon, Aug 21, 2017 at 10:23 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> >>> All muxes are mostly always represented the same way afaik, or do you
> >>> want to simply introduce a new compatible / property?
> >>
> >> + mdio-mux {
> >> + compatible = "allwinner,sun8i-h3-mdio-switch";
> >> + mdio-parent-bus = <&mdio_parent>;
> >> + #address-cells = <1>;
> >> + #size-cells = <0>;
> >> +
> >> + internal_mdio: mdio@1 {
> >> reg = <1>;
> >> - clocks = <&ccu CLK_BUS_EPHY>;
> >> - resets = <&ccu RST_BUS_EPHY>;
> >> + #address-cells = <1>;
> >> + #size-cells = <0>;
> >> + int_mii_phy: ethernet-phy@1 {
> >> + compatible = "ethernet-phy-ieee802.3-c22";
> >> + reg = <1>;
> >> + clocks = <&ccu CLK_BUS_EPHY>;
> >> + resets = <&ccu RST_BUS_EPHY>;
> >> + phy-is-integrated;
> >> + };
> >> + };
> >> + mdio: mdio@0 {
> >> + reg = <0>;
> >> + #address-cells = <1>;
> >> + #size-cells = <0>;
> >> };
> >>
> >> Hi Maxim
> >>
> >> Anybody who knows the MDIO-mux code/binding, knows that it is a run
> >> time mux. You swap the mux per MDIO transaction. You can access all
> >> the PHY and switches on the mux'ed MDIO bus.
> >>
> >> However here, it is effectively a boot-time MUX. You cannot change it
> >> on the fly. What happens when somebody has a phandle to a PHY on the
> >> internal and a phandle to a phy on the external? Does the driver at
> >> least return -EINVAL, or -EBUSY? Is there a representation which
> >> eliminates this possibility?
> >
> > There is only one controller. Either you use the internal PHY, which
> > is then directly coupled (no magnetics needed) to the RJ45 port, or
> > you use an external PHY over MII/RMII/RGMII. You could supposedly
> > have both on a board, and let the user choose one. But why bother
> > with the extra complexity and cost? Either you use the internal PHY
> > at 100M, or an external RGMII PHY for gigabit speeds.
>
> I agree, there is no point in over-engineering any of this. I don't
> think there is actually any MDIO mux per-se in that the MDIO clock and
> data lines are muxed, however there has to be some kind of built-in port
> multiplexer that lets you chose between connecting to the internal PHY
> and any external PHY/MAC, but that is not what a "mdio-mux" node represents.
>
> >
> > So I think what you are saying is either impossible or engineering-wise
> > a very stupid design, like using an external MAC with a discrete PHY
> > connected to the internal MAC's MDIO bus, while using the internal MAC
> > with the internal PHY.
> >
> > Now can we please decide on something? We're a week and a half from
> > the 4.13 release. If mdio-mux is wrong, then we could have two mdio
> > nodes (internal-mdio & external-mdio).
>
> I really don't see a need for a mdio-mux in the first place, just have
> one MDIO controller (current state) sub-node which describes the
> built-in STMMAC MDIO controller and declare the internal PHY as a child
> node (along with 'phy-is-integrated'). If a different configuration is
> used, then just put the external PHY as a child node there.
>
> If fixed-link is required, the mdio node becomes unused anyway.
>
> Works for everyone?
If we put an external PHY with reg=1 as a child of internal MDIO, il will be merged with internal PHY node and get phy-is-integrated.
Does two MDIO node "internal-mdio" and "mdio" works for you ?
(We keep "mdio" for external MDIO for reducing the number of patchs)
Thanks
Regards
diff --git a/arch/arm/boot/dts/sunxi-h3-h5.dtsi b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
index 4b599b5d26f6..d5e7cf0d9454 100644
--- a/arch/arm/boot/dts/sunxi-h3-h5.dtsi
+++ b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
@@ -417,7 +417,8 @@
#size-cells = <0>;
status = "disabled";
- mdio: mdio {
+ /* Only one MDIO is usable at the time */
+ internal_mdio: mdio@1 {
#address-cells = <1>;
#size-cells = <0>;
int_mii_phy: ethernet-phy@1 {
@@ -425,8 +426,13 @@
reg = <1>;
clocks = <&ccu CLK_BUS_EPHY>;
resets = <&ccu RST_BUS_EPHY>;
+ phy-is-integrated;
};
};
+ mdio: mdio@0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
};
spi0: spi@01c68000 {
^ permalink raw reply related
* Re: [PATCH net-next v3 1/2] tcp: Get a proper dst before checking it.
From: Eric Dumazet @ 2017-08-22 18:14 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: netdev
In-Reply-To: <1503383629-12392-2-git-send-email-xiangxia.m.yue@gmail.com>
On Mon, 2017-08-21 at 23:33 -0700, Tonghao Zhang wrote:
> tcp_peer_is_proven needs a proper route to make the
> determination, but dst always is NULL. This bug may
> be there at the beginning of git tree. This does not
> look serious enough to deserve backports to stable
> versions.
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: XDP redirect measurements, gotchas and tracepoints
From: John Fastabend @ 2017-08-22 18:17 UTC (permalink / raw)
To: Michael Chan, Jesper Dangaard Brouer
Cc: xdp-newbies@vger.kernel.org, Daniel Borkmann, Andy Gospodarek,
netdev@vger.kernel.org, Paweł Staszewski, Alexander H Duyck
In-Reply-To: <CACKFLimAF9OYwz8EqG=1-N40hWuA-8C8k8ut0oFKYdduvZOxHA@mail.gmail.com>
On 08/22/2017 11:02 AM, Michael Chan wrote:
> On Mon, Aug 21, 2017 at 12:25 PM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
>>
>> I'be been playing with the latest XDP_REDIRECT feature, that was
>> accepted in net-next (for ixgbe), see merge commit[1].
>> [1] https://git.kernel.org/davem/net-next/c/6093ec2dc31
>>
>
> Just catching on XDP_REDIRECT and I have a very basic question. The
> ingress device passes the XDP buffer to the egress device for XDP
> redirect transmission. When the egress device has transmitted the
> packet, is it supposed to just free the buffer? Or is it supposed to
> be recycled?
>
> In XDP_TX, the buffer is recycled back to the rx ring.
>
With XDP_REDIRECT we must "just free the buffer" in ixgbe this means
page_frag_free() on the data. There is no way to know where the xdp
buffer came from it could be a different NIC for example.
However with how ixgbe is coded up recycling will work as long as
the memory is free'd before the driver ring tries to use it again. In
normal usage this should be the case. And if we are over-running a device
it doesn't really hurt to slow down the sender a bit.
I think this is a pretty good model, we could probably provide a set
of APIs for drivers to use so that we get some consistency across
vendors here, ala Jesper's page pool ideas.
(+Alex, for ixgbe details)
Thanks,
John
^ permalink raw reply
* Re: [PATCH net-next v3 2/2] tcp: Remove the unused parameter for tcp_try_fastopen.
From: Eric Dumazet @ 2017-08-22 18:19 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: netdev
In-Reply-To: <1503383629-12392-3-git-send-email-xiangxia.m.yue@gmail.com>
On Mon, 2017-08-21 at 23:33 -0700, Tonghao Zhang wrote:
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
> include/net/tcp.h | 3 +--
> net/ipv4/tcp_fastopen.c | 6 ++----
> net/ipv4/tcp_input.c | 2 +-
> 3 files changed, 4 insertions(+), 7 deletions(-)
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH net-next 3/4] bpf/verifier: when pruning a branch, ignore its write marks
From: Alexei Starovoitov via iovisor-dev @ 2017-08-22 18:25 UTC (permalink / raw)
To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
Daniel Borkmann
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <9a8076d1-a88b-b0e8-3c47-969a022feb60-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
On 8/22/17 11:03 AM, Edward Cree wrote:
> On 22/08/17 16:50, Edward Cree wrote:
>> On 22/08/17 16:24, Alexei Starovoitov wrote:
>>> Do you have a test case for this by any chance?
>> I think something like
>> if (cond)
>> r0=0;
>> if (cond)
>> r0=0;
>> return r0;
>> might tickle the bug, but I'm not sure.
> It turns out that (cond) has to be constructed not to alter our knowledge
> of whatever register we're testing, but apart from that, this works.
> {
> "liveness pruning and write screening",
> .insns = {
> /* Get an unknown value */
> BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
> /* branch conditions teach us nothing about R2 */
> BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
> BPF_MOV64_IMM(BPF_REG_0, 0),
> BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
> BPF_MOV64_IMM(BPF_REG_0, 0),
> BPF_EXIT_INSN(),
> },
> .errstr = "R0 !read_ok",
> .result = REJECT,
> .prog_type = BPF_PROG_TYPE_LWT_IN,
> },
> This test fails on net-next, but passes with my patch.
> I'll include it in the next spin of the series.
nice. thanks for the test!
^ permalink raw reply
* Re: [PATCH net] net: dsa: skb_put_padto() already frees nskb
From: David Miller @ 2017-08-22 18:27 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, andrew, vivien.didtelot, woojung.huh
In-Reply-To: <20170821194131.27839-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Mon, 21 Aug 2017 12:41:31 -0700
> skb_put_padto() already frees the passed sk_buff reference upon error,
> so calling kfree_skb() on it again is not necessary.
>
> Detected by CoverityScan, CID#1416687 ("USE_AFTER_FREE")
>
> Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
I know it seems like a lot of work, but with appropriate wrappers we
can control the freeing that skb_pad() does at the deepest part of
this call chain.
int __skb_pad(struct sk_buff *skb, int pad, bool free_skb_on_err);
static inline int skb_pad(struct sk_buff *skb, int pad)
{
return __skb_pad(skb, pad, true);
}
static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
bool free_skb_on_err)
{
unsigned int size = skb->len;
if (unlikely(size < len)) {
len -= size;
if (__skb_pad(skb, len, free_skb_on_err))
return -ENOMEM;
__skb_put(skb, len);
}
return 0;
}
static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
{
return __skb_put_padto(skb, len, true);
}
And then here in the ksz_xmit() code, invoke __skb_put_padto() with the
boolean set appropriately.
^ permalink raw reply
* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Eric Dumazet @ 2017-08-22 18:28 UTC (permalink / raw)
To: David Miller; +Cc: willemdebruijn.kernel, mst, netdev, den, virtualization
In-Reply-To: <20170822.110108.343109469263087166.davem@davemloft.net>
On Tue, 2017-08-22 at 11:01 -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Tue, 22 Aug 2017 20:55:56 +0300
>
> > Which reminds me that skb_linearize in net core seems to be
> > fundamentally racy - I suspect that if skb is cloned, and someone is
> > trying to use the shared frags while another thread calls skb_linearize,
> > we get some use after free bugs which likely mostly go undetected
> > because the corrupted packets mostly go on wire and get dropped
> > by checksum code.
>
> Indeed, it does assume that the skb from which the clone was made
> never has it's geometry changed.
>
> I don't think even the TCP retransmit queue has this guarantee.
TCP retransmit makes sure to avoid that.
if (skb_unclone(skb, GFP_ATOMIC))
return -ENOMEM;
( Before cloning again skb )
^ permalink raw reply
* Re: XDP redirect measurements, gotchas and tracepoints
From: Duyck, Alexander H @ 2017-08-22 18:30 UTC (permalink / raw)
To: john.fastabend@gmail.com, brouer@redhat.com,
michael.chan@broadcom.com
Cc: pstaszewski@itcare.pl, netdev@vger.kernel.org,
xdp-newbies@vger.kernel.org, andy@greyhouse.net,
borkmann@iogearbox.net
In-Reply-To: <599C7530.2010405@gmail.com>
On Tue, 2017-08-22 at 11:17 -0700, John Fastabend wrote:
> On 08/22/2017 11:02 AM, Michael Chan wrote:
> > On Mon, Aug 21, 2017 at 12:25 PM, Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:
> > >
> > > I'be been playing with the latest XDP_REDIRECT feature, that was
> > > accepted in net-next (for ixgbe), see merge commit[1].
> > > [1] https://git.kernel.org/davem/net-next/c/6093ec2dc31
> > >
> >
> > Just catching on XDP_REDIRECT and I have a very basic question. The
> > ingress device passes the XDP buffer to the egress device for XDP
> > redirect transmission. When the egress device has transmitted the
> > packet, is it supposed to just free the buffer? Or is it supposed to
> > be recycled?
> >
> > In XDP_TX, the buffer is recycled back to the rx ring.
> >
>
> With XDP_REDIRECT we must "just free the buffer" in ixgbe this means
> page_frag_free() on the data. There is no way to know where the xdp
> buffer came from it could be a different NIC for example.
>
> However with how ixgbe is coded up recycling will work as long as
> the memory is free'd before the driver ring tries to use it again. In
> normal usage this should be the case. And if we are over-running a device
> it doesn't really hurt to slow down the sender a bit.
>
> I think this is a pretty good model, we could probably provide a set
> of APIs for drivers to use so that we get some consistency across
> vendors here, ala Jesper's page pool ideas.
>
> (+Alex, for ixgbe details)
>
> Thanks,
> John
I think you pretty much covered the inner workings for the ixgbe bits.
The only piece I would add is that the recycling trick normally only
works if the same interface/driver is doing both the Tx and the Rx. The
redirect code cannot assume that is the case and that is the reason why
it must always be freeing the traffic on clean-up.
- Alex
^ permalink raw reply
* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Florian Fainelli @ 2017-08-22 18:35 UTC (permalink / raw)
To: Corentin Labbe
Cc: Chen-Yu Tsai, Andrew Lunn, Maxime Ripard, Rob Herring,
Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20170822181140.GA11596@Red>
On 08/22/2017 11:11 AM, Corentin Labbe wrote:
> On Tue, Aug 22, 2017 at 09:40:24AM -0700, Florian Fainelli wrote:
>> On 08/22/2017 08:39 AM, Chen-Yu Tsai wrote:
>>> On Mon, Aug 21, 2017 at 10:23 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>>>>> All muxes are mostly always represented the same way afaik, or do you
>>>>> want to simply introduce a new compatible / property?
>>>>
>>>> + mdio-mux {
>>>> + compatible = "allwinner,sun8i-h3-mdio-switch";
>>>> + mdio-parent-bus = <&mdio_parent>;
>>>> + #address-cells = <1>;
>>>> + #size-cells = <0>;
>>>> +
>>>> + internal_mdio: mdio@1 {
>>>> reg = <1>;
>>>> - clocks = <&ccu CLK_BUS_EPHY>;
>>>> - resets = <&ccu RST_BUS_EPHY>;
>>>> + #address-cells = <1>;
>>>> + #size-cells = <0>;
>>>> + int_mii_phy: ethernet-phy@1 {
>>>> + compatible = "ethernet-phy-ieee802.3-c22";
>>>> + reg = <1>;
>>>> + clocks = <&ccu CLK_BUS_EPHY>;
>>>> + resets = <&ccu RST_BUS_EPHY>;
>>>> + phy-is-integrated;
>>>> + };
>>>> + };
>>>> + mdio: mdio@0 {
>>>> + reg = <0>;
>>>> + #address-cells = <1>;
>>>> + #size-cells = <0>;
>>>> };
>>>>
>>>> Hi Maxim
>>>>
>>>> Anybody who knows the MDIO-mux code/binding, knows that it is a run
>>>> time mux. You swap the mux per MDIO transaction. You can access all
>>>> the PHY and switches on the mux'ed MDIO bus.
>>>>
>>>> However here, it is effectively a boot-time MUX. You cannot change it
>>>> on the fly. What happens when somebody has a phandle to a PHY on the
>>>> internal and a phandle to a phy on the external? Does the driver at
>>>> least return -EINVAL, or -EBUSY? Is there a representation which
>>>> eliminates this possibility?
>>>
>>> There is only one controller. Either you use the internal PHY, which
>>> is then directly coupled (no magnetics needed) to the RJ45 port, or
>>> you use an external PHY over MII/RMII/RGMII. You could supposedly
>>> have both on a board, and let the user choose one. But why bother
>>> with the extra complexity and cost? Either you use the internal PHY
>>> at 100M, or an external RGMII PHY for gigabit speeds.
>>
>> I agree, there is no point in over-engineering any of this. I don't
>> think there is actually any MDIO mux per-se in that the MDIO clock and
>> data lines are muxed, however there has to be some kind of built-in port
>> multiplexer that lets you chose between connecting to the internal PHY
>> and any external PHY/MAC, but that is not what a "mdio-mux" node represents.
>>
>>>
>>> So I think what you are saying is either impossible or engineering-wise
>>> a very stupid design, like using an external MAC with a discrete PHY
>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>> with the internal PHY.
>>>
>>> Now can we please decide on something? We're a week and a half from
>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>> nodes (internal-mdio & external-mdio).
>>
>> I really don't see a need for a mdio-mux in the first place, just have
>> one MDIO controller (current state) sub-node which describes the
>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>> node (along with 'phy-is-integrated'). If a different configuration is
>> used, then just put the external PHY as a child node there.
>>
>> If fixed-link is required, the mdio node becomes unused anyway.
>>
>> Works for everyone?
>
> If we put an external PHY with reg=1 as a child of internal MDIO, il will be merged with internal PHY node and get phy-is-integrated.
Then have the .dtsi file contain just the mdio node, but no internal or
external PHY and push all the internal and external PHY node definition
(in its entirety) to the per-board DTS file, does not that work?
>
> Does two MDIO node "internal-mdio" and "mdio" works for you ?
> (We keep "mdio" for external MDIO for reducing the number of patchs)
How does that solve the problem and not create a new one where both MDIO
nodes end-up being registered? Does that mean you force the writer of
the board-level DTS to systematically disable the internal MDIO node
when using an external PHY and vice versa? How is that better than not
being explicit like I suggested earlier?
>
> Thanks
> Regards
>
> diff --git a/arch/arm/boot/dts/sunxi-h3-h5.dtsi b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> index 4b599b5d26f6..d5e7cf0d9454 100644
> --- a/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> +++ b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> @@ -417,7 +417,8 @@
> #size-cells = <0>;
> status = "disabled";
>
> - mdio: mdio {
> + /* Only one MDIO is usable at the time */
> + internal_mdio: mdio@1 {
> #address-cells = <1>;
> #size-cells = <0>;
> int_mii_phy: ethernet-phy@1 {
> @@ -425,8 +426,13 @@
> reg = <1>;
> clocks = <&ccu CLK_BUS_EPHY>;
> resets = <&ccu RST_BUS_EPHY>;
> + phy-is-integrated;
> };
> };
> + mdio: mdio@0 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> };
>
> spi0: spi@01c68000 {
>
--
Florian
^ permalink raw reply
* [PATCH 1/2] net: phy: Freeze PHY polling before suspending devices
From: Geert Uytterhoeven @ 2017-08-22 18:37 UTC (permalink / raw)
To: David S . Miller, Steve Glendinning, Andrew Lunn,
Florian Fainelli
Cc: Lukas Wunner, Rafael J . Wysocki, netdev, linux-pm,
linux-renesas-soc, linux-kernel, Geert Uytterhoeven
In-Reply-To: <1503427046-17618-1-git-send-email-geert+renesas@glider.be>
During system resume, the PHY state machine may be run from the
workqueue while the Ethernet device (and its PHY) are still suspended,
which may lead to a system crash.
E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock. If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.
As this is a race condition with a small time window, it is not so easy
to trigger at will. Using pm_test may increase your chances:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test
# echo mem > /sys/power/state
However, since commit 7ad813f208533ceb ("net: phy: Correctly process
PHY_HALTED in phy_stop_machine()"), this is much more likely to happen,
presumably due to the new sync point where phy_state_machine() calls
queue_delayed_work() just before entering system suspend.
To fix this, move PHY polling from the non-freezable to the freezable
power efficient work queue. This is similar to what was done in commit
ea00353f36b64375 ("PCI: Freeze PME scan before suspending devices").
Stacktrace for posterity:
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.001 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
PM: suspend devices took 0.010 seconds
Disabling non-boot CPUs ...
Enabling non-boot CPUs ...
Unhandled fault: imprecise external abort (0x1406) at 0x000cf6c8
pgd = c0004000
[000cf6c8] *pgd=00000000
Internal error: : 1406 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 4.13.0-rc5-kzm9g-04113-g3a897d275111fd6c #903
Hardware name: Generic SH73A0 (Flattened Device Tree)
Workqueue: events_power_efficient phy_state_machine
task: df6b4800 task.stack: df796000
PC is at __smsc911x_reg_read+0x1c/0x60
LR is at smsc911x_mac_read+0x4c/0x118
pc : [<c03c7e88>] lr : [<c03c8b88>] psr: 200d0093
sp : df797e98 ip : 00000000 fp : 00000001
r10: 00000000 r9 : df70e380 r8 : 800d0093
r7 : df631de8 r6 : 00000006 r5 : df631e08 r4 : df631dc0
r3 : e0903000 r2 : 00000000 r1 : e09030a4 r0 : 00000000
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 5ee1404a DAC: 00000051
Process kworker/0:2 (pid: 57, stack limit = 0xdf796210)
Stack: (0xdf797e98 to 0xdf798000)
7e80: df631dc0 00000001
7ea0: 00000001 df631de8 600d0013 c03c8df4 df70e400 00000001 00000001 df70e458
7ec0: df70e000 c03c7104 c03c6170 df70e000 df70e000 dfbc7bc0 df797f30 c03c6138
7ee0: df77d900 c03c617c df77d900 df70e32c dfbc7bc0 c03c4218 df77d900 df70e32c
7f00: dfbc7bc0 df797f30 dfbcb200 00000000 00000000 c013aa0c 00000001 00000000
7f20: c013a994 00000000 00000000 00000000 c1117f70 00000000 00000000 c07420ec
7f40: c0904900 df77d900 dfbc7bc0 dfbc7bc0 df796000 dfbc7bf4 c0904900 df77d918
7f60: 00000008 c013b1e0 df6b4800 df75b500 df77e5c0 00000000 df44dea8 df77d900
7f80: c013af28 df75b538 00000000 c0140584 df77e5c0 c0140460 00000000 00000000
7fa0: 00000000 00000000 00000000 c0106ef0 00000000 00000000 00000000 00000000
7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c03c7e88>] (__smsc911x_reg_read) from [<c03c8b88>] (smsc911x_mac_read+0x4c/0x118)
[<c03c8b88>] (smsc911x_mac_read) from [<c03c8df4>] (smsc911x_mii_read+0x2c/0xa4)
[<c03c8df4>] (smsc911x_mii_read) from [<c03c7104>] (mdiobus_read+0x58/0x70)
[<c03c7104>] (mdiobus_read) from [<c03c6138>] (genphy_update_link+0x18/0x50)
[<c03c6138>] (genphy_update_link) from [<c03c617c>] (genphy_read_status+0xc/0x1cc)
[<c03c617c>] (genphy_read_status) from [<c03c4218>] (phy_state_machine+0xa8/0x3dc)
[<c03c4218>] (phy_state_machine) from [<c013aa0c>] (process_one_work+0x240/0x3fc)
[<c013aa0c>] (process_one_work) from [<c013b1e0>] (worker_thread+0x2b8/0x3f4)
[<c013b1e0>] (worker_thread) from [<c0140584>] (kthread+0x124/0x144)
[<c0140584>] (kthread) from [<c0106ef0>] (ret_from_fork+0x14/0x24)
Code: e5903000 e0831001 e5910000 f57ff04f (e12fff1e)
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
drivers/net/phy/phy.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index dae13f028c84ee17..2055165ca6349ee5 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -554,7 +554,8 @@ EXPORT_SYMBOL(phy_start_aneg);
*/
void phy_start_machine(struct phy_device *phydev)
{
- queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, HZ);
+ queue_delayed_work(system_freezable_power_efficient_wq,
+ &phydev->state_queue, HZ);
}
EXPORT_SYMBOL_GPL(phy_start_machine);
@@ -574,7 +575,8 @@ void phy_trigger_machine(struct phy_device *phydev, bool sync)
cancel_delayed_work_sync(&phydev->state_queue);
else
cancel_delayed_work(&phydev->state_queue);
- queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, 0);
+ queue_delayed_work(system_freezable_power_efficient_wq,
+ &phydev->state_queue, 0);
}
/**
@@ -1081,8 +1083,8 @@ void phy_state_machine(struct work_struct *work)
* between states from phy_mac_interrupt()
*/
if (phydev->irq == PHY_POLL)
- queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
- PHY_STATE_TIME * HZ);
+ queue_delayed_work(system_freezable_power_efficient_wq,
+ &phydev->state_queue, PHY_STATE_TIME * HZ);
}
/**
@@ -1099,7 +1101,7 @@ void phy_mac_interrupt(struct phy_device *phydev, int new_link)
phydev->link = new_link;
/* Trigger a state machine change */
- queue_work(system_power_efficient_wq, &phydev->phy_queue);
+ queue_work(system_freezable_power_efficient_wq, &phydev->phy_queue);
}
EXPORT_SYMBOL(phy_mac_interrupt);
--
2.7.4
^ permalink raw reply related
* [PATCH 0/2] net: Fix crashes due to activity during suspend
From: Geert Uytterhoeven @ 2017-08-22 18:37 UTC (permalink / raw)
To: David S . Miller, Steve Glendinning, Andrew Lunn,
Florian Fainelli
Cc: Lukas Wunner, Rafael J . Wysocki, netdev, linux-pm,
linux-renesas-soc, linux-kernel, Geert Uytterhoeven
Hi all,
If an Ethernet device is used while the device is suspended, the system may
crash.
E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock. If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.
This patch series fixes two of such crashes:
1. The first patch prevents the PHY polling state machine from accessing
PHY registers while a device is suspended,
2. The second patch prevents the net core from trying to transmit packets
when an smsc911x device is suspended.
Both crashes can be reproduced on sh73a0/kzm9g and r8a73a4/ape6evm during
s2ram (rarely), or by using pm_test (more likely to trigger):
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test
# echo mem > /sys/power/state
With this series applied, my test systems survive a loop of 100 test
suspends.
Thanks for your comments!
Geert Uytterhoeven (2):
net: phy: Freeze PHY polling before suspending devices
net: smsc911x: Quiten netif during suspend
drivers/net/ethernet/smsc/smsc911x.c | 15 ++++++++++++++-
drivers/net/phy/phy.c | 12 +++++++-----
2 files changed, 21 insertions(+), 6 deletions(-)
--
2.7.4
Gr{oetje,eeting}s,
Geert
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox