* [PATCH net-next v2 04/15] dt-bindings: net: dwmac-sun8i: Sort syscon compatibles by alphabetical order
From: Chen-Yu Tsai @ 2018-05-01 16:12 UTC (permalink / raw)
To: Maxime Ripard, Michael Turquette, Stephen Boyd,
Giuseppe Cavallaro, Rob Herring, Mark Rutland, Mark Brown
Cc: devicetree, netdev, Chen-Yu Tsai, Corentin Labbe, linux-clk,
linux-arm-kernel, Icenowy Zheng
In-Reply-To: <20180501161227.2110-1-wens@csie.org>
The A83T syscon compatible was appended to the end of the syscon
compatibles list instead of being inserted in its sorted position.
Move it to the proper place to keep the list sorted.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
---
Documentation/devicetree/bindings/net/dwmac-sun8i.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index e04ce75e24a3..1b8e33e71651 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -22,10 +22,10 @@ Required properties:
- #size-cells: shall be 0
- syscon: A phandle to the syscon of the SoC with one of the following
compatible string:
+ - allwinner,sun8i-a83t-system-controller
- allwinner,sun8i-h3-system-controller
- allwinner,sun8i-v3s-system-controller
- allwinner,sun50i-a64-system-controller
- - allwinner,sun8i-a83t-system-controller
Optional properties:
- allwinner,tx-delay-ps: TX clock delay chain value in ps.
--
2.17.0
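The patch does not state the comparison rule, but the resulting order (sun8i entries before sun50i) is consistent with comparing digit runs numerically rather than byte by byte. A minimal sketch of such a comparison follows; natural_cmp() and is_sorted() are illustrative names, not anything from the kernel tree.

```c
#include <ctype.h>
#include <stdlib.h>

/* Compare two strings, treating runs of digits as numbers. */
int natural_cmp(const char *a, const char *b)
{
	while (*a && *b) {
		if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
			char *ea, *eb;
			unsigned long na = strtoul(a, &ea, 10);
			unsigned long nb = strtoul(b, &eb, 10);

			if (na != nb)
				return na < nb ? -1 : 1;
			a = ea;
			b = eb;
			continue;
		}
		if (*a != *b)
			return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
		a++;
		b++;
	}
	if (*a)
		return 1;
	return *b ? -1 : 0;
}

/* Return 1 if list[0..n-1] is in non-decreasing natural order. */
int is_sorted(const char *const *list, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (natural_cmp(list[i - 1], list[i]) > 0)
			return 0;
	return 1;
}
```

With a plain strcmp() the sun50i entry would sort ahead of every sun8i entry, since '5' compares lower than '8'.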
* [PATCH net-next v2 03/15] dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptions
From: Chen-Yu Tsai @ 2018-05-01 16:12 UTC (permalink / raw)
To: Maxime Ripard, Michael Turquette, Stephen Boyd,
Giuseppe Cavallaro, Rob Herring, Mark Rutland, Mark Brown
Cc: devicetree, netdev, Chen-Yu Tsai, Corentin Labbe, linux-clk,
linux-arm-kernel, Icenowy Zheng
In-Reply-To: <20180501161227.2110-1-wens@csie.org>
The clock delay chains found in the glue layer for dwmac-sun8i are only
used with RGMII PHYs. They are not intended for non-RGMII PHYs, such as
MII external PHYs or the internal PHY. Also, a recent SoC has a smaller
range of possible values for the delay chain.
This patch reformats the delay chain section of the device tree binding
to make it clear that the delay chains only apply to RGMII PHYs, and
make it easier to add the R40-specific bits later.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
Documentation/devicetree/bindings/net/dwmac-sun8i.txt | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index 3d6d5fa0c4d5..e04ce75e24a3 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -28,10 +28,13 @@ Required properties:
- allwinner,sun8i-a83t-system-controller
Optional properties:
-- allwinner,tx-delay-ps: TX clock delay chain value in ps. Range value is 0-700. Default is 0)
-- allwinner,rx-delay-ps: RX clock delay chain value in ps. Range value is 0-3100. Default is 0)
-Both delay properties need to be a multiple of 100. They control the delay for
-external PHY.
+- allwinner,tx-delay-ps: TX clock delay chain value in ps.
+ Range is 0-700. Default is 0.
+- allwinner,rx-delay-ps: RX clock delay chain value in ps.
+ Range is 0-3100. Default is 0.
+Both delay properties need to be a multiple of 100. They control the
+clock delay for external RGMII PHY. They do not apply to the internal
+PHY or external non-RGMII PHYs.
Optional properties for the following compatibles:
- "allwinner,sun8i-h3-emac",
--
2.17.0
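The constraints in the reworded binding text (steps of 100 ps, TX within 0-700, RX within 0-3100, RGMII only) can be sketched as a small userspace check. This is not code from dwmac-sun8i; the enum and function names are made up, and rejecting non-zero delays for non-RGMII PHYs is an assumption about how one might enforce "do not apply".

```c
#include <stdbool.h>

/* Hypothetical PHY mode tag; only RGMII uses the delay chains. */
enum phy_mode { PHY_MODE_MII, PHY_MODE_RGMII, PHY_MODE_INTERNAL };

/* Limits taken from the binding text; all values are in picoseconds. */
#define TX_DELAY_MAX_PS 700
#define RX_DELAY_MAX_PS 3100
#define DELAY_STEP_PS   100

/* Return true if delay_ps is legal for a chain with the given maximum. */
bool delay_ps_valid(int delay_ps, int max_ps)
{
	return delay_ps >= 0 && delay_ps <= max_ps &&
	       delay_ps % DELAY_STEP_PS == 0;
}

/* Return true if the (tx, rx) pair is acceptable for this PHY mode. */
bool delays_valid(enum phy_mode mode, int tx_ps, int rx_ps)
{
	/* Assumption: non-RGMII PHYs must keep both chains at the default 0. */
	if (mode != PHY_MODE_RGMII)
		return tx_ps == 0 && rx_ps == 0;

	return delay_ps_valid(tx_ps, TX_DELAY_MAX_PS) &&
	       delay_ps_valid(rx_ps, RX_DELAY_MAX_PS);
}
```

Keeping the maximum as a parameter of delay_ps_valid() leaves room for an SoC with a smaller range.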
* [PATCH net-next v2 02/15] clk: sunxi-ng: r40: export a regmap to access the GMAC register
From: Chen-Yu Tsai @ 2018-05-01 16:12 UTC (permalink / raw)
To: Maxime Ripard, Michael Turquette, Stephen Boyd,
Giuseppe Cavallaro, Rob Herring, Mark Rutland, Mark Brown
Cc: devicetree, netdev, Chen-Yu Tsai, Corentin Labbe, linux-clk,
linux-arm-kernel, Icenowy Zheng
In-Reply-To: <20180501161227.2110-1-wens@csie.org>
From: Icenowy Zheng <icenowy@aosc.io>
The GMAC configuration register, which sits in the syscon block on
A64/A83T/H3/H5, is located in the CCU on the R40 SoC.
Export a regmap of the CCU.
Both read and write access through this regmap are restricted to the
GMAC register.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
drivers/clk/sunxi-ng/ccu-sun8i-r40.c | 33 ++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/drivers/clk/sunxi-ng/ccu-sun8i-r40.c b/drivers/clk/sunxi-ng/ccu-sun8i-r40.c
index c3aa839a453d..65ba6455feb7 100644
--- a/drivers/clk/sunxi-ng/ccu-sun8i-r40.c
+++ b/drivers/clk/sunxi-ng/ccu-sun8i-r40.c
@@ -1251,9 +1251,37 @@ static struct ccu_mux_nb sun8i_r40_cpu_nb = {
.bypass_index = 1, /* index of 24 MHz oscillator */
};
+/*
+ * Add a regmap for the GMAC driver (dwmac-sun8i) to access the
+ * GMAC configuration register.
+ * Only this register is allowed to be written, in order to
+ * prevent overriding critical clock configuration.
+ */
+
+#define SUN8I_R40_GMAC_CFG_REG 0x164
+static bool sun8i_r40_ccu_regmap_accessible_reg(struct device *dev,
+ unsigned int reg)
+{
+ if (reg == SUN8I_R40_GMAC_CFG_REG)
+ return true;
+ return false;
+}
+
+static struct regmap_config sun8i_r40_ccu_regmap_config = {
+ .reg_bits = 32,
+ .val_bits = 32,
+ .reg_stride = 4,
+ .max_register = 0x320, /* PLL_LOCK_CTRL_REG */
+
+ /* other devices have no business accessing other registers */
+ .readable_reg = sun8i_r40_ccu_regmap_accessible_reg,
+ .writeable_reg = sun8i_r40_ccu_regmap_accessible_reg,
+};
+
static int sun8i_r40_ccu_probe(struct platform_device *pdev)
{
struct resource *res;
+ struct regmap *regmap;
void __iomem *reg;
u32 val;
int ret;
@@ -1278,6 +1306,11 @@ static int sun8i_r40_ccu_probe(struct platform_device *pdev)
val &= ~GENMASK(25, 20);
writel(val, reg + SUN8I_R40_USB_CLK_REG);
+ regmap = devm_regmap_init_mmio(&pdev->dev, reg,
+ &sun8i_r40_ccu_regmap_config);
+ if (IS_ERR(regmap))
+ return PTR_ERR(regmap);
+
ret = sunxi_ccu_probe(pdev->dev.of_node, reg, &sun8i_r40_ccu_desc);
if (ret)
return ret;
--
2.17.0
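The effect of the accessibility callback in the diff above can be modelled in userspace. The sketch below is not regmap code: struct fake_ccu and the filtered_read()/filtered_write() helpers are hypothetical, and only the register offset, stride and maximum are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define GMAC_CFG_REG 0x164u	/* SUN8I_R40_GMAC_CFG_REG in the patch */
#define MAX_REGISTER 0x320u	/* .max_register in the patch */
#define REG_STRIDE   4u		/* .reg_stride in the patch */

/* Hypothetical stand-in for the CCU register file. */
struct fake_ccu {
	uint32_t regs[MAX_REGISTER / REG_STRIDE + 1];
};

/* Mirrors sun8i_r40_ccu_regmap_accessible_reg(): one register only. */
bool accessible_reg(unsigned int reg)
{
	return reg == GMAC_CFG_REG;
}

/* Write through the filter; returns 0 on success, -1 if refused. */
int filtered_write(struct fake_ccu *ccu, unsigned int reg, uint32_t val)
{
	if (reg > MAX_REGISTER || reg % REG_STRIDE || !accessible_reg(reg))
		return -1;
	ccu->regs[reg / REG_STRIDE] = val;
	return 0;
}

/* Read through the same filter; returns 0 on success, -1 if refused. */
int filtered_read(const struct fake_ccu *ccu, unsigned int reg, uint32_t *val)
{
	if (reg > MAX_REGISTER || reg % REG_STRIDE || !accessible_reg(reg))
		return -1;
	*val = ccu->regs[reg / REG_STRIDE];
	return 0;
}
```

Because the same predicate backs both .readable_reg and .writeable_reg in the patch, the model refuses reads of other registers as well as writes.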
* [PATCH net-next v2 01/15] clk: sunxi-ng: r40: rewrite init code to a platform driver
From: Chen-Yu Tsai @ 2018-05-01 16:12 UTC (permalink / raw)
To: Maxime Ripard, Michael Turquette, Stephen Boyd,
Giuseppe Cavallaro, Rob Herring, Mark Rutland, Mark Brown
Cc: devicetree, netdev, Chen-Yu Tsai, Corentin Labbe, linux-clk,
linux-arm-kernel, Icenowy Zheng
In-Reply-To: <20180501161227.2110-1-wens@csie.org>
From: Icenowy Zheng <icenowy@aosc.io>
As we need to register a regmap on the R40 CCU, there needs to be a
device structure bound to the CCU device node.
Rewrite the R40 CCU driver's initialization code as a proper platform
driver, so that a platform device is bound to it.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
drivers/clk/sunxi-ng/ccu-sun8i-r40.c | 39 ++++++++++++++++++++--------
1 file changed, 28 insertions(+), 11 deletions(-)
diff --git a/drivers/clk/sunxi-ng/ccu-sun8i-r40.c b/drivers/clk/sunxi-ng/ccu-sun8i-r40.c
index 933f2e68f42a..c3aa839a453d 100644
--- a/drivers/clk/sunxi-ng/ccu-sun8i-r40.c
+++ b/drivers/clk/sunxi-ng/ccu-sun8i-r40.c
@@ -12,7 +12,8 @@
*/
#include <linux/clk-provider.h>
-#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
#include "ccu_common.h"
#include "ccu_reset.h"
@@ -1250,17 +1251,17 @@ static struct ccu_mux_nb sun8i_r40_cpu_nb = {
.bypass_index = 1, /* index of 24 MHz oscillator */
};
-static void __init sun8i_r40_ccu_setup(struct device_node *node)
+static int sun8i_r40_ccu_probe(struct platform_device *pdev)
{
+ struct resource *res;
void __iomem *reg;
u32 val;
+ int ret;
- reg = of_io_request_and_map(node, 0, of_node_full_name(node));
- if (IS_ERR(reg)) {
- pr_err("%s: Could not map the clock registers\n",
- of_node_full_name(node));
- return;
- }
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ reg = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(reg))
+ return PTR_ERR(reg);
/* Force the PLL-Audio-1x divider to 4 */
val = readl(reg + SUN8I_R40_PLL_AUDIO_REG);
@@ -1277,7 +1278,9 @@ static void __init sun8i_r40_ccu_setup(struct device_node *node)
val &= ~GENMASK(25, 20);
writel(val, reg + SUN8I_R40_USB_CLK_REG);
- sunxi_ccu_probe(node, reg, &sun8i_r40_ccu_desc);
+ ret = sunxi_ccu_probe(pdev->dev.of_node, reg, &sun8i_r40_ccu_desc);
+ if (ret)
+ return ret;
/* Gate then ungate PLL CPU after any rate changes */
ccu_pll_notifier_register(&sun8i_r40_pll_cpu_nb);
@@ -1285,6 +1288,20 @@ static void __init sun8i_r40_ccu_setup(struct device_node *node)
/* Reparent CPU during PLL CPU rate changes */
ccu_mux_notifier_register(pll_cpu_clk.common.hw.clk,
&sun8i_r40_cpu_nb);
+
+ return 0;
}
-CLK_OF_DECLARE(sun8i_r40_ccu, "allwinner,sun8i-r40-ccu",
- sun8i_r40_ccu_setup);
+
+static const struct of_device_id sun8i_r40_ccu_ids[] = {
+ { .compatible = "allwinner,sun8i-r40-ccu" },
+ { }
+};
+
+static struct platform_driver sun8i_r40_ccu_driver = {
+ .probe = sun8i_r40_ccu_probe,
+ .driver = {
+ .name = "sun8i-r40-ccu",
+ .of_match_table = sun8i_r40_ccu_ids,
+ },
+};
+builtin_platform_driver(sun8i_r40_ccu_driver);
--
2.17.0
* [PATCH net-next v2 00/15] ARM: sun8i: r40: Add Ethernet support
From: Chen-Yu Tsai @ 2018-05-01 16:12 UTC (permalink / raw)
To: Maxime Ripard, Michael Turquette, Stephen Boyd,
Giuseppe Cavallaro, Rob Herring, Mark Rutland, Mark Brown
Cc: devicetree, netdev, Chen-Yu Tsai, Corentin Labbe, linux-clk,
linux-arm-kernel, Icenowy Zheng
Hi everyone,
This is v2 of my R40 Ethernet support series.
Changes since v1:
- Default to fetching regmap from device pointed to by syscon phandle,
and falling back to syscon API if that fails.
- Dropped .syscon_from_dev field in device data as a result of the
previous change.
- Added a large comment block explaining the first change.
- Simplified description of syscon property in sun8i-dwmac binding.
- Regmap now only exposes the EMAC/GMAC register, but retains the
offset within its address space.
- Added patches for A64, which reuse the same sun8i-dwmac changes.
This series adds support for the DWMAC based Ethernet controller found
on the Allwinner R40 SoC. The controller is either a DWMAC clone or
DWMAC core with its registers rearranged. This is already supported by
the dwmac-sun8i driver. The glue layer control register, unlike on other
sun8i family SoCs, is not in the system controller region, but in the
clock control unit, as on the older A20 and A31 SoCs.
While we reuse the bindings for dwmac-sun8i using a syscon phandle
reference, we need some custom plumbing for the clock driver to export
a regmap that only allows access to the GMAC register to the dwmac-sun8i
driver. An alternative would be to allow drivers to register custom
syscon devices with their own regmap and locking.
Patch 1 converts the CLK_OF_DECLARE style clock driver to a platform
one, so the regmap introduced later has a struct device to tie to.
Patch 2 adds a regmap that is exported by the clock driver for the
dwmac-sun8i driver to use.
Patches 3, 4, and 5 clean up the dwmac-sun8i binding.
Patch 6 adds device tree binding for Allwinner R40's Ethernet
controller.
Patch 7 converts regmap access of the syscon region in the dwmac-sun8i
driver to regmap_field, in anticipation of different field widths on
the R40.
Patch 8 introduces custom plumbing in the dwmac-sun8i driver to fetch
a regmap from another device, by looking up said device via a phandle,
then getting the regmap associated with that device.
Patch 9 adds support for different or absent TX/RX delay chain ranges
to the dwmac-sun8i driver.
Patch 10 adds support for the R40's ethernet controller.
Patch 11 cleans up the Bananapi M2 Ultra device tree file.
Patch 12 adds a GMAC device node and RGMII mode pinmux setting for the
R40.
Patch 13 enables Ethernet on the Bananapi M2 Ultra.
Patches 14 and 15 are for the A64. They convert the existing syscon
device to an SRAM controller device that exports a regmap. The needed
driver changes are in patch 14, and the device tree changes are in
patch 15.
Please have a look.
Regards
ChenYu
Chen-Yu Tsai (11):
dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptions
dt-bindings: net: dwmac-sun8i: Sort syscon compatibles by alphabetical
order
dt-bindings: net: dwmac-sun8i: simplify description of syscon property
dt-bindings: net: dwmac-sun8i: Add binding for GMAC on Allwinner R40
SoC
net: stmmac: dwmac-sun8i: Use regmap_field for syscon register access
net: stmmac: dwmac-sun8i: Allow getting syscon regmap from external
device
net: stmmac: dwmac-sun8i: Support different ranges for TX/RX delay
chains
net: stmmac: dwmac-sun8i: Add support for GMAC on Allwinner R40 SoC
ARM: dts: sun8i: r40: bananapi-m2-ultra: Sort device node dereferences
ARM: dts: sun8i: r40: Add device node and RGMII pinmux node for GMAC
ARM: dts: sun8i: r40: bananapi-m2-ultra: Enable GMAC ethernet
controller
Icenowy Zheng (4):
clk: sunxi-ng: r40: rewrite init code to a platform driver
clk: sunxi-ng: r40: export a regmap to access the GMAC register
soc: sunxi: export a regmap for EMAC clock reg on A64
arm64: dts: allwinner: a64: add SRAM controller device tree node
.../devicetree/bindings/net/dwmac-sun8i.txt | 21 +--
.../boot/dts/sun8i-r40-bananapi-m2-ultra.dts | 99 ++++++++-----
arch/arm/boot/dts/sun8i-r40.dtsi | 34 +++++
arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 23 ++-
drivers/clk/sunxi-ng/ccu-sun8i-r40.c | 72 +++++++--
.../net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 139 +++++++++++++++---
drivers/soc/sunxi/sunxi_sram.c | 57 ++++++-
7 files changed, 364 insertions(+), 81 deletions(-)
--
2.17.0
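Patch 7's motivation, that the same logical field may have a different width on the R40, can be illustrated with a userspace model of a register field described by its bit range. This is only a sketch of the idea behind regmap_field, not the kernel API; struct reg_field here is a local type and the bit positions used with it are invented.

```c
#include <stdint.h>

/* Hypothetical field descriptor: inclusive bit range in a 32-bit register. */
struct reg_field {
	unsigned int lsb;
	unsigned int msb;
};

/* Build the mask covering bits lsb..msb. */
uint32_t field_mask(struct reg_field f)
{
	unsigned int width = f.msb - f.lsb + 1;
	uint32_t ones = width >= 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;

	return ones << f.lsb;
}

/* Largest value that fits in the field. */
uint32_t field_max(struct reg_field f)
{
	return field_mask(f) >> f.lsb;
}

/* Return reg with the field replaced by val; other bits are untouched. */
uint32_t field_write(uint32_t reg, struct reg_field f, uint32_t val)
{
	uint32_t mask = field_mask(f);

	return (reg & ~mask) | ((val << f.lsb) & mask);
}

/* Extract the field value from reg. */
uint32_t field_read(uint32_t reg, struct reg_field f)
{
	return (reg & field_mask(f)) >> f.lsb;
}
```

With descriptors like these, a per-SoC table can supply a narrower field for one chip while the code that programs the delay stays the same.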
* Re: kTLS in combination with mlx4 is very unstable
From: Dave Watson @ 2018-05-01 16:09 UTC (permalink / raw)
To: Andre Tomt; +Cc: netdev, borisp, Aviad Yehezkel
In-Reply-To: <20180424170100.GA40104@fidjisimo-mbp.dhcp.thefacebook.com>
Hi Andre,
On 04/24/18 10:01 AM, Dave Watson wrote:
> On 04/22/18 11:21 PM, Andre Tomt wrote:
> > The kernel seems to get increasingly unstable as I load it up with client
> > connections. At about 9Gbps and 700 connections, it is okay at least for a
> > while - it might run fine for say 45 minutes. Once it gets to 20 - 30Gbps,
> > the kernel will usually start spewing OOPSes within minutes and the traffic
> > drops.
> >
> > Some bad interaction between mlx4 and kTLS?
I tried to repro, but wasn't able to - of course I don't have an mlx4
test setup. If I manually add a tls_write_space call after
do_tcp_sendpages, I get a similar stack though.
Something like the following should work, can you test? Thanks
diff --git a/include/net/tls.h b/include/net/tls.h
index 8c56809..ee78f33 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -187,6 +187,7 @@ struct tls_context {
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
unsigned long flags;
+ bool in_tcp_sendpages;
u16 pending_open_record_frags;
int (*push_pending_record)(struct sock *sk, int flags);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 3aafb87..095af65 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -114,6 +114,7 @@ int tls_push_sg(struct sock *sk,
size = sg->length - offset;
offset += sg->offset;
+ ctx->in_tcp_sendpages = 1;
while (1) {
if (sg_is_last(sg))
sendpage_flags = flags;
@@ -148,6 +149,8 @@ int tls_push_sg(struct sock *sk,
}
clear_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
+ ctx->in_tcp_sendpages = 0;
+ ctx->sk_write_space(sk);
return 0;
}
@@ -217,6 +220,9 @@ static void tls_write_space(struct sock *sk)
{
struct tls_context *ctx = tls_get_ctx(sk);
+ if (ctx->in_tcp_sendpages)
+ return;
+
if (!sk->sk_write_pending && tls_is_pending_closed_record(ctx)) {
gfp_t sk_allocation = sk->sk_allocation;
int rc;
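The guard added by this test patch can be modelled in userspace to show why the write-space callback no longer re-enters the push path. Everything below is a simplified sketch: struct mini_ctx and both functions are hypothetical, and unlike the patch (which calls the saved ctx->sk_write_space) the model reuses one callback for the final wakeup.

```c
#include <stdbool.h>

/* Hypothetical miniature of the TLS context; not the kernel structure. */
struct mini_ctx {
	bool in_sendpages;	/* plays the role of ctx->in_tcp_sendpages */
	int pending;		/* records still waiting to be pushed */
	int nested_pushes;	/* times push ran while already pushing */
	int deferred_wakeups;	/* callbacks skipped by the guard */
};

void mini_push(struct mini_ctx *ctx);

/* Write-space callback: re-enters the push path unless the guard is set. */
void mini_write_space(struct mini_ctx *ctx)
{
	if (ctx->in_sendpages) {
		ctx->deferred_wakeups++;
		return;
	}
	if (ctx->pending > 0)
		mini_push(ctx);
}

/* Push path: the transport may invoke the callback while we are sending. */
void mini_push(struct mini_ctx *ctx)
{
	if (ctx->in_sendpages)
		ctx->nested_pushes++;	/* would indicate re-entry */

	ctx->in_sendpages = true;
	while (ctx->pending > 0) {
		ctx->pending--;
		mini_write_space(ctx);	/* simulated callback from the send */
	}
	ctx->in_sendpages = false;

	/* Deliver one wakeup after the guard has been dropped. */
	mini_write_space(ctx);
}
```

Removing the early return in mini_write_space() makes the callback call mini_push() from inside its own loop, which is the nesting the flag is meant to prevent.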
* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: Eric Dumazet @ 2018-05-01 16:06 UTC (permalink / raw)
To: Dave Taht, Cong Wang
Cc: Toke Høiland-Jørgensen, Linux Kernel Network Developers,
Cake List
In-Reply-To: <CAA93jw6F+c-QRXe+MA2QmRkwiKEBqFgOFKTvWGfO7FvCQ5tFvw@mail.gmail.com>
On 04/30/2018 02:27 PM, Dave Taht wrote:
> I actually have a tc - bpf based ack filter, during the development of
> cake's ack-thinner, that I should submit one of these days. It
> proved to be of limited use.
>
> Probably the biggest mistake we made is by calling this cake feature a
> filter. It isn't.
>
> Maybe we should have called it a "thinner" or something like that? In
> order to properly "thin" or "reduce" an ack stream
> you have to have a queue to look at and some related state. TC filters
> do not operate on queues, qdiscs do. Thus the "ack-filter" here is
> deeply embedded into cake's flow isolation and queue structures.
A feature eating packets _is_ a filter.
Given that a qdisc only sees one direction, I really do not get why ack-filter
is so desirable in a packet scheduler.
You have not provided any numbers to show how useful it is to maintain this
code (probably still broken btw, considering it is changing some skb attributes).
On wifi (or any half duplex medium), you might gain something by carefully
sending ACKs less often, but ultimately this should be done by the TCP stack,
in a well controlled environment [1], instead of middle-boxes happily playing/breaking
some Internet standards.
[1] TCP stack has the estimations of RTT, RWIN, throughput, and might be able to
avoid flooding the network with acks under some conditions.
Say RTT is 100ms, and we receive 1 packet every 100 usec (no GRO aggregation).
Maybe we do not really _need_ to send 5000 ACKs per second
(or even 10,000 ACKs per second if a hole needs a repair).
Also on wifi, the queue builds in the driver queues anyway, not in the qdisc.
So ACK filtering, if _really_ successful, would need to be modularized.
Please split Cake into a patch series.
Presumably putting the ack-filter on a patch of its own.
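The figures in footnote [1] can be reproduced with a few lines of arithmetic. The reading that 5000 corresponds to one ACK per two received packets and 10,000 to one ACK per packet is an assumption on my part; the function names are illustrative.

```c
#include <stdint.h>

/* Packets received per second for a given inter-arrival time in usec. */
uint64_t packets_per_second(uint64_t interarrival_us)
{
	return UINT64_C(1000000) / interarrival_us;
}

/* ACKs per second when one ACK covers `packets_per_ack` packets. */
uint64_t acks_per_second(uint64_t interarrival_us, uint64_t packets_per_ack)
{
	return packets_per_second(interarrival_us) / packets_per_ack;
}

/* Packets that arrive during one round trip. */
uint64_t packets_per_rtt(uint64_t rtt_us, uint64_t interarrival_us)
{
	return rtt_us / interarrival_us;
}
```

At one packet per 100 usec and a 100 ms RTT, about a thousand packets arrive per round trip, which is the headroom the footnote alludes to.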
* Re: [PATCH v2 0/2] net: stmmac: dwmac-meson: 100M phy mode support for AXG SoC
From: David Miller @ 2018-05-01 15:30 UTC (permalink / raw)
To: yixun.lan
Cc: netdev, khilman, carlo, robh, jbrunet, martin.blumenstingl,
linux-amlogic, linux-arm-kernel, linux-kernel, devicetree
In-Reply-To: <20180428102111.18384-1-yixun.lan@amlogic.com>
From: Yixun Lan <yixun.lan@amlogic.com>
Date: Sat, 28 Apr 2018 10:21:09 +0000
> Due to the dwmac glue layer register changed, we need to
> introduce a new compatible name for the Meson-AXG SoC
> to support for the RMII 100M ethernet PHY.
>
> Change since v1 at [1]:
> - implement set_phy_mode() for each SoC
>
> [1] https://lkml.kernel.org/r/20180426160508.29380-1-yixun.lan@amlogic.com
Series applied, thank you.
* Re: [PATCH] vhost: make msg padding explicit
From: David Miller @ 2018-05-01 15:28 UTC (permalink / raw)
To: mst; +Cc: linux-kernel, kevin, jasowang, kvm, virtualization, netdev
In-Reply-To: <1524844881-178524-1-git-send-email-mst@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 27 Apr 2018 19:02:05 +0300
> There's a 32 bit hole just after type. It's best to
> give it a name, this way compiler is forced to initialize
> it with rest of the structure.
>
> Reported-by: Kevin Easton <kevin@guarana.org>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael, will you be sending this directly to Linus or would you like
me to apply it to net or net-next?
Thanks.
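For readers following along, the kind of hole the quoted description refers to can be shown with two hypothetical layouts. The structure and member names below are invented for illustration and are not the vhost definitions; the size of the implicit hole depends on the ABI's alignment of 64-bit members.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the compiler may insert unnamed padding after
 * `type` so that the 64-bit member is suitably aligned. */
struct msg_implicit {
	int32_t type;
	uint64_t payload;
};

/* Same layout with the hole given a name, so initializers cover it. */
struct msg_explicit {
	int32_t type;
	uint32_t padding;
	uint64_t payload;
};

/* Bytes of unnamed padding between `type` and `payload`. */
size_t implicit_hole(void)
{
	return offsetof(struct msg_implicit, payload) - sizeof(int32_t);
}

/* Bytes of unnamed padding left once the hole has a name. */
size_t explicit_hole(void)
{
	return offsetof(struct msg_explicit, payload) -
	       offsetof(struct msg_explicit, padding) - sizeof(uint32_t);
}
```

A named member is covered by designated initializers and struct assignment, whereas the contents of unnamed padding are not guaranteed to be initialized.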
* Re: [PATCH net-next 1/2] mlxsw: spectrum_router: Return an error for non-default FIB rules
From: Ido Schimmel @ 2018-05-01 15:19 UTC (permalink / raw)
To: David Ahern; +Cc: Ido Schimmel, netdev, davem, jiri, mlxsw
In-Reply-To: <320c79af-3de6-f012-75a2-e5a7effde9f6@gmail.com>
On Tue, May 01, 2018 at 09:16:23AM -0600, David Ahern wrote:
> On 5/1/18 2:16 AM, Ido Schimmel wrote:
> > Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
> > fib_nl_newrule") it is possible to forbid the installation of
> > unsupported FIB rules.
> >
> > Have mlxsw return an error for non-default FIB rules in addition to the
> > existing extack message.
> >
> > Example:
> > # ip rule add from 198.51.100.1 table 10
> > Error: mlxsw_spectrum: FIB rules not supported.
> >
> > Note that offload is only aborted when non-default FIB rules are already
> > installed and merely replayed during module initialization.
> >
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > ---
> > drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > index 8e4edb634b11..baea97560029 100644
> > --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > @@ -5899,7 +5899,7 @@ static int mlxsw_sp_router_fib_rule_event(unsigned long event,
> > }
> >
> > if (err < 0)
> > - NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported. Aborting offload");
> > + NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported");
> >
> > return err;
>
> shouldn't mlxsw_sp_router_fib_rule_event return -EOPNOTSUPP instead of
> -1 (EPERM)?
The -1 wasn't visible until now so it didn't matter. Will change to
-EOPNOTSUPP in v2. Thanks
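To make the point about -1 concrete: the patch under discussion passes the handler's return value through notifier_from_errno(), so a bare -1 surfaces to the caller as -EPERM. The sketch below re-states the notifier helpers in userspace as I recall them from include/linux/notifier.h; treat the constants and helper bodies as assumptions and check them against your tree. visible_errno() is a made-up name.

```c
#include <errno.h>

/* Notifier return codes, recalled from include/linux/notifier.h. */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Userspace copy of the encoding helper used by the patch. */
int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Inverse helper: recover the (negative) errno from a notifier value. */
int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* What a caller ends up seeing for a given handler return value. */
int visible_errno(int handler_ret)
{
	return notifier_to_errno(notifier_from_errno(handler_ret));
}
```

On Linux EPERM is 1, so visible_errno(-1) comes back as -EPERM, while returning -EOPNOTSUPP from the handler round-trips unchanged.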
* Re: [PATCH net-next 1/2] mlxsw: spectrum_router: Return an error for non-default FIB rules
From: David Ahern @ 2018-05-01 15:16 UTC (permalink / raw)
To: Ido Schimmel, netdev; +Cc: davem, jiri, mlxsw
In-Reply-To: <20180501081639.29162-2-idosch@mellanox.com>
On 5/1/18 2:16 AM, Ido Schimmel wrote:
> Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
> fib_nl_newrule") it is possible to forbid the installation of
> unsupported FIB rules.
>
> Have mlxsw return an error for non-default FIB rules in addition to the
> existing extack message.
>
> Example:
> # ip rule add from 198.51.100.1 table 10
> Error: mlxsw_spectrum: FIB rules not supported.
>
> Note that offload is only aborted when non-default FIB rules are already
> installed and merely replayed during module initialization.
>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> index 8e4edb634b11..baea97560029 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> @@ -5899,7 +5899,7 @@ static int mlxsw_sp_router_fib_rule_event(unsigned long event,
> }
>
> if (err < 0)
> - NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported. Aborting offload");
> + NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported");
>
> return err;
shouldn't mlxsw_sp_router_fib_rule_event return -EOPNOTSUPP instead of
-1 (EPERM)?
> }
> @@ -5926,8 +5926,8 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
> case FIB_EVENT_RULE_DEL:
> err = mlxsw_sp_router_fib_rule_event(event, info,
> router->mlxsw_sp);
> - if (!err)
> - return NOTIFY_DONE;
> + if (!err || info->extack)
> + return notifier_from_errno(err);
> }
>
> fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
>
* Re: [RFC/PATCH] net: ethernet: nixge: Use of_get_mac_address()
From: Rob Herring @ 2018-05-01 15:05 UTC (permalink / raw)
To: Moritz Fischer; +Cc: linux-kernel, devicetree, netdev, davem, mark.rutland
In-Reply-To: <20180426220401.bad53hxnkxwvgjot@derp-derp.lan>
On Thu, Apr 26, 2018 at 03:04:01PM -0700, Moritz Fischer wrote:
> On Thu, Apr 26, 2018 at 02:57:42PM -0700, Moritz Fischer wrote:
> > Make nixge driver work with 'mac-address' property instead of
> > 'address' property. There are currently no in-tree users and
> > the only users of this driver are devices that use overlays
> > we control to instantiate the device together with the corresponding
> > FPGA images.
> >
> > Signed-off-by: Moritz Fischer <mdf@kernel.org>
> > ---
> >
> > Hi David, Rob,
> >
> > with Mike's change that enable the generic 'mac-address'
> > binding that I barely missed with the submission of this
> > driver I was wondering if we can still change the binding.
> >
> > I'm aware that this generally is a nonono case, since the binding
> > is considered API, but since there are no users outside of our
> > devicetree overlays that we ship with our devices I thought I'd ask.
Fine by me. It really comes down to whether there are any users that
would be impacted.
Rob
* Re: Request for stable 4.14.x inclusion: net: don't call update_pmtu unconditionally
From: Greg KH @ 2018-05-01 15:04 UTC (permalink / raw)
To: Thomas Deutschmann; +Cc: Eddie Chapman, stable, davem, nicolas.dichtel, netdev
In-Reply-To: <1ae8845f-6106-29e1-ceec-02eff35beed9@gentoo.org>
On Tue, May 01, 2018 at 12:15:37AM +0200, Thomas Deutschmann wrote:
> Hi,
>
> On 2018-04-30 20:22, Greg KH wrote:
> > The geneve hunk doesn't apply at all to the 4.14.y tree, so I think
> > someone has a messed up tree somewhere...
> >
> > I'll go look into this now.
>
> Mh?
>
> > $ git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> > $ cd linux-stable
> > $ git checkout v4.14.38
> > $ git cherry-pick 52a589d51f1008f62569bf89e95b26221ee76690
>
> Works for me... then I cherry-pick
> f15ca723c1ebe6c1a06bc95fda6b62cd87b44559 on top, adjust
> "net/ipv6/ip6_tunnel.c" like shown in my previous mail and everything is
> fine for me...
Ah crap, I missed the dependency here as well, it was a long day
yesterday...
I'll drop this and try it again for the next release.
greg k-h
* Re: [PATCH iproute2-master] iproute: Parse last nexthop in a multipath route
From: David Ahern @ 2018-05-01 14:59 UTC (permalink / raw)
To: Ido Schimmel, netdev; +Cc: stephen, mlxsw
In-Reply-To: <20180501131635.14981-1-idosch@mellanox.com>
On 5/1/18 7:16 AM, Ido Schimmel wrote:
> Continue parsing a multipath payload as long as another nexthop can fit
> in the payload.
>
> # ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1
>
> Before:
> # ip route show 192.0.2.0/24
> 192.0.2.0/24
> nexthop dev dummy0 weight 1
>
> After:
> # ip route show 192.0.2.0/24
> 192.0.2.0/24
> nexthop dev dummy0 weight 1
> nexthop dev dummy1 weight 1
>
> Fixes: f48e14880a0e ("iproute: refactor multipath print")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
> ip/iproute.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Acked-by: David Ahern <dsahern@gmail.com>
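The one-line diff is not quoted here, so the following is only a guess at the shape of the bug, based on the phrase "as long as another nexthop can fit in the payload": a loop bound that stops one record early. struct mini_nh and count_nexthops() are hypothetical and much simpler than the real rtnexthop walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-layout record standing in for a nexthop entry. */
struct mini_nh {
	unsigned short len;	/* total length of this record in bytes */
	unsigned short ifindex;	/* made-up payload */
};

/*
 * Count records in a payload of `len` bytes.
 * strict == false: continue while one more header still fits (>=).
 * strict == true:  continue only while more than a header remains (>),
 *                  which drops a final record that fits exactly.
 */
size_t count_nexthops(const struct mini_nh *nh, size_t len, bool strict)
{
	size_t n = 0;

	while (strict ? len > sizeof(*nh) : len >= sizeof(*nh)) {
		if (nh->len < sizeof(*nh) || nh->len > len)
			break;	/* malformed or truncated record */
		len -= nh->len;
		nh = (const struct mini_nh *)((const char *)nh + nh->len);
		n++;
	}
	return n;
}
```

With two back-to-back records the strict variant reports one nexthop and the non-strict variant reports two, matching the Before/After output in the quoted commit message.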
* Re: [PATCH V10 net-next 00/14] TLS offload, netdev & MLX5 support
From: David Miller @ 2018-05-01 14:23 UTC (permalink / raw)
To: borisp; +Cc: netdev, saeedm, davejwatson, ktkhai, sergei.shtylyov
In-Reply-To: <1525072583-138506-1-git-send-email-borisp@mellanox.com>
From: Boris Pismenny <borisp@mellanox.com>
Date: Mon, 30 Apr 2018 10:16:09 +0300
> The following series provides TLS TX inline crypto offload.
Series applied, assuming the build is successful this will be
pushed out to net-next shortly.
Thank you.
* Re: [PATCH net-next 0/2 v5] netns: uevent filtering
From: David Miller @ 2018-05-01 14:23 UTC (permalink / raw)
To: ebiederm
Cc: christian.brauner, netdev, linux-kernel, avagin, ktkhai, serge,
gregkh
In-Reply-To: <87fu3cbsdw.fsf@xmission.com>
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 30 Apr 2018 10:55:55 -0500
> Christian Brauner <christian.brauner@ubuntu.com> writes:
>
>> Hey everyone,
>>
>> This is the new approach to uevent filtering as discussed (see the
>> threads in [1], [2], and [3]). It only contains *non-functional
>> changes*.
...
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Series applied, thanks everyone.
* Re: [RFC v2 bpf-next 0/9] bpf: Add helper to do FIB lookups
From: David Miller @ 2018-05-01 14:20 UTC (permalink / raw)
To: dsahern; +Cc: netdev, borkmann, ast, shm, roopa, brouer, toke, john.fastabend
In-Reply-To: <20180429180752.15428-1-dsahern@gmail.com>
From: David Ahern <dsahern@gmail.com>
Date: Sun, 29 Apr 2018 11:07:43 -0700
> Provide a helper for doing a FIB and neighbor lookups in the kernel
> tables from an XDP program. The helper provides a fastpath for forwarding
> packets. If the packet is a local delivery or for any reason is not a
> simple lookup and forward, the packet is expected to continue up the stack
> for full processing.
>
> Patches 1-6 do some more refactoring to IPv6 with the end goal of
> extracting a FIB lookup function that aligns with fib_lookup for IPv4,
> basically returning a fib6_info without creating a dst based entry.
>
> Patch 7 adds lookup functions to the ipv6 stub. These are needed since
> bpf is built into the kernel and ipv6 may not be built or loaded.
>
> Patch 8 adds the bpf helper and 9 adds a sample program.
>
> v2
> - fixed use of foward helper from cls_act as noted by Daniel
> - in patch 1 rename fib6_lookup_1 as well for consistency
I've reviewed this and generally I agree with the semantic choices
wrt. resolution.
We really can't do neigh resolution without an SKB, so at least in
the xdp case we must push the packet up into the full stack path.
I guess we could do the neigh resolve in the cls_bpf case, but I
wonder how helpful that would be.
* [PATCH net-next 2/2] tc-testing: Updated csum action tests batch create w/wo cookies.
From: Craig Dillabaugh @ 2018-05-01 14:17 UTC (permalink / raw)
To: davem; +Cc: netdev, jhs, xiyou.wangcong, Craig Dillabaugh
In-Reply-To: <1525184264-9436-1-git-send-email-cdillaba@mojatatu.com>
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
---
.../tc-testing/tc-tests/actions/csum.json | 74 +++++++++++++++++++++-
1 file changed, 72 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
index 93cf8fe..3a2f51f 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
@@ -398,13 +398,83 @@
255
]
],
- "cmdUnderTest": "for i in `seq 1 32`; do cmd=\"action csum tcp continue index $i \"; args=\"$args$cmd\"; done && $TC actions add $args",
- "expExitCode": "255",
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+ "expExitCode": "0",
"verifyCmd": "$TC actions ls action csum",
"matchPattern": "^[ \t]+index [0-9]* ref",
"matchCount": "32",
"teardown": [
"$TC actions flush action csum"
]
+ },
+ {
+ "id": "b4e9",
+ "name": "Delete batch of 32 csum actions",
+ "category": [
+ "actions",
+ "csum"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action csum",
+ 0,
+ 1,
+ 255
+ ],
+ "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\""
+ ],
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action csum",
+ "matchPattern": "^[ \t]+index [0-9]+ ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "0015",
+ "name": "Add batch of 32 csum tcp actions with large cookies",
+ "category": [
+ "actions",
+ "csum"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action csum",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i cookie aaabbbcccdddeee \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions ls action csum",
+ "matchPattern": "^[ \t]+index [0-9]* ref",
+ "matchCount": "32",
+ "teardown": [
+ "$TC actions flush action csum"
+ ]
+ },
+ {
+ "id": "989e",
+ "name": "Delete batch of 32 csum actions with large cookies",
+ "category": [
+ "actions",
+ "csum"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action csum",
+ 0,
+ 1,
+ 255
+ ],
+ "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i cookie aaabbbcccdddeee \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\""
+ ],
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action csum",
+ "matchPattern": "^[ \t]+index [0-9]+ ref",
+ "matchCount": "0",
+ "teardown": []
}
]
--
1.9.1
* [PATCH net-next 1/2] net sched: Implemented get_fill_size routine for act_csum.
From: Craig Dillabaugh @ 2018-05-01 14:17 UTC (permalink / raw)
To: davem; +Cc: netdev, jhs, xiyou.wangcong, Craig Dillabaugh
In-Reply-To: <1525184264-9436-1-git-send-email-cdillaba@mojatatu.com>
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
---
net/sched/act_csum.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 7e28b2c..b85e088 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -648,6 +648,11 @@ static int tcf_csum_search(struct net *net, struct tc_action **a, u32 index,
return tcf_idr_search(tn, a, index);
}
+static size_t tcf_csum_get_fill_size(const struct tc_action *act)
+{
+ return nla_total_size(sizeof(struct tc_csum));
+}
+
static struct tc_action_ops act_csum_ops = {
.kind = "csum",
.type = TCA_ACT_CSUM,
@@ -658,6 +663,7 @@ static int tcf_csum_search(struct net *net, struct tc_action **a, u32 index,
.cleanup = tcf_csum_cleanup,
.walk = tcf_csum_walker,
.lookup = tcf_csum_search,
+ .get_fill_size = tcf_csum_get_fill_size,
.size = sizeof(struct tcf_csum),
};
--
1.9.1
* [PATCH net-next 0/2] Update csum tc action for batch operation.
From: Craig Dillabaugh @ 2018-05-01 14:17 UTC (permalink / raw)
To: davem; +Cc: netdev, jhs, xiyou.wangcong, Craig Dillabaugh
This patchset includes two patches: the first updates act_csum.c
to include the get_fill_size routine required for batch operation, and
the second adds updated TDC tests for the feature.
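For readers unfamiliar with the sizing involved: patch 1/2 returns nla_total_size(sizeof(struct tc_csum)), i.e. the attribute payload plus the netlink attribute header, rounded up to 4-byte alignment. The following is only a userspace restatement of that arithmetic, not the kernel code. The sketch_ names are made up for illustration.

```c
#include <stddef.h>

/* Netlink attributes are aligned to 4 bytes. */
#define SKETCH_NLA_ALIGNTO ((size_t)4)
/* struct nlattr holds two 16-bit fields (length, type): 4 bytes. */
#define SKETCH_NLA_HDRLEN  ((size_t)4)

/* Round len up to the next multiple of the attribute alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) & ~(SKETCH_NLA_ALIGNTO - 1);
}

/* Bytes needed for one attribute: header plus payload, padded. */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_nla_align(SKETCH_NLA_HDRLEN + payload);
}
```

Under these rules a 24-byte payload needs 28 bytes and a 1-byte payload needs 8, because the padding is applied after the header is added.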
Craig Dillabaugh (2):
net sched: Implemented get_fill_size routine for act_csum.
tc-testing: Updated csum action tests batch create w/wo cookies.
net/sched/act_csum.c | 6 ++
.../tc-testing/tc-tests/actions/csum.json | 74 +++++++++++++++++++++-
2 files changed, 78 insertions(+), 2 deletions(-)
--
1.9.1
* [PATCH V3 net-next 2/2] selftest: add test for TCP_INQ
From: Soheil Hassas Yeganeh @ 2018-05-01 14:11 UTC (permalink / raw)
To: davem, netdev; +Cc: ycheng, ncardwell, edumazet, willemb, Soheil Hassas Yeganeh
In-Reply-To: <20180501141128.208705-1-soheil.kdev@gmail.com>
From: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
---
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/tcp_inq.c | 189 ++++++++++++++++++++++++++
2 files changed, 191 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/net/tcp_inq.c
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index df9102ec7b7af..0a1821f8dfb18 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -9,7 +9,7 @@ TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh udpgso.sh
TEST_PROGS += udpgso_bench.sh
TEST_GEN_FILES = socket
TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
-TEST_GEN_FILES += tcp_mmap
+TEST_GEN_FILES += tcp_mmap tcp_inq
TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
TEST_GEN_PROGS += udpgso udpgso_bench_tx udpgso_bench_rx
@@ -18,3 +18,4 @@ include ../lib.mk
$(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma
$(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
+$(OUTPUT)/tcp_inq: LDFLAGS += -lpthread
diff --git a/tools/testing/selftests/net/tcp_inq.c b/tools/testing/selftests/net/tcp_inq.c
new file mode 100644
index 0000000000000..d044b29ddabcc
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_inq.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright 2018 Google Inc.
+ * Author: Soheil Hassas Yeganeh (soheil@google.com)
+ *
+ * Simple example on how to use TCP_INQ and TCP_CM_INQ.
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#define _GNU_SOURCE
+
+#include <error.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#ifndef TCP_INQ
+#define TCP_INQ 36
+#endif
+
+#ifndef TCP_CM_INQ
+#define TCP_CM_INQ TCP_INQ
+#endif
+
+#define BUF_SIZE 8192
+#define CMSG_SIZE 32
+
+static int family = AF_INET6;
+static socklen_t addr_len = sizeof(struct sockaddr_in6);
+static int port = 4974;
+
+static void setup_loopback_addr(int family, struct sockaddr_storage *sockaddr)
+{
+ struct sockaddr_in6 *addr6 = (void *) sockaddr;
+ struct sockaddr_in *addr4 = (void *) sockaddr;
+
+ switch (family) {
+ case PF_INET:
+ memset(addr4, 0, sizeof(*addr4));
+ addr4->sin_family = AF_INET;
+ addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ addr4->sin_port = htons(port);
+ break;
+ case PF_INET6:
+ memset(addr6, 0, sizeof(*addr6));
+ addr6->sin6_family = AF_INET6;
+ addr6->sin6_addr = in6addr_loopback;
+ addr6->sin6_port = htons(port);
+ break;
+ default:
+ error(1, 0, "illegal family");
+ }
+}
+
+void *start_server(void *arg)
+{
+ int server_fd = (int)(unsigned long)arg;
+ struct sockaddr_in addr;
+ socklen_t addrlen = sizeof(addr);
+ char *buf;
+ int fd;
+ int r;
+
+ buf = malloc(BUF_SIZE);
+
+ for (;;) {
+ fd = accept(server_fd, (struct sockaddr *)&addr, &addrlen);
+ if (fd == -1) {
+ perror("accept");
+ break;
+ }
+ do {
+ r = send(fd, buf, BUF_SIZE, 0);
+ } while (r < 0 && errno == EINTR);
+ if (r < 0)
+ perror("send");
+ if (r != BUF_SIZE)
+ fprintf(stderr, "can only send %d bytes\n", r);
+ /* TCP_INQ can overestimate in-queue by one byte if we send
+ * the FIN packet. Sleep for 1 second, so that the client
+ * likely invoked recvmsg().
+ */
+ sleep(1);
+ close(fd);
+ }
+
+ free(buf);
+ close(server_fd);
+ pthread_exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+ struct sockaddr_storage listen_addr, addr;
+ int c, one = 1, inq = -1;
+ pthread_t server_thread;
+ char cmsgbuf[CMSG_SIZE];
+ struct iovec iov[1];
+ struct cmsghdr *cm;
+ struct msghdr msg;
+ int server_fd, fd;
+ char *buf;
+
+ while ((c = getopt(argc, argv, "46p:")) != -1) {
+ switch (c) {
+ case '4':
+ family = PF_INET;
+ addr_len = sizeof(struct sockaddr_in);
+ break;
+ case '6':
+ family = PF_INET6;
+ addr_len = sizeof(struct sockaddr_in6);
+ break;
+ case 'p':
+ port = atoi(optarg);
+ break;
+ }
+ }
+
+ server_fd = socket(family, SOCK_STREAM, 0);
+ if (server_fd < 0)
+ error(1, errno, "server socket");
+ setup_loopback_addr(family, &listen_addr);
+ if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR,
+ &one, sizeof(one)) != 0)
+ error(1, errno, "setsockopt(SO_REUSEADDR)");
+ if (bind(server_fd, (const struct sockaddr *)&listen_addr,
+ addr_len) == -1)
+ error(1, errno, "bind");
+ if (listen(server_fd, 128) == -1)
+ error(1, errno, "listen");
+ if (pthread_create(&server_thread, NULL, start_server,
+ (void *)(unsigned long)server_fd) != 0)
+ error(1, errno, "pthread_create");
+
+ fd = socket(family, SOCK_STREAM, 0);
+ if (fd < 0)
+ error(1, errno, "client socket");
+ setup_loopback_addr(family, &addr);
+ if (connect(fd, (const struct sockaddr *)&addr, addr_len) == -1)
+ error(1, errno, "connect");
+ if (setsockopt(fd, SOL_TCP, TCP_INQ, &one, sizeof(one)) != 0)
+ error(1, errno, "setsockopt(TCP_INQ)");
+
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ msg.msg_iov = iov;
+ msg.msg_iovlen = 1;
+ msg.msg_control = cmsgbuf;
+ msg.msg_controllen = sizeof(cmsgbuf);
+ msg.msg_flags = 0;
+
+ buf = malloc(BUF_SIZE);
+ iov[0].iov_base = buf;
+ iov[0].iov_len = BUF_SIZE / 2;
+
+ if (recvmsg(fd, &msg, 0) != iov[0].iov_len)
+ error(1, errno, "recvmsg");
+ if (msg.msg_flags & MSG_CTRUNC)
+ error(1, 0, "control message is truncated");
+
+ for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm))
+ if (cm->cmsg_level == SOL_TCP && cm->cmsg_type == TCP_CM_INQ)
+ inq = *((int *) CMSG_DATA(cm));
+
+ if (inq != BUF_SIZE - iov[0].iov_len) {
+ fprintf(stderr, "unexpected inq: %d\n", inq);
+ exit(1);
+ }
+
+ printf("PASSED\n");
+ free(buf);
+ close(fd);
+ return 0;
+}
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH V3 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: Soheil Hassas Yeganeh @ 2018-05-01 14:11 UTC (permalink / raw)
To: davem, netdev; +Cc: ycheng, ncardwell, edumazet, willemb, Soheil Hassas Yeganeh
From: Soheil Hassas Yeganeh <soheil@google.com>
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the receive queue length, applications can optimize
how they allocate buffers to read from the socket.
The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.
Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.
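On the application side, picking the value out of the control data could look roughly like the sketch below. It is a minimal illustration, assuming the value 36 for TCP_CM_INQ as defined in this patch. The sketch_ helpers are hypothetical names. IPPROTO_TCP is used for the level because it has the same numeric value as SOL_TCP. The round-trip helper builds a synthetic control buffer so the parser can be exercised without a socket. A real program would get the buffer from recvmsg() after enabling TCP_INQ with setsockopt().

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_CM_INQ
#define TCP_CM_INQ 36	/* same value as TCP_INQ in this patch */
#endif

/* Return the TCP_CM_INQ value carried in msg, or -1 if none is present. */
static int sketch_find_inq(struct msghdr *msg)
{
	struct cmsghdr *cm;
	int inq = -1;

	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_TCP &&
		    cm->cmsg_type == TCP_CM_INQ &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(inq)))
			memcpy(&inq, CMSG_DATA(cm), sizeof(inq));
	}
	return inq;
}

/* Build one synthetic TCP_CM_INQ control message and parse it back. */
static int sketch_inq_roundtrip(int value)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces suitable alignment */
	} u;
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = IPPROTO_TCP;
	cm->cmsg_type = TCP_CM_INQ;
	cm->cmsg_len = CMSG_LEN(sizeof(value));
	memcpy(CMSG_DATA(cm), &value, sizeof(value));

	return sketch_find_inq(&msg);
}
```

Copying the value out with memcpy() rather than dereferencing a cast pointer avoids relying on the alignment of CMSG_DATA().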
Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.
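To illustrate the arithmetic behind the hint: the value is the difference of two 32-bit sequence numbers, so it remains correct across sequence wraparound. Because a FIN consumes one sequence number, a received FIN makes the difference one larger than the number of readable bytes. The helper below is a userspace sketch with a hypothetical name, not the kernel function.

```c
#include <stdint.h>

/* Distance between two 32-bit TCP sequence numbers. Unsigned subtraction
 * wraps modulo 2^32, so the result is valid across wraparound as long as
 * the true distance fits in an int. */
static int sketch_inq(uint32_t rcv_nxt, uint32_t copied_seq)
{
	return (int)(rcv_nxt - copied_seq);
}
```

For example, with copied_seq at 0xFFFFFF00 and rcv_nxt at 0x00000100 the hint is 512. If a FIN has also been received, rcv_nxt is 0x00000101 and the hint becomes 513.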
With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.
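One possible way for an application to act on the hint is sketched below. The policy and the function name are assumptions made for illustration only. The patch itself does not prescribe how buffers should be sized.

```c
#include <stddef.h>

/* Hypothetical policy: choose the size of the next read buffer from the
 * TCP_CM_INQ hint, clamped to the range [min_size, max_size]. */
static size_t sketch_next_buf_size(int inq, size_t min_size, size_t max_size)
{
	size_t want;

	if (inq <= 0)
		return min_size;	/* no hint, or nothing pending */
	want = (size_t)inq;
	if (want < min_size)
		return min_size;
	if (want > max_size)
		return max_size;
	return want;
}
```

A reader would start with a min_size buffer, look at the hint returned with each recvmsg(), and allocate a larger buffer only when the hint says more data is waiting.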
V3 change-log:
As suggested by David Miller, added loads with barrier
to check whether we have multiple threads calling
recvmsg in parallel. When that happens we lock the
socket to calculate inq.
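The pattern is: read both counters without the lock, re-read one of them to detect a concurrent update, and only take the lock when the snapshot looks inconsistent. A userspace analogue is sketched below. It is an assumption-laden illustration: struct sketch_sock and the helpers are hypothetical, C11 atomic loads stand in for READ_ONCE(), and a pthread mutex stands in for the socket lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sequence counters of a socket. */
struct sketch_sock {
	pthread_mutex_t lock;
	_Atomic uint32_t copied_seq;
	_Atomic uint32_t rcv_nxt;
};

static int sketch_inq_hint(struct sketch_sock *sk)
{
	uint32_t copied_seq = atomic_load(&sk->copied_seq);
	uint32_t rcv_nxt = atomic_load(&sk->rcv_nxt);
	int inq = (int)(rcv_nxt - copied_seq);

	/* A negative result or a changed copied_seq means another reader
	 * moved the counters between the two loads: redo it under the lock. */
	if (inq < 0 || copied_seq != atomic_load(&sk->copied_seq)) {
		pthread_mutex_lock(&sk->lock);
		inq = (int)(atomic_load(&sk->rcv_nxt) -
			    atomic_load(&sk->copied_seq));
		pthread_mutex_unlock(&sk->lock);
	}
	return inq;
}

/* Single-threaded driver: set up the counters and ask for the hint. */
static int sketch_inq_demo(uint32_t copied_seq, uint32_t rcv_nxt)
{
	struct sketch_sock sk;
	int inq;

	pthread_mutex_init(&sk.lock, NULL);
	atomic_init(&sk.copied_seq, copied_seq);
	atomic_init(&sk.rcv_nxt, rcv_nxt);
	inq = sketch_inq_hint(&sk);
	pthread_mutex_destroy(&sk.lock);
	return inq;
}
```

In the common single-reader case the lock is never taken, which keeps the fast path cheap.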
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
---
include/linux/tcp.h | 2 +-
include/uapi/linux/tcp.h | 3 +++
net/ipv4/tcp.c | 43 ++++++++++++++++++++++++++++++++++++----
3 files changed, 43 insertions(+), 5 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20585d5c4e1c3..807776928cb86 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -228,7 +228,7 @@ struct tcp_sock {
unused:2;
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
- unused1 : 1,
+ recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
repair : 1,
frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */
u8 repair_queue;
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index e9e8373b34b9d..29eb659aa77a1 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -123,6 +123,9 @@ enum {
#define TCP_FASTOPEN_KEY 33 /* Set the key for Fast Open (cookie) */
#define TCP_FASTOPEN_NO_COOKIE 34 /* Enable TFO without a TFO cookie */
#define TCP_ZEROCOPY_RECEIVE 35
+#define TCP_INQ 36 /* Notify bytes available to read as a cmsg on read */
+
+#define TCP_CM_INQ TCP_INQ
struct tcp_repair_opt {
__u32 opt_code;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4028ddd14dd5a..ca7365db59dff 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1889,6 +1889,22 @@ static void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
}
}
+static inline int tcp_inq_hint(struct sock *sk)
+{
+ const struct tcp_sock *tp = tcp_sk(sk);
+ u32 copied_seq = READ_ONCE(tp->copied_seq);
+ u32 rcv_nxt = READ_ONCE(tp->rcv_nxt);
+ int inq;
+
+ inq = rcv_nxt - copied_seq;
+ if (unlikely(inq < 0 || copied_seq != READ_ONCE(tp->copied_seq))) {
+ lock_sock(sk);
+ inq = tp->rcv_nxt - tp->copied_seq;
+ release_sock(sk);
+ }
+ return inq;
+}
+
/*
* This routine copies from a sock struct into the user buffer.
*
@@ -1905,13 +1921,14 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
u32 peek_seq;
u32 *seq;
unsigned long used;
- int err;
+ int err, inq;
int target; /* Read at least this many bytes */
long timeo;
struct sk_buff *skb, *last;
u32 urg_hole = 0;
struct scm_timestamping tss;
bool has_tss = false;
+ bool has_cmsg;
if (unlikely(flags & MSG_ERRQUEUE))
return inet_recv_error(sk, msg, len, addr_len);
@@ -1926,6 +1943,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
if (sk->sk_state == TCP_LISTEN)
goto out;
+ has_cmsg = tp->recvmsg_inq;
timeo = sock_rcvtimeo(sk, nonblock);
/* Urgent data needs to be handled specially. */
@@ -2112,6 +2130,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
if (TCP_SKB_CB(skb)->has_rxtstamp) {
tcp_update_recv_tstamps(skb, &tss);
has_tss = true;
+ has_cmsg = true;
}
if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
goto found_fin_ok;
@@ -2131,13 +2150,20 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
* on connected socket. I was just happy when found this 8) --ANK
*/
- if (has_tss)
- tcp_recv_timestamp(msg, sk, &tss);
-
/* Clean up data we have read: This will do ACK frames. */
tcp_cleanup_rbuf(sk, copied);
release_sock(sk);
+
+ if (has_cmsg) {
+ if (has_tss)
+ tcp_recv_timestamp(msg, sk, &tss);
+ if (tp->recvmsg_inq) {
+ inq = tcp_inq_hint(sk);
+ put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq);
+ }
+ }
+
return copied;
out:
@@ -3006,6 +3032,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
tp->notsent_lowat = val;
sk->sk_write_space(sk);
break;
+ case TCP_INQ:
+ if (val > 1 || val < 0)
+ err = -EINVAL;
+ else
+ tp->recvmsg_inq = val;
+ break;
default:
err = -ENOPROTOOPT;
break;
@@ -3431,6 +3463,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
case TCP_NOTSENT_LOWAT:
val = tp->notsent_lowat;
break;
+ case TCP_INQ:
+ val = tp->recvmsg_inq;
+ break;
case TCP_SAVE_SYN:
val = tp->save_syn;
break;
--
2.17.0.441.gb46fe60e1d-goog
* Re: [PATCH 8/8] dt-bindings: stm32: add compatible for syscon
From: Rob Herring @ 2018-05-01 14:01 UTC (permalink / raw)
To: Christophe Roullier
Cc: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro,
devicetree, linux-arm-kernel, netdev
In-Reply-To: <1524582120-4451-9-git-send-email-christophe.roullier@st.com>
On Tue, Apr 24, 2018 at 05:02:00PM +0200, Christophe Roullier wrote:
> This patch describes syscon DT bindings.
>
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
> Documentation/devicetree/bindings/arm/stm32.txt | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/stm32.txt b/Documentation/devicetree/bindings/arm/stm32.txt
> index 6808ed9..a871a78 100644
> --- a/Documentation/devicetree/bindings/arm/stm32.txt
> +++ b/Documentation/devicetree/bindings/arm/stm32.txt
> @@ -8,3 +8,10 @@ using one of the following compatible strings:
> st,stm32f746
> st,stm32h743
> st,stm32mp157
> +
> +Required nodes:
> +
> +- syscon: some subnode of the STM32 SoC node must be a
> + system controller node pointing to the control registers,
> + with the compatible string set to one of these tuples:
> + "st,stm32-syscfg", "syscon"
This should be a separate file.
I'd guess the syscfg registers differ from SoC to SoC, so you need more
specific compatible strings.
Rob
* Re: [PATCH 2/8] dt-bindings: stm32-dwmac: add support of MPU families
From: Rob Herring @ 2018-05-01 13:58 UTC (permalink / raw)
To: Christophe Roullier
Cc: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro,
devicetree, linux-arm-kernel, netdev
In-Reply-To: <1524582120-4451-3-git-send-email-christophe.roullier@st.com>
On Tue, Apr 24, 2018 at 05:01:54PM +0200, Christophe Roullier wrote:
> Add description for Ethernet MPU families fields
>
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
> Documentation/devicetree/bindings/net/stm32-dwmac.txt | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/stm32-dwmac.txt b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> index 489dbcb..e9d1c4a 100644
> --- a/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> @@ -6,14 +6,26 @@ Please see stmmac.txt for the other unchanged properties.
> The device node has following properties.
>
> Required properties:
> -- compatible: Should be "st,stm32-dwmac" to select glue, and
> +- compatible: For MCU family should be "st,stm32-dwmac" to select glue, and
> "snps,dwmac-3.50a" to select IP version.
> + For MPU family should be "st,stm32mp1-dwmac" to select
> + glue, and "snps,dwmac-4.20a" to select IP version.
> - clocks: Must contain a phandle for each entry in clock-names.
> - clock-names: Should be "stmmaceth" for the host clock.
> Should be "mac-clk-tx" for the MAC TX clock.
> Should be "mac-clk-rx" for the MAC RX clock.
> + For MPU family "ethstp" for power mode clock.
> + For MPU family need also "syscfg-clk" for SYSCFG clock.
Are these in addition to or instead of the first 3 clocks?
> +- interrupt-names: Should contain a list of interrupt names corresponding to
> + the interrupts in the interrupts property, if available.
You need to list the names. Seems unrelated to MPU support.
> - st,syscon : Should be phandle/offset pair. The phandle to the syscon node which
> - encompases the glue register, and the offset of the control register.
> + encompases the glue register, and the offset of the control register.
> +
> +Optional properties:
> +- clock-names: For MPU family "mac-clk-ck" for PHY without quartz
The clock is always connected whether you use it or not, right? So it
shouldn't be optional based on use.
> +- st,int-phyclk : valid only where PHY do not have quartz and need to be clock
> + by RCC
Boolean?
> +
> Example:
>
> ethernet@40028000 {
> --
> 1.9.1
* Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2018-04-30
From: David Miller @ 2018-05-01 13:38 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 30 Apr 2018 10:00:50 -0700
> This series contains updates to i40e and i40evf only.
>
> Jia-Ju Bai replaces an instance of GFP_ATOMIC to GFP_KERNEL, since
> i40evf is not in atomic context when i40evf_add_vlan() is called.
>
> Jake cleans up function header comments to ensure that the function
> parameter comments actually match the function parameters. Fixed a
> possible overflow error in the PTP clock code. Fixed warnings regarding
> restricted __be32 type usage.
>
> Mariusz fixes the reading of the LLDP configuration, which moves from
> using relative values to calculating the absolute address.
>
> Jakub adds a check for 10G LR mode for i40e.
>
> Paweł fixes an issue, where changing the MTU would turn on TSO, GSO and
> GRO.
>
> Alex fixes a couple of issues with the UDP tunnel filter configuration.
> First being that the tunnels did not have mutual exclusion in place to
> prevent a race condition between a user request to add/remove a port and
> an update. The second issue was we were deleting filters that were not
> associated with the actual filter we wanted to delete.
>
> Harshitha ensures that the queue map sent by the VF is taken into
> account when enabling/disabling queues in the VF VSI.
>
> The following are changes since commit 76c2a96d42ca3bdac12c463ff27fec3bb2982e3f:
> liquidio: fix spelling mistake: "mac_tx_multi_collison" -> "mac_tx_multi_collision"
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE
Pulled, thanks Jeff.