Netdev List

* [PATCH iproute2 2/2] arpd: remove pthread dependency
From: Baruch Siach @ 2018-05-01 12:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach
In-Reply-To: <1918c6139ac3ee9affe8e62817f30b110ab235c7.1525178588.git.baruch@tkos.co.il>

An explicit link against pthread is not needed when linking dynamically.
Even a static link against a recent libdb does not pull in the code that
uses pthread. Finally, the configure check introduced in commit
a25df4887d7 (configure: Check for Berkeley DB for arpd compilation) does
not add -lpthread to its link command.

This change allows arpd to be built with toolchains that do not provide
thread support.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 misc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/Makefile b/misc/Makefile
index 34ef6b21b4ed..b2dd6b26e2dc 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -25,7 +25,7 @@ rtacct: rtacct.c
 	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o rtacct rtacct.c $(LDLIBS) -lm
 
 arpd: arpd.c
-	$(QUIET_CC)$(CC) $(CFLAGS) -I$(DBM_INCLUDE) $(LDFLAGS) -o arpd arpd.c $(LDLIBS) -ldb -lpthread
+	$(QUIET_CC)$(CC) $(CFLAGS) -I$(DBM_INCLUDE) $(LDFLAGS) -o arpd arpd.c $(LDLIBS) -ldb
 
 ssfilter.c: ssfilter.y
 	$(QUIET_YACC)bison ssfilter.y -o ssfilter.c
-- 
2.17.0

* [PATCH iproute2 1/2] README: update libdb build dependency information
From: Baruch Siach @ 2018-05-01 12:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach

Debian has not distributed libdb4.x-dev for quite some time now. Current
stable carries libdb5.3-dev. Update the wording accordingly.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 README | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README b/README
index f66fd5faf4cf..bc82187cf018 100644
--- a/README
+++ b/README
@@ -16,8 +16,8 @@ How to compile this.
 --------------------
 1. libdbm
 
-arpd needs to have the db4 development libraries. For Debian
-users this is the package with a name like libdb4.x-dev.
+arpd needs to have the berkeleydb development libraries. For Debian
+users this is the package with a name like libdbX.X-dev.
 DBM_INCLUDE points to the directory with db_185.h which
 is the include file used by arpd to get to the old format Berkeley
 database routines.  Often this is in the db-devel package.
-- 
2.17.0

* Re: [PATCH v2 1/2] dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
From: Rob Herring @ 2018-05-01 12:55 UTC (permalink / raw)
  To: Yixun Lan
  Cc: David S. Miller, netdev, Kevin Hilman, Carlo Caione,
	Jerome Brunet, Martin Blumenstingl, linux-amlogic,
	linux-arm-kernel, linux-kernel, devicetree
In-Reply-To: <20180428102111.18384-2-yixun.lan@amlogic.com>

On Sat, Apr 28, 2018 at 10:21:10AM +0000, Yixun Lan wrote:
> We need to introduce a new compatible name for the Meson-AXG SoC
> in order to support the RMII 100M ethernet PHY, since the PRG_ETH0
> register of the dwmac glue layer is changed from previous old SoC.
> 
> Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
> ---
>  Documentation/devicetree/bindings/net/meson-dwmac.txt | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Rob Herring <robh@kernel.org>

* [PATCH iproute2-master] iproute: Parse last nexthop in a multipath route
From: Ido Schimmel @ 2018-05-01 13:16 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern, mlxsw, Ido Schimmel

Continue parsing a multipath payload as long as another nexthop can fit
in the payload.

# ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1

Before:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1

After:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1
        nexthop dev dummy1 weight 1

Fixes: f48e14880a0e ("iproute: refactor multipath print")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 ip/iproute.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 44351bc51b4b..56dd9f25e38e 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -650,7 +650,7 @@ static void print_rta_multipath(FILE *fp, const struct rtmsg *r,
 	int len = RTA_PAYLOAD(rta);
 	int first = 1;
 
-	while (len > sizeof(*nh)) {
+	while (len >= sizeof(*nh)) {
 		struct rtattr *tb[RTA_MAX + 1];
 
 		if (nh->rtnh_len > len)
-- 
2.14.3
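
The off-by-one fixed above can be reproduced outside iproute2 with a
small model. This is only a sketch: struct rec and count_records() are
hypothetical stand-ins for struct rtnexthop and the loop in
print_rta_multipath(), not the real code. It shows that a "len >
sizeof" test stops one record early when the last record exactly fills
the remaining payload.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rtnexthop: a record that carries
 * its own total length in its header. */
struct rec {
	unsigned short len;	/* length of this record, header included */
	unsigned short id;
};

/* Count the records in buf[0..len).
 * strict_gt != 0 models the old test (len > sizeof),
 * strict_gt == 0 models the fixed test (len >= sizeof). */
static inline int count_records(const void *buf, int len, int strict_gt)
{
	const char *p = buf;
	int n = 0;

	for (;;) {
		const struct rec *r;
		int fits = strict_gt ? len > (int)sizeof(*r)
				     : len >= (int)sizeof(*r);

		if (!fits)
			break;
		r = (const struct rec *)p;
		/* Reject records that are malformed or overrun the payload. */
		if (r->len < sizeof(*r) || r->len > len)
			break;
		n++;
		p += r->len;
		len -= r->len;
	}
	return n;
}
```

With two back-to-back records and no trailing padding, the strict
comparison reports one record and the fixed comparison reports two,
which matches the "Before" and "After" output in the commit message.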

* Re: [PATCH v2 2/2] net: stmmac: dwmac-meson: extend phy mode setting
From: Martin Blumenstingl @ 2018-05-01 13:20 UTC (permalink / raw)
  To: Yixun Lan
  Cc: David S. Miller, netdev, Kevin Hilman, Carlo Caione, Rob Herring,
	Jerome Brunet, linux-amlogic, linux-arm-kernel, linux-kernel
In-Reply-To: <20180428102111.18384-3-yixun.lan@amlogic.com>

Hello Yixun,

On Sat, Apr 28, 2018 at 12:21 PM, Yixun Lan <yixun.lan@amlogic.com> wrote:
>   In the Meson-AXG SoC, the phy mode setting of PRG_ETH0 in the glue layer
> is extended from bit[0] to bit[2:0].
>   There is no problem if we configure it to the RGMII 1000M PHY mode,
> since the register setting is coincidentally compatible with previous one,
> but for the RMII 100M PHY mode, the configuration need to be changed to
> value - b100.
>   This patch was verified with a RTL8201F 100M ethernet PHY.
>
> Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
> ---
>  .../ethernet/stmicro/stmmac/dwmac-meson8b.c   | 120 +++++++++++++++---
>  1 file changed, 104 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> index 7cb794094a70..4ff231df7322 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> @@ -18,6 +18,7 @@
>  #include <linux/io.h>
>  #include <linux/ioport.h>
>  #include <linux/module.h>
> +#include <linux/of_device.h>
>  #include <linux/of_net.h>
>  #include <linux/mfd/syscon.h>
>  #include <linux/platform_device.h>
> @@ -29,6 +30,10 @@
>
>  #define PRG_ETH0_RGMII_MODE            BIT(0)
>
> +#define PRG_ETH0_EXT_PHY_MODE_MASK     GENMASK(2, 0)
> +#define PRG_ETH0_EXT_RGMII_MODE                1
> +#define PRG_ETH0_EXT_RMII_MODE         4
> +
>  /* mux to choose between fclk_div2 (bit unset) and mpll2 (bit set) */
>  #define PRG_ETH0_CLK_M250_SEL_SHIFT    4
>  #define PRG_ETH0_CLK_M250_SEL_MASK     GENMASK(4, 4)
> @@ -47,12 +52,20 @@
>
>  #define MUX_CLK_NUM_PARENTS            2
>
> +struct meson8b_dwmac;
> +
> +struct meson8b_dwmac_data {
> +       int (*set_phy_mode)(struct meson8b_dwmac *dwmac);
> +};
> +
>  struct meson8b_dwmac {
> -       struct device           *dev;
> -       void __iomem            *regs;
> -       phy_interface_t         phy_mode;
> -       struct clk              *rgmii_tx_clk;
> -       u32                     tx_delay_ns;
> +       struct device                   *dev;
> +       void __iomem                    *regs;
> +
> +       const struct meson8b_dwmac_data *data;
> +       phy_interface_t                 phy_mode;
> +       struct clk                      *rgmii_tx_clk;
> +       u32                             tx_delay_ns;
>  };
>
>  struct meson8b_dwmac_clk_configs {
> @@ -171,6 +184,59 @@ static int meson8b_init_rgmii_tx_clk(struct meson8b_dwmac *dwmac)
>         return 0;
>  }
>
> +static int meson8b_set_phy_mode(struct meson8b_dwmac *dwmac)
> +{
> +       switch (dwmac->phy_mode) {
> +       case PHY_INTERFACE_MODE_RGMII:
> +       case PHY_INTERFACE_MODE_RGMII_RXID:
> +       case PHY_INTERFACE_MODE_RGMII_ID:
> +       case PHY_INTERFACE_MODE_RGMII_TXID:
> +               /* enable RGMII mode */
> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_RGMII_MODE,
> +                                       PRG_ETH0_RGMII_MODE);
> +               break;
> +       case PHY_INTERFACE_MODE_RMII:
> +               /* disable RGMII mode -> enables RMII mode */
> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_RGMII_MODE, 0);
> +               break;
> +       default:
> +               dev_err(dwmac->dev, "fail to set phy-mode %s\n",
> +                       phy_modes(dwmac->phy_mode));
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
> +static int meson_axg_set_phy_mode(struct meson8b_dwmac *dwmac)
> +{
> +       switch (dwmac->phy_mode) {
> +       case PHY_INTERFACE_MODE_RGMII:
> +       case PHY_INTERFACE_MODE_RGMII_RXID:
> +       case PHY_INTERFACE_MODE_RGMII_ID:
> +       case PHY_INTERFACE_MODE_RGMII_TXID:
> +               /* enable RGMII mode */
> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_EXT_PHY_MODE_MASK,
> +                                       PRG_ETH0_EXT_RGMII_MODE);
> +               break;
> +       case PHY_INTERFACE_MODE_RMII:
> +               /* disable RGMII mode -> enables RMII mode */
If you have to re-send this for whatever reason, maybe you could remove
the comments from meson_axg_set_phy_mode. The "older" register layout
requires un-setting RGMII mode to enable RMII mode. For AXG, however,
there seem to be two dedicated values (1 and 4), one for each mode.

Apart from that:
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_EXT_PHY_MODE_MASK,
> +                                       PRG_ETH0_EXT_RMII_MODE);
> +               break;
> +       default:
> +               dev_err(dwmac->dev, "fail to set phy-mode %s\n",
> +                       phy_modes(dwmac->phy_mode));
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
>  static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
>  {
>         int ret;
> @@ -188,10 +254,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
>
>         case PHY_INTERFACE_MODE_RGMII_ID:
>         case PHY_INTERFACE_MODE_RGMII_TXID:
> -               /* enable RGMII mode */
> -               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
> -                                       PRG_ETH0_RGMII_MODE);
> -
>                 /* only relevant for RMII mode -> disable in RGMII mode */
>                 meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
>                                         PRG_ETH0_INVERTED_RMII_CLK, 0);
> @@ -224,10 +286,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
>                 break;
>
>         case PHY_INTERFACE_MODE_RMII:
> -               /* disable RGMII mode -> enables RMII mode */
> -               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
> -                                       0);
> -
>                 /* invert internal clk_rmii_i to generate 25/2.5 tx_rx_clk */
>                 meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
>                                         PRG_ETH0_INVERTED_RMII_CLK,
> @@ -274,6 +332,11 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
>                 goto err_remove_config_dt;
>         }
>
> +       dwmac->data = (const struct meson8b_dwmac_data *)
> +               of_device_get_match_data(&pdev->dev);
> +       if (!dwmac->data)
> +               return -EINVAL;
> +
>         res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>         dwmac->regs = devm_ioremap_resource(&pdev->dev, res);
>         if (IS_ERR(dwmac->regs)) {
> @@ -298,6 +361,10 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
>         if (ret)
>                 goto err_remove_config_dt;
>
> +       ret = dwmac->data->set_phy_mode(dwmac);
> +       if (ret)
> +               goto err_remove_config_dt;
> +
>         ret = meson8b_init_prg_eth(dwmac);
>         if (ret)
>                 goto err_remove_config_dt;
> @@ -316,10 +383,31 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
>         return ret;
>  }
>
> +static const struct meson8b_dwmac_data meson8b_dwmac_data = {
> +       .set_phy_mode = meson8b_set_phy_mode,
> +};
> +
> +static const struct meson8b_dwmac_data meson_axg_dwmac_data = {
> +       .set_phy_mode = meson_axg_set_phy_mode,
> +};
> +
>  static const struct of_device_id meson8b_dwmac_match[] = {
> -       { .compatible = "amlogic,meson8b-dwmac" },
> -       { .compatible = "amlogic,meson8m2-dwmac" },
> -       { .compatible = "amlogic,meson-gxbb-dwmac" },
> +       {
> +               .compatible = "amlogic,meson8b-dwmac",
> +               .data = &meson8b_dwmac_data,
> +       },
> +       {
> +               .compatible = "amlogic,meson8m2-dwmac",
> +               .data = &meson8b_dwmac_data,
> +       },
> +       {
> +               .compatible = "amlogic,meson-gxbb-dwmac",
> +               .data = &meson8b_dwmac_data,
> +       },
> +       {
> +               .compatible = "amlogic,meson-axg-dwmac",
> +               .data = &meson_axg_dwmac_data,
> +       },
>         { }
>  };
>  MODULE_DEVICE_TABLE(of, meson8b_dwmac_match);
> --
> 2.17.0
>
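
The difference between the two register layouts discussed above can be
sketched with a pure-function model. The macro names and values are
taken from the quoted patch; mask_bits() is a hypothetical stand-in
that is assumed to behave like the driver's read-modify-write helper
(clear the masked bits, then OR in the new value), and the BIT/GENMASK
definitions are simplified 32-bit versions written for this sketch.

```c
#include <stdint.h>

/* Simplified 32-bit versions of the kernel macros (assumption). */
#define BIT(n)		(1u << (n))
#define GENMASK(h, l)	(((~0u) >> (31 - (h))) & ((~0u) << (l)))

/* Values as defined in the quoted patch. */
#define PRG_ETH0_RGMII_MODE		BIT(0)
#define PRG_ETH0_EXT_PHY_MODE_MASK	GENMASK(2, 0)
#define PRG_ETH0_EXT_RGMII_MODE		1
#define PRG_ETH0_EXT_RMII_MODE		4

/* Hypothetical model of a masked register update: returns the new
 * register contents instead of writing to hardware. */
static inline uint32_t mask_bits(uint32_t reg, uint32_t mask, uint32_t value)
{
	return (reg & ~mask) | (value & mask);
}
```

On the older layout, RMII is selected by clearing bit 0, so bits [2:1]
are left untouched. On AXG, RMII is its own value (b100) in the
three-bit field, which is why the legacy update alone cannot select it.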

* Re: [PATCH] ipv6: Allow non-gateway ECMP for IPv6
From: Ido Schimmel @ 2018-05-01 13:20 UTC (permalink / raw)
  To: David Ahern
  Cc: Thomas Winter, netdev, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI
In-Reply-To: <89497565-f1e6-a916-70d3-dfc7efa7a7e4@gmail.com>

On Mon, Apr 30, 2018 at 08:59:10PM -0600, David Ahern wrote:
> On 4/30/18 3:15 PM, Thomas Winter wrote:
> > It is valid to have static routes where the nexthop
> > is an interface not an address such as tunnels.
> > For IPv4 it was possible to use ECMP on these routes
> > but not for IPv6.
> > 
> > Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > ---
> >  include/net/ip6_route.h | 3 +--
> >  net/ipv6/ip6_fib.c      | 3 ---
> >  2 files changed, 1 insertion(+), 5 deletions(-)
> > 
> 
> Interesting. Existing code inserts the dev nexthop as a separate route.
> 
> Change looks good to me.
> 
> Acked-by: David Ahern <dsahern@gmail.com>

Thanks for the Cc, David. I'll need to adjust mlxsw to support this.
Specifically, mlxsw_sp_fib6_rt_can_mp().

BTW, I hit this bug while looking into this:
https://patchwork.ozlabs.org/patch/907050/

* Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2018-04-30
From: David Miller @ 2018-05-01 13:38 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 30 Apr 2018 10:00:50 -0700

> This series contains updates to i40e and i40evf only.
> 
> Jia-Ju Bai replaces an instance of GFP_ATOMIC to GFP_KERNEL, since
> i40evf is not in atomic context when i40evf_add_vlan() is called.
> 
> Jake cleans up function header comments to ensure that the function
> parameter comments actually match the function parameters.  Fixed a
> possible overflow error in the PTP clock code.  Fixed warnings regarding
> restricted __be32 type usage.
> 
> Mariusz fixes the reading of the LLDP configuration, which moves from
> using relative values to calculating the absolute address.
> 
> Jakub adds a check for 10G LR mode for i40e.
> 
> Paweł fixes an issue, where changing the MTU would turn on TSO, GSO and
> GRO.
> 
> Alex fixes a couple of issues with the UDP tunnel filter configuration.
> First being that the tunnels did not have mutual exclusion in place to
> prevent a race condition between a user request to add/remove a port and
> an update.  The second issue was we were deleting filters that were not
> associated with the actual filter we wanted to delete.
> 
> Harshitha ensures that the queue map sent by the VF is taken into
> account when enabling/disabling queues in the VF VSI.
> 
> The following are changes since commit 76c2a96d42ca3bdac12c463ff27fec3bb2982e3f:
>   liquidio: fix spelling mistake: "mac_tx_multi_collison" -> "mac_tx_multi_collision"
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Pulled, thanks Jeff.

* Re: [PATCH  2/8] dt-bindings: stm32-dwmac: add support of MPU families
From: Rob Herring @ 2018-05-01 13:58 UTC (permalink / raw)
  To: Christophe Roullier
  Cc: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro,
	devicetree, linux-arm-kernel, netdev
In-Reply-To: <1524582120-4451-3-git-send-email-christophe.roullier@st.com>

On Tue, Apr 24, 2018 at 05:01:54PM +0200, Christophe Roullier wrote:
> Add description for Ethernet MPU families fields
> 
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
>  Documentation/devicetree/bindings/net/stm32-dwmac.txt | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/stm32-dwmac.txt b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> index 489dbcb..e9d1c4a 100644
> --- a/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> @@ -6,14 +6,26 @@ Please see stmmac.txt for the other unchanged properties.
>  The device node has following properties.
>  
>  Required properties:
> -- compatible:  Should be "st,stm32-dwmac" to select glue, and
> +- compatible:  For MCU family should be "st,stm32-dwmac" to select glue, and
>  	       "snps,dwmac-3.50a" to select IP version.
> +	       For MPU family should be "st,stm32mp1-dwmac" to select
> +	       glue, and "snps,dwmac-4.20a" to select IP version.
>  - clocks: Must contain a phandle for each entry in clock-names.
>  - clock-names: Should be "stmmaceth" for the host clock.
>  	       Should be "mac-clk-tx" for the MAC TX clock.
>  	       Should be "mac-clk-rx" for the MAC RX clock.
> +	       For MPU family "ethstp" for power mode clock.
> +	       For MPU family need also "syscfg-clk" for SYSCFG clock.

Are these in addition to, or instead of, the first 3 clocks?

> +- interrupt-names: Should contain a list of interrupt names corresponding to
> +           the interrupts in the interrupts property, if available.

You need to list the names. Seems unrelated to MPU support.

>  - st,syscon : Should be phandle/offset pair. The phandle to the syscon node which
> -	      encompases the glue register, and the offset of the control register.
> +	       encompases the glue register, and the offset of the control register.
> +
> +Optional properties:
> +- clock-names:     For MPU family "mac-clk-ck" for PHY without quartz

The clock is always connected whether you use it or not, right? So it 
shouldn't be optional based on use.

> +- st,int-phyclk :  valid only where PHY do not have quartz and need to be clock
> +	           by RCC

Boolean?


> +
>  Example:
>  
>  	ethernet@40028000 {
> -- 
> 1.9.1
> 

* Re: [PATCH  8/8] dt-bindings: stm32: add compatible for syscon
From: Rob Herring @ 2018-05-01 14:01 UTC (permalink / raw)
  To: Christophe Roullier
  Cc: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro,
	devicetree, linux-arm-kernel, netdev
In-Reply-To: <1524582120-4451-9-git-send-email-christophe.roullier@st.com>

On Tue, Apr 24, 2018 at 05:02:00PM +0200, Christophe Roullier wrote:
> This patch describes syscon DT bindings.
> 
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
>  Documentation/devicetree/bindings/arm/stm32.txt | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/arm/stm32.txt b/Documentation/devicetree/bindings/arm/stm32.txt
> index 6808ed9..a871a78 100644
> --- a/Documentation/devicetree/bindings/arm/stm32.txt
> +++ b/Documentation/devicetree/bindings/arm/stm32.txt
> @@ -8,3 +8,10 @@ using one of the following compatible strings:
>    st,stm32f746
>    st,stm32h743
>    st,stm32mp157
> +
> +Required nodes:
> +
> +- syscon: some subnode of the STM32 SoC node must be a
> +  system controller node pointing to the control registers,
> +  with the compatible string set to one of these tuples:
> +  "st,stm32-syscfg", "syscon"

This should be a separate file.

I'd guess the syscfg registers differ from SoC to SoC, so you need more 
specific compatible strings.

Rob

* [PATCH V3 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: Soheil Hassas Yeganeh @ 2018-05-01 14:11 UTC (permalink / raw)
  To: davem, netdev; +Cc: ycheng, ncardwell, edumazet, willemb, Soheil Hassas Yeganeh

From: Soheil Hassas Yeganeh <soheil@google.com>

Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer sizes to drain sockets. Knowing
the receive queue length, applications can optimize
how they allocate buffers to read from the socket.

The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.

Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.

Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.

With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.

V3 change-log:
	As suggested by David Miller, added loads with barrier
	to check whether we have multiple threads calling
	recvmsg in parallel. When that happens we lock the
	socket to calculate inq.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
---
 include/linux/tcp.h      |  2 +-
 include/uapi/linux/tcp.h |  3 +++
 net/ipv4/tcp.c           | 43 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20585d5c4e1c3..807776928cb86 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -228,7 +228,7 @@ struct tcp_sock {
 		unused:2;
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
-		unused1	    : 1,
+		recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
 		repair      : 1,
 		frto        : 1;/* F-RTO (RFC5682) activated in CA_Loss */
 	u8	repair_queue;
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index e9e8373b34b9d..29eb659aa77a1 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -123,6 +123,9 @@ enum {
 #define TCP_FASTOPEN_KEY	33	/* Set the key for Fast Open (cookie) */
 #define TCP_FASTOPEN_NO_COOKIE	34	/* Enable TFO without a TFO cookie */
 #define TCP_ZEROCOPY_RECEIVE	35
+#define TCP_INQ			36	/* Notify bytes available to read as a cmsg on read */
+
+#define TCP_CM_INQ		TCP_INQ
 
 struct tcp_repair_opt {
 	__u32	opt_code;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4028ddd14dd5a..ca7365db59dff 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1889,6 +1889,22 @@ static void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
 	}
 }
 
+static inline int tcp_inq_hint(struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	u32 copied_seq = READ_ONCE(tp->copied_seq);
+	u32 rcv_nxt = READ_ONCE(tp->rcv_nxt);
+	int inq;
+
+	inq = rcv_nxt - copied_seq;
+	if (unlikely(inq < 0 || copied_seq != READ_ONCE(tp->copied_seq))) {
+		lock_sock(sk);
+		inq = tp->rcv_nxt - tp->copied_seq;
+		release_sock(sk);
+	}
+	return inq;
+}
+
 /*
  *	This routine copies from a sock struct into the user buffer.
  *
@@ -1905,13 +1921,14 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	u32 peek_seq;
 	u32 *seq;
 	unsigned long used;
-	int err;
+	int err, inq;
 	int target;		/* Read at least this many bytes */
 	long timeo;
 	struct sk_buff *skb, *last;
 	u32 urg_hole = 0;
 	struct scm_timestamping tss;
 	bool has_tss = false;
+	bool has_cmsg;
 
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len, addr_len);
@@ -1926,6 +1943,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	if (sk->sk_state == TCP_LISTEN)
 		goto out;
 
+	has_cmsg = tp->recvmsg_inq;
 	timeo = sock_rcvtimeo(sk, nonblock);
 
 	/* Urgent data needs to be handled specially. */
@@ -2112,6 +2130,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		if (TCP_SKB_CB(skb)->has_rxtstamp) {
 			tcp_update_recv_tstamps(skb, &tss);
 			has_tss = true;
+			has_cmsg = true;
 		}
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			goto found_fin_ok;
@@ -2131,13 +2150,20 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	 * on connected socket. I was just happy when found this 8) --ANK
 	 */
 
-	if (has_tss)
-		tcp_recv_timestamp(msg, sk, &tss);
-
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
 
 	release_sock(sk);
+
+	if (has_cmsg) {
+		if (has_tss)
+			tcp_recv_timestamp(msg, sk, &tss);
+		if (tp->recvmsg_inq) {
+			inq = tcp_inq_hint(sk);
+			put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq);
+		}
+	}
+
 	return copied;
 
 out:
@@ -3006,6 +3032,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		tp->notsent_lowat = val;
 		sk->sk_write_space(sk);
 		break;
+	case TCP_INQ:
+		if (val > 1 || val < 0)
+			err = -EINVAL;
+		else
+			tp->recvmsg_inq = val;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -3431,6 +3463,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_NOTSENT_LOWAT:
 		val = tp->notsent_lowat;
 		break;
+	case TCP_INQ:
+		val = tp->recvmsg_inq;
+		break;
 	case TCP_SAVE_SYN:
 		val = tp->save_syn;
 		break;
-- 
2.17.0.441.gb46fe60e1d-goog
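
The subtraction in tcp_inq_hint() relies on 32-bit sequence arithmetic.
A minimal userspace sketch of that arithmetic follows; inq_bytes() is a
hypothetical name, and the conversion of an out-of-range unsigned value
to int is implementation-defined in C (it wraps on the compilers the
kernel supports, which is what the sketch assumes).

```c
#include <stdint.h>

/* Hypothetical model of "inq = rcv_nxt - copied_seq".
 * The unsigned subtraction is taken modulo 2^32, so the result is
 * still correct when rcv_nxt has wrapped past zero. A negative result
 * means the two values were not a consistent snapshot, which is the
 * case the patch handles by retrying under the socket lock. */
static inline int inq_bytes(uint32_t rcv_nxt, uint32_t copied_seq)
{
	return (int)(rcv_nxt - copied_seq);
}
```

For example, with copied_seq at 0xFFFFFFF0 and rcv_nxt at 0x10 the
result is 0x20 bytes, even though rcv_nxt is numerically smaller.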

* [PATCH V3 net-next 2/2] selftest: add test for TCP_INQ
From: Soheil Hassas Yeganeh @ 2018-05-01 14:11 UTC (permalink / raw)
  To: davem, netdev; +Cc: ycheng, ncardwell, edumazet, willemb, Soheil Hassas Yeganeh
In-Reply-To: <20180501141128.208705-1-soheil.kdev@gmail.com>

From: Soheil Hassas Yeganeh <soheil@google.com>

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
---
 tools/testing/selftests/net/Makefile  |   3 +-
 tools/testing/selftests/net/tcp_inq.c | 189 ++++++++++++++++++++++++++
 2 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/tcp_inq.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index df9102ec7b7af..0a1821f8dfb18 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -9,7 +9,7 @@ TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh udpgso.sh
 TEST_PROGS += udpgso_bench.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
-TEST_GEN_FILES += tcp_mmap
+TEST_GEN_FILES += tcp_mmap tcp_inq
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
 TEST_GEN_PROGS += udpgso udpgso_bench_tx udpgso_bench_rx
@@ -18,3 +18,4 @@ include ../lib.mk
 
 $(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma
 $(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
+$(OUTPUT)/tcp_inq: LDFLAGS += -lpthread
diff --git a/tools/testing/selftests/net/tcp_inq.c b/tools/testing/selftests/net/tcp_inq.c
new file mode 100644
index 0000000000000..d044b29ddabcc
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_inq.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright 2018 Google Inc.
+ * Author: Soheil Hassas Yeganeh (soheil@google.com)
+ *
+ * Simple example on how to use TCP_INQ and TCP_CM_INQ.
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#define _GNU_SOURCE
+
+#include <error.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#ifndef TCP_INQ
+#define TCP_INQ 36
+#endif
+
+#ifndef TCP_CM_INQ
+#define TCP_CM_INQ TCP_INQ
+#endif
+
+#define BUF_SIZE 8192
+#define CMSG_SIZE 32
+
+static int family = AF_INET6;
+static socklen_t addr_len = sizeof(struct sockaddr_in6);
+static int port = 4974;
+
+static void setup_loopback_addr(int family, struct sockaddr_storage *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (family) {
+	case PF_INET:
+		memset(addr4, 0, sizeof(*addr4));
+		addr4->sin_family = AF_INET;
+		addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+		addr4->sin_port = htons(port);
+		break;
+	case PF_INET6:
+		memset(addr6, 0, sizeof(*addr6));
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_addr = in6addr_loopback;
+		addr6->sin6_port = htons(port);
+		break;
+	default:
+		error(1, 0, "illegal family");
+	}
+}
+
+void *start_server(void *arg)
+{
+	int server_fd = (int)(unsigned long)arg;
+	struct sockaddr_in addr;
+	socklen_t addrlen = sizeof(addr);
+	char *buf;
+	int fd;
+	int r;
+
+	buf = malloc(BUF_SIZE);
+
+	for (;;) {
+		fd = accept(server_fd, (struct sockaddr *)&addr, &addrlen);
+		if (fd == -1) {
+			perror("accept");
+			break;
+		}
+		do {
+			r = send(fd, buf, BUF_SIZE, 0);
+		} while (r < 0 && errno == EINTR);
+		if (r < 0)
+			perror("send");
+		if (r != BUF_SIZE)
+			fprintf(stderr, "can only send %d bytes\n", r);
+		/* TCP_INQ can overestimate in-queue by one byte if we send
+		 * the FIN packet. Sleep for 1 second, so that the client
+		 * likely invoked recvmsg().
+		 */
+		sleep(1);
+		close(fd);
+	}
+
+	free(buf);
+	close(server_fd);
+	pthread_exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+	struct sockaddr_storage listen_addr, addr;
+	int c, one = 1, inq = -1;
+	pthread_t server_thread;
+	char cmsgbuf[CMSG_SIZE];
+	struct iovec iov[1];
+	struct cmsghdr *cm;
+	struct msghdr msg;
+	int server_fd, fd;
+	char *buf;
+
+	while ((c = getopt(argc, argv, "46p:")) != -1) {
+		switch (c) {
+		case '4':
+			family = PF_INET;
+			addr_len = sizeof(struct sockaddr_in);
+			break;
+		case '6':
+			family = PF_INET6;
+			addr_len = sizeof(struct sockaddr_in6);
+			break;
+		case 'p':
+			port = atoi(optarg);
+			break;
+		}
+	}
+
+	server_fd = socket(family, SOCK_STREAM, 0);
+	if (server_fd < 0)
+		error(1, errno, "server socket");
+	setup_loopback_addr(family, &listen_addr);
+	if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR,
+		       &one, sizeof(one)) != 0)
+		error(1, errno, "setsockopt(SO_REUSEADDR)");
+	if (bind(server_fd, (const struct sockaddr *)&listen_addr,
+		 addr_len) == -1)
+		error(1, errno, "bind");
+	if (listen(server_fd, 128) == -1)
+		error(1, errno, "listen");
+	if (pthread_create(&server_thread, NULL, start_server,
+			   (void *)(unsigned long)server_fd) != 0)
+		error(1, errno, "pthread_create");
+
+	fd = socket(family, SOCK_STREAM, 0);
+	if (fd < 0)
+		error(1, errno, "client socket");
+	setup_loopback_addr(family, &addr);
+	if (connect(fd, (const struct sockaddr *)&addr, addr_len) == -1)
+		error(1, errno, "connect");
+	if (setsockopt(fd, SOL_TCP, TCP_INQ, &one, sizeof(one)) != 0)
+		error(1, errno, "setsockopt(TCP_INQ)");
+
+	msg.msg_name = NULL;
+	msg.msg_namelen = 0;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsgbuf;
+	msg.msg_controllen = sizeof(cmsgbuf);
+	msg.msg_flags = 0;
+
+	buf = malloc(BUF_SIZE);
+	iov[0].iov_base = buf;
+	iov[0].iov_len = BUF_SIZE / 2;
+
+	if (recvmsg(fd, &msg, 0) != iov[0].iov_len)
+		error(1, errno, "recvmsg");
+	if (msg.msg_flags & MSG_CTRUNC)
+		error(1, 0, "control message is truncated");
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm))
+		if (cm->cmsg_level == SOL_TCP && cm->cmsg_type == TCP_CM_INQ)
+			inq = *((int *) CMSG_DATA(cm));
+
+	if (inq != BUF_SIZE - iov[0].iov_len) {
+		fprintf(stderr, "unexpected inq: %d\n", inq);
+		exit(1);
+	}
+
+	printf("PASSED\n");
+	free(buf);
+	close(fd);
+	return 0;
+}
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related
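
The selftest above enables TCP_INQ on the client socket and then walks the control messages returned by recvmsg() looking for a TCP_CM_INQ entry. A minimal sketch of that lookup as a standalone helper follows. The name tcp_inq_from_cmsg() is hypothetical, IPPROTO_TCP is used in place of SOL_TCP (both are 6 on Linux), and the fallback value 36 for TCP_INQ is an assumption for libc headers that predate the option.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers; assumed to match the Linux uapi values. */
#ifndef TCP_INQ
#define TCP_INQ 36
#endif
#ifndef TCP_CM_INQ
#define TCP_CM_INQ TCP_INQ
#endif

/*
 * Hypothetical helper: return the in-queue byte count carried in a
 * TCP_CM_INQ control message, or -1 if the message has none.
 */
static int tcp_inq_from_cmsg(struct msghdr *msg)
{
	struct cmsghdr *cm;
	int inq = -1;

	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_TCP &&
		    cm->cmsg_type == TCP_CM_INQ)
			/* memcpy avoids an unaligned read of the payload */
			memcpy(&inq, CMSG_DATA(cm), sizeof(inq));
	}
	return inq;
}
```

Returning -1 keeps "no TCP_CM_INQ present" distinct from a legitimate count of 0, which is the same convention the test uses when it initializes inq to -1.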

* [PATCH net-next 0/2] Update csum tc action for batch operation.
From: Craig Dillabaugh @ 2018-05-01 14:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, jhs, xiyou.wangcong, Craig Dillabaugh

This patchset includes two patches: the first updates act_csum.c
to include the get_fill_size routine required for batch operation, and
the second includes updated TDC tests for the feature.

Craig Dillabaugh (2):
  net sched: Implemented get_fill_size routine for act_csum.
  tc-testing: Updated csum action tests batch create w/wo cookies.

 net/sched/act_csum.c                               |  6 ++
 .../tc-testing/tc-tests/actions/csum.json          | 74 +++++++++++++++++++++-
 2 files changed, 78 insertions(+), 2 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net-next 1/2] net sched: Implemented get_fill_size routine for act_csum.
From: Craig Dillabaugh @ 2018-05-01 14:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, jhs, xiyou.wangcong, Craig Dillabaugh
In-Reply-To: <1525184264-9436-1-git-send-email-cdillaba@mojatatu.com>

Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
---
 net/sched/act_csum.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 7e28b2c..b85e088 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -648,6 +648,11 @@ static int tcf_csum_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static size_t tcf_csum_get_fill_size(const struct tc_action *act)
+{
+	return nla_total_size(sizeof(struct tc_csum));
+}
+
 static struct tc_action_ops act_csum_ops = {
 	.kind		= "csum",
 	.type		= TCA_ACT_CSUM,
@@ -658,6 +663,7 @@ static int tcf_csum_search(struct net *net, struct tc_action **a, u32 index,
 	.cleanup	= tcf_csum_cleanup,
 	.walk		= tcf_csum_walker,
 	.lookup		= tcf_csum_search,
+	.get_fill_size  = tcf_csum_get_fill_size,
 	.size		= sizeof(struct tcf_csum),
 };
 
-- 
1.9.1

^ permalink raw reply related
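
The new callback in the patch above returns nla_total_size(sizeof(struct tc_csum)), that is, the payload size plus the netlink attribute header, rounded up to the attribute alignment. A userspace sketch of that arithmetic follows. The _SKETCH and _sketch names are hypothetical mirrors of the kernel helpers, and the 4-byte header and 4-byte alignment are the values used by the netlink attribute format.

```c
#include <stddef.h>

/* Userspace mirrors of the kernel's netlink attribute sizing helpers. */
#define NLA_ALIGNTO_SKETCH 4u
#define NLA_ALIGN_SKETCH(len) \
	(((len) + NLA_ALIGNTO_SKETCH - 1) & ~(size_t)(NLA_ALIGNTO_SKETCH - 1))
/* struct nlattr is two 16-bit fields (length and type): 4 bytes. */
#define NLA_HDRLEN_SKETCH NLA_ALIGN_SKETCH((size_t)4)

/* Header plus payload, without trailing padding. */
static size_t nla_attr_size_sketch(size_t payload)
{
	return NLA_HDRLEN_SKETCH + payload;
}

/* Header plus payload, padded up to the next 4-byte boundary. */
static size_t nla_total_size_sketch(size_t payload)
{
	return NLA_ALIGN_SKETCH(nla_attr_size_sketch(payload));
}
```

For example, a 24-byte payload (an illustrative figure, not taken from the patch) needs 28 bytes in the message, while a 1-byte payload still occupies 8 bytes because of padding.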

* [PATCH net-next 2/2] tc-testing: Updated csum action tests batch create w/wo cookies.
From: Craig Dillabaugh @ 2018-05-01 14:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, jhs, xiyou.wangcong, Craig Dillabaugh
In-Reply-To: <1525184264-9436-1-git-send-email-cdillaba@mojatatu.com>

Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/csum.json          | 74 +++++++++++++++++++++-
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
index 93cf8fe..3a2f51f 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
@@ -398,13 +398,83 @@
                 255
             ]
         ],
-        "cmdUnderTest": "for i in `seq 1 32`; do cmd=\"action csum tcp continue index $i \"; args=\"$args$cmd\"; done && $TC actions add $args",
-        "expExitCode": "255",
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+        "expExitCode": "0",
         "verifyCmd": "$TC actions ls action csum",
         "matchPattern": "^[ \t]+index [0-9]* ref",
         "matchCount": "32",
         "teardown": [
             "$TC actions flush action csum"
         ]
+    },
+    {
+        "id": "b4e9",
+        "name": "Delete batch of 32 csum actions",
+        "category": [
+            "actions",
+            "csum"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action csum",
+                0,
+                1,
+                255
+            ],
+            "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\""
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action csum",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
+    },
+    {
+        "id": "0015",
+        "name": "Add batch of 32 csum tcp actions with large cookies",
+        "category": [
+            "actions",
+            "csum"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action csum",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i cookie aaabbbcccdddeee \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions ls action csum",
+        "matchPattern": "^[ \t]+index [0-9]* ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action csum"
+        ]
+    },
+    {
+        "id": "989e",
+        "name": "Delete batch of 32 csum actions with large cookies",
+        "category": [
+            "actions",
+            "csum"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action csum",
+                0,
+                1,
+                255
+            ],
+            "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i cookie aaabbbcccdddeee \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\""
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action csum",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
     }
 ]
-- 
1.9.1

^ permalink raw reply related

* Re: [RFC v2 bpf-next 0/9] bpf: Add helper to do FIB lookups
From: David Miller @ 2018-05-01 14:20 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, borkmann, ast, shm, roopa, brouer, toke, john.fastabend
In-Reply-To: <20180429180752.15428-1-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Sun, 29 Apr 2018 11:07:43 -0700

> Provide a helper for doing FIB and neighbor lookups in the kernel
> tables from an XDP program. The helper provides a fastpath for forwarding
> packets. If the packet is a local delivery or for any reason is not a
> simple lookup and forward, the packet is expected to continue up the stack
> for full processing.
> 
> Patches 1-6 do some more refactoring to IPv6 with the end goal of
> extracting a FIB lookup function that aligns with fib_lookup for IPv4,
> basically returning a fib6_info without creating a dst based entry.
> 
> Patch 7 adds lookup functions to the ipv6 stub. These are needed since
> bpf is built into the kernel and ipv6 may not be built or loaded.
> 
> Patch 8 adds the bpf helper and 9 adds a sample program.
> 
> v2
> - fixed use of forward helper from cls_act as noted by Daniel
> - in patch 1 rename fib6_lookup_1 as well for consistency

I've reviewed this and generally I agree with the semantic choices
wrt. resolution.

We really can't do neigh resolution without an SKB, so at least in
the xdp case we must push the packet up into the full stack path.

I guess we could do the neigh resolve in the cls_bpf case, but I
wonder how helpful that would be.

^ permalink raw reply

* Re: [PATCH net-next 0/2 v5] netns: uevent filtering
From: David Miller @ 2018-05-01 14:23 UTC (permalink / raw)
  To: ebiederm
  Cc: christian.brauner, netdev, linux-kernel, avagin, ktkhai, serge,
	gregkh
In-Reply-To: <87fu3cbsdw.fsf@xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 30 Apr 2018 10:55:55 -0500

> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
>> Hey everyone,
>>
>> This is the new approach to uevent filtering as discussed (see the
>> threads in [1], [2], and [3]). It only contains *non-functional
>> changes*.
 ...
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Series applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH V10 net-next 00/14] TLS offload, netdev & MLX5 support
From: David Miller @ 2018-05-01 14:23 UTC (permalink / raw)
  To: borisp; +Cc: netdev, saeedm, davejwatson, ktkhai, sergei.shtylyov
In-Reply-To: <1525072583-138506-1-git-send-email-borisp@mellanox.com>

From: Boris Pismenny <borisp@mellanox.com>
Date: Mon, 30 Apr 2018 10:16:09 +0300

> The following series provides TLS TX inline crypto offload.

Series applied, assuming the build is successful this will be
pushed out to net-next shortly.

Thank you.

^ permalink raw reply

* Re: [PATCH iproute2-master] iproute: Parse last nexthop in a multipath route
From: David Ahern @ 2018-05-01 14:59 UTC (permalink / raw)
  To: Ido Schimmel, netdev; +Cc: stephen, mlxsw
In-Reply-To: <20180501131635.14981-1-idosch@mellanox.com>

On 5/1/18 7:16 AM, Ido Schimmel wrote:
> Continue parsing a multipath payload as long as another nexthop can fit
> in the payload.
> 
> # ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1
> 
> Before:
> # ip route show 192.0.2.0/24
> 192.0.2.0/24
>         nexthop dev dummy0 weight 1
> 
> After:
> # ip route show 192.0.2.0/24
> 192.0.2.0/24
>         nexthop dev dummy0 weight 1
>         nexthop dev dummy1 weight 1
> 
> Fixes: f48e14880a0e ("iproute: refactor multipath print")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  ip/iproute.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply
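
The one-line change to ip/iproute.c is not shown in the message above, so the following is only a sketch of the kind of boundary condition the commit message describes: a parser that stops while exactly one record still fits in the payload will drop the last nexthop. The struct hop_sketch layout and count_hops() are hypothetical and stand in for the real rtnetlink nexthop structures; the strict-versus-inclusive comparison is an assumption about where the off-by-one sat.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-header record; len covers the whole record. */
struct hop_sketch {
	unsigned short len;
	unsigned char flags;
	unsigned char hops;
	int ifindex;
};

/*
 * Count records in buf. With inclusive == 0 the loop requires strictly
 * more than one header of space and misses a final record that fills
 * the payload exactly; with inclusive != 0 it continues as long as
 * another header can fit.
 */
static int count_hops(const unsigned char *buf, size_t len, int inclusive)
{
	int n = 0;

	while (inclusive ? len >= sizeof(struct hop_sketch)
			 : len > sizeof(struct hop_sketch)) {
		struct hop_sketch h;

		memcpy(&h, buf, sizeof(h));
		if (h.len < sizeof(h) || h.len > len)
			break;	/* malformed or truncated record */
		buf += h.len;
		len -= h.len;
		n++;
	}
	return n;
}
```

With two back-to-back records, the strict comparison reports one nexthop and the inclusive one reports two, which matches the Before and After output quoted in the commit message.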

* Re: Request for stable 4.14.x inclusion: net: don't call update_pmtu unconditionally
From: Greg KH @ 2018-05-01 15:04 UTC (permalink / raw)
  To: Thomas Deutschmann; +Cc: Eddie Chapman, stable, davem, nicolas.dichtel, netdev
In-Reply-To: <1ae8845f-6106-29e1-ceec-02eff35beed9@gentoo.org>

On Tue, May 01, 2018 at 12:15:37AM +0200, Thomas Deutschmann wrote:
> Hi,
> 
> On 2018-04-30 20:22, Greg KH wrote:
> > The geneve hunk doesn't apply at all to the 4.14.y tree, so I think
> > someone has a messed up tree somewhere...
> > 
> > I'll go look into this now.
> 
> Mh?
> 
> > $ git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> > $ cd linux-stable
> > $ git checkout v4.14.38
> > $ git cherry-pick 52a589d51f1008f62569bf89e95b26221ee76690
> 
> Works for me... then I cherry-pick
> f15ca723c1ebe6c1a06bc95fda6b62cd87b44559 on top, adjust
> "net/ipv6/ip6_tunnel.c" like shown in my previous mail and everything is
> fine for me...

Ah crap, I missed the dependency here as well, it was a long day
yesterday...

I'll drop this and try it again for the next release.

greg k-h

^ permalink raw reply

* Re: [RFC/PATCH] net: ethernet: nixge: Use of_get_mac_address()
From: Rob Herring @ 2018-05-01 15:05 UTC (permalink / raw)
  To: Moritz Fischer; +Cc: linux-kernel, devicetree, netdev, davem, mark.rutland
In-Reply-To: <20180426220401.bad53hxnkxwvgjot@derp-derp.lan>

On Thu, Apr 26, 2018 at 03:04:01PM -0700, Moritz Fischer wrote:
> On Thu, Apr 26, 2018 at 02:57:42PM -0700, Moritz Fischer wrote:
> > Make nixge driver work with 'mac-address' property instead of
> > 'address' property. There are currently no in-tree users and
> > the only users of this driver are devices that use overlays
> > we control to instantiate the device together with the corresponding
> > FPGA images.
> > 
> > Signed-off-by: Moritz Fischer <mdf@kernel.org>
> > ---
> > 
> > Hi David, Rob,
> > 
> > with Mike's change that enables the generic 'mac-address'
> > binding that I barely missed with the submission of this
> > driver, I was wondering if we can still change the binding.
> > 
> > I'm aware that this generally is a nonono case, since the binding
> > is considered API, but since there are no users outside of our
> > devicetree overlays that we ship with our devices I thought I'd ask.

Fine by me. It really comes down to whether there are any users that 
would be impacted.

Rob

^ permalink raw reply

* Re: [PATCH net-next 1/2] mlxsw: spectrum_router: Return an error for non-default FIB rules
From: David Ahern @ 2018-05-01 15:16 UTC (permalink / raw)
  To: Ido Schimmel, netdev; +Cc: davem, jiri, mlxsw
In-Reply-To: <20180501081639.29162-2-idosch@mellanox.com>

On 5/1/18 2:16 AM, Ido Schimmel wrote:
> Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
> fib_nl_newrule") it is possible to forbid the installation of
> unsupported FIB rules.
> 
> Have mlxsw return an error for non-default FIB rules in addition to the
> existing extack message.
> 
> Example:
> # ip rule add from 198.51.100.1 table 10
> Error: mlxsw_spectrum: FIB rules not supported.
> 
> Note that offload is only aborted when non-default FIB rules are already
> installed and merely replayed during module initialization.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> index 8e4edb634b11..baea97560029 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> @@ -5899,7 +5899,7 @@ static int mlxsw_sp_router_fib_rule_event(unsigned long event,
>  	}
>  
>  	if (err < 0)
> -		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported. Aborting offload");
> +		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported");
>  
>  	return err;

shouldn't mlxsw_sp_router_fib_rule_event return -EOPNOTSUPP instead of
-1 (EPERM)?


>  }
> @@ -5926,8 +5926,8 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
>  	case FIB_EVENT_RULE_DEL:
>  		err = mlxsw_sp_router_fib_rule_event(event, info,
>  						     router->mlxsw_sp);
> -		if (!err)
> -			return NOTIFY_DONE;
> +		if (!err || info->extack)
> +			return notifier_from_errno(err);
>  	}
>  
>  	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
> 

^ permalink raw reply

* Re: [PATCH net-next 1/2] mlxsw: spectrum_router: Return an error for non-default FIB rules
From: Ido Schimmel @ 2018-05-01 15:19 UTC (permalink / raw)
  To: David Ahern; +Cc: Ido Schimmel, netdev, davem, jiri, mlxsw
In-Reply-To: <320c79af-3de6-f012-75a2-e5a7effde9f6@gmail.com>

On Tue, May 01, 2018 at 09:16:23AM -0600, David Ahern wrote:
> On 5/1/18 2:16 AM, Ido Schimmel wrote:
> > Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
> > fib_nl_newrule") it is possible to forbid the installation of
> > unsupported FIB rules.
> > 
> > Have mlxsw return an error for non-default FIB rules in addition to the
> > existing extack message.
> > 
> > Example:
> > # ip rule add from 198.51.100.1 table 10
> > Error: mlxsw_spectrum: FIB rules not supported.
> > 
> > Note that offload is only aborted when non-default FIB rules are already
> > installed and merely replayed during module initialization.
> > 
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > index 8e4edb634b11..baea97560029 100644
> > --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > @@ -5899,7 +5899,7 @@ static int mlxsw_sp_router_fib_rule_event(unsigned long event,
> >  	}
> >  
> >  	if (err < 0)
> > -		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported. Aborting offload");
> > +		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported");
> >  
> >  	return err;
> 
> shouldn't mlxsw_sp_router_fib_rule_event return -EOPNOTSUPP instead of
> -1 (EPERM)?

The -1 wasn't visible until now so it didn't matter. Will change to
-EOPNOTSUPP in v2. Thanks

^ permalink raw reply
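
The exchange above turns on two details: a bare -1 reads as -EPERM once it reaches userspace, and notifier_from_errno() packs a negative errno into the notifier return value together with the stop bit. A userspace sketch of that encoding follows. The _SKETCH and _sketch names are hypothetical copies of the kernel's notifier constants and helpers, reproduced here only to show the round trip.

```c
#include <errno.h>

/* Userspace copies of the kernel notifier return codes. */
#define NOTIFY_DONE_SKETCH	0x0000
#define NOTIFY_OK_SKETCH	0x0001
#define NOTIFY_STOP_MASK_SKETCH	0x8000

/* Encode 0 or a negative errno into a notifier return value. */
static int notifier_from_errno_sketch(int err)
{
	if (err)
		return NOTIFY_STOP_MASK_SKETCH | (NOTIFY_OK_SKETCH - err);
	return NOTIFY_OK_SKETCH;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int notifier_to_errno_sketch(int ret)
{
	ret &= ~NOTIFY_STOP_MASK_SKETCH;
	return ret > NOTIFY_OK_SKETCH ? NOTIFY_OK_SKETCH - ret : 0;
}
```

Because the errno survives the round trip unchanged, whatever value mlxsw_sp_router_fib_rule_event() returns is what the caller of the notifier chain eventually reports, which is why -EOPNOTSUPP is the more descriptive choice than -1.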

* Re: [PATCH] vhost: make msg padding explicit
From: David Miller @ 2018-05-01 15:28 UTC (permalink / raw)
  To: mst; +Cc: linux-kernel, kevin, jasowang, kvm, virtualization, netdev
In-Reply-To: <1524844881-178524-1-git-send-email-mst@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 27 Apr 2018 19:02:05 +0300

> There's a 32-bit hole just after type. It's best to
> give it a name; this way the compiler is forced to initialize
> it with the rest of the structure.
> 
> Reported-by: Kevin Easton <kevin@guarana.org>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Michael, will you be sending this directly to Linus or would you like
me to apply it to net or net-next?

Thanks.

^ permalink raw reply
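
The quoted commit message describes an implicit hole after a 32-bit type field that precedes an 8-byte-aligned member. A sketch of the idea with hypothetical structures follows; these are not the vhost definitions. Naming the padding makes it an ordinary member, so an initializer that sets only some fields zeroes it along with the rest. The 4-byte hole in the implicit variant assumes an ABI where uint64_t is 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with an implicit 4-byte hole on typical 64-bit ABIs. */
struct msg_implicit_sketch {
	int32_t type;
	uint64_t payload;
};

/* Same layout with the hole given a name. */
struct msg_explicit_sketch {
	int32_t type;
	uint32_t padding;
	uint64_t payload;
};

/*
 * Build a message with a designated initializer. Members that are not
 * named, including padding, are zero-initialized.
 */
static struct msg_explicit_sketch make_msg_sketch(int32_t type,
						  uint64_t payload)
{
	struct msg_explicit_sketch m = {
		.type = type,
		.payload = payload,
	};
	return m;
}
```

The explicit variant has the same size and field offsets wherever the implicit one had the hole, so the wire layout does not change; only the initialization guarantee does.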

* Re: [PATCH v2 0/2] net: stmmac: dwmac-meson: 100M phy mode support for AXG SoC
From: David Miller @ 2018-05-01 15:30 UTC (permalink / raw)
  To: yixun.lan
  Cc: netdev, khilman, carlo, robh, jbrunet, martin.blumenstingl,
	linux-amlogic, linux-arm-kernel, linux-kernel, devicetree
In-Reply-To: <20180428102111.18384-1-yixun.lan@amlogic.com>

From: Yixun Lan <yixun.lan@amlogic.com>
Date: Sat, 28 Apr 2018 10:21:09 +0000

> Because the dwmac glue layer register changed, we need to
> introduce a new compatible name for the Meson-AXG SoC
> to support the RMII 100M Ethernet PHY.
> 
> Change since v1 at [1]:
>   - implement set_phy_mode() for each SoC
> 
> [1] https://lkml.kernel.org/r/20180426160508.29380-1-yixun.lan@amlogic.com

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: Eric Dumazet @ 2018-05-01 16:06 UTC (permalink / raw)
  To: Dave Taht, Cong Wang
  Cc: Toke Høiland-Jørgensen, Linux Kernel Network Developers,
	Cake List
In-Reply-To: <CAA93jw6F+c-QRXe+MA2QmRkwiKEBqFgOFKTvWGfO7FvCQ5tFvw@mail.gmail.com>

On 04/30/2018 02:27 PM, Dave Taht wrote:

> I actually have a tc - bpf based ack filter, during the development of
> cake's ack-thinner, that I should submit one of these days. It
> proved to be of limited use.
> 
> Probably the biggest mistake we made is by calling this cake feature a
> filter. It isn't.
> 
> Maybe we should have called it a "thinner" or something like that? In
> order to properly "thin" or "reduce" an ack stream
> you have to have a queue to look at and some related state. TC filters
> do not operate on queues, qdiscs do. Thus the "ack-filter" here is
> deeply embedded into cake's flow isolation and queue structures.

A feature eating packets _is_ a filter.

Given that a qdisc only sees one direction, I really do not get why ack-filter
is so desirable in a packet scheduler.

You have not provided any numbers to show how useful it is to maintain this
code (probably still broken btw, considering it is changing some skb attributes).

On wifi (or any half duplex medium), you might gain something by carefully
sending ACK not too often, but ultimately this should be done by TCP stack,
in well controlled environment [1], instead of middle-boxes happily playing/breaking
some Internet standards.

[1] TCP stack has the estimations of RTT, RWIN, throughput, and might be able to
avoid flooding the network with acks under some conditions.

Say RTT is 100ms, and we receive 1 packet every 100 usec (no GRO aggregation)
Maybe we do not really _need_ to send 5000 ACKs per second
(or even 10,000 ACKs per second if a hole needs a repair)

Also on wifi, the queue builds in the driver queues anyway, not in the qdisc.
So ACK filtering, if _really_ successful, would need to be modularized.

Please split Cake into a patch series.
Presumably putting the ack-filter in a patch of its own.

^ permalink raw reply
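
The figures in the message above follow from simple rate arithmetic: one packet every 100 usec is 10,000 packets per second. The sketch below reproduces the numbers under the assumption, not stated explicitly in the message, that the 5000 figure corresponds to one ACK per two received segments and the 10,000 figure to one ACK per segment. Both function names are hypothetical.

```c
/* ACKs generated per second for a given packet spacing and ACK ratio. */
static unsigned long acks_per_second(unsigned long pkt_interval_usec,
				     unsigned long segments_per_ack)
{
	unsigned long pkts_per_sec = 1000000UL / pkt_interval_usec;

	return pkts_per_sec / segments_per_ack;
}

/* ACKs generated within one round-trip time. */
static unsigned long acks_per_rtt(unsigned long rtt_usec,
				  unsigned long pkt_interval_usec,
				  unsigned long segments_per_ack)
{
	return (rtt_usec / pkt_interval_usec) / segments_per_ack;
}
```

With a 100 ms RTT and one ACK per two segments, that is 500 ACKs in flight per round trip, which gives a sense of how much headroom a sender-side policy would have to reduce the ACK rate.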
