Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] net/mlx5e: fix csum adjustments caused by RXFCS
From: Eric Dumazet @ 2018-10-30 19:03 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, netdev, Eric Dumazet, Eran Ben Elisha,
	Saeed Mahameed, Dimitris Michailidis, Paweł Staszewski
In-Reply-To: <CAM_iQpUPWncS22uYURFZ5OsvPZp7aJozPNOcDCak9nRFOS0jig@mail.gmail.com>

On Tue, Oct 30, 2018 at 11:59 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:

> I wonder how compiler recognizes it as "never fail" when marked with
> __must_check.

__must_check means that you can not ignore the return value of a function.

Here we do use the return value.

Also prior code was not checking skb->length so really my patch does
add explicit
check if in the future skb->len gets wrong after a bug is added in this driver.

(A NULL deref will trap the bug, instead of potentially reading
garbage from skb->data)

^ permalink raw reply

* Re: [PATCH net] net/mlx5e: fix csum adjustments caused by RXFCS
From: Cong Wang @ 2018-10-30 18:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet,
	eranbe, Saeed Mahameed, dmichail, Paweł Staszewski
In-Reply-To: <CANn89iKx7d18v+gG33QyyOPAhVR8bh4t=qRGDaJcM7gTXAOU1w@mail.gmail.com>

On Tue, Oct 30, 2018 at 11:42 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Oct 30, 2018 at 11:09 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> > At least skb_header_pointer() is marked as __must_check, I don't see
> > you check its return value here.
>
> This can not fail here.
>
> skb->length must be above 14+4 at this point.

Never say it is wrong, just saying what compiler thinks.

>
> My compiler seems to be okay with that.

I wonder how compiler recognizes it as "never fail" when marked with
__must_check.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Cong Wang @ 2018-10-30 18:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <f0639ba8-ba57-4782-1133-443b51d61bb0@gmail.com>

On Mon, Oct 29, 2018 at 8:08 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/29/2018 07:41 PM, Cong Wang wrote:
> > On Mon, Oct 29, 2018 at 7:25 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >>
> >>
> >> On 10/29/2018 07:21 PM, Cong Wang wrote:
> >>> On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>>
> >>>> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?
> >>>
> >>> For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?
> >>>
> >>> If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
> >>> the end result may not be simpler.
> >>
> >> I meant to reinstate what was there before my patch in this error case
> >>
> >>        if (skb->ip_summed == CHECKSUM_COMPLETE)
> >>                skb->ip_summed = CHECKSUM_NONE;
> >>
> >> That would only be run in error (quite unlikely) path, instead of saving old_csum in all cases.
> >
> > I know your point, however, I am not sure that is a desired behavior.
> >
> > On failure, I think the whole skb should be restored to its previous state
> > before entering this function, changing it to CHECKSUM_NONE on failure
> > is inconsistent with success case.
> >
>
> Before my patch, we were changing skb->ip_summed to CHECKSUM_NONE,
> so why suddenly we need to be consistent ?

That is because setting it to CHECKSUM_NONE _was_ how the success
case works and nothing _was_ needed for failure case.

You changed how we handle checksum for success case, it is why we need
to change for the failure case too.


>
> In any case, ip_check_defrag() should really drop this skb, as for other allocation
> failures (like skb_share_check()), if really we want consistency.

I have the same feeling, just not brave enough to change the logic of
ip_check_defrag() where pskb_may_pull() failure is treated in a same way.

^ permalink raw reply

* [PATCH 3/4] net: macb: Add pm runtime support
From: Harini Katakam @ 2018-10-31  3:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam, Harini Katakam, Shubhrajyoti Datta
In-Reply-To: <1540957223-30984-1-git-send-email-harini.katakam@xilinx.com>

From: Harini Katakam <harinik@xilinx.com>

Add runtime pm functions and move clock handling there.
If device is suspended and not a wake device, then return from
mdio read/write functions without performing any action because
the clocks are not active.

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: Harini Katakam <harinik@xilinx.com>
---
Changes from RFC:
Updated pm get sync/put sync calls.
Removed unecessary clk up in mdio helpers.

 drivers/net/ethernet/cadence/macb_main.c | 103 +++++++++++++++++++++++++------
 1 file changed, 85 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 7ae8d731..09cb4bb 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -36,6 +36,7 @@
 #include <linux/ip.h>
 #include <linux/udp.h>
 #include <linux/tcp.h>
+#include <linux/pm_runtime.h>
 #include "macb.h"
 
 #define MACB_RX_BUFFER_SIZE	128
@@ -78,6 +79,7 @@
  * 1 frame time (10 Mbits/s, full-duplex, ignoring collisions)
  */
 #define MACB_HALT_TIMEOUT	1230
+#define MACB_PM_TIMEOUT  100 /* ms */
 
 /* DMA buffer descriptor might be different size
  * depends on hardware configuration:
@@ -345,6 +347,10 @@ static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 	int value;
 	int err;
 
+	if (pm_runtime_status_suspended(&bp->pdev->dev) &&
+	    !device_may_wakeup(&bp->dev->dev))
+		return -EAGAIN;
+
 	err = macb_mdio_wait_for_idle(bp);
 	if (err < 0)
 		return err;
@@ -370,6 +376,9 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	struct macb *bp = bus->priv;
 	int err;
 
+	if (pm_runtime_status_suspended(&bp->pdev->dev) &&
+	    !device_may_wakeup(&bp->dev->dev))
+		return -EAGAIN;
 
 	err = macb_mdio_wait_for_idle(bp);
 	if (err < 0)
@@ -2397,12 +2406,18 @@ static int macb_open(struct net_device *dev)
 
 	netdev_dbg(bp->dev, "open\n");
 
+	err = pm_runtime_get_sync(&bp->pdev->dev);
+	if (err < 0)
+		goto pm_exit;
+
 	/* carrier starts down */
 	netif_carrier_off(dev);
 
 	/* if the phy is not yet register, retry later*/
-	if (!dev->phydev)
-		return -EAGAIN;
+	if (!dev->phydev) {
+		err = -EAGAIN;
+		goto pm_exit;
+	}
 
 	/* RX buffers initialization */
 	macb_init_rx_buffer_size(bp, bufsz);
@@ -2411,7 +2426,7 @@ static int macb_open(struct net_device *dev)
 	if (err) {
 		netdev_err(dev, "Unable to allocate DMA memory (error %d)\n",
 			   err);
-		return err;
+		goto pm_exit;
 	}
 
 	bp->macbgem_ops.mog_init_rings(bp);
@@ -2428,6 +2443,11 @@ static int macb_open(struct net_device *dev)
 	if (bp->ptp_info)
 		bp->ptp_info->ptp_init(dev);
 
+pm_exit:
+	if (err) {
+		pm_runtime_put_sync(&bp->pdev->dev);
+		return err;
+	}
 	return 0;
 }
 
@@ -2456,6 +2476,8 @@ static int macb_close(struct net_device *dev)
 	if (bp->ptp_info)
 		bp->ptp_info->ptp_remove(dev);
 
+	pm_runtime_put(&bp->pdev->dev);
+
 	return 0;
 }
 
@@ -4019,6 +4041,11 @@ static int macb_probe(struct platform_device *pdev)
 	if (err)
 		return err;
 
+	pm_runtime_set_autosuspend_delay(&pdev->dev, MACB_PM_TIMEOUT);
+	pm_runtime_use_autosuspend(&pdev->dev);
+	pm_runtime_get_noresume(&pdev->dev);
+	pm_runtime_set_active(&pdev->dev);
+	pm_runtime_enable(&pdev->dev);
 	native_io = hw_is_native_io(mem);
 
 	macb_probe_queues(mem, native_io, &queue_mask, &num_queues);
@@ -4150,6 +4177,9 @@ static int macb_probe(struct platform_device *pdev)
 		    macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
 		    dev->base_addr, dev->irq, dev->dev_addr);
 
+	pm_runtime_mark_last_busy(&bp->pdev->dev);
+	pm_runtime_put_autosuspend(&bp->pdev->dev);
+
 	return 0;
 
 err_out_unregister_mdio:
@@ -4169,6 +4199,9 @@ static int macb_probe(struct platform_device *pdev)
 	clk_disable_unprepare(pclk);
 	clk_disable_unprepare(rx_clk);
 	clk_disable_unprepare(tsu_clk);
+	pm_runtime_disable(&pdev->dev);
+	pm_runtime_set_suspended(&pdev->dev);
+	pm_runtime_dont_use_autosuspend(&pdev->dev);
 
 	return err;
 }
@@ -4192,11 +4225,16 @@ static int macb_remove(struct platform_device *pdev)
 		mdiobus_free(bp->mii_bus);
 
 		unregister_netdev(dev);
-		clk_disable_unprepare(bp->tx_clk);
-		clk_disable_unprepare(bp->hclk);
-		clk_disable_unprepare(bp->pclk);
-		clk_disable_unprepare(bp->rx_clk);
-		clk_disable_unprepare(bp->tsu_clk);
+		pm_runtime_disable(&pdev->dev);
+		pm_runtime_dont_use_autosuspend(&pdev->dev);
+		if (!pm_runtime_suspended(&pdev->dev)) {
+			clk_disable_unprepare(bp->tx_clk);
+			clk_disable_unprepare(bp->hclk);
+			clk_disable_unprepare(bp->pclk);
+			clk_disable_unprepare(bp->rx_clk);
+			clk_disable_unprepare(bp->tsu_clk);
+			pm_runtime_set_suspended(&pdev->dev);
+		}
 		of_node_put(bp->phy_node);
 		free_netdev(dev);
 	}
@@ -4217,13 +4255,9 @@ static int __maybe_unused macb_suspend(struct device *dev)
 		macb_writel(bp, IER, MACB_BIT(WOL));
 		macb_writel(bp, WOL, MACB_BIT(MAG));
 		enable_irq_wake(bp->queues[0].irq);
-	} else {
-		clk_disable_unprepare(bp->tx_clk);
-		clk_disable_unprepare(bp->hclk);
-		clk_disable_unprepare(bp->pclk);
-		clk_disable_unprepare(bp->rx_clk);
 	}
-	clk_disable_unprepare(bp->tsu_clk);
+
+	pm_runtime_force_suspend(dev);
 
 	return 0;
 }
@@ -4234,11 +4268,43 @@ static int __maybe_unused macb_resume(struct device *dev)
 	struct net_device *netdev = platform_get_drvdata(pdev);
 	struct macb *bp = netdev_priv(netdev);
 
+	pm_runtime_force_resume(dev);
+
 	if (bp->wol & MACB_WOL_ENABLED) {
 		macb_writel(bp, IDR, MACB_BIT(WOL));
 		macb_writel(bp, WOL, 0);
 		disable_irq_wake(bp->queues[0].irq);
-	} else {
+	}
+
+	netif_device_attach(netdev);
+
+	return 0;
+}
+
+static int __maybe_unused macb_runtime_suspend(struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	struct net_device *netdev = platform_get_drvdata(pdev);
+	struct macb *bp = netdev_priv(netdev);
+
+	if (!(device_may_wakeup(&bp->dev->dev))) {
+		clk_disable_unprepare(bp->tx_clk);
+		clk_disable_unprepare(bp->hclk);
+		clk_disable_unprepare(bp->pclk);
+		clk_disable_unprepare(bp->rx_clk);
+	}
+	clk_disable_unprepare(bp->tsu_clk);
+
+	return 0;
+}
+
+static int __maybe_unused macb_runtime_resume(struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	struct net_device *netdev = platform_get_drvdata(pdev);
+	struct macb *bp = netdev_priv(netdev);
+
+	if (!(device_may_wakeup(&bp->dev->dev))) {
 		clk_prepare_enable(bp->pclk);
 		clk_prepare_enable(bp->hclk);
 		clk_prepare_enable(bp->tx_clk);
@@ -4246,12 +4312,13 @@ static int __maybe_unused macb_resume(struct device *dev)
 	}
 	clk_prepare_enable(bp->tsu_clk);
 
-	netif_device_attach(netdev);
-
 	return 0;
 }
 
-static SIMPLE_DEV_PM_OPS(macb_pm_ops, macb_suspend, macb_resume);
+static const struct dev_pm_ops macb_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(macb_suspend, macb_resume)
+	SET_RUNTIME_PM_OPS(macb_runtime_suspend, macb_runtime_resume, NULL)
+};
 
 static struct platform_driver macb_driver = {
 	.probe		= macb_probe,
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/4] net: macb: Check MDIO state before read/write and use timeouts
From: Harini Katakam @ 2018-10-31  3:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam, Harini Katakam, Shubhrajyoti Datta,
	Sai Pavan Boddu
In-Reply-To: <1540957223-30984-1-git-send-email-harini.katakam@xilinx.com>

From: Harini Katakam <harinik@xilinx.com>

Replace the while loop in MDIO read/write functions with a timeout.
In addition, add a check for MDIO bus busy before initiating a new
operation as well to make sure there is no ongoing MDIO operation.

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Harini Katakam <harinik@xilinx.com>
---
Changes form RFC:
Cleaned up timeout implementation and moved it to a helper.

 drivers/net/ethernet/cadence/macb_main.c | 44 +++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 0acaef3..b4e26c1 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -318,10 +318,36 @@ static void macb_get_hwaddr(struct macb *bp)
 	eth_hw_addr_random(bp->dev);
 }
 
+static int macb_mdio_wait_for_idle(struct macb *bp)
+{
+	ulong timeout;
+
+	timeout = jiffies + msecs_to_jiffies(1000);
+	/* wait for end of transfer */
+	while (1) {
+		if (MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
+			break;
+
+		if (time_after_eq(jiffies, timeout)) {
+			netdev_err(bp->dev, "wait for end of transfer timed out\n");
+			return -ETIMEDOUT;
+		}
+
+		cpu_relax();
+	}
+
+	return 0;
+}
+
 static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 {
 	struct macb *bp = bus->priv;
 	int value;
+	int err;
+
+	err = macb_mdio_wait_for_idle(bp);
+	if (err < 0)
+		return err;
 
 	macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
 			      | MACB_BF(RW, MACB_MAN_READ)
@@ -329,9 +355,9 @@ static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 			      | MACB_BF(REGA, regnum)
 			      | MACB_BF(CODE, MACB_MAN_CODE)));
 
-	/* wait for end of transfer */
-	while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
-		cpu_relax();
+	err = macb_mdio_wait_for_idle(bp);
+	if (err < 0)
+		return err;
 
 	value = MACB_BFEXT(DATA, macb_readl(bp, MAN));
 
@@ -342,6 +368,12 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 			   u16 value)
 {
 	struct macb *bp = bus->priv;
+	int err;
+
+
+	err = macb_mdio_wait_for_idle(bp);
+	if (err < 0)
+		return err;
 
 	macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
 			      | MACB_BF(RW, MACB_MAN_WRITE)
@@ -350,9 +382,9 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 			      | MACB_BF(CODE, MACB_MAN_CODE)
 			      | MACB_BF(DATA, value)));
 
-	/* wait for end of transfer */
-	while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
-		cpu_relax();
+	err = macb_mdio_wait_for_idle(bp);
+	if (err < 0)
+		return err;
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/4] Macb power management support for ZynqMP
From: Harini Katakam @ 2018-10-31  3:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam

This series adds support for macb suspend/resume with system power down
and wake on LAN with ARP packets.
In relation to the above, this series also updates mdio_read/write
function for PM and adds tsu clock management.

Harini Katakam (4):
  net: macb: Check MDIO state before read/write and use timeouts
  net: macb: Support clock management for tsu_clk
  net: macb: Add pm runtime support
  net: macb: Add support for suspend/resume with full power down

 drivers/net/ethernet/cadence/macb.h      |   3 +-
 drivers/net/ethernet/cadence/macb_main.c | 207 +++++++++++++++++++++++++++----
 2 files changed, 182 insertions(+), 28 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH net] net/mlx5e: fix csum adjustments caused by RXFCS
From: David Miller @ 2018-10-30 18:43 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: edumazet, netdev, eric.dumazet, eranbe, saeedm, dmichail,
	pstaszewski
In-Reply-To: <CAM_iQpXS2zp-_6XEmzyfvenomDmHqR3uJWoSiMSJG71OKZ3Sjg@mail.gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 30 Oct 2018 11:09:21 -0700

> On Tue, Oct 30, 2018 at 12:57 AM Eric Dumazet <edumazet@google.com> wrote:
>> -static __be32 mlx5e_get_fcs(struct sk_buff *skb)
>> +static u32 mlx5e_get_fcs(const struct sk_buff *skb)
>>  {
>> -       int last_frag_sz, bytes_in_prev, nr_frags;
>> -       u8 *fcs_p1, *fcs_p2;
>> -       skb_frag_t *last_frag;
>> -       __be32 fcs_bytes;
>> +       const void *fcs_bytes;
>> +       u32 _fcs_bytes;
>>
>> -       if (!skb_is_nonlinear(skb))
>> -               return *(__be32 *)(skb->data + skb->len - ETH_FCS_LEN);
>> +       fcs_bytes = skb_header_pointer(skb, skb->len - ETH_FCS_LEN,
>> +                                      ETH_FCS_LEN, &_fcs_bytes);
> 
> At least skb_header_pointer() is marked as __must_check, I don't see
> you check its return value here.

In this case we reasonably know it is guaranteed to succeed though.

We know skb->len is non-zero and we are asking for the skb->len - 1
byte or something like that.

^ permalink raw reply

* Re: [PATCH net] net/mlx5e: fix csum adjustments caused by RXFCS
From: Eric Dumazet @ 2018-10-30 18:42 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, netdev, Eric Dumazet, Eran Ben Elisha,
	Saeed Mahameed, Dimitris Michailidis, Paweł Staszewski
In-Reply-To: <CAM_iQpXS2zp-_6XEmzyfvenomDmHqR3uJWoSiMSJG71OKZ3Sjg@mail.gmail.com>

On Tue, Oct 30, 2018 at 11:09 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:

> At least skb_header_pointer() is marked as __must_check, I don't see
> you check its return value here.

This can not fail here.

skb->length must be above 14+4 at this point.

My compiler seems to be okay with that.

^ permalink raw reply

* Your urgent respond is needed.
From: Dr. Bernard Zoungrana @ 2018-10-30 18:38 UTC (permalink / raw)


Dear Partner,

  We want to transfer to overseas ($50.500, 000.00) Fifth Million Five
Hundred Thousand United States Dollars) from the Bank of Africa, I
want to ask you to quietly look for a reliable and honest person who
will be capable and fit to provide either an existing bank account or
to set up a new Bank account immediately to receive this money, even
an empty account can serve to receive this funds quietly.

  I am Dr. Bernard Zoungrana, the accountant personal confidant to
Mr.Raymond Beck. Who died together with his wife in a plane crash on
the Tuesday, November 2, 1999 on their way to attend wedding in
Boston. Mr. Raymond Beck, is an American, a physician and
industrialist, he died without having any beneficiary to his assets
including his account here in Burkina Faso which he opened in a Bank
of Africa in the year 1994 as his personal savings for the purpose of
expansion and development of his company before his untimely death in
1999.

  The amount involved is ($50,500,000.00) Fifth Million Five Hundred
Thousand United State Dollars, no other person knows about this
account, I am contacting you for us to transfer this funds to your
account as the beneficiary as foreigner, I am only contacting you as a
foreigner because this money cannot be approved to a local person
here, without valid international foreign passport, but can only be
approved to any foreigner with valid international passport or
driver's license and foreign account because the money is in US
Dollars and the former owner of the account Mr. Raymond Beck is a
foreigner too, and as such the money can only be approved into a
foreign account.

  However, I am revealing this to you with believe in God that you
will never let me down in this business, you are the first and the
only person that I am contacting for this business, so please reply
urgently so that I will inform you the next step to take urgently.
Send also your private telephone and fax number including the full
details of the account to be used for the deposit. I need your full
co-operation to make this work fine. Because the management is ready
to approve this payment to any foreigner who has correct information
of this account, which I will give to you, upon your positive response
and once I am convinced that you are capable and will meet up with
instruction of a key bank official who is deeply involved with me in
this business at the conclusion of this business, you will be given
40% of the total amount, 50% will be for me, while 10% will be for
expenses both parties might have incurred during the process of
transferring. I look forward to your earliest reply.

(1) YOUR FULL NAME: _________________________ (2) YOUR
AGE:_______________________________

(3) MARITAL STATUS: __________________________(4) YOUR CELL PHONE
NUMBER: ______________

(5) YOUR FAX NUMBER: ________________________ (6) YOUR
COUNTRY:__________________________

(7) YOUR OCCUPATION: ________________________ (8)
SEX:_____________________________________

(9) YOUR RELIGION:___________________________ (10) YOUR PRIVATE E-MAIL
ADDRESS: ___________

NOTE: THIS TRANSACTION IS CONFIDENTIAL.

Thanks for your assistance in advance.

Dr. Bernard Zoungrana.

Bill and Exchange (Executive Director).

^ permalink raw reply

* Re: [PATCH net v5] net/ipv6: Add anycast addresses to a global hashtable
From: David Miller @ 2018-10-30 18:31 UTC (permalink / raw)
  To: 0xeffeff; +Cc: netdev, kuznet, yoshfuji
In-Reply-To: <CAL6e_pfB=8RE6XFTVj6Bz2qHRstPciS6x3TdHggxwcJqcAtwVQ@mail.gmail.com>

From: Jeff Barnhill <0xeffeff@gmail.com>
Date: Tue, 30 Oct 2018 07:10:58 -0400

> I originally started implementing it the way you suggested; however,
> it seemed to complicate management of that structure because it isn't
> currently using rcu.  Also, assuming that can be worked out, where
> would I get the net from?  Would I need to store a copy in ifcaddr6,
> or is there some way to access it during ipv6_chk_acast_addr()?  It
> seems that if I don't add a copy of net, but instead access it through
> aca_rt(?), then freeing the ifcaddr6 memory becomes problematic
> (detaching it from idev, while read_rcu may still be accessing it).
> On Mon, Oct 29, 2018 at 11:32 PM David Miller <davem@davemloft.net> wrote:

I don't think converting the structure over to RCU, especially because
all of the read paths (everything leading to ipv6_chk_acast_dev()) are
taking RCU locks already.

And I cannot understand how having _two_ structures to manage a piece
of information can be less complicated than just one.

You can add a backpointer to the 'idev' in ifacaddr6 to get at the
network namespace.  You don't even need to do additional reference
counting because the idev->ac_list is always purged before an idev
is destroyed.

^ permalink raw reply

* Re: [PATCH net] net/mlx4_en: add a missing <net/ip.h> include
From: David Miller @ 2018-10-30 18:19 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, tariqt, abdhalee, eric.dumazet
In-Reply-To: <20181030071812.152491-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 30 Oct 2018 00:18:12 -0700

> Abdul Haleem reported a build error on ppc :
> 
> drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: `struct
> iphdr` declared inside parameter list [enabled by default]
>            struct iphdr *iph)
>                   ^
> drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is
> only this definition or declaration, which is probably not what you want
> [enabled by default]
> drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function
> get_fixed_ipv4_csum:
> drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing
> pointer to incomplete type
>   __u8 ipproto = iph->protocol;
>                     ^
> 
> Fixes: 55469bc6b577 ("drivers: net: remove <net/busy_poll.h> inclusion when not needed")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net] net/mlx5e: fix csum adjustments caused by RXFCS
From: Cong Wang @ 2018-10-30 18:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet,
	eranbe, Saeed Mahameed, dmichail, Paweł Staszewski
In-Reply-To: <20181030075725.195824-1-edumazet@google.com>

On Tue, Oct 30, 2018 at 12:57 AM Eric Dumazet <edumazet@google.com> wrote:
> -static __be32 mlx5e_get_fcs(struct sk_buff *skb)
> +static u32 mlx5e_get_fcs(const struct sk_buff *skb)
>  {
> -       int last_frag_sz, bytes_in_prev, nr_frags;
> -       u8 *fcs_p1, *fcs_p2;
> -       skb_frag_t *last_frag;
> -       __be32 fcs_bytes;
> +       const void *fcs_bytes;
> +       u32 _fcs_bytes;
>
> -       if (!skb_is_nonlinear(skb))
> -               return *(__be32 *)(skb->data + skb->len - ETH_FCS_LEN);
> +       fcs_bytes = skb_header_pointer(skb, skb->len - ETH_FCS_LEN,
> +                                      ETH_FCS_LEN, &_fcs_bytes);

At least skb_header_pointer() is marked as __must_check, I don't see
you check its return value here.

^ permalink raw reply

* Re: [PATCH v2] kcm: remove any offset before parsing messages
From: Dominique Martinet @ 2018-10-31  2:56 UTC (permalink / raw)
  To: David Miller; +Cc: doronrk, tom, davejwatson, netdev, linux-kernel
In-Reply-To: <20180918015723.GA26300@nautica>

Dominique Martinet wrote on Tue, Sep 18, 2018:
> David Miller wrote on Mon, Sep 17, 2018:
> > From: Dominique Martinet <asmadeus@codewreck.org>
> > Date: Wed, 12 Sep 2018 07:36:42 +0200
> > > Dominique Martinet wrote on Tue, Sep 11, 2018:
> > >> Hmm, while trying to benchmark this, I sometimes got hangs in
> > >> kcm_wait_data() for the last packet somehow?
> > >> The sender program was done (exited (zombie) so I assumed the sender
> > >> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
> > >> indicating it parsed a header but there was no skb to peek at?
> > >> But the sock is locked so this shouldn't be racy...
> > >> 
> > >> I can get it fairly often with this patch and small messages with an
> > >> offset, but I think it's just because the pull changes some timing - I
> > >> can't hit it with just the clone, and I can hit it with a pull without
> > >> clone as well.... And I don't see how pulling a cloned skb can impact
> > >> the original socket, but I'm a bit fuzzy on this.
> > > 
> > > This is weird, I cannot reproduce at all without that pull, even if I
> > > add another delay there instead of the pull, so it's not just timing...
> > 
> > I really can't apply this patch until you resolve this.
> > 
> > It is weird, given your description, though...
> 
> Thanks for the reminder! I totally agree with you here and did not
> expect this to be merged as it is (in retrospect, I probably should have
> written something to that extent in the subject, "RFC"?)

Found the issue after some trouble reproducing on other VM, long story
short:
 - I was blaming kcm_wait_data's sk_wait_data to wait while there was
something in sk->sk_receive_queue, but after adding a fake timeout and
some debug messages I can see the receive queue is empty.
However going back up from the kcm_sock to the kcm_mux to the kcm_psock,
there are things in the psock's socket's receive_queue... (If I'm
following the code correctly, that would be the underlying tcp socket)

 - that psock's strparser contains some hints: the interrupted and
stopped bits are set. strp->interrupted looks like it's only set if
kcm_parse_msg returns something < 0. . .
And surely enough, the skb_pull returns NULL iff there's such a hang...!

I might be tempted to send a patch to strparser to add a pr_debug
message in strp_abort_strp...

Anyway, that probably explains I have no problem with bigger VM
(uselessly more memory available) or without KASAN (I guess there's
overhead?), but I'm sending at most 300k of data and the VM has a 1.5GB
of ram, so if there's an allocation failure there I think there's a
problem ! . . .

So, well, I'm not sure on the way forward. Adding a bpf helper and
document that kcm users should mind the offset?

Thanks,
-- 
Dominique

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Cong Wang @ 2018-10-30 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paweł Staszewski, dmichail, Linux Kernel Network Developers
In-Reply-To: <473bee73-b40e-2038-35e2-2c03482f7b75@gmail.com>

On Tue, Oct 30, 2018 at 10:50 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/30/2018 10:32 AM, Cong Wang wrote:
>
> > Unlike Pawel's case, we don't use vlan at all, maybe this is why we see
> > it much less frequently than Pawel.
> >
> > Also, it is probably not specific to mlx5, as there is another report which
> > is probably a non-mlx5 driver.
>
> Not sure if you provided a stack trace ?

I said it is the same with Pawel's. Here it is anyway:

[ 3731.075989] eth0: hw csum failure
[ 3731.079316] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 4.14.74.x86_64 #1
[ 3731.086703] Hardware name: Wiwynn F4WW/Y 300-0284/F4WW MAIN BOARD,
BIOS F4WWP02 10/19/2018
[ 3731.094961] Call Trace:
[ 3731.097408]  <IRQ>
[ 3731.099432]  dump_stack+0x46/0x59
[ 3731.102751]  __skb_checksum_complete+0xb8/0xd0
[ 3731.107194]  tcp_v4_rcv+0x116/0xa30
[ 3731.110688]  ip_local_deliver_finish+0x5d/0x1f0
[ 3731.115218]  ip_local_deliver+0x6b/0xe0
[ 3731.119056]  ? ip_rcv_finish+0x400/0x400
[ 3731.122973]  ip_rcv+0x287/0x360
[ 3731.126112]  ? inet_del_offload+0x40/0x40
[ 3731.130124]  __netif_receive_skb_core+0x404/0xc10
[ 3731.134831]  ? netif_receive_skb_internal+0x34/0xd0
[ 3731.139709]  netif_receive_skb_internal+0x34/0xd0
[ 3731.144415]  napi_gro_receive+0xb8/0xe0
[ 3731.148271]  mlx5e_handle_rx_cqe_mpwrq+0x4e3/0x7f0 [mlx5_core]
[ 3731.154099]  ? enqueue_entity+0x103/0x7f0
[ 3731.158114]  mlx5e_poll_rx_cq+0xba/0x850 [mlx5_core]
[ 3731.163080]  mlx5e_napi_poll+0x91/0x290 [mlx5_core]
[ 3731.167955]  net_rx_action+0x14a/0x3e0
[ 3731.171707]  ? credit_entropy_bits+0x23d/0x260
[ 3731.176153]  __do_softirq+0xe2/0x2c3
[ 3731.179734]  irq_exit+0xbc/0xd0
[ 3731.182878]  do_IRQ+0x89/0xd0
[ 3731.185851]  common_interrupt+0x7a/0x7a
[ 3731.189690]  </IRQ>
[ 3731.191799] RIP: 0010:cpuidle_enter_state+0xa6/0x2d0
[ 3731.196761] RSP: 0018:ffffbb950c6f7eb0 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff60
[ 3731.204328] RAX: ffff9fe25fbe14c0 RBX: 00000364b57553af RCX: 000000000000001f
[ 3731.211459] RDX: 20c49ba5e353f7cf RSI: ffff68294248f469 RDI: 0000000000000000
[ 3731.218583] RBP: ffffdb7d003c3300 R08: 000000000000c3be R09: 0000000000008612
[ 3731.225709] R10: ffffbb950c6f7e98 R11: 000000000000c3be R12: 0000000000000003
[ 3731.232841] R13: ffffffff912c9d18 R14: 0000000000000000 R15: 00000364b396207a
[ 3731.239968]  do_idle+0x166/0x1a0
[ 3731.243199]  cpu_startup_entry+0x6f/0x80
[ 3731.247128]  start_secondary+0x19c/0x1f0
[ 3731.251052]  secondary_startup_64+0xa5/0xb0



>
> Have you tried IPv6 frags maybe ?
>

We have no IPv6 traffic. I asked people to try to generate IPv4 fragment
traffic to see if it would be more reproducible, no progress yet.

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Eric Dumazet @ 2018-10-30 17:50 UTC (permalink / raw)
  To: Cong Wang
  Cc: Paweł Staszewski, dmichail, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpUKTh51maAzht8M3LuJAYDRMRnsGn_+Db0rGG-scW2SnA@mail.gmail.com>



On 10/30/2018 10:32 AM, Cong Wang wrote:

> Unlike Pawel's case, we don't use vlan at all, maybe this is why we see
> it much less frequently than Pawel.
> 
> Also, it is probably not specific to mlx5, as there is another report which
> is probably a non-mlx5 driver.

Not sure if you provided a stack trace ?

Have you tried IPv6 frags maybe ?

^ permalink raw reply

* Re: [PATCH iproute2 net-next 0/3] ss: Allow selection of columns to be displayed
From: Stefano Brivio @ 2018-10-30 17:34 UTC (permalink / raw)
  To: David Ahern; +Cc: Yoann P., Stephen Hemminger, netdev
In-Reply-To: <7ffc00c8-bdf6-5c75-564e-2663494bda5d@gmail.com>

On Tue, 30 Oct 2018 10:34:45 -0600
David Ahern <dsahern@gmail.com> wrote:

> A more flexible approach is to use format strings to allow users to
> customize the output order and whitespace as well. So for ss and your
> column list (winging it here):
> 
>     netid          = %N
>     state          = %S
>     recv Q         = %Qr
>     send Q         = %Qs
>     local address  = %Al
>     lport port     = %Pl
>     remote address = %Ar
>     remote port    = %Pr
>     process data   = %p
>     ...
> 
> then a format string could be: "%S  %Qr %Qs  %Al:%Pl %Ar:%Pr  %p\n"

I like the idea indeed, but I see two issues with ss:

- the current column abstraction is rather lightweight, things are
  already buffered in the defined column order so we don't have to jump
  back and forth in the buffer while rendering. Doing that needs some
  extra care to avoid a performance hit, but it's probably doable, I
  can put that on my to-do list

- how would you model automatic spacing in a format string? Should we
  support width specifiers? Disable automatic spacing if a format
  string is given? It might even make sense to allow partial automatic
  spacing with a special character in the format string, that is:

	"%S.%Qr.%Qs  %Al:%Pl %Ar:%Pr  %p\n"

  would mean "align everything to the right, distribute remaining
  whitespace between %S, %Qr and %Qs". But it looks rather complicated
  at a glance.

-- 
Stefano

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Cong Wang @ 2018-10-30 17:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paweł Staszewski, dmichail, Linux Kernel Network Developers
In-Reply-To: <76dfbbda-d7f1-b13a-5921-c12c3b0f8e3e@gmail.com>

On Tue, Oct 30, 2018 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/30/2018 01:09 AM, Paweł Staszewski wrote:
> >
> >
> > W dniu 30.10.2018 o 08:29, Eric Dumazet pisze:
> >>
> >> On 10/29/2018 11:09 PM, Dimitris Michailidis wrote:
> >>
> >>> Indeed this is a bug. I would expect it to produce frequent errors
> >>> though as many odd-length
> >>> packets would trigger it. Do you have RXFCS? Regardless, how
> >>> frequently do you see the problem?
> >>>
> >> Old kernels (before 88078d98d1bb) were simply resetting ip_summed to CHECKSUM_NONE
> >>
> >> And before your fix (commit d55bef5059dd057bd), mlx5 bug was canceling the bug you fixed.
> >>
> >> So we now need to also fix mlx5.
> >>
> >> And of course use skb_header_pointer() in mlx5e_get_fcs() as I mentioned earlier,
> >> plus __get_unaligned_cpu32() as you hinted.
> >>
> >>
> >>
> >>
> >
> > No RXFCS


Same with Pawel, RXFCS is disabled by default.


> >
> > And this trace is rly frequently like once per 3/4 seconds
> > like below:
> > [28965.776864] vlan1490: hw csum failure
>
> Might be vlan related.

Unlike Pawel's case, we don't use vlan at all, maybe this is why we see
it much less frequently than Pawel.

Also, it is probably not specific to mlx5, as there is another report which
is probably a non-mlx5 driver.

Thanks.

^ permalink raw reply

* [RFC PATCH v3 10/10] selftests: add functionals test for UDP GRO
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

Extends the existing udp programs to allow checking for proper
GRO aggregation/GSO size, and run the tests via a shell script, using
a veth pair with XDP program attached to trigger the GRO code path.

rfc v2 -> rfc v3:
 - add missing test program options documentation
 - fix sporatic test failures (receiver faster than sender)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/udpgro.sh         | 147 ++++++++++++++++++
 tools/testing/selftests/net/udpgro_bench.sh   |   8 +-
 tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
 tools/testing/selftests/net/udpgso_bench_rx.c | 123 +++++++++++++--
 tools/testing/selftests/net/udpgso_bench_tx.c |  22 ++-
 6 files changed, 281 insertions(+), 23 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgro.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index ac999354af54..a8a0d256aafb 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,7 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
-TEST_PROGS += udpgro_bench.sh
+TEST_PROGS += udpgro_bench.sh udpgro.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh
new file mode 100755
index 000000000000..3f12b72a3568
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro.sh
@@ -0,0 +1,147 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro functional tests.
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -1 ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+cfg_veth() {
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+}
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	cfg_veth
+
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} && \
+		echo "ok" || \
+		echo "failed" &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	wait $(jobs -p)
+}
+
+run_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess $2 rx -G -r -x veth1 $3
+}
+
+run_one_nat() {
+	# use 'rx' as separator between sender args and receiver args
+	local addr1 addr2 pid family="" ipt_cmd=ip6tables
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	if [[ ${tx_args} = *-4* ]]; then
+		ipt_cmd=iptables
+		family=-4
+		addr1=192.168.1.1
+		addr2=192.168.1.3/24
+	else
+		addr1=2001:db8::1
+		addr2="2001:db8::3/64 nodad"
+	fi
+
+	cfg_veth
+	ip -netns "${PEER_NS}" addr add dev veth1 ${addr2}
+
+	# fool the GRO engine changing the destination address ...
+	ip netns exec "${PEER_NS}" $ipt_cmd -t nat -I PREROUTING -d ${addr1} -j DNAT --to-destination ${addr2%/*}
+
+	# ... so that GRO will match the UDP_GRO enabled socket, but packets
+	# will land on the 'plain' one
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -G ${family} -x veth1 -b ${addr1} -n 0 &
+	pid=$!
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${family} -b ${addr2%/*} ${rx_args} && \
+		echo "ok" || \
+		echo "failed"&
+
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	kill -INT $pid
+	wait $(jobs -p)
+}
+
+run_nat_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess_nat $2 rx -r $3
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_test "no GRO" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400"
+
+	# explicitly check we are not receiving UDP_SEGMENT cmsg (-S -1)
+	# when GRO does not take place
+	run_test "no GRO chk cmsg" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400 -S -1"
+
+	# the GSO packets are aggregated because:
+	# * veth schedule napi after each xmit
+	# * segmentation happens in BH context, veth napi poll is delayed after
+	#   the transmission of the last segment
+	run_test "GRO" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720"
+	run_test "GRO chk cmsg" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720 -S 1472"
+	run_test "GRO with custom segment size" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720"
+	run_test "GRO with custom segment size cmsg" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv4_args} -M 1 -s 14720 -S 0" "-n 10 -l 1472"
+
+	echo "ipv6"
+	run_test "no GRO" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400"
+	run_test "no GRO chk cmsg" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400 -S -1"
+	run_test "GRO" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520"
+	run_test "GRO chk cmsg" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520 -S 1452"
+	run_test "GRO with custom segment size" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520"
+	run_test "GRO with custom segment size cmsg" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 10 -l 1452"
+}
+
+if [ ! -f xdp_dummy.o ]; then
+	echo "Skipping GRO tests - missing LLC"
+	exit 0
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+elif [[ $1 == "__subprocess_nat" ]]; then
+	shift
+	run_one_nat $@
+fi
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
index 03d37e5e7424..77a1fb0ae0bc 100755
--- a/tools/testing/selftests/net/udpgro_bench.sh
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -18,7 +18,9 @@ run_one() {
 	# use 'rx' as separator between sender args and receiver args
 	local -r all="$@"
 	local -r tx_args=${all%rx*}
-	local -r rx_args=${all#*rx}
+	local rx_args=${all#*rx}
+
+	[[ "${tx_args}" == *"-4"* ]] && rx_args="${rx_args} -4"
 
 	ip netns add "${PEER_NS}"
 	ip -netns "${PEER_NS}" link set lo up
@@ -50,10 +52,10 @@ run_udp() {
 	local -r args=$@
 
 	echo "udp gso - over veth touching data"
-	run_in_netns ${args} -S rx
+	run_in_netns ${args} -S 0 rx
 
 	echo "udp gso and gro - over veth touching data"
-	run_in_netns ${args} -S rx -G
+	run_in_netns ${args} -S 0 rx -G
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh
index 99e537ab5ad9..0f0628613f81 100755
--- a/tools/testing/selftests/net/udpgso_bench.sh
+++ b/tools/testing/selftests/net/udpgso_bench.sh
@@ -34,7 +34,7 @@ run_udp() {
 	run_in_netns ${args}
 
 	echo "udp gso"
-	run_in_netns ${args} -S
+	run_in_netns ${args} -S 0
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 5dcb719abe04..9657c6988f26 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -44,6 +44,12 @@ static bool cfg_tcp;
 static bool cfg_verify;
 static bool cfg_read_all;
 static bool cfg_gro_segment;
+static int  cfg_family		= PF_INET6;
+static int  cfg_alen 		= sizeof(struct sockaddr_in6);
+static int  cfg_expected_pkt_nr;
+static int  cfg_expected_pkt_len;
+static int  cfg_expected_gso_size;
+static struct sockaddr_storage cfg_bind_addr;
 #ifdef SUPPORT_XDP
 static int cfg_xdp_iface;
 #endif
@@ -57,6 +63,29 @@ static void sigint_handler(int signum)
 		interrupted = true;
 }
 
+static void setup_sockaddr(int domain, const char *str_addr, void *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (domain) {
+	case PF_INET:
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(cfg_port);
+		if (inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+			error(1, 0, "ipv4 parse error: %s", str_addr);
+		break;
+	case PF_INET6:
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(cfg_port);
+		if (inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+			error(1, 0, "ipv6 parse error: %s", str_addr);
+		break;
+	default:
+		error(1, 0, "illegal domain");
+	}
+}
+
 static unsigned long gettimeofday_ms(void)
 {
 	struct timeval tv;
@@ -90,10 +119,9 @@ static void do_poll(int fd)
 
 static int do_socket(bool do_tcp)
 {
-	struct sockaddr_in6 addr = {0};
 	int fd, val;
 
-	fd = socket(PF_INET6, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
+	fd = socket(cfg_family, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
 	if (fd == -1)
 		error(1, errno, "socket");
 
@@ -104,10 +132,7 @@ static int do_socket(bool do_tcp)
 	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val)))
 		error(1, errno, "setsockopt reuseport");
 
-	addr.sin6_family =	PF_INET6;
-	addr.sin6_port =	htons(cfg_port);
-	addr.sin6_addr =	in6addr_any;
-	if (bind(fd, (void *) &addr, sizeof(addr)))
+	if (bind(fd, (void *)&cfg_bind_addr, cfg_alen))
 		error(1, errno, "bind");
 
 	if (do_tcp) {
@@ -181,52 +206,117 @@ static void do_verify_udp(const char *data, int len)
 	}
 }
 
+static int recv_msg(int fd, char *buf, int len, int *gso_size)
+{
+	char control[CMSG_SPACE(sizeof(uint16_t))] = {0};
+	struct msghdr msg = {0};
+	struct iovec iov = {0};
+	struct cmsghdr *cmsg;
+	uint16_t *gsosizeptr;
+	int ret;
+
+	iov.iov_base = buf;
+	iov.iov_len = len;
+
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+
+	msg.msg_control = control;
+	msg.msg_controllen = sizeof(control);
+
+	*gso_size = -1;
+	ret = recvmsg(fd, &msg, MSG_TRUNC | MSG_DONTWAIT);
+	if (ret != -1) {
+		for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
+		     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+			if (cmsg->cmsg_level == SOL_UDP
+			    && cmsg->cmsg_type == UDP_GRO) {
+				gsosizeptr = (uint16_t *) CMSG_DATA(cmsg);
+				*gso_size = *gsosizeptr;
+				break;
+			}
+		}
+	}
+	return ret;
+}
+
 /* Flush all outstanding datagrams. Verify first few bytes of each. */
 static void do_flush_udp(int fd)
 {
 	static char rbuf[ETH_MAX_MTU];
-	int ret, len, budget = 256;
+	int ret, len, gso_size, budget = 256;
 
 	len = cfg_read_all ? sizeof(rbuf) : 0;
 	while (budget--) {
 		/* MSG_TRUNC will make return value full datagram length */
-		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		if (!cfg_expected_gso_size)
+			ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		else
+			ret = recv_msg(fd, rbuf, len, &gso_size);
 		if (ret == -1 && errno == EAGAIN)
-			return;
+			break;
 		if (ret == -1)
 			error(1, errno, "recv");
+		if (cfg_expected_pkt_len && ret != cfg_expected_pkt_len)
+			error(1, 0, "recv: bad packet len, got %d,"
+			      " expected %d\n", ret, cfg_expected_pkt_len);
 		if (len && cfg_verify) {
 			if (ret == 0)
 				error(1, errno, "recv: 0 byte datagram\n");
 
 			do_verify_udp(rbuf, ret);
 		}
+		if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
+			error(1, 0, "recv: bad gso size, got %d, expected %d "
+			      "(-1 == no gso cmsg))\n", gso_size,
+			      cfg_expected_gso_size);
 
 		packets++;
 		bytes += ret;
+		if (cfg_expected_pkt_nr && packets >= cfg_expected_pkt_nr)
+			break;
 	}
 }
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-Grtv] [-p port] [-x device]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-b addr] [-p port] [-l pktlen] [-n packetnr] [-S gsosize] [-x device]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "Gp:rtvx:")) != -1) {
+	/* bind to any by default */
+	setup_sockaddr(PF_INET6, "::", &cfg_bind_addr);
+	while ((c = getopt(argc, argv, "4b:Gl:n:p:rS:tvx:")) != -1) {
 		switch (c) {
+		case '4':
+			cfg_family = PF_INET;
+			cfg_alen = sizeof(struct sockaddr_in);
+			setup_sockaddr(PF_INET, "0.0.0.0", &cfg_bind_addr);
+			break;
+		case 'b':
+			setup_sockaddr(cfg_family, optarg, &cfg_bind_addr);
+			break;
 		case 'G':
 			cfg_gro_segment = true;
 			break;
+		case 'l':
+			cfg_expected_pkt_len = strtoul(optarg, NULL, 0);
+			break;
+		case 'n':
+			cfg_expected_pkt_nr = strtoul(optarg, NULL, 0);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
 		case 'r':
 			cfg_read_all = true;
 			break;
+		case 'S':
+			cfg_expected_gso_size = strtol(optarg, NULL, 0);
+			break;
 		case 't':
 			cfg_tcp = true;
 			break;
@@ -253,7 +343,7 @@ static void parse_opts(int argc, char **argv)
 
 static void do_recv(void)
 {
-	unsigned long tnow, treport;
+	unsigned long tnow, treport, loop = 0;
 #ifdef SUPPORT_XDP
 	int prog_fd = -1;
 #endif
@@ -285,6 +375,11 @@ static void do_recv(void)
 
 	treport = gettimeofday_ms() + 1000;
 	do {
+		/* force termination after the second poll(); this cope both
+		 * with sender slower than receiver and missing packet errors
+		 */
+		if (cfg_expected_pkt_nr && loop++)
+			interrupted = true;
 		do_poll(fd);
 
 		if (cfg_tcp)
@@ -305,6 +400,10 @@ static void do_recv(void)
 
 	} while (!interrupted);
 
+	if (cfg_expected_pkt_nr && (packets != cfg_expected_pkt_nr))
+		error(1, 0, "wrong packet number! got %ld, expected %d\n",
+		      packets, cfg_expected_pkt_nr);
+
 	if (close(fd))
 		error(1, errno, "close");
 #ifdef SUPPORT_XDP
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
index e821564053cf..2b24de666750 100644
--- a/tools/testing/selftests/net/udpgso_bench_tx.c
+++ b/tools/testing/selftests/net/udpgso_bench_tx.c
@@ -52,6 +52,8 @@ static bool	cfg_segment;
 static bool	cfg_sendmmsg;
 static bool	cfg_tcp;
 static bool	cfg_zerocopy;
+static int	cfg_msg_nr;
+static uint16_t	cfg_gso_size;
 
 static socklen_t cfg_alen;
 static struct sockaddr_storage cfg_dst_addr;
@@ -205,14 +207,14 @@ static void send_udp_segment_cmsg(struct cmsghdr *cm)
 
 	cm->cmsg_level = SOL_UDP;
 	cm->cmsg_type = UDP_SEGMENT;
-	cm->cmsg_len = CMSG_LEN(sizeof(cfg_mss));
+	cm->cmsg_len = CMSG_LEN(sizeof(cfg_gso_size));
 	valp = (void *)CMSG_DATA(cm);
-	*valp = cfg_mss;
+	*valp = cfg_gso_size;
 }
 
 static int send_udp_segment(int fd, char *data)
 {
-	char control[CMSG_SPACE(sizeof(cfg_mss))] = {0};
+	char control[CMSG_SPACE(sizeof(cfg_gso_size))] = {0};
 	struct msghdr msg = {0};
 	struct iovec iov = {0};
 	int ret;
@@ -241,7 +243,7 @@ static int send_udp_segment(int fd, char *data)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]",
+	error(1, 0, "Usage: %s [-46cmtuz] [-C cpu] [-D dst ip] [-l secs] [-M messagenr] [-p port] [-s sendsize] [-S gsosize]",
 		    filepath);
 }
 
@@ -250,7 +252,7 @@ static void parse_opts(int argc, char **argv)
 	int max_len, hdrlen;
 	int c;
 
-	while ((c = getopt(argc, argv, "46cC:D:l:mp:s:Stuz")) != -1) {
+	while ((c = getopt(argc, argv, "46cC:D:l:mM:p:s:S:tuz")) != -1) {
 		switch (c) {
 		case '4':
 			if (cfg_family != PF_UNSPEC)
@@ -279,6 +281,9 @@ static void parse_opts(int argc, char **argv)
 		case 'm':
 			cfg_sendmmsg = true;
 			break;
+		case 'M':
+			cfg_msg_nr = strtoul(optarg, NULL, 10);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
@@ -286,6 +291,7 @@ static void parse_opts(int argc, char **argv)
 			cfg_payload_len = strtoul(optarg, NULL, 0);
 			break;
 		case 'S':
+			cfg_gso_size = strtoul(optarg, NULL, 0);
 			cfg_segment = true;
 			break;
 		case 't':
@@ -317,6 +323,8 @@ static void parse_opts(int argc, char **argv)
 
 	cfg_mss = ETH_DATA_LEN - hdrlen;
 	max_len = ETH_MAX_MTU - hdrlen;
+	if (!cfg_gso_size)
+		cfg_gso_size = cfg_mss;
 
 	if (cfg_payload_len > max_len)
 		error(1, 0, "payload length %u exceeds max %u",
@@ -392,10 +400,12 @@ int main(int argc, char **argv)
 		else
 			num_sends += send_udp(fd, buf[i]);
 		num_msgs++;
-
 		if (cfg_zerocopy && ((num_msgs & 0xF) == 0))
 			flush_zerocopy(fd);
 
+		if (cfg_msg_nr && num_msgs >= cfg_msg_nr)
+			break;
+
 		tnow = gettimeofday_ms();
 		if (tnow > treport) {
 			fprintf(stderr,
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v3 09/10] selftests: add some benchmark for UDP GRO
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

Run on top of veth pair, using a dummy XDP program to enable the GRO.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile        |  1 +
 tools/testing/selftests/net/udpgro_bench.sh | 92 +++++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100755 tools/testing/selftests/net/udpgro_bench.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 176459b7c4d6..ac999354af54 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,6 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
+TEST_PROGS += udpgro_bench.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
new file mode 100755
index 000000000000..03d37e5e7424
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -0,0 +1,92 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro benchmarks
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -INT ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -r -x veth1 &
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -t ${rx_args} -r &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+}
+
+run_in_netns() {
+	local -r args=$@
+
+	./in_netns.sh $0 __subprocess ${args}
+}
+
+run_udp() {
+	local -r args=$@
+
+	echo "udp gso - over veth touching data"
+	run_in_netns ${args} -S rx
+
+	echo "udp gso and gro - over veth touching data"
+	run_in_netns ${args} -S rx -G
+}
+
+run_tcp() {
+	local -r args=$@
+
+	echo "tcp - over veth touching data"
+	run_in_netns ${args} -t rx
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv4_args}"
+
+	echo "ipv6"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv6_args}"
+}
+
+if [ ! -f xdp_dummy.o ]; then
+	echo "Skipping GRO benchmarks - missing LLC"
+	exit 0
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+else
+	run_in_netns $@
+fi
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v3 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

XDP support will be used by a later patch to test the GRO path
in a net namespace, leveraging the veth XDP implementation.
To avoid breaking existing setup, XDP support is conditionally
enabled and build only if llc is locally available.

rfc v2 -> rfc v3:
 - move 'x' option handling here

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile          | 69 +++++++++++++++++++
 tools/testing/selftests/net/udpgso_bench_rx.c | 41 ++++++++++-
 tools/testing/selftests/net/xdp_dummy.c       | 13 ++++
 3 files changed, 121 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/net/xdp_dummy.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 256d82d5fa87..176459b7c4d6 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -16,8 +16,77 @@ TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
 
 KSFT_KHDR_INSTALL := 1
+
+# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
+#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
+LLC ?= llc
+CLANG ?= clang
+LLVM_OBJCOPY ?= llvm-objcopy
+BTF_PAHOLE ?= pahole
+HAS_LLC := $(shell which $(LLC) 2>/dev/null)
+
+# conditional enable testes requiring llc
+ifneq (, $(HAS_LLC))
+TEST_GEN_FILES += xdp_dummy.o
+endif
+
 include ../lib.mk
 
+ifneq (, $(HAS_LLC))
+
+# Detect that we're cross compiling and use the cross compiler
+ifdef CROSS_COMPILE
+CLANG_ARCH_ARGS = -target $(ARCH)
+endif
+
+PROBE := $(shell $(LLC) -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
+
+# Let newer LLVM versions transparently probe the kernel for availability
+# of full BPF instruction set.
+ifeq ($(PROBE),)
+  CPU ?= probe
+else
+  CPU ?= generic
+endif
+
+SRC_PATH := $(abspath ../../../..)
+LIB_PATH := $(SRC_PATH)/tools/lib
+XDP_CFLAGS := -D SUPPORT_XDP=1 -I$(LIB_PATH)
+LIBBPF = $(LIB_PATH)/bpf/libbpf.a
+BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
+BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
+BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
+CLANG_SYS_INCLUDES := $(shell $(CLANG) -v -E - </dev/null 2>&1 \
+        | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')
+CLANG_FLAGS = -I. -I$(SRC_PATH)/include -I../bpf/ \
+	      $(CLANG_SYS_INCLUDES) -Wno-compare-distinct-pointer-types
+
+ifneq ($(and $(BTF_LLC_PROBE),$(BTF_PAHOLE_PROBE),$(BTF_OBJCOPY_PROBE)),)
+	CLANG_CFLAGS += -g
+	LLC_FLAGS += -mattr=dwarfris
+	DWARF2BTF = y
+endif
+
+$(LIBBPF): FORCE
+# Fix up variables inherited from Kbuild that tools/ build system won't like
+	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(SRC_PATH) O= $(nodir $@)
+
+$(OUTPUT)/udpgso_bench_rx: $(OUTPUT)/udpgso_bench_rx.c $(LIBBPF)
+	$(CC) -o $@ $(XDP_CFLAGS) $(CFLAGS) $(LOADLIBES) $(LDLIBS) $^ -lelf
+
+FORCE:
+
+# bpf program[s] generation
+$(OUTPUT)/%.o: %.c
+	$(CLANG) $(CLANG_FLAGS) \
+		 -O2 -target bpf -emit-llvm -c $< -o - |      \
+	$(LLC) -march=bpf -mcpu=$(CPU) $(LLC_FLAGS) -filetype=obj -o $@
+ifeq ($(DWARF2BTF),y)
+	$(BTF_PAHOLE) -J $@
+endif
+
+endif
+
 $(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma
 $(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
 $(OUTPUT)/tcp_inq: LDFLAGS += -lpthread
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 8f48d7fb32cf..5dcb719abe04 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -31,6 +31,10 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#ifdef SUPPORT_XDP
+#include "bpf/libbpf.h"
+#endif
+
 #ifndef UDP_GRO
 #define UDP_GRO		104
 #endif
@@ -40,6 +44,9 @@ static bool cfg_tcp;
 static bool cfg_verify;
 static bool cfg_read_all;
 static bool cfg_gro_segment;
+#ifdef SUPPORT_XDP
+static int cfg_xdp_iface;
+#endif
 
 static bool interrupted;
 static unsigned long packets, bytes;
@@ -202,14 +209,14 @@ static void do_flush_udp(int fd)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-Grtv] [-p port]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-p port] [-x device]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "Gp:rtv")) != -1) {
+	while ((c = getopt(argc, argv, "Gp:rtvx:")) != -1) {
 		switch (c) {
 		case 'G':
 			cfg_gro_segment = true;
@@ -227,6 +234,13 @@ static void parse_opts(int argc, char **argv)
 			cfg_verify = true;
 			cfg_read_all = true;
 			break;
+#ifdef SUPPORT_XDP
+		case 'x':
+			cfg_xdp_iface = if_nametoindex(optarg);
+			if (!cfg_xdp_iface)
+				error(1, errno, "unknown interface %s", optarg);
+			break;
+#endif
 		}
 	}
 
@@ -240,6 +254,9 @@ static void parse_opts(int argc, char **argv)
 static void do_recv(void)
 {
 	unsigned long tnow, treport;
+#ifdef SUPPORT_XDP
+	int prog_fd = -1;
+#endif
 	int fd;
 
 	fd = do_socket(cfg_tcp);
@@ -250,6 +267,22 @@ static void do_recv(void)
 			error(1, errno, "setsockopt UDP_GRO");
 	}
 
+#ifdef SUPPORT_XDP
+	if (cfg_xdp_iface) {
+		struct bpf_prog_load_attr prog_load_attr = {
+			.prog_type	= BPF_PROG_TYPE_XDP,
+			.file 		= "xdp_dummy.o",
+		};
+		struct bpf_object *obj;
+
+		if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+			error(1, errno, "xdp program load failed\n");
+
+		if (bpf_set_link_xdp_fd(cfg_xdp_iface, prog_fd, 0) < 0)
+			error(1, errno, "link set xdp fd failed\n");
+	}
+#endif
+
 	treport = gettimeofday_ms() + 1000;
 	do {
 		do_poll(fd);
@@ -274,6 +307,10 @@ static void do_recv(void)
 
 	if (close(fd))
 		error(1, errno, "close");
+#ifdef SUPPORT_XDP
+	if (cfg_xdp_iface && bpf_set_link_xdp_fd(cfg_xdp_iface, -1, 0))
+		error(1, errno, "removing xdp program");
+#endif
 }
 
 int main(int argc, char **argv)
diff --git a/tools/testing/selftests/net/xdp_dummy.c b/tools/testing/selftests/net/xdp_dummy.c
new file mode 100644
index 000000000000..1a64cf5099ed
--- /dev/null
+++ b/tools/testing/selftests/net/xdp_dummy.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define KBUILD_MODNAME "xdp_dummy"
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+SEC("xdp_dummy")
+int xdp_dummy_prog(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v3 07/10] selftests: add GRO support to udp bench rx program
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

And fix a couple of buglets (port option processing,
clean termination on SIGINT). This is preparatory work
for GRO tests.

rfc v2 -> rfc v3:
 - use ETH_MAX_MTU

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/udpgso_bench_rx.c | 37 +++++++++++++++----
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 727cf67a3f75..8f48d7fb32cf 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -31,9 +31,15 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#ifndef UDP_GRO
+#define UDP_GRO		104
+#endif
+
 static int  cfg_port		= 8000;
 static bool cfg_tcp;
 static bool cfg_verify;
+static bool cfg_read_all;
+static bool cfg_gro_segment;
 
 static bool interrupted;
 static unsigned long packets, bytes;
@@ -63,6 +69,8 @@ static void do_poll(int fd)
 
 	do {
 		ret = poll(&pfd, 1, 10);
+		if (interrupted)
+			break;
 		if (ret == -1)
 			error(1, errno, "poll");
 		if (ret == 0)
@@ -70,7 +78,7 @@ static void do_poll(int fd)
 		if (pfd.revents != POLLIN)
 			error(1, errno, "poll: 0x%x expected 0x%x\n",
 					pfd.revents, POLLIN);
-	} while (!ret && !interrupted);
+	} while (!ret);
 }
 
 static int do_socket(bool do_tcp)
@@ -102,6 +110,8 @@ static int do_socket(bool do_tcp)
 			error(1, errno, "listen");
 
 		do_poll(accept_fd);
+		if (interrupted)
+			exit(0);
 
 		fd = accept(accept_fd, NULL, NULL);
 		if (fd == -1)
@@ -167,10 +177,10 @@ static void do_verify_udp(const char *data, int len)
 /* Flush all outstanding datagrams. Verify first few bytes of each. */
 static void do_flush_udp(int fd)
 {
-	static char rbuf[ETH_DATA_LEN];
+	static char rbuf[ETH_MAX_MTU];
 	int ret, len, budget = 256;
 
-	len = cfg_verify ? sizeof(rbuf) : 0;
+	len = cfg_read_all ? sizeof(rbuf) : 0;
 	while (budget--) {
 		/* MSG_TRUNC will make return value full datagram length */
 		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
@@ -178,7 +188,7 @@ static void do_flush_udp(int fd)
 			return;
 		if (ret == -1)
 			error(1, errno, "recv");
-		if (len) {
+		if (len && cfg_verify) {
 			if (ret == 0)
 				error(1, errno, "recv: 0 byte datagram\n");
 
@@ -192,23 +202,30 @@ static void do_flush_udp(int fd)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-tv] [-p port]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-p port]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "ptv")) != -1) {
+	while ((c = getopt(argc, argv, "Gp:rtv")) != -1) {
 		switch (c) {
+		case 'G':
+			cfg_gro_segment = true;
+			break;
 		case 'p':
-			cfg_port = htons(strtoul(optarg, NULL, 0));
+			cfg_port = strtoul(optarg, NULL, 0);
+			break;
+		case 'r':
+			cfg_read_all = true;
 			break;
 		case 't':
 			cfg_tcp = true;
 			break;
 		case 'v':
 			cfg_verify = true;
+			cfg_read_all = true;
 			break;
 		}
 	}
@@ -227,6 +244,12 @@ static void do_recv(void)
 
 	fd = do_socket(cfg_tcp);
 
+	if (cfg_gro_segment && !cfg_tcp) {
+		int val = 1;
+		if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &val, sizeof(val)))
+			error(1, errno, "setsockopt UDP_GRO");
+	}
+
 	treport = gettimeofday_ms() + 1000;
 	do {
 		do_poll(fd);
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v3 06/10] udp: cope with UDP GRO packet misdirection
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

In some scenarios, the GRO engine can assemble an UDP GRO packet
that ultimately lands on a non GRO-enabled socket.
This patch tries to address the issue explicitly checking for the UDP
socket features before enqueuing the packet, and eventually segmenting
the unexpected GRO packet, as needed.

We must also cope with re-insertion requests: after segmentation the
UDP code calls the helper introduced by the previous patches, as needed.

Segmentation is performed by a common helper, which takes care of
updating socket and protocol stats is case of failure.

rfc v2 -> rfc v3
 - moved udp_rcv_segment() into net/udp.h, account errors to socket
   and ns, always return NULL or segs list

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h |  6 ++++++
 include/net/udp.h   | 51 ++++++++++++++++++++++++++++++++++++++-------
 net/ipv4/udp.c      | 25 +++++++++++++++++++++-
 net/ipv6/udp.c      | 27 +++++++++++++++++++++++-
 4 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index e23d5024f42f..0a9c54e76305 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -132,6 +132,12 @@ static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
 	}
 }
 
+static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
+{
+	return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
+	       skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/include/net/udp.h b/include/net/udp.h
index 9e82cb391dea..f94aed316a04 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -406,17 +406,24 @@ static inline int copy_linear_skb(struct sk_buff *skb, int len, int off,
 } while(0)
 
 #if IS_ENABLED(CONFIG_IPV6)
-#define __UDPX_INC_STATS(sk, field)					\
-do {									\
-	if ((sk)->sk_family == AF_INET)					\
-		__UDP_INC_STATS(sock_net(sk), field, 0);		\
-	else								\
-		__UDP6_INC_STATS(sock_net(sk), field, 0);		\
-} while (0)
+#define __UDPX_MIB(sk, ipv4)						\
+({									\
+	ipv4 ? (IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :	\
+				 sock_net(sk)->mib.udp_statistics) :	\
+		(IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_stats_in6 :	\
+				 sock_net(sk)->mib.udp_stats_in6);	\
+})
 #else
-#define __UDPX_INC_STATS(sk, field) __UDP_INC_STATS(sock_net(sk), field, 0)
+#define __UDPX_MIB(sk, ipv4)						\
+({									\
+	IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :		\
+			 sock_net(sk)->mib.udp_statistics;		\
+})
 #endif
 
+#define __UDPX_INC_STATS(sk, field) \
+	__SNMP_INC_STATS(__UDPX_MIB(sk, (sk)->sk_family == AF_INET, field)
+
 #ifdef CONFIG_PROC_FS
 struct udp_seq_afinfo {
 	sa_family_t			family;
@@ -450,4 +457,32 @@ DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
 void udpv6_encap_enable(void);
 #endif
 
+static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
+					      struct sk_buff *skb)
+{
+	bool ipv4 = skb->protocol == htons(ETH_P_IP);
+	int segs_nr = skb_shinfo(skb)->gso_segs;
+	struct sk_buff *segs;
+
+	/* the GSO CB lays after the UDP one, no need to save and restore any
+	 * CB fragment
+	 */
+	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
+	if (unlikely(IS_ERR(segs))) {
+		kfree_skb(skb);
+		goto drop;
+	}
+
+	if (unlikely(!segs))
+		goto drop;
+
+	consume_skb(skb);
+	return segs;
+
+drop:
+	atomic_add(segs_nr, &sk->sk_drops);
+	SNMP_ADD_STATS(__UDPX_MIB(sk, ipv4), UDP_MIB_INERRORS, segs_nr);
+	return NULL;
+}
+
 #endif	/* _UDP_H */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b345f71b1cbb..b45033f63673 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1909,7 +1909,7 @@ EXPORT_SYMBOL(udp_encap_enable);
  * Note that in the success and error cases, the skb is assumed to
  * have either been requeued or freed.
  */
-static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
 	int is_udplite = IS_UDPLITE(sk);
@@ -2012,6 +2012,29 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return -1;
 }
 
+void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
+
+static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct sk_buff *next, *segs;
+	int ret;
+
+	if (likely(!udp_unexpected_gso(sk, skb)))
+		return udp_queue_rcv_one_skb(sk, skb);
+
+	BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
+	__skb_push(skb, -skb_mac_offset(skb));
+	segs = udp_rcv_segment(sk, skb);
+	for (skb = segs; skb; skb = next) {
+		next = skb->next;
+		__skb_pull(skb, skb_transport_offset(skb));
+		ret = udp_queue_rcv_one_skb(sk, skb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(skb->dev), skb, -ret);
+	}
+	return 0;
+}
+
 /* For TCP sockets, sk_rx_dst is protected by socket lock
  * For UDP, we use xchg() to guard against concurrent changes.
  */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8e76e719305c..137c421bef82 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -558,7 +558,7 @@ void udpv6_encap_enable(void)
 }
 EXPORT_SYMBOL(udpv6_encap_enable);
 
-static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
 	int is_udplite = IS_UDPLITE(sk);
@@ -641,6 +641,31 @@ static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return -1;
 }
 
+void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
+			      bool have_final);
+
+static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct sk_buff *next, *segs;
+	int ret;
+
+	if (likely(!udp_unexpected_gso(sk, skb)))
+		return udpv6_queue_rcv_one_skb(sk, skb);
+
+	__skb_push(skb, -skb_mac_offset(skb));
+	segs = udp_rcv_segment(sk, skb);
+	for (skb = segs; skb; skb = next) {
+		next = skb->next;
+		__skb_pull(skb, skb_transport_offset(skb));
+
+		ret = udpv6_queue_rcv_one_skb(sk, skb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb, ret,
+						 true);
+	}
+	return 0;
+}
+
 static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk,
 				   __be16 loc_port, const struct in6_addr *loc_addr,
 				   __be16 rmt_port, const struct in6_addr *rmt_addr,
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v3 05/10] ipv6: factor out protocol delivery helper
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

So that we can re-use it at the UDP lavel in the next patch

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/ip6_input.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 96577e742afd..3065226bdc57 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -319,28 +319,26 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
 /*
  *	Deliver the packet to the host
  */
-
-
-static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
+			      bool have_final)
 {
 	const struct inet6_protocol *ipprot;
 	struct inet6_dev *idev;
 	unsigned int nhoff;
-	int nexthdr;
 	bool raw;
-	bool have_final = false;
 
 	/*
 	 *	Parse extension headers
 	 */
 
-	rcu_read_lock();
 resubmit:
 	idev = ip6_dst_idev(skb_dst(skb));
-	if (!pskb_pull(skb, skb_transport_offset(skb)))
-		goto discard;
 	nhoff = IP6CB(skb)->nhoff;
-	nexthdr = skb_network_header(skb)[nhoff];
+	if (!have_final) {
+		if (!pskb_pull(skb, skb_transport_offset(skb)))
+			goto discard;
+		nexthdr = skb_network_header(skb)[nhoff];
+	}
 
 resubmit_final:
 	raw = raw6_local_deliver(skb, nexthdr);
@@ -411,13 +409,19 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
 			consume_skb(skb);
 		}
 	}
-	rcu_read_unlock();
-	return 0;
+	return;
 
 discard:
 	__IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS);
-	rcu_read_unlock();
 	kfree_skb(skb);
+}
+
+static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	rcu_read_lock();
+	ip6_protocol_deliver_rcu(net, skb, 0, false);
+	rcu_read_unlock();
+
 	return 0;
 }
 
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v3 04/10] ip: factor out protocol delivery helper
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

So that we can re-use it at the UDP lavel in a later patch

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/ip_input.c | 73 ++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 37 deletions(-)

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 35a786c0aaa0..72250b4e466d 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -188,51 +188,50 @@ bool ip_call_ra_chain(struct sk_buff *skb)
 	return false;
 }
 
-static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int protocol)
 {
-	__skb_pull(skb, skb_network_header_len(skb));
-
-	rcu_read_lock();
-	{
-		int protocol = ip_hdr(skb)->protocol;
-		const struct net_protocol *ipprot;
-		int raw;
+	const struct net_protocol *ipprot;
+	int raw, ret;
 
-	resubmit:
-		raw = raw_local_deliver(skb, protocol);
+resubmit:
+	raw = raw_local_deliver(skb, protocol);
 
-		ipprot = rcu_dereference(inet_protos[protocol]);
-		if (ipprot) {
-			int ret;
-
-			if (!ipprot->no_policy) {
-				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-					kfree_skb(skb);
-					goto out;
-				}
-				nf_reset(skb);
+	ipprot = rcu_dereference(inet_protos[protocol]);
+	if (ipprot) {
+		if (!ipprot->no_policy) {
+			if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+				kfree_skb(skb);
+				return;
 			}
-			ret = ipprot->handler(skb);
-			if (ret < 0) {
-				protocol = -ret;
-				goto resubmit;
+			nf_reset(skb);
+		}
+		ret = ipprot->handler(skb);
+		if (ret < 0) {
+			protocol = -ret;
+			goto resubmit;
+		}
+		__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+	} else {
+		if (!raw) {
+			if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+				__IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
+				icmp_send(skb, ICMP_DEST_UNREACH,
+					  ICMP_PROT_UNREACH, 0);
 			}
-			__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+			kfree_skb(skb);
 		} else {
-			if (!raw) {
-				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-					__IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
-					icmp_send(skb, ICMP_DEST_UNREACH,
-						  ICMP_PROT_UNREACH, 0);
-				}
-				kfree_skb(skb);
-			} else {
-				__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
-				consume_skb(skb);
-			}
+			__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+			consume_skb(skb);
 		}
 	}
- out:
+}
+
+static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	__skb_pull(skb, skb_network_header_len(skb));
+
+	rcu_read_lock();
+	ip_protocol_deliver_rcu(net, skb, ip_hdr(skb)->protocol);
 	rcu_read_unlock();
 
 	return 0;
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v3 03/10] udp: add support for UDP_GRO cmsg
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
datagram size. User-space can use such info to compute the original
packets layout.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: I avoided setting a bit in cmsg_flag for UDP_GRO, as that
attempt produced some uglyfication, expecially on the ipv6 side
with no measurable performances benefits.
---
 include/linux/udp.h | 11 +++++++++++
 net/ipv4/udp.c      |  4 ++++
 net/ipv6/udp.c      |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index f613b329852e..e23d5024f42f 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -121,6 +121,17 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
 	return udp_sk(sk)->no_check6_rx;
 }
 
+static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
+				 struct sk_buff *skb)
+{
+	int gso_size;
+
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
+		gso_size = skb_shinfo(skb)->gso_size;
+		put_cmsg(msg, SOL_UDP, UDP_GRO, sizeof(gso_size), &gso_size);
+	}
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4d4f4d044c28..b345f71b1cbb 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1714,6 +1714,10 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 		memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
 		*addr_len = sizeof(*sin);
 	}
+
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (inet->cmsg_flags)
 		ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index fc0ce6c59ebb..8e76e719305c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -421,6 +421,9 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		*addr_len = sizeof(*sin6);
 	}
 
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (np->rxopt.all)
 		ip6_datagram_recv_common_ctl(sk, msg, skb);
 
-- 
2.17.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox