Netdev List
* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Andreas Schwab @ 2018-06-19 19:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <dd6f13bc-c5f2-85f8-c08d-837bc024fc7c@gmail.com>

On Jun 18 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, or crossing page boundaries)

DUMP_PREFIX_ADDRESS is useless for that purpose.

Here are some samples of broken csums:

[  853.849225] sungem: sungem wrong csum : 9886/07be, len 94 bytes, c0000001fa187e02
[  853.849232] raw data: 00000000: 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 45 10  ...C.b...Q....E.
[  853.849235] raw data: 00000010: 00 4c cb a0 40 00 40 11 d9 97 c0 a8 0a 01 c0 a8  .L..@.@.........
[  853.849237] raw data: 00000020: 0a 07 00 7b 00 7b 00 38 69 e1 1c 03 0c f7 00 00  ...{.{.8i.......
[  853.849240] raw data: 00000030: 08 f0 00 00 15 f0 c0 35 67 67 de d3 ca c9 d9 5b  .......5gg.....[
[  853.849242] raw data: 00000040: 1f ff de d3 d2 86 8f 67 fa f2 de d3 d2 86 8f 38  .......g.......8
[  853.849244] raw data: 00000050: 2f ff de d3 d2 86 8f 3b ff ff d1 93 bc 50        /......;.....P

[  857.883052] sungem: sungem wrong csum : dbb4/c48f, len 94 bytes, c0000001fa185882
[  857.883058] raw data: 00000000: 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 45 00  ...C.b...Q....E.
[  857.883070] raw data: 00000010: 00 4c a1 97 40 00 3a 11 ce ed d9 5b 2c 11 c0 a8  .L..@.:....[,...
[  857.883080] raw data: 00000020: 0a 07 00 7b 00 7b 00 38 14 4b 24 02 06 ea 00 00  ...{.{.8.K$.....
[  857.883085] raw data: 00000030: 00 0b 00 00 02 99 c0 a8 64 09 de d3 d2 5a 36 e4  ........d....Z6.
[  857.883090] raw data: 00000040: bc f5 de d3 d2 8a 8f 2c 17 44 de d3 d2 8a 93 8b  .......,.D......
[  857.883094] raw data: 00000050: d7 b7 de d3 d2 8a 93 97 69 6e 39 7b d2 5a        ........in9{.Z

[  858.124689] sungem: sungem wrong csum : 1f4f/02d0, len 118 bytes, c0000001fa185602
[  858.124700] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
[  858.124705] raw data: 00000010: 1e b1 00 3c 06 40 20 01 0a 62 17 11 88 01 00 00  ...<.@ ..b......
[  858.124709] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
[  858.124714] raw data: 00000030: 00 00 00 00 00 07 94 b4 00 16 86 f5 29 e8 36 cb  ............).6.
[  858.124718] raw data: 00000040: 50 49 80 18 05 93 9a 53 00 00 01 01 08 0a 58 b2  PI.....S......X.
[  858.124723] raw data: 00000050: de 54 61 5f 2f 3c 00 00 00 10 cc 08 55 f7 da 21  .Ta_/<......U..!
[  858.124727] raw data: 00000060: f4 60 0a 6b 3c aa b9 b3 7e 61 10 b8 c2 be 9a 0b  .`.k<...~a......
[  858.124731] raw data: 00000070: c7 e9 5b 97 1b ac                                ..[...

[  858.126522] sungem: sungem wrong csum : 0836/19e9, len 90 bytes, c0000001fa185382
[  858.126530] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
[  858.126532] raw data: 00000010: 1e b1 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
[  858.126535] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
[  858.126537] raw data: 00000030: 00 00 00 00 00 07 94 b4 00 16 86 f5 2a 04 36 cb  ............*.6.
[  858.126540] raw data: 00000040: 50 65 80 10 05 93 3e 56 00 00 01 01 08 0a 58 b2  Pe....>V......X.
[  858.126542] raw data: 00000050: de 56 61 5f 30 4d 1d 58 42 d2                    .Va_0M.XB.

[  858.131559] sungem: sungem wrong csum : 5891/c98d, len 90 bytes, c0000001fa185102
[  858.131567] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
[  858.131570] raw data: 00000010: 1e b1 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
[  858.131572] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
[  858.131574] raw data: 00000030: 00 00 00 00 00 07 94 b4 00 16 86 f5 2a 04 36 cb  ............*.6.
[  858.131577] raw data: 00000040: 50 a1 80 10 05 93 3e 10 00 00 01 01 08 0a 58 b2  P.....>.......X.
[  858.131579] raw data: 00000050: de 5b 61 5f 30 52 3f ea 70 9b                    .[a_0R?.p.
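
To sanity-check dumps like these offline, the 16-bit ones' complement sum can be recomputed in user space. The following is only a minimal sketch, not the kernel's csum code: ones_complement_sum and ipv4_hdr_sample are names made up here, and the sample bytes are the 20-byte IPv4 header of the first packet above (dump offsets 0x0e to 0x21). A header whose checksum field is consistent sums to 0xffff, which is the case for this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4 header of the first dumped packet (dump offsets 0x0e..0x21). */
static const uint8_t ipv4_hdr_sample[20] = {
	0x45, 0x10, 0x00, 0x4c, 0xcb, 0xa0, 0x40, 0x00, 0x40, 0x11,
	0xd9, 0x97, 0xc0, 0xa8, 0x0a, 0x01, 0xc0, 0xa8, 0x0a, 0x07,
};

/*
 * Hypothetical helper: 16-bit ones' complement sum over big-endian
 * words, as described in RFC 1071. An odd trailing byte is treated
 * as the high byte of a zero-padded word.
 */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	/* A 32-bit accumulator is sufficient for packet-sized inputs. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

Calling ones_complement_sum(ipv4_hdr_sample, sizeof(ipv4_hdr_sample)) yields 0xffff, so the IPv4 header checksum in that packet is consistent with the header bytes shown.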

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: [PATCH V2] brcmfmac: stop watchdog before detach and free everything
From: Kalle Valo @ 2018-06-19 19:06 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Michael Trimarchi, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, David S. Miller, Pieter-Paul Giesberts, Ian Molton,
	linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
	netdev, linux-kernel
In-Reply-To: <5B294920.5050909@broadcom.com>

Arend van Spriel <arend.vanspriel@broadcom.com> writes:

> On 5/30/2018 11:06 AM, Michael Trimarchi wrote:
>> Using the driver built into the kernel image without firmware in the
>> filesystem or in the kernel image can lead to a kernel NULL pointer
>> dereference. The watchdog needs to be stopped in brcmf_sdio_remove.
>>
>> The system is going down NOW!
>> [ 1348.110759] Unable to handle kernel NULL pointer dereference at
>> virtual address 000002f8
>> Sent SIGTERM to all processes
>> [ 1348.121412] Mem abort info:
>> [ 1348.126962]   ESR = 0x96000004
>> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
>> [ 1348.135948]   SET = 0, FnV = 0
>> [ 1348.138997]   EA = 0, S1PTW = 0
>> [ 1348.142154] Data abort info:
>> [ 1348.145045]   ISV = 0, ISS = 0x00000004
>> [ 1348.148884]   CM = 0, WnR = 0
>> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
>> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [ 1348.168927] Modules linked in: ipv6
>> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted
>> 4.17.0-rc5-next-20180517 #18
>> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
>> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
>> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
>> [ 1348.200253] sp : ffff00000b85be30
>> [ 1348.203561] x29: ffff00000b85be30 x28: 0000000000000000
>> [ 1348.208868] x27: ffff00000b6cb918 x26: ffff80003b990638
>> [ 1348.214176] x25: ffff0000087b1a20 x24: ffff80003b94f800
>> [ 1348.219483] x23: ffff000008e620c8 x22: ffff000008f0b660
>> [ 1348.224790] x21: ffff000008c6a858 x20: 00000000fffffe00
>> [ 1348.230097] x19: ffff80003b94f800 x18: 0000000000000001
>> [ 1348.235404] x17: 0000ffffab2e8a74 x16: ffff0000080d7de8
>> [ 1348.240711] x15: 0000000000000000 x14: 0000000000000400
>> [ 1348.246018] x13: 0000000000000400 x12: 0000000000000001
>> [ 1348.251324] x11: 00000000000002c4 x10: 0000000000000a10
>> [ 1348.256631] x9 : ffff00000b85bc40 x8 : ffff80003be11870
>> [ 1348.261937] x7 : ffff80003dfc7308 x6 : 000000078ff08b55
>> [ 1348.267243] x5 : 00000139e1058400 x4 : 0000000000000000
>> [ 1348.272550] x3 : dead000000000100 x2 : 958f2788d6618100
>> [ 1348.277856] x1 : 00000000fffffe00 x0 : 0000000000000000
>
> Forgot about this one.
>
> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>

Should this go to 4.18?

-- 
Kalle Valo


* Re: [PATCH] ucc_geth: Add BQL support
From: Joakim Tjernlund @ 2018-06-19 19:06 UTC (permalink / raw)
  To: leoyang.li@nxp.com, dave.taht@gmail.com; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAA93jw4fk1zCqSJq3pZ2uNKd8N=foChF56O1TX5At1yi3yPd5Q@mail.gmail.com>

On Tue, 2018-06-19 at 11:37 -0700, Dave Taht wrote:
> very happy to see this. is there a specific chip or devboard this runs on?

This driver is for MPC83xx family SoCs (possibly others as well) on our custom boards, used in
our telecom product.

You are actually the reason I implemented this :)

 Jocke

> 
> On Tue, Jun 19, 2018 at 11:24 AM, Li Yang <leoyang.li@nxp.com> wrote:
> > On Tue, Jun 19, 2018 at 11:30 AM, Joakim Tjernlund
> > <joakim.tjernlund@infinera.com> wrote:
> > > Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
> > 
> > Acked-by: Li Yang <leoyang.li@nxp.com>
> > 
> > > ---
> > >  drivers/net/ethernet/freescale/ucc_geth.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
> > > index f77ba9fa257b..6c99a9af6647 100644
> > > --- a/drivers/net/ethernet/freescale/ucc_geth.c
> > > +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> > > @@ -3096,6 +3096,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > 
> > >         ugeth_vdbg("%s: IN", __func__);
> > > 
> > > +       netdev_sent_queue(dev, skb->len);
> > >         spin_lock_irqsave(&ugeth->lock, flags);
> > > 
> > >         dev->stats.tx_bytes += skb->len;
> > > @@ -3242,6 +3243,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
> > >         struct ucc_geth_private *ugeth = netdev_priv(dev);
> > >         u8 __iomem *bd;         /* BD pointer */
> > >         u32 bd_status;
> > > +       int howmany = 0;
> > > +       unsigned int bytes_sent = 0;
> > > 
> > >         bd = ugeth->confBd[txQ];
> > >         bd_status = in_be32((u32 __iomem *)bd);
> > > @@ -3257,7 +3260,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
> > >                 skb = ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]];
> > >                 if (!skb)
> > >                         break;
> > > -
> > > +               howmany++;
> > > +               bytes_sent += skb->len;
> > >                 dev->stats.tx_packets++;
> > > 
> > >                 dev_consume_skb_any(skb);
> > > @@ -3279,6 +3283,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
> > >                 bd_status = in_be32((u32 __iomem *)bd);
> > >         }
> > >         ugeth->confBd[txQ] = bd;
> > > +       netdev_completed_queue(dev, howmany, bytes_sent);
> > >         return 0;
> > >  }
> > > 
> > > --
> > > 2.13.6
> > > 
> 
> 
> 
> --
> 
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619


* Re: [PATCH] ucc_geth: Add BQL support
From: Dave Taht @ 2018-06-19 18:37 UTC (permalink / raw)
  To: Li Yang; +Cc: Joakim Tjernlund, Netdev
In-Reply-To: <CADRPPNRWh=L2DhiWbPn4z5MXzwLutps_B-xZT9Rp4R5B0VMgzQ@mail.gmail.com>

very happy to see this. is there a specific chip or devboard this runs on?

On Tue, Jun 19, 2018 at 11:24 AM, Li Yang <leoyang.li@nxp.com> wrote:
> On Tue, Jun 19, 2018 at 11:30 AM, Joakim Tjernlund
> <joakim.tjernlund@infinera.com> wrote:
>> Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
>
> Acked-by: Li Yang <leoyang.li@nxp.com>
>
>> ---
>>  drivers/net/ethernet/freescale/ucc_geth.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
>> index f77ba9fa257b..6c99a9af6647 100644
>> --- a/drivers/net/ethernet/freescale/ucc_geth.c
>> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
>> @@ -3096,6 +3096,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>
>>         ugeth_vdbg("%s: IN", __func__);
>>
>> +       netdev_sent_queue(dev, skb->len);
>>         spin_lock_irqsave(&ugeth->lock, flags);
>>
>>         dev->stats.tx_bytes += skb->len;
>> @@ -3242,6 +3243,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
>>         struct ucc_geth_private *ugeth = netdev_priv(dev);
>>         u8 __iomem *bd;         /* BD pointer */
>>         u32 bd_status;
>> +       int howmany = 0;
>> +       unsigned int bytes_sent = 0;
>>
>>         bd = ugeth->confBd[txQ];
>>         bd_status = in_be32((u32 __iomem *)bd);
>> @@ -3257,7 +3260,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
>>                 skb = ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]];
>>                 if (!skb)
>>                         break;
>> -
>> +               howmany++;
>> +               bytes_sent += skb->len;
>>                 dev->stats.tx_packets++;
>>
>>                 dev_consume_skb_any(skb);
>> @@ -3279,6 +3283,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
>>                 bd_status = in_be32((u32 __iomem *)bd);
>>         }
>>         ugeth->confBd[txQ] = bd;
>> +       netdev_completed_queue(dev, howmany, bytes_sent);
>>         return 0;
>>  }
>>
>> --
>> 2.13.6
>>



-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619


* Re: [PATCH] ucc_geth: Add BQL support
From: Li Yang @ 2018-06-19 18:24 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Netdev
In-Reply-To: <20180619163036.20578-1-joakim.tjernlund@infinera.com>

On Tue, Jun 19, 2018 at 11:30 AM, Joakim Tjernlund
<joakim.tjernlund@infinera.com> wrote:
> Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>

Acked-by: Li Yang <leoyang.li@nxp.com>

> ---
>  drivers/net/ethernet/freescale/ucc_geth.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
> index f77ba9fa257b..6c99a9af6647 100644
> --- a/drivers/net/ethernet/freescale/ucc_geth.c
> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> @@ -3096,6 +3096,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
>
>         ugeth_vdbg("%s: IN", __func__);
>
> +       netdev_sent_queue(dev, skb->len);
>         spin_lock_irqsave(&ugeth->lock, flags);
>
>         dev->stats.tx_bytes += skb->len;
> @@ -3242,6 +3243,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
>         struct ucc_geth_private *ugeth = netdev_priv(dev);
>         u8 __iomem *bd;         /* BD pointer */
>         u32 bd_status;
> +       int howmany = 0;
> +       unsigned int bytes_sent = 0;
>
>         bd = ugeth->confBd[txQ];
>         bd_status = in_be32((u32 __iomem *)bd);
> @@ -3257,7 +3260,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
>                 skb = ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]];
>                 if (!skb)
>                         break;
> -
> +               howmany++;
> +               bytes_sent += skb->len;
>                 dev->stats.tx_packets++;
>
>                 dev_consume_skb_any(skb);
> @@ -3279,6 +3283,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
>                 bd_status = in_be32((u32 __iomem *)bd);
>         }
>         ugeth->confBd[txQ] = bd;
> +       netdev_completed_queue(dev, howmany, bytes_sent);
>         return 0;
>  }
>
> --
> 2.13.6
>
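
The accounting in the patch quoted above pairs every netdev_sent_queue() call with a later netdev_completed_queue() call covering the same packets and bytes, reported once per completion pass. The user-space model below is only a sketch of that bookkeeping. toy_txq, toy_sent_queue, toy_completed_queue and toy_tx_reap are hypothetical names, not kernel APIs, and the model leaves out the queue stop/wake logic that BQL layers on top of these counters.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the per-queue BQL counters (assumption, not kernel code). */
struct toy_txq {
	unsigned int inflight_pkts;
	unsigned int inflight_bytes;
};

/* Transmit path: one call per packet handed to the hardware. */
static void toy_sent_queue(struct toy_txq *q, unsigned int bytes)
{
	q->inflight_pkts++;
	q->inflight_bytes += bytes;
}

/* Completion path: one call per completion pass, with batched totals. */
static void toy_completed_queue(struct toy_txq *q, unsigned int pkts,
				unsigned int bytes)
{
	/* Completing more than was sent means the two sides disagree. */
	assert(pkts <= q->inflight_pkts && bytes <= q->inflight_bytes);
	q->inflight_pkts -= pkts;
	q->inflight_bytes -= bytes;
}

/*
 * Walk the lengths of the packets the hardware has finished with,
 * accumulate the totals, and report them once at the end, in the
 * same shape as the howmany/bytes_sent loop in the patch.
 */
static unsigned int toy_tx_reap(struct toy_txq *q,
				const unsigned int *done_len, size_t n_done)
{
	unsigned int howmany = 0;
	unsigned int bytes_sent = 0;
	size_t i;

	for (i = 0; i < n_done; i++) {
		howmany++;
		bytes_sent += done_len[i];
	}
	toy_completed_queue(q, howmany, bytes_sent);
	return howmany;
}
```

In this model the in-flight counters only return to zero if the lengths summed on completion are the same values that were reported at transmit time, which is why both sides of the patch use skb->len.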


* Re: [PATCH] dt-bindings: Fix unbalanced quotation marks
From: Dmitry Torokhov @ 2018-06-19 18:23 UTC (permalink / raw)
  To: Jonathan Neuschäfer
  Cc: devicetree, Kukjin Kim, Krzysztof Kozlowski, Rob Herring,
	Mark Rutland, Linus Walleij, Thomas Gleixner, Jason Cooper,
	Marc Zyngier, Thierry Reding, Jonathan Hunter, Maxime Coquelin,
	Alexandre Torgue, Hauke Mehrtens, Rafał Miłecki,
	Ralf Baechle, Paul Burton, James Hogan, Madalin Bucur
In-Reply-To: <20180617143127.11421-1-j.neuschaefer@gmx.net>

On Sun, Jun 17, 2018 at 04:31:18PM +0200, Jonathan Neuschäfer wrote:
> diff --git a/Documentation/devicetree/bindings/input/touchscreen/hideep.txt b/Documentation/devicetree/bindings/input/touchscreen/hideep.txt
> index 121d9b7c79a2..1063c30d53f7 100644
> --- a/Documentation/devicetree/bindings/input/touchscreen/hideep.txt
> +++ b/Documentation/devicetree/bindings/input/touchscreen/hideep.txt
> @@ -32,7 +32,7 @@ i2c@00000000 {
>  		reg = <0x6c>;
>  		interrupt-parent = <&gpx1>;
>  		interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
> -		vdd-supply = <&ldo15_reg>";
> +		vdd-supply = <&ldo15_reg>;
>  		vid-supply = <&ldo18_reg>;
>  		reset-gpios = <&gpx1 5 0>;
>  		touchscreen-size-x = <1080>;

Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

-- 
Dmitry


* Re: [PATCH V2] brcmfmac: stop watchdog before detach and free everything
From: Arend van Spriel @ 2018-06-19 18:19 UTC (permalink / raw)
  To: Michael Trimarchi
  Cc: Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
	Kalle Valo, David S. Miller, Pieter-Paul Giesberts, Ian Molton,
	linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
	netdev, linux-kernel
In-Reply-To: <20180530090633.GA15390@panicking>

On 5/30/2018 11:06 AM, Michael Trimarchi wrote:
> Using the driver built into the kernel image without firmware in the
> filesystem or in the kernel image can lead to a kernel NULL pointer
> dereference. The watchdog needs to be stopped in brcmf_sdio_remove.
>
> The system is going down NOW!
> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8
> Sent SIGTERM to all processes
> [ 1348.121412] Mem abort info:
> [ 1348.126962]   ESR = 0x96000004
> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
> [ 1348.135948]   SET = 0, FnV = 0
> [ 1348.138997]   EA = 0, S1PTW = 0
> [ 1348.142154] Data abort info:
> [ 1348.145045]   ISV = 0, ISS = 0x00000004
> [ 1348.148884]   CM = 0, WnR = 0
> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 1348.168927] Modules linked in: ipv6
> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted 4.17.0-rc5-next-20180517 #18
> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
> [ 1348.200253] sp : ffff00000b85be30
> [ 1348.203561] x29: ffff00000b85be30 x28: 0000000000000000
> [ 1348.208868] x27: ffff00000b6cb918 x26: ffff80003b990638
> [ 1348.214176] x25: ffff0000087b1a20 x24: ffff80003b94f800
> [ 1348.219483] x23: ffff000008e620c8 x22: ffff000008f0b660
> [ 1348.224790] x21: ffff000008c6a858 x20: 00000000fffffe00
> [ 1348.230097] x19: ffff80003b94f800 x18: 0000000000000001
> [ 1348.235404] x17: 0000ffffab2e8a74 x16: ffff0000080d7de8
> [ 1348.240711] x15: 0000000000000000 x14: 0000000000000400
> [ 1348.246018] x13: 0000000000000400 x12: 0000000000000001
> [ 1348.251324] x11: 00000000000002c4 x10: 0000000000000a10
> [ 1348.256631] x9 : ffff00000b85bc40 x8 : ffff80003be11870
> [ 1348.261937] x7 : ffff80003dfc7308 x6 : 000000078ff08b55
> [ 1348.267243] x5 : 00000139e1058400 x4 : 0000000000000000
> [ 1348.272550] x3 : dead000000000100 x2 : 958f2788d6618100
> [ 1348.277856] x1 : 00000000fffffe00 x0 : 0000000000000000

Forgot about this one.

Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
> ---
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> index 412a05b..061f69d 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> @@ -4294,6 +4294,13 @@ void brcmf_sdio_remove(struct brcmf_sdio *bus)
>   	brcmf_dbg(TRACE, "Enter\n");
>
>   	if (bus) {
> +		/* Stop watchdog task */
> +		if (bus->watchdog_tsk) {
> +			send_sig(SIGTERM, bus->watchdog_tsk, 1);
> +			kthread_stop(bus->watchdog_tsk);
> +			bus->watchdog_tsk = NULL;
> +		}
> +
>   		/* De-register interrupt handler */
>   		brcmf_sdiod_intr_unregister(bus->sdiodev);
>
>


* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Saeed Mahameed @ 2018-06-19 18:05 UTC (permalink / raw)
  To: eric.dumazet@gmail.com, kafai@fb.com, Tariq Toukan
  Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <1ddecaaa-9613-03ba-d761-a4d3410c4f7d@gmail.com>

On Thu, 2018-06-14 at 16:49 -0700, Eric Dumazet wrote:
> 
> On 06/14/2018 02:04 PM, Saeed Mahameed wrote:
> 
> > I was looking at the code without my fix :)
> > 
> > with my fix:
> > release = frags->page_offset + frag_info->frag_stride > PAGE_SIZE;
> > 
> > for XDP: frag_info->frag_stride is PAGE_SIZE, so release will
> > always be
> > true regardless of PAGE_SIZE.
> > 
> > So I guess I didn't quite understand your PowerPC concern. Can you
> > elaborate?
> > 
> 
> So your maths with PAGE_SIZE=65536 and MTU 9000
> 
> frag_stride is about 9344
> 
> So if the last chunk of the page has 9100 bytes, we won't be able to
> use it, while really we should be able to use it.
> 
> 

This is only true for the XDP setup. For non-XDP, the maximum stride
size can only be around 3k, and only for an MTU above roughly 6k.

For XDP setup you suggested:
-               priv->frag_info[0].frag_size = eff_mtu;
+               priv->frag_info[0].frag_size = PAGE_SIZE;

currently the condition is:

release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;

so my solution and yours have the same problem you described above.

The problem is not with the initial values or with the stride/frag size
math; it is just that in XDP we shouldn't reuse at all. I agree with you
that we need to optimize, and maybe for PAGE_SIZE > 8k we need to allow
the XDP setup to reuse pages. But for now there is a data corruption to handle.
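
To make the numbers concrete, here is a small user-space sketch of the release condition quoted above. frag_should_release and page_bytes_left are hypothetical helpers, not driver code, and the values used with them (PAGE_SIZE of 65536, stride of 9344, 9100 bytes left) are the ones from Eric's example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the quoted condition: release = page_offset + stride > PAGE_SIZE. */
static bool frag_should_release(uint32_t page_offset, uint32_t stride,
				uint32_t page_size)
{
	assert(page_offset <= page_size);
	return page_offset + stride > page_size;
}

/* Bytes still unused in the page when the next frag would start at page_offset. */
static uint32_t page_bytes_left(uint32_t page_offset, uint32_t page_size)
{
	assert(page_offset <= page_size);
	return page_size - page_offset;
}
```

With 9100 bytes left (offset 56436), frag_should_release(56436, 9344, 65536) is true, so the tail of the page is given up, which is the case Eric describes. With the stride equal to PAGE_SIZE, any non-zero offset makes the condition true, whatever the page size.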


* [RFC v2 PATCH 3/4] ebpf: Add sample ebpf program for SOCKET_SG_FILTER
From: Tushar Dave @ 2018-06-19 18:00 UTC (permalink / raw)
  To: ast, daniel, davem, jakub.kicinski, quentin.monnet, jiong.wang,
	guro, sandipan, john.fastabend, kafai, rdna, brakmo, netdev, acme,
	sowmini.varadhan
In-Reply-To: <1529431217-5264-1-git-send-email-tushar.n.dave@oracle.com>

Add a sample program that shows how a socksg program is used and
attached to a socket as a filter. The kernel-side sample program
operates on the struct scatterlist that is passed as the BPF context.

When run in server mode, the sample RDS program opens a PF_RDS socket
and attaches the eBPF program to it. The eBPF program then uses the
bpf_sg_next helper along with BPF tail calls to retrieve packet data
held in struct scatterlist form.

To ease testing, RDS client functionality is also added so that users
can generate RDS packets.

Server:
[root@lab71 bpf]# ./rds_filter -s 192.168.3.71 -t tcp
running server in a loop
transport tcp
server bound to address: 192.168.3.71 port 4000
server listening on 192.168.3.71

Client:
[root@lab70 bpf]# ./rds_filter -s 192.168.3.71 -c 192.168.3.70 -t tcp
transport tcp
client bound to address: 192.168.3.70 port 25278
client sending 8192 byte message  from 192.168.3.70 to 192.168.3.71 on
port 25278
payload contains:30 31 32 33 34 35 36 37 38 39 ...

Server output:
192.168.3.71 received a packet from 192.168.3.71 of len 8192 cmsg len 0,
on port 25278
payload contains:30 31 32 33 34 35 36 37 38 39 ...
server listening on 192.168.3.71
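
Note that the server line above reports 192.168.3.71 as both receiver and sender, although the client is bound to 192.168.3.70. That is the pattern one would expect when inet_ntoa() is called twice within a single printf(): it returns a pointer to one static buffer, so both %s arguments can end up pointing at the same string. A minimal sketch of one way to avoid this, using inet_ntop() with separate buffers, follows. format_peers is a hypothetical helper and not part of this patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the local and peer IPv4 addresses into
 * one message. Each address gets its own buffer, so neither string
 * can overwrite the other.
 * Returns the snprintf() result, or -1 if an address cannot be converted.
 */
static int format_peers(const struct sockaddr_in *local,
			const struct sockaddr_in *peer,
			char *out, size_t outlen)
{
	char lbuf[INET_ADDRSTRLEN];
	char pbuf[INET_ADDRSTRLEN];

	assert(local != NULL && peer != NULL && out != NULL);

	if (!inet_ntop(AF_INET, &local->sin_addr, lbuf, sizeof(lbuf)))
		return -1;
	if (!inet_ntop(AF_INET, &peer->sin_addr, pbuf, sizeof(pbuf)))
		return -1;

	return snprintf(out, outlen, "%s received a packet from %s",
			lbuf, pbuf);
}
```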

BPF program output:
[root@lab71]# cat /sys/kernel/debug/tracing/trace_pipe
          <idle>-0     [007] ..s.   525.994894: 0: Print first 6 bytes from sg element
          <idle>-0     [007] ..s.   525.994897: 0: First sg element:
          <idle>-0     [007] ..s.   525.994899: 0: 30 31 32
          <idle>-0     [007] ..s.   525.994900: 0: 33 34 35
          <idle>-0     [007] ..s.   525.994901: 0: next sg element:
          <idle>-0     [007] ..s.   525.994902: 0: a8 a9 aa
          <idle>-0     [007] ..s.   525.994903: 0: ab ac ad
          <idle>-0     [007] ..s.   525.994904: 0: next sg element:
          <idle>-0     [007] ..s.   525.994905: 0: 50 51 52
          <idle>-0     [007] ..s.   525.994905: 0: 53 54 55
          <idle>-0     [007] ..s.   525.994906: 0: next sg element:
          <idle>-0     [007] ..s.   525.994907: 0: f8 f9 fa
          <idle>-0     [007] ..s.   525.994907: 0: fb fc fd
          <idle>-0     [007] ..s.   525.994908: 0: next sg element:
          <idle>-0     [007] ..s.   525.994909: 0: a0 a1 a2
          <idle>-0     [007] ..s.   525.994909: 0: a3 a4 a5
          <idle>-0     [007] ..s.   525.994910: 0: next sg element:
          <idle>-0     [007] ..s.   525.994911: 0: 48 49 4a
          <idle>-0     [007] ..s.   525.994911: 0: 4b 4c 4d
          <idle>-0     [007] ..s.   525.994912: 0: no more sg element

Similarly, specifying '-t ib' runs this over an IB link.
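
The first byte printed for each sg element can be reproduced in user space. The sketch below fills a buffer the way create_message() in this patch does (buf[i] = i + 0x30) and reads the first byte of each chunk. The chunk sizes are an assumption, not something this patch states: a 1400-byte first element followed by 1448-byte elements happens to match the trace above. fill_message, chunk_offset and chunk_first_byte are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MSG_SIZE	8192	/* same size as BUFSIZE in the sample */
#define FIRST_CHUNK	1400	/* assumed, chosen to match the trace */
#define NEXT_CHUNK	1448	/* assumed, chosen to match the trace */

/* Same fill pattern as create_message(): byte i holds (i + 0x30) mod 256. */
static void fill_message(unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = (unsigned char)(i + 0x30);
}

/* Byte offset at which chunk n starts under the assumed chunk sizes. */
static size_t chunk_offset(unsigned int n)
{
	if (n == 0)
		return 0;
	return FIRST_CHUNK + (size_t)(n - 1) * NEXT_CHUNK;
}

/* First payload byte of chunk n. */
static unsigned char chunk_first_byte(const unsigned char *buf, size_t len,
				      unsigned int n)
{
	size_t off = chunk_offset(n);

	assert(off < len);
	return buf[off];
}
```

Under these assumed sizes the six chunks start with 0x30, 0xa8, 0x50, 0xf8, 0xa0 and 0x48, and a seventh chunk would start past the end of the 8192-byte message, which agrees with the six elements in the trace.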

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 samples/bpf/Makefile          |   3 +
 samples/bpf/rds_filter_kern.c |  78 ++++++++++
 samples/bpf/rds_filter_user.c | 339 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 420 insertions(+)
 create mode 100644 samples/bpf/rds_filter_kern.c
 create mode 100644 samples/bpf/rds_filter_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 1303af1..5de238b 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -52,6 +52,7 @@ hostprogs-y += xdp_adjust_tail
 hostprogs-y += xdpsock
 hostprogs-y += xdp_fwd
 hostprogs-y += task_fd_query
+hostprogs-y += rds_filter
 
 # Libbpf dependencies
 LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -107,6 +108,7 @@ xdp_adjust_tail-objs := xdp_adjust_tail_user.o
 xdpsock-objs := bpf_load.o xdpsock_user.o
 xdp_fwd-objs := bpf_load.o xdp_fwd_user.o
 task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
+rds_filter-objs := bpf_load.o rds_filter_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -163,6 +165,7 @@ always += xdp_adjust_tail_kern.o
 always += xdpsock_kern.o
 always += xdp_fwd_kern.o
 always += task_fd_query_kern.o
+always += rds_filter_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/rds_filter_kern.c b/samples/bpf/rds_filter_kern.c
new file mode 100644
index 0000000..8fe3d3c
--- /dev/null
+++ b/samples/bpf/rds_filter_kern.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/filter.h>
+#include <linux/ptrace.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include <linux/rds.h>
+#include "bpf_helpers.h"
+
+#define PROG(F) SEC("socksg/"__stringify(F)) int bpf_func_##F
+
+#define bpf_printk(fmt, ...)				\
+({							\
+	char ____fmt[] = fmt;				\
+	bpf_trace_printk(____fmt, sizeof(____fmt),	\
+			##__VA_ARGS__);			\
+})
+
+struct bpf_map_def SEC("maps") jmp_table = {
+	.type = BPF_MAP_TYPE_PROG_ARRAY,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(u32),
+	.max_entries = 2,
+};
+
+#define SG1 1
+
+static inline void dump_sg(struct sg_filter_md *sg)
+{
+	void *data = (void *)(long) sg->data;
+	void *data_end = (void *)(long) sg->data_end;
+	unsigned char *d;
+
+	if (data + 8 > data_end)
+		return;
+
+	d = (unsigned char *)data;
+	bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+	bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+
+	return;
+
+}
+
+static void sg_dispatcher(struct sg_filter_md *sg)
+{
+	int ret;
+
+	ret = bpf_sg_next(sg);
+	if (ret == -ENODATA) {
+		bpf_printk("no more sg element\n");
+		return;
+	}
+
+	/* We use same function to walk sg list */
+	bpf_tail_call(sg, &jmp_table, 1);
+}
+
+/* walk sg list */
+PROG(SG1)(struct sg_filter_md *sg)
+{
+	bpf_printk("next sg element:\n");
+	dump_sg(sg);
+	sg_dispatcher(sg);
+	return 0;
+}
+
+SEC("socksg/0")
+int main_prog(struct sg_filter_md *sg)
+{
+	bpf_printk("Print first 6 bytes from sg element\n");
+	bpf_printk("First sg element:\n");
+	dump_sg(sg);
+	sg_dispatcher(sg);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/rds_filter_user.c b/samples/bpf/rds_filter_user.c
new file mode 100644
index 0000000..1165f1e
--- /dev/null
+++ b/samples/bpf/rds_filter_user.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <arpa/inet.h>
+#include <assert.h>
+#include "bpf_load.h"
+#include <getopt.h>
+#include <errno.h>
+#include <netinet/in.h>
+#include <limits.h>
+#include <linux/sockios.h>
+#include <linux/rds.h>
+#include <linux/errqueue.h>
+#include <linux/bpf.h>
+#include <strings.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#define TESTPORT	4000
+#define BUFSIZE		8192
+
+int transport = -1;
+
+static int str2trans(const char *trans)
+{
+	if (strcmp(trans, "tcp") == 0)
+		return RDS_TRANS_TCP;
+	if (strcmp(trans, "ib") == 0)
+		return RDS_TRANS_IB;
+	return (RDS_TRANS_NONE);
+}
+
+static const char *trans2str(int trans)
+{
+	switch (trans) {
+	case RDS_TRANS_TCP:
+		return ("tcp");
+	case RDS_TRANS_IB:
+		return ("ib");
+	case RDS_TRANS_NONE:
+		return ("none");
+	default:
+		return ("unknown");
+	}
+}
+
+static int gettransport(int sock)
+{
+	int err;
+	int val;
+	socklen_t len = sizeof(val);
+
+	err = getsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+			 (char *)&val, &len);
+	if (err < 0) {
+		fprintf(stderr, "%s: getsockopt %s\n",
+			__func__, strerror(errno));
+		return err;
+	}
+	return (int)val;
+}
+
+static int settransport(int sock, int transport)
+{
+	int err;
+
+	err = setsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+			 (char *)&transport, sizeof(transport));
+	if (err < 0) {
+		fprintf(stderr, "could not set transport %s, %s\n",
+			trans2str(transport), strerror(errno));
+	}
+	return err;
+}
+
+static void print_sock_local_info(int fd, char *str, struct sockaddr_in *ret)
+{
+	socklen_t sin_size = sizeof(struct sockaddr_in);
+	struct sockaddr_in sin;
+	int err;
+
+	err = getsockname(fd, (struct sockaddr *)&sin, &sin_size);
+	if (err < 0) {
+		fprintf(stderr, "%s getsockname %s\n",
+			__func__, strerror(errno));
+		return;
+	}
+	printf("%s address: %s port %d\n",
+		(str ? str : ""), inet_ntoa(sin.sin_addr), ntohs(sin.sin_port));
+
+	if (ret != NULL)
+		*ret = sin;
+}
+
+static void print_payload(char *buf)
+{
+	int i;
+
+	printf("payload contains:");
+	for (i = 0; i < 10; i++)
+		printf("%x ", buf[i]);
+	printf("...\n");
+}
+
+static void server(char *address, in_port_t port)
+{
+	struct sockaddr_in sin, din;
+	struct msghdr msg;
+	struct iovec *iov;
+	int rc, sock;
+	char *buf;
+
+	buf = calloc(BUFSIZE, sizeof(char));
+	if (!buf) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		return;
+	}
+
+	sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+	if (sock < 0) {
+		fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+		goto out;
+	}
+	if (settransport(sock, transport) < 0)
+		goto out;
+
+	printf("transport %s\n", trans2str(gettransport(sock)));
+
+	memset(&sin, 0, sizeof(sin));
+	sin.sin_family = AF_INET;
+	sin.sin_addr.s_addr = inet_addr(address);
+	sin.sin_port = htons(port);
+
+	rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+	if (rc < 0) {
+		fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+		goto out;
+	}
+
+	/* attach bpf prog */
+	assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd[1],
+			  sizeof(prog_fd[0])) == 0);
+
+	print_sock_local_info(sock, "server bound to", NULL);
+
+	iov = calloc(1, sizeof(struct iovec));
+	if (!iov) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		goto out;
+	}
+
+	while (1) {
+		memset(buf, 0, BUFSIZE);
+		iov[0].iov_base = buf;
+		iov[0].iov_len = BUFSIZE;
+
+		memset(&msg, 0, sizeof(msg));
+		msg.msg_name = &din;
+		msg.msg_namelen = sizeof(din);
+		msg.msg_iov = iov;
+		msg.msg_iovlen = 1;
+
+		printf("server listening on %s\n", inet_ntoa(sin.sin_addr));
+
+		rc = recvmsg(sock, &msg, 0);
+		if (rc < 0) {
+			fprintf(stderr, "%s: recvmsg %s\n",
+				__func__, strerror(errno));
+			break;
+		}
+
+		printf("%s received a packet from %s of len %d cmsg len %d, on port %d\n",
+			inet_ntoa(sin.sin_addr),
+			inet_ntoa(din.sin_addr),
+			(uint32_t) iov[0].iov_len,
+			(uint32_t) msg.msg_controllen,
+			ntohs(din.sin_port));
+
+		print_payload(buf);
+	}
+	free(iov);
+out:
+	free(buf);
+}
+
+static void create_message(char *buf)
+{
+	unsigned int i;
+
+	for (i = 0; i < BUFSIZE; i++) {
+		buf[i] = i + 0x30;
+	}
+}
+
+static int build_rds_packet(struct msghdr *msg, char *buf)
+{
+	struct iovec *iov;
+
+	iov = calloc(1, sizeof(struct iovec));
+	if (!iov) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		return -1;
+	}
+
+	msg->msg_iov = iov;
+	msg->msg_iovlen = 1;
+
+	iov[0].iov_base = buf;
+	iov[0].iov_len = BUFSIZE * sizeof(char);
+
+	return 0;
+}
+
+static void client(char *localaddr, char *remoteaddr, in_port_t server_port)
+{
+	struct sockaddr_in sin, din;
+	struct msghdr msg;
+	int rc, sock;
+	char *buf;
+
+	buf = calloc(BUFSIZE, sizeof(char));
+	if (!buf) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		return;
+	}
+
+	create_message(buf);
+
+	sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+	if (sock < 0) {
+		fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+		goto out;
+	}
+
+	if (settransport(sock, transport) < 0)
+		goto out;
+
+	printf("transport %s\n", trans2str(gettransport(sock)));
+
+	memset(&sin, 0, sizeof(sin));
+	sin.sin_family = AF_INET;
+	sin.sin_addr.s_addr = inet_addr(localaddr);
+	sin.sin_port = 0;
+
+	rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+	if (rc < 0) {
+		fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+		goto out;
+	}
+	print_sock_local_info(sock, "client bound to",  &sin);
+
+	memset(&msg, 0, sizeof(msg));
+	msg.msg_name = &din;
+	msg.msg_namelen = sizeof(din);
+
+	memset(&din, 0, sizeof(din));
+	din.sin_family = AF_INET;
+	din.sin_addr.s_addr = inet_addr(remoteaddr);
+	din.sin_port = htons(server_port);
+
+	rc = build_rds_packet(&msg, buf);
+	if (rc < 0)
+		goto out;
+
+	printf("client sending %d byte message from %s to %s on port %d\n",
+		(uint32_t) msg.msg_iov->iov_len, localaddr,
+		remoteaddr, ntohs(din.sin_port));
+
+	rc = sendmsg(sock, &msg, 0);
+	if (rc < 0)
+		fprintf(stderr, "%s: sendmsg %s\n", __func__, strerror(errno));
+
+	print_payload(buf);
+
+	if (msg.msg_control)
+		free(msg.msg_control);
+	if (msg.msg_iov)
+		free(msg.msg_iov);
+out:
+	free(buf);
+
+	return;
+}
+
+static void usage(char *progname)
+{
+	fprintf(stderr, "Usage %s [-s srvaddr] [-c clientaddr] [-t transport]"
+		"\n", progname);
+}
+
+int main(int argc, char **argv)
+{
+	in_port_t server_port = TESTPORT;
+	char *serveraddr = NULL;
+	char *clientaddr = NULL;
+	char filename[256];
+	int opt;
+
+	while ((opt = getopt(argc, argv, "s:c:t:")) != -1) {
+		switch (opt) {
+		case 's':
+			serveraddr = optarg;
+			break;
+		case 'c':
+			clientaddr = optarg;
+			break;
+		case 't':
+			transport = str2trans(optarg);
+			if (transport == RDS_TRANS_NONE) {
+				fprintf(stderr,
+					"unknown transport %s\n", optarg);
+				usage(argv[0]);
+				return (-1);
+			}
+			break;
+		default:
+			usage(argv[0]);
+			return 1;
+		}
+	}
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		fprintf(stderr, "Error: load_bpf_file %s\n", bpf_log_buf);
+		return 1;
+	}
+
+	if (serveraddr && !clientaddr) {
+		printf("running server in a loop\n");
+		server(serveraddr, server_port);
+	} else if (serveraddr && clientaddr) {
+		client(clientaddr, serveraddr, server_port);
+	}
+
+	return 0;
+}
-- 
1.8.3.1

* [RFC v2 PATCH 4/4] rds: invoke socket sg filter attached to rds socket
From: Tushar Dave @ 2018-06-19 18:00 UTC (permalink / raw)
  To: ast, daniel, davem, jakub.kicinski, quentin.monnet, jiong.wang,
	guro, sandipan, john.fastabend, kafai, rdna, brakmo, netdev, acme,
	sowmini.varadhan
In-Reply-To: <1529431217-5264-1-git-send-email-tushar.n.dave@oracle.com>

The RDS module sits on top of TCP (rds_tcp) and IB (rds_rdma), so messages
arrive as an skb (over TCP) or as a scatterlist (over IB/RDMA).
However, because a socket filter only deals with skbs (i.e. struct sk_buff
is the bpf context), a socket filter can be used only for rds_tcp and not
for rds_rdma.

Considering a single filtering solution for RDS, the common denominator
between sk_buff and scatterlist is the scatterlist. Therefore, this patch
converts the skb to an sgvec and invokes sg_filter_run for rds_tcp, and
simply invokes sg_filter_run for IB/rds_rdma.
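
For illustration only (not part of the patch): a user-space sketch of
the get/put dispatch described above. All names here (frag_list,
incoming, transport_ops, run_filter_model, ...) are hypothetical
stand-ins, not RDS symbols. The IB-like transport lends out a list it
already owns and needs no put; the TCP-like transport builds a
temporary list that has to be released afterwards.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scatterlist: only an element count. */
struct frag_list {
	int nfrags;
};

/* Hypothetical stand-in for an incoming message. */
struct incoming {
	struct frag_list frags;
};

struct transport_ops {
	int (*to_sg_get)(struct incoming *inc, struct frag_list **sg);
	void (*to_sg_put)(struct frag_list **sg);	/* optional, may be NULL */
};

static int put_calls;	/* counts releases, for demonstration only */

/* IB-like: the message already is a list, so just lend it out. */
static int ib_like_get(struct incoming *inc, struct frag_list **sg)
{
	*sg = &inc->frags;
	return 0;
}

/* TCP-like: build a temporary list that must be freed by put. */
static int tcp_like_get(struct incoming *inc, struct frag_list **sg)
{
	struct frag_list *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	*copy = inc->frags;
	*sg = copy;
	return 0;
}

static void tcp_like_put(struct frag_list **sg)
{
	free(*sg);
	*sg = NULL;
	put_calls++;
}

static const struct transport_ops ib_like = { ib_like_get, NULL };
static const struct transport_ops tcp_like = { tcp_like_get, tcp_like_put };

/* Trivial "filter": reports how many elements it was handed. */
static int count_frags(const struct frag_list *sg)
{
	return sg->nfrags;
}

/* Get the list from the transport, run the filter, release if needed. */
static int run_filter_model(const struct transport_ops *ops,
			    struct incoming *inc,
			    int (*filter)(const struct frag_list *sg))
{
	struct frag_list *sg = NULL;
	int verdict;

	if (!ops->to_sg_get || ops->to_sg_get(inc, &sg) != 0)
		return -1;

	verdict = filter(sg);
	if (ops->to_sg_put)
		ops->to_sg_put(&sg);

	return verdict;
}
```

Keeping put optional means a transport that only lends out an existing
list does not have to provide an empty callback.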

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/rds/ib.c       |  1 +
 net/rds/ib.h       |  1 +
 net/rds/ib_recv.c  | 12 ++++++++++++
 net/rds/rds.h      |  2 ++
 net/rds/recv.c     | 16 ++++++++++++++++
 net/rds/tcp.c      |  2 ++
 net/rds/tcp.h      |  2 ++
 net/rds/tcp_recv.c | 38 ++++++++++++++++++++++++++++++++++++++
 8 files changed, 74 insertions(+)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index 02deee2..3027832 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -421,6 +421,7 @@ struct rds_transport rds_ib_transport = {
 	.conn_path_shutdown	= rds_ib_conn_path_shutdown,
 	.inc_copy_to_user	= rds_ib_inc_copy_to_user,
 	.inc_free		= rds_ib_inc_free,
+	.inc_to_sg_get		= rds_ib_inc_to_sg_get,
 	.cm_initiate_connect	= rds_ib_cm_initiate_connect,
 	.cm_handle_connect	= rds_ib_cm_handle_connect,
 	.cm_connect_complete	= rds_ib_cm_connect_complete,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index a6f4d7d..699b5b9b 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -375,6 +375,7 @@ void rds_ib_cm_connect_complete(struct rds_connection *conn,
 void rds_ib_recv_free_caches(struct rds_ib_connection *ic);
 void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp);
 void rds_ib_inc_free(struct rds_incoming *inc);
+int rds_ib_inc_to_sg_get(struct rds_incoming *inc, struct scatterlist **sg);
 int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
 void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc,
 			     struct rds_ib_ack_state *state);
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index b4e421a..62be497 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -219,6 +219,18 @@ void rds_ib_inc_free(struct rds_incoming *inc)
 	rds_ib_recv_cache_put(&ibinc->ii_cache_entry, &ic->i_cache_incs);
 }
 
+int rds_ib_inc_to_sg_get(struct rds_incoming *inc, struct scatterlist **sg)
+{
+	struct rds_ib_incoming *ibinc;
+	struct rds_page_frag *frag;
+
+	ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
+	frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
+	*sg =  &frag->f_sg;
+
+	return 0;
+}
+
 static void rds_ib_recv_clear_one(struct rds_ib_connection *ic,
 				  struct rds_ib_recv_work *recv)
 {
diff --git a/net/rds/rds.h b/net/rds/rds.h
index b04c333..f5ea833 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -528,6 +528,8 @@ struct rds_transport {
 	int (*recv_path)(struct rds_conn_path *cp);
 	int (*inc_copy_to_user)(struct rds_incoming *inc, struct iov_iter *to);
 	void (*inc_free)(struct rds_incoming *inc);
+	int (*inc_to_sg_get)(struct rds_incoming *inc, struct scatterlist **sg);
+	void (*inc_to_sg_put)(struct scatterlist **sg);
 
 	int (*cm_handle_connect)(struct rdma_cm_id *cm_id,
 				 struct rdma_cm_event *event);
diff --git a/net/rds/recv.c b/net/rds/recv.c
index dc67458..e0c5b4c 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -286,6 +286,7 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
 	struct sock *sk;
 	unsigned long flags;
 	struct rds_conn_path *cp;
+	struct sk_filter *filter;
 
 	inc->i_conn = conn;
 	inc->i_rx_jiffies = jiffies;
@@ -369,6 +370,21 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
 	/* We can be racing with rds_release() which marks the socket dead. */
 	sk = rds_rs_to_sk(rs);
 
+	rcu_read_lock();
+	filter = rcu_dereference(sk->sk_filter);
+	if (filter) {
+		if (conn->c_trans->inc_to_sg_get) {
+			struct scatterlist *sg;
+
+			if (conn->c_trans->inc_to_sg_get(inc, &sg) == 0) {
+				sg_filter_run(sk, sg);
+				if (conn->c_trans->inc_to_sg_put)
+					conn->c_trans->inc_to_sg_put(&sg);
+			}
+		}
+	}
+	rcu_read_unlock();
+
 	/* serialize with rds_release -> sock_orphan */
 	write_lock_irqsave(&rs->rs_recv_lock, flags);
 	if (!sock_flag(sk, SOCK_DEAD)) {
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351a284..b431854 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -376,6 +376,8 @@ struct rds_transport rds_tcp_transport = {
 	.conn_path_shutdown	= rds_tcp_conn_path_shutdown,
 	.inc_copy_to_user	= rds_tcp_inc_copy_to_user,
 	.inc_free		= rds_tcp_inc_free,
+	.inc_to_sg_get		= rds_tcp_inc_to_sg_get,
+	.inc_to_sg_put		= rds_tcp_inc_to_sg_put,
 	.stats_info_copy	= rds_tcp_stats_info_copy,
 	.exit			= rds_tcp_exit,
 	.t_owner		= THIS_MODULE,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index c6fa080..466bdb9 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -82,6 +82,8 @@ void rds_tcp_restore_callbacks(struct socket *sock,
 int rds_tcp_recv_path(struct rds_conn_path *cp);
 void rds_tcp_inc_free(struct rds_incoming *inc);
 int rds_tcp_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
+int rds_tcp_inc_to_sg_get(struct rds_incoming *inc, struct scatterlist **sg);
+void rds_tcp_inc_to_sg_put(struct scatterlist **sg);
 
 /* tcp_send.c */
 void rds_tcp_xmit_path_prepare(struct rds_conn_path *cp);
diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index b9fbd2e..ce62712 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -56,6 +56,44 @@ void rds_tcp_inc_free(struct rds_incoming *inc)
 	kmem_cache_free(rds_tcp_incoming_slab, tinc);
 }
 
+#define MAX_SG 17
+int rds_tcp_inc_to_sg_get(struct rds_incoming *inc, struct scatterlist **sg)
+{
+	struct scatterlist *sg_list;
+	struct rds_tcp_incoming *tinc;
+	struct sk_buff *skb;
+	int num_sg = 0;
+
+	tinc = container_of(inc, struct rds_tcp_incoming, ti_inc);
+
+	/* For now we are assuming that the max sg elements we need is MAX_SG.
+	 * To determine actual number of sg elements we need to traverse the
+	 * skb queue e.g.
+	 *
+	 * skb_queue_walk(&tinc->ti_skb_list, skb) {
+	 *	num_sg += skb_shinfo(skb)->nr_frags + 1;
+	 * }
+	 */
+	sg_list = kzalloc(sizeof(*sg_list) * MAX_SG, GFP_KERNEL);
+	if (!sg_list)
+		return -ENOMEM;
+
+	sg_init_table(sg_list, MAX_SG);
+	skb_queue_walk(&tinc->ti_skb_list, skb) {
+		num_sg += skb_to_sgvec_nomark(skb, &sg_list[num_sg], 0,
+					      skb->len);
+	}
+	sg_mark_end(&sg_list[num_sg - 1]);
+	*sg = sg_list;
+
+	return 0;
+}
+
+void rds_tcp_inc_to_sg_put(struct scatterlist **sg)
+{
+	kfree(*sg);
+}
+
 /*
  * this is pretty lame, but, whatever.
  */
-- 
1.8.3.1

* [RFC v2 PATCH 2/4] ebpf: Add sg_filter_run and sg helper
From: Tushar Dave @ 2018-06-19 18:00 UTC (permalink / raw)
  To: ast, daniel, davem, jakub.kicinski, quentin.monnet, jiong.wang,
	guro, sandipan, john.fastabend, kafai, rdna, brakmo, netdev, acme,
	sowmini.varadhan
In-Reply-To: <1529431217-5264-1-git-send-email-tushar.n.dave@oracle.com>

When sg_filter_run() is invoked, it runs the attached eBPF
SOCKET_SG_FILTER program, which operates on a struct scatterlist.

In addition, this patch adds the bpf_sg_next helper function, which
lets a program advance to the next sg element in the sg list.
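
For illustration only (not part of the patch): a user-space model of
the cursor semantics bpf_sg_next is meant to have. The types seg and
sg_cursor and both functions are hypothetical; sg_cursor merely mirrors
the fields of struct bpf_scatterlist, with a plain buffer/length pair
standing in for sg_virt() and sg->length.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist element. */
struct seg {
	unsigned char *buf;
	size_t len;
};

/* Mirrors the fields of struct bpf_scatterlist. */
struct sg_cursor {
	struct seg *sg;
	unsigned char *start;
	unsigned char *end;
	int cur_sg;
	int num_sg;
};

/* Point the cursor at the first element, as sg_filter_run() does. */
static void sg_cursor_init(struct sg_cursor *c, struct seg *sg, int num_sg)
{
	c->sg = sg;
	c->num_sg = num_sg;
	c->cur_sg = 0;
	c->start = sg[0].buf;
	c->end = c->start + sg[0].len;
}

/* Advance to the next element; leave the cursor untouched at the end. */
static int sg_cursor_next(struct sg_cursor *c)
{
	int next = c->cur_sg + 1;

	if (next >= c->num_sg)
		return -ENODATA;

	c->cur_sg = next;
	c->start = c->sg[next].buf;
	c->end = c->start + c->sg[next].len;
	return 0;
}
```

A caller would loop on sg_cursor_next() until it returns -ENODATA,
re-reading start and end after every successful call.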

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 include/linux/filter.h                    |  2 +
 include/uapi/linux/bpf.h                  | 10 ++++-
 net/core/filter.c                         | 72 +++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h            | 10 ++++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 ++
 5 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 71618b1..d176402 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1072,4 +1072,6 @@ struct bpf_sock_ops_kern {
 					 */
 };
 
+int sg_filter_run(struct sock *sk, struct scatterlist *sg);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ef0a7b6..036432b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2076,6 +2076,13 @@ struct bpf_stack_build_id {
  * 	Return
  * 		A 64-bit integer containing the current cgroup id based
  * 		on the cgroup within which the current task is running.
+ *
+ * int bpf_sg_next(struct bpf_scatterlist *sg)
+ *	Description
+ *		This helper allows user to retrieve next sg element from
+ *		sg list.
+ *	Return
+ *		Returns 0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2158,7 +2165,8 @@ struct bpf_stack_build_id {
 	FN(rc_repeat),			\
 	FN(rc_keydown),			\
 	FN(skb_cgroup_id),		\
-	FN(get_current_cgroup_id),
+	FN(get_current_cgroup_id),	\
+	FN(sg_next),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 8f67942..702ff5b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -121,6 +121,53 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
 }
 EXPORT_SYMBOL(sk_filter_trim_cap);
 
+int sg_filter_run(struct sock *sk, struct scatterlist *sg)
+{
+	struct sk_filter *filter;
+	int err = 0;
+
+	rcu_read_lock();
+	filter = rcu_dereference(sk->sk_filter);
+	if (filter) {
+		struct bpf_scatterlist bpfsg;
+		int num_sg;
+
+		if (!sg) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		num_sg = sg_nents(sg);
+		if (num_sg <= 0) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		/* We store a reference to the sg list so it can later be used
+		 * by eBPF helpers to retrieve the next sg element.
+		 */
+		bpfsg.num_sg = num_sg;
+		bpfsg.cur_sg = 0;
+		bpfsg.sg = sg;
+
+		/* For the first sg element, we store the pkt access pointers
+		 * into start and end so eBPF program can have pkt access using
+		 * data and data_end. The pkt access for subsequent element of
+		 * sg list is possible when eBPF program invokes bpf_sg_next
+		 * which takes care of setting start and end to the correct sg
+		 * element.
+		 */
+		bpfsg.start = sg_virt(sg);
+		bpfsg.end = bpfsg.start + sg->length;
+		BPF_PROG_RUN(filter->prog, &bpfsg);
+	}
+out:
+	rcu_read_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL(sg_filter_run);
+
 BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
 {
 	return skb_get_poff(skb);
@@ -3753,6 +3800,29 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_1(bpf_sg_next, struct bpf_scatterlist *, bpfsg)
+{
+	struct scatterlist *sg = bpfsg->sg;
+	int cur_sg = bpfsg->cur_sg;
+
+	cur_sg++;
+	if (cur_sg >= bpfsg->num_sg)
+		return -ENODATA;
+
+	bpfsg->cur_sg = cur_sg;
+	bpfsg->start = sg_virt(&sg[cur_sg]);
+	bpfsg->end = bpfsg->start + sg[cur_sg].length;
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sg_next_proto = {
+	.func		= bpf_sg_next,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
 BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
@@ -4720,6 +4790,8 @@ bool bpf_helper_changes_pkt_data(void *func)
 socksg_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
+	case BPF_FUNC_sg_next:
+		return &bpf_sg_next_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c87ae16..a298498 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2076,6 +2076,13 @@ struct bpf_stack_build_id {
  * 	Return
  * 		A 64-bit integer containing the current cgroup id based
  * 		on the cgroup within which the current task is running.
+ *
+ * int bpf_sg_next(struct bpf_scatterlist *sg)
+ *	Description
+ *		This helper allows user to retrieve next sg element from
+ *		sg list.
+ *	Return
+ *		Returns 0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2158,7 +2165,8 @@ struct bpf_stack_build_id {
 	FN(rc_repeat),			\
 	FN(rc_keydown),			\
 	FN(skb_cgroup_id),		\
-	FN(get_current_cgroup_id),
+	FN(get_current_cgroup_id),	\
+	FN(sg_next),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index f2f28b6..1997ba2 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -133,6 +133,9 @@ static int (*bpf_rc_keydown)(void *ctx, unsigned int protocol,
 	(void *) BPF_FUNC_rc_keydown;
 static unsigned long long (*bpf_get_current_cgroup_id)(void) =
 	(void *) BPF_FUNC_get_current_cgroup_id;
+static unsigned long long (*bpf_sg_next)(void *ctx) =
+	(void *) BPF_FUNC_sg_next;
+
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
1.8.3.1

* [RFC v2 PATCH 1/4] eBPF: Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER
From: Tushar Dave @ 2018-06-19 18:00 UTC (permalink / raw)
  To: ast, daniel, davem, jakub.kicinski, quentin.monnet, jiong.wang,
	guro, sandipan, john.fastabend, kafai, rdna, brakmo, netdev, acme,
	sowmini.varadhan
In-Reply-To: <1529431217-5264-1-git-send-email-tushar.n.dave@oracle.com>

Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER which uses the
existing socket filter infrastructure for bpf program attach and load.
A SOCKET_SG_FILTER eBPF program receives a struct scatterlist as its bpf
context, in contrast to SOCKET_FILTER, which deals with struct sk_buff.
This is useful for kernel entities that don't have an skb to represent
packet data but want to run an eBPF socket filter on packet data that is
in the form of a struct scatterlist, e.g. IB/RDMA.
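
For illustration only (not part of the patch): a user-space model of
how a program-visible context access is checked and then redirected to
the kernel-side structure. md_view and kern_ctx are hypothetical names
that copy the layouts of struct sg_filter_md and struct bpf_scatterlist;
the sketch assumes a 64-bit build where pointers are 8 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the program sees (same layout as struct sg_filter_md). */
struct md_view {
	void *data;
	void *data_end;
};

/* What the kernel passes (same layout as struct bpf_scatterlist). */
struct kern_ctx {
	void *sg;
	void *start;
	void *end;
	int cur_sg;
	int num_sg;
};

/* Accept only aligned, 8-byte loads that fall inside md_view. */
static bool md_access_ok(int off, int size)
{
	if (size != (int)sizeof(uint64_t))
		return false;
	if (off < 0 || off >= (int)sizeof(struct md_view))
		return false;
	return off % size == 0;
}

/* Map a program-visible offset to the kernel offset, or -1 if unknown. */
static long md_to_kern_off(int off)
{
	if (off == (int)offsetof(struct md_view, data))
		return (long)offsetof(struct kern_ctx, start);
	if (off == (int)offsetof(struct md_view, data_end))
		return (long)offsetof(struct kern_ctx, end);
	return -1;
}
```

The model checks the access size before using it as a divisor, so the
alignment test never sees an unexpected size.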

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 include/linux/bpf_types.h      |  1 +
 include/linux/filter.h         |  8 +++++
 include/uapi/linux/bpf.h       |  7 ++++
 kernel/bpf/syscall.c           |  1 +
 kernel/bpf/verifier.c          |  1 +
 net/core/filter.c              | 77 ++++++++++++++++++++++++++++++++++++++++--
 samples/bpf/bpf_load.c         | 11 ++++--
 tools/bpf/bpftool/prog.c       |  1 +
 tools/include/uapi/linux/bpf.h |  7 ++++
 tools/lib/bpf/libbpf.c         |  3 ++
 tools/lib/bpf/libbpf.h         |  2 ++
 11 files changed, 114 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index c5700c2..f8b4b56 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -16,6 +16,7 @@
 BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SOCKET_SG_FILTER, socksg_filter)
 #endif
 #ifdef CONFIG_BPF_EVENTS
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 45fc0f5..71618b1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -517,6 +517,14 @@ struct bpf_skb_data_end {
 	void *data_end;
 };
 
+struct bpf_scatterlist {
+	struct scatterlist *sg;
+	void *start;
+	void *end;
+	int cur_sg;
+	int num_sg;
+};
+
 struct sk_msg_buff {
 	void *data;
 	void *data_end;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59b19b6..ef0a7b6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -144,6 +144,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 	BPF_PROG_TYPE_LWT_SEG6LOCAL,
 	BPF_PROG_TYPE_LIRC_MODE2,
+	BPF_PROG_TYPE_SOCKET_SG_FILTER,
 };
 
 enum bpf_attach_type {
@@ -2358,6 +2359,12 @@ enum sk_action {
 	SK_PASS,
 };
 
+/* user accessible scatterlist */
+struct sg_filter_md {
+	void *data; /* sg_virt(sg) */
+	void *data_end; /* sg_virt(sg) + sg->length */
+};
+
 /* user accessible metadata for SK_MSG packet hook, new fields must
  * be added to the end of this structure
  */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0fa2062..74193a8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1300,6 +1300,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
+	    type != BPF_PROG_TYPE_SOCKET_SG_FILTER &&
 	    !capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d6403b5..a00d3eb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1320,6 +1320,7 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 	case BPF_PROG_TYPE_LWT_XMIT:
 	case BPF_PROG_TYPE_SK_SKB:
 	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_SOCKET_SG_FILTER:
 		if (meta)
 			return meta->pkt_access;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 3d9ba7e..8f67942 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1130,7 +1130,8 @@ static void bpf_release_orig_filter(struct bpf_prog *fp)
 
 static void __bpf_prog_release(struct bpf_prog *prog)
 {
-	if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER) {
+	if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER ||
+	    prog->type == BPF_PROG_TYPE_SOCKET_SG_FILTER) {
 		bpf_prog_put(prog);
 	} else {
 		bpf_release_orig_filter(prog);
@@ -1551,10 +1552,16 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 
 static struct bpf_prog *__get_bpf(u32 ufd, struct sock *sk)
 {
+	struct bpf_prog *prog;
+
 	if (sock_flag(sk, SOCK_FILTER_LOCKED))
 		return ERR_PTR(-EPERM);
 
-	return bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
+	prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
+	if (IS_ERR(prog))
+		prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_SG_FILTER);
+
+	return prog;
 }
 
 int sk_attach_bpf(u32 ufd, struct sock *sk)
@@ -4710,6 +4717,15 @@ bool bpf_helper_changes_pkt_data(void *func)
 }
 
 static const struct bpf_func_proto *
+socksg_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static const struct bpf_func_proto *
 tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
@@ -5037,6 +5053,30 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return bpf_skb_is_valid_access(off, size, type, prog, info);
 }
 
+static bool socksg_filter_is_valid_access(int off, int size,
+					  enum bpf_access_type type,
+					  const struct bpf_prog *prog,
+					  struct bpf_insn_access_aux *info)
+{
+	switch (off) {
+	case offsetof(struct sg_filter_md, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case offsetof(struct sg_filter_md, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	}
+
+	if (off < 0 || off >= sizeof(struct sg_filter_md))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (size != sizeof(__u64))
+		return false;
+
+	return true;
+}
+
 static bool lwt_is_valid_access(int off, int size,
 				enum bpf_access_type type,
 				const struct bpf_prog *prog,
@@ -6516,6 +6556,30 @@ static u32 sk_skb_convert_ctx_access(enum bpf_access_type type,
 	return insn - insn_buf;
 }
 
+static u32 socksg_filter_convert_ctx_access(enum bpf_access_type type,
+					    const struct bpf_insn *si,
+					    struct bpf_insn *insn_buf,
+					    struct bpf_prog *prog,
+					    u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (si->off) {
+	case offsetof(struct sg_filter_md, data):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_scatterlist, start),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_scatterlist, start));
+		break;
+	case offsetof(struct sg_filter_md, data_end):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_scatterlist, end),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_scatterlist, end));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
 static u32 sk_msg_convert_ctx_access(enum bpf_access_type type,
 				     const struct bpf_insn *si,
 				     struct bpf_insn *insn_buf,
@@ -6654,6 +6718,15 @@ static u32 sk_msg_convert_ctx_access(enum bpf_access_type type,
 	.test_run		= bpf_prog_test_run_skb,
 };
 
+const struct bpf_verifier_ops socksg_filter_verifier_ops = {
+	.get_func_proto         = socksg_filter_func_proto,
+	.is_valid_access        = socksg_filter_is_valid_access,
+	.convert_ctx_access     = socksg_filter_convert_ctx_access,
+};
+
+const struct bpf_prog_ops socksg_filter_prog_ops = {
+};
+
 const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
 	.get_func_proto		= tc_cls_act_func_proto,
 	.is_valid_access	= tc_cls_act_is_valid_access,
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 89161c9..15c355e 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -69,6 +69,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_sockops = strncmp(event, "sockops", 7) == 0;
 	bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
 	bool is_sk_msg = strncmp(event, "sk_msg", 6) == 0;
+	bool is_socksg = strncmp(event, "socksg", 6) == 0;
+
 	size_t insns_cnt = size / sizeof(struct bpf_insn);
 	enum bpf_prog_type prog_type;
 	char buf[256];
@@ -102,6 +104,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_SK_SKB;
 	} else if (is_sk_msg) {
 		prog_type = BPF_PROG_TYPE_SK_MSG;
+	} else if (is_socksg) {
+		prog_type = BPF_PROG_TYPE_SOCKET_SG_FILTER;
 	} else {
 		printf("Unknown event '%s'\n", event);
 		return -1;
@@ -119,8 +123,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
 		return 0;
 
-	if (is_socket || is_sockops || is_sk_skb || is_sk_msg) {
-		if (is_socket)
+	if (is_socket || is_sockops || is_sk_skb || is_sk_msg || is_socksg) {
+		if (is_socket || is_socksg)
 			event += 6;
 		else
 			event += 7;
@@ -624,7 +628,8 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		    memcmp(shname, "cgroup/", 7) == 0 ||
 		    memcmp(shname, "sockops", 7) == 0 ||
 		    memcmp(shname, "sk_skb", 6) == 0 ||
-		    memcmp(shname, "sk_msg", 6) == 0) {
+		    memcmp(shname, "sk_msg", 6) == 0 ||
+		    memcmp(shname, "socksg", 6) == 0) {
 			ret = load_and_attach(shname, data->d_buf,
 					      data->d_size);
 			if (ret != 0)
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index a4f4352..06b2fef 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -72,6 +72,7 @@
 	[BPF_PROG_TYPE_RAW_TRACEPOINT]	= "raw_tracepoint",
 	[BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
 	[BPF_PROG_TYPE_LIRC_MODE2]	= "lirc_mode2",
+	[BPF_PROG_TYPE_SOCKET_SG_FILTER] = "socket_sg_filter",
 };
 
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e0b0678..c87ae16 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -144,6 +144,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 	BPF_PROG_TYPE_LWT_SEG6LOCAL,
 	BPF_PROG_TYPE_LIRC_MODE2,
+	BPF_PROG_TYPE_SOCKET_SG_FILTER,
 };
 
 enum bpf_attach_type {
@@ -2358,6 +2359,12 @@ enum sk_action {
 	SK_PASS,
 };
 
+/* user accessible scatterlist */
+struct sg_filter_md {
+	void *data; /* sg_virt(sg) */
+	void *data_end; /* sg_virt(sg) + sg->length */
+};
+
 /* user accessible metadata for SK_MSG packet hook, new fields must
  * be added to the end of this structure
  */
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a1e96b5..7628278 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1463,6 +1463,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_SK_MSG:
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 	case BPF_PROG_TYPE_LIRC_MODE2:
+	case BPF_PROG_TYPE_SOCKET_SG_FILTER:
 		return false;
 	case BPF_PROG_TYPE_UNSPEC:
 	case BPF_PROG_TYPE_KPROBE:
@@ -1998,6 +1999,7 @@ static bool bpf_program__is_type(struct bpf_program *prog,
 BPF_PROG_TYPE_FNS(raw_tracepoint, BPF_PROG_TYPE_RAW_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
+BPF_PROG_TYPE_FNS(socket_sg_filter, BPF_PROG_TYPE_SOCKET_SG_FILTER);
 
 void bpf_program__set_expected_attach_type(struct bpf_program *prog,
 					   enum bpf_attach_type type)
@@ -2048,6 +2050,7 @@ void bpf_program__set_expected_attach_type(struct bpf_program *prog,
 	BPF_SA_PROG_SEC("cgroup/sendmsg6", BPF_CGROUP_UDP6_SENDMSG),
 	BPF_S_PROG_SEC("cgroup/post_bind4", BPF_CGROUP_INET4_POST_BIND),
 	BPF_S_PROG_SEC("cgroup/post_bind6", BPF_CGROUP_INET6_POST_BIND),
+	BPF_PROG_SEC("socksg",          BPF_PROG_TYPE_SOCKET_SG_FILTER),
 };
 
 #undef BPF_PROG_SEC
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 0997653..3be165b 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -195,6 +195,7 @@ int bpf_program__set_prep(struct bpf_program *prog, int nr_instance,
 void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);
 void bpf_program__set_expected_attach_type(struct bpf_program *prog,
 					   enum bpf_attach_type type);
+int bpf_program__set_socket_sg_filter(struct bpf_program *prog);
 
 bool bpf_program__is_socket_filter(struct bpf_program *prog);
 bool bpf_program__is_tracepoint(struct bpf_program *prog);
@@ -204,6 +205,7 @@ void bpf_program__set_expected_attach_type(struct bpf_program *prog,
 bool bpf_program__is_sched_act(struct bpf_program *prog);
 bool bpf_program__is_xdp(struct bpf_program *prog);
 bool bpf_program__is_perf_event(struct bpf_program *prog);
+bool bpf_program__is_socket_sg_filter(struct bpf_program *prog);
 
 /*
  * No need for __attribute__((packed)), all members of 'bpf_map_def'
-- 
1.8.3.1

* [RFC v2 PATCH 0/4] eBPF and struct scatterlist
From: Tushar Dave @ 2018-06-19 18:00 UTC (permalink / raw)
  To: ast, daniel, davem, jakub.kicinski, quentin.monnet, jiong.wang,
	guro, sandipan, john.fastabend, kafai, rdna, brakmo, netdev, acme,
	sowmini.varadhan

This follows up on https://patchwork.ozlabs.org/cover/927050/
where the review feedback was to use bpf_skb_load_bytes() to deal with
linear and non-linear skbs. While that feedback is valid and correct,
the motivation for this work is to allow eBPF based firewalling for
kernel modules that do not always get their packet as an sk_buff from
their downlink drivers. One such instance of this use-case is RDS, which
can be run both over IB (driver RDMA's a scatterlist to the RDS module)
or over TCP (TCP passes an sk_buff to the RDS module).

This RFC (call it v2) uses the existing socket filter infrastructure and
extends it with a new eBPF program type that deals with struct scatterlist.
For RDS, the integrated approach treats the scatterlist as the common
denominator, and allows the application to write a filter for processing
a scatterlist.


Details:
Patch 1 adds a new eBPF prog type, BPF_PROG_TYPE_SOCKET_SG_FILTER, which
uses the existing socket filter infrastructure for bpf program attach
and load. An eBPF program of type BPF_PROG_TYPE_SOCKET_SG_FILTER receives
a struct scatterlist as its bpf context, in contrast to
BPF_PROG_TYPE_SOCKET_FILTER, which deals with struct sk_buff. This new
eBPF program type allows a socket filter to run on packet data that is in
the form of a struct scatterlist.

Patch 2 adds functionality to run a BPF_PROG_TYPE_SOCKET_SG_FILTER socket
filter program. A bpf helper, bpf_sg_next(), is also added so users can
retrieve successive sg elements from the scatterlist.

Patch 3 adds a socket filter eBPF sample program that uses patch 1 and
patch 2. The sample program opens an rds socket, attaches an eBPF program
(socksg, i.e. BPF_PROG_TYPE_SOCKET_SG_FILTER) to the rds socket and uses
the bpf_sg_next helper to look into the sg list. As a test, the current
eBPF program only prints the first few bytes of each element of the sg
list.
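
For illustration only (not the sample program itself): a user-space
sketch of the bounded "print the first few bytes" step, using a
data/data_end pair the way a filter would for each sg element. The
names span and dump_prefix are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of one sg element: [data, data_end). */
struct span {
	const unsigned char *data;
	const unsigned char *data_end;
};

/*
 * Format at most `want` leading bytes of the span as hex into `out`.
 * Never reads past data_end; returns the number of bytes formatted.
 */
static size_t dump_prefix(const struct span *s, size_t want,
			  char *out, size_t outlen)
{
	size_t avail = (size_t)(s->data_end - s->data);
	size_t n = want < avail ? want : avail;
	size_t used = 0;
	size_t i;

	if (outlen == 0)
		return 0;
	out[0] = '\0';

	for (i = 0; i < n; i++) {
		int w = snprintf(out + used, outlen - used, "%02x ",
				 s->data[i]);

		if (w < 0 || (size_t)w >= outlen - used) {
			out[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)w;
	}

	return i;
}
```

Clamping the request to data_end - data plays the same role as the
data + n > data_end check an eBPF program needs before direct packet
access.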

Finally, patch 4 allows rds_recv_incoming to invoke socket filter
program which deals with scatterlist.

Thanks.

-Tushar

Tushar Dave (4):
  eBPF: Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER
  ebpf: Add sg_filter_run and sg helper
  ebpf: Add sample ebpf program for SOCKET_SG_FILTER
  rds: invoke socket sg filter attached to rds socket

 include/linux/bpf_types.h                 |   1 +
 include/linux/filter.h                    |  10 +
 include/uapi/linux/bpf.h                  |  17 +-
 kernel/bpf/syscall.c                      |   1 +
 kernel/bpf/verifier.c                     |   1 +
 net/core/filter.c                         | 149 ++++++++++++-
 net/rds/ib.c                              |   1 +
 net/rds/ib.h                              |   1 +
 net/rds/ib_recv.c                         |  12 ++
 net/rds/rds.h                             |   2 +
 net/rds/recv.c                            |  16 ++
 net/rds/tcp.c                             |   2 +
 net/rds/tcp.h                             |   2 +
 net/rds/tcp_recv.c                        |  38 ++++
 samples/bpf/Makefile                      |   3 +
 samples/bpf/bpf_load.c                    |  11 +-
 samples/bpf/rds_filter_kern.c             |  78 +++++++
 samples/bpf/rds_filter_user.c             | 339 ++++++++++++++++++++++++++++++
 tools/bpf/bpftool/prog.c                  |   1 +
 tools/include/uapi/linux/bpf.h            |  17 +-
 tools/lib/bpf/libbpf.c                    |   3 +
 tools/lib/bpf/libbpf.h                    |   2 +
 tools/testing/selftests/bpf/bpf_helpers.h |   3 +
 23 files changed, 703 insertions(+), 7 deletions(-)
 create mode 100644 samples/bpf/rds_filter_kern.c
 create mode 100644 samples/bpf/rds_filter_user.c

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH 1/3] net: ethernet: fix suspend/resume in davinci_emac
From: Lukas Wunner @ 2018-06-19 18:00 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Grygorii Strashko, David S . Miller, Florian Fainelli,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Kevin Hilman,
	David Lechner, Sekhar Nori, Andrew Lunn, linux-omap, netdev,
	linux-kernel, Bartosz Golaszewski, stable
In-Reply-To: <20180619160950.6283-2-brgl@bgdev.pl>

On Tue, Jun 19, 2018 at 06:09:48PM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> 
> This patch reverts commit 3243ff2a05ec ("net: ethernet: davinci_emac:
> Deduplicate bus_find_device() by name matching") and adds a comment
> which should stop anyone from reintroducing the same "fix" in the future.
> 
> We can't use bus_find_device_by_name() here because the device name is
> not guaranteed to be 'davinci_mdio'. On some systems it can be
> 'davinci_mdio.0' so we need to use strncmp() against the first part of
> the string to correctly match it.
> 
> Fixes: 3243ff2a05ec ("net: ethernet: davinci_emac: Deduplicate bus_find_device() by name matching")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>

This is still

Acked-by: Lukas Wunner <lukas@wunner.de>

given that my patch which is reverted here seems to have been incorrect.
Feel free to keep the ack if you respin in response to Florian Fainelli's
comments.

Thanks,

Lukas

> ---
>  drivers/net/ethernet/ti/davinci_emac.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
> index 06d7c9e4dcda..a1a6445b5a7e 100644
> --- a/drivers/net/ethernet/ti/davinci_emac.c
> +++ b/drivers/net/ethernet/ti/davinci_emac.c
> @@ -1385,6 +1385,11 @@ static int emac_devioctl(struct net_device *ndev, struct ifreq *ifrq, int cmd)
>  		return -EOPNOTSUPP;
>  }
>  
> +static int match_first_device(struct device *dev, void *data)
> +{
> +	return !strncmp(dev_name(dev), "davinci_mdio", 12);
> +}
> +
>  /**
>   * emac_dev_open - EMAC device open
>   * @ndev: The DaVinci EMAC network adapter
> @@ -1484,8 +1489,14 @@ static int emac_dev_open(struct net_device *ndev)
>  
>  	/* use the first phy on the bus if pdata did not give us a phy id */
>  	if (!phydev && !priv->phy_id) {
> -		phy = bus_find_device_by_name(&mdio_bus_type, NULL,
> -					      "davinci_mdio");
> +		/* NOTE: we can't use bus_find_device_by_name() here because
> +		 * the device name is not guaranteed to be 'davinci_mdio'. On
> +		 * some systems it can be 'davinci_mdio.0' so we need to use
> +		 * strncmp() against the first part of the string to correctly
> +		 * match it.
> +		 */
> +		phy = bus_find_device(&mdio_bus_type, NULL, NULL,
> +				      match_first_device);
>  		if (phy) {
>  			priv->phy_id = dev_name(phy);
>  			if (!priv->phy_id || !*priv->phy_id)
> -- 
> 2.17.1
> 
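
To make the naming issue from the commit message concrete, here is a small
userspace sketch of the two comparison strategies. It assumes that matching
by name means comparing the full string (modelled with strcmp()); the
helper names name_equals() and name_has_prefix() are made up for this
example, and only the strncmp() call mirrors match_first_device() from the
patch.

```c
#include <stdbool.h>
#include <string.h>

#define MDIO_NAME_PREFIX "davinci_mdio"

/* Full-name comparison: only the exact string "davinci_mdio" matches. */
static bool name_equals(const char *name)
{
	return strcmp(name, MDIO_NAME_PREFIX) == 0;
}

/*
 * Prefix comparison over the first 12 characters, as in the strncmp()
 * call in match_first_device(): "davinci_mdio.0" matches as well.
 */
static bool name_has_prefix(const char *name)
{
	return strncmp(name, MDIO_NAME_PREFIX,
		       sizeof(MDIO_NAME_PREFIX) - 1) == 0;
}
```

With these, name_equals("davinci_mdio.0") is false while
name_has_prefix("davinci_mdio.0") is true. Note that a prefix test also
accepts any other name that merely starts with "davinci_mdio".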

^ permalink raw reply

* Re: [PATCH] net: nixge: Add __packed attribute to DMA descriptor struct
From: Florian Fainelli @ 2018-06-19 17:41 UTC (permalink / raw)
  To: Moritz Fischer
  Cc: David S. Miller, Kees Cook, netdev, Linux Kernel Mailing List
In-Reply-To: <CAAtXAHfGwSLfhhPU9B=O8q5aGExrmgkPgkC5u8XYLtKL5i4JHQ@mail.gmail.com>

On 06/19/2018 10:31 AM, Moritz Fischer wrote:
> Hi Florian,
> 
> On Tue, Jun 19, 2018 at 10:13 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> On 06/19/2018 09:54 AM, Moritz Fischer wrote:
>>> Add __packed attribute to DMA descriptor structure  in order to
>>> make sure that the DMA engine's alignemnt requirements are met.
>>>
>>> Fixes commit 492caffa8a1a ("net: ethernet: nixge: Add support for
>>> National Instruments XGE netdev")
>>> Signed-off-by: Moritz Fischer <mdf@kernel.org>
>>> ---
>>>
>>> Hi David,
>>>
>>> this addresses an issue where padding occured breaking the alignment
>>> in the array the descriptors are allocated in coherent memory.
>>> This was discovered when we tried to bring up the driver via a PCIe
>>> bridge on x86.
>>
>> How could padding be inserted given than all of the structure members
>> are naturally aligned (all u32 type). Compiler bug?
> 
> I have no good answer to this, all I can tell you is that it wouldn't work
> otherwise. This was part of a bunch of changes that I made in order
> to make this work with 64bit DMA. I made sure to remove the padding/
> reserved fields accordingly such that the net difference would be zero.
> 
> I might've messed that up? The descriptors looked something like this:

Ah ah! This is not the layout in the upstream tree I am looking at, but
your layout below will definitely introduce padding, yes.

> 
> struct nixge_hw_dma_bd {
> #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>         u64 next;
>         u64 phys;
> #else
>         u32 next;
>         u32 reserved1;
>         u32 phys;
>         u32 reserved2;
> #endif
>         u32 reserved3;
>         u32 reserved4;
>         u32 cntrl;
>         u32 status;
>         u32 app0;
>         u32 app1;
>         u32 app2;
>         u32 app3;
>         u32 app4;
> #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>         u64 sw_id_offset;
> #else
>         u32 sw_id_offset;
>         u32 reserved5;
> #endif
>         u32 reserved6;
> } __packed;
> 
> 
> I'll have some follow up patches to add 64bit support together with a
> wrapper driver for the PCIe bridge once the architecture solidifies here.

Why not have the structure updated like this:

struct nixge_hw_dma_bd {
	u32 next_lo;	/* lower 32-bit address part */
	u32 next_hi;	/* upper 32-bit address part */
	u32 phys_lo;
	u32 phys_hi;
        u32 reserved3;
        u32 reserved4;
        u32 cntrl;
        u32 status;
        u32 app0;
        u32 app1;
        u32 app2;
        u32 app3;
        u32 app4;
        u32 sw_id_offset_low;
        u32 sw_id_offset_hi;
        u32 reserved6;
};

That assumes I got the upper/lower address part correct, if not, swap the
members. Then in your code, if you want to be efficient, you can
populate the fields only if CONFIG_ARCH_DMA_ADDR_T_64BIT is defined. I
did that in bcmgenet.c for instance because the register accesses are
slow, so if we can save 200ns per packet transmit/receive, that's a win.

This should not change anything because the structure size is the same
in both cases, but how you are managing it is different, and that would in
turn influence what the HW sees.

Does that make sense?
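
As a rough userspace illustration of where the padding comes from (a
sketch, not driver code): after two u64 members and nine u32 members the
next free offset is 52, which is not a multiple of 8, so on an ABI that
aligns u64 to 8 bytes (x86-64, for example) the compiler pads before a u64
sw_id_offset. The struct names bd_natural, bd_packed and bd_split are
invented for this example, uint32_t/uint64_t stand in for u32/u64, and
__attribute__((packed)) assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit variant of the layout, without __packed. */
struct bd_natural {
	uint64_t next;
	uint64_t phys;
	uint32_t reserved3;
	uint32_t reserved4;
	uint32_t cntrl;
	uint32_t status;
	uint32_t app0;
	uint32_t app1;
	uint32_t app2;
	uint32_t app3;
	uint32_t app4;
	uint64_t sw_id_offset;	/* wants offset 52, may be moved to 56 */
	uint32_t reserved6;
};

/* Same members, packed: no padding is inserted anywhere. */
struct bd_packed {
	uint64_t next;
	uint64_t phys;
	uint32_t reserved3;
	uint32_t reserved4;
	uint32_t cntrl;
	uint32_t status;
	uint32_t app0;
	uint32_t app1;
	uint32_t app2;
	uint32_t app3;
	uint32_t app4;
	uint64_t sw_id_offset;
	uint32_t reserved6;
} __attribute__((packed));

/* All-u32 variant with explicit lo/hi halves: 16 words, no packing needed. */
struct bd_split {
	uint32_t next_lo;
	uint32_t next_hi;
	uint32_t phys_lo;
	uint32_t phys_hi;
	uint32_t reserved3;
	uint32_t reserved4;
	uint32_t cntrl;
	uint32_t status;
	uint32_t app0;
	uint32_t app1;
	uint32_t app2;
	uint32_t app3;
	uint32_t app4;
	uint32_t sw_id_offset_lo;
	uint32_t sw_id_offset_hi;
	uint32_t reserved6;
};

/* Compile-time checks of the two layouts that do not depend on the ABI. */
_Static_assert(sizeof(struct bd_split) == 64, "16 x u32 must be 64 bytes");
_Static_assert(sizeof(struct bd_packed) == 64, "packed layout must be 64 bytes");
_Static_assert(offsetof(struct bd_split, sw_id_offset_lo) == 52,
	       "split layout keeps sw_id_offset at byte 52");
```

On x86-64 I would expect offsetof(struct bd_natural, sw_id_offset) to be
56 and sizeof(struct bd_natural) to be 72, i.e. not what a 64-byte
descriptor needs; on ABIs that align u64 to 4 bytes inside structs the
natural layout may already be 64 bytes.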

> 
> Thanks for the feedback,
> 
> Moritz
> 


-- 
Florian

^ permalink raw reply

* bpfilter compile failure on parisc
From: Meelis Roos @ 2018-06-19 17:38 UTC (permalink / raw)
  To: netdev, linux-parisc

Tried enabling bpfilter on parisc, got this:

  HOSTCC  net/bpfilter/main.o
net/bpfilter/main.c:3:21: fatal error: sys/uio.h: No such file or directory
 #include <sys/uio.h>
                     ^
compilation terminated.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply

* Re: [PATCH] net: nixge: Add __packed attribute to DMA descriptor struct
From: Moritz Fischer @ 2018-06-19 17:31 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David S. Miller, Kees Cook, netdev, Linux Kernel Mailing List
In-Reply-To: <832bebb8-300d-e911-2946-5edfe82dc30a@gmail.com>

Hi Florian,

On Tue, Jun 19, 2018 at 10:13 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 06/19/2018 09:54 AM, Moritz Fischer wrote:
>> Add __packed attribute to DMA descriptor structure  in order to
>> make sure that the DMA engine's alignemnt requirements are met.
>>
>> Fixes commit 492caffa8a1a ("net: ethernet: nixge: Add support for
>> National Instruments XGE netdev")
>> Signed-off-by: Moritz Fischer <mdf@kernel.org>
>> ---
>>
>> Hi David,
>>
>> this addresses an issue where padding occured breaking the alignment
>> in the array the descriptors are allocated in coherent memory.
>> This was discovered when we tried to bring up the driver via a PCIe
>> bridge on x86.
>
> How could padding be inserted given than all of the structure members
> are naturally aligned (all u32 type). Compiler bug?

I have no good answer to this, all I can tell you is that it wouldn't work
otherwise. This was part of a bunch of changes that I made in order
to make this work with 64bit DMA. I made sure to remove the padding/
reserved fields accordingly such that the net difference would be zero.

I might've messed that up? The descriptors looked something like this:

struct nixge_hw_dma_bd {
#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
        u64 next;
        u64 phys;
#else
        u32 next;
        u32 reserved1;
        u32 phys;
        u32 reserved2;
#endif
        u32 reserved3;
        u32 reserved4;
        u32 cntrl;
        u32 status;
        u32 app0;
        u32 app1;
        u32 app2;
        u32 app3;
        u32 app4;
#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
        u64 sw_id_offset;
#else
        u32 sw_id_offset;
        u32 reserved5;
#endif
        u32 reserved6;
} __packed;


I'll have some follow up patches to add 64bit support together with a
wrapper driver for the PCIe bridge once the architecture solidifies here.

Thanks for the feedback,

Moritz

^ permalink raw reply

* Re: [PATCH net 5/5] net sched actions: fix misleading text strings in pedit action
From: Roman Mashak @ 2018-06-19 17:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <20180619100408.42155193@xeon-e3>

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Tue, 19 Jun 2018 12:56:08 -0400

[...]

>> @@ -326,12 +326,12 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>>  			}
>>  
>>  			if (offset % 4) {
>> -				pr_info("tc filter pedit offset must be on 32 bit boundaries\n");
>> +				pr_info("tc action pedit offset must be on 32 bit boundaries\n");
>>  				goto bad;
>>  			}
>>  
>>  			if (!offset_valid(skb, hoffset + offset)) {
>> -				pr_info("tc filter pedit offset %d out of bounds\n",
>> +				pr_info("tc action pedit offset %d out of bounds\n",
>>  					hoffset + offset);
>>  				goto bad;
>
> Time to convert these to netlink extack reporting?

Yes, this is planned in next patches.

^ permalink raw reply

* Re: [PATCH] net: nixge: Add __packed attribute to DMA descriptor struct
From: Florian Fainelli @ 2018-06-19 17:13 UTC (permalink / raw)
  To: Moritz Fischer, davem; +Cc: keescook, netdev, linux-kernel
In-Reply-To: <20180619165453.31894-1-mdf@kernel.org>

On 06/19/2018 09:54 AM, Moritz Fischer wrote:
> Add __packed attribute to DMA descriptor structure  in order to
> make sure that the DMA engine's alignemnt requirements are met.
> 
> Fixes commit 492caffa8a1a ("net: ethernet: nixge: Add support for
> National Instruments XGE netdev")
> Signed-off-by: Moritz Fischer <mdf@kernel.org>
> ---
> 
> Hi David,
> 
> this addresses an issue where padding occured breaking the alignment
> in the array the descriptors are allocated in coherent memory.
> This was discovered when we tried to bring up the driver via a PCIe
> bridge on x86.

How could padding be inserted given than all of the structure members
are naturally aligned (all u32 type). Compiler bug?

Also

> 
> Thanks,
> 
> Moritz
> 
> ---
>  drivers/net/ethernet/ni/nixge.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ni/nixge.c b/drivers/net/ethernet/ni/nixge.c
> index 09f674ec0f9e..fea0e994324b 100644
> --- a/drivers/net/ethernet/ni/nixge.c
> +++ b/drivers/net/ethernet/ni/nixge.c
> @@ -122,7 +122,7 @@ struct nixge_hw_dma_bd {
>  	u32 sw_id_offset;
>  	u32 reserved5;
>  	u32 reserved6;
> -};
> +} __packed;
>  
>  struct nixge_tx_skb {
>  	struct sk_buff *skb;
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH net 5/5] net sched actions: fix misleading text strings in pedit action
From: Stephen Hemminger @ 2018-06-19 17:04 UTC (permalink / raw)
  To: Roman Mashak; +Cc: davem, netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1529427368-17129-6-git-send-email-mrv@mojatatu.com>

On Tue, 19 Jun 2018 12:56:08 -0400
Roman Mashak <mrv@mojatatu.com> wrote:

> Change "tc filter pedit .." to "tc actions pedit .." in error
> messages to clearly refer to pedit action.
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> ---
>  net/sched/act_pedit.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index 3b775f54cee5..caa6927a992c 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -305,7 +305,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>  
>  			rc = pedit_skb_hdr_offset(skb, htype, &hoffset);
>  			if (rc) {
> -				pr_info("tc filter pedit bad header type specified (0x%x)\n",
> +				pr_info("tc action pedit bad header type specified (0x%x)\n",
>  					htype);
>  				goto bad;
>  			}
> @@ -314,7 +314,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>  				char *d, _d;
>  
>  				if (!offset_valid(skb, hoffset + tkey->at)) {
> -					pr_info("tc filter pedit 'at' offset %d out of bounds\n",
> +					pr_info("tc action pedit 'at' offset %d out of bounds\n",
>  						hoffset + tkey->at);
>  					goto bad;
>  				}
> @@ -326,12 +326,12 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>  			}
>  
>  			if (offset % 4) {
> -				pr_info("tc filter pedit offset must be on 32 bit boundaries\n");
> +				pr_info("tc action pedit offset must be on 32 bit boundaries\n");
>  				goto bad;
>  			}
>  
>  			if (!offset_valid(skb, hoffset + offset)) {
> -				pr_info("tc filter pedit offset %d out of bounds\n",
> +				pr_info("tc action pedit offset %d out of bounds\n",
>  					hoffset + offset);
>  				goto bad;

Time to convert these to netlink extack reporting?

^ permalink raw reply

* [PATCH net 5/5] net sched actions: fix misleading text strings in pedit action
From: Roman Mashak @ 2018-06-19 16:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1529427368-17129-1-git-send-email-mrv@mojatatu.com>

Change "tc filter pedit .." to "tc action pedit .." in error
messages to clearly refer to the pedit action.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_pedit.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 3b775f54cee5..caa6927a992c 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -305,7 +305,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 
 			rc = pedit_skb_hdr_offset(skb, htype, &hoffset);
 			if (rc) {
-				pr_info("tc filter pedit bad header type specified (0x%x)\n",
+				pr_info("tc action pedit bad header type specified (0x%x)\n",
 					htype);
 				goto bad;
 			}
@@ -314,7 +314,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 				char *d, _d;
 
 				if (!offset_valid(skb, hoffset + tkey->at)) {
-					pr_info("tc filter pedit 'at' offset %d out of bounds\n",
+					pr_info("tc action pedit 'at' offset %d out of bounds\n",
 						hoffset + tkey->at);
 					goto bad;
 				}
@@ -326,12 +326,12 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			}
 
 			if (offset % 4) {
-				pr_info("tc filter pedit offset must be on 32 bit boundaries\n");
+				pr_info("tc action pedit offset must be on 32 bit boundaries\n");
 				goto bad;
 			}
 
 			if (!offset_valid(skb, hoffset + offset)) {
-				pr_info("tc filter pedit offset %d out of bounds\n",
+				pr_info("tc action pedit offset %d out of bounds\n",
 					hoffset + offset);
 				goto bad;
 			}
@@ -349,7 +349,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 				val = (*ptr + tkey->val) & ~tkey->mask;
 				break;
 			default:
-				pr_info("tc filter pedit bad command (%d)\n",
+				pr_info("tc action pedit bad command (%d)\n",
 					cmd);
 				goto bad;
 			}
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 4/5] net sched actions: use sizeof operator for buffer length
From: Roman Mashak @ 2018-06-19 16:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1529427368-17129-1-git-send-email-mrv@mojatatu.com>

Replace constant integer with sizeof() to clearly indicate
the destination buffer length in skb_header_pointer() calls.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_pedit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 9c2d8a31a5c5..3b775f54cee5 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -319,7 +319,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 					goto bad;
 				}
 				d = skb_header_pointer(skb, hoffset + tkey->at,
-						       1, &_d);
+						       sizeof(_d), &_d);
 				if (!d)
 					goto bad;
 				offset += (*d & tkey->offmask) >> tkey->shift;
@@ -337,7 +337,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			}
 
 			ptr = skb_header_pointer(skb, hoffset + offset,
-						 4, &hdata);
+						 sizeof(hdata), &hdata);
 			if (!ptr)
 				goto bad;
 			/* just do it, baby */
-- 
2.7.4
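
For readers outside the kernel tree, the point of the sizeof() change can
be shown with a userspace model. This is only a sketch: header_copy() is a
hypothetical stand-in that always copies into the caller's buffer, whereas
the real skb_header_pointer() may return a pointer into the packet data
instead, and read_u32_at() is likewise invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for skb_header_pointer(): copy len bytes starting
 * at offset from pkt into buffer, or return NULL if the requested range
 * does not fit inside the packet.
 */
static void *header_copy(const uint8_t *pkt, size_t pkt_len, size_t offset,
			 size_t len, void *buffer)
{
	if (offset > pkt_len || len > pkt_len - offset)
		return NULL;
	memcpy(buffer, pkt + offset, len);
	return buffer;
}

/* Read one 32-bit word; returns 0 on success, -1 if out of bounds. */
static int read_u32_at(const uint8_t *pkt, size_t pkt_len, size_t offset,
		       uint32_t *out)
{
	uint32_t hdata;
	/* sizeof(hdata) ties the length to the destination buffer's type. */
	uint32_t *ptr = header_copy(pkt, pkt_len, offset, sizeof(hdata),
				    &hdata);

	if (!ptr)
		return -1;
	*out = *ptr;
	return 0;
}
```

If the type of hdata ever changed, the length passed to the helper would
follow automatically, which a literal 4 would not.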

^ permalink raw reply related

* [PATCH net 3/5] net sched actions: fix sparse warning
From: Roman Mashak @ 2018-06-19 16:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1529427368-17129-1-git-send-email-mrv@mojatatu.com>

The variable _data in include/asm-generic/sections.h defines sections;
this causes a sparse warning in pedit:

net/sched/act_pedit.c:293:35: warning: symbol '_data' shadows an earlier one
./include/asm-generic/sections.h:36:13: originally declared here

Therefore rename the variable.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_pedit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index e4b29ee79ba8..9c2d8a31a5c5 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -290,7 +290,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 		enum pedit_cmd cmd = TCA_PEDIT_KEY_EX_CMD_SET;
 
 		for (i = p->tcfp_nkeys; i > 0; i--, tkey++) {
-			u32 *ptr, _data;
+			u32 *ptr, hdata;
 			int offset = tkey->off;
 			int hoffset;
 			u32 val;
@@ -337,7 +337,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			}
 
 			ptr = skb_header_pointer(skb, hoffset + offset,
-						 4, &_data);
+						 4, &hdata);
 			if (!ptr)
 				goto bad;
 			/* just do it, baby */
@@ -355,7 +355,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			}
 
 			*ptr = ((*ptr & tkey->mask) ^ val);
-			if (ptr == &_data)
+			if (ptr == &hdata)
 				skb_store_bits(skb, hoffset + offset, ptr, 4);
 		}
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 2/5] net sched actions: fix coding style in pedit headers
From: Roman Mashak @ 2018-06-19 16:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1529427368-17129-1-git-send-email-mrv@mojatatu.com>

Fix coding style issues in tc pedit headers detected by the
checkpatch script.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 include/net/tc_act/tc_pedit.h        | 1 +
 include/uapi/linux/tc_act/tc_pedit.h | 9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h
index 227a6f1d02f4..fac3ad4a86de 100644
--- a/include/net/tc_act/tc_pedit.h
+++ b/include/net/tc_act/tc_pedit.h
@@ -17,6 +17,7 @@ struct tcf_pedit {
 	struct tc_pedit_key	*tcfp_keys;
 	struct tcf_pedit_key_ex	*tcfp_keys_ex;
 };
+
 #define to_pedit(a) ((struct tcf_pedit *)a)
 
 static inline bool is_tcf_pedit(const struct tc_action *a)
diff --git a/include/uapi/linux/tc_act/tc_pedit.h b/include/uapi/linux/tc_act/tc_pedit.h
index 162d1094c41c..24ec792dacc1 100644
--- a/include/uapi/linux/tc_act/tc_pedit.h
+++ b/include/uapi/linux/tc_act/tc_pedit.h
@@ -17,13 +17,15 @@ enum {
 	TCA_PEDIT_KEY_EX,
 	__TCA_PEDIT_MAX
 };
+
 #define TCA_PEDIT_MAX (__TCA_PEDIT_MAX - 1)
-                                                                                
+
 enum {
 	TCA_PEDIT_KEY_EX_HTYPE = 1,
 	TCA_PEDIT_KEY_EX_CMD = 2,
 	__TCA_PEDIT_KEY_EX_MAX
 };
+
 #define TCA_PEDIT_KEY_EX_MAX (__TCA_PEDIT_KEY_EX_MAX - 1)
 
  /* TCA_PEDIT_KEY_EX_HDR_TYPE_NETWROK is a special case for legacy users. It
@@ -38,6 +40,7 @@ enum pedit_header_type {
 	TCA_PEDIT_KEY_EX_HDR_TYPE_UDP = 5,
 	__PEDIT_HDR_TYPE_MAX,
 };
+
 #define TCA_PEDIT_HDR_TYPE_MAX (__PEDIT_HDR_TYPE_MAX - 1)
 
 enum pedit_cmd {
@@ -45,6 +48,7 @@ enum pedit_cmd {
 	TCA_PEDIT_KEY_EX_CMD_ADD = 1,
 	__PEDIT_CMD_MAX,
 };
+
 #define TCA_PEDIT_CMD_MAX (__PEDIT_CMD_MAX - 1)
 
 struct tc_pedit_key {
@@ -55,13 +59,14 @@ struct tc_pedit_key {
 	__u32           offmask;
 	__u32           shift;
 };
-                                                                                
+
 struct tc_pedit_sel {
 	tc_gen;
 	unsigned char           nkeys;
 	unsigned char           flags;
 	struct tc_pedit_key     keys[0];
 };
+
 #define tc_pedit tc_pedit_sel
 
 #endif
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 1/5] net sched actions: fix coding style in pedit action
From: Roman Mashak @ 2018-06-19 16:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1529427368-17129-1-git-send-email-mrv@mojatatu.com>

Fix coding style issues in tc pedit action detected by the
checkpatch script.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_pedit.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 8a925c72db5f..e4b29ee79ba8 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -136,15 +136,15 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 {
 	struct tc_action_net *tn = net_generic(net, pedit_net_id);
 	struct nlattr *tb[TCA_PEDIT_MAX + 1];
-	struct nlattr *pattr;
-	struct tc_pedit *parm;
-	int ret = 0, err;
-	struct tcf_pedit *p;
 	struct tc_pedit_key *keys = NULL;
 	struct tcf_pedit_key_ex *keys_ex;
+	struct tc_pedit *parm;
+	struct nlattr *pattr;
+	struct tcf_pedit *p;
+	int ret = 0, err;
 	int ksize;
 
-	if (nla == NULL)
+	if (!nla)
 		return -EINVAL;
 
 	err = nla_parse_nested(tb, TCA_PEDIT_MAX, nla, pedit_policy, NULL);
@@ -175,7 +175,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 			return ret;
 		p = to_pedit(*a);
 		keys = kmalloc(ksize, GFP_KERNEL);
-		if (keys == NULL) {
+		if (!keys) {
 			tcf_idr_release(*a, bind);
 			kfree(keys_ex);
 			return -ENOMEM;
@@ -220,6 +220,7 @@ static void tcf_pedit_cleanup(struct tc_action *a)
 {
 	struct tcf_pedit *p = to_pedit(a);
 	struct tc_pedit_key *keys = p->tcfp_keys;
+
 	kfree(keys);
 	kfree(p->tcfp_keys_ex);
 }
@@ -284,7 +285,8 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 	if (p->tcfp_nkeys > 0) {
 		struct tc_pedit_key *tkey = p->tcfp_keys;
 		struct tcf_pedit_key_ex *tkey_ex = p->tcfp_keys_ex;
-		enum pedit_header_type htype = TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK;
+		enum pedit_header_type htype =
+			TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK;
 		enum pedit_cmd cmd = TCA_PEDIT_KEY_EX_CMD_SET;
 
 		for (i = p->tcfp_nkeys; i > 0; i--, tkey++) {
@@ -316,16 +318,15 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 						hoffset + tkey->at);
 					goto bad;
 				}
-				d = skb_header_pointer(skb, hoffset + tkey->at, 1,
-						       &_d);
+				d = skb_header_pointer(skb, hoffset + tkey->at,
+						       1, &_d);
 				if (!d)
 					goto bad;
 				offset += (*d & tkey->offmask) >> tkey->shift;
 			}
 
 			if (offset % 4) {
-				pr_info("tc filter pedit"
-					" offset must be on 32 bit boundaries\n");
+				pr_info("tc filter pedit offset must be on 32 bit boundaries\n");
 				goto bad;
 			}
 
@@ -335,7 +336,8 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 				goto bad;
 			}
 
-			ptr = skb_header_pointer(skb, hoffset + offset, 4, &_data);
+			ptr = skb_header_pointer(skb, hoffset + offset,
+						 4, &_data);
 			if (!ptr)
 				goto bad;
 			/* just do it, baby */
@@ -358,8 +360,9 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 		}
 
 		goto done;
-	} else
+	} else {
 		WARN(1, "pedit BUG: index %d\n", p->tcf_index);
+	}
 
 bad:
 	p->tcf_qstats.overlimits++;
-- 
2.7.4

^ permalink raw reply related

