Netdev List
* Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
From: Gregory Rose @ 2019-07-08 23:22 UTC (permalink / raw)
  To: David Miller, ap420073; +Cc: dev, netdev, Pravin Shelar
In-Reply-To: <87bfb355-9ddf-c27b-c160-b3028a945a22@gmail.com>



On 7/8/2019 4:18 PM, Gregory Rose wrote:
> On 7/8/2019 4:08 PM, David Miller wrote:
>> From: Taehee Yoo <ap420073@gmail.com>
>> Date: Sat,  6 Jul 2019 01:08:09 +0900
>>
>>> When a vport is deleted, the maximum headroom size would be changed.
>>> If the vport which has the largest headroom is deleted,
>>> the new max_headroom would be set.
>>> But, if the new headroom size is equal to the old headroom size,
>>> updating routine is unnecessary.
>>>
>>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
>> I'm not so sure about the logic here and I'd therefore like an OVS expert
>> to review this.
>
> I'll review and test it and get back.  Pravin may have input as well.
>

Err, adding Pravin.

- Greg

> Thanks,
>
> - Greg
>
>> Thanks.
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the sh tree
From: Stephen Rothwell @ 2019-07-08 23:22 UTC (permalink / raw)
  To: David Miller, Networking, Yoshinori Sato
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Krzysztof Kozlowski, Jiri Pirko
In-Reply-To: <20190617114011.4159295e@canb.auug.org.au>


Hi all,

On Mon, 17 Jun 2019 11:40:11 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Today's linux-next merge of the net-next tree got conflicts in:
> 
>   arch/sh/configs/se7712_defconfig
>   arch/sh/configs/se7721_defconfig
>   arch/sh/configs/titan_defconfig
> 
> between commit:
> 
>   7c04efc8d2ef ("sh: configs: Remove useless UEVENT_HELPER_PATH")
> 
> from the sh tree and commit:
> 
>   a51486266c3b ("net: sched: remove NET_CLS_IND config option")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc arch/sh/configs/se7712_defconfig
> index 6ac7d362e106,1e116529735f..000000000000
> --- a/arch/sh/configs/se7712_defconfig
> +++ b/arch/sh/configs/se7712_defconfig
> @@@ -63,7 -63,7 +63,6 @@@ CONFIG_NET_SCH_NETEM=
>   CONFIG_NET_CLS_TCINDEX=y
>   CONFIG_NET_CLS_ROUTE4=y
>   CONFIG_NET_CLS_FW=y
> - CONFIG_NET_CLS_IND=y
>  -CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
>   CONFIG_MTD=y
>   CONFIG_MTD_BLOCK=y
>   CONFIG_MTD_CFI=y
> diff --cc arch/sh/configs/se7721_defconfig
> index ffd15acc2a04,c66e512719ab..000000000000
> --- a/arch/sh/configs/se7721_defconfig
> +++ b/arch/sh/configs/se7721_defconfig
> @@@ -62,7 -62,7 +62,6 @@@ CONFIG_NET_SCH_NETEM=
>   CONFIG_NET_CLS_TCINDEX=y
>   CONFIG_NET_CLS_ROUTE4=y
>   CONFIG_NET_CLS_FW=y
> - CONFIG_NET_CLS_IND=y
>  -CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
>   CONFIG_MTD=y
>   CONFIG_MTD_BLOCK=y
>   CONFIG_MTD_CFI=y
> diff --cc arch/sh/configs/titan_defconfig
> index 1c1c78e74fbb,171ab05ce4fc..000000000000
> --- a/arch/sh/configs/titan_defconfig
> +++ b/arch/sh/configs/titan_defconfig
> @@@ -142,7 -142,7 +142,6 @@@ CONFIG_GACT_PROB=
>   CONFIG_NET_ACT_MIRRED=m
>   CONFIG_NET_ACT_IPT=m
>   CONFIG_NET_ACT_PEDIT=m
> - CONFIG_NET_CLS_IND=y
>  -CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
>   CONFIG_FW_LOADER=m
>   CONFIG_CONNECTOR=m
>   CONFIG_MTD=m

I am still getting this conflict (the commit ids may have changed).
Just a reminder in case you think Linus may need to know.

-- 
Cheers,
Stephen Rothwell


^ permalink raw reply

* Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
From: Gregory Rose @ 2019-07-08 23:18 UTC (permalink / raw)
  To: David Miller, ap420073; +Cc: dev, netdev
In-Reply-To: <20190708.160804.2026506853635876959.davem@davemloft.net>

On 7/8/2019 4:08 PM, David Miller wrote:
> From: Taehee Yoo <ap420073@gmail.com>
> Date: Sat,  6 Jul 2019 01:08:09 +0900
>
>> When a vport is deleted, the maximum headroom size would be changed.
>> If the vport which has the largest headroom is deleted,
>> the new max_headroom would be set.
>> But, if the new headroom size is equal to the old headroom size,
>> updating routine is unnecessary.
>>
>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> I'm not so sure about the logic here and I'd therefore like an OVS expert
> to review this.

I'll review and test it and get back.  Pravin may have input as well.

Thanks,

- Greg

> Thanks.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


^ permalink raw reply

* [PATCH] net/mlx5e: Return in default case statement in tx_post_resync_params
From: Nathan Chancellor @ 2019-07-08 23:11 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: David S. Miller, Boris Pismenny, netdev, linux-rdma, linux-kernel,
	clang-built-linux, Nathan Chancellor

clang warns:

drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:251:2:
warning: variable 'rec_seq_sz' is used uninitialized whenever switch
default is taken [-Wsometimes-uninitialized]
        default:
        ^~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:255:46: note:
uninitialized use occurs here
        skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
                                                    ^~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:239:16: note:
initialize the variable 'rec_seq_sz' to silence this warning
        u16 rec_seq_sz;
                      ^
                       = 0
1 warning generated.

This case statement was clearly designed to be one that should not be
hit during runtime because of the WARN_ON statement so just return early
to prevent copying uninitialized memory up into rn_be.

Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Link: https://github.com/ClangBuiltLinux/linux/issues/590
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 3f5f4317a22b..5c08891806f0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -250,6 +250,7 @@ tx_post_resync_params(struct mlx5e_txqsq *sq,
 	}
 	default:
 		WARN_ON(1);
+		return;
 	}
 
 	skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
-- 
2.22.0


^ permalink raw reply related

* Re: [net-next] net: fib_rules: do not flow dissect local packets
From: David Miller @ 2019-07-08 23:12 UTC (permalink / raw)
  To: ppenkov; +Cc: netdev, roopa, edumazet
In-Reply-To: <20190705184643.249884-1-ppenkov@google.com>

From: Petar Penkov <ppenkov@google.com>
Date: Fri,  5 Jul 2019 11:46:43 -0700

> Rules matching on loopback iif do not need early flow dissection as the
> packet originates from the host. Stop counting such rules in
> fib_rule_requires_fldissect
> 
> Signed-off-by: Petar Penkov <ppenkov@google.com>

Roopa, please review.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: stmmac: enable clause 45 mdio support
From: David Miller @ 2019-07-08 23:09 UTC (permalink / raw)
  To: weifeng.voon
  Cc: mcoquelin.stm32, netdev, linux-kernel, joabreu, peppe.cavallaro,
	andrew, f.fainelli, alexandre.torgue, biao.huang, boon.leong.ong,
	hock.leong.kweh
In-Reply-To: <1562348007-12263-1-git-send-email-weifeng.voon@intel.com>

From: Voon Weifeng <weifeng.voon@intel.com>
Date: Sat,  6 Jul 2019 01:33:27 +0800

> From: Kweh Hock Leong <hock.leong.kweh@intel.com>
> 
> DWMAC4 is capable of supporting clause 45 MDIO communication.
> This patch enables the feature in stmmac_mdio_write() and
> stmmac_mdio_read(), following the mdiobus read/write format
> used by phy_write_mmd() and phy_read_mmd().
> 
> Reviewed-by: Li, Yifan <yifan2.li@intel.com>
> Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com>
> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
From: David Miller @ 2019-07-08 23:08 UTC (permalink / raw)
  To: ap420073; +Cc: pshelar, netdev, dev
In-Reply-To: <20190705160809.5202-1-ap420073@gmail.com>

From: Taehee Yoo <ap420073@gmail.com>
Date: Sat,  6 Jul 2019 01:08:09 +0900

> When a vport is deleted, the maximum headroom size would be changed.
> If the vport which has the largest headroom is deleted,
> the new max_headroom would be set.
> But, if the new headroom size is equal to the old headroom size,
> updating routine is unnecessary.
> 
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

I'm not so sure about the logic here and I'd therefore like an OVS expert
to review this.

Thanks.
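
For readers following the thread, the check described in the commit message can be sketched in userspace C as below. This is only an illustration of the idea, not the openvswitch code: `struct port`, `max_port_headroom()` and `update_headroom_on_delete()` are hypothetical names, and the real patch works on the datapath and vport structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vport: only the headroom matters here. */
struct port {
	unsigned int headroom;
};

/* Largest headroom among the n remaining ports (0 if none remain). */
unsigned int max_port_headroom(const struct port *ports, size_t n)
{
	unsigned int max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (ports[i].headroom > max)
			max = ports[i].headroom;
	return max;
}

/*
 * Called after a port has been removed from ports[].
 * Returns 1 if *cur_max changed (the update routine would have to run),
 * 0 if the recomputed maximum equals the old one and the update is skipped.
 */
int update_headroom_on_delete(const struct port *ports, size_t n,
			      unsigned int *cur_max)
{
	unsigned int new_max = max_port_headroom(ports, n);

	if (new_max == *cur_max)
		return 0;	/* nothing changed: skip the update */

	*cur_max = new_max;
	return 1;
}
```

In this model, deleting one of two ports that share the largest headroom leaves the maximum unchanged, which is the case the patch wants to short-circuit.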

^ permalink raw reply

* Re: [PATCH v3 net-next 19/19] ionic: Add basic devlink interface
From: Shannon Nelson @ 2019-07-08 22:58 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev
In-Reply-To: <20190708200350.GG2282@nanopsycho.orion>

On 7/8/19 1:03 PM, Jiri Pirko wrote:
> Mon, Jul 08, 2019 at 09:58:09PM CEST, snelson@pensando.io wrote:
>> On 7/8/19 12:34 PM, Jiri Pirko wrote:
>>> Mon, Jul 08, 2019 at 09:25:32PM CEST, snelson@pensando.io wrote:
>>>>
>>>> +
>>>> +static const struct devlink_ops ionic_dl_ops = {
>>>> +	.info_get	= ionic_dl_info_get,
>>>> +};
>>>> +
>>>> +int ionic_devlink_register(struct ionic *ionic)
>>>> +{
>>>> +	struct devlink *dl;
>>>> +	struct ionic **ip;
>>>> +	int err;
>>>> +
>>>> +	dl = devlink_alloc(&ionic_dl_ops, sizeof(struct ionic *));
>>> Oups. Something is wrong with your flow. The devlink alloc is allocating
>>> the structure that holds private data (per-device data) for you. This is
>>> misuse :/
>>>
>>> You are missing one parent device struct apparently.
>>>
>>> Oh, I think I see something like it. The unused "struct ionic_devlink".
>> If I'm not mistaken, the alloc is only allocating enough for a pointer, not
>> the whole per device struct, and a few lines down from here the pointer to
>> the new devlink struct is assigned to ionic->dl.  This was based on what I
>> found in the qed driver's qed_devlink_register(), and it all seems to work.
> I'm not saying your code won't work. What I say is that you should have
> a struct for device that would be allocated by devlink_alloc()

Is there a particular reason why?  I appreciate that devlink_alloc() can
give you this device specific space, just as alloc_etherdev_mq() can,
but is there a specific reason why this should be used instead of
simply setting up a pointer to a space that has already been allocated?
There are several drivers that are using it the way I've set up here,
which happened to be the first examples I followed - are they doing
something different that makes this valid for them?
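
The two layouts being debated can be modelled in plain userspace C, as a rough sketch. Nothing below is the devlink API: `struct container`, `container_alloc()` and `container_priv()` are hypothetical stand-ins for an allocator that reserves a private area behind its own header, and `struct device_state` stands in for the driver's per-device struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container: the allocator reserves priv_size bytes after it. */
struct container {
	size_t priv_size;
	max_align_t priv[];	/* private area, suitably aligned for any type */
};

struct container *container_alloc(size_t priv_size)
{
	struct container *c = calloc(1, sizeof(*c) + priv_size);

	if (c)
		c->priv_size = priv_size;
	return c;
}

void *container_priv(struct container *c)
{
	return c->priv;
}

/* Hypothetical per-device state. */
struct device_state {
	int id;
	struct container *c;
};

/* Layout A: the private area holds only a pointer to state allocated elsewhere. */
struct container *register_by_pointer(struct device_state *dev)
{
	struct container *c = container_alloc(sizeof(struct device_state *));
	struct device_state **slot;

	if (!c)
		return NULL;
	slot = container_priv(c);
	*slot = dev;
	dev->c = c;
	return c;
}

/* Layout B: the per-device state itself lives inside the private area. */
struct device_state *alloc_embedded(int id)
{
	struct container *c = container_alloc(sizeof(struct device_state));
	struct device_state *dev;

	if (!c)
		return NULL;
	dev = container_priv(c);
	dev->id = id;
	dev->c = c;
	return dev;
}
```

Layout A corresponds to the posted patch and layout B to what the reviewer suggests. In B the container and the device state share one allocation and one lifetime, whereas A needs the back-pointer kept consistent by hand.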

>
> The ionic struct should be associated with devlink_port. That you are
> missing too.

We don't support any of devlink_port features at this point, just the 
simple device information.

sln

>
>
>> That unused struct ionic_devlink does need to go away, it was superfluous
>> after working out a better typecast off of devlink_priv().
>>
>> I'll remove the unused struct ionic_devlink, but I think the rest is okay.
>>
>> sln
>>
>>>
>>>> +	if (!dl) {
>>>> +		dev_warn(ionic->dev, "devlink_alloc failed");
>>>> +		return -ENOMEM;
>>>> +	}
>>>> +
>>>> +	ip = (struct ionic **)devlink_priv(dl);
>>>> +	*ip = ionic;
>>>> +	ionic->dl = dl;
>>>> +
>>>> +	err = devlink_register(dl, ionic->dev);
>>>> +	if (err) {
>>>> +		dev_warn(ionic->dev, "devlink_register failed: %d\n", err);
>>>> +		goto err_dl_free;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +
>>>> +err_dl_free:
>>>> +	ionic->dl = NULL;
>>>> +	devlink_free(dl);
>>>> +	return err;
>>>> +}
>>>> +
>>>> +void ionic_devlink_unregister(struct ionic *ionic)
>>>> +{
>>>> +	if (!ionic->dl)
>>>> +		return;
>>>> +
>>>> +	devlink_unregister(ionic->dl);
>>>> +	devlink_free(ionic->dl);
>>>> +}
>>>> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.h b/drivers/net/ethernet/pensando/ionic/ionic_devlink.h
>>>> new file mode 100644
>>>> index 000000000000..35528884e29f
>>>> --- /dev/null
>>>> +++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.h
>>>> @@ -0,0 +1,12 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +/* Copyright(c) 2017 - 2019 Pensando Systems, Inc */
>>>> +
>>>> +#ifndef _IONIC_DEVLINK_H_
>>>> +#define _IONIC_DEVLINK_H_
>>>> +
>>>> +#include <net/devlink.h>
>>>> +
>>>> +int ionic_devlink_register(struct ionic *ionic);
>>>> +void ionic_devlink_unregister(struct ionic *ionic);
>>>> +
>>>> +#endif /* _IONIC_DEVLINK_H_ */
>>>> -- 
>>>> 2.17.1
>>>>


^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: use netif_ovs_is_port() instead of opencode
From: David Miller @ 2019-07-08 22:53 UTC (permalink / raw)
  To: ap420073; +Cc: pshelar, netdev, dev
In-Reply-To: <20190705160546.4847-1-ap420073@gmail.com>

From: Taehee Yoo <ap420073@gmail.com>
Date: Sat,  6 Jul 2019 01:05:46 +0900

> Use netif_ovs_is_port() function instead of open code.
> This patch doesn't change logic.
> 
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: Jakub Kicinski @ 2019-07-08 22:51 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli, vivien.didelot,
	idosch, Alexei Starovoitov
In-Reply-To: <20190708131908.GA13672@splinter>

On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:
> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> > From: Ido Schimmel <idosch@idosch.org>
> > Date: Sun,  7 Jul 2019 10:58:17 +0300
> >   
> > > Users have several ways to debug the kernel and understand why a packet
> > > was dropped. For example, using "drop monitor" and "perf". Both
> > > utilities trace kfree_skb(), which is the function called when a packet
> > > is freed as part of a failure. The information provided by these tools
> > > is invaluable when trying to understand the cause of a packet loss.
> > > 
> > > In recent years, large portions of the kernel data path were offloaded
> > > to capable devices. Today, it is possible to perform L2 and L3
> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > Different TC classifiers and actions are also offloaded to capable
> > > devices, at both ingress and egress.
> > > 
> > > > However, when the data path is offloaded it is not possible to achieve
> > > > the same level of introspection as tools such as "perf" and "drop monitor"
> > > > become irrelevant.
> > > 
> > > This patchset aims to solve this by allowing users to monitor packets
> > > that the underlying device decided to drop along with relevant metadata
> > > such as the drop reason and ingress port.  
> > 
> > We are now going to have 5 or so ways to capture packets passing through
> > the system, this is nonsense.
> > 
> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > devlink thing.
> > 
> > This is insanity, too many ways to do the same thing and therefore the
> > worst possible user experience.
> > 
> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > XDP perf events, and these taps there too.
> > 
> > I mean really, think about it from the average user's perspective.  To
> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > listen on devlink but configure a special tap thing beforehand and then
> > if someone is using XDP I gotta setup another perf event buffer capture
> > thing too.  
> 
> Let me try to explain again because I probably wasn't clear enough. The
> devlink-trap mechanism is not doing the same thing as other solutions.
> 
> The packets we are capturing in this patchset are packets that the
> kernel (the CPU) never saw up until now - they were silently dropped by
> the underlying device performing the packet forwarding instead of the
> CPU.

When you say silently dropped do you mean that mlxsw as of today
doesn't have any counters exposed for those events?

If we wanted to consolidate this into something existing we can either
 (a) add similar traps in the kernel data path;
 (b) make these traps extension of statistics.

My knee jerk reaction to seeing the patches was that it adds a new
place where device statistics are reported. Users who want to know why
things are dropped will not get detailed breakdown from ethtool -S which
for better or worse is the one stop shop for device stats today. 

Having thought about it some more, however, I think that having a
forwarding "exception" object and hanging statistics off of it is a
better design, even if we need to deal with some duplication to get
there.

IOW having a way to "trap all packets which would increment a
statistic" (option (b) above) is probably a bad design.

As for (a) I wonder how many of those events have a corresponding event
in the kernel stack? If we could add corresponding trace points and
just feed those from the device driver, that'd obviously be a holy
grail. Not to mention that requiring trace points to be added to the
core would make Alexei happy:

http://vger.kernel.org/netconf2019_files/netconf2019_slides_ast.pdf#page=3

;)

That's my $.02, not very insightful.

> For each such packet we get valuable metadata from the underlying device
> such as the drop reason and the ingress port. With time, even more
> reasons and metadata could be provided (e.g., egress port, traffic
> class). Netlink provides a structured and extensible way to report the
> packet along with the metadata to interested users. The tc-sample action
> uses a similar concept.
> 
> I would like to emphasize that these dropped packets are not injected to
> the kernel's receive path and therefore not subject to kfree_skb() and
> related infrastructure. There is no need to waste CPU cycles on packets
> we already know were dropped (and why). Further, hardware tail/early
> drops will not be dropped by the kernel, given its qdiscs are probably
> empty.
> 
> Regarding the use of devlink, current ASICs can forward packets at
> 6.4Tb/s. We do not want to overwhelm the CPU with dropped packets and
> therefore we give users the ability to control - via devlink - the
> trapping of certain packets to the CPU and their reporting to user
> space. In the future, devlink-trap can be extended to support the
> configuration of the hardware policers of each trap.

^ permalink raw reply

* Re: [PATCH net-next V2] MAINTAINERS: Add page_pool maintainer entry
From: David Miller @ 2019-07-08 22:51 UTC (permalink / raw)
  To: brouer
  Cc: ilias.apalodimas, netdev, daniel, jakub.kicinski, john.fastabend,
	ast
In-Reply-To: <156233140902.25371.7033961410347587264.stgit@carbon>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Fri, 05 Jul 2019 14:57:55 +0200

> In this release cycle the number of NIC drivers using page_pool
> will likely reach 4 drivers.  It is about time to add a maintainer
> entry.  Add myself and Ilias.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> V2: Ilias also volunteered to co-maintain over IRC

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/2] net: mvpp2: Add classification based on the ETHER flow
From: David Miller @ 2019-07-08 22:50 UTC (permalink / raw)
  To: maxime.chevallier
  Cc: netdev, linux-kernel, linux-arm-kernel, antoine.tenart,
	thomas.petazzoni, gregory.clement, miquel.raynal, nadavh, stefanc,
	mw
In-Reply-To: <20190705120913.25013-1-maxime.chevallier@bootlin.com>

From: Maxime Chevallier <maxime.chevallier@bootlin.com>
Date: Fri,  5 Jul 2019 14:09:11 +0200

> Hello everyone,
> 
> This series adds support for classification of the ETHER flow in the
> mvpp2 driver.
> 
> The first patch allows detecting when a user specifies a flow_type that
> isn't supported by the driver, while the second adds support for this
> flow_type by adding the mapping between the ETHER_FLOW enum value and
> the relevant classifier flow entries.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: sysctl: cleanup net_sysctl_init error exit paths
From: George G. Davis @ 2019-07-08 22:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20190517144345.GA16926@mam-gdavis-lt>

Hello David,

On Fri, May 17, 2019 at 10:43:45AM -0400, George G. Davis wrote:
> Hello David,
> 
> On Thu, May 16, 2019 at 02:27:44PM -0700, David Miller wrote:
> > From: "George G. Davis" <george_davis@mentor.com>
> > Date: Thu, 16 May 2019 11:23:08 -0400
> > 
> > > Unwind net_sysctl_init error exit goto spaghetti code
> > > 
> > > Suggested-by: Joshua Frkuska <joshua_frkuska@mentor.com>
> > > Signed-off-by: George G. Davis <george_davis@mentor.com>
> > 
> > Cleanups are not appropriate until the net-next tree opens back up.
> > 
> > So please resubmit at that time.
> 
> I fear that I may be distracted by other shiny objects by then but
> I'll make a reminder and try to resubmit during the next merge window.

Since the "Linux 5.2" kernel has been released [1], I'm guessing that the
net-next merge window is open now? If yes, the patch remains unchanged
since my initial post. Please consider applying or let me know when to
resubmit when the net-next merge window is again open.

TIA!


> 
> Thanks!
> 
> > 
> > Thank you.
> 
> -- 
> Regards,
> George

-- 
Regards,
George
[1] https://lwn.net/Articles/792995/

^ permalink raw reply

* Re: [PATCH] selftests: txring_overwrite: fix incorrect test of mmap() return value
From: David Miller @ 2019-07-08 22:40 UTC (permalink / raw)
  To: debrabander; +Cc: netdev
In-Reply-To: <1562326994-4569-1-git-send-email-debrabander@gmail.com>

From: Frank de Brabander <debrabander@gmail.com>
Date: Fri,  5 Jul 2019 13:43:14 +0200

> If mmap() fails it returns MAP_FAILED, which is defined as ((void *) -1).
> The current if-statement incorrectly tests if *ring is NULL.
> 
> Signed-off-by: Frank de Brabander <debrabander@gmail.com>

Applied with fixes tag added and queued up for -stable, thanks.
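
The pitfall described in the commit message can be shown with a minimal sketch. It maps anonymous memory rather than a packet ring, so it is not the selftest code, and `map_anon()` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory.
 * Returns NULL on failure so callers can use a conventional NULL test.
 */
void *map_anon(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* mmap() signals failure with MAP_FAILED, i.e. (void *)-1, not NULL. */
	if (p == MAP_FAILED)
		return NULL;
	return p;
}
```

A test of the form `if (!p)` applied directly to the mmap() return value would let a failed mapping through, because (void *)-1 is non-NULL.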

^ permalink raw reply

* Re: [PATCH 1/1] tools/dtrace: initial implementation of DTrace
From: Kris Van Hees @ 2019-07-08 22:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Kris Van Hees, netdev, bpf, dtrace-devel, linux-kernel, rostedt,
	mhiramat, ast, daniel, Peter Zijlstra, Chris Mason
In-Reply-To: <20190708171537.GA11960@kernel.org>

On Mon, Jul 08, 2019 at 02:15:37PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jul 03, 2019 at 08:14:30PM -0700, Kris Van Hees escreveu:
> > This initial implementation of a tiny subset of DTrace functionality
> > provides the following options:
> > 
> > 	dtrace [-lvV] [-b bufsz] -s script
> > 	    -b  set trace buffer size
> > 	    -l  list probes (only works with '-s script' for now)
> > 	    -s  enable or list probes for the specified BPF program
> > 	    -V  report DTrace API version
> > 
> > The patch comprises quite a bit of code due to DTrace requiring a few
> > crucial components, even in its most basic form.
> > 
> > The code is structured around the command line interface implemented in
> > dtrace.c.  It provides option parsing and drives the three modes of
> > operation that are currently implemented:
> > 
> > 1. Report DTrace API version information.
> > 	Report the version information and terminate.
> > 
> > 2. List probes in BPF programs.
> > 	Initialize the list of probes that DTrace recognizes, load BPF
> > 	programs, parse all BPF ELF section names, resolve them into
> > 	known probes, and emit the probe names.  Then terminate.
> > 
> > 3. Load BPF programs and collect tracing data.
> > 	Initialize the list of probes that DTrace recognizes, load BPF
> > 	programs and attach them to their corresponding probes, set up
> > 	perf event output buffers, and start processing tracing data.
> > 
> > This implementation makes extensive use of BPF (handled by dt_bpf.c) and
> > the perf event output ring buffer (handled by dt_buffer.c).  DTrace-style
> > probe handling (dt_probe.c) offers an interface to probes that hides the
> > implementation details of the individual probe types by provider (dt_fbt.c
> > and dt_syscall.c).  Probe lookup by name uses a hashtable implementation
> > (dt_hash.c).  The dt_utils.c code populates a list of online CPU ids, so
> > we know what CPUs we can obtain tracing data from.
> > 
> > Building the tool is trivial because its only dependency (libbpf) is in
> > the kernel tree under tools/lib/bpf.  A simple 'make' in the tools/dtrace
> > directory suffices.
> > 
> > The 'dtrace' executable needs to run as root because BPF programs cannot
> > be loaded by non-root users.
> > 
> > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> > Reviewed-by: David Mc Lean <david.mclean@oracle.com>
> > Reviewed-by: Eugene Loh <eugene.loh@oracle.com>
> > ---
> >  MAINTAINERS                |   6 +
> >  tools/dtrace/Makefile      |  88 ++++++++++
> >  tools/dtrace/bpf_sample.c  | 145 ++++++++++++++++
> >  tools/dtrace/dt_bpf.c      | 188 +++++++++++++++++++++
> >  tools/dtrace/dt_buffer.c   | 331 +++++++++++++++++++++++++++++++++++++
> >  tools/dtrace/dt_fbt.c      | 201 ++++++++++++++++++++++
> >  tools/dtrace/dt_hash.c     | 211 +++++++++++++++++++++++
> >  tools/dtrace/dt_probe.c    | 230 ++++++++++++++++++++++++++
> >  tools/dtrace/dt_syscall.c  | 179 ++++++++++++++++++++
> >  tools/dtrace/dt_utils.c    | 132 +++++++++++++++
> >  tools/dtrace/dtrace.c      | 249 ++++++++++++++++++++++++++++
> >  tools/dtrace/dtrace.h      |  13 ++
> >  tools/dtrace/dtrace_impl.h | 101 +++++++++++
> >  13 files changed, 2074 insertions(+)
> >  create mode 100644 tools/dtrace/Makefile
> >  create mode 100644 tools/dtrace/bpf_sample.c
> >  create mode 100644 tools/dtrace/dt_bpf.c
> >  create mode 100644 tools/dtrace/dt_buffer.c
> >  create mode 100644 tools/dtrace/dt_fbt.c
> >  create mode 100644 tools/dtrace/dt_hash.c
> >  create mode 100644 tools/dtrace/dt_probe.c
> >  create mode 100644 tools/dtrace/dt_syscall.c
> >  create mode 100644 tools/dtrace/dt_utils.c
> >  create mode 100644 tools/dtrace/dtrace.c
> >  create mode 100644 tools/dtrace/dtrace.h
> >  create mode 100644 tools/dtrace/dtrace_impl.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 606d1f80bc49..668468834865 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -5474,6 +5474,12 @@ W:	https://linuxtv.org
> >  S:	Odd Fixes
> >  F:	drivers/media/pci/dt3155/
> >  
> > +DTRACE
> > +M:	Kris Van Hees <kris.van.hees@oracle.com>
> > +L:	dtrace-devel@oss.oracle.com
> > +S:	Maintained
> > +F:	tools/dtrace/
> > +
> >  DVB_USB_AF9015 MEDIA DRIVER
> >  M:	Antti Palosaari <crope@iki.fi>
> >  L:	linux-media@vger.kernel.org
> > diff --git a/tools/dtrace/Makefile b/tools/dtrace/Makefile
> > new file mode 100644
> > index 000000000000..99fd0f9dd1d6
> > --- /dev/null
> > +++ b/tools/dtrace/Makefile
> > @@ -0,0 +1,88 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# This Makefile is based on samples/bpf.
> > +#
> > +# Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> > +
> > +DT_VERSION		:= 2.0.0
> > +DT_GIT_VERSION		:= $(shell git rev-parse HEAD 2>/dev/null || \
> > +				   echo Unknown)
> > +
> > +DTRACE_PATH		?= $(abspath $(srctree)/$(src))
> > +TOOLS_PATH		:= $(DTRACE_PATH)/..
> > +SAMPLES_PATH		:= $(DTRACE_PATH)/../../samples
> > +
> > +hostprogs-y		:= dtrace
> > +
> > +LIBBPF			:= $(TOOLS_PATH)/lib/bpf/libbpf.a
> > +OBJS			:= dt_bpf.o dt_buffer.o dt_utils.o dt_probe.o \
> > +			   dt_hash.o \
> > +			   dt_fbt.o dt_syscall.o
> > +
> > +dtrace-objs		:= $(OBJS) dtrace.o
> > +
> > +always			:= $(hostprogs-y)
> > +always			+= bpf_sample.o
> > +
> > +KBUILD_HOSTCFLAGS	+= -DDT_VERSION=\"$(DT_VERSION)\"
> > +KBUILD_HOSTCFLAGS	+= -DDT_GIT_VERSION=\"$(DT_GIT_VERSION)\"
> > +KBUILD_HOSTCFLAGS	+= -I$(srctree)/tools/lib
> > +KBUILD_HOSTCFLAGS	+= -I$(srctree)/tools/perf
> 
> Interesting, what are you using from tools/perf/? So that we can move to
> tools/{include,lib,arch}.

This is my mistake...  an earlier version of the code (as I was developing it)
was using stuff from tools/perf, but that is no longer the case.  Removing it.

> > +KBUILD_HOSTCFLAGS	+= -I$(srctree)/tools/include/uapi
> > +KBUILD_HOSTCFLAGS	+= -I$(srctree)/tools/include/
> > +KBUILD_HOSTCFLAGS	+= -I$(srctree)/usr/include
> > +
> > +KBUILD_HOSTLDLIBS	:= $(LIBBPF) -lelf
> > +
> > +LLC			?= llc
> > +CLANG			?= clang
> > +LLVM_OBJCOPY		?= llvm-objcopy
> > +
> > +ifdef CROSS_COMPILE
> > +HOSTCC			= $(CROSS_COMPILE)gcc
> > +CLANG_ARCH_ARGS		= -target $(ARCH)
> > +endif
> > +
> > +all:
> > +	$(MAKE) -C ../../ $(CURDIR)/ DTRACE_PATH=$(CURDIR)
> > +
> > +clean:
> > +	$(MAKE) -C ../../ M=$(CURDIR) clean
> > +	@rm -f *~
> > +
> > +$(LIBBPF): FORCE
> > +	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(DTRACE_PATH)/../../ O=
> > +
> > +FORCE:
> > +
> > +.PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC)
> > +
> > +verify_cmds: $(CLANG) $(LLC)
> > +	@for TOOL in $^ ; do \
> > +		if ! (which -- "$${TOOL}" > /dev/null 2>&1); then \
> > +			echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\
> > +			exit 1; \
> > +		else true; fi; \
> > +	done
> > +
> > +verify_target_bpf: verify_cmds
> > +	@if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
> > +		echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
> > +		echo "   NOTICE: LLVM version >= 3.7.1 required" ;\
> > +		exit 2; \
> > +	else true; fi
> > +
> > +$(DTRACE_PATH)/*.c: verify_target_bpf $(LIBBPF)
> > +$(src)/*.c: verify_target_bpf $(LIBBPF)
> > +
> > +$(obj)/%.o: $(src)/%.c
> > +	@echo "  CLANG-bpf " $@
> > +	$(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
> > +		-I$(srctree)/tools/testing/selftests/bpf/ \
> > +		-D__KERNEL__ -D__BPF_TRACING__ -Wno-unused-value -Wno-pointer-sign \
> > +		-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
> > +		-Wno-gnu-variable-sized-type-not-at-end \
> > +		-Wno-address-of-packed-member -Wno-tautological-compare \
> > +		-Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
> > +		-I$(srctree)/samples/bpf/ -include asm_goto_workaround.h \
> > +		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
> 
> 
> We have the above in tools/perf/util/llvm-utils.c, perhaps we need to
> move it to some place in lib/ to share?

Yes, if there is a way to put things like this in a central location so we can
maintain a single copy that would be a good idea indeed.

> > diff --git a/tools/dtrace/bpf_sample.c b/tools/dtrace/bpf_sample.c
> > new file mode 100644
> > index 000000000000..49f350390b5f
> > --- /dev/null
> > +++ b/tools/dtrace/bpf_sample.c
> > @@ -0,0 +1,145 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * This sample DTrace BPF tracing program demonstrates how actions can be
> > + * associated with different probe types.
> > + *
> > + * The kprobe/ksys_write probe is a Function Boundary Tracing (FBT) entry probe
> > + * on the ksys_write(fd, buf, count) function in the kernel.  Arguments to the
> > + * function can be retrieved from the CPU registers (struct pt_regs).
> > + *
> > + * The tracepoint/syscalls/sys_enter_write probe is a System Call entry probe
> > + * for the write(d, buf, count) system call.  Arguments to the system call can
> > + * be retrieved from the tracepoint data passed to the BPF program as context
> > + * (struct syscall_data) when the probe fires.
> > + *
> > + * The BPF program associated with each probe prepares a DTrace BPF context
> > + * (struct dt_bpf_context) that stores the probe ID and up to 10 arguments.
> > + * Only 3 arguments are used in this sample.  Then the programs call a shared
> > + * BPF function (bpf_action) that implements the actual action to be taken when
> > + * a probe fires.  It prepares a data record to be stored in the tracing buffer
> > + * and submits it to the buffer.  The data in the data record is obtained from
> > + * the DTrace BPF context.
> > + *
> > + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> > + */
> > +#include <uapi/linux/bpf.h>
> > +#include <linux/ptrace.h>
> > +#include <linux/version.h>
> > +#include <uapi/linux/unistd.h>
> > +#include "bpf_helpers.h"
> > +
> > +#include "dtrace.h"
> > +
> > +struct syscall_data {
> > +	struct pt_regs *regs;
> > +	long syscall_nr;
> > +	long arg[6];
> > +};
> > +
> > +struct bpf_map_def SEC("maps") buffers = {
> > +	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> > +	.key_size = sizeof(u32),
> > +	.value_size = sizeof(u32),
> > +	.max_entries = NR_CPUS,
> > +};
> > +
> > +#if defined(__amd64)
> > +# define GET_REGS_ARG0(regs)	((regs)->di)
> > +# define GET_REGS_ARG1(regs)	((regs)->si)
> > +# define GET_REGS_ARG2(regs)	((regs)->dx)
> > +# define GET_REGS_ARG3(regs)	((regs)->cx)
> > +# define GET_REGS_ARG4(regs)	((regs)->r8)
> > +# define GET_REGS_ARG5(regs)	((regs)->r9)
> > +#else
> > +# warning Argument retrieval from pt_regs is not supported yet on this arch.
> > +# define GET_REGS_ARG0(regs)	0
> > +# define GET_REGS_ARG1(regs)	0
> > +# define GET_REGS_ARG2(regs)	0
> > +# define GET_REGS_ARG3(regs)	0
> > +# define GET_REGS_ARG4(regs)	0
> > +# define GET_REGS_ARG5(regs)	0
> > +#endif
> 
> We have this in tools/testing/selftests/bpf/bpf_helpers.h, probably need
> to move to some other place in tools/include/ where this can be shared.

I should be using the ones in bpf_helpers (since I already include that
anyway), and yes, if we can move that to a general use location under
tools/include that would be a good idea.

Also, I just updated my code to use this and I added a PT_REGS_PARM6(x) for
all the listed archs because I need to be able to get to up to 6 parameters
rather than the supported 5.  As far as I can see, all listed archs support
argument passing of at least 6 arguments so this should be no problem.

Any objections?
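
For illustration only, a minimal sketch of what such a sixth-parameter
accessor could look like on x86_64. It assumes the PT_REGS_PARMn(x) naming
used in bpf_helpers.h and the System V calling convention, where the sixth
integer argument is passed in r9. struct mock_pt_regs and collect_args() are
hypothetical stand-ins so the snippet builds in userspace; they are not the
kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the x86_64 struct pt_regs; only the six
 * argument registers are modelled. */
struct mock_pt_regs {
	uint64_t di, si, dx, cx, r8, r9;
};

#define PT_REGS_PARM1(x) ((x)->di)
#define PT_REGS_PARM2(x) ((x)->si)
#define PT_REGS_PARM3(x) ((x)->dx)
#define PT_REGS_PARM4(x) ((x)->cx)
#define PT_REGS_PARM5(x) ((x)->r8)
/* Assumed addition: sixth argument register in the SysV x86_64 ABI. */
#define PT_REGS_PARM6(x) ((x)->r9)

/* Gather all six function arguments, as a probe handler might before
 * storing them in its own context structure. */
static void collect_args(const struct mock_pt_regs *regs, uint64_t args[6])
{
	assert(regs != NULL && args != NULL);
	args[0] = PT_REGS_PARM1(regs);
	args[1] = PT_REGS_PARM2(regs);
	args[2] = PT_REGS_PARM3(regs);
	args[3] = PT_REGS_PARM4(regs);
	args[4] = PT_REGS_PARM5(regs);
	args[5] = PT_REGS_PARM6(regs);
}
```

Other architectures would need their own register for the sixth slot; the
mapping above covers x86_64 function calls only.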

^ permalink raw reply

* Re: [PATCH bpf-next v3] virtio_net: add XDP meta data support
From: Daniel Borkmann @ 2019-07-08 22:38 UTC (permalink / raw)
  To: Yuya Kusakabe, Jason Wang
  Cc: ast, davem, hawk, jakub.kicinski, john.fastabend, kafai, mst,
	netdev, songliubraving, yhs
In-Reply-To: <52e3fc0d-bdd7-83ee-58e6-488e2b91cc83@gmail.com>

On 07/02/2019 04:11 PM, Yuya Kusakabe wrote:
> On 7/2/19 5:33 PM, Jason Wang wrote:
>> On 2019/7/2 4:16 PM, Yuya Kusakabe wrote:
>>> This adds XDP meta data support to both receive_small() and
>>> receive_mergeable().
>>>
>>> Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")
>>> Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
>>> ---
>>> v3:
>>>   - fix preserve the vnet header in receive_small().
>>> v2:
>>>   - keep copy untouched in page_to_skb().
>>>   - preserve the vnet header in receive_small().
>>>   - fix indentation.
>>> ---
>>>   drivers/net/virtio_net.c | 45 +++++++++++++++++++++++++++-------------
>>>   1 file changed, 31 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 4f3de0ac8b0b..03a1ae6fe267 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -371,7 +371,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>                      struct receive_queue *rq,
>>>                      struct page *page, unsigned int offset,
>>>                      unsigned int len, unsigned int truesize,
>>> -                   bool hdr_valid)
>>> +                   bool hdr_valid, unsigned int metasize)
>>>   {
>>>       struct sk_buff *skb;
>>>       struct virtio_net_hdr_mrg_rxbuf *hdr;
>>> @@ -393,7 +393,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>       else
>>>           hdr_padded_len = sizeof(struct padded_vnet_hdr);
>>>   -    if (hdr_valid)
>>> +    if (hdr_valid && !metasize)
>>>           memcpy(hdr, p, hdr_len);
>>>         len -= hdr_len;
>>> @@ -405,6 +405,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>           copy = skb_tailroom(skb);
>>>       skb_put_data(skb, p, copy);
>>>   +    if (metasize) {
>>> +        __skb_pull(skb, metasize);
>>> +        skb_metadata_set(skb, metasize);
>>> +    }
>>> +
>>>       len -= copy;
>>>       offset += copy;
>>>   @@ -644,6 +649,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>       unsigned int delta = 0;
>>>       struct page *xdp_page;
>>>       int err;
>>> +    unsigned int metasize = 0;
>>>         len -= vi->hdr_len;
>>>       stats->bytes += len;
>>> @@ -683,10 +689,13 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>             xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
>>>           xdp.data = xdp.data_hard_start + xdp_headroom;
>>> -        xdp_set_data_meta_invalid(&xdp);
>>>           xdp.data_end = xdp.data + len;
>>> +        xdp.data_meta = xdp.data;
>>>           xdp.rxq = &rq->xdp_rxq;
>>>           orig_data = xdp.data;
>>> +        /* Copy the vnet header to the front of data_hard_start to avoid
>>> +         * overwriting by XDP meta data */
>>> +        memcpy(xdp.data_hard_start - vi->hdr_len, xdp.data - vi->hdr_len, vi->hdr_len);

I'm not fully sure if I'm following this one correctly, probably just missing
something. Isn't the vnet header based on how we set up xdp.data_hard_start
earlier already in front of it? Wouldn't we copy invalid data from xdp.data -
vi->hdr_len into the vnet header at that point (given there can be up to 256
bytes of headroom between the two)? If it's relative to xdp.data and headroom
is >0, then BPF prog could otherwise mangle this; something doesn't add up to
me here. Could you clarify? Thx
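
To make the question concrete, here is a small userspace sketch of the
pointer arithmetic in the quoted hunk. Only the two memcpy() address
expressions and the relation xdp.data = xdp.data_hard_start + xdp_headroom
are taken from the patch; struct hdr_copy_offsets, vnet_hdr_copy_offsets()
and the example values (256 bytes of headroom, a 12-byte header) are
assumptions for illustration.

```c
#include <assert.h>

/* Offsets of the memcpy() destination and source, both expressed
 * relative to xdp.data_hard_start. */
struct hdr_copy_offsets {
	long dst;
	long src;
};

static struct hdr_copy_offsets vnet_hdr_copy_offsets(unsigned int xdp_headroom,
						      unsigned int hdr_len)
{
	/* xdp.data = xdp.data_hard_start + xdp_headroom */
	long data = (long)xdp_headroom;
	struct hdr_copy_offsets o;

	o.dst = -(long)hdr_len;		/* xdp.data_hard_start - vi->hdr_len */
	o.src = data - (long)hdr_len;	/* xdp.data - vi->hdr_len */
	return o;
}

/* Example with assumed values: the two regions end up headroom bytes apart. */
static void vnet_hdr_copy_example(void)
{
	struct hdr_copy_offsets o = vnet_hdr_copy_offsets(256, 12);

	assert(o.dst == -12);
	assert(o.src == 244);
	assert(o.src - o.dst == 256);
}
```

With xdp_headroom == 0 the two regions coincide; otherwise they are
xdp_headroom bytes apart, so which of them holds the vnet header depends on
where the driver placed it when the buffer was posted.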

>> What happens if we have a large metadata that occupies all headroom here?
>>
>> Thanks
> 
> Do you mean large "XDP" metadata? If so, I think we cannot use metadata
> that occupies all of the headroom. The size of the metadata is limited by
> bpf_xdp_adjust_meta(), as below.
> bpf_xdp_adjust_meta() in net/core/filter.c:
> 	if (unlikely((metalen & (sizeof(__u32) - 1)) ||
> 		     (metalen > 32)))
> 		return -EACCES;
> 
> Thanks.
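
As a rough illustration of the size limit quoted above, the check can be
modelled in userspace as follows. xdp_metalen_check() is a hypothetical
name, and it reproduces only the quoted condition (a multiple of 4 bytes and
at most 32 bytes); anything else bpf_xdp_adjust_meta() may verify is not
modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Models only the size condition quoted from bpf_xdp_adjust_meta():
 * metalen must be a multiple of sizeof(u32) and no larger than 32. */
static int xdp_metalen_check(unsigned long metalen)
{
	if ((metalen & (sizeof(uint32_t) - 1)) || metalen > 32)
		return -EACCES;
	return 0;
}

/* A few sizes a program might request. */
static void xdp_metalen_examples(void)
{
	assert(xdp_metalen_check(4) == 0);		/* aligned, small */
	assert(xdp_metalen_check(32) == 0);		/* upper bound */
	assert(xdp_metalen_check(6) == -EACCES);	/* not 4-byte aligned */
	assert(xdp_metalen_check(36) == -EACCES);	/* above 32 bytes */
}
```

Under this condition the metadata area stays far smaller than a 256-byte
headroom.
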
> 
>>
>>
>>>           act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>           stats->xdp_packets++;
>>>   @@ -695,9 +704,11 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>               /* Recalculate length in case bpf program changed it */
>>>               delta = orig_data - xdp.data;
>>>               len = xdp.data_end - xdp.data;
>>> +            metasize = xdp.data - xdp.data_meta;
>>>               break;
>>>           case XDP_TX:
>>>               stats->xdp_tx++;
>>> +            xdp.data_meta = xdp.data;
>>>               xdpf = convert_to_xdp_frame(&xdp);
>>>               if (unlikely(!xdpf))
>>>                   goto err_xdp;
>>> @@ -736,10 +747,12 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>       skb_reserve(skb, headroom - delta);
>>>       skb_put(skb, len);
>>>       if (!delta) {
>>> -        buf += header_offset;
>>> -        memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
>>> +        memcpy(skb_vnet_hdr(skb), buf + VIRTNET_RX_PAD, vi->hdr_len);
>>>       } /* keep zeroed vnet hdr since packet was changed by bpf */
>>>   +    if (metasize)
>>> +        skb_metadata_set(skb, metasize);
>>> +
>>>   err:
>>>       return skb;
>>>   @@ -760,8 +773,8 @@ static struct sk_buff *receive_big(struct net_device *dev,
>>>                      struct virtnet_rq_stats *stats)
>>>   {
>>>       struct page *page = buf;
>>> -    struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len,
>>> -                      PAGE_SIZE, true);
>>> +    struct sk_buff *skb =
>>> +        page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0);
>>>         stats->bytes += len - vi->hdr_len;
>>>       if (unlikely(!skb))
>>> @@ -793,6 +806,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>       unsigned int truesize;
>>>       unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>>       int err;
>>> +    unsigned int metasize = 0;
>>>         head_skb = NULL;
>>>       stats->bytes += len - vi->hdr_len;
>>> @@ -839,8 +853,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>           data = page_address(xdp_page) + offset;
>>>           xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>>>           xdp.data = data + vi->hdr_len;
>>> -        xdp_set_data_meta_invalid(&xdp);
>>>           xdp.data_end = xdp.data + (len - vi->hdr_len);
>>> +        xdp.data_meta = xdp.data;
>>>           xdp.rxq = &rq->xdp_rxq;
>>>             act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>> @@ -852,8 +866,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>                * adjustments. Note other cases do not build an
>>>                * skb and avoid using offset
>>>                */
>>> -            offset = xdp.data -
>>> -                    page_address(xdp_page) - vi->hdr_len;
>>> +            metasize = xdp.data - xdp.data_meta;
>>> +            offset = xdp.data - page_address(xdp_page) -
>>> +                 vi->hdr_len - metasize;
>>>                 /* recalculate len if xdp.data or xdp.data_end were
>>>                * adjusted
>>> @@ -863,14 +878,15 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>               if (unlikely(xdp_page != page)) {
>>>                   rcu_read_unlock();
>>>                   put_page(page);
>>> -                head_skb = page_to_skb(vi, rq, xdp_page,
>>> -                               offset, len,
>>> -                               PAGE_SIZE, false);
>>> +                head_skb = page_to_skb(vi, rq, xdp_page, offset,
>>> +                               len, PAGE_SIZE, false,
>>> +                               metasize);
>>>                   return head_skb;
>>>               }
>>>               break;
>>>           case XDP_TX:
>>>               stats->xdp_tx++;
>>> +            xdp.data_meta = xdp.data;
>>>               xdpf = convert_to_xdp_frame(&xdp);
>>>               if (unlikely(!xdpf))
>>>                   goto err_xdp;
>>> @@ -921,7 +937,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>           goto err_skb;
>>>       }
>>>   -    head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog);
>>> +    head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
>>> +                   metasize);
>>>       curr_skb = head_skb;
>>>         if (unlikely(!curr_skb))


^ permalink raw reply

* Re: [PATCH v3 0/3] vsock/virtio: several fixes in the .probe() and .remove()
From: David Miller @ 2019-07-08 22:35 UTC (permalink / raw)
  To: sgarzare; +Cc: netdev, kvm, mst, linux-kernel, jasowang, virtualization,
	stefanha
In-Reply-To: <20190705110454.95302-1-sgarzare@redhat.com>

From: Stefano Garzarella <sgarzare@redhat.com>
Date: Fri,  5 Jul 2019 13:04:51 +0200

> During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> before registering the driver", Stefan pointed out some possible issues
> in the .probe() and .remove() callbacks of the virtio-vsock driver.
 ...

Series applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH v3 0/2] Document the configuration of b53
From: David Miller @ 2019-07-08 22:30 UTC (permalink / raw)
  To: b.spranger; +Cc: f.fainelli, netdev, bigeasy, kurt, andrew, vivien.didelot
In-Reply-To: <20190705095719.24095-1-b.spranger@linutronix.de>

From: Benedikt Spranger <b.spranger@linutronix.de>
Date: Fri,  5 Jul 2019 11:57:17 +0200

> this is the third round to document the configuration of a b53 supported
> switch.

Series applied.

There was some trailing whitespace which I took care of for you this
time.

Thanks.

^ permalink raw reply

* Re: [PATCH 0/2] forcedeth: recv cache support
From: David Miller @ 2019-07-08 22:23 UTC (permalink / raw)
  To: yanjun.zhu; +Cc: netdev
In-Reply-To: <1562307568-21549-1-git-send-email-yanjun.zhu@oracle.com>

From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Fri,  5 Jul 2019 02:19:26 -0400

> This recv cache is to make NIC work steadily when the system memory is
> not enough.

The system is supposed to hold onto enough atomic memory to absorb all
reasonable situations like this.

If anything a solution to this problem belongs generically somewhere,
not in a driver.  And furthermore looping over an allocation attempt
with a delay is strongly discouraged.

^ permalink raw reply

* Re: [PATCH net-next v2 0/4] bnxt_en: Add XDP_REDIRECT support.
From: David Miller @ 2019-07-08 22:20 UTC (permalink / raw)
  To: michael.chan; +Cc: gospo, netdev, hawk, ast, ilias.apalodimas
In-Reply-To: <1562622784-29918-1-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Mon,  8 Jul 2019 17:53:00 -0400

> This patch series adds XDP_REDIRECT support by Andy Gospodarek.

Series applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH v9 net-next 0/5] net: ethernet: ti: cpsw: Add XDP support
From: David Miller @ 2019-07-08 22:12 UTC (permalink / raw)
  To: ivan.khoronzhuk
  Cc: grygorii.strashko, hawk, ast, linux-kernel, linux-omap,
	xdp-newbies, ilias.apalodimas, netdev, daniel, jakub.kicinski,
	john.fastabend
In-Reply-To: <20190708213432.8525-1-ivan.khoronzhuk@linaro.org>

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Tue,  9 Jul 2019 00:34:27 +0300

> This patchset adds XDP support for the TI cpsw driver and bases it on
> the page_pool allocator. It was verified on af_xdp socket drop,
> af_xdp l2f, ebpf XDP_DROP, XDP_REDIRECT, XDP_PASS, XDP_TX.
 ...

Series applied, thanks Ivan!

^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Andrew Lunn @ 2019-07-08 22:04 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev
In-Reply-To: <20190708192532.27420-14-snelson@pensando.io>

> +static int ionic_get_link_ksettings(struct net_device *netdev,
> +				    struct ethtool_link_ksettings *ks)
> +{
> +	struct lif *lif = netdev_priv(netdev);
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	int copper_seen = 0;
> +
> +	ethtool_link_ksettings_zero_link_mode(ks, supported);
> +	ethtool_link_ksettings_zero_link_mode(ks, advertising);
> +
> +	switch (le16_to_cpu(idev->port_info->status.xcvr.pid)) {
> +		/* Copper */
> +	case XCVR_PID_QSFP_100G_CR4:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     100000baseCR4_Full);
> +		copper_seen++;
> +		break;
> +	case XCVR_PID_QSFP_40GBASE_CR4:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     40000baseCR4_Full);
> +		copper_seen++;
> +		break;
> +	case XCVR_PID_SFP_25GBASE_CR_S:
> +	case XCVR_PID_SFP_25GBASE_CR_L:
> +	case XCVR_PID_SFP_25GBASE_CR_N:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     25000baseCR_Full);
> +		copper_seen++;
> +		break;
> +	case XCVR_PID_SFP_10GBASE_AOC:
> +	case XCVR_PID_SFP_10GBASE_CU:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     10000baseCR_Full);
> +		copper_seen++;
> +		break;
> +
> +		/* Fibre */
> +	case XCVR_PID_QSFP_100G_SR4:
> +	case XCVR_PID_QSFP_100G_AOC:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     100000baseSR4_Full);
> +		break;
> +	case XCVR_PID_QSFP_100G_LR4:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     100000baseLR4_ER4_Full);
> +		break;
> +	case XCVR_PID_QSFP_100G_ER4:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     100000baseLR4_ER4_Full);
> +		break;
> +	case XCVR_PID_QSFP_40GBASE_SR4:
> +	case XCVR_PID_QSFP_40GBASE_AOC:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     40000baseSR4_Full);
> +		break;
> +	case XCVR_PID_QSFP_40GBASE_LR4:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     40000baseLR4_Full);
> +		break;
> +	case XCVR_PID_SFP_25GBASE_SR:
> +	case XCVR_PID_SFP_25GBASE_AOC:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     25000baseSR_Full);
> +		break;
> +	case XCVR_PID_SFP_10GBASE_SR:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     10000baseSR_Full);
> +		break;
> +	case XCVR_PID_SFP_10GBASE_LR:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     10000baseLR_Full);
> +		break;
> +	case XCVR_PID_SFP_10GBASE_LRM:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     10000baseLRM_Full);
> +		break;
> +	case XCVR_PID_SFP_10GBASE_ER:
> +		ethtool_link_ksettings_add_link_mode(ks, supported,
> +						     10000baseER_Full);
> +		break;

I don't know these link modes too well. But only setting a single bit
seems odd. What I do know is that an SFP which supports 2500BaseX
should also be able to support 1000BaseX. So should a 100G SFP also
support 40G, 25G, 10G etc? The SERDES just runs a slower bitstream
over the basic bitpipe?

> +	case XCVR_PID_QSFP_100G_ACC:
> +	case XCVR_PID_QSFP_40GBASE_ER4:
> +	case XCVR_PID_SFP_25GBASE_LR:
> +	case XCVR_PID_SFP_25GBASE_ER:
> +		dev_info(lif->ionic->dev, "no decode bits for xcvr type pid=%d / 0x%x\n",
> +			 idev->port_info->status.xcvr.pid,
> +			 idev->port_info->status.xcvr.pid);
> +		break;

Why not add them?


> +	memcpy(ks->link_modes.advertising, ks->link_modes.supported,
> +	       sizeof(ks->link_modes.advertising));

bitmap_copy() would be a better way to do this. You could consider
adding a helper to ethtool.h.
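
Something along these lines, as a userspace sketch only. LINK_MODE_NBITS,
struct link_modes, bitmap_copy_u() and link_modes_advertise_supported() are
all hypothetical names; bitmap_copy_u() is modelled loosely on the kernel's
bitmap_copy(dst, src, nbits), and none of this is an existing ethtool.h
interface.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define LINK_MODE_NBITS		67	/* hypothetical number of link modes */
#define BITS_PER_LONG_U		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_U(n)	(((n) + BITS_PER_LONG_U - 1) / BITS_PER_LONG_U)

/* Hypothetical stand-in for the link_modes member of the ksettings. */
struct link_modes {
	unsigned long supported[BITS_TO_LONGS_U(LINK_MODE_NBITS)];
	unsigned long advertising[BITS_TO_LONGS_U(LINK_MODE_NBITS)];
};

/* Userspace model of a bit-count based copy: the size is derived from
 * nbits rather than from sizeof() of one particular array. */
static void bitmap_copy_u(unsigned long *dst, const unsigned long *src,
			  unsigned int nbits)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, BITS_TO_LONGS_U(nbits) * sizeof(unsigned long));
}

/* The kind of helper that could wrap the copy for drivers. */
static void link_modes_advertise_supported(struct link_modes *lm)
{
	bitmap_copy_u(lm->advertising, lm->supported, LINK_MODE_NBITS);
}
```

The point is that the caller names the number of bits once and does not
need to know how the mask is stored.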

       Andrew

^ permalink raw reply

* [PATCH net-next v2 4/4] bnxt_en: add page_pool support
From: Michael Chan @ 2019-07-08 21:53 UTC (permalink / raw)
  To: davem, gospo; +Cc: netdev, hawk, ast, ilias.apalodimas
In-Reply-To: <1562622784-29918-1-git-send-email-michael.chan@broadcom.com>

From: Andy Gospodarek <gospo@broadcom.com>

This removes contention over page allocation for XDP_REDIRECT actions by
adding page_pool support per queue for the driver.  The performance for
XDP_REDIRECT actions scales linearly with the number of cores performing
redirect actions when using the page pools instead of the standard page
allocator.

v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/Kconfig         |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 47 +++++++++++++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  3 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  3 +-
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 2e4a8c7..e9017ca 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -199,6 +199,7 @@ config BNXT
 	select FW_LOADER
 	select LIBCRC32C
 	select NET_DEVLINK
+	select PAGE_POOL
 	---help---
 	  This driver supports Broadcom NetXtreme-C/E 10/25/40/50 gigabit
 	  Ethernet cards.  To compile this driver as a module, choose M here:
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d8f0846..d25bb38 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -54,6 +54,7 @@
 #include <net/pkt_cls.h>
 #include <linux/hwmon.h>
 #include <linux/hwmon-sysfs.h>
+#include <net/page_pool.h>
 
 #include "bnxt_hsi.h"
 #include "bnxt.h"
@@ -668,19 +669,20 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
+					 struct bnxt_rx_ring_info *rxr,
 					 gfp_t gfp)
 {
 	struct device *dev = &bp->pdev->dev;
 	struct page *page;
 
-	page = alloc_page(gfp);
+	page = page_pool_dev_alloc_pages(rxr->page_pool);
 	if (!page)
 		return NULL;
 
 	*mapping = dma_map_page_attrs(dev, page, 0, PAGE_SIZE, bp->rx_dir,
 				      DMA_ATTR_WEAK_ORDERING);
 	if (dma_mapping_error(dev, *mapping)) {
-		__free_page(page);
+		page_pool_recycle_direct(rxr->page_pool, page);
 		return NULL;
 	}
 	*mapping += bp->rx_dma_offset;
@@ -716,7 +718,8 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 	dma_addr_t mapping;
 
 	if (BNXT_RX_PAGE_MODE(bp)) {
-		struct page *page = __bnxt_alloc_rx_page(bp, &mapping, gfp);
+		struct page *page =
+			__bnxt_alloc_rx_page(bp, &mapping, rxr, gfp);
 
 		if (!page)
 			return -ENOMEM;
@@ -2360,7 +2363,7 @@ static void bnxt_free_rx_skbs(struct bnxt *bp)
 				dma_unmap_page_attrs(&pdev->dev, mapping,
 						     PAGE_SIZE, bp->rx_dir,
 						     DMA_ATTR_WEAK_ORDERING);
-				__free_page(data);
+				page_pool_recycle_direct(rxr->page_pool, data);
 			} else {
 				dma_unmap_single_attrs(&pdev->dev, mapping,
 						       bp->rx_buf_use_size,
@@ -2497,6 +2500,8 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 		if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
 			xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
+		rxr->page_pool = NULL;
+
 		kfree(rxr->rx_tpa);
 		rxr->rx_tpa = NULL;
 
@@ -2511,6 +2516,26 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 	}
 }
 
+static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
+				   struct bnxt_rx_ring_info *rxr)
+{
+	struct page_pool_params pp = { 0 };
+
+	pp.pool_size = bp->rx_ring_size;
+	pp.nid = dev_to_node(&bp->pdev->dev);
+	pp.dev = &bp->pdev->dev;
+	pp.dma_dir = DMA_BIDIRECTIONAL;
+
+	rxr->page_pool = page_pool_create(&pp);
+	if (IS_ERR(rxr->page_pool)) {
+		int err = PTR_ERR(rxr->page_pool);
+
+		rxr->page_pool = NULL;
+		return err;
+	}
+	return 0;
+}
+
 static int bnxt_alloc_rx_rings(struct bnxt *bp)
 {
 	int i, rc, agg_rings = 0, tpa_rings = 0;
@@ -2530,14 +2555,24 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 
 		ring = &rxr->rx_ring_struct;
 
+		rc = bnxt_alloc_rx_page_pool(bp, rxr);
+		if (rc)
+			return rc;
+
 		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
-		if (rc < 0)
+		if (rc < 0) {
+			page_pool_free(rxr->page_pool);
+			rxr->page_pool = NULL;
 			return rc;
+		}
 
 		rc = xdp_rxq_info_reg_mem_model(&rxr->xdp_rxq,
-						MEM_TYPE_PAGE_SHARED, NULL);
+						MEM_TYPE_PAGE_POOL,
+						rxr->page_pool);
 		if (rc) {
 			xdp_rxq_info_unreg(&rxr->xdp_rxq);
+			page_pool_free(rxr->page_pool);
+			rxr->page_pool = NULL;
 			return rc;
 		}
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 8ac51fa..16694b7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -26,6 +26,8 @@
 #include <net/xdp.h>
 #include <linux/dim.h>
 
+struct page_pool;
+
 struct tx_bd {
 	__le32 tx_bd_len_flags_type;
 	#define TX_BD_TYPE					(0x3f << 0)
@@ -799,6 +801,7 @@ struct bnxt_rx_ring_info {
 	struct bnxt_ring_struct	rx_ring_struct;
 	struct bnxt_ring_struct	rx_agg_ring_struct;
 	struct xdp_rxq_info	xdp_rxq;
+	struct page_pool	*page_pool;
 };
 
 struct bnxt_cp_ring_info {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 12489d2..c6f6f20 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -15,6 +15,7 @@
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <linux/filter.h>
+#include <net/page_pool.h>
 #include "bnxt_hsi.h"
 #include "bnxt.h"
 #include "bnxt_xdp.h"
@@ -191,7 +192,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 
 		if (xdp_do_redirect(bp->dev, &xdp, xdp_prog)) {
 			trace_xdp_exception(bp->dev, xdp_prog, act);
-			__free_page(page);
+			page_pool_recycle_direct(rxr->page_pool, page);
 			return true;
 		}
 
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 3/4] bnxt_en: optimized XDP_REDIRECT support
From: Michael Chan @ 2019-07-08 21:53 UTC (permalink / raw)
  To: davem, gospo; +Cc: netdev, hawk, ast, ilias.apalodimas
In-Reply-To: <1562622784-29918-1-git-send-email-michael.chan@broadcom.com>

From: Andy Gospodarek <gospo@broadcom.com>

This adds basic support for XDP_REDIRECT in the bnxt_en driver.  Next
patch adds the more optimized page pool support.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  27 ++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  13 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 108 ++++++++++++++++++++++++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +
 4 files changed, 140 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b7b6227..d8f0846 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1989,6 +1989,9 @@ static int __bnxt_poll_work(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		}
 	}
 
+	if (event & BNXT_REDIRECT_EVENT)
+		xdp_do_flush_map();
+
 	if (event & BNXT_TX_EVENT) {
 		struct bnxt_tx_ring_info *txr = bnapi->tx_ring;
 		u16 prod = txr->tx_prod;
@@ -2254,9 +2257,23 @@ static void bnxt_free_tx_skbs(struct bnxt *bp)
 
 		for (j = 0; j < max_idx;) {
 			struct bnxt_sw_tx_bd *tx_buf = &txr->tx_buf_ring[j];
-			struct sk_buff *skb = tx_buf->skb;
+			struct sk_buff *skb;
 			int k, last;
 
+			if (i < bp->tx_nr_rings_xdp &&
+			    tx_buf->action == XDP_REDIRECT) {
+				dma_unmap_single(&pdev->dev,
+					dma_unmap_addr(tx_buf, mapping),
+					dma_unmap_len(tx_buf, len),
+					PCI_DMA_TODEVICE);
+				xdp_return_frame(tx_buf->xdpf);
+				tx_buf->action = 0;
+				tx_buf->xdpf = NULL;
+				j++;
+				continue;
+			}
+
+			skb = tx_buf->skb;
 			if (!skb) {
 				j++;
 				continue;
@@ -2517,6 +2534,13 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 		if (rc < 0)
 			return rc;
 
+		rc = xdp_rxq_info_reg_mem_model(&rxr->xdp_rxq,
+						MEM_TYPE_PAGE_SHARED, NULL);
+		if (rc) {
+			xdp_rxq_info_unreg(&rxr->xdp_rxq);
+			return rc;
+		}
+
 		rc = bnxt_alloc_ring(bp, &ring->ring_mem);
 		if (rc)
 			return rc;
@@ -10233,6 +10257,7 @@ static const struct net_device_ops bnxt_netdev_ops = {
 	.ndo_udp_tunnel_add	= bnxt_udp_tunnel_add,
 	.ndo_udp_tunnel_del	= bnxt_udp_tunnel_del,
 	.ndo_bpf		= bnxt_xdp,
+	.ndo_xdp_xmit		= bnxt_xdp_xmit,
 	.ndo_bridge_getlink	= bnxt_bridge_getlink,
 	.ndo_bridge_setlink	= bnxt_bridge_setlink,
 	.ndo_get_devlink_port	= bnxt_get_devlink_port,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index bf12cfc..8ac51fa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -587,13 +587,18 @@ struct nqe_cn {
 #define BNXT_HWRM_CHNL_CHIMP	0
 #define BNXT_HWRM_CHNL_KONG	1
 
-#define BNXT_RX_EVENT	1
-#define BNXT_AGG_EVENT	2
-#define BNXT_TX_EVENT	4
+#define BNXT_RX_EVENT		1
+#define BNXT_AGG_EVENT		2
+#define BNXT_TX_EVENT		4
+#define BNXT_REDIRECT_EVENT	8
 
 struct bnxt_sw_tx_bd {
-	struct sk_buff		*skb;
+	union {
+		struct sk_buff		*skb;
+		struct xdp_frame	*xdpf;
+	};
 	DEFINE_DMA_UNMAP_ADDR(mapping);
+	DEFINE_DMA_UNMAP_LEN(len);
 	u8			is_gso;
 	u8			is_push;
 	u8			action;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 41e232e..12489d2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -53,6 +53,20 @@ static void __bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
 	tx_buf->action = XDP_TX;
 }
 
+static void __bnxt_xmit_xdp_redirect(struct bnxt *bp,
+				     struct bnxt_tx_ring_info *txr,
+				     dma_addr_t mapping, u32 len,
+				     struct xdp_frame *xdpf)
+{
+	struct bnxt_sw_tx_bd *tx_buf;
+
+	tx_buf = bnxt_xmit_bd(bp, txr, mapping, len);
+	tx_buf->action = XDP_REDIRECT;
+	tx_buf->xdpf = xdpf;
+	dma_unmap_addr_set(tx_buf, mapping, mapping);
+	dma_unmap_len_set(tx_buf, len, 0);
+}
+
 void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 {
 	struct bnxt_tx_ring_info *txr = bnapi->tx_ring;
@@ -66,7 +80,17 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 	for (i = 0; i < nr_pkts; i++) {
 		tx_buf = &txr->tx_buf_ring[tx_cons];
 
-		if (tx_buf->action == XDP_TX) {
+		if (tx_buf->action == XDP_REDIRECT) {
+			struct pci_dev *pdev = bp->pdev;
+
+			dma_unmap_single(&pdev->dev,
+					 dma_unmap_addr(tx_buf, mapping),
+					 dma_unmap_len(tx_buf, len),
+					 PCI_DMA_TODEVICE);
+			xdp_return_frame(tx_buf->xdpf);
+			tx_buf->action = 0;
+			tx_buf->xdpf = NULL;
+		} else if (tx_buf->action == XDP_TX) {
 			rx_doorbell_needed = true;
 			last_tx_cons = tx_cons;
 		}
@@ -101,19 +125,19 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		return false;
 
 	pdev = bp->pdev;
-	txr = rxr->bnapi->tx_ring;
 	rx_buf = &rxr->rx_buf_ring[cons];
 	offset = bp->rx_offset;
 
+	mapping = rx_buf->mapping - bp->rx_dma_offset;
+	dma_sync_single_for_cpu(&pdev->dev, mapping + offset, *len, bp->rx_dir);
+
+	txr = rxr->bnapi->tx_ring;
 	xdp.data_hard_start = *data_ptr - offset;
 	xdp.data = *data_ptr;
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = *data_ptr + *len;
 	xdp.rxq = &rxr->xdp_rxq;
 	orig_data = xdp.data;
-	mapping = rx_buf->mapping - bp->rx_dma_offset;
-
-	dma_sync_single_for_cpu(&pdev->dev, mapping + offset, *len, bp->rx_dir);
 
 	rcu_read_lock();
 	act = bpf_prog_run_xdp(xdp_prog, &xdp);
@@ -149,6 +173,30 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 				NEXT_RX(rxr->rx_prod));
 		bnxt_reuse_rx_data(rxr, cons, page);
 		return true;
+	case XDP_REDIRECT:
+		/* if we are calling this here then we know that the
+		 * redirect is coming from a frame received by the
+		 * bnxt_en driver.
+		 */
+		dma_unmap_page_attrs(&pdev->dev, mapping,
+				     PAGE_SIZE, bp->rx_dir,
+				     DMA_ATTR_WEAK_ORDERING);
+
+		/* if we are unable to allocate a new buffer, abort and reuse */
+		if (bnxt_alloc_rx_data(bp, rxr, rxr->rx_prod, GFP_ATOMIC)) {
+			trace_xdp_exception(bp->dev, xdp_prog, act);
+			bnxt_reuse_rx_data(rxr, cons, page);
+			return true;
+		}
+
+		if (xdp_do_redirect(bp->dev, &xdp, xdp_prog)) {
+			trace_xdp_exception(bp->dev, xdp_prog, act);
+			__free_page(page);
+			return true;
+		}
+
+		*event |= BNXT_REDIRECT_EVENT;
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
 		/* Fall thru */
@@ -162,6 +210,56 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 	return true;
 }
 
+int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
+		  struct xdp_frame **frames, u32 flags)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	struct bpf_prog *xdp_prog = READ_ONCE(bp->xdp_prog);
+	struct pci_dev *pdev = bp->pdev;
+	struct bnxt_tx_ring_info *txr;
+	dma_addr_t mapping;
+	int drops = 0;
+	int ring;
+	int i;
+
+	if (!test_bit(BNXT_STATE_OPEN, &bp->state) ||
+	    !bp->tx_nr_rings_xdp ||
+	    !xdp_prog)
+		return -EINVAL;
+
+	ring = smp_processor_id() % bp->tx_nr_rings_xdp;
+	txr = &bp->tx_ring[ring];
+
+	for (i = 0; i < num_frames; i++) {
+		struct xdp_frame *xdp = frames[i];
+
+		if (!txr || !bnxt_tx_avail(bp, txr) ||
+		    !(bp->bnapi[ring]->flags & BNXT_NAPI_FLAG_XDP)) {
+			xdp_return_frame_rx_napi(xdp);
+			drops++;
+			continue;
+		}
+
+		mapping = dma_map_single(&pdev->dev, xdp->data, xdp->len,
+					 DMA_TO_DEVICE);
+
+		if (dma_mapping_error(&pdev->dev, mapping)) {
+			xdp_return_frame_rx_napi(xdp);
+			drops++;
+			continue;
+		}
+		__bnxt_xmit_xdp_redirect(bp, txr, mapping, xdp->len, xdp);
+	}
+
+	if (flags & XDP_XMIT_FLUSH) {
+		/* Sync BD data before updating doorbell */
+		wmb();
+		bnxt_db_write(bp, &txr->tx_db, txr->tx_prod);
+	}
+
+	return num_frames - drops;
+}
+
 /* Under rtnl_lock */
 static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
index 20e470c..0df40c3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
@@ -18,5 +18,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		 struct page *page, u8 **data_ptr, unsigned int *len,
 		 u8 *event);
 int bnxt_xdp(struct net_device *dev, struct netdev_bpf *xdp);
+int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
+		  struct xdp_frame **frames, u32 flags);
 
 #endif
-- 
2.5.1



* [PATCH net-next v2 1/4] bnxt_en: rename some xdp functions
From: Michael Chan @ 2019-07-08 21:53 UTC (permalink / raw)
  To: davem, gospo; +Cc: netdev, hawk, ast, ilias.apalodimas
In-Reply-To: <1562622784-29918-1-git-send-email-michael.chan@broadcom.com>

From: Andy Gospodarek <gospo@broadcom.com>

Rename bnxt_xmit_xdp() to __bnxt_xmit_xdp() to prepare for XDP_REDIRECT
support and to reduce confusion and namespace collisions.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c     | 8 ++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h     | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index a6c7baf..21a0431 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2799,7 +2799,7 @@ static int bnxt_run_loopback(struct bnxt *bp)
 		dev_kfree_skb(skb);
 		return -EIO;
 	}
-	bnxt_xmit_xdp(bp, txr, map, pkt_size, 0);
+	__bnxt_xmit_xdp(bp, txr, map, pkt_size, 0);
 
 	/* Sync BD data before updating doorbell */
 	wmb();
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 0184ef6..4bc9595 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -19,8 +19,8 @@
 #include "bnxt.h"
 #include "bnxt_xdp.h"
 
-void bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
-		   dma_addr_t mapping, u32 len, u16 rx_prod)
+void __bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
+		     dma_addr_t mapping, u32 len, u16 rx_prod)
 {
 	struct bnxt_sw_tx_bd *tx_buf;
 	struct tx_bd *txbd;
@@ -132,8 +132,8 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		*event = BNXT_TX_EVENT;
 		dma_sync_single_for_device(&pdev->dev, mapping + offset, *len,
 					   bp->rx_dir);
-		bnxt_xmit_xdp(bp, txr, mapping + offset, *len,
-			      NEXT_RX(rxr->rx_prod));
+		__bnxt_xmit_xdp(bp, txr, mapping + offset, *len,
+				NEXT_RX(rxr->rx_prod));
 		bnxt_reuse_rx_data(rxr, cons, page);
 		return true;
 	default:
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
index 414b748..b36087b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
@@ -10,8 +10,8 @@
 #ifndef BNXT_XDP_H
 #define BNXT_XDP_H
 
-void bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
-		   dma_addr_t mapping, u32 len, u16 rx_prod);
+void __bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
+		     dma_addr_t mapping, u32 len, u16 rx_prod);
 void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts);
 bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		 struct page *page, u8 **data_ptr, unsigned int *len,
-- 
2.5.1



