* Re: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
From: Neftin, Sasha @ 2017-04-24 8:17 UTC (permalink / raw)
To: netdev@vger.kernel.org; +Cc: intel-wired-lan, Ruinskiy, Dima
In-Reply-To: <630A6B92B7EDEB45A87E20D3D286660171182EED@hasmsx109.ger.corp.intel.com>
On 4/23/2017 15:53, Neftin, Sasha wrote:
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On Behalf Of Benjamin Poirier
> Sent: Saturday, April 22, 2017 00:20
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Stefan Priebe <s.priebe@profihost.ag>
> Subject: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
>
> Some statistics passed to ethtool are garbage because e1000e_get_stats64() doesn't write them, for example: tx_heartbeat_errors. This leaks kernel memory to userspace and confuses users.
>
> Do like ixgbe and use dev_get_stats() which first zeroes out rtnl_link_stats64.
>
> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
> drivers/net/ethernet/intel/e1000e/ethtool.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
> index 7aff68a4a4df..f117b90cdc2f 100644
> --- a/drivers/net/ethernet/intel/e1000e/ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
> @@ -2063,7 +2063,7 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
>
> pm_runtime_get_sync(netdev->dev.parent);
>
> - e1000e_get_stats64(netdev, &net_stats);
> + dev_get_stats(netdev, &net_stats);
>
> pm_runtime_put_sync(netdev->dev.parent);
>
> --
> 2.12.2
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Hello,
We would like to not accept this patch. Suggested generic method
'*dev_get_stats' (net/core/dev.c) calls 'ops->ndo_get_stats64' method
which eventually calls e1000e_get_stats64 (netdev.c) - so there is same
functionality. Also, see that 'e1000e_get_stats64' method in netdev.c
(line 5928) calls 'memset' with 0's before update statistics. Local
sanity check in our lab shows 'tx_heartbeat_errors' counter reported as 0.
Thanks,
Sasha
^ permalink raw reply
* [PATCH net-next] net: dsa: LAN9303: add i2c dependency
From: Tobias Regnery @ 2017-04-24 8:07 UTC (permalink / raw)
To: andrew, vivien.didelot, f.fainelli, netdev, linux-kernel, jbe
Cc: Tobias Regnery
The Kconfig symbol for the I2C mode of the LAN9303 driver selects
REGMAP_I2C. This symbol depends on I2C but the driver doesen't has a
dependency on I2C.
With CONFIG_I2C=n kconfig fails to select REGMAP_I2C and we get the
following warning:
warning: (NET_DSA_SMSC_LAN9303_I2C) selects REGMAP_I2C which has unmet direct dependencies (I2C)
This results in a build failure because the regmap-i2c functions aren't
available:
drivers/base/regmap/regmap-i2c.c: In function 'regmap_smbus_byte_reg_read':
drivers/base/regmap/regmap-i2c.c:29:8: error: implicit declaration of function 'i2c_smbus_read_byte_data' [-Werror=implicit-function-declaration]
drivers/base/regmap/regmap-i2c.c: In function 'regmap_smbus_byte_reg_write':
drivers/base/regmap/regmap-i2c.c:47:9: error: implicit declaration of function 'i2c_smbus_write_byte_data' [-Werror=implicit-function-declaration]
...
Fix this by adding a kconfig dependency on the I2C subsystem.
Fixes: be4e119f9914 ("net: dsa: LAN9303: add I2C managed mode support")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
---
drivers/net/dsa/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 131a5b1cbfc8..862ee22303c2 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -59,7 +59,7 @@ config NET_DSA_SMSC_LAN9303
config NET_DSA_SMSC_LAN9303_I2C
tristate "SMSC/Microchip LAN9303 3-ports 10/100 ethernet switch in I2C managed mode"
- depends on NET_DSA
+ depends on NET_DSA && I2C
select NET_DSA_SMSC_LAN9303
select REGMAP_I2C
---help---
--
2.11.0
^ permalink raw reply related
* Re: [PATCH net] xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
From: Herbert Xu @ 2017-04-24 7:59 UTC (permalink / raw)
To: Sabrina Dubroca; +Cc: netdev, Steffen Klassert
In-Reply-To: <20170421120531.GA12502@bistromath.localdomain>
On Fri, Apr 21, 2017 at 02:05:31PM +0200, Sabrina Dubroca wrote:
>
> You're right. I had a note about that but it got lost. The bug
> (ignoring what flavor of flowi is passed) is older than that (from the
> introduction of subpolicies, I suspect, but I would have to dig more
> into the history), but this commit removed the last uses of
> origin/partner.
>
> Looking into raw_sendmsg(), this code may have been safe before
> 9d6ec938019c ("ipv4: Use flowi4 in public route lookup interfaces."),
> since full flowi were used.
>
> If we want a fix for kernels that don't have ca116922afa8, we would
> probably need to take the size of the flowi* struct depending on the
> address family passed to xfrm_resolve_and_create_bundle().
>
>
> I'm not sure how to proceed here, any advice?
I think the Fixes header should simply refer to the commit that
introduced the bug. You're instead using it as a hint for how
to backport the patch. That is beyond the scope of the Fixes
header.
So if the bug started with 9d6ec938019c then just use that in
your Fixes header. If you want to help the backporter with more
information then you can put ca116922afa8 in the body of the commit
description.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: macvlan: Fix device ref leak when purging bc_queue
From: Herbert Xu @ 2017-04-24 7:56 UTC (permalink / raw)
To: Joe.Ghalam; +Cc: davem, Clifford.Wichmann, netdev
In-Reply-To: <1492785652578.53801@Dell.com>
On Fri, Apr 21, 2017 at 02:40:50PM +0000, Joe.Ghalam@dell.com wrote:
> That's not true. macvlan_dellink() unregisters the queue, and macvlan_process_broadcast() will never get called. Please note that I'm not speculating. I have traced enabled on the dev_put and dev_hold, and I'm reporting a real, reproducible issue.
The only thing that can stop macvlan_process_broadcast from getting
called is macvlan_port_destroy. Nothing else can stop the work
queue, unless of course the work queue mechanism itself is broken.
So if you're sure macvlan_port_destroy is never even called in
your case, then you'll need to start debugging the kernel work
queue mechanism to see why macvlan_process_broadcast is not getting
called.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH iproute2 net 3/8] tc/pedit: Introduce 'add' operation
From: Amir Vadai @ 2017-04-24 7:52 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Stephen Hemminger, netdev, Or Gerlitz
In-Reply-To: <f358025b-2618-3f22-b593-2e4e70fb7a79@mojatatu.com>
On Sun, Apr 23, 2017 at 01:44:51PM -0400, Jamal Hadi Salim wrote:
>
> Thanks for the excellent work.
>
> On 17-04-23 08:53 AM, Amir Vadai wrote:
> > This command could be useful to increase/decrease fields value.
> >
>
> Does this contradict the "retain" feature? Example rule to
> retain the second nibble but set the first to 0xA
> (essentially it X-ORs):
> tc filter add dev lo parent ffff: protocol ip prio 10 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1 \
> action pedit munge offset 0 u8 set 0x0A retain 0xF0
You mean, for example increasing a single nible in an ipv4 address?
If so, then it should work:
# tc filter add dev veth0 parent ffff: u32 \
match ip dport 23 0xffff \
action pedit ex \
munge ip src add 1.0.0.0 retain 0xff000000
# tc -s filter show dev veth0 parent ffff:
filter protocol all pref 49151 u32
filter protocol all pref 49151 u32 fh 801: ht divisor 1
filter protocol all pref 49151 u32 fh 801::800 order 2048 key ht 801 bkt 0 terminal flowid ??? (rule hit 0 success 0)
match 00000017/0000ffff at 20 (success 0 )
action order 1: pedit action pass keys 1
index 48 ref 1 bind 1 installed 1 sec used 1 sec
key #0 at ipv4+12: add 01000000 mask 00ffffff
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
>
> Mostly curious about hardware handling.
As to hardware handling, Or did the implementation so he will answer
better. AFAIK, current implementation doesn't allow partial field
setting/adding, so such a rule can't be offloaded. I think it is only to
make things simple and because there was no use case for that.
To my knowledge, hardware should support such thing if needed.
Amir
>
> cheers,
> jamal
>
^ permalink raw reply
* [PATCH net] xfrm: do the garbage collection after flushing policy
From: Xin Long @ 2017-04-24 7:33 UTC (permalink / raw)
To: network dev; +Cc: davem, Herbert Xu, fw
Now xfrm garbage collection can be triggered by 'ip xfrm policy del'.
These is no reason not to do it after flushing policies, especially
considering that 'garbage collection deferred' is only triggered
when it reaches gc_thresh.
It's no good that the policy is gone but the xdst still hold there.
The worse thing is that xdst->route/orig_dst is also hold and can
not be released even if the orig_dst is already expired.
This patch is to do the garbage collection if there is any policy
removed in xfrm_policy_flush.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/xfrm/xfrm_policy.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 236cbbc..dfc77b9 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1006,6 +1006,10 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
err = -ESRCH;
out:
spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+
+ if (cnt)
+ xfrm_garbage_collect(net);
+
return err;
}
EXPORT_SYMBOL(xfrm_policy_flush);
--
2.1.0
^ permalink raw reply related
* [PATCH net] bridge: shutdown bridge device before removing it
From: Xin Long @ 2017-04-24 7:25 UTC (permalink / raw)
To: network dev; +Cc: davem, nikolay
During removing a bridge device, if the bridge is still up, a new mdb entry
still can be added in br_multicast_add_group() after all mdb entries are
removed in br_multicast_dev_del(). Like the path:
mld_ifc_timer_expire ->
mld_sendpack -> ...
br_multicast_rcv ->
br_multicast_add_group
The new mp's timer will be set up. If the timer expires after the bridge
is freed, it may cause use-after-free panic in br_multicast_group_expired.
This can happen when ip link remove a bridge or destroy a netns with a
bridge device inside.
As we can see in br_del_bridge, brctl is also supposed to remove a bridge
device after it's shutdown.
This patch is to call dev_close at the beginning of br_dev_delete so that
netif_running check in br_multicast_add_group can avoid this issue. But
to keep consistent with before, it will not remove the IFF_UP check in
br_del_bridge for brctl.
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/bridge/br_if.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 56a2a72..8175f13 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -305,6 +305,8 @@ void br_dev_delete(struct net_device *dev, struct list_head *head)
struct net_bridge *br = netdev_priv(dev);
struct net_bridge_port *p, *n;
+ dev_close(br->dev);
+
list_for_each_entry_safe(p, n, &br->port_list, list) {
del_nbp(p);
}
--
2.1.0
^ permalink raw reply related
* Re: [PATCH] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Julian Anastasov @ 2017-04-24 7:21 UTC (permalink / raw)
To: Paolo Abeni
Cc: netfilter-devel, lvs-devel, Wensong Zhang, Simon Horman, netdev
In-Reply-To: <1493016651.5744.1.camel@redhat.com>
Hello,
On Mon, 24 Apr 2017, Paolo Abeni wrote:
> Hi,
>
> The problem with the patched code is that it tries to resolve ipv6
> addresses that are not created/validated by the kernel.
OK. Simon, please apply to ipvs tree.
Acked-by: Julian Anastasov <ja@ssi.bg>
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* Re: [PATCH 2/2] net: team: fix memory leak in team_nl_send_options_get
From: Jiri Pirko @ 2017-04-24 7:11 UTC (permalink / raw)
To: Pan Bian; +Cc: netdev, linux-kernel
In-Reply-To: <1493017495-3578-1-git-send-email-bianpan2016@163.com>
Mon, Apr 24, 2017 at 09:04:55AM CEST, bianpan2016@163.com wrote:
>In function team_nl_send_options_get(), pointer skb keeps the return
>value of function nlmsg_new(). When the call to genlmsg_put() fails, the
>control flow directly returns and does not free skb. This will result in
>a memory leak bug. This patch fixes it.
>
>Fixes: 8ea7fd0d8792 ("team: fix memory leak")
test1:~/net-next$ git log 8ea7fd0d8792
fatal: ambiguous argument '8ea7fd0d8792': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Please look up the commit that introduces the issue. Also no newline
in between "fixes" and "signed off".
Also. The subject should be:
[PATCH net 2/2] team: fix memory leak in team_nl_send_options_get
You can see this right away if you look in the mailing list archive...
>
>Signed-off-by: Pan Bian <bianpan2016@163.com>
>---
> drivers/net/team/team.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index dd3a2e9..85c0124 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -2361,8 +2361,10 @@ static int team_nl_send_options_get(struct team *team, u32 portid, u32 seq,
>
> hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
> TEAM_CMD_OPTIONS_GET);
>- if (!hdr)
>+ if (!hdr) {
>+ nlmsg_free(skb);
> return -EMSGSIZE;
>+ }
>
> if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
> goto nla_put_failure;
>--
>1.9.1
>
>
^ permalink raw reply
* Re: [PATCH net-next 2/2] cls_flower: add support for matching MPLS fields (v2)
From: Jiri Pirko @ 2017-04-24 7:07 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: netdev, Benjamin LaHaise, David S. Miller, Simon Horman,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Eric Dumazet,
Hadar Hen Zion, Gao Feng
In-Reply-To: <1492894367-11637-3-git-send-email-benjamin.lahaise@netronome.com>
Sat, Apr 22, 2017 at 10:52:47PM CEST, benjamin.lahaise@netronome.com wrote:
>Add support to the tc flower classifier to match based on fields in MPLS
>labels (TTL, Bottom of Stack, TC field, Label).
>
>Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
>Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
>Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
^ permalink raw reply
* Re: [PATCH net-next 1/2] flow_dissector: add mpls support (v2)
From: Jiri Pirko @ 2017-04-24 7:05 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: netdev, Benjamin LaHaise, David S. Miller, Simon Horman,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Hadar Hen Zion, Gao Feng
In-Reply-To: <1492894367-11637-2-git-send-email-benjamin.lahaise@netronome.com>
Sat, Apr 22, 2017 at 10:52:46PM CEST, benjamin.lahaise@netronome.com wrote:
>Add support for parsing MPLS flows to the flow dissector in preparation for
>adding MPLS match support to cls_flower.
>
>Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
>Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
This looks odd :)
>Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
^ permalink raw reply
* [PATCH 2/2] net: team: fix memory leak in team_nl_send_options_get
From: Pan Bian @ 2017-04-24 7:04 UTC (permalink / raw)
To: Jiri Pirko, netdev; +Cc: linux-kernel, Pan Bian
In function team_nl_send_options_get(), pointer skb keeps the return
value of function nlmsg_new(). When the call to genlmsg_put() fails, the
control flow directly returns and does not free skb. This will result in
a memory leak bug. This patch fixes it.
Fixes: 8ea7fd0d8792 ("team: fix memory leak")
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/team/team.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index dd3a2e9..85c0124 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2361,8 +2361,10 @@ static int team_nl_send_options_get(struct team *team, u32 portid, u32 seq,
hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
TEAM_CMD_OPTIONS_GET);
- if (!hdr)
+ if (!hdr) {
+ nlmsg_free(skb);
return -EMSGSIZE;
+ }
if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
goto nla_put_failure;
--
1.9.1
^ permalink raw reply related
* [PATCH 1/2] net: team: fix memory leak in team_nl_send_port_list_get
From: Pan Bian @ 2017-04-24 7:04 UTC (permalink / raw)
To: Jiri Pirko, netdev; +Cc: linux-kernel, Pan Bian
In function team_nl_send_port_list_get(), pointer skb keeps the return
value of nlmsg_new(). When the call to genlmsg_put() fails, the memory
is not freed. This will result in a memory leak bug. This patch fixes
it.
Fixes: fbd69cda90e7 ("team: fix memory leak")
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/team/team.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index f8c81f1..dd3a2e9 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2634,8 +2634,10 @@ static int team_nl_send_port_list_get(struct team *team, u32 portid, u32 seq,
hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
TEAM_CMD_PORT_LIST_GET);
- if (!hdr)
+ if (!hdr) {
+ nlmsg_free(skb);
return -EMSGSIZE;
+ }
if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
goto nla_put_failure;
--
1.9.1
^ permalink raw reply related
* Re: [PATCH] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Paolo Abeni @ 2017-04-24 6:50 UTC (permalink / raw)
To: Julian Anastasov
Cc: netfilter-devel, lvs-devel, Wensong Zhang, Simon Horman, netdev
In-Reply-To: <alpine.LFD.2.20.1704221355290.1974@ja.home.ssi.bg>
Hi,
On Sat, 2017-04-22 at 14:16 +0300, Julian Anastasov wrote:
> On Thu, 20 Apr 2017, Paolo Abeni wrote:
>
> > When creating a new ipvs service, ipv6 addresses are always accepted
> > if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
> > is not explicitly checked.
> >
> > This allows the user-space to configure ipvs services even if the
> > system is booted with ipv6.disable=1. On specific configuration, ipvs
> > can try to call ipv6 routing code at setup time, causing the kernel to
> > oops due to fib6_rules_ops being NULL.
> >
> > This change addresses the issue adding a check for the ipv6
> > module being enabled while validating ipv6 service operations and
> > adding the same validation for dest operations.
> >
> > According to git history, this issue is apparently present since
> > the introduction of ipv6 support, and the oops can be triggered
> > since commit 09571c7ae30865ad ("IPVS: Add function to determine
> > if IPv6 address is local")
> >
> > Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>
> Looks good to me but I see two places that can benefit
> from such check:
I'm sorry for the lag, I was delayed by other notorious issues ;-)
I'm unable to trigger any crash with the patched kernel.
> - in ip_vs_genl_new_daemon() if we do not want to create IPv6 sockets
> for the sync protocol in make_send_sock() and make_receive_sock().
> Not sure if this can lead to crashes.
This one is, AFAICS, safe, because ip_vs_genl_new_daemon() calls
start_sync_thread(), which tries to create a socket of the specified
address family before doing any real action. Such operation will fail
gracefully, and overall we will get an - expected - error to userspace.
> - in ip_vs_proc_sync_conn() if we do not want backup server to accept
> IPv6 conns because they may be created even when dests are missing.
> We may use retc = 10 there. Not fatal but may eat memory for
> conns that will not be used.
If I read the above correct correctly, even that should be safe, for
the same reason: ipv6 socket creation will fail if ipv6 is disabled.
The problem with the patched code is that it tries to resolve ipv6
addresses that are not created/validated by the kernel.
Cheers,
Paolo
^ permalink raw reply
* Re: [ISSUE: sky2 - rx error] Link stops working under heavy traffic load connected to a mv88e6176
From: Rafa Corvillo @ 2017-04-24 6:45 UTC (permalink / raw)
To: netdev
In-Reply-To: <58F9FD64.80506@aoifes.com>
I resend the mail with the schema fixed. Sorry for the inconvenience.
We are working in an ARMv7 embedded system running kernel 4.9 (LEDE build).
It is an imx6 board with 2 ethernet interfaces. One of them is connected to
a Marvell switch.
The schema of the system is the following:
+-------------------+ eth0
| +--+
| | |
| Embedded system +--+
| |
| ARMv7 |
| | Marvell 88E8057(sky2) +-------------+
| +--+ +--+ +--+ eth1
| | +---------------------+ | | +------+
| +--+ CPU port +--+ mv88e6176 +--+
+------+--+---------+ | |
emulated| | | |
GPIO +--+ +--+ +--+ eth2
MDIO +-----------------------------------+ | | +------+
MDIO +--+ +--+
+-------------+
There is a bridge (br-lan) which includes eth0/eth1/eth2
If I connect the eth1/eth2, the link is up and I can do ping through it.
But, once
I start sending a heavy traffic load the link fails and the kernel sends
the
following messages:
[ 48.557140] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 48.564964] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 48.572110] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 48.579263] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 48.586417] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 48.593573] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 48.600718] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 54.877567] net_ratelimit: 6 callbacks suppressed
[ 54.882293] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
[ 61.413552] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
length 1518
I have a modified device-tree of imx6 which includes the mdio and dsa
nodes. This
device-tree works in a kernel 4.1.6, but I know that these parts of the
kernel have
a lot of changes. The changes included for mdio and dsa in the
device-tree are the
following (diff arch/arm/boot/dts/imx6qdl-gw53xx.dtsi
arch/arm/boot/dts/imx6qdl-gw53xx-mdio.dtsi):
16a17,18
> can0 = &can1;
> ethernet0 = &fec;
21a24
> sky2 = ð1;
24a28,29
> usdhc2 = &usdhc3;
> mdio-gpio0 = &mdio0;
62a68,125
> mdio0: mdio {
> compatible = "virtual,mdio-gpio";
> #address-cells = <1>;
> #size-cells = <0>;
> /* MDC = gpio-17, MDIO = gpio-20 */
> gpios = <&gpio1 17 1
> &gpio1 20 0>;
> ethernet-phy@0 {
> compatible = "marvell,dsa";
> };
> };
>
> dsa {
> compatible = "marvell,dsa";
> #address-cells = <2>;
> #size-cells = <0>;
>
> interrupts = <10>;
> dsa,ethernet = <ð1>;
> dsa,mii-bus = <&mdio0>;
>
> switch@0 {
> #address-cells = <1>;
> #size-cells = <0>;
> reg = <0 0>; /* MDIO address 0, switch 0 in tree */
>
> port@0 {
> reg = <0>;
> label = "cpu";
> };
>
> port@1 {
> reg = <1>;
> label = "eth1";
> };
>
> port@2 {
> reg = <2>;
> label = "eth2";
> };
>
> port@3 {
> reg = <3>;
> label = "eth3";
> };
>
> port@4 {
> reg = <4>;
> label = "eth4";
> };
> };
> };
>
361a425,430
> &mdio0 {
> pinctrl-names = "default";
> pinctrl-0 = <&pinctrl_mdio>;
> status = "okay";
> };
>
363c432
< imx6qdl-gw53xx {
---
> imx6qdl-gw53xx-mdio {
448a518,524
> >;
> };
>
> pinctrl_mdio: mdiogrp {
> fsl,pins = <
> MX6QDL_PAD_SD1_DAT1__GPIO1_IO17 0x1b0b9
> MX6QDL_PAD_SD1_CLK__GPIO1_IO20 0x1b0b9
Do you know of any possible reason why this could be happening?
Thanks in advance.
Rafa
^ permalink raw reply
* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Jiri Slaby @ 2017-04-24 6:45 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: mingo, tglx, hpa, x86, jpoimboe, linux-kernel, David S. Miller,
netdev, Daniel Borkmann, Eric Dumazet
In-Reply-To: <20170421193213.GB91454@ast-mbp.thefacebook.com>
On 04/21/2017, 09:32 PM, Alexei Starovoitov wrote:
> On Fri, Apr 21, 2017 at 04:12:43PM +0200, Jiri Slaby wrote:
>> Do not use a custom macro FUNC for starts of the global functions, use
>> ENTRY instead.
>>
>> And while at it, annotate also ends of the functions by ENDPROC.
>>
>> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>> Cc: James Morris <jmorris@namei.org>
>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>> Cc: Patrick McHardy <kaber@trash.net>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: netdev@vger.kernel.org
>> ---
>> arch/x86/net/bpf_jit.S | 32 ++++++++++++++++++--------------
>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
>> index f2a7faf4706e..762c29fb8832 100644
>> --- a/arch/x86/net/bpf_jit.S
>> +++ b/arch/x86/net/bpf_jit.S
>> @@ -23,16 +23,12 @@
>> 32 /* space for rbx,r13,r14,r15 */ + \
>> 8 /* space for skb_copy_bits */)
>>
>> -#define FUNC(name) \
>> - .globl name; \
>> - .type name, @function; \
>> - name:
>> -
>> -FUNC(sk_load_word)
>> +ENTRY(sk_load_word)
>> test %esi,%esi
>> js bpf_slow_path_word_neg
>> +ENDPROC(sk_load_word)
>
> this doens't look right.
> It will add alignment nops in critical paths of these pseudo functions.
> I'm also not sure whether it will still work afterwards.
> Was it tested?
> I'd prefer if this code kept as-is.
It cannot stay as-is simply because we want to know where the functions
end to inject debuginfo properly. The code above does not warrant for
any exception.
Executing a nop takes a little and having externally-callable functions
aligned can actually help performance (no, I haven't measured nor tested
the code). But sure, the tool is generic, so I can introduce a local
macros to avoid alignments in the functions:
#define BPF_FUNC_START_LOCAL(name) \
SYM_START(name, SYM_V_LOCAL, SYM_A_NONE)
#define BPF_FUNC_START(name) \
SYM_START(name, SYM_V_GLOBAL, SYM_A_NONE)
#define BPF_FUNC_END(name) SYM_FUNC_END(name)
thanks,
--
js
suse labs
^ permalink raw reply
* Re: [PATCH 1/2] team: fix memory leak
From: Jiri Pirko @ 2017-04-24 5:40 UTC (permalink / raw)
To: Pan Bian; +Cc: netdev, linux-kernel
In-Reply-To: <1492932564-722-1-git-send-email-bianpan2016@163.com>
Sun, Apr 23, 2017 at 09:29:24AM CEST, bianpan2016@163.com wrote:
>In function team_nl_send_port_list_get(), pointer skb keeps the return
>value of nlmsg_new(). When the call to genlmsg_put() fails, the memory
>is not freed. This will result in a memory leak bug. This patch fixes
>it.
>
Looks good. Please adjust subject so the both patches have a specific
one. Also, please add "Fixes" tag (see git log for details).
Also, is is good to say which tree this patches are generated against
("-net")
>Signed-off-by: Pan Bian <bianpan2016@163.com>
>---
> drivers/net/team/team.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index f8c81f1..dd3a2e9 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -2634,8 +2634,10 @@ static int team_nl_send_port_list_get(struct team *team, u32 portid, u32 seq,
>
> hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
> TEAM_CMD_PORT_LIST_GET);
>- if (!hdr)
>+ if (!hdr) {
>+ nlmsg_free(skb);
> return -EMSGSIZE;
>+ }
>
> if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
> goto nla_put_failure;
>--
>1.9.1
>
>
^ permalink raw reply
* Regarding patch: vlan: avoid a synchronize_net when parent device goes down
From: prasad padiyar @ 2017-04-24 5:10 UTC (permalink / raw)
To: netdev
Hi,
I had a query regarding the patch in the subject line.
>From the patch pasted below
vlan = vlan_dev_priv(vlandev);
if (!(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
- dev_change_flags(vlandev, flgs & ~IFF_UP);
+ list_add(&vlandev->close_list, &close_list);
+ }
+
+ dev_close_many(&close_list, false);
+
+ list_for_each_entry_safe(vlandev, tmp, &close_list, close_list) {
netif_stacked_transfer_operstate(dev, vlandev);
+ list_del_init(&vlandev->close_list);
}
+ list_del(&close_list);
"
if (!(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
- dev_change_flags(vlandev, flgs & ~IFF_UP);
+ list_add(&vlandev->close_list, &close_list);
+ }
+
+ dev_close_many(&close_list, false);
+
+ list_for_each_entry_safe(vlandev, tmp, &close_list, close_list) {
netif_stacked_transfer_operstate(dev, vlandev);
+ list_del_init(&vlandev->close_list);
}
+ list_del(&close_list);.
i can see that dev_change_flags which was setting the vlan device to
down state in case VLAN_FLAG_LOOSE_BINDING was not enabled.
But setting the operstate when the VLAN real device goes admin down
was quite independent of VLAN_FLAG_LOOSE_BINDING.
Operstate was always getting transferred irrespective of LOOSE_BINDING flag.
But now with this patch, we see that control of transferring the
operstate is based of VLAN_FLAG_LOOSE_BINDING. If
VLAN_FLAG_LOOSE_BINDING is set then oper state is not changed as per
its real iface.
Is this the expected behavior. Does VLAN_FLAG_LOOSE_BINDING with this
patch has to control the operstate as well which was not
the case earlier without this patch. ?
Can someone share your views about this.
Kernel version: 4.4.20
patch link:
https://patchwork.ozlabs.org/patch/447837/
--
Regards,
Prasad
^ permalink raw reply
* Re: [PATCH 1/2] ixgbe: add XDP support for pass and drop actions
From: John Fastabend @ 2017-04-24 4:26 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: jeffrey.t.kirsher, netdev
In-Reply-To: <20170423210506.79168335@laptop>
On 17-04-23 09:05 PM, Jakub Kicinski wrote:
> Hi!
>
> On Sun, 23 Apr 2017 18:31:19 -0700, John Fastabend wrote:
>> +static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>> +{
>> + int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>> + struct ixgbe_adapter *adapter = netdev_priv(dev);
>> + struct bpf_prog *old_prog;
>> +
>> + if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
>> + return -EINVAL;
>> +
>> + if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
>> + return -EINVAL;
>> +
>> + /* verify ixgbe ring attributes are sufficient for XDP */
>> + for (i = 0; i < adapter->num_rx_queues; i++) {
>> + struct ixgbe_ring *ring = adapter->rx_ring[i];
>> +
>> + if (ring_is_rsc_enabled(ring))
>> + return -EINVAL;
>> +
>> + if (frame_size > ixgbe_rx_bufsz(ring))
>> + return -EINVAL;
>> + }
>
> I was just looking through the drivers, working on extended ack
> reporting, trying to bring out the driver XDP error messages out
> directly to iproute2. It seems that multiple drivers are only
> checking that MTU/buffer size is appropriate in the XDP_SETUP
> function, and ignore XDP in case user tries to change MTU later.
> And I think it's the same story with LRO?
>
Agreed we need to harden the drivers to changes post XDP init. I'll submit
a few follow on patches for the ixgbe devices in the morning.
Thanks,
John
^ permalink raw reply
* Re: [PATCH 1/2] ixgbe: add XDP support for pass and drop actions
From: Jakub Kicinski @ 2017-04-24 4:05 UTC (permalink / raw)
To: John Fastabend; +Cc: jeffrey.t.kirsher, netdev
In-Reply-To: <20170424013119.8354.18750.stgit@john-Precision-Tower-5810>
Hi!
On Sun, 23 Apr 2017 18:31:19 -0700, John Fastabend wrote:
> +static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
> +{
> + int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
> + struct ixgbe_adapter *adapter = netdev_priv(dev);
> + struct bpf_prog *old_prog;
> +
> + if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
> + return -EINVAL;
> +
> + if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
> + return -EINVAL;
> +
> + /* verify ixgbe ring attributes are sufficient for XDP */
> + for (i = 0; i < adapter->num_rx_queues; i++) {
> + struct ixgbe_ring *ring = adapter->rx_ring[i];
> +
> + if (ring_is_rsc_enabled(ring))
> + return -EINVAL;
> +
> + if (frame_size > ixgbe_rx_bufsz(ring))
> + return -EINVAL;
> + }
I was just looking through the drivers, working on extended ack
reporting, trying to bring out the driver XDP error messages out
directly to iproute2. It seems that multiple drivers are only
checking that MTU/buffer size is appropriate in the XDP_SETUP
function, and ignore XDP in case user tries to change MTU later.
And I think it's the same story with LRO?
^ permalink raw reply
* Re: [PATCH 1/1] mt7601u: check return value of alloc_skb
From: Jakub Kicinski @ 2017-04-24 3:49 UTC (permalink / raw)
To: Pan Bian
Cc: Kalle Valo, Matthias Brugger, linux-wireless, netdev,
linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <1492930823-17249-1-git-send-email-bianpan2016@163.com>
On Sun, 23 Apr 2017 15:00:23 +0800, Pan Bian wrote:
> Function alloc_skb() will return a NULL pointer if there is no enough
> memory. However, in function mt7601u_mcu_msg_alloc(), its return value
> is not validated before it is used. This patch fixes it.
>
> Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Jakub Kicinski <kubakici@wp.pl>
Thanks!
^ permalink raw reply
* Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions
From: Jason Wang @ 2017-04-24 3:22 UTC (permalink / raw)
To: vyasevic, Vladislav Yasevich, netdev
Cc: virtio-dev, mst, maxime.coquelin, virtualization
In-Reply-To: <022c8af3-e7e7-5d35-5152-cb12e90359ef@redhat.com>
On 2017年04月21日 21:08, Vlad Yasevich wrote:
> On 04/21/2017 12:05 AM, Jason Wang wrote:
>> On 2017年04月20日 23:34, Vlad Yasevich wrote:
>>> On 04/17/2017 11:01 PM, Jason Wang wrote:
>>>> On 2017年04月16日 00:38, Vladislav Yasevich wrote:
>>>>> Curreclty virtion net header is fixed size and adding things to it is rather
>>>>> difficult to do. This series attempt to add the infrastructure as well as some
>>>>> extensions that try to resolve some deficiencies we currently have.
>>>>>
>>>>> First, vnet header only has space for 16 flags. This may not be enough
>>>>> in the future. The extensions will provide space for 32 possbile extension
>>>>> flags and 32 possible extensions. These flags will be carried in the
>>>>> first pseudo extension header, the presense of which will be determined by
>>>>> the flag in the virtio net header.
>>>>>
>>>>> The extensions themselves will immidiately follow the extension header itself.
>>>>> They will be added to the packet in the same order as they appear in the
>>>>> extension flags. No padding is placed between the extensions and any
>>>>> extensions negotiated, but not used need by a given packet will convert to
>>>>> trailing padding.
>>>> Do we need a explicit padding (e.g an extension) which could be controlled by each side?
>>> I don't think so. The size of the vnet header is set based on the extensions negotiated.
>>> The one part I am not crazy about is that in the case of packet not using any extensions,
>>> the data is still placed after the entire vnet header, which essentially adds a lot
>>> of padding. However, that's really no different then if we simply grew the vnet header.
>>>
>>> The other thing I've tried before is putting extensions into their own sg buffer, but that
>>> made it slower.h
>> Yes.
>>
>>>>> For example:
>>>>> | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
>>>> Just some rough thoughts:
>>>>
>>>> - Is this better to use TLV instead of bitmap here? One advantage of TLV is that the
>>>> length is not limited by the length of bitmap.
>>> but the disadvantage is that we add at least 4 bytes per extension of just TL data. That
>>> makes this thing even longer.
>> Yes, and it looks like the length is still limited by e.g the length of T.
> Not only that, but it is also limited by the skb->cb as a whole. So adding putting
> extensions into a TLV style means we have less extensions for now, until we get rid of
> skb->cb usage.
>
>>>> - For 1.1, do we really want something like vnet header? AFAIK, it was not used by modern
>>>> NICs, is this better to pack all meta-data into descriptor itself? This may need a some
>>>> changes in tun/macvtap, but looks more PCIE friendly.
>>> That would really be ideal and I've looked at this. There are small issues of exposing
>>> the 'net metadata' of the descriptor to taps so they can be filled in. The alternative
>>> is to use a different control structure for tap->qemu|vhost channel (that can be
>>> implementation specific) and have qemu|vhost populate the 'net metadata' of the descriptor.
>> Yes, this needs some thought. For vhost, things looks a little bit easier, we can probably
>> use msg_control.
>>
> We can use msg_control in qemu as well, can't we?
AFAIK, it needs some changes since we don't export socket to userspace.
> It really is a question of who is doing
> the work and the number of copies.
>
> I can take a closer look of how it would look if we extend the descriptor with type
> specific data. I don't know if other users of virtio would benefit from it?
Not sure, but we can have a common descriptor header followed by device
specific meta data. This probably need some prototype benchmarking to
see the benefits first.
Thanks
^ permalink raw reply
* [PATCH 2/2] ixgbe: add support for XDP_TX action
From: John Fastabend @ 2017-04-24 1:31 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20170424013004.8354.34893.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Add support for XDP_TX action.
A couple design choices were made here. First I use a new ring
pointer structure xdp_ring[] in the adapter struct instead of
pushing the newly allocated xdp TX rings into the tx_ring[]
structure. This means we have to duplicate loops around rings
in places we want to initialize both TX rings and XDP rings.
But by making it explicit it is obvious when we are using XDP
rings and when we are using TX rings. Further we don't have
to do ring arithmatic which is error prone. As a proof point
for doing this my first patches used only a single ring structure
and introduced bugs in FCoE code and macvlan code paths.
Second I am aware this is not the most optimized version of
this code possible. I want to get baseline support in using
the most readable format possible and then once this series
is included I will optimize the TX path in another series
of patches.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
0 files changed
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index cb14813..e8cd449 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -235,7 +235,11 @@ struct vf_macvlans {
struct ixgbe_tx_buffer {
union ixgbe_adv_tx_desc *next_to_watch;
unsigned long time_stamp;
- struct sk_buff *skb;
+ union {
+ struct sk_buff *skb;
+ /* XDP uses address ptr on irq_clean */
+ void *data;
+ };
unsigned int bytecount;
unsigned short gso_segs;
__be16 protocol;
@@ -288,6 +292,7 @@ enum ixgbe_ring_state_t {
__IXGBE_TX_XPS_INIT_DONE,
__IXGBE_TX_DETECT_HANG,
__IXGBE_HANG_CHECK_ARMED,
+ __IXGBE_TX_XDP_RING,
};
#define ring_uses_build_skb(ring) \
@@ -314,6 +319,12 @@ struct ixgbe_fwd_adapter {
set_bit(__IXGBE_RX_RSC_ENABLED, &(ring)->state)
#define clear_ring_rsc_enabled(ring) \
clear_bit(__IXGBE_RX_RSC_ENABLED, &(ring)->state)
+#define ring_is_xdp(ring) \
+ test_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
+#define set_ring_xdp(ring) \
+ set_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
+#define clear_ring_xdp(ring) \
+ clear_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
struct ixgbe_ring {
struct ixgbe_ring *next; /* pointer to next ring in q_vector */
struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
@@ -380,6 +391,7 @@ enum ixgbe_ring_f_enum {
#define IXGBE_MAX_FCOE_INDICES 8
#define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
#define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define MAX_XDP_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
#define IXGBE_MAX_L2A_QUEUES 4
#define IXGBE_BAD_L2A_QUEUE 3
#define IXGBE_MAX_MACVLANS 31
@@ -623,6 +635,10 @@ struct ixgbe_adapter {
__be16 vxlan_port;
__be16 geneve_port;
+ /* XDP */
+ int num_xdp_queues;
+ struct ixgbe_ring *xdp_ring[MAX_XDP_QUEUES];
+
/* TX */
struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
@@ -669,6 +685,7 @@ struct ixgbe_adapter {
u64 tx_busy;
unsigned int tx_ring_count;
+ unsigned int xdp_ring_count;
unsigned int rx_ring_count;
u32 link_speed;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 79a126d..b0fd2f5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1071,15 +1071,19 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
if (!netif_running(adapter->netdev)) {
for (i = 0; i < adapter->num_tx_queues; i++)
adapter->tx_ring[i]->count = new_tx_count;
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ adapter->xdp_ring[i]->count = new_tx_count;
for (i = 0; i < adapter->num_rx_queues; i++)
adapter->rx_ring[i]->count = new_rx_count;
adapter->tx_ring_count = new_tx_count;
+ adapter->xdp_ring_count = new_tx_count;
adapter->rx_ring_count = new_rx_count;
goto clear_reset;
}
/* allocate temporary buffer to store rings in */
i = max_t(int, adapter->num_tx_queues, adapter->num_rx_queues);
+ i = max_t(int, i, adapter->num_xdp_queues);
temp_ring = vmalloc(i * sizeof(struct ixgbe_ring));
if (!temp_ring) {
@@ -1111,12 +1115,33 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
}
}
+ for (i = 0; i < adapter->num_xdp_queues; i++) {
+ memcpy(&temp_ring[i], adapter->xdp_ring[i],
+ sizeof(struct ixgbe_ring));
+
+ temp_ring[i].count = new_tx_count;
+ err = ixgbe_setup_tx_resources(&temp_ring[i]);
+ if (err) {
+ while (i) {
+ i--;
+ ixgbe_free_tx_resources(&temp_ring[i]);
+ }
+ goto err_setup;
+ }
+ }
+
for (i = 0; i < adapter->num_tx_queues; i++) {
ixgbe_free_tx_resources(adapter->tx_ring[i]);
memcpy(adapter->tx_ring[i], &temp_ring[i],
sizeof(struct ixgbe_ring));
}
+ for (i = 0; i < adapter->num_xdp_queues; i++) {
+ ixgbe_free_tx_resources(adapter->xdp_ring[i]);
+
+ memcpy(adapter->xdp_ring[i], &temp_ring[i],
+ sizeof(struct ixgbe_ring));
+ }
adapter->tx_ring_count = new_tx_count;
}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 1b8be7d..b45fdc9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -267,12 +267,14 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
**/
static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
{
- int i;
+ int i, reg_idx;
for (i = 0; i < adapter->num_rx_queues; i++)
adapter->rx_ring[i]->reg_idx = i;
- for (i = 0; i < adapter->num_tx_queues; i++)
- adapter->tx_ring[i]->reg_idx = i;
+ for (i = 0, reg_idx = 0; i < adapter->num_tx_queues; i++, reg_idx++)
+ adapter->tx_ring[i]->reg_idx = reg_idx;
+ for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++)
+ adapter->xdp_ring[i]->reg_idx = reg_idx;
return true;
}
@@ -308,6 +310,11 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
ixgbe_cache_ring_rss(adapter);
}
+static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
+{
+ return adapter->xdp_prog ? nr_cpu_ids : 0;
+}
+
#define IXGBE_RSS_64Q_MASK 0x3F
#define IXGBE_RSS_16Q_MASK 0xF
#define IXGBE_RSS_8Q_MASK 0x7
@@ -382,6 +389,7 @@ static bool ixgbe_set_dcb_sriov_queues(struct ixgbe_adapter *adapter)
adapter->num_rx_queues_per_pool = tcs;
adapter->num_tx_queues = vmdq_i * tcs;
+ adapter->num_xdp_queues = 0;
adapter->num_rx_queues = vmdq_i * tcs;
#ifdef IXGBE_FCOE
@@ -479,6 +487,7 @@ static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
netdev_set_tc_queue(dev, i, rss_i, rss_i * i);
adapter->num_tx_queues = rss_i * tcs;
+ adapter->num_xdp_queues = 0;
adapter->num_rx_queues = rss_i * tcs;
return true;
@@ -549,6 +558,7 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
adapter->num_rx_queues = vmdq_i * rss_i;
adapter->num_tx_queues = vmdq_i * rss_i;
+ adapter->num_xdp_queues = 0;
/* disable ATR as it is not supported when VMDq is enabled */
adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
@@ -669,6 +679,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
#endif /* IXGBE_FCOE */
adapter->num_rx_queues = rss_i;
adapter->num_tx_queues = rss_i;
+ adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
return true;
}
@@ -689,6 +700,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
/* Start with base case */
adapter->num_rx_queues = 1;
adapter->num_tx_queues = 1;
+ adapter->num_xdp_queues = 0;
adapter->num_rx_pools = adapter->num_rx_queues;
adapter->num_rx_queues_per_pool = 1;
@@ -719,8 +731,11 @@ static int ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter)
struct ixgbe_hw *hw = &adapter->hw;
int i, vectors, vector_threshold;
- /* We start by asking for one vector per queue pair */
+ /* We start by asking for one vector per queue pair with XDP queues
+ * being stacked with TX queues.
+ */
vectors = max(adapter->num_rx_queues, adapter->num_tx_queues);
+ vectors = max(vectors, adapter->num_xdp_queues);
/* It is easy to be greedy for MSI-X vectors. However, it really
* doesn't do much good if we have a lot more vectors than CPUs. We'll
@@ -800,6 +815,8 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
* @v_idx: index of vector in adapter struct
* @txr_count: total number of Tx rings to allocate
* @txr_idx: index of first Tx ring to allocate
+ * @xdp_count: total number of XDP rings to allocate
+ * @xdp_idx: index of first XDP ring to allocate
* @rxr_count: total number of Rx rings to allocate
* @rxr_idx: index of first Rx ring to allocate
*
@@ -808,6 +825,7 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
int v_count, int v_idx,
int txr_count, int txr_idx,
+ int xdp_count, int xdp_idx,
int rxr_count, int rxr_idx)
{
struct ixgbe_q_vector *q_vector;
@@ -817,7 +835,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
int ring_count, size;
u8 tcs = netdev_get_num_tc(adapter->netdev);
- ring_count = txr_count + rxr_count;
+ ring_count = txr_count + rxr_count + xdp_count;
size = sizeof(struct ixgbe_q_vector) +
(sizeof(struct ixgbe_ring) * ring_count);
@@ -909,6 +927,33 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
ring++;
}
+ while (xdp_count) {
+ /* assign generic ring traits */
+ ring->dev = &adapter->pdev->dev;
+ ring->netdev = adapter->netdev;
+
+ /* configure backlink on ring */
+ ring->q_vector = q_vector;
+
+ /* update q_vector Tx values */
+ ixgbe_add_ring(ring, &q_vector->tx);
+
+ /* apply Tx specific ring traits */
+ ring->count = adapter->tx_ring_count;
+ ring->queue_index = xdp_idx;
+ set_ring_xdp(ring);
+
+ /* assign ring to adapter */
+ adapter->xdp_ring[xdp_idx] = ring;
+
+ /* update count and index */
+ xdp_count--;
+ xdp_idx++;
+
+ /* push pointer to next ring */
+ ring++;
+ }
+
while (rxr_count) {
/* assign generic ring traits */
ring->dev = &adapter->pdev->dev;
@@ -1002,17 +1047,18 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
int q_vectors = adapter->num_q_vectors;
int rxr_remaining = adapter->num_rx_queues;
int txr_remaining = adapter->num_tx_queues;
- int rxr_idx = 0, txr_idx = 0, v_idx = 0;
+ int xdp_remaining = adapter->num_xdp_queues;
+ int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
int err;
/* only one q_vector if MSI-X is disabled. */
if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
q_vectors = 1;
- if (q_vectors >= (rxr_remaining + txr_remaining)) {
+ if (q_vectors >= (rxr_remaining + txr_remaining + xdp_remaining)) {
for (; rxr_remaining; v_idx++) {
err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
- 0, 0, 1, rxr_idx);
+ 0, 0, 0, 0, 1, rxr_idx);
if (err)
goto err_out;
@@ -1026,8 +1072,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
for (; v_idx < q_vectors; v_idx++) {
int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_idx);
int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_idx);
+ int xqpv = DIV_ROUND_UP(xdp_remaining, q_vectors - v_idx);
+
err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
tqpv, txr_idx,
+ xqpv, xdp_idx,
rqpv, rxr_idx);
if (err)
@@ -1036,14 +1085,17 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
/* update counts and index */
rxr_remaining -= rqpv;
txr_remaining -= tqpv;
+ xdp_remaining -= xqpv;
rxr_idx++;
txr_idx++;
+ xdp_idx += xqpv;
}
return 0;
err_out:
adapter->num_tx_queues = 0;
+ adapter->num_xdp_queues = 0;
adapter->num_rx_queues = 0;
adapter->num_q_vectors = 0;
@@ -1066,6 +1118,7 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
int v_idx = adapter->num_q_vectors;
adapter->num_tx_queues = 0;
+ adapter->num_xdp_queues = 0;
adapter->num_rx_queues = 0;
adapter->num_q_vectors = 0;
@@ -1172,9 +1225,10 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
ixgbe_cache_ring_register(adapter);
- e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u\n",
+ e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u XDP Queue count = %u\n",
(adapter->num_rx_queues > 1) ? "Enabled" : "Disabled",
- adapter->num_rx_queues, adapter->num_tx_queues);
+ adapter->num_rx_queues, adapter->num_tx_queues,
+ adapter->num_xdp_queues);
set_bit(__IXGBE_DOWN, &adapter->state);
@@ -1195,6 +1249,7 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter)
{
adapter->num_tx_queues = 0;
+ adapter->num_xdp_queues = 0;
adapter->num_rx_queues = 0;
ixgbe_free_q_vectors(adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ebcf9c0..a13d2bf 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -594,6 +594,19 @@ static void ixgbe_regdump(struct ixgbe_hw *hw, struct ixgbe_reg_info *reginfo)
}
+static void ixgbe_print_buffer(struct ixgbe_ring *ring, int n)
+{
+ struct ixgbe_tx_buffer *tx_buffer;
+
+ tx_buffer = &ring->tx_buffer_info[ring->next_to_clean];
+ pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
+ n, ring->next_to_use, ring->next_to_clean,
+ (u64)dma_unmap_addr(tx_buffer, dma),
+ dma_unmap_len(tx_buffer, len),
+ tx_buffer->next_to_watch,
+ (u64)tx_buffer->time_stamp);
+}
+
/*
* ixgbe_dump - Print registers, tx-rings and rx-rings
*/
@@ -603,7 +616,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
struct ixgbe_hw *hw = &adapter->hw;
struct ixgbe_reg_info *reginfo;
int n = 0;
- struct ixgbe_ring *tx_ring;
+ struct ixgbe_ring *ring;
struct ixgbe_tx_buffer *tx_buffer;
union ixgbe_adv_tx_desc *tx_desc;
struct my_u0 { u64 a; u64 b; } *u0;
@@ -643,14 +656,13 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
"Queue [NTU] [NTC] [bi(ntc)->dma ]",
"leng", "ntw", "timestamp");
for (n = 0; n < adapter->num_tx_queues; n++) {
- tx_ring = adapter->tx_ring[n];
- tx_buffer = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
- pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
- n, tx_ring->next_to_use, tx_ring->next_to_clean,
- (u64)dma_unmap_addr(tx_buffer, dma),
- dma_unmap_len(tx_buffer, len),
- tx_buffer->next_to_watch,
- (u64)tx_buffer->time_stamp);
+ ring = adapter->tx_ring[n];
+ ixgbe_print_buffer(ring, n);
+ }
+
+ for (n = 0; n < adapter->num_xdp_queues; n++) {
+ ring = adapter->xdp_ring[n];
+ ixgbe_print_buffer(ring, n);
}
/* Print TX Rings */
@@ -695,28 +707,28 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
*/
for (n = 0; n < adapter->num_tx_queues; n++) {
- tx_ring = adapter->tx_ring[n];
+ ring = adapter->tx_ring[n];
pr_info("------------------------------------\n");
- pr_info("TX QUEUE INDEX = %d\n", tx_ring->queue_index);
+ pr_info("TX QUEUE INDEX = %d\n", ring->queue_index);
pr_info("------------------------------------\n");
pr_info("%s%s %s %s %s %s\n",
"T [desc] [address 63:0 ] ",
"[PlPOIdStDDt Ln] [bi->dma ] ",
"leng", "ntw", "timestamp", "bi->skb");
- for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
- tx_desc = IXGBE_TX_DESC(tx_ring, i);
- tx_buffer = &tx_ring->tx_buffer_info[i];
+ for (i = 0; ring->desc && (i < ring->count); i++) {
+ tx_desc = IXGBE_TX_DESC(ring, i);
+ tx_buffer = &ring->tx_buffer_info[i];
u0 = (struct my_u0 *)tx_desc;
if (dma_unmap_len(tx_buffer, len) > 0) {
const char *ring_desc;
- if (i == tx_ring->next_to_use &&
- i == tx_ring->next_to_clean)
+ if (i == ring->next_to_use &&
+ i == ring->next_to_clean)
ring_desc = " NTC/U";
- else if (i == tx_ring->next_to_use)
+ else if (i == ring->next_to_use)
ring_desc = " NTU";
- else if (i == tx_ring->next_to_clean)
+ else if (i == ring->next_to_clean)
ring_desc = " NTC";
else
ring_desc = "";
@@ -985,6 +997,10 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
for (i = 0; i < adapter->num_tx_queues; i++)
clear_bit(__IXGBE_HANG_CHECK_ARMED,
&adapter->tx_ring[i]->state);
+
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ clear_bit(__IXGBE_HANG_CHECK_ARMED,
+ &adapter->xdp_ring[i]->state);
}
static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
@@ -1029,6 +1045,14 @@ static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
if (xoff[tc])
clear_bit(__IXGBE_HANG_CHECK_ARMED, &tx_ring->state);
}
+
+ for (i = 0; i < adapter->num_xdp_queues; i++) {
+ struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
+
+ tc = xdp_ring->dcb_tc;
+ if (xoff[tc])
+ clear_bit(__IXGBE_HANG_CHECK_ARMED, &xdp_ring->state);
+ }
}
static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
@@ -1180,7 +1204,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
total_packets += tx_buffer->gso_segs;
/* free the skb */
- napi_consume_skb(tx_buffer->skb, napi_budget);
+ if (ring_is_xdp(tx_ring))
+ page_frag_free(tx_buffer->data);
+ else
+ napi_consume_skb(tx_buffer->skb, napi_budget);
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -1241,7 +1268,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
/* schedule immediate reset if we believe we hung */
struct ixgbe_hw *hw = &adapter->hw;
- e_err(drv, "Detected Tx Unit Hang\n"
+ e_err(drv, "Detected Tx Unit Hang %s\n"
" Tx Queue <%d>\n"
" TDH, TDT <%x>, <%x>\n"
" next_to_use <%x>\n"
@@ -1249,13 +1276,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
"tx_buffer_info[next_to_clean]\n"
" time_stamp <%lx>\n"
" jiffies <%lx>\n",
+ ring_is_xdp(tx_ring) ? "(XDP)" : "",
tx_ring->queue_index,
IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
tx_ring->next_to_use, i,
tx_ring->tx_buffer_info[i].time_stamp, jiffies);
- netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+ if (!ring_is_xdp(tx_ring))
+ netif_stop_subqueue(tx_ring->netdev,
+ tx_ring->queue_index);
e_info(probe,
"tx hang %d detected on queue %d, resetting adapter\n",
@@ -1268,6 +1298,9 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
return true;
}
+ if (ring_is_xdp(tx_ring))
+ return !!budget;
+
netdev_tx_completed_queue(txring_txq(tx_ring),
total_packets, total_bytes);
@@ -2170,8 +2203,13 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
#define IXGBE_XDP_PASS 0
#define IXGBE_XDP_CONSUMED 1
+#define IXGBE_XDP_TX 2
-static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring *rx_ring,
+static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+ struct xdp_buff *xdp);
+
+static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
+ struct ixgbe_ring *rx_ring,
struct xdp_buff *xdp)
{
int result = IXGBE_XDP_PASS;
@@ -2188,9 +2226,11 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring *rx_ring,
switch (act) {
case XDP_PASS:
break;
+ case XDP_TX:
+ result = ixgbe_xmit_xdp_ring(adapter, xdp);
+ break;
default:
bpf_warn_invalid_xdp_action(act);
- case XDP_TX:
case XDP_ABORTED:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
/* fallthrough -- handle aborts by dropping packet */
@@ -2203,6 +2243,23 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring *rx_ring,
return ERR_PTR(-result);
}
+static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
+ struct ixgbe_rx_buffer *rx_buffer,
+ unsigned int size)
+{
+#if (PAGE_SIZE < 8192)
+ unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
+
+ rx_buffer->page_offset ^= truesize;
+#else
+ unsigned int truesize = ring_uses_build_skb(rx_ring) ?
+ SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+ SKB_DATA_ALIGN(size);
+
+ rx_buffer->page_offset += truesize;
+#endif
+}
+
/**
* ixgbe_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
* @q_vector: structure containing interrupt and ring information
@@ -2221,8 +2278,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
const int budget)
{
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-#ifdef IXGBE_FCOE
struct ixgbe_adapter *adapter = q_vector->adapter;
+#ifdef IXGBE_FCOE
int ddp_bytes;
unsigned int mss = 0;
#endif /* IXGBE_FCOE */
@@ -2262,13 +2319,16 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
ixgbe_rx_offset(rx_ring);
xdp.data_end = xdp.data + size;
- skb = ixgbe_run_xdp(rx_ring, &xdp);
+ skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
}
if (IS_ERR(skb)) {
+ if (PTR_ERR(skb) == -IXGBE_XDP_TX)
+ ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
+ else
+ rx_buffer->pagecnt_bias++;
total_rx_packets++;
total_rx_bytes += size;
- rx_buffer->pagecnt_bias++;
} else if (skb) {
ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
} else if (ring_uses_build_skb(rx_ring)) {
@@ -3438,6 +3498,8 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
/* Setup the HW Tx Head and Tail descriptor pointers */
for (i = 0; i < adapter->num_tx_queues; i++)
ixgbe_configure_tx_ring(adapter, adapter->tx_ring[i]);
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ ixgbe_configure_tx_ring(adapter, adapter->xdp_ring[i]);
}
static void ixgbe_enable_rx_drop(struct ixgbe_adapter *adapter,
@@ -5579,7 +5641,10 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
union ixgbe_adv_tx_desc *eop_desc, *tx_desc;
/* Free all the Tx ring sk_buffs */
- dev_kfree_skb_any(tx_buffer->skb);
+ if (ring_is_xdp(tx_ring))
+ page_frag_free(tx_buffer->data);
+ else
+ dev_kfree_skb_any(tx_buffer->skb);
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -5620,7 +5685,8 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
}
/* reset BQL for queue */
- netdev_tx_reset_queue(txring_txq(tx_ring));
+ if (!ring_is_xdp(tx_ring))
+ netdev_tx_reset_queue(txring_txq(tx_ring));
/* reset next_to_use and next_to_clean */
tx_ring->next_to_use = 0;
@@ -5649,6 +5715,8 @@ static void ixgbe_clean_all_tx_rings(struct ixgbe_adapter *adapter)
for (i = 0; i < adapter->num_tx_queues; i++)
ixgbe_clean_tx_ring(adapter->tx_ring[i]);
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ ixgbe_clean_tx_ring(adapter->xdp_ring[i]);
}
static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
@@ -5743,6 +5811,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
u8 reg_idx = adapter->tx_ring[i]->reg_idx;
IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
}
+ for (i = 0; i < adapter->num_xdp_queues; i++) {
+ u8 reg_idx = adapter->xdp_ring[i]->reg_idx;
+
+ IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
+ }
/* Disable the Tx DMA engine on 82599 and later MAC */
switch (hw->mac.type) {
@@ -6111,7 +6184,7 @@ int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
**/
static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
{
- int i, err = 0;
+ int i, j = 0, err = 0;
for (i = 0; i < adapter->num_tx_queues; i++) {
err = ixgbe_setup_tx_resources(adapter->tx_ring[i]);
@@ -6121,10 +6194,20 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
e_err(probe, "Allocation for Tx Queue %u failed\n", i);
goto err_setup_tx;
}
+ for (j = 0; j < adapter->num_xdp_queues; j++) {
+ err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
+ if (!err)
+ continue;
+
+ e_err(probe, "Allocation for Tx Queue %u failed\n", j);
+ goto err_setup_tx;
+ }
return 0;
err_setup_tx:
/* rewind the index freeing the rings as we go */
+ while (j--)
+ ixgbe_free_tx_resources(adapter->xdp_ring[j]);
while (i--)
ixgbe_free_tx_resources(adapter->tx_ring[i]);
return err;
@@ -6255,6 +6338,9 @@ static void ixgbe_free_all_tx_resources(struct ixgbe_adapter *adapter)
for (i = 0; i < adapter->num_tx_queues; i++)
if (adapter->tx_ring[i]->desc)
ixgbe_free_tx_resources(adapter->tx_ring[i]);
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ if (adapter->xdp_ring[i]->desc)
+ ixgbe_free_tx_resources(adapter->xdp_ring[i]);
}
/**
@@ -6674,6 +6760,14 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
bytes += tx_ring->stats.bytes;
packets += tx_ring->stats.packets;
}
+ for (i = 0; i < adapter->num_xdp_queues; i++) {
+ struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
+
+ restart_queue += xdp_ring->tx_stats.restart_queue;
+ tx_busy += xdp_ring->tx_stats.tx_busy;
+ bytes += xdp_ring->stats.bytes;
+ packets += xdp_ring->stats.packets;
+ }
adapter->restart_queue = restart_queue;
adapter->tx_busy = tx_busy;
netdev->stats.tx_bytes = bytes;
@@ -6867,6 +6961,9 @@ static void ixgbe_fdir_reinit_subtask(struct ixgbe_adapter *adapter)
for (i = 0; i < adapter->num_tx_queues; i++)
set_bit(__IXGBE_TX_FDIR_INIT_DONE,
&(adapter->tx_ring[i]->state));
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ set_bit(__IXGBE_TX_FDIR_INIT_DONE,
+ &adapter->xdp_ring[i]->state);
/* re-enable flow director interrupts */
IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_FLOW_DIR);
} else {
@@ -6900,6 +6997,8 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
if (netif_carrier_ok(adapter->netdev)) {
for (i = 0; i < adapter->num_tx_queues; i++)
set_check_for_tx_hang(adapter->tx_ring[i]);
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ set_check_for_tx_hang(adapter->xdp_ring[i]);
}
if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
@@ -7130,6 +7229,13 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
return true;
}
+ for (i = 0; i < adapter->num_xdp_queues; i++) {
+ struct ixgbe_ring *ring = adapter->xdp_ring[i];
+
+ if (ring->next_to_use != ring->next_to_clean)
+ return true;
+ }
+
return false;
}
@@ -8090,6 +8196,69 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
#endif
}
+static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+ struct xdp_buff *xdp)
+{
+ struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
+ struct ixgbe_tx_buffer *tx_buffer;
+ union ixgbe_adv_tx_desc *tx_desc;
+ u32 len, cmd_type;
+ dma_addr_t dma;
+ u16 i;
+
+ len = xdp->data_end - xdp->data;
+
+ if (unlikely(!ixgbe_desc_unused(ring)))
+ return IXGBE_XDP_CONSUMED;
+
+ dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
+ if (dma_mapping_error(ring->dev, dma))
+ return IXGBE_XDP_CONSUMED;
+
+ /* record the location of the first descriptor for this packet */
+ tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
+ tx_buffer->bytecount = len;
+ tx_buffer->gso_segs = 1;
+ tx_buffer->protocol = 0;
+
+ i = ring->next_to_use;
+ tx_desc = IXGBE_TX_DESC(ring, i);
+
+ dma_unmap_len_set(tx_buffer, len, len);
+ dma_unmap_addr_set(tx_buffer, dma, dma);
+ tx_buffer->data = xdp->data;
+ tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+ /* put descriptor type bits */
+ cmd_type = IXGBE_ADVTXD_DTYP_DATA |
+ IXGBE_ADVTXD_DCMD_DEXT |
+ IXGBE_ADVTXD_DCMD_IFCS;
+ cmd_type |= len | IXGBE_TXD_CMD;
+ tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+ tx_desc->read.olinfo_status =
+ cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
+ /* Force memory writes to complete before letting h/w know there
+ * are new descriptors to fetch. (Only applicable for weak-ordered
+ * memory model archs, such as IA-64).
+ *
+ * We also need this memory barrier to make certain all of the
+ * status bits have been updated before next_to_watch is written.
+ */
+ wmb();
+
+ /* set next_to_watch value indicating a packet is present */
+ i++;
+ if (i == ring->count)
+ i = 0;
+
+ tx_buffer->next_to_watch = tx_desc;
+ ring->next_to_use = i;
+
+ writel(i, ring->tail);
+ return IXGBE_XDP_TX;
+}
+
netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
struct ixgbe_adapter *adapter,
struct ixgbe_ring *tx_ring)
@@ -8381,6 +8550,23 @@ static void ixgbe_netpoll(struct net_device *netdev)
#endif
+static void ixgbe_get_ring_stats64(struct rtnl_link_stats64 *stats,
+ struct ixgbe_ring *ring)
+{
+ u64 bytes, packets;
+ unsigned int start;
+
+ if (ring) {
+ do {
+ start = u64_stats_fetch_begin_irq(&ring->syncp);
+ packets = ring->stats.packets;
+ bytes = ring->stats.bytes;
+ } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
+ stats->tx_packets += packets;
+ stats->tx_bytes += bytes;
+ }
+}
+
static void ixgbe_get_stats64(struct net_device *netdev,
struct rtnl_link_stats64 *stats)
{
@@ -8406,18 +8592,13 @@ static void ixgbe_get_stats64(struct net_device *netdev,
for (i = 0; i < adapter->num_tx_queues; i++) {
struct ixgbe_ring *ring = ACCESS_ONCE(adapter->tx_ring[i]);
- u64 bytes, packets;
- unsigned int start;
- if (ring) {
- do {
- start = u64_stats_fetch_begin_irq(&ring->syncp);
- packets = ring->stats.packets;
- bytes = ring->stats.bytes;
- } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
- stats->tx_packets += packets;
- stats->tx_bytes += bytes;
- }
+ ixgbe_get_ring_stats64(stats, ring);
+ }
+ for (i = 0; i < adapter->num_xdp_queues; i++) {
+ struct ixgbe_ring *ring = ACCESS_ONCE(adapter->xdp_ring[i]);
+
+ ixgbe_get_ring_stats64(stats, ring);
}
rcu_read_unlock();
@@ -9559,9 +9740,23 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
return -EINVAL;
}
+ if (nr_cpu_ids > MAX_XDP_QUEUES)
+ return -ENOMEM;
+
old_prog = xchg(&adapter->xdp_prog, prog);
- for (i = 0; i < adapter->num_rx_queues; i++)
- xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
+
+ /* If transitioning XDP modes reconfigure rings */
+ if (!!prog != !!old_prog) {
+ int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
+
+ if (err) {
+ rcu_assign_pointer(adapter->xdp_prog, old_prog);
+ return -EINVAL;
+ }
+ } else {
+ for (i = 0; i < adapter->num_rx_queues; i++)
+ xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
+ }
if (old_prog)
bpf_prog_put(old_prog);
@@ -10070,6 +10265,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
for (i = 0; i < adapter->num_tx_queues; i++)
u64_stats_init(&adapter->tx_ring[i]->syncp);
+ for (i = 0; i < adapter->num_xdp_queues; i++)
+ u64_stats_init(&adapter->xdp_ring[i]->syncp);
+
/* WOL not supported for all devices */
adapter->wol = 0;
hw->eeprom.ops.read(hw, 0x2c, &adapter->eeprom_cap);
^ permalink raw reply related
* [PATCH 1/2] ixgbe: add XDP support for pass and drop actions
From: John Fastabend @ 2017-04-24 1:31 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20170424013004.8354.34893.stgit@john-Precision-Tower-5810>
Basic XDP drop support for ixgbe. Uses READ_ONCE/xchg semantics on XDP
programs instead of rcu primitives as suggested by Daniel Borkmann and
Alex Duyck.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
0 files changed
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 656ca8f..cb14813 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -318,6 +318,7 @@ struct ixgbe_ring {
struct ixgbe_ring *next; /* pointer to next ring in q_vector */
struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
struct net_device *netdev; /* netdev ring belongs to */
+ struct bpf_prog *xdp_prog;
struct device *dev; /* device for DMA mapping */
struct ixgbe_fwd_adapter *l2_accel_priv;
void *desc; /* descriptor ring memory */
@@ -555,6 +556,7 @@ struct ixgbe_adapter {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
/* OS defined structs */
struct net_device *netdev;
+ struct bpf_prog *xdp_prog;
struct pci_dev *pdev;
unsigned long state;
@@ -835,7 +837,7 @@ enum ixgbe_boards {
void ixgbe_reinit_locked(struct ixgbe_adapter *adapter);
void ixgbe_reset(struct ixgbe_adapter *adapter);
void ixgbe_set_ethtool_ops(struct net_device *netdev);
-int ixgbe_setup_rx_resources(struct ixgbe_ring *);
+int ixgbe_setup_rx_resources(struct ixgbe_adapter *, struct ixgbe_ring *);
int ixgbe_setup_tx_resources(struct ixgbe_ring *);
void ixgbe_free_rx_resources(struct ixgbe_ring *);
void ixgbe_free_tx_resources(struct ixgbe_ring *);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 59730ed..79a126d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1128,7 +1128,7 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
sizeof(struct ixgbe_ring));
temp_ring[i].count = new_rx_count;
- err = ixgbe_setup_rx_resources(&temp_ring[i]);
+ err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);
if (err) {
while (i) {
i--;
@@ -1761,7 +1761,7 @@ static int ixgbe_setup_desc_rings(struct ixgbe_adapter *adapter)
rx_ring->netdev = adapter->netdev;
rx_ring->reg_idx = adapter->rx_ring[0]->reg_idx;
- err = ixgbe_setup_rx_resources(rx_ring);
+ err = ixgbe_setup_rx_resources(adapter, rx_ring);
if (err) {
ret_val = 4;
goto err_nomem;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 730742a..ebcf9c0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -49,6 +49,9 @@
#include <linux/if_macvlan.h>
#include <linux/if_bridge.h>
#include <linux/prefetch.h>
+#include <linux/bpf.h>
+#include <linux/bpf_trace.h>
+#include <linux/atomic.h>
#include <scsi/fc/fc_fcoe.h>
#include <net/udp_tunnel.h>
#include <net/pkt_cls.h>
@@ -1856,6 +1859,10 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
* @rx_desc: pointer to the EOP Rx descriptor
* @skb: pointer to current skb being fixed
*
+ * Check if the skb is valid in the XDP case it will be an error pointer.
+ * Return true in this case to abort processing and advance to next
+ * descriptor.
+ *
* Check for corrupted packet headers caused by senders on the local L2
* embedded NIC switch not setting up their Tx Descriptors right. These
* should be very rare.
@@ -1874,6 +1881,10 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
{
struct net_device *netdev = rx_ring->netdev;
+ /* XDP packets use error pointer so abort at this point */
+ if (IS_ERR(skb))
+ return true;
+
/* verify that the packet does not have any known errors */
if (unlikely(ixgbe_test_staterr(rx_desc,
IXGBE_RXDADV_ERR_FRAME_ERR_MASK) &&
@@ -2049,7 +2060,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
/* hand second half of page back to the ring */
ixgbe_reuse_rx_page(rx_ring, rx_buffer);
} else {
- if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
+ if (!IS_ERR(skb) && IXGBE_CB(skb)->dma == rx_buffer->dma) {
/* the page has been released from the ring */
IXGBE_CB(skb)->page_released = true;
} else {
@@ -2070,21 +2081,22 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *rx_buffer,
- union ixgbe_adv_rx_desc *rx_desc,
- unsigned int size)
+ struct xdp_buff *xdp,
+ union ixgbe_adv_rx_desc *rx_desc)
{
- void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+ unsigned int size = xdp->data_end - xdp->data;
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
#else
- unsigned int truesize = SKB_DATA_ALIGN(size);
+ unsigned int truesize = SKB_DATA_ALIGN(xdp->data_end -
+ xdp->data_hard_start);
#endif
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
+ prefetch(xdp->data);
#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
+ prefetch(xdp->data + L1_CACHE_BYTES);
#endif
/* allocate a skb to store the frags */
@@ -2097,7 +2109,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
IXGBE_CB(skb)->dma = rx_buffer->dma;
skb_add_rx_frag(skb, 0, rx_buffer->page,
- rx_buffer->page_offset,
+ xdp->data - page_address(rx_buffer->page),
size, truesize);
#if (PAGE_SIZE < 8192)
rx_buffer->page_offset ^= truesize;
@@ -2105,7 +2117,8 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
rx_buffer->page_offset += truesize;
#endif
} else {
- memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
+ memcpy(__skb_put(skb, size),
+ xdp->data, ALIGN(size, sizeof(long)));
rx_buffer->pagecnt_bias++;
}
@@ -2114,32 +2127,32 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *rx_buffer,
- union ixgbe_adv_rx_desc *rx_desc,
- unsigned int size)
+ struct xdp_buff *xdp,
+ union ixgbe_adv_rx_desc *rx_desc)
{
- void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
#else
unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
- SKB_DATA_ALIGN(IXGBE_SKB_PAD + size);
+ SKB_DATA_ALIGN(xdp->data_end -
+ xdp->data_hard_start);
#endif
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
+ prefetch(xdp->data);
#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
+ prefetch(xdp->data + L1_CACHE_BYTES);
#endif
- /* build an skb around the page buffer */
- skb = build_skb(va - IXGBE_SKB_PAD, truesize);
+ /* build an skb to around the page buffer */
+ skb = build_skb(xdp->data_hard_start, truesize);
if (unlikely(!skb))
return NULL;
/* update pointers within the skb to store the data */
- skb_reserve(skb, IXGBE_SKB_PAD);
- __skb_put(skb, size);
+ skb_reserve(skb, xdp->data - xdp->data_hard_start);
+ __skb_put(skb, xdp->data_end - xdp->data);
/* record DMA address if this is the start of a chain of buffers */
if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
@@ -2155,6 +2168,41 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
return skb;
}
+#define IXGBE_XDP_PASS 0
+#define IXGBE_XDP_CONSUMED 1
+
+static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring *rx_ring,
+ struct xdp_buff *xdp)
+{
+ int result = IXGBE_XDP_PASS;
+ struct bpf_prog *xdp_prog;
+ u32 act;
+
+ rcu_read_lock();
+ xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+
+ if (!xdp_prog)
+ goto xdp_out;
+
+ act = bpf_prog_run_xdp(xdp_prog, xdp);
+ switch (act) {
+ case XDP_PASS:
+ break;
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ case XDP_TX:
+ case XDP_ABORTED:
+ trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+ /* fallthrough -- handle aborts by dropping packet */
+ case XDP_DROP:
+ result = IXGBE_XDP_CONSUMED;
+ break;
+ }
+xdp_out:
+ rcu_read_unlock();
+ return ERR_PTR(-result);
+}
+
/**
* ixgbe_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
* @q_vector: structure containing interrupt and ring information
@@ -2184,6 +2232,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
union ixgbe_adv_rx_desc *rx_desc;
struct ixgbe_rx_buffer *rx_buffer;
struct sk_buff *skb;
+ struct xdp_buff xdp;
unsigned int size;
/* return some buffers to hardware, one at a time is too slow */
@@ -2206,14 +2255,29 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
/* retrieve a buffer from the ring */
- if (skb)
+ if (!skb) {
+ xdp.data = page_address(rx_buffer->page) +
+ rx_buffer->page_offset;
+ xdp.data_hard_start = xdp.data -
+ ixgbe_rx_offset(rx_ring);
+ xdp.data_end = xdp.data + size;
+
+ skb = ixgbe_run_xdp(rx_ring, &xdp);
+ }
+
+ if (IS_ERR(skb)) {
+ total_rx_packets++;
+ total_rx_bytes += size;
+ rx_buffer->pagecnt_bias++;
+ } else if (skb) {
ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
- else if (ring_uses_build_skb(rx_ring))
+ } else if (ring_uses_build_skb(rx_ring)) {
skb = ixgbe_build_skb(rx_ring, rx_buffer,
- rx_desc, size);
- else
+ &xdp, rx_desc);
+ } else {
skb = ixgbe_construct_skb(rx_ring, rx_buffer,
- rx_desc, size);
+ &xdp, rx_desc);
+ }
/* exit if we failed to retrieve a buffer */
if (!skb) {
@@ -6072,7 +6136,8 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
*
* Returns 0 on success, negative on failure
**/
-int ixgbe_setup_rx_resources(struct ixgbe_ring *rx_ring)
+int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
+ struct ixgbe_ring *rx_ring)
{
struct device *dev = rx_ring->dev;
int orig_node = dev_to_node(dev);
@@ -6109,6 +6174,8 @@ int ixgbe_setup_rx_resources(struct ixgbe_ring *rx_ring)
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
+ rx_ring->xdp_prog = adapter->xdp_prog;
+
return 0;
err:
vfree(rx_ring->rx_buffer_info);
@@ -6132,7 +6199,7 @@ static int ixgbe_setup_all_rx_resources(struct ixgbe_adapter *adapter)
int i, err = 0;
for (i = 0; i < adapter->num_rx_queues; i++) {
- err = ixgbe_setup_rx_resources(adapter->rx_ring[i]);
+ err = ixgbe_setup_rx_resources(adapter, adapter->rx_ring[i]);
if (!err)
continue;
@@ -6200,6 +6267,7 @@ void ixgbe_free_rx_resources(struct ixgbe_ring *rx_ring)
{
ixgbe_clean_rx_ring(rx_ring);
+ rx_ring->xdp_prog = NULL;
vfree(rx_ring->rx_buffer_info);
rx_ring->rx_buffer_info = NULL;
@@ -9468,6 +9536,54 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
return features;
}
+static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
+{
+ int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+ struct ixgbe_adapter *adapter = netdev_priv(dev);
+ struct bpf_prog *old_prog;
+
+ if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+ return -EINVAL;
+
+ if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
+ return -EINVAL;
+
+ /* verify ixgbe ring attributes are sufficient for XDP */
+ for (i = 0; i < adapter->num_rx_queues; i++) {
+ struct ixgbe_ring *ring = adapter->rx_ring[i];
+
+ if (ring_is_rsc_enabled(ring))
+ return -EINVAL;
+
+ if (frame_size > ixgbe_rx_bufsz(ring))
+ return -EINVAL;
+ }
+
+ old_prog = xchg(&adapter->xdp_prog, prog);
+ for (i = 0; i < adapter->num_rx_queues; i++)
+ xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
+
+ if (old_prog)
+ bpf_prog_put(old_prog);
+
+ return 0;
+}
+
+static int ixgbe_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+ struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+ switch (xdp->command) {
+ case XDP_SETUP_PROG:
+ return ixgbe_xdp_setup(dev, xdp->prog);
+ case XDP_QUERY_PROG:
+ xdp->prog_attached = !!(adapter->xdp_prog);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_open = ixgbe_open,
.ndo_stop = ixgbe_close,
@@ -9513,6 +9629,7 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
.ndo_udp_tunnel_add = ixgbe_add_udp_tunnel_port,
.ndo_udp_tunnel_del = ixgbe_del_udp_tunnel_port,
.ndo_features_check = ixgbe_features_check,
+ .ndo_xdp = ixgbe_xdp,
};
/**
^ permalink raw reply related
* [PATCH 0/2] ixgbe updates
From: John Fastabend @ 2017-04-24 1:30 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev
Hi Jeff, Here are two patch updates for your tree. I merged in Alex's
build fix and Jakub's comments. Lightly tested xdp1 and xdp2.
---
John Fastabend (2):
ixgbe: add XDP support for pass and drop actions
ixgbe: add support for XDP_TX action
0 files changed
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox