Netdev List
 help / color / mirror / Atom feed
* [PATCH] net: stmmac: constify clk_div_table
From: Arvind Yadav @ 2017-08-28  5:52 UTC (permalink / raw)
  To: khilman, carlo, alexandre.torgue, peppe.cavallaro, davem
  Cc: linux-kernel, linux-amlogic, linux-arm-kernel, netdev

clk_div_table are not supposed to change at runtime.
meson8b_dwmac structure is working with const clk_div_table.
So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 9685555..4404650b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -89,7 +89,7 @@ static int meson8b_init_clk(struct meson8b_dwmac *dwmac)
 	char clk_name[32];
 	const char *clk_div_parents[1];
 	const char *mux_parent_names[MUX_CLK_NUM_PARENTS];
-	static struct clk_div_table clk_25m_div_table[] = {
+	static const struct clk_div_table clk_25m_div_table[] = {
 		{ .val = 0, .div = 5 },
 		{ .val = 1, .div = 10 },
 		{ /* sentinel */ },
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next v7 05/10] landlock: Add LSM hooks related to filesystem
From: Alexei Starovoitov @ 2017-08-28  5:26 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf <tgr
In-Reply-To: <3325bd7d-f3d8-2f51-384c-b5e8cee5cb91@digikod.net>

On Sun, Aug 27, 2017 at 03:31:35PM +0200, Mickaël Salaün wrote:
> 
> > How can you add 3rd argument? All FS events would have to get it,
> > but in some LSM hooks such argument will be meaningless, whereas
> > in other places it will carry useful info that rule can operate on.
> > Would that mean that we'll have FS_3 event type and only few LSM
> > hooks will be converted to it. That works, but then we'll lose
> > compatiblity with old rules written for FS event and that given hook.
> > Otherwise we'd need to have fancy logic to accept old FS event
> > into FS_3 LSM hook.
> 
> If we want to add a third argument to the FS event, then it will become
> accessible because its type will be different than NOT_INIT. This keep
> the compatibility with old rules because this new field was then denied.
> 
> If we want to add a new argument but only for a subset of the hooks used
> by the FS event, then we need to create a new event, like FS_FCNTL. For
> example, we may want to add a FS_RENAME event to be able to tie the
> source file and the destination file of a rename call.

that's exactly my point. To add another argument FS event
to a subset of hooks will require either new FS_FOO and
to be backwards compatible these hooks will call _both_ FS and FS_FOO
or some magic logic on kernel side that will allow old FS rules
to be attached to FS_FOO hooks?
Two calls doesn't scale and if we do 'magic logic' can we do it now
and avoid introducing events altogether?
Like all landlock programs can be landlock type and they would need
to declare what arg1, arg2, argN they expect. Then at attach
time the kernel only needs to verify that hook arg types match
what program requested.

> Anyway, I added the subtype/ABI version as a safeguard in case of
> unexpected future evolution.

I don't think that abi/version field adds anything in this context.
I still think it should simply be removed.

^ permalink raw reply

* Re: [PATCH net-next v7 04/10] bpf: Define handle_fs and add a new helper bpf_handle_fs_get_mode()
From: James Morris @ 2017-08-28  4:09 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf
In-Reply-To: <20170821000933.13024-5-mic@digikod.net>

[-- Attachment #1: Type: text/plain, Size: 348 bytes --]

On Mon, 21 Aug 2017, Mickaël Salaün wrote:

> @@ -85,6 +90,8 @@ enum bpf_arg_type {
>  
>  	ARG_PTR_TO_CTX,		/* pointer to context */
>  	ARG_ANYTHING,		/* any (initialized) argument is ok */
> +
> +	ARG_CONST_PTR_TO_HANDLE_FS,	/* pointer to an abstract FS struct */
>  };

Looks like a spurious empty line.

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier
From: James Morris @ 2017-08-28  3:48 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Alexei Starovoitov, linux-kernel, Alexei Starovoitov,
	Andy Lutomirski, Arnaldo Carvalho de Melo, Casey Schaufler,
	Daniel Borkmann, David Drysdale, David S . Miller,
	Eric W . Biederman, James Morris, Jann Horn, Jonathan Corbet,
	Matthew Garrett, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan
In-Reply-To: <607ceb21-5aa5-678b-4438-0d8dcb69fc3c@digikod.net>

[-- Attachment #1: Type: text/plain, Size: 1583 bytes --]

On Wed, 23 Aug 2017, Mickaël Salaün wrote:

> >> +	struct {
> >> +		__u32		abi; /* minimal ABI version, cf. user doc */
> > 
> > the concept of abi (version) sounds a bit weird to me.
> > Why bother with it at all?
> > Once the first set of patches lands the kernel as whole will have landlock feature
> > with a set of helpers, actions, event types.
> > Some future patches will extend the landlock feature step by step.
> > This abi concept assumes that anyone who adds new helper would need
> > to keep incrementing this 'abi'. What value does it give to user or to kernel?
> > The users will already know that landlock is present in kernel 4.14 or whatever
> > and the kernel 4.18 has more landlock features. Why bother with extra abi number?
> 
> That's right for helpers and context fields, but we can't check the use
> of one field's content. The status field is intended to be a bitfield
> extendable in the future. For example, one use case is to set a flag to
> inform the eBPF program that it was already called with the same context
> and can skip most of its check (if not related to maps). Same goes for
> the FS action bitfield, one may want to add more of them. Another
> example may be the check for abilities. We may want to relax/remove the
> capability require to set one of them. With an ABI version, the user can
> easily check if the current kernel support that.

Don't call it an ABI, perhaps minimum policy version (similar to 
what SELinux does).  Changes need to be made so that any existing 
userspace still works.



-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier
From: James Morris @ 2017-08-28  3:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Andy Lutomirski, Arnaldo Carvalho de Melo, Casey Schaufler,
	Daniel Borkmann, David Drysdale, David S . Miller,
	Eric W . Biederman, James Morris, Jann Horn, Jonathan Corbet,
	Matthew Garrett, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan
In-Reply-To: <20170823024452.zvizovwfd7xjucsx@ast-mbp>

On Tue, 22 Aug 2017, Alexei Starovoitov wrote:

> more general question: what is the status of security/ bits?
> I'm assuming they still need to be reviewed and explicitly acked by James, right?

Yep, along with other core security developers where possible.


-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: [PATCH net-next v7 00/10] Landlock LSM: Toward unprivileged sandboxing
From: James Morris @ 2017-08-28  3:38 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf
In-Reply-To: <20170821000933.13024-1-mic@digikod.net>

[-- Attachment #1: Type: text/plain, Size: 575 bytes --]

On Mon, 21 Aug 2017, Mickaël Salaün wrote:

> ## Why a new LSM? Are SELinux, AppArmor, Smack and Tomoyo not good enough?
> 
> The current access control LSMs are fine for their purpose which is to give the
> *root* the ability to enforce a security policy for the *system*. What is
> missing is a way to enforce a security policy for any application by its
> developer and *unprivileged user* as seccomp can do for raw syscall filtering.
> 

You could mention here that the first case is Mandatory Access Control, 
in general terms.



-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: Get ARP/ND tables from kernel
From: Waskiewicz Jr, Peter @ 2017-08-28  2:53 UTC (permalink / raw)
  To: Bassam Alsanie, netdev@vger.kernel.org
In-Reply-To: <CAJW__LkW10SWRmEKP_0YxjbVHjUYqJFPKsKn-52D1ucF+HH9uQ@mail.gmail.com>

On 8/27/17 9:25 PM, Bassam Alsanie wrote:
> Hello everyone,
> I looking into a good way (stable and compatible with large number of
> distros) to get the arp/nd cache from kernel to user space, for both
> IP4 and IP6.
> 
> It seem IOCTL (SIOCGARP) can't do that, you can only get MAC address
> from provided IP address. But IOCTL can't give the the full arp/nd
> table.
> The other option is the Netlink interface. I tried it and I got the
> ARP/ND table :).
> The third option is using /proc/net/arp, which only restricted to IP4.
> 
> There is command line utilities that I excluding in my case.
> 
> Is there another way to do it? what is the best way in my case?
> 
> Thank you all.

# strace arp -an
[...]
open("/proc/net/arp", O_RDONLY)         = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "IP address       HW type     Fla"..., 1024) = 310
[...]

# strace ip -6 neighbor show
[...]
socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1048576], 4) = 0
bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, nl_pid=30292, nl_groups=00000000}, 
[12]) = 0
sendto(3, {{len=40, type=RTM_GETLINK, flags=NLM_F_REQUEST|NLM_F_DUMP, 
seq=1503888680, pid=0}, 
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\35\0\1\0\0\0"}, 40, 0, NULL, 0) = 40
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, 
nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{{len=1268, 
type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680, pid=30292}, 
"\0\0\4\3\1\0\0\0I\0\1\0\0\0\0\0\7\0\3\0lo\0\0\10\0\r\0\350\3\0\0"...}, 
{{len=1280, type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680, 
pid=30292}, 
"\0\0\1\0\2\0\0\0C\20\1\0\0\0\0\0\t\0\3\0eno1\0\0\0\0\10\0\r\0"...}], 
iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 2548
[...]

Seems like it's pretty obvious if you don't want to use the existing 
tools, just look at how the existing tools get this data.  IPv4 uses 
/proc/net/arp, IPv6 uses netlink.

Cheers,
-PJ

^ permalink raw reply

* [PATCH net] ipv6: do not set sk_destruct in IPV6_ADDRFORM sockopt
From: Xin Long @ 2017-08-28  2:45 UTC (permalink / raw)
  To: network dev; +Cc: davem, pabeni, chunwang, syzkaller

ChunYu found a kernel warn_on during syzkaller fuzzing:

[40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
[40226.144849] Call Trace:
[40226.147590]  <IRQ>
[40226.149859]  dump_stack+0xe2/0x186
[40226.176546]  __warn+0x1a4/0x1e0
[40226.180066]  warn_slowpath_null+0x31/0x40
[40226.184555]  inet_sock_destruct+0x78d/0x9a0
[40226.246355]  __sk_destruct+0xfa/0x8c0
[40226.290612]  rcu_process_callbacks+0xaa0/0x18a0
[40226.336816]  __do_softirq+0x241/0x75e
[40226.367758]  irq_exit+0x1f6/0x220
[40226.371458]  smp_apic_timer_interrupt+0x7b/0xa0
[40226.376507]  apic_timer_interrupt+0x93/0xa0

The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
As after commit f970bd9e3a06 ("udp: implement memory accounting helpers"),
udp has changed to use udp_destruct_sock as sk_destruct where it would
udp_rmem_release all rmem.

But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after
changing family to PF_INET. If rmem is not 0 at that time, and there is
no place to release rmem before calling inet_sock_destruct, the warn_on
will be triggered.

This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt
any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has
already set it's sk_destruct with inet_sock_destruct and UDP has set with
udp_destruct_sock since they're created.

Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers")
Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv6/ipv6_sockglue.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 02d795f..a5e466d 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -242,7 +242,6 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 			pktopt = xchg(&np->pktoptions, NULL);
 			kfree_skb(pktopt);
 
-			sk->sk_destruct = inet_sock_destruct;
 			/*
 			 * ... and add it to the refcnt debug socks count
 			 * in the new family. -acme
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH RFC WIP 0/5] IGMP snooping for local traffic
From: Florian Fainelli @ 2017-08-28  2:44 UTC (permalink / raw)
  To: Andrew Lunn, netdev; +Cc: Vivien Didelot, nikolay, roopa, bridge, jiri
In-Reply-To: <1503780970-10312-1-git-send-email-andrew@lunn.ch>

Hi Andrew,

On 08/26/2017 01:56 PM, Andrew Lunn wrote:
> This is a WIP patchset i would like comments on from bridge,
> switchdev and hardware offload people.
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP 
> reports on bridge ports and keep track of which groups have been 
> joined on an interface. It will then forward multicast based on this 
> group membership.
> 
> When the bridge adds or removed groups from an interface, it uses 
> switchdev to request the hardware add an mdb to a port, so the 
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves 
> from the host on the brX interface. No such monitoring is performed.
> With a pure software bridge, it is not required. All mulitcast frames
> are passed to the brX interface, and the network stack filters them,
> as it does for any interface. However, when hardware offload is
> involved, things change. We should program the hardware to only send
> multcast packets to the host when the host has in interest in them.

OK, so if I understand this right, without a bridge, we have the
following happen today: with a DSA-enabled setup using any kind of
switch tagging protocol, if a host is interested in receiving particular
multicast traffic, we would receive IGMP joins/leaves through sw0p0, and
the stack should call ndo_set_rx_mode for sw0p0, which would be
dsa_slave_set_rx_mode() and which would synchronize the DSA master
network device with the slave network device, everything works fine
provided that the CPU port is configured to accept multicast traffic.

Note here that we don't really add a MDB entry for sw0p0 when that
happens, but it seems like we should for switches that lack IGMP
snooping and/or multicast filtering.

With the current bridge and DSA code, are not we actually always going
to get the CPU port to be added with the multicast address and therefore
no filtering is occurring and snooping is pretty much useless?

> 
> Thus we need to perform IGMP snooping on the brX interface, just
> like any other interface of the bridge. However, currently the brX 
> interface is missing all the needed data structures to do this.
> There is no net_bridge_port structure for the brX interface. This
> strucuture is created when an interface is added to the bridge. But
> the brX interface is not a member of the bridge. So this patchset
> makes the brX interface a first class member of the bridge. When the
> brX interface is opened, the interface is added to the bridge. A 
> net_bridge_port is allocated for it, and IGMP snooping is performed
> as usual.

Would not making brX be part of the bridge have a huge negative
performance impact on locally generated traffic either? Even though we
do an early return in br_handle_frame() this may become noticeable.

> 
> There are some complexities here. Some assumptions are broken, like 
> the master interface of a port interface is the bridge interface.
> The brX interface cannot be its own master. The use of 
> netdev_master_upper_dev_get() within the bridge code has been
> changed to reflecit this. The bridge receive handler needs to not
> process frames for the brX interface, etc.
> 
> The interface downward to the hardware is also an issue. The code 
> presented here is a hack and needs to change. But that is secondary 
> and can be solved once it is agreed how the bridge needs to change
> to support this use case.
> 
> Comment welcome and wanted.

While I understand the reasons why you did it that way, I think this is
going to break a lot of code in bridge that does not expect brX to be a
bridge port member.

Maybe we can just generate switch MDB events targeting the bridge
network device and let switch drivers resolve that to whatever their
CPU/master port is?

It does sound like we are moving more and more to a model where brX
becomes one (if not the only one) net_device representor of what the
CPU/master port of a switch is (at least with DSA) which sort of makes
us go back to the multi-CPU port discussion we had a while ago.

Thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next] bridge: fdb add and delete tracepoints
From: Florian Fainelli @ 2017-08-28  2:11 UTC (permalink / raw)
  To: Roopa Prabhu, davem; +Cc: netdev, nikolay, bridge, Andrew Lunn
In-Reply-To: <1503869593-30165-1-git-send-email-roopa@cumulusnetworks.com>

On 08/27/2017 02:33 PM, Roopa Prabhu wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> Tracepoints to trace bridge forwarding database updates.

Thanks for adding this!

> 
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---
>  include/trace/events/bridge.h | 98 +++++++++++++++++++++++++++++++++++++++++++
>  net/bridge/br_fdb.c           |  7 ++++
>  net/core/net-traces.c         |  6 +++
>  3 files changed, 111 insertions(+)
>  create mode 100644 include/trace/events/bridge.h
> 
> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
> new file mode 100644
> index 0000000..e2d52cf
> --- /dev/null
> +++ b/include/trace/events/bridge.h
> @@ -0,0 +1,98 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM bridge
> +
> +#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_BRIDGE_H
> +
> +#include <linux/netdevice.h>
> +#include <linux/tracepoint.h>
> +
> +#include "../../../net/bridge/br_private.h"
> +
> +TRACE_EVENT(br_fdb_add,
> +
> +	TP_PROTO(struct ndmsg *ndm, struct net_device *dev,
> +		 const unsigned char *addr, u16 vid, u16 nlh_flags),
> +
> +	TP_ARGS(ndm, dev, addr, vid, nlh_flags),
> +
> +	TP_STRUCT__entry(
> +		__field(u8, ndm_flags)
> +		__string(dev, dev->name)
> +		__array(unsigned char, addr, 6)

Can you use ETH_ALEN instead of 6 here?

> +		__field(u16, vid)
> +		__field(u16, nlh_flags)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev, dev->name);
> +		memcpy(__entry->addr, addr, 6);

Likewise

> +		__entry->vid = vid;
> +		__entry->nlh_flags = nlh_flags;
> +		__entry->ndm_flags = ndm->ndm_flags;
> +	),
> +
> +	TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags %x ndm_flags = %x",

I wonder if we could make %pM work for TP_printk() as this would
simplify the argument list a bitt. Can you use %04x for vid, nlh_flags
and %02x for ndm_flags?

> +		  __get_str(dev), __entry->addr[0], __entry->addr[1],
> +		  __entry->addr[2], __entry->addr[3], __entry->addr[4],
> +		  __entry->addr[5], __entry->vid,
> +		  __entry->nlh_flags, __entry->ndm_flags)
> +);
> +
> +TRACE_EVENT(br_fdb_external_learn_add,
> +
> +	TP_PROTO(struct net_bridge *br, struct net_bridge_port *p,
> +		 const unsigned char *addr, u16 vid),
> +
> +	TP_ARGS(br, p, addr, vid),
> +
> +	TP_STRUCT__entry(
> +		__string(br_dev, br->dev->name)
> +		__string(dev, p->dev->name)
> +		__array(unsigned char, addr, 6)
> +		__field(u16, vid)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(br_dev, br ? br->dev->name : "null");
> +		__assign_str(dev, p ? p->dev->name : "null");
> +		memcpy(__entry->addr, addr, 6);
> +		__entry->vid = vid;
> +	),
> +
> +	TP_printk("br_dev %s port %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
> +		  __get_str(br_dev), __get_str(dev), __entry->addr[0],
> +		  __entry->addr[1], __entry->addr[2], __entry->addr[3],
> +		  __entry->addr[4], __entry->addr[5], __entry->vid)
> +);
> +
> +TRACE_EVENT(fdb_delete,
> +
> +	TP_PROTO(struct net_bridge *br, struct net_bridge_fdb_entry *f),
> +
> +	TP_ARGS(br, f),
> +
> +	TP_STRUCT__entry(
> +		__string(br_dev, br->dev->name)
> +		__string(dev, f->dst ? f->dst->dev->name : "null")
> +		__array(unsigned char, addr, 6)

Same here, using ETH_ALEN would be clearer.

> +		__field(u16, vid)
> +	),
> +

Thanks!
-- 
Florian

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the rockchip tree
From: Stephen Rothwell @ 2017-08-28  1:33 UTC (permalink / raw)
  To: David Miller, Networking, Heiko Stuebner
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List, David Wu,
	Liang Chen

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  arch/arm64/boot/dts/rockchip/rk3328-evb.dts

between commit:

  0e54e062692a ("arm64: dts: rockchip: add mmc nodes for rk3328 evaluation board")
  57fca160b2be ("arm64: dts: rockchip: add cpu regulator for rk3328 evaluation board")

from the rockchip tree and commit:

  4b05bc6157eb ("ARM64: dts: rockchip: Enable gmac2phy for rk3328-evb")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm64/boot/dts/rockchip/rk3328-evb.dts
index f82b2d0d9e86,b9f36dad17e6..000000000000
--- a/arch/arm64/boot/dts/rockchip/rk3328-evb.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-evb.dts
@@@ -51,217 -51,24 +51,234 @@@
  		stdout-path = "serial2:1500000n8";
  	};
  
 +	dc_12v: dc-12v {
 +		compatible = "regulator-fixed";
 +		regulator-name = "dc_12v";
 +		regulator-always-on;
 +		regulator-boot-on;
 +		regulator-min-microvolt = <12000000>;
 +		regulator-max-microvolt = <12000000>;
 +	};
 +
 +	sdio_pwrseq: sdio-pwrseq {
 +		compatible = "mmc-pwrseq-simple";
 +		pinctrl-names = "default";
 +		pinctrl-0 = <&wifi_enable_h>;
 +
 +		/*
 +		 * On the module itself this is one of these (depending
 +		 * on the actual card populated):
 +		 * - SDIO_RESET_L_WL_REG_ON
 +		 * - PDN (power down when low)
 +		 */
 +		reset-gpios = <&gpio1 18 GPIO_ACTIVE_LOW>;
 +	};
 +
+ 	vcc_phy: vcc-phy-regulator {
+ 		compatible = "regulator-fixed";
+ 		regulator-name = "vcc_phy";
+ 		regulator-always-on;
+ 		regulator-boot-on;
+ 	};
++
 +	vcc_sys: vcc-sys {
 +		compatible = "regulator-fixed";
 +		regulator-name = "vcc_sys";
 +		regulator-always-on;
 +		regulator-boot-on;
 +		regulator-min-microvolt = <5000000>;
 +		regulator-max-microvolt = <5000000>;
 +		vin-supply = <&dc_12v>;
 +	};
 +
 +	vcc_sd: sdmmc-regulator {
 +		compatible = "regulator-fixed";
 +		gpio = <&gpio0 30 GPIO_ACTIVE_LOW>;
 +		pinctrl-names = "default";
 +		pinctrl-0 = <&sdmmc0m1_gpio>;
 +		regulator-name = "vcc_sd";
 +		regulator-min-microvolt = <3300000>;
 +		regulator-max-microvolt = <3300000>;
 +		vin-supply = <&vcc_io>;
 +	};
 +};
 +
 +&cpu0 {
 +	cpu-supply = <&vdd_arm>;
 +};
 +
 +&emmc {
 +	bus-width = <8>;
 +	cap-mmc-highspeed;
 +	non-removable;
 +	pinctrl-names = "default";
 +	pinctrl-0 = <&emmc_clk &emmc_cmd &emmc_bus8>;
 +	status = "okay";
  };
  
+ &gmac2phy {
+ 	phy-supply = <&vcc_phy>;
+ 	clock_in_out = "output";
+ 	assigned-clocks = <&cru SCLK_MAC2PHY_SRC>;
+ 	assigned-clock-rate = <50000000>;
+ 	assigned-clocks = <&cru SCLK_MAC2PHY>;
+ 	assigned-clock-parents = <&cru SCLK_MAC2PHY_SRC>;
+ 	status = "okay";
+ };
+ 
 +&i2c1 {
 +	status = "okay";
 +
 +	rk805: rk805@18 {
 +		compatible = "rockchip,rk805";
 +		reg = <0x18>;
 +		interrupt-parent = <&gpio2>;
 +		interrupts = <6 IRQ_TYPE_LEVEL_LOW>;
 +		#clock-cells = <1>;
 +		clock-output-names = "xin32k", "rk805-clkout2";
 +		gpio-controller;
 +		#gpio-cells = <2>;
 +		pinctrl-names = "default";
 +		pinctrl-0 = <&pmic_int_l>;
 +		rockchip,system-power-controller;
 +		wakeup-source;
 +
 +		vcc1-supply = <&vcc_sys>;
 +		vcc2-supply = <&vcc_sys>;
 +		vcc3-supply = <&vcc_sys>;
 +		vcc4-supply = <&vcc_sys>;
 +		vcc5-supply = <&vcc_io>;
 +		vcc6-supply = <&vcc_io>;
 +
 +		regulators {
 +			vdd_logic: DCDC_REG1 {
 +				regulator-name = "vdd_logic";
 +				regulator-min-microvolt = <712500>;
 +				regulator-max-microvolt = <1450000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1000000>;
 +				};
 +			};
 +
 +			vdd_arm: DCDC_REG2 {
 +				regulator-name = "vdd_arm";
 +				regulator-min-microvolt = <712500>;
 +				regulator-max-microvolt = <1450000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <950000>;
 +				};
 +			};
 +
 +			vcc_ddr: DCDC_REG3 {
 +				regulator-name = "vcc_ddr";
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +				};
 +			};
 +
 +			vcc_io: DCDC_REG4 {
 +				regulator-name = "vcc_io";
 +				regulator-min-microvolt = <3300000>;
 +				regulator-max-microvolt = <3300000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <3300000>;
 +				};
 +			};
 +
 +			vcc_18: LDO_REG1 {
 +				regulator-name = "vcc_18";
 +				regulator-min-microvolt = <1800000>;
 +				regulator-max-microvolt = <1800000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1800000>;
 +				};
 +			};
 +
 +			vcc18_emmc: LDO_REG2 {
 +				regulator-name = "vcc18_emmc";
 +				regulator-min-microvolt = <1800000>;
 +				regulator-max-microvolt = <1800000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1800000>;
 +				};
 +			};
 +
 +			vdd_10: LDO_REG3 {
 +				regulator-name = "vdd_10";
 +				regulator-min-microvolt = <1000000>;
 +				regulator-max-microvolt = <1000000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1000000>;
 +				};
 +			};
 +		};
 +	};
 +};
 +
 +&pinctrl {
 +	pmic {
 +		pmic_int_l: pmic-int-l {
 +			rockchip,pins = <2 RK_PA6 RK_FUNC_GPIO &pcfg_pull_up>;
 +		};
 +	};
 +
 +	sdio-pwrseq {
 +		wifi_enable_h: wifi-enable-h {
 +		rockchip,pins =
 +			<1 18 RK_FUNC_GPIO &pcfg_pull_none>;
 +		};
 +	};
 +};
 +
 +&sdio {
 +	bus-width = <4>;
 +	cap-sd-highspeed;
 +	cap-sdio-irq;
 +	keep-power-in-suspend;
 +	max-frequency = <150000000>;
 +	mmc-pwrseq = <&sdio_pwrseq>;
 +	non-removable;
 +	pinctrl-names = "default";
 +	pinctrl-0 = <&sdmmc1_bus4 &sdmmc1_cmd &sdmmc1_clk>;
 +	status = "okay";
 +};
 +
 +&sdmmc {
 +	bus-width = <4>;
 +	cap-mmc-highspeed;
 +	cap-sd-highspeed;
 +	disable-wp;
 +	max-frequency = <150000000>;
 +	pinctrl-names = "default";
 +	pinctrl-0 = <&sdmmc0_clk &sdmmc0_cmd &sdmmc0_dectn &sdmmc0_bus4>;
 +	vmmc-supply = <&vcc_sd>;
 +	status = "okay";
 +};
 +
 +&tsadc {
 +	status = "okay";
 +};
 +
  &uart2 {
  	status = "okay";
  };

^ permalink raw reply

* Get ARP/ND tables from kernel
From: Bassam Alsanie @ 2017-08-28  1:25 UTC (permalink / raw)
  To: netdev

Hello everyone,
I looking into a good way (stable and compatible with large number of
distros) to get the arp/nd cache from kernel to user space, for both
IP4 and IP6.

It seem IOCTL (SIOCGARP) can't do that, you can only get MAC address
from provided IP address. But IOCTL can't give the the full arp/nd
table.
The other option is the Netlink interface. I tried it and I got the
ARP/ND table :).
The third option is using /proc/net/arp, which only restricted to IP4.

There is command line utilities that I excluding in my case.

Is there another way to do it? what is the best way in my case?

Thank you all.

^ permalink raw reply

* Re: [PATCH net-next 3/4] net/core: Add violation counters to VF statisctics
From: Jakub Kicinski @ 2017-08-28  0:43 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev, Eugenia Emantayev
In-Reply-To: <20170827110618.20599-4-saeedm@mellanox.com>

On Sun, 27 Aug 2017 14:06:17 +0300, Saeed Mahameed wrote:
> From: Eugenia Emantayev <eugenia@mellanox.com>
> 
> Add receive and transmit violation counters to be
> displayed in iproute2 VF statistics.
> 
> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  include/linux/if_link.h      |  2 ++
>  include/uapi/linux/if_link.h |  2 ++
>  net/core/rtnetlink.c         | 10 +++++++++-
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index da70af27e42e..ebf3448acb5b 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -12,6 +12,8 @@ struct ifla_vf_stats {
>  	__u64 tx_bytes;
>  	__u64 broadcast;
>  	__u64 multicast;
> +	__u64 rx_dropped;
> +	__u64 tx_dropped;

I'm a little concerned that you call those violation counters in the
commit message.  Do you expect them to only be used if the VF traffic
indeed violates some admin-set rules?  I would imaging HW/FW may drop
frames in certain situations and naming the counters *_dropped suggests
it would be OK to increment them even if the drop reason was not any
sort of violation.  Would you mind clarifying?

^ permalink raw reply

* Re: [PATCH net-next 1/4] net: Add SRIOV VGT+ support
From: Jakub Kicinski @ 2017-08-28  0:38 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Eugenia Emantayev, Mohamad Haj Yahia
In-Reply-To: <20170827110618.20599-2-saeedm@mellanox.com>

On Sun, 27 Aug 2017 14:06:15 +0300, Saeed Mahameed wrote:
> From: Mohamad Haj Yahia <mohamad@mellanox.com>
> 
> VGT+ is a security feature that gives the administrator the ability of
> controlling the allowed vlan-ids list that can be transmitted/received
> from/to the VF.
> The allowed vlan-ids list is called "trunk".
> Admin can add/remove a range of allowed vlan-ids via iptool.
> Example:
> After this series of configuration :
> 1) ip link set eth3 vf 0 trunk add 10 100 (allow vlan-id 10-100, default tpid 0x8100)
> 2) ip link set eth3 vf 0 trunk add 105 proto 802.1q (allow vlan-id 105 tpid 0x8100)
> 3) ip link set eth3 vf 0 trunk add 105 proto 802.1ad (allow vlan-id 105 tpid 0x88a8)
> 4) ip link set eth3 vf 0 trunk rem 90 (block vlan-id 90)
> 5) ip link set eth3 vf 0 trunk rem 50 60 (block vlan-ids 50-60)
> 
> The VF 0 can only communicate on vlan-ids: 10-49,61-89,91-100,105 with
> tpid 0x8100 and vlan-id 105 with tpid 0x88a8.
> 
> For this purpose we added the following netlink sr-iov commands:
> 
> 1) IFLA_VF_VLAN_RANGE: used to add/remove allowed vlan-ids range.
> We added the ifla_vf_vlan_range struct to specify the range we want to
> add/remove from the userspace.
> We added ndo_add_vf_vlan_trunk_range and ndo_del_vf_vlan_trunk_range
> netdev ops to add/remove allowed vlan-ids range in the netdev.
> 
> 2) IFLA_VF_VLAN_TRUNK: used to query the allowed vlan-ids trunk.
> We added trunk bitmap to the ifla_vf_info struct to get the current
> allowed vlan-ids trunk from the netdev.
> We added ifla_vf_vlan_trunk struct for sending the allowed vlan-ids
> trunk to the userspace.
> 
> Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Interesting work, I have some minor questions if you don't mind :)

I was under impression that "trunk" is a vendor-specific term, would it
make sense to drop it from the APIs?

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 8d062c58d5cb..3aa895c5fbc1 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -168,6 +168,8 @@ enum {
>  #ifndef __KERNEL__
>  #define IFLA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
>  #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
> +#define BITS_PER_BYTE 8
> +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
>  #endif
>  
>  enum {
> @@ -645,6 +647,8 @@ enum {
>  	IFLA_VF_IB_NODE_GUID,	/* VF Infiniband node GUID */
>  	IFLA_VF_IB_PORT_GUID,	/* VF Infiniband port GUID */
>  	IFLA_VF_VLAN_LIST,	/* nested list of vlans, option for QinQ */
> +	IFLA_VF_VLAN_RANGE,	/* add/delete vlan range filtering */
> +	IFLA_VF_VLAN_TRUNK,	/* vlan trunk filtering */
>  	__IFLA_VF_MAX,
>  };
>  
> @@ -669,6 +673,7 @@ enum {
>  
>  #define IFLA_VF_VLAN_INFO_MAX (__IFLA_VF_VLAN_INFO_MAX - 1)
>  #define MAX_VLAN_LIST_LEN 1
> +#define VF_VLAN_N_VID 4096
>  
>  struct ifla_vf_vlan_info {
>  	__u32 vf;
> @@ -677,6 +682,21 @@ struct ifla_vf_vlan_info {
>  	__be16 vlan_proto; /* VLAN protocol either 802.1Q or 802.1ad */
>  };
>  
> +struct ifla_vf_vlan_range {
> +	__u32 vf;
> +	__u32 start_vid;   /* 1 - 4095 */
> +	__u32 end_vid;     /* 1 - 4095 */
> +	__u32 setting;
> +	__be16 vlan_proto; /* VLAN protocol either 802.1Q or 802.1ad */
> +};
> +
> +#define VF_VLAN_BITMAP	DIV_ROUND_UP(VF_VLAN_N_VID, sizeof(__u64) * BITS_PER_BYTE)
> +struct ifla_vf_vlan_trunk {
> +	__u32 vf;
> +	__u64 allowed_vlans_8021q_bm[VF_VLAN_BITMAP];
> +	__u64 allowed_vlans_8021ad_bm[VF_VLAN_BITMAP];
> +};

Would you mind explaining why you chose to make the API asymmetrical
like that?  I mean the set operation is range-based, yet the get
returns a bitmask.  You seem to solely depend on the bitmasks in the
driver anyway...

>  struct ifla_vf_tx_rate {
>  	__u32 vf;
>  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a78fd61da0ec..56909f11d88e 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -827,6 +827,7 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
>  			 nla_total_size(MAX_VLAN_LIST_LEN *
>  					sizeof(struct ifla_vf_vlan_info)) +
>  			 nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
> +			 nla_total_size(sizeof(struct ifla_vf_vlan_trunk)) +
>  			 nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
>  			 nla_total_size(sizeof(struct ifla_vf_rate)) +
>  			 nla_total_size(sizeof(struct ifla_vf_link_state)) +
> @@ -1098,31 +1099,43 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
>  	struct ifla_vf_link_state vf_linkstate;
>  	struct ifla_vf_vlan_info vf_vlan_info;
>  	struct ifla_vf_spoofchk vf_spoofchk;
> +	struct ifla_vf_vlan_trunk *vf_trunk;
>  	struct ifla_vf_tx_rate vf_tx_rate;
>  	struct ifla_vf_stats vf_stats;
>  	struct ifla_vf_trust vf_trust;
>  	struct ifla_vf_vlan vf_vlan;
>  	struct ifla_vf_rate vf_rate;
>  	struct ifla_vf_mac vf_mac;
> -	struct ifla_vf_info ivi;
> +	struct ifla_vf_info *ivi;
>  
> -	memset(&ivi, 0, sizeof(ivi));
> +	ivi = kzalloc(sizeof(*ivi), GFP_KERNEL);
> +	if (!ivi)
> +		return -ENOMEM;

In the future please try to split code adjustments like allocating ivi
here into a separate patch.  Makes the changes a little more obvious to
read.

^ permalink raw reply

* [net-next 15/15] i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate
From: Jeff Kirsher @ 2017-08-28  0:16 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

The dynamic ITR algorithm depends on a calculation of usecs which
assumes that the interrupts have been firing constantly at the interrupt
throttle rate. This is not guaranteed because we could have a low packet
rate, or have been polling in software.

We'll estimate whether this is the case by using jiffies to determine if
we've been too long. If the time difference of jiffies is larger we are
guaranteed to have an incorrect calculation. If the time difference of
jiffies is smaller we might have been polling some but the difference
shouldn't affect the calculation too much.

This ensures that we don't get stuck in BULK latency during certain rare
situations where we receive bursts of packets that force us into NAPI
polling.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +++++++++++++++++-----
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 22 +++++++++++++++++-----
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  1 +
 4 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f00f233092e9..1519dfb851d0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -961,11 +961,25 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	enum i40e_latency_range new_latency_range = rc->latency_range;
 	u32 new_itr = rc->itr;
 	int bytes_per_int;
-	int usecs;
+	unsigned int usecs, estimated_usecs;
 
 	if (rc->total_packets == 0 || !rc->itr)
 		return false;
 
+	usecs = (rc->itr << 1) * ITR_COUNTDOWN_START;
+	bytes_per_int = rc->total_bytes / usecs;
+
+	/* The calculations in this algorithm depend on interrupts actually
+	 * firing at the ITR rate. This may not happen if the packet rate is
+	 * really low, or if we've been napi polling. Check to make sure
+	 * that's not the case before we continue.
+	 */
+	estimated_usecs = jiffies_to_usecs(jiffies - rc->last_itr_update);
+	if (estimated_usecs > usecs) {
+		new_latency_range = I40E_LOW_LATENCY;
+		goto reset_latency;
+	}
+
 	/* simple throttlerate management
 	 *   0-10MB/s   lowest (50000 ints/s)
 	 *  10-20MB/s   low    (20000 ints/s)
@@ -977,9 +991,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	 * are in 2 usec increments in the ITR registers, and make sure
 	 * to use the smoothed values that the countdown timer gives us.
 	 */
-	usecs = (rc->itr << 1) * ITR_COUNTDOWN_START;
-	bytes_per_int = rc->total_bytes / usecs;
-
 	switch (new_latency_range) {
 	case I40E_LOWEST_LATENCY:
 		if (bytes_per_int > 10)
@@ -998,6 +1009,7 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 		break;
 	}
 
+reset_latency:
 	rc->latency_range = new_latency_range;
 
 	switch (new_latency_range) {
@@ -1016,12 +1028,12 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 
 	rc->total_bytes = 0;
 	rc->total_packets = 0;
+	rc->last_itr_update = jiffies;
 
 	if (new_itr != rc->itr) {
 		rc->itr = new_itr;
 		return true;
 	}
-
 	return false;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index e6456e8a899c..2f848bc5e391 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -461,6 +461,7 @@ struct i40e_ring_container {
 	struct i40e_ring *ring;
 	unsigned int total_bytes;	/* total bytes processed this int */
 	unsigned int total_packets;	/* total packets processed this int */
+	unsigned long last_itr_update;	/* jiffies of last ITR update */
 	u16 count;
 	enum i40e_latency_range latency_range;
 	u16 itr;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 2f7d9f4a6746..c32c62462c84 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -359,11 +359,25 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	enum i40e_latency_range new_latency_range = rc->latency_range;
 	u32 new_itr = rc->itr;
 	int bytes_per_int;
-	int usecs;
+	unsigned int usecs, estimated_usecs;
 
 	if (rc->total_packets == 0 || !rc->itr)
 		return false;
 
+	usecs = (rc->itr << 1) * ITR_COUNTDOWN_START;
+	bytes_per_int = rc->total_bytes / usecs;
+
+	/* The calculations in this algorithm depend on interrupts actually
+	 * firing at the ITR rate. This may not happen if the packet rate is
+	 * really low, or if we've been napi polling. Check to make sure
+	 * that's not the case before we continue.
+	 */
+	estimated_usecs = jiffies_to_usecs(jiffies - rc->last_itr_update);
+	if (estimated_usecs > usecs) {
+		new_latency_range = I40E_LOW_LATENCY;
+		goto reset_latency;
+	}
+
 	/* simple throttlerate management
 	 *   0-10MB/s   lowest (50000 ints/s)
 	 *  10-20MB/s   low    (20000 ints/s)
@@ -375,9 +389,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	 * are in 2 usec increments in the ITR registers, and make sure
 	 * to use the smoothed values that the countdown timer gives us.
 	 */
-	usecs = (rc->itr << 1) * ITR_COUNTDOWN_START;
-	bytes_per_int = rc->total_bytes / usecs;
-
 	switch (new_latency_range) {
 	case I40E_LOWEST_LATENCY:
 		if (bytes_per_int > 10)
@@ -396,6 +407,7 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 		break;
 	}
 
+reset_latency:
 	rc->latency_range = new_latency_range;
 
 	switch (new_latency_range) {
@@ -414,12 +426,12 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 
 	rc->total_bytes = 0;
 	rc->total_packets = 0;
+	rc->last_itr_update = jiffies;
 
 	if (new_itr != rc->itr) {
 		rc->itr = new_itr;
 		return true;
 	}
-
 	return false;
 }
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index fb506c9f5a6e..0d9f98bc07bd 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -432,6 +432,7 @@ struct i40e_ring_container {
 	struct i40e_ring *ring;
 	unsigned int total_bytes;	/* total bytes processed this int */
 	unsigned int total_packets;	/* total packets processed this int */
+	unsigned long last_itr_update;	/* jiffies of last ITR update */
 	u16 count;
 	enum i40e_latency_range latency_range;
 	u16 itr;
-- 
2.14.1

^ permalink raw reply related

* [net-next 12/15] i40e: initialize our affinity_mask based on cpu_possible_mask
From: Jeff Kirsher @ 2017-08-28  0:16 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

On older kernels a call to irq_set_affinity_hint does not guarantee that
the IRQ affinity will be set. If nothing else on the system sets the IRQ
affinity this can result in a bug in the i40e_napi_poll() routine where
we notice that our interrupt fired on the "wrong" CPU according to our
internal affinity_mask variable.

This results in a bug where we continuously tell NAPI to stop polling to
move the interrupt to a new CPU, but the CPU never changes because our
affinity mask does not match the actual mask setup for the IRQ.

The root problem is a mismatched affinity mask value. So lets initialize
the value to cpu_possible_mask instead. This ensures that prior to the
first time we get an IRQ affinity notification we'll have the mask set
to include every possible CPU.

We use cpu_possible_mask instead of cpu_online_mask since the former is
almost certainly never going to change, while the later might change
after we've made a copy.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 12 +++++++-----
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  7 +++++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 7366e7c7f399..6498da8806cb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2881,7 +2881,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 	if ((vsi->tc_config.numtc <= 1) &&
 	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
 		netif_set_xps_queue(ring->netdev,
-				    &ring->q_vector->affinity_mask,
+				    get_cpu_mask(ring->q_vector->v_idx),
 				    ring->queue_index);
 	}
 
@@ -3506,8 +3506,10 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
 		q_vector->affinity_notify.release = i40e_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* assign the mask for this irq */
-		irq_set_affinity_hint(irq_num, &q_vector->affinity_mask);
+		/* get_cpu_mask returns a static constant mask with
+		 * a permanent lifetime so it's ok to use here.
+		 */
+		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
 	}
 
 	vsi->irqs_ready = true;
@@ -4289,7 +4291,7 @@ static void i40e_vsi_free_irq(struct i40e_vsi *vsi)
 
 			/* clear the affinity notifier in the IRQ descriptor */
 			irq_set_affinity_notifier(irq_num, NULL);
-			/* clear the affinity_mask in the IRQ descriptor */
+			/* remove our suggested affinity mask for this IRQ */
 			irq_set_affinity_hint(irq_num, NULL);
 			synchronize_irq(irq_num);
 			free_irq(irq_num, vsi->q_vectors[i]);
@@ -8235,7 +8237,7 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx, int cpu)
 
 	q_vector->vsi = vsi;
 	q_vector->v_idx = v_idx;
-	cpumask_set_cpu(cpu, &q_vector->affinity_mask);
+	cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
 
 	if (vsi->netdev)
 		netif_napi_add(vsi->netdev, &q_vector->napi,
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 9ee277e87f10..1825d956bb00 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -584,8 +584,10 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 		q_vector->affinity_notify.release =
 						   i40evf_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* assign the mask for this irq */
-		irq_set_affinity_hint(irq_num, &q_vector->affinity_mask);
+		/* get_cpu_mask returns a static constant mask with
+		 * a permanent lifetime so it's ok to use here.
+		 */
+		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
 	}
 
 	return 0;
@@ -1456,6 +1458,7 @@ static int i40evf_alloc_q_vectors(struct i40evf_adapter *adapter)
 		q_vector->adapter = adapter;
 		q_vector->vsi = &adapter->vsi;
 		q_vector->v_idx = q_idx;
+		cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
 		netif_napi_add(adapter->netdev, &q_vector->napi,
 			       i40evf_napi_poll, NAPI_POLL_WEIGHT);
 	}
-- 
2.14.1

^ permalink raw reply related

* [net-next 14/15] i40e/i40evf: remove ULTRA latency mode
From: Jeff Kirsher @ 2017-08-28  0:16 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Since commit c56625d59726 ("i40e/i40evf: change dynamic interrupt
thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
was added with a cryptic comment about how it was meant for adjusting Rx
more aggressively when streaming small packets.

This mode was attempting to calculate packets per second and then kick
in when we have a huge number of small packets.

Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
intended for including single-thread UDP_STREAM workloads.

This wasn't caught for a variety of reasons. First, the ip_defrag
routines were improved somewhat which makes the UDP_STREAM test still
reasonable at 10GbE, even when dropped down to 8k interrupts a second.
Additionally, some other obvious workloads appear to work fine, such
as TCP_STREAM.

The number 40k doesn't make sense for a number of reasons. First, we
absolutely can do more than 40k packets per second. Second, we calculate
the value inline in an integer, which sometimes can overflow resulting
in using incorrect values.

If we fix this overflow it makes it even more likely that we'll enter
ULTRA mode which is the opposite of what we want.

The ULTRA mode was added originally as a way to reduce CPU utilization
during a small packet workload where we weren't keeping up anyways. It
should never have been kicking in during these other workloads.

Given the issues outlined above, let's remove the ULTRA latency mode. If
necessary, a better solution to the CPU utilization issue for small
packet workloads will be added in a future patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 17 -----------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  1 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 17 -----------------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  1 -
 4 files changed, 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3999afea518b..f00f233092e9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -959,7 +959,6 @@ void i40e_force_wb(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector)
 static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 {
 	enum i40e_latency_range new_latency_range = rc->latency_range;
-	struct i40e_q_vector *qv = rc->ring->q_vector;
 	u32 new_itr = rc->itr;
 	int bytes_per_int;
 	int usecs;
@@ -971,7 +970,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	 *   0-10MB/s   lowest (50000 ints/s)
 	 *  10-20MB/s   low    (20000 ints/s)
 	 *  20-1249MB/s bulk   (18000 ints/s)
-	 *  > 40000 Rx packets per second (8000 ints/s)
 	 *
 	 * The math works out because the divisor is in 10^(-6) which
 	 * turns the bytes/us input value into MB/s values, but
@@ -994,24 +992,12 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 			new_latency_range = I40E_LOWEST_LATENCY;
 		break;
 	case I40E_BULK_LATENCY:
-	case I40E_ULTRA_LATENCY:
 	default:
 		if (bytes_per_int <= 20)
 			new_latency_range = I40E_LOW_LATENCY;
 		break;
 	}
 
-	/* this is to adjust RX more aggressively when streaming small
-	 * packets.  The value of 40000 was picked as it is just beyond
-	 * what the hardware can receive per second if in low latency
-	 * mode.
-	 */
-#define RX_ULTRA_PACKET_RATE 40000
-
-	if ((((rc->total_packets * 1000000) / usecs) > RX_ULTRA_PACKET_RATE) &&
-	    (&qv->rx == rc))
-		new_latency_range = I40E_ULTRA_LATENCY;
-
 	rc->latency_range = new_latency_range;
 
 	switch (new_latency_range) {
@@ -1024,9 +1010,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	case I40E_BULK_LATENCY:
 		new_itr = I40E_ITR_18K;
 		break;
-	case I40E_ULTRA_LATENCY:
-		new_itr = I40E_ITR_8K;
-		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index f0a0eabc2666..e6456e8a899c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -454,7 +454,6 @@ enum i40e_latency_range {
 	I40E_LOWEST_LATENCY = 0,
 	I40E_LOW_LATENCY = 1,
 	I40E_BULK_LATENCY = 2,
-	I40E_ULTRA_LATENCY = 3,
 };
 
 struct i40e_ring_container {
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index f15e341ada9e..2f7d9f4a6746 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -357,7 +357,6 @@ void i40evf_force_wb(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector)
 static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 {
 	enum i40e_latency_range new_latency_range = rc->latency_range;
-	struct i40e_q_vector *qv = rc->ring->q_vector;
 	u32 new_itr = rc->itr;
 	int bytes_per_int;
 	int usecs;
@@ -369,7 +368,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	 *   0-10MB/s   lowest (50000 ints/s)
 	 *  10-20MB/s   low    (20000 ints/s)
 	 *  20-1249MB/s bulk   (18000 ints/s)
-	 *  > 40000 Rx packets per second (8000 ints/s)
 	 *
 	 * The math works out because the divisor is in 10^(-6) which
 	 * turns the bytes/us input value into MB/s values, but
@@ -392,24 +390,12 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 			new_latency_range = I40E_LOWEST_LATENCY;
 		break;
 	case I40E_BULK_LATENCY:
-	case I40E_ULTRA_LATENCY:
 	default:
 		if (bytes_per_int <= 20)
 			new_latency_range = I40E_LOW_LATENCY;
 		break;
 	}
 
-	/* this is to adjust RX more aggressively when streaming small
-	 * packets.  The value of 40000 was picked as it is just beyond
-	 * what the hardware can receive per second if in low latency
-	 * mode.
-	 */
-#define RX_ULTRA_PACKET_RATE 40000
-
-	if ((((rc->total_packets * 1000000) / usecs) > RX_ULTRA_PACKET_RATE) &&
-	    (&qv->rx == rc))
-		new_latency_range = I40E_ULTRA_LATENCY;
-
 	rc->latency_range = new_latency_range;
 
 	switch (new_latency_range) {
@@ -422,9 +408,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	case I40E_BULK_LATENCY:
 		new_itr = I40E_ITR_18K;
 		break;
-	case I40E_ULTRA_LATENCY:
-		new_itr = I40E_ITR_8K;
-		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 489684002e94..fb506c9f5a6e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -425,7 +425,6 @@ enum i40e_latency_range {
 	I40E_LOWEST_LATENCY = 0,
 	I40E_LOW_LATENCY = 1,
 	I40E_BULK_LATENCY = 2,
-	I40E_ULTRA_LATENCY = 3,
 };
 
 struct i40e_ring_container {
-- 
2.14.1

^ permalink raw reply related

* [net-next 13/15] i40e: invert logic for checking incorrect cpu vs irq affinity
From: Jeff Kirsher @ 2017-08-28  0:16 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

In commit 96db776a3682 ("i40e/vf: fix interrupt affinity bug")
we added some code to force exit of polling in case we did
not have the correct CPU. This is important since it was possible for
the IRQ affinity to be changed while the CPU is pegged at 100%. This can
result in the polling routine being stuck on the wrong CPU until
traffic finally stops.

Unfortunately, the implementation, "if the CPU is correct, exit as
normal, otherwise, fall-through to the end-polling exit" is incredibly
confusing to reason about. In this case, the normal flow looks like the
exception, while the exception actually occurs far away from the if
statement and comment.

We recently discovered and fixed a bug in this code because we were
incorrectly initializing the affinity mask.

Re-write the code so that the exceptional case is handled at the check,
rather than having the logic be spread through the regular exit flow.
This does end up with minor code duplication, but the resulting code is
much easier to reason about.

The new logic is identical, but inverted. If we are running on a CPU not
in our affinity mask, we'll exit polling. However, the code flow is much
easier to understand.

Note that we don't actually have to check for MSI-X, because in the MSI
case we'll only have one q_vector, but its default affinity mask should
be correct as it includes all CPUs when it's initialized. Further, we
could at some point add code to setup the notifier for the non-MSI-X
case and enable this workaround for that case too, if desired, though
there isn't much gain since its unlikely to be the common case.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 31 +++++++++++++--------------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 30 +++++++++++++-------------
 2 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5c1edcce9459..3999afea518b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2369,7 +2369,6 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 
 	/* If work not completed, return budget and polling will return */
 	if (!clean_complete) {
-		const cpumask_t *aff_mask = &q_vector->affinity_mask;
 		int cpu_id = smp_processor_id();
 
 		/* It is possible that the interrupt affinity has changed but,
@@ -2379,15 +2378,22 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 		 * continue to poll, otherwise we must stop polling so the
 		 * interrupt can move to the correct cpu.
 		 */
-		if (likely(cpumask_test_cpu(cpu_id, aff_mask) ||
-			   !(vsi->back->flags & I40E_FLAG_MSIX_ENABLED))) {
+		if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
+			/* Tell napi that we are done polling */
+			napi_complete_done(napi, work_done);
+
+			/* Force an interrupt */
+			i40e_force_wb(vsi, q_vector);
+
+			/* Return budget-1 so that polling stops */
+			return budget - 1;
+		}
 tx_only:
-			if (arm_wb) {
-				q_vector->tx.ring[0].tx_stats.tx_force_wb++;
-				i40e_enable_wb_on_itr(vsi, q_vector);
-			}
-			return budget;
+		if (arm_wb) {
+			q_vector->tx.ring[0].tx_stats.tx_force_wb++;
+			i40e_enable_wb_on_itr(vsi, q_vector);
 		}
+		return budget;
 	}
 
 	if (vsi->back->flags & I40E_TXR_FLAGS_WB_ON_ITR)
@@ -2396,14 +2402,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	/* Work is done so exit the polling mode and re-enable the interrupt */
 	napi_complete_done(napi, work_done);
 
-	/* If we're prematurely stopping polling to fix the interrupt
-	 * affinity we want to make sure polling starts back up so we
-	 * issue a call to i40e_force_wb which triggers a SW interrupt.
-	 */
-	if (!clean_complete)
-		i40e_force_wb(vsi, q_vector);
-	else
-		i40e_update_enable_itr(vsi, q_vector);
+	i40e_update_enable_itr(vsi, q_vector);
 
 	return min(work_done, budget - 1);
 }
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index d91676ccf125..f15e341ada9e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1575,7 +1575,6 @@ int i40evf_napi_poll(struct napi_struct *napi, int budget)
 
 	/* If work not completed, return budget and polling will return */
 	if (!clean_complete) {
-		const cpumask_t *aff_mask = &q_vector->affinity_mask;
 		int cpu_id = smp_processor_id();
 
 		/* It is possible that the interrupt affinity has changed but,
@@ -1585,14 +1584,22 @@ int i40evf_napi_poll(struct napi_struct *napi, int budget)
 		 * continue to poll, otherwise we must stop polling so the
 		 * interrupt can move to the correct cpu.
 		 */
-		if (likely(cpumask_test_cpu(cpu_id, aff_mask))) {
+		if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
+			/* Tell napi that we are done polling */
+			napi_complete_done(napi, work_done);
+
+			/* Force an interrupt */
+			i40evf_force_wb(vsi, q_vector);
+
+			/* Return budget-1 so that polling stops */
+			return budget - 1;
+		}
 tx_only:
-			if (arm_wb) {
-				q_vector->tx.ring[0].tx_stats.tx_force_wb++;
-				i40e_enable_wb_on_itr(vsi, q_vector);
-			}
-			return budget;
+		if (arm_wb) {
+			q_vector->tx.ring[0].tx_stats.tx_force_wb++;
+			i40e_enable_wb_on_itr(vsi, q_vector);
 		}
+		return budget;
 	}
 
 	if (vsi->back->flags & I40E_TXR_FLAGS_WB_ON_ITR)
@@ -1601,14 +1608,7 @@ int i40evf_napi_poll(struct napi_struct *napi, int budget)
 	/* Work is done so exit the polling mode and re-enable the interrupt */
 	napi_complete_done(napi, work_done);
 
-	/* If we're prematurely stopping polling to fix the interrupt
-	 * affinity we want to make sure polling starts back up so we
-	 * issue a call to i40evf_force_wb which triggers a SW interrupt.
-	 */
-	if (!clean_complete)
-		i40evf_force_wb(vsi, q_vector);
-	else
-		i40e_update_enable_itr(vsi, q_vector);
+	i40e_update_enable_itr(vsi, q_vector);
 
 	return min(work_done, budget - 1);
 }
-- 
2.14.1

^ permalink raw reply related

* [net-next 11/15] i40e: move enabling icr0 into i40e_update_enable_itr
From: Jeff Kirsher @ 2017-08-28  0:15 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

If we don't have MSI-X enabled, we handle interrupts on all icr0. This
is a special case, so let's move the conditional into
i40e_update_enable_itr() in order to make i40e_napi_poll easier to
read about.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8a969d8f0790..5c1edcce9459 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2243,6 +2243,12 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
 	int idx = q_vector->v_idx;
 	int rx_itr_setting, tx_itr_setting;
 
+	/* If we don't have MSIX, then we only need to re-enable icr0 */
+	if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED)) {
+		i40e_irq_dynamic_enable_icr0(vsi->back, false);
+		return;
+	}
+
 	vector = (q_vector->v_idx + vsi->base_vector);
 
 	/* avoid dynamic calculation if in countdown mode OR if
@@ -2396,8 +2402,6 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	 */
 	if (!clean_complete)
 		i40e_force_wb(vsi, q_vector);
-	else if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED))
-		i40e_irq_dynamic_enable_icr0(vsi->back, false);
 	else
 		i40e_update_enable_itr(vsi, q_vector);
 
-- 
2.14.1

^ permalink raw reply related

* [net-next 09/15] i40e: Fix for unused value issue found by static analysis
From: Jeff Kirsher @ 2017-08-28  0:15 UTC (permalink / raw)
  To: davem; +Cc: Carolyn Wyborny, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

This patch fixes an issue where an error return value is
set, but without an immediate exit, the value can be overwritten
by the following code execution.  The condition  at this point
is not fatal, so remove the error assignment and comment the
intent for future code maintainers

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5a06cd23b9e6..0962b85ef6f3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9884,13 +9884,15 @@ static int i40e_add_vsi(struct i40e_vsi *vsi)
 			 */
 			ret = i40e_vsi_config_tc(vsi, enabled_tc);
 			if (ret) {
+				/* Single TC condition is not fatal,
+				 * message and continue
+				 */
 				dev_info(&pf->pdev->dev,
 					 "failed to configure TCs for main VSI tc_map 0x%08x, err %s aq_err %s\n",
 					 enabled_tc,
 					 i40e_stat_str(&pf->hw, ret),
 					 i40e_aq_str(&pf->hw,
 						    pf->hw.aq.asq_last_status));
-				ret = -ENOENT;
 			}
 		}
 		break;
-- 
2.14.1

^ permalink raw reply related

* [net-next 10/15] i40e: remove workaround for resetting XPS
From: Jeff Kirsher @ 2017-08-28  0:15 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Since commit 3ffa037d7f78 ("i40e: Set XPS bit mask to zero in DCB mode")
we've tried to reset the XPS settings by building a custom
empty CPU mask.

This workaround is not necessary because we're not really removing the
XPS setting, but simply setting it so that no CPU is valid.

Second, we shorten the code further by using zalloc_cpumask_var instead
of a separate call to bitmap_zero().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0962b85ef6f3..7366e7c7f399 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2874,22 +2874,15 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi)
 static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 {
 	struct i40e_vsi *vsi = ring->vsi;
-	cpumask_var_t mask;
 
 	if (!ring->q_vector || !ring->netdev)
 		return;
 
-	/* Single TC mode enable XPS */
-	if (vsi->tc_config.numtc <= 1) {
-		if (!test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state))
-			netif_set_xps_queue(ring->netdev,
-					    &ring->q_vector->affinity_mask,
-					    ring->queue_index);
-	} else if (alloc_cpumask_var(&mask, GFP_KERNEL)) {
-		/* Disable XPS to allow selection based on TC */
-		bitmap_zero(cpumask_bits(mask), nr_cpumask_bits);
-		netif_set_xps_queue(ring->netdev, mask, ring->queue_index);
-		free_cpumask_var(mask);
+	if ((vsi->tc_config.numtc <= 1) &&
+	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
+		netif_set_xps_queue(ring->netdev,
+				    &ring->q_vector->affinity_mask,
+				    ring->queue_index);
 	}
 
 	/* schedule our worker thread which will take care of
-- 
2.14.1

^ permalink raw reply related

* [net-next 05/15] i40evf: fix possible snprintf truncation of q_vector->name
From: Jeff Kirsher @ 2017-08-28  0:15 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

The q_vector names are based on the interface name with a driver prefix,
the type of q_vector setup, and the queue number. We previously set the
size of this variable to IFNAMSIZ + 9, which is incorrect, because we
actually include a minimum of 14 characters extra beyond the interface
name size.

New versions of GCC since 7 include a new warning that detects this
possible truncation and complains. We can fix this by increasing the
size in case our interface name is too large to avoid truncation. We
don't need to go beyond 14 because the compiler is smart enough to
realize our values can never exceed size of 1. We do go up to 15 here
because possible future changes may increase the number of queues beyond
one digit.

While we are here, also change some variables to be unsigned (since they
are never negative) and stop using an extra unnecessary %s format
specifier.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf.h      |  2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 21 +++++++++------------
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index d310544c6c6e..e5293d35fb6a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -121,7 +121,7 @@ struct i40e_q_vector {
 #define ITR_COUNTDOWN_START 100
 	u8 itr_countdown;	/* when 0 or 1 update ITR */
 	int v_idx;	/* vector index in list */
-	char name[IFNAMSIZ + 9];
+	char name[IFNAMSIZ + 15];
 	bool arm_wb_state;
 	cpumask_t affinity_mask;
 	struct irq_affinity_notify affinity_notify;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 0d87191b6bac..258e8e27068b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -543,9 +543,9 @@ static void i40evf_irq_affinity_release(struct kref *ref) {}
 static int
 i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 {
-	int vector, err, q_vectors;
-	int rx_int_idx = 0, tx_int_idx = 0;
-	int irq_num;
+	unsigned int vector, q_vectors;
+	unsigned int rx_int_idx = 0, tx_int_idx = 0;
+	int irq_num, err;
 
 	i40evf_irq_disable(adapter);
 	/* Decrement for Other and TCP Timer vectors */
@@ -556,18 +556,15 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 		irq_num = adapter->msix_entries[vector + NONQ_VECS].vector;
 
 		if (q_vector->tx.ring && q_vector->rx.ring) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "i40evf-%s-%s-%d", basename,
-				 "TxRx", rx_int_idx++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "i40evf-%s-TxRx-%d", basename, rx_int_idx++);
 			tx_int_idx++;
 		} else if (q_vector->rx.ring) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "i40evf-%s-%s-%d", basename,
-				 "rx", rx_int_idx++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "i40evf-%s-rx-%d", basename, rx_int_idx++);
 		} else if (q_vector->tx.ring) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "i40evf-%s-%s-%d", basename,
-				 "tx", tx_int_idx++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "i40evf-%s-tx-%d", basename, tx_int_idx++);
 		} else {
 			/* skip this unused q_vector */
 			continue;
-- 
2.14.1

^ permalink raw reply related

* [net-next 06/15] i40e: force VMDQ device name truncation
From: Jeff Kirsher @ 2017-08-28  0:15 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

In new versions of GCC since 7.x a new warning exists which warns when
a string is truncated before all of the format can be completed.

When we setup VMDQ netdev names we are copying a pre-existing interface
name which could be up to 15 characters in length. Since we also add
4 bytes, v, the literal %, the d and a \0 null, we would overrun the
available size unless snprintf truncated for us.

The snprintf call will of course truncate on the end, so lets instead
modify the code to force truncation of the copied netdev name by
4 characters, to create enough space for the 4 bytes we're adding.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b0ccd3c2eec6..3a6a752c6c58 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9690,8 +9690,13 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 		i40e_add_mac_filter(vsi, mac_addr);
 		spin_unlock_bh(&vsi->mac_filter_hash_lock);
 	} else {
-		/* relate the VSI_VMDQ name to the VSI_MAIN name */
-		snprintf(netdev->name, IFNAMSIZ, "%sv%%d",
+		/* Relate the VSI_VMDQ name to the VSI_MAIN name. Note that we
+		 * are still limited by IFNAMSIZ, but we're adding 'v%d\0' to
+		 * the end, which is 4 bytes long, so force truncation of the
+		 * original name by IFNAMSIZ - 4
+		 */
+		snprintf(netdev->name, IFNAMSIZ, "%.*sv%%d",
+			 IFNAMSIZ - 4,
 			 pf->vsi[pf->lan_vsi]->netdev->name);
 		random_ether_addr(mac_addr);
 
-- 
2.14.1

^ permalink raw reply related

* [net-next 08/15] i40e: 25G FEC status improvements
From: Jeff Kirsher @ 2017-08-28  0:15 UTC (permalink / raw)
  To: davem; +Cc: Mariusz Stachura, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Mariusz Stachura <mariusz.stachura@intel.com>

This patch improves the system log message. The log message will
be expanded to include the FEC mode the FW requested before link
was established.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3a6a752c6c58..5a06cd23b9e6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5354,6 +5354,7 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 	char *speed = "Unknown";
 	char *fc = "Unknown";
 	char *fec = "";
+	char *req_fec = "";
 	char *an = "";
 
 	new_speed = vsi->back->hw.phy.link_info.link_speed;
@@ -5415,6 +5416,7 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 	}
 
 	if (vsi->back->hw.phy.link_info.link_speed == I40E_LINK_SPEED_25GB) {
+		req_fec = ", Requested FEC: None";
 		fec = ", FEC: None";
 		an = ", Autoneg: False";
 
@@ -5427,10 +5429,22 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 		else if (vsi->back->hw.phy.link_info.fec_info &
 			 I40E_AQ_CONFIG_FEC_RS_ENA)
 			fec = ", FEC: CL108 RS-FEC";
+
+		/* 'CL108 RS-FEC' should be displayed when RS is requested, or
+		 * both RS and FC are requested
+		 */
+		if (vsi->back->hw.phy.link_info.req_fec_info &
+		    (I40E_AQ_REQUEST_FEC_KR | I40E_AQ_REQUEST_FEC_RS)) {
+			if (vsi->back->hw.phy.link_info.req_fec_info &
+			    I40E_AQ_REQUEST_FEC_RS)
+				req_fec = ", Requested FEC: CL108 RS-FEC";
+			else
+				req_fec = ", Requested FEC: CL74 FC-FEC/BASE-R";
+		}
 	}
 
-	netdev_info(vsi->netdev, "NIC Link is Up, %sbps Full Duplex%s%s, Flow Control: %s\n",
-		    speed, fec, an, fc);
+	netdev_info(vsi->netdev, "NIC Link is Up, %sbps Full Duplex%s%s%s, Flow Control: %s\n",
+		    speed, req_fec, fec, an, fc);
 }
 
 /**
-- 
2.14.1

^ permalink raw reply related

* [net-next 07/15] i40e/i40evf: support for VF VLAN tag stripping control
From: Jeff Kirsher @ 2017-08-28  0:15 UTC (permalink / raw)
  To: davem; +Cc: Mariusz Stachura, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20170828001603.75876-1-jeffrey.t.kirsher@intel.com>

From: Mariusz Stachura <mariusz.stachura@intel.com>

This patch gives VF capability to control VLAN tag stripping via
ethtool. As rx-vlan-offload was fixed before, now the VF is able to
change it using "ethtool --offload <IF> rxvlan on/off" settings.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 60 ++++++++++++++++++++++
 drivers/net/ethernet/intel/i40evf/i40evf.h         |  4 ++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 33 ++++++++++++
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    | 40 +++++++++++++++
 include/linux/avf/virtchnl.h                       |  5 ++
 5 files changed, 142 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 27d87bef4ba3..4d1e670f490e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2529,6 +2529,60 @@ static int i40e_vc_set_rss_hena(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, aq_ret);
 }
 
+/**
+ * i40e_vc_enable_vlan_stripping
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * Enable vlan header stripping for the VF
+ **/
+static int i40e_vc_enable_vlan_stripping(struct i40e_vf *vf, u8 *msg,
+					 u16 msglen)
+{
+	struct i40e_vsi *vsi = vf->pf->vsi[vf->lan_vsi_idx];
+	i40e_status aq_ret = 0;
+
+	if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
+		aq_ret = I40E_ERR_PARAM;
+		goto err;
+	}
+
+	i40e_vlan_stripping_enable(vsi);
+
+	/* send the response to the VF */
+err:
+	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
+				       aq_ret);
+}
+
+/**
+ * i40e_vc_disable_vlan_stripping
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * Disable vlan header stripping for the VF
+ **/
+static int i40e_vc_disable_vlan_stripping(struct i40e_vf *vf, u8 *msg,
+					  u16 msglen)
+{
+	struct i40e_vsi *vsi = vf->pf->vsi[vf->lan_vsi_idx];
+	i40e_status aq_ret = 0;
+
+	if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
+		aq_ret = I40E_ERR_PARAM;
+		goto err;
+	}
+
+	i40e_vlan_stripping_disable(vsi);
+
+	/* send the response to the VF */
+err:
+	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
+				       aq_ret);
+}
+
 /**
  * i40e_vc_process_vf_msg
  * @pf: pointer to the PF structure
@@ -2648,6 +2702,12 @@ int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 vf_id, u32 v_opcode,
 	case VIRTCHNL_OP_SET_RSS_HENA:
 		ret = i40e_vc_set_rss_hena(vf, msg, msglen);
 		break;
+	case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
+		ret = i40e_vc_enable_vlan_stripping(vf, msg, msglen);
+		break;
+	case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
+		ret = i40e_vc_disable_vlan_stripping(vf, msg, msglen);
+		break;
 
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index e5293d35fb6a..82f69031e5cd 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -261,6 +261,8 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_AQ_RELEASE_PROMISC		BIT(16)
 #define I40EVF_FLAG_AQ_REQUEST_ALLMULTI		BIT(17)
 #define I40EVF_FLAG_AQ_RELEASE_ALLMULTI		BIT(18)
+#define I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING	BIT(19)
+#define I40EVF_FLAG_AQ_DISABLE_VLAN_STRIPPING	BIT(20)
 
 	/* OS defined structs */
 	struct net_device *netdev;
@@ -358,6 +360,8 @@ void i40evf_get_hena(struct i40evf_adapter *adapter);
 void i40evf_set_hena(struct i40evf_adapter *adapter);
 void i40evf_set_rss_key(struct i40evf_adapter *adapter);
 void i40evf_set_rss_lut(struct i40evf_adapter *adapter);
+void i40evf_enable_vlan_stripping(struct i40evf_adapter *adapter);
+void i40evf_disable_vlan_stripping(struct i40evf_adapter *adapter);
 void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
 				enum virtchnl_ops v_opcode,
 				i40e_status v_retval, u8 *msg, u16 msglen);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 258e8e27068b..9ee277e87f10 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1676,6 +1676,16 @@ static void i40evf_watchdog_task(struct work_struct *work)
 		goto watchdog_done;
 	}
 
+	if (adapter->aq_required & I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING) {
+		i40evf_enable_vlan_stripping(adapter);
+		goto watchdog_done;
+	}
+
+	if (adapter->aq_required & I40EVF_FLAG_AQ_DISABLE_VLAN_STRIPPING) {
+		i40evf_disable_vlan_stripping(adapter);
+		goto watchdog_done;
+	}
+
 	if (adapter->aq_required & I40EVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		i40evf_configure_queues(adapter);
 		goto watchdog_done;
@@ -2293,6 +2303,28 @@ static int i40evf_change_mtu(struct net_device *netdev, int new_mtu)
 	return 0;
 }
 
+/**
+ * i40e_set_features - set the netdev feature flags
+ * @netdev: ptr to the netdev being adjusted
+ * @features: the feature set that the stack is suggesting
+ * Note: expects to be called while under rtnl_lock()
+ **/
+static int i40evf_set_features(struct net_device *netdev,
+			       netdev_features_t features)
+{
+	struct i40evf_adapter *adapter = netdev_priv(netdev);
+
+	if (!VLAN_ALLOWED(adapter))
+		return -EINVAL;
+
+	if (features & NETIF_F_HW_VLAN_CTAG_RX)
+		adapter->aq_required |= I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING;
+	else
+		adapter->aq_required |= I40EVF_FLAG_AQ_DISABLE_VLAN_STRIPPING;
+
+	return 0;
+}
+
 /**
  * i40evf_features_check - Validate encapsulated packet conforms to limits
  * @skb: skb buff
@@ -2386,6 +2418,7 @@ static const struct net_device_ops i40evf_netdev_ops = {
 	.ndo_vlan_rx_kill_vid	= i40evf_vlan_rx_kill_vid,
 	.ndo_features_check	= i40evf_features_check,
 	.ndo_fix_features	= i40evf_fix_features,
+	.ndo_set_features	= i40evf_set_features,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= i40evf_netpoll,
 #endif
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index 6c403bf1bbb8..85876f4fb1fb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -820,6 +820,46 @@ void i40evf_set_rss_lut(struct i40evf_adapter *adapter)
 	kfree(vrl);
 }
 
+/**
+ * i40evf_enable_vlan_stripping
+ * @adapter: adapter structure
+ *
+ * Request VLAN header stripping to be enabled
+ **/
+void i40evf_enable_vlan_stripping(struct i40evf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev, "Cannot enable stripping, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+	adapter->current_op = VIRTCHNL_OP_ENABLE_VLAN_STRIPPING;
+	adapter->aq_required &= ~I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING;
+	i40evf_send_pf_msg(adapter, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
+			   NULL, 0);
+}
+
+/**
+ * i40evf_disable_vlan_stripping
+ * @adapter: adapter structure
+ *
+ * Request VLAN header stripping to be disabled
+ **/
+void i40evf_disable_vlan_stripping(struct i40evf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev, "Cannot disable stripping, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+	adapter->current_op = VIRTCHNL_OP_DISABLE_VLAN_STRIPPING;
+	adapter->aq_required &= ~I40EVF_FLAG_AQ_DISABLE_VLAN_STRIPPING;
+	i40evf_send_pf_msg(adapter, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
+			   NULL, 0);
+}
+
 /**
  * i40evf_print_link_message - print link up or down
  * @adapter: adapter structure
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index becfca2ae94e..2b038442c352 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -133,6 +133,8 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_CONFIG_RSS_LUT = 24,
 	VIRTCHNL_OP_GET_RSS_HENA_CAPS = 25,
 	VIRTCHNL_OP_SET_RSS_HENA = 26,
+	VIRTCHNL_OP_ENABLE_VLAN_STRIPPING = 27,
+	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING = 28,
 };
 
 /* This macro is used to generate a compilation error if a structure
@@ -686,6 +688,9 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_SET_RSS_HENA:
 		valid_len = sizeof(struct virtchnl_rss_hena);
 		break;
+	case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
+	case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.14.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox