Netdev List
* Re: [PATCH net 3/3] net: skb_queue_purge(): lock/unlock the queue only once
From: Stephen Hemminger @ 2017-10-02 14:55 UTC (permalink / raw)
  To: Michael Witten
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Eric Dumazet, netdev, linux-kernel
In-Reply-To: <057dd5367241468691b2b9adbc38a3ba-mfwitten@gmail.com>

On Mon, 02 Oct 2017 05:15:32 -0000
Michael Witten <mfwitten@gmail.com> wrote:

> On Sun, 1 Oct 2017 17:59:09 -0700, Stephen Hemminger wrote:
> 
> > On Sun, 01 Oct 2017 22:19:20 -0000 Michael Witten wrote:
> >  
> >> +	spin_lock_irqsave(&q->lock, flags);
> >> +	skb = q->next;
> >> +	__skb_queue_head_init(q);
> >> +	spin_unlock_irqrestore(&q->lock, flags);  
> >
> > Other code manipulating lists uses splice operation and
> > a sk_buff_head temporary on the stack. That would be easier
> > to understand.
> >
> > 	struct sk_buff_head head;
> >
> > 	__skb_queue_head_init(&head);
> > 	spin_lock_irqsave(&q->lock, flags);
> > 	skb_queue_splice_init(q, &head);
> > 	spin_unlock_irqrestore(&q->lock, flags);
> >
> >  
> >> +	while (skb != head) {
> >> +		next = skb->next;
> >>  		kfree_skb(skb);
> >> +		skb = next;
> >> +	}  
> >
> > It would be cleaner if you could use
> > skb_queue_walk_safe rather than open coding the loop.
> >
> > 	skb_queue_walk_safe(&head, skb,  tmp)
> > 		kfree_skb(skb);  
> 
> I appreciate abstraction as much as anybody, but I do not believe
> that such abstractions would actually be an improvement here.
> 
> * Splice-initing seems more like an idiom than an abstraction;
>   at first blush, it wouldn't be clear to me what the intention
>   is.
> 
> * Such abstractions are fairly unnecessary.
> 
>     * The function as written is already so short as to be
>       easily digested.
> 
>     * More to the point, this function is not some generic,
>       higher-level algorithm that just happens to employ the
>       socket buffer interface; rather, it is a function that
>       implements part of that very interface, and may thus
>       twiddle the intimate bits of these data structures
>       without being accused of abusing a leaky abstraction.
> 
> * Such abstractions add overhead, if only conceptually. In this
>   case, a temporary socket buffer queue allocates *3* unnecessary
>   struct members, including a whole `spinlock_t' member:
>   
>     prev
>     qlen
>     lock
> 
>   It's possible that the compiler will be smart enough to leave
>   those out, but I have my suspicions that it won't, not only
>   given that the interface contract requires that the temporary
>   socket buffer queue be properly initialized before use, but
>   also because splicing into the temporary will manipulate its
>   `qlen'. Yet, why worry whether optimization happens? The whole
>   issue can simply be avoided by exploiting the intimate details
>   that are already philosophically available to us.
> 
>   Similarly, the function `skb_queue_walk_safe' is nice, but it
>   loses value both because a temporary queue loses value (as just
>   described), and because it ignores the fact that legitimate
>   access to the internals of these data structures allows for
>   setting up the requested loop in advance; that is to say, the
>   two parts of the function that we are now debating can be woven
>   together more tightly than `skb_queue_walk_safe' allows.
> 
> For these reasons, I stand by the way that the patch currently
> implements this function; it does exactly what is desired, no more
> or less.
> 
> Sincerely,
> Michael Witten

The point is that there was discussion in the past of replacing
the next/prev as used in skb with more generic code from list.h.
If the abstraction were used, then this code would just work.

The temporary sk_buff_head is on the stack, so any updates to
fields like qlen hit the CPU cache and therefore have very little
impact on performance.
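
For illustration, here is a minimal userspace sketch of the
splice-then-free pattern suggested above. It is not kernel code:
struct node, struct queue, q_lock() and the lock_count field are
hypothetical stand-ins for sk_buff, sk_buff_head and the spinlock,
and the counter exists only to show that the shared queue is locked
once per purge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct sk_buff and struct sk_buff_head. */
struct node {
	struct node *next, *prev;
};

struct queue {
	struct node head;        /* sentinel: empty when head.next == &head */
	unsigned int qlen;
	unsigned int lock_count; /* counts acquisitions; models a spinlock */
};

static void q_lock(struct queue *q)   { q->lock_count++; }
static void q_unlock(struct queue *q) { (void)q; }

static void queue_init(struct queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->qlen = 0;
	q->lock_count = 0;
}

static void queue_tail(struct queue *q, struct node *n)
{
	q_lock(q);
	n->prev = q->head.prev;
	n->next = &q->head;
	q->head.prev->next = n;
	q->head.prev = n;
	q->qlen++;
	q_unlock(q);
}

/* Move every node from src to the empty, private dst. Caller holds src's lock. */
static void queue_splice_init(struct queue *src, struct queue *dst)
{
	assert(dst->qlen == 0);
	if (src->qlen == 0)
		return;
	dst->head.next = src->head.next;
	dst->head.prev = src->head.prev;
	dst->head.next->prev = &dst->head;
	dst->head.prev->next = &dst->head;
	dst->qlen = src->qlen;
	src->head.next = src->head.prev = &src->head;
	src->qlen = 0;
}

/* Detach the whole list under one lock hold, then free it without the lock. */
static unsigned int queue_purge(struct queue *q)
{
	struct queue tmp;
	struct node *n, *next;
	unsigned int freed = 0;

	queue_init(&tmp);
	q_lock(q);
	queue_splice_init(q, &tmp);
	q_unlock(q);

	for (n = tmp.head.next; n != &tmp.head; n = next) {
		next = n->next; /* save before freeing, as a "safe" walk does */
		free(n);
		freed++;
	}
	return freed;
}
```

The temporary queue costs a few extra stores on the stack, which is
the trade-off being debated in this thread.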

^ permalink raw reply

* Re: RFC iproute2 doc files
From: Stephen Hemminger @ 2017-10-02 14:56 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev
In-Reply-To: <20171002073109.GK2031@mtr-leonro.local>

On Mon, 2 Oct 2017 10:31:09 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Wed, Sep 20, 2017 at 08:11:59AM -0700, Stephen Hemminger wrote:
> > I noticed that the iproute man pages are up to date but the LaTeX documentation
> > is very out of date. Rarely updated since the Linux 2.2 days.
> >
> > Either someone needs to do a massive editing job on them, or they should just
> > be dropped. My preference would be to just drop everything in the doc/ directory.
> > The current versions are so old, they can't be helping.  
> 
> If my vote counts, I will say to drop it.
> 
> Thanks

They are gone now from the current git repo. If anyone wants them they can
resurrect them from git and revise them.

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 15:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
	Willem de Bruijn
In-Reply-To: <CANn89i+zQG=rjHRqzsvPzjg5tqW43Lcz-BJ9spLascP9Nt5z8Q@mail.gmail.com>

On Mon, Oct 02, 2017 at 07:42:17AM -0700, Eric Dumazet wrote:
> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > Just to check I've understood correctly, are you suggesting that the
> > IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
> > doesn't seem to exist today)?
> 
> We have plenty of places this is checked.
> 
> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
> 
> Problem is : these checks are not fool proof yet.
> 
> ( Only the admin was supposed to play these games )

Sorry, I meant that there was no constant called IP_MIN_MTU, and I was
just looking at the sites fixed up by c780a049f9bf4423.

I appreciate that, given this requires admin privileges, it's not exactly
high priority. I didn't mean for the above to sound like some kind of
accusation!

> > Otherwise, I do spot another potential issue. The writer side (e.g. most
> > net_device::ndo_change_mtu implementations and the __dev_set_mtu()
> > fallback) doesn't use WRITE_ONCE().
> 
> It does not matter how many strange values can be observed by the reader :
> We must be fool proof anyway from reader point of view, so the
> WRITE_ONCE() is not strictly needed.

Ok. If we expect to always check somewhere on the reader side I guess
that makes sense.

Thanks,
Mark.

^ permalink raw reply

* Re: [iproute PATCH v3 0/3] Check user supplied interface name lengths
From: Stephen Hemminger @ 2017-10-02 15:03 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev
In-Reply-To: <20171002114637.25703-1-phil@nwl.cc>

On Mon,  2 Oct 2017 13:46:34 +0200
Phil Sutter <phil@nwl.cc> wrote:

> This series adds explicit checks for user-supplied interface names to
> make sure they fit Linux's requirements.
> 
> The first two patches simplify interface name parsing in some places -
> these are side-effects of working on the actual implementation provided
> in patch three.
> 
> Changes since v2:
> - Changed patch 3 as suggested in review.
> 
> Changes since v1:
> - Patches 1 and 2 introduced.
> - Changes to patch 3 are listed in there.
> 
> Phil Sutter (3):
>   ip{6,}tunnel: Avoid copying user-supplied interface name around
>   tc: flower: No need to cache indev arg
>   Check user supplied interface name lengths
> 
>  include/utils.h |  2 ++
>  ip/ip6tunnel.c  |  9 +++++----
>  ip/ipl2tp.c     |  4 +++-
>  ip/iplink.c     | 31 ++++++++++++-------------------
>  ip/ipmaddr.c    |  3 ++-
>  ip/iprule.c     | 10 ++++++++--
>  ip/iptunnel.c   | 29 +++++++++++++++--------------
>  ip/iptuntap.c   |  6 ++++--
>  lib/utils.c     | 29 +++++++++++++++++++++++++++++
>  misc/arpd.c     |  3 ++-
>  tc/f_flower.c   |  7 +++----
>  11 files changed, 85 insertions(+), 48 deletions(-)
> 

Applied.

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 15:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <1506955708.8061.5.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, Oct 02, 2017 at 07:48:28AM -0700, Eric Dumazet wrote:
> Please try the following fool proof patch.
> 
> This is what I had in my local tree back in August but could not
> conclude on the syzkaller bug I was working on.

Thanks, I'll give this a go shortly.

I'm currently minimizing the Syzkaller log so that I can trigger the
issue more quickly (and have some confidence in a Tested-by)!

Thanks,
Mark.

> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 681e33998e03b609fdca83a83e0fc62a3fee8c39..e51d777797a927058760a1ab7af00579f7488cb5 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -732,7 +732,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  		room = 576;
>  	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
>  	room -= sizeof(struct icmphdr);
> -
> +	if (room < 0)
> +		goto ende;
>  	icmp_param.data_len = skb_in->len - icmp_param.offset;
>  	if (icmp_param.data_len > room)
>  		icmp_param.data_len = room;
> 
> 
> 
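
To make the arithmetic behind the quoted check concrete, here is a
small userspace sketch. It is an assumption-laden model, not the
kernel function: the header sizes are hard-coded constants, the
clamp to 576 before the subtraction is assumed from the context
line in the hunk, and icmp_payload_len() is a hypothetical helper.

```c
/* Assumed sizes: an option-less IPv4 header and an ICMP header. */
#define SKETCH_IPHDR_LEN   20
#define SKETCH_ICMPHDR_LEN  8

/*
 * Return how many bytes of the offending packet may be quoted in the
 * ICMP error, or -1 when the headers alone exceed the available room.
 */
static int icmp_payload_len(int mtu, int optlen, int avail)
{
	int room = mtu;

	if (room > 576)
		room = 576;
	room -= SKETCH_IPHDR_LEN + optlen;
	room -= SKETCH_ICMPHDR_LEN;
	if (room < 0)   /* tiny MTU: bail out instead of using a negative length */
		return -1;

	return avail > room ? room : avail;
}
```

With an MTU of 20 the room works out to 20 - 20 - 8 = -8; without
the early return that negative value would be used as a data length.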

^ permalink raw reply

* Re: [PATCH net-next v2] net: core: decouple ifalias get/set from rtnl lock
From: Florian Westphal @ 2017-10-02 15:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev
In-Reply-To: <1506956029.8061.8.camel@edumazet-glaptop3.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Just use RCU : A writer is supposed to work on a private copy, and
> _then_ publish the new pointer, so that a reader can not see mangled
> string.
> 
> We either copy the 'old' name or the 'new' one.
> 
> A seqcount is not needed, and won't prevent you from reading the value
> right before a change anyway.

Would you rather use kfree_rcu or unconditional synchronize_net()
before releasing old memory?

^ permalink raw reply

* Re: [PATCH net-next v2] net: core: decouple ifalias get/set from rtnl lock
From: Eric Dumazet @ 2017-10-02 15:19 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20171002150936.GB30423@breakpoint.cc>

On Mon, 2017-10-02 at 17:09 +0200, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Just use RCU : A writer is supposed to work on a private copy, and
> > _then_ publish the new pointer, so that a reader can not see mangled
> > string.
> > 
> > We either copy the 'old' name or the 'new' one.
> > 
> > A seqcount is not needed, and won't prevent you from reading the value
> > right before a change anyway.
> 
> Would you rather use kfree_rcu or unconditional synchronize_net()
> before releasing old memory?

kfree_rcu() please ;)

Adding 16 bytes for the rcu_head is acceptable I think.
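
A rough userspace sketch of the "private copy, then publish" step
discussed above, using C11 atomics. This is only a model under
stated assumptions: alias_ptr, alias_publish() and alias_read() are
hypothetical names, and the grace period that kfree_rcu() would
provide is not modelled at all. The old buffer is simply handed
back to the caller, who must not free it while a reader could still
be using it.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical published pointer; NULL means no alias is set. */
static _Atomic(char *) alias_ptr;

/*
 * Writer: build a private copy, then publish it with one atomic swap.
 * On success returns 0 and stores the previous buffer in *old so the
 * caller can release it later (the kernel would defer that free).
 */
static int alias_publish(const char *new_alias, char **old)
{
	size_t len = strlen(new_alias) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, new_alias, len);
	*old = atomic_exchange_explicit(&alias_ptr, copy,
					memory_order_acq_rel);
	return 0;
}

/* Reader: sees either the old or the new string, never a mix of both. */
static size_t alias_read(char *buf, size_t len)
{
	const char *cur = atomic_load_explicit(&alias_ptr,
					       memory_order_acquire);
	size_t n;

	if (!cur || len == 0)
		return 0;
	n = strlen(cur);
	if (n >= len)
		n = len - 1;
	memcpy(buf, cur, n);
	buf[n] = '\0';
	return n;
}
```

Because the reader only ever dereferences a fully built string, no
sequence counter is needed to avoid a mangled result.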

^ permalink raw reply

* Re: [PATCH v2 3/6] staging: fsl-dpaa2/ethsw: Add ethtool support
From: Andrew Lunn @ 2017-10-02 15:37 UTC (permalink / raw)
  To: Razvan Stefanescu
  Cc: gregkh, devel, linux-kernel, netdev, agraf, arnd,
	alexandru.marginean, bogdan.purcareata, ruxandra.radulescu,
	laurentiu.tudor, stuyoder
In-Reply-To: <1506933380-12641-4-git-send-email-razvan.stefanescu@nxp.com>

Hi Razvan

> +static void ethsw_get_drvinfo(struct net_device *netdev,
> +			      struct ethtool_drvinfo *drvinfo)
> +{
> +	struct ethsw_port_priv *port_priv = netdev_priv(netdev);
> +	u16 version_major, version_minor;
> +	int err;
> +
> +	strlcpy(drvinfo->driver, KBUILD_MODNAME, sizeof(drvinfo->driver));
> +	strlcpy(drvinfo->version, ethsw_drv_version, sizeof(drvinfo->version));

Software driver versions are mostly useless. I would suggest you
remove this.

       Andrew

^ permalink raw reply

* [net-next 00/13][pull request] 100GbE Intel Wired LAN Driver Updates 2017-10-02
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to fm10k only.

Jake provides all but one of the changes in this series.  Most are small
fixes, starting with ensuring prompt transmission of messages queued up
after each VF message is received and handled.  He fixes a possible race
condition between the watchdog task and the processing of mailbox
messages by just checking whether the mailbox is still open, fixes a
couple of GCC v7 warnings, including misspelled "fall through" comments
and warnings about possible truncation in calls to snprintf(), cleans
up a convoluted bitshift and read for the PFVFLRE register, and fixes a
potential divide by zero when finding the proper r_idx.

Markus Elfring fixes an issue which was found using Coccinelle, where
we should have been using seq_putc() instead of seq_puts().

The following are changes since commit 0929567a7a2dab8455a7313956973ff0d339709a:
  samples/bpf: fix warnings in xdp_monitor_user
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Jacob Keller (12):
  fm10k: ensure we process SM mbx when processing VF mbx
  fm10k: reschedule service event if we stall the PF<->SM mailbox
  fm10k: stop spurious link down messages when Tx FIFO is full
  fm10k: fix typos on fall through comments
  fm10k: avoid possible truncation of q_vector->name
  fm10k: add missing fall through comment
  fm10k: avoid needless delay when loading driver
  fm10k: simplify reading PFVFLRE register
  fm10k: don't loop while resetting VFs due to VFLR event
  fm10k: avoid divide by zero in rare cases when device is resetting
  fm10k: move fm10k_prepare_for_reset and fm10k_handle_reset
  fm10k: prevent race condition of __FM10K_SERVICE_SCHED

Markus Elfring (1):
  fm10k: Use seq_putc() in fm10k_dbg_desc_break()

 drivers/net/ethernet/intel/fm10k/fm10k_common.c  |   6 +-
 drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c |   4 +-
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c     |  35 ++++----
 drivers/net/ethernet/intel/fm10k/fm10k_main.c    |   1 +
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c     |   4 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   8 +-
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c     | 110 +++++++++++++----------
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c      |  10 +--
 8 files changed, 101 insertions(+), 77 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [net-next 01/13] fm10k: ensure we process SM mbx when processing VF mbx
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

When we process VF mailboxes, the driver is likely going to also queue
up messages to the switch manager. This process merely queues up the
FIFO, but doesn't actually begin the transmission process. Because we
hold the mailbox lock during this VF processing, the PF<->SM mailbox is
not getting processed at this time. Ensure that we actually process the
PF<->SM mailbox in between each PF<->VF mailbox.

This should ensure prompt transmission of the messages queued up after
each VF message is received and handled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 5f4dac0d36ef..2ec49116fe91 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -126,6 +126,9 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface)
 		struct fm10k_mbx_info *mbx = &vf_info->mbx;
 		u16 glort = vf_info->glort;
 
+		/* process the SM mailbox first to drain outgoing messages */
+		hw->mbx.ops.process(hw, &hw->mbx);
+
 		/* verify port mapping is valid, if not reset port */
 		if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort))
 			hw->iov.ops.reset_lport(hw, vf_info);
-- 
2.14.2

^ permalink raw reply related

* [net-next 06/13] fm10k: avoid possible truncation of q_vector->name
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

GCC, starting with version 7, warns about possible truncation in
calls to snprintf. We can fix this and avoid false positives. First,
we should pass the full buffer size to snprintf, because it already
counts the terminating NUL character as part of the length it is
passed, so passing len-1 simply wastes a byte of possible storage.

Second, if we make the ri and ti variables unsigned, the compiler is
able to correctly reason that the value never gets larger than 256, so
it doesn't need to warn about the full space required to print a signed
integer.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 63784576ae8b..9212b3fa3b62 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1544,7 +1544,7 @@ int fm10k_qv_request_irq(struct fm10k_intfc *interface)
 	struct net_device *dev = interface->netdev;
 	struct fm10k_hw *hw = &interface->hw;
 	struct msix_entry *entry;
-	int ri = 0, ti = 0;
+	unsigned int ri = 0, ti = 0;
 	int vector, err;
 
 	entry = &interface->msix_entries[NON_Q_VECTORS(hw)];
@@ -1554,15 +1554,15 @@ int fm10k_qv_request_irq(struct fm10k_intfc *interface)
 
 		/* name the vector */
 		if (q_vector->tx.count && q_vector->rx.count) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "%s-TxRx-%d", dev->name, ri++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "%s-TxRx-%u", dev->name, ri++);
 			ti++;
 		} else if (q_vector->rx.count) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "%s-rx-%d", dev->name, ri++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "%s-rx-%u", dev->name, ri++);
 		} else if (q_vector->tx.count) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "%s-tx-%d", dev->name, ti++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "%s-tx-%u", dev->name, ti++);
 		} else {
 			/* skip this unused q_vector */
 			continue;
-- 
2.14.2
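
As a standalone illustration of the first point in the commit
message, the sketch below formats a name into a buffer that is
exactly large enough. name_vector() is a hypothetical helper, not
part of the driver; it relies only on standard snprintf semantics,
where the size argument already includes room for the terminator.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "<dev>-TxRx-<idx>" into buf. Returns 1 when the whole name
 * fit and 0 when snprintf had to truncate it.
 */
static int name_vector(char *buf, size_t size, const char *dev,
		       unsigned int idx)
{
	int n = snprintf(buf, size, "%s-TxRx-%u", dev, idx);

	/* snprintf returns the length it wanted to write, excluding the NUL */
	return n >= 0 && (size_t)n < size;
}
```

For a 12-byte buffer, "eth0-TxRx-7" (11 characters plus the
terminator) fits when sizeof(buf) is passed, but loses its last
character when sizeof(buf) - 1 is passed.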

^ permalink raw reply related

* [net-next 02/13] fm10k: reschedule service event if we stall the PF<->SM mailbox
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

When we are handling PF<->VF mailbox messages, it is possible that the
VF will send us so many messages that the PF<->SM FIFO will fill up. In
this case, we stop the loop and wait until the service event is
rescheduled.

Normally this should happen due to an interrupt. But it is possible that
we don't get another interrupt for a while, so nothing happens until the
service timer actually reschedules us. Instead, simply reschedule
immediately, which will cause the service event to be run again as soon
as we exit.

This ensures that we promptly handle all of the PF<->VF messages with
minimal delay, while still giving time for the SM mailbox to drain.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 2ec49116fe91..d8356c494f06 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -143,6 +143,10 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface)
 		if (!hw->mbx.ops.tx_ready(&hw->mbx, FM10K_VFMBX_MSG_MTU)) {
 			/* keep track of how many times this occurs */
 			interface->hw_sm_mbx_full++;
+
+			/* make sure we try again momentarily */
+			fm10k_service_event_schedule(interface);
+
 			break;
 		}
 
-- 
2.14.2

^ permalink raw reply related

* [net-next 08/13] fm10k: avoid needless delay when loading driver
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

When we load the driver, we set the last_reset to be in the future,
which delays the initial driver reset. Additionally, the service task
isn't scheduled to run automatically until the timer runs out. This
causes a needless delay of the first reset to begin talking to the
switch manager.

We can avoid this by simply not setting last_reset and immediately
scheduling the service task while in probe. This allows the device to
wake up faster, and avoids this delay.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 9212b3fa3b62..6c2c4bffaedf 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1800,9 +1800,6 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
 		netdev->vlan_features |= NETIF_F_HIGHDMA;
 	}
 
-	/* delay any future reset requests */
-	interface->last_reset = jiffies + (10 * HZ);
-
 	/* reset and initialize the hardware so it is in a known state */
 	err = hw->mac.ops.reset_hw(hw);
 	if (err) {
@@ -2079,8 +2076,9 @@ static int fm10k_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* enable SR-IOV after registering netdev to enforce PF/VF ordering */
 	fm10k_iov_configure(pdev, 0);
 
-	/* clear the service task disable bit to allow service task to start */
+	/* clear the service task disable bit and kick off service task */
 	clear_bit(__FM10K_SERVICE_DISABLE, interface->state);
+	fm10k_service_event_schedule(interface);
 
 	return 0;
 
-- 
2.14.2

^ permalink raw reply related

* [net-next 04/13] fm10k: stop spurious link down messages when Tx FIFO is full
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

In fm10k_get_host_state_generic, we check the mailbox tx_ready() function
to ensure that the mailbox is still open. This function also checks to
make sure we have space to transmit another message. Unfortunately, if
we just recently sent a bunch of messages (such as enabling hundreds of
VLANs on a VF) this can result in a race where the watchdog task thinks
the link went down just because we haven't had time to process all these
messages yet.

Instead, let's just check whether the mailbox is still open. This ensures
that we don't race with the Tx FIFO, and we only link down once the
mailbox is not open.

This is safe, because if the FIFO fills up and we're unable to send
a message for too long, we'll end up triggering the timeout detection
which results in a reset. Additionally, since we still check to ensure
the mailbox state is OPEN, we'll transition to link down whenever the
mailbox closes as well.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_common.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_common.c b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
index 62a6ad9b3eed..736a9f087bc9 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_common.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
@@ -1,5 +1,5 @@
 /* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -517,8 +517,8 @@ s32 fm10k_get_host_state_generic(struct fm10k_hw *hw, bool *host_ready)
 		goto out;
 	}
 
-	/* verify Mailbox is still valid */
-	if (!mbx->ops.tx_ready(mbx, FM10K_VFMBX_MSG_MTU))
+	/* verify Mailbox is still open */
+	if (mbx->state != FM10K_STATE_OPEN)
 		goto out;
 
 	/* interface cannot receive traffic without logical ports */
-- 
2.14.2

^ permalink raw reply related

* [net-next 03/13] fm10k: Use seq_putc() in fm10k_dbg_desc_break()
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Markus Elfring, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Markus Elfring <elfring@users.sourceforge.net>

fm10k_dbg_desc_break() writes single characters to the seq_file as
one-character strings. Use seq_putc() for them instead of seq_puts().

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
index 5116fd043630..14df09e2d964 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
@@ -52,9 +52,9 @@ static void fm10k_dbg_desc_seq_stop(struct seq_file __always_unused *s,
 static void fm10k_dbg_desc_break(struct seq_file *s, int i)
 {
 	while (i--)
-		seq_puts(s, "-");
+		seq_putc(s, '-');
 
-	seq_puts(s, "\n");
+	seq_putc(s, '\n');
 }
 
 static int fm10k_dbg_tx_desc_seq_show(struct seq_file *s, void *v)
-- 
2.14.2

^ permalink raw reply related

* [net-next 09/13] fm10k: simplify reading PFVFLRE register
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

We're doing a really convoluted bitshift and read for the PFVFLRE
register. Reading PFVFLRE(1), shifting it left by 32, then ORing in
PFVFLRE(0) is sufficient.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index d8356c494f06..dfc88a463735 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -1,5 +1,5 @@
 /* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -67,10 +67,8 @@ s32 fm10k_iov_event(struct fm10k_intfc *interface)
 
 	/* read VFLRE to determine if any VFs have been reset */
 	do {
-		vflre = fm10k_read_reg(hw, FM10K_PFVFLRE(0));
+		vflre = fm10k_read_reg(hw, FM10K_PFVFLRE(1));
 		vflre <<= 32;
-		vflre |= fm10k_read_reg(hw, FM10K_PFVFLRE(1));
-		vflre = (vflre << 32) | (vflre >> 32);
 		vflre |= fm10k_read_reg(hw, FM10K_PFVFLRE(0));
 
 		i = iov_data->num_vfs;
-- 
2.14.2
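
The equivalence of the old and new sequences can be checked outside
the driver with plain integers. In this sketch reg0 and reg1 are
hypothetical parameters standing in for the values read from
PFVFLRE(0) and PFVFLRE(1); no hardware access is modelled.

```c
#include <stdint.h>

/* New sequence: high word from register 1, low word from register 0. */
static uint64_t vflre_simple(uint32_t reg0, uint32_t reg1)
{
	uint64_t vflre = reg1;

	vflre <<= 32;
	vflre |= reg0;
	return vflre;
}

/* Old sequence, kept step for step, with the redundant swap. */
static uint64_t vflre_old(uint32_t reg0, uint32_t reg1)
{
	uint64_t vflre = reg0;

	vflre <<= 32;
	vflre |= reg1;
	vflre = (vflre << 32) | (vflre >> 32); /* swap the two halves */
	vflre |= reg0;                         /* no-op: low half is reg0 already */
	return vflre;
}
```

Both functions return (reg1 << 32) | reg0; the old one just takes
an extra register read and a swap to get there.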

^ permalink raw reply related

* [net-next 05/13] fm10k: fix typos on fall through comments
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

GCC, starting with version 7, warns when a case statement may fall
through without an explicit comment. "Fallthough" does not count as
it is misspelled. Fix the typos for these comments to appease the new
warnings.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c |  4 ++--
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c  | 10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
index 334088a101c3..244d3ad58ca7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
@@ -1,5 +1,5 @@
 /* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -1586,7 +1586,7 @@ s32 fm10k_pfvf_mbx_init(struct fm10k_hw *hw, struct fm10k_mbx_info *mbx,
 			mbx->mbmem_reg = FM10K_MBMEM_VF(id, 0);
 			break;
 		}
-		/* fallthough */
+		/* fall through */
 	default:
 		return FM10K_MBX_ERR_NO_MBX;
 	}
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 40ee0242a80a..9e4fb3a44376 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1,5 +1,5 @@
 /* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -1334,19 +1334,19 @@ static u8 fm10k_iov_supported_xcast_mode_pf(struct fm10k_vf_info *vf_info,
 	case FM10K_XCAST_MODE_PROMISC:
 		if (vf_flags & FM10K_VF_FLAG_PROMISC_CAPABLE)
 			return FM10K_XCAST_MODE_PROMISC;
-		/* fallthough */
+		/* fall through */
 	case FM10K_XCAST_MODE_ALLMULTI:
 		if (vf_flags & FM10K_VF_FLAG_ALLMULTI_CAPABLE)
 			return FM10K_XCAST_MODE_ALLMULTI;
-		/* fallthough */
+		/* fall through */
 	case FM10K_XCAST_MODE_MULTI:
 		if (vf_flags & FM10K_VF_FLAG_MULTI_CAPABLE)
 			return FM10K_XCAST_MODE_MULTI;
-		/* fallthough */
+		/* fall through */
 	case FM10K_XCAST_MODE_NONE:
 		if (vf_flags & FM10K_VF_FLAG_NONE_CAPABLE)
 			return FM10K_XCAST_MODE_NONE;
-		/* fallthough */
+		/* fall through */
 	default:
 		break;
 	}
-- 
2.14.2

^ permalink raw reply related

* [net-next 11/13] fm10k: avoid divide by zero in rare cases when device is resetting
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

It is possible that under rare circumstances the device is undergoing
a reset, such as when a PFLR occurs, and the device may be transmitting
simultaneously. In this case, we might attempt to divide by zero when
finding the proper r_idx. Instead, let's read the num_tx_queues once,
and make sure it's non-zero.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index e69d49d91d67..77d495fedced 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -643,9 +643,13 @@ int fm10k_close(struct net_device *netdev)
 static netdev_tx_t fm10k_xmit_frame(struct sk_buff *skb, struct net_device *dev)
 {
 	struct fm10k_intfc *interface = netdev_priv(dev);
+	int num_tx_queues = READ_ONCE(interface->num_tx_queues);
 	unsigned int r_idx = skb->queue_mapping;
 	int err;
 
+	if (!num_tx_queues)
+		return NETDEV_TX_BUSY;
+
 	if ((skb->protocol == htons(ETH_P_8021Q)) &&
 	    !skb_vlan_tag_present(skb)) {
 		/* FM10K only supports hardware tagging, any tags in frame
@@ -698,8 +702,8 @@ static netdev_tx_t fm10k_xmit_frame(struct sk_buff *skb, struct net_device *dev)
 		__skb_put(skb, pad_len);
 	}
 
-	if (r_idx >= interface->num_tx_queues)
-		r_idx %= interface->num_tx_queues;
+	if (r_idx >= num_tx_queues)
+		r_idx %= num_tx_queues;
 
 	err = fm10k_xmit_frame_ring(skb, interface->tx_ring[r_idx]);
 
-- 
2.14.2
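
The read-once-then-guard pattern from the patch above can be sketched in
user space as follows. This is only an illustration under assumptions:
struct tx_iface and pick_tx_ring() are hypothetical names, and a relaxed
C11 atomic load stands in for the kernel's READ_ONCE(). The point is that
the queue count is sampled a single time, so the zero check and the
modulo both see the same value even if a reset changes it concurrently.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the driver's interface structure. */
struct tx_iface {
	atomic_uint num_tx_queues;	/* may drop to 0 while resetting */
};

/*
 * Map a queue mapping onto a valid ring index.
 * Returns -1 when no queues exist, so the caller can report "busy"
 * instead of dividing by zero.
 */
int pick_tx_ring(struct tx_iface *iface, unsigned int queue_mapping)
{
	/* Sample the count exactly once; every later use sees this value. */
	unsigned int n = atomic_load_explicit(&iface->num_tx_queues,
					      memory_order_relaxed);

	if (n == 0)
		return -1;

	if (queue_mapping >= n)
		queue_mapping %= n;

	return (int)queue_mapping;
}
```

Reading the field twice (once for the comparison, once for the modulo)
would leave a window in which the second read could observe zero, which
is the case the patch closes.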


* [net-next 12/13] fm10k: move fm10k_prepare_for_reset and fm10k_handle_reset
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

A future patch needs these functions defined earlier in the file. Move
them above the point where they will be called.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 58 ++++++++++++++--------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 6c2c4bffaedf..41335154d6b1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -132,35 +132,6 @@ static void fm10k_service_timer(unsigned long data)
 	fm10k_service_event_schedule(interface);
 }
 
-static void fm10k_detach_subtask(struct fm10k_intfc *interface)
-{
-	struct net_device *netdev = interface->netdev;
-	u32 __iomem *hw_addr;
-	u32 value;
-
-	/* do nothing if device is still present or hw_addr is set */
-	if (netif_device_present(netdev) || interface->hw.hw_addr)
-		return;
-
-	/* check the real address space to see if we've recovered */
-	hw_addr = READ_ONCE(interface->uc_addr);
-	value = readl(hw_addr);
-	if (~value) {
-		interface->hw.hw_addr = interface->uc_addr;
-		netif_device_attach(netdev);
-		set_bit(FM10K_FLAG_RESET_REQUESTED, interface->flags);
-		netdev_warn(netdev, "PCIe link restored, device now attached\n");
-		return;
-	}
-
-	rtnl_lock();
-
-	if (netif_running(netdev))
-		dev_close(netdev);
-
-	rtnl_unlock();
-}
-
 static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
 {
 	struct net_device *netdev = interface->netdev;
@@ -270,6 +241,35 @@ static int fm10k_handle_reset(struct fm10k_intfc *interface)
 	return err;
 }
 
+static void fm10k_detach_subtask(struct fm10k_intfc *interface)
+{
+	struct net_device *netdev = interface->netdev;
+	u32 __iomem *hw_addr;
+	u32 value;
+
+	/* do nothing if device is still present or hw_addr is set */
+	if (netif_device_present(netdev) || interface->hw.hw_addr)
+		return;
+
+	/* check the real address space to see if we've recovered */
+	hw_addr = READ_ONCE(interface->uc_addr);
+	value = readl(hw_addr);
+	if (~value) {
+		interface->hw.hw_addr = interface->uc_addr;
+		netif_device_attach(netdev);
+		set_bit(FM10K_FLAG_RESET_REQUESTED, interface->flags);
+		netdev_warn(netdev, "PCIe link restored, device now attached\n");
+		return;
+	}
+
+	rtnl_lock();
+
+	if (netif_running(netdev))
+		dev_close(netdev);
+
+	rtnl_unlock();
+}
+
 static void fm10k_reinit(struct fm10k_intfc *interface)
 {
 	int err;
-- 
2.14.2


* [net-next 07/13] fm10k: add missing fall through comment
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Newer versions of GCC starting with 7 now additionally warn when a case
statement may fall through without an explicit comment mentioning it.
Add such a comment to silence the warning, as this is expected.

Unfortunately the comment must come directly before the next case
statement, so we put it outside the #ifdef. Otherwise, the compiler
cannot properly detect it and thus the warning is displayed regardless.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 9dffaba85ae6..189d52a8a605 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -876,6 +876,7 @@ static void fm10k_tx_csum(struct fm10k_ring *tx_ring,
 	case IPPROTO_GRE:
 		if (skb->encapsulation)
 			break;
+		/* fall through */
 	default:
 		if (unlikely(net_ratelimit())) {
 			dev_warn(tx_ring->dev,
-- 
2.14.2
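
As a small illustration of the comment placement described in the patch
above, here is a sketch with hypothetical names (csum_offload_ok,
HAVE_ENCAP_OFFLOAD and PROTO_GRE are not taken from the driver). The
fall-through comment sits directly before the next case label and
outside the conditional block, so it stays adjacent to the label whether
or not the #ifdef body is compiled in.

```c
#define HAVE_ENCAP_OFFLOAD 1	/* hypothetical build-time option */
#define PROTO_GRE 47		/* IP protocol number for GRE */

/* Returns 1 if checksum offload is usable for this protocol, else 0. */
int csum_offload_ok(int proto, int encapsulated)
{
	switch (proto) {
	case PROTO_GRE:
#ifdef HAVE_ENCAP_OFFLOAD
		if (encapsulated)
			return 1;
#endif
		/* fall through */
	default:
		return 0;
	}
}
```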


* [net-next 10/13] fm10k: don't loop while resetting VFs due to VFLR event
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

We've always had a really weird looping construction for resetting VFs.
We read the VFLRE register and reset the VF if the corresponding bit is
set, which makes sense. However we loop continuously until we no longer
have any bits left set. At first this makes sense, as a sort of "keep
trying until we succeed" concept.

Unfortunately this causes a problem if we happen to surprise remove
while this code is executing, because in this case we'll always read all
1s for the VFLRE register. This results in a hard lockup on the CPU
because the loop will never terminate.

Because our own reset function always clears the VFLR event register
(except when we've lost the PCIe link, obviously), there is no real
reason to loop. In practice, we'll loop over once and find that no VFs
are pending anymore.

Let's just check once. Since we clear the notification when we reset,
there's no benefit to the loop. Additionally, there shouldn't be a race
as future VFLRE events should trigger an interrupt. Finally, we
didn't warn or do anything in the looped case anyway.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index dfc88a463735..03897720bf0b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -66,23 +66,21 @@ s32 fm10k_iov_event(struct fm10k_intfc *interface)
 		goto read_unlock;
 
 	/* read VFLRE to determine if any VFs have been reset */
-	do {
-		vflre = fm10k_read_reg(hw, FM10K_PFVFLRE(1));
-		vflre <<= 32;
-		vflre |= fm10k_read_reg(hw, FM10K_PFVFLRE(0));
+	vflre = fm10k_read_reg(hw, FM10K_PFVFLRE(1));
+	vflre <<= 32;
+	vflre |= fm10k_read_reg(hw, FM10K_PFVFLRE(0));
 
-		i = iov_data->num_vfs;
+	i = iov_data->num_vfs;
 
-		for (vflre <<= 64 - i; vflre && i--; vflre += vflre) {
-			struct fm10k_vf_info *vf_info = &iov_data->vf_info[i];
+	for (vflre <<= 64 - i; vflre && i--; vflre += vflre) {
+		struct fm10k_vf_info *vf_info = &iov_data->vf_info[i];
 
-			if (vflre >= 0)
-				continue;
+		if (vflre >= 0)
+			continue;
 
-			hw->iov.ops.reset_resources(hw, vf_info);
-			vf_info->mbx.ops.connect(hw, &vf_info->mbx);
-		}
-	} while (i != iov_data->num_vfs);
+		hw->iov.ops.reset_resources(hw, vf_info);
+		vf_info->mbx.ops.connect(hw, &vf_info->mbx);
+	}
 
 read_unlock:
 	rcu_read_unlock();
-- 
2.14.2
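
The single-pass scan from the patch above can be sketched in user space
as follows. This is a simplified illustration, not the driver code:
scan_pending_once() and its handled[] output array are hypothetical, and
an unsigned top-bit test replaces the driver's signed comparison. The
pass is bounded by num_vfs iterations, so it terminates even when the
register reads back as all 1s after a surprise removal.

```c
#include <stdint.h>

/*
 * Walk the pending bitmap once, from the highest VF index down.
 * Stores each pending index in handled[] and returns how many were found.
 * handled[] must have room for num_vfs entries.
 */
unsigned int scan_pending_once(uint64_t pending, unsigned int num_vfs,
			       unsigned int *handled)
{
	unsigned int n = 0;
	unsigned int i = num_vfs;

	if (num_vfs == 0 || num_vfs > 64)
		return 0;

	/* Move bit (num_vfs - 1) into bit 63; higher bits are discarded. */
	pending <<= 64 - num_vfs;

	/* Stop early once no pending bits remain. */
	while (pending && i--) {
		if (pending & (UINT64_C(1) << 63))
			handled[n++] = i;	/* a real driver would reset VF i here */
		pending <<= 1;
	}

	return n;
}
```

With the old do/while wrapper, an all-1s read would make every pass find
pending bits, so the outer loop would never exit. A single bounded pass
avoids that.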


* [net-next 13/13] fm10k: prevent race condition of __FM10K_SERVICE_SCHED
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Although very unlikely, it is possible that cancel_work_sync() may stop
the service_task before it actually started. In this case, the
__FM10K_SERVICE_SCHED bit will never be cleared. This results in the
service task being unable to reschedule in the future. Add a helper
function which sets the service disable bit, waits for the service task
to stop and clears the schedule bit, thus avoiding the race condition.
We know the schedule bit is safe to clear because the cancel_work_sync()
guarantees the service task is not running.

Add a helper function also to restart the service task, for symmetry.
This is not strictly needed but helps the mental model of how to stop
and start the service task.

This race could only happen in fm10k_suspend/fm10k_resume as this is the
only place where the service task is actually restarted. Thus,
suspend/resume testing would be ideal. However, note that the chance of
this happening is very slim as the service event is scheduled for
immediate execution, and you would have to trigger a suspend at almost
the exact same time as the service task was scheduled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 32 ++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 41335154d6b1..9575f7c1862d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -118,6 +118,27 @@ static void fm10k_service_event_complete(struct fm10k_intfc *interface)
 		fm10k_service_event_schedule(interface);
 }
 
+static void fm10k_stop_service_event(struct fm10k_intfc *interface)
+{
+	set_bit(__FM10K_SERVICE_DISABLE, interface->state);
+	cancel_work_sync(&interface->service_task);
+
+	/* It's possible that cancel_work_sync stopped the service task from
+	 * running before it could actually start. In this case the
+	 * __FM10K_SERVICE_SCHED bit will never be cleared. Since we know that
+	 * the service task cannot be running at this point, we need to clear
+	 * the scheduled bit, as otherwise the service task may never be
+	 * restarted.
+	 */
+	clear_bit(__FM10K_SERVICE_SCHED, interface->state);
+}
+
+static void fm10k_start_service_event(struct fm10k_intfc *interface)
+{
+	clear_bit(__FM10K_SERVICE_DISABLE, interface->state);
+	fm10k_service_event_schedule(interface);
+}
+
 /**
  * fm10k_service_timer - Timer Call-back
  * @data: pointer to interface cast into an unsigned long
@@ -2116,8 +2137,7 @@ static void fm10k_remove(struct pci_dev *pdev)
 
 	del_timer_sync(&interface->service_timer);
 
-	set_bit(__FM10K_SERVICE_DISABLE, interface->state);
-	cancel_work_sync(&interface->service_task);
+	fm10k_stop_service_event(interface);
 
 	/* free netdev, this may bounce the interrupts due to setup_tc */
 	if (netdev->reg_state == NETREG_REGISTERED)
@@ -2155,8 +2175,7 @@ static void fm10k_prepare_suspend(struct fm10k_intfc *interface)
 	 * stopped. We stop the watchdog task until after we resume software
 	 * activity.
 	 */
-	set_bit(__FM10K_SERVICE_DISABLE, interface->state);
-	cancel_work_sync(&interface->service_task);
+	fm10k_stop_service_event(interface);
 
 	fm10k_prepare_for_reset(interface);
 }
@@ -2183,9 +2202,8 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
 	interface->link_down_event = jiffies + (HZ);
 	set_bit(__FM10K_LINK_DOWN, interface->state);
 
-	/* clear the service task disable bit to allow service task to start */
-	clear_bit(__FM10K_SERVICE_DISABLE, interface->state);
-	fm10k_service_event_schedule(interface);
+	/* restart the service task */
+	fm10k_start_service_event(interface);
 
 	return err;
 }
-- 
2.14.2
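
The stop/start pairing described in the patch above can be modelled in
user space roughly like this. It is a sketch under assumptions: struct
service, the queued flag (standing in for the workqueue entry) and the
service_* functions are hypothetical, and C11 atomics replace the
kernel's set_bit()/clear_bit(). It shows why the stop path must clear the
scheduled bit after cancelling: if the work was cancelled before it ran,
nothing else would clear it.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum {
	SERVICE_DISABLE = 1u << 0,	/* scheduling is not allowed */
	SERVICE_SCHED   = 1u << 1,	/* work is queued but has not finished */
};

struct service {
	atomic_uint state;
	bool queued;			/* models the pending work item */
};

void service_init(struct service *s)
{
	atomic_init(&s->state, 0u);
	s->queued = false;
}

/* Queue the work unless disabled or already scheduled. */
bool service_schedule(struct service *s)
{
	unsigned int old;

	if (atomic_load(&s->state) & SERVICE_DISABLE)
		return false;

	old = atomic_fetch_or(&s->state, SERVICE_SCHED);
	if (old & SERVICE_SCHED)
		return false;

	s->queued = true;
	return true;
}

/* Disable, cancel, then clear the scheduled bit the work never cleared. */
void service_stop(struct service *s)
{
	atomic_fetch_or(&s->state, SERVICE_DISABLE);
	s->queued = false;		/* models cancel_work_sync() */
	atomic_fetch_and(&s->state, ~(unsigned int)SERVICE_SCHED);
}

/* Re-enable and schedule, mirroring service_stop(). */
bool service_start(struct service *s)
{
	atomic_fetch_and(&s->state, ~(unsigned int)SERVICE_DISABLE);
	return service_schedule(s);
}
```

In this model, dropping the last line of service_stop() would make
service_start() return false after a cancelled run, which corresponds to
the service task never being rescheduled.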


* Re: [next-queue PATCH v2 2/5] net/sched: Fix accessing invalid dev_queue
From: Jesus Sanchez-Palencia @ 2017-10-02 15:57 UTC (permalink / raw)
  To: Cong Wang, Vinicius Costa Gomes
  Cc: Linux Kernel Network Developers, intel-wired-lan,
	Jamal Hadi Salim, Jiri Pirko, andre.guedes, Ivan Briano,
	boon.leong.ong, richardcochran, Henrik Austad, levipearson,
	rodney.cummings
In-Reply-To: <CAM_iQpW7YtS2=rRw814=zWcJaUOGmkWQ+NHQKPwsBF4hMQSRVg@mail.gmail.com>

Hi,

On 09/30/2017 05:22 PM, Cong Wang wrote:
> On Fri, Sep 29, 2017 at 5:26 PM, Vinicius Costa Gomes
> <vinicius.gomes@intel.com> wrote:
>> From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>
>> In qdisc_alloc() the dev_queue pointer was used without any checks being
>> performed. If qdisc_create() gets a null dev_queue pointer, it just
>> passes it along to qdisc_alloc(), leading to a crash. That happens if a
>> root qdisc implements select_queue() and returns a null dev_queue
>> pointer for an "invalid handle", for example.
> 
> Does it make sense to let mqprio_select_queue() always return
> non-NULL?
> 
> At least mq_select_queue() returns queue #0 as a fallback.

I had seen that, but my understanding was that for mqprio the inner qdiscs are
always related to one of the Tx netdev_queue per design. Returning any other
queue as a fallback seemed like going against that to me.

I'd rather keep this function as the patch is proposing, thus either returning
the correct netdev_queue for a given handle, or NULL as a way to flag that
something was 'wrong' with it. Returning queue #0 is misleading in that sense, imo.

What do you think?

Regards,
Jesus


* Re: [PATCH RFC] flow_dissector: Add FLOW_DISSECTOR_F_FLOWER
From: Tom Herbert @ 2017-10-02 16:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tom Herbert, David S. Miller, Hannes Frederic Sowa,
	Linux Kernel Network Developers, Rohit Seth
In-Reply-To: <20171002144946.GE1941@nanopsycho.orion>

On Mon, Oct 2, 2017 at 7:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Fri, Sep 29, 2017 at 09:13:42PM CEST, tom@quantonium.net wrote:
>>This patch is RFC and would be applied after "flow_dissector:
>>Protocol specific flow dissector offload"
>>
>>In order to maintain uAPI in flower, the FLOW_DISSECTOR_F_FLOWER flag
>>is added to indicate to flow_dissector that the caller is flower.
>>As new functionality is added to flow_dissector that would break
>>the flower uAPI, the code can be wrapped in "if (!(flags &
>>FLOW_DISSECTOR_F_FLOWER))".
>>
>>In this patch the conditional is used around protocol specific
>>dissection (e.g. DPI into VXLAN) as well as the code that
>>enforces a depth of parsing to prevent DPI. The latter was a
>>recent patch that would introduce a parsing limit to flower that
>>did not exist before (i.e. would break uAPI).
>>
>>Signed-off-by: Tom Herbert <tom@quantonium.net>
>>---
>> include/net/flow_dissector.h |  1 +
>> net/core/flow_dissector.c    | 17 +++++++++++------
>> net/sched/cls_flow.c         |  3 ++-
>> 3 files changed, 14 insertions(+), 7 deletions(-)
>>
>>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>>index ad75bbfd1c9c..ca315107d147 100644
>>--- a/include/net/flow_dissector.h
>>+++ b/include/net/flow_dissector.h
>>@@ -214,6 +214,7 @@ enum flow_dissector_key_id {
>> #define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL   BIT(2)
>> #define FLOW_DISSECTOR_F_STOP_AT_ENCAP                BIT(3)
>> #define FLOW_DISSECTOR_F_STOP_AT_L4           BIT(4)
>>+#define FLOW_DISSECTOR_F_FLOWER                       BIT(5)
>
> I don't like flow_dissector to have any user-specific bits. Note that
> the same dissection may be used not only from flower, but from other
> code as well (OVS). Flow dissector should not care who the caller is.

I agree with that, but unfortunately that's not how it works in
reality. As pointed out, flower has assumed flow_dissector semantics as
its uAPI, so we can't change flow dissector without considering this
one specific caller even if all other use cases of flow dissector
don't care.

If you don't like this approach, then please suggest an alternative
that will achieve the same effect.

Thanks,
Tom


* [net-next V3 PATCH 0/5] New bpf cpumap type for XDP_REDIRECT
From: Jesper Dangaard Brouer @ 2017-10-02 16:05 UTC (permalink / raw)
  To: netdev
  Cc: jakub.kicinski, Michael S. Tsirkin, pavel.odintsov, Jason Wang,
	mchan, John Fastabend, peter.waskiewicz.jr,
	Jesper Dangaard Brouer, Daniel Borkmann, Alexei Starovoitov,
	Andy Gospodarek

Introducing a new way to redirect XDP frames.  Notice how no driver
changes are necessary given the design of XDP_REDIRECT.

This redirect map type is called 'cpumap', as it allows redirecting
XDP frames to remote CPUs.  The remote CPU will do the SKB allocation
and start the network stack invocation on that CPU.

This is a scalability and isolation mechanism that allows separating
the early driver network XDP layer from the rest of the netstack, and
assigning dedicated CPUs for this stage.  The sysadmin controls/configures
the RX-CPU to NIC-RX queue (as usual) via procfs smp_affinity and how
many queues are configured via ethtool --set-channels.  Benchmarks
show that a single CPU can handle approx 11Mpps.  Thus, only assigning
two NIC RX-queues (and two CPUs) is sufficient for handling 10Gbit/s
wirespeed smallest packet 14.88Mpps.  Reducing the number of queues
has the advantage that more packets are "bulk" available per hard
interrupt[1].

[1] https://www.netdevconf.org/2.1/papers/BusyPollingNextGen.pdf
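
The queue-count arithmetic above can be checked with a short calculation.
This is a sketch with hypothetical helper names (wire_pps, cpus_needed);
it assumes standard Ethernet framing, where each frame occupies an extra
20 bytes on the wire (7 preamble + 1 start-of-frame delimiter + 12
inter-frame gap), and takes the approx 11Mpps per-CPU figure from the
text as an input.

```c
#include <stdint.h>

/* Frames per second at a given line rate for a given frame size. */
uint64_t wire_pps(uint64_t bits_per_sec, unsigned int frame_bytes)
{
	/* 20 bytes of per-frame wire overhead: preamble, SFD, inter-frame gap */
	return bits_per_sec / ((uint64_t)(frame_bytes + 20u) * 8u);
}

/* CPUs needed to sustain pps, rounding up. */
unsigned int cpus_needed(uint64_t pps, uint64_t per_cpu_pps)
{
	return (unsigned int)((pps + per_cpu_pps - 1) / per_cpu_pps);
}
```

For 10Gbit/s and 64-byte frames this gives 10^10 / (84 * 8) = 14880952
packets per second, i.e. the 14.88Mpps quoted above, and at 11Mpps per
CPU that rounds up to two CPUs.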

Use-cases:

1. End-host based pre-filtering for DDoS mitigation.  This is fast
   enough to allow software to see and filter all packets at wirespeed.
   Thus, no packets get silently dropped by hardware.

2. When NIC HW unevenly distributes packets across RX queues, this
   mechanism can be used for redistributing load across CPUs.  This
   usually happens when HW is unaware of a new protocol.  This
   resembles RPS (Receive Packet Steering), just faster, but with more
   responsibility placed on the BPF program for correct steering.

3. Auto-scaling or power saving via only activating the appropriate
   number of remote CPUs for handling the current load.  The cpumap
   tracepoints can function as a feedback loop for this purpose.

Patchset V3 based on net-next at:
 commit 0929567a7a2d ("samples/bpf: fix warnings in xdp_monitor_user")

---

Jesper Dangaard Brouer (5):
      bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
      bpf: XDP_REDIRECT enable use of cpumap
      bpf: cpumap xdp_buff to skb conversion and allocation
      bpf: cpumap add tracepoints
      samples/bpf: add cpumap sample program xdp_redirect_cpu


 include/linux/bpf.h                 |   31 ++
 include/linux/bpf_types.h           |    1 
 include/trace/events/xdp.h          |   80 ++++
 include/uapi/linux/bpf.h            |    1 
 kernel/bpf/Makefile                 |    1 
 kernel/bpf/cpumap.c                 |  683 +++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c                |    8 
 kernel/bpf/verifier.c               |    3 
 net/core/filter.c                   |   64 +++
 samples/bpf/Makefile                |    4 
 samples/bpf/xdp_redirect_cpu_kern.c |  619 ++++++++++++++++++++++++++++++++
 samples/bpf/xdp_redirect_cpu_user.c |  647 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h      |    1 
 13 files changed, 2130 insertions(+), 13 deletions(-)
 create mode 100644 kernel/bpf/cpumap.c
 create mode 100644 samples/bpf/xdp_redirect_cpu_kern.c
 create mode 100644 samples/bpf/xdp_redirect_cpu_user.c


