Netdev List


* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Willem de Bruijn @ 2017-08-25 22:44 UTC
  To: Michael S. Tsirkin
  Cc: Koichiro Den, Jason Wang, virtualization, Network Development
In-Reply-To: <20170824234551-mutt-send-email-mst@kernel.org>

>> >> > We don't enable network watchdog on virtio but we could and maybe
>> >> > should.
>> >>
>> >> Can you elaborate?
>> >
>> > The issue is that holding onto buffers for very long times makes guests
>> > think they are stuck. This is fundamentally because from guest point of
>> > view this is a NIC, so it is supposed to transmit things out in
>> > a timely manner. If host backs the virtual NIC by something that is not
>> > a NIC, with traffic shaping etc introducing unbounded latencies,
>> > guest will be confused.
>>
>> That assumes that guests are fragile in this regard. A linux guest
>> does not make such assumptions.
>
> Yes it does. Examples above:
>         > > - a single slow flow can occupy the whole ring, you will not
>         > >   be able to make any new buffers available for the fast flow

Oh, right. Though those are due to vring_desc pool exhaustion
rather than an upper bound on latency of any single packet.

Limiting the number of zerocopy packets in flight to some fraction
of the ring ensures that fast flows can always grab a slot. Running
out of ubuf_info slots reverts to copy, so indirectly does this. But
if I read it correctly, the zerocopy pool may be equal to or larger than
the descriptor pool. Should we refine the zcopy_used test

    (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx

to also return false if the number of outstanding ubuf_info is greater
than, say, vq->num >> 1?
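
For illustration, the refined test could be sketched in userspace C as
below. This is a minimal sketch, not vhost_net code: the struct and the
helpers zc_outstanding() and zc_may_use() are hypothetical names, and
only upend_idx, done_idx, UIO_MAXIOV and vq->num come from the
discussion above. It assumes the number of outstanding ubuf_info
entries is the circular distance from done_idx to upend_idx.

```c
#include <assert.h>
#include <stdbool.h>

#define UIO_MAXIOV 1024 /* same value as the kernel constant */

/* Hypothetical stand-in for the relevant virtqueue fields. */
struct zc_state {
	unsigned int upend_idx; /* next ubuf_info slot to hand out (assumed) */
	unsigned int done_idx;  /* oldest slot not yet reclaimed (assumed) */
	unsigned int num;       /* ring size, i.e. vq->num */
};

/* Slots handed out but not yet reclaimed, as a circular distance. */
static unsigned int zc_outstanding(const struct zc_state *s)
{
	return (s->upend_idx + UIO_MAXIOV - s->done_idx) % UIO_MAXIOV;
}

/* Returns true if the next packet may still be sent zerocopy. */
static bool zc_may_use(const struct zc_state *s)
{
	assert(s->upend_idx < UIO_MAXIOV && s->done_idx < UIO_MAXIOV);

	/* Existing test: the ubuf_info pool must not be full. */
	if ((s->upend_idx + 1) % UIO_MAXIOV == s->done_idx)
		return false;

	/* Proposed extra test: at most half the ring in flight. */
	return zc_outstanding(s) <= (s->num >> 1);
}
```

With a 256-entry ring, this sketch allows zerocopy while up to 128
entries are outstanding and falls back to copy beyond that, so at least
half of the descriptors stay available to other flows.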

* Re: [PATCH net-next v2 05/14] net: mvpp2: do not force the link mode
From: Russell King - ARM Linux @ 2017-08-25 22:43 UTC
  To: Antoine Tenart
  Cc: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement, thomas.petazzoni, nadavh, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170825144821.31129-6-antoine.tenart@free-electrons.com>

On Fri, Aug 25, 2017 at 04:48:12PM +0200, Antoine Tenart wrote:
> The link mode (speed, duplex) was forced based on what the phylib
> returns. This should not be the case, and only forced by ethtool
> functions manually. This patch removes the link mode enforcement from
> the phylib link_event callback.

So how does RGMII work (which has no in-band signalling between the PHY
and MAC)?

phylib expects the network driver to configure it according to the PHY
state at link_event time - I think you need to explain more why you
think that this is not necessary.

> 
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> ---
>  drivers/net/ethernet/marvell/mvpp2.c | 24 ------------------------
>  1 file changed, 24 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
> index fab231858a41..498a4969dc58 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -5741,30 +5741,10 @@ static void mvpp2_link_event(struct net_device *dev)
>  	struct mvpp2_port *port = netdev_priv(dev);
>  	struct phy_device *phydev = dev->phydev;
>  	int status_change = 0;
> -	u32 val;
>  
>  	if (phydev->link) {
>  		if ((port->speed != phydev->speed) ||
>  		    (port->duplex != phydev->duplex)) {
> -			u32 val;
> -
> -			val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> -			val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED |
> -				 MVPP2_GMAC_CONFIG_GMII_SPEED |
> -				 MVPP2_GMAC_CONFIG_FULL_DUPLEX |
> -				 MVPP2_GMAC_AN_SPEED_EN |
> -				 MVPP2_GMAC_AN_DUPLEX_EN);
> -
> -			if (phydev->duplex)
> -				val |= MVPP2_GMAC_CONFIG_FULL_DUPLEX;
> -
> -			if (phydev->speed == SPEED_1000)
> -				val |= MVPP2_GMAC_CONFIG_GMII_SPEED;
> -			else if (phydev->speed == SPEED_100)
> -				val |= MVPP2_GMAC_CONFIG_MII_SPEED;
> -
> -			writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> -
>  			port->duplex = phydev->duplex;
>  			port->speed  = phydev->speed;
>  		}
> @@ -5782,10 +5762,6 @@ static void mvpp2_link_event(struct net_device *dev)
>  
>  	if (status_change) {
>  		if (phydev->link) {
> -			val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> -			val |= (MVPP2_GMAC_FORCE_LINK_PASS |
> -				MVPP2_GMAC_FORCE_LINK_DOWN);
> -			writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
>  			mvpp2_egress_enable(port);
>  			mvpp2_ingress_enable(port);
>  		} else {
> -- 
> 2.13.5
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

* [ethtool 1/1] ethtool: Add DMA Coalescing support
From: Jeff Kirsher @ 2017-08-25 22:39 UTC
  To: linville
  Cc: Paul Greenwalt, netdev, nhorman, sassmann, jogreene, Jeff Kirsher

From: Paul Greenwalt <paul.greenwalt@intel.com>

Add support for DMA Coalescing (DMAC) hardware feature. The feature
allows synchronization of port DMA activity across ports in order to
optimize power consumption. DMAC is supported on igb and ixgbe
devices.

Support for enabling and configuring the DMAC watchdog timer is via
the ethtool coalesce [-c|-C] dmac option added with this patch.
Since DMAC is disabled when interrupt moderation is disabled, placing
the dmac option in the coalesce command provides related -c 'show'
information with a single command. The dmac option allows the users
to disable DMAC, or enable and set the DMAC watchdog timer. When in
coalescing mode, this timer starts counting down when the first
transaction is batched. The controller moves to the not coalescing
state when the watchdog timer reaches zero.

Set usage: ethtool -C DEVNAME dmac [0 | N]

Where 0 disables DMAC, and N is watchdog timer interval in usecs.
The device driver will check that N is within a valid range.

Example:
Enable and configure DMAC watchdog timer to 1000 usecs:
 # ethtool -C DEVNAME dmac 1000

Disable DMAC:
 # ethtool -C DEVNAME dmac 0

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 ethtool-copy.h | 2 ++
 ethtool.8.in   | 1 +
 ethtool.c      | 8 +++++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 06fc04c..4bb91eb 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -400,6 +400,7 @@ struct ethtool_modinfo {
  *	a TX interrupt, when the packet rate is above @pkt_rate_high.
  * @rate_sample_interval: How often to do adaptive coalescing packet rate
  *	sampling, measured in seconds.  Must not be zero.
+ * @dmac: How many usecs to store packets before moving to host memory.
  *
  * Each pair of (usecs, max_frames) fields specifies that interrupts
  * should be coalesced until
@@ -450,6 +451,7 @@ struct ethtool_coalesce {
 	__u32	tx_coalesce_usecs_high;
 	__u32	tx_max_coalesced_frames_high;
 	__u32	rate_sample_interval;
+	__u32	dmac;
 };
 
 /**
diff --git a/ethtool.8.in b/ethtool.8.in
index 7224b04..96b0a67 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -165,6 +165,7 @@ ethtool \- query or control network driver and hardware settings
 .BN tx\-usecs\-high
 .BN tx\-frames\-high
 .BN sample\-interval
+.BN dmac
 .HP
 .B ethtool \-g|\-\-show\-ring
 .I devname
diff --git a/ethtool.c b/ethtool.c
index ad18704..e1530f5 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1337,6 +1337,7 @@ static int dump_coalesce(const struct ethtool_coalesce *ecoal)
 		"sample-interval: %u\n"
 		"pkt-rate-low: %u\n"
 		"pkt-rate-high: %u\n"
+		"dmac: %u\n"
 		"\n"
 		"rx-usecs: %u\n"
 		"rx-frames: %u\n"
@@ -1362,6 +1363,7 @@ static int dump_coalesce(const struct ethtool_coalesce *ecoal)
 		ecoal->rate_sample_interval,
 		ecoal->pkt_rate_low,
 		ecoal->pkt_rate_high,
+		ecoal->dmac,
 
 		ecoal->rx_coalesce_usecs,
 		ecoal->rx_max_coalesced_frames,
@@ -2068,6 +2070,7 @@ static int do_scoalesce(struct cmd_context *ctx)
 	int coal_adaptive_rx_wanted = -1;
 	int coal_adaptive_tx_wanted = -1;
 	s32 coal_sample_rate_wanted = -1;
+	s32 coal_dmac_wanted = -1;
 	s32 coal_pkt_rate_low_wanted = -1;
 	s32 coal_pkt_rate_high_wanted = -1;
 	s32 coal_rx_usec_wanted = -1;
@@ -2093,6 +2096,8 @@ static int do_scoalesce(struct cmd_context *ctx)
 		  &ecoal.use_adaptive_tx_coalesce },
 		{ "sample-interval", CMDL_S32, &coal_sample_rate_wanted,
 		  &ecoal.rate_sample_interval },
+		{ "dmac", CMDL_S32, &coal_dmac_wanted,
+		  &ecoal.dmac },
 		{ "stats-block-usecs", CMDL_S32, &coal_stats_wanted,
 		  &ecoal.stats_block_coalesce_usecs },
 		{ "pkt-rate-low", CMDL_S32, &coal_pkt_rate_low_wanted,
@@ -4784,7 +4789,8 @@ static const struct option {
 	  "		[rx-frames-high N]\n"
 	  "		[tx-usecs-high N]\n"
 	  "		[tx-frames-high N]\n"
-	  "		[sample-interval N]\n" },
+	  "		[sample-interval N]\n"
+	  "		[dmac N]\n" },
 	{ "-g|--show-ring", 1, do_gring, "Query RX/TX ring parameters" },
 	{ "-G|--set-ring", 1, do_sring, "Set RX/TX ring parameters",
 	  "		[ rx N ]\n"
-- 
2.14.1

* Re: [RFC PATCH] net: limit maximum number of packets to mark with xmit_more
From: Alexander Duyck @ 2017-08-25 22:33 UTC
  To: Stephen Hemminger
  Cc: Waskiewicz Jr, Peter, Keller, Jacob E, netdev@vger.kernel.org
In-Reply-To: <20170825085816.3425a70c@xeon-e3>

On Fri, Aug 25, 2017 at 8:58 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri, 25 Aug 2017 15:36:22 +0000
> "Waskiewicz Jr, Peter" <peter.waskiewicz.jr@intel.com> wrote:
>
>> On 8/25/17 11:25 AM, Jacob Keller wrote:
>> > Under some circumstances, such as with many stacked devices, it is
>> > possible that dev_hard_start_xmit will bundle many packets together, and
>> > mark them all with xmit_more.
>> >
>> > Most drivers respond to xmit_more by skipping tail bumps on packet
>> > rings, or similar behavior as long as xmit_more is set. This is
>> > a performance win since it means drivers can avoid notifying hardware of
>> > new packets repeatedly, and thus avoid wasting unnecessary PCIe or other
>> > bandwidth.
>> >
>> > This use of xmit_more comes with a trade off because bundling too many
>> > packets can increase latency of the Tx packets. To avoid this, we should
>> > limit the maximum number of packets with xmit_more.
>> >
>> > Driver authors could modify their drivers to check for some determined
>> > limit, but this requires all drivers to be modified in order to gain
>> > advantage.
>> >
>> > Instead, add a sysctl "xmit_more_max" which can be used to configure the
>> > maximum number of xmit_more skbs to send in a sequence. This ensures
>> > that all drivers benefit, and allows system administrators the option to
>> > tune the value to their environment.
>> >
>> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> > ---
>> >
>> > Stray thoughts and further questions....
>> >
>> > Is this the right approach? Did I miss any other places where we should
>> > limit? Does the limit make sense? Should it be a per-device
>> > tuning knob instead of a global? Is 32 a good default?
>>
>> I actually like the idea of a per-device knob.  A xmit_more_max that's
>> global in a system with 1GbE devices along with a 25/50GbE or more just
>> doesn't make much sense to me.  Or having heterogeneous vendor devices
>> in the same system that have different HW behaviors could mask issues
>> with latency.
>>
>> This seems like another incarnation of possible buffer-bloat if the max
>> is too high...
>>
>> >
>> >   Documentation/sysctl/net.txt |  6 ++++++
>> >   include/linux/netdevice.h    |  2 ++
>> >   net/core/dev.c               | 10 +++++++++-
>> >   net/core/sysctl_net_core.c   |  7 +++++++
>> >   4 files changed, 24 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
>> > index b67044a2575f..3d995e8f4448 100644
>> > --- a/Documentation/sysctl/net.txt
>> > +++ b/Documentation/sysctl/net.txt
>> > @@ -230,6 +230,12 @@ netdev_max_backlog
>> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
>> >   receives packets faster than kernel can process them.
>> >
>> > +xmit_more_max
>> > +-------------
>> > +
>> > +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
>> > +indicates no limit.
>>
>> What defines "packet?"  MTU-sized packets, or payloads coming down from
>> the stack (e.g. TSO's)?
>
> xmit_more is only a hint to the device. The device driver should ignore it unless
> there are hardware advantages. The device driver is the place with HW specific
> knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this device).
>
> Anything that pushes that optimization out to the user is only useful for benchmarks
> and embedded devices.

Actually I think I might have an idea what is going on here and I
agree that this is probably something that needs to be fixed in the
drivers. Especially since the problem isn't so much the skbs but
descriptors in the descriptor ring.

If I am not mistaken the issue is most drivers will honor the
xmit_more unless the ring cannot enqueue another packet. The problem
is if the clean-up is occurring on a different CPU than transmit we
can cause the clean-up CPU/device DMA to go idle by not providing any
notifications to the device that new packets are present. What we
should probably do is look at adding another condition which is to
force us to flush the packet if we have used over half of the
descriptors in a given ring without notifying the device. Then that
way we can be filling half while the device is processing the other
half which should result in us operating smoothly.
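
A minimal sketch of that extra condition, in plain C for illustration
only. The struct tx_ring_sketch and the function tx_should_flush() are
hypothetical and not taken from any driver; the sketch assumes the
driver counts the descriptors it has queued since it last notified the
device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-ring bookkeeping; not taken from any real driver. */
struct tx_ring_sketch {
	unsigned int count;     /* total descriptors in the ring */
	unsigned int unflushed; /* descriptors queued since last doorbell */
};

/*
 * Account for a packet that used ndesc descriptors and decide whether
 * to notify the device now. Returns true when the tail should be bumped.
 */
static bool tx_should_flush(struct tx_ring_sketch *ring, unsigned int ndesc,
			    bool xmit_more, bool ring_full)
{
	assert(ring->count > 0);

	ring->unflushed += ndesc;

	/*
	 * Flush if the stack has nothing more queued, if the ring cannot
	 * take another packet, or if over half of the ring is pending
	 * without the device having been told about it.
	 */
	if (!xmit_more || ring_full || ring->unflushed > ring->count / 2) {
		ring->unflushed = 0;
		return true;
	}

	return false;
}
```

The third condition is the new one: it bounds the unnotified backlog
to roughly half the ring, so the device can work on one half while the
transmit path fills the other.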

- Alex

* Re: [Intel-wired-lan] how to submit fixes for i40e/i40evf?
From: Stefano Brivio @ 2017-08-25 22:28 UTC
  To: Alexander Duyck; +Cc: David S. Miller, Jeff Kirsher, Netdev, intel-wired-lan
In-Reply-To: <CAKgT0UcLAw0958zJ4o9V9Vyamvd1mBQX1+9GAaZw-hZubMqXnw@mail.gmail.com>

On Fri, 25 Aug 2017 15:10:08 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Fri, Aug 25, 2017 at 1:52 PM, Stefano Brivio <sbrivio@redhat.com> wrote:
>
> [...]
>
> > Once patches reach Intel's patchwork, will they need to wait for some
> > kind of periodically scheduled pull request process?  
> 
> Once in the patchwork they go through testing and after they have
> passed testing Jeff will try to push them to Dave.

Ok, the whole part above is clear, thanks a lot for clarifying.

> > I don't know if a process is actually defined at this level of detail,
> > but still I feel it's wrong that an obvious fix for a potential crash is
> > waiting in some sort of limbo for 10 days now. Sure, worse things
> > happen in the world, but I can't understand what this patch is waiting
> > for.  
> 
> Well in the case of your patch it was rejected as it didn't apply to
> Jeff's tree

It actually did when I posted it.

> and conflicted with Jacob Keller's patch. He submitted a v2 on Tuesday
> which has only been applied for a few days. Once it receives a
> "Tested-by:"

Which, if I understood correctly, only comes after some internal testing
process, right?

> it will be ready for submission assuming it passes testing.

Now that patch is again in a v2 pull request for net-next, without the
changes I suggested for the commit message. And the same exact code
changes were around for two weeks. IMHO there's room for improvement,
so to speak.

> I hope that helps to clarify things.

It did to some extent, and thanks again for that.

* [PATCH 2/2] drivers: net: xgene: Clean up all outstanding tx descriptors
From: Iyappan Subramanian @ 2017-08-25 22:23 UTC
  To: davem, netdev
  Cc: linux-arm-kernel, dnelson, qnguyen, patches, Iyappan Subramanian
In-Reply-To: <1503699810-12803-1-git-send-email-isubramanian@apm.com>

When xgene_enet is rmmod'd and there are still outstanding tx descriptors
that have been setup but have not completed, it is possible on the next
modprobe of the driver to receive the oldest of such tx descriptors. This
results in a kernel NULL pointer dereference.

This patch attempts to clean up (by tearing down) all outstanding tx
descriptors when the xgene_enet driver is being rmmod'd.

Given that, on the next modprobe it should be safe to ignore any such tx
descriptors received that map to a NULL skb pointer.

Additionally this patch removes redundant call to dev_kfree_skb_any() from
xgene_enet_setup_tx_desc(). The only caller of xgene_enet_setup_tx_desc()
will call dev_kfree_skb_any() upon return of an error. Nothing is gained by
calling it twice in a row.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Tested-by: Quan Nguyen <qnguyen@apm.com>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 120 +++++++++++++++++------
 1 file changed, 89 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 6e253d9..76e2903 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -237,22 +237,24 @@ static irqreturn_t xgene_enet_rx_irq(const int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
-				    struct xgene_enet_raw_desc *raw_desc)
+static dma_addr_t *xgene_get_frag_dma_array(struct xgene_enet_desc_ring *ring,
+					    u16 skb_index)
 {
-	struct xgene_enet_pdata *pdata = netdev_priv(cp_ring->ndev);
-	struct sk_buff *skb;
+	return &ring->frag_dma_addr[skb_index * MAX_SKB_FRAGS];
+}
+
+static void xgene_enet_teardown_tx_desc(struct xgene_enet_desc_ring *cp_ring,
+					struct xgene_enet_raw_desc *raw_desc,
+					struct xgene_enet_raw_desc *exp_desc,
+					struct sk_buff *skb,
+					u16 skb_index)
+{
+	dma_addr_t dma_addr, *frag_dma_addr;
 	struct device *dev;
 	skb_frag_t *frag;
-	dma_addr_t *frag_dma_addr;
-	u16 skb_index;
-	u8 mss_index;
-	u8 status;
 	int i;
 
-	skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0));
-	skb = cp_ring->cp_skb[skb_index];
-	frag_dma_addr = &cp_ring->frag_dma_addr[skb_index * MAX_SKB_FRAGS];
+	frag_dma_addr = xgene_get_frag_dma_array(cp_ring, skb_index);
 
 	dev = ndev_to_dev(cp_ring->ndev);
 	dma_unmap_single(dev, GET_VAL(DATAADDR, le64_to_cpu(raw_desc->m1)),
@@ -265,6 +267,36 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 			       DMA_TO_DEVICE);
 	}
 
+	if (exp_desc && GET_VAL(LL_BYTES_LSB, le64_to_cpu(raw_desc->m2))) {
+		dma_addr = GET_VAL(DATAADDR, le64_to_cpu(exp_desc->m2));
+		dma_unmap_single(dev, dma_addr, sizeof(u64) * MAX_EXP_BUFFS,
+				 DMA_TO_DEVICE);
+	}
+
+	dev_kfree_skb_any(skb);
+}
+
+static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
+				    struct xgene_enet_raw_desc *raw_desc,
+				    struct xgene_enet_raw_desc *exp_desc)
+{
+	struct xgene_enet_pdata *pdata = netdev_priv(cp_ring->ndev);
+	struct sk_buff *skb;
+	u16 skb_index;
+	u8 status;
+	u8 mss_index;
+
+	skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0));
+	skb = cp_ring->cp_skb[skb_index];
+	if (unlikely(!skb)) {
+		netdev_err(cp_ring->ndev, "completion skb is NULL\n");
+		return -EIO;
+	}
+	cp_ring->cp_skb[skb_index] = NULL;
+
+	xgene_enet_teardown_tx_desc(cp_ring, raw_desc, exp_desc, skb,
+				    skb_index);
+
 	if (GET_BIT(ET, le64_to_cpu(raw_desc->m3))) {
 		mss_index = GET_VAL(MSS, le64_to_cpu(raw_desc->m3));
 		spin_lock(&pdata->mss_lock);
@@ -279,12 +311,6 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 		cp_ring->tx_errors++;
 	}
 
-	if (likely(skb)) {
-		dev_kfree_skb_any(skb);
-	} else {
-		netdev_err(cp_ring->ndev, "completion skb is NULL\n");
-	}
-
 	return 0;
 }
 
@@ -412,11 +438,6 @@ static __le64 *xgene_enet_get_exp_bufs(struct xgene_enet_desc_ring *ring)
 	return exp_bufs;
 }
 
-static dma_addr_t *xgene_get_frag_dma_array(struct xgene_enet_desc_ring *ring)
-{
-	return &ring->cp_ring->frag_dma_addr[ring->tail * MAX_SKB_FRAGS];
-}
-
 static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
 				    struct sk_buff *skb)
 {
@@ -473,7 +494,8 @@ static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
 	for (i = nr_frags; i < 4 ; i++)
 		exp_desc[i ^ 1] = cpu_to_le64(LAST_BUFFER);
 
-	frag_dma_addr = xgene_get_frag_dma_array(tx_ring);
+	frag_dma_addr = xgene_get_frag_dma_array(tx_ring->cp_ring,
+						 tx_ring->tail);
 
 	for (i = 0, fidx = 0; split || (fidx < nr_frags); i++) {
 		if (!split) {
@@ -484,7 +506,7 @@ static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
 			pbuf_addr = skb_frag_dma_map(dev, frag, 0, size,
 						     DMA_TO_DEVICE);
 			if (dma_mapping_error(dev, pbuf_addr))
-				return -EINVAL;
+				goto err;
 
 			frag_dma_addr[fidx] = pbuf_addr;
 			fidx++;
@@ -539,10 +561,9 @@ static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
 		dma_addr = dma_map_single(dev, exp_bufs,
 					  sizeof(u64) * MAX_EXP_BUFFS,
 					  DMA_TO_DEVICE);
-		if (dma_mapping_error(dev, dma_addr)) {
-			dev_kfree_skb_any(skb);
-			return -EINVAL;
-		}
+		if (dma_mapping_error(dev, dma_addr))
+			goto err;
+
 		i = ell_bytes >> LL_BYTES_LSB_LEN;
 		exp_desc[2] = cpu_to_le64(SET_VAL(DATAADDR, dma_addr) |
 					  SET_VAL(LL_BYTES_MSB, i) |
@@ -558,6 +579,19 @@ static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
 	tx_ring->tail = tail;
 
 	return count;
+
+err:
+	dma_unmap_single(dev, GET_VAL(DATAADDR, le64_to_cpu(raw_desc->m1)),
+			 skb_headlen(skb),
+			 DMA_TO_DEVICE);
+
+	for (i = 0; i < fidx; i++) {
+		frag = &skb_shinfo(skb)->frags[i];
+		dma_unmap_page(dev, frag_dma_addr[i], skb_frag_size(frag),
+			       DMA_TO_DEVICE);
+	}
+
+	return -EINVAL;
 }
 
 static netdev_tx_t xgene_enet_start_xmit(struct sk_buff *skb,
@@ -828,7 +862,8 @@ static int xgene_enet_process_ring(struct xgene_enet_desc_ring *ring,
 		if (is_rx_desc(raw_desc)) {
 			ret = xgene_enet_rx_frame(ring, raw_desc, exp_desc);
 		} else {
-			ret = xgene_enet_tx_completion(ring, raw_desc);
+			ret = xgene_enet_tx_completion(ring, raw_desc,
+						       exp_desc);
 			is_completion = true;
 		}
 		xgene_enet_mark_desc_slot_empty(raw_desc);
@@ -1071,18 +1106,41 @@ static void xgene_enet_delete_desc_rings(struct xgene_enet_pdata *pdata)
 {
 	struct xgene_enet_desc_ring *buf_pool, *page_pool;
 	struct xgene_enet_desc_ring *ring;
-	int i;
+	struct xgene_enet_raw_desc *raw_desc, *exp_desc;
+	struct sk_buff *skb;
+	int i, j, k;
 
 	for (i = 0; i < pdata->txq_cnt; i++) {
 		ring = pdata->tx_ring[i];
 		if (ring) {
+			/*
+			 * Find any tx descriptors that were setup but never
+			 * completed, and teardown the setup.
+			 */
+			for (j = 0; j < ring->slots; j++) {
+				skb = ring->cp_ring->cp_skb[j];
+				if (likely(!skb))
+					continue;
+
+				raw_desc = &ring->raw_desc[j];
+				exp_desc = NULL;
+				if (GET_BIT(NV, le64_to_cpu(raw_desc->m0))) {
+					k = (j + 1) & (ring->slots - 1);
+					exp_desc = &ring->raw_desc[k];
+				}
+
+				xgene_enet_teardown_tx_desc(ring->cp_ring,
+							    raw_desc, exp_desc,
+							    skb, j);
+			}
+
 			xgene_enet_delete_ring(ring);
 			pdata->port_ops->clear(pdata, ring);
+
 			if (pdata->cq_cnt)
 				xgene_enet_delete_ring(ring->cp_ring);
 			pdata->tx_ring[i] = NULL;
 		}
-
 	}
 
 	for (i = 0; i < pdata->rxq_cnt; i++) {
-- 
2.7.4

* [PATCH 1/2] drivers: net: xgene: Correct probe sequence handling
From: Iyappan Subramanian @ 2017-08-25 22:23 UTC
  To: davem, netdev
  Cc: linux-arm-kernel, dnelson, qnguyen, patches, Iyappan Subramanian
In-Reply-To: <1503699810-12803-1-git-send-email-isubramanian@apm.com>

From: Quan Nguyen <qnguyen@apm.com>

The phy is connected at early stage of probe but not properly
disconnected if error occurs.  This patch fixes the issue.

Also changing the return type of xgene_enet_check_phy_handle(),
since this function always returns success.
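
The unwinding order this relies on can be sketched in plain C as
follows. This is an illustration only: struct probe_sketch,
probe_sketch_run() and the fail_at parameter are hypothetical, and the
sketch assumes the hardware init step is what creates the descriptor
rings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state flags standing in for real driver resources. */
struct probe_sketch {
	bool phy_connected;
	bool rings_created;
	bool registered;
};

/* fail_at selects the failing step: 0 none, 1 hw init, 2 register. */
static int probe_sketch_run(struct probe_sketch *p, int fail_at)
{
	assert(fail_at >= 0 && fail_at <= 2);

	p->phy_connected = true;        /* step 1: connect the PHY */

	if (fail_at == 1)               /* step 2: hardware init fails */
		goto err_phy;
	p->rings_created = true;

	if (fail_at == 2)               /* step 3: registration fails */
		goto err_rings;
	p->registered = true;

	return 0;

err_rings:
	p->rings_created = false;       /* undo step 2 */
err_phy:
	p->phy_connected = false;       /* undo step 1 */
	return -1;
}
```

Each label undoes one step and falls through to the labels for the
earlier steps, so a failure at any point releases exactly what was
acquired before it, in reverse order.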

Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 27 ++++++++++++------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 1d307f2..6e253d9 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1661,21 +1661,21 @@ static int xgene_enet_get_irqs(struct xgene_enet_pdata *pdata)
 	return 0;
 }
 
-static int xgene_enet_check_phy_handle(struct xgene_enet_pdata *pdata)
+static void xgene_enet_check_phy_handle(struct xgene_enet_pdata *pdata)
 {
 	int ret;
 
 	if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII)
-		return 0;
+		return;
 
 	if (!IS_ENABLED(CONFIG_MDIO_XGENE))
-		return 0;
+		return;
 
 	ret = xgene_enet_phy_connect(pdata->ndev);
 	if (!ret)
 		pdata->mdio_driver = true;
 
-	return 0;
+	return;
 }
 
 static void xgene_enet_gpiod_get(struct xgene_enet_pdata *pdata)
@@ -1779,10 +1779,6 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	if (ret)
 		return ret;
 
-	ret = xgene_enet_check_phy_handle(pdata);
-	if (ret)
-		return ret;
-
 	xgene_enet_gpiod_get(pdata);
 
 	pdata->clk = devm_clk_get(&pdev->dev, NULL);
@@ -2097,9 +2093,11 @@ static int xgene_enet_probe(struct platform_device *pdev)
 		goto err;
 	}
 
+	xgene_enet_check_phy_handle(pdata);
+
 	ret = xgene_enet_init_hw(pdata);
 	if (ret)
-		goto err;
+		goto err2;
 
 	link_state = pdata->mac_ops->link_state;
 	if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
@@ -2117,29 +2115,30 @@ static int xgene_enet_probe(struct platform_device *pdev)
 	spin_lock_init(&pdata->stats_lock);
 	ret = xgene_extd_stats_init(pdata);
 	if (ret)
-		goto err2;
+		goto err1;
 
 	xgene_enet_napi_add(pdata);
 	ret = register_netdev(ndev);
 	if (ret) {
 		netdev_err(ndev, "Failed to register netdev\n");
-		goto err2;
+		goto err1;
 	}
 
 	return 0;
 
-err2:
+err1:
 	/*
 	 * If necessary, free_netdev() will call netif_napi_del() and undo
 	 * the effects of xgene_enet_napi_add()'s calls to netif_napi_add().
 	 */
 
+	xgene_enet_delete_desc_rings(pdata);
+
+err2:
 	if (pdata->mdio_driver)
 		xgene_enet_phy_disconnect(pdata);
 	else if (phy_interface_mode_is_rgmii(pdata->phy_mode))
 		xgene_enet_mdio_remove(pdata);
-err1:
-	xgene_enet_delete_desc_rings(pdata);
 err:
 	free_netdev(ndev);
 	return ret;
-- 
2.7.4

* [PATCH 0/2] drivers: net: xgene: Misc bug fixes
From: Iyappan Subramanian @ 2017-08-25 22:23 UTC
  To: davem, netdev
  Cc: linux-arm-kernel, dnelson, qnguyen, patches, Iyappan Subramanian

This patch set,

     1. Adds call to PHY disconnect in the case of error
     2. Cleans up all outstanding TX descriptors when the driver is
	being rmmod'd

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
---

Iyappan Subramanian (1):
  drivers: net: xgene: Clean up all outstanding tx descriptors

Quan Nguyen (1):
  drivers: net: xgene: Correct probe sequence handling

 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 147 ++++++++++++++++-------
 1 file changed, 102 insertions(+), 45 deletions(-)

-- 
2.7.4

* Re: [Intel-wired-lan] how to submit fixes for i40e/i40evf?
From: Alexander Duyck @ 2017-08-25 22:10 UTC
  To: Stefano Brivio; +Cc: David S. Miller, Jeff Kirsher, Netdev, intel-wired-lan
In-Reply-To: <20170825225215.35fc6b59@elisabeth>

On Fri, Aug 25, 2017 at 1:52 PM, Stefano Brivio <sbrivio@redhat.com> wrote:
> Hi,
>
> As I'm currently preparing another fix for i40e, and the last one I
> submitted has been stuck for about two weeks now, I would like to ask
> some details about the process to submit fixes for i40e/i40evf drivers,
> before I do something wrong again.
>
> Do all the patches have to go through Intel's patchwork, no matter
> what's the perceived severity of the issue? Should I still submit them
> to netdev anyway?

It is preferred if they go through Intel's patchwork as they should go
through some additional testing and review that way.

> Which trees should I check before submitting a patch? Is it enough to
> check the master branch of jkirsher/net-queue.git and
> jkirsher/next-queue.git?

It depends on if you want to see the patches end up in Dave's net tree
or his net-next tree. If the fix is high enough priority to be
accepted into Dave's net tree then you can submit it against Jeff's
net-queue, otherwise it should be the next-queue. You might want to
check the next-queue for a fix if you are seeing an issue though, as
there is a chance of a fix for an issue you deem critical ending up
there if somebody else judged it differently.

> Once patches reach Intel's patchwork, will they need to wait for some
> kind of periodically scheduled pull request process?

Once in the patchwork they go through testing and after they have
passed testing Jeff will try to push them to Dave.

> I don't know if a process is actually defined at this level of detail,
> but still I feel it's wrong that an obvious fix for a potential crash is
> waiting in some sort of limbo for 10 days now. Sure, worse things
> happen in the world, but I can't understand what this patch is waiting
> for.

Well in the case of your patch it was rejected as it didn't apply to
Jeff's tree and conflicted with Jacob Keller's patch. He submitted a
v2 on Tuesday which has only been applied for a few days. Once it
receives a "Tested-by:" it will be ready for submission assuming it
passes testing.

> Any answer is appreciated. Thanks,
>
> --
> Stefano

I hope that helps to clarify things.

- Alex

* [PATCH net] ipv6: fix sparse warning on rt6i_node
From: Wei Wang @ 2017-08-25 22:03 UTC
  To: David Miller, netdev; +Cc: Eric Dumazet, Martin KaFai Lau, Wei Wang

From: Wei Wang <weiwan@google.com>

Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
  net/ipv6/route.c:1394:30: error: incompatible types in comparison
  expression (different address spaces)
  ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
  expression (different address spaces)

This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip6_fib.h |  2 +-
 net/ipv6/addrconf.c   |  2 +-
 net/ipv6/ip6_fib.c    | 11 +++++++----
 net/ipv6/route.c      |  3 ++-
 4 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index e9c59db92942..af509f801084 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -105,7 +105,7 @@ struct rt6_info {
 	 * the same cache line.
 	 */
 	struct fib6_table		*rt6i_table;
-	struct fib6_node		*rt6i_node;
+	struct fib6_node __rcu		*rt6i_node;
 
 	struct in6_addr			rt6i_gateway;
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3c46e9513a31..936e9ab4dda5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5556,7 +5556,7 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 		 * our DAD process, so we don't need
 		 * to do it again
 		 */
-		if (!(ifp->rt->rt6i_node))
+		if (!rcu_access_pointer(ifp->rt->rt6i_node))
 			ip6_ins_rt(ifp->rt);
 		if (ifp->idev->cnf.forwarding)
 			addrconf_join_anycast(ifp);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index a5ebf86f6be8..10b4b1f8b838 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -889,7 +889,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 
 		rt->dst.rt6_next = iter;
 		*ins = rt;
-		rt->rt6i_node = fn;
+		rcu_assign_pointer(rt->rt6i_node, fn);
 		atomic_inc(&rt->rt6i_ref);
 		if (!info->skip_notify)
 			inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
@@ -915,7 +915,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 			return err;
 
 		*ins = rt;
-		rt->rt6i_node = fn;
+		rcu_assign_pointer(rt->rt6i_node, fn);
 		rt->dst.rt6_next = iter->dst.rt6_next;
 		atomic_inc(&rt->rt6i_ref);
 		if (!info->skip_notify)
@@ -1480,8 +1480,9 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
 
 int fib6_del(struct rt6_info *rt, struct nl_info *info)
 {
+	struct fib6_node *fn = rcu_dereference_protected(rt->rt6i_node,
+				    lockdep_is_held(&rt->rt6i_table->tb6_lock));
 	struct net *net = info->nl_net;
-	struct fib6_node *fn = rt->rt6i_node;
 	struct rt6_info **rtp;
 
 #if RT6_DEBUG >= 2
@@ -1670,7 +1671,9 @@ static int fib6_clean_node(struct fib6_walker *w)
 			if (res) {
 #if RT6_DEBUG >= 2
 				pr_debug("%s: del failed: rt=%p@%p err=%d\n",
-					 __func__, rt, rt->rt6i_node, res);
+					 __func__, rt,
+					 rcu_access_pointer(rt->rt6i_node),
+					 res);
 #endif
 				continue;
 			}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a9d3564caf49..33629f2a0f9d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1383,7 +1383,8 @@ static void rt6_do_update_pmtu(struct rt6_info *rt, u32 mtu)
 static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 {
 	return !(rt->rt6i_flags & RTF_CACHE) &&
-		(rt->rt6i_flags & RTF_PCPU || rt->rt6i_node);
+		(rt->rt6i_flags & RTF_PCPU ||
+		 rcu_access_pointer(rt->rt6i_node));
 }
 
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
-- 
2.14.1.342.g6490525c54-goog

^ permalink raw reply related
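
The accessor discipline that the rt6i_node patch above applies can be sketched in userspace. This is only a model under stated assumptions: the `model_*` helpers are hypothetical names, C11 atomics stand in for the kernel's rcu_assign_pointer(), rcu_access_pointer() and rcu_dereference_protected(), and the `lock_held` argument stands in for the lockdep_is_held() condition. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the kernel structure; only its identity matters here. */
struct fib6_node {
	int fn_bit;
};

struct rt6_info {
	/* In the kernel the field is "struct fib6_node __rcu *rt6i_node".
	 * In this model _Atomic plays a comparable role: the field is only
	 * touched through the explicit accessors below.
	 */
	struct fib6_node *_Atomic rt6i_node;
};

/* Model of rcu_assign_pointer(): publish with release ordering. */
static void model_rcu_assign_pointer(struct rt6_info *rt, struct fib6_node *fn)
{
	atomic_store_explicit(&rt->rt6i_node, fn, memory_order_release);
}

/* Model of rcu_access_pointer(): fetch the value without dereferencing it,
 * e.g. for a NULL test.
 */
static struct fib6_node *model_rcu_access_pointer(struct rt6_info *rt)
{
	return atomic_load_explicit(&rt->rt6i_node, memory_order_relaxed);
}

/* Model of rcu_dereference_protected(): the caller states that the
 * update-side lock is held, so no read-side ordering is needed.
 */
static struct fib6_node *model_rcu_dereference_protected(struct rt6_info *rt,
							  int lock_held)
{
	assert(lock_held);
	return atomic_load_explicit(&rt->rt6i_node, memory_order_relaxed);
}

/* Mirrors the "!rcu_access_pointer(ifp->rt->rt6i_node)" style of test. */
static int model_rt6_is_linked(struct rt6_info *rt)
{
	return model_rcu_access_pointer(rt) != NULL;
}
```

The split mirrors the patch: writers publish through the assign helper, pure NULL checks use the access helper, and code that holds the table lock uses the protected variant.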

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
From: J. Bruce Fields @ 2017-08-25 22:01 UTC (permalink / raw)
  To: Vadim Lomovtsev
  Cc: trond.myklebust, anna.schumaker, jlayton, davem, linux-nfs,
	netdev, linux-kernel, pabeni
In-Reply-To: <1503050447-13362-1-git-send-email-vlomovts@redhat.com>

On Fri, Aug 18, 2017 at 06:00:47AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests, a kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> The race appears when the kernel accepts a connection via kernel_accept
> (which creates a new socket) and starts queuing ingress packets
> to the new socket. This happens in ksoftirq context, which runs concurrently
> on a different core while the new socket setup is not done yet.
> 
> The fix is to re-order the socket user data init sequence and add a NULL-ptr
> check before each callback call, along with barriers, to prevent the kernel crash.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

By the way, is there anything special about your setup that allows you
to reproduce this?  There's nothing special about connectathon tests, so
I'm just wondering why we haven't had a lot of reports of this.

--b.

> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..b6496f3 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_odata(sk);
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_owspace(sk);
> +		rmb();
> +		if (svsk->sk_owspace)
> +			svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
>  }
> @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> -		svsk->sk_odata(sk);
> +	if (svsk) { 
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> -		svsk->sk_ostate(sk);
> +		rmb();
> +		if (svsk->sk_ostate)
> +			svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
> -- 
> 1.8.3.1

^ permalink raw reply
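
The ordering idea in the quoted svcsock patch (fill in the saved callbacks first, publish sk_user_data last, and check for NULL on the reader side) can be modeled in userspace. This is a sketch under assumptions: the `model_*` types and functions are hypothetical, and C11 release/acquire operations are swapped in for the wmb()/rmb() pair used in the patch. It makes no claim about what the final upstream fix looks like.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_sock;
typedef void (*model_ready_fn)(struct model_sock *sk);

/* Stand-in for struct svc_sock: holds the saved original callback. */
struct model_svc_sock {
	model_ready_fn sk_odata;
};

/* Stand-in for struct sock: the published pointer plus a call counter. */
struct model_sock {
	struct model_svc_sock *_Atomic sk_user_data;
	int ready_calls;
};

/* Plays the role of the socket's original sk_data_ready callback. */
static void model_orig_data_ready(struct model_sock *sk)
{
	sk->ready_calls++;
}

/* Setup side: initialize every field of svsk first, publish it last.
 * The release store orders the field writes before the pointer store.
 */
static void model_setup_socket(struct model_sock *sk,
			       struct model_svc_sock *svsk)
{
	assert(svsk != NULL);
	svsk->sk_odata = model_orig_data_ready;
	atomic_store_explicit(&sk->sk_user_data, svsk, memory_order_release);
}

/* Callback side: the acquire load pairs with the release store, so a
 * non-NULL svsk is seen with its callbacks already filled in.
 * Returns 1 if the saved callback ran, 0 if the socket was not set up yet.
 */
static int model_data_ready(struct model_sock *sk)
{
	struct model_svc_sock *svsk =
		atomic_load_explicit(&sk->sk_user_data, memory_order_acquire);

	if (!svsk || !svsk->sk_odata)
		return 0;
	svsk->sk_odata(sk);
	return 1;
}
```

In this model an early softirq-style call that arrives before setup simply returns 0 instead of jumping to a NULL function pointer, which is the failure shown in the crash log ("PC is at 0x0").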

* [net-next v2 13/13] i40e: synchronize nvmupdate command and adminq subtask
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem
  Cc: Sudheer Mogilappagari, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>

During NVM update, the state machine can get into an unrecoverable state
because i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event, and state transitions
don't happen.

This issue existed before but surfaced after commit 373149fc99a0
("i40e: Decrease the scope of rtnl lock").

This fix adds locking around admin queue command and update of
state variables so that adminq_subtask will have accurate information
whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 6fdecd70dcbc..2cf7db2dc7cd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -753,6 +753,11 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
 		hw->nvmupd_state = I40E_NVMUPD_STATE_INIT;
 	}
 
+	/* Acquire lock to prevent race condition where adminq_task
+	 * can execute after i40e_nvmupd_nvm_read/write but before state
+	 * variables (nvm_wait_opcode, nvm_release_on_done) are updated
+	 */
+	mutex_lock(&hw->aq.arq_mutex);
 	switch (hw->nvmupd_state) {
 	case I40E_NVMUPD_STATE_INIT:
 		status = i40e_nvmupd_state_init(hw, cmd, bytes, perrno);
@@ -788,6 +793,7 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
 		*perrno = -ESRCH;
 		break;
 	}
+	mutex_unlock(&hw->aq.arq_mutex);
 	return status;
 }
 
-- 
2.14.1

^ permalink raw reply related
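
The locking described in the nvmupdate patch above can be illustrated with a small userspace model. It is a sketch under assumptions: the `model_*` names are hypothetical, a pthread mutex stands in for hw->aq.arq_mutex, and it assumes the completion side takes the same lock before reading the state variables. The opcode value used by callers is arbitrary.

```c
#include <pthread.h>

/* Stand-in for the parts of struct i40e_hw that matter to the race. */
struct model_hw {
	pthread_mutex_t arq_mutex;
	int last_sent_opcode;	/* models the admin queue command being issued */
	int nvm_wait_opcode;	/* what the completion handler should match */
};

/* Command side: issue the command and record the wait state under one
 * lock, so the completion handler cannot run in between the two steps.
 */
static void model_nvmupd_command(struct model_hw *hw, int opcode)
{
	pthread_mutex_lock(&hw->arq_mutex);
	hw->last_sent_opcode = opcode;
	hw->nvm_wait_opcode = opcode;
	pthread_mutex_unlock(&hw->arq_mutex);
}

/* Completion side: returns 1 if the completed opcode is the one being
 * waited for (and clears the wait state), 0 otherwise.
 */
static int model_adminq_subtask(struct model_hw *hw, int completed_opcode)
{
	int match;

	pthread_mutex_lock(&hw->arq_mutex);
	match = (hw->nvm_wait_opcode == completed_opcode);
	if (match)
		hw->nvm_wait_opcode = 0;
	pthread_mutex_unlock(&hw->arq_mutex);
	return match;
}
```

Without the lock in the command path, the completion side could observe last_sent_opcode already set while nvm_wait_opcode still holds a stale value, which corresponds to the "incorrect input" the commit message describes.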

* [net-next v2 11/13] i40e: use cpumask_copy instead of direct assignment
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

According to the header file cpumask.h, we shouldn't be directly copying
a cpumask_t, since it's a bitmap and might not be copied correctly. Let's
use the provided cpumask_copy() function instead.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1b3b681a8b1d..b0ccd3c2eec6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3449,7 +3449,7 @@ static void i40e_irq_affinity_notify(struct irq_affinity_notify *notify,
 	struct i40e_q_vector *q_vector =
 		container_of(notify, struct i40e_q_vector, affinity_notify);
 
-	q_vector->affinity_mask = *mask;
+	cpumask_copy(&q_vector->affinity_mask, mask);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 4dccb67e9268..0d87191b6bac 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -520,7 +520,7 @@ static void i40evf_irq_affinity_notify(struct irq_affinity_notify *notify,
 	struct i40e_q_vector *q_vector =
 		container_of(notify, struct i40e_q_vector, affinity_notify);
 
-	q_vector->affinity_mask = *mask;
+	cpumask_copy(&q_vector->affinity_mask, mask);
 }
 
 /**
-- 
2.14.1

^ permalink raw reply related
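
One way to picture why struct assignment of a bitmap can go wrong is a userspace model in which the source mask is backed by fewer words than the full structure type declares. This is a sketch under assumptions: `model_cpumask`, `MODEL_NR_CPUS` and `model_cpumask_copy()` are hypothetical, and the helper takes an explicit bit count, unlike the kernel's cpumask_copy(dst, src).

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NR_CPUS	256
#define MODEL_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_LONGS_FOR(bits) \
	(((bits) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* Full-size mask type, sized for the compile-time maximum. */
struct model_cpumask {
	unsigned long bits[MODEL_LONGS_FOR(MODEL_NR_CPUS)];
};

/* Copy only the words that cover nr_cpu_ids bits. A plain "*dst = *src"
 * would read sizeof(struct model_cpumask) bytes from the source, which is
 * too much when the source buffer is shorter than the full type.
 * Returns the number of bytes copied.
 */
static size_t model_cpumask_copy(struct model_cpumask *dst,
				 const unsigned long *src_bits,
				 size_t nr_cpu_ids)
{
	size_t bytes = MODEL_LONGS_FOR(nr_cpu_ids) * sizeof(unsigned long);

	memcpy(dst->bits, src_bits, bytes);
	return bytes;
}
```

With a source buffer of a single word and nr_cpu_ids of 8, the helper copies one word and leaves the rest of the destination untouched.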

* [net-next v2 12/13] i40e: prevent changing ITR if adaptive-rx/tx enabled
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

Currently the driver allows the user to change (or even disable)
interrupt moderation if adaptive-rx/tx is enabled when this should
not be the case.

Adaptive RX/TX will not respect the user's ITR settings so
allowing the user to change it is weird.  This bug would also
allow the user to disable interrupt moderation with adaptive-rx/tx
enabled which doesn't make much sense either.

This patch makes it such that if adaptive-rx/tx is enabled, the user
cannot make any manual adjustments to interrupt moderation.  It also
makes it so that if ITR is disabled but adaptive-rx/tx is then
enabled, ITR will be re-enabled.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 65 +++++++++++++++++---------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index a868c8d4fec9..05e89864f781 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2194,14 +2194,29 @@ static int __i40e_set_coalesce(struct net_device *netdev,
 			       int queue)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
+	u16 intrl_reg, cur_rx_itr, cur_tx_itr;
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
-	u16 intrl_reg;
 	int i;
 
 	if (ec->tx_max_coalesced_frames_irq || ec->rx_max_coalesced_frames_irq)
 		vsi->work_limit = ec->tx_max_coalesced_frames_irq;
 
+	if (queue < 0) {
+		cur_rx_itr = vsi->rx_rings[0]->rx_itr_setting;
+		cur_tx_itr = vsi->tx_rings[0]->tx_itr_setting;
+	} else if (queue < vsi->num_queue_pairs) {
+		cur_rx_itr = vsi->rx_rings[queue]->rx_itr_setting;
+		cur_tx_itr = vsi->tx_rings[queue]->tx_itr_setting;
+	} else {
+		netif_info(pf, drv, netdev, "Invalid queue value, queue range is 0 - %d\n",
+			   vsi->num_queue_pairs - 1);
+		return -EINVAL;
+	}
+
+	cur_tx_itr &= ~I40E_ITR_DYNAMIC;
+	cur_rx_itr &= ~I40E_ITR_DYNAMIC;
+
 	/* tx_coalesce_usecs_high is ignored, use rx-usecs-high instead */
 	if (ec->tx_coalesce_usecs_high != vsi->int_rate_limit) {
 		netif_info(pf, drv, netdev, "tx-usecs-high is not used, please program rx-usecs-high\n");
@@ -2214,15 +2229,34 @@ static int __i40e_set_coalesce(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	if (ec->rx_coalesce_usecs == 0) {
-		if (ec->use_adaptive_rx_coalesce)
-			netif_info(pf, drv, netdev, "rx-usecs=0, need to disable adaptive-rx for a complete disable\n");
-	} else if ((ec->rx_coalesce_usecs < (I40E_MIN_ITR << 1)) ||
-		   (ec->rx_coalesce_usecs > (I40E_MAX_ITR << 1))) {
-			netif_info(pf, drv, netdev, "Invalid value, rx-usecs range is 0-8160\n");
-			return -EINVAL;
+	if (ec->rx_coalesce_usecs != cur_rx_itr &&
+	    ec->use_adaptive_rx_coalesce) {
+		netif_info(pf, drv, netdev, "RX interrupt moderation cannot be changed if adaptive-rx is enabled.\n");
+		return -EINVAL;
+	}
+
+	if (ec->rx_coalesce_usecs > (I40E_MAX_ITR << 1)) {
+		netif_info(pf, drv, netdev, "Invalid value, rx-usecs range is 0-8160\n");
+		return -EINVAL;
 	}
 
+	if (ec->tx_coalesce_usecs != cur_tx_itr &&
+	    ec->use_adaptive_tx_coalesce) {
+		netif_info(pf, drv, netdev, "TX interrupt moderation cannot be changed if adaptive-tx is enabled.\n");
+		return -EINVAL;
+	}
+
+	if (ec->tx_coalesce_usecs > (I40E_MAX_ITR << 1)) {
+		netif_info(pf, drv, netdev, "Invalid value, tx-usecs range is 0-8160\n");
+		return -EINVAL;
+	}
+
+	if (ec->use_adaptive_rx_coalesce && !cur_rx_itr)
+		ec->rx_coalesce_usecs = I40E_MIN_ITR << 1;
+
+	if (ec->use_adaptive_tx_coalesce && !cur_tx_itr)
+		ec->tx_coalesce_usecs = I40E_MIN_ITR << 1;
+
 	intrl_reg = i40e_intrl_usec_to_reg(ec->rx_coalesce_usecs_high);
 	vsi->int_rate_limit = INTRL_REG_TO_USEC(intrl_reg);
 	if (vsi->int_rate_limit != ec->rx_coalesce_usecs_high) {
@@ -2230,27 +2264,14 @@ static int __i40e_set_coalesce(struct net_device *netdev,
 			   vsi->int_rate_limit);
 	}
 
-	if (ec->tx_coalesce_usecs == 0) {
-		if (ec->use_adaptive_tx_coalesce)
-			netif_info(pf, drv, netdev, "tx-usecs=0, need to disable adaptive-tx for a complete disable\n");
-	} else if ((ec->tx_coalesce_usecs < (I40E_MIN_ITR << 1)) ||
-		   (ec->tx_coalesce_usecs > (I40E_MAX_ITR << 1))) {
-			netif_info(pf, drv, netdev, "Invalid value, tx-usecs range is 0-8160\n");
-			return -EINVAL;
-	}
-
 	/* rx and tx usecs has per queue value. If user doesn't specify the queue,
 	 * apply to all queues.
 	 */
 	if (queue < 0) {
 		for (i = 0; i < vsi->num_queue_pairs; i++)
 			i40e_set_itr_per_queue(vsi, ec, i);
-	} else if (queue < vsi->num_queue_pairs) {
-		i40e_set_itr_per_queue(vsi, ec, queue);
 	} else {
-		netif_info(pf, drv, netdev, "Invalid queue value, queue range is 0 - %d\n",
-			   vsi->num_queue_pairs - 1);
-		return -EINVAL;
+		i40e_set_itr_per_queue(vsi, ec, queue);
 	}
 
 	return 0;
-- 
2.14.1

^ permalink raw reply related
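
The validation order that the ITR patch above introduces can be condensed into a small userspace function. This is a sketch under assumptions: `model_check_coalesce()` is a hypothetical helper, it handles one direction (rx or tx) at a time, and the 8160 upper bound is taken from the range quoted in the patch's log messages.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_USECS	8160	/* upper bound quoted in the patch messages */

/* Returns 0 if the request is acceptable, -EINVAL otherwise.
 *
 * requested_usecs: value the user asked for
 * current_usecs:   current setting with the dynamic flag masked off
 * adaptive:        whether adaptive moderation is requested
 */
static int model_check_coalesce(unsigned int requested_usecs,
				unsigned int current_usecs,
				bool adaptive)
{
	/* With adaptive moderation on, manual changes are rejected. */
	if (adaptive && requested_usecs != current_usecs)
		return -EINVAL;

	/* Range check; 0 (disabled) is allowed when adaptive is off. */
	if (requested_usecs > MODEL_MAX_USECS)
		return -EINVAL;

	return 0;
}
```

The model leaves out the last step of the patch, where a disabled ITR is re-enabled with a minimum value once adaptive mode is turned on.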

* [net-next v2 10/13] i40evf: use netdev variable in reset task
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

If we're going to bother initializing a variable to reference it we might
as well use it.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 4a36c2ee3837..4dccb67e9268 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1879,7 +1879,7 @@ static void i40evf_reset_task(struct work_struct *work)
 	}
 
 continue_reset:
-	if (netif_running(adapter->netdev)) {
+	if (netif_running(netdev)) {
 		netif_carrier_off(netdev);
 		netif_tx_stop_all_queues(netdev);
 		adapter->link_up = false;
@@ -1947,7 +1947,7 @@ static void i40evf_reset_task(struct work_struct *work)
 	return;
 reset_err:
 	dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
-	i40evf_close(adapter->netdev);
+	i40evf_close(netdev);
 }
 
 /**
-- 
2.14.1

^ permalink raw reply related

* [net-next v2 09/13] i40e/i40evf: rename vf_offload_flags to vf_cap_flags in struct virtchnl_vf_resource
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Stefan Assmann, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

The current name of vf_offload_flags indicates that the bitmap is
limited to offload related features. Make this more generic by renaming
it to vf_cap_flags, which allows for other capabilities besides
offloading to be added.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 22 +++++++++++-----------
 drivers/net/ethernet/intel/i40evf/i40e_common.c    |  2 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h         | 10 +++++-----
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 12 ++++++------
 include/linux/avf/virtchnl.h                       |  4 ++--
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 3ef67dc094fc..057c77be96e4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1528,39 +1528,39 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
 				  VIRTCHNL_VF_OFFLOAD_RSS_REG |
 				  VIRTCHNL_VF_OFFLOAD_VLAN;
 
-	vfres->vf_offload_flags = VIRTCHNL_VF_OFFLOAD_L2;
+	vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
 	vsi = pf->vsi[vf->lan_vsi_idx];
 	if (!vsi->info.pvid)
-		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_VLAN;
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_VLAN;
 
 	if (i40e_vf_client_capable(pf, vf->vf_id) &&
 	    (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_IWARP)) {
-		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_IWARP;
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_IWARP;
 		set_bit(I40E_VF_STATE_IWARPENA, &vf->vf_states);
 	}
 
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
-		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;
 	} else {
 		if ((pf->hw_features & I40E_HW_RSS_AQ_CAPABLE) &&
 		    (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_AQ))
-			vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RSS_AQ;
+			vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_AQ;
 		else
-			vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RSS_REG;
+			vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_REG;
 	}
 
 	if (pf->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE) {
 		if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2)
-			vfres->vf_offload_flags |=
+			vfres->vf_cap_flags |=
 				VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2;
 	}
 
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_ENCAP)
-		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_ENCAP;
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_ENCAP;
 
 	if ((pf->hw_features & I40E_HW_OUTER_UDP_CSUM_CAPABLE) &&
 	    (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM))
-		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM;
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM;
 
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RX_POLLING) {
 		if (pf->flags & I40E_FLAG_MFP_ENABLED) {
@@ -1570,12 +1570,12 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
 			aq_ret = I40E_ERR_PARAM;
 			goto err;
 		}
-		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RX_POLLING;
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RX_POLLING;
 	}
 
 	if (pf->hw_features & I40E_HW_WB_ON_ITR_CAPABLE) {
 		if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
-			vfres->vf_offload_flags |=
+			vfres->vf_cap_flags |=
 					VIRTCHNL_VF_OFFLOAD_WB_ON_ITR;
 	}
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 1dd1938f594f..d69c2e44cd1a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -1104,7 +1104,7 @@ void i40e_vf_parse_hw_config(struct i40e_hw *hw,
 	hw->dev_caps.num_rx_qp = msg->num_queue_pairs;
 	hw->dev_caps.num_tx_qp = msg->num_queue_pairs;
 	hw->dev_caps.num_msix_vectors_vf = msg->max_vectors;
-	hw->dev_caps.dcb = msg->vf_offload_flags &
+	hw->dev_caps.dcb = msg->vf_cap_flags &
 			   VIRTCHNL_VF_OFFLOAD_L2;
 	hw->dev_caps.fcoe = 0;
 	for (i = 0; i < msg->num_vsis; i++) {
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index 7f905368fc93..d310544c6c6e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -277,19 +277,19 @@ struct i40evf_adapter {
 	enum virtchnl_link_speed link_speed;
 	enum virtchnl_ops current_op;
 #define CLIENT_ALLOWED(_a) ((_a)->vf_res ? \
-			    (_a)->vf_res->vf_offload_flags & \
+			    (_a)->vf_res->vf_cap_flags & \
 				VIRTCHNL_VF_OFFLOAD_IWARP : \
 			    0)
 #define CLIENT_ENABLED(_a) ((_a)->cinst)
 /* RSS by the PF should be preferred over RSS via other methods. */
-#define RSS_PF(_a) ((_a)->vf_res->vf_offload_flags & \
+#define RSS_PF(_a) ((_a)->vf_res->vf_cap_flags & \
 		    VIRTCHNL_VF_OFFLOAD_RSS_PF)
-#define RSS_AQ(_a) ((_a)->vf_res->vf_offload_flags & \
+#define RSS_AQ(_a) ((_a)->vf_res->vf_cap_flags & \
 		    VIRTCHNL_VF_OFFLOAD_RSS_AQ)
-#define RSS_REG(_a) (!((_a)->vf_res->vf_offload_flags & \
+#define RSS_REG(_a) (!((_a)->vf_res->vf_cap_flags & \
 		       (VIRTCHNL_VF_OFFLOAD_RSS_AQ | \
 			VIRTCHNL_VF_OFFLOAD_RSS_PF)))
-#define VLAN_ALLOWED(_a) ((_a)->vf_res->vf_offload_flags & \
+#define VLAN_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
 			  VIRTCHNL_VF_OFFLOAD_VLAN)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 8603911cc550..4a36c2ee3837 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1418,7 +1418,7 @@ static int i40evf_init_rss(struct i40evf_adapter *adapter)
 
 	if (!RSS_PF(adapter)) {
 		/* Enable PCTYPES for RSS, TCP/UDP with IPv4/IPv6 */
-		if (adapter->vf_res->vf_offload_flags &
+		if (adapter->vf_res->vf_cap_flags &
 		    VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2)
 			adapter->hena = I40E_DEFAULT_RSS_HENA_EXPANDED;
 		else
@@ -2371,7 +2371,7 @@ static netdev_features_t i40evf_fix_features(struct net_device *netdev,
 	struct i40evf_adapter *adapter = netdev_priv(netdev);
 
 	features &= ~I40EVF_VLAN_FEATURES;
-	if (adapter->vf_res->vf_offload_flags & VIRTCHNL_VF_OFFLOAD_VLAN)
+	if (adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_VLAN)
 		features |= I40EVF_VLAN_FEATURES;
 	return features;
 }
@@ -2458,7 +2458,7 @@ int i40evf_process_config(struct i40evf_adapter *adapter)
 	/* advertise to stack only if offloads for encapsulated packets is
 	 * supported
 	 */
-	if (vfres->vf_offload_flags & VIRTCHNL_VF_OFFLOAD_ENCAP) {
+	if (vfres->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_ENCAP) {
 		hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL	|
 				   NETIF_F_GSO_GRE		|
 				   NETIF_F_GSO_GRE_CSUM		|
@@ -2468,7 +2468,7 @@ int i40evf_process_config(struct i40evf_adapter *adapter)
 				   NETIF_F_GSO_PARTIAL		|
 				   0;
 
-		if (!(vfres->vf_offload_flags &
+		if (!(vfres->vf_cap_flags &
 		      VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM))
 			netdev->gso_partial_features |=
 				NETIF_F_GSO_UDP_TUNNEL_CSUM;
@@ -2496,7 +2496,7 @@ int i40evf_process_config(struct i40evf_adapter *adapter)
 	adapter->vsi.work_limit = I40E_DEFAULT_IRQ_WORK;
 	vsi->netdev = adapter->netdev;
 	vsi->qs_handle = adapter->vsi_res->qset_handle;
-	if (vfres->vf_offload_flags & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
+	if (vfres->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
 		adapter->rss_key_size = vfres->rss_key_size;
 		adapter->rss_lut_size = vfres->rss_lut_size;
 	} else {
@@ -2664,7 +2664,7 @@ static void i40evf_init_task(struct work_struct *work)
 	if (err)
 		goto err_sw_init;
 	i40evf_map_rings_to_vectors(adapter);
-	if (adapter->vf_res->vf_offload_flags &
+	if (adapter->vf_res->vf_cap_flags &
 	    VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
 		adapter->flags |= I40EVF_FLAG_WB_ON_ITR_CAPABLE;
 
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index c893b9520a67..becfca2ae94e 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -223,7 +223,7 @@ struct virtchnl_vsi_resource {
 
 VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 
-/* VF offload flags
+/* VF capability flags
  * VIRTCHNL_VF_OFFLOAD_L2 flag is inclusive of base mode L2 offloads including
  * TX/RX Checksum offloading and TSO for non-tunnelled packets.
  */
@@ -251,7 +251,7 @@ struct virtchnl_vf_resource {
 	u16 max_vectors;
 	u16 max_mtu;
 
-	u32 vf_offload_flags;
+	u32 vf_cap_flags;
 	u32 rss_key_size;
 	u32 rss_lut_size;
 
-- 
2.14.1


* [net-next v2 08/13] i40e: move check for avoiding VID=0 filters into i40e_vsi_add_vlan
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

In i40e_vsi_add_vlan we treat attempting to add VID=0 as an error,
because it does not do what the caller might expect. We already special
case VID=0 in i40e_vlan_rx_add_vid so that we avoid this error when
adding the VLAN.

This special casing is necessary so that we do not add the VLAN=0 filter
since we don't want to stop receiving untagged traffic. Unfortunately,
not all callers of i40e_vsi_add_vlan are aware of this, including when
we add VLANs from a VF device.

Rather than special casing every single caller of i40e_vsi_add_vlan,
let's just move this check internally. This makes the code simpler
because the caller does not need to be aware of how VLAN=0 is special,
and we don't forget to add this check in new places.

This fixes a harmless error message displaying when adding a VLAN from
within a VF. The message was meaningless but there is no reason to
confuse end users and system administrators, and this is now avoided.
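
As a rough illustration of the resulting check order, here is a minimal
userspace sketch. The demo_* names and the filter counter are hypothetical
stand-ins and not part of the driver; only the ordering of the pvid and
VID=0 checks mirrors the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the VSI; only what the check needs. */
struct demo_vsi {
	uint16_t pvid;		/* non-zero when a port VLAN is configured */
	unsigned int filters;	/* number of VLAN filters "programmed" */
};

/* Hypothetical stand-in for programming a hardware VLAN filter. */
int demo_add_filter(struct demo_vsi *vsi, uint16_t vid)
{
	(void)vid;
	vsi->filters++;
	return 0;
}

/* Same check order as the patched function: pvid first, then VID=0. */
int demo_vsi_add_vlan(struct demo_vsi *vsi, uint16_t vid)
{
	if (vsi->pvid)
		return -EINVAL;

	/* VID=0 is silently accepted and no filter is added. */
	if (!vid)
		return 0;

	return demo_add_filter(vsi, vid);
}

/* Usage: callers no longer need their own "if (vid)" guard. */
void demo_usage(void)
{
	struct demo_vsi vsi = { .pvid = 0, .filters = 0 };

	assert(demo_vsi_add_vlan(&vsi, 0) == 0);
	assert(vsi.filters == 0);
}
```

With this shape, every caller (including the VF path) gets the VID=0
behaviour without having to know about it.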

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6a59d9367a2a..1b3b681a8b1d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2595,9 +2595,20 @@ int i40e_vsi_add_vlan(struct i40e_vsi *vsi, u16 vid)
 {
 	int err;
 
-	if (!vid || vsi->info.pvid)
+	if (vsi->info.pvid)
 		return -EINVAL;
 
+	/* The network stack will attempt to add VID=0, with the intention to
+	 * receive priority tagged packets with a VLAN of 0. Our HW receives
+	 * these packets by default when configured to receive untagged
+	 * packets, so we don't need to add a filter for this case.
+	 * Additionally, HW interprets adding a VID=0 filter as meaning to
+	 * receive *only* tagged traffic and stops receiving untagged traffic.
+	 * Thus, we do not want to actually add a filter for VID=0
+	 */
+	if (!vid)
+		return 0;
+
 	/* Locked once because all functions invoked below iterates list*/
 	spin_lock_bh(&vsi->mac_filter_hash_lock);
 	err = i40e_add_vlan_all_mac(vsi, vid);
@@ -2674,15 +2685,7 @@ static int i40e_vlan_rx_add_vid(struct net_device *netdev,
 	if (vid >= VLAN_N_VID)
 		return -EINVAL;
 
-	/* If the network stack called us with vid = 0 then
-	 * it is asking to receive priority tagged packets with
-	 * vlan id 0.  Our HW receives them by default when configured
-	 * to receive untagged packets so there is no need to add an
-	 * extra filter for vlan 0 tagged packets.
-	 */
-	if (vid)
-		ret = i40e_vsi_add_vlan(vsi, vid);
-
+	ret = i40e_vsi_add_vlan(vsi, vid);
 	if (!ret)
 		set_bit(vid, vsi->active_vlans);
 
-- 
2.14.1


* [net-next v2 05/13] i40e: remove workaround for Open Firmware MAC address
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Since commit b499ffb0a22c ("i40e: Look up MAC address in Open Firmware
or IDPROM"), we've had support for obtaining the MAC address
from Open Firmware or IDPROM.

This code relied on sending the Open Firmware address directly to the
device firmware instead of relying on our MAC/VLAN filter list. Thus,
a workaround was introduced in commit b1b15df59232 ("i40e: Explicitly
write platform-specific mac address after PF reset").

We refactored the Open Firmware address enablement code in the ill-named
commit 41c4c2b50d52 ("i40e: allow look-up of MAC address from Open
Firmware or IDPROM").

Since this refactor, we no longer even set I40E_FLAG_PF_MAC. Further, we
don't need this workaround, because we actually store the MAC address
as part of the MAC/VLAN filter hash. Thus, we will restore the address
correctly upon reset.

The refactor above failed to revert the workaround, so do that now.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |  1 -
 drivers/net/ethernet/intel/i40e/i40e_main.c | 60 -----------------------------
 2 files changed, 61 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f07217b15ffd..d0c1bf5441d8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -445,7 +445,6 @@ struct i40e_pf {
 #define I40E_FLAG_VEB_STATS_ENABLED		BIT_ULL(37)
 #define I40E_FLAG_LINK_POLLING_ENABLED		BIT_ULL(39)
 #define I40E_FLAG_VEB_MODE_ENABLED		BIT_ULL(40)
-#define I40E_FLAG_PF_MAC			BIT_ULL(50)
 #define I40E_FLAG_TRUE_PROMISC_SUPPORT		BIT_ULL(51)
 #define I40E_FLAG_CLIENT_RESET			BIT_ULL(54)
 #define I40E_FLAG_TEMP_LINK_POLLING		BIT_ULL(55)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d7248e5c5f01..4d1eb0c19028 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2713,44 +2713,6 @@ static int i40e_vlan_rx_kill_vid(struct net_device *netdev,
 	return 0;
 }
 
-/**
- * i40e_macaddr_init - explicitly write the mac address filters
- *
- * @vsi: pointer to the vsi
- * @macaddr: the MAC address
- *
- * This is needed when the macaddr has been obtained by other
- * means than the default, e.g., from Open Firmware or IDPROM.
- * Returns 0 on success, negative on failure
- **/
-static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr)
-{
-	int ret;
-	struct i40e_aqc_add_macvlan_element_data element;
-
-	ret = i40e_aq_mac_address_write(&vsi->back->hw,
-					I40E_AQC_WRITE_TYPE_LAA_WOL,
-					macaddr, NULL);
-	if (ret) {
-		dev_info(&vsi->back->pdev->dev,
-			 "Addr change for VSI failed: %d\n", ret);
-		return -EADDRNOTAVAIL;
-	}
-
-	memset(&element, 0, sizeof(element));
-	ether_addr_copy(element.mac_addr, macaddr);
-	element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH);
-	ret = i40e_aq_add_macvlan(&vsi->back->hw, vsi->seid, &element, 1, NULL);
-	if (ret) {
-		dev_info(&vsi->back->pdev->dev,
-			 "add filter failed err %s aq_err %s\n",
-			 i40e_stat_str(&vsi->back->hw, ret),
-			 i40e_aq_str(&vsi->back->hw,
-				     vsi->back->hw.aq.asq_last_status));
-	}
-	return ret;
-}
-
 /**
  * i40e_restore_vlan - Reinstate vlans when vsi/netdev comes back up
  * @vsi: the vsi being brought back up
@@ -3203,19 +3165,8 @@ static void i40e_vsi_config_dcb_rings(struct i40e_vsi *vsi)
  **/
 static void i40e_set_vsi_rx_mode(struct i40e_vsi *vsi)
 {
-	struct i40e_pf *pf = vsi->back;
-	int err;
-
 	if (vsi->netdev)
 		i40e_set_rx_mode(vsi->netdev);
-
-	if (!!(pf->flags & I40E_FLAG_PF_MAC)) {
-		err = i40e_macaddr_init(vsi, pf->hw.mac.addr);
-		if (err) {
-			dev_warn(&pf->pdev->dev,
-				 "could not set up macaddr; err %d\n", err);
-		}
-	}
 }
 
 /**
@@ -10400,17 +10351,6 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 	switch (vsi->type) {
 	/* setup the netdev if needed */
 	case I40E_VSI_MAIN:
-		/* Apply relevant filters if a platform-specific mac
-		 * address was selected.
-		 */
-		if (!!(pf->flags & I40E_FLAG_PF_MAC)) {
-			ret = i40e_macaddr_init(vsi, pf->hw.mac.addr);
-			if (ret) {
-				dev_warn(&pf->pdev->dev,
-					 "could not set up macaddr; err %d\n",
-					 ret);
-			}
-		}
 	case I40E_VSI_VMDQ2:
 		ret = i40e_config_netdev(vsi);
 		if (ret)
-- 
2.14.1


* [net-next v2 06/13] i40e: Detect ATR HW Evict NVM issue and disable the feature
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem
  Cc: Anjali Singhai Jain, netdev, nhorman, sassmann, jogreene,
	Alice Michael, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Anjali Singhai Jain <anjali.singhai@intel.com>

This patch fixes a problem with the HW ATR eviction feature where the
NVM setting was incorrect.  This patch detects the issue on X722
adapters and disables the feature if the NVM setting is incorrect.

Without this patch, the HW ATR Evict feature does not work on broken
NVMs, and the problem goes undetected.  If the HW ATR Evict feature is
disabled, the SW Eviction feature will take effect.
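
A minimal userspace sketch of the detection logic follows. The register
read is replaced by a plain argument and the DEMO_* names are hypothetical;
only the expected value 0xc03 and the "clear the capability bit on
mismatch" behaviour are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Expected register value, as used by the patch. */
#define DEMO_FDEVICT_PCTYPE_DEFAULT	0xc03u
/* Hypothetical capability bit, standing in for the driver's flag. */
#define DEMO_HW_ATR_EVICT_CAPABLE	(1ULL << 2)

/* Return hw_features with the evict capability cleared if the value
 * read from the NVM-programmed register is not the expected default.
 */
uint64_t demo_check_atr_evict(uint64_t hw_features, uint32_t fdevictena)
{
	if (fdevictena != DEMO_FDEVICT_PCTYPE_DEFAULT)
		hw_features &= ~DEMO_HW_ATR_EVICT_CAPABLE;

	return hw_features;
}

/* Usage: a good NVM keeps the capability, a broken one loses it. */
void demo_usage(void)
{
	uint64_t feat = DEMO_HW_ATR_EVICT_CAPABLE;

	assert(demo_check_atr_evict(feat, 0xc03u) == feat);
	assert(demo_check_atr_evict(feat, 0x0u) == 0);
}
```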

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 4d1eb0c19028..6a59d9367a2a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8963,6 +8963,14 @@ static int i40e_sw_init(struct i40e_pf *pf)
 				    I40E_HW_PTP_L4_CAPABLE |
 				    I40E_HW_WOL_MC_MAGIC_PKT_WAKE |
 				    I40E_HW_OUTER_UDP_CSUM_CAPABLE);
+
+#define I40E_FDEVICT_PCTYPE_DEFAULT 0xc03
+		if (rd32(&pf->hw, I40E_GLQF_FDEVICTENA(1)) !=
+		    I40E_FDEVICT_PCTYPE_DEFAULT) {
+			dev_warn(&pf->pdev->dev,
+				 "FD EVICT PCTYPES are not right, disable FD HW EVICT\n");
+			pf->hw_features &= ~I40E_HW_ATR_EVICT_CAPABLE;
+		}
 	} else if ((pf->hw.aq.api_maj_ver > 1) ||
 		   ((pf->hw.aq.api_maj_ver == 1) &&
 		    (pf->hw.aq.api_min_ver > 4))) {
-- 
2.14.1


* [net-next v2 04/13] i40e: separate hw_features from runtime changing flags
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

The number of flags found in pf->flags has grown quite large, and there
are a lot of different types of flags. Most of the flags are simply
hardware features which are enabled on some firmware or some MAC types.
Other flags are dynamic run-time flags which enable or disable certain
features of the driver.

Separate these two types of flags into pf->hw_features and pf->flags.
The hw_features list will contain a set of features which are enabled at
init time. This will not contain toggles or otherwise dynamically
changing features. These flags should not need atomic protections, as
they will be set once during init and then be essentially read only.

Everything else will remain in the flags variable. These flags may be
modified at any time during run time. A future patch may wish to convert
these flags into set_bit/clear_bit/test_bit or similar approach to
ensure atomic correctness.

The I40E_FLAG_MFP_ENABLED flag may be a good fit for hw_features but
currently is used by ethtool in the private flags settings, and thus has
been left as part of flags.

Additionally, I40E_FLAG_DCB_CAPABLE may be a good fit for the
hw_features but this patch has not tried to untangle it yet.
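
To illustrate the split, here is a small userspace sketch. All demo_* and
DEMO_* names are hypothetical and the bit positions are arbitrary; the
point is only that the capability bit is written once at init, while the
enable bit in flags can be toggled later and is masked by the capability.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BIT_ULL(n)			(1ULL << (n))

/* Set once at init: what the hardware/firmware is able to do. */
#define DEMO_HW_ATR_EVICT_CAPABLE	DEMO_BIT_ULL(2)
/* May change at run time: what is currently turned on. */
#define DEMO_FLAG_HW_ATR_EVICT_ENABLED	DEMO_BIT_ULL(0)

struct demo_pf {
	uint64_t hw_features;	/* effectively read-only after init */
	uint64_t flags;		/* dynamic run-time toggles */
};

/* Init-time setup: record the capability, then enable by default. */
void demo_sw_init(struct demo_pf *pf, int is_x722)
{
	if (is_x722)
		pf->hw_features |= DEMO_HW_ATR_EVICT_CAPABLE;

	if (pf->hw_features & DEMO_HW_ATR_EVICT_CAPABLE)
		pf->flags |= DEMO_FLAG_HW_ATR_EVICT_ENABLED;
}

/* Run-time toggle: only honour the request on capable hardware. */
void demo_set_priv_flags(struct demo_pf *pf, uint64_t requested)
{
	pf->flags = requested;

	if (!(pf->hw_features & DEMO_HW_ATR_EVICT_CAPABLE))
		pf->flags &= ~DEMO_FLAG_HW_ATR_EVICT_ENABLED;
}

/* Usage: non-capable hardware can never end up with evict enabled. */
void demo_usage(void)
{
	struct demo_pf pf = { 0, 0 };

	demo_sw_init(&pf, 0);
	demo_set_priv_flags(&pf, DEMO_FLAG_HW_ATR_EVICT_ENABLED);
	assert(!(pf.flags & DEMO_FLAG_HW_ATR_EVICT_ENABLED));
}
```

Only flags is written after init in this sketch, which is why the commit
message singles it out as the candidate for atomic bit operations.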

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             | 43 +++++++------
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 34 +++++-----
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 72 +++++++++++-----------
 drivers/net/ethernet/intel/i40e/i40e_ptp.c         |  6 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  8 +--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      |  4 --
 drivers/net/ethernet/intel/i40evf/i40evf.h         |  2 -
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  2 +-
 9 files changed, 85 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index d616f698e155..f07217b15ffd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -75,11 +75,11 @@
 #define I40E_MIN_VSI_ALLOC		83 /* LAN, ATR, FCOE, 64 VF */
 /* max 16 qps */
 #define i40e_default_queues_per_vmdq(pf) \
-		(((pf)->flags & I40E_FLAG_RSS_AQ_CAPABLE) ? 4 : 1)
+		(((pf)->hw_features & I40E_HW_RSS_AQ_CAPABLE) ? 4 : 1)
 #define I40E_DEFAULT_QUEUES_PER_VF	4
 #define I40E_DEFAULT_QUEUES_PER_TC	1 /* should be a power of 2 */
 #define i40e_pf_get_max_q_per_tc(pf) \
-		(((pf)->flags & I40E_FLAG_128_QP_RSS_CAPABLE) ? 128 : 64)
+		(((pf)->hw_features & I40E_HW_128_QP_RSS_CAPABLE) ? 128 : 64)
 #define I40E_FDIR_RING			0
 #define I40E_FDIR_RING_COUNT		32
 #define I40E_MAX_AQ_BUF_SIZE		4096
@@ -401,6 +401,27 @@ struct i40e_pf {
 	struct timer_list service_timer;
 	struct work_struct service_task;
 
+	u64 hw_features;
+#define I40E_HW_RSS_AQ_CAPABLE			BIT_ULL(0)
+#define I40E_HW_128_QP_RSS_CAPABLE		BIT_ULL(1)
+#define I40E_HW_ATR_EVICT_CAPABLE		BIT_ULL(2)
+#define I40E_HW_WB_ON_ITR_CAPABLE		BIT_ULL(3)
+#define I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE	BIT_ULL(4)
+#define I40E_HW_NO_PCI_LINK_CHECK		BIT_ULL(5)
+#define I40E_HW_100M_SGMII_CAPABLE		BIT_ULL(6)
+#define I40E_HW_NO_DCB_SUPPORT			BIT_ULL(7)
+#define I40E_HW_USE_SET_LLDP_MIB		BIT_ULL(8)
+#define I40E_HW_GENEVE_OFFLOAD_CAPABLE		BIT_ULL(9)
+#define I40E_HW_PTP_L4_CAPABLE			BIT_ULL(10)
+#define I40E_HW_WOL_MC_MAGIC_PKT_WAKE		BIT_ULL(11)
+#define I40E_HW_MPLS_HDR_OFFLOAD_CAPABLE	BIT_ULL(12)
+#define I40E_HW_HAVE_CRT_RETIMER		BIT_ULL(13)
+#define I40E_HW_OUTER_UDP_CSUM_CAPABLE		BIT_ULL(14)
+#define I40E_HW_PHY_CONTROLS_LEDS		BIT_ULL(15)
+#define I40E_HW_STOP_FW_LLDP			BIT_ULL(16)
+#define I40E_HW_PORT_ID_VALID			BIT_ULL(17)
+#define I40E_HW_RESTART_AUTONEG			BIT_ULL(18)
+
 	u64 flags;
 #define I40E_FLAG_RX_CSUM_ENABLED		BIT_ULL(1)
 #define I40E_FLAG_MSI_ENABLED			BIT_ULL(2)
@@ -420,33 +441,15 @@ struct i40e_pf {
 #define I40E_FLAG_PTP				BIT_ULL(25)
 #define I40E_FLAG_MFP_ENABLED			BIT_ULL(26)
 #define I40E_FLAG_UDP_FILTER_SYNC		BIT_ULL(27)
-#define I40E_FLAG_PORT_ID_VALID			BIT_ULL(28)
 #define I40E_FLAG_DCB_CAPABLE			BIT_ULL(29)
-#define I40E_FLAG_RSS_AQ_CAPABLE		BIT_ULL(31)
-#define I40E_FLAG_HW_ATR_EVICT_CAPABLE		BIT_ULL(32)
-#define I40E_FLAG_OUTER_UDP_CSUM_CAPABLE	BIT_ULL(33)
-#define I40E_FLAG_128_QP_RSS_CAPABLE		BIT_ULL(34)
-#define I40E_FLAG_WB_ON_ITR_CAPABLE		BIT_ULL(35)
 #define I40E_FLAG_VEB_STATS_ENABLED		BIT_ULL(37)
-#define I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE	BIT_ULL(38)
 #define I40E_FLAG_LINK_POLLING_ENABLED		BIT_ULL(39)
 #define I40E_FLAG_VEB_MODE_ENABLED		BIT_ULL(40)
-#define I40E_FLAG_GENEVE_OFFLOAD_CAPABLE	BIT_ULL(41)
-#define I40E_FLAG_NO_PCI_LINK_CHECK		BIT_ULL(42)
-#define I40E_FLAG_100M_SGMII_CAPABLE		BIT_ULL(43)
-#define I40E_FLAG_RESTART_AUTONEG		BIT_ULL(44)
-#define I40E_FLAG_NO_DCB_SUPPORT		BIT_ULL(45)
-#define I40E_FLAG_USE_SET_LLDP_MIB		BIT_ULL(46)
-#define I40E_FLAG_STOP_FW_LLDP			BIT_ULL(47)
-#define I40E_FLAG_PHY_CONTROLS_LEDS		BIT_ULL(48)
 #define I40E_FLAG_PF_MAC			BIT_ULL(50)
 #define I40E_FLAG_TRUE_PROMISC_SUPPORT		BIT_ULL(51)
-#define I40E_FLAG_HAVE_CRT_RETIMER		BIT_ULL(52)
-#define I40E_FLAG_PTP_L4_CAPABLE		BIT_ULL(53)
 #define I40E_FLAG_CLIENT_RESET			BIT_ULL(54)
 #define I40E_FLAG_TEMP_LINK_POLLING		BIT_ULL(55)
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(56)
-#define I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE		BIT_ULL(57)
 #define I40E_FLAG_LEGACY_RX			BIT_ULL(58)
 
 	struct i40e_client_instance *cinst;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1d29152256fe..c76549e41705 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -271,7 +271,7 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, u32 *supported,
 		*advertising |= ADVERTISED_Autoneg;
 		if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
 			*advertising |= ADVERTISED_1000baseT_Full;
-		if (pf->flags & I40E_FLAG_100M_SGMII_CAPABLE) {
+		if (pf->hw_features & I40E_HW_100M_SGMII_CAPABLE) {
 			*supported |= SUPPORTED_100baseT_Full;
 			*advertising |= ADVERTISED_100baseT_Full;
 		}
@@ -340,12 +340,12 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, u32 *supported,
 			*advertising |= ADVERTISED_20000baseKR2_Full;
 	}
 	if (phy_types & I40E_CAP_PHY_TYPE_10GBASE_KR) {
-		if (!(pf->flags & I40E_FLAG_HAVE_CRT_RETIMER))
+		if (!(pf->hw_features & I40E_HW_HAVE_CRT_RETIMER))
 			*supported |= SUPPORTED_10000baseKR_Full |
 				      SUPPORTED_Autoneg;
 		*advertising |= ADVERTISED_Autoneg;
 		if (hw_link_info->requested_speeds & I40E_LINK_SPEED_10GB)
-			if (!(pf->flags & I40E_FLAG_HAVE_CRT_RETIMER))
+			if (!(pf->hw_features & I40E_HW_HAVE_CRT_RETIMER))
 				*advertising |= ADVERTISED_10000baseKR_Full;
 	}
 	if (phy_types & I40E_CAP_PHY_TYPE_10GBASE_KX4) {
@@ -356,12 +356,12 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, u32 *supported,
 			*advertising |= ADVERTISED_10000baseKX4_Full;
 	}
 	if (phy_types & I40E_CAP_PHY_TYPE_1000BASE_KX) {
-		if (!(pf->flags & I40E_FLAG_HAVE_CRT_RETIMER))
+		if (!(pf->hw_features & I40E_HW_HAVE_CRT_RETIMER))
 			*supported |= SUPPORTED_1000baseKX_Full |
 				      SUPPORTED_Autoneg;
 		*advertising |= ADVERTISED_Autoneg;
 		if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
-			if (!(pf->flags & I40E_FLAG_HAVE_CRT_RETIMER))
+			if (!(pf->hw_features & I40E_HW_HAVE_CRT_RETIMER))
 				*advertising |= ADVERTISED_1000baseKX_Full;
 	}
 	if (phy_types & I40E_CAP_PHY_TYPE_25GBASE_KR ||
@@ -474,7 +474,7 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
 			    SUPPORTED_1000baseT_Full;
 		if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
 			advertising |= ADVERTISED_1000baseT_Full;
-		if (pf->flags & I40E_FLAG_100M_SGMII_CAPABLE) {
+		if (pf->hw_features & I40E_HW_100M_SGMII_CAPABLE) {
 			supported |= SUPPORTED_100baseT_Full;
 			if (hw_link_info->requested_speeds &
 			    I40E_LINK_SPEED_100MB)
@@ -1765,7 +1765,7 @@ static int i40e_get_ts_info(struct net_device *dev,
 			   BIT(HWTSTAMP_FILTER_PTP_V2_L2_SYNC) |
 			   BIT(HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ);
 
-	if (pf->flags & I40E_FLAG_PTP_L4_CAPABLE)
+	if (pf->hw_features & I40E_HW_PTP_L4_CAPABLE)
 		info->rx_filters |= BIT(HWTSTAMP_FILTER_PTP_V1_L4_SYNC) |
 				    BIT(HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ) |
 				    BIT(HWTSTAMP_FILTER_PTP_V2_EVENT) |
@@ -2005,7 +2005,7 @@ static int i40e_set_phys_id(struct net_device *netdev,
 
 	switch (state) {
 	case ETHTOOL_ID_ACTIVE:
-		if (!(pf->flags & I40E_FLAG_PHY_CONTROLS_LEDS)) {
+		if (!(pf->hw_features & I40E_HW_PHY_CONTROLS_LEDS)) {
 			pf->led_status = i40e_led_get(hw);
 		} else {
 			i40e_aq_set_phy_debug(hw, I40E_PHY_DEBUG_ALL, NULL);
@@ -2015,19 +2015,19 @@ static int i40e_set_phys_id(struct net_device *netdev,
 		}
 		return blink_freq;
 	case ETHTOOL_ID_ON:
-		if (!(pf->flags & I40E_FLAG_PHY_CONTROLS_LEDS))
+		if (!(pf->hw_features & I40E_HW_PHY_CONTROLS_LEDS))
 			i40e_led_set(hw, 0xf, false);
 		else
 			ret = i40e_led_set_phy(hw, true, pf->led_status, 0);
 		break;
 	case ETHTOOL_ID_OFF:
-		if (!(pf->flags & I40E_FLAG_PHY_CONTROLS_LEDS))
+		if (!(pf->hw_features & I40E_HW_PHY_CONTROLS_LEDS))
 			i40e_led_set(hw, 0x0, false);
 		else
 			ret = i40e_led_set_phy(hw, false, pf->led_status, 0);
 		break;
 	case ETHTOOL_ID_INACTIVE:
-		if (!(pf->flags & I40E_FLAG_PHY_CONTROLS_LEDS)) {
+		if (!(pf->hw_features & I40E_HW_PHY_CONTROLS_LEDS)) {
 			i40e_led_set(hw, pf->led_status, false);
 		} else {
 			ret = i40e_led_set_phy(hw, false, pf->led_status,
@@ -2727,22 +2727,22 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, struct ethtool_rxnfc *nfc)
 	switch (nfc->flow_type) {
 	case TCP_V4_FLOW:
 		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV4_TCP;
-		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+		if (pf->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE)
 			hena |=
 			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
 		break;
 	case TCP_V6_FLOW:
 		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV6_TCP;
-		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+		if (pf->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE)
 			hena |=
 			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
-		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+		if (pf->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE)
 			hena |=
 			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK);
 		break;
 	case UDP_V4_FLOW:
 		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV4_UDP;
-		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+		if (pf->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE)
 			hena |=
 			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) |
 			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP);
@@ -2751,7 +2751,7 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, struct ethtool_rxnfc *nfc)
 		break;
 	case UDP_V6_FLOW:
 		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV6_UDP;
-		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+		if (pf->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE)
 			hena |=
 			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) |
 			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP);
@@ -4122,7 +4122,7 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 	}
 
 	/* Only allow ATR evict on hardware that is capable of handling it */
-	if (pf->flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE)
+	if (!(pf->hw_features & I40E_HW_ATR_EVICT_CAPABLE))
 		pf->flags &= ~I40E_FLAG_HW_ATR_EVICT_ENABLED;
 
 	if (changed_flags & I40E_FLAG_TRUE_PROMISC_SUPPORT) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5df25df123d7..d7248e5c5f01 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5350,7 +5350,7 @@ static int i40e_init_pf_dcb(struct i40e_pf *pf)
 	int err = 0;
 
 	/* Do not enable DCB for SW1 and SW2 images even if the FW is capable */
-	if (pf->flags & I40E_FLAG_NO_DCB_SUPPORT)
+	if (pf->hw_features & I40E_HW_NO_DCB_SUPPORT)
 		goto out;
 
 	/* Get the initial DCB configuration */
@@ -7332,7 +7332,7 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 		wr32(hw, I40E_REG_MSS, val);
 	}
 
-	if (pf->flags & I40E_FLAG_RESTART_AUTONEG) {
+	if (pf->hw_features & I40E_HW_RESTART_AUTONEG) {
 		msleep(75);
 		ret = i40e_aq_set_link_restart_an(&pf->hw, true, NULL);
 		if (ret)
@@ -7970,7 +7970,7 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
 		ring->count = vsi->num_desc;
 		ring->size = 0;
 		ring->dcb_tc = 0;
-		if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
+		if (vsi->back->hw_features & I40E_HW_WB_ON_ITR_CAPABLE)
 			ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
 		ring->tx_itr_setting = pf->tx_itr_default;
 		vsi->tx_rings[i] = ring++;
@@ -7987,7 +7987,7 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
 		ring->count = vsi->num_desc;
 		ring->size = 0;
 		ring->dcb_tc = 0;
-		if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
+		if (vsi->back->hw_features & I40E_HW_WB_ON_ITR_CAPABLE)
 			ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
 		set_ring_xdp(ring);
 		ring->tx_itr_setting = pf->tx_itr_default;
@@ -8523,7 +8523,7 @@ static int i40e_vsi_config_rss(struct i40e_vsi *vsi)
 	u8 *lut;
 	int ret;
 
-	if (!(pf->flags & I40E_FLAG_RSS_AQ_CAPABLE))
+	if (!(pf->hw_features & I40E_HW_RSS_AQ_CAPABLE))
 		return 0;
 
 	if (!vsi->rss_size)
@@ -8653,7 +8653,7 @@ int i40e_config_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size)
 {
 	struct i40e_pf *pf = vsi->back;
 
-	if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE)
+	if (pf->hw_features & I40E_HW_RSS_AQ_CAPABLE)
 		return i40e_config_rss_aq(vsi, seed, lut, lut_size);
 	else
 		return i40e_config_rss_reg(vsi, seed, lut, lut_size);
@@ -8672,7 +8672,7 @@ int i40e_get_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size)
 {
 	struct i40e_pf *pf = vsi->back;
 
-	if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE)
+	if (pf->hw_features & I40E_HW_RSS_AQ_CAPABLE)
 		return i40e_get_rss_aq(vsi, seed, lut, lut_size);
 	else
 		return i40e_get_rss_reg(vsi, seed, lut, lut_size);
@@ -9001,47 +9001,47 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	}
 
 	if (pf->hw.mac.type == I40E_MAC_X722) {
-		pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE
-			     | I40E_FLAG_128_QP_RSS_CAPABLE
-			     | I40E_FLAG_HW_ATR_EVICT_CAPABLE
-			     | I40E_FLAG_OUTER_UDP_CSUM_CAPABLE
-			     | I40E_FLAG_WB_ON_ITR_CAPABLE
-			     | I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE
-			     | I40E_FLAG_NO_PCI_LINK_CHECK
-			     | I40E_FLAG_USE_SET_LLDP_MIB
-			     | I40E_FLAG_GENEVE_OFFLOAD_CAPABLE
-			     | I40E_FLAG_PTP_L4_CAPABLE
-			     | I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE;
+		pf->hw_features |= (I40E_HW_RSS_AQ_CAPABLE |
+				    I40E_HW_128_QP_RSS_CAPABLE |
+				    I40E_HW_ATR_EVICT_CAPABLE |
+				    I40E_HW_WB_ON_ITR_CAPABLE |
+				    I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE |
+				    I40E_HW_NO_PCI_LINK_CHECK |
+				    I40E_HW_USE_SET_LLDP_MIB |
+				    I40E_HW_GENEVE_OFFLOAD_CAPABLE |
+				    I40E_HW_PTP_L4_CAPABLE |
+				    I40E_HW_WOL_MC_MAGIC_PKT_WAKE |
+				    I40E_HW_OUTER_UDP_CSUM_CAPABLE);
 	} else if ((pf->hw.aq.api_maj_ver > 1) ||
 		   ((pf->hw.aq.api_maj_ver == 1) &&
 		    (pf->hw.aq.api_min_ver > 4))) {
 		/* Supported in FW API version higher than 1.4 */
-		pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
+		pf->hw_features |= I40E_HW_GENEVE_OFFLOAD_CAPABLE;
 	}
 
 	/* Enable HW ATR eviction if possible */
-	if (pf->flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE)
+	if (pf->hw_features & I40E_HW_ATR_EVICT_CAPABLE)
 		pf->flags |= I40E_FLAG_HW_ATR_EVICT_ENABLED;
 
 	if ((pf->hw.mac.type == I40E_MAC_XL710) &&
 	    (((pf->hw.aq.fw_maj_ver == 4) && (pf->hw.aq.fw_min_ver < 33)) ||
 	    (pf->hw.aq.fw_maj_ver < 4))) {
-		pf->flags |= I40E_FLAG_RESTART_AUTONEG;
+		pf->hw_features |= I40E_HW_RESTART_AUTONEG;
 		/* No DCB support  for FW < v4.33 */
-		pf->flags |= I40E_FLAG_NO_DCB_SUPPORT;
+		pf->hw_features |= I40E_HW_NO_DCB_SUPPORT;
 	}
 
 	/* Disable FW LLDP if FW < v4.3 */
 	if ((pf->hw.mac.type == I40E_MAC_XL710) &&
 	    (((pf->hw.aq.fw_maj_ver == 4) && (pf->hw.aq.fw_min_ver < 3)) ||
 	    (pf->hw.aq.fw_maj_ver < 4)))
-		pf->flags |= I40E_FLAG_STOP_FW_LLDP;
+		pf->hw_features |= I40E_HW_STOP_FW_LLDP;
 
 	/* Use the FW Set LLDP MIB API if FW > v4.40 */
 	if ((pf->hw.mac.type == I40E_MAC_XL710) &&
 	    (((pf->hw.aq.fw_maj_ver == 4) && (pf->hw.aq.fw_min_ver >= 40)) ||
 	    (pf->hw.aq.fw_maj_ver >= 5)))
-		pf->flags |= I40E_FLAG_USE_SET_LLDP_MIB;
+		pf->hw_features |= I40E_HW_USE_SET_LLDP_MIB;
 
 	if (pf->hw.func_caps.vmdq) {
 		pf->num_vmdq_vsis = I40E_DEFAULT_NUM_VMDQ_VSI;
@@ -9244,7 +9244,7 @@ static void i40e_udp_tunnel_add(struct net_device *netdev,
 		pf->udp_ports[next_idx].type = I40E_AQC_TUNNEL_TYPE_VXLAN;
 		break;
 	case UDP_TUNNEL_TYPE_GENEVE:
-		if (!(pf->flags & I40E_FLAG_GENEVE_OFFLOAD_CAPABLE))
+		if (!(pf->hw_features & I40E_HW_GENEVE_OFFLOAD_CAPABLE))
 			return;
 		pf->udp_ports[next_idx].type = I40E_AQC_TUNNEL_TYPE_NGE;
 		break;
@@ -9311,7 +9311,7 @@ static int i40e_get_phys_port_id(struct net_device *netdev,
 	struct i40e_pf *pf = np->vsi->back;
 	struct i40e_hw *hw = &pf->hw;
 
-	if (!(pf->flags & I40E_FLAG_PORT_ID_VALID))
+	if (!(pf->hw_features & I40E_HW_PORT_ID_VALID))
 		return -EOPNOTSUPP;
 
 	ppid->id_len = min_t(int, sizeof(hw->mac.port_addr), sizeof(ppid->id));
@@ -9689,7 +9689,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 			  NETIF_F_RXCSUM		|
 			  0;
 
-	if (!(pf->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE))
+	if (!(pf->hw_features & I40E_HW_OUTER_UDP_CSUM_CAPABLE))
 		netdev->gso_partial_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
 
 	netdev->gso_partial_features |= NETIF_F_GSO_GRE_CSUM;
@@ -10447,7 +10447,7 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 		break;
 	}
 
-	if ((pf->flags & I40E_FLAG_RSS_AQ_CAPABLE) &&
+	if ((pf->hw_features & I40E_HW_RSS_AQ_CAPABLE) &&
 	    (vsi->type == I40E_VSI_VMDQ2)) {
 		ret = i40e_vsi_config_rss(vsi);
 	}
@@ -11456,7 +11456,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * Ignore error return codes because if it was already disabled via
 	 * hardware settings this will fail
 	 */
-	if (pf->flags & I40E_FLAG_STOP_FW_LLDP) {
+	if (pf->hw_features & I40E_HW_STOP_FW_LLDP) {
 		dev_info(&pdev->dev, "Stopping firmware LLDP agent.\n");
 		i40e_aq_stop_lldp(hw, true, NULL);
 	}
@@ -11473,7 +11473,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	ether_addr_copy(hw->mac.perm_addr, hw->mac.addr);
 	i40e_get_port_mac_addr(hw, hw->mac.port_addr);
 	if (is_valid_ether_addr(hw->mac.port_addr))
-		pf->flags |= I40E_FLAG_PORT_ID_VALID;
+		pf->hw_features |= I40E_HW_PORT_ID_VALID;
 
 	pci_set_drvdata(pdev, pf);
 	pci_save_state(pdev);
@@ -11589,7 +11589,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		wr32(hw, I40E_REG_MSS, val);
 	}
 
-	if (pf->flags & I40E_FLAG_RESTART_AUTONEG) {
+	if (pf->hw_features & I40E_HW_RESTART_AUTONEG) {
 		msleep(75);
 		err = i40e_aq_set_link_restart_an(&pf->hw, true, NULL);
 		if (err)
@@ -11676,7 +11676,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * and will report PCI Gen 1 x 1 by default so don't bother
 	 * checking them.
 	 */
-	if (!(pf->flags & I40E_FLAG_NO_PCI_LINK_CHECK)) {
+	if (!(pf->hw_features & I40E_HW_NO_PCI_LINK_CHECK)) {
 		char speed[PCI_SPEED_SIZE] = "Unknown";
 		char width[PCI_WIDTH_SIZE] = "Unknown";
 
@@ -11747,9 +11747,9 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	if ((pf->hw.device_id == I40E_DEV_ID_10G_BASE_T) ||
 		(pf->hw.device_id == I40E_DEV_ID_10G_BASE_T4))
-		pf->flags |= I40E_FLAG_PHY_CONTROLS_LEDS;
+		pf->hw_features |= I40E_HW_PHY_CONTROLS_LEDS;
 	if (pf->hw.device_id == I40E_DEV_ID_SFP_I_X722)
-		pf->flags |= I40E_FLAG_HAVE_CRT_RETIMER;
+		pf->hw_features |= I40E_HW_HAVE_CRT_RETIMER;
 	/* print a string summarizing features */
 	i40e_print_features(pf);
 
@@ -12061,7 +12061,7 @@ static void i40e_shutdown(struct pci_dev *pdev)
 	 */
 	i40e_notify_client_of_netdev_close(pf->vsi[pf->lan_vsi], false);
 
-	if (pf->wol_en && (pf->flags & I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE))
+	if (pf->wol_en && (pf->hw_features & I40E_HW_WOL_MC_MAGIC_PKT_WAKE))
 		i40e_enable_mc_magic_wake(pf);
 
 	i40e_prep_for_reset(pf, false);
@@ -12093,7 +12093,7 @@ static int i40e_suspend(struct pci_dev *pdev, pm_message_t state)
 	set_bit(__I40E_SUSPENDED, pf->state);
 	set_bit(__I40E_DOWN, pf->state);
 
-	if (pf->wol_en && (pf->flags & I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE))
+	if (pf->wol_en && (pf->hw_features & I40E_HW_WOL_MC_MAGIC_PKT_WAKE))
 		i40e_enable_mc_magic_wake(pf);
 
 	i40e_prep_for_reset(pf, false);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 0129ed3b78ec..d8456c381c99 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -569,7 +569,7 @@ static int i40e_ptp_set_timestamp_mode(struct i40e_pf *pf,
 	case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
 	case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
 	case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
-		if (!(pf->flags & I40E_FLAG_PTP_L4_CAPABLE))
+		if (!(pf->hw_features & I40E_HW_PTP_L4_CAPABLE))
 			return -ERANGE;
 		pf->ptp_rx = true;
 		tsyntype = I40E_PRTTSYN_CTL1_V1MESSTYPE0_MASK |
@@ -583,7 +583,7 @@ static int i40e_ptp_set_timestamp_mode(struct i40e_pf *pf,
 	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
 	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
 	case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
-		if (!(pf->flags & I40E_FLAG_PTP_L4_CAPABLE))
+		if (!(pf->hw_features & I40E_HW_PTP_L4_CAPABLE))
 			return -ERANGE;
 		/* fall through */
 	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
@@ -592,7 +592,7 @@ static int i40e_ptp_set_timestamp_mode(struct i40e_pf *pf,
 		pf->ptp_rx = true;
 		tsyntype = I40E_PRTTSYN_CTL1_V2MESSTYPE0_MASK |
 			   I40E_PRTTSYN_CTL1_TSYNTYPE_V2;
-		if (pf->flags & I40E_FLAG_PTP_L4_CAPABLE) {
+		if (pf->hw_features & I40E_HW_PTP_L4_CAPABLE) {
 			tsyntype |= I40E_PRTTSYN_CTL1_UDP_ENA_MASK;
 			config->rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
 		} else {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index a39892d2453d..f0a0eabc2666 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -112,7 +112,7 @@ enum i40e_dyn_idx_t {
 	BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))
 
 #define i40e_pf_get_default_rss_hena(pf) \
-	(((pf)->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) ? \
+	(((pf)->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE) ? \
 	  I40E_DEFAULT_RSS_HENA_EXPANDED : I40E_DEFAULT_RSS_HENA)
 
 /* Supported Rx Buffer Sizes (a multiple of 128) */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 979110d59f67..3ef67dc094fc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1542,14 +1542,14 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
 		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;
 	} else {
-		if ((pf->flags & I40E_FLAG_RSS_AQ_CAPABLE) &&
+		if ((pf->hw_features & I40E_HW_RSS_AQ_CAPABLE) &&
 		    (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_AQ))
 			vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RSS_AQ;
 		else
 			vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RSS_REG;
 	}
 
-	if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) {
+	if (pf->hw_features & I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE) {
 		if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2)
 			vfres->vf_offload_flags |=
 				VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2;
@@ -1558,7 +1558,7 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_ENCAP)
 		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_ENCAP;
 
-	if ((pf->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE) &&
+	if ((pf->hw_features & I40E_HW_OUTER_UDP_CSUM_CAPABLE) &&
 	    (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM))
 		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM;
 
@@ -1573,7 +1573,7 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
 		vfres->vf_offload_flags |= VIRTCHNL_VF_OFFLOAD_RX_POLLING;
 	}
 
-	if (pf->flags & I40E_FLAG_WB_ON_ITR_CAPABLE) {
+	if (pf->hw_features & I40E_HW_WB_ON_ITR_CAPABLE) {
 		if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
 			vfres->vf_offload_flags |=
 					VIRTCHNL_VF_OFFLOAD_WB_ON_ITR;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 472f606629d4..489684002e94 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -98,10 +98,6 @@ enum i40e_dyn_idx_t {
 	BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) | \
 	BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))
 
-#define i40e_pf_get_default_rss_hena(pf) \
-	(((pf)->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) ? \
-	  I40E_DEFAULT_RSS_HENA_EXPANDED : I40E_DEFAULT_RSS_HENA)
-
 /* Supported Rx Buffer Sizes (a multiple of 128) */
 #define I40E_RXBUFFER_256   256
 #define I40E_RXBUFFER_1536  1536  /* 128B aligned standard Ethernet frame */
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index 52cf38f47349..7f905368fc93 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -238,8 +238,6 @@ struct i40evf_adapter {
 /* duplicates for common code */
 #define I40E_FLAG_DCB_ENABLED			0
 #define I40E_FLAG_RX_CSUM_ENABLED		I40EVF_FLAG_RX_CSUM_ENABLED
-#define I40E_FLAG_WB_ON_ITR_CAPABLE		I40EVF_FLAG_WB_ON_ITR_CAPABLE
-#define I40E_FLAG_OUTER_UDP_CSUM_CAPABLE	I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE
 #define I40E_FLAG_LEGACY_RX			I40EVF_FLAG_LEGACY_RX
 	/* flags for admin queue service task */
 	u32 aq_required;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 21ab3ff5e9ec..8603911cc550 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1242,7 +1242,7 @@ static int i40evf_alloc_queues(struct i40evf_adapter *adapter)
 		tx_ring->dev = &adapter->pdev->dev;
 		tx_ring->count = adapter->tx_desc_count;
 		tx_ring->tx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF);
-		if (adapter->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
+		if (adapter->flags & I40EVF_FLAG_WB_ON_ITR_CAPABLE)
 			tx_ring->flags |= I40E_TXR_FLAGS_WB_ON_ITR;
 
 		rx_ring = &adapter->rx_rings[i];
-- 
2.14.1


* [net-next v2 07/13] i40e/i40evf: use cmpxchg64 when updating private flags in ethtool
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

When a user gives an invalid command to change a private flag which is
not supported, either because it is read-only, or the device is not
capable of the feature, we simply ignore the request.

A naive solution would simply be to report error codes when one of the
flags was not supported. However, this causes problems because it makes
the operation not atomic. If a user requests multiple private flags
together at once we could end up changing one before failing at the
second flag.

We can do a bit better if we instead update a temporary copy of the
flags variable in the loop, and then copy it into place after. If we
aren't careful this has the pitfall of potentially silently overwriting
any changes caused by other threads.

Avoid this by using cmpxchg64, which will compare and swap the flags
variable only if it still matches the old value. We'll report
-EAGAIN in the (hopefully rare!) case where the cmpxchg64 fails.

This ensures that we can properly report when flags are not supported in
an atomic fashion without the risk of overwriting other threads' changes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 57 +++++++++++++++-------
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 41 ++++++++++++----
 2 files changed, 70 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index c76549e41705..a868c8d4fec9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -4069,23 +4069,26 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
-	u64 changed_flags;
+	u64 orig_flags, new_flags, changed_flags;
 	u32 i, j;
 
-	changed_flags = pf->flags;
+	orig_flags = READ_ONCE(pf->flags);
+	new_flags = orig_flags;
 
 	for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++) {
 		const struct i40e_priv_flags *priv_flags;
 
 		priv_flags = &i40e_gstrings_priv_flags[i];
 
-		if (priv_flags->read_only)
-			continue;
-
 		if (flags & BIT(i))
-			pf->flags |= priv_flags->flag;
+			new_flags |= priv_flags->flag;
 		else
-			pf->flags &= ~(priv_flags->flag);
+			new_flags &= ~(priv_flags->flag);
+
+		/* If this is a read-only flag, it can't be changed */
+		if (priv_flags->read_only &&
+		    ((orig_flags ^ new_flags) & ~BIT(i)))
+			return -EOPNOTSUPP;
 	}
 
 	if (pf->hw.pf_id != 0)
@@ -4096,18 +4099,40 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 
 		priv_flags = &i40e_gl_gstrings_priv_flags[j];
 
-		if (priv_flags->read_only)
-			continue;
-
 		if (flags & BIT(i + j))
-			pf->flags |= priv_flags->flag;
+			new_flags |= priv_flags->flag;
 		else
-			pf->flags &= ~(priv_flags->flag);
+			new_flags &= ~(priv_flags->flag);
+
+		/* If this is a read-only flag, it can't be changed */
+		if (priv_flags->read_only &&
+		    ((orig_flags ^ new_flags) & ~BIT(i)))
+			return -EOPNOTSUPP;
 	}
 
 flags_complete:
-	/* check for flags that changed */
-	changed_flags ^= pf->flags;
+	/* Before we finalize any flag changes, we need to perform some
+	 * checks to ensure that the changes are supported and safe.
+	 */
+
+	/* ATR eviction is not supported on all devices */
+	if ((new_flags & I40E_FLAG_HW_ATR_EVICT_ENABLED) &&
+	    !(pf->hw_features & I40E_HW_ATR_EVICT_CAPABLE))
+		return -EOPNOTSUPP;
+
+	/* Compare and exchange the new flags into place. If we failed, that
+	 * is if cmpxchg64 returns anything but the old value, this means that
+	 * something else has modified the flags variable since we copied it
+	 * originally. We'll just punt with an error and log something in the
+	 * message buffer.
+	 */
+	if (cmpxchg64(&pf->flags, orig_flags, new_flags) != orig_flags) {
+		dev_warn(&pf->pdev->dev,
+			 "Unable to update pf->flags as it was modified by another thread...\n");
+		return -EAGAIN;
+	}
+
+	changed_flags = orig_flags ^ new_flags;
 
 	/* Process any additional changes needed as a result of flag changes.
 	 * The changed_flags value reflects the list of bits that were
@@ -4121,10 +4146,6 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 		set_bit(__I40E_FD_FLUSH_REQUESTED, pf->state);
 	}
 
-	/* Only allow ATR evict on hardware that is capable of handling it */
-	if (!(pf->hw_features & I40E_HW_ATR_EVICT_CAPABLE))
-		pf->flags &= ~I40E_FLAG_HW_ATR_EVICT_ENABLED;
-
 	if (changed_flags & I40E_FLAG_TRUE_PROMISC_SUPPORT) {
 		u16 sw_flags = 0, valid_flags = 0;
 		int ret;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 76fd89c1dbb2..65874d6b3ab9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -258,29 +258,50 @@ static u32 i40evf_get_priv_flags(struct net_device *netdev)
 static int i40evf_set_priv_flags(struct net_device *netdev, u32 flags)
 {
 	struct i40evf_adapter *adapter = netdev_priv(netdev);
-	u64 changed_flags;
+	u32 orig_flags, new_flags, changed_flags;
 	u32 i;
 
-	changed_flags = adapter->flags;
+	orig_flags = READ_ONCE(adapter->flags);
+	new_flags = orig_flags;
 
 	for (i = 0; i < I40EVF_PRIV_FLAGS_STR_LEN; i++) {
 		const struct i40evf_priv_flags *priv_flags;
 
 		priv_flags = &i40evf_gstrings_priv_flags[i];
 
-		if (priv_flags->read_only)
-			continue;
-
 		if (flags & BIT(i))
-			adapter->flags |= priv_flags->flag;
+			new_flags |= priv_flags->flag;
 		else
-			adapter->flags &= ~(priv_flags->flag);
+			new_flags &= ~(priv_flags->flag);
+
+		if (priv_flags->read_only &&
+		    ((orig_flags ^ new_flags) & ~BIT(i)))
+			return -EOPNOTSUPP;
+	}
+
+	/* Before we finalize any flag changes, any checks which we need to
+	 * perform to determine if the new flags will be supported should go
+	 * here...
+	 */
+
+	/* Compare and exchange the new flags into place. If we failed, that
+	 * is if cmpxchg returns anything but the old value, this means
+	 * something else must have modified the flags variable since we
+	 * copied it. We'll just punt with an error and log something in the
+	 * message buffer.
+	 */
+	if (cmpxchg(&adapter->flags, orig_flags, new_flags) != orig_flags) {
+		dev_warn(&adapter->pdev->dev,
+			 "Unable to update adapter->flags as it was modified by another thread...\n");
+		return -EAGAIN;
 	}
 
-	/* check for flags that changed */
-	changed_flags ^= adapter->flags;
+	changed_flags = orig_flags ^ new_flags;
 
-	/* Process any additional changes needed as a result of flag changes. */
+	/* Process any additional changes needed as a result of flag changes.
+	 * The changed_flags value reflects the list of bits that were changed
+	 * in the code above.
+	 */
 
 	/* issue a reset to force legacy-rx change to take effect */
 	if (changed_flags & I40EVF_FLAG_LEGACY_RX) {
-- 
2.14.1


* [net-next v2 02/13] i40evf: prevent VF close returning before state transitions to DOWN
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem
  Cc: Sudheer Mogilappagari, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>

Currently i40evf_close() can return before the state transitions to
__I40EVF_DOWN because of the latency involved in the PF driver processing
the request and sending its response, and in scheduling the VF
watchdog_task. Because of this, an immediate call to i40evf_open() fails
while the state is still __I40EVF_DOWN_PENDING.

When a VF interface is up and we try to add it as a slave, the bonding
driver calls dev_close() and dev_open() in quick succession, resulting
in dev_open() returning an error. The ifenslave command needs to be run
again for dev_open() to succeed.

This fix ensures that the watchdog timer is scheduled immediately after
admin queue operations are scheduled in i40evf_down(). In addition, a
wait condition is added at the end of i40evf_close() so that the function
won't return while the state is still __I40EVF_DOWN_PENDING. The timeout
value was chosen after some profiling and includes some buffer.
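
As a rough user-space analogue of the wait described above, the sketch
below uses a pthread condition variable where the patch uses a kernel
wait queue with wait_event_timeout() and wake_up(). All demo_* names and
the state enum are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical states, loosely modelled on the VF state machine. */
enum demo_state { DEMO_DOWN, DEMO_DOWN_PENDING, DEMO_RUNNING };

struct demo_adapter {
	pthread_mutex_t lock;
	pthread_cond_t down_cond;	/* plays the role of down_waitqueue */
	enum demo_state state;
};

static void demo_adapter_init(struct demo_adapter *a, enum demo_state s)
{
	pthread_mutex_init(&a->lock, NULL);
	pthread_cond_init(&a->down_cond, NULL);
	a->state = s;
}

/* Completion side: record the transition, then wake any waiter. */
static void demo_mark_down(struct demo_adapter *a)
{
	pthread_mutex_lock(&a->lock);
	a->state = DEMO_DOWN;
	pthread_cond_broadcast(&a->down_cond);
	pthread_mutex_unlock(&a->lock);
}

/* Close side: true if DEMO_DOWN was reached within timeout_ms. */
static bool demo_wait_for_down(struct demo_adapter *a, long timeout_ms)
{
	struct timespec deadline;
	bool reached;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME time. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&a->lock);
	/* Re-check the condition after every wakeup, spurious or not. */
	while (a->state != DEMO_DOWN && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&a->down_cond, &a->lock,
					    &deadline);
	reached = (a->state == DEMO_DOWN);
	pthread_mutex_unlock(&a->lock);

	return reached;
}
```

As in the patch, a timeout is treated as a warning condition rather than a
hard failure: the caller learns that the transition has not happened yet
and can decide how to proceed.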

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf.h          |  2 ++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c     | 19 +++++++++++++++++++
 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c |  4 +++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index 7901cc85cbe5..52cf38f47349 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -43,6 +43,7 @@
 #include <linux/bitops.h>
 #include <linux/timer.h>
 #include <linux/workqueue.h>
+#include <linux/wait.h>
 #include <linux/delay.h>
 #include <linux/gfp.h>
 #include <linux/skbuff.h>
@@ -194,6 +195,7 @@ struct i40evf_adapter {
 	struct work_struct adminq_task;
 	struct delayed_work client_task;
 	struct delayed_work init_task;
+	wait_queue_head_t down_waitqueue;
 	struct i40e_q_vector *q_vectors;
 	struct list_head vlan_filter_list;
 	char misc_vector_name[IFNAMSIZ + 9];
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 22919b444ddf..21ab3ff5e9ec 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1143,6 +1143,7 @@ void i40evf_down(struct i40evf_adapter *adapter)
 	}
 
 	clear_bit(__I40EVF_IN_CRITICAL_TASK, &adapter->crit_section);
+	mod_timer_pending(&adapter->watchdog_timer, jiffies + 1);
 }
 
 /**
@@ -1794,6 +1795,7 @@ static void i40evf_disable_vf(struct i40evf_adapter *adapter)
 	clear_bit(__I40EVF_IN_CRITICAL_TASK, &adapter->crit_section);
 	adapter->flags &= ~I40EVF_FLAG_RESET_PENDING;
 	adapter->state = __I40EVF_DOWN;
+	wake_up(&adapter->down_waitqueue);
 	dev_info(&adapter->pdev->dev, "Reset task did not complete, VF disabled\n");
 }
 
@@ -1939,6 +1941,7 @@ static void i40evf_reset_task(struct work_struct *work)
 		i40evf_irq_enable(adapter, true);
 	} else {
 		adapter->state = __I40EVF_DOWN;
+		wake_up(&adapter->down_waitqueue);
 	}
 
 	return;
@@ -2238,6 +2241,7 @@ static int i40evf_open(struct net_device *netdev)
 static int i40evf_close(struct net_device *netdev)
 {
 	struct i40evf_adapter *adapter = netdev_priv(netdev);
+	int status;
 
 	if (adapter->state <= __I40EVF_DOWN_PENDING)
 		return 0;
@@ -2255,7 +2259,18 @@ static int i40evf_close(struct net_device *netdev)
 	 * still active and can DMA into memory. Resources are cleared in
 	 * i40evf_virtchnl_completion() after we get confirmation from the PF
 	 * driver that the rings have been stopped.
+	 *
+	 * Also, we wait for state to transition to __I40EVF_DOWN before
+	 * returning. State change occurs in i40evf_virtchnl_completion() after
+	 * VF resources are released (which occurs after PF driver processes and
+	 * responds to admin queue commands).
 	 */
+
+	status = wait_event_timeout(adapter->down_waitqueue,
+				    adapter->state == __I40EVF_DOWN,
+				    msecs_to_jiffies(200));
+	if (!status)
+		netdev_warn(netdev, "Device resources not yet released\n");
 	return 0;
 }
 
@@ -2683,6 +2698,7 @@ static void i40evf_init_task(struct work_struct *work)
 	adapter->state = __I40EVF_DOWN;
 	set_bit(__I40E_VSI_DOWN, adapter->vsi.state);
 	i40evf_misc_irq_enable(adapter);
+	wake_up(&adapter->down_waitqueue);
 
 	adapter->rss_key = kzalloc(adapter->rss_key_size, GFP_KERNEL);
 	adapter->rss_lut = kzalloc(adapter->rss_lut_size, GFP_KERNEL);
@@ -2844,6 +2860,9 @@ static int i40evf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	schedule_delayed_work(&adapter->init_task,
 			      msecs_to_jiffies(5 * (pdev->devfn & 0x07)));
 
+	/* Setup the wait queue for indicating transition to down status */
+	init_waitqueue_head(&adapter->down_waitqueue);
+
 	return 0;
 
 err_ioremap:
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index d2bb250a71af..6c403bf1bbb8 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -991,8 +991,10 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
 	case VIRTCHNL_OP_DISABLE_QUEUES:
 		i40evf_free_all_tx_resources(adapter);
 		i40evf_free_all_rx_resources(adapter);
-		if (adapter->state == __I40EVF_DOWN_PENDING)
+		if (adapter->state == __I40EVF_DOWN_PENDING) {
 			adapter->state = __I40EVF_DOWN;
+			wake_up(&adapter->down_waitqueue);
+		}
 		break;
 	case VIRTCHNL_OP_VERSION:
 	case VIRTCHNL_OP_CONFIG_IRQ_MAP:
-- 
2.14.1


* [net-next v2 03/13] i40e: Fix a bug with VMDq RSS queue allocation
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem
  Cc: Anjali Singhai Jain, netdev, nhorman, sassmann, jogreene,
	Alice Michael, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Anjali Singhai Jain <anjali.singhai@intel.com>

The X722 pf flag setup should happen before the VMDq RSS queue count is
initialized for VMDq VSI to get the right number of queues for RSS in
case of X722 devices.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 46 ++++++++++++++---------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0cb571e337f6..5df25df123d7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9000,6 +9000,29 @@ static int i40e_sw_init(struct i40e_pf *pf)
 				 pf->hw.func_caps.fd_filters_best_effort;
 	}
 
+	if (pf->hw.mac.type == I40E_MAC_X722) {
+		pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE
+			     | I40E_FLAG_128_QP_RSS_CAPABLE
+			     | I40E_FLAG_HW_ATR_EVICT_CAPABLE
+			     | I40E_FLAG_OUTER_UDP_CSUM_CAPABLE
+			     | I40E_FLAG_WB_ON_ITR_CAPABLE
+			     | I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE
+			     | I40E_FLAG_NO_PCI_LINK_CHECK
+			     | I40E_FLAG_USE_SET_LLDP_MIB
+			     | I40E_FLAG_GENEVE_OFFLOAD_CAPABLE
+			     | I40E_FLAG_PTP_L4_CAPABLE
+			     | I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE;
+	} else if ((pf->hw.aq.api_maj_ver > 1) ||
+		   ((pf->hw.aq.api_maj_ver == 1) &&
+		    (pf->hw.aq.api_min_ver > 4))) {
+		/* Supported in FW API version higher than 1.4 */
+		pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
+	}
+
+	/* Enable HW ATR eviction if possible */
+	if (pf->flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE)
+		pf->flags |= I40E_FLAG_HW_ATR_EVICT_ENABLED;
+
 	if ((pf->hw.mac.type == I40E_MAC_XL710) &&
 	    (((pf->hw.aq.fw_maj_ver == 4) && (pf->hw.aq.fw_min_ver < 33)) ||
 	    (pf->hw.aq.fw_maj_ver < 4))) {
@@ -9041,29 +9064,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
 					I40E_MAX_VF_COUNT);
 	}
 #endif /* CONFIG_PCI_IOV */
-	if (pf->hw.mac.type == I40E_MAC_X722) {
-		pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE
-			     | I40E_FLAG_128_QP_RSS_CAPABLE
-			     | I40E_FLAG_HW_ATR_EVICT_CAPABLE
-			     | I40E_FLAG_OUTER_UDP_CSUM_CAPABLE
-			     | I40E_FLAG_WB_ON_ITR_CAPABLE
-			     | I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE
-			     | I40E_FLAG_NO_PCI_LINK_CHECK
-			     | I40E_FLAG_USE_SET_LLDP_MIB
-			     | I40E_FLAG_GENEVE_OFFLOAD_CAPABLE
-			     | I40E_FLAG_PTP_L4_CAPABLE
-			     | I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE;
-	} else if ((pf->hw.aq.api_maj_ver > 1) ||
-		   ((pf->hw.aq.api_maj_ver == 1) &&
-		    (pf->hw.aq.api_min_ver > 4))) {
-		/* Supported in FW API version higher than 1.4 */
-		pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
-	}
-
-	/* Enable HW ATR eviction if possible */
-	if (pf->flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE)
-		pf->flags |= I40E_FLAG_HW_ATR_EVICT_ENABLED;
-
 	pf->eeprom_version = 0xDEAD;
 	pf->lan_veb = I40E_NO_VEB;
 	pf->lan_vsi = I40E_NO_VSI;
-- 
2.14.1


* [net-next v2 01/13] i40e/i40evf: adjust packet size to account for double VLANs
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170825220057.51804-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Now that the kernel supports double VLAN tags, we should at least play
nice. Adjust the max packet size to account for two VLAN tags, not just
one.
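
The arithmetic behind this change can be checked with a small standalone
sketch. ETH_HLEN, ETH_FCS_LEN and VLAN_HLEN use the values from the Linux
headers; I40E_MAX_RXBUFFER = 9728 is an assumption inferred from the
"MTU range" comments visible in the diff context (9706 + 22 and
9710 + 18), and demo_max_mtu() is a hypothetical helper.

```c
#include <assert.h>

/* Values from <linux/if_ether.h> and <linux/if_vlan.h>. */
#define ETH_HLEN	14	/* Ethernet header */
#define ETH_FCS_LEN	4	/* frame check sequence */
#define VLAN_HLEN	4	/* one 802.1Q tag */

/* Assumed value, inferred from the MTU range comments in the patch. */
#define I40E_MAX_RXBUFFER 9728

/* Same definition as the macro added by the patch. */
#define I40E_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))

/* Room for two VLAN tags: 14 + 4 + 8 bytes of overhead. */
static_assert(I40E_PACKET_HDR_PAD == 26, "header pad covers two VLAN tags");

/* Largest MTU that still fits a double-tagged frame in one buffer. */
static int demo_max_mtu(void)
{
	return I40E_MAX_RXBUFFER - I40E_PACKET_HDR_PAD;
}
```

Under that assumed buffer size, both drivers would end up with a maximum
MTU of 9702, down from the 9706 (i40e) and 9710 (i40evf) mentioned in the
existing comments.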

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 3 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h     | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h   | 1 +
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 4 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a7e5a76703e7..0cb571e337f6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9770,8 +9770,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 
 	/* MTU range: 68 - 9706 */
 	netdev->min_mtu = ETH_MIN_MTU;
-	netdev->max_mtu = I40E_MAX_RXBUFFER -
-			  (ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN);
+	netdev->max_mtu = I40E_MAX_RXBUFFER - I40E_PACKET_HDR_PAD;
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index b288d58313a6..a39892d2453d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -130,6 +130,7 @@ enum i40e_dyn_idx_t {
  * i.e. RXBUFFER_512 --> 1216 byte skb (size-2048 slab)
  */
 #define I40E_RX_HDR_SIZE I40E_RXBUFFER_256
+#define I40E_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
 #define i40e_rx_desc i40e_32byte_rx_desc
 
 #define I40E_RX_DMA_ATTR \
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 901282c87cf6..472f606629d4 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -117,6 +117,7 @@ enum i40e_dyn_idx_t {
  * i.e. RXBUFFER_512 --> 1216 byte skb (size-2048 slab)
  */
 #define I40E_RX_HDR_SIZE I40E_RXBUFFER_256
+#define I40E_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
 #define i40e_rx_desc i40e_32byte_rx_desc
 
 #define I40E_RX_DMA_ATTR \
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 93536b9fc629..22919b444ddf 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2625,7 +2625,7 @@ static void i40evf_init_task(struct work_struct *work)
 
 	/* MTU range: 68 - 9710 */
 	netdev->min_mtu = ETH_MIN_MTU;
-	netdev->max_mtu = I40E_MAX_RXBUFFER - (ETH_HLEN + ETH_FCS_LEN);
+	netdev->max_mtu = I40E_MAX_RXBUFFER - I40E_PACKET_HDR_PAD;
 
 	if (!is_valid_ether_addr(adapter->hw.mac.addr)) {
 		dev_info(&pdev->dev, "Invalid MAC address %pM, using random\n",
-- 
2.14.1


* [net-next v2 00/13][pull request] 40GbE Intel Wired LAN Driver Updates 2017-08-25
From: Jeff Kirsher @ 2017-08-25 22:00 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to i40e and i40evf only.

Mitch adjusts the max packet size to account for two VLAN tags.

Sudheer provides a fix to ensure that the watchdog timer is scheduled
immediately after admin queue operations are scheduled in i40evf_down().
Also fixes an issue by adding locking around the admin queue command and
update of state variables so that adminq_subtask will have accurate
information whenever it gets scheduled.

Anjali fixes a bug where the PF flag setup should happen before the VMDq
RSS queue count is initialized for VMDq VSI to get the right number of
queues for RSS in the case of X722 devices.  Also fixes a problem with
the hardware ATR eviction feature where the NVM setting was incorrect.

Jake separates the flags into two types, hw_features and flags.  The
hw_features flags contain a set of features which are enabled at init
time and will not contain feature flags that can be toggled.  Everything
else will remain in the flags variable, and can be modified anytime
during run time.  We should not be directly copying a cpumask_t, since
it is a bitmap and might not be copied correctly, so use cpumask_copy()
instead.

Stefan Assmann makes vf_offload_flags more "generic" by renaming it to
vf_cap_flags, which allows other capabilities besides offloading to be
added.

Alan makes it such that if adaptive-rx/tx is enabled, the user cannot
make any manual adjustments to interrupt moderation.  Also makes it so
that if ITR is disabled but adaptive-rx/tx is then enabled, ITR will be
re-enabled.

v2: Dropped patches #1 & #8 from the original patch series submission,
    while Jesse and Jake re-work their patches based on feedback from
    David Miller.  Also removed the duplicate patch 3 that was
    accidentally sent out twice in the previous submission.

The following are changes since commit 3fd87127073292538047adf1c9c757e9cab0dd56:
  strparser: initialize all callbacks
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (2):
  i40evf: use netdev variable in reset task
  i40e: prevent changing ITR if adaptive-rx/tx enabled

Anjali Singhai Jain (2):
  i40e: Fix a bug with VMDq RSS queue allocation
  i40e: Detect ATR HW Evict NVM issue and disable the feature

Jacob Keller (5):
  i40e: separate hw_features from runtime changing flags
  i40e: remove workaround for Open Firmware MAC address
  i40e/i40evf: use cmpxchg64 when updating private flags in ethtool
  i40e: move check for avoiding VID=0 filters into i40e_vsi_add_vlan
  i40e: use cpumask_copy instead of direct assignment

Mitch Williams (1):
  i40e/i40evf: adjust packet size to account for double VLANs

Stefan Assmann (1):
  i40e/i40evf: rename vf_offload_flags to vf_cap_flags in struct
    virtchnl_vf_resource

Sudheer Mogilappagari (2):
  i40evf: prevent VF close returning before state transitions to DOWN
  i40e: synchronize nvmupdate command and adminq subtask

 drivers/net/ethernet/intel/i40e/i40e.h             |  44 ++---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 154 +++++++++++------
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 188 ++++++++-------------
 drivers/net/ethernet/intel/i40e/i40e_nvm.c         |   6 +
 drivers/net/ethernet/intel/i40e/i40e_ptp.c         |   6 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   3 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  30 ++--
 drivers/net/ethernet/intel/i40evf/i40e_common.c    |   2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      |   5 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h         |  14 +-
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c |  41 +++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  41 +++--
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    |   4 +-
 include/linux/avf/virtchnl.h                       |   4 +-
 14 files changed, 291 insertions(+), 251 deletions(-)

-- 
2.14.1
