Netdev List
 help / color / mirror / Atom feed
* [PATCH v2 2/5] stmmac: pci: convert to use dev_pm_ops
From: Andy Shevchenko @ 2014-11-03 13:02 UTC (permalink / raw)
  To: Giuseppe Cavallaro, netdev, Kweh Hock Leong, David S . Miller,
	Vince Bridgers, Rayagond K
  Cc: Andy Shevchenko
In-Reply-To: <1415019737-32736-1-git-send-email-andriy.shevchenko@linux.intel.com>

Convert system PM callbacks to use dev_pm_ops. In addition remove the PCI calls
related to a power state since the bus code cares about this already.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 27 ++++++++++--------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index e45794d..f19ac8e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -150,30 +150,26 @@ static void stmmac_pci_remove(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 }
 
-#ifdef CONFIG_PM
-static int stmmac_pci_suspend(struct pci_dev *pdev, pm_message_t state)
+#ifdef CONFIG_PM_SLEEP
+static int stmmac_pci_suspend(struct device *dev)
 {
+	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *ndev = pci_get_drvdata(pdev);
-	int ret;
 
-	ret = stmmac_suspend(ndev);
-	pci_save_state(pdev);
-	pci_set_power_state(pdev, pci_choose_state(pdev, state));
-
-	return ret;
+	return stmmac_suspend(ndev);
 }
 
-static int stmmac_pci_resume(struct pci_dev *pdev)
+static int stmmac_pci_resume(struct device *dev)
 {
+	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *ndev = pci_get_drvdata(pdev);
 
-	pci_set_power_state(pdev, PCI_D0);
-	pci_restore_state(pdev);
-
 	return stmmac_resume(ndev);
 }
 #endif
 
+static SIMPLE_DEV_PM_OPS(stmmac_pm_ops, stmmac_pci_suspend, stmmac_pci_resume);
+
 #define STMMAC_VENDOR_ID 0x700
 #define STMMAC_DEVICE_ID 0x1108
 
@@ -190,10 +186,9 @@ struct pci_driver stmmac_pci_driver = {
 	.id_table = stmmac_id_table,
 	.probe = stmmac_pci_probe,
 	.remove = stmmac_pci_remove,
-#ifdef CONFIG_PM
-	.suspend = stmmac_pci_suspend,
-	.resume = stmmac_pci_resume,
-#endif
+	.driver         = {
+		.pm     = &stmmac_pm_ops,
+	},
 };
 
 MODULE_DESCRIPTION("STMMAC 10/100/1000 Ethernet PCI driver");
-- 
2.1.1

^ permalink raw reply related

* [PATCH v2 1/5] stmmac: pci: use defined constant instead of magic number
From: Andy Shevchenko @ 2014-11-03 13:02 UTC (permalink / raw)
  To: Giuseppe Cavallaro, netdev, Kweh Hock Leong, David S . Miller,
	Vince Bridgers, Rayagond K
  Cc: Andy Shevchenko
In-Reply-To: <1415019737-32736-1-git-send-email-andriy.shevchenko@linux.intel.com>

The last standard PCI resource is defined as PCI_STD_RESOURCE_END. Thus, we
could use it instead of plain integer.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index e17a970..e45794d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -90,7 +90,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
 	}
 
 	/* Get the base address of device */
-	for (i = 0; i <= 5; i++) {
+	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
 		if (pci_resource_len(pdev, i) == 0)
 			continue;
 		addr = pci_iomap(pdev, i, 0);
-- 
2.1.1

^ permalink raw reply related

* [PATCH v2 3/5] stmmac: pci: use managed resources
From: Andy Shevchenko @ 2014-11-03 13:02 UTC (permalink / raw)
  To: Giuseppe Cavallaro, netdev, Kweh Hock Leong, David S . Miller,
	Vince Bridgers, Rayagond K
  Cc: Andy Shevchenko
In-Reply-To: <1415019737-32736-1-git-send-email-andriy.shevchenko@linux.intel.com>

Migrate pci driver to managed resources to reduce boilerplate error handling
code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 43 ++++++------------------
 1 file changed, 10 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index f19ac8e..5357a3f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -71,46 +71,37 @@ static void stmmac_default_data(void)
 static int stmmac_pci_probe(struct pci_dev *pdev,
 			    const struct pci_device_id *id)
 {
-	int ret = 0;
-	void __iomem *addr = NULL;
-	struct stmmac_priv *priv = NULL;
+	struct stmmac_priv *priv;
 	int i;
+	int ret;
 
 	/* Enable pci device */
-	ret = pci_enable_device(pdev);
+	ret = pcim_enable_device(pdev);
 	if (ret) {
 		pr_err("%s : ERROR: failed to enable %s device\n", __func__,
 		       pci_name(pdev));
 		return ret;
 	}
-	if (pci_request_regions(pdev, STMMAC_RESOURCE_NAME)) {
-		pr_err("%s: ERROR: failed to get PCI region\n", __func__);
-		ret = -ENODEV;
-		goto err_out_req_reg_failed;
-	}
 
 	/* Get the base address of device */
 	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
 		if (pci_resource_len(pdev, i) == 0)
 			continue;
-		addr = pci_iomap(pdev, i, 0);
-		if (addr == NULL) {
-			pr_err("%s: ERROR: cannot map register memory aborting",
-			       __func__);
-			ret = -EIO;
-			goto err_out_map_failed;
-		}
+		ret = pcim_iomap_regions(pdev, BIT(i), pci_name(pdev));
+		if (ret)
+			return ret;
 		break;
 	}
+
 	pci_set_master(pdev);
 
 	stmmac_default_data();
 
-	priv = stmmac_dvr_probe(&(pdev->dev), &plat_dat, addr);
+	priv = stmmac_dvr_probe(&pdev->dev, &plat_dat,
+				pcim_iomap_table(pdev)[i]);
 	if (IS_ERR(priv)) {
 		pr_err("%s: main driver probe failed", __func__);
-		ret = PTR_ERR(priv);
-		goto err_out;
+		return PTR_ERR(priv);
 	}
 	priv->dev->irq = pdev->irq;
 	priv->wol_irq = pdev->irq;
@@ -120,15 +111,6 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
 	pr_debug("STMMAC platform driver registration completed");
 
 	return 0;
-
-err_out:
-	pci_clear_master(pdev);
-err_out_map_failed:
-	pci_release_regions(pdev);
-err_out_req_reg_failed:
-	pci_disable_device(pdev);
-
-	return ret;
 }
 
 /**
@@ -141,13 +123,8 @@ err_out_req_reg_failed:
 static void stmmac_pci_remove(struct pci_dev *pdev)
 {
 	struct net_device *ndev = pci_get_drvdata(pdev);
-	struct stmmac_priv *priv = netdev_priv(ndev);
 
 	stmmac_dvr_remove(ndev);
-
-	pci_iounmap(pdev, priv->ioaddr);
-	pci_release_regions(pdev);
-	pci_disable_device(pdev);
 }
 
 #ifdef CONFIG_PM_SLEEP
-- 
2.1.1

^ permalink raw reply related

* [PATCH net] sfc: don't BUG_ON efx->max_channels == 0 in probe
From: Edward Cree @ 2014-11-03 14:14 UTC (permalink / raw)
  To: davem; +Cc: netdev, sshah, linux-net-drivers

efx_ef10_probe() was BUGging out if the BAR2 size was 0.  This is
 unnecessarily violent; instead we should just fail to probe the device.
Kept a WARN_ON as this problem indicates a broken or misconfigured NIC.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 002d4cd..a77f05c 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -180,7 +180,8 @@ static int efx_ef10_probe(struct efx_nic *efx)
 		      EFX_MAX_CHANNELS,
 		      resource_size(&efx->pci_dev->resource[EFX_MEM_BAR]) /
 		      (EFX_VI_PAGE_SIZE * EFX_TXQ_TYPES));
-	BUG_ON(efx->max_channels == 0);
+	if (WARN_ON(efx->max_channels == 0))
+		return -EIO;
 
 	nic_data = kzalloc(sizeof(*nic_data), GFP_KERNEL);
 	if (!nic_data)
-- 
1.7.11.7

^ permalink raw reply related

* RE: [PATCH -next v3 1/3] syncookies: avoid magic values and document which-bit-is-what-option
From: David Laight @ 2014-11-03 14:24 UTC (permalink / raw)
  To: 'Florian Westphal', netdev@vger.kernel.org; +Cc: Daniel Borkmann
In-Reply-To: <1415019720-17106-2-git-send-email-fw@strlen.de>

From: Florian Westphal
> Was a bit more difficult to read than needed due to magic shifts;
> add defines and document the used encoding scheme.
> 
> Joint work with Daniel Borkmann.
> 
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  This patch was not part of earlier versions of the set.
> 
>  net/ipv4/syncookies.c | 50 +++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 4ac7bca..c3792c0 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -19,10 +19,6 @@
>  #include <net/tcp.h>
>  #include <net/route.h>
> 
> -/* Timestamps: lowest bits store TCP options */
> -#define TSBITS 6
> -#define TSMASK (((__u32)1 << TSBITS) - 1)
> -
>  extern int sysctl_tcp_syncookies;
> 
>  static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
> @@ -30,6 +26,30 @@ static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
>  #define COOKIEBITS 24	/* Upper bits store count */
>  #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
> 
> +/* TCP Timestamp: 6 lowest bits of timestamp sent in the cookie SYN-ACK
> + * stores TCP options:
> + *
> + * MSB                               LSB
> + * | 31 ...   6 |  5  |  4   | 3 2 1 0 |
> + * |  Timestamp | ECN | SACK | WScale  |
> + *
> + * When we receive a valid cookie-ACK, we look at the echoed tsval (if
> + * any) to figure out which TCP options we should use for the rebuilt
> + * connection.
> + *
> + * A WScale setting of '0xf' (which is an invalid scaling value)
> + * means that original syn did not include the TCP window scaling option.
> + */
> +#define TS_OPT_WSCALE_MASK	0xf
> +#define TS_OPT_SACK		BIT(4)
> +#define TS_OPT_ECN		BIT(5)
> +/* There is no TS_OPT_TIMESTAMP:
> + * if ACK contains timestamp option, we already know it was
> + * requested/supported by the syn/synack exchange.
> + */
> +#define TSBITS	6
> +#define TSMASK	(((__u32)1 << TSBITS) - 1)

Personally I'd define all the values as hex constants instead of mixing
and matching the defines.

So probably just:
#define TS_OPT_WSCALE_MASK	0x0f
#define TS_OPT_SACK		0x10
#define TS_OPT_ECN		0x20
#define TSMASK                0x3f

	David

^ permalink raw reply

* Re: [PATCH -next v3 1/3] syncookies: avoid magic values and document which-bit-is-what-option
From: Daniel Borkmann @ 2014-11-03 14:33 UTC (permalink / raw)
  To: David Laight; +Cc: 'Florian Westphal', netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9E44B1@AcuExch.aculab.com>

On 11/03/2014 03:24 PM, David Laight wrote:
> From: Florian Westphal
>> Was a bit more difficult to read than needed due to magic shifts;
>> add defines and document the used encoding scheme.
>>
>> Joint work with Daniel Borkmann.
>>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> ---
>>   This patch was not part of earlier versions of the set.
>>
>>   net/ipv4/syncookies.c | 50 +++++++++++++++++++++++++++++++++++---------------
>>   1 file changed, 35 insertions(+), 15 deletions(-)
>>
>> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
>> index 4ac7bca..c3792c0 100644
>> --- a/net/ipv4/syncookies.c
>> +++ b/net/ipv4/syncookies.c
>> @@ -19,10 +19,6 @@
>>   #include <net/tcp.h>
>>   #include <net/route.h>
>>
>> -/* Timestamps: lowest bits store TCP options */
>> -#define TSBITS 6
>> -#define TSMASK (((__u32)1 << TSBITS) - 1)
>> -
>>   extern int sysctl_tcp_syncookies;
>>
>>   static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
>> @@ -30,6 +26,30 @@ static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
>>   #define COOKIEBITS 24	/* Upper bits store count */
>>   #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
>>
>> +/* TCP Timestamp: 6 lowest bits of timestamp sent in the cookie SYN-ACK
>> + * stores TCP options:
>> + *
>> + * MSB                               LSB
>> + * | 31 ...   6 |  5  |  4   | 3 2 1 0 |
>> + * |  Timestamp | ECN | SACK | WScale  |
>> + *
>> + * When we receive a valid cookie-ACK, we look at the echoed tsval (if
>> + * any) to figure out which TCP options we should use for the rebuilt
>> + * connection.
>> + *
>> + * A WScale setting of '0xf' (which is an invalid scaling value)
>> + * means that original syn did not include the TCP window scaling option.
>> + */
>> +#define TS_OPT_WSCALE_MASK	0xf
>> +#define TS_OPT_SACK		BIT(4)
>> +#define TS_OPT_ECN		BIT(5)
>> +/* There is no TS_OPT_TIMESTAMP:
>> + * if ACK contains timestamp option, we already know it was
>> + * requested/supported by the syn/synack exchange.
>> + */
>> +#define TSBITS	6
>> +#define TSMASK	(((__u32)1 << TSBITS) - 1)
>
> Personally I'd define all the values as hex constants instead of mixing
> and matching the defines.
>
> So probably just:
> #define TS_OPT_WSCALE_MASK	0x0f
> #define TS_OPT_SACK		0x10
> #define TS_OPT_ECN		0x20
> #define TSMASK                0x3f

If you look at the above comment and then take a peek at the actual TS_OPT_*,
it is much easier to follow. Moreover, how is having TSMASK as 0x3f better?!
Currently, it is a constant calculated based upon TSBITS.

^ permalink raw reply

* stmmac: potential circular locking dependency
From: Emilio López @ 2014-11-03 14:36 UTC (permalink / raw)
  To: peppe.cavallaro, maxime.ripard; +Cc: netdev, linux-kernel, linux-arm-kernel

Hi everyone,

I was playing with iperf on my Cubietruck today when I hit this lockdep 
report/breakage on stmmac. It seems to be fairly reproducible by

1) Being on a GbE network
2) "iperf -s" on the stmmmac device
3) "iperf -c <IP of above device> -d -P 5" on some other device on the LAN

Here it goes:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0-rc3-00003-g7180edf #916 Not tainted
-------------------------------------------------------
iperf/141 is trying to acquire lock:
  (&(&dev->tx_global_lock)->rlock){+.-...}, at: [<c025927c>] 
stmmac_tx_clean+0x350/0x43c

but task is already holding lock:
  (&(&priv->tx_lock)->rlock){+.-...}, at: [<c0258f5c>] 
stmmac_tx_clean+0x30/0x43c

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&(&priv->tx_lock)->rlock){+.-...}:
        [<c03f57d0>] _raw_spin_lock+0x5c/0x94
        [<c0259ec4>] stmmac_xmit+0x88/0x620
        [<c0319058>] dev_hard_start_xmit+0x230/0x49c
        [<c033773c>] sch_direct_xmit+0xdc/0x20c
        [<c03194dc>] __dev_queue_xmit+0x218/0x604
        [<c0319954>] dev_queue_xmit+0x1c/0x20
        [<c032896c>] neigh_resolve_output+0x180/0x254
        [<c03a36fc>] ip6_finish_output2+0x188/0x8a4
        [<c03a7440>] ip6_output+0xc8/0x398
        [<c03c81ec>] mld_sendpack+0x2e0/0x6d8
        [<c03ca564>] mld_ifc_timer_expire+0x1f0/0x308
        [<c007c06c>] call_timer_fn+0xb4/0x1f0
        [<c007c7d0>] run_timer_softirq+0x224/0x2f0
        [<c0023fb0>] __do_softirq+0x1d4/0x3e0
        [<c00244e0>] irq_exit+0x9c/0xd0
        [<c006df48>] __handle_domain_irq+0x70/0xb4
        [<c0008754>] gic_handle_irq+0x34/0x6c
        [<c0013484>] __irq_svc+0x44/0x5c
        [<c00628f0>] lock_acquire+0xec/0x17c
        [<c00628f0>] lock_acquire+0xec/0x17c
        [<c03f57d0>] _raw_spin_lock+0x5c/0x94
        [<c00ddeac>] do_read_fault.isra.93+0xa8/0x2a0
        [<c00e0144>] handle_mm_fault+0x44c/0x8dc
        [<c0018670>] do_page_fault+0x160/0x2d8
        [<c0008564>] do_PrefetchAbort+0x44/0xa8
        [<c0013960>] ret_from_exception+0x0/0x20
        [<b6eb0120>] 0xb6eb0120

-> #1 (_xmit_ETHER#2){+.-...}:
        [<c03f57d0>] _raw_spin_lock+0x5c/0x94
        [<c0338254>] dev_deactivate_many+0xd0/0x250
        [<c0338410>] dev_deactivate+0x3c/0x4c
        [<c032e908>] linkwatch_do_dev+0x50/0x84
        [<c032eb74>] __linkwatch_run_queue+0xdc/0x148
        [<c032ec1c>] linkwatch_event+0x3c/0x44
        [<c00367d0>] process_one_work+0x1ec/0x510
        [<c0036b50>] worker_thread+0x5c/0x4d8
        [<c003c748>] kthread+0xe8/0xfc
        [<c000ed28>] ret_from_fork+0x14/0x20

-> #0 (&(&dev->tx_global_lock)->rlock){+.-...}:
        [<c00628e0>] lock_acquire+0xdc/0x17c
        [<c03f57d0>] _raw_spin_lock+0x5c/0x94
        [<c025927c>] stmmac_tx_clean+0x350/0x43c
        [<c02593c0>] stmmac_poll+0x3c/0x618
        [<c031a760>] net_rx_action+0x178/0x28c
        [<c0023fb0>] __do_softirq+0x1d4/0x3e0
        [<c00244e0>] irq_exit+0x9c/0xd0
        [<c006df48>] __handle_domain_irq+0x70/0xb4
        [<c0008754>] gic_handle_irq+0x34/0x6c
        [<c0013484>] __irq_svc+0x44/0x5c
        [<c002435c>] __local_bh_enable_ip+0x9c/0xfc
        [<c002435c>] __local_bh_enable_ip+0x9c/0xfc
        [<c03f5f64>] _raw_read_unlock_bh+0x40/0x44
        [<c03ad050>] inet6_dump_addr+0x33c/0x530
        [<c03ad2a0>] inet6_dump_ifaddr+0x1c/0x20
        [<c032a0fc>] rtnl_dump_all+0x50/0xf4
        [<c033c020>] netlink_dump+0xc0/0x250
        [<c033c3e4>] netlink_recvmsg+0x234/0x300
        [<c02ff7b4>] sock_recvmsg+0xa4/0xc8
        [<c03000bc>] ___sys_recvmsg.part.33+0xe4/0x1c0
        [<c0302158>] __sys_recvmsg+0x60/0x90
        [<c03021a0>] SyS_recvmsg+0x18/0x1c
        [<c000ec60>] ret_fast_syscall+0x0/0x48

other info that might help us debug this:

Chain exists of:
   &(&dev->tx_global_lock)->rlock --> _xmit_ETHER#2 --> 
&(&priv->tx_lock)->rlock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&(&priv->tx_lock)->rlock);
                                lock(_xmit_ETHER#2);
                                lock(&(&priv->tx_lock)->rlock);
   lock(&(&dev->tx_global_lock)->rlock);

  *** DEADLOCK ***

3 locks held by iperf/141:
  #0:  (rtnl_mutex){+.+.+.}, at: [<c033bf88>] netlink_dump+0x28/0x250
  #1:  (rcu_read_lock){......}, at: [<c03acd14>] inet6_dump_addr+0x0/0x530
  #2:  (&(&priv->tx_lock)->rlock){+.-...}, at: [<c0258f5c>] 
stmmac_tx_clean+0x30/0x43c

stack backtrace:
CPU: 0 PID: 141 Comm: iperf Not tainted 3.18.0-rc3-00003-g7180edf #916
[<c0015df4>] (unwind_backtrace) from [<c0012850>] (show_stack+0x20/0x24)
[<c0012850>] (show_stack) from [<c03ef6dc>] (dump_stack+0x9c/0xbc)
[<c03ef6dc>] (dump_stack) from [<c005bbc0>] (print_circular_bug+0x21c/0x33c)
[<c005bbc0>] (print_circular_bug) from [<c0061e88>] 
(__lock_acquire+0x2060/0x2148)
[<c0061e88>] (__lock_acquire) from [<c00628e0>] (lock_acquire+0xdc/0x17c)
[<c00628e0>] (lock_acquire) from [<c03f57d0>] (_raw_spin_lock+0x5c/0x94)
[<c03f57d0>] (_raw_spin_lock) from [<c025927c>] 
(stmmac_tx_clean+0x350/0x43c)
[<c025927c>] (stmmac_tx_clean) from [<c02593c0>] (stmmac_poll+0x3c/0x618)
[<c02593c0>] (stmmac_poll) from [<c031a760>] (net_rx_action+0x178/0x28c)
[<c031a760>] (net_rx_action) from [<c0023fb0>] (__do_softirq+0x1d4/0x3e0)
[<c0023fb0>] (__do_softirq) from [<c00244e0>] (irq_exit+0x9c/0xd0)
[<c00244e0>] (irq_exit) from [<c006df48>] (__handle_domain_irq+0x70/0xb4)
[<c006df48>] (__handle_domain_irq) from [<c0008754>] 
(gic_handle_irq+0x34/0x6c)
[<c0008754>] (gic_handle_irq) from [<c0013484>] (__irq_svc+0x44/0x5c)
Exception stack(0xcabc1be0 to 0xcabc1c28)
1be0: 00000001 2df53000 00000000 caf15e80 cabc0000 00000201 ca9a9840 
c03ad050
1c00: ca8d9404 00000000 ca9b4f50 cabc1c44 c08732d0 cabc1c28 c005d5d0 
c002435c
1c20: 20000013 ffffffff
[<c0013484>] (__irq_svc) from [<c002435c>] (__local_bh_enable_ip+0x9c/0xfc)
[<c002435c>] (__local_bh_enable_ip) from [<c03f5f64>] 
(_raw_read_unlock_bh+0x40/0x44)
[<c03f5f64>] (_raw_read_unlock_bh) from [<c03ad050>] 
(inet6_dump_addr+0x33c/0x530)
[<c03ad050>] (inet6_dump_addr) from [<c03ad2a0>] 
(inet6_dump_ifaddr+0x1c/0x20)
[<c03ad2a0>] (inet6_dump_ifaddr) from [<c032a0fc>] (rtnl_dump_all+0x50/0xf4)
[<c032a0fc>] (rtnl_dump_all) from [<c033c020>] (netlink_dump+0xc0/0x250)
[<c033c020>] (netlink_dump) from [<c033c3e4>] (netlink_recvmsg+0x234/0x300)
[<c033c3e4>] (netlink_recvmsg) from [<c02ff7b4>] (sock_recvmsg+0xa4/0xc8)
[<c02ff7b4>] (sock_recvmsg) from [<c03000bc>] 
(___sys_recvmsg.part.33+0xe4/0x1c0)
[<c03000bc>] (___sys_recvmsg.part.33) from [<c0302158>] 
(__sys_recvmsg+0x60/0x90)
[<c0302158>] (__sys_recvmsg) from [<c03021a0>] (SyS_recvmsg+0x18/0x1c)
[<c03021a0>] (SyS_recvmsg) from [<c000ec60>] (ret_fast_syscall+0x0/0x48)
---------------------------------

Cheers,

Emilio

^ permalink raw reply

* Re: stmmac: potential circular locking dependency
From: Giuseppe CAVALLARO @ 2014-11-03 14:39 UTC (permalink / raw)
  To: Emilio López, maxime.ripard; +Cc: netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <545792E8.1020208@elopez.com.ar>

Hello Emilio

I have a subset of new patches to review and fix locks in the
driver. I will plan to send them in the next days.

Sincerely
Peppe

On 11/3/2014 3:36 PM, Emilio López wrote:
> Hi everyone,
>
> I was playing with iperf on my Cubietruck today when I hit this lockdep
> report/breakage on stmmac. It seems to be fairly reproducible by
>
> 1) Being on a GbE network
> 2) "iperf -s" on the stmmmac device
> 3) "iperf -c <IP of above device> -d -P 5" on some other device on the LAN
>
> Here it goes:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.18.0-rc3-00003-g7180edf #916 Not tainted
> -------------------------------------------------------
> iperf/141 is trying to acquire lock:
>   (&(&dev->tx_global_lock)->rlock){+.-...}, at: [<c025927c>]
> stmmac_tx_clean+0x350/0x43c
>
> but task is already holding lock:
>   (&(&priv->tx_lock)->rlock){+.-...}, at: [<c0258f5c>]
> stmmac_tx_clean+0x30/0x43c
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&(&priv->tx_lock)->rlock){+.-...}:
>         [<c03f57d0>] _raw_spin_lock+0x5c/0x94
>         [<c0259ec4>] stmmac_xmit+0x88/0x620
>         [<c0319058>] dev_hard_start_xmit+0x230/0x49c
>         [<c033773c>] sch_direct_xmit+0xdc/0x20c
>         [<c03194dc>] __dev_queue_xmit+0x218/0x604
>         [<c0319954>] dev_queue_xmit+0x1c/0x20
>         [<c032896c>] neigh_resolve_output+0x180/0x254
>         [<c03a36fc>] ip6_finish_output2+0x188/0x8a4
>         [<c03a7440>] ip6_output+0xc8/0x398
>         [<c03c81ec>] mld_sendpack+0x2e0/0x6d8
>         [<c03ca564>] mld_ifc_timer_expire+0x1f0/0x308
>         [<c007c06c>] call_timer_fn+0xb4/0x1f0
>         [<c007c7d0>] run_timer_softirq+0x224/0x2f0
>         [<c0023fb0>] __do_softirq+0x1d4/0x3e0
>         [<c00244e0>] irq_exit+0x9c/0xd0
>         [<c006df48>] __handle_domain_irq+0x70/0xb4
>         [<c0008754>] gic_handle_irq+0x34/0x6c
>         [<c0013484>] __irq_svc+0x44/0x5c
>         [<c00628f0>] lock_acquire+0xec/0x17c
>         [<c00628f0>] lock_acquire+0xec/0x17c
>         [<c03f57d0>] _raw_spin_lock+0x5c/0x94
>         [<c00ddeac>] do_read_fault.isra.93+0xa8/0x2a0
>         [<c00e0144>] handle_mm_fault+0x44c/0x8dc
>         [<c0018670>] do_page_fault+0x160/0x2d8
>         [<c0008564>] do_PrefetchAbort+0x44/0xa8
>         [<c0013960>] ret_from_exception+0x0/0x20
>         [<b6eb0120>] 0xb6eb0120
>
> -> #1 (_xmit_ETHER#2){+.-...}:
>         [<c03f57d0>] _raw_spin_lock+0x5c/0x94
>         [<c0338254>] dev_deactivate_many+0xd0/0x250
>         [<c0338410>] dev_deactivate+0x3c/0x4c
>         [<c032e908>] linkwatch_do_dev+0x50/0x84
>         [<c032eb74>] __linkwatch_run_queue+0xdc/0x148
>         [<c032ec1c>] linkwatch_event+0x3c/0x44
>         [<c00367d0>] process_one_work+0x1ec/0x510
>         [<c0036b50>] worker_thread+0x5c/0x4d8
>         [<c003c748>] kthread+0xe8/0xfc
>         [<c000ed28>] ret_from_fork+0x14/0x20
>
> -> #0 (&(&dev->tx_global_lock)->rlock){+.-...}:
>         [<c00628e0>] lock_acquire+0xdc/0x17c
>         [<c03f57d0>] _raw_spin_lock+0x5c/0x94
>         [<c025927c>] stmmac_tx_clean+0x350/0x43c
>         [<c02593c0>] stmmac_poll+0x3c/0x618
>         [<c031a760>] net_rx_action+0x178/0x28c
>         [<c0023fb0>] __do_softirq+0x1d4/0x3e0
>         [<c00244e0>] irq_exit+0x9c/0xd0
>         [<c006df48>] __handle_domain_irq+0x70/0xb4
>         [<c0008754>] gic_handle_irq+0x34/0x6c
>         [<c0013484>] __irq_svc+0x44/0x5c
>         [<c002435c>] __local_bh_enable_ip+0x9c/0xfc
>         [<c002435c>] __local_bh_enable_ip+0x9c/0xfc
>         [<c03f5f64>] _raw_read_unlock_bh+0x40/0x44
>         [<c03ad050>] inet6_dump_addr+0x33c/0x530
>         [<c03ad2a0>] inet6_dump_ifaddr+0x1c/0x20
>         [<c032a0fc>] rtnl_dump_all+0x50/0xf4
>         [<c033c020>] netlink_dump+0xc0/0x250
>         [<c033c3e4>] netlink_recvmsg+0x234/0x300
>         [<c02ff7b4>] sock_recvmsg+0xa4/0xc8
>         [<c03000bc>] ___sys_recvmsg.part.33+0xe4/0x1c0
>         [<c0302158>] __sys_recvmsg+0x60/0x90
>         [<c03021a0>] SyS_recvmsg+0x18/0x1c
>         [<c000ec60>] ret_fast_syscall+0x0/0x48
>
> other info that might help us debug this:
>
> Chain exists of:
>    &(&dev->tx_global_lock)->rlock --> _xmit_ETHER#2 -->
> &(&priv->tx_lock)->rlock
>
>   Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(&(&priv->tx_lock)->rlock);
>                                 lock(_xmit_ETHER#2);
>                                 lock(&(&priv->tx_lock)->rlock);
>    lock(&(&dev->tx_global_lock)->rlock);
>
>   *** DEADLOCK ***
>
> 3 locks held by iperf/141:
>   #0:  (rtnl_mutex){+.+.+.}, at: [<c033bf88>] netlink_dump+0x28/0x250
>   #1:  (rcu_read_lock){......}, at: [<c03acd14>] inet6_dump_addr+0x0/0x530
>   #2:  (&(&priv->tx_lock)->rlock){+.-...}, at: [<c0258f5c>]
> stmmac_tx_clean+0x30/0x43c
>
> stack backtrace:
> CPU: 0 PID: 141 Comm: iperf Not tainted 3.18.0-rc3-00003-g7180edf #916
> [<c0015df4>] (unwind_backtrace) from [<c0012850>] (show_stack+0x20/0x24)
> [<c0012850>] (show_stack) from [<c03ef6dc>] (dump_stack+0x9c/0xbc)
> [<c03ef6dc>] (dump_stack) from [<c005bbc0>]
> (print_circular_bug+0x21c/0x33c)
> [<c005bbc0>] (print_circular_bug) from [<c0061e88>]
> (__lock_acquire+0x2060/0x2148)
> [<c0061e88>] (__lock_acquire) from [<c00628e0>] (lock_acquire+0xdc/0x17c)
> [<c00628e0>] (lock_acquire) from [<c03f57d0>] (_raw_spin_lock+0x5c/0x94)
> [<c03f57d0>] (_raw_spin_lock) from [<c025927c>]
> (stmmac_tx_clean+0x350/0x43c)
> [<c025927c>] (stmmac_tx_clean) from [<c02593c0>] (stmmac_poll+0x3c/0x618)
> [<c02593c0>] (stmmac_poll) from [<c031a760>] (net_rx_action+0x178/0x28c)
> [<c031a760>] (net_rx_action) from [<c0023fb0>] (__do_softirq+0x1d4/0x3e0)
> [<c0023fb0>] (__do_softirq) from [<c00244e0>] (irq_exit+0x9c/0xd0)
> [<c00244e0>] (irq_exit) from [<c006df48>] (__handle_domain_irq+0x70/0xb4)
> [<c006df48>] (__handle_domain_irq) from [<c0008754>]
> (gic_handle_irq+0x34/0x6c)
> [<c0008754>] (gic_handle_irq) from [<c0013484>] (__irq_svc+0x44/0x5c)
> Exception stack(0xcabc1be0 to 0xcabc1c28)
> 1be0: 00000001 2df53000 00000000 caf15e80 cabc0000 00000201 ca9a9840
> c03ad050
> 1c00: ca8d9404 00000000 ca9b4f50 cabc1c44 c08732d0 cabc1c28 c005d5d0
> c002435c
> 1c20: 20000013 ffffffff
> [<c0013484>] (__irq_svc) from [<c002435c>] (__local_bh_enable_ip+0x9c/0xfc)
> [<c002435c>] (__local_bh_enable_ip) from [<c03f5f64>]
> (_raw_read_unlock_bh+0x40/0x44)
> [<c03f5f64>] (_raw_read_unlock_bh) from [<c03ad050>]
> (inet6_dump_addr+0x33c/0x530)
> [<c03ad050>] (inet6_dump_addr) from [<c03ad2a0>]
> (inet6_dump_ifaddr+0x1c/0x20)
> [<c03ad2a0>] (inet6_dump_ifaddr) from [<c032a0fc>]
> (rtnl_dump_all+0x50/0xf4)
> [<c032a0fc>] (rtnl_dump_all) from [<c033c020>] (netlink_dump+0xc0/0x250)
> [<c033c020>] (netlink_dump) from [<c033c3e4>] (netlink_recvmsg+0x234/0x300)
> [<c033c3e4>] (netlink_recvmsg) from [<c02ff7b4>] (sock_recvmsg+0xa4/0xc8)
> [<c02ff7b4>] (sock_recvmsg) from [<c03000bc>]
> (___sys_recvmsg.part.33+0xe4/0x1c0)
> [<c03000bc>] (___sys_recvmsg.part.33) from [<c0302158>]
> (__sys_recvmsg+0x60/0x90)
> [<c0302158>] (__sys_recvmsg) from [<c03021a0>] (SyS_recvmsg+0x18/0x1c)
> [<c03021a0>] (SyS_recvmsg) from [<c000ec60>] (ret_fast_syscall+0x0/0x48)
> ---------------------------------
>
> Cheers,
>
> Emilio
>
>

^ permalink raw reply

* RE: [PATCH -next v3 1/3] syncookies: avoid magic values and document which-bit-is-what-option
From: David Laight @ 2014-11-03 14:41 UTC (permalink / raw)
  To: 'Daniel Borkmann'
  Cc: 'Florian Westphal', netdev@vger.kernel.org
In-Reply-To: <54579240.8060309@redhat.com>

From: Daniel Borkmann
> On 11/03/2014 03:24 PM, David Laight wrote:
> > From: Florian Westphal
> >> Was a bit more difficult to read than needed due to magic shifts;
> >> add defines and document the used encoding scheme.
> >>
> >> Joint work with Daniel Borkmann.
> >>
> >> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> >> Signed-off-by: Florian Westphal <fw@strlen.de>
> >> ---
> >>   This patch was not part of earlier versions of the set.
> >>
> >>   net/ipv4/syncookies.c | 50 +++++++++++++++++++++++++++++++++++---------------
> >>   1 file changed, 35 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> >> index 4ac7bca..c3792c0 100644
> >> --- a/net/ipv4/syncookies.c
> >> +++ b/net/ipv4/syncookies.c
> >> @@ -19,10 +19,6 @@
> >>   #include <net/tcp.h>
> >>   #include <net/route.h>
> >>
> >> -/* Timestamps: lowest bits store TCP options */
> >> -#define TSBITS 6
> >> -#define TSMASK (((__u32)1 << TSBITS) - 1)
> >> -
> >>   extern int sysctl_tcp_syncookies;
> >>
> >>   static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
> >> @@ -30,6 +26,30 @@ static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
> >>   #define COOKIEBITS 24	/* Upper bits store count */
> >>   #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
> >>
> >> +/* TCP Timestamp: 6 lowest bits of timestamp sent in the cookie SYN-ACK
> >> + * stores TCP options:
> >> + *
> >> + * MSB                               LSB
> >> + * | 31 ...   6 |  5  |  4   | 3 2 1 0 |
> >> + * |  Timestamp | ECN | SACK | WScale  |
> >> + *
> >> + * When we receive a valid cookie-ACK, we look at the echoed tsval (if
> >> + * any) to figure out which TCP options we should use for the rebuilt
> >> + * connection.
> >> + *
> >> + * A WScale setting of '0xf' (which is an invalid scaling value)
> >> + * means that original syn did not include the TCP window scaling option.
> >> + */
> >> +#define TS_OPT_WSCALE_MASK	0xf
> >> +#define TS_OPT_SACK		BIT(4)
> >> +#define TS_OPT_ECN		BIT(5)
> >> +/* There is no TS_OPT_TIMESTAMP:
> >> + * if ACK contains timestamp option, we already know it was
> >> + * requested/supported by the syn/synack exchange.
> >> + */
> >> +#define TSBITS	6
> >> +#define TSMASK	(((__u32)1 << TSBITS) - 1)
> >
> > Personally I'd define all the values as hex constants instead of mixing
> > and matching the defines.
> >
> > So probably just:
> > #define TS_OPT_WSCALE_MASK	0x0f
> > #define TS_OPT_SACK		0x10
> > #define TS_OPT_ECN		0x20
> > #define TSMASK                0x3f
> 
> If you look at the above comment and then take a peek at the actual TS_OPT_*,
> it is much easier to follow. Moreover, how is having TSMASK as 0x3f better?!
> Currently, it is a constant calculated based upon TSBITS.

TSMASK is also (TS_OPT_WSCALE_MASK | TS_OPT_SACK | TS_OPT_ECN) defining
the values in hex makes this even more clear.
Defining TSBITS from the mask would save the extra definition - which might
be easier done by replacing the shifts with multiply/divide by (TSMASK + 1)
(probably in a #define/inline function to make the code easier to read.

	David

^ permalink raw reply

* Re: stmmac: potential circular locking dependency
From: Emilio López @ 2014-11-03 14:48 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: maxime.ripard, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <545793B6.5010109@st.com>

El 03/11/14 a las 11:39, Giuseppe CAVALLARO escibió:
> Hello Emilio
>
> I have a subset of new patches to review and fix locks in the
> driver. I will plan to send them in the next days.

Great then :) Please Cc: me on the patches when you send them

Cheers,

Emilio

^ permalink raw reply

* RE: [PATCH 0/1] mv643xx_eth: Disable TSO by default
From: David Laight @ 2014-11-03 14:51 UTC (permalink / raw)
  To: 'Eric Dumazet', Ezequiel Garcia
  Cc: netdev@vger.kernel.org, David Miller, Thomas Petazzoni,
	Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai
In-Reply-To: <1414862766.31792.7.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet
> On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote:
> > Several users ([1], [2]) have been reporting data corruption with TSO on
> > Kirkwood platforms (i.e. using the mv643xx_eth driver).
> >
> > Until we manage to find what's causing this, this simple patch will make
> > the TSO path disabled by default. This patch should be queued for stable,
> > fixing the TSO feature introduced in v3.16.
> >
> > The corruption itself is very easy to reproduce: checking md5sum on a mounted
> > NFS directory gives a different result each time. Same tests using the mvneta
> > driver (Armada 370/38x/XP SoC) pass with no issues.
> >
> > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints
> > are well received.
> 
> lack of barriers maybe ?
> 
> It seems you might need to populate all TX descriptors but delay the
> first, like doing the populate in descending order.
> 
> If you take a look at txq_submit_skb(), you'll see the final
> desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by
> txq_submit_frag_skb()
> 
> You should kick the nick only when all TX descriptors are ready and
> committed to memory.

Don't forget that the nick might process the first descriptor without
being given a 'kick' - it will read it when it finishes processing the
previous frame.
This also means that you have to be careful about the order of the writes
to the first descriptor.

	David


^ permalink raw reply

* [net-next 01/11] i40e: Add condition to enter fdir flush and reinit
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Akeem G Abodunrin, netdev, nhorman, sassmann, jogreene,
	Patrick Lu, Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

When FD_SB/ATR are not enabled, do not allow flow director flush
and reinit.

Change-ID: Iafe261c1862992981615815551abd1ed9fada0a8
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 834c9ff..0eccd82 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5211,6 +5211,9 @@ static void i40e_fdir_flush_and_replay(struct i40e_pf *pf)
 	int flush_wait_retry = 50;
 	int reg;
 
+	if (!(pf->flags & (I40E_FLAG_FD_SB_ENABLED | I40E_FLAG_FD_ATR_ENABLED)))
+		return;
+
 	if (time_after(jiffies, pf->fd_flush_timestamp +
 				(I40E_MIN_FD_FLUSH_INTERVAL * HZ))) {
 		set_bit(__I40E_FD_FLUSH_REQUESTED, &pf->state);
@@ -5272,6 +5275,9 @@ static void i40e_fdir_reinit_subtask(struct i40e_pf *pf)
 	if (test_bit(__I40E_DOWN, &pf->state))
 		return;
 
+	if (!(pf->flags & (I40E_FLAG_FD_SB_ENABLED | I40E_FLAG_FD_ATR_ENABLED)))
+		return;
+
 	if ((pf->fd_add_err >= I40E_MAX_FD_PROGRAM_ERROR) &&
 	    (i40e_get_current_atr_cnt(pf) >= pf->fd_atr_cnt) &&
 	    (i40e_get_current_atr_cnt(pf) > pf->fdir_pf_filter_count))
-- 
1.9.3

^ permalink raw reply related

* [net-next 00/11][pull request] Intel Wired LAN Driver Updates 2014-11-03
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to i40e and i40evf.

Akeem adds a check for i40e so that flow director flush and reinit are
not done when flow director is not enabled.

Mitch fixes the i40evf driver to properly handle multiple admin queue
messages, by reinit the msg_size field each time we go through the loop.
Without this, we may receive truncated messages due to the firmware
thinking we have insufficient buffer size.  Also fixes the link checking
logic to only check the carrier state if the interface is actually
open, which allows link changes to be reported correctly without spamming
the VFs.  Updates i40e to inset the VSI ID in the QTX_CTL register
when configuring queues for VMDq VSIs.

Paul adds support for 10G-base-T in i40evf.

Jesse fixes i40e where the call to irq_dynamic_disable() was turning off
the interrupt completely when trying to set ITR to 0 (for lowest
moderation).

Shannon removes debugfs dump stats function, since it was not being
kept up-to-date and was redundant with the ethtool output.  Also, scales
back the LAN MSIx usage to force queue/vector sharing and leave some
vectors for Flow Director, VMDq, etc. when there are more cores than
vectors available to the PF.  Cleans up the error reporting for
get_lump() resource tracking errors.  Also adds a check for the
debug module parameter earlier to be able to catch the early configuration
phase admin queue messages.

The following are changes since commit 55b42b5ca2dcf143465968697fe6c6503b05fca1:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Akeem G Abodunrin (1):
  i40e: Add condition to enter fdir flush and reinit

Jesse Brandeburg (1):
  i40e: avoid disable of interrupt when changing ITR

Mitch Williams (4):
  i40evf: properly handle multiple AQ messages
  i40e: fix link checking logic
  i40e: configure VM ID in qtx_ctl
  i40e: properly parse MDET registers

Paul M Stillwell Jr (1):
  i40evf: Add support for 10G base T parts

Shannon Nelson (4):
  i40e: remove debugfs dump stats
  i40e: scale msix vector use when more cores than vectors
  i40e: better wording for resource tracking errors
  i40e: enable debug earlier

 drivers/net/ethernet/intel/i40e/i40e_debugfs.c  | 93 +------------------------
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c  |  2 -
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 67 ++++++++++++------
 drivers/net/ethernet/intel/i40evf/i40e_common.c |  1 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h   |  1 +
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  4 +-
 6 files changed, 51 insertions(+), 117 deletions(-)

-- 
1.9.3

^ permalink raw reply

* [net-next 03/11] i40e: fix link checking logic
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

If the interface is closed, but VFs exist, current code will spam all
the VFs with link messages every second. This is because the link event
code was looking at netif_carrier_ok() without checking to see if the
interface was actually open.

Refactor the logic to only check the carrier state if the interface is
actually open. This allows link changes to be reported correctly without
spamming the VFs.

Change-ID: If136e79bb3820d21ea4e39e332e8a9604efc2b2a
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0eccd82..f95c04a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5358,6 +5358,7 @@ static void i40e_veb_link_event(struct i40e_veb *veb, bool link_up)
 static void i40e_link_event(struct i40e_pf *pf)
 {
 	bool new_link, old_link;
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
 
 	/* set this to force the get_link_status call to refresh state */
 	pf->hw.phy.get_link_info = true;
@@ -5366,10 +5367,12 @@ static void i40e_link_event(struct i40e_pf *pf)
 	new_link = i40e_get_link_status(&pf->hw);
 
 	if (new_link == old_link &&
-	    new_link == netif_carrier_ok(pf->vsi[pf->lan_vsi]->netdev))
+	    (test_bit(__I40E_DOWN, &vsi->state) ||
+	     new_link == netif_carrier_ok(vsi->netdev)))
 		return;
-	if (!test_bit(__I40E_DOWN, &pf->vsi[pf->lan_vsi]->state))
-		i40e_print_link_message(pf->vsi[pf->lan_vsi], new_link);
+
+	if (!test_bit(__I40E_DOWN, &vsi->state))
+		i40e_print_link_message(vsi, new_link);
 
 	/* Notify the base of the switch tree connected to
 	 * the link.  Floating VEBs are not notified.
@@ -5377,7 +5380,7 @@ static void i40e_link_event(struct i40e_pf *pf)
 	if (pf->lan_veb != I40E_NO_VEB && pf->veb[pf->lan_veb])
 		i40e_veb_link_event(pf->veb[pf->lan_veb], new_link);
 	else
-		i40e_vsi_link_event(pf->vsi[pf->lan_vsi], new_link);
+		i40e_vsi_link_event(vsi, new_link);
 
 	if (pf->vf)
 		i40e_vc_notify_link_state(pf);
-- 
1.9.3

^ permalink raw reply related

* [net-next 04/11] i40evf: Add support for 10G base T parts
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Paul M Stillwell Jr, netdev, nhorman, sassmann, jogreene,
	Patrick Lu, Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

Add 10G-Base-T support in i40evf.

Change-ID: I98a1c3138d7d6572fe7903a7c1c4692cae3260d5
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 9525605..28c40c5 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -50,6 +50,7 @@ i40e_status i40e_set_mac_type(struct i40e_hw *hw)
 		case I40E_DEV_ID_QSFP_A:
 		case I40E_DEV_ID_QSFP_B:
 		case I40E_DEV_ID_QSFP_C:
+		case I40E_DEV_ID_10G_BASE_T:
 			hw->mac.type = I40E_MAC_XL710;
 			break;
 		case I40E_DEV_ID_VF:
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index 1537643..8fe34fc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -43,6 +43,7 @@
 #define I40E_DEV_ID_QSFP_A		0x1583
 #define I40E_DEV_ID_QSFP_B		0x1584
 #define I40E_DEV_ID_QSFP_C		0x1585
+#define I40E_DEV_ID_10G_BASE_T		0x1586
 #define I40E_DEV_ID_VF		0x154C
 #define I40E_DEV_ID_VF_HV		0x1571
 
-- 
1.9.3

^ permalink raw reply related

* [net-next 02/11] i40evf: properly handle multiple AQ messages
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

When we receive an admin queue message, the msg_size field in the event
struct gets overwritten. Because of this, we need to reinit the field
each time we go through the loop. Without this we may receive truncated
messages due to the firmware thinking we have insufficient buffer size.

Change-ID: I21dcca5114d91365d731169965ce3ffec0e4a190
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index dabe6a4..b2f01eb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1647,10 +1647,8 @@ static void i40evf_adminq_task(struct work_struct *work)
 					   v_msg->v_retval, event.msg_buf,
 					   event.msg_size);
 		if (pending != 0) {
-			dev_info(&adapter->pdev->dev,
-				 "%s: ARQ: Pending events %d\n",
-				 __func__, pending);
 			memset(event.msg_buf, 0, I40EVF_MAX_AQ_BUF_SIZE);
+			event.msg_size = I40EVF_MAX_AQ_BUF_SIZE;
 		}
 	} while (pending);
 
-- 
1.9.3

^ permalink raw reply related

* [net-next 05/11] i40e: avoid disable of interrupt when changing ITR
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Jesse Brandeburg, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

The call to irq_dynamic_disable was turning off the interrupt completely
when trying to set ITR to 0 (for lowest moderation).  Just remove the
call as setting the values to 0 later in this function will suffice.

Change-ID: I47caf1ecbe65653cf63ec833db93094cd83fd84d
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-By: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 12adc08..b6e745f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1574,7 +1574,6 @@ static int i40e_set_coalesce(struct net_device *netdev,
 		vsi->rx_itr_setting = ec->rx_coalesce_usecs;
 	} else if (ec->rx_coalesce_usecs == 0) {
 		vsi->rx_itr_setting = ec->rx_coalesce_usecs;
-		i40e_irq_dynamic_disable(vsi, vector);
 		if (ec->use_adaptive_rx_coalesce)
 			netif_info(pf, drv, netdev,
 				   "Rx-secs=0, need to disable adaptive-Rx for a complete disable\n");
@@ -1589,7 +1588,6 @@ static int i40e_set_coalesce(struct net_device *netdev,
 		vsi->tx_itr_setting = ec->tx_coalesce_usecs;
 	} else if (ec->tx_coalesce_usecs == 0) {
 		vsi->tx_itr_setting = ec->tx_coalesce_usecs;
-		i40e_irq_dynamic_disable(vsi, vector);
 		if (ec->use_adaptive_tx_coalesce)
 			netif_info(pf, drv, netdev,
 				   "Tx-secs=0, need to disable adaptive-Tx for a complete disable\n");
-- 
1.9.3

^ permalink raw reply related

* [net-next 08/11] i40e: better wording for resource tracking errors
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@intel.com>

Tweak and homogenize the error reporting for get_lump() resource
tracking errors.

Change-ID: I11330161cc6ad8d04371c499c63071c816171c3b
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 83fee7f..6a481bf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7957,8 +7957,8 @@ static int i40e_vsi_setup_vectors(struct i40e_vsi *vsi)
 						 vsi->num_q_vectors, vsi->idx);
 	if (vsi->base_vector < 0) {
 		dev_info(&pf->pdev->dev,
-			 "failed to get queue tracking for VSI %d, err=%d\n",
-			 vsi->seid, vsi->base_vector);
+			 "failed to get tracking for %d vectors for VSI %d, err=%d\n",
+			 vsi->num_q_vectors, vsi->seid, vsi->base_vector);
 		i40e_vsi_free_q_vectors(vsi);
 		ret = -ENOENT;
 		goto vector_setup_out;
@@ -7994,8 +7994,9 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 
 	ret = i40e_get_lump(pf, pf->qp_pile, vsi->alloc_queue_pairs, vsi->idx);
 	if (ret < 0) {
-		dev_info(&pf->pdev->dev, "VSI %d get_lump failed %d\n",
-			 vsi->seid, ret);
+		dev_info(&pf->pdev->dev,
+			 "failed to get tracking for %d queues for VSI %d err=%d\n",
+			 vsi->alloc_queue_pairs, vsi->seid, ret);
 		goto err_vsi;
 	}
 	vsi->base_queue = ret;
@@ -8124,8 +8125,9 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 	ret = i40e_get_lump(pf, pf->qp_pile, vsi->alloc_queue_pairs,
 				vsi->idx);
 	if (ret < 0) {
-		dev_info(&pf->pdev->dev, "VSI %d get_lump failed %d\n",
-			 vsi->seid, ret);
+		dev_info(&pf->pdev->dev,
+			 "failed to get tracking for %d queues for VSI %d err=%d\n",
+			 vsi->alloc_queue_pairs, vsi->seid, ret);
 		goto err_vsi;
 	}
 	vsi->base_queue = ret;
-- 
1.9.3

^ permalink raw reply related

* [net-next 06/11] i40e: remove debugfs dump stats
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@intel.com>

The debugfs dump stats wasn't being kept up-to-date, was redundant with
the ethtool output, and didn't offer any useful additional info.  Rather
than continue trying to keep them aligned, just remove the debugfs command.

Change-ID: Id130ed9aef01c6369ab662c7b4c5ec5b1dbc5b40
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <Jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 93 +-------------------------
 1 file changed, 2 insertions(+), 91 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 7067f4b..a03f459 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -895,90 +895,6 @@ static void i40e_dbg_dump_eth_stats(struct i40e_pf *pf,
 }
 
 /**
- * i40e_dbg_dump_stats - handles dump stats write into command datum
- * @pf: the i40e_pf created in command write
- * @stats: the stats structure to be dumped
- **/
-static void i40e_dbg_dump_stats(struct i40e_pf *pf,
-				struct i40e_hw_port_stats *stats)
-{
-	int i;
-
-	dev_info(&pf->pdev->dev, "  stats:\n");
-	dev_info(&pf->pdev->dev,
-		 "    crc_errors = \t\t%lld \tillegal_bytes = \t%lld \terror_bytes = \t\t%lld\n",
-		 stats->crc_errors, stats->illegal_bytes, stats->error_bytes);
-	dev_info(&pf->pdev->dev,
-		 "    mac_local_faults = \t%lld \tmac_remote_faults = \t%lld \trx_length_errors = \t%lld\n",
-		 stats->mac_local_faults, stats->mac_remote_faults,
-		 stats->rx_length_errors);
-	dev_info(&pf->pdev->dev,
-		 "    link_xon_rx = \t\t%lld \tlink_xoff_rx = \t\t%lld \tlink_xon_tx = \t\t%lld\n",
-		 stats->link_xon_rx, stats->link_xoff_rx, stats->link_xon_tx);
-	dev_info(&pf->pdev->dev,
-		 "    link_xoff_tx = \t\t%lld \trx_size_64 = \t\t%lld \trx_size_127 = \t\t%lld\n",
-		 stats->link_xoff_tx, stats->rx_size_64, stats->rx_size_127);
-	dev_info(&pf->pdev->dev,
-		 "    rx_size_255 = \t\t%lld \trx_size_511 = \t\t%lld \trx_size_1023 = \t\t%lld\n",
-		 stats->rx_size_255, stats->rx_size_511, stats->rx_size_1023);
-	dev_info(&pf->pdev->dev,
-		 "    rx_size_big = \t\t%lld \trx_undersize = \t\t%lld \trx_jabber = \t\t%lld\n",
-		 stats->rx_size_big, stats->rx_undersize, stats->rx_jabber);
-	dev_info(&pf->pdev->dev,
-		 "    rx_fragments = \t\t%lld \trx_oversize = \t\t%lld \ttx_size_64 = \t\t%lld\n",
-		 stats->rx_fragments, stats->rx_oversize, stats->tx_size_64);
-	dev_info(&pf->pdev->dev,
-		 "    tx_size_127 = \t\t%lld \ttx_size_255 = \t\t%lld \ttx_size_511 = \t\t%lld\n",
-		 stats->tx_size_127, stats->tx_size_255, stats->tx_size_511);
-	dev_info(&pf->pdev->dev,
-		 "    tx_size_1023 = \t\t%lld \ttx_size_big = \t\t%lld \tmac_short_packet_dropped = \t%lld\n",
-		 stats->tx_size_1023, stats->tx_size_big,
-		 stats->mac_short_packet_dropped);
-	for (i = 0; i < 8; i += 4) {
-		dev_info(&pf->pdev->dev,
-			 "    priority_xon_rx[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld\n",
-			 i, stats->priority_xon_rx[i],
-			 i+1, stats->priority_xon_rx[i+1],
-			 i+2, stats->priority_xon_rx[i+2],
-			 i+3, stats->priority_xon_rx[i+3]);
-	}
-	for (i = 0; i < 8; i += 4) {
-		dev_info(&pf->pdev->dev,
-			 "    priority_xoff_rx[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld\n",
-			 i, stats->priority_xoff_rx[i],
-			 i+1, stats->priority_xoff_rx[i+1],
-			 i+2, stats->priority_xoff_rx[i+2],
-			 i+3, stats->priority_xoff_rx[i+3]);
-	}
-	for (i = 0; i < 8; i += 4) {
-		dev_info(&pf->pdev->dev,
-			 "    priority_xon_tx[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld\n",
-			 i, stats->priority_xon_tx[i],
-			 i+1, stats->priority_xon_tx[i+1],
-			 i+2, stats->priority_xon_tx[i+2],
-			 i+3, stats->priority_xon_rx[i+3]);
-	}
-	for (i = 0; i < 8; i += 4) {
-		dev_info(&pf->pdev->dev,
-			 "    priority_xoff_tx[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld\n",
-			 i, stats->priority_xoff_tx[i],
-			 i+1, stats->priority_xoff_tx[i+1],
-			 i+2, stats->priority_xoff_tx[i+2],
-			 i+3, stats->priority_xoff_tx[i+3]);
-	}
-	for (i = 0; i < 8; i += 4) {
-		dev_info(&pf->pdev->dev,
-			 "    priority_xon_2_xoff[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld \t[%d] = \t%lld\n",
-			 i, stats->priority_xon_2_xoff[i],
-			 i+1, stats->priority_xon_2_xoff[i+1],
-			 i+2, stats->priority_xon_2_xoff[i+2],
-			 i+3, stats->priority_xon_2_xoff[i+3]);
-	}
-
-	i40e_dbg_dump_eth_stats(pf, &stats->eth);
-}
-
-/**
  * i40e_dbg_dump_veb_seid - handles dump stats of a single given veb
  * @pf: the i40e_pf created in command write
  * @seid: the seid the user put in
@@ -1342,11 +1258,6 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 					 "dump desc rx <vsi_seid> <ring_id> [<desc_n>]\n");
 				dev_info(&pf->pdev->dev, "dump desc aq\n");
 			}
-		} else if (strncmp(&cmd_buf[5], "stats", 5) == 0) {
-			dev_info(&pf->pdev->dev, "pf stats:\n");
-			i40e_dbg_dump_stats(pf, &pf->stats);
-			dev_info(&pf->pdev->dev, "pf stats_offsets:\n");
-			i40e_dbg_dump_stats(pf, &pf->stats_offsets);
 		} else if (strncmp(&cmd_buf[5], "reset stats", 11) == 0) {
 			dev_info(&pf->pdev->dev,
 				 "core reset count: %d\n", pf->corer_count);
@@ -1464,8 +1375,8 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 		} else {
 			dev_info(&pf->pdev->dev,
 				 "dump desc tx <vsi_seid> <ring_id> [<desc_n>], dump desc rx <vsi_seid> <ring_id> [<desc_n>],\n");
-			dev_info(&pf->pdev->dev, "dump switch, dump vsi [seid] or\n");
-			dev_info(&pf->pdev->dev, "dump stats\n");
+			dev_info(&pf->pdev->dev, "dump switch\n");
+			dev_info(&pf->pdev->dev, "dump vsi [seid]\n");
 			dev_info(&pf->pdev->dev, "dump reset stats\n");
 			dev_info(&pf->pdev->dev, "dump port\n");
 			dev_info(&pf->pdev->dev,
-- 
1.9.3

^ permalink raw reply related

* [net-next 07/11] i40e: scale msix vector use when more cores than vectors
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@intel.com>

When there are more cores than vectors available to the PF, scale back
the LAN msix usage to force queue/vector sharing and leave some vectors
for Flow Director, VMDq, etc.

Change-ID: Ie0317732eb85ad8d851d7da7d9af86b1bf8c21ad
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f95c04a..83fee7f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6699,6 +6699,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
 {
 	i40e_status err = 0;
 	struct i40e_hw *hw = &pf->hw;
+	int other_vecs = 0;
 	int v_budget, i;
 	int vec;
 
@@ -6724,10 +6725,10 @@ static int i40e_init_msix(struct i40e_pf *pf)
 	 */
 	pf->num_lan_msix = pf->num_lan_qps - (pf->rss_size_max - pf->rss_size);
 	pf->num_vmdq_msix = pf->num_vmdq_qps;
-	v_budget = 1 + pf->num_lan_msix;
-	v_budget += (pf->num_vmdq_vsis * pf->num_vmdq_msix);
+	other_vecs = 1;
+	other_vecs += (pf->num_vmdq_vsis * pf->num_vmdq_msix);
 	if (pf->flags & I40E_FLAG_FD_SB_ENABLED)
-		v_budget++;
+		other_vecs++;
 
 #ifdef I40E_FCOE
 	if (pf->flags & I40E_FLAG_FCOE_ENABLED) {
@@ -6737,7 +6738,9 @@ static int i40e_init_msix(struct i40e_pf *pf)
 
 #endif
 	/* Scale down if necessary, and the rings will share vectors */
-	v_budget = min_t(int, v_budget, hw->func_caps.num_msix_vectors);
+	pf->num_lan_msix = min_t(int, pf->num_lan_msix,
+			(hw->func_caps.num_msix_vectors - other_vecs));
+	v_budget = pf->num_lan_msix + other_vecs;
 
 	pf->msix_entries = kcalloc(v_budget, sizeof(struct msix_entry),
 				   GFP_KERNEL);
-- 
1.9.3

^ permalink raw reply related

* [net-next 09/11] i40e: enable debug earlier
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@intel.com>

Check the debug module parameter earlier to be able to catch the early
configuration phase adminq messages.

Change-ID: Ic84fabd72393489bbf96042de770790a80fd8468
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6a481bf..ea62267 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9023,6 +9023,11 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	hw->bus.func = PCI_FUNC(pdev->devfn);
 	pf->instance = pfs_found;
 
+	if (debug != -1) {
+		pf->msg_enable = pf->hw.debug_mask;
+		pf->msg_enable = debug;
+	}
+
 	/* do a special CORER for clearing PXE mode once at init */
 	if (hw->revision_id == 0 &&
 	    (rd32(hw, I40E_GLLAN_RCTL_0) & I40E_GLLAN_RCTL_0_PXE_MODE_MASK)) {
-- 
1.9.3

^ permalink raw reply related

* [net-next 11/11] i40e: properly parse MDET registers
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Fix a few problems with our parsing of the MDET registers:
* Queue IDs are longer than 8 bits
* Queue IDs are absolute for the device and the base queue must be
  subtracted out.
* VF IDs are longer than 8 bits
* Use the MASK define to mask the event value, instead of the SHIFT
  define.

Change-ID: I3dc7237f480c02e1192a2a8ea782f8a02ab2a8b7
Reported-by: Marc Neustadter <marc.neustadter@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1b0c437..1a98e23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6174,12 +6174,13 @@ static void i40e_handle_mdd_event(struct i40e_pf *pf)
 	if (reg & I40E_GL_MDET_TX_VALID_MASK) {
 		u8 pf_num = (reg & I40E_GL_MDET_TX_PF_NUM_MASK) >>
 				I40E_GL_MDET_TX_PF_NUM_SHIFT;
-		u8 vf_num = (reg & I40E_GL_MDET_TX_VF_NUM_MASK) >>
+		u16 vf_num = (reg & I40E_GL_MDET_TX_VF_NUM_MASK) >>
 				I40E_GL_MDET_TX_VF_NUM_SHIFT;
 		u8 event = (reg & I40E_GL_MDET_TX_EVENT_MASK) >>
 				I40E_GL_MDET_TX_EVENT_SHIFT;
-		u8 queue = (reg & I40E_GL_MDET_TX_QUEUE_MASK) >>
-				I40E_GL_MDET_TX_QUEUE_SHIFT;
+		u16 queue = ((reg & I40E_GL_MDET_TX_QUEUE_MASK) >>
+				I40E_GL_MDET_TX_QUEUE_SHIFT) -
+				pf->hw.func_caps.base_queue;
 		if (netif_msg_tx_err(pf))
 			dev_info(&pf->pdev->dev, "Malicious Driver Detection event 0x%02x on TX queue %d pf number 0x%02x vf number 0x%02x\n",
 				 event, queue, pf_num, vf_num);
@@ -6192,8 +6193,9 @@ static void i40e_handle_mdd_event(struct i40e_pf *pf)
 				I40E_GL_MDET_RX_FUNCTION_SHIFT;
 		u8 event = (reg & I40E_GL_MDET_RX_EVENT_MASK) >>
 				I40E_GL_MDET_RX_EVENT_SHIFT;
-		u8 queue = (reg & I40E_GL_MDET_RX_QUEUE_MASK) >>
-				I40E_GL_MDET_RX_QUEUE_SHIFT;
+		u16 queue = ((reg & I40E_GL_MDET_RX_QUEUE_MASK) >>
+				I40E_GL_MDET_RX_QUEUE_SHIFT) -
+				pf->hw.func_caps.base_queue;
 		if (netif_msg_rx_err(pf))
 			dev_info(&pf->pdev->dev, "Malicious Driver Detection event 0x%02x on RX queue %d of function 0x%02x\n",
 				 event, queue, func);
-- 
1.9.3

^ permalink raw reply related

* [net-next 10/11] i40e: configure VM ID in qtx_ctl
From: Jeff Kirsher @ 2014-11-03 14:56 UTC (permalink / raw)
  To: davem
  Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Patrick Lu,
	Jeff Kirsher
In-Reply-To: <1415026599-16232-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

We must insert the VSI ID in the QTX_CTL register when
configuring queues for VMDQ VSIs.

Change-ID: Iedfe36bd42ca0adc90a7cc2b7cf04795a98f4761
Reported-by: Marc Neustadter <marc.neustadter@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ea62267..1b0c437 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2462,10 +2462,14 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	}
 
 	/* Now associate this queue with this PCI function */
-	if (vsi->type == I40E_VSI_VMDQ2)
+	if (vsi->type == I40E_VSI_VMDQ2) {
 		qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
-	else
+		qtx_ctl |= ((vsi->id) << I40E_QTX_CTL_VFVM_INDX_SHIFT) &
+			   I40E_QTX_CTL_VFVM_INDX_MASK;
+	} else {
 		qtx_ctl = I40E_QTX_CTL_PF_QUEUE;
+	}
+
 	qtx_ctl |= ((hw->pf_id << I40E_QTX_CTL_PF_INDX_SHIFT) &
 		    I40E_QTX_CTL_PF_INDX_MASK);
 	wr32(hw, I40E_QTX_CTL(pf_q), qtx_ctl);
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH -next v3 1/3] syncookies: avoid magic values and document which-bit-is-what-option
From: Daniel Borkmann @ 2014-11-03 15:27 UTC (permalink / raw)
  To: David Laight; +Cc: 'Florian Westphal', netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9E44D6@AcuExch.aculab.com>

On 11/03/2014 03:41 PM, David Laight wrote:
...
>>>> +/* TCP Timestamp: 6 lowest bits of timestamp sent in the cookie SYN-ACK
>>>> + * stores TCP options:
>>>> + *
>>>> + * MSB                               LSB
>>>> + * | 31 ...   6 |  5  |  4   | 3 2 1 0 |
>>>> + * |  Timestamp | ECN | SACK | WScale  |
>>>> + *
>>>> + * When we receive a valid cookie-ACK, we look at the echoed tsval (if
>>>> + * any) to figure out which TCP options we should use for the rebuilt
>>>> + * connection.
>>>> + *
>>>> + * A WScale setting of '0xf' (which is an invalid scaling value)
>>>> + * means that original syn did not include the TCP window scaling option.
>>>> + */
>>>> +#define TS_OPT_WSCALE_MASK	0xf
>>>> +#define TS_OPT_SACK		BIT(4)
>>>> +#define TS_OPT_ECN		BIT(5)
>>>> +/* There is no TS_OPT_TIMESTAMP:
>>>> + * if ACK contains timestamp option, we already know it was
>>>> + * requested/supported by the syn/synack exchange.
>>>> + */
>>>> +#define TSBITS	6
>>>> +#define TSMASK	(((__u32)1 << TSBITS) - 1)
>>>
>>> Personally I'd define all the values as hex constants instead of mixing
>>> and matching the defines.
>>>
>>> So probably just:
>>> #define TS_OPT_WSCALE_MASK	0x0f
>>> #define TS_OPT_SACK		0x10
>>> #define TS_OPT_ECN		0x20
>>> #define TSMASK                0x3f
>>
>> If you look at the above comment and then take a peek at the actual TS_OPT_*,
>> it is much easier to follow. Moreover, how is having TSMASK as 0x3f better?!
>> Currently, it is a constant calculated based upon TSBITS.
>
> TSMASK is also (TS_OPT_WSCALE_MASK | TS_OPT_SACK | TS_OPT_ECN) defining
> the values in hex makes this even more clear.

Right, that's your personal taste. ;) Besides, the definition of TSBITS/TSMASK
itself is not even altered here.

> Defining TSBITS from the mask would save the extra definition - which might
> be easier done by replacing the shifts with multiply/divide by (TSMASK + 1)
> (probably in a #define/inline function to make the code easier to read.

Sure, lets make it more complicated than it actually needs to be ... again,
I think the code is fine as is, sorry.

^ permalink raw reply

* Re: [PATCH] ipv4: avoid divide 0 error in tcp_incr_quickack
From: Eric Dumazet @ 2014-11-03 15:30 UTC (permalink / raw)
  To: chenweilong, Yuchung Cheng, Neal Cardwell
  Cc: davem, kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <54571335.2060709@huawei.com>

On Mon, 2014-11-03 at 13:31 +0800, chenweilong wrote:

> Hi Eric,
> 
> I check the code and find that:
> 
> 1.In function "tcp_rcv_state_process",
> the "tcp_initialize_rcv_mss" is called at "step 5: check the ACK field" when the sk->sk_state is TCP_SYN_RECV
> and there is a "tcp_validate_incoming" just before it.
> So when we call "tcp_validate_incoming", the rcv_mss may not been initialized.
> 
> 2.In function "tcp_validate_incoming",
> the "Step 1: check sequence number", according to RFC793 page 69,
> If an incoming segment is not acceptable,an acknowledgment should be sent in reply (unless the RST
> bit is set, if so drop the segment and return).
> So we may call "tcp_send_dupack" while the rcv_mss hasn't been initialized.
> 
> 3.In function "tcp_send_dupack",
> when the condition is suitable, it'll enter quick ack mode. Notice it only check the seq !
> So I think add another state check should be OK.
> 
> Any suggestion ?
> 

You did find what immediate conditions for the crash (rcv_mss = 0, state
= TCP_SYN_RCV) were.

Your patch avoids the zero divide, but leaves other issues. rcv_mss = 0
here is a sign some logic is wrong in the stack.

Given this potential zero divide had been there for years, I believe we
should take the time for a more complete fix, instead of papering over
the immediate problem.

We have been working with Neal to reproduce the issue with packetdrill,
we'll post our results when we manage to get our first crash ;)

Thanks !

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox