Netdev List

* [PATCH 3/6] net: davinci_emac: Free clock after checking the frequency
From: Tony Lindgren @ 2015-01-13 19:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-omap, Brian Hutchinson, Felipe Balbi
In-Reply-To: <1421177368-19756-1-git-send-email-tony@atomide.com>

We only use clk_get() to get the frequency; the rest is done by
the runtime PM calls. Let's free the clock too.

Cc: Brian Hutchinson <b.hutchman@gmail.com>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/net/ethernet/ti/davinci_emac.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index deb43b3..e9efc74 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1881,6 +1881,7 @@ static int davinci_emac_probe(struct platform_device *pdev)
 		return -EBUSY;
 	}
 	emac_bus_frequency = clk_get_rate(emac_clk);
+	clk_put(emac_clk);
 
 	/* TODO: Probe PHY here if possible */
 
-- 
2.1.4
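
The get/read/put pattern that this patch completes can be sketched in userspace as follows. This is only an illustrative model: `struct mock_clk`, the `mock_clk_*()` helpers, and the 125 MHz rate are hypothetical stand-ins for the kernel's `clk_get()`, `clk_get_rate()`, and `clk_put()`, not the driver's code.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for a clock handle; not the kernel's struct clk. */
struct mock_clk {
	unsigned long rate_hz;	/* assumed example rate */
	int refcount;		/* references currently held */
};

static struct mock_clk emac_mock_clk = { .rate_hz = 125000000UL, .refcount = 0 };

/* Models clk_get(): hands out a reference that the caller must drop later. */
static struct mock_clk *mock_clk_get(void)
{
	emac_mock_clk.refcount++;
	return &emac_mock_clk;
}

/* Models clk_get_rate(): reads the frequency without touching the refcount. */
static unsigned long mock_clk_get_rate(const struct mock_clk *clk)
{
	return clk->rate_hz;
}

/* Models clk_put(): drops the reference taken by mock_clk_get(). */
static void mock_clk_put(struct mock_clk *clk)
{
	assert(clk->refcount > 0);
	clk->refcount--;
}

/* Probe-like helper: hold the clock only long enough to read its rate. */
static unsigned long mock_probe_read_bus_frequency(void)
{
	struct mock_clk *clk = mock_clk_get();
	unsigned long rate = mock_clk_get_rate(clk);

	mock_clk_put(clk);	/* the step the patch adds */
	return rate;
}
```

After `mock_probe_read_bus_frequency()` returns, the model's reference count is back to zero, which is the balance the one-line patch restores.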


* [PATCH 2/6] net: davinci_emac: Fix runtime pm calls for davinci_emac
From: Tony Lindgren @ 2015-01-13 19:29 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-omap, Brian Hutchinson, Felipe Balbi, Mark A. Greer
In-Reply-To: <1421177368-19756-1-git-send-email-tony@atomide.com>

Commit 3ba97381343b ("net: ethernet: davinci_emac: add pm_runtime support")
added support for runtime PM, but it causes issues on omap3 related devices
that actually gate the clocks:

Unhandled fault: external abort on non-linefetch (0x1008)
...
[<c04160f0>] (emac_dev_getnetstats) from [<c04d6a3c>] (dev_get_stats+0x78/0xc8)
[<c04d6a3c>] (dev_get_stats) from [<c04e9ccc>] (rtnl_fill_ifinfo+0x3b8/0x938)
[<c04e9ccc>] (rtnl_fill_ifinfo) from [<c04eade4>] (rtmsg_ifinfo+0x68/0xd8)
[<c04eade4>] (rtmsg_ifinfo) from [<c04dd35c>] (register_netdevice+0x3a0/0x4ec)
[<c04dd35c>] (register_netdevice) from [<c04dd4bc>] (register_netdev+0x14/0x24)
[<c04dd4bc>] (register_netdev) from [<c041755c>] (davinci_emac_probe+0x408/0x5c8)
[<c041755c>] (davinci_emac_probe) from [<c0396d78>] (platform_drv_probe+0x48/0xa4)

Let's fix it by moving the pm_runtime_get() call earlier, and also
add it to emac_dev_getnetstats(). Also note that we want to use
pm_runtime_get_sync() as we don't want to have deferred_resume happen.

Cc: Brian Hutchinson <b.hutchman@gmail.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/net/ethernet/ti/davinci_emac.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 383ed52..deb43b3 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1538,7 +1538,7 @@ static int emac_dev_open(struct net_device *ndev)
 	int i = 0;
 	struct emac_priv *priv = netdev_priv(ndev);
 
-	pm_runtime_get(&priv->pdev->dev);
+	pm_runtime_get_sync(&priv->pdev->dev);
 
 	netif_carrier_off(ndev);
 	for (cnt = 0; cnt < ETH_ALEN; cnt++)
@@ -1726,6 +1726,8 @@ static struct net_device_stats *emac_dev_getnetstats(struct net_device *ndev)
 	u32 mac_control;
 	u32 stats_clear_mask;
 
+	pm_runtime_get_sync(&priv->pdev->dev);
+
 	/* update emac hardware stats and reset the registers*/
 
 	mac_control = emac_read(EMAC_MACCONTROL);
@@ -1767,6 +1769,8 @@ static struct net_device_stats *emac_dev_getnetstats(struct net_device *ndev)
 	ndev->stats.tx_fifo_errors += emac_read(EMAC_TXUNDERRUN);
 	emac_write(EMAC_TXUNDERRUN, stats_clear_mask);
 
+	pm_runtime_put(&priv->pdev->dev);
+
 	return &ndev->stats;
 }
 
@@ -1981,12 +1985,16 @@ static int davinci_emac_probe(struct platform_device *pdev)
 	ndev->ethtool_ops = &ethtool_ops;
 	netif_napi_add(ndev, &priv->napi, emac_poll, EMAC_POLL_WEIGHT);
 
+	pm_runtime_enable(&pdev->dev);
+	pm_runtime_get_sync(&pdev->dev);
+
 	/* register the network device */
 	SET_NETDEV_DEV(ndev, &pdev->dev);
 	rc = register_netdev(ndev);
 	if (rc) {
 		dev_err(&pdev->dev, "error in register_netdev\n");
 		rc = -ENODEV;
+		pm_runtime_put(&pdev->dev);
 		goto no_cpdma_chan;
 	}
 
@@ -1996,9 +2004,7 @@ static int davinci_emac_probe(struct platform_device *pdev)
 			   "(regs: %p, irq: %d)\n",
 			   (void *)priv->emac_base_phys, ndev->irq);
 	}
-
-	pm_runtime_enable(&pdev->dev);
-	pm_runtime_resume(&pdev->dev);
+	pm_runtime_put(&pdev->dev);
 
 	return 0;
 
-- 
2.1.4
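
The reason register reads need a synchronous get can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `struct mock_dev` and the `mock_*()` helpers are hypothetical, and a `false` return stands in for the external abort that real hardware raises when its clock is gated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device whose registers need an active clock. */
struct mock_dev {
	int usage_count;		/* models the runtime PM usage counter */
	unsigned int rx_errors_reg;	/* pretend hardware counter */
};

/* Models pm_runtime_get_sync(): the device is active when this returns. */
static void mock_pm_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
}

/* Models pm_runtime_put(): drops the usage count taken above. */
static void mock_pm_put(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Register read; returns false where gated hardware would fault. */
static bool mock_reg_read(const struct mock_dev *dev, unsigned int *val)
{
	if (dev->usage_count <= 0)
		return false;
	*val = dev->rx_errors_reg;
	return true;
}

/* Stats read bracketed the same way as the patched emac_dev_getnetstats(). */
static bool mock_get_stats(struct mock_dev *dev, unsigned int *rx_errors)
{
	bool ok;

	mock_pm_get_sync(dev);
	ok = mock_reg_read(dev, rx_errors);
	mock_pm_put(dev);
	return ok;
}
```

An asynchronous get would only queue the resume, so in this model the read could still run while the usage count is effectively zero; the synchronous variant closes that window.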


* [PATCH 0/6] Fixes for davinci_emac
From: Tony Lindgren @ 2015-01-13 19:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-omap

Hi,

Here are some fixes for davinci_emac for the issues I've noticed
recently.

Regards,

Tony

Tony Lindgren (6):
  net: davinci_emac: Fix hangs with interrupts
  net: davinci_emac: Fix runtime pm calls for davinci_emac
  net: davinci_emac: Free clock after checking the frequency
  net: davinci_emac: Fix incomplete code for getting the phy from device
    tree
  net: davinci_emac: Fix ioremap for devices with MDIO within the EMAC
    address space
  net: davinci_emac: Add support for emac on dm816x

 .../devicetree/bindings/net/davinci_emac.txt       |  3 +-
 drivers/net/ethernet/ti/davinci_emac.c             | 77 ++++++++++++++++------
 2 files changed, 58 insertions(+), 22 deletions(-)

-- 
2.1.4


* Re: Fwd: [rhashtable] WARNING: CPU: 0 PID: 10 at kernel/locking/mutex.c:570 mutex_lock_nested()
From: Thomas Graf @ 2015-01-13 19:28 UTC (permalink / raw)
  To: Cong Wang; +Cc: Ying Xue, linux-kernel@vger.kernel.org, lkp, Netdev
In-Reply-To: <CAHA+R7M1ZSCF+FwKVtZUbsJ05zesNg-WVTqH13=oFe5gM--3gw@mail.gmail.com>

On 01/13/15 at 11:14am, Cong Wang wrote:
> On Tue, Jan 13, 2015 at 12:41 AM, Thomas Graf <tgraf@suug.ch> wrote:
> > I can't reproduce it in my KVM box either so far. It looks like a
> > mutex_lock() on an uninitialized mutex or use after free but I can't
> > find such a code path so far.
> 
> Could it be that the delayed work is still running after the rhashtable
> is destroyed by its caller? I mean, shouldn't cancel_delayed_work_sync()
> be called in rhashtable_destroy()?
> 
> Of course, it may be the caller's responsibility to ensure that; I haven't
> looked into it that much.

Yes, we came to the very same conclusion in a different email thread
and found the offending race condition.
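
The suspected ordering problem can be modeled in a few lines of single-threaded C. This is a deliberately simplified sketch, not the rhashtable code: `struct mock_table` and its helpers are hypothetical, and the `cancel_first` flag stands in for calling `cancel_delayed_work_sync()` before tearing the table down.

```c
#include <stdbool.h>

/* Hypothetical model of a table that owns a mutex and deferred work. */
struct mock_table {
	bool mutex_valid;	/* true while the table's mutex may be locked */
	bool work_pending;	/* true while deferred work is still queued */
};

static void mock_table_init(struct mock_table *t)
{
	t->mutex_valid = true;
	t->work_pending = false;
}

static void mock_schedule_work(struct mock_table *t)
{
	t->work_pending = true;
}

/* Runs queued work; returns false if it would lock an invalid mutex. */
static bool mock_run_pending_work(struct mock_table *t)
{
	if (!t->work_pending)
		return true;	/* nothing queued, nothing can go wrong */
	t->work_pending = false;
	return t->mutex_valid;
}

/* Teardown; cancel_first models cancelling the work before destruction. */
static void mock_table_destroy(struct mock_table *t, bool cancel_first)
{
	if (cancel_first)
		t->work_pending = false;
	t->mutex_valid = false;
}
```

Destroying with `cancel_first` unset leaves work queued against an invalid mutex, which matches the shape of the reported warning; cancelling first leaves nothing to run.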

* Re: [PATCH] af_packet: fix typo of "unlikely" conditional in packet_snd
From: David Miller @ 2015-01-13 19:26 UTC (permalink / raw)
  To: linville; +Cc: netdev, dborkman, hannes
In-Reply-To: <1421176811-22594-1-git-send-email-linville@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Tue, 13 Jan 2015 14:20:11 -0500

> Change "unlikely(offset) < 0" to "unlikely(offset < 0)"...
> 
> Coverity: CID 1259984
> 
> Signed-off-by: John W. Linville <linville@tuxdriver.com>

Should be fixed in the 'net' tree by:

commit 46d2cfb192b30d729aef064808ed5ece47cee369
Author: Christoph Jaeger <cj@linux.com>
Date:   Sun Jan 11 13:01:16 2015 -0500

    packet: bail out of packet_snd() if L2 header creation fails

* Re: [PATCH] bridge: only provide proxy ARP when CONFIG_INET is enabled
From: Cong Wang @ 2015-01-13 19:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: netdev, Kyeyoon Park, bridge@lists.linux-foundation.org,
	David Miller
In-Reply-To: <56868207.rHBDZL3pbk@wuerfel>

On Tue, Jan 13, 2015 at 6:10 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> When IPV4 support is disabled, we cannot call arp_send from
> the bridge code, which would result in a kernel link error:
>
> net/built-in.o: In function `br_handle_frame_finish':
> :(.text+0x59914): undefined reference to `arp_send'
> :(.text+0x59a50): undefined reference to `arp_tbl'
>
> This makes the newly added proxy ARP support in the bridge
> code depend on the CONFIG_INET symbol and lets the compiler
> optimize the code out to avoid the link error.
>

Not sure how much sense it makes to have CONFIG_BRIDGE depend
on CONFIG_INET, but at least CONFIG_BONDING does.
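
For reference, the compile-time guard described in the quoted patch could look roughly like the sketch below. It is an assumption-laden userspace model: `MOCK_INET_ENABLED` is a hypothetical stand-in for a constant such as `IS_ENABLED(CONFIG_INET)`, and `mock_arp_send()` is a stub, whereas in the kernel the real `arp_send` symbol would simply be absent from the build.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a compile-time config test; 0 models a build without IPv4. */
#define MOCK_INET_ENABLED 0

static int arp_replies_sent;

/* Stub standing in for arp_send(). */
static void mock_arp_send(void)
{
	arp_replies_sent++;
}

/* The constant condition makes the call dead code when IPv4 is disabled. */
static void mock_handle_frame(bool is_arp_request, bool proxyarp_port)
{
	if (MOCK_INET_ENABLED && is_arp_request && proxyarp_port)
		mock_arp_send();
}
```

Because the first operand is a constant zero, the compiler can drop the call entirely, which is how such a guard avoids an undefined reference without making the whole bridge depend on CONFIG_INET.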

* Re: [PATCH net-next v3] tcp: avoid reducing cwnd when ACK+DSACK is received
From: David Miller @ 2015-01-13 19:22 UTC (permalink / raw)
  To: sebastien.barre
  Cc: ncardwell, ycheng, eric.dumazet, netdev, gregory.detal, nanditad
In-Reply-To: <1421055040-8732-1-git-send-email-sebastien.barre@uclouvain.be>

From: Sébastien Barré <sebastien.barre@uclouvain.be>
Date: Mon, 12 Jan 2015 10:30:40 +0100

> With TLP, the peer may reply to a probe with an
> ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
> such ACK+DSACK will be missed and only at next, higher ack will the TLP
> episode be considered done. Since the DSACK is not present anymore,
> this will cost a cwnd reduction.
> 
> This patch ensures that this scenario does not cause a cwnd reduction, since
> receiving an ACK+DSACK indicates that both the initial segment and the probe
> have been received by the peer.
> 
> The following packetdrill test, from Neal Cardwell, validates this patch:
 ...
> Credits:
> -Gregory helped in finding that tcp_process_tlp_ack was where the cwnd
> got reduced in our MPTCP tests.
> -Neal wrote the packetdrill test above
> -Yuchung reworked the patch to make it more readable.
> 
> Cc: Gregory Detal <gregory.detal@uclouvain.be>
> Cc: Nandita Dukkipati <nanditad@google.com>
> Tested-by: Neal Cardwell <ncardwell@google.com>
> Reviewed-by: Yuchung Cheng <ycheng@google.com>
> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>

Applied, thanks everyone.
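
The decision described in the quoted changelog can be sketched as a small classifier. This is a simplified model, not the kernel's tcp_process_tlp_ack(): the enum, `seq_before()`, and `tlp_classify_ack()` are hypothetical names, and the handling of a pure duplicate ACK is left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum tlp_result {
	TLP_PENDING,		/* episode still open */
	TLP_DONE_NO_LOSS,	/* episode closed, cwnd untouched */
	TLP_DONE_REDUCE_CWND,	/* episode closed, loss inferred */
};

/* Wraparound-safe sequence comparison: true if a is before b. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static enum tlp_result tlp_classify_ack(uint32_t ack, uint32_t tlp_high_seq,
					bool has_dsack)
{
	/* The ACK does not yet cover the probe: keep waiting. */
	if (seq_before(ack, tlp_high_seq))
		return TLP_PENDING;

	/* ACK+DSACK: both the original segment and the probe arrived. */
	if (has_dsack)
		return TLP_DONE_NO_LOSS;

	/* The ACK moved past the probe with no DSACK: treat it as a loss. */
	if (seq_before(tlp_high_seq, ack))
		return TLP_DONE_REDUCE_CWND;

	return TLP_PENDING;
}
```

Checking the DSACK case before the "ACK advanced" case is the point of the sketch: an ACK+DSACK at exactly `tlp_high_seq` ends the episode without a cwnd reduction.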

* [PATCH] af_packet: fix typo of "unlikely" conditional in packet_snd
From: John W. Linville @ 2015-01-13 19:20 UTC (permalink / raw)
  To: netdev
  Cc: Daniel Borkmann, David S. Miller, Hannes Frederic Sowa,
	John W. Linville

Change "unlikely(offset) < 0" to "unlikely(offset < 0)"...

Coverity: CID 1259984

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
Compile tested only...

 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 6880f34a529a..9cfe2e1dd8b5 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2517,7 +2517,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	err = -EINVAL;
 	if (sock->type == SOCK_DGRAM) {
 		offset = dev_hard_header(skb, dev, ntohs(proto), addr, NULL, len);
-		if (unlikely(offset) < 0)
+		if (unlikely(offset < 0))
 			goto out_free;
 	} else {
 		if (ll_header_truncated(dev, len))
-- 
2.1.0
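
Why the misplaced parenthesis matters can be shown in userspace with a GCC or Clang compiler. The macro below has the same shape as the kernel's usual `unlikely()` definition; `buggy_check()` and `fixed_check()` are hypothetical helpers written only for this illustration.

```c
#include <stdbool.h>

/* Normalizes the condition to 0 or 1, then passes it to the branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Old form: compares the normalized 0/1 value with 0, so it is never true. */
static bool buggy_check(int offset)
{
	return unlikely(offset) < 0;
}

/* Fixed form: the comparison happens inside the hint. */
static bool fixed_check(int offset)
{
	return unlikely(offset < 0);
}
```

With the old form, a negative return from dev_hard_header() would never take the error path, because the value being compared is already 0 or 1.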

* Re: Fwd: [rhashtable] WARNING: CPU: 0 PID: 10 at kernel/locking/mutex.c:570 mutex_lock_nested()
From: Cong Wang @ 2015-01-13 19:14 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Ying Xue, linux-kernel@vger.kernel.org, lkp, Netdev
In-Reply-To: <20150113084126.GE20387@casper.infradead.org>

On Tue, Jan 13, 2015 at 12:41 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/13/15 at 03:50pm, Ying Xue wrote:
>> On 01/12/2015 08:42 PM, Thomas Graf wrote:
>> > On 01/12/15 at 09:38am, Ying Xue wrote:
>> >> Hi Thomas,
>> >>
>> >> I am really unable to see where is wrong leading to below warning
>> >> complaints. Can you please help me check it?
>> >
>> > Not sure yet. It's not your patch that introduced the issue though.
>> > It merely exposed the affected code path.
>> >
>> > Just wondering, did you test with CONFIG_DEBUG_MUTEXES enabled?
>> >
>> >
>>
>> After I enable above option, I don't find similar complaints during my
>> testing.
>
> I can't reproduce it in my KVM box either so far. It looks like a
> mutex_lock() on an uninitialized mutex or use after free but I can't
> find such a code path so far.

Could it be that the delayed work is still running after the rhashtable
is destroyed by its caller? I mean, shouldn't cancel_delayed_work_sync()
be called in rhashtable_destroy()?

Of course, it may be the caller's responsibility to ensure that; I haven't
looked into it that much.

* Re: [PATCH] Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
From: David Miller @ 2015-01-13 19:04 UTC (permalink / raw)
  To: marichika4; +Cc: netdev
In-Reply-To: <1421054185-10249-1-git-send-email-marichika4@gmail.com>

From: B Viswanath <marichika4@gmail.com>
Date: Mon, 12 Jan 2015 14:46:25 +0530

> Corrected the comment describing the ndo operations to
> reflect the actual prototype for couple of operations
> 
> Signed-off-by: B Viswanath <marichika4@gmail.com>

Applied, thanks.

In the future, please put proper subsystem prefixes in your Subject lines;
in this case it should have been "[PATCH] net: Corrected ..."

* Re: [PATCH] i40e: avoid use of uninitialized v_budget in i40e_init_msix
From: Jeff Kirsher @ 2015-01-13 19:03 UTC (permalink / raw)
  To: John W. Linville; +Cc: netdev, Linux NICS
In-Reply-To: <1421174908-20445-1-git-send-email-linville@tuxdriver.com>

On Tue, 2015-01-13 at 13:48 -0500, John W. Linville wrote:
> This I40E_FCOE block increments v_budget before it has been
> initialized,
> then v_budget gets overwritten a few lines later.  This patch just
> reorders the code hunks in what I believe was the intended sequence.
> 
> Coverity: CID 1260099
> 
> Signed-off-by: John W. Linville <linville@tuxdriver.com>
> ---
> Compile tested only...
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

Thanks John, I will add your patch to my queue.

* Re: [net-next 00/15][pull request] Intel Wired LAN Driver Updates 2015-01-13
From: Jeff Kirsher @ 2015-01-13 19:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20150113.135751.1359752961454701029.davem@davemloft.net>

On Tue, 2015-01-13 at 13:57 -0500, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 13 Jan 2015 03:33:16 -0800
> 
> > This series contains updates to i40e and i40evf.
> 
> There was minor feedback for patch #3, please address and
> respin, thanks.

I was just about to send you email saying I am re-spinning the series
with an updated patch #3. :-)

* Re: [PATCH net-next v2 0/3] remove nl_sk_hash_lock from netlink socket
From: David Miller @ 2015-01-13 19:01 UTC (permalink / raw)
  To: ying.xue; +Cc: tgraf, netdev
In-Reply-To: <1421045544-13670-1-git-send-email-ying.xue@windriver.com>

From: Ying Xue <ying.xue@windriver.com>
Date: Mon, 12 Jan 2015 14:52:21 +0800

> After tipc socket successfully avoids the involvement of an extra lock
> with rhashtable_lookup_insert(), it's possible for netlink socket to
> remove its hash socket lock now. But as netlink socket needs a compare
> function to look for an object, we first introduce a new function
> called rhashtable_lookup_compare_insert() in commit #1 which is
> implemented based on original rhashtable_lookup_insert(). We
> subsequently remove nl_sk_hash_lock from netlink socket with the new
> introduced function in commit #2. Lastly, as Thomas requested, we add
> commit #3 to indicate the implementation of what the grow and shrink
> decision function must enforce min/max shift.
> 
> v2:
>  As Thomas pointed out, there was a race between checking portid and
>  then setting it in commit #2. Now use socket lock to make the process
>  of both checking and setting portid atomic, and then eliminate the
>  race.

Series applied, thanks.

* [PATCH] i40e: avoid use of uninitialized v_budget in i40e_init_msix
From: John W. Linville @ 2015-01-13 18:48 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Kirsher, Linux NICS, John W. Linville

This I40E_FCOE block increments v_budget before it has been initialized,
then v_budget gets overwritten a few lines later.  This patch just
reorders the code hunks in what I believe was the intended sequence.

Coverity: CID 1260099

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
Compile tested only...

 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a5f2660d552d..5415d9fd7c63 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6881,17 +6881,17 @@ static int i40e_init_msix(struct i40e_pf *pf)
 	if (pf->flags & I40E_FLAG_FD_SB_ENABLED)
 		other_vecs++;
 
+	/* Scale down if necessary, and the rings will share vectors */
+	pf->num_lan_msix = min_t(int, pf->num_lan_msix,
+			(hw->func_caps.num_msix_vectors - other_vecs));
+	v_budget = pf->num_lan_msix + other_vecs;
+
 #ifdef I40E_FCOE
 	if (pf->flags & I40E_FLAG_FCOE_ENABLED) {
 		pf->num_fcoe_msix = pf->num_fcoe_qps;
 		v_budget += pf->num_fcoe_msix;
 	}
-
 #endif
-	/* Scale down if necessary, and the rings will share vectors */
-	pf->num_lan_msix = min_t(int, pf->num_lan_msix,
-			(hw->func_caps.num_msix_vectors - other_vecs));
-	v_budget = pf->num_lan_msix + other_vecs;
 
 	pf->msix_entries = kcalloc(v_budget, sizeof(struct msix_entry),
 				   GFP_KERNEL);
-- 
2.1.0
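
The intended order of operations can be written out as a standalone function. This is a sketch under assumptions: `msix_budget()` and `min_int()` are hypothetical helpers, and the parameters stand in for the `pf` and `hw` fields used in the driver.

```c
#include <stdbool.h>

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Hypothetical helper mirroring the order of operations after the patch. */
static int msix_budget(int num_lan_msix, int hw_vectors, int other_vecs,
		       bool fcoe_enabled, int num_fcoe_qps)
{
	int v_budget;

	/* Scale the LAN vectors down first; this also initializes v_budget. */
	num_lan_msix = min_int(num_lan_msix, hw_vectors - other_vecs);
	v_budget = num_lan_msix + other_vecs;

	/* Only now is it meaningful to add the FCoE vectors on top. */
	if (fcoe_enabled)
		v_budget += num_fcoe_qps;

	return v_budget;
}
```

For example, with 8 requested LAN vectors, 6 hardware vectors, 2 other vectors, and 4 FCoE queue pairs, the LAN count is scaled to 4 and the budget comes to 10. In the pre-patch order the FCoE increment would have been applied to an uninitialized value and then overwritten.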

* Re: [net-next 00/15][pull request] Intel Wired LAN Driver Updates 2015-01-13
From: David Miller @ 2015-01-13 18:57 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <1421148811-9763-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 13 Jan 2015 03:33:16 -0800

> This series contains updates to i40e and i40evf.

There was minor feedback for patch #3, please address and
respin, thanks.

* Re: [RFC PATCH v2 2/2] net: ixgbe: implement af_packet direct queue mappings
From: Willem de Bruijn @ 2015-01-13 18:58 UTC (permalink / raw)
  To: John Fastabend
  Cc: Network Development, Zhou, Danny, Neil Horman, Daniel Borkmann,
	Ronciak, John, Hannes Frederic Sowa, brouer
In-Reply-To: <20150113043542.29985.15658.stgit@nitbit.x32>

On Mon, Jan 12, 2015 at 11:35 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> This allows driver queues to be split off and mapped into user
> space using af_packet.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   17 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   23 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  407 ++++++++++++++++++++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_type.h    |    1
>  4 files changed, 440 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 38fc64c..aa4960e 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -204,6 +204,20 @@ struct ixgbe_tx_queue_stats {
>         u64 tx_done_old;
>  };
>
> +#define MAX_USER_DMA_REGIONS_PER_SOCKET  16
> +
> +struct ixgbe_user_dma_region {
> +       dma_addr_t dma_region_iova;
> +       unsigned long dma_region_size;
> +       int direction;
> +};
> +
> +struct ixgbe_user_queue_info {
> +       struct sock *sk_handle;
> +       struct ixgbe_user_dma_region regions[MAX_USER_DMA_REGIONS_PER_SOCKET];
> +       int num_of_regions;
> +};
> +
>  struct ixgbe_rx_queue_stats {
>         u64 rsc_count;
>         u64 rsc_flush;
> @@ -673,6 +687,9 @@ struct ixgbe_adapter {
>
>         struct ixgbe_q_vector *q_vector[MAX_Q_VECTORS];
>
> +       /* Direct User Space Queues */
> +       struct ixgbe_user_queue_info user_queue_info[MAX_RX_QUEUES];
> +
>         /* DCB parameters */
>         struct ieee_pfc *ixgbe_ieee_pfc;
>         struct ieee_ets *ixgbe_ieee_ets;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> index e5be0dd..f180a58 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> @@ -2598,12 +2598,17 @@ static int ixgbe_add_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
>         if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
>                 return -EOPNOTSUPP;
>
> +       if (fsp->ring_cookie > MAX_RX_QUEUES)
> +               return -EINVAL;
> +
>         /*
>          * Don't allow programming if the action is a queue greater than
> -        * the number of online Rx queues.
> +        * the number of online Rx queues unless it is a user space
> +        * queue.
>          */
>         if ((fsp->ring_cookie != RX_CLS_FLOW_DISC) &&
> -           (fsp->ring_cookie >= adapter->num_rx_queues))
> +           (fsp->ring_cookie >= adapter->num_rx_queues) &&
> +           !adapter->user_queue_info[fsp->ring_cookie].sk_handle)
>                 return -EINVAL;
>
>         /* Don't allow indexes to exist outside of available space */
> @@ -2680,12 +2685,18 @@ static int ixgbe_add_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
>         /* apply mask and compute/store hash */
>         ixgbe_atr_compute_perfect_hash_82599(&input->filter, &mask);
>
> +       /* Set input action to reg_idx for driver owned queues otherwise
> +        * use the absolute index for user space queues.
> +        */
> +       if (fsp->ring_cookie < adapter->num_rx_queues &&
> +           fsp->ring_cookie != IXGBE_FDIR_DROP_QUEUE)
> +               input->action = adapter->rx_ring[input->action]->reg_idx;
> +
>         /* program filters to filter memory */
>         err = ixgbe_fdir_write_perfect_filter_82599(hw,
> -                               &input->filter, input->sw_idx,
> -                               (input->action == IXGBE_FDIR_DROP_QUEUE) ?
> -                               IXGBE_FDIR_DROP_QUEUE :
> -                               adapter->rx_ring[input->action]->reg_idx);
> +                                                   &input->filter,
> +                                                   input->sw_idx,
> +                                                   input->action);
>         if (err)
>                 goto err_out_w_lock;
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2ed2c7d..be5bde86 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -50,6 +50,9 @@
>  #include <linux/if_bridge.h>
>  #include <linux/prefetch.h>
>  #include <scsi/fc/fc_fcoe.h>
> +#include <linux/mm.h>
> +#include <linux/if_packet.h>
> +#include <linux/iommu.h>
>
>  #ifdef CONFIG_OF
>  #include <linux/of_net.h>
> @@ -80,6 +83,12 @@ const char ixgbe_driver_version[] = DRV_VERSION;
>  static const char ixgbe_copyright[] =
>                                 "Copyright (c) 1999-2014 Intel Corporation.";
>
> +static unsigned int *dummy_page_buf;
> +
> +#ifndef CONFIG_DMA_MEMORY_PROTECTION
> +#define CONFIG_DMA_MEMORY_PROTECTION
> +#endif
> +
>  static const struct ixgbe_info *ixgbe_info_tbl[] = {
>         [board_82598]           = &ixgbe_82598_info,
>         [board_82599]           = &ixgbe_82599_info,
> @@ -167,6 +176,76 @@ MODULE_DESCRIPTION("Intel(R) 10 Gigabit PCI Express Network Driver");
>  MODULE_LICENSE("GPL");
>  MODULE_VERSION(DRV_VERSION);
>
> +enum ixgbe_legacy_rx_enum {
> +       IXGBE_LEGACY_RX_FIELD_PKT_ADDR = 0,     /* Packet buffer address */
> +       IXGBE_LEGACY_RX_FIELD_LENGTH,           /* Packet length */
> +       IXGBE_LEGACY_RX_FIELD_CSUM,             /* Fragment checksum */
> +       IXGBE_LEGACY_RX_FIELD_STATUS,           /* Descriptors status */
> +       IXGBE_LEGACY_RX_FIELD_ERRORS,           /* Receive errors */
> +       IXGBE_LEGACY_RX_FIELD_VLAN,             /* VLAN tag */
> +};
> +
> +enum ixgbe_legacy_tx_enum {
> +       IXGBE_LEGACY_TX_FIELD_PKT_ADDR = 0,     /* Packet buffer address */
> +       IXGBE_LEGACY_TX_FIELD_LENGTH,           /* Packet length */
> +       IXGBE_LEGACY_TX_FIELD_CSO,              /* Checksum offset*/
> +       IXGBE_LEGACY_TX_FIELD_CMD,              /* Descriptor control */
> +       IXGBE_LEGACY_TX_FIELD_STATUS,           /* Descriptor status */
> +       IXGBE_LEGACY_TX_FIELD_RSVD,             /* Reserved */
> +       IXGBE_LEGACY_TX_FIELD_CSS,              /* Checksum start */
> +       IXGBE_LEGACY_TX_FIELD_VLAN_TAG,         /* VLAN tag */
> +};
> +
> +/* IXGBE Receive Descriptor - Legacy */
> +static const struct tpacket_nic_desc_fld ixgbe_legacy_rx_desc[] = {
> +       /* Packet buffer address */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_RX_FIELD_PKT_ADDR,
> +                               0,  64, 64,  BO_NATIVE)},
> +       /* Packet length */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_RX_FIELD_LENGTH,
> +                               64, 16, 8,  BO_NATIVE)},
> +       /* Fragment checksum */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_RX_FIELD_CSUM,
> +                               80, 16, 8,  BO_NATIVE)},
> +       /* Descriptors status */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_RX_FIELD_STATUS,
> +                               96, 8, 8,  BO_NATIVE)},
> +       /* Receive errors */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_RX_FIELD_ERRORS,
> +                               104, 8, 8,  BO_NATIVE)},
> +       /* VLAN tag */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_RX_FIELD_VLAN,
> +                               112, 16, 8,  BO_NATIVE)},
> +};
> +
> +/* IXGBE Transmit Descriptor - Legacy */
> +static const struct tpacket_nic_desc_fld ixgbe_legacy_tx_desc[] = {
> +       /* Packet buffer address */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_PKT_ADDR,
> +                               0,   64, 64,  BO_NATIVE)},
> +       /* Data buffer length */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_LENGTH,
> +                               64,  16, 8,  BO_NATIVE)},
> +       /* Checksum offset */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_CSO,
> +                               80,  8, 8,  BO_NATIVE)},
> +       /* Command byte */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_CMD,
> +                               88,  8, 8,  BO_NATIVE)},
> +       /* Transmitted status */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_STATUS,
> +                               96,  4, 1,  BO_NATIVE)},
> +       /* Reserved */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_RSVD,
> +                               100, 4, 1,  BO_NATIVE)},
> +       /* Checksum start */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_CSS,
> +                               104, 8, 8,  BO_NATIVE)},
> +       /* VLAN tag */
> +       {PACKET_NIC_DESC_FIELD(IXGBE_LEGACY_TX_FIELD_VLAN_TAG,
> +                               112, 16, 8,  BO_NATIVE)},
> +};
> +
>  static bool ixgbe_check_cfg_remove(struct ixgbe_hw *hw, struct pci_dev *pdev);
>
>  static int ixgbe_read_pci_cfg_word_parent(struct ixgbe_adapter *adapter,
> @@ -3137,6 +3216,17 @@ static void ixgbe_enable_rx_drop(struct ixgbe_adapter *adapter,
>         IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(reg_idx), srrctl);
>  }
>
> +static bool ixgbe_have_user_queues(struct ixgbe_adapter *adapter)
> +{
> +       int i;
> +
> +       for (i = 0; i < MAX_RX_QUEUES; i++) {
> +               if (adapter->user_queue_info[i].sk_handle)
> +                       return true;
> +       }
> +       return false;
> +}
> +
>  static void ixgbe_disable_rx_drop(struct ixgbe_adapter *adapter,
>                                   struct ixgbe_ring *ring)
>  {
> @@ -3171,7 +3261,8 @@ static void ixgbe_set_rx_drop_en(struct ixgbe_adapter *adapter)
>          *  and performance reasons.
>          */
>         if (adapter->num_vfs || (adapter->num_rx_queues > 1 &&
> -           !(adapter->hw.fc.current_mode & ixgbe_fc_tx_pause) && !pfc_en)) {
> +           !(adapter->hw.fc.current_mode & ixgbe_fc_tx_pause) && !pfc_en) ||
> +           ixgbe_have_user_queues(adapter)) {
>                 for (i = 0; i < adapter->num_rx_queues; i++)
>                         ixgbe_enable_rx_drop(adapter, adapter->rx_ring[i]);
>         } else {
> @@ -7938,6 +8029,306 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
>         kfree(fwd_adapter);
>  }
>
> +static int ixgbe_ndo_split_queue_pairs(struct net_device *dev,
> +                                      unsigned int start_from,
> +                                      unsigned int qpairs_num,
> +                                      struct sock *sk)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       unsigned int qpair_index;
> +
> +       /* allocate whatever available qpairs */
> +       if (start_from == -1) {

When is this wildcard case used? If the nic is configured to send
specific traffic to a specific rxqueue, then that queue has to be
mapped. When is an arbitrary queue acceptable?

> +               unsigned int count = 0;
> +
> +               for (qpair_index = adapter->num_rx_queues;
> +                    qpair_index < MAX_RX_QUEUES;
> +                    qpair_index++) {
> +                       if (!adapter->user_queue_info[qpair_index].sk_handle) {
> +                               count++;
> +                               if (count == qpairs_num) {
> +                                       start_from = qpair_index - count + 1;
> +                                       break;
> +                               }
> +                       } else {
> +                               count = 0;
> +                       }
> +               }
> +       }
> +
> +       /* otherwise the caller specified exact queues */
> +       if ((start_from > MAX_TX_QUEUES) ||
> +           (start_from > MAX_RX_QUEUES) ||
> +           (start_from + qpairs_num > MAX_TX_QUEUES) ||
> +           (start_from + qpairs_num > MAX_RX_QUEUES))
> +               return -EINVAL;
> +
> +       /* If the qpairs are being used by the driver do not let user space
> +        * consume the queues. Also if the queue has already been allocated
> +        * to a socket do fail the request.
> +        */
> +       for (qpair_index = start_from;
> +            qpair_index < start_from + qpairs_num;
> +            qpair_index++) {
> +               if ((qpair_index < adapter->num_tx_queues) ||
> +                   (qpair_index < adapter->num_rx_queues))
> +                       return -EINVAL;

Is there a similar check to ensure that the driver does not increase its
number of queues with ethtool -X and subsume user queues?

> +
> +               if (adapter->user_queue_info[qpair_index].sk_handle)
> +                       return -EBUSY;
> +       }
> +
> +       /* remember the sk handle for each queue pair */
> +       for (qpair_index = start_from;
> +            qpair_index < start_from + qpairs_num;
> +            qpair_index++) {
> +               adapter->user_queue_info[qpair_index].sk_handle = sk;
> +               adapter->user_queue_info[qpair_index].num_of_regions = 0;
> +       }
> +
> +       return 0;
> +}
> +
> +static int ixgbe_ndo_get_split_queue_pairs(struct net_device *dev,
> +                                          unsigned int *start_from,
> +                                          unsigned int *qpairs_num,
> +                                          struct sock *sk)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       unsigned int qpair_index;
> +       *qpairs_num = 0;
> +
> +       for (qpair_index = adapter->num_tx_queues;
> +            qpair_index < MAX_RX_QUEUES;
> +            qpair_index++) {
> +               if (adapter->user_queue_info[qpair_index].sk_handle == sk) {
> +                       if (*qpairs_num == 0)
> +                               *start_from = qpair_index;
> +                       *qpairs_num = *qpairs_num + 1;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +static int ixgbe_ndo_return_queue_pairs(struct net_device *dev, struct sock *sk)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       struct ixgbe_user_queue_info *info;
> +       unsigned int qpair_index;
> +
> +       for (qpair_index = adapter->num_tx_queues;
> +            qpair_index < MAX_RX_QUEUES;
> +            qpair_index++) {
> +               info = &adapter->user_queue_info[qpair_index];
> +
> +               if (info->sk_handle == sk) {
> +                       info->sk_handle = NULL;
> +                       info->num_of_regions = 0;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +/* Rx descriptor starts from 0x1000 and Tx descriptor starts from 0x6000
> + * both the TX and RX descriptors use 4K pages.
> + */
> +#define RX_DESC_ADDR_OFFSET            0x1000
> +#define TX_DESC_ADDR_OFFSET            0x6000
> +#define PAGE_SIZE_4K                   4096
> +
> +static int
> +ixgbe_ndo_qpair_map_region(struct net_device *dev,
> +                          struct tpacket_dev_qpair_map_region_info *info)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +
> +       /* no need to map system memory to userspace for ixgbe */
> +       info->tp_dev_sysm_sz = 0;
> +       info->tp_num_sysm_map_regions = 0;
> +
> +       info->tp_dev_bar_sz = pci_resource_len(adapter->pdev, 0);
> +       info->tp_num_map_regions = 2;
> +
> +       info->tp_regions[0].page_offset = RX_DESC_ADDR_OFFSET;
> +       info->tp_regions[0].page_sz = PAGE_SIZE;
> +       info->tp_regions[0].page_cnt = 1;
> +       info->tp_regions[1].page_offset = TX_DESC_ADDR_OFFSET;
> +       info->tp_regions[1].page_sz = PAGE_SIZE;
> +       info->tp_regions[1].page_cnt = 1;
> +
> +       return 0;
> +}
> +
> +static int ixgbe_ndo_get_device_desc_info(struct net_device *dev,
> +                                         struct tpacket_dev_info *dev_info)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       int max_queues;
> +       int i;
> +       __u8 flds_rx = sizeof(ixgbe_legacy_rx_desc) /
> +                      sizeof(struct tpacket_nic_desc_fld);
> +       __u8 flds_tx = sizeof(ixgbe_legacy_tx_desc) /
> +                      sizeof(struct tpacket_nic_desc_fld);
> +
> +       max_queues = max(adapter->num_rx_queues, adapter->num_tx_queues);
> +
> +       dev_info->tp_device_id = adapter->hw.device_id;
> +       dev_info->tp_vendor_id = adapter->hw.vendor_id;
> +       dev_info->tp_subsystem_device_id = adapter->hw.subsystem_device_id;
> +       dev_info->tp_subsystem_vendor_id = adapter->hw.subsystem_vendor_id;
> +       dev_info->tp_revision_id = adapter->hw.revision_id;
> +       dev_info->tp_numa_node = dev_to_node(&dev->dev);
> +
> +       dev_info->tp_num_total_qpairs = min(MAX_RX_QUEUES, MAX_TX_QUEUES);
> +       dev_info->tp_num_inuse_qpairs = max_queues;
> +
> +       dev_info->tp_num_rx_desc_fmt = 1;
> +       dev_info->tp_num_tx_desc_fmt = 1;
> +
> +       dev_info->tp_rx_dexpr[0].version = 1;
> +       dev_info->tp_rx_dexpr[0].size = sizeof(union ixgbe_adv_rx_desc);
> +       dev_info->tp_rx_dexpr[0].byte_order = BO_NATIVE;
> +       dev_info->tp_rx_dexpr[0].num_of_fld = flds_rx;
> +       for (i = 0; i < dev_info->tp_rx_dexpr[0].num_of_fld; i++)
> +               memcpy(&dev_info->tp_rx_dexpr[0].fields[i],
> +                      &ixgbe_legacy_rx_desc[i],
> +                      sizeof(struct tpacket_nic_desc_fld));
> +
> +       dev_info->tp_tx_dexpr[0].version = 1;
> +       dev_info->tp_tx_dexpr[0].size = sizeof(union ixgbe_adv_tx_desc);
> +       dev_info->tp_tx_dexpr[0].byte_order = BO_NATIVE;
> +       dev_info->tp_tx_dexpr[0].num_of_fld = flds_tx;
> +       for (i = 0; i < dev_info->tp_tx_dexpr[0].num_of_fld; i++)
> +               memcpy(&dev_info->tp_tx_dexpr[0].fields[i],
> +                      &ixgbe_legacy_tx_desc[i],
> +                      sizeof(struct tpacket_nic_desc_fld));
> +
> +       return 0;
> +}
> +
> +static int
> +ixgbe_ndo_qpair_page_map(struct vm_area_struct *vma, struct net_device *dev)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       phys_addr_t phy_addr = pci_resource_start(adapter->pdev, 0);
> +       unsigned long pfn_rx = (phy_addr + RX_DESC_ADDR_OFFSET) >> PAGE_SHIFT;
> +       unsigned long pfn_tx = (phy_addr + TX_DESC_ADDR_OFFSET) >> PAGE_SHIFT;
> +       unsigned long dummy_page_phy;
> +       pgprot_t pre_vm_page_prot;
> +       unsigned long start;
> +       unsigned int i;
> +       int err;
> +
> +       if (!dummy_page_buf) {
> +               dummy_page_buf = kzalloc(PAGE_SIZE_4K, GFP_KERNEL);
> +               if (!dummy_page_buf)
> +                       return -ENOMEM;
> +
> +               for (i = 0; i < PAGE_SIZE_4K / sizeof(unsigned int); i++)
> +                       dummy_page_buf[i] = 0xdeadbeef;
> +       }
> +
> +       dummy_page_phy = virt_to_phys(dummy_page_buf);
> +       pre_vm_page_prot = vma->vm_page_prot;
> +       vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +
> +       /* assume the vm_start is 4K aligned address */
> +       for (start = vma->vm_start;
> +            start < vma->vm_end;
> +            start += PAGE_SIZE_4K) {
> +               if (start == vma->vm_start + RX_DESC_ADDR_OFFSET) {
> +                       err = remap_pfn_range(vma, start, pfn_rx, PAGE_SIZE_4K,
> +                                             vma->vm_page_prot);
> +                       if (err)
> +                               return -EAGAIN;
> +               } else if (start == vma->vm_start + TX_DESC_ADDR_OFFSET) {
> +                       err = remap_pfn_range(vma, start, pfn_tx, PAGE_SIZE_4K,
> +                                             vma->vm_page_prot);
> +                       if (err)
> +                               return -EAGAIN;
> +               } else {
> +                       unsigned long addr = dummy_page_phy >> PAGE_SHIFT;
> +
> +                       err = remap_pfn_range(vma, start, addr, PAGE_SIZE_4K,
> +                                             pre_vm_page_prot);
> +                       if (err)
> +                               return -EAGAIN;
> +               }
> +       }
> +       return 0;
> +}
> +
> +static int
> +ixgbe_ndo_val_dma_mem_region_map(struct net_device *dev,
> +                                struct tpacket_dma_mem_region *region,
> +                                struct sock *sk)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       unsigned int qpair_index, i;
> +       struct ixgbe_user_queue_info *info;
> +
> +#ifdef CONFIG_DMA_MEMORY_PROTECTION
> +       /* IOVA not equal to physical address means IOMMU takes effect */
> +       if (region->phys_addr == region->iova)
> +               return -EFAULT;
> +#endif
> +
> +       for (qpair_index = adapter->num_tx_queues;
> +            qpair_index < MAX_RX_QUEUES;
> +            qpair_index++) {
> +               info = &adapter->user_queue_info[qpair_index];
> +               i = info->num_of_regions;
> +
> +               if (info->sk_handle != sk)
> +                       continue;
> +
> +               if (info->num_of_regions >= MAX_USER_DMA_REGIONS_PER_SOCKET)
> +                       return -EFAULT;
> +
> +               info->regions[i].dma_region_size = region->size;
> +               info->regions[i].direction = region->direction;
> +               info->regions[i].dma_region_iova = region->iova;
> +               info->num_of_regions++;
> +       }
> +
> +       return 0;
> +}
> +
> +static int
> +ixgbe_get_dma_region_info(struct net_device *dev,
> +                         struct tpacket_dma_mem_region *region,
> +                         struct sock *sk)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       struct ixgbe_user_queue_info *info;
> +       unsigned int qpair_index;
> +
> +       for (qpair_index = adapter->num_tx_queues;
> +            qpair_index < MAX_RX_QUEUES;
> +            qpair_index++) {
> +               int i;
> +
> +               info = &adapter->user_queue_info[qpair_index];
> +               if (info->sk_handle != sk)
> +                       continue;
> +
> +               for (i = 0; i < info->num_of_regions; i++) {
> +                       struct ixgbe_user_dma_region *r;
> +
> +                       r = &info->regions[i];
> +                       if ((r->dma_region_size == region->size) &&
> +                           (r->direction == region->direction)) {
> +                               region->iova = r->dma_region_iova;
> +                               return 0;
> +                       }
> +               }
> +       }
> +
> +       return -1;
> +}
> +
>  static const struct net_device_ops ixgbe_netdev_ops = {
>         .ndo_open               = ixgbe_open,
>         .ndo_stop               = ixgbe_close,
> @@ -7982,6 +8373,15 @@ static const struct net_device_ops ixgbe_netdev_ops = {
>         .ndo_bridge_getlink     = ixgbe_ndo_bridge_getlink,
>         .ndo_dfwd_add_station   = ixgbe_fwd_add,
>         .ndo_dfwd_del_station   = ixgbe_fwd_del,
> +
> +       .ndo_split_queue_pairs  = ixgbe_ndo_split_queue_pairs,
> +       .ndo_get_split_queue_pairs = ixgbe_ndo_get_split_queue_pairs,
> +       .ndo_return_queue_pairs    = ixgbe_ndo_return_queue_pairs,
> +       .ndo_get_device_desc_info  = ixgbe_ndo_get_device_desc_info,
> +       .ndo_direct_qpair_page_map = ixgbe_ndo_qpair_page_map,
> +       .ndo_get_dma_region_info   = ixgbe_get_dma_region_info,
> +       .ndo_get_device_qpair_map_region_info = ixgbe_ndo_qpair_map_region,
> +       .ndo_validate_dma_mem_region_map = ixgbe_ndo_val_dma_mem_region_map,
>  };
>
>  /**
> @@ -8203,7 +8603,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         hw->back = adapter;
>         adapter->msg_enable = netif_msg_init(debug, DEFAULT_MSG_ENABLE);
>
> -       hw->hw_addr = ioremap(pci_resource_start(pdev, 0),
> +       hw->pci_hw_addr = pci_resource_start(pdev, 0);
> +
> +       hw->hw_addr = ioremap(hw->pci_hw_addr,
>                               pci_resource_len(pdev, 0));
>         adapter->io_addr = hw->hw_addr;
>         if (!hw->hw_addr) {
> @@ -8875,6 +9277,7 @@ module_init(ixgbe_init_module);
>   **/
>  static void __exit ixgbe_exit_module(void)
>  {
> +       kfree(dummy_page_buf);
>  #ifdef CONFIG_IXGBE_DCA
>         dca_unregister_notify(&dca_notifier);
>  #endif
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
> index d101b25..4034d31 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
> @@ -3180,6 +3180,7 @@ struct ixgbe_mbx_info {
>
>  struct ixgbe_hw {
>         u8 __iomem                      *hw_addr;
> +       phys_addr_t                     pci_hw_addr;
>         void                            *back;
>         struct ixgbe_mac_info           mac;
>         struct ixgbe_addr_filter_info   addr_ctrl;
>

^ permalink raw reply

* Re: BW regression after "tcp: refine TSO autosizing"
From: Eric Dumazet @ 2015-01-13 18:57 UTC (permalink / raw)
  To: Eyal Perry; +Cc: netdev, Amir Vadai, yevgenyp, saeedm, idos, amira, eyalpe
In-Reply-To: <54B54C72.8060705@dev.mellanox.co.il>

On Tue, 2015-01-13 at 18:48 +0200, Eyal Perry wrote:
> Hello Eric,
> Lately we've observed a bandwidth degradation of about 30-40% (depending on
> the setup we use).
> I've bisected the issue down to this commit: 605ad7f1 ("tcp: refine TSO
> autosizing")
> 
> For instance, I was running the following test:
> 1. Binding the net device's IRQs to core 0 on both the client and server side
> 2. Running netperf with a 64K message size (using the following command)
> $ netperf -H remote -T 1,1 -l 100 -t TCP_STREAM -- -k THROUGHPUT -M 65536 -m 65536
> 
> I ran the test on upstream net-next including your patch and then reverted it;
> with the revert, throughput improved from 14.6Gbps to 22.1Gbps.
> 
> An additional difference I noticed when inspecting the ethtool statistics:
> the number of xmit_more packets increased from 4 to 160 with the reverted kernel.
> 
> We are investigating this issue, do you have a hint?

Which driver are you using for this test?

^ permalink raw reply

* Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
From: Willem de Bruijn @ 2015-01-13 18:52 UTC (permalink / raw)
  To: John Fastabend
  Cc: Network Development, Zhou, Danny, Neil Horman, Daniel Borkmann,
	Ronciak, John, Hannes Frederic Sowa, brouer
In-Reply-To: <20150113043509.29985.33515.stgit@nitbit.x32>

On Mon, Jan 12, 2015 at 11:35 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> This patch adds net_device ops to split off a set of driver queues
> from the driver and map the queues into user space via mmap. This
> allows the queues to be directly manipulated from user space. For
> raw packet interface this removes any overhead from the kernel network
> stack.

Can you elaborate how packet payload mapping is handled?
Processes are still responsible for translating from user virtual to
physical (and bus) addresses, correct? The IOMMU is only there
to restrict the physical address ranges that may be written.

>
> With these operations we bypass the network stack and packet_type
> handlers that would typically send traffic to an af_packet socket.
> This means hardware must do the forwarding. To do this we can use
> the ETHTOOL_SRXCLSRLINS ops in the ethtool command set. It is
> currently supported by multiple drivers including sfc, mlx4, niu,
> ixgbe, and i40e. Supporting some way to steer traffic to a queue
> is the _only_ hardware requirement to support this interface.
>
> A follow on patch adds support for ixgbe but we expect at least
> the subset of drivers implementing ETHTOOL_SRXCLSRLINS can be
> implemented later.
>
> The high level flow, leveraging the af_packet control path, looks
> like:
>
>         bind(fd, &sockaddr, sizeof(sockaddr));
>
>         /* Get the device type and info */
>         getsockopt(fd, SOL_PACKET, PACKET_DEV_DESC_INFO, &def_info,
>                    &optlen);
>
>         /* With device info we can look up descriptor format */
>
>         /* Get the layout of ring space offset, page_sz, cnt */
>         getsockopt(fd, SOL_PACKET, PACKET_DEV_QPAIR_MAP_REGION_INFO,
>                    &info, &optlen);
>
>         /* request some queues from the driver */
>         setsockopt(fd, SOL_PACKET, PACKET_RXTX_QPAIRS_SPLIT,
>                    &qpairs_info, sizeof(qpairs_info));
>
>         /* if we let the driver pick us queues learn which queues
>          * we were given
>          */
>         getsockopt(fd, SOL_PACKET, PACKET_RXTX_QPAIRS_SPLIT,
>                    &qpairs_info, sizeof(qpairs_info));
>
>         /* And mmap queue pairs to user space */
>         mmap(NULL, info.tp_dev_bar_sz, PROT_READ | PROT_WRITE,
>              MAP_SHARED, fd, 0);
>
>         /* Now we have some user space queues to read/write to */
>
> There is one critical difference when running with these interfaces
> vs running without them. In the normal case the af_packet module
> uses a standard descriptor format exported by the af_packet user
> space headers. In this model because we are working directly with
> driver queues the descriptor format maps to the descriptor format
> used by the device. User space applications can learn device
> information from the socket option PACKET_DEV_DESC_INFO. These
> are described by giving the vendor/deviceid and a descriptor layout
> in offset/length/width/alignment/byte_ordering.

Raising the issue of exposed vs. virtualized interface just once
more. I wonder if it is possible to keep the virtual to physical
translation in the kernel while avoiding syscall latency, by doing
the translation in a kernel thread on a coupled hyperthread that
waits with mwait on the virtual queue producer index. The page
table operations that Neil proposed in v1 of this patch may work
even better.
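
To make the idea concrete, here is a minimal single-threaded model of the translation step, under loud assumptions: user space writes virtual addresses into a virtual ring and advances a producer index, and a kernel-side translator drains the ring, rewriting each address into an IOVA for the device. All names (sketch_region, sketch_vring, sketch_translate, sketch_drain, SKETCH_RING_SIZE) are hypothetical, and the model deliberately omits mwait, memory barriers, and the hyperthread coupling:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8u     /* hypothetical, must be a power of two */

struct sketch_region {
        uintptr_t va_base;      /* start of the registered user buffer */
        uint64_t  iova_base;    /* matching bus address */
        size_t    len;          /* length of the buffer in bytes */
};

struct sketch_vring {
        uintptr_t    va[SKETCH_RING_SIZE];  /* written by user space */
        uint64_t     hw[SKETCH_RING_SIZE];  /* handed to the device */
        unsigned int prod;                  /* advanced by user space */
        unsigned int cons;                  /* advanced by the translator */
};

/* Translate one address; returns -1 if it lies outside the region. */
static int sketch_translate(const struct sketch_region *r, uintptr_t va,
                            uint64_t *iova)
{
        if (va < r->va_base || va - r->va_base >= r->len)
                return -1;

        *iova = r->iova_base + (va - r->va_base);
        return 0;
}

/*
 * Drain entries between cons and prod. Stops at the first address that
 * fails translation and returns the number of entries translated.
 */
static unsigned int sketch_drain(struct sketch_vring *ring,
                                 const struct sketch_region *r)
{
        unsigned int done = 0;

        while (ring->cons != ring->prod) {
                unsigned int slot = ring->cons & (SKETCH_RING_SIZE - 1);

                if (sketch_translate(r, ring->va[slot], &ring->hw[slot]))
                        break;

                ring->cons++;
                done++;
        }

        return done;
}
```

In this model the bounds check inside sketch_translate is what keeps user space from handing the device an address outside its registered buffer, which is the property the exposed interface has to get from the IOMMU instead.
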

> To protect against arbitrary DMA writes, IOMMU devices put memory
> in a single domain to stop arbitrary DMA to memory. Note it would
> be possible to DMA into another socket's pages because most NIC
> devices only support a single domain. This would require being
> able to guess another socket's page layout. However, the socket
> operation does require CAP_NET_ADMIN privileges.
>
> Additionally we have a set of DPDK patches to enable DPDK with this
> interface. DPDK can be downloaded @ dpdk.org, although, as I hope is
> clear from above, DPDK is just our particular test environment; we
> expect other libraries could be built on this interface.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  include/linux/netdevice.h      |   79 ++++++++
>  include/uapi/linux/if_packet.h |   88 +++++++++
>  net/packet/af_packet.c         |  397 ++++++++++++++++++++++++++++++++++++++++
>  net/packet/internal.h          |   10 +
>  4 files changed, 573 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 679e6e9..b71c97d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -52,6 +52,8 @@
>  #include <linux/neighbour.h>
>  #include <uapi/linux/netdevice.h>
>
> +#include <linux/if_packet.h>
> +
>  struct netpoll_info;
>  struct device;
>  struct phy_device;
> @@ -1030,6 +1032,54 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>   *     Called to notify switch device port of bridge port STP
>   *     state change.
> + *
> + * int (*ndo_split_queue_pairs) (struct net_device *dev,
> + *                              unsigned int qpairs_start_from,
> + *                              unsigned int qpairs_num,
> + *                              struct sock *sk)
> + *     Called to request a set of queues from the driver to be handed to the
> + *     caller for management. After this returns the driver will not use the
> + *     queues.
> + *
> + * int (*ndo_get_split_queue_pairs) (struct net_device *dev,
> + *                              unsigned int *qpairs_start_from,
> + *                              unsigned int *qpairs_num,
> + *                              struct sock *sk)
> + *     Called to get the location of queues that have been split for user
> + *     space to use. The socket must have previously requested the queues via
> + *     ndo_split_queue_pairs successfully.
> + *
> + * int (*ndo_return_queue_pairs) (struct net_device *dev,
> + *                               struct sock *sk)
> + *     Called to return a set of queues identified by sock to the driver. The
> + *     socket must have previously requested the queues via
> + *     ndo_split_queue_pairs for this action to be performed.
> + *
> + * int (*ndo_get_device_qpair_map_region_info) (struct net_device *dev,
> + *                             struct tpacket_dev_qpair_map_region_info *info)
> + *     Called to return mapping of queue memory region.
> + *
> + * int (*ndo_get_device_desc_info) (struct net_device *dev,
> + *                                 struct tpacket_dev_info *dev_info)
> + *     Called to get device specific information. This should uniquely identify
> + *     the hardware so that descriptor formats can be learned by the stack/user
> + *     space.
> + *
> + * int (*ndo_direct_qpair_page_map) (struct vm_area_struct *vma,
> + *                                  struct net_device *dev)
> + *     Called to map queue pair range from split_queue_pairs into mmap region.
> + *
> + * int (*ndo_validate_dma_mem_region_map)
> + *                                     (struct net_device *dev,
> + *                                      struct tpacket_dma_mem_region *region,
> + *                                      struct sock *sk)
> + *     Called to validate DMA address remapping for userspace memory region
> + *
> + * int (*ndo_get_dma_region_info)
> + *                              (struct net_device *dev,
> + *                               struct tpacket_dma_mem_region *region,
> + *                               struct sock *sk)
> + *     Called to get DMA region information such as iova.
>   */
>  struct net_device_ops {
>         int                     (*ndo_init)(struct net_device *dev);
> @@ -1190,6 +1240,35 @@ struct net_device_ops {
>         int                     (*ndo_switch_port_stp_update)(struct net_device *dev,
>                                                               u8 state);
>  #endif
> +       int                     (*ndo_split_queue_pairs)(struct net_device *dev,
> +                                        unsigned int qpairs_start_from,
> +                                        unsigned int qpairs_num,
> +                                        struct sock *sk);
> +       int                     (*ndo_get_split_queue_pairs)
> +                                       (struct net_device *dev,
> +                                        unsigned int *qpairs_start_from,
> +                                        unsigned int *qpairs_num,
> +                                        struct sock *sk);
> +       int                     (*ndo_return_queue_pairs)
> +                                       (struct net_device *dev,
> +                                        struct sock *sk);
> +       int                     (*ndo_get_device_qpair_map_region_info)
> +                                       (struct net_device *dev,
> +                                        struct tpacket_dev_qpair_map_region_info *info);
> +       int                     (*ndo_get_device_desc_info)
> +                                       (struct net_device *dev,
> +                                        struct tpacket_dev_info *dev_info);
> +       int                     (*ndo_direct_qpair_page_map)
> +                                       (struct vm_area_struct *vma,
> +                                        struct net_device *dev);
> +       int                     (*ndo_validate_dma_mem_region_map)
> +                                       (struct net_device *dev,
> +                                        struct tpacket_dma_mem_region *region,
> +                                        struct sock *sk);
> +       int                     (*ndo_get_dma_region_info)
> +                                       (struct net_device *dev,
> +                                        struct tpacket_dma_mem_region *region,
> +                                        struct sock *sk);
>  };
>
>  /**
> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
> index da2d668..eb7a727 100644
> --- a/include/uapi/linux/if_packet.h
> +++ b/include/uapi/linux/if_packet.h
> @@ -54,6 +54,13 @@ struct sockaddr_ll {
>  #define PACKET_FANOUT                  18
>  #define PACKET_TX_HAS_OFF              19
>  #define PACKET_QDISC_BYPASS            20
> +#define PACKET_RXTX_QPAIRS_SPLIT       21
> +#define PACKET_RXTX_QPAIRS_RETURN      22
> +#define PACKET_DEV_QPAIR_MAP_REGION_INFO       23
> +#define PACKET_DEV_DESC_INFO           24
> +#define PACKET_DMA_MEM_REGION_MAP       25
> +#define PACKET_DMA_MEM_REGION_RELEASE   26
> +
>
>  #define PACKET_FANOUT_HASH             0
>  #define PACKET_FANOUT_LB               1
> @@ -64,6 +71,87 @@ struct sockaddr_ll {
>  #define PACKET_FANOUT_FLAG_ROLLOVER    0x1000
>  #define PACKET_FANOUT_FLAG_DEFRAG      0x8000
>
> +#define PACKET_MAX_NUM_MAP_MEMORY_REGIONS 64
> +#define PACKET_MAX_NUM_DESC_FORMATS      8
> +#define PACKET_MAX_NUM_DESC_FIELDS       64
> +#define PACKET_NIC_DESC_FIELD(fseq, foffset, fwidth, falign, fbo) \
> +               .seqn = (__u8)fseq,                             \
> +               .offset = (__u8)foffset,                        \
> +               .width = (__u8)fwidth,                          \
> +               .align = (__u8)falign,                          \
> +               .byte_order = (__u8)fbo
> +
> +#define MAX_MAP_MEMORY_REGIONS 64
> +
> +/* setsockopt takes addr, size, direction parameters, getsockopt takes
> + * iova, size, direction.
> + * */
> +struct tpacket_dma_mem_region {
> +       void *addr;             /* userspace virtual address */
> +       __u64 phys_addr;        /* physical address */
> +       __u64 iova;             /* IO virtual address used for DMA */
> +       unsigned long size;     /* size of region */
> +       int direction;          /* dma data direction */
> +};
> +
> +struct tpacket_dev_qpair_map_region_info {
> +       unsigned int tp_dev_bar_sz;             /* size of BAR */
> +       unsigned int tp_dev_sysm_sz;            /* size of system memory */
> +       /* number of contiguous memory on BAR mapping to user space */
> +       unsigned int tp_num_map_regions;
> +       /* number of contiguous memory on system mapping to user space */
> +       unsigned int tp_num_sysm_map_regions;
> +       struct map_page_region {
> +               unsigned page_offset;   /* offset to start of region */
> +               unsigned page_sz;       /* size of page */
> +               unsigned page_cnt;      /* number of pages */
> +       } tp_regions[MAX_MAP_MEMORY_REGIONS];
> +};
> +
> +struct tpacket_dev_qpairs_info {
> +       unsigned int tp_qpairs_start_from;      /* qpairs index to start from */
> +       unsigned int tp_qpairs_num;             /* number of qpairs */
> +};
> +
> +enum tpack_desc_byte_order {
> +       BO_NATIVE = 0,
> +       BO_NETWORK,
> +       BO_BIG_ENDIAN,
> +       BO_LITTLE_ENDIAN,
> +};
> +
> +struct tpacket_nic_desc_fld {
> +       __u8 seqn;      /* Sequence index of descriptor field */
> +       __u8 offset;    /* Offset to start */
> +       __u8 width;     /* Width of field */
> +       __u8 align;     /* Alignment in bits */
> +       enum tpack_desc_byte_order byte_order;  /* Endian flag */
> +};
> +
> +struct tpacket_nic_desc_expr {
> +       __u8 version;           /* Version number */
> +       __u8 size;              /* Descriptor size in bytes */
> +       enum tpack_desc_byte_order byte_order;          /* Endian flag */
> +       __u8 num_of_fld;        /* Number of valid fields */
> +       /* List of each descriptor field */
> +       struct tpacket_nic_desc_fld fields[PACKET_MAX_NUM_DESC_FIELDS];
> +};
> +
> +struct tpacket_dev_info {
> +       __u16   tp_device_id;
> +       __u16   tp_vendor_id;
> +       __u16   tp_subsystem_device_id;
> +       __u16   tp_subsystem_vendor_id;
> +       __u32   tp_numa_node;
> +       __u32   tp_revision_id;
> +       __u32   tp_num_total_qpairs;
> +       __u32   tp_num_inuse_qpairs;
> +       __u32   tp_num_rx_desc_fmt;
> +       __u32   tp_num_tx_desc_fmt;
> +       struct tpacket_nic_desc_expr tp_rx_dexpr[PACKET_MAX_NUM_DESC_FORMATS];
> +       struct tpacket_nic_desc_expr tp_tx_dexpr[PACKET_MAX_NUM_DESC_FORMATS];
> +};
> +
>  struct tpacket_stats {
>         unsigned int    tp_packets;
>         unsigned int    tp_drops;
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 6880f34..8cd17da 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -214,6 +214,9 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *,
>  static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
>                 struct tpacket3_hdr *);
>  static void packet_flush_mclist(struct sock *sk);
> +static int umem_release(struct net_device *dev, struct packet_sock *po);
> +static int get_umem_pages(struct tpacket_dma_mem_region *region,
> +                         struct packet_umem_region *umem);
>
>  struct packet_skb_cb {
>         unsigned int origlen;
> @@ -2633,6 +2636,16 @@ static int packet_release(struct socket *sock)
>         sock_prot_inuse_add(net, sk->sk_prot, -1);
>         preempt_enable();
>
> +       if (po->tp_owns_queue_pairs) {
> +               struct net_device *dev;
> +
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (dev) {
> +                       dev->netdev_ops->ndo_return_queue_pairs(dev, sk);
> +                       umem_release(dev, po);
> +               }
> +       }
> +
>         spin_lock(&po->bind_lock);
>         unregister_prot_hook(sk, false);
>         packet_cached_dev_reset(po);
> @@ -2829,6 +2842,8 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
>         po->num = proto;
>         po->xmit = dev_queue_xmit;
>
> +       INIT_LIST_HEAD(&po->umem_list);
> +
>         err = packet_alloc_pending(po);
>         if (err)
>                 goto out2;
> @@ -3226,6 +3241,88 @@ static void packet_flush_mclist(struct sock *sk)
>  }
>
>  static int
> +get_umem_pages(struct tpacket_dma_mem_region *region,
> +              struct packet_umem_region *umem)
> +{
> +       struct page **page_list;
> +       unsigned long npages;
> +       unsigned long offset;
> +       unsigned long base;
> +       unsigned long i;
> +       int ret;
> +       dma_addr_t phys_base;
> +
> +       phys_base = (region->phys_addr) & PAGE_MASK;
> +       base = ((unsigned long)region->addr) & PAGE_MASK;
> +       offset = ((unsigned long)region->addr) & (~PAGE_MASK);
> +       npages = PAGE_ALIGN(region->size + offset) >> PAGE_SHIFT;
> +
> +       npages = min_t(unsigned long, npages, umem->nents);
> +       sg_init_table(umem->sglist, npages);
> +
> +       umem->nmap = 0;
> +       page_list = (struct page **)__get_free_page(GFP_KERNEL);
> +       if (!page_list)
> +               return -ENOMEM;
> +
> +       while (npages) {
> +               unsigned long min = min_t(unsigned long, npages,
> +                                         PAGE_SIZE / sizeof(struct page *));
> +
> +               ret = get_user_pages(current, current->mm, base, min,
> +                                    1, 0, page_list, NULL);
> +               if (ret < 0)
> +                       break;
> +
> +               base += ret * PAGE_SIZE;
> +               npages -= ret;
> +
> +               /* validate if the memory region is physically contiguous */
> +               for (i = 0; i < ret; i++) {
> +                       unsigned int page_index =
> +                               (page_to_phys(page_list[i]) - phys_base) /
> +                               PAGE_SIZE;
> +
> +                       if (page_index != umem->nmap + i) {
> +                               int j;
> +
> +                               for (j = 0; j < (umem->nmap + i); j++)
> +                                       put_page(sg_page(&umem->sglist[j]));
> +
> +                               free_page((unsigned long)page_list);
> +                               return -EFAULT;
> +                       }
> +
> +                       sg_set_page(&umem->sglist[umem->nmap + i],
> +                                   page_list[i], PAGE_SIZE, 0);
> +               }
> +
> +               umem->nmap += ret;
> +       }
> +
> +       free_page((unsigned long)page_list);
> +       return 0;
> +}
> +
> +static int
> +umem_release(struct net_device *dev, struct packet_sock *po)
> +{
> +       struct packet_umem_region *umem, *tmp;
> +       int i;
> +
> +       list_for_each_entry_safe(umem, tmp, &po->umem_list, list) {
> +               dma_unmap_sg(dev->dev.parent, umem->sglist,
> +                            umem->nmap, umem->direction);
> +               for (i = 0; i < umem->nmap; i++)
> +                       put_page(sg_page(&umem->sglist[i]));
> +
> +               vfree(umem);
> +       }
> +
> +       return 0;
> +}
> +
> +static int
>  packet_setsockopt(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen)
>  {
>         struct sock *sk = sock->sk;
> @@ -3428,6 +3525,167 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
>                 po->xmit = val ? packet_direct_xmit : dev_queue_xmit;
>                 return 0;
>         }
> +       case PACKET_RXTX_QPAIRS_SPLIT:
> +       {
> +               struct tpacket_dev_qpairs_info qpairs;
> +               const struct net_device_ops *ops;
> +               struct net_device *dev;
> +               int err;
> +
> +               if (optlen != sizeof(qpairs))
> +                       return -EINVAL;
> +               if (copy_from_user(&qpairs, optval, sizeof(qpairs)))
> +                       return -EFAULT;
> +
> +               /* Only allow one set of queues to be owned by userspace */
> +               if (po->tp_owns_queue_pairs)
> +                       return -EBUSY;
> +
> +               /* This call only works after a bind call which calls a dev_hold
> +                * operation so we do not need to increment dev ref counter
> +                */
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +               ops = dev->netdev_ops;
> +               if (!ops->ndo_split_queue_pairs)
> +                       return -EOPNOTSUPP;
> +
> +               err =  ops->ndo_split_queue_pairs(dev,
> +                                                 qpairs.tp_qpairs_start_from,
> +                                                 qpairs.tp_qpairs_num, sk);
> +               if (!err)
> +                       po->tp_owns_queue_pairs = true;
> +
> +               return err;
> +       }
> +       case PACKET_RXTX_QPAIRS_RETURN:
> +       {
> +               struct tpacket_dev_qpairs_info qpairs_info;
> +               const struct net_device_ops *ops;
> +               struct net_device *dev;
> +               int err;
> +
> +               if (optlen != sizeof(qpairs_info))
> +                       return -EINVAL;
> +               if (copy_from_user(&qpairs_info, optval, sizeof(qpairs_info)))
> +                       return -EFAULT;
> +
> +               if (!po->tp_owns_queue_pairs)
> +                       return -EINVAL;
> +
> +               /* This call only works after a bind call which calls a dev_hold
> +                * operation so we do not need to increment dev ref counter
> +                */
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +               ops = dev->netdev_ops;
> +               if (!ops->ndo_return_queue_pairs)
> +                       return -EOPNOTSUPP;
> +
> +               err =  dev->netdev_ops->ndo_return_queue_pairs(dev, sk);
> +               if (!err)
> +                       po->tp_owns_queue_pairs = false;
> +
> +               return err;
> +       }
> +       case PACKET_DMA_MEM_REGION_MAP:
> +       {
> +               struct tpacket_dma_mem_region region;
> +               const struct net_device_ops *ops;
> +               struct net_device *dev;
> +               struct packet_umem_region *umem;
> +               unsigned long npages;
> +               unsigned long offset;
> +               unsigned long i;
> +               int err;
> +
> +               if (optlen != sizeof(region))
> +                       return -EINVAL;
> +               if (copy_from_user(&region, optval, sizeof(region)))
> +                       return -EFAULT;
> +               if ((region.direction != DMA_BIDIRECTIONAL) &&
> +                   (region.direction != DMA_TO_DEVICE) &&
> +                   (region.direction != DMA_FROM_DEVICE))
> +                       return -EFAULT;
> +
> +               if (!po->tp_owns_queue_pairs)
> +                       return -EINVAL;
> +
> +               /* This call only works after a bind call which calls a dev_hold
> +                * operation so we do not need to increment dev ref counter
> +                */
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +
> +               offset = ((unsigned long)region.addr) & (~PAGE_MASK);
> +               npages = PAGE_ALIGN(region.size + offset) >> PAGE_SHIFT;
> +
> +               umem = vzalloc(sizeof(*umem) +
> +                              sizeof(struct scatterlist) * npages);
> +               if (!umem)
> +                       return -ENOMEM;
> +
> +               umem->nents = npages;
> +               umem->direction = region.direction;
> +
> +               down_write(&current->mm->mmap_sem);
> +               if (get_umem_pages(&region, umem) < 0) {
> +                       ret = -EFAULT;
> +                       goto exit;
> +               }
> +
> +               if ((umem->nmap == npages) &&
> +                   (0 != dma_map_sg(dev->dev.parent, umem->sglist,
> +                                    umem->nmap, region.direction))) {
> +                       region.iova = sg_dma_address(umem->sglist) + offset;
> +
> +                       ops = dev->netdev_ops;
> +                       if (!ops->ndo_validate_dma_mem_region_map) {
> +                               ret = -EOPNOTSUPP;
> +                               goto unmap;
> +                       }
> +
> +                       /* use driver to validate mapping of dma memory */
> +                       err = ops->ndo_validate_dma_mem_region_map(dev,
> +                                                                  &region,
> +                                                                  sk);
> +                       if (!err) {
> +                               list_add_tail(&umem->list, &po->umem_list);
> +                               ret = 0;
> +                               goto exit;
> +                       }
> +               }
> +
> +unmap:
> +               dma_unmap_sg(dev->dev.parent, umem->sglist,
> +                            umem->nmap, umem->direction);
> +               for (i = 0; i < umem->nmap; i++)
> +                       put_page(sg_page(&umem->sglist[i]));
> +
> +               vfree(umem);
> +exit:
> +               up_write(&current->mm->mmap_sem);
> +
> +               return ret;
> +       }
> +       case PACKET_DMA_MEM_REGION_RELEASE:
> +       {
> +               struct net_device *dev;
> +
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +
> +               down_write(&current->mm->mmap_sem);
> +               ret = umem_release(dev, po);
> +               up_write(&current->mm->mmap_sem);
> +
> +               return ret;
> +       }
> +
>         default:
>                 return -ENOPROTOOPT;
>         }
> @@ -3523,6 +3781,129 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
>         case PACKET_QDISC_BYPASS:
>                 val = packet_use_direct_xmit(po);
>                 break;
> +       case PACKET_RXTX_QPAIRS_SPLIT:
> +       {
> +               struct net_device *dev;
> +               struct tpacket_dev_qpairs_info qpairs_info;
> +               int err;
> +
> +               if (len != sizeof(qpairs_info))
> +                       return -EINVAL;
> +               if (copy_from_user(&qpairs_info, optval, sizeof(qpairs_info)))
> +                       return -EFAULT;
> +
> +               /* This call only works after a successful queue pairs split-off
> +                * operation via setsockopt()
> +                */
> +               if (!po->tp_owns_queue_pairs)
> +                       return -EINVAL;
> +
> +               /* This call only works after a bind call which calls a dev_hold
> +                * operation so we do not need to increment dev ref counter
> +                */
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +               if (!dev->netdev_ops->ndo_get_split_queue_pairs)
> +                       return -EOPNOTSUPP;
> +
> +               err =  dev->netdev_ops->ndo_get_split_queue_pairs(dev,
> +                                       &qpairs_info.tp_qpairs_start_from,
> +                                       &qpairs_info.tp_qpairs_num, sk);
> +
> +               lv = sizeof(qpairs_info);
> +               data = &qpairs_info;
> +               break;
> +       }
> +       case PACKET_DEV_QPAIR_MAP_REGION_INFO:
> +       {
> +               struct tpacket_dev_qpair_map_region_info info;
> +               const struct net_device_ops *ops;
> +               struct net_device *dev;
> +               int err;
> +
> +               if (len != sizeof(info))
> +                       return -EINVAL;
> +               if (copy_from_user(&info, optval, sizeof(info)))
> +                       return -EFAULT;
> +
> +               /* This call only works after a bind call which calls a dev_hold
> +                * operation so we do not need to increment dev ref counter
> +                */
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +
> +               ops = dev->netdev_ops;
> +               if (!ops->ndo_get_device_qpair_map_region_info)
> +                       return -EOPNOTSUPP;
> +
> +               err = ops->ndo_get_device_qpair_map_region_info(dev, &info);
> +               if (err)
> +                       return err;
> +
> +               lv = sizeof(struct tpacket_dev_qpair_map_region_info);
> +               data = &info;
> +               break;
> +       }
> +       case PACKET_DEV_DESC_INFO:
> +       {
> +               struct net_device *dev;
> +               struct tpacket_dev_info info;
> +               int err;
> +
> +               if (len != sizeof(info))
> +                       return -EINVAL;
> +               if (copy_from_user(&info, optval, sizeof(info)))
> +                       return -EFAULT;
> +
> +               /* This call only works after a bind call which calls a dev_hold
> +                * operation so we do not need to increment dev ref counter
> +                */
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +               if (!dev->netdev_ops->ndo_get_device_desc_info)
> +                       return -EOPNOTSUPP;
> +
> +               err =  dev->netdev_ops->ndo_get_device_desc_info(dev, &info);
> +               if (err)
> +                       return err;
> +
> +               lv = sizeof(struct tpacket_dev_info);
> +               data = &info;
> +               break;
> +       }
> +       case PACKET_DMA_MEM_REGION_MAP:
> +       {
> +               struct tpacket_dma_mem_region info;
> +               struct net_device *dev;
> +               int err;
> +
> +               if (len != sizeof(info))
> +                       return -EINVAL;
> +               if (copy_from_user(&info, optval, sizeof(info)))
> +                       return -EFAULT;
> +
> +               /* This call only works after a bind call which calls a dev_hold
> +                * operation so we do not need to increment dev ref counter
> +                */
> +               dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +               if (!dev)
> +                       return -EINVAL;
> +
> +               if (!dev->netdev_ops->ndo_get_dma_region_info)
> +                       return -EOPNOTSUPP;
> +
> +               err =  dev->netdev_ops->ndo_get_dma_region_info(dev, &info, sk);
> +               if (err)
> +                       return err;
> +
> +               lv = sizeof(struct tpacket_dma_mem_region);
> +               data = &info;
> +               break;
> +       }
> +
>         default:
>                 return -ENOPROTOOPT;
>         }
> @@ -3536,7 +3917,6 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
>         return 0;
>  }
>
> -
>  static int packet_notifier(struct notifier_block *this,
>                            unsigned long msg, void *ptr)
>  {
> @@ -3920,6 +4300,8 @@ static int packet_mmap(struct file *file, struct socket *sock,
>         struct packet_sock *po = pkt_sk(sk);
>         unsigned long size, expected_size;
>         struct packet_ring_buffer *rb;
> +       const struct net_device_ops *ops;
> +       struct net_device *dev;
>         unsigned long start;
>         int err = -EINVAL;
>         int i;
> @@ -3927,8 +4309,20 @@ static int packet_mmap(struct file *file, struct socket *sock,
>         if (vma->vm_pgoff)
>                 return -EINVAL;
>
> +       dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> +       if (!dev)
> +               return -EINVAL;
> +
>         mutex_lock(&po->pg_vec_lock);
>
> +       if (po->tp_owns_queue_pairs) {
> +               ops = dev->netdev_ops;
> +               err = ops->ndo_direct_qpair_page_map(vma, dev);
> +               if (err)
> +                       goto out;
> +               goto done;
> +       }
> +
>         expected_size = 0;
>         for (rb = &po->rx_ring; rb <= &po->tx_ring; rb++) {
>                 if (rb->pg_vec) {
> @@ -3966,6 +4360,7 @@ static int packet_mmap(struct file *file, struct socket *sock,
>                 }
>         }
>
> +done:
>         atomic_inc(&po->mapped);
>         vma->vm_ops = &packet_mmap_ops;
>         err = 0;
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index cdddf6a..55d2fce 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -90,6 +90,14 @@ struct packet_fanout {
>         struct packet_type      prot_hook ____cacheline_aligned_in_smp;
>  };
>
> +struct packet_umem_region {
> +       struct list_head        list;
> +       int                     nents;
> +       int                     nmap;
> +       int                     direction;
> +       struct scatterlist      sglist[0];
> +};
> +
>  struct packet_sock {
>         /* struct sock has to be the first member of packet_sock */
>         struct sock             sk;
> @@ -97,6 +105,7 @@ struct packet_sock {
>         union  tpacket_stats_u  stats;
>         struct packet_ring_buffer       rx_ring;
>         struct packet_ring_buffer       tx_ring;
> +       struct list_head        umem_list;
>         int                     copy_thresh;
>         spinlock_t              bind_lock;
>         struct mutex            pg_vec_lock;
> @@ -113,6 +122,7 @@ struct packet_sock {
>         unsigned int            tp_reserve;
>         unsigned int            tp_loss:1;
>         unsigned int            tp_tx_has_off:1;
> +       unsigned int            tp_owns_queue_pairs:1;
>         unsigned int            tp_tstamp;
>         struct net_device __rcu *cached_dev;
>         int                     (*xmit)(struct sk_buff *skb);
>

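For readers following the PACKET_DMA_MEM_REGION_MAP path quoted above, here
is a minimal userspace sketch of the two pieces of arithmetic it relies on:
how many pages a user buffer spans, and how a run of pages is checked for
physical contiguity. This is not code from the patch. The demo_* names and
the fixed 4 KiB page size are assumptions made for illustration; the kernel
uses PAGE_SIZE, PAGE_ALIGN() and page_to_phys() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's PAGE_SHIFT/PAGE_SIZE (assumed 4 KiB). */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((uint64_t)1 << DEMO_PAGE_SHIFT)

/* Number of pages touched by the byte range [addr, addr + size).
 * Mirrors the quoted computation: offset = addr & ~PAGE_MASK, then
 * npages = PAGE_ALIGN(size + offset) >> PAGE_SHIFT.
 */
static uint64_t demo_span_pages(uint64_t addr, uint64_t size)
{
	uint64_t offset = addr & (DEMO_PAGE_SIZE - 1);

	return (size + offset + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}

/* Return the index of the first page whose physical address breaks
 * contiguity relative to phys[0], or n if all n pages are contiguous.
 */
static size_t demo_first_gap(const uint64_t *phys, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (phys[i] != phys[0] + i * DEMO_PAGE_SIZE)
			return i;
	return n;
}
```

A 32-byte buffer that starts 16 bytes before a page boundary spans two pages
under this arithmetic, which is why the offset has to be added to the size
before rounding up.
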
^ permalink raw reply

* Re: [PATCH net-next] ipv6: directly include libc-compat.h in ipv6.h
From: Cong Wang @ 2015-01-13 18:50 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Linux Kernel Network Developers, David Miller
In-Reply-To: <1421090974-30018-1-git-send-email-willemb@google.com>

On Mon, Jan 12, 2015 at 11:29 AM, Willem de Bruijn <willemb@google.com> wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Patch 3b50d9029809 ("ipv6: fix redefinition of in6_pktinfo ...")
> fixed a libc compatibility issue in ipv6 structure definitions
> as described in include/uapi/linux/libc-compat.h.
>
> It relies on including linux/in6.h to include libc-compat.h itself.
> Include that file directly to clearly communicate the dependency
> (libc-compat.h: "This include must be as early as possible").
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
>

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

Thanks for the patch!

^ permalink raw reply

* [PATCH] atm: horizon: Remove some unused functions
From: Rickard Strandqvist @ 2015-01-13 18:50 UTC (permalink / raw)
  To: Chas Williams, linux-atm-general
  Cc: Rickard Strandqvist, netdev, linux-kernel

Removes some functions that are not used anywhere:
channel_to_vpivci(), query_tx_channel_config() and rx_disabled_handler()

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
---
 drivers/atm/horizon.c |   24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/drivers/atm/horizon.c b/drivers/atm/horizon.c
index 1dc0519..527bbd5 100644
--- a/drivers/atm/horizon.c
+++ b/drivers/atm/horizon.c
@@ -458,12 +458,6 @@ static inline void update_tx_channel_config (hrz_dev * dev, short chan, u8 mode,
     return;
 }
 
-static inline u16 query_tx_channel_config (hrz_dev * dev, short chan, u8 mode) {
-  wr_regw (dev, TX_CHANNEL_CONFIG_COMMAND_OFF,
-	   chan * TX_CHANNEL_CONFIG_MULT | mode);
-    return rd_regw (dev, TX_CHANNEL_CONFIG_DATA_OFF);
-}
-
 /********** dump functions **********/
 
 static inline void dump_skb (char * prefix, unsigned int vc, struct sk_buff * skb) {
@@ -513,16 +507,6 @@ static inline void dump_framer (hrz_dev * dev) {
 
 /* RX channels are 10 bit integers, these fns are quite paranoid */
 
-static inline int channel_to_vpivci (const u16 channel, short * vpi, int * vci) {
-  unsigned short vci_bits = 10 - vpi_bits;
-  if ((channel & RX_CHANNEL_MASK) == channel) {
-    *vci = channel & ((~0)<<vci_bits);
-    *vpi = channel >> vci_bits;
-    return channel ? 0 : -EINVAL;
-  }
-  return -EINVAL;
-}
-
 static inline int vpivci_to_channel (u16 * channel, const short vpi, const int vci) {
   unsigned short vci_bits = 10 - vpi_bits;
   if (0 <= vpi && vpi < 1<<vpi_bits && 0 <= vci && vci < 1<<vci_bits) {
@@ -1260,14 +1244,6 @@ static u32 rx_queue_entry_next (hrz_dev * dev) {
   return rx_queue_entry;
 }
 
-/********** handle RX disabled by device **********/
-
-static inline void rx_disabled_handler (hrz_dev * dev) {
-  wr_regw (dev, RX_CONFIG_OFF, rd_regw (dev, RX_CONFIG_OFF) | RX_ENABLE);
-  // count me please
-  PRINTK (KERN_WARNING, "RX was disabled!");
-}
-
 /********** handle RX data received by device **********/
 
 // called from IRQ handler
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH] atm: lanai: Remove unused function
From: Rickard Strandqvist @ 2015-01-13 18:45 UTC (permalink / raw)
  To: Chas Williams, linux-atm-general
  Cc: Rickard Strandqvist, netdev, linux-kernel

Remove the function aal5_spacefor() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
---
 drivers/atm/lanai.c |    9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/atm/lanai.c b/drivers/atm/lanai.c
index 93eaf8d..d2e9ea8 100644
--- a/drivers/atm/lanai.c
+++ b/drivers/atm/lanai.c
@@ -681,15 +681,6 @@ static inline int aal5_size(int size)
 	return cells * 48;
 }
 
-/* How many bytes can we send if we have "space" space, assuming we have
- * to send full cells
- */
-static inline int aal5_spacefor(int space)
-{
-	int cells = space / 48;
-	return cells * 48;
-}
-
 /* -------------------- FREE AN ATM SKB: */
 
 static inline void lanai_free_skb(struct atm_vcc *atmvcc, struct sk_buff *skb)
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH v5] can: Convert to runtime_pm
From: Sören Brinkmann @ 2015-01-13 18:43 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Kedareswara rao Appana, wg-5Yr1BZd7O62+XT7JhA+gdA,
	michal.simek-gjFFaj9aHVfQT0dZR+AlfA,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, linux-can-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Kedareswara rao Appana
In-Reply-To: <54B55DF4.2090505-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

On Tue, 2015-01-13 at 07:03PM +0100, Marc Kleine-Budde wrote:
> On 01/13/2015 06:49 PM, Sören Brinkmann wrote:
> > On Tue, 2015-01-13 at 06:44PM +0100, Marc Kleine-Budde wrote:
> >> On 01/13/2015 06:24 PM, Sören Brinkmann wrote:
> >>> On Tue, 2015-01-13 at 06:17PM +0100, Marc Kleine-Budde wrote:
> >>>> On 01/13/2015 06:08 PM, Sören Brinkmann wrote:
> >>>>> On Tue, 2015-01-13 at 12:08PM +0100, Marc Kleine-Budde wrote:
> >>>>>> On 01/12/2015 07:45 PM, Sören Brinkmann wrote:
> >>>>>>> On Mon, 2015-01-12 at 08:34PM +0530, Kedareswara rao Appana wrote:
> >>>>>>>> Instead of enabling/disabling clocks at several locations in the driver,
> >>>>>>>> use the runtime_pm framework. This consolidates the actions for runtime PM
> >>>>>>>> in the appropriate callbacks and makes the driver more readable and maintainable.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Soren Brinkmann <soren.brinkmann-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
> >>>>>>>> Signed-off-by: Kedareswara rao Appana <appanad-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
> >>>>>>>> ---
> >>>>>>>> Changes for v5:
> >>>>>>>>  - Updated with the review comments.
> >>>>>>>>    Updated the remove function to use runtime_pm.
> >>>>>>>> Changes for v4:
> >>>>>>>>  - Updated with the review comments.
> >>>>>>>> Changes for v3:
> >>>>>>>>   - Converted the driver to use runtime_pm.
> >>>>>>>> Changes for v2:
> >>>>>>>>   - Removed the struct platform_device* from suspend/resume
> >>>>>>>>     as suggested by Lothar.
> >>>>>>>>
> >>>>>>>>  drivers/net/can/xilinx_can.c |  157 ++++++++++++++++++++++++++++-------------
> >>>>>>>>  1 files changed, 107 insertions(+), 50 deletions(-)
> >>>>>>> [..]
> >>>>>>>> +static int __maybe_unused xcan_runtime_resume(struct device *dev)
> >>>>>>>>  {
> >>>>>>>> -	struct platform_device *pdev = dev_get_drvdata(dev);
> >>>>>>>> -	struct net_device *ndev = platform_get_drvdata(pdev);
> >>>>>>>> +	struct net_device *ndev = dev_get_drvdata(dev);
> >>>>>>>>  	struct xcan_priv *priv = netdev_priv(ndev);
> >>>>>>>>  	int ret;
> >>>>>>>> +	u32 isr, status;
> >>>>>>>>  
> >>>>>>>>  	ret = clk_enable(priv->bus_clk);
> >>>>>>>>  	if (ret) {
> >>>>>>>> @@ -1014,15 +1030,28 @@ static int __maybe_unused xcan_resume(struct device *dev)
> >>>>>>>>  	ret = clk_enable(priv->can_clk);
> >>>>>>>>  	if (ret) {
> >>>>>>>>  		dev_err(dev, "Cannot enable clock.\n");
> >>>>>>>> -		clk_disable_unprepare(priv->bus_clk);
> >>>>>>>> +		clk_disable(priv->bus_clk);
> >>>>>>> [...]
> >>>>>>>> @@ -1173,12 +1219,23 @@ static int xcan_remove(struct platform_device *pdev)
> >>>>>>>>  {
> >>>>>>>>  	struct net_device *ndev = platform_get_drvdata(pdev);
> >>>>>>>>  	struct xcan_priv *priv = netdev_priv(ndev);
> >>>>>>>> +	int ret;
> >>>>>>>> +
> >>>>>>>> +	ret = pm_runtime_get_sync(&pdev->dev);
> >>>>>>>> +	if (ret < 0) {
> >>>>>>>> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> >>>>>>>> +				__func__, ret);
> >>>>>>>> +		return ret;
> >>>>>>>> +	}
> >>>>>>>>  
> >>>>>>>>  	if (set_reset_mode(ndev) < 0)
> >>>>>>>>  		netdev_err(ndev, "mode resetting failed!\n");
> >>>>>>>>  
> >>>>>>>>  	unregister_candev(ndev);
> >>>>>>>> +	pm_runtime_disable(&pdev->dev);
> >>>>>>>>  	netif_napi_del(&priv->napi);
> >>>>>>>> +	clk_disable_unprepare(priv->bus_clk);
> >>>>>>>> +	clk_disable_unprepare(priv->can_clk);
> >>>>>>>
> >>>>>>> Shouldn't pretty much all these occurrences of clk_disable/enable
> >>>>>>> disappear? This should all be handled by the runtime_pm framework now.
> >>>>>>
> >>>>>> We have:
> >>>>>> - clk_prepare_enable() in probe
> >>>>>
> >>>>> This should become something like pm_runtime_get_sync(), shouldn't it?
> >>>>>
> >>>>>> - clk_disable_unprepare() in remove
> >>>>>
> >>>>> pm_runtime_put()
> >>>>>
> >>>>>> - clk_enable() in runtime_resume
> >>>>>> - clk_disable() in runtime_suspend
> >>>>>
> >>>>> These are the ones needed.
> >>>>>
> >>>>> The above makes me suspect that the clocks are always on, regardless of
> >>>>
> >>>> Define "on" :)
> >>>> The clocks are prepared after probe() exits, but not enabled. The first
> >>>> pm_runtime_get_sync() will enable the clocks.
> >>>>
> >>>>> the runtime suspend state since they are enabled in probe and disabled
> >>>>> in remove, is that right? Ideally, the usage in probe and remove should
> >>>>> be migrated to runtime_pm and clocks should really only be running when
> >>>>> needed and not throughout the whole lifetime of the driver.
> >>>>
> >>>> The clocks are not en/disabled via pm_runtime, because
> >>>> pm_runtime_get_sync() is called from atomic context. We can have another
> >>>> look into the driver and try to change this.
> >>
> >>> Wasn't that why the call to pm_runtime_irq_safe() was added?
> >>
> >> Good question. That should be investigated.
> >>
> >>> Also, clk_enable/disable should be okay to be run from atomic context.
> >>> And if the clocks are already prepared after the exit of probe that
> >>> should be enough. Then remove() should just have to do the unprepare.
> >>> But I don't see why runtime_pm shouldn't be able to do the
> >>> enable/disable.
> >>
> >> runtime_pm does call the clk_{enable,disable} function. But you mean
> >> clk_prepare() + pm_runtime_get_sync() should be used in probe() instead
> >> of calling clk_prepare_enable(). Good idea! I think the
> >> "pm_runtime_set_active(&pdev->dev);" has to be removed from the patch.
> > 
> > Right, that's what I was thinking. The proposed changes make sense, IMHO.
> > 
> >>
> >> Coming back whether blocking calls are allowed or not.
> >> If you make a call to pm_runtime_irq_safe(), you state that it's okay to
> >> call pm_runtime_get_sync() from atomic context. But it's only called in
> >> open, probe, remove and in xcan_get_berr_counter, which is not called
> >> from atomic either. So let's try to remove the pm_runtime_irq_safe() and
> >> use clk_prepare_enable() and clk_disable_unprepare() in the runtime_resume()
> >> and runtime_suspend() functions.
> > 
> > IIRC, xcan_get_berr_counter() is called from atomic context. I think
> > that was how this got started.
> 
> In some drivers the get_berr_counter() function is used in the irq
> handler, but here it's only called from outside, and thus from non-atomic
> context.
> 
> From an older mail of yours:
> 
> > I have the feeling I'm missing something. If I remove the 'must not
> > sleep' requirement from the runtime suspend/resume functions, I get
> > this:
> > 
> > BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:954
> 
> http://lxr.free-electrons.com/source/drivers/base/power/runtime.c#L954
> 
> I think it's failing because of the pm_runtime_irq_safe() call.

Adding that call fixed this issue.

	Sören
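
The split discussed in this thread (prepare the clocks in probe(), enable
and disable them only in the runtime PM callbacks, unprepare in remove())
can be modelled roughly as below. This is a userspace sketch with
hypothetical demo_* names. It is not the xilinx_can code and does not use
the kernel's struct clk; it only tracks the prepare and enable counts to
show which step owns which transition.

```c
#include <assert.h>

/* Hypothetical userspace model of a clock; not the kernel's struct clk. */
struct demo_clk {
	int prepare_count;	/* in the kernel, changing this may sleep */
	int enable_count;	/* in the kernel, changing this is atomic-safe */
};

static int demo_clk_prepare(struct demo_clk *clk)
{
	clk->prepare_count++;
	return 0;
}

static void demo_clk_unprepare(struct demo_clk *clk)
{
	clk->prepare_count--;
}

/* Enabling requires a prior prepare, mirroring the clk API contract. */
static int demo_clk_enable(struct demo_clk *clk)
{
	if (clk->prepare_count == 0)
		return -1;
	clk->enable_count++;
	return 0;
}

static void demo_clk_disable(struct demo_clk *clk)
{
	clk->enable_count--;
}

/* probe(): prepare only, leave the clock gated until first use. */
static int demo_probe(struct demo_clk *clk)
{
	return demo_clk_prepare(clk);
}

/* runtime_resume(): the only place the clock gets enabled. */
static int demo_runtime_resume(struct demo_clk *clk)
{
	return demo_clk_enable(clk);
}

/* runtime_suspend(): the only place the clock gets disabled. */
static void demo_runtime_suspend(struct demo_clk *clk)
{
	demo_clk_disable(clk);
}

/* remove(): undo the prepare done in probe(). */
static void demo_remove(struct demo_clk *clk)
{
	demo_clk_unprepare(clk);
}
```

With this arrangement the clock stays gated between probe() and the first
resume, and the callbacks never touch the prepare count, so they remain
safe to call from atomic context if the device is marked IRQ-safe.
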

^ permalink raw reply

* Re: [RFC 1/2] misc: uidstat: Add uid stat driver to collect network statistics.
From: Greg Kroah-Hartman @ 2015-01-13 18:34 UTC (permalink / raw)
  To: Kiran Raparthy
  Cc: linux-kernel, Mike Chan, Arnd Bergmann, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, Android Kernel Team, John Stultz,
	Sumit Semwal, JP Abgrall, Arve Hjønnevåg
In-Reply-To: <1421147642-28360-1-git-send-email-kiran.kumar@linaro.org>

On Tue, Jan 13, 2015 at 04:44:01PM +0530, Kiran Raparthy wrote:
> From: Mike Chan <mike@android.com>
> 
> misc: uidstat: Add uid stat driver to collect network statistics.
> 
> To analyze application's network statistics, we need a mechanism to export
> the UID based statistics to userspace so that userspace tools can use the
> exported numbers and generate the report against the UID.
> 
> This patch allows the user to explore the UID based network statistics
> exported to /proc/uid_stat.

This is a joke, right?  It's not April 1 yet people, why send this in
January?

confused,

greg k-h

^ permalink raw reply

* Re: [PATCH][v3.13.y] e1000e: Fix no connectivity when driver loaded with cable out
From: Kamal Mostafa @ 2015-01-13 18:29 UTC (permalink / raw)
  To: Joseph Salisbury
  Cc: Kamal Mostafa, stable@vger.kernel.org, davidx.m.ertman,
	jeffrey.e.pieper, jeffrey.t.kirsher, LKML, linux.nics,
	e1000-devel, netdev@vger.kernel.org
In-Reply-To: <5491B88E.3070906@canonical.com>

On Wed, 2014-12-17 at 12:08 -0500, Joseph Salisbury wrote:
> Hello,
> 
> Please consider including mainline commit b20a774 in the next v3.13.y
> stable release.  It was included in the mainline tree as of v3.15-rc1. 
> It has been tested and confirmed to resolve
> http://bugs.launchpad.net/bugs/1400365 .
> 
> commit b20a774495671f037e7160ea2ce8789af6b61533
> Author: David Ertman <davidx.m.ertman@intel.com>
> Date:   Tue Mar 25 04:27:55 2014 +0000
> 
>     e1000e: Fix no connectivity when driver loaded with cable out

Yes, I'll queue this up for the next 3.13-stable.  Thanks, Joseph!

 -Kamal

^ permalink raw reply

* [PATCH] NFC: hci: llc: Remove unused function
From: Rickard Strandqvist @ 2015-01-13 18:28 UTC (permalink / raw)
  To: Lauro Ramos Venancio, Aloisio Almeida Jr
  Cc: Rickard Strandqvist, Samuel Ortiz, David S. Miller, Axel Lin,
	linux-wireless, netdev, linux-kernel

Remove the function nfc_llc_unregister() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
---
 net/nfc/hci/llc.c |   13 -------------
 net/nfc/hci/llc.h |    1 -
 2 files changed, 14 deletions(-)

diff --git a/net/nfc/hci/llc.c b/net/nfc/hci/llc.c
index 1b90c05..91b71e8 100644
--- a/net/nfc/hci/llc.c
+++ b/net/nfc/hci/llc.c
@@ -85,19 +85,6 @@ static struct nfc_llc_engine *nfc_llc_name_to_engine(const char *name)
 	return NULL;
 }
 
-void nfc_llc_unregister(const char *name)
-{
-	struct nfc_llc_engine *llc_engine;
-
-	llc_engine = nfc_llc_name_to_engine(name);
-	if (llc_engine == NULL)
-		return;
-
-	list_del(&llc_engine->entry);
-	kfree(llc_engine->name);
-	kfree(llc_engine);
-}
-
 struct nfc_llc *nfc_llc_allocate(const char *name, struct nfc_hci_dev *hdev,
 				 xmit_to_drv_t xmit_to_drv,
 				 rcv_to_hci_t rcv_to_hci, int tx_headroom,
diff --git a/net/nfc/hci/llc.h b/net/nfc/hci/llc.h
index 5dad4c5..f8ac1dc 100644
--- a/net/nfc/hci/llc.h
+++ b/net/nfc/hci/llc.h
@@ -51,7 +51,6 @@ struct nfc_llc {
 void *nfc_llc_get_data(struct nfc_llc *llc);
 
 int nfc_llc_register(const char *name, struct nfc_llc_ops *ops);
-void nfc_llc_unregister(const char *name);
 
 int nfc_llc_nop_register(void);
 
-- 
1.7.10.4

^ permalink raw reply related
