Netdev List

* Re: pull request: wireless-next 2014-10-03
From: John W. Linville @ 2014-10-07 14:18 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel, larry.finger
In-Reply-To: <20141007.004724.187788111776188480.davem@davemloft.net>

On Tue, Oct 07, 2014 at 12:47:24AM -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Sun, 05 Oct 2014 21:38:53 -0400 (EDT)
> 
> > From: David Miller <davem@davemloft.net>
> > Date: Sun, 05 Oct 2014 21:35:11 -0400 (EDT)
> > 
> >> From: "John W. Linville" <linville@tuxdriver.com>
> >> Date: Fri, 3 Oct 2014 14:01:52 -0400
> >> 
> >>> Please pull this batch of updates intended for the 3.18 stream!
> >> 
> >> Pulled, thanks for the stellar pull request text, as always.
> > 
> > John, what's the deal with the following?  Will it be resolved by the
> > driver being removed from the staging tree?
> > 
> > WARNING: drivers/staging/rtl8192ee/r8192ee: 'rtl_evm_dbm_jaguar' exported twice. Previous export was in drivers/net/wireless/rtlwifi/rtlwifi.ko
> 
> John, ping?

Sorry, Dave -- sendmail died on my local machine and I didn't notice
until now... :-(

Anyway...yes, the staging driver should be removed by a patch in
Greg's queue.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar Waßmann @ 2014-10-07 14:17 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, David S. Miller, Russell King, Frank Li,
	Fabio Estevam, linux-kernel@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D174C6025@AcuExch.aculab.com>

Hi,

David Laight wrote:
> From: Lothar Waßmann
> > commit 1b7bde6d659d ("net: fec: implement rx_copybreak to improve rx performance")
> > introduced a regression for i.MX28. The swap_buffer() function doing
> > the endian conversion of the received data on i.MX28 may access memory
> > beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
> > does not copy those bytes, so that the last bytes of a packet may be
> > filled with invalid data after swapping.
> > This will likely lead to checksum errors on received packets.
> > E.g. when trying to mount an NFS rootfs:
> > UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
> > 
> > Do the byte swapping and copying to the new skb in one go if
> > necessary.
> 
> ISTM that if you need to do the 'swap' you should copy the data regardless
> of the length.
> 
The swap function accesses at most 3 bytes beyond the actual packet
length. That is what the original swap_buffer() function does, and it is
also what the new swap_buffer2() function does when it performs the
endian swap while copying to the new buffer.
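
The over-read described above can be illustrated with a small user-space sketch. It is not the driver code: WORDS_FOR(), swap32(), bytes_touched() and copy_swapped() are hypothetical names, and the sketch only assumes that the swap works on whole 32-bit words, as the DIV_ROUND_UP(len, 4) loop in the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's DIV_ROUND_UP(len, 4). */
#define WORDS_FOR(len) (((len) + 3) / 4)

/* Reverse the four bytes of one 32-bit word. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Bytes a word-wise swap touches for a packet of len bytes. */
static size_t bytes_touched(size_t len)
{
	return WORDS_FOR(len) * 4;
}

/*
 * Copy and swap in one pass. Both buffers must be valid for
 * bytes_touched(len) bytes, which can exceed len by up to 3.
 */
static void copy_swapped(void *dst, const void *src, size_t len)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < WORDS_FOR(len); i++) {
		uint32_t w;

		memcpy(&w, (const uint8_t *)src + 4 * i, 4);
		w = swap32(w);
		memcpy((uint8_t *)dst + 4 * i, &w, 4);
	}
}
```

For a 5-byte packet the loop covers 8 bytes, and the byte that belongs at offset 4 after swapping comes from source offset 7. A plain memcpy() of only 5 bytes followed by an in-place swap would therefore pull in a byte that was never copied.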


Lothar Waßmann
-- 
___________________________________________________________

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | info@karo-electronics.de
___________________________________________________________


* Re: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Fabio Estevam @ 2014-10-07 14:02 UTC (permalink / raw)
  To: Lothar Waßmann
  Cc: netdev@vger.kernel.org, David S. Miller, Russell King, Frank Li,
	Fabio Estevam, linux-kernel
In-Reply-To: <1412687977-11742-1-git-send-email-LW@KARO-electronics.de>

Hi Lothar,

On Tue, Oct 7, 2014 at 10:19 AM, Lothar Waßmann <LW@karo-electronics.de> wrote:
> commit 1b7bde6d659d ("net: fec: implement rx_copybreak to improve rx performance")
> introduced a regression for i.MX28. The swap_buffer() function doing
> the endian conversion of the received data on i.MX28 may access memory
> beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
> does not copy those bytes, so that the last bytes of a packet may be
> filled with invalid data after swapping.
> This will likely lead to checksum errors on received packets.
> E.g. when trying to mount an NFS rootfs:
> UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
>
> Do the byte swapping and copying to the new skb in one go if
> necessary.
>
> Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>

Last night I was using linux-next on an mx28evk and could not boot from NFS.

I haven't had a chance to debug it, but I am glad you found a fix.

I won't have access to my mx28evk until Thursday, so I can't test it before then.

Thanks


* [PATCH RESEND v3] 3c59x: fix bad split of cpu_to_le32(pci_map_single())
From: Sylvain 'ythier' Hitier @ 2014-10-07 13:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: nhorman, mroos, David Miller, netdev, Steffen Klassert

From: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com>

In commit 6f2b6a3005b2c34c39f207a87667564f64f2f91a,
  # 3c59x: Add dma error checking and recovery
the intent is to split out the mapping from the byte-swapping in order to
insert a dma_mapping_error() check.

Kinda this semantic patch:

    // See http://coccinelle.lip6.fr/
    //
    // Beware, grouik-and-dirty!
    @@
    expression DEV, X, Y, Z;
    @@
    -   cpu_to_le32(pci_map_single(DEV, X, Y, Z))
    +   dma_addr_t addr = pci_map_single(DEV, X, Y, Z);
    +   if (dma_mapping_error(&DEV->dev, addr))
    +       /* snip */;
    +   cpu_to_le32(addr)

However, the #else part (of the #if DO_ZEROCOPY test) is changed this way:

    -   cpu_to_le32(pci_map_single(DEV, X, Y, Z))
    +   dma_addr_t addr = cpu_to_le32(pci_map_single(DEV, X, Y, Z));
    //                    ^^^^^^^^^^^
    //                    That mismatches the 3 other changes!
    +   if (dma_mapping_error(&DEV->dev, addr))
    +       /* snip */;
    +   cpu_to_le32(addr)

Let's remove the leftover cpu_to_le32() for coherency.
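
The effect of the leftover conversion can be sketched in user space. This is an illustration under assumptions, not driver code: be_cpu_to_le32() is a hypothetical stand-in for cpu_to_le32() as it behaves on a big-endian CPU (a byte swap), and the two helper functions model only the value that ends up in the descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for cpu_to_le32() on a big-endian CPU: a plain byte swap. */
static uint32_t be_cpu_to_le32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Models the #else branch before the fix: converted twice. */
static uint32_t desc_addr_buggy(uint32_t mapped)
{
	uint32_t addr = be_cpu_to_le32(mapped);	/* leftover conversion */

	/* the mapping-error check would see the swapped value here */
	return be_cpu_to_le32(addr);
}

/* Models the #else branch after the fix: converted once. */
static uint32_t desc_addr_fixed(uint32_t mapped)
{
	uint32_t addr = mapped;

	return be_cpu_to_le32(addr);
}

/* Usage example: the two swaps cancel out in the buggy path. */
void double_swap_example(void)
{
	assert(desc_addr_buggy(0x12345678u) == 0x12345678u);
	assert(desc_addr_fixed(0x12345678u) == 0x78563412u);
}
```

On a little-endian CPU cpu_to_le32() changes nothing, so the difference between the two paths only shows up on big-endian machines.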

v2: Better changelog.
v3: Add Acked-by

Fixes: 6f2b6a3005b2c34c39f207a87667564f64f2f91a
  # 3c59x: Add dma error checking and recovery
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com>
---

[Resent with maintainer and mailing-list Cc-ed]

 drivers/net/ethernet/3com/3c59x.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 8ca49f04..0a3108b3 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -2214,7 +2214,7 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 #else
-	dma_addr = cpu_to_le32(pci_map_single(VORTEX_PCI(vp), skb->data, skb->len, PCI_DMA_TODEVICE));
+	dma_addr = pci_map_single(VORTEX_PCI(vp), skb->data, skb->len, PCI_DMA_TODEVICE);
 	if (dma_mapping_error(&VORTEX_PCI(vp)->dev, dma_addr))
 		goto out_dma_err;
 	vp->tx_ring[entry].addr = cpu_to_le32(dma_addr);
-- 
1.7.10.4

Regards,
Sylvain "ythier" Hitier

-- 
Business is about being busy, not being rich...
Lived 777 days in a Debian package => http://en.wikipedia.org/wiki/Apt,_Vaucluse
There's THE room for ideals in this mechanical place!


* RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: David Laight @ 2014-10-07 13:31 UTC (permalink / raw)
  To: 'Lothar Waßmann', netdev@vger.kernel.org
  Cc: David S. Miller, Russell King, Frank Li, Fabio Estevam,
	linux-kernel@vger.kernel.org
In-Reply-To: <1412687977-11742-1-git-send-email-LW@KARO-electronics.de>

From: Lothar Waßmann
> commit 1b7bde6d659d ("net: fec: implement rx_copybreak to improve rx performance")
> introduced a regression for i.MX28. The swap_buffer() function doing
> the endian conversion of the received data on i.MX28 may access memory
> beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
> does not copy those bytes, so that the last bytes of a packet may be
> filled with invalid data after swapping.
> This will likely lead to checksum errors on received packets.
> E.g. when trying to mount an NFS rootfs:
> UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
> 
> Do the byte swapping and copying to the new skb in one go if
> necessary.

ISTM that if you need to do the 'swap' you should copy the data regardless
of the length.

	David

> 
> Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
> ---
>  drivers/net/ethernet/freescale/fec_main.c |   25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 87975b5..eaaebad 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len)
>  	return bufaddr;
>  }
> 
> +static void *swap_buffer2(void *dst_buf, void *src_buf, int len)
> +{
> +	int i;
> +	unsigned int *src = src_buf;
> +	unsigned int *dst = dst_buf;
> +
> +	for (i = 0; i < DIV_ROUND_UP(len, 4); i++, src++, dst++)
> +		*dst = cpu_to_be32(*src);
> +
> +	return dst_buf;
> +}
> +
>  static void fec_dump(struct net_device *ndev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(ndev);
> @@ -1348,7 +1360,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff
>  }
> 
>  static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
> -			       struct bufdesc *bdp, u32 length)
> +			       struct bufdesc *bdp, u32 length, int swap)
>  {
>  	struct  fec_enet_private *fep = netdev_priv(ndev);
>  	struct sk_buff *new_skb;
> @@ -1363,7 +1375,10 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
>  	dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
>  				FEC_ENET_RX_FRSIZE - fep->rx_align,
>  				DMA_FROM_DEVICE);
> -	memcpy(new_skb->data, (*skb)->data, length);
> +	if (!swap)
> +		memcpy(new_skb->data, (*skb)->data, length);
> +	else
> +		swap_buffer2(new_skb->data, (*skb)->data, length);
>  	*skb = new_skb;
> 
>  	return true;
> @@ -1393,6 +1408,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  	u16	vlan_tag;
>  	int	index = 0;
>  	bool	is_copybreak;
> +	bool need_swap = id_entry->driver_data & FEC_QUIRK_SWAP_FRAME;
> 
>  #ifdef CONFIG_M532x
>  	flush_cache_all();
> @@ -1456,7 +1472,8 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  		 * include that when passing upstream as it messes up
>  		 * bridging applications.
>  		 */
> -		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4);
> +		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4,
> +						  need_swap);
>  		if (!is_copybreak) {
>  			skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE);
>  			if (unlikely(!skb_new)) {
> @@ -1471,7 +1488,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  		prefetch(skb->data - NET_IP_ALIGN);
>  		skb_put(skb, pkt_len - 4);
>  		data = skb->data;
> -		if (id_entry->driver_data & FEC_QUIRK_SWAP_FRAME)
> +		if (!is_copybreak && need_swap)
>  			swap_buffer(data, pkt_len);
> 
>  		/* Extract the enhanced buffer descriptor */
> --
> 1.7.10.4
> 


* [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar Waßmann @ 2014-10-07 13:19 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Russell King, Frank Li, Fabio Estevam,
	linux-kernel, Lothar Waßmann

commit 1b7bde6d659d ("net: fec: implement rx_copybreak to improve rx performance")
introduced a regression for i.MX28. The swap_buffer() function doing
the endian conversion of the received data on i.MX28 may access memory
beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
does not copy those bytes, so that the last bytes of a packet may be
filled with invalid data after swapping.
This will likely lead to checksum errors on received packets.
E.g. when trying to mount an NFS rootfs:
UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36

Do the byte swapping and copying to the new skb in one go if
necessary.

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
---
 drivers/net/ethernet/freescale/fec_main.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 87975b5..eaaebad 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len)
 	return bufaddr;
 }
 
+static void *swap_buffer2(void *dst_buf, void *src_buf, int len)
+{
+	int i;
+	unsigned int *src = src_buf;
+	unsigned int *dst = dst_buf;
+
+	for (i = 0; i < DIV_ROUND_UP(len, 4); i++, src++, dst++)
+		*dst = cpu_to_be32(*src);
+
+	return dst_buf;
+}
+
 static void fec_dump(struct net_device *ndev)
 {
 	struct fec_enet_private *fep = netdev_priv(ndev);
@@ -1348,7 +1360,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff
 }
 
 static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
-			       struct bufdesc *bdp, u32 length)
+			       struct bufdesc *bdp, u32 length, int swap)
 {
 	struct  fec_enet_private *fep = netdev_priv(ndev);
 	struct sk_buff *new_skb;
@@ -1363,7 +1375,10 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
 	dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
 				FEC_ENET_RX_FRSIZE - fep->rx_align,
 				DMA_FROM_DEVICE);
-	memcpy(new_skb->data, (*skb)->data, length);
+	if (!swap)
+		memcpy(new_skb->data, (*skb)->data, length);
+	else
+		swap_buffer2(new_skb->data, (*skb)->data, length);
 	*skb = new_skb;
 
 	return true;
@@ -1393,6 +1408,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 	u16	vlan_tag;
 	int	index = 0;
 	bool	is_copybreak;
+	bool need_swap = id_entry->driver_data & FEC_QUIRK_SWAP_FRAME;
 
 #ifdef CONFIG_M532x
 	flush_cache_all();
@@ -1456,7 +1472,8 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 		 * include that when passing upstream as it messes up
 		 * bridging applications.
 		 */
-		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4);
+		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4,
+						  need_swap);
 		if (!is_copybreak) {
 			skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE);
 			if (unlikely(!skb_new)) {
@@ -1471,7 +1488,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 		prefetch(skb->data - NET_IP_ALIGN);
 		skb_put(skb, pkt_len - 4);
 		data = skb->data;
-		if (id_entry->driver_data & FEC_QUIRK_SWAP_FRAME)
+		if (!is_copybreak && need_swap)
 			swap_buffer(data, pkt_len);
 
 		/* Extract the enhanced buffer descriptor */
-- 
1.7.10.4


* Re: [Xen-devel] [PATCHv1] xen-netfront: always keep the Rx ring full of requests
From: annie li @ 2014-10-07 13:12 UTC (permalink / raw)
  To: David Vrabel; +Cc: David Miller, netdev, boris.ostrovsky, xen-devel
In-Reply-To: <5433B5C8.9060309@citrix.com>


On 2014/10/7 5:43, David Vrabel wrote:
> On 06/10/14 22:07, David Miller wrote:
>> From: annie li <annie.li@oracle.com>
>> Date: Mon, 06 Oct 2014 14:41:48 -0400
>>
>>> On 2014/10/6 12:00, David Vrabel wrote:
>>>>>>     +    queue->rx.req_prod_pvt = req_prod;
>>>>>> +
>>>>>> +    /* Not enough requests? Try again later. */
>>>>>> +    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
>>>>>> +        mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
>>>>>> +        return;
>>>>> If the previous for loop breaks because of failure of
>>>>> xennet_alloc_one_rx_buffer, then notify_remote_via_irq is missed here
>>>>> if
>>>>> the code returns directly.
>>>> This is deliberate -- there's no point notifying the backend if there
>>>> aren't enough requests for the next packet.  Since we don't know what
>>>> the next packet might be we assume it's the largest possible.
>>> That makes sense.
>>> However, the largest packet case does not happen so
>>> frequently. Moreover, netback checks the slots every incoming skb
>>> requires in xenvif_rx_ring_slots_available, not only concerning the
>>> largest case.
> An upcoming change to netback will cause it to wait for enough slots for
> the largest possible packet.

Netback knows the exact number of slots an incoming skb will consume. Is
there any reason to make it wait for enough slots for the largest
possible packet?

Thanks
Annie
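
The refill check quoted in this thread can be sketched as follows. This is a minimal model, assuming the ring indices are free-running unsigned 32-bit counters; RX_SLOTS_MIN_ASSUMED is a hypothetical value, since the real NET_RX_SLOTS_MIN is defined by the driver, and both function names are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold; the driver defines the real NET_RX_SLOTS_MIN. */
#define RX_SLOTS_MIN_ASSUMED 18u

/*
 * With free-running counters, the number of outstanding requests is
 * the unsigned difference, which stays correct across wrap-around.
 */
static uint32_t rx_requests_outstanding(uint32_t req_prod, uint32_t rsp_cons)
{
	return req_prod - rsp_cons;
}

/* True when the frontend would arm the refill timer instead of notifying. */
static bool rx_should_defer_notify(uint32_t req_prod, uint32_t rsp_cons)
{
	return rx_requests_outstanding(req_prod, rsp_cons) <
	       RX_SLOTS_MIN_ASSUMED;
}

/* Usage example, including a producer index that has wrapped. */
void rx_refill_example(void)
{
	assert(rx_requests_outstanding(5u, 0xfffffffbu) == 10u);
	assert(rx_should_defer_notify(5u, 0xfffffffbu));
	assert(!rx_should_defer_notify(100u, 50u));
}
```

With a fixed threshold sized for the largest packet, the frontend stays silent even when the posted requests would cover a small packet, which is the trade-off being questioned above.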


* [PATCH 2/2] net: fs_enet: Add NAPI TX
From: Christophe Leroy @ 2014-10-07 13:05 UTC (permalink / raw)
  To: Pantelis Antoniou, Vitaly Bordug; +Cc: linux-kernel, linuxppc-dev, netdev

When using an MPC8xx as a router, 'perf' shows significant time spent in
fs_enet_interrupt() and fs_enet_start_xmit().
'perf annotate' shows that the time attributed to fs_enet_start_xmit() is in fact
spent between spin_unlock_irqrestore() and the following instruction, hence in
interrupt handling. This is due to the TX complete interrupt, which fires after
each transmitted packet.
This patch modifies the handling of TX complete to use NAPI.
With this patch, the throughput of my NAT router improves by 21%.
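
The interrupt-mitigation idea can be modelled in user space. The sketch below is only an illustration: struct tx_model and its functions are hypothetical and stand in for the hardware, the interrupt handler and the poll routine. It mirrors the shape of the change (mask the TX interrupt, reclaim in a poll, unmask when a poll finds nothing), not the driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of TX completion handling; not the driver's structs. */
struct tx_model {
	int completed;		/* finished by hardware, awaiting reclaim */
	int reclaimed;		/* buffers freed by the poll routine */
	bool irq_enabled;	/* TX-complete interrupt unmasked? */
	bool poll_scheduled;	/* poll routine pending? */
	int irqs_taken;		/* interrupts actually serviced */
};

/* Interrupt handler: mask further TX interrupts and schedule the poll. */
static void tx_irq(struct tx_model *m)
{
	m->irqs_taken++;
	m->irq_enabled = false;
	m->poll_scheduled = true;
}

/* Hardware finishes one packet; it interrupts only while unmasked. */
static void hw_tx_complete(struct tx_model *m)
{
	m->completed++;
	if (m->irq_enabled)
		tx_irq(m);
}

/*
 * Poll routine: reclaim everything that has completed. If nothing was
 * found, stop polling and unmask the interrupt; otherwise report the
 * full budget so that the poll runs again.
 */
static int tx_poll(struct tx_model *m, int budget)
{
	int work = 0;

	assert(budget > 0);
	while (m->completed > 0) {
		m->completed--;
		m->reclaimed++;
		work++;
	}
	if (work == 0) {
		m->poll_scheduled = false;
		m->irq_enabled = true;
		return 0;
	}
	return budget;
}
```

In this model, five back-to-back completions cost one interrupt instead of five, because the four that arrive while the interrupt is masked are picked up by the next poll.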

Original performance:

[root@localhost tmp]# scp toto pgs:/tmp 
toto                                          100%  256MB   2.8MB/s   01:31    

Performance with the patch:

[root@localhost tmp]# scp toto pgs:/tmp
toto                                          100%  256MB   3.4MB/s   01:16    

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

---
 .../net/ethernet/freescale/fs_enet/fs_enet-main.c  | 47 +++++++++++++++++-----
 drivers/net/ethernet/freescale/fs_enet/fs_enet.h   |  9 ++++-
 drivers/net/ethernet/freescale/fs_enet/mac-fcc.c   | 29 +++++++++++++
 drivers/net/ethernet/freescale/fs_enet/mac-fec.c   | 29 +++++++++++++
 drivers/net/ethernet/freescale/fs_enet/mac-scc.c   | 29 +++++++++++++
 5 files changed, 132 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 71a25b4..c92c3b7 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -215,17 +215,23 @@ static int fs_enet_rx_napi(struct napi_struct *napi, int budget)
 	return received;
 }
 
-static void fs_enet_tx(struct net_device *dev)
+static int fs_enet_tx_napi(struct napi_struct *napi, int budget)
 {
-	struct fs_enet_private *fep = netdev_priv(dev);
+	struct fs_enet_private *fep = container_of(napi, struct fs_enet_private,
+						   napi_tx);
+	struct net_device *dev = fep->ndev;
 	cbd_t __iomem *bdp;
 	struct sk_buff *skb;
 	int dirtyidx, do_wake, do_restart;
 	u16 sc;
+	int has_tx_work = 0;
 
 	spin_lock(&fep->tx_lock);
 	bdp = fep->dirty_tx;
 
+	/* clear TX status bits for napi*/
+	(*fep->ops->napi_clear_tx_event)(dev);
+
 	do_wake = do_restart = 0;
 	while (((sc = CBDR_SC(bdp)) & BD_ENET_TX_READY) == 0) {
 		dirtyidx = bdp - fep->tx_bd_base;
@@ -278,7 +284,7 @@ static void fs_enet_tx(struct net_device *dev)
 		/*
 		 * Free the sk buffer associated with this last transmit.
 		 */
-		dev_kfree_skb_irq(skb);
+		dev_kfree_skb(skb);
 		fep->tx_skbuff[dirtyidx] = NULL;
 
 		/*
@@ -295,6 +301,7 @@ static void fs_enet_tx(struct net_device *dev)
 		 */
 		if (!fep->tx_free++)
 			do_wake = 1;
+		has_tx_work = 1;
 	}
 
 	fep->dirty_tx = bdp;
@@ -302,10 +309,19 @@ static void fs_enet_tx(struct net_device *dev)
 	if (do_restart)
 		(*fep->ops->tx_restart)(dev);
 
+	if (!has_tx_work) {
+		napi_complete(napi);
+		(*fep->ops->napi_enable_tx)(dev);
+	}
+
 	spin_unlock(&fep->tx_lock);
 
 	if (do_wake)
 		netif_wake_queue(dev);
+
+	if (has_tx_work)
+		return budget;
+	return 0;
 }
 
 /*
@@ -350,8 +366,17 @@ fs_enet_interrupt(int irq, void *dev_id)
 				__napi_schedule(&fep->napi);
 		}
 
-		if (int_events & fep->ev_tx)
-			fs_enet_tx(dev);
+		if (int_events & fep->ev_tx) {
+			napi_ok = napi_schedule_prep(&fep->napi_tx);
+
+			(*fep->ops->napi_disable_tx)(dev);
+			(*fep->ops->clear_int_events)(dev, fep->ev_napi_tx);
+
+			/* NOTE: it is possible for FCCs in NAPI mode    */
+			/* to submit a spurious interrupt while in poll  */
+			if (napi_ok)
+				__napi_schedule(&fep->napi_tx);
+		}
 	}
 
 	handled = nr > 0;
@@ -484,7 +509,6 @@ static int fs_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	cbd_t __iomem *bdp;
 	int curidx;
 	u16 sc;
-	unsigned long flags;
 
 #ifdef CONFIG_FS_ENET_MPC5121_FEC
 	if (((unsigned long)skb->data) & 0x3) {
@@ -499,7 +523,7 @@ static int fs_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 #endif
-	spin_lock_irqsave(&fep->tx_lock, flags);
+	spin_lock(&fep->tx_lock);
 
 	/*
 	 * Fill in a Tx ring entry
@@ -508,7 +532,7 @@ static int fs_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (!fep->tx_free || (CBDR_SC(bdp) & BD_ENET_TX_READY)) {
 		netif_stop_queue(dev);
-		spin_unlock_irqrestore(&fep->tx_lock, flags);
+		spin_unlock(&fep->tx_lock);
 
 		/*
 		 * Ooops.  All transmit buffers are full.  Bail out.
@@ -564,7 +588,7 @@ static int fs_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	(*fep->ops->tx_kickstart)(dev);
 
-	spin_unlock_irqrestore(&fep->tx_lock, flags);
+	spin_unlock(&fep->tx_lock);
 
 	return NETDEV_TX_OK;
 }
@@ -685,6 +709,7 @@ static int fs_enet_open(struct net_device *dev)
 	fs_init_bds(fep->ndev);
 
 	napi_enable(&fep->napi);
+	napi_enable(&fep->napi_tx);
 
 	/* Install our interrupt handler. */
 	r = request_irq(fep->interrupt, fs_enet_interrupt, IRQF_SHARED,
@@ -692,6 +717,7 @@ static int fs_enet_open(struct net_device *dev)
 	if (r != 0) {
 		dev_err(fep->dev, "Could not allocate FS_ENET IRQ!");
 		napi_disable(&fep->napi);
+		napi_disable(&fep->napi_tx);
 		return -EINVAL;
 	}
 
@@ -699,6 +725,7 @@ static int fs_enet_open(struct net_device *dev)
 	if (err) {
 		free_irq(fep->interrupt, dev);
 		napi_disable(&fep->napi);
+		napi_disable(&fep->napi_tx);
 		return err;
 	}
 	phy_start(fep->phydev);
@@ -716,6 +743,7 @@ static int fs_enet_close(struct net_device *dev)
 	netif_stop_queue(dev);
 	netif_carrier_off(dev);
 	napi_disable(&fep->napi);
+	napi_disable(&fep->napi_tx);
 	phy_stop(fep->phydev);
 
 	spin_lock_irqsave(&fep->lock, flags);
@@ -971,6 +999,7 @@ static int fs_enet_probe(struct platform_device *ofdev)
 	ndev->netdev_ops = &fs_enet_netdev_ops;
 	ndev->watchdog_timeo = 2 * HZ;
 	netif_napi_add(ndev, &fep->napi, fs_enet_rx_napi, fpi->napi_weight);
+	netif_napi_add(ndev, &fep->napi_tx, fs_enet_tx_napi, 2);
 
 	ndev->ethtool_ops = &fs_ethtool_ops;
 
diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet.h b/drivers/net/ethernet/freescale/fs_enet/fs_enet.h
index 1ece4b1..3a4b49e 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet.h
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet.h
@@ -84,6 +84,9 @@ struct fs_ops {
 	void (*napi_clear_rx_event)(struct net_device *dev);
 	void (*napi_enable_rx)(struct net_device *dev);
 	void (*napi_disable_rx)(struct net_device *dev);
+	void (*napi_clear_tx_event)(struct net_device *dev);
+	void (*napi_enable_tx)(struct net_device *dev);
+	void (*napi_disable_tx)(struct net_device *dev);
 	void (*rx_bd_done)(struct net_device *dev);
 	void (*tx_kickstart)(struct net_device *dev);
 	u32 (*get_int_events)(struct net_device *dev);
@@ -119,6 +122,7 @@ struct phy_info {
 
 struct fs_enet_private {
 	struct napi_struct napi;
+	struct napi_struct napi_tx;
 	struct device *dev;	/* pointer back to the device (must be initialized first) */
 	struct net_device *ndev;
 	spinlock_t lock;	/* during all ops except TX pckt processing */
@@ -149,6 +153,7 @@ struct fs_enet_private {
 
 	/* event masks */
 	u32 ev_napi_rx;		/* mask of NAPI rx events */
+	u32 ev_napi_tx;		/* mask of NAPI tx events */
 	u32 ev_rx;		/* rx event mask          */
 	u32 ev_tx;		/* tx event mask          */
 	u32 ev_err;		/* error event mask       */
@@ -191,8 +196,8 @@ void fs_cleanup_bds(struct net_device *dev);
 
 #define DRV_MODULE_NAME		"fs_enet"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.0"
-#define DRV_MODULE_RELDATE	"Aug 8, 2005"
+#define DRV_MODULE_VERSION	"1.1"
+#define DRV_MODULE_RELDATE	"Sep 22, 2014"
 
 /***************************************************************************/
 
diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
index f5383ab..2c578db 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
@@ -125,6 +125,7 @@ out:
 }
 
 #define FCC_NAPI_RX_EVENT_MSK	(FCC_ENET_RXF | FCC_ENET_RXB)
+#define FCC_NAPI_TX_EVENT_MSK	(FCC_ENET_TXF | FCC_ENET_TXB)
 #define FCC_RX_EVENT		(FCC_ENET_RXF)
 #define FCC_TX_EVENT		(FCC_ENET_TXB)
 #define FCC_ERR_EVENT_MSK	(FCC_ENET_TXE)
@@ -137,6 +138,7 @@ static int setup_data(struct net_device *dev)
 		return -EINVAL;
 
 	fep->ev_napi_rx = FCC_NAPI_RX_EVENT_MSK;
+	fep->ev_napi_tx = FCC_NAPI_TX_EVENT_MSK;
 	fep->ev_rx = FCC_RX_EVENT;
 	fep->ev_tx = FCC_TX_EVENT;
 	fep->ev_err = FCC_ERR_EVENT_MSK;
@@ -446,6 +448,30 @@ static void napi_disable_rx(struct net_device *dev)
 	C16(fccp, fcc_fccm, FCC_NAPI_RX_EVENT_MSK);
 }
 
+static void napi_clear_tx_event(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	fcc_t __iomem *fccp = fep->fcc.fccp;
+
+	W16(fccp, fcc_fcce, FCC_NAPI_TX_EVENT_MSK);
+}
+
+static void napi_enable_tx(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	fcc_t __iomem *fccp = fep->fcc.fccp;
+
+	S16(fccp, fcc_fccm, FCC_NAPI_TX_EVENT_MSK);
+}
+
+static void napi_disable_tx(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	fcc_t __iomem *fccp = fep->fcc.fccp;
+
+	C16(fccp, fcc_fccm, FCC_NAPI_TX_EVENT_MSK);
+}
+
 static void rx_bd_done(struct net_device *dev)
 {
 	/* nothing */
@@ -572,6 +598,9 @@ const struct fs_ops fs_fcc_ops = {
 	.napi_clear_rx_event	= napi_clear_rx_event,
 	.napi_enable_rx		= napi_enable_rx,
 	.napi_disable_rx	= napi_disable_rx,
+	.napi_clear_tx_event	= napi_clear_tx_event,
+	.napi_enable_tx		= napi_enable_tx,
+	.napi_disable_tx	= napi_disable_tx,
 	.rx_bd_done		= rx_bd_done,
 	.tx_kickstart		= tx_kickstart,
 	.get_int_events		= get_int_events,
diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
index 1eedfba2..3d4e08b 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
@@ -110,6 +110,7 @@ static int do_pd_setup(struct fs_enet_private *fep)
 }
 
 #define FEC_NAPI_RX_EVENT_MSK	(FEC_ENET_RXF | FEC_ENET_RXB)
+#define FEC_NAPI_TX_EVENT_MSK	(FEC_ENET_TXF | FEC_ENET_TXB)
 #define FEC_RX_EVENT		(FEC_ENET_RXF)
 #define FEC_TX_EVENT		(FEC_ENET_TXF)
 #define FEC_ERR_EVENT_MSK	(FEC_ENET_HBERR | FEC_ENET_BABR | \
@@ -126,6 +127,7 @@ static int setup_data(struct net_device *dev)
 	fep->fec.htlo = 0;
 
 	fep->ev_napi_rx = FEC_NAPI_RX_EVENT_MSK;
+	fep->ev_napi_tx = FEC_NAPI_TX_EVENT_MSK;
 	fep->ev_rx = FEC_RX_EVENT;
 	fep->ev_tx = FEC_TX_EVENT;
 	fep->ev_err = FEC_ERR_EVENT_MSK;
@@ -415,6 +417,30 @@ static void napi_disable_rx(struct net_device *dev)
 	FC(fecp, imask, FEC_NAPI_RX_EVENT_MSK);
 }
 
+static void napi_clear_tx_event(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	struct fec __iomem *fecp = fep->fec.fecp;
+
+	FW(fecp, ievent, FEC_NAPI_TX_EVENT_MSK);
+}
+
+static void napi_enable_tx(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	struct fec __iomem *fecp = fep->fec.fecp;
+
+	FS(fecp, imask, FEC_NAPI_TX_EVENT_MSK);
+}
+
+static void napi_disable_tx(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	struct fec __iomem *fecp = fep->fec.fecp;
+
+	FC(fecp, imask, FEC_NAPI_TX_EVENT_MSK);
+}
+
 static void rx_bd_done(struct net_device *dev)
 {
 	struct fs_enet_private *fep = netdev_priv(dev);
@@ -487,6 +513,9 @@ const struct fs_ops fs_fec_ops = {
 	.napi_clear_rx_event	= napi_clear_rx_event,
 	.napi_enable_rx		= napi_enable_rx,
 	.napi_disable_rx	= napi_disable_rx,
+	.napi_clear_tx_event	= napi_clear_tx_event,
+	.napi_enable_tx		= napi_enable_tx,
+	.napi_disable_tx	= napi_disable_tx,
 	.rx_bd_done		= rx_bd_done,
 	.tx_kickstart		= tx_kickstart,
 	.get_int_events		= get_int_events,
diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-scc.c b/drivers/net/ethernet/freescale/fs_enet/mac-scc.c
index 90b3b19..41aa0b4 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mac-scc.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mac-scc.c
@@ -116,6 +116,7 @@ static int do_pd_setup(struct fs_enet_private *fep)
 }
 
 #define SCC_NAPI_RX_EVENT_MSK	(SCCE_ENET_RXF | SCCE_ENET_RXB)
+#define SCC_NAPI_TX_EVENT_MSK	(SCCE_ENET_TXF | SCCE_ENET_TXB)
 #define SCC_RX_EVENT		(SCCE_ENET_RXF)
 #define SCC_TX_EVENT		(SCCE_ENET_TXB)
 #define SCC_ERR_EVENT_MSK	(SCCE_ENET_TXE | SCCE_ENET_BSY)
@@ -130,6 +131,7 @@ static int setup_data(struct net_device *dev)
 	fep->scc.htlo = 0;
 
 	fep->ev_napi_rx = SCC_NAPI_RX_EVENT_MSK;
+	fep->ev_napi_tx = SCC_NAPI_TX_EVENT_MSK;
 	fep->ev_rx = SCC_RX_EVENT;
 	fep->ev_tx = SCC_TX_EVENT | SCCE_ENET_TXE;
 	fep->ev_err = SCC_ERR_EVENT_MSK;
@@ -398,6 +400,30 @@ static void napi_disable_rx(struct net_device *dev)
 	C16(sccp, scc_sccm, SCC_NAPI_RX_EVENT_MSK);
 }
 
+static void napi_clear_tx_event(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	scc_t __iomem *sccp = fep->scc.sccp;
+
+	W16(sccp, scc_scce, SCC_NAPI_TX_EVENT_MSK);
+}
+
+static void napi_enable_tx(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	scc_t __iomem *sccp = fep->scc.sccp;
+
+	S16(sccp, scc_sccm, SCC_NAPI_TX_EVENT_MSK);
+}
+
+static void napi_disable_tx(struct net_device *dev)
+{
+	struct fs_enet_private *fep = netdev_priv(dev);
+	scc_t __iomem *sccp = fep->scc.sccp;
+
+	C16(sccp, scc_sccm, SCC_NAPI_TX_EVENT_MSK);
+}
+
 static void rx_bd_done(struct net_device *dev)
 {
 	/* nothing */
@@ -471,6 +497,9 @@ const struct fs_ops fs_scc_ops = {
 	.napi_clear_rx_event	= napi_clear_rx_event,
 	.napi_enable_rx		= napi_enable_rx,
 	.napi_disable_rx	= napi_disable_rx,
+	.napi_clear_tx_event	= napi_clear_tx_event,
+	.napi_enable_tx		= napi_enable_tx,
+	.napi_disable_tx	= napi_disable_tx,
 	.rx_bd_done		= rx_bd_done,
 	.tx_kickstart		= tx_kickstart,
 	.get_int_events		= get_int_events,
-- 
2.1.0


* [PATCH 1/2] net: fs_enet: Remove non NAPI RX
From: Christophe Leroy @ 2014-10-07 13:04 UTC (permalink / raw)
  To: Pantelis Antoniou, Vitaly Bordug; +Cc: linux-kernel, linuxppc-dev, netdev

In the probe function, use_napi is unconditionally set to 1. This patch removes
all the code that is conditional on !use_napi, and removes use_napi itself, which
has thereby become useless.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

---
 .../net/ethernet/freescale/fs_enet/fs_enet-main.c  | 164 ++-------------------
 include/linux/fs_enet_pd.h                         |   1 -
 2 files changed, 15 insertions(+), 150 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 748fd24..71a25b4 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -215,128 +215,6 @@ static int fs_enet_rx_napi(struct napi_struct *napi, int budget)
 	return received;
 }
 
-/* non NAPI receive function */
-static int fs_enet_rx_non_napi(struct net_device *dev)
-{
-	struct fs_enet_private *fep = netdev_priv(dev);
-	const struct fs_platform_info *fpi = fep->fpi;
-	cbd_t __iomem *bdp;
-	struct sk_buff *skb, *skbn, *skbt;
-	int received = 0;
-	u16 pkt_len, sc;
-	int curidx;
-	/*
-	 * First, grab all of the stats for the incoming packet.
-	 * These get messed up if we get called due to a busy condition.
-	 */
-	bdp = fep->cur_rx;
-
-	while (((sc = CBDR_SC(bdp)) & BD_ENET_RX_EMPTY) == 0) {
-
-		curidx = bdp - fep->rx_bd_base;
-
-		/*
-		 * Since we have allocated space to hold a complete frame,
-		 * the last indicator should be set.
-		 */
-		if ((sc & BD_ENET_RX_LAST) == 0)
-			dev_warn(fep->dev, "rcv is not +last\n");
-
-		/*
-		 * Check for errors.
-		 */
-		if (sc & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_CL |
-			  BD_ENET_RX_NO | BD_ENET_RX_CR | BD_ENET_RX_OV)) {
-			fep->stats.rx_errors++;
-			/* Frame too long or too short. */
-			if (sc & (BD_ENET_RX_LG | BD_ENET_RX_SH))
-				fep->stats.rx_length_errors++;
-			/* Frame alignment */
-			if (sc & (BD_ENET_RX_NO | BD_ENET_RX_CL))
-				fep->stats.rx_frame_errors++;
-			/* CRC Error */
-			if (sc & BD_ENET_RX_CR)
-				fep->stats.rx_crc_errors++;
-			/* FIFO overrun */
-			if (sc & BD_ENET_RX_OV)
-				fep->stats.rx_crc_errors++;
-
-			skb = fep->rx_skbuff[curidx];
-
-			dma_unmap_single(fep->dev, CBDR_BUFADDR(bdp),
-				L1_CACHE_ALIGN(PKT_MAXBUF_SIZE),
-				DMA_FROM_DEVICE);
-
-			skbn = skb;
-
-		} else {
-
-			skb = fep->rx_skbuff[curidx];
-
-			dma_unmap_single(fep->dev, CBDR_BUFADDR(bdp),
-				L1_CACHE_ALIGN(PKT_MAXBUF_SIZE),
-				DMA_FROM_DEVICE);
-
-			/*
-			 * Process the incoming frame.
-			 */
-			fep->stats.rx_packets++;
-			pkt_len = CBDR_DATLEN(bdp) - 4;	/* remove CRC */
-			fep->stats.rx_bytes += pkt_len + 4;
-
-			if (pkt_len <= fpi->rx_copybreak) {
-				/* +2 to make IP header L1 cache aligned */
-				skbn = netdev_alloc_skb(dev, pkt_len + 2);
-				if (skbn != NULL) {
-					skb_reserve(skbn, 2);	/* align IP header */
-					skb_copy_from_linear_data(skb,
-						      skbn->data, pkt_len);
-					/* swap */
-					skbt = skb;
-					skb = skbn;
-					skbn = skbt;
-				}
-			} else {
-				skbn = netdev_alloc_skb(dev, ENET_RX_FRSIZE);
-
-				if (skbn)
-					skb_align(skbn, ENET_RX_ALIGN);
-			}
-
-			if (skbn != NULL) {
-				skb_put(skb, pkt_len);	/* Make room */
-				skb->protocol = eth_type_trans(skb, dev);
-				received++;
-				netif_rx(skb);
-			} else {
-				fep->stats.rx_dropped++;
-				skbn = skb;
-			}
-		}
-
-		fep->rx_skbuff[curidx] = skbn;
-		CBDW_BUFADDR(bdp, dma_map_single(fep->dev, skbn->data,
-			     L1_CACHE_ALIGN(PKT_MAXBUF_SIZE),
-			     DMA_FROM_DEVICE));
-		CBDW_DATLEN(bdp, 0);
-		CBDW_SC(bdp, (sc & ~BD_ENET_RX_STATS) | BD_ENET_RX_EMPTY);
-
-		/*
-		 * Update BD pointer to next entry.
-		 */
-		if ((sc & BD_ENET_RX_WRAP) == 0)
-			bdp++;
-		else
-			bdp = fep->rx_bd_base;
-
-		(*fep->ops->rx_bd_done)(dev);
-	}
-
-	fep->cur_rx = bdp;
-
-	return 0;
-}
-
 static void fs_enet_tx(struct net_device *dev)
 {
 	struct fs_enet_private *fep = netdev_priv(dev);
@@ -453,8 +331,7 @@ fs_enet_interrupt(int irq, void *dev_id)
 		nr++;
 
 		int_clr_events = int_events;
-		if (fpi->use_napi)
-			int_clr_events &= ~fep->ev_napi_rx;
+		int_clr_events &= ~fep->ev_napi_rx;
 
 		(*fep->ops->clear_int_events)(dev, int_clr_events);
 
@@ -462,19 +339,15 @@ fs_enet_interrupt(int irq, void *dev_id)
 			(*fep->ops->ev_error)(dev, int_events);
 
 		if (int_events & fep->ev_rx) {
-			if (!fpi->use_napi)
-				fs_enet_rx_non_napi(dev);
-			else {
-				napi_ok = napi_schedule_prep(&fep->napi);
-
-				(*fep->ops->napi_disable_rx)(dev);
-				(*fep->ops->clear_int_events)(dev, fep->ev_napi_rx);
-
-				/* NOTE: it is possible for FCCs in NAPI mode    */
-				/* to submit a spurious interrupt while in poll  */
-				if (napi_ok)
-					__napi_schedule(&fep->napi);
-			}
+			napi_ok = napi_schedule_prep(&fep->napi);
+
+			(*fep->ops->napi_disable_rx)(dev);
+			(*fep->ops->clear_int_events)(dev, fep->ev_napi_rx);
+
+			/* NOTE: it is possible for FCCs in NAPI mode    */
+			/* to submit a spurious interrupt while in poll  */
+			if (napi_ok)
+				__napi_schedule(&fep->napi);
 		}
 
 		if (int_events & fep->ev_tx)
@@ -811,24 +684,21 @@ static int fs_enet_open(struct net_device *dev)
 	/* not doing this, will cause a crash in fs_enet_rx_napi */
 	fs_init_bds(fep->ndev);
 
-	if (fep->fpi->use_napi)
-		napi_enable(&fep->napi);
+	napi_enable(&fep->napi);
 
 	/* Install our interrupt handler. */
 	r = request_irq(fep->interrupt, fs_enet_interrupt, IRQF_SHARED,
 			"fs_enet-mac", dev);
 	if (r != 0) {
 		dev_err(fep->dev, "Could not allocate FS_ENET IRQ!");
-		if (fep->fpi->use_napi)
-			napi_disable(&fep->napi);
+		napi_disable(&fep->napi);
 		return -EINVAL;
 	}
 
 	err = fs_init_phy(dev);
 	if (err) {
 		free_irq(fep->interrupt, dev);
-		if (fep->fpi->use_napi)
-			napi_disable(&fep->napi);
+		napi_disable(&fep->napi);
 		return err;
 	}
 	phy_start(fep->phydev);
@@ -845,8 +715,7 @@ static int fs_enet_close(struct net_device *dev)
 
 	netif_stop_queue(dev);
 	netif_carrier_off(dev);
-	if (fep->fpi->use_napi)
-		napi_disable(&fep->napi);
+	napi_disable(&fep->napi);
 	phy_stop(fep->phydev);
 
 	spin_lock_irqsave(&fep->lock, flags);
@@ -1022,7 +891,6 @@ static int fs_enet_probe(struct platform_device *ofdev)
 	fpi->rx_ring = 32;
 	fpi->tx_ring = 32;
 	fpi->rx_copybreak = 240;
-	fpi->use_napi = 1;
 	fpi->napi_weight = 17;
 	fpi->phy_node = of_parse_phandle(ofdev->dev.of_node, "phy-handle", 0);
 	if (!fpi->phy_node && of_phy_is_fixed_link(ofdev->dev.of_node)) {
@@ -1102,9 +970,7 @@ static int fs_enet_probe(struct platform_device *ofdev)
 
 	ndev->netdev_ops = &fs_enet_netdev_ops;
 	ndev->watchdog_timeo = 2 * HZ;
-	if (fpi->use_napi)
-		netif_napi_add(ndev, &fep->napi, fs_enet_rx_napi,
-		               fpi->napi_weight);
+	netif_napi_add(ndev, &fep->napi, fs_enet_rx_napi, fpi->napi_weight);
 
 	ndev->ethtool_ops = &fs_ethtool_ops;
 
diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h
index efb0596..77d783f 100644
--- a/include/linux/fs_enet_pd.h
+++ b/include/linux/fs_enet_pd.h
@@ -139,7 +139,6 @@ struct fs_platform_info {
 	int rx_ring, tx_ring;	/* number of buffers on rx     */
 	__u8 macaddr[ETH_ALEN];	/* mac address                 */
 	int rx_copybreak;	/* limit we copy small frames  */
-	int use_napi;		/* use NAPI                    */
 	int napi_weight;	/* NAPI weight                 */
 
 	int use_rmii;		/* use RMII mode 	       */
-- 
2.1.0

^ permalink raw reply related

* [PATCH 0/2] net: fs_enet: Remove non NAPI RX and add NAPI for TX
From: Christophe Leroy @ 2014-10-07 13:04 UTC (permalink / raw)
  To: Pantelis Antoniou, Vitaly Bordug; +Cc: linux-kernel, linuxppc-dev, netdev

When using a MPC8xx as a router, 'perf' shows a significant time spent in 
fs_enet_interrupt() and fs_enet_start_xmit().
'perf annotate' shows that the time spent in fs_enet_start_xmit is indeed spent
between spin_unlock_irqrestore() and the following instruction, hence in
interrupt handling. This is due to the TX complete interrupt that fires after
each transmitted packet.
This patchset first removes all non-NAPI handling, as NAPI has become the only
mode for RX, then adds NAPI for handling TX complete.
This improves NAT TCP throughput by 21% on MPC885 with FEC.

Tested on MPC885 with FEC.

[PATCH 1/2] net: fs_enet: Remove non NAPI RX
[PATCH 2/2] net: fs_enet: Add NAPI TX

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

---
 .../net/ethernet/freescale/fs_enet/fs_enet-main.c  | 211 ++++++---------------
 .../net/ethernet/freescale/fs_enet/fs_enet.h       |   9 +-
 .../net/ethernet/freescale/fs_enet/mac-fcc.c       |  29 +++
 .../net/ethernet/freescale/fs_enet/mac-fec.c       |  29 +++
 .../net/ethernet/freescale/fs_enet/mac-scc.c       |  29 +++
 linux/include/linux/fs_enet_pd.h                   |   1 -
 6 files changed, 147 insertions(+), 161 deletions(-)

^ permalink raw reply

* Re: Quota in __qdisc_run() (was: qdisc: validate skb without holding lock)
From: Eric Dumazet @ 2014-10-07 12:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Miller, netdev, therbert, hannes, fw, dborkman, jhs,
	alexander.duyck, john.r.fastabend, dave.taht, toke
In-Reply-To: <20141007093441.35ce3a02@redhat.com>

On Tue, 2014-10-07 at 09:34 +0200, Jesper Dangaard Brouer wrote:
> On Fri, 03 Oct 2014 16:30:44 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Another problem we need to address is the quota in __qdisc_run()
> > is no longer meaningful, if each qdisc_restart() can pump many packets.
> 
> I fully agree. My earlier "magic" packet limit was covering/pampering
> over this issue.

Although quota was multiplied by 7 or 8 in worst case ?

> 
> > An idea would be to use the bstats (or cpu_qstats if applicable)
> 
> Please elaborate some more, as I don't completely follow (feel free to
> show with a patch ;-)).
> 

I was hoping John could finish the percpu stats before I do that.

Problem with q->bstats.packets is that TSO packets with 45 MSS add 45 to
this counter.

Using a time quota would be better, but jiffies is too coarse-grained, and
local_clock() might be too expensive.
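
To illustrate the concern about q->bstats.packets, here is a minimal
user-space sketch (not the kernel code; struct pkt and run_with_quota()
are hypothetical names). It charges the quota by segment count, the way
a counter that adds 45 for a 45-MSS TSO packet would:

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued packet; not the kernel's struct sk_buff. */
struct pkt {
	unsigned int segs;	/* 1 for a plain packet, N for a TSO packet of N MSS */
};

/*
 * Consume queue entries until the quota is used up. Each entry is
 * charged by its segment count, so a single 45-MSS TSO packet costs
 * 45 units. Returns the number of queue entries consumed.
 */
size_t run_with_quota(const struct pkt *q, size_t n, int quota)
{
	size_t i = 0;

	while (i < n && quota > 0) {
		quota -= (int)q[i].segs;
		i++;
	}
	return i;
}
```

With a quota of 64, two 45-MSS entries are enough to end the run, whereas
64 plain packets would be needed to do the same.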

^ permalink raw reply

* Re: [PATCH net-next] net/mlx4_en: remove NETDEV_TX_BUSY
From: Eric Dumazet @ 2014-10-07 12:38 UTC (permalink / raw)
  To: amirv
  Cc: David S. Miller, Eric Dumazet, netdev, Yevgeny Petrilin,
	Or Gerlitz, Ido Shamay
In-Reply-To: <5433A48E.1040505@gmail.com>

On Tue, 2014-10-07 at 11:30 +0300, Amir Vadai wrote:

> Reviewed. Also verified that it fixes the deadlock (by sending a large
> burst - larger than ring size). Before this fix, last packet of the
> burst wasn't sent, therefore no doorbell was rung, and the queue was
> stalled.
> 
> BTW, another nice optimization that we hope to send soon, is not to arm
> the CQ unless ringing the doorbell.
> 
> Acked-by: Amir Vadai <amirv@mellanox.com>

This sounds great, thanks Amir

^ permalink raw reply

* Re: randconfig build error with next-20141001, in drivers/i2c/algos/i2c-algo-bit.c
From: Stephane Grosjean @ 2014-10-07 12:37 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Randy Dunlap, Jim Davis, Stephen Rothwell, linux-next, linux-i2c,
	netdev@vger.kernel.org, linux-can
In-Reply-To: <5433AB31.9090603@hartkopp.net>

Le 07/10/2014 10:58, Oliver Hartkopp a écrit :
> On 10/06/2014 08:09 PM, Randy Dunlap wrote:
>> On 10/06/14 10:39, Oliver Hartkopp wrote:
>>> AFAICS there is 'just' a style problem as 'configs should not enable entire
>>> subsystems'. But it finally is a correct and valid Kconfig, right?
>> Yes, right.
> (..)
>
>> In the unlikely case that I2C is not enabled, the user should have to enable
>> it instead of a solitary driver enabling it.  IOW, if a subsystem is disabled,
>> the user probably wanted it that way and a single driver should not override
>> that setting.
> Due to the fact that a change to 'depends on I2C' would make the config option
> invisible (and therefore not selectable) in the case I2C was (unlikely)
> disabled I would finally vote to leave it as-is.
>
> The current Kconfig entry already contains a description that points to the
> requirement to have I2C and I2C_ALGOBIT to be enabled to compile this driver:
>
> config CAN_PEAK_PCIEC
> 	bool "PEAK PCAN-ExpressCard Cards"
> 	depends on CAN_PEAK_PCI
> 	select I2C
> 	select I2C_ALGOBIT
> 	default y
> 	---help---
> 	  Say Y here if you want to use a PCAN-ExpressCard from PEAK-System
> 	  Technik. This will also automatically select I2C and I2C_ALGO
> 	  configuration options.
>
> AFAIK the PEAK PCAN-ExpressCard is usually used in x86 architecture Laptops,
> so it's near to an academic discussion as x86 usually selects I2C ;-)
>
> @Stephane: When updating the help text to introduce the PCAN-ExpressCard 34
> support anyway you might probably add some more information *why* the I2C
> support is needed (for CAN transceiver settings and status LED).
>
> And /s/I2C_ALGO/I2C_ALGOBIT/ :-)

Ok! (FYI, I had already prepared the help text for introducing the 
PCIEC34. I will subst I2C_ALGO as well. I'll prepare the patch asap...)

Regards,

Stéphane
> Tnx & best regards,
> Oliver

--
PEAK-System Technik GmbH
Sitz der Gesellschaft Darmstadt
Handelsregister Darmstadt HRB 9183 
Geschaeftsfuehrung: Alexander Gach, Uwe Wilhelm
--

^ permalink raw reply

* Re: [net v2 3/8] net/fsl_pq_mdio: Replace spin_event_timeout() with arch independent
From: Claudiu Manoil @ 2014-10-07 12:16 UTC (permalink / raw)
  To: David Laight, netdev@vger.kernel.org
  Cc: David S. Miller, Xiubo Li, Shruti Kanetkar, Kim Phillips
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D174C5BAC@AcuExch.aculab.com>

On 10/7/2014 11:12 AM, David Laight wrote:
> From: netdev-owner@vger.kernel.
>> spin_event_timeout() is PPC dependent, use an arch independent
>> equivalent instead.
>
> I think you should write a local function/#define that expands to spin_event_timeout()
> on ppc and to the code below you are substituting on other architectures.
>
>>   	/* Wait for the transaction to finish */
>> -	status = spin_event_timeout(!(ioread32be(&regs->miimind) &
>> -				    MIIMIND_BUSY), MII_TIMEOUT, 0);
>> +	timeout = MII_TIMEOUT;
>> +	while ((ioread32be(&regs->miimind) & MIIMIND_BUSY) && timeout) {
>> +		cpu_relax();
>> +		timeout--;
>> +	}
>
> 	David
>
>
>

Hi David,

This point is debatable. Still better than adding a new local function
would be to provide an implementation of spin_event_timeout() for ARM.
But it is not the place of the ethernet driver to provide an 
implementation of spin_event_timeout() on ARM.
Instead, I opted to simplify the busy wait/ timeout mechanism
in the driver using a simple and arch independent implementation 
replacing the PPC specific spin_event_timeout().  This timeout 
implementation, open-coded as a while() loop, is commonly used by other 
drivers too and I think that for this particular driver (fsl_pq_mdio) it 
is good enough to do the job, while keeping the code simple and 
portable.  It may not be as precise as spin_event_timeout() on PPC, but 
good enough.
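
As a rough user-space sketch of such an open-coded, arch-independent
timeout (poll_until() and counter_expired() are hypothetical names, not
part of the patch; in a driver the loop body would call cpu_relax()):

```c
#include <stdbool.h>

/*
 * Poll cond(arg) until it returns true or max_spins attempts are used up.
 * Returns true if the condition was met, false on timeout.
 */
bool poll_until(bool (*cond)(void *arg), void *arg, unsigned int max_spins)
{
	while (max_spins--) {
		if (cond(arg))
			return true;
		/* a kernel driver would call cpu_relax() here */
	}
	/* Final check, so a success on the last spin is not reported as a timeout. */
	return cond(arg);
}

/* Example condition: true once the counter pointed to by arg has reached zero. */
bool counter_expired(void *arg)
{
	unsigned int *remaining = arg;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

The timeout here is a spin count, not a time, so its real duration depends
on the CPU; that is the loss of precision mentioned above.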

Thanks,
Claudiu

^ permalink raw reply

* Re: [Patch net] net_sched: copy exts->type in tcf_exts_change()
From: Jamal Hadi Salim @ 2014-10-07 11:33 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: John Fastabend
In-Reply-To: <1412641314-17335-1-git-send-email-xiyou.wangcong@gmail.com>

On 10/06/14 20:21, Cong Wang wrote:
> We need to copy exts->type when committing the change, otherwise
> it would be always 0. This is a quick fix for -net and -stable,
> for net-next tcf_exts will be removed.
>
> Fixes: commit 33be627159913b094bb578e83 ("net_sched: act: use standard struct list_head")
> Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>


cheers,
jamal

^ permalink raw reply

* Re: [iproute2 1/1] RFC: obsolete direct invocation of police
From: Jamal Hadi Salim @ 2014-10-07 11:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, xiyou.wangcong, john.r.fastabend
In-Reply-To: <20141006100132.032e75bc@urahara>

On 10/06/14 13:01, Stephen Hemminger wrote:

> I think iproute utilities needs to accept the old syntax and warn about
> deprecated syntax use. Later (like 2yr +) the code can be removed.
>
> The old syntax can be removed from all documentation and help messages
> now though.

Ok, so ignore that one patch. I will send another one.
Fair to put a date for when the obsoletion notice went out?
Example:
"As of October 10, 2014 this syntax is obsolete. Please use instead ..."
Maybe also mention when the approximate cutoff date is.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] net: Add ndo_gso_check
From: Or Gerlitz @ 2014-10-07 11:07 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Alexander Duyck, John Fastabend, Jeff Kirsher, David Miller,
	Linux Netdev List, Thomas Graf, Pravin Shelar, Andy Zhou
In-Reply-To: <CA+mtBx9HMuMnsmN0rjqV9-5iK9H6b+J8OZQsmmbFHwjm+qW7bQ@mail.gmail.com>

> yes, I will talk about FOU and GUE implementation. You should see
> abstracts in the schedule now.

Thanks, I think it would also be good to cover the challenges we're
discussing in this thread w.r.t. today's upstream NIC HW/drivers.

>> I think a replay of your LKS presentation along with open discussion
>> on how to get there with the legacy requirements could be very
>> helpful.

^ permalink raw reply

* hw csum error since 3.16
From: Klemen Mihevc @ 2014-10-07 10:54 UTC (permalink / raw)
  To: netdev

Since kernel 3.16.0 (also present in 3.17.0) I'm getting the following error
in my kern.log:

Aug 15 14:51:00 mih kernel: <unknown>: hw csum failure
Aug 15 14:51:00 mih kernel: CPU: 0 PID: 5543 Comm: named Not tainted 
3.16.1-gentoo #1
Aug 15 14:51:00 mih kernel: Hardware name: stem manufacturer System 
Product Name/P5LD2, BIOS 2002    08/28/2009
Aug 15 14:51:00 mih kernel: c16e4bba c14ab6db 00000000 f1b29d20 ffff43fa 
00000008 87d4782b f5a31200
Aug 15 14:51:00 mih kernel: f1b1dc80 f1b29f24 0000005a c15e13b1 f1b29d58 
f1b29d5c 00000000 0000005a
Aug 15 14:51:00 mih kernel: 00000000 f1b29d78 f1b1dec0 00000000 00000000 
f1e3f600 00000040 f1b29f24
Aug 15 14:51:00 mih kernel: Call Trace:
Aug 15 14:51:00 mih kernel: [<c16e4bba>] ? dump_stack+0xa/0x13
Aug 15 14:51:00 mih kernel: [<c14ab6db>] ? 
skb_copy_and_csum_datagram_iovec+0xe5/0xea
Aug 15 14:51:00 mih kernel: [<c15e13b1>] ? udpv6_recvmsg+0xd9/0x4f7
Aug 15 14:51:00 mih kernel: [<c158efde>] ? inet_recvmsg+0x3e/0x50
Aug 15 14:51:00 mih kernel: [<c149f2d3>] ? sock_recvmsg+0x60/0x80
Aug 15 14:51:00 mih kernel: [<c14aa367>] ? verify_iovec+0x35/0x9e
Aug 15 14:51:00 mih kernel: [<c149f273>] ? kernel_sendmsg+0x34/0x34
Aug 15 14:51:00 mih kernel: [<c14a030e>] ? 
___sys_recvmsg.part.33+0xe6/0x175
Aug 15 14:51:00 mih kernel: [<c149f273>] ? kernel_sendmsg+0x34/0x34
Aug 15 14:51:00 mih kernel: [<c1093cae>] ? futex_wait+0x12e/0x1e1
Aug 15 14:51:00 mih kernel: [<c1583ce5>] ? 
ip4_datagram_connect+0x1e6/0x281
Aug 15 14:51:00 mih kernel: [<c108304a>] ? __wake_up_common+0x3f/0x66
Aug 15 14:51:00 mih kernel: [<c109390c>] ? get_futex_key+0x164/0x1d6
Aug 15 14:51:00 mih kernel: [<c1093a97>] ? futex_wake+0xcc/0xf3
Aug 15 14:51:00 mih kernel: [<c10f5982>] ? __fget_light+0x19/0x49
Aug 15 14:51:00 mih kernel: [<c14a1098>] ? __sys_recvmsg+0x44/0x6c
Aug 15 14:51:00 mih kernel: [<c14a142f>] ? SyS_socketcall+0xdf/0x2e6
Aug 15 14:51:00 mih kernel: [<c108b72d>] ? __getnstimeofday+0x2c/0x109
Aug 15 14:51:00 mih kernel: [<c10676ea>] ? SyS_gettimeofday+0x26/0x5e
Aug 15 14:51:00 mih kernel: [<c16e9fd1>] ? syscall_call+0x7/0x7
Aug 15 14:51:00 mih kernel: [<c16e0000>] ? soft_store+0x3b/0x6e

I guess it is something related to my hardware/network. Everything seems to
work fine though; the error just pops up every few minutes. It's fairly
random: sometimes the gap is ~5 min, sometimes 30 sec. If I boot
an older kernel (3.15.8) the error is gone. If you need anything else I
can provide the config etc., but since this looks like some kind of trace I
thought it would be enough :)

I also filed a bug on bugzilla
(https://bugzilla.kernel.org/show_bug.cgi?id=82461)

Thanks

^ permalink raw reply

* Re: r8168 is needed to enter P-state: Package State 6 (pc6)onHaswell hardware
From: Ceriel Jacobs @ 2014-10-07 10:40 UTC (permalink / raw)
  To: Francois Romieu, Hayes Wang; +Cc: nic_swsd, netdev@vger.kernel.org
In-Reply-To: <20141006221307.GB10936@electric-eye.fr.zoreil.com>



Francois Romieu schreef op 07-10-14 om 00:13:
> Hayes Wang <hayeswang@realtek.com> :
>>   Francois Romieu [mailto:romieu@fr.zoreil.com]
> [...]
>> I'm not sure if the following information is helpful. Besides, I remember
>> the rtl_init_one() would disable it.
>>
>> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=d64ec841517a25f6d468bde9f67e5b4cffdc67c7
>>
>> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=4521e1a94279ce610d3f9b7945c17d581f804242
>
> Yes, I did not expect this stuff to stay in geostationary orbit for long :o/
>
> Realtek's r8168 driver defaults to CONFIG_ASPM=1

# modinfo r8168 suggests the opposite (ASPM is disabled by default):
version:        8.039.00-NAPI
parm:           aspm:Enable ASPM. (int)

If ASPM were enabled by default, one would expect a parameter like:
parm:		aspm:Disable ASPM. (int)

> but I guess some users
> need to disable it and there's no known pattern / blacklist, right ?

I don't want to disable ASPM. In fact, I am even running the r8168 module
with boot params like:
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.17.0-999-generic root=/dev/sda1 ro 
biosdevname=0 intel_pstate=enable ipv6.disabled=1 debug ignore_loglevel 
panic=10 pcie_aspm.policy=powersave pcie_aspm=force r8168.aspm=1 
r8168.eee_enable=1 oops=panic

>
> Ceriel, does the patch below against current kernel make a difference ?

Francois, what do you mean by "current kernel": the latest Ubuntu
mainline kernel or something different?

>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 0921302..b4a3881 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -468,6 +468,7 @@ enum rtl8168_registers {
>   #define PWM_EN				(1 << 22)
>   #define RXDV_GATED_EN			(1 << 19)
>   #define EARLY_TALLY_EN			(1 << 16)
> +#define FORCE_CLK			(1 << 15) /* force clock request */
>   };
>
>   enum rtl_register_content {
> @@ -5279,8 +5280,10 @@ static void rtl_hw_start_8168g_1(struct rtl8169_private *tp)
>   	rtl_eri_write(tp, 0x2f8, ERIAR_MASK_0011, 0x1d8f, ERIAR_EXGMAC);
>
>   	RTL_W8(ChipCmd, CmdTxEnb | CmdRxEnb);
> -	RTL_W32(MISC, RTL_R32(MISC) & ~RXDV_GATED_EN);
> +	RTL_W32(MISC, (RTL_R32(MISC) | FORCE_CLK) & ~RXDV_GATED_EN);
>   	RTL_W8(MaxTxPacketSize, EarlySize);
> +	RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
> +	RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
>
>   	rtl_eri_write(tp, 0xc0, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
>   	rtl_eri_write(tp, 0xb8, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
>

^ permalink raw reply

* Re: [Xen-devel] [PATCHv1] xen-netfront: always keep the Rx ring full of requests
From: David Vrabel @ 2014-10-07  9:43 UTC (permalink / raw)
  To: David Miller, annie.li; +Cc: netdev, boris.ostrovsky, xen-devel
In-Reply-To: <20141006.170748.1817067290457286845.davem@redhat.com>

On 06/10/14 22:07, David Miller wrote:
> From: annie li <annie.li@oracle.com>
> Date: Mon, 06 Oct 2014 14:41:48 -0400
> 
>>
>> On 2014/10/6 12:00, David Vrabel wrote:
>>>>>    +    queue->rx.req_prod_pvt = req_prod;
>>>>> +
>>>>> +    /* Not enough requests? Try again later. */
>>>>> +    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
>>>>> +        mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
>>>>> +        return;
>>>> If the previous for loop breaks because of failure of
>>>> xennet_alloc_one_rx_buffer, then notify_remote_via_irq is missed here
>>>> if
>>>> the code returns directly.
>>> This is deliberate -- there's no point notifying the backend if there
>>> aren't enough requests for the next packet.  Since we don't know what
>>> the next packet might be we assume it's the largest possible.
>> That makes sense.
>> However, the largest packet case does not happen so
>> frequently. Moreover, netback checks the slots every incoming skb
>> requires in xenvif_rx_ring_slots_available, not only concerning the
>> largest case.

An upcoming change to netback will cause it to wait for enough slots for
the largest possible packet.
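
For reference, the "not enough requests" test quoted above relies on
unsigned index arithmetic. A minimal sketch, assuming both ring indices
are free-running 32-bit counters (RX_SLOTS_MIN and the function names are
hypothetical; the real NET_RX_SLOTS_MIN value is not shown in this thread):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold standing in for NET_RX_SLOTS_MIN. */
#define RX_SLOTS_MIN 18u

/*
 * Requests posted but not yet answered. With free-running unsigned
 * indices the subtraction stays correct even after req_prod wraps.
 */
uint32_t rx_outstanding(uint32_t req_prod, uint32_t rsp_cons)
{
	return req_prod - rsp_cons;
}

/* True when too few requests are posted and the refill should be retried later. */
bool rx_needs_refill_retry(uint32_t req_prod, uint32_t rsp_cons)
{
	return rx_outstanding(req_prod, rsp_cons) < RX_SLOTS_MIN;
}
```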

> I have an opinion about the sysfs stuff.
> 
> It's user facing, so even if it doesn't influence behavior any more
> you have to keep the files around, just make them nops.

That's a good point.

David

^ permalink raw reply

* RE: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
From: David Laight @ 2014-10-07  9:27 UTC (permalink / raw)
  To: 'Willem de Bruijn', John Fastabend
  Cc: Daniel Borkmann, Florian Westphal, gerlitz.or@gmail.com,
	Hannes Frederic Sowa, Network Development, john.ronciak@intel.com,
	Amir Vadai, Eric Dumazet, danny.zhou@intel.com
In-Reply-To: <CA+FuTSfwTBtZLu7CXh4bPxUtVubOvpCPo+O38BsnSiLdvV_KEA@mail.gmail.com>

From: Willem de Bruijn
...
> When keeping the kernel in the loop, it is possible to do
> some basic sanity checking and transparently translate between
> vaddr and paddr, even when exposing the hardware descriptors
> directly.

The application could change the addresses after they have been
validated, but before they have been read by the device.

> Though at this point it may be just as cheap to expose
> an idealized virtualized descriptor format and copy fields between
> that and device descriptors.

That is (probably) the only scheme that stops the application
accessing random parts of physical memory.
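
A minimal sketch of that copy-based scheme, with hypothetical descriptor
layouts and names (struct user_desc, struct hw_desc, translate_desc() are
illustrations only, not an existing interface). The application-visible
descriptor is snapshotted first, and only the snapshot is validated and
translated, so a later change by the application cannot reach the device:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical idealized descriptor, written by the application in shared memory. */
struct user_desc {
	uint64_t offset;	/* offset into the application's registered buffer region */
	uint32_t len;
};

/* Hypothetical device descriptor; only the kernel side writes it. */
struct hw_desc {
	uint64_t paddr;
	uint32_t len;
};

bool translate_desc(const volatile struct user_desc *shared,
		    uint64_t region_paddr, uint64_t region_size,
		    struct hw_desc *hw)
{
	struct user_desc d;

	/* Snapshot once; all checks below use the private copy. */
	d.offset = shared->offset;
	d.len = shared->len;

	if (d.len == 0 || d.len > region_size)
		return false;
	if (d.offset > region_size - d.len)	/* no underflow: len <= region_size */
		return false;

	hw->paddr = region_paddr + d.offset;
	hw->len = d.len;
	return true;
}
```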

> One assumption underlying exposing the hardware descriptors
> is that they are quite similar between devices. How true is this
> in the context of formats that span multiple descriptors?

I suspect you'd need to define complete ring entries for 'initial',
'middle', 'final' and 'complete' fragments, together with the
offsets and endianness (and size?) of the address and data fields.

Also whether there is a special 'last entry' in the ring.

Passing checksum offload flags through adds an extra level of complexity.

Rings like the xhci (actually USB, but could contain ethernet data)
require the 'owner' bit be written odd or even in alternating passes.
Actually mapping support for usbnet (especially xhci - usb3) might show
up some deficiencies in the definition.

You also need to know when transmits have completed.
This might be an 'owner' bit being cleared, but could be signalled
in an entirely different way.

	David

^ permalink raw reply

* [PATCH linux v3 1/1] fs/proc: use a rb tree for the directory entries
From: Nicolas Dichtel @ 2014-10-07  9:02 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, ebiederm, akpm, adobriyan, rui.xiang, viro, oleg, gorcunov,
	kirill.shutemov, grant.likely, tytso, Nicolas Dichtel
In-Reply-To: <1412672559-5256-1-git-send-email-nicolas.dichtel@6wind.com>

The current implementation for the directories in /proc uses a singly
linked list. This is slow when handling directories with large numbers of
entries (e.g. netdevice-related entries when lots of tunnels are opened).

This patch replaces this linked list by a red-black tree.
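
A tree needs a three-way ordering instead of a match/no-match test. As a
user-space sketch of the ordering used for the keys (name_cmp() is a
hypothetical name; in the patch this logic lives in proc_match()), names
are compared by length first and by content only when the lengths are equal:

```c
#include <string.h>

/*
 * Order names by length first, then by byte content.
 * Returns <0, 0 or >0, like memcmp().
 */
int name_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
	if (alen < blen)
		return -1;
	if (alen > blen)
		return 1;
	return memcmp(a, b, alen);
}
```

The resulting order is not alphabetical ("lo" sorts before "eth0"), which
does not matter for lookups but means tree order differs from name order.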

Here are some numbers:

dummy30000.batch contains 30 000 times 'link add type dummy'.

Before the patch:
$ time ip -b dummy30000.batch
real	2m31.950s
user	0m0.440s
sys	2m21.440s
$ time rmmod dummy
real	1m35.764s
user	0m0.000s
sys	1m24.088s

After the patch:
$ time ip -b dummy30000.batch
real	2m0.874s
user	0m0.448s
sys	1m49.720s
$ time rmmod dummy
real	1m13.988s
user	0m0.000s
sys	1m1.008s

The idea of improving this part was suggested by
Thierry Herbelot <thierry.herbelot@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 fs/proc/generic.c  | 164 ++++++++++++++++++++++++++++++++++-------------------
 fs/proc/internal.h |  11 ++--
 fs/proc/proc_net.c |   1 +
 fs/proc/root.c     |   1 +
 4 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 317b72641ebf..9f8fa1e5e8aa 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -31,9 +31,81 @@ static DEFINE_SPINLOCK(proc_subdir_lock);
 
 static int proc_match(unsigned int len, const char *name, struct proc_dir_entry *de)
 {
-	if (de->namelen != len)
-		return 0;
-	return !memcmp(name, de->name, len);
+	if (len < de->namelen)
+		return -1;
+	if (len > de->namelen)
+		return 1;
+
+	return memcmp(name, de->name, len);
+}
+
+static struct proc_dir_entry *pde_subdir_first(struct proc_dir_entry *dir)
+{
+	struct rb_node *node = rb_first(&dir->subdir);
+
+	if (node == NULL)
+		return NULL;
+
+	return rb_entry(node, struct proc_dir_entry, subdir_node);
+}
+
+static struct proc_dir_entry *pde_subdir_next(struct proc_dir_entry *dir)
+{
+	struct rb_node *node = rb_next(&dir->subdir_node);
+
+	if (node == NULL)
+		return NULL;
+
+	return rb_entry(node, struct proc_dir_entry, subdir_node);
+}
+
+static struct proc_dir_entry *pde_subdir_find(struct proc_dir_entry *dir,
+					      const char *name,
+					      unsigned int len)
+{
+	struct rb_node *node = dir->subdir.rb_node;
+
+	while (node) {
+		struct proc_dir_entry *de = container_of(node,
+							 struct proc_dir_entry,
+							 subdir_node);
+		int result = proc_match(len, name, de);
+
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else
+			return de;
+	}
+	return NULL;
+}
+
+static bool pde_subdir_insert(struct proc_dir_entry *dir,
+			      struct proc_dir_entry *de)
+{
+	struct rb_root *root = &dir->subdir;
+	struct rb_node **new = &root->rb_node, *parent = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct proc_dir_entry *this =
+			container_of(*new, struct proc_dir_entry, subdir_node);
+		int result = proc_match(de->namelen, de->name, this);
+
+		parent = *new;
+		if (result < 0)
+			new = &(*new)->rb_left;
+		else if (result > 0)
+			new = &(*new)->rb_right;
+		else
+			return false;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&de->subdir_node, parent, new);
+	rb_insert_color(&de->subdir_node, root);
+	return true;
 }
 
 static int proc_notify_change(struct dentry *dentry, struct iattr *iattr)
@@ -92,10 +164,7 @@ static int __xlate_proc_name(const char *name, struct proc_dir_entry **ret,
 			break;
 
 		len = next - cp;
-		for (de = de->subdir; de ; de = de->next) {
-			if (proc_match(len, cp, de))
-				break;
-		}
+		de = pde_subdir_find(de, cp, len);
 		if (!de) {
 			WARN(1, "name '%s'\n", name);
 			return -ENOENT;
@@ -183,19 +252,16 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
 	struct inode *inode;
 
 	spin_lock(&proc_subdir_lock);
-	for (de = de->subdir; de ; de = de->next) {
-		if (de->namelen != dentry->d_name.len)
-			continue;
-		if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
-			pde_get(de);
-			spin_unlock(&proc_subdir_lock);
-			inode = proc_get_inode(dir->i_sb, de);
-			if (!inode)
-				return ERR_PTR(-ENOMEM);
-			d_set_d_op(dentry, &simple_dentry_operations);
-			d_add(dentry, inode);
-			return NULL;
-		}
+	de = pde_subdir_find(de, dentry->d_name.name, dentry->d_name.len);
+	if (de) {
+		pde_get(de);
+		spin_unlock(&proc_subdir_lock);
+		inode = proc_get_inode(dir->i_sb, de);
+		if (!inode)
+			return ERR_PTR(-ENOMEM);
+		d_set_d_op(dentry, &simple_dentry_operations);
+		d_add(dentry, inode);
+		return NULL;
 	}
 	spin_unlock(&proc_subdir_lock);
 	return ERR_PTR(-ENOENT);
@@ -225,7 +291,7 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
 		return 0;
 
 	spin_lock(&proc_subdir_lock);
-	de = de->subdir;
+	de = pde_subdir_first(de);
 	i = ctx->pos - 2;
 	for (;;) {
 		if (!de) {
@@ -234,7 +300,7 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
 		}
 		if (!i)
 			break;
-		de = de->next;
+		de = pde_subdir_next(de);
 		i--;
 	}
 
@@ -249,7 +315,7 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
 		}
 		spin_lock(&proc_subdir_lock);
 		ctx->pos++;
-		next = de->next;
+		next = pde_subdir_next(de);
 		pde_put(de);
 		de = next;
 	} while (de);
@@ -286,9 +352,8 @@ static const struct inode_operations proc_dir_inode_operations = {
 
 static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp)
 {
-	struct proc_dir_entry *tmp;
 	int ret;
-	
+
 	ret = proc_alloc_inum(&dp->low_ino);
 	if (ret)
 		return ret;
@@ -308,17 +373,10 @@ static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp
 	}
 
 	spin_lock(&proc_subdir_lock);
-
-	for (tmp = dir->subdir; tmp; tmp = tmp->next)
-		if (strcmp(tmp->name, dp->name) == 0) {
-			WARN(1, "proc_dir_entry '%s/%s' already registered\n",
-				dir->name, dp->name);
-			break;
-		}
-
-	dp->next = dir->subdir;
 	dp->parent = dir;
-	dir->subdir = dp;
+	if (pde_subdir_insert(dir, dp) == false)
+		WARN(1, "proc_dir_entry '%s/%s' already registered\n",
+		     dir->name, dp->name);
 	spin_unlock(&proc_subdir_lock);
 
 	return 0;
@@ -354,6 +412,7 @@ static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent,
 	ent->namelen = qstr.len;
 	ent->mode = mode;
 	ent->nlink = nlink;
+	ent->subdir = RB_ROOT;
 	atomic_set(&ent->count, 1);
 	spin_lock_init(&ent->pde_unload_lock);
 	INIT_LIST_HEAD(&ent->pde_openers);
@@ -485,7 +544,6 @@ void pde_put(struct proc_dir_entry *pde)
  */
 void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 {
-	struct proc_dir_entry **p;
 	struct proc_dir_entry *de = NULL;
 	const char *fn = name;
 	unsigned int len;
@@ -497,14 +555,9 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 	}
 	len = strlen(fn);
 
-	for (p = &parent->subdir; *p; p=&(*p)->next ) {
-		if (proc_match(len, fn, *p)) {
-			de = *p;
-			*p = de->next;
-			de->next = NULL;
-			break;
-		}
-	}
+	de = pde_subdir_find(parent, fn, len);
+	if (de)
+		rb_erase(&de->subdir_node, &parent->subdir);
 	spin_unlock(&proc_subdir_lock);
 	if (!de) {
 		WARN(1, "name '%s'\n", name);
@@ -516,16 +569,15 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 	if (S_ISDIR(de->mode))
 		parent->nlink--;
 	de->nlink = 0;
-	WARN(de->subdir, "%s: removing non-empty directory "
-			 "'%s/%s', leaking at least '%s'\n", __func__,
-			 de->parent->name, de->name, de->subdir->name);
+	WARN(pde_subdir_first(de),
+	     "%s: removing non-empty directory '%s/%s', leaking at least '%s'\n",
+	     __func__, de->parent->name, de->name, pde_subdir_first(de)->name);
 	pde_put(de);
 }
 EXPORT_SYMBOL(remove_proc_entry);
 
 int remove_proc_subtree(const char *name, struct proc_dir_entry *parent)
 {
-	struct proc_dir_entry **p;
 	struct proc_dir_entry *root = NULL, *de, *next;
 	const char *fn = name;
 	unsigned int len;
@@ -537,24 +589,18 @@ int remove_proc_subtree(const char *name, struct proc_dir_entry *parent)
 	}
 	len = strlen(fn);
 
-	for (p = &parent->subdir; *p; p=&(*p)->next ) {
-		if (proc_match(len, fn, *p)) {
-			root = *p;
-			*p = root->next;
-			root->next = NULL;
-			break;
-		}
-	}
+	root = pde_subdir_find(parent, fn, len);
 	if (!root) {
 		spin_unlock(&proc_subdir_lock);
 		return -ENOENT;
 	}
+	rb_erase(&root->subdir_node, &parent->subdir);
+
 	de = root;
 	while (1) {
-		next = de->subdir;
+		next = pde_subdir_first(de);
 		if (next) {
-			de->subdir = next->next;
-			next->next = NULL;
+			rb_erase(&next->subdir_node, &de->subdir);
 			de = next;
 			continue;
 		}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 7da13e49128a..433557634c1b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -24,10 +24,9 @@ struct mempolicy;
  * tree) of these proc_dir_entries, so that we can dynamically
  * add new files to /proc.
  *
- * The "next" pointer creates a linked list of one /proc directory,
- * while parent/subdir create the directory structure (every
- * /proc file has a parent, but "subdir" is NULL for all
- * non-directory entries).
+ * parent/subdir are used for the directory structure (every /proc file has a
+ * parent, but "subdir" is empty for all non-directory entries).
+ * subdir_node is used to build the rb tree "subdir" of the parent.
  */
 struct proc_dir_entry {
 	unsigned int low_ino;
@@ -38,7 +37,9 @@ struct proc_dir_entry {
 	loff_t size;
 	const struct inode_operations *proc_iops;
 	const struct file_operations *proc_fops;
-	struct proc_dir_entry *next, *parent, *subdir;
+	struct proc_dir_entry *parent;
+	struct rb_root subdir;
+	struct rb_node subdir_node;
 	void *data;
 	atomic_t count;		/* use count */
 	atomic_t in_use;	/* number of callers into module in progress; */
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index a63af3e0a612..1bde894bc624 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -192,6 +192,7 @@ static __net_init int proc_net_ns_init(struct net *net)
 	if (!netd)
 		goto out;
 
+	netd->subdir = RB_ROOT;
 	netd->data = net;
 	netd->nlink = 2;
 	netd->namelen = 3;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 094e44d4a6be..4eae849baedd 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -166,6 +166,7 @@ void __init proc_root_init(void)
 {
 	int err;
 
+	proc_root.subdir = RB_ROOT;
 	proc_init_inodecache();
 	err = register_filesystem(&proc_fs_type);
 	if (err)
-- 
2.1.0

^ permalink raw reply related

* [PATCH linux v3 0/1] Optimize network interfaces creation
From: Nicolas Dichtel @ 2014-10-07  9:02 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, ebiederm, akpm, adobriyan, rui.xiang, viro, oleg, gorcunov,
	kirill.shutemov, grant.likely, tytso
In-Reply-To: <20141006.181448.1696747135961247651.davem@davemloft.net>

When a lot of netdevices are created, one of the bottlenecks is the creation
of proc entries. This series aims to accelerate this part.

I'm not sure which tree this patch should be based on. I've based it on
linux.git.

v2 -> v3
 - restore credit to Thierry Herbelot for the initial idea.

RFCv1 -> v2
 - use a red-black tree instead of a hash list

 fs/proc/generic.c  | 164 ++++++++++++++++++++++++++++++++++-------------------
 fs/proc/internal.h |  11 ++--
 fs/proc/proc_net.c |   1 +
 fs/proc/root.c     |   1 +
 4 files changed, 113 insertions(+), 64 deletions(-)
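
To illustrate why a name-ordered tree helps here, below is a minimal userspace
sketch. It is not the code from the patch: it uses a plain, unbalanced binary
search tree as a stand-in for the kernel rbtree, and struct entry, entry_cmp(),
entry_find() and entry_insert() are hypothetical names. Ordering by name
length first and then by bytes is an assumption made for the example. The
point is that the duplicate check happens during the same descent as the
insertion, instead of scanning the whole sibling list for every new entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a proc entry; not struct proc_dir_entry. */
struct entry {
	const char *name;
	size_t namelen;
	struct entry *left, *right;	/* children in the per-directory tree */
};

/* Assumed ordering: compare lengths first, bytes only when lengths match. */
static int entry_cmp(const char *name, size_t len, const struct entry *e)
{
	if (len < e->namelen)
		return -1;
	if (len > e->namelen)
		return 1;
	return memcmp(name, e->name, len);
}

/* Walk down from the root; every step discards one whole subtree. */
static struct entry *entry_find(struct entry *root, const char *name,
				size_t len)
{
	while (root) {
		int c = entry_cmp(name, len, root);

		if (c < 0)
			root = root->left;
		else if (c > 0)
			root = root->right;
		else
			return root;
	}
	return NULL;
}

/* Returns 1 on success, 0 if an entry with the same name already exists. */
static int entry_insert(struct entry **root, struct entry *ent)
{
	struct entry **link = root;

	assert(ent != NULL);
	ent->left = ent->right = NULL;
	while (*link) {
		int c = entry_cmp(ent->name, ent->namelen, *link);

		if (c < 0)
			link = &(*link)->left;
		else if (c > 0)
			link = &(*link)->right;
		else
			return 0;	/* duplicate found during the descent */
	}
	*link = ent;
	return 1;
}
```

A caller would start with an empty root (struct entry *root = NULL), insert
entries with entry_insert(&root, &ent) and treat a return value of 0 as the
"already registered" case. Unlike this sketch, a red-black tree also
rebalances on insert and erase, which keeps lookups logarithmic even when
names arrive in sorted order.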

Comments are welcome.

Regards,
Nicolas

^ permalink raw reply

* Re: randconfig build error with next-20141001, in drivers/i2c/algos/i2c-algo-bit.c
From: Oliver Hartkopp @ 2014-10-07  8:58 UTC (permalink / raw)
  To: Randy Dunlap, Stephane Grosjean
  Cc: Jim Davis, Stephen Rothwell, linux-next, linux-i2c,
	netdev@vger.kernel.org, linux-can
In-Reply-To: <5432DAE8.5030509@infradead.org>

On 10/06/2014 08:09 PM, Randy Dunlap wrote:
> On 10/06/14 10:39, Oliver Hartkopp wrote:

>> AFAICS there is 'just' a style problem as 'configs should not enable entire
>> subsystems'. But it finally is a correct and valid Kconfig, right?
> 
> Yes, right.

(..)

> In the unlikely case that I2C is not enabled, the user should have to enable
> it instead of a solitary driver enabling it.  IOW, if a subsystem is disabled,
> the user probably wanted it that way and a single driver should not override
> that setting.

Changing this to 'depends on I2C' would make the config option invisible (and
therefore not selectable) in the unlikely case that I2C is disabled, so I
would vote to leave it as-is.

The help text of the current Kconfig entry already points out that I2C and
I2C_ALGOBIT need to be enabled to compile this driver:

config CAN_PEAK_PCIEC
	bool "PEAK PCAN-ExpressCard Cards"
	depends on CAN_PEAK_PCI
	select I2C
	select I2C_ALGOBIT
	default y
	---help---
	  Say Y here if you want to use a PCAN-ExpressCard from PEAK-System
	  Technik. This will also automatically select I2C and I2C_ALGO
	  configuration options.

AFAIK the PEAK PCAN-ExpressCard is usually used in x86 laptops, so this is
close to an academic discussion, as x86 configs usually enable I2C ;-)

@Stephane: Since the help text has to be updated anyway to introduce the
PCAN-ExpressCard 34 support, you might add some more information on *why* the
I2C support is needed (for CAN transceiver settings and status LED).

And s/I2C_ALGO/I2C_ALGOBIT/ :-)

Tnx & best regards,
Oliver

^ permalink raw reply
