Netdev List
* Re: [PATCH net-next] tcp: remove dead prototype for tcp_v4_get_peer()
From: David Miller @ 2012-11-23 19:11 UTC (permalink / raw)
  To: ncardwell; +Cc: netdev
In-Reply-To: <1353642539-6509-1-git-send-email-ncardwell@google.com>

From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 22 Nov 2012 22:48:59 -0500

> This function no longer exists.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

Applied, thanks.


* Re: [PATCH net-next 0/9] tipc: updates for what will be v3.8
From: David Miller @ 2012-11-23 19:10 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: netdev
In-Reply-To: <1353617994-3962-1-git-send-email-paul.gortmaker@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Thu, 22 Nov 2012 15:59:45 -0500

> The most interesting thing here, at least from a user perspective,
> is the broadcast link fix -- where there was a corner case where
> two endpoints could get in a state where they disagree on where
> to start Rx and ack of broadcast packets.
> 
> There are also the poll/wait changes which could also impact
> end users for certain use cases - the fixes there also better
> align tipc with the rest of the networking code.
> 
> The rest largely falls into routine cleanup category, by getting
> rid of some unused routines, some Kconfig clutter, etc.
> 
> Assuming there is nothing that jumps out as needing a rework,
> the full set can be found as per below.

Pulled, thanks.


* Re: [PATCH] bonding: fix multiple bugs
From: David Miller @ 2012-11-23 19:06 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, fubar, andy
In-Reply-To: <1353589827-2921-1-git-send-email-nikolay@redhat.com>


We've fixed thousands of bugs in the bonding driver. What about
this subject line describes what's unique about this commit?

This subject line is not descriptive enough, and actually it
implies that you need to submit multiple patches, one for each
of these bugs you are fixing.

I'm not applying this patch.


* Re: [PATCH] vxlan: fix command usage in its doc
From: David Miller @ 2012-11-23 19:04 UTC (permalink / raw)
  To: zwu.kernel; +Cc: netdev, wuzhy
In-Reply-To: <1353579001-31052-1-git-send-email-zwu.kernel@gmail.com>

From: zwu.kernel@gmail.com
Date: Thu, 22 Nov 2012 18:10:01 +0800

> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> 
>   Some commands don't work in its example doc. The patch will fix it.
> 
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

Applied, thanks.


* Re: [PATCH net 1/1] 8139cp: revert "set ring address before enabling receiver"
From: David Miller @ 2012-11-23 19:01 UTC (permalink / raw)
  To: romieu; +Cc: netdev, dwmw2, jasowang, jgarzik, gilboad
In-Reply-To: <20121121200729.GA17603@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 21 Nov 2012 21:07:29 +0100

> This patch reverts b01af4579ec41f48e9b9c774e70bd6474ad210db.
> 
> The original patch was tested with emulated hardware. Real
> hardware chokes.
> 
> Fixes https://bugzilla.kernel.org/show_bug.cgi?id=47041
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

In lieu of a response from the Realtek folks, I'm applying this
for now.

What I'll do, the next time I merge net into net-next, is revert
this revert and apply Dave's patch.


* Re: BQL support in gianfar causes network hickup
From: Paul Gortmaker @ 2012-11-23 16:34 UTC (permalink / raw)
  To: Keitel, Tino (ALC NetworX GmbH)
  Cc: David Miller, netdev@vger.kernel.org, Eric Dumazet
In-Reply-To: <9AA65D849A88EB44B5D9B6A8BA098E23040A60D6EE6E@Exchange1.lawo.de>

On 12-11-23 10:58 AM, Keitel, Tino (ALC NetworX GmbH) wrote:
> Hi,
> 
> commit d8a0f1b0af67679bba886784de10d8c21acc4e0e causes the following
> trace on a Freescale RDB8313 board:

Thanks for the report.

> 
> NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 0 timed out
> ------------[ cut here ]------------
> WARNING:
> at /home/keitelt1/src/git/linux-stable/net/sched/sch_generic.c:255
> Modules linked in:
> NIP: c02448b0 LR: c02448b0 CTR: c01c19b8
> REGS: c7ffbe40 TRAP: 0700   Not tainted  (3.7.0-rc6-rt18)
                                            ^^^^^^^^^^^^^^^
I almost overlooked the above.  It would have been nice to
see more explicit information on what kernel you are running.
I say that because the above concerns me.  For several reasons.

1) it looks to be not mainline, but preempt_rt
2) There is no RT on 3.7 yet, so I'm assuming this is a custom
   forward port of the 250 odd RT patches.  (The RT is 3.6.7-rt18,
   i.e. based on the 3.6 gregKH stable tree.)

> MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 24002044  XER: 20000000
> TASK = c03dd370[0] 'swapper' THREAD: c03fe000
> GPR00: c02448b0 c7ffbef0 c03dd370 0000003f 00000001 c001aea8 00000000
> 00000001
> GPR08: 00000001 c03e0000 00000000 0000009d 24002084 1008eb5c 07ffb000
> ffffffff
> GPR16: 00000004 c0362c7c c03dfbf8 00200000 c0411ed0 c0411cd0 c0411ad0
> ffffffff
> GPR24: 00000000 c749e1d8 00000004 c783d1b0 c0400000 c03e0000 c749e000
> 00000000
> NIP [c02448b0] dev_watchdog+0x288/0x298
> LR [c02448b0] dev_watchdog+0x288/0x298
> Call Trace:
> [c7ffbef0] [c02448b0] dev_watchdog+0x288/0x298 (unreliable)
> [c7ffbf20] [c00267f8] call_timer_fn+0x6c/0xd8
> [c7ffbf50] [c00269e4] run_timer_softirq+0x180/0x1f8
> [c7ffbfa0] [c0021144] __do_softirq+0xc4/0x160
> [c7ffbff0] [c000d0b8] call_do_softirq+0x14/0x24
> [c03ffe90] [c00058e8] do_softirq+0x8c/0xb8
> [c03ffeb0] [c0021358] irq_exit+0x98/0xb4
> [c03ffec0] [c0009fb0] timer_interrupt+0x158/0x170
> [c03ffee0] [c000f02c] ret_from_except+0x0/0x14
> --- Exception: 901 at cpu_idle+0x94/0x100
>     LR = cpu_idle+0x94/0x100
> [c03fffa0] [c00088ec] cpu_idle+0x5c/0x100 (unreliable)
> [c03fffc0] [c03b37b0] start_kernel+0x2dc/0x2f0
> [c03ffff0] [00003438] 0x3438
> Instruction dump:
> 7d2903a6 4e800421 80fe01fc 4bffff74 7fc3f378 4bfecb7d 7fc4f378 7fe6fb78
> 7c651b78 3c60c038 38637280 48090e69 <0fe00000> 39200001 993cc7c9
> 4bffffb8
> ---[ end trace 32125455035c2f70 ]---
> 
> With this commit reverted, it works fine. v3.3 is ok, v3.4 contains the
> bad commit. The commit doesn't revert cleanly on 3.7-rc6. I
> attached a diff without the tqi changes.

Have you reproduced this on a mainline kernel, i.e. vanilla 3.4
or vanilla 3.7-rc6?  And then done a revert on that baseline?

The patch was relatively straightforward and reviewed by Eric
who knows this stuff inside out; it isn't immediately clear
to me why it would cause problems for you.

Paul.
--

> 
> The above trace happens while a ptp client (for IEEE1588) is running, so
> there is some locally generated network traffic. The client stops
> working after this, but maybe this is due to bad error handling.
> 
> Regards,
> Tino
> 


* Re: [PATCH] net/macb: Use non-coherent memory for rx buffers
From: Joachim Eastwood @ 2012-11-23 16:12 UTC (permalink / raw)
  To: Nicolas Ferre
  Cc: David S. Miller, netdev, linux-arm-kernel, linux-kernel,
	Jean-Christophe PLAGNIOL-VILLARD, Havard Skinnemoen
In-Reply-To: <1353678601-26888-1-git-send-email-nicolas.ferre@atmel.com>

Hi Nicolas,

On 23 November 2012 14:50, Nicolas Ferre <nicolas.ferre@atmel.com> wrote:
> From: Havard Skinnemoen <havard@skinnemoen.net>
>
> Allocate regular pages to use as backing for the RX ring and use the
> DMA API to sync the caches. This should give a bit better performance
> since it allows the CPU to do burst transfers from memory. It is also
> a necessary step on the way to reduce the amount of copying done by
> the driver.
>
> Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
> [nicolas.ferre@atmel.com: adapt to newer kernel]
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
> ---
>  drivers/net/ethernet/cadence/macb.c | 206 +++++++++++++++++++++++-------------
>  drivers/net/ethernet/cadence/macb.h |  20 +++-
>  2 files changed, 148 insertions(+), 78 deletions(-)

<snip>

> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index 570908b..74e68a3 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -453,6 +453,23 @@ struct macb_dma_desc {
>  #define MACB_TX_USED_SIZE                      1
>
>  /**
> + * struct macb_rx_page - data associated with a page used as RX buffers
> + * @page: Physical page used as storage for the buffers
> + * @phys: DMA address of the page
> + *
> + * Each page is used to provide %MACB_RX_BUFFERS_PER_PAGE RX buffers.
> + * The page gets an initial reference when it is inserted into the
> + * ring, and an additional reference each time it is passed up the
> + * stack as a fragment. When all the buffers have been used, we drop
> + * the initial reference and allocate a new page. Any additional
> + * references are dropped when the higher layers free the skb.
> + */
> +struct macb_rx_page {
> +       struct page             *page;
> +       dma_addr_t              phys;
> +};
> +
> +/**
>   * struct macb_tx_skb - data about an skb which is being transmitted
>   * @skb: skb currently being transmitted
>   * @mapping: DMA address of the skb's data buffer
> @@ -543,7 +560,7 @@ struct macb {
>
>         unsigned int            rx_tail;
>         struct macb_dma_desc    *rx_ring;
> -       void                    *rx_buffers;
> +       struct macb_rx_page     *rx_page;
>
>         unsigned int            tx_head, tx_tail;
>         struct macb_dma_desc    *tx_ring;
> @@ -564,7 +581,6 @@ struct macb {
>
>         dma_addr_t              rx_ring_dma;
>         dma_addr_t              tx_ring_dma;
> -       dma_addr_t              rx_buffers_dma;
>
>         struct mii_bus          *mii_bus;
>         struct phy_device       *phy_dev;
> --

struct macb is shared between at91_ether and macb. Removing
rx_buffers_dma and rx_buffers will break compilation on at91_ether.

So please either leave the two struct members alone, for now, or fix
up at91_ether at the same time.

regards
Joachim Eastwood
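
As an aside on the kernel-doc for struct macb_rx_page quoted above: the
reference-counting lifecycle it describes can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not driver code. struct rx_page_model and all rx_* functions below are
hypothetical names, and RX_BUFFERS_PER_PAGE is fixed at 32 on the
assumption of 4096-byte pages and 128-byte buffers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4096-byte page / 128-byte RX buffer. */
#define RX_BUFFERS_PER_PAGE	32u

/* Hypothetical stand-in for a page and its reference count. */
struct rx_page_model {
	unsigned int refcount;		/* outstanding references */
	unsigned int buffers_used;	/* buffers already handed up */
	bool freed;			/* set once the last reference is dropped */
};

/* Drop one reference; the page is freed when none remain. */
void rx_page_put(struct rx_page_model *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* Inserting the page into the ring takes the initial reference. */
void rx_page_insert(struct rx_page_model *p)
{
	p->refcount = 1;
	p->buffers_used = 0;
	p->freed = false;
}

/* Passing one buffer up the stack as a fragment takes an extra reference. */
void rx_page_pass_up(struct rx_page_model *p)
{
	assert(p->buffers_used < RX_BUFFERS_PER_PAGE);
	p->refcount++;
	/* Once every buffer has been used, the ring drops its initial reference. */
	if (++p->buffers_used == RX_BUFFERS_PER_PAGE)
		rx_page_put(p);
}

/* The higher layers freeing an skb drops that fragment's reference. */
void rx_skb_free(struct rx_page_model *p)
{
	rx_page_put(p);
}
```

In this model a page stays alive after the ring has moved on to a new
one, and is released only when the last skb that references it is
freed.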


* Re: [PATCH] net/macb: GEM DMA configuration register update
From: Joachim Eastwood @ 2012-11-23 16:04 UTC (permalink / raw)
  To: Nicolas Ferre
  Cc: David S. Miller, netdev, linux-arm-kernel, linux-kernel,
	Jean-Christophe PLAGNIOL-VILLARD
In-Reply-To: <1353678541-26839-1-git-send-email-nicolas.ferre@atmel.com>

On 23 November 2012 14:49, Nicolas Ferre <nicolas.ferre@atmel.com> wrote:
> Add information to the DMA Configuration Register to
> maximize system performance:
> - rx/tx packet buffer full memory size
> - allow possibility to use INCR16 if supported
>
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

Acked-by: Joachim Eastwood <manabian@gmail.com>

regards
Joachim Eastwood


* BQL support in gianfar causes network hickup
From: Keitel, Tino (ALC NetworX GmbH) @ 2012-11-23 15:58 UTC (permalink / raw)
  To: Paul Gortmaker, David Miller; +Cc: netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2155 bytes --]

Hi,

commit d8a0f1b0af67679bba886784de10d8c21acc4e0e causes the following
trace on a Freescale RDB8313 board:

NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 0 timed out
------------[ cut here ]------------
WARNING:
at /home/keitelt1/src/git/linux-stable/net/sched/sch_generic.c:255
Modules linked in:
NIP: c02448b0 LR: c02448b0 CTR: c01c19b8
REGS: c7ffbe40 TRAP: 0700   Not tainted  (3.7.0-rc6-rt18)
MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 24002044  XER: 20000000
TASK = c03dd370[0] 'swapper' THREAD: c03fe000
GPR00: c02448b0 c7ffbef0 c03dd370 0000003f 00000001 c001aea8 00000000
00000001
GPR08: 00000001 c03e0000 00000000 0000009d 24002084 1008eb5c 07ffb000
ffffffff
GPR16: 00000004 c0362c7c c03dfbf8 00200000 c0411ed0 c0411cd0 c0411ad0
ffffffff
GPR24: 00000000 c749e1d8 00000004 c783d1b0 c0400000 c03e0000 c749e000
00000000
NIP [c02448b0] dev_watchdog+0x288/0x298
LR [c02448b0] dev_watchdog+0x288/0x298
Call Trace:
[c7ffbef0] [c02448b0] dev_watchdog+0x288/0x298 (unreliable)
[c7ffbf20] [c00267f8] call_timer_fn+0x6c/0xd8
[c7ffbf50] [c00269e4] run_timer_softirq+0x180/0x1f8
[c7ffbfa0] [c0021144] __do_softirq+0xc4/0x160
[c7ffbff0] [c000d0b8] call_do_softirq+0x14/0x24
[c03ffe90] [c00058e8] do_softirq+0x8c/0xb8
[c03ffeb0] [c0021358] irq_exit+0x98/0xb4
[c03ffec0] [c0009fb0] timer_interrupt+0x158/0x170
[c03ffee0] [c000f02c] ret_from_except+0x0/0x14
--- Exception: 901 at cpu_idle+0x94/0x100
    LR = cpu_idle+0x94/0x100
[c03fffa0] [c00088ec] cpu_idle+0x5c/0x100 (unreliable)
[c03fffc0] [c03b37b0] start_kernel+0x2dc/0x2f0
[c03ffff0] [00003438] 0x3438
Instruction dump:
7d2903a6 4e800421 80fe01fc 4bffff74 7fc3f378 4bfecb7d 7fc4f378 7fe6fb78
7c651b78 3c60c038 38637280 48090e69 <0fe00000> 39200001 993cc7c9
4bffffb8
---[ end trace 32125455035c2f70 ]---

With this commit reverted, it works fine. v3.3 is ok, v3.4 contains the
bad commit. The commit doesn't revert cleanly on 3.7-rc6. I
attached a diff without the tqi changes.

The above trace happens while a ptp client (for IEEE1588) is running, so
there is some locally generated network traffic. The client stops
working after this, but maybe this is due to bad error handling.

Regards,
Tino
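
For readers unfamiliar with BQL (byte queue limits), the calls that the
attached diff strips out are the driver's side of a byte-accounting
contract: netdev_tx_sent_queue() reports bytes handed to the hardware,
netdev_tx_completed_queue() reports bytes the hardware has finished
with, and netdev_tx_reset_queue() clears the counters. The sketch below
is a deliberately simplified userspace model of that contract, not the
kernel's dynamic queue limit code. struct bql_model, its fixed limit
and all bql_* names are hypothetical. It shows one possible failure
mode, and is not a diagnosis of this report: a queue can stay stopped
when the completed byte count falls short of the sent byte count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of BQL accounting for one TX queue. */
struct bql_model {
	unsigned int limit;	/* bytes allowed in flight (fixed in this model) */
	unsigned int queued;	/* total bytes reported as sent */
	unsigned int completed;	/* total bytes reported as completed */
	bool stopped;		/* true while the stack may not queue more */
};

/* Bytes the model believes are still owned by the hardware. */
unsigned int bql_inflight(const struct bql_model *q)
{
	return q->queued - q->completed;
}

/* Plays the role of netdev_tx_sent_queue(): account bytes given to hardware. */
void bql_sent(struct bql_model *q, unsigned int bytes)
{
	q->queued += bytes;
	if (bql_inflight(q) > q->limit)
		q->stopped = true;
}

/* Plays the role of netdev_tx_completed_queue(): account finished bytes. */
void bql_completed(struct bql_model *q, unsigned int bytes)
{
	assert(bytes <= bql_inflight(q));	/* cannot complete more than was sent */
	q->completed += bytes;
	if (q->stopped && bql_inflight(q) <= q->limit)
		q->stopped = false;
}

/* Plays the role of netdev_tx_reset_queue(): forget all accounting state. */
void bql_reset(struct bql_model *q)
{
	q->queued = 0;
	q->completed = 0;
	q->stopped = false;
}
```

With a limit of 100, bql_sent(&q, 150) followed by
bql_completed(&q, 150) leaves the queue running. If the completion
path reports only 40 bytes for the same packet, 110 bytes remain
accounted as in flight and the model queue stays stopped even though
the hardware has nothing left to do.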


[-- Attachment #2: gianfar_fix_ptp.diff --]
[-- Type: text/x-patch; name="gianfar_fix_ptp.diff", Size: 2551 bytes --]

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 19ac096..e314c6f 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1748,13 +1748,9 @@ static void free_skb_resources(struct gfar_private *priv)
 
 	/* Go through all the buffer descriptors and free their data buffers */
 	for (i = 0; i < priv->num_tx_queues; i++) {
-		struct netdev_queue *txq;
-
 		tx_queue = priv->tx_queue[i];
-		txq = netdev_get_tx_queue(tx_queue->dev, tx_queue->qindex);
 		if (tx_queue->tx_skbuff)
 			free_skb_tx_queue(tx_queue);
-		netdev_tx_reset_queue(txq);
 	}
 
 	for (i = 0; i < priv->num_rx_queues; i++) {
@@ -2210,8 +2206,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		lstatus |= BD_LFLAG(TXBD_CRC | TXBD_READY) | skb_headlen(skb);
 	}
 
-	netdev_tx_sent_queue(txq, skb->len);
-
 	/* We can work in parallel with gfar_clean_tx_ring(), except
 	 * when modifying num_txbdfree. Note that we didn't grab the lock
 	 * when we were reading the num_txbdfree and checking for available
@@ -2456,7 +2450,6 @@ static void gfar_align_skb(struct sk_buff *skb)
 static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 {
 	struct net_device *dev = tx_queue->dev;
-	struct netdev_queue *txq;
 	struct gfar_private *priv = netdev_priv(dev);
 	struct gfar_priv_rx_q *rx_queue = NULL;
 	struct txbd8 *bdp, *next = NULL;
@@ -2469,12 +2462,10 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 	int i;
 	int howmany = 0;
 	int tqi = tx_queue->qindex;
-	unsigned int bytes_sent = 0;
 	u32 lstatus;
 	size_t buflen;
 
 	rx_queue = priv->rx_queue[tqi];
-	txq = netdev_get_tx_queue(dev, tqi);
 	bdp = tx_queue->dirty_tx;
 	skb_dirtytx = tx_queue->skb_dirtytx;
 
@@ -2531,8 +2522,6 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 			bdp = next_txbd(bdp, base, tx_ring_size);
 		}
 
-		bytes_sent += skb->len;
-
 		dev_kfree_skb_any(skb);
 
 		tx_queue->tx_skbuff[skb_dirtytx] = NULL;
@@ -2547,15 +2536,13 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 	}
 
 	/* If we freed a buffer, we can restart transmission, if necessary */
-	if (netif_tx_queue_stopped(txq) && tx_queue->num_txbdfree)
+	if (__netif_subqueue_stopped(dev, tx_queue->qindex) && tx_queue->num_txbdfree)
 		netif_wake_subqueue(dev, tqi);
 
 	/* Update dirty indicators */
 	tx_queue->skb_dirtytx = skb_dirtytx;
 	tx_queue->dirty_tx = bdp;
 
-	netdev_tx_completed_queue(txq, howmany, bytes_sent);
-
 	return howmany;
 }
 



* Re: [PATCH v2] kvaser_usb: fix "dma on the stack" errors
From: Olivier Sobrie @ 2012-11-23 14:16 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: linux-can, Wolfgang Grandegge, netdev, linux-usb
In-Reply-To: <50AF8348.6080701@pengutronix.de>

On Fri, Nov 23, 2012 at 03:08:08PM +0100, Marc Kleine-Budde wrote:
> On 11/23/2012 02:54 PM, Olivier Sobrie wrote:
> > The dma buffer given to usb_bulk_msg() must be allocated and not on
> > the stack.
> > See Documentation/DMA-API-HOWTO.txt section "What memory is DMA'able?"
> > 
> > Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
> 
> Thanks, I've squashed it into the original patch and pushed to
> linux-can-next/for-davem

Ok thanks.
Have a good week-end,

Olivier

> 
> Marc
> 
> -- 
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
> 



-- 
Olivier


* Re: [PATCH v2] kvaser_usb: fix "dma on the stack" errors
From: Marc Kleine-Budde @ 2012-11-23 14:08 UTC (permalink / raw)
  To: Olivier Sobrie; +Cc: linux-can, Wolfgang Grandegge, netdev, linux-usb
In-Reply-To: <1353678883-23136-1-git-send-email-olivier@sobrie.be>

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

On 11/23/2012 02:54 PM, Olivier Sobrie wrote:
> The dma buffer given to usb_bulk_msg() must be allocated and not on
> the stack.
> See Documentation/DMA-API-HOWTO.txt section "What memory is DMA'able?"
> 
> Signed-off-by: Olivier Sobrie <olivier@sobrie.be>

Thanks, I've squashed it into the original patch and pushed to
linux-can-next/for-davem

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 261 bytes --]


* [net-next patch] tun: change tun_get_iff() prototype.
From: Rami Rosen @ 2012-11-23 13:58 UTC (permalink / raw)
  To: davem; +Cc: maxk, netdev, Rami Rosen

This patch changes tun_get_iff() prototype to return void as it never fails.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
---
 drivers/net/tun.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b01e8c0..3bd9932 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1662,7 +1662,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	return err;
 }
 
-static int tun_get_iff(struct net *net, struct tun_struct *tun,
+static void tun_get_iff(struct net *net, struct tun_struct *tun,
 		       struct ifreq *ifr)
 {
 	tun_debug(KERN_INFO, tun, "tun_get_iff\n");
@@ -1671,7 +1671,6 @@ static int tun_get_iff(struct net *net, struct tun_struct *tun,
 
 	ifr->ifr_flags = tun_flags(tun);
 
-	return 0;
 }
 
 /* This is like a cut-down ethtool ops, except done via tun fd so no
@@ -1847,9 +1846,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	ret = 0;
 	switch (cmd) {
 	case TUNGETIFF:
-		ret = tun_get_iff(current->nsproxy->net_ns, tun, &ifr);
-		if (ret)
-			break;
+		tun_get_iff(current->nsproxy->net_ns, tun, &ifr);
 
 		if (copy_to_user(argp, &ifr, ifreq_len))
 			ret = -EFAULT;
-- 
1.7.11.7


* [PATCH v2] kvaser_usb: fix "dma on the stack" errors
From: Olivier Sobrie @ 2012-11-23 13:54 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can
  Cc: Greg KH, Wolfgang Grandegge, netdev, linux-usb, Daniel Berglund,
	Olivier Sobrie
In-Reply-To: <20121123134027.GA32098@hposo>

The dma buffer given to usb_bulk_msg() must be allocated and not on
the stack.
See Documentation/DMA-API-HOWTO.txt section "What memory is DMA'able?"

Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
---
 drivers/net/can/usb/kvaser_usb.c |  111 ++++++++++++++++++++++++--------------
 1 file changed, 70 insertions(+), 41 deletions(-)

diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
index 8807bf8..1b159e7 100644
--- a/drivers/net/can/usb/kvaser_usb.c
+++ b/drivers/net/can/usb/kvaser_usb.c
@@ -421,14 +421,22 @@ end:
 static int kvaser_usb_send_simple_msg(const struct kvaser_usb *dev,
 				      u8 msg_id, int channel)
 {
-	struct kvaser_msg msg = {
-		.len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple),
-		.id = msg_id,
-		.u.simple.channel = channel,
-		.u.simple.tid = 0xff,
-	};
-
-	return kvaser_usb_send_msg(dev, &msg);
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->id = msg_id;
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple);
+	msg->u.simple.channel = channel;
+	msg->u.simple.tid = 0xff;
+
+	rc = kvaser_usb_send_msg(dev, msg);
+
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_get_software_info(struct kvaser_usb *dev)
@@ -1057,20 +1065,27 @@ static int kvaser_usb_setup_rx_urbs(struct kvaser_usb *dev)
 
 static int kvaser_usb_set_opt_mode(const struct kvaser_usb_net_priv *priv)
 {
-	struct kvaser_msg msg = {
-		.id = CMD_SET_CTRL_MODE,
-		.len = MSG_HEADER_LEN +
-		       sizeof(struct kvaser_msg_ctrl_mode),
-		.u.ctrl_mode.tid = 0xff,
-		.u.ctrl_mode.channel = priv->channel,
-	};
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->id = CMD_SET_CTRL_MODE;
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_ctrl_mode);
+	msg->u.ctrl_mode.tid = 0xff;
+	msg->u.ctrl_mode.channel = priv->channel;
 
 	if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
-		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_SILENT;
+		msg->u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_SILENT;
 	else
-		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_NORMAL;
+		msg->u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_NORMAL;
 
-	return kvaser_usb_send_msg(priv->dev, &msg);
+	rc = kvaser_usb_send_msg(priv->dev, msg);
+
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_start_chip(struct kvaser_usb_net_priv *priv)
@@ -1163,15 +1178,22 @@ static int kvaser_usb_stop_chip(struct kvaser_usb_net_priv *priv)
 
 static int kvaser_usb_flush_queue(struct kvaser_usb_net_priv *priv)
 {
-	struct kvaser_msg msg = {
-		.id = CMD_FLUSH_QUEUE,
-		.len = MSG_HEADER_LEN +
-		       sizeof(struct kvaser_msg_flush_queue),
-		.u.flush_queue.channel = priv->channel,
-		.u.flush_queue.flags = 0x00,
-	};
-
-	return kvaser_usb_send_msg(priv->dev, &msg);
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->id = CMD_FLUSH_QUEUE;
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_flush_queue);
+	msg->u.flush_queue.channel = priv->channel;
+	msg->u.flush_queue.flags = 0x00;
+
+	rc = kvaser_usb_send_msg(priv->dev, msg);
+
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_close(struct net_device *netdev)
@@ -1364,24 +1386,31 @@ static int kvaser_usb_set_bittiming(struct net_device *netdev)
 	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
 	struct can_bittiming *bt = &priv->can.bittiming;
 	struct kvaser_usb *dev = priv->dev;
-	struct kvaser_msg msg = {
-		.id = CMD_SET_BUS_PARAMS,
-		.len = MSG_HEADER_LEN +
-		       sizeof(struct kvaser_msg_busparams),
-		.u.busparams.channel = priv->channel,
-		.u.busparams.tid = 0xff,
-		.u.busparams.bitrate = cpu_to_le32(bt->bitrate),
-		.u.busparams.sjw = bt->sjw,
-		.u.busparams.tseg1 = bt->prop_seg + bt->phase_seg1,
-		.u.busparams.tseg2 = bt->phase_seg2,
-	};
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->id = CMD_SET_BUS_PARAMS;
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_busparams);
+	msg->u.busparams.channel = priv->channel;
+	msg->u.busparams.tid = 0xff;
+	msg->u.busparams.bitrate = cpu_to_le32(bt->bitrate);
+	msg->u.busparams.sjw = bt->sjw;
+	msg->u.busparams.tseg1 = bt->prop_seg + bt->phase_seg1;
+	msg->u.busparams.tseg2 = bt->phase_seg2;
 
 	if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES)
-		msg.u.busparams.no_samp = 3;
+		msg->u.busparams.no_samp = 3;
 	else
-		msg.u.busparams.no_samp = 1;
+		msg->u.busparams.no_samp = 1;
+
+	rc = kvaser_usb_send_msg(dev, msg);
 
-	return kvaser_usb_send_msg(dev, &msg);
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_set_mode(struct net_device *netdev,
-- 
1.7.9.5


* [PATCH] net/macb: Use non-coherent memory for rx buffers
From: Nicolas Ferre @ 2012-11-23 13:50 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: linux-arm-kernel, linux-kernel, Joachim Eastwood,
	Jean-Christophe PLAGNIOL-VILLARD, Havard Skinnemoen,
	Nicolas Ferre

From: Havard Skinnemoen <havard@skinnemoen.net>

Allocate regular pages to use as backing for the RX ring and use the
DMA API to sync the caches. This should give a bit better performance
since it allows the CPU to do burst transfers from memory. It is also
a necessary step on the way to reduce the amount of copying done by
the driver.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: adapt to newer kernel]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
 drivers/net/ethernet/cadence/macb.c | 206 +++++++++++++++++++++++-------------
 drivers/net/ethernet/cadence/macb.h |  20 +++-
 2 files changed, 148 insertions(+), 78 deletions(-)
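
A small userspace sketch of the index arithmetic may help when
reviewing the diff below. It mirrors what macb_rx_page() and
macb_rx_page_offset() compute, but it is only an illustration:
PAGE_SIZE is assumed to be 4096 here (in the kernel it depends on the
architecture), the ring wrap is assumed to be a power-of-two mask, and
rx_ring_wrap(), rx_page_index() and rx_page_offset() are hypothetical
names rather than driver functions.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SIZE is architecture-dependent. */
#define PAGE_SIZE		4096u
#define RX_BUFFER_SIZE		128u
#define RX_RING_SIZE		512u	/* must be power of 2 */
#define RX_BUFFERS_PER_PAGE	(PAGE_SIZE / RX_BUFFER_SIZE)
#define RX_RING_PAGES		(RX_RING_SIZE / RX_BUFFERS_PER_PAGE)

/* Assumption: a free-running index wraps into the ring with a mask. */
unsigned int rx_ring_wrap(unsigned int index)
{
	return index & (RX_RING_SIZE - 1);
}

/* Hypothetical helper: which backing page holds this ring entry? */
unsigned int rx_page_index(unsigned int index)
{
	unsigned int page = rx_ring_wrap(index) / RX_BUFFERS_PER_PAGE;

	assert(page < RX_RING_PAGES);
	return page;
}

/* Hypothetical helper: byte offset of the buffer inside its page. */
unsigned int rx_page_offset(unsigned int index)
{
	return (index % RX_BUFFERS_PER_PAGE) * RX_BUFFER_SIZE;
}

/* Print the mapping for one ring index. */
void rx_show(unsigned int index)
{
	printf("index %u -> page %u, offset %u\n",
	       index, rx_page_index(index), rx_page_offset(index));
}
```

Under these assumptions there are 32 buffers per page and 16 pages in
the ring. The offset helper does not need an explicit wrap because
RX_RING_SIZE is a multiple of RX_BUFFERS_PER_PAGE, so an unwrapped
index and its wrapped value give the same remainder.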

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 6a59bce..c2955da 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -10,6 +10,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include <linux/clk.h>
+#include <linux/highmem.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/kernel.h>
@@ -35,6 +36,8 @@
 #define RX_BUFFER_SIZE		128
 #define RX_RING_SIZE		512 /* must be power of 2 */
 #define RX_RING_BYTES		(sizeof(struct macb_dma_desc) * RX_RING_SIZE)
+#define RX_BUFFERS_PER_PAGE	(PAGE_SIZE / RX_BUFFER_SIZE)
+#define RX_RING_PAGES		(RX_RING_SIZE / RX_BUFFERS_PER_PAGE)
 
 #define TX_RING_SIZE		128 /* must be power of 2 */
 #define TX_RING_BYTES		(sizeof(struct macb_dma_desc) * TX_RING_SIZE)
@@ -90,9 +93,16 @@ static struct macb_dma_desc *macb_rx_desc(struct macb *bp, unsigned int index)
 	return &bp->rx_ring[macb_rx_ring_wrap(index)];
 }
 
-static void *macb_rx_buffer(struct macb *bp, unsigned int index)
+static struct macb_rx_page *macb_rx_page(struct macb *bp, unsigned int index)
 {
-	return bp->rx_buffers + RX_BUFFER_SIZE * macb_rx_ring_wrap(index);
+	unsigned int entry = macb_rx_ring_wrap(index);
+
+	return &bp->rx_page[entry / RX_BUFFERS_PER_PAGE];
+}
+
+static unsigned int macb_rx_page_offset(struct macb *bp, unsigned int index)
+{
+	return (index % RX_BUFFERS_PER_PAGE) * RX_BUFFER_SIZE;
 }
 
 void macb_set_hwaddr(struct macb *bp)
@@ -528,11 +538,15 @@ static void macb_tx_interrupt(struct macb *bp)
 static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
 			 unsigned int last_frag)
 {
-	unsigned int len;
-	unsigned int frag;
-	unsigned int offset;
-	struct sk_buff *skb;
-	struct macb_dma_desc *desc;
+	unsigned int		len;
+	unsigned int		frag;
+	unsigned int		skb_offset;
+	unsigned int		pg_offset;
+	struct macb_rx_page	*rx_page;
+	dma_addr_t		phys;
+	void			*buf;
+	struct sk_buff		*skb;
+	struct macb_dma_desc	*desc;
 
 	desc = macb_rx_desc(bp, last_frag);
 	len = MACB_BFEXT(RX_FRMLEN, desc->ctrl);
@@ -566,7 +580,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
 		return 1;
 	}
 
-	offset = 0;
+	skb_offset = 0;
 	len += NET_IP_ALIGN;
 	skb_checksum_none_assert(skb);
 	skb_put(skb, len);
@@ -574,13 +588,28 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
 	for (frag = first_frag; ; frag++) {
 		unsigned int frag_len = RX_BUFFER_SIZE;
 
-		if (offset + frag_len > len) {
+		if (skb_offset + frag_len > len) {
 			BUG_ON(frag != last_frag);
-			frag_len = len - offset;
+			frag_len = len - skb_offset;
 		}
-		skb_copy_to_linear_data_offset(skb, offset,
-				macb_rx_buffer(bp, frag), frag_len);
-		offset += RX_BUFFER_SIZE;
+
+		rx_page = macb_rx_page(bp, frag);
+		pg_offset = macb_rx_page_offset(bp, frag);
+		phys = rx_page->phys;
+
+		dma_sync_single_range_for_cpu(&bp->pdev->dev, phys,
+				pg_offset, frag_len, DMA_FROM_DEVICE);
+
+		buf = kmap_atomic(rx_page->page);
+		skb_copy_to_linear_data_offset(skb, skb_offset,
+				buf + pg_offset, frag_len);
+		kunmap_atomic(buf);
+
+		skb_offset += frag_len;
+
+		dma_sync_single_range_for_device(&bp->pdev->dev, phys,
+				pg_offset, frag_len, DMA_FROM_DEVICE);
+
 		desc = macb_rx_desc(bp, frag);
 		desc->addr &= ~MACB_BIT(RX_USED);
 
@@ -860,86 +889,90 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
-static void macb_free_consistent(struct macb *bp)
+static void macb_free_rings(struct macb *bp)
 {
-	if (bp->tx_skb) {
-		kfree(bp->tx_skb);
-		bp->tx_skb = NULL;
-	}
-	if (bp->rx_ring) {
-		dma_free_coherent(&bp->pdev->dev, RX_RING_BYTES,
-				  bp->rx_ring, bp->rx_ring_dma);
-		bp->rx_ring = NULL;
-	}
-	if (bp->tx_ring) {
-		dma_free_coherent(&bp->pdev->dev, TX_RING_BYTES,
-				  bp->tx_ring, bp->tx_ring_dma);
-		bp->tx_ring = NULL;
-	}
-	if (bp->rx_buffers) {
-		dma_free_coherent(&bp->pdev->dev,
-				  RX_RING_SIZE * RX_BUFFER_SIZE,
-				  bp->rx_buffers, bp->rx_buffers_dma);
-		bp->rx_buffers = NULL;
+	int i;
+
+	for (i = 0; i < RX_RING_PAGES; i++) {
+		struct macb_rx_page *rx_page = &bp->rx_page[i];
+
+		if (!rx_page->page)
+			continue;
+
+		dma_unmap_page(&bp->pdev->dev, rx_page->phys,
+			       PAGE_SIZE, DMA_FROM_DEVICE);
+		put_page(rx_page->page);
+		rx_page->page = NULL;
 	}
+
+	kfree(bp->tx_skb);
+	kfree(bp->rx_page);
+	dma_free_coherent(&bp->pdev->dev, TX_RING_BYTES, bp->tx_ring,
+			  bp->tx_ring_dma);
+	dma_free_coherent(&bp->pdev->dev, RX_RING_BYTES, bp->rx_ring,
+			  bp->rx_ring_dma);
 }
 
-static int macb_alloc_consistent(struct macb *bp)
+static int macb_init_rings(struct macb *bp)
 {
-	int size;
+	struct page	*page;
+	dma_addr_t	phys;
+	unsigned int	page_idx;
+	unsigned int	ring_idx;
+	unsigned int	i;
 
-	size = TX_RING_SIZE * sizeof(struct macb_tx_skb);
-	bp->tx_skb = kmalloc(size, GFP_KERNEL);
-	if (!bp->tx_skb)
-		goto out_err;
-
-	size = RX_RING_BYTES;
-	bp->rx_ring = dma_alloc_coherent(&bp->pdev->dev, size,
+	bp->rx_ring = dma_alloc_coherent(&bp->pdev->dev, RX_RING_BYTES,
 					 &bp->rx_ring_dma, GFP_KERNEL);
 	if (!bp->rx_ring)
-		goto out_err;
+		goto err_alloc_rx_ring;
+
 	netdev_dbg(bp->dev,
 		   "Allocated RX ring of %d bytes at %08lx (mapped %p)\n",
-		   size, (unsigned long)bp->rx_ring_dma, bp->rx_ring);
+		   RX_RING_BYTES, (unsigned long)bp->rx_ring_dma, bp->rx_ring);
 
-	size = TX_RING_BYTES;
-	bp->tx_ring = dma_alloc_coherent(&bp->pdev->dev, size,
+	bp->tx_ring = dma_alloc_coherent(&bp->pdev->dev, TX_RING_BYTES,
 					 &bp->tx_ring_dma, GFP_KERNEL);
 	if (!bp->tx_ring)
-		goto out_err;
-	netdev_dbg(bp->dev,
-		   "Allocated TX ring of %d bytes at %08lx (mapped %p)\n",
-		   size, (unsigned long)bp->tx_ring_dma, bp->tx_ring);
-
-	size = RX_RING_SIZE * RX_BUFFER_SIZE;
-	bp->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size,
-					    &bp->rx_buffers_dma, GFP_KERNEL);
-	if (!bp->rx_buffers)
-		goto out_err;
+		goto err_alloc_tx_ring;
+
 	netdev_dbg(bp->dev,
-		   "Allocated RX buffers of %d bytes at %08lx (mapped %p)\n",
-		   size, (unsigned long)bp->rx_buffers_dma, bp->rx_buffers);
+		   "Allocated TX ring of %d bytes at 0x%08lx (mapped %p)\n",
+		   TX_RING_BYTES, (unsigned long)bp->tx_ring_dma, bp->tx_ring);
 
-	return 0;
+	bp->rx_page = kcalloc(RX_RING_PAGES, sizeof(struct macb_rx_page),
+			      GFP_KERNEL);
+	if (!bp->rx_page)
+		goto err_alloc_rx_page;
 
-out_err:
-	macb_free_consistent(bp);
-	return -ENOMEM;
-}
+	bp->tx_skb = kcalloc(TX_RING_SIZE, sizeof(struct macb_tx_skb),
+			     GFP_KERNEL);
+	if (!bp->tx_skb)
+		goto err_alloc_tx_skb;
 
-static void macb_init_rings(struct macb *bp)
-{
-	int i;
-	dma_addr_t addr;
+	for (page_idx = 0, ring_idx = 0; page_idx < RX_RING_PAGES; page_idx++) {
+		page = alloc_page(GFP_KERNEL);
+		if (!page)
+			goto err_alloc_page;
+
+		phys = dma_map_page(&bp->pdev->dev, page, 0, PAGE_SIZE,
+				    DMA_FROM_DEVICE);
+		if (dma_mapping_error(&bp->pdev->dev, phys))
+			goto err_map_page;
+
+		bp->rx_page[page_idx].page = page;
+		bp->rx_page[page_idx].phys = phys;
 
-	addr = bp->rx_buffers_dma;
-	for (i = 0; i < RX_RING_SIZE; i++) {
-		bp->rx_ring[i].addr = addr;
-		bp->rx_ring[i].ctrl = 0;
-		addr += RX_BUFFER_SIZE;
+		for (i = 0; i < RX_BUFFERS_PER_PAGE; i++, ring_idx++) {
+			bp->rx_ring[ring_idx].addr = phys;
+			bp->rx_ring[ring_idx].ctrl = 0;
+			phys += RX_BUFFER_SIZE;
+		}
 	}
 	bp->rx_ring[RX_RING_SIZE - 1].addr |= MACB_BIT(RX_WRAP);
 
+	netdev_dbg(bp->dev, "Allocated %u RX buffers (%lu pages)\n",
+		   RX_RING_SIZE, RX_RING_PAGES);
+
 	for (i = 0; i < TX_RING_SIZE; i++) {
 		bp->tx_ring[i].addr = 0;
 		bp->tx_ring[i].ctrl = MACB_BIT(TX_USED);
@@ -947,6 +980,28 @@ static void macb_init_rings(struct macb *bp)
 	bp->tx_ring[TX_RING_SIZE - 1].ctrl |= MACB_BIT(TX_WRAP);
 
 	bp->rx_tail = bp->tx_head = bp->tx_tail = 0;
+
+	return 0;
+
+err_map_page:
+	__free_page(page);
+err_alloc_page:
+	while (page_idx--) {
+		dma_unmap_page(&bp->pdev->dev, bp->rx_page[page_idx].phys,
+			       PAGE_SIZE, DMA_FROM_DEVICE);
+		__free_page(bp->rx_page[page_idx].page);
+	}
+	kfree(bp->tx_skb);
+err_alloc_tx_skb:
+	kfree(bp->rx_page);
+err_alloc_rx_page:
+	dma_free_coherent(&bp->pdev->dev, TX_RING_BYTES, bp->tx_ring,
+			  bp->tx_ring_dma);
+err_alloc_tx_ring:
+	dma_free_coherent(&bp->pdev->dev, RX_RING_BYTES, bp->rx_ring,
+			  bp->rx_ring_dma);
+err_alloc_rx_ring:
+	return -ENOMEM;
 }
 
 static void macb_reset_hw(struct macb *bp)
@@ -1221,16 +1276,15 @@ static int macb_open(struct net_device *dev)
 	if (!bp->phy_dev)
 		return -EAGAIN;
 
-	err = macb_alloc_consistent(bp);
+	err = macb_init_rings(bp);
 	if (err) {
-		netdev_err(dev, "Unable to allocate DMA memory (error %d)\n",
+		netdev_err(dev, "Unable to allocate DMA rings (error %d)\n",
 			   err);
 		return err;
 	}
 
 	napi_enable(&bp->napi);
 
-	macb_init_rings(bp);
 	macb_init_hw(bp);
 
 	/* schedule a link state check */
@@ -1257,7 +1311,7 @@ static int macb_close(struct net_device *dev)
 	netif_carrier_off(dev);
 	spin_unlock_irqrestore(&bp->lock, flags);
 
-	macb_free_consistent(bp);
+	macb_free_rings(bp);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 570908b..74e68a3 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -453,6 +453,23 @@ struct macb_dma_desc {
 #define MACB_TX_USED_SIZE			1
 
 /**
+ * struct macb_rx_page - data associated with a page used as RX buffers
+ * @page: Physical page used as storage for the buffers
+ * @phys: DMA address of the page
+ *
+ * Each page is used to provide %MACB_RX_BUFFERS_PER_PAGE RX buffers.
+ * The page gets an initial reference when it is inserted into the
+ * ring, and an additional reference each time it is passed up the
+ * stack as a fragment. When all the buffers have been used, we drop
+ * the initial reference and allocate a new page. Any additional
+ * references are dropped when the higher layers free the skb.
+ */
+struct macb_rx_page {
+	struct page		*page;
+	dma_addr_t		phys;
+};
+
+/**
  * struct macb_tx_skb - data about an skb which is being transmitted
  * @skb: skb currently being transmitted
  * @mapping: DMA address of the skb's data buffer
@@ -543,7 +560,7 @@ struct macb {
 
 	unsigned int		rx_tail;
 	struct macb_dma_desc	*rx_ring;
-	void			*rx_buffers;
+	struct macb_rx_page	*rx_page;
 
 	unsigned int		tx_head, tx_tail;
 	struct macb_dma_desc	*tx_ring;
@@ -564,7 +581,6 @@ struct macb {
 
 	dma_addr_t		rx_ring_dma;
 	dma_addr_t		tx_ring_dma;
-	dma_addr_t		rx_buffers_dma;
 
 	struct mii_bus		*mii_bus;
 	struct phy_device	*phy_dev;
-- 
1.8.0
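
The nested loop in macb_init_rings() gives each page a run of
consecutive RX descriptors. A minimal user-space sketch of that index
arithmetic follows. The page size (4096) and buffer size (128) are
assumptions chosen for illustration, and rx_slot_for() is a hypothetical
helper that does not exist in the driver.

```c
#include <stddef.h>

/* Assumed values for illustration only; the driver's real constants may differ. */
#define PAGE_SIZE_SKETCH	4096u
#define RX_BUFFER_SIZE		128u
#define RX_BUFFERS_PER_PAGE	(PAGE_SIZE_SKETCH / RX_BUFFER_SIZE)

/* Hypothetical helper type: which page backs a descriptor, and where. */
struct rx_slot {
	unsigned int page_idx;	/* index into the rx_page array */
	unsigned int offset;	/* byte offset of the buffer inside that page */
};

/* Map a descriptor ring index to its backing page and byte offset. */
static struct rx_slot rx_slot_for(unsigned int ring_idx)
{
	struct rx_slot s;

	s.page_idx = ring_idx / RX_BUFFERS_PER_PAGE;
	s.offset = (ring_idx % RX_BUFFERS_PER_PAGE) * RX_BUFFER_SIZE;
	return s;
}
```

With these assumed sizes, 32 descriptors share one page, so descriptor
33 would be the second buffer of the second page.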

^ permalink raw reply related

* [PATCH] net/macb: GEM DMA configuration register update
From: Nicolas Ferre @ 2012-11-23 13:49 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: linux-arm-kernel, linux-kernel, Joachim Eastwood,
	Jean-Christophe PLAGNIOL-VILLARD, Nicolas Ferre

Add information to the DMA Configuration Register to
maximize system performance:
- rx/tx packet buffer full memory size
- allow possibility to use INCR16 if supported
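
As a rough illustration of how the register value is composed, here is
a user-space sketch. The field offsets and sizes are copied from the
macb.h hunk in this patch; the GEM_BIT()/GEM_BF() definitions are
stand-ins assumed to behave like the driver's helpers, and
build_dmacfg() is a hypothetical function, not driver code.

```c
#include <stdint.h>

/* Field positions copied from the macb.h hunk in this patch. */
#define GEM_FBLDO_OFFSET	0
#define GEM_FBLDO_SIZE		5
#define GEM_RXBMS_OFFSET	8
#define GEM_RXBMS_SIZE		2
#define GEM_TXPBMS_OFFSET	10
#define GEM_TXPBMS_SIZE		1
#define GEM_RXBS_OFFSET		16
#define GEM_RXBS_SIZE		8

/* Stand-ins for the driver's bitfield helpers (assumed shape). */
#define GEM_BIT(name)	(UINT32_C(1) << GEM_##name##_OFFSET)
#define GEM_BF(name, value)						\
	(((uint32_t)(value) & ((UINT32_C(1) << GEM_##name##_SIZE) - 1))	\
	 << GEM_##name##_OFFSET)

/* Hypothetical helper: mirror the read-modify-write done on DMACFG. */
static uint32_t build_dmacfg(uint32_t old, unsigned int rx_buffer_size)
{
	/* Clear only the RX buffer size field, then set it in 64-byte units. */
	uint32_t dmacfg = old & ~GEM_BF(RXBS, -1L);

	dmacfg |= GEM_BF(RXBS, rx_buffer_size / 64);
	/* Request bursts of up to 16 beats. */
	dmacfg |= GEM_BF(FBLDO, 16);
	/* Use the full packet buffer memory for both TX and RX. */
	dmacfg |= GEM_BIT(TXPBMS) | GEM_BF(RXBMS, -1L);
	return dmacfg;
}
```

Like the hunk, the sketch clears only RXBS before OR-ing in the new
bits, so any FBLDO bits already set in the old value are kept. Whether
that matters depends on the register's reset value, which this patch
does not state.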

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
 drivers/net/ethernet/cadence/macb.c | 10 ++++++++--
 drivers/net/ethernet/cadence/macb.h | 11 +++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index cc6e593..6a59bce 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1033,8 +1033,12 @@ static u32 macb_dbw(struct macb *bp)
 }
 
 /*
- * Configure the receive DMA engine to use the correct receive buffer size.
- * This is a configurable parameter for GEM.
+ * Configure the receive DMA engine
+ * - use the correct receive buffer size
+ * - set the possibility to use INCR16 bursts
+ *   (if not supported by FIFO, it will fallback to default)
+ * - set both rx/tx packet buffers to full memory size
+ * These are configurable parameters for GEM.
  */
 static void macb_configure_dma(struct macb *bp)
 {
@@ -1043,6 +1047,8 @@ static void macb_configure_dma(struct macb *bp)
 	if (macb_is_gem(bp)) {
 		dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L);
 		dmacfg |= GEM_BF(RXBS, RX_BUFFER_SIZE / 64);
+		dmacfg |= GEM_BF(FBLDO, 16);
+		dmacfg |= GEM_BIT(TXPBMS) | GEM_BF(RXBMS, -1L);
 		gem_writel(bp, DMACFG, dmacfg);
 	}
 }
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 4414421..570908b 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -171,8 +171,19 @@
 #define GEM_DBW128				2
 
 /* Bitfields in DMACFG. */
+#define GEM_FBLDO_OFFSET			0
+#define GEM_FBLDO_SIZE				5
+#define GEM_RXBMS_OFFSET			8
+#define GEM_RXBMS_SIZE				2
+#define GEM_TXPBMS_OFFSET			10
+#define GEM_TXPBMS_SIZE				1
+#define GEM_TXCOEN_OFFSET			11
+#define GEM_TXCOEN_SIZE				1
 #define GEM_RXBS_OFFSET				16
 #define GEM_RXBS_SIZE				8
+#define GEM_DDRP_OFFSET				24
+#define GEM_DDRP_SIZE				1
+
 
 /* Bitfields in NSR */
 #define MACB_NSR_LINK_OFFSET			0
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH] kvaser_usb: fix "dma on the stack" errors
From: Marc Kleine-Budde @ 2012-11-23 13:48 UTC (permalink / raw)
  To: Olivier Sobrie; +Cc: linux-can, Wolfgang Grandegge, netdev, linux-usb
In-Reply-To: <20121123134027.GA32098@hposo>

[-- Attachment #1: Type: text/plain, Size: 1725 bytes --]

On 11/23/2012 02:40 PM, Olivier Sobrie wrote:
> On Fri, Nov 23, 2012 at 02:30:28PM +0100, Olivier Sobrie wrote:
>> The dma buffer given to usb_bulk_msg() must be allocated and not on
>> the stack.
>> See Documentation/DMA-API-HOWTO.txt section "What memory is DMA'able?"
>>
>> Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
>> ---
>> Here is the incremental patch.
>> Thank you Greg !
>>
>> Olivier
>>
>>  drivers/net/can/usb/kvaser_usb.c |  110 ++++++++++++++++++++++++--------------
>>  1 file changed, 69 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
>> index 8807bf8..7ac6e82 100644
>> --- a/drivers/net/can/usb/kvaser_usb.c
>> +++ b/drivers/net/can/usb/kvaser_usb.c
>> @@ -421,14 +421,21 @@ end:
>>  static int kvaser_usb_send_simple_msg(const struct kvaser_usb *dev,
>>  				      u8 msg_id, int channel)
>>  {
>> -	struct kvaser_msg msg = {
>> -		.len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple),
>> -		.id = msg_id,
>> -		.u.simple.channel = channel,
>> -		.u.simple.tid = 0xff,
>> -	};
>> -
>> -	return kvaser_usb_send_msg(dev, &msg);
>> +	struct kvaser_msg *msg;
>> +	int rc;
>> +
>> +	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
>> +	if (!msg)
>> +		return -ENOMEM;
>> +
> 
> Doh! I removed by mistake the line "msg->id = msg_id"... grr

Please send a v2 version of this patch with this problem fixed.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 261 bytes --]

^ permalink raw reply

* Re: [PATCH] kvaser_usb: fix "dma on the stack" errors
From: Olivier Sobrie @ 2012-11-23 13:40 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can
  Cc: Greg KH, Wolfgang Grandegge, netdev, linux-usb, Daniel Berglund
In-Reply-To: <1353677428-15805-1-git-send-email-olivier@sobrie.be>

On Fri, Nov 23, 2012 at 02:30:28PM +0100, Olivier Sobrie wrote:
> The dma buffer given to usb_bulk_msg() must be allocated and not on
> the stack.
> See Documentation/DMA-API-HOWTO.txt section "What memory is DMA'able?"
> 
> Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
> ---
> Here is the incremental patch.
> Thank you Greg !
> 
> Olivier
> 
>  drivers/net/can/usb/kvaser_usb.c |  110 ++++++++++++++++++++++++--------------
>  1 file changed, 69 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
> index 8807bf8..7ac6e82 100644
> --- a/drivers/net/can/usb/kvaser_usb.c
> +++ b/drivers/net/can/usb/kvaser_usb.c
> @@ -421,14 +421,21 @@ end:
>  static int kvaser_usb_send_simple_msg(const struct kvaser_usb *dev,
>  				      u8 msg_id, int channel)
>  {
> -	struct kvaser_msg msg = {
> -		.len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple),
> -		.id = msg_id,
> -		.u.simple.channel = channel,
> -		.u.simple.tid = 0xff,
> -	};
> -
> -	return kvaser_usb_send_msg(dev, &msg);
> +	struct kvaser_msg *msg;
> +	int rc;
> +
> +	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +

Doh! I removed by mistake the line "msg->id = msg_id"... grr

> +	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple);
> +	msg->u.simple.channel = channel;
> +	msg->u.simple.tid = 0xff;
> +
> +	rc = kvaser_usb_send_msg(dev, msg);
> +
> +	kfree(msg);
> +	return rc;
>  }
>  
>  static int kvaser_usb_get_software_info(struct kvaser_usb *dev)
> @@ -1057,20 +1064,27 @@ static int kvaser_usb_setup_rx_urbs(struct kvaser_usb *dev)
>  
>  static int kvaser_usb_set_opt_mode(const struct kvaser_usb_net_priv *priv)
>  {
> -	struct kvaser_msg msg = {
> -		.id = CMD_SET_CTRL_MODE,
> -		.len = MSG_HEADER_LEN +
> -		       sizeof(struct kvaser_msg_ctrl_mode),
> -		.u.ctrl_mode.tid = 0xff,
> -		.u.ctrl_mode.channel = priv->channel,
> -	};
> +	struct kvaser_msg *msg;
> +	int rc;
> +
> +	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	msg->id = CMD_SET_CTRL_MODE;
> +	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_ctrl_mode);
> +	msg->u.ctrl_mode.tid = 0xff;
> +	msg->u.ctrl_mode.channel = priv->channel;
>  
>  	if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
> -		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_SILENT;
> +		msg->u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_SILENT;
>  	else
> -		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_NORMAL;
> +		msg->u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_NORMAL;
> +
> +	rc = kvaser_usb_send_msg(priv->dev, msg);
>  
> -	return kvaser_usb_send_msg(priv->dev, &msg);
> +	kfree(msg);
> +	return rc;
>  }
>  
>  static int kvaser_usb_start_chip(struct kvaser_usb_net_priv *priv)
> @@ -1163,15 +1177,22 @@ static int kvaser_usb_stop_chip(struct kvaser_usb_net_priv *priv)
>  
>  static int kvaser_usb_flush_queue(struct kvaser_usb_net_priv *priv)
>  {
> -	struct kvaser_msg msg = {
> -		.id = CMD_FLUSH_QUEUE,
> -		.len = MSG_HEADER_LEN +
> -		       sizeof(struct kvaser_msg_flush_queue),
> -		.u.flush_queue.channel = priv->channel,
> -		.u.flush_queue.flags = 0x00,
> -	};
> -
> -	return kvaser_usb_send_msg(priv->dev, &msg);
> +	struct kvaser_msg *msg;
> +	int rc;
> +
> +	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	msg->id = CMD_FLUSH_QUEUE;
> +	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_flush_queue);
> +	msg->u.flush_queue.channel = priv->channel;
> +	msg->u.flush_queue.flags = 0x00;
> +
> +	rc = kvaser_usb_send_msg(priv->dev, msg);
> +
> +	kfree(msg);
> +	return rc;
>  }
>  
>  static int kvaser_usb_close(struct net_device *netdev)
> @@ -1364,24 +1385,31 @@ static int kvaser_usb_set_bittiming(struct net_device *netdev)
>  	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
>  	struct can_bittiming *bt = &priv->can.bittiming;
>  	struct kvaser_usb *dev = priv->dev;
> -	struct kvaser_msg msg = {
> -		.id = CMD_SET_BUS_PARAMS,
> -		.len = MSG_HEADER_LEN +
> -		       sizeof(struct kvaser_msg_busparams),
> -		.u.busparams.channel = priv->channel,
> -		.u.busparams.tid = 0xff,
> -		.u.busparams.bitrate = cpu_to_le32(bt->bitrate),
> -		.u.busparams.sjw = bt->sjw,
> -		.u.busparams.tseg1 = bt->prop_seg + bt->phase_seg1,
> -		.u.busparams.tseg2 = bt->phase_seg2,
> -	};
> +	struct kvaser_msg *msg;
> +	int rc;
> +
> +	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	msg->id = CMD_SET_BUS_PARAMS;
> +	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_busparams);
> +	msg->u.busparams.channel = priv->channel;
> +	msg->u.busparams.tid = 0xff;
> +	msg->u.busparams.bitrate = cpu_to_le32(bt->bitrate);
> +	msg->u.busparams.sjw = bt->sjw;
> +	msg->u.busparams.tseg1 = bt->prop_seg + bt->phase_seg1;
> +	msg->u.busparams.tseg2 = bt->phase_seg2;
>  
>  	if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES)
> -		msg.u.busparams.no_samp = 3;
> +		msg->u.busparams.no_samp = 3;
>  	else
> -		msg.u.busparams.no_samp = 1;
> +		msg->u.busparams.no_samp = 1;
> +
> +	rc = kvaser_usb_send_msg(dev, msg);
>  
> -	return kvaser_usb_send_msg(dev, &msg);
> +	kfree(msg);
> +	return rc;
>  }
>  
>  static int kvaser_usb_set_mode(struct net_device *netdev,
> -- 
> 1.7.9.5
> 

-- 
Olivier

^ permalink raw reply

* [PATCH] kvaser_usb: fix "dma on the stack" errors
From: Olivier Sobrie @ 2012-11-23 13:30 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can
  Cc: Greg KH, Wolfgang Grandegge, netdev, linux-usb, Daniel Berglund,
	Olivier Sobrie
In-Reply-To: <50AF3856.8060808@pengutronix.de>

The dma buffer given to usb_bulk_msg() must be allocated and not on
the stack.
See Documentation/DMA-API-HOWTO.txt section "What memory is DMA'able?"
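
The conversion pattern can be sketched in user space as follows, with
malloc()/free() standing in for kmalloc()/kfree(). struct msg_sketch
and send_msg_stub() are hypothetical simplifications, not the driver's
struct kvaser_msg or kvaser_usb_send_msg(). Heap memory is not
zero-initialized, so every field that the designated initializer used
to set has to be assigned explicitly.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, much simpler than the driver's struct kvaser_msg. */
struct msg_sketch {
	uint8_t len;
	uint8_t id;
	uint8_t channel;
	uint8_t tid;
};

/* Records the id of the last message handed to the stub transport. */
static uint8_t last_sent_id;

/* Hypothetical stand-in for the USB send routine: always succeeds. */
static int send_msg_stub(const struct msg_sketch *msg)
{
	last_sent_id = msg->id;
	return 0;
}

/* Allocate the message on the heap, fill every field, send, then free. */
static int send_simple_msg(uint8_t msg_id, int channel)
{
	struct msg_sketch *msg;
	int rc;

	msg = malloc(sizeof(*msg));
	if (!msg)
		return -ENOMEM;

	msg->len = sizeof(*msg);
	msg->id = msg_id;
	msg->channel = (uint8_t)channel;
	msg->tid = 0xff;

	rc = send_msg_stub(msg);

	/* The buffer is freed on both the success and the error path. */
	free(msg);
	return rc;
}
```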

Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
---
Here is the incremental patch.
Thank you Greg !

Olivier

 drivers/net/can/usb/kvaser_usb.c |  110 ++++++++++++++++++++++++--------------
 1 file changed, 69 insertions(+), 41 deletions(-)

diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
index 8807bf8..7ac6e82 100644
--- a/drivers/net/can/usb/kvaser_usb.c
+++ b/drivers/net/can/usb/kvaser_usb.c
@@ -421,14 +421,21 @@ end:
 static int kvaser_usb_send_simple_msg(const struct kvaser_usb *dev,
 				      u8 msg_id, int channel)
 {
-	struct kvaser_msg msg = {
-		.len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple),
-		.id = msg_id,
-		.u.simple.channel = channel,
-		.u.simple.tid = 0xff,
-	};
-
-	return kvaser_usb_send_msg(dev, &msg);
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple);
+	msg->u.simple.channel = channel;
+	msg->u.simple.tid = 0xff;
+
+	rc = kvaser_usb_send_msg(dev, msg);
+
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_get_software_info(struct kvaser_usb *dev)
@@ -1057,20 +1064,27 @@ static int kvaser_usb_setup_rx_urbs(struct kvaser_usb *dev)
 
 static int kvaser_usb_set_opt_mode(const struct kvaser_usb_net_priv *priv)
 {
-	struct kvaser_msg msg = {
-		.id = CMD_SET_CTRL_MODE,
-		.len = MSG_HEADER_LEN +
-		       sizeof(struct kvaser_msg_ctrl_mode),
-		.u.ctrl_mode.tid = 0xff,
-		.u.ctrl_mode.channel = priv->channel,
-	};
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->id = CMD_SET_CTRL_MODE;
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_ctrl_mode);
+	msg->u.ctrl_mode.tid = 0xff;
+	msg->u.ctrl_mode.channel = priv->channel;
 
 	if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
-		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_SILENT;
+		msg->u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_SILENT;
 	else
-		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_NORMAL;
+		msg->u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_NORMAL;
+
+	rc = kvaser_usb_send_msg(priv->dev, msg);
 
-	return kvaser_usb_send_msg(priv->dev, &msg);
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_start_chip(struct kvaser_usb_net_priv *priv)
@@ -1163,15 +1177,22 @@ static int kvaser_usb_stop_chip(struct kvaser_usb_net_priv *priv)
 
 static int kvaser_usb_flush_queue(struct kvaser_usb_net_priv *priv)
 {
-	struct kvaser_msg msg = {
-		.id = CMD_FLUSH_QUEUE,
-		.len = MSG_HEADER_LEN +
-		       sizeof(struct kvaser_msg_flush_queue),
-		.u.flush_queue.channel = priv->channel,
-		.u.flush_queue.flags = 0x00,
-	};
-
-	return kvaser_usb_send_msg(priv->dev, &msg);
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->id = CMD_FLUSH_QUEUE;
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_flush_queue);
+	msg->u.flush_queue.channel = priv->channel;
+	msg->u.flush_queue.flags = 0x00;
+
+	rc = kvaser_usb_send_msg(priv->dev, msg);
+
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_close(struct net_device *netdev)
@@ -1364,24 +1385,31 @@ static int kvaser_usb_set_bittiming(struct net_device *netdev)
 	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
 	struct can_bittiming *bt = &priv->can.bittiming;
 	struct kvaser_usb *dev = priv->dev;
-	struct kvaser_msg msg = {
-		.id = CMD_SET_BUS_PARAMS,
-		.len = MSG_HEADER_LEN +
-		       sizeof(struct kvaser_msg_busparams),
-		.u.busparams.channel = priv->channel,
-		.u.busparams.tid = 0xff,
-		.u.busparams.bitrate = cpu_to_le32(bt->bitrate),
-		.u.busparams.sjw = bt->sjw,
-		.u.busparams.tseg1 = bt->prop_seg + bt->phase_seg1,
-		.u.busparams.tseg2 = bt->phase_seg2,
-	};
+	struct kvaser_msg *msg;
+	int rc;
+
+	msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	msg->id = CMD_SET_BUS_PARAMS;
+	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_busparams);
+	msg->u.busparams.channel = priv->channel;
+	msg->u.busparams.tid = 0xff;
+	msg->u.busparams.bitrate = cpu_to_le32(bt->bitrate);
+	msg->u.busparams.sjw = bt->sjw;
+	msg->u.busparams.tseg1 = bt->prop_seg + bt->phase_seg1;
+	msg->u.busparams.tseg2 = bt->phase_seg2;
 
 	if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES)
-		msg.u.busparams.no_samp = 3;
+		msg->u.busparams.no_samp = 3;
 	else
-		msg.u.busparams.no_samp = 1;
+		msg->u.busparams.no_samp = 1;
+
+	rc = kvaser_usb_send_msg(dev, msg);
 
-	return kvaser_usb_send_msg(dev, &msg);
+	kfree(msg);
+	return rc;
 }
 
 static int kvaser_usb_set_mode(struct net_device *netdev,
-- 
1.7.9.5

^ permalink raw reply related

* [RFC net-next PATCH V1 4/9] net: frag helper functions for mem limit tracking
From: Jesper Dangaard Brouer @ 2012-11-23 13:08 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Florian Westphal
  Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Thomas Graf,
	Cong Wang, Patrick McHardy, Paul E. McKenney, Herbert Xu
In-Reply-To: <20121123130749.18764.25962.stgit@dragon>

This change is primarily a preparation to ease the extension of memory
limit tracking.

The change does reduce the number of atomic operations during freeing of
a frag queue.  This yields a fairly good performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.
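
The batching idea can be sketched in user space with C11 atomics:
accumulate the sizes while walking the list and touch the shared
counter once. struct frag, frag_mem and free_frags_batched() are
hypothetical names for illustration, not the kernel's types.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued fragment. */
struct frag {
	struct frag *next;
	unsigned int truesize;
};

/* Shared memory accounting counter, analogous to nf->mem. */
static atomic_int frag_mem;

/*
 * Free every fragment on the list and account for it, plus the queue
 * structure itself (qsize), with a single atomic subtraction.
 * Returns the number of bytes that were un-accounted.
 */
static unsigned int free_frags_batched(struct frag *fp, unsigned int qsize)
{
	unsigned int sum = qsize;

	while (fp) {
		struct frag *next = fp->next;

		sum += fp->truesize;	/* local add, no atomic needed */
		free(fp);
		fp = next;
	}

	/* One atomic operation instead of one per fragment. */
	atomic_fetch_sub(&frag_mem, (int)sum);
	return sum;
}
```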

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/inet_frag.h                 |   28 ++++++++++++++++++++++++++++
 include/net/ipv6.h                      |    2 +-
 net/ipv4/inet_fragment.c                |   27 +++++++++++++--------------
 net/ipv4/ip_fragment.c                  |   24 +++++++++++-------------
 net/ipv6/netfilter/nf_conntrack_reasm.c |    6 +++---
 net/ipv6/reassembly.c                   |    6 +++---
 6 files changed, 59 insertions(+), 34 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 312a3fa..9bbef17 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -95,4 +95,32 @@ static inline void inet_frag_lru_add(struct netns_frags *nf,
 	list_add_tail(&q->lru_list, &nf->lru_list);
 	spin_unlock(&nf->lru_lock);
 }
+
+/* Memory Tracking Functions. */
+
+static inline int frag_mem_limit(struct netns_frags *nf)
+{
+	return atomic_read(&nf->mem);
+}
+
+static inline void sub_frag_mem_limit(struct inet_frag_queue *q, int i)
+{
+	atomic_sub(i, &q->net->mem);
+}
+
+static inline void add_frag_mem_limit(struct inet_frag_queue *q, int i)
+{
+	atomic_add(i, &q->net->mem);
+}
+
+static inline void init_frag_mem_limit(struct netns_frags *nf)
+{
+	atomic_set(&nf->mem, 0);
+}
+
+static inline int sum_frag_mem_limit(struct netns_frags *nf)
+{
+	return atomic_read(&nf->mem);
+}
+
 #endif
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 979bf6c..a5c1cf1 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -279,7 +279,7 @@ static inline int ip6_frag_nqueues(struct net *net)
 
 static inline int ip6_frag_mem(struct net *net)
 {
-	return atomic_read(&net->ipv6.frags.mem);
+	return sum_frag_mem_limit(&net->ipv6.frags);
 }
 #endif
 
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 47141ab..4f1ab8a 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -73,7 +73,7 @@ EXPORT_SYMBOL(inet_frags_init);
 void inet_frags_init_net(struct netns_frags *nf)
 {
 	nf->nqueues = 0;
-	atomic_set(&nf->mem, 0);
+	init_frag_mem_limit(nf);
 	INIT_LIST_HEAD(&nf->lru_list);
 	spin_lock_init(&nf->lru_lock);
 }
@@ -118,12 +118,8 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 EXPORT_SYMBOL(inet_frag_kill);
 
 static inline void frag_kfree_skb(struct netns_frags *nf, struct inet_frags *f,
-		struct sk_buff *skb, int *work)
+		struct sk_buff *skb)
 {
-	if (work)
-		*work -= skb->truesize;
-
-	atomic_sub(skb->truesize, &nf->mem);
 	if (f->skb_free)
 		f->skb_free(skb);
 	kfree_skb(skb);
@@ -134,6 +130,7 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
 {
 	struct sk_buff *fp;
 	struct netns_frags *nf;
+	unsigned int sum, sum_truesize = 0;
 
 	WARN_ON(!(q->last_in & INET_FRAG_COMPLETE));
 	WARN_ON(del_timer(&q->timer) != 0);
@@ -144,13 +141,14 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
 	while (fp) {
 		struct sk_buff *xp = fp->next;
 
-		frag_kfree_skb(nf, f, fp, work);
+		sum_truesize += fp->truesize;
+		frag_kfree_skb(nf, f, fp);
 		fp = xp;
 	}
-
+	sum = sum_truesize + f->qsize;
 	if (work)
-		*work -= f->qsize;
-	atomic_sub(f->qsize, &nf->mem);
+		*work -= sum;
+	sub_frag_mem_limit(q, sum);
 
 	if (f->destructor)
 		f->destructor(q);
@@ -165,11 +163,11 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
 	int work, evicted = 0;
 
 	if (!force) {
-		if (atomic_read(&nf->mem) <= nf->high_thresh)
+		if (frag_mem_limit(nf) <= nf->high_thresh)
 			return 0;
 	}
 
-	work = atomic_read(&nf->mem) - nf->low_thresh;
+	work = frag_mem_limit(nf) - nf->low_thresh;
 	while (work > 0) {
 		spin_lock(&nf->lru_lock);
 
@@ -269,7 +267,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 	/* Guard creations of new frag queues if mem limit reached, as
 	 * we allow warm/recent elements to survive in inet_frag_evictor()
 	 */
-	if (atomic_read(&nf->mem) > nf->high_thresh)
+	if (frag_mem_limit(nf) > nf->high_thresh)
 		return NULL;
 
 	q = kzalloc(f->qsize, GFP_ATOMIC);
@@ -279,7 +277,8 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 	q->creation_ts = (u32) jiffies;
 	q->net = nf;
 	f->constructor(q, arg);
-	atomic_add(f->qsize, &nf->mem);
+	add_frag_mem_limit(q, f->qsize);
+
 	setup_timer(&q->timer, f->frag_expire, (unsigned long)q);
 	spin_lock_init(&q->lock);
 	atomic_set(&q->refcnt, 1);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 04c9e53..cc36a0b 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -122,7 +122,7 @@ int ip_frag_nqueues(struct net *net)
 
 int ip_frag_mem(struct net *net)
 {
-	return atomic_read(&net->ipv4.frags.mem);
+	return sum_frag_mem_limit(&net->ipv4.frags);
 }
 
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
@@ -161,13 +161,6 @@ static bool ip4_frag_match(struct inet_frag_queue *q, void *a)
 		qp->user == arg->user;
 }
 
-/* Memory Tracking Functions. */
-static void frag_kfree_skb(struct netns_frags *nf, struct sk_buff *skb)
-{
-	atomic_sub(skb->truesize, &nf->mem);
-	kfree_skb(skb);
-}
-
 static void ip4_frag_init(struct inet_frag_queue *q, void *a)
 {
 	struct ipq *qp = container_of(q, struct ipq, q);
@@ -340,6 +333,7 @@ static inline int ip_frag_too_far(struct ipq *qp)
 static int ip_frag_reinit(struct ipq *qp)
 {
 	struct sk_buff *fp;
+	unsigned int sum_truesize = 0;
 
 	if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
 		atomic_inc(&qp->q.refcnt);
@@ -349,9 +343,12 @@ static int ip_frag_reinit(struct ipq *qp)
 	fp = qp->q.fragments;
 	do {
 		struct sk_buff *xp = fp->next;
-		frag_kfree_skb(qp->q.net, fp);
+
+		sum_truesize += fp->truesize;
+		kfree_skb(fp);
 		fp = xp;
 	} while (fp);
+	sub_frag_mem_limit(&qp->q, sum_truesize);
 
 	qp->q.last_in = 0;
 	qp->q.len = 0;
@@ -496,7 +493,8 @@ found:
 				qp->q.fragments = next;
 
 			qp->q.meat -= free_it->len;
-			frag_kfree_skb(qp->q.net, free_it);
+			sub_frag_mem_limit(&qp->q, free_it->truesize);
+			kfree_skb(free_it);
 		}
 	}
 
@@ -519,7 +517,7 @@ found:
 	qp->q.stamp = skb->tstamp;
 	qp->q.meat += skb->len;
 	qp->ecn |= ecn;
-	atomic_add(skb->truesize, &qp->q.net->mem);
+	add_frag_mem_limit(&qp->q, skb->truesize);
 	if (offset == 0)
 		qp->q.last_in |= INET_FRAG_FIRST_IN;
 
@@ -614,7 +612,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 		head->len -= clone->len;
 		clone->csum = 0;
 		clone->ip_summed = head->ip_summed;
-		atomic_add(clone->truesize, &qp->q.net->mem);
+		add_frag_mem_limit(&qp->q, clone->truesize);
 	}
 
 	skb_push(head, head->data - skb_network_header(head));
@@ -642,7 +640,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 		}
 		fp = next;
 	}
-	atomic_sub(sum_truesize, &qp->q.net->mem);
+	sub_frag_mem_limit(&qp->q, sum_truesize);
 
 	head->next = NULL;
 	head->dev = dev;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index b0a1c96..c088831 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -316,7 +316,7 @@ found:
 	fq->q.meat += skb->len;
 	if (payload_len > fq->q.max_size)
 		fq->q.max_size = payload_len;
-	atomic_add(skb->truesize, &fq->q.net->mem);
+	add_frag_mem_limit(&fq->q, skb->truesize);
 
 	/* The first fragment.
 	 * nhoffset is obtained from the first fragment, of course.
@@ -394,7 +394,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 		clone->ip_summed = head->ip_summed;
 
 		NFCT_FRAG6_CB(clone)->orig = NULL;
-		atomic_add(clone->truesize, &fq->q.net->mem);
+		add_frag_mem_limit(&fq->q, clone->truesize);
 	}
 
 	/* We have to remove fragment header from datagram and to relocate
@@ -418,7 +418,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 			head->csum = csum_add(head->csum, fp->csum);
 		head->truesize += fp->truesize;
 	}
-	atomic_sub(head->truesize, &fq->q.net->mem);
+	sub_frag_mem_limit(&fq->q, head->truesize);
 
 	head->local_df = 1;
 	head->next = NULL;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index b38f290..9cfe047 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -327,7 +327,7 @@ found:
 	}
 	fq->q.stamp = skb->tstamp;
 	fq->q.meat += skb->len;
-	atomic_add(skb->truesize, &fq->q.net->mem);
+	add_frag_mem_limit(&fq->q, skb->truesize);
 
 	/* The first fragment.
 	 * nhoffset is obtained from the first fragment, of course.
@@ -426,7 +426,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 		head->len -= clone->len;
 		clone->csum = 0;
 		clone->ip_summed = head->ip_summed;
-		atomic_add(clone->truesize, &fq->q.net->mem);
+		add_frag_mem_limit(&fq->q, clone->truesize);
 	}
 
 	/* We have to remove fragment header from datagram and to relocate
@@ -464,7 +464,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 		}
 		fp = next;
 	}
-	atomic_sub(sum_truesize, &fq->q.net->mem);
+	sub_frag_mem_limit(&fq->q, sum_truesize);
 
 	head->next = NULL;
 	head->dev = dev;

^ permalink raw reply related

* [RFC net-next PATCH V1 2/9] net: frag cache line adjust inet_frag_queue.net
From: Jesper Dangaard Brouer @ 2012-11-23 13:08 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Florian Westphal
  Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Thomas Graf,
	Cong Wang, Patrick McHardy, Paul E. McKenney, Herbert Xu
In-Reply-To: <20121123130749.18764.25962.stgit@dragon>

In inet_frag_find() unfortunate cache-line bounces can occur with
struct inet_frag_queue, because the net pointer (struct netns_frags)
is placed on the same write-often cache-line as e.g. refcnt and
lock, and the hash match check always checks (q->net == nf).

This (of course) only happens on hash bucket collisions, but as the
current hash size is only 64, collisions are more likely.
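
The effect of keeping a read-mostly field off a write-often cache line
can be shown with a small layout sketch. Note the difference from the
patch: the patch only moves the field further down the struct, whereas
this sketch forces the split with C11 alignas so the result does not
depend on the sizes of the other members. The 64-byte line size and all
names are assumptions for illustration.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH 64	/* assumed cache line size */

/* Hypothetical, heavily reduced version of a frag queue. */
struct frag_queue_sketch {
	/* Written often: these dirty their cache line. */
	int lock;
	int refcnt;

	/* Read-mostly: starts on its own cache line. */
	alignas(CACHE_LINE_SKETCH) const void *net;
	unsigned int creation_ts;
};

/* True if two byte offsets fall into the same cache line. */
static int same_cache_line(size_t a, size_t b)
{
	return a / CACHE_LINE_SKETCH == b / CACHE_LINE_SKETCH;
}
```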

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/inet_frag.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 7b897b2..1f75316 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -14,7 +14,6 @@ struct netns_frags {
 
 struct inet_frag_queue {
 	struct hlist_node	list;
-	struct netns_frags	*net;
 	struct list_head	lru_list;   /* lru list member */
 	spinlock_t		lock;
 	atomic_t		refcnt;
@@ -24,6 +23,7 @@ struct inet_frag_queue {
 	ktime_t			stamp;
 	int			len;        /* total length of orig datagram */
 	int			meat;
+	struct netns_frags	*net;
 	u32			creation_ts;/* jiffies when queue was created*/
 	__u8			last_in;    /* first/last segment arrived? */
 

^ permalink raw reply related

* [RFC net-next PATCH V1 0/9] net: fragmentation performance scalability on NUMA/SMP systems
From: Jesper Dangaard Brouer @ 2012-11-23 13:08 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Florian Westphal
  Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Thomas Graf,
	Cong Wang, Patrick McHardy, Paul E. McKenney, Herbert Xu

This patchset implements significant performance improvements for
fragmentation handling in the kernel, with a focus on NUMA and SMP
based systems.

Review:

 Please review these patches.  I have purposely added comments in the
 code in the "//" comment style.  These comments are to be removed
 before applying.  They serve as questions to you, the reviewer.

The fragmentation code today:

 The fragmentation code "protects" kernel resources by implementing
 some memory resource limitation code.  This is centered around a
 global readers-writer lock, and (per network namespace) an atomic mem
 counter and an LRU (Least-Recently-Used) list.  (Separate global
 variables and namespace resources are kept for IPv4, IPv6 and
 Netfilter reassembly, though.)

 The code tries to keep the memory usage between a high and low
 threshold (see: /proc/sys/net/ipv4/ipfrag_{high,low}_thresh).  The
 "evictor" code cleans up fragments when the high threshold is
 exceeded, and stops only when the low threshold is reached.
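
 The high/low threshold behaviour described above can be sketched as a
 small userspace model.  This is only an illustration under stated
 assumptions, not the kernel's evictor: struct frag_model,
 frag_model_evict() and the fixed per-queue size are hypothetical names
 and simplifications.

```c
#include <assert.h>

/* Hypothetical userspace model of the mem accounting; not kernel code. */
struct frag_model {
	long mem;          /* bytes currently charged to frag queues */
	long high_thresh;  /* eviction starts only above this value */
	long low_thresh;   /* eviction continues until at or below this */
};

/* Evict queues of queue_bytes each: do nothing until mem exceeds
 * high_thresh, then keep evicting until mem is at or below low_thresh.
 * Returns the number of queues evicted.
 */
static int frag_model_evict(struct frag_model *m, long queue_bytes)
{
	int evicted = 0;

	assert(queue_bytes > 0);
	if (m->mem <= m->high_thresh)
		return 0;

	while (m->mem > m->low_thresh) {
		m->mem -= queue_bytes;
		evicted++;
	}
	return evicted;
}
```

 The gap between the two thresholds means one eviction run frees a
 whole batch of queues, instead of evicting on every allocation once
 the limit is reached.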

The scalability problem:

 Having a global/central variable for a resource limit is obviously a
 scalability issue on SMP systems, and it is amplified further on a
 NUMA based system.

 When profiling the code, the scalability problem appeared to be the
 readers-writer lock.  But, surprise, the primary scalability issue
 was caused by the global atomic mem limit counter, which, especially
 on NUMA systems, would prolong the time spent inside the
 readers-writer lock sections.  It is not trivial to remove the
 readers-writer lock, but it is possible to reduce the number of
 writer lock sections.

Testlab:

 My original big-testlab was based on four Intel based 10Gbit/s NICs
 on two identical Sandy-Bridge-E NUMA systems.  The testlab
 available while rebasing to net-next was not as powerful.
 It is based on a single Sandy-Bridge-E NUMA system with the same Intel
 10G NICs, but the generator machine was an old Core-i7 920 with some
 older NICs.  This means that I have not been able to generate full 4x
 10G wirespeed.  I have chosen (mostly) to include 2x 10G test results
 due to the generator machine (although the 4x 10G results from the
 big system look more impressive).

 The tests are performed with netperf -t UDP_STREAM (which by default
 sends UDP packets of size 65507 bytes, which get fragmented).  The
 netservers are pinned with numactl, and smp_affinity is aligned so
 that each CPU socket handles the physical NIC connected to its own
 NUMA node.

Performance results:

 For the impressive 4x 10Gbit/s big-testlab results, performance goes
  from (a collective) 496 Mbit/s to 38463 Mbit/s (per stream 9615 Mbit/s)
  (at packet size 65507 bytes)

 For the results to be fair/meaningful, I'll report the used packet
 size, as (after the fixes) bigger UDP packets scale better, because
 smaller packets will require/create more frag queues to handle.

 I'll report packet size 65507 and three fragments 1472*3=4416 bytes.
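
 As a worked example of the two packet sizes, the fragment count can
 be computed as below.  This is a sketch that assumes a 1500-byte MTU,
 a 20-byte IPv4 header without options and an 8-byte UDP header; the
 MTU is not stated above, and udp_frag_count() is a hypothetical
 helper.

```c
#include <assert.h>

/* Number of IPv4 fragments needed for a UDP datagram with the given
 * payload size.  Assumes a 20-byte IPv4 header (no options) and an
 * 8-byte UDP header.
 */
static unsigned int udp_frag_count(unsigned int udp_payload, unsigned int mtu)
{
	const unsigned int ip_hdr = 20, udp_hdr = 8;
	unsigned int per_frag, total;

	assert(mtu > ip_hdr + 8);
	/* Every fragment except the last carries a multiple of 8 bytes. */
	per_frag = ((mtu - ip_hdr) / 8) * 8;
	/* The UDP header travels inside the first fragment's payload. */
	total = udp_payload + udp_hdr;

	return (total + per_frag - 1) / per_frag;
}
```

 Under these assumptions, udp_frag_count(4416, 1500) gives 3 fragments
 and udp_frag_count(65507, 1500) gives 45.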

 Ethernet Flow Control is disabled (via ethtool -A).  To show the real
 effect of the patches, the system needs to be in an "overload"
 situation.  When Ethernet Flow Control is enabled, the system will
 make the generator back off, and the code path will be less stressed.

No patches:
 -------
 Results without any patches, and no flow control:

  2x10G size(65507) result:(7+50)     =57   Mbit/s (gen:9613+9473 Mbit/s)
  2x10G size(4416)  result:(3619+3772)=7391 Mbit/s (gen:8339+9105 Mbit/s)

 The very poor result with large frames is caused by the "evictor"
 code, which gets fixed in patch-01.

Patch-01: net: frag evictor, avoid killing warm frag queues
 -------
 The fragmentation evictor has a very unfortunate strategy for
 killing fragments when the system is put under pressure.
 The evictor code basically kills "warm" fragments too quickly,
 resulting in a massive, DoS-like performance drop, as seen in the
 (no-patch) results above with large packets.

 The solution is to avoid killing "warm" fragments, and rather block
 new incoming fragments in case the mem limit is exceeded.  This is
 solved by introducing a creation time-stamp, which is set to
 "jiffies" in inet_frag_alloc().

  2x10G size(65507) result:(3011+2568)=5579 Mbit/s (gen:9613+9553 Mbit/s)
  2x10G size(4416)  result:(3716+3518)=7234 Mbit/s (gen:9037+8614 Mbit/s)
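
 The "warm" check based on the creation time-stamp can be sketched as
 follows.  This is a userspace illustration, not the code from
 patch-01: frag_queue_is_warm() and the warm_ticks window are
 hypothetical, and the window size the patch uses is not stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a queue counts as "warm" while it is younger
 * than warm_ticks.  Unsigned subtraction keeps the computed age
 * correct even when the tick counter wraps around.
 */
static bool frag_queue_is_warm(uint32_t creation_ts, uint32_t now,
			       uint32_t warm_ticks)
{
	uint32_t age = now - creation_ts;

	assert(warm_ticks > 0);
	return age < warm_ticks;
}
```

 An evictor built on such a check would skip warm queues and only
 reclaim older ones.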

Patch-02: cache line adjust inet_frag_queue.net (netns)
 -------
 Avoid possible cache-line bounces in struct inet_frag_queue, by
 moving the net pointer (struct netns_frags), because it is placed on
 the same write-often cache-line as e.g. refcnt and lock.

  2x10G size(65507) result:(2960+2613)=5573 Mbit/s (gen:9614+9465 Mbit/s)
  2x10G size(4416)  result:(3858+3650)=7508 Mbit/s (gen:8076+7633 Mbit/s)

 The performance benefit looks small. We can discuss if this patch is
 needed or not.
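
 Whether a field shares a cache-line with the write-often members can
 be checked with offsetof().  The sketch below uses a hypothetical
 struct fq_model with fixed-size members and an assumed 64-byte
 cache-line; it does not reproduce the real struct inet_frag_queue
 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Hypothetical stand-in for a frag queue: write-often fields first,
 * the read-mostly net pointer pushed past the first cache-line.
 */
struct fq_model {
	uint64_t lock;          /* written on every lookup hit */
	uint64_t refcnt;        /* written on every lookup hit */
	uint8_t  other_hot[48]; /* remaining write-often state */
	uint64_t net;           /* read-mostly, compared during lookup */
};

/* Compile-time check that net starts on a cache-line boundary. */
static_assert(offsetof(struct fq_model, net) % CACHE_LINE == 0,
	      "net should start a new cache-line");

/* Index of the cache-line an offset falls in, relative to struct start. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE;
}

/* Returns 1 when net and refcnt live on different cache-lines. */
static int fq_model_net_is_separated(void)
{
	return cache_line_of(offsetof(struct fq_model, net)) !=
	       cache_line_of(offsetof(struct fq_model, refcnt));
}
```

 On a real kernel build, a tool such as pahole shows the same
 information for the actual struct.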

Patch-03: move LRU list maintenance outside of rwlock
 -------
 Updating the fragmentation queues' LRU (Least-Recently-Used) list
 required taking the hash writer lock.  However, the LRU list isn't
 tied to the hash at all, so we can use a separate lock for it.

 This patch looks like a performance loss for big packets, but the LRU
 locking changes are needed by later patches.

  2x10G size(65507) result:(2533+2138)=4671 Mbit/s (gen:9612+9461 Mbit/s)
  2x10G size(4416)  result:(3952+3713)=7665 Mbit/s (gen:9168+8415 Mbit/s)

Patch-04: frag helper functions for mem limit tracking
 -------
 This patch is only meant as a preparation patch for the next
 patch.  The performance improvement comes from reducing the number
 of atomic operations during freeing of a frag queue, by summing the
 mem accounting first and then doing a single atomic dec.

  2x10G size(65507) result:(2475+3101)=5576 Mbit/s (gen:9614+9439 Mbit/s)
  2x10G size(4416)  result:(3928+4129)=8057 Mbit/s (gen:7259+8131 Mbit/s)
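
 The "sum first, then one atomic dec" idea can be sketched with C11
 atomics in userspace.  The names frag_mem and frag_free_batched() are
 hypothetical; the kernel patch uses its own helpers and atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of the shared mem counter. */
static atomic_long frag_mem;

/* Free n fragments: sum their sizes locally, then do one atomic
 * subtraction instead of one per fragment.  Returns the bytes freed.
 */
static long frag_free_batched(const long *truesize, size_t n)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(truesize[i] >= 0);
		sum += truesize[i];
	}
	atomic_fetch_sub(&frag_mem, sum);
	return sum;
}
```

 For a queue holding 45 fragments this replaces 45 atomic operations
 on a contended cache-line with a single one.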

Patch-05: per CPU mem limit and LRU list accounting
 -------
 The major performance bottleneck on NUMA systems is the mem limit
 counter, which is based on an atomic counter.  This patch removes the
 cache-bouncing of the atomic counter, by moving this accounting to be
 bound to each CPU.  The LRU list also needs to be done per CPU,
 in order to keep the accounting straight.

  2x10G size(65507) result:(9603+9458)=19061 Mbit/s (gen:9614+9458 Mbit/s)
  2x10G size(4416)  result:(4871+4848)=9719 Mbit/s (gen:9107+8378 Mbit/s)

 To compare the benefit of the next patches, it's necessary to increase
 the stress on the code, by doing 4x 10Gbit/s tests.

  4x10G size(65507) result:(8631+9337+7534+6928)=32430 Mbit/s
                       (gen:8646+9613+7547+6937 =32743 Mbit/s)
  4x10G size(4416)  result:(2870+2990+2993+3016)=11869 Mbit/s
                       (gen:4819+7767+6893+5043 =24522 Mbit/s)
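
 The per CPU accounting of patch-05 can be modelled in userspace as
 one counter per CPU, each on its own cache-line.  This is a
 single-threaded sketch under assumptions: NR_CPUS_MODEL, the 64-byte
 cache-line and the summing helper are hypothetical, and the patch may
 apply the limit differently.

```c
#include <assert.h>
#include <stdalign.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count */
#define CACHE_LINE    64  /* assumed cache-line size in bytes */

/* One counter per CPU, aligned so that updates from different CPUs
 * never touch the same cache-line.
 */
struct percpu_mem {
	alignas(CACHE_LINE) long mem;
};

static struct percpu_mem frag_mem[NR_CPUS_MODEL];

/* Charge (positive) or uncharge (negative) bytes on one CPU's counter. */
static void frag_mem_add(int cpu, long bytes)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	frag_mem[cpu].mem += bytes;
}

/* Total across all CPUs; only needed on slow paths in this model. */
static long frag_mem_sum(void)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += frag_mem[cpu].mem;
	return sum;
}
```

 Because each CPU only writes its own counter, the fast path needs no
 atomic operation on a shared line.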

Patch-06: nqueues_under_LRU_lock
 -------
 This patch just moves the nqueues counter under the LRU lock (and
 per CPU), instead of the write lock, to prepare for the next patch.
 No need for performance testing this part.

Patch-07: hash_bucket_locking
 -------
 This patch implements per hash bucket locking for the frag queue
 hash.  This removes two write locks, and the only remaining write
 lock is for protecting hash rebuild.  This essentially reduces the
 readers-writer lock to a rebuild lock.

 UPDATE: This patch can result in an OOPS during hash rebuilding.
 It needs more work before it's safe to apply.

  2x10G size(65507) result:(9602+9466)=19068 Mbit/s (gen:9613+9472 Mbit/s)
  2x10G size(4416)  result:(5024+4925)= 9949 Mbit/s (gen:8581+8957 Mbit/s)

 To see the real benefit of this patch, we need to crank up the load
 and stress on the code, with 4x 10Gbit/s at small packets.
 Improvement at size(4416): before 11869 Mbit/s, now 17155 Mbit/s.
 Also note the regression at size(65507): 32430 -> 31021 Mbit/s.

  4x10G size(65507) result:(7618+8708+7381+7314)=31021 Mbit/s
                       (gen:7628+9501+8728+7321 =33178 Mbit/s)
  4x10G size(4416)  result:(4156+4714+4300+3985)=17155 Mbit/s
                       (gen:6614+5330+7745+5366 =25055 Mbit/s)

 At 4x10G size(4416) I have seen 206 frag queues in use, and hash size is 64.
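
 Per hash bucket locking can be illustrated with a small userspace
 hash table in which every bucket carries its own lock.  This is a
 sketch, not the code from patch-07: all names are hypothetical, a
 C11 atomic_flag spinlock stands in for the kernel spinlock_t, and
 the rebuild lock is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define HASHSZ_MODEL 64

struct node_model {
	struct node_model *next;
	unsigned int key;
};

/* Each bucket has its own lock, so operations on different chains do
 * not serialise on one table-wide lock.
 */
struct bucket_model {
	struct node_model *chain;
	atomic_flag chain_lock;
};

static struct bucket_model table[HASHSZ_MODEL];

static void table_init(void)
{
	int i;

	for (i = 0; i < HASHSZ_MODEL; i++) {
		table[i].chain = NULL;
		atomic_flag_clear(&table[i].chain_lock);
	}
}

static void bucket_lock(struct bucket_model *hb)
{
	while (atomic_flag_test_and_set_explicit(&hb->chain_lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void bucket_unlock(struct bucket_model *hb)
{
	atomic_flag_clear_explicit(&hb->chain_lock, memory_order_release);
}

/* Link a node at the head of its bucket's chain. */
static void table_insert(struct node_model *n)
{
	struct bucket_model *hb;

	assert(n != NULL);
	hb = &table[n->key % HASHSZ_MODEL];
	bucket_lock(hb);
	n->next = hb->chain;
	hb->chain = n;
	bucket_unlock(hb);
}

/* Look up a key; only the one bucket it hashes to is locked. */
static struct node_model *table_find(unsigned int key)
{
	struct bucket_model *hb = &table[key % HASHSZ_MODEL];
	struct node_model *n;

	bucket_lock(hb);
	for (n = hb->chain; n; n = n->next)
		if (n->key == key)
			break;
	bucket_unlock(hb);
	return n;
}
```

 Two lookups only contend when they hash to the same bucket, which is
 why the hash size matters for the next patch.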

Patch-08: cache_align_hash_bucket
 -------
 Increase frag queue hash size and assure cache-line alignment to
 avoid false sharing.  Hash size is set to 256, because I have
 observed 206 frag queues in use at 4x10G with packet size 4416 bytes.

  2x10G size(65507) result:(9601+9414)=19015 Mbit/s (gen:9614+9434 Mbit/s)
  2x10G size(4416)  result:(5421+5268)=10689 Mbit/s (gen:8028+7457 Mbit/s)
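
 The reasoning behind the hash size can be made concrete by computing
 the average chain length for the observed 206 frag queues.  This is a
 rough sketch assuming an even spread over the buckets;
 load_factor_milli() is a hypothetical helper.

```c
#include <assert.h>

/* Average chain length in thousandths (rounded down) for nqueues
 * entries spread evenly over hashsz buckets.
 */
static unsigned int load_factor_milli(unsigned int nqueues,
				      unsigned int hashsz)
{
	assert(hashsz > 0);
	return nqueues * 1000u / hashsz;
}
```

 With 64 buckets, 206 queues give an average chain length above 3;
 with 256 buckets it drops below 1, so most lookups find their bucket
 uncontended.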

 This does introduce an improvement (although not as big as I
 expected), but most importantly the regression seen in patch-07 4x10G
 at size(65507) is gone (patch-05: 32430 Mbit/s -> 32676 Mbit/s).

  4x10G size(65507) result:(7604+8307+9593+7172)=32676 Mbit/s
                       (gen:7615+8713+9606+7184 =33118 Mbit/s)
  4x10G size(4416)  result:(4890+4364+4139+4530)=17923 Mbit/s
                       (gen:5170+6873+5215+7632 =24890 Mbit/s)

 After this patch it looks like the read lock is now the new
 contention point.

Patch-09: Hack disable rebuild and remove rw_lock
 -------
 I've done a quick hack patch that removes the readers-writer lock by
 disabling/breaking hash rebuilding, just to see how big the
 performance gain would be.

  2x10G size(4416) result: 6481+6764 = 13245 Mbit/s (gen: 7652+8077 Mbit/s)

  4x10G size(4416) result:(5610+6283+5735+5238)=22866 Mbit/s
                     (gen: 6530+7860+5967+5238 =25595 Mbit/s)

 And the results show that it's a big win.  With 4x10G size(4416)
 before: 17923 Mbit/s -> now: 22866 Mbit/s, an increase of 4943 Mbit/s.
 With 2x10G size(4416) before: 10689 Mbit/s -> now: 13245 Mbit/s, an
 increase of 2556 Mbit/s.

 I'll work on a real solution for removing the rw_lock while still
 supporting hash rebuilding.  Suggestions and ideas are welcome.


This patchset is based upon:
  Davem's net-next tree:
    git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
  On top of:
    commit ff33c0e1885cda44dd14c79f70df4706f83582a0
    (net: Remove bogus dependencies on INET)

---

Jesper Dangaard Brouer (9):
      net: frag remove readers-writer lock (hack)
      net: increase frag queue hash size and cache-line
      net: frag queue locking per hash bucket
      net: frag, move nqueues counter under LRU lock protection
      net: frag per CPU mem limit and LRU list accounting
      net: frag helper functions for mem limit tracking
      net: frag, move LRU list maintenance outside of rwlock
      net: frag cache line adjust inet_frag_queue.net
      net: frag evictor, avoid killing warm frag queues


 include/net/inet_frag.h                 |  120 +++++++++++++++++++++++--
 include/net/ipv6.h                      |    4 -
 net/ipv4/inet_fragment.c                |  150 ++++++++++++++++++++++---------
 net/ipv4/ip_fragment.c                  |   43 +++++----
 net/ipv6/netfilter/nf_conntrack_reasm.c |   13 +--
 net/ipv6/reassembly.c                   |   16 ++-
 6 files changed, 259 insertions(+), 87 deletions(-)


--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [RFC net-next PATCH V1 9/9] net: frag remove readers-writer lock (hack)
From: Jesper Dangaard Brouer @ 2012-11-23 13:08 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Florian Westphal
  Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Thomas Graf,
	Cong Wang, Patrick McHardy, Paul E. McKenney, Herbert Xu
In-Reply-To: <20121123130749.18764.25962.stgit@dragon>

Do NOT APPLY this patch.

After all the other patches, the rw_lock is now the contention point.

This is a quick hack that removes the readers-writer lock by
disabling/breaking hash rebuilding, just to see how big the
performance gain would be.

  2x10G size(4416) result: 6481+6764 = 13245 Mbit/s (gen: 7652+8077 Mbit/s)

  4x10G size(4416) result:(5610+6283+5735+5238)=22866 Mbit/s
                     (gen: 6530+7860+5967+5238 =25595 Mbit/s)

And the results show that it's a big win.  With 4x10G size(4416)
before: 17923 Mbit/s -> now: 22866 Mbit/s, an increase of 4943 Mbit/s.
With 2x10G size(4416) before: 10689 Mbit/s -> now: 13245 Mbit/s, an
increase of 2556 Mbit/s.

I'll work on a real solution for removing the rw_lock while still
supporting hash rebuilding.  Suggestions and ideas are welcome.

NOT-signed-off
---

 include/net/inet_frag.h                 |    2 +-
 net/ipv4/inet_fragment.c                |   23 +++++++++++++----------
 net/ipv4/ip_fragment.c                  |    2 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c |    2 +-
 net/ipv6/reassembly.c                   |    2 +-
 5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 5054228..2fb8578 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -58,7 +58,7 @@ struct inet_frag_bucket {
 
 struct inet_frags {
 	struct inet_frag_bucket	hash[INETFRAGS_HASHSZ];
-	rwlock_t		lock; /* Rebuild lock */
+//	rwlock_t		lock; /* Rebuild lock */
 	u32			rnd;
 	int			qsize;
 	int			secret_interval;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 447423f..63227d6 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -35,8 +35,11 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
 	unsigned long now = jiffies;
 	int i;
 
+	//HACK don't rebuild
+	return;
+
 	/* Per bucket lock NOT needed here, due to write lock protection */
-	write_lock(&f->lock);
+//	write_lock(&f->lock);
 
 	get_random_bytes(&f->rnd, sizeof(u32));
 	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
@@ -59,7 +62,7 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
 			}
 		}
 	}
-	write_unlock(&f->lock);
+//	write_unlock(&f->lock);
 
 	mod_timer(&f->secret_timer, now + f->secret_interval);
 }
@@ -74,7 +77,7 @@ void inet_frags_init(struct inet_frags *f)
 		spin_lock_init(&hb->chain_lock);
 		INIT_HLIST_HEAD(&hb->chain);
 	}
-	rwlock_init(&f->lock);
+//	rwlock_init(&f->lock);
 
 	f->rnd = (u32) ((num_physpages ^ (num_physpages>>7)) ^
 				   (jiffies ^ (jiffies >> 6)));
@@ -115,14 +118,14 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 	unsigned int hash;
 
-	read_lock(&f->lock);
+	//read_lock(&f->lock);
 	hash = f->hashfn(fq);
 	hb = &f->hash[hash];
 
 	spin_lock_bh(&hb->chain_lock);
 	hlist_del(&fq->list);
 	spin_unlock_bh(&hb->chain_lock);
-	read_unlock(&f->lock);
+	//read_unlock(&f->lock);
 	inet_frag_lru_del(fq);
 }
 
@@ -249,7 +252,7 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 #endif
 	unsigned int hash;
 
-	read_lock(&f->lock); /* Protects against hash rebuild */
+	//read_lock(&f->lock); /* Protects against hash rebuild */
 	/*
 	 * While we stayed w/o the lock other CPU could update
 	 * the rnd seed, so we need to re-calculate the hash
@@ -268,7 +271,7 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
 			spin_unlock_bh(&hb->chain_lock);
-			read_unlock(&f->lock);
+			//read_unlock(&f->lock);
 			qp_in->last_in |= INET_FRAG_COMPLETE;
 			inet_frag_put(qp_in, f);
 			return qp;
@@ -282,7 +285,7 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 	atomic_inc(&qp->refcnt);
 	hlist_add_head(&qp->list, &hb->chain);
 	spin_unlock_bh(&hb->chain_lock);
-	read_unlock(&f->lock);
+	//read_unlock(&f->lock);
 	inet_frag_lru_add(nf, qp);
 	return qp;
 }
@@ -342,12 +345,12 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		if (q->net == nf && f->match(q, key)) {
 			atomic_inc(&q->refcnt);
 			spin_unlock_bh(&hb->chain_lock);
-			read_unlock(&f->lock);
+			//read_unlock(&f->lock);
 			return q;
 		}
 	}
 	spin_unlock_bh(&hb->chain_lock);
-	read_unlock(&f->lock);
+	//read_unlock(&f->lock);
 
 	return inet_frag_create(nf, f, key);
 }
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 7b1cf51..b2cb05f 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -289,7 +289,7 @@ static inline struct ipq *ip_find(struct net *net, struct iphdr *iph, u32 user)
 	arg.iph = iph;
 	arg.user = user;
 
-	read_lock(&ip4_frags.lock);
+	//read_lock(&ip4_frags.lock);
 	hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
 
 	q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index c088831..5b57e03 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -175,7 +175,7 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 	arg.src = src;
 	arg.dst = dst;
 
-	read_lock_bh(&nf_frags.lock);
+	//read_lock_bh(&nf_frags.lock);
 	hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
 
 	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 9cfe047..2c74394 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -193,7 +193,7 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src, const struct in6
 	arg.src = src;
 	arg.dst = dst;
 
-	read_lock(&ip6_frags.lock);
+	//read_lock(&ip6_frags.lock);
 	hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd);
 
 	q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);

^ permalink raw reply related

* [RFC net-next PATCH V1 8/9] net: increase frag queue hash size and cache-line
From: Jesper Dangaard Brouer @ 2012-11-23 13:08 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Florian Westphal
  Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Thomas Graf,
	Cong Wang, Patrick McHardy, Paul E. McKenney, Herbert Xu
In-Reply-To: <20121123130749.18764.25962.stgit@dragon>

Increase frag queue hash size and assure cache-line alignment to
avoid false sharing.  Hash size is set to 256, because I have
observed 206 frag queues in use at 4x10G with packet size 4416 bytes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/inet_frag.h |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 1efec6b..5054228 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -49,13 +49,12 @@ struct inet_frag_queue {
 	u16			max_size;
 };
 
-#define INETFRAGS_HASHSZ		64
-
+#define INETFRAGS_HASHSZ	256
 
 struct inet_frag_bucket {
 	struct hlist_head	chain;
 	spinlock_t		chain_lock;
-};
+} ____cacheline_aligned_in_smp;
 
 struct inet_frags {
 	struct inet_frag_bucket	hash[INETFRAGS_HASHSZ];

^ permalink raw reply related

* [RFC net-next PATCH V1 7/9] net: frag queue locking per hash bucket
From: Jesper Dangaard Brouer @ 2012-11-23 13:08 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Florian Westphal
  Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Thomas Graf,
	Cong Wang, Patrick McHardy, Paul E. McKenney, Herbert Xu
In-Reply-To: <20121123130749.18764.25962.stgit@dragon>

DO NOT apply - patch not finished, can cause an OOPS/PANIC during hash rebuild

This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduces the
readers-writer lock to a rebuild lock.

NOT-Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/inet_frag.h  |   10 +++++++-
 net/ipv4/inet_fragment.c |   56 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 9938ea4..1efec6b 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -51,9 +51,15 @@ struct inet_frag_queue {
 
 #define INETFRAGS_HASHSZ		64
 
+
+struct inet_frag_bucket {
+	struct hlist_head	chain;
+	spinlock_t		chain_lock;
+};
+
 struct inet_frags {
-	struct hlist_head	hash[INETFRAGS_HASHSZ];
-	rwlock_t		lock;
+	struct inet_frag_bucket	hash[INETFRAGS_HASHSZ];
+	rwlock_t		lock; /* Rebuild lock */
 	u32			rnd;
 	int			qsize;
 	int			secret_interval;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 1620a21..447423f 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -35,20 +35,27 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
 	unsigned long now = jiffies;
 	int i;
 
+	/* Per bucket lock NOT needed here, due to write lock protection */
 	write_lock(&f->lock);
+
 	get_random_bytes(&f->rnd, sizeof(u32));
 	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb;
 		struct inet_frag_queue *q;
 		struct hlist_node *p, *n;
 
-		hlist_for_each_entry_safe(q, p, n, &f->hash[i], list) {
+		hb = &f->hash[i];
+		hlist_for_each_entry_safe(q, p, n, &hb->chain, list) {
 			unsigned int hval = f->hashfn(q);
 
 			if (hval != i) {
+				struct inet_frag_bucket *hb_dest;
+
 				hlist_del(&q->list);
 
 				/* Relink to new hash chain. */
-				hlist_add_head(&q->list, &f->hash[hval]);
+				hb_dest = &f->hash[hval];
+				hlist_add_head(&q->list, &hb_dest->chain);
 			}
 		}
 	}
@@ -61,9 +68,12 @@ void inet_frags_init(struct inet_frags *f)
 {
 	int i;
 
-	for (i = 0; i < INETFRAGS_HASHSZ; i++)
-		INIT_HLIST_HEAD(&f->hash[i]);
+	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb = &f->hash[i];
 
+		spin_lock_init(&hb->chain_lock);
+		INIT_HLIST_HEAD(&hb->chain);
+	}
 	rwlock_init(&f->lock);
 
 	f->rnd = (u32) ((num_physpages ^ (num_physpages>>7)) ^
@@ -102,9 +112,17 @@ EXPORT_SYMBOL(inet_frags_exit_net);
 
 static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 {
-	write_lock(&f->lock);
+	struct inet_frag_bucket *hb;
+	unsigned int hash;
+
+	read_lock(&f->lock);
+	hash = f->hashfn(fq);
+	hb = &f->hash[hash];
+
+	spin_lock_bh(&hb->chain_lock);
 	hlist_del(&fq->list);
-	write_unlock(&f->lock);
+	spin_unlock_bh(&hb->chain_lock);
+	read_unlock(&f->lock);
 	inet_frag_lru_del(fq);
 }
 
@@ -224,28 +242,33 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		struct inet_frag_queue *qp_in, struct inet_frags *f,
 		void *arg)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *qp;
 #ifdef CONFIG_SMP
 	struct hlist_node *n;
 #endif
 	unsigned int hash;
 
-	write_lock(&f->lock);
+	read_lock(&f->lock); /* Protects against hash rebuild */
 	/*
 	 * While we stayed w/o the lock other CPU could update
 	 * the rnd seed, so we need to re-calculate the hash
 	 * chain. Fortunatelly the qp_in can be used to get one.
 	 */
 	hash = f->hashfn(qp_in);
+	hb = &f->hash[hash];
+	spin_lock_bh(&hb->chain_lock);
+
 #ifdef CONFIG_SMP
 	/* With SMP race we have to recheck hash table, because
 	 * such entry could be created on other cpu, while we
-	 * promoted read lock to write lock.
+	 * promoted read lock to write lock. ***Comment FIXME***
 	 */
-	hlist_for_each_entry(qp, n, &f->hash[hash], list) {
+	hlist_for_each_entry(qp, n, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
-			write_unlock(&f->lock);
+			spin_unlock_bh(&hb->chain_lock);
+			read_unlock(&f->lock);
 			qp_in->last_in |= INET_FRAG_COMPLETE;
 			inet_frag_put(qp_in, f);
 			return qp;
@@ -257,8 +280,9 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		atomic_inc(&qp->refcnt);
 
 	atomic_inc(&qp->refcnt);
-	hlist_add_head(&qp->list, &f->hash[hash]);
-	write_unlock(&f->lock);
+	hlist_add_head(&qp->list, &hb->chain);
+	spin_unlock_bh(&hb->chain_lock);
+	read_unlock(&f->lock);
 	inet_frag_lru_add(nf, qp);
 	return qp;
 }
@@ -307,16 +331,22 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
 	__releases(&f->lock)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *q;
 	struct hlist_node *n;
 
-	hlist_for_each_entry(q, n, &f->hash[hash], list) {
+	hb = &f->hash[hash];
+
+	spin_lock_bh(&hb->chain_lock);
+	hlist_for_each_entry(q, n, &hb->chain, list) {
 		if (q->net == nf && f->match(q, key)) {
 			atomic_inc(&q->refcnt);
+			spin_unlock_bh(&hb->chain_lock);
 			read_unlock(&f->lock);
 			return q;
 		}
 	}
+	spin_unlock_bh(&hb->chain_lock);
 	read_unlock(&f->lock);
 
 	return inet_frag_create(nf, f, key);

^ permalink raw reply related

