Netdev List

* Re: [PATCH stable 3.2 3.4] ipv4: disable bh while doing route gc
From: David Miller @ 2014-10-13 16:52 UTC (permalink / raw)
  To: mleitner; +Cc: netdev, hannes
In-Reply-To: <6c3d6eca5d6a15c01393b010f2116bd169477c5a.1413215324.git.mleitner@redhat.com>

From: Marcelo Ricardo Leitner <mleitner@redhat.com>
Date: Mon, 13 Oct 2014 13:20:38 -0300

> Further tests revealed that moving the garbage collector to a work
> queue and protecting it with a spinlock may leave the system prone to
> soft lockups if the bottom half gets very busy.
> 
> It was reproduced with a set of firewall rules that REJECTed packets. If
> the NIC bottom half handler ends up running on the same CPU that is
> running the garbage collector on a very large cache, the garbage
> collector will not be able to do its job due to the amount of work
> needed for handling the REJECTs, and it also won't reschedule.
> 
> The fix is to disable bottom halves during garbage collection, as was
> the case in the first place (most calls to it came from softirqs).
> 
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Please add my:

Acked-by: David S. Miller <davem@davemloft.net>

and submit this directly to -stable, thanks.

^ permalink raw reply

* [net PATCH 3/3] drivers: net: cpsw: remove child devices while driver detach
From: Mugunthan V N @ 2014-10-13 16:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, Mugunthan V N
In-Reply-To: <1413219067-15328-1-git-send-email-mugunthanvnm@ti.com>

Remove all child devices from the system to make sure that re-inserting the
cpsw module doesn't fail on child devices populated by of_platform_populate().
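A minimal userspace sketch of why leftover children break a re-insert, assuming a simplified registry in which registering a device name that already exists fails. `populate()`, `for_each_child()` and `unregister_child()` are hypothetical stand-ins for of_platform_populate(), device_for_each_child() and cpsw_remove_child_device(), and the child names are made up; none of this is kernel API.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEVS 8

/* Hypothetical system-wide registry: device names must be unique. */
static const char *registry[MAX_DEVS];
static int registry_count;

static int registry_add(const char *name)
{
	for (int i = 0; i < registry_count; i++)
		if (strcmp(registry[i], name) == 0)
			return -1;	/* name already taken: registration fails */
	assert(registry_count < MAX_DEVS);
	registry[registry_count++] = name;
	return 0;
}

static void registry_del(const char *name)
{
	for (int i = 0; i < registry_count; i++) {
		if (strcmp(registry[i], name) == 0) {
			/* Move the last entry into the freed slot. */
			registry[i] = registry[--registry_count];
			return;
		}
	}
}

struct parent {
	const char *children[MAX_DEVS];
	int nchildren;
};

/* Stand-in for of_platform_populate(): register each DT child node. */
static int populate(struct parent *p, const char *const *names, int n)
{
	for (int i = 0; i < n; i++) {
		if (registry_add(names[i]) != 0)
			return -1;
		p->children[p->nchildren++] = names[i];
	}
	return 0;
}

/* Stand-in for device_for_each_child(): invoke fn on every child. */
static void for_each_child(struct parent *p, void (*fn)(const char *))
{
	for (int i = 0; i < p->nchildren; i++)
		fn(p->children[i]);
}

/* Stand-in for cpsw_remove_child_device(). */
static void unregister_child(const char *name)
{
	registry_del(name);
}

/*
 * Probe, remove, then probe again. Returns the result of the second
 * probe: 0 on success, -1 if a leftover child blocked it.
 */
int reprobe(int remove_children)
{
	static const char *const dt_children[] = { "child-a", "child-b" };
	struct parent p = { .nchildren = 0 };

	registry_count = 0;	/* start from an empty system */
	if (populate(&p, dt_children, 2) != 0)
		return -1;

	/* Driver detach: optionally tear down the children first. */
	if (remove_children)
		for_each_child(&p, unregister_child);
	p.nchildren = 0;

	return populate(&p, dt_children, 2);
}
```

`reprobe(0)` models the driver without this patch (the second probe trips over the stale children); `reprobe(1)` models it with the new device_for_each_child() call in cpsw_remove().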

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ab167dc..952e1e4 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2392,6 +2392,15 @@ clean_ndev_ret:
 	return ret;
 }
 
+static int cpsw_remove_child_device(struct device *dev, void *c)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+
+	of_device_unregister(pdev);
+
+	return 0;
+}
+
 static int cpsw_remove(struct platform_device *pdev)
 {
 	struct net_device *ndev = platform_get_drvdata(pdev);
@@ -2406,6 +2415,7 @@ static int cpsw_remove(struct platform_device *pdev)
 	cpdma_chan_destroy(priv->rxch);
 	cpdma_ctlr_destroy(priv->dma);
 	pm_runtime_disable(&pdev->dev);
+	device_for_each_child(&pdev->dev, NULL, cpsw_remove_child_device);
 	if (priv->data.dual_emac)
 		free_netdev(cpsw_get_slave_ndev(priv, 1));
 	free_netdev(ndev);
-- 
2.1.1.332.g0bf7dd6

^ permalink raw reply related

* [net PATCH 2/3] drivers: net: davinci_cpdma: remove spinlock as SOFTIRQ-unsafe lock order detected
From: Mugunthan V N @ 2014-10-13 16:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, Mugunthan V N
In-Reply-To: <1413219067-15328-1-git-send-email-mugunthanvnm@ti.com>

Remove the spinlock in cpdma_desc_pool_destroy(): there is no active cpdma
channel at this point, and iounmap() should be called without the lock held.

root@dra7xx-evm:~# modprobe -r ti_cpsw
[   50.539743]
[   50.541312] ======================================================
[   50.547796] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[   50.554826] 3.14.19-02124-g95c5b7b #308 Not tainted
[   50.559939] ------------------------------------------------------
[   50.566416] modprobe/1921 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[   50.573347]  (vmap_area_lock){+.+...}, at: [<c01127fc>] find_vmap_area+0x10/0x6c
[   50.581132]
[   50.581132] and this task is already holding:
[   50.587249]  (&(&pool->lock)->rlock#2){..-...}, at: [<bf017c74>] cpdma_ctlr_destroy+0x5c/0x114 [davinci_cpdma]
[   50.597766] which would create a new lock dependency:
[   50.603048]  (&(&pool->lock)->rlock#2){..-...} -> (vmap_area_lock){+.+...}
[   50.610296]
[   50.610296] but this new dependency connects a SOFTIRQ-irq-safe lock:
[   50.618601]  (&(&pool->lock)->rlock#2){..-...}
... which became SOFTIRQ-irq-safe at:
[   50.626829]   [<c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c
[   50.632677]   [<bf01773c>] cpdma_desc_free.constprop.7+0x28/0x58 [davinci_cpdma]
[   50.640437]   [<bf0177e8>] __cpdma_chan_free+0x7c/0xa8 [davinci_cpdma]
[   50.647289]   [<bf017908>] __cpdma_chan_process+0xf4/0x134 [davinci_cpdma]
[   50.654512]   [<bf017984>] cpdma_chan_process+0x3c/0x54 [davinci_cpdma]
[   50.661455]   [<bf0277e8>] cpsw_poll+0x14/0xa8 [ti_cpsw]
[   50.667038]   [<c05844f4>] net_rx_action+0xc0/0x1e8
[   50.672150]   [<c0048234>] __do_softirq+0xcc/0x304
[   50.677183]   [<c004873c>] irq_exit+0xa8/0xfc
[   50.681751]   [<c000eeac>] handle_IRQ+0x50/0xb0
[   50.686513]   [<c0008638>] gic_handle_irq+0x28/0x5c
[   50.691628]   [<c06590a4>] __irq_svc+0x44/0x5c
[   50.696289]   [<c0658ab4>] _raw_spin_unlock_irqrestore+0x34/0x44
[   50.702591]   [<c065a9c4>] do_page_fault.part.9+0x144/0x3c4
[   50.708433]   [<c065acb8>] do_page_fault+0x74/0x84
[   50.713453]   [<c00083dc>] do_DataAbort+0x34/0x98
[   50.718391]   [<c065923c>] __dabt_usr+0x3c/0x40
[   50.723148]
[   50.723148] to a SOFTIRQ-irq-unsafe lock:
[   50.728893]  (vmap_area_lock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
[   50.736476] ...  [<c06584e8>] _raw_spin_lock+0x28/0x38
[   50.741876]   [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[   50.747908]   [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[   50.754210]   [<c011486c>] get_vm_area_caller+0x3c/0x48
[   50.759692]   [<c0114be0>] vmap+0x40/0x78
[   50.763900]   [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[   50.769835]   [<c093eac0>] start_kernel+0x320/0x388
[   50.774952]   [<80008074>] 0x80008074
[   50.778793]
[   50.778793] other info that might help us debug this:
[   50.778793]
[   50.787181]  Possible interrupt unsafe locking scenario:
[   50.787181]
[   50.794295]        CPU0                    CPU1
[   50.799042]        ----                    ----
[   50.803785]   lock(vmap_area_lock);
[   50.807446]                                local_irq_disable();
[   50.813652]                                lock(&(&pool->lock)->rlock#2);
[   50.820782]                                lock(vmap_area_lock);
[   50.827086]   <Interrupt>
[   50.829823]     lock(&(&pool->lock)->rlock#2);
[   50.834490]
[   50.834490]  *** DEADLOCK ***
[   50.834490]
[   50.840695] 4 locks held by modprobe/1921:
[   50.844981]  #0:  (&__lockdep_no_validate__){......}, at: [<c03e53e8>] driver_detach+0x44/0xb8
[   50.854038]  #1:  (&__lockdep_no_validate__){......}, at: [<c03e53f4>] driver_detach+0x50/0xb8
[   50.863102]  #2:  (&(&ctlr->lock)->rlock){......}, at: [<bf017c34>] cpdma_ctlr_destroy+0x1c/0x114 [davinci_cpdma]
[   50.873890]  #3:  (&(&pool->lock)->rlock#2){..-...}, at: [<bf017c74>] cpdma_ctlr_destroy+0x5c/0x114 [davinci_cpdma]
[   50.884871]
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[   50.892827] -> (&(&pool->lock)->rlock#2){..-...} ops: 167 {
[   50.898703]    IN-SOFTIRQ-W at:
[   50.901995]                     [<c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c
[   50.909476]                     [<bf01773c>] cpdma_desc_free.constprop.7+0x28/0x58 [davinci_cpdma]
[   50.918878]                     [<bf0177e8>] __cpdma_chan_free+0x7c/0xa8 [davinci_cpdma]
[   50.927366]                     [<bf017908>] __cpdma_chan_process+0xf4/0x134 [davinci_cpdma]
[   50.936218]                     [<bf017984>] cpdma_chan_process+0x3c/0x54 [davinci_cpdma]
[   50.944794]                     [<bf0277e8>] cpsw_poll+0x14/0xa8 [ti_cpsw]
[   50.952009]                     [<c05844f4>] net_rx_action+0xc0/0x1e8
[   50.958765]                     [<c0048234>] __do_softirq+0xcc/0x304
[   50.965432]                     [<c004873c>] irq_exit+0xa8/0xfc
[   50.971635]                     [<c000eeac>] handle_IRQ+0x50/0xb0
[   50.978035]                     [<c0008638>] gic_handle_irq+0x28/0x5c
[   50.984788]                     [<c06590a4>] __irq_svc+0x44/0x5c
[   50.991085]                     [<c0658ab4>] _raw_spin_unlock_irqrestore+0x34/0x44
[   50.999023]                     [<c065a9c4>] do_page_fault.part.9+0x144/0x3c4
[   51.006510]                     [<c065acb8>] do_page_fault+0x74/0x84
[   51.013171]                     [<c00083dc>] do_DataAbort+0x34/0x98
[   51.019738]                     [<c065923c>] __dabt_usr+0x3c/0x40
[   51.026129]    INITIAL USE at:
[   51.029335]                    [<c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c
[   51.036729]                    [<bf017d78>] cpdma_chan_submit+0x4c/0x2f0 [davinci_cpdma]
[   51.045225]                    [<bf02863c>] cpsw_ndo_open+0x378/0x6bc [ti_cpsw]
[   51.052897]                    [<c058747c>] __dev_open+0x9c/0x104
[   51.059287]                    [<c05876ec>] __dev_change_flags+0x88/0x160
[   51.066420]                    [<c05877e4>] dev_change_flags+0x18/0x48
[   51.073270]                    [<c05ed51c>] devinet_ioctl+0x61c/0x6e0
[   51.080029]                    [<c056ee54>] sock_ioctl+0x5c/0x298
[   51.086418]                    [<c01350a4>] do_vfs_ioctl+0x78/0x61c
[   51.092993]                    [<c01356ac>] SyS_ioctl+0x64/0x74
[   51.099200]                    [<c000e580>] ret_fast_syscall+0x0/0x48
[   51.105956]  }
[   51.107696]  ... key      at: [<bf019000>] __key.21312+0x0/0xfffff650 [davinci_cpdma]
[   51.115912]  ... acquired at:
[   51.119019]    [<c00899ac>] lock_acquire+0x9c/0x104
[   51.124138]    [<c06584e8>] _raw_spin_lock+0x28/0x38
[   51.129341]    [<c01127fc>] find_vmap_area+0x10/0x6c
[   51.134547]    [<c0114960>] remove_vm_area+0x8/0x6c
[   51.139659]    [<c0114a7c>] __vunmap+0x20/0xf8
[   51.144318]    [<c001c350>] __arm_iounmap+0x10/0x18
[   51.149440]    [<bf017d08>] cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma]
[   51.156560]    [<bf026294>] cpsw_remove+0x48/0x8c [ti_cpsw]
[   51.162407]    [<c03e62c8>] platform_drv_remove+0x18/0x1c
[   51.168063]    [<c03e4c44>] __device_release_driver+0x70/0xc8
[   51.174094]    [<c03e5458>] driver_detach+0xb4/0xb8
[   51.179212]    [<c03e4a6c>] bus_remove_driver+0x4c/0x90
[   51.184693]    [<c00b024c>] SyS_delete_module+0x10c/0x198
[   51.190355]    [<c000e580>] ret_fast_syscall+0x0/0x48
[   51.195661]
[   51.197217]
the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[   51.205986] -> (vmap_area_lock){+.+...} ops: 520 {
[   51.211032]    HARDIRQ-ON-W at:
[   51.214321]                     [<c06584e8>] _raw_spin_lock+0x28/0x38
[   51.221090]                     [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[   51.228750]                     [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[   51.236690]                     [<c011486c>] get_vm_area_caller+0x3c/0x48
[   51.243811]                     [<c0114be0>] vmap+0x40/0x78
[   51.249654]                     [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[   51.257239]                     [<c093eac0>] start_kernel+0x320/0x388
[   51.263994]                     [<80008074>] 0x80008074
[   51.269474]    SOFTIRQ-ON-W at:
[   51.272769]                     [<c06584e8>] _raw_spin_lock+0x28/0x38
[   51.279525]                     [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[   51.287190]                     [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[   51.295126]                     [<c011486c>] get_vm_area_caller+0x3c/0x48
[   51.302245]                     [<c0114be0>] vmap+0x40/0x78
[   51.308094]                     [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[   51.315669]                     [<c093eac0>] start_kernel+0x320/0x388
[   51.322423]                     [<80008074>] 0x80008074
[   51.327906]    INITIAL USE at:
[   51.331112]                    [<c06584e8>] _raw_spin_lock+0x28/0x38
[   51.337775]                    [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[   51.345352]                    [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[   51.353197]                    [<c011486c>] get_vm_area_caller+0x3c/0x48
[   51.360224]                    [<c0114be0>] vmap+0x40/0x78
[   51.365977]                    [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[   51.373464]                    [<c093eac0>] start_kernel+0x320/0x388
[   51.380131]                    [<80008074>] 0x80008074
[   51.385517]  }
[   51.387260]  ... key      at: [<c0a66948>] vmap_area_lock+0x10/0x20
[   51.393841]  ... acquired at:
[   51.396945]    [<c00899ac>] lock_acquire+0x9c/0x104
[   51.402060]    [<c06584e8>] _raw_spin_lock+0x28/0x38
[   51.407266]    [<c01127fc>] find_vmap_area+0x10/0x6c
[   51.412478]    [<c0114960>] remove_vm_area+0x8/0x6c
[   51.417592]    [<c0114a7c>] __vunmap+0x20/0xf8
[   51.422252]    [<c001c350>] __arm_iounmap+0x10/0x18
[   51.427369]    [<bf017d08>] cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma]
[   51.434487]    [<bf026294>] cpsw_remove+0x48/0x8c [ti_cpsw]
[   51.440336]    [<c03e62c8>] platform_drv_remove+0x18/0x1c
[   51.446000]    [<c03e4c44>] __device_release_driver+0x70/0xc8
[   51.452031]    [<c03e5458>] driver_detach+0xb4/0xb8
[   51.457147]    [<c03e4a6c>] bus_remove_driver+0x4c/0x90
[   51.462628]    [<c00b024c>] SyS_delete_module+0x10c/0x198
[   51.468289]    [<c000e580>] ret_fast_syscall+0x0/0x48
[   51.473584]
[   51.475140]
[   51.475140] stack backtrace:
[   51.479703] CPU: 0 PID: 1921 Comm: modprobe Not tainted 3.14.19-02124-g95c5b7b #308
[   51.487744] [<c0016090>] (unwind_backtrace) from [<c0012060>] (show_stack+0x10/0x14)
[   51.495865] [<c0012060>] (show_stack) from [<c0652a20>] (dump_stack+0x78/0x94)
[   51.503444] [<c0652a20>] (dump_stack) from [<c0086f18>] (check_usage+0x408/0x594)
[   51.511293] [<c0086f18>] (check_usage) from [<c00870f8>] (check_irq_usage+0x54/0xb0)
[   51.519416] [<c00870f8>] (check_irq_usage) from [<c0088724>] (__lock_acquire+0xe54/0x1b90)
[   51.528077] [<c0088724>] (__lock_acquire) from [<c00899ac>] (lock_acquire+0x9c/0x104)
[   51.536291] [<c00899ac>] (lock_acquire) from [<c06584e8>] (_raw_spin_lock+0x28/0x38)
[   51.544417] [<c06584e8>] (_raw_spin_lock) from [<c01127fc>] (find_vmap_area+0x10/0x6c)
[   51.552726] [<c01127fc>] (find_vmap_area) from [<c0114960>] (remove_vm_area+0x8/0x6c)
[   51.560935] [<c0114960>] (remove_vm_area) from [<c0114a7c>] (__vunmap+0x20/0xf8)
[   51.568693] [<c0114a7c>] (__vunmap) from [<c001c350>] (__arm_iounmap+0x10/0x18)
[   51.576362] [<c001c350>] (__arm_iounmap) from [<bf017d08>] (cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma])
[   51.586494] [<bf017d08>] (cpdma_ctlr_destroy [davinci_cpdma]) from [<bf026294>] (cpsw_remove+0x48/0x8c [ti_cpsw])
[   51.597261] [<bf026294>] (cpsw_remove [ti_cpsw]) from [<c03e62c8>] (platform_drv_remove+0x18/0x1c)
[   51.606659] [<c03e62c8>] (platform_drv_remove) from [<c03e4c44>] (__device_release_driver+0x70/0xc8)
[   51.616237] [<c03e4c44>] (__device_release_driver) from [<c03e5458>] (driver_detach+0xb4/0xb8)
[   51.625264] [<c03e5458>] (driver_detach) from [<c03e4a6c>] (bus_remove_driver+0x4c/0x90)
[   51.633749] [<c03e4a6c>] (bus_remove_driver) from [<c00b024c>] (SyS_delete_module+0x10c/0x198)
[   51.642781] [<c00b024c>] (SyS_delete_module) from [<c000e580>] (ret_fast_syscall+0x0/0x48)
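To make the report above easier to follow, here is a heavily simplified userspace model of the rule lockdep applied, assuming each lock class carries only two usage bits matching the IN-SOFTIRQ-W and SOFTIRQ-ON-W entries printed above. Real lockdep tracks many more states and full dependency chains; `struct lock_class` here and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified per-lock-class usage state. */
struct lock_class {
	const char *name;
	bool in_softirq;	/* IN-SOFTIRQ-W seen: lock is SOFTIRQ-safe */
	bool softirq_on;	/* SOFTIRQ-ON-W seen: lock is SOFTIRQ-unsafe */
};

/*
 * True if acquiring 'next' while holding 'held' adds a
 * SOFTIRQ-safe -> SOFTIRQ-unsafe dependency. That is the deadlock in
 * the report: CPU0 holds 'next' with softirqs enabled and a softirq on
 * CPU0 spins on 'held', while CPU1 holds 'held' and waits for 'next'.
 */
bool softirq_inversion(const struct lock_class *held,
		       const struct lock_class *next)
{
	assert(held != NULL && next != NULL);
	return held->in_softirq && next->softirq_on;
}

/*
 * Model the teardown path: iounmap() ends up taking vmap_area_lock,
 * either under pool->lock (before the patch) or without it (after).
 */
bool teardown_reports(bool hold_pool_lock)
{
	/* Taken from cpsw_poll() in softirq context (see IN-SOFTIRQ-W). */
	const struct lock_class pool_lock = { "pool->lock", true, false };
	/* Taken with softirqs enabled during boot (see SOFTIRQ-ON-W). */
	const struct lock_class vmap_lock = { "vmap_area_lock", false, true };

	if (!hold_pool_lock)
		return false;	/* nothing held, so no new dependency */
	return softirq_inversion(&pool_lock, &vmap_lock);
}
```

Dropping pool->lock around the iounmap() call removes the offending edge entirely, which is the approach this patch takes.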

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 32dc289..657b65b 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -193,12 +193,9 @@ fail:
 
 static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
 {
-	unsigned long flags;
-
 	if (!pool)
 		return;
 
-	spin_lock_irqsave(&pool->lock, flags);
 	WARN_ON(pool->used_desc);
 	if (pool->cpumap) {
 		dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
@@ -206,7 +203,6 @@ static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
 	} else {
 		iounmap(pool->iomap);
 	}
-	spin_unlock_irqrestore(&pool->lock, flags);
 }
 
 static inline dma_addr_t desc_phys(struct cpdma_desc_pool *pool,
-- 
2.1.1.332.g0bf7dd6

^ permalink raw reply related

* [net PATCH 1/3] drivers: net: davinci_cpdma: remove kfree on objects allocated with devm_* apis
From: Mugunthan V N @ 2014-10-13 16:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, Mugunthan V N
In-Reply-To: <1413219067-15328-1-git-send-email-mugunthanvnm@ti.com>

Memory allocated with devm_* APIs must not be freed with kfree(), so
remove the kfree() call.
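A toy model of why the extra kfree() is a bug, assuming a devres-like list that releases every tracked allocation when the device is detached. `struct devres_list`, `managed_alloc()`, `manual_free()` and `detach()` are hypothetical names, not the kernel's devres API; the release counter lets the double release be observed instead of crashing.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RES 4

/* Hypothetical per-device list of managed (devm-style) allocations. */
struct devres_list {
	void *res[MAX_RES];
	int frees[MAX_RES];	/* times each allocation was released */
	int nres;
};

/* Like devm_kzalloc(): allocate zeroed memory owned by the device. */
static int managed_alloc(struct devres_list *d, size_t size)
{
	assert(d->nres < MAX_RES);
	d->res[d->nres] = calloc(1, size);
	d->frees[d->nres] = 0;
	return d->nres++;
}

/*
 * An explicit release such as the kfree(chan) removed by this patch.
 * It only records the release: calling free() here and again in
 * detach() would be a real double free (undefined behaviour).
 */
static void manual_free(struct devres_list *d, int slot)
{
	d->frees[slot]++;
}

/* Driver detach: the devres core releases everything it tracks. */
static void detach(struct devres_list *d)
{
	for (int i = 0; i < d->nres; i++) {
		free(d->res[i]);
		d->res[i] = NULL;
		d->frees[i]++;
	}
	d->nres = 0;
}

/* How many times the channel memory is released over its lifetime. */
int chan_release_count(int with_manual_kfree)
{
	struct devres_list d = { .nres = 0 };
	int chan = managed_alloc(&d, 64);	/* the channel object */

	if (with_manual_kfree)
		manual_free(&d, chan);	/* cpdma_chan_destroy() before the fix */
	detach(&d);
	return d.frees[chan];
}
```

With the manual release the channel is released twice; without it, exactly once, by the devres cleanup.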

Fixes: e194312854ed ('drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().')

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 4a000f6..32dc289 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -561,7 +561,6 @@ int cpdma_chan_destroy(struct cpdma_chan *chan)
 		cpdma_chan_stop(chan);
 	ctlr->channels[chan->chan_num] = NULL;
 	spin_unlock_irqrestore(&ctlr->lock, flags);
-	kfree(chan);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(cpdma_chan_destroy);
-- 
2.1.1.332.g0bf7dd6

^ permalink raw reply related

* [net PATCH 0/3] bug fixes for davinci_cpdma and cpsw drivers
From: Mugunthan V N @ 2014-10-13 16:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, Mugunthan V N


Mugunthan V N (3):
  drivers: net: davinci_cpdma: remove kfree on objects allocated with
    devm_* apis
  drivers: net: davinci_cpdma: remove spinlock as SOFTIRQ-unsafe lock
    order detected
  drivers: net: cpsw: remove child devices while driver detach

 drivers/net/ethernet/ti/cpsw.c          | 10 ++++++++++
 drivers/net/ethernet/ti/davinci_cpdma.c |  5 -----
 2 files changed, 10 insertions(+), 5 deletions(-)

-- 
2.1.1.332.g0bf7dd6

^ permalink raw reply

* Re: [PATCH net-next] tg3: Add skb->xmit_more support
From: Eric Dumazet @ 2014-10-13 16:48 UTC (permalink / raw)
  To: Prashant Sreedharan; +Cc: davem, netdev, dborkman, mchan
In-Reply-To: <1413217302-15396-1-git-send-email-prashant@broadcom.com>

On Mon, 2014-10-13 at 09:21 -0700, Prashant Sreedharan wrote:
> Ring TX doorbell only if xmit_more is not set or the queue is stopped.
> 
> Suggested-by: Daniel Borkmann <dborkman@redhat.com>
> Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/tg3.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)

Have you noticed any performance change ?

I did the patch for bnx2x but got no real difference...

Thanks

^ permalink raw reply

* [PATCH net-next] tg3: Add skb->xmit_more support
From: Prashant Sreedharan @ 2014-10-13 16:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, dborkman, mchan, Prashant Sreedharan

Ring TX doorbell only if xmit_more is not set or the queue is stopped.
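A userspace simulation of the doorbell rule in this patch, assuming a hypothetical `struct txq` with a ring size and a stop threshold and no queue wake-up. `xmit()` plays the role of tg3_start_xmit(), and the doorbell counter stands in for the tw32_tx_mbox() write; it only illustrates the batching condition, not tg3's real ring handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical software model of one TX ring. */
struct txq {
	int prod;		/* descriptors queued by the driver */
	int hw_prod;		/* producer index last written to the NIC */
	int avail;		/* free descriptors left */
	int stop_thresh;	/* stop the queue when avail <= this */
	bool stopped;
	int doorbells;		/* count of doorbell (mailbox) writes */
};

/* Queue one packet and ring the doorbell only when required. */
static void xmit(struct txq *q, bool xmit_more)
{
	assert(!q->stopped && q->avail > 0);
	q->prod++;
	q->avail--;

	if (q->avail <= q->stop_thresh)
		q->stopped = true;	/* like netif_tx_stop_queue() */

	/*
	 * Same test as the patch: defer the doorbell while more packets
	 * are promised, unless the queue is stopped, because then no
	 * further xmit call would arrive to flush the batch.
	 */
	if (!xmit_more || q->stopped) {
		q->hw_prod = q->prod;
		q->doorbells++;
	}
}

struct burst_result {
	int doorbells;
	int hw_prod;
};

/* Send n packets; with xmit_more, all but the last carry the hint. */
struct burst_result run_burst(int n, int avail, int stop_thresh,
			      bool use_xmit_more)
{
	struct txq q = { .avail = avail, .stop_thresh = stop_thresh };
	struct burst_result r;

	/* The stack stops handing us packets once the queue is stopped. */
	for (int i = 0; i < n && !q.stopped; i++)
		xmit(&q, use_xmit_more && i < n - 1);

	r.doorbells = q.doorbells;
	r.hw_prod = q.hw_prod;
	return r;
}
```

With room in the ring, `run_burst(8, 64, 2, true)` rings once for eight packets instead of eight times; `run_burst(8, 5, 2, true)` stops after three packets and still flushes them, which is what the netif_xmit_stopped() half of the condition guards.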

Suggested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index ba49948..dbb41c1 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8099,9 +8099,6 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Sync BD data before updating mailbox */
 	wmb();
 
-	/* Packets are ready, update Tx producer idx local and on card. */
-	tw32_tx_mbox(tnapi->prodmbox, entry);
-
 	tnapi->tx_prod = entry;
 	if (unlikely(tg3_tx_avail(tnapi) <= (MAX_SKB_FRAGS + 1))) {
 		netif_tx_stop_queue(txq);
@@ -8116,7 +8113,12 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			netif_tx_wake_queue(txq);
 	}
 
-	mmiowb();
+	if (!skb->xmit_more || netif_xmit_stopped(txq)) {
+		/* Packets are ready, update Tx producer idx on card. */
+		tw32_tx_mbox(tnapi->prodmbox, entry);
+		mmiowb();
+	}
+
 	return NETDEV_TX_OK;
 
 dma_error:
-- 
1.7.1

^ permalink raw reply related

* [PATCH stable 3.2 3.4] ipv4: disable bh while doing route gc
From: Marcelo Ricardo Leitner @ 2014-10-13 16:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes

Further tests revealed that moving the garbage collector to a work
queue and protecting it with a spinlock may leave the system prone to
soft lockups if the bottom half gets very busy.

It was reproduced with a set of firewall rules that REJECTed packets. If
the NIC bottom half handler ends up running on the same CPU that is
running the garbage collector on a very large cache, the garbage
collector will not be able to do its job due to the amount of work
needed for handling the REJECTs, and it also won't reschedule.

The fix is to disable bottom halves during garbage collection, as was
the case in the first place (most calls to it came from softirqs).

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---

Notes:
    Hi Dave,

    This is needed for stables 3.2 and 3.4, as those are the ones to which
    we applied the previous patches:
        ipv4: move route garbage collector to work queue
        ipv4: avoid parallel route cache gc executions

    Thanks.

 net/ipv4/route.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9e7909eef8d10107008f8d629f9f2d75fde52eb2..6c34bc98bce7147cf6c242439f2afeb9adf28c72 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -998,7 +998,7 @@ static void __do_rt_garbage_collect(int elasticity, int min_interval)
 	 * do not make it too frequently.
 	 */

-	spin_lock(&rt_gc_lock);
+	spin_lock_bh(&rt_gc_lock);

 	RT_CACHE_STAT_INC(gc_total);

@@ -1101,7 +1101,7 @@ work_done:
 	    dst_entries_get_slow(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh)
 		expire = ip_rt_gc_timeout;
 out:
-	spin_unlock(&rt_gc_lock);
+	spin_unlock_bh(&rt_gc_lock);
 }

 static void __rt_garbage_collect(struct work_struct *w)
-- 
1.9.3

^ permalink raw reply related

* [PATCH] nf_conntrack_proto_tcp: allow server to become a client in TW handling
From: Marcelo Ricardo Leitner @ 2014-10-13 16:09 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <5419B546.40702@redhat.com>

When a port that was used to listen for inbound connections gets closed
and reused for outgoing connections (as rsh ends up doing for its stderr
flow), we currently may reject the SYN/ACK packet for the new connection
because the tcp_conntracks state table forbids a port from becoming a
client while there is still a TIME_WAIT entry for it.

Since TCP may expire the TIME_WAIT socket after 60s while conntrack's
timeout for it is 120s, there is a ~60s window in which the application
can open a port that conntrack will then block.

This patch fixes this by simply allowing such a state transition: if we
see a SYN in the REPLY direction while in TIME_WAIT state, move to sSS.
Note that the rest of the code already handles this situation, more
specifically in the first switch clause of tcp_packet().
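A self-contained model of the single table cell this patch changes, assuming only the REPLY-direction SYN row matters. The enum follows the column order in the tcp_conntracks comment (sNO ... sS2), with sIV meaning "invalid"; `reply_syn_next()` is a hypothetical helper, not a conntrack function, and the rest of tcp_packet() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* States in the column order used by the tcp_conntracks table. */
enum ct_state { sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2,
		CT_MAX, sIV = CT_MAX };

/* REPLY-direction SYN row before the patch. */
static const enum ct_state reply_syn_old[CT_MAX] = {
	sIV, sS2, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sS2
};

/* REPLY-direction SYN row after the patch: sTW -> sSS. */
static const enum ct_state reply_syn_new[CT_MAX] = {
	sIV, sS2, sIV, sIV, sIV, sIV, sIV, sSS, sIV, sS2
};

/* Next state for a SYN seen in the REPLY direction. */
enum ct_state reply_syn_next(enum ct_state cur, bool patched)
{
	assert(cur < CT_MAX);
	return patched ? reply_syn_new[cur] : reply_syn_old[cur];
}
```

Only the sTW column differs between the two rows; the simultaneous-open case (sSS -> sS2) is unchanged.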

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 44d1ea32570a07338dc39f34624bd823b6f76916..d87b6423ffb21e0f8f9b6ef25ef51c1cb5f54ad6 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -213,7 +213,7 @@ static const u8 tcp_conntracks[2][6][TCP_CONNTRACK_MAX] = {
 	{
 /* REPLY */
 /* 	     sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2	*/
-/*syn*/	   { sIV, sS2, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sS2 },
+/*syn*/	   { sIV, sS2, sIV, sIV, sIV, sIV, sIV, sSS, sIV, sS2 },
 /*
  *	sNO -> sIV	Never reached.
  *	sSS -> sS2	Simultaneous open
@@ -223,7 +223,7 @@ static const u8 tcp_conntracks[2][6][TCP_CONNTRACK_MAX] = {
  *	sFW -> sIV
  *	sCW -> sIV
  *	sLA -> sIV
- *	sTW -> sIV	Reopened connection, but server may not do it.
+ *	sTW -> sSS	Reopened connection, but server may have switched role
  *	sCL -> sIV
  */
 /* 	     sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2	*/
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH] ipv4: dst_entry leak in ip_append_data()
From: David Miller @ 2014-10-13 16:03 UTC (permalink / raw)
  To: vvs; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eric.dumazet
In-Reply-To: <543BA6BE.3040509@parallels.com>

From: Vasily Averin <vvs@parallels.com>
Date: Mon, 13 Oct 2014 14:17:34 +0400

> @@ -1152,9 +1161,14 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
>  		transhdrlen = 0;
>  	}
>  
> -	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
> +	err = __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
>  				sk_page_frag(sk), getfrag,
>  				from, length, transhdrlen, flags);

If you are changing the column of the opening parenthesis of the function
call, you must adjust the indentation of the arguments on the subsequent
lines so that they start exactly at the first column after that opening
parenthesis.

Thanks.

^ permalink raw reply

* Re: [PATCH 1/2] netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h
From: Josh Boyer @ 2014-10-13 16:01 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, netfilter-devel, David Miller, netdev,
	Eric Dumazet
In-Reply-To: <20141013155510.GA26105@breakpoint.cc>

On Mon, Oct 13, 2014 at 11:55 AM, Florian Westphal <fw@strlen.de> wrote:
> Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> On Thu, Oct 9, 2014 at 2:27 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c
>> >
>> > Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
>> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
>> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>> > Acked-by: Eric Dumazet <edumazet@google.com>
>> > ---
>> >  include/net/netfilter/ipv6/nf_reject.h |  157 +-------------------------------
>> >  1 file changed, 2 insertions(+), 155 deletions(-)
>>
>> Hi All,
>>
>> This morning I was testing a kernel build from Linus' tree as of Linux
>> v3.17-7639-g90eac7eee2f4.  When I rebooted my test machines, I
>> couldn't ssh back into any of them.  I poked around a bit and noticed
>> that it seems the iptables rules weren't getting loaded properly.
>> Traffic out worked fine, and I could ping the machine, but other
>> incoming traffic was blocked.  Then I saw that the ip6t_REJECT and
>> ip6t_rpfilter modules were not being loaded on the bad kernel.
>> Looking in dmesg I see:
>>
>> [   14.619028] nf_reject_ipv6: module license 'unspecified' taints kernel.
>> [   14.619125] nf_reject_ipv6: Unknown symbol ip6_local_out (err 0)
>
> Ouch. ip6_local_out is EXPORT_SYMBOL_GPL.
>
> http://patchwork.ozlabs.org/patch/398501/
>
> should fix this.

I believe you're correct.  I dug in some more and I was just about to
send a similar patch.  I'll add it on top of my builds and test it
out.  Thanks for the pointer.

josh

^ permalink raw reply

* Re: [PATCH 1/2] netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h
From: Florian Westphal @ 2014-10-13 15:55 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Pablo Neira Ayuso, netfilter-devel, David Miller, netdev,
	Eric Dumazet
In-Reply-To: <CA+5PVA7-k-HFGUUeZ3LTCmVcmR1h_=8W-_PzRO4-2dcc_cRHaQ@mail.gmail.com>

Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Thu, Oct 9, 2014 at 2:27 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c
> >
> > Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > Acked-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  include/net/netfilter/ipv6/nf_reject.h |  157 +-------------------------------
> >  1 file changed, 2 insertions(+), 155 deletions(-)
> 
> Hi All,
> 
> This morning I was testing a kernel build from Linus' tree as of Linux
> v3.17-7639-g90eac7eee2f4.  When I rebooted my test machines, I
> couldn't ssh back into any of them.  I poked around a bit and noticed
> that it seems the iptables rules weren't getting loaded properly.
> Traffic out worked fine, and I could ping the machine, but other
> incoming traffic was blocked.  Then I saw that the ip6t_REJECT and
> ip6t_rpfilter modules were not being loaded on the bad kernel.
> Looking in dmesg I see:
> 
> [   14.619028] nf_reject_ipv6: module license 'unspecified' taints kernel.
> [   14.619125] nf_reject_ipv6: Unknown symbol ip6_local_out (err 0)

Ouch. ip6_local_out is EXPORT_SYMBOL_GPL.

http://patchwork.ozlabs.org/patch/398501/

should fix this.
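A simplified model of the failure in the dmesg output above, assuming the loader refuses to bind a GPL-only export to a module whose license is missing ("unspecified") or not GPL-compatible. The symbol table, the license subset and `resolve()` are hypothetical; the real check lives in the kernel's module loader and accepts more license strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exported-symbol table entry. */
struct ksym {
	const char *name;
	bool gpl_only;	/* exported with EXPORT_SYMBOL_GPL */
};

static const struct ksym exports[] = {
	{ "ip6_local_out", true },
	{ "some_plain_export", false },	/* hypothetical EXPORT_SYMBOL */
};

/* Subset of the license strings the kernel treats as GPL-compatible. */
static bool license_gpl_compatible(const char *license)
{
	static const char *const ok[] = { "GPL", "GPL v2", "Dual BSD/GPL" };

	if (license == NULL)	/* no MODULE_LICENSE(): "unspecified" */
		return false;
	for (size_t i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
		if (strcmp(license, ok[i]) == 0)
			return true;
	return false;
}

/* Return true if a module with this license may link to 'name'. */
bool resolve(const char *name, const char *license)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(exports) / sizeof(exports[0]); i++) {
		if (strcmp(exports[i].name, name) != 0)
			continue;
		/* GPL-only exports are invisible to non-GPL modules. */
		return !exports[i].gpl_only ||
		       license_gpl_compatible(license);
	}
	return false;	/* reported as "Unknown symbol" */
}
```

In this model a module with no license string cannot bind ip6_local_out, matching the "Unknown symbol ip6_local_out" line, while the same module can still use plain EXPORT_SYMBOL exports.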

^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Lluís Batlle i Rossell @ 2014-10-13 15:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ezequiel Garcia, Andrew Lunn, linux-kernel, netdev,
	Carles Pagès, linux-arm-kernel
In-Reply-To: <1413211759.9362.103.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, Oct 13, 2014 at 07:49:19AM -0700, Eric Dumazet wrote:
> On Mon, 2014-10-13 at 16:31 +0200, Lluís Batlle i Rossell wrote:
> > Enabling tx offload and disabling tx-nocache-copy, making the machine *send* a
> > lot of ssh traffic (sftp for example) makes ssh fail HMAC. It's quite easy to
> > reproduce here.
> > 
> > As for the hardware, it's an old sheevaplug board.
> 
> 
> Have you tried disabling TSO only, and are you using the latest kernel ?
> 
> Ezequiel Garcia added a lot of changes recently.
> 
> 

Is TSO TCP segmentation offload? It's disabled. The kernel is 3.16.3 (debian).
https://packages.debian.org/testing/kernel/linux-image-3.16-2-kirkwood

^ permalink raw reply

* Re: [PATCH 1/2] netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h
From: Josh Boyer @ 2014-10-13 15:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, David Miller, netdev, Eric Dumazet
In-Reply-To: <1412879261-25045-2-git-send-email-pablo@netfilter.org>

[-- Attachment #1: Type: text/plain, Size: 2111 bytes --]

On Thu, Oct 9, 2014 at 2:27 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c
>
> Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Acked-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/net/netfilter/ipv6/nf_reject.h |  157 +-------------------------------
>  1 file changed, 2 insertions(+), 155 deletions(-)

Hi All,

This morning I was testing a kernel build from Linus' tree as of Linux
v3.17-7639-g90eac7eee2f4.  When I rebooted my test machines, I
couldn't ssh back into any of them.  I poked around a bit and noticed
that it seems the iptables rules weren't getting loaded properly.
Traffic out worked fine, and I could ping the machine, but other
incoming traffic was blocked.  Then I saw that the ip6t_REJECT and
ip6t_rpfilter modules were not being loaded on the bad kernel.
Looking in dmesg I see:

[   14.619028] nf_reject_ipv6: module license 'unspecified' taints kernel.
[   14.619125] nf_reject_ipv6: Unknown symbol ip6_local_out (err 0)

So I did a git bisect and it pointed to this patch:

[jwboyer@obiwan linux]$ git bisect bad
91c1a09b33c902e20e09d9742560cc238a714de5 is the first bad commit
commit 91c1a09b33c902e20e09d9742560cc238a714de5
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Oct 7 18:48:12 2014 +0200

    netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h

    nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c

    Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
    Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Acked-by: Eric Dumazet <edumazet@google.com>

:040000 040000 ab5b61e7ba562e0b22d781a4322ed73c657878dc 71822a7e785c408dd55a6c4600144573de500512 M      include
[jwboyer@obiwan linux]$

I've attached the bisect log.

Perhaps one too many header files were trimmed in this case?

josh

[-- Attachment #2: BISECT_LOG --]
[-- Type: application/octet-stream, Size: 2277 bytes --]

git bisect start
# bad: [90eac7eee2f4257644dcfb9d22348fded7c24afd] Merge tag 'ftracetest-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect bad 90eac7eee2f4257644dcfb9d22348fded7c24afd
# good: [c798360cd1438090d51eeaa8e67985da11362eba] Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
git bisect good c798360cd1438090d51eeaa8e67985da11362eba
# good: [a2ce35273c2f1aa0dcddd8822681d64ee5f31852] Merge tag 'sound-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good a2ce35273c2f1aa0dcddd8822681d64ee5f31852
# good: [90d0c376f5ee1927327b267faf15bf970476f09e] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
git bisect good 90d0c376f5ee1927327b267faf15bf970476f09e
# good: [d53ba6b3bba33432cc37b7101a86f8f3392c46e7] cxl: Fix afu_read() not doing finish_wait() on signal or non-blocking
git bisect good d53ba6b3bba33432cc37b7101a86f8f3392c46e7
# good: [052db7ec86dff26f734031c3ef5c2c03a94af0af] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good 052db7ec86dff26f734031c3ef5c2c03a94af0af
# bad: [2403077d47991a8385789779ee5fc90b003f9fbe] Merge branch 'xgene'
git bisect bad 2403077d47991a8385789779ee5fc90b003f9fbe
# good: [b71b12dce200e4709bd9f709e71c84dcb2cf8a82] networking: fm10k: Fix build failure
git bisect good b71b12dce200e4709bd9f709e71c84dcb2cf8a82
# good: [4511a4a50e1a8757f771681c3e92dbf5a928eeac] Merge tag 'master-2014-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
git bisect good 4511a4a50e1a8757f771681c3e92dbf5a928eeac
# bad: [b61d18904e2a99ed16b6e97d5419f1db19e08bd2] MAINTAINERS: Update APM X-Gene section
git bisect bad b61d18904e2a99ed16b6e97d5419f1db19e08bd2
# bad: [f0d1f04f0a2f662b6b617e24d115fddcf6ef8723] netfilter: fix wrong arithmetics regarding NFT_REJECT_ICMPX_MAX
git bisect bad f0d1f04f0a2f662b6b617e24d115fddcf6ef8723
# bad: [91c1a09b33c902e20e09d9742560cc238a714de5] netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h
git bisect bad 91c1a09b33c902e20e09d9742560cc238a714de5
# first bad commit: [91c1a09b33c902e20e09d9742560cc238a714de5] netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h

^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Eric Dumazet @ 2014-10-13 14:49 UTC (permalink / raw)
  To: Lluís Batlle i Rossell, Ezequiel Garcia
  Cc: Andrew Lunn, linux-kernel, netdev, Carles Pagès,
	linux-arm-kernel
In-Reply-To: <20141013143138.GJ1972@vicerveza.homeunix.net>

On Mon, 2014-10-13 at 16:31 +0200, Lluís Batlle i Rossell wrote:
> Enabling tx offload and disabling tx-nocache-copy, making the machine *send* a
> lot of ssh traffic (sftp for example) makes ssh fail HMAC. It's quite easy to
> reproduce here.
> 
> As for the hardware, it's an old sheevaplug board.


Have you tried disabling TSO only, and are you using the latest kernel?

Ezequiel Garcia added a lot of changes recently.

^ permalink raw reply

* Re: [patch net] ipv4: fix nexthop attlen check in fib_nh_match
From: Jiri Pirko @ 2014-10-13 14:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Graf, netdev, davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1413202966.9362.80.camel@edumazet-glaptop2.roam.corp.google.com>

Mon, Oct 13, 2014 at 02:22:46PM CEST, eric.dumazet@gmail.com wrote:
>On Mon, 2014-10-13 at 11:54 +0200, Jiri Pirko wrote:
>> fib_nh_match does not match nexthops correctly. Example:
>> 
>> This command is not successful and route is removed. After this patch
>> applied, the route is correctly matched and result is:
>> RTNETLINK answers: No such process
>> 
>> Please consider this for stable trees as well.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  net/ipv4/fib_semantics.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
>> index 5b6efb3..f99f41b 100644
>> --- a/net/ipv4/fib_semantics.c
>> +++ b/net/ipv4/fib_semantics.c
>> @@ -537,7 +537,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
>>  			return 1;
>>  
>>  		attrlen = rtnh_attrlen(rtnh);
>> -		if (attrlen < 0) {
>> +		if (attrlen > 0) {
>>  			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
>>  
>>  			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
>
>Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
>
>Good catch, thanks !
>
>Acked-by: Eric Dumazet <edumazet@google.com>


Thanks Eric. I reposted with your ack, fixes line and missing example.

^ permalink raw reply

* [patch net repost] ipv4: fix nexthop attlen check in fib_nh_match
From: Jiri Pirko @ 2014-10-13 14:34 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber, edumazet, tgraf

fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
                          nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
                          nexthop via 192.168.122.15 dev eth0

The del command succeeds and the route is removed. With this patch
applied, the nexthops are matched correctly and the result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
---
reposted with example (it was missing for some reason in the original post)

 net/ipv4/fib_semantics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 5b6efb3..f99f41b 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -537,7 +537,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
 			return 1;
 
 		attrlen = rtnh_attrlen(rtnh);
-		if (attrlen < 0) {
+		if (attrlen > 0) {
 			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
 
 			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
-- 
1.9.3

^ permalink raw reply related

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Lluís Batlle i Rossell @ 2014-10-13 14:31 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <20141013142156.GE26864@lunn.ch>

With tx offload enabled and tx-nocache-copy disabled, making the machine *send*
a lot of ssh traffic (sftp, for example) makes ssh fail the HMAC check. It's
quite easy to reproduce here.

As for the hardware, it's an old sheevaplug board.

On Mon, Oct 13, 2014 at 04:21:56PM +0200, Andrew Lunn wrote:
> On Mon, Oct 13, 2014 at 12:52:46PM +0200, Lluís Batlle i Rossell wrote:
> > Hello,
> > 
> > on the 7th of January 2014 ths patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> > 
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >         
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > sent corrupted. I think this machine has something special about the cache.
> 
> Hi Lluís
> 
> Please could you describe your test setup. I would like to try to
> reproduce the problem. I have a machine based on kirkwood 6282 and the
> same ethernet.
> 
> Thanks
> 	Andrew

^ permalink raw reply

* Network optimality (was Re: [PATCH net-next] qdisc: validate skb without holding lock_
From: Dave Taht @ 2014-10-13 14:22 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Miller, Eric Dumazet, netdev@vger.kernel.org, Tom Herbert,
	Hannes Frederic Sowa, Florian Westphal, Daniel Borkmann,
	Jamal Hadi Salim, Alexander Duyck, John Fastabend,
	Toke Høiland-Jørgensen

When I first got cc'd on these threads and saw netperf-wrapper being used
on them, I thought: "Oh god, I've created a monster." My intent in helping
create such a measurement tool was not to routinely drive a network to
saturation, but to be able to measure the impact on latency of doing so.

I was trying to get reasonable behavior when a "router" went into overload.

Servers, on the other hand, have more options to avoid overload than
routers do. There's been a great deal of really nice work on that
front. I love all that.

And I like BQL because it provides enough backpressure to be able to
do smarter things about scheduling packets higher in the stack (life
pre-BQL cost some hair).

But Tom once told me "BQL's objective is to keep the hardware busy".
It uses an MIAD controller instead of a saner AIMD one; in particular,
I'd much rather it ramped down to smaller values after
absorbing a burst.

My objective is always to keep the *network's behavior optimal*:
minimizing bursts and the subsequent tail loss on the other side,
responding quickly to loss, and doing that by
preserving to the highest extent possible the ack clocking that a
fluid model has. I LOVE BQL for providing more backpressure than has
ever existed before, and I know it's incredibly difficult to have fluid models
in a conventional CPU architecture that has to do other stuff.

But in order to get the best results for network behavior I'm willing to
sacrifice a great deal of CPU, interrupts, whatever it takes, to get the
most packets to all the destinations specified, whatever the workload,
with the *minimum amount of latency between ack and reply* possible.

What I'd hoped for in the new bulking and RCU work was to see a net
reduction in TSO/GSO size and/or BQL's size. I also kept hoping for some
profiles of sch_fq, and for more complex benchmarking of dozens or hundreds
of realistically sized TCP flows (in both directions) to exercise it all.

Some of the data presented showed that, with TSO and GSO in use, a single
BQL'd queue was >400K, and 128K with hardware multi-queue; with hardware
multi-queue and no TSO/GSO, BQL was closer to 30K.

This said to me that the maximum "right" size for a TSO/GSO "packet" was
closer to 12k in this environment, and the right size for BQL, 30k,
before it started exerting backpressure to the qdisc.

This would reduce the potential inter-flow network latency by a factor
of 10 on the single hw queue scenario, and 4 in the multi queue one.

It would probably cost some interrupts and, in scenarios lacking
packet loss, some throughput; but in other scenarios with lots of flows,
each flow will ramp up in speed faster as you reduce the RTTs.
Paying attention to this will also push profiling activities into
areas of the stack that might be profitable.

I would very much like to have profiles of what happens now, both here and
elsewhere in the stack, with this new code with TSO/GSO sizes capped
thusly, BQL capped to 30k, and a smarter qdisc like fq in use.

2) Most of the time, a server is not driving the wire to saturation. If
   it is, you are doing something wrong. The BQL queues are empty, or
   nearly so, so the instant someone creates a qdisc queue, it
   drains.

   But if there are two or more flows under contention, creating a qdisc
   queue that better multiplexes the results is highly desirable, and the
   stack should be smart enough to make that overload last only briefly.

   This is part of why I'm not fond of the deep and persistent BQL queues
   we get today.

3) Pure ack-only workloads are rare. It is a useful test case, but...

4) I thought the ring-cleanup optimization was rather interesting and
   could be made more dynamic.

5) I remain amazed at the vast improvements in throughput, reductions in
   interrupts, lockless operation and the RCU stuff that have come out of
   this so far, but had to make these points in the hope that the big
   picture is retained.

It does no good to blast packets through the network unless there is a
high probability that they will actually be received on the other side.

thanks for listening.

^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Andrew Lunn @ 2014-10-13 14:21 UTC (permalink / raw)
  To: Lluís Batlle i Rossell
  Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <20141013105246.GD1972@vicerveza.homeunix.net>

On Mon, Oct 13, 2014 at 12:52:46PM +0200, Lluís Batlle i Rossell wrote:
> Hello,
> 
> on the 7th of January 2014 ths patch was applied:
> https://lkml.org/lkml/2014/1/7/307
> 
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>         
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.

Hi Lluís

Please could you describe your test setup. I would like to try to
reproduce the problem. I have a machine based on kirkwood 6282 and the
same ethernet.

Thanks
	Andrew

^ permalink raw reply

* [PATCH net] tcp: TCP Small Queues and strange attractors
From: Eric Dumazet @ 2014-10-13 13:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

TCP Small Queues tries to keep the number of packets in the qdisc
as small as possible, and depends on a tasklet to feed the following
packets at TX completion time.
The choice of a tasklet was driven by latency requirements.

The TCP stack also tries to avoid reordering by locking flows with
outstanding packets in the qdisc to a given TX queue.

What can happen is that many flows get attracted to a low-performing
TX queue, and the CPU servicing TX completion has to feed packets for
all of them, making this CPU 100% busy in softirq mode.

This became particularly visible with the latest skb->xmit_more support.

The strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the flow's outstanding queue drain before feeding
additional packets, so that skb->ooo_okay can be set to allow
select_queue() to select the optimal queue.

Incoming ACKs are normally handled by different CPUs, so this patch
gives these CPUs more chance to take over the burden of feeding the
qdisc with future packets.

Tested:

lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM      eth1 595448.00 1190564.00  38381.09 1760253.12      0.00      0.00      1.00
06:16:19 AM      eth1 594858.00 1189686.00  38340.76 1758952.72      0.00      0.00      0.00
06:16:20 AM      eth1 597017.00 1194019.00  38480.79 1765370.29      0.00      0.00      1.00
06:16:21 AM      eth1 595450.00 1190936.00  38380.19 1760805.05      0.00      0.00      0.00
06:16:22 AM      eth1 596385.00 1193096.00  38442.56 1763976.29      0.00      0.00      1.00
06:16:23 AM      eth1 598155.00 1195978.00  38552.97 1768264.60      0.00      0.00      0.00
06:16:24 AM      eth1 594405.00 1188643.00  38312.57 1757414.89      0.00      0.00      1.00
06:16:25 AM      eth1 593366.00 1187154.00  38252.16 1755195.83      0.00      0.00      0.00
06:16:26 AM      eth1 593188.00 1186118.00  38232.88 1753682.57      0.00      0.00      1.00
06:16:27 AM      eth1 596301.00 1192241.00  38440.94 1762733.09      0.00      0.00      0.00
Average:         eth1 595457.30 1190843.50  38381.69 1760664.84      0.00      0.00      0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
 backlog 7606336b 2513p requeues 167982 
 backlog 224072b 74p requeues 566 
 backlog 581376b 192p requeues 5598 
 backlog 181680b 60p requeues 1070 
 backlog 5305056b 1753p requeues 110166    // Here, this TX queue is attracting flows
 backlog 157456b 52p requeues 1758 
 backlog 672216b 222p requeues 3025 
 backlog 60560b 20p requeues 24541 
 backlog 448144b 148p requeues 21258 

lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect

Immediate jump to full bandwidth, and traffic is properly
shared across all TX queues.

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM      eth1 1397632.00 2795397.00  90081.87 4133031.26      0.00      0.00      1.00
06:16:47 AM      eth1 1396874.00 2793614.00  90032.99 4130385.46      0.00      0.00      0.00
06:16:48 AM      eth1 1395842.00 2791600.00  89966.46 4127409.67      0.00      0.00      1.00
06:16:49 AM      eth1 1395528.00 2791017.00  89946.17 4126551.24      0.00      0.00      0.00
06:16:50 AM      eth1 1397891.00 2795716.00  90098.74 4133497.39      0.00      0.00      1.00
06:16:51 AM      eth1 1394951.00 2789984.00  89908.96 4125022.51      0.00      0.00      0.00
06:16:52 AM      eth1 1394608.00 2789190.00  89886.90 4123851.36      0.00      0.00      1.00
06:16:53 AM      eth1 1395314.00 2790653.00  89934.33 4125983.09      0.00      0.00      0.00
06:16:54 AM      eth1 1396115.00 2792276.00  89984.25 4128411.21      0.00      0.00      1.00
06:16:55 AM      eth1 1396829.00 2793523.00  90030.19 4130250.28      0.00      0.00      0.00
Average:         eth1 1396158.40 2792297.00  89987.09 4128439.35      0.00      0.00      0.50

lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
 backlog 7900052b 2609p requeues 173287 
 backlog 878120b 290p requeues 589 
 backlog 1068884b 354p requeues 5621 
 backlog 996212b 329p requeues 1088 
 backlog 984100b 325p requeues 115316 
 backlog 956848b 316p requeues 1781 
 backlog 1080996b 357p requeues 3047 
 backlog 975016b 322p requeues 24571 
 backlog 990156b 327p requeues 21274 

(All 8 TX queues get a fair share of the traffic)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c |   26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8d4eac793700..4a7e97811d71 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -839,26 +839,38 @@ void tcp_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
 	struct tcp_sock *tp = tcp_sk(sk);
+	int wmem;
+
+	/* Keep one reference on sk_wmem_alloc.
+	 * Will be released by sk_free() from here or tcp_tasklet_func()
+	 */
+	wmem = atomic_sub_return(skb->truesize - 1, &sk->sk_wmem_alloc);
+
+	/* If this softirq is serviced by ksoftirqd, we are likely under stress.
+	 * Wait until our queues (qdisc + devices) are drained.
+	 * This gives :
+	 * - less callbacks to tcp_write_xmit(), reducing stress (batches)
+	 * - chance for incoming ACK (processed by another cpu maybe)
+	 *   to migrate this flow (skb->ooo_okay will be eventually set)
+	 */
+	if (wmem >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
+		goto out;
 
 	if (test_and_clear_bit(TSQ_THROTTLED, &tp->tsq_flags) &&
 	    !test_and_set_bit(TSQ_QUEUED, &tp->tsq_flags)) {
 		unsigned long flags;
 		struct tsq_tasklet *tsq;
 
-		/* Keep a ref on socket.
-		 * This last ref will be released in tcp_tasklet_func()
-		 */
-		atomic_sub(skb->truesize - 1, &sk->sk_wmem_alloc);
-
 		/* queue this socket to tasklet queue */
 		local_irq_save(flags);
 		tsq = &__get_cpu_var(tsq_tasklet);
 		list_add(&tp->tsq_node, &tsq->head);
 		tasklet_schedule(&tsq->tasklet);
 		local_irq_restore(flags);
-	} else {
-		sock_wfree(skb);
+		return;
 	}
+out:
+	sk_free(sk);
 }
 
 /* This routine actually transmits TCP packets queued in by

^ permalink raw reply related

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Lluís Batlle i Rossell @ 2014-10-13 12:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <1413203171.9362.81.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, Oct 13, 2014 at 05:26:11AM -0700, Eric Dumazet wrote:
> On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> > Hello,
> > 
> > on the 7th of January 2014 ths patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> > 
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >         
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > sent corrupted. I think this machine has something special about the cache.
> > 
> > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > transfers work fine again. I think that most people, encountering this problem,
> > completely disable the tx offload instead of enabling back this setting.
> > 
> > Is this an ARM kernel problem regarding this platform?
> 
> Which NIC and driver is this exactly ?

According to dmesg in 3.10.1:
[    7.858872] mv643xx_eth: MV-643xx 10/100/1000 ethernet driver version 1.4
[    7.866001] mv643xx_eth_port mv643xx_eth_port.0 eth0: port 0 with MAC address 00:50:43:01:d1:bb

Regards,
Lluís.

^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Eric Dumazet @ 2014-10-13 12:26 UTC (permalink / raw)
  To: Lluís Batlle i Rossell
  Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <20141013105246.GD1972@vicerveza.homeunix.net>

On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> Hello,
> 
> on the 7th of January 2014 ths patch was applied:
> https://lkml.org/lkml/2014/1/7/307
> 
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>         
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.
> 
> Enabling back this tx-nocache-copy (as it used to be before the patch) the
> transfers work fine again. I think that most people, encountering this problem,
> completely disable the tx offload instead of enabling back this setting.
> 
> Is this an ARM kernel problem regarding this platform?

Which NIC and driver is this exactly ?

^ permalink raw reply

* Re: [patch net] ipv4: fix nexthop attlen check in fib_nh_match
From: Eric Dumazet @ 2014-10-13 12:22 UTC (permalink / raw)
  To: Jiri Pirko, Thomas Graf; +Cc: netdev, davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1413194063-10354-1-git-send-email-jiri@resnulli.us>

On Mon, 2014-10-13 at 11:54 +0200, Jiri Pirko wrote:
> fib_nh_match does not match nexthops correctly. Example:
> 
> This command is not successful and route is removed. After this patch
> applied, the route is correctly matched and result is:
> RTNETLINK answers: No such process
> 
> Please consider this for stable trees as well.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  net/ipv4/fib_semantics.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 5b6efb3..f99f41b 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -537,7 +537,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
>  			return 1;
>  
>  		attrlen = rtnh_attrlen(rtnh);
> -		if (attrlen < 0) {
> +		if (attrlen > 0) {
>  			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
>  
>  			nla = nla_find(attrs, attrlen, RTA_GATEWAY);

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")

Good catch, thanks !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply
