[PATCH] [IPV4] route: fix locking in rt_run

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] [IPV4] route: fix locking in rt_run_flush()
@ 2008-01-21 15:08 Joonwoo Park
  2008-01-21 10:40 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Joonwoo Park @ 2008-01-21 15:08 UTC (permalink / raw)
  To: David Miller; +Cc: Joonwoo Park, linux-netdev

The rt_run_flush() can be stucked if it was called while netdev is on the 
high load.
It's possible when pushing rtable to rt_hash is faster than pulling
from it.

The commands 'ifconfig up or ifconfig mtu' and netif_carrier_on() can
introduce soft lockup like this:

[  363.528001] BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
[  363.531492]
[  363.535027] Pid: 9, comm: events/0 Not tainted (2.6.24-rc8 #14)
[  363.538837] EIP: 0060:[<c4086a39>] EFLAGS: 00000286 CPU: 0
[  363.542762] EIP is at kfree+0xa9/0xf0
...
[  363.660815]  [<c42fb0fd>] skb_release_data+0x5d/0x90
[  363.666989]  [<c42fb7dc>] skb_release_all+0x5c/0xd0
[  363.673207]  [<c42faf8b>] __kfree_skb+0xb/0x90
[  363.679474]  [<c42fb029>] kfree_skb+0x19/0x40
[  363.685811]  [<c4322d87>] ip_rcv+0x27/0x290
[  363.692223]  [<c4300ae5>] netif_receive_skb+0x255/0x320
[  363.698759]  [<f88465aa>] e1000_clean_rx_irq+0x14a/0x4f0 [e1000]
[  363.705456]  [<f88437c2>] e1000_clean+0x62/0x270 [e1000]
[  363.712217]  [<c43031ee>] net_rx_action+0x16e/0x220
[  363.719065]  [<c40346d7>] __do_softirq+0x87/0x100
[  363.726001]  [<c40347a7>] do_softirq+0x57/0x60
[  363.732979]  [<c4034b4e>] local_bh_enable_ip+0xae/0x100
[  363.740094]  [<c43e73f5>] _spin_unlock_bh+0x25/0x30
[  363.747283]  [<c431ec88>] rt_run_flush+0xc8/0xe0
[  363.754566]  [<c4320c76>] rt_cache_flush+0xd6/0xe0
[  363.761917]  [<c4350269>] fib_netdev_event+0x89/0xa0
[  363.769361]  [<c4047d67>] notifier_call_chain+0x37/0x80
...

This patch makes rt_run_flush() to run with softirq is disabled.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
---
 net/ipv4/route.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 28484f3..454e71c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -620,18 +620,20 @@ static void rt_run_flush(unsigned long dummy)
 
 	get_random_bytes(&rt_hash_rnd, 4);
 
+	local_bh_disable();
 	for (i = rt_hash_mask; i >= 0; i--) {
-		spin_lock_bh(rt_hash_lock_addr(i));
+		spin_lock(rt_hash_lock_addr(i));
 		rth = rt_hash_table[i].chain;
 		if (rth)
 			rt_hash_table[i].chain = NULL;
-		spin_unlock_bh(rt_hash_lock_addr(i));
+		spin_unlock(rt_hash_lock_addr(i));
 
 		for (; rth; rth = next) {
 			next = rth->u.dst.rt_next;
 			rt_free(rth);
 		}
 	}
+	local_bh_enable();
 }
 
 static DEFINE_SPINLOCK(rt_flush_lock);
-- 
1.5.3.rc5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()
  2008-01-21 15:08 [PATCH] [IPV4] route: fix locking in rt_run_flush() Joonwoo Park
@ 2008-01-21 10:40 ` David Miller
  2008-01-21 21:14   ` Eric Dumazet
  2008-01-23  7:43   ` joonwpark81
  0 siblings, 2 replies; 6+ messages in thread
From: David Miller @ 2008-01-21 10:40 UTC (permalink / raw)
  To: joonwpark81; +Cc: netdev, dada1

From: Joonwoo Park <joonwpark81@gmail.com>
Date: Tue, 22 Jan 2008 00:08:57 +0900

> The rt_run_flush() can be stucked if it was called while netdev is on the 
> high load.
> It's possible when pushing rtable to rt_hash is faster than pulling
> from it.
> 
> The commands 'ifconfig up or ifconfig mtu' and netif_carrier_on() can
> introduce soft lockup like this:
> 
> [  363.528001] BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
> [  363.531492]
> [  363.535027] Pid: 9, comm: events/0 Not tainted (2.6.24-rc8 #14)
> [  363.538837] EIP: 0060:[<c4086a39>] EFLAGS: 00000286 CPU: 0
> [  363.542762] EIP is at kfree+0xa9/0xf0
> ...
> [  363.660815]  [<c42fb0fd>] skb_release_data+0x5d/0x90
> [  363.666989]  [<c42fb7dc>] skb_release_all+0x5c/0xd0
> [  363.673207]  [<c42faf8b>] __kfree_skb+0xb/0x90
> [  363.679474]  [<c42fb029>] kfree_skb+0x19/0x40
> [  363.685811]  [<c4322d87>] ip_rcv+0x27/0x290
> [  363.692223]  [<c4300ae5>] netif_receive_skb+0x255/0x320
> [  363.698759]  [<f88465aa>] e1000_clean_rx_irq+0x14a/0x4f0 [e1000]
> [  363.705456]  [<f88437c2>] e1000_clean+0x62/0x270 [e1000]
> [  363.712217]  [<c43031ee>] net_rx_action+0x16e/0x220
> [  363.719065]  [<c40346d7>] __do_softirq+0x87/0x100
> [  363.726001]  [<c40347a7>] do_softirq+0x57/0x60
> [  363.732979]  [<c4034b4e>] local_bh_enable_ip+0xae/0x100
> [  363.740094]  [<c43e73f5>] _spin_unlock_bh+0x25/0x30
> [  363.747283]  [<c431ec88>] rt_run_flush+0xc8/0xe0
> [  363.754566]  [<c4320c76>] rt_cache_flush+0xd6/0xe0
> [  363.761917]  [<c4350269>] fib_netdev_event+0x89/0xa0
> [  363.769361]  [<c4047d67>] notifier_call_chain+0x37/0x80
> ...
> 
> This patch makes rt_run_flush() to run with softirq is disabled.
> 
> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>

I agree with the analysis of the problem, however not the solution.

This will absolutely kill software interrupt latency.

In fact, we have moved much of the flush work into a workqueue in
net-2.6.25 because of how important that is

We need to find some other way to solve this.

Eric, any ideas?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()
  2008-01-21 10:40 ` David Miller
@ 2008-01-21 21:14   ` Eric Dumazet
  2008-01-23  7:43   ` joonwpark81
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2008-01-21 21:14 UTC (permalink / raw)
  To: David Miller; +Cc: joonwpark81, netdev

David Miller a écrit :
> From: Joonwoo Park <joonwpark81@gmail.com>
> Date: Tue, 22 Jan 2008 00:08:57 +0900
> 
>> The rt_run_flush() can be stucked if it was called while netdev is on the 
>> high load.
>> It's possible when pushing rtable to rt_hash is faster than pulling
>> from it.
>>
>> The commands 'ifconfig up or ifconfig mtu' and netif_carrier_on() can
>> introduce soft lockup like this:
>>
>> [  363.528001] BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
>> [  363.531492]
>> [  363.535027] Pid: 9, comm: events/0 Not tainted (2.6.24-rc8 #14)
>> [  363.538837] EIP: 0060:[<c4086a39>] EFLAGS: 00000286 CPU: 0
>> [  363.542762] EIP is at kfree+0xa9/0xf0
>> ...
>> [  363.660815]  [<c42fb0fd>] skb_release_data+0x5d/0x90
>> [  363.666989]  [<c42fb7dc>] skb_release_all+0x5c/0xd0
>> [  363.673207]  [<c42faf8b>] __kfree_skb+0xb/0x90
>> [  363.679474]  [<c42fb029>] kfree_skb+0x19/0x40
>> [  363.685811]  [<c4322d87>] ip_rcv+0x27/0x290
>> [  363.692223]  [<c4300ae5>] netif_receive_skb+0x255/0x320
>> [  363.698759]  [<f88465aa>] e1000_clean_rx_irq+0x14a/0x4f0 [e1000]
>> [  363.705456]  [<f88437c2>] e1000_clean+0x62/0x270 [e1000]
>> [  363.712217]  [<c43031ee>] net_rx_action+0x16e/0x220
>> [  363.719065]  [<c40346d7>] __do_softirq+0x87/0x100
>> [  363.726001]  [<c40347a7>] do_softirq+0x57/0x60
>> [  363.732979]  [<c4034b4e>] local_bh_enable_ip+0xae/0x100
>> [  363.740094]  [<c43e73f5>] _spin_unlock_bh+0x25/0x30
>> [  363.747283]  [<c431ec88>] rt_run_flush+0xc8/0xe0
>> [  363.754566]  [<c4320c76>] rt_cache_flush+0xd6/0xe0
>> [  363.761917]  [<c4350269>] fib_netdev_event+0x89/0xa0
>> [  363.769361]  [<c4047d67>] notifier_call_chain+0x37/0x80
>> ...
>>
>> This patch makes rt_run_flush() to run with softirq is disabled.
>>
>> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> 
> I agree with the analysis of the problem, however not the solution.
> 
> This will absolutely kill software interrupt latency.
> 
> In fact, we have moved much of the flush work into a workqueue in
> net-2.6.25 because of how important that is
> 
> We need to find some other way to solve this.
> 
> Eric, any ideas?

Hum... 2.6.25 is certainly better in this aspect, but I remember we left 
something to finish :)

We currently can have a worker doing the automatic flush every 600 seconds, 
and another task doing a rt_cache_flush(...)

On very loaded machines (DDOS), routes might be added faster than deleted.

Also, each change in routes must invalidate rtcache, and/but full scan of this 
cache is way too expensive (huge amount of MBytes must me read/written)

One possibility is to use a genid marker so that each entry can be thrown away 
   if its genid is different than the global one.

rt_cache_flush(-1) or rt_secret_build() would just have to increment the 
global genid.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()
  2008-01-21 10:40 ` David Miller
  2008-01-21 21:14   ` Eric Dumazet
@ 2008-01-23  7:43   ` joonwpark81
  2008-01-23 20:44     ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: joonwpark81 @ 2008-01-23  7:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet

On Mon, Jan 21, 2008 at 02:40:43AM -0800, David Miller wrote:
> From: Joonwoo Park <joonwpark81@gmail.com>
> Date: Tue, 22 Jan 2008 00:08:57 +0900
> 
> > The rt_run_flush() can be stucked if it was called while netdev is on the 
> > high load.
> > It's possible when pushing rtable to rt_hash is faster than pulling
> > from it.
> > 
> > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> 
> I agree with the analysis of the problem, however not the solution.
> 
> This will absolutely kill software interrupt latency.
> 
> In fact, we have moved much of the flush work into a workqueue in
> net-2.6.25 because of how important that is
> 
> We need to find some other way to solve this.
> 

Dave, Eric,
Thanks so much for comments.

I did stress tests and I found that the real problem was not consumer & supplier
issue.
It was the problem for me to innumerable enabling & disabling the softirq.
But I'm still thinking need of considering issue 'faster caching than flush'. :) 

ifconfig up on heavy loaded interface.
Before patching:
 time ifconfig eth1 up
 BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
 ...

After patching:
 time ifconfig eth1 up
real	0m0.007s
user	0m0.000s
sys	0m0.004s

Thanks!
Joonwoo


>From 87c29506de967e811ad5b57cd2e1a002134e878f Mon Sep 17 00:00:00 2001
From: Joonwoo Park <joonwpark81@gmail.com>
Date: Wed, 23 Jan 2008 15:16:54 +0900
Subject: [PATCH] [IPV4] route: reduce locking/unlocking in rt_run_flush

The rt_run_flush does spin_lock_bh/spin_unlock_bh for rt_hash_mask + 1
times.
The rt_hash_mask takes from 32767 to 65535, so it's big overhead.
In addition, disable_bh/enable_bh for many times in the rt_run_flush
can cause stuck on a machine with heavily pended softirqs.

This patch reduces locking/unlocking as doing it with jumping the lock
slots.

ifconfig up on heavy loaded interface.
Before:
 time ifconfig eth1 up
 BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
 ...

After:
 time ifconfig eth1 up
real	0m0.007s
user	0m0.000s
sys	0m0.004s

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
---
 net/ipv4/route.c |   38 +++++++++++++++++++++++++++-----------
 1 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 28484f3..79a401f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -239,9 +239,20 @@ static spinlock_t	*rt_hash_locks;
 		for (i = 0; i < RT_HASH_LOCK_SZ; i++) \
 			spin_lock_init(&rt_hash_locks[i]); \
 		}
+# define rt_hash_lock(lock) &rt_hash_locks[lock]
+# define rt_hash_for_each_lock(lock) \
+	for (lock = 0; lock < RT_HASH_LOCK_SZ; lock++)
+# define rt_hash_for_each_slot(slot, lock, mask) \
+	for (slot = 0, mask = lock; \
+		slot < ((int)rt_hash_mask + 1) / RT_HASH_LOCK_SZ; \
+		slot++, mask = (slot * RT_HASH_LOCK_SZ) + lock)
 #else
 # define rt_hash_lock_addr(slot) NULL
 # define rt_hash_lock_init()
+# define rt_hash_lock(lock) NULL
+# define rt_hash_for_each_lock(lock) do { lock = 0; } while (0);
+# define rt_hash_for_each_slot(slot, lock, mask) \
+	for (slot = 0, lock = 0, mask = 0; mask <= rt_hash_mask; mask++)
 #endif
 
 static struct rt_hash_bucket 	*rt_hash_table;
@@ -613,24 +624,29 @@ static void rt_check_expire(struct work_struct *work)
  */
 static void rt_run_flush(unsigned long dummy)
 {
-	int i;
+	int slot, lock, mask;
 	struct rtable *rth, *next;
 
 	rt_deadline = 0;
 
 	get_random_bytes(&rt_hash_rnd, 4);
 
-	for (i = rt_hash_mask; i >= 0; i--) {
-		spin_lock_bh(rt_hash_lock_addr(i));
-		rth = rt_hash_table[i].chain;
-		if (rth)
-			rt_hash_table[i].chain = NULL;
-		spin_unlock_bh(rt_hash_lock_addr(i));
-
-		for (; rth; rth = next) {
-			next = rth->u.dst.rt_next;
-			rt_free(rth);
+	rt_hash_for_each_lock(lock) {
+		spin_lock_bh(rt_hash_lock(lock));
+		rt_hash_for_each_slot(slot, lock, mask) {
+			rth = rt_hash_table[mask].chain;
+
+			if (rth) {
+				rt_hash_table[mask].chain = NULL;
+				spin_unlock_bh(rt_hash_lock(lock));
+				for (; rth; rth = next) {
+					next = rth->u.dst.rt_next;
+					rt_free(rth);
+				}
+				spin_lock_bh(rt_hash_lock(lock));
+			}
 		}
+		spin_unlock_bh(rt_hash_lock(lock));
 	}
 }
 
-- 
1.5.3.rc5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()
  2008-01-23  7:43   ` joonwpark81
@ 2008-01-23 20:44     ` Eric Dumazet
  2008-01-24  5:06       ` Joonwoo Park
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2008-01-23 20:44 UTC (permalink / raw)
  To: joonwpark81; +Cc: David Miller, netdev

joonwpark81@gmail.com a écrit :
> On Mon, Jan 21, 2008 at 02:40:43AM -0800, David Miller wrote:
>> From: Joonwoo Park <joonwpark81@gmail.com>
>> Date: Tue, 22 Jan 2008 00:08:57 +0900
>>
>>> The rt_run_flush() can be stucked if it was called while netdev is on the 
>>> high load.
>>> It's possible when pushing rtable to rt_hash is faster than pulling
>>> from it.
>>>
>>> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
>> I agree with the analysis of the problem, however not the solution.
>>
>> This will absolutely kill software interrupt latency.
>>
>> In fact, we have moved much of the flush work into a workqueue in
>> net-2.6.25 because of how important that is
>>
>> We need to find some other way to solve this.
>>
> 
> Dave, Eric,
> Thanks so much for comments.
> 
> I did stress tests and I found that the real problem was not consumer & supplier
> issue.
> It was the problem for me to innumerable enabling & disabling the softirq.
> But I'm still thinking need of considering issue 'faster caching than flush'. :) 
> 
> ifconfig up on heavy loaded interface.
> Before patching:
>  time ifconfig eth1 up
>  BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
>  ...
> 
> After patching:
>  time ifconfig eth1 up
> real	0m0.007s
> user	0m0.000s
> sys	0m0.004s
> 
> Thanks!
> Joonwoo
> 
> 
>>From 87c29506de967e811ad5b57cd2e1a002134e878f Mon Sep 17 00:00:00 2001
> From: Joonwoo Park <joonwpark81@gmail.com>
> Date: Wed, 23 Jan 2008 15:16:54 +0900
> Subject: [PATCH] [IPV4] route: reduce locking/unlocking in rt_run_flush
> 
> The rt_run_flush does spin_lock_bh/spin_unlock_bh for rt_hash_mask + 1
> times.
> The rt_hash_mask takes from 32767 to 65535, so it's big overhead.
> In addition, disable_bh/enable_bh for many times in the rt_run_flush
> can cause stuck on a machine with heavily pended softirqs.
> 
> This patch reduces locking/unlocking as doing it with jumping the lock
> slots.
> 
> ifconfig up on heavy loaded interface.
> Before:
>  time ifconfig eth1 up
>  BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
>  ...
> 
> After:
>  time ifconfig eth1 up
> real	0m0.007s
> user	0m0.000s
> sys	0m0.004s
> 

Unfortunatly, your patch doesnt work on CONFIG_SMP=n (softirq will be disabled 
for the whole scan of table)

Also, some machines around there have 2^22 slots in hash table, and NR_CPUS=4, 
so softirqs will be disabled for a too long time.

Please try net-2.6.25 and submit patches on top of it if necessary, since 
rt_run_flush() has pending changes, not in net-2.6

Note : The 'soft lockup' can be avoided by other means.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()
  2008-01-23 20:44     ` Eric Dumazet
@ 2008-01-24  5:06       ` Joonwoo Park
  0 siblings, 0 replies; 6+ messages in thread
From: Joonwoo Park @ 2008-01-24  5:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

2008/1/24, Eric Dumazet <dada1@cosmosbay.com>:
>
> Unfortunatly, your patch doesnt work on CONFIG_SMP=n (softirq will be disabled
> for the whole scan of table)
>
> Also, some machines around there have 2^22 slots in hash table, and NR_CPUS=4,
> so softirqs will be disabled for a too long time.
>
> Please try net-2.6.25 and submit patches on top of it if necessary, since
> rt_run_flush() has pending changes, not in net-2.6
>
> Note : The 'soft lockup' can be avoided by other means.
>
>

Eric,
Thank you for your help.

I checked net-2.6.25 in this morining :(
And it brought me a conclusion that the fashion of it is the best solution for
the all circumstances.

I believe that
 'cond_resched()'
and
 'if (!rth) continue' without disable softirq
of net-2.6.25 is correct answer.

Thanks again!
Joonwoo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-01-24  5:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-21 15:08 [PATCH] [IPV4] route: fix locking in rt_run_flush() Joonwoo Park
2008-01-21 10:40 ` David Miller
2008-01-21 21:14   ` Eric Dumazet
2008-01-23  7:43   ` joonwpark81
2008-01-23 20:44     ` Eric Dumazet
2008-01-24  5:06       ` Joonwoo Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).