* neigh_periodic_timer expires too often
@ 2009-07-30 10:44 Luciano Coelho
2009-07-30 13:15 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Luciano Coelho @ 2009-07-30 10:44 UTC (permalink / raw)
To: netdev; +Cc: tero.kristo
Hi,
We were making some measurements and trying to figure out which timers
in the kernel can be made deferrable so that our device doesn't have to
wake up too often.
During this investigation, we found out that the neigh_periodic_timer is
expiring approximately every 8 seconds, even when we don't have a
network connection established. After the connection is established,
the timer starts expiring every 2 seconds and continues to expire at
this interval after the connection is closed.
We have been converting many of the kernel times to deferrable timers.
Checking the netdev mailing list archives, I found out that this issue
has been discussed in December 2007 [1], but the thread seems to have
died out and the proposed patch has never been applied AFAICS.
Another proposed solution, which has never been applied either, was to
convert this timer from softirq-based to workqueue-based [2]. Would
that be any better?
So, my question is, does it make sense to make this timer deferrable or
use the workqueue instead? Or is there any other better solution to
avoid unnecessarily frequent wakeups caused by neigh_periodic_timer?
We are using a kernel based on 2.6.28.
[1] http://article.gmane.org/gmane.linux.network/81361
[2] http://article.gmane.org/gmane.linux.network/81140
--
Cheers,
Luca.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: neigh_periodic_timer expires too often
2009-07-30 10:44 neigh_periodic_timer expires too often Luciano Coelho
@ 2009-07-30 13:15 ` Eric Dumazet
2009-07-30 14:33 ` Luciano Coelho
2009-08-03 1:35 ` David Miller
0 siblings, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2009-07-30 13:15 UTC (permalink / raw)
To: Luciano Coelho; +Cc: netdev, tero.kristo
Luciano Coelho a écrit :
> Hi,
>
> We were making some measurements and trying to figure out which timers
> in the kernel can be made deferrable so that our device doesn't have to
> wake up too often.
>
> During this investigation, we found out that the neigh_periodic_timer is
> expiring approximately every 8 seconds, even when we don't have a
> network connection established. After the connection is established,
> the timer starts expiring every 2 seconds and continues to expire at
> this interval after the connection is closed.
>
> We have been converting many of the kernel times to deferrable timers.
> Checking the netdev mailing list archives, I found out that this issue
> has been discussed in December 2007 [1], but the thread seems to have
> died out and the proposed patch has never been applied AFAICS.
>
> Another proposed solution, which has never been applied either, was to
> convert this timer from softirq-based to workqueue-based [2]. Would
> that be any better?
>
> So, my question is, does it make sense to make this timer deferrable or
> use the workqueue instead? Or is there any other better solution to
> avoid unnecessarily frequent wakeups caused by neigh_periodic_timer?
>
> We are using a kernel based on 2.6.28.
>
> [1] http://article.gmane.org/gmane.linux.network/81361
> [2] http://article.gmane.org/gmane.linux.network/81140
>
Hi Luciano
Thank you for reminding us this stuff !
You are mixing two different problems here.
First patch you link (from Stephen Hemminger) was using deferrable timer for
individual entry (function neigh_timer_handler()). Probably not your
current concern.
Second patch you link (from me) was using a workqueue (instead of a timer)
for garbage collection of whole table.
I bet you are more intested about the garbage collection problem
(function neigh_periodic_timer()) so I could try to resurrect
my patch. I guess at that time Parag didnt replied to my mail...
If nobody cared, patch was not considered for inclusion. Thats all.
So please test it and tell me/us you like it :)
Thanks
[PATCH net-next-2.6] neigh: Convert garbage collection from softirq to workqueue
Current neigh_periodic_timer() function is fired by timer IRQ, and
scans one hash bucket each round (very litle work in fact)
As we are supposed to scan whole hash table in 15 seconds, this means
neigh_periodic_timer() can be fired very often. (depending on the number
of concurrent hash entries we stored in this table)
Converting this to a workqueue permits scanning whole table, minimizing
icache pollution, and firing this work every 15 seconds, independantly
of hash table size.
This 15 seconds delay is not a hard number, as work is a deferrable one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/net/neighbour.h | 4 -
net/core/neighbour.c | 89 ++++++++++++++++++--------------------
2 files changed, 45 insertions(+), 48 deletions(-)
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index d8d790e..18b69b6 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -24,6 +24,7 @@
#include <linux/err.h>
#include <linux/sysctl.h>
+#include <linux/workqueue.h>
#include <net/rtnetlink.h>
/*
@@ -167,7 +168,7 @@ struct neigh_table
int gc_thresh2;
int gc_thresh3;
unsigned long last_flush;
- struct timer_list gc_timer;
+ struct delayed_work gc_work;
struct timer_list proxy_timer;
struct sk_buff_head proxy_queue;
atomic_t entries;
@@ -178,7 +179,6 @@ struct neigh_table
struct neighbour **hash_buckets;
unsigned int hash_mask;
__u32 hash_rnd;
- unsigned int hash_chain_gc;
struct pneigh_entry **phash_buckets;
};
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c6f9ad8..e587e68 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -692,75 +692,74 @@ static void neigh_connect(struct neighbour *neigh)
hh->hh_output = neigh->ops->hh_output;
}
-static void neigh_periodic_timer(unsigned long arg)
+static void neigh_periodic_work(struct work_struct *work)
{
- struct neigh_table *tbl = (struct neigh_table *)arg;
+ struct neigh_table *tbl = container_of(work, struct neigh_table, gc_work.work);
struct neighbour *n, **np;
- unsigned long expire, now = jiffies;
+ unsigned int i;
NEIGH_CACHE_STAT_INC(tbl, periodic_gc_runs);
- write_lock(&tbl->lock);
+ write_lock_bh(&tbl->lock);
/*
* periodically recompute ReachableTime from random function
*/
- if (time_after(now, tbl->last_rand + 300 * HZ)) {
+ if (time_after(jiffies, tbl->last_rand + 300 * HZ)) {
struct neigh_parms *p;
- tbl->last_rand = now;
+ tbl->last_rand = jiffies;
for (p = &tbl->parms; p; p = p->next)
p->reachable_time =
neigh_rand_reach_time(p->base_reachable_time);
}
- np = &tbl->hash_buckets[tbl->hash_chain_gc];
- tbl->hash_chain_gc = ((tbl->hash_chain_gc + 1) & tbl->hash_mask);
+ for (i = 0 ; i <= tbl->hash_mask; i++) {
+ np = &tbl->hash_buckets[i];
- while ((n = *np) != NULL) {
- unsigned int state;
+ while ((n = *np) != NULL) {
+ unsigned int state;
- write_lock(&n->lock);
+ write_lock(&n->lock);
- state = n->nud_state;
- if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
- write_unlock(&n->lock);
- goto next_elt;
- }
+ state = n->nud_state;
+ if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
+ write_unlock(&n->lock);
+ goto next_elt;
+ }
- if (time_before(n->used, n->confirmed))
- n->used = n->confirmed;
+ if (time_before(n->used, n->confirmed))
+ n->used = n->confirmed;
- if (atomic_read(&n->refcnt) == 1 &&
- (state == NUD_FAILED ||
- time_after(now, n->used + n->parms->gc_staletime))) {
- *np = n->next;
- n->dead = 1;
+ if (atomic_read(&n->refcnt) == 1 &&
+ (state == NUD_FAILED ||
+ time_after(jiffies, n->used + n->parms->gc_staletime))) {
+ *np = n->next;
+ n->dead = 1;
+ write_unlock(&n->lock);
+ neigh_cleanup_and_release(n);
+ continue;
+ }
write_unlock(&n->lock);
- neigh_cleanup_and_release(n);
- continue;
- }
- write_unlock(&n->lock);
next_elt:
- np = &n->next;
+ np = &n->next;
+ }
+ /*
+ * It's fine to release lock here, even if hash table
+ * grows while we are preempted.
+ */
+ write_unlock_bh(&tbl->lock);
+ cond_resched();
+ write_lock_bh(&tbl->lock);
}
-
/* Cycle through all hash buckets every base_reachable_time/2 ticks.
* ARP entry timeouts range from 1/2 base_reachable_time to 3/2
* base_reachable_time.
*/
- expire = tbl->parms.base_reachable_time >> 1;
- expire /= (tbl->hash_mask + 1);
- if (!expire)
- expire = 1;
-
- if (expire>HZ)
- mod_timer(&tbl->gc_timer, round_jiffies(now + expire));
- else
- mod_timer(&tbl->gc_timer, now + expire);
-
- write_unlock(&tbl->lock);
+ schedule_delayed_work(&tbl->gc_work,
+ tbl->parms.base_reachable_time >> 1);
+ write_unlock_bh(&tbl->lock);
}
static __inline__ int neigh_max_probes(struct neighbour *n)
@@ -1442,10 +1441,8 @@ void neigh_table_init_no_netlink(struct neigh_table *tbl)
get_random_bytes(&tbl->hash_rnd, sizeof(tbl->hash_rnd));
rwlock_init(&tbl->lock);
- setup_timer(&tbl->gc_timer, neigh_periodic_timer, (unsigned long)tbl);
- tbl->gc_timer.expires = now + 1;
- add_timer(&tbl->gc_timer);
-
+ INIT_DELAYED_WORK_DEFERRABLE(&tbl->gc_work, neigh_periodic_work);
+ schedule_delayed_work(&tbl->gc_work, tbl->parms.reachable_time);
setup_timer(&tbl->proxy_timer, neigh_proxy_process, (unsigned long)tbl);
skb_queue_head_init_class(&tbl->proxy_queue,
&neigh_table_proxy_queue_class);
@@ -1482,7 +1479,8 @@ int neigh_table_clear(struct neigh_table *tbl)
struct neigh_table **tp;
/* It is not clean... Fix it to unload IPv6 module safely */
- del_timer_sync(&tbl->gc_timer);
+ cancel_delayed_work(&tbl->gc_work);
+ flush_scheduled_work();
del_timer_sync(&tbl->proxy_timer);
pneigh_queue_purge(&tbl->proxy_queue);
neigh_ifdown(tbl, NULL);
@@ -1752,7 +1750,6 @@ static int neightbl_fill_info(struct sk_buff *skb, struct neigh_table *tbl,
.ndtc_last_rand = jiffies_to_msecs(rand_delta),
.ndtc_hash_rnd = tbl->hash_rnd,
.ndtc_hash_mask = tbl->hash_mask,
- .ndtc_hash_chain_gc = tbl->hash_chain_gc,
.ndtc_proxy_qlen = tbl->proxy_queue.qlen,
};
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: neigh_periodic_timer expires too often
2009-07-30 13:15 ` Eric Dumazet
@ 2009-07-30 14:33 ` Luciano Coelho
2009-07-31 12:22 ` Luciano Coelho
2009-08-03 1:35 ` David Miller
1 sibling, 1 reply; 7+ messages in thread
From: Luciano Coelho @ 2009-07-30 14:33 UTC (permalink / raw)
To: ext Eric Dumazet; +Cc: netdev@vger.kernel.org, Kristo Tero (Nokia-D/Tampere)
Hi Eric,
Thanks for your very quick response!
ext Eric Dumazet wrote:
> You are mixing two different problems here.
>
> First patch you link (from Stephen Hemminger) was using deferrable timer for
> individual entry (function neigh_timer_handler()). Probably not your
> current concern.
>
> Second patch you link (from me) was using a workqueue (instead of a timer)
> for garbage collection of whole table.
>
Yes, it was the second patch that I saw as the most efficient way to
deal with the problem. I mentioned the first one because we were
considering using deferrable timers until we saw your workqueue patch. ;)
> I bet you are more intested about the garbage collection problem
> (function neigh_periodic_timer()) so I could try to resurrect
> my patch. I guess at that time Parag didnt replied to my mail...
>
> If nobody cared, patch was not considered for inclusion. Thats all.
>
Indeed, the interest seemed to have died out. But I was not sure
whether there had been some other discussions elsewhere and the decision
had been not to take that patch because of some real technical reasons.
I'm glad that was not the case ;)
> So please test it and tell me/us you like it :)
>
I applied your patch on top of the 2.6.28 I'm using and will start
testing it now. I can also try to test it on top of 2.6.31-rc4 plus
wireless-testing stuff, if needed.
I'll tell you that I liked your patch in a moment (I actually already
liked it, but I'll wait until I know it works before I tell you ;)
--
Cheers,
Luca.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: neigh_periodic_timer expires too often
2009-07-30 14:33 ` Luciano Coelho
@ 2009-07-31 12:22 ` Luciano Coelho
2009-07-31 12:50 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Luciano Coelho @ 2009-07-31 12:22 UTC (permalink / raw)
To: ext Eric Dumazet; +Cc: netdev@vger.kernel.org, Kristo Tero (Nokia-D/Tampere)
Coelho Luciano (Nokia-D/Helsinki) wrote:
> ext Eric Dumazet wrote:
>
>> So please test it and tell me/us you like it :)
>>
>>
>
> I applied your patch on top of the 2.6.28 I'm using and will start
> testing it now. I can also try to test it on top of 2.6.31-rc4 plus
> wireless-testing stuff, if needed.
>
> I'll tell you that I liked your patch in a moment (I actually already
> liked it, but I'll wait until I know it works before I tell you ;)
>
Just a small update. We have run some very basic tests with 2.6.28.
The patch seems to work fine and we don't see excessive wake-ups from
neigh anymore (thanks Tero for helping with the tests). We will still
run one of our full release test rounds in order to make sure that there
are no regressions. I'll let you know when we have some more results.
--
Cheers,
Luca.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: neigh_periodic_timer expires too often
2009-07-31 12:22 ` Luciano Coelho
@ 2009-07-31 12:50 ` Eric Dumazet
2009-08-05 7:48 ` Luciano Coelho
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2009-07-31 12:50 UTC (permalink / raw)
To: Luciano Coelho; +Cc: netdev@vger.kernel.org, Kristo Tero (Nokia-D/Tampere)
Luciano Coelho a écrit :
> Coelho Luciano (Nokia-D/Helsinki) wrote:
>> ext Eric Dumazet wrote:
>>
>>> So please test it and tell me/us you like it :)
>>>
>>
>> I applied your patch on top of the 2.6.28 I'm using and will start
>> testing it now. I can also try to test it on top of 2.6.31-rc4 plus
>> wireless-testing stuff, if needed.
>>
>> I'll tell you that I liked your patch in a moment (I actually already
>> liked it, but I'll wait until I know it works before I tell you ;)
>>
>
> Just a small update. We have run some very basic tests with 2.6.28.
> The patch seems to work fine and we don't see excessive wake-ups from
> neigh anymore (thanks Tero for helping with the tests). We will still
> run one of our full release test rounds in order to make sure that there
> are no regressions. I'll let you know when we have some more results.
>
Thanks for the update, I am running this patch on net-next-2.6 with no problem.
An 'easy' way to count how many time gc fired is to look at 10th columns of /proc/net/stat/arp_cache :
$ cat /proc/net/stat/arp_cache ; cat /proc/uptime ; sleep 150 ; cat /proc/net/stat/arp_cache
entries allocs destroys hash_grows lookups hits res_failed rcv_probes_mcast rcv_probes_ucast periodic_gc_runs forced_gc_runs unresolved_discards
000000fd 00000059 0000050e 00000001 00004552 0000032b 00000022 00000000 00000000 000013e5 00000000 00000000
000000fd 000000df 00000003 00000002 000005a5 0000014d 00000010 00000000 00000000 00000000 00000000 00000000
000000fd 000000cb 00000000 00000000 0000054f 00000141 00000011 00000000 00000000 00000000 00000000 00000000
000000fd 000000b2 00000000 00000001 000005ec 000001d7 0000001d 00000000 00000000 00000000 00000000 00000004
000000fd 000000de 00000000 00000001 000005f7 0000019b 00000027 00000000 00000000 00000000 00000000 00000000
000000fd 000000e5 00000000 00000000 000005db 0000018e 00000027 00000000 00000000 00000000 00000000 00000000
000000fd 000000d3 00000000 00000002 000005e4 0000018a 00000024 00000000 00000000 00000000 00000000 00000000
000000fd 000000c3 00000000 00000000 000005ba 000001a0 00000025 00000000 00000000 00000000 00000000 00000000
79173.78 0.31
entries allocs destroys hash_grows lookups hits res_failed rcv_probes_mcast rcv_probes_ucast periodic_gc_runs forced_gc_runs unresolved_discards
000000fd 00000059 0000050f 00000001 00004569 00000342 00000022 00000000 00000000 000013ee 00000000 00000000
000000fd 000000df 00000003 00000002 000005a8 00000150 00000010 00000000 00000000 00000000 00000000 00000000
000000fd 000000cb 00000000 00000000 00000550 00000142 00000011 00000000 00000000 00000000 00000000 00000000
000000fd 000000b2 00000000 00000001 000005ed 000001d8 0000001d 00000000 00000000 00000000 00000000 00000004
000000fd 000000de 00000000 00000001 000005f8 0000019c 00000027 00000000 00000000 00000000 00000000 00000000
000000fd 000000e6 00000000 00000000 000005df 00000190 00000027 00000000 00000000 00000000 00000000 00000000
000000fd 000000d3 00000000 00000002 000005e6 0000018c 00000024 00000000 00000000 00000000 00000000 00000000
000000fd 000000c3 00000000 00000000 000005bc 000001a1 00000025 00000000 00000000 00000000 00000000 00000000
In a 150 second interval, it was fired 10 times, even with a 256 slots hash table,
instead of 2560 times with pristine kernel.
On my dev machine, neigh_timer_handler is now at the bottom of powertop :)
Wakeups-from-idle per second : 2.5 interval: 10.0s
no ACPI power usage estimate available
Top causes for wakeups:
24.0% ( 4.0) <kernel core> : usb_hcd_poll_rh_status (rh_timer_func)
18.6% ( 3.1) <kernel core> : bnx2_timer (bnx2_timer)
12.0% ( 2.0) <kernel core> : clocksource_watchdog (clocksource_watchdog)
6.0% ( 1.0) <kernel core> : tg3_timer (tg3_timer)
6.0% ( 1.0) bond0 : queue_delayed_work (delayed_work_timer_fn)
6.0% ( 1.0) ntpd : hrtimer_start_range_ns (it_real_fn)
6.0% ( 1.0) <kernel core> : run_timer_softirq (htable_gc)
5.4% ( 0.9) hpsmhd : hrtimer_start_range_ns (hrtimer_wakeup)
3.0% ( 0.5) <interrupt> : cciss0
2.4% ( 0.4) <kernel core> : dev_watchdog (dev_watchdog)
1.8% ( 0.3) <kernel core> : queue_delayed_work (delayed_work_timer_fn)
1.2% ( 0.2) <(null)> : (hrtimer_wakeup)
1.2% ( 0.2) master : start_this_handle (commit_timeout)
1.2% ( 0.2) snmptrapd : hrtimer_start_range_ns (hrtimer_wakeup)
1.2% ( 0.2) pdflush : wb_kupdate (wb_timer_fn)
0.6% ( 0.1) <interrupt> : eth0
0.6% ( 0.1) master : hrtimer_start_range_ns (hrtimer_wakeup)
0.6% ( 0.1) ntpd : blk_plug_device (blk_unplug_timeout)
0.6% ( 0.1) syslogd : hrtimer_start (it_real_fn)
0.6% ( 0.1) init : hrtimer_start_range_ns (hrtimer_wakeup)
0.6% ( 0.1) ntpd : start_this_handle (commit_timeout)
0.6% ( 0.1) <kernel core> : neigh_timer_handler (neigh_timer_handler)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: neigh_periodic_timer expires too often
2009-07-30 13:15 ` Eric Dumazet
2009-07-30 14:33 ` Luciano Coelho
@ 2009-08-03 1:35 ` David Miller
1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2009-08-03 1:35 UTC (permalink / raw)
To: eric.dumazet; +Cc: luciano.coelho, netdev, tero.kristo
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Jul 2009 15:15:07 +0200
> [PATCH net-next-2.6] neigh: Convert garbage collection from softirq to workqueue
>
> Current neigh_periodic_timer() function is fired by timer IRQ, and
> scans one hash bucket each round (very litle work in fact)
>
> As we are supposed to scan whole hash table in 15 seconds, this means
> neigh_periodic_timer() can be fired very often. (depending on the number
> of concurrent hash entries we stored in this table)
>
> Converting this to a workqueue permits scanning whole table, minimizing
> icache pollution, and firing this work every 15 seconds, independantly
> of hash table size.
>
> This 15 seconds delay is not a hard number, as work is a deferrable one.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied to net-next-2.6, thanks!
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: neigh_periodic_timer expires too often
2009-07-31 12:50 ` Eric Dumazet
@ 2009-08-05 7:48 ` Luciano Coelho
0 siblings, 0 replies; 7+ messages in thread
From: Luciano Coelho @ 2009-08-05 7:48 UTC (permalink / raw)
To: ext Eric Dumazet; +Cc: netdev@vger.kernel.org, Kristo Tero (Nokia-D/Tampere)
ext Eric Dumazet wrote:
> Luciano Coelho a écrit :
>
>> Coelho Luciano (Nokia-D/Helsinki) wrote:
>>
>>> ext Eric Dumazet wrote:
>>>
>>>
>>>> So please test it and tell me/us you like it :)
>>>>
>>>>
>>> I applied your patch on top of the 2.6.28 I'm using and will start
>>> testing it now. I can also try to test it on top of 2.6.31-rc4 plus
>>> wireless-testing stuff, if needed.
>>>
>>> I'll tell you that I liked your patch in a moment (I actually already
>>> liked it, but I'll wait until I know it works before I tell you ;)
>>>
>>>
>> Just a small update. We have run some very basic tests with 2.6.28.
>> The patch seems to work fine and we don't see excessive wake-ups from
>> neigh anymore (thanks Tero for helping with the tests). We will still
>> run one of our full release test rounds in order to make sure that there
>> are no regressions. I'll let you know when we have some more results.
>>
>>
>
> Thanks for the update, I am running this patch on net-next-2.6 with no problem.
>
We have now run our release round tests and everything is working fine.
I'll report back if we find any issues with this change in the future
(which I really don't expect to happen ;).
Thank you very much for fixing this.
--
Cheers,
Luca.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-08-05 7:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-30 10:44 neigh_periodic_timer expires too often Luciano Coelho
2009-07-30 13:15 ` Eric Dumazet
2009-07-30 14:33 ` Luciano Coelho
2009-07-31 12:22 ` Luciano Coelho
2009-07-31 12:50 ` Eric Dumazet
2009-08-05 7:48 ` Luciano Coelho
2009-08-03 1:35 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).