* [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
@ 2007-08-16 12:09 Eric Dumazet
2007-08-16 12:41 ` Herbert Xu
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2007-08-16 12:09 UTC (permalink / raw)
To: netdev@vger.kernel.org
Hi all
When the periodic route cache flush is done, this machine suffers a lot and eventually triggers the "soft lockup" message.
dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, but holding the dst_lock spinlock for the whole scan.
Then it rearms a timer to redo the full thing 1/10 s later. The slowdown can last one minute or so.
No need to say that machine is not really usable in the meantime.
I already looked at the code and I am testing a patch (included in this mail) where I limit the number of entries that can be scanned at each dst_run_gc() call, and not holding dst_lock during the scan. But then I dont know how to address the dst_dev_event() problem, since we must be able to dst_ifdown() all entries attached to one dev.
I am wondering how to really solve the problem. Could a workqueue be used here instead of a timer ?
Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
Aug 16 06:21:37 SRV1 kernel:
Aug 16 06:21:37 SRV1 kernel: Call Trace:
Aug 16 06:21:37 SRV1 kernel: <IRQ> [<ffffffff802286f0>] wake_up_process+0x10/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80251e09>] softlockup_tick+0xe9/0x110
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd380>] dst_run_gc+0x0/0x140
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802376f3>] run_local_timers+0x13/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802379c7>] update_process_times+0x57/0x90
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80216034>] smp_local_timer_interrupt+0x34/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802165cc>] smp_apic_timer_interrupt+0x5c/0x80
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8020a816>] apic_timer_interrupt+0x66/0x70
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd3d3>] dst_run_gc+0x53/0x140
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd3c6>] dst_run_gc+0x46/0x140
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80237148>] run_timer_softirq+0x148/0x1c0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8023340c>] __do_softirq+0x6c/0xe0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8020ad6c>] call_softirq+0x1c/0x30
Aug 16 06:21:37 SRV1 kernel: <EOI> [<ffffffff8020cb34>] do_softirq+0x34/0x90
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802331cf>] local_bh_enable_ip+0x3f/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422913>] _spin_unlock_bh+0x13/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803dfde8>] rt_garbage_collect+0x1d8/0x320
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd4dd>] dst_alloc+0x1d/0xa0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803e1433>] __ip_route_output_key+0x573/0x800
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c02e2>] sock_common_recvmsg+0x32/0x50
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803e16dc>] ip_route_output_flow+0x1c/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80400160>] tcp_v4_connect+0x150/0x610
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803ebf07>] inet_bind_bucket_create+0x17/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8040cd16>] inet_stream_connect+0xa6/0x2c0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c0bdf>] lock_sock_nested+0xcf/0xe0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803be551>] sys_connect+0x71/0xa0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803eee3f>] tcp_setsockopt+0x1f/0x30
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c030f>] sock_common_setsockopt+0xf/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803be4bd>] sys_setsockopt+0x9d/0xc0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8028881e>] sys_ioctl+0x5e/0x80
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80209c4e>] system_call+0x7e/0x83
Prelimary patch :
--- linux-2.6.22/net/core/dst.c
+++ linux-2.6.22-ed/net/core/dst.c
@@ -28,6 +28,9 @@
* 4) All operations modify state, so a spinlock is used.
*/
static struct dst_entry *dst_garbage_list;
+static struct dst_entry *dst_garbage_wrk;
+static struct dst_entry *dst_garbage_inuse;
+static int dst_gc_running;
#if RT_CACHE_DEBUG >= 2
static atomic_t dst_total = ATOMIC_INIT(0);
#endif
@@ -42,26 +45,42 @@
static void dst_run_gc(unsigned long dummy)
{
- int delayed = 0;
- int work_performed;
- struct dst_entry * dst, **dstp;
-
- if (!spin_trylock(&dst_lock)) {
- mod_timer(&dst_gc_timer, jiffies + HZ/10);
- return;
- }
-
+ int quota = 1000;
+ int work_performed = 0;
+ struct dst_entry *dst, *next;
+ struct dst_entry *first = NULL, *last = NULL;
+#if RT_CACHE_DEBUG >= 2
+ unsigned long t0 = jiffies;
+#endif
+ spin_lock(&dst_lock);
+ if (dst_gc_running)
+ goto out;
del_timer(&dst_gc_timer);
- dstp = &dst_garbage_list;
- work_performed = 0;
- while ((dst = *dstp) != NULL) {
- if (atomic_read(&dst->__refcnt)) {
- dstp = &dst->next;
- delayed++;
+ if (!dst_garbage_wrk) {
+ dst_garbage_wrk = dst_garbage_list;
+ dst_garbage_list = dst_garbage_inuse;
+ dst_garbage_inuse = NULL;
+ }
+ /*
+ * before releasing dst_lock, tell others we are currently running
+ * so they wont start timer or mess dst_garbage_wrk
+ */
+ dst_gc_running = 1;
+ next = dst_garbage_wrk;
+ spin_unlock(&dst_lock);
+
+ while ((dst = next) != NULL) {
+ next = dst->next;
+ if (likely(atomic_read(&dst->__refcnt))) {
+ if (unlikely(last == NULL))
+ last = dst;
+ dst->next = first;
+ first = dst;
+ if (--quota == 0)
+ break;
continue;
}
- *dstp = dst->next;
- work_performed = 1;
+ work_performed++;
dst = dst_destroy(dst);
if (dst) {
@@ -77,16 +96,26 @@
continue;
___dst_free(dst);
- dst->next = *dstp;
- *dstp = dst;
- dstp = &dst->next;
+ dst->next = next;
+ next = dst;
}
}
- if (!dst_garbage_list) {
+ dst_garbage_wrk = next;
+
+ spin_lock(&dst_lock);
+ dst_gc_running = 0;
+ if (last != NULL) {
+ last->next = dst_garbage_inuse;
+ dst_garbage_inuse = first;
+ }
+ /*
+ * If all lists are empty, no need to rearm a timer
+ */
+ if (!dst_garbage_list && !dst_garbage_wrk && !dst_garbage_inuse) {
dst_gc_timer_inc = DST_GC_MAX;
goto out;
}
- if (!work_performed) {
+ if (!work_performed && !dst_garbage_wrk) {
if ((dst_gc_timer_expires += dst_gc_timer_inc) > DST_GC_MAX)
dst_gc_timer_expires = DST_GC_MAX;
dst_gc_timer_inc += DST_GC_INC;
@@ -95,8 +124,12 @@
dst_gc_timer_expires = DST_GC_MIN;
}
#if RT_CACHE_DEBUG >= 2
- printk("dst_total: %d/%d %ld\n",
- atomic_read(&dst_total), delayed, dst_gc_timer_expires);
+ if (net_msg_warn)
+ printk(KERN_DEBUG "dst_run_gc: "
+ "dst_total=%d quota=%d perf=%d expires=%ld elapsed=%lu\n",
+ atomic_read(&dst_total),
+ quota, work_performed,
+ dst_gc_timer_expires, jiffies - t0);
#endif
/* if the next desired timer is more than 4 seconds in the future
* then round the timer to whole seconds
@@ -157,7 +190,7 @@
___dst_free(dst);
dst->next = dst_garbage_list;
dst_garbage_list = dst;
- if (dst_gc_timer_inc > DST_GC_INC) {
+ if (dst_gc_timer_inc > DST_GC_INC && !dst_gc_running) {
dst_gc_timer_inc = DST_GC_INC;
dst_gc_timer_expires = DST_GC_MIN;
mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-16 12:09 [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? Eric Dumazet @ 2007-08-16 12:41 ` Herbert Xu 2007-08-16 13:36 ` Eric Dumazet 0 siblings, 1 reply; 9+ messages in thread From: Herbert Xu @ 2007-08-16 12:41 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev Eric Dumazet <dada1@cosmosbay.com> wrote: > > I am wondering how to really solve the problem. Could a workqueue be used here instead of a timer ? I think so. A mutex would separate the GC and dst_ifdown. You'd take the spin lock in the GC only to replace the garbage list with NULL. You then drop the lock to process it. Once it's done you take the lock again and join it with whatever that's been added in the mean time. This is easy because you should already have the tail after the GC process. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-16 12:41 ` Herbert Xu @ 2007-08-16 13:36 ` Eric Dumazet 2007-08-16 13:59 ` Herbert Xu 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2007-08-16 13:36 UTC (permalink / raw) To: Herbert Xu; +Cc: netdev On Thu, 16 Aug 2007 20:41:58 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote: > Eric Dumazet <dada1@cosmosbay.com> wrote: > > > > I am wondering how to really solve the problem. Could a workqueue be used here instead of a timer ? > > I think so. > > A mutex would separate the GC and dst_ifdown. You'd take the > spin lock in the GC only to replace the garbage list with NULL. > You then drop the lock to process it. Once it's done you take > the lock again and join it with whatever that's been added in > the mean time. This is easy because you should already have > the tail after the GC process. Thanks Herbert Yes, I already did this (with the current softirq based timer model), but how can dst_dev_event() do its work, since the GC is using a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.) We would have to change dst_dev_event() to : - Signal to GC it has to stop as soon as possible. - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? ) , giving us a full garbage_list. - Process the whole list. - Restart GC ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-16 13:36 ` Eric Dumazet @ 2007-08-16 13:59 ` Herbert Xu 2007-08-16 15:40 ` Eric Dumazet 0 siblings, 1 reply; 9+ messages in thread From: Herbert Xu @ 2007-08-16 13:59 UTC (permalink / raw) To: Eric Dumazet; +Cc: herbert, netdev Eric Dumazet <dada1@cosmosbay.com> wrote: > > Yes, I already did this (with the current softirq based timer model), > but how can dst_dev_event() do its work, since the GC is using > a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.) > > We would have to change dst_dev_event() to : > - Signal to GC it has to stop as soon as possible. > - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? ) You can sleep in dst_dev_event so just use a mutex to separate it from the GC. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-16 13:59 ` Herbert Xu @ 2007-08-16 15:40 ` Eric Dumazet 2007-08-16 23:33 ` Herbert Xu 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2007-08-16 15:40 UTC (permalink / raw) To: Herbert Xu; +Cc: netdev On Thu, 16 Aug 2007 21:59:43 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote: > Eric Dumazet <dada1@cosmosbay.com> wrote: > > > > Yes, I already did this (with the current softirq based timer model), > > but how can dst_dev_event() do its work, since the GC is using > > a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.) > > > > We would have to change dst_dev_event() to : > > - Signal to GC it has to stop as soon as possible. > > - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? ) > > You can sleep in dst_dev_event so just use a mutex to separate > it from the GC. Thanks for the suggestion, I added dst_mutex and attempt a mutex_trylock() inside dst_run_gc(). So do you think this patch is enough or should we convert dst_run_gc processing from softirq to workqueue too ? Eric [PATCH] net/core/dst.c : let dst_run_gc() do its job incrementally When the periodic route cache flush is done, we can feed dst_garbage_list with a huge number of dst_entry structs. Then dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. This sometimes triggers a soft lockup. Then it rearms a timer to redo the full thing 1/10 s later. The slowdown can last one minute or so. No need to say that machine is not really usable in the meantime. This patch attempts to solve the problem giving dst_run_gc() the ability to perform its work by chunks instead of whole list scan. We limit a chunk by the number of entries that could not be freed in a run. This should limit the CPU cache trashing (currently 128 bytes per entry on x86_64), yet giving a chance to free unreferenced entries (not included in the quota) if the load is really high. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> --- linux-2.6.22/net/core/dst.c.orig +++ linux-2.6.22/net/core/dst.c @@ -9,6 +9,7 @@ #include <linux/errno.h> #include <linux/init.h> #include <linux/kernel.h> +#include <linux/mutex.h> #include <linux/mm.h> #include <linux/module.h> #include <linux/netdevice.h> @@ -28,10 +29,17 @@ * 4) All operations modify state, so a spinlock is used. */ static struct dst_entry *dst_garbage_list; +static struct dst_entry *dst_garbage_wrk; +static struct dst_entry *dst_garbage_inuse; +static int dst_gc_running; #if RT_CACHE_DEBUG >= 2 static atomic_t dst_total = ATOMIC_INIT(0); #endif static DEFINE_SPINLOCK(dst_lock); +/* + * A mutex is used to synchronize GC with dst_dev_event() + */ +static DEFINE_MUTEX(dst_mutex); static unsigned long dst_gc_timer_expires; static unsigned long dst_gc_timer_inc = DST_GC_MAX; @@ -42,26 +50,39 @@ static DEFINE_TIMER(dst_gc_timer, dst_ru static void dst_run_gc(unsigned long dummy) { - int delayed = 0; - int work_performed; - struct dst_entry * dst, **dstp; + int quota = 1000; + int work_performed = 0; + struct dst_entry *dst, *next; + struct dst_entry *first = NULL, *last = NULL; - if (!spin_trylock(&dst_lock)) { + if (!mutex_trylock(&dst_mutex)) { mod_timer(&dst_gc_timer, jiffies + HZ/10); return; } + dst_gc_running = 1; + spin_lock(&dst_lock); del_timer(&dst_gc_timer); - dstp = &dst_garbage_list; - work_performed = 0; - while ((dst = *dstp) != NULL) { - if (atomic_read(&dst->__refcnt)) { - dstp = &dst->next; - delayed++; + if (!dst_garbage_wrk) { + dst_garbage_wrk = dst_garbage_list; + dst_garbage_list = dst_garbage_inuse; + dst_garbage_inuse = NULL; + } + next = dst_garbage_wrk; + spin_unlock(&dst_lock); + + while ((dst = next) != NULL) { + next = dst->next; + if (likely(atomic_read(&dst->__refcnt))) { + if (unlikely(last == NULL)) + last = dst; + dst->next = first; + first = dst; + if (--quota == 0) + break; continue; } - *dstp = dst->next; - work_performed = 1; + work_performed++; dst = dst_destroy(dst); if (dst) { @@ -77,16 +98,26 @@ static void dst_run_gc(unsigned long dum continue; ___dst_free(dst); - dst->next = *dstp; - *dstp = dst; - dstp = &dst->next; + dst->next = next; + next = dst; } } - if (!dst_garbage_list) { + dst_garbage_wrk = next; + + spin_lock(&dst_lock); + dst_gc_running = 0; + if (last != NULL) { + last->next = dst_garbage_inuse; + dst_garbage_inuse = first; + } + /* + * If all lists are empty, no need to rearm a timer + */ + if (!dst_garbage_list && !dst_garbage_wrk && !dst_garbage_inuse) { dst_gc_timer_inc = DST_GC_MAX; goto out; } - if (!work_performed) { + if (!work_performed && !dst_garbage_wrk) { if ((dst_gc_timer_expires += dst_gc_timer_inc) > DST_GC_MAX) dst_gc_timer_expires = DST_GC_MAX; dst_gc_timer_inc += DST_GC_INC; @@ -95,8 +126,11 @@ static void dst_run_gc(unsigned long dum dst_gc_timer_expires = DST_GC_MIN; } #if RT_CACHE_DEBUG >= 2 - printk("dst_total: %d/%d %ld\n", - atomic_read(&dst_total), delayed, dst_gc_timer_expires); + if (net_msg_warn) + printk(KERN_DEBUG "dst_run_gc: " + "dst_total=%d quota=%d perf=%d expires=%ld\n", + atomic_read(&dst_total), quota, + work_performed, dst_gc_timer_expires); #endif /* if the next desired timer is more than 4 seconds in the future * then round the timer to whole seconds @@ -109,6 +143,7 @@ static void dst_run_gc(unsigned long dum out: spin_unlock(&dst_lock); + mutex_unlock(&dst_mutex); } static int dst_discard(struct sk_buff *skb) @@ -157,7 +192,7 @@ void __dst_free(struct dst_entry * dst) ___dst_free(dst); dst->next = dst_garbage_list; dst_garbage_list = dst; - if (dst_gc_timer_inc > DST_GC_INC) { + if (dst_gc_timer_inc > DST_GC_INC && !dst_gc_running) { dst_gc_timer_inc = DST_GC_INC; dst_gc_timer_expires = DST_GC_MIN; mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires); @@ -224,7 +259,7 @@ again: * * Commented and originally written by Alexey. */ -static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev, +static void dst_ifdown(struct dst_entry *dst, struct net_device *dev, int unregister) { if (dst->ops->ifdown) @@ -251,15 +286,34 @@ static int dst_dev_event(struct notifier { struct net_device *dev = ptr; struct dst_entry *dst; + struct dst_entry *first, *last; switch (event) { case NETDEV_UNREGISTER: case NETDEV_DOWN: + mutex_lock(&dst_mutex); /* guard against GC */ + spin_lock_bh(&dst_lock); - for (dst = dst_garbage_list; dst; dst = dst->next) { - dst_ifdown(dst, dev, event != NETDEV_DOWN); + first = dst_garbage_list; + if (first) { + dst_garbage_list = NULL; + spin_unlock_bh(&dst_lock); + for (dst = first; dst; dst = dst->next) { + last = dst; + dst_ifdown(dst, dev, event != NETDEV_DOWN); + } + spin_lock_bh(&dst_lock); + last->next = dst_garbage_list; + dst_garbage_list = first; } spin_unlock_bh(&dst_lock); + for (dst = dst_garbage_wrk; dst; dst = dst->next) { + dst_ifdown(dst, dev, event != NETDEV_DOWN); + } + for (dst = dst_garbage_inuse; dst; dst = dst->next) { + dst_ifdown(dst, dev, event != NETDEV_DOWN); + } + mutex_unlock(&dst_mutex); break; } return NOTIFY_DONE; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-16 15:40 ` Eric Dumazet @ 2007-08-16 23:33 ` Herbert Xu 2007-08-17 8:10 ` Eric Dumazet 0 siblings, 1 reply; 9+ messages in thread From: Herbert Xu @ 2007-08-16 23:33 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev On Thu, Aug 16, 2007 at 05:40:44PM +0200, Eric Dumazet wrote: > > So do you think this patch is enough or should we convert dst_run_gc processing from softirq to workqueue too ? I think a workqueue would be the best solution since with that you wouldn't have to worry about processing things in chunks. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-16 23:33 ` Herbert Xu @ 2007-08-17 8:10 ` Eric Dumazet 2007-08-17 8:15 ` Herbert Xu 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2007-08-17 8:10 UTC (permalink / raw) To: Herbert Xu; +Cc: netdev On Fri, 17 Aug 2007 07:33:39 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Thu, Aug 16, 2007 at 05:40:44PM +0200, Eric Dumazet wrote: > > > > So do you think this patch is enough or should we convert dst_run_gc processing from softirq to workqueue too ? > > I think a workqueue would be the best solution since with > that you wouldn't have to worry about processing things in > chunks. Will a workqueue react the same in case of a DDOS situation, where softirq could use all CPU cycles to handle incoming packets and feed the GC list, and GC would never have a chance to scan and free some items ? About chunk processing, I did it on purpose, to not throw away all CPU cache. Goal is to process entries, but not all of them in a row, especially if we find many yet referenced entries (and thus not candidates to freeing) Thanks ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-17 8:10 ` Eric Dumazet @ 2007-08-17 8:15 ` Herbert Xu 2007-08-17 8:31 ` Eric Dumazet 0 siblings, 1 reply; 9+ messages in thread From: Herbert Xu @ 2007-08-17 8:15 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev On Fri, Aug 17, 2007 at 10:10:30AM +0200, Eric Dumazet wrote: > > Will a workqueue react the same in case of a DDOS situation, > where softirq could use all CPU cycles to handle incoming > packets and feed the GC list, and GC would never > have a chance to scan and free some items ? Well when that happens the softirqs will be deferred to ksoftirqd which should share the CPU fairly with the workqueue. > About chunk processing, I did it on purpose, to not throw away > all CPU cache. Goal is to process entries, but not all of them > in a row, especially if we find many yet referenced entries > (and thus not candidates to freeing) I agree that chunks are desirable for a timer since you'd be hogging the CPU otherwise. However, if you went to a workqueue then it's less of a concern and would simplify things. In particular, you won't have to pick a good chunk size :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? 2007-08-17 8:15 ` Herbert Xu @ 2007-08-17 8:31 ` Eric Dumazet 0 siblings, 0 replies; 9+ messages in thread From: Eric Dumazet @ 2007-08-17 8:31 UTC (permalink / raw) To: Herbert Xu; +Cc: netdev On Fri, 17 Aug 2007 16:15:22 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Fri, Aug 17, 2007 at 10:10:30AM +0200, Eric Dumazet wrote: > > > > Will a workqueue react the same in case of a DDOS situation, > > where softirq could use all CPU cycles to handle incoming > > packets and feed the GC list, and GC would never > > have a chance to scan and free some items ? > > Well when that happens the softirqs will be deferred to > ksoftirqd which should share the CPU fairly with the > workqueue. Thats nice :) I'll code a workqueue based thing in about 10 days after my hollidays, and perform DOS tests as well. Thanks for the feedback. > > > About chunk processing, I did it on purpose, to not throw away > > all CPU cache. Goal is to process entries, but not all of them > > in a row, especially if we find many yet referenced entries > > (and thus not candidates to freeing) > > I agree that chunks are desirable for a timer since you'd > be hogging the CPU otherwise. However, if you went to a > workqueue then it's less of a concern and would simplify > things. In particular, you won't have to pick a good > chunk size :) ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-08-17 8:31 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-08-16 12:09 [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ? Eric Dumazet 2007-08-16 12:41 ` Herbert Xu 2007-08-16 13:36 ` Eric Dumazet 2007-08-16 13:59 ` Herbert Xu 2007-08-16 15:40 ` Eric Dumazet 2007-08-16 23:33 ` Herbert Xu 2007-08-17 8:10 ` Eric Dumazet 2007-08-17 8:15 ` Herbert Xu 2007-08-17 8:31 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox