* [PATCH 1/1] net: fix scheduling of dst_gc_task by __dst_free
From: Benjamin Thery @ 2008-09-12 12:31 UTC
To: David S. Miller, Eric Dumazet; +Cc: netdev, Benjamin Thery
The dst garbage collector dst_gc_task() may not be scheduled as we
expect it to be in __dst_free().
Indeed, when the dst_gc_timer was replaced by the delayed_work
dst_gc_work, the mod_timer() call used to schedule the garbage
collector at an earlier date was replaced by a schedule_delayed_work()
(see commit 86bba269d08f0c545ae76c90b56727f65d62d57f).
But, the behaviour of mod_timer() and schedule_delayed_work() is
different in the way they handle the delay.
mod_timer() stops the timer and re-arms it with the newly given delay,
whereas schedule_delayed_work() only checks whether the work is already
queued in the workqueue (and queues it, with the delay, if it is not)
BUT it does NOT take into account the new delay (even if the new delay
is earlier in time).
schedule_delayed_work() returns 0 if it didn't queue the work,
but we don't check the return code in __dst_free().
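To make the difference concrete, here is a minimal userspace sketch in plain C (not kernel code). The model_* names and the tick values are hypothetical; they only mimic the semantics described above, where mod_timer() always re-arms and schedule_delayed_work() leaves an already pending work untouched and returns 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the model_* names are not kernel APIs. */
struct model_work {
	bool pending;          /* queued and waiting to run */
	unsigned long expires; /* absolute expiry, in ticks */
};

/* Mimics mod_timer(): always re-arm with the new expiry.
 * Returns 1 if the timer was already pending, 0 otherwise. */
static int model_mod_timer(struct model_work *w, unsigned long expires)
{
	int was_pending = w->pending;

	w->pending = true;
	w->expires = expires;
	return was_pending;
}

/* Mimics schedule_delayed_work() as described above: an already
 * pending work is left untouched (return 0); otherwise it is queued
 * with the requested delay (return 1). */
static int model_schedule_delayed_work(struct model_work *w,
				       unsigned long now, unsigned long delay)
{
	if (w->pending)
		return 0;
	w->pending = true;
	w->expires = now + delay;
	return 1;
}

/* Mimics cancel_delayed_work() on a work that has not started yet.
 * Returns 1 if a pending work was cancelled, 0 otherwise. */
static int model_cancel_delayed_work(struct model_work *w)
{
	int was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Scenario: the work is queued far in the future (tick 1000), then
 * requested again with a short delay (10 ticks). Returns the expiry
 * that is actually in effect afterwards. */
static unsigned long model_expiry_after_rerequest(bool cancel_first)
{
	struct model_work w = { false, 0 };
	int queued = model_schedule_delayed_work(&w, 0, 1000);

	assert(queued == 1); /* the first request always queues */
	if (cancel_first)
		model_cancel_delayed_work(&w);
	model_schedule_delayed_work(&w, 0, 10);
	return w.expires;
}
```

In this model, calling model_expiry_after_rerequest(false) keeps the old expiry, while model_expiry_after_rerequest(true) lets the short delay take effect, which is the mod_timer()-like behaviour we want back.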
If I understand the code in __dst_free() correctly, we want dst_gc_task
to be queued after DST_GC_INC jiffies if we pass the test (and not in
some undetermined time in the future), so I think we should add a call
to cancel_delayed_work() before schedule_delayed_work(). Patch below.
Or we should at least test the return code of schedule_delayed_work(),
and reset the values of dst_garbage.timer_inc and dst_garbage.timer_expires
back to their former values if schedule_delayed_work() failed.
Otherwise the subsequent calls to __dst_free will test the wrong values
and assume the wrong thing about when the garbage collector is supposed to
be scheduled.
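A rough userspace sketch of this second option (check the return code and restore the former values) could look like the following. Everything here is hypothetical: struct model_gc, the model_* functions and the MODEL_GC_* tick values are placeholders for illustration, not the definitions in net/core/dst.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder tick values for illustration; not the kernel's definitions. */
#define MODEL_GC_MIN 10UL
#define MODEL_GC_INC 50UL

/* Hypothetical userspace model of the dst_garbage bookkeeping. */
struct model_gc {
	bool work_pending;          /* is the GC work already queued? */
	unsigned long work_expires; /* when the queued work will run */
	unsigned long timer_inc;
	unsigned long timer_expires;
};

/* Mimics schedule_delayed_work(): returns 0 and changes nothing if the
 * work is already pending, otherwise queues it and returns 1. */
static int model_schedule(struct model_gc *gc, unsigned long now,
			  unsigned long delay)
{
	if (gc->work_pending)
		return 0;
	gc->work_pending = true;
	gc->work_expires = now + delay;
	return 1;
}

/* Request an early GC run, and restore the former bookkeeping values
 * if the work could not be queued.
 * Returns 1 if the GC was queued with the short delay, 0 otherwise. */
static int model_request_early_gc(struct model_gc *gc, unsigned long now)
{
	unsigned long old_inc = gc->timer_inc;
	unsigned long old_expires = gc->timer_expires;

	if (gc->timer_inc <= MODEL_GC_INC)
		return 0;

	gc->timer_inc = MODEL_GC_INC;
	gc->timer_expires = MODEL_GC_MIN;
	if (!model_schedule(gc, now, gc->timer_expires)) {
		/* Work was already pending: keep the state consistent with it. */
		gc->timer_inc = old_inc;
		gc->timer_expires = old_expires;
		return 0;
	}
	assert(gc->work_expires == now + MODEL_GC_MIN);
	return 1;
}
```

This variant keeps the bookkeeping truthful, but unlike cancelling first it still does not make the collector run any sooner when the work is already pending.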
dst_gc_task() also calls schedule_delayed_work() without checking
its return code (or calling cancel_delayed_work() first), but it
should be fine there: dst_gc_task is the routine of the delayed_work, so
no dst_gc_work should be pending in the queue when it's running.
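The reasoning can be sketched in userspace C as follows. This assumes that the workqueue clears the pending state of a work before calling its routine; the model_* names and the delay of 100 ticks are hypothetical and are not kernel APIs or values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the model_* names are not kernel APIs. */
struct model_work {
	bool pending;          /* queued and waiting to run */
	unsigned long expires; /* absolute expiry, in ticks */
	int rearm_result;      /* what the routine's own re-schedule returned */
};

/* Mimics schedule_delayed_work(): returns 0 if already pending,
 * otherwise queues the work and returns 1. */
static int model_schedule(struct model_work *w, unsigned long now,
			  unsigned long delay)
{
	if (w->pending)
		return 0;
	w->pending = true;
	w->expires = now + delay;
	return 1;
}

/* Stand-in for dst_gc_task(): does its work, then re-queues itself. */
static void model_gc_task(struct model_work *w, unsigned long now)
{
	w->rearm_result = model_schedule(w, now, 100);
}

/* Assumed runner behaviour: the pending state is cleared before the
 * routine is called, so the routine sees a non-pending work.
 * Returns true if the routine ran. */
static bool model_run_if_due(struct model_work *w, unsigned long now)
{
	if (!w->pending || now < w->expires)
		return false;
	w->pending = false;
	model_gc_task(w, now);
	assert(w->rearm_result == 1); /* re-queue from the routine succeeds */
	return true;
}
```

Under that assumption the re-schedule from inside the routine always queues the work, so there is no stale delay to worry about on this path.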
This patch applies on top of net-2.6.
(Sorry, I think I've been a bit verbose to expose this simple issue :)
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
---
net/core/dst.c | 1 +
1 file changed, 1 insertion(+)
Index: net-2.6/net/core/dst.c
===================================================================
--- net-2.6.orig/net/core/dst.c
+++ net-2.6/net/core/dst.c
@@ -203,6 +203,7 @@ void __dst_free(struct dst_entry * dst)
 	if (dst_garbage.timer_inc > DST_GC_INC) {
 		dst_garbage.timer_inc = DST_GC_INC;
 		dst_garbage.timer_expires = DST_GC_MIN;
+		cancel_delayed_work(&dst_gc_work);
 		schedule_delayed_work(&dst_gc_work, dst_garbage.timer_expires);
 	}
 	spin_unlock_bh(&dst_garbage.lock);
--
* Re: [PATCH 1/1] net: fix scheduling of dst_gc_task by __dst_free
From: Eric Dumazet @ 2008-09-12 14:46 UTC
To: Benjamin Thery; +Cc: David S. Miller, netdev
Benjamin Thery wrote:
> The dst garbage collector dst_gc_task() may not be scheduled as we
> expect it to be in __dst_free().
>
> Indeed, when the dst_gc_timer was replaced by the delayed_work
> dst_gc_work, the mod_timer() call used to schedule the garbage
> collector at an earlier date was replaced by a schedule_delayed_work()
> (see commit 86bba269d08f0c545ae76c90b56727f65d62d57f).
>
> But, the behaviour of mod_timer() and schedule_delayed_work() is
> different in the way they handle the delay.
>
> mod_timer() stops the timer and re-arms it with the newly given delay,
> whereas schedule_delayed_work() only checks whether the work is already
> queued in the workqueue (and queues it, with the delay, if it is not)
> BUT it does NOT take into account the new delay (even if the new delay
> is earlier in time).
> schedule_delayed_work() returns 0 if it didn't queue the work,
> but we don't check the return code in __dst_free().
>
> If I understand the code in __dst_free() correctly, we want dst_gc_task
> to be queued after DST_GC_INC jiffies if we pass the test (and not in
> some undetermined time in the future), so I think we should add a call
> to cancel_delayed_work() before schedule_delayed_work(). Patch below.
>
Well, you are right that time is undetermined (but < ~120 seconds), so your patch
makes sense.
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Then we should ask why we reset the timer back to its minimum value
every time we call __dst_free(). On machines with many dormant TCP sessions,
dst_garbage.list can contain a huge number of non-freeable entries :(
Maybe we should count the entries and change the timer only if really needed.
> Or we should at least test the return code of schedule_delayed_work(),
> and reset the values of dst_garbage.timer_inc and dst_garbage.timer_expires
> back to their former values if schedule_delayed_work() failed.
> Otherwise the subsequent calls to __dst_free will test the wrong values
> and assume the wrong thing about when the garbage collector is supposed to
> be scheduled.
>
> dst_gc_task() also calls schedule_delayed_work() without checking
> its return code (or calling cancel_delayed_work() first), but it
> should be fine there: dst_gc_task is the routine of the delayed_work, so
> no dst_gc_work should be pending in the queue when it's running.
>
> This patch applies on top of net-2.6.
>
> (Sorry, I think I've been a bit verbose to expose this simple issue :)
>
> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
> ---
> net/core/dst.c | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: net-2.6/net/core/dst.c
> ===================================================================
> --- net-2.6.orig/net/core/dst.c
> +++ net-2.6/net/core/dst.c
> @@ -203,6 +203,7 @@ void __dst_free(struct dst_entry * dst)
>  	if (dst_garbage.timer_inc > DST_GC_INC) {
>  		dst_garbage.timer_inc = DST_GC_INC;
>  		dst_garbage.timer_expires = DST_GC_MIN;
> +		cancel_delayed_work(&dst_gc_work);
>  		schedule_delayed_work(&dst_gc_work, dst_garbage.timer_expires);
>  	}
>  	spin_unlock_bh(&dst_garbage.lock);
>
* Re: [PATCH 1/1] net: fix scheduling of dst_gc_task by __dst_free
From: David Miller @ 2008-09-12 23:14 UTC
To: dada1; +Cc: benjamin.thery, netdev
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 12 Sep 2008 16:46:52 +0200
> Benjamin Thery wrote:
> > The dst garbage collector dst_gc_task() may not be scheduled as we
> > expect it to be in __dst_free().
> > Indeed, when the dst_gc_timer was replaced by the delayed_work
> > dst_gc_work, the mod_timer() call used to schedule the garbage
> > collector at an earlier date was replaced by a schedule_delayed_work()
> > (see commit 86bba269d08f0c545ae76c90b56727f65d62d57f).
> > But, the behaviour of mod_timer() and schedule_delayed_work() is
> > different in the way they handle the delay. mod_timer() stops the timer and re-arms it with the newly given delay,
> > whereas schedule_delayed_work() only checks whether the work is already
> > queued in the workqueue (and queues it, with the delay, if it is not)
> > BUT it does NOT take into account the new delay (even if the new delay
> > is earlier in time).
> > schedule_delayed_work() returns 0 if it didn't queue the work,
> > but we don't check the return code in __dst_free().
> > If I understand the code in __dst_free() correctly, we want dst_gc_task
> > to be queued after DST_GC_INC jiffies if we pass the test (and not in
> > some undetermined time in the future), so I think we should add a call
> > to cancel_delayed_work() before schedule_delayed_work(). Patch below.
> >
>
> Well, you are right that time is undetermined (but < ~120 seconds), so your patch
> makes sense.
>
> Acked-by: Eric Dumazet <dada1@cosmosbay.com>
I'll add this to net-next-2.6 for now. Benjamin, do you know of any
real cases where users are being tripped up by our not using the
shorter scheduling of the workqueue?
> Then we should ask why we reset the timer back to its minimum value
> every time we call __dst_free(). On machines with many dormant TCP
> sessions, dst_garbage.list can contain a huge number of non-freeable
> entries :(
>
> Maybe we should count the entries and change the timer only if really needed.
Yet another area of black magic in our routing cache :)
> > (Sorry, I think I've been a bit verbose to expose this simple issue :)
No, do not apologize, I wish every commit message were this verbose.
* Re: [PATCH 1/1] net: fix scheduling of dst_gc_task by __dst_free
From: Benjamin Thery @ 2008-09-15 13:26 UTC
To: David Miller; +Cc: dada1, netdev
David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Fri, 12 Sep 2008 16:46:52 +0200
>
>> Benjamin Thery wrote:
>>> The dst garbage collector dst_gc_task() may not be scheduled as we
>>> expect it to be in __dst_free().
>>> Indeed, when the dst_gc_timer was replaced by the delayed_work
>>> dst_gc_work, the mod_timer() call used to schedule the garbage
>>> collector at an earlier date was replaced by a schedule_delayed_work()
>>> (see commit 86bba269d08f0c545ae76c90b56727f65d62d57f).
>>> But, the behaviour of mod_timer() and schedule_delayed_work() is
>>> different in the way they handle the delay. mod_timer() stops the timer and re-arms it with the newly given delay,
>>> whereas schedule_delayed_work() only checks whether the work is already
>>> queued in the workqueue (and queues it, with the delay, if it is not)
>>> BUT it does NOT take into account the new delay (even if the new delay
>>> is earlier in time).
>>> schedule_delayed_work() returns 0 if it didn't queue the work,
>>> but we don't check the return code in __dst_free().
>>> If I understand the code in __dst_free() correctly, we want dst_gc_task
>>> to be queued after DST_GC_INC jiffies if we pass the test (and not in
>>> some undetermined time in the future), so I think we should add a call
>>> to cancel_delayed_work() before schedule_delayed_work(). Patch below.
>>>
>> Well, you are right that time is undetermined (but < ~120 seconds), so your patch
>> makes sense.
>>
>> Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>
> I'll add this to net-next-2.6 for now. Benjamin, do you know of any
> real cases where users are being tripped up by our not using the
> shorter scheduling of the workqueue?
I found this while tracking down a problem that sometimes occurs at
network namespace exit.
When a network namespace exits, the routes need to be freed as fast as
possible to complete the unregistration of the net devices present in
the namespace (i.e. the loopback).
Sometimes, route garbage collection gets delayed (because of the
issue described here), so the refcount on the device isn't decremented
as expected by the time we reach netdev_wait_allrefs(), and we get the infamous
"unregister_netdevice: waiting for lo to become free."
This fix in __dst_free() fixes part of the problem.
Benjamin
>
>> Then we should ask why we reset the timer back to its minimum value
>> every time we call __dst_free(). On machines with many dormant TCP
>> sessions, dst_garbage.list can contain a huge number of non-freeable
>> entries :(
>>
>> Maybe we should count the entries and change the timer only if really needed.
>
> Yet another area of black magic in our routing cache :)
>
>>> (Sorry, I think I've been a bit verbose to expose this simple issue :)
>
> No, do not apologize, I wish every commit message were this verbose.
>
>
--
B e n j a m i n T h e r y - BULL/DT/Open Software R&D
http://www.bull.com