* [PATCH net] ipv6: prevent fib6_run_gc() contention
@ 2013-06-04 11:08 Michal Kubecek
2013-06-10 21:26 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Michal Kubecek @ 2013-06-04 11:08 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.
This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.
Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
include/net/ip6_fib.h | 2 +-
net/ipv6/ip6_fib.c | 19 ++++++++-----------
net/ipv6/ndisc.c | 4 ++--
net/ipv6/route.c | 4 ++--
4 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 2a601e7..48ec25a 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -300,7 +300,7 @@ extern void inet6_rt_notify(int event, struct rt6_info *rt,
struct nl_info *info);
extern void fib6_run_gc(unsigned long expires,
- struct net *net);
+ struct net *net, bool force);
extern void fib6_gc_cleanup(void);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 192dd1a..1dcd618 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1625,19 +1625,16 @@ static int fib6_age(struct rt6_info *rt, void *arg)
static DEFINE_SPINLOCK(fib6_gc_lock);
-void fib6_run_gc(unsigned long expires, struct net *net)
+void fib6_run_gc(unsigned long expires, struct net *net, bool force)
{
- if (expires != ~0UL) {
+ if (force)
spin_lock_bh(&fib6_gc_lock);
- gc_args.timeout = expires ? (int)expires :
- net->ipv6.sysctl.ip6_rt_gc_interval;
- } else {
- if (!spin_trylock_bh(&fib6_gc_lock)) {
- mod_timer(&net->ipv6.ip6_fib_timer, jiffies + HZ);
- return;
- }
- gc_args.timeout = net->ipv6.sysctl.ip6_rt_gc_interval;
+ else if (!spin_trylock_bh(&fib6_gc_lock)) {
+ mod_timer(&net->ipv6.ip6_fib_timer, jiffies + HZ);
+ return;
}
+ gc_args.timeout = expires ? (int)expires :
+ net->ipv6.sysctl.ip6_rt_gc_interval;
gc_args.more = icmp6_dst_gc();
@@ -1654,7 +1651,7 @@ void fib6_run_gc(unsigned long expires, struct net *net)
static void fib6_gc_timer_cb(unsigned long arg)
{
- fib6_run_gc(0, (struct net *)arg);
+ fib6_run_gc(0, (struct net *)arg, true);
}
static int __net_init fib6_net_init(struct net *net)
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 2712ab2..bf60c25 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1575,7 +1575,7 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
switch (event) {
case NETDEV_CHANGEADDR:
neigh_changeaddr(&nd_tbl, dev);
- fib6_run_gc(~0UL, net);
+ fib6_run_gc(0, net, false);
idev = in6_dev_get(dev);
if (!idev)
break;
@@ -1585,7 +1585,7 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
break;
case NETDEV_DOWN:
neigh_ifdown(&nd_tbl, dev);
- fib6_run_gc(~0UL, net);
+ fib6_run_gc(0, net, false);
break;
case NETDEV_NOTIFY_PEERS:
ndisc_send_unsol_na(dev);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ad0aa6b..fdb5ccb 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1300,7 +1300,7 @@ static int ip6_dst_gc(struct dst_ops *ops)
goto out;
net->ipv6.ip6_rt_gc_expire++;
- fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net);
+ fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, entries > rt_max_size);
net->ipv6.ip6_rt_last_gc = now;
entries = dst_entries_get_slow(ops);
if (entries < ops->gc_thresh)
@@ -2801,7 +2801,7 @@ int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write,
net = (struct net *)ctl->extra1;
delay = net->ipv6.sysctl.flush_delay;
proc_dointvec(ctl, write, buffer, lenp, ppos);
- fib6_run_gc(delay <= 0 ? ~0UL : (unsigned long)delay, net);
+ fib6_run_gc(delay <= 0 ? 0 : (unsigned long)delay, net, delay > 0);
return 0;
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-06-04 11:08 [PATCH net] ipv6: prevent fib6_run_gc() contention Michal Kubecek
@ 2013-06-10 21:26 ` David Miller
2013-06-11 10:07 ` Michal Kubecek
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2013-06-10 21:26 UTC (permalink / raw)
To: mkubecek; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
From: Michal Kubecek <mkubecek@suse.cz>
Date: Tue, 4 Jun 2013 13:08:59 +0200
> On a high-traffic router with many processors and many IPv6 dst
> entries, soft lockup in fib6_run_gc() can occur when number of
> entries reaches gc_thresh.
>
> This happens because fib6_run_gc() uses fib6_gc_lock to allow
> only one thread to run the garbage collector but ip6_dst_gc()
> doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
> returns. On a system with many entries, this can take some time
> so that in the meantime, other threads pass the tests in
> ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
> the lock. They then have to run the garbage collector one after
> another which blocks them for quite long.
>
> Resolve this by replacing special value ~0UL of expire parameter
> to fib6_run_gc() by explicit "force" parameter to choose between
> spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
> force=false if gc_thresh is reached but not max_size.
>
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
It seems to me that it would be much simpler to simply update
ip6_rt_last_gc first, that way the other threads would elide the GC
call.
Can you see if simply doing that fixes the problem too?
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-06-10 21:26 ` David Miller
@ 2013-06-11 10:07 ` Michal Kubecek
2013-06-11 10:40 ` Eric Dumazet
2013-06-11 20:01 ` David Miller
0 siblings, 2 replies; 9+ messages in thread
From: Michal Kubecek @ 2013-06-11 10:07 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
On Mon, Jun 10, 2013 at 02:26:42PM -0700, David Miller wrote:
> From: Michal Kubecek <mkubecek@suse.cz>
> Date: Tue, 4 Jun 2013 13:08:59 +0200
>
> > On a high-traffic router with many processors and many IPv6 dst
> > entries, soft lockup in fib6_run_gc() can occur when number of
> > entries reaches gc_thresh.
> >
> > This happens because fib6_run_gc() uses fib6_gc_lock to allow
> > only one thread to run the garbage collector but ip6_dst_gc()
> > doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
> > returns. On a system with many entries, this can take some time
> > so that in the meantime, other threads pass the tests in
> > ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
> > the lock. They then have to run the garbage collector one after
> > another which blocks them for quite long.
> >
> > Resolve this by replacing special value ~0UL of expire parameter
> > to fib6_run_gc() by explicit "force" parameter to choose between
> > spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
> > force=false if gc_thresh is reached but not max_size.
> >
> > Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
>
> It seems to me that it would be much simpler to simply update
> ip6_rt_last_gc first, that way the other threads would elide the GC
> call.
That was my original idea but I was afraid that while the remaining
window in ip6_dst_gc() would be very short and probably safe, we could
still run into problem if fib6_gc_lock was locked by some other caller
of fib6_run_gc() which doesn't update net->ipv6.ip6_rt_last_gc,
especially via a timer.
Michal Kubecek
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-06-11 10:07 ` Michal Kubecek
@ 2013-06-11 10:40 ` Eric Dumazet
2013-06-11 20:01 ` David Miller
1 sibling, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2013-06-11 10:40 UTC (permalink / raw)
To: Michal Kubecek; +Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
On Tue, 2013-06-11 at 12:07 +0200, Michal Kubecek wrote:
> That was my original idea but I was afraid that while the remaining
> window in ip6_dst_gc() would be very short and probably safe, we could
> still run into problem if fib6_gc_lock was locked by some other caller
> of fib6_run_gc() which doesn't update net->ipv6.ip6_rt_last_gc,
> especially via a timer.
This looks a bug (but not a big one) to me :
We should update ip6_rt_last_gc if we did a gc,
not only from ip6_dst_gc()
So, I would move the "net->ipv6.ip6_rt_last_gc = now;" from ip6_dst_gc()
to fib6_run_gc(), right after the spin_lock_bh() ?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-06-11 10:07 ` Michal Kubecek
2013-06-11 10:40 ` Eric Dumazet
@ 2013-06-11 20:01 ` David Miller
2013-06-11 20:49 ` Michal Kubecek
1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2013-06-11 20:01 UTC (permalink / raw)
To: mkubecek; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
From: Michal Kubecek <mkubecek@suse.cz>
Date: Tue, 11 Jun 2013 12:07:18 +0200
> On Mon, Jun 10, 2013 at 02:26:42PM -0700, David Miller wrote:
>> From: Michal Kubecek <mkubecek@suse.cz>
>> Date: Tue, 4 Jun 2013 13:08:59 +0200
>>
>> > On a high-traffic router with many processors and many IPv6 dst
>> > entries, soft lockup in fib6_run_gc() can occur when number of
>> > entries reaches gc_thresh.
>> >
>> > This happens because fib6_run_gc() uses fib6_gc_lock to allow
>> > only one thread to run the garbage collector but ip6_dst_gc()
>> > doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
>> > returns. On a system with many entries, this can take some time
>> > so that in the meantime, other threads pass the tests in
>> > ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
>> > the lock. They then have to run the garbage collector one after
>> > another which blocks them for quite long.
>> >
>> > Resolve this by replacing special value ~0UL of expire parameter
>> > to fib6_run_gc() by explicit "force" parameter to choose between
>> > spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
>> > force=false if gc_thresh is reached but not max_size.
>> >
>> > Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
>>
>> It seems to me that it would be much simpler to simply update
>> ip6_rt_last_gc first, that way the other threads would elide the GC
>> call.
>
> That was my original idea but I was afraid that while the remaining
> window in ip6_dst_gc() would be very short and probably safe, we could
> still run into problem if fib6_gc_lock was locked by some other caller
> of fib6_run_gc() which doesn't update net->ipv6.ip6_rt_last_gc,
> especially via a timer.
Why don't you simply try it and find out?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-06-11 20:01 ` David Miller
@ 2013-06-11 20:49 ` Michal Kubecek
2013-06-11 21:22 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Michal Kubecek @ 2013-06-11 20:49 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
On Tue, Jun 11, 2013 at 01:01:14PM -0700, David Miller wrote:
> From: Michal Kubecek <mkubecek@suse.cz>
> Date: Tue, 11 Jun 2013 12:07:18 +0200
>
> > On Mon, Jun 10, 2013 at 02:26:42PM -0700, David Miller wrote:
> >>
> >> It seems to me that it would be much simpler to simply update
> >> ip6_rt_last_gc first, that way the other threads would elide the GC
> >> call.
> >
> > That was my original idea but I was afraid that while the remaining
> > window in ip6_dst_gc() would be very short and probably safe, we could
> > still run into problem if fib6_gc_lock was locked by some other caller
> > of fib6_run_gc() which doesn't update net->ipv6.ip6_rt_last_gc,
> > especially via a timer.
>
> Why don't you simply try it and find out?
I don't have the system where the issue was observed at hand. I'll have
to ask the customer who reported it to run the test so that it may take
some time.
Michal Kubecek
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-06-11 20:49 ` Michal Kubecek
@ 2013-06-11 21:22 ` David Miller
2013-07-31 22:19 ` Michal Kubecek
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2013-06-11 21:22 UTC (permalink / raw)
To: mkubecek; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
From: Michal Kubecek <mkubecek@suse.cz>
Date: Tue, 11 Jun 2013 22:49:12 +0200
> On Tue, Jun 11, 2013 at 01:01:14PM -0700, David Miller wrote:
>> From: Michal Kubecek <mkubecek@suse.cz>
>> Date: Tue, 11 Jun 2013 12:07:18 +0200
>>
>> > On Mon, Jun 10, 2013 at 02:26:42PM -0700, David Miller wrote:
>> >>
>> >> It seems to me that it would be much simpler to simply update
>> >> ip6_rt_last_gc first, that way the other threads would elide the GC
>> >> call.
>> >
>> > That was my original idea but I was afraid that while the remaining
>> > window in ip6_dst_gc() would be very short and probably safe, we could
>> > still run into problem if fib6_gc_lock was locked by some other caller
>> > of fib6_run_gc() which doesn't update net->ipv6.ip6_rt_last_gc,
>> > especially via a timer.
>>
>> Why don't you simply try it and find out?
>
> I don't have the system where the issue was observed at hand. I'll have
> to ask the customer who reported it to run the test so that it may take
> some time.
Please also take care of the issue Eric mentioned, in that some GC
calls erroneously do not update the last GC timestamp.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-06-11 21:22 ` David Miller
@ 2013-07-31 22:19 ` Michal Kubecek
2013-07-31 23:43 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Michal Kubecek @ 2013-07-31 22:19 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, Eric Dumazet
On Tue, Jun 11, 2013 at 02:22:29PM -0700, David Miller wrote:
> From: Michal Kubecek <mkubecek@suse.cz>
> Date: Tue, 11 Jun 2013 22:49:12 +0200
>
> > On Tue, Jun 11, 2013 at 01:01:14PM -0700, David Miller wrote:
> >> From: Michal Kubecek <mkubecek@suse.cz>
> >> Date: Tue, 11 Jun 2013 12:07:18 +0200
> >>
> >> > On Mon, Jun 10, 2013 at 02:26:42PM -0700, David Miller wrote:
> >> >>
> >> >> It seems to me that it would be much simpler to simply update
> >> >> ip6_rt_last_gc first, that way the other threads would elide the GC
> >> >> call.
> >> >
> >> > That was my original idea but I was afraid that while the remaining
> >> > window in ip6_dst_gc() would be very short and probably safe, we could
> >> > still run into problem if fib6_gc_lock was locked by some other caller
> >> > of fib6_run_gc() which doesn't update net->ipv6.ip6_rt_last_gc,
> >> > especially via a timer.
> >>
> >> Why don't you simply try it and find out?
> >
> > I don't have the system where the issue was observed at hand. I'll have
> > to ask the customer who reported it to run the test so that it may take
> > some time.
>
> Please also take care of the issue Eric mentioned, in that some GC
> calls erroneously do not update the last GC timestamp.
I received the feedback today. The customer reported the patch moving
update of ip6_rt_last_gc into fib6_run_gc() before the call of
fib6_clean_all() was not sufficient to prevent the contention with their
workload.
So I suggest to return back to the approach I submitted originally but
to move the update into fib6_run_gc() anyway so that every run of the
garbage collector updates it. I will submit the patches tomorrow.
Michal Kubecek
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] ipv6: prevent fib6_run_gc() contention
2013-07-31 22:19 ` Michal Kubecek
@ 2013-07-31 23:43 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2013-07-31 23:43 UTC (permalink / raw)
To: mkubecek; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eric.dumazet
From: Michal Kubecek <mkubecek@suse.cz>
Date: Thu, 1 Aug 2013 00:19:31 +0200
> I received the feedback today. The customer reported the patch moving
> update of ip6_rt_last_gc into fib6_run_gc() before the call of
> fib6_clean_all() was not sufficient to prevent the contention with their
> workload.
>
> So I suggest to return back to the approach I submitted originally but
> to move the update into fib6_run_gc() anyway so that every run of the
> garbage collector updates it. I will submit the patches tomorrow.
Fair enough.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-07-31 23:44 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-04 11:08 [PATCH net] ipv6: prevent fib6_run_gc() contention Michal Kubecek
2013-06-10 21:26 ` David Miller
2013-06-11 10:07 ` Michal Kubecek
2013-06-11 10:40 ` Eric Dumazet
2013-06-11 20:01 ` David Miller
2013-06-11 20:49 ` Michal Kubecek
2013-06-11 21:22 ` David Miller
2013-07-31 22:19 ` Michal Kubecek
2013-07-31 23:43 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).