All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Mason <clm@fb.com>
To: David Miller <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
Subject: Re: [PATCH RFC] ipv6_fib limit spinlock hold times for /proc/net/ipv6_route
Date: Fri, 25 Apr 2014 16:27:21 -0400	[thread overview]
Message-ID: <535AC529.4030107@fb.com> (raw)
In-Reply-To: <20140425.160929.1031376209639331549.davem@davemloft.net>

On 04/25/2014 04:09 PM, David Miller wrote:
> From: Chris Mason <clm@fb.com>
> Date: Thu, 24 Apr 2014 09:59:24 -0400
>
>> The ipv6 code to dump routes in /proc/net/ipv6_route can hold
>> a read lock on the table for a very long time.  This ends up blocking
>> writers and triggering softlockups.
>>
>> This patch is a simple work around to limit the number of entries
>> we'll walk while processing /proc/net/ipv6_route.  It intentionally
>> slows down proc file reading to make sure we don't lock out the
>> real ipv6 traffic.
>>
>> This patch is also horrible, and doesn't actually fix the entire
>> problem.  We still have rcu_read_lock held the whole time we cat
>> /proc/net/ipv6_route.  On an unpatched machine, I've clocked the
>> time required to cat /proc/net/ipv6_route at 14 minutes.
>
> There is another way to more effectively mitigate this.
>
> Take the rtnl mutex over the traversals.
>
> The tables cannot change if you hold it.
>
> Then you can use rcu_dereference_rtnl() in the table traversals and
> get rid of the RCU locking entirely.

Ah ok, so the rtnl mutex can replace rcu_read_lock().  Will it end up 
blocking any traffic? (sorry, filesystem guys are a little slow)

>
> Now you're only left with the read locking over the individual trees.
> And as in your patch we can drop it temporarily after a limit is hit.

That would be wonderful because I can use some cond_resched() variant, 
and get rid of the max_walk counter completely.

>
> But yes, longer term we need to convert the ipv6 route trees over to
> RCU or similar.

Instead of the ->skip counter, can we get a cursor into the tree and 
just resume walking at the first entry after that cursor?  It would have 
to be a key that we copy out instead of a pointer so we can drop the 
rcu_read_lock()

>
> Even better would be to align the ipv6 routing with how ipv4 works
> since the routing-cache removal.
>

I'll shop task that around here.

-chris

  reply	other threads:[~2014-04-25 20:26 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-24 13:59 [PATCH RFC] ipv6_fib limit spinlock hold times for /proc/net/ipv6_route Chris Mason
2014-04-24 14:20 ` Hannes Frederic Sowa
2014-04-24 14:30   ` Eric Dumazet
2014-04-24 14:41   ` Chris Mason
2014-04-25 21:31     ` Hannes Frederic Sowa
2014-04-26  4:06       ` David Miller
2014-04-24 14:20 ` Eric Dumazet
2014-04-25 19:53 ` David Miller
2014-04-25 20:09 ` David Miller
2014-04-25 20:27   ` Chris Mason [this message]
2014-04-25 21:52     ` Hannes Frederic Sowa
2014-04-26  4:11     ` David Miller
2014-04-28 17:21       ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=535AC529.4030107@fb.com \
    --to=clm@fb.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.