From: Chris Mason <clm@fb.com>
To: <netdev@vger.kernel.org>
Subject: [PATCH RFC] ipv6_fib limit spinlock hold times for /proc/net/ipv6_route
Date: Thu, 24 Apr 2014 09:59:24 -0400
Message-ID: <535918BC.5030708@fb.com>


The ipv6 code to dump routes in /proc/net/ipv6_route can hold
a read lock on the table for a very long time.  This ends up blocking
writers and triggering softlockups.

This patch is a simple workaround that limits the number of entries
we'll walk per lock hold while processing /proc/net/ipv6_route.  It
intentionally slows down proc file reading to make sure we don't lock
out the real ipv6 traffic.

This patch is also horrible, and doesn't actually fix the entire
problem.  We still have rcu_read_lock held the whole time we cat
/proc/net/ipv6_route.  On an unpatched machine, I've clocked the
time required to cat /proc/net/ipv6_route at 14 minutes.

Java cats this proc file on startup to search for local routes, and the
resulting contention on the table lock makes our boxes fall over.

So, I'm sending the partial fix to get discussion started.

Signed-off-by: Chris Mason <clm@fb.com>

---
 net/ipv6/ip6_fib.c | 9 +++++++++
 1 file changed, 9 insertions(+)
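
For discussion, the bounded-walk idea described above can be sketched in
user space.  This is only an illustration under stated assumptions, not
the kernel code: toy_lock is a hypothetical stand-in for tb6_lock that
just records how it is used, the table is a plain linked list instead of
the fib6 tree, and no writer ever modifies the list, so resuming from a
saved pointer is trivially safe here.  The names demo_walk, walk_continue,
toy_lock and MAX_ENTRIES are made up for this sketch; only the budget of
128 and the -EAGAIN retry come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_WALK    128   /* entries visited per lock hold, as in the patch */
#define MAX_ENTRIES 4096  /* size of the toy table (hypothetical) */

struct entry {
	struct entry *next;
};

/* Hypothetical stand-in for tb6_lock: it only records how it is used. */
struct toy_lock {
	int held;
	int acquisitions;
};

struct walker {
	struct toy_lock *lock;
	struct entry *pos;    /* resume point kept across lock drops */
	int max_walk;         /* budget left in the current lock hold */
	long visited;
	int longest_hold;     /* most entries visited under one hold */
};

struct walk_stats {
	long visited;
	int lock_holds;
	int longest_hold;
};

static struct entry pool[MAX_ENTRIES];

static void toy_read_lock(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void toy_read_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Returns 0 at the end of the list, -EAGAIN when the budget runs out. */
static int walk_continue(struct walker *w)
{
	assert(w->lock->held);
	while (w->pos) {
		if (w->max_walk <= 0)
			return -EAGAIN;
		w->pos = w->pos->next;
		w->visited++;
		w->max_walk--;
	}
	return 0;
}

/* Build an n-entry list and walk it in lock holds of at most MAX_WALK. */
static struct walk_stats demo_walk(int n)
{
	struct toy_lock lock = { 0, 0 };
	struct walker w = { &lock, NULL, 0, 0, 0 };
	struct walk_stats s;
	int i, r, used;

	assert(n >= 0 && n <= MAX_ENTRIES);
	for (i = 0; i < n; i++)
		pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;
	w.pos = n ? &pool[0] : NULL;

	do {
		toy_read_lock(&lock);
		w.max_walk = MAX_WALK;      /* fresh budget for every hold */
		r = walk_continue(&w);
		used = MAX_WALK - w.max_walk;
		if (used > w.longest_hold)
			w.longest_hold = used;
		toy_read_unlock(&lock);     /* a writer could get in here */
	} while (r == -EAGAIN);

	s.visited = w.visited;
	s.lock_holds = lock.acquisitions;
	s.longest_hold = w.longest_hold;
	return s;
}
```

With this sketch, demo_walk(1000) visits all 1000 entries over 8 lock
holds, none of which covers more than 128 entries.  The total work is
unchanged; it is only split so that a writer gets a chance between holds.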

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 87891f5..19b0f78 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1814,6 +1814,7 @@ struct ipv6_route_iter {
 	loff_t skip;
 	struct fib6_table *tbl;
 	__u32 sernum;
+	int max_walk;
 };
 
 static int ipv6_route_seq_show(struct seq_file *seq, void *v)
@@ -1853,8 +1854,11 @@ static int ipv6_route_yield(struct fib6_walker_t *w)
 		iter->skip--;
 		if (!iter->skip && iter->w.leaf)
 			return 1;
+		iter->max_walk--;
 	} while (iter->w.leaf);
 
+	if (iter->max_walk <= 0)
+		return -EAGAIN;
 	return 0;
 }
 
@@ -1867,6 +1871,7 @@ static void ipv6_route_seq_setup_walk(struct ipv6_route_iter *iter)
 	iter->w.node = iter->w.root;
 	iter->w.args = iter;
 	iter->sernum = iter->w.root->fn_sernum;
+	iter->max_walk = 128;
 	INIT_LIST_HEAD(&iter->w.lh);
 	fib6_walker_link(&iter->w);
 }
@@ -1921,7 +1926,9 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 
 iter_table:
 	ipv6_route_check_sernum(iter);
+iter_again:
 	read_lock(&iter->tbl->tb6_lock);
+	iter->max_walk = 128;
 	r = fib6_walk_continue(&iter->w);
 	read_unlock(&iter->tbl->tb6_lock);
 	if (r > 0) {
@@ -1929,6 +1936,8 @@ iter_table:
 			++*pos;
 		return iter->w.leaf;
 	} else if (r < 0) {
+		if (r == -EAGAIN)
+			goto iter_again;
 		fib6_walker_unlink(&iter->w);
 		return NULL;
 	}
-- 
1.8.1
