From: ebiederm@xmission.com (Eric W. Biederman)
To: David Miller <davem@davemloft.net>
Cc: hans.schillstrom@ericsson.com, daniel.lezcano@free.fr,
netdev@vger.kernel.org
Subject: Re: BUG ? ipip unregister_netdevice_many()
Date: Wed, 13 Oct 2010 22:20:28 -0700
Message-ID: <m162x5492h.fsf@fess.ebiederm.org>
In-Reply-To: 20101013.215013.104074480.davem@davemloft.net
David Miller <davem@davemloft.net> writes:
> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Wed, 13 Oct 2010 21:40:49 -0700
>
>> However I think the test should still be rt_is_expired(), because
>> that is what rt_do_flush() is doing removing the expired entries
>> from the list.
>
> I can't see a reason for that test.
>
> Everything calling into this code path has created a condition
> that requires that all routing cache entries for that namespace
> be deleted.
>
> This function is meant to unconditionally flush the entire table.
>
> I believe you added that extraneous test, and it never existed there
> before.
At the point network namespaces entered the picture the logic was:

	void rt_cache_flush(struct net *net, int delay)
	{
		rt_cache_invalidate();
		if (delay >= 0)
			rt_do_flush(!in_softirq());
	}

	/* Strictly speaking rt_is_expired was just open coded in
	 * rt_check_expire.  But this is the check that was used.
	 */
	static inline int rt_is_expired(struct rtable *rth)
	{
		return rth->rt_genid != atomic_read(&rt_genid);
	}

	static void rt_cache_invalidate(void)
	{
		unsigned char shuffle;

		get_random_bytes(&shuffle, sizeof(shuffle));
		atomic_add(shuffle + 1U, &rt_genid);
	}

	static void rt_do_flush(int process_context)
	{
		unsigned int i;
		struct rtable *rth, *next, *tail;

		for (i = 0; i <= rt_hash_mask; i++) {
			if (process_context && need_resched())
				cond_resched();
			rth = rt_hash_table[i].chain;
			if (!rth)
				continue;

			/* Detach the whole chain under the bucket lock. */
			spin_lock_bh(rt_hash_lock_addr(i));
			rth = rt_hash_table[i].chain;
			rt_hash_table[i].chain = NULL;
			tail = NULL;
			spin_unlock_bh(rt_hash_lock_addr(i));

			/* Free every detached entry. */
			for (; rth != tail; rth = next) {
				next = rth->dst.rt_next;
				rt_free(rth);
			}
		}
	}

Because of the rt_cache_invalidate() in rt_cache_flush(), this
guaranteed that rt_is_expired() was true for every route cache entry,
and it also guaranteed that every routing cache entry we were flushing
atomically became inaccessible.
So rt_is_expired() has always been valid, but in practice it was just
always optimized out as being redundant.
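
To make the argument above concrete, here is a minimal user-space sketch
of generation-based invalidation. It is not the kernel code: struct
entry, cur_genid, cache_invalidate() and count_expired() are
hypothetical stand-ins, and the random step that rt_cache_invalidate()
uses is replaced by a caller-supplied one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model, not kernel code. */
struct entry {
	unsigned int genid;	/* generation the entry was created in */
	struct entry *next;
};

/* Stands in for the single global rt_genid of the pre-namespace code. */
static unsigned int cur_genid;

/* Same test as rt_is_expired(): stale if the generations differ. */
static int entry_is_expired(const struct entry *e)
{
	return e->genid != cur_genid;
}

/* Bump the generation. The kernel adds a random value in 1..256;
 * this model takes the step from the caller (assumed to be 1..256).
 */
static void cache_invalidate(unsigned int step)
{
	cur_genid += step;
}

/* Walk a chain and count the entries that are expired. */
static size_t count_expired(const struct entry *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		if (entry_is_expired(head))
			n++;
	return n;
}
```

With one global generation, a single cache_invalidate() call makes
count_expired() equal to the chain length, which is why the explicit
test could be dropped from the flush loop without changing behaviour.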
With the network namespace support we limit the scope of the
invalidate to just a single network namespace, and as such
rt_is_expired() stops being true for every cache entry. So we cannot
unconditionally throw away entire chains.
Deciding which entries to throw away can be done either by network
namespace equality or by rt_is_expired(). Denis picked rt_is_expired()
when he made his change.
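
As a rough illustration of why whole chains can no longer be dropped,
the sketch below models a chain shared by two namespaces and removes
only the expired entries. Everything here (struct ns, struct entry,
entry_add(), chain_flush_expired()) is a hypothetical user-space model
with a plain per-namespace counter, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: each namespace carries its own generation. */
struct ns {
	unsigned int genid;
};

struct entry {
	struct ns *ns;		/* owning namespace */
	unsigned int genid;	/* namespace generation at creation time */
	struct entry *next;
};

/* Expired means: created under an older generation of its own namespace. */
static int entry_is_expired(const struct entry *e)
{
	return e->genid != e->ns->genid;
}

/* Push a new entry for @ns onto the chain; returns NULL on allocation failure. */
static struct entry *entry_add(struct entry **head, struct ns *ns)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->ns = ns;
	e->genid = ns->genid;
	e->next = *head;
	*head = e;
	return e;
}

/* Unlink and free only the expired entries; returns how many were removed. */
static size_t chain_flush_expired(struct entry **head)
{
	struct entry **pprev = head;
	struct entry *e;
	size_t removed = 0;

	while ((e = *pprev) != NULL) {
		if (entry_is_expired(e)) {
			*pprev = e->next;
			free(e);
			removed++;
		} else {
			pprev = &e->next;
		}
	}
	return removed;
}
```

Bumping the generation of one namespace and calling
chain_flush_expired() leaves the other namespace's entries on the
chain. Testing the owning namespace for equality would select the same
entries in this model.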
The only place it makes a noticeable difference in practice is what
happens when we do batched deletes of lots of network devices in
different network namespaces.
During batched network device deletes in fib_netdev_event() we do
rt_cache_flush(dev_net(dev), -1) for each network device, and then a
final rt_cache_flush_batch() to remove the invalidated entries. These
devices can be from multiple network namespaces, so I suspect that is
a savings worth having.
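
The saving can be sketched with a toy table: invalidate several
namespaces with a negative delay, then sweep the table once. This is
only a model under stated assumptions; struct slot, table_passes,
slot_fill() and the array-based table are hypothetical, and
cache_flush() / cache_flush_batch() merely mimic the shape of
rt_cache_flush() and rt_cache_flush_batch().

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Hypothetical model: per-namespace generation counter. */
struct ns {
	unsigned int genid;
};

struct slot {
	const struct ns *ns;	/* owning namespace */
	unsigned int genid;	/* namespace generation at creation time */
	int in_use;
};

static struct slot table[TABLE_SIZE];
static unsigned int table_passes;	/* number of full table sweeps */

/* Populate slot @i with a fresh entry for @ns. */
static void slot_fill(size_t i, const struct ns *ns)
{
	table[i].ns = ns;
	table[i].genid = ns->genid;
	table[i].in_use = 1;
}

/* One sweep over the table, dropping every expired entry. */
static size_t cache_flush_batch(void)
{
	size_t i, removed = 0;

	table_passes++;
	for (i = 0; i < TABLE_SIZE; i++) {
		if (table[i].in_use && table[i].genid != table[i].ns->genid) {
			table[i].in_use = 0;
			removed++;
		}
	}
	return removed;
}

/* Invalidate @ns; sweep right away only when @delay is non-negative. */
static void cache_flush(struct ns *ns, int delay)
{
	ns->genid++;
	if (delay >= 0)
		cache_flush_batch();
}
```

Calling cache_flush(ns, -1) for each affected namespace and then
cache_flush_batch() once costs a single sweep regardless of how many
namespaces were invalidated. A sweep that matched on one namespace at a
time would need one pass per namespace in this model.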
So if we are going to change the tests we need to do something with
rt_cache_flush_batch(). Further I do not see what is confusing about
a test that asks if the routing cache entry is unusable. Is
rt_is_expired() a bad name?
Eric