From: Stefano Brivio <sbrivio@redhat.com>
To: Martin Lau <kafai@fb.com>
Cc: David Ahern <dsahern@gmail.com>,
David Miller <davem@davemloft.net>,
Jianlin Shi <jishi@redhat.com>, Wei Wang <weiwan@google.com>,
Eric Dumazet <edumazet@google.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Subject: Re: [PATCH net 1/2] ipv6: Dump route exceptions too in rt6_dump_route()
Date: Mon, 10 Jun 2019 23:01:54 +0200 [thread overview]
Message-ID: <20190610230154.63788980@redhat.com> (raw)
In-Reply-To: <20190610194238.3gke27kflrocrpwo@kafai-mbp.dhcp.thefacebook.com>
On Mon, 10 Jun 2019 19:42:41 +0000
Martin Lau <kafai@fb.com> wrote:
> On Sat, Jun 08, 2019 at 05:47:07PM +0200, Stefano Brivio wrote:
> > On Sat, 8 Jun 2019 17:02:06 +0200
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >
> > > On Sat, 8 Jun 2019 07:19:23 +0000
> > > Martin Lau <kafai@fb.com> wrote:
> > >
> > > > On Sat, Jun 08, 2019 at 07:59:11AM +0200, Stefano Brivio wrote:
> > > > > I also agree it makes more sense to filter routes this way.
> > > > >
> > > > > But it wasn't like this before 2b760fcf5cfb, so this smells like
> > > > > breaking userspace expectations, even though iproute already filters
> > > > > routes this way: with 'cache' it only displays routes with
> > > > > RTM_F_CLONED, without, it won't display exceptions, see filter_nlmsg():
> > > > Thanks for pointing it out.
> > > >
> > > > > if (filter.cloned == !(r->rtm_flags & RTM_F_CLONED))
> > > > > return 0;
> > > > >
> > > > > This, together with the fact it's been like that for almost two years
> > > > > now, makes it acceptable in my opinion. What do you think?
> > > > With learning the above fact on iproute2,
> > > > it makes even less sense to dump exceptions from the kernel side
> > > > when RTM_F_CLONED is not set.
> > >
> > > I just hit a more fundamental problem though: iproute2 filters on the
> > > flag, but never sets it on a dump request. Flags will be NLM_F_DUMP |
> > > NLM_F_REQUEST, no matter what, see rtnl_routedump_req(). So the current
> > > iproute2 would have no way to dump cached routes.
> >
> > Partially wrong: it actually sets it on 'list':
> >
> > if (rtnl_routedump_req(&rth, dump_family, iproute_dump_filter) < 0) {
> >
> > [...]
> > static int iproute_dump_filter(struct nlmsghdr *nlh, int reqlen)
> > [...]
> > if (filter.cloned)
> > rtm->rtm_flags |= RTM_F_CLONED;
> >
> > but not on 'flush':
> >
> > if (rtnl_routedump_req(&rth, family, NULL) < 0) {
> >
> > but this doesn't change things much: it still has no way to flush the
> > cache, because the dump to get the routes to flush doesn't contain the
> > exceptions.
> 'ip -6 r l table cache' can be limited to dump the cache only, right?
Yes, at it was in v1 and v2. But that's arbitrary and inconsistent,
because without RTM_F_CLONED we would anyway need (with current
iproute2) to dump exceptions too.
RTM_F_CLONED just isn't a filter right now. Let's add support for
NLM_F_MATCH later and have clear semantics. I think this whole mess
comes from the fact we miss that.
> I am still missing something about why the kernel is required
> to output everything and then filtered out in the iproute2.
>
> You meant either:
> The kernel needs to dump everything first. iproute2 can then figure out
> which one is cache and then flush them?
Yes, this is the case, right now.
> or
> the iproute2 can be changed to only get the cache from the kernel and then
> flush them?
I didn't mean this. In theory, we could, but I wouldn't fix a kernel
regression with a userspace change. I'm not adding NLM_F_MATCH support
in this fix because it wouldn't be a fix for past and current iproute2
versions, so it wouldn't be a fix.
> AFAIK, the kernel has never dumped the cache routes for IPv4.
Right now, it doesn't, but I can't believe that 'ip route list cache'
was implemented in iproute2 and it never worked. I haven't tested IPv4
with older kernels, though.
> What is done here has to be consistent with the future patch in IPv4.
> Each node can hold up to 5*1024 caches which is ok-ish but still a waste
> to dump it and then not printing it.
Indeed, I agree. For both IPv4 and IPv6, we need to have clear
semantics to avoid this waste. NLM_F_MATCH is specified by RFC 3549
exactly for this purpose, but we don't implement that.
Without NLM_F_MATCH, RTM_F_CLONED is just what the comment in UAPI says:
#define RTM_F_CLONED 0x200 /* This route is cloned */
So the future patch for IPv4 would be consistent: without NLM_F_MATCH
we'll dump everything and iproute2 filters (as it was until 2017 at
least for IPv6).
Then I'd add, for both IPv4 and IPv6, support for NLM_F_MATCH (both
for kernel and iproute2): kernel filters, sends NLM_F_DUMP_FILTERED
back and iproute2 doesn't (need to) filter. But this will be a (kind of
needed) optimisation, not a fix.
--
Stefano
next prev parent reply other threads:[~2019-06-10 21:02 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-06 20:13 [PATCH net 0/2] ipv6: Fix listing and flushing of cached route exceptions Stefano Brivio
2019-06-06 20:13 ` [PATCH net 1/2] ipv6: Dump route exceptions too in rt6_dump_route() Stefano Brivio
2019-06-06 20:57 ` David Ahern
2019-06-06 21:18 ` Stefano Brivio
2019-06-06 22:47 ` David Ahern
2019-06-06 23:07 ` Stefano Brivio
2019-06-08 5:40 ` Martin Lau
2019-06-08 5:59 ` Stefano Brivio
2019-06-08 7:19 ` Martin Lau
2019-06-08 15:02 ` Stefano Brivio
2019-06-08 15:47 ` Stefano Brivio
2019-06-10 19:42 ` Martin Lau
2019-06-10 21:01 ` Stefano Brivio [this message]
2019-06-10 5:56 ` Vaittinen, Matti
2019-06-10 19:01 ` Stefano Brivio
2019-06-06 21:44 ` Martin Lau
2019-06-06 22:17 ` Stefano Brivio
2019-06-06 22:37 ` Martin Lau
2019-06-06 22:48 ` David Ahern
2019-06-07 1:54 ` Stefano Brivio
2019-06-06 22:58 ` Stefano Brivio
2019-06-06 23:15 ` Stefano Brivio
2019-06-06 23:19 ` David Ahern
2019-06-06 23:31 ` Martin Lau
2019-06-06 20:13 ` [PATCH net 2/2] ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190610230154.63788980@redhat.com \
--to=sbrivio@redhat.com \
--cc=davem@davemloft.net \
--cc=dsahern@gmail.com \
--cc=edumazet@google.com \
--cc=jishi@redhat.com \
--cc=kafai@fb.com \
--cc=matti.vaittinen@fi.rohmeurope.com \
--cc=netdev@vger.kernel.org \
--cc=weiwan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.