netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Martin Lau <kafai@fb.com>
Cc: David Ahern <dsahern@gmail.com>,
	David Miller <davem@davemloft.net>,
	Jianlin Shi <jishi@redhat.com>, Wei Wang <weiwan@google.com>,
	Eric Dumazet <edumazet@google.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Subject: Re: [PATCH net 1/2] ipv6: Dump route exceptions too in rt6_dump_route()
Date: Mon, 10 Jun 2019 23:01:54 +0200	[thread overview]
Message-ID: <20190610230154.63788980@redhat.com> (raw)
In-Reply-To: <20190610194238.3gke27kflrocrpwo@kafai-mbp.dhcp.thefacebook.com>

On Mon, 10 Jun 2019 19:42:41 +0000
Martin Lau <kafai@fb.com> wrote:

> On Sat, Jun 08, 2019 at 05:47:07PM +0200, Stefano Brivio wrote:
> > On Sat, 8 Jun 2019 17:02:06 +0200
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> > > On Sat, 8 Jun 2019 07:19:23 +0000
> > > Martin Lau <kafai@fb.com> wrote:
> > >   
> > > > On Sat, Jun 08, 2019 at 07:59:11AM +0200, Stefano Brivio wrote:    
> > > > > I also agree it makes more sense to filter routes this way.
> > > > > 
> > > > > But it wasn't like this before 2b760fcf5cfb, so this smells like
> > > > > breaking userspace expectations, even though iproute already filters
> > > > > routes this way: with 'cache' it only displays routes with
> > > > > RTM_F_CLONED, without, it won't display exceptions, see filter_nlmsg():      
> > > > Thanks for pointing it out.
> > > >     
> > > > > 	if (filter.cloned == !(r->rtm_flags & RTM_F_CLONED))
> > > > > 		return 0;
> > > > > 
> > > > > This, together with the fact it's been like that for almost two years
> > > > > now, makes it acceptable in my opinion. What do you think?      
> > > > With learning the above fact on iproute2,
> > > > it makes even less sense to dump exceptions from the kernel side
> > > > when RTM_F_CLONED is not set.    
> > > 
> > > I just hit a more fundamental problem though: iproute2 filters on the
> > > flag, but never sets it on a dump request. Flags will be NLM_F_DUMP |
> > > NLM_F_REQUEST, no matter what, see rtnl_routedump_req(). So the current
> > > iproute2 would have no way to dump cached routes.  
> > 
> > Partially wrong: it actually sets it on 'list':
> > 
> > 	if (rtnl_routedump_req(&rth, dump_family, iproute_dump_filter) < 0) {
> > 
> > [...]
> > static int iproute_dump_filter(struct nlmsghdr *nlh, int reqlen)
> > [...]
> > 	if (filter.cloned)
> > 		rtm->rtm_flags |= RTM_F_CLONED;
> > 
> > but not on 'flush':
> > 
> > 		if (rtnl_routedump_req(&rth, family, NULL) < 0) {
> > 
> > but this doesn't change things much: it still has no way to flush the
> > cache, because the dump to get the routes to flush doesn't contain the
> > exceptions.  
> 'ip -6 r l table cache' can be limited to dump the cache only, right?

Yes, at it was in v1 and v2. But that's arbitrary and inconsistent,
because without RTM_F_CLONED we would anyway need (with current
iproute2) to dump exceptions too.

RTM_F_CLONED just isn't a filter right now. Let's add support for
NLM_F_MATCH later and have clear semantics. I think this whole mess
comes from the fact we miss that.

> I am still missing something about why the kernel is required
> to output everything and then filtered out in the iproute2.
> 
> You meant either:
> The kernel needs to dump everything first. iproute2 can then figure out
> which one is cache and then flush them?

Yes, this is the case, right now.

> or
> the iproute2 can be changed to only get the cache from the kernel and then
> flush them?

I didn't mean this. In theory, we could, but I wouldn't fix a kernel
regression with a userspace change. I'm not adding NLM_F_MATCH support
in this fix because it wouldn't be a fix for past and current iproute2
versions, so it wouldn't be a fix.

> AFAIK, the kernel has never dumped the cache routes for IPv4.

Right now, it doesn't, but I can't believe that 'ip route list cache'
was implemented in iproute2 and it never worked. I haven't tested IPv4
with older kernels, though.

> What is done here has to be consistent with the future patch in IPv4.
> Each node can hold up to 5*1024 caches which is ok-ish but still a waste
> to dump it and then not printing it.

Indeed, I agree. For both IPv4 and IPv6, we need to have clear
semantics to avoid this waste. NLM_F_MATCH is specified by RFC 3549
exactly for this purpose, but we don't implement that.

Without NLM_F_MATCH, RTM_F_CLONED is just what the comment in UAPI says:

#define RTM_F_CLONED		0x200	/* This route is cloned		*/

So the future patch for IPv4 would be consistent: without NLM_F_MATCH
we'll dump everything and iproute2 filters (as it was until 2017 at
least for IPv6).

Then I'd add, for both IPv4 and IPv6, support for NLM_F_MATCH (both
for kernel and iproute2): kernel filters, sends NLM_F_DUMP_FILTERED
back and iproute2 doesn't (need to) filter. But this will be a (kind of
needed) optimisation, not a fix.

-- 
Stefano

  reply	other threads:[~2019-06-10 21:02 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-06 20:13 [PATCH net 0/2] ipv6: Fix listing and flushing of cached route exceptions Stefano Brivio
2019-06-06 20:13 ` [PATCH net 1/2] ipv6: Dump route exceptions too in rt6_dump_route() Stefano Brivio
2019-06-06 20:57   ` David Ahern
2019-06-06 21:18     ` Stefano Brivio
2019-06-06 22:47       ` David Ahern
2019-06-06 23:07         ` Stefano Brivio
2019-06-08  5:40         ` Martin Lau
2019-06-08  5:59           ` Stefano Brivio
2019-06-08  7:19             ` Martin Lau
2019-06-08 15:02               ` Stefano Brivio
2019-06-08 15:47                 ` Stefano Brivio
2019-06-10 19:42                   ` Martin Lau
2019-06-10 21:01                     ` Stefano Brivio [this message]
2019-06-10  5:56                 ` Vaittinen, Matti
2019-06-10 19:01                   ` Stefano Brivio
2019-06-06 21:44   ` Martin Lau
2019-06-06 22:17     ` Stefano Brivio
2019-06-06 22:37       ` Martin Lau
2019-06-06 22:48         ` David Ahern
2019-06-07  1:54           ` Stefano Brivio
2019-06-06 22:58         ` Stefano Brivio
2019-06-06 23:15           ` Stefano Brivio
2019-06-06 23:19           ` David Ahern
2019-06-06 23:31           ` Martin Lau
2019-06-06 20:13 ` [PATCH net 2/2] ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190610230154.63788980@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=edumazet@google.com \
    --cc=jishi@redhat.com \
    --cc=kafai@fb.com \
    --cc=matti.vaittinen@fi.rohmeurope.com \
    --cc=netdev@vger.kernel.org \
    --cc=weiwan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).