From: Roopa Prabhu <roopa@cumulusnetworks.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
netdev@vger.kernel.org, davem@davemloft.net, jiri@mellanox.com,
mlxsw@mellanox.com, dsa@cumulusnetworks.com,
nikolay@cumulusnetworks.com, andy@greyhouse.net,
vivien.didelot@savoirfairelinux.com, andrew@lunn.ch,
f.fainelli@gmail.com, alexander.h.duyck@intel.com,
kuznet@ms2.inr.ac.ru, jmorris@namei.org, yoshfuji@linux-ipv6.org,
kaber@trash.net, Ido Schimmel <idosch@mellanox.com>
Subject: Re: [PATCH net-next v2] ipv4: fib: Replay events when registering FIB notifier
Date: Tue, 01 Nov 2016 19:13:42 -0700
Message-ID: <58194BD6.5040406@cumulusnetworks.com>
In-Reply-To: <20161101170345.pq2ewecw35mrurkp@splinter>
On 11/1/16, 10:03 AM, Ido Schimmel wrote:
> Hi Roopa,
>
> On Tue, Nov 01, 2016 at 08:14:14AM -0700, Roopa Prabhu wrote:
>>
[snip]
>> I have the same concern as Eric here.
>>
>> I understand why you need it, but can the driver request an initial dump, and can that
>> dump be made more efficient somehow, i.e. not hold rtnl for the whole dump,
>> instead of making the fib notifier registration code do it?
> We can do what we suggested in the last bi-weekly meeting, which is
> still holding rtnl, but moving the hardware operation to delayed work.
> This is possible because upper layers always assume operation was
> successful and driver is responsible for invoking its abort mechanism in
> case of failure.
>
>> These routing table sizes can be huge. An analogy from user space:
>> we do request a netlink dump of routing tables at initialization (on driver start or reset),
>> but existing netlink routing table dumps at that scale don't hold rtnl for the whole dump.
>> The dump is split into multiple responses to the user, and hence it does not starve other rtnl users.
> In my reply to Eric I mentioned that when we register and unregister
> from this chain the tables aren't really huge, but instead quite small.
> I understand your concerns, but I don't wish to make things more
> complicated than they should be only to address concerns that aren't
> really realistic.
I understand, but if you are adding core infrastructure for switchdev, it cannot be
designed around only the simple use cases or data you have today.
I won't be surprised if tomorrow other switch drivers need to reset the hw routing
table state and reprogram all routes. Re-registering the notifier just to get the
kernel's routing state will not scale. For the long term, since the driver does not
maintain a cache, a pull API with efficient use of rtnl will be useful for other such
cases as well.
If you don't want to take on the complexity of a new API right away because of the
simple case of management interface routes you have, can your driver register the
notifier early? (I am sure you have probably already thought about this.)
>
> I believe current patch is quite simple and also consistent with other
> notification chains in the kernel, such as the netdevice, where rtnl is
> held during replay of events.
> http://lxr.free-electrons.com/source/net/core/dev.c#L1535
As you know, netdev and routing scale are not the same thing.
Looking at the current code for netdevices (replay and rollback on failure),
a pull API (equivalent to the netlink dump API) may end up being less complex, with the
ability to batch in the future.
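
To illustrate the pull-style dump discussed above, here is a minimal user-space sketch in Python. It is not kernel code and not a proposed switchdev API: RouteTable, dump_batch and pull_batches are hypothetical names, and a plain threading.Lock stands in for rtnl. The only point it shows is that the lock is taken per batch and released between batches, with a key-based cursor used to resume, so other lock users are not starved during a large dump.

```python
import threading
from bisect import bisect_right, insort


class RouteTable:
    """Toy stand-in for a routing table guarded by one global lock."""

    def __init__(self, prefixes):
        self.lock = threading.Lock()       # plays the role of rtnl
        self.prefixes = sorted(prefixes)   # kept sorted so a cursor can resume

    def add(self, prefix):
        """A competing writer; it can run between two dump batches."""
        with self.lock:
            insort(self.prefixes, prefix)

    def dump_batch(self, cursor, batch_size):
        """Return up to batch_size prefixes strictly after cursor.

        The lock is held only while this one batch is copied out.
        """
        with self.lock:
            start = 0 if cursor is None else bisect_right(self.prefixes, cursor)
            return self.prefixes[start:start + batch_size]


def pull_batches(table, batch_size=2):
    """Yield the table in batches, dropping the lock between batches."""
    cursor = None
    while True:
        batch = table.dump_batch(cursor, batch_size)
        if not batch:
            return
        yield batch              # lock is not held here
        cursor = batch[-1]       # resume after the last prefix seen


table = RouteTable(["10.0.2.0/24", "10.0.0.0/24", "10.0.1.0/24"])
for batch in pull_batches(table, batch_size=2):
    print(batch)
```

In this sketch, entries added or removed behind the cursor are not revisited, so a real design would still need some way to detect or reconcile changes made while the dump is in progress.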
Thread overview: 17+ messages
2016-10-31 21:13 [PATCH net-next v2] ipv4: fib: Replay events when registering FIB notifier idosch
2016-10-31 21:24 ` Eric Dumazet
2016-10-31 22:57 ` Ido Schimmel
2016-11-01 14:19 ` Eric Dumazet
2016-11-01 15:14 ` Roopa Prabhu
2016-11-01 15:36 ` David Miller
2016-11-02 7:35 ` Jiri Pirko
2016-11-02 15:26 ` David Miller
2016-11-01 17:03 ` Ido Schimmel
2016-11-02 2:13 ` Roopa Prabhu [this message]
2016-11-02 7:20 ` Jiri Pirko
2016-11-02 13:29 ` Roopa Prabhu
2016-11-02 13:44 ` Ido Schimmel
2016-11-02 13:48 ` Jiri Pirko
2016-11-02 14:35 ` Roopa Prabhu
2016-11-02 14:43 ` Jiri Pirko
2016-11-01 15:44 ` Ido Schimmel