From: Jiri Pirko <jiri@resnulli.us>
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, idosch@mellanox.com, eladr@mellanox.com,
yotamg@mellanox.com, nogahf@mellanox.com, ogerlitz@mellanox.com,
roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
linville@tuxdriver.com, tgraf@suug.ch, gospo@cumulusnetworks.com,
sfeldma@gmail.com, ast@plumgrid.com, edumazet@google.com,
hannes@stressinduktion.org, f.fainelli@gmail.com,
dsa@cumulusnetworks.com, jhs@mojatatu.com,
vivien.didelot@savoirfairelinux.com, john.fastabend@intel.com,
andrew@lunn.ch, ivecera@redhat.com
Subject: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
Date: Tue, 6 Sep 2016 14:01:38 +0200
Message-ID: <1473163300-2045-1-git-send-email-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
This is an unfinished RFC. I came across some issues in the process, so I
would like to share them and restart the fib offload discussion in order to
make fib offload really usable.
The goal of this patchset is to allow a driver to propagate all prefixes
configured in the kernel down to HW. This is necessary for routing to work
as expected. If we don't do that, HW might incorrectly forward packets
destined to prefixes known to the kernel. Take for example the case where
a default route is set in switch HW and there is an IP address set on a
management (non-switch) port: packets arriving on switch ports for that
address would follow the default route in HW instead of reaching the kernel.
Currently, only fibs related to the switch port netdev are offloaded using
switchdev ops. This model is not extensible, so the first patch introduces
a replacement: a notifier that propagates fib additions and removals to
whoever is interested. The second patch makes mlxsw adopt this new way,
registering one notifier block for each mlxsw (asic) instance.
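
To illustrate the notifier model described above, here is a minimal userspace
sketch in C. It is not the code from the patches: fib_listener,
fib_notify_all(), struct asic and the event names are hypothetical stand-ins,
and the kernel side would use its own notifier infrastructure. The only point
is that every registered instance sees every prefix, whatever netdev the
route points at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types; the names in the actual patch may differ. */
enum fib_event { FIB_EVENT_ADD, FIB_EVENT_DEL };

/* Hypothetical description of one IPv4 prefix. */
struct fib_entry_info {
	uint32_t dst;     /* network address, host byte order */
	uint8_t dst_len;  /* prefix length, 0..32 */
};

struct fib_listener {
	/* Called for every prefix change, regardless of the nexthop device. */
	void (*notify)(struct fib_listener *self, enum fib_event ev,
		       const struct fib_entry_info *info);
	struct fib_listener *next;
};

/* Head of a simple singly linked list of registered listeners. */
static struct fib_listener *fib_listeners;

void fib_listener_register(struct fib_listener *l)
{
	assert(l != NULL && l->notify != NULL);
	l->next = fib_listeners;
	fib_listeners = l;
}

/* Deliver one event to every registered listener. */
void fib_notify_all(enum fib_event ev, const struct fib_entry_info *info)
{
	for (struct fib_listener *l = fib_listeners; l != NULL; l = l->next)
		l->notify(l, ev, info);
}

/* One listener per ASIC instance; the listener must stay the first member. */
struct asic {
	struct fib_listener listener;
	int nr_prefixes;  /* prefixes this instance currently knows about */
};

void asic_fib_notify(struct fib_listener *self, enum fib_event ev,
		     const struct fib_entry_info *info)
{
	struct asic *a = (struct asic *)self;  /* valid: first member */

	(void)info;  /* a real driver would program the prefix into HW here */
	a->nr_prefixes += (ev == FIB_EVENT_ADD) ? 1 : -1;
}

void asic_init(struct asic *a)
{
	a->nr_prefixes = 0;
	a->listener.notify = asic_fib_notify;
	a->listener.next = NULL;
	fib_listener_register(&a->listener);
}
```

With two instances set up through asic_init(), a single
fib_notify_all(FIB_EVENT_ADD, ...) call reaches both of them.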
With switchdev ops, "abort" is called by the switchdev core whenever there is
an error during fib add offload. This leads to the removal of all offloaded
fibs on the system by the fib_trie code.
Now the new notifier assumes the driver takes care of the abort action.
Here's why:
1) The fact that one HW cannot offload a fib does not mean that the others
can't do it. So let only that one entity abort and leave the rest to work
happily.
2) The driver knows what to do in order to properly abort. For example,
currently abort is broken for mlxsw, as for Spectrum there is a need to set
a 0.0.0.0/0 trap in the RALUE register.
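
The two points above can be sketched in userspace C as follows. This is an
illustration under assumptions, not driver code: struct offload_dev, its
capacity field and the trap_all_to_cpu flag are hypothetical, with the flag
standing in for whatever device-specific work an abort needs (for Spectrum,
the 0.0.0.0/0 trap mentioned above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device offload state. */
struct offload_dev {
	unsigned int capacity;  /* assumed size of the HW route table */
	unsigned int used;      /* routes currently offloaded */
	bool aborted;           /* this device fell back to software */
	bool trap_all_to_cpu;   /* stand-in for the device-specific abort work */
};

/* Abort is local to one device: flush its routes, trap everything. */
void dev_abort(struct offload_dev *d)
{
	d->used = 0;
	d->trap_all_to_cpu = true;
	d->aborted = true;
}

/* Called for every fib addition; a failure affects only this device. */
void dev_fib_add(struct offload_dev *d)
{
	assert(d != NULL);
	if (d->aborted)
		return;  /* nothing to offload any more */
	if (d->used == d->capacity) {
		dev_abort(d);
		return;
	}
	d->used++;
}
```

Feeding the same routes to a small and a large device makes only the small
one abort, while the large one keeps its offloaded entries.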
Issues:
1) RTNH_F_OFFLOAD is originally set in the switchdev core. There the
assumption is that only one offload device exists. But for the fib notifier,
we assume multiple offload devices. When should the offload flag be set,
and by whom?
I think that it would make sense to have a per-fib reference counter
for this:
0 means RTNH_F_OFFLOAD is not set, no device offloads this entry
n means RTNH_F_OFFLOAD is set and the fib entry is offloaded by n devices
2) Unabort? Would be nice. Currently, when add_failure->abort happens, the
user's only option is to reboot the machine. I would like to make this
nicer for the fib notifier implementation. Perhaps provide some button in
devlink which would tell the driver to try to offload entries again? Not sure.
3) Policies. Not directly connected to this patchset, but this is an issue
we have discussed a couple of times and I still believe that
the current state is not good.
Software-only forwarding now happens in case of abort and makes the ASIC
ports act like dummy separate NICs. In case of Spectrum, the bandwidth
of the CPU port is somewhere around 4Gbit. For 32x100Gbit ports this is
simply not possible to handle. In case of abort, the system is broken,
as it cannot forward packets at a speed even close to the expected one.
This is where policies come into the picture, allowing the user to set the
system to behave according to his expectations. For example, rather
fail to add the route than abort to software forwarding.
This policy could be per-ASIC, configurable by devlink.
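
For issue 1), the per-fib reference counter could look roughly like the
sketch below. It is a userspace illustration only: struct fib_entry, the
offload_cnt field and both helpers are hypothetical, and RTNH_F_OFFLOAD is
defined locally as a stand-in for the uapi definition.

```c
#include <assert.h>

/* Local stand-in; the real flag lives in the rtnetlink uapi header. */
#define RTNH_F_OFFLOAD 8

/* Hypothetical fib entry carrying the proposed counter. */
struct fib_entry {
	unsigned int flags;
	unsigned int offload_cnt;  /* number of devices offloading this entry */
};

/* A device reports that it now offloads the entry. */
void fib_entry_offload_inc(struct fib_entry *e)
{
	if (e->offload_cnt++ == 0)
		e->flags |= RTNH_F_OFFLOAD;  /* first offloader sets the flag */
}

/* A device reports that it no longer offloads the entry. */
void fib_entry_offload_dec(struct fib_entry *e)
{
	assert(e->offload_cnt > 0);
	if (--e->offload_cnt == 0)
		e->flags &= ~RTNH_F_OFFLOAD;  /* last offloader clears it */
}
```

The flag then stays set as long as at least one device still offloads the
entry, which matches the 0 / n semantics described above.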
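
For issue 3), a per-ASIC policy might be sketched like this. Everything here
is an assumption made for illustration: the policy names, struct asic and
the free_slots capacity model do not come from the patches, and no devlink
interface for this exists in the patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-ASIC policy for a failed route offload. */
enum offload_fail_policy {
	OFFLOAD_FAIL_ABORT_TO_SW,   /* current behaviour: software forwarding */
	OFFLOAD_FAIL_REJECT_ROUTE,  /* refuse the route, keep HW forwarding */
};

struct asic {
	enum offload_fail_policy policy;
	unsigned int free_slots;  /* assumed remaining HW route capacity */
	bool aborted;
};

/*
 * Returns 0 if the route may be installed, or a negative errno if the
 * policy says the addition must fail.
 */
int asic_route_add(struct asic *a)
{
	assert(a != NULL);
	if (a->aborted)
		return 0;  /* already forwarding in software */
	if (a->free_slots > 0) {
		a->free_slots--;
		return 0;  /* offloaded */
	}
	if (a->policy == OFFLOAD_FAIL_REJECT_ROUTE)
		return -ENOSPC;  /* the user sees the failure instead */
	a->aborted = true;  /* fall back to software forwarding */
	return 0;
}
```

With OFFLOAD_FAIL_REJECT_ROUTE the user gets an error on the route that does
not fit, and the routes already in HW keep being forwarded at line rate.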
Thoughts please?
Jiri Pirko (2):
fib: introduce fib notification infrastructure
mlxsw: spectrum_router: Use FIB notifications instead of switchdev
calls
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 8 +-
.../net/ethernet/mellanox/mlxsw/spectrum_router.c | 257 ++++++++++-----------
.../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 9 -
include/net/ip_fib.h | 19 ++
net/ipv4/fib_trie.c | 43 ++++
5 files changed, 181 insertions(+), 155 deletions(-)
--
2.5.5