From: Jiri Pirko <jiri@resnulli.us>
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, idosch@mellanox.com, eladr@mellanox.com,
yotamg@mellanox.com, nogahf@mellanox.com, arkadis@mellanox.com,
ogerlitz@mellanox.com, roopa@cumulusnetworks.com,
dsa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
andy@greyhouse.net, vivien.didelot@savoirfairelinux.com,
andrew@lunn.ch, f.fainelli@gmail.com,
alexander.h.duyck@intel.com, hannes@stressinduktion.org,
kaber@trash.net
Subject: [patch net-next v3 04/12] mlxsw: spectrum_router: Implement FIB offload in deferred work
Date: Wed, 30 Nov 2016 11:08:58 +0100
Message-ID: <1480500546-2544-5-git-send-email-jiri@resnulli.us>
In-Reply-To: <1480500546-2544-1-git-send-email-jiri@resnulli.us>
From: Ido Schimmel <idosch@mellanox.com>

FIB offload is currently done in process context with RTNL held, but
we're about to dump the FIB tables in an RCU read-side critical section,
so we can no longer sleep.

Instead, defer the operation to process context using deferred work. Make
sure the fib_info isn't freed while the work is queued by taking a
reference on it and releasing it after the operation is done.

Deferring the operation is valid because the upper layers always assume
the operation was successful. If it's not, then the driver-specific
abort mechanism is called and all routed traffic is directed to the slow
path.

The work items are submitted to an ordered workqueue, which executes
them one at a time in queueing order, to prevent a mismatch between the
kernel's FIB table and the device's.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum_router.c | 72 +++++++++++++++++++---
1 file changed, 62 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 683f045..14bed1d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -593,6 +593,14 @@ static void mlxsw_sp_router_fib_flush(struct mlxsw_sp *mlxsw_sp);
static void mlxsw_sp_vrs_fini(struct mlxsw_sp *mlxsw_sp)
{
+ /* At this stage we're guaranteed not to have new incoming
+ * FIB notifications and the work queue is free from FIBs
+ * sitting on top of mlxsw netdevs. However, we can still
+ * have other FIBs queued. Flush the queue before flushing
+ * the device's tables. No need for locks, as we're the only
+ * writer.
+ */
+ mlxsw_core_flush_owq();
mlxsw_sp_router_fib_flush(mlxsw_sp);
kfree(mlxsw_sp->router.vrs);
}
@@ -1948,30 +1956,74 @@ static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
kfree(mlxsw_sp->rifs);
}
-static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
- unsigned long event, void *ptr)
+struct mlxsw_sp_fib_event_work {
+ struct delayed_work dw;
+ struct fib_entry_notifier_info fen_info;
+ struct mlxsw_sp *mlxsw_sp;
+ unsigned long event;
+};
+
+static void mlxsw_sp_router_fib_event_work(struct work_struct *work)
{
- struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
- struct fib_entry_notifier_info *fen_info = ptr;
+ struct mlxsw_sp_fib_event_work *fib_work =
+ container_of(work, struct mlxsw_sp_fib_event_work, dw.work);
+ struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
int err;
- if (!net_eq(fen_info->info.net, &init_net))
- return NOTIFY_DONE;
-
- switch (event) {
+ /* Protect internal structures from changes */
+ rtnl_lock();
+ switch (fib_work->event) {
case FIB_EVENT_ENTRY_ADD:
- err = mlxsw_sp_router_fib4_add(mlxsw_sp, fen_info);
+ err = mlxsw_sp_router_fib4_add(mlxsw_sp, &fib_work->fen_info);
if (err)
mlxsw_sp_router_fib4_abort(mlxsw_sp);
+ fib_info_put(fib_work->fen_info.fi);
break;
case FIB_EVENT_ENTRY_DEL:
- mlxsw_sp_router_fib4_del(mlxsw_sp, fen_info);
+ mlxsw_sp_router_fib4_del(mlxsw_sp, &fib_work->fen_info);
+ fib_info_put(fib_work->fen_info.fi);
break;
case FIB_EVENT_RULE_ADD: /* fall through */
case FIB_EVENT_RULE_DEL:
mlxsw_sp_router_fib4_abort(mlxsw_sp);
break;
}
+ rtnl_unlock();
+ kfree(fib_work);
+}
+
+/* Called with rcu_read_lock() */
+static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
+ unsigned long event, void *ptr)
+{
+ struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
+ struct mlxsw_sp_fib_event_work *fib_work;
+ struct fib_notifier_info *info = ptr;
+
+ if (!net_eq(info->net, &init_net))
+ return NOTIFY_DONE;
+
+ fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
+ if (WARN_ON(!fib_work))
+ return NOTIFY_BAD;
+
+ INIT_DELAYED_WORK(&fib_work->dw, mlxsw_sp_router_fib_event_work);
+ fib_work->mlxsw_sp = mlxsw_sp;
+ fib_work->event = event;
+
+ switch (event) {
+ case FIB_EVENT_ENTRY_ADD: /* fall through */
+ case FIB_EVENT_ENTRY_DEL:
+ memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
+ /* Take reference on fib_info to prevent it from being
+ * freed while work is queued. Release it afterwards.
+ */
+ fib_info_hold(fib_work->fen_info.fi);
+ break;
+ }
+
+ mlxsw_core_schedule_odw(&fib_work->dw, 0);
+
return NOTIFY_DONE;
}
--
2.7.4
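
For readers who want to experiment with the pattern outside the kernel, the
following is a minimal userspace sketch of the scheme described in the commit
message: the notifier copies the event, takes a reference, and queues a work
item; the worker later processes the items in order and drops the reference.
Every name here (toy_fib_info, toy_owq, toy_notify, toy_flush, toy_demo) is
hypothetical and is not part of mlxsw or the kernel. A singly linked FIFO
stands in for the ordered workqueue, and a flag stands in for actually
freeing the object, so this only models the ordering and reference counting,
not RTNL, RCU or GFP_ATOMIC allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fib_info: only the refcount matters. */
struct toy_fib_info {
	int refcnt;
	int freed;	/* set instead of calling free(), so callers can inspect it */
};

static void toy_fib_info_hold(struct toy_fib_info *fi)
{
	fi->refcnt++;
}

static void toy_fib_info_put(struct toy_fib_info *fi)
{
	if (--fi->refcnt == 0)
		fi->freed = 1;
}

enum toy_event { TOY_ENTRY_ADD, TOY_ENTRY_DEL };

/* One queued event, analogous in spirit to mlxsw_sp_fib_event_work. */
struct toy_work {
	struct toy_work *next;
	enum toy_event event;
	struct toy_fib_info *fi;
	int prefix;
};

/* A FIFO list models an ordered workqueue: one item at a time, in order. */
struct toy_owq {
	struct toy_work *head;
	struct toy_work **tail;
};

static void toy_owq_init(struct toy_owq *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* "Notifier" side: must not block, so it only copies, pins and queues. */
static int toy_notify(struct toy_owq *q, enum toy_event event,
		      struct toy_fib_info *fi, int prefix)
{
	struct toy_work *w = calloc(1, sizeof(*w));

	if (!w)
		return -1;
	w->event = event;
	w->fi = fi;
	w->prefix = prefix;
	toy_fib_info_hold(fi);	/* keep fi alive while the work is queued */
	*q->tail = w;
	q->tail = &w->next;
	return 0;
}

/* "Worker" side: handle items in queueing order, then release the pin.
 * Records +prefix for an add and -prefix for a delete in log[].
 */
static int toy_flush(struct toy_owq *q, int *log, int max)
{
	int n = 0;

	while (q->head) {
		struct toy_work *w = q->head;

		q->head = w->next;
		if (n < max)
			log[n] = w->event == TOY_ENTRY_ADD ? w->prefix : -w->prefix;
		n++;
		toy_fib_info_put(w->fi);
		free(w);
	}
	q->tail = &q->head;
	return n;
}

/* Returns 1 if the add/del pair ran in order and fi outlived the queue. */
static int toy_demo(void)
{
	struct toy_fib_info fi = { .refcnt = 1, .freed = 0 };
	struct toy_owq q;
	int log[4] = { 0 };
	int n;

	toy_owq_init(&q);
	if (toy_notify(&q, TOY_ENTRY_ADD, &fi, 10) ||
	    toy_notify(&q, TOY_ENTRY_DEL, &fi, 10))
		return 0;

	toy_fib_info_put(&fi);	/* original owner drops its reference early */
	assert(!fi.freed);	/* still pinned by the two queued work items */

	n = toy_flush(&q, log, 4);
	return n == 2 && log[0] == 10 && log[1] == -10 && fi.freed;
}
```

Calling toy_demo() from a main() of your own exercises the case the patch
guards against: the owner of the object drops its reference while work is
still pending, and the object only becomes free after the last queued item
has been processed.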