* [PATCH net] nfp: fix rcu_read_lock/unlock while rcu_derefrencing
@ 2023-05-09 6:06 Louis Peens
2023-05-09 8:08 ` Leon Romanovsky
2023-05-10 2:40 ` Jakub Kicinski
0 siblings, 2 replies; 4+ messages in thread
From: Louis Peens @ 2023-05-09 6:06 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, stable, oss-drivers
From: Tianyu Yuan <tianyu.yuan@corigine.com>
When CONFIG_PROVE_LOCKING and CONFIG_PROVE_RCU are enabled, using
OVS with vf reprs on bridge will lead to following log in dmesg:
'''
.../nfp/flower/main.c:269 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/15/0.
......
Call Trace:
<IRQ>
dump_stack_lvl+0x8c/0xa0
lockdep_rcu_suspicious+0x118/0x1a0
nfp_flower_dev_get+0xc1/0x240 [nfp]
nfp_nfd3_rx+0x419/0xb90 [nfp]
? validate_chain+0x640/0x1880
nfp_nfd3_poll+0x3e/0x180 [nfp]
__napi_poll+0x28/0x1d0
net_rx_action+0x2bd/0x3c0
? _raw_spin_unlock_irqrestore+0x42/0x70
__do_softirq+0xc3/0x3c6
irq_exit_rcu+0xeb/0x130
common_interrupt+0xb9/0xd0
</IRQ>
<TASK>
......
</TASK>
'''
This debug log is caused by missing of rcu_read_lock()/unlock().
In previous patch rcu_read_lock/unlock are removed while they are
still needed when calling rcu_dereference() in nfp_app_dev_get().
Fixes: d5789621b658 ("nfp: Remove rcu_read_lock() around XDP program invocation")
CC: stable@vger.kernel.org
Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
---
drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 4 ++--
drivers/net/ethernet/netronome/nfp/nfd3/xsk.c | 2 +-
drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 4 ++--
drivers/net/ethernet/netronome/nfp/nfp_app.h | 16 ++++++++++++++++
4 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c
index 0cc026b0aefd..3e5a84370d8a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c
@@ -1058,8 +1058,8 @@ static int nfp_nfd3_rx(struct nfp_net_rx_ring *rx_ring, int budget)
struct nfp_net *nn;
nn = netdev_priv(dp->netdev);
- netdev = nfp_app_dev_get(nn->app, meta.portid,
- &redir_egress);
+ netdev = nfp_app_dev_get_locked(nn->app, meta.portid,
+ &redir_egress);
if (unlikely(!netdev)) {
nfp_nfd3_rx_drop(dp, r_vec, rx_ring, rxbuf,
NULL);
diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c
index 5d9db8c2a5b4..6ec5b0d430ea 100644
--- a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c
+++ b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c
@@ -71,7 +71,7 @@ static void nfp_nfd3_xsk_rx_skb(struct nfp_net_rx_ring *rx_ring,
} else {
struct nfp_net *nn = netdev_priv(dp->netdev);
- netdev = nfp_app_dev_get(nn->app, meta->portid, NULL);
+ netdev = nfp_app_dev_get_locked(nn->app, meta->portid, NULL);
if (unlikely(!netdev)) {
nfp_net_xsk_rx_drop(r_vec, xrxbuf);
return;
diff --git a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c
index 33b6d74adb4b..0ac68985d289 100644
--- a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c
@@ -1177,8 +1177,8 @@ static int nfp_nfdk_rx(struct nfp_net_rx_ring *rx_ring, int budget)
struct nfp_net *nn;
nn = netdev_priv(dp->netdev);
- netdev = nfp_app_dev_get(nn->app, meta.portid,
- &redir_egress);
+ netdev = nfp_app_dev_get_locked(nn->app, meta.portid,
+ &redir_egress);
if (unlikely(!netdev)) {
nfp_nfdk_rx_drop(dp, r_vec, rx_ring, rxbuf,
NULL);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index 90707346a4ef..639bb1a6349b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -417,6 +417,22 @@ struct net_device *nfp_app_dev_get(struct nfp_app *app, u32 id,
return app->type->dev_get(app, id, redir_egress);
}
+static inline
+struct net_device *nfp_app_dev_get_locked(struct nfp_app *app, u32 id,
+ bool *redir_egress)
+{
+ struct net_device *dev;
+
+ if (unlikely(!app || !app->type->dev_get))
+ return NULL;
+
+ rcu_read_lock();
+ dev = app->type->dev_get(app, id, redir_egress);
+ rcu_read_unlock();
+
+ return dev;
+}
+
struct nfp_app *nfp_app_from_netdev(struct net_device *netdev);
u64 *nfp_app_port_get_stats(struct nfp_port *port, u64 *data);
--
2.34.1
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH net] nfp: fix rcu_read_lock/unlock while rcu_derefrencing
2023-05-09 6:06 [PATCH net] nfp: fix rcu_read_lock/unlock while rcu_derefrencing Louis Peens
@ 2023-05-09 8:08 ` Leon Romanovsky
2023-05-10 2:40 ` Jakub Kicinski
1 sibling, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2023-05-09 8:08 UTC (permalink / raw)
To: Louis Peens
Cc: David Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
stable, oss-drivers
On Tue, May 09, 2023 at 08:06:32AM +0200, Louis Peens wrote:
> From: Tianyu Yuan <tianyu.yuan@corigine.com>
>
> When CONFIG_PROVE_LOCKING and CONFIG_PROVE_RCU are enabled, using
> OVS with vf reprs on bridge will lead to following log in dmesg:
>
> '''
> .../nfp/flower/main.c:269 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 2, debug_locks = 1
> no locks held by swapper/15/0.
>
> ......
> Call Trace:
> <IRQ>
> dump_stack_lvl+0x8c/0xa0
> lockdep_rcu_suspicious+0x118/0x1a0
> nfp_flower_dev_get+0xc1/0x240 [nfp]
> nfp_nfd3_rx+0x419/0xb90 [nfp]
> ? validate_chain+0x640/0x1880
> nfp_nfd3_poll+0x3e/0x180 [nfp]
> __napi_poll+0x28/0x1d0
> net_rx_action+0x2bd/0x3c0
> ? _raw_spin_unlock_irqrestore+0x42/0x70
> __do_softirq+0xc3/0x3c6
> irq_exit_rcu+0xeb/0x130
> common_interrupt+0xb9/0xd0
> </IRQ>
> <TASK>
> ......
> </TASK>
> '''
>
> This debug log is caused by missing of rcu_read_lock()/unlock().
> In previous patch rcu_read_lock/unlock are removed while they are
> still needed when calling rcu_dereference() in nfp_app_dev_get().
>
> Fixes: d5789621b658 ("nfp: Remove rcu_read_lock() around XDP program invocation")
> CC: stable@vger.kernel.org
> Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com>
> Acked-by: Simon Horman <simon.horman@corigine.com>
> Signed-off-by: Louis Peens <louis.peens@corigine.com>
> ---
> drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 4 ++--
> drivers/net/ethernet/netronome/nfp/nfd3/xsk.c | 2 +-
> drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 4 ++--
> drivers/net/ethernet/netronome/nfp/nfp_app.h | 16 ++++++++++++++++
> 4 files changed, 21 insertions(+), 5 deletions(-)
>
Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH net] nfp: fix rcu_read_lock/unlock while rcu_derefrencing
2023-05-09 6:06 [PATCH net] nfp: fix rcu_read_lock/unlock while rcu_derefrencing Louis Peens
2023-05-09 8:08 ` Leon Romanovsky
@ 2023-05-10 2:40 ` Jakub Kicinski
2023-05-10 6:10 ` Louis Peens
1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2023-05-10 2:40 UTC (permalink / raw)
To: Louis Peens
Cc: David Miller, Paolo Abeni, Simon Horman, netdev, stable,
oss-drivers
On Tue, 9 May 2023 08:06:32 +0200 Louis Peens wrote:
> +static inline
> +struct net_device *nfp_app_dev_get_locked(struct nfp_app *app, u32 id,
_locked() in what way? RCU functions typically use an _rcu suffix, no?
> + bool *redir_egress)
> +{
> + struct net_device *dev;
> +
> + if (unlikely(!app || !app->type->dev_get))
> + return NULL;
> +
> + rcu_read_lock();
> + dev = app->type->dev_get(app, id, redir_egress);
> + rcu_read_unlock();
> +
> + return dev;
this looks very suspicious, RCU takes care primarily of the lifetime of
objects, in this case dev. Returning it after dropping the lock seems
wrong.
If the context is safe maybe it's a better idea to change the
condition in rcu_dereference_check() to include rcu_read_lock_bh_held()?
--
pw-bot: cr
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH net] nfp: fix rcu_read_lock/unlock while rcu_derefrencing
2023-05-10 2:40 ` Jakub Kicinski
@ 2023-05-10 6:10 ` Louis Peens
0 siblings, 0 replies; 4+ messages in thread
From: Louis Peens @ 2023-05-10 6:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Miller, Paolo Abeni, Simon Horman, netdev, stable,
oss-drivers
On Tue, May 09, 2023 at 07:40:13PM -0700, Jakub Kicinski wrote:
> On Tue, 9 May 2023 08:06:32 +0200 Louis Peens wrote:
> > +static inline
> > +struct net_device *nfp_app_dev_get_locked(struct nfp_app *app, u32 id,
>
> _locked() in what way? RCU functions typically use an _rcu suffix, no?
We were discussing the naming during internal review, for some reason didn't
think about using _rcu, will update if there is a v2.
>
> > + bool *redir_egress)
> > +{
> > + struct net_device *dev;
> > +
> > + if (unlikely(!app || !app->type->dev_get))
> > + return NULL;
> > +
> > + rcu_read_lock();
> > + dev = app->type->dev_get(app, id, redir_egress);
> > + rcu_read_unlock();
> > +
> > + return dev;
>
> this looks very suspicious, RCU takes care primarily of the lifetime of
> objects, in this case dev. Returning it after dropping the lock seems
> wrong.
>
> If the context is safe maybe it's a better idea to change the
> condition in rcu_dereference_check() to include rcu_read_lock_bh_held()?
Thanks, will take a closer look at this.
> --
> pw-bot: cr
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-05-10 6:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-09 6:06 [PATCH net] nfp: fix rcu_read_lock/unlock while rcu_derefrencing Louis Peens
2023-05-09 8:08 ` Leon Romanovsky
2023-05-10 2:40 ` Jakub Kicinski
2023-05-10 6:10 ` Louis Peens
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).