From: Remi Pommarel <repk@triplefau.lt>
To: Andrew Strohman <andrew@andrewstrohman.com>
Cc: Sven Eckelmann <sven@narfation.org>, b.a.t.m.a.n@lists.open-mesh.org
Subject: Re: [PATCH] batman-adv: fix panic during interface removal
Date: Fri, 10 Jan 2025 14:10:21 +0100 [thread overview]
Message-ID: <Z4EcPQOMU1BUtO07@pilgrim> (raw)
In-Reply-To: <CAA8ajJnVQWO3fhLAjQtEfjEVReY7x-==TEkhrKQKZMrVJms44w@mail.gmail.com>
Hi Andrew,
On Fri, Jan 10, 2025 at 01:02:19AM -0800, Andrew Strohman wrote:
> > I would prefer when you would call cancel_work_sync when metric stuff should
> > be stopped. I was expecting to see this somewhere around
> > batadv_v_elp_iface_disable after the cancel_work_sync but it seems like it is
> > missing there (or in a similar place)
> >
>
> I tried this:
>
> diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
> index 1d704574..b35ded79 100644
> --- a/net/batman-adv/bat_v_elp.c
> +++ b/net/batman-adv/bat_v_elp.c
> @@ -387,8 +387,20 @@ int batadv_v_elp_iface_enable(struct
> batadv_hard_iface *hard_iface)
> */
> void batadv_v_elp_iface_disable(struct batadv_hard_iface *hard_iface)
> {
> + struct batadv_hardif_neigh_node *hardif_neigh;
> +
> cancel_delayed_work_sync(&hard_iface->bat_v.elp_wq);
>
> + rcu_read_lock();
> + hlist_for_each_entry_rcu(hardif_neigh,
> + &hard_iface->neigh_list, list) {
> + if (!kref_get_unless_zero(&hardif_neigh->refcount))
> + continue;
> + cancel_work_sync(&hardif_neigh->bat_v.metric_work);
> + batadv_hardif_neigh_put(hardif_neigh);
> + }
> + rcu_read_unlock();
> +
> dev_kfree_skb(hard_iface->bat_v.elp_skb);
> hard_iface->bat_v.elp_skb = NULL;
> }
>
> But it seems to cause a hang on reboot every once in a while. When the hang
> happens, I'm not able to trigger sysrq over serial.
Quickly looking at that I think that metric_work may need to sleep so
calling cancel_work_sync() on this work does not seem safe while in rcu
protected context.
I would try to frame the cancel_work_sync() call with
rcu_read_unlock()/rcu_read_lock() as below:
rcu_read_unlock();
cancel_work_sync(...);
rcu_read_lock();
batadv_hardif_neigh_put(...);
But be careful as batadv_hardif_neigh_put() could modify the list you
are currently traversing. At first glance it seems safe to realease
rcu read constraint to call cancel_work_sync() as long as you take it
back before calling batadv_hardif_neigh_put(), but this could need more
though on that.
--
Remi
next prev parent reply other threads:[~2025-01-10 13:13 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-09 2:27 [PATCH] batman-adv: fix panic during interface removal Andy Strohman
2025-01-09 7:46 ` Sven Eckelmann
2025-01-09 10:10 ` Andrew Strohman
2025-01-09 10:23 ` Sven Eckelmann
2025-01-10 9:02 ` Andrew Strohman
2025-01-10 13:10 ` Remi Pommarel [this message]
2025-01-13 7:35 ` Andrew Strohman
2025-01-19 22:28 ` Andrew Strohman
2025-01-19 23:03 ` Sven Eckelmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z4EcPQOMU1BUtO07@pilgrim \
--to=repk@triplefau.lt \
--cc=andrew@andrewstrohman.com \
--cc=b.a.t.m.a.n@lists.open-mesh.org \
--cc=sven@narfation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox