From: Nikolay Aleksandrov <razor@blackwall.org>
To: Cosmin Ratiu <cratiu@nvidia.com>,
"liuhangbin@gmail.com" <liuhangbin@gmail.com>
Cc: "andrew+netdev@lunn.ch" <andrew+netdev@lunn.ch>,
"jarod@redhat.com" <jarod@redhat.com>,
"davem@davemloft.net" <davem@davemloft.net>,
Tariq Toukan <tariqt@nvidia.com>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"shuah@kernel.org" <shuah@kernel.org>,
"steffen.klassert@secunet.com" <steffen.klassert@secunet.com>,
"jv@jvosburgh.net" <jv@jvosburgh.net>,
"kuba@kernel.org" <kuba@kernel.org>,
"horms@kernel.org" <horms@kernel.org>,
"edumazet@google.com" <edumazet@google.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Jianbo Liu <jianbol@nvidia.com>,
"pabeni@redhat.com" <pabeni@redhat.com>
Subject: Re: [PATCHv3 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
Date: Fri, 28 Feb 2025 13:07:19 +0200 [thread overview]
Message-ID: <ab9e32b2-2574-48d4-bc13-8e752a194c43@blackwall.org> (raw)
In-Reply-To: <76ed1d018596b81548d095aa2d4a9b31b360479c.camel@nvidia.com>
On 2/28/25 12:31, Cosmin Ratiu wrote:
> On Fri, 2025-02-28 at 02:20 +0000, Hangbin Liu wrote:
>> On Thu, Feb 27, 2025 at 03:31:01PM +0200, Nikolay Aleksandrov wrote:
>>>>> One more thing - note I'm not an xfrm expert by far but it
>>>>> seems to me here you have
>>>>> to also call xdo_dev_state_free() with the old active slave
>>>>> dev otherwise that will
>>>>> never get called with the original real_dev after the switch to
>>>>> a new
>>>>> active slave (or more accurately it might if the GC runs
>>>>> between the switching
>>>>> but it is a race), care must be taken wrt sequence of events
>>>>> because the XFRM
>>>>
>>>> Can we just call xs->xso.real_dev->xfrmdev_ops-
>>>>> xdo_dev_state_free(xs)
>>>> no matter xs->xso.real_dev == real_dev or not? I'm afraid calling
>>>> xdo_dev_state_free() every where may make us lot more easily.
>>>>
>>>
>>> You'd have to check all drivers that implement the callback to
>>> answer that and even then
>>> I'd stick to the canonical way of how it's done in xfrm and make
>>> the bond just passthrough.
>>> Any other games become dangerous and new code will have to be
>>> carefully reviewed every
>>> time, calling another device's free_sa when it wasn't added before
>>> doesn't sound good.
>>>
>>>>> GC may be running in parallel which probably means that in
>>>>> bond_ipsec_free_sa()
>>>>> you'll have to take the mutex before calling
>>>>> xdo_dev_state_free() and check
>>>>> if the entry is still linked in the bond's ipsec list before
>>>>> calling the free_sa
>>>>> callback, if it isn't then del_sa_all got to it before the GC
>>>>> and there's nothing
>>>>> to do if it also called the dev's free_sa callback. The check
>>>>> for real_dev doesn't
>>>>> seem enough to protect against this race.
>>>>
>>>> I agree that we need to take the mutex before calling
>>>> xdo_dev_state_free()
>>>> in bond_ipsec_free_sa(). Do you think if this is enough? I'm a
>>>> bit lot here.
>>>>
>>>> Thanks
>>>> Hangbin
>>>
>>> Well, the race is between the xfrm GC and del_sa_all, in bond's
>>> free_sa if you
>>> walk the list under the mutex before calling real_dev's free
>>> callback and
>>> don't find the current element that's being freed in free_sa then
>>> it was
>>> cleaned up by del_sa_all, otherwise del_sa_all is waiting to walk
>>> that
>>> list and clean the entries. I think it should be fine as long as
>>> free_sa
>>> was called once with the proper device.
>>
>> OK, so the free will be called either in del_sa_all() or free_sa().
>> Something like this?
>>
> [...]
>
> Unfortunately, after applying these changes and reasoning about them
> for a bit, I don't think this will work. There are still races left.
> For example:
> 1. An xs is marked DEAD (in __xfrm_state_delete, with x->lock held) and
> before .xdo_dev_state_delete() is called on it, bond_ipsec_del_sa_all
> is called in parallel, doesn't call delete on xs (because it's dead),
> then calls free (incorrect without delete first), then removes the list
> entry. Later, xdo_dev_state_delete( == bond_ipsec_del_sa) is called,
> and calls delete (incorrect, out of order with free). Finally,
> bond_ipsec_free_sa is called, which fortunately doesn't do anything
> silly in the new proposed form because xs is no longer in the list.
>
> 2. A more sinister form of the above race can happen when
> bond_ipsec_del_sa_all() calls delete on real_dev, then in parallel and
> immediately after __xfrm_state_delete marks xs as DEAD and calls
> bond_ipsec_del_sa() which happily calls delete on real_dev again.
>
> In order to fix these races (and others like it), I think
> bond_ipsec_del_sa_all and bond_ipsec_add_sa_all *need* to acquire x-
>> lock for each xs being processed. This would prevent xfrm from
> concurrently initiating add/delete operations on the managed states.
>
> Cosmin.
Duh, right you are. The state is protected by x->lock and cannot be trusted
outside of it. If you take x->lock inside the list walk with the mutex held
you can deadlock.
Cheers,
Nik
next prev parent reply other threads:[~2025-02-28 11:07 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-27 8:37 [PATCHv3 net 0/3] bond: fix xfrm offload issues Hangbin Liu
2025-02-27 8:37 ` [PATCHv3 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa Hangbin Liu
2025-02-27 8:50 ` Nikolay Aleksandrov
2025-02-27 9:21 ` Nikolay Aleksandrov
2025-02-27 13:21 ` Hangbin Liu
2025-02-27 13:31 ` Nikolay Aleksandrov
2025-02-28 2:20 ` Hangbin Liu
2025-02-28 10:31 ` Cosmin Ratiu
2025-02-28 11:07 ` Nikolay Aleksandrov [this message]
2025-02-28 11:10 ` Nikolay Aleksandrov
2025-02-28 12:59 ` Hangbin Liu
2025-03-04 9:18 ` Hangbin Liu
2025-03-04 10:25 ` Cosmin Ratiu
2025-02-27 8:37 ` [PATCHv3 net 2/3] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
2025-02-27 8:37 ` [PATCHv3 net 3/3] selftests: bonding: add ipsec offload test Hangbin Liu
2025-02-27 13:59 ` Petr Machata
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ab9e32b2-2574-48d4-bc13-8e752a194c43@blackwall.org \
--to=razor@blackwall.org \
--cc=andrew+netdev@lunn.ch \
--cc=cratiu@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jarod@redhat.com \
--cc=jianbol@nvidia.com \
--cc=jv@jvosburgh.net \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=liuhangbin@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shuah@kernel.org \
--cc=steffen.klassert@secunet.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox