From: Matthieu Baerts <matttbe@kernel.org>
To: Geliang Tang <geliang@kernel.org>, mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker
Date: Tue, 25 Feb 2025 19:15:19 +0100 [thread overview]
Message-ID: <cf5fdbbe-856e-419c-812e-44f61fc89778@kernel.org> (raw)
In-Reply-To: <0b1c7e47-78d1-4dbb-b78b-e071f9c08dac@kernel.org>
Hi Geliang,
On 25/02/2025 10:25, Matthieu Baerts wrote:
> Hi Geliang,
>
> Thank you for your review!
>
> On 25/02/2025 05:21, Geliang Tang wrote:
>> Hi Matt,
>>
>> On Mon, 2025-02-24 at 20:01 +0100, Matthieu Baerts (NGI0) wrote:
>>> When the userspace PM is used, there is no need to schedule the PM
>>> worker for in-kernel specific tasks, e.g. creating new subflows, or
>>> sending more ADD_ADDR.
>>>
>>> Now, these tasks will be done only if the in-kernel PM is being used.
>>>
>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>> ---
>>> Notes:
>>> - I don't know if this should be seen as a fix. Maybe yes because it
>>> was not supposed to do that from the beginning, and it feels less
>>> risky not to schedule the worker when it is not needed. I'm open
>>> to
>>> suggestions here.
>>> - I'm mainly sharing this patch now because Geliang is going to
>>> modify
>>> these helpers to separate PM specific code.
>>> @Geliang: please rebase your series on top of this one (or include
>>> this patch if it is easier).
>>> Cc: Geliang Tang <tanggeliang@kylinos.cn>
>>> ---
>>> net/mptcp/pm.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>>> index
>>> 16cacce6c10fe86467aa7ef8e588f9f535b586fb..4eabb83328905676759768fb44c
>>> 859f5682721e3 100644
>>> --- a/net/mptcp/pm.c
>>> +++ b/net/mptcp/pm.c
>>> @@ -138,7 +138,7 @@ void mptcp_pm_fully_established(struct mptcp_sock
>>> *msk, const struct sock *ssk)
>>> * racing paths - accept() and check_fully_established()
>>> * be sure to serve this event only once.
>>> */
>>> - if (READ_ONCE(pm->work_pending) &&
>>> + if (mptcp_pm_is_kernel(msk) && READ_ONCE(pm->work_pending)
>>> &&
>>
>> I think there's no need to modify this. work_pending is set to 0 by
>> userspace pm, for all cases where work_pending is 1, the path manager
>> must be an in-kernel pm.
>
> Good point, I missed that!
>
> 'pm->work_pending' doesn't look like a good name for what it does.
> Hopefully, this will be clearer when pm->ops will be used, which will
> clearly separate what is done by the different PMs.
>
> Also, is it possible we never set 'pm->work_pending' back to 'true' once
> it has been set to 'false'? Does it mean when we reach the limits, and
> something happens (e.g. a removed subflow, limits are changed, etc.),
> the worker is no longer used to establish new subflows or react when an
> ADD_ADDR is received?
I didn't check, but it feels like we should add something like that:
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index d4328443d844..dee3e3c14ec3 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -248,6 +248,8 @@ bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk)
> WRITE_ONCE(msk->pm.work_pending, false);
> return false;
> }
> +
> + WRITE_ONCE(msk->pm.work_pending, true);
> return true;
> }
>
> @@ -911,6 +913,9 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
> WRITE_ONCE(msk->pm.accept_addr, true);
> }
> }
> +
> + if (mptcp_pm_is_kernel(msk))
> + mptcp_pm_nl_check_work_pending(msk);
> }
>
> static void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk)
So to re-enable msk->pm.work_pending when limits have been reached, and
it is no longer the case. WDYT?
We would need a new subtest to confirm this patch is needed, and to
avoid regressions later.
(...)
>>> @@ -278,6 +281,9 @@ void mptcp_pm_rm_addr_received(struct mptcp_sock
>>> *msk,
>>> for (i = 0; i < rm_list->nr; i++)
>>> mptcp_event_addr_removed(msk, rm_list->ids[i]);
>>>
>>> + if (!mptcp_pm_is_kernel(msk))
>>> + return;
>>
>> This mptcp_pm_is_kernel() is indeed needed.
>
> Yes it is. And 'pm->work_pending' cannot be used, because it will be set
> to false when the limits are reached, but a RM_ADDR will potentially
> decrement counters.
In fact, it is not needed: from what I see, the corresponding subflows
are removed on the kernel side, the userspace daemon doesn't need to
send a request to remove them. See the modification from 4d25247d3ae4
("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") in
mptcp_pm_nl_rm_addr_or_subflow().
> I will send a v2 with only this change.
I will send a v2 with only an early exit in mptcp_pm_add_addr_echoed()
if there is no need to schedule the worker.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
next prev parent reply other threads:[~2025-02-25 18:15 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-24 19:01 [PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker Matthieu Baerts (NGI0)
2025-02-24 20:05 ` MPTCP CI
2025-02-25 4:21 ` Geliang Tang
2025-02-25 9:25 ` Matthieu Baerts
2025-02-25 18:15 ` Matthieu Baerts [this message]
2025-02-26 1:05 ` Geliang Tang
2025-02-26 15:02 ` Matthieu Baerts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cf5fdbbe-856e-419c-812e-44f61fc89778@kernel.org \
--to=matttbe@kernel.org \
--cc=geliang@kernel.org \
--cc=mptcp@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.