From: Aaron Conole <aconole@redhat.com>
To: Ilya Maximets <i.maximets@ovn.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
netdev@vger.kernel.org, linux-rt-devel@lists.linux.dev,
dev@openvswitch.org, Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Paolo Abeni <pabeni@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Eelco Chaudron <echaudro@redhat.com>
Subject: Re: [ovs-dev] [PATCH net-next 11/18] openvswitch: Use nested-BH locking for ovs_actions.
Date: Thu, 13 Mar 2025 10:11:45 -0400
Message-ID: <f7tzfhpxboe.fsf@redhat.com>
In-Reply-To: <1d72e5df-921b-4027-bf9c-ca374c3a09d1@ovn.org> (Ilya Maximets's message of "Thu, 13 Mar 2025 14:23:16 +0100")
Ilya Maximets <i.maximets@ovn.org> writes:
> On 3/13/25 12:50, Sebastian Andrzej Siewior wrote:
>> On 2025-03-10 17:56:09 [+0100], Ilya Maximets wrote:
>>>>>> + local_lock_nested_bh(&ovs_actions.bh_lock);
>>>>>
>>>>> Wouldn't this cause a warning when we're in a syscall/process context?
>>>>
>>>> My understanding is that this is only invoked in softirq context. Did I
>>>> misunderstand it?
>>>
>>> It can be called from the syscall/process context while processing
>>> OVS_PACKET_CMD_EXECUTE request.
>>>
>>>> Otherwise the this_cpu_ptr() above should complain
>>>> that preemption is not disabled, and if preemption is indeed not disabled,
>>>> how do you ensure that you don't get preempted after the
>>>> __this_cpu_inc_return() in several tasks (at the same time), leading to
>>>> exceeding the OVS_RECURSION_LIMIT?
>>>
>>> We disable BH in this case, so it should be safe (on non-RT). See the
>>> ovs_packet_cmd_execute() for more details.
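For readers without the tree handy: the process-context entry Ilya refers
to looks roughly like the following. This is simplified from my memory of
datapath.c, so take the exact shape with a grain of salt:

    /* ovs_packet_cmd_execute(): userspace asks the datapath to run a set
     * of actions on a packet injected via OVS_PACKET_CMD_EXECUTE.  BH is
     * disabled around the call, so even in process context the per-CPU
     * state is protected on !PREEMPT_RT.
     */
    rcu_read_lock();
    /* ... look up dp, flow, actions ... */
    local_bh_disable();
    err = ovs_execute_actions(dp, packet, sf_acts, &flow->key);
    local_bh_enable();
    rcu_read_unlock();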
>>
>> Yes, exactly. So if BH is disabled then local_lock_nested_bh() can
>> safely acquire a per-CPU spinlock on PREEMPT_RT here. This basically
>> mimics the local_bh_disable() behaviour in terms of exclusive access to
>> the data structures, but on a smaller scope.
>
> OK. I missed that in_softirq() returns true when BH is disabled manually.
>
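To expand on that for anyone following along: the way I read it, the
per-CPU blob plus its nested-BH lock look roughly like the sketch below.
This is pieced together from the names used in the patch; the exact
struct layout is my guess, not copied from the diff:

    /* Per-CPU scratch state for action execution.  On !PREEMPT_RT,
     * local_lock_nested_bh() is essentially a lockdep annotation and the
     * real exclusion comes from BH being disabled; on PREEMPT_RT it takes
     * a real per-CPU lock, as Sebastian describes above.
     */
    struct ovs_action {
            int exec_level;
            struct task_struct *owner;
            local_lock_t bh_lock;
    };

    static DEFINE_PER_CPU(struct ovs_action, ovs_actions) = {
            .bh_lock = INIT_LOCAL_LOCK(bh_lock),
    };

    /* callers must already run with BH disabled, either because we are in
     * softirq context or because of an explicit local_bh_disable()
     */
    local_lock_nested_bh(&ovs_actions.bh_lock);
    /* ... touch this_cpu_ptr(&ovs_actions) ... */
    local_unlock_nested_bh(&ovs_actions.bh_lock);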
>>
>>>>>> + ovs_act->owner = current;
>>>>>> + }
>>>>>> +
>>>>>> level = __this_cpu_inc_return(ovs_actions.exec_level);
>>>>>> if (unlikely(level > OVS_RECURSION_LIMIT)) {
>>>>>> net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
>>>>>> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>>>>>
>>>>>> out:
>>>>>> __this_cpu_dec(ovs_actions.exec_level);
>>>>>> +
>>>>>> + if (level == 1) {
>>>>>> + ovs_act->owner = NULL;
>>>>>> + local_unlock_nested_bh(&ovs_actions.bh_lock);
>>>>>> + }
>>>>>
>>>>> Seems dangerous to lock every time the owner changes but unlock only
>>>>> once on level 1. Even if this works fine, it seems unnecessarily
>>>>> complicated. Maybe it's better to just lock once before calling
>>>>> ovs_execute_actions() instead?
>>>>
>>>> My understanding is this can be invoked recursively. That means on first
>>>> invocation owner == NULL and then you acquire the lock at which point
>>>> exec_level goes 0->1. On the recursive invocation owner == current and
>>>> you skip the lock but exec_level goes 1 -> 2.
>>>> On the return path, once level is 1 it means the dec made the counter
>>>> go 1 -> 0, so you unlock the lock.
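Pulling the two hunks together so the pairing is easier to see (this is a
paraphrase; the guard condition is my reconstruction from Sebastian's
description, not the literal diff):

    ovs_act = this_cpu_ptr(&ovs_actions);
    if (ovs_act->owner != current) {        /* outermost entry only */
            local_lock_nested_bh(&ovs_actions.bh_lock);
            ovs_act->owner = current;
    }
    level = __this_cpu_inc_return(ovs_actions.exec_level);
    /* ... execute actions, possibly recursing back in here ... */
    __this_cpu_dec(ovs_actions.exec_level);
    if (level == 1) {                       /* outermost exit, 1 -> 0 */
            ovs_act->owner = NULL;
            local_unlock_nested_bh(&ovs_actions.bh_lock);
    }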
>>>
>>> My point is: why lock here, with some extra non-obvious owner-tracking
>>> logic, if we can lock (unconditionally?) in ovs_packet_cmd_execute() and
>>> ovs_dp_process_packet() instead? We already disable BH in one of those
>>> and take appropriate RCU locks in both. So, feels like a better place
>>> for the extra locking if necessary. We will also not need to move around
>>> any code in actions.c if the code there is guaranteed to be safe by holding
>>> locks outside of it.
>>
>> I think I considered it but dropped it because it looks like one
>> can call the other.
>> ovs_packet_cmd_execute() is a unique entry to ovs_execute_actions().
>> This one could take the lock unconditionally.
>> Then we have ovs_dp_process_packet() as the second entry point towards
>> ovs_execute_actions(), and it is the tricky one. One path originates from
>> netdev_frame_hook(), which is the "normal" packet-receiving path.
>> Then within ovs_execute_actions() there is ovs_vport_send() which could
>> use internal_dev_recv() for forwarding. This one throws the packet into
>> the networking stack so it could come back via netdev_frame_hook().
>> Then there is this internal forwarding via internal_dev_xmit() which
>> also ends up in ovs_execute_actions(). Here I don't know if this can
>> originate from within the recursion.
Just a note that the packet still needs to go through `ovs_dp_process_packet()`
before it enters the `ovs_execute_actions()` call by way of the
`ovs_vport_receive()` call. So keeping the locking in datapath.c should
not be a complex task.
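To make that concrete, here is a very loose sketch of what "keep all the
locking in datapath.c" could look like. The names come from this thread;
the helpers themselves are my invention and completely untested:

    /* Take the per-CPU nested-BH lock for action execution, unless this
     * task already holds it (the packet left via an internal port and
     * came back through netdev_frame_hook() on the same CPU).
     */
    static bool ovs_pcpu_lock(void)
    {
            struct ovs_action *ovs_act = this_cpu_ptr(&ovs_actions);

            if (ovs_act->owner == current)
                    return false;   /* recursive entry, lock already held */

            local_lock_nested_bh(&ovs_actions.bh_lock);
            ovs_act->owner = current;
            return true;
    }

    static void ovs_pcpu_unlock(bool locked)
    {
            struct ovs_action *ovs_act = this_cpu_ptr(&ovs_actions);

            if (locked) {
                    ovs_act->owner = NULL;
                    local_unlock_nested_bh(&ovs_actions.bh_lock);
            }
    }

Both ovs_dp_process_packet() and ovs_packet_cmd_execute() would then wrap
their ovs_execute_actions() calls in ovs_pcpu_lock()/ovs_pcpu_unlock(),
and actions.c would not need to know about the lock at all.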
> It's true that ovs_packet_cmd_execute() cannot be re-entered, while
> ovs_dp_process_packet() can be re-entered if the packet leaves OVS and
> then comes back from another port. It's still better to handle all the
> locking within datapath.c and not lock for RT in actions.c and for non-RT
> in datapath.c.
+1
>>
>> After looking at this and seeing the internal_dev_recv() I decided to
>> move it to within ovs_execute_actions() where the recursion check itself
>> is.
>>
>>>> The locking part happens only on PREEMPT_RT, because on !PREEMPT_RT
>>>> softirqs are disabled, which guarantees that there will be no preemption.
>>>>
>>>> tools/testing/selftests/net/openvswitch should cover this?
>>>
>>> It's not a comprehensive test suite, it covers some cases, but it
>>> doesn't test anything related to preemptions specifically.
Yes, this would be good to enhance, and the plan is to improve it as we
can. If during the course of this work you identify a nice test case,
please do feel empowered to add it.
>> From looking at the traces, everything originates from
>> netdev_frame_hook() and there is sometimes one recursion from within
>> ovs_execute_actions(). I haven't seen anything else.
>>
>>>>> Also, the name of the struct ovs_action doesn't make a lot of sense;
>>>>> I'd suggest calling it pcpu_storage or something like that instead.
>>>>> I.e. have a more generic name as the fields inside are not directly
>>>>> related to each other.
>>>>
>>>> Understood. ovs_pcpu_storage maybe?
>>>
>>> It's OK, I guess, but see also a point about locking inside datapath.c
>>> instead and probably not needing to change anything in actions.c.
>>
>> If you say that adding a lock to ovs_dp_process_packet() and another to
>> ovs_packet_cmd_execute() is the way to go, then I can certainly update.
>> However, based on what I wrote above, I am not sure.
>
> I think, it's better if we keep all the locks in datapath.c and let
> actions.c assume that all the operations are always safe as it was
> originally intended.
Agreed - reading through the actions code can be complex enough without
needing to remember to do recursion checks there.
> Cc: Aaron and Eelco, in case they have some thoughts on this as well.
Thanks Ilya - I think you covered the major points.
> Best regards, Ilya Maximets.