From: Hou Tao <houtao@huaweicloud.com>
To: paulmck@kernel.org, Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Tejun Heo <tj@kernel.org>,
	rcu@vger.kernel.org, Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Kernel Team <kernel-team@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	David Vernet <void@manifault.com>
Subject: Re: [PATCH v4 bpf-next 09/14] bpf: Allow reuse from waiting_for_gp_ttrace list.
Date: Sat, 8 Jul 2023 15:03:40 +0800
Message-ID: <bdfc76dc-459a-7c23-bb23-854742fbd0c3@huaweicloud.com>
In-Reply-To: <3f72c4e7-340f-4374-9ebe-f9bffd08c755@paulmck-laptop>

Hi,

On 7/8/2023 1:47 AM, Paul E. McKenney wrote:
> On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote:
>> On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@huaweicloud.com> wrote:
SNIP
>>>> I guess you're assuming that alloc_bulk() from irq_work
>>>> is running within rcu_tasks_trace critical section,
>>>> so __free_rcu_tasks_trace() callback will execute after
>>>> irq work completed?
>>>> I don't think that's the case.
>>> Yes. The following is my original thoughts. Correct me if I was wrong:
>>>
>>> 1. llist_del_first() must be running concurrently with llist_del_all().
>>> If llist_del_first() runs after llist_del_all(), it will return NULL
>>> directly.
>>> 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the
>>> elements in free_by_rcu_ttrace will not be freed back to slab.
>>> 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period
>>> to call __free_rcu_tasks_trace()
>>> 4. llist_del_first() is running in a context with irqs disabled, so the
>>> tasks trace RCU grace period will wait for the end of llist_del_first().
>>>
>>> It seems you thought step 4) is not true, right ?
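For reference, a minimal sketch of the two paths that steps 1-4 above
reason about; the field and function names follow the patch set, but the
bodies are illustrative stand-ins, not the actual kernel/bpf/memalloc.c
code:

#include <linux/llist.h>
#include <linux/rcupdate.h>

/* Cut-down stand-in for the real bpf_mem_cache, illustration only. */
struct cache_sketch {
	struct llist_head free_by_rcu_ttrace;
	struct rcu_head rcu_ttrace;
};

/* Reuse path, run from irq_work (step 1): may race with the free path. */
static struct llist_node *try_reuse(struct cache_sketch *c)
{
	return llist_del_first(&c->free_by_rcu_ttrace);
}

/* Free path (steps 2 and 3): detach the whole list, then let the
 * callback free it back to slab after one Tasks Trace grace period. */
static void free_by_rcu(struct cache_sketch *c, rcu_callback_t cb)
{
	struct llist_node *head = llist_del_all(&c->free_by_rcu_ttrace);

	(void)head;	/* the real code hands this list to the callback */
	call_rcu_tasks_trace(&c->rcu_ttrace, cb);
}

Step 4 is the assumption under discussion: whether running try_reuse()
with interrupts disabled is enough to make the Tasks Trace grace period
wait for it.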
>> Yes. I think so. For two reasons:
>>
>> 1.
>> I believe irq disabled region isn't considered equivalent
>> to rcu_read_lock_trace() region.
>>
>> Paul,
>> could you clarify ?
> You are correct, Alexei.  Unlike vanilla RCU, RCU Tasks Trace does not
> count irq-disabled regions of code as readers.
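To make that distinction concrete, here is a small illustration (not
taken from the patch set) of what does and does not constitute a Tasks
Trace reader:

#include <linux/irqflags.h>
#include <linux/rcupdate_trace.h>

static void irq_off_is_not_a_trace_reader(void)
{
	unsigned long flags;

	local_irq_save(flags);
	/* Not a Tasks Trace read-side critical section: a grace period
	 * started by call_rcu_tasks_trace() elsewhere may still end here. */
	local_irq_restore(flags);
}

static void explicit_trace_reader(void)
{
	rcu_read_lock_trace();
	/* Any Tasks Trace grace period that starts after this point
	 * cannot end before rcu_read_unlock_trace() below. */
	rcu_read_unlock_trace();
}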

I see. But I still have one question: in the current implementation, one
Tasks Trace RCU grace period implies one vanilla RCU grace period (aka
rcu_trace_implies_rcu_gp()). In my naive understanding of RCU, does that
mean __free_rcu_tasks_trace() will only be invoked after the current
Tasks Trace RCU grace period has expired, and therefore also after the
current vanilla RCU grace period has expired? If both of these are true,
does that mean __free_rcu_tasks_trace() will wait for the irq-disabled
code region?
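For context, one way such an implication can be consumed is inside the
Tasks Trace callback itself; the sketch below uses hypothetical callback
names and is not the memalloc.c implementation:

#include <linux/rcupdate.h>

/* Hypothetical callbacks, for illustration only. */
static void free_after_vanilla_gp(struct rcu_head *head)
{
	/* ... actually free the objects back to slab ... */
}

static void free_after_trace_gp(struct rcu_head *head)
{
	/* If a Tasks Trace grace period also implies a vanilla RCU grace
	 * period, free immediately; otherwise chain one more call_rcu()
	 * so that a vanilla grace period is still waited for. */
	if (rcu_trace_implies_rcu_gp())
		free_after_vanilla_gp(head);
	else
		call_rcu(head, free_after_vanilla_gp);
}

Either way, the question above is whether that implied (or chained)
vanilla grace period is guaranteed to cover the irq-disabled
llist_del_first() region.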
> But why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?
>
> For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB,
> Hou Tao would be correct from a strict current-implementation
> viewpoint.  The reason is that, given the current implementation in
> CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or
> take an IPI in order for the grace-period machinery to realize that this
> task is done with all prior readers.

Thanks for the detailed explanation.
> However, we need to account for the possibility of IPI-free
> implementations, for example, if the real-time guys decide to start
> making heavy use of BPF sleepable programs.  They would then insist on
> getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels.  At which
> point, irq-disabled regions of code will absolutely not act as
> RCU tasks trace readers.
>
> Again, why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?
>
> Or maybe there is a better workaround.

Yes. I think we could use rcu_read_{lock,unlock}_trace to fix the ABA
problem for free_by_rcu_ttrace.
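A minimal sketch of that workaround, assuming reuse still goes through
llist_del_first() on free_by_rcu_ttrace (illustrative, not the final
patch):

#include <linux/llist.h>
#include <linux/rcupdate_trace.h>

static struct llist_node *reuse_from_free_by_rcu_ttrace(struct llist_head *list)
{
	struct llist_node *obj;

	/* Make the reuse path a Tasks Trace reader: objects detached by
	 * llist_del_all() and handed to call_rcu_tasks_trace() after this
	 * point cannot be freed back to slab (and reallocated) until
	 * rcu_read_unlock_trace(), which closes the ABA window for
	 * llist_del_first(). */
	rcu_read_lock_trace();
	obj = llist_del_first(list);
	rcu_read_unlock_trace();

	return obj;
}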
>
>> 2.
>> Even if 1 is incorrect, in RT llist_del_first() from alloc_bulk()
>> runs "in a per-CPU thread in preemptible context."
>> See irq_work_run_list.
> Agreed, under RT, "interrupt handlers" often run in task context.

Yes, I missed that. I had also misread alloc_bulk(): it seems it only
does inc_active() for c->free_llist.
> 						Thanx, Paul



Thread overview: 31+ messages
2023-07-06  3:34 [PATCH v4 bpf-next 00/14] bpf: Introduce bpf_mem_cache_free_rcu() Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 01/14] bpf: Rename few bpf_mem_alloc fields Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 02/14] bpf: Simplify code of destroy_mem_alloc() with kmemdup() Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 03/14] bpf: Let free_all() return the number of freed elements Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 04/14] bpf: Refactor alloc_bulk() Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 05/14] bpf: Factor out inc/dec of active flag into helpers Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 06/14] bpf: Further refactor alloc_bulk() Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 07/14] bpf: Change bpf_mem_cache draining process Alexei Starovoitov
2023-07-06 12:55   ` Hou Tao
2023-07-06  3:34 ` [PATCH v4 bpf-next 08/14] bpf: Add a hint to allocated objects Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 09/14] bpf: Allow reuse from waiting_for_gp_ttrace list Alexei Starovoitov
2023-07-07  2:07   ` Hou Tao
2023-07-07  2:12     ` Alexei Starovoitov
2023-07-07  3:38       ` Hou Tao
2023-07-07  4:16         ` Alexei Starovoitov
2023-07-07  4:37           ` Hou Tao
2023-07-07 16:11             ` Alexei Starovoitov
2023-07-07 17:47               ` Paul E. McKenney
2023-07-07 22:22                 ` Joel Fernandes
2023-07-08  7:03                 ` Hou Tao [this message]
2023-07-10  4:45                   ` Paul E. McKenney
2023-07-06  3:34 ` [PATCH v4 bpf-next 10/14] rcu: Export rcu_request_urgent_qs_task() Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 11/14] selftests/bpf: Improve test coverage of bpf_mem_alloc Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 12/14] bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu() Alexei Starovoitov
2023-07-07  1:45   ` Hou Tao
2023-07-07  2:10     ` Alexei Starovoitov
2023-07-07  4:05       ` Hou Tao
2023-07-08  7:00         ` Hou Tao
2023-07-06  3:34 ` [PATCH v4 bpf-next 13/14] bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu Alexei Starovoitov
2023-07-06  3:34 ` [PATCH v4 bpf-next 14/14] bpf: Add object leak check Alexei Starovoitov
2023-07-12 21:50 ` [PATCH v4 bpf-next 00/14] bpf: Introduce bpf_mem_cache_free_rcu() patchwork-bot+netdevbpf
