Re: [PATCH v4 bpf-next/net 5/5] selftest: bpf: Add test for SK_BPF_MEMCG_SOCK_ISOLATED.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Neal Cardwell <ncardwell@google.com>,
	Willem de Bruijn <willemb@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Kuniyuki Iwashima <kuni1840@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v4 bpf-next/net 5/5] selftest: bpf: Add test for SK_BPF_MEMCG_SOCK_ISOLATED.
Date: Thu, 4 Sep 2025 12:48:03 -0700	[thread overview]
Message-ID: <26939fec-b70f-4beb-8895-427db69c38a0@linux.dev> (raw)
In-Reply-To: <CAAVpQUCLpi+6w1SP=FKVaXwdDHQC_P6B1hzzDC5y4brsf3_UnQ@mail.gmail.com>

On 9/4/25 9:45 AM, Kuniyuki Iwashima wrote:
> On Wed, Sep 3, 2025 at 10:51 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 9/3/25 10:08 AM, Kuniyuki Iwashima wrote:
>>> On Wed, Sep 3, 2025 at 9:59 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>>>
>>>> On Tue, Sep 2, 2025 at 1:49 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>>>>
>>>>> On Tue, Sep 2, 2025 at 1:26 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>>>
>>>>>> On 8/28/25 6:00 PM, Kuniyuki Iwashima wrote:
>>>>>>> The test does the following for IPv4/IPv6 x TCP/UDP sockets
>>>>>>> with/without BPF prog.
>>>>>>>
>>>>>>>      1. Create socket pairs
>>>>>>>      2. Send a bunch of data that requires more than 256 pages
>>>>>>>      3. Read memory_allocated from the 3rd column in /proc/net/protocols
>>>>>>>      4. Check if unread data is charged to memory_allocated
>>>>>>>
>>>>>>> If BPF prog is attached, memory_allocated should not be changed,
>>>>>>> but we allow a small error (up to 10 pages) in case other processes
>>>>>>> on the host use some amounts of TCP/UDP memory.
>>>>>>>
>>>>>>> At 2., the test actually sends more than 1024 pages because the sysctl
>>>>>>> net.core.mem_pcpu_rsv is 256 is by default, which means 256 pages are
>>>>>>> buffered per cpu before reporting to sk->sk_prot->memory_allocated.
>>>>>>>
>>>>>>>      BUF_SINGLE (1024) * NR_SEND (64) * NR_SOCKETS (64) / 4096
>>>>>>>      = 1024 pages
>>>>>>>
>>>>>>> When I reduced it to 512 pages, the following assertion for the
>>>>>>> non-isolated case got flaky.
>>>>>>>
>>>>>>>      ASSERT_GT(memory_allocated[1], memory_allocated[0] + 256, ...)
>>>>>>>
>>>>>>> Another contributor to slowness is 150ms sleep to make sure 1 RCU
>>>>>>> grace period passes because UDP recv queue is destroyed after that.
>>>>>>
>>>>>> There is a kern_sync_rcu() in testing_helpers.c.
>>>>>
>>>>> Nice helper :)  Will use it.
>>>>>
>>>>>>
>>>>>>>
>>>>>>>      # time ./test_progs -t sk_memcg
>>>>>>>      #370/1   sk_memcg/TCP       :OK
>>>>>>>      #370/2   sk_memcg/UDP       :OK
>>>>>>>      #370/3   sk_memcg/TCPv6     :OK
>>>>>>>      #370/4   sk_memcg/UDPv6     :OK
>>>>>>>      #370     sk_memcg:OK
>>>>>>>      Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
>>>>>>>
>>>>>>>      real       0m1.214s
>>>>>>>      user       0m0.014s
>>>>>>>      sys        0m0.318s
>>>>>>
>>>>>> Thanks. It finished much faster in my setup also comparing with the earlier
>>>>>> revision. However, it is a bit flaky when I run it in a loop:
>>>>>>
>>>>>> check_isolated:FAIL:not isolated unexpected not isolated: actual 861 <= expected 861
>>>>>>
>>>>>> I usually can hit this at ~40-th iteration.
>>>>>
>>>>> Oh.. I tested ~10 times manually but will try in a tight loop.
>>>>
>>>> This didn't reproduce on my QEMU with/without --enable-kvm.
>>>>
>>>> Changing the assert from _GT to _GE will address the very case
>>>> above, but I'm not sure if it's enough.
>>>
>>> I doubled NR_SEND and it was still faster with kern_sync_rcu()
>>> than usleep(), so I'll simply double NR_SEND in v5
>>>
>>> # time ./test_progs -t sk_memcg
>>> ...
>>> Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
>>> real 0m0.483s
>>> user 0m0.010s
>>> sys 0m0.191s
>>>
>>>
>>>>
>>>> Does the bpf CI run tests repeatedly or is this only a manual
>>>> scenario ?
>>
>> I haven't seen bpf CI hit it yet. It is in my manual bash while loop. It should
>> not be dismissed so easily. Some flaky CI tests were eventually reproduced in a
>> loop before and fixed. I kept the bash loop continue this time until grep-ed a
>> "0" from the error output:
>>
>> check_isolated:FAIL:not isolated unexpected not isolated: actual 0 <= expected 256
>>
>> The "long memory_allocated[2]" read from /proc/net/protocols are printed as 0
>> but it is probably actually negative:
>>
>> static inline long
>> proto_memory_allocated(const struct proto *prot)
>> {
>>           return max(0L, atomic_long_read(prot->memory_allocated));
>> }
>>
>> prot->memory_allocated could be negative afaict but printed as 0 in
>> /proc/net/protocols. Even the machine is network quiet after test_progs started,
>> the "prot->memory_allocated" and the "proto->per_cpu_fw_alloc" could be in some
>> random states before the test_progs start.  When I hit "0", it will take some
>> efforts to send some random traffic to the machine to get the test working again. :(
>>
>> Also, after reading the selftest closer, I am not sure I understand why "+ 256".
>> The "proto-> per_cpu_fw_alloc" can start with -255 or +255.
> 
> Actually I didn't expect the random state and assumed the test's
> local communication would complete on the same CPU thus 0~255.
> 
> Do you see the flakiness with net.core.mem_pcpu_rsv=0 ?
> 
> The per-cpu cache is just for performance and I think it's not
> critical for testing and it's fine to set it to 0 during the test.
> 
> 
>>
>> I don't think changing NR_SEND help here. It needs a better way. May be some
>> functions can be traced such that prot->memory_allocated can be read directly?
>> If fentry and fexit of that function has different memory_allocated values, then
>> the test could also become more straight forward.
> 
> Maybe like this ?  Not yet tested, but we could attach a prog to
> sock_init_data() or somewhere else and trigger it by additional socket(2).
> 
>          memory_allocated = sk->sk_prot->memory_allocated;
>          nr_cpu = bpf_num_possible_cpus();
> 
>          for (i = 0; i < nr_cpu; i++) {
>                  per_cpu_fw_alloc =
> bpf_per_cpu_ptr(sk->sk_prot->per_cpu_fw_alloc, i);

I suspect passing per_cpu_fw_alloc to bpf_per_cpu_ptr won't work for now. sk is 
trusted if it is a "tp_btf" but I don't think the verifier recognizes the 
sk->sk_prot is a trusted ptr. I haven't tested it though. If the above does not 
work, try to directly use the global percpu tcp_memory_per_cpu_fw_alloc. Take a 
look at how "bpf_prog_active" is used in test_ksyms_btf.c.

>                  if (per_cpu_fw_alloc)
>                          memory_allocated += *per_cpu_fw_alloc;

Yeah. I think figuring out the true memory_allocated value and use it as the 
before/after value should be good enough. Then no need to worry about the 
initial states. I wonder why proto_memory_allocated() does not do that for 
/proc/net/protocols but I guess it may not be accurate for a lot of cores.

>          }
> 
> per_cpu_fw_alloc might have been added to sk_prot->memory_allocated
> during loop, so it's not 100% accurate still.
> 
> Probably we should set net.core.mem_pcpu_rsv=0 and stress
> memory_allocated before the actual test to drain per_cpu_fw_alloc
> (at least on the testing CPU).
I think the best is if a suitable kernel func can be traced or figure out the 
true memory_allocated value. At least figuring out the true memory_allocated 
seems doable. If nothing of the above works out, mem_pcpu_rsv=0 and 
pre-stress/pre-flush should help by getting the per_cpu_fw_alloc and 
memory_allocated to some certain states before using it in the before/after result.

[ Before re-spinning, need to conclude/resolve the on-going discussion in v5 first ]

next prev parent reply	other threads:[~2025-09-04 19:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-29  1:00 [PATCH v4 bpf-next/net 0/5] bpf: Allow decoupling memcg from sk->sk_prot->memory_allocated Kuniyuki Iwashima
2025-08-29  1:00 ` [PATCH v4 bpf-next/net 1/5] tcp: Save lock_sock() for memcg in inet_csk_accept() Kuniyuki Iwashima
2025-09-02 18:55   ` Martin KaFai Lau
2025-09-02 19:32     ` Kuniyuki Iwashima
2025-08-29  1:00 ` [PATCH v4 bpf-next/net 2/5] bpf: Support bpf_setsockopt() for BPF_CGROUP_INET_SOCK_CREATE Kuniyuki Iwashima
2025-09-02 19:10   ` Martin KaFai Lau
2025-09-02 19:33     ` Kuniyuki Iwashima
2025-08-29  1:00 ` [PATCH v4 bpf-next/net 3/5] bpf: Introduce SK_BPF_MEMCG_FLAGS and SK_BPF_MEMCG_SOCK_ISOLATED Kuniyuki Iwashima
2025-09-02 20:02   ` Martin KaFai Lau
2025-09-02 20:13     ` Kuniyuki Iwashima
2025-08-29  1:00 ` [PATCH v4 bpf-next/net 4/5] net-memcg: Allow decoupling memcg from global protocol memory accounting Kuniyuki Iwashima
2025-09-02 20:16   ` Martin KaFai Lau
2025-09-02 20:45     ` Kuniyuki Iwashima
2025-08-29  1:00 ` [PATCH v4 bpf-next/net 5/5] selftest: bpf: Add test for SK_BPF_MEMCG_SOCK_ISOLATED Kuniyuki Iwashima
2025-09-02 20:26   ` Martin KaFai Lau
2025-09-02 20:49     ` Kuniyuki Iwashima
2025-09-03 16:59       ` Kuniyuki Iwashima
2025-09-03 17:08         ` Kuniyuki Iwashima
2025-09-04  5:50           ` Martin KaFai Lau
2025-09-04 16:45             ` Kuniyuki Iwashima
2025-09-04 19:48               ` Martin KaFai Lau [this message]
2025-09-04 20:29                 ` Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=26939fec-b70f-4beb-8895-427db69c38a0@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=almasrymina@google.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=kuniyu@google.com \
    --cc=mhocko@kernel.org \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=roman.gushchin@linux.dev \
    --cc=sdf@fomichev.me \
    --cc=shakeel.butt@linux.dev \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.