From: Eduard Zingerman <eddyz87@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>,
bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev, kernel-team@fb.com,
yonghong.song@linux.dev
Subject: Re: [PATCH bpf-next v1 2/2] veristat: memory accounting for bpf programs
Date: Sat, 07 Jun 2025 01:13:26 -0700
Message-ID: <ae7b709f618ecd75214e62f2a300fe2949d9b567.camel@gmail.com>
In-Reply-To: <CAEf4BzY2CzZy8DMe==F7OmvEO2gkGG___SaZgu8dGDJd4LG4_Q@mail.gmail.com>
On Fri, 2025-06-06 at 11:19 -0700, Andrii Nakryiko wrote:
[...]
> Looking at memory_peak_write() in mm/memcontrol.c it looks reasonable
> and should have worked (we do reset pc->local_watermark). But note if
> (usage > peer_ctx->value) logic and /* initial write, register watcher
> */ comment. I'm totally guessing and speculating, but maybe you didn't
> close and re-open the file in between and so you had stale "watcher"
> with already recorded high watermark?..
>
> I'd try again but be very careful what cgroup and at what point this
> is being reset...
The way I read memcontrol.c:memory_peak_write(), it always transfers
the current memcg->memory usage (aka memory.current) to the ofp->value
of the currently open file (aka memory.peak). So this should work as
the documentation suggests: keep a single fd for memory.peak open and
periodically write something to it to reset the value.
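For reference, a minimal sketch of that protocol (not code from the
patch; the helper names and the "reset" payload are mine, any
non-empty write triggers the reset):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Open <cgroup>/memory.peak once and keep the fd for the whole run. */
static int open_peak(const char *cg_path)
{
	char path[256];

	snprintf(path, sizeof(path), "%s/memory.peak", cg_path);
	return open(path, O_RDWR);
}

/* Any non-empty write resets the watermark tracked for this open
 * file (the ofp->value mentioned above). */
static int reset_peak(int fd)
{
	return pwrite(fd, "reset\n", 6, 0) < 0 ? -1 : 0;
}

/* Read the high watermark recorded since the last reset, in bytes. */
static long read_peak(int fd)
{
	char buf[32] = {};

	if (pread(fd, buf, sizeof(buf) - 1, 0) < 0)
		return -1;
	return strtol(buf, NULL, 10);
}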
---
I tried several versions with selftests and scx BPF binaries:
- the version as in this patch set, aka "many cg";
- a version with a single control group that writes to memory.reclaim
  and then to memory.peak between program verifications (while holding
  the same FDs for these files), aka "reset+reclaim"; the
  implementation is in [1], and a sketch of the sequence follows this
  list;
- a version with a single control group, same as "reset+reclaim" but
  without the "reclaim" part, aka "reset only"; the implementation can
  be trivially derived from [1].
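Here is a hypothetical outline of the per-program sequence in the
"reset+reclaim" variant (assumes the process was already moved into
the control group, both fds stay open across iterations, read_peak()
is the helper from the sketch above, and the "1G" reclaim target is
arbitrary):

#include <bpf/libbpf.h>
#include <unistd.h>

static long verify_and_measure(int reclaim_fd, int peak_fd,
			       struct bpf_object *obj)
{
	/* Proactively reclaim so that pages left over from the
	 * previous program do not inflate the next watermark;
	 * memory.reclaim accepts a byte count, and asking for more
	 * than can be reclaimed returns an error, which is fine to
	 * ignore here. */
	(void)pwrite(reclaim_fd, "1G", 2, 0);

	/* Reset the watermark tracked for this open memory.peak fd. */
	if (pwrite(peak_fd, "reset\n", 6, 0) < 0)
		return -1;

	if (bpf_object__load(obj))	/* runs the verifier */
		return -1;

	return read_peak(peak_fd);
}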
Here are stats for each of the versions, where I try to gauge the
stability of the results. Each version was run twice and the generated
results were compared.
|                                    |         | one cg        | one cg     |        |
|                                    | many cg | reclaim+reset | reset only | master |
|------------------------------------+---------+---------------+------------+--------|
| SCX                                |         |               |            |        |
|------------------------------------+---------+---------------+------------+--------|
| running time (sec)                 |      48 |            50 |         46 |     43 |
| jitter mem_peak_diff!=0 (of 172)   |       3 |            93 |         80 |        |
| jitter mem_peak_diff>256 (of 172)  |       0 |             5 |          7 |        |
|------------------------------------+---------+---------------+------------+--------|
| selftests                          |         |               |            |        |
|------------------------------------+---------+---------------+------------+--------|
| running time (sec)                 |     108 |           140 |         90 |     86 |
| jitter mem_peak_diff!=0 (of 3601)  |     195 |          1751 |       1181 |        |
| jitter mem_peak_diff>256 (of 3601) |       1 |            22 |         14 |        |
- "jitter mem_peak_diff!=0" means that veristat was run two times and
results were compared to produce a number of differences:
`veristat -C -f "mem_peak_diff!=0" first-run.csv second-run.csv| wc -l`
- "jitter mem_peak_diff>256" is the same, but the filter expression
was "mem_peak_diff>256", meaning difference is greater than 256KiB.
The big jitter comes from `0->256KiB` and `256KiB->0` transitions
occurring for very small programs, and there are a lot of such
programs in selftests.
Comparison of result quality between "many cg" and the other variants
(same metrics as above, but the CSVs being compared were produced by
different veristat versions):
|                                    | many cg          | many cg       |
|                                    | vs reset+reclaim | vs reset only |
|------------------------------------+------------------+---------------|
| SCX                                |                  |               |
|------------------------------------+------------------+---------------|
| jitter mem_peak_diff!=0 (of 172)   |              108 |            70 |
| jitter mem_peak_diff>256 (of 172)  |                6 |             2 |
|------------------------------------+------------------+---------------|
| selftests                          |                  |               |
|------------------------------------+------------------+---------------|
| jitter mem_peak_diff!=0 (of 3601)  |             1885 |           942 |
| jitter mem_peak_diff>256 (of 3601) |               27 |            11 |
As can be seen, most of the differences in the collected stats are not
bigger than 256KiB.
---
Given the above, I'm inclined to stick with the "many cg" approach, as
it has less jitter and is reasonably performant. I need to wrap up a
parallel veristat version anyway (and "many cg" should be easier to
manage for a parallel run).
---
[1] https://github.com/eddyz87/bpf/tree/veristat-memory-accounting.one-cg
P.S.
The only difference between [1] and my initial experiments is that I
used dprintf instead of pwrite to access memory.{peak,reclaim},
¯\_(ツ)_/¯.
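For the record, the only visible difference between the two calls is
that dprintf() writes at the current (advancing) file offset, while
pwrite() writes at an explicit offset; a two-line illustration (the fd
name is mine):

	dprintf(peak_fd, "reset\n");		/* offset advances on each call */
	pwrite(peak_fd, "reset\n", 6, 0);	/* always writes at offset 0 */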