Re: [PATCH bpf-next v1 2/2] veristat: memory accounting for bpf programs

From: Eduard Zingerman <eddyz87@gmail.com>
To: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>,
	bpf@vger.kernel.org,  ast@kernel.org
Cc: andrii@kernel.org, daniel@iogearbox.net, martin.lau@linux.dev,
	 kernel-team@fb.com, yonghong.song@linux.dev
Subject: Re: [PATCH bpf-next v1 2/2] veristat: memory accounting for bpf programs
Date: Fri, 06 Jun 2025 10:03:42 -0700
Message-ID: <efe0cc259f70b11ffd3e398441efd0de5aa98c3e.camel@gmail.com>
In-Reply-To: <3dd16f19-63a4-4090-abd0-9b84fb07346b@gmail.com>

On Fri, 2025-06-06 at 17:53 +0100, Mykyta Yatsenko wrote:

[...]

> > +/*
> > + * Creates a cgroup at /tmp/veristat-cgroup-mount-XXXXXX/accounting-<pid>,
> > + * moves current process to this cgroup.
> > + */
> > +static int create_stat_cgroup(void)
> > +{
> > +	char buf[PATH_MAX + 1];
> > +	int err;
> > +
> > +	if (!env.cgroup_fs_mount[0])
> > +		return -1;
> > +
> > +	env.memory_peak_fd = -1;
> > +
> > +	snprintf_trunc(buf, sizeof(buf), "%s/accounting-%d", env.cgroup_fs_mount, getpid());
> > +	err = mkdir(buf, 0777);
> > +	if (err < 0) {
> > +		err = log_errno("mkdir(%s)", buf);
> > +		goto err_out;
> > +	}
> > +	strcpy(env.stat_cgroup, buf);
> > +
> > +	snprintf_trunc(buf, sizeof(buf), "%s/cgroup.procs", env.stat_cgroup);
> > +	err = write_one_line(buf, "%d\n", getpid());
> > +	if (err < 0) {
> > +		err = log_errno("echo %d > %s", getpid(), buf);
> > +		goto err_out;
> > +	}
> > +
> > +	snprintf_trunc(buf, sizeof(buf), "%s/memory.peak", env.stat_cgroup);
> > +	env.memory_peak_fd = open(buf, O_RDWR | O_APPEND);
>
> Why is it necessary to open in RW|APPEND mode? Won't O_RDONLY cut it?

With the current implementation it is not necessary; O_RDONLY should be sufficient.

> > +	if (env.memory_peak_fd < 0) {
> > +		err = log_errno("open(%s)", buf);
> > +		goto err_out;
> > +	}
> > +
> > +	return 0;
> > +
> > +err_out:
> > +	destroy_stat_cgroup();
> > +	return err;
> > +}

[...]

> > +/* Current value of /tmp/veristat-cgroup-mount-XXXXXX/accounting-<pid>/memory.peak */
> > +static long cgroup_memory_peak(void)
> > +{
> > +	long err, memory_peak;
> > +	char buf[32];
> > +
> > +	if (env.memory_peak_fd < 0)
> > +		return -1;
> > +
> > +	err = pread(env.memory_peak_fd, buf, sizeof(buf) - 1, 0);
> > +	if (err <= 0) {
> > +		log_errno("read(%s/memory.peak)", env.stat_cgroup);
> > +		return -1;
> > +	}
> > +
> > +	buf[err] = 0;
>
> nit: maybe rename err to len here?

Sure, will rename.
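
For illustration, a rough sketch of how the read could look with the
rename applied. The helper names parse_peak_value() and read_peak_value()
are hypothetical and not part of the patch, and the end-pointer check is
an extra assumption on my side: strtoll() sets errno only on overflow, so
an empty or non-numeric buffer would otherwise parse as 0.

```c
#define _GNU_SOURCE /* exposes pread() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the patch: parse a non-negative decimal
 * value that ends at the string end or at a newline. Returns -1 on failure.
 */
static long long parse_peak_value(const char *s)
{
	char *end;
	long long val;

	assert(s != NULL);

	errno = 0;
	val = strtoll(s, &end, 10);
	/* reject overflow, empty input and trailing garbage */
	if (errno || end == s || (*end != '\0' && *end != '\n'))
		return -1;
	return val < 0 ? -1 : val;
}

/* Read the value at offset 0 of fd; 'len' is the byte count from pread(). */
static long long read_peak_value(int fd)
{
	char buf[32];
	ssize_t len;

	len = pread(fd, buf, sizeof(buf) - 1, 0);
	if (len <= 0)
		return -1;
	buf[len] = 0;

	return parse_peak_value(buf);
}
```

Splitting the parse step out keeps the string handling checkable without
a cgroup, e.g. parse_peak_value("12abc") is rejected instead of being
read as 12.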

> > +	errno = 0;
> > +	memory_peak = strtoll(buf, NULL, 10);
> > +	if (errno) {
> > +		log_errno("unrecognized %s/memory.peak format: %s", env.stat_cgroup, buf);
> > +		return -1;
> > +	}
> > +
> > +	return memory_peak;
> > +}
> > +

[...]

> > @@ -1332,7 +1551,16 @@ static int process_prog(const char *filename, struct bpf_object *obj, struct bpf
> >   	if (env.force_reg_invariants)
> >   		bpf_program__set_flags(prog, bpf_program__flags(prog) | BPF_F_TEST_REG_INVARIANTS);
> >   
> > -	err = bpf_object__load(obj);
> > +	err = bpf_object__prepare(obj);
> > +	if (!err) {
> > +		cgroup_err = create_stat_cgroup();
> > +		mem_peak_a = cgroup_memory_peak();
> > +		err = bpf_object__load(obj);
> > +		mem_peak_b = cgroup_memory_peak();
> > +		destroy_stat_cgroup();
>
> What if we do create_stat_cgroup/destroy_stat_cgroup in handle_verif_mode,
> along with mount/umount_cgroupfs? It may speed things up a little here, and
> moving all cgroup preparation into a single place seems reasonable.
> Will the memory counter behave differently? We are checking the difference
> around bpf_object__load, so from a layman's perspective it should be the same.

The memory.peak file accounts for peak memory consumption, so one
would need to reset this counter between program verifications.
Doc [1] describes such a mechanism:

    memory.peak

      A read-write single value file which exists on non-root cgroups.

      The max memory usage recorded for the cgroup and its descendants
      since either the creation of the cgroup or the most recent reset
      for that FD.

      A write of any non-empty string to this file resets it to the
      current memory usage for subsequent reads through the same file
      descriptor.

    memory.reclaim

      A write-only nested-keyed file which exists for all cgroups.
    
      This is a simple interface to trigger memory reclaim in the target
      cgroup.
    
      Example:
    
        echo "1G" > memory.reclaim
    
      Please note that the kernel can over or under reclaim from the
      target cgroup. If less bytes are reclaimed than the specified
      amount, -EAGAIN is returned.

As mentioned in the cover letter, I tried using a combination of the above
while creating the cgroup only once. For reasons I don't understand, this
did not produce stable measurements. E.g. depending on whether a program is
verified in isolation or as part of a batch, results vary from 5MB to 2MB.

[1] https://docs.kernel.org/admin-guide/cgroup-v2.html
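
For reference, the fd-scoped reset described in the quoted documentation
could be driven roughly as below. This is only a sketch of the documented
interface, not what v1 does, and the helper names peak_open(),
peak_reset() and peak_read() are hypothetical. It assumes a kernel that
implements the reset-on-write behaviour from [1]; the open flags are
copied from the patch. A read-only descriptor is enough for reading, the
writable one is only needed for the reset.

```c
#define _GNU_SOURCE /* exposes pread() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: open a memory.peak file so that it can be reset. */
static int peak_open(const char *path)
{
	assert(path != NULL);
	/* same flags as in the patch; the reset needs a writable fd */
	return open(path, O_RDWR | O_APPEND);
}

/* Per the doc, a write of any non-empty string resets the peak for this fd. */
static int peak_reset(int fd)
{
	static const char msg[] = "reset\n";
	ssize_t want = sizeof(msg) - 1;

	return write(fd, msg, want) == want ? 0 : -1;
}

/* Read the current value through the same fd; returns -1 on failure. */
static long long peak_read(int fd)
{
	char buf[32], *end;
	long long val;
	ssize_t len;

	len = pread(fd, buf, sizeof(buf) - 1, 0);
	if (len <= 0)
		return -1;
	buf[len] = 0;

	errno = 0;
	val = strtoll(buf, &end, 10);
	if (errno || end == buf || val < 0)
		return -1;
	return val;
}
```

The intended per-program sequence would be peak_reset(fd), then
bpf_object__load(), then peak_read(fd), all on one descriptor, since the
reset only affects reads through the fd that was written to.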

> > +		if (!cgroup_err && mem_peak_a >= 0 && mem_peak_b >= 0)
> > +			mem_peak = mem_peak_b - mem_peak_a;
> > +	}
> >   	env.progs_processed++;
> >   
> >   	stats->file_name = strdup(base_filename);
> > @@ -1341,6 +1569,7 @@ static int process_prog(const char *filename, struct bpf_object *obj, struct bpf

[...]

> Acked-by: Mykyta Yatsenko <yatsenko@meta.com>

Thank you. I will have to send a v2 that avoids mount operations and
modification of the controllers file.
