From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B70ED289371 for ; Fri, 6 Jun 2025 17:03:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.169 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749229428; cv=none; b=WxqDczx6Q79Q//5+zRnggjhrI25ZsMmh6ww0ZUadjwgfZOSPQ2usNmAABHT/Q0yI95x3IAcIS04LCWz4uSizcBWTsm7MjuCQgPo3uuieMJ9urfF2xB5631cJmQhx6JmBA+aFFGAdMbHNgYtbnKQ0eSI1ovdJe8RcufkbIT5adOQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749229428; c=relaxed/simple; bh=qEImhkaJ7GLgdGLXf2Ew5YolnI5XHp6SrkSsK0gFk30=; h=Message-ID:Subject:From:To:Cc:Date:In-Reply-To:References: Content-Type:MIME-Version; b=IsqG+kmSaRi/l148Ev1fHpl37RxvepKbnqAeK2m21NPsfObqiXq8L1RZT6ljjdACr9MeGpZRK9gchEVdOXeJK4mc+xWXif8Icc6DiGRQuA3qhjnu5Tq4EPbdOf7+nWX5Yw3mth9xbaay1eC0SFqTyVZyelwL1SJMMEkAk0NWPeM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=XGRxtxt0; arc=none smtp.client-ip=209.85.210.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="XGRxtxt0" Received: by mail-pf1-f169.google.com with SMTP id d2e1a72fcca58-7482377b086so979379b3a.1 for ; Fri, 06 Jun 2025 10:03:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749229426; x=1749834226; darn=vger.kernel.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=5z9/VSGextZfTY0qbpGtrbHUZngDfdst5gligRDVd4M=; b=XGRxtxt0nMZHID2DVgrn7XTq6cDgQ6uNYPMGgjVEuuDiNgY6mLOS2YuudW6XVooDF0 uhGqAJck9tshScn8CIUJBY+q5IpA/+NTC0JmWwsjrJvstQo65AjQc20hjzaTOA3SXoD3 SECbgCQDK95hdqwxiKij9Zyi4mL8//7jD981omjH0TZcCpBICmA1fg3PlLenSFUXPvSS O5d0wQCI3kpPRlJhi1iCXhcQ7uIRizvUaySGaNF5r9+T3W1Ij8wwqXusyLc0W1IS1uGF +2tp9wBVhl8aXAmv01kQ0Vov6oOO/l3S1PjVnYCRybiIME90+Pck8OtKekRKF2t8pA72 xXAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749229426; x=1749834226; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=5z9/VSGextZfTY0qbpGtrbHUZngDfdst5gligRDVd4M=; b=My6Y3LzsPEmfv0FfZY3fOQ4kXLAAjq9EzX6xbHZuKVYMbj4AyDpIe9ojCXAGCQAsdL p0o1SjjrXUAR9KnMEZBbGtbYQ7kteTFW3iQh0rU59Pm2viiGNfu5eJfHCWniAc4FH7Il AD6ptPlWevRU7kZAuHkgJvPquxe2kJNAAPyHaI6nEzv7BvauIzQgs+tgKApTimym6gl+ iZy3c1camKC4y1F0H0pAGFhI5t3z4tlnDyv3jbkr3Gvq1COi79injac095c+M0acQ2Bg C4LU1SIFbffuXILrzQ7KhL4AcjNQBPp5ZmuoJToHc7gidm0a9A9HD+DuhuiJvZNyfbhy Sf1A== X-Forwarded-Encrypted: i=1; AJvYcCV3bwIQvPp78C/372b/g982Uy/ulasVV/CvUNWgswHckBhfyjSt8k8PZNy2eNwASJlBBnM=@vger.kernel.org X-Gm-Message-State: AOJu0YyI7GgRV3Yohbpu21itDcl1kmB4vJIdjCKepVpP+6WJBlvnwDjN waFzbTdtz/DMBbliQKbu1UdblAYniuVriloC8bKo9Ae6SXvzzfJsuYEw X-Gm-Gg: ASbGncvT6ocRMknG5H9ee2uYIcJAWKge9Bs0pf2wYSe4XbB5O+LEiB7r90QOpTpr8Os k85mVG+8NgI4HjmD/M4v70nN2jKo8hTzvsFR+EOBob7UQ0vb6IZNXm3XlOjy1lyWv8ydxRwJcA+ m60CNHA/MC8NsmGez7e1TTSTxldlr6mvFSS5F+lNbQKUlSIo3QfdDfgCs2JrirHDeWDm2rh7oOZ 2X2JO1PTk/hpcREi7tVTjnVU7giR1TuvLBFbgns8vEY3HxuAlg7KLjUSVpoV6YJwMhEnODRFRFx Nd/JMe8Wy1q5CcBwn9rPW1Ylv4XJ89Pxk9tscNcLfT9z/zc= X-Google-Smtp-Source: AGHT+IGxslJZOQ08d+hHOamBCoKip7MKXQOdV5a5gPa9BgnqlsfJbnRBL46E/S4KSgdDi7wA7rOVlA== X-Received: by 2002:a05:6a00:2343:b0:746:2a0b:3dc8 with SMTP id d2e1a72fcca58-74827f10ac0mr5315326b3a.17.1749229425658; Fri, 06 Jun 2025 10:03:45 -0700 (PDT) Received: from [192.168.0.56] ([38.34.87.7]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7482b0bd815sm1543472b3a.109.2025.06.06.10.03.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Jun 2025 10:03:45 -0700 (PDT) Message-ID: Subject: Re: [PATCH bpf-next v1 2/2] veristat: memory accounting for bpf programs From: Eduard Zingerman To: Mykyta Yatsenko , bpf@vger.kernel.org, ast@kernel.org Cc: andrii@kernel.org, daniel@iogearbox.net, martin.lau@linux.dev, kernel-team@fb.com, yonghong.song@linux.dev Date: Fri, 06 Jun 2025 10:03:42 -0700 In-Reply-To: <3dd16f19-63a4-4090-abd0-9b84fb07346b@gmail.com> References: <20250605230609.1444980-1-eddyz87@gmail.com> <20250605230609.1444980-3-eddyz87@gmail.com> <3dd16f19-63a4-4090-abd0-9b84fb07346b@gmail.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.54.3 (3.54.3-1.fc41) Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 On Fri, 2025-06-06 at 17:53 +0100, Mykyta Yatsenko wrote: [...] > > +/* > > + * Creates a cgroup at /tmp/veristat-cgroup-mount-XXXXXX/accounting-, > > + * moves current process to this cgroup. > > + */ > > +static int create_stat_cgroup(void) > > +{ > > + char buf[PATH_MAX + 1]; > > + int err; > > + > > + if (!env.cgroup_fs_mount[0]) > > + return -1; > > + > > + env.memory_peak_fd =3D -1; > > + > > + snprintf_trunc(buf, sizeof(buf), "%s/accounting-%d", env.cgroup_fs_mo= unt, getpid()); > > + err =3D mkdir(buf, 0777); > > + if (err < 0) { > > + err =3D log_errno("mkdir(%s)", buf); > > + goto err_out; > > + } > > + strcpy(env.stat_cgroup, buf); > > + > > + snprintf_trunc(buf, sizeof(buf), "%s/cgroup.procs", env.stat_cgroup); > > + err =3D write_one_line(buf, "%d\n", getpid()); > > + if (err < 0) { > > + err =3D log_errno("echo %d > %s", getpid(), buf); > > + goto err_out; > > + } > > + > > + snprintf_trunc(buf, sizeof(buf), "%s/memory.peak", env.stat_cgroup); > > + env.memory_peak_fd =3D open(buf, O_RDWR | O_APPEND); > > Why is it necessary to open in RW|APPEND mode? Won't O_RDONLY cut it? With current implementation -- not necessary, O_RDONLY should be sufficient= . > > + if (env.memory_peak_fd < 0) { > > + err =3D log_errno("open(%s)", buf); > > + goto err_out; > > + } > > + > > + return 0; > > + > > +err_out: > > + destroy_stat_cgroup(); > > + return err; > > +} [...] > > +/* Current value of /tmp/veristat-cgroup-mount-XXXXXX/accounting-= /memory.peak */ > > +static long cgroup_memory_peak(void) > > +{ > > + long err, memory_peak; > > + char buf[32]; > > + > > + if (env.memory_peak_fd < 0) > > + return -1; > > + > > + err =3D pread(env.memory_peak_fd, buf, sizeof(buf) - 1, 0); > > + if (err <=3D 0) { > > + log_errno("read(%s/memory.peak)", env.stat_cgroup); > > + return -1; > > + } > > + > > + buf[err] =3D 0; > > nit: maybe rename err to len here? Sure, will rename. > > + errno =3D 0; > > + memory_peak =3D strtoll(buf, NULL, 10); > > + if (errno) { > > + log_errno("unrecognized %s/memory.peak format: %s", env.stat_cgroup,= buf); > > + return -1; > > + } > > + > > + return memory_peak; > > +} > > + [...] > > @@ -1332,7 +1551,16 @@ static int process_prog(const char *filename, st= ruct bpf_object *obj, struct bpf > > if (env.force_reg_invariants) > > bpf_program__set_flags(prog, bpf_program__flags(prog) | BPF_F_TEST_= REG_INVARIANTS); > > =20 > > - err =3D bpf_object__load(obj); > > + err =3D bpf_object__prepare(obj); > > + if (!err) { > > + cgroup_err =3D create_stat_cgroup(); > > + mem_peak_a =3D cgroup_memory_peak(); > > + err =3D bpf_object__load(obj); > > + mem_peak_b =3D cgroup_memory_peak(); > > + destroy_stat_cgroup(); > > What if we do create_stat_cgroup/destory_stat_cgroup in=20 > handle_verif_mode along with mount/umount_cgroupfs. > It may speed things up a little bit here and moving all cgroup=20 > preparations into the single place seems reasonable. > Will memory counter behave differently? We are checking the difference= =20 > around bpf_object__load, from layman's > perspective it should be the same. The memory.peak file accounts for peak memory consumption, so one would need to reset this counter between program verifications. Doc [1] describes such mechanism: memory.peak A read-write single value file which exists on non-root cgroups. The max memory usage recorded for the cgroup and its descendants since either the creation of the cgroup or the most recent reset for that FD. A write of any non-empty string to this file resets it to the current memory usage for subsequent reads through the same file descriptor. memory.reclaim A write-only nested-keyed file which exists for all cgroups. =20 This is a simple interface to trigger memory reclaim in the target cgroup. =20 Example: =20 echo "1G" > memory.reclaim =20 Please note that the kernel can over or under reclaim from the target cgroup. If less bytes are reclaimed than the specified amount, -EAGAIN is returned. As mentioned in cover letter, I tried using a combination of the above, while creating a cgroup only once. For reasons I don't understand this did not produce stable measurements. E.g. depending on a program being verified in isolation or as part of a batch results vary from 5mb to 2mb. [1] https://docs.kernel.org/admin-guide/cgroup-v2.html > > + if (!cgroup_err && mem_peak_a >=3D 0 && mem_peak_b >=3D 0) > > + mem_peak =3D mem_peak_b - mem_peak_a; > > + } > > env.progs_processed++; > > =20 > > stats->file_name =3D strdup(base_filename); > > @@ -1341,6 +1569,7 @@ static int process_prog(const char *filename, str= uct bpf_object *obj, struct bpf [...] > Acked-by: Mykyta Yatsenko Thank you. I will have to send a v2 avoiding mount operations and controllers file modification.