Date: Mon, 03 Jul 2023 14:52:09 -0700
From: John Fastabend
To: Hou Tao, bpf@vger.kernel.org, Martin KaFai Lau, Alexei Starovoitov
Cc: Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa, John Fastabend, "Paul E . McKenney", rcu@vger.kernel.org, houtao1@huawei.com
Message-ID: <64a34309e42aa_652052084f@john.notmuch>
In-Reply-To: <20230703141332.3319271-1-houtao@huaweicloud.com>
References: <20230703141332.3319271-1-houtao@huaweicloud.com>
Subject: RE: [PATCH bpf-next v8] selftests/bpf: Add benchmark for bpf memory allocator

Hou Tao wrote:
> From: Hou Tao
>
> The benchmark could be used to compare the performance of hash map
> operations and the memory usage between different flavors of bpf memory
> allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It also
> could be used to check the performance improvement or the memory saving
> provided by optimization.
>
> The benchmark creates a non-preallocated hash map which uses bpf memory
> allocator and shows the operation performance and the memory usage of
> the hash map under different use cases:
> (1) overwrite
> Each CPU overwrites nonoverlapping part of hash map. When each CPU
> completes overwriting of 64 elements in hash map, it increases the
> op_count.
> (2) batch_add_batch_del
> Each CPU adds then deletes nonoverlapping part of hash map in batch.
> When each CPU adds and deletes 64 elements in hash map, it increases
> the op_count twice.
> (3) add_del_on_diff_cpu
> Each two-CPUs pair adds and deletes nonoverlapping part of map
> cooperatively. When each CPU adds or deletes 64 elements in hash map,
> it will increase the op_count.
>
> The following is the benchmark results when comparing between different
> flavors of bpf memory allocator. These tests are conducted on a KVM guest
> with 8 CPUs and 16 GB memory. The command line below is used to do all
> the following benchmarks:
>
>   ./bench htab-mem --use-case $name ${OPTS} -w3 -d10 -a -p8
>
> These results show that preallocated hash map has both better performance
> and smaller memory footprint.
>
> (1) non-preallocated + no bpf memory allocator (v6.0.19)
> use kmalloc() + call_rcu
>
> overwrite            per-prod-op: 11.24 ± 0.07k/s, avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB
> batch_add_batch_del  per-prod-op: 18.45 ± 0.10k/s, avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB
> add_del_on_diff_cpu  per-prod-op: 14.50 ± 0.03k/s, avg mem: 4.64 ± 0.73MiB, peak mem: 7.20MiB
>
> (2) preallocated
> OPTS=--preallocated
>
> overwrite            per-prod-op: 191.92 ± 0.07k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
> batch_add_batch_del  per-prod-op: 218.10 ± 0.25k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
> add_del_on_diff_cpu  per-prod-op: 39.59 ± 0.41k/s, avg mem: 1.48 ± 0.11MiB, peak mem: 1.74MiB
>
> (3) normal bpf memory allocator
>
> overwrite            per-prod-op: 134.81 ± 0.22k/s, avg mem: 1.67 ± 0.12MiB, peak mem: 2.74MiB
> batch_add_batch_del  per-prod-op: 90.44 ± 0.34k/s, avg mem: 2.27 ± 0.00MiB, peak mem: 2.74MiB
> add_del_on_diff_cpu  per-prod-op: 28.20 ± 0.15k/s, avg mem: 1.73 ± 0.17MiB, peak mem: 2.06MiB

Acked-by: John Fastabend

> +
> +static error_t htab_mem_parse_arg(int key, char *arg, struct argp_state *state)
> +{
> +	switch (key) {
> +	case ARG_VALUE_SIZE:
> +		args.value_size = strtoul(arg, NULL, 10);
> +		if (args.value_size > 4096) {
> +			fprintf(stderr, "too big value size %u\n", args.value_size);
> +			argp_usage(state);
> +		}
> +		break;
> +	case ARG_USE_CASE:
> +		args.use_case = strdup(arg);

Might be worth checking for NULL and returning an error? This only matters
if we run from CI or something, where a failure here would look like a flake.

> +		break;
> +	case ARG_PREALLOCATED:
> +		args.preallocated = true;
> +		break;
> +	default:
> +		return ARGP_ERR_UNKNOWN;
> +	}
> +
> +	return 0;
> +}
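
To make the strdup() comment above concrete, here is a rough sketch of what such a check could look like. It is only an illustration under assumptions, not a proposed change to the patch: struct htab_mem_args and htab_mem_set_use_case() are hypothetical names made up for this example, and returning ENOMEM is just one plausible choice of error code.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std modes */

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the benchmark's argument struct. */
struct htab_mem_args {
	char *use_case;
};

/*
 * Copy arg into args->use_case.
 *
 * Returns 0 on success, or ENOMEM if strdup() fails, so that the caller
 * can report the error instead of carrying on with a NULL use_case.
 */
static int htab_mem_set_use_case(struct htab_mem_args *args, const char *arg)
{
	char *copy;

	assert(args && arg);

	copy = strdup(arg);
	if (!copy)
		return ENOMEM;

	/* Release a value left by an earlier --use-case, if any. */
	free(args->use_case);
	args->use_case = copy;
	return 0;
}
```

In the parser, the ARG_USE_CASE case could then return the helper's result when it is non-zero. As far as I recall, argp treats a non-zero errno value returned from the parser function as a parse failure, so the benchmark would stop early with an error rather than fail later in a way that looks like a flake.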