From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 917E828E567 for ; Wed, 28 May 2025 17:14:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748452465; cv=none; b=dlEf21wTVZLokZ7TkbaBgG0wf9RWGsg3LXycDEh9LKqa6Ix4rY+YauRVQYibeQIm85H58jviWNqp058S3x2wD4SB3xGfqv9HH0MbFAYx5oK0x+H97dpm0Iv9wNwt03/o1oZ4y+C3cnttvAZXX1EhiKdloloybx1dfOFMcmkITrQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748452465; c=relaxed/simple; bh=5V8sHcKdX0ZloH6uba8PkmXe7ZxbXS2g7lOsFOQ4DIs=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=XKfVrUumACPN+hDSDAtfjURaBM+SP3Puw7MnLupZE+HiW0ceWlpZYSQGt138c29eS9FKGDnP2W2SCr8E+bqbGaKon9vm538SpNnsZ59f53WBZiNsuCWyA3xkJxHxhG67uYuKKVMhlpB5MBL/AEhNka77uZiIakacJ4ohyCHV0lE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XEAvzNsx; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XEAvzNsx" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1748452461; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Af1cQmnUWBy6xyFGLsztGXRxWkyoQy2XklFMCkPpofg=; b=XEAvzNsxGFOSaHLy4EQ5s2VDU0u3cA1QbJDBPB2CfbhxyrhFLuy+3cc+GKORPzaHRURqC9 Q+XMsOSwApkMwUZeFQaEopmUb5svTGiFxX3uM+Ma0Oi6ZlwgmDWIF/BGIvn3X5hO7riKPO N/hrrQnxtErblELqqNJAiw3CZoKMPv8= Received: from mail-ej1-f70.google.com (mail-ej1-f70.google.com [209.85.218.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-551-ooxuzV8oNOKTGLBAFHZJ7w-1; Wed, 28 May 2025 13:14:19 -0400 X-MC-Unique: ooxuzV8oNOKTGLBAFHZJ7w-1 X-Mimecast-MFC-AGG-ID: ooxuzV8oNOKTGLBAFHZJ7w_1748452458 Received: by mail-ej1-f70.google.com with SMTP id a640c23a62f3a-ad56c5412f2so396855066b.1 for ; Wed, 28 May 2025 10:14:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748452458; x=1749057258; h=content-transfer-encoding:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Af1cQmnUWBy6xyFGLsztGXRxWkyoQy2XklFMCkPpofg=; b=kgrihrqZCugCNJHrE9U7aWYhkQXDShnj/WSqete3FPJDlUlE8e0V7szn2okYh8iZQu YEhnGZuxESqPcQcj5t02YCd/DDkXWQzoexBYD1JnygsKv5DLmIKTCu2JIUZUKuiZ/OS8 Otdg+IZlagntne799EAYB9wKO+aWjtUWb3fFeI0j8SP5Ccl5krxwEBCLSj/eGYQTHAi5 FsU9frE5eXike2Tbni3S7UMOfFlA8N/Hg5zI0gCL06WLrXmEBZJBn4aFbsIPRChO8cLl q+zj7IxkDaxuKgz9BV632PakXTNOSke6pIXEJ+9UDv10c3727AdenqJSKptRI1psr3l4 ol3Q== X-Forwarded-Encrypted: i=1; AJvYcCV/vIzCAvMK2OTt9whl6SQc3HLUCLkqoGhaeh5x7KGDhDssZYgkL64RYi+zAU+n/zOCS6bk65wiMPfeaepbTGU=@vger.kernel.org X-Gm-Message-State: AOJu0YwB0Ff/Va+9whqq/17pqL+NlYfE8asgx8iymQqsuY4FhM6nRhLY BZguYp1Qf3dAnYot1guHb8Cvqp2cSmN77gWqORSk9MCAqfo/fl+I6xvdGuQqjAmb+hekGYZTLVz unBnBMt3RiHHVLUPpG/2LB56xtsl7oQtrbhIS9fm2b2tR64E6oi8QagSTGQZvnhc9+BvygHcIjy AGkWPj X-Gm-Gg: ASbGncs6+tS6yUsEZp41gJbFULSmUhsNbFY1rMv9MRZqGQKtnXdKHzz13Q/NFSHBw8p FUXaXke7Ec31K05Eqqw+8M9M345MyvrH7ceJ3fgvVZJUmNrCr67h+/ZD4f4E6fQdoTXKgRxFW25 9P28PId7NDuVAIrqWDZvfWYS/q3ttc8ffuOkQ0TcMSVKu29ksNrig7S5CC80GhrEOunlhkfCAnf qzuc5H4rb+1LUzTVq7Sqn1CM4mP3hNk78N+2W6Fz5C2uVXeVvqMjng1O1ur2CoTGJ8qbgbbuG8/ sdYIvjeFr7qgGHqjw23MbFY8T5yExW3aILiD X-Received: by 2002:a17:906:dc8a:b0:ad2:28be:9a16 with SMTP id a640c23a62f3a-ad85b329c4bmr1741952366b.51.1748452458105; Wed, 28 May 2025 10:14:18 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEdcbEeJUfJ8I7m21/YTxGQ8eHMpRz88wZmwa7gj2fZublgDvyXBqKeXaE7YhDftP+Dy8Xh+A== X-Received: by 2002:a17:906:dc8a:b0:ad2:28be:9a16 with SMTP id a640c23a62f3a-ad85b329c4bmr1741947766b.51.1748452457601; Wed, 28 May 2025 10:14:17 -0700 (PDT) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-ad8a19acb5esm139084866b.5.2025.05.28.10.14.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 May 2025 10:14:16 -0700 (PDT) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id D69BF1AA8976; Wed, 28 May 2025 19:14:15 +0200 (CEST) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Arnaldo Carvalho de Melo Cc: Mina Almasry , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Jesper Dangaard Brouer , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , Shuah Khan , Ilias Apalodimas Subject: Re: [PATCH RFC net-next v2] page_pool: import Jesper's page_pool benchmark In-Reply-To: References: <20250525034354.258247-1-almasrymina@google.com> <87iklna61r.fsf@toke.dk> <87h615m6cp.fsf@toke.dk> X-Clacks-Overhead: GNU Terry Pratchett Date: Wed, 28 May 2025 19:14:15 +0200 Message-ID: <87cybsr72w.fsf@toke.dk> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Arnaldo Carvalho de Melo writes: > On Wed, May 28, 2025 at 11:28:54AM +0200, Toke H=C3=B8iland-J=C3=B8rgense= n wrote: >> Mina Almasry writes: >> > On Mon, May 26, 2025 at 5:51=E2=80=AFAM Toke H=C3=B8iland-J=C3=B8rgens= en wrote: >> >> Back when you posted the first RFC, Jesper and I chatted about ways to >> >> avoid the ugly "load module and read the output from dmesg" interface= to >> >> the test. > >> > I agree the existing interface is ugly. > >> >> One idea we came up with was to make the module include only the "inn= er" >> >> functions for the benchmark, and expose those to BPF as kfuncs. Then = the >> >> test runner can be a BPF program that runs the tests, collects the da= ta >> >> and passes it to userspace via maps or a ringbuffer or something. Tha= t's >> >> a nicer and more customisable interface than the printk output. And if >> >> they're small enough, maybe we could even include the functions into = the >> >> page_pool code itself, instead of in a separate benchmark module? > >> >> WDYT of that idea? :) > >> > ...but this sounds like an enormous amount of effort, for something >> > that is a bit ugly but isn't THAT bad. Especially for me, I'm not that >> > much of an expert that I know how to implement what you're referring >> > to off the top of my head. I normally am open to spending time but >> > this is not that high on my todolist and I have limited bandwidth to >> > resolve this :( > >> > I also feel that this is something that could be improved post merge. > > agreed > >> > I think it's very beneficial to have this merged in some form that can >> > be improved later. Byungchul is making a lot of changes to these mm >> > things and it would be nice to have an easy way to run the benchmark >> > in tree and maybe even get automated results from nipa. If we could >> > agree on mvp that is appropriate to merge without too much scope creep >> > that would be ideal from my side at least. >=20=20 >> Right, fair. I guess we can merge it as-is, and then investigate whether >> we can move it to BPF-based (or maybe 'perf bench' - Cc acme) later :) > > tldr; I'd advise to merge it as-is, then kfunc'ify parts of it and use > it from a 'perf bench' suite. > > Yeah, the model would be what I did for uprobes, but even then there is > a selftests based uprobes benchmark ;-) > > The 'perf bench' part, that calls into the skel: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/t= ools/perf/bench/uprobe.c > > The skel: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/t= ools/perf/util/bpf_skel/bench_uprobe.bpf.c > > While this one is just to generate BPF load to measure the impact on > uprobes, for your case it would involve using a ring buffer to > communicate from the skel (BPF/kernel side) to the userspace part, > similar to what is done in various other BPF based perf tooling > available in: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/t= ools/perf/util/bpf_skel > > Like at this line (BPF skel part): > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/= tree/tools/perf/util/bpf_skel/off_cpu.bpf.c?h=3Dperf-tools-next#n253 > > The simplest part is in the canonical, standalone runqslower tool, also > hosted in the kernel sources: > > BPF skel sending stuff to userspace: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/t= ools/bpf/runqslower/runqslower.bpf.c#n99 > > The userspace part that reads it: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/t= ools/bpf/runqslower/runqslower.c#n90 > > This is a callback that gets called for every event that the BPF skel > produces, called from this loop: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/t= ools/bpf/runqslower/runqslower.c#n162 > > That handle_event callback was associated via: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/t= ools/bpf/runqslower/runqslower.c#n153 > > There is a dissection I did about this process a long time ago, but > still relevant, I think: > > http://oldvger.kernel.org/~acme/bpf/devconf.cz-2020-BPF-The-Status-of-BTF= -producers-consumers/#/33 > > The part explaining the interaction userspace/kernel starts here: > > http://oldvger.kernel.org/~acme/bpf/devconf.cz-2020-BPF-The-Status-of-BTF= -producers-consumers/#/40 > > (yeah, its http, but then, its _old_vger ;-) > > Doing it in perf is interesting because it gets widely packaged, so > whatever you add to it gets visibility for people using 'perf bench' and > also gets available in most places, it would add to this collection: > > root@number:~# perf bench > Usage:=20 > perf bench [] [] > > # List of all available benchmark collections: > > sched: Scheduler and IPC benchmarks > syscall: System call benchmarks > mem: Memory access benchmarks > numa: NUMA scheduling and MM benchmarks > futex: Futex stressing benchmarks > epoll: Epoll stressing benchmarks > internals: Perf-internals benchmarks > breakpoint: Breakpoint benchmarks > uprobe: uprobe benchmarks > all: All benchmarks > > root@number:~# > > the 'perf bench' that uses BPF skel: > > root@number:~# perf bench uprobe baseline > # Running 'uprobe/baseline' benchmark: > # Executed 1,000 usleep(1000) calls > Total time: 1,050,383 usecs > > 1,050.383 usecs/op > root@number:~# perf trace --summary perf bench uprobe trace_printk > # Running 'uprobe/trace_printk' benchmark: > # Executed 1,000 usleep(1000) calls > Total time: 1,053,082 usecs > > 1,053.082 usecs/op > > Summary of events: > > uprobe-trace_pr (1247691), 3316 events, 96.9% > > syscall calls errors total min avg max = stddev > (msec) (msec) (msec) (msec)= (%) > --------------- -------- ------ -------- --------- --------- --------= - ------ > clock_nanosleep 1000 0 1101.236 1.007 1.101 50.93= 9 4.53% > close 98 0 32.979 0.001 0.337 32.82= 1 99.52% > perf_event_open 1 0 18.691 18.691 18.691 18.69= 1 0.00% > mmap 209 0 0.567 0.001 0.003 0.00= 7 2.59% > bpf 38 2 0.380 0.000 0.010 0.09= 2 28.38% > openat 65 0 0.171 0.001 0.003 0.01= 2 7.14% > mprotect 56 0 0.141 0.001 0.003 0.00= 8 6.86% > read 68 0 0.082 0.001 0.001 0.01= 0 11.60% > fstat 65 0 0.056 0.001 0.001 0.00= 3 5.40% > brk 10 0 0.050 0.001 0.005 0.01= 2 24.29% > pread64 8 0 0.042 0.001 0.005 0.02= 1 49.29% > > > root@number:~# Cool, thanks for the pointers! Guess we'd need to restructure the functions to be benchmarked a bit, but that should be doable I guess. -Toke