Linux Documentation


* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: Paolo Bonzini @ 2026-05-19 12:13 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Will Deacon, Marc Zyngier, Jonathan Corbet, Shuah Khan, kvm,
	Linux Doc Mailing List, Kernel Mailing List, Linux,
	Sean Christopherson, Jim Mattson, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Raghavendra Rao Ananta, Eric Auger, Kees Cook, Arnd Bergmann,
	Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <cf429f2082e863571595f74d1d3dedc3e6a82964.camel@infradead.org>

On Tue, May 19, 2026 at 1:44 PM David Woodhouse <dwmw2@infradead.org> wrote:
> > > So... what next? Is one of the other KVM/arm64 maintainers going to
> > > speak up? Paolo would you consider taking the fixes through your tree
> > > directly?

I admit that my knowledge of Arm is really limited, and I do not
understand which IIDR values have architecturally allowed behaviors
and which (if any) were made up by KVM; but even if I cannot honestly
remark on the code or even the approach, a compatibility knob is the
right thing to have.  That's a userspace API design matter, not an Arm
or GIC matter.

I hope that Marc provides a better explanation of why he believes
https://lore.kernel.org/all/20260511113558.3325004-2-dwmw2@infradead.org/
shouldn't be accepted, because I am more than a bit puzzled about
*why* that patch is being rejected or (in v3) so far ignored. Marc in
this thread wrote: "If userspace is not a total joke, it will read all
the ID registers, and configure what it wants to see, assuming it is a
feature that can be configured (not everything can, because the
architecture itself is not fully backward compatible)". But in this
case there's an ID register that tells KVM if userspace wants the old
or the new behavior, independent of whether that old behavior is
architecturally valid or not.

I will certainly take this patch, but I won't override Marc. However
I'd like to better understand his point of view, because right now I
just don't get it.

> If KVM on arm64 doesn't aspire to maintain guest compatibility across
> host kernel changes — regardless of whether the previous kernel's
> behaviour was "blessed" by the architecture specification or not — then
> it does not meet the expectation that we have of KVM implementations in
> the Linux kernel.

I agree with the "aspire" wording. Even if it's not going to be 100%
achievable, KVM *needs* to aspire to maintain both guest compatibility
and architecture precision. Sometimes it's impossible, sometimes there
are constraints that require you to trade off one for another (e.g.
via quirks, or by breaking behavior that no sane guest would have
cared about). But in general as a maintainer you don't *get* to
choose.

Paolo

> Or indeed the standards that we've held for Linux kernel ABIs for the
> last 35 years.


* Re: [PATCH v3] killswitch: add per-function short-circuit mitigation primitive
From: Daniel Borkmann @ 2026-05-19 12:13 UTC (permalink / raw)
  To: Song Liu, Sasha Levin
  Cc: linux-kernel, linux-doc, linux-kselftest, bpf, live-patching,
	Greg Kroah-Hartman, Andrew Morton, Jonathan Corbet,
	Mathieu Desnoyers, Joshua Peisach, Florian Weimer, Breno Leitao,
	Anthony Iliopoulos, Michal Hocko, Jiri Olsa, John Fastabend,
	Christian Brauner
In-Reply-To: <CAPhsuW44UX663Au=WwHz8MVwnQgLkjxOqpJSCKxNiv3=RpZvqw@mail.gmail.com>

On 5/19/26 1:59 AM, Song Liu wrote:
> On Mon, May 18, 2026 at 6:33 AM Sasha Levin <sashal@kernel.org> wrote:
>> On Sun, May 17, 2026 at 11:37:36PM -0700, Song Liu wrote:
>>> On Sun, May 17, 2026 at 6:49 AM Sasha Levin <sashal@kernel.org> wrote:
>>>> * fail_function (CONFIG_FUNCTION_ERROR_INJECTION) is disabled in
>>>>    most production kernels. Even where enabled, it only works on
>>>>    functions pre-annotated with ALLOW_ERROR_INJECTION() in source -
>>>>    no help for a freshly-disclosed CVE. The debugfs UI is blocked by
>>>>    lockdown=integrity and the override is probabilistic.
>>>>
>>>> * BPF override (bpf_override_return) honors the same
>>>>    ALLOW_ERROR_INJECTION() whitelist, and BPF itself is off in many
>>>>    production kernels. Even where on, the operator interface is
>>>>    "load a verified BPF program," not a one-line write.
>>>
>>> If it is OK for killswitch to attach to any kernel functions, do we still
>>> need ALLOW_ERROR_INJECTION() for fail_function and BPF
>>> override? Shall we instead also allow fail_function and BPF override
>>> to attach to any kernel functions?
>>
>> I don't think so. ALLOW_ERROR_INJECTION is not a security mechanism, it's an
>> integrity/safety mechanism for both bpf and fault injection.
>>
>> It protects against a "developer or CI script doing legitimate fault injection
>> accidentally panics the box" scenario, not an "attacker gets in" one.
> 
> There really isn't a clear boundary between "security mechanism" and
> "non-security mechanism". As we are making killswitch available
> everywhere under root, users will soon learn to use it to do fault injection,
> and potentially much more scary things. (Think about agents with sudo
> access).

Fully agree with Song here that there is no clear boundary, and that the
killswitch could lead to arbitrary, hard-to-debug breakage if applied to
the wrong function, introducing worse bugs than the one being mitigated
or even /short-circuiting LSM enforcement/ (engage security_file_open 0,
engage cap_capable 0, engage apparmor_* etc).

The ALLOW_ERROR_INJECTION() provides a curated white-list where you may
return with an error without causing more severe damage (assuming the
error handling code is right). The right thing would be to more widely
apply ALLOW_ERROR_INJECTION() or to figure out a better way to safely
enable the latter without explicit function annotation.

Wrt BPF:

>>>> * BPF override (bpf_override_return) honors the same
>>>>    ALLOW_ERROR_INJECTION() whitelist, and BPF itself is off in many
>>>>    production kernels. Even where on, the operator interface is
>>>>    "load a verified BPF program," not a one-line write.

The claim that BPF itself is off in many production kernels is not really
true, where did you get that from? All the major distros and cloud providers
have BPF enabled these days, and even systemd ships BPF programs for
custom service firewalling etc.

The operator interface is loading a program vs. a one-line write, so we're
disregarding existing infra where you can already achieve the same, in favor
of a less safe one-liner convenience? (Similarly for the livepatch infra.)

If you need a one-liner: bpftrace -e 'kprobe:FUNC { override(RETVAL); }'
Alternatively, add an extension to systemd where you can just deploy a
list of functions, and it does the necessary work in the background and
persistently.

Also, what about other classes of bugs, like OOB access, UAFs, locking
issues, etc., which could then be used as a means of privilege escalation?
It feels like this proposal is a quick'n'dirty prototype via Claude as a
reaction to the copy fail bug, but the right solution would be to improve
the user space tooling as mentioned and the existing infra we have in the
kernel.


* Re: [PATCH v6 03/11] dt-bindings: mfd: add documentation for S2MU005 PMIC
From: Kaustabh Chakraborty @ 2026-05-19 12:07 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Conor Dooley, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <0240eb13-6c56-4879-8db7-b990a220a78f@kernel.org>

On 2026-05-18 12:23 +02:00, Krzysztof Kozlowski wrote:
> On 18/05/2026 11:45, Conor Dooley wrote:
>> On Mon, May 18, 2026 at 09:15:11AM +0200, Krzysztof Kozlowski wrote:
>>> On 17/05/2026 22:52, Conor Dooley wrote:
>>>> On Sun, May 17, 2026 at 06:39:37PM +0530, Kaustabh Chakraborty wrote:
>>>>>>>>>> +
>>>>>>>>> +    properties:
>>>>>>>>> +      compatible:
>>>>>>>>> +        const: samsung,s2mu005-rgb
>>>>>>>>> +
>>>>>>>>> +    required:
>>>>>>>>> +      - compatible
>>>>>>>>> +
>>>>>>>>> +    unevaluatedProperties: false
>>>>>>>>> +
>>>>>>>>> +  reg:
>>>>>>>>> +    maxItems: 1
>>>>>>>>
>>>>>>>> Move this above the child nodes please.
>>>>>>>
>>>>>>> But properties are sorted in lex order?
>>>>>>
>>>>>> Typically the binding is sorted in the same order as properties go in
>>>>>> nodes. Common stuff like reg/clocks/interrupts therefore ends up above
>>>>>> child nodes.
>>>>>
>>>>> So, do I change this? For one, I don't see the same being followed in
>>>>> other schemas of samsung in the same dir (not that I'm trying to pose it
>>>>> as an argument against your suggestion), and this was reviewed by
>>>>> Krzysztof and is addressed in v7.
>>>>
>>>> If Krzysztof doesn't care, then I won't ask you to change it.
>>>
>>> This builds on top of bindings for previous Samsung PMIC devices, so
>>> that's why it keeps the compatibles for children, I guess. No one
>>> complained about this at v1-v2 reviews, so when I joined reviewing in v3
>>> I did not, either.
>>>
>>> I don't think the compatible should be here, but I also don't want to
>>> stall that patchset. I understand that it is inconsistent review from my
>>> side, because other similar patchsets receive comment to drop the
>>> compatible. But I don't think we will be fair asking to drop the
>>> compatible now, when we did not ask for that in the early versions at all.
>> 
>> 
>> I think you misunderstood, we were talking about the ordering of the
>> properties in the binding file being alphanumerical, rather than the
>> more typical approach of approximately following the order of
>> dts-coding-style.
>
>
> Ah, then I misunderstood and, even though it is a nit, I do care because
> old code is then used for new patches. Bindings follow DTS rules, thus
> should be:
> 1. compatible
> 2. reg
> 3. core properties
> 4. vendor properties
>
> Kaustabh, can you change it please?

Ack, will do that in v8 then.

While at it, do you also want me to drop the multi-led compatible string?
So it would be:

  multi-led:
    $ref: /schemas/leds/leds-class-multicolor.yaml#

>
> Best regards,
> Krzysztof



* [PATCH v2 2/2] selftests/mm: rewrite gup_test as a standalone harness-based selftest
From: Sarthak Sharma @ 2026-05-19 12:05 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand
  Cc: Jonathan Corbet, Jason Gunthorpe, John Hubbard, Peter Xu,
	Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm,
	linux-kselftest, linux-kernel, linux-doc, Sarthak Sharma
In-Reply-To: <20260519120506.184512-1-sarthak.sharma@arm.com>

Rewrite gup_test.c using kselftest_harness.h. The new test covers 12
mapping configurations: THP on, THP off and hugetlb, each across
private/shared and read/write variants. It runs seven test cases per
variant: get_user_pages, get_user_pages_fast, pin_user_pages,
pin_user_pages_fast, pin_user_pages_longterm, and DUMP_USER_PAGES_TEST
via both get and pin.

Each test case sweeps four nr_pages_per_call values: 1, 512, 123, and
all pages. This preserves the old run_gup_matrix() sweep: 12 mapping
combinations x 5 GUP/PUP operations x 4 batch sizes = 240 ioctl sweeps.
It also expands DUMP_USER_PAGES_TEST coverage from one standalone
invocation to 12 variants x 2 dump modes x 4 batch sizes = 96
additional sweeps, for 336 total ioctl sweeps and 84 TAP-reported cases.

On a Radxa Orion O6 board, ./gup_test completes in 5.07s on average
over 10 runs (range: 4.94s - 5.18s).

Update run_vmtests.sh: remove run_gup_matrix() and the multiple flagged
invocations of gup_test, replacing them with a single unconditional
invocation. Benchmark functionality is handled by tools/mm/gup_bench
introduced in the previous patch.

Update Documentation/core-api/pin_user_pages.rst to reflect the new
harness-based gup_test interface rather than command-line flag
invocations.

Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
---
 Documentation/core-api/pin_user_pages.rst |  12 +-
 tools/testing/selftests/mm/gup_test.c     | 536 +++++++++++++---------
 tools/testing/selftests/mm/run_vmtests.sh |  37 +-
 3 files changed, 325 insertions(+), 260 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index c16ca163b55e..ea722adf22cc 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -230,10 +230,16 @@ This file::
 
  tools/testing/selftests/mm/gup_test.c
 
-has the following new calls to exercise the new pin*() wrapper functions:
+contains the following test cases to exercise pin_user_pages*():
 
-* PIN_FAST_BENCHMARK (./gup_test -a)
-* PIN_BASIC_TEST (./gup_test -b)
+* pin_user_pages via PIN_BASIC_TEST
+* pin_user_pages_fast via PIN_FAST_BENCHMARK
+* pin_user_pages_longterm via PIN_LONGTERM_BENCHMARK
+
+Run with::
+
+  make -C tools/testing/selftests/mm
+  ./tools/testing/selftests/mm/gup_test
 
 You can monitor how many total dma-pinned pages have been acquired and released
 since the system was booted, via two new /proc/vmstat entries: ::
diff --git a/tools/testing/selftests/mm/gup_test.c b/tools/testing/selftests/mm/gup_test.c
index 3f841a96f870..d60d48bb9126 100644
--- a/tools/testing/selftests/mm/gup_test.c
+++ b/tools/testing/selftests/mm/gup_test.c
@@ -9,267 +9,361 @@
 #include <sys/mman.h>
 #include <sys/stat.h>
 #include <sys/types.h>
-#include <pthread.h>
-#include <assert.h>
 #include <mm/gup_test.h>
 #include "kselftest.h"
 #include "vm_util.h"
 #include "hugepage_settings.h"
+#include "kselftest_harness.h"
 
 #define MB (1UL << 20)
 
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#endif
+
 /* Just the flags we need, copied from the kernel internals. */
 #define FOLL_WRITE	0x01	/* check pte is writable */
 
+/* Page counts exercising single, THP-batch, partial, and full-mapping GUP. */
+static const int nr_pages_list[] = { 1, 512, 123, -1 };
+
 #define GUP_TEST_FILE "/sys/kernel/debug/gup_test"
 
-static unsigned long cmd = GUP_FAST_BENCHMARK;
-static int gup_fd, repeats = 1;
-static unsigned long size = 128 * MB;
-/* Serialize prints */
-static pthread_mutex_t print_mutex = PTHREAD_MUTEX_INITIALIZER;
+FIXTURE(gup_test) {
+	int gup_fd;
+	char *addr;
+	unsigned long size;
+};
+
+FIXTURE_VARIANT(gup_test) {
+	bool thp;
+	bool hugetlb;
+	bool write;
+	bool shared;
+};
+
+FIXTURE_VARIANT_ADD(gup_test, private_write)
+{
+	.thp = false,
+	.hugetlb = false,
+	.write = true,
+	.shared = false,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, private_readonly)
+{
+	.thp = false,
+	.hugetlb = false,
+	.write = false,
+	.shared = false,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, private_write_thp)
+{
+	.thp = true,
+	.hugetlb = false,
+	.write = true,
+	.shared = false,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, private_readonly_thp)
+{
+	.thp = true,
+	.hugetlb = false,
+	.write = false,
+	.shared = false,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, private_write_hugetlb)
+{
+	.thp = false,
+	.hugetlb = true,
+	.write = true,
+	.shared = false,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, private_readonly_hugetlb)
+{
+	.thp = false,
+	.hugetlb = true,
+	.write = false,
+	.shared = false,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, shared_write)
+{
+	.thp = false,
+	.hugetlb = false,
+	.write = true,
+	.shared = true,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, shared_readonly)
+{
+	.thp = false,
+	.hugetlb = false,
+	.write = false,
+	.shared = true,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, shared_write_thp)
+{
+	.thp = true,
+	.hugetlb = false,
+	.write = true,
+	.shared = true,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, shared_readonly_thp)
+{
+	.thp = true,
+	.hugetlb = false,
+	.write = false,
+	.shared = true,
+};
+
+FIXTURE_VARIANT_ADD(gup_test, shared_write_hugetlb)
+{
+	.thp = false,
+	.hugetlb = true,
+	.write = true,
+	.shared = true,
+};
 
-static char *cmd_to_str(unsigned long cmd)
+FIXTURE_VARIANT_ADD(gup_test, shared_readonly_hugetlb)
 {
-	switch (cmd) {
-	case GUP_FAST_BENCHMARK:
-		return "GUP_FAST_BENCHMARK";
-	case PIN_FAST_BENCHMARK:
-		return "PIN_FAST_BENCHMARK";
-	case PIN_LONGTERM_BENCHMARK:
-		return "PIN_LONGTERM_BENCHMARK";
-	case GUP_BASIC_TEST:
-		return "GUP_BASIC_TEST";
-	case PIN_BASIC_TEST:
-		return "PIN_BASIC_TEST";
-	case DUMP_USER_PAGES_TEST:
-		return "DUMP_USER_PAGES_TEST";
+	.thp = false,
+	.hugetlb = true,
+	.write = false,
+	.shared = true,
+};
+
+FIXTURE_SETUP(gup_test) {
+	int mmap_flags = MAP_PRIVATE;
+	int zero_fd;
+	char *p;
+
+	self->size = variant->hugetlb ? 256 * MB : 128 * MB;
+
+	/* Check for hugetlb */
+	if (variant->hugetlb) {
+		unsigned long hp_size = default_huge_page_size();
+
+		if (!hp_size)
+			SKIP(return, "HugeTLB not available\n");
+
+		self->size = (self->size + hp_size - 1) & ~(hp_size - 1);
+		if (!hugetlb_setup_default(self->size / hp_size))
+			SKIP(return, "Not enough huge pages\n");
+
+		mmap_flags |= (MAP_HUGETLB | MAP_ANONYMOUS);
 	}
-	return "Unknown command";
+
+	/* zero_fd has to be >= 0. Already checked in main() */
+	zero_fd = open("/dev/zero", O_RDWR);
+	ASSERT_GE(zero_fd, 0);
+
+	/* gup_fd has to be >= 0. Already checked in main() */
+	self->gup_fd = open(GUP_TEST_FILE, O_RDWR);
+	ASSERT_GE(self->gup_fd, 0);
+
+	if (variant->shared)
+		mmap_flags = (mmap_flags & ~MAP_PRIVATE) | MAP_SHARED;
+
+	self->addr = mmap(NULL, self->size, PROT_READ | PROT_WRITE,
+					mmap_flags, zero_fd, 0);
+	close(zero_fd);
+	ASSERT_NE(self->addr, MAP_FAILED);
+
+	if (variant->thp)
+		madvise(self->addr, self->size, MADV_HUGEPAGE);
+	else
+		madvise(self->addr, self->size, MADV_NOHUGEPAGE);
+
+	for (p = self->addr; (unsigned long)p < (unsigned long)self->addr
+			+ self->size; p += psize())
+		p[0] = 0;
 }
 
-void *gup_thread(void *data)
-{
-	struct gup_test gup = *(struct gup_test *)data;
-	int i, status;
-
-	/* Only report timing information on the *_BENCHMARK commands: */
-	if ((cmd == PIN_FAST_BENCHMARK) || (cmd == GUP_FAST_BENCHMARK) ||
-	     (cmd == PIN_LONGTERM_BENCHMARK)) {
-		for (i = 0; i < repeats; i++) {
-			gup.size = size;
-			status = ioctl(gup_fd, cmd, &gup);
-			if (status)
-				break;
-
-			pthread_mutex_lock(&print_mutex);
-			ksft_print_msg("%s: Time: get:%lld put:%lld us",
-				       cmd_to_str(cmd), gup.get_delta_usec,
-				       gup.put_delta_usec);
-			if (gup.size != size)
-				ksft_print_msg(", truncated (size: %lld)", gup.size);
-			ksft_print_msg("\n");
-			pthread_mutex_unlock(&print_mutex);
-		}
-	} else {
-		gup.size = size;
-		status = ioctl(gup_fd, cmd, &gup);
-		if (status)
-			goto return_;
-
-		pthread_mutex_lock(&print_mutex);
-		ksft_print_msg("%s: done\n", cmd_to_str(cmd));
-		if (gup.size != size)
-			ksft_print_msg("Truncated (size: %lld)\n", gup.size);
-		pthread_mutex_unlock(&print_mutex);
+FIXTURE_TEARDOWN(gup_test) {
+	munmap(self->addr, self->size);
+	close(self->gup_fd);
+
+	if (variant->hugetlb)
+		hugetlb_restore_settings();
+}
+
+TEST_F(gup_test, get_user_pages) {
+	/* Tests the get_user_pages path */
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nr_pages_list); i++) {
+		struct gup_test gup = { 0 };
+
+		gup.addr = (unsigned long)self->addr;
+		gup.size = self->size;
+		gup.nr_pages_per_call = nr_pages_list[i] < 0 ?
+			self->size / psize() : nr_pages_list[i];
+
+		if (variant->write)
+			gup.gup_flags |= FOLL_WRITE;
+
+		TH_LOG("nr_pages_per_call=%u", gup.nr_pages_per_call);
+		ASSERT_EQ(ioctl(self->gup_fd, GUP_BASIC_TEST, &gup), 0);
 	}
+}
+
+TEST_F(gup_test, pin_user_pages) {
+	/* Tests the pin_user_pages path */
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nr_pages_list); i++) {
+		struct gup_test gup = { 0 };
+
+		gup.addr = (unsigned long)self->addr;
+		gup.size = self->size;
+		gup.nr_pages_per_call = nr_pages_list[i] < 0 ?
+			self->size / psize() : nr_pages_list[i];
 
-return_:
-	ksft_test_result(!status, "ioctl status %d\n", status);
-	return NULL;
+		if (variant->write)
+			gup.gup_flags |= FOLL_WRITE;
+
+		TH_LOG("nr_pages_per_call=%u", gup.nr_pages_per_call);
+		ASSERT_EQ(ioctl(self->gup_fd, PIN_BASIC_TEST, &gup), 0);
+	}
 }
 
-int main(int argc, char **argv)
-{
-	struct gup_test gup = { 0 };
-	int filed, i, opt, nr_pages = 1, thp = -1, write = 1, nthreads = 1, ret;
-	int flags = MAP_PRIVATE;
-	char *file = "/dev/zero";
-	bool hugetlb = false;
-	pthread_t *tid;
-	char *p;
+TEST_F(gup_test, dump_user_pages_with_get) {
+	/* Tests DUMP_USER_PAGES_TEST using get_user_pages */
+	int i;
 
-	while ((opt = getopt(argc, argv, "m:r:n:F:f:abcj:tTLUuwWSHpz")) != -1) {
-		switch (opt) {
-		case 'a':
-			cmd = PIN_FAST_BENCHMARK;
-			break;
-		case 'b':
-			cmd = PIN_BASIC_TEST;
-			break;
-		case 'L':
-			cmd = PIN_LONGTERM_BENCHMARK;
-			break;
-		case 'c':
-			cmd = DUMP_USER_PAGES_TEST;
-			/*
-			 * Dump page 0 (index 1). May be overridden later, by
-			 * user's non-option arguments.
-			 *
-			 * .which_pages is zero-based, so that zero can mean "do
-			 * nothing".
-			 */
-			gup.which_pages[0] = 1;
-			break;
-		case 'p':
-			/* works only with DUMP_USER_PAGES_TEST */
-			gup.test_flags |= GUP_TEST_FLAG_DUMP_PAGES_USE_PIN;
-			break;
-		case 'F':
-			/* strtol, so you can pass flags in hex form */
-			gup.gup_flags = strtol(optarg, 0, 0);
-			break;
-		case 'j':
-			nthreads = atoi(optarg);
-			break;
-		case 'm':
-			size = atoi(optarg) * MB;
-			break;
-		case 'r':
-			repeats = atoi(optarg);
-			break;
-		case 'n':
-			nr_pages = atoi(optarg);
-			if (nr_pages < 0)
-				nr_pages = size / psize();
-			break;
-		case 't':
-			thp = 1;
-			break;
-		case 'T':
-			thp = 0;
-			break;
-		case 'U':
-			cmd = GUP_BASIC_TEST;
-			break;
-		case 'u':
-			cmd = GUP_FAST_BENCHMARK;
-			break;
-		case 'w':
-			write = 1;
-			break;
-		case 'W':
-			write = 0;
-			break;
-		case 'f':
-			file = optarg;
-			break;
-		case 'S':
-			flags &= ~MAP_PRIVATE;
-			flags |= MAP_SHARED;
-			break;
-		case 'H':
-			flags |= (MAP_HUGETLB | MAP_ANONYMOUS);
-			hugetlb = true;
-			break;
-		default:
-			ksft_exit_fail_msg("Wrong argument\n");
-		}
+	for (i = 0; i < (int)ARRAY_SIZE(nr_pages_list); i++) {
+		struct gup_test gup = { 0 };
+
+		gup.addr = (unsigned long)self->addr;
+		gup.size = self->size;
+		gup.nr_pages_per_call = nr_pages_list[i] < 0 ?
+			self->size / psize() : nr_pages_list[i];
+
+		if (variant->write)
+			gup.gup_flags |= FOLL_WRITE;
+
+		gup.which_pages[0] = 1;
+
+		TH_LOG("nr_pages_per_call=%u", gup.nr_pages_per_call);
+		ASSERT_EQ(ioctl(self->gup_fd, DUMP_USER_PAGES_TEST, &gup), 0);
 	}
+}
 
-	if (optind < argc) {
-		int extra_arg_count = 0;
-		/*
-		 * For example:
-		 *
-		 *   ./gup_test -c 0 1 0x1001
-		 *
-		 * ...to dump pages 0, 1, and 4097
-		 */
-
-		while ((optind < argc) &&
-		       (extra_arg_count < GUP_TEST_MAX_PAGES_TO_DUMP)) {
-			/*
-			 * Do the 1-based indexing here, so that the user can
-			 * use normal 0-based indexing on the command line.
-			 */
-			long page_index = strtol(argv[optind], 0, 0) + 1;
-
-			gup.which_pages[extra_arg_count] = page_index;
-			extra_arg_count++;
-			optind++;
-		}
+TEST_F(gup_test, dump_user_pages_with_pin) {
+	/* Tests DUMP_USER_PAGES_TEST using pin_user_pages */
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nr_pages_list); i++) {
+		struct gup_test gup = { 0 };
+
+		gup.addr = (unsigned long)self->addr;
+		gup.size = self->size;
+		gup.nr_pages_per_call = nr_pages_list[i] < 0 ?
+			self->size / psize() : nr_pages_list[i];
+
+		if (variant->write)
+			gup.gup_flags |= FOLL_WRITE;
+
+		gup.which_pages[0] = 1;
+		gup.test_flags |= GUP_TEST_FLAG_DUMP_PAGES_USE_PIN;
+
+		TH_LOG("nr_pages_per_call=%u", gup.nr_pages_per_call);
+		ASSERT_EQ(ioctl(self->gup_fd, DUMP_USER_PAGES_TEST, &gup), 0);
 	}
+}
 
-	ksft_print_header();
+TEST_F(gup_test, get_user_pages_fast) {
+	/* Tests the lockless get_user_pages_fast() path */
+	int i;
 
-	if (hugetlb) {
-		unsigned long hp_size = default_huge_page_size();
+	for (i = 0; i < (int)ARRAY_SIZE(nr_pages_list); i++) {
+		struct gup_test gup = { 0 };
 
-		if (!hp_size)
-			ksft_exit_skip("HugeTLB is unavailable\n");
+		gup.addr = (unsigned long)self->addr;
+		gup.size = self->size;
+		gup.nr_pages_per_call = nr_pages_list[i] < 0 ?
+			self->size / psize() : nr_pages_list[i];
 
-		size = (size + hp_size - 1) & ~(hp_size - 1);
-		if (!hugetlb_setup_default(size / hp_size))
-			ksft_exit_skip("Not enough huge pages\n");
+		if (variant->write)
+			gup.gup_flags |= FOLL_WRITE;
+
+		TH_LOG("nr_pages_per_call=%u", gup.nr_pages_per_call);
+		ASSERT_EQ(ioctl(self->gup_fd, GUP_FAST_BENCHMARK, &gup), 0);
 	}
+}
 
-	ksft_set_plan(nthreads);
+TEST_F(gup_test, pin_user_pages_fast) {
+	/* Tests the lockless pin_user_pages_fast() path */
+	int i;
 
-	filed = open(file, O_RDWR|O_CREAT, 0664);
-	if (filed < 0)
-		ksft_exit_fail_msg("Unable to open %s: %s\n", file, strerror(errno));
+	for (i = 0; i < (int)ARRAY_SIZE(nr_pages_list); i++) {
+		struct gup_test gup = { 0 };
 
-	gup.nr_pages_per_call = nr_pages;
-	if (write)
-		gup.gup_flags |= FOLL_WRITE;
-
-	gup_fd = open(GUP_TEST_FILE, O_RDWR);
-	if (gup_fd == -1) {
-		switch (errno) {
-		case EACCES:
-			if (getuid())
-				ksft_print_msg("Please run this test as root\n");
-			break;
-		case ENOENT:
-			if (opendir("/sys/kernel/debug") == NULL)
-				ksft_print_msg("mount debugfs at /sys/kernel/debug\n");
-			ksft_print_msg("check if CONFIG_GUP_TEST is enabled in kernel config\n");
-			break;
-		default:
-			ksft_print_msg("failed to open %s: %s\n", GUP_TEST_FILE, strerror(errno));
-			break;
-		}
-		ksft_test_result_skip("Please run this test as root\n");
-		ksft_exit_pass();
+		gup.addr = (unsigned long)self->addr;
+		gup.size = self->size;
+		gup.nr_pages_per_call = nr_pages_list[i] < 0 ?
+			self->size / psize() : nr_pages_list[i];
+
+		if (variant->write)
+			gup.gup_flags |= FOLL_WRITE;
+
+		TH_LOG("nr_pages_per_call=%u", gup.nr_pages_per_call);
+		ASSERT_EQ(ioctl(self->gup_fd, PIN_FAST_BENCHMARK, &gup), 0);
 	}
+}
 
-	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, filed, 0);
-	if (p == MAP_FAILED)
-		ksft_exit_fail_msg("mmap: %s\n", strerror(errno));
-	gup.addr = (unsigned long)p;
+TEST_F(gup_test, pin_user_pages_longterm) {
+	/* Tests pin_user_pages() with FOLL_LONGTERM */
+	int i;
 
-	if (thp == 1)
-		madvise(p, size, MADV_HUGEPAGE);
-	else if (thp == 0)
-		madvise(p, size, MADV_NOHUGEPAGE);
+	for (i = 0; i < (int)ARRAY_SIZE(nr_pages_list); i++) {
+		struct gup_test gup = { 0 };
 
-	/* Fault them in here, from user space. */
-	for (; (unsigned long)p < gup.addr + size; p += psize())
-		p[0] = 0;
+		gup.addr = (unsigned long)self->addr;
+		gup.size = self->size;
+		gup.nr_pages_per_call = nr_pages_list[i] < 0 ?
+			self->size / psize() : nr_pages_list[i];
 
-	tid = malloc(sizeof(pthread_t) * nthreads);
-	assert(tid);
-	for (i = 0; i < nthreads; i++) {
-		ret = pthread_create(&tid[i], NULL, gup_thread, &gup);
-		assert(ret == 0);
+		if (variant->write)
+			gup.gup_flags |= FOLL_WRITE;
+
+		TH_LOG("nr_pages_per_call=%u", gup.nr_pages_per_call);
+		ASSERT_EQ(ioctl(self->gup_fd, PIN_LONGTERM_BENCHMARK, &gup), 0);
 	}
-	for (i = 0; i < nthreads; i++) {
-		ret = pthread_join(tid[i], NULL);
-		assert(ret == 0);
+}
+
+int main(int argc, char **argv)
+{
+	int fd;
+	char *file = "/dev/zero";
+
+	fd = open(file, O_RDWR);
+	if (fd < 0) {
+		ksft_print_header();
+		ksft_exit_fail_msg("Unable to open %s: %s\n", file, strerror(errno));
 	}
+	close(fd);
 
-	free(tid);
+	fd = open(GUP_TEST_FILE, O_RDWR);
+	if (fd == -1) {
+		ksft_print_header();
+		if (errno == EACCES)
+			ksft_exit_skip("Please run this test as root\n");
+		if (errno == ENOENT) {
+			if (opendir("/sys/kernel/debug") == NULL)
+				ksft_exit_skip("Mount debugfs at /sys/kernel/debug\n");
+			else
+				ksft_exit_skip("Check CONFIG_GUP_TEST in kernel config\n");
+		}
+		ksft_exit_skip("failed to open %s: %s\n", GUP_TEST_FILE, strerror(errno));
+	}
+	close(fd);
 
-	ksft_exit_pass();
+	return test_harness_run(argc, argv);
 }
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 043aa3ed2596..65a4ef0f3748 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -130,30 +130,6 @@ test_selected() {
 	fi
 }
 
-run_gup_matrix() {
-    # -t: thp=on, -T: thp=off, -H: hugetlb=on
-    local hugetlb_mb=256
-
-    for huge in -t -T "-H -m $hugetlb_mb"; do
-        # -u: gup-fast, -U: gup-basic, -a: pin-fast, -b: pin-basic, -L: pin-longterm
-        for test_cmd in -u -U -a -b -L; do
-            # -w: write=1, -W: write=0
-            for write in -w -W; do
-                # -S: shared
-                for share in -S " "; do
-                    # -n: How many pages to fetch together?  512 is special
-                    # because it's default thp size (or 2M on x86), 123 to
-                    # just test partial gup when hit a huge in whatever form
-                    for num in "-n 1" "-n 512" "-n 123" "-n -1"; do
-                        CATEGORY="gup_test" run_test ./gup_test \
-                                $huge $test_cmd $write $share $num
-                    done
-                done
-            done
-        done
-    done
-}
-
 # filter 64bit architectures
 ARCH64STR="arm64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sparc64 x86_64"
 if [ -z "$ARCH" ]; then
@@ -239,18 +215,7 @@ fi
 
 CATEGORY="mmap" run_test ./map_fixed_noreplace
 
-if $RUN_ALL; then
-    run_gup_matrix
-else
-    # get_user_pages_fast() benchmark
-    CATEGORY="gup_test" run_test ./gup_test -u -n 1
-    CATEGORY="gup_test" run_test ./gup_test -u -n -1
-    # pin_user_pages_fast() benchmark
-    CATEGORY="gup_test" run_test ./gup_test -a -n 1
-    CATEGORY="gup_test" run_test ./gup_test -a -n -1
-fi
-# Dump pages 0, 19, and 4096, using pin_user_pages:
-CATEGORY="gup_test" run_test ./gup_test -ct -F 0x1 0 19 0x1000
+CATEGORY="gup_test" run_test ./gup_test
 CATEGORY="gup_test" run_test ./gup_longterm
 
 CATEGORY="userfaultfd" run_test ./uffd-unit-tests
-- 
2.39.5



* [PATCH v2 1/2] tools/mm: add a standalone GUP microbenchmark
From: Sarthak Sharma @ 2026-05-19 12:05 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand
  Cc: Jonathan Corbet, Jason Gunthorpe, John Hubbard, Peter Xu,
	Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm,
	linux-kselftest, linux-kernel, linux-doc, Sarthak Sharma
In-Reply-To: <20260519120506.184512-1-sarthak.sharma@arm.com>

Add a command-line tool for benchmarking get_user_pages fast-path
(GUP_FAST), pin_user_pages fast-path (PIN_FAST), and pin_user_pages
longterm (PIN_LONGTERM) via the CONFIG_GUP_TEST debugfs interface.

When invoked without arguments, gup_bench runs the same matrix of
configurations as run_gup_matrix() in run_vmtests.sh: all three GUP
commands across read/write, private/shared mappings, and a range of
page counts, with THP on/off for regular mappings and hugetlb for huge
page mappings.

This tool is a mix of reused and new logic. The mapping/setup path comes
from selftests/mm/gup_test.c, while the default benchmark matrix matches
run_gup_matrix() in run_vmtests.sh. The standalone CLI and tools/mm
integration are added here so tools/mm does not depend on kselftest.

Add gup_bench to BUILD_TARGETS and INSTALL_TARGETS in tools/mm/Makefile,
and ignore the resulting binary in tools/mm/.gitignore. While here, also
add the missing thp_swap_allocator_test entry to .gitignore.

Add tools/mm/gup_bench.c to the GUP entry in MAINTAINERS.

Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
---
 MAINTAINERS          |   1 +
 tools/mm/.gitignore  |   2 +
 tools/mm/Makefile    |   6 +-
 tools/mm/gup_bench.c | 491 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 497 insertions(+), 3 deletions(-)
 create mode 100644 tools/mm/gup_bench.c
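
A minimal sketch of the sizing arithmetic used by the hugetlb path in the
patch below (the round-up in run_bench() and the pool growth in
__hugetlb_setup()). The names round_up_hugepage() and
hugepage_pool_target() are hypothetical and not part of the patch; the
sketch only computes values and never touches sysfs.

```c
#include <assert.h>

/* Round size up to a multiple of hp_size; hp_size must be a power of two. */
unsigned long round_up_hugepage(unsigned long size, unsigned long hp_size)
{
	assert(hp_size && (hp_size & (hp_size - 1)) == 0);
	return (size + hp_size - 1) & ~(hp_size - 1);
}

/*
 * Hypothetical helper: the nr_hugepages value to request so that at least
 * `need` pages are free, given the current pool total and free counts.
 * Returns `total` unchanged when enough pages are already free.
 */
unsigned long hugepage_pool_target(unsigned long total,
				   unsigned long free_pages,
				   unsigned long need)
{
	if (free_pages >= need)
		return total;
	return total + (need - free_pages);
}
```

For example, a 256 MB mapping backed by 2 MB huge pages needs 128 pages;
with a pool of 10 pages of which 4 are free, the requested pool size
would be 10 + (128 - 4) = 134.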

diff --git a/MAINTAINERS b/MAINTAINERS
index 98d0a7a1c689..c91165b9280e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16830,6 +16830,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
 F:	mm/gup.c
 F:	mm/gup_test.c
 F:	mm/gup_test.h
+F:	tools/mm/gup_bench.c
 F:	tools/testing/selftests/mm/gup_longterm.c
 F:	tools/testing/selftests/mm/gup_test.c
 
diff --git a/tools/mm/.gitignore b/tools/mm/.gitignore
index 922879f93fc8..154d740be02e 100644
--- a/tools/mm/.gitignore
+++ b/tools/mm/.gitignore
@@ -2,3 +2,5 @@
 slabinfo
 page-types
 page_owner_sort
+thp_swap_allocator_test
+gup_bench
diff --git a/tools/mm/Makefile b/tools/mm/Makefile
index f5725b5c23aa..8e4db797a17a 100644
--- a/tools/mm/Makefile
+++ b/tools/mm/Makefile
@@ -3,13 +3,13 @@
 #
 include ../scripts/Makefile.include
 
-BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
+BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test gup_bench
 INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
 
 LIB_DIR = ../lib/api
 LIBS = $(LIB_DIR)/libapi.a
 
-CFLAGS += -Wall -Wextra -I../lib/ -pthread
+CFLAGS += -Wall -Wextra -I../lib/ -I../.. -pthread
 LDFLAGS += $(LIBS) -pthread
 
 all: $(BUILD_TARGETS)
@@ -23,7 +23,7 @@ $(LIBS):
 	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
 
 clean:
-	$(RM) page-types slabinfo page_owner_sort thp_swap_allocator_test
+	$(RM) page-types slabinfo page_owner_sort thp_swap_allocator_test gup_bench
 	make -C $(LIB_DIR) clean
 
 sbindir ?= /usr/sbin
diff --git a/tools/mm/gup_bench.c b/tools/mm/gup_bench.c
new file mode 100644
index 000000000000..2806ee0d7453
--- /dev/null
+++ b/tools/mm/gup_bench.c
@@ -0,0 +1,491 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Microbenchmark for get_user_pages (GUP) kernel interfaces.
+ *
+ * Exercises GUP_FAST_BENCHMARK, PIN_FAST_BENCHMARK, and
+ * PIN_LONGTERM_BENCHMARK via the CONFIG_GUP_TEST debugfs interface.
+ *
+ * Example use:
+ *   # Run the full matrix (all commands, access modes, page counts):
+ *   ./gup_bench
+ *
+ *   # Single run: pin_user_pages_fast, 512 pages, write access, hugetlb:
+ *   ./gup_bench -a -n 512 -w -H
+ *
+ * Requires CONFIG_GUP_TEST=y and debugfs mounted at /sys/kernel/debug.
+ * Must be run as root.
+ */
+
+#define __SANE_USERSPACE_TYPES__ // Use ll64
+#include <fcntl.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <pthread.h>
+#include <assert.h>
+#include <stdbool.h>
+#include <stdatomic.h>
+#include <limits.h>
+#include <mm/gup_test.h>
+#include <string.h>
+
+#define MB (1UL << 20)
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#endif
+
+/* Just the flags we need, copied from the kernel internals. */
+#define FOLL_WRITE	0x01	/* check pte is writable */
+
+#define GUP_TEST_FILE "/sys/kernel/debug/gup_test"
+
+/*
+ * Local HugeTLB setup helpers for gup_bench.
+ *
+ * These helpers were copied from tools/testing/selftests/mm/ and adjusted to
+ * remove the ksft formatting. Keep this copy local so tools/mm does not
+ * depend on ksft output behavior.
+ */
+
+static unsigned int psize(void)
+{
+	static unsigned int __page_size;
+
+	if (!__page_size)
+		__page_size = sysconf(_SC_PAGESIZE);
+	return __page_size;
+}
+
+static unsigned long default_huge_page_size(void)
+{
+	FILE *f = fopen("/proc/meminfo", "r");
+	unsigned long hpage_size = 0;
+	char buf[256];
+
+	if (!f)
+		return 0;
+	while (fgets(buf, sizeof(buf), f)) {
+		if (sscanf(buf, "Hugepagesize:       %lu kB", &hpage_size) == 1)
+			break;
+	}
+	fclose(f);
+	hpage_size <<= 10;
+	return hpage_size;
+}
+
+static void hugetlb_sysfs_path(char *buf, size_t buflen,
+			       unsigned long size, const char *attr)
+{
+	snprintf(buf, buflen, "/sys/kernel/mm/hugepages/hugepages-%lukB/%s",
+		 size / 1024, attr);
+}
+
+static unsigned long hugetlb_read_num(const char *path)
+{
+	char buf[32];
+	FILE *f = fopen(path, "r");
+	unsigned long val = 0;
+
+	if (!f)
+		return 0;
+	if (fgets(buf, sizeof(buf), f))
+		val = strtoul(buf, NULL, 10);
+	fclose(f);
+	return val;
+}
+
+static void hugetlb_write_num(const char *path, unsigned long num)
+{
+	FILE *f = fopen(path, "w");
+
+	if (!f)
+		return;
+	fprintf(f, "%lu\n", num);
+	fclose(f);
+}
+
+static unsigned long hugetlb_nr_pages(unsigned long size)
+{
+	char path[PATH_MAX];
+
+	hugetlb_sysfs_path(path, sizeof(path), size, "nr_hugepages");
+	return hugetlb_read_num(path);
+}
+
+static void hugetlb_set_nr_pages(unsigned long size, unsigned long nr)
+{
+	char path[PATH_MAX];
+
+	hugetlb_sysfs_path(path, sizeof(path), size, "nr_hugepages");
+	hugetlb_write_num(path, nr);
+}
+
+static unsigned long hugetlb_free_pages(unsigned long size)
+{
+	char path[PATH_MAX];
+
+	hugetlb_sysfs_path(path, sizeof(path), size, "free_hugepages");
+	return hugetlb_read_num(path);
+}
+
+/* Saved pool size to restore on exit */
+static unsigned long hugetlb_saved_nr;
+static unsigned long hugetlb_saved_size;
+
+static void hugetlb_restore_atexit(void)
+{
+	if (hugetlb_saved_size)
+		hugetlb_set_nr_pages(hugetlb_saved_size, hugetlb_saved_nr);
+}
+
+static bool __hugetlb_setup(unsigned long size, unsigned long nr)
+{
+	unsigned long free = hugetlb_free_pages(size);
+	unsigned long total = hugetlb_nr_pages(size);
+
+	if (free >= nr)
+		return true;
+
+	hugetlb_set_nr_pages(size, total + (nr - free));
+
+	return hugetlb_free_pages(size) >= nr;
+}
+
+static bool hugetlb_setup_default(unsigned long nr)
+{
+	unsigned long hsize = default_huge_page_size();
+
+	if (!hsize)
+		return false;
+
+	/* Save current pool so we can restore it on exit (only on first call) */
+	if (!hugetlb_saved_size) {
+		hugetlb_saved_size = hsize;
+		hugetlb_saved_nr = hugetlb_nr_pages(hsize);
+		atexit(hugetlb_restore_atexit);
+	}
+
+	return __hugetlb_setup(hsize, nr);
+}
+
+static unsigned long cmd;
+static const char *bench_label;
+static int gup_fd, repeats = 1;
+static unsigned long size = 128 * MB;
+static atomic_int bench_error;
+/* Serialize prints */
+static pthread_mutex_t print_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static const unsigned long bench_cmds[] = {
+	GUP_FAST_BENCHMARK,
+	PIN_FAST_BENCHMARK,
+	PIN_LONGTERM_BENCHMARK,
+};
+static const int bench_thp_modes[] = { 1, 0 };	/* on, off */
+static const int bench_nr_pages_list[] = { 1, 512, 123, -1 };
+
+static const char *cmd_to_str(unsigned long cmd)
+{
+	switch (cmd) {
+	case GUP_FAST_BENCHMARK:
+		return "GUP_FAST_BENCHMARK";
+	case PIN_FAST_BENCHMARK:
+		return "PIN_FAST_BENCHMARK";
+	case PIN_LONGTERM_BENCHMARK:
+		return "PIN_LONGTERM_BENCHMARK";
+	}
+	return "Unknown command";
+}
+
+struct bench_run {
+	unsigned long cmd;
+	int thp;		/* -1: default, 0: off, 1: on */
+	bool hugetlb;
+	bool write;
+	bool shared;
+	int nr_pages;		/* -1 means all pages (size / psize()) */
+	unsigned long size;
+	char *file;
+	int nthreads;
+	unsigned int gup_flags;
+};
+
+static void *gup_thread(void *data)
+{
+	struct gup_test gup = *(struct gup_test *)data;
+	int i, status;
+
+	for (i = 0; i < repeats; i++) {
+		gup.size = size;
+		status = ioctl(gup_fd, cmd, &gup);
+		if (status) {
+			bench_error = 1;
+			break;
+		}
+
+		pthread_mutex_lock(&print_mutex);
+		printf("%s time: get:%lld put:%lld us",
+		       bench_label, gup.get_delta_usec,
+		       gup.put_delta_usec);
+		if (gup.size != size)
+			printf(", truncated (size: %lld)", gup.size);
+		printf("\n");
+		pthread_mutex_unlock(&print_mutex);
+	}
+
+	return NULL;
+}
+
+static int run_bench(struct bench_run *run)
+{
+	struct gup_test gup = { 0 };
+	int zero_fd, i, ret, started_threads = 0;
+	int flags = MAP_PRIVATE;
+	pthread_t *tid;
+	char label[128];
+	char *p;
+
+	/* Set globals consumed by gup_thread */
+	cmd = run->cmd;
+	size = run->size;
+	bench_error = 0;
+
+	if (run->hugetlb) {
+		unsigned long hp_size = default_huge_page_size();
+
+		if (!hp_size) {
+			fprintf(stderr, "Could not determine huge page size\n");
+			return 1;
+		}
+		size = (size + hp_size - 1) & ~(hp_size - 1);
+		if (!hugetlb_setup_default(size / hp_size)) {
+			fprintf(stderr, "Not enough huge pages\n");
+			return 1;
+		}
+		flags |= (MAP_HUGETLB | MAP_ANONYMOUS);
+	}
+
+	if (run->shared) {
+		flags &= ~MAP_PRIVATE;
+		flags |= MAP_SHARED;
+	}
+
+	gup.nr_pages_per_call = run->nr_pages < 0 ? size / psize() :
+		(unsigned long)run->nr_pages;
+
+	gup.gup_flags = run->gup_flags;
+	if (run->write)
+		gup.gup_flags |= FOLL_WRITE;
+
+	snprintf(label, sizeof(label), "%s (nr_pages=%-4u %s %s %s %s)",
+		 cmd_to_str(run->cmd),
+		 gup.nr_pages_per_call,
+		 run->write  ? "write"   : "read",
+		 run->shared ? "shared"  : "private",
+		 run->hugetlb ? "hugetlb=on" : "hugetlb=off",
+		 run->hugetlb ? "thp=off" :
+		 (run->thp == 1 ? "thp=on" :
+		 (run->thp == 0 ? "thp=off" : "thp=default")));
+	bench_label = label;
+
+	zero_fd = open(run->file, O_RDWR);
+	if (zero_fd < 0) {
+		fprintf(stderr, "Unable to open %s: %s\n", run->file, strerror(errno));
+		return 1;
+	}
+
+	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, zero_fd, 0);
+	close(zero_fd);
+	if (p == MAP_FAILED) {
+		fprintf(stderr, "mmap: %s\n", strerror(errno));
+		return 1;
+	}
+	gup.addr = (unsigned long)p;
+
+	if (run->thp == 1)
+		madvise(p, size, MADV_HUGEPAGE);
+	else if (run->thp == 0)
+		madvise(p, size, MADV_NOHUGEPAGE);
+
+	/* Fault them in here, from user space. */
+	for (; (unsigned long)p < gup.addr + size; p += psize())
+		p[0] = 0;
+
+	tid = malloc(sizeof(pthread_t) * run->nthreads);
+	assert(tid);
+	for (i = 0; i < run->nthreads; i++) {
+		ret = pthread_create(&tid[i], NULL, gup_thread, &gup);
+		if (ret) {
+			fprintf(stderr, "pthread_create failed: %s\n", strerror(ret));
+			bench_error = 1;
+			break;
+		}
+		started_threads++;
+	}
+	for (i = 0; i < started_threads; i++) {
+		ret = pthread_join(tid[i], NULL);
+		if (ret) {
+			fprintf(stderr, "pthread_join failed: %s\n", strerror(ret));
+			bench_error = 1;
+		}
+	}
+
+	free(tid);
+	munmap((void *)gup.addr, size);
+
+	return bench_error ? 1 : 0;
+}
+
+static int run_matrix(void)
+{
+	unsigned int c, t, w, s, n;
+	int ret = 0;
+
+	for (c = 0; c < ARRAY_SIZE(bench_cmds); c++) {
+		for (w = 0; w <= 1; w++) {
+			for (s = 0; s <= 1; s++) {
+				for (t = 0; t < ARRAY_SIZE(bench_thp_modes); t++) {
+					for (n = 0; n < ARRAY_SIZE(bench_nr_pages_list); n++) {
+						struct bench_run run = {
+							.cmd	  = bench_cmds[c],
+							.thp	  = bench_thp_modes[t],
+							.hugetlb  = false,
+							.write	  = w,
+							.shared	  = s,
+							.nr_pages = bench_nr_pages_list[n],
+							.size	  = 128 * MB,
+							.file	  = "/dev/zero",
+							.nthreads = 1,
+						};
+						ret |= run_bench(&run);
+					}
+				}
+				/* hugetlb: 256M to match run_gup_matrix() in run_vmtests.sh */
+				for (n = 0; n < ARRAY_SIZE(bench_nr_pages_list); n++) {
+					struct bench_run run = {
+						.cmd	  = bench_cmds[c],
+						.thp	  = -1,
+						.hugetlb  = true,
+						.write	  = w,
+						.shared	  = s,
+						.nr_pages = bench_nr_pages_list[n],
+						.size	  = 256 * MB,
+						.file	  = "/dev/zero",
+						.nthreads = 1,
+					};
+					ret |= run_bench(&run);
+				}
+			}
+		}
+	}
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	struct bench_run run = {
+		.cmd	  = GUP_FAST_BENCHMARK,
+		.thp	  = -1,
+		.hugetlb  = false,
+		.write	  = true,
+		.shared	  = false,
+		.nr_pages = 1,
+		.size	  = 128 * MB,
+		.file	  = "/dev/zero",
+		.nthreads = 1,
+	};
+	int opt, result;
+
+	while ((opt = getopt(argc, argv, "m:r:n:F:f:aj:tTLuwWSH")) != -1) {
+		switch (opt) {
+
+		/* Command selection */
+		case 'u':
+			run.cmd = GUP_FAST_BENCHMARK;
+			break;
+		case 'a':
+			run.cmd = PIN_FAST_BENCHMARK;
+			break;
+		case 'L':
+			run.cmd = PIN_LONGTERM_BENCHMARK;
+			break;
+
+		/* Memory type */
+		case 'H':
+			run.hugetlb = true;
+			break;
+		case 't':
+			run.thp = 1;
+			break;
+		case 'T':
+			run.thp = 0;
+			break;
+
+		/* Access mode */
+		case 'w':
+			run.write = true;
+			break;
+		case 'W':
+			run.write = false;
+			break;
+		case 'S':
+			run.shared = true;
+			break;
+
+		/* Mapping */
+		case 'f':
+			run.file = optarg;
+			break;
+
+		/* Sizing and iteration */
+		case 'm':
+			run.size = atoi(optarg) * MB;
+			break;
+		case 'n':
+			run.nr_pages = atoi(optarg);
+			break;
+		case 'r':
+			repeats = atoi(optarg);
+			break;
+		case 'j':
+			run.nthreads = atoi(optarg);
+			break;
+
+		/* Advanced */
+		case 'F':
+			/* strtol, so you can pass flags in hex form */
+			run.gup_flags = strtol(optarg, 0, 0);
+			break;
+
+		default:
+			fprintf(stderr, "Wrong argument\n");
+			exit(1);
+		}
+	}
+
+	gup_fd = open(GUP_TEST_FILE, O_RDWR);
+	if (gup_fd == -1) {
+		if (errno == EACCES) {
+			fprintf(stderr, "Please run as root\n");
+		} else if (errno == ENOENT) {
+			if (opendir("/sys/kernel/debug") == NULL)
+				fprintf(stderr, "Mount debugfs at /sys/kernel/debug\n");
+			else
+				fprintf(stderr, "Check CONFIG_GUP_TEST in kernel config\n");
+		} else {
+			fprintf(stderr, "Failed to open %s: %s\n", GUP_TEST_FILE,
+				strerror(errno));
+		}
+		exit(1);
+	}
+
+	result = (argc == 1) ? run_matrix() : run_bench(&run);
+	close(gup_fd);
+	return result;
+}
-- 
2.39.5


^ permalink raw reply related

* [PATCH v2 0/2] selftests/mm: separate GUP microbenchmarking from functional testing
From: Sarthak Sharma @ 2026-05-19 12:05 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand
  Cc: Jonathan Corbet, Jason Gunthorpe, John Hubbard, Peter Xu,
	Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm,
	linux-kselftest, linux-kernel, linux-doc, Sarthak Sharma

gup_test.c currently serves two distinct purposes: microbenchmarking
(GUP_FAST_BENCHMARK, PIN_FAST_BENCHMARK, PIN_LONGTERM_BENCHMARK) and
functional correctness testing (GUP_BASIC_TEST, PIN_BASIC_TEST,
DUMP_USER_PAGES_TEST). Mixing these in a single binary means functional
tests cannot be run or reported individually, and run_vmtests.sh must
invoke the binary multiple times with different flag combinations to
cover all configurations. This patch series separates the two concerns:
tools/mm/gup_bench for benchmarking and tools/testing/selftests/mm/gup_test
for functional testing.

Patch 1 adds tools/mm/gup_bench.c, a standalone microbenchmark for
GUP_FAST, PIN_FAST and PIN_LONGTERM via the CONFIG_GUP_TEST debugfs
interface. It runs the same matrix of configurations as the old
run_gup_matrix() shell function (all three commands, read/write,
private/shared, four page counts, THP on/off, hugetlb), but as a
standalone C program under tools/mm with no dependency on kselftest.

Patch 2 rewrites gup_test.c as a kselftest harness-based selftest. It
covers all five GUP kernel functions (get_user_pages, get_user_pages_fast,
pin_user_pages, pin_user_pages_fast, pin_user_pages with FOLL_LONGTERM)
plus DUMP_USER_PAGES_TEST, across 12 mapping configurations (THP on,
THP off and hugetlb, each across private/shared and read/write variants)
and four batch sizes (1, 512, 123, all pages). Results are reported as
standard TAP output with no command-line arguments required.
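
As a rough illustration of the matrix described above for gup_bench, the
sketch below only counts the runs instead of issuing any ioctl. It
assumes the loop structure described in this cover letter (three
commands, read/write, private/shared, THP on/off for regular mappings
plus one hugetlb pass, four page counts); count_matrix_runs() is a
hypothetical name.

```c
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Page counts per call; -1 stands for "all pages in the mapping". */
static const int nr_pages_list[] = { 1, 512, 123, -1 };

/* Count the runs in the default matrix without touching the kernel. */
unsigned int count_matrix_runs(void)
{
	const unsigned int nr_cmds = 3;      /* GUP_FAST, PIN_FAST, PIN_LONGTERM */
	const unsigned int nr_thp_modes = 2; /* THP on, THP off */
	unsigned int c, w, s, t, n, runs = 0;

	for (c = 0; c < nr_cmds; c++) {
		for (w = 0; w <= 1; w++) {		/* read, write */
			for (s = 0; s <= 1; s++) {	/* private, shared */
				/* regular mappings: every THP mode and page count */
				for (t = 0; t < nr_thp_modes; t++)
					for (n = 0; n < ARRAY_SIZE(nr_pages_list); n++)
						runs++;
				/* hugetlb mappings: every page count, no THP mode */
				for (n = 0; n < ARRAY_SIZE(nr_pages_list); n++)
					runs++;
			}
		}
	}
	return runs;
}
```

Under these assumptions the total is 3 x 2 x 2 x (2 x 4 + 4) = 144 runs.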

---
These patches apply on top of mm/mm-new.

Changes in v2:
- Address v1 feedback from Sashiko
- Add fast and longterm GUP/PUP coverage
- Sweep nr_pages_per_call over 1, 512, 123, and all pages
- Call madvise(MADV_NOHUGEPAGE) in non-THP variants
- Use 256 MB for hugetlb fixtures
- Use hugetlb_restore_settings() in FIXTURE_TEARDOWN instead of atexit()
- Add TH_LOG to report nr_pages_per_call for each iteration
- Update Documentation/core-api/pin_user_pages.rst unit testing section

Sarthak Sharma (2):
  tools/mm: add a standalone GUP microbenchmark
  selftests/mm: rewrite gup_test as a standalone harness-based selftest

 Documentation/core-api/pin_user_pages.rst |  12 +-
 MAINTAINERS                               |   1 +
 tools/mm/.gitignore                       |   2 +
 tools/mm/Makefile                         |   6 +-
 tools/mm/gup_bench.c                      | 491 ++++++++++++++++++++
 tools/testing/selftests/mm/gup_test.c     | 536 +++++++++++++---------
 tools/testing/selftests/mm/run_vmtests.sh |  37 +-
 7 files changed, 822 insertions(+), 263 deletions(-)
 create mode 100644 tools/mm/gup_bench.c

base-commit: 2c3f468717231305523ddcd94d91c0d5e4a72419
-- 
2.39.5

^ permalink raw reply

* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: David Woodhouse @ 2026-05-19 11:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paolo Bonzini, Marc Zyngier, Jonathan Corbet, Shuah Khan, kvm,
	Linux Doc Mailing List, Kernel Mailing List, Linux,
	Sean Christopherson, Jim Mattson, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Raghavendra Rao Ananta, Eric Auger, Kees Cook, Arnd Bergmann,
	Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <agxFbniU_6eQ98t2@willie-the-truck>

[-- Attachment #1: Type: text/plain, Size: 3074 bytes --]

On Tue, 2026-05-19 at 12:11 +0100, Will Deacon wrote:
> On Tue, May 19, 2026 at 11:41:26AM +0100, David Woodhouse wrote:
> > On Wed, 2026-05-13 at 18:24 +0200, Paolo Bonzini wrote:
> > > 
> > > > See commit https://git.kernel.org/torvalds/c/49a1a2c70a7f which adds a
> > > > new guest-visible feature in revision 3, but allowed userspace to
> > > > restore the old behaviour by setting it to revision 2. All my patch
> > > >  above does, is make it possible to set it to revision 1 as
> > > > well. Because https://git.kernel.org/torvalds/c/d53c2c29ae0d previously
> > > > changed the behaviour and bumped the default to 2 *without* allowing
> > > > userspace to restore the prior behaviour, and we've been carrying a
> > > > *revert* of that patch.
> > > > 
> > > > Why would we *not* accept such a patch?
> > > 
> > > Agreed. Even ignoring your revert, there's no reason why any upgrade
> > > past 49a1a2c70a7f has to be from after d53c2c29ae0d.
> > 
> > So where do we go from here?
> > 
> > I assume you'll be taking this Documentation patch via the KVM tree?
> > 
> > But what about the actual fix at 
> > https://lore.kernel.org/all/20260511113558.3325004-2-dwmw2@infradead.org/
> > 
> > This is a simple and unintrusive bug fix to make KVM/arm64 follow the
> > "common sense" requirement that the doc patch codifies, apparently
> > being rejected with the rather bizarre claim that KVM has no *need* to
> > maintain guest-visible compatibility across host kernel changes.
> > 
> > So... what next? Is one of the other KVM/arm64 maintainers going to
> > speak up? Paolo would you consider taking the fixes through your tree
> > directly? 
> > 
> > Does Arm not actually *care* whether AArch64 is considered a stable and
> > mature platform for KVM hosting?
> 
> Hey, come on. Marc cares more about this stuff than anybody else on the
> planet. He's been single-handedly maintaining the tree for the past
> couple of releases while Oliver was out and he's on the end of a _lot_
> of patches. I'm only cc'd on a fraction of the KVM/arm64 changes and
> it's bedlam.

I certainly wouldn't disagree with any of that. The depth of knowledge
and the amount of energy that Marc displays through this work is
impressive, and I'm sure we all have an enormous amount of respect for
it, and for him. I know I do.

Nevertheless, the specific technical decision to reject the simple bug
fix linked above is dead wrong.

Because the principle under which it was rejected — the idea that KVM
has no responsibility to maintain compatibility of guest-visible
behaviour from one kernel version to the next — is also dead wrong.

If KVM on arm64 doesn't aspire to maintain guest compatibility across
host kernel changes — regardless of whether the previous kernel's
behaviour was "blessed" by the architecture specification or not — then
it does not meet the expectation that we have of KVM implementations in
the Linux kernel.

Or indeed the standards that we've held for Linux kernel ABIs for the
last 35 years.
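
The compatibility knob argued for in this thread can be sketched roughly
as follows. This is an illustration only, not the code from the linked
patch: it assumes the revision sits in bits [15:12] of the IIDR value,
and the bounds and helper names (REV_MIN, REV_MAX, iidr_write_allowed(),
rev3_feature_visible()) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the revision occupies bits [15:12] of the IIDR value. */
#define IIDR_REV_SHIFT	12
#define IIDR_REV_MASK	(0xfu << IIDR_REV_SHIFT)

/* Hypothetical bounds: oldest behaviour still offered, newest implemented. */
#define REV_MIN	1u
#define REV_MAX	3u

unsigned int iidr_revision(uint32_t iidr)
{
	return (iidr & IIDR_REV_MASK) >> IIDR_REV_SHIFT;
}

/* Accept a userspace write only if the requested revision can be honoured. */
bool iidr_write_allowed(uint32_t iidr)
{
	unsigned int rev = iidr_revision(iidr);

	return rev >= REV_MIN && rev <= REV_MAX;
}

/* Hypothetical gate: the revision-3 feature stays hidden at lower revisions. */
bool rev3_feature_visible(uint32_t iidr)
{
	return iidr_revision(iidr) >= 3;
}
```

The point of the sketch is that a guest started under an older revision
keeps seeing the older behaviour after a host kernel upgrade, because
userspace can write back the revision it saved.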

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/6] io_uring/zcrx: add CQE based notifications and stats reporting
From: Pavel Begunkov @ 2026-05-19 11:43 UTC (permalink / raw)
  To: Clément Léger, io-uring, Jens Axboe
  Cc: linux-doc, linux-kernel, linux-kselftest, netdev, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Shuah Khan, Vishwanath Seshagiri
In-Reply-To: <20260518153532.2835502-1-cleger@meta.com>

On 5/18/26 16:35, Clément Léger wrote:
> The zcrx path can encounter various conditions that lead to internal
> fallbacks or errors. These errors can have a large impact on performance
> and functionality but are not yet being reported to the user, who
> is then unable to take action.
> 
> This series addresses this problem by adding a new notification system
> paired with a statistics structure. The notification system currently
> reports out-of-buffer events and packets that fall back to copy. The
> statistics structure reports the number and total size of packets that
> were copied rather than received via the zero-copy path.
> 
> The out of buffer notification allows the user to actually adjust the
> buffer sizing when registering zcrx support for the ifq. Some future
> work could allow the user to add more memory on the fly to the pool so
> the page allocator doesn't run out of memory.

Looks good, I'm going to take the first 4 and send out with other
zcrx patches.

-- 
Pavel Begunkov


^ permalink raw reply

* Re: [PATCH] nios2: remove the architecture
From: Dinh Nguyen @ 2026-05-19 11:40 UTC (permalink / raw)
  To: Wolfram Sang, Simon Schuster
  Cc: Ethan Nelson-Moore, Peter Zijlstra, Arnd Bergmann, linux-doc,
	devicetree, workflows, Linux-Arch, dmaengine, linux-i2c,
	linux-iio, Netdev, linux-pci, linux-pwm, linux-hardening,
	linux-kbuild, linux-csky@vger.kernel.org, Jonathan Corbet,
	Shuah Khan, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Daniel Lezcano, Thomas Gleixner, Alex Shi, Yanteng Si,
	Dongliang Mu, Hu Haowen, Kees Cook, Oleg Nesterov, Will Deacon,
	Aneesh Kumar K.V (Arm), Andrew Morton, Nicholas Piggin,
	Vinod Koul, Frank Li, Dave Penkler, Andi Shyti, Jonathan Cameron,
	David Lechner, Nuno Sá, Andy Shevchenko, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Andreas Oetken
In-Reply-To: <agxBqd-ubOL2_i-j@shikoro>

Hi Simon,

On 5/19/26 05:55, Wolfram Sang wrote:
> Hi Simon,
> 
>>> ... but given this, you might want to get added in MAINTAINERS as
>>> reviewer (or even maintainer) for nios2? Besides that your efforts are
>>> already worth it in my book, it would also ensure you get CCed on
>>> patches like this. Then, you are not depending on people like Arnd
>>> putting you in the loop manually.
>>
>> Sure, I'd be glad to do so, but so far I refrained from it as I was a bit
>> unsure about the netiquette (can I simply do so by self-proclamation? At
>> least the git history seems to suggest so...).
> 
> In your case, you can do so, I'd say. You explained your very reasonable
> interest in the architecture and have already shown efforts to keep it,
> as we can see from the git history. The final call will be done by Dinh
> Nguyen obviously with whom you probably need to sort out details. But I
> can't imagine your offer for help will be rejected, quite the contrary.
> 

I 100% support adding you as a maintainer. Please send a patch.

Thanks,
Dinh


^ permalink raw reply

* Re: [Linaro-mm-sig] Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Boris Brezillon @ 2026-05-19 11:37 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Chia-I Wu, Liviu Dudau, Marcin Ślusarz, Ketil Johnsen,
	David Airlie, Simona Vetter, Maarten Lankhorst, Thomas Zimmermann,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Benjamin Gaignard,
	Brian Starkey, John Stultz, T.J. Mercier, Christian König,
	Steven Price, Daniel Almeida, Alice Ryhl, Matthias Brugger,
	AngeloGioacchino Del Regno, dri-devel, linux-doc, linux-kernel,
	linux-media, linaro-mm-sig, linux-arm-kernel, linux-mediatek,
	Florent Tomasin, nd
In-Reply-To: <20260519-loutish-beautiful-trogon-67453f@houat>

On Tue, 19 May 2026 11:52:13 +0200
Maxime Ripard <mripard@kernel.org> wrote:

> Hi Boris,
> 
> On Mon, May 18, 2026 at 09:16:50AM +0200, Boris Brezillon wrote:
> > On Wed, 13 May 2026 12:31:32 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >   
> > > On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:  
> > > >
> > > > On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:    
> > > > > On Tue, 12 May 2026 14:47:27 +0100
> > > > > Liviu Dudau <liviu.dudau@arm.com> wrote:
> > > > >    
> > > > > > On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:    
> > > > > > > On Thu, 7 May 2026 11:02:26 +0200
> > > > > > > Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> > > > > > >    
> > > > > > > > On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:    
> > > > > > > > > > @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> > > > > > > > > >                     return ret;
> > > > > > > > > >     }
> > > > > > > > > >
> > > > > > > > > > +   /* If a protected heap name is specified but not found, defer the probe until created */
> > > > > > > > > > +   if (protected_heap_name && strlen(protected_heap_name)) {    
> > > > > > > > >
> > > > > > > > > > Do we really need this strlen() > 0? Won't dma_heap_find() fail if the
> > > > > > > > > name is "" already?    
> > > > > > > >
> > > > > > > > > If dma_heap_find() fails, then the whole probe will fail too.
> > > > > > > > This check prevents that.    
> > > > > > >
> > > > > > > Yeah, that's also a questionable design choice. I mean, we can
> > > > > > > currently probe and boot the FW even though we never setup the
> > > > > > > protected FW sections, so why should we defer the probe here? Can't we
> > > > > > > just retry the next time a group with the protected bit is created and
> > > > > > > > fail if we can't find a protected heap?
> > > > > >
> > > > > > The problem we have with the current firmware is that it does a number of setup steps at "boot"
> > > > > > time only. One of the steps is preparing its internal structures for when it enters protected
> > > > > > mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> > > > > > process when we have a group with protected mode set.    
> > > > >
> > > > > No, but we can force a full/slow reset and have that thing
> > > > > re-initialized, can't we? I mean, that's basically what we do when a
> > > > > fast reset fails: we re-initialize all the sections and reset again, at
> > > > > which point the FW should start from a fresh state, and be able to
> > > > > properly initialize the protected-related stuff if protected sections
> > > > > are populated. Am I missing something?    
> > > >
> > > > Right, we can do that. For some reason I keep associating the reset with the
> > > > error handling and not with "normal" operations.    
> > > I kind of hope we end up with either
> > > 
> > >  - panthor knows the exact heap to use and fails with EPROBE_DEFER if
> > > the heap is missing, or
> > >  - panthor gets a dma-buf from userspace and does the full reset
> > >    - userspace also needs to provide a dma-buf for each protected
> > > group for the suspend buffer
> > > 
> > > than something in-between. The latter is more ad-hoc and basically
> > > kicks the issue to the userspace.  
> > 
> > Indeed, the second option is more ad-hoc, but when you think about it,
> > userspace has to have this knowledge, because it needs to know the
> > dma-heap to use for buffer allocations that cross a device boundary
> > anyway. Think about frames produced by a video decoder, and composited
> > by the GPU into a protected scanout buffer that's passed to the KMS
> > device. Why would the GPU driver be source of truth when it comes to
> > choosing the heap to use to allocate protected buffers for the video
> > decoder or those used for the display?  
> 
> Just fyi, the trend is to go to devices listing the heaps userspace
> should allocate from

Devices listing the heaps they are able to import buffers from
(with the list being different based on the buffer properties, I
guess) is a good thing. This way the central allocator is in a position
where it can intersect the devices lists and decide which heap to
allocate from based on its buffer sharing knowledge.
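
A toy model of that intersection step, purely for illustration: each
device's list of importable heaps is represented as a bitmask, and the
allocator picks a heap that every device involved can import from. The
heap_mask_t type and both helpers are hypothetical; nothing here
reflects an existing dma-heap interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit i set means the device can import from heap i. */
typedef uint32_t heap_mask_t;

/* Intersect the per-device masks; an empty device list yields no candidates. */
heap_mask_t common_heaps(const heap_mask_t *dev_masks, size_t nr_devs)
{
	heap_mask_t common;
	size_t i;

	if (!nr_devs)
		return 0;
	common = dev_masks[0];
	for (i = 1; i < nr_devs; i++)
		common &= dev_masks[i];
	return common;
}

/* Pick the lowest-numbered common heap, or -1 if the devices share none. */
int pick_heap(const heap_mask_t *dev_masks, size_t nr_devs)
{
	heap_mask_t common = common_heaps(dev_masks, nr_devs);
	int i;

	for (i = 0; i < 32; i++)
		if (common & (1u << i))
			return i;
	return -1;
}
```

A real allocator would presumably keep one such list per set of buffer
properties (protected or not, for instance) rather than a single mask.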

> and/or using the heaps internally to allocate their
> buffers,

Yes, that too. For internal buffers (especially the device-wide ones,
like the protected FW sections we were discussing), it makes sense to
leave that up to the driver.

> so that last part is where we're headed, and feels totally
> reasonable to me.

Just to be clear, my main concern right now is not the long term plan,
but how realistic it is to assume we'll have all the DT/dma_heap pieces
in place in a reasonable amount of time. Looking at the current state
of affairs (based on this patchset), it feels like we're a long way
from having a robust way of exposing dma_heaps to in-kernel users
(refcounting, lifetime issues, describing allowed heaps, ensuring heaps
truly provide buffers with the expected properties, ...). I'm certainly
not saying these are not valid concerns, but I'd like to have a
temporary solution to support protected rendering in the meantime...

> 
> Maxime


^ permalink raw reply

* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: Will Deacon @ 2026-05-19 11:11 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Marc Zyngier, Jonathan Corbet, Shuah Khan, kvm,
	Linux Doc Mailing List, Kernel Mailing List, Linux,
	Sean Christopherson, Jim Mattson, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Raghavendra Rao Ananta, Eric Auger, Kees Cook, Arnd Bergmann,
	Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <3f9d731c3d26b0367600f1069e6425099bc34eac.camel@infradead.org>

On Tue, May 19, 2026 at 11:41:26AM +0100, David Woodhouse wrote:
> On Wed, 2026-05-13 at 18:24 +0200, Paolo Bonzini wrote:
> > 
> > > See commit https://git.kernel.org/torvalds/c/49a1a2c70a7f which adds a
> > > new guest-visible feature in revision 3, but allowed userspace to
> > > restore the old behaviour by setting it to revision 2. All my patch
> > >  above does, is make it possible to set it to revision 1 as
> > > well. Because https://git.kernel.org/torvalds/c/d53c2c29ae0d previously
> > > changed the behaviour and bumped the default to 2 *without* allowing
> > > userspace to restore the prior behaviour, and we've been carrying a
> > > *revert* of that patch.
> > > 
> > > Why would we *not* accept such a patch?
> > 
> > Agreed. Even ignoring your revert, there's no reason why any upgrade
> > past 49a1a2c70a7f has to be from after d53c2c29ae0d.
> 
> So where do we go from here?
> 
> I assume you'll be taking this Documentation patch via the KVM tree?
> 
> But what about the actual fix at 
> https://lore.kernel.org/all/20260511113558.3325004-2-dwmw2@infradead.org/
> 
> This is a simple and unintrusive bug fix to make KVM/arm64 follow the
> "common sense" requirement that the doc patch codifies, apparently
> being rejected with the rather bizarre claim that KVM has no *need* to
> maintain guest-visible compatibility across host kernel changes.
> 
> So... what next? Is one of the other KVM/arm64 maintainers going to
> speak up? Paolo would you consider taking the fixes through your tree
> directly? 
> 
> Does Arm not actually *care* whether AArch64 is considered a stable and
> mature platform for KVM hosting?

Hey, come on. Marc cares more about this stuff than anybody else on the
planet. He's been single-handedly maintaining the tree for the past
couple of releases while Oliver was out and he's on the end of a _lot_
of patches. I'm only cc'd on a fraction of the KVM/arm64 changes and
it's bedlam.

Will

^ permalink raw reply

* Re: [PATCH v2] docs: fix typo in uniwill-laptop.rst
From: Ilpo Järvinen @ 2026-05-19 11:09 UTC (permalink / raw)
  To: Sakurai Shun
  Cc: Armin Wolf, Jonathan Corbet, Shuah Khan, platform-driver-x86,
	linux-doc, linux-kernel
In-Reply-To: <20260517024148.9642-1-ssh1326@icloud.com>

On Sun, 17 May 2026, Sakurai Shun wrote:

> Replace "benifit" with "benefit".
> 
> Signed-off-by: Sakurai Shun <ssh1326@icloud.com>

Thanks for the patch.

When sending an update, you should collect the tags from the earlier 
version.

No need to send another version because of it, I've added Armin's 
Reviewed-by while applying to review-ilpo-next (it will appear there later 
once I push the local changes into the public repo).

-- 
 i.

> ---
>  Documentation/wmi/devices/uniwill-laptop.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/wmi/devices/uniwill-laptop.rst b/Documentation/wmi/devices/uniwill-laptop.rst
> index e246bf293..65583b239 100644
> --- a/Documentation/wmi/devices/uniwill-laptop.rst
> +++ b/Documentation/wmi/devices/uniwill-laptop.rst
> @@ -189,7 +189,7 @@ Indexed IO
>  
>  Indexed IO with IO ports with a granularity of a single byte can be performed using the ``RIOP``
>  (read) and ``WIOP`` (write) ACPI control methods. Those ACPI methods are unused because they
> -provide no benifit when compared to the native IO port access functions provided by the kernel.
> +provide no benefit when compared to the native IO port access functions provided by the kernel.
>  
>  Special thanks go to github user `pobrn` which developed the
>  `qc71_laptop <https://github.com/pobrn/qc71_laptop>`_ driver on which this driver is partly based.
> 

^ permalink raw reply

* Re: [PATCH] nios2: remove the architecture
From: Miguel Ojeda @ 2026-05-19 11:07 UTC (permalink / raw)
  To: Simon Schuster
  Cc: Ethan Nelson-Moore, Wolfram Sang, Peter Zijlstra, Arnd Bergmann,
	Dinh Nguyen, linux-doc, devicetree, workflows, Linux-Arch,
	dmaengine, linux-i2c, linux-iio, Netdev, linux-pci, linux-pwm,
	linux-hardening, linux-kbuild, linux-csky@vger.kernel.org,
	Jonathan Corbet, Shuah Khan, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Daniel Lezcano, Thomas Gleixner, Alex Shi,
	Yanteng Si, Dongliang Mu, Hu Haowen, Kees Cook, Oleg Nesterov,
	Will Deacon, Aneesh Kumar K.V (Arm), Andrew Morton,
	Nicholas Piggin, Vinod Koul, Frank Li, Dave Penkler, Andi Shyti,
	Jonathan Cameron, David Lechner, Nuno Sá, Andy Shevchenko,
	Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Andreas Oetken
In-Reply-To: <20260519103012.blot4bssgiqfer6p@dev-vm-schuster>

On Tue, May 19, 2026 at 12:41 PM Simon Schuster
<schuster.simon@siemens-energy.com> wrote:
>
> Sure, I'd be glad to do so, but so far I refrained from it as I was a bit
> unsure about the netiquette (can I simply do so by self-proclamation? At
> least the git history seems to suggest so...).

Up to the existing maintainer, in general.

I would also suggest changing the support level to "Supported",
instead of "Maintained" -- that would help justify keeping it in
mainline.

I hope that helps a bit...

Cheers,
Miguel

^ permalink raw reply

* Re: [PATCH] nios2: remove the architecture
From: Wolfram Sang @ 2026-05-19 10:55 UTC (permalink / raw)
  To: Simon Schuster
  Cc: Ethan Nelson-Moore, Peter Zijlstra, Arnd Bergmann, Dinh Nguyen,
	linux-doc, devicetree, workflows, Linux-Arch, dmaengine,
	linux-i2c, linux-iio, Netdev, linux-pci, linux-pwm,
	linux-hardening, linux-kbuild, linux-csky@vger.kernel.org,
	Jonathan Corbet, Shuah Khan, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Daniel Lezcano, Thomas Gleixner, Alex Shi,
	Yanteng Si, Dongliang Mu, Hu Haowen, Kees Cook, Oleg Nesterov,
	Will Deacon, Aneesh Kumar K.V (Arm), Andrew Morton,
	Nicholas Piggin, Vinod Koul, Frank Li, Dave Penkler, Andi Shyti,
	Jonathan Cameron, David Lechner, Nuno Sá, Andy Shevchenko,
	Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Andreas Oetken
In-Reply-To: <20260519103012.blot4bssgiqfer6p@dev-vm-schuster>

Hi Simon,

> > ... but given this, you might want to get added in MAINTAINERS as
> > reviewer (or even maintainer) for nios2? Besides that your efforts are
> > already worth it in my book, it would also ensure you get CCed on
> > patches like this. Then, you are not depending on people like Arnd
> > putting you in the loop manually.
> 
> Sure, I'd be glad to do so, but so far I refrained from it as I was a bit
> unsure about the netiquette (can I simply do so by self-proclamation? At
> least the git history seems to suggest so...).

In your case, you can do so, I'd say. You explained your very reasonable
interest in the architecture and have already shown efforts to keep it,
as we can see from the git history. The final call will be done by Dinh
Nguyen obviously with whom you probably need to sort out details. But I
can't imagine your offer for help will be rejected, quite the contrary.

Happy hacking,

   Wolfram


^ permalink raw reply

* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: David Woodhouse @ 2026-05-19 10:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, Jonathan Corbet, Shuah Khan, kvm,
	Linux Doc Mailing List, Kernel Mailing List, Linux,
	Sean Christopherson, Jim Mattson, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Raghavendra Rao Ananta, Eric Auger, Kees Cook, Arnd Bergmann,
	Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <CABgObfaM-JtNn2MuYXaiadQnLfAhTEaoHAcTG9=J6LkMcQCJ3A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1763 bytes --]

On Wed, 2026-05-13 at 18:24 +0200, Paolo Bonzini wrote:
> 
> > See commit https://git.kernel.org/torvalds/c/49a1a2c70a7f which adds a
> > new guest-visible feature in revision 3, but allowed userspace to
> > restore the old behaviour by setting it to revision 2. All my patch
> >  above does, is make it possible to set it to revision 1 as
> > well. Because https://git.kernel.org/torvalds/c/d53c2c29ae0d previously
> > changed the behaviour and bumped the default to 2 *without* allowing
> > userspace to restore the prior behaviour, and we've been carrying a
> > *revert* of that patch.
> > 
> > Why would we *not* accept such a patch?
> 
> Agreed. Even ignoring your revert, there's no reason why any upgrade
> past 49a1a2c70a7f has to be from after d53c2c29ae0d.
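
For readers following along, the compatibility rule being argued for can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: it is not KVM code, and every name in it (model_dev,
model_dev_set_rev, MODEL_REV_MIN/MAX) is hypothetical. It shows a device whose
default is the newest revision, while every earlier revision a previous host
could have exposed stays selectable by userspace.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model, not KVM code: revisions this "host" can emulate. */
enum { MODEL_REV_MIN = 1, MODEL_REV_MAX = 3 };

struct model_dev {
	unsigned int rev;	/* revision the guest currently sees */
};

/* A freshly created device defaults to the newest implemented revision. */
void model_dev_init(struct model_dev *dev)
{
	dev->rev = MODEL_REV_MAX;
}

/*
 * Userspace restores the revision the guest was originally created with.
 * Every revision an older host could have exposed remains selectable;
 * anything outside the implemented range is rejected and changes nothing.
 */
int model_dev_set_rev(struct model_dev *dev, unsigned int rev)
{
	if (rev < MODEL_REV_MIN || rev > MODEL_REV_MAX)
		return -EINVAL;
	dev->rev = rev;
	return 0;
}

/* Guest-visible behaviour is keyed only on the selected revision. */
int model_dev_has_rev3_feature(const struct model_dev *dev)
{
	return dev->rev >= 3;
}
```

In this model, dropping MODEL_REV_MIN to 1 is the whole fix: a guest started
on a host that defaulted to revision 1 can then be moved to a newer host
without its visible behaviour changing.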

So where do we go from here?

I assume you'll be taking this Documentation patch via the KVM tree?

But what about the actual fix at 
https://lore.kernel.org/all/20260511113558.3325004-2-dwmw2@infradead.org/

This is a simple and unintrusive bug fix to make KVM/arm64 follow the
"common sense" requirement that the doc patch codifies, apparently
being rejected with the rather bizarre claim that KVM has no *need* to
maintain guest-visible compatibility across host kernel changes.

So... what next? Is one of the other KVM/arm64 maintainers going to
speak up? Paolo would you consider taking the fixes through your tree
directly? 

Does Arm not actually *care* whether AArch64 is considered a stable and
mature platform for KVM hosting?

We don't have CONFIG_EXPERIMENTAL any more, do we? Or perhaps we could
mark it such. Is CONFIG_STAGING the right thing, for unstable things
which might violate the normal maturity expectations of the kernel?

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH] nios2: remove the architecture
From: Simon Schuster @ 2026-05-19 10:30 UTC (permalink / raw)
  To: Ethan Nelson-Moore, Wolfram Sang
  Cc: Peter Zijlstra, Arnd Bergmann, Dinh Nguyen, linux-doc, devicetree,
	workflows, Linux-Arch, dmaengine, linux-i2c, linux-iio, Netdev,
	linux-pci, linux-pwm, linux-hardening, linux-kbuild,
	linux-csky@vger.kernel.org, Jonathan Corbet, Shuah Khan,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Daniel Lezcano,
	Thomas Gleixner, Alex Shi, Yanteng Si, Dongliang Mu, Hu Haowen,
	Kees Cook, Oleg Nesterov, Will Deacon, Aneesh Kumar K.V (Arm),
	Andrew Morton, Nicholas Piggin, Vinod Koul, Frank Li,
	Dave Penkler, Andi Shyti, Jonathan Cameron, David Lechner,
	Nuno Sá, Andy Shevchenko, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Andreas Oetken
In-Reply-To: <CADkSEUjhq6HSdg4ignzbuJiN5uXATsTdxFbRJ3BMxs5=WUWLDg@mail.gmail.com>

Hi Ethan, hi Wolfram,

Thank you for your thoughtful responses.

On Mon, May 18, 2026 at 05:13:58PM -0700, Ethan Nelson-Moore wrote:
> Your reasoning makes complete sense. However, there is an alternative
> to maintaining the architecture in mainline.
> 
> The Civil Infrastructure Platform project maintains super-LTS kernels
> (and a set of base Debian packages) for 10 years. They are intended to
> be used for exactly these kinds of devices.
> See here: https://wiki.linuxfoundation.org/civilinfrastructureplatform/start#kernel_maintainership
> and here: https://cip-project.org/about/linux-kernel-core-packages
> 
> CIP will maintain kernel 6.12 until 2035. Is this long enough for your
> lifecycle? What kernel are you currently using? If it's newer than
> 6.12, we can easily wait until the next CIP SLTS release to remove
> Nios II support to avoid a downgrade.

This depends. For released/maintained firmware revisions we already
track CIP SLTS versions (candidates) to be prepared, the majority of which
is currently still running 6.1.x with 6.12.x up-and-coming.
But for the reasons outlined by you regarding architectural and feature
support in CIP SLTS, we do not, however, use the extended support duration
SLTS releases in production, and instead upgrade with the kernel.org LTS
branch release schedule and track these internally alongside mainline
to prevent major obstacles during version jumps.
2035 is still a rather tight timeframe for our typical support/phase-out
period (we would hope to get close to 2040 with the SLTS extensions),
which is also the reason for our targeted 'lifetime extension' for the
nios2 architecture for approximately 5 years, or more precisely ~2-3
SLTS kernels assuming the usual cadence of 2 years between SLTS versions
(+ some safety margin).

> Also, CIP focuses on architectures used by CIP members - currently I
> think they are x86 (32 and 64-bit), ARM (32 and 64-bit) and RISC-V.
> Since Siemens is already a CIP member, you can simply ask them to add
> Nios II to the list, and you can assist them with testing and directly
> submit patches to them once the standard 6.12 LTS period ends.

We have already been in contact with the CIP team (even though the
contact has unfortunately lapsed a bit, mostly our fault), but adding an
additional architecture seemed to be a more substantial effort.
N.B.: Due to past circumstances, we are a completely distinct business
entity from Siemens AG that merely shares the trademark and a common
history; but of course this should not hinder us from getting directly
involved in CIP (quite the opposite!). But this also requires some setup
time.

On Mon, May 18, 2026 at 10:46:55PM +0200, Wolfram Sang wrote:
> > If desired, we also would be happy to intensify our support regarding
> > reviews or testing to share the maintenance burden if it helps to keep
> > nios2 in mainline a bit longer.
> 
> ... but given this, you might want to get added in MAINTAINERS as
> reviewer (or even maintainer) for nios2? Besides that your efforts are
> already worth it in my book, it would also ensure you get CCed on
> patches like this. Then, you are not depending on people like Arnd
> putting you in the loop manually.

Sure, I'd be glad to do so, but so far I refrained from it as I was a bit
unsure about the netiquette (can I simply do so by self-proclamation? At
least the git history seems to suggest so...).

Best regards,
Simon

^ permalink raw reply

* [agd5f:drm-next 36/58] htmldocs: Documentation/gpu/amdgpu/display/display-manager:50: ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c:420: WARNING: Error in declarator or parameters
From: kernel test robot @ 2026-05-19 10:10 UTC (permalink / raw)
  To: Alex Hung
  Cc: oe-kbuild-all, Alex Deucher, Harry Wentland, Ivan Lipski,
	linux-doc

tree:   https://gitlab.freedesktop.org/agd5f/linux.git drm-next
head:   99cbcb3453b7d19cab507db9313ada9a38e82d01
commit: 82ffa89fa2803f9288163f538151a45581a88ca2 [36/58] drm/amd/display: Add KUnit test for color helpers
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260519/202605191223.ct8ZUEYU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605191223.ct8ZUEYU-lkp@intel.com/

All warnings (new ones prefixed by >>):

   AMD plane color pipeline
   ------------------------ [docutils]
>> Documentation/gpu/amdgpu/display/display-manager:50: ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c:420: WARNING: Error in declarator or parameters
   Invalid C declaration: Expected identifier in nested name, got keyword: struct [error at 29]
   STATIC_IFN_KUNIT const struct drm_color_lut * __extract_blob_lut (const struct drm_property_blob *blob, uint32_t *size)
   -----------------------------^
   Documentation/gpu/amdgpu/display/display-manager:50: ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c:437: WARNING: Error in declarator or parameters
   Invalid C declaration: Expected identifier in nested name, got keyword: struct [error at 29]

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH net-next v3 02/14] libie: add PCI device initialization helpers to libie
From: Larysa Zaremba @ 2026-05-19 10:03 UTC (permalink / raw)
  To: phasta, Bjorn Helgaas
  Cc: Bjorn Helgaas, Tony Nguyen, davem, kuba, pabeni, edumazet,
	andrew+netdev, netdev, Phani R Burra, przemyslaw.kitszel,
	aleksander.lobakin, sridhar.samudrala, anjali.singhai,
	michal.swiatkowski, maciej.fijalkowski, emil.s.tantilov,
	madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, bhelgaas, linux-pci, Bharath R, Samuel Salin,
	Aleksandr Loktionov
In-Reply-To: <7a477885c58709f287f6c1440fb7e31331227d10.camel@mailbox.org>

On Tue, May 19, 2026 at 10:20:27AM +0200, Philipp Stanner wrote:
> On Mon, 2026-05-18 at 16:54 -0500, Bjorn Helgaas wrote:
> > [+cc Philipp]
> > 
> > On Fri, May 15, 2026 at 03:44:26PM -0700, Tony Nguyen wrote:
> > > From: Phani R Burra <phani.r.burra@intel.com>
> > > 
> > > Add support functions for drivers to configure PCI functionality and access
> > > MMIO space.
> > 
> > This looks kind of like what pcim_iomap_range() does, i.e., a way to
> > ioremap (BAR-idx, offset, size) pieces of PCI BARs.  That sounds like
> > useful functionality.

I agree that pci_iomap_range() could simplify the implementation a little bit. 
But libie_pci API is still needed for ixd and idpf.

I guess the commit message lacks clarity. Apart from mapping separate ranges, 
libie_pci also adds a list which is intended to be traversed by a driver, e.g. 
to pass certain IO mappings to the auxbus devices. Such list is also traversed 
by the libie_pci_get_mmio_addr() helper. And there is also some input 
validation, because ranges are received from a very customizable FW. So a lot of 
driver-specific convenience stuff.
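
To make the list-plus-validation idea concrete, here is a rough userspace
sketch. It is not the libie implementation: the demo_* names and the singly
linked list are assumptions made for illustration only. It shows an
overflow-safe bounds check for a firmware-provided range, a lookup of an
intersecting region within the same BAR, and a full-coverage test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped range; not the libie structure. */
struct demo_region {
	uint64_t offset;
	uint64_t size;
	unsigned int bar_idx;
	struct demo_region *next;
};

/*
 * Overflow-safe check that [offset, offset + size) lies inside a BAR of
 * bar_len bytes. Written without computing offset + size, so a hostile
 * or buggy range cannot wrap around.
 */
bool demo_range_valid(uint64_t offset, uint64_t size, uint64_t bar_len)
{
	return size != 0 && offset <= bar_len && size <= bar_len - offset;
}

/*
 * Return the first region in the same BAR that intersects the range, or
 * NULL. Assumes all ranges already passed demo_range_valid().
 */
struct demo_region *demo_find_region(struct demo_region *head,
				     uint64_t offset, uint64_t size,
				     unsigned int bar_idx)
{
	for (struct demo_region *r = head; r; r = r->next) {
		if (r->bar_idx != bar_idx)
			continue;
		/* Half-open ranges intersect when each starts before the other ends. */
		if (offset < r->offset + r->size && r->offset < offset + size)
			return r;
	}
	return NULL;
}

/* True if the region fully covers the requested range. */
bool demo_region_covers(const struct demo_region *r,
			uint64_t offset, uint64_t size)
{
	return r->offset <= offset && offset + size <= r->offset + r->size;
}
```

The bounds check is written as a subtraction on purpose, so that a range
whose end would wrap past the maximum value is rejected instead of being
silently accepted.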

> > 
> > Is there something Intel-specific or even ethernet-specific about
> > this?  If devm_* and pcim_* don't do what you need, maybe they should
> > be extended or this could be made generic so any drivers could use it?
> >
> > This looks like a mix of managed (pcim_enable_device(),
> > pcim_request_region()), and unmanaged (ioremap(), iounmap()) things.
> > I haven't looked at how all this is used, but it's pretty easy to get
> > things wrong when mixing models.
> >

The mix of managed-unmanaged things is because of static and non-static regions. 
Static regions do have a device lifetime. Dynamic are theoretically only valid 
between hard resets (e.g. from idpf_vc_core_init() and idpf_vc_core_deinit), so 
a nice thing to do in such case is to unmap them before the reset.

I settled on a "mixed" model, because this way libie_pci_init_dev() can benefit 
from managed APIs, but region management part can stay much more flexible.
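
As a rough illustration of that split, the following userspace model keeps
static and dynamic mappings on one list and drops only the ones a filter
selects, which is the shape of what would happen before a reset. It is a
sketch under assumptions, not driver code: the demo_* names are hypothetical
and free() merely stands in for the unmap and free steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: a mapping is either static (device lifetime) or dynamic. */
struct demo_map {
	int id;
	bool dynamic;
	struct demo_map *next;
};

/* Push a new mapping at the head; returns NULL on allocation failure. */
struct demo_map *demo_map_add(struct demo_map **head, int id, bool dynamic)
{
	struct demo_map *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->id = id;
	m->dynamic = dynamic;
	m->next = *head;
	*head = m;
	return m;
}

/* Drop every mapping the filter selects; returns how many were dropped. */
size_t demo_map_drop_if(struct demo_map **head,
			bool (*fltr)(const struct demo_map *m))
{
	size_t dropped = 0;

	/* Walk via a pointer to the link so the head needs no special case. */
	for (struct demo_map **link = head; *link; ) {
		struct demo_map *m = *link;

		if (!fltr(m)) {
			link = &m->next;
			continue;
		}
		*link = m->next;	/* unlink before freeing */
		free(m);		/* stands in for unmapping and freeing */
		dropped++;
	}
	return dropped;
}

/* Filter used before a reset: only dynamic mappings go away. */
bool demo_is_dynamic(const struct demo_map *m)
{
	return m->dynamic;
}

/* Filter used at teardown: everything goes away. */
bool demo_any(const struct demo_map *m)
{
	(void)m;
	return true;
}
```

Calling demo_map_drop_if() with demo_is_dynamic leaves the static entries in
place, while demo_any gives the equivalent of an unmap-all at teardown.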

I do not think making this generic makes sense, it is rather tailored to the 
requirements of idpf (data plane) + ixd (control plane) generation of intel 
ethernet devices. If you have some examples that could use the same logic, I 
could take a look.

For context:
Libie was generally created as a way to just reduce code duplication between 
intel ethernet drivers. And we had a lot of code duplication. So this library is 
mostly things that would be otherwise copy-pasted.
 
> > > +++ b/drivers/net/ethernet/intel/libie/pci.c
> > > @@ -0,0 +1,208 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Copyright (C) 2025 Intel Corporation */
> > > +
> > > +#include <linux/intel/libie/pci.h>
> > > +
> > > +/**
> > > + * libie_find_mmio_region - find MMIO region containing a range
> > > + * @mmio_list: list that contains MMIO region info
> > > + * @offset: range start offset
> > > + * @size: range size
> > > + * @bar_idx: BAR index containing the range to search
> > > + *
> > > + * Return: pointer to a MMIO region overlapping with the range in any way or
> > > + *	   NULL if no such region is mapped.

[...]

> > > +
> > > +	if (offset + size > pci_resource_len(pdev, bar_idx))
> > > +		return false;
> > > +
> > > +	mr = libie_find_mmio_region(&mmio_info->mmio_list, offset, size,
> > > +				    bar_idx);
> > > +	if (mr) {
> > > +		pci_warn(pdev,
> > > +			 "Mapping of BAR%u (offset=%llu, size=%llu) intersecting region (offset=%llu, size=%llu) already exists\n",
> > > +			 bar_idx, (unsigned long long)mr->offset,
> > > +			 (unsigned long long)mr->size,
> > > +			 (unsigned long long)offset, (unsigned long long)size);
> > > +		return mr->offset <= offset &&
> > > +		       mr->offset + mr->size >= offset + size;
> > > +	}
> > > +
> > > +	pa = pci_resource_start(pdev, bar_idx) + offset;
> > > +	va = ioremap(pa, size);
> 
> I agree with Bjorn, this certainly looks like something that can be
> covered by shared PCI infrastructure?
>

In terms of address calculation, I agree that pci_iomap_range() could help shrink 
the code a little bit.

> > > +	if (!va) {
> > > +		pci_err(pdev, "Failed to map BAR%u region\n", bar_idx);
> > > +		return false;
> > > +	}
> > > +
> > > +	mr = kvzalloc_obj(*mr);
> > > +	if (!mr) {
> > > +		iounmap(va);
> > > +		return false;
> > > +	}
> > > +
> > > +	mr->addr = va;
> > > +	mr->offset = offset;
> > > +	mr->size = size;
> > > +	mr->bar_idx = bar_idx;
> > > +
> > > +	list_add_tail(&mr->list, &mmio_info->mmio_list);
> > > +
> > > +	return true;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(__libie_pci_map_mmio_region, "LIBIE_PCI");
> > > +
> > > +/**
> > > + * libie_pci_unmap_fltr_regs - unmap selected PCI device MMIO regions
> > > + * @mmio_info: contains list of MMIO regions to unmap
> > > + * @fltr: returns true, if region is to be unmapped
> > > + */
> > > +void libie_pci_unmap_fltr_regs(struct libie_mmio_info *mmio_info,
> > > +			       bool (*fltr)(struct libie_mmio_info *mmio_info,
> > > +					    struct libie_pci_mmio_region *reg))
> > > +{
> > > +	struct libie_pci_mmio_region *mr, *tmp;
> > > +
> > > +	list_for_each_entry_safe(mr, tmp, &mmio_info->mmio_list, list) {
> > > +		if (!fltr(mmio_info, mr))
> > > +			continue;
> > > +		iounmap(mr->addr);
> > > +		list_del(&mr->list);
> > > +		kvfree(mr);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(libie_pci_unmap_fltr_regs, "LIBIE_PCI");
> > > +
> > > +/**
> > > + * libie_pci_unmap_all_mmio_regions - unmap all PCI device MMIO regions
> > > + * @mmio_info: contains list of MMIO regions to unmap
> > > + */
> > > +void libie_pci_unmap_all_mmio_regions(struct libie_mmio_info *mmio_info)
> > > +{
> > > +	struct libie_pci_mmio_region *mr, *tmp;
> > > +
> > > +	list_for_each_entry_safe(mr, tmp, &mmio_info->mmio_list, list) {
> > > +		iounmap(mr->addr);
> > > +		list_del(&mr->list);
> > > +		kvfree(mr);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(libie_pci_unmap_all_mmio_regions, "LIBIE_PCI");
> > > +
> > > +/**
> > > + * libie_pci_init_dev - enable and reserve PCI regions of the device
> > > + * @pdev: PCI device information
> > > + *
> > > + * Return: %0 on success, -%errno on failure.
> > > + */
> > > +int libie_pci_init_dev(struct pci_dev *pdev)
> > > +{
> > > +	int err;
> > > +
> > > +	err = pcim_enable_device(pdev);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	for (int bar = 0; bar < PCI_STD_NUM_BARS; bar++)
> > > +		if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) {
> > > +			err = pcim_request_region(pdev, bar, pci_name(pdev));
> 
> So mappings are handled manually, and region requests automatically
> through devres?
> 
> In case you can use (or add) a pcim_iomap_region() function for that,
> you would get consistent automatic devres management.
> 
> 
> Greetings,
> P.
> 
> > > +			if (err)
> > > +				return err;
> > > +		}
> > > +
> > > +	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	pci_set_master(pdev);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(libie_pci_init_dev, "LIBIE_PCI");
> > > +
> > > +MODULE_DESCRIPTION("Common Ethernet PCI library");
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/include/linux/intel/libie/pci.h b/include/linux/intel/libie/pci.h
> > > new file mode 100644
> > > index 000000000000..effd072c55c8
> > > --- /dev/null
> > > +++ b/include/linux/intel/libie/pci.h
> > > @@ -0,0 +1,56 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/* Copyright (C) 2025 Intel Corporation */
> > > +
> > > +#ifndef __LIBIE_PCI_H
> > > +#define __LIBIE_PCI_H
> > > +
> > > +#include <linux/pci.h>
> > > +
> > > +/**
> > > + * struct libie_pci_mmio_region - structure for MMIO region info
> > > + * @list: used to add a MMIO region to the list of MMIO regions in
> > > + *	  libie_mmio_info
> > > + * @addr: virtual address of MMIO region start
> > > + * @offset: start offset of the MMIO region
> > > + * @size: size of the MMIO region
> > > + * @bar_idx: BAR index to which the MMIO region belongs to
> > > + */
> > > +struct libie_pci_mmio_region {
> > > +	struct list_head	list;
> > > +	void __iomem		*addr;
> > > +	resource_size_t		offset;
> > > +	resource_size_t		size;
> > > +	u16			bar_idx;
> > > +};
> > > +
> > > +/**
> > > + * struct libie_mmio_info - contains list of MMIO regions
> > > + * @pdev: PCI device pointer
> > > + * @mmio_list: list of MMIO regions
> > > + */
> > > +struct libie_mmio_info {
> > > +	struct pci_dev		*pdev;
> > > +	struct list_head	mmio_list;
> > > +};
> > > +
> > > +#define libie_pci_map_mmio_region(mmio_info, offset, size, ...)	\
> > > +	__libie_pci_map_mmio_region(mmio_info, offset, size,		\
> > > +				     COUNT_ARGS(__VA_ARGS__), ##__VA_ARGS__)
> > > +
> > > +#define libie_pci_get_mmio_addr(mmio_info, offset, ...)		\
> > > +	__libie_pci_get_mmio_addr(mmio_info, offset,			\
> > > +				   COUNT_ARGS(__VA_ARGS__), ##__VA_ARGS__)
> > > +
> > > +bool __libie_pci_map_mmio_region(struct libie_mmio_info *mmio_info,
> > > +				 resource_size_t offset, resource_size_t size,
> > > +				 int num_args, ...);
> > > +void __iomem *__libie_pci_get_mmio_addr(struct libie_mmio_info *mmio_info,
> > > +					resource_size_t offset,
> > > +					int num_args, ...);
> > > +void libie_pci_unmap_all_mmio_regions(struct libie_mmio_info *mmio_info);
> > > +void libie_pci_unmap_fltr_regs(struct libie_mmio_info *mmio_info,
> > > +			       bool (*fltr)(struct libie_mmio_info *mmio_info,
> > > +					    struct libie_pci_mmio_region *reg));
> > > +int libie_pci_init_dev(struct pci_dev *pdev);
> > > +
> > > +#endif /* __LIBIE_PCI_H */
> > > -- 
> > > 2.47.1
> > > 
> 

^ permalink raw reply

* Re: [PATCH v4 09/10] dt-bindings: firmware: add arm,ras-cper
From: Ahmed Tiba @ 2026-05-19  9:57 UTC (permalink / raw)
  To: Krzysztof Kozlowski, rafael, bp, saket.dumbre, will, xueshuai,
	mchehab, krzk+dt, dave, conor+dt, vishal.l.verma, jic23, corbet,
	guohanjun, dave.jiang, catalin.marinas, lenb, tony.luck, skhan,
	djbw, alison.schofield, ira.weiny, robh
  Cc: devicetree, linux-acpi, linux-doc, Dmitry.Lamerov, linux-cxl,
	Michael.Zhao2, acpica-devel, linux-kernel, linux-arm-kernel,
	linux-edac
In-Reply-To: <d12b5738-ca14-40aa-930f-eddf3199818d@kernel.org>

On 19/05/2026 10:22, Krzysztof Kozlowski wrote:
> On 19/05/2026 11:02, Ahmed Tiba wrote:
>> On 19/05/2026 08:04, Krzysztof Kozlowski wrote:
>>> On 18/05/2026 13:57, Ahmed Tiba wrote:
>>>> Describe the DeviceTree node that exposes the Arm firmware-first
>>>> CPER provider and hook the file into MAINTAINERS so the
>>>> binding has an owner.
>>>>
>>>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
>>>
>>> Please implement previous comments.
>>
>> Could you please clarify which previous DT comments you still see
>> as unaddressed?
>>
>> My understanding was that I had addressed the earlier points on the YAML
>> description formatting, the `memory-region` description text, and the
>> example. If I missed a specific item beyond the one below, please point
>> me to it.
> 
> You do not need other nodes for your device in the example. I asked why
> this is needed for the example, but there was no answer.
Understood. I wanted the example to show the full binding context,
including how the `memory-region` phandles point to the reserved memory,
but I see your point.

I will remove the `reserved-memory` node from the example and simplify 
it to only the `arm,ras-cper` device node.

Best regards,
Ahmed




^ permalink raw reply

* [PATCH] docs: md: fix grammar in speed_limit description
From: MigMarGil @ 2026-05-19  9:56 UTC (permalink / raw)
  To: corbet, skhan; +Cc: linux-doc, linux-kernel, MigMarGil

Replace 'This are' with 'These are' in the md sysfs speed limit
section to correct grammar and improve readability.

Signed-off-by: MigMarGil <miguel.martin.gil.uni@gmail.com>
---
 Documentation/admin-guide/md.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index dc7eab191..003fd34f7 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -734,7 +734,7 @@ also have
       They should be scaled by the bitmap_chunksize.
 
    sync_speed_min, sync_speed_max
-     This are similar to ``/proc/sys/dev/raid/speed_limit_{min,max}``
+     These are similar to ``/proc/sys/dev/raid/speed_limit_{min,max}``
      however they only apply to the particular array.
 
      If no value has been written to these, or if the word ``system``
-- 
2.43.0


^ permalink raw reply related

* Re: [Linaro-mm-sig] Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Maxime Ripard @ 2026-05-19  9:52 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Chia-I Wu, Liviu Dudau, Marcin Ślusarz, Ketil Johnsen,
	David Airlie, Simona Vetter, Maarten Lankhorst, Thomas Zimmermann,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Benjamin Gaignard,
	Brian Starkey, John Stultz, T.J. Mercier, Christian König,
	Steven Price, Daniel Almeida, Alice Ryhl, Matthias Brugger,
	AngeloGioacchino Del Regno, dri-devel, linux-doc, linux-kernel,
	linux-media, linaro-mm-sig, linux-arm-kernel, linux-mediatek,
	Florent Tomasin, nd
In-Reply-To: <20260518091650.5a7a4f4a@fedora>

[-- Attachment #1: Type: text/plain, Size: 3983 bytes --]

Hi Boris,

On Mon, May 18, 2026 at 09:16:50AM +0200, Boris Brezillon wrote:
> On Wed, 13 May 2026 12:31:32 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
> 
> > On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:
> > >
> > > On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:  
> > > > On Tue, 12 May 2026 14:47:27 +0100
> > > > Liviu Dudau <liviu.dudau@arm.com> wrote:
> > > >  
> > > > > On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:  
> > > > > > On Thu, 7 May 2026 11:02:26 +0200
> > > > > > Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> > > > > >  
> > > > > > > On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:  
> > > > > > > > > @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> > > > > > > > >                     return ret;
> > > > > > > > >     }
> > > > > > > > >
> > > > > > > > > +   /* If a protected heap name is specified but not found, defer the probe until created */
> > > > > > > > > +   if (protected_heap_name && strlen(protected_heap_name)) {  
> > > > > > > >
> > > > > > > > Do we really need this strlen() > 0? Won't dma_heap_find() fail if the
> > > > > > > > name is "" already?  
> > > > > > >
> > > > > > > If dma_heap_find() will fail, then the whole probe will fail too.
> > > > > > > This check prevents that.  
> > > > > >
> > > > > > Yeah, that's also a questionable design choice. I mean, we can
> > > > > > currently probe and boot the FW even though we never setup the
> > > > > > protected FW sections, so why should we defer the probe here? Can't we
> > > > > > just retry the next time a group with the protected bit is created and
> > > > > > fail if we can't find a protected heap?  
> > > > >
> > > > > The problem we have with the current firmware is that it does a number of setup steps at "boot"
> > > > > time only. One of the steps is preparing its internal structures for when it enters protected
> > > > > mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> > > > > process when we have a group with protected mode set.  
> > > >
> > > > No, but we can force a full/slow reset and have that thing
> > > > re-initialized, can't we? I mean, that's basically what we do when a
> > > > fast reset fails: we re-initialize all the sections and reset again, at
> > > > which point the FW should start from a fresh state, and be able to
> > > > properly initialize the protected-related stuff if protected sections
> > > > are populated. Am I missing something?  
> > >
> > > Right, we can do that. For some reason I keep associating the reset with the
> > > error handling and not with "normal" operations.  
> > I kind of hope we end up with either
> > 
> >  - panthor knows the exact heap to use and fails with EPROBE_DEFER if
> > the heap is missing, or
> >  - panthor gets a dma-buf from userspace and does the full reset
> >    - userspace also needs to provide a dma-buf for each protected
> > group for the suspend buffer
> > 
> > than something in-between. The latter is more ad-hoc and basically
> > kicks the issue to the userspace.
> 
> Indeed, the second option is more ad-hoc, but when you think about it,
> userspace has to have this knowledge, because it needs to know the
> dma-heap to use for buffer allocation that cross a device boundary
> anyway. Think about frames produced by a video decoder, and composited
> by the GPU into a protected scanout buffer that's passed to the KMS
> device. Why would the GPU driver be source of truth when it comes to
> choosing the heap to use to allocate protected buffers for the video
> decoder or those used for the display?

Just fyi, the trend is to go to devices listing the heaps userspace
should allocate from and/or using the heaps internally to allocate their
buffers, so that last part is where we're headed, and feels totally
reasonable to me.

Maxime

^ permalink raw reply

* Re: [PATCH] nios2: remove the architecture
From: Geert Uytterhoeven @ 2026-05-19  9:43 UTC (permalink / raw)
  To: David Laight
  Cc: Ethan Nelson-Moore, linux-doc, devicetree, workflows, linux-arch,
	dmaengine, linux-i2c, linux-iio, netdev, linux-pci, linux-pwm,
	linux-hardening, linux-kbuild, linux-csky, Jonathan Corbet,
	Shuah Khan, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Daniel Lezcano, Thomas Gleixner, Alex Shi, Yanteng Si,
	Dongliang Mu, Hu Haowen, Dinh Nguyen, Kees Cook, Oleg Nesterov,
	Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Vinod Koul, Frank Li, Dave Penkler, Andi Shyti,
	Jonathan Cameron, David Lechner, Nuno Sá, Andy Shevchenko,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Pieralisi, Krzysztof Wilczyński
In-Reply-To: <20260519094820.1f05ab8e@pumpkin>

Hi David,

On Tue, 19 May 2026 at 10:55, David Laight <david.laight.linux@gmail.com> wrote:
> The company I used to work for used 4 NIOS II inside an fpga.
> The instruction timing for one is pretty critical, it has some code that
> has to complete in 122 clocks (worst case).
> Our solution was to spend a few man-weeks writing a compatible cpu!
> I think it came out with fewer pipeline stalls (in particular it 'lost'
> the one for a (predicted) taken branch).
> The maximum clock frequency might be lower; but it is ok at 62.5MHz and the
> higher 125MHz is just impossible for all sorts of reasons.
>
> OTOH I really wouldn't run Linux on it!

Sounds similar to what CoreSemi is doing with J2 (nommu, also for
predictable latency), but their products do run Linux.
See the video from the LPC session at
https://lpc.events/event/19/contributions/2097/

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-19  9:43 UTC (permalink / raw)
  To: Barry Song
  Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
	Shuah Khan, Sumit Semwal, Christian König, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CAGsJ_4xwJ7SAhKPJyRtMTw6psTO7H1EcFFpDw0po1W8PX4FE8g@mail.gmail.com>

On Tue, May 19, 2026 at 12:43 AM Barry Song <baohua@kernel.org> wrote:
>
> On Mon, May 18, 2026 at 8:16 PM Albert Esteve <aesteve@redhat.com> wrote:
> >
> > On Sat, May 16, 2026 at 9:37 AM Barry Song <baohua@kernel.org> wrote:
> > >
> > > On Tue, May 12, 2026 at 5:18 PM Albert Esteve <aesteve@redhat.com> wrote:
> > > >
> > > > On embedded platforms a central process often allocates dma-buf
> > > > memory on behalf of client applications. Without a way to
> > > > attribute the charge to the requesting client's cgroup, the
> > > > cost lands on the allocator, making per-cgroup memory limits
> > > > ineffective for the actual consumers.
> > > >
> > > > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > > > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > > > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > > > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > > > the mem_accounting module parameter enabled, the buffer is charged
> > > > to the allocator's own cgroup.
> > > >
> > > > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > > > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > > > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > > > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > > > all accounting through a single MEMCG_DMABUF path.
> > > >
> > > [...]
> > >
> > > > -               if (mem_accounting)
> > > > -                       flags |= __GFP_ACCOUNT;
> > >
> > > Hi Albert,
> > >
> > > would it be better to move this and its description to patch 1? It
> > > looks like patch 1 already introduces the double accounting changes,
> > > and patch 2 is mainly just supporting remote charging.
> >
> > Hi Barry,
> >
> > Thanks for looking into this series! Yes, in my head I was trying to
> > keep patch 1, which was taken from a previous, different series, and
> > then diverge from it starting with patch 2. This would clarify the
> > difference between the two. But I can see it just added some confusion
> > (for example, patch 1 charges on dma_buf_export() and then it is moved
> > to dma_heap_buffer_alloc() in patch 2). I will reorganize it better
> > for the next version, including your suggestion.
>
> Yep, I understand the situation now. I also understand
> that you were referring to T.J.'s patch, which caused
> some back-and-forth confusion for readers when reading
> patches 1 and 2.
>
> >
> > >
> > > Also, mem_accounting is only used by system_heap.c; has this patchset
> > > also eliminated its need?
> >
> > No, mem_accounting is still handled in this patch for the general case
> > where no `charge_pid_fd` is used. See dma_heap_buffer_alloc() code:
> >
> > +       if (memcg)
> > +               css_get(&memcg->css);
> > +       else if (mem_accounting)
> > +               memcg = get_mem_cgroup_from_mm(current->mm);
>
> I see. What feels a bit odd to me is that mem_accounting
> could either be dropped (with unconditional charging), or
> it should cover both remote and local charge cases.

Good point. If I understand correctly, looking at patch [1] that
introduced the flag, the shared buffer caveats mentioned there are not
yet covered by this approach, so the flag should stay. I will make it
consistent and cover both remote and local charge cases.

[1] https://lore.kernel.org/all/20260116-dmabuf-heap-system-memcg-v3-1-ecc6b62cc446@redhat.com/
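
For illustration only, the consistent behaviour described above could be
modelled in plain userspace C as below. This is a sketch, not code from the
series: struct memcg_model and pick_charge_target() are hypothetical names,
and the real decision is made in dma_heap_buffer_alloc(). It assumes that
mem_accounting gates both cases and that a cgroup resolved from
charge_pid_fd takes precedence over the allocator's own cgroup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only identity matters here. */
struct memcg_model {
	const char *name;
};

/*
 * Pick the cgroup to charge, or NULL for "do not charge".
 *
 * remote: cgroup resolved from charge_pid_fd, NULL if no pidfd was passed.
 * local:  cgroup of the task calling DMA_HEAP_IOCTL_ALLOC.
 */
static const struct memcg_model *
pick_charge_target(bool mem_accounting,
		   const struct memcg_model *remote,
		   const struct memcg_model *local)
{
	/* Assumption: the flag switches off remote and local charging alike. */
	if (!mem_accounting)
		return NULL;

	/* An explicit remote target wins over the allocator's own cgroup. */
	return remote ? remote : local;
}
```

With the flag off, nothing is charged in either case; with it on, the
presence of a pidfd only decides who pays.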

>
> I don’t have a strong opinion here—it just feels a bit
> strange, since its description is quite generic for memcg:
>
> "Enable cgroup-based memory accounting for dma-buf heap
> allocations (default=false)."
>
> Best Regards
> Barry
>


^ permalink raw reply

* Re: Re: [PATCH v2] dcache: add fs.dentry-limit sysctl with negative-first reaper
From: Horst Birthelmer @ 2026-05-19  9:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Horst Birthelmer, Miklos Szeredi, Jonathan Corbet,
	Shuah Khan, Alexander Viro, Christian Brauner, linux-doc,
	linux-kernel, linux-fsdevel, Horst Birthelmer
In-Reply-To: <mptmd2qxgqwkhfrq5dgwomysdnwoy6fnztr3ibrvbbsb7hvrv3@peg7mojzfucy>

On Tue, May 19, 2026 at 10:45:09AM +0200, Jan Kara wrote:
> Hi Horst!
> 
> On Sun 17-05-26 09:57:41, Horst Birthelmer wrote:
> > On Sun, May 17, 2026 at 12:09:26AM +0100, Matthew Wilcox wrote:
> > > On Sat, May 16, 2026 at 04:52:54PM +0200, Horst Birthelmer wrote:
> > > > There was a discussion at LSFMM about servers with too many cached
> > > > negative dentries.
> > > > That gave me the idea to keep the dentries in general limited
> > > > if the system administrator needs it to.
> > > 
> > > I feel you should link to the dozens of previous attempts at this kind
> > > of thing to show that you're aware that this has been tried before and
> > > you're doing something meaningfully different.
> 
> <snip>
> 
> > As a conclusion, I think I have an uncommon perspective on the cache entries
> > since I don't usually work on vfs but argue from the perspective of a fuse server,
> > where the kernel makes us waste resources. This hurts way more in the FUSE context
> > than in a 'normal' file system.
> > I have taken a look at the dentry cache just because people told me that this
> > has to be solved in the vfs (and I agree). I actually have a somewhat hacky patch
> > to do this from fuse and only for the fuse sb.
> 
> So I'm a bit confused here. The changelog speaks only about negative
> dentries (and that's what the change also concentrates on). OTOH you've
> mentioned multiple times that you are not really interested in limiting
> negative dentries but rather positive ones because you have a problem with
> cached inodes. So can you perhaps formulate what is exactly the problem
> you're trying to solve?

Maybe the changelog was a bit misleading here.
I did of course prefer negative entries, since that could bring down the
number of cached entries. In retrospect that was probably a mistake, but I
was somewhat afraid that if I didn't reduce those too, someone would surely
point out that it would be easier to cut those, since they are not really
used anyway and would be cheap to free.

This was only to be more useful than just solving _my_ problem. Maybe not
a good approach, I don't know yet.

> 
> Also you mention that cached (positive) dentries and inodes are a wasted
> memory when they aren't used. That is certainly a valid view, OTOH you can
> never predict future so you don't really know what will get used in the
> future and thus will be useful. That's why we currently side with the idea
> that memory that isn't used for something is wasted and unless there's
> something to use the memory for, we cache dentries & inodes & page cache in
> it.
> 
> If I remember correctly the discussion we had at LSF, the problem why inode
> caching is a problem for you, although there's enough free memory and no
> memory pressure, is that these cached inodes pin memory on the other end of
> the FUSE communication channel and there we are getting short on memory. Is
> this what you're trying to solve?

You remember our conversation correctly and have masterfully summarized it in
the passage above. Yes, that is what I'm trying to solve.

The problem we are facing is that the fuse server has to keep a lot of private
data and some data for locks (DLM) for the cached inodes and dentries.
(inodes are even more expensive due to byte range locking)

So my idea was to NOT keep unused (and negative) entries around.
Letting the admin set the limit where the kernel starts to clean was just
for convenience. If it was up to me I would like to set this in the initial
negotiation in FUSE during mount.

The waste of memory for me is not in the kernel but in the fuse server. The
kernel is just the master of what we have to keep, and thus the kernel moved
to the center of attention.

In short:
all caching in the kernel hurts us since we have to keep our private data
for all positive dentries, and I want to get the most for the amount of pain.

OTOH caching meta data is really useful but you have to have a good prediction
on what to keep. As we cannot predict that on either side of the kernel,
throwing away the unused parts when they get out of hand seemed like a good idea.

After the discussions here it seems like everybody has their own interpretation
of what useful cached data is.
I'm really inclined to think about letting the lower layers decide what useful
cached data should be.

In this context that would probably be a message from the fuse server, a
notification of which data it thinks can be thrown out, similar to the FORGET
call but in the other direction. If the kernel agrees, it sends a real FORGET
and we can clean up on the other side.
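
To make the idea concrete, here is a minimal userspace model of that
exchange. Nothing in it is an existing FUSE interface: struct cached_entry
and handle_evict_hint() are hypothetical names, and the "agreement" rule
(the entry is still cached and has no active users) is only an assumption
about what the kernel would check.

```c
#include <stdbool.h>

/* Hypothetical kernel-side view of one cached dentry/inode. */
struct cached_entry {
	unsigned int active_refs;	/* users currently holding the entry */
	bool cached;			/* still present in the kernel cache */
};

/*
 * Server -> kernel hint: "I think this entry can be thrown out."
 *
 * Returns true if the kernel agrees, in which case it would drop the
 * entry and send a FORGET so the server can free its private data.
 * Returns false if the kernel declines and nothing changes.
 */
static bool handle_evict_hint(struct cached_entry *e)
{
	/* Decline: the entry is already gone or is still in use. */
	if (!e->cached || e->active_refs > 0)
		return false;

	/* Agree: drop it from the cache; the FORGET would be sent here. */
	e->cached = false;
	return true;
}
```

The kernel keeps the final say in this model, so a wrong hint from the
server costs one declined message and nothing else.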

> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Thanks a lot for looking at this,
I really appreciate it!
... and I hope I could clarify what I was trying to do.

Horst

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] net: ti: icssg: Add HSR and LRE PA statistics
From: Paolo Abeni @ 2026-05-19  9:29 UTC (permalink / raw)
  To: MD Danish Anwar, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Simon Horman, Jonathan Corbet, Shuah Khan, Roger Quadros,
	Andrew Lunn, Meghana Malladi, Jacob Keller, David Carlier,
	Vadim Fedorenko, Kevin Hao
  Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel,
	Vladimir Oltean
In-Reply-To: <20260514075605.850674-3-danishanwar@ti.com>

On 5/14/26 9:56 AM, MD Danish Anwar wrote:
> @@ -201,6 +201,16 @@ static const struct icssg_pa_stats icssg_all_pa_stats[] = {
>  	ICSSG_PA_STATS(FW_HOST_TX_PKT_CNT),
>  	ICSSG_PA_STATS(FW_HOST_EGRESS_Q_PRE_OVERFLOW),
>  	ICSSG_PA_STATS(FW_HOST_EGRESS_Q_EXP_OVERFLOW),
> +	ICSSG_PA_STATS(FW_HSR_FWD_CHECK_FAIL_DROP),
> +	ICSSG_PA_STATS(FW_HSR_HE_CHECK_FAIL_DROP),
> +	ICSSG_PA_STATS(FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES),

Sashiko noted that this statistic name exceeds the ethtool string limit.
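
A quick standalone check of the limit, as a sketch: ETH_GSTRING_LEN is 32
in include/uapi/linux/ethtool.h, so a name can be at most 31 characters
plus the terminating NUL. The helper fits_ethtool_string() is a
hypothetical name, and the check assumes ICSSG_PA_STATS() turns its
argument into the ethtool string unchanged.

```c
#include <stdbool.h>
#include <string.h>

/* Same value as ETH_GSTRING_LEN in include/uapi/linux/ethtool.h. */
#define ETH_GSTRING_LEN 32

/* True if name, plus its terminating NUL, fits in one ethtool string slot. */
static bool fits_ethtool_string(const char *name)
{
	return strlen(name) < ETH_GSTRING_LEN;
}
```

FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES is 35 characters long, so it fails
this check, while FW_HSR_FWD_CHECK_FAIL_DROP (26) and
FW_HSR_HE_CHECK_FAIL_DROP (25) pass.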

/P


^ permalink raw reply
