* [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test @ 2025-10-21 21:26 Alison Schofield 2025-10-22 15:04 ` Dave Jiang 2025-10-22 18:37 ` Marc Herbert 0 siblings, 2 replies; 4+ messages in thread From: Alison Schofield @ 2025-10-21 21:26 UTC (permalink / raw) To: nvdimm; +Cc: Alison Schofield, Marc Herbert The pmem_ns unit test frequently fails when run as part of the full suite, yet passes when executed alone. The test first looks for an ACPI.NFIT bus with a usable region, and if none is found, falls back to using the nfit_test bus. However, that fallback consistently fails with errors such as: path: /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid libndctl: write_attr: failed to open /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid: No such file or directory /root/ndctl/build/test/pmem-ns: failed to create PMEM namespace This occurs because calling ndctl_test_init() with a NULL context only unloads and reloads the nfit_test module, but does not invalidate and reinitialize the libndctl context or sysfs view from previous runs. The resulting stale state prevents the pmem_ns test from creating a new namespace cleanly. Replace the NULL context parameter when calling ndctl_test_init() with the available ndctl_ctx to ensure pmem_ns can find usable PMEM regions. Reported-by: Marc Herbert <marc.herbert@intel.com> Closes: https://github.com/pmem/ndctl/issues/290 Signed-off-by: Alison Schofield <alison.schofield@intel.com> --- test/pmem_namespaces.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c index 4bafff5164c8..7b8de9dcb61d 100644 --- a/test/pmem_namespaces.c +++ b/test/pmem_namespaces.c @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test, if (!bus) { fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n"); - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test); + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test); ndctl_invalidate(ctx); bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0"); if (rc < 0 || !bus) { -- 2.37.3 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test 2025-10-21 21:26 [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test Alison Schofield @ 2025-10-22 15:04 ` Dave Jiang 2025-10-22 18:37 ` Marc Herbert 1 sibling, 0 replies; 4+ messages in thread From: Dave Jiang @ 2025-10-22 15:04 UTC (permalink / raw) To: Alison Schofield, nvdimm; +Cc: Marc Herbert On 10/21/25 2:26 PM, Alison Schofield wrote: > The pmem_ns unit test frequently fails when run as part of the full > suite, yet passes when executed alone. > > The test first looks for an ACPI.NFIT bus with a usable region, and if > none is found, falls back to using the nfit_test bus. However, that > fallback consistently fails with errors such as: > > path: /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid > libndctl: write_attr: failed to open /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid: No such file or directory > /root/ndctl/build/test/pmem-ns: failed to create PMEM namespace > > This occurs because calling ndctl_test_init() with a NULL context only > unloads and reloads the nfit_test module, but does not invalidate and > reinitialize the libndctl context or sysfs view from previous runs. > The resulting stale state prevents the pmem_ns test from creating a > new namespace cleanly. > > Replace the NULL context parameter when calling ndctl_test_init() > with the available ndctl_ctx to ensure pmem_ns can find usable PMEM > regions. > > Reported-by: Marc Herbert <marc.herbert@intel.com> > Closes: https://github.com/pmem/ndctl/issues/290 > Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com>> --- > test/pmem_namespaces.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c > index 4bafff5164c8..7b8de9dcb61d 100644 > --- a/test/pmem_namespaces.c > +++ b/test/pmem_namespaces.c > @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test, > > if (!bus) { > fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n"); > - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test); > + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test); > ndctl_invalidate(ctx); > bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0"); > if (rc < 0 || !bus) { ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test 2025-10-21 21:26 [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test Alison Schofield 2025-10-22 15:04 ` Dave Jiang @ 2025-10-22 18:37 ` Marc Herbert 2025-10-23 5:15 ` Alison Schofield 1 sibling, 1 reply; 4+ messages in thread From: Marc Herbert @ 2025-10-22 18:37 UTC (permalink / raw) To: Alison Schofield, nvdimm; +Cc: Marc Herbert On 2025-10-21 14:26, Alison Schofield wrote: > The pmem_ns unit test frequently fails when run as part of the full > suite, yet passes when executed alone. > > [...] > > Replace the NULL context parameter when calling ndctl_test_init() > with the available ndctl_ctx to ensure pmem_ns can find usable PMEM > regions. > > Reported-by: Marc Herbert <marc.herbert@intel.com> > Closes: https://github.com/pmem/ndctl/issues/290 > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > --- > test/pmem_namespaces.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c > index 4bafff5164c8..7b8de9dcb61d 100644 > --- a/test/pmem_namespaces.c > +++ b/test/pmem_namespaces.c > @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test, > > if (!bus) { > fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n"); > - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test); > + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test); > ndctl_invalidate(ctx); > bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0"); > if (rc < 0 || !bus) { Thanks Alison! This does fix the crash, so you can also add my Tested-By:! But to test, I had to combine this fix with this temporary hack from https://github.com/pmem/ndctl/issues/290 --- a/test/pmem_namespaces.c +++ b/test/pmem_namespaces.c @@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test, bus = NULL; } - if (!bus) { + if (!bus || true) { fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n"); rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test); ndctl_invalidate(ctx); ... which explains why I disagree with... the commit message! I don't think this necessary fix "closes" https://github.com/pmem/ndctl/issues/290 entirely. This fix does stop the test from failing which is great and it lowers dramatically the severity of 290. But we still don't know why ACPI.NFIT is "available" most of the time and... sometimes not. In other words, we still don't know why this test is non-deterministic. Of course, there will always be some non-determinism because the kernel and QEMU are too complex to be deterministic but I don't think non-determism should extend to test fixtures and test code themselves like this. Why 290 should stay open IMHO. Also, this feels like a (missed?) opportunity to add better logging of this non-determinism, I mean stuff like: https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362 This is test code, it should not be mean with logging. All bash scripts run with "set -x" already so this would not make much difference to the total volume. Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to minimize interferences; as much as time allows[*]. Bug 290 demonstrates that: 1. Some unknown test running before pmem-ns does not clean properly after itself, and 2. The pmem-ns test is not capable of creating a deterministic setup for itself. We still have no clue about 1. and 2. is not mitigated with logs and source comments. So there's still an open bug there. Marc [*] there are practical limits: rebooting QEMU for each test would be too slow. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test 2025-10-22 18:37 ` Marc Herbert @ 2025-10-23 5:15 ` Alison Schofield 0 siblings, 0 replies; 4+ messages in thread From: Alison Schofield @ 2025-10-23 5:15 UTC (permalink / raw) To: Marc Herbert; +Cc: nvdimm, Marc Herbert On Wed, Oct 22, 2025 at 11:37:37AM -0700, Marc Herbert wrote: > On 2025-10-21 14:26, Alison Schofield wrote: > > The pmem_ns unit test frequently fails when run as part of the full > > suite, yet passes when executed alone. > > > > [...] > > > Replace the NULL context parameter when calling ndctl_test_init() > > with the available ndctl_ctx to ensure pmem_ns can find usable PMEM > > regions. > > > > Reported-by: Marc Herbert <marc.herbert@intel.com> > > Closes: https://github.com/pmem/ndctl/issues/290 > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > > --- > > test/pmem_namespaces.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c > > index 4bafff5164c8..7b8de9dcb61d 100644 > > --- a/test/pmem_namespaces.c > > +++ b/test/pmem_namespaces.c > > @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test, > > > > if (!bus) { > > fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n"); > > - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test); > > + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test); > > ndctl_invalidate(ctx); > > bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0"); > > if (rc < 0 || !bus) { > > Thanks Alison! This does fix the crash, so you can also add my Tested-By:! > > But to test, I had to combine this fix with this temporary hack from > https://github.com/pmem/ndctl/issues/290 Ah, yes I did similar to debug and test. > > --- a/test/pmem_namespaces.c > +++ b/test/pmem_namespaces.c > @@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test, > bus = NULL; > } > > - if (!bus) { > + if (!bus || true) { > fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n"); > rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test); > ndctl_invalidate(ctx); > > > > ... which explains why I disagree with... the commit message! I don't think > this necessary fix "closes" https://github.com/pmem/ndctl/issues/290 entirely. Marc, Thanks for the review! Ah, you disagree with the Closes tag? I added the close tag expecting the test case will now pass. pmem-ns will successfully fallback to nfit_test region if ACPI.NFIT is not present or does not have the pmem capable region. wrt the reason why ACPI.NFIT fails to find a suitable region, I haven't given up on it. In my setup, it fails because the region type is ND_DEVICE_NAMESPACE_IO (4) rather than ND_DEVICE_NAMESPACE_PMEM (5) wrt why it fails in your case, a full test run after boot, and with my reproducer (simply run pmem-ns alone). I don't have the soln yet. If you have time to check that your failure is same as with my reproducer, you can collect and share this: ND_DEVICE_NAMESPACE_IO is 4 ND_DEVICE_NAMESPACE_PMEM is 5 diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c index 4bafff5164c8..c2f25bb02025 100644 --- a/test/pmem_namespaces.c +++ b/test/pmem_namespaces.c @@ -180,11 +180,15 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test, bus = ndctl_bus_get_by_provider(ctx, "ACPI.NFIT"); if (bus) { + int nstype; + /* skip this bus if no label-enabled PMEM regions */ - ndctl_region_foreach(bus, region) - if (ndctl_region_get_nstype(region) - == ND_DEVICE_NAMESPACE_PMEM) + ndctl_region_foreach(bus, region) { + nstype = ndctl_region_get_nstype(region); + fprintf(stderr, "ALISON nstype %d\n", nstype); + if (nstype == ND_DEVICE_NAMESPACE_PMEM) break; + } if (!region) bus = NULL; } > > This fix does stop the test from failing which is great and it lowers dramatically > the severity of 290. But we still don't know why ACPI.NFIT is "available" most of > the time and... sometimes not. In other words, we still don't know why this test is > non-deterministic. Of course, there will always be some non-determinism because > the kernel and QEMU are too complex to be deterministic but I don't think > non-determism should extend to test fixtures and test code themselves like this. > Why 290 should stay open IMHO. > > Also, this feels like a (missed?) opportunity to add better logging of this > non-determinism, I mean stuff like: > https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362 > This is test code, it should not be mean with logging. All bash scripts run > with "set -x" already so this would not make much difference to the total > volume. > > > Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to > minimize interferences; as much as time allows[*]. Bug 290 demonstrates that: > 1. Some unknown test running before pmem-ns does not clean properly after itself, and > 2. The pmem-ns test is not capable of creating a deterministic setup for itself. > > We still have no clue about 1. and 2. is not mitigated with logs > and source comments. So there's still an open bug there. > > Marc > > > > > [*] there are practical limits: rebooting QEMU for each test would be too slow. ^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-10-23 5:15 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-21 21:26 [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test Alison Schofield 2025-10-22 15:04 ` Dave Jiang 2025-10-22 18:37 ` Marc Herbert 2025-10-23 5:15 ` Alison Schofield
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox