* [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test
@ 2025-10-21 21:26 Alison Schofield
2025-10-22 15:04 ` Dave Jiang
2025-10-22 18:37 ` Marc Herbert
0 siblings, 2 replies; 4+ messages in thread
From: Alison Schofield @ 2025-10-21 21:26 UTC (permalink / raw)
To: nvdimm; +Cc: Alison Schofield, Marc Herbert
The pmem_ns unit test frequently fails when run as part of the full
suite, yet passes when executed alone.
The test first looks for an ACPI.NFIT bus with a usable region, and if
none is found, falls back to using the nfit_test bus. However, that
fallback consistently fails with errors such as:
path: /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid
libndctl: write_attr: failed to open /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid: No such file or directory
/root/ndctl/build/test/pmem-ns: failed to create PMEM namespace
This occurs because calling ndctl_test_init() with a NULL context only
unloads and reloads the nfit_test module, but does not invalidate and
reinitialize the libndctl context or sysfs view from previous runs.
The resulting stale state prevents the pmem_ns test from creating a
new namespace cleanly.
Replace the NULL context parameter when calling ndctl_test_init()
with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
regions.
Reported-by: Marc Herbert <marc.herbert@intel.com>
Closes: https://github.com/pmem/ndctl/issues/290
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
test/pmem_namespaces.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
index 4bafff5164c8..7b8de9dcb61d 100644
--- a/test/pmem_namespaces.c
+++ b/test/pmem_namespaces.c
@@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
if (!bus) {
fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
- rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
+ rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
ndctl_invalidate(ctx);
bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
if (rc < 0 || !bus) {
--
2.37.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test
2025-10-21 21:26 [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test Alison Schofield
@ 2025-10-22 15:04 ` Dave Jiang
2025-10-22 18:37 ` Marc Herbert
1 sibling, 0 replies; 4+ messages in thread
From: Dave Jiang @ 2025-10-22 15:04 UTC (permalink / raw)
To: Alison Schofield, nvdimm; +Cc: Marc Herbert
On 10/21/25 2:26 PM, Alison Schofield wrote:
> The pmem_ns unit test frequently fails when run as part of the full
> suite, yet passes when executed alone.
>
> The test first looks for an ACPI.NFIT bus with a usable region, and if
> none is found, falls back to using the nfit_test bus. However, that
> fallback consistently fails with errors such as:
>
> path: /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid
> libndctl: write_attr: failed to open /sys/devices/platform/nfit_test.0/ndbus2/region7/namespace7.0/uuid: No such file or directory
> /root/ndctl/build/test/pmem-ns: failed to create PMEM namespace
>
> This occurs because calling ndctl_test_init() with a NULL context only
> unloads and reloads the nfit_test module, but does not invalidate and
> reinitialize the libndctl context or sysfs view from previous runs.
> The resulting stale state prevents the pmem_ns test from creating a
> new namespace cleanly.
>
> Replace the NULL context parameter when calling ndctl_test_init()
> with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
> regions.
>
> Reported-by: Marc Herbert <marc.herbert@intel.com>
> Closes: https://github.com/pmem/ndctl/issues/290
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
> test/pmem_namespaces.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
> index 4bafff5164c8..7b8de9dcb61d 100644
> --- a/test/pmem_namespaces.c
> +++ b/test/pmem_namespaces.c
> @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
>
> if (!bus) {
> fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
> ndctl_invalidate(ctx);
> bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
> if (rc < 0 || !bus) {
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test
2025-10-21 21:26 [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test Alison Schofield
2025-10-22 15:04 ` Dave Jiang
@ 2025-10-22 18:37 ` Marc Herbert
2025-10-23 5:15 ` Alison Schofield
1 sibling, 1 reply; 4+ messages in thread
From: Marc Herbert @ 2025-10-22 18:37 UTC (permalink / raw)
To: Alison Schofield, nvdimm; +Cc: Marc Herbert
On 2025-10-21 14:26, Alison Schofield wrote:
> The pmem_ns unit test frequently fails when run as part of the full
> suite, yet passes when executed alone.
>
> [...]
> > Replace the NULL context parameter when calling ndctl_test_init()
> with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
> regions.
>
> Reported-by: Marc Herbert <marc.herbert@intel.com>
> Closes: https://github.com/pmem/ndctl/issues/290
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> test/pmem_namespaces.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
> index 4bafff5164c8..7b8de9dcb61d 100644
> --- a/test/pmem_namespaces.c
> +++ b/test/pmem_namespaces.c
> @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
>
> if (!bus) {
> fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
> ndctl_invalidate(ctx);
> bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
> if (rc < 0 || !bus) {
Thanks Alison! This does fix the crash, so you can also add my Tested-By:!
But to test, I had to combine this fix with this temporary hack from
https://github.com/pmem/ndctl/issues/290
--- a/test/pmem_namespaces.c
+++ b/test/pmem_namespaces.c
@@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
bus = NULL;
}
- if (!bus) {
+ if (!bus || true) {
fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
ndctl_invalidate(ctx);
... which explains why I disagree with... the commit message! I don't think
this necessary fix "closes" https://github.com/pmem/ndctl/issues/290 entirely.
This fix does stop the test from failing which is great and it lowers dramatically
the severity of 290. But we still don't know why ACPI.NFIT is "available" most of
the time and... sometimes not. In other words, we still don't know why this test is
non-deterministic. Of course, there will always be some non-determinism because
the kernel and QEMU are too complex to be deterministic but I don't think
non-determism should extend to test fixtures and test code themselves like this.
Why 290 should stay open IMHO.
Also, this feels like a (missed?) opportunity to add better logging of this
non-determinism, I mean stuff like:
https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362
This is test code, it should not be mean with logging. All bash scripts run
with "set -x" already so this would not make much difference to the total
volume.
Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to
minimize interferences; as much as time allows[*]. Bug 290 demonstrates that:
1. Some unknown test running before pmem-ns does not clean properly after itself, and
2. The pmem-ns test is not capable of creating a deterministic setup for itself.
We still have no clue about 1. and 2. is not mitigated with logs
and source comments. So there's still an open bug there.
Marc
[*] there are practical limits: rebooting QEMU for each test would be too slow.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test
2025-10-22 18:37 ` Marc Herbert
@ 2025-10-23 5:15 ` Alison Schofield
0 siblings, 0 replies; 4+ messages in thread
From: Alison Schofield @ 2025-10-23 5:15 UTC (permalink / raw)
To: Marc Herbert; +Cc: nvdimm, Marc Herbert
On Wed, Oct 22, 2025 at 11:37:37AM -0700, Marc Herbert wrote:
> On 2025-10-21 14:26, Alison Schofield wrote:
> > The pmem_ns unit test frequently fails when run as part of the full
> > suite, yet passes when executed alone.
> >
> > [...]
> > > Replace the NULL context parameter when calling ndctl_test_init()
> > with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
> > regions.
> >
> > Reported-by: Marc Herbert <marc.herbert@intel.com>
> > Closes: https://github.com/pmem/ndctl/issues/290
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> > test/pmem_namespaces.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
> > index 4bafff5164c8..7b8de9dcb61d 100644
> > --- a/test/pmem_namespaces.c
> > +++ b/test/pmem_namespaces.c
> > @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
> >
> > if (!bus) {
> > fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> > - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> > + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
> > ndctl_invalidate(ctx);
> > bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
> > if (rc < 0 || !bus) {
>
> Thanks Alison! This does fix the crash, so you can also add my Tested-By:!
>
> But to test, I had to combine this fix with this temporary hack from
> https://github.com/pmem/ndctl/issues/290
Ah, yes I did similar to debug and test.
>
> --- a/test/pmem_namespaces.c
> +++ b/test/pmem_namespaces.c
> @@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
> bus = NULL;
> }
>
> - if (!bus) {
> + if (!bus || true) {
> fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> ndctl_invalidate(ctx);
>
>
>
> ... which explains why I disagree with... the commit message! I don't think
> this necessary fix "closes" https://github.com/pmem/ndctl/issues/290 entirely.
Marc,
Thanks for the review!
Ah, you disagree with the Closes tag? I added the close tag expecting
the test case will now pass. pmem-ns will successfully fallback to
nfit_test region if ACPI.NFIT is not present or does not have the pmem
capable region.
wrt the reason why ACPI.NFIT fails to find a suitable region, I haven't
given up on it. In my setup, it fails because the region type is
ND_DEVICE_NAMESPACE_IO (4) rather than ND_DEVICE_NAMESPACE_PMEM (5)
wrt why it fails in your case, a full test run after boot, and with
my reproducer (simply run pmem-ns alone). I don't have the soln yet.
If you have time to check that your failure is same as with my
reproducer, you can collect and share this:
ND_DEVICE_NAMESPACE_IO is 4
ND_DEVICE_NAMESPACE_PMEM is 5
diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
index 4bafff5164c8..c2f25bb02025 100644
--- a/test/pmem_namespaces.c
+++ b/test/pmem_namespaces.c
@@ -180,11 +180,15 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
bus = ndctl_bus_get_by_provider(ctx, "ACPI.NFIT");
if (bus) {
+ int nstype;
+
/* skip this bus if no label-enabled PMEM regions */
- ndctl_region_foreach(bus, region)
- if (ndctl_region_get_nstype(region)
- == ND_DEVICE_NAMESPACE_PMEM)
+ ndctl_region_foreach(bus, region) {
+ nstype = ndctl_region_get_nstype(region);
+ fprintf(stderr, "ALISON nstype %d\n", nstype);
+ if (nstype == ND_DEVICE_NAMESPACE_PMEM)
break;
+ }
if (!region)
bus = NULL;
}
>
> This fix does stop the test from failing which is great and it lowers dramatically
> the severity of 290. But we still don't know why ACPI.NFIT is "available" most of
> the time and... sometimes not. In other words, we still don't know why this test is
> non-deterministic. Of course, there will always be some non-determinism because
> the kernel and QEMU are too complex to be deterministic but I don't think
> non-determism should extend to test fixtures and test code themselves like this.
> Why 290 should stay open IMHO.
>
> Also, this feels like a (missed?) opportunity to add better logging of this
> non-determinism, I mean stuff like:
> https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362
> This is test code, it should not be mean with logging. All bash scripts run
> with "set -x" already so this would not make much difference to the total
> volume.
>
>
> Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to
> minimize interferences; as much as time allows[*]. Bug 290 demonstrates that:
> 1. Some unknown test running before pmem-ns does not clean properly after itself, and
> 2. The pmem-ns test is not capable of creating a deterministic setup for itself.
>
> We still have no clue about 1. and 2. is not mitigated with logs
> and source comments. So there's still an open bug there.
>
> Marc
>
>
>
>
> [*] there are practical limits: rebooting QEMU for each test would be too slow.
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-10-23 5:15 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-21 21:26 [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test Alison Schofield
2025-10-22 15:04 ` Dave Jiang
2025-10-22 18:37 ` Marc Herbert
2025-10-23 5:15 ` Alison Schofield
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox