public inbox for nvdimm@lists.linux.dev
From: Alison Schofield <alison.schofield@intel.com>
To: Marc Herbert <marc.herbert@linux.intel.com>
Cc: <nvdimm@lists.linux.dev>, Marc Herbert <marc.herbert@intel.com>
Subject: Re: [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test
Date: Wed, 22 Oct 2025 22:15:07 -0700
Message-ID: <aPm524Y0hIEOUehg@aschofie-mobl2.lan>
In-Reply-To: <a9806830-6ce7-4d1b-a72d-7fa123e8b326@linux.intel.com>

On Wed, Oct 22, 2025 at 11:37:37AM -0700, Marc Herbert wrote:
> On 2025-10-21 14:26, Alison Schofield wrote:
> > The pmem_ns unit test frequently fails when run as part of the full
> > suite, yet passes when executed alone.
> > 
> > [...]
> > Replace the NULL context parameter when calling ndctl_test_init()
> > with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
> > regions.
> > 
> > Reported-by: Marc Herbert <marc.herbert@intel.com>
> > Closes: https://github.com/pmem/ndctl/issues/290
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  test/pmem_namespaces.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
> > index 4bafff5164c8..7b8de9dcb61d 100644
> > --- a/test/pmem_namespaces.c
> > +++ b/test/pmem_namespaces.c
> > @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
> >  
> >  	if (!bus) {
> >  		fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> > -		rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> > +		rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
> >  		ndctl_invalidate(ctx);
> >  		bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
> >  		if (rc < 0 || !bus) {
> 
> Thanks Alison! This does fix the crash, so you can also add my Tested-By:!
> 
> But to test, I had to combine this fix with this temporary hack from
> https://github.com/pmem/ndctl/issues/290

Ah, yes, I did something similar to debug and test.

> 
> --- a/test/pmem_namespaces.c
> +++ b/test/pmem_namespaces.c
> @@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
>  			bus = NULL;
>  	}
>  
> -	if (!bus) {
> +	if (!bus || true) {
>  		fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
>  		rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
>  		ndctl_invalidate(ctx);
> 
> 
> 
> ... which explains why I disagree with... the commit message! I don't think
> this necessary fix "closes" https://github.com/pmem/ndctl/issues/290 entirely.

Marc,

Thanks for the review!

Ah, you disagree with the Closes tag? I added the Closes tag expecting
that the test case will now pass. pmem-ns will successfully fall back to
an nfit_test region if ACPI.NFIT is not present or does not have a
PMEM-capable region.

As for the reason why ACPI.NFIT fails to find a suitable region, I haven't
given up on it. In my setup, it fails because the region type is
ND_DEVICE_NAMESPACE_IO (4) rather than ND_DEVICE_NAMESPACE_PMEM (5).

As for why it fails in your case (a full test run after boot) and with
my reproducer (simply running pmem-ns alone), I don't have the solution yet.

If you have time to check that your failure is the same as with my
reproducer, you can apply the debug patch below and share the nstype
values it prints to stderr. For reference:

ND_DEVICE_NAMESPACE_IO is 4
ND_DEVICE_NAMESPACE_PMEM is 5


diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
index 4bafff5164c8..c2f25bb02025 100644
--- a/test/pmem_namespaces.c
+++ b/test/pmem_namespaces.c
@@ -180,11 +180,15 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,

        bus = ndctl_bus_get_by_provider(ctx, "ACPI.NFIT");
        if (bus) {
+               int nstype;
+
                /* skip this bus if no label-enabled PMEM regions */
-               ndctl_region_foreach(bus, region)
-                       if (ndctl_region_get_nstype(region)
-                                       == ND_DEVICE_NAMESPACE_PMEM)
+               ndctl_region_foreach(bus, region) {
+                       nstype = ndctl_region_get_nstype(region);
+                       fprintf(stderr, "ALISON nstype %d\n", nstype);
+                       if (nstype == ND_DEVICE_NAMESPACE_PMEM)
                                break;
+               }
                if (!region)
                        bus = NULL;
        }
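
For readers without a libndctl setup, the decision that the debug patch
instruments can be modeled in isolation. This is only a sketch: the
nstype array is a hypothetical stand-in for what ndctl_region_foreach()
and ndctl_region_get_nstype() would yield, the helper names are invented
for illustration, and the values 4 and 5 are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values as quoted in this message. */
#define ND_DEVICE_NAMESPACE_IO   4
#define ND_DEVICE_NAMESPACE_PMEM 5

/*
 * Hypothetical stand-in for walking a bus's regions: returns the index
 * of the first PMEM-type region, or -1 if there is none. Every nstype
 * inspected is logged when 'log' is non-NULL.
 */
static int first_pmem_region(const int *nstypes, size_t count, FILE *log)
{
	assert(nstypes != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (log)
			fprintf(log, "region%zu: nstype %d\n", i, nstypes[i]);
		if (nstypes[i] == ND_DEVICE_NAMESPACE_PMEM)
			return (int)i;
	}
	return -1;
}

/* Returns 1 when no PMEM region exists, i.e. a fallback bus is needed. */
static int needs_nfit_test_fallback(const int *nstypes, size_t count, FILE *log)
{
	return first_pmem_region(nstypes, count, log) < 0;
}
```

Calling needs_nfit_test_fallback() on a bus whose only region reports
ND_DEVICE_NAMESPACE_IO returns 1, which corresponds to the "bus = NULL"
path in the test; passing stderr as the log shows which types were seen.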

> 
> This fix does stop the test from failing, which is great, and it dramatically lowers
> the severity of 290. But we still don't know why ACPI.NFIT is "available" most of
> the time and... sometimes not. In other words, we still don't know why this test is
> non-deterministic. Of course, there will always be some non-determinism because
> the kernel and QEMU are too complex to be deterministic, but I don't think
> non-determinism should extend to test fixtures and test code themselves like this.
> That is why 290 should stay open IMHO.
> 
> Also, this feels like a (missed?) opportunity to add better logging of this
> non-determinism, I mean stuff like:
> https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362
> This is test code, it should not be mean with logging. All bash scripts run
> with "set -x" already so this would not make much difference to the total
> volume.
> 
> 
> Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to
> minimize interference, as much as time allows[*]. Bug 290 demonstrates that:
> 1. Some unknown test running before pmem-ns does not clean up properly after itself, and
> 2. The pmem-ns test is not capable of creating a deterministic setup for itself.
> 
> We still have no clue about 1., and 2. is not mitigated with logs
> and source comments. So there's still an open bug there.
> 
> Marc
> 
> 
> 
> 
> [*] there are practical limits: rebooting QEMU for each test would be too slow.
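
The CLEAN - TEST - CLEAN logic described above could look roughly like
the following. This is a minimal sketch, not how the ndctl test suite is
structured: struct test_case, run_clean_test_clean() and the demo_*
callbacks are all hypothetical names, and the "fixture" is just an int
counting leftover resources.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixture callback: returns 0 on success, non-zero on error. */
typedef int (*step_fn)(void *state);

struct test_case {
	const char *name;
	step_fn clean;	/* bring the fixture to a known state */
	step_fn run;	/* the test body itself */
};

/*
 * Run one test as CLEAN - TEST - CLEAN. The trailing clean always runs,
 * even when the pre-clean or the test body fails, so the next test does
 * not inherit leftover state. Returns the first non-zero step result,
 * or 0 if every step succeeded.
 */
static int run_clean_test_clean(const struct test_case *tc, void *state, FILE *log)
{
	int rc, rc_post;

	assert(tc && tc->clean && tc->run);

	rc = tc->clean(state);
	if (log)
		fprintf(log, "%s: pre-clean rc=%d\n", tc->name, rc);
	if (rc == 0) {
		rc = tc->run(state);
		if (log)
			fprintf(log, "%s: test rc=%d\n", tc->name, rc);
	}
	rc_post = tc->clean(state);
	if (log)
		fprintf(log, "%s: post-clean rc=%d\n", tc->name, rc_post);

	return rc ? rc : rc_post;
}

/* Demo fixture: 'state' points at an int counting leftover resources. */
static int demo_clean(void *state) { *(int *)state = 0; return 0; }
static int demo_dirty_test(void *state) { *(int *)state += 3; return 0; }
static int demo_failing_test(void *state) { *(int *)state += 1; return -1; }
```

Logging each step's return code, as done here, is one cheap way to make
it visible which test left state behind when a later test misbehaves.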


Thread overview: 4+ messages
2025-10-21 21:26 [ndctl PATCH] ndctl/test: fully reset nfit_test in pmem_ns unit test Alison Schofield
2025-10-22 15:04 ` Dave Jiang
2025-10-22 18:37 ` Marc Herbert
2025-10-23  5:15   ` Alison Schofield [this message]
