Linux CXL
From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "Williams, Dan J" <dan.j.williams@intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>
Cc: "Schofield, Alison" <alison.schofield@intel.com>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	"Weiny, Ira" <ira.weiny@intel.com>
Subject: Re: [PATCH ndctl 2/2] daxctl/device.c: Fix error propagation in do_xaction_device()
Date: Fri, 12 Apr 2024 21:42:00 +0000	[thread overview]
Message-ID: <71977fdd416cd5ddfb896010ef56ecea0c23c26e.camel@intel.com> (raw)
In-Reply-To: <6619a7dcd1a18_24596294ec@dwillia2-mobl3.amr.corp.intel.com.notmuch>

On Fri, 2024-04-12 at 14:30 -0700, Dan Williams wrote:
> Vishal Verma wrote:
> > The loop through the provided list of devices in do_xaction_device()
> > returns the status based on whatever the last device did. Since the
> > order of processing devices, especially in cases like the 'all' keyword,
> > can be effectively random, this can lead to the same command, and same
> > effects, exiting with a different error code based on device ordering.
> > 
> > This was noticed with flakiness in the daxctl-create.sh unit test. Its
> > 'destroy-device all' command would either pass or fail based on the
> > order it tried to destroy devices in. (Recall that until now, destroying
> > a daxX.0 device would result in a failure).
> > 
> > Make this slightly more consistent by saving a failed status in
> > do_xaction_device if any iteration of the loop produces a failure.
> > Return this saved status instead of returning the status of the last
> > device processed.
> 
> I think "this is the way", at least it follows what cxl/memdev.c is
> doing. However, we have ended up with an error scheme per tool when it
> comes to reporting errors for multi-device operations.
> 
> cxl/memdev.c: report the first error
> 
> daxctl/device.c: now fixed to report the first error

With this patch it's actually reporting the last error, not first.

> 
> ndctl/namespace.c: reports the last result (same issue as daxctl/device.c), except in the greedy-create case
> 
> cxl/region.c: reports the last error even if that is not the last
> result, immune to the above bug, but why different?

I guess I've always preferred the last error over the first, but I'm happy
to change these to first to match cxl/memdev.c. I can see how the first
error is probably slightly better (we'd normally want to know about the
first thing that fails).

> 
> The struggle here is that all of these tools continue on error, so it
> has always been the case that the only way to get a reliable error code
> vs action carried out is to not use the "all" or "multi-device" ways to
> specify the devices to operate upon.
> 
> I don't have a good answer besides, be careful when using "all".
> 
> It might make sense to bring ndctl/namespace.c in line to guarantee
> "unless 100% of the attempts are successful the command reports
> failure". However, it might be too late to make that change there if it
> breaks people's scripts. ndctl/namespace.c does not suffer from needing
> to know that namespaceX.0 cannot be deleted since the deletion there is
> exclusively done by setting namespace size to zero.

Agreed, I'm on board with holding off changes to ndctl/namespace.c
unless there's a reported problem, or we hit a test suite failure or
something down the line.

> 
> I think this daxctl change has a low risk of breaking folks because the
> primary failure case is fixed to swallow the error.
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Thanks Dan!



Thread overview: 8+ messages
2024-04-12 21:05 [PATCH ndctl 0/2] daxctl: Fix error handling and propagation in daxctl/device.c Vishal Verma
2024-04-12 21:05 ` [PATCH ndctl 1/2] daxctl/device.c: Handle special case of destroying daxX.0 Vishal Verma
2024-04-12 21:09   ` Dan Williams
2024-04-15 16:17   ` Dave Jiang
2024-04-12 21:05 ` [PATCH ndctl 2/2] daxctl/device.c: Fix error propagation in do_xaction_device() Vishal Verma
2024-04-12 21:30   ` Dan Williams
2024-04-12 21:42     ` Verma, Vishal L [this message]
2024-04-15 16:19   ` Dave Jiang
