Re: [PATCH v2 1/2] dax: Introduce normal and recovery dax operation modes

From: Jane Chu <jane.chu@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: david <david@fromorbit.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>,
	device-mapper development <dm-devel@redhat.com>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vivek Goyal <vgoyal@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH v2 1/2] dax: Introduce normal and recovery dax operation modes
Date: Mon, 8 Nov 2021 21:02:29 +0000
Message-ID: <63f89475-7a1f-e79e-7785-ba996211615b@oracle.com>
In-Reply-To: <CAPcyv4jcgFxgoXFhWL9+BReY8vFtgjb_=Lfai-adFpdzc4-35Q@mail.gmail.com>

On 11/6/2021 9:48 AM, Dan Williams wrote:
> On Fri, Nov 5, 2021 at 6:17 PM Jane Chu <jane.chu@oracle.com> wrote:
>>
>> Introduce DAX_OP_NORMAL and DAX_OP_RECOVERY operation modes to
>> {dax_direct_access, dax_copy_from_iter, dax_copy_to_iter}.
>> DAX_OP_NORMAL is the default, existing mode, and
>> DAX_OP_RECOVERY is a new mode for data recovery.
>>
>> When a dax filesystem suspects that a dax media error might be
>> encountered on a read or write, it can request a recovery-mode read
>> or write by setting DAX_OP_RECOVERY in the aforementioned APIs. A read
>> in recovery mode attempts to fetch as much data as possible
>> until the first poisoned page is encountered. A write in recovery
>> mode attempts to clear poison(s) in a page-aligned range and
>> then writes the user-provided data over it.
>>
>> DAX_OP_NORMAL should be used for all non-recovery code paths.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> [..]
>> diff --git a/include/linux/dax.h b/include/linux/dax.h
>> index 324363b798ec..931586df2905 100644
>> --- a/include/linux/dax.h
>> +++ b/include/linux/dax.h
>> @@ -9,6 +9,10 @@
>>   /* Flag for synchronous flush */
>>   #define DAXDEV_F_SYNC (1UL << 0)
>>
>> +/* dax operation mode dynamically set by caller */
>> +#define        DAX_OP_NORMAL           0
> 
> Perhaps this should be called DAX_OP_FAILFAST?

Sure.

> 
>> +#define        DAX_OP_RECOVERY         1
>> +
>>   typedef unsigned long dax_entry_t;
>>
>>   struct dax_device;
>> @@ -22,8 +26,8 @@ struct dax_operations {
>>           * logical-page-offset into an absolute physical pfn. Return the
>>           * number of pages available for DAX at that pfn.
>>           */
>> -       long (*direct_access)(struct dax_device *, pgoff_t, long,
>> -                       void **, pfn_t *);
>> +       long (*direct_access)(struct dax_device *, pgoff_t, long, int,
> 
> Would be nice if that 'int' was an enum, but I'm not sure a new
> parameter is needed at all, see below...

Let's do your suggestion below. :)

> 
>> +                               void **, pfn_t *);
>>          /*
>>           * Validate whether this device is usable as an fsdax backing
>>           * device.
>> @@ -32,10 +36,10 @@ struct dax_operations {
>>                          sector_t, sector_t);
>>          /* copy_from_iter: required operation for fs-dax direct-i/o */
>>          size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
>> -                       struct iov_iter *);
>> +                       struct iov_iter *, int);
> 
> I'm not sure the flag is needed here as the "void *" could carry a
> flag in the pointer to indicate that it is a recovery kaddr.

Agreed.
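
As a rough illustration of the idea (a userspace sketch only, not code
from this series): the copy routine keeps its existing argument list and
decodes the mode from the low bit of the address it is handed. The names
sketch_copy_to_buf and DAX_KADDR_FLAGS are hypothetical, and the sketch
assumes the kaddr is aligned so that bit 0 is otherwise unused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DAX_OP_RECOVERY  1UL   /* value from the patch, reused as a pointer tag */
#define DAX_KADDR_FLAGS  1UL   /* hypothetical mask covering the low flag bits */

/*
 * Hypothetical stand-in for a driver's copy op. No extra 'int mode'
 * argument: the flag travels in the low bit of tagged_addr.
 */
static size_t sketch_copy_to_buf(void *tagged_addr, void *dst, size_t bytes,
                                 int *used_recovery)
{
    uintptr_t raw = (uintptr_t)tagged_addr;
    /* Strip the flag bits to recover the real source address. */
    const void *src = (const void *)(raw & ~(uintptr_t)DAX_KADDR_FLAGS);

    *used_recovery = (raw & DAX_OP_RECOVERY) != 0;

    /* A real driver would branch to a poison-aware copy when the flag is set;
     * this sketch only records which path was requested. */
    memcpy(dst, src, bytes);
    return bytes;
}
```

The point of the design is that callers which never set the bit see no
change in behavior, since an untagged address decodes to itself.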

> 
>>          /* copy_to_iter: required operation for fs-dax direct-i/o */
>>          size_t (*copy_to_iter)(struct dax_device *, pgoff_t, void *, size_t,
>> -                       struct iov_iter *);
>> +                       struct iov_iter *, int);
> 
> Same comment here.
> 
>>          /* zero_page_range: required operation. Zero page range   */
>>          int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
>>   };
>> @@ -186,11 +190,11 @@ static inline void dax_read_unlock(int id)
>>   bool dax_alive(struct dax_device *dax_dev);
>>   void *dax_get_private(struct dax_device *dax_dev);
>>   long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
>> -               void **kaddr, pfn_t *pfn);
>> +               int mode, void **kaddr, pfn_t *pfn);
> 
> How about dax_direct_access() calling convention stays the same, but
> the kaddr is optionally updated to carry a flag in the lower unused
> bits. So:
> 
> void **kaddr = NULL; /* caller only cares about the pfn */
> 
> void *failfast = NULL;
> void **kaddr = &failfast; /* caller wants -EIO not recovery */
> 
> void *recovery = (void *) DAX_OP_RECOVERY;
> void **kaddr = &recovery; /* caller wants to carefully access page(s)
> containing poison */
> 

Got it.
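
For reference, the three caller cases above could be modeled in
userspace roughly as follows. This is a sketch under assumptions, not
the kernel implementation: struct sketch_dev and sketch_direct_access
are hypothetical, the pfn output is omitted, and it assumes the returned
kaddr is tagged whenever recovery was requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DAX_OP_RECOVERY 1UL   /* value from the patch, reused as a pointer tag */

/* Hypothetical model of a one-page device. */
struct sketch_dev {
    void *base;      /* aligned mapping, so bit 0 is free for the flag */
    int   poisoned;  /* nonzero if the page contains poison */
};

/*
 * Hypothetical stand-in for dax_direct_access() with no mode argument:
 * the caller's intent arrives in the low bit of *kaddr.
 * Returns 1 (one page available) or -EIO.
 */
static long sketch_direct_access(struct sketch_dev *dev, void **kaddr)
{
    uintptr_t want = kaddr ? (uintptr_t)*kaddr : 0;
    int recovery = (want & DAX_OP_RECOVERY) != 0;

    if (dev->poisoned && !recovery)
        return -EIO;   /* failfast: refuse access to a poisoned page */

    if (kaddr) {
        uintptr_t addr = (uintptr_t)dev->base;

        assert((addr & DAX_OP_RECOVERY) == 0);   /* low bit must be unused */
        /* Tag the result so later copies can tell it is a recovery kaddr. */
        *kaddr = (void *)(recovery ? (addr | DAX_OP_RECOVERY) : addr);
    }
    return 1;
}
```

With this shape, a caller that passes kaddr == NULL or *kaddr == NULL
gets the existing failfast behavior, and only a caller that seeds *kaddr
with DAX_OP_RECOVERY receives an address for a poisoned page.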

thanks!
-jane

Thread overview: 21+ messages
2021-11-06  1:16 [PATCH v2 0/2] Dax poison recovery Jane Chu
2021-11-06  1:16 ` [PATCH v2 1/2] dax: Introduce normal and recovery dax operation modes Jane Chu
2021-11-06  1:50   ` Darrick J. Wong
2021-11-08 20:43     ` Jane Chu
2021-11-06 16:48   ` Dan Williams
2021-11-08 21:02     ` Jane Chu [this message]
2021-11-09  5:26       ` Ira Weiny
2021-11-09  6:04         ` Dan Williams
2021-11-06  1:16 ` [PATCH v2 2/2] dax,pmem: Implement pmem based dax data recovery Jane Chu
2021-11-06  2:04   ` Darrick J. Wong
2021-11-08 20:53     ` Jane Chu
2021-11-08 21:00     ` Jane Chu
2021-11-09  7:27   ` Christoph Hellwig
2021-11-09 18:48     ` Dan Williams
2021-11-09 19:52       ` Christoph Hellwig
2021-11-09 19:58       ` Jane Chu
2021-11-09 21:02         ` Dan Williams
2021-11-10 18:26           ` Jane Chu
2021-11-12 15:36             ` Mike Snitzer
2021-11-12 18:00               ` Jane Chu
2021-11-09 19:14     ` Jane Chu
