From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
Peter Zijlstra <peterz@infradead.org>, X86 ML <x86@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andy Lutomirski <luto@amacapital.net>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Al Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v3 7/9] dax: report bytes remaining in dax_iomap_actor()
Date: Wed, 23 May 2018 11:04:02 -0600 [thread overview]
Message-ID: <20180523170402.GA1847@linux.intel.com> (raw)
In-Reply-To: <CAPcyv4heJgEc8=Y0FjxvqzieY8hh4yK0uJbnEEBnXYFQodOPKg@mail.gmail.com>
On Wed, May 23, 2018 at 09:53:38AM -0700, Dan Williams wrote:
> On Wed, May 23, 2018 at 9:47 AM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
> > On Wed, May 23, 2018 at 09:39:04AM -0700, Dan Williams wrote:
> >> On Wed, May 23, 2018 at 9:34 AM, Ross Zwisler
> >> <ross.zwisler@linux.intel.com> wrote:
> >> > On Thu, May 03, 2018 at 05:06:42PM -0700, Dan Williams wrote:
> >> >> In preparation for protecting the dax read(2) path from media errors
> >> >> with copy_to_iter_mcsafe() (via dax_copy_to_iter()), convert the
> >> >> implementation to report the bytes successfully transferred.
> >> >>
> >> >> Cc: <x86@kernel.org>
> >> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> >> Cc: Borislav Petkov <bp@alien8.de>
> >> >> Cc: Tony Luck <tony.luck@intel.com>
> >> >> Cc: Al Viro <viro@zeniv.linux.org.uk>
> >> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> >> Cc: Andy Lutomirski <luto@amacapital.net>
> >> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> >> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> >> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> >> ---
> >> >> fs/dax.c | 20 +++++++++++---------
> >> >> 1 file changed, 11 insertions(+), 9 deletions(-)
> >> >>
> >> >> diff --git a/fs/dax.c b/fs/dax.c
> >> >> index a64afdf7ec0d..34a2d435ae4b 100644
> >> >> --- a/fs/dax.c
> >> >> +++ b/fs/dax.c
> >> >> @@ -991,6 +991,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >> >> struct iov_iter *iter = data;
> >> >> loff_t end = pos + length, done = 0;
> >> >> ssize_t ret = 0;
> >> >> + size_t xfer;
> >> >> int id;
> >> >>
> >> >> if (iov_iter_rw(iter) == READ) {
> >> >> @@ -1054,19 +1055,20 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >> >> * vfs_write(), depending on which operation we are doing.
> >> >> */
> >> >> if (iov_iter_rw(iter) == WRITE)
> >> >> - map_len = dax_copy_from_iter(dax_dev, pgoff, kaddr,
> >> >> + xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
> >> >> map_len, iter);
> >> >> else
> >> >> - map_len = dax_copy_to_iter(dax_dev, pgoff, kaddr,
> >> >> + xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
> >> >> map_len, iter);
> >> >> - if (map_len <= 0) {
> >> >> - ret = map_len ? map_len : -EFAULT;
> >> >> - break;
> >> >> - }
> >> >>
> >> >> - pos += map_len;
> >> >> - length -= map_len;
> >> >> - done += map_len;
> >> >> + pos += xfer;
> >> >> + length -= xfer;
> >> >> + done += xfer;
> >> >> +
> >> >> + if (xfer == 0)
> >> >> + ret = -EFAULT;
> >> >> + if (xfer < map_len)
> >> >> + break;
> >> >
> >> > I'm confused by this error handling. So if we hit an error on a given iov and
> >> > we don't transfer the expected number of bytes, we have two cases:
> >> >
> >> > 1) We transferred *something* on this iov but not everything - return success.
> >> > 2) We didn't transfer anything on this iov - return -EFAULT.
> >> >
> >> > Both of these are true regardless of data transferred on previous iovs.
> >> >
> >> > Why the distinction? If a given iov is interrupted, regardless of whether it
> >> > transferred 0 bytes or 1, shouldn't the error path be the same?
> >>
> >> This is is the semantics of read(2) / write(2). Quoting the pwrite man page:
> >>
> >> Note that is not an error for a successful call to
> >> transfer fewer bytes than
> >> requested (see read(2) and write(2)).
> >
> > Consider this case:
> >
> > I have 4 IOVs, each of a full page. The first three transfer their full page,
> > but on the third we hit an error.
> >
> > If we transferred 0 bytes in the fourth page, we'll return -EFAULT.
> >
> > If we transferred 1 byte in the fourth page, we'll return the total length
> > transferred, so 3 pages + 1 byte.
> >
> > Why? pwrite(2) says it returns the number of bytes written, which can be less
> > than the total requested. Why not just return the length transferred in both
> > cases, instead of returning -EFAULT for one of them?
>
> Ah, now I see. Yes, that's a bug. Once we have successfully completed
> any iovec we should be returning bytes transferred not -EFAULT.
Actually, your code is fine. This is handled by the:
return done ? done : ret;
at the end of the function. So if we've transferred any data at all, we'll
return the number of bytes transferred, and if we didn't we'll return -EFAULT
because 0 is the special case which means EOF according to pread(2)/pwrite(2).
Looks good, then. Thanks for answering my questions.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
"Luck, Tony" <tony.luck@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
X86 ML <x86@kernel.org>, Christoph Hellwig <hch@lst.de>,
Andy Lutomirski <luto@amacapital.net>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Al Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v3 7/9] dax: report bytes remaining in dax_iomap_actor()
Date: Wed, 23 May 2018 11:04:02 -0600 [thread overview]
Message-ID: <20180523170402.GA1847@linux.intel.com> (raw)
In-Reply-To: <CAPcyv4heJgEc8=Y0FjxvqzieY8hh4yK0uJbnEEBnXYFQodOPKg@mail.gmail.com>
On Wed, May 23, 2018 at 09:53:38AM -0700, Dan Williams wrote:
> On Wed, May 23, 2018 at 9:47 AM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
> > On Wed, May 23, 2018 at 09:39:04AM -0700, Dan Williams wrote:
> >> On Wed, May 23, 2018 at 9:34 AM, Ross Zwisler
> >> <ross.zwisler@linux.intel.com> wrote:
> >> > On Thu, May 03, 2018 at 05:06:42PM -0700, Dan Williams wrote:
> >> >> In preparation for protecting the dax read(2) path from media errors
> >> >> with copy_to_iter_mcsafe() (via dax_copy_to_iter()), convert the
> >> >> implementation to report the bytes successfully transferred.
> >> >>
> >> >> Cc: <x86@kernel.org>
> >> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> >> Cc: Borislav Petkov <bp@alien8.de>
> >> >> Cc: Tony Luck <tony.luck@intel.com>
> >> >> Cc: Al Viro <viro@zeniv.linux.org.uk>
> >> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> >> Cc: Andy Lutomirski <luto@amacapital.net>
> >> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> >> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> >> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> >> ---
> >> >> fs/dax.c | 20 +++++++++++---------
> >> >> 1 file changed, 11 insertions(+), 9 deletions(-)
> >> >>
> >> >> diff --git a/fs/dax.c b/fs/dax.c
> >> >> index a64afdf7ec0d..34a2d435ae4b 100644
> >> >> --- a/fs/dax.c
> >> >> +++ b/fs/dax.c
> >> >> @@ -991,6 +991,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >> >> struct iov_iter *iter = data;
> >> >> loff_t end = pos + length, done = 0;
> >> >> ssize_t ret = 0;
> >> >> + size_t xfer;
> >> >> int id;
> >> >>
> >> >> if (iov_iter_rw(iter) == READ) {
> >> >> @@ -1054,19 +1055,20 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >> >> * vfs_write(), depending on which operation we are doing.
> >> >> */
> >> >> if (iov_iter_rw(iter) == WRITE)
> >> >> - map_len = dax_copy_from_iter(dax_dev, pgoff, kaddr,
> >> >> + xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
> >> >> map_len, iter);
> >> >> else
> >> >> - map_len = dax_copy_to_iter(dax_dev, pgoff, kaddr,
> >> >> + xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
> >> >> map_len, iter);
> >> >> - if (map_len <= 0) {
> >> >> - ret = map_len ? map_len : -EFAULT;
> >> >> - break;
> >> >> - }
> >> >>
> >> >> - pos += map_len;
> >> >> - length -= map_len;
> >> >> - done += map_len;
> >> >> + pos += xfer;
> >> >> + length -= xfer;
> >> >> + done += xfer;
> >> >> +
> >> >> + if (xfer == 0)
> >> >> + ret = -EFAULT;
> >> >> + if (xfer < map_len)
> >> >> + break;
> >> >
> >> > I'm confused by this error handling. So if we hit an error on a given iov and
> >> > we don't transfer the expected number of bytes, we have two cases:
> >> >
> >> > 1) We transferred *something* on this iov but not everything - return success.
> >> > 2) We didn't transfer anything on this iov - return -EFAULT.
> >> >
> >> > Both of these are true regardless of data transferred on previous iovs.
> >> >
> >> > Why the distinction? If a given iov is interrupted, regardless of whether it
> >> > transferred 0 bytes or 1, shouldn't the error path be the same?
> >>
> >> This is is the semantics of read(2) / write(2). Quoting the pwrite man page:
> >>
> >> Note that is not an error for a successful call to
> >> transfer fewer bytes than
> >> requested (see read(2) and write(2)).
> >
> > Consider this case:
> >
> > I have 4 IOVs, each of a full page. The first three transfer their full page,
> > but on the third we hit an error.
> >
> > If we transferred 0 bytes in the fourth page, we'll return -EFAULT.
> >
> > If we transferred 1 byte in the fourth page, we'll return the total length
> > transferred, so 3 pages + 1 byte.
> >
> > Why? pwrite(2) says it returns the number of bytes written, which can be less
> > than the total requested. Why not just return the length transferred in both
> > cases, instead of returning -EFAULT for one of them?
>
> Ah, now I see. Yes, that's a bug. Once we have successfully completed
> any iovec we should be returning bytes transferred not -EFAULT.
Actually, your code is fine. This is handled by the:
return done ? done : ret;
at the end of the function. So if we've transferred any data at all, we'll
return the number of bytes transferred, and if we didn't we'll return -EFAULT
because 0 is the special case which means EOF according to pread(2)/pwrite(2).
Looks good, then. Thanks for answering my questions.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
next prev parent reply other threads:[~2018-05-23 17:04 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-04 0:06 [PATCH v3 0/9] Series short description Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-04 0:06 ` [PATCH v3 1/9] x86, memcpy_mcsafe: remove loop unrolling Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-04 0:06 ` [PATCH v3 2/9] x86, memcpy_mcsafe: add labels for write fault handling Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-04 0:06 ` [PATCH v3 3/9] x86, memcpy_mcsafe: return bytes remaining Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-04 0:06 ` [PATCH v3 4/9] x86, memcpy_mcsafe: add write-protection-fault handling Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-04 0:06 ` [PATCH v3 5/9] x86, memcpy_mcsafe: define copy_to_iter_mcsafe() Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-04 0:06 ` [PATCH v3 6/9] dax: introduce a ->copy_to_iter dax operation Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-23 16:13 ` Ross Zwisler
2018-05-23 16:13 ` Ross Zwisler
2018-05-04 0:06 ` [PATCH v3 7/9] dax: report bytes remaining in dax_iomap_actor() Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-23 16:34 ` Ross Zwisler
2018-05-23 16:34 ` Ross Zwisler
2018-05-23 16:39 ` Dan Williams
2018-05-23 16:39 ` Dan Williams
2018-05-23 16:47 ` Ross Zwisler
2018-05-23 16:47 ` Ross Zwisler
2018-05-23 16:53 ` Dan Williams
2018-05-23 16:53 ` Dan Williams
2018-05-23 17:04 ` Ross Zwisler [this message]
2018-05-23 17:04 ` Ross Zwisler
2018-05-04 0:06 ` [PATCH v3 8/9] pmem: switch to copy_to_iter_mcsafe() Dan Williams
2018-05-04 0:06 ` Dan Williams
2018-05-23 16:35 ` Ross Zwisler
2018-05-23 16:35 ` Ross Zwisler
2018-05-04 0:06 ` [PATCH v3 9/9] x86, nfit_test: unit test for memcpy_mcsafe() Dan Williams
2018-05-04 0:06 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180523170402.GA1847@linux.intel.com \
--to=ross.zwisler@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvdimm@lists.01.org \
--cc=luto@amacapital.net \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.