From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
"boaz@plexistor.com" <boaz@plexistor.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"hch@infradead.org" <hch@infradead.org>,
"xfs@oss.sgi.com" <xfs@oss.sgi.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"axboe@fb.com" <axboe@fb.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"david@fromorbit.com" <david@fromorbit.com>,
"jack@suse.cz" <jack@suse.cz>, "matthew@wil.cx" <matthew@wil.cx>
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
Date: Mon, 2 May 2016 18:52:02 +0000
Message-ID: <1462215110.1421.43.camel@intel.com>
In-Reply-To: <57277A59.3000306@plexistor.com>
On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:
> On 05/02/2016 06:51 PM, Vishal Verma wrote:
> >
> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:
> > >
> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:
> > > >
> > > > All IO in a dax filesystem used to go through dax_do_io, which
> > > > cannot handle media errors, and thus cannot provide a recovery
> > > > path that can send a write through the driver to clear errors.
> > > >
> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In the
> > > > IO path for DAX filesystems, use the same direct_IO path for both
> > > > DAX and direct_io iocbs, but use the flags to identify when we are
> > > > in O_DIRECT mode vs. non-O_DIRECT with DAX, and for O_DIRECT, use
> > > > the conventional direct_IO path instead of DAX.
> > > >
> > > Really? What is your thinking here?
> > >
> > > What about all the current users of O_DIRECT? You have just made
> > > them 4 times slower and "less concurrent*" than "buffered IO" users,
> > > since the direct_IO path will queue an IO request and all.
> > > (And if it is not so slow, then why do we need dax_do_io at all?
> > > [Rhetorical])
> > >
> > > I hate it that you overload the semantics of a known and expected
> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
> > > and unrelated overload of the semantics of O_DIRECT.
> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on
> > the same path:
> >
> > static inline bool io_is_direct(struct file *filp)
> > {
> > 	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
> > }
> >
> No, as far as the user is concerned, we have not. The O_DIRECT user
> is still getting all the semantics he wants, i.e. no syncs, no
> memory cache usage, no copies...
>
> Only with DAX is buffered IO the same as direct IO, since with pmem
> that is faster. So why not? The basic contract with the user did not
> break.
>
> The above was just an implementation detail to easily navigate
> through the Linux VFS IO stack and make the fewest changes in every
> FS that wanted to support DAX (and since dax_do_io is much more like
> direct_IO than like page-cache IO).
>
> >
> > Yes, O_DIRECT on a DAX-mounted file system will now be slower, but -
> >
> > >
> > >
> > > >
> > > >
> > > > This allows us a recovery path in the form of opening the file
> > > > with O_DIRECT and writing to it with the usual O_DIRECT semantics
> > > > (sector alignment restrictions).
> > > >
> > > I understand that you want sector-aligned IO for clearing errors,
> > > right? But I hate it that you forced all O_DIRECT IO to be slow for
> > > this. Can you not make dax_do_io handle media errors? At least for
> > > the parts of the IO that are aligned. (And your recovery-path
> > > application above can use only aligned IO to make sure.)
> > >
> > > Please look for another solution. Even a special IOCTL_DAX_CLEAR_ERROR
> > - see all the versions of this series prior to this one, where we try
> > to do a fallback...
> >
> And?
>
> So now all O_DIRECT apps go 4 times slower. I will have a look, but if
> it is really so bad, then please consider an IOCTL or syscall. Or a
> special O_DAX_ERRORS flag...
I'm curious where the 4x slowdown comes from. The O_DIRECT path still
involves no page-cache copies, nor does it go through request queues
(since pmem is a bio-based driver). The only overhead is that of
submitting a bio - and while I agree it is more overhead than
dax_do_io, 4x seems a bit high.
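For anyone following along, here is a rough userspace model of the path selection being debated. It is only a sketch: MODEL_IOCB_DIRECT, MODEL_IOCB_DAX, enum io_path and the pick_path_* helpers are hypothetical names made up for illustration, not the flags or functions in the actual patch. It contrasts the old behaviour (io_is_direct() is true for O_DIRECT or DAX, and, as I understand the existing filesystem code, a DAX inode always ends up in dax_do_io) with the proposed one, where O_DIRECT takes priority and goes down the conventional direct_IO path, i.e. bios submitted to the pmem driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for iocb flags; names and values are illustrative. */
#define MODEL_IOCB_DIRECT (1u << 0) /* file was opened with O_DIRECT */
#define MODEL_IOCB_DAX    (1u << 1) /* set only for files on a DAX mount */

enum io_path {
	IO_PAGE_CACHE,   /* ordinary buffered IO */
	IO_DAX_DO_IO,    /* dax_do_io(): memcpy to/from pmem on the caller's thread */
	IO_BLOCK_DIRECT, /* conventional direct_IO: bios submitted to the driver */
};

/* Old behaviour: io_is_direct() covers O_DIRECT or DAX, and DAX always wins. */
static enum io_path pick_path_before(unsigned int flags)
{
	bool is_dax = (flags & MODEL_IOCB_DAX) != 0;
	bool is_direct = (flags & MODEL_IOCB_DIRECT) != 0 || is_dax;

	if (!is_direct)
		return IO_PAGE_CACHE;
	return is_dax ? IO_DAX_DO_IO : IO_BLOCK_DIRECT;
}

/* Proposed behaviour: O_DIRECT takes priority over DAX. */
static enum io_path pick_path_after(unsigned int flags)
{
	if (flags & MODEL_IOCB_DIRECT)
		return IO_BLOCK_DIRECT;
	if (flags & MODEL_IOCB_DAX)
		return IO_DAX_DO_IO;
	return IO_PAGE_CACHE;
}
```

In this model only the O_DIRECT-on-DAX combination changes paths, which is exactly the case being objected to; plain read()/write() on a DAX mount keeps using dax_do_io, and the mmap/fault path is not modelled at all.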
>
> Please do not trash all the O_DIRECT users; they are the more important
> clients, like DBs and VMs.
Shouldn't they be using mmaps and DAX faults? I was under the impression
that the dax_do_io path is a nice-to-have, and that anyone who wants to
use DAX will want the mmap/fault path, not the IO path. This is just
making the IO path 'more correct' by giving it a way to deal with errors.
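To make the recovery path from the patch description concrete, here is a minimal userspace sketch of rewriting a known-bad range through O_DIRECT, so that (per the patch description) the write goes through the driver, which can clear the errors. Assumptions, all mine: a 512-byte logical sector size (real code should query the device, e.g. with the BLKSSZGET ioctl), and zero-fill as placeholder data; a real tool would write back known-good contents, since rounding out to whole sectors also overwrites any good bytes sharing a sector with the bad range. sector_align_range() and rewrite_range_odirect() are hypothetical helpers, not an existing API.

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 512-byte logical sectors. Query the device in real code. */
#define SECTOR_SIZE 512u

/* Round the byte range [off, off + len) outward to whole sectors. */
static void sector_align_range(uint64_t off, uint64_t len,
			       uint64_t *aligned_off, uint64_t *aligned_len)
{
	const uint64_t mask = (uint64_t)SECTOR_SIZE - 1;

	assert((SECTOR_SIZE & mask) == 0); /* sector size must be a power of two */
	if (len == 0) { /* nothing to rewrite */
		*aligned_off = off;
		*aligned_len = 0;
		return;
	}
	uint64_t start = off & ~mask;
	uint64_t end = (off + len + mask) & ~mask; /* assumes no overflow */
	*aligned_off = start;
	*aligned_len = end - start;
}

/*
 * Overwrite [off, off + len) of the file at 'path' with zeroes using
 * O_DIRECT, after widening the range to whole sectors.
 * Returns 0 on success or a negative errno value.
 */
static int rewrite_range_odirect(const char *path, uint64_t off, uint64_t len)
{
	uint64_t aoff, alen;
	void *buf = NULL;
	int fd, ret = 0;

	sector_align_range(off, len, &aoff, &alen);
	if (alen == 0)
		return 0;

	/* O_DIRECT needs the user buffer aligned, not just offset/length. */
	if (posix_memalign(&buf, SECTOR_SIZE, (size_t)alen) != 0)
		return -ENOMEM;
	memset(buf, 0, (size_t)alen); /* placeholder: real recovery writes good data */

	fd = open(path, O_WRONLY | O_DIRECT);
	if (fd < 0) {
		ret = -errno;
		free(buf);
		return ret;
	}

	ssize_t n = pwrite(fd, buf, (size_t)alen, (off_t)aoff);
	if (n < 0)
		ret = -errno;
	else if ((uint64_t)n != alen)
		ret = -EIO; /* short write: treat as failure */

	close(fd);
	free(buf);
	return ret;
}
```

For example, a bad range of 100 bytes at offset 1000 widens to bytes 512..1535, so rewrite_range_odirect(path, 1000, 100) issues a single 1024-byte aligned pwrite() at offset 512.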
>
> Thanks
> Boaz
>
> >
> > >
> > >
> > > [*"less concurrent" because of the queuing done in bdev. Note how
> > > pmem is not even multi-queue, and even if it were, it would be much
> > > slower than DAX because of the code depth and all the locks and task
> > > switches done in the block layer. In DAX the final memcpy is done
> > > directly on the user-mode thread.]
> > >
> > > Thanks
> > > Boaz
> > >
Thread overview: 33+ messages
2016-04-28 21:16 [PATCH v4 0/7] dax: handling media errors Vishal Verma
2016-04-28 21:16 ` [PATCH v4 1/7] block, dax: pass blk_dax_ctl through to drivers Vishal Verma
2016-04-28 21:16 ` [PATCH v4 2/7] dax: fallback from pmd to pte on error Vishal Verma
2016-04-28 21:16 ` [PATCH v4 3/7] dax: enable dax in the presence of known media errors (badblocks) Vishal Verma
2016-04-28 21:16 ` [PATCH v4 4/7] dax: use sb_issue_zerout instead of calling dax_clear_sectors Vishal Verma
2016-04-28 21:16 ` [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io Vishal Verma
2016-05-02 14:56 ` Christoph Hellwig
2016-05-02 15:45 ` Vishal Verma
2016-05-02 15:41 ` Boaz Harrosh
2016-05-02 15:51 ` Vishal Verma
2016-05-02 16:03 ` Boaz Harrosh
2016-05-02 18:52 ` Verma, Vishal L [this message]
2016-05-02 16:01 ` Dan Williams
2016-05-02 16:22 ` Boaz Harrosh
2016-05-02 16:49 ` Dan Williams
2016-05-02 17:44 ` Boaz Harrosh
2016-05-02 18:10 ` Dan Williams
2016-05-02 18:32 ` Boaz Harrosh
2016-05-02 18:48 ` Dan Williams
2016-05-02 19:22 ` Boaz Harrosh
2016-05-05 14:24 ` Christoph Hellwig
2016-05-05 15:15 ` Dan Williams
2016-05-05 15:22 ` Christoph Hellwig
2016-05-05 16:24 ` Dan Williams
2016-05-05 21:45 ` Verma, Vishal L
2016-05-08 9:01 ` hch
2016-05-08 18:42 ` Verma, Vishal L
2016-05-05 21:42 ` Verma, Vishal L
2016-05-05 21:39 ` Verma, Vishal L
2016-05-08 9:01 ` hch
2016-04-28 21:16 ` [PATCH v4 6/7] dax: for truncate/hole-punch, do zeroing through the driver if possible Vishal Verma
2016-04-28 21:16 ` [PATCH v4 7/7] dax: fix a comment in dax_zero_page_range and dax_truncate_page Vishal Verma
2016-04-29 21:55 ` [PATCH v4 8/7] Documentation: add error handling information to dax.txt Vishal Verma