From: Boaz Harrosh <boaz@plexistor.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
linux-block@vger.kernel.org, Jan Kara <jack@suse.cz>,
Matthew Wilcox <matthew@wil.cx>,
Dave Chinner <david@fromorbit.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
XFS Developers <xfs@oss.sgi.com>, Jens Axboe <axboe@fb.com>,
Linux MM <linux-mm@kvack.org>, Al Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
Date: Mon, 02 May 2016 20:44:01 +0300
Message-ID: <572791E1.7000103@plexistor.com>
In-Reply-To: <CAPcyv4jnz69a3S+XZgLaLojHZmpfoVXGDkJkt_1Q=8kk0gik9w@mail.gmail.com>
On 05/02/2016 07:49 PM, Dan Williams wrote:
> On Mon, May 2, 2016 at 9:22 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
>> On 05/02/2016 07:01 PM, Dan Williams wrote:
>>> On Mon, May 2, 2016 at 8:41 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
>>>> On 04/29/2016 12:16 AM, Vishal Verma wrote:
>>>>> All IO in a dax filesystem used to go through dax_do_io, which cannot
>>>>> handle media errors, and thus cannot provide a recovery path that can
>>>>> send a write through the driver to clear errors.
>>>>>
>>>>> Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
>>>>> path for DAX filesystems, use the same direct_IO path for both DAX and
>>>>> direct_io iocbs, but use the flags to identify when we are in O_DIRECT
>>>>> mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
>>>>> direct_IO path instead of DAX.
>>>>>
>>>>
>>>> Really? What is your thinking here?
>>>>
>>>> What about all the current users of O_DIRECT? You have just made them
>>>> 4 times slower and "less concurrent*" than "buffered io" users, since
>>>> the direct_IO path will queue an IO request and all.
>>>> (And if it is not so slow then why do we need dax_do_io at all? [Rhetorical])
>>>>
>>>> I hate it that you overload the semantics of a known and expected
>>>> O_DIRECT flag, for special pmem quirks. This is an incompatible
>>>> and unrelated overload of the semantics of O_DIRECT.
>>>
>>> I think it is the opposite situation, it is undoing the premature
>>> overloading of O_DIRECT that went in without performance numbers.
>>
>> We have tons of measurements. It is not hard to imagine the results though,
>> especially in the 1000-thread case.
>>
>>> This implementation clarifies that dax_do_io() handles the lack of a
>>> page cache for buffered I/O and O_DIRECT behaves as it nominally would
>>> by sending an I/O to the driver.
>>
>>> It has the benefit of matching the
>>> error semantics of a typical block device where a buffered write could
>>> hit an error filling the page cache, but an O_DIRECT write potentially
>>> triggers the drive to remap the block.
>>>
>>
>> I fail to see how, for writes, the device error semantics regarding remapping
>> of blocks are any different between buffered and direct IO. As far as the
>> block device is concerned it is the exact same code path. All the big
>> differences are higher up, in the VFS.
>>
>> And ... So you are willing to sacrifice the 99% hotpath for the sake of the
>> 1% error path, piggybacking on poor O_DIRECT?
>>
>> Again there are tons of O_DIRECT apps out there, why are you forcing them to
>> change if they want true pmem performance?
>
> This isn't forcing them to change. This is the path of least surprise
> as error semantics are identical to a typical block device. Yes, an
> application can go faster by switching to the "buffered" / dax_do_io()
> path; it can go even faster by switching to mmap() I/O and using DAX
> directly. If we can later optimize the O_DIRECT path to bring its
> performance more in line with dax_do_io(), great, but the
> implementation should be correct first and optimized later.
>

Why does it need to be either/or? Why not both?

I also disagree: if you are correct and dax_do_io is bad and needs fixing,
then you have broken applications. Because in the current model:

	read => -EIO, write-buffered, sync()

gives you the same error semantics as:

	read => -EIO, write-direct-io

In fact, this is what the delete-and-restore-from-backup model does today.
Who said it uses, or must use, direct IO? Actually I think it does not.
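
To make the comparison concrete, here is a minimal userspace sketch of the
buffered variant of that recovery step: overwrite the range that returned
-EIO and then force it out with fsync(). The helper name
rewrite_range_buffered() is made up for illustration, and whether the rewrite
actually clears a media error depends on the filesystem and driver underneath;
this only shows the calling pattern.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any patch in this thread): overwrite
 * [off, off + len) through the ordinary buffered write path, then push
 * it to stable storage. Returns 0 on success or a negative errno.
 */
static int rewrite_range_buffered(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	assert(buf != NULL);

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -errno;
		}
		if (n == 0)
			return -EIO;	/* no progress, give up */
		p += n;
		off += n;
		len -= (size_t)n;
	}

	/* A deferred write-back error, if any, is reported here. */
	if (fsync(fd) < 0)
		return -errno;
	return 0;
}
```

An O_DIRECT writer would issue the same pwrite() on a descriptor opened with
O_DIRECT (typically with aligned buffers). Either way the application sees the
error on read and repairs by rewriting.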

Two things I can think of which are better:

[1]
Why not go deeper into the dax io loops, and for any page that fails a WRITE,
call bdev_rw_page() to let pmem.c clear / relocate the error page?

So reads return -EIO, which is what you wanted, no?
Writes get a memory error and retry with bdev_rw_page() to let the bdev
relocate / clear the error, which is what you wanted, no?

In the partial-page WRITE case on bad sectors, we can carefully
read-modify-write sector by sector and zero out the bad sectors that could
not be read. What else?
(Or enhance the bdev_rw_page() API.)
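
A rough userspace model of option [1], not kernel code: a toy "pmem page"
with a per-sector bad flag, a fast memcpy path that fails on bad sectors, and
a whole-page "driver" write standing in for bdev_rw_page(). Every toy_* name,
the 512-byte sector size and the 4096-byte page size are assumptions made for
this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define PAGE_SZ 4096u
#define SECTORS_PER_PAGE (PAGE_SZ / SECTOR_SIZE)

/* Toy stand-in for one page of pmem plus its bad-sector state. */
struct toy_pmem_page {
	unsigned char data[PAGE_SZ];
	bool bad[SECTORS_PER_PAGE];
};

static bool toy_range_has_bad(const struct toy_pmem_page *pg, size_t off, size_t len)
{
	assert(len > 0 && off + len <= PAGE_SZ);
	for (size_t s = off / SECTOR_SIZE; s <= (off + len - 1) / SECTOR_SIZE; s++)
		if (pg->bad[s])
			return true;
	return false;
}

/* Reads never try to repair: touching a bad sector is simply -EIO. */
static int toy_read(const struct toy_pmem_page *pg, size_t off, void *dst, size_t len)
{
	if (toy_range_has_bad(pg, off, len))
		return -EIO;
	memcpy(dst, pg->data + off, len);
	return 0;
}

/* Fast path: plain copy; -EIO models the memory error on a bad sector. */
static int toy_dax_write(struct toy_pmem_page *pg, size_t off, const void *src, size_t len)
{
	if (toy_range_has_bad(pg, off, len))
		return -EIO;
	memcpy(pg->data + off, src, len);
	return 0;
}

/* Slow path: stand-in for a full-page driver write that clears errors. */
static void toy_driver_write_page(struct toy_pmem_page *pg, const unsigned char *page)
{
	memcpy(pg->data, page, PAGE_SZ);
	memset(pg->bad, 0, sizeof(pg->bad));
}

/* Try the fast path; on failure, read-modify-write through the driver. */
static int toy_write_with_fallback(struct toy_pmem_page *pg, size_t off,
				   const void *src, size_t len)
{
	unsigned char tmp[PAGE_SZ];

	if (toy_dax_write(pg, off, src, len) == 0)
		return 0;

	/* Build the new page sector by sector; unreadable sectors become zeros. */
	for (size_t s = 0; s < SECTORS_PER_PAGE; s++) {
		unsigned char *dst = tmp + s * SECTOR_SIZE;

		if (pg->bad[s])
			memset(dst, 0, SECTOR_SIZE);
		else
			memcpy(dst, pg->data + s * SECTOR_SIZE, SECTOR_SIZE);
	}
	memcpy(tmp + off, src, len);
	toy_driver_write_page(pg, tmp);
	return 0;
}
```

In this model the hot path stays a plain copy, and only a write that actually
hits a bad sector pays for the driver round trip.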

[2]
Only switch to the slow O_DIRECT path in the presence of errors, like you
wanted. But I still hate that you overload O_DIRECT with error semantics that
do not exist today; see above.
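
As a sketch of what "only in the presence of errors" could look like, the
path choice reduces to a range-overlap test against the list of known bad
ranges. This is a userspace illustration only; struct toy_bad_range,
toy_pick_path() and the enum are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One known-bad extent, in sectors (hypothetical layout). */
struct toy_bad_range {
	uint64_t start;
	uint64_t len;
};

enum toy_io_path {
	TOY_PATH_DAX,		/* fast path: direct copy */
	TOY_PATH_DRIVER,	/* slow path: send the I/O through the driver */
};

/*
 * Pick the slow path only if [sector, sector + nsect) overlaps a known
 * bad range; everything else stays on the fast path.
 */
static enum toy_io_path toy_pick_path(const struct toy_bad_range *bad, size_t nbad,
				      uint64_t sector, uint64_t nsect)
{
	assert(nsect > 0);

	for (size_t i = 0; i < nbad; i++) {
		/* Two half-open ranges overlap iff each starts before the other ends. */
		if (sector < bad[i].start + bad[i].len &&
		    bad[i].start < sector + nsect)
			return TOY_PATH_DRIVER;
	}
	return TOY_PATH_DAX;
}
```

With an empty bad list every request takes the fast path, so the error-free
case pays only for the lookup.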
Thanks
Boaz