linux-fsdevel.vger.kernel.org archive mirror
From: Boaz Harrosh <boaz@plexistor.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org
Subject: Re: direct_access, pinning and truncation
Date: Tue, 21 Oct 2014 12:17:42 +0300
Message-ID: <544624B6.8080404@gmail.com>
In-Reply-To: <20141019230152.GM17506@dastard>

On 10/20/2014 02:01 AM, Dave Chinner wrote:
> On Sun, Oct 19, 2014 at 02:08:07PM +0300, Boaz Harrosh wrote:
>> On 10/10/2014 05:24 PM, Matthew Wilcox wrote:
>> <>
>>>
>>> I'm assuming that we come up with *some* way to solve the missing struct
>>> page problem.  Whether it's restructuring splice, O_DIRECT and RDMA to do
>>> without struct pages, 
>>
>> That makes no sense to me, where will it end? You are doubling the size of the
>> code to have two paths, and there will always be a subsystem you did not touch
>> and is missing support. And why? page was already invented to do exactly what you
>> want, track state of a PFN.
> .....
>>> whether it's coming up
>>> with some other data structure that takes the place of struct page for
>>> DAX ... 
>>
>> Again, why reinvent the wheel when the old one works perfectly and does
>> everything you want, including the most important aspect: not adding any
>> new infrastructure or modifying any code? So why even think about it?
>>
>>> doesn't matter for this part of the conversation.
>>>
>>
>> I agree, this does not solve the reference problem; in this case DAX will
>> need a new entry into the FS to communicate delayed free-block. But as Jan
>> pointed out, this is not against the current FS structure.
>>
>> I think lots of current DAX problems and performance shortcomings can be
>> solved very nicely if we assume we have struct page for pmem. For example,
>> the use of the page lock instead of the i_mutex we take today.
> 
> Which makes me look at what DAX is intended for.
> 
> DAX is an enabler, allowing us to get direct access to PMEM with
> *existing filesystem technologies*.  I don't want to have to add new
> extent management functions to XFS to add temporary references to
> allow DAX to hold onto extents after an inode has been freed because
> some RDMA app has pinned the PMEM and forgot to let it go. That way
> lies madness for existing filesystems - yes, we can add such warts
> to them, but it's ugly, nasty and needed only by a very, very small
> lunatic fringe of users.
> 

I agree

> IMO, this proposal is way outside the original DAX-replaces-XIP scope;
> I really don't think that requiring extensive modifications to
> filesystems to use DAX is a good idea. Apart from it being contrary to the
> original architectural goal of DAX (which was "enable direct access
> with minimal filesystem implementation impact"), we risk significant
> impact on non-DAX users by requiring architectural changes to the
> underlying filesystems to support DAX.
> 
> So my question is this: at what point do we say "out of scope for
> DAX, make this work with a native PMEM filesystem"?  DAX as it
> stands fills the "95% of what people need" goal with minimal effort;
> our efforts should be focussed on merging what we have, not creeping
> the scope and making it harder to implement and get merged.
> 
> If we want RDMA into PMEM devices or direct IO to/from persistent
> memory, then I'd suggest that this is functionality that belongs in
> native PMEM storage devices/filesystems and should be designed to be
> efficient in that environment way from the ground up.
> 

You convinced me. This is out of scope for DAX and is up to the user.
It actually works today; let me explain:

Today, after my patch to pmem, one can just mmap a file and pass the
returned pointer to any RDMA engine one chooses, and it will just work. It
also works today with the brd driver and DAX, and even with the old XIP.
The problem that remains is a truncate while the file is RDMA mapped. What
the user will need to do is take a lock on the file to thwart any truncates.
To me this is like trashing the block device directly while an FS is
mounted, I think. Can a non-root user do this?
Please note that this scenario is possible today with a brd device.
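
As a rough illustration of the user-side approach described above (mmap the
file, hand the pointer to an RDMA engine, and hold a lock against truncation),
here is a minimal C sketch. The helper names map_pinned() and unmap_pinned()
are hypothetical and not from any patch in this thread, and the choice of
flock(2) is an assumption: it is advisory, so it only keeps out truncaters
that themselves take LOCK_EX before calling ftruncate(). It does not stop an
uncooperative process.

```c
/* Sketch only: map a file for use by an RDMA engine while holding an
 * advisory lock. Helper names are hypothetical, not from the thread. */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the first `len` bytes of `path` shared and read/write.
 * On success returns the mapping and stores the locked fd in *out_fd.
 * On failure returns NULL and leaves *out_fd untouched. */
void *map_pinned(const char *path, size_t len, int *out_fd)
{
    assert(out_fd != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /* Advisory shared lock: a cooperating truncater is assumed to take
     * LOCK_EX first, so it waits (or fails) while we hold this. */
    if (flock(fd, LOCK_SH) != 0) {
        close(fd);
        return NULL;
    }

    /* Refuse to map beyond the current end of file. */
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < 0 || (size_t)st.st_size < len) {
        close(fd);              /* closing the fd also drops the lock */
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    *out_fd = fd;
    return p;
}

/* Tear down in reverse order: unmap, release the lock, close. */
void unmap_pinned(void *p, size_t len, int fd)
{
    munmap(p, len);
    flock(fd, LOCK_UN);
    close(fd);
}
```

The pointer returned by map_pinned() is what would be handed to the RDMA
registration call (for example ibv_reg_mr() from libibverbs); that step is
left out here to keep the sketch self-contained. The lock should be held
until the RDMA engine has released the memory.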

> Cheers,
> Dave.
> 

Thanks
Boaz



Thread overview: 14+ messages
2014-10-08 19:05 direct_access, pinning and truncation Matthew Wilcox
2014-10-08 23:21 ` Zach Brown
2014-10-09 16:44   ` Matthew Wilcox
2014-10-09 19:14     ` Zach Brown
2014-10-10 10:01       ` Jan Kara
2014-10-09  1:10 ` Dave Chinner
2014-10-09 15:25   ` Matthew Wilcox
2014-10-13  1:19     ` Dave Chinner
2014-10-19  9:51     ` Boaz Harrosh
2014-10-10 13:08 ` Jan Kara
2014-10-10 14:24   ` Matthew Wilcox
2014-10-19 11:08     ` Boaz Harrosh
2014-10-19 23:01       ` Dave Chinner
2014-10-21  9:17         ` Boaz Harrosh [this message]
