All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <boaz@plexistor.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Dave Chinner <david@fromorbit.com>, Andrew Morton <akpm@osdl.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-nvdimm@lists.01.org, Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Ingo Molnar <mingo@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-fsdevel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] dax, pmem: add support for msync
Date: Thu, 03 Sep 2015 09:32:02 +0300	[thread overview]
Message-ID: <55E7E962.2000607@plexistor.com> (raw)
In-Reply-To: <20150902190401.GC32255@linux.intel.com>

On 09/02/2015 10:04 PM, Ross Zwisler wrote:
> On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
<>
>> Apps expect all these to work:
>> 1. open mmap m-write msync ... close
>> 2. open mmap m-write fsync ... close
>> 3. open mmap m-write unmap ... fsync close
>>
>> 4. open mmap m-write sync ...
> 
> So basically you made close have an implicit fsync?  What about the flow that
> looks like this:
> 
> 5. open mmap close m-write
> 

What? no, close means ummap because you need a file* attached to your vma

And you miss-understood me, the vm_opts->close is the *unmap* operation not
the file::close() operation.

I meant memory-cl_flush on unmap before the vma goes away.

> This guy definitely needs an msync/fsync at the end to make sure that the
> m-write becomes durable.  
> 

Exactly done at unmap time.

> Also, the CLOSE(2) man page specifically says that a flush does not occur at
> close:
> 	A successful close does not guarantee that the data has been
> 	successfully  saved  to  disk,  as  the  kernel defers  writes.   It
> 	is not common for a filesystem to flush the buffers when the stream is
> 	closed.  If you need to be sure that the data is physically stored,
> 	use fsync(2).  (It will depend on the disk  hardware  at this point.)
> 
> I don't think that adding an implicit fsync to close is the right solution -
> we just need to get msync and fsync correctly working.
> 

So above is not relevant, and we are doing that. taking care of cpu-cache flushing.
This is not disk-flushing, on a long memcpy from usermode most of the data is
already durable, is only the leftover margins. Like the dax_io in the kernel
dax implies direct_io always, all we are trying is to have the least
performance hit in memory-cache-flushing.

IS nothing to do with the text above.

>> The first 3 are supported with above, because what happens is that at [3]
>> the fsync actually happens on unmap and fsync is redundant in that case.
>>
>> The only broken scenario is [3]. We do not have a list of "dax-dirty" inodes
>> per sb to iterate on and call inode-sync on. This cause problems mostly in
>> freeze because with actual [3] scenario the file will be eventually closed
>> and persistent, but after the call to sync returns.
>>
>> Its on my TODO to fix [3] based on instructions from Dave.
>> The mmap call will put the inode on the list and the dax_vm_close will
>> remove it. One of the regular dirty list should be used as suggested by
>> Dave.
> 
> I believe in the above two paragraphs you meant [4], so the 
> 
> 4. open mmap m-write sync ...
> 
> case needs to be fixed so that we can detect DAX-dirty inodes?
> 

Yes I'll be working on sync (DAX-dirty-i_list) soon but it needs a working
fsync to be in place (eg: dax_fsync(inode))

Thanks
Boaz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Boaz Harrosh <boaz@plexistor.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Dave Chinner <david@fromorbit.com>, Andrew Morton <akpm@osdl.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-nvdimm@lists.01.org, Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Ingo Molnar <mingo@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-fsdevel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] dax, pmem: add support for msync
Date: Thu, 03 Sep 2015 09:32:02 +0300	[thread overview]
Message-ID: <55E7E962.2000607@plexistor.com> (raw)
In-Reply-To: <20150902190401.GC32255@linux.intel.com>

On 09/02/2015 10:04 PM, Ross Zwisler wrote:
> On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
<>
>> Apps expect all these to work:
>> 1. open mmap m-write msync ... close
>> 2. open mmap m-write fsync ... close
>> 3. open mmap m-write unmap ... fsync close
>>
>> 4. open mmap m-write sync ...
> 
> So basically you made close have an implicit fsync?  What about the flow that
> looks like this:
> 
> 5. open mmap close m-write
> 

What? no, close means ummap because you need a file* attached to your vma

And you miss-understood me, the vm_opts->close is the *unmap* operation not
the file::close() operation.

I meant memory-cl_flush on unmap before the vma goes away.

> This guy definitely needs an msync/fsync at the end to make sure that the
> m-write becomes durable.  
> 

Exactly done at unmap time.

> Also, the CLOSE(2) man page specifically says that a flush does not occur at
> close:
> 	A successful close does not guarantee that the data has been
> 	successfully  saved  to  disk,  as  the  kernel defers  writes.   It
> 	is not common for a filesystem to flush the buffers when the stream is
> 	closed.  If you need to be sure that the data is physically stored,
> 	use fsync(2).  (It will depend on the disk  hardware  at this point.)
> 
> I don't think that adding an implicit fsync to close is the right solution -
> we just need to get msync and fsync correctly working.
> 

So above is not relevant, and we are doing that. taking care of cpu-cache flushing.
This is not disk-flushing, on a long memcpy from usermode most of the data is
already durable, is only the leftover margins. Like the dax_io in the kernel
dax implies direct_io always, all we are trying is to have the least
performance hit in memory-cache-flushing.

IS nothing to do with the text above.

>> The first 3 are supported with above, because what happens is that at [3]
>> the fsync actually happens on unmap and fsync is redundant in that case.
>>
>> The only broken scenario is [3]. We do not have a list of "dax-dirty" inodes
>> per sb to iterate on and call inode-sync on. This cause problems mostly in
>> freeze because with actual [3] scenario the file will be eventually closed
>> and persistent, but after the call to sync returns.
>>
>> Its on my TODO to fix [3] based on instructions from Dave.
>> The mmap call will put the inode on the list and the dax_vm_close will
>> remove it. One of the regular dirty list should be used as suggested by
>> Dave.
> 
> I believe in the above two paragraphs you meant [4], so the 
> 
> 4. open mmap m-write sync ...
> 
> case needs to be fixed so that we can detect DAX-dirty inodes?
> 

Yes I'll be working on sync (DAX-dirty-i_list) soon but it needs a working
fsync to be in place (eg: dax_fsync(inode))

Thanks
Boaz


WARNING: multiple messages have this Message-ID (diff)
From: Boaz Harrosh <boaz@plexistor.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Dave Chinner <david@fromorbit.com>, Andrew Morton <akpm@osdl.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-nvdimm@ml01.01.org, Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Ingo Molnar <mingo@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-fsdevel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] dax, pmem: add support for msync
Date: Thu, 03 Sep 2015 09:32:02 +0300	[thread overview]
Message-ID: <55E7E962.2000607@plexistor.com> (raw)
In-Reply-To: <20150902190401.GC32255@linux.intel.com>

On 09/02/2015 10:04 PM, Ross Zwisler wrote:
> On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
<>
>> Apps expect all these to work:
>> 1. open mmap m-write msync ... close
>> 2. open mmap m-write fsync ... close
>> 3. open mmap m-write unmap ... fsync close
>>
>> 4. open mmap m-write sync ...
> 
> So basically you made close have an implicit fsync?  What about the flow that
> looks like this:
> 
> 5. open mmap close m-write
> 

What? no, close means ummap because you need a file* attached to your vma

And you miss-understood me, the vm_opts->close is the *unmap* operation not
the file::close() operation.

I meant memory-cl_flush on unmap before the vma goes away.

> This guy definitely needs an msync/fsync at the end to make sure that the
> m-write becomes durable.  
> 

Exactly done at unmap time.

> Also, the CLOSE(2) man page specifically says that a flush does not occur at
> close:
> 	A successful close does not guarantee that the data has been
> 	successfully  saved  to  disk,  as  the  kernel defers  writes.   It
> 	is not common for a filesystem to flush the buffers when the stream is
> 	closed.  If you need to be sure that the data is physically stored,
> 	use fsync(2).  (It will depend on the disk  hardware  at this point.)
> 
> I don't think that adding an implicit fsync to close is the right solution -
> we just need to get msync and fsync correctly working.
> 

So above is not relevant, and we are doing that. taking care of cpu-cache flushing.
This is not disk-flushing, on a long memcpy from usermode most of the data is
already durable, is only the leftover margins. Like the dax_io in the kernel
dax implies direct_io always, all we are trying is to have the least
performance hit in memory-cache-flushing.

IS nothing to do with the text above.

>> The first 3 are supported with above, because what happens is that at [3]
>> the fsync actually happens on unmap and fsync is redundant in that case.
>>
>> The only broken scenario is [3]. We do not have a list of "dax-dirty" inodes
>> per sb to iterate on and call inode-sync on. This cause problems mostly in
>> freeze because with actual [3] scenario the file will be eventually closed
>> and persistent, but after the call to sync returns.
>>
>> Its on my TODO to fix [3] based on instructions from Dave.
>> The mmap call will put the inode on the list and the dax_vm_close will
>> remove it. One of the regular dirty list should be used as suggested by
>> Dave.
> 
> I believe in the above two paragraphs you meant [4], so the 
> 
> 4. open mmap m-write sync ...
> 
> case needs to be fixed so that we can detect DAX-dirty inodes?
> 

Yes I'll be working on sync (DAX-dirty-i_list) soon but it needs a working
fsync to be in place (eg: dax_fsync(inode))

Thanks
Boaz


  parent reply	other threads:[~2015-09-03  6:32 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-31 18:59 [PATCH] dax, pmem: add support for msync Ross Zwisler
2015-08-31 18:59 ` Ross Zwisler
2015-08-31 18:59 ` Ross Zwisler
2015-08-31 19:06 ` Christoph Hellwig
2015-08-31 19:06   ` Christoph Hellwig
2015-08-31 19:06   ` Christoph Hellwig
2015-08-31 19:26   ` Ross Zwisler
2015-08-31 19:26     ` Ross Zwisler
2015-08-31 19:34     ` Christoph Hellwig
2015-08-31 19:34       ` Christoph Hellwig
2015-08-31 23:38 ` Dave Chinner
2015-08-31 23:38   ` Dave Chinner
2015-09-01  7:06   ` Christoph Hellwig
2015-09-01  7:06     ` Christoph Hellwig
2015-09-01 12:18     ` Boaz Harrosh
2015-09-01 12:18       ` Boaz Harrosh
2015-09-02 19:04       ` Ross Zwisler
2015-09-02 19:04         ` Ross Zwisler
2015-09-02 20:17         ` Kirill A. Shutemov
2015-09-02 20:17           ` Kirill A. Shutemov
2015-09-03  6:32         ` Boaz Harrosh [this message]
2015-09-03  6:32           ` Boaz Harrosh
2015-09-03  6:32           ` Boaz Harrosh
2015-09-03 16:44           ` Ross Zwisler
2015-09-03 16:44             ` Ross Zwisler
2015-09-01 22:21     ` Dave Chinner
2015-09-01 22:21       ` Dave Chinner
2015-09-02  3:19       ` Ross Zwisler
2015-09-02  3:19         ` Ross Zwisler
2015-09-02  5:17         ` Dave Chinner
2015-09-02  5:17           ` Dave Chinner
2015-09-02 10:27           ` Boaz Harrosh
2015-09-02 10:27             ` Boaz Harrosh
2015-09-02 14:23             ` Dave Hansen
2015-09-02 14:23               ` Dave Hansen
2015-09-02 14:23               ` Dave Hansen
2015-09-02 15:18               ` Boaz Harrosh
2015-09-02 15:18                 ` Boaz Harrosh
2015-09-02 15:39                 ` Dave Hansen
2015-09-02 15:39                   ` Dave Hansen
2015-09-02 15:39                   ` Dave Hansen
2015-09-02 16:00                   ` Boaz Harrosh
2015-09-02 16:00                     ` Boaz Harrosh
2015-09-02 16:00                     ` Boaz Harrosh
2015-09-02 16:19                     ` Dave Hansen
2015-09-02 16:19                       ` Dave Hansen
2015-09-02 16:19                       ` Dave Hansen
2015-09-03  6:41                       ` Boaz Harrosh
2015-09-03  6:41                         ` Boaz Harrosh
2015-09-02 10:04         ` Boaz Harrosh
2015-09-02 10:04           ` Boaz Harrosh
2015-09-01 10:08   ` Kirill A. Shutemov
2015-09-01 10:08     ` Kirill A. Shutemov
2015-09-01 11:27     ` Boaz Harrosh
2015-09-01 11:27       ` Boaz Harrosh
2015-09-01 22:49     ` Dave Chinner
2015-09-01 22:49       ` Dave Chinner
2015-09-02  9:13       ` Kirill A. Shutemov
2015-09-02  9:13         ` Kirill A. Shutemov
2015-09-02  9:37         ` Boaz Harrosh
2015-09-02  9:37           ` Boaz Harrosh
2015-09-02  9:37           ` Boaz Harrosh
2015-09-02  9:41           ` Boaz Harrosh
2015-09-02  9:41             ` Boaz Harrosh
2015-09-02  9:41             ` Boaz Harrosh
2015-09-02  9:47             ` Kirill A. Shutemov
2015-09-02  9:47               ` Kirill A. Shutemov
2015-09-02 10:28               ` Boaz Harrosh
2015-09-02 10:28                 ` Boaz Harrosh
2015-09-03  0:57         ` Dave Chinner
2015-09-03  0:57           ` Dave Chinner
2015-09-01 13:12 ` Boaz Harrosh
2015-09-01 13:12   ` Boaz Harrosh
2015-09-02 17:47   ` Ross Zwisler
2015-09-02 17:47     ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55E7E962.2000607@plexistor.com \
    --to=boaz@plexistor.com \
    --cc=akpm@osdl.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.