From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Boaz Harrosh <boaz@plexistor.com>, Christoph Hellwig <hch@lst.de>,
Dave Chinner <david@fromorbit.com>, Andrew Morton <akpm@osdl.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
linux-nvdimm@lists.01.org, Peter Zijlstra <peterz@infradead.org>,
x86@kernel.org, Hugh Dickins <hughd@google.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Ingo Molnar <mingo@redhat.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
"H. Peter Anvin" <hpa@zytor.com>,
linux-fsdevel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] dax, pmem: add support for msync
Date: Wed, 2 Sep 2015 23:17:38 +0300
Message-ID: <20150902201738.GA5775@node.dhcp.inet.fi>
In-Reply-To: <20150902190401.GC32255@linux.intel.com>
On Wed, Sep 02, 2015 at 01:04:01PM -0600, Ross Zwisler wrote:
> On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
> > So the approach we took was a bit different to exactly solve these
> > problem, and to also not over flush too much. here is what we did.
> >
> > * At vm_operations_struct we also override the .close vector (say call it dax_vm_close)
> >
> > * At dax_vm_close() on writable files call ->fsync(,vma->vm_start, vma->vm_end,)
> > (We have an inode flag if the file was actually dirtied, but even if not, that will
> > not be that bad, so a file was opened for write, mmapped, but actually never
> > modified. Not a lot of these, and the do nothing cl_flushing is very fast)
> >
> > * At ->fsync() do the actual cl_flush for all cases but only iff
> > if (mapping_mapped(inode->i_mapping) == 0)
> > return 0;
> >
> > This is because data written not through mmap is already persistent and we
> > do not need the cl_flushing
> >
> > Apps expect all these to work:
> > 1. open mmap m-write msync ... close
> > 2. open mmap m-write fsync ... close
> > 3. open mmap m-write unmap ... fsync close
> >
> > 4. open mmap m-write sync ...
>
> So basically you made close have an implicit fsync? What about the flow that
> looks like this:
>
> 5. open mmap close m-write
>
> This guy definitely needs an msync/fsync at the end to make sure that the
> m-write becomes durable.
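
For reference, flow 5 as seen from userspace could look roughly like the sketch below. It is only an illustration, not anything from the patch: the function name flow5() and the temp-file template are made up, and it runs against an ordinary file rather than a DAX mapping. The point is that the mapping outlives close(), so the write needs an explicit msync() afterwards.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Flow 5: open, mmap, close, m-write, then an explicit msync.
 * flow5() and the path template are hypothetical names for this sketch.
 * path_template must end in "XXXXXX"; mkstemp() fills it in.
 */
int flow5(char *path_template)
{
	const size_t len = 4096;
	char *p;
	int fd, ret;

	fd = mkstemp(path_template);		/* open */
	if (fd < 0)
		return -1;
	if (ftruncate(fd, len) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);				/* close: the mapping stays valid */
	if (p == MAP_FAILED)
		return -1;

	memcpy(p, "hello", 5);			/* m-write, after close */

	ret = msync(p, len, MS_SYNC);		/* the sync that flow 5 still needs */
	munmap(p, len);
	return ret;
}
```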

We can sync on pte_dirty() during zap_page_range(): it's practically free,
since we do a page walk anyway.

With this approach it probably makes sense to go back to a page walk on
the msync() side too, to be consistent wrt the meaning of pte_dirty().

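
To make the idea concrete, here is a userspace toy model, not kernel code. Everything in it (struct toy_pte, toy_zap_range(), the "flushed" flag standing in for a cache flush) is invented for illustration; the real zap_page_range() path looks nothing like this. It only shows why the flush is nearly free: the teardown loop visits every entry anyway, so the dirty check is one extra test per entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PTES 8

/* Toy stand-in for a page table: one entry per page of the mapping. */
struct toy_pte {
	bool present;
	bool dirty;
	bool flushed;	/* set when the toy "cache flush" ran for this page */
};

/*
 * Hypothetical teardown walk over [start, end). Dirty entries are
 * "flushed" on the way out; every present entry is then unmapped.
 * Returns the number of pages flushed.
 */
size_t toy_zap_range(struct toy_pte *ptes, size_t start, size_t end)
{
	size_t flushed = 0;

	for (size_t i = start; i < end; i++) {
		if (!ptes[i].present)
			continue;
		if (ptes[i].dirty) {
			/* stand-in for flushing the page's cache lines */
			ptes[i].flushed = true;
			ptes[i].dirty = false;
			flushed++;
		}
		ptes[i].present = false;	/* the unmap itself */
	}
	return flushed;
}
```

An msync() built on the same walk would do the dirty check and flush but leave the entries present, which is what keeps the meaning of the dirty bit consistent between the two paths.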
> Also, the CLOSE(2) man page specifically says that a flush does not occur at
> close:
> A successful close does not guarantee that the data has been
> successfully saved to disk, as the kernel defers writes. It
> is not common for a filesystem to flush the buffers when the stream is
> closed. If you need to be sure that the data is physically stored,
> use fsync(2). (It will depend on the disk hardware at this point.)
>
> I don't think that adding an implicit fsync to close is the right solution -
> we just need to get msync and fsync correctly working.

It doesn't mean we can't sync if we can do so without noticeable
performance degradation.

--
Kirill A. Shutemov