From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Theodore Ts'o <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Andrew Morton <akpm@linux-foundation.org>,
Dave Chinner <david@fromorbit.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
Jeff Layton <jlayton@poochiereds.net>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
Matthew Wilcox <willy@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-ext4 <linux-ext4@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
X86 ML <x86@kernel.org>, XFS Developers <xfs@oss.sgi.com>
Subject: Re: [PATCH v7 2/9] dax: fix conversion of holes to PMDs
Date: Thu, 7 Jan 2016 15:34:55 -0700
Message-ID: <20160107223455.GC20802@linux.intel.com>
In-Reply-To: <CAPcyv4ig1W8LpC6ORYCZd65idK3QuOYa40FsbujWXXaZT_WMRA@mail.gmail.com>
On Wed, Jan 06, 2016 at 11:04:35AM -0800, Dan Williams wrote:
> On Wed, Jan 6, 2016 at 10:00 AM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
> > When we get a DAX PMD fault for a write it is possible that there could be
> > some number of 4k zero pages already present for the same range that were
> > inserted to service reads from a hole. These 4k zero pages need to be
> > unmapped from the VMAs and removed from the struct address_space radix tree
> > before the real DAX PMD entry can be inserted.
> >
> > For PTE faults this same use case also exists and is handled by a
> > combination of unmap_mapping_range() to unmap the VMAs and
> > delete_from_page_cache() to remove the page from the address_space radix
> > tree.
> >
> > For PMD faults we do have a call to unmap_mapping_range() (protected by a
> > buffer_new() check), but nothing clears out the radix tree entry. The
> > buffer_new() check is also incorrect as the current ext4 and XFS filesystem
> > code will never return a buffer_head with BH_New set, even when allocating
> > new blocks over a hole. Instead the filesystem will zero the blocks
> > manually and return a buffer_head with only BH_Mapped set.
> >
> > Fix this situation by removing the buffer_new() check and adding a call to
> > truncate_inode_pages_range() to clear out the radix tree entries before we
> > insert the DAX PMD.
> >
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
>
> Replaced the current contents of v6 in -mm from next-20160106 with
> this v7 set and it looks good.
>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Tested-by: Dan Williams <dan.j.williams@intel.com>
>
> One question below...
>
> > ---
> > fs/dax.c | 20 ++++++++++----------
> > 1 file changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 03cc4a3..9dc0c97 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -594,6 +594,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > bool write = flags & FAULT_FLAG_WRITE;
> > struct block_device *bdev;
> > pgoff_t size, pgoff;
> > + loff_t lstart, lend;
> > sector_t block;
> > int result = 0;
> >
> > @@ -647,15 +648,13 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > goto fallback;
> > }
> >
> > - /*
> > - * If we allocated new storage, make sure no process has any
> > - * zero pages covering this hole
> > - */
> > - if (buffer_new(&bh)) {
> > - i_mmap_unlock_read(mapping);
> > - unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
> > - i_mmap_lock_read(mapping);
> > - }
> > + /* make sure no process has any zero pages covering this hole */
> > + lstart = pgoff << PAGE_SHIFT;
> > + lend = lstart + PMD_SIZE - 1; /* inclusive */
> > + i_mmap_unlock_read(mapping);
> > + unmap_mapping_range(mapping, lstart, PMD_SIZE, 0);
> > + truncate_inode_pages_range(mapping, lstart, lend);
>
> Do we need to do both unmap and truncate given that
> truncate_inode_page() optionally does an unmap_mapping_range()
> internally?
Ah, indeed it does. Sure, having just the call to truncate_inode_page() seems
cleaner. I'll re-test and send this out in v8.
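
For reference, the inclusive range arithmetic from the hunk above can be sketched in userspace C. This is only an illustration, not the fs/dax.c code: the SKETCH_* constants assume x86-64 with 4k pages and 2M PMDs, and struct byte_range, pmd_byte_range() and pages_in_range() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages and 2 MiB PMDs, as on x86-64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SIZE   (1ULL << 21)
#define SKETCH_PAGES_PER_PMD (SKETCH_PMD_SIZE >> SKETCH_PAGE_SHIFT)

/* Hypothetical container for the two offsets computed in the patch. */
struct byte_range {
	int64_t lstart; /* first byte of the range, inclusive */
	int64_t lend;   /* last byte of the range, inclusive */
};

/* Convert a PMD-aligned page offset into the byte range it covers. */
static struct byte_range pmd_byte_range(uint64_t pgoff)
{
	struct byte_range r;

	/* The PMD fault path works on a PMD-aligned page offset. */
	assert((pgoff & (SKETCH_PAGES_PER_PMD - 1)) == 0);

	r.lstart = (int64_t)(pgoff << SKETCH_PAGE_SHIFT);
	/* "- 1" because the end offset is inclusive. */
	r.lend = r.lstart + (int64_t)SKETCH_PMD_SIZE - 1;
	return r;
}

/* Number of 4 KiB pages covered by an inclusive byte range. */
static uint64_t pages_in_range(struct byte_range r)
{
	return (uint64_t)(r.lend - r.lstart + 1) >> SKETCH_PAGE_SHIFT;
}
```

With these assumed constants, a page offset of 512 maps to bytes 2097152 through 4194303, i.e. exactly 512 small pages, which is the span whose zero pages have to be cleared before the PMD entry goes in.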