From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
linux-mm@kvack.org, Andreas Dilger <adilger.kernel@dilger.ca>,
"H. Peter Anvin" <hpa@zytor.com>,
Jeff Layton <jlayton@poochiereds.net>,
Dan Williams <dan.j.williams@intel.com>,
linux-nvdimm@lists.01.org, x86@kernel.org,
Ingo Molnar <mingo@redhat.com>,
Matthew Wilcox <willy@linux.intel.com>,
linux-ext4@vger.kernel.org, xfs@oss.sgi.com,
Alexander Viro <viro@zeniv.linux.org.uk>,
Thomas Gleixner <tglx@linutronix.de>,
Theodore Ts'o <tytso@mit.edu>,
linux-kernel@vger.kernel.org, Jan Kara <jack@suse.com>,
linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <matthew.r.wilcox@intel.com>
Subject: Re: [PATCH v8 2/9] dax: fix conversion of holes to PMDs
Date: Tue, 12 Jan 2016 10:44:51 +0100
Message-ID: <20160112094451.GS6262@quack.suse.cz>
In-Reply-To: <1452230879-18117-3-git-send-email-ross.zwisler@linux.intel.com>
On Thu 07-01-16 22:27:52, Ross Zwisler wrote:
> When we get a DAX PMD fault for a write it is possible that there could be
> some number of 4k zero pages already present for the same range that were
> inserted to service reads from a hole. These 4k zero pages need to be
> unmapped from the VMAs and removed from the struct address_space radix tree
> before the real DAX PMD entry can be inserted.
>
> For PTE faults this same use case also exists and is handled by a
> combination of unmap_mapping_range() to unmap the VMAs and
> delete_from_page_cache() to remove the page from the address_space radix
> tree.
>
> For PMD faults we do have a call to unmap_mapping_range() (protected by a
> buffer_new() check), but nothing clears out the radix tree entry. The
> buffer_new() check is also incorrect as the current ext4 and XFS filesystem
> code will never return a buffer_head with BH_New set, even when allocating
> new blocks over a hole. Instead the filesystem will zero the blocks
> manually and return a buffer_head with only BH_Mapped set.
>
> Fix this situation by removing the buffer_new() check and adding a call to
> truncate_inode_pages_range() to clear out the radix tree entries before we
> insert the DAX PMD.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Tested-by: Dan Williams <dan.j.williams@intel.com>
Just two nits below. Nothing serious so you can add:
Reviewed-by: Jan Kara <jack@suse.cz>
> ---
> fs/dax.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 513bba5..5b84a46 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -589,6 +589,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> bool write = flags & FAULT_FLAG_WRITE;
> struct block_device *bdev;
> pgoff_t size, pgoff;
> + loff_t lstart, lend;
> sector_t block;
> int result = 0;
>
> @@ -643,15 +644,13 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> goto fallback;
> }
>
> - /*
> - * If we allocated new storage, make sure no process has any
> - * zero pages covering this hole
> - */
> - if (buffer_new(&bh)) {
> - i_mmap_unlock_read(mapping);
> - unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
> - i_mmap_lock_read(mapping);
> - }
> + /* make sure no process has any zero pages covering this hole */
> + lstart = pgoff << PAGE_SHIFT;
> + lend = lstart + PMD_SIZE - 1; /* inclusive */
> + i_mmap_unlock_read(mapping);
Just a nit, but is there a reason why we grab i_mmap_lock_read(mapping) only
to release it a few lines below? The bh checks inside the locked region
don't seem to rely on i_mmap_lock...
> + unmap_mapping_range(mapping, lstart, PMD_SIZE, 0);
> + truncate_inode_pages_range(mapping, lstart, lend);
These two calls can be shortened to:
truncate_pagecache_range(inode, lstart, lend);
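
The two quoted calls describe the same bytes in two different forms: unmap_mapping_range() takes a start and a length (PMD_SIZE), while truncate_inode_pages_range() takes a start and an inclusive end (lend). A minimal userspace sketch of that arithmetic follows. It is not kernel code: the constants assume x86-64 with 4k pages and 2 MiB PMDs, and struct byte_range, pmd_range() and range_len() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values; other architectures and configs differ. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_SIZE   (1ULL << PMD_SHIFT)

static_assert(PMD_SHIFT > PAGE_SHIFT, "a PMD must span more than one page");

/* Hypothetical type: a byte range with an inclusive end. */
struct byte_range {
	int64_t start;	/* first byte, plays the role of lstart */
	int64_t end;	/* last byte (inclusive), plays the role of lend */
};

/* Byte range of the PMD-sized region that begins at page offset pgoff. */
static inline struct byte_range pmd_range(uint64_t pgoff)
{
	struct byte_range r;

	r.start = (int64_t)(pgoff << PAGE_SHIFT);
	r.end = r.start + (int64_t)PMD_SIZE - 1;	/* inclusive */
	return r;
}

/* Length of an inclusive range: the start/length form of the same bytes. */
static inline int64_t range_len(struct byte_range r)
{
	return r.end - r.start + 1;
}
```

For any pgoff, range_len(pmd_range(pgoff)) comes out as PMD_SIZE, so a single call that takes (lstart, lend) carries enough information to cover both the unmap and the truncate.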
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR