From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-nvdimm@lists.01.org, linux-mm@kvack.org,
Johannes Weiner <hannes@cmpxchg.org>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 3/6] dax: Avoid page invalidation races and unnecessary radix tree traversals
Date: Tue, 29 Nov 2016 15:31:38 -0700
Message-ID: <20161129223138.GB16608@linux.intel.com>
In-Reply-To: <1479980796-26161-4-git-send-email-jack@suse.cz>
On Thu, Nov 24, 2016 at 10:46:33AM +0100, Jan Kara wrote:
> Currently each filesystem (possibly through generic_file_direct_write()
> or iomap_dax_rw()) takes care of invalidating page tables and evicting
Just some nits about the commit message: the DAX I/O path function is now
called dax_iomap_rw(), and no filesystem uses generic_file_direct_write()
for DAX anymore, so you can probably drop it from the changelog. Up to you.
Aside from that:
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> hole pages from the radix tree when write(2) to the file happens. This
> invalidation is only necessary when there is some block allocation
> resulting from write(2). Furthermore in current place the invalidation
> is racy wrt page fault instantiating a hole page just after we have
> invalidated it.
>
> So perform the page invalidation inside dax_do_io() where we can do it
> only when really necessary and after blocks have been allocated so
> nobody will be instantiating new hole pages anymore.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/dax.c | 28 +++++++++++-----------------
> 1 file changed, 11 insertions(+), 17 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 4534f0e232e9..ddf77ef2ca18 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -984,6 +984,17 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
>  		return -EIO;
>
> +	/*
> +	 * Write can allocate block for an area which has a hole page mapped
> +	 * into page tables. We have to tear down these mappings so that data
> +	 * written by write(2) is visible in mmap.
> +	 */
> +	if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
> +		invalidate_inode_pages2_range(inode->i_mapping,
> +					      pos >> PAGE_SHIFT,
> +					      (end - 1) >> PAGE_SHIFT);
> +	}
> +
>  	while (pos < end) {
>  		unsigned offset = pos & (PAGE_SIZE - 1);
>  		struct blk_dax_ctl dax = { 0 };
> @@ -1042,23 +1053,6 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	if (iov_iter_rw(iter) == WRITE)
>  		flags |= IOMAP_WRITE;
>
> -	/*
> -	 * Yes, even DAX files can have page cache attached to them: A zeroed
> -	 * page is inserted into the pagecache when we have to serve a write
> -	 * fault on a hole. It should never be dirtied and can simply be
> -	 * dropped from the pagecache once we get real data for the page.
> -	 *
> -	 * XXX: This is racy against mmap, and there's nothing we can do about
> -	 * it. We'll eventually need to shift this down even further so that
> -	 * we can check if we allocated blocks over a hole first.
> -	 */
> -	if (mapping->nrpages) {
> -		ret = invalidate_inode_pages2_range(mapping,
> -				pos >> PAGE_SHIFT,
> -				(pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT);
> -		WARN_ON_ONCE(ret);
> -	}
> -
>  	while (iov_iter_count(iter)) {
>  		ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
>  				iter, dax_iomap_actor);
> --
> 2.6.6
>
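
For readers following the hunk in dax_iomap_actor(): the new check boils down
to "invalidate only if this write allocated blocks and the mapping has any
pages at all", over an inclusive page index range derived from the byte range
[pos, end). The following is a minimal userspace sketch of that arithmetic,
not kernel code. The struct, the helper name, and SKETCH_PAGE_SHIFT (assumed
to be 12, i.e. 4 KiB pages) are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT is per-architecture. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-ins for the kernel state the patch inspects. */
struct sketch_write {
	long long pos;         /* first byte of the write */
	long long end;         /* one past the last byte of the write */
	bool blocks_new;       /* stands in for iomap->flags & IOMAP_F_NEW */
	unsigned long nrpages; /* stands in for inode->i_mapping->nrpages */
};

/*
 * Returns true and fills the inclusive page index range [*first, *last]
 * when the patch's condition for invalidation holds, false otherwise.
 */
static bool sketch_invalidate_range(const struct sketch_write *w,
				    unsigned long *first, unsigned long *last)
{
	/* No new blocks or no pages attached: nothing to invalidate. */
	if (!w->blocks_new || w->nrpages == 0)
		return false;

	*first = (unsigned long)(w->pos >> SKETCH_PAGE_SHIFT);
	/* end is exclusive, so the last touched byte is end - 1. */
	*last = (unsigned long)((w->end - 1) >> SKETCH_PAGE_SHIFT);
	return true;
}
```

Under these assumptions, a write covering bytes 4000 through 8199 with newly
allocated blocks maps to page indices 0 through 2, while the same write over
already-allocated blocks skips the invalidation entirely.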