From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Theodore Ts'o <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Andrew Morton <akpm@linux-foundation.org>,
Dan Williams <dan.j.williams@intel.com>,
Dave Chinner <david@fromorbit.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
Jeff Layton <jlayton@poochiereds.net>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
Matthew Wilcox <willy@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-nvdimm@lists.01.org, x86@kernel.org,
xfs@oss.sgi.com
Subject: Re: [PATCH v8 4/9] dax: support dirty DAX entries in radix tree
Date: Fri, 15 Jan 2016 14:22:49 +0100
Message-ID: <20160115132249.GL15950@quack.suse.cz>
In-Reply-To: <20160113184832.GA5904@linux.intel.com>

On Wed 13-01-16 11:48:32, Ross Zwisler wrote:
> On Wed, Jan 13, 2016 at 10:44:11AM +0100, Jan Kara wrote:
> > On Thu 07-01-16 22:27:54, Ross Zwisler wrote:
> > > Add support for tracking dirty DAX entries in the struct address_space
> > > radix tree. This tree is already used for dirty page writeback, and it
> > > already supports the use of exceptional (non struct page*) entries.
> > >
> > > In order to properly track dirty DAX pages we will insert new exceptional
> > > entries into the radix tree that represent dirty DAX PTE or PMD pages.
> > > These exceptional entries will also contain the writeback sectors for the
> > > PTE or PMD faults that we can use at fsync/msync time.
> > >
> > > There are currently two types of exceptional entries (shmem and shadow)
> > > that can be placed into the radix tree, and this adds a third. We rely on
> > > the fact that only one type of exceptional entry can be found in a given
> > > radix tree based on its usage. This happens for free with DAX vs shmem but
> > > we explicitly prevent shadow entries from being added to radix trees for
> > > DAX mappings.
> > >
> > > The only shadow entries that would be generated for DAX radix trees would
> > > be to track zero page mappings that were created for holes. These pages
> > > would receive minimal benefit from having shadow entries, and the choice
> > > to have only one type of exceptional entry in a given radix tree makes the
> > > logic simpler both in clear_exceptional_entry() and in the rest of DAX.
> > >
> > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> >
> > I have realized there's one issue with this code. See below:
> >
> > > @@ -34,31 +35,39 @@ static void clear_exceptional_entry(struct address_space *mapping,
> > > >  		return;
> > > >
> > > >  	spin_lock_irq(&mapping->tree_lock);
> > > > -	/*
> > > > -	 * Regular page slots are stabilized by the page lock even
> > > > -	 * without the tree itself locked. These unlocked entries
> > > > -	 * need verification under the tree lock.
> > > > -	 */
> > > > -	if (!__radix_tree_lookup(&mapping->page_tree, index, &node, &slot))
> > > > -		goto unlock;
> > > > -	if (*slot != entry)
> > > > -		goto unlock;
> > > > -	radix_tree_replace_slot(slot, NULL);
> > > > -	mapping->nrshadows--;
> > > > -	if (!node)
> > > > -		goto unlock;
> > > > -	workingset_node_shadows_dec(node);
> > > > -	/*
> > > > -	 * Don't track node without shadow entries.
> > > > -	 *
> > > > -	 * Avoid acquiring the list_lru lock if already untracked.
> > > > -	 * The list_empty() test is safe as node->private_list is
> > > > -	 * protected by mapping->tree_lock.
> > > > -	 */
> > > > -	if (!workingset_node_shadows(node) &&
> > > > -	    !list_empty(&node->private_list))
> > > > -		list_lru_del(&workingset_shadow_nodes, &node->private_list);
> > > > -	__radix_tree_delete_node(&mapping->page_tree, node);
> > > > +
> > > > +	if (dax_mapping(mapping)) {
> > > > +		if (radix_tree_delete_item(&mapping->page_tree, index, entry))
> > > > +			mapping->nrexceptional--;
> >
> > So when you punch hole in a file, you can delete a PMD entry from a radix
> > tree which covers part of the file which still stays. So in this case you
> > have to split the PMD entry into PTE entries (probably that needs to happen
> > up in truncate_inode_pages_range()) or something similar...
>
> I think (and will verify) that the DAX code just unmaps the entire PMD range
> when we receive a hole punch request inside of the PMD. If this is true then
> I think the radix tree code should behave the same way and just remove the PMD
> entry in the radix tree.

But you cannot just remove it if it is dirty... You have to keep the
information that part of the PMD range is still dirty somewhere (or write
that range out before removing the radix tree entry).

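To make the concern concrete, here is a rough userspace sketch of the
arithmetic only. It is not kernel code: the helper name
pmd_pages_surviving() is hypothetical, and the 4 KiB page / 2 MiB PMD sizes
are an assumption (x86-64). It counts how many pages of a PMD-sized range
lie outside a punched hole and therefore would still need dirty tracking
(or a writeback) if the single PMD entry were dropped from the radix tree.

```c
#define PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define PMD_SHIFT	21	/* assumption: 2 MiB PMD entries */
#define PTRS_PER_PMD	(1UL << (PMD_SHIFT - PAGE_SHIFT))	/* 512 pages */

/*
 * Hypothetical helper, for illustration only.
 *
 * pmd_index is the page index at which the PMD entry starts; the hole
 * covers page indices [hole_start, hole_end). Returns the number of pages
 * inside the PMD range that the hole punch leaves in place.
 */
static unsigned long pmd_pages_surviving(unsigned long pmd_index,
					 unsigned long hole_start,
					 unsigned long hole_end)
{
	unsigned long pmd_end = pmd_index + PTRS_PER_PMD;
	/* Clamp the hole to the PMD range. */
	unsigned long lo = hole_start > pmd_index ? hole_start : pmd_index;
	unsigned long hi = hole_end < pmd_end ? hole_end : pmd_end;
	/* An empty intersection means nothing in this PMD was punched. */
	unsigned long punched = hi > lo ? hi - lo : 0;

	return PTRS_PER_PMD - punched;
}
```

For a PMD entry starting at page index 512 and a hole over pages 600..699,
this returns 412: those pages stay in the file, and if the entry was dirty
their state must not be lost when the entry is deleted.
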
> This will cause new accesses that used to land in the PMD range to get new
> page faults. These faults will call get_blocks(), where presumably the
> filesystem will tell us that we don't have a contiguous 2MiB range anymore, so
> we will fall back to PTE faults. These PTEs will fill in both the radix tree
> and the page tables.
>
> So, I think the work here is to verify the behavior of DAX wrt hole punches
> for PMD ranges, and make the radix tree code match that behavior. Sound good?

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR