From: Matthew Wilcox <willy@infradead.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Matthew Wilcox <mawilcox@microsoft.com>, Jan Kara <jack@suse.cz>,
Jeff Layton <jlayton@redhat.com>,
Lukas Czerner <lczerner@redhat.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Christoph Hellwig <hch@lst.de>,
Goldwyn Rodrigues <rgoldwyn@suse.com>,
Nicholas Piggin <npiggin@gmail.com>,
Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>,
linux-nilfs@vger.kernel.org, Jaegeuk Kim <jaegeuk@kernel.org>,
Chao Yu <yuchao0@huawei.com>,
linux-f2fs-devel@lists.sourceforge.net,
Oleg Drokin <oleg.drokin@intel.com>,
Andreas Dilger <andreas.dilger@intel.com>,
James Simmons <jsimmons@infradead.org>
Subject: Re: [PATCH v10 43/62] memfd: Convert shmem_tag_pins to XArray
Date: Tue, 3 Apr 2018 13:51:17 -0700
Message-ID: <20180403205117.GA30145@bombadil.infradead.org>
In-Reply-To: <20180331021111.GB13332@bombadil.infradead.org>

On Fri, Mar 30, 2018 at 07:11:11PM -0700, Matthew Wilcox wrote:
> On Fri, Mar 30, 2018 at 05:05:05PM -0700, Mike Kravetz wrote:
> > On 03/29/2018 08:42 PM, Matthew Wilcox wrote:
> > > Simplify the locking by taking the spinlock while we walk the tree on
> > > the assumption that many acquires and releases of the lock will be
> > > worse than holding the lock for a (potentially) long time.
> >
> > I see this change made in several of the patches and do not have a
> > specific issue with it. As part of the XArray implementation you
> > have XA_CHECK_SCHED = 4096. So, we drop locks and do a cond_resched
> > after XA_CHECK_SCHED iterations. Just curious how you came up with
> > that number. Apologies in advance if this was discussed in a previous
> > round of reviews.
>
> It comes from two places, the current implementations of
> tag_pages_for_writeback() and find_swap_entry(). I have no idea if it's
> the optimal number for anybody, but it's the only number that anyone
> was using. I'll have no problem if somebody suggests we tweak the number
> in the future.
I thought about this some more. One of the principles behind the xarray
rewrite was to make it easier to use than the radix tree. Even this
interface succeeds at that; compare:
- radix_tree_for_each_slot(slot, root, &iter, 0) {
- if (*slot == item) {
- found = iter.index;
+ xas_for_each(&xas, entry, ULONG_MAX) {
+ if (xas_retry(&xas, entry))
+ continue;
+ if (entry == item)
break;
- }
checked++;
- if ((checked % 4096) != 0)
+ if ((checked % XA_CHECK_SCHED) != 0)
continue;
- slot = radix_tree_iter_resume(slot, &iter);
+ xas_pause(&xas);
cond_resched_rcu();
}
But it's not *great*. It doesn't succeed in capturing all the necessary
state in the xa_state so that the user doesn't have to worry about this.
It's subtle and relatively easy to get wrong. And, as you say, why this
magic number?
I came up with this, which does capture everything necessary in the
xa_state -- just one bit which tells the xa_state whether we're in an
iteration with interrupts disabled. If that bit is set, iterations
pause at the end of every second-level group of the tree. That's every
4096 _indices_ (as opposed to every 4096 _processed entries_), but those
should be close to each other in dense trees.
I don't love the name of the function (xas_long_iter_irq()), but I'm
expecting somebody else will have a better name. I did notice that
shmem_tag_pins() was only called from shmem_wait_for_pins(), so it
doesn't need its own XA_STATE; we can just pass in the xa_state pointer.
I rather like the outcome in the callers; the loops are reduced to doing
the thing that the loops are supposed to do instead of having this silly
tail on them that handles bookkeeping.
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index eac04922eba2..7ac25c402e19 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -842,6 +842,31 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
#endif
}
+#define XAS_FLAG_PAUSE (1 << 7)
+
+/**
+ * xas_long_iter_irq() - Mark XArray operation state as being used for a long
+ * iteration.
+ * @xas: XArray operation state.
+ *
+ * If you are about to start an iteration over a potentially large array
+ * with the xa_lock held and interrupts off, call this function to have
+ * the iteration pause itself at opportune moments and check whether other
+ * threads need to run.
+ *
+ * If your iteration uses the rcu lock or the xa_lock without interrupts
+ * disabled, you can use cond_resched_rcu() or cond_resched_lock() at the
+ * end of each loop. There is no cond_resched_lock_irq() (because most
+ * of the ways which notify a thread that a higher-priority thread wants
+ * to run depend on interrupts being enabled).
+ *
+ * Context: Process context.
+ */
+static inline void xas_long_iter_irq(struct xa_state *xas)
+{
+ xas->xa_flags |= XAS_FLAG_PAUSE;
+}
+
/**
* xas_set_update() - Set up XArray operation state for a callback.
* @xas: XArray operation state.
diff --git a/lib/xarray.c b/lib/xarray.c
index 653ab0555673..9328e7b7ac85 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -948,6 +948,18 @@ void *__xas_next(struct xa_state *xas)
}
EXPORT_SYMBOL_GPL(__xas_next);
+static
+void *xas_find_restart(struct xa_state *xas, unsigned long max, xa_tag_t tag)
+{
+ xas_unlock_irq(xas);
+ cond_resched();
+ xas_reset(xas);
+ xas_lock_irq(xas);
+ if (tag == XA_PRESENT)
+ return xas_find(xas, max);
+ return xas_find_tag(xas, max, tag);
+}
+
/**
* xas_find() - Find the next present entry in the XArray.
* @xas: XArray operation state.
@@ -984,6 +996,9 @@ void *xas_find(struct xa_state *xas, unsigned long max)
while (xas->xa_node && (xas->xa_index <= max)) {
if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
+ if ((xas->xa_flags & XAS_FLAG_PAUSE) &&
+ xas->xa_node->shift)
+ return xas_find_restart(xas, max, XA_PRESENT);
xas->xa_offset = xas->xa_node->offset + 1;
xas->xa_node = xa_parent(xas->xa, xas->xa_node);
continue;
@@ -1056,6 +1071,9 @@ void *xas_find_tag(struct xa_state *xas, unsigned long max, xa_tag_t tag)
while (xas->xa_index <= max) {
if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
+ if ((xas->xa_flags & XAS_FLAG_PAUSE) &&
+ xas->xa_node->shift)
+ return xas_find_restart(xas, max, tag);
xas->xa_offset = xas->xa_node->offset + 1;
xas->xa_node = xa_parent(xas->xa, xas->xa_node);
if (!xas->xa_node)
diff --git a/mm/memfd.c b/mm/memfd.c
index 0e0835e63af2..7eb703651a73 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -27,30 +27,20 @@
#define SHMEM_TAG_PINNED PAGECACHE_TAG_TOWRITE
#define LAST_SCAN 4 /* about 150ms max */
-static void shmem_tag_pins(struct address_space *mapping)
+static void shmem_tag_pins(struct xa_state *xas)
{
- XA_STATE(xas, &mapping->i_pages, 0);
struct page *page;
- unsigned int tagged = 0;
lru_add_drain();
- xas_lock_irq(&xas);
- xas_for_each(&xas, page, ULONG_MAX) {
+ xas_lock_irq(xas);
+ xas_for_each(xas, page, ULONG_MAX) {
if (xa_is_value(page))
continue;
if (page_count(page) - page_mapcount(page) > 1)
- xas_set_tag(&xas, SHMEM_TAG_PINNED);
-
- if (++tagged % XA_CHECK_SCHED)
- continue;
-
- xas_pause(&xas);
- xas_unlock_irq(&xas);
- cond_resched();
- xas_lock_irq(&xas);
+ xas_set_tag(xas, SHMEM_TAG_PINNED);
}
- xas_unlock_irq(&xas);
+ xas_unlock_irq(xas);
}
/*
@@ -68,12 +58,11 @@ static int shmem_wait_for_pins(struct address_space *mapping)
struct page *page;
int error, scan;
- shmem_tag_pins(mapping);
+ xas_long_iter_irq(&xas);
+ shmem_tag_pins(&xas);
error = 0;
for (scan = 0; scan <= LAST_SCAN; scan++) {
- unsigned int tagged = 0;
-
if (!xas_tagged(&xas, SHMEM_TAG_PINNED))
break;
@@ -101,13 +90,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
}
if (clear)
xas_clear_tag(&xas, SHMEM_TAG_PINNED);
- if (++tagged % XA_CHECK_SCHED)
- continue;
-
- xas_pause(&xas);
- xas_unlock_irq(&xas);
- cond_resched();
- xas_lock_irq(&xas);
}
xas_unlock_irq(&xas);
}
diff --git a/tools/testing/radix-tree/linux.c b/tools/testing/radix-tree/linux.c
index 44a0d1ad4408..22421ad63559 100644
--- a/tools/testing/radix-tree/linux.c
+++ b/tools/testing/radix-tree/linux.c
@@ -16,6 +16,12 @@ int nr_allocated;
int preempt_count;
int kmalloc_verbose;
int test_verbose;
+int resched_count;
+
+void cond_resched(void)
+{
+ resched_count++;
+}
struct kmem_cache {
pthread_mutex_t lock;
diff --git a/tools/testing/radix-tree/linux/kernel.h b/tools/testing/radix-tree/linux/kernel.h
index 9fa1828dde5e..57e16a2554f6 100644
--- a/tools/testing/radix-tree/linux/kernel.h
+++ b/tools/testing/radix-tree/linux/kernel.h
@@ -25,4 +25,6 @@
#define local_irq_disable() rcu_read_lock()
#define local_irq_enable() rcu_read_unlock()
+extern void cond_resched(void);
+
#endif /* _KERNEL_H */
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index f97cacd1422d..730d91849d8f 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -54,6 +54,7 @@ void tree_verify_min_height(struct radix_tree_root *root, int maxindex);
void verify_tag_consistency(struct radix_tree_root *root, unsigned int tag);
extern int nr_allocated;
+extern int resched_count;
/* Normally private parts of lib/radix-tree.c */
struct radix_tree_node *entry_to_node(void *ptr);
diff --git a/tools/testing/radix-tree/xarray-test.c b/tools/testing/radix-tree/xarray-test.c
index f8909eb09cbc..3a70ad1734cf 100644
--- a/tools/testing/radix-tree/xarray-test.c
+++ b/tools/testing/radix-tree/xarray-test.c
@@ -183,6 +183,48 @@ void check_xas_pause(struct xarray *xa)
assert(seen == 2);
}
+void check_xas_set_pause(struct xarray *xa)
+{
+ XA_STATE(xas, xa, 0);
+ void *entry;
+ unsigned i = 0;
+ int count = resched_count;
+
+ BUG_ON(!xa_empty(xa));
+
+ xa_store(xa, 4095, xa_mk_value(4095), GFP_KERNEL);
+ xa_store(xa, 4096, xa_mk_value(4096), GFP_KERNEL);
+
+ xas_lock_irq(&xas);
+ xas_for_each(&xas, entry, ULONG_MAX) {
+ if (i == 0)
+ BUG_ON(entry != xa_mk_value(4095));
+ else if (i == 1)
+ BUG_ON(entry != xa_mk_value(4096));
+ i++;
+ }
+ xas_unlock_irq(&xas);
+ BUG_ON(resched_count != count);
+ BUG_ON(i != 2);
+ BUG_ON(entry != NULL);
+
+ xas_set(&xas, 0);
+ i = 0;
+ xas_long_iter_irq(&xas);
+ xas_lock_irq(&xas);
+ xas_for_each(&xas, entry, ULONG_MAX) {
+ if (i == 0)
+ BUG_ON(entry != xa_mk_value(4095));
+ else if (i == 1)
+ BUG_ON(entry != xa_mk_value(4096));
+ i++;
+ }
+ xas_unlock_irq(&xas);
+ BUG_ON(i != 2);
+ BUG_ON(entry != NULL);
+ BUG_ON(resched_count == count);
+}
+
void check_xas_retry(struct xarray *xa)
{
XA_STATE(xas, xa, 0);
@@ -553,6 +595,9 @@ void xarray_checks(void)
check_xas_pause(&array);
item_kill_tree(&array);
+ check_xas_set_pause(&array);
+ item_kill_tree(&array);
+
check_xa_load(&array);
item_kill_tree(&array);