linux-mm.kvack.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Matthew Wilcox <mawilcox@microsoft.com>, Jan Kara <jack@suse.cz>,
	Jeff Layton <jlayton@redhat.com>,
	Lukas Czerner <lczerner@redhat.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>,
	linux-nilfs@vger.kernel.org, Jaegeuk Kim <jaegeuk@kernel.org>,
	Chao Yu <yuchao0@huawei.com>,
	linux-f2fs-devel@lists.sourceforge.net,
	Oleg Drokin <oleg.drokin@intel.com>,
	Andreas Dilger <andreas.dilger@intel.com>,
	James Simmons <jsimmons@infradead.org>
Subject: Re: [PATCH v10 43/62] memfd: Convert shmem_tag_pins to XArray
Date: Tue, 3 Apr 2018 13:51:17 -0700	[thread overview]
Message-ID: <20180403205117.GA30145@bombadil.infradead.org> (raw)
In-Reply-To: <20180331021111.GB13332@bombadil.infradead.org>

On Fri, Mar 30, 2018 at 07:11:11PM -0700, Matthew Wilcox wrote:
> On Fri, Mar 30, 2018 at 05:05:05PM -0700, Mike Kravetz wrote:
> > On 03/29/2018 08:42 PM, Matthew Wilcox wrote:
> > > Simplify the locking by taking the spinlock while we walk the tree on
> > > the assumption that many acquires and releases of the lock will be
> > > worse than holding the lock for a (potentially) long time.
> > 
> > I see this change made in several of the patches and do not have a
> > specific issue with it.  As part of the XArray implementation you
> > have XA_CHECK_SCHED = 4096.   So, we drop locks and do a cond_resched
> > after XA_CHECK_SCHED iterations.  Just curious how you came up with
> > that number.  Apologies in advance if this was discussed in a previous
> > round of reviews.
> 
> It comes from two places, the current implementations of
> tag_pages_for_writeback() and find_swap_entry().  I have no idea if it's
> the optimal number for anybody, but it's the only number that anyone
> was using.  I'll have no problem if somebody suggests we tweak the number
> in the future.

I thought about this some more.  One of the principles behind the xarray
rewrite was to make it easier to use than the radix tree.  Even this
interface succeeds at that; compare:

-       radix_tree_for_each_slot(slot, root, &iter, 0) {
-               if (*slot == item) {
-                       found = iter.index;
+       xas_for_each(&xas, entry, ULONG_MAX) {
+               if (xas_retry(&xas, entry))
+                       continue;
+               if (entry == item)
                        break;
-               }
                checked++;
-               if ((checked % 4096) != 0)
+               if ((checked % XA_CHECK_SCHED) != 0)
                        continue;
-               slot = radix_tree_iter_resume(slot, &iter);
+               xas_pause(&xas);
                cond_resched_rcu();
        }

But it's not *great*.  It doesn't succeed in capturing all the necessary
state in the xa_state so that the user doesn't have to worry about this.
It's subtle and relatively easy to get wrong.  And, as you say, why this
magic number?

I came up with this, which does capture everything necessary in the
xa_state -- just one bit which tells the xa_state whether we're in an
iteration with interrupts disabled.  If that bit is set, iterations
pause at the end of every second-level group of the tree.  That's every
4096 _indices_ (as opposed to every 4096 _processed entries_), but those
should be close to each other in dense trees.

I don't love the name of the function (xas_long_iter_irq()), but I'm
expecting somebody else will have a better name.  I did notice that
shmem_tag_pins() was only called from shmem_wait_for_pins(), so it
doesn't need its own XA_STATE; we can just pass in the xa_state pointer.

I rather like the outcome in the callers; the loops are reduced to doing
the thing that the loops are supposed to do instead of having this silly
tail on them that handles bookkeeping.

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index eac04922eba2..7ac25c402e19 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -842,6 +842,31 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
 #endif
 }
 
+#define XAS_FLAG_PAUSE	(1 << 7)
+
+/**
+ * xas_long_iter_irq() - Mark XArray operation state as being used for a long
+ * iteration.
+ * @xas: XArray operation state.
+ *
+ * If you are about to start an iteration over a potentially large array
+ * with the xa_lock held and interrupts off, call this function to have
+ * the iteration pause itself at opportune moments and check whether other
+ * threads need to run.
+ *
+ * If your iteration uses the rcu lock or the xa_lock without interrupts
+ * disabled, you can use cond_resched_rcu() or cond_resched_lock() at the
+ * end of each loop.  There is no cond_resched_lock_irq() (because most
+ * of the ways which notify a thread that a higher-priority thread wants
+ * to run depend on interrupts being enabled).
+ *
+ * Context: Process context.
+ */
+static inline void xas_long_iter_irq(struct xa_state *xas)
+{
+	xas->xa_flags |= XAS_FLAG_PAUSE;
+}
+
 /**
  * xas_set_update() - Set up XArray operation state for a callback.
  * @xas: XArray operation state.
diff --git a/lib/xarray.c b/lib/xarray.c
index 653ab0555673..9328e7b7ac85 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -948,6 +948,18 @@ void *__xas_next(struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(__xas_next);
 
+static
+void *xas_find_restart(struct xa_state *xas, unsigned long max, xa_tag_t tag)
+{
+	xas_unlock_irq(xas);
+	cond_resched();
+	xas_reset(xas);
+	xas_lock_irq(xas);
+	if (tag == XA_PRESENT)
+		return xas_find(xas, max);
+	return xas_find_tag(xas, max, tag);
+}
+
 /**
  * xas_find() - Find the next present entry in the XArray.
  * @xas: XArray operation state.
@@ -984,6 +996,9 @@ void *xas_find(struct xa_state *xas, unsigned long max)
 
 	while (xas->xa_node && (xas->xa_index <= max)) {
 		if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
+			if ((xas->xa_flags & XAS_FLAG_PAUSE) &&
+					xas->xa_node->shift)
+				return xas_find_restart(xas, max, XA_PRESENT);
 			xas->xa_offset = xas->xa_node->offset + 1;
 			xas->xa_node = xa_parent(xas->xa, xas->xa_node);
 			continue;
@@ -1056,6 +1071,9 @@ void *xas_find_tag(struct xa_state *xas, unsigned long max, xa_tag_t tag)
 
 	while (xas->xa_index <= max) {
 		if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
+			if ((xas->xa_flags & XAS_FLAG_PAUSE) &&
+					xas->xa_node->shift)
+				return xas_find_restart(xas, max, tag);
 			xas->xa_offset = xas->xa_node->offset + 1;
 			xas->xa_node = xa_parent(xas->xa, xas->xa_node);
 			if (!xas->xa_node)
diff --git a/mm/memfd.c b/mm/memfd.c
index 0e0835e63af2..7eb703651a73 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -27,30 +27,20 @@
 #define SHMEM_TAG_PINNED        PAGECACHE_TAG_TOWRITE
 #define LAST_SCAN               4       /* about 150ms max */
 
-static void shmem_tag_pins(struct address_space *mapping)
+static void shmem_tag_pins(struct xa_state *xas)
 {
-	XA_STATE(xas, &mapping->i_pages, 0);
 	struct page *page;
-	unsigned int tagged = 0;
 
 	lru_add_drain();
 
-	xas_lock_irq(&xas);
-	xas_for_each(&xas, page, ULONG_MAX) {
+	xas_lock_irq(xas);
+	xas_for_each(xas, page, ULONG_MAX) {
 		if (xa_is_value(page))
 			continue;
 		if (page_count(page) - page_mapcount(page) > 1)
-			xas_set_tag(&xas, SHMEM_TAG_PINNED);
-
-		if (++tagged % XA_CHECK_SCHED)
-			continue;
-
-		xas_pause(&xas);
-		xas_unlock_irq(&xas);
-		cond_resched();
-		xas_lock_irq(&xas);
+			xas_set_tag(xas, SHMEM_TAG_PINNED);
 	}
-	xas_unlock_irq(&xas);
+	xas_unlock_irq(xas);
 }
 
 /*
@@ -68,12 +58,11 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 	struct page *page;
 	int error, scan;
 
-	shmem_tag_pins(mapping);
+	xas_long_iter_irq(&xas);
+	shmem_tag_pins(&xas);
 
 	error = 0;
 	for (scan = 0; scan <= LAST_SCAN; scan++) {
-		unsigned int tagged = 0;
-
 		if (!xas_tagged(&xas, SHMEM_TAG_PINNED))
 			break;
 
@@ -101,13 +90,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 			}
 			if (clear)
 				xas_clear_tag(&xas, SHMEM_TAG_PINNED);
-			if (++tagged % XA_CHECK_SCHED)
-				continue;
-
-			xas_pause(&xas);
-			xas_unlock_irq(&xas);
-			cond_resched();
-			xas_lock_irq(&xas);
 		}
 		xas_unlock_irq(&xas);
 	}
diff --git a/tools/testing/radix-tree/linux.c b/tools/testing/radix-tree/linux.c
index 44a0d1ad4408..22421ad63559 100644
--- a/tools/testing/radix-tree/linux.c
+++ b/tools/testing/radix-tree/linux.c
@@ -16,6 +16,12 @@ int nr_allocated;
 int preempt_count;
 int kmalloc_verbose;
 int test_verbose;
+int resched_count;
+
+void cond_resched(void)
+{
+	resched_count++;
+}
 
 struct kmem_cache {
 	pthread_mutex_t lock;
diff --git a/tools/testing/radix-tree/linux/kernel.h b/tools/testing/radix-tree/linux/kernel.h
index 9fa1828dde5e..57e16a2554f6 100644
--- a/tools/testing/radix-tree/linux/kernel.h
+++ b/tools/testing/radix-tree/linux/kernel.h
@@ -25,4 +25,6 @@
 #define local_irq_disable()	rcu_read_lock()
 #define local_irq_enable()	rcu_read_unlock()
 
+extern void cond_resched(void);
+
 #endif /* _KERNEL_H */
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index f97cacd1422d..730d91849d8f 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -54,6 +54,7 @@ void tree_verify_min_height(struct radix_tree_root *root, int maxindex);
 void verify_tag_consistency(struct radix_tree_root *root, unsigned int tag);
 
 extern int nr_allocated;
+extern int resched_count;
 
 /* Normally private parts of lib/radix-tree.c */
 struct radix_tree_node *entry_to_node(void *ptr);
diff --git a/tools/testing/radix-tree/xarray-test.c b/tools/testing/radix-tree/xarray-test.c
index f8909eb09cbc..3a70ad1734cf 100644
--- a/tools/testing/radix-tree/xarray-test.c
+++ b/tools/testing/radix-tree/xarray-test.c
@@ -183,6 +183,48 @@ void check_xas_pause(struct xarray *xa)
 	assert(seen == 2);
 }
 
+void check_xas_set_pause(struct xarray *xa)
+{
+	XA_STATE(xas, xa, 0);
+	void *entry;
+	unsigned i = 0;
+	int count = resched_count;
+
+	BUG_ON(!xa_empty(xa));
+
+	xa_store(xa, 4095, xa_mk_value(4095), GFP_KERNEL);
+	xa_store(xa, 4096, xa_mk_value(4096), GFP_KERNEL);
+
+	xas_lock_irq(&xas);
+	xas_for_each(&xas, entry, ULONG_MAX) {
+		if (i == 0)
+			BUG_ON(entry != xa_mk_value(4095));
+		else if (i == 1)
+			BUG_ON(entry != xa_mk_value(4096));
+		i++;
+	}
+	xas_unlock_irq(&xas);
+	BUG_ON(resched_count != count);
+	BUG_ON(i != 2);
+	BUG_ON(entry != NULL);
+
+	xas_set(&xas, 0);
+	i = 0;
+	xas_long_iter_irq(&xas);
+	xas_lock_irq(&xas);
+	xas_for_each(&xas, entry, ULONG_MAX) {
+		if (i == 0)
+			BUG_ON(entry != xa_mk_value(4095));
+		else if (i == 1)
+			BUG_ON(entry != xa_mk_value(4096));
+		i++;
+	}
+	xas_unlock_irq(&xas);
+	BUG_ON(i != 2);
+	BUG_ON(entry != NULL);
+	BUG_ON(resched_count == count);
+}
+
 void check_xas_retry(struct xarray *xa)
 {
 	XA_STATE(xas, xa, 0);
@@ -553,6 +595,9 @@ void xarray_checks(void)
 	check_xas_pause(&array);
 	item_kill_tree(&array);
 
+	check_xas_set_pause(&array);
+	item_kill_tree(&array);
+
 	check_xa_load(&array);
 	item_kill_tree(&array);
 
