linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 0/4] migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT
@ 2023-04-21 22:12 Douglas Anderson
  2023-04-21 22:12 ` [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout() Douglas Anderson
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Douglas Anderson @ 2023-04-21 22:12 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner
  Cc: linux-kernel, linux-mm, Yu Zhao, linux-fsdevel, Matthew Wilcox,
	Douglas Anderson, Bart Van Assche, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Jan Kara, Juri Lelli, Mel Gorman, Mikulas Patocka, Peter Zijlstra,
	Ritesh Harjani (IBM), Steven Rostedt, Valentin Schneider,
	Vincent Guittot, Will Deacon, Zhang Yi


This series is the result of discussion around my RFC patch [1] where
I talked about completely removing the waits for the folio_lock in
migrate_folio_unmap().

This new series should, I think, be more palatable to folks. Please
let me know what you think!

Most of the description of why I think we want this patch series can
be seen in patch #3 ("migrate_pages: Don't wait forever locking pages
in MIGRATE_SYNC_LIGHT"). I won't repeat all that information here and
would humbly request that anyone wishing to comment on the overall
patch series respond there. ;-)

[1] https://lore.kernel.org/r/20230413182313.RFC.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid

Changes in v2:
- "Add folio_lock_timeout()" new for v2.
- "Add lock_buffer_timeout()" new for v2.
- Keep unbounded delay in "SYNC", delay with a timeout in "SYNC_LIGHT"
- "Don't wait forever locking buffers in MIGRATE_SYNC_LIGHT" new for v2.

Douglas Anderson (4):
  mm/filemap: Add folio_lock_timeout()
  buffer: Add lock_buffer_timeout()
  migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT
  migrate_pages: Don't wait forever locking buffers in
    MIGRATE_SYNC_LIGHT

 fs/buffer.c                 |  7 ++++++
 include/linux/buffer_head.h | 10 ++++++++
 include/linux/pagemap.h     | 16 +++++++++++++
 include/linux/wait_bit.h    | 24 +++++++++++++++++++
 kernel/sched/wait_bit.c     | 14 +++++++++++
 mm/filemap.c                | 47 +++++++++++++++++++++++++++----------
 mm/migrate.c                | 45 +++++++++++++++++++++--------------
 7 files changed, 132 insertions(+), 31 deletions(-)

-- 
2.40.0.634.g4ca3ef3211-goog



* [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout()
  2023-04-21 22:12 [PATCH v2 0/4] migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT Douglas Anderson
@ 2023-04-21 22:12 ` Douglas Anderson
  2023-04-23  7:50   ` Huang, Ying
  2023-04-24  8:22   ` Mel Gorman
  2023-04-21 22:12 ` [PATCH v2 2/4] buffer: Add lock_buffer_timeout() Douglas Anderson
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Douglas Anderson @ 2023-04-21 22:12 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner
  Cc: linux-kernel, linux-mm, Yu Zhao, linux-fsdevel, Matthew Wilcox,
	Douglas Anderson

Add a variant of folio_lock() that can time out. This is useful to
avoid unbounded waits for the page lock in kcompactd.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- "Add folio_lock_timeout()" new for v2.

 include/linux/pagemap.h | 16 ++++++++++++++
 mm/filemap.c            | 47 +++++++++++++++++++++++++++++------------
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0acb8e1fb7af..0f3ef9f79300 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -892,6 +892,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 }
 
 void __folio_lock(struct folio *folio);
+int __folio_lock_timeout(struct folio *folio, long timeout);
 int __folio_lock_killable(struct folio *folio);
 bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 				unsigned int flags);
@@ -952,6 +953,21 @@ static inline void folio_lock(struct folio *folio)
 		__folio_lock(folio);
 }
 
+/**
+ * folio_lock_timeout() - Lock this folio, with a timeout.
+ * @folio: The folio to lock.
+ * @timeout: The timeout in jiffies; %MAX_SCHEDULE_TIMEOUT means wait forever.
+ *
+ * Return: 0 upon success; -ETIMEDOUT upon failure.
+ */
+static inline int folio_lock_timeout(struct folio *folio, long timeout)
+{
+	might_sleep();
+	if (!folio_trylock(folio))
+		return __folio_lock_timeout(folio, timeout);
+	return 0;
+}
+
 /**
  * lock_page() - Lock the folio containing this page.
  * @page: The page to lock.
diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a..c6056ec41284 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1220,7 +1220,7 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
 int sysctl_page_lock_unfairness = 5;
 
 static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
-		int state, enum behavior behavior)
+		int state, enum behavior behavior, long timeout)
 {
 	wait_queue_head_t *q = folio_waitqueue(folio);
 	int unfairness = sysctl_page_lock_unfairness;
@@ -1229,6 +1229,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	bool thrashing = false;
 	unsigned long pflags;
 	bool in_thrashing;
+	int err;
 
 	if (bit_nr == PG_locked &&
 	    !folio_test_uptodate(folio) && folio_test_workingset(folio)) {
@@ -1295,10 +1296,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 		/* Loop until we've been woken or interrupted */
 		flags = smp_load_acquire(&wait->flags);
 		if (!(flags & WQ_FLAG_WOKEN)) {
+			if (!timeout)
+				break;
+
 			if (signal_pending_state(state, current))
 				break;
 
-			io_schedule();
+			timeout = io_schedule_timeout(timeout);
 			continue;
 		}
 
@@ -1324,10 +1328,10 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	}
 
 	/*
-	 * If a signal happened, this 'finish_wait()' may remove the last
-	 * waiter from the wait-queues, but the folio waiters bit will remain
-	 * set. That's ok. The next wakeup will take care of it, and trying
-	 * to do it here would be difficult and prone to races.
+	 * If a signal/timeout happened, this 'finish_wait()' may remove the
+	 * last waiter from the wait-queues, but the folio waiters bit will
+	 * remain set. That's ok. The next wakeup will take care of it, and
+	 * trying to do it here would be difficult and prone to races.
 	 */
 	finish_wait(q, wait);
 
@@ -1336,6 +1340,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 		psi_memstall_leave(&pflags);
 	}
 
+	/*
+	 * If we don't meet the success criteria below then we've got an error
+	 * of some sort. Differentiate between the two error cases. If there's
+	 * no time left it must have been a timeout.
+	 */
+	err = !timeout ? -ETIMEDOUT : -EINTR;
+
 	/*
 	 * NOTE! The wait->flags weren't stable until we've done the
 	 * 'finish_wait()', and we could have exited the loop above due
@@ -1350,9 +1361,9 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	 * waiter, but an exclusive one requires WQ_FLAG_DONE.
 	 */
 	if (behavior == EXCLUSIVE)
-		return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR;
+		return wait->flags & WQ_FLAG_DONE ? 0 : err;
 
-	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
+	return wait->flags & WQ_FLAG_WOKEN ? 0 : err;
 }
 
 #ifdef CONFIG_MIGRATION
@@ -1442,13 +1453,15 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
 
 void folio_wait_bit(struct folio *folio, int bit_nr)
 {
-	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
+	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED,
+			      MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(folio_wait_bit);
 
 int folio_wait_bit_killable(struct folio *folio, int bit_nr)
 {
-	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
+	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED,
+				     MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(folio_wait_bit_killable);
 
@@ -1467,7 +1480,8 @@ EXPORT_SYMBOL(folio_wait_bit_killable);
  */
 static int folio_put_wait_locked(struct folio *folio, int state)
 {
-	return folio_wait_bit_common(folio, PG_locked, state, DROP);
+	return folio_wait_bit_common(folio, PG_locked, state, DROP,
+				     MAX_SCHEDULE_TIMEOUT);
 }
 
 /**
@@ -1662,17 +1676,24 @@ EXPORT_SYMBOL_GPL(page_endio);
 void __folio_lock(struct folio *folio)
 {
 	folio_wait_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
-				EXCLUSIVE);
+				EXCLUSIVE, MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(__folio_lock);
 
 int __folio_lock_killable(struct folio *folio)
 {
 	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
-					EXCLUSIVE);
+					EXCLUSIVE, MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL_GPL(__folio_lock_killable);
 
+int __folio_lock_timeout(struct folio *folio, long timeout)
+{
+	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
+					EXCLUSIVE, timeout);
+}
+EXPORT_SYMBOL_GPL(__folio_lock_timeout);
+
 static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 {
 	struct wait_queue_head *q = folio_waitqueue(folio);
-- 
2.40.0.634.g4ca3ef3211-goog


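For a sense of how the new helper is meant to be used, here is a minimal
sketch of a hypothetical caller (the wrapper name and the ~2 ms bound are
invented for illustration; the real user appears in patch 3):

    /*
     * Sketch: try to lock a folio but give up after roughly 2 ms instead
     * of blocking indefinitely. Both failure codes (-ETIMEDOUT from the
     * timeout, -EINTR from a fatal signal) are treated as "no lock".
     */
    static bool try_lock_folio_bounded(struct folio *folio)
    {
            int ret = folio_lock_timeout(folio, msecs_to_jiffies(2));

            if (ret)
                    return false;

            /* ... operate on the locked folio ... */
            folio_unlock(folio);
            return true;
    }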

* [PATCH v2 2/4] buffer: Add lock_buffer_timeout()
  2023-04-21 22:12 [PATCH v2 0/4] migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT Douglas Anderson
  2023-04-21 22:12 ` [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout() Douglas Anderson
@ 2023-04-21 22:12 ` Douglas Anderson
  2023-04-23  8:47   ` Huang, Ying
  2023-04-21 22:12 ` [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT Douglas Anderson
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Douglas Anderson @ 2023-04-21 22:12 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner
  Cc: linux-kernel, linux-mm, Yu Zhao, linux-fsdevel, Matthew Wilcox,
	Douglas Anderson, Bart Van Assche, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Jan Kara, Juri Lelli, Mel Gorman, Mikulas Patocka, Peter Zijlstra,
	Ritesh Harjani (IBM), Steven Rostedt, Valentin Schneider,
	Vincent Guittot, Will Deacon, Zhang Yi

Add a variant of lock_buffer() that can time out. This is useful to
avoid unbounded waits for the buffer lock in kcompactd.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- "Add lock_buffer_timeout()" new for v2.

 fs/buffer.c                 |  7 +++++++
 include/linux/buffer_head.h | 10 ++++++++++
 include/linux/wait_bit.h    | 24 ++++++++++++++++++++++++
 kernel/sched/wait_bit.c     | 14 ++++++++++++++
 4 files changed, 55 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 9e1e2add541e..fcd19c270024 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -71,6 +71,13 @@ void __lock_buffer(struct buffer_head *bh)
 }
 EXPORT_SYMBOL(__lock_buffer);
 
+int __lock_buffer_timeout(struct buffer_head *bh, unsigned long timeout)
+{
+	return wait_on_bit_lock_io_timeout(&bh->b_state, BH_Lock,
+					   TASK_UNINTERRUPTIBLE, timeout);
+}
+EXPORT_SYMBOL(__lock_buffer_timeout);
+
 void unlock_buffer(struct buffer_head *bh)
 {
 	clear_bit_unlock(BH_Lock, &bh->b_state);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 8f14dca5fed7..2bae464f89d5 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -237,6 +237,7 @@ struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
 void free_buffer_head(struct buffer_head * bh);
 void unlock_buffer(struct buffer_head *bh);
 void __lock_buffer(struct buffer_head *bh);
+int __lock_buffer_timeout(struct buffer_head *bh, unsigned long timeout);
 int sync_dirty_buffer(struct buffer_head *bh);
 int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
 void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
@@ -400,6 +401,15 @@ static inline void lock_buffer(struct buffer_head *bh)
 		__lock_buffer(bh);
 }
 
+static inline int lock_buffer_timeout(struct buffer_head *bh,
+				      unsigned long timeout)
+{
+	might_sleep();
+	if (!trylock_buffer(bh))
+		return __lock_buffer_timeout(bh, timeout);
+	return 0;
+}
+
 static inline struct buffer_head *getblk_unmovable(struct block_device *bdev,
 						   sector_t block,
 						   unsigned size)
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 7725b7579b78..33f0f60b1c8c 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -30,6 +30,7 @@ void wake_up_bit(void *word, int bit);
 int out_of_line_wait_on_bit(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
 int out_of_line_wait_on_bit_lock(void *word, int, wait_bit_action_f *action, unsigned int mode);
+int out_of_line_wait_on_bit_lock_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
 struct wait_queue_head *bit_waitqueue(void *word, int bit);
 extern void __init wait_bit_init(void);
 
@@ -208,6 +209,29 @@ wait_on_bit_lock_io(unsigned long *word, int bit, unsigned mode)
 	return out_of_line_wait_on_bit_lock(word, bit, bit_wait_io, mode);
 }
 
+/**
+ * wait_on_bit_lock_io_timeout - wait_on_bit_lock_io() with a timeout
+ * @word: the word being waited on, a kernel virtual address
+ * @bit: the bit of the word being waited on
+ * @mode: the task state to sleep in
+ * @timeout: the timeout in jiffies; %MAX_SCHEDULE_TIMEOUT means wait forever
+ *
+ * Returns zero if the bit was (eventually) found to be clear and was
+ * set.  Returns non-zero if a timeout happened or a signal was delivered to
+ * the process and the @mode allows that signal to wake the process.
+ */
+static inline int
+wait_on_bit_lock_io_timeout(unsigned long *word, int bit, unsigned mode,
+			    unsigned long timeout)
+{
+	might_sleep();
+	if (!test_and_set_bit(bit, word))
+		return 0;
+	return out_of_line_wait_on_bit_lock_timeout(word, bit,
+						    bit_wait_io_timeout,
+						    mode, timeout);
+}
+
 /**
  * wait_on_bit_lock_action - wait for a bit to be cleared, when wanting to set it
  * @word: the word being waited on, a kernel virtual address
diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
index 0b1cd985dc27..629acd1c6c79 100644
--- a/kernel/sched/wait_bit.c
+++ b/kernel/sched/wait_bit.c
@@ -118,6 +118,20 @@ int __sched out_of_line_wait_on_bit_lock(void *word, int bit,
 }
 EXPORT_SYMBOL(out_of_line_wait_on_bit_lock);
 
+int __sched out_of_line_wait_on_bit_lock_timeout(void *word, int bit,
+						 wait_bit_action_f *action,
+						 unsigned mode,
+						 unsigned long timeout)
+{
+	struct wait_queue_head *wq_head = bit_waitqueue(word, bit);
+	DEFINE_WAIT_BIT(wq_entry, word, bit);
+
+	wq_entry.key.timeout = jiffies + timeout;
+
+	return __wait_on_bit_lock(wq_head, &wq_entry, action, mode);
+}
+EXPORT_SYMBOL(out_of_line_wait_on_bit_lock_timeout);
+
 void __wake_up_bit(struct wait_queue_head *wq_head, void *word, int bit)
 {
 	struct wait_bit_key key = __WAIT_BIT_KEY_INITIALIZER(word, bit);
-- 
2.40.0.634.g4ca3ef3211-goog


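To illustrate the intended use, a hypothetical caller might look like the
sketch below (the wrapper is invented for the example; patch 4 is the real
user, where the 1-jiffy bound comes from timeout_for_mode()):

    /* Sketch: bounded attempt to lock a single buffer head. */
    static bool lock_buffer_bounded(struct buffer_head *bh)
    {
            /* Non-zero means the wait timed out without getting the lock. */
            if (lock_buffer_timeout(bh, 1))
                    return false;

            /* ... the buffer is locked here ... */
            unlock_buffer(bh);
            return true;
    }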

* [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT
  2023-04-21 22:12 [PATCH v2 0/4] migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT Douglas Anderson
  2023-04-21 22:12 ` [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout() Douglas Anderson
  2023-04-21 22:12 ` [PATCH v2 2/4] buffer: Add lock_buffer_timeout() Douglas Anderson
@ 2023-04-21 22:12 ` Douglas Anderson
  2023-04-23  7:59   ` Huang, Ying
  2023-04-24  9:38   ` Mel Gorman
  2023-04-21 22:12 ` [PATCH v2 4/4] migrate_pages: Don't wait forever locking buffers " Douglas Anderson
       [not found] ` <20230422051858.1696-1-hdanton@sina.com>
  4 siblings, 2 replies; 14+ messages in thread
From: Douglas Anderson @ 2023-04-21 22:12 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner
  Cc: linux-kernel, linux-mm, Yu Zhao, linux-fsdevel, Matthew Wilcox,
	Douglas Anderson

The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
finish quickly but not for things that will take a long time. Exactly
how long is too long is not well defined, but waits of tens of
milliseconds are likely non-ideal.

Waiting on the folio lock in isolate_movable_page() is something that
usually is pretty quick, but is not officially bounded. Nothing stops
another process from holding a folio lock while doing an expensive
operation. Having an unbounded wait like this is not within the design
goals of MIGRATE_SYNC_LIGHT.

When putting a Chromebook under memory pressure (opening over 90 tabs
on a 4GB machine) it was fairly easy to see delays waiting for the
lock of > 100 ms. While the laptop wasn't amazingly usable in this
state, it was still limping along and this state isn't something
artificial. Sometimes we simply end up with a lot of memory pressure.

Putting the same Chromebook under memory pressure while it was running
Android apps (though not stressing them) showed a much worse result
(NOTE: this was on an older kernel but the codepaths here are
similar). Android apps on ChromeOS currently run from a 128K-block,
zlib-compressed, loopback-mounted squashfs disk. If we get a page
fault from something backed by the squashfs filesystem we could end up
holding a folio lock while reading enough from disk to decompress 128K
(and then decompressing it using the somewhat slow zlib algorithms).
That reading goes through the ext4 subsystem (because it's a loopback
mount) before eventually ending up in the block subsystem. This extra
jaunt adds extra overhead. Without much work I could see cases where
we ended up blocked on a folio lock for over a second. With more
extreme memory pressure I could see up to 25 seconds.

Let's bound the amount of time we can wait for the folio lock. The
SYNC_LIGHT migration mode can already handle failure for things that
are slow, so adding this timeout in is fairly straightforward.

With this timeout, it can be seen that kcompactd can move on to more
productive tasks if it's taking a long time to acquire a lock.

NOTE: The reason I started digging into this isn't because some
benchmark had gone awry, but because we've received in-the-field crash
reports where we have a hung task waiting on the page lock (which is
the equivalent code path on old kernels). While the root cause of
those crashes is likely unrelated and won't be fixed by this patch,
analyzing those crash reports did point out this unbounded wait and it
seemed like something good to fix.

ALSO NOTE: the timeout mechanism used here uses "jiffies" and we also
will retry up to 7 times. That doesn't give us much accuracy in
specifying the timeout. On 1000 Hz machines we'll end up timing out in
7-14 ms. On 100 Hz machines we'll end up in 70-140 ms. Given that we
don't have a strong definition of how long "too long" is, this is
probably OK.

Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- Keep unbounded delay in "SYNC", delay with a timeout in "SYNC_LIGHT"

 mm/migrate.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index db3f154446af..60982df71a93 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -58,6 +58,23 @@
 
 #include "internal.h"
 
+/* Returns the schedule timeout for a non-async mode */
+static long timeout_for_mode(enum migrate_mode mode)
+{
+	/*
+	 * For MIGRATE_SYNC_LIGHT we return 1 jiffy as the timeout. Since all
+	 * places using this timeout are in a retry loop this means that the
+	 * maximum time we might block is actually NR_MAX_MIGRATE_SYNC_RETRY
+	 * jiffies. If a jiffy is 1 ms that's 7 ms, though with the accuracy
+	 * of the timeouts it often ends up more like 14 ms; if a jiffy is 10 ms
+	 * that's 70-140 ms.
+	 */
+	if (mode == MIGRATE_SYNC_LIGHT)
+		return 1;
+
+	return MAX_SCHEDULE_TIMEOUT;
+}
+
 bool isolate_movable_page(struct page *page, isolate_mode_t mode)
 {
 	struct folio *folio = folio_get_nontail_page(page);
@@ -1162,7 +1179,8 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 		if (current->flags & PF_MEMALLOC)
 			goto out;
 
-		folio_lock(src);
+		if (folio_lock_timeout(src, timeout_for_mode(mode)))
+			goto out;
 	}
 	locked = true;
 
-- 
2.40.0.634.g4ca3ef3211-goog


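To make the timing in the commit message concrete, assuming the 7 retries
mentioned above and that a 1-jiffy timeout can take up to 2 jiffies of wall
time due to tick granularity, the worst-case total block time works out to:

    HZ = 1000:  1 jiffy =  1 ms  ->  7 * ( 1..2  ms) =   7..14  ms
    HZ =  250:  1 jiffy =  4 ms  ->  7 * ( 4..8  ms) =  28..56  ms
    HZ =  100:  1 jiffy = 10 ms  ->  7 * (10..20 ms) =  70..140 ms

The HZ=250 row is extrapolated from the same arithmetic; only the 1000 Hz
and 100 Hz numbers appear in the commit message.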

* [PATCH v2 4/4] migrate_pages: Don't wait forever locking buffers in MIGRATE_SYNC_LIGHT
  2023-04-21 22:12 [PATCH v2 0/4] migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT Douglas Anderson
                   ` (2 preceding siblings ...)
  2023-04-21 22:12 ` [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT Douglas Anderson
@ 2023-04-21 22:12 ` Douglas Anderson
       [not found] ` <20230422051858.1696-1-hdanton@sina.com>
  4 siblings, 0 replies; 14+ messages in thread
From: Douglas Anderson @ 2023-04-21 22:12 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner
  Cc: linux-kernel, linux-mm, Yu Zhao, linux-fsdevel, Matthew Wilcox,
	Douglas Anderson

Just as talked about in the patch ("migrate_pages: Don't wait forever
locking pages in MIGRATE_SYNC_LIGHT"), we don't really want unbounded
waits when we're running in MIGRATE_SYNC_LIGHT mode. Waiting on the
buffer lock is a second such unbounded wait. Let's put a timeout on
it.

While measurement didn't show this wait to be quite as bad as the one
waiting for the folio lock, it could still be measured to be over a
second in some cases.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- "Don't wait forever locking buffers in MIGRATE_SYNC_LIGHT" new for v2.

 mm/migrate.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 60982df71a93..97c93604eb4c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -715,25 +715,16 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head,
 							enum migrate_mode mode)
 {
 	struct buffer_head *bh = head;
+	bool locked;
 
-	/* Simple case, sync compaction */
-	if (mode != MIGRATE_ASYNC) {
-		do {
-			lock_buffer(bh);
-			bh = bh->b_this_page;
-
-		} while (bh != head);
-
-		return true;
-	}
-
-	/* async case, we cannot block on lock_buffer so use trylock_buffer */
 	do {
-		if (!trylock_buffer(bh)) {
-			/*
-			 * We failed to lock the buffer and cannot stall in
-			 * async migration. Release the taken locks
-			 */
+		if (mode == MIGRATE_ASYNC)
+			locked = trylock_buffer(bh);
+		else
+			locked = !lock_buffer_timeout(bh, timeout_for_mode(mode));
+
+		if (!locked) {
+			/* We failed to lock the buffer. Release the taken locks. */
 			struct buffer_head *failed_bh = bh;
 			bh = head;
 			while (bh != failed_bh) {
-- 
2.40.0.634.g4ca3ef3211-goog


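Pieced together from the hunk above, the locking walk in
buffer_migrate_lock_buffers() ends up with roughly the following shape (the
hunk is truncated before the unlock loop finishes, so the tail here is an
approximation of the surrounding context rather than a quote of the result):

    do {
            if (mode == MIGRATE_ASYNC)
                    locked = trylock_buffer(bh);
            else
                    locked = !lock_buffer_timeout(bh, timeout_for_mode(mode));

            if (!locked) {
                    /* We failed to lock the buffer. Release the taken locks. */
                    struct buffer_head *failed_bh = bh;

                    bh = head;
                    while (bh != failed_bh) {
                            unlock_buffer(bh);
                            bh = bh->b_this_page;
                    }
                    return false;
            }

            bh = bh->b_this_page;
    } while (bh != head);

    return true;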

* Re: [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout()
  2023-04-21 22:12 ` [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout() Douglas Anderson
@ 2023-04-23  7:50   ` Huang, Ying
  2023-04-24  8:22   ` Mel Gorman
  1 sibling, 0 replies; 14+ messages in thread
From: Huang, Ying @ 2023-04-23  7:50 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao, linux-fsdevel,
	Matthew Wilcox

Douglas Anderson <dianders@chromium.org> writes:

> Add a variant of folio_lock() that can time out. This is useful to
> avoid unbounded waits for the page lock in kcompactd.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v2:
> - "Add folio_lock_timeout()" new for v2.
>
>  include/linux/pagemap.h | 16 ++++++++++++++
>  mm/filemap.c            | 47 +++++++++++++++++++++++++++++------------
>  2 files changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 0acb8e1fb7af..0f3ef9f79300 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -892,6 +892,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
>  }
>  
>  void __folio_lock(struct folio *folio);
> +int __folio_lock_timeout(struct folio *folio, long timeout);
>  int __folio_lock_killable(struct folio *folio);
>  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>  				unsigned int flags);
> @@ -952,6 +953,21 @@ static inline void folio_lock(struct folio *folio)
>  		__folio_lock(folio);
>  }
>  
> +/**
> + * folio_lock_timeout() - Lock this folio, with a timeout.
> + * @folio: The folio to lock.
> + * @timeout: The timeout in jiffies; %MAX_SCHEDULE_TIMEOUT means wait forever.
> + *
> + * Return: 0 upon success; -ETIMEDOUT upon failure.

IIUC, the funtion may return -EINTR too.

Otherwise looks good to me.  Thanks!

Best Regards,
Huang, Ying

> + */
> +static inline int folio_lock_timeout(struct folio *folio, long timeout)
> +{
> +	might_sleep();
> +	if (!folio_trylock(folio))
> +		return __folio_lock_timeout(folio, timeout);
> +	return 0;
> +}
> +
>  /**
>   * lock_page() - Lock the folio containing this page.
>   * @page: The page to lock.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2723104cc06a..c6056ec41284 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1220,7 +1220,7 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
>  int sysctl_page_lock_unfairness = 5;
>  
>  static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
> -		int state, enum behavior behavior)
> +		int state, enum behavior behavior, long timeout)
>  {
>  	wait_queue_head_t *q = folio_waitqueue(folio);
>  	int unfairness = sysctl_page_lock_unfairness;
> @@ -1229,6 +1229,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  	bool thrashing = false;
>  	unsigned long pflags;
>  	bool in_thrashing;
> +	int err;
>  
>  	if (bit_nr == PG_locked &&
>  	    !folio_test_uptodate(folio) && folio_test_workingset(folio)) {
> @@ -1295,10 +1296,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  		/* Loop until we've been woken or interrupted */
>  		flags = smp_load_acquire(&wait->flags);
>  		if (!(flags & WQ_FLAG_WOKEN)) {
> +			if (!timeout)
> +				break;
> +
>  			if (signal_pending_state(state, current))
>  				break;
>  
> -			io_schedule();
> +			timeout = io_schedule_timeout(timeout);
>  			continue;
>  		}
>  
> @@ -1324,10 +1328,10 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  	}
>  
>  	/*
> -	 * If a signal happened, this 'finish_wait()' may remove the last
> -	 * waiter from the wait-queues, but the folio waiters bit will remain
> -	 * set. That's ok. The next wakeup will take care of it, and trying
> -	 * to do it here would be difficult and prone to races.
> +	 * If a signal/timeout happened, this 'finish_wait()' may remove the
> +	 * last waiter from the wait-queues, but the folio waiters bit will
> +	 * remain set. That's ok. The next wakeup will take care of it, and
> +	 * trying to do it here would be difficult and prone to races.
>  	 */
>  	finish_wait(q, wait);
>  
> @@ -1336,6 +1340,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  		psi_memstall_leave(&pflags);
>  	}
>  
> +	/*
> +	 * If we don't meet the success criteria below then we've got an error
> +	 * of some sort. Differentiate between the two error cases. If there's
> +	 * no time left it must have been a timeout.
> +	 */
> +	err = !timeout ? -ETIMEDOUT : -EINTR;
> +
>  	/*
>  	 * NOTE! The wait->flags weren't stable until we've done the
>  	 * 'finish_wait()', and we could have exited the loop above due
> @@ -1350,9 +1361,9 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  	 * waiter, but an exclusive one requires WQ_FLAG_DONE.
>  	 */
>  	if (behavior == EXCLUSIVE)
> -		return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR;
> +		return wait->flags & WQ_FLAG_DONE ? 0 : err;
>  
> -	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
> +	return wait->flags & WQ_FLAG_WOKEN ? 0 : err;
>  }
>  
>  #ifdef CONFIG_MIGRATION
> @@ -1442,13 +1453,15 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>  
>  void folio_wait_bit(struct folio *folio, int bit_nr)
>  {
> -	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
> +	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED,
> +			      MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL(folio_wait_bit);
>  
>  int folio_wait_bit_killable(struct folio *folio, int bit_nr)
>  {
> -	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
> +	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED,
> +				     MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL(folio_wait_bit_killable);
>  
> @@ -1467,7 +1480,8 @@ EXPORT_SYMBOL(folio_wait_bit_killable);
>   */
>  static int folio_put_wait_locked(struct folio *folio, int state)
>  {
> -	return folio_wait_bit_common(folio, PG_locked, state, DROP);
> +	return folio_wait_bit_common(folio, PG_locked, state, DROP,
> +				     MAX_SCHEDULE_TIMEOUT);
>  }
>  
>  /**
> @@ -1662,17 +1676,24 @@ EXPORT_SYMBOL_GPL(page_endio);
>  void __folio_lock(struct folio *folio)
>  {
>  	folio_wait_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
> -				EXCLUSIVE);
> +				EXCLUSIVE, MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL(__folio_lock);
>  
>  int __folio_lock_killable(struct folio *folio)
>  {
>  	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
> -					EXCLUSIVE);
> +					EXCLUSIVE, MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL_GPL(__folio_lock_killable);
>  
> +int __folio_lock_timeout(struct folio *folio, long timeout)
> +{
> +	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
> +					EXCLUSIVE, timeout);
> +}
> +EXPORT_SYMBOL_GPL(__folio_lock_timeout);
> +
>  static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>  {
>  	struct wait_queue_head *q = folio_waitqueue(folio);

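The distinction Huang Ying is pointing at can be sketched from the caller's
side like this (illustrative only; the two error cases follow from the err
selection in folio_wait_bit_common() above):

    int ret = folio_lock_timeout(folio, timeout);

    switch (ret) {
    case 0:          /* lock acquired */
            break;
    case -ETIMEDOUT: /* the timeout expired before the lock was available */
            break;
    case -EINTR:     /* a fatal signal woke the TASK_KILLABLE waiter */
            break;
    }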

* Re: [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT
  2023-04-21 22:12 ` [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT Douglas Anderson
@ 2023-04-23  7:59   ` Huang, Ying
  2023-04-24  9:38   ` Mel Gorman
  1 sibling, 0 replies; 14+ messages in thread
From: Huang, Ying @ 2023-04-23  7:59 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao, linux-fsdevel,
	Matthew Wilcox

Douglas Anderson <dianders@chromium.org> writes:

> The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
> finish quickly but not for things that will take a long time. Exactly
> how long is too long is not well defined, but waits of tens of
> milliseconds are likely non-ideal.
>
> Waiting on the folio lock in isolate_movable_page() is something that
> usually is pretty quick, but is not officially bounded. Nothing stops
> another process from holding a folio lock while doing an expensive
> operation. Having an unbounded wait like this is not within the design
> goals of MIGRATE_SYNC_LIGHT.
>
> When putting a Chromebook under memory pressure (opening over 90 tabs
> on a 4GB machine) it was fairly easy to see delays waiting for the
> lock of > 100 ms. While the laptop wasn't amazingly usable in this
> state, it was still limping along and this state isn't something
> artificial. Sometimes we simply end up with a lot of memory pressure.
>
> Putting the same Chromebook under memory pressure while it was running
> Android apps (though not stressing them) showed a much worse result
> (NOTE: this was on an older kernel but the codepaths here are
> similar). Android apps on ChromeOS currently run from a 128K-block,
> zlib-compressed, loopback-mounted squashfs disk. If we get a page
> fault from something backed by the squashfs filesystem we could end up
> holding a folio lock while reading enough from disk to decompress 128K
> (and then decompressing it using the somewhat slow zlib algorithms).
> That reading goes through the ext4 subsystem (because it's a loopback
> mount) before eventually ending up in the block subsystem. This extra
> jaunt adds extra overhead. Without much work I could see cases where
> we ended up blocked on a folio lock for over a second. With more
> extreme memory pressure I could see up to 25 seconds.
>
> Let's bound the amount of time we can wait for the folio lock. The
> SYNC_LIGHT migration mode can already handle failure for things that
> are slow, so adding this timeout in is fairly straightforward.
>
> With this timeout, it can be seen that kcompactd can move on to more
> productive tasks if it's taking a long time to acquire a lock.

How long is the max wait time of folio_lock_timeout()?

> NOTE: The reason I started digging into this isn't because some
> benchmark had gone awry, but because we've received in-the-field crash
> reports where we have a hung task waiting on the page lock (which is
> the equivalent code path on old kernels). While the root cause of
> those crashes is likely unrelated and won't be fixed by this patch,
> analyzing those crash reports did point out this unbounded wait and it
> seemed like something good to fix.
>
> ALSO NOTE: the timeout mechanism used here uses "jiffies" and we also
> will retry up to 7 times. That doesn't give us much accuracy in
> specifying the timeout. On 1000 Hz machines we'll end up timing out in
> 7-14 ms. On 100 Hz machines we'll end up in 70-140 ms. Given that we
> don't have a strong definition of how long "too long" is, this is
> probably OK.

You can use HZ to work with different configurations.  It doesn't help
much if your target is 1 ms, but I think that it's possible to set it to
something longer than that in the future.  So, some general definition
looks better.

Best Regards,
Huang, Ying

> Suggested-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v2:
> - Keep unbounded delay in "SYNC", delay with a timeout in "SYNC_LIGHT"
>
>  mm/migrate.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index db3f154446af..60982df71a93 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -58,6 +58,23 @@
>  
>  #include "internal.h"
>  
> +/* Returns the schedule timeout for a non-async mode */
> +static long timeout_for_mode(enum migrate_mode mode)
> +{
> +	/*
> +	 * For MIGRATE_SYNC_LIGHT we return 1 jiffy as the timeout. Since all
> +	 * places using this timeout are in a retry loop this means that the
> +	 * maximum time we might block is actually NR_MAX_MIGRATE_SYNC_RETRY
> +	 * jiffies. If a jiffy is 1 ms that's 7 ms, though with the accuracy
> +	 * of the timeouts it often ends up more like 14 ms; if a jiffy is 10 ms
> +	 * that's 70-140 ms.
> +	 */
> +	if (mode == MIGRATE_SYNC_LIGHT)
> +		return 1;
> +
> +	return MAX_SCHEDULE_TIMEOUT;
> +}
> +
>  bool isolate_movable_page(struct page *page, isolate_mode_t mode)
>  {
>  	struct folio *folio = folio_get_nontail_page(page);
> @@ -1162,7 +1179,8 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>  		if (current->flags & PF_MEMALLOC)
>  			goto out;
>  
> -		folio_lock(src);
> +		if (folio_lock_timeout(src, timeout_for_mode(mode)))
> +			goto out;
>  	}
>  	locked = true;

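One hedged way to read the HZ suggestion above is to state the bound in
wall-clock units and convert, e.g. as below (the 2 ms constant and its name
are invented here; msecs_to_jiffies() rounds up, so this still yields at
least 1 jiffy on any HZ):

    /* Illustrative only: a HZ-independent variant of timeout_for_mode(). */
    #define SYNC_LIGHT_LOCK_TIMEOUT_MS	2

    static long timeout_for_mode(enum migrate_mode mode)
    {
            if (mode == MIGRATE_SYNC_LIGHT)
                    return msecs_to_jiffies(SYNC_LIGHT_LOCK_TIMEOUT_MS);

            return MAX_SCHEDULE_TIMEOUT;
    }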

* Re: [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout()
       [not found]   ` <20230423081203.1812-1-hdanton@sina.com>
@ 2023-04-23  8:35     ` Gao Xiang
       [not found]     ` <20230423094901.1867-1-hdanton@sina.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Gao Xiang @ 2023-04-23  8:35 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Andrew Morton, Douglas Anderson, Mel Gorman, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao,
	Matthew Wilcox, linux-fsdevel, ying.huang

Hi Hillf,

On 2023/4/23 16:12, Hillf Danton wrote:
> On 23 Apr 2023 14:08:49 +0800 Gao Xiang <hsiangkao@linux.alibaba.com>
>> On 2023/4/22 13:18, Hillf Danton wrote:
>>> On 21 Apr 2023 15:12:45 -0700 Douglas Anderson <dianders@chromium.org>
>>>> Add a variant of folio_lock() that can time out. This is useful to
>>>> avoid unbounded waits for the page lock in kcompactd.
>>>
>>> Given no mutex_lock_timeout() (perhaps because a timeout makes no sense
>>> for a spinlock), I suspect your fix lies in the right layer. If waiting
>>> for pages under IO causes trouble for you, another simpler option is to
>>> make IO faster (perhaps all you can do), for instance. If kcompactd is
>>> woken up by kswapd, waiting for slow IO is the right thing to do.
>>
>> A bit off topic.  That is almost our original initial use scenario for
> 
> Thanks for taking a look.
> 
>> EROFS [1], although we didn't actually test Chrome OS; there are four
>> points:
>>
>>    1) a 128KiB compressed size unit is not suitable for memory-constrained
>>       workloads, especially memory pressure scenarios, since it amplifies
>>       both I/Os and memory footprints (EROFS was initially optimized with
>>       4KiB pclusters);
> 
> Feel free to take another one at 2M THP [1].
> 
> [1] https://lore.kernel.org/lkml/20230418191313.268131-1-hannes@cmpxchg.org/

Honestly I don't catch your point here; does THP have some relationship with
this?  Almost all smartphones (though I honestly don't know about
Chromebooks) didn't use THP at that time.

>>
>>    2) If you turn to a small compressed size (e.g. 4 KiB), some filesystems
>>       behave ineffectively since their on-disk compressed index isn't
>>       designed for random access (it needs another in-memory cache for
>>       that), so you have to count one by one to calculate the physical
>>       data offset on a cache miss;
>>
>>    3) compressed data needs extra memory during I/O (especially on
>>       low-end devices), which makes these cases worse, and our camera app
>>       workload once couldn't be properly launched under heavy memory
>>       pressure; but in order to keep the best user experience we have to
>>       keep as many apps active as possible, so it's hard to kill apps
>>       directly.  So in-place I/O + decompression is needed in addition to
>>       small compressed sizes for overall performance.
> 
> Frankly nowadays I have no interest in running linux with <16M RAM for example.

Our cases were tested on 2016-2018 devices with 3 to 6 GB of memory; if you
take a glance at the original ATC paper, page 9 (section 5.1) says:
"However, it costed too much CPU and memory resources, and when trying to
  run a camera application, the phone froze for tens of seconds before it
  finally failed."

I have no idea where the 16M RAM here comes from, but smartphones don't have
such limited memory.  In brief, if you run few apps, you have few problems,
but as long as you keep more apps in the background (and running), the
memory will eventually come under pressure.

>>
>>    4) If considering real-time performance, some algorithms are not quite
>>       suitable for extreme pressure cases;
> 
> Nor in considering any perf under extreme memory pressure (16M or 64G RAM),
> because that's purely a waste of time.

Personally I don't think so: if you'd like to land an effective compression
approach for end users and avoid user complaints (app lagging, app freezes,
etc.), I think these all need to be considered in practice.

Thanks,
Gao Xiang

>>
>>    etc.
>>
>> I could give more details about this at this year's LSF/MM, although it's
>> not a new topic and I'm not an Android guy now.
> 
> Did you book the air ticket? How many bucks?
>>
>> [1] https://www.usenix.org/conference/atc19/presentation/gao
>>
>> Thanks,
>> Gao Xiang


* Re: [PATCH v2 2/4] buffer: Add lock_buffer_timeout()
  2023-04-21 22:12 ` [PATCH v2 2/4] buffer: Add lock_buffer_timeout() Douglas Anderson
@ 2023-04-23  8:47   ` Huang, Ying
  0 siblings, 0 replies; 14+ messages in thread
From: Huang, Ying @ 2023-04-23  8:47 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao, linux-fsdevel,
	Matthew Wilcox, Bart Van Assche, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Jan Kara, Juri Lelli, Mel Gorman, Mikulas Patocka, Peter Zijlstra,
	Ritesh Harjani (IBM), Steven Rostedt, Valentin Schneider,
	Vincent Guittot, Will Deacon, Zhang Yi

Douglas Anderson <dianders@chromium.org> writes:

> Add a variant of lock_buffer() that can time out. This is useful to
> avoid unbounded waits for the buffer lock in kcompactd.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v2:
> - "Add lock_buffer_timeout()" new for v2.
>
>  fs/buffer.c                 |  7 +++++++
>  include/linux/buffer_head.h | 10 ++++++++++
>  include/linux/wait_bit.h    | 24 ++++++++++++++++++++++++
>  kernel/sched/wait_bit.c     | 14 ++++++++++++++
>  4 files changed, 55 insertions(+)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 9e1e2add541e..fcd19c270024 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -71,6 +71,13 @@ void __lock_buffer(struct buffer_head *bh)
>  }
>  EXPORT_SYMBOL(__lock_buffer);
>  
> +int __lock_buffer_timeout(struct buffer_head *bh, unsigned long timeout)
> +{
> +	return wait_on_bit_lock_io_timeout(&bh->b_state, BH_Lock,
> +					   TASK_UNINTERRUPTIBLE, timeout);
> +}
> +EXPORT_SYMBOL(__lock_buffer_timeout);
> +
>  void unlock_buffer(struct buffer_head *bh)
>  {
>  	clear_bit_unlock(BH_Lock, &bh->b_state);
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index 8f14dca5fed7..2bae464f89d5 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -237,6 +237,7 @@ struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
>  void free_buffer_head(struct buffer_head * bh);
>  void unlock_buffer(struct buffer_head *bh);
>  void __lock_buffer(struct buffer_head *bh);
> +int __lock_buffer_timeout(struct buffer_head *bh, unsigned long timeout);
>  int sync_dirty_buffer(struct buffer_head *bh);
>  int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
>  void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
> @@ -400,6 +401,15 @@ static inline void lock_buffer(struct buffer_head *bh)
>  		__lock_buffer(bh);
>  }
>  
> +static inline int lock_buffer_timeout(struct buffer_head *bh,
> +				      unsigned long timeout)
> +{
> +	might_sleep();
> +	if (!trylock_buffer(bh))
> +		return __lock_buffer_timeout(bh, timeout);
> +	return 0;
> +}
> +

Add documentation about the return value of lock_buffer_timeout()?

Otherwise looks good to me.

Best Regards,
Huang, Ying

>  static inline struct buffer_head *getblk_unmovable(struct block_device *bdev,
>  						   sector_t block,
>  						   unsigned size)
> diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
> index 7725b7579b78..33f0f60b1c8c 100644
> --- a/include/linux/wait_bit.h
> +++ b/include/linux/wait_bit.h
> @@ -30,6 +30,7 @@ void wake_up_bit(void *word, int bit);
>  int out_of_line_wait_on_bit(void *word, int, wait_bit_action_f *action, unsigned int mode);
>  int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
>  int out_of_line_wait_on_bit_lock(void *word, int, wait_bit_action_f *action, unsigned int mode);
> +int out_of_line_wait_on_bit_lock_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
>  struct wait_queue_head *bit_waitqueue(void *word, int bit);
>  extern void __init wait_bit_init(void);
>  
> @@ -208,6 +209,29 @@ wait_on_bit_lock_io(unsigned long *word, int bit, unsigned mode)
>  	return out_of_line_wait_on_bit_lock(word, bit, bit_wait_io, mode);
>  }
>  
> +/**
> + * wait_on_bit_lock_io_timeout - wait_on_bit_lock_io() with a timeout
> + * @word: the word being waited on, a kernel virtual address
> + * @bit: the bit of the word being waited on
> + * @mode: the task state to sleep in
> + * @timeout: the timeout in jiffies; %MAX_SCHEDULE_TIMEOUT means wait forever
> + *
> + * Returns zero if the bit was (eventually) found to be clear and was
> + * set.  Returns non-zero if a timeout happened or a signal was delivered to
> + * the process and the @mode allows that signal to wake the process.
> + */
> +static inline int
> +wait_on_bit_lock_io_timeout(unsigned long *word, int bit, unsigned mode,
> +			    unsigned long timeout)
> +{
> +	might_sleep();
> +	if (!test_and_set_bit(bit, word))
> +		return 0;
> +	return out_of_line_wait_on_bit_lock_timeout(word, bit,
> +						    bit_wait_io_timeout,
> +						    mode, timeout);
> +}
> +
>  /**
>   * wait_on_bit_lock_action - wait for a bit to be cleared, when wanting to set it
>   * @word: the word being waited on, a kernel virtual address
> diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
> index 0b1cd985dc27..629acd1c6c79 100644
> --- a/kernel/sched/wait_bit.c
> +++ b/kernel/sched/wait_bit.c
> @@ -118,6 +118,20 @@ int __sched out_of_line_wait_on_bit_lock(void *word, int bit,
>  }
>  EXPORT_SYMBOL(out_of_line_wait_on_bit_lock);
>  
> +int __sched out_of_line_wait_on_bit_lock_timeout(void *word, int bit,
> +						 wait_bit_action_f *action,
> +						 unsigned mode,
> +						 unsigned long timeout)
> +{
> +	struct wait_queue_head *wq_head = bit_waitqueue(word, bit);
> +	DEFINE_WAIT_BIT(wq_entry, word, bit);
> +
> +	wq_entry.key.timeout = jiffies + timeout;
> +
> +	return __wait_on_bit_lock(wq_head, &wq_entry, action, mode);
> +}
> +EXPORT_SYMBOL(out_of_line_wait_on_bit_lock_timeout);
> +
>  void __wake_up_bit(struct wait_queue_head *wq_head, void *word, int bit)
>  {
>  	struct wait_bit_key key = __WAIT_BIT_KEY_INITIALIZER(word, bit);

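A kernel-doc block along the lines being requested could look like the
following sketch (wording is only a suggestion, mirroring folio_lock_timeout()
in patch 1; this path waits in TASK_UNINTERRUPTIBLE, so the only failure mode
is a timeout):

    /**
     * lock_buffer_timeout() - Lock this buffer head, with a timeout.
     * @bh: The buffer head to lock.
     * @timeout: The timeout in jiffies; %MAX_SCHEDULE_TIMEOUT means wait forever.
     *
     * Return: 0 if the lock was acquired; non-zero if the wait timed out.
     */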

* Re: [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout()
       [not found]     ` <20230423094901.1867-1-hdanton@sina.com>
@ 2023-04-23 10:45       ` Gao Xiang
  0 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang @ 2023-04-23 10:45 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Andrew Morton, Douglas Anderson, Mel Gorman, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao,
	Matthew Wilcox, linux-fsdevel, ying.huang



On 2023/4/23 17:49, Hillf Danton wrote:
> On 23 Apr 2023 16:35:26 +0800 Gao Xiang <hsiangkao@linux.alibaba.com>
>> On 2023/4/23 16:12, Hillf Danton wrote:
>>> On 23 Apr 2023 14:08:49 +0800 Gao Xiang <hsiangkao@linux.alibaba.com>
>>>> On 2023/4/22 13:18, Hillf Danton wrote:
>>>>> On 21 Apr 2023 15:12:45 -0700 Douglas Anderson <dianders@chromium.org>
>>>>>> Add a variant of folio_lock() that can time out. This is useful to
>>>>>> avoid unbounded waits for the page lock in kcompactd.
>>>>>
>>>>> Given no mutex_lock_timeout() (perhaps because a timeout makes no sense
>>>>> for a spinlock), I suspect your fix lies in the right layer. If waiting
>>>>> for pages under IO causes trouble for you, another simpler option is to
>>>>> make IO faster (perhaps all you can do), for instance. If kcompactd is
>>>>> woken up by kswapd, waiting for slow IO is the right thing to do.
>>>>
>>>> A bit off topic.  That is almost our original initial use scenario for
>>>
>>> Thanks for taking a look.
>>>
>>>> EROFS [1], although we didn't actually test Chrome OS; there are four
>>>> points:
>>>>
>>>>     1) a 128KiB compressed size unit is not suitable for memory-constrained
>>>>        workloads, especially memory pressure scenarios, since it amplifies
>>>>        both I/Os and memory footprints (EROFS was initially optimized with
>>>>        4KiB pclusters);
>>>
>>> Feel free to take another one at 2M THP [1].
>>>
>>> [1] https://lore.kernel.org/lkml/20230418191313.268131-1-hannes@cmpxchg.org/
>>
>> Honestly I don't catch your point here; does THP have some relationship with
> 
> THP tests ended without the help of timeout helpers.
> 
>> this?  Almost all smartphones (though I honestly don't know about
>> Chromebooks) didn't use THP at that time.
>>>>
>>>>     2) If you turn to a small compressed size (e.g. 4 KiB), some
>>>>        filesystems behave ineffectively since their on-disk compressed
>>>>        index isn't designed for random access (it needs another in-memory
>>>>        cache for that), so you have to count one by one to calculate the
>>>>        physical data offset on a cache miss;
>>>>
>>>>     3) compressed data needs extra memory during I/O (especially on
>>>>        low-end devices), which makes these cases worse, and our camera app
>>>>        workload once couldn't be properly launched under heavy memory
>>>>        pressure; but in order to keep the best user experience we have to
>>>>        keep as many apps active as possible, so it's hard to kill apps
>>>>        directly.  So in-place I/O + decompression is needed in addition to
>>>>        small compressed sizes for overall performance.
>>>
>>> Frankly nowadays I have no interest in running linux with <16M RAM for example.
>>
>> Our cases were tested on 2016-2018 devices with 3 to 6 GB of memory; if you
>> take a glance at the original ATC paper, page 9 (section 5.1) says:
>> "However, it costed too much CPU and memory resources, and when trying to
>>   run a camera application, the phone froze for tens of seconds before it
>>   finally failed."
>>
>> I have no idea where the 16M RAM here comes from, but smartphones don't have
>> such limited memory.  In brief, if you run few apps, you have few problems,
>> but as long as you keep more apps in the background (and running), the
>> memory will eventually come under pressure.
> 
> Given no complaints in case of running 16 apps with 1G RAM for instance,
> what is the point of running 256 apps with the same RAM? And adding changes

I don't think the `ill-designed` label is helpful to the overall topic.

I'm not sure if my description was confusing:

  1) First, I never said running 256 apps with the same RAM.  In fact, in 2018
     there were indeed some phones still with 1G RAM.  If my memory is correct,
     such 1G phones couldn't run the 16 latest mainstream super apps at the
     same time smoothly, and compression previously made this worse.  Such
     phones cannot even use a full Android but a minimized Android Go [1]
     instead.  The worst case I've heard on phones with 1G RAM would be "after
     you check a new message from friends in a superapp by switching out,
     another previous one with some incomplete registration form could be
     killed and you have to restart it and refill the form."

  2) Apps and the base OS can be upgraded over time, especially apps, since
     the Android ecosystem is open.  It's hard to get around that.

Thanks,
Gao Xiang

> because of ill designed phone products?

[1] https://developer.android.com/guide/topics/androidgo


* Re: [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout()
  2023-04-21 22:12 ` [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout() Douglas Anderson
  2023-04-23  7:50   ` Huang, Ying
@ 2023-04-24  8:22   ` Mel Gorman
  2023-04-24 16:22     ` Doug Anderson
  1 sibling, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2023-04-24  8:22 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao, linux-fsdevel,
	Matthew Wilcox

On Fri, Apr 21, 2023 at 03:12:45PM -0700, Douglas Anderson wrote:
> Add a variant of folio_lock() that can time out. This is useful to
> avoid unbounded waits for the page lock in kcompactd.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
> Changes in v2:
> - "Add folio_lock_timeout()" new for v2.
> 
>  include/linux/pagemap.h | 16 ++++++++++++++
>  mm/filemap.c            | 47 +++++++++++++++++++++++++++++------------
>  2 files changed, 50 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 0acb8e1fb7af..0f3ef9f79300 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -892,6 +892,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
>  }
>  
>  void __folio_lock(struct folio *folio);
> +int __folio_lock_timeout(struct folio *folio, long timeout);
>  int __folio_lock_killable(struct folio *folio);
>  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>  				unsigned int flags);
> @@ -952,6 +953,21 @@ static inline void folio_lock(struct folio *folio)
>  		__folio_lock(folio);
>  }
>  
> +/**
> + * folio_lock_timeout() - Lock this folio, with a timeout.
> + * @folio: The folio to lock.
> + * @timeout: The timeout in jiffies; %MAX_SCHEDULE_TIMEOUT means wait forever.
> + *
> + * Return: 0 upon success; -ETIMEDOUT upon failure.
> + */

May return -EINTR?

> +static inline int folio_lock_timeout(struct folio *folio, long timeout)
> +{
> +	might_sleep();
> +	if (!folio_trylock(folio))
> +		return __folio_lock_timeout(folio, timeout);
> +	return 0;
> +}
> +
>  /**
>   * lock_page() - Lock the folio containing this page.
>   * @page: The page to lock.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2723104cc06a..c6056ec41284 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1220,7 +1220,7 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
>  int sysctl_page_lock_unfairness = 5;
>  
>  static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
> -		int state, enum behavior behavior)
> +		int state, enum behavior behavior, long timeout)
>  {
>  	wait_queue_head_t *q = folio_waitqueue(folio);
>  	int unfairness = sysctl_page_lock_unfairness;
> @@ -1229,6 +1229,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  	bool thrashing = false;
>  	unsigned long pflags;
>  	bool in_thrashing;
> +	int err;
>  
>  	if (bit_nr == PG_locked &&
>  	    !folio_test_uptodate(folio) && folio_test_workingset(folio)) {
> @@ -1295,10 +1296,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  		/* Loop until we've been woken or interrupted */
>  		flags = smp_load_acquire(&wait->flags);
>  		if (!(flags & WQ_FLAG_WOKEN)) {
> +			if (!timeout)
> +				break;
> +

An io_schedule_timeout of 0 is valid so why the special handling? It's
negative timeouts that cause schedule_timeout() to complain.

>  			if (signal_pending_state(state, current))
>  				break;
>  
> -			io_schedule();
> +			timeout = io_schedule_timeout(timeout);
>  			continue;
>  		}
>  
> @@ -1324,10 +1328,10 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  	}
>  
>  	/*
> -	 * If a signal happened, this 'finish_wait()' may remove the last
> -	 * waiter from the wait-queues, but the folio waiters bit will remain
> -	 * set. That's ok. The next wakeup will take care of it, and trying
> -	 * to do it here would be difficult and prone to races.
> +	 * If a signal/timeout happened, this 'finish_wait()' may remove the
> +	 * last waiter from the wait-queues, but the folio waiters bit will
> +	 * remain set. That's ok. The next wakeup will take care of it, and
> +	 * trying to do it here would be difficult and prone to races.
>  	 */
>  	finish_wait(q, wait);
>  
> @@ -1336,6 +1340,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  		psi_memstall_leave(&pflags);
>  	}
>  
> +	/*
> +	 * If we don't meet the success criteria below then we've got an error
> +	 * of some sort. Differentiate between the two error cases. If there's
> +	 * no time left it must have been a timeout.
> +	 */
> +	err = !timeout ? -ETIMEDOUT : -EINTR;
> +
>  	/*
>  	 * NOTE! The wait->flags weren't stable until we've done the
>  	 * 'finish_wait()', and we could have exited the loop above due
> @@ -1350,9 +1361,9 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>  	 * waiter, but an exclusive one requires WQ_FLAG_DONE.
>  	 */
>  	if (behavior == EXCLUSIVE)
> -		return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR;
> +		return wait->flags & WQ_FLAG_DONE ? 0 : err;
>  
> -	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
> +	return wait->flags & WQ_FLAG_WOKEN ? 0 : err;
>  }
>  
>  #ifdef CONFIG_MIGRATION
> @@ -1442,13 +1453,15 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>  
>  void folio_wait_bit(struct folio *folio, int bit_nr)
>  {
> -	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
> +	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED,
> +			      MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL(folio_wait_bit);
>  
>  int folio_wait_bit_killable(struct folio *folio, int bit_nr)
>  {
> -	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
> +	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED,
> +				     MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL(folio_wait_bit_killable);
>  
> @@ -1467,7 +1480,8 @@ EXPORT_SYMBOL(folio_wait_bit_killable);
>   */
>  static int folio_put_wait_locked(struct folio *folio, int state)
>  {
> -	return folio_wait_bit_common(folio, PG_locked, state, DROP);
> +	return folio_wait_bit_common(folio, PG_locked, state, DROP,
> +				     MAX_SCHEDULE_TIMEOUT);
>  }
>  
>  /**
> @@ -1662,17 +1676,24 @@ EXPORT_SYMBOL_GPL(page_endio);
>  void __folio_lock(struct folio *folio)
>  {
>  	folio_wait_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
> -				EXCLUSIVE);
> +				EXCLUSIVE, MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL(__folio_lock);
>  
>  int __folio_lock_killable(struct folio *folio)
>  {
>  	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
> -					EXCLUSIVE);
> +					EXCLUSIVE, MAX_SCHEDULE_TIMEOUT);
>  }
>  EXPORT_SYMBOL_GPL(__folio_lock_killable);
>  
> +int __folio_lock_timeout(struct folio *folio, long timeout)
> +{
> +	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
> +					EXCLUSIVE, timeout);
> +}
> +EXPORT_SYMBOL_GPL(__folio_lock_timeout);
> +
>  static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>  {
>  	struct wait_queue_head *q = folio_waitqueue(folio);
> -- 
> 2.40.0.634.g4ca3ef3211-goog
> 

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT
  2023-04-21 22:12 ` [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT Douglas Anderson
  2023-04-23  7:59   ` Huang, Ying
@ 2023-04-24  9:38   ` Mel Gorman
  1 sibling, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2023-04-24  9:38 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao, linux-fsdevel,
	Matthew Wilcox

On Fri, Apr 21, 2023 at 03:12:47PM -0700, Douglas Anderson wrote:
> The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
> finish quickly but not for things that will take a long time. Exactly
> how long is too long is not well defined, but waits of tens of
> milliseconds is likely non-ideal.
> 
> Waiting on the folio lock in isolate_movable_page() is something that
> usually is pretty quick, but is not officially bounded. Nothing stops
> another process from holding a folio lock while doing an expensive
> operation. Having an unbounded wait like this is not within the design
> goals of MIGRATE_SYNC_LIGHT.
> 
> When putting a Chromebook under memory pressure (opening over 90 tabs
> on a 4GB machine) it was fairly easy to see delays of > 100 ms
> waiting for the lock. While the laptop wasn't amazingly usable in this
> state, it was still limping along and this state isn't something
> artificial. Sometimes we simply end up with a lot of memory pressure.
> 
> Putting the same Chromebook under memory pressure while it was running
> Android apps (though not stressing them) showed a much worse result
> (NOTE: this was on an older kernel but the codepaths here are
> similar). Android apps on ChromeOS currently run from a 128K-block,
> zlib-compressed, loopback-mounted squashfs disk. If we get a page
> fault from something backed by the squashfs filesystem we could end up
> holding a folio lock while reading enough from disk to decompress 128K
> (and then decompressing it using the somewhat slow zlib algorithms).
> That reading goes through the ext4 subsystem (because it's a loopback
> mount) before eventually ending up in the block subsystem. This extra
> jaunt adds overhead. Without much work I could see cases where
> we ended up blocked on a folio lock for over a second. With more
> extreme memory pressure I could see up to 25 seconds.
> 
> Let's bound the amount of time we can wait for the folio lock. The
> SYNC_LIGHT migration mode can already handle failure for things that
> are slow, so adding this timeout in is fairly straightforward.
> 
> With this timeout, it can be seen that kcompactd can move on to more
> productive tasks if it's taking a long time to acquire a lock.
> 
> NOTE: The reason I started digging into this isn't because some
> benchmark had gone awry, but because we've received in-the-field crash
> reports where we have a hung task waiting on the page lock (which is
> the equivalent code path on old kernels). While the root cause of
> those crashes is likely unrelated and won't be fixed by this patch,
> analyzing those crash reports did point out this unbounded wait and it
> seemed like something good to fix.
> 
> ALSO NOTE: the timeout mechanism used here uses "jiffies" and we also
> will retry up to 7 times. That doesn't give us much accuracy in
> specifying the timeout. On 1000 Hz machines we'll end up timing out in
> 7-14 ms. On 100 Hz machines we'll end up in 70-140 ms. Given that we
> don't have a strong definition of how long "too long" is, this is
> probably OK.
> 
> Suggested-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
> Changes in v2:
> - Keep unbounded delay in "SYNC", delay with a timeout in "SYNC_LIGHT"
> 
>  mm/migrate.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index db3f154446af..60982df71a93 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -58,6 +58,23 @@
>  
>  #include "internal.h"
>  
> +/* Returns the schedule timeout for a non-async mode */
> +static long timeout_for_mode(enum migrate_mode mode)
> +{
> +	/*
> +	 * We'll always return 1 jiffy as the timeout. Since all places using
> +	 * this timeout are in a retry loop this means that the maximum time
> +	 * we might block is actually NR_MAX_MIGRATE_SYNC_RETRY jiffies.
> +	 * If a jiffy is 1 ms that's 7 ms, though with the accuracy of the
> +	 * timeouts it often ends up more like 14 ms; if a jiffy is 10 ms
> +	 * that's 70-140 ms.
> +	 */
> +	if (mode == MIGRATE_SYNC_LIGHT)
> +		return 1;
> +

Use a switch statement, with a WARN_ON_ONCE() for MIGRATE_ASYNC and a
fallthrough to MIGRATE_SYNC_LIGHT?
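
For illustration, a minimal untested sketch of that suggestion, using
the migrate_mode values from this patch plus the kernel's
WARN_ON_ONCE()/fallthrough helpers:

	switch (mode) {
	case MIGRATE_ASYNC:
		/* Async callers should never reach a blocking wait. */
		WARN_ON_ONCE(1);
		fallthrough;
	case MIGRATE_SYNC_LIGHT:
		/* Bounded wait: 1 jiffy per attempt; the caller retries. */
		return 1;
	default:
		return MAX_SCHEDULE_TIMEOUT;
	}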

> +	return MAX_SCHEDULE_TIMEOUT;
> +}
> +

Even though HZ is defined at compile time, it is undesirable to use
a constant timeout unrelated to HZ because its real duration varies
depending on CONFIG_HZ.  Please use a value like DIV_ROUND_UP(HZ, 250) or
DIV_ROUND_UP(HZ, 1000) for a 4ms or 1ms timeout respectively. Even though
it's still potentially variable, it would make any hypothetical transition
to [milli|micro|nano]seconds easier in the future as the intent would be
known. While there are no plans for change as such, working in jiffies is
occasionally problematic in kernel/sched/. At OSPM this year, the notion
of dynamic HZ was brought up (it would be hard) and a preliminary step
would be converting all uses of HZ to normal time.
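
For example, purely as a sketch of that intent (untested; the ~4ms bound
is picked arbitrarily):

	if (mode == MIGRATE_SYNC_LIGHT)
		return DIV_ROUND_UP(HZ, 250);	/* ~4ms worth of jiffies */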

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout()
  2023-04-24  8:22   ` Mel Gorman
@ 2023-04-24 16:22     ` Doug Anderson
  2023-04-25  8:00       ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: Doug Anderson @ 2023-04-24 16:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao, linux-fsdevel,
	Matthew Wilcox

Hi,

On Mon, Apr 24, 2023 at 1:22 AM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> > @@ -1295,10 +1296,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
> >               /* Loop until we've been woken or interrupted */
> >               flags = smp_load_acquire(&wait->flags);
> >               if (!(flags & WQ_FLAG_WOKEN)) {
> > +                     if (!timeout)
> > +                             break;
> > +
>
> An io_schedule_timeout of 0 is valid so why the special handling? It's
> negative timeouts that cause schedule_timeout() to complain.

It's not expected that the caller passes in a timeout of 0 here. The
test here actually handles the case that the previous call to
io_schedule_timeout() returned 0. In my patch, after the call to
io_schedule_timeout() we unconditionally "continue" and end up back at
the top of the loop. The next time through the loop if we don't see
the WOKEN flag then we'll check for the two "error" conditions
(timeout or signal pending) and break for either of them.

To make it clearer, I'll add this comment for the next version:

/* Break if the last io_schedule_timeout() said no time left */
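
With that comment in place, the relevant part of the loop would read
roughly like this (abbreviated from the patch above):

	for (;;) {
		flags = smp_load_acquire(&wait->flags);
		if (!(flags & WQ_FLAG_WOKEN)) {
			/* Break if the last io_schedule_timeout() said no time left */
			if (!timeout)
				break;

			if (signal_pending_state(state, current))
				break;

			timeout = io_schedule_timeout(timeout);
			continue;
		}
		...
	}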

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout()
  2023-04-24 16:22     ` Doug Anderson
@ 2023-04-25  8:00       ` Mel Gorman
  0 siblings, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2023-04-25  8:00 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Andrew Morton, Vlastimil Babka, Ying, Alexander Viro,
	Christian Brauner, linux-kernel, linux-mm, Yu Zhao, linux-fsdevel,
	Matthew Wilcox

On Mon, Apr 24, 2023 at 09:22:55AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Apr 24, 2023 at 1:22 AM Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > > @@ -1295,10 +1296,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
> > >               /* Loop until we've been woken or interrupted */
> > >               flags = smp_load_acquire(&wait->flags);
> > >               if (!(flags & WQ_FLAG_WOKEN)) {
> > > +                     if (!timeout)
> > > +                             break;
> > > +
> >
> > An io_schedule_timeout of 0 is valid so why the special handling? It's
> > negative timeouts that cause schedule_timeout() to complain.
> 
> It's not expected that the caller passes in a timeout of 0 here. The
> test here actually handles the case that the previous call to
> io_schedule_timeout() returned 0. In my patch, after the call to
> io_schedule_timeout() we unconditionally "continue" and end up back at
> the top of the loop. The next time through the loop if we don't see
> the WOKEN flag then we'll check for the two "error" conditions
> (timeout or signal pending) and break for either of them.

Ah, I see!

> 
> To make it clearer, I'll add this comment for the next version:
> 
> /* Break if the last io_schedule_timeout() said no time left */

Yes please.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-04-25  8:01 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-21 22:12 [PATCH v2 0/4] migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT Douglas Anderson
2023-04-21 22:12 ` [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout() Douglas Anderson
2023-04-23  7:50   ` Huang, Ying
2023-04-24  8:22   ` Mel Gorman
2023-04-24 16:22     ` Doug Anderson
2023-04-25  8:00       ` Mel Gorman
2023-04-21 22:12 ` [PATCH v2 2/4] buffer: Add lock_buffer_timeout() Douglas Anderson
2023-04-23  8:47   ` Huang, Ying
2023-04-21 22:12 ` [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT Douglas Anderson
2023-04-23  7:59   ` Huang, Ying
2023-04-24  9:38   ` Mel Gorman
2023-04-21 22:12 ` [PATCH v2 4/4] migrate_pages: Don't wait forever locking buffers " Douglas Anderson
     [not found] ` <20230422051858.1696-1-hdanton@sina.com>
     [not found]   ` <20230423081203.1812-1-hdanton@sina.com>
2023-04-23  8:35     ` [PATCH v2 1/4] mm/filemap: Add folio_lock_timeout() Gao Xiang
     [not found]     ` <20230423094901.1867-1-hdanton@sina.com>
2023-04-23 10:45       ` Gao Xiang
