From: Matthew Wilcox <willy@infradead.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Hugh Dickins <hughd@google.com>
Subject: Re: writeback completion soft lockup BUG in folio_wake_bit()
Date: Wed, 16 Mar 2022 20:59:39 +0000 [thread overview]
Message-ID: <YjJPu/3tYnuKK888@casper.infradead.org> (raw)
In-Reply-To: <YjDj3lvlNJK/IPiU@bfoster>
On Tue, Mar 15, 2022 at 03:07:10PM -0400, Brian Foster wrote:
> What seems to happen is that the majority of the fsync calls end up
> waiting on writeback of a particular page, the wakeup of the writeback
> bit on that page wakes a task that immediately resets PG_writeback on
> the page such that N other folio_wait_writeback() waiters see the bit
> still set and immediately place themselves back onto the tail of the
> wait queue. Meanwhile the waker task spins in the WQ_FLAG_BOOKMARK loop
> in folio_wake_bit() (backing off the lock for a cycle or so in each
> iteration) only to find the same bunch of tasks in the queue. This
> process repeats for a long enough amount of time to trigger the soft
> lockup warning. I've confirmed this spinning behavior with a tracepoint
> in the bookmark loop that indicates we're stuck for many hundreds of
> thousands of iterations (at least) of this loop when the soft lockup
> warning triggers.
[...]
> I've run a few quick experiments to try and corroborate this analysis.
> The problem goes away completely if I either back out the loop change in
> folio_wait_writeback() or bump WAITQUEUE_WALK_BREAK_CNT to something
> like 128 (i.e. greater than the total possible number of waiter tasks in
> this test). I've also played a few games with bookmark behavior mostly
> out of curiosity, but usually end up introducing other problems like
> missed wakeups, etc.
As I recall, the bookmark hack was introduced in order to handle
lock_page() problems. It wasn't really supposed to handle writeback,
but nobody thought it would cause any harm (and indeed, it didn't at the
time). So how about we only use bookmarks for lock_page(), since
lock_page() usually doesn't have the multiple-waker semantics that
writeback has?
(this is more in the spirit of "minimal patch" -- I think initialising
the bookmark should be moved to folio_unlock()).
diff --git a/mm/filemap.c b/mm/filemap.c
index b2728eb52407..9ee3c5f1f489 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1146,26 +1146,28 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
return (flags & WQ_FLAG_EXCLUSIVE) != 0;
}
-static void folio_wake_bit(struct folio *folio, int bit_nr)
+static void folio_wake_bit(struct folio *folio, int bit_nr,
+ wait_queue_entry_t *bookmark)
{
wait_queue_head_t *q = folio_waitqueue(folio);
struct wait_page_key key;
unsigned long flags;
- wait_queue_entry_t bookmark;
key.folio = folio;
key.bit_nr = bit_nr;
key.page_match = 0;
- bookmark.flags = 0;
- bookmark.private = NULL;
- bookmark.func = NULL;
- INIT_LIST_HEAD(&bookmark.entry);
+ if (bookmark) {
+ bookmark->flags = 0;
+ bookmark->private = NULL;
+ bookmark->func = NULL;
+ INIT_LIST_HEAD(&bookmark->entry);
+ }
spin_lock_irqsave(&q->lock, flags);
- __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
+ __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark);
- while (bookmark.flags & WQ_FLAG_BOOKMARK) {
+ while (bookmark && (bookmark->flags & WQ_FLAG_BOOKMARK)) {
/*
* Take a breather from holding the lock,
* allow pages that finish wake up asynchronously
@@ -1175,7 +1177,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
spin_unlock_irqrestore(&q->lock, flags);
cpu_relax();
spin_lock_irqsave(&q->lock, flags);
- __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
+ __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark);
}
/*
@@ -1204,7 +1206,7 @@ static void folio_wake(struct folio *folio, int bit)
{
if (!folio_test_waiters(folio))
return;
- folio_wake_bit(folio, bit);
+ folio_wake_bit(folio, bit, NULL);
}
/*
@@ -1554,12 +1556,15 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
*/
void folio_unlock(struct folio *folio)
{
+ wait_queue_entry_t bookmark;
+
/* Bit 7 allows x86 to check the byte's sign bit */
BUILD_BUG_ON(PG_waiters != 7);
BUILD_BUG_ON(PG_locked > 7);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
- folio_wake_bit(folio, PG_locked);
+ folio_wake_bit(folio, PG_locked, &bookmark);
}
EXPORT_SYMBOL(folio_unlock);
@@ -1578,7 +1583,7 @@ void folio_end_private_2(struct folio *folio)
{
VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
- folio_wake_bit(folio, PG_private_2);
+ folio_wake_bit(folio, PG_private_2, NULL);
folio_put(folio);
}
EXPORT_SYMBOL(folio_end_private_2);
next prev parent reply other threads:[~2022-03-16 20:59 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-15 19:07 writeback completion soft lockup BUG in folio_wake_bit() Brian Foster
2022-03-16 20:59 ` Matthew Wilcox [this message]
2022-03-16 23:35 ` Linus Torvalds
2022-03-17 15:04 ` Matthew Wilcox
2022-03-17 19:26 ` Linus Torvalds
2022-03-17 21:16 ` Matthew Wilcox
2022-03-17 22:52 ` Dave Chinner
2022-03-18 13:16 ` Jan Kara
2022-03-18 18:56 ` Linus Torvalds
2022-03-19 16:23 ` Theodore Ts'o
2022-03-30 15:55 ` Christoph Hellwig
2022-03-17 15:31 ` Brian Foster
2022-03-17 13:51 ` Brian Foster
2022-03-18 14:14 ` Brian Foster
2022-03-18 14:45 ` Matthew Wilcox
2022-03-18 18:58 ` Linus Torvalds
2022-10-20 1:35 ` Dan Williams
2022-10-23 22:38 ` Linus Torvalds
2022-10-24 19:39 ` Tim Chen
2022-10-24 19:43 ` Linus Torvalds
2022-10-24 20:14 ` Dan Williams
2022-10-24 20:13 ` Dan Williams
2022-10-24 20:28 ` Linus Torvalds
2022-10-24 20:35 ` Dan Williams
2022-10-25 15:58 ` Arechiga Lopez, Jesus A
2022-10-25 19:19 ` Matthew Wilcox
2022-10-25 19:20 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YjJPu/3tYnuKK888@casper.infradead.org \
--to=willy@infradead.org \
--cc=bfoster@redhat.com \
--cc=hughd@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).