From: Brian Foster <bfoster@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Hugh Dickins <hughd@google.com>
Subject: Re: writeback completion soft lockup BUG in folio_wake_bit()
Date: Thu, 17 Mar 2022 09:51:44 -0400 [thread overview]
Message-ID: <YjM88OwoccZOKp86@bfoster> (raw)
In-Reply-To: <YjJPu/3tYnuKK888@casper.infradead.org>
On Wed, Mar 16, 2022 at 08:59:39PM +0000, Matthew Wilcox wrote:
> On Tue, Mar 15, 2022 at 03:07:10PM -0400, Brian Foster wrote:
> > What seems to happen is that the majority of the fsync calls end up
> > waiting on writeback of a particular page, the wakeup of the writeback
> > bit on that page wakes a task that immediately resets PG_writeback on
> > the page such that N other folio_wait_writeback() waiters see the bit
> > still set and immediately place themselves back onto the tail of the
> > wait queue. Meanwhile the waker task spins in the WQ_FLAG_BOOKMARK loop
> > in folio_wake_bit() (backing off the lock for a cycle or so in each
> > iteration) only to find the same bunch of tasks in the queue. This
> > process repeats for a long enough amount of time to trigger the soft
> > lockup warning. I've confirmed this spinning behavior with a tracepoint
> > in the bookmark loop that indicates we're stuck for many hundreds of
> > thousands of iterations (at least) of this loop when the soft lockup
> > warning triggers.
>
> [...]
>
> > I've run a few quick experiments to try and corroborate this analysis.
> > The problem goes away completely if I either back out the loop change in
> > folio_wait_writeback() or bump WAITQUEUE_WALK_BREAK_CNT to something
> > like 128 (i.e. greater than the total possible number of waiter tasks in
> > this test). I've also played a few games with bookmark behavior mostly
> > out of curiosity, but usually end up introducing other problems like
> > missed wakeups, etc.
>
> As I recall, the bookmark hack was introduced in order to handle
> lock_page() problems. It wasn't really supposed to handle writeback,
> but nobody thought it would cause any harm (and indeed, it didn't at the
> time). So how about we only use bookmarks for lock_page(), since
> lock_page() usually doesn't have the multiple-waker semantics that
> writeback has?
>
Oh, interesting. I wasn't aware of the tenuous status of the bookmark
code. This is indeed much nicer than anything I was playing with. I
suspect it will address the problem, but I'll throw it at my test env
for a while and follow up.. thanks!
Brian
> (this is more in the spirit of "minimal patch" -- I think initialising
> the bookmark should be moved to folio_unlock()).
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index b2728eb52407..9ee3c5f1f489 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1146,26 +1146,28 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
> return (flags & WQ_FLAG_EXCLUSIVE) != 0;
> }
>
> -static void folio_wake_bit(struct folio *folio, int bit_nr)
> +static void folio_wake_bit(struct folio *folio, int bit_nr,
> + wait_queue_entry_t *bookmark)
> {
> wait_queue_head_t *q = folio_waitqueue(folio);
> struct wait_page_key key;
> unsigned long flags;
> - wait_queue_entry_t bookmark;
>
> key.folio = folio;
> key.bit_nr = bit_nr;
> key.page_match = 0;
>
> - bookmark.flags = 0;
> - bookmark.private = NULL;
> - bookmark.func = NULL;
> - INIT_LIST_HEAD(&bookmark.entry);
> + if (bookmark) {
> + bookmark->flags = 0;
> + bookmark->private = NULL;
> + bookmark->func = NULL;
> + INIT_LIST_HEAD(&bookmark->entry);
> + }
>
> spin_lock_irqsave(&q->lock, flags);
> - __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
> + __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark);
>
> - while (bookmark.flags & WQ_FLAG_BOOKMARK) {
> + while (bookmark && (bookmark->flags & WQ_FLAG_BOOKMARK)) {
> /*
> * Take a breather from holding the lock,
> * allow pages that finish wake up asynchronously
> @@ -1175,7 +1177,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
> spin_unlock_irqrestore(&q->lock, flags);
> cpu_relax();
> spin_lock_irqsave(&q->lock, flags);
> - __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
> + __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark);
> }
>
> /*
> @@ -1204,7 +1206,7 @@ static void folio_wake(struct folio *folio, int bit)
> {
> if (!folio_test_waiters(folio))
> return;
> - folio_wake_bit(folio, bit);
> + folio_wake_bit(folio, bit, NULL);
> }
>
> /*
> @@ -1554,12 +1556,15 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
> */
> void folio_unlock(struct folio *folio)
> {
> + wait_queue_entry_t bookmark;
> +
> /* Bit 7 allows x86 to check the byte's sign bit */
> BUILD_BUG_ON(PG_waiters != 7);
> BUILD_BUG_ON(PG_locked > 7);
> VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> +
> if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
> - folio_wake_bit(folio, PG_locked);
> + folio_wake_bit(folio, PG_locked, &bookmark);
> }
> EXPORT_SYMBOL(folio_unlock);
>
> @@ -1578,7 +1583,7 @@ void folio_end_private_2(struct folio *folio)
> {
> VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
> clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
> - folio_wake_bit(folio, PG_private_2);
> + folio_wake_bit(folio, PG_private_2, NULL);
> folio_put(folio);
> }
> EXPORT_SYMBOL(folio_end_private_2);
>
next prev parent reply other threads:[~2022-03-17 13:51 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-15 19:07 writeback completion soft lockup BUG in folio_wake_bit() Brian Foster
2022-03-16 20:59 ` Matthew Wilcox
2022-03-16 23:35 ` Linus Torvalds
2022-03-17 15:04 ` Matthew Wilcox
2022-03-17 19:26 ` Linus Torvalds
2022-03-17 21:16 ` Matthew Wilcox
2022-03-17 22:52 ` Dave Chinner
2022-03-18 13:16 ` Jan Kara
2022-03-18 18:56 ` Linus Torvalds
2022-03-19 16:23 ` Theodore Ts'o
2022-03-30 15:55 ` Christoph Hellwig
2022-03-17 15:31 ` Brian Foster
2022-03-17 13:51 ` Brian Foster [this message]
2022-03-18 14:14 ` Brian Foster
2022-03-18 14:45 ` Matthew Wilcox
2022-03-18 18:58 ` Linus Torvalds
2022-10-20 1:35 ` Dan Williams
2022-10-23 22:38 ` Linus Torvalds
2022-10-24 19:39 ` Tim Chen
2022-10-24 19:43 ` Linus Torvalds
2022-10-24 20:14 ` Dan Williams
2022-10-24 20:13 ` Dan Williams
2022-10-24 20:28 ` Linus Torvalds
2022-10-24 20:35 ` Dan Williams
2022-10-25 15:58 ` Arechiga Lopez, Jesus A
2022-10-25 19:19 ` Matthew Wilcox
2022-10-25 19:20 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YjM88OwoccZOKp86@bfoster \
--to=bfoster@redhat.com \
--cc=hughd@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).