From: Jan Kara <jack@suse.cz>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org, david@fromorbit.com,
fdmanana@kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/3] vfs: remove lockdep bogosity in __sb_start_write
Date: Tue, 10 Nov 2020 12:33:00 +0100 [thread overview]
Message-ID: <20201110113300.GC20780@quack2.suse.cz> (raw)
In-Reply-To: <160494581071.772573.10466314698408344068.stgit@magnolia>
On Mon 09-11-20 10:16:50, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> __sb_start_write has some weird looking lockdep code that claims to
> exist to handle nested freeze locking requests from xfs. The code as
> written seems broken -- if we think we hold a read lock on any of the
> higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
> lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
> trylock.
>
> However, it's not correct to downgrade a blocking lock attempt to a
> trylock unless the downgrading code or the callers are prepared to deal
> with that situation. Neither __sb_start_write nor its callers handle
> this at all. For example:
>
> sb_start_pagefault ignores the return value completely, with the result
> that if xfs_filemap_fault loses a race with a different thread trying to
> fsfreeze, it will proceed without pagefault freeze protection (thereby
> breaking locking rules) and then unlocks the pagefault freeze lock that
> it doesn't own on its way out (thereby corrupting the lock state), which
> leads to a system hang shortly afterwards.
>
> Normally, this won't happen because our ownership of a read lock on a
> higher freeze protection level blocks fsfreeze from grabbing a write
> lock on that higher level. *However*, if lockdep is offline,
> lock_is_held_type unconditionally returns 1, which means that
> percpu_rwsem_is_held returns 1, which means that __sb_start_write
> unconditionally converts blocking freeze lock attempts into trylocks,
> even when we *don't* hold anything that would block a fsfreeze.
>
> Apparently this all held together until 5.10-rc1, when bugs in lockdep
> caused lockdep to shut itself off early in an fstests run, and once
> fstests gets to the "race writes with freezer" tests, kaboom. This
> might explain the long trail of vanishingly infrequent livelocks in
> fstests after lockdep goes offline that I've never been able to
> diagnose.
>
> We could fix it by spinning on the trylock if wait==true, but AFAICT the
> locking works fine if lockdep is not built at all (and I didn't see any
> complaints running fstests overnight), so remove this snippet entirely.
>
> NOTE: Commit f4b554af9931 in 2015 created the current weird logic (which
> used to exist in a different form in commit 5accdf82ba25c from 2012) in
> __sb_start_write. XFS solved this whole problem in the late 2.6 era by
> creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
> grab intwrite freeze protection, thus making lockdep's solution
> unnecessary. The commit claims that Dave Chinner explained that the
> trylock hack + comment could be removed, but nobody ever did.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Thanks for cleaning this up. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/super.c | 33 ++++-----------------------------
> 1 file changed, 4 insertions(+), 29 deletions(-)
>
>
> diff --git a/fs/super.c b/fs/super.c
> index a51c2083cd6b..e1fd667454d4 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1647,36 +1647,11 @@ EXPORT_SYMBOL(__sb_end_write);
> */
> int __sb_start_write(struct super_block *sb, int level, bool wait)
> {
> - bool force_trylock = false;
> - int ret = 1;
> + if (!wait)
> + return percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
>
> -#ifdef CONFIG_LOCKDEP
> - /*
> - * We want lockdep to tell us about possible deadlocks with freezing
> - * but it's it bit tricky to properly instrument it. Getting a freeze
> - * protection works as getting a read lock but there are subtle
> - * problems. XFS for example gets freeze protection on internal level
> - * twice in some cases, which is OK only because we already hold a
> - * freeze protection also on higher level. Due to these cases we have
> - * to use wait == F (trylock mode) which must not fail.
> - */
> - if (wait) {
> - int i;
> -
> - for (i = 0; i < level - 1; i++)
> - if (percpu_rwsem_is_held(sb->s_writers.rw_sem + i)) {
> - force_trylock = true;
> - break;
> - }
> - }
> -#endif
> - if (wait && !force_trylock)
> - percpu_down_read(sb->s_writers.rw_sem + level-1);
> - else
> - ret = percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
> -
> - WARN_ON(force_trylock && !ret);
> - return ret;
> + percpu_down_read(sb->s_writers.rw_sem + level-1);
> + return 1;
> }
> EXPORT_SYMBOL(__sb_start_write);
>
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2020-11-10 11:33 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-09 18:16 [PATCH v3 0/3] vfs: remove lockdep fs freeze weirdness Darrick J. Wong
2020-11-09 18:16 ` [PATCH 1/3] vfs: remove lockdep bogosity in __sb_start_write Darrick J. Wong
2020-11-10 11:33 ` Jan Kara [this message]
2020-11-09 18:16 ` [PATCH 2/3] vfs: separate __sb_start_write into blocking and non-blocking helpers Darrick J. Wong
2020-11-10 11:35 ` Jan Kara
2020-11-10 18:32 ` Christoph Hellwig
2020-11-09 18:17 ` [PATCH 3/3] vfs: move __sb_{start,end}_write* to fs.h Darrick J. Wong
2020-11-10 11:36 ` Jan Kara
2020-11-10 18:32 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201110113300.GC20780@quack2.suse.cz \
--to=jack@suse.cz \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=fdmanana@kernel.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox