public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Murray <timmurray@google.com>,
	Waiman Long <longman@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux F2FS Dev Mailing List 
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [GIT PULL] f2fs for 5.18
Date: Wed, 23 Mar 2022 14:21:07 -0700	[thread overview]
Message-ID: <YjuPQ36A4W553ai1@google.com> (raw)
In-Reply-To: <CAHk-=wi99R8i=uvHiHo3jjZPzg6oTJW1rin3ekuPbuccS5XZqA@mail.gmail.com>

On 03/23, Linus Torvalds wrote:
> On Wed, Mar 23, 2022 at 9:26 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> >
> > OTOH, I was suspecting the major contetion would be
> >         f2fs_lock_op -> f2fs_down_read(&sbi->cp_rwsem);
> > , which was used for most of filesystem operations.
> 
> Very possible, I was just looking at a random one in f2fs/file.c
> obviously with no actual numbers in hand.
> 
> In general, I really hate seeing specialized locks, but this f2fs use
> case is in some ways worse than other ad-hoc locks I've seen - simply
> because it's been one whole-sale conversion of "down_read/write()" to
> "f2fs_down_read/write()" - regardless of _which_ lock is being locked.
> 
> (Now, it's not all bad news - in other respects it's much better than
> some ad-hoc locking: at least you still will participate in lockdep,
> and you use the actual low-level locking primitives instead of making
> up your own and getting memory ordering wrong).
> 
> But basically I think it would have been much nicer if you would have
> done this for just the _one_ lock that mattered, whichever lock that
> might be. Partly as documentation, and partly so that maybe some day
> you can split that lock up (or maybe notice cases where you can avoid
> it entirely).
> 
> For example, if it's really just f2fs_lock_op() that needs this, the
> special "wait_event(trylock)" hack could have been entirely local to
> just *that*, rather than affecting all the other locks too.
> 
> And the very first f2fs_lock_op() case I find, I see that the lock is
> pointless. Again, that's unlikely to be the *cause* of any of these
> problems, but the fact that I've now looked at two of the f2fs locks,
> and gone "the locking seems to be pointlessly badly done" does imply
> that the problem isn't "down_read()", it's the use.
> 
> That other lock I reacted to was the f2fs_lock_op(sbi) at the top of
> f2fs_new_inode().
> 
> Look, you have a new inode that you just allocated, that nobody else
> can yet access.
> 
> And the only thing that that f2fs_lock_op(sbi) -> f2fs_unlock_op(sbi)
> sequence protects is the f2fs_alloc_nid() for that new inode.
> 
> Ok, so maybe f2fs_alloc_nid() needs that lock?
> 
> No it doesn't. It already has
> 
>  - &nm_i->nid_list_lock spinlock for its own in-memory internal NID caches
> 
> *and* when that fails
> 
>  - &NM_I(sbi)->build_lock for protecting all of f2fs_build_free_nids()
> 
> *and* inside of that lock
> 
>  - f2fs_down_read(&nm_i->nat_tree_lock) for protecting the NAT tree structures.
> 
> So I see two major issues in the very first user of that
> f2fs_lock_op() that I look at:
> 
>  (a) it seems to be entirely unnecessary

Actually, when I took a look at the above path, indeed, f2fs_lock_op in
f2fs_new_inode may be unnecessary at all aligned to your points. Even, that
might hurt performance since we get f2fs_lock_op twice before dealing with
dentries like f2fs_add_link. Let me test a bit whether there's any regression
if I remove f2fs_lock_op in f2fs_new_inode.

> 
>  (b) it is a classic case of "multiple nested locks".
> 
> Now, it's possible that I'm wrong on (a) and there's some odd reason
> that lock is needed (maybe there is a lock ordering problem for one of
> the other locks between readers and writers, and the op-lock acts as a
> mutual exclusion for that).
> 
> But (b) really is a classic problem case for locking: nested locks are
> *much* more likely to cause horrible contention, because not any
> contention in any of the locks will end up affecting the others (and
> you easily get "bunching up" of different processes when they get
> synchronized with each other thanks to the inner lock).
> 
> Nested locking is often required, but it's one of those things where
> you just need to be aware that they can be horribly bad for
> performance, _particularly_ if an inner lock sees contention and
> essentially "transfers" that contention to an outer lock.
> 
> Maybe I've been unlucky. Maybe the two cases I happened to look at
> were just completely harmless, and very unusual. But the fact that I'm
> two-for-two and go "that locking looks like a prime candidate to be
> fixed" makes me suspect there's a lot of low-hanging fruit in there.

Thank you so much for taking the time to write this great advise. Let me dig
more whether there's anything that I can relax the lock use-cases further.
(tbh, I haven't reviewed them for a long time due to focusing on stability
issues mostly.)

> 
> And that whole "wait_event(trylock)" thing is a symptom of problematic
> f2fs locking, rather than a solution to it.

Understood. If I can avoid lock contention upfront, definitely it wouldn't
need to apply rwsem change at all. Let me take some time to think about how to
move forward.

> 
>                  Linus

  reply	other threads:[~2022-03-23 21:21 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-21 20:39 [GIT PULL] f2fs for 5.18 Jaegeuk Kim
2022-03-22 17:22 ` Linus Torvalds
2022-03-22 17:37   ` Waiman Long
2022-03-22 17:50     ` Linus Torvalds
2022-03-22 20:58       ` Jaegeuk Kim
2022-06-15 20:13         ` Pavel Machek
2022-06-16 17:02           ` Jaegeuk Kim
2022-03-23  0:34       ` Tim Murray
2022-03-23  2:03         ` Linus Torvalds
2022-03-23 16:26           ` Jaegeuk Kim
2022-03-23 17:06             ` Linus Torvalds
2022-03-23 21:21               ` Jaegeuk Kim [this message]
2022-03-23  7:33   ` Christoph Hellwig
2022-03-23 16:48     ` Jaegeuk Kim
2022-03-23 16:49       ` Christoph Hellwig
2022-03-23 17:00         ` Jaegeuk Kim
2022-03-23 19:28       ` Waiman Long
2022-03-23 21:25         ` Jaegeuk Kim
2022-03-22 18:32 ` [f2fs-dev] " pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjuPQ36A4W553ai1@google.com \
    --to=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=timmurray@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox