All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <longman@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Tim Murray <timmurray@google.com>,
	Linux F2FS Dev Mailing List
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [f2fs-dev] [GIT PULL] f2fs for 5.18
Date: Wed, 23 Mar 2022 14:21:07 -0700	[thread overview]
Message-ID: <YjuPQ36A4W553ai1@google.com> (raw)
In-Reply-To: <CAHk-=wi99R8i=uvHiHo3jjZPzg6oTJW1rin3ekuPbuccS5XZqA@mail.gmail.com>

On 03/23, Linus Torvalds wrote:
> On Wed, Mar 23, 2022 at 9:26 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> >
> > OTOH, I was suspecting the major contetion would be
> >         f2fs_lock_op -> f2fs_down_read(&sbi->cp_rwsem);
> > , which was used for most of filesystem operations.
> 
> Very possible, I was just looking at a random one in f2fs/file.c
> obviously with no actual numbers in hand.
> 
> In general, I really hate seeing specialized locks, but this f2fs use
> case is in some ways worse than other ad-hoc locks I've seen - simply
> because it's been one whole-sale conversion of "down_read/write()" to
> "f2fs_down_read/write()" - regardless of _which_ lock is being locked.
> 
> (Now, it's not all bad news - in other respects it's much better than
> some ad-hoc locking: at least you still will participate in lockdep,
> and you use the actual low-level locking primitives instead of making
> up your own and getting memory ordering wrong).
> 
> But basically I think it would have been much nicer if you would have
> done this for just the _one_ lock that mattered, whichever lock that
> might be. Partly as documentation, and partly so that maybe some day
> you can split that lock up (or maybe notice cases where you can avoid
> it entirely).
> 
> For example, if it's really just f2fs_lock_op() that needs this, the
> special "wait_event(trylock)" hack could have been entirely local to
> just *that*, rather than affecting all the other locks too.
> 
> And the very first f2fs_lock_op() case I find, I see that the lock is
> pointless. Again, that's unlikely to be the *cause* of any of these
> problems, but the fact that I've now looked at two of the f2fs locks,
> and gone "the locking seems to be pointlessly badly done" does imply
> that the problem isn't "down_read()", it's the use.
> 
> That other lock I reacted to was the f2fs_lock_op(sbi) at the top of
> f2fs_new_inode().
> 
> Look, you have a new inode that you just allocated, that nobody else
> can yet access.
> 
> And the only thing that that f2fs_lock_op(sbi) -> f2fs_unlock_op(sbi)
> sequence protects is the f2fs_alloc_nid() for that new inode.
> 
> Ok, so maybe f2fs_alloc_nid() needs that lock?
> 
> No it doesn't. It already has
> 
>  - &nm_i->nid_list_lock spinlock for its own in-memory internal NID caches
> 
> *and* when that fails
> 
>  - &NM_I(sbi)->build_lock for protecting all of f2fs_build_free_nids()
> 
> *and* inside of that lock
> 
>  - f2fs_down_read(&nm_i->nat_tree_lock) for protecting the NAT tree structures.
> 
> So I see two major issues in the very first user of that
> f2fs_lock_op() that I look at:
> 
>  (a) it seems to be entirely unnecessary

Actually, when I took a look at the above path, indeed, f2fs_lock_op in
f2fs_new_inode may be unnecessary at all aligned to your points. Even, that
might hurt performance since we get f2fs_lock_op twice before dealing with
dentries like f2fs_add_link. Let me test a bit whether there's any regression
if I remove f2fs_lock_op in f2fs_new_inode.

> 
>  (b) it is a classic case of "multiple nested locks".
> 
> Now, it's possible that I'm wrong on (a) and there's some odd reason
> that lock is needed (maybe there is a lock ordering problem for one of
> the other locks between readers and writers, and the op-lock acts as a
> mutual exclusion for that).
> 
> But (b) really is a classic problem case for locking: nested locks are
> *much* more likely to cause horrible contention, because not any
> contention in any of the locks will end up affecting the others (and
> you easily get "bunching up" of different processes when they get
> synchronized with each other thanks to the inner lock).
> 
> Nested locking is often required, but it's one of those things where
> you just need to be aware that they can be horribly bad for
> performance, _particularly_ if an inner lock sees contention and
> essentially "transfers" that contention to an outer lock.
> 
> Maybe I've been unlucky. Maybe the two cases I happened to look at
> were just completely harmless, and very unusual. But the fact that I'm
> two-for-two and go "that locking looks like a prime candidate to be
> fixed" makes me suspect there's a lot of low-hanging fruit in there.

Thank you so much for taking the time to write this great advise. Let me dig
more whether there's anything that I can relax the lock use-cases further.
(tbh, I haven't reviewed them for a long time due to focusing on stability
issues mostly.)

> 
> And that whole "wait_event(trylock)" thing is a symptom of problematic
> f2fs locking, rather than a solution to it.

Understood. If I can avoid lock contention upfront, definitely it wouldn't
need to apply rwsem change at all. Let me take some time to think about how to
move forward.

> 
>                  Linus


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Murray <timmurray@google.com>,
	Waiman Long <longman@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux F2FS Dev Mailing List 
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [GIT PULL] f2fs for 5.18
Date: Wed, 23 Mar 2022 14:21:07 -0700	[thread overview]
Message-ID: <YjuPQ36A4W553ai1@google.com> (raw)
In-Reply-To: <CAHk-=wi99R8i=uvHiHo3jjZPzg6oTJW1rin3ekuPbuccS5XZqA@mail.gmail.com>

On 03/23, Linus Torvalds wrote:
> On Wed, Mar 23, 2022 at 9:26 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> >
> > OTOH, I was suspecting the major contetion would be
> >         f2fs_lock_op -> f2fs_down_read(&sbi->cp_rwsem);
> > , which was used for most of filesystem operations.
> 
> Very possible, I was just looking at a random one in f2fs/file.c
> obviously with no actual numbers in hand.
> 
> In general, I really hate seeing specialized locks, but this f2fs use
> case is in some ways worse than other ad-hoc locks I've seen - simply
> because it's been one whole-sale conversion of "down_read/write()" to
> "f2fs_down_read/write()" - regardless of _which_ lock is being locked.
> 
> (Now, it's not all bad news - in other respects it's much better than
> some ad-hoc locking: at least you still will participate in lockdep,
> and you use the actual low-level locking primitives instead of making
> up your own and getting memory ordering wrong).
> 
> But basically I think it would have been much nicer if you would have
> done this for just the _one_ lock that mattered, whichever lock that
> might be. Partly as documentation, and partly so that maybe some day
> you can split that lock up (or maybe notice cases where you can avoid
> it entirely).
> 
> For example, if it's really just f2fs_lock_op() that needs this, the
> special "wait_event(trylock)" hack could have been entirely local to
> just *that*, rather than affecting all the other locks too.
> 
> And the very first f2fs_lock_op() case I find, I see that the lock is
> pointless. Again, that's unlikely to be the *cause* of any of these
> problems, but the fact that I've now looked at two of the f2fs locks,
> and gone "the locking seems to be pointlessly badly done" does imply
> that the problem isn't "down_read()", it's the use.
> 
> That other lock I reacted to was the f2fs_lock_op(sbi) at the top of
> f2fs_new_inode().
> 
> Look, you have a new inode that you just allocated, that nobody else
> can yet access.
> 
> And the only thing that that f2fs_lock_op(sbi) -> f2fs_unlock_op(sbi)
> sequence protects is the f2fs_alloc_nid() for that new inode.
> 
> Ok, so maybe f2fs_alloc_nid() needs that lock?
> 
> No it doesn't. It already has
> 
>  - &nm_i->nid_list_lock spinlock for its own in-memory internal NID caches
> 
> *and* when that fails
> 
>  - &NM_I(sbi)->build_lock for protecting all of f2fs_build_free_nids()
> 
> *and* inside of that lock
> 
>  - f2fs_down_read(&nm_i->nat_tree_lock) for protecting the NAT tree structures.
> 
> So I see two major issues in the very first user of that
> f2fs_lock_op() that I look at:
> 
>  (a) it seems to be entirely unnecessary

Actually, when I took a look at the above path, indeed, f2fs_lock_op in
f2fs_new_inode may be unnecessary at all aligned to your points. Even, that
might hurt performance since we get f2fs_lock_op twice before dealing with
dentries like f2fs_add_link. Let me test a bit whether there's any regression
if I remove f2fs_lock_op in f2fs_new_inode.

> 
>  (b) it is a classic case of "multiple nested locks".
> 
> Now, it's possible that I'm wrong on (a) and there's some odd reason
> that lock is needed (maybe there is a lock ordering problem for one of
> the other locks between readers and writers, and the op-lock acts as a
> mutual exclusion for that).
> 
> But (b) really is a classic problem case for locking: nested locks are
> *much* more likely to cause horrible contention, because not any
> contention in any of the locks will end up affecting the others (and
> you easily get "bunching up" of different processes when they get
> synchronized with each other thanks to the inner lock).
> 
> Nested locking is often required, but it's one of those things where
> you just need to be aware that they can be horribly bad for
> performance, _particularly_ if an inner lock sees contention and
> essentially "transfers" that contention to an outer lock.
> 
> Maybe I've been unlucky. Maybe the two cases I happened to look at
> were just completely harmless, and very unusual. But the fact that I'm
> two-for-two and go "that locking looks like a prime candidate to be
> fixed" makes me suspect there's a lot of low-hanging fruit in there.

Thank you so much for taking the time to write this great advise. Let me dig
more whether there's anything that I can relax the lock use-cases further.
(tbh, I haven't reviewed them for a long time due to focusing on stability
issues mostly.)

> 
> And that whole "wait_event(trylock)" thing is a symptom of problematic
> f2fs locking, rather than a solution to it.

Understood. If I can avoid lock contention upfront, definitely it wouldn't
need to apply rwsem change at all. Let me take some time to think about how to
move forward.

> 
>                  Linus

  reply	other threads:[~2022-03-23 21:21 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-21 20:39 [f2fs-dev] [GIT PULL] f2fs for 5.18 Jaegeuk Kim
2022-03-21 20:39 ` Jaegeuk Kim
2022-03-22 17:22 ` [f2fs-dev] " Linus Torvalds
2022-03-22 17:22   ` Linus Torvalds
2022-03-22 17:37   ` [f2fs-dev] " Waiman Long
2022-03-22 17:37     ` Waiman Long
2022-03-22 17:50     ` [f2fs-dev] " Linus Torvalds
2022-03-22 17:50       ` Linus Torvalds
2022-03-22 20:58       ` [f2fs-dev] " Jaegeuk Kim
2022-03-22 20:58         ` Jaegeuk Kim
2022-06-15 20:13         ` [f2fs-dev] " Pavel Machek
2022-06-15 20:13           ` Pavel Machek
2022-06-16 17:02           ` [f2fs-dev] " Jaegeuk Kim
2022-06-16 17:02             ` Jaegeuk Kim
2022-03-23  0:34       ` [f2fs-dev] " Tim Murray via Linux-f2fs-devel
2022-03-23  0:34         ` Tim Murray
2022-03-23  2:03         ` [f2fs-dev] " Linus Torvalds
2022-03-23  2:03           ` Linus Torvalds
2022-03-23 16:26           ` [f2fs-dev] " Jaegeuk Kim
2022-03-23 16:26             ` Jaegeuk Kim
2022-03-23 17:06             ` [f2fs-dev] " Linus Torvalds
2022-03-23 17:06               ` Linus Torvalds
2022-03-23 21:21               ` Jaegeuk Kim [this message]
2022-03-23 21:21                 ` Jaegeuk Kim
2022-03-23  7:33   ` [f2fs-dev] " Christoph Hellwig
2022-03-23  7:33     ` Christoph Hellwig
2022-03-23 16:48     ` [f2fs-dev] " Jaegeuk Kim
2022-03-23 16:48       ` Jaegeuk Kim
2022-03-23 16:49       ` [f2fs-dev] " Christoph Hellwig
2022-03-23 16:49         ` Christoph Hellwig
2022-03-23 17:00         ` [f2fs-dev] " Jaegeuk Kim
2022-03-23 17:00           ` Jaegeuk Kim
2022-03-23 19:28       ` [f2fs-dev] " Waiman Long
2022-03-23 19:28         ` Waiman Long
2022-03-23 21:25         ` [f2fs-dev] " Jaegeuk Kim
2022-03-23 21:25           ` Jaegeuk Kim
2022-03-22 18:32 ` [f2fs-dev] " pr-tracker-bot
2022-03-22 18:32   ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjuPQ36A4W553ai1@google.com \
    --to=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=timmurray@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.