Re: [PATCH v2] Increase page and bit waitqueue hash size

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nicholas Piggin <npiggin@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Anton Blanchard <anton@ozlabs.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH v2] Increase page and bit waitqueue hash size
Date: Thu, 18 Mar 2021 08:22:57 +1000	[thread overview]
Message-ID: <1616017462.cmzed2nj60.astroid@bobo.none> (raw)
In-Reply-To: <CAHk-=whKnamUnZaJQ+fcqYoLJfc8QB8dv6=2P7o-XPBDOtbF3w@mail.gmail.com>

Excerpts from Linus Torvalds's message of March 18, 2021 5:26 am:
> On Wed, Mar 17, 2021 at 3:44 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Argh, because I didn't test small. Sorry I had the BASE_SMALL setting in
>> another patch and thought it would be a good idea to mash them together.
>> In hindsight probably not even if it did build.
> 
> I was going to complain about that code in general.
> 
> First complaining about the hash being small, and then adding a config
> option to make it ridiculously much *smaller* seemed wrong to begin
> with, and didn't make any sense.
> 
> So no, please don't smash together.

Fair point, fixed.

> 
> In fact, I'd like to see this split up, and with more numbers:
> 
>  - separate out the bit_waitqueue thing that is almost certainly not
> remotely as critical (and maybe not needed at all)
> 
>  - show the profile number _after_ the patch(es)

Might take some time to get a system and run tests. We actually had 
difficulty recreating it before this patch too, so it's kind of
hard to say _that_ was the exact case that previously ran badly and
is now fixed. We thought just the statistical nature of collisions
and page / lock contention made things occasionally line up and
tank.

>  - explain why you picked the random scaling numbers (21 and 22 for
> the two different cases)?
> 
>  - give an estimate of how big the array now ends up being for
> different configurations.
> 
> I think it ends up using that "scale" factor of 21, and basically
> being "memory size >> 21" and then rounding up to a power of two.
> 
> And honestly, I'm not sure that makes much sense. So for a 1GB machine
> we get the same as we used to for the bit waitqueue (twice as many for
> the page waitqueue) , but if you run on some smaller setup, you
> apparently can end up with just a couple of buckets.
> 
> So I'd feel a lot better about this if I saw the numbers, and got the
> feeling that the patch actually tries to take legacy machines into
> account.
>
> And even on a big machine, what's the advantage of scaling perfectly
> with memory. If you have a terabyte of RAM, why would you need half a
> million hash entries (if I did the math right), and use 4GB of memory
> on it? The contention doesn't go up by amount of memory, it goes up
> roughly by number of threads, and the two are very seldom really all
> that linearly connected.
> 
> So honestly, I'd like to see more reasonable numbers. I'd like to see
> what the impact of just raising the hash bit size from 8 to 16 is on
> that big machine. Maybe still using alloc_large_system_hash(), but
> using a low-imit of 8 (our traditional very old number that hasn't
> been a problem even on small machines), and a high-limit of 16 or
> something.
> 
> And if you want even more, I really really want that justified by the
> performance / profile numbers.

Yes all good points I'll add those numbers. It may need a floor and
ceiling or something like that. We may not need quite so many entries.

> 
> And does does that "bit_waitqueue" really merit updating AT ALL? It's
> almost entirely unused these days.

I updated it mainly because keeping the code more similar ends up being 
easier than unnecessary diverging. The memory cost is no big deal (once 
limits are fixed) so I prefer not to encounter some case where it falls 
over.

> I think maybe the page lock code
> used to use that, but then realized it had more specialized needs, so
> now it's separate.
> 
> So can we split that bit-waitqueue thing up from the page waitqueue
> changes? They have basically nothing in common except for a history,
> and I think they should be treated separately (including the
> explanation for what actually hits the bottleneck).

It's still used. Buffer heads being an obvious and widely used one that
follows similar usage pattern as page lock / writeback in some cases.
Several other filesystems seem to use it for similar block / IO
tracking structures by the looks (md, btrfs, nfs).

Thanks,
Nick

next prev parent reply	other threads:[~2021-03-17 22:23 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-17  7:54 [PATCH v2] Increase page and bit waitqueue hash size Nicholas Piggin
2021-03-17  8:38 ` Ingo Molnar
2021-03-17 10:02   ` Nicholas Piggin
2021-03-17 10:12 ` Rasmus Villemoes
2021-03-17 10:44   ` Nicholas Piggin
2021-03-17 19:26     ` Linus Torvalds
2021-03-17 22:22       ` Nicholas Piggin [this message]
2021-03-17 23:13         ` Linus Torvalds
2021-03-17 11:25 ` kernel test robot
2021-03-17 11:25   ` kernel test robot
2021-03-17 11:30 ` kernel test robot
2021-03-17 11:30   ` kernel test robot
2021-03-17 12:38 ` [tip: sched/core] sched/wait_bit, mm/filemap: " tip-bot2 for Nicholas Piggin
2021-03-17 15:16   ` Thomas Gleixner
2021-03-17 19:54     ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1616017462.cmzed2nj60.astroid@bobo.none \
    --to=npiggin@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anton@ozlabs.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=mingo@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.