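From: Jan Glauber <jan.glauber@caviumnetworks.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
    Will Deacon <will.deacon@arm.com>,
    Linux ARM <linux-arm-kernel@lists.infradead.org>,
    Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] arm64: defconfig: Raise NR_CPUS to 256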
Date: Mon, 26 Mar 2018 10:52:14 +0200
Message-ID: <20180326085214.GB5991@hc>
In-Reply-To: <20180306140201.GB7428@hc>

On Tue, Mar 06, 2018 at 03:02:01PM +0100, Jan Glauber wrote:
> On Tue, Mar 06, 2018 at 02:12:29PM +0100, Arnd Bergmann wrote:
> > On Fri, Mar 2, 2018 at 3:37 PM, Jan Glauber <jglauber@cavium.com> wrote:
> > > ThunderX1 dual socket has 96 CPUs and ThunderX2 has 224 CPUs.
> >
> > Are you sure about those numbers? From my counting, I would have expected
> > twice that number in both cases: 48 cores, 2 chips and 2x SMT for ThunderX
> > vs 52 Cores, 2 chips and 4x SMT for ThunderX2.
>
> That's what I have on those machines. I counted SMT as normal CPUs as it
> doesn't make a difference for the config. I've not seen SMT on ThunderX.
>
> The ThunderX2 number of 224 is already with 4x SMT (and 2 chips) but
> there may be other versions planned that I'm not aware of.
>
> > > Therefore raise the default number of CPUs from 64 to 256
> > > by adding an arm64 specific option to override the generic default.
> >
> > Regardless of what the correct numbers for your chips are, I'd like
> > to hear some other opinions on how high we should raise that default
> > limit, both in arch/arm64/Kconfig and in the defconfig file.
> >
> > As I remember it, there is a noticeable cost for taking the limit beyond
> > BITS_PER_LONG, both in terms of memory consumption and also
> > runtime performance (copying and comparing CPU masks).
>
> OK, that explains the default. My unverified assumption is that
> increasing the CPU masks won't be a noticeable performance hit.
>
> Also, I don't think that anyone who wants performance will use
> defconfig. All server distributions would bump up the NR_CPUS anyway
> and really small systems will probably need to tune the config
> anyway.
>
> For me defconfig should produce a usable system, not with every last
> driver configured but with all the basics like CPUs, networking, etc.
> fully present.
>
> > I'm sure someone will keep coming up with even larger configurations
> > in the future, so we should try to decide how far we can take the
> > defaults for the moment without impacting users of the smallest
> > systems. Alternatively, you could add some measurements that
> > show how much memory and CPU time is used up on a typical
> > configuration for a small system (4 cores, no SMT, 512 MB RAM).
> > If that's low enough, we could just do it anyway.
>
> OK, I'll take a look.

I've made some measurements on a 4-core board (Cavium 81xx) with
NR_CPUS set to 64 and to 256:

- vmlinux grows by 0.04% with 256 CPUs.

- Kernel compile time was a bit faster with 256 CPUs (which does
  not make sense, but at least it seems not to suffer from the change).
  Is there a benchmark that would be better suited? Maybe even a
  microbenchmark that would suffer from the longer cpumasks?

- Available memory decreased by 0.13% (with memory restricted to 512 MB),
  and BSS increased by 5.3%.
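
For illustration only, here is a rough sketch of why the masks get longer
when the limit goes beyond BITS_PER_LONG. It assumes a cpumask is a bitmap
of NR_CPUS bits stored in an array of unsigned longs, and that a long is
64 bits on arm64. The helper names are made up for this example and are
not kernel code.

```python
# Sketch under stated assumptions: a cpumask is a bitmap of NR_CPUS bits
# held in an array of longs, and a long is 64 bits wide on arm64.
BITS_PER_LONG = 64


def bits_to_longs(nbits, bits_per_long=BITS_PER_LONG):
    """Number of longs needed to hold nbits bits, rounded up."""
    return (nbits + bits_per_long - 1) // bits_per_long


def cpumask_bytes(nr_cpus, bits_per_long=BITS_PER_LONG):
    """Size in bytes of one statically sized mask for nr_cpus CPUs."""
    return bits_to_longs(nr_cpus, bits_per_long) * (bits_per_long // 8)


if __name__ == "__main__":
    for nr_cpus in (64, 256):
        print(f"NR_CPUS={nr_cpus}: "
              f"{bits_to_longs(nr_cpus)} long(s), "
              f"{cpumask_bytes(nr_cpus)} bytes per mask")
```

Under these assumptions a mask fits in a single long (8 bytes) with
NR_CPUS=64 and needs four longs (32 bytes) with NR_CPUS=256, so copying
or comparing a mask touches four words instead of one.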
Cheers,
Jan