From: Mark Rutland <mark.rutland@arm.com>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Steve Capper <steve.capper@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path
Date: Thu, 9 Jul 2020 12:48:05 +0100
Message-ID: <20200709114805.GA11227@C02TD0UTHF1T.local>
In-Reply-To: <CAFgQCTtu9U2bB9JfXMw5TLd=tcrXkexVZOSgP=CDHnfQamddbQ@mail.gmail.com>

On Tue, Jul 07, 2020 at 09:50:58AM +0800, Pingfan Liu wrote:
> On Mon, Jul 6, 2020 at 4:10 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > On Fri, Jul 3, 2020 at 6:13 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Fri, Jul 03, 2020 at 01:44:39PM +0800, Pingfan Liu wrote:
> > > > The cpu_number and __per_cpu_offset occupy two different cache lines, which
> > > > may no longer be cached after a heavy user space load.
> > > >
> > > > By replacing per_cpu(active_asids, cpu) with this_cpu_ptr(&active_asids) in
> > > > the fast path, a register is used and these memory accesses are avoided.
> > >
> > > How about:
> > >
> > > | On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable,
> > > | using the per-cpu offset stored in the tpidr_el1 system register. In
> > > | some cases we generate a per-cpu address with a sequence like:
> > > |
> > > | | cpu_ptr = &per_cpu(ptr, smp_processor_id());
> > > |
> > > | Which potentially incurs a cache miss for both `cpu_number` and the
> > > | in-memory `__per_cpu_offset` array. This can be written more optimally
> > > | as:
> > > |
> > > | | cpu_ptr = this_cpu_ptr(ptr);
> > > |
> > > | ... which only needs the offset from tpidr_el1, and does not need to
> > > | load from memory.
> > I appreciate your clear explanation.
> > >
> > > > By replacing per_cpu(active_asids, cpu) with this_cpu_ptr(&active_asids) in
> > > > the fast path, a register is used and these memory accesses are avoided.
> > >
> > > Do you have any numbers that show benefit here? It's not clear to me how
> > > often the above case would apply where the caches would also be hot for
> > > everything else we need, and numbers would help to justify that.
> > Initially, I was just drawn in by the implementation of the
> > __my_cpu_offset macro, which led me to this question. But following your
> > thinking, I realized that data is needed to make things clear.
> >
> > I have finished a test with a 5.8.0-rc4 kernel on a 46-CPU Qualcomm machine.
> > command: time -p make all -j138
> >
> > Before this patch:
> > real 291.86
> > user 11050.18
> > sys 362.91
> >
> > After this patch:
> > real 291.11
> > user 11055.62
> > sys 363.39
> >
> > The data shows a very small improvement.
> The data may be affected by random factors, which makes it less persuasive,
> so I ran some repeated tests with perf stat.
> #cat b.sh
> make clean && make all -j138
> 
> #perf stat --repeat 10 --null --sync sh b.sh
> 
> - before this patch
>  Performance counter stats for 'sh b.sh' (10 runs):
> 
>             298.62 +- 1.86 seconds time elapsed  ( +-  0.62% )
> 
> 
> - after this patch
>  Performance counter stats for 'sh b.sh' (10 runs):
> 
>            297.734 +- 0.954 seconds time elapsed  ( +-  0.32% )
> 

IIUC that's a 0.3% improvement. It'd be worth putting these results in
the commit message.
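
For reference, the arithmetic behind that figure can be written down as a
small helper. This is only a sketch: `percent_reduction` is a hypothetical
name introduced here, and the inputs are the mean elapsed times from the
perf stat output quoted above.

```c
/*
 * Hypothetical helper (not from the patch): percentage by which `after`
 * is lower than `before`, e.g. for mean elapsed times in seconds.
 */
static double percent_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Calling percent_reduction(298.62, 297.734) gives roughly 0.3, i.e. the
0.3% mentioned above.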

Could you also try that with "perf bench sched messaging" as the
workload? As a microbenchmark, that might show the highest potential
benefit, and it'd be nice to have those figures too if possible.
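
To make the difference between the two address computations quoted above
concrete, here is a minimal userspace model. It is a sketch under stated
assumptions, not the kernel implementation: every `model_*` name is
hypothetical, a plain variable stands in for tpidr_el1, and a counter
stands in for the memory loads needed to form the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace model; these are NOT the kernel's definitions. */
struct model_cpu_area {
	unsigned int cpu_number;  /* stands in for the per-cpu `cpu_number` */
	uint64_t active_asid;     /* stands in for one CPU's `active_asids` slot */
};

static struct model_cpu_area model_areas[MODEL_NR_CPUS];
static size_t model_per_cpu_offset[MODEL_NR_CPUS]; /* models __per_cpu_offset[] */
static size_t model_tpidr;        /* models the offset held in tpidr_el1 */
static unsigned int model_loads;  /* modelled loads needed to form an address */

static void model_init(void)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_areas[cpu].cpu_number = cpu;
		model_per_cpu_offset[cpu] = cpu * sizeof(struct model_cpu_area);
	}
}

/* Pretend we are now running on `cpu`, and reset the load counter. */
static void model_run_on(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_tpidr = model_per_cpu_offset[cpu];
	model_loads = 0;
}

/* Address of a field in the per-cpu area found at byte offset `off`. */
static void *model_field(size_t off, size_t field)
{
	return (char *)model_areas + off + field;
}

/* smp_processor_id() analogue: one load of cpu_number via the register offset. */
static unsigned int model_smp_processor_id(void)
{
	unsigned int *p = model_field(model_tpidr,
				      offsetof(struct model_cpu_area, cpu_number));
	model_loads++;
	return *p;
}

/* &per_cpu(var, cpu) analogue: one load from the in-memory offset array. */
static uint64_t *model_per_cpu_ptr(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_loads++;
	return model_field(model_per_cpu_offset[cpu],
			   offsetof(struct model_cpu_area, active_asid));
}

/* this_cpu_ptr(&var) analogue: only the register offset, no loads. */
static uint64_t *model_this_cpu_ptr(void)
{
	return model_field(model_tpidr,
			   offsetof(struct model_cpu_area, active_asid));
}
```

After model_init() and model_run_on(cpu), both
model_per_cpu_ptr(model_smp_processor_id()) and model_this_cpu_ptr() yield
the same address, but the first form counts two modelled loads and the
second counts none. The model says nothing about whether those loads
actually miss in the cache, which is what the benchmark numbers are for.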

Thanks,
Mark.

