From: Waiman Long <waiman.long@hpe.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
X86 ML <x86@kernel.org>, Jiang Liu <jiang.liu@linux.intel.com>,
Borislav Petkov <bp@suse.de>, Andy Lutomirski <luto@kernel.org>,
Scott J Norton <scott.norton@hpe.com>,
Douglas Hatch <doug.hatch@hpe.com>,
Randy Wright <rwright@hpe.com>
Subject: Re: [PATCH] x86/hpet: Reduce HPET counter read contention
Date: Thu, 7 Apr 2016 11:07:41 -0400
Message-ID: <570677BD.1000800@hpe.com>
In-Reply-To: <CALCETrWhbsASNkpYW3DL-c76bTdE2rEmcimn6TunQL87Kx0XuA@mail.gmail.com>
On 04/07/2016 12:58 AM, Andy Lutomirski wrote:
> On Wed, Apr 6, 2016 at 7:02 AM, Waiman Long<Waiman.Long@hpe.com> wrote:
>> On a large system with many CPUs, using HPET as the clock source can
>> have a significant impact on the overall system performance because
>> of the following reasons:
>> 1) There is a single HPET counter shared by all the CPUs.
>> 2) HPET counter reading is a very slow operation.
>>
>> Using HPET as the default clock source may happen when, for example,
>> the TSC clock calibration exceeds the allowable tolerance. Sometimes
>> the performance slowdown can be so severe that the system may crash
>> because of an NMI watchdog soft lockup.
>>
>> This patch attempts to reduce HPET read contention by using the fact
>> that if more than one task is trying to access HPET at the same time,
>> it is more efficient for one task in the group to read the HPET
>> counter and share it with the rest of the group instead of each
>> group member reading the HPET counter individually.
>>
>> This is done by using a combination word with a sequence number and
>> a bit lock. The task that gets the bit lock will be responsible for
>> reading the HPET counter and updating the sequence number. The others
>> will monitor the change in sequence number and grab the HPET counter
>> accordingly.
>>
>> On a 4-socket Haswell-EX box with 72 cores (HT off), running the
>> AIM7 compute workload (1500 users) on a 4.6-rc1 kernel (HZ=1000)
>> with and without the patch has the following performance numbers
>> (with HPET or TSC as clock source):
>>
>> TSC = 646515 jobs/min
>> HPET w/o patch = 566708 jobs/min
>> HPET with patch = 638791 jobs/min
>>
>> The perf profile showed a reduction of the %CPU time consumed by
>> read_hpet from 4.99% without patch to 1.41% with patch.
>>
>> On a 16-socket IvyBridge-EX system with 240 cores (HT on), on the
>> other hand, the performance numbers of the same benchmark were:
>>
>> TSC = 3145329 jobs/min
>> HPET w/o patch = 1108537 jobs/min
>> HPET with patch = 3019934 jobs/min
>>
>> The corresponding perf profile showed a drop of CPU consumption of
>> the read_hpet function from more than 34% to just 2.96%.
>>
>> Signed-off-by: Waiman Long<Waiman.Long@hpe.com>
>> ---
>> arch/x86/kernel/hpet.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 files changed, 109 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
>> index a1f0e4a..9e3de73 100644
>> --- a/arch/x86/kernel/hpet.c
>> +++ b/arch/x86/kernel/hpet.c
>> @@ -759,11 +759,112 @@ static int hpet_cpuhp_notify(struct notifier_block *n,
>> #endif
>>
>> /*
>> + * Reading the HPET counter is a very slow operation. If a large number of
>> + * CPUs are trying to access the HPET counter simultaneously, it can cause
>> + * massive delay and slow down system performance dramatically. This may
>> + * happen when HPET is the default clock source instead of TSC. For a
>> + * really large system with hundreds of CPUs, the slowdown may be so
>> + * severe that it may actually crash the system because of an NMI watchdog
>> + * soft lockup, for example.
>> + *
>> + * If multiple CPUs are trying to access the HPET counter at the same time,
>> + * we don't actually need to read the counter multiple times. Instead, the
>> + * other CPUs can use the counter value read by the first CPU in the group.
>> + *
>> + * A sequence number whose lsb is a lock bit is used to control which CPU
>> + * has the right to read the HPET counter directly and which CPUs are going
>> + * to get the indirect value read by the lock holder. For the latter group,
>> + * if the sequence number differs from the expected locked value, they
>> + * can assume that the saved HPET value is up-to-date and return it.
>> + *
>> + * This mechanism is only activated on systems with a large number of CPUs.
>> + * Currently, it is enabled when nr_cpus > 64.
>> + */
> Reading the HPET is so slow that all the atomic ops in the world won't
> make a dent. Why not just turn this optimization on unconditionally?
>
> --Andy
I am always wary of introducing a regression on smaller systems, such as
a single-socket machine with a few cores. That is why I added the check
to enable this optimization conditionally. I have no problem with taking
the check out and making the optimization the default as long as no one
objects.
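
For readers following the thread, the scheme under discussion can be
sketched in user-space C11 roughly as follows. This is only an
illustration under stated assumptions, not the code from the patch: the
names read_hpet_shared, slow_hpet_read, hpet_seq, hpet_saved,
fake_counter and shared_read_enabled are all hypothetical, the hardware
counter is simulated by an atomic increment, and the sketch ignores
interrupt and NMI re-entrancy, which real kernel code would have to
handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HPET_SEQ_LOCKED 1u  /* lsb of the sequence word is the lock bit */

/* Hypothetical stand-ins; none of these names come from the patch. */
static _Atomic uint32_t hpet_seq;        /* sequence number plus lock bit */
static _Atomic uint32_t hpet_saved;      /* value published by the last lock holder */
static _Atomic uint32_t fake_counter;    /* simulates the hardware counter */
static bool shared_read_enabled = true;  /* stands in for the nr_cpus check */

/* Simulated slow hardware read: each call returns a larger value. */
static uint32_t slow_hpet_read(void)
{
	return atomic_fetch_add(&fake_counter, 1) + 1;
}

static uint32_t read_hpet_shared(void)
{
	if (!shared_read_enabled)
		return slow_hpet_read();

	for (;;) {
		uint32_t seq = atomic_load(&hpet_seq);

		if (!(seq & HPET_SEQ_LOCKED)) {
			/* Try to become the one reader by setting the lock bit. */
			if (!atomic_compare_exchange_strong(&hpet_seq, &seq,
							    seq | HPET_SEQ_LOCKED))
				continue;  /* lost the race; look at the word again */

			uint32_t val = slow_hpet_read();

			/* Publish the value before releasing the lock bit. */
			atomic_store(&hpet_saved, val);
			/* seq is even, so seq + 2 bumps the sequence with the bit clear. */
			assert(!((seq + 2) & HPET_SEQ_LOCKED));
			atomic_store(&hpet_seq, seq + 2);
			return val;
		}

		/*
		 * Another CPU holds the lock. Once the word differs from the
		 * locked value we saw, the saved counter has been refreshed.
		 */
		while (atomic_load(&hpet_seq) == seq)
			;  /* kernel code would call cpu_relax() here */
		return atomic_load(&hpet_saved);
	}
}
```

With shared_read_enabled set to false every caller goes straight to the
counter, which corresponds to the behaviour on small systems that the
conditional check is meant to preserve. Enabling it unconditionally
would amount to dropping that flag and always taking the shared path.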
Cheers,
Longman