From: Waiman Long <waiman.long@hpe.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Jonathan Corbet <corbet@lwn.net>,
LKML <linux-kernel@vger.kernel.org>, <linux-doc@kernel.org>,
<x86@kernel.org>, Borislav Petkov <bp@suse.de>,
Andy Lutomirski <luto@kernel.org>,
Scott J Norton <scott.norton@hpe.com>,
Douglas Hatch <doug.hatch@hpe.com>,
Randy Wright <rwright@hpe.com>
Subject: Re: [PATCH v2] x86/hpet: Reduce HPET counter read contention
Date: Mon, 11 Apr 2016 16:06:04 -0400
Message-ID: <570C03AC.7080404@hpe.com>
In-Reply-To: <alpine.DEB.2.11.1604091552540.3786@nanos>

On 04/09/2016 08:19 PM, Thomas Gleixner wrote:
> On Fri, 8 Apr 2016, Waiman Long wrote:
>> This patch attempts to reduce HPET read contention by using the fact
>> that if more than one task are trying to access HPET at the same time,
>> it will be more efficient if one task in the group reads the HPET
>> counter and shares it with the rest of the group instead of each
>> group member reads the HPET counter individually.
> That has nothing to do with tasks. clocksource reads can happen from almost
> any context. The problem is concurrent access on multiple cpus.
You are right. I should have used CPU instead.
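
To make the idea concrete, here is a minimal userspace sketch of "one CPU reads, the others reuse the result", written with C11 atomics rather than kernel primitives. It is an illustration under stated assumptions, not the patch itself: fake_hpet, fake_hpet_read(), hpet_share and shared_hpet_read() are hypothetical names, and the timeout/reset path from the posted patch is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the HPET main counter (not a kernel API). */
static _Atomic uint32_t fake_hpet;

/* Pretend each hardware read advances the counter by one tick. */
static uint32_t fake_hpet_read(void)
{
	return atomic_fetch_add(&fake_hpet, 1) + 1;
}

/* Shared state, loosely modelled on hpet_save: an odd seq means "locked". */
static struct {
	atomic_int seq;
	_Atomic uint32_t value;
} hpet_share;

static uint32_t shared_hpet_read(void)
{
	int seq = atomic_load_explicit(&hpet_share.seq, memory_order_acquire);

	if (!(seq & 1)) {
		int expected = seq;

		/* Try to become the one CPU that touches the hardware. */
		if (atomic_compare_exchange_strong(&hpet_share.seq, &expected,
						   seq + 1)) {
			uint32_t t = fake_hpet_read();

			atomic_store_explicit(&hpet_share.value, t,
					      memory_order_relaxed);
			/* Only the lock holder may change seq while it is odd. */
			assert(atomic_load(&hpet_share.seq) == seq + 1);
			/* Unlock: publish the value, move to the next even seq. */
			atomic_store_explicit(&hpet_share.seq, seq + 2,
					      memory_order_release);
			return t;
		}
		/* Lost the race: the winner is assumed to hold seq + 1. */
		seq++;
	}

	/* Wait for the locked seq value to change, then reuse the saved read. */
	while (atomic_load_explicit(&hpet_share.seq,
				    memory_order_acquire) == seq)
		;
	return atomic_load_explicit(&hpet_share.value, memory_order_relaxed);
}
```

Called from a single thread, every call takes the lock path and the returned values simply increase; the waiting path is only exercised when several threads call shared_hpet_read() concurrently.
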
>> This optimization is enabled on systems with more than 32 CPUs. It can
>> also be explicitly enabled or disabled by using the new opt_read_hpet
>> kernel parameter.
> Please not. What's wrong with enabling it unconditionally?
>
There is nothing wrong with enabling it unconditionally. I was just not sure
whether that was the right thing to do. Since both you and Andy said we should
enable it unconditionally, I will do so in the next version of the patch.
>> +/*
>> * Clock source related code
>> */
>> static cycle_t read_hpet(struct clocksource *cs)
>> {
>> - return (cycle_t)hpet_readl(HPET_COUNTER);
>> + int seq, cnt = 0;
>> + u32 time;
>> +
>> + if (opt_read_hpet <= 0)
>> + return (cycle_t)hpet_readl(HPET_COUNTER);
> This wants to be conditional on CONFIG_SMP. No point in having all that muck
> around for an UP kernel.
Will do so.
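
As a rough sketch of what the CONFIG_SMP split could look like, the snippet below compiles a plain read for UP builds and a wrapped read for SMP builds. Everything here is a hypothetical userspace stand-in: fake_counter, raw_counter_read() and counter_read() are not kernel functions, and the SMP branch uses a trivial single-threaded placeholder where the real contention-avoidance code would go.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for hpet_readl(HPET_COUNTER). */
static uint32_t fake_counter;

static uint32_t raw_counter_read(void)
{
	return ++fake_counter;
}

#ifdef CONFIG_SMP
/* SMP build: reads go through a (here trivial) serialising wrapper. */
static int read_lock_held;

static uint32_t counter_read(void)
{
	uint32_t t;

	assert(!read_lock_held);	/* single-threaded sketch: never contended */
	read_lock_held = 1;
	t = raw_counter_read();
	read_lock_held = 0;
	return t;
}
#else
/* UP build: no other CPU can contend, so read the counter directly. */
static uint32_t counter_read(void)
{
	return raw_counter_read();
}
#endif
```

Building with -DCONFIG_SMP selects the wrapped variant; without it, the extra state and code disappear from the UP build entirely.
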
>> + seq = READ_ONCE(hpet_save.seq);
>> + if (!HPET_SEQ_LOCKED(seq)) {
>> + int old, new = seq + 1;
>> + unsigned long flags;
>> +
>> + local_irq_save(flags);
>> + /*
>> + * Set the lock bit (lsb) to get the right to read HPET
>> + * counter directly. If successful, read the counter, save
>> + * its value, and increment the sequence number. Otherwise,
>> + * increment the sequence number to the expected locked value
>> + * for comparison later on.
>> + */
>> + old = cmpxchg(&hpet_save.seq, seq, new);
>> + if (old == seq) {
>> + time = hpet_readl(HPET_COUNTER);
>> + WRITE_ONCE(hpet_save.hpet, time);
>> +
>> + /* Unlock */
>> + smp_store_release(&hpet_save.seq, new + 1);
>> + local_irq_restore(flags);
>> + return (cycle_t)time;
>> + }
>> + local_irq_restore(flags);
>> + seq = new;
>> + }
>> +
>> + /*
>> + * Wait until the locked sequence number changes which indicates
>> + * that the saved HPET value is up-to-date.
>> + */
>> + while (READ_ONCE(hpet_save.seq) == seq) {
>> + /*
>> + * Since reading the HPET is much slower than a single
>> + * cpu_relax() instruction, we use two here in an attempt
>> + * to reduce the amount of cacheline contention in the
>> + * hpet_save.seq cacheline.
>> + */
>> + cpu_relax();
>> + cpu_relax();
>> +
>> + if (likely(++cnt <= HPET_RESET_THRESHOLD))
>> + continue;
>> +
>> + /*
>> + * In the unlikely event that it takes too long for the lock
>> + * holder to read the HPET, we do it ourselves and try to
>> + * reset the lock. This will also break a deadlock if it
>> + * happens, for example, when the process context lock holder
>> + * gets killed in the middle of reading the HPET counter.
>> + */
>> + time = hpet_readl(HPET_COUNTER);
>> + WRITE_ONCE(hpet_save.hpet, time);
>> + if (READ_ONCE(hpet_save.seq) == seq) {
>> + if (cmpxchg(&hpet_save.seq, seq, seq + 1) == seq)
>> + pr_warn("read_hpet: reset hpet seq to 0x%x\n",
>> + seq + 1);
> This is voodoo programming and actively dangerous.
>
> CPU0                    CPU1                    CPU2
> lock_hpet()
> T1=read_hpet()          wait_for_unlock()
> store_hpet(T1)
>                         ....
>                         T2 = read_hpet()
> unlock_hpet()
>                                                 lock_hpet()
>                                                 T3 = read_hpet()
>                                                 store_hpet(T3)
>                                                 unlock_hpet()
>                                                 return T3
> lock_hpet()
> T4 = read_hpet()                                wait_for_unlock()
> store_hpet(T4)
>                         store_hpet(T2)
> unlock_hpet()                                   return T2
>
> CPU2 will observe time going backwards.
>
> Thanks,
>
> tglx
That part is leftover code from my testing and debugging effort. I think
using local_irq_save() should allow the critical section to be executed
without interruption. In this case, I should be able to remove the
threshold checking code without harm.
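
For reference, the interleaving quoted above can be replayed deterministically in a single thread. The sketch below is only a model, with assumptions spelled out: hpet_ticks, hw_read(), saved, lock_hpet() and unlock_hpet() are hypothetical stand-ins, each "CPU" is just a comment on a statement, and the counter advances by one on every read so that T1..T4 come out as 1..4.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t hpet_ticks;	/* hypothetical counter stand-in */
static uint32_t saved;		/* plays the role of hpet_save.hpet */
static int locked;

static uint32_t hw_read(void) { return ++hpet_ticks; }
static void lock_hpet(void) { assert(!locked); locked = 1; }
static void unlock_hpet(void) { locked = 0; }

struct replay_result {
	uint32_t first;		/* first value returned on CPU2 */
	uint32_t second;	/* second value returned on CPU2 */
};

static struct replay_result replay_backwards_time(void)
{
	struct replay_result r;
	uint32_t t1, t2, t3, t4;

	lock_hpet();		/* CPU0 */
	t1 = hw_read();		/* CPU0 */
	saved = t1;		/* CPU0 */
	t2 = hw_read();		/* CPU1: waiter hit the threshold, reads directly */
	unlock_hpet();		/* CPU0 */

	lock_hpet();		/* CPU2 */
	t3 = hw_read();		/* CPU2 */
	saved = t3;		/* CPU2 */
	unlock_hpet();		/* CPU2 */
	r.first = t3;		/* CPU2 returns T3 */

	lock_hpet();		/* CPU0 */
	t4 = hw_read();		/* CPU0, while CPU2 waits for the unlock */
	saved = t4;		/* CPU0 */
	saved = t2;		/* CPU1: late, unlocked store of the stale T2 */
	unlock_hpet();		/* CPU0 */
	r.second = saved;	/* CPU2 waiter returns the saved value, T2 */

	return r;
}
```

In this model CPU2 gets T3 and then T2, so the second value is smaller than the first. Dropping the unlocked store in the threshold path removes the step that makes this possible.
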
Thanks for the review.
Cheers,
Longman