From: Jisheng Zhang <jszhang@marvell.com>
To: Timur Tabi <timur@codeaurora.org>
Cc: Linaro ACPI Mailman List <linaro-acpi@lists.linaro.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Wim Van Sebroeck <wim@iguana.be>, Fu Wei <fu.wei@linaro.org>,
wei@redhat.com, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
Al Stone <al.stone@linaro.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
Guenter Roeck <linux@roeck-us.net>, Len Brown <lenb@kernel.org>,
harba@codeaurora.org, linux-watchdog@vger.kernel.org,
Arnd Bergmann <arnd@arndb.de>,
Marc Zyngier <marc.zyngier@arm.com>, Jon Masters <jcm@redhat.com>,
Christopher Covington <cov@codeaurora.org>,
Thomas Gleixner <tglx@linutronix.de>,
linux-arm-kernel@lists.infradead.org,
G Gregory <graeme.gregory@linaro.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
rruigrok@codeaurora
Subject: Re: [PATCH v9 4/9] clocksource/drivers/arm_arch_timer: use readq to get 64-bit CNTVCT
Date: Wed, 27 Jul 2016 11:33:42 +0800
Message-ID: <20160727113342.2a839c1a@xhacker>
In-Reply-To: <57976FA5.2070802@codeaurora.org>
+1
On Tue, 26 Jul 2016 09:11:49 -0500 Timur Tabi wrote:
> Will Deacon wrote:
> > The kernel really needs to support both of those platforms :/
> >
> > For the memory-mapped counter registers, the architecture says:
> >
> > `If the implementation supports 64-bit atomic accesses, then the
> > CNTV_CVAL register must be accessible as an atomic 64-bit value.'
> >
> > which is borderline tautological. If we take the generous reading that
> > this means AArch64 CPUs can use readq (and I'm not completely
> > comfortable with that assertion, particularly as you say that it breaks
> > the model), then you still need to use readq_relaxed here to avoid a
> > DSB. Furthermore, what are you going to do for AArch32? readq doesn't
> > exist over there, and if you use the generic implementation then it's
> > not atomic. In which case, we end up with the current code, as well as a
> > readq_relaxed guarded by a questionable #ifdef that is known to break a
> > supported platform for an unknown performance improvement. Hardly a big
> > win.
>
> I know Fu dropped this patch, and I don't want to kick a dead horse, but
> I was wondering if it would be okay to do this:
>
> static u64 arch_counter_get_cntvct_mem(void)
> {
> #ifdef readq_relaxed
> 	return readq_relaxed(arch_counter_base + CNTVCT_LO);
> #else
> 	u32 vct_lo, vct_hi, tmp_hi;
>
> 	do {
> 		vct_hi = readl_relaxed(arch_counter_base + CNTVCT_HI);
> 		vct_lo = readl_relaxed(arch_counter_base + CNTVCT_LO);
> 		tmp_hi = readl_relaxed(arch_counter_base + CNTVCT_HI);
> 	} while (vct_hi != tmp_hi);
>
> 	return ((u64) vct_hi << 32) | vct_lo;
> #endif
> }
>
> readq and readq_relaxed are defined in arch/arm64/include/asm/io.h. Why
> would the function exist if AArch64 CPUs can't use it?
+1
I measured the performance on Berlin arm64 platforms: compared with
the original version, using readq_relaxed reduces the time spent in
arch_counter_get_cntvct_mem() by about 42%!
Thanks,
Jisheng
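
For readers following the thread, the reason the existing code reads
HI, LO, HI in a loop can be sketched in plain userspace C. This is only
an illustration under stated assumptions, not kernel code: fake_counter
and fake_read32() are hypothetical stand-ins for the memory-mapped
counter and readl_relaxed(), and the counter ticks once per 32-bit read
so that a rollover of the low word can be forced between two reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated 64-bit counter (stands in for CNTVCT). */
static uint64_t fake_counter;

/*
 * Hypothetical stand-in for readl_relaxed(): return one 32-bit half,
 * then advance the counter so the next read sees a newer value.
 */
static uint32_t fake_read32(int high)
{
	uint32_t v = high ? (uint32_t)(fake_counter >> 32)
			  : (uint32_t)fake_counter;

	fake_counter++;
	return v;
}

/* Two independent 32-bit reads: can tear across a low-word rollover. */
static uint64_t read_naive(void)
{
	uint32_t hi = fake_read32(1);
	uint32_t lo = fake_read32(0);

	return ((uint64_t)hi << 32) | lo;
}

/* HI, LO, HI with retry: accept LO only if HI did not change around it. */
static uint64_t read_hi_lo_hi(void)
{
	uint32_t hi, lo, tmp;

	do {
		hi = fake_read32(1);
		lo = fake_read32(0);
		tmp = fake_read32(1);
	} while (hi != tmp);

	return ((uint64_t)hi << 32) | lo;
}

/* Self-check: start one tick before the low word rolls over. */
static void demo_rollover(void)
{
	fake_counter = 0xFFFFFFFFULL;
	assert(read_naive() == 0);	/* torn value, appears to go backwards */

	fake_counter = 0xFFFFFFFFULL;
	assert(read_hi_lo_hi() == 0x100000003ULL);	/* one retry, consistent */
}
```

The retry loop costs at least three 32-bit reads per sample, which is
the overhead a single atomic 64-bit read would avoid where the hardware
guarantees one. The simulation says nothing about real timings; it only
shows why the two halves cannot simply be read once each.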