From: jszhang@marvell.com (Jisheng Zhang)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] clocksource/drivers/arm_global_timer: Always use {readl|writel}_relaxed
Date: Fri, 13 Nov 2015 17:59:48 +0800 [thread overview]
Message-ID: <20151113175948.69f610e9@xhacker> (raw)
In-Reply-To: <10575502.sxpiT76bOp@wuerfel>
Dear Arnd,
On Fri, 13 Nov 2015 10:28:01 +0100
Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 13 November 2015 16:40:25 Jisheng Zhang wrote:
> > On Fri, 13 Nov 2015 16:34:38 +0800
> > Jisheng Zhang <jszhang@marvell.com> wrote:
> >
> > > This driver use both readl/writel and their relaxed version, this patch
> > > tries to unify the io accesses.
> >
> > I'm sorry. This is the version I'd like to send for review and merge. Can you
> > please kindly have a review?
>
> I would prefer to use write_relaxed() as sparingly as we can, it is too
> hard to verify each case to ensure that we don't have to watch out
> for ordering or locking issues.
>
> > > diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
> > > index a2cb6fa..84a5a5d 100644
> > > --- a/drivers/clocksource/arm_global_timer.c
> > > +++ b/drivers/clocksource/arm_global_timer.c
> > > @@ -99,27 +99,27 @@ static void gt_compare_set(unsigned long delta, int periodic)
> > >
> > > counter += delta;
> > > ctrl = GT_CONTROL_TIMER_ENABLE;
> > > - writel(ctrl, gt_base + GT_CONTROL);
> > > - writel(lower_32_bits(counter), gt_base + GT_COMP0);
> > > - writel(upper_32_bits(counter), gt_base + GT_COMP1);
> > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > + writel_relaxed(lower_32_bits(counter), gt_base + GT_COMP0);
> > > + writel_relaxed(upper_32_bits(counter), gt_base + GT_COMP1);
> > >
> > > if (periodic) {
> > > - writel(delta, gt_base + GT_AUTO_INC);
> > > + writel_relaxed(delta, gt_base + GT_AUTO_INC);
> > > ctrl |= GT_CONTROL_AUTO_INC;
> > > }
> > >
> > > ctrl |= GT_CONTROL_COMP_ENABLE | GT_CONTROL_IRQ_ENABLE;
> > > - writel(ctrl, gt_base + GT_CONTROL);
> > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > }
>
> This seems fine. Do you have any performance numbers to show how much
> we save per call on a platform you care about, and how often it is
> called for a typical workload?
To be honest, all my platforms don't make use of global timer for clockevent,
we use dw_apb_timer and twd or arch_timer instead, but one performance impact
I saw in our case can also apply for the case with global timer as clokevent:
there are 500-1000 short sleeps, yes not good userspace behavior, so we
program clockevent device 500-1000 times/s. If the system is powered by CA9
with outer L2 cache, the writel will contend for l2x0_lock for 500-1000 times/s.
Then the L2 cache maintenance from other device driver have more chance to
spinning at the l2x0_lock, so other device driver performance is impacted.
Thanks,
Jisheng
>
> I see that _gt_counter_read() already uses readl_relaxed(), and it
> seems to be a much bigger win there, as we read the clock more
> often than we write the comparator, so the person who did that
> probably thought that this one wasn't important enough. Can you
> add an explanation in the changelog why you think that was a
> mistake?
>
> Unifying the accessors across a driver is not enough of a reason
> I think.
>
> > > static int gt_clockevent_shutdown(struct clock_event_device *evt)
> > > {
> > > unsigned long ctrl;
> > >
> > > - ctrl = readl(gt_base + GT_CONTROL);
> > > + ctrl = readl_relaxed(gt_base + GT_CONTROL);
> > > ctrl &= ~(GT_CONTROL_COMP_ENABLE | GT_CONTROL_IRQ_ENABLE |
> > > GT_CONTROL_AUTO_INC);
> > > - writel(ctrl, gt_base + GT_CONTROL);
> > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > return 0;
> > > }
>
> This is certainly not performance critical, better leave it using
> the standard accessors.
>
> > > @@ -212,11 +212,11 @@ static u64 notrace gt_sched_clock_read(void)
> > >
> > > static void __init gt_clocksource_init(void)
> > > {
> > > - writel(0, gt_base + GT_CONTROL);
> > > - writel(0, gt_base + GT_COUNTER0);
> > > - writel(0, gt_base + GT_COUNTER1);
> > > + writel_relaxed(0, gt_base + GT_CONTROL);
> > > + writel_relaxed(0, gt_base + GT_COUNTER0);
> > > + writel_relaxed(0, gt_base + GT_COUNTER1);
> > > /* enables timer on all the cores */
> > > - writel(GT_CONTROL_TIMER_ENABLE, gt_base + GT_CONTROL);
> > > + writel_relaxed(GT_CONTROL_TIMER_ENABLE, gt_base + GT_CONTROL);
> > >
> > > #ifdef CONFIG_CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
> > > sched_clock_register(gt_sched_clock_read, 64, gt_clk_rate);
> >
> >
>
> Same here.
>
> Arnd
WARNING: multiple messages have this Message-ID (diff)
From: Jisheng Zhang <jszhang@marvell.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arm-kernel@lists.infradead.org>,
<srinivas.kandagatla@gmail.com>, <maxime.coquelin@st.com>,
<patrice.chotard@st.com>, <daniel.lezcano@linaro.org>,
<tglx@linutronix.de>, <kernel@stlinux.com>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] clocksource/drivers/arm_global_timer: Always use {readl|writel}_relaxed
Date: Fri, 13 Nov 2015 17:59:48 +0800 [thread overview]
Message-ID: <20151113175948.69f610e9@xhacker> (raw)
In-Reply-To: <10575502.sxpiT76bOp@wuerfel>
Dear Arnd,
On Fri, 13 Nov 2015 10:28:01 +0100
Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 13 November 2015 16:40:25 Jisheng Zhang wrote:
> > On Fri, 13 Nov 2015 16:34:38 +0800
> > Jisheng Zhang <jszhang@marvell.com> wrote:
> >
> > > This driver use both readl/writel and their relaxed version, this patch
> > > tries to unify the io accesses.
> >
> > I'm sorry. This is the version I'd like to send for review and merge. Can you
> > please kindly have a review?
>
> I would prefer to use write_relaxed() as sparingly as we can, it is too
> hard to verify each case to ensure that we don't have to watch out
> for ordering or locking issues.
>
> > > diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
> > > index a2cb6fa..84a5a5d 100644
> > > --- a/drivers/clocksource/arm_global_timer.c
> > > +++ b/drivers/clocksource/arm_global_timer.c
> > > @@ -99,27 +99,27 @@ static void gt_compare_set(unsigned long delta, int periodic)
> > >
> > > counter += delta;
> > > ctrl = GT_CONTROL_TIMER_ENABLE;
> > > - writel(ctrl, gt_base + GT_CONTROL);
> > > - writel(lower_32_bits(counter), gt_base + GT_COMP0);
> > > - writel(upper_32_bits(counter), gt_base + GT_COMP1);
> > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > + writel_relaxed(lower_32_bits(counter), gt_base + GT_COMP0);
> > > + writel_relaxed(upper_32_bits(counter), gt_base + GT_COMP1);
> > >
> > > if (periodic) {
> > > - writel(delta, gt_base + GT_AUTO_INC);
> > > + writel_relaxed(delta, gt_base + GT_AUTO_INC);
> > > ctrl |= GT_CONTROL_AUTO_INC;
> > > }
> > >
> > > ctrl |= GT_CONTROL_COMP_ENABLE | GT_CONTROL_IRQ_ENABLE;
> > > - writel(ctrl, gt_base + GT_CONTROL);
> > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > }
>
> This seems fine. Do you have any performance numbers to show how much
> we save per call on a platform you care about, and how often it is
> called for a typical workload?
To be honest, all my platforms don't make use of global timer for clockevent,
we use dw_apb_timer and twd or arch_timer instead, but one performance impact
I saw in our case can also apply for the case with global timer as clokevent:
there are 500-1000 short sleeps, yes not good userspace behavior, so we
program clockevent device 500-1000 times/s. If the system is powered by CA9
with outer L2 cache, the writel will contend for l2x0_lock for 500-1000 times/s.
Then the L2 cache maintenance from other device driver have more chance to
spinning at the l2x0_lock, so other device driver performance is impacted.
Thanks,
Jisheng
>
> I see that _gt_counter_read() already uses readl_relaxed(), and it
> seems to be a much bigger win there, as we read the clock more
> often than we write the comparator, so the person who did that
> probably thought that this one wasn't important enough. Can you
> add an explanation in the changelog why you think that was a
> mistake?
>
> Unifying the accessors across a driver is not enough of a reason
> I think.
>
> > > static int gt_clockevent_shutdown(struct clock_event_device *evt)
> > > {
> > > unsigned long ctrl;
> > >
> > > - ctrl = readl(gt_base + GT_CONTROL);
> > > + ctrl = readl_relaxed(gt_base + GT_CONTROL);
> > > ctrl &= ~(GT_CONTROL_COMP_ENABLE | GT_CONTROL_IRQ_ENABLE |
> > > GT_CONTROL_AUTO_INC);
> > > - writel(ctrl, gt_base + GT_CONTROL);
> > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > return 0;
> > > }
>
> This is certainly not performance critical, better leave it using
> the standard accessors.
>
> > > @@ -212,11 +212,11 @@ static u64 notrace gt_sched_clock_read(void)
> > >
> > > static void __init gt_clocksource_init(void)
> > > {
> > > - writel(0, gt_base + GT_CONTROL);
> > > - writel(0, gt_base + GT_COUNTER0);
> > > - writel(0, gt_base + GT_COUNTER1);
> > > + writel_relaxed(0, gt_base + GT_CONTROL);
> > > + writel_relaxed(0, gt_base + GT_COUNTER0);
> > > + writel_relaxed(0, gt_base + GT_COUNTER1);
> > > /* enables timer on all the cores */
> > > - writel(GT_CONTROL_TIMER_ENABLE, gt_base + GT_CONTROL);
> > > + writel_relaxed(GT_CONTROL_TIMER_ENABLE, gt_base + GT_CONTROL);
> > >
> > > #ifdef CONFIG_CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
> > > sched_clock_register(gt_sched_clock_read, 64, gt_clk_rate);
> >
> >
>
> Same here.
>
> Arnd
next prev parent reply other threads:[~2015-11-13 9:59 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-13 8:34 [PATCH] clocksource/drivers/arm_global_timer: Always use {readl|writel}_relaxed Jisheng Zhang
2015-11-13 8:34 ` Jisheng Zhang
2015-11-13 8:40 ` Jisheng Zhang
2015-11-13 8:40 ` Jisheng Zhang
2015-11-13 9:28 ` Arnd Bergmann
2015-11-13 9:28 ` Arnd Bergmann
2015-11-13 9:59 ` Jisheng Zhang [this message]
2015-11-13 9:59 ` Jisheng Zhang
2015-11-13 10:33 ` Arnd Bergmann
2015-11-13 10:33 ` Arnd Bergmann
2015-11-13 12:20 ` Jisheng Zhang
2015-11-13 12:20 ` Jisheng Zhang
2015-11-13 12:37 ` Arnd Bergmann
2015-11-13 12:37 ` Arnd Bergmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151113175948.69f610e9@xhacker \
--to=jszhang@marvell.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.