From: jszhang@marvell.com (Jisheng Zhang)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] clocksource/drivers/arm_global_timer: Always use {readl|writel}_relaxed
Date: Fri, 13 Nov 2015 20:20:01 +0800 [thread overview]
Message-ID: <20151113202001.5933ae54@xhacker> (raw)
In-Reply-To: <25602407.8sIohphlWH@wuerfel>
Dear Arnd,
On Fri, 13 Nov 2015 11:33:12 +0100
Arnd Bergmann wrote:
> On Friday 13 November 2015 17:59:48 Jisheng Zhang wrote:
> > On Fri, 13 Nov 2015 10:28:01 +0100
> > Arnd Bergmann wrote:
> > > On Friday 13 November 2015 16:40:25 Jisheng Zhang wrote:
> > > > On Fri, 13 Nov 2015 16:34:38 +0800
> > > > > diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
> > > > > index a2cb6fa..84a5a5d 100644
> > > > > --- a/drivers/clocksource/arm_global_timer.c
> > > > > +++ b/drivers/clocksource/arm_global_timer.c
> > > > > @@ -99,27 +99,27 @@ static void gt_compare_set(unsigned long delta, int periodic)
> > > > >
> > > > > counter += delta;
> > > > > ctrl = GT_CONTROL_TIMER_ENABLE;
> > > > > - writel(ctrl, gt_base + GT_CONTROL);
> > > > > - writel(lower_32_bits(counter), gt_base + GT_COMP0);
> > > > > - writel(upper_32_bits(counter), gt_base + GT_COMP1);
> > > > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > > > + writel_relaxed(lower_32_bits(counter), gt_base + GT_COMP0);
> > > > > + writel_relaxed(upper_32_bits(counter), gt_base + GT_COMP1);
> > > > >
> > > > > if (periodic) {
> > > > > - writel(delta, gt_base + GT_AUTO_INC);
> > > > > + writel_relaxed(delta, gt_base + GT_AUTO_INC);
> > > > > ctrl |= GT_CONTROL_AUTO_INC;
> > > > > }
> > > > >
> > > > > ctrl |= GT_CONTROL_COMP_ENABLE | GT_CONTROL_IRQ_ENABLE;
> > > > > - writel(ctrl, gt_base + GT_CONTROL);
> > > > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > > > }
> > >
> > > This seems fine. Do you have any performance numbers to show how much
> > > we save per call on a platform you care about, and how often it is
> > > called for a typical workload?
> >
> > To be honest, all my platforms don't make use of global timer for clockevent,
> > we use dw_apb_timer and twd or arch_timer instead, but one performance impact
> > I saw in our case can also apply for the case with global timer as clokevent:
> >
> > there are 500-1000 short sleeps, yes not good userspace behavior, so we
> > program clockevent device 500-1000 times/s. If the system is powered by CA9
> > with outer L2 cache, the writel will contend for l2x0_lock for 500-1000 times/s.
> > Then the L2 cache maintenance from other device driver have more chance to
> > spinning at the l2x0_lock, so other device driver performance is impacted.
>
> Just to make sure I get this right: which outer cache implementation do you
> use in this case? Most Cortex-A9 use pl310, which does not require l2x0_lock
PL310
> for outer_cache.sync(). The Aurora outer cache sync has a different method
> and also doesn't use l2x0_lock. Finally, tauros3 doesn't need a cache sync
> at all.
>
> Did you look at an older kernel version? We used to do a loop in the
oops, yes. The kernel version in product still needs the spinlock in sync.
I didn't check the L2 cache code for about 1 year, sorry for that.
If we upgrade to newer kernel version, yes, the bit performance bottleneck --
spinlock contention won't exist anymore. Thanks for pointing out this.
But I think we may still see trivial system performance improvement in 500-1000
times/s of clockevent programming case due to the mb() in writel.
Thanks,
Jisheng
> Aurora cache sync operation until I fixed that, so it should be a bit
> faster now. It will still require doing the actual sync, but at least
> there should not be any lock contention these days.
>
> Arnd
WARNING: multiple messages have this Message-ID (diff)
From: Jisheng Zhang <jszhang@marvell.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arm-kernel@lists.infradead.org>, <kernel@stlinux.com>,
<srinivas.kandagatla@gmail.com>, <daniel.lezcano@linaro.org>,
<patrice.chotard@st.com>, <linux-kernel@vger.kernel.org>,
<tglx@linutronix.de>, <maxime.coquelin@st.com>
Subject: Re: [PATCH] clocksource/drivers/arm_global_timer: Always use {readl|writel}_relaxed
Date: Fri, 13 Nov 2015 20:20:01 +0800 [thread overview]
Message-ID: <20151113202001.5933ae54@xhacker> (raw)
In-Reply-To: <25602407.8sIohphlWH@wuerfel>
Dear Arnd,
On Fri, 13 Nov 2015 11:33:12 +0100
Arnd Bergmann wrote:
> On Friday 13 November 2015 17:59:48 Jisheng Zhang wrote:
> > On Fri, 13 Nov 2015 10:28:01 +0100
> > Arnd Bergmann wrote:
> > > On Friday 13 November 2015 16:40:25 Jisheng Zhang wrote:
> > > > On Fri, 13 Nov 2015 16:34:38 +0800
> > > > > diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
> > > > > index a2cb6fa..84a5a5d 100644
> > > > > --- a/drivers/clocksource/arm_global_timer.c
> > > > > +++ b/drivers/clocksource/arm_global_timer.c
> > > > > @@ -99,27 +99,27 @@ static void gt_compare_set(unsigned long delta, int periodic)
> > > > >
> > > > > counter += delta;
> > > > > ctrl = GT_CONTROL_TIMER_ENABLE;
> > > > > - writel(ctrl, gt_base + GT_CONTROL);
> > > > > - writel(lower_32_bits(counter), gt_base + GT_COMP0);
> > > > > - writel(upper_32_bits(counter), gt_base + GT_COMP1);
> > > > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > > > + writel_relaxed(lower_32_bits(counter), gt_base + GT_COMP0);
> > > > > + writel_relaxed(upper_32_bits(counter), gt_base + GT_COMP1);
> > > > >
> > > > > if (periodic) {
> > > > > - writel(delta, gt_base + GT_AUTO_INC);
> > > > > + writel_relaxed(delta, gt_base + GT_AUTO_INC);
> > > > > ctrl |= GT_CONTROL_AUTO_INC;
> > > > > }
> > > > >
> > > > > ctrl |= GT_CONTROL_COMP_ENABLE | GT_CONTROL_IRQ_ENABLE;
> > > > > - writel(ctrl, gt_base + GT_CONTROL);
> > > > > + writel_relaxed(ctrl, gt_base + GT_CONTROL);
> > > > > }
> > >
> > > This seems fine. Do you have any performance numbers to show how much
> > > we save per call on a platform you care about, and how often it is
> > > called for a typical workload?
> >
> > To be honest, all my platforms don't make use of global timer for clockevent,
> > we use dw_apb_timer and twd or arch_timer instead, but one performance impact
> > I saw in our case can also apply for the case with global timer as clokevent:
> >
> > there are 500-1000 short sleeps, yes not good userspace behavior, so we
> > program clockevent device 500-1000 times/s. If the system is powered by CA9
> > with outer L2 cache, the writel will contend for l2x0_lock for 500-1000 times/s.
> > Then the L2 cache maintenance from other device driver have more chance to
> > spinning at the l2x0_lock, so other device driver performance is impacted.
>
> Just to make sure I get this right: which outer cache implementation do you
> use in this case? Most Cortex-A9 use pl310, which does not require l2x0_lock
PL310
> for outer_cache.sync(). The Aurora outer cache sync has a different method
> and also doesn't use l2x0_lock. Finally, tauros3 doesn't need a cache sync
> at all.
>
> Did you look at an older kernel version? We used to do a loop in the
oops, yes. The kernel version in product still needs the spinlock in sync.
I didn't check the L2 cache code for about 1 year, sorry for that.
If we upgrade to newer kernel version, yes, the bit performance bottleneck --
spinlock contention won't exist anymore. Thanks for pointing out this.
But I think we may still see trivial system performance improvement in 500-1000
times/s of clockevent programming case due to the mb() in writel.
Thanks,
Jisheng
> Aurora cache sync operation until I fixed that, so it should be a bit
> faster now. It will still require doing the actual sync, but at least
> there should not be any lock contention these days.
>
> Arnd
next prev parent reply other threads:[~2015-11-13 12:20 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-13 8:34 [PATCH] clocksource/drivers/arm_global_timer: Always use {readl|writel}_relaxed Jisheng Zhang
2015-11-13 8:34 ` Jisheng Zhang
2015-11-13 8:40 ` Jisheng Zhang
2015-11-13 8:40 ` Jisheng Zhang
2015-11-13 9:28 ` Arnd Bergmann
2015-11-13 9:28 ` Arnd Bergmann
2015-11-13 9:59 ` Jisheng Zhang
2015-11-13 9:59 ` Jisheng Zhang
2015-11-13 10:33 ` Arnd Bergmann
2015-11-13 10:33 ` Arnd Bergmann
2015-11-13 12:20 ` Jisheng Zhang [this message]
2015-11-13 12:20 ` Jisheng Zhang
2015-11-13 12:37 ` Arnd Bergmann
2015-11-13 12:37 ` Arnd Bergmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151113202001.5933ae54@xhacker \
--to=jszhang@marvell.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.