* [PATCH] drm/radeon: Inline r100_mm_rreg, v2
@ 2014-04-18 21:11 Lauri Kasanen
2014-04-19 10:06 ` Christian König
0 siblings, 1 reply; 5+ messages in thread
From: Lauri Kasanen @ 2014-04-18 21:11 UTC (permalink / raw)
To: dri-devel
This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
Indeed, a first attempt at inlining it grew radeon.ko by 7%.
However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps
in Urban Terror.
v2: We know the minimum MMIO size. Adding it to the if allows the compiler to
optimize the branch out, improving both performance and size.
The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense
says perf is now more than 1% better.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
---
drivers/gpu/drm/radeon/r100.c | 18 ------------------
drivers/gpu/drm/radeon/radeon.h | 21 +++++++++++++++++++--
2 files changed, 19 insertions(+), 20 deletions(-)
The _wreg function could be given similar treatment, but as it's nowhere as hot
(0.009% vs 2%), didn't bother.
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b6c3264..8169e82 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -4086,24 +4086,6 @@ int r100_init(struct radeon_device *rdev)
return 0;
}
-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect)
-{
- if (reg < rdev->rmmio_size && !always_indirect)
- return readl(((void __iomem *)rdev->rmmio) + reg);
- else {
- unsigned long flags;
- uint32_t ret;
-
- spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
- writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
- ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
- spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
-
- return ret;
- }
-}
-
void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
bool always_indirect)
{
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index f21db7a..883276a 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2328,8 +2328,25 @@ int radeon_device_init(struct radeon_device *rdev,
void radeon_device_fini(struct radeon_device *rdev);
int radeon_gpu_wait_for_idle(struct radeon_device *rdev);
-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect);
+static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
+ bool always_indirect)
+{
+ /* The mmio size is 64kb at minimum. Allows the if to be optimized out. */
+ if ((reg < rdev->rmmio_size || reg < 0x10000) && !always_indirect)
+ return readl(((void __iomem *)rdev->rmmio) + reg);
+ else {
+ unsigned long flags;
+ uint32_t ret;
+
+ spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
+ writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
+ ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
+ spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
+
+ return ret;
+ }
+}
+
void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
bool always_indirect);
u32 r100_io_rreg(struct radeon_device *rdev, u32 reg);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, v2
2014-04-18 21:11 [PATCH] drm/radeon: Inline r100_mm_rreg, v2 Lauri Kasanen
@ 2014-04-19 10:06 ` Christian König
2014-04-19 15:15 ` Alex Deucher
0 siblings, 1 reply; 5+ messages in thread
From: Christian König @ 2014-04-19 10:06 UTC (permalink / raw)
To: Lauri Kasanen, dri-devel
Am 18.04.2014 23:11, schrieb Lauri Kasanen:
> This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
> Indeed, a first attempt at inlining it grew radeon.ko by 7%.
>
> However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps
> in Urban Terror.
>
> v2: We know the minimum MMIO size. Adding it to the if allows the compiler to
> optimize the branch out, improving both performance and size.
>
> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense
> says perf is now more than 1% better.
Nice!
But are you sure that the register PCI bar is always at least 64K in
size? Keep in mind that this code is used over all generations since
R100. Additional to that we probably should have a define for that and
also apply the optimizations to r100_mm_wreg as well.
Christian.
>
> Signed-off-by: Lauri Kasanen <cand@gmx.com>
> ---
> drivers/gpu/drm/radeon/r100.c | 18 ------------------
> drivers/gpu/drm/radeon/radeon.h | 21 +++++++++++++++++++--
> 2 files changed, 19 insertions(+), 20 deletions(-)
>
> The _wreg function could be given similar treatment, but as it's nowhere as hot
> (0.009% vs 2%), didn't bother.
>
> diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
> index b6c3264..8169e82 100644
> --- a/drivers/gpu/drm/radeon/r100.c
> +++ b/drivers/gpu/drm/radeon/r100.c
> @@ -4086,24 +4086,6 @@ int r100_init(struct radeon_device *rdev)
> return 0;
> }
>
> -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
> - bool always_indirect)
> -{
> - if (reg < rdev->rmmio_size && !always_indirect)
> - return readl(((void __iomem *)rdev->rmmio) + reg);
> - else {
> - unsigned long flags;
> - uint32_t ret;
> -
> - spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
> - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
> - ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
> - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
> -
> - return ret;
> - }
> -}
> -
> void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
> bool always_indirect)
> {
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index f21db7a..883276a 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -2328,8 +2328,25 @@ int radeon_device_init(struct radeon_device *rdev,
> void radeon_device_fini(struct radeon_device *rdev);
> int radeon_gpu_wait_for_idle(struct radeon_device *rdev);
>
> -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
> - bool always_indirect);
> +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
> + bool always_indirect)
> +{
> + /* The mmio size is 64kb at minimum. Allows the if to be optimized out. */
> + if ((reg < rdev->rmmio_size || reg < 0x10000) && !always_indirect)
> + return readl(((void __iomem *)rdev->rmmio) + reg);
> + else {
> + unsigned long flags;
> + uint32_t ret;
> +
> + spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
> + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
> + ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
> + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
> +
> + return ret;
> + }
> +}
> +
> void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
> bool always_indirect);
> u32 r100_io_rreg(struct radeon_device *rdev, u32 reg);
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, v2
2014-04-19 10:06 ` Christian König
@ 2014-04-19 15:15 ` Alex Deucher
2014-04-19 17:33 ` Lauri Kasanen
0 siblings, 1 reply; 5+ messages in thread
From: Alex Deucher @ 2014-04-19 15:15 UTC (permalink / raw)
To: Christian König; +Cc: Maling list - DRI developers
On Sat, Apr 19, 2014 at 6:06 AM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 18.04.2014 23:11, schrieb Lauri Kasanen:
>
>> This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
>> Indeed, a first attempt at inlining it grew radeon.ko by 7%.
>>
>> However, 2% of cpu is spent in this function. Simply inlining it gave 1%
>> more fps
>> in Urban Terror.
>>
>> v2: We know the minimum MMIO size. Adding it to the if allows the compiler
>> to
>> optimize the branch out, improving both performance and size.
>>
>> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but
>> common sense
>> says perf is now more than 1% better.
>
>
> Nice!
>
> But are you sure that the register PCI bar is always at least 64K in size?
> Keep in mind that this code is used over all generations since R100.
> Additional to that we probably should have a define for that and also apply
> the optimizations to r100_mm_wreg as well.
>
If most of the register accesses are for the interrupt setup, I wonder
if it would be better to just clean up the irq_set functions to reduce
the register accesses. E.g., only touch the registers for the
specific irq masks have changed.
Alex
> Christian.
>
>
>>
>> Signed-off-by: Lauri Kasanen <cand@gmx.com>
>> ---
>> drivers/gpu/drm/radeon/r100.c | 18 ------------------
>> drivers/gpu/drm/radeon/radeon.h | 21 +++++++++++++++++++--
>> 2 files changed, 19 insertions(+), 20 deletions(-)
>>
>> The _wreg function could be given similar treatment, but as it's nowhere
>> as hot
>> (0.009% vs 2%), didn't bother.
>>
>> diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
>> index b6c3264..8169e82 100644
>> --- a/drivers/gpu/drm/radeon/r100.c
>> +++ b/drivers/gpu/drm/radeon/r100.c
>> @@ -4086,24 +4086,6 @@ int r100_init(struct radeon_device *rdev)
>> return 0;
>> }
>> -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
>> - bool always_indirect)
>> -{
>> - if (reg < rdev->rmmio_size && !always_indirect)
>> - return readl(((void __iomem *)rdev->rmmio) + reg);
>> - else {
>> - unsigned long flags;
>> - uint32_t ret;
>> -
>> - spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
>> - writel(reg, ((void __iomem *)rdev->rmmio) +
>> RADEON_MM_INDEX);
>> - ret = readl(((void __iomem *)rdev->rmmio) +
>> RADEON_MM_DATA);
>> - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
>> -
>> - return ret;
>> - }
>> -}
>> -
>> void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
>> bool always_indirect)
>> {
>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>> b/drivers/gpu/drm/radeon/radeon.h
>> index f21db7a..883276a 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -2328,8 +2328,25 @@ int radeon_device_init(struct radeon_device *rdev,
>> void radeon_device_fini(struct radeon_device *rdev);
>> int radeon_gpu_wait_for_idle(struct radeon_device *rdev);
>> -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
>> - bool always_indirect);
>> +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t
>> reg,
>> + bool always_indirect)
>> +{
>> + /* The mmio size is 64kb at minimum. Allows the if to be optimized
>> out. */
>> + if ((reg < rdev->rmmio_size || reg < 0x10000) && !always_indirect)
>> + return readl(((void __iomem *)rdev->rmmio) + reg);
>> + else {
>> + unsigned long flags;
>> + uint32_t ret;
>> +
>> + spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
>> + writel(reg, ((void __iomem *)rdev->rmmio) +
>> RADEON_MM_INDEX);
>> + ret = readl(((void __iomem *)rdev->rmmio) +
>> RADEON_MM_DATA);
>> + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
>> +
>> + return ret;
>> + }
>> +}
>> +
>> void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
>> bool always_indirect);
>> u32 r100_io_rreg(struct radeon_device *rdev, u32 reg);
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, v2
2014-04-19 15:15 ` Alex Deucher
@ 2014-04-19 17:33 ` Lauri Kasanen
2014-04-20 13:44 ` Christian König
0 siblings, 1 reply; 5+ messages in thread
From: Lauri Kasanen @ 2014-04-19 17:33 UTC (permalink / raw)
To: Alex Deucher; +Cc: Maling list - DRI developers
On Sat, 19 Apr 2014 11:15:53 -0400
Alex Deucher <alexdeucher@gmail.com> wrote:
> On Sat, Apr 19, 2014 at 6:06 AM, Christian König
> >> This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
> >> Indeed, a first attempt at inlining it grew radeon.ko by 7%.
> >>
> >> However, 2% of cpu is spent in this function. Simply inlining it gave 1%
> >> more fps
> >> in Urban Terror.
> >>
> >> v2: We know the minimum MMIO size. Adding it to the if allows the compiler
> >> to
> >> optimize the branch out, improving both performance and size.
> >>
> >> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but
> >> common sense
> >> says perf is now more than 1% better.
> >
> > Nice!
> >
> > But are you sure that the register PCI bar is always at least 64K in size?
> > Keep in mind that this code is used over all generations since R100.
> > Additional to that we probably should have a define for that and also apply
> > the optimizations to r100_mm_wreg as well.
Yes, I checked the earlier code. It had 64kb hard-coded, and when it
was changed in 2010 to use the dynamic value, the commit said later
asics are larger. (07bec2df01)
A quick google also didn't find any dmesg with smaller values, R100
cards had 64kb.
> If most of the register accesses are for the interrupt setup, I wonder
> if it would be better to just clean up the irq_set functions to reduce
> the register accesses. E.g., only touch the registers for the
> specific irq masks have changed.
Yes, that should also be done. But as this function is used elsewhere
as well, having it fast (not to mention the size decrease) would be
good.
I think this patch is safe enough for 3.15, but perhaps it's too late
now.
- Lauri
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, v2
2014-04-19 17:33 ` Lauri Kasanen
@ 2014-04-20 13:44 ` Christian König
0 siblings, 0 replies; 5+ messages in thread
From: Christian König @ 2014-04-20 13:44 UTC (permalink / raw)
To: Lauri Kasanen, Alex Deucher; +Cc: Maling list - DRI developers
Am 19.04.2014 19:33, schrieb Lauri Kasanen:
> On Sat, 19 Apr 2014 11:15:53 -0400
> Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> On Sat, Apr 19, 2014 at 6:06 AM, Christian König
>>>> This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
>>>> Indeed, a first attempt at inlining it grew radeon.ko by 7%.
>>>>
>>>> However, 2% of cpu is spent in this function. Simply inlining it gave 1%
>>>> more fps
>>>> in Urban Terror.
>>>>
>>>> v2: We know the minimum MMIO size. Adding it to the if allows the compiler
>>>> to
>>>> optimize the branch out, improving both performance and size.
>>>>
>>>> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but
>>>> common sense
>>>> says perf is now more than 1% better.
>>> Nice!
>>>
>>> But are you sure that the register PCI bar is always at least 64K in size?
>>> Keep in mind that this code is used over all generations since R100.
>>> Additional to that we probably should have a define for that and also apply
>>> the optimizations to r100_mm_wreg as well.
> Yes, I checked the earlier code. It had 64kb hard-coded, and when it
> was changed in 2010 to use the dynamic value, the commit said later
> asics are larger. (07bec2df01)
>
> A quick google also didn't find any dmesg with smaller values, R100
> cards had 64kb.
Thanks for digging that up, this indeed sounds valid and like a very
nice optimization.
Just as suggested before add a define for this in radeon.h, something
like RADEON_MIN_PCI_BAR_SIZE and do the same for r100_mm_wreg as well.
>> If most of the register accesses are for the interrupt setup, I wonder
>> if it would be better to just clean up the irq_set functions to reduce
>> the register accesses. E.g., only touch the registers for the
>> specific irq masks have changed.
> Yes, that should also be done. But as this function is used elsewhere
> as well, having it fast (not to mention the size decrease) would be
> good.
>
> I think this patch is safe enough for 3.15, but perhaps it's too late
> now.
Yeah, probably. But 3.16 should work as well.
Christian.
> - Lauri
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-04-20 13:44 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-18 21:11 [PATCH] drm/radeon: Inline r100_mm_rreg, v2 Lauri Kasanen
2014-04-19 10:06 ` Christian König
2014-04-19 15:15 ` Alex Deucher
2014-04-19 17:33 ` Lauri Kasanen
2014-04-20 13:44 ` Christian König
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.