From: Denys Vlasenko <dvlasenk@redhat.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] radeon: Shrink radeon_ring_write()
Date: Mon, 18 May 2015 20:25:48 +0200 [thread overview]
Message-ID: <555A2EAC.9000302@redhat.com> (raw)
In-Reply-To: <555A2B4E.1090309@amd.com>
On 05/18/2015 08:11 PM, Christian König wrote:
> De-duplicating the error message is probably a good idea,
> but are the remaining code changes really necessary for the size reduction?
The conversion from
if (ring->count_dw <= 0)
error;
...
ring->count_dw--;
to
if (--ring->count_dw < 0)
error;
...
eliminates one access to ring->count_dw.
Conversion from
ring->ring[ring->wptr++] = v;
ring->wptr &= ring->ptr_mask;
to
ring->ring[ring->wptr] = v;
ring->wptr = (ring->wptr + 1) & ring->ptr_mask;
ideally (with infinitely smart compiler)
shouldn't be necessary, but at least for my gcc
version it eliminates one additional insn:
gcc copies ring->wptr to a scratch reg, then increments it,
then uses scratch reg for addressing...
> On 18.05.2015 19:59, Denys Vlasenko wrote:
>> Inlined radeon_ring_write() has 729 callers, which amounts to about 50000
>> bytes of code. however, deinlining it is probably too much
>> of a performance impact.
>>
>> This patch shrinks slow path a bit and optimizes fast path.
>> Comparison of generated machine code is below:
>>
>> old___________________________________ new____________________________
>> 55 push %rbp 55 push %rbp
>> 4889e5 mov %rsp,%rbp ff4f38 decl 0x38(%rdi)
>> 4154 push %r12 4889e5 mov %rsp,%rbp
>> 4189f4 mov %esi,%r12d 4154 push %r12
>> 53 push %rbx 4189f4 mov %esi,%r12d
>> 837f3800 cmpl $0x0,0x38(%rdi) 53 push %rbx
>> 4889fb mov %rdi,%rbx 4889fb mov %rdi,%rbx
>> 7f0e jg <.Lbl> 7905 jns <.Lbl>
>> 48c7c78f51a785 mov $message,%rdi
>> 31c0 xor %eax,%eax
>> e89306f9ff call <drm_err> e8cbffffff call <radeon_ring_overflow>
>> .Lbl:
>> 8b4328 mov 0x28(%rbx),%eax 8b5328 mov 0x28(%rbx),%edx
>> 488b5308 mov 0x8(%rbx),%rdx 488b4308 mov 0x8(%rbx),%rax
>> 89c1 mov %eax,%ecx 488d0490 lea (%rax,%rdx,4),%rax
>> ffc0 inc %eax
>> 488d148a lea (%rdx,%rcx,4),%rdx 448920 mov %r12d,(%rax)
>> 448922 mov %r12d,(%rdx) 8b4328 mov 0x28(%rbx),%eax
>> 234354 and 0x54(%rbx),%eax ff4b34 decl 0x34(%rbx)
>> ff4b38 decl 0x38(%rbx) ffc0 inc %eax
>> ff4b34 decl 0x34(%rbx) 234354 and 0x54(%rbx),%eax
>> 894328 mov %eax,0x28(%rbx) 894328 mov %eax,0x28(%rbx)
>> 5b pop %rbx 5b pop %rbx
>> 415c pop %r12 415c pop %r12
>> 5d pop %rbp 5d pop %rbp
>>
>> This shaves off more than 10 kbytes of code off the kernel:
>>
>> text data bss dec hex filename
>> 85657104 22294872 20627456 128579432 7a9f768 vmlinux.before
>> 85646544 22294872 20627456 128568872 7a9ce28 vmlinux
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> drivers/gpu/drm/radeon/radeon.h | 11 +++++------
>> drivers/gpu/drm/radeon/radeon_ring.c | 5 +++++
>> 2 files changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index bb6b25c..9106873 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -2658,14 +2658,13 @@ void radeon_atombios_fini(struct radeon_device *rdev);
>> *
>> * Write a value to the requested ring buffer (all asics).
>> */
>> +void radeon_ring_overflow(void);
>> static inline void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
>> {
>> - if (ring->count_dw <= 0)
>> - DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
>> -
>> - ring->ring[ring->wptr++] = v;
>> - ring->wptr &= ring->ptr_mask;
>> - ring->count_dw--;
>> + if (--ring->count_dw < 0)
>> + radeon_ring_overflow();
>> + ring->ring[ring->wptr] = v;
>> + ring->wptr = (ring->wptr + 1) & ring->ptr_mask;
>> ring->ring_free_dw--;
>> }
>> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
>> index 2456f69..8204c23 100644
>> --- a/drivers/gpu/drm/radeon/radeon_ring.c
>> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
>> @@ -126,6 +126,11 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi
>> return 0;
>> }
>> +void radeon_ring_overflow(void)
>> +{
>> + DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
>> +}
>> +
>> /**
>> * radeon_ring_lock - lock the ring and allocate space on it
>> *
>
next prev parent reply other threads:[~2015-05-18 18:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-18 17:59 [PATCH] radeon: Deinline indirect register accessor functions Denys Vlasenko
2015-05-18 17:59 ` [PATCH] radeon: Shrink radeon_ring_write() Denys Vlasenko
2015-05-18 18:11 ` Christian König
2015-05-18 18:25 ` Denys Vlasenko [this message]
2015-05-18 19:01 ` Christian König
2015-05-18 18:06 ` [PATCH] radeon: Deinline indirect register accessor functions Christian König
2015-05-18 18:50 ` Denys Vlasenko
2015-05-18 19:04 ` Christian König
2015-05-18 19:22 ` Ilia Mirkin
2015-05-18 19:22 ` Ilia Mirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=555A2EAC.9000302@redhat.com \
--to=dvlasenk@redhat.com \
--cc=alexander.deucher@amd.com \
--cc=christian.koenig@amd.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.