From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <555A2B4E.1090309@amd.com>
Date: Mon, 18 May 2015 20:11:26 +0200
From: Christian König
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: Denys Vlasenko
CC: Alex Deucher ,
Subject: Re: [PATCH] radeon: Shrink radeon_ring_write()
References: <1431971955-31231-1-git-send-email-dvlasenk@redhat.com> <1431971955-31231-2-git-send-email-dvlasenk@redhat.com>
In-Reply-To: <1431971955-31231-2-git-send-email-dvlasenk@redhat.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

De-duplicating the error message is probably a good idea, but are the
remaining code changes really necessary for the size reduction?

Regards,
Christian.

On 18.05.2015 19:59, Denys Vlasenko wrote:
> Inlined radeon_ring_write() has 729 callers, which amounts to about 50000
> bytes of code. However, deinlining it is probably too much
> of a performance impact.
>
> This patch shrinks the slow path a bit and optimizes the fast path.
> Comparison of generated machine code is below:
>
> old___________________________________  new____________________________
> 55             push %rbp                55         push %rbp
> 4889e5         mov %rsp,%rbp            ff4f38     decl 0x38(%rdi)
> 4154           push %r12                4889e5     mov %rsp,%rbp
> 4189f4         mov %esi,%r12d           4154       push %r12
> 53             push %rbx                4189f4     mov %esi,%r12d
> 837f3800       cmpl $0x0,0x38(%rdi)     53         push %rbx
> 4889fb         mov %rdi,%rbx            4889fb     mov %rdi,%rbx
> 7f0e           jg <.Lbl>                7905       jns <.Lbl>
> 48c7c78f51a785 mov $message,%rdi
> 31c0           xor %eax,%eax
> e89306f9ff     call                     e8cbffffff call
> .Lbl:
> 8b4328         mov 0x28(%rbx),%eax      8b5328     mov 0x28(%rbx),%edx
> 488b5308       mov 0x8(%rbx),%rdx       488b4308   mov 0x8(%rbx),%rax
> 89c1           mov %eax,%ecx            488d0490   lea (%rax,%rdx,4),%rax
> ffc0           inc %eax
> 488d148a       lea (%rdx,%rcx,4),%rdx   448920     mov %r12d,(%rax)
> 448922         mov %r12d,(%rdx)         8b4328     mov 0x28(%rbx),%eax
> 234354         and 0x54(%rbx),%eax      ff4b34     decl 0x34(%rbx)
> ff4b38         decl 0x38(%rbx)          ffc0       inc %eax
> ff4b34         decl 0x34(%rbx)          234354     and 0x54(%rbx),%eax
> 894328         mov %eax,0x28(%rbx)      894328     mov %eax,0x28(%rbx)
> 5b             pop %rbx                 5b         pop %rbx
> 415c           pop %r12                 415c       pop %r12
> 5d             pop %rbp                 5d         pop %rbp
>
> This shaves more than 10 kbytes of code off the kernel:
>
>     text     data      bss       dec     hex filename
> 85657104 22294872 20627456 128579432 7a9f768 vmlinux.before
> 85646544 22294872 20627456 128568872 7a9ce28 vmlinux
>
> Signed-off-by: Denys Vlasenko
> Cc: Christian König
> Cc: Alex Deucher
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/gpu/drm/radeon/radeon.h      | 11 +++++------
>  drivers/gpu/drm/radeon/radeon_ring.c |  5 +++++
>  2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index bb6b25c..9106873 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -2658,14 +2658,13 @@ void radeon_atombios_fini(struct radeon_device *rdev);
>    *
>    * Write a value to the requested ring buffer (all asics).
>    */
> +void radeon_ring_overflow(void);
>  static inline void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
>  {
> -	if (ring->count_dw <= 0)
> -		DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
> -
> -	ring->ring[ring->wptr++] = v;
> -	ring->wptr &= ring->ptr_mask;
> -	ring->count_dw--;
> +	if (--ring->count_dw < 0)
> +		radeon_ring_overflow();
> +	ring->ring[ring->wptr] = v;
> +	ring->wptr = (ring->wptr + 1) & ring->ptr_mask;
>  	ring->ring_free_dw--;
>  }
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
> index 2456f69..8204c23 100644
> --- a/drivers/gpu/drm/radeon/radeon_ring.c
> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
> @@ -126,6 +126,11 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi
>  	return 0;
>  }
>
> +void radeon_ring_overflow(void)
> +{
> +	DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
> +}
> +
>  /**
>   * radeon_ring_lock - lock the ring and allocate space on it
>   *
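
For anyone who wants to compare the two variants outside the kernel, here is a rough userspace sketch of the old and new write paths. It is not the driver code: struct demo_ring, demo_ring_init(), demo_ring_overflow() and the report counter are made-up stand-ins for illustration, fprintf() takes the place of DRM_ERROR(), and the noinline attribute assumes GCC or Clang. It only shows that "check, then decrement" and "decrement, then check" report an overflow on the same write and leave the ring in the same state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct radeon_ring fields the patch touches. */
struct demo_ring {
	uint32_t buf[8];	/* ring storage, size must be a power of two */
	unsigned wptr;		/* write pointer, in dwords */
	unsigned ptr_mask;	/* size - 1, used to wrap wptr */
	int count_dw;		/* dwords the caller reserved */
	int ring_free_dw;	/* free dwords left in the ring */
};

static int demo_overflow_reports;	/* counts slow-path hits (illustration only) */

/* Out-of-line slow path: a single copy of the message for all call sites. */
static __attribute__((noinline)) void demo_ring_overflow(void)
{
	demo_overflow_reports++;
	fprintf(stderr, "demo: writing more dwords to the ring than expected!\n");
}

/* Shape of the old code: test count_dw first, decrement it at the end. */
static inline void demo_ring_write_old(struct demo_ring *ring, uint32_t v)
{
	if (ring->count_dw <= 0)
		demo_ring_overflow();
	ring->buf[ring->wptr++] = v;
	ring->wptr &= ring->ptr_mask;
	ring->count_dw--;
	ring->ring_free_dw--;
}

/* Shape of the new code: decrement first and branch on the sign of the result. */
static inline void demo_ring_write_new(struct demo_ring *ring, uint32_t v)
{
	if (--ring->count_dw < 0)
		demo_ring_overflow();
	ring->buf[ring->wptr] = v;
	ring->wptr = (ring->wptr + 1) & ring->ptr_mask;
	ring->ring_free_dw--;
}

/* Build an empty 8-dword ring with count_dw dwords reserved. */
static struct demo_ring demo_ring_init(int count_dw)
{
	struct demo_ring r = { .ptr_mask = 7, .count_dw = count_dw, .ring_free_dw = 8 };
	return r;
}
```

Writing three dwords into a ring initialised with demo_ring_init(2) should trigger exactly one report in either variant, on the third write. The sketch says nothing about code size; that still needs a comparison of the generated object code like the one quoted above.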