From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755009AbbERTCK (ORCPT ); Mon, 18 May 2015 15:02:10 -0400 Received: from mail-bl2on0136.outbound.protection.outlook.com ([65.55.169.136]:47280 "EHLO na01-bl2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1754972AbbERTCE convert rfc822-to-8bit (ORCPT ); Mon, 18 May 2015 15:02:04 -0400 X-Greylist: delayed 2447 seconds by postgrey-1.27 at vger.kernel.org; Mon, 18 May 2015 15:02:03 EDT Authentication-Results: spf=none (sender IP is 165.204.84.221) smtp.mailfrom=amd.com; redhat.com; dkim=none (message not signed) header.d=none; X-WSS-ID: 0NOK7J8-07-8SA-02 X-M-MSG: Message-ID: <555A3720.80600@amd.com> Date: Mon, 18 May 2015 21:01:52 +0200 From: =?UTF-8?B?Q2hyaXN0aWFuIEvDtm5pZw==?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Denys Vlasenko CC: Alex Deucher , Subject: Re: [PATCH] radeon: Shrink radeon_ring_write() References: <1431971955-31231-1-git-send-email-dvlasenk@redhat.com> <1431971955-31231-2-git-send-email-dvlasenk@redhat.com> <555A2B4E.1090309@amd.com> <555A2EAC.9000302@redhat.com> In-Reply-To: <555A2EAC.9000302@redhat.com> Content-Type: text/plain; charset="utf-8"; format=flowed X-Originating-IP: [10.224.50.12] Content-Transfer-Encoding: 8BIT X-EOPAttributedMessage: 0 X-Microsoft-Exchange-Diagnostics: 1;BL2FFO11FD046;1:swzAi5UvFZd8FUzKIbgu6Ny849f3m6zAuvDeIZ1Nn0EhryIa5uPYgN2XbtCcHEsUNEHJBcccVhEZbys6BHt6ih/v4zclUbXtvESDR+WdDhYlccdITgi4DtnrWyPVxh25V+zNo9YjQaj0Td0Z2sn1AWTQ+RANNkYHStwmZMopqtdR9zR9HJmrMNdTMQ/13o2jFaYGPGLl6aCtf+OFfuUUINVBMIUrFwrGW/BeTBl1zy4FGVgtHWO5Rt7IN3aeqr46HZIfWzU5kX3iGFNGTl5omQ== X-Forefront-Antispam-Report: CIP:165.204.84.221;CTRY:US;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(10019020)(6009001)(428002)(24454002)(51704005)(199003)(479174004)(377454003)(189002)(86362001)(189998001)(575784001)(65816999)(23676002)(47776003)(110136002)(4001350100001)(4001540100001)(5001830100001)(5001860100001)(5001920100001)(97736004)(62966003)(65806001)(85202003)(65956001)(54356999)(106466001)(76176999)(68736005)(50986999)(77156002)(64706001)(77096005)(46102003)(19580395003)(19580405001)(92566002)(105586002)(64126003)(33656002)(93886004)(101416001)(85182001)(36756003)(50466002)(87936001)(83506001)(2950100001)(3940600001);DIR:OUT;SFP:1102;SCL:1;SRVR:BY2PR02MB076;H:atltwp01.amd.com;FPR:;SPF:None;PTR:InfoDomainNonexistent;A:1;MX:1;LANG:en; X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:BY2PR02MB076; X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(5005006)(3002001);SRVR:BY2PR02MB076;BCL:0;PCL:0;RULEID:;SRVR:BY2PR02MB076; X-Forefront-PRVS: 058043A388 X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 May 2015 19:01:58.4360 (UTC) X-MS-Exchange-CrossTenant-Id: fde4dada-be84-483f-92cc-e026cbee8e96 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=fde4dada-be84-483f-92cc-e026cbee8e96;Ip=[165.204.84.221];Helo=[atltwp01.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY2PR02MB076 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Well thinking about this we could actually get completely ride of decrementing count_dw if we precalculate the last valid wptr instead. That would also reduce the number of error messages to once every ring submit, but I would say this is actually a rather positive side effect. Regards, Christian. On 18.05.2015 20:25, Denys Vlasenko wrote: > On 05/18/2015 08:11 PM, Christian König wrote: >> De-duplicating the error message is probably a good idea, >> but are the remaining code changes really necessary for the size reduction? > The conversion from > > if (ring->count_dw <= 0) > error; > ... > ring->count_dw--; > > to > > if (--ring->count_dw < 0) > error; > ... > > eliminates one access to ring->count_dw. > > > Conversion from > > ring->ring[ring->wptr++] = v; > ring->wptr &= ring->ptr_mask; > > to > > ring->ring[ring->wptr] = v; > ring->wptr = (ring->wptr + 1) & ring->ptr_mask; > > ideally (with infinitely smart compiler) > shouldn't be necessary, but at least for my gcc > version it eliminates one additional insn: > gcc copies ring->wptr to a scratch reg, then increments it, > then uses scratch reg for addressing... > > >> On 18.05.2015 19:59, Denys Vlasenko wrote: >>> Inlined radeon_ring_write() has 729 callers, which amounts to about 50000 >>> bytes of code. however, deinlining it is probably too much >>> of a performance impact. >>> >>> This patch shrinks slow path a bit and optimizes fast path. >>> Comparison of generated machine code is below: >>> >>> old___________________________________ new____________________________ >>> 55 push %rbp 55 push %rbp >>> 4889e5 mov %rsp,%rbp ff4f38 decl 0x38(%rdi) >>> 4154 push %r12 4889e5 mov %rsp,%rbp >>> 4189f4 mov %esi,%r12d 4154 push %r12 >>> 53 push %rbx 4189f4 mov %esi,%r12d >>> 837f3800 cmpl $0x0,0x38(%rdi) 53 push %rbx >>> 4889fb mov %rdi,%rbx 4889fb mov %rdi,%rbx >>> 7f0e jg <.Lbl> 7905 jns <.Lbl> >>> 48c7c78f51a785 mov $message,%rdi >>> 31c0 xor %eax,%eax >>> e89306f9ff call e8cbffffff call >>> .Lbl: >>> 8b4328 mov 0x28(%rbx),%eax 8b5328 mov 0x28(%rbx),%edx >>> 488b5308 mov 0x8(%rbx),%rdx 488b4308 mov 0x8(%rbx),%rax >>> 89c1 mov %eax,%ecx 488d0490 lea (%rax,%rdx,4),%rax >>> ffc0 inc %eax >>> 488d148a lea (%rdx,%rcx,4),%rdx 448920 mov %r12d,(%rax) >>> 448922 mov %r12d,(%rdx) 8b4328 mov 0x28(%rbx),%eax >>> 234354 and 0x54(%rbx),%eax ff4b34 decl 0x34(%rbx) >>> ff4b38 decl 0x38(%rbx) ffc0 inc %eax >>> ff4b34 decl 0x34(%rbx) 234354 and 0x54(%rbx),%eax >>> 894328 mov %eax,0x28(%rbx) 894328 mov %eax,0x28(%rbx) >>> 5b pop %rbx 5b pop %rbx >>> 415c pop %r12 415c pop %r12 >>> 5d pop %rbp 5d pop %rbp >>> >>> This shaves off more than 10 kbytes of code off the kernel: >>> >>> text data bss dec hex filename >>> 85657104 22294872 20627456 128579432 7a9f768 vmlinux.before >>> 85646544 22294872 20627456 128568872 7a9ce28 vmlinux >>> >>> Signed-off-by: Denys Vlasenko >>> Cc: Christian König >>> Cc: Alex Deucher >>> Cc: linux-kernel@vger.kernel.org >>> --- >>> drivers/gpu/drm/radeon/radeon.h | 11 +++++------ >>> drivers/gpu/drm/radeon/radeon_ring.c | 5 +++++ >>> 2 files changed, 10 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h >>> index bb6b25c..9106873 100644 >>> --- a/drivers/gpu/drm/radeon/radeon.h >>> +++ b/drivers/gpu/drm/radeon/radeon.h >>> @@ -2658,14 +2658,13 @@ void radeon_atombios_fini(struct radeon_device *rdev); >>> * >>> * Write a value to the requested ring buffer (all asics). >>> */ >>> +void radeon_ring_overflow(void); >>> static inline void radeon_ring_write(struct radeon_ring *ring, uint32_t v) >>> { >>> - if (ring->count_dw <= 0) >>> - DRM_ERROR("radeon: writing more dwords to the ring than expected!\n"); >>> - >>> - ring->ring[ring->wptr++] = v; >>> - ring->wptr &= ring->ptr_mask; >>> - ring->count_dw--; >>> + if (--ring->count_dw < 0) >>> + radeon_ring_overflow(); >>> + ring->ring[ring->wptr] = v; >>> + ring->wptr = (ring->wptr + 1) & ring->ptr_mask; >>> ring->ring_free_dw--; >>> } >>> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c >>> index 2456f69..8204c23 100644 >>> --- a/drivers/gpu/drm/radeon/radeon_ring.c >>> +++ b/drivers/gpu/drm/radeon/radeon_ring.c >>> @@ -126,6 +126,11 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi >>> return 0; >>> } >>> +void radeon_ring_overflow(void) >>> +{ >>> + DRM_ERROR("radeon: writing more dwords to the ring than expected!\n"); >>> +} >>> + >>> /** >>> * radeon_ring_lock - lock the ring and allocate space on it >>> *