From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?ISO-8859-1?Q?Christian_K=F6nig?= Subject: Re: [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3 Date: Sun, 20 Apr 2014 19:41:11 +0200 Message-ID: <535406B7.3090309@vodafone.de> References: <20140420202933.78b3e355.cand@gmx.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38]) by gabe.freedesktop.org (Postfix) with ESMTP id C058A6E1E8 for ; Sun, 20 Apr 2014 10:41:23 -0700 (PDT) In-Reply-To: <20140420202933.78b3e355.cand@gmx.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Lauri Kasanen , dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org Am 20.04.2014 19:29, schrieb Lauri Kasanen: > This was originally un-inlined by Andi Kleen in 2011 citing size concerns. > Indeed, a first attempt at inlining it grew radeon.ko by 7%. > > However, 2% of cpu is spent in this function. Simply inlining it gave 1% = more fps > in Urban Terror. > > v2: We know the minimum MMIO size. Adding it to the if allows the compile= r to > optimize the branch out, improving both performance and size. > > The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but c= ommon sense > says perf is now more than 1% better. > > v3: Also change _wreg, make the threshold a define. > > Inlining _wreg increased the size a bit compared to v2, so now radeon.ko > is only 1% smaller. > > Signed-off-by: Lauri Kasanen Reviewed-by: Christian K=F6nig > --- > drivers/gpu/drm/radeon/r100.c | 33 --------------------------------- > drivers/gpu/drm/radeon/radeon.h | 40 ++++++++++++++++++++++++++++++++++= ++---- > 2 files changed, 36 insertions(+), 37 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c > index b6c3264..a4e7871 100644 > --- a/drivers/gpu/drm/radeon/r100.c > +++ b/drivers/gpu/drm/radeon/r100.c > @@ -4086,39 +4086,6 @@ int r100_init(struct radeon_device *rdev) > return 0; > } > = > -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, > - bool always_indirect) > -{ > - if (reg < rdev->rmmio_size && !always_indirect) > - return readl(((void __iomem *)rdev->rmmio) + reg); > - else { > - unsigned long flags; > - uint32_t ret; > - > - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > - ret =3D readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > - > - return ret; > - } > -} > - > -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, > - bool always_indirect) > -{ > - if (reg < rdev->rmmio_size && !always_indirect) > - writel(v, ((void __iomem *)rdev->rmmio) + reg); > - else { > - unsigned long flags; > - > - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > - writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > - } > -} > - > u32 r100_io_rreg(struct radeon_device *rdev, u32 reg) > { > if (reg < rdev->rio_mem_size) > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/rad= eon.h > index f21db7a..a749b6c 100644 > --- a/drivers/gpu/drm/radeon/radeon.h > +++ b/drivers/gpu/drm/radeon/radeon.h > @@ -2328,10 +2328,42 @@ int radeon_device_init(struct radeon_device *rdev, > void radeon_device_fini(struct radeon_device *rdev); > int radeon_gpu_wait_for_idle(struct radeon_device *rdev); > = > -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, > - bool always_indirect); > -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, > - bool always_indirect); > +#define RADEON_MIN_MMIO_SIZE 0x10000 > + > +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t= reg, > + bool always_indirect) > +{ > + /* The mmio size is 64kb at minimum. Allows the if to be optimized out.= */ > + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_i= ndirect) > + return readl(((void __iomem *)rdev->rmmio) + reg); > + else { > + unsigned long flags; > + uint32_t ret; > + > + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > + ret =3D readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > + > + return ret; > + } > +} > + > +static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg= , uint32_t v, > + bool always_indirect) > +{ > + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_i= ndirect) > + writel(v, ((void __iomem *)rdev->rmmio) + reg); > + else { > + unsigned long flags; > + > + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > + writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > + } > +} > + > u32 r100_io_rreg(struct radeon_device *rdev, u32 reg); > void r100_io_wreg(struct radeon_device *rdev, u32 reg, u32 v); > =