* [PATCH] agp/intel, drm/i915: Use a write-combining map for updating PTEs
From: Chris Wilson @ 2012-08-12 11:04 UTC
To: intel-gfx
In order to be able to ioremap_wc the GTT space, we need to remove the
conflicting pci_iomap from drm/i915, so we limit the register map in
drm/i915 to the suitable range for each generation. The benefit of doing
this is an order of magnitude reduction in time spent rewriting the GTT
entries when inserting and removing objects. For example, this halves the
CPU time spent in X when pushing pixels for chromium through a userptr
(chromium has a bug where it likes to recreate its ShmPixmap on every
draw).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/char/agp/intel-gtt.c | 13 ++++++++++---
drivers/gpu/drm/i915/i915_dma.c | 14 ++++++++++++--
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 76103aa..73bdb74 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -666,8 +666,14 @@ static int intel_gtt_init(void)
gtt_map_size = intel_private.base.gtt_total_entries * 4;
- intel_private.gtt = ioremap(intel_private.gtt_bus_addr,
- gtt_map_size);
+ intel_private.gtt = ioremap_wc(intel_private.gtt_bus_addr,
+ gtt_map_size);
+ if (!intel_private.gtt) {
+ dev_err(&intel_private.bridge_dev->dev,
+ "failed to map GATT as wc, falling back to uc-\n");
+ intel_private.gtt = ioremap(intel_private.gtt_bus_addr,
+ gtt_map_size);
+ }
if (!intel_private.gtt) {
intel_private.driver->cleanup();
iounmap(intel_private.registers);
@@ -1233,12 +1239,13 @@ static inline int needs_idle_maps(void)
static int i9xx_setup(void)
{
u32 reg_addr;
- int size = KB(512);
+ int size;
pci_read_config_dword(intel_private.pcidev, I915_MMADDR, &reg_addr);
reg_addr &= 0xfff80000;
+ size = KB(512);
if (INTEL_GTT_GEN >= 7)
size = MB(2);
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a21e0b0..c453304 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1458,7 +1458,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
{
struct drm_i915_private *dev_priv;
struct intel_device_info *info;
- int ret = 0, mmio_bar;
+ int ret = 0, mmio_bar, mmio_size;
uint32_t aperture_size;
info = (struct intel_device_info *) flags;
@@ -1522,8 +1522,18 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
if (IS_BROADWATER(dev) || IS_CRESTLINE(dev))
dma_set_coherent_mask(&dev->pdev->dev, DMA_BIT_MASK(32));
+ /* Restrict iomap to avoid clobbering the GTT which we want WC mapped.
+ * Do not attempt to map the whole BAR!
+ */
mmio_bar = IS_GEN2(dev) ? 1 : 0;
- dev_priv->regs = pci_iomap(dev->pdev, mmio_bar, 0);
+ if (info->gen < 3)
+ mmio_size = 64*1024;
+ else if (info->gen < 5)
+ mmio_size = 512*1024;
+ else
+ mmio_size = 2*1024*1024;
+
+ dev_priv->regs = pci_iomap(dev->pdev, mmio_bar, mmio_size);
if (!dev_priv->regs) {
DRM_ERROR("failed to map registers\n");
ret = -EIO;
--
1.7.10.4
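As a rough illustration of the sizing logic in the patch above, here is a minimal user-space sketch in C. It only mirrors the generation checks added to i915_driver_load() and the gtt_map_size computation in intel_gtt_init(). The helper names mmio_size_for_gen() and gtt_map_size() are made up for this example and do not exist in the kernel, and nothing here performs a real mapping.

```c
#include <stddef.h>

/*
 * Hypothetical helper: size of the register window to pass to pci_iomap(),
 * following the generation checks in the patch. Mapping only this window,
 * rather than the whole BAR, is what leaves the GTT free to be mapped
 * separately with ioremap_wc().
 */
static size_t mmio_size_for_gen(int gen)
{
	if (gen < 3)
		return 64 * 1024;	/* gen2: 64 KiB of registers */
	else if (gen < 5)
		return 512 * 1024;	/* gen3 and gen4: 512 KiB */
	return 2 * 1024 * 1024;		/* gen5 and later: 2 MiB */
}

/*
 * Hypothetical helper: bytes needed to map the GTT, assuming 4-byte
 * entries as in "gtt_total_entries * 4" from intel_gtt_init().
 */
static size_t gtt_map_size(size_t total_entries)
{
	return total_entries * 4;
}
```

For instance, a gen4 device would get a 512 KiB register map under this scheme, and a GTT with 524288 entries would need a 2 MiB mapping.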
* Re: [PATCH] agp/intel, drm/i915: Use a write-combining map for updating PTEs
From: Daniel Vetter @ 2012-08-12 15:47 UTC
To: Chris Wilson; +Cc: intel-gfx
On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> In order to be able to ioremap_wc the GTT space, we need to remove the
> conflicting pci_iomap from drm/i915, so we limit the register map in
> drm/i915 to the suitable range for each generation. The benefit of doing
> this is an order of magnitude reduction in time spent rewriting the GTT
> entries when inserting and removing objects. For example, this halves the
> CPU time spent in X when pushing pixels for chromium through a userptr
> (chromium has a bug where it likes to recreate its ShmPixmap on every
> draw).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
How well does this work with ums?
I guess if it blows up, we could ioremap uncached, but when kms
initializes drop that uc mapping and try to remap wc. But I fear that ums
will map the entire bar and hence we can't just unconditionally map the
gatt wc.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: [PATCH] agp/intel, drm/i915: Use a write-combining map for updating PTEs
From: Chris Wilson @ 2012-08-12 16:01 UTC
To: Daniel Vetter; +Cc: intel-gfx
On Sun, 12 Aug 2012 17:47:46 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> > In order to be able to ioremap_wc the GTT space, we need to remove the
> > conflicting pci_iomap from drm/i915, so we limit the register map in
> > drm/i915 to the suitable range for each generation. The benefit of doing
> > this is an order of magnitude reduction in time spent rewriting the GTT
> > entries when inserting and removing objects. For example, this halves the
> > CPU time spent in X when pushing pixels for chromium through a userptr
> > (chromium has a bug where it likes to recreate its ShmPixmap on every
> > draw).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> How well does this work with ums?
>
> I guess if it blows up, we could ioremap uncached, but when kms
> initializes drop that uc mapping and try to remap wc. But I fear that ums
> will map the entire bar and hence we can't just unconditionally map the
> gatt wc.
It will work exquisitely with ums. It will fail to do as it wishes and
fall back to VESA and everybody will be much happier...
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] agp/intel, drm/i915: Use a write-combining map for updating PTEs
From: Chris Wilson @ 2012-08-12 19:12 UTC
To: Daniel Vetter; +Cc: intel-gfx
On Sun, 12 Aug 2012 17:01:08 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Sun, 12 Aug 2012 17:47:46 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> > > In order to be able to ioremap_wc the GTT space, we need to remove the
> > > conflicting pci_iomap from drm/i915, so we limit the register map in
> > > drm/i915 to the suitable range for each generation. The benefit of doing
> > > this is an order of magnitude reduction in time spent rewriting the GTT
> > > entries when inserting and removing objects. For example, this halves the
> > > CPU time spent in X when pushing pixels for chromium through a userptr
> > > (chromium has a bug where it likes to recreate its ShmPixmap on every
> > > draw).
> > >
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > How well does this work with ums?
> >
> > I guess if it blows up, we could ioremap uncached, but when kms
> > initializes drop that uc mapping and try to remap wc. But I fear that ums
> > will map the entire bar and hence we can't just unconditionally map the
> > gatt wc.
>
> It will work exquisitely with ums. It will fail to do as it wishes and
> fall back to VESA and everybody will be much happier...
So having rediscovered the hard truth that i915.modeset=1 and
xf86-video-2.6.0 results in nasty hangs, setting the GTT table to WC has
no effect upon the ancient UMS module - it shows the retro background
and appears to function. We struck lucky. \o/
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] agp/intel, drm/i915: Use a write-combining map for updating PTEs
From: Daniel Vetter @ 2012-08-13 9:16 UTC
To: Chris Wilson; +Cc: intel-gfx
On Sun, Aug 12, 2012 at 08:12:02PM +0100, Chris Wilson wrote:
> On Sun, 12 Aug 2012 17:01:08 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Sun, 12 Aug 2012 17:47:46 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> > > > In order to be able to ioremap_wc the GTT space, we need to remove the
> > > > conflicting pci_iomap from drm/i915, so we limit the register map in
> > > > drm/i915 to the suitable range for each generation. The benefit of doing
> > > > this is an order of magnitude reduction in time spent rewriting the GTT
> > > > entries when inserting and removing objects. For example, this halves the
> > > > CPU time spent in X when pushing pixels for chromium through a userptr
> > > > (chromium has a bug where it likes to recreate its ShmPixmap on every
> > > > draw).
> > > >
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > >
> > > How well does this work with ums?
> > >
> > > I guess if it blows up, we could ioremap uncached, but when kms
> > > initializes drop that uc mapping and try to remap wc. But I fear that ums
> > > will map the entire bar and hence we can't just unconditionally map the
> > > gatt wc.
> >
> > It will work exquisitely with ums. It will fail to do as it wishes and
> > fall back to VESA and everybody will be much happier...
>
> So having rediscovered the hard truth that i915.modeset=1 and
> xf86-video-2.6.0 results in nasty hangs, setting the GTT table to WC has
> no effect upon the ancient UMS module - it shows the retro background
> and appears to function. We struck lucky. \o/
Ok, let's tempt the wrath of the abi gods. Merged to -queued, thanks for
the patch.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48