public inbox for linux-kernel@vger.kernel.org
* Re: DRM Error on Acer Aspire One
  2010-05-11 16:10 ` Chris Wilson
@ 2010-05-11 14:48   ` Andrew Morton
  2010-05-11 18:18     ` Jaswinder Singh Rajput
  2010-05-11 18:19     ` Chris Wilson
  2010-05-11 17:39   ` Jaswinder Singh Rajput
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2010-05-11 14:48 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List

On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux@gmail.com> wrote:
> > Hello,
> > 
> > With latest git kernel, I am getting following DRM error and not
> > getting XWindows :
> 
> [snip]
> 
> Hmm, there are still patches for capturing error state that haven't gone
> upstream, shame on me.
> 
> That error is a secondary issue to the GPU hang that is being reported. If
> it is a regression caused by a kernel update it would be very useful if
> you could bisect to the erroneous commit.

It helps if one reads the code and the trace...

i915_error_object_create() is using KM_USER0 from softirq context. 
That's a bug, and a pretty serious one.  If some innocent civilian is
writing highmem data to disk and this timer interrupt fires and trashes
his KM_USER0 slot, the disk contents will be corrupted.

Something like this...

--- a/drivers/gpu/drm/i915/i915_irq.c~a
+++ a/drivers/gpu/drm/i915/i915_irq.c
@@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi
 
 	for (page = 0; page < page_count; page++) {
 		void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
+		unsigned long flags;
+
 		if (d == NULL)
 			goto unwind;
-		s = kmap_atomic(src_priv->pages[page], KM_USER0);
+		local_irq_save(flags);
+		s = kmap_atomic(src_priv->pages[page], KM_IRQ0);
 		memcpy(d, s, PAGE_SIZE);
-		kunmap_atomic(s, KM_USER0);
+		kunmap_atomic(s, KM_IRQ0);
+		local_irq_restore(flags);
 		dst->pages[page] = d;
 	}
 	dst->page_count = page_count;
_

Please let's get a tested fix for this into 2.6.34.


* DRM Error on Acer Aspire One
@ 2010-05-11 15:00 Jaswinder Singh Rajput
  2010-05-11 16:10 ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Jaswinder Singh Rajput @ 2010-05-11 15:00 UTC (permalink / raw)
  To: dri-devel, Dave Airlie, Linux Kernel Mailing List, Andrew Morton

Hello,

With latest git kernel, I am getting following DRM error and not
getting XWindows :

[   45.269075] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   45.269111] ------------[ cut here ]------------
[   45.269139] WARNING: at mm/highmem.c:453 debug_kmap_atomic+0xa9/0x11e()
[   45.269150] Hardware name: Aspire one
[   45.269158] Modules linked in: nf_conntrack_ftp ath9k ath9k_common battery ath9k_hw [last unloaded: scsi_wait_scan]
[   45.269198] Pid: 0, comm: swapper Not tainted 2.6.34-rc7-netbook #6
[   45.269208] Call Trace:
[   45.269231]  [<c1030ecb>] warn_slowpath_common+0x65/0x7c
[   45.269249]  [<c108ce5d>] ? debug_kmap_atomic+0xa9/0x11e
[   45.269267]  [<c1030eef>] warn_slowpath_null+0xd/0x10
[   45.269284]  [<c108ce5d>] debug_kmap_atomic+0xa9/0x11e
[   45.269304]  [<c10207c9>] kmap_atomic_prot+0x4d/0xb2
[   45.269321]  [<c102083c>] kmap_atomic+0xe/0x10
[   45.269341]  [<c11f7d64>] i915_error_object_create+0xea/0x14f
[   45.269359]  [<c11f8132>] i915_handle_error+0x369/0x868
[   45.269380]  [<c11f86d0>] i915_hangcheck_elapsed+0x9f/0xdf
[   45.269399]  [<c103ab6e>] run_timer_softirq+0x1c9/0x269
[   45.269417]  [<c11f8631>] ? i915_hangcheck_elapsed+0x0/0xdf
[   45.269435]  [<c1035b7b>] __do_softirq+0xc6/0x186
[   45.269451]  [<c1035c61>] do_softirq+0x26/0x2b
[   45.269466]  [<c1035dd2>] irq_exit+0x29/0x66
[   45.269484]  [<c101681f>] smp_apic_timer_interrupt+0x6e/0x7c
[   45.269504]  [<c141f826>] apic_timer_interrupt+0x2a/0x30
[   45.269524]  [<c104007b>] ? ftrace_raw_event_signal_generate+0x6d/0xd4
[   45.269542]  [<c11bed9d>] ? acpi_idle_enter_simple+0x13b/0x168
[   45.269563]  [<c12dd2b9>] cpuidle_idle_call+0x6b/0xda
[   45.269580]  [<c1001a3c>] cpu_idle+0x44/0x74
[   45.269598]  [<c141a041>] start_secondary+0x1b2/0x1b7
[   45.269612] ---[ end trace ce01d7ca0ae214f4 ]---
[   45.269631] ------------[ cut here ]------------
[   45.269647] WARNING: at mm/highmem.c:453 debug_kmap_atomic+0xa9/0x11e()
[   45.269657] Hardware name: Aspire one
[   45.269665] Modules linked in: nf_conntrack_ftp ath9k ath9k_common battery ath9k_hw [last unloaded: scsi_wait_scan]
[   45.269700] Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc7-netbook #6
[   45.269710] Call Trace:
[   45.269726]  [<c1030ecb>] warn_slowpath_common+0x65/0x7c
[   45.269743]  [<c108ce5d>] ? debug_kmap_atomic+0xa9/0x11e
[   45.269760]  [<c1030eef>] warn_slowpath_null+0xd/0x10
[   45.269777]  [<c108ce5d>] debug_kmap_atomic+0xa9/0x11e
[   45.269795]  [<c10207c9>] kmap_atomic_prot+0x4d/0xb2
[   45.269812]  [<c102083c>] kmap_atomic+0xe/0x10
[   45.269829]  [<c11f7d64>] i915_error_object_create+0xea/0x14f
[   45.269848]  [<c11f8132>] i915_handle_error+0x369/0x868
[   45.269868]  [<c11f86d0>] i915_hangcheck_elapsed+0x9f/0xdf
[   45.269885]  [<c103ab6e>] run_timer_softirq+0x1c9/0x269
[   45.269903]  [<c11f8631>] ? i915_hangcheck_elapsed+0x0/0xdf
[   45.269920]  [<c1035b7b>] __do_softirq+0xc6/0x186
[   45.269937]  [<c1035c61>] do_softirq+0x26/0x2b
[   45.269952]  [<c1035dd2>] irq_exit+0x29/0x66
[   45.269968]  [<c101681f>] smp_apic_timer_interrupt+0x6e/0x7c
[   45.269985]  [<c141f826>] apic_timer_interrupt+0x2a/0x30
[   45.270004]  [<c104007b>] ? ftrace_raw_event_signal_generate+0x6d/0xd4
[   45.270051]  [<c11bed9d>] ? acpi_idle_enter_simple+0x13b/0x168
[   45.270071]  [<c12dd2b9>] cpuidle_idle_call+0x6b/0xda
[   45.270087]  [<c1001a3c>] cpu_idle+0x44/0x74
[   45.270104]  [<c141a041>] start_secondary+0x1b2/0x1b7
[   45.270117] ---[ end trace ce01d7ca0ae214f5 ]---
[   45.270135] ------------[ cut here ]------------

dmesg : http://userweb.kernel.org/~jaswinder/acer_netbook/dmesg_2634-rc7.txt
.config : http://userweb.kernel.org/~jaswinder/acer_netbook/config_2634-rc7.txt

How can I fix these errors?

Thanks,
--
Jaswinder Singh.


* Re: DRM Error on Acer Aspire One
  2010-05-11 18:19     ` Chris Wilson
@ 2010-05-11 15:35       ` Andrew Morton
  2010-05-11 18:52         ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-05-11 15:35 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Tue, 11 May 2010 19:19:26 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Tue, 11 May 2010 10:48:18 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > 
> > > On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux@gmail.com> wrote:
> > > > Hello,
> > > > 
> > > > With latest git kernel, I am getting following DRM error and not
> > > > getting XWindows :
> > > 
> > > [snip]
> > > 
> > > Hmm, there are still patches for capturing error state that haven't gone
> > > upstream, shame on me.
> > > 
> > > That error is a secondary issue to the GPU hang that is being reported. If
> > > it is a regression caused by a kernel update it would be very useful if
> > > you could bisect to the erroneous commit.
> > 
> > It helps if one reads the code and the trace...
> > 
> > i915_error_object_create() is using KM_USER0 from softirq context. 
> > That's a bug, and a pretty serious one.  If some innocent civilian is
> > writing highmem data to disk and this timer interrupt fires and trashes
> > his KM_USER0 slot, the disk contents will be corrupted.
> > 
> > Something like this...
> > 
> > --- a/drivers/gpu/drm/i915/i915_irq.c~a
> > +++ a/drivers/gpu/drm/i915/i915_irq.c
> > @@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi
> >  
> >  	for (page = 0; page < page_count; page++) {
> >  		void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
> > +		unsigned long flags;
> > +
> >  		if (d == NULL)
> >  			goto unwind;
> > -		s = kmap_atomic(src_priv->pages[page], KM_USER0);
> > +		local_irq_save(flags);
> > +		s = kmap_atomic(src_priv->pages[page], KM_IRQ0);
> >  		memcpy(d, s, PAGE_SIZE);
> > -		kunmap_atomic(s, KM_USER0);
> > +		kunmap_atomic(s, KM_IRQ0);
> > +		local_irq_restore(flags);
> >  		dst->pages[page] = d;
> >  	}
> >  	dst->page_count = page_count;
> > _
> > 
> > Please let's get a tested fix for this into 2.6.34.
> 
> The change that I actually want is to replace the kmap_atomic(cpu_page) with an
> io_mapping_map_atomic_wc(gtt_page): in case there is an incoherency between
> the CPU and the GPU, we want to record what the GPU executed. Do you know
> if similar precautions are required with io_mapping_map_atomic_wc()?

gack, wtf is io_mapping_map_atomic_wc()?

<looks>

Could do with some interface documentation. Looks too large to be inlined.

No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
KM_foo kmap slot index.




* Re: DRM Error on Acer Aspire One
  2010-05-11 15:00 DRM Error on Acer Aspire One Jaswinder Singh Rajput
@ 2010-05-11 16:10 ` Chris Wilson
  2010-05-11 14:48   ` Andrew Morton
  2010-05-11 17:39   ` Jaswinder Singh Rajput
  0 siblings, 2 replies; 17+ messages in thread
From: Chris Wilson @ 2010-05-11 16:10 UTC (permalink / raw)
  To: Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Andrew Morton

On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux@gmail.com> wrote:
> Hello,
> 
> With latest git kernel, I am getting following DRM error and not
> getting XWindows :

[snip]

Hmm, there are still patches for capturing error state that haven't gone
upstream, shame on me.

That error is a secondary issue to the GPU hang that is being reported. If
it is a regression caused by a kernel update it would be very useful if
you could bisect to the erroneous commit.

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: DRM Error on Acer Aspire One
  2010-05-11 16:10 ` Chris Wilson
  2010-05-11 14:48   ` Andrew Morton
@ 2010-05-11 17:39   ` Jaswinder Singh Rajput
  1 sibling, 0 replies; 17+ messages in thread
From: Jaswinder Singh Rajput @ 2010-05-11 17:39 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, Dave Airlie, Linux Kernel Mailing List, Andrew Morton

Hello Chris,

On Tue, May 11, 2010 at 9:40 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux@gmail.com> wrote:
>> Hello,
>>
>> With latest git kernel, I am getting following DRM error and not
>> getting XWindows :
>
> [snip]
>
> Hmm, there are still patches for capturing error state that haven't gone
> upstream, shame on me.
>
> That error is a secondary issue to the GPU hang that is being reported. If
> it is a regression caused by a kernel update it would be very useful if
> you could bisect to the erroneous commit.
>

Earlier I was using Moblin; I switched to Fedora and started getting
this error. I have also tested different kernel versions but get the
same error, so I do not think this is a regression.

moblin dmesg : http://userweb.kernel.org/~jaswinder/moblin/dmesg-moblin_2633rc5.txt

Thanks,
--
Jaswinder Singh.


* Re: DRM Error on Acer Aspire One
  2010-05-11 14:48   ` Andrew Morton
@ 2010-05-11 18:18     ` Jaswinder Singh Rajput
  2010-05-11 18:19     ` Chris Wilson
  1 sibling, 0 replies; 17+ messages in thread
From: Jaswinder Singh Rajput @ 2010-05-11 18:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chris Wilson, dri-devel, Dave Airlie, Linux Kernel Mailing List

Hello Andrew,

On Tue, May 11, 2010 at 8:18 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
>> On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux@gmail.com> wrote:
>> > Hello,
>> >
>> > With latest git kernel, I am getting following DRM error and not
>> > getting XWindows :
>>
>> [snip]
>>
>> Hmm, there are still patches for capturing error state that haven't gone
>> upstream, shame on me.
>>
>> That error is a secondary issue to the GPU hang that is being reported. If
>> it is a regression caused by a kernel update it would be very useful if
>> you could bisect to the erroneous commit.
>
> It helps if one reads the code and the trace...
>
> i915_error_object_create() is using KM_USER0 from softirq context.
> That's a bug, and a pretty serious one.  If some innocent civilian is
> writing highmem data to disk and this timer interrupt fires and trashes
> his KM_USER0 slot, the disk contents will be corrupted.
>
> Something like this...
>
> --- a/drivers/gpu/drm/i915/i915_irq.c~a
> +++ a/drivers/gpu/drm/i915/i915_irq.c
> @@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi
>
>        for (page = 0; page < page_count; page++) {
>                void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
> +               unsigned long flags;
> +
>                if (d == NULL)
>                        goto unwind;
> -               s = kmap_atomic(src_priv->pages[page], KM_USER0);
> +               local_irq_save(flags);
> +               s = kmap_atomic(src_priv->pages[page], KM_IRQ0);
>                memcpy(d, s, PAGE_SIZE);
> -               kunmap_atomic(s, KM_USER0);
> +               kunmap_atomic(s, KM_IRQ0);
> +               local_irq_restore(flags);
>                dst->pages[page] = d;
>        }
>        dst->page_count = page_count;
> _
>
> Please let's get a tested fix for this into 2.6.34.
>

I tested your patch with the latest Linus git tree and it works: it
fixes the softirq error.

Now I am only getting DRM errors :

[   42.276059] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   42.276398] render error detected, EIR: 0x00000000
[   42.276460] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 18 at 17)

Thanks,
--
Jaswinder Singh.


* Re: DRM Error on Acer Aspire One
  2010-05-11 14:48   ` Andrew Morton
  2010-05-11 18:18     ` Jaswinder Singh Rajput
@ 2010-05-11 18:19     ` Chris Wilson
  2010-05-11 15:35       ` Andrew Morton
  1 sibling, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2010-05-11 18:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List

On Tue, 11 May 2010 10:48:18 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux@gmail.com> wrote:
> > > Hello,
> > > 
> > > With latest git kernel, I am getting following DRM error and not
> > > getting XWindows :
> > 
> > [snip]
> > 
> > Hmm, there are still patches for capturing error state that haven't gone
> > upstream, shame on me.
> > 
> > That error is a secondary issue to the GPU hang that is being reported. If
> > it is a regression caused by a kernel update it would be very useful if
> > you could bisect to the erroneous commit.
> 
> It helps if one reads the code and the trace...
> 
> i915_error_object_create() is using KM_USER0 from softirq context. 
> That's a bug, and a pretty serious one.  If some innocent civilian is
> writing highmem data to disk and this timer interrupt fires and trashes
> his KM_USER0 slot, the disk contents will be corrupted.
> 
> Something like this...
> 
> --- a/drivers/gpu/drm/i915/i915_irq.c~a
> +++ a/drivers/gpu/drm/i915/i915_irq.c
> @@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi
>  
>  	for (page = 0; page < page_count; page++) {
>  		void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
> +		unsigned long flags;
> +
>  		if (d == NULL)
>  			goto unwind;
> -		s = kmap_atomic(src_priv->pages[page], KM_USER0);
> +		local_irq_save(flags);
> +		s = kmap_atomic(src_priv->pages[page], KM_IRQ0);
>  		memcpy(d, s, PAGE_SIZE);
> -		kunmap_atomic(s, KM_USER0);
> +		kunmap_atomic(s, KM_IRQ0);
> +		local_irq_restore(flags);
>  		dst->pages[page] = d;
>  	}
>  	dst->page_count = page_count;
> _
> 
> Please let's get a tested fix for this into 2.6.34.

The change that I actually want is to replace the kmap_atomic(cpu_page) with an
io_mapping_map_atomic_wc(gtt_page): in case there is an incoherency between
the CPU and the GPU, we want to record what the GPU executed. Do you know
if similar precautions are required with io_mapping_map_atomic_wc()?

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: DRM Error on Acer Aspire One
  2010-05-11 15:35       ` Andrew Morton
@ 2010-05-11 18:52         ` Chris Wilson
  2010-05-11 19:10           ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2010-05-11 18:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
> No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
> it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
> io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
> KM_foo kmap slot index.

Argh, sorry for the noise, read the mail in the wrong order. Thanks for
the review. It would be sensible to go with your simpler patch whilst
io_mapping_map_atomic_wc() is improved.

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: DRM Error on Acer Aspire One
  2010-05-11 18:52         ` Chris Wilson
@ 2010-05-11 19:10           ` Andrew Morton
  2010-05-11 19:57             ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-05-11 19:10 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Tue, 11 May 2010 19:52:31 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
> > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
> > it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
> > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
> > KM_foo kmap slot index.
> 
> Argh, sorry for the noise, read the mail in the wrong order. Thanks for
> the review. It would be sensible to go with your simpler patch whilst
> io_mapping_map_atomic_wc() is improved.

OK.  I'll be sending a bunch of fixes Linuswards in an hour or two.  
Should I include this?


Subject: drivers/gpu/drm/i915/i915_irq.c:i915_error_object_create(): use correct kmap-atomic slot
From: Andrew Morton <akpm@linux-foundation.org>

i915_error_object_create() is called from the timer interrupt and hence
can corrupt the KM_USER0 slot.  Use KM_IRQ0 instead.

Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/gpu/drm/i915/i915_irq.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff -puN drivers/gpu/drm/i915/i915_irq.c~drivers-gpu-drm-i915-i915_irqc-i915_error_object_create-use-correct-kmap-atomic-slot drivers/gpu/drm/i915/i915_irq.c
--- a/drivers/gpu/drm/i915/i915_irq.c~drivers-gpu-drm-i915-i915_irqc-i915_error_object_create-use-correct-kmap-atomic-slot
+++ a/drivers/gpu/drm/i915/i915_irq.c
@@ -461,11 +461,15 @@ i915_error_object_create(struct drm_devi
 
 	for (page = 0; page < page_count; page++) {
 		void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
+		unsigned long flags;
+
 		if (d == NULL)
 			goto unwind;
-		s = kmap_atomic(src_priv->pages[page], KM_USER0);
+		local_irq_save(flags);
+		s = kmap_atomic(src_priv->pages[page], KM_IRQ0);
 		memcpy(d, s, PAGE_SIZE);
-		kunmap_atomic(s, KM_USER0);
+		kunmap_atomic(s, KM_IRQ0);
+		local_irq_restore(flags);
 		dst->pages[page] = d;
 	}
 	dst->page_count = page_count;
_



* Re: DRM Error on Acer Aspire One
  2010-05-11 19:10           ` Andrew Morton
@ 2010-05-11 19:57             ` Chris Wilson
  2010-05-11 22:22               ` Dave Airlie
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2010-05-11 19:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 11 May 2010 19:52:31 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
> > > it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
> > > KM_foo kmap slot index.
> > 
> > Argh, sorry for the noise, read the mail in the wrong order. Thanks for
> > the review. It would be sensible to go with your simpler patch whilst
> > io_mapping_map_atomic_wc() is improved.
> 
> OK.  I'll be sending a bunch of fixes Linuswards in an hour or two.  
> Should I include this?

Yes.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: DRM Error on Acer Aspire One
  2010-05-11 19:57             ` Chris Wilson
@ 2010-05-11 22:22               ` Dave Airlie
  2010-05-11 22:32                 ` Andrew Morton
  2010-05-11 22:40                 ` Chris Wilson
  0 siblings, 2 replies; 17+ messages in thread
From: Dave Airlie @ 2010-05-11 22:22 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Andrew Morton, Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Wed, May 12, 2010 at 5:57 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Tue, 11 May 2010 19:52:31 +0100
>> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>>
>> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
>> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
>> > > it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
>> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
>> > > KM_foo kmap slot index.
>> >
>> > Argh, sorry for the noise, read the mail in the wrong order. Thanks for
>> > the review. It would be sensible to go with your simpler patch whilst
>> > io_mapping_map_atomic_wc() is improved.
>>
>> OK.  I'll be sending a bunch of fixes Linuswards in an hour or two.
>> Should I include this?
>
> Yes.
>
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
>

I'm not sure pushing this in at this point is a good idea, if I'm
reading it correctly we've no idea what KM_IRQ is being used for, and
this codepath is called from non-irq contexts just as much as irq
contexts.

I'd rather we just backout the hangcheck stuff touching copies at all
at this point, and try again doing it properly with a slow work or
something for later.

Dave.


* Re: DRM Error on Acer Aspire One
  2010-05-11 22:22               ` Dave Airlie
@ 2010-05-11 22:32                 ` Andrew Morton
  2010-05-11 22:51                   ` Dave Airlie
  2010-05-11 22:40                 ` Chris Wilson
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-05-11 22:32 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Chris Wilson, Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Wed, 12 May 2010 08:22:49 +1000
Dave Airlie <airlied@gmail.com> wrote:

> On Wed, May 12, 2010 at 5:57 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> >> On Tue, 11 May 2010 19:52:31 +0100
> >> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >>
> >> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
> >> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
> > >> > > it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
> >> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
> >> > > KM_foo kmap slot index.
> >> >
> >> > Argh, sorry for the noise, read the mail in the wrong order. Thanks for
> >> > the review. It would be sensible to go with your simpler patch whilst
> >> > io_mapping_map_atomic_wc() is improved.
> >>
> > >> OK.  I'll be sending a bunch of fixes Linuswards in an hour or two.
> >> Should I include this?
> >
> > Yes.
> >
> > Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> 
> I'm not sure pushing this in at this point is a good idea, if I'm
> reading it correctly we've no idea what KM_IRQ is being used for,

It's used for taking kmaps from IRQ contexts.

> and
> this codepath is called from non-irq contexts just as much as irq
> contexts.

That's fine.  As long as we do a local_irq_disable(), KM_IRQ0 can be
used from both irq- and non-irq contexts.  All we need to do is to
ensure that some interrupt cannot come along on this CPU and corrupt
the slot.

> I'd rather we just backout the hangcheck stuff touching copies at all
> at this point, and try again doing it properly with a slow work or
> something for later.
> 
> Dave.


* Re: DRM Error on Acer Aspire One
  2010-05-11 22:22               ` Dave Airlie
  2010-05-11 22:32                 ` Andrew Morton
@ 2010-05-11 22:40                 ` Chris Wilson
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2010-05-11 22:40 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Andrew Morton, Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Wed, 12 May 2010 08:22:49 +1000, Dave Airlie <airlied@gmail.com> wrote:
> I'd rather we just backout the hangcheck stuff touching copies at all
> at this point, and try again doing it properly with a slow work or
> something for later.

From my point of view, the information provided by the hangcheck has been
invaluable for delving into and fixing some obnoxious driver bugs. I
suspect its honeymoon period is now over - those bugs that it could
detect easily have been fixed (I hope). In order to capture the relevant
information for later chipset generations, we will need to parse the
command stream and include auxiliary buffers. So whilst I would prefer to
see this in a release so that I can easily diagnose bug reports, I accept
that there is more work to be done and will HTFU.

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: DRM Error on Acer Aspire One
  2010-05-11 22:32                 ` Andrew Morton
@ 2010-05-11 22:51                   ` Dave Airlie
  2010-05-11 22:56                     ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Airlie @ 2010-05-11 22:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chris Wilson, Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Wed, May 12, 2010 at 8:32 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 12 May 2010 08:22:49 +1000
> Dave Airlie <airlied@gmail.com> wrote:
>
>> On Wed, May 12, 2010 at 5:57 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
>> >> On Tue, 11 May 2010 19:52:31 +0100
>> >> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> >>
>> >> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
>> >> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
> >> >> > > it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
>> >> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
>> >> > > KM_foo kmap slot index.
>> >> >
>> >> > Argh, sorry for the noise, read the mail in the wrong order. Thanks for
>> >> > the review. It would be sensible to go with your simpler patch whilst
>> >> > io_mapping_map_atomic_wc() is improved.
>> >>
> >> >> OK.  I'll be sending a bunch of fixes Linuswards in an hour or two.
>> >> Should I include this?
>> >
>> > Yes.
>> >
>> > Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
>> >
>>
>> I'm not sure pushing this in at this point is a good idea, if I'm
>> reading it correctly we've no idea what KM_IRQ is being used for,
>
> It's used for taking kmaps from IRQ contexts.
>
>> and
>> this codepath is called from non-irq contexts just as much as irq
>> contexts.
>
> That's fine.  As long as we do a local_irq_disable(), KM_IRQ0 can be
> used from both irq- and non-irq contexts.  All we need to do is to
> ensure that some interrupt cannot come along on this CPU and corrupt
> the slot.

I don't think we do that in a lot of places, and I'd rather not add
that in to fix this problem at this point in the release cycle, as
we've no idea what it might break/regress.

It's easier to just disable the hangcheck copy and try again for 2.6.35
with a workqueue or slow work.

Dave



>
>> I'd rather we just backout the hangcheck stuff touching copies at all
>> at this point, and try again doing it properly with a slow work or
>> something for later.
>>
>> Dave.
>


* Re: DRM Error on Acer Aspire One
  2010-05-11 22:51                   ` Dave Airlie
@ 2010-05-11 22:56                     ` Andrew Morton
  2010-05-11 23:17                       ` Dave Airlie
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-05-11 22:56 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Chris Wilson, Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Wed, 12 May 2010 08:51:05 +1000
Dave Airlie <airlied@gmail.com> wrote:

> On Wed, May 12, 2010 at 8:32 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Wed, 12 May 2010 08:22:49 +1000
> > Dave Airlie <airlied@gmail.com> wrote:
> >
> >> On Wed, May 12, 2010 at 5:57 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >> > On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> >> >> On Tue, 11 May 2010 19:52:31 +0100
> >> >> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >> >>
> >> >> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
> >> >> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
> > >> >> > > it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
> >> >> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
> >> >> > > KM_foo kmap slot index.
> >> >> >
> >> >> > Argh, sorry for the noise, read the mail in the wrong order. Thanks for
> >> >> > the review. It would be sensible to go with your simpler patch whilst
> >> >> > io_mapping_map_atomic_wc() is improved.
> >> >>
> >> >> OK.  I'll be sending a bunch of fixes Linuswards in an hour or two.
> >> >> Should I include this?
> >> >
> >> > Yes.
> >> >
> >> > Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> >
> >>
> >> I'm not sure pushing this in at this point is a good idea, if I'm
> >> reading it correctly we've no idea what KM_IRQ is being used for,
> >
> > It's used for taking kmaps from IRQ contexts.
> >
> >> and
> >> this codepath is called from non-irq contexts just as much as irq
> >> contexts.
> >
> > That's fine.  As long as we do a local_irq_disable(), KM_IRQ0 can be
> > used from both irq- and non-irq contexts.  All we need to do is to
> > ensure that some interrupt cannot come along on this CPU and corrupt
> > the slot.
> 
> I don't think we do that in a lot of places, and I'd rather not add
> that in to fix this problem at this point in the release cycle, as
> we've no idea what it might break/regress.

What is "that"?  The switch to irq-protected KM_IRQ0?  That won't break
anything.

> It's easier to just disable the hangcheck copy and try again for 2.6.35
> with a workqueue or slow work.



* Re: DRM Error on Acer Aspire One
  2010-05-11 22:56                     ` Andrew Morton
@ 2010-05-11 23:17                       ` Dave Airlie
  2010-05-11 23:24                         ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Airlie @ 2010-05-11 23:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chris Wilson, Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Wed, May 12, 2010 at 8:56 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 12 May 2010 08:51:05 +1000
> Dave Airlie <airlied@gmail.com> wrote:
>
>> On Wed, May 12, 2010 at 8:32 AM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> > On Wed, 12 May 2010 08:22:49 +1000
>> > Dave Airlie <airlied@gmail.com> wrote:
>> >
>> >> On Wed, May 12, 2010 at 5:57 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> >> > On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
>> >> >> On Tue, 11 May 2010 19:52:31 +0100
>> >> >> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> >> >>
>> >> >> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton <akpm@linux-foundation.org> wrote:
>> >> >> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context:
>> >> >> > > it hardwires use of KM_USER0.  I suggest that io_mapping_create_wc(),
>> >> >> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the
>> >> >> > > KM_foo kmap slot index.
>> >> >> >
>> >> >> > Argh, sorry for the noise, read the mail in the wrong order. Thanks for
>> >> >> > the review. It would be sensible to go with your simpler patch whilst
>> >> >> > io_mapping_map_atomic_wc() is improved.
>> >> >>
>> >> >> OK.  I'll be sending a bunch of fixes Linuswards in an hour or two.
>> >> >> Should I include this?
>> >> >
>> >> > Yes.
>> >> >
>> >> > Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
>> >> >
>> >>
>> >> I'm not sure pushing this in at this point is a good idea, if I'm
>> >> reading it correctly we've no idea what KM_IRQ is being used for,
>> >
>> > It's used for taking kmaps from IRQ contexts.
>> >
>> >> and
>> >> this codepath is called from non-irq contexts just as much as irq
>> >> contexts.
>> >
>> > That's fine.  As long as we do a local_irq_disable(), KM_IRQ0 can be
>> > used from both irq- and non-irq contexts.  All we need to do is to
>> > ensure that some interrupt cannot come along on this CPU and corrupt
>> > the slot.
>>
>> I don't think we do that in a lot of places, and I'd rather not add
>> that in to fix this problem at this point in the release cycle, as
>> we've no idea what it might break/regress.
>
> What is "that"?  The switch to irq-protected KM_IRQ0?  That won't break
> anything.
>

Disabling local CPU irqs around all these kmap mappings.

Dave.


* Re: DRM Error on Acer Aspire One
  2010-05-11 23:17                       ` Dave Airlie
@ 2010-05-11 23:24                         ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2010-05-11 23:24 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Chris Wilson, Jaswinder Singh Rajput, dri-devel, Dave Airlie,
	Linux Kernel Mailing List, Keith Packard, Eric Anholt,
	Ingo Molnar

On Wed, 12 May 2010 09:17:09 +1000
Dave Airlie <airlied@gmail.com> wrote:

> >> >> and
> >> >> this codepath is called from non-irq contexts just as much as irq
> >> >> contexts.
> >> >
> >> > That's fine.  As long as we do a local_irq_disable(), KM_IRQ0 can be
> >> > used from both irq- and non-irq contexts.  All we need to do is to
> >> > ensure that some interrupt cannot come along on this CPU and corrupt
> >> > the slot.
> >>
> >> I don't think we do that in a lot of places, and I'd rather not add
> >> that in to fix this problem at this point in the release cycle, as
> >> we've no idea what it might break/regress.
> >
> > > What is "that"?  The switch to irq-protected KM_IRQ0?  That won't break
> > anything.
> >
> 
> Disabling local CPU irqs around all these kmap mappings.
> 

Ah.  Well if there are other uses of KM_USER0 from interrupt context
then yes, we have more problems.  CONFIG_DEBUG_HIGHMEM &&
CONFIG_TRACE_IRQFLAGS_SUPPORT will detect that and as long as Jaswinder
has hit all code paths in his testing, we're good.  Some manual review
for this would be good.
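
A minimal userspace sketch of the slot-corruption argument made in this
thread. It is a toy model, not kernel code and not part of the original
mail: the names slot, toy_irq(), maybe_interrupt() and copy_via_slot()
are all hypothetical, and a plain boolean stands in for
local_irq_save()/local_irq_restore(). It only illustrates why a shared
mapping slot is safe once interrupts are masked around its use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy stand-in for one per-CPU atomic kmap slot: it points at
 * whichever "page" was mapped most recently. */
static const char *slot;

/* Toy stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

static const char page_a[TOY_PAGE_SIZE] = "task data";
static const char page_b[TOY_PAGE_SIZE] = "irq data";

/* Toy interrupt handler that maps its own page into the same slot. */
static void toy_irq(void)
{
	slot = page_b;
}

/* Simulated interrupt arrival: the handler runs only if "irqs" are on. */
static void maybe_interrupt(void)
{
	if (irqs_enabled)
		toy_irq();
}

/* Copy page_a through the shared slot.  Returns true if dst received
 * the intended data, false if the slot was stolen mid-copy. */
static bool copy_via_slot(char *dst, bool protect)
{
	bool saved = irqs_enabled;

	assert(dst != NULL);

	if (protect)
		irqs_enabled = false;	/* models local_irq_save() */

	slot = page_a;			/* models kmap_atomic() */
	maybe_interrupt();		/* interrupt lands between map and copy */
	memcpy(dst, slot, TOY_PAGE_SIZE);

	irqs_enabled = saved;		/* models local_irq_restore() */

	return memcmp(dst, page_a, TOY_PAGE_SIZE) == 0;
}
```

Calling copy_via_slot(buf, false) models the unprotected case, where the
simulated interrupt replaces the slot before the copy and the wrong page
is read. Calling copy_via_slot(buf, true) models the irq-protected case,
where the interrupt cannot run and the copy sees the intended page.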




Thread overview: 17+ messages
2010-05-11 15:00 DRM Error on Acer Aspire One Jaswinder Singh Rajput
2010-05-11 16:10 ` Chris Wilson
2010-05-11 14:48   ` Andrew Morton
2010-05-11 18:18     ` Jaswinder Singh Rajput
2010-05-11 18:19     ` Chris Wilson
2010-05-11 15:35       ` Andrew Morton
2010-05-11 18:52         ` Chris Wilson
2010-05-11 19:10           ` Andrew Morton
2010-05-11 19:57             ` Chris Wilson
2010-05-11 22:22               ` Dave Airlie
2010-05-11 22:32                 ` Andrew Morton
2010-05-11 22:51                   ` Dave Airlie
2010-05-11 22:56                     ` Andrew Morton
2010-05-11 23:17                       ` Dave Airlie
2010-05-11 23:24                         ` Andrew Morton
2010-05-11 22:40                 ` Chris Wilson
2010-05-11 17:39   ` Jaswinder Singh Rajput
