inode->i_wb_list corruption.

linux-kernel.vger.kernel.org archive mirror

* inode->i_wb_list corruption.
@ 2012-03-06 18:51 Dave Jones
  2012-03-06 21:03 ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-03-06 18:51 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Fedora Kernel Team, viro

We've had three separate reports against 3.2.x recently where the linked list debugging
is getting tripped up by the prev->next pointer being NULL instead of pointing
to the current list entry while walking the i_wb_list.

Call traces are slightly different each time, but all end up walking i_wb_list
in dput -> d_kill -> iput -> evict -> inode_wb_list_del.

What protects that list? It looks to be just bdi->wb.list_lock?


full reports at:
https://bugzilla.redhat.com/show_bug.cgi?id=784741
https://bugzilla.redhat.com/show_bug.cgi?id=799229
https://bugzilla.redhat.com/show_bug.cgi?id=799692

	Dave



* Re: inode->i_wb_list corruption.
  2012-03-06 18:51 inode->i_wb_list corruption Dave Jones
@ 2012-03-06 21:03 ` Jan Kara
  2012-03-07  7:26   ` Fengguang Wu
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2012-03-06 21:03 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linux Kernel, Fedora Kernel Team, viro, Wu Fengguang,
	Christoph Hellwig

On Tue 06-03-12 13:51:37, Dave Jones wrote:
> We've had three separate reports against 3.2.x recently where the linked list debugging
> is getting tripped up by the prev->next pointer being null instead of pointing
> to the current list entry while walking the i_wb_list
> 
> Call traces are slightly different each time, but all end up walking i_wb_list 
> in dput -> d_kill -> i_put -> evict -> inode_wb_list_del
> 
> What protects that list ? It looks to be just bdi->wb.list_lock ?
> 
> 
> full reports at:
> https://bugzilla.redhat.com/show_bug.cgi?id=784741
> https://bugzilla.redhat.com/show_bug.cgi?id=799229
> https://bugzilla.redhat.com/show_bug.cgi?id=799692
  Hum, interesting! I'd guess this might be caused by f758eeab, so I'm adding
Fengguang and Christoph to CC. I'm really failing to see how this could
happen, but an interesting thing is that in two of the three cases the files
are on virtual filesystems (once cgroup, once sysfs). Both of these use
noop_backing_dev_info.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: inode->i_wb_list corruption.
  2012-03-06 21:03 ` Jan Kara
@ 2012-03-07  7:26   ` Fengguang Wu
  2012-03-07 10:42     ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Fengguang Wu @ 2012-03-07  7:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dave Jones, Linux Kernel, Fedora Kernel Team, viro,
	Christoph Hellwig

On Tue, Mar 06, 2012 at 10:03:07PM +0100, Jan Kara wrote:
> On Tue 06-03-12 13:51:37, Dave Jones wrote:
> > We've had three separate reports against 3.2.x recently where the linked list debugging
> > is getting tripped up by the prev->next pointer being null instead of pointing
> > to the current list entry while walking the i_wb_list
> > 
> > Call traces are slightly different each time, but all end up walking i_wb_list 
> > in dput -> d_kill -> i_put -> evict -> inode_wb_list_del
> > 
> > What protects that list ? It looks to be just bdi->wb.list_lock ?
> > 
> > 
> > full reports at:
> > https://bugzilla.redhat.com/show_bug.cgi?id=784741
> > https://bugzilla.redhat.com/show_bug.cgi?id=799229
> > https://bugzilla.redhat.com/show_bug.cgi?id=799692
>   Hum, interesting! I'd guess this might be caused by f758eeab - adding
> Fengguang and Christoph to CC. But I'm really failing to see how this could
> happen but interesting thing is that in two of the three cases the files
> are on virtual filesystems (once cgroup, once sysfs). These both use
> noop_backing_dev_info.

sysfs/cgroup forgot to init inode->i_wb_list?

This simplified fix inits it in inode_init_always().

The better fix would be to add init_once to sysfs or perhaps fix
sysfs_get_inode()/cgroup_new_inode().
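
To illustrate what the one-line change below is guarding against, here is a
minimal userspace sketch (not kernel code). The list_head type and
INIT_LIST_HEAD() are simplified copies of the kernel's; struct fake_inode and
the two *_is_sane() helpers are hypothetical names made up for this example.
It only shows that a zero-filled list node has neighbours that do not point
back at it, while an initialised node points at itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list node points at itself in both directions. */
static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical stand-in for struct inode; only the field of interest. */
struct fake_inode {
	struct list_head i_wb_list;
};

/* Returns 1 if the node's neighbours point back at it, 0 otherwise. */
static int wb_list_is_sane(const struct fake_inode *inode)
{
	const struct list_head *n = &inode->i_wb_list;

	return n->next != NULL && n->prev != NULL &&
	       n->next->prev == n && n->prev->next == n;
}

/* A zero-filled inode, as if nothing ever initialised i_wb_list. */
static int zeroed_inode_is_sane(void)
{
	struct fake_inode inode;

	memset(&inode, 0, sizeof(inode));
	return wb_list_is_sane(&inode);
}

/* The same inode after the kind of INIT_LIST_HEAD() call the patch adds. */
static int initialised_inode_is_sane(void)
{
	struct fake_inode inode;

	memset(&inode, 0, sizeof(inode));
	INIT_LIST_HEAD(&inode.i_wb_list);
	assert(list_empty(&inode.i_wb_list));
	return wb_list_is_sane(&inode);
}
```

Whether a missing initialisation is actually what the reports show is a
separate question; the sketch only demonstrates the failure mode being
proposed.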

Thanks,
Fengguang

---
 fs/inode.c |    1 +
 1 file changed, 1 insertion(+)

--- linux.orig/fs/inode.c	2012-02-22 19:20:48.374799955 -0800
+++ linux/fs/inode.c	2012-03-06 23:11:29.133899478 -0800
@@ -193,6 +193,7 @@ int inode_init_always(struct super_block
 	inode->i_private = NULL;
 	inode->i_mapping = mapping;
 	INIT_LIST_HEAD(&inode->i_dentry);	/* buggered by rcu freeing */
+	INIT_LIST_HEAD(&inode->i_wb_list);
 #ifdef CONFIG_FS_POSIX_ACL
 	inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
 #endif


* Re: inode->i_wb_list corruption.
  2012-03-07  7:26   ` Fengguang Wu
@ 2012-03-07 10:42     ` Jan Kara
  2012-03-09  8:34       ` Yang Bai
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2012-03-07 10:42 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Jan Kara, Dave Jones, Linux Kernel, Fedora Kernel Team, viro,
	Christoph Hellwig

On Tue 06-03-12 23:26:08, Wu Fengguang wrote:
> On Tue, Mar 06, 2012 at 10:03:07PM +0100, Jan Kara wrote:
> > On Tue 06-03-12 13:51:37, Dave Jones wrote:
> > > We've had three separate reports against 3.2.x recently where the linked list debugging
> > > is getting tripped up by the prev->next pointer being null instead of pointing
> > > to the current list entry while walking the i_wb_list
> > > 
> > > Call traces are slightly different each time, but all end up walking i_wb_list 
> > > in dput -> d_kill -> i_put -> evict -> inode_wb_list_del
> > > 
> > > What protects that list ? It looks to be just bdi->wb.list_lock ?
> > > 
> > > 
> > > full reports at:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=784741
> > > https://bugzilla.redhat.com/show_bug.cgi?id=799229
> > > https://bugzilla.redhat.com/show_bug.cgi?id=799692
> >   Hum, interesting! I'd guess this might be caused by f758eeab - adding
> > Fengguang and Christoph to CC. But I'm really failing to see how this could
> > happen but interesting thing is that in two of the three cases the files
> > are on virtual filesystems (once cgroup, once sysfs). These both use
> > noop_backing_dev_info.
> 
> sysfs/cgroup forgot to init inode->i_wb_list?
  Umm, it's not *that* simple, I'd say. E.g. sysfs doesn't provide an
alloc_inode() method, so we use inode_cachep for allocations. And that cache
is configured to use inode_init_once().

Also note that the error message is:
list_del corruption. prev->next should be ffff8801c2f41b18, but was (null)

Which means that our inode had correct i_wb_list.prev but the previous
inode had NULL in i_wb_list.next. But that means that both inodes were
linked into the list at some point. So it does not seem like an
initialization issue to me...
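
The reasoning above can be replayed in userspace with a small sketch. It is
loosely modelled on the checks that list debugging performs before unlinking
an entry, but it is not the kernel's implementation: checked_list_del() and
delete_second_entry() are hypothetical names, and the list helpers are
simplified copies. The corrupted case has a valid prev pointer in the entry
being deleted while the previous entry's next is NULL, which is the state the
error message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new_node right after head. */
static void list_add(struct list_head *new_node, struct list_head *head)
{
	new_node->next = head->next;
	new_node->prev = head;
	head->next->prev = new_node;
	head->next = new_node;
}

/*
 * Refuse to unlink an entry whose neighbours do not point back at it.
 * Returns 0 on success, -1 if corruption was detected (the list is then
 * left untouched).
 */
static int checked_list_del(struct list_head *entry)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return -1;
	}
	if (next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return -1;
	}
	next->prev = prev;
	prev->next = next;
	return 0;
}

/* Build head -> a -> b, optionally clobber a.next, then try to delete b. */
static int delete_second_entry(int corrupt_previous)
{
	struct list_head head, a, b;

	list_init(&head);
	list_add(&a, &head);
	list_add(&b, &a);
	assert(b.prev == &a && a.next == &b);

	if (corrupt_previous)
		a.next = NULL;	/* previous entry's next pointer wiped */

	return checked_list_del(&b);
}
```

Note that b.prev could only point at a because both entries were linked into
the same list at some point, which is why the message argues against a purely
missing initialisation.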

								Honza

> --- linux.orig/fs/inode.c	2012-02-22 19:20:48.374799955 -0800
> +++ linux/fs/inode.c	2012-03-06 23:11:29.133899478 -0800
> @@ -193,6 +193,7 @@ int inode_init_always(struct super_block
>  	inode->i_private = NULL;
>  	inode->i_mapping = mapping;
>  	INIT_LIST_HEAD(&inode->i_dentry);	/* buggered by rcu freeing */
> +	INIT_LIST_HEAD(&inode->i_wb_list);
>  #ifdef CONFIG_FS_POSIX_ACL
>  	inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
>  #endif


* Re: inode->i_wb_list corruption.
  2012-03-07 10:42     ` Jan Kara
@ 2012-03-09  8:34       ` Yang Bai
  2012-03-09 14:57         ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Yang Bai @ 2012-03-09  8:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Fengguang Wu, Dave Jones, Linux Kernel, Fedora Kernel Team, viro,
	Christoph Hellwig

On Wed, Mar 7, 2012 at 6:42 PM, Jan Kara <jack@suse.cz> wrote:
> On Tue 06-03-12 23:26:08, Wu Fengguang wrote:
>  Umm, it's not *that* simple I'd say. E.g. sysfs doesn't provide
> alloc_inode() method so we use inode_cachep for allocations. And that cache
> is configured to use inode_init_once().
>
> Also note that the error message is:
> list_del corruption. prev->next should be ffff8801c2f41b18, but was (null)
>
> Which means that our inode had correct i_wb_list.prev but the previous
> inode had NULL in i_wb_list.next. But that means that both inodes were
> linked into the list at some point. So it does not seem like an
> initialization issue to me...
>
>                                                                Honza
>
I still want to know how to reproduce this bug. I added the following
patch to the fc-16 3.2.9-1 kernel:

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5b4a936..568ed0a 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -192,8 +192,13 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi)
 void inode_wb_list_del(struct inode *inode)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(inode);
+	struct list_head *pos;

 	spin_lock(&bdi->wb.list_lock);
+	list_for_each(pos, &inode->i_wb_list) {
+		printk(KERN_EMERG "list entry: %p; next: %p; prev: %p\n",
+		       pos, pos->next, pos->prev);
+	}
 	list_del_init(&inode->i_wb_list);
 	spin_unlock(&bdi->wb.list_lock);
 }

So on every inode_wb_list_del() call, it prints the whole list.

I then ran

	while true; do touch a && rm -f a; done

for almost one day without any problem.

So how can it be reproduced?

Thanks,
Yang


* Re: inode->i_wb_list corruption.
  2012-03-09  8:34       ` Yang Bai
@ 2012-03-09 14:57         ` Dave Jones
  2012-03-09 15:19           ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-03-09 14:57 UTC (permalink / raw)
  To: Yang Bai
  Cc: Jan Kara, Fengguang Wu, Linux Kernel, Fedora Kernel Team, viro,
	Christoph Hellwig

On Fri, Mar 09, 2012 at 04:34:57PM +0800, Yang Bai wrote:

 > I still want to know how to reproduce this bug. I add the following
 > patch to the kernel fc-16 3.2.9-1
 > 
 > So on every inode_wb_list_del, it will show the whole list.
 > 
 > and Doing while true; do touch a && rm -f a; done for almost one day
 > without any problem.
 > 
 > So How to reproduce it??

If it was that easy, I'd be bisecting it by now ;-)

This, like a bunch of other really weird bugs that we have no explanation for,
only seems to be being hit by a small minority of users.

One common thing seems to be that they were all quad core intel
boxes, with i915 graphics.

We have some reports of i915 causing memory corruption after suspend/hibernate,
but none of these reports mention whether they've done that (I just asked).

	Dave


* Re: inode->i_wb_list corruption.
  2012-03-09 14:57         ` Dave Jones
@ 2012-03-09 15:19           ` Dave Jones
  2012-03-09 16:14             ` Yang Bai
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-03-09 15:19 UTC (permalink / raw)
  To: Yang Bai, Jan Kara, Fengguang Wu, Linux Kernel,
	Fedora Kernel Team, viro, Christoph Hellwig

On Fri, Mar 09, 2012 at 09:57:14AM -0500, Dave Jones wrote:
 > On Fri, Mar 09, 2012 at 04:34:57PM +0800, Yang Bai wrote:
 >  
 >  > I still want to know how to reproduce this bug. I add the following
 >  > patch to the kernel fc-16 3.2.9-1
 >  > 
 >  > So on every inode_wb_list_del, it will show the whole list.
 >  > 
 >  > and Doing while true; do touch a && rm -f a; done for almost one day
 >  > without any problem.
 >  > 
 >  > So How to reproduce it??
 > 
 > If it was that easy, I'd be bisecting it by now ;-)
 > 
 > This, like a bunch of other really weird bugs that we have no explanation for,
 > only seems to be being hit by a small minority of users.
 > 
 > One common thing seems to be that they were all quad core intel
 > boxes, with i915 graphics.
 > 
 > We have some reports of i915 causing memory corruption after suspend/hibernate,
 > but none of these reports mention whether they've done that (I just asked).

And with that, this arrived.. 
https://bugzilla.redhat.com/show_bug.cgi?id=788433#c3

I'm leaning strongly towards believing this is yet another case of i915
corrupting memory on resume.

	Dave



* Re: inode->i_wb_list corruption.
  2012-03-09 15:19           ` Dave Jones
@ 2012-03-09 16:14             ` Yang Bai
  2012-03-09 18:00               ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Yang Bai @ 2012-03-09 16:14 UTC (permalink / raw)
  To: Dave Jones, Yang Bai, Jan Kara, Fengguang Wu, Linux Kernel,
	Fedora Kernel Team, viro, Christoph Hellwig

On Fri, Mar 9, 2012 at 11:19 PM, Dave Jones <davej@redhat.com> wrote:
> And with that, this arrived..
> https://bugzilla.redhat.com/show_bug.cgi?id=788433#c3
>
> I'm leaning strongly towards believing this is yet another case of i915
> corrupting memory on resume.
>

Nice catch. I am wondering
1) why all lists are being affected and
2) why every list_head's prev is being set to NULL.

Any ideas?

Thanks,
Yang


* Re: inode->i_wb_list corruption.
  2012-03-09 16:14             ` Yang Bai
@ 2012-03-09 18:00               ` Dave Jones
  2012-03-09 20:08                 ` Keith Packard
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-03-09 18:00 UTC (permalink / raw)
  To: Yang Bai; +Cc: Fengguang Wu, Linux Kernel, Fedora Kernel Team, kernel

(trimmed cc)

On Sat, Mar 10, 2012 at 12:14:37AM +0800, Yang Bai wrote:
 > On Fri, Mar 9, 2012 at 11:19 PM, Dave Jones <davej@redhat.com> wrote:
 > > And with that, this arrived..
 > > https://bugzilla.redhat.com/show_bug.cgi?id=788433#c3
 > >
 > > I'm leaning strongly towards believing this is yet another case of i915
 > > corrupting memory on resume.
 > 
 > Nice catch. I am wondering
 > 1) why all lists being affected and
 > 2) why all list_head's prev being set to NULL.
 > 
 > Any ideas?

This is probably the same bug: https://bugzilla.kernel.org/show_bug.cgi?id=37142
Petr noticed that the corruption is 32 bytes getting zeroed at the beginning
of a page.
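
As a rough illustration of how a 32-byte zeroing at the start of a page would
produce exactly the "prev->next ... was (null)" symptom, here is a userspace
sketch. Everything in it is an assumption made for the example: struct obj,
the 4096-byte page size, and the placement of one object at the page start
are hypothetical, and the stray write is simulated with memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE   4096
#define CORRUPT_LEN 32	/* size of the zeroed region reported in the thread */

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical object that happens to start with its list linkage. */
struct obj {
	struct list_head link;
	unsigned long payload;
};

/* Returns 1 if the first CORRUPT_LEN bytes of the page are all zero. */
static int page_start_is_zeroed(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < CORRUPT_LEN; i++)
		if (page[i] != 0)
			return 0;
	return 1;
}

/*
 * Link two objects living in one "page", zero the first CORRUPT_LEN bytes,
 * and report whether the survivor now sees prev->next == NULL.
 * Returns 1 if so, 0 if not, -1 on allocation failure.
 */
static int simulate_page_start_zeroing(void)
{
	unsigned char *page = malloc(PAGE_SIZE);
	struct obj *first, *second;
	int result;

	if (!page)
		return -1;
	memset(page, 0xaa, PAGE_SIZE);

	first = (struct obj *)page;			/* at the page start */
	second = (struct obj *)(page + 2 * CORRUPT_LEN);	/* outside the damage */

	/* Two-element circular list: first <-> second. */
	first->link.next = &second->link;
	first->link.prev = &second->link;
	second->link.next = &first->link;
	second->link.prev = &first->link;
	first->payload = 1;
	second->payload = 2;

	memset(page, 0, CORRUPT_LEN);	/* the simulated stray write */
	assert(page_start_is_zeroed(page));

	/* The survivor's prev is intact, but the victim's next is now NULL. */
	result = second->link.prev == &first->link &&
		 second->link.prev->next == NULL;
	free(page);
	return result;
}
```

A helper like page_start_is_zeroed() is also the kind of check one could run
over suspect pages after resume, although a page that legitimately starts
with zeros would be a false positive.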

I think this may be responsible for a lot of different bugs that we've
had reported.

i915_drm_thaw is a deep nest of functions though, so this is going to be
hard to track down where that write is coming from. Because the corruption
seems to happen to pages that are already allocated, we probably can't
even rely on DEBUG_PAGEALLOC, though it might be worth trying.

	Dave


* Re: inode->i_wb_list corruption.
  2012-03-09 18:00               ` Dave Jones
@ 2012-03-09 20:08                 ` Keith Packard
  2012-03-09 20:19                   ` Josh Boyer
  2012-03-12 23:26                   ` Dave Jones
  0 siblings, 2 replies; 19+ messages in thread
From: Keith Packard @ 2012-03-09 20:08 UTC (permalink / raw)
  To: Dave Jones, Yang Bai
  Cc: Fengguang Wu, Linux Kernel, Fedora Kernel Team, kernel

On Fri, 9 Mar 2012 13:00:15 -0500, Dave Jones <davej@redhat.com> wrote:

> i915_drm_thaw is a deep nest of functions though, so this is going to be
> hard to track down where that write is coming from. Because the corruption
> seems to happen to pages that are already allocated, we probably can't
> even rely on DEBUG_PAGEALLOC, though it might be worth trying.

I'm worried that the write is coming through the GTT, which would make
sense as these look like pixel values. If this is on Ironlake (first-gen
Core i3-i7), we know there are issues when VT-d is enabled, and
the work-around for that doesn't appear to be in place for the hibernate
resume case.

-- 
keith.packard@intel.com


* Re: inode->i_wb_list corruption.
  2012-03-09 20:08                 ` Keith Packard
@ 2012-03-09 20:19                   ` Josh Boyer
  2012-03-09 22:44                     ` Keith Packard
  2012-03-12 23:26                   ` Dave Jones
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2012-03-09 20:19 UTC (permalink / raw)
  To: Keith Packard
  Cc: Dave Jones, Yang Bai, Fengguang Wu, Linux Kernel,
	Fedora Kernel Team, kernel

On Fri, Mar 09, 2012 at 12:08:07PM -0800, Keith Packard wrote:
> <#part sign=pgpmime>
> On Fri, 9 Mar 2012 13:00:15 -0500, Dave Jones <davej@redhat.com> wrote:
> 
> > i915_drm_thaw is a deep nest of functions though, so this is going to be
> > hard to track down where that write is coming from. Because the corruption
> > seems to happen to pages that are already allocated, we probably can't
> > even rely on DEBUG_PAGEALLOC, though it might be worth trying.
> 
> I'm worried that the write is coming through the GTT, which would make
> sense as these look like pixel values. If this is on Ironlake (core
> I3-I7 first gen), we know there are issues when VT-d is enabled, and
> the work-around for that doesn't appear to be in place for the hibernate
> resume case.

Is the VT-d issue something in the hardware itself, or do you mean if
you have it enabled in the kernel?  We've had the intel IOMMU disabled
by default in the Fedora kernels for a while now.  At least since before
3.2 was released.

josh


* Re: inode->i_wb_list corruption.
  2012-03-09 20:19                   ` Josh Boyer
@ 2012-03-09 22:44                     ` Keith Packard
  2012-03-12 21:13                       ` Josh Boyer
  0 siblings, 1 reply; 19+ messages in thread
From: Keith Packard @ 2012-03-09 22:44 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Dave Jones, Yang Bai, Fengguang Wu, Linux Kernel,
	Fedora Kernel Team, kernel, David Woodhouse

On Fri, 9 Mar 2012 15:19:34 -0500, Josh Boyer <jwboyer@redhat.com> wrote:

> Is the VT-d issue something in the hardware itself, or do you mean if
> you have it enabled in the kernel?  We've had the intel IOMMU disabled
> by default in the Fedora kernels for a while now.  At least since before
> 3.2 was released.

I don't know for sure; David Woodhouse gave a scary presentation
yesterday that makes me unsure of what happens when the IOMMU is disabled in
the kernel, given that much of the hardware is set up by the BIOS.

-- 
keith.packard@intel.com


* Re: inode->i_wb_list corruption.
  2012-03-09 22:44                     ` Keith Packard
@ 2012-03-12 21:13                       ` Josh Boyer
  2012-03-12 21:27                         ` David Woodhouse
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2012-03-12 21:13 UTC (permalink / raw)
  To: Keith Packard
  Cc: Dave Jones, Yang Bai, Fengguang Wu, Linux Kernel,
	Fedora Kernel Team, kernel, David Woodhouse

On Fri, Mar 09, 2012 at 02:44:49PM -0800, Keith Packard wrote:
> <#part sign=pgpmime>
> On Fri, 9 Mar 2012 15:19:34 -0500, Josh Boyer <jwboyer@redhat.com> wrote:
> 
> > Is the VT-d issue something in the hardware itself, or do you mean if
> > you have it enabled in the kernel?  We've had the intel IOMMU disabled
> > by default in the Fedora kernels for a while now.  At least since before
> > 3.2 was released.
> 
> I don't know for sure; David Woodhouse gave a scary presentation
> yesterday that makes me unsure of what happens when IOMMU is disabled in
> the kernel, given that much of the hardware is setup by the BIOS.

Is that presentation something that could be shared?  I have to say,
hearing that doesn't really inspire confidence in either the IOMMU or
the kernel.

josh


* Re: inode->i_wb_list corruption.
  2012-03-12 21:13                       ` Josh Boyer
@ 2012-03-12 21:27                         ` David Woodhouse
  0 siblings, 0 replies; 19+ messages in thread
From: David Woodhouse @ 2012-03-12 21:27 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Keith Packard, Dave Jones, Yang Bai, Fengguang Wu, Linux Kernel,
	Fedora Kernel Team, kernel


On Mon, 2012-03-12 at 17:13 -0400, Josh Boyer wrote:
> On Fri, Mar 09, 2012 at 02:44:49PM -0800, Keith Packard wrote:
> > <#part sign=pgpmime>
> > On Fri, 9 Mar 2012 15:19:34 -0500, Josh Boyer <jwboyer@redhat.com> wrote:
> > 
> > > Is the VT-d issue something in the hardware itself, or do you mean if
> > > you have it enabled in the kernel?  We've had the intel IOMMU disabled
> > > by default in the Fedora kernels for a while now.  At least since before
> > > 3.2 was released.
> > 
> > I don't know for sure; David Woodhouse gave a scary presentation
> > yesterday that makes me unsure of what happens when IOMMU is disabled in
> > the kernel, given that much of the hardware is setup by the BIOS.
> 
> Is that presentation something that could be shared?  I have to say,
> hearing that doesn't really inspire confidence in either the IOMMU or
> the kernel.

It was mostly just a rant about the design mistakes we made with the
IOMMU — in particular giving the BIOS as much rope as possible for it to
hang us with.

If the BIOS exposes the IOMMU but the OS chooses not to enable it, I
don't believe there's any problem with that.

-- 
dwmw2



* Re: inode->i_wb_list corruption.
  2012-03-09 20:08                 ` Keith Packard
  2012-03-09 20:19                   ` Josh Boyer
@ 2012-03-12 23:26                   ` Dave Jones
  2012-03-13  0:06                     ` Keith Packard
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-03-12 23:26 UTC (permalink / raw)
  To: Keith Packard
  Cc: Yang Bai, Fengguang Wu, Linux Kernel, Fedora Kernel Team, kernel

On Fri, Mar 09, 2012 at 12:08:07PM -0800, Keith Packard wrote:
 > <#part sign=pgpmime>
 > On Fri, 9 Mar 2012 13:00:15 -0500, Dave Jones <davej@redhat.com> wrote:
 > 
 > > i915_drm_thaw is a deep nest of functions though, so this is going to be
 > > hard to track down where that write is coming from. Because the corruption
 > > seems to happen to pages that are already allocated, we probably can't
 > > even rely on DEBUG_PAGEALLOC, though it might be worth trying.
 > 
 > I'm worried that the write is coming through the GTT, which would make
 > sense as these look like pixel values. If this is on Ironlake (core
 > I3-I7 first gen), we know there are issues when VT-d is enabled, and
 > the work-around for that doesn't appear to be in place for the hibernate
 > resume case.

Thinking about how the GTT could contain stale pointers, I came up with this scenario:

Before we begin the thaw, the initramfs sets up a framebuffer.
This causes the GTT to be set up.

- Thaw begins, hardware state still points to the GTT set up by the modesetting code.
  At this point, any graphics operations are going to cause writes through
  those translations. Bad news if we just wrote a bunch of thawed data there.

  or..

- Thaw begins, and data is written over the GTT set up by the initramfs, but
  the hardware registers still point at it, until thaw is complete, when we
  reprogram the GTT registers to their pre-hibernate values.

If we could somehow set modeset=0 automatically if we detect a hibernate
partition it would probably 'solve' it, but I suspect the real answer
would be to do GTT teardown before we do a thaw.

	Dave



* Re: inode->i_wb_list corruption.
  2012-03-12 23:26                   ` Dave Jones
@ 2012-03-13  0:06                     ` Keith Packard
  0 siblings, 0 replies; 19+ messages in thread
From: Keith Packard @ 2012-03-13  0:06 UTC (permalink / raw)
  To: Dave Jones
  Cc: Yang Bai, Fengguang Wu, Linux Kernel, Fedora Kernel Team, kernel

On Mon, 12 Mar 2012 19:26:30 -0400, Dave Jones <davej@redhat.com> wrote:

> Thinking about how the GTT could contain stale pointers, I came up with this scenario:
> 
> Before we begin the thaw, the initramfs sets up a framebuffer.
> This causes the GTT to be setup.

Yes. The frame buffer is allocated as regular kernel pages, of course.

> - Thaw begins, hardware state still points to the GTT setup by the modesetting code.
>   At this point, any graphics operations are going to cause writes through
>   those translations. Bad news if we just wrote a bunch of thawed data
>   there.

The question is what data could still be pending there; we're running
just fbcon at that point, which uses only write-through
access. Presumably, any fbdev writes will have been long-since finished
once we start the new kernel.

> - Thaw begins, and data is written over the GTT setup by the initramfs, but 
>   the hardware registers still points at it, until thaw is complete, when we
>   reprogram the GTT registers to their pre-hibernate values.

I'm not sure how the new kernel could manage to do any writes through
this though -- it shouldn't touch the frame buffer until it has thawed
the video driver, right?

> If we could somehow set modeset=0 automatically if we detect a hibernate
> partition it would probably 'solve' it, but I suspect the real answer
> would be to do GTT teardown before we do a thaw.

We've got a ton of memory available in the 'stolen' area which the BIOS
used as a frame buffer; we should be able to switch to that before the
switch, if we decide that this is necessary.

-- 
keith.packard@intel.com


* Re: inode->i_wb_list corruption.
@ 2012-03-15 14:08 Petr Tesařík
  2012-03-15 14:22 ` Dave Airlie
  2012-03-15 14:49 ` Dave Jones
  0 siblings, 2 replies; 19+ messages in thread
From: Petr Tesařík @ 2012-03-15 14:08 UTC (permalink / raw)
  To: Dave Jones
  Cc: Yang Bai, Fengguang Wu, Linux Kernel, Fedora Kernel Team, kernel

On Sat, 10 March 2012 02:00:15 Dave Jones wrote:
> (trimmed cc)
> 
> On Sat, Mar 10, 2012 at 12:14:37AM +0800, Yang Bai wrote:
>  > On Fri, Mar 9, 2012 at 11:19 PM, Dave Jones <davej@redhat.com> wrote:
>  > > And with that, this arrived..
>  > > https://bugzilla.redhat.com/show_bug.cgi?id=788433#c3
>  > > 
>  > > I'm leaning strongly towards believing this is yet another case of
>  > > i915 corrupting memory on resume.
>  > 
>  > Nice catch. I am wondering
>  > 1) why all lists being affected and
>  > 2) why all list_head's prev being set to NULL.
>  > 
>  > Any ideas?
> 
> This is probably the same bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=37142 Petr noticed that the
> corruption is 32 bytes getting zeroed at the beginning of a page.
> 
> I think this may be responsible for a lot of different bugs that we've
> had reported.
> 
> i915_drm_thaw is a deep nest of functions though, so this is going to be
> hard to track down where that write is coming from. Because the corruption
> seems to happen to pages that are already allocated, we probably can't
> even rely on DEBUG_PAGEALLOC, though it might be worth trying.

If you believe it could be written by the CPU, I can try to catch the
instruction that writes to this memory. My plan is as follows:

Set up all the hardware debug registers to trap writes to the pages that are 
likely to get corrupted. Remember, I've seen the corruption happen always 
roughly in the same physical memory area.

I know, there are only 4 registers I can use, and the potential corruption 
area is much larger than 4 pages, but with enough reboots, the chance is quite 
high that I'll be lucky.

I haven't gone for that plan yet, because I thought the area was in fact 
written to by someone else on the PCI bus, not the CPU. If nothing else, I can 
verify that. ;-)
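
The "enough reboots" estimate can be made concrete with a back-of-the-envelope
calculation. The sketch below rests on assumptions the thread does not state:
that exactly one page out of a known set of candidate pages is corrupted on
each boot, that it is chosen uniformly and independently each time, and that
one debug register can cover the damaged bytes of one page. The function
names and all numbers fed into them are hypothetical.

```c
#include <assert.h>

/*
 * Probability that, over `boots` attempts, at least one attempt had a
 * watchpoint on the page that got corrupted, with `watched` pages covered
 * per boot out of `candidates` equally likely pages.
 */
static double catch_probability(unsigned int watched,
				unsigned int candidates,
				unsigned int boots)
{
	double miss_once, miss_all = 1.0;
	unsigned int i;

	assert(candidates > 0);
	if (watched >= candidates)
		return boots > 0 ? 1.0 : 0.0;

	/* Chance that a single boot misses, then that every boot misses. */
	miss_once = 1.0 - (double)watched / (double)candidates;
	for (i = 0; i < boots; i++)
		miss_all *= miss_once;
	return 1.0 - miss_all;
}

/*
 * Smallest number of boots whose catch probability reaches `target`,
 * or 0 if that is not achieved within `max_boots`.
 */
static unsigned int boots_needed(unsigned int watched,
				 unsigned int candidates,
				 double target,
				 unsigned int max_boots)
{
	unsigned int n;

	for (n = 1; n <= max_boots; n++)
		if (catch_probability(watched, candidates, n) >= target)
			return n;
	return 0;
}
```

With 4 watchpoints and, say, 8 candidate pages, a single boot has an even
chance of covering the right page and two boots reach 75%; a larger candidate
area pushes the required number of boots up accordingly. None of this helps
if the write does not come from the CPU.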

Dave, do you think the result of such testing would help you resolve the bug?

Petr


* Re: inode->i_wb_list corruption.
  2012-03-15 14:08 Petr Tesařík
@ 2012-03-15 14:22 ` Dave Airlie
  2012-03-15 14:49 ` Dave Jones
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Airlie @ 2012-03-15 14:22 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: Dave Jones, Yang Bai, Fengguang Wu, Linux Kernel,
	Fedora Kernel Team, kernel

On Thu, Mar 15, 2012 at 2:08 PM, Petr Tesařík <petr@tesarici.cz> wrote:
> On Sat, 10 March 2012 02:00:15 Dave Jones wrote:
>> (trimmed cc)
>>
>> On Sat, Mar 10, 2012 at 12:14:37AM +0800, Yang Bai wrote:
>>  > On Fri, Mar 9, 2012 at 11:19 PM, Dave Jones <davej@redhat.com> wrote:
>>  > > And with that, this arrived..
>>  > > https://bugzilla.redhat.com/show_bug.cgi?id=788433#c3
>>  > >
>>  > > I'm leaning strongly towards believing this is yet another case of
>>  > > i915 corrupting memory on resume.
>>  >
>>  > Nice catch. I am wondering
>>  > 1) why all lists being affected and
>>  > 2) why all list_head's prev being set to NULL.
>>  >
>>  > Any ideas?
>>
>> This is probably the same bug:
>> https://bugzilla.kernel.org/show_bug.cgi?id=37142 Petr noticed that the
>> corruption is 32 bytes getting zeroed at the beginning of a page.
>>
>> I think this may be responsible for a lot of different bugs that we've
>> had reported.
>>
>> i915_drm_thaw is a deep nest of functions though, so this is going to be
>> hard to track down where that write is coming from. Because the corruption
>> seems to happen to pages that are already allocated, we probably can't
>> even rely on DEBUG_PAGEALLOC, though it might be worth trying.
>
> If you believe it could be written by the CPU, I can try to catch the
> instruction that writes to this memory. My plan is as follows:
>
> Set up all the hardware debug registers to trap writes to the pages that are
> likely to get corrupted. Remember, I've seen the corruption happen always
> roughly in the same physical memory area.
>
> I know, there are only 4 registers I can use, and the potential corruption
> area is much larger than 4 pages, but with enough reboots, the chance is quite
> high that I'll be lucky.
>
> I haven't gone for that plan yet, because I thought the area was in fact
> written to by someone else on the PCI bus, not the CPU. If nothing else, I can
> verify that. ;-)

It would be interesting to maybe dump the GTT then and see where the
pages you see corruption in are in it; if they are in the fbcon object,
then that kinda proves the CPU writes them.

Dave.


* Re: inode->i_wb_list corruption.
  2012-03-15 14:08 Petr Tesařík
  2012-03-15 14:22 ` Dave Airlie
@ 2012-03-15 14:49 ` Dave Jones
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Jones @ 2012-03-15 14:49 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: Yang Bai, Fengguang Wu, Linux Kernel, Fedora Kernel Team, kernel

On Thu, Mar 15, 2012 at 10:08:03PM +0800, Petr Tesařík wrote:

 > > i915_drm_thaw is a deep nest of functions though, so this is going to be
 > > hard to track down where that write is coming from. Because the corruption
 > > seems to happen to pages that are already allocated, we probably can't
 > > even rely on DEBUG_PAGEALLOC, though it might be worth trying.
 > 
 > If you believe it could be written by the CPU, I can try to catch the
 > instruction that writes to this memory. My plan is as follows:

Given that the corruption pattern looks like pixel data, it's likely that
the writing is being done by the GPU, not the CPU, so debug registers
won't trap it.

	Dave
 


Thread overview: 19+ messages
2012-03-06 18:51 inode->i_wb_list corruption Dave Jones
2012-03-06 21:03 ` Jan Kara
2012-03-07  7:26   ` Fengguang Wu
2012-03-07 10:42     ` Jan Kara
2012-03-09  8:34       ` Yang Bai
2012-03-09 14:57         ` Dave Jones
2012-03-09 15:19           ` Dave Jones
2012-03-09 16:14             ` Yang Bai
2012-03-09 18:00               ` Dave Jones
2012-03-09 20:08                 ` Keith Packard
2012-03-09 20:19                   ` Josh Boyer
2012-03-09 22:44                     ` Keith Packard
2012-03-12 21:13                       ` Josh Boyer
2012-03-12 21:27                         ` David Woodhouse
2012-03-12 23:26                   ` Dave Jones
2012-03-13  0:06                     ` Keith Packard
  -- strict thread matches above, loose matches on Subject: below --
2012-03-15 14:08 Petr Tesařík
2012-03-15 14:22 ` Dave Airlie
2012-03-15 14:49 ` Dave Jones
