Re: XFS related Oops (suspend/resume related)

public inbox for linux-xfs@vger.kernel.org

* Re: XFS related Oops (suspend/resume related)
       [not found]       ` <20071126131210.GA4430@eazy.amigager.de>
@ 2007-11-26 21:08         ` David Chinner
  2007-11-26 22:07           ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-11-26 21:08 UTC (permalink / raw)
  To: David Chinner, linux-kernel; +Cc: rjw, xfs

On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote:
> On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote:
> > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> > > 
> > > [...]
> > > 
> > > > No. I'd say something got screwed up during suspend/resume. Is it
> > > > reproducible?
> > > 
> > > No. I often use suspend to RAM, and usually it works without such
> > > failures. I restart squid during the resume procedure, and the above
> > > Oops led to a squid in D state.
> > 
> > Ok. Sounds like there's not much we can debug at this point. Thanks
> > for the report, though.
> 
> I got a similar Oops again:
> 
> xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680

Now there's a message that I haven't seen in about 3 years.

It indicates that the linux inode connected to the xfs_inode is not
the correct one. i.e. that the linux inode cache is out of step with
the XFS inode cache.

Basically, that is not supposed to happen. I suspect that the way
threads are frozen is resulting in an inode lookup racing with
a reclaim. The reclaim thread gets stopped after any user threads,
and so we could have the situation that a process blocked in lookup
has the XFS inode reclaimed and reused before it gets unblocked.

The question is why is it happening now when none of that code in
XFS has changed?

Rafael, when are threads frozen? Only when they schedule or call
try_to_freeze()? Did the freezer mechanism change in 2.6.23 (this is
on 2.6.23.1)?  Is there some way of getting a stack trace of all the
processes in the system once the machine is frozen and about to
suspend so we can see if we blocked in a lookup?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS related Oops (suspend/resume related)
  2007-11-26 21:08         ` XFS related Oops (suspend/resume related) David Chinner
@ 2007-11-26 22:07           ` Rafael J. Wysocki
       [not found]             ` <20071127132000.GA31893@dose.home.local>
  2007-11-27 15:51             ` Rafael J. Wysocki
  0 siblings, 2 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2007-11-26 22:07 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel, xfs

On Monday, 26 of November 2007, David Chinner wrote:
> On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote:
> > On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote:
> > > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> > > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > No. I'd say something got screwed up during suspend/resume. Is it
> > > > > reproducible?
> > > > 
> > > > No. I often use suspend to RAM, and usually it works without such
> > > > failures. I restart squid during the resume procedure, and the above
> > > > Oops led to a squid in D state.
> > > 
> > > Ok. Sounds like there's not much we can debug at this point. Thanks
> > > for the report, though.
> > 
> > I got a similar Oops again:
> > 
> > xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680
> 
> Now there's a message that I haven't seen in about 3 years.
> 
> It indicates that the linux inode connected to the xfs_inode is not
> the correct one. i.e. that the linux inode cache is out of step with
> the XFS inode cache.
> 
> Basically, that is not supposed to happen. I suspect that the way
> threads are frozen is resulting in an inode lookup racing with
> a reclaim. The reclaim thread gets stopped after any user threads,
> and so we could have the situation that a process blocked in lookup
> has the XFS inode reclaimed and reused before it gets unblocked.
> 
> The question is why is it happening now when none of that code in
> XFS has changed?
> 
> Rafael, when are threads frozen? Only when they schedule or call
> try_to_freeze()?

Kernel threads freeze only when they call try_to_freeze().  User space tasks
freeze while executing the signal handling code.

> Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?

Yes.  Kernel threads are not sent fake signals by the freezer any more.

> Is there some way of getting a stack trace of all the 
> processes in the system once the machine is frozen and about to
> suspend so we can see if we blocked in a lookup?

Yes.  Please add show_state() before the last "return" in freeze_processes().

On 2.6.23.1 you can test the freezer alone by doing

# echo testproc > /sys/power/disk
# echo disk > /sys/power/state

Greetings,
Rafael


* Re: XFS related Oops (suspend/resume related)
       [not found]             ` <20071127132000.GA31893@dose.home.local>
@ 2007-11-27 15:46               ` Rafael J. Wysocki
  0 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2007-11-27 15:46 UTC (permalink / raw)
  To: Tino Keitel; +Cc: linux-kernel, David Chinner, xfs

On Tuesday, 27 of November 2007, Tino Keitel wrote:
> On Mon, Nov 26, 2007 at 23:07:56 +0100, Rafael J. Wysocki wrote:
> 
> [...]
> 
> > On 2.6.23.1 you can test the freezer alone by doing
> > 
> > # echo testproc > /sys/power/disk
> > # echo disk > /sys/power/state
> 
> This is suspend to RAM, not to disk.

I know. :-)

Nevertheless, this is how you can test the tasks freezer _without_ actually
doing a suspend of any kind.

Greetings,
Rafael


* Re: XFS related Oops (suspend/resume related)
  2007-11-26 22:07           ` Rafael J. Wysocki
       [not found]             ` <20071127132000.GA31893@dose.home.local>
@ 2007-11-27 15:51             ` Rafael J. Wysocki
  2007-11-27 21:11               ` David Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2007-11-27 15:51 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel, xfs

On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> On Monday, 26 of November 2007, David Chinner wrote:
> > On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote:
> > > On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote:
> > > > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> > > > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > No. I'd say something got screwed up during suspend/resume. Is it
> > > > > > reproducible?
> > > > > 
> > > > > No. I often use suspend to RAM, and usually it works without such
> > > > > failures. I restart squid during the resume procedure, and the above
> > > > > Oops led to a squid in D state.
> > > > 
> > > > Ok. Sounds like there's not much we can debug at this point. Thanks
> > > > for the report, though.
> > > 
> > > I got a similar Oops again:
> > > 
> > > xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680
> > 
> > Now there's a message that I haven't seen in about 3 years.
> > 
> > It indicates that the linux inode connected to the xfs_inode is not
> > the correct one. i.e. that the linux inode cache is out of step with
> > the XFS inode cache.
> > 
> > Basically, that is not supposed to happen. I suspect that the way
> > threads are frozen is resulting in an inode lookup racing with
> > a reclaim. The reclaim thread gets stopped after any user threads,
> > and so we could have the situation that a process blocked in lookup
> > has the XFS inode reclaimed and reused before it gets unblocked.
> > 
> > The question is why is it happening now when none of that code in
> > XFS has changed?
> > 
> > Rafael, when are threads frozen? Only when they schedule or call
> > try_to_freeze()?
> 
> Kernel threads freeze only when they call try_to_freeze().  User space tasks
> freeze while executing the signal handling code.
> 
> > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> 
> Yes.  Kernel threads are not sent fake signals by the freezer any more.

Ah, sorry, this change was merged after 2.6.23.  However, before 2.6.23
we had another important change that caused all kernel threads to have
PF_NOFREEZE set by default, unless they call set_freezable() explicitly.
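
To make the opt-in rule above concrete, here is a minimal userspace sketch.
It is a model only, not kernel code: struct model_task, the MODEL_* flags and
every model_* function are hypothetical stand-ins invented for illustration.
Only the names PF_NOFREEZE, set_freezable() and try_to_freeze() come from
this thread, and the real implementations differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model flags; they only mimic the roles discussed above. */
#define MODEL_NOFREEZE   0x1u  /* plays the role of PF_NOFREEZE */
#define MODEL_FREEZE_REQ 0x2u  /* "the freezer asked this task to freeze" */

struct model_task {
	unsigned int flags;
	bool frozen;
};

/* Kernel threads start out non-freezable by default in this model. */
static struct model_task model_kthread_create(void)
{
	struct model_task t = { MODEL_NOFREEZE, false };
	return t;
}

/* Stand-in for set_freezable(): the thread opts in to being frozen. */
static void model_set_freezable(struct model_task *t)
{
	t->flags &= ~MODEL_NOFREEZE;
}

/* Stand-in for the freezer's request pass: non-freezable tasks are skipped. */
static void model_request_freeze(struct model_task *t)
{
	if (!(t->flags & MODEL_NOFREEZE))
		t->flags |= MODEL_FREEZE_REQ;
}

/* Stand-in for try_to_freeze(): freezes only if a request is pending. */
static bool model_try_to_freeze(struct model_task *t)
{
	if (!(t->flags & MODEL_FREEZE_REQ))
		return false;
	t->flags &= ~MODEL_FREEZE_REQ;
	t->frozen = true;
	return true;
}

/* Runs one freeze attempt and reports whether the thread ended up frozen. */
static bool model_thread_freezes(bool opts_in)
{
	struct model_task t = model_kthread_create();

	if (opts_in)
		model_set_freezable(&t);
	model_request_freeze(&t);
	return model_try_to_freeze(&t);
}

/* Self-check of the two cases discussed in this thread. */
static void model_self_check(void)
{
	assert(!model_thread_freezes(false)); /* never opted in: not frozen */
	assert(model_thread_freezes(true));   /* opted in: frozen */
}
```

In this model a thread that never opts in is skipped by the freezer, so its
own freeze check never fires no matter how often it polls.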

Greetings,
Rafael


* Re: XFS related Oops (suspend/resume related)
  2007-11-27 15:51             ` Rafael J. Wysocki
@ 2007-11-27 21:11               ` David Chinner
  2007-11-27 21:53                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-11-27 21:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: David Chinner, linux-kernel, xfs

On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote:
> On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> > On Monday, 26 of November 2007, David Chinner wrote:
> > > Now there's a message that I haven't seen in about 3 years.
> > > 
> > > It indicates that the linux inode connected to the xfs_inode is not
> > > the correct one. i.e. that the linux inode cache is out of step with
> > > the XFS inode cache.
> > > 
> > > Basically, that is not supposed to happen. I suspect that the way
> > > threads are frozen is resulting in an inode lookup racing with
> > > a reclaim. The reclaim thread gets stopped after any user threads,
> > > and so we could have the situation that a process blocked in lookup
> > > has the XFS inode reclaimed and reused before it gets unblocked.
> > > 
> > > The question is why is it happening now when none of that code in
> > > XFS has changed?
> > > 
> > > Rafael, when are threads frozen? Only when they schedule or call
> > > try_to_freeze()?
> > 
> > Kernel threads freeze only when they call try_to_freeze().  User space tasks
> > freeze while executing the signal handling code.
> > 
> > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> > 
> > Yes.  Kernel threads are not sent fake signals by the freezer any more.
> 
> Ah, sorry, this change was merged after 2.6.23.  However, before 2.6.23
> we had another important change that caused all kernel threads to have
> PF_NOFREEZE set by default, unless they call set_freezable() explicitly.

So try_to_freeze() will never freeze a thread if it has not been
set_freezable()? And xfsbufd will never be frozen?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS related Oops (suspend/resume related)
  2007-11-27 21:11               ` David Chinner
@ 2007-11-27 21:53                 ` Rafael J. Wysocki
  2007-11-29 21:05                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2007-11-27 21:53 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel, xfs, Tino Keitel

On Tuesday, 27 of November 2007, David Chinner wrote:
> On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote:
> > On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> > > On Monday, 26 of November 2007, David Chinner wrote:
> > > > Now there's a message that I haven't seen in about 3 years.
> > > > 
> > > > It indicates that the linux inode connected to the xfs_inode is not
> > > > the correct one. i.e. that the linux inode cache is out of step with
> > > > the XFS inode cache.
> > > > 
> > > > Basically, that is not supposed to happen. I suspect that the way
> > > > threads are frozen is resulting in an inode lookup racing with
> > > > a reclaim. The reclaim thread gets stopped after any user threads,
> > > > and so we could have the situation that a process blocked in lookup
> > > > has the XFS inode reclaimed and reused before it gets unblocked.
> > > > 
> > > > The question is why is it happening now when none of that code in
> > > > XFS has changed?
> > > > 
> > > > Rafael, when are threads frozen? Only when they schedule or call
> > > > try_to_freeze()?
> > > 
> > > Kernel threads freeze only when they call try_to_freeze().  User space tasks
> > > freeze while executing the signal handling code.
> > > 
> > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> > > 
> > > Yes.  Kernel threads are not sent fake signals by the freezer any more.
> > 
> > Ah, sorry, this change was merged after 2.6.23.  However, before 2.6.23
> > we had another important change that caused all kernel threads to have
> > PF_NOFREEZE set by default, unless they call set_freezable() explicitly.
> 
> So try_to_freeze() will never freeze a thread if it has not been
> set_freezable()? And xfsbufd will never be frozen?

No, it won't.

I must have overlooked it, probably because it calls refrigerator() directly
and not try_to_freeze() ...

I think something like the appended patch will help, then.

Greetings,
Rafael


---
Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69
that did not introduce the necessary call to set_freezable() in
xfs/linux-2.6/xfs_buf.c .

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/xfs/linux-2.6/xfs_buf.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
@@ -1750,6 +1750,8 @@ xfsbufd(
 
 	current->flags |= PF_MEMALLOC;
 
+	set_freezable();
+
 	do {
 		if (unlikely(freezing(current))) {
 			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);


* Re: XFS related Oops (suspend/resume related)
  2007-11-27 21:53                 ` Rafael J. Wysocki
@ 2007-11-29 21:05                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2007-11-29 21:05 UTC (permalink / raw)
  To: Tino Keitel; +Cc: David Chinner, linux-kernel, xfs

On Tuesday, 27 of November 2007, Rafael J. Wysocki wrote:
> On Tuesday, 27 of November 2007, David Chinner wrote:
> > On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote:
> > > On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> > > > On Monday, 26 of November 2007, David Chinner wrote:
> > > > > Now there's a message that I haven't seen in about 3 years.
> > > > > 
> > > > > It indicates that the linux inode connected to the xfs_inode is not
> > > > > the correct one. i.e. that the linux inode cache is out of step with
> > > > > the XFS inode cache.
> > > > > 
> > > > > Basically, that is not supposed to happen. I suspect that the way
> > > > > threads are frozen is resulting in an inode lookup racing with
> > > > > a reclaim. The reclaim thread gets stopped after any user threads,
> > > > > and so we could have the situation that a process blocked in lookup
> > > > > has the XFS inode reclaimed and reused before it gets unblocked.
> > > > > 
> > > > > The question is why is it happening now when none of that code in
> > > > > XFS has changed?
> > > > > 
> > > > > Rafael, when are threads frozen? Only when they schedule or call
> > > > > try_to_freeze()?
> > > > 
> > > > Kernel threads freeze only when they call try_to_freeze().  User space tasks
> > > > freeze while executing the signal handling code.
> > > > 
> > > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> > > > 
> > > > Yes.  Kernel threads are not sent fake signals by the freezer any more.
> > > 
> > > Ah, sorry, this change was merged after 2.6.23.  However, before 2.6.23
> > > we had another important change that caused all kernel threads to have
> > > PF_NOFREEZE set by default, unless they call set_freezable() explicitly.
> > 
> > So try_to_freeze() will never freeze a thread if it has not been
> > set_freezable()? And xfsbufd will never be frozen?
> 
> No, it won't.
> 
> I must have overlooked it, probably because it calls refrigerator() directly
> and not try_to_freeze() ...
> 
> I think something like the appended patch will help, then.

Tino, can you check if this patch helps, please?

Greetings,
Rafael


> ---
> Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69
> that did not introduce the necessary call to set_freezable() in
> xfs/linux-2.6/xfs_buf.c .
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  fs/xfs/linux-2.6/xfs_buf.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1750,6 +1750,8 @@ xfsbufd(
>  
>  	current->flags |= PF_MEMALLOC;
>  
> +	set_freezable();
> +
>  	do {
>  		if (unlikely(freezing(current))) {
>  			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);



-- 
"Premature optimization is the root of all evil." - Donald Knuth

