Re: [PATCH] Freeze bdevs when freezing processes.

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] Freeze bdevs when freezing processes.
       [not found] ` <200610231236.54317.rjw@sisk.pl>
@ 2006-10-24 14:44   ` David Chinner
  2006-10-24 15:29     ` Rafael J. Wysocki
  2006-10-24 22:19     ` Nigel Cunningham
  0 siblings, 2 replies; 28+ messages in thread
From: David Chinner @ 2006-10-24 14:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nigel Cunningham, Andrew Morton, LKML, Pavel Machek, xfs

On Mon, Oct 23, 2006 at 12:36:53PM +0200, Rafael J. Wysocki wrote:
> On Monday, 23 October 2006 06:12, Nigel Cunningham wrote:
> > XFS can continue to submit I/O from a timer routine, even after
> > freezeable kernel and userspace threads are frozen. This doesn't seem to
> > be an issue for current swsusp code,
> 
> So it doesn't look like we need the patch _now_.
> 
> > but is definitely an issue for Suspend2, where the pages being written could
> > be overwritten by Suspend2's atomic copy.
> 
> And IMO that's a good reason why we shouldn't use RCU pages for storing the
> image.  XFS is one known example that breaks things if we do so and
> there may be more such things that we don't know of.  The fact that they
> haven't appeared in testing so far doesn't mean they don't exist and
> moreover some things like that may appear in the future.

Could you please tell us which XFS bits are broken so we can get
them fixed?  The XFS daemons should all be checking if they are
supposed to freeze (i.e. they call try_to_freeze() after they wake
up due to timer expiry) so I thought they were doing the right
thing.

However, I have to say that I agree with freezing the filesystems
before suspend - at least XFS will be in a consistent state that can
be recovered from without corruption if your machine fails to
resume....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 14:44   ` [PATCH] Freeze bdevs when freezing processes David Chinner
@ 2006-10-24 15:29     ` Rafael J. Wysocki
  2006-10-24 16:27       ` Oleg Verych
                         ` (2 more replies)
  2006-10-24 22:19     ` Nigel Cunningham
  1 sibling, 3 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2006-10-24 15:29 UTC (permalink / raw)
  To: David Chinner; +Cc: Nigel Cunningham, Andrew Morton, LKML, Pavel Machek, xfs

On Tuesday, 24 October 2006 16:44, David Chinner wrote:
> On Mon, Oct 23, 2006 at 12:36:53PM +0200, Rafael J. Wysocki wrote:
> > On Monday, 23 October 2006 06:12, Nigel Cunningham wrote:
> > > XFS can continue to submit I/O from a timer routine, even after
> > > freezeable kernel and userspace threads are frozen. This doesn't seem to
> > > be an issue for current swsusp code,
> > 
> > So it doesn't look like we need the patch _now_.
> > 
> > > but is definitely an issue for Suspend2, where the pages being written could
> > > be overwritten by Suspend2's atomic copy.
> > 
> > And IMO that's a good reason why we shouldn't use RCU pages for storing the
> > image.  XFS is one known example that breaks things if we do so and
> > there may be more such things that we don't know of.  The fact that they
> > haven't appeared in testing so far doesn't mean they don't exist and
> > moreover some things like that may appear in the future.
> 
> Could you please tell us which XFS bits are broken so we can get
> them fixed?  The XFS daemons should all be checking if they are
> supposed to freeze (i.e. they call try_to_freeze() after they wake
> up due to timer expiry) so I thought they were doing the right
> thing.
> 
> However, I have to say that I agree with freezing the filesystems
> before suspend - at least XFS will be in a consistent state that can
> be recovered from without corruption if your machine fails to
> resume....

Do you mean calling sys_sync() after the userspace has been frozen
may not be sufficient?

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 15:29     ` Rafael J. Wysocki
@ 2006-10-24 16:27       ` Oleg Verych
  2006-10-25  8:05         ` Pavel Machek
  2006-10-24 16:33       ` David Chinner
  2006-10-24 17:06       ` Christoph Hellwig
  2 siblings, 1 reply; 28+ messages in thread
From: Oleg Verych @ 2006-10-24 16:27 UTC (permalink / raw)
  To: Rafael J. Wysocki, David Chinner, Nigel Cunningham, Andrew Morton,
	Pavel Machek, xfs, Oleg Verych


On 2006-10-24, Rafael J. Wysocki wrote:
> On Tuesday, 24 October 2006 16:44, David Chinner wrote:
>> On Mon, Oct 23, 2006 at 12:36:53PM +0200, Rafael J. Wysocki wrote:
>> > On Monday, 23 October 2006 06:12, Nigel Cunningham wrote:
>> > > XFS can continue to submit I/O from a timer routine, even after
>> > > freezeable kernel and userspace threads are frozen. This doesn't seem to
>> > > be an issue for current swsusp code,
>> > 
>> > So it doesn't look like we need the patch _now_.
>> > 
>> > > but is definitely an issue for Suspend2, where the pages being written could
>> > > be overwritten by Suspend2's atomic copy.
>> > 
>> > And IMO that's a good reason why we shouldn't use RCU pages for storing the
>> > image.  XFS is one known example that breaks things if we do so and
>> > there may be more such things that we don't know of.  The fact that they
>> > haven't appeared in testing so far doesn't mean they don't exist and
>> > moreover some things like that may appear in the future.
>> 
>> Could you please tell us which XFS bits are broken so we can get
>> them fixed?  The XFS daemons should all be checking if they are
>> supposed to freeze (i.e. they call try_to_freeze() after they wake
>> up due to timer expiry) so I thought they were doing the right
>> thing.
>> 
>> However, I have to say that I agree with freezing the filesystems
>> before suspend - at least XFS will be in a consistent state that can
>> be recovered from without corruption if your machine fails to
>> resume....
>
> Do you mean calling sys_sync() after the userspace has been frozen
> may not be sufficient?

Please see
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=317479;msg=105;att=0>

it's bottom of
<http://bugs.debian.org/317479>

IMHO it's may be helpful.

-o--=O`C
 #oo'L O
<___=E M

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 16:27       ` Oleg Verych
@ 2006-10-25  8:05         ` Pavel Machek
  0 siblings, 0 replies; 28+ messages in thread
From: Pavel Machek @ 2006-10-25  8:05 UTC (permalink / raw)
  To: Oleg Verych
  Cc: Rafael J. Wysocki, David Chinner, Nigel Cunningham, Andrew Morton,
	xfs

Hi!

> >> However, I have to say that I agree with freezing the filesystems
> >> before suspend - at least XFS will be in a consistent state that can
> >> be recovered from without corruption if your machine fails to
> >> resume....
> >
> > Do you mean calling sys_sync() after the userspace has been frozen
> > may not be sufficient?
> 
> Please see
> <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=317479;msg=105;att=0>
> 
> it's bottom of
> <http://bugs.debian.org/317479>
> 
> IMHO it's may be helpful.

Heh, ouch.

Okay, so what happens is that Debian's scripts attempt to change
/boot/../grub.conf.something (xfs mounted). Changes are properly
propagated to the journal, but grub can't parse the journal, and
fails.

Ouch.

Either

1) fix grub to parse the journal (not practical, I guess)

2) do not modify grub configuration from hibernate scripts

or

3) fix hibernate scripts to sync down to disk, properly.

swsusp does not guarantee any particular state of filesystems when
system is suspended. Sorry.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 15:29     ` Rafael J. Wysocki
  2006-10-24 16:27       ` Oleg Verych
@ 2006-10-24 16:33       ` David Chinner
  2006-10-24 21:37         ` Pavel Machek
  2006-10-24 17:06       ` Christoph Hellwig
  2 siblings, 1 reply; 28+ messages in thread
From: David Chinner @ 2006-10-24 16:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Chinner, Nigel Cunningham, Andrew Morton, LKML,
	Pavel Machek, xfs

On Tue, Oct 24, 2006 at 05:29:59PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, 24 October 2006 16:44, David Chinner wrote:
> > On Mon, Oct 23, 2006 at 12:36:53PM +0200, Rafael J. Wysocki wrote:
> > > On Monday, 23 October 2006 06:12, Nigel Cunningham wrote:
> > > > XFS can continue to submit I/O from a timer routine, even after
> > > > freezeable kernel and userspace threads are frozen. This doesn't seem to
> > > > be an issue for current swsusp code,
> > > 
> > > So it doesn't look like we need the patch _now_.
> > > 
> > > > but is definitely an issue for Suspend2, where the pages being written could
> > > > be overwritten by Suspend2's atomic copy.
> > > 
> > > And IMO that's a good reason why we shouldn't use RCU pages for storing the
> > > image.  XFS is one known example that breaks things if we do so and
> > > there may be more such things that we don't know of.  The fact that they
> > > haven't appeared in testing so far doesn't mean they don't exist and
> > > moreover some things like that may appear in the future.
> > 
> > Could you please tell us which XFS bits are broken so we can get
> > them fixed?  The XFS daemons should all be checking if they are
> > supposed to freeze (i.e. they call try_to_freeze() after they wake
> > up due to timer expiry) so I thought they were doing the right
> > thing.
> > 
> > However, I have to say that I agree with freezing the filesystems
> > before suspend - at least XFS will be in a consistent state that can
> > be recovered from without corruption if your machine fails to
> > resume....
> 
> Do you mean calling sys_sync() after the userspace has been frozen
> may not be sufficient?

In most cases it probably is, but sys_sync() doesn't provide any
guarantees that the filesystem is not being used or written to after
it completes. Given that every so often I hear about an XFS filesystem
that was corrupted by suspend, I don't think this is sufficient...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 16:33       ` David Chinner
@ 2006-10-24 21:37         ` Pavel Machek
  2006-10-25  0:13           ` David Chinner
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Machek @ 2006-10-24 21:37 UTC (permalink / raw)
  To: David Chinner
  Cc: Rafael J. Wysocki, Nigel Cunningham, Andrew Morton, LKML, xfs

Hi!

> > Do you mean calling sys_sync() after the userspace has been frozen
> > may not be sufficient?
> 
> In most cases it probably is, but sys_sync() doesn't provide any
> guarantees that the filesystem is not being used or written to after
> it completes. Given that every so often I hear about an XFS filesystem
> that was corrupted by suspend, I don't think this is sufficient...

Userspace is frozen. There's noone that can write to the XFS
filesystem.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 21:37         ` Pavel Machek
@ 2006-10-25  0:13           ` David Chinner
  2006-10-25  8:10             ` Pavel Machek
  0 siblings, 1 reply; 28+ messages in thread
From: David Chinner @ 2006-10-25  0:13 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Chinner, Rafael J. Wysocki, Nigel Cunningham, Andrew Morton,
	LKML, xfs

On Tue, Oct 24, 2006 at 11:37:37PM +0200, Pavel Machek wrote:
> Hi!
> 
> > > Do you mean calling sys_sync() after the userspace has been frozen
> > > may not be sufficient?
> > 
> > In most cases it probably is, but sys_sync() doesn't provide any
> > guarantees that the filesystem is not being used or written to after
> > it completes. Given that every so often I hear about an XFS filesystem
> > that was corrupted by suspend, I don't think this is sufficient...
> 
> Userspace is frozen. There's noone that can write to the XFS
> filesystem.

Sure, no new userspace processes can write data, but what about the
internal state of the filesystem?

All a sync guarantees is that the filesystem is consistent when the
sync returns and XFS provides this guarantee by writing all data and
ensuring all metadata changes are logged so if a crash occurs it can
be recovered (which provides the sync guarantee). hence after a
sys_sync(), XFS will still have lots of dirty metadata that needs to
be written to disk at some time in the future so the transactions
can be removed from the log.

This dirty metadata can be flushed at any time, and the dirty state
is kept in XFS structures and not always in page structures (think
multipage metadata buffers). Hence I cannot see how suspend can
guarantee that it has saved all the dirty data in XFS, nor
restore it correctly on resume. Once you toss dirty metadata that
is currently in the log, further operations will result in that log
transaction being overwritten without it ever being written to disk.
That then means any subsequent operations after resume will corrupt
the filesystem....

Hence the only way to correctly rebuild the XFS state on resume is
to quiesce the filesystem on suspend and thaw it on resume so as to
trigger log recovery.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-25  0:13           ` David Chinner
@ 2006-10-25  8:10             ` Pavel Machek
  2006-10-25  8:38               ` David Chinner
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Machek @ 2006-10-25  8:10 UTC (permalink / raw)
  To: David Chinner
  Cc: Rafael J. Wysocki, Nigel Cunningham, Andrew Morton, LKML, xfs

Hi!

> > > > Do you mean calling sys_sync() after the userspace has been frozen
> > > > may not be sufficient?
> > > 
> > > In most cases it probably is, but sys_sync() doesn't provide any
> > > guarantees that the filesystem is not being used or written to after
> > > it completes. Given that every so often I hear about an XFS filesystem
> > > that was corrupted by suspend, I don't think this is sufficient...
> > 
> > Userspace is frozen. There's noone that can write to the XFS
> > filesystem.
> 
> Sure, no new userspace processes can write data, but what about the
> internal state of the filesystem?
> 
> All a sync guarantees is that the filesystem is consistent when the
> sync returns and XFS provides this guarantee by writing all data and
> ensuring all metadata changes are logged so if a crash occurs it can
> be recovered (which provides the sync guarantee). hence after a
> sys_sync(), XFS will still have lots of dirty metadata that needs to
> be written to disk at some time in the future so the transactions
> can be removed from the log.
> 
> This dirty metadata can be flushed at any time, and the dirty state
> is kept in XFS structures and not always in page structures (think
> multipage metadata buffers). Hence I cannot see how suspend can
> guarantee that it has saved all the dirty data in XFS, nor
> restore it correctly on resume. Once you toss dirty metadata that
> is currently in the log, further operations will result in that log
> transaction being overwritten without it ever being written to disk.
> That then means any subsequent operations after resume will corrupt
> the filesystem....
> 
> Hence the only way to correctly rebuild the XFS state on resume is
> to quiesce the filesystem on suspend and thaw it on resume so as to
> trigger log recovery.

No, during suspend/resume, memory image is saved, and no state is
lost. We would not even have to do sys_sync(), and suspend/resume
would still work properly.

sys_sync() is there only to limit damage in case of suspend/resume
failure.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-25  8:10             ` Pavel Machek
@ 2006-10-25  8:38               ` David Chinner
  2006-10-25  8:47                 ` Pavel Machek
  0 siblings, 1 reply; 28+ messages in thread
From: David Chinner @ 2006-10-25  8:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Chinner, Rafael J. Wysocki, Nigel Cunningham, Andrew Morton,
	LKML, xfs

On Wed, Oct 25, 2006 at 10:10:01AM +0200, Pavel Machek wrote:
> > Hence the only way to correctly rebuild the XFS state on resume is
> > to quiesce the filesystem on suspend and thaw it on resume so as to
> > trigger log recovery.
> 
> No, during suspend/resume, memory image is saved, and no state is
> lost. We would not even have to do sys_sync(), and suspend/resume
> would still work properly.

It seems to me that you ensure the filesystem is synced to disk and
then at some point later you record the memory state of the
filesystem, but these happen at different times. That leaves a
window for things to get out of sync again, right?

Cheers,

Dave.

-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-25  8:38               ` David Chinner
@ 2006-10-25  8:47                 ` Pavel Machek
  2006-10-25 12:32                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Machek @ 2006-10-25  8:47 UTC (permalink / raw)
  To: David Chinner
  Cc: Rafael J. Wysocki, Nigel Cunningham, Andrew Morton, LKML, xfs

On Wed 2006-10-25 18:38:30, David Chinner wrote:
> On Wed, Oct 25, 2006 at 10:10:01AM +0200, Pavel Machek wrote:
> > > Hence the only way to correctly rebuild the XFS state on resume is
> > > to quiesce the filesystem on suspend and thaw it on resume so as to
> > > trigger log recovery.
> > 
> > No, during suspend/resume, memory image is saved, and no state is
> > lost. We would not even have to do sys_sync(), and suspend/resume
> > would still work properly.
> 
> It seems to me that you ensure the filesystem is synced to disk and
> then at some point later you record the memory state of the
> filesystem, but these happen at different times. That leaves a
> window for things to get out of sync again, right?

I DO NOT HAVE TO ENSURE FILESYSTEM IS SYNCED. That sys_sync() is
optional.

Recording of memory state is atomic, and as long as noone writes to
the disk after atomic snapshot, memory image matches what is on disk.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-25  8:47                 ` Pavel Machek
@ 2006-10-25 12:32                   ` Rafael J. Wysocki
  2006-10-25 13:23                     ` Nigel Cunningham
  0 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2006-10-25 12:32 UTC (permalink / raw)
  To: Pavel Machek; +Cc: David Chinner, Nigel Cunningham, Andrew Morton, LKML, xfs

On Wednesday, 25 October 2006 10:47, Pavel Machek wrote:
> On Wed 2006-10-25 18:38:30, David Chinner wrote:
> > On Wed, Oct 25, 2006 at 10:10:01AM +0200, Pavel Machek wrote:
> > > > Hence the only way to correctly rebuild the XFS state on resume is
> > > > to quiesce the filesystem on suspend and thaw it on resume so as to
> > > > trigger log recovery.
> > > 
> > > No, during suspend/resume, memory image is saved, and no state is
> > > lost. We would not even have to do sys_sync(), and suspend/resume
> > > would still work properly.
> > 
> > It seems to me that you ensure the filesystem is synced to disk and
> > then at some point later you record the memory state of the
> > filesystem, but these happen at different times. That leaves a
> > window for things to get out of sync again, right?
> 
> I DO NOT HAVE TO ENSURE FILESYSTEM IS SYNCED. That sys_sync() is
> optional.
> 
> Recording of memory state is atomic, and as long as noone writes to
> the disk after atomic snapshot, memory image matches what is on disk.

Well, my impression is that this is exactly what happens here: Something
in the XFS code causes metadata to be written to disk _after_ the atomic
snapshot.

That's why I asked if the dirty XFS metadata were flushed by a kernel thread.

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-25 12:32                   ` Rafael J. Wysocki
@ 2006-10-25 13:23                     ` Nigel Cunningham
       [not found]                       ` <200610252105.56862.rjw@sisk.pl>
  0 siblings, 1 reply; 28+ messages in thread
From: Nigel Cunningham @ 2006-10-25 13:23 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Pavel Machek, David Chinner, Andrew Morton, LKML, xfs

Hi.

On Wed, 2006-10-25 at 14:32 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 25 October 2006 10:47, Pavel Machek wrote:
> > On Wed 2006-10-25 18:38:30, David Chinner wrote:
> > > On Wed, Oct 25, 2006 at 10:10:01AM +0200, Pavel Machek wrote:
> > > > > Hence the only way to correctly rebuild the XFS state on resume is
> > > > > to quiesce the filesystem on suspend and thaw it on resume so as to
> > > > > trigger log recovery.
> > > > 
> > > > No, during suspend/resume, memory image is saved, and no state is
> > > > lost. We would not even have to do sys_sync(), and suspend/resume
> > > > would still work properly.
> > > 
> > > It seems to me that you ensure the filesystem is synced to disk and
> > > then at some point later you record the memory state of the
> > > filesystem, but these happen at different times. That leaves a
> > > window for things to get out of sync again, right?
> > 
> > I DO NOT HAVE TO ENSURE FILESYSTEM IS SYNCED. That sys_sync() is
> > optional.
> > 
> > Recording of memory state is atomic, and as long as noone writes to
> > the disk after atomic snapshot, memory image matches what is on disk.
> 
> Well, my impression is that this is exactly what happens here: Something
> in the XFS code causes metadata to be written to disk _after_ the atomic
> snapshot.
> 
> That's why I asked if the dirty XFS metadata were flushed by a kernel thread.

When I first added bdev freezing it was because there was an XFS timer
doing writes.

Regards,

Nigel

^ permalink raw reply	[flat|nested] 28+ messages in thread

[parent not found: <200610252105.56862.rjw@sisk.pl>]

* Re: [PATCH] Freeze bdevs when freezing processes.
       [not found]                       ` <200610252105.56862.rjw@sisk.pl>
@ 2006-10-26  7:30                         ` David Chinner
  2006-10-26  8:18                           ` Nigel Cunningham
  0 siblings, 1 reply; 28+ messages in thread
From: David Chinner @ 2006-10-26  7:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nigel Cunningham, Pavel Machek, David Chinner, Andrew Morton,
	LKML, xfs

On Wed, Oct 25, 2006 at 09:05:56PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, 25 October 2006 15:23, Nigel Cunningham wrote:
> > > 
> > > Well, my impression is that this is exactly what happens here: Something
> > > in the XFS code causes metadata to be written to disk _after_ the atomic
> > > snapshot.
> > > 
> > > That's why I asked if the dirty XFS metadata were flushed by a kernel thread.
> > 
> > When I first added bdev freezing it was because there was an XFS timer
> > doing writes.
> 
> Yes, I noticed you said that, but I'd like someone from the XFS team to either
> confirm or deny it.

We have daemons running in the background that can definitely do stuff
after a sync. hmm - one does try_to_freeze() after a wakeup, the
other does:

                if (unlikely(freezing(current))) {
                        set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
                        refrigerator();
                } else {
                        clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
                }

before it goes to sleep. So that one (xfsbufd - metadata buffer flushing)
can definitely wake up after the sync and do work, and the other could if
the kernel thread freeze occurs after the sync.

Another good question at this point - exactly how should we be putting
these thread to to sleep? Are both these valid methods for freezing them?
And should we be freezing when we wake up instead of before we go to
sleep? i.e. what are teh rules we are supposed to be following?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-26  7:30                         ` David Chinner
@ 2006-10-26  8:18                           ` Nigel Cunningham
  2006-10-26  8:48                             ` Rafael J. Wysocki
                                               ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Nigel Cunningham @ 2006-10-26  8:18 UTC (permalink / raw)
  To: David Chinner; +Cc: Rafael J. Wysocki, Pavel Machek, Andrew Morton, LKML, xfs

Hi Dave.

On Thu, 2006-10-26 at 17:30 +1000, David Chinner wrote:
> On Wed, Oct 25, 2006 at 09:05:56PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, 25 October 2006 15:23, Nigel Cunningham wrote:
> > > > 
> > > > Well, my impression is that this is exactly what happens here: Something
> > > > in the XFS code causes metadata to be written to disk _after_ the atomic
> > > > snapshot.
> > > > 
> > > > That's why I asked if the dirty XFS metadata were flushed by a kernel thread.
> > > 
> > > When I first added bdev freezing it was because there was an XFS timer
> > > doing writes.
> > 
> > Yes, I noticed you said that, but I'd like someone from the XFS team to either
> > confirm or deny it.
> 
> We have daemons running in the background that can definitely do stuff
> after a sync. hmm - one does try_to_freeze() after a wakeup, the
> other does:
> 
>                 if (unlikely(freezing(current))) {
>                         set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
>                         refrigerator();
>                 } else {
>                         clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
>                 }
> 
> before it goes to sleep. So that one (xfsbufd - metadata buffer flushing)
> can definitely wake up after the sync and do work, and the other could if
> the kernel thread freeze occurs after the sync.
> 
> Another good question at this point - exactly how should we be putting
> these thread to to sleep? Are both these valid methods for freezing them?
> And should we be freezing when we wake up instead of before we go to
> sleep? i.e. what are teh rules we are supposed to be following?

As you have them at the moment, the threads seem to be freezing fine.
The issue I've seen in the past related not to threads but to timer
based activity. Admittedly it was 2.6.14 when I last looked at it, but
there used to be a possibility for XFS to submit I/O from a timer when
the threads are frozen but the bdev isn't frozen. Has that changed?

Regards,

Nigel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-26  8:18                           ` Nigel Cunningham
@ 2006-10-26  8:48                             ` Rafael J. Wysocki
  2006-10-26  8:57                             ` David Chinner
  2006-10-26  9:08                             ` Rafael J. Wysocki
  2 siblings, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2006-10-26  8:48 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: David Chinner, Pavel Machek, Andrew Morton, LKML, xfs

On Thursday, 26 October 2006 10:18, Nigel Cunningham wrote:
> Hi Dave.
> 
> On Thu, 2006-10-26 at 17:30 +1000, David Chinner wrote:
> > On Wed, Oct 25, 2006 at 09:05:56PM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, 25 October 2006 15:23, Nigel Cunningham wrote:
> > > > > 
> > > > > Well, my impression is that this is exactly what happens here: Something
> > > > > in the XFS code causes metadata to be written to disk _after_ the atomic
> > > > > snapshot.
> > > > > 
> > > > > That's why I asked if the dirty XFS metadata were flushed by a kernel thread.
> > > > 
> > > > When I first added bdev freezing it was because there was an XFS timer
> > > > doing writes.
> > > 
> > > Yes, I noticed you said that, but I'd like someone from the XFS team to either
> > > confirm or deny it.
> > 
> > We have daemons running in the background that can definitely do stuff
> > after a sync. hmm - one does try_to_freeze() after a wakeup, the
> > other does:
> > 
> >                 if (unlikely(freezing(current))) {
> >                         set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> >                         refrigerator();
> >                 } else {
> >                         clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> >                 }
> > 
> > before it goes to sleep. So that one (xfsbufd - metadata buffer flushing)
> > can definitely wake up after the sync and do work,

Once it's entered the refrigerator, it won't return from it before
freeze_processes() is called (usually during the resume).

> > and the other could if the kernel thread freeze occurs after the sync.

Even if it does anything after the sync, it's okay as long as that's before we
create the image.  And it is, because both threads are frozen before we do it.

> > Another good question at this point - exactly how should we be putting
> > these thread to to sleep? Are both these valid methods for freezing them?
> > And should we be freezing when we wake up instead of before we go to
> > sleep? i.e. what are teh rules we are supposed to be following?
> 
> As you have them at the moment, the threads seem to be freezing fine.

Yes.  IMO they are freezing fine.

In fact the suspend would fail if they didn't freeze unless one of them had
PF_NOFREEZE set.

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-26  8:18                           ` Nigel Cunningham
  2006-10-26  8:48                             ` Rafael J. Wysocki
@ 2006-10-26  8:57                             ` David Chinner
  2006-10-26  9:11                               ` Rafael J. Wysocki
  2006-10-26  9:18                               ` Nigel Cunningham
  2006-10-26  9:08                             ` Rafael J. Wysocki
  2 siblings, 2 replies; 28+ messages in thread
From: David Chinner @ 2006-10-26  8:57 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: David Chinner, Rafael J. Wysocki, Pavel Machek, Andrew Morton,
	LKML, xfs

Hi Nigel,

On Thu, Oct 26, 2006 at 06:18:29PM +1000, Nigel Cunningham wrote:
> On Thu, 2006-10-26 at 17:30 +1000, David Chinner wrote:
> > We have daemons running in the background that can definitely do stuff
> > after a sync. hmm - one does try_to_freeze() after a wakeup, the
> > other does:
> > 
> >                 if (unlikely(freezing(current))) {
> >                         set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> >                         refrigerator();
> >                 } else {
> >                         clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> >                 }
> > 
> > before it goes to sleep. So that one (xfsbufd - metadata buffer flushing)
> > can definitely wake up after the sync and do work, and the other could if
> > the kernel thread freeze occurs after the sync.
> > 
> > Another good question at this point - exactly how should we be putting
> > these thread to to sleep? Are both these valid methods for freezing them?
> > And should we be freezing when we wake up instead of before we go to
> > sleep? i.e. what are teh rules we are supposed to be following?
> 
> As you have them at the moment, the threads seem to be freezing fine.
> The issue I've seen in the past related not to threads but to timer
> based activity. Admittedly it was 2.6.14 when I last looked at it, but
> there used to be a possibility for XFS to submit I/O from a timer when
> the threads are frozen but the bdev isn't frozen. Has that changed?

I didn't think we've ever done that - periodic or delayed operations
are passed off to the kernel threads to execute. A stack trace
(if you still have it) would be really help here.

Hmmm - we have a couple of per-cpu work queues as well that are
used on I/O completion and that can, in some circumstances,
trigger new transactions. If we are only flush metadata, then
I don't think that any more I/o will be issued, but I could be
wrong (maze of twisty passages).

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-26  8:57                             ` David Chinner
@ 2006-10-26  9:11                               ` Rafael J. Wysocki
  2006-10-27  1:38                                 ` David Chinner
  2006-10-26  9:18                               ` Nigel Cunningham
  1 sibling, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2006-10-26  9:11 UTC (permalink / raw)
  To: David Chinner; +Cc: Nigel Cunningham, Pavel Machek, Andrew Morton, LKML, xfs

Hi,

On Thursday, 26 October 2006 10:57, David Chinner wrote:
> Hi Nigel,
> 
> On Thu, Oct 26, 2006 at 06:18:29PM +1000, Nigel Cunningham wrote:
> > On Thu, 2006-10-26 at 17:30 +1000, David Chinner wrote:
> > > We have daemons running in the background that can definitely do stuff
> > > after a sync. hmm - one does try_to_freeze() after a wakeup, the
> > > other does:
> > > 
> > >                 if (unlikely(freezing(current))) {
> > >                         set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> > >                         refrigerator();
> > >                 } else {
> > >                         clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> > >                 }
> > > 
> > > before it goes to sleep. So that one (xfsbufd - metadata buffer flushing)
> > > can definitely wake up after the sync and do work, and the other could if
> > > the kernel thread freeze occurs after the sync.
> > > 
> > > Another good question at this point - exactly how should we be putting
> > > these thread to to sleep? Are both these valid methods for freezing them?
> > > And should we be freezing when we wake up instead of before we go to
> > > sleep? i.e. what are teh rules we are supposed to be following?
> > 
> > As you have them at the moment, the threads seem to be freezing fine.
> > The issue I've seen in the past related not to threads but to timer
> > based activity. Admittedly it was 2.6.14 when I last looked at it, but
> > there used to be a possibility for XFS to submit I/O from a timer when
> > the threads are frozen but the bdev isn't frozen. Has that changed?
> 
> I didn't think we've ever done that - periodic or delayed operations
> are passed off to the kernel threads to execute. A stack trace
> (if you still have it) would be really help here.
> 
> Hmmm - we have a couple of per-cpu work queues as well that are
> used on I/O completion and that can, in some circumstances,
> trigger new transactions. If we are only flush metadata, then
> I don't think that any more I/o will be issued, but I could be
> wrong (maze of twisty passages).

Well, I think this exactly is the problem, because worker_threads run with
PF_NOFREEZE set (as I've just said in another message).

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-26  9:11                               ` Rafael J. Wysocki
@ 2006-10-27  1:38                                 ` David Chinner
  2006-10-27 14:37                                   ` Rafael J. Wysocki
  2006-10-29 17:35                                   ` Pavel Machek
  0 siblings, 2 replies; 28+ messages in thread
From: David Chinner @ 2006-10-27  1:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Chinner, Nigel Cunningham, Pavel Machek, Andrew Morton,
	LKML, xfs

On Thu, Oct 26, 2006 at 11:11:29AM +0200, Rafael J. Wysocki wrote:
> On Thursday, 26 October 2006 10:57, David Chinner wrote:
> > On Thu, Oct 26, 2006 at 06:18:29PM +1000, Nigel Cunningham wrote:
> > > As you have them at the moment, the threads seem to be freezing fine.
> > > The issue I've seen in the past related not to threads but to timer
> > > based activity. Admittedly it was 2.6.14 when I last looked at it, but
> > > there used to be a possibility for XFS to submit I/O from a timer when
> > > the threads are frozen but the bdev isn't frozen. Has that changed?
> > 
> > I didn't think we've ever done that - periodic or delayed operations
> > are passed off to the kernel threads to execute. A stack trace
> > (if you still have it) would be really help here.
> > 
> > Hmmm - we have a couple of per-cpu work queues as well that are
> > used on I/O completion and that can, in some circumstances,
> > trigger new transactions. If we are only flush metadata, then
> > I don't think that any more I/o will be issued, but I could be
> > wrong (maze of twisty passages).
> 
> Well, I think this exactly is the problem, because worker_threads run with
> PF_NOFREEZE set (as I've just said in another message).

Ok, so freezing the filesystem is the only way you can prevent
this as the workqueues are flushed as part of quiescing the filesystem.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-27  1:38                                 ` David Chinner
@ 2006-10-27 14:37                                   ` Rafael J. Wysocki
  2006-10-29 17:35                                   ` Pavel Machek
  1 sibling, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2006-10-27 14:37 UTC (permalink / raw)
  To: David Chinner; +Cc: Nigel Cunningham, Pavel Machek, Andrew Morton, LKML, xfs

On Friday, 27 October 2006 03:38, David Chinner wrote:
> On Thu, Oct 26, 2006 at 11:11:29AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, 26 October 2006 10:57, David Chinner wrote:
> > > On Thu, Oct 26, 2006 at 06:18:29PM +1000, Nigel Cunningham wrote:
> > > > As you have them at the moment, the threads seem to be freezing fine.
> > > > The issue I've seen in the past related not to threads but to timer
> > > > based activity. Admittedly it was 2.6.14 when I last looked at it, but
> > > > there used to be a possibility for XFS to submit I/O from a timer when
> > > > the threads are frozen but the bdev isn't frozen. Has that changed?
> > > 
> > > I didn't think we've ever done that - periodic or delayed operations
> > > are passed off to the kernel threads to execute. A stack trace
> > > (if you still have it) would be really help here.
> > > 
> > > Hmmm - we have a couple of per-cpu work queues as well that are
> > > used on I/O completion and that can, in some circumstances,
> > > trigger new transactions. If we are only flush metadata, then
> > > I don't think that any more I/o will be issued, but I could be
> > > wrong (maze of twisty passages).
> > 
> > Well, I think this exactly is the problem, because worker_threads run with
> > PF_NOFREEZE set (as I've just said in another message).
> 
> Ok, so freezing the filesystem is the only way you can prevent
> this as the workqueues are flushed as part of quiescing the filesystem.

Yes, I think so.

Now at last I know what the problem actually is and why we need the freezing
of filesystems, so thanks for helping me understand that. :-)

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-27  1:38                                 ` David Chinner
  2006-10-27 14:37                                   ` Rafael J. Wysocki
@ 2006-10-29 17:35                                   ` Pavel Machek
       [not found]                                     ` <200610300029.25555.rjw@sisk.pl>
  1 sibling, 1 reply; 28+ messages in thread
From: Pavel Machek @ 2006-10-29 17:35 UTC (permalink / raw)
  To: David Chinner
  Cc: Rafael J. Wysocki, Nigel Cunningham, Andrew Morton, LKML, xfs

Hi!

> > > > As you have them at the moment, the threads seem to be freezing fine.
> > > > The issue I've seen in the past related not to threads but to timer
> > > > based activity. Admittedly it was 2.6.14 when I last looked at it, but
> > > > there used to be a possibility for XFS to submit I/O from a timer when
> > > > the threads are frozen but the bdev isn't frozen. Has that changed?
> > > 
> > > I didn't think we've ever done that - periodic or delayed operations
> > > are passed off to the kernel threads to execute. A stack trace
> > > (if you still have it) would be really help here.
> > > 
> > > Hmmm - we have a couple of per-cpu work queues as well that are
> > > used on I/O completion and that can, in some circumstances,
> > > trigger new transactions. If we are only flush metadata, then
> > > I don't think that any more I/o will be issued, but I could be
> > > wrong (maze of twisty passages).
> > 
> > Well, I think this exactly is the problem, because worker_threads run with
> > PF_NOFREEZE set (as I've just said in another message).
> 
> Ok, so freezing the filesystem is the only way you can prevent
> this as the workqueues are flushed as part of quiescing the filesystem.

Well, alternative is to teach XFS to sense that we are being frozen
and stop disk writes in such case.

OTOH freeze_bdevs is perhaps not that bad solution... 
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

[parent not found: <200610300029.25555.rjw@sisk.pl>]

* Re: [PATCH] Freeze bdevs when freezing processes.
       [not found]                                     ` <200610300029.25555.rjw@sisk.pl>
@ 2006-10-29 23:46                                       ` Nigel Cunningham
  0 siblings, 0 replies; 28+ messages in thread
From: Nigel Cunningham @ 2006-10-29 23:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, David Chinner, Andrew Morton, LKML, xfs,
	Christoph Hellwig

Hi.

On Mon, 2006-10-30 at 00:29 +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> On Sunday, 29 October 2006 18:35, Pavel Machek wrote:
> > Hi!
> > 
> > > > > > As you have them at the moment, the threads seem to be freezing fine.
> > > > > > The issue I've seen in the past related not to threads but to timer
> > > > > > based activity. Admittedly it was 2.6.14 when I last looked at it, but
> > > > > > there used to be a possibility for XFS to submit I/O from a timer when
> > > > > > the threads are frozen but the bdev isn't frozen. Has that changed?
> > > > > 
> > > > > I didn't think we've ever done that - periodic or delayed operations
> > > > > are passed off to the kernel threads to execute. A stack trace
> > > > > (if you still have it) would be really help here.
> > > > > 
> > > > > Hmmm - we have a couple of per-cpu work queues as well that are
> > > > > used on I/O completion and that can, in some circumstances,
> > > > > trigger new transactions. If we are only flush metadata, then
> > > > > I don't think that any more I/o will be issued, but I could be
> > > > > wrong (maze of twisty passages).
> > > > 
> > > > Well, I think this exactly is the problem, because worker_threads run with
> > > > PF_NOFREEZE set (as I've just said in another message).
> > > 
> > > Ok, so freezing the filesystem is the only way you can prevent
> > > this as the workqueues are flushed as part of quiescing the filesystem.
> > 
> > Well, alternative is to teach XFS to sense that we are being frozen
> > and stop disk writes in such case.
> > 
> > OTOH freeze_bdevs is perhaps not that bad solution... 
> 
> Okay, appended is a patch that implements the freezing of bdevs in a slightly
> different way than the Nigel's patch did it.

> As Christoph suggested, I have put freeze_filesystems() and thaw_filesystems()
> into fs/buffer.c and indroduced the MS_FROZEN flag to mark frozen
> filesystems.
> 
> It seems to work fine, except I get the following trace from lockdep during
> the suspend on a regular basis (not 100% reproducible, though):
> 
> Stopping tasks...
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.19-rc2-mm2 #15
> ---------------------------------------------
> s2disk/5564 is trying to acquire lock:
>  (&bdev->bd_mount_mutex){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10
> 
> but task is already holding lock:
>  (&bdev->bd_mount_mutex){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10
> 
> other info that might help us debug this:
> 3 locks held by s2disk/5564:
>  #0:  (&bdev->bd_mount_mutex){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10
>  #1:  (&type->s_umount_key#16){----}, at: [<ffffffff80291647>] get_super+0x67/0xc0
>  #2:  (&journal->j_barrier){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10
> 
> stack backtrace:
> 
> Call Trace:
>  [<ffffffff8020af79>] dump_trace+0xb9/0x430
>  [<ffffffff8020b333>] show_trace+0x43/0x60
>  [<ffffffff8020b635>] dump_stack+0x15/0x20
>  [<ffffffff8024a1d1>] __lock_acquire+0x881/0xc60
>  [<ffffffff8024a94d>] lock_acquire+0x8d/0xc0
>  [<ffffffff80475cd4>] __mutex_lock_slowpath+0xd4/0x270
>  [<ffffffff80475e79>] mutex_lock+0x9/0x10
>  [<ffffffff802b2bb6>] freeze_bdev+0x16/0x80
>  [<ffffffff802b3105>] freeze_filesystems+0x55/0x80
>  [<ffffffff80255942>] freeze_processes+0x1e2/0x360
>  [<ffffffff802592a3>] snapshot_ioctl+0x163/0x610
>  [<ffffffff8029cf0b>] do_ioctl+0x6b/0xa0
>  [<ffffffff8029d1eb>] vfs_ioctl+0x2ab/0x2d0
>  [<ffffffff8029d27a>] sys_ioctl+0x6a/0xa0
>  [<ffffffff80209c2e>] system_call+0x7e/0x83
>  [<00002afb13a4d8a9>]
> 
> done.
> Shrinking memory... done (19126 pages freed)
> 
> Greetings,
> Rafael
> 
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Heh. I've just prepared almost exactly the same patch (except for
unwinding the process thawing).

I haven't ever seen those mutex warnings - is that due to something new
in -mm or one of those gazillion new warnings you can enable in vanilla?

Apart from that, I'll add:

Signed-off-by: Nigel Cunningham <nigel@suspend2.net>

Regards,

Nigel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-26  8:57                             ` David Chinner
  2006-10-26  9:11                               ` Rafael J. Wysocki
@ 2006-10-26  9:18                               ` Nigel Cunningham
  1 sibling, 0 replies; 28+ messages in thread
From: Nigel Cunningham @ 2006-10-26  9:18 UTC (permalink / raw)
  To: David Chinner; +Cc: Rafael J. Wysocki, Pavel Machek, Andrew Morton, LKML, xfs

Hi.

On Thu, 2006-10-26 at 18:57 +1000, David Chinner wrote:
> I didn't think we've ever done that - periodic or delayed operations
> are passed off to the kernel threads to execute. A stack trace
> (if you still have it) would be really help here.

I don't, but I know how to get it again. Will give it a go shortly.

> Hmmm - we have a couple of per-cpu work queues as well that are
> used on I/O completion and that can, in some circumstances,
> trigger new transactions. If we are only flush metadata, then
> I don't think that any more I/o will be issued, but I could be
> wrong (maze of twisty passages).

I can understand that. I'm trying to learn Xgl programming at the mo :)

Nigel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-26  8:18                           ` Nigel Cunningham
  2006-10-26  8:48                             ` Rafael J. Wysocki
  2006-10-26  8:57                             ` David Chinner
@ 2006-10-26  9:08                             ` Rafael J. Wysocki
  2 siblings, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2006-10-26  9:08 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: David Chinner, Pavel Machek, Andrew Morton, LKML, xfs

On Thursday, 26 October 2006 10:18, Nigel Cunningham wrote:
> Hi Dave.
> 
> On Thu, 2006-10-26 at 17:30 +1000, David Chinner wrote:
> > On Wed, Oct 25, 2006 at 09:05:56PM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, 25 October 2006 15:23, Nigel Cunningham wrote:
> > > > > 
> > > > > Well, my impression is that this is exactly what happens here: Something
> > > > > in the XFS code causes metadata to be written to disk _after_ the atomic
> > > > > snapshot.
> > > > > 
> > > > > That's why I asked if the dirty XFS metadata were flushed by a kernel thread.
> > > > 
> > > > When I first added bdev freezing it was because there was an XFS timer
> > > > doing writes.
> > > 
> > > Yes, I noticed you said that, but I'd like someone from the XFS team to either
> > > confirm or deny it.
> > 
> > We have daemons running in the background that can definitely do stuff
> > after a sync. hmm - one does try_to_freeze() after a wakeup, the
> > other does:
> > 
> >                 if (unlikely(freezing(current))) {
> >                         set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> >                         refrigerator();
> >                 } else {
> >                         clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> >                 }
> > 
> > before it goes to sleep. So that one (xfsbufd - metadata buffer flushing)
> > can definitely wake up after the sync and do work, and the other could if
> > the kernel thread freeze occurs after the sync.
> > 
> > Another good question at this point - exactly how should we be putting
> > these thread to to sleep? Are both these valid methods for freezing them?
> > And should we be freezing when we wake up instead of before we go to
> > sleep? i.e. what are teh rules we are supposed to be following?
> 
> As you have them at the moment, the threads seem to be freezing fine.
> The issue I've seen in the past related not to threads but to timer
> based activity. Admittedly it was 2.6.14 when I last looked at it, but
> there used to be a possibility for XFS to submit I/O from a timer when
> the threads are frozen but the bdev isn't frozen.

Also there may be a problem if a workqueue is used for that, because
worker_threads run with PF_NOFREEZE set.

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 15:29     ` Rafael J. Wysocki
  2006-10-24 16:27       ` Oleg Verych
  2006-10-24 16:33       ` David Chinner
@ 2006-10-24 17:06       ` Christoph Hellwig
  2006-10-24 21:26         ` Pavel Machek
  2 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2006-10-24 17:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Chinner, Nigel Cunningham, Andrew Morton, LKML,
	Pavel Machek, xfs

On Tue, Oct 24, 2006 at 05:29:59PM +0200, Rafael J. Wysocki wrote:
> Do you mean calling sys_sync() after the userspace has been frozen
> may not be sufficient?

No, that's definitly not enough.  You need to freeze_bdev to make sure
data is on disk in the place it's expected by the filesystem without
starting a log recovery.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 17:06       ` Christoph Hellwig
@ 2006-10-24 21:26         ` Pavel Machek
  2006-10-24 21:33           ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Machek @ 2006-10-24 21:26 UTC (permalink / raw)
  To: Christoph Hellwig, Rafael J. Wysocki, David Chinner,
	Nigel Cunningham, Andrew Morton, LKML, xfs

Hi!

On Tue 2006-10-24 18:06:33, Christoph Hellwig wrote:
> On Tue, Oct 24, 2006 at 05:29:59PM +0200, Rafael J. Wysocki wrote:
> > Do you mean calling sys_sync() after the userspace has been frozen
> > may not be sufficient?
> 
> No, that's definitly not enough.  You need to freeze_bdev to make sure
> data is on disk in the place it's expected by the filesystem without
> starting a log recovery.

I believe log recovery is okay in this case.

It can only happen when kernel dies during suspend or during
resume... And log recovery seems okay in that case. We even guarantee
that user did not loose any data -- by using sys_sync() after userland
is stopped -- but let's not overdo over protections.
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 21:26         ` Pavel Machek
@ 2006-10-24 21:33           ` Christoph Hellwig
  2006-10-24 21:43             ` Pavel Machek
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2006-10-24 21:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Christoph Hellwig, Rafael J. Wysocki, David Chinner,
	Nigel Cunningham, Andrew Morton, LKML, xfs

On Tue, Oct 24, 2006 at 11:26:48PM +0200, Pavel Machek wrote:
> > No, that's definitly not enough.  You need to freeze_bdev to make sure
> > data is on disk in the place it's expected by the filesystem without
> > starting a log recovery.
> 
> I believe log recovery is okay in this case.
> 
> It can only happen when kernel dies during suspend or during
> resume... And log recovery seems okay in that case. We even guarantee
> that user did not loose any data -- by using sys_sync() after userland
> is stopped -- but let's not overdo over protections.

You're still entirely missing the problem.

Take a look at http://www.opengroup.org/onlinepubs/007908799/xsh/sync.html
and the linux sync(2) manpage.  The only thing sync guarantees is writing
out all in-memory data to disk.  It doesn't even gurantee completion,
although we've been synchronous in Linux for a while.

What it does not gurantee is where on disk the data is located.  Now for
a journaling filesystem pushing everything to the log is the easiest way
to complete sync, and it's perfectly valid - if the system crashes after
the sync and before data is written back to it's normal place on disk
the system notices it's not been unmounted cleanly and will do a log
recovery.  In the suspend case however the system neither crashes nor
is unmounted - thus the filesystem doesn't know it has to recover the
log.  We have to choices to fix this:

 (1) force a log recovery of an already mounted and in use filesystem
 (2) make sure data is in the right place before suspending

(1) is pretty nasty, and hard to do across filesystems.  (2) is already
implemented and easily useable by the suspend code.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 21:33           ` Christoph Hellwig
@ 2006-10-24 21:43             ` Pavel Machek
  0 siblings, 0 replies; 28+ messages in thread
From: Pavel Machek @ 2006-10-24 21:43 UTC (permalink / raw)
  To: Christoph Hellwig, Rafael J. Wysocki, David Chinner,
	Nigel Cunningham, Andrew Morton, LKML, xfs

Hi!

On Tue 2006-10-24 22:33:42, Christoph Hellwig wrote:
> On Tue, Oct 24, 2006 at 11:26:48PM +0200, Pavel Machek wrote:
> > > No, that's definitly not enough.  You need to freeze_bdev to make sure
> > > data is on disk in the place it's expected by the filesystem without
> > > starting a log recovery.
> > 
> > I believe log recovery is okay in this case.
> > 
> > It can only happen when kernel dies during suspend or during
> > resume... And log recovery seems okay in that case. We even guarantee
> > that user did not loose any data -- by using sys_sync() after userland
> > is stopped -- but let's not overdo over protections.
> 
> You're still entirely missing the problem.
> 
> Take a look at http://www.opengroup.org/onlinepubs/007908799/xsh/sync.html
> and the linux sync(2) manpage.  The only thing sync guarantees is writing
> out all in-memory data to disk.  It doesn't even gurantee completion,
> although we've been synchronous in Linux for a while.

Ok, I assume sys_sync is synchronous, but that's okay.

> What it does not gurantee is where on disk the data is located.  Now for
> a journaling filesystem pushing everything to the log is the easiest way
> to complete sync, and it's perfectly valid - if the system crashes after
> the sync and before data is written back to it's normal place on disk
> the system notices it's not been unmounted cleanly and will do a log
> recovery.  In the suspend case however the system neither crashes nor
> is unmounted - thus the filesystem doesn't know it has to recover the
> log.  We have to choices to fix this:
> 
>  (1) force a log recovery of an already mounted and in use filesystem
>  (2) make sure data is in the right place before suspending
> 
> (1) is pretty nasty, and hard to do across filesystems.  (2) is already
> implemented and easily useable by the suspend code.

No, there's no need to do either (1) or (2) in "machine suspended and
resumed successfully". In that case, machine just continues as if no
suspend has happened.

In fact I could remove sys_sync() from freezer. suspend code would
still be correct.

That sys_sync() only matters in case of suspend but machine died
during resume... and in that case we know we crashed, and journal
recovery is okay.

I do know how journaling works (from 10000feet, anyway), and it is
okay in this case.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Freeze bdevs when freezing processes.
  2006-10-24 14:44   ` [PATCH] Freeze bdevs when freezing processes David Chinner
  2006-10-24 15:29     ` Rafael J. Wysocki
@ 2006-10-24 22:19     ` Nigel Cunningham
  1 sibling, 0 replies; 28+ messages in thread
From: Nigel Cunningham @ 2006-10-24 22:19 UTC (permalink / raw)
  To: David Chinner; +Cc: Rafael J. Wysocki, Andrew Morton, LKML, Pavel Machek, xfs

Hi David.

On Wed, 2006-10-25 at 00:44 +1000, David Chinner wrote:
> On Mon, Oct 23, 2006 at 12:36:53PM +0200, Rafael J. Wysocki wrote:
> > On Monday, 23 October 2006 06:12, Nigel Cunningham wrote:
> > > XFS can continue to submit I/O from a timer routine, even after
> > > freezeable kernel and userspace threads are frozen. This doesn't seem to
> > > be an issue for current swsusp code,
> > 
> > So it doesn't look like we need the patch _now_.
> > 
> > > but is definitely an issue for Suspend2, where the pages being written could
> > > be overwritten by Suspend2's atomic copy.
> > 
> > And IMO that's a good reason why we shouldn't use RCU pages for storing the
> > image.  XFS is one known example that breaks things if we do so and
> > there may be more such things that we don't know of.  The fact that they
> > haven't appeared in testing so far doesn't mean they don't exist and
> > moreover some things like that may appear in the future.
> 
> Could you please tell us which XFS bits are broken so we can get
> them fixed?  The XFS daemons should all be checking if they are
> supposed to freeze (i.e. they call try_to_freeze() after they wake
> up due to timer expiry) so I thought they were doing the right
> thing.

The problem (in my experience) isn't the threads but a timer that
submits I/O even when the threads are frozen. It stops when the bdev is
frozen. The last report I've seen was before I added bdev freezing to
suspend2, which was 2.6.14, so you guys may have fixed it since then.

I can seek to get a trace if you like.

Regards,

Nigel

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2006-10-29 23:59 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1161576735.3466.7.camel@nigel.suspend2.net>
     [not found] ` <200610231236.54317.rjw@sisk.pl>
2006-10-24 14:44   ` [PATCH] Freeze bdevs when freezing processes David Chinner
2006-10-24 15:29     ` Rafael J. Wysocki
2006-10-24 16:27       ` Oleg Verych
2006-10-25  8:05         ` Pavel Machek
2006-10-24 16:33       ` David Chinner
2006-10-24 21:37         ` Pavel Machek
2006-10-25  0:13           ` David Chinner
2006-10-25  8:10             ` Pavel Machek
2006-10-25  8:38               ` David Chinner
2006-10-25  8:47                 ` Pavel Machek
2006-10-25 12:32                   ` Rafael J. Wysocki
2006-10-25 13:23                     ` Nigel Cunningham
     [not found]                       ` <200610252105.56862.rjw@sisk.pl>
2006-10-26  7:30                         ` David Chinner
2006-10-26  8:18                           ` Nigel Cunningham
2006-10-26  8:48                             ` Rafael J. Wysocki
2006-10-26  8:57                             ` David Chinner
2006-10-26  9:11                               ` Rafael J. Wysocki
2006-10-27  1:38                                 ` David Chinner
2006-10-27 14:37                                   ` Rafael J. Wysocki
2006-10-29 17:35                                   ` Pavel Machek
     [not found]                                     ` <200610300029.25555.rjw@sisk.pl>
2006-10-29 23:46                                       ` Nigel Cunningham
2006-10-26  9:18                               ` Nigel Cunningham
2006-10-26  9:08                             ` Rafael J. Wysocki
2006-10-24 17:06       ` Christoph Hellwig
2006-10-24 21:26         ` Pavel Machek
2006-10-24 21:33           ` Christoph Hellwig
2006-10-24 21:43             ` Pavel Machek
2006-10-24 22:19     ` Nigel Cunningham

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox