* freeze vs freezer
@ 2007-11-22 3:54 Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2007-11-22 3:54 UTC
To: David Chinner, xfs-masters, Rafael J. Wysocki; +Cc: Linux Kernel Mailing List
It seems that a process blocked in a write to an xfs filesystem due to
xfs_freeze cannot be frozen by the freezer.
I see this if I suspend my laptop while doing something xfs-filesystem
intensive, like a kernel build. My suspend scripts freeze the XFS
filesystem (as Dave said I should), which presumably blocks some writer,
and then the freezer times out and fails to complete.
Here's part of the process dump the freezer does when it times out:
cc1 D 00000000 0 18138 18137
dd5f1e24 00200082 00000002 00000000 ecdeeb00 ecdeec64 c200f280 00000001
009c09a0 dd5f1e0c dd5f1e0c 0000000f 00000000 00000000 00000000 dd5f1e74
c7beb480 dd5f1e88 dd5f1ea8 c0228d97 e8889540 dd5f1e38 c015b75d dd5f1e44
Call Trace:
[<c0228d97>] xfs_write+0xf4/0x6d9
[<c0226038>] xfs_file_aio_write+0x53/0x5b
[<c0171c15>] do_sync_write+0xae/0xec
[<c0172343>] vfs_write+0xa4/0x120
[<c01728d7>] sys_write+0x3b/0x60
[<c0106fae>] sysenter_past_esp+0x6b/0xa1
=======================
I haven't looked at how to fix this yet. I only just worked out why I
was getting suspend failures.
J

* Re: freeze vs freezer
From: Rafael J. Wysocki @ 2007-11-23 23:47 UTC
To: Jeremy Fitzhardinge; +Cc: David Chinner, xfs-masters, Linux Kernel Mailing List

On Thursday, 22 of November 2007, Jeremy Fitzhardinge wrote:
> It seems that a process blocked in a write to an xfs filesystem due to
> xfs_freeze cannot be frozen by the freezer.

The freezer doesn't handle tasks in TASK_UNINTERRUPTIBLE and I don't know how
to make it handle them without at least partially defeating its purpose.

> I see this if I suspend my laptop while doing something xfs-filesystem
> intensive, like a kernel build. My suspend scripts freeze the XFS
> filesystem (as Dave said I should), which presumably blocks some writer,
> and then the freezer times out and fails to complete.
>
> Here's part of the process dump the freezer does when it times out:
>
> [process dump and call trace snipped]
>
> I haven't looked at how to fix this yet. I only just worked out why I
> was getting suspend failures.

Well, you can add freezer_do_not_count()/freezer_count() annotations to
xfs_write() (and whatever else is blocked as a result of the XFS being frozen).

Generally, that would be risky without the freezing of XFS, however, because it
might leak filesystem data to a storage device after the hibernation image has
been created, which would result in filesystem corruption after the resume.

Still, if you only suspend to RAM, that should be safe.

Greetings,
Rafael
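A rough userspace model of why the D-state cc1 process makes the freezer time out, and what the freezer_do_not_count() annotation changes. Everything here is hypothetical (sim_task, sim_try_to_freeze_tasks and the pass limit are invented for illustration). It assumes, loosely following the discussion, that the freezer counts every task that has not reached the refrigerator, cannot wake a TASK_UNINTERRUPTIBLE task, ignores tasks flagged "do not count", and gives up after a fixed number of passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task states, loosely modelled on the kernel's. */
enum sim_state { SIM_RUNNING, SIM_INTERRUPTIBLE, SIM_UNINTERRUPTIBLE, SIM_FROZEN };

struct sim_task {
	enum sim_state state;
	bool freeze_pending;	/* stands in for TIF_FREEZE */
	bool skip;		/* stands in for PF_FREEZER_SKIP */
};

/* Models the freezer_do_not_count() annotation: "don't wait for me". */
static void sim_freezer_do_not_count(struct sim_task *t)
{
	t->skip = true;
}

/* One freezer pass: mark every task and count those still not frozen. */
static size_t sim_freeze_pass(struct sim_task *tasks, size_t n)
{
	size_t todo = 0;

	for (size_t i = 0; i < n; i++) {
		struct sim_task *t = &tasks[i];

		if (t->state == SIM_FROZEN)
			continue;
		t->freeze_pending = true;
		if (t->state != SIM_UNINTERRUPTIBLE) {
			/* Assumption: running and interruptible tasks are
			 * woken and reach the refrigerator promptly. */
			t->state = SIM_FROZEN;
			continue;
		}
		/* A D-state task cannot be woken; only the annotation
		 * stops the freezer from waiting for it. */
		if (!t->skip)
			todo++;
	}
	return todo;
}

/* True if every counted task froze before the pass limit (the timeout). */
static bool sim_try_to_freeze_tasks(struct sim_task *tasks, size_t n,
				    int max_passes)
{
	for (int pass = 0; pass < max_passes; pass++)
		if (sim_freeze_pass(tasks, n) == 0)
			return true;
	return false;
}
```

With an unannotated writer stuck in D state the freeze fails. Once the writer is annotated the freeze succeeds, but the writer still carries a pending freeze request.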

* Re: freeze vs freezer
From: Jeremy Fitzhardinge @ 2007-11-26 18:44 UTC
To: Rafael J. Wysocki; +Cc: David Chinner, xfs-masters, Linux Kernel Mailing List

Rafael J. Wysocki wrote:
> On Thursday, 22 of November 2007, Jeremy Fitzhardinge wrote:
>
>> It seems that a process blocked in a write to an xfs filesystem due to
>> xfs_freeze cannot be frozen by the freezer.
>>
>
> The freezer doesn't handle tasks in TASK_UNINTERRUPTIBLE and I don't know how
> to make it handle them without at least partially defeating its purpose.
>

Well, I guess the question is whether an xfs-frozen writer really needs
to be UNINTERRUPTIBLE from the freezer's perspective (clearly it does
from usermode's perspective - filesystem writes just don't return EINTR).

From a quick poke around, it looks to me like freezing is actually
implemented in the VFS layer rather than in XFS itself: is that right?
Could vfs_check_frozen() be changed to something that is freezer-compatible?

>> I see this if I suspend my laptop while doing something xfs-filesystem
>> intensive, like a kernel build. My suspend scripts freeze the XFS
>> filesystem (as Dave said I should), which presumably blocks some writer,
>> and then the freezer times out and fails to complete.
>>
>> Here's part of the process dump the freezer does when it times out:
>>
>> [process dump and call trace snipped]
>>
>> I haven't looked at how to fix this yet. I only just worked out why I
>> was getting suspend failures.
>>
>
> Well, you can add freezer_do_not_count()/freezer_count() annotations to
> xfs_write() (and whatever else is blocked as a result of the XFS being frozen).
>

What would be the implications of that? Would that just prevent
freezing while there's something blocked there?

> Generally, that would be risky without the freezing of XFS, however, because it
> might leak filesystem data to a storage device after the hibernation image has
> been created, which would result in filesystem corruption after the resume.
>
> Still, if you only suspend to RAM, that should be safe.
>

I specifically added it because I was getting data loss due to crashes
during suspend/resume problems. It's been pretty stable lately, but I
may as well remove the xfs_freeze from my suspend scripts if this is the
solution.

I think the broader issue is that there's no reason in principle why
something blocked due to xfs-freezing (or vfs freezing) should prevent
the freezer from completing.

J

* Re: freeze vs freezer
From: Rafael J. Wysocki @ 2007-11-26 21:20 UTC
To: Jeremy Fitzhardinge; +Cc: David Chinner, xfs-masters, Linux Kernel Mailing List

On Monday, 26 of November 2007, Jeremy Fitzhardinge wrote:
> Rafael J. Wysocki wrote:
> > On Thursday, 22 of November 2007, Jeremy Fitzhardinge wrote:
> >
> >> It seems that a process blocked in a write to an xfs filesystem due to
> >> xfs_freeze cannot be frozen by the freezer.
> >>
> >
> > The freezer doesn't handle tasks in TASK_UNINTERRUPTIBLE and I don't know how
> > to make it handle them without at least partially defeating its purpose.
> >
>
> Well, I guess the question is whether an xfs-frozen writer really needs
> to be UNINTERRUPTIBLE from the freezer's perspective (clearly it does
> from usermode's perspective - filesystem writes just don't return EINTR).
>
> From a quick poke around, it looks to me like freezing is actually
> implemented in the VFS layer rather than in XFS itself: is that right?

I don't know the details.

> Could vfs_check_frozen() be changed to something that is freezer-compatible?

That seems doable in principle. I'll have a closer look at it.

> >> I see this if I suspend my laptop while doing something xfs-filesystem
> >> intensive, like a kernel build. My suspend scripts freeze the XFS
> >> filesystem (as Dave said I should), which presumably blocks some writer,
> >> and then the freezer times out and fails to complete.
> >>
> >> Here's part of the process dump the freezer does when it times out:
> >>
> >> [process dump and call trace snipped]
> >>
> >> I haven't looked at how to fix this yet. I only just worked out why I
> >> was getting suspend failures.
> >>
> >
> > Well, you can add freezer_do_not_count()/freezer_count() annotations to
> > xfs_write() (and whatever else is blocked as a result of the XFS being frozen).
> >
>
> What would be the implications of that? Would that just prevent
> freezing while there's something blocked there?

The freezer will not wait for this particular task. Still, the task will have
TIF_FREEZE set, so it will freeze as soon as freezer_count() is reached, unless
the thawing of tasks is carried out first.

If used in the right place, it's reasonably safe, but we need to know what the
right place is. [That's how we handle vfork(), BTW.]

> > Generally, that would be risky without the freezing of XFS, however, because it
> > might leak filesystem data to a storage device after the hibernation image has
> > been created, which would result in filesystem corruption after the resume.
> >
> > Still, if you only suspend to RAM, that should be safe.
> >
>
> I specifically added it because I was getting data loss due to crashes
> during suspend/resume problems. It's been pretty stable lately, but I
> may as well remove the xfs_freeze from my suspend scripts if this is the
> solution.

Not exactly. :-)

> I think the broader issue is that there's no reason in principle why
> something blocked due to xfs-freezing (or vfs freezing) should prevent
> the freezer from completing.

Agreed, but the only way to tell the freezer "don't wait for this task", if
the task in question is in TASK_UNINTERRUPTIBLE, is to annotate it.

Greetings,
Rafael
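Rafael's description of the annotation semantics, combined with Jeremy's idea of making the VFS freeze wait freezer-compatible, can be sketched as a single choke point that every blocked modifier passes through. This is a hypothetical userspace model, not the kernel's vfs_check_frozen(): the freeze levels, sim_wait_unfrozen() and the demo callbacks are invented names. It shows the two orderings Rafael mentions. If the filesystem is thawed while a system freeze is still pending, the task freezes at the freezer_count() point. If the tasks are thawed first (the usual suspend/resume order in Jeremy's scripts), the request is cleared and the task simply proceeds. Whether this is safe in real code depends on what locks the task holds at that point, which is Rafael's "right place" caveat.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical freeze levels: 0 = unfrozen, higher = more frozen. */
enum sim_level { SIM_UNFROZEN = 0, SIM_FREEZE_WRITE = 1, SIM_FREEZE_TRANS = 2 };

struct sim_sb {
	int frozen;			/* current freeze level */
};

struct sim_task {
	bool freeze_pending;		/* stands in for TIF_FREEZE */
	bool skip;			/* stands in for PF_FREEZER_SKIP */
	bool was_frozen;		/* passed through the refrigerator */
};

static void sim_freezer_do_not_count(struct sim_task *t)
{
	t->skip = true;
}

/* Clear the annotation and freeze at once if a request is pending.
 * Real code would sleep in the refrigerator until thawed; here we
 * just record that it happened and consume the request. */
static void sim_freezer_count(struct sim_task *t)
{
	t->skip = false;
	if (t->freeze_pending) {
		t->was_frozen = true;
		t->freeze_pending = false;
	}
}

/* One uninterruptible sleep; the callback models what happens meanwhile. */
typedef void (*sim_sleep_fn)(struct sim_sb *sb, struct sim_task *t);

/* Freezer-friendly "wait until sb is below this freeze level". */
static void sim_wait_unfrozen(struct sim_sb *sb, int level,
			      struct sim_task *t, sim_sleep_fn sleep_once)
{
	if (sb->frozen < level)
		return;			/* fast path: freezer not involved */
	sim_freezer_do_not_count(t);	/* the freezer may skip us now */
	while (sb->frozen >= level)
		sleep_once(sb, t);
	sim_freezer_count(t);		/* freeze here if still requested */
}

/* Demo: the filesystem is thawed while a system freeze is in progress. */
static void sim_thaw_fs_during_suspend(struct sim_sb *sb, struct sim_task *t)
{
	t->freeze_pending = true;	/* freezer marks us but does not wait */
	sb->frozen = SIM_UNFROZEN;
}

/* Demo: a whole suspend/resume cycle completes first, then the fs thaws. */
static void sim_resume_then_thaw_fs(struct sim_sb *sb, struct sim_task *t)
{
	t->freeze_pending = true;	/* freezer marks us but does not wait */
	t->freeze_pending = false;	/* resume: tasks thawed, request cleared */
	sb->frozen = SIM_UNFROZEN;	/* only now is the filesystem thawed */
}
```

Putting the annotation in one shared wait helper, rather than in xfs_write() alone, is one way to address Dave's point that every modifying operation blocks the same way once the filesystem is frozen.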

* Re: freeze vs freezer
From: David Chinner @ 2007-11-26 21:17 UTC
To: Rafael J. Wysocki
Cc: Jeremy Fitzhardinge, David Chinner, xfs-masters, Linux Kernel Mailing List

On Sat, Nov 24, 2007 at 12:47:21AM +0100, Rafael J. Wysocki wrote:
> On Thursday, 22 of November 2007, Jeremy Fitzhardinge wrote:
> > It seems that a process blocked in a write to an xfs filesystem due to
> > xfs_freeze cannot be frozen by the freezer.
>
> The freezer doesn't handle tasks in TASK_UNINTERRUPTIBLE and I don't know how
> to make it handle them without at least partially defeating its purpose.

So how do you handle threads that are blocked on I/O or a lock during
the system freeze process, then?

> > I see this if I suspend my laptop while doing something xfs-filesystem
> > intensive, like a kernel build. My suspend scripts freeze the XFS
> > filesystem (as Dave said I should), which presumably blocks some writer,
> > and then the freezer times out and fails to complete.
> >
> > Here's part of the process dump the freezer does when it times out:
> >
> > [process dump and call trace snipped]
> >
> > I haven't looked at how to fix this yet. I only just worked out why I
> > was getting suspend failures.
>
> Well, you can add freezer_do_not_count()/freezer_count() annotations to
> xfs_write() (and whatever else is blocked as a result of the XFS being frozen).

May as well annotate the whole VFS, then, because once the transaction
subsystem is frozen any operation that modifies the filesystem will get
blocked like this.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: freeze vs freezer
From: Rafael J. Wysocki @ 2007-11-26 21:53 UTC
To: David Chinner; +Cc: Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

On Monday, 26 of November 2007, David Chinner wrote:
> On Sat, Nov 24, 2007 at 12:47:21AM +0100, Rafael J. Wysocki wrote:
> > On Thursday, 22 of November 2007, Jeremy Fitzhardinge wrote:
> > > It seems that a process blocked in a write to an xfs filesystem due to
> > > xfs_freeze cannot be frozen by the freezer.
> >
> > The freezer doesn't handle tasks in TASK_UNINTERRUPTIBLE and I don't know how
> > to make it handle them without at least partially defeating its purpose.
>
> So how do you handle threads that are blocked on I/O or a lock during
> the system freeze process, then?

We wait until they can continue.

> > > I see this if I suspend my laptop while doing something xfs-filesystem
> > > intensive, like a kernel build. My suspend scripts freeze the XFS
> > > filesystem (as Dave said I should), which presumably blocks some writer,
> > > and then the freezer times out and fails to complete.
> > >
> > > Here's part of the process dump the freezer does when it times out:
> > >
> > > [process dump and call trace snipped]
> > >
> > > I haven't looked at how to fix this yet. I only just worked out why I
> > > was getting suspend failures.
> >
> > Well, you can add freezer_do_not_count()/freezer_count() annotations to
> > xfs_write() (and whatever else is blocked as a result of the XFS being frozen).
>
> May as well annotate the whole VFS, then, because once the transaction
> subsystem is frozen any operation that modifies the filesystem will get
> blocked like this.

Well, I don't know how this mechanism actually works, so I can't comment.

Is there a mutex on which tasks block if the filesystem is frozen?

Greetings,
Rafael

* Re: freeze vs freezer
From: Matthew Garrett @ 2007-11-27 5:38 UTC
To: Rafael J. Wysocki
Cc: David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

On Mon, Nov 26, 2007 at 10:53:34PM +0100, Rafael J. Wysocki wrote:
> On Monday, 26 of November 2007, David Chinner wrote:
> > So how do you handle threads that are blocked on I/O or a lock during
> > the system freeze process, then?
>
> We wait until they can continue.

So if I have a process blocked on an unavailable NFS mount, I can't
suspend?

--
Matthew Garrett | mjg59@srcf.ucam.org

* Re: freeze vs freezer
From: Rafael J. Wysocki @ 2007-11-27 17:40 UTC
To: Matthew Garrett
Cc: David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

On Tuesday, 27 of November 2007, Matthew Garrett wrote:
> On Mon, Nov 26, 2007 at 10:53:34PM +0100, Rafael J. Wysocki wrote:
> > On Monday, 26 of November 2007, David Chinner wrote:
> > > So how do you handle threads that are blocked on I/O or a lock during
> > > the system freeze process, then?
> >
> > We wait until they can continue.
>
> So if I have a process blocked on an unavailable NFS mount, I can't
> suspend?

That's correct, you can't.

[And I know what you're going to say. ;-)]

Greetings,
Rafael

* Re: freeze vs freezer
From: Kyle Moffett @ 2007-11-27 20:33 UTC
To: Rafael J. Wysocki
Cc: Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

On Nov 27, 2007, at 12:40:24, Rafael J. Wysocki wrote:
> On Tuesday, 27 of November 2007, Matthew Garrett wrote:
>> On Mon, Nov 26, 2007 at 10:53:34PM +0100, Rafael J. Wysocki wrote:
>>> On Monday, 26 of November 2007, David Chinner wrote:
>>>> So how do you handle threads that are blocked on I/O or a lock
>>>> during the system freeze process, then?
>>>
>>> We wait until they can continue.
>>
>> So if I have a process blocked on an unavailable NFS mount, I can't
>> suspend?
>
> That's correct, you can't.
>
> [And I know what you're going to say. ;-)]

Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE"
instead of a zero preempt_count()? Really what we should do is just
iterate over all of the actual physical devices and tell each one
"Block new IO requests preemptibly, finish pending DMA, put the
hardware in low-power mode, and prepare for suspend/hibernate". As
long as each driver knows how to do those simple things we can have
an entirely consistent kernel image for both suspend and for
hibernation.

When all tasks are preemptible we can very trivially rely on the
drivers to enforce the "Stop new IO submission" with a dirt-simple
semaphore or waitqueue. The sleep itself will be
TASK_UNINTERRUPTIBLE, but it will be done from a preemptible context.

That way the system suspend time is the sum of the suspend times of
the devices on the system, and the suspend time of any given device
is the sum of its maximum non-preemptible critical section and the
time to flush all of its remaining pending DMA/etc. This is almost
completely independent of the load-level of the machine, and it does
not depend on things like NFS filesystems. The one gotcha is that it
does not flush dirty filesystem pages to disk first, although that
could be fixed with a few VFS and blockdev hooks which hierarchically
flush and "freeze" block devices and filesystems before actually
disabling devices, much the way that device-mapper can pause a device
to take a snapshot and end up with a clean journal on the filesystem
afterwards.

Cheers,
Kyle Moffett
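Kyle's proposal and his suspend-time claim can be made concrete with a small model. This is a hypothetical sketch, not a driver API: struct sim_dev and the sim_* functions are invented, and it assumes that flushing pending DMA costs a fixed time per in-flight request. Each device closes a submission gate (new submitters would then sleep preemptibly on a waitqueue), drains what is in flight, and reports its time. The system total is the sum over devices and does not depend on how many tasks are blocked elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state for the scheme described above. */
struct sim_dev {
	bool gate_closed;		/* new I/O submissions must wait */
	unsigned pending_dma;		/* requests already in flight */
	unsigned max_critical_us;	/* longest non-preemptible section */
	unsigned flush_us_per_req;	/* assumed cost to finish one request */
};

/* Submission path: false means the caller would have to sleep
 * (preemptibly) on the device's waitqueue until resume. */
static bool sim_submit_io(struct sim_dev *d)
{
	if (d->gate_closed)
		return false;
	d->pending_dma++;
	return true;
}

/* Suspend one device: close the gate, drain in-flight DMA, and return
 * the modelled time: critical section plus flushing what was pending. */
static unsigned sim_suspend_dev(struct sim_dev *d)
{
	unsigned t;

	d->gate_closed = true;
	t = d->max_critical_us + d->pending_dma * d->flush_us_per_req;
	d->pending_dma = 0;
	return t;
}

/* System suspend time: the sum of the per-device suspend times. */
static unsigned sim_suspend_all(struct sim_dev *devs, size_t n)
{
	unsigned total = 0;

	for (size_t i = 0; i < n; i++)
		total += sim_suspend_dev(&devs[i]);
	return total;
}
```

For example, two requests in flight on a device with a 10 us critical section and 5 us per request, plus one request on a device with 20 us and 1 us, give 10 + 2*5 + 20 + 1 = 41 us. After suspend, further submissions are refused.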

* Re: freeze vs freezer
From: Rafael J. Wysocki @ 2007-11-27 23:01 UTC
To: Kyle Moffett
Cc: Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List, Len Brown

On Tuesday, 27 of November 2007, Kyle Moffett wrote:
> On Nov 27, 2007, at 12:40:24, Rafael J. Wysocki wrote:
> > On Tuesday, 27 of November 2007, Matthew Garrett wrote:
> >> On Mon, Nov 26, 2007 at 10:53:34PM +0100, Rafael J. Wysocki wrote:
> >>> On Monday, 26 of November 2007, David Chinner wrote:
> >>>> So how do you handle threads that are blocked on I/O or a lock
> >>>> during the system freeze process, then?
> >>>
> >>> We wait until they can continue.
> >>
> >> So if I have a process blocked on an unavailable NFS mount, I can't
> >> suspend?
> >
> > That's correct, you can't.
> >
> > [And I know what you're going to say. ;-)]
>
> Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE"
> instead of a zero preempt_count()? Really what we should do is just
> iterate over all of the actual physical devices and tell each one
> "Block new IO requests preemptibly, finish pending DMA, put the
> hardware in low-power mode, and prepare for suspend/hibernate". As
> long as each driver knows how to do those simple things we can have
> an entirely consistent kernel image for both suspend and for
> hibernation.

Well, this is more-or-less how we all imagine that should be done eventually.

The main problem is how to implement it without causing too much breakage.
Also, there are some dirty details that need to be taken into consideration.

> When all tasks are preemptible we can very trivially rely on the
> drivers to enforce the "Stop new IO submission" with a dirt-simple
> semaphore or waitqueue. The sleep itself will be
> TASK_UNINTERRUPTIBLE, but it will be done from a preemptible context.

If there are any drivers that make their devices available via mmap(), that
won't be sufficient. We'll probably need two iterations over devices to
handle all corner cases.

Moreover, for hibernation we need to resume at least some devices in order to
save the image, which shouldn't result in unblocking the waiting tasks.

> That way the system suspend time is the sum of the suspend times of
> the devices on the system, and the suspend time of any given device
> is the sum of its maximum non-preemptible critical section and the
> time to flush all of its remaining pending DMA/etc. This is almost
> completely independent of the load-level of the machine, and it does
> not depend on things like NFS filesystems. The one gotcha is that it
> does not flush dirty filesystem pages to disk first, although that
> could be fixed with a few VFS and blockdev hooks which hierarchically
> flush and "freeze" block devices and filesystems before actually
> disabling devices, much the way that device-mapper can pause a device
> to take a snapshot and end up with a clean journal on the filesystem
> afterwards.

Yes, I generally agree.

Greetings,
Rafael

* Re: freeze vs freezer
From: Jeremy Fitzhardinge @ 2007-11-27 22:49 UTC
To: Rafael J. Wysocki
Cc: Kyle Moffett, Matthew Garrett, David Chinner, xfs-masters, Linux Kernel Mailing List, Len Brown

Rafael J. Wysocki wrote:
> Well, this is more-or-less how we all imagine that should be done eventually.
>
> The main problem is how to implement it without causing too much breakage.
> Also, there are some dirty details that need to be taken into consideration.
>

For Xen suspend/resume, I'd like to use the freezer to get all threads
into a known consistent state (where, specifically, they don't have any
outstanding pagetable updates pending). In other words, the freezer as
it currently stands is what I want, modulo some of these issues where it
gets caught up unexpectedly. If threads end up getting frozen anywhere
preempt isn't explicitly disabled, it wouldn't work for me.

J

* Re: freeze vs freezer
From: Kyle Moffett @ 2007-11-27 23:14 UTC
To: Jeremy Fitzhardinge
Cc: Rafael J. Wysocki, Matthew Garrett, David Chinner, xfs-masters, Linux Kernel Mailing List, Len Brown

On Nov 27, 2007, at 17:49:18, Jeremy Fitzhardinge wrote:
> Rafael J. Wysocki wrote:
>> Well, this is more-or-less how we all imagine that should be done
>> eventually.
>>
>> The main problem is how to implement it without causing too much
>> breakage. Also, there are some dirty details that need to be
>> taken into consideration.
>
> For Xen suspend/resume, I'd like to use the freezer to get all
> threads into a known consistent state (where, specifically, they
> don't have any outstanding pagetable updates pending). In other
> words, the freezer as it currently stands is what I want, modulo
> some of these issues where it gets caught up unexpectedly. If
> threads end up getting frozen anywhere preempt isn't explicitly
> disabled, it wouldn't work for me.

The problem with "one freezer" is that "known consistent state" means
something completely different to every single driver and subsystem.
Xen wants it to mean "No pending page table updates and no more
updates from this point forward". A network driver wants it to mean
"All pending network packets DMAed out or in and the device shut down
with all remaining packets queued". A SATA controller wants it to mean
"All DMA quiesced and no more commands", etc.

The only way to have that work is to put minimal definitions of what
state you care about in the drivers themselves. For Xen this means
that you need to have an appropriately-timed suspend handler which
hooks into Xen code very precisely to create and preserve the "No
pending page table updates" state that you care about. It will be
more work in the short term but it's the only maintainable solution
in the long term IMO.

Cheers,
Kyle Moffett

* Re: freeze vs freezer
From: Jeremy Fitzhardinge @ 2007-11-27 23:32 UTC
To: Kyle Moffett
Cc: Rafael J. Wysocki, Matthew Garrett, David Chinner, xfs-masters, Linux Kernel Mailing List, Len Brown

Kyle Moffett wrote:
> On Nov 27, 2007, at 17:49:18, Jeremy Fitzhardinge wrote:
>> Rafael J. Wysocki wrote:
>>> Well, this is more-or-less how we all imagine that should be done
>>> eventually.
>>>
>>> The main problem is how to implement it without causing too much
>>> breakage. Also, there are some dirty details that need to be taken
>>> into consideration.
>>
>> For Xen suspend/resume, I'd like to use the freezer to get all
>> threads into a known consistent state (where, specifically, they
>> don't have any outstanding pagetable updates pending). In other
>> words, the freezer as it currently stands is what I want, modulo some
>> of these issues where it gets caught up unexpectedly. If threads end
>> up getting frozen anywhere preempt isn't explicitly disabled, it
>> wouldn't work for me.
>
> The problem with "one freezer" is that "known consistent state" means
> something completely different to every single driver and subsystem.

Not really. The freezer puts tasks into a particular well-understood
state: they're either in usermode, or in the kernel in the refrigerator.
And since the places which call into the refrigerator are explicit in
the source, and not terribly numerous, it's easy to audit exactly what
the state is at each call.

> Xen wants it to mean "No pending page table updates and no more
> updates from this point forward". A network driver wants it to mean
> "All pending network packets DMAed out or in and the device shut down
> with all remaining packets queued". A SATA controller wants it to mean
> "All DMA quiesced and no more commands", etc.

Well, those are somewhat different. The existing suspend/resume driver
callbacks are sufficient for a device to be in that state. What I want
for Xen is more global: I just want to make sure tasks are not preempted
in the middle of a state which can't be suspended. The specific details
of the state I want are moderately complex, but short-lived. The problem
with other mechanisms - like stop_machine - is that they can leave
threads preempted in one of the states I can't handle, whereas the
freezer is more deterministic.

> The only way to have that work is to put minimal definitions of what
> state you care about in the drivers themselves. For Xen this means
> that you need to have an appropriately-timed suspend handler which
> hooks into Xen code very precisely to create and preserve the "No
> pending page table updates" state that you care about. It will be
> more work in the short term but it's the only maintainable solution in
> the long term IMO.

No, that doesn't really work. Aside from scattering hooks everywhere
there are pagetable updates, there's no real existing place to hook into.
While I could put those hooks in, they would amount to changing the
kernel-internal pagetable update interface for everyone to deal with a
corner case of a fairly obscure user - I don't think it's a good tradeoff.

The freezer is nice because the state it puts each task into is
well-defined, and is well-suited for Xen's use. In fact, I would agree
with you that the use I want to put the freezer to suits it better than
its current use in suspend/resume.

J
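Jeremy's argument is that the freezer stops a task only at explicit, auditable points, while a mechanism like stop_machine can catch it between any two steps. The toy model below illustrates that argument. The step names, the program and the "pending updates" counter are all hypothetical; real paravirtualised page-table batching is considerably more involved. A stop at either freeze point always sees zero queued updates, but a stop in the middle of the batch can see queued updates that have not yet been handed over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical steps in one task's work loop. */
enum sim_step { SIM_BATCH_BEGIN, SIM_QUEUE_UPDATE, SIM_BATCH_FLUSH, SIM_FREEZE_POINT };

static const enum sim_step sim_program[] = {
	SIM_FREEZE_POINT,
	SIM_BATCH_BEGIN, SIM_QUEUE_UPDATE, SIM_QUEUE_UPDATE, SIM_BATCH_FLUSH,
	SIM_FREEZE_POINT,
};
#define SIM_PROGRAM_LEN (sizeof(sim_program) / sizeof(sim_program[0]))

/* Run the program up to (not including) step 'stop' and return how many
 * page-table updates are queued but not yet flushed at that moment. */
static unsigned sim_pending_at(size_t stop)
{
	unsigned pending = 0;

	for (size_t i = 0; i < stop && i < SIM_PROGRAM_LEN; i++) {
		switch (sim_program[i]) {
		case SIM_QUEUE_UPDATE:
			pending++;
			break;
		case SIM_BATCH_FLUSH:
			pending = 0;
			break;
		default:
			break;
		}
	}
	return pending;
}

/* Freezer model: the task can only stop at an explicit freeze point. */
static bool sim_freezer_stop_is_safe(void)
{
	for (size_t i = 0; i < SIM_PROGRAM_LEN; i++)
		if (sim_program[i] == SIM_FREEZE_POINT && sim_pending_at(i) != 0)
			return false;
	return true;
}

/* Preemption model (e.g. stop_machine): the task may stop before any step. */
static bool sim_preempt_stop_is_safe(void)
{
	for (size_t i = 0; i <= SIM_PROGRAM_LEN; i++)
		if (sim_pending_at(i) != 0)
			return false;
	return true;
}
```

Auditing "safe" here means checking only the freeze points. That set stays small as long as, in Jeremy's words, the refrigerator call sites are "not terribly numerous".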

* Re: freeze vs freezer
From: Pavel Machek @ 2008-01-02 16:02 UTC
To: Kyle Moffett
Cc: Rafael J. Wysocki, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

Hi!

> >>>>So how do you handle threads that are blocked on I/O or a lock
> >>>>during the system freeze process, then?
> >>>
> >>>We wait until they can continue.
> >>
> >>So if I have a process blocked on an unavailable NFS mount, I can't
> >>suspend?
> >
> >That's correct, you can't.
> >
> >[And I know what you're going to say. ;-)]
>
> Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE"
> instead of a zero preempt_count()? Really what we should do is just
> iterate over all of the actual physical devices and tell each one
> "Block new IO requests preemptibly, finish pending DMA, put the
> hardware in low-power mode, and prepare for suspend/hibernate". As
> long as each driver knows how to do those simple things we can have
> an entirely consistent kernel image for both suspend and for
> hibernation.

"Each driver" means this is a lot of work. But yes, that is probably
the way to go, and a patch would be welcome.

--
(english) http://www.livejournal.com/~pavelmachek
(in Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: freeze vs freezer
From: Nigel Cunningham @ 2008-01-02 21:30 UTC
To: Pavel Machek
Cc: Kyle Moffett, Rafael J. Wysocki, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

Hi.

Pavel Machek wrote:
> Hi!
>
>>>>>> So how do you handle threads that are blocked on I/O or a lock
>>>>>> during the system freeze process, then?
>>>>> We wait until they can continue.
>>>> So if I have a process blocked on an unavailable NFS mount, I can't
>>>> suspend?
>>> That's correct, you can't.
>>>
>>> [And I know what you're going to say. ;-)]
>> Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE"
>> instead of a zero preempt_count()? Really what we should do is just
>> iterate over all of the actual physical devices and tell each one
>> "Block new IO requests preemptibly, finish pending DMA, put the
>> hardware in low-power mode, and prepare for suspend/hibernate". As
>> long as each driver knows how to do those simple things we can have
>> an entirely consistent kernel image for both suspend and for
>> hibernation.
>
> "Each driver" means this is a lot of work. But yes, that is probably
> the way to go, and a patch would be welcome.

Yes, that does work. It's what I've done in my (preliminary) support for
fuse.

Regards,

Nigel
* Re: freeze vs freezer 2008-01-02 21:30 ` Nigel Cunningham @ 2008-01-02 22:04 ` Rafael J. Wysocki 2008-01-03 9:19 ` Nigel Cunningham 0 siblings, 1 reply; 50+ messages in thread From: Rafael J. Wysocki @ 2008-01-02 22:04 UTC (permalink / raw) To: nigel Cc: Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List On Wednesday, 2 of January 2008, Nigel Cunningham wrote: > Hi. > > Pavel Machek wrote: > > Hi! > > > >>>>>> So how do you handle threads that are blocked on I/O or a lock > >>>>>> during the system freeze process, then? > >>>>> We wait until they can continue. > >>>> So if I have a process blocked on an unavilable NFS mount, I can't > >>>> suspend? > >>> That's correct, you can't. > >>> > >>> [And I know what you're going to say. ;-)] > >> Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE" > >> instead of a zero preempt_count()? Really what we should do is just > >> iterate over all of the actual physical devices and tell each one > >> "Block new IO requests preemptably, finish pending DMA, put the > >> hardware in low-power mode, and prepare for suspend/hibernate". As > >> long as each driver knows how to do those simple things we can have > >> an entirely consistent kernel image for both suspend and for > >> hibernation. > > > > "each driver" means this is a lot of work. But yes, that is probably > > way to go, and patch would be welcome. > > Yes, that does work. It's what I've done in my (preliminary) support for > fuse. Hmm, can you please elaborate a bit? Rafael ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer
2008-01-02 22:04 ` Rafael J. Wysocki
@ 2008-01-03 9:19 ` Nigel Cunningham
2008-01-03 9:47 ` Oliver Neukum
2008-01-03 22:31 ` Rafael J. Wysocki
0 siblings, 2 replies; 50+ messages in thread
From: Nigel Cunningham @ 2008-01-03 9:19 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

Hi.

Rafael J. Wysocki wrote:
> On Wednesday, 2 of January 2008, Nigel Cunningham wrote:
>> Pavel Machek wrote:
>>>>>>>> So how do you handle threads that are blocked on I/O or a lock
>>>>>>>> during the system freeze process, then?
>>>>>>> We wait until they can continue.
>>>>>> So if I have a process blocked on an unavilable NFS mount, I can't
>>>>>> suspend?
>>>>> That's correct, you can't.
>>>>>
>>>>> [And I know what you're going to say. ;-)]
>>>> Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE"
>>>> instead of a zero preempt_count()? Really what we should do is just
>>>> iterate over all of the actual physical devices and tell each one
>>>> "Block new IO requests preemptably, finish pending DMA, put the
>>>> hardware in low-power mode, and prepare for suspend/hibernate". As
>>>> long as each driver knows how to do those simple things we can have
>>>> an entirely consistent kernel image for both suspend and for
>>>> hibernation.
>>> "each driver" means this is a lot of work. But yes, that is probably
>>> way to go, and patch would be welcome.
>> Yes, that does work. It's what I've done in my (preliminary) support for
>> fuse.
>
> Hmm, can you please elaborate a bit?

Sorry, I wasn't very clear, was I? And I'm not sure now whether you mean "How does fuse support relate to freezing block devices?" or "What's this about fuse support?". Let me therefore try to answer both questions.

It's at a higher level, I know (filesystems rather than block devices), but I meant that the general concept of blocking new requests and completing existing ones worked fine for the supposedly impossible fuse support.

Re fuse support, let me start by saying "I know this doesn't handle all situations, but I think it's a good enough proof-of-concept implementation".

I added some simple hooks to the code that submits new work to fuse threads:

#define FUSE_MIGHT_FREEZE(superblock, desc) \
do { \
	int printed = 0; \
	while ((superblock)->s_frozen != SB_UNFROZEN) { \
		if (!printed) { \
			printk("%d frozen in " desc ".\n", current->pid); \
			printed = 1; \
		} \
		try_to_freeze(); \
		yield(); \
	} \
} while (0)

On top of this, I made a (too simple at the moment) freeze_filesystems function which iterates through &super_blocks in reverse order, freezing fuse filesystems or ordinary ones. I say 'too simple' because it doesn't currently allow for the possibility of someone mounting (say) ext3 on fuse, but that would just be an extension of what's already done.

The end result is:

int freeze_processes(void)
{
	int error;

	printk(KERN_INFO "Stopping fuse filesystems.\n");
	freeze_filesystems(FS_FREEZER_FUSE);
	freezer_state = FREEZER_FILESYSTEMS_FROZEN;
	printk(KERN_INFO "Freezing user space processes ... ");
	error = try_to_freeze_tasks(FREEZER_USER_SPACE);
	if (error)
		goto Exit;
	printk(KERN_INFO "done.\n");

	sys_sync();
	printk(KERN_INFO "Stopping normal filesystems.\n");
	freeze_filesystems(FS_FREEZER_NORMAL);
	freezer_state = FREEZER_USERSPACE_FROZEN;
	printk(KERN_INFO "Freezing remaining freezable tasks ... ");
	error = try_to_freeze_tasks(FREEZER_KERNEL_THREADS);
	if (error)
		goto Exit;
	printk(KERN_INFO "done.");
	freezer_state = FREEZER_FULLY_ON;
 Exit:
	BUG_ON(in_atomic());
	printk("\n");
	return error;
}

Sorry if that's more info than you wanted.

Nigel

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-01-03 9:19 ` Nigel Cunningham @ 2008-01-03 9:47 ` Oliver Neukum 2008-01-03 9:52 ` Nigel Cunningham 2008-01-03 22:31 ` Rafael J. Wysocki 1 sibling, 1 reply; 50+ messages in thread From: Oliver Neukum @ 2008-01-03 9:47 UTC (permalink / raw) To: nigel Cc: Rafael J. Wysocki, Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List On Thursday 03 January 2008, Nigel Cunningham wrote: > On top of this, I made a (too simple at the moment) freeze_filesystems > function which iterates through &super_blocks in reverse order, freezing > fuse filesystems or ordinary ones. I say 'too simple' because it doesn't > currently allow for the possibility of someone mounting (say) ext3 on > fuse, but that would just be an extension of what's already done. How do you deal with fuse server tasks using other fuse filesystems? How does freeze_filesystems() look? Regards Oliver ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer
2008-01-03 9:47 ` Oliver Neukum
@ 2008-01-03 9:52 ` Nigel Cunningham
2008-01-03 11:15 ` Oliver Neukum
0 siblings, 1 reply; 50+ messages in thread
From: Nigel Cunningham @ 2008-01-03 9:52 UTC (permalink / raw)
To: Oliver Neukum
Cc: Rafael J. Wysocki, Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

Hi.

Oliver Neukum wrote:
> On Thursday 03 January 2008, Nigel Cunningham wrote:
>> On top of this, I made a (too simple at the moment) freeze_filesystems
>> function which iterates through &super_blocks in reverse order, freezing
>> fuse filesystems or ordinary ones. I say 'too simple' because it doesn't
>> currently allow for the possibility of someone mounting (say) ext3 on
>> fuse, but that would just be an extension of what's already done.
>
> How do you deal with fuse server tasks using other fuse filesystems?

Since they're frozen in reverse order, the dependent one would be frozen first.

> How does freeze_filesystems() look?

Removing my ugly debugging statements, it's currently:

/**
 * freeze_filesystems - lock all filesystems and force them into a consistent
 * state
 */
void freeze_filesystems(int which)
{
	struct super_block *sb;

	lockdep_off();

	/*
	 * Freeze in reverse order so filesystems dependent upon others are
	 * frozen in the right order (e.g. loopback on ext3).
	 */
	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
		if ((sb->s_type->fs_flags & FS_IS_FUSE) &&
		    sb->s_frozen == SB_UNFROZEN &&
		    (which & FS_FREEZER_FUSE)) {
			sb->s_frozen = SB_FREEZE_TRANS;
			sb->s_flags |= MS_FROZEN;
			continue;
		}

		if (!sb->s_root || !sb->s_bdev ||
		    (sb->s_frozen == SB_FREEZE_TRANS) ||
		    (sb->s_flags & MS_RDONLY) ||
		    (sb->s_flags & MS_FROZEN) ||
		    !(which & FS_FREEZER_NORMAL))
			continue;

		freeze_bdev(sb->s_bdev);
		sb->s_flags |= MS_FROZEN;
	}

	lockdep_on();
}

Nigel

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer
2008-01-03 9:52 ` Nigel Cunningham
@ 2008-01-03 11:15 ` Oliver Neukum
2008-01-03 22:06 ` Nigel Cunningham
0 siblings, 1 reply; 50+ messages in thread
From: Oliver Neukum @ 2008-01-03 11:15 UTC (permalink / raw)
To: nigel
Cc: Rafael J. Wysocki, Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List

On Thursday, 3 January 2008 10:52:53, Nigel Cunningham wrote:
> Hi.
>
> Oliver Neukum wrote:
> > On Thursday 03 January 2008, Nigel Cunningham wrote:
> >> On top of this, I made a (too simple at the moment) freeze_filesystems
> >> function which iterates through &super_blocks in reverse order, freezing
> >> fuse filesystems or ordinary ones. I say 'too simple' because it doesn't
> >> currently allow for the possibility of someone mounting (say) ext3 on
> >> fuse, but that would just be an extension of what's already done.
> >
> > How do you deal with fuse server tasks using other fuse filesystems?
>
> Since they're frozen in reverse order, the dependant one would be frozen
> first.

Say I do:

a) mount fuse on /tmp/first
b) mount fuse on /tmp/second

Then the server task for (a) does "ls /tmp/second". So it will be frozen, right? How do you then freeze (a)? And keep in mind that the server task may have forked.

Regards
Oliver

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-01-03 11:15 ` Oliver Neukum @ 2008-01-03 22:06 ` Nigel Cunningham 2008-01-04 20:54 ` Oliver Neukum 0 siblings, 1 reply; 50+ messages in thread From: Nigel Cunningham @ 2008-01-03 22:06 UTC (permalink / raw) To: Oliver Neukum Cc: Rafael J. Wysocki, Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List Hi. Oliver Neukum wrote: > On Thursday, 3 January 2008 10:52:53, Nigel Cunningham wrote: >> Hi. >> >> Oliver Neukum wrote: >>> On Thursday 03 January 2008, Nigel Cunningham wrote: >>>> On top of this, I made a (too simple at the moment) freeze_filesystems >>>> function which iterates through &super_blocks in reverse order, freezing >>>> fuse filesystems or ordinary ones. I say 'too simple' because it doesn't >>>> currently allow for the possibility of someone mounting (say) ext3 on >>>> fuse, but that would just be an extension of what's already done. >>> How do you deal with fuse server tasks using other fuse filesystems? >> Since they're frozen in reverse order, the dependant one would be frozen >> first. > > Say I do: > > a) mount fuse on /tmp/first > b) mount fuse on /tmp/second > > Then the server task for (a) does "ls /tmp/second". So it will be frozen, > right? How do you then freeze (a)? And keep in mind that the server task > may have forked. I guess I should first ask: is this a real-life problem or a hypothetical twisted web? I don't see why you would want to make two filesystems interdependent; it sounds like a way to create livelocks and deadlocks in normal use, before we even begin to think about hibernating. Regards, Nigel ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-01-03 22:06 ` Nigel Cunningham @ 2008-01-04 20:54 ` Oliver Neukum 2008-01-05 1:38 ` Kyle Moffett 2008-01-05 21:18 ` Pavel Machek 0 siblings, 2 replies; 50+ messages in thread From: Oliver Neukum @ 2008-01-04 20:54 UTC (permalink / raw) To: nigel Cc: Rafael J. Wysocki, Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List On Thursday, 3 January 2008 23:06:07, Nigel Cunningham wrote: > Hi. > > Oliver Neukum wrote: > > On Thursday, 3 January 2008 10:52:53, Nigel Cunningham wrote: > >> Hi. > >> > >> Oliver Neukum wrote: > >>> On Thursday 03 January 2008, Nigel Cunningham wrote: > >>>> On top of this, I made a (too simple at the moment) freeze_filesystems > >>>> function which iterates through &super_blocks in reverse order, freezing > >>>> fuse filesystems or ordinary ones. I say 'too simple' because it doesn't > >>>> currently allow for the possibility of someone mounting (say) ext3 on > >>>> fuse, but that would just be an extension of what's already done. > >>> How do you deal with fuse server tasks using other fuse filesystems? > >> Since they're frozen in reverse order, the dependant one would be frozen > >> first. > > > > Say I do: > > > > a) mount fuse on /tmp/first > > b) mount fuse on /tmp/second > > > > Then the server task for (a) does "ls /tmp/second". So it will be frozen, > > right? How do you then freeze (a)? And keep in mind that the server task > > may have forked. > > I guess I should first ask, is this a real life problem or a > hypothetical twisted web? I don't see why you would want to make two > filesystems interdependent - it sounds like the way to create livelock > and deadlocks in normal use, before we even begin to think about > hibernating. Good questions. I personally don't use fuse, but I do care about power management. The problem I see is that an unprivileged user could create that dependency, even inadvertently.
Regards Oliver ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-01-04 20:54 ` Oliver Neukum @ 2008-01-05 1:38 ` Kyle Moffett 2008-01-05 21:18 ` Pavel Machek 1 sibling, 0 replies; 50+ messages in thread From: Kyle Moffett @ 2008-01-05 1:38 UTC (permalink / raw) To: Oliver Neukum Cc: nigel, Rafael J. Wysocki, Pavel Machek, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List On Jan 04, 2008, at 15:54:06, Oliver Neukum wrote: > On Thursday, 3 January 2008 23:06:07, Nigel Cunningham wrote: >> Hi. >>> a) mount fuse on /tmp/first >>> b) mount fuse on /tmp/second >>> >>> Then the server task for (a) does "ls /tmp/second". So it will be >>> frozen, right? How do you then freeze (a)? And keep in mind that >>> the server task may have forked. >> >> I guess I should first ask, is this a real life problem or a >> hypothetical twisted web? I don't see why you would want to make >> two filesystems interdependent - it sounds like the way to create >> livelock and deadlocks in normal use, before we even begin to >> think about hibernating. > > Good questions. I personally don't use fuse, but I do care about > power management. The problem I see is that an unprivileged user > could make that dependency, even inadvertedly. I don't think it makes sense for the kernel to try to keep track of hard data dependencies for FUSE filesystems, or to even *attempt* to auto-suspend them. You should instead allow a privileged program to initiate a "freeze-and-flush" operation on a particular FUSE filesystem and optionally wait for it to finish. Then your userspace would be configured with the appropriate data dependencies and would stop FUSE filesystems in the appropriate order.
In addition, the kernel would automatically understand ext3=>loopback=>fuse, and when asked to freeze the "fuse" part, it would first freeze the "ext3" and the "loopback" parts, using mechanisms similar to those device-mapper currently uses when you do "dmsetup suspend mydev" followed by "echo 0 $SIZE snapshot /dev/mapper/mydev-base /dev/mapper/mydev-snap-back p 8 | dmsetup load mydev" (i.e. when you create a snapshot of a given device). Naturally, userspace could deadlock itself (although not the kernel) by freezing a block device and then attempting to access it, but since the "freeze" operation is limited to root this is not a big issue. The way to freeze all filesystems safely would be to clone a new mount namespace, mlockall(), mount a tmpfs, pivot_root() into the tmpfs, bind-mount the filesystems you want to freeze directly onto subdirectories of the tmpfs, and then freeze them in an appropriate order. Besides, the worst case is a fairly straightforward, non-critical failure: you might fail to fully sync a FUSE filesystem because its daemon is asleep waiting on something (possibly even just sitting in a "sleep(10000)" call with all signals masked). You simply need to make sure that all tasks are asleep outside of driver critical sections so that you can properly suspend your device tree. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-01-04 20:54 ` Oliver Neukum 2008-01-05 1:38 ` Kyle Moffett @ 2008-01-05 21:18 ` Pavel Machek 2008-01-05 23:01 ` Nigel Cunningham 1 sibling, 1 reply; 50+ messages in thread From: Pavel Machek @ 2008-01-05 21:18 UTC (permalink / raw) To: Oliver Neukum Cc: nigel, Rafael J. Wysocki, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List On Fri 2008-01-04 21:54:06, Oliver Neukum wrote: > On Thursday, 3 January 2008 23:06:07, Nigel Cunningham wrote: > > Hi. > > > > Oliver Neukum wrote: > > > On Thursday, 3 January 2008 10:52:53, Nigel Cunningham wrote: > > >> Hi. > > >> > > >> Oliver Neukum wrote: > > >>> On Thursday 03 January 2008, Nigel Cunningham wrote: > > >>>> On top of this, I made a (too simple at the moment) freeze_filesystems > > >>>> function which iterates through &super_blocks in reverse order, freezing > > >>>> fuse filesystems or ordinary ones. I say 'too simple' because it doesn't > > >>>> currently allow for the possibility of someone mounting (say) ext3 on > > >>>> fuse, but that would just be an extension of what's already done. > > >>> How do you deal with fuse server tasks using other fuse filesystems? > > >> Since they're frozen in reverse order, the dependant one would be frozen > > >> first. > > > > > > Say I do: > > > > > > a) mount fuse on /tmp/first > > > b) mount fuse on /tmp/second > > > > > > Then the server task for (a) does "ls /tmp/second". So it will be frozen, > > > right? How do you then freeze (a)? And keep in mind that the server task > > > may have forked. > > > > I guess I should first ask, is this a real life problem or a > > hypothetical twisted web? I don't see why you would want to make two > > filesystems interdependent - it sounds like the way to create livelock > > and deadlocks in normal use, before we even begin to think about > > hibernating. > > Good questions. I personally don't use fuse, but I do care about power > management.
The problem I see is that an unprivileged user could make > that dependency, even inadvertedly. Another problem is that an unprivileged user could do it with evil intent: a so-called "denial-of-service" attack. Pavel -- (english) http://www.livejournal.com/~pavelmachek (Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-01-05 21:18 ` Pavel Machek @ 2008-01-05 23:01 ` Nigel Cunningham 0 siblings, 0 replies; 50+ messages in thread From: Nigel Cunningham @ 2008-01-05 23:01 UTC (permalink / raw) To: Pavel Machek Cc: Oliver Neukum, Rafael J. Wysocki, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List Hi. Pavel Machek wrote: > On Fri 2008-01-04 21:54:06, Oliver Neukum wrote: >> On Thursday, 3 January 2008 23:06:07, Nigel Cunningham wrote: >>> Oliver Neukum wrote: >>>> On Thursday, 3 January 2008 10:52:53, Nigel Cunningham wrote: >>>>> Oliver Neukum wrote: >>>>>> On Thursday 03 January 2008, Nigel Cunningham wrote: >>>>>>> On top of this, I made a (too simple at the moment) freeze_filesystems >>>>>>> function which iterates through &super_blocks in reverse order, freezing >>>>>>> fuse filesystems or ordinary ones. I say 'too simple' because it doesn't >>>>>>> currently allow for the possibility of someone mounting (say) ext3 on >>>>>>> fuse, but that would just be an extension of what's already done. >>>>>> How do you deal with fuse server tasks using other fuse filesystems? >>>>> Since they're frozen in reverse order, the dependant one would be frozen >>>>> first. >>>> Say I do: >>>> >>>> a) mount fuse on /tmp/first >>>> b) mount fuse on /tmp/second >>>> >>>> Then the server task for (a) does "ls /tmp/second". So it will be frozen, >>>> right? How do you then freeze (a)? And keep in mind that the server task >>>> may have forked. >>> I guess I should first ask, is this a real life problem or a >>> hypothetical twisted web? I don't see why you would want to make two >>> filesystems interdependent - it sounds like the way to create livelock >>> and deadlocks in normal use, before we even begin to think about >>> hibernating. >> Good questions. I personally don't use fuse, but I do care about power >> management.
The problem I see is that an unprivileged user could make >> that dependency, even inadvertedly. > > Other problem is that unprivileged user can do it with evil intent. So > called "denial-of-service" attack. Only in this case it would be a denial-of-denial-of-service attack, since it would stop you from hibernating or suspending :). This is still all hypothetical. If I could have a real-life case where this could actually happen, it would help a lot. Nigel ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-01-03 9:19 ` Nigel Cunningham 2008-01-03 9:47 ` Oliver Neukum @ 2008-01-03 22:31 ` Rafael J. Wysocki 1 sibling, 0 replies; 50+ messages in thread From: Rafael J. Wysocki @ 2008-01-03 22:31 UTC (permalink / raw) To: nigel Cc: Pavel Machek, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List On Thursday, 3 of January 2008, Nigel Cunningham wrote: > Hi. > > Rafael J. Wysocki wrote: > > On Wednesday, 2 of January 2008, Nigel Cunningham wrote: > >> Pavel Machek wrote: > >>>>>>>> So how do you handle threads that are blocked on I/O or a lock > >>>>>>>> during the system freeze process, then? > >>>>>>> We wait until they can continue. > >>>>>> So if I have a process blocked on an unavilable NFS mount, I can't > >>>>>> suspend? > >>>>> That's correct, you can't. > >>>>> > >>>>> [And I know what you're going to say. ;-)] > >>>> Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE" > >>>> instead of a zero preempt_count()? Really what we should do is just > >>>> iterate over all of the actual physical devices and tell each one > >>>> "Block new IO requests preemptably, finish pending DMA, put the > >>>> hardware in low-power mode, and prepare for suspend/hibernate". As > >>>> long as each driver knows how to do those simple things we can have > >>>> an entirely consistent kernel image for both suspend and for > >>>> hibernation. > >>> "each driver" means this is a lot of work. But yes, that is probably > >>> way to go, and patch would be welcome. > >> Yes, that does work. It's what I've done in my (preliminary) support for > >> fuse. > > > > Hmm, can you please elaborate a bit? > > Sorry. I wasn't very unambiguous, was I? And I'm not sure now whether > you're meaning "How does fuse support relate to freezing block devices?" > or "What's this about fuse support?". 
Let me therefore seek to answer > both questions: > > Higher level, I know (filesystems rather than block devices), but I was > meaning the general concept of blocking new requests and completing > existing ones worked fine for the supposedly impossible fuse support. > > Re fuse support, let me start by saying "I know this doesn't handle all > situations, but I think it's a good enough proof-of-concept implementation". > > I added some simple hooks to the code for submitting new work to fuse > threads. > > #define FUSE_MIGHT_FREEZE(superblock, desc) \ > do { \ > int printed = 0; \ > while(superblock->s_frozen != SB_UNFROZEN) { \ > if (!printed) { \ > printk("%d frozen in " desc ".\n", current->pid); \ > printed = 1; \ > } \ > try_to_freeze(); \ > yield(); \ > } \ > } while (0) > > On top of this, I made a (too simple at the moment) freeze_filesystems > function which iterates through &super_blocks in reverse order, freezing > fuse filesystems or ordinary ones. I say 'too simple' because it doesn't > currently allow for the possibility of someone mounting (say) ext3 on > fuse, but that would just be an extension of what's already done. > > The end result is: > > int freeze_processes(void) > { > int error; > > printk(KERN_INFO "Stopping fuse filesystems.\n"); > freeze_filesystems(FS_FREEZER_FUSE); > freezer_state = FREEZER_FILESYSTEMS_FROZEN; > printk(KERN_INFO "Freezing user space processes ... "); > error = try_to_freeze_tasks(FREEZER_USER_SPACE); > if (error) > goto Exit; > printk(KERN_INFO "done.\n"); > > sys_sync(); > printk(KERN_INFO "Stopping normal filesystems.\n"); > freeze_filesystems(FS_FREEZER_NORMAL); > freezer_state = FREEZER_USERSPACE_FROZEN; > printk(KERN_INFO "Freezing remaining freezable tasks ... 
"); > error = try_to_freeze_tasks(FREEZER_KERNEL_THREADS); > if (error) > goto Exit; > printk(KERN_INFO "done."); > freezer_state = FREEZER_FULLY_ON; > Exit: > BUG_ON(in_atomic()); > printk("\n"); > return error; > } > > Sorry if that's more info than you wanted. No, that's fine, thanks. Greetings, Rafael ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2007-11-27 20:33 ` Kyle Moffett 2007-11-27 23:01 ` Rafael J. Wysocki 2008-01-02 16:02 ` Pavel Machek @ 2008-06-23 7:16 ` Pavel Machek 2008-06-23 14:00 ` Henrique de Moraes Holschuh 2 siblings, 1 reply; 50+ messages in thread From: Pavel Machek @ 2008-06-23 7:16 UTC (permalink / raw) To: Kyle Moffett Cc: Rafael J. Wysocki, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List Hi! (Replying to a *very* old mail.) >>>> We wait until they can continue. >>> >>> So if I have a process blocked on an unavilable NFS mount, I can't >>> suspend? >> >> That's correct, you can't. >> >> [And I know what you're going to say. ;-)] > > Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE" instead > of a zero preempt_count()? Really what we should do is just iterate over > all of the actual physical devices and tell each one "Block new IO requests > preemptably, finish pending DMA, put the hardware in low-power mode, and > prepare for suspend/hibernate". As long as each driver knows how to do > those simple things we can have an entirely consistent kernel image for > both suspend and for hibernation. A patch would be welcome, actually. It turns out that blocking new I/O requests is not completely trivial. Pavel -- (english) http://www.livejournal.com/~pavelmachek (Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-06-23 7:16 ` Pavel Machek @ 2008-06-23 14:00 ` Henrique de Moraes Holschuh 2008-06-24 8:08 ` Elias Oltmanns 0 siblings, 1 reply; 50+ messages in thread From: Henrique de Moraes Holschuh @ 2008-06-23 14:00 UTC (permalink / raw) To: Pavel Machek Cc: Kyle Moffett, Rafael J. Wysocki, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List, Elias Oltmanns On Mon, 23 Jun 2008, Pavel Machek wrote: > (replying to *very* old mail). > > >>>> We wait until they can continue. > >>> > >>> So if I have a process blocked on an unavilable NFS mount, I can't > >>> suspend? > >> > >> That's correct, you can't. > >> > >> [And I know what you're going to say. ;-)] > > > > Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE" instead > > of a zero preempt_count()? Really what we should do is just iterate over > > all of the actual physical devices and tell each one "Block new IO requests > > preemptably, finish pending DMA, put the hardware in low-power mode, and > > prepare for suspend/hibernate". As long as each driver knows how to do > > those simple things we can have an entirely consistent kernel image for > > both suspend and for hibernation. > > Patch would be welcome, actually. It turns out blocking new > IO-requests is not completely trivial. Is this the same thing the per-device IO-queue-freeze patches for HDAPS also need to do? If so, you may want to talk to Elias Oltmanns <eo@nebensachen.de> about it. Added to CC. -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-06-23 14:00 ` Henrique de Moraes Holschuh @ 2008-06-24 8:08 ` Elias Oltmanns 2008-06-26 15:09 ` Pavel Machek 0 siblings, 1 reply; 50+ messages in thread From: Elias Oltmanns @ 2008-06-24 8:08 UTC (permalink / raw) To: Henrique de Moraes Holschuh Cc: Pavel Machek, Kyle Moffett, Rafael J. Wysocki, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List Henrique de Moraes Holschuh <hmh@hmh.eng.br> wrote: > On Mon, 23 Jun 2008, Pavel Machek wrote: >> (replying to *very* old mail). > >> >> >>>> We wait until they can continue. >> >>> >> >>> So if I have a process blocked on an unavilable NFS mount, I can't >> >>> suspend? >> >> >> >> That's correct, you can't. >> >> >> >> [And I know what you're going to say. ;-)] >> > >> > Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE" instead >> > of a zero preempt_count()? Really what we should do is just iterate over >> > all of the actual physical devices and tell each one "Block new IO requests >> > preemptably, finish pending DMA, put the hardware in low-power mode, and >> > prepare for suspend/hibernate". As long as each driver knows how to do >> > those simple things we can have an entirely consistent kernel image for >> > both suspend and for hibernation. >> >> Patch would be welcome, actually. It turns out blocking new >> IO-requests is not completely trivial. Quite. But I'm not sure I see what this is all about yet. From the IDE and SCSI subsystems I remember that they block all I/O from higher levels once the suspend callbacks have been executed. I haven't made an effort to understand the freezer (or indeed anything related to hibernation) yet since I don't even use hibernation myself (only s2ram). Do you have any suggestion where to start reading up on things so I can get an idea what the issues are and what you would like IDE / SCSI / ... to do? 
> > Is this the same thing the per-device IO-queue-freeze patches for HDAPS also > need to do? If so, you may want to talk to Elias Oltmanns > <eo@nebensachen.de> about it. Added to CC. Thanks for the heads-up, Henrique. Even though these issues seem to be related to a certain degree, there are probably some important differences. When suspending a system, the emphasis is on leaving the system in a consistent state (think of journalled file systems), whereas disk shock protection is mainly concerned with stopping I/O as soon as possible. As yet, I cannot possibly say to what extent these two concepts can be reconciled in the sense of sharing some common code. Regards, Elias ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: freeze vs freezer 2008-06-24 8:08 ` Elias Oltmanns @ 2008-06-26 15:09 ` Pavel Machek 2008-06-29 22:12 ` [xfs-masters] " Dave Chinner 0 siblings, 1 reply; 50+ messages in thread From: Pavel Machek @ 2008-06-26 15:09 UTC (permalink / raw) To: Elias Oltmanns Cc: Henrique de Moraes Holschuh, Kyle Moffett, Rafael J. Wysocki, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, xfs-masters, Linux Kernel Mailing List Hi! > >> Patch would be welcome, actually. It turns out blocking new > >> IO-requests is not completely trivial. > > Quite. But I'm not sure I see what this is all about yet. From the IDE > and SCSI subsystems I remember that they block all I/O from higher levels > once the suspend callbacks have been executed. I haven't made an effort > to understand the freezer (or indeed anything related to hibernation) > yet since I don't even use hibernation myself (only s2ram). Do you have s2ram also uses the freezer these days. The difference is that s2ram does not really need it. > any suggestion where to start reading up on things so I can get an idea > what the issues are and what you would like IDE / SCSI / ... to do? I'd like the block layer to block any process that tries to do I/O. > > Is this the same thing the per-device IO-queue-freeze patches for > >HDAPS also > > need to do? If so, you may want to talk to Elias Oltmanns > > <eo@nebensachen.de> about it. Added to CC. > > Thanks for the heads up Henrique. Even though these issues seem to be > related up to a certain degree, there probably are some important > differences. When suspending a system, the emphasis is on leaving the > system in a consistent state (think of journalled file systems), whereas > disk shock protection is mainly concerned with stopping I/O as soon as > possible. As yet, I cannot possibly say to what extend these two > concepts can be reconciled in the sense of sharing some common code. Actually, I believe the requirements are the same: 'don't do I/O in the dangerous period'. swsusp will just do sync() before entering the dangerous period. That provides a consistent-enough state... Pavel -- (english) http://www.livejournal.com/~pavelmachek (Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-26 15:09 ` Pavel Machek
@ 2008-06-29 22:12 ` Dave Chinner
2008-06-29 23:22 ` Rafael J. Wysocki
0 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2008-06-29 22:12 UTC
To: xfs-masters
Cc: Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Rafael J. Wysocki, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, Linux Kernel Mailing List

On Thu, Jun 26, 2008 at 05:09:10PM +0200, Pavel Machek wrote:
> > > Is this the same thing the per-device IO-queue-freeze patches for
> > > HDAPS also need to do? If so, you may want to talk to Elias Oltmanns
> > > <eo@nebensachen.de> about it. Added to CC.
> >
> > Thanks for the heads up, Henrique. Even though these issues seem to be
> > related up to a certain degree, there probably are some important
> > differences. When suspending a system, the emphasis is on leaving the
> > system in a consistent state (think of journalled file systems), whereas
> > disk shock protection is mainly concerned with stopping I/O as soon as
> > possible. As yet, I cannot possibly say to what extent these two
> > concepts can be reconciled in the sense of sharing some common code.
>
> Actually, I believe the requirements are the same.
>
> 'don't do i/o in dangerous period'.
>
> swsusp will just do sync() before entering the dangerous period. That
> provides consistent-enough state...

As I've said many times before - if the requirement is "don't do
I/O" then you have to freeze the filesystem. In no way does 'sync'
prevent filesystems from doing I/O.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
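Dave's distinction between sync and freeze can be illustrated with a toy, single-threaded model in C. Everything here (struct toyfs and the toyfs_* functions) is hypothetical and only mimics the idea: a sync writes back whatever is dirty at that moment but leaves nothing in place to stop the next write, whereas a freeze first stops new modifications and then flushes.

```c
#include <stdbool.h>

/* Hypothetical toy filesystem state; not a kernel structure. */
struct toyfs {
    int dirty;        /* blocks dirty in the (modelled) page cache */
    int disk_writes;  /* blocks that have reached the (modelled) disk */
    bool frozen;      /* set by toyfs_freeze(), cleared by toyfs_thaw() */
};

/* Like sync(): write back what is dirty *now*. It sets no flag, so
 * nothing stops the next write from generating more I/O. */
void toyfs_sync(struct toyfs *fs)
{
    fs->disk_writes += fs->dirty;
    fs->dirty = 0;
}

/* An application write. The model refuses writes while frozen;
 * the kernel would put the writer to sleep instead. */
bool toyfs_write(struct toyfs *fs)
{
    if (fs->frozen)
        return false;
    fs->dirty++;
    return true;
}

/* Freeze: stop new modifications first, then flush what is dirty. */
void toyfs_freeze(struct toyfs *fs)
{
    fs->frozen = true;
    toyfs_sync(fs);
}

void toyfs_thaw(struct toyfs *fs)
{
    fs->frozen = false;
}
```

One simplification to keep in mind: the model rejects writes to a frozen filesystem, while the kernel makes the writer sleep until the filesystem is thawed, which is what leaves such a writer in D state.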
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-29 22:12 ` [xfs-masters] " Dave Chinner
@ 2008-06-29 23:22 ` Rafael J. Wysocki
2008-06-30 6:11 ` Christoph Hellwig
2008-06-30 6:29 ` Dave Chinner
0 siblings, 2 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2008-06-29 23:22 UTC
To: Dave Chinner
Cc: xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, Linux Kernel Mailing List, Jens Axboe

On Monday, 30 of June 2008, Dave Chinner wrote:
> On Thu, Jun 26, 2008 at 05:09:10PM +0200, Pavel Machek wrote:
> > > > Is this the same thing the per-device IO-queue-freeze patches for
> > > > HDAPS also need to do? If so, you may want to talk to Elias Oltmanns
> > > > <eo@nebensachen.de> about it. Added to CC.
> > >
> > > Thanks for the heads up, Henrique. Even though these issues seem to be
> > > related up to a certain degree, there probably are some important
> > > differences. When suspending a system, the emphasis is on leaving the
> > > system in a consistent state (think of journalled file systems), whereas
> > > disk shock protection is mainly concerned with stopping I/O as soon as
> > > possible. As yet, I cannot possibly say to what extent these two
> > > concepts can be reconciled in the sense of sharing some common code.
> >
> > Actually, I believe the requirements are the same.
> >
> > 'don't do i/o in dangerous period'.
> >
> > swsusp will just do sync() before entering the dangerous period. That
> > provides consistent-enough state...
>
> As I've said many times before - if the requirement is "don't do
> I/O" then you have to freeze the filesystem. In no way does 'sync'
> prevent filesystems from doing I/O.....

Well, it seems we can handle this on the block layer level, by temporarily
replacing the elevator with something that will selectively prevent fs I/O
from reaching the layers below it.

I talked with Jens about it on a very general level, but it seems doable at
first sight.

Thanks,
Rafael
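Rafael's elevator idea could look roughly like the following gate, sketched as a userspace C model rather than real block-layer code. The request tags (GATE_FS_IO versus GATE_IMAGE_IO), the fixed-size parking area and all function names are assumptions for illustration: while the gate is closed, filesystem requests are held back and image requests still reach the device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag telling the gate where a request came from. */
enum req_origin { GATE_FS_IO, GATE_IMAGE_IO };

struct req {
    enum req_origin origin;
    long sector;
};

#define MAX_PARKED 64

/* Hypothetical gate sitting in front of the real dispatch routine. */
struct io_gate {
    bool closed;                     /* true during the dangerous period */
    struct req parked[MAX_PARKED];   /* fs requests held back */
    size_t nparked;
    int dispatched;                  /* requests passed to the "device" */
};

static void dispatch(struct io_gate *g, const struct req *r)
{
    (void)r;          /* a real driver would start the transfer here */
    g->dispatched++;
}

/* Park fs requests while the gate is closed; pass everything else.
 * Returns false only when the parking area is full. */
bool gate_submit(struct io_gate *g, struct req r)
{
    if (g->closed && r.origin == GATE_FS_IO) {
        if (g->nparked == MAX_PARKED)
            return false;
        g->parked[g->nparked++] = r;
        return true;
    }
    dispatch(g, &r);
    return true;
}

/* Reopen the gate and release the parked requests in order. */
void gate_open(struct io_gate *g)
{
    g->closed = false;
    for (size_t i = 0; i < g->nparked; i++)
        dispatch(g, &g->parked[i]);
    g->nparked = 0;
}
```

The model only delays requests; it does nothing for a task that is waiting synchronously for one of the parked requests to complete, which is the objection Dave raises further down the thread.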
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-29 23:22 ` Rafael J. Wysocki
@ 2008-06-30 6:11 ` Christoph Hellwig
2008-06-30 20:34 ` Rafael J. Wysocki
2008-06-30 6:29 ` Dave Chinner
1 sibling, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2008-06-30 6:11 UTC
To: Rafael J. Wysocki
Cc: Dave Chinner, xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, Linux Kernel Mailing List, Jens Axboe

On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > > Actually, I believe the requirements are the same.
> > >
> > > 'don't do i/o in dangerous period'.
> > >
> > > swsusp will just do sync() before entering the dangerous period. That
> > > provides consistent-enough state...
> >
> > As I've said many times before - if the requirement is "don't do
> > I/O" then you have to freeze the filesystem. In no way does 'sync'
> > prevent filesystems from doing I/O.....
>
> Well, it seems we can handle this on the block layer level, by temporarily
> replacing the elevator with something that will selectively prevent fs I/O
> from reaching the layers below it.
>
> I talked with Jens about it on a very general level, but it seems doable at
> first sight.

Why would you hack the block layer when we already have a perfectly fine
facility to achieve what you want? freeze_bdev is there exactly for the
purpose of making the filesystem consistent on disk and then freezing all
I/O.
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 6:11 ` Christoph Hellwig
@ 2008-06-30 20:34 ` Rafael J. Wysocki
2008-07-03 19:43 ` Eric Sandeen
0 siblings, 1 reply; 50+ messages in thread
From: Rafael J. Wysocki @ 2008-06-30 20:34 UTC
To: Christoph Hellwig
Cc: Dave Chinner, xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, Linux Kernel Mailing List, Jens Axboe

On Monday, 30 of June 2008, Christoph Hellwig wrote:
> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > > > Actually, I believe the requirements are the same.
> > > >
> > > > 'don't do i/o in dangerous period'.
> > > >
> > > > swsusp will just do sync() before entering the dangerous period. That
> > > > provides consistent-enough state...
> > >
> > > As I've said many times before - if the requirement is "don't do
> > > I/O" then you have to freeze the filesystem. In no way does 'sync'
> > > prevent filesystems from doing I/O.....
> >
> > Well, it seems we can handle this on the block layer level, by temporarily
> > replacing the elevator with something that will selectively prevent fs I/O
> > from reaching the layers below it.
> >
> > I talked with Jens about it on a very general level, but it seems doable at
> > first sight.
>
> Why would you hack the block layer when we already have a perfectly fine
> facility to achieve what you want? freeze_bdev is there exactly for the
> purpose of making the filesystem consistent on disk and then freezing all
> I/O.

We tried that in the past and it didn't work very well due to some bad
interactions with the md layer that we wanted to stay functional while we
were saving the image.

Also, do all of the supported filesystems implement this feature?
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 20:34 ` Rafael J. Wysocki
@ 2008-07-03 19:43 ` Eric Sandeen
0 siblings, 0 replies; 50+ messages in thread
From: Eric Sandeen @ 2008-07-03 19:43 UTC
To: xfs-masters
Cc: Christoph Hellwig, Dave Chinner, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, Linux Kernel Mailing List, Jens Axboe

Rafael J. Wysocki wrote:
>>> I talked with Jens about it on a very general level, but it seems doable at
>>> first sight.
>> Why would you hack the block layer when we already have a perfectly fine
>> facility to achieve what you want? freeze_bdev is there exactly for the
>> purpose of making the filesystem consistent on disk and then freezing all
>> I/O.
>
> We tried that in the past and it didn't work very well due to some bad
> interactions with the md layer that we wanted to stay functional while we
> were saving the image.

Hm, details or a link?

> Also, do all of the supported filesystems implement this feature?

ext3, ext4, gfs2, jfs, reiserfs and xfs all provide a write_super_lockfs
op, which is what freeze_bdev uses. I think the rest is generic, for
simpler filesystems.

-Eric
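Eric's description (filesystems that need it supply write_super_lockfs, and freeze_bdev calls it) amounts to an optional hook in an operations table. The sketch below is a simplified userspace model loosely following the shape of freeze_bdev() in kernels of that period; the two staged levels are assumed to correspond to SB_FREEZE_WRITE and SB_FREEZE_TRANS, the unlockfs counterpart on the thaw side is left out, and every toy_ name is hypothetical.

```c
#include <stddef.h>

/* Assumed to correspond to SB_UNFROZEN, SB_FREEZE_WRITE, SB_FREEZE_TRANS. */
enum { TOY_UNFROZEN = 0, TOY_FREEZE_WRITE = 1, TOY_FREEZE_TRANS = 2 };

struct toy_sb;

/* Hypothetical ops table; only the optional freeze hook is modelled. */
struct toy_super_ops {
    void (*write_super_lockfs)(struct toy_sb *sb);   /* may be NULL */
};

struct toy_sb {
    const struct toy_super_ops *s_op;
    int s_frozen;
    int flushes;        /* times the generic flush ran */
    int lockfs_calls;   /* times the filesystem's own hook ran */
};

/* Stands in for writing back dirty data and syncing the block device. */
static void toy_flush(struct toy_sb *sb)
{
    sb->flushes++;
}

void toy_freeze_bdev(struct toy_sb *sb)
{
    sb->s_frozen = TOY_FREEZE_WRITE;   /* new writers would now wait */
    toy_flush(sb);
    sb->s_frozen = TOY_FREEZE_TRANS;   /* new transactions would now wait */
    toy_flush(sb);
    if (sb->s_op && sb->s_op->write_super_lockfs)
        sb->s_op->write_super_lockfs(sb);   /* fs-specific quiesce step */
}

void toy_thaw_bdev(struct toy_sb *sb)
{
    sb->s_frozen = TOY_UNFROZEN;   /* unlockfs hook omitted in this model */
}

/* Example hook a journalling filesystem might supply. */
void toy_journal_lockfs(struct toy_sb *sb)
{
    sb->lockfs_calls++;
}
```

A filesystem without the hook still goes through the generic flush and the frozen levels, which matches the "the rest is generic" remark.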
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-29 23:22 ` Rafael J. Wysocki
2008-06-30 6:11 ` Christoph Hellwig
@ 2008-06-30 6:29 ` Dave Chinner
2008-06-30 6:37 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2008-06-30 6:29 UTC
To: xfs-masters
Cc: Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, Linux Kernel Mailing List, Jens Axboe

On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> On Monday, 30 of June 2008, Dave Chinner wrote:
> > On Thu, Jun 26, 2008 at 05:09:10PM +0200, Pavel Machek wrote:
> > > > > Is this the same thing the per-device IO-queue-freeze patches for
> > > > > HDAPS also need to do? If so, you may want to talk to Elias Oltmanns
> > > > > <eo@nebensachen.de> about it. Added to CC.
> > > >
> > > > Thanks for the heads up, Henrique. Even though these issues seem to be
> > > > related up to a certain degree, there probably are some important
> > > > differences. When suspending a system, the emphasis is on leaving the
> > > > system in a consistent state (think of journalled file systems), whereas
> > > > disk shock protection is mainly concerned with stopping I/O as soon as
> > > > possible. As yet, I cannot possibly say to what extent these two
> > > > concepts can be reconciled in the sense of sharing some common code.
> > >
> > > Actually, I believe the requirements are the same.
> > >
> > > 'don't do i/o in dangerous period'.
> > >
> > > swsusp will just do sync() before entering the dangerous period. That
> > > provides consistent-enough state...
> >
> > As I've said many times before - if the requirement is "don't do
> > I/O" then you have to freeze the filesystem. In no way does 'sync'
> > prevent filesystems from doing I/O.....
>
> Well, it seems we can handle this on the block layer level, by temporarily
> replacing the elevator with something that will selectively prevent fs I/O
> from reaching the layers below it.

Why? What part of freeze_bdev() doesn't work for you?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 6:29 ` Dave Chinner
@ 2008-06-30 6:37 ` Jeremy Fitzhardinge
2008-06-30 12:33 ` Dave Chinner
2008-07-01 8:59 ` Pavel Machek
0 siblings, 2 replies; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-30 6:37 UTC
To: xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Jeremy Fitzhardinge, Linux Kernel Mailing List, Jens Axboe

Dave Chinner wrote:
> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
>
>> On Monday, 30 of June 2008, Dave Chinner wrote:
>>
>>> On Thu, Jun 26, 2008 at 05:09:10PM +0200, Pavel Machek wrote:
>>>
>>>>>> Is this the same thing the per-device IO-queue-freeze patches for
>>>>>> HDAPS also need to do? If so, you may want to talk to Elias Oltmanns
>>>>>> <eo@nebensachen.de> about it. Added to CC.
>>>>>>
>>>>> Thanks for the heads up, Henrique. Even though these issues seem to be
>>>>> related up to a certain degree, there probably are some important
>>>>> differences. When suspending a system, the emphasis is on leaving the
>>>>> system in a consistent state (think of journalled file systems), whereas
>>>>> disk shock protection is mainly concerned with stopping I/O as soon as
>>>>> possible. As yet, I cannot possibly say to what extent these two
>>>>> concepts can be reconciled in the sense of sharing some common code.
>>>>>
>>>> Actually, I believe the requirements are the same.
>>>>
>>>> 'don't do i/o in dangerous period'.
>>>>
>>>> swsusp will just do sync() before entering the dangerous period. That
>>>> provides consistent-enough state...
>>>>
>>> As I've said many times before - if the requirement is "don't do
>>> I/O" then you have to freeze the filesystem. In no way does 'sync'
>>> prevent filesystems from doing I/O.....
>>>
>> Well, it seems we can handle this on the block layer level, by temporarily
>> replacing the elevator with something that will selectively prevent fs I/O
>> from reaching the layers below it.
>>
>
> Why? What part of freeze_bdev() doesn't work for you?

Well, my original problem - which is still an issue - is that a process
writing to a frozen XFS filesystem is stuck in D state, and therefore
cannot be frozen as part of suspend.

J
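The failure Jeremy describes can be illustrated with a deliberately simplified model of the process freezer in C. It assumes that a task can only be frozen once it runs again and reaches a freezer check; a task asleep uninterruptibly (D state) on a frozen filesystem never gets there, so after a bounded number of passes the freeze attempt fails. The states, the pass limit and the function names are hypothetical and do not reproduce the kernel's freeze_processes() code.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_state {
    TOY_RUNNING,           /* runnable, will reach a freezer check soon */
    TOY_INTERRUPTIBLE,     /* sleeping, can be woken to freeze itself */
    TOY_UNINTERRUPTIBLE,   /* "D" state, e.g. blocked on a frozen fs */
    TOY_FROZEN             /* parked in the refrigerator */
};

struct toy_task {
    const char *comm;
    enum toy_state state;
};

/* One freezer pass. Tasks that can run (or be woken) are modelled as
 * freezing immediately; D-state tasks are left alone and counted. */
static size_t freeze_pass(struct toy_task *t, size_t n)
{
    size_t todo = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].state == TOY_FROZEN)
            continue;
        if (t[i].state == TOY_UNINTERRUPTIBLE)
            todo++;                    /* cannot be woken to freeze */
        else
            t[i].state = TOY_FROZEN;
    }
    return todo;
}

/* Retry a bounded number of passes, then give up, roughly like the
 * freezer timing out and reporting the tasks it could not freeze. */
bool toy_freeze_processes(struct toy_task *t, size_t n, int max_passes)
{
    for (int pass = 0; pass < max_passes; pass++) {
        if (freeze_pass(t, n) == 0)
            return true;
    }
    return false;
}
```

In the model, as in the report, the D-state task's condition never changes while the filesystem stays frozen, so retrying cannot help.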
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 6:37 ` Jeremy Fitzhardinge
@ 2008-06-30 12:33 ` Dave Chinner
2008-06-30 21:00 ` Rafael J. Wysocki
2008-07-01 8:59 ` Pavel Machek
1 sibling, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2008-06-30 12:33 UTC
To: Jeremy Fitzhardinge
Cc: xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> Dave Chinner wrote:
>> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
>>> On Monday, 30 of June 2008, Dave Chinner wrote:
>>>> On Thu, Jun 26, 2008 at 05:09:10PM +0200, Pavel Machek wrote:
>>>>>>> Is this the same thing the per-device IO-queue-freeze patches for
>>>>>>> HDAPS also need to do? If so, you may want to talk to Elias Oltmanns
>>>>>>> <eo@nebensachen.de> about it. Added to CC.
>>>>>>>
>>>>>> Thanks for the heads up, Henrique. Even though these issues seem to be
>>>>>> related up to a certain degree, there probably are some important
>>>>>> differences. When suspending a system, the emphasis is on leaving the
>>>>>> system in a consistent state (think of journalled file systems), whereas
>>>>>> disk shock protection is mainly concerned with stopping I/O as soon as
>>>>>> possible. As yet, I cannot possibly say to what extent these two
>>>>>> concepts can be reconciled in the sense of sharing some common code.
>>>>>>
>>>>> Actually, I believe the requirements are the same.
>>>>>
>>>>> 'don't do i/o in dangerous period'.
>>>>>
>>>>> swsusp will just do sync() before entering the dangerous period. That
>>>>> provides consistent-enough state...
>>>>>
>>>> As I've said many times before - if the requirement is "don't do
>>>> I/O" then you have to freeze the filesystem. In no way does 'sync'
>>>> prevent filesystems from doing I/O.....
>>>>
>>> Well, it seems we can handle this on the block layer level, by temporarily
>>> replacing the elevator with something that will selectively prevent fs I/O
>>> from reaching the layers below it.
>>
>> Why? What part of freeze_bdev() doesn't work for you?
>
> Well, my original problem - which is still an issue - is that a process
> writing to a frozen XFS filesystem is stuck in D state, and therefore
> cannot be frozen as part of suspend.

Silly me - how could I forget the three-headed monkey getting in
the way of our happy trip to beer island?

Seriously, though, how is stopping I/O in the elevator going to
change that? What do you do with a sync I/O (read or write)? The
process is going to have to go to sleep somewhere in D state waiting
for that I/O to complete. If you're going to intercept such
processes somewhere else to do something magic, then why not put
that magic in vfs_check_frozen()?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 12:33 ` Dave Chinner
@ 2008-06-30 21:00 ` Rafael J. Wysocki
2008-06-30 22:21 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Rafael J. Wysocki @ 2008-06-30 21:00 UTC
To: Dave Chinner
Cc: Jeremy Fitzhardinge, xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Monday, 30 of June 2008, Dave Chinner wrote:
> On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> > Dave Chinner wrote:
> >> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> >>> On Monday, 30 of June 2008, Dave Chinner wrote:
> >>>> On Thu, Jun 26, 2008 at 05:09:10PM +0200, Pavel Machek wrote:
> >>>>>>> Is this the same thing the per-device IO-queue-freeze patches for
> >>>>>>> HDAPS also need to do? If so, you may want to talk to Elias Oltmanns
> >>>>>>> <eo@nebensachen.de> about it. Added to CC.
> >>>>>>>
> >>>>>> Thanks for the heads up, Henrique. Even though these issues seem to be
> >>>>>> related up to a certain degree, there probably are some important
> >>>>>> differences. When suspending a system, the emphasis is on leaving the
> >>>>>> system in a consistent state (think of journalled file systems), whereas
> >>>>>> disk shock protection is mainly concerned with stopping I/O as soon as
> >>>>>> possible. As yet, I cannot possibly say to what extent these two
> >>>>>> concepts can be reconciled in the sense of sharing some common code.
> >>>>>>
> >>>>> Actually, I believe the requirements are the same.
> >>>>>
> >>>>> 'don't do i/o in dangerous period'.
> >>>>>
> >>>>> swsusp will just do sync() before entering the dangerous period. That
> >>>>> provides consistent-enough state...
> >>>>>
> >>>> As I've said many times before - if the requirement is "don't do
> >>>> I/O" then you have to freeze the filesystem. In no way does 'sync'
> >>>> prevent filesystems from doing I/O.....
> >>>>
> >>> Well, it seems we can handle this on the block layer level, by temporarily
> >>> replacing the elevator with something that will selectively prevent fs I/O
> >>> from reaching the layers below it.
> >>
> >> Why? What part of freeze_bdev() doesn't work for you?
> >
> > Well, my original problem - which is still an issue - is that a process
> > writing to a frozen XFS filesystem is stuck in D state, and therefore
> > cannot be frozen as part of suspend.

I thought we were talking about the post-freezer situation.

> Silly me - how could I forget the three-headed monkey getting in
> the way of our happy trip to beer island?
>
> Seriously, though, how is stopping I/O in the elevator going to
> change that?

We can do that after creating the image and before we let devices run again.
This way we won't need to worry about the freezer.

> What do you do with a sync I/O (read or write)? The
> process is going to have to go to sleep somewhere in D state waiting
> for that I/O to complete. If you're going to intercept such
> processes somewhere else to do something magic, then why not put
> that magic in vfs_check_frozen()?

This might work too, but it would be nice to do something independent of the
freezer, so that we can drop the freezer when we want and not when we are
forced to.

Thanks,
Rafael
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 21:00 ` Rafael J. Wysocki
@ 2008-06-30 22:21 ` Dave Chinner
2008-06-30 22:38 ` Rafael J. Wysocki
0 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2008-06-30 22:21 UTC
To: Rafael J. Wysocki
Cc: Jeremy Fitzhardinge, xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Mon, Jun 30, 2008 at 11:00:43PM +0200, Rafael J. Wysocki wrote:
> On Monday, 30 of June 2008, Dave Chinner wrote:
> > On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> > > Dave Chinner wrote:
> > >> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > >>> Well, it seems we can handle this on the block layer level, by temporarily
> > >>> replacing the elevator with something that will selectively prevent fs I/O
> > >>> from reaching the layers below it.
> > >>
> > >> Why? What part of freeze_bdev() doesn't work for you?
> > >
> > > Well, my original problem - which is still an issue - is that a process
> > > writing to a frozen XFS filesystem is stuck in D state, and therefore
> > > cannot be frozen as part of suspend.
>
> I thought we were talking about the post-freezer situation.
>
> > Silly me - how could I forget the three-headed monkey getting in
> > the way of our happy trip to beer island?
> >
> > Seriously, though, how is stopping I/O in the elevator going to
> > change that?
>
> We can do that after creating the image and before we let devices run again.
> This way we won't need to worry about the freezer.

You're suggesting that you let processes trying to do I/O continue
until *after* the memory image is taken? How is that going to work?
You've got to quiesce the filesystems totally *before* taking an image
of memory - it's the only way to guarantee that the in-memory and
on-disk states are consistent on resume.

Don't re-invent the wheel - use the API we already have that does
exactly what needs to be done.

> > What do you do with a sync I/O (read or write)? The
> > process is going to have to go to sleep somewhere in D state waiting
> > for that I/O to complete. If you're going to intercept such
> > processes somewhere else to do something magic, then why not put
> > that magic in vfs_check_frozen()?
>
> This might work too, but it would be nice to do something independent of the
> freezer, so that we can drop the freezer when we want and not when we are
> forced to.

vfs_check_frozen() is completely independent of the process freezer.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 22:21 ` Dave Chinner
@ 2008-06-30 22:38 ` Rafael J. Wysocki
2008-07-01 6:38 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Rafael J. Wysocki @ 2008-06-30 22:38 UTC
To: Dave Chinner
Cc: Jeremy Fitzhardinge, xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Tuesday, 1 of July 2008, Dave Chinner wrote:
> On Mon, Jun 30, 2008 at 11:00:43PM +0200, Rafael J. Wysocki wrote:
> > On Monday, 30 of June 2008, Dave Chinner wrote:
> > > On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> > > > Dave Chinner wrote:
> > > >> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > > >>> Well, it seems we can handle this on the block layer level, by temporarily
> > > >>> replacing the elevator with something that will selectively prevent fs I/O
> > > >>> from reaching the layers below it.
> > > >>
> > > >> Why? What part of freeze_bdev() doesn't work for you?
> > > >
> > > > Well, my original problem - which is still an issue - is that a process
> > > > writing to a frozen XFS filesystem is stuck in D state, and therefore
> > > > cannot be frozen as part of suspend.
> >
> > I thought we were talking about the post-freezer situation.
> >
> > > Silly me - how could I forget the three-headed monkey getting in
> > > the way of our happy trip to beer island?
> > >
> > > Seriously, though, how is stopping I/O in the elevator going to
> > > change that?
> >
> > We can do that after creating the image and before we let devices run again.
> > This way we won't need to worry about the freezer.
>
> You're suggesting that you let processes trying to do I/O continue
> until *after* the memory image is taken?

I'm not going to let the data get to the disk.

> How is that going to work?
> You've got to quiesce the filesystems totally *before* taking an image
> of memory - it's the only way to guarantee that the in-memory and
> on-disk states are consistent on resume.

No, it's not the only way. We have to ensure that the fs data that didn't
make it to the disk(s) before creating the snapshot image will not be written
to the disk(s) after the image has been created. In theory one can think of
many ways to achieve that and the freezing of filesystems is certainly one of
those.

> Don't re-invent the wheel - use the API we already have that does
> exactly what needs to be done.
>
> > > What do you do with a sync I/O (read or write)? The
> > > process is going to have to go to sleep somewhere in D state waiting
> > > for that I/O to complete. If you're going to intercept such
> > > processes somewhere else to do something magic, then why not put
> > > that magic in vfs_check_frozen()?
> >
> > This might work too, but it would be nice to do something independent of the
> > freezer, so that we can drop the freezer when we want and not when we are
> > forced to.
>
> vfs_check_frozen() is completely independent of the process freezer.

Well, can you please tell me how exactly that works, then?

Thanks,
Rafael
* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 22:38 ` Rafael J. Wysocki
@ 2008-07-01 6:38 ` Dave Chinner
2008-07-01 14:35 ` Rafael J. Wysocki
0 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2008-07-01 6:38 UTC
To: xfs-masters
Cc: Jeremy Fitzhardinge, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Tue, Jul 01, 2008 at 12:38:41AM +0200, Rafael J. Wysocki wrote:
> On Tuesday, 1 of July 2008, Dave Chinner wrote:
> > On Mon, Jun 30, 2008 at 11:00:43PM +0200, Rafael J. Wysocki wrote:
> > > On Monday, 30 of June 2008, Dave Chinner wrote:
> > > > On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> > > > > Dave Chinner wrote:
> > > > >> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > > > >>> Well, it seems we can handle this on the block layer level, by temporarily
> > > > >>> replacing the elevator with something that will selectively prevent fs I/O
> > > > >>> from reaching the layers below it.
> > > > >>
> > > > >> Why? What part of freeze_bdev() doesn't work for you?
> > > > >
> > > > > Well, my original problem - which is still an issue - is that a process
> > > > > writing to a frozen XFS filesystem is stuck in D state, and therefore
> > > > > cannot be frozen as part of suspend.
> > >
> > > I thought we were talking about the post-freezer situation.
> > >
> > > > Silly me - how could I forget the three-headed monkey getting in
> > > > the way of our happy trip to beer island?
> > > >
> > > > Seriously, though, how is stopping I/O in the elevator going to
> > > > change that?
> > >
> > > We can do that after creating the image and before we let devices run again.
> > > This way we won't need to worry about the freezer.
> >
> > You're suggesting that you let processes trying to do I/O continue
> > until *after* the memory image is taken?
>
> I'm not going to let the data get to the disk.

Yes, but you still haven't answered the original question - What are
you going to do with sync I/O that leaves a process in D state
because you've prevented the I/O from being completed?

> > > > What do you do with a sync I/O (read or write)? The
> > > > process is going to have to go to sleep somewhere in D state waiting
> > > > for that I/O to complete. If you're going to intercept such
> > > > processes somewhere else to do something magic, then why not put
> > > > that magic in vfs_check_frozen()?
> > >
> > > This might work too, but it would be nice to do something independent of the
> > > freezer, so that we can drop the freezer when we want and not when we are
> > > forced to.
> >
> > vfs_check_frozen() is completely independent of the process freezer.
>
> Well, can you please tell me how exactly that works, then?

Try looking at the code. When we freeze a filesystem sb->s_frozen
changes state depending on the level of freeze currently obtained
by the filesystem. And:

#define vfs_check_frozen(sb, level) \
	wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))

Pretty bloody simple, really.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
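The vfs_check_frozen() macro quoted above is a standard wait-queue pattern: sleep until s_frozen drops below the requested level, and have the thaw path wake all sleepers. A rough userspace analogue follows, assuming a pthread mutex and condition variable in place of the kernel's wait queue, wait_event() and wake_up(); the toy_ names and the level constants are hypothetical stand-ins.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumed to mirror SB_UNFROZEN, SB_FREEZE_WRITE, SB_FREEZE_TRANS. */
enum { TOY_UNFROZEN = 0, TOY_FREEZE_WRITE = 1, TOY_FREEZE_TRANS = 2 };

struct toy_sb {
    int s_frozen;
    pthread_mutex_t lock;            /* protects s_frozen */
    pthread_cond_t s_wait_unfrozen;  /* stands in for the wait queue */
};

#define TOY_SB_INIT \
    { TOY_UNFROZEN, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Analogue of vfs_check_frozen(sb, level): sleep while the superblock
 * is frozen at or above 'level'. The kernel's wait_event() sleeps in
 * TASK_UNINTERRUPTIBLE, so a writer waiting there would be in D state. */
void toy_check_frozen(struct toy_sb *sb, int level)
{
    pthread_mutex_lock(&sb->lock);
    while (sb->s_frozen >= level)
        pthread_cond_wait(&sb->s_wait_unfrozen, &sb->lock);
    pthread_mutex_unlock(&sb->lock);
}

void toy_set_frozen(struct toy_sb *sb, int level)
{
    pthread_mutex_lock(&sb->lock);
    sb->s_frozen = level;
    pthread_mutex_unlock(&sb->lock);
}

/* Thaw: drop back to unfrozen and wake every waiter (cf. wake_up()). */
void toy_thaw(struct toy_sb *sb)
{
    pthread_mutex_lock(&sb->lock);
    sb->s_frozen = TOY_UNFROZEN;
    pthread_cond_broadcast(&sb->s_wait_unfrozen);
    pthread_mutex_unlock(&sb->lock);
}

/* Example writer thread: waits out a write freeze, then "writes". */
struct toy_writer {
    struct toy_sb *sb;
    int done;
};

void *toy_writer_thread(void *arg)
{
    struct toy_writer *w = arg;
    toy_check_frozen(w->sb, TOY_FREEZE_WRITE);
    w->done = 1;   /* read by the creator only after pthread_join() */
    return NULL;
}
```

As with the kernel macro, nothing in the waiter itself ends the wait: a thread blocked in toy_check_frozen() only makes progress once another thread calls toy_thaw().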
* Re: [xfs-masters] Re: freeze vs freezer
2008-07-01 6:38 ` Dave Chinner
@ 2008-07-01 14:35 ` Rafael J. Wysocki
2008-07-01 15:05 ` Elias Oltmanns
2008-07-01 21:12 ` Dave Chinner
0 siblings, 2 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2008-07-01 14:35 UTC
To: Dave Chinner
Cc: xfs-masters, Jeremy Fitzhardinge, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Tuesday, 1 of July 2008, Dave Chinner wrote:
> On Tue, Jul 01, 2008 at 12:38:41AM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 1 of July 2008, Dave Chinner wrote:
> > > On Mon, Jun 30, 2008 at 11:00:43PM +0200, Rafael J. Wysocki wrote:
> > > > On Monday, 30 of June 2008, Dave Chinner wrote:
> > > > > On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> > > > > > Dave Chinner wrote:
> > > > > >> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > > > > >>> Well, it seems we can handle this on the block layer level, by temporarily
> > > > > >>> replacing the elevator with something that will selectively prevent fs I/O
> > > > > >>> from reaching the layers below it.
> > > > > >>
> > > > > >> Why? What part of freeze_bdev() doesn't work for you?
> > > > > >
> > > > > > Well, my original problem - which is still an issue - is that a process
> > > > > > writing to a frozen XFS filesystem is stuck in D state, and therefore
> > > > > > cannot be frozen as part of suspend.
> > > >
> > > > I thought we were talking about the post-freezer situation.
> > > >
> > > > > Silly me - how could I forget the three-headed monkey getting in
> > > > > the way of our happy trip to beer island?
> > > > >
> > > > > Seriously, though, how is stopping I/O in the elevator going to
> > > > > change that?
> > > >
> > > > We can do that after creating the image and before we let devices run again.
> > > > This way we won't need to worry about the freezer.
> > >
> > > You're suggesting that you let processes trying to do I/O continue
> > > until *after* the memory image is taken?
> >
> > I'm not going to let the data get to the disk.
>
> Yes, but you still haven't answered the original question - What are
> you going to do with sync I/O that leaves a process in D state
> because you've prevented the I/O from being completed?

I don't want to intercept those processes, just allow them to block on that I/O.

> > > > > What do you do with a sync I/O (read or write)? The
> > > > > process is going to have to go to sleep somewhere in D state waiting
> > > > > for that I/O to complete. If you're going to intercept such
> > > > > processes somewhere else to do something magic, then why not put
> > > > > that magic in vfs_check_frozen()?
> > > >
> > > > This might work too, but it would be nice to do something independent of the
> > > > freezer, so that we can drop the freezer when we want and not when we are
> > > > forced to.
> > >
> > > vfs_check_frozen() is completely independent of the process freezer.
> >
> > Well, can you please tell me how exactly that works, then?
>
> Try looking at the code. When we freeze a filesystem sb->s_frozen
> changes state depending on the level of freeze currently obtained
> by the filesystem. And:
>
> #define vfs_check_frozen(sb, level) \
> 	wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
>
> Pretty bloody simple, really.

OK

Do all of the filesystems implement the freezing?

Rafael
* Re: [xfs-masters] Re: freeze vs freezer
2008-07-01 14:35 ` Rafael J. Wysocki
@ 2008-07-01 15:05 ` Elias Oltmanns
2008-07-01 15:17 ` Christoph Hellwig
2008-07-01 21:15 ` Dave Chinner
2008-07-01 21:12 ` Dave Chinner
1 sibling, 2 replies; 50+ messages in thread
From: Elias Oltmanns @ 2008-07-01 15:05 UTC
To: Rafael J. Wysocki
Cc: Dave Chinner, xfs-masters, Jeremy Fitzhardinge, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

"Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> On Tuesday, 1 of July 2008, Dave Chinner wrote:
>> On Tue, Jul 01, 2008 at 12:38:41AM +0200, Rafael J. Wysocki wrote:
>
>> > On Tuesday, 1 of July 2008, Dave Chinner wrote:
>> > > On Mon, Jun 30, 2008 at 11:00:43PM +0200, Rafael J. Wysocki wrote:
>> > > > On Monday, 30 of June 2008, Dave Chinner wrote:
[...]
>> > > > > What do you do with a sync I/O (read or write)? The
>> > > > > process is going to have to go to sleep somewhere in D state waiting
>> > > > > for that I/O to complete. If you're going to intercept such
>> > > > > processes somewhere else to do something magic, then why not put
>> > > > > that magic in vfs_check_frozen()?
>> > > >
>> > > > This might work too, but it would be nice to do something independent of the
>> > > > freezer, so that we can drop the freezer when we want and not when we are
>> > > > forced to.
>> > >
>> > > vfs_check_frozen() is completely independent of the process freezer.
>> >
>> > Well, can you please tell me how exactly that works, then?
>>
>> Try looking at the code. When we freeze a filesystem sb->s_frozen
>> changes state depending on the level of freeze currently obtained
>> by the filesystem. And:
>>
>> #define vfs_check_frozen(sb, level) \
>> wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
>>
>> Pretty bloody simple, really.
>
> OK
>
> Do all of the filesystems implement the freezing?

There is some work in progress [1]. If you think this will help you to
address this issue on the fs level, where I think it should be done, you
may even be able to request some changes to fit your needs before it
gets merged into mainline.

Regards,

Elias

[1] http://permalink.gmane.org/gmane.linux.file-systems/24716

* Re: [xfs-masters] Re: freeze vs freezer
2008-07-01 15:05 ` Elias Oltmanns
@ 2008-07-01 15:17 ` Christoph Hellwig
2008-07-01 21:15 ` Dave Chinner
1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2008-07-01 15:17 UTC
To: Elias Oltmanns
Cc: Rafael J. Wysocki, Dave Chinner, xfs-masters, Jeremy Fitzhardinge, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Tue, Jul 01, 2008 at 05:05:29PM +0200, Elias Oltmanns wrote:
> There is some work in progress [1]. If you think this will help you to
> address this issue on the fs level, where I think it should be done, you
> may even be able to request some changes to fit your needs before it
> gets merged into mainline.

That is just a direct user interface to this functionality. The
functionality has been around for a long time.

* Re: [xfs-masters] Re: freeze vs freezer
2008-07-01 15:05 ` Elias Oltmanns
2008-07-01 15:17 ` Christoph Hellwig
@ 2008-07-01 21:15 ` Dave Chinner
2008-07-01 21:46 ` Elias Oltmanns
1 sibling, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2008-07-01 21:15 UTC
To: xfs-masters
Cc: Rafael J. Wysocki, Jeremy Fitzhardinge, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Tue, Jul 01, 2008 at 05:05:29PM +0200, Elias Oltmanns wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > Do all of the filesystems implement the freezing?
>
> There is some work in progress [1].
[....]
> [1] http://permalink.gmane.org/gmane.linux.file-systems/24716

No, that's the userspace ioctl interface to enable freezing from
something other than dm-snapshot. The filesystems that it is aimed at
already support freezing via freeze_bdev().

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [xfs-masters] Re: freeze vs freezer
2008-07-01 21:15 ` Dave Chinner
@ 2008-07-01 21:46 ` Elias Oltmanns
0 siblings, 0 replies; 50+ messages in thread
From: Elias Oltmanns @ 2008-07-01 21:46 UTC
To: xfs-masters
Cc: Rafael J. Wysocki, Jeremy Fitzhardinge, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, Linux Kernel Mailing List, Jens Axboe

Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Jul 01, 2008 at 05:05:29PM +0200, Elias Oltmanns wrote:
>> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
>> > Do all of the filesystems implement the freezing?
>>
>> There is some work in progress [1].
> [....]
>> [1] http://permalink.gmane.org/gmane.linux.file-systems/24716
>
> No, that's the userspace ioctl interface to enable freezing from
> something other than dm-snapshot. The filesystems that it is aimed at
> already support freezing via freeze_bdev().

Yes, Christoph mentioned that too and I should have realised it had I
looked at those patches properly.

Regards,

Elias

* Re: [xfs-masters] Re: freeze vs freezer
2008-07-01 14:35 ` Rafael J. Wysocki
2008-07-01 15:05 ` Elias Oltmanns
@ 2008-07-01 21:12 ` Dave Chinner
2008-07-01 21:21 ` Rafael J. Wysocki
1 sibling, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2008-07-01 21:12 UTC
To: xfs-masters
Cc: Jeremy Fitzhardinge, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Tue, Jul 01, 2008 at 04:35:43PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, 1 of July 2008, Dave Chinner wrote:
> > On Tue, Jul 01, 2008 at 12:38:41AM +0200, Rafael J. Wysocki wrote:
> > > On Tuesday, 1 of July 2008, Dave Chinner wrote:
> > > > On Mon, Jun 30, 2008 at 11:00:43PM +0200, Rafael J. Wysocki wrote:
> > > > > On Monday, 30 of June 2008, Dave Chinner wrote:
> > > > > > On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> > > > > > > Dave Chinner wrote:
> > > > > > >> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > > > > > >>> Well, it seems we can handle this on the block layer level, by temporarily
> > > > > > >>> replacing the elevator with something that will selectively prevent fs I/O
> > > > > > >>> from reaching the layers below it.
> > > > > > >>
> > > > > > >> Why? What part of freeze_bdev() doesn't work for you?
> > > > > > >
> > > > > > > Well, my original problem - which is still an issue - is that a process
> > > > > > > writing to a frozen XFS filesystem is stuck in D state, and therefore
> > > > > > > cannot be frozen as part of suspend.
> > > > >
> > > > > I thought we were talking about the post-freezer situation.
> > > > >
> > > > > > Silly me - how could I forget the three headed monkey getting in
> > > > > > the way of our happy trip to beer island?
> > > > > >
> > > > > > Seriously, though, how is stopping I/O in the elevator is going to
> > > > > > change that?
> > > > >
> > > > > We can do that after creating the image and before we let devices run again.
> > > > > This way we won't need to worry about the freezer.
> > > >
> > > > You're suggesting that you let processes trying to do I/O continue
> > > > until *after* the memory image is taken?
> > >
> > > I'm not going to let the data get to the disk.
> >
> > Yes, but you still haven't answered the original question - What are
> > you going to do with sync I/O that leaves a process in D state
> > because you've prevented the I/O from being completed?
>
> I don't want to intercept those processes, just allow them to block on that I/O.

So you're going to allow them to go to D state somewhere. Ok, so
what's the problem with blocking them in vfs_check_frozen(), then?

> Do all of the filesystems implement the freezing?

Most of the major ones - those that implement ->write_super_lockfs()
should work just fine.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

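Dave's answer separates the generic part of a freeze (the s_frozen state that vfs_check_frozen() tests) from the filesystem-specific ->write_super_lockfs() hook. The sketch below models that split in userspace. The order of operations loosely follows what freeze_bdev() is assumed to do in kernels of this period (block new writers, flush, block new transactions, flush again, then call the hook if present); struct model_sops, struct model_sb, model_freeze(), model_thaw() and the demo hooks are hypothetical, and the flushes are only counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Freeze levels; values assumed to mirror the 2.6-era SB_* constants. */
enum freeze_level { SB_UNFROZEN = 0, SB_FREEZE_WRITE = 1, SB_FREEZE_TRANS = 2 };

struct model_sb;

/* Hypothetical subset of super_operations: only the freeze hooks. */
struct model_sops {
    void (*write_super_lockfs)(struct model_sb *sb);
    void (*unlockfs)(struct model_sb *sb);
};

/* Hypothetical stand-in for struct super_block. */
struct model_sb {
    enum freeze_level s_frozen;
    const struct model_sops *s_op; /* may be NULL in this model */
    int syncs;                     /* flush passes performed */
    bool fs_quiesced;              /* set by the filesystem's own hook */
};

static void model_sync(struct model_sb *sb)
{
    sb->syncs++; /* stands in for writing back dirty data and metadata */
}

/* Does this filesystem do anything filesystem-specific on freeze? */
bool has_lockfs_hook(const struct model_sb *sb)
{
    return sb->s_op != NULL && sb->s_op->write_super_lockfs != NULL;
}

/* Assumed order of operations, loosely modeled on freeze_bdev(). */
void model_freeze(struct model_sb *sb)
{
    sb->s_frozen = SB_FREEZE_WRITE; /* new writers now block */
    model_sync(sb);                 /* flush existing dirty data */
    sb->s_frozen = SB_FREEZE_TRANS; /* new transactions now block */
    model_sync(sb);                 /* flush what the first pass produced */
    if (has_lockfs_hook(sb))
        sb->s_op->write_super_lockfs(sb);
}

/* Reverse of the above; the kernel would also wake s_wait_unfrozen. */
void model_thaw(struct model_sb *sb)
{
    assert(sb->s_frozen == SB_FREEZE_TRANS);
    if (sb->s_op != NULL && sb->s_op->unlockfs != NULL)
        sb->s_op->unlockfs(sb);
    sb->s_frozen = SB_UNFROZEN;
}

/* Example hooks for a filesystem that implements them. */
static void demo_lockfs(struct model_sb *sb) { sb->fs_quiesced = true; }
static void demo_unlockfs(struct model_sb *sb) { sb->fs_quiesced = false; }
const struct model_sops demo_sops = { demo_lockfs, demo_unlockfs };
```

In this model the s_frozen transitions happen whether or not the hook exists; only the filesystem's own quiescing depends on ->write_super_lockfs(), which is one way to read "those that implement ->write_super_lockfs() should work just fine".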
* Re: [xfs-masters] Re: freeze vs freezer
2008-07-01 21:12 ` Dave Chinner
@ 2008-07-01 21:21 ` Rafael J. Wysocki
0 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2008-07-01 21:21 UTC
To: Dave Chinner
Cc: xfs-masters, Jeremy Fitzhardinge, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Tuesday, 1 of July 2008, Dave Chinner wrote:
> On Tue, Jul 01, 2008 at 04:35:43PM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 1 of July 2008, Dave Chinner wrote:
> > > On Tue, Jul 01, 2008 at 12:38:41AM +0200, Rafael J. Wysocki wrote:
> > > > On Tuesday, 1 of July 2008, Dave Chinner wrote:
> > > > > On Mon, Jun 30, 2008 at 11:00:43PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Monday, 30 of June 2008, Dave Chinner wrote:
> > > > > > > On Sun, Jun 29, 2008 at 11:37:31PM -0700, Jeremy Fitzhardinge wrote:
> > > > > > > > Dave Chinner wrote:
> > > > > > > >> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
> > > > > > > >>> Well, it seems we can handle this on the block layer level, by temporarily
> > > > > > > >>> replacing the elevator with something that will selectively prevent fs I/O
> > > > > > > >>> from reaching the layers below it.
> > > > > > > >>
> > > > > > > >> Why? What part of freeze_bdev() doesn't work for you?
> > > > > > > >
> > > > > > > > Well, my original problem - which is still an issue - is that a process
> > > > > > > > writing to a frozen XFS filesystem is stuck in D state, and therefore
> > > > > > > > cannot be frozen as part of suspend.
> > > > > >
> > > > > > I thought we were talking about the post-freezer situation.
> > > > > >
> > > > > > > Silly me - how could I forget the three headed monkey getting in
> > > > > > > the way of our happy trip to beer island?
> > > > > > >
> > > > > > > Seriously, though, how is stopping I/O in the elevator is going to
> > > > > > > change that?
> > > > > >
> > > > > > We can do that after creating the image and before we let devices run again.
> > > > > > This way we won't need to worry about the freezer.
> > > > >
> > > > > You're suggesting that you let processes trying to do I/O continue
> > > > > until *after* the memory image is taken?
> > > >
> > > > I'm not going to let the data get to the disk.
> > >
> > > Yes, but you still haven't answered the original question - What are
> > > you going to do with sync I/O that leaves a process in D state
> > > because you've prevented the I/O from being completed?
> >
> > I don't want to intercept those processes, just allow them to block on that I/O.
>
> So you're going to allow them to go to D state somewhere. Ok, so
> what's the problem with blocking them in vfs_check_frozen(), then?
>
> > Do all of the filesystems implement the freezing?
>
> Most of the major ones - those that implement ->write_super_lockfs()
> should work just fine.

Okay, so we can do that.

I'm certainly not against freezing the filesystems, at least before hibernation.
In fact, we tried that in the past, but there were some locking problems I was
unable to resolve at that time.

Unfortunately, I'm not very familiar with the VFS and filesystems code, so some
experts' help would be very much appreciated.

Thanks,
Rafael

* Re: [xfs-masters] Re: freeze vs freezer
2008-06-30 6:37 ` Jeremy Fitzhardinge
2008-06-30 12:33 ` Dave Chinner
@ 2008-07-01 8:59 ` Pavel Machek
1 sibling, 0 replies; 50+ messages in thread
From: Pavel Machek @ 2008-07-01 8:59 UTC
To: Jeremy Fitzhardinge
Cc: xfs-masters, Elias Oltmanns, Henrique de Moraes Holschuh, Kyle Moffett, Matthew Garrett, David Chinner, Linux Kernel Mailing List, Jens Axboe

On Sun 2008-06-29 23:37:31, Jeremy Fitzhardinge wrote:
> Dave Chinner wrote:
>> On Mon, Jun 30, 2008 at 01:22:47AM +0200, Rafael J. Wysocki wrote:
>>
>>> On Monday, 30 of June 2008, Dave Chinner wrote:
>>>
>>>> On Thu, Jun 26, 2008 at 05:09:10PM +0200, Pavel Machek wrote:
>>>>
>>>>>>> Is this the same thing the per-device IO-queue-freeze patches for
>>>>>>> HDAPS also
>>>>>>> need to do? If so, you may want to talk to Elias Oltmanns
>>>>>>> <eo@nebensachen.de> about it. Added to CC.
>>>>>>>
>>>>>> Thanks for the heads up Henrique. Even though these issues seem to be
>>>>>> related up to a certain degree, there probably are some important
>>>>>> differences. When suspending a system, the emphasis is on leaving the
>>>>>> system in a consistent state (think of journalled file systems), whereas
>>>>>> disk shock protection is mainly concerned with stopping I/O as soon as
>>>>>> possible. As yet, I cannot possibly say to what extend these two
>>>>>> concepts can be reconciled in the sense of sharing some common code.
>>>>>>
>>>>> Actually, I believe requirements are same.
>>>>>
>>>>> 'don't do i/o in dangerous period'.
>>>>>
>>>>> swsusp will just do sync() before entering dangerous period. That
>>>>> provides consistent-enough state...
>>>>>
>>>> As I've said many times before - if the requirement is "don't do
>>>> I/O" then you have to freeze the filesystem. In no way does 'sync'
>>>> prevent filesystems from doing I/O.....
>>>>
>>> Well, it seems we can handle this on the block layer level, by temporarily
>>> replacing the elevator with something that will selectively prevent fs I/O
>>> from reaching the layers below it.
>>>
>>
>> Why? What part of freeze_bdev() doesn't work for you?
>
> Well, my original problem - which is still an issue - is that a process
> writing to a frozen XFS filesystem is stuck in D state, and therefore
> cannot be frozen as part of suspend.

Well, if it is in D state but does not hold any important locks, you
can just add "try_to_freeze()" in the place where it is sleeping, right?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

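Pavel's suggestion is that a task sleeping on a frozen filesystem could check for a pending freeze request whenever it wakes, so the freezer counts it as frozen instead of timing out. A single-threaded userspace model of that difference follows; struct model_task, model_wakeup() and model_freeze_tasks() are hypothetical, and each "wake-up" is abstract. In a real kernel the sleeping task would also have to be woken for such a check to run at all, and, as Pavel notes, it must not hold any important locks at that point; the model ignores both.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state for the model. */
struct model_task {
    bool checks_freezer;   /* does its wait loop call a try_to_freeze() equivalent? */
    bool freeze_requested; /* set by the modeled freezer */
    bool frozen;           /* task has parked itself for suspend */
};

/* One wake-up of a task sleeping until a filesystem is thawed. The
 * filesystem stays frozen throughout, so the task never gets to write. */
void model_wakeup(struct model_task *t)
{
    if (t->checks_freezer && t->freeze_requested)
        t->frozen = true; /* the try_to_freeze() analogue */
    /* otherwise the task simply goes back to sleep in D state */
}

/* Modeled freezer: request a freeze, give every task a bounded number of
 * wake-ups, and return how many tasks are still not frozen. A nonzero
 * result corresponds to the freezer timing out. */
int model_freeze_tasks(struct model_task *tasks, int n, int wakeups)
{
    assert(n >= 0 && wakeups >= 0);
    for (int i = 0; i < n; i++)
        tasks[i].freeze_requested = true;
    for (int w = 0; w < wakeups; w++)
        for (int i = 0; i < n; i++)
            if (!tasks[i].frozen)
                model_wakeup(&tasks[i]);
    int not_frozen = 0;
    for (int i = 0; i < n; i++)
        if (!tasks[i].frozen)
            not_frozen++;
    return not_frozen;
}
```

With one task of each kind, model_freeze_tasks() returns 1: the plain D-state sleeper is the one that would make the freezer time out, as in Jeremy's original report.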
end of thread, other threads:[~2008-07-03 19:43 UTC | newest]

Thread overview: 50+ messages
2007-11-22 3:54 freeze vs freezer Jeremy Fitzhardinge
2007-11-23 23:47 ` Rafael J. Wysocki
2007-11-26 18:44 ` Jeremy Fitzhardinge
2007-11-26 21:20 ` Rafael J. Wysocki
2007-11-26 21:17 ` David Chinner
2007-11-26 21:53 ` Rafael J. Wysocki
2007-11-27 5:38 ` Matthew Garrett
2007-11-27 17:40 ` Rafael J. Wysocki
2007-11-27 20:33 ` Kyle Moffett
2007-11-27 23:01 ` Rafael J. Wysocki
2007-11-27 22:49 ` Jeremy Fitzhardinge
2007-11-27 23:14 ` Kyle Moffett
2007-11-27 23:32 ` Jeremy Fitzhardinge
2008-01-02 16:02 ` Pavel Machek
2008-01-02 21:30 ` Nigel Cunningham
2008-01-02 22:04 ` Rafael J. Wysocki
2008-01-03 9:19 ` Nigel Cunningham
2008-01-03 9:47 ` Oliver Neukum
2008-01-03 9:52 ` Nigel Cunningham
2008-01-03 11:15 ` Oliver Neukum
2008-01-03 22:06 ` Nigel Cunningham
2008-01-04 20:54 ` Oliver Neukum
2008-01-05 1:38 ` Kyle Moffett
2008-01-05 21:18 ` Pavel Machek
2008-01-05 23:01 ` Nigel Cunningham
2008-01-03 22:31 ` Rafael J. Wysocki
2008-06-23 7:16 ` Pavel Machek
2008-06-23 14:00 ` Henrique de Moraes Holschuh
2008-06-24 8:08 ` Elias Oltmanns
2008-06-26 15:09 ` Pavel Machek
2008-06-29 22:12 ` [xfs-masters] " Dave Chinner
2008-06-29 23:22 ` Rafael J. Wysocki
2008-06-30 6:11 ` Christoph Hellwig
2008-06-30 20:34 ` Rafael J. Wysocki
2008-07-03 19:43 ` Eric Sandeen
2008-06-30 6:29 ` Dave Chinner
2008-06-30 6:37 ` Jeremy Fitzhardinge
2008-06-30 12:33 ` Dave Chinner
2008-06-30 21:00 ` Rafael J. Wysocki
2008-06-30 22:21 ` Dave Chinner
2008-06-30 22:38 ` Rafael J. Wysocki
2008-07-01 6:38 ` Dave Chinner
2008-07-01 14:35 ` Rafael J. Wysocki
2008-07-01 15:05 ` Elias Oltmanns
2008-07-01 15:17 ` Christoph Hellwig
2008-07-01 21:15 ` Dave Chinner
2008-07-01 21:46 ` Elias Oltmanns
2008-07-01 21:12 ` Dave Chinner
2008-07-01 21:21 ` Rafael J. Wysocki
2008-07-01 8:59 ` Pavel Machek