public inbox for linux-kernel@vger.kernel.org
From: David Chinner <dgc@sgi.com>
To: Pavel Machek <pavel@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	Alasdair G Kergon <agk@redhat.com>,
	Eric Sandeen <sandeen@redhat.com>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, dm-devel@redhat.com,
	Srinivasa DS <srinivasa@in.ibm.com>,
	Nigel Cunningham <nigel@suspend2.net>,
	David Chinner <dgc@sgi.com>
Subject: Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
Date: Mon, 13 Nov 2006 10:30:54 +1100
Message-ID: <20061112233054.GI11034@melbourne.sgi.com>
In-Reply-To: <20061112184310.GC5081@ucw.cz>

On Sun, Nov 12, 2006 at 06:43:10PM +0000, Pavel Machek wrote:
> Hi!
> 
> > > Okay, so you claim that sys_sync can stall, waiting for administrator?
> > > 
> > > In such case we can simply do one sys_sync() before we start freezing
> > > userspace... or just move the only sys_sync() there. That way, admin
> > > has chance to unlock his system.
> > 
> > Well, this is a different story.
> > 
> > My point is that if we call sys_sync() _anyway_ before calling
> > freeze_filesystems(), then freeze_filesystems() is _safe_ (either the
> > sys_sync() blocks, or it doesn't in which case freeze_filesystems() won't
> > block either).
> > 
> > This means, however, that we can leave the patch as is (well, with the minor
> > fix I have already posted), for now, because it doesn't make things worse a
> > bit, but:
> > (a) it prevents xfs from being corrupted and
> 
> I'd really prefer it to be fixed by 'freezeable workqueues'.

I'd prefer that you just freeze the filesystem and let the
filesystem do things correctly.

> Can you
> point me into sources -- which xfs workqueues are problematic?

AFAIK, it's the I/O completion workqueues that are causing problems
(fs/xfs/linux-2.6/xfs_buf.c). However, thinking about it, I'm not
sure that the work queues being left unfrozen is the real problem.

That is, after a sync there's still I/O outstanding (e.g. metadata in
the log but not on disk), and because the kernel threads are frozen
some time after the sync, we could have issued this delayed write
metadata to disk after the sync. With XFS, we can have a queue of
thousands of metadata buffers for delwri, and they are all issued
async and can take many seconds for the I/O to complete.

The I/O completion workqueues will continue to run until all I/O
stops, and metadata I/O completion will change the state of the
filesystem in memory.

However, even if you stop the workqueue processing, you're still
going to have to wait for all I/O completion to occur before
snapshotting memory because having any I/O complete changes memory
state.  Hence I fail to see how freezing the workqueues really helps
at all here....

Given that the only way to track and block on these delwri metadata
buffers is to issue a sync flush rather than an async flush, suspend
has to do something different to guarantee that we block until all
those I/Os have completed, i.e. freeze the filesystem.
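
To make the sync-versus-async distinction concrete, here is a toy
userspace model. It is only a sketch and not XFS code: the names
toy_fs, flush_async, flush_sync, complete_one_io and safe_to_snapshot
are all hypothetical, and I/O completion is simulated by a plain
function call rather than by an interrupt or a workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel or in XFS. */
struct toy_fs {
	size_t delwri_queued;   /* dirty metadata buffers not yet issued */
	size_t io_in_flight;    /* buffers issued, completion still pending */
	size_t state_changes;   /* in-memory updates made by I/O completion */
};

/* Simulate one I/O completion: it modifies in-memory filesystem state. */
static bool complete_one_io(struct toy_fs *fs)
{
	if (fs->io_in_flight == 0)
		return false;
	fs->io_in_flight--;
	fs->state_changes++;
	return true;
}

/* Async flush: issue every queued buffer and return without waiting. */
static void flush_async(struct toy_fs *fs)
{
	fs->io_in_flight += fs->delwri_queued;
	fs->delwri_queued = 0;
}

/* Sync flush: issue every queued buffer, then wait for all completions. */
static void flush_sync(struct toy_fs *fs)
{
	flush_async(fs);
	while (complete_one_io(fs))
		;
	assert(fs->io_in_flight == 0);
}

/* Memory is only safe to snapshot when nothing can still change it. */
static bool safe_to_snapshot(const struct toy_fs *fs)
{
	return fs->delwri_queued == 0 && fs->io_in_flight == 0;
}
```

In this model, calling flush_async() on a toy_fs with thousands of
queued buffers leaves safe_to_snapshot() false, because every pending
completion will still bump state_changes. After flush_sync() it is
true, which is the guarantee a freeze is meant to provide.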

So the problem, IMO, is that suspend is not telling the filesystem
to stop doing stuff, and so we are getting caught out by doing
stuff that suspend assumes won't happen but does nothing
to prevent.
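
For readers who want to see what "telling the filesystem to stop"
looks like from userspace, the sketch below uses the generic
FIFREEZE/FITHAW ioctls from <linux/fs.h>. This is an assumption-laden
illustration: that interface was added to mainline after this thread,
it is not the in-kernel freeze_bdev() path discussed here, and the
helper name freeze_then_thaw is hypothetical. It needs root and a
filesystem that supports freezing.

```c
#include <fcntl.h>
#include <linux/fs.h>    /* FIFREEZE, FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: freeze the filesystem mounted at 'mountpoint',
 * then thaw it again. Returns 0 on success and -1 on any failure.
 */
static int freeze_then_thaw(const char *mountpoint)
{
	int fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Ask the filesystem to quiesce; returns once it is frozen. */
	if (ioctl(fd, FIFREEZE, 0) < 0) {
		close(fd);
		return -1;
	}

	/* A snapshot of the frozen filesystem would be taken here. */

	/* Always thaw, otherwise writers stay blocked indefinitely. */
	int ret = ioctl(fd, FITHAW, 0);
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

The point of the ordering is that the freeze call does not return
until the filesystem has stopped modifying itself, so nothing between
freeze and thaw has to guess whether I/O is still outstanding.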

> > (b) it prevents journaling filesystems in general from replaying journals
> > after a failing resume.

This is incorrect.  Freezing an XFS filesystem _ensures_ that log
replay occurs on thaw or a failed resume.  XFS specifically dirties
the log after a freeze down to a consistent state so that the
unlinked inode lists get processed by recovery on thaw/next mount.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
