From: Daniel Phillips <phillips@phunq.net>
To: "Duane Griffin" <duaneg@dghda.com>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
Theodore Tso <tytso@mit.edu>,
sct@redhat.com, akpm@linux-foundation.org, adilger@clusterfs.com
Subject: Re: [RFC, PATCH 0/6] ext3: do not modify data on-disk when mounting read-only filesystem
Date: Wed, 12 Mar 2008 19:22:21 -0800
Message-ID: <200803122022.22814.phillips@phunq.net>
In-Reply-To: <1204768754-29655-1-git-send-email-duaneg@dghda.com>

Hi Duane,

Thanks for doing this. Some perhaps not so obvious fallout from the bad
old way of doing things is that ddsnap (Zumastor) hits an issue in
replication. Since ddsnap allows journal replay on the downstream
server and also needs to have an unaltered snapshot to apply deltas
against, if we do not take special care, Ext3 will come along and
modify the downstream snapshot even when told not to. Our solution:
take two snapshots per replication cycle (pretty cheap) so that one can
be clean and the other can be stepped on at will by the journal replay.
Ugh.

With your hack, we can eventually drop the double snapshot, provided no
other filesystem is similarly badly behaved.

Re your page translation table: we already have a page translation
table, it is called the page cache. If you could figure out which file
(or metadata) each journal block belongs to, you could just load the
page table pages back in and presto, done. No need to replay the
journal at all, you are already back to journal+disk = consistent
state.

I probably have missed a detail or two since I haven't looked closely at
how orphan inodes work, revokes, probably other things, but there is
the basic idea. SCT, does my reasoning hold water? (In fact,
ddsnap "replays" its own journal in exactly this way. Cache state is
reconstructed and no actual journal flush is performed.)
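
To make the "journal+disk = consistent state" idea concrete, here is a
minimal toy sketch in Python. It is not jbd or ddsnap code: the names
build_overlay and read_block, the dict-based "disk", and the transaction
layout are all assumptions made up for illustration, and it ignores
revokes and orphan inodes entirely. It only shows that reads can be
served from a cache built out of the journal while the on-disk blocks
stay untouched.

```python
# Toy model only (hypothetical names, not kernel code).
# A "disk" is a dict of block number -> bytes; the journal is a list of
# transactions in commit order.

def build_overlay(journal):
    """Map each journaled block number to its newest committed contents."""
    overlay = {}
    for txn in journal:
        if not txn["committed"]:
            break  # this toy stops at the first uncommitted transaction
        for block_no, data in txn["blocks"]:
            overlay[block_no] = data  # later transactions win
    return overlay


def read_block(disk, overlay, block_no):
    """Read through the overlay first, falling back to the untouched disk."""
    if block_no in overlay:
        return overlay[block_no]
    return disk[block_no]


disk = {0: b"super", 1: b"old inode table", 2: b"old bitmap"}
journal = [
    {"committed": True, "blocks": [(1, b"inode table v1")]},
    {"committed": True, "blocks": [(1, b"inode table v2"), (2, b"new bitmap")]},
    {"committed": False, "blocks": [(0, b"half-written super")]},
]

snapshot_before = dict(disk)          # copy, to check the disk is never written
overlay = build_overlay(journal)      # cache state rebuilt from the journal
view = {n: read_block(disk, overlay, n) for n in disk}
```

Here view is what a reader of the read-only mount would see, and disk
is never written. Whether the overlay lives in a separate translation
table or in the page cache is exactly the design question raised above.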

Anyway, this is just a theoretical comment, it is in no way a suggestion
for a rewrite. The reason for that being, you do not have any
convenient way to map physical journal blocks back to files and
metadata. Maybe if we do implement reverse mapping for Ext3/4 later
(not just a pipe dream) we could revisit this and lose your extra
mapping. As it stands your solution seems well built, after a quick
readthrough. Nice looking code. I think you added about 250 lines
overall, so tight too. Thanks again.

Daniel