public inbox for linux-kernel@vger.kernel.org
From: Theodore Tso <tytso@MIT.EDU>
To: Alberto Bertogli <albertito@blitiri.com.ar>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	dm-devel@redhat.com
Subject: Re: jbd2 inside a device mapper module
Date: Sat, 27 Dec 2008 14:29:50 -0500
Message-ID: <20081227192950.GB30198@mit.edu>
In-Reply-To: <20081227030020.GD4127@blitiri.com.ar>

On Sat, Dec 27, 2008 at 01:00:20AM -0200, Alberto Bertogli wrote:
> I have a couple of alternatives in mind, the most decent one at the
> moment is having two metadata copies (M1 and M2) for each block, and
> update M1 on the first write to the given block, M2 on the second, M1 on
> the third, and so on.

I don't see how this would help.  You still have to do synchronous
writes for safety, which is what is going to kill your performance.

What you want to do is to batch as many writes as possible.  Until the
underlying filesystem requests a flush, you can afford to hold off
writing the block to disk.  Otherwise, you'll end up turning each 4k
write into two 8k synchronous writes, which will be a performance
disaster.  If you hold off, it's much more likely that you'll be
able to batch a large number of blocks into a single transaction.
Also, if a block gets modified multiple times (for example, an
inode table block where tar writes one file, and then another),
holding off the write as long as possible lets you write the inode
table block only once, instead of multiple times.
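
To illustrate the batching idea, here is a minimal user-space sketch.
It is not jbd2 or device-mapper code; every name in it (write_batch,
batch_write, batch_flush, count_blocks_commit) is hypothetical, and the
fixed-size table stands in for whatever cache a real target would use.
Repeated writes to the same block overwrite the pending copy, so a
flush commits each dirty block exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define MAX_PENDING 64   /* arbitrary capacity chosen for this sketch */

/* One dirty block held in memory until the next flush. */
struct pending_block {
    uint64_t blocknr;
    unsigned char data[BLOCK_SIZE];
};

struct write_batch {
    struct pending_block blocks[MAX_PENDING];
    size_t count;        /* distinct dirty blocks currently held */
    size_t writes_seen;  /* write requests absorbed since last flush */
};

/* Hypothetical hook: writes all blocks as a single transaction. */
typedef int (*commit_fn)(const struct pending_block *blocks,
                         size_t count, void *arg);

static void batch_init(struct write_batch *b)
{
    b->count = 0;
    b->writes_seen = 0;
}

/* Absorb a write.  Returns 0 on success, -1 if the batch is full
 * (the caller would flush and retry). */
static int batch_write(struct write_batch *b, uint64_t blocknr,
                       const void *data)
{
    assert(b != NULL && data != NULL);
    for (size_t i = 0; i < b->count; i++) {
        if (b->blocks[i].blocknr == blocknr) {
            /* Block already pending: replace the old contents. */
            memcpy(b->blocks[i].data, data, BLOCK_SIZE);
            b->writes_seen++;
            return 0;
        }
    }
    if (b->count == MAX_PENDING)
        return -1;
    b->blocks[b->count].blocknr = blocknr;
    memcpy(b->blocks[b->count].data, data, BLOCK_SIZE);
    b->count++;
    b->writes_seen++;
    return 0;
}

/* Called when the layer above requests a flush: hand everything to
 * the commit hook at once, and empty the batch only on success. */
static int batch_flush(struct write_batch *b, commit_fn commit, void *arg)
{
    int ret = 0;
    if (b->count > 0)
        ret = commit(b->blocks, b->count, arg);
    if (ret == 0)
        batch_init(b);
    return ret;
}

/* Stand-in commit hook: only counts the blocks it was handed. */
static int count_blocks_commit(const struct pending_block *blocks,
                               size_t count, void *arg)
{
    (void)blocks;
    *(size_t *)arg += count;
    return 0;
}
```

With this sketch, three writes that touch only two distinct blocks
result in a single commit of two blocks at flush time.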

Note that this means that you have to wait until the last minute to
calculate the checksum, since the buffer could be modified after the
write request.  OCFS2 does this, by using a commit-time callback to
calculate the checksums used.
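
A rough sketch of the "checksum at commit time" ordering follows.  It
is not the OCFS2 or jbd2 interface; queued_buf, queue_buffer and
commit_checksums are hypothetical names, and the bitwise CRC-32 is
just a convenient stand-in for whatever checksum the target would
actually use.  The point is only that queueing a buffer records a
reference, and the checksum is computed over the contents as they are
when the commit runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_bitwise(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* A buffer queued for the next transaction.  The data pointer is
 * shared with the writer, so contents may still change after
 * queueing. */
struct queued_buf {
    unsigned char *data;
    size_t len;
    uint32_t csum;
    int csum_valid;
};

/* Queueing does NOT checksum: the buffer may be modified again. */
static void queue_buffer(struct queued_buf *q, unsigned char *data,
                         size_t len)
{
    assert(q != NULL && data != NULL);
    q->data = data;
    q->len = len;
    q->csum = 0;
    q->csum_valid = 0;
}

/* Commit-time step: checksum every queued buffer just before it
 * would be written out. */
static void commit_checksums(struct queued_buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bufs[i].csum = crc32_bitwise(bufs[i].data, bufs[i].len);
        bufs[i].csum_valid = 1;
    }
}
```

If the buffer is changed between queue_buffer() and
commit_checksums(), the stored checksum covers the final contents,
which is what has to match the data that reaches the disk.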

The bottom line is that doing something like this in an efficient way
is tricky.

> > Why not just use the ext3/4 external journal format?
> 
> Wouldn't that lead to confusion, because people can think the device
> holds an ext3/4 external journal, while it actually holds a
> device-mapper backing device that happens to contain a journal?

Not really; the external journal has a label and uuid, and the journal
superblock has a place to store the uuid of the "client" of the
journal.  So there is plenty of information available to tie an
external journal to some device-mapper backing device.
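
As a simplified illustration of that tie, the sketch below keeps the
journal's own UUID plus a list of client UUIDs and checks whether a
given device is registered as a client.  The struct is only loosely
modeled on the idea described above: the names (ext_journal_sb,
journal_add_client, journal_has_client) and the MAX_CLIENTS limit are
assumptions made for this example, and the real on-disk journal
superblock layout and byte order are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN    16
#define MAX_CLIENTS 48   /* capacity assumed for this sketch */

/* Simplified, in-memory view of an external journal superblock. */
struct ext_journal_sb {
    unsigned char uuid[UUID_LEN];               /* the journal itself */
    uint32_t nr_users;                          /* registered clients */
    unsigned char users[MAX_CLIENTS][UUID_LEN]; /* client UUIDs */
};

/* Returns 1 if `client` is recorded as a user of this journal. */
static int journal_has_client(const struct ext_journal_sb *sb,
                              const unsigned char *client)
{
    assert(sb != NULL && client != NULL);
    for (uint32_t i = 0; i < sb->nr_users && i < MAX_CLIENTS; i++) {
        if (memcmp(sb->users[i], client, UUID_LEN) == 0)
            return 1;
    }
    return 0;
}

/* Records `client` as a user.  Returns 0 on success (or if it is
 * already present), -1 if the client table is full. */
static int journal_add_client(struct ext_journal_sb *sb,
                              const unsigned char *client)
{
    if (journal_has_client(sb, client))
        return 0;
    if (sb->nr_users >= MAX_CLIENTS)
        return -1;
    memcpy(sb->users[sb->nr_users], client, UUID_LEN);
    sb->nr_users++;
    return 0;
}
```

A device-mapper target could use a check like journal_has_client()
at setup time to refuse a journal that belongs to some other device.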

> What would be the advantages of using the ext3/4 journal format, over a
> simple initial sector and the journal following?

There are already tools for finding the external journal, using the
blkid library.  So you only have to store the UUID of the journal in
the superblock of the device-mapper backing device, and then you can
easily find the external journal as follows:

       journal_fn = blkid_get_devname(ctx->blkid, "UUID", uuid);

						- Ted
