From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] e2fsprogs: Add undo I/O manager.
Date: Wed, 06 Jun 2007 15:32:27 +0530 [thread overview]
Message-ID: <46668633.7060403@linux.vnet.ibm.com> (raw)
In-Reply-To: <20070603231701.GA10140@thunk.org>
Theodore Tso wrote:
> On Tue, May 22, 2007 at 03:47:33PM +0530, Aneesh Kumar K.V wrote:
>> This I/O manager saves the contents of the location being overwritten
>> to a tdb database. This helps in undoing the changes done to the
>> file system.
>>
>> + /* loop through the existing entries and find if they overlap */
>> + for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
>> + data = tdb_fetch(tdb, key);
>> + /*
>> + * we are not interested in the data
>> + */
>> + free(data.dptr);
>
> Youch. This is going to be painful; it means that for every single
> write, we need to iterate over every single entry in the TDB. This
> doesn't scale terribly well. :-(
>
> I suspect you will do much better by using a different encoding for
> the tdb database; namely, one where you don't use an extent-based
> encoding, but rather something which is strictly block based. So that
> means a separate entry in the tdb database for each block entry. That
> will be slightly more inefficient in terms of overhead stored in the
> tdb database, yes, but as a percentage of the blocksize (typically
> 4k), the overhead isn't that great, and the performance improvement
> will be huge.
>
> The downside of doing this is that you need to take into account
> changes in blocksize via set_blksize() (this is rare; it's only done
> by mke2fs, and then only to zap various MD superblocks, et. al), and
> if the size is negative (in which case it represents the size in
> bytes, which could be more or less than a blocksize, and not
> necessarily a multiple of a blocksize).
>
If we allow to change the block size in between that would mean the
records that we store in the tdb database will be of variable size (
different block sizes). That would also add all the code/complexity that
i have in is_trans_overlapping. So if we are looking at avoiding the
above for() loop then we should have constant block size (4K ?). But in
your above statement, you are counting overhead as a percentage of
blocksize. So how do we handle this ?
Now with constant block size The write_blk gets complex because of there
won't be a 1:1 mapping between the block number we need to use in
tdb_database and the backing I/O manager.
> But that's not too bad, since now you just have to do is figure out
> which block(s) need to be saved, and then check to see if a record
> already exists in the tdb; if it does, the original value has been
> saved, and you don't have to do anything; if it doesn't then you just
> save the entire block. The undo manager need to save the first
> blocksize specified, and backup the original data in the tdb file in
> terms of that first blocksize, regardless of any later changes in the
> blocksize (which as I said is rare).
>
>> + if (req_loc < cur_loc) {
>> + if (req_loc+req_size > cur_loc) {
>> + /* ------------------
>> + * | | | |
>> + * ------------------
>> + * rl cl rs cs
>> + */
>
> This looks like notes to yourself; but it's not obvious what's going
> on here. Fortunately with the above suggested change in algorith,
> most of this will go away, but please make your comments more obvious.
>
rl -> req_loc
cl -> cur_loc
rs -> req_size
cs -> cur_size
>> + // I have libtdb1. Enable after rebasing to latest e2fsprogs that has this API
>> + //tdb_transaction_start(data->tdb);
>
> What version of e2fsprogs are you developing against?
>
Right now i am manually linking it against libtdb.
dpkg --search /usr/lib/libtdb.so
tdb-dev: /usr/lib/libtdb.so
>> + /* FIXME!! Not sure what should be the error */
>> + //tdb_transaction_cancel(data->tdb);
>> + retval = EXT2_ET_SHORT_WRITE;
>> + goto error_out;
>
> I would add a new error code in lib/ext2fs/ext2_err.et.in; that way we
> can have a very specific error code explaining that we have a failure
> to save the original version of the block.
>
Will do this
>> + actual = read(data->dev, tdb_data.dptr, size);
>> + if (actual != size) {
>
> Instead of using ext2fs_llseek() and read(), please use the underlying
> io_manager. That way the undo manager might have a chance to work on
> some other io manager other than unix_io. Yes, that means that you
> might in some cases need to save and restore the current blksize. But
> that's not going to be the common case, and it will make the code much
> more general.
>
>> +error_out:
>> + /* Move the location back */
>> + if (ext2fs_llseek(data->dev, cur_loc, SEEK_SET) != cur_loc) {
>> + retval = errno ? errno : EXT2_ET_LLSEEK_FAILED;
>> + goto error_out;
>> + }
>
> Why do you need to move the location back? As far as I know nothing
> should be depending on current offset of the file descriptor.
>
>
No specific reason.
-aneesh
next prev parent reply other threads:[~2007-06-06 10:03 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-22 10:17 [RFC PATCH 1/1] e2fsprogs: Add undo I/O manager Aneesh Kumar K.V
2007-05-22 10:17 ` Aneesh Kumar K.V
2007-06-03 23:17 ` Theodore Tso
2007-06-06 10:02 ` Aneesh Kumar K.V [this message]
2007-06-06 12:02 ` Theodore Tso
2007-06-07 11:40 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=46668633.7060403@linux.vnet.ibm.com \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.