From: David Masover <ninja@slaphack.com>
To: Hans Reiser <reiser@namesys.com>
Cc: Alex Zarochentsev <zam@namesys.com>, reiserfs-list@namesys.com
Subject: Re: resizer?
Date: Wed, 13 Apr 2005 23:25:14 -0500
Message-ID: <425DF0AA.9020804@slaphack.com>
In-Reply-To: <425D50FE.20501@namesys.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hans Reiser wrote:
> David Masover wrote:
>
>
>>
>>I realize that this may not be quite the industrial-strength repacker
>>that you wanted, but it should be immediately useful, which is a lot
>>better than "We might do it if you pay us."
>
>
> Just wait a little, and shortly after we go into the kernel we will work
> on the repacker.
>
> Hans
Disclaimer: I've hardly read any of the Reiser4 code, and I'm not
really an authority on this subject. I just like to pretend that I am.
I would take this off-list, but I'm curious about whether I'm wrong.
The repacker (and the resizer) don't seem like hugely complicated
concepts, unless you're trying to streamline the user experience during
the process. "On-line" means that I don't have to use a bootdisk and
stop all my servers. It doesn't mean that I would do it at any time
other than 2 AM, when I do backups, when I generally expect almost 0
traffic.
Basically, I'm saying that an off-line or a slow on-line shrinker should
have been done by now. In fact, it should have been done before the
meta-files, because meta-files benefit from a repacker, but not the
other way around.
Since you've told me to wait, I'm going to write this, because it's
easier for me to write documentation than to read code. This is
probably the fault of school, and will likely disappear this summer.
Anyway, this is how I think the resizer should be done:
If we are growing the FS, we should lock everything necessary, then
change the size value for the FS and make the new blocks available.
Unless we're actually storing something in unused nodes, this should be
an instantaneous operation which requires very little hacking to add. I
seem to remember that there was even an offline resizer (growing only)
a while ago.
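
A minimal sketch of the grow path described above, as a toy in-memory
model in Python. ToyFS, its fields, and grow() are hypothetical names for
illustration; none of this is Reiser4 code, and a real implementation
would also have to update on-disk allocation structures.

```python
import threading


class ToyFS:
    """Toy in-memory block allocator; not Reiser4 code."""

    def __init__(self, size_blocks):
        self.lock = threading.Lock()
        self.size_blocks = size_blocks         # size value the FS believes in
        self.free = set(range(size_blocks))    # free block numbers

    def allocate(self):
        with self.lock:
            block = min(self.free)             # lowest free block, for determinism
            self.free.remove(block)
            return block

    def grow(self, new_size_blocks):
        """Grow: take the lock, publish the new blocks, bump the size."""
        with self.lock:
            if new_size_blocks <= self.size_blocks:
                raise ValueError("grow() only handles larger sizes")
            self.free.update(range(self.size_blocks, new_size_blocks))
            self.size_blocks = new_size_blocks


fs = ToyFS(4)
for _ in range(4):
    fs.allocate()           # fill the original four blocks
fs.grow(8)
print(fs.allocate())        # 4, the first of the newly available blocks
```

Nothing already stored has to move, which is why growing can be close to
instantaneous in this model.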
If we are shrinking the FS, we first set the new size of the FS in RAM,
so that nothing will try to write to the "chopped-off" portion until
we're done.
Next, we turn off the "write-in-the-middle" feature for large
database-like files (where a block in the middle of a huge file may be
written twice to avoid fragmentation), so that absolutely no new writes
will go to the chopped-off portion.
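
The first two shrink steps can be sketched roughly as follows, again as a
toy Python model. ToyAllocator, begin_shrink(), and the overwrite_in_place
flag (a stand-in for the "write-in-the-middle" feature) are assumptions
made for illustration, not Reiser4 identifiers.

```python
class ToyAllocator:
    """Toy model; the names here are illustrative, not Reiser4 identifiers."""

    def __init__(self, size_blocks):
        self.disk_size = size_blocks       # size recorded on disk
        self.ram_size = size_blocks        # size the running FS enforces
        self.used = set()
        self.overwrite_in_place = True     # stand-in for "write-in-the-middle"

    def allocate(self):
        for block in range(self.ram_size):     # never looks past the in-RAM size
            if block not in self.used:
                self.used.add(block)
                return block
        raise OSError("no free block below the in-RAM size")

    def begin_shrink(self, new_size):
        if new_size >= self.ram_size:
            raise ValueError("begin_shrink() needs a smaller size")
        self.ram_size = new_size           # RAM only; disk_size is untouched
        self.overwrite_in_place = False

    def write(self, block):
        """Return the block an update to `block` will land on."""
        if self.overwrite_in_place:
            return block
        self.used.discard(block)           # the old copy is released
        return self.allocate()             # the new copy lands below ram_size


alloc = ToyAllocator(10)
alloc.used.update({0, 1, 9})               # pretend three blocks hold data
alloc.begin_shrink(6)
print(alloc.write(9), alloc.disk_size)     # 2 10
```

After begin_shrink(), every new or rewritten block ends up below the new
size, while the size on disk still says 10.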
Basically, the filesystem should already think it has shrunk by now; we
just need to make sure it doesn't freak out when it _reads_ blocks past
the new end of the FS. We should capture warnings about this and dirty
those nodes on the spot (nodes which are being read and which are in the
chopped section) -- they are already in RAM, so it'll be faster that way.
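
A rough sketch of the dirty-on-read idea, assuming a toy node cache in
Python. Node, ToyCache, and new_size are hypothetical names; the real read
path is certainly more involved.

```python
class Node:
    def __init__(self, block, data):
        self.block = block
        self.data = data
        self.dirty = False


class ToyCache:
    """Toy read path; new_size is the in-RAM size set when the shrink began."""

    def __init__(self, disk, new_size):
        self.disk = disk            # block number -> data
        self.new_size = new_size
        self.nodes = {}

    def read(self, block):
        node = self.nodes.get(block)
        if node is None:
            node = Node(block, self.disk[block])
            self.nodes[block] = node
        if block >= self.new_size:
            # Not an error during a shrink: the node is already in RAM,
            # so mark it dirty and let a later flush write it elsewhere.
            node.dirty = True
        return node


cache = ToyCache({0: "a", 9: "z"}, new_size=8)
print(cache.read(0).dirty, cache.read(9).dirty)   # False True
```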
Next, we start walking the tree (as you described), dirtying all the
blocks we find which are in the chopped portion and leaving the rest
alone. We need to be careful about locking here, but that should just
mean "Lock the block we're dealing with, or if locks aren't that
granular, lock the whole file." Locking should block, and userland
shouldn't have to know about it except to notice that the FS seems a
little slow right then.
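
The tree walk could look roughly like this in a toy Python model with one
lock per node. The Node class and mark_chopped() are hypothetical; the
sketch only marks nodes dirty and leaves the actual relocation to
whatever flushes dirty nodes.

```python
import threading


class Node:
    def __init__(self, block, children=()):
        self.block = block
        self.children = list(children)
        self.dirty = False
        self.lock = threading.Lock()


def mark_chopped(root, new_size):
    """Walk the tree and dirty every node stored at or past new_size."""
    marked = []
    stack = [root]
    while stack:
        node = stack.pop()
        with node.lock:                 # blocks until other users release the node
            if node.block >= new_size:
                node.dirty = True
                marked.append(node.block)
            stack.extend(node.children)
    return marked


tree = Node(1, [Node(12), Node(3, [Node(9), Node(4)])])
print(sorted(mark_chopped(tree, new_size=8)))   # [9, 12]
```

Nodes below the new size are visited but left alone, so the extra work is
proportional to the amount of data in the chopped portion.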
This isn't as dangerous as it seems. If there is a crash, we just go
back to the old size -- automatically, since the new size hasn't been
written to disk anywhere yet -- with the only difference being that most
of the files will be already moved to where we want them.
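
To illustrate the crash argument, here is a toy Python model in which the
size field is the last thing written. Disk, relocate_all(), and the
crash_after parameter are made up for the example, and it ignores
journaling entirely.

```python
class Disk:
    """Persistent state: this is what survives the simulated crash."""

    def __init__(self, size, blocks):
        self.size = size              # size field in the superblock
        self.blocks = dict(blocks)    # block number -> contents


def relocate_all(disk, new_size, crash_after=None):
    """Move blocks at or past new_size below it; write the size last."""
    moved = 0
    for old in sorted(b for b in disk.blocks if b >= new_size):
        if crash_after is not None and moved == crash_after:
            return False              # crash: the new size never reached disk
        # Lowest free block below new_size (raises StopIteration if none is left).
        new = next(b for b in range(new_size) if b not in disk.blocks)
        disk.blocks[new] = disk.blocks.pop(old)
        moved += 1
    disk.size = new_size              # the only step that commits the shrink
    return True


disk = Disk(10, {0: "a", 7: "b", 9: "c"})
relocate_all(disk, 6, crash_after=1)
print(disk.size, sorted(disk.blocks))   # 10 [0, 1, 9]
relocate_all(disk, 6)
print(disk.size, sorted(disk.blocks))   # 6 [0, 1, 2]
```

After the simulated crash the FS is still a valid 10-block filesystem, and
the second run only has to move what the first one did not get to.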
Locking isn't as hard as it seems. If this were a VFS-level operation,
we'd have to worry about a new directory being created, a file being
moved, or our current path being deleted out from under us, but we
aren't working on the semantic layer, we're working on the key/object
layer. If I'm right, that means that all the things that we'd have to
worry about are merely seen as new writes, and would thus go to the new
places.
Metadata blocks may need a tiny bit of special treatment, since they may
involve a small amount of data changing in place. All we do here is, when
we notice any attempted write outside the new FS size, but inside the
old FS size, we relocate before we flush it out to disk. If this means
there's some parent metadata block we need to move, we do it afterwards,
as part of the same transaction. When we finally get to a parent block
that does not need to be moved, we close the transaction. This isn't as
elegant as the method for moving data blocks, but it works. I think.
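
A sketch of that parent-chasing rule, assuming each metadata block knows
its parent. Meta, relocate_chain(), and the free_blocks list are
hypothetical, and the "transaction" here is just a list of moves.

```python
class Meta:
    def __init__(self, block, parent=None):
        self.block = block
        self.parent = parent


def relocate_chain(node, new_size, free_blocks):
    """Relocate node, then each ancestor that also lies at or past new_size.

    Returns the list of (old, new) moves that make up one transaction.
    """
    transaction = []
    while node is not None and node.block >= new_size:
        new_block = free_blocks.pop(0)
        transaction.append((node.block, new_block))
        node.block = new_block
        node = node.parent      # the parent has to record the child's new address
    # node is now the first ancestor that stays where it is (or None past
    # the root), so the transaction closes here.
    return transaction


root = Meta(2)
mid = Meta(11, parent=root)
leaf = Meta(13, parent=mid)
print(relocate_chain(leaf, new_size=8, free_blocks=[5, 6, 7]))
# [(13, 5), (11, 6)]
```

The root at block 2 is already inside the new size, so the chain stops
there after two moves.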
The nice thing about this is that for the most part, the net impact on
normal FS operation is about the same as that of doing a large "cp -a".
Thoughts? How close to right is this? Do you already have another
document on the same thing that I should be reading?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iQIVAwUBQl3wqXgHNmZLgCUhAQIIGw//SWn2lkNPAGrFcF/r+Vr3t84l/haxnDFL
AF9/xARb6vQ/Mu/AQEd/L8lNabLPymXdzfBUJan2mhLFH97SrlGrA3hdBDcd9xMi
LXlvernTOFcv63jTB2cEq4awnMpTih4mZFrp1qAJ0kcWSu8oaCBUaOk3htXBfuKU
YAkireyHU6EWV2HQlfHmJrd9G/Z0CR6JmmAfVeBKG1CkI0t4Y86GmbeMVqsLdSz1
VEHfdTsCWgcaaod5GOjMk7BbB1a+fvf2wDk3ZsTiCkk8KP1JYPjKnXpCgG3ts8np
hMH1CEDj2Ql+lga8s44fXc0zrez6OAMjzMc/erNc6eUA7iFedQhmQW5oPxMu7TNh
aDF8PekMeYF1cYR1gFXG7B2P5gFx/k2KqDCxzHFNGKZLtSDBvuVlotDD7oJspYpd
5qvVQ0Mj1iYe6bxnV11rCHOvE2f56JlrFJtmzmEI0vmsln0sE4WktxKFONddwf5H
FuEn0L6XB+HkA9gsvkrM3J5xTd2PP1G02oF1MQFRIe3+CsomlSOwE1ZjfEi81s/p
z3Lvz6+0AO8xS7L2et84/y6uCaTb2/z8LZUhKMKx2j+OaUSBqgrzTYjCcotYooYO
7OM4KrSwpjmYQkCtm+iGYy8eC9sv09ng+YsE7F0MJlV1YZ17wo0eRbOVoUJpIgFF
kyq/7WnIUwM=
=D55S
-----END PGP SIGNATURE-----