public inbox for linux-kernel@vger.kernel.org
* File System conversion -- ideas
@ 2003-06-29  6:57 rmoser
  2003-06-30 13:05 ` Jesse Pollard
  0 siblings, 1 reply; 88+ messages in thread
From: rmoser @ 2003-06-29  6:57 UTC
  To: linux-kernel

I know I spout a ... wtf?  HTML composing?  *attempts to eliminate*

I know I spout a lot of crap, and wish I could just do it all (can we get
a "Make a small device driver for virtual hardware in Linux 2.4 and 2.5"
tutorial up on kernel.org?!), but I think I've got some good ideas.  At
any rate, the good is kept and the bad is weeded out, right?

Anyhow, I'm thinking still about when reiser4 comes out.  I want to
convert to it from reiser3.6.  It came to my attention that a user-space
tool to convert between filesystems is NOT the best way to deal with
this. Seriously, you'd think it would be, right?  Wrong, IMHO.

You have the filesystem code for every filesystem Linux supports.  It's
there, in the kernel.  So why maintain a kludgy userspace tool that has
to be rewritten to understand them all?  I have a better idea.

How about a kernel syscall?  It's possible to do this on a running
filesystem but it's far too difficult for a start, so let's start with
unmounted filesystems mmkay?

**** BEGIN WELL STRUCTURED MESSAGE ****

I'm going to go over a method of building a filesystem conversion suite into
the kernel.  I will first give a brief overview of the concept, then draw up a
roadmap, and then explain why I believe this is the best way to solve this
problem.

What I am suggesting is a kernel syscall suite that will allow a simple
userspace application to invoke a conversion between filesystems on an
unmounted filesystem.  The idea is that instead of maintaining the tool,
you (sorry I keep wanting to say "we" for no reason, so excuse me if I do)
simply code this and then maintain the kernel as usual, almost forgetting
about the tool because it changes with the kernel.

The first thing that has to happen is that the kernel filesystem drivers must
be altered so that each filesystem can draw out the meta-data, group it with
the data, hand it to the conversion functions, and accept such data back to be
rewritten.  This will require a quick pre-pass over each individual inode to
decide whether the converted filesystem will actually fit on disk; ext3 being
converted to ext3 with a larger block size will FAIL if the conversion causes
the data to grow bigger than the media.
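Here is a minimal sketch of that fit check, in Python for illustration only. It assumes every file occupies whole blocks. It also assumes the caller supplies an estimate of fixed metadata overhead (inode tables, bitmaps, journal). The function names are hypothetical and are not part of any kernel interface.

```python
def blocks_needed(size_bytes: int, block_size: int) -> int:
    """Whole blocks needed to hold one file's data (ceiling division)."""
    return -(-size_bytes // block_size)


def fits_after_conversion(file_sizes, new_block_size, device_bytes, overhead_bytes=0):
    """Return (fits, bytes_needed) for a hypothetical conversion.

    Sums the block-rounded size of every file under the new block size,
    plus a caller-supplied estimate of fixed metadata overhead.
    """
    needed = sum(blocks_needed(s, new_block_size) * new_block_size
                 for s in file_sizes)
    needed += overhead_bytes
    return needed <= device_bytes, needed
```

For example, 1000 one-byte files take 4,096,000 bytes at a 4K block size but 8,192,000 bytes at 8K. On a 6,000,000-byte device, the 8K conversion would therefore have to be refused up front.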

The second thing that must happen is that a syscall has to be added to invoke
the conversion.  Simple.  Preferably the work would be forked off from the
kernel into a new thread or process running in userspace, so that it doesn't
lock up the kernel while it executes or lag userspace by making the kernel eat
massive resources.

The last thing is that a user-space program to invoke these syscalls has to be
coded.

Here is a suggested roadmap, with excessive detail:

1) Create a method for storing meta-data for each file/directory on a filesystem
which is being slowly destroyed.  The data structures have to house everything
including the data that goes into the inode tables and all meta-data about the
inode, plus the data for the file/directory itself.  It MUST be object oriented,
because some meta-data will not transfer from one filesystem to another.  Each
unit should, possibly MUST, be compressed, since it MAY be larger than the
original input and also because it likely will have to be stored in the space of the
original data, unless the data is slowly shifted down the filesystem.  It is
preferable to make this datasystem fault tolerant, so that if it goes down, the
conversion can be continued without damage.  It should be possible to plan a
conversion on umount, so that the root filesystem may be converted at shutdown.
  - Object oriented:  Store meta-data that may not be recognized by the new
    filesystem
  - Journalized:  Don't break!
  - Compress each data unit:  Don't get bigger than the input.  Compression is
    optional per unit (already-compressed files get bigger when compressed again!)
  - Store the data needed to resume the conversion at any time:  There may be
    a colossal system crash during conversion!
  - Differentiate between each filesystem structure and the datasystem used during
    conversion:  Must be able to disassemble one filesystem and reassemble it to
    another WITHOUT getting lost!
  - ... I had another important thing I forgot for now.  You guys are smart and there's
    more of you than me.  You figure it out.

2) Write this datastructure into the filesystems section of the kernel.

3) Rewrite the filesystem drivers in the kernel to be able to communicate with
the filesystem conversion datastructure code.  This will allow the slow systematic
destruction of the filesystem in place and at the same time the slow systematic
creation of the new filesystem for EVERY filesystem [with write support?] in the
kernel.

4) Implement a syscall to initiate this process.  The functions should be run in
userspace if it is possible to fork execution out of the kernel and into userspace.
These syscalls perform the checks to make sure the filesystem is not mounted.

5) Implement a userspace program to call these functions.  It is a slave to the
syscalls; it does NOT do the checks itself.

6) Revisit steps 1 through 5 as needed until the process works properly.

7) Continue on to recode the kernel VFS to allow the conversion to take place on a
running filesystem, and to allow that filesystem to be mounted even if the
conversion was forcibly stopped.  This will allow a smoother conversion of the
root filesystem and allow the user to keep running during conversion of the root
filesystem, although likely with massive latency.
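To illustrate step 1, here is a hypothetical conversion unit. It carries open-ended metadata and compresses each unit only when that actually shrinks it. It is paired with a journal of completed inodes, so an interrupted run can resume. This is a userspace Python sketch built on those assumptions. The class names, the JSON record format and the use of zlib are all choices made for the example and are not part of the proposal.

```python
import json
import zlib
from dataclasses import dataclass, field


@dataclass
class ConversionUnit:
    """One file/directory lifted out of the old filesystem (hypothetical format).

    `attrs` is an open-ended dict so that metadata the target filesystem
    does not understand can be carried along (or dropped) explicitly.
    """
    inode: int
    attrs: dict = field(default_factory=dict)
    payload: bytes = b""
    compressed: bool = False

    @classmethod
    def pack(cls, inode: int, attrs: dict, data: bytes) -> "ConversionUnit":
        # Compress per unit, but keep the raw bytes whenever compression
        # would not shrink them (already-compressed data often grows).
        packed = zlib.compress(data)
        if len(packed) < len(data):
            return cls(inode, attrs, packed, True)
        return cls(inode, attrs, data, False)

    def unpack(self) -> bytes:
        return zlib.decompress(self.payload) if self.compressed else self.payload


class ConversionJournal:
    """Append-only log of finished inodes, so a crashed run can resume.

    Kept as a list of JSON lines here; on disk each record would have to be
    written and flushed before the corresponding old blocks are reused.
    """

    def __init__(self, lines=None):
        self.lines = list(lines or [])

    def mark_done(self, inode: int) -> None:
        self.lines.append(json.dumps({"done": inode}))

    def done_set(self) -> set:
        return {json.loads(line)["done"] for line in self.lines}


def convert(units, journal, write_new):
    """Write every unit not already recorded as done, journaling each one."""
    finished = journal.done_set()
    for unit in units:
        if unit.inode in finished:
            continue  # already converted before the crash
        write_new(unit.inode, unit.attrs, unit.unpack())
        journal.mark_done(unit.inode)
```

The key ordering choice is that a unit is journaled as done only after it has been written out. Replaying the journal after a crash can therefore skip finished units but never skips unfinished ones.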

I believe this is the best method of dealing with the problem of filesystem
conversion. Current methods include making a new, empty filesystem and
copying over all files as root with a 'umask 000' command first.  Future methods
excluding this one may include a userspace program with understanding of
multiple filesystems, or a userspace program that understands kernel modules.
These have the following flaws:

 - Copying the files requires a large amount of disk space to create a new
    partition and place the new filesystem on it.  Also, it alters the entire
    partition layout and forces the user to either ping-pong between partitions
    or rewrite his /etc/fstab and possibly his root= parameter in his kernel command
    line.
 - A userspace program with filesystems coded in will require constant maintenance
    as new filesystems are created and old filesystems are maintained.  The
    constant development of filesystems such as ext2/3 (e.g. 2.0 doesn't understand
    the extra features in the version of ext2 that 2.2 has) and reiserfs means
    code in the userspace program has to be rewritten, which is redundant
    because the filesystem has already been implemented in the kernel itself,
    in a form the userspace program can't reuse.
 - A userspace program using kernel modules will be more prone to bugs, as it has
    to simultaneously grok all versions of kernel modules.  The structure of kernel
    modules changes as time goes on.  The program will grow and grow, or lose the
    ability to communicate with older kernel versions.  It has the advantage that it
    may be written to also grok older kernel modules; however, the entire
    infrastructure described above may be backported to older kernel versions,
    making this argument moot.

My method has the flaw that you have to get the kernel developers to agree to it.
If Linus is reading this, I'd like to at least ask that you hold back rejecting this until
the other developers have a chance to examine it (pessimistic outlook I have on
things, isn't it?).  It also has the flaw of requiring massive kernel-level work to
implement, which in itself may render the kernel useless if not done quite right.
These changes should not affect the normal operation of the filesystem drivers
until the kernel explicitly calls them to do the conversions (and only when the
option is enabled in make {menu | x}config).  It is also flawed in that it requires
massive amounts of CPU time and possibly memory; but that is expected of any
conversion of filesystems.

Well, tell me what you think.  This is where my thinking ends.


^ permalink raw reply	[flat|nested] 88+ messages in thread
* Re: File System conversion -- ideas
@ 2003-06-29 10:11 John Bradford
  2003-06-29 13:28 ` Jamie Lokier
  2003-06-29 18:26 ` rmoser
  0 siblings, 2 replies; 88+ messages in thread
From: John Bradford @ 2003-06-29 10:11 UTC
  To: linux-kernel, mlmoser

> Anyhow, I'm thinking still about when reiser4 comes out.  I want to
> convert to it from reiser3.6.  It came to my attention that a user-space
> tool to convert between filesystems is NOT the best way to deal with
> this. Seriously, you'd think it would be, right?  Wrong, IMHO.
>
> You have the filesystem code for every filesystem Linux supports.  It's
> there, in the kernel.  So why maintain a kludgy userspace tool that has
> to be rewritten to understand them all?  I have a better idea.
>
> How about a kernel syscall?  It's possible to do this on a running
> filesystem but it's far too difficult for a start, so let's start with
> unmounted filesystems mmkay?

Apart from the special case of converting from one major version of a
filesystem to another major version of the same filesystem, I think
the performance of an on-the-fly filesystem conversion utility is
going to be so much worse than just creating a new partition and
copying the data across, that the only reason to do it would be if you
could do it on a read-write filesystem without unmounting it.

What I'd like to see is union mounts which allowed you to mount a new
filesystem of a different type over the original one, and have all new
writes go to the new filesystem.  I.E. as files were modified, they
would be re-written to the new FS.  That would be one way of avoiding
the performance hit on a busy server.
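Here is a toy model of the union mount idea. It assumes files are whole byte strings keyed by path. Reads fall through to the old filesystem, and any write or modification lands on the new one. Deletion (whiteouts) is not modelled. The sketch does not follow any particular implementation, such as BSD's mount_union; it only shows the copy-up behaviour.

```python
class UnionMount:
    """Toy model of a union mount: new FS (upper) over old FS (lower).

    Reads fall through to the lower layer; every write lands in the upper
    layer, so files migrate as they are modified. Paths map to bytes.
    """

    def __init__(self, lower: dict):
        self.lower = lower  # old filesystem, treated as read-only
        self.upper = {}     # new filesystem

    def read(self, path: str) -> bytes:
        if path in self.upper:
            return self.upper[path]
        return self.lower[path]  # raises KeyError if the file exists nowhere

    def write(self, path: str, data: bytes) -> None:
        self.upper[path] = data

    def append(self, path: str, data: bytes) -> None:
        # Copy-up: a partial modification first pulls the old contents up.
        self.upper[path] = self.read(path) + data
```

Files that are never modified stay on the old filesystem indefinitely in this model. That is why a separate step is needed to finish the migration.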

John.

* Re: File System conversion -- ideas
@ 2003-06-29 16:13 John Bradford
  2003-06-29 19:16 ` Jamie Lokier
  0 siblings, 1 reply; 88+ messages in thread
From: John Bradford @ 2003-06-29 16:13 UTC
  To: jamie, john; +Cc: linux-kernel, mlmoser

> > I think
> > the performance of an on-the-fly filesystem conversion utility is
> > going to be so much worse than just creating a new partition and
> > copying the data across,
>
> which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
> disk, and nothing else.

Well, I don't partition all of the space on every new disk I buy
straight away, I partition off what I think I'll need, and leave the
rest unallocated.

> > that the only reason to do it would be if you
> > could do it on a read-write filesystem without unmounting it.
>
> IMHO even if it requires the filesystem to be unmounted, it would
> still be useful.  More challenging to use - you'd have to boot and run
> from ramdisk, but much more useful than not being able to convert at all.

Only if it is the root filesystem, and the filesystem type used for root
generally isn't going to affect overall performance that much.

> > What I'd like to see is union mounts which allowed you to mount a new
> > filesystem of a different type over the original one, and have all new
> > writes go to the new fileystem.  I.E. as files were modified, they
> > would be re-written to the new FS.  That would be one way of avoiding
> > the performance hit on a busy server.
>
> But useless unless you have a second disk lying around that you don't
> use for anything but filesystem conversions.

Not at all.  You can just use unpartitioned space on your existing
disk.

John.

* Re: File System conversion -- ideas
@ 2003-06-29 16:24 John Bradford
  0 siblings, 0 replies; 88+ messages in thread
From: John Bradford @ 2003-06-29 16:24 UTC
  To: linux-kernel, thervoy; +Cc: jamie, john, mlmoser

> > I think
> > 
> >>the performance of an on-the-fly filesystem conversion utility is
> >>going to be so much worse than just creating a new partition and
> >>copying the data across,
> > 
> > 
> > which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
> > disk, and nothing else.
> >
>
> I think that filesystem conversion on-the-fly is useless. Why? If you're
> converting a filesystem, you have to make a good backup of the data
> from that filesystem.

I agree.

Imagine a webserver with all its webpages on a 40 GB EXT-2 partition
on /dev/sda1.

If I wanted to move the data on to a ReiserFS partition, I would just:

* Create the new partition on another device, E.G. /dev/sdb1
* Mount /dev/sda1 read-only
* Copy the data across to /dev/sdb1 as a nice process
* Stop the webserver processes
* Unmount /dev/sda1
* Mount /dev/sdb1 read-only
* Restart the webserver processes
* Test it
* Mount /dev/sdb1 read-write
* Keep /dev/sda1 around as a quick-to-access backup until I was sure
  it was all working correctly.
* Re-use /dev/sda1

The webserver would be off-line for only a few seconds, and
performance would not be significantly degraded at any time.

John.

* Re: File System conversion -- ideas
@ 2003-06-29 18:37 John Bradford
  2003-06-29 18:48 ` rmoser
  2003-06-29 19:42 ` Jamie Lokier
  0 siblings, 2 replies; 88+ messages in thread
From: John Bradford @ 2003-06-29 18:37 UTC
  To: linux-kernel, wowbagger

> This is a place where logical volume management can help.
>
> For example, suppose you have a 60G disk, 55G of data, in ext2, and you 
> wish to convert to ReiserFS.
>
> Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility 
> for the source file system (which exists for the major file systems in 
> use today).
> Step 2: Create an LVM block in the remaining 5G.
> Step 3: Create a ReiserFS in the LVM block.
> Step 4: Move 5G of data from the ext2 system to the ReiserFS block.
> Step 5: Shrink the ext2 volume by another 5G
> Step 6: Convert that 5G into an LVM block
> Step 7: Add that block to the ReiserFS volume group.
> Step 8: Grow the ReiserFS.
> Step 9: Repeat 4-8 as needed.
>
>
> This is why I'd really love to see LVM|EVM become standard, not just in 
> the kernel but in the distributions - if every distro by default made 
> all Linux volumes in LVM, then migrating data to bigger drives/adding 
> more space/converting file systems would be so much easier.

It's also a good reason not to use one huge partition on each disk,
and a good reason not to partition the whole disk when it's not
needed.
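The stepwise LVM conversion quoted above can be simulated to see how many move/shrink/grow rounds it takes. This sketch makes three assumptions: sizes are whole GB, filesystem and LVM overhead is ignored, and step 1 shrinks the old filesystem to exactly its data size.

```python
def lvm_migration_rounds(disk_gb: int, data_gb: int):
    """Simulate the shrink/move/grow loop; return (rounds, final new-FS size).

    Sizes are whole GB, overhead is ignored, and step 1 is assumed to
    shrink the old filesystem to exactly its data size.
    """
    new_fs = disk_gb - data_gb  # steps 2-3: new FS in the freed space
    if new_fs <= 0:
        raise ValueError("need some free space to start the new filesystem")
    old_data, new_data, rounds = data_gb, 0, 0
    while old_data > 0:
        moved = min(new_fs - new_data, old_data)  # step 4: fill the new FS
        old_data -= moved
        new_data += moved
        new_fs += moved  # steps 5-8: shrink old FS by `moved`, grow new FS by it
        rounds += 1
    return rounds, new_fs
```

With the quoted numbers (a 60G disk and 55G of data), it gives 11 rounds. The new filesystem's free space never grows beyond the initial gap, because it grows exactly as fast as it fills. The number of rounds is therefore roughly the amount of data divided by the initial free space.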

I've seen, (mainly desktop, not server), Linux machines with one
physical disk containing two partitions, root and swap, with the swap
partition being twice the physical memory of the box, even when the
box has more than a gigabyte of physical RAM.

It's usually more flexible just to partition the space you need, and
add more partitions when necessary.  For typical desktop use, swap
isn't even necessary with 1 GB of physical RAM.

For example, if you have an 80 GB disk, you could initially partition
10 GB for the root partition, and leave 70 GB unused.  When the root
partition fills up, you can simply use du -s /* to see which
directories are taking up the most space, and move them to separate
partitions.
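For readers without du handy, here is a short Python sketch that answers the same "which top-level directory is biggest" question. It sums apparent file sizes, whereas du counts allocated blocks, so the numbers will differ somewhat. Apparent sizes are misleading for special files such as /proc/kcore, so the sketch skips a few pseudo-filesystems by default; that skip list is an assumption of the example.

```python
import os


def dir_usage(path: str) -> int:
    """Total apparent size in bytes of regular files under `path`.

    Roughly what `du -s` reports, except du counts allocated blocks rather
    than file lengths. Symlinks are not followed and are not counted.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                try:
                    total += os.lstat(full).st_size
                except OSError:
                    pass  # vanished or unreadable; skip it
    return total


def biggest_dirs(top: str = "/", limit: int = 5,
                 skip=frozenset({"/proc", "/sys", "/dev"})):
    """Top-level directories under `top`, largest first, as (bytes, path)."""
    sizes = []
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.path in skip:
                continue
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_usage(entry.path), entry.path))
    return sorted(sizes, reverse=True)[:limit]
```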

John.

* Re: File System conversion -- ideas
@ 2003-06-29 18:58 John Bradford
  2003-06-29 19:12 ` rmoser
  0 siblings, 1 reply; 88+ messages in thread
From: John Bradford @ 2003-06-29 18:58 UTC
  To: john, linux-kernel, mlmoser

> >> Anyhow, I'm thinking still about when reiser4 comes out.  I want to
> >> convert to it from reiser3.6.  It came to my attention that a user-space
> >> tool to convert between filesystems is NOT the best way to deal with
> >> this. Seriously, you'd think it would be, right?  Wrong, IMHO.
> >>
> >> You have the filesystem code for every filesystem Linux supports.  It's
> >> there, in the kernel.  So why maintain a kludgy userspace tool that has
> >> to be rewritten to understand them all?  I have a better idea.
> >>
> >> How about a kernel syscall?  It's possible to do this on a running
> >> filesystem but it's far too difficult for a start, so let's start with
> >> unmounted filesystems mmkay?
> >
> >Apart from the special case of converting from one major version of a
> >filesystem to another major version of the same filesystem, I think
> >the performance of an on-the-fly filesystem conversion utility is
> >going to be so much worse than just creating a new partition and
> >copying the data across, that the only reason to do it would be if you
> >could do it on a read-write filesystem without unmounting it.
> >
>
> You've entirely missed the point :/  Did you read the last section?

Yes, but...

> I noted
> that the "make new partition and copy" method requires, first off, space
> for a new partition.  All my partitions have massive amount of data on them.
> I can't do that.  Those of us that can have to either do it twice, or rewrite
> fstab.

Rewriting fstab shouldn't be a problem :-).

> Eventually I'm hoping it can be done on a read-write filesystem.  It's
> possible; I've thought about how to defragment read-write datasystems
> without getting in the way of logical operations.

Seriously, though, I was thinking more of what's most useful in a
server situation, where it's not uncommon to have a lot of spare
capacity.  I don't think that a kernel-mode converter which only works
on filesystems that aren't mounted read-write is going to be much of an
advantage over a userspace solution in those situations, whereas a
read-write one potentially would be: although it's reasonable to expect
backups to be done anyway, if you can avoid the downtime needed for the
restore, that's a Good Thing.

> >What I'd like to see is union mounts which allowed you to mount a new
> >filesystem of a different type over the original one, and have all new
> >writes go to the new filesystem.  I.E. as files were modified, they
> >would be re-written to the new FS.  That would be one way of avoiding
> >the performance hit on a busy server.
> >
>
> mmmm, then you'd need both fs' though.  That's not conversion ;-)

The idea was to transparently delete files from the old filesystem
once they had been written to, and therefore transferred to the new
filesystem.

I think you've missed my point - for a desktop machine, an hour or two
downtime is usually no problem.  For an ISP's webserver, it usually
is, (unless there is a cluster of them serving requests for the same
sites).  However, to be able to convert filesystems without:

* Significant performance loss of network serving applications
* Significant downtime

is a very desirable feature, but the ability to do this on a
read-write filesystem is critical - if it has to be unmounted, it's
not as useful.

The reason I mentioned union mounts was because BSD already has union
mounts - see the mount_union manual page for more details.  I don't
know of an implementation that allows you to automatically delete the
file on the old filesystem, when the copy on the new filesystem has
been made, though.

John.

* Re: File System conversion -- ideas
@ 2003-06-29 20:06 John Bradford
  0 siblings, 0 replies; 88+ messages in thread
From: John Bradford @ 2003-06-29 20:06 UTC
  To: jamie, john; +Cc: linux-kernel, mlmoser

> > > > that the only reason to do it would be if you
> > > > could do it on a read-write filesystem without unmounting it.
> > >
> > > IMHO even if it requires the filesystem to be unmounted, it would
> > > still be useful.  More challenging to use - you'd have to boot and run
> > > from ramdisk, but much more useful than not being able to convert at all.
> > 
> > Only if it is the root filesystem, the filesystem of which generally
> > isn't going to affect overall performance that much.
>
> ...now use a single "/" filesystem on most systems, with a tiny
> "/boot" one to ensure booting.  With journalling, the risk of losing
> data this way is much lower than it used to be, and the old reason for
> using multiple partitions - to avoid having to fsck /usr - no longer applies.

Well, I prefer to have separate partitions to reduce fragmentation and
increase flexibility, but I can see there are reasons for having a
single root filesystem.

> > > But useless unless you have a second disk lying around that you don't
> > > use for anything but filesystem conversions.
> > 
> > Not at all.  You can just use unpartitioned space on your existing
> > disk.
>
> So you have as much space unpartitioned on your disks as you are
> actually using to store data?  I generally don't.

I probably average about 20% of the disk partitioned in my single disk
desktop boxes.

John.

* Re: File System conversion -- ideas
@ 2003-06-29 20:20 John Bradford
  2003-06-29 20:44 ` rmoser
  0 siblings, 1 reply; 88+ messages in thread
From: John Bradford @ 2003-06-29 20:20 UTC
  To: john, linux-kernel, mlmoser

> >> You've entirely missed the point :/  Did you read the last section?
> >
> >Yes, but...
> >
> >> I noted
> >> that the "make new partition and copy" method requires, first off, space
> >> for a new partition.  All my partitions have massive amount of data on
> >> them.
> >> I can't do that.  Those of us that can have to either do it twice, or
> >> rewrite
> >> fstab.
> >
> >Rewriting fstab shouldn't be a problem :-).
> >
> >> Eventually I'm hoping it can be done on a read-write filesystem.  It's
> >> possible; I've thought about how to defragment read-write datasystems
> >> without getting in the way of logical operations.
> >
> >Seriously, though, I was thinking more of what's most useful in a
> >server situation, where it's not uncommon to have a lot of spare
> >capacity - I don't think that the kernel mode read-only only converter
> >is going to be much of an advantage over a userspace solution in those
> >situations, whereas a read-write one would potentially be, because
> >although it's reasonable to expect backups to be done anyway, if you
> >can avoid the downtime needed for the restore, that's a Good Thing.
> >
>
> It should be easy enough.  I dunno if it'll require a VFS rewrite or not though.
> The idea is to buffer changes to and allow retrieval of logical filesystem
> objects, which requires.. well, RAM.  Although, since the inodes on the new
> fs won't need to be in the same order they were in on the old fs, it should be
> possible to simply write new data to the new fs, IF you watch what you're
> doing.  And yes, I do realize I'm talking about writing to half-existent
> filesystems that by rights can't even mount.  (Actually, more like an empty
> filesystem that's jumbled around physically, but is being addressed logically
> anyway).
>
> Easy trick:  Skip deleted inodes, and if you have to change an inode, have
> the old fs go mark it as deleted real quick and free the space around it, giving
> it to the conversion datasystem.  Now you can run read-write while you do it.
>
> Remember also that I insist that there must be a journal in the CDS
> (conversion datasystem).
>
> >> >What I'd like to see is union mounts which allowed you to mount a new
> >> >filesystem of a different type over the original one, and have all new
> >> >writes go to the new fileystem.  I.E. as files were modified, they
> >> >would be re-written to the new FS.  That would be one way of avoiding
> >> >the performance hit on a busy server.
> >> >
> >>
> >> mmmm, then you'd need both fs' though.  That's not conversion ;-)
> >
> >The idea was to transparently delete files from the old filesystem
> >once they had been written to, and therefore transferred to the new
> >filesystem.
> >
>
> Heh, sounds like what I'm doing but you're hitting my final goal from the
> beginning, and using two partitions.
>
> >I think you've missed my point - for a desktop machine, an hour or two
> >downtime is usually no problem.  For an ISP's webserver, it usually
> >is, (unless there is a cluster of them serving requests for the same
> >sites).  However, to be able to convert filesystems without:
> >
> >* Significant performance loss of network serving applications
> >* Significant downtime
> >
> >is a very desirable feature, but the ability to do this on a
> >read-write filesystem is critical - if it has to be unmounted, it's
> >not as useful.
> >
>
> That's the eventual idea.  As for performance, errm.  The performance loss
> would be in referencing the CDS to find where the data in each filesystem is,
> and in the CPU time and RAM used up, along with the massive disk access,
> while the system does its job.  Shouldn't be a problem on servers though;
> IIRC they use SCSI disks and fast CPUs?

The disk accesses were what I was thinking of.  May well not be a
problem in reality.

> >The reason I mentioned union mounts was because BSD already has union
> >mounts - see the mount_union manual page for more details.  I don't
> >know of an implementation that allows you to automatically delete the
> >file on the old filesystem, when the copy on the new filesystem has
> >been made, though.
> >
>
> If you think about it, you have this:
>
> [PARTITION 1]
>     |
>     V
> [PARTITION 2]
>
> I have this (the == is an equivalence sign, i.e. this is what's inside):
>
> [PARTITION]
>     ==
> [DATASYSTEM]
>     ==
> [FILESYSTEM 1]
>     |
>     V
> [DATASYSTEM ATOMS]
>     |
>     V
> [FILESYSTEM 2]
>
> Both filesystems are the full size of the partition, and so is the
> datasystem.  The only difference is that before you start you have
> to make sure that the datasystem's gonna fit in with the free space
> on the first filesystem, and still have space to start the second
> filesystem, and then have space for its atoms.

Just thought - that's going to be a problem in read-write mode :-/.

If the disk fills up, we'd need to be able to maintain a consistent
filesystem structure, (at least good enough so that a separate
fsck-like utility could repair it - if the disk filled up, then the
conversion couldn't be done on-the-fly).

> These atoms will
> slowly be destroyed as they go into the second filesystem.  You
> have to also make sure that the second FS won't be bigger than the
> first, and will at the end have enough to hold at least the empty
> datasystem and one atom.
>
> I feel I should note, since I forgot before, that an atom can contain part
> of the data for an inode, as long as you know this and can write the atom
> out to the new filesystem and get more of the old.

Seems like a solid idea, though.  As long as it worked on at least
read-only mounted filesystems, I'd be quite interested in seeing it in
the mainline kernel.
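The atom idea, where an atom may hold only part of an inode's data, could look something like the sketch below. The Atom layout (inode, byte offset, last-piece flag) is an assumption made for the sketch. The point is that a large file never has to be held, or freed from the old filesystem, all at once. The new side can also tell when a file is complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """A piece of one inode's data in the hypothetical conversion datasystem."""
    inode: int
    offset: int   # byte offset of this piece within the file
    data: bytes
    last: bool    # True if this atom ends the file


def split_into_atoms(inode: int, data: bytes, max_bytes: int):
    """Yield atoms of at most `max_bytes` for one inode's data."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    if not data:
        yield Atom(inode, 0, b"", True)
        return
    for off in range(0, len(data), max_bytes):
        chunk = data[off:off + max_bytes]
        yield Atom(inode, off, chunk, off + len(chunk) == len(data))


def reassemble(atoms):
    """Rebuild each complete inode's data on the new side: {inode: bytes}.

    Atoms may be interleaved between inodes but must arrive in offset
    order for any one inode; incomplete inodes are left out.
    """
    files, complete = {}, set()
    for a in atoms:
        buf = files.setdefault(a.inode, bytearray())
        if a.offset != len(buf):
            raise ValueError(f"inode {a.inode}: gap at offset {len(buf)}")
        buf += a.data
        if a.last:
            complete.add(a.inode)
    return {ino: bytes(buf) for ino, buf in files.items() if ino in complete}
```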

John.

* Re: File System conversion -- ideas
@ 2003-06-29 21:59 John Bradford
  0 siblings, 0 replies; 88+ messages in thread
From: John Bradford @ 2003-06-29 21:59 UTC
  To: john, linux-kernel, mlmoser

> >> Both filesystems are the full size of the partition, and so is the
> >> datasystem.  The only difference is that before you start you have
> >> to make sure that the datasystem's gonna fit in with the free space
> >> on the first filesystem, and still have space to start the second
> >> filesystem, and then have space for its atoms.
> >
> >Just thought - that's going to be a problem in read-write mode :-/.
> >
> >If the disk fills up, we'd need to be able to maintain a consistent
> >filesystem structure, (at least good enough so that a separate
> >fsck-like utility could repair it - if the disk filled up, then the
> >conversion couldn't be done on-the-fly).
> >
>
>
> mmm.. hadn't thought of that.
>
> 1 second answer:  Lock down some of the freespace.  Do NOT let it
> get full.  You know how ext2 reserves 5% for the superuser?  Do that.
> Reserve enough freespace to keep working and finish the conversion.
> Predict from the beginning how much free space is going to be needed,
> and how much is going to be left over at the very final stages of the
> conversion.

That should work fine in most cases - it's not a problem to reserve
too much for the duration of the conversion, as it all gets freed
afterwards.  In most cases, we'd probably only need a relatively small
amount of space to allow writes whilst the conversion is in progress.
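Here is a sketch of the reserved-space rule under simple assumptions. The reserve is the predicted datasystem overhead plus the largest atom that can be in flight. Ordinary writes are refused once they would eat into that reserve. The numbers and the class name are hypothetical. A real implementation would also have to account for the new filesystem's own metadata growing.

```python
class ConversionSpace:
    """Free-space accounting during a hypothetical read-write conversion.

    A fixed reserve, predicted before starting, is held back for the
    conversion itself; ordinary writes may only use space above it, in the
    spirit of ext2's blocks reserved for root.
    """

    def __init__(self, free_bytes: int, largest_atom: int, datasystem_overhead: int):
        # Enough for the conversion bookkeeping plus one atom in flight.
        self.reserve = datasystem_overhead + largest_atom
        if free_bytes < self.reserve:
            raise ValueError("not enough free space to start the conversion")
        self.free = free_bytes

    def user_write(self, nbytes: int) -> bool:
        """Allow an ordinary write only if it leaves the reserve intact."""
        if self.free - nbytes < self.reserve:
            return False  # the caller would report ENOSPC to userspace
        self.free -= nbytes
        return True

    def user_free(self, nbytes: int) -> None:
        self.free += nbytes
```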

John.

* Re: File System conversion -- ideas
@ 2003-06-30  8:55 John Bradford
  2003-06-30  9:36 ` Hans Reiser
  0 siblings, 1 reply; 88+ messages in thread
From: John Bradford @ 2003-06-30  8:55 UTC
  To: jaharkes, linux-kernel, mlmoser

> >I typically call that 'tar' and it works great whenever I want to
> >convert from one filesystem to another. I just haven't got a clue why
> >you want to implement tar (or cpio) in the kernel as the userspace
> >implementation is already pretty usable.
> >
>
> tar --inplace --fs-convert --targetfs=reiserfs /dev/hda1
>
> .......  it doesn't like it

tar -cf - -C /old_filesystem . | tar -xf - -C /new_filesystem

Works fine, and copies symbolic links and device files properly.  If
you don't want sparse files expanded, you can add --sparse to the first
(archive-creating) tar.

Yes, it needs both old and new filesystems on-line at once.  That
isn't a problem for a lot of users.

It has the advantage over an on-line conversion utility that the files
are laid out in the way they were intended to be by the filesystem,
for performance, and anti-fragmentation reasons.

There are probably a few smaller ISPs, with customer webservers which
are not guaranteed to be backed up, who would like to be able to
switch to a more modern filesystem at some point in the future,
without downtime.  Union mounts would potentially be useful here - the
old data can be kept on whatever filesystem it's on, and a new
filesystem union mounted over it.  If a file is updated, it's
re-written to the new filesystem.  Data that's changed would migrate
to the new filesystem.  Once most of it is across, you could touch all
of the remaining files, and force them across.  Your webserver
performance shouldn't be impacted during all of this, (and might even
improve, if write performance is much better on the new filesystem).
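The final "touch all of the remaining files" sweep can be sketched on a toy model in which the old and new filesystems are path-to-contents dictionaries. The batch limit is an addition for the example, so that the sweep can be spread out on a busy server. Deleting the old copy afterwards is the automatic-delete behaviour discussed earlier in the thread.

```python
def migrate_remaining(lower: dict, upper: dict, batch=None) -> list:
    """Push files that exist only on the old filesystem onto the new one.

    `lower` (old FS) and `upper` (new FS) map paths to contents. A file
    already rewritten on the new FS keeps that newer copy; either way the
    old copy is then deleted. `batch` caps how many files are copied per
    call so the sweep can run a little at a time.
    Returns the paths copied, in sorted order.
    """
    moved = []
    for path in sorted(lower):
        if batch is not None and len(moved) >= batch:
            break
        if path not in upper:
            upper[path] = lower[path]  # equivalent of touching it: copy it up
            moved.append(path)
        del lower[path]  # the old copy is no longer needed
    return moved
```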

For desktop users, small amounts of downtime usually don't matter,
filesystem performance isn't usually critical either, and if the data
isn't backed up anywhere, data integrity _is_ important, so I would
suggest that they either stick with their existing filesystem, or
backup and restore.

A conversion utility could save the time of the restore, but if it
leaves the user with a badly fragmented or poorly laid out
filesystem, it could well be counter-productive.

So, assuming that the main real world use would be small, but busy
servers which want better performance for new data, and old data
gradually migrated across, but with minimum performance impact, union
mounts would be a way to achieve this.

Union mounts would be a lot easier to implement, and a lot more
useful than a conversion utility.  Note that BSD already has union
mounts.

IFF a conversion utility could be written that:

* Works on read-write mounted filesystems
* Doesn't produce a poorly laid out filesystem

Then _maybe_ it would be useful.

Personally, if I was interested in implementing this, (which I'm not),
I wouldn't worry about data integrity at all times - (if it failed for
some reason, it would require a restore of the backup which the user
was advised to make anyway), but create the framework of the new
filesystem image in memory, with references to the location of data in
the old, (real), filesystem, having moved all the data to the end of
the existing filesystem.  Once complete, I'd unmount the existing FS,
and overwrite it with the in-memory filesystem, making sure to read
anything I needed from the old, (unmounted), filesystem image before
overwriting it.

I.e., to convert a filesystem with three files, (A, B, and C), and some
free space, (F).  The block numbers are underneath.

Old filesystem with 4K block size
----------------------------
AAAAABCCCBBCCBBCCCCCCCFFFFFF
1234567891111111111222222222
         0123456789012345678

Move data to the end.

FFFFFFCCCBBCCBBCCCCCCCAAAAAB
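One reading of the "move data to the end" step that is consistent with the two diagrams: the blocks in the first len(free space) positions are copied, in order, into the free run at the end, which leaves the start of the device free for the first new blocks. A minimal sketch under that assumption (a single free run at the tail, one character per 4K block); `evacuate_front` is a hypothetical name.

```python
def evacuate_front(layout, free="F"):
    """Copy the blocks in the first len(free space) slots into the free run
    at the end, keeping their order, so the start of the device becomes free.

    layout: one character per old 4K block, as in the diagrams.
    """
    n_free = layout.count(free)
    if not layout.endswith(free * n_free):
        raise ValueError("assumes all free space is one run at the end")
    if 2 * n_free > len(layout):
        raise ValueError("source and target regions would overlap")
    front = layout[:n_free]                       # blocks to relocate
    middle = layout[n_free:len(layout) - n_free]  # blocks that stay put
    return free * n_free + middle + front


if __name__ == "__main__":
    print(evacuate_front("AAAAABCCCBBCCBBCCCCCCCFFFFFF"))
```

Because the source and target regions never overlap (that is what the second check guarantees), the copy itself needs no buffering.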

Desired new filesystem with 8K block size
----------------------------
AAAAAaFFBBBBBbFFCCCCCCCCCCCC
1 2 3 4 5 6 7 8 9 1 1 1 1 1
                  0 1 2 3 4

(the lower case letters represent the extra space taken up by the
larger block size).

So, we would create a table in memory:

New FS  Old FS
1       23,24
2       25,26
3       27
5       28,10
6       11,14
7       15
9       7,8
10      9,12
11      13,16
12      17,18
13      19,20
14      21,22
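The table can be derived mechanically: take each file's old blocks in file order, cut them into pairs (one pair per 8K block), and number the pairs from wherever the new filesystem places that file. A minimal sketch, assuming the per-file block lists read off the diagrams (B's first block is old block 28 because it was moved from block 6) and start positions simply copied from the desired layout; in practice those would come from the new filesystem's allocator. `build_mapping` is a hypothetical name.

```python
def build_mapping(files, new_start, ratio=2):
    """Map each new (ratio times larger) block to the old blocks it is built from.

    files: {name: [old blocks in file order]}
    new_start: {name: first new block used by that file}
    """
    table = {}
    for name, old_blocks in files.items():
        block = new_start[name]
        for i in range(0, len(old_blocks), ratio):
            if block in table:
                raise ValueError(f"new block {block} assigned twice")
            table[block] = old_blocks[i:i + ratio]   # the last chunk may be short
            block += 1
    return dict(sorted(table.items()))


FILES = {  # old block numbers after the "move data to the end" step
    "A": [23, 24, 25, 26, 27],
    "B": [28, 10, 11, 14, 15],
    "C": [7, 8, 9, 12, 13, 16, 17, 18, 19, 20, 21, 22],
}
NEW_START = {"A": 1, "B": 5, "C": 9}

if __name__ == "__main__":
    for new, old in build_mapping(FILES, NEW_START).items():
        print(new, old)
```

New blocks 4 and 8 do not appear in the result; they are the free space in the new layout.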

This table maps the in-memory new filesystem to blocks in the old
filesystem.  Now, we could unmount the old filesystem, and mount the
new virtual filesystem in its place, then start writing the new
filesystem to disk:

Read data from old blocks 23 and 24, and write it to new block 1, overwriting old blocks 1 and 2,
Read data from old blocks 25 and 26, and write it to new block 2, overwriting old blocks 3 and 4,
Read data from old block  27       , and write it to new block 3, overwriting old blocks 5 and 6,
Store data from old blocks 7 and 8, (for new block 9), in RAM,
                                         erase block 4          , overwriting old blocks 7 and 8,
Store data from old block 9, (for new block 10), in RAM,
Read data from old blocks 28 and 10, and write it to new block 5, overwriting old blocks 9 and 10,
Store data from old block 12, (for new block 10), in RAM,
Read data from old blocks 11 and 14, and write it to new block 6, overwriting old blocks 11 and 12,
Store data from old block 13, (for new block 11), in RAM,
Read data from old block  15       , and write it to new block 7, overwriting old blocks 13 and 14,
Store data from old block 16, (for new block 11), in RAM,
                                         erase block 8          , overwriting old blocks 15 and 16,
Store data from old blocks 17 and 18, (for new block 12), in RAM,
Write data from RAM (old blocks 7 and 8) into new block 9, overwriting old blocks 17 and 18, and free that RAM.

...etc...
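The ordering rule behind the list above can be written down directly: before new block k is written, any old block it overwrites that a later new block still needs is stored in RAM first. A minimal sketch, assuming 4K old blocks and 8K new blocks (so new block k covers old blocks 2k-1 and 2k) and the mapping table from this example; the disk is simulated with a dict, and `convert_in_place` is a hypothetical name.

```python
# Mapping table from the example: new 8K block -> old 4K blocks.
MAPPING = {
    1: [23, 24], 2: [25, 26], 3: [27],
    5: [28, 10], 6: [11, 14], 7: [15],
    9: [7, 8], 10: [9, 12], 11: [13, 16],
    12: [17, 18], 13: [19, 20], 14: [21, 22],
}


def convert_in_place(old_disk, mapping, n_new, ratio=2):
    """Simulate writing the new filesystem over the old one, front to back.

    old_disk: {old_block: contents}. New block k overwrites old blocks
    (k-1)*ratio+1 .. k*ratio. Returns (new_disk, peak_ram_blocks, log).
    """
    disk = dict(old_disk)   # old blocks that have not been overwritten yet
    ram = {}                # old blocks stored in RAM before being overwritten
    new_disk, log, peak = {}, [], 0
    for k in range(1, n_new + 1):
        victims = range((k - 1) * ratio + 1, k * ratio + 1)
        sources = mapping.get(k, [])   # an empty list means a free block
        needed_later = {b for j, srcs in mapping.items() if j > k for b in srcs}
        for b in victims:
            if b in needed_later:
                ram[b] = disk[b]
                log.append(f"store old block {b} in RAM")
        peak = max(peak, len(ram))
        # Read every source before anything is overwritten; a KeyError here
        # would mean the schedule had destroyed data it still needed.
        new_disk[k] = tuple(ram.pop(b) if b in ram else disk[b] for b in sources)
        for b in victims:
            disk.pop(b, None)
        log.append(f"write new block {k} from old blocks {sources}")
    return new_disk, peak, log


if __name__ == "__main__":
    old = {b: f"old{b}" for b in range(1, 29)}
    new, peak, log = convert_in_place(old, MAPPING, n_new=14)
    print("\n".join(log))
    print("peak RAM, in old 4K blocks:", peak)
```

For this example the sketch never holds more than 8 old blocks (32K) in RAM at once; the peak is reached while new blocks 9 to 11 are written, because each of them overwrites two blocks that C still needs.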

At the end, you'd have the new filesystem on disk, and there would be
a direct mapping between the virtual RAM-based filesystem and the disk
blocks.  At that point, you could umount the virtual filesystem, and
mount the disk based one.

This would be doable on a read-write filesystem, because writes would
go only to the new RAM-based virtual filesystem - the original
filesystem would be mounted read-only before the conversion process
started.
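A sketch of that read-write behaviour at block level: a read of a new block comes from RAM if a user has written it since the conversion started, and otherwise from the old blocks it maps to. `VirtualNewFS` and `read_old` are hypothetical names; `read_old` is assumed to return an old block from wherever the converter currently has it (the read-only disk, or the RAM it stored it in). When the converter reaches new block k it would write `read(k)` to disk, so user writes are not lost. RAM use grows with the amount written during the conversion.

```python
class VirtualNewFS:
    """Block-level view of the new filesystem while the conversion runs."""

    def __init__(self, mapping, read_old, block_size=8192):
        self.mapping = mapping      # {new_block: [old_blocks]}
        self.read_old = read_old    # callable: old block number -> bytes
        self.block_size = block_size
        self.dirty = {}             # new_block -> bytes written during conversion

    def read(self, k):
        if k in self.dirty:
            return self.dirty[k]
        data = b"".join(self.read_old(b) for b in self.mapping.get(k, []))
        return data.ljust(self.block_size, b"\0")   # pad short or free blocks

    def write(self, k, data):
        # The original filesystem is read-only; writes only touch RAM.
        self.dirty[k] = data


if __name__ == "__main__":
    old = {23: b"a" * 4096, 24: b"b" * 4096}
    vfs = VirtualNewFS({1: [23, 24]}, old.__getitem__)
    vfs.write(1, b"x" * 8192)
    print(vfs.read(1)[:4], old[23][:4])
```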

So, it's interesting and possible in theory, but is it practical or
worth implementing?  I don't think so.  If somebody is interested in
implementing it I'd be pleased to see it in the kernel, but it's not a
project I'd have any real interest in myself.

John.

* Re: File System conversion -- ideas
@ 2003-06-30 14:11 John Bradford
  2003-06-30 15:45 ` Leonard Milcin Jr.
  0 siblings, 1 reply; 88+ messages in thread
From: John Bradford @ 2003-06-30 14:11 UTC
  To: john, reiser; +Cc: jaharkes, linux-kernel, mlmoser

> I tend to agree with the below.  I just want to add though that there 
> are a lot of users who have one disk drive and no decent network 
> connection to somewhere with a lot of storage.  It would be nice to 
> adapt tar to understand about the reiser4 resizer and mkreiser4 and the 
> reiser3 resizer, and the partitioner (yah, at this point it would no 
> longer really be tar, but.... ), and to have it shrink the V3 partition, 
> create a reiser4 partition, copy some of the V3 partition to the V4 
> partition, shrink the V3 partition some more, etc.....

Out of interest, won't the resulting filesystem be excessively
fragmented, and cause worse performance than a virgin filesystem, or
does the reiser resizer actively prevent that?

John.

* Re: File System conversion -- ideas
@ 2003-07-01 16:04 Matt Reuther
  2003-07-01 16:13 ` Frank Gevaerts
  0 siblings, 1 reply; 88+ messages in thread
From: Matt Reuther @ 2003-07-01 16:04 UTC
  To: linux-kernel

It seems like the loopback device would be useful for this. You can move all
of your stuff into a mounted loopback device with the new fs. Is there not
some utility to take a filesystem image from inside an fs, and overwrite
that fs with it? It would be lots of sector-to-sector shuffling, but it
would be cleaner than trying to convert.
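The tricky part of overwriting the fs with its own image is the order of that shuffling: the image's blocks are scattered over the device, so copying image block 0 to device block 0, then 1 to 1, and so on can overwrite image blocks that have not been copied yet. This is the same problem as turning a parallel copy into a sequence of single copies. A minimal sketch of one way to order the copies, assuming the image and the device use the same block size and that the device block holding each image block is known (e.g. from the host filesystem's block map). Chains are copied starting from their free end, and a pure cycle needs one block of RAM. `shuffle_plan` and `apply_plan` are hypothetical names.

```python
def shuffle_plan(src):
    """Order the block copies that move a loop image over its host device.

    src[i] is the device block currently holding image block i (all distinct).
    Returns (from, to) steps leaving image block i at device block i;
    "RAM" denotes a single one-block buffer.
    """
    pending = {i: s for i, s in enumerate(src) if i != s}   # dest -> source
    readers = {s: d for d, s in pending.items()}            # source -> dest still needing it
    plan = []
    while pending:
        # Any destination that no pending copy still reads from is safe to overwrite.
        ready = next((d for d in pending if d not in readers), None)
        if ready is None:
            # Only cycles remain: park one block in RAM to break one of them.
            d = next(iter(pending))
            plan.append((d, "RAM"))
            pending[readers.pop(d)] = "RAM"
            continue
        s = pending.pop(ready)
        readers.pop(s, None)
        plan.append((s, ready))
    return plan


def apply_plan(device, plan):
    """Replay a plan on a list of blocks, to check it."""
    dev, ram = list(device), None
    for frm, to in plan:
        value = ram if frm == "RAM" else dev[frm]
        if to == "RAM":
            ram = value
        else:
            dev[to] = value
    return dev


if __name__ == "__main__":
    # Image blocks 0, 1, 2 currently sit at device blocks 1, 2, 0 (a 3-cycle).
    print(shuffle_plan([1, 2, 0]))
    print(apply_plan(["c", "a", "b"], shuffle_plan([1, 2, 0])))
```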

I guess you could try overlaying the old and new filesystems by virtualizing 
the inodes, superblocks, directories, and other stuff in RAM, but you still 
have to write it to disk, and some of the metadata from one fs will collide 
with the other one. The superblock for ext2fs needs to be written to several
fixed places on the filesystem, which might also be needed by 
reiserfs/xfs/whatever.
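To make the fixed-location problem concrete, a sketch that lists where ext2 keeps superblock copies and intersects that list with the blocks another filesystem wants at fixed positions. It assumes the default ext2 layout: 8 * block_size blocks per group, a first data block of 1 for 1K blocks and 0 otherwise, and, with `sparse_super`, copies only in groups 0, 1 and powers of 3, 5 and 7. The other filesystem's block set in the example is purely hypothetical.

```python
def ext2_superblock_blocks(block_size, blocks_count, sparse_super=True):
    """Block numbers holding the ext2 primary and backup superblocks.

    For block sizes above 1K the primary superblock (byte offset 1024)
    lives inside block 0.
    """
    per_group = 8 * block_size          # default: one block bitmap per group
    first_data_block = 1 if block_size == 1024 else 0
    groups = -(-(blocks_count - first_data_block) // per_group)  # ceiling division

    def has_copy(g):
        if not sparse_super or g <= 1:
            return True
        for p in (3, 5, 7):
            n = g
            while n % p == 0:
                n //= p
            if n == 1:
                return True
        return False

    return [first_data_block + g * per_group for g in range(groups) if has_copy(g)]


def collisions(ext2_blocks, other_fixed_blocks):
    """Blocks that both layouts want for their own metadata."""
    return sorted(set(ext2_blocks) & set(other_fixed_blocks))


if __name__ == "__main__":
    sb = ext2_superblock_blocks(4096, 262144)   # 1 GiB with 4K blocks
    print(sb)
    print(collisions(sb, {0, 100000}))          # hypothetical other-fs blocks
```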

Matt

* Re: File System conversion -- ideas
@ 2003-07-07  8:43 John Bradford
  0 siblings, 0 replies; 88+ messages in thread
From: John Bradford @ 2003-07-07  8:43 UTC
  To: jesse, linux-kernel, mlmoser, svein.ove, viro

> What this boils down to is, "there may not be enough space".
> Personally I prefer incrementally resizing LVM partitions for conversion 
> anyway, but I'll take a stab at this.

Depending on the filesystem, incrementally resizing LVM partitions
could be a very _bad_ way to do it - continuously re-sizing a
partition will typically encourage poor layout and fragmentation.  It
would be possible to defragment and optimise the partition afterwards,
but that would extend the conversion time even more, especially if it
was done in a way which kept a consistent filesystem throughout, on a
filesystem without much free space.
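One simple way to put a number on that fragmentation, as a sketch: count how many contiguous runs (extents) each file's blocks fall into, before and after a conversion or resize. Using extent count as the metric is an assumption for illustration, and `extents` is a hypothetical helper. The example reuses file C from the earlier in-place conversion example.

```python
def extents(blocks):
    """Split a file's block list (in file order) into contiguous runs."""
    runs = []
    for b in blocks:
        if runs and b == runs[-1][1] + 1:
            runs[-1][1] = b          # extends the current run
        else:
            runs.append([b, b])      # a discontinuity starts a new fragment
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    before = [7, 8, 9, 12, 13, 16, 17, 18, 19, 20, 21, 22]   # old 4K blocks
    after = list(range(9, 15))                               # new 8K blocks 9 to 14
    print(extents(before))   # [(7, 9), (12, 13), (16, 22)]
    print(extents(after))    # [(9, 14)]
```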

The way to avoid, or at least minimise, the problem of having one
partition filling the disk is not to fully partition disks to begin
with - that gives you the flexibility to test and use different
partition types, and move data around.  Even without using LVM, it's
easy to move data around if it's on partitions which are each no
bigger than 25% of the disk.
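A sketch of why unpartitioned space helps, under simplifying assumptions: each partition is one contiguous slot, there is a single unused region, and the converted copy is given the whole unused region, so after each step the old partition's slot becomes the new unused region. Converting the largest partition first means that if the unused region is at least as big as the largest partition, every later step fits too. `rotation_plan` is a hypothetical helper, and sizes are in arbitrary units (e.g. percent of the disk).

```python
def rotation_plan(partitions, free):
    """partitions: list of (name, size); free: size of the unused region.

    Returns [(name, size_of_region_it_is_copied_into), ...] in conversion order.
    """
    steps = []
    # Largest first: each freed slot is then at least as big as the next partition.
    for name, size in sorted(partitions, key=lambda p: p[1], reverse=True):
        if size > free:
            raise ValueError(f"{name} ({size}) does not fit in the free region ({free})")
        # Make the new filesystem in the free region and copy `name` across...
        steps.append((name, free))
        # ...after which the old partition's slot is the only unused region.
        free = size
    return steps


if __name__ == "__main__":
    for name, region in rotation_plan([("home", 20), ("usr", 25), ("var", 10)], 25):
        print(f"convert {name} into a {region}% region")
```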

John.

Thread overview: 88+ messages
2003-06-29  6:57 File System conversion -- ideas rmoser
2003-06-30 13:05 ` Jesse Pollard
  -- strict thread matches above, loose matches on Subject: below --
2003-06-29 10:11 John Bradford
2003-06-29 13:28 ` Jamie Lokier
2003-06-29 13:50   ` David D. Hagood
2003-06-29 18:31     ` rmoser
2003-06-29 19:55       ` David D. Hagood
2003-06-29 20:05         ` rmoser
2003-06-29 20:41           ` David D. Hagood
2003-06-29 20:53             ` rmoser
2003-06-29 20:22         ` Leonard Milcin Jr.
2003-06-30 16:05     ` Henning P. Schmiedehausen
2003-06-30 16:59       ` Leonard Milcin Jr.
2003-06-30 17:04         ` Kevin Corry
2003-06-30 17:37         ` Valdis.Kletnieks
2003-07-01  9:56       ` Stewart Smith
2003-06-29 13:54   ` Leonard Milcin Jr.
2003-06-29 18:45     ` rmoser
2003-06-29 19:37       ` Leonard Milcin Jr.
2003-06-29 19:43         ` Leonard Milcin Jr.
2003-06-29 19:48           ` rmoser
2003-06-30  3:52             ` Horst von Brand
2003-07-01 10:15             ` Stewart Smith
2003-07-01 14:55               ` Leonard Milcin Jr.
2003-07-01 15:41                 ` Stewart Smith
2003-07-01 16:19                   ` Leonard Milcin Jr.
2003-06-29 19:44         ` rmoser
2003-06-29 19:44         ` Jamie Lokier
2003-06-29 19:46           ` rmoser
2003-06-29 20:02           ` viro
2003-06-29 20:26             ` Leonard Milcin Jr.
2003-06-29 20:31             ` rmoser
2003-07-01 10:01         ` Stewart Smith
2003-06-29 19:28     ` Jamie Lokier
2003-06-29 19:35       ` rmoser
2003-06-29 19:42       ` viro
2003-06-29 19:45         ` rmoser
2003-06-29 20:00           ` viro
2003-06-29 20:19             ` Davide Libenzi
2003-06-29 20:25               ` viro
2003-06-29 20:45                 ` rmoser
2003-06-29 20:46                 ` Davide Libenzi
2003-06-30  9:13                 ` Nikita Danilov
2003-06-29 20:38               ` rmoser
2003-06-29 20:29             ` rmoser
2003-06-29 20:50               ` Hugo Mills
2003-06-29 21:00                 ` rmoser
2003-06-29 21:10                   ` Davide Libenzi
2003-06-29 21:37                   ` Hugo Mills
2003-06-29 21:54                     ` rmoser
2003-06-29 22:25                       ` Hugo Mills
2003-06-29 20:51               ` viro
2003-06-29 21:07                 ` rmoser
2003-06-29 21:08                 ` Chris Friesen
2003-06-30  0:25               ` Jan Harkes
2003-06-30  0:59                 ` rmoser
2003-07-01 20:03             ` Pavel Machek
2003-07-02 14:49               ` Jan Kara
2003-06-29 20:05           ` David D. Hagood
2003-06-29 20:36             ` rmoser
2003-06-30  0:05               ` Richard Braakman
2003-06-30  0:58                 ` rmoser
2003-06-29 21:32           ` Diego Calleja García
2003-06-30 13:26           ` Jesse Pollard
2003-06-30 13:42             ` Hans Reiser
2003-06-30 13:56               ` Jesse Pollard
2003-07-06 19:30             ` Svein Ove Aas
2003-06-29 18:26 ` rmoser
2003-06-29 16:13 John Bradford
2003-06-29 19:16 ` Jamie Lokier
2003-06-29 16:24 John Bradford
2003-06-29 18:37 John Bradford
2003-06-29 18:48 ` rmoser
2003-06-29 19:42 ` Jamie Lokier
2003-06-29 18:58 John Bradford
2003-06-29 19:12 ` rmoser
2003-06-29 20:06 John Bradford
2003-06-29 20:20 John Bradford
2003-06-29 20:44 ` rmoser
2003-06-29 21:59 John Bradford
2003-06-30  8:55 John Bradford
2003-06-30  9:36 ` Hans Reiser
2003-06-30 16:29   ` viro
2003-06-30 14:11 John Bradford
2003-06-30 15:45 ` Leonard Milcin Jr.
2003-07-01 16:04 Matt Reuther
2003-07-01 16:13 ` Frank Gevaerts
2003-07-07  8:43 John Bradford
