* directory entries
@ 2008-08-23 20:38 Reinoud Zandijk
[not found] ` <20080823203853.GA19421-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-08-23 20:38 UTC (permalink / raw)
To: NILFS Users mailing list
Dear folks,
i wondered why directory lengths are specified in blocksize units resulting
in the last dirent in the block to fill up the space with its rec_len.
Searching for free space to enter a directory entry i could use
rec_len - NILFS_DIR_REC_LEN(last_dir_entry->namelen) as free space indicator
but i wondered if it might not be a better practice to allow zero-length
dir entries that are used to fill up the rest of the block with its
rec_len.
Thoughts?
With regards,
Reinoud
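The free-space calculation described in this message can be sketched in C as follows. This is only an illustration: it assumes the ext2-style layout (8-byte inode number, 2-byte rec_len, 1-byte name length, 1-byte file type, name padded to 8 bytes) and the ext2 convention that an entry with inode 0 is unused. The names `dirent_hdr`, `dirent_slack` and `dirent_fits` are hypothetical and do not come from the NILFS sources.

```c
#include <stdint.h>

/* Assumed ext2-style record header: 8 + 2 + 1 + 1 = 12 bytes before the name. */
#define DIR_HDR_LEN 12u
#define DIR_PAD     8u
#define DIR_ROUND   (DIR_PAD - 1u)
/* Bytes an entry really needs for a name of the given length, padded to 8. */
#define DIR_REC_LEN(name_len) \
        (((name_len) + DIR_HDR_LEN + DIR_ROUND) & ~DIR_ROUND)

/* Hypothetical host-endian stand-in for the on-disk entry header. */
struct dirent_hdr {
        uint64_t inode;      /* 0 means the entry is unused (ext2 convention) */
        uint16_t rec_len;    /* distance to the next entry in the block */
        uint8_t  name_len;
        uint8_t  file_type;
};

/* Slack inside an entry: bytes covered by rec_len but not needed by its name. */
static unsigned dirent_slack(const struct dirent_hdr *de)
{
        unsigned used = de->inode ? DIR_REC_LEN(de->name_len) : 0;

        return de->rec_len - used;
}

/* Can a new entry with a name of new_name_len bytes be carved out of de? */
static int dirent_fits(const struct dirent_hdr *de, unsigned new_name_len)
{
        return dirent_slack(de) >= DIR_REC_LEN(new_name_len);
}
```

With a zero-length filler entry, as proposed in the message, the same check would reduce to comparing the filler's rec_len against the needed record length.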
_______________________________________________
users mailing list
users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
https://www.nilfs.org/mailman/listinfo/users
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080823203853.GA19421-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-08-25 3:21 ` Ryusuke Konishi
[not found] ` <20080825.122125.65657043.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-08-25 3:21 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A
Hi, Reinoud!
On Sat, 23 Aug 2008 22:38:53 +0200, Reinoud Zandijk wrote:
> Dear folks,
> i wondered why directory lengths are specified in blocksize units resulting
> in the last dirent in the block to fill up the space with its rec_len.
> Searching for free space to enter a directory entry i could use
> rec_len - NILFS_DIR_REC_LEN(last_dir_entry->namelen) as free space indicator
> but i wondered if it might not be a better practice to allow zero-length
> dir entries that are used to fill up the rest of the block with its
> rec_len.
>
> Thoughts?
The directory format of NILFS comes from that of ext2 file system
except its inode number field is extended to 64 bytes. So, for
code maintenance, I think it's not good idea to make such change.
Redesigning directory format is actually one of todo items.
If someone send us a better alternative, we may switch to it.
In the meantime however, we'd like to avoid confusion on this.
Thank you for comments.
Regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080825.122125.65657043.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-08-25 3:30 ` Ryusuke Konishi
[not found] ` <20080825.123047.128885778.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-08-25 3:30 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A
On Mon, 25 Aug 2008 12:21:25 +0900 (JST), Ryusuke Konishi wrote:
> The directory format of NILFS comes from that of ext2 file system
> except its inode number field is extended to 64 bytes.
Oops, 64 bits.
Ryusuke
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080825.123047.128885778.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-08-25 15:52 ` Reinoud Zandijk
[not found] ` <20080825155243.GA12855-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-08-25 15:52 UTC (permalink / raw)
To: NILFS Users mailing list
Dear folks,
on the nilfs TODO list i find:
- writable snapshots
This sounds like a fun feature. Would you like to have 1) multiple writable
and snapshotable heads? 2) support updating a snapshot or 3) support
writing to a snapshot that is lost when unmounted? But how to number
checkpoints and snapshots then?
- data integrity support
Isn't that already there ? I thought that each block had a CRC? or would
you like to integrate that into the inode's btree?
- B tree base directory management
- Extent support
Isn't that mutually exclusive? I.e. btree's and extent support? Or are you
refering to recording extents in a btree; i.e. map block 0->X, 1->X+1,
2->X+2 to a form of block 0-15 to block X to X+15?
With regards,
Reinoud
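The per-block to extent conversion asked about above (block 0->X, 1->X+1, ... becoming "blocks 0-15 start at X") can be sketched as below. This is an illustration under assumptions, not NILFS code: `struct extent` and `coalesce_extents` are hypothetical names, and the per-block map is simply an in-memory array.

```c
#include <stddef.h>
#include <stdint.h>

struct extent {
        uint64_t logical;    /* first logical block of the run */
        uint64_t physical;   /* disk block that 'logical' maps to */
        uint64_t len;        /* number of consecutive blocks */
};

/*
 * map[i] is the disk block of logical block i.  Stores at most max_out
 * extents in out and returns how many extents the mapping needs in total.
 */
static size_t coalesce_extents(const uint64_t *map, size_t nblocks,
                               struct extent *out, size_t max_out)
{
        struct extent cur = {0, 0, 0};
        size_t n = 0;

        for (size_t i = 0; i < nblocks; i++) {
                if (cur.len > 0 && map[i] == cur.physical + cur.len) {
                        cur.len++;          /* physically contiguous: extend */
                        continue;
                }
                if (cur.len > 0) {          /* run ended: emit it */
                        if (n < max_out)
                                out[n] = cur;
                        n++;
                }
                cur.logical = i;
                cur.physical = map[i];
                cur.len = 1;
        }
        if (cur.len > 0) {                  /* emit the final run */
                if (n < max_out)
                        out[n] = cur;
                n++;
        }
        return n;
}
```

Sixteen contiguous blocks collapse into a single extent, which is where the space saving of an extent-based table would come from.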
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080825155243.GA12855-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-08-26 10:29 ` Ryusuke Konishi
[not found] ` <20080826.192942.104752679.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-08-26 10:29 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A
Hi,
On Mon, 25 Aug 2008 17:52:44 +0200, Reinoud Zandijk wrote:
> on the nilfs TODO list i find:
>
> - writable snapshots
>
> This sounds like a fun feature. Would you like to have 1) multiple writable
> and snapshotable heads? 2) support updating a snapshot or 3) support
> writing to a snapshot that is lost when unmounted?
What's the difference between (1) and (2), do you mean ?
The number of read/write mounts concurrently mountable ?
I'd like to allow read/write mount for snapshots like:
# mount -t nilfs2 -o rw,cp=xxx /dev/block /dir
and maybe (2) is nearest to what I want.
The (3) seems to be rather restrictive.
> But how to number checkpoints and snapshots then?
I don't like CVS revision like extension.
Just appending derived checkpoints (from a snapshot) to current head,
seems to be preferable. What do you think?
> - data integrity support
>
> Isn't that already there ? I thought that each block had a CRC? or would
> you like to integrate that into the inode's btree?
No, each block doesn't have CRC. Two CRCs are given to each log, one
for header and one for the whole log. These are used for GC and mount
time recovery. Neither are used for read time data verification.
These CRC cannot be referred to efficiently when reading data blocks
or B-tree node blocks because they are written in the header of logs.
It may be possible to use them in background task which gets hints
from read requests and verifies blocks. Of course the ZFS like
extension is one way, but we don't have a definite plan for now.
I'd rather rely on the future data integrity extension of the block
layer (e.g. T10 DIF), for simplicity and performance reasons.
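The two-checksum scheme described above (one CRC for the log header, one for the whole log) can be illustrated with the following sketch. It is a simplification under assumptions: it uses the plain zlib-style CRC-32 rather than whatever seed and variant NILFS actually uses, it treats the stored checksums as separate arguments instead of fields inside the log, and `check_log` and `enum log_state` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (zlib variant). */
static uint32_t crc32_buf(const void *buf, size_t len)
{
        const uint8_t *p = buf;
        uint32_t crc = 0xFFFFFFFFu;

        while (len--) {
                crc ^= *p++;
                for (int k = 0; k < 8; k++)
                        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return ~crc;
}

enum log_state { LOG_OK, LOG_BAD_HEADER, LOG_BAD_DATA };

/*
 * Verify one log: the first header_len bytes are the header, and the
 * second checksum covers the whole buffer.  The header is checked first
 * because nothing else in the log can be trusted if it is damaged.
 */
static enum log_state check_log(const uint8_t *log, size_t log_len,
                                size_t header_len,
                                uint32_t header_crc, uint32_t log_crc)
{
        if (header_len > log_len || crc32_buf(log, header_len) != header_crc)
                return LOG_BAD_HEADER;
        if (crc32_buf(log, log_len) != log_crc)
                return LOG_BAD_DATA;
        return LOG_OK;
}
```

Because both checksums live with the log rather than with each block, verifying a single data block at read time would mean re-reading the whole log, which matches the efficiency concern raised above.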
> - B tree base directory management
> - Extent support
>
> Isn't that mutually exclusive? I.e. btree's and extent support? Or are you
> refering to recording extents in a btree; i.e. map block 0->X, 1->X+1,
> 2->X+2 to a form of block 0-15 to block X to X+15?
Yeah, that's a bit ambiguous and inconsistent.
Think them separately, please. These are the items in long term todo list.
For extent based management, there seem to be some possibilities to apply
it in NILFS. For example,
- Extent tree (like ext4. This is a possible displacement of B-tree )
- Extent based DAT (would save disk space consumed by DAT and improve
performance)
- Extent based binfo in segment summary.
Among these, the second one seems to be worth of consideration IMO.
Anyhow, ``Extent support'' on the page should be rewritten.
Regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080826.192942.104752679.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-08-26 13:29 ` Reinoud Zandijk
[not found] ` <20080901.143956.08023399.ryusuke@osrg.net>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-08-26 13:29 UTC (permalink / raw)
To: NILFS Users mailing list
Dear Ryusuke,
On Tue, Aug 26, 2008 at 07:29:42PM +0900, Ryusuke Konishi wrote:
> On Mon, 25 Aug 2008 17:52:44 +0200, Reinoud Zandijk wrote:
> > - writable snapshots
> >
> > This sounds like a fun feature. Would you like to have 1) multiple writable
> > and snapshotable heads? 2) support updating a snapshot or 3) support
> > writing to a snapshot that is lost when unmounted?
>
> What's the difference between (1) and (2), do you mean ?
> The number of read/write mounts concurrently mountable ?
>
> I'd like to allow read/write mount for snapshots like:
>
> # mount -t nilfs2 -o rw,cp=xxx /dev/block /dir
>
> and maybe (2) is nearest to what I want.
> The (3) seems to be rather restrictive.
well i already guessed so :) though a forkable FS has its advantages! but
would one really use it in practice? It could be handy for switching
between configurations and effectively has a COW strategy that keeps the
diffs for each head between the fork point and the current point but that
could also be done with an LVM.
> > But how to number checkpoints and snapshots then?
>
> I don't like CVS revision like extension.
> Just appending derived checkpoints (from a snapshot) to current head,
> seems to be preferable. What do you think?
For option (2) new data/modifications can be written out only under the old
checkpoint number like the cleaner does.... but if you create a new
checkpoint for it ... i dunno; that would break the rule `the last
checkpoint is the head'. We could also give snapshot/head a name; then
increasing checkpoints is no issue if you keep the head name; one can then
search for the `HEAD' name with the highest checkpoint number. Or are you
suggesting a different way?
> > - data integrity support
>
> I'd rather expect that the future data integrity extension of block
> layer (e.g. T10 DIF) due to simplicity and performance reason.
yeah, maybe the block level would be better yes; easier at least :-D
> > - B tree base directory management
> > - Extent support
>
> For extent based management, there seem to be some possibilities to apply
> it in NILFS. For example,
>
> - Extent tree (like ext4. This is a possible displacement of B-tree )
> - Extent based DAT (would save disk space consumed by DAT and improve
> performance)
> - Extent based binfo in segment summary.
Extent based DAT yeah, that could be used yes. An extent tree is quite ...
a different thing though i dont know how difficult it would be to implement
an extent based DAT; haven't tried it yet.
With regards,
Reinoud
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080901.143956.08023399.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-09-01 5:51 ` Shaya Potter
[not found] ` <48BB82F7.4070607-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-01 11:07 ` Reinoud Zandijk
1 sibling, 1 reply; 15+ messages in thread
From: Shaya Potter @ 2008-09-01 5:51 UTC (permalink / raw)
To: NILFS Users mailing list
As I mentioned a while ago, we jury rigged writable snapshots by
combining nilfs w/ unionfs.
Instead of writing to the root of the file system, you write to a subdir.
so we start w/ /nilfs/t0
when we want to rollback and continue to work
we mount the ro snapshot on /s0 and create /nilfs/t1 use unionfs to
union together /s0 (ro) and /nilfs/t1 (rw).
Ryusuke Konishi wrote:
> Dear Reinoud,
>
> On Tue, 26 Aug 2008 15:29:44 +0200, Reinoud Zandijk wrote:
>>>> But how to number checkpoints and snapshots then?
>>> I don't like CVS revision like extension.
>>> Just appending derived checkpoints (from a snapshot) to current head,
>>> seems to be preferable. What do you think?
>> For option (2) new data/modifications can be written out only under the old
>> checkpoint number like the cleaner does.... but if you create a new
>> checkpoint for it ... i dunno; that would break the rule `the last
>> checkpoint is the head'.
>
> Or, we can just think the ``main'' stream was replaced by the
> continued snapshot every time it is mounted in rw-mode. In this case,
> the head is regarded to be moved to the new (latest) checkpoint. This
> is actually convenient for the recovery in which a user pushed
> ``recover button'' for the snapshot.
>
> Note that even the old head becomes a plain checkpoint, it's still
> mountable and continuable again by being changed to a snapshot.
>
> To realize writable snapshots on this interpretation, However, we have
> to solve technical problems around the DAT file. The DAT file, which
> is a table file to map virtual disk addresses to actual disk addreses,
> also maintains lifetime information of each disk block, which is used
> to perform garbage collection.
>
> To that end, this file must be extended to handle multiple lifetimes
> per block, and this would complicate the DAT. Without the DAT file,
> things are not so difficult. This would be achieved in a future, but
> in the meantime, I'll use rsync to continue snapshots ;)
>
>> We could also give snapshot/head a name; then
>> increasing checkpoints is no issue if you keep the head name
>
> Yeah, this would be possible by adding another meta data file
> (e.g. tag file) which maps the HEAD names to checkpoint numbers of
> snapshots. When keeping multiple writeable snapshots, this kind of
> extension would be demanded than now. However, I'd rather do this in
> userland with a regular file (i.e. a DB file) or with TAG files each
> of which simply records a checkpoint number.
>
>
> Regards,
> Ryusuke Konishi
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <48BB82F7.4070607-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-09-01 8:16 ` Ryusuke Konishi
[not found] ` <20080901.171643.74124381.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-01 8:16 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, spotter-eQaUEPhvms7ENvBUuze7eA
On Mon, 01 Sep 2008 01:51:51 -0400, Shaya Potter wrote:
> As I mentioned a while ago, we jury rigged writable snapshots by
> combining nilfs w/ unionfs.
Yeah, I remember.
It sounds much better at least than switching with rsync :)
> Instead of writing to the root of the file system, you write to a subdir.
>
> so we start w/ /nilfs/t0
>
> when we want to rollback and continue to work
>
> we mount the ro snapshot on /s0 and create /nilfs/t1 use unionfs to
> union together /s0 (ro) and /nilfs/t1 (rw).
I've tried this without making sub directories:
# mkdir /nilfs /snap-ro /snap-rw /change
# mount -t nilfs2 /dev/sdb1 /nilfs
...
# mkcp -s
# lscp
CNO    DATE        TIME      MODE  SKT  NBLKINC  ICNT
...
62305  2008-09-01  16:13:28  ss    -        488    39
62306  2008-09-01  16:13:33  cp    -          8    39
# mount -t nilfs2 -o ro,cp=62305 /dev/sdb1 /snap-ro
# mount -t unionfs -o dirs=/snap-ro=rw unionfs /snap-rw
# unionctl /snap-rw --add --before /snap-ro --mode rw /change
<use /snap-rw as a writable snapshot mount>
# mount
...
/dev/sdb1 on /nilfs type nilfs2 (rw,gcpid=9512)
/dev/sdb1 on /snap-ro type nilfs2 (ro,cp=62305)
unionfs on /snap-rw type unionfs (rw,dirs=/snap-ro=rw)
It's working fine.
Is there a quicker way? (Or something to add?)
Cheers,
Ryusuke
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080901.143956.08023399.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-09-01 5:51 ` Shaya Potter
@ 2008-09-01 11:07 ` Reinoud Zandijk
[not found] ` <20080901110730.GA21008-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
1 sibling, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-09-01 11:07 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
Dear folks, dear Ryusuke,
On Mon, Sep 01, 2008 at 02:39:56PM +0900, Ryusuke Konishi wrote:
> Or, we can just think the ``main'' stream was replaced by the
> continued snapshot every time it is mounted in rw-mode. In this case,
> the head is regarded to be moved to the new (latest) checkpoint. This
> is actually convenient for the recovery in which a user pushed
> ``recover button'' for the snapshot.
>
> Note that even the old head becomes a plain checkpoint, it's still
> mountable and continuable again by being changed to a snapshot.
sounds reasonable yes... but why would that give problems with the DAT
file? If you always load the DAT, CP and SU descriptors from the latest
checkpoint as recorded in the superroot even when mounting a snapshot
read-write all is ok i think. The `old head' will be preserved initially as
will be all allocations. If you want to keep the old head you'll have to
make it a snapshot and thus protect all entries that have the old head
snapshot in their intervals.
Changing the `old head' snapshot to a checkpoint will just result in it being
cleaned up AFAICS. Do you expect trouble with the current interval code in
the cleaner? I can't see why the DAT would need change.
> To that end, this file must be extended to handle multiple lifetimes
> per block, and this would complicate the DAT. Without the DAT file,
> things are not so difficult. This would be achieved in a future, but
> in the meantime, I'll use rsync to continue snapshots ;)
are you thinking about removing the DAT file?? I thought it was the
addition to v2.0 :)
> > We could also give snapshot/head a name; then
> > increasing checkpoints is no issue if you keep the head name
>
> Yeah, this would be possible by adding another meta data file
> (e.g. tag file) which maps the HEAD names to checkpoint numbers of
> snapshots. When keeping multiple writeable snapshots, this kind of
> extension would be demanded than now. However, I'd rather do this in
> userland with a regular file (i.e. a DB file) or with TAG files each
> of which simply records a checkpoint number.
The extra meta file sounds good but i dont like the `userland' DB solution;
it would make nilfs dependent on DB4 (or the like) and it could make it
non-selfcontaining.
With regards,
Reinoud
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080901.171643.74124381.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-09-01 14:27 ` Shaya Potter
[not found] ` <48BBFBEA.2000308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Shaya Potter @ 2008-09-01 14:27 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
Ryusuke Konishi wrote:
>
> I've tried this without making sub directories:
>
> # mkdir /nilfs /snap-ro /snap-rw /change
> # mount -t nilfs2 /dev/sdb1 /nilfs
> ...
> # mkcp -s
> # lscp
> CNO    DATE        TIME      MODE  SKT  NBLKINC  ICNT
> ...
> 62305  2008-09-01  16:13:28  ss    -        488    39
> 62306  2008-09-01  16:13:33  cp    -          8    39
>
> # mount -t nilfs2 -o ro,cp=62305 /dev/sdb1 /snap-ro
> # mount -t unionfs -o dirs=/snap-ro=rw unionfs /snap-rw
> # unionctl /snap-rw --add --before /snap-ro --mode rw /change
> <use /snap-rw as a writable snapshot mount>
>
> # mount
> ...
> /dev/sdb1 on /nilfs type nilfs2 (rw,gcpid=9512)
> /dev/sdb1 on /snap-ro type nilfs2 (ro,cp=62305)
> unionfs on /snap-rw type unionfs (rw,dirs=/snap-ro=rw)
>
> It's working fine.
> Is there a quicker way? (Or something to add?)
well, with new unionfs, you wouldn't use unionctl, but what I did was
close, but something more along the lines of
- to setup
mkdir /base /nilfs /snap-ro /union
mount -t ext3 /dev/sdb1 /base
mount -t nilfs2 /dev/sdb2 /nilfs
mkdir /nilfs/1
mount -t unionfs -o dirs=/nilfs/1=rw,/base=ro none /union
(use union), all writes go to nilfs
- to rollback to a checkpoint, but keep it writable.
mount -t nilfs2 -o ro,cp=xyz /dev/sdb1 /snap-ro
mkdir /nilfs/2
mount -t unionfs -o dirs=/nilfs/2=rw,/nilfs/1=ro,/base=ro none /union
i.e. chaining the /nilfs/* dirs together with unionfs.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080901110730.GA21008-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-09-01 16:51 ` Ryusuke Konishi
[not found] ` <20080902.015156.126164477.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-01 16:51 UTC (permalink / raw)
To: reinoud-S783fYmB3Ccdnm+yROfE0A; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
On Mon, 1 Sep 2008 13:07:30 +0200, Reinoud Zandijk wrote:
> Dear folks, dear Ryusuke,
>
> On Mon, Sep 01, 2008 at 02:39:56PM +0900, Ryusuke Konishi wrote:
> > Or, we can just think the ``main'' stream was replaced by the
> > continued snapshot every time it is mounted in rw-mode. In this case,
> > the head is regarded to be moved to the new (latest) checkpoint. This
> > is actually convenient for the recovery in which a user pushed
> > ``recover button'' for the snapshot.
> >
> > Note that even the old head becomes a plain checkpoint, it's still
> > mountable and continuable again by being changed to a snapshot.
>
> sounds reasonable yes... but why would that give problems with the DAT
> file? If you always load the DAT, CP and SU descriptors from the latest
> checkpoint as recorded in the superroot even when mounting a snapshot
> read-write all is ok i think.
Good question.
Without garbage collection, things become simple like that.
With the garbage collection, however, the older versions of the DAT,
SU and CP cannot be selected because part of their entries may be
invalidated or reused; they will become meaningless by a GC.
Consequently, the NILFS2 GC removes the old blocks belonging to these
meta data files or old super roots instead of moving them to a new
log. As the result, only the latest files are available against these
three files.
> The `old head' will be preserved initially as
> will be all allocations. If you want to keep the old head you'll have to
> make it a snapshot and thus protect all entries that have the old head
> snapshot in their intervals.
Yeah, but I think we don't have to necessarily keep the old
checkpoint. Even if we keep the previous head, changing its
checkpoint to a snapshot can be done when appending a writable
snapshot to the head.
> Changing the `old head' snapshot to a checkpoint will just result in the
> cleaned up AFAICS. Do you expect trouble with the current interval code in
> the cleaner?
Sorry for confusing you. I mean copying the snapshot to a new
checkpoint which is appended after the latest checkpoint; I don't
mean downgrading the old snapshot to a checkpoint.
Because the NILFS2 GC never reclaims the current checkpoint (the
current head), we don't have to change the continued head to a
snapshot until switching it again to other snapshots.
> I can't see why the DAT would need change.
As I mentioned above, we cannot use the past version of the DAT. So,
to allow multiple forks, we have to maintain multiple lifetime
information for each virtual block address in the single DAT.
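The role of the lifetime information can be illustrated with a small sketch. It assumes, for illustration only, that each DAT entry records a half-open interval of checkpoint numbers [start, end) during which the block is visible, and that a special value marks blocks still belonging to the current head. The names `dat_entry`, `CNO_ALIVE` and `dat_entry_in_use` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CNO_ALIVE UINT64_MAX    /* assumed marker: block not yet superseded */

/* Hypothetical, simplified DAT entry for one virtual block address. */
struct dat_entry {
        uint64_t blocknr;       /* actual disk address */
        uint64_t start;         /* first checkpoint that sees the block */
        uint64_t end;           /* first checkpoint that no longer sees it */
};

/*
 * A block must be kept if it still belongs to the current head, or if
 * any snapshot number falls inside its lifetime interval.
 */
static int dat_entry_in_use(const struct dat_entry *e,
                            const uint64_t *snapshots, size_t nsnap)
{
        if (e->end == CNO_ALIVE)
                return 1;
        for (size_t i = 0; i < nsnap; i++)
                if (e->start <= snapshots[i] && snapshots[i] < e->end)
                        return 1;
        return 0;               /* the cleaner may reclaim the block */
}
```

A single interval works only while checkpoint numbers form one line of history. Once a snapshot is continued as a separate fork, the same virtual address can end at different points on different forks, which is why several lifetimes per address would be needed.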
> > To that end, this file must be extended to handle multiple lifetimes
> > per block, and this would complicate the DAT. Without the DAT file,
> > things are not so difficult. This would be achieved in a future, but
> > in the meantime, I'll use rsync to continue snapshots ;)
>
> are you thinking about removing the DAT file?? I thought it was the
> addition to v2.0 :)
No, it was achieved in v1.0 :)
Unfortunately we cannot go back to the world without the GC, so I
cannot remove the DAT file.
> > > We could also give snapshot/head a name; then
> > > increasing checkpoints is no issue if you keep the head name
> >
> > Yeah, this would be possible by adding another meta data file
> > (e.g. tag file) which maps the HEAD names to checkpoint numbers of
> > snapshots. When keeping multiple writeable snapshots, this kind of
> > extension would be demanded than now. However, I'd rather do this in
> > userland with a regular file (i.e. a DB file) or with TAG files each
> > of which simply records a checkpoint number.
>
> The extra meta file sounds good but i dont like the `userland' DB solution;
> it would make nilfs dependent on DB4 (or the like) and it could make it
> non-selfcontaining.
I think we can introduce the tags without bringing them into the
NILFS2 kernel module. Only the mount program and snapshot tools need
the tags, and the conversions from tags to checkpoint numbers can be
done in userland.
Adding the tag file is not bad idea, but this would increase the
performance penalty as well as the lines of kernel code. That is the
reason why I hesitate to admit this meta file.
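A userland tag lookup of the kind suggested here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the tag table is a text buffer with one well-formed "name checkpoint-number" pair per line, names are at most 16 characters, and `tag_lookup` and `tag_mount_opts` are hypothetical helpers. Only the `cp=` mount option itself is taken from the commands shown earlier in the thread.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find 'tag' in a table of "name cno" lines; returns 0 on success, -1 if absent. */
static int tag_lookup(const char *table, const char *tag, uint64_t *cno)
{
        const char *line = table;

        while (line && *line) {
                char name[17];
                uint64_t value;

                if (sscanf(line, "%16s %" SCNu64, name, &value) == 2 &&
                    strcmp(name, tag) == 0) {
                        *cno = value;
                        return 0;
                }
                line = strchr(line, '\n');   /* advance to the next line */
                if (line)
                        line++;
        }
        return -1;
}

/* Build the option string a mount helper could pass on, e.g. "ro,cp=62305". */
static int tag_mount_opts(uint64_t cno, char *opts, size_t len)
{
        int n = snprintf(opts, len, "ro,cp=%" PRIu64, cno);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In this arrangement the kernel module still only sees a checkpoint number, so the tag table can be an ordinary file maintained by the mount and snapshot tools.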
Regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <48BBFBEA.2000308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-09-01 17:31 ` Ryusuke Konishi
0 siblings, 0 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-01 17:31 UTC (permalink / raw)
To: spotter-eQaUEPhvms7ENvBUuze7eA; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
On Mon, 01 Sep 2008 10:27:54 -0400, Shaya Potter wrote:
> Ryusuke Konishi wrote:
> > It's working fine.
> > Is there a quicker way? (Or something to add?)
>
> well, with new unionfs, you wouldn't use unionctl, but what I did was
> close, but something more along the lines of
>
> - to setup
>
> mkdir /base /nilfs /snap-ro /union
> mount -t ext3 /dev/sdb1 /base
> mount -t nilfs2 /dev/sdb2 /nilfs
>
> mkdir /nilfs/1
>
> mount -t unionfs -o dirs=/nilfs/1=rw,/base=ro none /union
> (use union), all writes go to nilfs
>
> - to rollback to a checkpoint, but keep it writable.
>
> mount -t nilfs2 -o ro,cp=xyz /dev/sdb1 /snap-ro
> mkdir /nilfs/2
> mount -t unionfs -o dirs=/nilfs/2=rw,/nilfs/1=ro,/base=ro none /union
>
> i.e. chaining the /nilfs/* dirs together with unionfs.
Thank you for letting us know!
It looks nicer than mine.
These may be able to be wrapped in a shell script or in the
mount.nilfs2 helper program. Or, we may be able to use autofs to make
these mounts automatic.
Cheers,
Ryusuke
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080902.015156.126164477.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-09-02 15:02 ` Reinoud Zandijk
[not found] ` <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-09-02 15:02 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
Hi Ryusuke, hi folks,
On Tue, Sep 02, 2008 at 01:51:56AM +0900, Ryusuke Konishi wrote:
> > sounds reasonable yes... but why would that give problems with the DAT
> > file? If you always load the DAT, CP and SU descriptors from the latest
> > checkpoint as recorded in the superroot even when mounting a snapshot
> > read-write all is ok i think.
>
> Good question.
> Without garbage collection, things become simple like that.
>
> With the garbage collection, however, the older versions of the DAT,
> SU and CP cannot be selected because part of their entries may be
> invalidated or reused; they will become meaningless by a GC.
> Consequently, the NILFS2 GC removes the old blocks belonging to these
> meta data files or old super roots instead of moving them to a new
> log. As the result, only the latest files are available against these
> three files.
True, in my implementation i always go for the latest checkpoint's DAT, CP
and SU even when mounting a previous snapshot since older ones don't make
sense anymore as you already stated.
> > I can't see why the DAT would need change.
>
> As I mentioned above, we cannot use the past version of the DAT. So,
> to allow multiple forks, we have to maintain multiple lifetime
> information for each virtual block address in the single DAT.
I don't think that's needed. In the case with no history retention at all, all
snapshots will be retained anyway, regardless of how they are created/forked,
right? So each snapshot stays mountable and updateable. There will only be
a number of ifiles alive, one for each snapshot, as is already the case now.
If you'd like to modify the GC to keep, say, the history for a specified
time, it only means that it needs to keep the history for the specified time
for all the snapshots that can be mounted as `head'.
> > The extra meta file sounds good but i dont like the `userland' DB solution;
> > it would make nilfs dependent on DB4 (or the like) and it could make it
> > non-selfcontaining.
>
> I think we can introduce the tags without bringing them into the
> NILFS2 kernel module. Only the mount program and snapshot tools need
> the tags, and the conversions from tags to checkpoint numbers can be
> done in userland.
i'd still opt the kernel metafile solution. The best way is also the
easiest i think :
struct nilfs_checkpoint_list {
        __le64 ssl_next;
        __le64 ssl_prev;
};

struct nilfs_checkpoint {
        __le32 cp_flags;
        __le32 cp_checkpoints_count;
        struct nilfs_checkpoint_list cp_snapshot_list;
        __le64 cp_cno;
        __le64 cp_create;
        __le64 cp_nblk_inc;
        __le64 cp_inodes_count;
        __le64 cp_blocks_count; /* Reserved (might be deleted) */
        /* Do not change the byte offset of ifile inode.
           To keep the compatibility of the disk format,
           additional fields should be added behind cp_ifile_inode. */
        struct nilfs_inode cp_ifile_inode;
        struct nilfs_checkpoint_list cp_head_list;
        char cp_headname[16];
};
So just add a `cp_head_list' inside the nilfs_checkpoint info and a head
name (a pity one can't reorder the entries). When creating a new checkpoint,
one can update the mounted checkpoint's `cp_head_list' to point to the
newly added checkpoint number and write out the new checkpoint. This checkpoint
also stores the name, so in case the chain is lost due to a software bug or
whatever, it can be rebuilt. It is also handy for dumping snapshots with
their head names.
It hardly takes any coding and has minimal impact on the rest.
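How such a head chain might be walked, and rebuilt from the names if a link is lost, is sketched below. This is an in-memory illustration only: `cp_info` is a hypothetical trimmed-down stand-in for the proposed checkpoint record, checkpoint number 0 is assumed to mean "no link", and the helper names are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct cp_list {
        uint64_t next, prev;    /* checkpoint numbers; 0 = no link (assumed) */
};

/* Hypothetical, trimmed-down checkpoint record. */
struct cp_info {
        uint64_t cno;
        struct cp_list head_list;
        char headname[16];
};

static const struct cp_info *cp_find(const struct cp_info *cps, size_t n,
                                     uint64_t cno)
{
        for (size_t i = 0; i < n; i++)
                if (cps[i].cno == cno)
                        return &cps[i];
        return NULL;
}

/* Follow head_list.next from 'cno' to the newest checkpoint of that head. */
static uint64_t head_tip(const struct cp_info *cps, size_t n, uint64_t cno)
{
        const struct cp_info *cp = cp_find(cps, n, cno);
        size_t steps = 0;       /* bounds the walk if the chain has a cycle */

        while (cp && cp->head_list.next != 0 && steps++ < n) {
                const struct cp_info *next = cp_find(cps, n, cp->head_list.next);

                if (!next)
                        break;  /* broken chain: stop at the last good link */
                cp = next;
        }
        return cp ? cp->cno : 0;
}

/* Fallback when the chain is damaged: highest cno carrying the head name. */
static uint64_t head_tip_by_name(const struct cp_info *cps, size_t n,
                                 const char *name)
{
        uint64_t best = 0;

        for (size_t i = 0; i < n; i++)
                if (strncmp(cps[i].headname, name, sizeof cps[i].headname) == 0 &&
                    cps[i].cno > best)
                        best = cps[i].cno;
        return best;
}
```

The name-based scan is the recovery path mentioned above: it gives the same answer as the chain walk as long as every checkpoint of a head carries that head's name.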
> Adding the tag file is not bad idea, but this would increase the
> performance penalty as well as the lines of kernel code. That is the
> reason why I hesitate to admit this meta file.
Is there a performance penalty then? It is at most only checked at
mount/unmount time....
BTW, it's a pity the data structures are not tagged and/or versioned :-/ it
would be handy to be able to recognize and reorganise things.
With regards,
Reinoud
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-09-03 12:39 ` Reinoud Zandijk
2008-09-03 16:32 ` Ryusuke Konishi
1 sibling, 0 replies; 15+ messages in thread
From: Reinoud Zandijk @ 2008-09-03 12:39 UTC (permalink / raw)
To: NILFS Users mailing list
Hi Ryusuke, hi folks,
On Tue, Sep 02, 2008 at 05:02:26PM +0200, Reinoud Zandijk wrote:
> > > I can't see why the DAT would need change.
> >
> > As I mentioned above, we cannot use the past version of the DAT. So,
> > to allow multiple forks, we have to maintain multiple lifetime
> > information for each virtual block address in the single DAT.
>
> I don't think that's needed. In the case with no history retention at all, all
> snapshots will be retained anyway, regardless of how they are created/forked,
> right? So each snapshot stays mountable and updateable. There will only be
> a number of ifiles alive, one for each snapshot, as is already the case now.
>
> If you'd like to modify the GC to keep, say, the history for a specified
> time, it only means that it needs to keep the history for the specified time
> for all the snapshots that can be mounted as `head'.
Hmmm, I forgot that the virtual addresses will get multiplexed unless new
virtual addresses are used... that sucks. So basically each virtual address
needs an indicator of which head it's on, and if it's not the current head, a
new virtual address needs to be assigned on write... hmmm
maybe we should cancel the idea of multiple heads for now? :)
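To make the problem concrete, here is a minimal sketch of the rule described above: each virtual address records which head owns it, and a write from any other head gets a freshly allocated virtual address instead of updating the shared one. The structures and functions (`dat_sketch`, `dat_alloc`, `dat_write`) are hypothetical in-memory stand-ins, not the real DAT format or kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory DAT entry; NOT the NILFS2 on-disk format. */
struct dat_entry_sketch {
    uint64_t blocknr;  /* physical block this virtual address maps to */
    uint32_t head_id;  /* head (fork) that owns this virtual address */
    int      in_use;
};

#define DAT_SKETCH_SIZE 64

struct dat_sketch {
    struct dat_entry_sketch e[DAT_SKETCH_SIZE];
};

/* Allocate a fresh virtual address owned by head_id; -1 when full. */
static int64_t dat_alloc(struct dat_sketch *dat, uint32_t head_id,
                         uint64_t blocknr)
{
    for (int64_t v = 0; v < DAT_SKETCH_SIZE; v++) {
        if (!dat->e[v].in_use) {
            dat->e[v].in_use = 1;
            dat->e[v].head_id = head_id;
            dat->e[v].blocknr = blocknr;
            return v;
        }
    }
    return -1;
}

/* Write a block through vaddr while cur_head is mounted. Returns the
 * virtual address the parent block pointer must reference afterwards. */
static int64_t dat_write(struct dat_sketch *dat, int64_t vaddr,
                         uint32_t cur_head, uint64_t new_blocknr)
{
    assert(vaddr >= 0 && vaddr < DAT_SKETCH_SIZE && dat->e[vaddr].in_use);

    if (dat->e[vaddr].head_id == cur_head) {
        /* Same head: the existing virtual address can simply be remapped. */
        dat->e[vaddr].blocknr = new_blocknr;
        return vaddr;
    }
    /* Other head: leave its mapping alone and hand out a new address. */
    return dat_alloc(dat, cur_head, new_blocknr);
}
```

The cost shows up in the return value: whenever a new virtual address is handed out, the parent block that referenced the old one must be rewritten too, which is part of why this is not a small change.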
with regards,
Reinoud
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-09-03 12:39 ` Reinoud Zandijk
@ 2008-09-03 16:32 ` Ryusuke Konishi
1 sibling, 0 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-03 16:32 UTC (permalink / raw)
To: reinoud-S783fYmB3Ccdnm+yROfE0A; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
Hi Reinoud,
On Wed, 3 Sep 2008 14:39:40 +0200, Reinoud Zandijk wrote:
> Hmmm, I forgot that the virtual addresses will get multiplexed unless new
> virtual addresses are used... that sucks. So basically each virtual address
> needs an indicator of which head it's on, and if it's not the current head, a
> new virtual address needs to be assigned on write... hmmm
>
> maybe we should cancel the idea of multiple heads for now? :)
Yep, so it's filed under the future TODOs :)
On Tue, 2 Sep 2008 17:02:26 +0200, Reinoud Zandijk wrote:
> > > The extra meta file sounds good, but I don't like the `userland' DB solution;
> > > it would make nilfs dependent on DB4 (or the like) and it could make it
> > > non-self-contained.
> >
> > I think we can introduce the tags without bringing them into the
> > NILFS2 kernel module. Only the mount program and snapshot tools need
> > the tags, and the conversions from tags to checkpoint numbers can be
> > done in userland.
>
> I'd still opt for the kernel metafile solution. The best way is also the
> easiest, I think:
>
> struct nilfs_checkpoint_list {
>         __le64 ssl_next;
>         __le64 ssl_prev;
> };
>
> struct nilfs_checkpoint {
>         __le32 cp_flags;
>         __le32 cp_checkpoints_count;
>         struct nilfs_checkpoint_list cp_snapshot_list;
>         __le64 cp_cno;
>         __le64 cp_create;
>         __le64 cp_nblk_inc;
>         __le64 cp_inodes_count;
>         __le64 cp_blocks_count; /* Reserved (might be deleted) */
>
>         /* Do not change the byte offset of ifile inode.
>            To keep the compatibility of the disk format,
>            additional fields should be added behind cp_ifile_inode. */
>         struct nilfs_inode cp_ifile_inode;
>
>         struct nilfs_checkpoint_list cp_head_list;
>         char cp_headname[16];
> };
>
> So just add a `cp_head_list' inside the nilfs_checkpoint info and a head
> name. (A pity one can't reorder the entries.) When creating a new checkpoint,
> one can update the mounted checkpoint's `cp_head_list' to point to the
> newly added checkpoint number and write out the new checkpoint. This checkpoint
> also stores the name, so in case the chain is lost due to a software bug or
> whatever, it can be rebuilt. It is also handy for dumping snapshots with
> their head names.
Sure, it's the primary way.
It has the advantage that the lscp command can list tag names.
But, as you guessed, it is a bit restrictive (i.e. limited name length
and a single tag per snapshot), and it also fattens the cpfile.
I won't reply NAK, but I want to reserve this as a last resort and
discuss alternatives carefully, because reverting a disk format change is
almost impossible.
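For reference while discussing alternatives, the chain handling in the quoted proposal could look roughly like the sketch below. It uses a cut-down, in-memory stand-in for the checkpoint (`cp_sketch`, with host-endian `uint64_t` instead of `__le64`), and the helper names `head_list_append` and `head_list_rebuild` are hypothetical. It also assumes that checkpoint number 0 can stand for "no link" and that a new checkpoint inherits the head name of the mounted one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-ins for the proposed on-disk fields (host-endian). */
struct cp_list_sketch {
    uint64_t ssl_next;
    uint64_t ssl_prev;
};

struct cp_sketch {
    uint64_t cp_cno;
    struct cp_list_sketch cp_head_list;
    char cp_headname[16];
};

/* Chain newcp behind the mounted checkpoint. Assumption: cno 0 means
 * "no link", and the mounted checkpoint is the tip of its head. */
static void head_list_append(struct cp_sketch *mounted,
                             struct cp_sketch *newcp)
{
    assert(mounted->cp_head_list.ssl_next == 0);

    memcpy(newcp->cp_headname, mounted->cp_headname,
           sizeof(newcp->cp_headname));
    newcp->cp_head_list.ssl_prev = mounted->cp_cno;
    newcp->cp_head_list.ssl_next = 0;
    mounted->cp_head_list.ssl_next = newcp->cp_cno;
}

/* Recover a lost chain from the stored names alone. cps must be sorted
 * by ascending cp_cno. Returns the number of checkpoints relinked. */
static size_t head_list_rebuild(struct cp_sketch *cps, size_t n,
                                const char *name)
{
    struct cp_sketch *prev = NULL;
    size_t linked = 0;

    for (size_t i = 0; i < n; i++) {
        if (strncmp(cps[i].cp_headname, name,
                    sizeof(cps[i].cp_headname)) != 0)
            continue;
        cps[i].cp_head_list.ssl_prev = prev ? prev->cp_cno : 0;
        cps[i].cp_head_list.ssl_next = 0;
        if (prev)
            prev->cp_head_list.ssl_next = cps[i].cp_cno;
        prev = &cps[i];
        linked++;
    }
    return linked;
}
```

The sketch also makes the restrictions visible: the name is capped at 16 bytes and each checkpoint can carry exactly one of them.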
> > Adding the tag file is not a bad idea, but this would increase the
> > performance penalty as well as the lines of kernel code. That is the
> > reason why I hesitate to admit this meta file.
>
> Is there a performance penalty then? It is at most only checked at
> mount/unmount time....
I was worried about penalties on segment construction. But I may not
have to be so nervous. At least we have a way around it: unless we
write the HEAD tag to the file, we can make updates of the tag file
infrequent. An update will be needed only when switching to another
fork or manipulating tags.
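A minimal sketch of that idea, assuming a simple table that maps tag names to checkpoint numbers: lookups never touch the file, and only tag manipulation counts as a write. The names (`tag_table`, `tag_set`, `tag_lookup`), the size limits, and the use of 0 for "not found" are all hypothetical; no format for the tag file has been decided.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAG_NAME_MAX  32   /* hypothetical limit, including the NUL */
#define TAG_TABLE_MAX 16   /* hypothetical table capacity */

struct tag_entry {
    char     name[TAG_NAME_MAX];
    uint64_t cno;              /* checkpoint number the tag points at */
};

struct tag_table {
    struct tag_entry e[TAG_TABLE_MAX];
    size_t   n;                /* entries in use */
    unsigned writes;           /* how often the tag file would be rewritten */
};

/* Create a tag or move an existing one; returns 0 on success, -1 on error. */
static int tag_set(struct tag_table *t, const char *name, uint64_t cno)
{
    size_t len = strlen(name);

    assert(t != NULL);
    if (len == 0 || len >= TAG_NAME_MAX)
        return -1;
    for (size_t i = 0; i < t->n; i++) {
        if (strcmp(t->e[i].name, name) == 0) {
            t->e[i].cno = cno;
            t->writes++;
            return 0;
        }
    }
    if (t->n == TAG_TABLE_MAX)
        return -1;
    memcpy(t->e[t->n].name, name, len + 1);
    t->e[t->n].cno = cno;
    t->n++;
    t->writes++;
    return 0;
}

/* Resolve a tag to its checkpoint number; 0 means "no such tag". */
static uint64_t tag_lookup(const struct tag_table *t, const char *name)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->e[i].name, name) == 0)
            return t->e[i].cno;
    return 0;
}
```

Several tags may point at the same checkpoint here, and ordinary segment construction never increments `writes`, which is the property that keeps the tag file off the hot path.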
Cheers,
Ryusuke
^ permalink raw reply [flat|nested] 15+ messages in thread