* directory entries
@ 2008-08-23 20:38 Reinoud Zandijk
[not found] ` <20080823203853.GA19421-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-08-23 20:38 UTC (permalink / raw)
To: NILFS Users mailing list
Dear folks,
i wondered why directory lengths are specified in blocksize units resulting
in the last dirent in the block to fill up the space with its rec_len.
Searching for free space to enter a directory entry i could use
rec_len - NILFS_DIR_REC_LEN(last_dir_entry->namelen) as free space indicator
but i wondered if it might not be a better practice to allow zero-length
dir entries that are used to fill up the rest of the block with its
rec_len.
Thoughts?
With regards,
Reinoud
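The free-space calculation mentioned above can be sketched in C as follows. This is only a minimal sketch, assuming an ext2-style entry with a 64-bit inode number (a 12-byte fixed header, records padded to 8 bytes); the names dirent_hdr, DIR_REC_LEN, dirent_slack and dirent_can_hold are illustrative and are not taken from the NILFS sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-disk entry header (values shown in host byte order):
 * 8 (inode) + 2 (rec_len) + 1 (name_len) + 1 (file_type) = 12 bytes,
 * followed directly by the name. Hypothetical layout for illustration. */
struct dirent_hdr {
	uint64_t inode;
	uint16_t rec_len;	/* distance to the next entry, in bytes */
	uint8_t  name_len;
	uint8_t  file_type;
};

#define DIR_PAD		8u
#define DIR_ROUND	(DIR_PAD - 1u)
/* Bytes an entry really needs: 12-byte header plus name, rounded up. */
#define DIR_REC_LEN(nl)	((12u + (unsigned)(nl) + DIR_ROUND) & ~DIR_ROUND)

/* Slack hidden behind an entry: what rec_len covers beyond its own needs. */
static inline unsigned dirent_slack(const struct dirent_hdr *de)
{
	unsigned used = DIR_REC_LEN(de->name_len);

	assert(de->rec_len >= used);
	return de->rec_len - used;
}

/* Can a new entry with a name of new_len bytes be carved out of the slack? */
static inline int dirent_can_hold(const struct dirent_hdr *de, unsigned new_len)
{
	return dirent_slack(de) >= DIR_REC_LEN(new_len);
}
```

With a zero-length filler entry, as proposed above, the same test would presumably reduce to comparing the filler's rec_len against DIR_REC_LEN(new_len).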
_______________________________________________
users mailing list
users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
https://www.nilfs.org/mailman/listinfo/users
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20080823203853.GA19421-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>]
* Re: directory entries
[not found] ` <20080823203853.GA19421-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-08-25 3:21 ` Ryusuke Konishi
[not found] ` <20080825.122125.65657043.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-08-25 3:21 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A

Hi, Reinoud!

On Sat, 23 Aug 2008 22:38:53 +0200, Reinoud Zandijk wrote:
> Dear folks,
> i wondered why directory lengths are specified in blocksize units resulting
> in the last dirent in the block to fill up the space with its rec_len.
> Searching for free space to enter a directory entry i could use
> rec_len - NILFS_DIR_REC_LEN(last_dir_enty->namelen) as free space indicator
> but i wondered if it might not be a better practice to allow zero-length
> dir entries that are used to fill up the rest of the block with its
> rec_len.
>
> Thoughts?

The directory format of NILFS comes from that of ext2 file system
except its inode number field is extended to 64 bytes. So, for code
maintenance, I think it's not good idea to make such change.

Redesigning directory format is actually one of todo items. If
someone send us a better alternative, we may switch to it. In the
meantime however, we'd like to avoid confusion on this.

Thank you for comments.

Regards,
Ryusuke Konishi

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20080825.122125.65657043.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: directory entries
[not found] ` <20080825.122125.65657043.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-08-25 3:30 ` Ryusuke Konishi
[not found] ` <20080825.123047.128885778.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-08-25 3:30 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A

On Mon, 25 Aug 2008 12:21:25 +0900 (JST), Ryusuke Konishi wrote:
> The directory format of NILFS comes from that of ext2 file system
> except its inode number field is extended to 64 bytes.

Oops, 64 bits.

Ryusuke

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20080825.123047.128885778.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: directory entries
[not found] ` <20080825.123047.128885778.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-08-25 15:52 ` Reinoud Zandijk
[not found] ` <20080825155243.GA12855-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-08-25 15:52 UTC (permalink / raw)
To: NILFS Users mailing list

Dear folks,

on the nilfs TODO list i find:

- writable snapshots

This sounds like a fun feature. Would you like to have 1) multiple writable
and snapshotable heads? 2) support updating a snapshot or 3) support
writing to a snapshot that is lost when unmounted?

But how to number checkpoints and snapshots then?

- data integrity support

Isn't that already there ? I thought that each block had a CRC? or would
you like to integrate that into the inode's btree?

- B tree base directory management
- Extent support

Isn't that mutually exclusive? I.e. btree's and extent support? Or are you
referring to recording extents in a btree; i.e. map block 0->X, 1->X+1,
2->X+2 to a form of block 0-15 to block X to X+15?

With regards,
Reinoud

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20080825155243.GA12855-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>]
* Re: directory entries
[not found] ` <20080825155243.GA12855-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-08-26 10:29 ` Ryusuke Konishi
[not found] ` <20080826.192942.104752679.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-08-26 10:29 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A

Hi,

On Mon, 25 Aug 2008 17:52:44 +0200, Reinoud Zandijk wrote:
> on the nilfs TODO list i find:
>
> - writable snapshots
>
> This sounds like a fun feature. Would you like to have 1) multiple writable
> and snapshotable heads? 2) support updating a snapshot or 3) support
> writing to a snapshot that is lost when unmounted?

What's the difference between (1) and (2), do you mean ?
The number of read/write mounts concurrently mountable ?

I'd like to allow read/write mount for snapshots like:

 # mount -t nilfs2 -o rw,cp=xxx /dev/block /dir

and maybe (2) is nearest to what I want.
The (3) seems to be rather restrictive.

> But how to number checkpoints and snapshots then?

I don't like CVS revision like extension.
Just appending derived checkpoints (from a snapshot) to current head,
seems to be preferable. What do you think?

> - data integrity support
>
> Isn't that already there ? I thought that each block had a CRC? or would
> you like to integrate that into the inode's btree?

No, each block doesn't have CRC. Two CRCs are given to each log, one
for header and one for the whole log. These are used for GC and mount
time recovery. Neither are used for read time data verification.

These CRC cannot be referred to efficiently when reading data blocks
or B-tree node blocks because they are written in the header of logs.
It may be possible to use them in background task which gets hints
from read requests and verifies blocks.

Of course the ZFS like extension is one way, but we don't have
definite plan for now. I'd rather expect that the future data
integrity extension of block layer (e.g. T10 DIF) due to simplicity
and performance reason.

> - B tree base directory management
> - Extent support
>
> Isn't that mutually exclusive? I.e. btree's and extent support? Or are you
> referring to recording extents in a btree; i.e. map block 0->X, 1->X+1,
> 2->X+2 to a form of block 0-15 to block X to X+15?

Yeah, that's a bit ambiguous and inconsistent. Think them separately,
please. These are the items in long term todo list.

For extent based management, there seem to be some possibilities to apply
it in NILFS. For example,

 - Extent tree (like ext4. This is a possible displacement of B-tree )
 - Extent based DAT (would save disk space consumed by DAT and improve
   performance)
 - Extent based binfo in segment summary.

Among these, the second one seems to be worth of consideration IMO.

Anyhow, ``Extent support'' on the page should be rewritten.

Regards,
Ryusuke Konishi

^ permalink raw reply [flat|nested] 15+ messages in thread
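To illustrate the "block 0->X, 1->X+1, 2->X+2 becomes blocks 0-15 at X to X+15" idea discussed in this message, here is a small C sketch that folds a per-block mapping into extents. It is only an illustration under assumed names (struct extent, coalesce_extents); it does not reflect any actual or planned NILFS on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: len consecutive logical blocks starting at
 * lblk are stored in len consecutive physical blocks starting at pblk. */
struct extent {
	uint64_t lblk;
	uint64_t pblk;
	uint64_t len;
};

/* Fold a dense per-block map (logical block i lives at map[i]) into
 * extents. Returns the number of extents written to out; stops early if
 * more than max_out extents would be needed. */
static inline size_t coalesce_extents(const uint64_t *map, size_t n,
				      struct extent *out, size_t max_out)
{
	size_t count = 0;

	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (count > 0) {
			struct extent *last = &out[count - 1];

			/* Logical blocks are visited in order, so only
			 * physical contiguity has to be checked. */
			if (map[i] == last->pblk + last->len) {
				last->len++;
				continue;
			}
		}
		if (count == max_out)
			break;
		out[count].lblk = i;
		out[count].pblk = map[i];
		out[count].len = 1;
		count++;
	}
	return count;
}
```

A file whose 16 blocks are laid out contiguously would need a single extent here instead of 16 separate mappings, which is the kind of space saving mentioned for an extent based DAT.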
[parent not found: <20080826.192942.104752679.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: directory entries
[not found] ` <20080826.192942.104752679.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-08-26 13:29 ` Reinoud Zandijk
[not found] ` <20080901.143956.08023399.ryusuke@osrg.net>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-08-26 13:29 UTC (permalink / raw)
To: NILFS Users mailing list

Dear Ryusuke,

On Tue, Aug 26, 2008 at 07:29:42PM +0900, Ryusuke Konishi wrote:
> On Mon, 25 Aug 2008 17:52:44 +0200, Reinoud Zandijk wrote:
> > - writable snapshots
> >
> > This sounds like a fun feature. Would you like to have 1) multiple writable
> > and snapshotable heads? 2) support updating a snapshot or 3) support
> > writing to a snapshot that is lost when unmounted?
>
> What's the difference between (1) and (2), do you mean ?
> The number of read/write mounts concurrently mountable ?
>
> I'd like to allow read/write mount for snapshots like:
>
>  # mount -t nilfs2 -o rw,cp=xxx /dev/block /dir
>
> and maybe (2) is nearest to what I want.
> The (3) seems to be rather restrictive.

well i already guessed so :) though a forkable FS has its advantages! but
would one really use it in practice? It could be handy for switching between
configurations and effectively has a COW strategy that keeps the diffs for
each head between the fork point and the current point but that could also
be done with an LVM.

> > But how to number checkpoints and snapshots then?
>
> I don't like CVS revision like extension.
> Just appending derived checkpoints (from a snapshot) to current head,
> seems to be preferable. What do you think?

For option (2) new data/modifications can be written out only under the old
checkpoint number like the cleaner does.... but if you create a new
checkpoint for it ... i dunno; that would break the rule `the last
checkpoint is the head'. We could also give snapshot/head a name; then
increasing checkpoints is no issue if you keep the head name; one can then
search for the `HEAD' name with the highest checkpoint number. Or are you
suggesting a different way?

> > - data integrity support
>
> I'd rather expect that the future data integrity extension of block
> layer (e.g. T10 DIF) due to simplicity and performance reason.

yeah, maybe the block level would be better yes; easier at least :-D

> > - B tree base directory management
> > - Extent support
>
> For extent based management, there seem to be some possibilities to apply
> it in NILFS. For example,
>
> - Extent tree (like ext4. This is a possible displacement of B-tree )
> - Extent based DAT (would save disk space consumed by DAT and improve
>   performance)
> - Extent based binfo in segment summary.

Extent based DAT yeah, that could be used yes. An extent tree is quite ... a
different thing though i don't know how difficult it would be to implement
an extent based DAT; haven't tried it yet.

With regards,
Reinoud

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20080901.143956.08023399.ryusuke@osrg.net>]
[parent not found: <20080901.143956.08023399.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: directory entries
[not found] ` <20080901.143956.08023399.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-09-01 5:51 ` Shaya Potter
[not found] ` <48BB82F7.4070607-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-01 11:07 ` Reinoud Zandijk
1 sibling, 1 reply; 15+ messages in thread
From: Shaya Potter @ 2008-09-01 5:51 UTC (permalink / raw)
To: NILFS Users mailing list

As I mentioned a while ago, we jury rigged writable snapshots by
combining nilfs w/ unionfs.

Instead of writing to the root of the file system, you write to a subdir.

so we start w/ /nilfs/t0

when we want to rollback and continue to work

we mount the ro snapshot on /s0 and create /nilfs/t1 use unionfs to
union together /s0 (ro) and /nilfs/t1 (rw).

Ryusuke Konishi wrote:
> Dear Reinoud,
>
> On Tue, 26 Aug 2008 15:29:44 +0200, Reinoud Zandijk wrote:
>>>> But how to number checkpoints and snapshots then?
>>> I don't like CVS revision like extension.
>>> Just appending derived checkpoints (from a snapshot) to current head,
>>> seems to be preferable. What do you think?
>> For option (2) new data/modifications can be written out only under the old
>> checkpoint number like the cleaner does.... but if you create a new
>> checkpoint for it ... i dunno; that would break the rule `the last
>> checkpoint is the head'.
>
> Or, we can just think the ``main'' stream was replaced by the
> continued snapshot every time it is mounted in rw-mode. In this case,
> the head is regarded to be moved to the new (latest) checkpoint. This
> is actually convenient for the recovery in which a user pushed
> ``recover button'' for the snapshot.
>
> Note that even the old head becomes a plain checkpoint, it's still
> mountable and continuable again by being changed to a snapshot.
>
> To realize writable snapshots on this interpretation, However, we have
> to solve technical problems around the DAT file. The DAT file, which
> is a table file to map virtual disk addresses to actual disk addresses,
> also maintains lifetime information of each disk block, which is used
> to perform garbage collection.
>
> To that end, this file must be extended to handle multiple lifetimes
> per block, and this would complicate the DAT. Without the DAT file,
> things are not so difficult. This would be achieved in a future, but
> in the meantime, I'll use rsync to continue snapshots ;)
>
>> We could also give snapshot/head a name; then
>> increasing checkpoints is no issue if you keep the head name
>
> Yeah, this would be possible by adding another meta data file
> (e.g. tag file) which maps the HEAD names to checkpoint numbers of
> snapshots. When keeping multiple writeable snapshots, this kind of
> extension would be demanded than now. However, I'd rather do this in
> userland with a regular file (i.e. a DB file) or with TAG files each
> of which simply records a checkpoint number.
>
>
> Regards,
> Ryusuke Konishi

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <48BB82F7.4070607-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: directory entries
[not found] ` <48BB82F7.4070607-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-09-01 8:16 ` Ryusuke Konishi
[not found] ` <20080901.171643.74124381.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-01 8:16 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, spotter-eQaUEPhvms7ENvBUuze7eA

On Mon, 01 Sep 2008 01:51:51 -0400, Shaya Potter wrote:
> As I mentioned a while ago, we jury rigged writable snapshots by
> combining nilfs w/ unionfs.

Yeah, I remember.
It sounds much better at least than switching with rsync :)

> Instead of writing to the root of the file system, you write to a subdir.
>
> so we start w/ /nilfs/t0
>
> when we want to rollback and continue to work
>
> we mount the ro snapshot on /s0 and create /nilfs/t1 use unionfs to
> union together /s0 (ro) and /nilfs/t1 (rw).

I've tried this without making sub directories:

 # mkdir /nilfs /snap-ro /snap-rw /change
 # mount -t nilfs2 /dev/sdb1 /nilfs
 ...
 # mkcp -s
 # lscp
   CNO   DATE       TIME     MODE SKT NBLKINC ICNT
 ...
   62305 2008-09-01 16:13:28 ss   -   488     39
   62306 2008-09-01 16:13:33 cp   -   8       39

 # mount -t nilfs2 -o ro,cp=62305 /dev/sdb1 /snap-ro
 # mount -t unionfs -o dirs=/snap-ro=rw unionfs /snap-rw
 # unionctl /snap-rw --add --before /snap-ro --mode rw /change
 <use /snap-rw as a writable snapshot mount>

 # mount
 ...
 /dev/sdb1 on /nilfs type nilfs2 (rw,gcpid=9512)
 /dev/sdb1 on /snap-ro type nilfs2 (ro,cp=62305)
 unionfs on /snap-rw type unionfs (rw,dirs=/snap-ro=rw)

It's working fine.
Is there a quicker way? (Or something to add?)

Cheers,
Ryusuke

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20080901.171643.74124381.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: directory entries
[not found] ` <20080901.171643.74124381.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-09-01 14:27 ` Shaya Potter
[not found] ` <48BBFBEA.2000308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Shaya Potter @ 2008-09-01 14:27 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Ryusuke Konishi wrote:
>
> I've tried this without making sub directories:
>
>  # mkdir /nilfs /snap-ro /snap-rw /change
>  # mount -t nilfs2 /dev/sdb1 /nilfs
>  ...
>  # mkcp -s
>  # lscp
>    CNO   DATE       TIME     MODE SKT NBLKINC ICNT
>  ...
>    62305 2008-09-01 16:13:28 ss   -   488     39
>    62306 2008-09-01 16:13:33 cp   -   8       39
>
>  # mount -t nilfs2 -o ro,cp=62305 /dev/sdb1 /snap-ro
>  # mount -t unionfs -o dirs=/snap-ro=rw unionfs /snap-rw
>  # unionctl /snap-rw --add --before /snap-ro --mode rw /change
>  <use /snap-rw as a writable snapshot mount>
>
>  # mount
>  ...
>  /dev/sdb1 on /nilfs type nilfs2 (rw,gcpid=9512)
>  /dev/sdb1 on /snap-ro type nilfs2 (ro,cp=62305)
>  unionfs on /snap-rw type unionfs (rw,dirs=/snap-ro=rw)
>
> It's working fine.
> Is there a quicker way? (Or something to add?)

well, with new unionfs, you wouldn't use unionctl, but what I did was
close, but something more along the lines of

- to setup

 mkdir /base /nilfs /snap-ro /union
 mount -t ext3 /dev/sdb1 /base
 mount -t nilfs2 /dev/sdb2 /nilfs

 mkdir /nilfs/1

 mount -t unionfs -o dirs=/nilfs/1=rw,/base=ro none /union
 (use union), all writes go to nilfs

- to rollback to a checkpoint, but keep it writable.

 mount -t nilfs2 -o ro,cp=xyz /dev/sdb1 /snap-ro
 mkdir /nilfs/2
 mount -t unionfs -o dirs=/nilfs/2=rw,/nilfs/1=ro,/base=ro none /union

i.e. chaining the /nilfs/* dirs together with unionfs.

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <48BBFBEA.2000308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: directory entries
[not found] ` <48BBFBEA.2000308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-09-01 17:31 ` Ryusuke Konishi
0 siblings, 0 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-01 17:31 UTC (permalink / raw)
To: spotter-eQaUEPhvms7ENvBUuze7eA; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Mon, 01 Sep 2008 10:27:54 -0400, Shaya Potter wrote:
> Ryusuke Konishi wrote:
> > It's working fine.
> > Is there a quicker way? (Or something to add?)
>
> well, with new unionfs, you wouldn't use unionctl, but what I did was
> close, but something more along the lines of
>
> - to setup
>
>  mkdir /base /nilfs /snap-ro /union
>  mount -t ext3 /dev/sdb1 /base
>  mount -t nilfs2 /dev/sdb2 /nilfs
>
>  mkdir /nilfs/1
>
>  mount -t unionfs -o dirs=/nilfs/1=rw,/base=ro none /union
>  (use union), all writes go to nilfs
>
> - to rollback to a checkpoint, but keep it writable.
>
>  mount -t nilfs2 -o ro,cp=xyz /dev/sdb1 /snap-ro
>  mkdir /nilfs/2
>  mount -t unionfs -o dirs=/nilfs/2=rw,/nilfs/1=ro,/base=ro none /union
>
> i.e. chaining the /nilfs/* dirs together with unionfs.

Thank you for letting us know!
It looks nicer than mine.

These may be able to be wrapped in a shell script or in the
mount.nilfs2 helper program. Or, we may be able to use autofs to make
these mounts automatic.

Cheers,
Ryusuke

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: directory entries
[not found] ` <20080901.143956.08023399.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-09-01 5:51 ` Shaya Potter
@ 2008-09-01 11:07 ` Reinoud Zandijk
[not found] ` <20080901110730.GA21008-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
1 sibling, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-09-01 11:07 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Dear folks, dear Ryusuke,

On Mon, Sep 01, 2008 at 02:39:56PM +0900, Ryusuke Konishi wrote:
> Or, we can just think the ``main'' stream was replaced by the
> continued snapshot every time it is mounted in rw-mode. In this case,
> the head is regarded to be moved to the new (latest) checkpoint. This
> is actually convenient for the recovery in which a user pushed
> ``recover button'' for the snapshot.
>
> Note that even the old head becomes a plain checkpoint, it's still
> mountable and continuable again by being changed to a snapshot.

sounds reasonable yes... but why would that give problems with the DAT
file? If you always load the DAT, CP and SU descriptors from the latest
checkpoint as recorded in the superroot even when mounting a snapshot
read-write all is ok i think. The `old head' will be preserved initially as
will be all allocations. If you want to keep the old head you'll have to
make it a snapshot and thus protect all entries that have the old head
snapshot in their intervals.

Changing the `old head' snapshot to a checkpoint will just result in it
being cleaned up AFAICS. Do you expect trouble with the current interval
code in the cleaner?

I can't see why the DAT would need change.

> To that end, this file must be extended to handle multiple lifetimes
> per block, and this would complicate the DAT. Without the DAT file,
> things are not so difficult. This would be achieved in a future, but
> in the meantime, I'll use rsync to continue snapshots ;)

are you thinking about removing the DAT file?? I thought it was the
addition to v2.0 :)

> > We could also give snapshot/head a name; then
> > increasing checkpoints is no issue if you keep the head name
>
> Yeah, this would be possible by adding another meta data file
> (e.g. tag file) which maps the HEAD names to checkpoint numbers of
> snapshots. When keeping multiple writeable snapshots, this kind of
> extension would be demanded than now. However, I'd rather do this in
> userland with a regular file (i.e. a DB file) or with TAG files each
> of which simply records a checkpoint number.

The extra meta file sounds good but i don't like the `userland' DB solution;
it would make nilfs dependent on DB4 (or the like) and it could make it
non-selfcontaining.

With regards,
Reinoud

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20080901110730.GA21008-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>]
* Re: directory entries
[not found] ` <20080901110730.GA21008-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-09-01 16:51 ` Ryusuke Konishi
[not found] ` <20080902.015156.126164477.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-01 16:51 UTC (permalink / raw)
To: reinoud-S783fYmB3Ccdnm+yROfE0A; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Mon, 1 Sep 2008 13:07:30 +0200, Reinoud Zandijk wrote:
> Dear folks, dear Ryusuke,
>
> On Mon, Sep 01, 2008 at 02:39:56PM +0900, Ryusuke Konishi wrote:
> > Or, we can just think the ``main'' stream was replaced by the
> > continued snapshot every time it is mounted in rw-mode. In this case,
> > the head is regarded to be moved to the new (latest) checkpoint. This
> > is actually convenient for the recovery in which a user pushed
> > ``recover button'' for the snapshot.
> >
> > Note that even the old head becomes a plain checkpoint, it's still
> > mountable and continuable again by being changed to a snapshot.
>
> sounds reasonable yes... but why would that give problems with the DAT
> file? If you always load the DAT, CP and SU descriptors from the latest
> checkpoint as recorded in the superroot even when mounting a snapshot
> read-write all is ok i think.

Good question.
Without garbage collection, things become simple like that.

With the garbage collection, however, the older versions of the DAT,
SU and CP cannot be selected because part of their entries may be
invalidated or reused; they will become meaningless by a GC.
Consequently, the NILFS2 GC removes the old blocks belonging to these
meta data files or old super roots instead of moving them to a new
log. As the result, only the latest files are available against these
three files.

> The `old head' will be preserved initially as
> will be all allocations. If you want to keep the old head you'll have to
> make it a snapshot and thus protect all entries that have the old head
> snapshot in their intervals.

Yeah, but I think we don't have to necessarily keep the old
checkpoint. Even if we keep the previous head, changing its
checkpoint to a snapshot can be done when appending a writable
snapshot to the head.

> Changing the `old head' snapshot to a checkpoint will just result in it
> being cleaned up AFAICS. Do you expect trouble with the current interval
> code in the cleaner?

Sorry for confusing you. I mean copying the snapshot to a new
checkpoint which is appended after the latest checkpoint; I don't mean
downgrading the old snapshot to a checkpoint.

Because the NILFS2 GC never reclaims the current checkpoint (the
current head), we don't have to change the continued head to a
snapshot until switching it again to other snapshots.

> I can't see why the DAT would need change.

As I mentioned above, we cannot use the past version of the DAT. So,
to allow multiple forks, we have to maintain multiple lifetime
information for each virtual block address in the single DAT.

> > To that end, this file must be extended to handle multiple lifetimes
> > per block, and this would complicate the DAT. Without the DAT file,
> > things are not so difficult. This would be achieved in a future, but
> > in the meantime, I'll use rsync to continue snapshots ;)
>
> are you thinking about removing the DAT file?? I thought it was the
> addition to v2.0 :)

No, it was achieved in v1.0 :)
Unfortunately we cannot go back to the world without the GC, so I
cannot remove the DAT file.

> > > We could also give snapshot/head a name; then
> > > increasing checkpoints is no issue if you keep the head name
> >
> > Yeah, this would be possible by adding another meta data file
> > (e.g. tag file) which maps the HEAD names to checkpoint numbers of
> > snapshots. When keeping multiple writeable snapshots, this kind of
> > extension would be demanded than now. However, I'd rather do this in
> > userland with a regular file (i.e. a DB file) or with TAG files each
> > of which simply records a checkpoint number.
>
> The extra meta file sounds good but i don't like the `userland' DB solution;
> it would make nilfs dependent on DB4 (or the like) and it could make it
> non-selfcontaining.

I think we can introduce the tags without bringing them into the
NILFS2 kernel module. Only the mount program and snapshot tools need
the tags, and the conversions from tags to checkpoint numbers can be
done in userland.

Adding the tag file is not bad idea, but this would increase the
performance penalty as well as the lines of kernel code. That is the
reason why I hesitate to admit this meta file.

Regards,
Ryusuke Konishi

^ permalink raw reply [flat|nested] 15+ messages in thread
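The role of the lifetime information in the DAT, as described in this message, can be sketched roughly as below. This is a simplified model under stated assumptions, not the NILFS2 implementation: the struct dat_entry fields, the CNO_MAX sentinel and the dat_entry_is_live helper are hypothetical names, and the real cleaner may apply further rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CNO_MAX UINT64_MAX	/* assumed marker for "not superseded yet" */

/* Hypothetical DAT entry: one virtual block address maps to one disk
 * address and carries a single lifetime [start, end) in checkpoint
 * numbers. */
struct dat_entry {
	uint64_t blocknr;	/* actual disk address */
	uint64_t start;		/* checkpoint that created the block */
	uint64_t end;		/* checkpoint that replaced it, or CNO_MAX */
};

/* A block must be kept by the GC if it still belongs to the current head
 * or if some snapshot falls inside its lifetime. */
static inline int dat_entry_is_live(const struct dat_entry *de,
				    const uint64_t *snapshots, size_t nsnap)
{
	assert(de->start <= de->end);
	if (de->end == CNO_MAX)
		return 1;
	for (size_t i = 0; i < nsnap; i++)
		if (snapshots[i] >= de->start && snapshots[i] < de->end)
			return 1;
	return 0;
}
```

A single [start, end) interval presumes one linear history. With several writable heads the same virtual address could be superseded on one branch and still current on another, which is one way to read the remark that multiple lifetimes per virtual block address would be needed.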
[parent not found: <20080902.015156.126164477.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: directory entries
[not found] ` <20080902.015156.126164477.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-09-02 15:02 ` Reinoud Zandijk
[not found] ` <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Reinoud Zandijk @ 2008-09-02 15:02 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi Ryusuke, hi folks,

On Tue, Sep 02, 2008 at 01:51:56AM +0900, Ryusuke Konishi wrote:
> > sounds reasonable yes... but why would that give problems with the DAT
> > file? If you always load the DAT, CP and SU descriptors from the latest
> > checkpoint as recorded in the superroot even when mounting a snapshot
> > read-write all is ok i think.
>
> Good question.
> Without garbage collection, things become simple like that.
>
> With the garbage collection, however, the older versions of the DAT,
> SU and CP cannot be selected because part of their entries may be
> invalidated or reused; they will become meaningless by a GC.
> Consequently, the NILFS2 GC removes the old blocks belonging to these
> meta data files or old super roots instead of moving them to a new
> log. As the result, only the latest files are available against these
> three files.

True, in my implementation i always go for the latest checkpoint's DAT, CP
and SU even when mounting a previous snapshot since older ones don't make
sense anymore as you already stated.

> > I can't see why the DAT would need change.
>
> As I mentioned above, we cannot use the past version of the DAT. So,
> to allow multiple forks, we have to maintain multiple lifetime
> information for each virtual block address in the single DAT.

I don't think that's needed. In the case with no history retention at all all
snapshots will be retained anyway regardless of how they are created/forked
right? So each snapshot keeps mountable and updateable. There will only be
a number of ifiles alive for each snapshot one like its already now.

If you'd like you modify the GC to keep say the history for a specified
time it only means that it needs to keep the history for the specified time
for all the snapshots that can be mounted as `head'.

> > The extra meta file sounds good but i don't like the `userland' DB solution;
> > it would make nilfs dependent on DB4 (or the like) and it could make it
> > non-selfcontaining.
>
> I think we can introduce the tags without bringing them into the
> NILFS2 kernel module. Only the mount program and snapshot tools need
> the tags, and the conversions from tags to checkpoint numbers can be
> done in userland.

i'd still opt the kernel metafile solution. The best way is also the
easiest i think :

struct nilfs_checkpoint_list {
	__le64 ssl_next;
	__le64 ssl_prev;
};

struct nilfs_checkpoint {
	__le32 cp_flags;
	__le32 cp_checkpoints_count;
	struct nilfs_checkpoint_list cp_snapshot_list;
	__le64 cp_cno;
	__le64 cp_create;
	__le64 cp_nblk_inc;
	__le64 cp_inodes_count;
	__le64 cp_blocks_count;	/* Reserved (might be deleted) */

	/* Do not change the byte offset of ifile inode.
	   To keep the compatibility of the disk format,
	   additional fields should be added behind cp_ifile_inode. */
	struct nilfs_inode cp_ifile_inode;

	struct nilfs_checkpoint_list cp_head_list;
	char cp_headname[16];
};

So just add a `cp_head_list' inside the nilfs_checkpoint info and a head
name. (pity one can't reorder the entries). If creating a new checkpoint one
can update the mounted checkpoint's `cp_head_list' to point to the newly
added checkpoint number and write out the new checkpoint. This checkpoint
also stores the name so in case the chain is lost due to a software bug or
whatever, it can be rebuilt. It is also handy for dumping snapshots with
their head names.

It hardly takes any coding and has minimal impact on the rest.

> Adding the tag file is not bad idea, but this would increase the
> performance penalty as well as the lines of kernel code. That is the
> reason why I hesitate to admit this meta file.

Is there a performance penalty then? It is at most only checked at
mount/unmount time....

BTW, it's a pity the data structures are not tagged and/or versioned :-/ it
would be handy to be able to recognize and reorganise things.

With regards,
Reinoud

^ permalink raw reply [flat|nested] 15+ messages in thread
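The fallback described above, finding a head by name when the `cp_head_list' chain cannot be followed, could look roughly like this. It is a sketch only: struct cp_summary and find_head are hypothetical names for an in-memory summary of checkpoint records, not part of NILFS, and it assumes that checkpoint number 0 is never a valid checkpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADNAME_LEN 16

/* Hypothetical in-memory summary of one checkpoint record: its number and
 * the proposed 16-byte head name (NUL-padded, not necessarily terminated). */
struct cp_summary {
	uint64_t cno;
	char headname[HEADNAME_LEN];
};

/* Return the highest checkpoint number tagged with name, or 0 if no
 * checkpoint carries that head name. */
static inline uint64_t find_head(const struct cp_summary *cps, size_t n,
				 const char *name)
{
	char key[HEADNAME_LEN] = {0};
	size_t len = strlen(name);
	uint64_t best = 0;

	assert(len <= HEADNAME_LEN);
	memcpy(key, name, len);		/* the rest of key stays NUL-padded */
	for (size_t i = 0; i < n; i++)
		if (memcmp(cps[i].headname, key, HEADNAME_LEN) == 0 &&
		    cps[i].cno > best)
			best = cps[i].cno;
	return best;
}
```

A linear scan like this would only be needed at mount time or when rebuilding a damaged chain; in the normal case following the list pointers would avoid it.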
[parent not found: <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>]
* Re: directory entries
[not found] ` <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-09-03 12:39 ` Reinoud Zandijk
2008-09-03 16:32 ` Ryusuke Konishi
1 sibling, 0 replies; 15+ messages in thread
From: Reinoud Zandijk @ 2008-09-03 12:39 UTC (permalink / raw)
To: NILFS Users mailing list

Hi Ryusuke, hi folks,

On Tue, Sep 02, 2008 at 05:02:26PM +0200, Reinoud Zandijk wrote:
> > > I can't see why the DAT would need change.
> >
> > As I mentioned above, we cannot use the past version of the DAT.  So,
> > to allow multiple forks, we have to maintain multiple lifetime
> > information for each virtual block address in the single DAT.
>
> I dont think thats needed. In the case with no history retention at all all
> snapshots will be retained anyway regardless of how they are created/forked
> right? So each snapshot keeps mountable and updateable. There will only be
> a number of ifiles alive, one for each snapshot, as it already is now.
>
> If you'd like to modify the GC to keep, say, the history for a specified
> time, it only means that it needs to keep the history for the specified time
> for all the snapshots that can be mounted as `head'.

Hmmm, I forgot that the virtual addresses will get multiplexed unless new
virtual addresses are used... that sucks. So basically each virtual address
needs an indicator of which head it's on, and if it's not the current head, a
new virtual address needs to be assigned on write... hmmm

maybe we should cancel the idea of multiple heads for now? :)

with regards,
Reinoud
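The per-head indicator described in the message above can be pictured with a
small sketch. Everything in it is hypothetical and only meant to show the
idea: `struct dat_entry', its `head_id' field, the fixed-size table and
`vaddr_for_write' are invented for illustration and do not correspond to the
real DAT layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DAT_SIZE 8 /* tiny fixed table, for illustration only */

/* Hypothetical DAT entry that also records which head owns the mapping. */
struct dat_entry {
	uint64_t blocknr; /* disk block the virtual address maps to */
	uint32_t head_id; /* head that this virtual address belongs to */
	int in_use;
};

struct dat {
	struct dat_entry e[DAT_SIZE];
};

/*
 * Return the virtual address that `head' should use when it writes the
 * block currently known as `vaddr': the same address if `head' owns it,
 * otherwise a newly allocated one. Returns -1 if the table is full.
 */
int64_t vaddr_for_write(struct dat *dat, uint64_t vaddr, uint32_t head)
{
	assert(dat != NULL);
	assert(vaddr < DAT_SIZE && dat->e[vaddr].in_use);

	if (dat->e[vaddr].head_id == head)
		return (int64_t)vaddr;

	/* Owned by another head: leave its mapping alone, allocate anew. */
	for (uint64_t i = 0; i < DAT_SIZE; i++) {
		if (!dat->e[i].in_use) {
			dat->e[i].in_use = 1;
			dat->e[i].head_id = head;
			dat->e[i].blocknr = 0; /* set once the block is written */
			return (int64_t)i;
		}
	}
	return -1;
}
```

The sketch leaves out the costly part: every reference to the old virtual
address on the writing head would also have to be switched to the new one.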
* Re: directory entries
[not found] ` <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-09-03 12:39 ` Reinoud Zandijk
@ 2008-09-03 16:32 ` Ryusuke Konishi
1 sibling, 0 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2008-09-03 16:32 UTC (permalink / raw)
To: reinoud-S783fYmB3Ccdnm+yROfE0A; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi Reinoud,

On Wed, 3 Sep 2008 14:39:40 +0200, Reinoud Zandijk wrote:
> Hmmm, I forgot that the virtual addresses will get multiplexed unless new
> virtual addresses are used... that sucks. So basically each virtual address
> needs an indicator of which head it's on, and if it's not the current head, a
> new virtual address needs to be assigned on write... hmmm
>
> maybe we should cancel the idea of multiple heads for now? :)

Yep, so it's classified under future TODOs :)

On Tue, 2 Sep 2008 17:02:26 +0200, Reinoud Zandijk wrote:
> > > The extra meta file sounds good but i dont like the `userland' DB solution;
> > > it would make nilfs dependent on DB4 (or the like) and it could make it
> > > non-selfcontaining.
> >
> > I think we can introduce the tags without bringing them into the
> > NILFS2 kernel module.  Only the mount program and snapshot tools need
> > the tags, and the conversions from tags to checkpoint numbers can be
> > done in userland.
>
> i'd still opt for the kernel metafile solution. The best way is also the
> easiest i think :
>
>     struct nilfs_checkpoint_list {
>         __le64 ssl_next;
>         __le64 ssl_prev;
>     };
>
>     struct nilfs_checkpoint {
>         __le32 cp_flags;
>         __le32 cp_checkpoints_count;
>         struct nilfs_checkpoint_list cp_snapshot_list;
>         __le64 cp_cno;
>         __le64 cp_create;
>         __le64 cp_nblk_inc;
>         __le64 cp_inodes_count;
>         __le64 cp_blocks_count;    /* Reserved (might be deleted) */
>
>         /* Do not change the byte offset of ifile inode.
>            To keep the compatibility of the disk format,
>            additional fields should be added behind cp_ifile_inode. */
>         struct nilfs_inode cp_ifile_inode;
>
>         struct nilfs_checkpoint_list cp_head_list;
>         char cp_headname[16];
>     };
>
> So just add a `cp_head_list' inside the nilfs_checkpoint info and a head
> name. (pity one can't reorder the entries). When creating a new checkpoint
> one can update the mounted checkpoint's `cp_head_list' to point to the
> newly added checkpoint number and write out the new checkpoint. This
> checkpoint also stores the name, so in case the chain is lost due to a
> software bug or whatever, it can be rebuilt. It is also handy for dumping
> snapshots with their head names.

Sure, it's the primary way.  It has the advantage of listing tag names
in the lscp command.  But, as you guessed, this is a bit restrictive
(i.e. limited length and a single tag per snapshot), and it also
fattens the cpfile.

I won't reply NAK, but I want to reserve this as a last measure and
discuss alternatives carefully because reverting a disk format is almost
impossible.

> > Adding the tag file is not bad idea, but this would increase the
> > performance penalty as well as the lines of kernel code.  That is the
> > reason why I hesitate to admit this meta file.
>
> Is there a performance penalty then? It is at most only checked at
> mount/unmount time....

I worried about penalties against segment constructions.  But I may
not have to be so nervous.  At least we have a way around it; unless we
write the HEAD tag to the file, we can make updates of the tag file
infrequent.  The update will be needed only when switching to other
forks or manipulating tags.

Cheers,
Ryusuke
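The userland alternative discussed in this message, where the mount program
and snapshot tools translate tags to checkpoint numbers themselves, might be
sketched as below. This is an assumption-laden illustration rather than an
existing NILFS interface: `struct tag_entry', `TAG_MAX' and `tag_to_cno' are
invented names, the tag table is simply an in-memory array, and returning 0
for "not found" is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAG_MAX 32 /* hypothetical limit on tag length */

/* One tag record; several tags may point at the same checkpoint. */
struct tag_entry {
	char name[TAG_MAX];
	uint64_t cno;
};

/*
 * Resolve a tag name to a checkpoint number.
 * Returns 0 (assumed to be an invalid checkpoint number) if unknown.
 */
uint64_t tag_to_cno(const struct tag_entry *tags, size_t ntags,
		    const char *name)
{
	assert(tags != NULL || ntags == 0);
	assert(name != NULL);

	for (size_t i = 0; i < ntags; i++) {
		if (strncmp(tags[i].name, name, TAG_MAX) == 0)
			return tags[i].cno;
	}
	return 0;
}
```

A tool would call, for example, tag_to_cno(tags, ntags, "release") and hand
the resulting number to the mount step. Since the table lives outside the
checkpoint structure, one snapshot can carry any number of tags, which is the
flexibility the fixed cp_headname[16] field lacks.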
Thread overview: 15+ messages
2008-08-23 20:38 directory entries Reinoud Zandijk
[not found] ` <20080823203853.GA19421-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-08-25 3:21 ` Ryusuke Konishi
[not found] ` <20080825.122125.65657043.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-08-25 3:30 ` Ryusuke Konishi
[not found] ` <20080825.123047.128885778.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-08-25 15:52 ` Reinoud Zandijk
[not found] ` <20080825155243.GA12855-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-08-26 10:29 ` Ryusuke Konishi
[not found] ` <20080826.192942.104752679.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-08-26 13:29 ` Reinoud Zandijk
[not found] ` <20080901.143956.08023399.ryusuke@osrg.net>
[not found] ` <20080901.143956.08023399.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-09-01 5:51 ` Shaya Potter
[not found] ` <48BB82F7.4070607-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-01 8:16 ` Ryusuke Konishi
[not found] ` <20080901.171643.74124381.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-09-01 14:27 ` Shaya Potter
[not found] ` <48BBFBEA.2000308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-01 17:31 ` Ryusuke Konishi
2008-09-01 11:07 ` Reinoud Zandijk
[not found] ` <20080901110730.GA21008-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-09-01 16:51 ` Ryusuke Konishi
[not found] ` <20080902.015156.126164477.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-09-02 15:02 ` Reinoud Zandijk
[not found] ` <20080902150226.GA28292-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-09-03 12:39 ` Reinoud Zandijk
2008-09-03 16:32 ` Ryusuke Konishi