* Dump tools for checkpoints
From: nilfs-MZZvbRqs/9F0RdzJJlgK+g @ 2009-03-01 0:57 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

a great feature would be a tool to dump all filesystem changes from a
given checkpoint up to the newest available checkpoint. The resulting
file could be fed to another filesystem for replication.

For a start, two tools: one to create and one to execute a binary dump.

Other projects can build on top of these: a networked client / server
that keeps a remote site in sync in master / slave style, for example.
But some self-made scripts could pipe the dump files between systems as
well.

This would be much faster than rsync and similar tools, which need to
scan the source system.

Greetings,

Pierre Beck
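
The two tools proposed above (one creating a binary dump, one executing
it) could look roughly like the following sketch. Nothing here is an
existing nilfs tool or format: the record layout, the 4096-byte block
size, and the names write_dump and apply_dump are all assumptions made
for illustration only.

```python
import io
import struct

BLOCK_SIZE = 4096                      # assumed block size
RECORD_HEADER = struct.Struct("<QI")   # hypothetical record: block number, payload length


def write_dump(changed_blocks, out):
    """Serialize {block_number: bytes} as a stream of header + payload records."""
    for blocknr in sorted(changed_blocks):
        data = changed_blocks[blocknr]
        out.write(RECORD_HEADER.pack(blocknr, len(data)))
        out.write(data)


def apply_dump(src, target):
    """Replay a dump stream onto a seekable target (a file or a block device).

    src is expected to be a buffered binary stream, so read(n) returns
    fewer than n bytes only at end of input.
    """
    applied = 0
    while True:
        header = src.read(RECORD_HEADER.size)
        if not header:
            break                      # clean end of stream
        if len(header) != RECORD_HEADER.size:
            raise ValueError("truncated record header")
        blocknr, length = RECORD_HEADER.unpack(header)
        data = src.read(length)
        if len(data) != length:
            raise ValueError("truncated record payload")
        target.seek(blocknr * BLOCK_SIZE)
        target.write(data)
        applied += 1
    return applied
```

Because the dump is a plain byte stream, the same two functions would
work whether the stream goes through a file, a pipe, or a network
connection.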
* Re: Dump tools for checkpoints
From: Ryusuke Konishi @ 2009-03-08 5:39 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, nilfs-MZZvbRqs/9F0RdzJJlgK+g

Hi,

On Sun, 01 Mar 2009 01:57:18 +0100, nilfs-MZZvbRqs/9F0RdzJJlgK+g@public.gmane.org wrote:
> Hi,
>
> a great feature would be a tool to dump all filesystem changes since a
> given checkpoint up to the newest available cp. The resulting file could
> be fed to another filesystem for replication.
>
> For a start, two tools: One to create and one to execute a binary dump.
>
> Other projects can develop on top of these. A networked client / server to
> keep a remote site in sync in master / slave style, for example. But some
> self-made scripts could pipe the dump files between systems as well.
>
> This would be much faster, compared to rsync and similar tools which need
> to scan the source system.
>
> Greetings,
>
> Pierre Beck

Sorry for my late reply.

I've moved the replication feature to the top of the todo list on our
web site ;)

Actually, I want to realize this feature in some form, and have
considered the details several times. What I want is checkpoint-based
replication, including incremental dumping and restoration of file
system states.

Basically I believe this is possible, but one problem is how to roll
back the remote nilfs, which becomes necessary if user updates or
garbage collection are allowed on the remote device after
synchronization.

Some of the metadata files of nilfs (e.g. the checkpoint file, the
segment usage file, and the disk address translation file) do not keep
past versions even though they are written in a copy-on-write manner,
and it would be too complex to roll back these files.

At present, I am looking for a way that does not require a big change
to the disk format, but under that condition some state conflicts
(concretely, conflicts of the virtual block numbers maintained in the
DAT file) must be resolved. A breakthrough is needed for this.

Anyway, I expect we can find a realistic solution to these technical
issues at some stage.

Regards,
Ryusuke Konishi
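
The virtual block number conflict mentioned above can be illustrated
with a toy model. This is only a sketch: the "lowest free number"
allocation policy and the names allocate and find_conflicts are
assumptions for illustration and do not describe how nilfs actually
manages the DAT file.

```python
def allocate(vmap, content):
    """Assign content to the lowest free virtual block number (simplified policy)."""
    n = 0
    while n in vmap:
        n += 1
    vmap[n] = content
    return n


def find_conflicts(replica_map, incoming_delta):
    """Virtual block numbers that the replica already uses for different content."""
    return sorted(
        v for v, content in incoming_delta.items()
        if v in replica_map and replica_map[v] != content
    )


# Both sides start from the same synchronized state.
synced = {0: "inode block", 1: "data block"}
source, replica = dict(synced), dict(synced)

# The replica is modified locally after synchronization ...
allocate(replica, "replica-local block")

# ... while the source independently writes new data.
delta = {}
v = allocate(source, "new file data")
delta[v] = source[v]

# The same virtual block number now means different things on each side.
conflicts = find_conflicts(replica, delta)
```

In this model the incoming delta cannot be applied as-is once the
replica has diverged, which is why a rollback of the replica (or a
read-only replica) comes up in the discussion.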
* Re: Dump tools for checkpoints
From: Reinoud Zandijk @ 2009-03-12 15:26 UTC
To: NILFS Users mailing list
Cc: nilfs-MZZvbRqs/9F0RdzJJlgK+g

Dear Ryusuke, dear Pierre,

On Sun, Mar 08, 2009 at 02:39:16PM +0900, Ryusuke Konishi wrote:
> Actually I want to realize this feature in some form, and had
> considered details several times. What I want to realize is
> checkpoint based replication including incremental dumping and
> restoration of file system states.

It depends on the granularity of the backup: do you want backup-time
backups or checkpoint/snapshot-based ones?

The first could be implemented (though not watertight) by comparing the
DAT files of checkpoints P and Q at backup time. If, while parsing the
file tree at Q, a change is noticed in the DAT allocation of a vblock,
the file has changed.

The second is easier and more logical: a snapshot is marked as the last
backup time. Then, when the backup is updated, the checkpoints are
walked and, just like the cleaner does, the diff between this snapshot
and the next backup snapshot is cleaned up. After the backup, the old
backup snapshot is either deleted or unmarked for backup. This way the
user controls the backup granularity through the time between backup
snapshots.

Just an idea. Is it possible? Any problems?

With regards,
Reinoud
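
The second scheme described above (mark a snapshot as the last backup,
send the diff up to a new snapshot, then delete or unmark the old one)
can be sketched as a small piece of bookkeeping. The names BackupState
and run_backup, and the send_diff callback, are hypothetical; how the
diff itself would be produced is left open.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class BackupState:
    snapshots: Set[int] = field(default_factory=set)  # checkpoint numbers kept as snapshots
    last_backup: Optional[int] = None                 # snapshot marked as "last backup"


def run_backup(state, new_cno, send_diff, keep_old=False):
    """Send the diff from the marked snapshot to new_cno, then move the mark."""
    state.snapshots.add(new_cno)           # protect the new end point from cleaning
    send_diff(state.last_backup, new_cno)  # a base of None means "full backup"
    old = state.last_backup
    state.last_backup = new_cno
    if old is not None and not keep_old:
        state.snapshots.discard(old)       # the old base is no longer needed
    return old
```

The interval between calls to run_backup is what sets the backup
granularity in this scheme; checkpoints in between are free to be
cleaned.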
* Re: Dump tools for checkpoints
From: Ryusuke Konishi @ 2009-03-13 15:33 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A
Cc: nilfs-MZZvbRqs/9F0RdzJJlgK+g

Hi, Reinoud,

On Thu, 12 Mar 2009 16:26:18 +0100, Reinoud Zandijk wrote:
> Dear Ryusuke, dear Pierre,
>
> On Sun, Mar 08, 2009 at 02:39:16PM +0900, Ryusuke Konishi wrote:
> > Actually I want to realize this feature in some form, and had
> > considered details several times. What I want to realize is
> > checkpoint based replication including incremental dumping and
> > restoration of file system states.
>
> depends on the granularity of the backup; do you want backup time backups or
> checkpoint/snapshot based?

My image of checkpoint-based replication (incremental dumping and
restoration) is as follows:

 # sendcp -i <cno-from> <cno-to> [device] | ssh remote-host recvcp [device]

Not backup-time backups. But that is not important to me, because time
and checkpoint number can be converted into each other through the
checkpoint file (though the reverse conversion, from time to checkpoint
number, is not efficient).

> The first could be implemented (though not watertight) by comparing
> the DAT files between checkpoint P and Q at backup time. If while
> parsing the filetree at Q a change is noticed in the DAT allocation
> of the vblock the file is changed.

One problem is that the DAT file does not have generations. As I
mentioned in my previous mail, nilfs does not maintain past versions of
the DAT even though it is written in a copy-on-write manner; GC breaks
past versions of the DAT file.

If we continuously replicate blocks from client to server, this might
become possible, because keeping old DAT blocks would not incur a big
overhead and GC for a DAT that partially holds past versions would not
be so hard.

On the other hand, it is possible to extract the vblocks (and their
pblocks) within a period from the DAT file, because it holds lifetime
information for each vblock. That is not efficient because it requires
a full scan of the DAT, but it seems worth considering if we give
priority to keeping the current design.

Without garbage collection, things are much easier: it is possible just
by scanning the delta from the log at P to the log at Q. Sigh.

> The 2nd is easier and more logical; a snapshot is marked as the last
> backup time. Then when the backup is updated, the checkpoints are
> walked and -just like the cleaner- the diff is cleaned up between
> this snapshot and the next backup snapshot. After the backup the old
> backup snapshot is either deleted or is unmarked for backup. This
> way the backup granularity can be controlled by the user in the time
> between backup snapshots.
>
> just an idea, possible? problems?
>

This is closer to what I want to do, but I'd like to extract the delta
at the block level without comparing trees. B-trees can undergo a great
transformation, so comparing them seems neither easy nor simple. If we
give up on an efficient B-tree comparison, rsync seems sufficient to me.

In addition, to compare two filesystem generations entirely, the compare
routine must know some metadata structures. This is likely to make
nilfs bigger and more complex.

I even feel these considerations are iffy. Maybe we should go into each
approach in more detail.

Regards,
Ryusuke
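
The idea above of extracting vblocks within a period by a full scan of
the DAT lifetime information might look like the following sketch. The
DatEntry record, the half-open lifetime [start, end) expressed in
checkpoint numbers, and the END_LIVE marker are simplifying assumptions
for illustration, not the on-disk format.

```python
from typing import Iterable, List, NamedTuple

END_LIVE = 2**64 - 1   # assumed marker for "still alive"


class DatEntry(NamedTuple):
    vblocknr: int   # virtual block number
    blocknr: int    # physical block number (pblock)
    start: int      # checkpoint number at which the vblock became live
    end: int        # checkpoint number at which it died, or END_LIVE


def blocks_added_between(dat: Iterable[DatEntry], cno_from: int, cno_to: int) -> List[DatEntry]:
    """Full scan: vblocks created after cno_from, no later than cno_to, and live at cno_to."""
    return [e for e in dat if cno_from < e.start <= cno_to and e.end > cno_to]


def blocks_removed_between(dat: Iterable[DatEntry], cno_from: int, cno_to: int) -> List[DatEntry]:
    """Full scan: vblocks that were live at cno_from and are dead by cno_to."""
    return [e for e in dat if e.start <= cno_from < e.end <= cno_to]
```

Both functions touch every entry, which matches the remark that this
approach costs a full scan of the DAT. Entries created and destroyed
entirely inside the period are skipped because the receiver never needs
them.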
* Re: Dump tools for checkpoints
From: Leon Weber @ 2009-03-13 16:26 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg

On 08.03.2009 14:39:16, Ryusuke Konishi wrote:
> I've moved the replication feature to the top of todo list in our web
> site ;)

Is replication really a feature that a file system should implement? I
think it should be implemented one layer below the fs. If one needs a
replicated file system, there are things like drbd (distributed
replicated block device). Of course, the file system should support being
on a replicated device, though I don't really see a need to implement
this on the fs layer, or am I missing anything?

Leon

--
Leon Weber, leon-UvuJAy62EvlM7kwft8N7nw@public.gmane.org 0x8E04D7FC
jabber: leon-2ASvDZBniIemthueiVhc4g@public.gmane.org (icq: 261067046)
--
Stingy people are unpleasant contemporaries, but pleasant ancestors.
* Re: Dump tools for checkpoints
From: Gergely Gábor @ 2009-03-13 16:53 UTC
To: NILFS Users mailing list

I think this feature would provide a copy-on-write style offsite backup
with some granularity; at least, that is how I can imagine it being
useful.

Greetings:
Gergely Gábor

On Fri, Mar 13, 2009 at 5:26 PM, Leon Weber <leon@leonweber.de> wrote:
> On 08.03.2009 14:39:16, Ryusuke Konishi wrote:
>> I've moved the replication feature to the top of todo list in our web
>> site ;)
>
> Is replication really a feature that a file system should implement? I
> think it should be implemented one layer below the fs. If one needs a
> replicated file system, there are things like drbd (distributed
> replicated block device). Of course, the file system should support being
> on a replicated device, though I don't really see a need to implement
> this on the fs layer, or am I missing anything?
>
> Leon
* Re: Dump tools for checkpoints
From: mail-MZZvbRqs/9F0RdzJJlgK+g @ 2009-03-13 17:33 UTC
Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

> Is replication really a feature that a file system should implement? I
> think it should be implemented one layer below the fs. If one needs a
> replicated file system, there are things like drbd (distributed
> replicated block device).

yes, the task is similar to what drbd does. But for nightly incremental
backup, drbd would be overkill. It's called "RAID1 over network"! Rsync is
slow and consumes much CPU. Incremental backup software can't cope with
big files AFAIK. Nilfs could fill that gap perfectly.

- Have database / virtual machines / etc. on nilfs
- Do a nightly or hourly "nsync" to a remote mirror

Regards,

Pierre Beck
* Re: Dump tools for checkpoints
From: Ryusuke Konishi @ 2009-03-16 0:57 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, mail-MZZvbRqs/9F0RdzJJlgK+g

Hi,

On Fri, 13 Mar 2009 18:33:08 +0100, mail-MZZvbRqs/9F0RdzJJlgK+g@public.gmane.org wrote:
> Hi,
>
> > Is replication really a feature that a file system should implement? I
> > think it should be implemented one layer below the fs. If one needs a
> > replicated file system, there are things like drbd (distributed
> > replicated block device).
>
> yes, the task is similar to what drbd does. But for nightly incremental
> backup, drbd would be overkill. It's called "RAID1 over network"! Rsync is
> slow and consumes much CPU. Incremental backup software can't cope with
> big files AFAIK. Nilfs could fill that gap perfectly.
>
> - Have database / virtual machines / etc. on nilfs
> - Do a nightly or hourly "nsync" to a remote mirror
>
> Regards,
>
> Pierre Beck

"nsync" sounds nice!

Regards,
Ryusuke Konishi
* Re: Dump tools for checkpoints
From: Ryusuke Konishi @ 2009-03-15 13:58 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, leon-UvuJAy62EvlM7kwft8N7nw

Hi,

On Fri, 13 Mar 2009 17:26:09 +0100, Leon Weber wrote:
> On 08.03.2009 14:39:16, Ryusuke Konishi wrote:
> > I've moved the replication feature to the top of todo list in our web
> > site ;)
>
> Is replication really a feature that a file system should implement? I
> think it should be implemented one layer below the fs. If one needs a
> replicated file system, there are things like drbd (distributed
> replicated block device).

Yes, DRBD is another solution. I actually consider it a possible
measure, though we haven't evaluated it for nilfs yet. But as you know,
its granularity differs from that of the checkpoint-based solution.
Unless the checkpoint-based solution has no merit at all, I think the
solution to choose depends on the customers' (users') demands and the
so-called service level.

Anyway, you have a good point. I should consider not only rsync but
also drbd and other block-layer solutions, and must clearly identify
the merits.

> Of course, the file system should support being
> on a replicated device, though I don't really see a need to implement
> this on the fs layer, or am I missing anything?
>
> Leon

In general, filesystems are said to have a better opportunity to exploit
knowledge of their metadata structures and consistency, and this applies
especially to copy-on-write filesystems. Those can efficiently extract
deltas by nature, and can drop intermediate changes depending on the
required granularity.

Will this hold true for nilfs? I don't know yet, but I think it is a
feature worth considering.

Thanks,
Ryusuke Konishi
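
The point above about dropping intermediate changes can be made
concrete with a small sketch: when a block is rewritten several times
between two checkpoints, only its final version has to be transferred.
The log layout (a sequence of (checkpoint number, block id, data)
tuples in write order) and the name coalesce are assumptions for
illustration.

```python
def coalesce(log, cno_from, cno_to):
    """Keep only the last write to each block made in the range (cno_from, cno_to].

    log is an iterable of (cno, block_id, data) tuples in write order.
    Returns {block_id: (cno, data)}.
    """
    latest = {}
    for cno, block_id, data in log:
        if cno_from < cno <= cno_to:
            latest[block_id] = (cno, data)   # later writes overwrite earlier ones
    return latest


log = [
    (1, "a", "a1"),
    (2, "a", "a2"),
    (2, "b", "b2"),
    (3, "a", "a3"),
    (4, "b", "b4"),
    (5, "c", "c5"),
]
delta = coalesce(log, 1, 4)   # block "a" was written twice in range; only "a3" survives
```

A block-device replicator such as drbd forwards every write as it
happens, whereas a coarser checkpoint interval lets more of these
intermediate versions be skipped.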
* Re: Dump tools for checkpoints
From: nilfs-MZZvbRqs/9F0RdzJJlgK+g @ 2009-03-13 16:42 UTC
To: Ryusuke Konishi
Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

> Basically I believe this is possible, but one problem is how to
> realize rollback of the remote nilfs which will become required when
> allowing user's updates or garbage collection on the remote device
> after synchronization.

Let's call them master / slave; I'm confused which one is remote :)

I assume only the master is writable and slaves don't make any changes
or do any cleaning.

Require that all checkpoints from the last sync to the present still
exist. During a dump on the master, cleanerd has to be suspended and may
resume after the dump. Does that make things easier?

By the way: if I completely disable cleanerd on the master, wouldn't it
be possible to define a byte range on the block device that makes up the
difference between two checkpoints and copy that to the slaves?

Regards,

Pierre Beck
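
Suspending cleanerd for the duration of a dump, as suggested above,
could be wrapped so that the cleaner is always resumed, even if the
dump fails. This is a sketch under assumptions: it uses the generic
Unix job-control signals SIGSTOP and SIGCONT, the thread does not say
which mechanism cleanerd would actually need, and the name suspended is
hypothetical.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def suspended(pid, kill=os.kill):
    """Pause process pid while the with-block runs, then resume it.

    kill is injectable so the logic can be exercised without a real process.
    """
    kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        kill(pid, signal.SIGCONT)   # resume even if the dump raised an error
```

A dump tool would then run its work inside "with suspended(cleaner_pid):",
where cleaner_pid is the process id of the cleaner daemon for that
device.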
* Re: Dump tools for checkpoints
From: Ryusuke Konishi @ 2009-03-15 15:12 UTC
To: nilfs-MZZvbRqs/9F0RdzJJlgK+g
Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Fri, 13 Mar 2009 17:42:20 +0100, nilfs-MZZvbRqs/9F0RdzJJlgK+g@public.gmane.org wrote:
> Hi,
>
> > Basically I believe this is possible, but one problem is how to
> > realize rollback of the remote nilfs which will become required when
> > allowing user's updates or garbage collection on the remote device
> > after synchronization.
>
> let's call them master / slave, I'm confused which one is remote :)
>
> I assume only the master is writable and slaves don't do any changes or
> cleaning.

If that premise is allowed, things become very simple. But is it
suitable for practical use? If cleaning is not allowed, the slave
partition fills up periodically, so frequent full replications would be
needed in that case. And if we suppose a very large slave device,
breaking the constraint could be costly, because recovering the slave
state requires a full replication.

> Require that all checkpoints from last sync to present are still existing.
> Upon dump on master, cleanerd has to be suspended and may resume after
> dump. Does that make things easier?

Yes, absolutely. I have no intention of considering dumping while
cleanerd operates. The dump tool should at least hold the write lock on
.nilfs to prevent cleaning, and if I were writing it, I would make the
tool suspend and resume cleanerd by signals.

> By the way: When I completely disable cleanerd on the master, wouldn't it
> be possible to define a byte-range on the block device which makes up the
> difference between two checkpoints and copy that to the slaves?

We would get multiple ranges on the device, not a single range, because
nilfs may allocate segments in non-sequential order. (Though the current
cleanerd algorithm induces FIFO-order allocation, this is not guaranteed
by nilfs.)

The current sufile has no information on the sequence (or chain) of
allocated segments, so finding it requires scanning the on-disk segment
headers; each on-disk segment header has a next pointer. If the scanning
cost becomes an issue, we can include the next pointer in the sufile to
minimize the cost.

Regards,
Ryusuke Konishi
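
The answer above (follow the next pointers in the segment headers and
expect several ranges rather than one) could be sketched as follows.
The next_of mapping stands in for the result of reading each on-disk
segment header; it, the function names, and the whole-segment
granularity are simplifying assumptions. A real tool would also have to
handle partially filled segments at both ends.

```python
def segment_chain(next_of, seg_from, seg_to):
    """Follow next pointers from seg_from until seg_to; return the visited segment numbers."""
    chain, seen, seg = [], set(), seg_from
    while True:
        if seg in seen:
            raise ValueError("cycle in segment chain before reaching the target")
        seen.add(seg)
        chain.append(seg)
        if seg == seg_to:
            return chain
        seg = next_of[seg]   # raises KeyError if the chain is broken


def byte_ranges(chain, segment_size):
    """Merge physically adjacent segments of the chain into (offset, length) ranges."""
    ranges = []
    for seg in chain:
        start = seg * segment_size
        if ranges and ranges[-1][0] + ranges[-1][1] == start:
            # This segment directly follows the previous range: extend it.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + segment_size)
        else:
            ranges.append((start, segment_size))
    return ranges


# Hypothetical allocation order: 3 -> 4 -> 9 -> 10 -> 2
next_of = {3: 4, 4: 9, 9: 10, 10: 2, 2: 5}
S = 8 * 1024 * 1024                      # assumed segment size
chain = segment_chain(next_of, 3, 2)
ranges = byte_ranges(chain, S)           # three ranges, not one
```

Storing the next pointer in the sufile, as proposed above, would
replace the per-segment header reads behind next_of with lookups in a
single metadata file.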