Linux NILFS development
* Dump tools for checkpoints
@ 2009-03-01  0:57 nilfs-MZZvbRqs/9F0RdzJJlgK+g
       [not found] ` <2883.84.133.202.72.1235869032.squirrel-lZV7jZi4TBeO2lRZCxuEEg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: nilfs-MZZvbRqs/9F0RdzJJlgK+g @ 2009-03-01  0:57 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

a great feature would be a tool to dump all filesystem changes since a
given checkpoint up to the newest available cp. The resulting file could
be fed to another filesystem for replication.

For a start, two tools: One to create and one to execute a binary dump.

Other projects can develop on top of these. A networked client / server to
keep a remote site in sync in master / slave style, for example. But some
self-made scripts could pipe the dump files between systems as well.

This would be much faster, compared to rsync and similar tools which need
to scan the source system.

Greetings,

Pierre Beck


* Re: Dump tools for checkpoints
       [not found] ` <2883.84.133.202.72.1235869032.squirrel-lZV7jZi4TBeO2lRZCxuEEg@public.gmane.org>
@ 2009-03-08  5:39   ` Ryusuke Konishi
       [not found]     ` <20090308.143916.113539820.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Ryusuke Konishi @ 2009-03-08  5:39 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, nilfs-MZZvbRqs/9F0RdzJJlgK+g

Hi,
On Sun, 01 Mar 2009 01:57:18 +0100, nilfs-MZZvbRqs/9F0RdzJJlgK+g@public.gmane.org wrote:
> Hi,
> 
> a great feature would be a tool to dump all filesystem changes since a
> given checkpoint up to the newest available cp. The resulting file could
> be fed to another filesystem for replication.
> 
> For a start, two tools: One to create and one to execute a binary dump.
> 
> Other projects can develop on top of these. A networked client / server to
> keep a remote site in sync in master / slave style, for example. But some
> self-made scripts could pipe the dump files between systems as well.
> 
> This would be much faster, compared to rsync and similar tools which need
> to scan the source system.
> 
> Greetings,
> 
> Pierre Beck

Sorry for my late reply.

I've moved the replication feature to the top of the todo list on our
web site ;)

Actually, I do want to implement this feature in some form, and I have
considered the details several times.  What I have in mind is
checkpoint-based replication, including incremental dumping and
restoration of file system states.

Basically I believe this is possible, but one problem is how to roll
back the remote nilfs, which becomes necessary if user updates or
garbage collection are allowed on the remote device after
synchronization.

Some of the metadata files of nilfs (e.g. the checkpoint file, the
segment usage file, and the disk address translation file) do not keep
past versions even though they are written in a copy-on-write manner,
and it would be too complex to roll these files back.  At present, I am
looking for a way that does not require a big change to the disk
format, but under that condition some state conflicts (concretely,
conflicts between virtual block numbers maintained in the DAT file)
must be resolved.  A breakthrough is needed for this.

Anyway, I expect we can find a realistic solution to these technical
issues at some stage.

Regards,
Ryusuke Konishi


* Re: Dump tools for checkpoints
       [not found]     ` <20090308.143916.113539820.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-03-12 15:26       ` Reinoud Zandijk
       [not found]         ` <20090312152618.GA25317-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
  2009-03-13 16:26       ` Leon Weber
  2009-03-13 16:42       ` nilfs-MZZvbRqs/9F0RdzJJlgK+g
  2 siblings, 1 reply; 11+ messages in thread
From: Reinoud Zandijk @ 2009-03-12 15:26 UTC (permalink / raw)
  To: NILFS Users mailing list; +Cc: nilfs-MZZvbRqs/9F0RdzJJlgK+g

Dear Ryusuke, dear Pierre,

On Sun, Mar 08, 2009 at 02:39:16PM +0900, Ryusuke Konishi wrote:
> Actually I want to realize this feature in some form, and had
> considered details several times.  What I want to realize is
> checkpoint based replication including incremental dumping and
> restoration of file system states.

That depends on the granularity of the backup: do you want backups based
on backup time, or based on checkpoints/snapshots?

The first could be implemented (though not watertight) by comparing the DAT
files between checkpoints P and Q at backup time. If, while parsing the file
tree at Q, a change is noticed in the DAT allocation of a file's vblocks,
the file has changed.

The second is easier and more logical: a snapshot is marked as the last
backup time. Then, when the backup is updated, the checkpoints are walked
and, just like the cleaner does, the diff between this snapshot and the next
backup snapshot is cleaned up. After the backup, the old backup snapshot is
either deleted or unmarked for backup. This way the user can control the
backup granularity through the time between backup snapshots.

Just an idea. Is it possible? Are there problems?

With regards,
Reinoud


* Re: Dump tools for checkpoints
       [not found]         ` <20090312152618.GA25317-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2009-03-13 15:33           ` Ryusuke Konishi
  0 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2009-03-13 15:33 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A
  Cc: nilfs-MZZvbRqs/9F0RdzJJlgK+g

Hi, Reinoud,
On Thu, 12 Mar 2009 16:26:18 +0100, Reinoud Zandijk wrote:
> Dear Ryusuke, dear Pierre,
> 
> On Sun, Mar 08, 2009 at 02:39:16PM +0900, Ryusuke Konishi wrote:
> > Actually I want to realize this feature in some form, and had
> > considered details several times.  What I want to realize is
> > checkpoint based replication including incremental dumping and
> > restoration of file system states.
> 
> depends on the granularity of the backup; do you want backup time backups or
> checkpoint/snapshot based?

My image of checkpoint-based replication (incremental dumping and
restoration) is as follows:

 # sendcp -i <cno-from> <cno-to> [device] | ssh remote-host recvcp [device]

Not backup time backups.  But that's not important to me, because time
and checkpoint number can be converted into each other through the
checkpoint file (though the reverse conversion, from time to checkpoint
number, is not efficient).
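
To illustrate the time-to-checkpoint lookup, here is a minimal Python
sketch.  It assumes the checkpoint list has already been read into
memory as (cno, creation time) pairs sorted by checkpoint number, and
that creation times never decrease as cno grows.  The function name and
the sample data are hypothetical; nothing here reads a real checkpoint
file.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical in-memory copy of the checkpoint list:
# (checkpoint number, creation time in seconds since the epoch),
# sorted by checkpoint number. Times are assumed non-decreasing.
Checkpoint = Tuple[int, int]


def cno_at_or_before(checkpoints: List[Checkpoint], when: int) -> Optional[int]:
    """Return the newest checkpoint number created at or before `when`."""
    times = [ctime for _, ctime in checkpoints]
    idx = bisect_right(times, when)  # first entry strictly newer than `when`
    if idx == 0:
        return None  # `when` predates every known checkpoint
    return checkpoints[idx - 1][0]


# Made-up sample data, for illustration only.
checkpoints = [(3, 1000), (4, 1060), (7, 1200), (9, 1500)]
cno_from = cno_at_or_before(checkpoints, 1100)  # start of the incremental dump
cno_to = checkpoints[-1][0]                     # newest checkpoint
```

With the list in memory the search itself is cheap; the cost noted
above would come from having to read the checkpoint entries in the
first place.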

> The first could be implemented (though not watertight) by comparing
> the DAT files between checkpoint P and Q at backup time. If while
> parsing the filetree at Q a change is noticed in the DAT allocation
> of the vblock the file is changed.

One problem is that the DAT file does not have generations.

As I mentioned in the previous mail, past versions of the DAT are not
maintained by nilfs even though it is written in a copy-on-write
manner; GC breaks past versions of the DAT file.

If we continuously replicate blocks from client to server, this might
become possible, because keeping old DAT blocks would not incur a big
overhead and GC for the DAT (partially holding past versions) would
not become so hard.

On the other hand, it is possible to extract the vblocks (and their
pblocks) that fall within a period from the DAT file, because it holds
lifetime information for each vblock.  It's not efficient because it
requires a full scan of the DAT, but it seems worth considering if we
give priority to keeping the current design.
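
The full-scan approach can be sketched roughly as follows, in Python.
The record layout, the END_MAX sentinel, and the half-open lifetime
interval [start, end) are assumptions made for this illustration, not a
description of the on-disk DAT format.

```python
from dataclasses import dataclass
from typing import Iterable, List

END_MAX = 2**64 - 1  # assumed sentinel meaning "still alive"


@dataclass(frozen=True)
class DatEntry:
    """Hypothetical in-memory view of one DAT entry."""
    vblocknr: int  # virtual block number
    pblocknr: int  # physical block number it currently maps to
    start: int     # checkpoint number at which the vblock became live
    end: int       # checkpoint number at which it died, or END_MAX


def blocks_born_between(dat: Iterable[DatEntry],
                        cno_from: int, cno_to: int) -> List[DatEntry]:
    """Full scan: vblocks created after cno_from that are still live at cno_to."""
    return [e for e in dat
            if cno_from < e.start <= cno_to  # born inside the period
            and e.end > cno_to]              # not yet dead at cno_to
```

Every entry is visited once, so the cost grows with the size of the
whole DAT rather than with the size of the delta, which is the
inefficiency mentioned above.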

Without garbage collection, things are much easier.  It would be
possible just by scanning the delta from the log at P to the log at
Q.  Sigh.

> The 2nd is easier and more logical; a snapshot is marked as the last
> backup time. Then when the backup is updated, the checkpoints are
> walked and -just like the cleaner- the diff is cleaned up between
> this snapshot and the next backup snapshot. After the backup the old
> backup snapshot is either deleted or is unmarked for backup. This
> way the backup granularity can be controlled by the user in the time
> between backup snapshots.
>
> just an idea, possible? problems?
> 

This is closer to what I want to do, but I'd like to extract the delta
at the block level without comparing trees.  B-trees can undergo a
great transformation, so comparing them seems neither easy nor simple.
If we give up on efficient B-tree comparison, rsync seems good enough
to me.

In addition, to compare two filesystem generations in their entirety,
the compare routine must know some metadata structures.  This is
likely to make nilfs bigger and more complex.

I even feel these considerations are iffy.  Maybe we should look into
each approach in more depth.


Regards,
Ryusuke


* Re: Dump tools for checkpoints
       [not found]     ` <20090308.143916.113539820.ryusuke-sG5X7nlA6pw@public.gmane.org>
  2009-03-12 15:26       ` Reinoud Zandijk
@ 2009-03-13 16:26       ` Leon Weber
       [not found]         ` <20090313162609.GD7494-Tv691C15nu5348sJH/KcYBvVK+yQ3ZXh@public.gmane.org>
  2009-03-13 16:42       ` nilfs-MZZvbRqs/9F0RdzJJlgK+g
  2 siblings, 1 reply; 11+ messages in thread
From: Leon Weber @ 2009-03-13 16:26 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg



On 08.03.2009 14:39:16, Ryusuke Konishi wrote:
> I've moved the replication feature to the top of todo list in our web
> site ;)

Is replication really a feature that a file system should implement? I
think it should be implemented one layer below the fs. If one needs a
replicated file system, there are things like drbd (distributed
replicated block device). Of course, the file system should support being
on a replicated device, though I don't really see a need to implement
this on the fs layer, or am I missing anything?

Leon

-- 
Leon Weber, leon-UvuJAy62EvlM7kwft8N7nw@public.gmane.org 0x8E04D7FC
jabber: leon-2ASvDZBniIemthueiVhc4g@public.gmane.org (icq: 261067046)
--
Geizige Menschen sind unangenehme Zeitgenossen - aber angenehme Vorfahren.

_______________________________________________
users mailing list
users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
https://www.nilfs.org/mailman/listinfo/users


* Re: Dump tools for checkpoints
       [not found]     ` <20090308.143916.113539820.ryusuke-sG5X7nlA6pw@public.gmane.org>
  2009-03-12 15:26       ` Reinoud Zandijk
  2009-03-13 16:26       ` Leon Weber
@ 2009-03-13 16:42       ` nilfs-MZZvbRqs/9F0RdzJJlgK+g
       [not found]         ` <2893.84.133.153.87.1236962534.squirrel-lZV7jZi4TBeO2lRZCxuEEg@public.gmane.org>
  2 siblings, 1 reply; 11+ messages in thread
From: nilfs-MZZvbRqs/9F0RdzJJlgK+g @ 2009-03-13 16:42 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

> Basically I believe this is possible, but one problem is how to
> realize rollback of the remote nilfs which will become required when
> allowing user's updates or garbage collection on the remote device
> after synchronization.

let's call them master / slave, I'm confused which one is remote :)

I assume only the master is writable and the slaves don't make any
changes or do any cleaning.

Require that all checkpoints from the last sync to the present still
exist. During the dump on the master, cleanerd has to be suspended and
may resume after the dump. Does that make things easier?

By the way: When I completely disable cleanerd on the master, wouldn't it
be possible to define a byte-range on the block device which makes up the
difference between two checkpoints and copy that to the slaves?

Regards,

Pierre Beck


* Re: Dump tools for checkpoints
       [not found]         ` <20090313162609.GD7494-Tv691C15nu5348sJH/KcYBvVK+yQ3ZXh@public.gmane.org>
@ 2009-03-13 16:53           ` Gergely Gábor
  2009-03-13 17:33           ` mail-MZZvbRqs/9F0RdzJJlgK+g
  2009-03-15 13:58           ` Ryusuke Konishi
  2 siblings, 0 replies; 11+ messages in thread
From: Gergely Gábor @ 2009-03-13 16:53 UTC (permalink / raw)
  To: NILFS Users mailing list

I think this feature would provide a COW-style offsite backup with some
granularity; at least, that is how I can imagine it being useful.

Greetings: Gergely Gábor

On Fri, Mar 13, 2009 at 5:26 PM, Leon Weber <leon@leonweber.de> wrote:
> On 08.03.2009 14:39:16, Ryusuke Konishi wrote:
>> I've moved the replication feature to the top of todo list in our web
>> site ;)
>
> Is replication really a feature that a file system should implement? I
> think it should be implemented one layer below the fs. If one needs a
> replicated file system, there are things like drbd (distributed
> replicated block device). Of course, the file system should support being
> on a replicated device, though I don't really see a need to implement
> this on the fs layer, or am I missing anything?
>
> Leon
>
> --
> Leon Weber, leon@leonweber.de 0x8E04D7FC
> jabber: leon@jabber.ccc.de (icq: 261067046)
> --
> Geizige Menschen sind unangenehme Zeitgenossen - aber angenehme Vorfahren.
>


* Re: Dump tools for checkpoints
       [not found]         ` <20090313162609.GD7494-Tv691C15nu5348sJH/KcYBvVK+yQ3ZXh@public.gmane.org>
  2009-03-13 16:53           ` Gergely Gábor
@ 2009-03-13 17:33           ` mail-MZZvbRqs/9F0RdzJJlgK+g
       [not found]             ` <3214.84.133.153.87.1236965583.squirrel-lZV7jZi4TBeO2lRZCxuEEg@public.gmane.org>
  2009-03-15 13:58           ` Ryusuke Konishi
  2 siblings, 1 reply; 11+ messages in thread
From: mail-MZZvbRqs/9F0RdzJJlgK+g @ 2009-03-13 17:33 UTC (permalink / raw)
  Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

> Is replication really a feature that a file system should implement? I
> think it should be implemented one layer below the fs. If one needs a
> replicated file system, there are things like drbd (distributed
> replicated block device).

yes, the task is similar to what drbd does. But for nightly incremental
backup, drbd would be overkill. It's called "RAID1 over network"! Rsync is
slow and consumes much CPU. Incremental backup software can't cope with
big files AFAIK. Nilfs could fill that gap perfectly.

- Have database / virtual machines / etc. on nilfs
- Do a nightly or hourly "nsync" to a remote mirror

Regards,

Pierre Beck


* Re: Dump tools for checkpoints
       [not found]         ` <20090313162609.GD7494-Tv691C15nu5348sJH/KcYBvVK+yQ3ZXh@public.gmane.org>
  2009-03-13 16:53           ` Gergely Gábor
  2009-03-13 17:33           ` mail-MZZvbRqs/9F0RdzJJlgK+g
@ 2009-03-15 13:58           ` Ryusuke Konishi
  2 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2009-03-15 13:58 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, leon-UvuJAy62EvlM7kwft8N7nw

Hi,
On Fri, 13 Mar 2009 17:26:09 +0100, Leon Weber wrote:
> On 08.03.2009 14:39:16, Ryusuke Konishi wrote:
> > I've moved the replication feature to the top of todo list in our web
> > site ;)
> 
> Is replication really a feature that a file system should implement? I
> think it should be implemented one layer below the fs. If one needs a
> replicated file system, there are things like drbd (distributed
> replicated block device).

Yes, DRBD is another solution.  I actually consider it a possible
option, though we haven't evaluated it with nilfs yet.  But as you
know, its granularity differs from that of the checkpoint-based
solution.

As long as the checkpoint-based solution has some merit, I think the
solution to be chosen depends on the customers' (users') demands and
the so-called service level.

Anyway, you have a good point.  I should look not only at rsync but
also at drbd and other block-layer solutions, and must clearly
identify the merits.

> Of course, the file system should support being
> on a replicated device, though I don't really see a need to implement
> this on the fs layer, or am I missing anything?
> 
> Leon

In general, filesystems are said to be in a better position to exploit
knowledge of their metadata structures and consistency, and this is
especially true of copy-on-write filesystems.

Those can efficiently extract a delta by nature, and can drop
intermediate changes depending on the required granularity.

Will this hold true for nilfs? - I don't know yet, but I think it's
one of the features worth considering.
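
As a rough illustration of "dropping intermediate changes", the Python
sketch below merges per-checkpoint block deltas so that only the last
version of each block inside the requested range survives.  The
per-checkpoint delta representation is hypothetical and is not how
nilfs stores its logs.

```python
from typing import Dict, List, Tuple

# Hypothetical delta: (checkpoint number, {block number: block contents}).
Delta = Tuple[int, Dict[int, bytes]]


def coalesce(deltas: List[Delta], cno_from: int, cno_to: int) -> Dict[int, bytes]:
    """Merge the deltas with cno_from < cno <= cno_to, keeping only the
    newest contents of each block."""
    merged: Dict[int, bytes] = {}
    for cno, blocks in sorted(deltas, key=lambda d: d[0]):
        if cno_from < cno <= cno_to:
            merged.update(blocks)  # later checkpoints overwrite earlier ones
    return merged
```

A block rewritten at several checkpoints is sent only once, which is
the saving a coarser granularity could buy compared with replaying
every checkpoint.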

Thanks,
Ryusuke Konishi


* Re: Dump tools for checkpoints
       [not found]         ` <2893.84.133.153.87.1236962534.squirrel-lZV7jZi4TBeO2lRZCxuEEg@public.gmane.org>
@ 2009-03-15 15:12           ` Ryusuke Konishi
  0 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2009-03-15 15:12 UTC (permalink / raw)
  To: nilfs-MZZvbRqs/9F0RdzJJlgK+g; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Fri, 13 Mar 2009 17:42:20 +0100, nilfs-MZZvbRqs/9F0RdzJJlgK+g@public.gmane.org wrote:
> Hi,
> 
> > Basically I believe this is possible, but one problem is how to
> > realize rollback of the remote nilfs which will become required when
> > allowing user's updates or garbage collection on the remote device
> > after synchronization.
> 
> let's call them master / slave, I'm confused which one is remote :)
> 
> I assume only the master is writable and slaves don't do any changes or
> cleaning.

If that premise is allowed, things would become much simpler.  But is
it suitable for practical use?

If cleaning is not allowed, the slave partition periodically fills
up, so frequent full replications would be needed in that case.

If we suppose a very large slave device, breaking the constraint
could be costly, because restoring the slave state requires a full
replication.
 
> Require that all checkpoints from last sync to present are still existing.
> Upon dump on master, cleanerd has to be suspended and may resume after
> dump. Does that make things easier?

Yes, absolutely.  I have no intention of considering dumping while
cleanerd is running.

The dump tool should at least hold a write lock on .nilfs to prevent
cleaning, and if I were to write it, I would make the tool suspend
and resume cleanerd by signals.
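
The suspend/resume part could look roughly like this in Python.  The
choice of SIGSTOP and SIGCONT is an assumption (they are the generic
POSIX stop/continue signals; which signals cleanerd would actually
honor is not settled here), the helper name is hypothetical, and the
write lock on .nilfs is left out entirely.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def cleaner_suspended(pid, kill=os.kill):
    """Stop the process `pid` for the duration of the block, then let it
    continue, even if the block raises.

    SIGSTOP/SIGCONT are assumed here for illustration only. `kill` is
    injectable so the logic can be exercised without a real process.
    """
    kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        kill(pid, signal.SIGCONT)
```

A dump tool would wrap its work in `with cleaner_suspended(pid): ...`,
where `pid` is the cleaner's process ID obtained by some means not
shown here.  The `finally` clause matters: a failed dump must not leave
the cleaner stopped.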

> By the way: When I completely disable cleanerd on the master, wouldn't it
> be possible to define a byte-range on the block device which makes up the
> difference between two checkpoints and copy that to the slaves?

We would get multiple ranges on the device rather than a single range,
because nilfs may allocate segments in non-sequential order.  (Though
the current cleanerd algorithm induces FIFO-order allocation, this is
not guaranteed by nilfs.)

The current sufile doesn't have information on the sequence (or chain)
of allocated segments.  So this requires scanning the on-disk segment
headers; each on-disk segment header has a next pointer.

If the scanning cost becomes an issue, we can include the next pointer
in the sufile to minimize the cost.
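
A minimal Python sketch of the chain walk, under stated assumptions:
the next pointers have already been read from the segment headers into
a dictionary, all segments have the same size, and segment n starts at
byte offset n * segment_size.  All names are hypothetical, and whole
segments are copied, so the result is coarser than the exact difference
between two checkpoints.

```python
from typing import Dict, List, Tuple


def segment_chain(next_of: Dict[int, int], first: int, last: int) -> List[int]:
    """Follow next pointers from segment `first` until `last` is reached."""
    chain = [first]
    seen = {first}
    while chain[-1] != last:
        nxt = next_of[chain[-1]]  # KeyError if the chain is broken
        if nxt in seen:
            raise ValueError("segment chain loops before reaching the last segment")
        chain.append(nxt)
        seen.add(nxt)
    return chain


def byte_ranges(chain: List[int], segment_size: int) -> List[Tuple[int, int]]:
    """Turn segment numbers into merged, half-open byte ranges on the device."""
    ranges: List[Tuple[int, int]] = []
    for segnum in sorted(chain):
        start = segnum * segment_size
        if ranges and ranges[-1][1] == start:
            # Physically adjacent to the previous range: extend it.
            ranges[-1] = (ranges[-1][0], start + segment_size)
        else:
            ranges.append((start, start + segment_size))
    return ranges
```

For a chain 2 -> 3 -> 7 -> 8 this yields two ranges, one covering
segments 2-3 and one covering segments 7-8, which matches the point
above that the delta is generally several ranges rather than one.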
 
Regards,
Ryusuke Konishi


* Re: Dump tools for checkpoints
       [not found]             ` <3214.84.133.153.87.1236965583.squirrel-lZV7jZi4TBeO2lRZCxuEEg@public.gmane.org>
@ 2009-03-16  0:57               ` Ryusuke Konishi
  0 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2009-03-16  0:57 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, mail-MZZvbRqs/9F0RdzJJlgK+g

Hi,
On Fri, 13 Mar 2009 18:33:08 +0100, mail-MZZvbRqs/9F0RdzJJlgK+g@public.gmane.org wrote:
> Hi,
> 
> > Is replication really a feature that a file system should implement? I
> > think it should be implemented one layer below the fs. If one needs a
> > replicated file system, there are things like drbd (distributed
> > replicated block device).
> 
> yes, the task is similar to what drbd does. But for nightly incremental
> backup, drbd would be overkill. It's called "RAID1 over network"! Rsync is
> slow and consumes much CPU. Incremental backup software can't cope with
> big files AFAIK. Nilfs could fill that gap perfectly.
> 
> - Have database / virtual machines / etc. on nilfs
> - Do a nightly or hourly "nsync" to a remote mirror
> 
> Regards,
> 
> Pierre Beck

"nsync" sounds nice!

Regards,
Ryusuke Konishi

