* btrfs subvol find-new
@ 2010-03-26 10:18 Michael Niederle
2010-03-26 13:22 ` Chris Mason
0 siblings, 1 reply; 11+ messages in thread
From: Michael Niederle @ 2010-03-26 10:18 UTC (permalink / raw)
To: linux-btrfs
I want to write a differential backup tool for btrfs snapshots.
The new "btrfs subvol find-new" command sounds great on first encounter, but I'm
missing information about updated directories. I would need a list of updated
directories to scan for deleted files.
I had a look at find_updated_files() in btrfs-list.c. To me it seems as if
the ioctl would only return the extents of regular files.
The function find_root_gen() in btrfs-list.c seems to return the newest
generation in a given snapshot. It would be nice to have this exported as a
user command (e.g. "btrfs subvol newest-gen"); then one could use the output of
    btrfs subvol newest-gen <old snapshot>
(plus 1) as the input generation number to
    btrfs subvol find-new <new snapshot> <gen+1>
(I'm using kernel 2.6.32.10 with the most current btrfs-kernel modules and
userland tools as of last Saturday.)
Greetings, Michael
* Re: btrfs subvol find-new
2010-03-26 10:18 btrfs subvol find-new Michael Niederle
@ 2010-03-26 13:22 ` Chris Mason
2010-03-26 18:51 ` btrfs subvol find-modified Dipl.-Ing. Michael Niederle
0 siblings, 1 reply; 11+ messages in thread
From: Chris Mason @ 2010-03-26 13:22 UTC (permalink / raw)
To: Michael Niederle; +Cc: linux-btrfs
On Fri, Mar 26, 2010 at 11:18:07AM +0100, Michael Niederle wrote:
> I want to write a differential backup tool for btrfs snapshots.
>
> The new "btrfs subvol find-new"-command sounds great on first encounter, but I'm
> missing informations about updated directories. I would need a list of updated
> directories to scan for deleted files.
>
> I had a look at find_updated_files() in btrfs-list.c. To me it seems as if
> the ioctl would only return the extents of regular files.
Well, the ioctl is actually returning all the updated inodes, but the
command ignores them.
Every piece of metadata in the btrfs btree has a key, and every key has
a type field. It's the type field that makes keys for inodes different
from keys for file extents or directory items.
In find_updated_files, it does this:
    sk->min_type = 0;
    sk->max_type = BTRFS_EXTENT_DATA_KEY;
This means the search ioctl in the kernel won't return anything with a
key bigger than BTRFS_EXTENT_DATA_KEY. If you look in ctree.h, you'll
see that BTRFS_EXTENT_DATA_KEY is actually bigger than inodes and
directory items, so we're getting most of the file and directory
metadata with this search.
In the loop in find_updated_files, it does this:
    if (sh->type == BTRFS_EXTENT_DATA_KEY &&
Which limits the output to only extent data keys.
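As a rough sketch of the two filtering stages described above (this is not the btrfs-progs code): `struct item_header`, `in_search_range()`, `wanted_for_find_new()` and `wanted_for_backup()` are hypothetical names, the per-item range check is a simplification of what the search ioctl does, and the numeric key types are assumed to match ctree.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Key types; the values are assumed to match ctree.h, check your tree. */
#define BTRFS_INODE_ITEM_KEY    1
#define BTRFS_DIR_ITEM_KEY      84
#define BTRFS_DIR_INDEX_KEY     96
#define BTRFS_EXTENT_DATA_KEY   108

/* Hypothetical stand-in for the header of one search result. */
struct item_header {
    uint64_t objectid;
    uint32_t type;
};

/* Stage 1: what a search limited to [min_type, max_type] lets through. */
bool in_search_range(const struct item_header *sh,
                     uint32_t min_type, uint32_t max_type)
{
    assert(sh != NULL);
    return sh->type >= min_type && sh->type <= max_type;
}

/* Stage 2 as done by find-new: keep only extent data items. */
bool wanted_for_find_new(const struct item_header *sh)
{
    assert(sh != NULL);
    return sh->type == BTRFS_EXTENT_DATA_KEY;
}

/* Stage 2 for a backup tool: also keep the inode items themselves. */
bool wanted_for_backup(const struct item_header *sh)
{
    assert(sh != NULL);
    return sh->type == BTRFS_INODE_ITEM_KEY ||
           sh->type == BTRFS_EXTENT_DATA_KEY;
}
```

With min_type = 0 and max_type = BTRFS_EXTENT_DATA_KEY, an inode item passes the first stage but is dropped by `wanted_for_find_new()`, which is the behaviour observed with find-new.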
>
> The function find_root_gen() in btrfs-list.c seems to return the newest
> generation in a given snapshot. It would be nice to have this exported as a
> user command (e.g. "btrfs subvol newest-gen") then one could use the output of
>
> btrfs subvol newest-gen <old snapshot>
That was definitely the plan. If you're interested in coding this,
please remember that you have to record the generation before you start
the backup, so that you catch everything that changed during the backup
next time around.
When we find an inode in the output, it doesn't mean that inode has
changed. It just means the btree block holding that inode has changed.
So we'll want to add limiting based on the ctime/mtime of the inode as
well.
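A minimal sketch of that extra limiting, assuming the tool has already decoded the inode's timestamps into seconds. `struct inode_times` and `inode_changed_since()` are hypothetical names used only for illustration; they are not part of btrfs-progs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the inode fields we care about. */
struct inode_times {
    uint64_t ctime_sec;   /* last status change, seconds */
    uint64_t mtime_sec;   /* last data modification, seconds */
};

/*
 * An inode that shows up in the search output only counts as changed
 * if its own ctime or mtime is at or after the reference time.
 * Otherwise it merely shares a btree block with something that changed.
 */
bool inode_changed_since(const struct inode_times *it, uint64_t since_sec)
{
    assert(it != NULL);
    return it->ctime_sec >= since_sec || it->mtime_sec >= since_sec;
}
```

The comparison is inclusive on purpose, so an inode touched in the same second as the reference time is reported again rather than missed.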
Inodes have type BTRFS_INODE_ITEM_KEY; the same inode format is used for
both files and directories. Inside a directory we have the files listed
twice, once under items of type BTRFS_DIR_ITEM_KEY, and once under items
of type BTRFS_DIR_INDEX_KEY. The duplicate index helps with NFS and
helps us do sequential directory reads.
You'll want to pick the BTRFS_DIR_INDEX_KEY because they are in a better
order for backing up.
>
> (plus 1) as the input generation number to
>
> btrfs subvol find-new <new snapshot> <gen+1>
>
To be on the safe side (not miss any updates) we want to use gen, not
gen+1. We'll get some duplicates, but it is the only way to be sure we
don't miss anything.
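The bookkeeping can be sketched like this, under the assumption that the tool keeps one generation number between runs. `struct backup_state`, `begin_backup()` and `changed_since()` are hypothetical names, not existing commands or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state carried from one backup run to the next. */
struct backup_state {
    uint64_t last_gen;   /* generation recorded before the previous run */
};

/*
 * Call this before scanning starts. It returns the generation to scan
 * from and records the current generation for the next run, so changes
 * made while this backup is running are picked up next time.
 */
uint64_t begin_backup(struct backup_state *st, uint64_t current_gen)
{
    uint64_t scan_from;

    assert(st != NULL);
    scan_from = st->last_gen;
    st->last_gen = current_gen;
    return scan_from;
}

/* Inclusive test: use gen, not gen+1. Duplicates are the price of safety. */
bool changed_since(uint64_t item_gen, uint64_t scan_from)
{
    return item_gen >= scan_from;
}
```

An item stamped with exactly the recorded generation is reported a second time, which is the duplicate mentioned above; an item from an older generation is skipped.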
-chris
* btrfs subvol find-modified
2010-03-26 13:22 ` Chris Mason
@ 2010-03-26 18:51 ` Dipl.-Ing. Michael Niederle
2010-03-26 19:36 ` Chris Mason
0 siblings, 1 reply; 11+ messages in thread
From: Dipl.-Ing. Michael Niederle @ 2010-03-26 18:51 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
I have added a command
    btrfs subvolume find-modified <path> <last_gen>
        List the recently modified files and directories in a filesystem.
It's similar to find-new with the following differences:
* in addition to modified files it will also display modified directories
* it lists only the paths of the modified files and directories (no extent
information)
Directories "." and ".." are filtered.
I will do extensive testing this weekend and then post the patch to this list
if wanted, provided I manage to master git by then ... ^^
Greetings, Michael
* Re: btrfs subvol find-modified
2010-03-26 18:51 ` btrfs subvol find-modified Dipl.-Ing. Michael Niederle
@ 2010-03-26 19:36 ` Chris Mason
2010-03-26 20:22 ` Dipl.-Ing. Michael Niederle
0 siblings, 1 reply; 11+ messages in thread
From: Chris Mason @ 2010-03-26 19:36 UTC (permalink / raw)
To: Dipl.-Ing. Michael Niederle; +Cc: linux-btrfs
On Fri, Mar 26, 2010 at 07:51:47PM +0100, Dipl.-Ing. Michael Niederle wrote:
> I have added a command
>
> btrfs subvolume find-modified <path> <last_gen>
> List the recently modified files and directories in a filesystem.
>
> It's similar to find-new with the following differences:
Ok, I'd suggest two changes. Add an optional timestamp field to filter
files that have changed since a given timestamp.
Also make it take a -e switch that prints the extents.
-chris
* Re: btrfs subvol find-modified
2010-03-26 19:36 ` Chris Mason
@ 2010-03-26 20:22 ` Dipl.-Ing. Michael Niederle
2010-03-26 20:26 ` Chris Mason
0 siblings, 1 reply; 11+ messages in thread
From: Dipl.-Ing. Michael Niederle @ 2010-03-26 20:22 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
Hi, Chris!
> Add an optional timestamp field to filter
> files that have changed since a given timestamp.
Is there a possibility to derive the timestamp directly from the generation
number?
If we have a "-e"-switch for printing extent-information we could also have
another switch to decide whether to print directory-information or not and
combine find-new and find-modified into a single command.
Meanwhile I have implemented the (very simple) command
btrfs subvolume max-gen <path>
Print the highest generation number in a filesystem.
Greetings, Michael
* Re: btrfs subvol find-modified
2010-03-26 20:22 ` Dipl.-Ing. Michael Niederle
@ 2010-03-26 20:26 ` Chris Mason
2010-03-26 20:36 ` Goffredo Baroncelli
0 siblings, 1 reply; 11+ messages in thread
From: Chris Mason @ 2010-03-26 20:26 UTC (permalink / raw)
To: Dipl.-Ing. Michael Niederle; +Cc: linux-btrfs
On Fri, Mar 26, 2010 at 09:22:07PM +0100, Dipl.-Ing. Michael Niederle wrote:
> Hi, Chris!
>
> > Add an optional timestamp field to filter
> > files that have changed since a given timestamp.
>
> Is there a possibility to derive the timestamp directly from the generation
> number?
I'm afraid not.
>
> If we have a "-e"-switch for printing extent-information we could also have
> another switch to decide whether to print directory-information or not and
> combine find-new and find-modified into a single command.
Yes, that's the direction I'd like to see.
>
> Meanwhile I have implemented the (very simple) command
>
> btrfs subvolume max-gen <path>
> Print the highest generation number in a filesystem.
>
Great.
-chris
* Re: btrfs subvol find-modified
2010-03-26 20:26 ` Chris Mason
@ 2010-03-26 20:36 ` Goffredo Baroncelli
2010-03-26 21:11 ` Chris Mason
2010-03-26 21:38 ` btrfs subvol diff Dipl.-Ing. Michael Niederle
0 siblings, 2 replies; 11+ messages in thread
From: Goffredo Baroncelli @ 2010-03-26 20:36 UTC (permalink / raw)
To: Chris Mason, Dipl.-Ing. Michael Niederle, linux-btrfs
On Friday 26 March 2010, Chris Mason wrote:
> On Fri, Mar 26, 2010 at 09:22:07PM +0100, Dipl.-Ing. Michael Niederle wrote:
> > Hi, Chris!
> >
> > > Add an optional timestamp field to filter
> > > files that have changed since a given timestamp.
> >
> > Is there a possibility to derive the timestamp directly from the generation
> > number?
>
> I'm afraid not.
>
> >
> > If we have a "-e"-switch for printing extent-information we could also have
> > another switch to decide whether to print directory-information or not and
> > combine find-new and find-modified into a single command.
>
> Yes, that's the direction I'd like to see.
>
> >
> > Meanwhile I have implemented the (very simple) command
> >
> > btrfs subvolume max-gen <path>
> > Print the highest generation number in a filesystem.
> >
Is it possible to combine the commands max-gen and find-new? Something like:
$ btrfs subvol find-new subvol1 snap1
I think that the generation number is useful only from a developer's point of
view. From a user's point of view, a command that can compare two
snapshots is more useful.
>
> Great.
>
> -chris
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijackATinwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
* Re: btrfs subvol find-modified
2010-03-26 20:36 ` Goffredo Baroncelli
@ 2010-03-26 21:11 ` Chris Mason
2010-03-26 22:23 ` Goffredo Baroncelli
2010-03-27 9:52 ` directory order in btrfs Dipl.-Ing. Michael Niederle
2010-03-26 21:38 ` btrfs subvol diff Dipl.-Ing. Michael Niederle
1 sibling, 2 replies; 11+ messages in thread
From: Chris Mason @ 2010-03-26 21:11 UTC (permalink / raw)
To: kreijack; +Cc: Dipl.-Ing. Michael Niederle, linux-btrfs
On Fri, Mar 26, 2010 at 09:36:31PM +0100, Goffredo Baroncelli wrote:
> On Friday 26 March 2010, Chris Mason wrote:
> > On Fri, Mar 26, 2010 at 09:22:07PM +0100, Dipl.-Ing. Michael Niederle wrote:
> > > Hi, Chris!
> > >
> > > > Add an optional timestamp field to filter
> > > > files that have changed since a given timestamp.
> > >
> > > Is there a possibility to derive the timestamp directly from the generation
> > > number?
> >
> > I'm afraid not.
> >
> > >
> > > If we have a "-e"-switch for printing extent-information we could also have
> > > another switch to decide whether to print directory-information or not and
> > > combine find-new and find-modified into a single command.
> >
> > Yes, that's the direction I'd like to see.
> >
> > >
> > > Meanwhile I have implemented the (very simple) command
> > >
> > > btrfs subvolume max-gen <path>
> > > Print the highest generation number in a filesystem.
> > >
>
> It is possible to combine the commands max-gen and find-new ? Something like:
>
> $ btrfs subvol find-new subvol1 snap1
>
> I think that the generation number is useful only from a developer point of
> view. But from an user point of view a command which is able to compare two
> snapshot if more useful.
In general, the end goal is backing up the changes from a point in time
to right now. We don't actually need a snapshot to do this; we just need
the generation number and (optionally) a timestamp.
So, we could store these things into a state file that gets fed into the
next backup, but I'd like to keep a command that can print them as well.
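A state file along those lines could be sketched as follows. The one-line "gen=... time=..." format, `struct saved_state`, `state_format()` and `state_parse()` are all invented for illustration; nothing here is an existing btrfs-progs format or API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical contents of the state file. */
struct saved_state {
    uint64_t generation;  /* generation recorded before the backup began */
    uint64_t timestamp;   /* optional, seconds since the epoch; 0 if unused */
};

/* Format the state as one text line. Returns 0, or -1 if buf is too small. */
int state_format(const struct saved_state *st, char *buf, size_t len)
{
    int n;

    assert(st != NULL && buf != NULL);
    n = snprintf(buf, len, "gen=%" PRIu64 " time=%" PRIu64 "\n",
                 st->generation, st->timestamp);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse a line written by state_format(). Returns 0 on success, -1 otherwise. */
int state_parse(const char *line, struct saved_state *st)
{
    assert(line != NULL && st != NULL);
    if (sscanf(line, "gen=%" SCNu64 " time=%" SCNu64,
               &st->generation, &st->timestamp) != 2)
        return -1;
    return 0;
}
```

Keeping the state as plain text means the same line can be written to a file for the next run and printed by a command for the user.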
-chris
* btrfs subvol diff
2010-03-26 20:36 ` Goffredo Baroncelli
2010-03-26 21:11 ` Chris Mason
@ 2010-03-26 21:38 ` Dipl.-Ing. Michael Niederle
1 sibling, 0 replies; 11+ messages in thread
From: Dipl.-Ing. Michael Niederle @ 2010-03-26 21:38 UTC (permalink / raw)
To: kreijack; +Cc: kreijack, Chris Mason, linux-btrfs
> It is possible to combine the commands max-gen and find-new ? Something like:
>
> $ btrfs subvol find-new subvol1 snap1
I had very similar thoughts myself.
If we compare two snapshots (of the same subvolume) we wouldn't need timestamps
either, e.g.:
btrfs subvol diff <old_snapshot> <new_snapshot>
The output could be a list of files (and directories); each line prefixed with
a plus sign (for new or modified files) or a minus sign (for deleted files).
The output could be easily postprocessed using grep and cut.
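A minimal sketch of producing that output, assuming both listings are already available in memory and sorted by path in strcmp() order. `struct entry` and `snapshot_diff()` are hypothetical names, and treating a differing generation number as "modified" is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical listing entry: a path and the generation of its last change. */
struct entry {
    const char *path;
    uint64_t gen;
};

/*
 * Walk two listings, both sorted by path, and write one line per
 * difference to `out`: "+path" for new or modified entries and
 * "-path" for deleted ones. Returns the number of lines written.
 */
size_t snapshot_diff(const struct entry *older, size_t n_old,
                     const struct entry *newer, size_t n_new, FILE *out)
{
    size_t i = 0, j = 0, lines = 0;

    assert(out != NULL);
    while (i < n_old || j < n_new) {
        int cmp;

        if (i == n_old)
            cmp = 1;                 /* only new entries are left */
        else if (j == n_new)
            cmp = -1;                /* only old entries are left */
        else
            cmp = strcmp(older[i].path, newer[j].path);

        if (cmp < 0) {               /* present before, gone now */
            fprintf(out, "-%s\n", older[i].path);
            i++;
            lines++;
        } else if (cmp > 0) {        /* not present before */
            fprintf(out, "+%s\n", newer[j].path);
            j++;
            lines++;
        } else {                     /* same path: modified if gen differs */
            if (older[i].gen != newer[j].gen) {
                fprintf(out, "+%s\n", newer[j].path);
                lines++;
            }
            i++;
            j++;
        }
    }
    return lines;
}
```

Deleted files can then be selected with something like `grep '^-' | cut -c2-`.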
Greetings, Michael
* Re: btrfs subvol find-modified
2010-03-26 21:11 ` Chris Mason
@ 2010-03-26 22:23 ` Goffredo Baroncelli
2010-03-27 9:52 ` directory order in btrfs Dipl.-Ing. Michael Niederle
1 sibling, 0 replies; 11+ messages in thread
From: Goffredo Baroncelli @ 2010-03-26 22:23 UTC (permalink / raw)
To: Chris Mason, Dipl.-Ing. Michael Niederle, linux-btrfs
On Friday 26 March 2010, Chris Mason wrote:
> In general, the end goal is backing up a snapshot the changes from a
> point in time to right now. We don't actually need a snapshot to do
> this, we just need the generation number and (optionally) a timestamp.
I think that backing up the difference between two snapshots has a big
advantage: a snapshot is a coherent state.
For example, what if we are doing a backup during a package installation or
while a database is running? The risk is taking some files from an old "state"
and other files from a new "state"...
>
> So, we could store these things into a state file that gets fed into the
> next backup, but I'd like to keep a command that can print them as well.
>
> -chris
--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijackATinwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
* directory order in btrfs
2010-03-26 21:11 ` Chris Mason
2010-03-26 22:23 ` Goffredo Baroncelli
@ 2010-03-27 9:52 ` Dipl.-Ing. Michael Niederle
1 sibling, 0 replies; 11+ messages in thread
From: Dipl.-Ing. Michael Niederle @ 2010-03-27 9:52 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
Hi, Chris!
I'm writing the btrfs snapshot diff tool and I would like to know whether the
entries in a btrfs directory are ordered in some way.
I want to find missing entries in a new snapshot's directory. Can I do a "linear
compare" of the old and new directories or do I have to sort the entries first
(or do some kind of hashing)?
Greetings, Michael