linux-btrfs.vger.kernel.org archive mirror
* Workaround for hardlink count problem?
@ 2012-09-08 16:56 Marc MERLIN
  2012-09-10  9:12 ` Martin Steigerwald
  0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2012-09-08 16:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: mfasheh

I read the discussions on hardlinks, and saw that there was a proposed patch
(although I'm not sure if it's due in 3.6 or not, or whether I can apply it 
to my 3.5.3 tree).

I was migrating a backup disk to a new btrfs disk, and the backup had a lot of hardlinks
to collapse identical files to cut down on inode count and disk space.

Then, I started seeing:

cp: cannot create hard link `../dshelf3/backup/saroumane/20080319/var/lib/dpkg/info/libaspell15.postrm' to `../dshelf3/backup/moremagic/oldinstall/var/lib/dpkg/info/libncurses5.postrm': Too many links
cp: cannot create hard link `../dshelf3/backup/saroumane/20080319/var/lib/dpkg/info/libxp6.postrm' to `../dshelf3/backup/moremagic/oldinstall/var/lib/dpkg/info/libncurses5.postrm': Too many links
cp: cannot create symbolic link `../dshelf3/backup/saroumane/20020317_oldload/usr/share/doc/menu/examples/system.fvwmrc': File name too long
cp: cannot create hard link `../dshelf3/backup/saroumane/20061218/var/lib/dpkg/info/libxxf86vm1.postrm' to `../dshelf3/backup/moremagic/oldinstall/var/lib/dpkg/info/libncurses5.postrm': Too many links
cp: cannot create hard link `../dshelf3/backup/saroumane/20061218/var/lib/dpkg/info/libxxf86dga1.postrm' to `../dshelf3/backup/moremagic/oldinstall/var/lib/dpkg/info/libncurses5.postrm': Too many links
cp: cannot create hard link `../dshelf3/backup/saroumane/20061218/var/lib/dpkg/info/libavc1394-0.postrm' to `../dshelf3/backup/moremagic/oldinstall/var/lib/dpkg/info/libncurses5.postrm': Too many links

What's interesting is the 'File name too long' one, but more generally, 
I'm trying to find a userspace workaround for this by unlinking files that go beyond
the hardlink count that btrfs can support for now.

Has someone come up with a cool way to work around the too many link error
and only when that happens, turn the hardlink into a file copy instead?
(that is when copying an entire tree with millions of files).

I realize I could parse the errors and pipe that into some crafty shell to do this,
but if there is a smarter already made solution, I'm all ears :)
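
For the record, the crafty shell I have in mind would look roughly like
this in bash (completely untested, it assumes GNU cp's exact error
wording and no quote characters or newlines in the file names, and
$OLD_BACKUP is just a stand-in for the source disk):

  cp -a "$OLD_BACKUP" ../dshelf3/ 2>cp.err
  # for every hard link cp could not create, fall back to a real copy
  # of the destination file it should have pointed to
  grep 'Too many links' cp.err | while IFS= read -r line; do
      rest=${line#"cp: cannot create hard link \`"}
      newpath=${rest%%"' to \`"*}
      target=${rest#*"' to \`"}
      target=${target%"': Too many links"}
      cp -p -- "$target" "$newpath"
  done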

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: Workaround for hardlink count problem?
  2012-09-08 16:56 Workaround for hardlink count problem? Marc MERLIN
@ 2012-09-10  9:12 ` Martin Steigerwald
  2012-09-10  9:21   ` Fajar A. Nugraha
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Steigerwald @ 2012-09-10  9:12 UTC (permalink / raw)
  To: linux-btrfs

On Saturday, 8 September 2012, Marc MERLIN wrote:
> I read the discussions on hardlinks, and saw that there was a proposed
> patch (although I'm not sure if it's due in 3.6 or not, or whether I
> can apply it to my 3.5.3 tree).
> 
> I was migrating a backup disk to a new btrfs disk, and the backup had a
> lot of hardlinks to collapse identical files to cut down on inode
> count and disk space.
> 
> Then, I started seeing:
[…]
> Has someone come up with a cool way to work around the too many link
> error and only when that happens, turn the hardlink into a file copy
> instead? (that is when copying an entire tree with millions of files).

What about:

- copy first backup version
- btrfs subvol create first next
- copy next backup version
- btrfs subvol create previous next

I have been using this scheme for my backups for quite a while, except
that I do the backup first and then create a read-only snapshot, and at
some point remove old snapshots.

Works like a charm and is easily scriptable.
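
Scripted, it boils down to roughly this (paths, dates and rsync options
are just placeholders, typed from memory):

  # once:
  btrfs subvolume create /mnt/backup/current
  # for every backup run:
  rsync -ax /home/ /mnt/backup/current/
  btrfs subvolume snapshot -r /mnt/backup/current \
      /mnt/backup/$(date +%Y-%m-%d)
  # and eventually:
  btrfs subvolume delete /mnt/backup/2012-06-01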

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: Workaround for hardlink count problem?
  2012-09-10  9:12 ` Martin Steigerwald
@ 2012-09-10  9:21   ` Fajar A. Nugraha
  2012-09-10 23:09     ` Martin Steigerwald
  0 siblings, 1 reply; 7+ messages in thread
From: Fajar A. Nugraha @ 2012-09-10  9:21 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Sep 10, 2012 at 4:12 PM, Martin Steigerwald <Martin@lichtvoll.de> wrote:
> On Saturday, 8 September 2012, Marc MERLIN wrote:
>> I was migrating a backup disk to a new btrfs disk, and the backup had a
>> lot of hardlinks to collapse identical files to cut down on inode
>> count and disk space.
>>
>> Then, I started seeing:
> […]
>> Has someone come up with a cool way to work around the too many link
>> error and only when that happens, turn the hardlink into a file copy
>> instead? (that is when copying an entire tree with millions of files).
>
> What about:
>
> - copy first backup version
> - btrfs subvol create first next
> - copy next backup version
> - btrfs subvol create previous next

Wouldn't "btrfs subvolume snapshot", plus "rsync --inplace" more
useful here? That is. if the original hardlink is caused by multiple
versions of backup of the same file.
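
I mean roughly this (paths made up):

  btrfs subvolume snapshot /backup/current /backup/snap-$(date +%F)
  rsync -a --inplace --no-whole-file --delete /source/ /backup/current/

(--no-whole-file because rsync turns the delta algorithm off for local
copies), so that unchanged blocks of updated files stay shared between
the snapshot and the current copy instead of the whole file being
rewritten.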

Personally, if I need a feature not yet implemented in btrfs, I'd just
switch to something else for now, like zfs, and revisit btrfs later once
the needed features have been merged.

-- 
Fajar


* Re: Workaround for hardlink count problem?
  2012-09-10  9:21   ` Fajar A. Nugraha
@ 2012-09-10 23:09     ` Martin Steigerwald
  2012-09-10 23:38       ` Jan Engelhardt
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Steigerwald @ 2012-09-10 23:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Fajar A. Nugraha

On Monday, 10 September 2012, Fajar A. Nugraha wrote:
> On Mon, Sep 10, 2012 at 4:12 PM, Martin Steigerwald <Martin@lichtvoll.de> wrote:
> > On Saturday, 8 September 2012, Marc MERLIN wrote:
> >> I was migrating a backup disk to a new btrfs disk, and the backup
> >> had a lot of hardlinks to collapse identical files to cut down on
> >> inode count and disk space.
> > 
> >> Then, I started seeing:
> > […]
> > 
> >> Has someone come up with a cool way to work around the too many link
> >> error and only when that happens, turn the hardlink into a file copy
> >> instead? (that is when copying an entire tree with millions of
> >> files).
> > 
> > What about:
> > 
> > - copy first backup version
> > - btrfs subvol create first next
> > - copy next backup version
> > - btrfs subvol create previous next
> 
> Wouldn't "btrfs subvolume snapshot", plus "rsync --inplace", be more
> useful here? That is, if the original hardlinks are caused by multiple
> backup versions of the same file.

Sure, I meant subvol snapshot in above example. Thanks for noticing.

But I do not use --inplace as it conflicts with some other rsync options I 
like to use:

-ax --acls --xattrs --sparse --hard-links --del --delete-excluded --exclude-from "debian-exclude"

Yes, it was --sparse:

       -S, --sparse
              Try to handle sparse files efficiently so they  take  up
              less space on the destination.  Conflicts with --inplace
              because it’s not possible to overwrite data in a  sparse
              fashion.

As I have some pretty big sparse files, I went without --inplace:

martin@merkaba:~/Amiga> du -sch M-Archiv.hardfile Messages.hardfile 
241M    M-Archiv.hardfile
726M    Messages.hardfile
966M    insgesamt
martin@merkaba:~/Amiga> ls -lh M-Archiv.hardfile Messages.hardfile 
-rw-r----- 1 martin martin 1,0G Mär 27  2005 M-Archiv.hardfile
-rw-r----- 1 martin martin 1,0G Sep 10 17:33 Messages.hardfile
martin@merkaba:~/Amiga>

(my old mails from when I used an Amiga as my main machine, still
accessible via e-uae ;)

Anyway, I think that will be solved by btrfs send/receive.
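
(Judging from the documentation -- I do not use it yet -- that would look
roughly like:

  btrfs subvolume snapshot -r /data /data/snap-new
  btrfs send -p /data/snap-old /data/snap-new | btrfs receive /mnt/backup

which should keep both the sharing between snapshots and the holes in the
sparse files on the receiving side.)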

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: Workaround for hardlink count problem?
  2012-09-10 23:09     ` Martin Steigerwald
@ 2012-09-10 23:38       ` Jan Engelhardt
  2012-09-11  9:16         ` Martin Steigerwald
  2012-09-11 14:20         ` Arne Jansen
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Engelhardt @ 2012-09-10 23:38 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs, Fajar A. Nugraha


On Tuesday 2012-09-11 01:09, Martin Steigerwald wrote:
>> > What about:
>> > 
>> > - copy first backup version
>> > - btrfs subvol create first next
>> > - copy next backup version
>> > - btrfs subvol create previous next
>> 
>> Wouldn't "btrfs subvolume snapshot", plus "rsync --inplace", be more
>> useful here? That is, if the original hardlinks are caused by multiple
>> backup versions of the same file.
>
>Sure, I meant subvol snapshot in above example. Thanks for noticing.
>
>But I do not use --inplace as it conflicts with some other rsync options I 
>like to use:

It is a tradeoff.

rsync "--inplace" leads to fragmentation which is detrimental for the
speed of reads (and read-write-cycles as used by rsync) of big files
(multi-GB) that are regularly updated, but it is probably even worse
for smaller-than-GB files because percent-wise, they are even more
fragmented.

$ filefrag */vm/intranet.dsk
snap-2012-08-15/vm/intranet.dsk: 23 extents found
snap-2012-08-16/vm/intranet.dsk: 23 extents found
snap-2012-08-17/vm/intranet.dsk: 4602 extents found
snap-2012-08-18/vm/intranet.dsk: 6221 extents found
snap-2012-08-19/vm/intranet.dsk: 6604 extents found
snap-2012-08-20/vm/intranet.dsk: 6694 extents found
snap-2012-08-21/vm/intranet.dsk: 6650 extents found
snap-2012-08-22/vm/intranet.dsk: 6760 extents found
snap-2012-08-23/vm/intranet.dsk: 7226 extents found
snap-2012-08-24/vm/intranet.dsk: 7159 extents found
snap-2012-08-25/vm/intranet.dsk: 7464 extents found
snap-2012-08-26/vm/intranet.dsk: 7746 extents found
snap-2012-08-27/vm/intranet.dsk: 8017 extents found
snap-2012-08-28/vm/intranet.dsk: 8145 extents found
snap-2012-08-29/vm/intranet.dsk: 8393 extents found
snap-2012-08-30/vm/intranet.dsk: 8474 extents found
snap-2012-08-31/vm/intranet.dsk: 9150 extents found
snap-2012-09-01/vm/intranet.dsk: 8900 extents found
snap-2012-09-02/vm/intranet.dsk: 9218 extents found
snap-2012-09-03/vm/intranet.dsk: 9575 extents found
snap-2012-09-04/vm/intranet.dsk: 9760 extents found
snap-2012-09-05/vm/intranet.dsk: 9839 extents found
snap-2012-09-06/vm/intranet.dsk: 9907 extents found
snap-2012-09-07/vm/intranet.dsk: 10006 extents found
snap-2012-09-08/vm/intranet.dsk: 10248 extents found
snap-2012-09-09/vm/intranet.dsk: 10488 extents found

Without --inplace (prerequisite to use -S) however, it will recreate
a file if it has been touched. While this easily avoids fragmentation
(since it won't share any data blocks with the old one), it can take
up more space with the big files.

>-ax --acls --xattrs --sparse --hard-links --del --delete-excluded --

I knew short options would be helpful here: -axAXSH
(why don't they just become the standard... they are in like almost
every other rsync invocation I ever had)
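
So the whole thing collapses to something like (source and destination
are made up here):

$ rsync -axAXSH --del --delete-excluded --exclude-from=debian-exclude \
      / /mnt/backup/current/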

>       -S, --sparse
>              Try to handle sparse files efficiently so they  take  up
>              less space on the destination.  Conflicts with --inplace
>              because it’s not possible to overwrite data in a  sparse
>              fashion.

Oh and if anybody from the rsync camp reads it: with hole-punching
now supported in Linux, there is no reason not to support "-S" with
"--inplace", I think.


* Re: Workaround for hardlink count problem?
  2012-09-10 23:38       ` Jan Engelhardt
@ 2012-09-11  9:16         ` Martin Steigerwald
  2012-09-11 14:20         ` Arne Jansen
  1 sibling, 0 replies; 7+ messages in thread
From: Martin Steigerwald @ 2012-09-11  9:16 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-btrfs, Fajar A. Nugraha

On Tuesday, 11 September 2012, Jan Engelhardt wrote:
> 
> On Tuesday 2012-09-11 01:09, Martin Steigerwald wrote:
> >> > What about:
> >> > 
> >> > - copy first backup version
> >> > - btrfs subvol create first next
> >> > - copy next backup version
> >> > - btrfs subvol create previous next
> >> 
> >> Wouldn't "btrfs subvolume snapshot", plus "rsync --inplace", be more
> >> useful here? That is, if the original hardlinks are caused by multiple
> >> backup versions of the same file.
> >
> >Sure, I meant subvol snapshot in above example. Thanks for noticing.
> >
> >But I do not use --inplace as it conflicts with some other rsync options I 
> >like to use:
> 
> It is a tradeoff.
> 
> rsync "--inplace" leads to fragmentation which is detrimental for the
> speed of reads (and read-write-cycles as used by rsync) of big files
> (multi-GB) that are regularly updated, but it is probably even worse
> for smaller-than-GB files because percent-wise, they are even more
> fragmented.
> 
> $ filefrag */vm/intranet.dsk
> snap-2012-08-15/vm/intranet.dsk: 23 extents found
> snap-2012-08-16/vm/intranet.dsk: 23 extents found
> snap-2012-08-17/vm/intranet.dsk: 4602 extents found
> snap-2012-08-18/vm/intranet.dsk: 6221 extents found
[…]
> snap-2012-08-25/vm/intranet.dsk: 7464 extents found
[…]
> snap-2012-09-09/vm/intranet.dsk: 10488 extents found
> 
> Without --inplace (prerequisite to use -S) however, it will recreate
> a file if it has been touched. While this easily avoids fragmentation
> (since it won't share any data blocks with the old one), it can take
> up more space with the big files.

Yes. But I do not care about that as much as about sparse files. Because
for the example I gave, on a backup restore those sparse files would
consume about 1 GiB more on the SSD. So I prefer to have some duplicated
files on the 2 TB backup harddisk instead.

As for recreating the sparse nature of the files, I'd have to format new
hardfiles and copy tons of mail files over within E-UAE. Thus I prefer
not to lose it on backup.

> >-ax --acls --xattrs --sparse --hard-links --del --delete-excluded --
> 
> I knew short options would be helpful here: -axAXSH
> (why don't they just become the standard... they are in like almost
> every other rsync invocation I ever had)

Hey, I like those. I do not have to look up in the manpage what each
option means ;)

> >       -S, --sparse
> >              Try to handle sparse files efficiently so they  take  up
> >              less space on the destination.  Conflicts with --inplace
> >              because it’s not possible to overwrite data in a  sparse
> >              fashion.
> 
> Oh and if anybody from the rsync camp reads it: with hole-punching
> now supported in Linux, there is no reason not to support "-S" with
> "--inplace", I think.

Hmm, maybe I'll forward this to them.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: Workaround for hardlink count problem?
  2012-09-10 23:38       ` Jan Engelhardt
  2012-09-11  9:16         ` Martin Steigerwald
@ 2012-09-11 14:20         ` Arne Jansen
  1 sibling, 0 replies; 7+ messages in thread
From: Arne Jansen @ 2012-09-11 14:20 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Linux Btrfs

On 11.09.2012 01:38, Jan Engelhardt wrote:
> 
> On Tuesday 2012-09-11 01:09, Martin Steigerwald wrote:
>>>> What about:
>>>>
>>>> - copy first backup version
>>>> - btrfs subvol create first next
>>>> - copy next backup version
>>>> - btrfs subvol create previous next
>>>
>>> Wouldn't "btrfs subvolume snapshot", plus "rsync --inplace", be more
>>> useful here? That is, if the original hardlinks are caused by multiple
>>> backup versions of the same file.
>>
>> Sure, I meant subvol snapshot in above example. Thanks for noticing.
>>
>> But I do not use --inplace as it conflicts with some other rsync options I 
>> like to use:
> 
> It is a tradeoff.
> 
> rsync "--inplace" leads to fragmentation which is detrimental for the
> speed of reads (and read-write-cycles as used by rsync) of big files
> (multi-GB) that are regularly updated, but it is probably even worse
> for smaller-than-GB files because percent-wise, they are even more
> fragmented.
> 
> $ filefrag */vm/intranet.dsk
> snap-2012-08-15/vm/intranet.dsk: 23 extents found
> snap-2012-08-16/vm/intranet.dsk: 23 extents found
> snap-2012-08-17/vm/intranet.dsk: 4602 extents found
> snap-2012-08-18/vm/intranet.dsk: 6221 extents found
> snap-2012-08-19/vm/intranet.dsk: 6604 extents found
> snap-2012-08-20/vm/intranet.dsk: 6694 extents found
> snap-2012-08-21/vm/intranet.dsk: 6650 extents found
> snap-2012-08-22/vm/intranet.dsk: 6760 extents found
> snap-2012-08-23/vm/intranet.dsk: 7226 extents found
> snap-2012-08-24/vm/intranet.dsk: 7159 extents found
> snap-2012-08-25/vm/intranet.dsk: 7464 extents found
> snap-2012-08-26/vm/intranet.dsk: 7746 extents found
> snap-2012-08-27/vm/intranet.dsk: 8017 extents found
> snap-2012-08-28/vm/intranet.dsk: 8145 extents found
> snap-2012-08-29/vm/intranet.dsk: 8393 extents found
> snap-2012-08-30/vm/intranet.dsk: 8474 extents found
> snap-2012-08-31/vm/intranet.dsk: 9150 extents found
> snap-2012-09-01/vm/intranet.dsk: 8900 extents found
> snap-2012-09-02/vm/intranet.dsk: 9218 extents found
> snap-2012-09-03/vm/intranet.dsk: 9575 extents found
> snap-2012-09-04/vm/intranet.dsk: 9760 extents found
> snap-2012-09-05/vm/intranet.dsk: 9839 extents found
> snap-2012-09-06/vm/intranet.dsk: 9907 extents found
> snap-2012-09-07/vm/intranet.dsk: 10006 extents found
> snap-2012-09-08/vm/intranet.dsk: 10248 extents found
> snap-2012-09-09/vm/intranet.dsk: 10488 extents found
> 
> Without --inplace (prerequisite to use -S) however, it will recreate
> a file if it has been touched. While this easily avoids fragmentation
> (since it won't share any data blocks with the old one), it can take
> up more space with the big files.
> 
>> -ax --acls --xattrs --sparse --hard-links --del --delete-excluded --
> 
> I knew short options would be helpful here: -axAXSH
> (why don't they just become the standard... they are in like almost
> every other rsync invocation I ever had)
> 
>>       -S, --sparse
>>              Try to handle sparse files efficiently so they  take  up
>>              less space on the destination.  Conflicts with --inplace
>>              because it’s not possible to overwrite data in a  sparse
>>              fashion.
> 
> Oh and if anybody from the rsync camp reads it: with hole-punching
> now supported in Linux, there is no reason not to support "-S" with
> "--inplace", I think.

I sent a patch for this quite some time ago:

https://bugzilla.samba.org/show_bug.cgi?id=7194

Feel free to push it :)

-Arne


