public inbox for linux-btrfs@vger.kernel.org
From: Andy Smith <andy@strugglers.net>
To: linux-btrfs@vger.kernel.org
Subject: "Too many links (31)" issue
Date: Thu, 3 Feb 2022 16:31:08 +0000
Message-ID: <20220203163108.ipdv3yxbe7eb6vc4@bitfolk.com>

Hi,

I have a host with an xfs filesystem on it, with about 25 million
files. It contains an rsnapshot backup that has been aggressively
deduplicated by means of hardlinks, and there are probably only
about 7 million unique files on there.

I'm trying to rsync it to a different host into a btrfs filesystem,
but part way through the rsync I get a "Too many links (31)" error:

rsync: [generator] link "/data/backup/rsnapshot/daily.0/chacha/var/lib/dpkg/info/.apt-utils.postrm.0" => daily.0/backup1/var/lib/dpkg/info/libpango1.0-0.postrm failed: Too many links (31)
Hlink node data for 219191 already has path=daily.0/backup1/var/lib/dpkg/info/libpango1.0-0.postrm (daily.0/chacha/var/lib/dpkg/info/apt-utils.postrm)
rsync error: errors with program diagnostics (code 13) at hlink.c(539) [generator=3.2.3]
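
For context, the number in parentheses is the errno value: 31 is
EMLINK on Linux, which link(2) returns when the file already has the
maximum number of links the filesystem allows. A minimal Python sketch
of how that condition surfaces to a program, assuming a Linux host; the
helper name try_link is made up for illustration:

```python
import errno
import os


def try_link(src, dst):
    """Hypothetical helper: create hardlink dst -> src.

    Returns "linked" on success, or "too many links" when the
    filesystem refuses with EMLINK (errno 31 on Linux). Any other
    OSError is re-raised unchanged.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as e:
        if e.errno == errno.EMLINK:
            return "too many links"
        raise
```

A caller could fall back to copying the file when "too many links" comes
back, at the cost of losing the deduplication for that one name.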

I searched around on this topic and found hits from 10 years ago
about a maximum number of hardlinks per directory that depended
upon the length of the file path. Is that still relevant today?

This particular file does indeed have a huge number of hardlinks:

$ stat daily.0/chacha/var/lib/dpkg/info/apt-utils.postrm
  File: daily.0/chacha/var/lib/dpkg/info/apt-utils.postrm
  Size: 132             Blocks: 8          IO Block: 4096   regular file
Device: fd05h/64773d    Inode: 1342355538  Links: 7565
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-02-03 04:51:29.978915067 +0000
Modify: 2011-03-01 21:59:29.000000000 +0000
Change: 2022-01-01 17:35:06.598921506 +0000
 Birth: -
$ sudo find . -mount -samefile daily.0/chacha/var/lib/dpkg/info/apt-utils.postrm | wc -l
7565

I guess it is some sort of template file that is littered all over
Debian systems.
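
The stat/find check above could be scripted to list every heavily
linked file in the tree rather than just this one. A rough Python
sketch, not something I have run against the real backup; the function
name and the threshold default are arbitrary choices for illustration.
Note that, unlike find -mount, os.walk will cross mount points:

```python
import os
import stat
from collections import defaultdict


def heavily_linked(root, threshold=1000):
    """Map (device, inode) -> (link count, sorted paths found under root)
    for regular files whose link count is at least `threshold`."""
    paths = defaultdict(list)
    nlinks = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_nlink >= threshold:
                key = (st.st_dev, st.st_ino)
                paths[key].append(path)
                nlinks[key] = st.st_nlink
    return {key: (nlinks[key], sorted(found)) for key, found in paths.items()}
```

Calling heavily_linked("/data/backup/rsnapshot", threshold=5000) would
then return one entry per inode, with the link count reported by the
filesystem and the names that were actually found under that root.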

Is there anything I can do to get this working?

The receiving host with the btrfs filesystem is Debian 11
(bullseye), stock Debian kernel 5.10.0-11-amd64. The btrfs
filesystem is mounted with options:

/dev/mapper/backupenc on /data/backup type btrfs (rw,noatime,compress=zstd:15,space_cache,subvolid=5,subvol=/)

As an aside, when the file is as small as 132 bytes is there
actually any advantage in hardlinking copies of it together rather
than just having multiple copies of it? Is there some minimum
file size where it's just not worth it?

(Yes, I am aware of offline deduplication, which works in XFS as
well; it's just that where the files are known to be entirely
identical I've found that simply hardlinking them together was faster
and easier.)
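
To put rough numbers on the small-file question, here is a
back-of-envelope Python sketch using only the figures from the stat
output above. It assumes stat's "Blocks" column is in 512-byte units,
and it ignores inode and directory-entry overhead entirely. The figures
describe the source xfs filesystem; btrfs can store very small files
inline in metadata, so the result there would differ:

```python
# Figures copied from the stat output earlier in this message.
size_bytes = 132   # "Size"
blocks_512 = 8     # "Blocks", assumed to be 512-byte units
links = 7565       # "Links"

allocated_per_copy = blocks_512 * 512        # bytes allocated for one copy
as_copies = links * allocated_per_copy       # every name is a separate file
as_hardlinks = allocated_per_copy            # all names share one inode
saved = as_copies - as_hardlinks

print(f"allocated per copy: {allocated_per_copy} bytes for {size_bytes} bytes of data")
print(f"as separate copies: {as_copies} bytes")
print(f"as hardlinks:       {as_hardlinks} bytes")
print(f"saved:              {saved} bytes (~{saved / 2**20:.1f} MiB)")
```

So for this one file the data-block saving is on the order of tens of
MiB, before counting the inodes that separate copies would also need.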

Cheers,
Andy
