public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: "Arkadiusz Miśkiewicz" <arekm@maven.pl>
Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: hardlinking and deleting milions of small files
Date: Mon, 25 Jul 2016 10:23:34 +1000
Message-ID: <20160725002334.GG12670@dastard>
In-Reply-To: <201607241438.10035.arekm@maven.pl>

On Sun, Jul 24, 2016 at 02:38:10PM +0200, Arkadiusz Miśkiewicz wrote:
> 
> Hello.
> 
> I'm using rsnapshot to back up big servers (like a 5TB fs, 25 000 000 inodes, 
> small files - mailboxes in the form of maildirs, so each mail is a separate file). 
> Backup server - kernel 4.6.3, V4 xfs filesystems.

What storage? What mount options? What is the xfs_info output?

(/me points at
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F)

> cp -al for that amount takes about 1.5 days.
> rm -rf of the hardlinked copy takes another 1.5 days

So what's the bottleneck? Reading the directory structure/inodes
into memory to make the copy? What does the IO performance look like?
Are they CPU bound? What else is generating IO load at the same
time? How big are the individual directories? How full is the
filesystem?  How many hardlinks in a "copy"?
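To answer the "how big are the individual directories?" question, a
minimal Python 3 sketch like the following could be run over a snapshot
tree. It is an illustration rather than an existing tool:
largest_directories and its top parameter are hypothetical names. It
counts the entries in each directory via os.walk (which does not follow
symlinks by default) and uses heapq.nlargest so only the top few results
are held in memory. Walking a tree this size is itself a heavy metadata
read workload, so it is best run while no backup is in progress.

```python
import heapq
import os

def largest_directories(root, top=10):
    """Return up to `top` (entry_count, path) tuples for the directories
    under `root` with the most entries, largest first."""
    # os.walk yields (dirpath, subdirectory names, other entry names);
    # a directory's entry count is the length of both lists combined.
    sizes = (
        (len(dirnames) + len(filenames), dirpath)
        for dirpath, dirnames, filenames in os.walk(root)
    )
    # nlargest consumes the generator while keeping only `top` items.
    return heapq.nlargest(top, sizes)
```

For example, largest_directories(snapshot_path, 20) returns the 20
biggest directories as (entry_count, path) pairs; snapshot_path is a
placeholder for whichever rsnapshot directory is being examined.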

FWIW, you've got 25M inodes in the filesystem - how many hardlinks
do you have in the filesystem? 100M? 200M? 1B? i.e. what's the scale
of the directory structure that contains all the hard links?
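One rough way to get that number is to compare the directory entries
that name regular files with the distinct inodes they point at. This is
a minimal Python 3 sketch; link_stats is a hypothetical helper, not an
existing tool. It keeps one (st_dev, st_ino) pair per inode in a set, so
on a tree with 25M inodes it will likely need several gigabytes of RAM.

```python
import os
import stat

def link_stats(root):
    """Walk `root` (symlinks are not followed) and return a tuple
    (names, inodes, extra_links):
      names       - directory entries that name regular files
      inodes      - distinct regular-file inodes those entries point at
      extra_links - names beyond the first for each inode (names - inodes)
    """
    names = 0
    seen = set()  # one (st_dev, st_ino) pair per distinct inode
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes, etc.
            names += 1
            seen.add((st.st_dev, st.st_ino))
    return names, len(seen), names - len(seen)
```

Run over the whole backup filesystem, the first element is roughly the
total number of hardlinks to regular files that have to be created by
"cp -al" and removed by "rm -rf" across all snapshots.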

> (and tons of RAM for these operations, causing OOMs until recent kernels made 
> reclaim better, so no more OOMs)

What OOM problems? Can you post slabtop output captured during a test?

> Now the weird part - similar operations on ext4 finish in a matter of hours.

So you probably need to identify where the difference in behaviour
is - reading from disk, writing to disk, CPU usage, directory entry
creation/removal speed, etc.
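To isolate directory entry creation/removal speed from everything else,
a crude micro-benchmark could be run once on each filesystem. This is a
minimal Python 3 sketch, not a standard tool; time_link_churn and its
count parameter are hypothetical. It creates count hard links to a single
empty file in one scratch directory, then removes them, timing each
phase. That is a much narrower workload than rsnapshot's (many inodes
spread over many directories), it never calls fsync, and ext4 caps a
single inode at 65,000 links, so keep count well below that.

```python
import os
import tempfile
import time

def time_link_churn(directory, count=10000):
    """Create `count` hard links to one file in a scratch directory under
    `directory`, then unlink them. Return (create_seconds, remove_seconds)."""
    # The scratch directory (and anything left in it) is removed on exit.
    with tempfile.TemporaryDirectory(dir=directory) as work:
        target = os.path.join(work, "target")
        with open(target, "w"):
            pass  # an empty file is enough; only the inode matters
        names = [os.path.join(work, f"link{i:08d}") for i in range(count)]

        start = time.perf_counter()
        for name in names:
            os.link(target, name)  # one new directory entry per call
        created = time.perf_counter() - start

        start = time.perf_counter()
        for name in names:
            os.unlink(name)  # one directory entry removed per call
        removed = time.perf_counter() - start
    return created, removed
```

If the two filesystems give similar numbers here, the difference is
probably elsewhere: reading inodes, IO, or CPU.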

> ps. I didn't do a scientific comparison. I'm just viewing the backup logs of two 
> similar mail servers (similar hardware, similar storage size) being backed up 
> to a single backup server onto two partitions - one with xfs and one with ext4 
> on it.

So the /destination/ filesystem is either ext4 or XFS, but the source
filesystem is the same? So how does "cp -al" work to create
hardlinks when copying to a different filesystem? If this is a copy
to a different filesystem, then it's a very different problem to
"creating/removing hardlinks is slow".
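For reference, link(2) only works within a single filesystem; a
cross-filesystem attempt fails with EXDEV ("Invalid cross-device
link"). A minimal Python 3 sketch of checking that, with hypothetical
helper names same_filesystem and try_link: matching st_dev values are
necessary but not sufficient, because Linux also returns EXDEV when
linking across two different mount points of the same filesystem (for
example, bind mounts).

```python
import errno
import os

def same_filesystem(a, b):
    """True if `a` and `b` report the same device number (st_dev), i.e.
    they live on the same filesystem. link(2) can still fail with EXDEV
    if they are reached through different mount points."""
    return os.stat(a).st_dev == os.stat(b).st_dev

def try_link(src, dst):
    """Try to hard link `src` to `dst`. Return True on success and False
    if the kernel refuses with EXDEV (cross-device link); re-raise any
    other error."""
    try:
        os.link(src, dst)
    except OSError as e:
        if e.errno == errno.EXDEV:
            return False
        raise
    return True
```

Checking the rsnapshot source and destination paths this way shows
whether "cp -al" can be creating hardlinks at all, or whether it must be
doing something else.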

Clearly I haven't understood what you are trying to describe, so can
you please describe the problem in more detail and not assume I know
anything about where you are copying from/to, what the hardware or
filesystem layout is, etc.

I know I haven't answered your question and just fired back a bunch
of questions, but I need to know specifics to be able to have any
chance of understanding the problem you are having.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 3+ messages
2016-07-24 12:38 hardlinking and deleting milions of small files Arkadiusz Miśkiewicz
2016-07-24 12:48 ` Carlos E. R.
2016-07-25  0:23 ` Dave Chinner [this message]