public inbox for linux-kernel@vger.kernel.org
* XFS lstat() _very_ slow on SMP
@ 2005-05-16 16:25 Jan Kasprzak
  2005-05-18 14:02 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kasprzak @ 2005-05-16 16:25 UTC (permalink / raw)
  To: linux-kernel

	Hi all,

	I have a big XFS volume on my fileserver, and I have noticed that
making an incremental backup of this volume is _very_ slow. The incremental
backup essentially checks the mtime of every file on the volume, and each
lstat() takes ~4ms of _system_ time (i.e. no iowait or anything like that).

	The server is quad opteron with 26GB of RAM, 2.6.11.x kernel.
The volume in question is this big:

$ df -i /export/home
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/array-vg/home   2147502080 7917653 2139584427    1% /export/home
$ df -k /export/home
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/array-vg/home   2147371008 558965100 1588405908  27% /export/home

	As you can see, it has ~8 million files. At 4ms per file it takes
almost 9 hours of system time just to select the files to back up.

	I've tried to narrow down this problem in the following way: I created
a new logical volume, put an empty XFS filesystem on it, and created
16.7M files in a three-level directory structure (256 subdirectories at
each level, 256 files in each leaf directory). It took about a day of
_system_ time to create this structure, and another day to run
"find /mnt1 -type f -mtime +1000 -print" on it. An strace -c of this
find command looks like this:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.97 68713.490033        4048  16974597           lstat
[...]

	So find(1) spent almost all of its time in the lstat() syscall,
and each call took about 4ms of system time. Real time was almost the
same as system time, so the disk subsystem was not the bottleneck here.

	When I tried to umount this test volume, umount(8) did not
finish even after 5 minutes; it was looping somewhere in the kernel
(according to top(1) it used 25% of CPU time, i.e. one CPU out of four).
In addition, all other filesystem-related processes started to lock up
in the kernel, possibly waiting for some lock held by the umount command.
I had to reset the server after 5 minutes.

	I tried to run the same test on a single-CPU Athlon FX-51 with
512M RAM, and only 128*128*128 (2M) files, and got the following
numbers:

      create                     find -type f -mtime +1000    cost of lstat()
XFS   10m15s real, 4m22s system  7m30s real, 3m17s system     107us
ext3  1m10s real, 0m30s system   1m26s real, 0m10s system       9us

So XFS lstat() is almost 12 times slower than the ext3 one, even on a
single-CPU x86_64. I tried booting an SMP kernel as well, but there was no
measurable difference in speed. So XFS is slow even on this box, but going
4-way SMP makes the problem two orders of magnitude worse.

	The problem is definitely in XFS on SMP (or perhaps in filling up
the dentry/inode cache, since the cost of lstat() on smaller trees is
not nearly as big).

	I will do more benchmarks if you want (I can test it
on SMP x86 as well, for example).

-Yenya

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
-- Yes. CVS is much denser.                                               --
-- CVS is also total crap. So your point is?             --Linus Torvalds --

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS lstat() _very_ slow on SMP
  2005-05-16 16:25 XFS lstat() _very_ slow on SMP Jan Kasprzak
@ 2005-05-18 14:02 ` Christoph Hellwig
  2005-05-18 17:45   ` Jan Kasprzak
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2005-05-18 14:02 UTC (permalink / raw)
  To: Jan Kasprzak; +Cc: linux-kernel

On Mon, May 16, 2005 at 06:25:06PM +0200, Jan Kasprzak wrote:
> 	Hi all,
> 
> 	I have a big XFS volume on my fileserver, and I have noticed that
> making an incremental backup of this volume is _very_ slow. The incremental
> backup essentially checks mtime of all files on this volume, and it
> takes ~4ms of _system_ time (i.e. no iowait or what) to do a lstat().

Thanks a lot for the report, I'll investigate what's going on once I get
a little time.  (Early next week I hope)



* Re: XFS lstat() _very_ slow on SMP
  2005-05-18 14:02 ` Christoph Hellwig
@ 2005-05-18 17:45   ` Jan Kasprzak
  2005-05-28  9:11     ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kasprzak @ 2005-05-18 17:45 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2920 bytes --]

Christoph Hellwig wrote:
: On Mon, May 16, 2005 at 06:25:06PM +0200, Jan Kasprzak wrote:
: > 	Hi all,
: > 
: > 	I have a big XFS volume on my fileserver, and I have noticed that
: > making an incremental backup of this volume is _very_ slow. The incremental
: > backup essentially checks mtime of all files on this volume, and it
: > takes ~4ms of _system_ time (i.e. no iowait or what) to do a lstat().
: 
: Thanks a lot for the report, I'll investigate what's going on once I get
: a little time.  (Early next week I hope)

	Hmm, I feel like I am hunting ghosts - after a fresh reboot
of the 4-CPU server I did four runs of 128*128*128 files with various
sizes of the underlying filesystem (in order to eliminate the volume
size as a problematic factor). I've got the following numbers:

Volume size   create time           find -mtime +1000     cost of lstat()
  5GB         55m77 real 52m51 sys  1m1 real 0m53 sys       19 usecs
 25GB         58m15 real 55m27 sys  83m47 real 82m15 sys  2171 usecs (!!!!!!)
125GB         67m0 real 61m35 sys   0m55 real 0m48 sys      18 usecs
625GB         68m30 real 62m38 sys  0m57 real 0m49 sys      18 usecs

	So the results probably do not depend on the volume size,
but on something essentially random (such as which CPU the command
ends up running on), or on the system uptime (and the implied
fragmentation of memory).

	I tried to re-run the same test the next day (i.e. on the
server with a longer uptime), but the server crashed: my test script
ended up locked somewhere in the kernel (probably holding some locks),
and then other processes started to lock up when accessing the file
system (my top(1) kept running OK, but when I tried to "touch newfile"
in another shell, it locked up as well).  So I had to reset the server
again.

	I am not really sure where exactly the problem is. I think
it is related to XFS, the big memory of this server (26 GB), four CPUs,
and maybe even the x86_64 architecture. I was not able to reproduce
the problem on the same hardware using ext3, and the problem is also
an order of magnitude smaller on a 2-way system with 4GB of RAM. Maybe
I should try to reproduce this on our Altix box to rule out x86_64 as
a possible source of the problem.

	I used the attached "bigtree.pl" to create the directory structure
("time ./bigtree.pl /new-volume 3 128" for 128*128*128 files), and then ran
"strace -c find /new-volume -type f -mtime +1000 -print" (the numbers
without strace are almost the same, so strace is not the problem here).

	Thanks,

-Yenya

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
-- Yes. CVS is much denser.                                               --
-- CVS is also total crap. So your point is?             --Linus Torvalds --

[-- Attachment #2: bigtree.pl --]
[-- Type: text/plain, Size: 651 bytes --]

#!/usr/bin/perl -w

use strict;

if ($#ARGV != 2) {
	print STDERR "Usage: $0 prefix depth width\n\t$0 /mnt1 2 256\n";
	exit 1;
}

my ($prefix, $maxdepth, $files) = @ARGV;
my $print = 10_000;
my $count = 0;
my $total = 0;

sub depth($$);

sub depth($$) {
	my ($depth, $prefix) = @_;
	for my $i (0..$files-1) {
		my $p = $prefix . '/' . $i;
		if ($depth < $maxdepth) {
			# Inner level: create a subdirectory and recurse.
			mkdir $p or die "mkdir $p: $!";
			depth($depth+1, $p);
		} else {
			# Leaf level: create an empty file.
			open F, ">$p" or die "open $p: $!";
			close F;
			# Report progress every $print files on a terminal.
			unless (++$count % $print) {
				print STDERR "$count/$total files created\n"
					if -t STDERR;
			}
		}
	}
}

$total = $files**$maxdepth;
depth (1, $prefix);


* Re: XFS lstat() _very_ slow on SMP
  2005-05-18 17:45   ` Jan Kasprzak
@ 2005-05-28  9:11     ` Christoph Hellwig
  2005-05-29 18:37       ` Jan Kasprzak
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2005-05-28  9:11 UTC (permalink / raw)
  To: Jan Kasprzak; +Cc: linux-kernel, linux-xfs

On Wed, May 18, 2005 at 07:45:30PM +0200, Jan Kasprzak wrote:
> Christoph Hellwig wrote:
> : On Mon, May 16, 2005 at 06:25:06PM +0200, Jan Kasprzak wrote:
> : > 	Hi all,
> : > 
> : > 	I have a big XFS volume on my fileserver, and I have noticed that
> : > making an incremental backup of this volume is _very_ slow. The incremental
> : > backup essentially checks mtime of all files on this volume, and it
> : > takes ~4ms of _system_ time (i.e. no iowait or what) to do a lstat().
> : 
> : Thanks a lot for the report, I'll investigate what's going on once I get
> : a little time.  (Early next week I hope)
> 
> 	Hmm, I feel like I am hunting ghosts - after a fresh reboot
> of the 4-CPU server I did four runs of 128*128*128 files with various
> sizes of the underlying filesystem (in order to eliminate the volume
> size as a problematic factor). I've got the following numbers:
> 
> Volume size   create time           find -mtime +1000     cost of lstat()
>   5GB         55m77 real 52m51 sys  1m1 real 0m53 sys       19 usecs
>  25GB         58m15 real 55m27 sys  83m47 real 82m15 sys  2171 usecs (!!!!!!)
> 125GB         67m0 real 61m35 sys   0m55 real 0m48 sys      18 usecs
> 625GB         68m30 real 62m38 sys  0m57 real 0m49 sys      18 usecs
> 
> 	So the results are probably not dependent on the volume size,
> but on something totally random (such as which cpu the command
> ends up running on or something like that), or on the system uptime
> (and implied fragmentation of memory or what).
> 
> 	I've tried to re-run the same test the next day (i.e. on
> server with longer uptime), but the server crashed - my test script
> ended locked up somewhere in kernel (probably holding some locks),
> and then other processes started to lock up after accessing the file
> system (my top(1) was running OK, but when I tried to "touch newfile"
> in another shell, it locked up as well).  So I had to reset this server
> again.
> 
> 	I am not really sure where exactly the problem is. I think
> it is related to XFS, big memory of this server (26 GB), four CPUs,
> and maybe even the x86_64 architecture. I was not able to reproduce
> the problem on the same HW using ext3fs, and the problem is also
> a magnitude smaller on 2-way system with 4GB of RAM. Maybe I should
> try to reproduce this on our Altix box to eliminate the x86_64 as the
> possible source of problems.
> 
> 	I use the attached "bigtree.pl" to create the directory structure
> ("time ./bigtree.pl /new-volume 3 128" for 128*128*128 files), and then
> "strace -c find /new-volume -type f -mtime +1000 -print" (the numbers
> without strace are almost the same, so strace is not a problem here).

I couldn't reproduce the odd case here.  Could you try to get some profiling
data with oprofile for the odd and one of the normal cases?


* Re: XFS lstat() _very_ slow on SMP
  2005-05-28  9:11     ` Christoph Hellwig
@ 2005-05-29 18:37       ` Jan Kasprzak
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Kasprzak @ 2005-05-29 18:37 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel, linux-xfs

Christoph Hellwig wrote:
: > ("time ./bigtree.pl /new-volume 3 128" for 128*128*128 files), and then
: > "strace -c find /new-volume -type f -mtime +1000 -print" (the numbers
: > without strace are almost the same, so strace is not a problem here).
: 
: I couldn't reproduce the odd case here.  Could you try to get some profiling
: data with oprofile for the odd and one of the normal cases?

	I have solved this by adding "ihashsize=65537" to the mount
options in /etc/fstab. Now my find(1) is limited by disk speed instead
of system time.
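
For reference, the corresponding /etc/fstab line might look like the
following (device and mount point taken from the df output earlier in
the thread; ihashsize sets the size of XFS's in-core inode hash table,
and the option was deprecated in later kernels, so check mount(8) and
the XFS documentation for your version):

```
/dev/array-vg/home  /export/home  xfs  ihashsize=65537  0 0
```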

-Y.
-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
-- Yes. CVS is much denser.                                               --
-- CVS is also total crap. So your point is?             --Linus Torvalds --

