public inbox for linux-xfs@vger.kernel.org
* is xfs good if I have millions of files and thousands of hardlinks?
@ 2008-02-19 13:53 Tomasz Chmielewski
  2008-02-19 21:09 ` Peter Grandi
  2008-02-19 22:30 ` Mark Goodwin
  0 siblings, 2 replies; 5+ messages in thread
From: Tomasz Chmielewski @ 2008-02-19 13:53 UTC
  To: xfs

I have an ext3 filesystem with almost 200 million files (1.2 TB fs, ~65% 
full); most of the files are hardlinked multiple times, some of them are 
hardlinked thousands of times.

I described my problem yesterday on the linux-fsdevel list:

http://marc.info/?t=120333985100003


In general, because new files and hardlinks are being added all the time 
and the old ones are being removed, this leads to very, very poor 
performance.

When I want to remove a lot of directories/files (which will be 
hardlinks, mostly), I see disk write speed is down to
50 kB/s - 200 kB/s (fifty - two hundred kilobytes/s) - this is the 
"bandwidth" used during the deletion.


Also, the filesystem is very fragmented ("dd if=/dev/zero of=some_file 
bs=64k" writes only about 1 MB/s).
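
The dd test above can be approximated in a few lines of Python. This is only a minimal sketch: the function name `sequential_write_rate` and its default sizes are assumptions made for illustration, not part of the original measurement. It writes zeros in 64 KiB blocks and calls fsync before stopping the clock, so the figure reflects the disk rather than the page cache.

```python
import os
import time


def sequential_write_rate(path, total_bytes=64 * 1024 * 1024, block_size=64 * 1024):
    """Write total_bytes of zeros to path in block_size chunks.

    Returns the observed rate in bytes per second. The name and the
    defaults are illustrative assumptions, roughly mirroring
    "dd if=/dev/zero of=some_file bs=64k" with a fixed size.
    """
    block = bytes(block_size)  # a buffer of zero bytes, like /dev/zero
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            # The last chunk may be shorter than block_size.
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # wait until the data has reached the device
    elapsed = time.monotonic() - start
    return written / elapsed if elapsed > 0 else float("inf")
```

Calling `sequential_write_rate("some_file") / 1e6` gives a figure in MB/s that can be compared between the fragmented filesystem and a fresh one on the same device.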


Will xfs handle a large number of files, including lots of hardlinks, 
any better than ext3?



-- 
Tomasz Chmielewski
http://wpkg.org


* Re: is xfs good if I have millions of files and thousands of hardlinks?
  2008-02-19 13:53 is xfs good if I have millions of files and thousands of hardlinks? Tomasz Chmielewski
@ 2008-02-19 21:09 ` Peter Grandi
  2008-02-19 22:30 ` Mark Goodwin
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Grandi @ 2008-02-19 21:09 UTC
  To: Linux XFS

>>> On Tue, 19 Feb 2008 14:53:57 +0100, Tomasz Chmielewski
>>> <mangoo@wpkg.org> said:

mangoo> I have an ext3 filesystem with almost 200 million files
mangoo> (1.2 TB fs, ~65% full); most of the files are hardlinked
mangoo> multiple times, some of them are hardlinked thousands of
mangoo> times.

Lucky numbers! :-)

mangoo> In general, because new files and hardlinks are being
mangoo> added all the time and the old ones are being removed,
mangoo> this leads to very, very poor performance.

That is not the cause of the poor performance. The ultimate
cause is rather different.

mangoo> When I want to remove a lot of directories/files (which
mangoo> will be hardlinks, mostly), I see disk write speed is
mangoo> down to 50 kB/s - 200 kB/s (fifty - two hundred
mangoo> kilobytes/s) - this is the "bandwidth" used during the
mangoo> deletion.

How is bandwidth relevant for that? Granted, the word is in
quotes, but it seems very, very strange regardless.

mangoo> Also, the filesystem is very fragmented ("dd
mangoo> if=/dev/zero of=some_file bs=64k" writes only about 1
mangoo> MB/s).

Then the more the merrier.

mangoo> Will xfs handle a large number of files, including lots
mangoo> of hardlinks, any better than ext3?

It shows consideration to consult the archives of a mailing list
before asking a question. It may be a good idea to do it even
after posting a question :-).


* Re: is xfs good if I have millions of files and thousands of hardlinks?
  2008-02-19 13:53 is xfs good if I have millions of files and thousands of hardlinks? Tomasz Chmielewski
  2008-02-19 21:09 ` Peter Grandi
@ 2008-02-19 22:30 ` Mark Goodwin
  2008-02-19 22:43   ` Tomasz Chmielewski
  2008-02-20  9:56   ` Tomasz Chmielewski
  1 sibling, 2 replies; 5+ messages in thread
From: Mark Goodwin @ 2008-02-19 22:30 UTC
  To: Tomasz Chmielewski; +Cc: xfs



Tomasz Chmielewski wrote:
> I have an ext3 filesystem with almost 200 million files (1.2 TB fs, ~65% 
> full); most of the files are hardlinked multiple times, some of them are 
> hardlinked thousands of times.
> 
> I described my problem yesterday on the linux-fsdevel list:
> 
> http://marc.info/?t=120333985100003

quite a long discussion there .. I haven't read it .. but some comments
below anyway ..

> In general, because new files and hardlinks are being added all the time 
> and the old ones are being removed, this leads to very, very poor 
> performance.
> 
> When I want to remove a lot of directories/files (which will be 
> hardlinks, mostly), I see disk write speed is down to
> 50 kB/s - 200 kB/s (fifty - two hundred kilobytes/s) - this is the 
> "bandwidth" used during the deletion.
> 
> 
> Also, the filesystem is very fragmented ("dd if=/dev/zero of=some_file 
> bs=64k" writes only about 1 MB/s).
> 
> Will xfs handle a large number of files, including lots of hardlinks, 
> any better than ext3?

defragmenting by copying from the ext3 filesystem to a new filesystem
should help, for a while at least. Whether xfs would have an on-going
performance problem compared to ext3 depends on your usage patterns ..
does "all the time" mean you are continuously adding new files and links
and removing files at a high rate/second? Are multiple threads doing this?
Are all the files the same size? Has the block size been tuned?

-- Mark


* Re: is xfs good if I have millions of files and thousands of hardlinks?
  2008-02-19 22:30 ` Mark Goodwin
@ 2008-02-19 22:43   ` Tomasz Chmielewski
  2008-02-20  9:56   ` Tomasz Chmielewski
  1 sibling, 0 replies; 5+ messages in thread
From: Tomasz Chmielewski @ 2008-02-19 22:43 UTC
  To: markgw; +Cc: xfs

Mark Goodwin wrote:

> defragmenting by copying from the ext3 filesystem to a new filesystem
> should help, for a while at least. Whether xfs would have an on-going
> performance problem compared to ext3 depends on your usage patterns ..
> does "all the time" mean you are continuously adding new files and links
> and removing files at a high rate/second? Are multiple threads doing this?

Yes. Multiple threads adding new files (or hardlinks, if there are such 
files already) all the time (24h/day).
Normally, there is only one thread removing the files. Because of the 
performance problem I described, it also runs 24h/day - it just 
can't finish removing the unneeded files in a couple of hours, or even 
in a whole day.


> Are all the files the same size? Has the block size been tuned?

No, the files are of mostly random sizes: the stuff you will normally 
find in any rootfs, home, etc. directory.
It's a backup system which uses hardlinks so that files which are 
already in the backup do not take up additional space.
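
One common way to get this behaviour is a content-addressed pool, sketched below. This is an assumption for illustration only; the thread does not say how the backup system actually decides that a file is "already in backup". The names `file_digest`, `backup_file` and `pool_dir` are hypothetical. Each distinct file content is stored once in the pool, and every backup entry is a hardlink to it, which is why link counts can climb into the thousands.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_file(src, dest, pool_dir):
    """Store src at dest, sharing identical content through hardlinks.

    Hypothetical helper: pool_dir and dest must be on the same
    filesystem, because hardlinks cannot cross filesystems.
    Raises FileExistsError if dest already exists.
    """
    os.makedirs(pool_dir, exist_ok=True)
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    pooled = os.path.join(pool_dir, file_digest(src))
    if not os.path.exists(pooled):
        # First time this content is seen: make the one real copy.
        shutil.copy2(src, pooled)
    # Every backup entry is just another name for the pooled inode.
    os.link(pooled, dest)
```

With this layout, removing an old backup is almost entirely metadata work: each unlink touches a directory block and an inode, which fits the pattern of many small, scattered writes rather than bulk data transfer.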

I didn't do any block-size tuning, as I don't really know where to start.


-- 
Tomasz Chmielewski
http://wpkg.org


* Re: is xfs good if I have millions of files and thousands of hardlinks?
  2008-02-19 22:30 ` Mark Goodwin
  2008-02-19 22:43   ` Tomasz Chmielewski
@ 2008-02-20  9:56   ` Tomasz Chmielewski
  1 sibling, 0 replies; 5+ messages in thread
From: Tomasz Chmielewski @ 2008-02-20  9:56 UTC
  To: markgw; +Cc: xfs

Peter Grandi wrote:

> mangoo> In general, because new files and hardlinks are being
> mangoo> added all the time and the old ones are being removed,
> mangoo> this leads to very, very poor performance.
> 
> That is not the cause of the poor performance. The ultimate
> cause is rather different.

Well, adding new files and hardlinks all the time means that the 
inodes end up scattered all over the disk.


> mangoo> When I want to remove a lot of directories/files (which
> mangoo> will be hardlinks, mostly), I see disk write speed is
> mangoo> down to 50 kB/s - 200 kB/s (fifty - two hundred
> mangoo> kilobytes/s) - this is the "bandwidth" used during the
> mangoo> deletion.
> 
> How is bandwidth relevant for that? Granted, the word is in
> quotes, but it seems very, very strange regardless.

The filesystem is available via iSCSI, so it's easy to measure the 
current performance. But iSCSI is not a problem here - performance is 
very good on an empty filesystem on that very same iSCSI/SAN device.

What I mean is that when I remove a large number of files, the bandwidth 
used for writing to the disk drops to only 50-200 kB/s. Down from 
what, one might ask? Let me paste here yet another quotation from the 
linux-fsdevel list; it may shed some more light:

   Recently I began removing some of the unneeded files (or hardlinks)
   and, to my surprise, it takes longer than I initially expected.

   After the cache is emptied (echo 3 > /proc/sys/vm/drop_caches) I can
   usually remove about 50000-200000 files with moderate performance.
   I see up to 5000 kB read/write from/to the disk, wa reported by top
   is usually 20-70%.

   After that, waiting for IO grows to 99%, and disk write speed is down
   to 50 kB/s - 200 kB/s (fifty - two hundred kilobytes/s).
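
To see where the slowdown sets in, the removal can be instrumented to report files removed per second in batches. The following is a rough sketch under stated assumptions: `remove_tree_timed` and `report_every` are hypothetical names, and it measures unlink rate rather than the disk bandwidth quoted above.

```python
import os
import time


def remove_tree_timed(root, report_every=10000):
    """Remove everything under root, bottom-up, timing each batch.

    Returns (files_removed, samples), where each sample is a tuple
    (files_removed_so_far, files_per_second_for_that_batch).
    Hypothetical helper for illustration only.
    """
    samples = []
    removed = 0
    batch_start = time.monotonic()
    # topdown=False visits children before parents, so every directory
    # is already empty by the time it is removed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
            removed += 1
            if removed % report_every == 0:
                now = time.monotonic()
                elapsed = now - batch_start
                rate = report_every / elapsed if elapsed > 0 else float("inf")
                samples.append((removed, rate))
                batch_start = now
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                os.unlink(path)  # symlink to a directory: remove the link itself
            else:
                os.rmdir(path)
    os.rmdir(root)
    return removed, samples
```

If the behaviour described above holds, the first few samples after dropping the caches should show a moderate rate, followed by a sharp drop once the inode and directory blocks being touched no longer fit in memory.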


> mangoo> Also, the filesystem is very fragmented ("dd
> mangoo> if=/dev/zero of=some_file bs=64k" writes only about 1
> mangoo> MB/s).
> 
> Then the more the merrier.

Umm, no.
Usually, one is merrier when these numbers are high, not low ;)


> mangoo> Will xfs handle a large number of files, including lots
> mangoo> of hardlinks, any better than ext3?
> 
> It shows consideration to consult the archives of a mailing list
> before asking a question. It may be a good idea to do it even
> after posting a question :-).

Oh, I did consult the archive. There are not many posts about hardlinks 
here on this xfs list (or, at least I didn't find many).

There was even a similar subject last year: someone had a 17 TB array 
used for backup, which was getting full, and asked if xfs is or will be 
capable of transparent compression.
As xfs will not have transparent compression in the foreseeable future, it 
was suggested to him that he should use hardlinks - that alone could 
save him lots of space.

I wonder if the guy uses hardlinks now and, if so, how xfs behaves 
on this 17 TB array (my filesystem is just 1.2 TB, but I'm about 
to create a bigger one on another device soon - hence my questions).



-- 
Tomasz Chmielewski
http://wpkg.org
