public inbox for linux-xfs@vger.kernel.org
* XFS unlink still slow on 3.1.9 kernel ?
@ 2012-02-13 16:57 Richard Ems
  2012-02-13 17:08 ` Christoph Hellwig
  2012-02-14  0:09 ` Dave Chinner
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-13 16:57 UTC (permalink / raw)
  To: xfs

Hello list !

I ran a "find dir" on one directory with 11 million files and dirs in it
and it took 100 minutes. Is this a "normal" run time to be expected?
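
(The exact invocation, from the timed runs quoted later in this thread,
was essentially:

# time find 2012-02-13/ | wc -l

with the path naming the day's backup tree.)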

I am running openSUSE 12.1, kernel 3.1.9-1.4-default. The 20 TB XFS
partition is 100% full and is on an external InforTrend RAID system with
24 x 1 TB SATA HDDs on RAID 6 with one hot-spare HDD, so 21 data disks
plus 2 parity disks plus 1 hot-spare disk. The case is connected through
SCSI.

The system was not running anything else on those disks, and the load on
the server was around 1 with only this one find command running.

I am asking because I am seeing very long times while removing big
directory trees. I thought on kernels above 3.0 removing dirs and files
had improved a lot, but I don't see that improvement.

This is a backup system running dirvish, so most files in the dirs I am
removing are hard links. Almost all of the files do have ACLs set.


# mount | grep xfs
/dev/sda1 on /backup/IFT type xfs
(rw,noatime,nodiratime,attr2,delaylog,nobarrier,logbufs=8,logbsize=256k,sunit=256,swidth=5376,noquota,_netdev)

Any thoughts?


Thanks, Richard



-- 
Richard Ems       mail: Richard.Ems@Cape-Horn-Eng.com

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
http://www.cape-horn-eng.com


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 16:57 XFS unlink still slow on 3.1.9 kernel ? Richard Ems
@ 2012-02-13 17:08 ` Christoph Hellwig
  2012-02-13 17:11   ` Richard Ems
  2012-02-14  0:09 ` Dave Chinner
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-13 17:08 UTC (permalink / raw)
  To: Richard Ems; +Cc: xfs

On Mon, Feb 13, 2012 at 05:57:58PM +0100, Richard Ems wrote:
> This is a backup system running dirvish, so most files in the dirs I am
> removing are hard links. Almost all of the files do have ACLs set.

How many ACLs do you usually have set?  If they aren't stored inline
but need to go out of the inode, unlinks will be extremely slow for
kernels before v3.2.
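
(For reference, one way to count the ACL entries on a sample file -
path illustrative:

# getfacl -c /backup/IFT/some/file | wc -l

getfacl -c omits the header comment, so the line count is the number of
ACL entries, including the base owner/group/other entries.)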


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 17:08 ` Christoph Hellwig
@ 2012-02-13 17:11   ` Richard Ems
  2012-02-13 17:15     ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Ems @ 2012-02-13 17:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 06:08 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 05:57:58PM +0100, Richard Ems wrote:
>> This is a backup system running dirvish, so most files in the dirs I am
>> removing are hard links. Almost all of the files do have ACLs set.
> 
> How many ACLs do you usually have set?  If they aren't stored inline
> but need to go out of the inode, unlinks will be extremely slow for
> kernels before v3.2.
> 

Almost all dirs and files there do have ACLs set.
Each of them has about 10 user ACLs and 10 default ACLs.
Is that too many?
Is this then the reason for being that slow?
Will updating to a kernel > 3.2 improve the unlink speed?

Many thanks,
Richard


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 17:11   ` Richard Ems
@ 2012-02-13 17:15     ` Christoph Hellwig
  2012-02-13 17:26       ` Richard Ems
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-13 17:15 UTC (permalink / raw)
  To: Richard Ems; +Cc: Christoph Hellwig, xfs

On Mon, Feb 13, 2012 at 06:11:30PM +0100, Richard Ems wrote:
> On 02/13/2012 06:08 PM, Christoph Hellwig wrote:
> > On Mon, Feb 13, 2012 at 05:57:58PM +0100, Richard Ems wrote:
> >> This is a backup system running dirvish, so most files in the dirs I am
> >> removing are hard links. Almost all of the files do have ACLs set.
> > 
> > How many ACLs do you usually have set?  If they aren't stored inline
> > but need to go out of the inode, unlinks will be extremely slow for
> > kernels before v3.2.
> > 
> 
> Almost all dirs and files there do have ACLs set.
> Each of them has about 10 user ACLs and 10 default ACLs.
> Is that too many?
> Is this then the reason for being that slow?

That doesn't sound like a lot to me, but instead of guessing around,
let's just check the actual facts.

Does "xfs_bmap -a" for the kind of files you are deleting show any
extents? If it doesn't the output will look like:

# xfs_bmap -a internal
internal: no extents

If it has any, it will look like:

# xfs_bmap -a external
external:
	0: [0..7]: 8557712..8557719
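
(To spot-check a batch of files at once, a sketch with an illustrative
path:

# find /backup/IFT/2012-02-13 -type f | head -n 100 |
  while read -r f; do xfs_bmap -a "$f"; done

Any file that prints extents rather than "no extents" has its
attributes out of line.)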


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 17:15     ` Christoph Hellwig
@ 2012-02-13 17:26       ` Richard Ems
  2012-02-13 17:29         ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Ems @ 2012-02-13 17:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 06:15 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 06:11:30PM +0100, Richard Ems wrote:
>> On 02/13/2012 06:08 PM, Christoph Hellwig wrote:
>>> On Mon, Feb 13, 2012 at 05:57:58PM +0100, Richard Ems wrote:
>>>> This is a backup system running dirvish, so most files in the dirs I am
>>>> removing are hard links. Almost all of the files do have ACLs set.
>>>
>>> How many ACLs do you usually have set?  If they aren't stored inline
>>> but need to go out of the inode, unlinks will be extremely slow for
>>> kernels before v3.2.
>>>
>>
>> Almost all dirs and files there do have ACLs set.
>> Each of them has about 10 user ACLs and 10 default ACLs.
>> Is that too many?
>> Is this then the reason for being that slow?
> 
> That doesn't sound like a lot to me, but instead of guessing around,
> let's just check the actual facts.
> 
> Does "xfs_bmap -a" for the kind of files you are deleting show any
> extents? If it doesn't the output will look like:
> 
> # xfs_bmap -a internal
> internal: no extents
> 
> If it has any, it will look like:
> 
> # xfs_bmap -a external
> external:
> 	0: [0..7]: 8557712..8557719
> 

YES. All files (and dirs) that I checked do show something like

0: [0..7]: 18531216..18531223

So, what improvements can I expect from a kernel > 3.2 ?
Can I read somewhere about the changes/patches introduced?
Is there another way to mount/create/mkfs the XFS to improve the unlink
time for this case?


Thanks again,
Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 17:26       ` Richard Ems
@ 2012-02-13 17:29         ` Christoph Hellwig
  2012-02-13 17:53           ` Richard Ems
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-13 17:29 UTC (permalink / raw)
  To: Richard Ems; +Cc: xfs

On Mon, Feb 13, 2012 at 06:26:46PM +0100, Richard Ems wrote:
> YES. All files (and dirs) that I checked do show something like
> 
> 0: [0..7]: 18531216..18531223
> 
> So, what improvements can I expect from a kernel > 3.2 ?
> Can I read somewhere about the changes/patches introduced?

On some crazy workloads I've seen speedups of up to a factor of 10,000 (5
orders of magnitude).  You probably won't get that much of a speedup,
but it will still be significant.

The patch in mainline for this is:

commit 859f57ca00805e6c482eef1a7ab073097d02c8ca
Author: Christoph Hellwig <hch@infradead.org>
Date:   Sat Aug 27 14:45:11 2011 +0000

    xfs: avoid synchronous transactions when deleting attr blocks

> Is there another way to mount/create/mkfs the XFS to improve the unlink
> time for this case?

Try increasing the inode size during filesystem creation using the
"-i size=512" option, or even "-i size=1024" if you still have
out-of-line attributes.  That should give you even bigger speedups
for this workload than the patch above.
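
(As a sketch - device name is illustrative - filesystem creation with
larger inodes looks like:

# mkfs.xfs -i size=512 /dev/sdX1

The inode size can only be chosen at mkfs time.)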


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 17:29         ` Christoph Hellwig
@ 2012-02-13 17:53           ` Richard Ems
  2012-02-13 18:02             ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Ems @ 2012-02-13 17:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 06:29 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 06:26:46PM +0100, Richard Ems wrote:
>> YES. All files (and dirs) that I checked do show something like
>>
>> 0: [0..7]: 18531216..18531223
>>
>> So, what improvements can I expect from a kernel > 3.2 ?
>> Can I read somewhere about the changes/patches introduced?
> 
> On some crazy workloads I've seen speedups of up to a factor of 10,000 (5
> orders of magnitude).  You probably won't get that much of a speedup,
> but it will still be significant.
> 
> The patch in mainline for this is:
> 
> commit 859f57ca00805e6c482eef1a7ab073097d02c8ca
> Author: Christoph Hellwig <hch@infradead.org>
> Date:   Sat Aug 27 14:45:11 2011 +0000
> 
>     xfs: avoid synchronous transactions when deleting attr blocks
> 
>> Is there another way to mount/create/mkfs the XFS to improve the unlink
>> time for this case?
> 
> Try increasing the inode size during filesystem creation using the
> "-i size=512" option, or even "-i size=1024" if you still have
> out-of-line attributes.  That should give you even bigger speedups
> for this workload than the patch above.
> 

Ok, Many thanks for this good info!

I will try to install a > 3.2 kernel and will create new XFS partitions
with "-i size=1024", since we use ACLs a lot for user access.
Is there a chance to change existing XFS partitions to "-i size=1024" ?
I already have 5 big partitions, all full of ACLs, on systems not
running kernels > 3.2 !

Many thanks again,
Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 17:53           ` Richard Ems
@ 2012-02-13 18:02             ` Christoph Hellwig
  2012-02-13 18:06               ` Richard Ems
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-13 18:02 UTC (permalink / raw)
  To: Richard Ems; +Cc: Christoph Hellwig, xfs

On Mon, Feb 13, 2012 at 06:53:20PM +0100, Richard Ems wrote:
> I will try to install a > 3.2 kernel and will create new XFS partitions
> with "-i size=1024", since we use ACLs a lot for user access.
> Is there a chance to change existing XFS partitions to "-i size=1024" ?

Unfortunately not.  Note that the speedups in 3.2 only matter for
out-of-line attributes - once you store the ACLs inside the inode, the
code that makes it dog slow in old kernels is never used.
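
(You can at least check what an existing filesystem was created with -
mount point illustrative:

# xfs_info /backup/IFT | grep isize

The isize= value in the meta-data line is the inode size; 256 bytes is
the old default.)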


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 18:02             ` Christoph Hellwig
@ 2012-02-13 18:06               ` Richard Ems
  2012-02-13 18:10                 ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Ems @ 2012-02-13 18:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 07:02 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 06:53:20PM +0100, Richard Ems wrote:
>> I will try to install a > 3.2 kernel and will create new XFS partitions
>> with "-i size=1024", since we use ACLs a lot for user access.
>> Is there a chance to change existing XFS partitions to "-i size=1024" ?
> 
> Unfortunately not.  Note that the speedups in 3.2 only matter for
> out-of-line attributes - once you store the ACLs inside the inode, the
> code that makes it dog slow in old kernels is never used.

Ok, I see, many thanks.

So

1. use kernels > 3.2, if XFS partition was not created using "-i size=1024"

2. if XFS partition was created using "-i size=1024", kernels < 3.2 will
also be fast while unlinking files with out of line attributes

Right?

Thanks,
Richard


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 18:06               ` Richard Ems
@ 2012-02-13 18:10                 ` Christoph Hellwig
  2012-02-13 18:18                   ` Richard Ems
  2012-02-13 18:48                   ` Richard Ems
  0 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-13 18:10 UTC (permalink / raw)
  To: Richard Ems; +Cc: Christoph Hellwig, xfs

On Mon, Feb 13, 2012 at 07:06:44PM +0100, Richard Ems wrote:
> So
> 
> 1. use kernels > 3.2, if XFS partition was not created using "-i size=1024"

Yes.

> 2. if XFS partition was created using "-i size=1024", kernels < 3.2 will
> also be fast while unlinking files with out of line attributes

Exactly.

For workloads like yours, creating the filesystem with large inodes
(512 byte inodes are probably enough, though) will be preferable if you
have a choice, as it should be even faster.


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 18:10                 ` Christoph Hellwig
@ 2012-02-13 18:18                   ` Richard Ems
  2012-02-13 18:48                   ` Richard Ems
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-13 18:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 07:10 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 07:06:44PM +0100, Richard Ems wrote:
>> So
>>
>> 1. use kernels > 3.2, if XFS partition was not created using "-i size=1024"
> 
> Yes.
> 
>> 2. if XFS partition was created using "-i size=1024", kernels < 3.2 will
>> also be fast while unlinking files with out of line attributes
> 
> Exactly.
> 
> For workloads like yours, creating the filesystem with large inodes
> (512 byte inodes are probably enough, though) will be preferable if you
> have a choice, as it should be even faster.

Ok, I see. I will do it on all new XFS partitions, since we heavily use
ACLs.

Many thanks,
Richard


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 18:10                 ` Christoph Hellwig
  2012-02-13 18:18                   ` Richard Ems
@ 2012-02-13 18:48                   ` Richard Ems
  2012-02-13 21:16                     ` Christoph Hellwig
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Ems @ 2012-02-13 18:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 07:10 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 07:06:44PM +0100, Richard Ems wrote:
>> So
>>
>> 1. use kernels > 3.2, if XFS partition was not created using "-i size=1024"
> 
> Yes.
> 
>> 2. if XFS partition was created using "-i size=1024", kernels < 3.2 will
>> also be fast while unlinking files with out of line attributes
> 
> Exactly.
> 
> For workloads like yours, creating the filesystem with large inodes
> (512 byte inodes are probably enough, though) will be preferable if you
> have a choice, as it should be even faster.
> 

I already updated to 3.2.4 and started the same "find dir" command again
that previously took 100 min to run. It has been running now for over 30
min ...

Should this "find" run time also improve ?
Or will only unlink run time improve ?
I expect both of them to change similarly, or not? Wrong assumption?

Do I have to mount the XFS partition with some new/old/special option?

Thanks again,
Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 18:48                   ` Richard Ems
@ 2012-02-13 21:16                     ` Christoph Hellwig
  2012-02-14  5:31                       ` Stan Hoeppner
                                         ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-13 21:16 UTC (permalink / raw)
  To: Richard Ems; +Cc: Christoph Hellwig, xfs

On Mon, Feb 13, 2012 at 07:48:58PM +0100, Richard Ems wrote:
> I already updated to 3.2.4 and started the same "find dir" command again
> that previously took 100 min to run. It has been running now for over 30
> min ...
> 
> Should this "find" run time also improve ?

No, not by that change anyway.

> Or will only unlink run time improve ?

Yes.

> Do I have to mount the XFS partition with some new/old/special option?

I'd have to look into it in more detail.  IIRC you said you're using
RAID6 which can be fairly nasty for small reads.  Did you use the
inode64 mount option on the filesystem?


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 16:57 XFS unlink still slow on 3.1.9 kernel ? Richard Ems
  2012-02-13 17:08 ` Christoph Hellwig
@ 2012-02-14  0:09 ` Dave Chinner
  2012-02-14 12:32   ` Richard Ems
  1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2012-02-14  0:09 UTC (permalink / raw)
  To: Richard Ems; +Cc: xfs

On Mon, Feb 13, 2012 at 05:57:58PM +0100, Richard Ems wrote:
> Hello list !
> 
> I ran a "find dir" on one directory with 11 million files and dirs in it
> and it took 100 minutes. Is this a "normal" run time to be expected?

It certainly can be, depending on how fragmented the directory is,
how sequential the inodes the directory references are, and how slow
the seek time of your disks is.

Just to put this in context, a directory with 11 million entries
with an average of 20 bytes per name results in roughly *350MB* of
directory data. That's likely to be fragmented into single 4k
blocks, so reading the entire directory contents will take you
something like 75,000 IOs.

You then have to randomly read each of those 11 million inodes. If we
assume a 50% cache hit rate (i.e. good!), we're effectively reading 16
inodes per IO.  That brings it down to about 680,000 IOs to read all
the inodes. So to read all the directory entries and inodes, you're
looking at about 750,000 IOs.

Given you have SATA drives, an average seek time of 5ms would be
pretty good. That gives 3,500,000ms of IO time to do all that IO.
That's just under an hour. Given that the IO is mostly serialised,
with CPU time between each IO, and that IO times will vary a bit, as
will cache hit rates, taking 100 minutes to run find across the
directory is about right for your given storage.
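
(A rough sanity check of that arithmetic, with rounded numbers:

# echo '750000 * 5 / 1000 / 60' | bc
62

i.e. ~750,000 IOs at ~5ms each is on the order of an hour of pure seek
time.)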

> I am running openSUSE 12.1, kernel 3.1.9-1.4-default. The 20 TB XFS
> partition is 100% full

Running filesystems to 100% full is always a bad idea - it causes
significant increases in fragmentation of both data and metadata
compared to a filesystem that doesn't get past ~90% full.

> and is on an external InforTrend RAID system with
> 24 x 1 TB SATA HDDs on RAID 6 with one hot-spare HDD, so 21 data discs
> plus 2 parity discs plus 1 hot-spare disc. The case is connected through
> SCSI.
> 
> The system was not running anything else on that discs and the load on
> the server was around 1 because of only this one find command running.
> 
> I am asking because I am seeing very long times while removing big
> directory trees. I thought on kernels above 3.0 removing dirs and files
> had improved a lot, but I don't see that improvement.

You won't if the directory traversal is seek bound and that is the
limiting factor for performance.

> This is a backup system running dirvish, so most files in the dirs I am
> removing are hard links. Almost all of the files do have ACLs set.

The unlink will have an extra IO to read per inode - the out-of-line
attribute block, so you've just added 11 million IOs to the 800,000
the traversal already takes to the unlink overhead. So it's going to
take roughly ten hours because the unlink is going to be read IO seek
bound....

Christoph's suggestion to use larger inodes to keep the attribute
data inline is a very good one - whenever you have a workload that
is attribute heavy you should use larger inodes to try to keep the
attributes in-line if possible. The down side is that increasing the
inode size increases the amount of IO required to read/write inodes,
though this typically isn't a huge penalty compared to the penalty
of out-of-line attributes.

Also, for large directories like this (millions of entries) you
should also consider using a larger directory block size (mkfs -n
size=xxxx option) as that can be scaled independently of the
filesystem block size. This will significantly decrease the amount
of IO and fragmentation large directories cause. Peak modification
performance of small directories will be reduced because larger
block size directories consume more CPU to process, but for large
directories performance will be significantly better as they will
spend much less time waiting for IO.
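
(Combining both suggestions into a single invocation - device name
illustrative:

# mkfs.xfs -i size=512 -n size=8192 /dev/sdX1

-i size sets the inode size, and -n size sets the directory block size,
independently of the 4k filesystem block size.)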

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 21:16                     ` Christoph Hellwig
@ 2012-02-14  5:31                       ` Stan Hoeppner
  2012-02-14  9:48                       ` Richard Ems
                                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Stan Hoeppner @ 2012-02-14  5:31 UTC (permalink / raw)
  To: xfs

On 2/13/2012 3:16 PM, Christoph Hellwig wrote:

> I'd have to look into it in more detail.  IIRC you said you're using
> RAID6 which can be fairly nasty for small reads.  Did you use the
> inode64 mount option on the filesystem?

On 2/13/2012 10:57 AM, Richard Ems wrote:

> # mount | grep xfs
> /dev/sda1 on /backup/IFT type xfs
> (rw,noatime,nodiratime,attr2,delaylog,nobarrier,logbufs=8,logbsize=256k,sunit=256,swidth=5376,noquota,_netdev)

With a 16TB+ XFS, 20TB here, isn't inode64 the default allocator?

[...]

> 20 TB XFS
> partition is 100% full

Does the fact the FS is 100% full make any difference here?


> The case is connected through SCSI.

Do you mean iSCSI?  Does the host on which you're running your "find
dir" command have a 1GbE or 10GbE connection to the InforTrend unit?
More than one connection using bonding or multipath?  Direct connected
or through a switch(es)?  What brand is the switch(es)?  Switch(es)
under heavy load?

If it's a single direct 1GbE connection it's possible you're running out
of host pipe bandwidth which is only ~100MB/s in each direction.  Check
iotop/iostat while running your command to see if you're peaking the
interface with either read or write bytes.  If either are at or above
100MB/s then your host pipe is full, thus this is a significant part of
your high run time problem.
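
(For example, sampling extended per-device statistics every 5 seconds -
device name illustrative:

# iostat -xm 5 /dev/sda

Watch rMB/s and wMB/s against the ~100MB/s ceiling, and %util for
saturation.)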

Also check the performance data in the management interface on the
InforTrend unit to see if you're hitting any limits there (if it has
such a feature).  RAID6 is numerically intensive and that particular
controller may not have the ASIC horsepower to keep up with the IOPS
workload you're throwing at it.

Lastly, please paste the exact command or script you refer to as "find
dir" which is generating the workload in question.

-- 
Stan


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 21:16                     ` Christoph Hellwig
  2012-02-14  5:31                       ` Stan Hoeppner
@ 2012-02-14  9:48                       ` Richard Ems
  2012-02-14 19:43                         ` Christoph Hellwig
  2012-02-14  9:49                       ` Richard Ems
                                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Richard Ems @ 2012-02-14  9:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Hi Christoph, hi list,

I don't have ONE dir with 11 million files in it; it's one dir with many
nested directories and a total of about 11 million files AND dirs! See
output below!

On 02/13/2012 10:16 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 07:48:58PM +0100, Richard Ems wrote:
>> I already updated to 3.2.4 and started the same "find dir" command again
>> that previously took 100 min to run. It has been running now for over 30
>> min ...
>>
>> Should this "find" run time also improve ?
> 
> No, not by that change anyway.

It didn't improve, 100 min again.



>> Or will only unlink run time improve ?
> 
> Yes.

rm took about 110 min.


>> Do I have to mount the XFS partition with some new/old/special option?
> 
> I'd have to look into it in more detail.  IIRC you said you're using
> RAID6 which can be fairly nasty for small reads.  Did you use the
> inode64 mount option on the filesystem?

No, I did not use it, but I was thinking about it and ran the script from
http://sandeen.net/misc/summarise_stat.pl and got as an example on /bin:

# /net/c3m/usr/local/software/XFS/summarise_stat.pl /bin/
      9  6.2% are scripts (shell, perl, whatever)
     65 44.8% don't use any stat() family calls at all
     61 42.1% use 32-bit stat() family interfaces only
      9  6.2% use 64-bit stat64() family interfaces only
      1  0.7% use both 32-bit and 64-bit stat() family interfaces

So I was not sure if I should use inode64 or not.



These are the times the runs took yesterday:

Mon Feb 13 19:14:07 CET 2012

+ wc -l
+ find 2012-02-13/
11377443

real    101m30.811s
user    0m17.365s
sys     1m4.632s


Mon Feb 13 20:55:38 CET 2012

+ wc -l
+ find 2012-02-13/ -type d
834591

real    103m52.686s
user    0m11.765s
sys     1m41.018s


+ wc -l
+ find 2012-02-13/ -type f
10539154

real    104m38.421s
user    0m19.905s
sys     1m47.551s


+ /bin/rm -i -rf 2012-02-13/

real    110m55.764s
user    0m13.401s
sys     4m3.115s

Tue Feb 14 02:15:05 CET 2012


Thanks again,
Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 21:16                     ` Christoph Hellwig
  2012-02-14  5:31                       ` Stan Hoeppner
  2012-02-14  9:48                       ` Richard Ems
@ 2012-02-14  9:49                       ` Richard Ems
  2012-02-14 10:54                       ` Richard Ems
  2012-02-14 11:44                       ` Richard Ems
  4 siblings, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-14  9:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 10:16 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 07:48:58PM +0100, Richard Ems wrote:
>> I already updated to 3.2.4 and started the same "find dir" command again
>> that previously took 100 min to run. It has been running now for over 30
>> min ...
>>
>> Should this "find" run time also improve ?
> 
> No, not by that change anyway.
> 
>> Or will only unlink run time improve ?
> 
> Yes.

I see no improvement at all.   :(



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 21:16                     ` Christoph Hellwig
                                         ` (2 preceding siblings ...)
  2012-02-14  9:49                       ` Richard Ems
@ 2012-02-14 10:54                       ` Richard Ems
  2012-02-14 11:44                       ` Richard Ems
  4 siblings, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-14 10:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 10:16 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 07:48:58PM +0100, Richard Ems wrote:
>> I already updated to 3.2.4 and started the same "find dir" command again
>> that previously took 100 min to run. It has been running now for over 30
>> min ...
>>
>> Should this "find" run time also improve ?
> 
> No, not by that change anyway.
> 
>> Or will only unlink run time improve ?
> 
> Yes.

I ran a rm on a smaller dir, containing 9225 dirs and 425659 files and
it took ~37 sec on 3.1.9 and ~20 sec on 3.2.4, so I do see a good
improvement there.

Thanks,
Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-13 21:16                     ` Christoph Hellwig
                                         ` (3 preceding siblings ...)
  2012-02-14 10:54                       ` Richard Ems
@ 2012-02-14 11:44                       ` Richard Ems
  4 siblings, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-14 11:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 02/13/2012 10:16 PM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2012 at 07:48:58PM +0100, Richard Ems wrote:
>> I already updated to 3.2.4 and started the same "find dir" command again
>> that previously took 100 min to run. It has been running now for over 30
>> min ...
>>
>> Should this "find" run time also improve ?
> 
> No, not by that change anyway.
> 
>> Or will only unlink run time improve ?
> 
> Yes.

A second test, removing 37981 directories with 312674 files, again
all with ACLs set, went from ~46 sec on 3.1.9 to ~13 sec on 3.2.4, so
the improvement is clearly there with 3.2.4 !

Richard


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14  0:09 ` Dave Chinner
@ 2012-02-14 12:32   ` Richard Ems
  2012-02-14 19:45     ` Christoph Hellwig
  2012-02-15  1:27     ` Dave Chinner
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-14 12:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave, hi list,

first thanks for the very detailed reply. Please find below my comments
and questions.

On 02/14/2012 01:09 AM, Dave Chinner wrote:
> On Mon, Feb 13, 2012 at 05:57:58PM +0100, Richard Ems wrote:
>> I am running openSUSE 12.1, kernel 3.1.9-1.4-default. The 20 TB XFS
>> partition is 100% full
> 
> Running filesystems to 100% full is always a bad idea - it causes
> significant increases in fragmentation of both data and metadata
> compared to a filesystem that doesn't get past ~90% full.

Yes, true, I know. But I have no other free space for these backups. I
am waiting for a new, already ordered system that will have 4 times
this space. Later I will open a new thread asking whether my thoughts
on creating this new 80 TB XFS partition are right.



>> I am asking because I am seeing very long times while removing big
>> directory trees. I thought on kernels above 3.0 removing dirs and files
>> had improved a lot, but I don't see that improvement.
> 
> You won't if the directory traversal is seek bound and that is the
> limiting factor for performance.

*Seek bound*? *When* is the directory traversal *seek bound*?


>> This is a backup system running dirvish, so most files in the dirs I am
>> removing are hard links. Almost all of the files do have ACLs set.
> 
> The unlink will have an extra IO to read per inode - the out-of-line
> attribute block, so you've just added 11 million IOs to the 800,000
> the traversal already takes to the unlink overhead. So it's going to
> take roughly ten hours because the unlink is going to be read IO seek
> bound....

It took 110 minutes and not 10 hours. All files and dirs there had ACLs set.


> Christoph's suggestion to use larger inodes to keep the attribute
> data inline is a very good one - whenever you have a workload that
> is attribute heavy you should use larger inodes to try to keep the
> attributes in-line if possible. The down side is that increasing the
> inode size increases the amount of IO required to read/write inodes,
> though this typically isn't a huge penalty compared to the penalty
> of out-of-line attributes.

I will always use larger inodes from now on, since we use ACLs heavily
on our XFS partitions.


> Also, for large directories like this (millions of entries) you
> should also consider using a larger directory block size (mkfs -n
> size=xxxx option) as that can be scaled independently of the
> filesystem block size. This will significantly decrease the amount
> of IO and fragmentation large directories cause. Peak modification
> performance of small directories will be reduced because larger
> block size directories consume more CPU to process, but for large
> directories performance will be significantly better as they will
> spend much less time waiting for IO.

This was not ONE directory with that many files, but a directory
containing 834591 subdirectories (deeply nested, not all in the same
dir!) and 10539154 files.

Many thanks,
Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
@ 2012-02-14 13:02 Richard Ems
       [not found] ` <4F3AA191.9030606@mnsu.edu>
  2012-02-14 23:10 ` Stan Hoeppner
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-14 13:02 UTC (permalink / raw)
  To: xfs, Stan Hoeppner

Hi Stan,

I was not subscribed to the list so I did not get your mail. I
subscribed now, but apparently an admin has to approve the subscription,
since I am not getting any confirmation email, so this will take some time.

> On 2/13/2012 10:57 AM, Richard Ems wrote:
>> The case is connected through SCSI.
> 
> Do you mean iSCSI?  Does the host on which you're running your "find
> dir" command have a 1GbE or 10GbE connection to the InforTrend unit?
> More than one connection using bonding or multipath?  Direct connected
> or through a switch(es)?  What brand is the switch(es)?  Switch(es)
> under heavy load?

NO. SCSI. Not iSCSI. No switches; it's direct-attached through an LSI
SCSI controller:

lspci shows:

LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI



> If it's a single direct 1GbE connection it's possible you're running out
> of host pipe bandwidth which is only ~100MB/s in each direction.  Check
> iotop/iostat while running your command to see if you're peaking the
> interface with either read or write bytes.  If either are at or above
> 100MB/s then your host pipe is full, thus this is a significant part of
> your high run time problem.

So this does not apply then, right?


> Also check the performance data in the management interface on the
> InforTrend unit to see if you're hitting any limits there (if it has
> such a feature).  RAID6 is numerically intensive and that particular
> controller may not have the ASIC horsepower to keep up with the IOPS
> workload you're throwing at it.

There is no performance data in the management interface to check.


> Lastly, please paste the exact command or script you refer to as "find
> dir" which is generating the workload in question.

See previous answers to the list. You will see all commands there.

Thanks,
Richard




* Re: XFS unlink still slow on 3.1.9 kernel ?
       [not found] ` <4F3AA191.9030606@mnsu.edu>
@ 2012-02-14 18:12   ` Richard Ems
  2012-02-14 19:07     ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Ems @ 2012-02-14 18:12 UTC (permalink / raw)
  To: Jeffrey Hundstad; +Cc: xfs

Hi Jeffrey,


On 02/14/2012 07:01 PM, Jeffrey Hundstad wrote:
> Richard,
> 
> Someone asked if you used inode64.  I didn't see a response that you
> did.  Inode64 is a mount option.  I bet this will help with your
> problem.  It appears that all the inodes will be (by default, without
> the inode64 option) in the first 1TB of disk.  This could cause a LOT of
> seeks.  BTW: the option by itself will not help.  You'll need to
> save/restore to have this help.  However, I suspect over time it will
> help as old files are replaced by new ones.
> 
> For example:
> mount -o inode64 /dev/sda1 /home/
> 
> Here's some documentation:
> 
> mount(8):  inode64
> Indicates that XFS is allowed to create inodes at any location in the
> filesystem, including those which will result in inode numbers occupying
> more than 32 bits of significance.  This is provided for backwards
> compatibility, but causes problems for backup applications that cannot
> handle large inode numbers.
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_inode64_mount_option_for.3F
> Q: What is the inode64 mount option for?
> 
> By default, with 32bit inodes, XFS places inodes only in the first 1TB
> of a disk. If you have a disk with 100TB, all inodes will be stuck in
> the first TB. This can lead to strange things like "disk full" when you
> still have plenty space free, but there's no more place in the first TB
> to create a new inode. Also, performance sucks.
> 
> To come around this, use the inode64 mount options for filesystems >1TB.
> Inodes will then be placed in the location where their data is,
> minimizing disk seeks.

What about those programs using only 32-bit stat() ?

> 
> Beware that some old programs might have problems reading 64bit inodes,
> especially over NFS. Your editor used inode64 for over a year with
> recent (openSUSE 11.1 and higher) distributions using NFS and Samba
> without any corruptions, so that might be a recent enough distro.
> 

Yes, I replied to Christoph's question stating that I am not using
inode64. My reply was:

"
No, I did not use it, but I was thinking about it and ran the script from
http://sandeen.net/misc/summarise_stat.pl and got as an example on /bin:

# /net/c3m/usr/local/software/XFS/summarise_stat.pl /bin/
      9  6.2% are scripts (shell, perl, whatever)
     65 44.8% don't use any stat() family calls at all
     61 42.1% use 32-bit stat() family interfaces only
      9  6.2% use 64-bit stat64() family interfaces only
      1  0.7% use both 32-bit and 64-bit stat() family interfaces

So I was not sure if I should use inode64 or not.
"

Thanks, Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14 18:12   ` Richard Ems
@ 2012-02-14 19:07     ` Christoph Hellwig
  2012-02-15 12:48       ` Richard Ems
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-14 19:07 UTC (permalink / raw)
  To: Richard Ems; +Cc: Jeffrey Hundstad, xfs

On Tue, Feb 14, 2012 at 07:12:29PM +0100, Richard Ems wrote:
> # /net/c3m/usr/local/software/XFS/summarise_stat.pl /bin/
>       9  6.2% are scripts (shell, perl, whatever)
>      65 44.8% don't use any stat() family calls at all
>      61 42.1% use 32-bit stat() family interfaces only
>       9  6.2% use 64-bit stat64() family interfaces only
>       1  0.7% use both 32-bit and 64-bit stat() family interfaces
> 
> So I was not sure if I should use inode64 or not.

Are you on a 32-bit system (userspace, kernel doesn't matter)?

If the system is 64-bit even the plain stat handles 64-bit inodes just
fine.
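
(A quick way to check the userspace side - output illustrative:

# file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64 ...

A 32-bit userspace would report "ELF 32-bit" instead.)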


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14  9:48                       ` Richard Ems
@ 2012-02-14 19:43                         ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-14 19:43 UTC (permalink / raw)
  To: Richard Ems; +Cc: xfs

On Tue, Feb 14, 2012 at 10:48:25AM +0100, Richard Ems wrote:
> It didn't improve, 100 min again.
> 
> 
> 
> >> Or will only unlink run time improve ?
> > 
> > Yes.
> 
> rm took about 110 min.

In that case you're pretty much bound by the time the directory
traversal itself takes.

Using larger inodes will help you as we won't have to seek for the
external attribute block.  Not filling a fs 100% will probably help by
reducing the fragmentaion, as will using larger directory blocks
as suggested by Dave.

A patch to improve the unlink path won't help the traversal speed
ever, though.


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14 12:32   ` Richard Ems
@ 2012-02-14 19:45     ` Christoph Hellwig
  2012-02-15 12:07       ` Richard Ems
  2012-02-15  1:27     ` Dave Chinner
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2012-02-14 19:45 UTC (permalink / raw)
  To: Richard Ems; +Cc: xfs

On Tue, Feb 14, 2012 at 01:32:00PM +0100, Richard Ems wrote:
> > You won't if the directory traversal is seek bound and that is the
> > limiting factor for performance.
> 
> *Seek bound*? *When* is the directory traversal *seek bound*?

You read the inode for the directory first, then the external attribute
block for the ACLs; then, if the directory isn't tiny, you'll start reading
directory blocks - more of them the larger the directory is - and if the
filesystem is close to being full they will often be non-contiguous.
Then you read the inode for each file/directory in it, then the external
attribute block, then the extent list, and so on.


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14 13:02 Richard Ems
       [not found] ` <4F3AA191.9030606@mnsu.edu>
@ 2012-02-14 23:10 ` Stan Hoeppner
  2012-02-15 15:54   ` Richard Ems
  1 sibling, 1 reply; 31+ messages in thread
From: Stan Hoeppner @ 2012-02-14 23:10 UTC (permalink / raw)
  To: Richard Ems; +Cc: xfs

On 2/14/2012 7:02 AM, Richard Ems wrote:
> Hi Stan,
> 
> I was not subscribed to the list so I did not get your mail. I

Ahh, my bad.  I should have done a reply-all.  I sometimes forget XFS is
an open list.

> subscribed now, but apparently an admin has to approve the subscription,
> since I am not getting any confirmation email, so this will take some time.
> 
>> On 2/13/2012 10:57 AM, Richard Ems wrote:
>>> The case is connected through SCSI.
>>
>> Do you mean iSCSI?  Does the host on which you're running your "find
>> dir" command have a 1GbE or 10GbE connection to the InforTrend unit?
>> More than one connection using bonding or multipath?  Direct connected
>> or through a switch(es)?  What brand is the switch(es)?  Switch(es)
>> under heavy load?
> 
> NO. SCSI. Not iSCSI. No switches; it's direct-attached through an LSI
> SCSI controller:

Ok so it's obviously not a host bandwidth or latency issue then.  I was
just trying to cover all the bases with my hardware questions.

Also I didn't realize anyone was (still?) offering parallel SCSI host
connections on their SATA arrays.

> lspci shows:
> 
> LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI

PCI-X?  So this host and array have a few years on them I assume.

>> Also check the performance data in the management interface on the
>> InforTrend unit to see if you're hitting any limits there (if it has
>> such a feature).  RAID6 is numerically intensive and that particular
>> controller may not have the ASIC horsepower to keep up with the IOPS
>> workload you're throwing at it.
> 
> There is no performance data in the management interface to check.

Ok, so this is a relatively basic, entry level array?

-- 
Stan



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14 12:32   ` Richard Ems
  2012-02-14 19:45     ` Christoph Hellwig
@ 2012-02-15  1:27     ` Dave Chinner
  2012-02-15 12:07       ` Richard Ems
  1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2012-02-15  1:27 UTC (permalink / raw)
  To: Richard Ems; +Cc: xfs

On Tue, Feb 14, 2012 at 01:32:00PM +0100, Richard Ems wrote:
> On 02/14/2012 01:09 AM, Dave Chinner wrote:
> >> I am asking because I am seeing very long times while removing big
> >> directory trees. I thought on kernels above 3.0 removing dirs and files
> >> had improved a lot, but I don't see that improvement.
> > 
> > You won't if the directory traversal is seek bound and that is the
> > limiting factor for performance.
> 
> *Seek bound*? *When* is the directory traversal *seek bound*?

Whenever you are traversing a directory structure that is not already
hot in the cache. IOWs, almost always.

> >> This is a backup system running dirvish, so most files in the dirs I am
> >> removing are hard links. Almost all of the files do have ACLs set.
> > 
> > The unlink will have an extra IO to read per inode - the out-of-line
> > attribute block, so you've just added 11 million IOs to the 800,000
> > the traversal already takes to the unlink overhead. So it's going to
> > take roughly ten hours because the unlink is going to be read IO seek
> > bound....
> 
> It took 110 minutes and not 10 hours. All files and dirs there had ACLs set.

I was basing that on your "find dir" time of 100 minutes, which was
the only number you gave, and making the assumption it didn't read
the attribute blocks and that it was seeing worst-case seek times
(i.e. avg seek times) for every IO.

Given the way locality works in XFS, I'd suggest that the typical
seek time will be much less (a few blocks, not half the disk
platter) and not necessarily on the same disk (due to RAID) so the
average seek time for your workload is likely to be much lower. If
it's at 1ms (closer to track-to-track seek times) instead of the
5ms, then that 10hrs becomes 2hrs for that many IOs....

> > Also, for large directories like this (millions of entries) you
> > should also consider using a larger directory block size (mkfs -n
> > size=xxxx option) as that can be scaled independently of the
> > filesystem block size. This will significantly decrease the amount
> > of IO and fragmentation large directories cause. Peak modification
> > performance of small directories will be reduced because larger
> > block size directories consume more CPU to process, but for large
> > directories performance will be significantly better as they will
> > spend much less time waiting for IO.
> 
> This was not ONE directory with that many files, but a directory
> containing 834591 subdirectories (deeply nested, not all in the same
> dir!) and 10539154 files.

So you've got a directory *tree* that indexes 11 million inodes, not
"one directory with 11 million files and dirs in it" as you
originally described.  Both Christoph and I have interpreted your
original description as "one large directory", but there's no need
to shout at us because it's difficult to understand any given
configuration from just a few lines of text.  IOWs, details like "one
directory" vs "one directory tree" might seem insignificant to you,
but they mean an awful lot to us developers and can easily lead us down
the wrong path.

FWIW, directory tree traversal is even more read IO latency
sensitive than a single large directory traversal because we can't
do readahead across directory boundaries to hide seek latencies as
much as possible and the locality on individual directories can be
very different depending on the allocation policy the filesystem is
using. As it is, large directory blocks can also reduce the amount
of IO needed in this sort of situation and speed up traversals....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-15  1:27     ` Dave Chinner
@ 2012-02-15 12:07       ` Richard Ems
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-15 12:07 UTC (permalink / raw)
  To: xfs

Hi Dave, hi list,

On 02/15/2012 02:27 AM, Dave Chinner wrote:
> On Tue, Feb 14, 2012 at 01:32:00PM +0100, Richard Ems wrote:
>> On 02/14/2012 01:09 AM, Dave Chinner wrote:
>>>> I am asking because I am seeing very long times while removing big
>>>> directory trees. I thought on kernels above 3.0 removing dirs and files
>>>> had improved a lot, but I don't see that improvement.
>>>
>>> You won't if the directory traversal is seek bound and that is the
>>> limiting factor for performance.
>>
>> *Seek bound*? *When* is the directory traversal *seek bound*?
> 
> Whenever you are traversing a directory structure that is not already
> hot in the cache. IOWs, almost always.

Ok, got that.

> 
>>>> This is a backup system running dirvish, so most files in the dirs I am
>>>> removing are hard links. Almost all of the files do have ACLs set.
>>>
>>> The unlink will have an extra IO to read per inode - the out-of-line
>>> attribute block, so you've just added 11 million IOs to the 800,000
>>> the traversal already takes to the unlink overhead. So it's going to
>>> take roughly ten hours because the unlink is going to be read IO seek
>>> bound....
>>
>> It took 110 minutes and not 10 hours. All files and dirs there had ACLs set.
> 
> I was basing that on your "find dir" time of 100 minutes, which was
> the only number you gave, and making the assumption it didn't read
> the attribute blocks and that it was seeing worst-case seek times
> (i.e. avg seek times) for every IO.
> 
> Given the way locality works in XFS, I'd suggest that the typical
> seek time will be much less (a few blocks, not half the disk
> platter) and not necessarily on the same disk (due to RAID) so the
> average seek time for your workload is likely to be much lower. If
> it's at 1ms (closer to track-to-track seek times) instead of the
> 5ms, then that 10hrs becomes 2hrs for that many IOs....

Many thanks for the clarification !!!


>>> Also, for large directories like this (millions of entries) you
>>> should also consider using a larger directory block size (mkfs -n
>>> size=xxxx option) as that can be scaled independently of the
>>> filesystem block size. This will significantly decrease the amount
>>> of IO and fragmentation large directories cause. Peak modification
>>> performance of small directories will be reduced because larger
>>> block size directories consume more CPU to process, but for large
>>> directories performance will be significantly better as they will
>>> spend much less time waiting for IO.
>>
>> This was not ONE directory with that many files, but a directory
>> containing 834591 subdirectories (deeply nested, not all in the same
>> dir!) and 10539154 files.
> 
> So you've got a directory *tree* that indexes 11 million inodes, not
> "one directory with 11 million files and dirs in it" as you
> originally described.  Both Christoph and I have interpreted your
> original description as "one large directory", but there's no need
> to shout at us because it's difficult to understand any given
> configuration from just a few lines of text.  IOWs, details like "one
> directory" vs "one directory tree" might seem insignificant to you,
> but they mean an awful lot to us developers and can easily lead us down
> the wrong path.

Sorry, I didn't mean to shout at anyone. I just wanted to clarify my
original description, since I noticed I got it wrong - I should have
used ** and not uppercase, and written *directory tree* rather than
just *directory*, as you suggested. My fault. I am very happy with the
fast and extensive responses from both you and Christoph! Thanks again!


> 
> FWIW, directory tree traversal is even more read IO latency
> sensitive than a single large directory traversal because we can't
> do readahead across directory boundaries to hide seek latencies as
> much as possible and the locality on individual directories can be
> very different depending on the allocation policy the filesystem is
> using. As it is, large directory blocks can also reduce the amount
> of IO needed in this sort of situation and speed up traversals....
> 
> Cheers,
> 
> Dave.


Many thanks!
Richard


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14 19:45     ` Christoph Hellwig
@ 2012-02-15 12:07       ` Richard Ems
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-15 12:07 UTC (permalink / raw)
  To: xfs

On 02/14/2012 08:45 PM, Christoph Hellwig wrote:
> On Tue, Feb 14, 2012 at 01:32:00PM +0100, Richard Ems wrote:
>>> You won't if the directory traversal is seek bound and that is the
>>> limiting factor for performance.
>>
>> *Seek bound*? *When* is the directory traversal *seek bound*?
> 
> You read the inode for the directory first, then the external attribute
> block for the ACLs; then, if the directory isn't tiny, you'll start reading
> directory blocks - more of them the larger the directory is - and if the
> filesystem is close to being full they will often be non-contiguous.
> Then you read the inode for each file/directory in it, then the external
> attribute block, then the extent list, and so on.

Ok, got it.

Many thanks,
Richard



* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14 19:07     ` Christoph Hellwig
@ 2012-02-15 12:48       ` Richard Ems
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-15 12:48 UTC (permalink / raw)
  To: xfs

On 02/14/2012 08:07 PM, Christoph Hellwig wrote:
> On Tue, Feb 14, 2012 at 07:12:29PM +0100, Richard Ems wrote:
>> # /net/c3m/usr/local/software/XFS/summarise_stat.pl /bin/
>>       9  6.2% are scripts (shell, perl, whatever)
>>      65 44.8% don't use any stat() family calls at all
>>      61 42.1% use 32-bit stat() family interfaces only
>>       9  6.2% use 64-bit stat64() family interfaces only
>>       1  0.7% use both 32-bit and 64-bit stat() family interfaces
>>
>> So I was not sure if I should use inode64 or not.
> 
> Are you on a 32-bit system (userspace, kernel doesn't matter)?

No, I have been running on 64-bit systems for years. That output above
is on openSUSE 12.1 64-bit.

> If the system is 64-bit even the plain stat handles 64-bit inodes just
> fine.

By *plain stat* do you mean the *32-bit stat* listed above ?
Is it then safe to switch all XFS partitions to be mounted with inode64
on 64-bit systems? Also for NFS v3 exports?
From the XFS FAQ I read that it should work, also for NFS v3, and that I
can try it and switch back if there are problems, so I will give it a try!

Many thanks again for your time and help!
Richard


* Re: XFS unlink still slow on 3.1.9 kernel ?
  2012-02-14 23:10 ` Stan Hoeppner
@ 2012-02-15 15:54   ` Richard Ems
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Ems @ 2012-02-15 15:54 UTC (permalink / raw)
  To: xfs

On 02/15/2012 12:10 AM, Stan Hoeppner wrote:
> On 2/14/2012 7:02 AM, Richard Ems wrote:
>> lspci shows:
>>
>> LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
> 
> PCI-X?  So this host and array have a few years on them I assume.

Yes, bought in 2007.

>>> Also check the performance data in the management interface on the
>>> InforTrend unit to see if you're hitting any limits there (if it has
>>> such a feature).  RAID6 is numerically intensive and that particular
>>> controller may not have the ASIC horsepower to keep up with the IOPS
>>> workload you're throwing at it.
>>
>> There is no performance data in the management interface to check.
> 
> Ok, so this is a relatively basic, entry level array?

It's an InforTrend A24U-G2421, see
http://www.infortrend.com/main/2_product/es_a24u-g2421.asp


Richard

