Choosing and tuning Linux file systems

linux-fsdevel.vger.kernel.org archive mirror

* Choosing and tuning Linux file systems
@ 2006-06-25 22:00 Valerie Henson
  2006-06-25 22:13 ` Matthew Wilcox
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Valerie Henson @ 2006-06-25 22:00 UTC (permalink / raw)
  To: linux-fsdevel

Hi folks,

I foolishly signed up to give a talk at OSCON in about a month about
choosing and tuning Linux file systems for different workloads.  I
have some ideas about which file system to use when, but I'd rather
get recommendations from the experts on each file system.  Below is a
straw man outline of my current recommendations, please take a look
and comment.  I will make a summary freely available when I'm done.
At long last, I'll have an easy answer when someone asks me, "But
which file system should I use?"  Answer: "Go read this web page..."

By the way, a lot of the data on file/fs limits and the like is from:

http://en.wikipedia.org/wiki/File_systems

If it's wrong, please go check the page and update it.
Thanks!

Choosing a file system

Laptop: ext3 with noatime
General purpose server: ext3 or reiser
Lots of small files: reiser, ext2/3 with 1k blocks
More than ~32,000 files in one directory: XFS or reiser
Fast lookups in large directories: XFS, reiser, ext3 with htree (?)
File size more than 2TB: XFS, reiser up to 8TB
File system size more than 2TB: XFS, reiser up to 16TB
Ease of data recovery after corruption: ext2, ext3

Tuning a file system

Use "noatime" mount option
 - atime makes read workloads into random write workloads, yuck
 - This is Ubuntu installation default
 - I have a report that mutt doesn't work with this because atime is
   never updated but mtime is, maybe some kind of lazy atime is better?
 - Don't do if you want to e.g., track down hackers
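
One way to see which mounted file systems already use noatime is to read
the options column of /proc/mounts.  A minimal sketch in Python, assuming
the usual whitespace-separated layout (device, mount point, type, options,
...); the helper name mounts_with_option and the sample text are made up
for illustration:

```python
def mounts_with_option(mounts_text, option):
    """Return mount points whose option list contains `option`.

    `mounts_text` is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if option in options:
            found.append(mount_point)
    return found


# Hypothetical sample; on a real system use open("/proc/mounts").read()
SAMPLE = """\
/dev/hda1 / ext3 rw,noatime,data=ordered 0 0
/dev/hda2 /home ext3 rw,data=ordered 0 0
"""
print(mounts_with_option(SAMPLE, "noatime"))
```

Options are compared as whole words, so "noatime" does not accidentally
match "nodiratime".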

Choosing journaling mode in ext3
 - Default is "ordered", usually the right choice
 - "journal" is slower but guarantees data is on-disk as well
 - "writeback" is faster but may result in garbage/security leaks in
   your file data

Choosing block size
 - You can do this at mkfs time
 - tradeoff is space wasted vs. max file/fs size (other considerations?)
 - limitation is system page size

Tuning reiser
 - I know nothing!!!  Help!

Choosing number of inodes
 - XFS, reiser dynamically allocate inodes
 - Default is fine unless you have LOTS of small files (or occasionally, only big files)
 - mke2fs -T {news,largefile,largefile4}
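
To get a feel for what the -T usage types mean, the inode count can be
approximated as file system size divided by bytes-per-inode.  A sketch in
Python; the ratios below are what I recall from the mke2fs man page (one
inode per 4k for news, per 1MB for largefile, per 4MB for largefile4), so
treat them as assumptions and check your e2fsprogs version:

```python
# Bytes of disk space per inode for each usage type, as recalled from the
# mke2fs man page; these numbers are assumptions, not read from a live system.
BYTES_PER_INODE = {
    "news": 4 * 1024,
    "largefile": 1024 * 1024,
    "largefile4": 4 * 1024 * 1024,
}


def approx_inode_count(fs_bytes, usage_type):
    """Rough inode count: one inode per BYTES_PER_INODE[usage_type] bytes."""
    return fs_bytes // BYTES_PER_INODE[usage_type]


GIB = 1024 ** 3
for usage_type in BYTES_PER_INODE:
    print(usage_type, approx_inode_count(100 * GIB, usage_type))
```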

Laptop mode
 - I know almost nothing about this... some kind of write timeout?

-VAL


* Re: Choosing and tuning Linux file systems
  2006-06-25 22:00 Choosing and tuning Linux file systems Valerie Henson
@ 2006-06-25 22:13 ` Matthew Wilcox
  2006-06-25 22:26 ` Arjan van de Ven
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2006-06-25 22:13 UTC (permalink / raw)
  To: Valerie Henson; +Cc: linux-fsdevel

On Sun, Jun 25, 2006 at 03:00:53PM -0700, Valerie Henson wrote:
> I foolishly signed up to give a talk at OSCON in about a month about
> choosing and tuning Linux file systems for different workloads.  I
> 
> Laptop: ext3 with noatime
> General purpose server: ext3 or reiser
> Lots of small files: reiser, ext2/3 with 1k blocks
> More than ~32,000 files in one directory: XFS or reiser
> Fast lookups in large directories: XFS, reiser, ext3 with htree (?)
> File size more than 2TB: XFS, reiser up to 8TB
> File system size more than 2TB: XFS, reiser up to 16TB
> Ease of data recovery after corruption: ext2, ext3

An interesting workload you don't cover here is the PVR workload.
You're looking at lots of 1-2GB files (1GB for half-hour programs, 2GB
for full-hour).  Reads and writes are sequential; overwrites and random
accesses almost never happen.  It's not uncommon (at least for those of
us with two tuners ...) to record two things while watching a third,
so support for massive preallocation will prevent fragmentation.
All these files are in one directory (so ext2/3's Orlov allocator is
pretty much defeated).
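
From user space, preallocation of a recording could look roughly like the
Python sketch below, which uses os.posix_fallocate.  This is only an
illustration: the helper name and file name are made up, and whether the
call reserves contiguous blocks (rather than the C library emulating it by
writing) depends on the file system and kernel, so it is not a claim about
what ext3 does today.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Create `path` and reserve `size_bytes` for it before recording starts."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Ask for the whole range [0, size_bytes) up front
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    recording = os.path.join(tmp, "recording.mpg")  # hypothetical file name
    preallocate(recording, 1024 * 1024)  # 1 MiB here; a real PVR would ask for 1-2GB
    print(os.path.getsize(recording))
```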

> Tuning a file system
> 
> Use "noatime" mount option
>  - atime makes read workloads into random write workloads, yuck
>  - This is Ubuntu installation default
>  - I have a report that mutt doesn't work with this because atime is
>    never updated but mtime is, maybe some kind of lazy atime is better?
>  - Don't do if you want to e.g., track down hackers

Mention nodiratime?



* Re: Choosing and tuning Linux file systems
  2006-06-25 22:00 Choosing and tuning Linux file systems Valerie Henson
  2006-06-25 22:13 ` Matthew Wilcox
@ 2006-06-25 22:26 ` Arjan van de Ven
  2006-06-26  7:22 ` Neil Brown
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Arjan van de Ven @ 2006-06-25 22:26 UTC (permalink / raw)
  To: Valerie Henson; +Cc: linux-fsdevel

Hi,

> Laptop: ext3 with noatime
> General purpose server: ext3 or reiser
> Lots of small files: reiser, ext2/3 with 1k blocks
> More than ~32,000 files in one directory: XFS or reiser

this part probably predates htree in ext3... or does it?

> Choosing block size
>  - You can do this at mkfs time
>  - tradeoff is space wasted vs. max file/fs size (other considerations?)

"page size" is optimal for the VM basically, since the VM page == FS
block case becomes real simple in terms of write back of dirty pages.

> Laptop mode
>  - I know almost nothing about this... some kind of write timeout?

actually it's not really a write timeout (although usually when people say
"laptop mode" they include the increase of various writeback timeouts).
The key idea behind laptop mode is that IF you write (or read), and thus
spin the disk up, you should use the opportunity to write back all
pending data, so that you avoid a future spinup of the disk (e.g. when
the timer on that data expires).
Think of it like scheduling a "sync" about half a second after each IO
completion (although that's too simplistic).

Greetings,
   Arjan van de Ven


* Re: Choosing and tuning Linux file systems
  2006-06-25 22:00 Choosing and tuning Linux file systems Valerie Henson
  2006-06-25 22:13 ` Matthew Wilcox
  2006-06-25 22:26 ` Arjan van de Ven
@ 2006-06-26  7:22 ` Neil Brown
  2006-06-26  9:04 ` Nate Diller
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Neil Brown @ 2006-06-26  7:22 UTC (permalink / raw)
  To: Valerie Henson; +Cc: linux-fsdevel

On Sunday June 25, val_henson@linux.intel.com wrote:
> 
> Choosing journaling mode in ext3
>  - Default is "ordered", usually the right choice
>  - "journal" is slower but guarantees data is on-disk as well

'journal' doesn't make any extra guarantees over 'ordered'.  It can
provide lower latencies for synchronous updates as writes don't
require as many seeks.  I have found that data=journal makes NFS go
faster.

>  - "writeback" is faster but may result in garbage/security leaks in
>    your file data

NeilBrown


* Re: Choosing and tuning Linux file systems
  2006-06-25 22:00 Choosing and tuning Linux file systems Valerie Henson
                   ` (2 preceding siblings ...)
  2006-06-26  7:22 ` Neil Brown
@ 2006-06-26  9:04 ` Nate Diller
  2006-06-27 18:46   ` Valerie Henson
  2006-06-26 11:10 ` Erik Mouw
       [not found] ` <20060626091357.GQ5817@schatzie.adilger.int>
  5 siblings, 1 reply; 18+ messages in thread
From: Nate Diller @ 2006-06-26  9:04 UTC (permalink / raw)
  To: Valerie Henson; +Cc: linux-fsdevel

On 6/25/06, Valerie Henson <val_henson@linux.intel.com> wrote:
> Hi folks,
>
> I foolishly signed up to give a talk at OSCON in about a month about
> choosing and tuning Linux file systems for different workloads.  I
> have some ideas about which file system to use when, but I'd rather
> get recommendations from the experts on each file system.  Below is a
> straw man outline of my current recommendations, please take a look
> and comment.  I will make a summary freely available when I'm done.
> At long last, I'll have an easy answer when someone asks me, "But
> which file system should I use?"  Answer: "Go read this web page..."

heh, in other words, "bring on the flames, FUD, death threats, etc"

> By the way, a lot of the data on file/fs limits and the like is from:
>
> http://en.wikipedia.org/wiki/File_systems
>
> If it's wrong, please go check the page and update it.
> Thanks!
>
> Choosing a file system
>
> Laptop: ext3 with noatime
> General purpose server: ext3 or reiser
> Lots of small files: reiser, ext2/3 with 1k blocks
> More than ~32,000 files in one directory: XFS or reiser
> Fast lookups in large directories: XFS, reiser, ext3 with htree (?)
> File size more than 2TB: XFS, reiser up to 8TB
> File system size more than 2TB: XFS, reiser up to 16TB
> Ease of data recovery after corruption: ext2, ext3
>
> Tuning a file system
>
> Use "noatime" mount option
>  - atime makes read workloads into random write workloads, yuck
>  - This is Ubuntu installation default
>  - I have a report that mutt doesn't work with this because atime is
>    never updated but mtime is, maybe some kind of lazy atime is better?
>  - Don't do if you want to e.g., track down hackers
>
> Choosing journaling mode in ext3
>  - Default is "ordered", usually the right choice
>  - "journal" is slower but guarantees data is on-disk as well
>  - "writeback" is faster but may result in garbage/security leaks in
>    your file data

XFS (and reiser4) use delayed block allocation, and have no
"data=journal" option, however reiser4 guarantees "data=ordered".
delayed allocation can have a big performance advantage for
interspersed writes and overwrites.

> Choosing block size
>  - You can do this at mkfs time
>  - tradeoff is space wasted vs. max file/fs size (other considerations?)
>  - limitation is system page size

you might also want to mention ext3 reservations, they can definitely
increase performance for streaming workloads, and can be increased by
changing a #define.  too bad this sort of thing isn't generalized for
all the FS's, with some sort of pre-allocation/mapping addition to the
aops.  it could even replace the bmap() call.

> Tuning reiser
>  - I know nothing!!!  Help!

read up on the notail option, it is almost always the best idea.  it
reduces the number of seeks, at the cost of a small packing
inefficiency.  also, reiser4 fixes this problem (and some other big
performance issues).

NATE


* Re: Choosing and tuning Linux file systems
  2006-06-25 22:00 Choosing and tuning Linux file systems Valerie Henson
                   ` (3 preceding siblings ...)
  2006-06-26  9:04 ` Nate Diller
@ 2006-06-26 11:10 ` Erik Mouw
  2006-06-26 12:36   ` ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems] Tomas Hruby
       [not found] ` <20060626091357.GQ5817@schatzie.adilger.int>
  5 siblings, 1 reply; 18+ messages in thread
From: Erik Mouw @ 2006-06-26 11:10 UTC (permalink / raw)
  To: Valerie Henson; +Cc: linux-fsdevel

On Sun, Jun 25, 2006 at 03:00:53PM -0700, Valerie Henson wrote:
> I foolishly signed up to give a talk at OSCON in about a month about
> choosing and tuning Linux file systems for different workloads.  I
> have some ideas about which file system to use when, but I'd rather
> get recommendations from the experts on each file system.  Below is a
> straw man outline of my current recommendations, please take a look
> and comment.  I will make a summary freely available when I'm done.
> At long last, I'll have an easy answer when someone asks me, "But
> which file system should I use?"  Answer: "Go read this web page..."

Here are some comments.

> Choosing a file system
> 
> Laptop: ext3 with noatime
> General purpose server: ext3 or reiser
> Lots of small files: reiser, ext2/3 with 1k blocks

Small files usually implies lots of files in a directory, so be sure to
use htree with ext3.

> More than ~32,000 files in one directory: XFS or reiser

Ext3 can easily have more than 32000 *files* in a directory. However,
it can only have 32000 *subdirectories* in a directory. This limit is
from struct ext3_inode->i_links_count, which is an __le16: each
subdirectory has an entry ".." that links back to its parent, increasing
the parent's i_links_count.
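
The arithmetic behind the limit can be sketched in Python.  This assumes
the ext3 header's EXT3_LINK_MAX of 32000 and the traditional convention
that an empty directory already has two links (its name in the parent and
its own "."); the function names are hypothetical:

```python
EXT3_LINK_MAX = 32000  # cap on i_links_count, assumed from the ext3 headers


def expected_nlink(num_subdirs):
    """Link count of a directory under the traditional Unix convention."""
    return 2 + num_subdirs


def max_subdirectories(link_max=EXT3_LINK_MAX):
    """Subdirectories that fit before the link count reaches link_max."""
    return link_max - 2


print(expected_nlink(0))
print(max_subdirectories())
```

On a real directory the same number shows up as os.stat(path).st_nlink,
although file systems that do not follow the convention may report
something else.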

> Fast lookups in large directories: XFS, reiser, ext3 with htree (?)
> File size more than 2TB: XFS, reiser up to 8TB
> File system size more than 2TB: XFS, reiser up to 16TB
> Ease of data recovery after corruption: ext2, ext3
> 
> Tuning a file system
> 
> Use "noatime" mount option

Can also be combined with the "nodiratime" mount option.

>  - atime makes read workloads into random write workloads, yuck
>  - This is Ubuntu installation default
>  - I have a report that mutt doesn't work with this because atime is
>    never updated but mtime is, maybe some kind of lazy atime is better?

It does indeed think that a mailbox always has new content. However,
this is only with mbox style mailboxes, maildir or mh style mailboxes
just work.

>  - Don't do if you want to e.g., track down hackers
> 
> Choosing journaling mode in ext3
>  - Default is "ordered", usually the right choice
>  - "journal" is slower but guarantees data is on-disk as well
>  - "writeback" is faster but may result in garbage/security leaks in
>    your file data
> 
> Choosing block size
>  - You can do this at mkfs time
>  - tradeoff is space wasted vs. max file/fs size (other considerations?)
>  - limitation is system page size

NTFS has support for block sizes larger than page size. There were some
patches from Anton Altaparmakov to allow such block sizes, but IIRC
they are NTFS-only and were not made generically available for all
filesystems.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 12:36   ` ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems] Tomas Hruby
@ 2006-06-26 12:35     ` Arjan van de Ven
  2006-06-26 12:54     ` Theodore Tso
  2006-06-26 12:59     ` Erik Mouw
  2 siblings, 0 replies; 18+ messages in thread
From: Arjan van de Ven @ 2006-06-26 12:35 UTC (permalink / raw)
  To: Tomas Hruby; +Cc: Erik Mouw, linux-fsdevel


> What is the link_count (incremented by subdirectories) used for? Is it ext2/3
> specific or should it be implemented in such a way by other FS too? I am asking
> because I see no reason why to do so in our FS.

iirc some old unix tools (some versions of tar as well as other backup
tools) use this.... don't ask me how/why ;) I only remember getting
bug reports at one point about this count being wrong ;)



* ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 11:10 ` Erik Mouw
@ 2006-06-26 12:36   ` Tomas Hruby
  2006-06-26 12:35     ` Arjan van de Ven
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Tomas Hruby @ 2006-06-26 12:36 UTC (permalink / raw)
  To: Erik Mouw; +Cc: linux-fsdevel

> > More than ~32,000 files in one directory: XFS or reiser
> 
> Ext3 can easily have more than 32000 *files* in a directory. However,
> it can only have 32000 *subdirectories* in a directory. This limit is
> from struct ext3_inode->i_links_count, which is an __le16: each
> subdirectory has an entry ".." that links back to its parent, increasing
> the parent's i_links_count.

I was always wondering why it increases link_count of the parent directory when
creating a subdirectory. It is clear that .. points to the parent, but the
subdirectory cannot exist without its parent and you cannot delete the parent if
it is not empty. Correct me if I am wrong.

What is the link_count (incremented by subdirectories) used for? Is it ext2/3
specific or should it be implemented in such a way by other FS too? I am asking
because I see no reason why to do so in our FS.

Cheers,

		Tomas


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 12:36   ` ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems] Tomas Hruby
  2006-06-26 12:35     ` Arjan van de Ven
@ 2006-06-26 12:54     ` Theodore Tso
  2006-06-26 16:25       ` Andreas Dilger
  2006-06-26 17:35       ` Chris Wedgwood
  2006-06-26 12:59     ` Erik Mouw
  2 siblings, 2 replies; 18+ messages in thread
From: Theodore Tso @ 2006-06-26 12:54 UTC (permalink / raw)
  To: Tomas Hruby; +Cc: Erik Mouw, linux-fsdevel

On Mon, Jun 26, 2006 at 02:36:35PM +0200, Tomas Hruby wrote:
> I was always wondering why it increases link_count of the parent
> directory when creating a subdirectory. It is clear that .. points
> to the parent, but the subdirectory cannot exist without its parent
> and you cannot delete the parent if it is not empty. Correct me if I
> am wrong.
> 
> What is the link_count (incremented by subdirectories) used for? Is
> it ext2/3 specific or should it be implemented in such a way by
> other FS too? I am asking because I see no reason why to do so in our
> FS.

No, it's not ext2/3 specific; it's an old Unix tradition.  I've heard
reports of programs that attempt to optimize recursive descents by
counting the number of directories found so far and comparing it with
st_nlink.  This is dangerous since it's possible for directories to
be created during the readdir(), and in any case not all filesystems
do follow this behavior.  So preserving it just to keep application
programs from breaking is probably a bad idea; in the case of ext2/3
we've been preserving it mainly because to do otherwise would cause
e2fsck to complain.  
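
The optimization described above can be sketched as follows.  This is a
simulation in Python, not the code of any particular tool: the function
name and the in-memory listing are hypothetical, and each (name, is_dir)
pair stands in for a directory entry that would otherwise need a stat():

```python
def entries_to_stat(nlink, entries):
    """Count how many entries a traversal must stat() to find all subdirectories.

    `entries` is a list of (name, is_dir) pairs in readdir() order.
    Once nlink - 2 subdirectories have been seen, the rest are assumed
    to be non-directories and are not examined.
    """
    remaining_dirs = nlink - 2
    stats = 0
    for _name, is_dir in entries:
        if remaining_dirs == 0:
            break  # everything left is assumed to be a leaf
        stats += 1
        if is_dir:
            remaining_dirs -= 1
    return stats


listing = [("a.txt", False), ("sub", True), ("b.txt", False), ("c.txt", False)]
print(entries_to_stat(3, listing))  # link count matches the one subdirectory
print(entries_to_stat(2, listing))  # stale link count: "sub" is never examined
```

The second call shows the danger: if the link count does not account for
"sub" (created during the readdir(), or a file system that does not follow
the convention), the traversal stops early and silently skips it.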

We could create an rocompat feature which disabled this, or which
causes st_nlink above 32000 to mean "infinity"; such patches have
existed, but in practice it's relatively rare that people actually
find this to be a limitation, so none of the patches have managed to
achieve the necessary activation energy to actually get integrated
into both the kernel and e2fsprogs.

						- Ted


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 12:36   ` ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems] Tomas Hruby
  2006-06-26 12:35     ` Arjan van de Ven
  2006-06-26 12:54     ` Theodore Tso
@ 2006-06-26 12:59     ` Erik Mouw
  2006-06-26 21:09       ` Tomas Hruby
  2 siblings, 1 reply; 18+ messages in thread
From: Erik Mouw @ 2006-06-26 12:59 UTC (permalink / raw)
  To: Tomas Hruby; +Cc: linux-fsdevel

On Mon, Jun 26, 2006 at 02:36:35PM +0200, Tomas Hruby wrote:
> > > More than ~32,000 files in one directory: XFS or reiser
> > 
> > Ext3 can easily have more than 32000 *files* in a directory. However,
> > it can only have 32000 *subdirectories* in a directory. This limit is
> > from struct ext3_inode->i_links_count, which is an __le16: each
> > subdirectory has an entry ".." that links back to its parent, increasing
> > the parent's i_links_count.
> 
> I was always wondering why it increases link_count of the parent directory when
> creating a subdirectory. It is clear that .. points to the parent, but the
> subdirectory cannot exist without its parent and you cannot delete the parent if
> it is not empty. Correct me if I am wrong.

It is an elegant way: an inode can only be deleted when the link count
is zero. The fastest way to figure that out for directories would be to
let subdirs increase the parent link count: you just have to look up the
link count in the parent instead of going through all directory entries
searching for possible subdirectories.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 12:54     ` Theodore Tso
@ 2006-06-26 16:25       ` Andreas Dilger
  2006-06-26 17:35       ` Chris Wedgwood
  1 sibling, 0 replies; 18+ messages in thread
From: Andreas Dilger @ 2006-06-26 16:25 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Tomas Hruby, Erik Mouw, linux-fsdevel

On Jun 26, 2006  08:54 -0400, Theodore Tso wrote:
> We could create an rocompat feature which disabled this, or which
> causes st_nlink above 32000 to mean "infinity"; such patches have
> existed, but in practice it's relatively rare that people actually
> find this to be a limitation, so none of the patches have managed to
> achieve the necessary activation energy to actually get integrated
> into both the kernel and e2fsprogs.

Ted, can you please assign an official RO_COMPAT flag for this feature?
CFS has such patches that we should submit.  All that is needed is which
flag should be used.

While you are there, please also assign an RO_COMPAT flag for NS_TIMESTAMP
patch I recently submitted, and INCOMPAT_64BIT for > 32-bit blocknr patches.

There is also EXT3_HUGE_FILE_FL and RO_COMPAT_HUGE_FILE for Takashi's
"i_blocks in fs-blocksize units" patch.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 12:54     ` Theodore Tso
  2006-06-26 16:25       ` Andreas Dilger
@ 2006-06-26 17:35       ` Chris Wedgwood
  2006-06-26 21:03         ` Tomas Hruby
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Wedgwood @ 2006-06-26 17:35 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Tomas Hruby, Erik Mouw, linux-fsdevel

On Mon, Jun 26, 2006 at 08:54:35AM -0400, Theodore Tso wrote:

> I've heard reports of programs that attempt to optimize recursive
> descents by counting the number of directories found so far and
> comparing it with st_nlink.

Some versions of find do this.  Breaking these semantics I think would
be painful for some people.


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 21:03         ` Tomas Hruby
@ 2006-06-26 21:03           ` Chris Wedgwood
  2006-06-26 21:13             ` H. Peter Anvin
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Wedgwood @ 2006-06-26 21:03 UTC (permalink / raw)
  To: Tomas Hruby; +Cc: Theodore Tso, Erik Mouw, linux-fsdevel

On Mon, Jun 26, 2006 at 11:03:57PM +0200, Tomas Hruby wrote:

> So if I have a directory where the nlink does not correspond with
> the number of subdirectories, the applications you mentioned don't
> work correctly?

yes

> As Ted said, not all file systems follow this. This means that such
> applications cannot e.g., remove a directory (rm -r) from such a FS?

i don't know about rm, but some versions of find need -noleaf (see the
man page for details)

> I am asking because we have an issue that an older version of mc
> cannot remove directory recursively, however we have not observed
> this on any new system. Might the nlink count be the problem?

i've not seen the nlink problem occur because pretty much all
filesystems have nlink set correctly *or* they set it to one like
autofs does which happens to work because 1-2-n underflows (there are
also those who claim that nlink==1 implies you don't know how many
links there are)



* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 17:35       ` Chris Wedgwood
@ 2006-06-26 21:03         ` Tomas Hruby
  2006-06-26 21:03           ` Chris Wedgwood
  0 siblings, 1 reply; 18+ messages in thread
From: Tomas Hruby @ 2006-06-26 21:03 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Theodore Tso, Erik Mouw, linux-fsdevel

On Mon, Jun 26, 2006 at 10:35:56AM -0700, Chris Wedgwood wrote:
> On Mon, Jun 26, 2006 at 08:54:35AM -0400, Theodore Tso wrote:
> 
> > I've heard reports of programs that attempt to optimize recursive
> > descents by counting the number of directories found so far and
> > comparing it with st_nlink.
> 
> Some versions of find do this.  Breaking these semantics I think would
> be painful for some people.

So if I have a directory where the nlink does not correspond with the number
of subdirectories, the applications you mentioned don't work correctly?

As Ted said, not all file systems follow this. This means that such applications
cannot e.g., remove a directory (rm -r) from such a FS? I am asking because we
have an issue that an older version of mc cannot remove directory recursively,
however we have not observed this on any new system. Might the nlink count be
the problem?

		Tomas


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 12:59     ` Erik Mouw
@ 2006-06-26 21:09       ` Tomas Hruby
  0 siblings, 0 replies; 18+ messages in thread
From: Tomas Hruby @ 2006-06-26 21:09 UTC (permalink / raw)
  To: Erik Mouw; +Cc: linux-fsdevel

On Mon, Jun 26, 2006 at 02:59:00PM +0200, Erik Mouw wrote:
> On Mon, Jun 26, 2006 at 02:36:35PM +0200, Tomas Hruby wrote:
> > > > More than ~32,000 files in one directory: XFS or reiser
> > > 
> > > Ext3 can easily have more than 32000 *files* in a directory. However,
> > > it can only have 32000 *subdirectories* in a directory. This limit is
> > > from struct ext3_inode->i_links_count, which is an __le16: each
> > > subdirectory has an entry ".." that links back to its parent, increasing
> > > the parent's i_links_count.
> > 
> > I was always wondering why it increases link_count of the parent directory when
> > creating a subdirectory. It is clear that .. points to the parent, but the
> > subdirectory cannot exist without its parent and you cannot delete the parent if
> > it is not empty. Correct me if I am wrong.
> 
> It is an elegant way: an inode can only be deleted when the link count
> is zero. The fastest way to figure that out for directories would be to
> let subdirs increase the parent link count: you just have to look up the
> link count in the parent instead of going through all directory entries
> searching for possible subdirectories.

I don't get the point here. If there are not only subdirectories but other
entries as well, you can't remove the directory anyway ...

		Tomas


* Re: ext2/3 subdirectory limit [WAS: Choosing and tuning Linux file systems]
  2006-06-26 21:03           ` Chris Wedgwood
@ 2006-06-26 21:13             ` H. Peter Anvin
  0 siblings, 0 replies; 18+ messages in thread
From: H. Peter Anvin @ 2006-06-26 21:13 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Tomas Hruby, Theodore Tso, Erik Mouw, linux-fsdevel

Chris Wedgwood wrote:
> 
>> As Ted said, not all file systems follow this. This means that such
>> applications cannot e.g., remove a directory (rm -r) from such a FS?
> 
> i don't know about rm, but some versions of find need -noleaf (see the
> man page for details)
> 
>> I am asking because we have an issue that an older version of mc
>> cannot remove directory recursively, however we have not observed
>> this on any new system. Might the nlink count be the problem?
> 
> i've not seen the nlink problem occur because pretty much all
> filesystems have nlink set correctly *or* they set it to one like
> autofs does which happens to work because 1-2-n underflows (there are
> also those who claim that nlink==1 implies you don't know how many
> links there are)

Right.  On most applications, it "just works" because of underflow (if 
you treat nlink as unsigned and subtract 2, then you get UINT_MAX which 
is functionally infinity.)  This is a good reason for this particular 
retcon.
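
The underflow can be reproduced with a few lines of Python, assuming a
32-bit unsigned int (Python integers do not wrap, so the mask stands in
for C's modular arithmetic; the function name is hypothetical):

```python
UINT_MAX = 2**32 - 1  # assuming a 32-bit unsigned int


def remaining_subdirs_unsigned(nlink):
    """Compute nlink - 2 the way C unsigned arithmetic would (mod 2**32)."""
    return (nlink - 2) & UINT_MAX


print(remaining_subdirs_unsigned(5))
print(remaining_subdirs_unsigned(1) == UINT_MAX)
```

With nlink == 1 the "subdirectories still expected" counter starts at
UINT_MAX, so an application that stops scanning when the counter reaches
zero never stops early in practice.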

The established practice is to set nlink to 1 if either you don't know 
how many links there are or there are too many to fit in the nlink field.

	-hpa


* Re: Choosing and tuning Linux file systems
       [not found] ` <20060626091357.GQ5817@schatzie.adilger.int>
@ 2006-06-26 22:01   ` Valerie Henson
  0 siblings, 0 replies; 18+ messages in thread
From: Valerie Henson @ 2006-06-26 22:01 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-fsdevel

(Small accident occurred on Andreas' reply, restoring fsdevel and
quoting in entirety.)

On Mon, Jun 26, 2006 at 03:13:57AM -0600, Andreas Dilger wrote:
> On Jun 25, 2006  15:00 -0700, Valerie Henson wrote:
> > Lots of small files: reiser, ext2/3 with 1k blocks
> > More than ~32,000 files in one directory: XFS or reiser
> 
> This is actually "more than 32000 subdirectories in one directory".

Yeah, uh, dur.  Thanks!

> Unpatched ext3 is "good" up to 1M files and usable up to 10M regular
> files in a single directory.

Good to know the guidelines here.  Thanks!

-VAL

> > Choosing journaling mode in ext3
> >  - Default is "ordered", usually the right choice
> >  - "journal" is slower but guarantees data is on-disk as well
> 
> Also good for NFS server or mail spool running in "sync" mode (sync
> writes are linear into the journal).
> 
> >  - "writeback" is faster but may result in garbage/security leaks in
> >    your file data
> 
> ... after a crash
> 
> > Tuning reiser
> >  - I know nothing!!!  Help!
> 
> Tail packing saves space for small files, hurts performance somewhat.
> 
> > Laptop mode
> >  - I know almost nothing about this... some kind of write timeout?
> 
> Writes are cached until VM pressure forces them to disk (if disk is
> suspended), or there is a read which causes disk to spin up.  So,
> chance of data loss if laptop crashes or "suspends" w/o writing it.
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.


* Re: Choosing and tuning Linux file systems
  2006-06-26  9:04 ` Nate Diller
@ 2006-06-27 18:46   ` Valerie Henson
  0 siblings, 0 replies; 18+ messages in thread
From: Valerie Henson @ 2006-06-27 18:46 UTC (permalink / raw)
  To: Nate Diller; +Cc: linux-fsdevel

On Mon, Jun 26, 2006 at 02:04:17AM -0700, Nate Diller wrote:
> 
> heh, in other words, "bring on the flames, FUD, death threats, etc"

Shhh!!!  Sh!!!  Someone might hear you! :)

Seriously, I used to be a file system absolutist.  Then I talked to
people who actually used file systems to do work.  They had this
irritating habit of saying things like, "I have requirement XYZ for my
workload, and only fs ABC can do that."  And I would think, "Well,
you're right.  You do have to use that fs for your workload."  So I'm
focusing on the boundary conditions - when do you absolutely want to
use a particular file system and not another?

> you might also want to mention ext3 reservations, they can definitely
> increase performance for streaming workloads, and can be increased by
> changing a #define.  too bad this sort of thing isn't generalized for
> all the FS's, with some sort of pre-allocation/mapping addition to the
> aops.  it could even replace the bmap() call.

Recently I was writing up an article on transparent large page support
and realized how much large page reservations and disk block
reservations had in common.  I highly recommend reading this paper,
"Practical, Transparent Operating System Support for Superpages," by
Juan Navarro, et al.:

http://www.cs.rice.edu/~ssiyer/r/superpages/

Or if you either have a LWN subscription or can wait until Wednesday
when it becomes free, my summary of the paper:

http://lwn.net/Articles/187921/

Especially the population maps and the lists of reservations of
particular sizes look interesting.

-VAL
