* how to scale root-reserved space going forward...
From: Eric Sandeen @ 2009-03-01 22:22 UTC
To: ext4 development

5% of a 16T filesystem is getting a little crazy from the point of view
of "root-reserved" - 800G!

But I think the original reason for this reserved space was actually as
an allocator cushion; letting root gain access to it was just a
safety-valve for that.

Now that we have a completely different allocator in ext4, and
potentially much larger filesystems, I think we need to revisit how much
is held back, and for what reason.

Any thoughts on a reasonable way to scale this reservation (or, just for
discussion - if it's even needed at all today for ext4?)

-Eric
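For scale, the arithmetic behind the 800G figure can be sketched as below. This is an illustration only: `reserved_bytes` is a hypothetical helper, not an e2fsprogs or kernel function, and the 5% default is the only number taken from the message above. Read in decimal units, 16T gives exactly 800G; read in binary units, it comes to roughly 819 GiB.

```python
# Hypothetical helper for illustration; not an e2fsprogs or kernel API.
GB = 10 ** 9      # decimal gigabyte
TB = 10 ** 12     # decimal terabyte
GIB = 1024 ** 3   # binary gibibyte
TIB = 1024 ** 4   # binary tebibyte


def reserved_bytes(fs_bytes: int, percent: int = 5) -> int:
    """Bytes held back when `percent` of the filesystem is reserved."""
    return fs_bytes * percent // 100


# Show how a fixed percentage grows with filesystem size (decimal units).
for size_tb in (1, 2, 4, 8, 16):
    reserve = reserved_bytes(size_tb * TB)
    print(f"{size_tb:>2} TB filesystem -> {reserve // GB:>4} GB reserved")
```

The point of the loop is simply that a fixed percentage scales linearly with the filesystem, whatever the allocator actually needs.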
* Re: how to scale root-reserved space going forward...
From: Theodore Tso @ 2009-03-02 2:47 UTC
To: Eric Sandeen; +Cc: ext4 development

On Sun, Mar 01, 2009 at 04:22:54PM -0600, Eric Sandeen wrote:
> 5% of a 16T filesystem is getting a little crazy from the point of view
> of "root-reserved" - 800G!
>
> But I think the original reason for this reserved space was actually as
> an allocator cushion; letting root gain access to it was just a
> safety-valve for that.

Yep, that's correct. Historically, this came from BSD Fast Filesystem,
which used to use a default reserve of 10%. To quote from the FreeBSD
sources, in ufs/ffs.h:

/*
 * MINFREE gives the minimum acceptable percentage of filesystem
 * blocks which may be free. If the freelist drops below this level
 * only the superuser may continue to allocate blocks. This may
 * be set to 0 if no reserve of free blocks is deemed necessary,
 * however throughput drops by fifty percent if the filesystem
 * is run at between 95% and 100% full; thus the minimum default
 * value of fs_minfree is 5%. However, to get good clustering
 * performance, 10% is a better choice. hence we use 10% as our
 * default value. With 10% free space, fragmentation is not a
 * problem, so we choose to optimize for time.
 */
#define MINFREE 8

The interesting thing is that FreeBSD has decided to push things down to
8%. A quick survey shows that NetBSD is using a MINFREE of 5%, like
Linux. (Fortunately, http://fxr.watson.org/ makes it easy to make these
comparisons.) And like Linux, it looks like the *BSD's have the same
tendency not to update the comments when they update the code.
:-)

> Now that we have a completely different allocator in ext4, and
> potentially much larger filesystems, I think we need to revisit how much
> is held back, and for what reason.
>
> Any thoughts on a reasonable way to scale this reservation (or, just for
> discussion - if it's even needed at all today for ext4?)

This is a reasonable question. What would be great is if we could get
a benchmarking team to fill an ext4 filesystem with files. The simple
thing would be if we did something fixed --- say, 50 files per
directory, each file 100k, and say 10 subdirectories in each
directory, to some fixed depth, and with a filesystem size of at least
8 gigabytes (which would give us at least 16 flex groups with the
default flex size of 16) --- and then filled each filesystem from
0% to 90% in increments of 10%, and from 90% to 99% in increments of
1%, and then ran some throughput benchmark like bonnie on the mostly
filled filesystem.

A better filler would probably use random file sizes with an average
size of say 64k, but with outliers from 4k to 128 megs, and a similar
random distribution of number of files per directory, and number of
subdirectories and depth of subdirectories.

I suppose it would be good to do one set of charts with a filesystem
size of 8 gigs, and another at 80 gigs and 800 gigs, and see if the
shape of the filesystem curve changes at scale. Once we have that, we
would be in a position to make a reasonable set of defaults.

Or we could just guess and come up with some percentage figure that
sounds good. :-)

- Ted
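The fixed-shape fill proposed above can be sized before anyone runs it. The sketch below is only an illustration under stated assumptions: 100k is read as 100 KiB, the top-level directory counts as depth 0 and holds files too, and `fill_levels`, `dirs_in_tree`, and `tree_bytes` are hypothetical names, not part of bonnie or any other benchmark tool.

```python
# Illustrative sizing of the fixed-shape fill; all names are hypothetical.

def fill_levels() -> list[int]:
    """Target fill percentages: 0..90 in steps of 10, then 91..99 in steps of 1."""
    return list(range(0, 90, 10)) + list(range(90, 100))


def dirs_in_tree(depth: int, subdirs: int = 10) -> int:
    """Directory count for a tree with `subdirs` children per directory.

    Assumption: the top-level directory is depth 0.
    """
    return sum(subdirs ** level for level in range(depth + 1))


def tree_bytes(depth: int, files_per_dir: int = 50,
               file_size: int = 100 * 1024, subdirs: int = 10) -> int:
    """Total file data if every directory holds `files_per_dir` files."""
    return dirs_in_tree(depth, subdirs) * files_per_dir * file_size


print("fill levels (%):", fill_levels())
for depth in range(1, 5):
    print(f"depth {depth}: {dirs_in_tree(depth):>6} dirs, "
          f"{tree_bytes(depth) / 10 ** 9:8.2f} GB of file data")
```

Under these assumptions a tree of depth 3 holds 1111 directories and about 5.7 GB of file data, so an 8 gigabyte filesystem would need a depth-4 tree, cut off at each target fill level, to reach the 90% to 99% range.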
* Re: how to scale root-reserved space going forward...
From: Ron Johnson @ 2009-03-02 7:17 UTC
To: ext4 development

On 03/01/2009 08:47 PM, Theodore Tso wrote:
[snip]
>
> Or we could just guess and come up with some percentage figure that
> sounds good. :-)

5%, but 100GB if the fs is .ge. 2TB?

--
Ron Johnson, Jr.
Jefferson LA USA

The feeling of disgust at seeing a human female in a Relationship
with a chimp male is Homininphobia, and you should be ashamed of
yourself.
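One way to read this suggestion is as a percentage with a ceiling: 5% of 2TB is exactly 100GB in decimal units, so the rule is continuous at the threshold and amounts to min(5%, 100GB). The sketch below assumes that reading; `capped_reserve` and its parameters are hypothetical, not an existing mke2fs or tune2fs option.

```python
# Hypothetical policy sketch; not an existing mke2fs or tune2fs behaviour.
GB = 10 ** 9
TB = 10 ** 12


def capped_reserve(fs_bytes: int, percent: int = 5,
                   cap_bytes: int = 100 * GB) -> int:
    """Reserve `percent` of the filesystem, but never more than `cap_bytes`."""
    return min(fs_bytes * percent // 100, cap_bytes)


# Below 2 TB the percentage applies; at and above 2 TB the cap takes over.
for size_tb in (1, 2, 4, 16):
    reserve = capped_reserve(size_tb * TB)
    print(f"{size_tb:>2} TB filesystem -> {reserve // GB:>3} GB reserved")
```

Keeping the cap as a parameter makes it easy to try other ceilings against benchmark results rather than committing to one number.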
* Re: how to scale root-reserved space going forward...
From: Andreas Dilger @ 2009-03-02 8:56 UTC
To: Theodore Tso; +Cc: Eric Sandeen, ext4 development

On Mar 01, 2009 21:47 -0500, Theodore Ts'o wrote:
> This is a reasonable question. What would be great is if we could get
> a benchmarking team to fill an ext4 filesystem with files. The simple
> thing would be if we did something fixed --- say, 50 files per
> directory, each file 100k, and say 10 subdirectories in each
> directory, to some fixed depth, and with a filesystem size of at least
> 8 gigabytes (which would give us at least 16 flex groups with the
> default flex size of 16) --- and then filled each filesystem from
> 0% to 90% in increments of 10%, and from 90% to 99% in increments of
> 1%, and then ran some throughput benchmark like bonnie on the mostly
> filled filesystem.

We've done tests like this, and it is important to take the inner vs.
outer cylinders into account. It can happen that even a "perfectly"
allocated filesystem will appear to show slowdowns in performance as
it gets full, yet this is partially due to physical disk layout issues.

> A better filler would probably use random file sizes with an average
> size of say 64k, but with outliers from 4k to 128 megs, and a similar
> random distribution of number of files per directory, and number of
> subdirectories and depth of subdirectories.

You describe the Reiserfs "Mongo" benchmark.

> I suppose it would be good to do one set of charts with a filesystem
> size of 8 gigs, and another at 80 gigs and 800 gigs, and see if the
> shape of the filesystem curve changes at scale. Once we have that, we
> would be in a position to make a reasonable set of defaults.
>
> Or we could just guess and come up with some percentage figure that
> sounds good.
:-)

I suspect that at a certain filesystem size, there isn't much benefit
in having more reserved space. If we keep 50GB of reserved space then
this is likely to contain a decent amount of 1MB free chunks, which is
what we really care about.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: how to scale root-reserved space going forward...
From: Eric Sandeen @ 2009-03-02 16:26 UTC
To: Andreas Dilger; +Cc: Theodore Tso, ext4 development

Andreas Dilger wrote:
> I suspect that at a certain filesystem size, there isn't much benefit
> in having more reserved space. If we keep 50GB of reserved space then
> this is likely to contain a decent amount of 1MB free chunks, which is
> what we really care about.
>
> Cheers, Andreas

As long as we don't care much about inode<->block locality, that is
probably ok...

I got one off-list reply from someone who was depending on the 5% for
administration reasons, but even if we change the default I assume we'll
leave the tunable in place for people who really do want 5% for
"root-reserved" over "allocator safety buffer" reasons.

-Eric