From: Matthew Wilcox
Subject: Re: topics for the file system mini-summit
Date: Sun, 28 May 2006 18:11:03 -0600
Message-ID: <20060529001103.GA23405@parisc-linux.org>
References: <44762552.8000906@emc.com>
In-Reply-To: <44762552.8000906@emc.com>
To: Ric Wheeler
Cc: linux-fsdevel@vger.kernel.org

On Thu, May 25, 2006 at 02:44:50PM -0700, Ric Wheeler wrote:
> The obvious alternative to this is to break up these big disks into
> multiple small file systems, but there again we hit several issues.
> 
> As an example, in one of the boxes that I work with we have 4 drives,
> 500GB each, with limited memory and CPU resources. To address the
> issues above, we break each drive into 100GB chunks, which gives us 20
> (reiserfs) file systems per box. The new problems that arise from this
> include:
> 
> (1) no forced unmount - if one file system goes down, you have to
> reboot the box to recover.
> (2) worst-case memory consumption for the journal scales linearly
> with the number of file systems (32MB per file system, so 640MB worst
> case on this box).
> (3) we take away the file system's ability to do intelligent head
> movement on the drives (i.e., I end up begging the application team
> to please only use one file system per drive at a time for ingest ;-)).
> The same goes for allocation - we basically have to push it up to the
> application to spread usage evenly across the capacity.
> (4) the pain of administering multiple file systems.
> 
> I know that other file systems deal with scale better, but the question
> is really how to move the mass of Linux users onto these large and
> increasingly common storage devices in a way that handles these challenges.

How do you handle the inode number space? Do you partition it across the
sub-filesystems, or do you prohibit hardlinks between the sub-fses?
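For the partitioning option, here is a minimal sketch of what splitting
the number space might look like. Everything in it is an assumption for
illustration (the 8-bit sub-fs index, the names, the 64-bit inode width);
it is not how reiserfs or any other existing filesystem does it:

#include <stdint.h>

/*
 * Hypothetical scheme: carve a 64-bit inode number space across
 * sub-filesystems by reserving the top bits for a sub-fs index.
 */
#define SUBFS_BITS	8			/* up to 256 sub-filesystems */
#define INO_BITS	(64 - SUBFS_BITS)	/* 2^56 inodes per sub-fs */
#define INO_MASK	((UINT64_C(1) << INO_BITS) - 1)

static inline uint64_t make_global_ino(unsigned subfs, uint64_t local_ino)
{
	return ((uint64_t)subfs << INO_BITS) | (local_ino & INO_MASK);
}

static inline unsigned ino_to_subfs(uint64_t global_ino)
{
	return global_ino >> INO_BITS;
}

static inline uint64_t ino_to_local(uint64_t global_ino)
{
	return global_ino & INO_MASK;
}

/*
 * With this encoding, rejecting a cross-sub-fs hardlink is cheap:
 * a link is legal only if both inodes decode to the same index.
 */
static inline int same_subfs(uint64_t a, uint64_t b)
{
	return ino_to_subfs(a) == ino_to_subfs(b);
}

The cost is that each sub-filesystem's local inode space shrinks to
2^56 numbers, and a rejected cross-sub-fs link() would presumably
return EXDEV, just as a cross-mount link does today.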