From: Ric Wheeler
Subject: Re: topics for the file system mini-summit
Date: Sat, 27 May 2006 21:44:43 -0400
Message-ID: <4479008B.3000400@emc.com>
References: <44762552.8000906@emc.com> <20060526164856.GQ5964@schatzie.adilger.int> <4477A236.3040208@emc.com> <20060527141832.GT5964@schatzie.adilger.int>
In-Reply-To: <20060527141832.GT5964@schatzie.adilger.int>
To: Andreas Dilger
Cc: linux-fsdevel@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Andreas Dilger wrote:
> On May 26, 2006 20:49 -0400, Ric Wheeler wrote:
>> Andreas Dilger wrote:
>>> In a way, what you describe is Lustre - it aggregates multiple "smaller"
>>> filesystems into a single large filesystem from the application POV
>>> (though in many cases the "smaller" filesystems are 2TB). It runs e2fsck
>>> in parallel if needed, has smart object allocation (clients do delayed
>>> allocation, can load balance across storage targets, etc.), and can run
>>> with down storage targets.
>>
>> The approach that Lustre takes here is great - distributed systems
>> typically treat subcomponent failures as a fact of life and handle
>> them better than many single-system designs...
>>
>> The challenge is still there on the "smaller" file systems that make up
>> Lustre - you can spend a lot of time waiting for just one fsck to finish ;-)
>
> CFS is actually quite interested in improving the health and reliability
> of the component filesystems also. That is the reason for our interest
> in the U. Wisconsin IRON filesystem work, which we are (slowly) working
> to include in ext3.

We actually were the sponsors of the Wisconsin work, so I am glad to hear
that it has had a real impact.

I think that the IRON FS ideas will help, but they still don't eliminate
the scalability issues of fsck (or the related scalability issue I see,
where performance dips on file systems with very high object counts).

> This will also be our focus for upcoming filesystem work. It is
> relatively easy to make filesystems with 64-bit structures, but the
> ability to run such large filesystems in the face of corruption
> is the real challenge. It isn't practical to need a 17-year e2fsck,
> which is what you get extrapolating 2TB e2fsck times out to 2^48-block
> filesystems. A lot of the features in ZFS make sense in this regard.
>
> Cheers, Andreas

Absolutely agree - I wonder if there is some value in going back and
profiling fsck, if someone has not already done that. It won't get rid
of the design limitations, but we might be able to make some significant
improvements...

ric
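
P.S. For anyone curious where a figure like the 17 years above comes
from, here is a back-of-the-envelope sketch. It assumes fsck time
scales roughly linearly with block count, and the 17-minutes-per-2TB
starting point is an illustrative assumption, not a measurement:

    2TB at 4KB blocks  = 2^41 bytes / 2^12 bytes  = 2^29 blocks
    2^48-block fs      = 2^19 (~524,000) times larger than 2TB
    if one 2TB e2fsck takes ~17 minutes:
        17 min * 2^19  ~= 8.9 million minutes  ~= 17 years

If anything, the linear assumption is generous - fsck is largely
seek-bound, so the real number could be worse. Which is exactly why
profiling it to find where the time actually goes seems worthwhile.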