From: Ric Wheeler
Subject: Re: topics for the file system mini-summit
Date: Sat, 27 May 2006 21:44:43 -0400
Message-ID: <4479008B.3000400@emc.com>
References: <44762552.8000906@emc.com> <20060526164856.GQ5964@schatzie.adilger.int> <4477A236.3040208@emc.com> <20060527141832.GT5964@schatzie.adilger.int>
In-Reply-To: <20060527141832.GT5964@schatzie.adilger.int>
To: Andreas Dilger
Cc: linux-fsdevel@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Andreas Dilger wrote:
> On May 26, 2006 20:49 -0400, Ric Wheeler wrote:
>> Andreas Dilger wrote:
>>> In a way, what you describe is Lustre - it aggregates multiple "smaller"
>>> filesystems into a single large filesystem from the application POV
>>> (though in many cases the "smaller" filesystems are 2TB). It runs e2fsck
>>> in parallel if needed, has smart object allocation (clients do delayed
>>> allocation, can load balance across storage targets, etc.), and can run
>>> with down storage targets.
>>
>> The approach that Lustre takes here is great - distributed systems
>> typically treat subcomponent failures as a fact of life and handle
>> them better than many single-system designs...
>>
>> The challenge is still there on the "smaller" file systems that make up
>> Lustre - you can spend a lot of time waiting for just one fsck to finish ;-)
>
> CFS is actually quite interested in improving the health and reliability
> of the component filesystems also. That is the reason for our interest
> in the U. Wisconsin IRON filesystem work, which we are (slowly) working
> to include in ext3.

We actually were the sponsors of the Wisconsin work, so I am glad to hear
that it has had a real impact.

I think that the IRON FS ideas will help, but they still don't eliminate
the scalability issues of fsck (or the related scalability issue I see,
where performance dips on file systems with very high object counts).

> This will also be our focus for upcoming filesystem work. It is
> relatively easy to make filesystems with 64-bit structures, but the
> ability to run such large filesystems in the face of corruption
> is the real challenge. It isn't practical to need a 17-year e2fsck,
> which is what you get extrapolating 2TB e2fsck times out to 2^48-block
> filesystems. A lot of the features in ZFS make sense in this regard.
>
> Cheers, Andreas

Absolutely agree - I wonder if there is some value in going back and
profiling fsck, if someone has not already done that. It won't get rid
of the design limitations, but we might be able to make some significant
improvements...

ric
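
P.S. For anyone curious where a figure like the 17 years above comes
from, here is a back-of-the-envelope sketch. It assumes fsck time
scales roughly linearly with block count, and the 17-minutes-per-2TB
starting point is an illustrative assumption, not a measurement:

    2TB at 4KB blocks  = 2^41 bytes / 2^12 bytes  = 2^29 blocks
    2^48-block fs      = 2^19 (~524,000) times larger than 2TB
    if one 2TB e2fsck takes ~17 minutes:
        17 min * 2^19  ~= 8.9 million minutes  ~= 17 years

If anything, the linear assumption is generous - fsck is largely
seek-bound, so the real number could be worse. Which is exactly why
profiling it to find where the time actually goes seems worthwhile.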