From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Braam
Subject: Re: [ANNOUNCE] Lustre Lite 1.0 beta 1
Date: Mon, 17 Mar 2003 10:47:23 -0700
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <20030317174723.GG12121@peter.cfs>
References: <20030312175625.GL888@peter.cfs> <3E740DE1.6010204@shaolinmicro.com> <20030316013858.C12806@schatzie.adilger.int> <3E76024E.9060607@shaolinmicro.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Andreas Dilger, linux-fsdevel@vger.kernel.org
Return-path:
To: David Chow
Content-Disposition: inline
In-Reply-To: <3E76024E.9060607@shaolinmicro.com>
List-Id: linux-fsdevel.vger.kernel.org

> Andreas,
>
> Thanks for your lengthly explanation. The design looked like Coda with
> OST as you refer to the actual data storage. In fact, it is a stacked
> file cache or your store data in files persistently on existing file
> systesms. However, how can it handle a disconnected storage server?
> Where this is the most diffcult problem for any cluster file systems
> that support disconnection. It is obviously not allow disconnection for
> system like having thousands of nodes is bad. The chance of node failure
> is very high in those cases. As file allocation is still allowed to be
> done across multiple storage servers. The answer to resolving data
> conflicts transparently after disconnection is impossible! I would
> really like to hear this from Lustre as it already played around with
> 1000 nodes. When I came down to design a distributed file system end up
> blowing my head about this. Thanks for comments or may yo give some
> directions for me as I am very interested in this topic.
>
> regards,
> David Chow

Hi David,

Lustre recovers from client failures, and clients recover from server
failures, but Lustre does not allow disconnected operation. Disconnected
operation refers to the ability to make updates while the clients are
not connected to the servers.

The Lustre architecture does allow for modular extensions that enable
disconnected operation, but no customers have asked for it yet.

We have designed a very simple, automatic algorithm for handling
conflicts arising during disconnected operation for InterMezzo; see the
paper on www.inter-mezzo.org. Again, this could be implemented for
Lustre, but we are waiting for a contract before we do this.

Disconnected operation would involve a client cache, which will be many
times slower than the distributed network infrastructure when the file
sizes exceed what can be cached in memory on the client. Such file sizes
are unusual, but quite important for supercomputing and some industrial
applications.

- Peter -