From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mescal.linbit (213-229-1-138.sdsl-line.inode.at [213.229.1.138])
    by mail.linbit.com (LINBIT Mail Daemon) with ESMTP id 3BC4614301
    for ; Mon, 4 Oct 2004 15:25:40 +0200 (CEST)
From: Philipp Reisner
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] How Locking in GFS works...
Date: Mon, 4 Oct 2004 15:26:15 +0200
References: <200410041456.21841.philipp.reisner@linbit.com>
    <20041004130158.GP1542@marowsky-bree.de>
In-Reply-To: <20041004130158.GP1542@marowsky-bree.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
Message-Id: <200410041526.15189.philipp.reisner@linbit.com>
List-Id: Coordination of development

On Monday 04 October 2004 15:01, Lars Marowsky-Bree wrote:
> On 2004-10-04T14:56:21, Philipp Reisner wrote:
>
> > This is intended as food for thought on how we should design our
> > support for shared disk file systems.
>
> I'm still not sure what kind of special support you need. The only
> guarantee you need to provide is that after a barrier all reads on all
> nodes return the same data for those blocks affected by the flush.
>
> The shared disk file system itself will take care of issuing
> appropriate barriers and flushing the OS caches.
>
> Am I missing something? ;-)
>

If everything works (esp. the locking of the shared disk fs), no.

But just consider that the locking of the shared disk FS on top of us
is broken, and that it issues a write request to the same block number
on both nodes.

Then each node would write its own copy first and the peer's version of
the data second to that block number.

=> We would have different data in this block on our two copies.
   - And we would not even know about it!

What would have happened on a real shared disk?
The real shared disk would have applied the two writes in some order,
and one of the writes would overwrite the other version.
(This is the basic design idea of proposed solution 1.)

(For proposed solution 2 the lock "granularity" of the shared disk FS is
interesting...)

--snip from ROADMAP file--
  global write order

  As far as I understand the topic up to now, we have two options
  to establish a global write order.

  Proposed Solution 1, using the order of a coordinator node:

  Writes from the coordinator node are carried out as they are
  carried out on the primary node in conventional DRBD. (Write
  to disk and send to peer simultaneously.)

  Writes from the other node are sent to the coordinator first;
  then the coordinator inserts a small "write-now" packet into
  its stream of write packets.
  The node commits the write to its local IO subsystem as soon
  as it gets the "write-now" packet from the coordinator.

  Note: With protocol C it does not matter which node is the
        coordinator from the performance viewpoint.

  Proposed Solution 2, use a dedicated LRU to implement locking:

  Each extent in the locking LRU can have one of these states:
    requested
    locked-by-peer
    locked-by-me
    locked-by-me-and-requested-by-peer

  We allow application writes only to extents which are in
  locked-by-me* state.

  New Packets:
    LockExtent
    LockExtentAck

  Configuration directives: dl-extents, dl-extent-size

  TODO: Need to verify with GFS that this makes sense.

-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
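
To make the ordering problem and proposed solution 1 concrete, here is a
minimal Python sketch. It is only a toy model, not DRBD code: the function
names, the dict-as-disk representation and the tuple layout are all
hypothetical and chosen for illustration. The first function mimics two
nodes that each apply their own write first and the peer's write second;
the second function applies every write on both nodes in the single order
defined by the coordinator's packet stream.

```python
def naive_replicate(write_a, write_b):
    """Toy model: each node applies its own write first, then the peer's.

    write_a / write_b are (block_number, data) tuples (hypothetical layout).
    Returns the resulting contents of both copies as dicts.
    """
    block_a, data_a = write_a
    block_b, data_b = write_b
    disk_a, disk_b = {}, {}
    disk_a[block_a] = data_a  # node A: local write first
    disk_b[block_b] = data_b  # node B: local write first
    disk_a[block_b] = data_b  # node A: peer's write second
    disk_b[block_a] = data_a  # node B: peer's write second
    return disk_a, disk_b


def coordinator_replicate(arrivals):
    """Toy model of solution 1: the coordinator's stream defines the order.

    arrivals is a list of (origin, block_number, data) in the order the
    coordinator sees them; origin is "coordinator" or "other".
    A write from the other node shows up in the stream as a "write-now"
    marker, and only then is it committed on either node.
    """
    stream = []
    for origin, block, data in arrivals:
        kind = "data" if origin == "coordinator" else "write-now"
        stream.append((kind, block, data))

    disk_coord, disk_other = {}, {}
    for _kind, block, data in stream:
        # Both nodes commit in stream order, so both copies agree.
        disk_coord[block] = data
        disk_other[block] = data
    return disk_coord, disk_other


if __name__ == "__main__":
    a, b = naive_replicate((7, "from-A"), (7, "from-B"))
    print("naive copies identical:", a == b)

    c, o = coordinator_replicate(
        [("coordinator", 7, "from-A"), ("other", 7, "from-B")]
    )
    print("coordinator copies identical:", c == o)
```

In the naive case the two copies end up with different data in block 7;
with the coordinator's order both copies hold whichever write came last in
the stream, which is what a real shared disk would give us.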