From: Christophe Varoqui
Subject: Re: persistent reservation behaviour with dm-multipath
Date: Wed, 23 Jul 2008 23:09:44 +0200
Message-ID: <1216847384.9122.15.camel@plop>
References: <1216457992.7364.15.camel@plop> <48879489.6060401@torque.net>
In-Reply-To: <48879489.6060401@torque.net>
List-Id: linux-scsi@vger.kernel.org
To: dougg@torque.net
Cc: linux-scsi@vger.kernel.org

On Wednesday 23 July 2008 at 16:28 -0400, Douglas Gilbert wrote:
> Christophe Varoqui wrote:
> > The current dm-multipath behaviour is a potent data corrupter
> > on Persistent Reservation-based clusters sharing multipaths with the
> > queue_if_no_path feature on (Clariion, Storageworks, ...).
> >
> > Consider the following scenario:
> >
> > - Node A takes a write-exclusive persistent reservation on LU
> > - Node B submits a write io to LU, which is a sda-sdb multipath
> > - B dm_multipath routes the wio to sda, the wio is failed, the path is
> >   marked failed
> > - B dm_multipath routes the wio to sdb, the wio is failed, the last
> >   path is marked failed
> > - B queues the wio because of the queue_if_no_path feature. Process
> >   submitting the wio is stuck in D-state.
> > - A releases the reservation. Queued wios are unqueued, corrupting the
> >   data on LU.
> >
> > I suspect wio returning a "reservation conflict" status should never be
> > queued.
> >
> > DM suspend/resume on the multipath devmap effectively flushes the queue,
> > but this solution leaves a window open for data corruption, between io
> > enqueue and user-space driven queue flush.
> >
> > Is there work in progress to address this issue yet? What would be an
> > acceptable solution design (for example Mike Christie suggested in Aug
> > 2005 a scsi-to-blk error translation patch, which got nowhere)?
>
> If memory serves, a SCSI command status of RESERVATION
> CONFLICT did not find its way back to the sg driver API
> (and/or the command was retried). Is that still the case?
>

As far as I can tell, the scsi subsystem alone behaves as expected: a
wio on a reserved-by-other scsi device gets errored nicely, with no
retry and a clean message in the dmesg indicating the wio error cause.

The device-mapper multipath target, on the other hand, can be configured
to queue ios errored by the scsi layer. That is a desirable behaviour
when we know we face a transient all-paths-down situation (like a LU
trespass on a Clariion controller pair), but it is not so smart when the
io was errored due to a reservation conflict.

The problem here is how the scsi layer can instruct the multipath dm
driver not to queue an errored io. This is what Mike's patch tried to
address.

Regards,
cvaroqui
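
P.S. To make the race in the quoted scenario concrete, here is a toy
Python model of it. This is only a sketch under stated assumptions, not
dm-multipath or scsi code: the class names, the fail_fast_on_conflict
flag and reinstate_paths() are all hypothetical, and reinstate_paths()
merely stands in for whatever brings the paths back and unqueues the io
once node A has released the reservation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Set


class Status(Enum):
    GOOD = auto()
    RESERVATION_CONFLICT = auto()


@dataclass
class LogicalUnit:
    """Toy LU with at most one write-exclusive reservation holder."""
    holder: Optional[str] = None
    data: str = "written by A"

    def write(self, node: str, payload: str) -> Status:
        # A write from any node other than the holder is rejected.
        if self.holder is not None and self.holder != node:
            return Status.RESERVATION_CONFLICT
        self.data = payload
        return Status.GOOD


@dataclass
class Multipath:
    """Toy multipath map; every name here is hypothetical."""
    node: str
    lu: LogicalUnit
    paths: List[str]
    queue_if_no_path: bool = True
    fail_fast_on_conflict: bool = False  # hypothetical knob, not a real option
    failed: Set[str] = field(default_factory=set)
    queued: List[str] = field(default_factory=list)

    def submit(self, payload: str) -> str:
        for path in self.paths:
            if path in self.failed:
                continue
            status = self.lu.write(self.node, payload)
            if status is Status.GOOD:
                return "completed"
            if self.fail_fast_on_conflict:
                # Proposed behaviour: error the io, keep the path up.
                return "failed"
            # Behaviour described in the thread: the path is marked failed.
            self.failed.add(path)
        if self.queue_if_no_path:
            self.queued.append(payload)
            return "queued"
        return "failed"

    def reinstate_paths(self) -> None:
        # Stand-in for paths coming back: queued ios are resubmitted.
        self.failed.clear()
        pending, self.queued = self.queued, []
        for payload in pending:
            self.submit(payload)


def run_scenario(fail_fast: bool) -> str:
    lu = LogicalUnit(holder="A")              # A holds the reservation
    b = Multipath("B", lu, ["sda", "sdb"], fail_fast_on_conflict=fail_fast)
    b.submit("stale write from B")            # fails on sda, then on sdb
    lu.holder = None                          # A releases the reservation
    b.reinstate_paths()                       # queued ios are unqueued
    return lu.data
```

With fail_fast=False the stale write from B is queued and lands on the
LU after A releases, which is the corruption described above. With
fail_fast=True the write is errored back to the submitter and A's data
is left alone. How the real scsi layer would signal this to the dm
multipath target is exactly the open question of this thread; the flag
only marks where such a decision would sit.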