From: Hannes Reinecke
To: Stefan Hajnoczi
Cc: Johannes Thumshirn, Paolo Bonzini, qemu-devel@nongnu.org, Alexander Graf
Subject: Re: [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
Date: Tue, 15 Dec 2015 07:49:42 +0100
Message-ID: <566FB806.2070301@suse.de>
In-Reply-To: <20151215030257.GA30291@stefanha-x1.localdomain>
References: <1448636346-24641-1-git-send-email-hare@suse.de>
 <20151210082641.GA4222@stefanha-x1.localdomain>
 <5669422D.5080400@suse.de>
 <20151214072425.GB5027@stefanha-x1.localdomain>
 <566E714F.1010706@suse.de>
 <20151215030257.GA30291@stefanha-x1.localdomain>

On 12/15/2015 04:02 AM, Stefan Hajnoczi wrote:
> On Mon, Dec 14, 2015 at 08:35:43AM +0100, Hannes Reinecke wrote:
>> On 12/14/2015 08:24 AM, Stefan Hajnoczi wrote:
>>> On Thu, Dec 10, 2015 at 10:13:17AM +0100, Hannes Reinecke wrote:
>>>> On 12/10/2015 09:26 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
>>>>>> here's now an updated version to enable ALUA and simplified
>>>>>> active/passive multipath support for qemu.
>>>>>>
>>>>>> This patchset relies on having _two_ block devices configured,
>>>>>> and two SCSI disks pointing to those block devices with the
>>>>>> _same_ 'wwn' property and unique 'port_group' properties.
>>>>>> I know, this is a bit of a nasty hack, but I hope to add
>>>>>> proper multipath support (with several SCSI devices pointing /
>>>>>> linking to the same block device) in the near future.
>>>>>>
>>>>>> It also implements an 'alua_policy', which allows for simulating
>>>>>> an 'active/passive' multipath setup.
>>>>>>
>>>>>> And for testing I've implemented a 'block_disconnect' HMP command,
>>>>>> which simulates a link failure for the attached devices.
>>>>>>
>>>>>> I wouldn't object if someone declares this a gross hack, but with
>>>>>> it I can finally simulate real-life multipath failover and do
>>>>>> some functional multipath-tools testing without having to resort
>>>>>> to using real hardware.
>>>>>
>>>>> I'm not familiar with how ALUA works but have been thinking about a
>>>>> multipath problem:
>>>>>
>>>>> If the host has SCSI disks that are marked 'offline' then QEMU will
>>>>> refuse to start up since it cannot open the block device (ENXIO).
>>>>>
>>>> Define 'offline'.
>>>> If this means the ALUA state 'offline' then we wouldn't have to worry; ALUA
>>>> state 'offline' essentially means "Yeah, there's something here, but I won't
>>>> tell you and you cannot access it."
>>>> And any transitions to and from 'offline' are essentially vendor-specific.
>>>> In short: Do not use it.
>>>>
>>>> If OTOH it means the 'block_disconnect' state this is something which
>>>> should/needs to be implemented in the HBA emulation for simulating
>>>> a link failure.
>>>> qemu itself should be able to access the device and it should start up
>>>> perfectly normally, so we shouldn't get any ENXIO errors.
>>>>
>>>> (Obviously, if _all_ disks are in 'disconnect' state the guest wouldn't
>>>> start up as it cannot read any data. But that's beside the point.)
>>>
>>> I'm referring to scsi_device_set_state(scmd->device, SDEV_OFFLINE) in
>>> Linux. This is the state where the host block device cannot be opened
>>> or accessed.
>>>
>> Which means the device is declared dead by the SCSI stack.
>> And qemu does _very_ well not to start in these circumstances.
>>
>> However, this behaviour is neither influenced nor modified by the ALUA patchset
>> but is rather a different topic.
>>
>>
>> Setting a device 'offline' is the final step in SCSI EH, which means that SCSI EH has
>> exhausted its options and doesn't know how to fix the device.
>> However, in modern systems this typically happens when SCSI EH kicks in
>> during a (transport) link disconnect, as then every single step in SCSI EH
>> will fail. (Which also means that SCSI EH is woefully inadequate for FC, but
>> that's a different topic.)
>> But as this is a transport issue, _all_ respective drivers should be aware
>> of this, and should have been modified _not_ to start SCSI EH when the
>> transport link is severed.
>> So the very fact that SCSI EH is started means that there's an issue with
>> the driver, which really needs to be fixed first.
>> Hence I think qemu is right here, as the underlying reason for the 'offline'
>> device should be fixed first.
>>
>
> Interesting, thanks for explaining.
>
And thinking about it some more, we _could_ map the SCSI 'offline'
state onto ALUA 'offline', and allow qemu to start nevertheless.
(It could get the interesting bits like INQUIRY from sysfs, so it
doesn't _actually_ have to do I/O on startup).
Then we could have an HMP/QMP command for resetting the SCSI status
back to 'running', which should allow I/O to start properly.

Hmm. Lemme see ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   zSeries & Storage
hare@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)