Message-ID: <536BA689.4050601@kamp.de>
Date: Thu, 08 May 2014 17:45:13 +0200
From: Peter Lieven
References: <1398956086-20171-1-git-send-email-stefanha@redhat.com> <1398956086-20171-9-git-send-email-stefanha@redhat.com> <20140507100745.GD1771@stefanha-thinkpad.muc.redhat.com> <536A0AF8.7030600@redhat.com> <536A3E97.3000503@kamp.de> <20140508113348.GD10610@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
To: ronnie sahlberg , Stefan Hajnoczi
Cc: Kevin Wolf , Stefan Hajnoczi , "Shergill, Gurinder" , qemu-devel , Paolo Bonzini , "Vinod, Chegu"

On 08.05.2014 16:52, ronnie sahlberg wrote:
> On Thu, May 8, 2014 at 4:33 AM, Stefan Hajnoczi wrote:
>> On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote:
>>> On 07.05.2014 12:29, Paolo Bonzini wrote:
>>>> On 07/05/2014 12:07, Stefan Hajnoczi wrote:
>>>>> On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>>>>>>> +static void iscsi_attach_aio_context(BlockDriverState *bs,
>>>>>>> +                                     AioContext *new_context)
>>>>>>> +{
>>>>>>> +    IscsiLun *iscsilun = bs->opaque;
>>>>>>> +
>>>>>>> +    iscsilun->aio_context = new_context;
>>>>>>> +    iscsi_set_events(iscsilun);
>>>>>>> +
>>>>>>> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>>>>>>> +    /* Set up a timer for sending out iSCSI NOPs */
>>>>>>> +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>>>>>>> +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>>>>>>> +                                        iscsi_nop_timed_event, iscsilun);
>>>>>>> +    timer_mod(iscsilun->nop_timer,
>>>>>>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>>>>>>> +#endif
>>>>>>> +}
>>>>>> Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
>>>>>> while we are in another function/callback of the iscsi driver for the same target?
>>>> Yes, since the timer is in the same AioContext as the iscsi driver callbacks.
>>>
>>> Ok. Stefan: What MUST NOT happen is that the timer fires while we are in iscsi_service().
>>> As Paolo outlined, this cannot happen, right?
>> Okay, I think we're safe then. The timer can only be invoked during
>> aio_poll() event loop iterations. It cannot be invoked while we're
>> inside iscsi_service().
>>
>>>>> BTW, is iscsi_reconnect() the right libiscsi interface to use since it
>>>>> is synchronous? It seems like this would block QEMU until the socket
>>>>> has connected! The guest would be frozen.
>>>> There is no asynchronous interface yet for reconnection, unfortunately.
>>> We initiate the reconnect after we miss a few NOP replies. So the target has already been down for approx. 30 seconds.
>>> Every process inside the guest is already hanging or has timed out.
>>>
>>> If I understand correctly, with the new patches only the communication with this target hangs, doesn't it?
>>> So what benefit would an asynchronous reconnect have?
>> Asynchronous reconnect is desirable:
>>
>> 1. The QEMU monitor is blocked while we're waiting for the iSCSI target
>>    to accept our reconnect. This means the management stack (libvirt)
>>    cannot control QEMU until we time out or succeed.
>>
>> 2. The guest is totally frozen - cannot execute instructions - because
>>    it will soon reach a point in the code that locks the QEMU global
>>    mutex (which is being held while we reconnect to the iSCSI target).
>>
>>    This may be okayish for guests where the iSCSI LUN contains the
>>    "main" data that is being processed. But what if an iSCSI LUN was
>>    just attached to a guest that is also doing other things that are
>>    independent (e.g. serving a website, processing data from a local
>>    disk, etc) - now the reconnect causes downtime for the entire guest.
> I will look into making the reconnect async over the next few days.

Thanks for looking into this. I have a few things in mind that I will post on GitHub to the issue you created.

Peter
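
As an illustration of the point above that the NOP timer can only run during event loop iterations and never inside iscsi_service(), here is a minimal toy model in C. It is a sketch under stated assumptions, not QEMU or libiscsi code: ToyContext, ToyLun, toy_service(), toy_nop_timed_event(), toy_poll() and toy_run() are hypothetical names, and the clock is faked. The only idea it borrows from the thread is that the fd handler and the timer callback are dispatched one after another from the same single-threaded loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, none is a QEMU API. */

typedef void ToyCallback(void *opaque);

typedef struct ToyContext {
    ToyCallback *fd_handler;  /* stands in for the iSCSI socket handler */
    ToyCallback *timer_cb;    /* stands in for the NOP timer callback */
    void *opaque;
    long now_ms;              /* fake clock */
    long timer_expire_ms;     /* absolute deadline; < 0 means disarmed */
} ToyContext;

typedef struct ToyLun {
    ToyContext *ctx;
    bool in_service;          /* true while toy_service() is running */
    int nops_sent;            /* how often the timer callback ran */
    int overlaps;             /* timer callbacks seen inside toy_service() */
} ToyLun;

/* Stand-in for iscsi_service(): busy for 10 ms of fake time. */
static void toy_service(void *opaque)
{
    ToyLun *lun = opaque;

    lun->in_service = true;
    lun->ctx->now_ms += 10;   /* the timer deadline may pass while busy */
    lun->in_service = false;
}

/* Stand-in for iscsi_nop_timed_event(): count the NOP and re-arm. */
static void toy_nop_timed_event(void *opaque)
{
    ToyLun *lun = opaque;

    if (lun->in_service) {
        lun->overlaps++;      /* would mean we re-entered the driver */
    }
    lun->nops_sent++;
    lun->ctx->timer_expire_ms = lun->ctx->now_ms + 5;
}

/* One loop iteration: callbacks run sequentially on a single thread. */
static void toy_poll(ToyContext *ctx)
{
    ctx->fd_handler(ctx->opaque);
    if (ctx->timer_expire_ms >= 0 && ctx->now_ms >= ctx->timer_expire_ms) {
        ctx->timer_cb(ctx->opaque);
    }
}

/* Run the loop for a number of iterations and return the final counters. */
ToyLun toy_run(int iterations)
{
    ToyContext ctx = { toy_service, toy_nop_timed_event, NULL, 0, 5 };
    ToyLun lun = { &ctx, false, 0, 0 };

    ctx.opaque = &lun;
    for (int i = 0; i < iterations; i++) {
        toy_poll(&ctx);
    }
    lun.ctx = NULL;           /* ctx does not outlive this function */
    return lun;
}
```

Calling toy_run(4) from a small main() and inspecting the returned counters shows the expired timer being handled only after toy_service() has returned. The same toy also hints at why a blocking reconnect is a problem: anything that blocks inside a callback stalls every other handler in that loop.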