From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:45198)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1T3N1q-0004FD-SF
	for qemu-devel@nongnu.org; Mon, 20 Aug 2012 04:08:07 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1T3N1q-0006l8-1D
	for qemu-devel@nongnu.org; Mon, 20 Aug 2012 04:08:06 -0400
Received: from mx1.redhat.com ([209.132.183.28]:11889)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1T3N1p-0006kq-Pg
	for qemu-devel@nongnu.org; Mon, 20 Aug 2012 04:08:05 -0400
Message-ID: <5031F060.1030602@redhat.com>
Date: Mon, 20 Aug 2012 10:08:00 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1345326543-10677-1-git-send-email-pbonzini@redhat.com>
	<50309BEE.3090602@profihost.ag> <5030E5F2.7060903@redhat.com>
	<50313D03.8040101@profihost.ag> <5031E5CC.7090306@redhat.com>
	<5031E86F.2000000@profihost.ag>
In-Reply-To: <5031E86F.2000000@profihost.ag>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH RFT 0/3] iscsi: fix NULL dereferences /
 races between task completion and abort
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: qemu-devel@nongnu.org, ronniesahlberg@gmail.com

On 20/08/2012 09:34, Stefan Priebe - Profihost AG wrote:
>>> Booting works fine now. But the VM starts to hang after trying to unmap
>>> large regions. No segfault or anything, it just stops responding.
>>
>> This is expected; unfortunately cancellation right now is a synchronous
>> operation in the block layer.  SCSI is the first big user of
>> cancellation, and it would indeed benefit from asynchronous cancellation.
>>
>> Without these three patches, you risk corruption in case the following
>> happens:
>>
>>      qemu                 target
>>    -----------------------------------
>>      send unmap -------->
>>      cancel unmap ------>
>>      send write -------->
>>         <---------------- complete write
>>                           <unmap just written sector>
>>         <---------------- complete unmap
>>         <---------------- cancellation done (unmap complete)
> 
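
The ordering in the diagram can be replayed with a toy model.  This is
only a sketch: apply_events, the one-sector dict standing in for the
disk, and the event tuples are names invented for illustration, not QEMU
or libiscsi code.  It shows why the write must not be sent until the
cancelled UNMAP has really finished on the target.

```python
def apply_events(events):
    """Apply target-side completions in order; return the final disk contents.

    Hypothetical model: the disk is a dict mapping sector number to data,
    and an unmapped sector is represented by None.
    """
    disk = {0: b"old"}
    for op, sector, data in events:
        if op == "write":
            disk[sector] = data
        elif op == "unmap":
            disk[sector] = None  # contents of an unmapped sector are discarded
    return disk


# Order from the diagram: the write completes first, then the "cancelled"
# UNMAP still runs on the target and discards the sector just written.
racy = [("write", 0, b"new"), ("unmap", 0, None)]

# Order when the initiator waits for the cancellation to finish (that is,
# for the UNMAP to complete) before it sends the write.
safe = [("unmap", 0, None), ("write", 0, b"new")]

print(apply_events(racy)[0])  # None: the new data is lost
print(apply_events(safe)[0])  # b'new'
```
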
> Hmm, OK, that makes sense. But I cannot even log in via SSH

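That's because the "big QEMU lock" is held by the thread that called
qemu_aio_cancel.  Cancellation is synchronous, so that thread keeps the
lock until the UNMAP completes, and everything else that needs the lock
has to wait.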
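
The effect of holding a global lock across a synchronous wait can be
reproduced in a few lines.  This is a rough analogy under stated
assumptions, not QEMU's implementation: big_lock, unmap_done and the
helper functions are hypothetical stand-ins, and the 0.2 second timeout
only exists so the demonstration terminates.

```python
import threading


def demo():
    big_lock = threading.Lock()      # stand-in for the "big QEMU lock"
    lock_taken = threading.Event()   # set once the canceller holds the lock
    unmap_done = threading.Event()   # set when the target completes the UNMAP

    def synchronous_cancel():
        # The canceller holds the lock for the whole wait.
        with big_lock:
            lock_taken.set()
            unmap_done.wait()

    def other_thread_can_run():
        # Any other work that needs the lock; give up after 0.2 seconds.
        got = big_lock.acquire(timeout=0.2)
        if got:
            big_lock.release()
        return got

    canceller = threading.Thread(target=synchronous_cancel)
    canceller.start()
    lock_taken.wait()                # make sure the lock is really held

    ran_during_cancel = other_thread_can_run()
    unmap_done.set()                 # the UNMAP finally completes
    canceller.join()
    ran_after_cancel = other_thread_can_run()
    return ran_during_cancel, ran_after_cancel


print(demo())  # (False, True)
```

While the cancel is pending, the second caller cannot get the lock; once
the UNMAP completes, it can.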

> and I also see
> no cancellation message in the kernel log.

And that's because the UNMAP does eventually succeed.  You'll probably
see soft lockup messages, though.

The solution here is to raise the timeout of the UNMAP command (either in
the kernel or in libiscsi; I haven't worked out which of the two is at
fault).

Paolo