From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:32850)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1SURdh-00046A-QE
	for qemu-devel@nongnu.org; Tue, 15 May 2012 19:58:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1SURdf-0008E8-Ox
	for qemu-devel@nongnu.org; Tue, 15 May 2012 19:58:49 -0400
Received: from mail-ob0-f173.google.com ([209.85.214.173]:46142)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1SURdf-0008Df-Jw
	for qemu-devel@nongnu.org; Tue, 15 May 2012 19:58:47 -0400
Received: by obbwd20 with SMTP id wd20so250750obb.4
	for <qemu-devel@nongnu.org>; Tue, 15 May 2012 16:58:45 -0700 (PDT)
Message-ID: <4FB2EDB2.5050305@codemonkey.ws>
Date: Tue, 15 May 2012 18:58:42 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
References: <1336625347-10169-1-git-send-email-benh@kernel.crashing.org>
	<1336625347-10169-9-git-send-email-benh@kernel.crashing.org>
	<4FB1A80C.1010103@codemonkey.ws>
	<20120515014204.GE30229@truffala.fritz.box>
	<4FB1B95A.20209@codemonkey.ws> <1337049166.6727.32.camel@pasglop>
	<4FB1C480.1030408@codemonkey.ws> <1337050942.6727.40.camel@pasglop>
	<4FB26212.5050409@codemonkey.ws> <1337118943.6727.93.camel@pasglop>
	<4FB2D291.1050003@codemonkey.ws>
	<1337123324.6727.101.camel@pasglop>
In-Reply-To: <1337123324.6727.101.camel@pasglop>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation
	infrastructure
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alex Williamson <alex.williamson@redhat.com>, Richard Henderson <rth@twiddle.net>, "Michael S. Tsirkin" <mst@redhat.com>, qemu-devel@nongnu.org, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

On 05/15/2012 06:08 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2012-05-15 at 17:02 -0500, Anthony Liguori wrote:
>>
>> "6.2.1 Register Based Invalidation Interface
>> The register based invalidations provides a synchronous hardware interface for
>> invalidations.  Software is expected to write to the IOTLB registers to submit
>> invalidation command and may poll on these registers to check for invalidation
>> completion. For optimal performance, hardware implementations are recommended to
>> complete an invalidation request with minimal latency"
>>
>> This makes perfect sense.  You write to an MMIO location to request invalidation
>> and then *poll* on a separate register for completion.
>>
>> It's not a single MMIO operation that has an indefinitely return duration.
>
> Sure, it's an implementation detail, I never meant that it had to be a
> single blocking register access, all I said is that the HW must provide
> such a mechanism that is typically used synchronously by the operating
> system. Polling for completion is a perfectly legit way to do it, that's
> how we do it on the Apple G5 "DART" iommu as well.
>
> The fact that MMIO operations can block is orthogonal, it is possible
> however, especially with ancient PIO devices.

Even ancient PIO devices really don't block indefinitely.

> In our case (TCEs) it's a hypervisor call, not an MMIO op, so to some
> extent it's even more likely to do "blocking" things.

Yes, so I think the right thing to do is not model hypercalls for sPAPR as 
synchronous calls but rather as asynchronous calls.  Obviously, simple ones can 
use a synchronous implementation...

This is a matter of setting hlt=1 before dispatching the hypercall and passing a 
continuation to the call that, when executed, prepares the CPUState for the 
hypercall return and then sets hlt=0 to resume the CPU.
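
As a rough sketch of that flow (toy types only, not actual QEMU code: ToyCPU, 
HcallContinuation, dispatch_hcall and the toy_* helpers are all made-up names, 
and using a single return register is an assumption):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUState; only the fields the sketch needs. */
typedef struct ToyCPU {
    bool halted;      /* plays the role of hlt */
    uint64_t ret_reg; /* where the hypercall return value lands (assumed) */
} ToyCPU;

/* Continuation handed to the hypercall implementation. */
typedef struct HcallContinuation {
    ToyCPU *cpu;
    void (*complete)(struct HcallContinuation *k, uint64_t ret);
} HcallContinuation;

typedef void (*AsyncHcallFn)(HcallContinuation *k);

/* Prepare the CPU for the hypercall return, then resume it. */
static void hcall_complete(HcallContinuation *k, uint64_t ret)
{
    k->cpu->ret_reg = ret;
    k->cpu->halted = false;
}

/* Halt the CPU first, then dispatch; the hypercall resumes it via k. */
static void dispatch_hcall(ToyCPU *cpu, HcallContinuation *k, AsyncHcallFn fn)
{
    cpu->halted = true;
    k->cpu = cpu;
    k->complete = hcall_complete;
    fn(k);
}

/* A simple hypercall can just complete before returning. */
static void toy_sync_hcall(HcallContinuation *k)
{
    k->complete(k, 0);
}

/* A slow hypercall stashes the continuation and returns immediately. */
static HcallContinuation *toy_pending;

static void toy_deferred_hcall(HcallContinuation *k)
{
    toy_pending = k;
}

/* Called later, e.g. from whatever event signals that the work is done. */
static void toy_deferred_finish(uint64_t ret)
{
    if (toy_pending != NULL) {
        HcallContinuation *k = toy_pending;
        toy_pending = NULL;
        k->complete(k, ret);
    }
}
```

The synchronous case degenerates to halting and resuming within the same 
dispatch, so simple hypercalls would not need to change shape.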

> It would have been possible to implement a "busy" return status with the
> guest having to try again, unfortunately that's not how Linux has
> implemented it, so we are stuck with the current semantics.
>
> Now, if you think that dropping the lock isn't good, what do you reckon
> I should do ?

Add a reference count to dma map calls and a flush_pending flag.  If 
flush_pending && ref > 0, return NULL for all map calls.

Decrement ref on unmap and if ref == 0 and flush_pending, clear flush_pending. 
You could add a flush_notifier too for this event.

dma_flush() sets flush_pending if ref > 0.  Your TCE flush hypercall would 
register for flush notifications and squirrel away the hypercall completion 
continuation.
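
Something along these lines, as a minimal sketch (ToyDMAContext and the toy_* 
functions are hypothetical names, not the API from the patch series, and a 
single notifier slot is an assumption made to keep it short):

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*FlushNotifyFn)(void *opaque);

/* Hypothetical per-IOMMU state: outstanding maps plus a pending flush. */
typedef struct ToyDMAContext {
    unsigned ref;               /* number of live dma map calls */
    bool flush_pending;         /* a flush is waiting for ref to reach 0 */
    FlushNotifyFn flush_notify; /* fired when the pending flush completes */
    void *flush_opaque;
} ToyDMAContext;

/* Refuse new mappings while a flush is waiting on existing ones. */
static void *toy_dma_map(ToyDMAContext *dma, void *host_ptr)
{
    if (dma->flush_pending && dma->ref > 0) {
        return NULL;
    }
    dma->ref++;
    return host_ptr;
}

/* Drop a reference; the last unmap completes any pending flush. */
static void toy_dma_unmap(ToyDMAContext *dma)
{
    if (dma->ref == 0) {
        return; /* unbalanced unmap, ignored in this sketch */
    }
    dma->ref--;
    if (dma->ref == 0 && dma->flush_pending) {
        dma->flush_pending = false;
        if (dma->flush_notify != NULL) {
            dma->flush_notify(dma->flush_opaque);
        }
    }
}

/* Returns true if the flush completed immediately, false if deferred. */
static bool toy_dma_flush(ToyDMAContext *dma, FlushNotifyFn fn, void *opaque)
{
    if (dma->ref > 0) {
        dma->flush_pending = true;
        dma->flush_notify = fn;
        dma->flush_opaque = opaque;
        return false;
    }
    return true;
}

/* Example notifier: counts completions through an int passed as opaque. */
static void toy_count_notify(void *opaque)
{
    (*(int *)opaque)++;
}
```

In the TCE case the notifier would be the place to run the stashed hypercall 
continuation instead of bumping a counter.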

VT-d actually has a concept of an invalidation completion queue which delivers 
interrupt-based notification of invalidation completion events.  The above 
flush_notifier would be the natural way to support this since in this case, there 
is no VCPU event that's directly involved in the completion event.

Regards,

Anthony Liguori

> Cheers,
> Ben.
>
>