Message-ID: <4FB2EDB2.5050305@codemonkey.ws>
In-Reply-To: <1337123324.6727.101.camel@pasglop>
Date: Tue, 15 May 2012 18:58:42 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure
To: Benjamin Herrenschmidt
Cc: Alex Williamson, Richard Henderson, "Michael S.
    Tsirkin", qemu-devel@nongnu.org, Eduard - Gabriel Munteanu

On 05/15/2012 06:08 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2012-05-15 at 17:02 -0500, Anthony Liguori wrote:
>>
>> "6.2.1 Register Based Invalidation Interface
>> The register based invalidations provides a synchronous hardware interface for
>> invalidations. Software is expected to write to the IOTLB registers to submit
>> invalidation command and may poll on these registers to check for invalidation
>> completion. For optimal performance, hardware implementations are recommended to
>> complete an invalidation request with minimal latency"
>>
>> This makes perfect sense. You write to an MMIO location to request invalidation
>> and then *poll* on a separate register for completion.
>>
>> It's not a single MMIO operation that has an indefinitely return duration.
>
> Sure, it's an implementation detail, I never meant that it had to be a
> single blocking register access, all I said is that the HW must provide
> such a mechanism that is typically used synchronously by the operating
> system. Polling for completion is a perfectly legit way to do it, that's
> how we do it on the Apple G5 "DART" iommu as well.
>
> The fact that MMIO operations can block is orthogonal, it is possible
> however, especially with ancient PIO devices.

Even ancient PIO devices really don't block indefinitely.

> In our case (TCEs) it's a hypervisor call, not an MMIO op, so to some
> extent it's even more likely to do "blocking" things.

Yes, so I think the right thing to do is to model sPAPR hypercalls not as
synchronous calls but as asynchronous ones. Obviously, simple ones can still
use a synchronous implementation.

This is a matter of setting hlt=1 before dispatching the hypercall and passing
the call a continuation that, when executed, prepares the CPUState for the
hypercall return and then sets hlt=0 to resume the CPU.
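A minimal C sketch of the asynchronous-hypercall idea described above. All names (`VCPU`, `HcallCompletion`, `dispatch_hcall`, `hcall_complete`, the two example handlers) are hypothetical and are not QEMU's real CPUState or sPAPR hypercall API; the `halted` field stands in for the hlt flag. The vCPU is parked before dispatch, and the continuation stores the return value and unparks it, either inline (simple hypercalls) or later (slow ones).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU state; not QEMU's real CPUState. */
typedef struct VCPU {
    bool halted;        /* plays the role of hlt in the text */
    uint64_t ret;       /* register carrying the hypercall return value */
} VCPU;

/* Continuation handed to a hypercall; must be completed exactly once. */
typedef struct HcallCompletion {
    VCPU *cpu;
} HcallCompletion;

/* Prepare the CPU state for the hypercall return, then resume the vCPU. */
static void hcall_complete(HcallCompletion *c, uint64_t ret)
{
    assert(c->cpu->halted);   /* only a parked vCPU has a pending call */
    c->cpu->ret = ret;
    c->cpu->halted = false;   /* hlt=0: the vCPU may run again */
}

typedef void (*HcallHandler)(uint64_t arg, HcallCompletion *done);

/* Park the vCPU (hlt=1), then hand the handler its continuation. */
static void dispatch_hcall(VCPU *cpu, HcallHandler handler, uint64_t arg,
                           HcallCompletion *done)
{
    cpu->halted = true;
    done->cpu = cpu;
    handler(arg, done);       /* may complete now or at some later point */
}

/* A simple hypercall: completes synchronously, inside the handler. */
static void hcall_get_value(uint64_t arg, HcallCompletion *done)
{
    hcall_complete(done, arg * 2);
}

/* A slow hypercall (e.g. a TCE flush): stashes the continuation. */
static HcallCompletion *pending_flush;

static void hcall_flush(uint64_t arg, HcallCompletion *done)
{
    (void)arg;
    pending_flush = done;     /* whoever finishes the flush completes it */
}
```

With this shape, a slow device model simply calls `hcall_complete()` from its own completion path rather than inline, and the vCPU stays halted until then without any lock being held across the wait.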
> It would have been possible to implement a "busy" return status with the
> guest having to try again, unfortunately that's not how Linux has
> implemented it, so we are stuck with the current semantics.
>
> Now, if you think that dropping the lock isn't good, what do you reckon
> I should do ?

Add a reference count to DMA map calls and a flush_pending flag. If
flush_pending && ref > 0, return NULL for all map calls.

Decrement ref on unmap, and if ref == 0 and flush_pending, clear
flush_pending. You could add a flush_notifier too for this event.

dma_flush() sets flush_pending if ref > 0. Your TCE flush hypercall would
register for flush notifications and squirrel away the hypercall completion
continuation.

VT-d actually has the concept of an invalidation completion queue, which
delivers interrupt-based notification of invalidation completion events. The
flush notifier above would be the natural way to support this, since in that
case there is no VCPU event directly involved in the completion.

Regards,

Anthony Liguori
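A hedged C sketch of the reference-count / flush_pending scheme proposed above. The type `DmaCtx`, the functions `dma_map`, `dma_unmap`, `dma_flush`, the single-notifier field and the `count_flush` example notifier are hypothetical illustrations, not QEMU's actual DMA API; a fixed buffer stands in for guest memory, and only one flush may be outstanding at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types and names; a sketch, not QEMU's DMA API. */
typedef void (*FlushNotifyFn)(void *opaque);

typedef struct DmaCtx {
    unsigned ref;             /* mappings from dma_map() not yet unmapped */
    bool flush_pending;       /* a flush is waiting for ref to reach 0 */
    FlushNotifyFn notify;     /* one registered flush notifier */
    void *notify_opaque;
    unsigned char mem[4096];  /* stand-in for guest memory */
} DmaCtx;

/* Finish the flush and fire the notifier exactly once. */
static void dma_flush_complete(DmaCtx *dma)
{
    FlushNotifyFn fn = dma->notify;

    dma->flush_pending = false;
    dma->notify = NULL;
    if (fn) {
        fn(dma->notify_opaque);
    }
}

/* Refuse new mappings while a flush is draining the existing ones. */
static void *dma_map(DmaCtx *dma, size_t offset)
{
    if ((dma->flush_pending && dma->ref > 0) || offset >= sizeof(dma->mem)) {
        return NULL;
    }
    dma->ref++;
    return &dma->mem[offset];
}

static void dma_unmap(DmaCtx *dma)
{
    assert(dma->ref > 0);
    dma->ref--;
    if (dma->ref == 0 && dma->flush_pending) {
        dma_flush_complete(dma);  /* last mapping gone: flush is done */
    }
}

/*
 * Start a flush. The notifier runs exactly once: immediately if nothing
 * is mapped (returns true), otherwise from the dma_unmap() that drops the
 * last reference (returns false).
 */
static bool dma_flush(DmaCtx *dma, FlushNotifyFn notify, void *opaque)
{
    assert(!dma->flush_pending);  /* one flush at a time in this sketch */
    dma->notify = notify;
    dma->notify_opaque = opaque;
    if (dma->ref > 0) {
        dma->flush_pending = true;
        return false;
    }
    dma_flush_complete(dma);
    return true;
}

/* Example notifier; a TCE flush hypercall would resume its continuation. */
static void count_flush(void *opaque)
{
    (*(int *)opaque)++;
}
```

In the sPAPR case, the notifier passed by the TCE flush hypercall would run the stashed hypercall completion continuation, so the vCPU stays parked only until in-flight DMA has been unmapped, and no lock needs to be dropped while waiting.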