From: Avi Kivity
Subject: Re: Mask bit support's API
Date: Tue, 23 Nov 2010 14:47:33 +0200
Message-ID: <4CEBB7E5.5030601@redhat.com>
References: <201011231409.52666.sheng.yang@intel.com> <201011231435.55332.sheng.yang@intel.com> <4CEB7340.5060807@redhat.com> <201011231630.36895.sheng.yang@intel.com>
In-Reply-To: <201011231630.36895.sheng.yang@intel.com>
To: "Yang, Sheng"
Cc: Marcelo Tosatti, "Michael S. Tsirkin", "kvm@vger.kernel.org"

On 11/23/2010 10:30 AM, Yang, Sheng wrote:
> On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
> > On 11/23/2010 08:35 AM, Yang, Sheng wrote:
> > > On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
> > > > On 11/23/2010 08:09 AM, Yang, Sheng wrote:
> > > > > Hi Avi,
> > > > >
> > > > > I've proposed the following API for mask bit support.
> > > > >
> > > > > The main point is that QEmu can know which entries are enabled
> > > > > (by pci_enable_msix()). For enabled entries, the kernel owns
> > > > > them, including the MSI data/address and the mask bit (routing
> > > > > table and mask bitmap). QEmu should use the KVM_GET_MSIX_ENTRY
> > > > > ioctl to get them (and it can sync with them if it wants to do
> > > > > so).
> > > > >
> > > > > Before entries are enabled, QEmu can still use its own MSI
> > > > > table (because we don't keep this kind of information in the
> > > > > kernel, and it's unnecessary for the kernel).
> > > > >
> > > > > The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu queries an
> > > > > entry that doesn't exist in the kernel - or we can simply
> > > > > return -EINVAL for it.
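For reference, the proposed pair could be sketched along these lines. Only KVM_GET_MSIX_ENTRY, KVM_MSIX_FLAG_ENTRY and the _IOWR(KVMIO, 0x7d, ...) encoding appear in the thread; the struct layout, the SET ioctl number, and the helper below are all assumptions, not the real patch:

```c
/* Hedged sketch of the discussed interface.  The struct layout,
 * KVM_SET_MSIX_ENTRY's number, and the helper are assumptions. */
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define KVMIO 0xAE                    /* KVM ioctl type, from <linux/kvm.h> */
#define KVM_MSIX_FLAG_ENTRY (1u << 0) /* clear: entry not owned by kernel */

struct kvm_msix_entry {
    uint32_t assigned_dev_id;         /* assumed: which assigned device */
    uint32_t entry;                   /* assumed: MSI-X table index */
    uint32_t flags;                   /* KVM_MSIX_FLAG_* */
    uint32_t addr_lo;                 /* assumed: message address/data */
    uint32_t addr_hi;
    uint32_t data;
    uint32_t masked;                  /* assumed: mask bit state */
};

#define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)
/* Matching setter, as requested for live migration (number assumed): */
#define KVM_SET_MSIX_ENTRY _IOW(KVMIO, 0x7e, struct kvm_msix_entry)

/* QEmu-side check: does the kernel own this entry? */
static inline int kvm_msix_entry_enabled(const struct kvm_msix_entry *e)
{
    return (e->flags & KVM_MSIX_FLAG_ENTRY) != 0;
}
```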
> > > > >
> > > > > I suppose it would be rare for QEmu to use this interface to
> > > > > get the contents of an entry (the only case I can think of is
> > > > > when MSI-X is disabled and QEmu needs to sync the contents),
> > > > > so performance should not be an issue.
> > > > >
> > > > > What's your opinion?
> > > > >
> > > > > > #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)
> > > >
> > > > Need SET_MSIX_ENTRY for live migration as well.
> > >
> > > Currently we don't support LM with VT-d...
> >
> > Isn't this work useful for virtio as well?
>
> Yeah, but it won't be included in this patchset.

What API changes are needed? I'd like to see the complete API.

> > > > What about the pending bits?
> > >
> > > We didn't cover them here - they're in another MMIO space (the
> > > PBA). Of course we can add more flags for them later.
> >
> > When an entry is masked, we need to set the pending bit for it
> > somewhere. I guess this is broken in the existing code (without your
> > patches)?
>
> Even with my patch, we don't support the pending bit. It always
> returns 0 for now. What we're supposed to do (after my patch is
> checked in) is check the IRQ_PENDING flag of irq_desc->status (if the
> entry is masked), and return the result to userspace.
>
> That would involve some core changes, like exporting irq_to_desc(). I
> don't think that would be accepted soon, so I'd push the mask bit
> first.

The API needs to be compatible with the pending bit, even if we don't
implement it now. I want to reduce the rate of API changes.

> > > > Also need a new exit reason to tell userspace that an msix entry
> > > > has changed, so userspace can update mappings.
> > >
> > > I think we don't need it. Whenever userspace wants to get the
> > > mapping of an enabled MSI-X entry, it can check it with the API
> > > above (which is quite rare, because the kernel handles them while
> > > the guest is accessing them).
> > > For a disabled entry, the contents of userspace's MMIO record are
> > > the correct (and only) copy. The only place I think QEmu needs to
> > > sync is when MSI-X is about to be disabled; then QEmu needs to
> > > update its own MMIO record.
> >
> > So in-kernel handling of mmio would be decided per entry? I'm trying
> > to simplify this, and the simplest thing is - all or nothing.
>
> So you would like to handle all MSI-X MMIO in the kernel?

Yes. Writes to address or data would be handled by:

- recording them in the shadow msix table
- notifying userspace that msix entry x changed

Reads would be handled in the kernel from the shadow msix table.

So instead of

- guest reads/writes msix
- kvm filters mmio, implements some, passes others to userspace

we have

- guest reads/writes msix
- kvm implements all
- some writes generate an additional notification to userspace

-- 
error compiling committee.c: too many arguments to function
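The all-in-kernel flow above can be sketched as a tiny userspace model. This is a simulation of the design, not the real kvm code; the table geometry and all names are illustrative (MSI-X entries are four dwords, with the mask bit in the vector-control dword):

```c
/* Toy model of "kvm implements all": every write lands in a shadow
 * table; address/data writes additionally raise a notification for
 * userspace; reads are always served from the shadow table. */
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_WORDS 4   /* addr_lo, addr_hi, data, vector control */
#define MSIX_NR_ENTRIES  8   /* illustrative table size */

static uint32_t shadow[MSIX_NR_ENTRIES][MSIX_ENTRY_WORDS];
static int notify_count;              /* exits raised to userspace */
static int last_changed_entry = -1;

static void msix_mmio_write(unsigned int entry, unsigned int word,
                            uint32_t val)
{
    shadow[entry][word] = val;        /* record in the shadow table */
    if (word != 3) {
        /* address or data changed: tell userspace so it can update
         * its irq routing for this entry */
        notify_count++;
        last_changed_entry = (int)entry;
    }
    /* word 3 (vector control / mask bit) is handled entirely in the
     * kernel - no exit, which is the point of the mask bit support */
}

static uint32_t msix_mmio_read(unsigned int entry, unsigned int word)
{
    return shadow[entry][word];       /* always from the shadow table */
}
```

A mask-bit write in this model costs no exit to userspace, while an address/data write costs one notification; that matches the "some writes generate an additional notification" summary above.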