From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: Mask bit support's API Date: Tue, 23 Nov 2010 09:54:40 +0200 Message-ID: <4CEB7340.5060807@redhat.com> References: <201011231409.52666.sheng.yang@intel.com> <4CEB5C78.4060408@redhat.com> <201011231435.55332.sheng.yang@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , "Michael S. Tsirkin" , "kvm@vger.kernel.org" To: "Yang, Sheng" Return-path: Received: from mx1.redhat.com ([209.132.183.28]:27256 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752613Ab0KWHyp (ORCPT ); Tue, 23 Nov 2010 02:54:45 -0500 In-Reply-To: <201011231435.55332.sheng.yang@intel.com> Sender: kvm-owner@vger.kernel.org List-ID: On 11/23/2010 08:35 AM, Yang, Sheng wrote: > On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote: > > On 11/23/2010 08:09 AM, Yang, Sheng wrote: > > > Hi Avi, > > > > > > I've purposed the following API for mask bit support. > > > > > > The main point is, QEmu can know which entries are enabled(by > > > pci_enable_msix()). And for enabled entries, kernel own it, including > > > MSI data/address and mask bit(routing table and mask bitmap). QEmu > > > should use KVM_GET_MSIX_ENTRY ioctl to get them(and it can sync with > > > them if it want to do so). > > > > > > Before entries are enabled, QEmu can still use it's own MSI table(because > > > we didn't contain these kind of information in kernel, and it's > > > unnecessary for kernel). > > > > > > The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query one > > > entry didn't exist in kernel - or we can simply return -EINVAL for it. > > > > > > I suppose it would be rare for QEmu to use this interface to get the > > > context of entry(the only case I think is when MSI-X disable and QEmu > > > need to sync the context), so performance should not be an issue. > > > > > > What's your opinion? > > > > > > > #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct > > > > kvm_msix_entry) > > > > Need SET_MSIX_ENTRY for live migration as well. > > Current we don't support LM with VT-d... Isn't this work useful for virtio as well? > > > > > > #define KVM_UPDATE_MSIX_MMIO _IOW(KVMIO, 0x7e, struct > > > > kvm_msix_mmio) > > > > > > > > #define KVM_MSIX_TYPE_ASSIGNED_DEV 1 > > > > > > > > #define KVM_MSIX_FLAG_MASKBIT (1<< 0) > > > > #define KVM_MSIX_FLAG_QUERY_MASKBIT (1<< 0) > > > > #define KVM_MSIX_FLAG_ENTRY (1<< 1) > > > > #define KVM_MSIX_FLAG_QUERY_ENTRY (1<< 1) > > > > Why is there a need for the flag? If we simply get/set entire entries, > > that includes the mask bits? > > We still want QEmu to cover a part of entries which hasn't been enabled yet(which > won't existed in routing table), but kernel would cover all mask bit regardless of > if it's enabled. So QEmu can query any entry to check the maskbit, but not > address/data. Don't understand. If we support reading/writing entire entries, that works for both enabled and disabled entries? > > > What about the pending bits? > > We didn't cover it here - and it's in another MMIO space(PBA). Of course we can > add more flags for it later. When an entry is masked, we need to set the pending bit for it somewhere. I guess this is broken in the existing code (without your patches)? > > > > Also need a new exit reason to tell userspace that an msix entry has > > changed, so userspace can update mappings. > > I think we don't need it. Whenever userspace want to get one mapping which is an > enabled MSI-X entry, it can check it with the API above(which is quite rare, > because kernel would handle all of them when guest is accessing them). If it's a > disabled entry, the context inside userspace MMIO record is the correct one(and > only one). The only place I think QEmu need to sync is when MSI-X is about to > disabled, QEmu need to update it's own MMIO record. > So in-kernel handling of mmio would be decided per entry? I'm trying to simplify this, and simplest thing is - all or nothing. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.