From: Chao Gao <chao.gao@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
"dario.faggioli@citrix.com" <dario.faggioli@citrix.com>,
Kevin Tian <kevin.tian@intel.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v8 4/7] VT-d: Use one function to update both remapped and posted IRTE
Date: Wed, 22 Feb 2017 15:12:30 +0800 [thread overview]
Message-ID: <20170222071230.GA59724@skl-2s3.sh.intel.com> (raw)
In-Reply-To: <58AD6391020000780013CA19@prv-mh.provo.novell.com>
On Wed, Feb 22, 2017 at 02:10:25AM -0700, Jan Beulich wrote:
>>>> On 22.02.17 at 02:53, <chao.gao@intel.com> wrote:
>> On Tue, Nov 22, 2016 at 05:58:56PM +0800, Jan Beulich wrote:
>>>> @@ -637,7 +657,23 @@ static int msi_msg_to_remap_entry(
>>>> remap_rte->address_hi = 0;
>>>> remap_rte->data = index - i;
>>>>
>>>> - memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>>>> + if ( !pi_desc )
>>>> + memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>>>> + else
>>>> + {
>>>> + __uint128_t ret;
>>>> +
>>>> + old_ire = *iremap_entry;
>>>> + ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
>>>> +
>>>> + /*
>>>> + * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
>>>> + * and the hardware cannot update the IRTE behind us, so the return value
>>>> + * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
>>>> + */
>>>> + ASSERT(ret == old_ire.val);
>>>> + }
>>>
>>>Could you remind me again please why posted format updates need
>>>to use cmpxchg16b, but remapped format ones don't? (As a aside,
>>>with the code structure as you have it you should move the old_irte
>>>declaration here, or omit it altogether, as you could as well pass
>>>*iremap_entry directly afaict.)
>>
>> Before feng left, I have asked him about this question. He told me that
>> the PDA field of posted format IRTE comprises of two parts:
>> Posted Descritor Address High[127:96] and Low [63:38]. If we want to update
>> PDA field, do it atomically or disable-update-enable. He also said, it had
>> been confirmed that cmpxchg16b was supported on all intel platform with VT-d PI.
>> If we assume that updates to remapped format IRTE only is to update either
>> 64 bit or high 64 bit (except initialition), two 64bit memory write operations
>> is enough.
>
>Two 64-bit memory write operations? Where do you see them? I
>only see memcpy(), which for the purposes of the code here is
>supposed to be a black box.
Ok. I made a mistake here. In ioapic case, before update IRTE, the according
IOAPIC RTE is masked. So, using a memcpy() is safe. In msi case, there is no
mask operation. I think only using a memcpy() is unsafe. Do you think so?
>
>>>> @@ -668,7 +704,8 @@ int msi_msg_write_remap_rte(
>>>>
>>>> drhd = pdev ? acpi_find_matched_drhd_unit(pdev)
>>>> : hpet_to_drhd(msi_desc->hpet_id);
>>>> - return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev, msi_desc, msg)
>>>> + return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev,
>>>> + msi_desc, msg, NULL, 0)
>>>
>>>Is this unconditional passing of NULL here really correct?
>>
>> Since two parameters are added to this function, we should think about what
>> the function does again. the last 2 parameters are optional.
>>
>> If they are not present, just means a physical device driver changes its msi
>> message. So it notifies iommu to do some changes to IRTE accordingly (the driver doesn't
>> know the format of the live IRTE). This is the case above.
>>
>> If they are present, it means the msi should be delivered to the vcpu with the
>> vector num. To achieve that, the function replaces the old IRTE with a new
>> posted format IRTE.
>
>I don't see how this answers my question. In fact it feels like you,
>just like Feng, are making assumptions on the conditions under
>which the function here is being called _at present_. You should,
>however, make the function work correctly for all possible uses,
>or add ASSERT()s to clearly expose issues with potential new,
>future callers.
Ok. Your suggestion is very good.
Every caller tells the function to construct a centain format IRTE. Return to
your question, I think it is defintely wrong to pass NULL unconditionally.
We should pass NULL before the msi is binded to a guest interrupt and pass the
destination vcpu and vector num after that. At this moment, I can't come up
with a way to check whether the msi is binded to a guest interrupt and get the
destination vcpu and vector num only through the struct pci_dev and struct
msi_desc. Could you give me some suggestion on that or recommend a structure,
you think, in which we can add some fields to record these information?
Thanks,
Chao
>
>>>> @@ -992,35 +986,11 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>>>> return -ENODEV;
>>>>
>>>> iommu = drhd->iommu;
>>>> - ir_ctrl = iommu_ir_ctrl(iommu);
>>>> - if ( !ir_ctrl )
>>>> + if ( !iommu_ir_ctrl(iommu) )
>>>> return -ENODEV;
>>>>
>>>> - spin_lock_irq(&ir_ctrl->iremap_lock);
>>>> -
>>>> - GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
>>>> -
>>>> - old_ire = *p;
>>>> -
>>>> - /* Setup/Update interrupt remapping table entry. */
>>>> - setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
>>>> - ret = cmpxchg16b(p, &old_ire, &new_ire);
>>>> -
>>>> - /*
>>>> - * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
>>>> - * and the hardware cannot update the IRTE behind us, so the return value
>>>> - * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
>>>> - */
>>>> - ASSERT(ret == old_ire.val);
>>>> -
>>>> - iommu_flush_cache_entry(p, sizeof(*p));
>>>> - iommu_flush_iec_index(iommu, 0, remap_index);
>>>> -
>>>> - unmap_vtd_domain_page(iremap_entries);
>>>> -
>>>> - spin_unlock_irq(&ir_ctrl->iremap_lock);
>>>> -
>>>> - return 0;
>>>> + return msi_msg_to_remap_entry(iommu, pci_dev, msi_desc, &msi_desc->msg,
>>>> + pi_desc, gvec);
>>>
>>>There are few changes here which appear to have the potential of
>>>affecting behavior: Previously you didn't alter msi_desc or the MSI
>>>message contained therein (as documented by the pointer having
>>>been const). Is this possible updating of message and remap index
>>>really benign? In any event any such changes should be reasoned
>>>about in the commit message.
>>
>> I agree that we can't update message and remap index in this pi_update_irte.
>> but msi_msg_to_remap_entry will change msi_desc when msi_desc->remap_index < 0.
>> How about splitting part of msi_msg_to_remap_entry to a new function which consumes a const
>> msi_desc parameter and pi_update_irte will call the new function?
>
>Well, I can't easily say yes or no here without seeing what the
>result would be. Give it a try, and we'll look at the result in v9.
>
>Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
next prev parent reply other threads:[~2017-02-22 7:12 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-18 1:57 [PATCH v8 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
2016-11-18 1:57 ` [PATCH v8 1/7] VMX: Permanently assign PI hook vmx_pi_switch_to() Feng Wu
2016-11-18 3:14 ` Tian, Kevin
2016-11-18 4:23 ` Wu, Feng
2016-11-18 1:57 ` [PATCH v8 2/7] VMX: Properly handle pi when all the assigned devices are removed Feng Wu
2016-11-18 3:19 ` Tian, Kevin
2016-11-18 4:27 ` Wu, Feng
2016-11-18 4:58 ` Tian, Kevin
2016-11-18 5:22 ` Wu, Feng
2016-11-18 9:23 ` Jan Beulich
2016-11-18 1:57 ` [PATCH v8 3/7] VMX: Make sure PI is in proper state before install the hooks Feng Wu
2016-11-18 4:11 ` Tian, Kevin
2016-11-18 4:27 ` Wu, Feng
2016-11-18 1:57 ` [PATCH v8 4/7] VT-d: Use one function to update both remapped and posted IRTE Feng Wu
2016-11-18 4:31 ` Tian, Kevin
2016-11-22 4:31 ` Wu, Feng
2016-11-22 9:58 ` Jan Beulich
2017-02-22 1:53 ` Chao Gao
2017-02-22 9:10 ` Jan Beulich
2017-02-22 7:12 ` Chao Gao [this message]
2017-02-22 14:54 ` Jan Beulich
2016-11-18 1:57 ` [PATCH v8 5/7] VT-d: No need to set irq affinity for posted format IRTE Feng Wu
2016-11-18 1:57 ` [PATCH v8 6/7] VT-d: Some cleanups Feng Wu
2016-11-18 4:45 ` Tian, Kevin
2016-11-18 1:57 ` [PATCH v8 7/7] VMX: Fixup PI descriptor when cpu is offline Feng Wu
2016-11-18 4:55 ` Tian, Kevin
2016-11-18 3:02 ` [PATCH v8 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Tian, Kevin
2016-11-18 4:21 ` Wu, Feng
2017-02-21 21:30 ` Chao Gao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170222071230.GA59724@skl-2s3.sh.intel.com \
--to=chao.gao@intel.com \
--cc=JBeulich@suse.com \
--cc=andrew.cooper3@citrix.com \
--cc=dario.faggioli@citrix.com \
--cc=george.dunlap@eu.citrix.com \
--cc=kevin.tian@intel.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).