From: Chao Gao <chao.gao@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
"dario.faggioli@citrix.com" <dario.faggioli@citrix.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v8 4/7] VT-d: Use one function to update both remapped and posted IRTE
Date: Wed, 22 Feb 2017 09:53:04 +0800 [thread overview]
Message-ID: <20170222015304.GB50645@skl-2s3.sh.intel.com> (raw)
In-Reply-To: <583424F002000078001208F3@prv-mh.provo.novell.com>
On Tue, Nov 22, 2016 at 05:58:56PM +0800, Jan Beulich wrote:
>>>> On 18.11.16 at 02:57, <feng.wu@intel.com> wrote:
>> @@ -597,31 +598,50 @@ static int msi_msg_to_remap_entry(
>
>Considering you basically re-do most of the function, I think there's
>some more adjustment necessary (or at least very desirable) here.
>
>>
>> memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
>>
>> - /* Set interrupt remapping table entry */
>> - new_ire.remap.fpd = 0;
>> - new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
>> - new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>> - new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
>> - /* Hardware require RH = 1 for LPR delivery mode */
>> - new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> - new_ire.remap.avail = 0;
>> - new_ire.remap.res_1 = 0;
>> - new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> - MSI_DATA_VECTOR_MASK;
>> - new_ire.remap.res_2 = 0;
>> - if ( x2apic_enabled )
>> - new_ire.remap.dst = msg->dest32;
>> + if ( !pi_desc )
>> + {
>> + /* Set interrupt remapping table entry */
>> + new_ire.remap.fpd = 0;
>> + new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
>> + new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>
>These two "& 0x1" seem unnecessary, as the fields are 1 bit only
>anyway.
>
Ok, will remove them.
>> + new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
>
>This masking, however, has always been puzzling me: dlm is a 3 bit
>field, and the MSI message field is a 3-bit one too. Hence the
>masking should also be dropped here - if the message field is wrong
>(i.e. holding an unsupported value), there's no good reason to try
>to compensate for it here. If at all the function should refuse to do
>the requested translation.
>
>> + /* Hardware require RH = 1 for LPR delivery mode */
>> + new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> + new_ire.remap.avail = 0;
>> + new_ire.remap.res_1 = 0;
>> + new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> + MSI_DATA_VECTOR_MASK;
>> + new_ire.remap.res_2 = 0;
>> + if ( x2apic_enabled )
>> + new_ire.remap.dst = msg->dest32;
>> + else
>> + new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
>> + & 0xff) << 8;
>> +
>> + new_ire.remap.res_3 = 0;
>> + new_ire.remap.res_4 = 0;
>> + new_ire.remap.p = 1; /* finally, set present bit */
>
>I don't understand this comment. The order of operations does not
>matter at all, and in fact you now set p before being done with all
>other fields. Please drop such misleading comments, or make them
>useful/correct.
Yes, The order does not matter. I will drop this comment.
>
>Also, the only field you don't explicitly set here is .im. Why?
>
>> + }
>> else
>> - new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
>> - & 0xff) << 8;
>> + {
>> + new_ire.post.fpd = 0;
>> + new_ire.post.res_1 = 0;
>> + new_ire.post.res_2 = 0;
>> + new_ire.post.urg = 0;
>> + new_ire.post.im = 1;
>> + new_ire.post.vector = gvec;
>> + new_ire.post.res_3 = 0;
>> + new_ire.post.res_4 = 0;
>> + new_ire.post.res_5 = 0;
>> + new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
>> + new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
>> + new_ire.post.p = 1; /* finally, set present bit */
>> + }
>
>Along the lines of the comment above, you don't fill avail here. Why?
>
>Taking both together, I don't see why - after adding said initialization -
>new_ire needs to start out as a copy of the live IRTE. Instead you
>could memset() the whole structure to zero, or simply give it an empty
>initializer (saving you from initializing all the reserved fields, plus some
>more).
>
Yes, it is really confused. Maybe because this field is available to software in posting
format IRTE. We can reserve the avail field through introducing a flag to
distinguish initialization from update. We also can clear the avail field every
time if we don't use it right now, leaving the problem to others who want use
the avail field. Do you think which is better?
>And of course, along the lines of ...
>
>> if ( pdev )
>> set_msi_source_id(pdev, &new_ire);
>> else
>> set_hpet_source_id(msi_desc->hpet_id, &new_ire);
>
>... this, I see little reason for common fields to be initialized separately
>on each path. According to the code above that's only p (leaving
>aside fields which get zeroed), but anyway. Perhaps there should
>even be a common sub-structure of the union ...
I can add a patch in this series to do this.
>
>> @@ -637,7 +657,23 @@ static int msi_msg_to_remap_entry(
>> remap_rte->address_hi = 0;
>> remap_rte->data = index - i;
>>
>> - memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>> + if ( !pi_desc )
>> + memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>> + else
>> + {
>> + __uint128_t ret;
>> +
>> + old_ire = *iremap_entry;
>> + ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
>> +
>> + /*
>> + * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
>> + * and the hardware cannot update the IRTE behind us, so the return value
>> + * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
>> + */
>> + ASSERT(ret == old_ire.val);
>> + }
>
>Could you remind me again please why posted format updates need
>to use cmpxchg16b, but remapped format ones don't? (As a aside,
>with the code structure as you have it you should move the old_irte
>declaration here, or omit it altogether, as you could as well pass
>*iremap_entry directly afaict.)
Before feng left, I have asked him about this question. He told me that
the PDA field of posted format IRTE comprises of two parts:
Posted Descritor Address High[127:96] and Low [63:38]. If we want to update
PDA field, do it atomically or disable-update-enable. He also said, it had
been confirmed that cmpxchg16b was supported on all intel platform with VT-d PI.
If we assume that updates to remapped format IRTE only is to update either
64 bit or high 64 bit (except initialition), two 64bit memory write operations
is enough.
>
>> @@ -668,7 +704,8 @@ int msi_msg_write_remap_rte(
>>
>> drhd = pdev ? acpi_find_matched_drhd_unit(pdev)
>> : hpet_to_drhd(msi_desc->hpet_id);
>> - return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev, msi_desc, msg)
>> + return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev,
>> + msi_desc, msg, NULL, 0)
>
>Is this unconditional passing of NULL here really correct?
Since two parameters are added to this function, we should think about what
the function does again. the last 2 parameters are optional.
If they are not present, just means a physical device driver changes its msi
message. So it notifies iommu to do some changes to IRTE accordingly (the driver doesn't
know the format of the live IRTE). This is the case above.
If they are present, it means the msi should be delivered to the vcpu with the
vector num. To achieve that, the function replaces the old IRTE with a new
posted format IRTE.
>
>> @@ -946,17 +947,12 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>> const uint8_t gvec)
>> {
>> struct irq_desc *desc;
>> - const struct msi_desc *msi_desc;
>> - int remap_index;
>> + struct msi_desc *msi_desc;
>> int rc = 0;
>> const struct pci_dev *pci_dev;
>> const struct acpi_drhd_unit *drhd;
>> struct iommu *iommu;
>> - struct ir_ctrl *ir_ctrl;
>> - struct iremap_entry *iremap_entries = NULL, *p = NULL;
>> - struct iremap_entry new_ire, old_ire;
>> const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> - __uint128_t ret;
>>
>> desc = pirq_spin_lock_irq_desc(pirq, NULL);
>> if ( !desc )
>> @@ -976,8 +972,6 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>> goto unlock_out;
>> }
>>
>> - remap_index = msi_desc->remap_index;
>> -
>> spin_unlock_irq(&desc->lock);
>>
>> ASSERT(pcidevs_locked());
>> @@ -992,35 +986,11 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>> return -ENODEV;
>>
>> iommu = drhd->iommu;
>> - ir_ctrl = iommu_ir_ctrl(iommu);
>> - if ( !ir_ctrl )
>> + if ( !iommu_ir_ctrl(iommu) )
>> return -ENODEV;
>>
>> - spin_lock_irq(&ir_ctrl->iremap_lock);
>> -
>> - GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
>> -
>> - old_ire = *p;
>> -
>> - /* Setup/Update interrupt remapping table entry. */
>> - setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
>> - ret = cmpxchg16b(p, &old_ire, &new_ire);
>> -
>> - /*
>> - * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
>> - * and the hardware cannot update the IRTE behind us, so the return value
>> - * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
>> - */
>> - ASSERT(ret == old_ire.val);
>> -
>> - iommu_flush_cache_entry(p, sizeof(*p));
>> - iommu_flush_iec_index(iommu, 0, remap_index);
>> -
>> - unmap_vtd_domain_page(iremap_entries);
>> -
>> - spin_unlock_irq(&ir_ctrl->iremap_lock);
>> -
>> - return 0;
>> + return msi_msg_to_remap_entry(iommu, pci_dev, msi_desc, &msi_desc->msg,
>> + pi_desc, gvec);
>
>There are few changes here which appear to have the potential of
>affecting behavior: Previously you didn't alter msi_desc or the MSI
>message contained therein (as documented by the pointer having
>been const). Is this possible updating of message and remap index
>really benign? In any event any such changes should be reasoned
>about in the commit message.
I agree that we can't update message and remap index in this pi_update_irte.
but msi_msg_to_remap_entry will change msi_desc when msi_desc->remap_index < 0.
How about splitting part of msi_msg_to_remap_entry to a new function which consumes a const
msi_desc parameter and pi_update_irte will call the new function?
>
>Jan
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xen.org
>https://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
next prev parent reply other threads:[~2017-02-22 1:53 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-18 1:57 [PATCH v8 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
2016-11-18 1:57 ` [PATCH v8 1/7] VMX: Permanently assign PI hook vmx_pi_switch_to() Feng Wu
2016-11-18 3:14 ` Tian, Kevin
2016-11-18 4:23 ` Wu, Feng
2016-11-18 1:57 ` [PATCH v8 2/7] VMX: Properly handle pi when all the assigned devices are removed Feng Wu
2016-11-18 3:19 ` Tian, Kevin
2016-11-18 4:27 ` Wu, Feng
2016-11-18 4:58 ` Tian, Kevin
2016-11-18 5:22 ` Wu, Feng
2016-11-18 9:23 ` Jan Beulich
2016-11-18 1:57 ` [PATCH v8 3/7] VMX: Make sure PI is in proper state before install the hooks Feng Wu
2016-11-18 4:11 ` Tian, Kevin
2016-11-18 4:27 ` Wu, Feng
2016-11-18 1:57 ` [PATCH v8 4/7] VT-d: Use one function to update both remapped and posted IRTE Feng Wu
2016-11-18 4:31 ` Tian, Kevin
2016-11-22 4:31 ` Wu, Feng
2016-11-22 9:58 ` Jan Beulich
2017-02-22 1:53 ` Chao Gao [this message]
2017-02-22 9:10 ` Jan Beulich
2017-02-22 7:12 ` Chao Gao
2017-02-22 14:54 ` Jan Beulich
2016-11-18 1:57 ` [PATCH v8 5/7] VT-d: No need to set irq affinity for posted format IRTE Feng Wu
2016-11-18 1:57 ` [PATCH v8 6/7] VT-d: Some cleanups Feng Wu
2016-11-18 4:45 ` Tian, Kevin
2016-11-18 1:57 ` [PATCH v8 7/7] VMX: Fixup PI descriptor when cpu is offline Feng Wu
2016-11-18 4:55 ` Tian, Kevin
2016-11-18 3:02 ` [PATCH v8 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Tian, Kevin
2016-11-18 4:21 ` Wu, Feng
2017-02-21 21:30 ` Chao Gao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170222015304.GB50645@skl-2s3.sh.intel.com \
--to=chao.gao@intel.com \
--cc=JBeulich@suse.com \
--cc=andrew.cooper3@citrix.com \
--cc=dario.faggioli@citrix.com \
--cc=george.dunlap@eu.citrix.com \
--cc=kevin.tian@intel.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).