From: Chao Gao <chao.gao@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Feng Wu <feng.wu@intel.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Dario Faggioli <dario.faggioli@citrix.com>,
xen-devel@lists.xen.org, Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v9 5/8] VT-d: Introduce a new function update_irte_for_msi_common
Date: Thu, 2 Mar 2017 15:14:21 +0800
Message-ID: <20170302071421.GA15870@skl-2s3.sh.intel.com>
In-Reply-To: <58B7ECB6020000780013F1E9@prv-mh.provo.novell.com>
On Thu, Mar 02, 2017 at 01:58:14AM -0700, Jan Beulich wrote:
>>>> On 27.02.17 at 02:45, <chao.gao@intel.com> wrote:
>> @@ -547,16 +548,116 @@ static int remap_entry_to_msi_msg(
>> return 0;
>> }
>>
>> +/*
>> + * This function is a common interface to update irte for msi case.
>> + *
>> + * If @pi_desc != NULL and @gvec != 0, the IRTE will be updated to a posted
>> + * format. In this case, @msg is ignored because constructing a posted format
>> + * IRTE doesn't need any information about the msi address or msi data.
>> + *
>> + * If @pi_desc == NULL and @gvec == 0, the IRTE will be updated to a remapped
>> + * format. In this case, @msg can't be NULL.
>
>This kind of implies that in the other case msg can be NULL. Please
>make this explicit or remove the last sentence, to avoid confusing
>readers. Plus, if msg can be NULL in that case, why don't you pass
>NULL in that case?
I'll make this explicit.
>
>> + * Assume 'ir_ctrl->iremap_lock' has been acquired and the remap_index
>> + * of msi_desc has a benign value.
>> + */
>> +static int update_irte_for_msi_common(
>> + struct iommu *iommu, const struct pci_dev *pdev,
>> + const struct msi_desc *msi_desc, struct msi_msg *msg,
>> + const struct pi_desc *pi_desc, const uint8_t gvec)
>> +{
>> + struct iremap_entry *iremap_entry = NULL, *iremap_entries;
>> + struct iremap_entry new_ire = {{0}};
>
>I think just "{ }" will do.
>
>> + unsigned int index = msi_desc->remap_index;
>> + struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
>> +
>> + ASSERT( ir_ctrl );
>> + ASSERT( spin_is_locked(&ir_ctrl->iremap_lock) );
>> + ASSERT( (index >= 0) && (index < IREMAP_ENTRY_NR) );
>
>Stray blanks inside parentheses.
>
>> + if ( (!pi_desc && gvec) || (pi_desc && !gvec) )
>
>gvec == 0 alone is never a valid check: Either all vectors are valid,
>or a whole range (e.g. 0x00...0x0f) is invalid. Furthermore I think
>such checks are easier to read as either
How about deciding the IRTE format solely by whether pi_desc is NULL?
>
> if ( !pi_desc != !gvec )
>
>or
>
> if ( pi_desc ? !gvec : gvec )
>
>> + return -EINVAL;
>> +
>> + if ( !pi_desc && !gvec && !msg )
>
>With the earlier check the first or second part could be omitted
>afaict.
Agreed.
>
>> + return -EINVAL;
>> +
>> + GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>> + iremap_entries, iremap_entry);
>> +
>> + if ( !pi_desc )
>> + {
>> + /* Set interrupt remapping table entry */
>
>Again a request for consistency: Either have a respective comment
>also at the top of the else branch, or omit the one here too.
>
>> + new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
>> + new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
>> + new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
>> + /* Hardware require RH = 1 for LPR delivery mode */
>> + new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> + new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> + MSI_DATA_VECTOR_MASK;
>> + if ( x2apic_enabled )
>> + new_ire.remap.dst = msg->dest32;
>> + else
>> + new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
>> + & 0xff) << 8;
>
>Please strive to eliminate literal numbers here. At least the 0xff looks
>to be easy to deal with (using MASK_EXTR() together with
>MSI_ADDR_DEST_ID_MASK).
>
>> + new_ire.remap.p = 1;
>> + }
>> + else
>> + {
>> + new_ire.post.im = 1;
>> + new_ire.post.vector = gvec;
>> + new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
>> + new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
>> + new_ire.post.p = 1;
>> + }
>> +
>> + if ( pdev )
>> + set_msi_source_id(pdev, &new_ire);
>> + else
>> + set_hpet_source_id(msi_desc->hpet_id, &new_ire);
>> +
>> + if ( iremap_entry->val != new_ire.val )
>> + {
>> + if ( cpu_has_cx16 )
>> + {
>> + __uint128_t ret;
>> + struct iremap_entry old_ire;
>> +
>> + old_ire = *iremap_entry;
>> + ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
>> +
>> + /*
>> + * In the above, we use cmpxchg16 to atomically update the 128-bit
>> + * IRTE, and the hardware cannot update the IRTE behind us, so
>> + * the return value of cmpxchg16 should be the same as old_ire.
>> + * This ASSERT validate it.
>> + */
>> + ASSERT(ret == old_ire.val);
>> + }
>> + else
>> + {
>
>This wants a comment added explaining the conditions under which
>this is safe. Perhaps also one or more ASSERT()s to that effect.
Yes, I will add an explanation and an ASSERT().
>
>> + iremap_entry->lo = new_ire.lo;
>> + iremap_entry->hi = new_ire.hi;
>> + }
>> +
>> + iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
>
>sizeof(*iremap_entry)
>
>> @@ -592,38 +693,33 @@ static int msi_msg_to_remap_entry(
>> return -EFAULT;
>> }
>>
>> + /* Get the IRTE's bind relationship with guest from the live IRTE. */
>> GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>> iremap_entries, iremap_entry);
>> -
>> - memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
>> -
>> - /* Set interrupt remapping table entry */
>> - new_ire.remap.fpd = 0;
>> - new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
>> - new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>> - new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
>> - /* Hardware require RH = 1 for LPR delivery mode */
>> - new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> - new_ire.remap.avail = 0;
>> - new_ire.remap.res_1 = 0;
>> - new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> - MSI_DATA_VECTOR_MASK;
>> - new_ire.remap.res_2 = 0;
>> - if ( x2apic_enabled )
>> - new_ire.remap.dst = msg->dest32;
>> + if ( !iremap_entry->remap.im )
>> + {
>> + gvec = 0;
>> + pi_desc = NULL;
>> + }
>> else
>> - new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
>> - & 0xff) << 8;
>> + {
>> + gvec = iremap_entry->post.vector;
>> + pi_desc = (void *)((((u64)iremap_entry->post.pda_h) << PDA_LOW_BIT )
>> + + iremap_entry->post.pda_l);
>> + }
>> + unmap_vtd_domain_page(iremap_entries);
>
>I don't follow: Why does it matter what the entry currently holds?
>As I've pointed out more than once before (mainly to Feng), the
>goal ought to be to produce the new entry solely based on what
>the intended new state is, i.e. function input and global data.
>
I think the function introduced by this patch does produce the new entry
solely based on its input; anyone who wants to produce a new entry can call
it directly.

Let me explain why msi_msg_to_remap_entry() reads the entry.
msi_msg_to_remap_entry() can be called either before or after an MSI gets
bound to a guest interrupt. If it is called without realizing that the MSI has
been bound to a guest interrupt, the IRTE would be updated to remapped format,
breaking the binding (or at least defeating the intention to use VT-d PI). I
think this can happen with the current code. This patch avoids that case and
provides a new function for callers that intend to replace a posted-format
IRTE with a remapped-format one. The entry is read to obtain the binding
information, which is then used to update the IRTE. As the comment in the code
says, when the IRTE is in posted format the update could simply be suppressed,
because the content of the IRTE will not change as long as the binding
information hasn't changed; and if the binding information does change,
pi_update_irte() should be called instead.

At the moment we don't know of any existing caller of msi_msg_to_remap_entry()
that needs to turn a posted IRTE into a remapped one. If the need emerges, we
can expose the common function to those callers.

I also want to extend pi_update_irte() to replace a posted IRTE with a
remapped one when the guest misconfigures its MSI.

I hope this makes things a bit clearer, and I'd welcome further suggestions to
make the code and the description clearer for future readers.

>> - if ( pdev )
>> - set_msi_source_id(pdev, &new_ire);
>> - else
>> - set_hpet_source_id(msi_desc->hpet_id, &new_ire);
>> - new_ire.remap.res_3 = 0;
>> - new_ire.remap.res_4 = 0;
>> - new_ire.remap.p = 1; /* finally, set present bit */
>> + /*
>> + * Actually we can just suppress the update when IRTE is already in posted
>> + * format. After a msi gets bound to a guest interrupt, changes to the msi
>> + * message have no effect to the IRTE.
>> + */
>> + update_irte_for_msi_common(iommu, pdev, msi_desc, msg, pi_desc, gvec);
>>
>> /* now construct new MSI/MSI-X rte entry */
>> + if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
>> + nr = msi_desc->msi.nvec;
>
>Why do you re-do here what was already done earlier in the function
>(code you didn't touch)?
>
Will remove this.
>> @@ -996,31 +1046,11 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>> if ( !ir_ctrl )
>> return -ENODEV;
>>
>> - spin_lock_irq(&ir_ctrl->iremap_lock);
>> -
>> - GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
>> -
>> - old_ire = *p;
>> -
>> - /* Setup/Update interrupt remapping table entry. */
>> - setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
>> - ret = cmpxchg16b(p, &old_ire, &new_ire);
>> -
>> - /*
>> - * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
>> - * and the hardware cannot update the IRTE behind us, so the return value
>> - * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
>> - */
>> - ASSERT(ret == old_ire.val);
>> -
>> - iommu_flush_cache_entry(p, sizeof(*p));
>> - iommu_flush_iec_index(iommu, 0, remap_index);
>> -
>> - unmap_vtd_domain_page(iremap_entries);
>> -
>> - spin_unlock_irq(&ir_ctrl->iremap_lock);
>> -
>> - return 0;
>> + spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
>> + rc = update_irte_for_msi_common(iommu, pci_dev, msi_desc, NULL, pi_desc,
>> + gvec);
>> + spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
>> + return rc;
>
>Considering the old code use spin_lock_irq() (and there's such left
>also earlier in the function), why do you use the irqsave function
>here?
Yes, it should be spin_lock_irq(). I saw spin_lock_irqsave() used in
msi_msg_to_remap_entry() and followed it without looking closely at the
difference. spin_lock_irq() makes an assumption about the current context
(interrupts enabled on entry), so it is the better fit here.
Thanks,
Chao
>
>Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel