Re: [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

From: George Dunlap <george.dunlap@citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	"Wu, Feng" <feng.wu@intel.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling
Date: Thu, 17 Sep 2015 10:38:10 +0100
Message-ID: <55FA8A02.30705@citrix.com>
In-Reply-To: <1442479704.15327.65.camel@citrix.com>

On 09/17/2015 09:48 AM, Dario Faggioli wrote:
> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
> 
>>> -----Original Message-----
>>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> 
>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>> in debug builds?
>>
>> Does the following information in Config.mk mean it is a debug build?
>>
>> # A debug build of Xen and tools?
>> debug ?= y
>> debug_symbols ?= $(debug)
>>
> I think so. But as I said in my other email, I was wrong, and this is
> probably not an issue.
> 
>>> And in either case (just tossing out ideas) would it be
>>> possible to deal with the "interrupt already raised when blocking" case:
>>
>> Thanks for the suggestions below!
>>
> :-)
> 
>>>  - later in the context switching function ?
>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
>> of calling vcpu_unblock() directly; then, when it returns to context_switch(),
>> we can check the flag and not actually block the vCPU.
>>
> Yeah, and that would still be rather hard to understand and maintain,
> IMO.
> 
>> But I don't have a clear
>> picture of how to achieve this. Here are some questions from me:
>> - When we are in context_switch(), we have done the following changes to
>> vcpu's state:
>> 	* sd->curr is set to next
>> 	* vCPU's running state (both prev and next ) is changed by
>> 	  vcpu_runstate_change()
>> 	* next->is_running is set to 1
>> 	* periodic timer for prev is stopped
>> 	* periodic timer for next is setup
>> 	......
>>
>> So at what point should we perform the action to _unblock_ the vCPU? We
>> need to roll back the former changes to the vCPU's state, right?
>>
> Mmm... not really. Not blocking prev does not mean that prev would be
> kept running on the pCPU, and that's true for your current solution as
> well! As you say yourself, you're already in the process of switching
> between prev and next, at a point where it's already a thing that next
> will be the vCPU that will run. Not blocking means that prev is
> reinserted to the runqueue, and a new invocation to the scheduler is
> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> __runq_tickle()), but it's only when such new scheduling happens that
> prev will (potentially) be selected to run again.
> 
> So, no, unless I'm fully missing your point, there would be no
> rollback required. However, I still would like the other solution (doing
> stuff in vcpu_block()) better (see below).
> 
>>>  - with another hook, perhaps in vcpu_block() ?
>>
>> We could check this in vcpu_block(). However, the logic here is that before
>> the vCPU is blocked, we need to change the posted-interrupt descriptor,
>> and if the 'ON' bit is found set while changing it, which means the VT-d
>> hardware has issued a notification event because interrupts from the
>> assigned devices are coming, we don't need to block the vCPU and hence
>> don't need to update the PI descriptor in this case.
>>
> Yep, I saw that. But could it be possible to do *everything* related to
> blocking, including the update of the descriptor, in vcpu_block(), if no
> interrupt has been raised yet at that time? I mean, would you, if
> updating the descriptor in there, still get the event that allows you to
> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
> the blocking, no matter whether that resulted in an actual context
> switch already or not?
> 
> I appreciate that this narrows the window for such an event to happen by
> quite a bit, making the logic itself a little less useful (it makes
> things more similar to "regular" blocking vs. event delivery, though,
> AFAICT), but if it's correct, and if it allows us to save the ugly
> invocation of vcpu_unblock from context switch context, I'd give it a
> try.
> 
> After all, this PI thing requires actions to be taken when a vCPU is
> scheduled or descheduled because of blocking, unblocking and
> preemptions, and it would seem natural to me to:
>  - deal with blockings in vcpu_block()
>  - deal with unblockings in vcpu_wake()
>  - deal with preemptions in context_switch()
> 
> This does not mean that being able to consolidate some of the cases (like
> blockings and preemptions, in the current version of the code) would not be
> a nice thing... But we don't want it at all costs. :-)

So just to clarify the situation...

If a vcpu is configured for the "running" state (i.e., NV set to
"posted_intr_vector", notifications enabled), and an interrupt happens
in the hypervisor -- what happens?

Is it the case that the interrupt is not actually delivered to the
processor, but that the pending bit will be set in the pi field, so that
the interrupt will be delivered the next time the hypervisor returns
into the guest?

(I am assuming that is the case, because if the hypervisor *does* get an
interrupt, then it can just unblock it there.)

This sort of race condition -- where we get an interrupt to wake up a
vcpu as we're blocking -- is already handled for "old-style" interrupts
in vcpu_block:

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    /* Check for events /after/ blocking: avoids wakeup waiting race. */
    if ( local_events_need_delivery() )
    {
        clear_bit(_VPF_blocked, &v->pause_flags);
    }
    else
    {
        TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}

That is, we set _VPF_blocked, so that any interrupt which would wake it
up actually wakes it up, and then we check local_events_need_delivery()
to see if there were any that came in after we decided to block but
before we made sure that an interrupt would wake us up.

I think it would be best if we could keep all the logic that does the
same thing in the same place.  Which would mean in vcpu_block(), after
calling set_bit(_VPF_blocked), changing the NV to pi_wakeup_vector, and
then extending local_events_need_delivery() to also look for pending PI
events.
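
As a rough, untested sketch of that idea: the struct, the enum and the
function names below are simplified stand-ins made up for illustration,
not the real Xen definitions, and the model is single-threaded (real code
would have to update the descriptor atomically and handle the blocked
list).  Restoring NV when the block is aborted is also an assumption of
the sketch, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified, single-threaded model for illustration only.  These are NOT
 * the real Xen structures; real code would need atomic updates of the PI
 * descriptor and would also maintain the per-pCPU blocked list.
 */
enum nv_model { NV_POSTED_INTR, NV_PI_WAKEUP };   /* hypothetical names */

struct vcpu_model {
    bool blocked;             /* stands in for _VPF_blocked */
    bool evtchn_pending;      /* stands in for "old-style" pending events */
    bool pi_on;               /* stands in for the 'ON' bit in the PI descriptor */
    enum nv_model nv;         /* stands in for the NV field */
    bool resched_requested;   /* stands in for raise_softirq(SCHEDULE_SOFTIRQ) */
};

/* Extended check: old-style events OR a pending posted interrupt. */
static bool events_need_delivery_model(const struct vcpu_model *v)
{
    return v->evtchn_pending || v->pi_on;
}

static void vcpu_block_model(struct vcpu_model *v)
{
    assert(v != NULL);

    /* Mark as blocked first, then redirect notifications to the wakeup vector. */
    v->blocked = true;
    v->nv = NV_PI_WAKEUP;

    /* Check for events /after/ blocking, as the existing code does. */
    if ( events_need_delivery_model(v) )
    {
        /* Something arrived in the window: undo the block (assumed behaviour). */
        v->nv = NV_POSTED_INTR;
        v->blocked = false;
    }
    else
    {
        v->resched_requested = true;
    }
}
```

The point of the ordering is the same as in the existing function: the
vcpu is made wakeable before the pending check, so an interrupt arriving
in between is either seen by the check or wakes the vcpu normally.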

Looking a bit more at your states, I think the actions that need to be
taken on all the transitions are these (collapsing 'runnable' and
'offline' into the same state):

blocked -> runnable (vcpu_wake)
 - NV = posted_intr_vector
 - Take vcpu off blocked list
 - SN = 1
runnable -> running (context_switch)
 - SN = 0
running -> runnable (context_switch)
 - SN = 1
running -> blocked (vcpu_block)
 - NV = pi_wakeup_vector
 - Add vcpu to blocked list

This actually has a few pretty nice properties:
1. You have a nice pair of complementary actions -- block / wake, run /
preempt
2. The potentially long actions with lists happen in vcpu_wake and
vcpu_block, not on the context switch path
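
The transition table above can be written down as a small state machine.
This is only a sketch under assumptions: the types and function names are
hypothetical, the blocked list is reduced to a flag, and nothing here is
atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; not the real Xen definitions. */
enum pi_state { PI_BLOCKED, PI_RUNNABLE, PI_RUNNING };
enum pi_nv    { PI_NV_POSTED_INTR, PI_NV_WAKEUP };

struct pi_vcpu {
    enum pi_state state;
    enum pi_nv nv;           /* notification vector */
    bool sn;                 /* suppress notification */
    bool on_blocked_list;    /* the list itself is reduced to a flag here */
};

/* blocked -> runnable (would live in the vcpu_wake hook) */
static void pi_wake(struct pi_vcpu *v)
{
    assert(v->state == PI_BLOCKED);
    v->nv = PI_NV_POSTED_INTR;
    v->on_blocked_list = false;
    v->sn = true;
    v->state = PI_RUNNABLE;
}

/* runnable -> running (context switch in) */
static void pi_switch_to(struct pi_vcpu *v)
{
    assert(v->state == PI_RUNNABLE);
    v->sn = false;
    v->state = PI_RUNNING;
}

/* running -> runnable (context switch out, i.e. preemption) */
static void pi_switch_from(struct pi_vcpu *v)
{
    assert(v->state == PI_RUNNING);
    v->sn = true;
    v->state = PI_RUNNABLE;
}

/* running -> blocked (would live in the vcpu_block hook) */
static void pi_block(struct pi_vcpu *v)
{
    assert(v->state == PI_RUNNING);
    v->nv = PI_NV_WAKEUP;
    v->on_blocked_list = true;
    v->state = PI_BLOCKED;
}
```

Note that in this model SN is left at 0 when blocking, since it was
cleared on the way into the running state and the table lists no SN
change for running -> blocked.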

And at this point, we don't have the "lazy context switch" issue
anymore, do we?  Because we've handled the "blocking" case in
vcpu_block(), we don't need to do anything in the main context_switch()
(which may do the lazy context switching into idle).  We only need to do
something in the actual __context_switch().

And at that point, could we actually get rid of the PI-specific context
switch hooks altogether, and just put the SN state changes required for
running->runnable and runnable->running in vmx_ctxt_switch_from() and
vmx_ctxt_switch_to()?

If so, then the only hooks we need to add are vcpu_block and vcpu_wake.
To keep these consistent with other scheduling-related functions, I
would put these in arch_vcpu, next to ctxt_switch_from() and
ctxt_switch_to().
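
Something along these lines, perhaps.  This is an illustrative sketch
only: the "_sketch" types, the counters and the vmx_like_* functions are
made up for the example, and only the idea of two extra function
pointers sitting next to ctxt_switch_from / ctxt_switch_to comes from
the paragraph above.

```c
#include <stddef.h>

struct vcpu_sketch;

/* Hypothetical, cut-down stand-in for arch_vcpu. */
struct arch_vcpu_sketch {
    void (*ctxt_switch_from)(struct vcpu_sketch *v);
    void (*ctxt_switch_to)(struct vcpu_sketch *v);
    void (*vcpu_block)(struct vcpu_sketch *v);   /* proposed new hook */
    void (*vcpu_wake)(struct vcpu_sketch *v);    /* proposed new hook */
};

struct vcpu_sketch {
    struct arch_vcpu_sketch arch;
    int sn;            /* models the SN bit */
    int block_calls;   /* counters exist only so the sketch is observable */
    int wake_calls;
};

/* SN handling folded into the existing context switch functions. */
static void vmx_like_switch_from(struct vcpu_sketch *v) { v->sn = 1; }
static void vmx_like_switch_to(struct vcpu_sketch *v)   { v->sn = 0; }

/* NV and blocked-list updates would go here; only counted in this sketch. */
static void vmx_like_block(struct vcpu_sketch *v) { v->block_calls++; }
static void vmx_like_wake(struct vcpu_sketch *v)  { v->wake_calls++; }

/* Generic callers: vcpus that do not use PI simply leave the hooks NULL. */
static void arch_vcpu_block(struct vcpu_sketch *v)
{
    if ( v->arch.vcpu_block )
        v->arch.vcpu_block(v);
}

static void arch_vcpu_wake(struct vcpu_sketch *v)
{
    if ( v->arch.vcpu_wake )
        v->arch.vcpu_wake(v);
}
```

The generic vcpu_block() and vcpu_wake() would then call the wrappers,
and the context switch path would need no PI-specific hook at all.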

Thoughts?

 -George

