Re: [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

xen-devel.lists.xenproject.org archive mirror

From: George Dunlap <george.dunlap@citrix.com>
To: "Wu, Feng" <feng.wu@intel.com>,
	Dario Faggioli <dario.faggioli@citrix.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling
Date: Mon, 21 Sep 2015 10:18:31 +0100
Message-ID: <55FFCB67.3050900@citrix.com>
In-Reply-To: <E959C4978C3B6342920538CF579893F002722BBF@SHSMSX104.ccr.corp.intel.com>

On 09/21/2015 06:08 AM, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: George Dunlap [mailto:george.dunlap@citrix.com]
>> Sent: Thursday, September 17, 2015 5:38 PM
>> To: Dario Faggioli; Wu, Feng
>> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>> Cooper; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic
>> handling
>>
>> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
>>> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>>>
>>>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>>>> in debug builds?
>>>>
>>>> Does the following information in Config.mk mean it is a debug build?
>>>>
>>>> # A debug build of Xen and tools?
>>>> debug ?= y
>>>> debug_symbols ?= $(debug)
>>>>
>>> I think so. But as I said in my other email, I was wrong, and this is
>>> probably not an issue.
>>>
>>>>> And in either case (just tossing out ideas) would it be
>>>>> possible to deal with the "interrupt already raised when blocking" case:
>>>>
>>>> Thanks for the suggestions below!
>>>>
>>> :-)
>>>
>>>>>  - later in the context switching function ?
>>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
>>>> of calling vcpu_unblock() directly; then, when it returns to context_switch(),
>>>> we can check the flag and not actually block the vCPU.
>>>>
>>> Yeah, and that would still be rather hard to understand and maintain,
>>> IMO.
>>>
>>>> But I don't have a clear
>>>> picture of how to achieve this; here are some questions from me:
>>>> - When we are in context_switch(), we have already made the following
>>>> changes to the vCPU's state:
>>>> 	* sd->curr is set to next
>>>> 	* vCPU's running state (both prev and next ) is changed by
>>>> 	  vcpu_runstate_change()
>>>> 	* next->is_running is set to 1
>>>> 	* periodic timer for prev is stopped
>>>> 	* periodic timer for next is setup
>>>> 	......
>>>>
>>>> So at what point should we perform the action to _unblock_ the vCPU? We
>>>> would need to roll back the above changes to the vCPU's state, right?
>>>>
>>> Mmm... not really. Not blocking prev does not mean that prev would be
>>> kept running on the pCPU, and that's true for your current solution as
>>> well! As you say yourself, you're already in the process of switching
>>> between prev and next, at a point where it is already decided that next
>>> will be the vCPU that runs. Not blocking means that prev is
>>> reinserted into the runqueue, and a new invocation of the scheduler is
>>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
>>> __runq_tickle()), but it's only when such new scheduling happens that
>>> prev will (potentially) be selected to run again.
>>>
>>> So, no, unless I'm completely missing your point, there wouldn't be any
>>> rollback required. However, I would still like the other solution (doing
>>> stuff in vcpu_block()) better (see below).
>>>
>>>>>  - with another hook, perhaps in vcpu_block() ?
>>>>
>>>> We could check this in vcpu_block(); however, the logic here is that before
>>>> the vCPU is blocked, we need to change the posted-interrupt descriptor,
>>>> and if the 'ON' bit is found set while changing it (which means the VT-d
>>>> hardware has issued a notification event because interrupts from the
>>>> assigned devices are arriving), we don't need to block the vCPU, and
>>>> hence don't need to update the PI descriptor in this case.
>>>>
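For illustration, the descriptor update described above (switch the descriptor to blocking mode, but give up if 'ON' is already set) can be modelled with a compare-and-swap loop. This is a simplified user-space sketch, not the Xen code: the control-word layout (ON bit 0, SN bit 1, NV bits 16-23, NDST bits 32-63, counted from the end of the 256-bit PIR) is assumed from the VT-d descriptor format, and struct pi_ctrl_model and pi_try_prepare_block() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the control word of a VT-d posted-interrupt
 * descriptor (the 256-bit PIR is omitted). Assumed layout:
 * ON = bit 0, SN = bit 1, NV = bits 16..23, NDST = bits 32..63.
 */
#define PI_ON          (UINT64_C(1) << 0)
#define PI_SN          (UINT64_C(1) << 1)
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

struct pi_ctrl_model {
    _Atomic uint64_t ctrl;
};

/*
 * Hypothetical helper: switch the descriptor to "blocking" mode
 * (wakeup vector, NDST = destination of the pCPU owning the blocked
 * list). Returns false, leaving the word untouched, if ON is already
 * set, i.e. a notification is outstanding and the vCPU should not block.
 */
static bool pi_try_prepare_block(struct pi_ctrl_model *pi,
                                 uint8_t wakeup_vector, uint32_t list_dest)
{
    uint64_t old = atomic_load(&pi->ctrl);
    uint64_t upd;

    do {
        if ( old & PI_ON )
            return false;              /* interrupt already posted */
        upd = old & ~(PI_NV_MASK | PI_NDST_MASK);
        upd |= (uint64_t)wakeup_vector << PI_NV_SHIFT;
        upd |= (uint64_t)list_dest << PI_NDST_SHIFT;
        /* On failure 'old' is reloaded, so ON is checked again. */
    } while ( !atomic_compare_exchange_weak(&pi->ctrl, &old, upd) );

    return true;
}
```

If the word changes underneath the loop (e.g. the hardware sets ON mid-update), the exchange fails, the loop re-reads the word and sees ON, so a notification racing with the update is not lost.
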
>>> Yep, I saw that. But could it be possible to do *everything* related to
>>> blocking, including the update of the descriptor, in vcpu_block(), if no
>>> interrupt has been raised yet at that time? I mean, if you update the
>>> descriptor in there, would you still get the event that allows you to
>>> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
>>> the blocking, no matter whether that resulted in an actual context
>>> switch already or not?
>>>
>>> I appreciate that this narrows the window for such an event to happen by
>>> quite a bit, making the logic itself a little less useful (it makes
>>> things more similar to "regular" blocking vs. event delivery, though,
>>> AFAICT), but if it's correct, and if it allows us to avoid the ugly
>>> invocation of vcpu_unblock() from context switch context, I'd give it a
>>> try.
>>>
>>> After all, this PI thing requires actions to be taken when a vCPU is
>>> scheduled or descheduled because of blocking, unblocking and
>>> preemptions, and it would seem natural to me to:
>>>  - deal with blockings in vcpu_block()
>>>  - deal with unblockings in vcpu_wake()
>>>  - deal with preemptions in context_switch()
>>>
>>> This does not mean that being able to consolidate some of the cases (like
>>> blockings and preemptions, in the current version of the code) would not
>>> be a nice thing... But we don't want it at all costs. :-)
>>
>> So just to clarify the situation...
>>
>> If a vcpu is configured for the "running" state (i.e., NV set to
>> "posted_intr_vector", notifications enabled), and an interrupt arrives
>> while the pCPU is in the hypervisor -- what happens?
>>
>> Is it the case that the interrupt is not actually delivered to the
>> processor, but that the pending bit will be set in the pi field, so that
>> the interrupt will be delivered the next time the hypervisor returns
>> into the guest?
>>
>> (I am assuming that is the case, because if the hypervisor *does* get an
>> interrupt, then it can just unblock the vcpu there.)
>>
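As a rough model of the descriptor side of this question: when the IOMMU posts an interrupt, it records the vector in the PIR and only sends a notification event (vector NV to NDST) if ON was clear and SN is clear; otherwise the vector simply stays pending in the PIR. This sketch assumes non-urgent interrupts, and struct pi_desc_model and pi_post() are hypothetical. The hardware performs the whole update as one atomic operation; the model splits it into two steps for readability.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_F_ON 0x1u   /* outstanding notification */
#define PI_F_SN 0x2u   /* suppress notification */

/* Simplified model of a PI descriptor: 256-bit PIR plus ON/SN. */
struct pi_desc_model {
    _Atomic uint64_t pir[4];   /* one bit per guest vector 0..255 */
    _Atomic uint32_t flags;    /* PI_F_ON | PI_F_SN */
};

/*
 * Model of posting a non-urgent interrupt for 'vector': record it in the
 * PIR, then set ON and send a notification only if ON was clear and SN
 * is clear. Returns true if a notification event would be generated.
 */
static bool pi_post(struct pi_desc_model *d, uint8_t vector)
{
    atomic_fetch_or(&d->pir[vector / 64], UINT64_C(1) << (vector % 64));

    uint32_t old = atomic_load(&d->flags);
    do {
        if ( old & (PI_F_ON | PI_F_SN) )
            return false;   /* vector stays pending in the PIR only */
    } while ( !atomic_compare_exchange_weak(&d->flags, &old, old | PI_F_ON) );

    return true;
}
```

Note that a second interrupt posted while ON is still set is coalesced: it only sets its PIR bit.
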
>> This sort of race condition -- where we get an interrupt to wake up a
>> vcpu as we're blocking -- is already handled for "old-style" interrupts
>> in vcpu_block:
>>
>> void vcpu_block(void)
>> {
>>     struct vcpu *v = current;
>>
>>     set_bit(_VPF_blocked, &v->pause_flags);
>>
>>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>>     if ( local_events_need_delivery() )
>>     {
>>         clear_bit(_VPF_blocked, &v->pause_flags);
>>     }
>>     else
>>     {
>>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>>         raise_softirq(SCHEDULE_SOFTIRQ);
>>     }
>> }
>>
>> That is, we set _VPF_blocked, so that any interrupt which would wake it
>> up actually wakes it up, and then we check local_events_need_delivery()
>> to see if there were any that came in after we decided to block but
>> before we made sure that an interrupt would wake us up.
>>
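The ordering argument can be shown in a tiny model: the blocker publishes "blocked" before checking for pending events, and the waker publishes the event before checking "blocked", so with sequentially consistent atomics at least one side sees the other and the wakeup cannot be lost. All names here are hypothetical stand-ins for _VPF_blocked, local_events_need_delivery() and vcpu_wake().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal model of the "set blocked, then re-check events" pattern. */
struct vcpu_model {
    atomic_bool blocked;         /* stands in for _VPF_blocked */
    atomic_bool event_pending;   /* stands in for a pending event */
    atomic_int  wakeups;         /* times the waker unblocked the vCPU */
};

/* Waker side: publish the event first, then check whether to wake. */
static void send_event(struct vcpu_model *v)
{
    atomic_store(&v->event_pending, true);
    if ( atomic_exchange(&v->blocked, false) )
        atomic_fetch_add(&v->wakeups, 1);   /* models vcpu_wake() */
}

/* Blocker side: returns true if the vCPU really goes to sleep. */
static bool block_model(struct vcpu_model *v)
{
    atomic_store(&v->blocked, true);
    /* Check for events /after/ setting the flag, as vcpu_block() does. */
    if ( atomic_load(&v->event_pending) )
    {
        atomic_store(&v->blocked, false);
        return false;
    }
    return true;
}
```

If both sides see each other, the vCPU is "unblocked" twice, which is harmless; the case ruled out is neither side seeing the other.
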
>> I think it would be best if we could keep all the logic that does the
>> same thing in the same place.  That would mean, in vcpu_block(), changing
>> the NV to pi_wakeup_vector after calling set_bit(_VPF_blocked), and then
>> extending local_events_need_delivery() to also look for pending PI
>> events.
>>
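A minimal sketch of that proposal, assuming the only PI state that matters for the check is the descriptor's ON bit. struct pi_vcpu_model, vcpu_block_pi(), events_need_delivery() and the two vector values are hypothetical. In this sketch the vCPU is put on the blocked list (with NDST pointing at the list owner) before NV is switched, so that a wakeup notification arriving right after the switch can find it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_ON 0x1u
/* Hypothetical vector numbers, for illustration only. */
#define POSTED_INTR_VECTOR 0xf2
#define PI_WAKEUP_VECTOR   0xf1

struct pi_vcpu_model {
    atomic_bool      blocked;          /* stands in for _VPF_blocked */
    atomic_uint      pi_flags;         /* ON bit of the PI descriptor */
    _Atomic uint8_t  nv;               /* notification vector */
    _Atomic uint32_t ndst;             /* notification destination */
    bool             on_blocked_list;  /* on a per-pCPU blocked list */
};

/* Stand-in for local_events_need_delivery(), extended to check ON. */
static bool events_need_delivery(struct pi_vcpu_model *v)
{
    return (atomic_load(&v->pi_flags) & PI_ON) != 0;
}

/*
 * Sketch of vcpu_block() with the PI steps folded in. Returns true if
 * the vCPU really blocks (the real code would then raise
 * SCHEDULE_SOFTIRQ).
 */
static bool vcpu_block_pi(struct pi_vcpu_model *v, uint32_t list_cpu)
{
    atomic_store(&v->blocked, true);

    /* Make the vCPU findable by the wakeup handler first ... */
    v->on_blocked_list = true;
    atomic_store(&v->ndst, list_cpu);
    /* ... then route future notifications to the wakeup vector. */
    atomic_store(&v->nv, PI_WAKEUP_VECTOR);

    /* Check for events /after/ blocking, now including posted ones. */
    if ( events_need_delivery(v) )
    {
        /* Undo, roughly as vcpu_wake() would for blocked -> runnable. */
        atomic_store(&v->blocked, false);
        atomic_store(&v->nv, POSTED_INTR_VECTOR);
        v->on_blocked_list = false;
        return false;
    }
    return true;
}
```
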
>> Looking a bit more at your states, I think the actions that need to be
>> taken on all the transitions are these (collapsing 'runnable' and
>> 'offline' into the same state):
>>
>> blocked -> runnable (vcpu_wake)
>>  - NV = posted_intr_vector
>>  - Take vcpu off blocked list
>>  - SN = 1
>> runnable -> running (context_switch)
>>  - SN = 0
> 
> We also need to set the 'NDST' field to the pCPU the vCPU is going to run on.
> 
>> running -> runnable (context_switch)
>>  - SN = 1
>> running -> blocked (vcpu_block)
>>  - NV = pi_wakeup_vector
>>  - Add vcpu to blocked list
> 
> We need to set the 'NDST' field to the pCPU which owns the blocking list,
> so we can wake up the vCPU from the right blocking list in the wakeup
> event handler.
> 
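The transition table above (including the NDST additions) can be written down as a single function, which makes it easy to check that each transition touches only the fields listed. This is a sketch: NDST is modelled as a plain pCPU number rather than an APIC ID, and the field names, function name and vector values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector numbers, for illustration only. */
#define POSTED_INTR_VECTOR 0xf2
#define PI_WAKEUP_VECTOR   0xf1

/* 'runnable' and 'offline' collapsed into one state, as in the table. */
enum pi_state { PI_BLOCKED, PI_RUNNABLE, PI_RUNNING };

struct pi_fields {
    uint8_t  nv;         /* notification vector */
    bool     sn;         /* suppress notification */
    uint32_t ndst;       /* notification destination (pCPU id here) */
    bool     on_list;    /* on a per-pCPU blocked list */
};

/*
 * Apply the descriptor updates for one transition. 'cpu' is the pCPU
 * the vCPU will run on (runnable -> running) or the pCPU owning the
 * blocked list (running -> blocked). Returns false for transitions
 * that are not in the table.
 */
static bool pi_transition(struct pi_fields *f, enum pi_state from,
                          enum pi_state to, uint32_t cpu)
{
    if ( from == PI_BLOCKED && to == PI_RUNNABLE )        /* vcpu_wake */
    {
        f->nv = POSTED_INTR_VECTOR;
        f->on_list = false;
        f->sn = true;
    }
    else if ( from == PI_RUNNABLE && to == PI_RUNNING )   /* context_switch */
    {
        f->ndst = cpu;
        f->sn = false;
    }
    else if ( from == PI_RUNNING && to == PI_RUNNABLE )   /* context_switch */
        f->sn = true;
    else if ( from == PI_RUNNING && to == PI_BLOCKED )    /* vcpu_block */
    {
        f->nv = PI_WAKEUP_VECTOR;
        f->ndst = cpu;
        f->on_list = true;
    }
    else
        return false;

    return true;
}
```

Note that running -> blocked leaves SN clear (it was cleared on the way to running), which is what lets the wakeup notification through.
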
>>
>> This actually has a few pretty nice properties:
>> 1. You have a nice pair of complementary actions -- block / wake, run /
>> preempt
>> 2. The potentially long actions with lists happen in vcpu_wake and
>> vcpu_block, not on the context switch path
>>
>> And at this point, we don't have the "lazy context switch" issue
>> anymore, do we?  Because we've handled the "blocking" case in
>> vcpu_block(), we don't need to do anything in the main context_switch()
>> (which may do the lazy context switching into idle).  We only need to do
>> something in the actual __context_switch().
> 
> I think the lazy context switch handling is not only needed for the blocking
> case; we still need to do something for lazy context switch even if we handle
> the blocking case in vcpu_block(), such as:
> 1. For non-idle -> idle
> - set 'SN'

If we set SN in vcpu_block(), then we don't need to set it on context
switch -- right?

> 2. For idle -> non-idle
> - clear 'SN'
> - set the 'NDST' field to the cpu the vCPU is going to run on. (Maybe
> this one doesn't belong to the lazy context switch: if the cpu of the non-idle
> vCPU has changed, then per_cpu(curr_vcpu, cpu) != next in context_switch(),
> hence it will go to __context_switch() directly, right?)

If we clear SN* in vcpu_wake(), then we don't need to clear it on a
context switch.  And the only way we can transition from "idle lazy
context switch" to "runnable" is if the vcpu was the last vcpu running
on this pcpu -- in which case, NDST should already be set to this pcpu,
right?
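
To make the lazy-switch argument concrete, here is a toy model of the context_switch() / __context_switch() split, assuming NDST is only ever written on the full-switch path. curr_vcpu[] stands in for per_cpu(curr_vcpu, cpu); the other names are hypothetical, and the cpu_online() part of the real lazy-switch condition is left out. The assert() in the lazy-return branch encodes the claim that NDST already points at this pCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4

struct vcpu_m {
    uint32_t ndst;       /* PI descriptor NDST (pCPU id here) */
    bool     is_idle;
};

/* Models per_cpu(curr_vcpu, cpu): the vCPU whose state is loaded. */
static struct vcpu_m *curr_vcpu[NR_CPUS_MODEL];

/* Models __context_switch(): full switch, the only place NDST is set. */
static void full_switch(struct vcpu_m *next, uint32_t cpu)
{
    if ( !next->is_idle )
        next->ndst = cpu;
    curr_vcpu[cpu] = next;
}

/* Models context_switch(): returns true if a full switch was done. */
static bool switch_to(struct vcpu_m *next, uint32_t cpu)
{
    if ( next->is_idle )
        return false;            /* lazy: keep previous state loaded */
    if ( curr_vcpu[cpu] == next )
    {
        /* Lazy return: last vCPU running here, NDST already correct. */
        assert(next->ndst == cpu);
        return false;
    }
    full_switch(next, cpu);
    return true;
}
```
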

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
