From: George Dunlap <george.dunlap@citrix.com>
To: "Wu, Feng" <feng.wu@intel.com>,
Dario Faggioli <dario.faggioli@citrix.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
George Dunlap <george.dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling
Date: Mon, 21 Sep 2015 10:18:31 +0100
Message-ID: <55FFCB67.3050900@citrix.com>
In-Reply-To: <E959C4978C3B6342920538CF579893F002722BBF@SHSMSX104.ccr.corp.intel.com>
On 09/21/2015 06:08 AM, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: George Dunlap [mailto:george.dunlap@citrix.com]
>> Sent: Thursday, September 17, 2015 5:38 PM
>> To: Dario Faggioli; Wu, Feng
>> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>> Cooper; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic
>> handling
>>
>> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
>>> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>>>
>>>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>>>> in debug builds?
>>>>
>>>> Does the following information in Config.mk mean it is a debug build?
>>>>
>>>> # A debug build of Xen and tools?
>>>> debug ?= y
>>>> debug_symbols ?= $(debug)
>>>>
>>> I think so. But as I said in my other email, I was wrong, and this is
>>> probably not an issue.
>>>
>>>>> And in either case (just tossing out ideas) would it be
>>>>> possible to deal with the "interrupt already raised when blocking" case:
>>>>
>>>> Thanks for the suggestions below!
>>>>
>>> :-)
>>>
>>>>> - later in the context switching function ?
>>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
>>>> of calling vcpu_unblock() directly; then, when it returns to context_switch(),
>>>> we can check the flag and not actually block the vCPU.
>>>>
>>> Yeah, and that would still be rather hard to understand and maintain,
>>> IMO.
>>>
>>>> But I don't have a clear
>>>> picture of how to achieve this; here are some questions from me:
>>>> - When we are in context_switch(), we have already made the following changes
>>>> to the vCPU's state:
>>>> * sd->curr is set to next
>>>> * the vCPU's running state (of both prev and next) is changed by
>>>> vcpu_runstate_change()
>>>> * next->is_running is set to 1
>>>> * the periodic timer for prev is stopped
>>>> * the periodic timer for next is set up
>>>> ......
>>>>
>>>> So at what point should we perform the action to _unblock_ the vCPU? We
>>>> would need to roll back the former changes to the vCPU's state, right?
>>>>
>>> Mmm... not really. Not blocking prev does not mean that prev would be
>>> kept running on the pCPU, and that's true for your current solution as
>>> well! As you say yourself, you're already in the process of switching
>>> between prev and next, at a point where it's already settled that next
>>> will be the vCPU that will run. Not blocking means that prev is
>>> reinserted to the runqueue, and a new invocation to the scheduler is
>>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
>>> __runq_tickle()), but it's only when such new scheduling happens that
>>> prev will (potentially) be selected to run again.
>>>
>>> So, no, unless I'm fully missing your point, there wouldn't be any
>>> rollback required. However, I would still like the other solution (doing
>>> stuff in vcpu_block()) better (see below).
>>>
>>>>> - with another hook, perhaps in vcpu_block() ?
>>>>
>>>> We could check this in vcpu_block(); however, the logic here is that before
>>>> the vCPU is blocked, we need to change the posted-interrupt descriptor,
>>>> and if the 'ON' bit is set while we are changing it (which means the VT-d
>>>> hardware has issued a notification event because interrupts from the
>>>> assigned devices are coming), we don't need to block the vCPU, and hence
>>>> there is no need to update the PI descriptor in this case.
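The descriptor update described above can be sketched as a compare-and-exchange loop that backs out as soon as it sees 'ON'. This is a simplified, hypothetical model rather than the code in the patch: the control-word layout (ON = bit 0, SN = bit 1, NV = bits 16-23, NDST = bits 32-63 of the descriptor's upper 64-bit word) follows our reading of the VT-d specification, and `struct pi_desc_ctl` and `pi_try_prepare_block()` are invented names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout of the PI descriptor control word (assumed). */
#define PI_ON          (UINT64_C(1) << 0)
#define PI_SN          (UINT64_C(1) << 1)
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

/* Hypothetical stand-in for the control half of the descriptor. */
struct pi_desc_ctl {
    _Atomic uint64_t ctl;
};

/*
 * Switch the descriptor into "blocked" mode: NV = wakeup vector,
 * NDST = pCPU that owns the blocked list. Returns false, leaving the
 * descriptor untouched, if ON is already set, i.e. an interrupt has
 * been posted and the vCPU should not block.
 */
static bool pi_try_prepare_block(struct pi_desc_ctl *pi, uint8_t wakeup_vector,
                                 uint32_t list_cpu_apic_id)
{
    uint64_t old = atomic_load(&pi->ctl);
    uint64_t new;

    do {
        if ( old & PI_ON )
            return false;
        new = old & ~(PI_NV_MASK | PI_NDST_MASK);
        new |= (uint64_t)wakeup_vector << PI_NV_SHIFT;
        new |= (uint64_t)list_cpu_apic_id << PI_NDST_SHIFT;
        /* On failure, 'old' is reloaded and ON is re-checked. */
    } while ( !atomic_compare_exchange_weak(&pi->ctl, &old, new) );

    assert(!(new & PI_ON));
    return true;
}
```

Because the hardware sets 'ON' atomically, a notification posted between the load and the exchange makes the exchange fail, and the retry then sees 'ON' and backs out.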
>>>>
>>> Yep, I saw that. But would it be possible to do *everything* related to
>>> blocking, including the update of the descriptor, in vcpu_block(), if no
>>> interrupts have been raised yet at that time? I mean, would you, if
>>> updating the descriptor in there, still get the event that allows you to
>>> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
>>> the blocking, no matter whether that resulted in an actual context
>>> switch already or not?
>>>
>>> I appreciate that this narrows the window for such an event to happen by
>>> quite a bit, making the logic itself a little less useful (it makes
>>> things more similar to "regular" blocking vs. event delivery, though,
>>> AFAICT), but if it's correct, and if it allows us to save the ugly
>>> invocation of vcpu_unblock from context switch context, I'd give it a
>>> try.
>>>
>>> After all, this PI thing requires actions to be taken when a vCPU is
>>> scheduled or descheduled because of blocking, unblocking and
>>> preemptions, and it would seem natural to me to:
>>> - deal with blockings in vcpu_block()
>>> - deal with unblockings in vcpu_wake()
>>> - deal with preemptions in context_switch()
>>>
>>> This does not mean that being able to consolidate some of the cases (like
>>> blockings and preemptions, in the current version of the code) would not
>>> be a nice thing... But we don't want it at all costs. :-)
>>
>> So just to clarify the situation...
>>
>> If a vcpu is configured for the "running" state (i.e., NV set to
>> "posted_intr_vector", notifications enabled), and an interrupt happens
>> in the hypervisor -- what happens?
>>
>> Is it the case that the interrupt is not actually delivered to the
>> processor, but that the pending bit will be set in the pi field, so that
>> the interrupt will be delivered the next time the hypervisor returns
>> into the guest?
>>
>> (I am assuming that is the case, because if the hypervisor *does* get an
>> interrupt, then it can just unblock it there.)
>>
>> This sort of race condition -- where we get an interrupt to wake up a
>> vcpu as we're blocking -- is already handled for "old-style" interrupts
>> in vcpu_block:
>>
>> void vcpu_block(void)
>> {
>>     struct vcpu *v = current;
>>
>>     set_bit(_VPF_blocked, &v->pause_flags);
>>
>>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>>     if ( local_events_need_delivery() )
>>     {
>>         clear_bit(_VPF_blocked, &v->pause_flags);
>>     }
>>     else
>>     {
>>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>>         raise_softirq(SCHEDULE_SOFTIRQ);
>>     }
>> }
>>
>> That is, we set _VPF_blocked, so that any interrupt which would wake it
>> up actually wakes it up, and then we check local_events_need_delivery()
>> to see if there were any that came in after we decided to block but
>> before we made sure that an interrupt would wake us up.
>>
>> I think it would be best if we could keep all the logic that does the
>> same thing in the same place. Which would mean in vcpu_block(), after
>> calling set_bit(_VPF_blocked), changing the NV to pi_wakeup_vector, and
>> then extending local_events_need_delivery() to also look for pending PI
>> events.
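A sketch of that proposal, written as a self-contained toy model and not as Xen code: `struct toy_vcpu`, `toy_vcpu_block()`, `toy_events_need_delivery()` and the vector numbers are hypothetical, and reverting NV and the list membership on the early-exit path is our assumption about how the undo would be done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_ON        (UINT64_C(1) << 0)
#define PI_NV_SHIFT  16
#define PI_NV_MASK   (UINT64_C(0xff) << PI_NV_SHIFT)

/* Hypothetical vector numbers, for illustration only. */
enum { POSTED_INTR_VECTOR = 0xf2, PI_WAKEUP_VECTOR = 0xf1 };

struct toy_vcpu {
    atomic_bool blocked;        /* stands in for _VPF_blocked */
    bool upcall_pending;        /* "old-style" event-channel upcall */
    _Atomic uint64_t pi_ctl;    /* PI descriptor control word (ON, NV) */
    bool on_blocked_list;       /* stands in for the per-pCPU blocked list */
};

/* Atomically replace NV, preserving ON and all other bits. */
static void pi_set_nv(struct toy_vcpu *v, uint8_t nv)
{
    uint64_t old = atomic_load(&v->pi_ctl), new;

    do {
        new = (old & ~PI_NV_MASK) | ((uint64_t)nv << PI_NV_SHIFT);
    } while ( !atomic_compare_exchange_weak(&v->pi_ctl, &old, new) );
}

/* local_events_need_delivery(), extended to look at posted interrupts. */
static bool toy_events_need_delivery(struct toy_vcpu *v)
{
    return v->upcall_pending || (atomic_load(&v->pi_ctl) & PI_ON);
}

/* Suggested ordering: mark blocked, make sure a posted interrupt would
 * wake us, then check for anything that arrived in the meantime. */
static bool toy_vcpu_block(struct toy_vcpu *v)
{
    assert(!atomic_load(&v->blocked));     /* only a running vCPU blocks */

    atomic_store(&v->blocked, true);
    v->on_blocked_list = true;             /* list first, so the wakeup handler finds us */
    pi_set_nv(v, PI_WAKEUP_VECTOR);

    if ( toy_events_need_delivery(v) )
    {
        /* An interrupt was posted before NV changed: do not block. */
        atomic_store(&v->blocked, false);
        v->on_blocked_list = false;
        pi_set_nv(v, POSTED_INTR_VECTOR);
        return false;
    }

    return true;                           /* caller raises SCHEDULE_SOFTIRQ */
}
```

The vCPU is put on the blocked list before NV is switched, so that a wakeup notification arriving right after the switch can find it there.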
>>
>> Looking a bit more at your states, I think the actions that need to be
>> taken on all the transitions are these (collapsing 'runnable' and
>> 'offline' into the same state):
>>
>> blocked -> runnable (vcpu_wake)
>> - NV = posted_intr_vector
>> - Take vcpu off blocked list
>> - SN = 1
>> runnable -> running (context_switch)
>> - SN = 0
>
> We need to set the 'NDST' field to the right destination pCPU as well.
>
>> running -> runnable (context_switch)
>> - SN = 1
>> running -> blocked (vcpu_block)
>> - NV = pi_wakeup_vector
>> - Add vcpu to blocked list
>
> We need to set the 'NDST' field to the pCPU which owns the blocking list,
> so that we can wake up the vCPU from the right blocking list in the wakeup
> event handler.
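The transitions above, including the two 'NDST' additions, can be summarised as four small handlers. This is only a sketch over a plain-field view of the descriptor (no atomics); `struct toy_pi`, the handler names and the vector numbers are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector numbers, for illustration only. */
enum { POSTED_INTR_VECTOR = 0xf2, PI_WAKEUP_VECTOR = 0xf1 };

struct toy_pi {
    bool sn;          /* SN: suppress notification */
    uint8_t nv;       /* NV: notification vector */
    uint32_t ndst;    /* NDST: notification destination pCPU */
    bool on_list;     /* on some pCPU's blocked list */
};

/* blocked -> runnable (vcpu_wake) */
static void pi_wake(struct toy_pi *pi)
{
    pi->nv = POSTED_INTR_VECTOR;
    pi->on_list = false;
    pi->sn = true;
}

/* runnable -> running (context_switch) */
static void pi_run(struct toy_pi *pi, uint32_t this_cpu)
{
    assert(!pi->on_list);     /* a runnable vCPU is not on a blocked list */
    pi->ndst = this_cpu;      /* point NDST at the pCPU we run on */
    pi->sn = false;
}

/* running -> runnable (context_switch, preemption) */
static void pi_preempt(struct toy_pi *pi)
{
    pi->sn = true;
}

/* running -> blocked (vcpu_block) */
static void pi_block(struct toy_pi *pi, uint32_t list_cpu)
{
    assert(!pi->sn);          /* we block from running, so SN is already 0 */
    pi->nv = PI_WAKEUP_VECTOR;
    pi->ndst = list_cpu;      /* NDST = pCPU owning the blocked list */
    pi->on_list = true;
}
```

Blocking leaves SN at 0, so wakeup notifications for a blocked vCPU are still delivered; the assertions encode the invariants implied by the list.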
>
>>
>> This actually has a few pretty nice properties:
>> 1. You have a nice pair of complementary actions -- block / wake, run /
>> preempt
>> 2. The potentially long actions with lists happen in vcpu_wake and
>> vcpu_block, not on the context switch path
>>
>> And at this point, we don't have the "lazy context switch" issue
>> anymore, do we? Because we've handled the "blocking" case in
>> vcpu_block(), we don't need to do anything in the main context_switch()
>> (which may do the lazy context switching into idle). We only need to do
>> something in the actual __context_switch().
>
> I think the lazy context switch handling is not only needed for the blocking case;
> we still need to do something on a lazy context switch even if we handle the
> blocking case in vcpu_block(), such as:
> 1. For non-idle -> idle
> - set 'SN'
If we set SN in vcpu_block(), then we don't need to set it on context
switch -- right?
> 2. For idle -> non-idle
> - clear 'SN'
> - set the 'NDST' field to the pCPU the vCPU is going to run on. (Maybe
> this one doesn't belong to the lazy context switch: if the pCPU of the non-idle
> vCPU has changed, then per_cpu(curr_vcpu, cpu) != next in context_switch(),
> hence it will go to __context_switch() directly, right?)
If we clear SN in vcpu_wake(), then we don't need to clear it on a
context switch. And the only way we can transition from "idle lazy
context switch" to "runnable" is if the vcpu was the last vcpu running
on this pcpu -- in which case, NDST should already be set to this pcpu,
right?
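The lazy-switch argument can be checked against a toy model of the x86 context_switch() decision, simplified from our reading of Xen: __context_switch() is skipped both when switching to idle and when per_cpu(curr_vcpu, cpu) is already next. The types and function names below are hypothetical, and only the NDST update is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDLE_VCPU_ID (-1)    /* hypothetical id for the idle vCPU */

struct toy_vcpu {
    int id;
    uint32_t ndst;           /* PI descriptor NDST field */
};

struct toy_pcpu {
    uint32_t id;
    int curr_vcpu;           /* like per_cpu(curr_vcpu, cpu): whose state is loaded */
};

/* Stands in for __context_switch(): under the proposal, the only place
 * on the switch path where PI state (here just NDST) is updated. */
static void toy__context_switch(struct toy_pcpu *c, struct toy_vcpu *next)
{
    next->ndst = c->id;
    c->curr_vcpu = next->id;
}

/* Stands in for context_switch(): switching to idle is lazy, and
 * switching back to the vCPU whose state is still loaded skips
 * __context_switch() altogether. */
static void toy_context_switch(struct toy_pcpu *c, struct toy_vcpu *next)
{
    if ( next->id == IDLE_VCPU_ID )
        return;                      /* lazy: keep prev's state loaded */

    if ( c->curr_vcpu == next->id )
    {
        /* The vCPU was the last one running here, so NDST should
         * already point at this pCPU. */
        assert(next->ndst == c->id);
        return;
    }

    toy__context_switch(c, next);
}
```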
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel