All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Vrabel <david.vrabel@citrix.com>
To: Matt Wilson <msw@linux.com>
Cc: Charles Wang <muming.wq@taobao.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Zhu Yanhai <gaoyang.zyh@taobao.com>,
	Shen Yiben <zituan@taobao.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Zhu Yanhai <zhu.yanhai@gmail.com>,
	Wan Jia <jia.wanj@alibaba-inc.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH] x86/fpu: CR0.TS should be set before trap into PV guest's #NM exception handler
Date: Thu, 27 Feb 2014 12:37:43 +0000	[thread overview]
Message-ID: <530F3197.2040403@citrix.com> (raw)
In-Reply-To: <20140227000405.GA11825@u109add4315675089e695.ant.amazon.com>

On 27/02/14 00:04, Matt Wilson wrote:
> On Wed, Nov 06, 2013 at 08:51:56AM +0000, Jan Beulich wrote:
>>>>> On 06.11.13 at 07:41, Zhu Yanhai <zhu.yanhai@gmail.com> wrote:
>>> As we know Intel X86's CR0.TS is a sticky bit, which means once set
>>> it remains set until cleared by some software routines, in other words,
>>> the exception handler expects the bit is set when it starts to execute.
>>
>> Since when would that be the case? CR0.TS is entirely unaffected
>> by exception invocations according to all I know. All that is known
>> here is that #NM wouldn't have occurred in the first place if CR0.TS
>> was clear.
>>
>>> However xen doesn't simulate this behavior quite well for PV guests -
>>> vcpu_restore_fpu_lazy() clears CR0.TS unconditionally in the very beginning,
>>> so the guest kernel's #NM handler runs with CR0.TS cleared. Generally 
>>> speaking
>>> it's fine since the linux kernel executes the exception handler with
>>> interrupt disabled and a sane #NM handler will clear the bit anyway
>>> before it exits, but there's a catch: if it's the first FPU trap for the 
>>> process,
>>> the linux kernel must allocate a piece of SLAB memory for it to save
>>> the FPU registers, which opens a schedule window as the memory
>>> allocation might sleep -- and with CR0.TS keeps clear!
>>>
>>> [see the code below in linux kernel,
>>
>> You're apparently referring to the pvops kernel.
>>
>>> void math_state_restore(void)
>>> {
>>>     struct task_struct *tsk = current;
>>>
>>>     if (!tsk_used_math(tsk)) {
>>>         local_irq_enable();
>>>         /*
>>>          * does a slab alloc which can sleep
>>>          */
>>>         if (init_fpu(tsk)) {                 <<<< Here it might open a schedule window
>>>             /*
>>>              * ran out of memory!
>>>              */
>>>             do_group_exit(SIGKILL);
>>>             return;
>>>         }
>>>         local_irq_disable();
>>>     }
>>>
>>>     __thread_fpu_begin(tsk);    <<<< Here the process gets marked as a 'fpu user'
>>>                                          after the schedule window
>>>
>>>     /*
>>>      * Paranoid restore. send a SIGSEGV if we fail to restore the state.
>>>      */
>>>     if (unlikely(restore_fpu_checking(tsk))) {
>>>         drop_init_fpu(tsk);
>>>         force_sig(SIGSEGV, tsk);
>>>         return;
>>>     }
>>>
>>>     tsk->fpu_counter++;
>>> }
>>> ]
>>
>> May I direct your attention to the XenoLinux one:
>>
>> asmlinkage void math_state_restore(void)
>> {
>> 	struct task_struct *me = current;
>>
>> 	/* NB. 'clts' is done for us by Xen during virtual trap. */
>> 	__get_cpu_var(xen_x86_cr0) &= ~X86_CR0_TS;
>> 	if (!used_math())
>> 		init_fpu(me);
>> 	restore_fpu_checking(&me->thread.i387.fxsave);
>> 	task_thread_info(me)->status |= TS_USEDFPU;
>> }
>>
>> Note the comment close to the beginning - the fact that CR0.TS
>> is clear at exception handler entry is actually part of the PV ABI,
>> i.e. by altering hypervisor behavior here you break all forward
>> ported kernels.
>>
>> Nevertheless I agree that there is an issue, but this needs to be
>> fixed on the Linux side (hence adding the Linux maintainers to Cc);
>> this issue was introduced way back in 2.6.26 (before that there
>> was no allocation on that path). It's not clear though whether
>> using GFP_ATOMIC for the allocation would be preferable over
>> stts() before calling the allocation function (and clts() if it
>> succeeded), or whether perhaps to defer the stts() until we
>> actually know the task is being switched out. It's going to be an
>> ugly, Xen-specific hack in any event.
> 
> Was there ever a resolution to this problem? I never saw a comment
> from the Linux Xen PV maintainers.

I think allocating on the context switch is mad and the irq
enable/disable just to allow the allocation looks equally mad.

I had vague plans to maintain a mempool for FPU contexts but couldn't
immediately think how we could guarantee that the pool would be kept
sufficiently populated.

David

  parent reply	other threads:[~2014-02-27 12:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-06  6:41 [PATCH] x86/fpu: CR0.TS should be set before trap into PV guest's #NM exception handler Zhu Yanhai
2013-11-06  8:51 ` Jan Beulich
2013-11-06  9:15   ` Zhu Yanhai
2013-11-06  9:28     ` Jan Beulich
2014-02-27  0:04   ` Matt Wilson
2014-02-27  8:00     ` Jan Beulich
2014-02-27 12:46       ` George Dunlap
2014-02-27 12:37     ` David Vrabel [this message]
2014-02-27 12:21 ` George Dunlap
2014-02-27 12:30   ` Processed: " xen
2015-09-11 16:50 ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=530F3197.2040403@citrix.com \
    --to=david.vrabel@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=gaoyang.zyh@taobao.com \
    --cc=ian.campbell@citrix.com \
    --cc=jia.wanj@alibaba-inc.com \
    --cc=msw@linux.com \
    --cc=muming.wq@taobao.com \
    --cc=xen-devel@lists.xenproject.org \
    --cc=zhu.yanhai@gmail.com \
    --cc=zituan@taobao.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.