From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754073AbbG2Oou (ORCPT ); Wed, 29 Jul 2015 10:44:50 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:45093 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753874AbbG2Oos (ORCPT ); Wed, 29 Jul 2015 10:44:48 -0400 Message-ID: <55B8E68B.2030305@oracle.com> Date: Wed, 29 Jul 2015 10:43:23 -0400 From: Boris Ostrovsky User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Andrew Cooper , Andy Lutomirski CC: "security@kernel.org" , Peter Zijlstra , X86 ML , "linux-kernel@vger.kernel.org" , Steven Rostedt , xen-devel , Borislav Petkov , Jan Beulich , Sasha Levin Subject: Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option References: <55B64FEA.70204@oracle.com> <55B659EC.5030009@oracle.com> <55B75993.90909@citrix.com> <55B7AE39.7000101@citrix.com> <55B7B791.2050208@oracle.com> <55B822B8.3090608@citrix.com> <55B841FF.2000102@oracle.com> <55B8E16C.2050406@citrix.com> In-Reply-To: <55B8E16C.2050406@citrix.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Source-IP: userv0021.oracle.com [156.151.31.71] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/29/2015 10:21 AM, Andrew Cooper wrote: > On 29/07/15 06:28, Andy Lutomirski wrote: >> On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky >> wrote: >>> On 07/28/2015 08:47 PM, Andrew Cooper wrote: >>>> On 29/07/2015 01:21, Andy Lutomirski wrote: >>>>> On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky >>>>> wrote: >>>>>> On 07/28/2015 01:07 PM, Andy Lutomirski wrote: >>>>>>> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper >>>>>>> wrote: >>>>>>>> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before >>>>>>>> xen_free_ldt() is attempting to nab back the pages which Xen still has >>>>>>>> mapped as an LDT. >>>>>>>> >>>>>>> I just instrumented it with yet more LSL instructions. I'm pretty >>>>>>> sure that set_ldt really is clearing at least LDT entry zero. >>>>>>> Nonetheless the free_ldt call still oopses. >>>>>>> >>>>>> Yes, I added some instrumentation to the hypervisor and we definitely >>>>>> set >>>>>> LDT to NULL before failing. >>>>>> >>>>>> -boris >>>>> Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from >>>>> getting incremented once on each CPU at the same time if both CPUs >>>>> fault in the same shadow LDT page at the same time? >>>> Nothing, but that is fine. If a page is in use in two vcpus LDTs, it is >>>> expected to have a type refcount of 2. >>>> >>>>> Similarly, what >>>>> keeps both CPUs from calling get_page_type at the same time and >>>>> therefore losing track of the page type reference count? >>>> a cmpxchg() loop in the depths of __get_page_type(). >>>> >>>>> I don't see why vmalloc or vm_unmap_aliases would have anything to do >>>>> with this, though. >>> So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately >>> and the problem went away. >> As far as I can tell, this affects TLB flushes but not unmaps. That >> means that my patch is totally bogus -- vm_unmap_aliases() *flushed* >> aliases but isn't involved in removing them from the page tables. >> That must be why xen_alloc_ldt and xen_set_ldt work today. >> >> So what does flushing the TLB have to do with anything? The only >> thing I can think of is that it might force some deferred hypercalls >> out. I can reproduce this easily on UP, so IPIs aren't involved. >> >> The other odd thing is that it seems like this happens when clearing >> the LDT and freeing the old one but not when setting the LDT and >> freeing the old one. This is plausibly related to the lazy mode in >> effect at the time, but I have no evidence for that. >> >> Two more data points: Putting xen_flush_mc before and after the >> SET_LDT multicall has no effect. Putting flush_tlb_all() in >> xen_free_ldt doesn't help either, while vm_unmap_aliases() in the >> exact same place does help. > FYI, I have got a repro now and am investigating. To simplify your test case, this is sufficient for me to trigger this: #define _GNU_SOURCE #include #include #include int main() { int i; struct user_desc desc = { .entry_number = 0, .base_addr = 0, .limit = 10, .seg_32bit = 1, .contents = 2, /* Code, not conforming */ .read_exec_only = 0, .limit_in_pages = 0, .seg_not_present = 0, .useable = 0 }; for (i = 0; i < 500; i++) syscall(SYS_modify_ldt, 0x11, &desc, sizeof(desc)); } Run this program in a loop --- the error is triggered (again, for me), when it exits. -boris