From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754566Ab1LLWuK (ORCPT ); Mon, 12 Dec 2011 17:50:10 -0500 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:33860 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753184Ab1LLWuI (ORCPT ); Mon, 12 Dec 2011 17:50:08 -0500 Message-ID: <4EE6850F.5070002@fb.com> Date: Mon, 12 Dec 2011 14:49:51 -0800 From: Arun Sharma User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:8.0) Gecko/20111105 Thunderbird/8.0 MIME-Version: 1.0 To: Andrew Lutomirski CC: , Kumar Sundararajan , Peter Zijlstra , Ingo Molnar , Thomas Gleixner , john stultz Subject: Re: [PATCH 2/2] Add a thread cpu time implementation to vDSO References: <1323718578-1157-1-git-send-email-asharma@fb.com> <1323718578-1157-3-git-send-email-asharma@fb.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [192.168.18.252] X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.5.7110,1.0.211,0.0.0000 definitions=2011-12-12_10:2011-12-12,2011-12-12,1970-01-01 signatures=0 X-Proofpoint-Spam-Reason: safe Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/12/11 12:15 PM, Andrew Lutomirski wrote: > This seems like a neat idea, but I'm a little worried about the > implementation. Surely someone with 4k+ cpus will complain when the > percpu vvar data exceeds a page and crashes. I have no personal > objection to a dynamically-sized vvar area, but it might need more > careful handling. I'm for disabling the vsyscall when the data doesn't fit in a couple of pages. > > I also wonder if there a problem with information leaking. If there > was an entire per-cpu vvar page (perhaps mapped differently on each > cpu) then nothing interesting would leak. But that could add > overhead. > The timing based attacks depend on the granularity of timestamps. I feel what's available here is too coarse grained to be useful. Happy to disable the code at compile time for those cases. Are there CONFIG_HIGH_SECURITY type of options I could use for this purpose? >> diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h >> index 0fd7a4a..e36e1c1 100644 >> --- a/arch/x86/include/asm/vvar.h >> +++ b/arch/x86/include/asm/vvar.h >> @@ -47,5 +47,6 @@ >> DECLARE_VVAR(0, volatile unsigned long, jiffies) >> DECLARE_VVAR(16, int, vgetcpu_mode) >> DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data) >> +DECLARE_VVAR(2048, struct vcpu_data, vcpu_data) > > Any reason this isn't page-aligned? Offset 2048 seems like an odd place. > I meant it to be 128 + sizeof(struct vsyscall_gtod_data) so we fit everything in one page if CONFIG_NR_CPUS is small enough. I'll fix it up in the next rev. >> notrace static long vdso_fallback_gettime(long clock, struct timespec *ts) >> { >> long ret; >> - asm("syscall" : "=a" (ret) : >> - "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory"); >> + asm volatile("syscall" : "=a" (ret) : >> + "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory"); >> return ret; >> } > > Huh? This should probably be a separate patch (and probably not a > -stable candidate, since it would take amazing compiler stupidity to > generate anything other than the obvious code). The memory clobber > may also be enough to make this officially safe. Yes - this should be a separate patch. gcc-4.4 likes to get rid of the instruction in __do_thread_cpu_time without the asm volatile (in spite of the memory clobber). >> + if (vp->tsc_unstable) { >> + struct timespec ts; >> + vdso_fallback_gettime(CLOCK_THREAD_CPUTIME_ID,&ts); >> + return timespec_to_ns(&ts); >> + } > > Yuck -- another flag indicating whether we're using the tsc. I renamed it to thread_cputime_disabled to deal with NR_CPUS > 64. > >> + >> + do { >> + native_read_tscp(&p); > > Do all 64-bit cpus have rdtscp? ISTR older Intel machines don't have it. > Yeah - I ran into this one when testing with KVM on a rdtscp capable CPU. Will disable the vsyscall when the CPU doesn't support rdtscp. >> --- a/arch/x86/vdso/vdso.lds.S >> +++ b/arch/x86/vdso/vdso.lds.S >> @@ -25,6 +25,8 @@ VERSION { >> __vdso_getcpu; >> time; >> __vdso_time; >> + thread_cpu_time; >> + __vdso_thread_cpu_time; >> local: *; >> }; >> } > > Why do we have the non-__vdso versions? IMO anything that actually > uses them is likely to be confused. They have the same names as the > glibc wrappers, but the glibc wrappers have different return value and > errno semantics. I'd say just use __vdso. I'm not familiar with the history of __vdso_foo vs foo. Happy to get rid of the non __vdso versions. > > Also, what's wrong with just adding the functionality to __vdso_clock_gettime? > Performance. Most of this is client dependent (whether it expects time in nanoseconds or timespec). Baseline: 19.34 secs vclock_gettime: 4.74 secs thread_cpu_time: 3.62 secs Our use case prefers nanoseconds, so the conversion to timespec and back adds overhead. Also, having a direct entry point vs going through the switch in clock_gettime() adds some cycles. I have some people here asking me if they could get CLOCK_REALTIME and CLOCK_MONOTONIC also in nanoseconds for the same reason. >> +struct vcpu_data { >> + struct vpercpu_data vpercpu[NR_CPUS]; >> + unsigned int tsc_khz; > > I think this also duplicates some of the gtod_data stuff, at least in > the case that TSC works for gtod. > vgotd.h seems to be clock source independent. This vsyscall is usable only on systems with tsc_unstable == 0. Also, there is only one instance of mult and shift there, whereas we save it on a per-cpu basis in the VVAR page. Thanks for the review and comments! -Arun