* [RFC] perf: mmap2 not covering VM_CLONE regions @ 2013-09-30 15:44 Stephane Eranian 2013-09-30 16:15 ` Peter Zijlstra 2013-10-08 14:23 ` David Ahern 0 siblings, 2 replies; 42+ messages in thread From: Stephane Eranian @ 2013-09-30 15:44 UTC (permalink / raw) To: LKML Cc: Peter Zijlstra, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins Hi, I was alerted by people trying to use the PERF_RECORD_MMAP2 record to disambiguate virtual address mappings that there is a case where the record does not contain enough information. As you know, the MMAP2 record adds the major, minor, ino number, inode generation numbers to a mapping. But it does that only for file or pseudo -file backed mappings. That covers file mmaps and also SYSV shared memory segments. However there is a another kind of situation that arises in some multi-process benchmarks where a region of memory is cloned using VM_CLONE. As such, the virtual addresses match between the processes but the major, minor, inode, inode generation fields are all zeroes because there is no inode associated with the mapping. Yet, it is important for the tool to know the mappings between the processes are pointing to the same physical data. We need to cover this case and I am seeking for advice on how to best address this need given that we discarded using the plain physical address for disambiguation. Thanks. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-09-30 15:44 [RFC] perf: mmap2 not covering VM_CLONE regions Stephane Eranian @ 2013-09-30 16:15 ` Peter Zijlstra 2013-09-30 16:48 ` Stephane Eranian 2013-10-08 14:23 ` David Ahern 1 sibling, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-09-30 16:15 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins On Mon, Sep 30, 2013 at 05:44:41PM +0200, Stephane Eranian wrote: > Hi, > > I was alerted by people trying to use the PERF_RECORD_MMAP2 > record to disambiguate virtual address mappings that there is a case > where the record does not contain enough information. > > As you know, the MMAP2 record adds the major, minor, ino number, > inode generation numbers to a mapping. But it does that only for > file or pseudo -file backed mappings. That covers file mmaps and also > SYSV shared memory segments. > > However there is a another kind of situation that arises in some > multi-process benchmarks where a region of memory is cloned > using VM_CLONE. As such, the virtual addresses match between > the processes but the major, minor, inode, inode generation fields > are all zeroes because there is no inode associated with the mapping. > Yet, it is important for the tool to know the mappings between the > processes are pointing to the same physical data. > > We need to cover this case and I am seeking for advice on how to > best address this need given that we discarded using the plain physical > address for disambiguation. Urgh.. who in his bloody mind is playing VM_CLNOE games that is not pthread_creatE() ? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-09-30 16:15 ` Peter Zijlstra @ 2013-09-30 16:48 ` Stephane Eranian 2013-09-30 16:54 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-09-30 16:48 UTC (permalink / raw) To: Peter Zijlstra Cc: LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins On Mon, Sep 30, 2013 at 6:15 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Sep 30, 2013 at 05:44:41PM +0200, Stephane Eranian wrote: >> Hi, >> >> I was alerted by people trying to use the PERF_RECORD_MMAP2 >> record to disambiguate virtual address mappings that there is a case >> where the record does not contain enough information. >> >> As you know, the MMAP2 record adds the major, minor, ino number, >> inode generation numbers to a mapping. But it does that only for >> file or pseudo -file backed mappings. That covers file mmaps and also >> SYSV shared memory segments. >> >> However there is a another kind of situation that arises in some >> multi-process benchmarks where a region of memory is cloned >> using VM_CLONE. As such, the virtual addresses match between >> the processes but the major, minor, inode, inode generation fields >> are all zeroes because there is no inode associated with the mapping. >> Yet, it is important for the tool to know the mappings between the >> processes are pointing to the same physical data. >> >> We need to cover this case and I am seeking for advice on how to >> best address this need given that we discarded using the plain physical >> address for disambiguation. > > Urgh.. who in his bloody mind is playing VM_CLNOE games that is not > pthread_creatE() ? Some matrix multiply benchmark, I guess. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-09-30 16:48 ` Stephane Eranian @ 2013-09-30 16:54 ` Peter Zijlstra 2013-10-01 11:22 ` Stephane Eranian 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-09-30 16:54 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins On Mon, Sep 30, 2013 at 06:48:55PM +0200, Stephane Eranian wrote: > On Mon, Sep 30, 2013 at 6:15 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Sep 30, 2013 at 05:44:41PM +0200, Stephane Eranian wrote: > >> Hi, > >> > >> I was alerted by people trying to use the PERF_RECORD_MMAP2 > >> record to disambiguate virtual address mappings that there is a case > >> where the record does not contain enough information. > >> > >> As you know, the MMAP2 record adds the major, minor, ino number, > >> inode generation numbers to a mapping. But it does that only for > >> file or pseudo -file backed mappings. That covers file mmaps and also > >> SYSV shared memory segments. > >> > >> However there is a another kind of situation that arises in some > >> multi-process benchmarks where a region of memory is cloned > >> using VM_CLONE. As such, the virtual addresses match between > >> the processes but the major, minor, inode, inode generation fields > >> are all zeroes because there is no inode associated with the mapping. > >> Yet, it is important for the tool to know the mappings between the > >> processes are pointing to the same physical data. > >> > >> We need to cover this case and I am seeking for advice on how to > >> best address this need given that we discarded using the plain physical > >> address for disambiguation. > > > > Urgh.. who in his bloody mind is playing VM_CLNOE games that is not > > pthread_creatE() ? > > Some matrix multiply benchmark, I guess. So the problem is that we don't have a user visible address space identifier; with CLONE_THREAD we have the thread group id that acts like this. But for bare CLONE_VM usage there's nothing afaik. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-09-30 16:54 ` Peter Zijlstra @ 2013-10-01 11:22 ` Stephane Eranian 2013-10-02 11:23 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-01 11:22 UTC (permalink / raw) To: Peter Zijlstra Cc: LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins On Mon, Sep 30, 2013 at 6:54 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Sep 30, 2013 at 06:48:55PM +0200, Stephane Eranian wrote: > > On Mon, Sep 30, 2013 at 6:15 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > > On Mon, Sep 30, 2013 at 05:44:41PM +0200, Stephane Eranian wrote: > > >> Hi, > > >> > > >> I was alerted by people trying to use the PERF_RECORD_MMAP2 > > >> record to disambiguate virtual address mappings that there is a case > > >> where the record does not contain enough information. > > >> > > >> As you know, the MMAP2 record adds the major, minor, ino number, > > >> inode generation numbers to a mapping. But it does that only for > > >> file or pseudo -file backed mappings. That covers file mmaps and also > > >> SYSV shared memory segments. > > >> > > >> However there is a another kind of situation that arises in some > > >> multi-process benchmarks where a region of memory is cloned > > >> using VM_CLONE. As such, the virtual addresses match between > > >> the processes but the major, minor, inode, inode generation fields > > >> are all zeroes because there is no inode associated with the mapping. > > >> Yet, it is important for the tool to know the mappings between the > > >> processes are pointing to the same physical data. > > >> > > >> We need to cover this case and I am seeking for advice on how to > > >> best address this need given that we discarded using the plain physical > > >> address for disambiguation. > > > > > > Urgh.. who in his bloody mind is playing VM_CLNOE games that is not > > > pthread_creatE() ? > > > > Some matrix multiply benchmark, I guess. > > So the problem is that we don't have a user visible address space > identifier; with CLONE_THREAD we have the thread group id that acts > like this. But for bare CLONE_VM usage there's nothing afaik. >From the tool's perspective, the MMAP2 record must contain enough information to identify that the mapping points to the same physical pages in that particular case (multi-process + VM_CLONE). As we have it now all inode-related fields are zero which is useless (indicates: no info). In other words, we need to make up some unique number and stash it in the maj.min,ino triplet somehow. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-01 11:22 ` Stephane Eranian @ 2013-10-02 11:23 ` Peter Zijlstra 2013-10-02 11:58 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-10-02 11:23 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook On Tue, Oct 01, 2013 at 01:22:58PM +0200, Stephane Eranian wrote: > > So the problem is that we don't have a user visible address space > > identifier; with CLONE_THREAD we have the thread group id that acts > > like this. But for bare CLONE_VM usage there's nothing afaik. > > > From the tool's perspective, the MMAP2 record must contain enough information > to identify that the mapping points to the same physical pages in that > particular > case (multi-process + VM_CLONE). As we have it now all inode-related fields > are zero which is useless (indicates: no info). In other words, we need to make > up some unique number and stash it in the maj.min,ino triplet somehow. So the only thing I can come up with is something like the below; supposedly the sha hash mixing a boot time random seed and the mm pointer is enough to avoid it being a data leak. And of course there's the possibility of a collision. --- --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -39,11 +39,15 @@ #include <linux/hw_breakpoint.h> #include <linux/mm_types.h> #include <linux/cgroup.h> +#include <linux/random.h> +#include <linux/cryptohash.h> #include "internal.h" #include <asm/irq_regs.h> +static u64 __perf_rand_seed; + struct remote_function_call { struct task_struct *p; int (*func)(void *info); @@ -5136,6 +5140,33 @@ static void perf_event_mmap_event(struct min = MINOR(dev); } else { + union { + struct { + u64 ino, gen; + u32 min; + }; + u32 digest[SHA_DIGEST_WORDS]; + } hash; + union { + struct { + u64 seed; + u64 ptr; + }; + u8 message[SHA_MESSAGE_BYTES]; + } data; + u32 workspace[SHA_WORKSPACE_WORDS]; + + memset(&data, 0, sizeof(data)); + data.seed = __perf_rand_seed; + data.ptr = (u64)vma->vm_mm; + + sha_init(hash.digest); + sha_transform(hash.digest, data.message, workspace); + + gen = hash.gen; + ino = hash.ino; + min = hash.min; + if (arch_vma_name(mmap_event->vma)) { name = strncpy(tmp, arch_vma_name(mmap_event->vma), sizeof(tmp) - 1); @@ -7895,6 +7926,8 @@ void __init perf_event_init(void) /* do not patch jump label more than once per second */ jump_label_rate_limit(&perf_sched_events, HZ); + get_random_bytes(&__perf_rand_seed, sizeof(__perf_rand_seed)); + /* * Build time assertion that we keep the data_head at the intended * location. IOW, validation we got the __reserved[] size right. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 11:23 ` Peter Zijlstra @ 2013-10-02 11:58 ` Peter Zijlstra 2013-10-02 12:39 ` Ingo Molnar 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-10-02 11:58 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook On Wed, Oct 02, 2013 at 01:23:16PM +0200, Peter Zijlstra wrote: > So the only thing I can come up with is something like the below; > supposedly the sha hash mixing a boot time random seed and the mm > pointer is enough to avoid it being a data leak. That is, right until it becomes feasible to run 2^64 SHA1 computations. I've actually no idea how hard that is given todays GPU assisted efforts. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 11:58 ` Peter Zijlstra @ 2013-10-02 12:39 ` Ingo Molnar 2013-10-02 12:46 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Ingo Molnar @ 2013-10-02 12:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 01:23:16PM +0200, Peter Zijlstra wrote: > > > So the only thing I can come up with is something like the below; > > supposedly the sha hash mixing a boot time random seed and the mm > > pointer is enough to avoid it being a data leak. > > That is, right until it becomes feasible to run 2^64 SHA1 computations. > I've actually no idea how hard that is given todays GPU assisted > efforts. Well, here are the possible cryptanalytic attacks I can think of: - differential, because here you don't just have access to the hash value but you can essentially feed highly correlated plaintext to the hash at will, by starting/stopping threads, knowing their typical mm pointer differences, etc. I.e. less than 2^64, potentially a lot less. - then there are timing attacks, and someone having access to a PMU context and who can trigger this SHA1 computation arbitrarily in task local context can run very accurate and low noise timing attacks... I don't think the kernel's sha_transform() is hardened against timing attacks, it's performance optimized so it has variable execution time highly dependent on plaintext input - which leaks information about the plaintext. But then again, how realistic is an attack? All that effort just to recover the raw kernel data pointer value of a struct mm? Dunno whether we should worry about it. Thanks, Ingo ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 12:39 ` Ingo Molnar @ 2013-10-02 12:46 ` Peter Zijlstra 2013-10-02 12:59 ` Stephane Eranian 2013-10-02 15:22 ` Ingo Molnar 0 siblings, 2 replies; 42+ messages in thread From: Peter Zijlstra @ 2013-10-02 12:46 UTC (permalink / raw) To: Ingo Molnar Cc: Stephane Eranian, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: > - then there are timing attacks, and someone having access to a PMU > context and who can trigger this SHA1 computation arbitrarily in task > local context can run very accurate and low noise timing attacks... > > I don't think the kernel's sha_transform() is hardened against timing > attacks, it's performance optimized so it has variable execution time > highly dependent on plaintext input - which leaks information about the > plaintext. Typical user doesn't have enough priv to profile kernel space; once you do you also have enough priv to see kernel addresses outright (ie. kallsyms etc..). ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 12:46 ` Peter Zijlstra @ 2013-10-02 12:59 ` Stephane Eranian 2013-10-02 13:01 ` Peter Zijlstra 2013-10-02 15:22 ` Ingo Molnar 1 sibling, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-02 12:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >> - then there are timing attacks, and someone having access to a PMU >> context and who can trigger this SHA1 computation arbitrarily in task >> local context can run very accurate and low noise timing attacks... >> >> I don't think the kernel's sha_transform() is hardened against timing >> attacks, it's performance optimized so it has variable execution time >> highly dependent on plaintext input - which leaks information about the >> plaintext. > > Typical user doesn't have enough priv to profile kernel space; once you > do you also have enough priv to see kernel addresses outright (ie. > kallsyms etc..). > I was going to say just that. But that's not the default, paranoid level is at 1 by default and not 2. So I supposedly can still do: $ perf record -e cycles ...... In per-thread mode and collect kernel level addresses. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 12:59 ` Stephane Eranian @ 2013-10-02 13:01 ` Peter Zijlstra 2013-10-02 13:13 ` Stephane Eranian 2013-10-07 11:16 ` Stephane Eranian 0 siblings, 2 replies; 42+ messages in thread From: Peter Zijlstra @ 2013-10-02 13:01 UTC (permalink / raw) To: Stephane Eranian Cc: Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: > On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: > >> - then there are timing attacks, and someone having access to a PMU > >> context and who can trigger this SHA1 computation arbitrarily in task > >> local context can run very accurate and low noise timing attacks... > >> > >> I don't think the kernel's sha_transform() is hardened against timing > >> attacks, it's performance optimized so it has variable execution time > >> highly dependent on plaintext input - which leaks information about the > >> plaintext. > > > > Typical user doesn't have enough priv to profile kernel space; once you > > do you also have enough priv to see kernel addresses outright (ie. > > kallsyms etc..). > > > I was going to say just that. But that's not the default, paranoid level > is at 1 by default and not 2. So I supposedly can still do: Oh right you are.. so yes that's a very viable avenue. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 13:01 ` Peter Zijlstra @ 2013-10-02 13:13 ` Stephane Eranian 2013-10-02 13:37 ` Peter Zijlstra 2013-10-07 11:16 ` Stephane Eranian 1 sibling, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-02 13:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >> >> - then there are timing attacks, and someone having access to a PMU >> >> context and who can trigger this SHA1 computation arbitrarily in task >> >> local context can run very accurate and low noise timing attacks... >> >> >> >> I don't think the kernel's sha_transform() is hardened against timing >> >> attacks, it's performance optimized so it has variable execution time >> >> highly dependent on plaintext input - which leaks information about the >> >> plaintext. >> > >> > Typical user doesn't have enough priv to profile kernel space; once you >> > do you also have enough priv to see kernel addresses outright (ie. >> > kallsyms etc..). >> > >> I was going to say just that. But that's not the default, paranoid level >> is at 1 by default and not 2. So I supposedly can still do: > > Oh right you are.. so yes that's a very viable avenue. You mean simply encoding the vma->vm_mm as the ino number, for instance. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 13:13 ` Stephane Eranian @ 2013-10-02 13:37 ` Peter Zijlstra 2013-10-02 17:14 ` Kees Cook 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-10-02 13:37 UTC (permalink / raw) To: Stephane Eranian Cc: Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton On Wed, Oct 02, 2013 at 03:13:16PM +0200, Stephane Eranian wrote: > On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: > >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: > >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: > >> >> - then there are timing attacks, and someone having access to a PMU > >> >> context and who can trigger this SHA1 computation arbitrarily in task > >> >> local context can run very accurate and low noise timing attacks... > >> >> > >> >> I don't think the kernel's sha_transform() is hardened against timing > >> >> attacks, it's performance optimized so it has variable execution time > >> >> highly dependent on plaintext input - which leaks information about the > >> >> plaintext. > >> > > >> > Typical user doesn't have enough priv to profile kernel space; once you > >> > do you also have enough priv to see kernel addresses outright (ie. > >> > kallsyms etc..). > >> > > >> I was going to say just that. But that's not the default, paranoid level > >> is at 1 by default and not 2. So I supposedly can still do: > > > > Oh right you are.. so yes that's a very viable avenue. > > You mean simply encoding the vma->vm_mm as the ino number, for instance. Nah.. I think Kees would very much shoot us on the spot for doing that. But with the paranoid level defaulting to 1 the PMU attack on the kernel SHA implenentation is feasible. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 13:37 ` Peter Zijlstra @ 2013-10-02 17:14 ` Kees Cook 2013-10-02 17:20 ` Stephane Eranian 0 siblings, 1 reply; 42+ messages in thread From: Kees Cook @ 2013-10-02 17:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 6:37 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 03:13:16PM +0200, Stephane Eranian wrote: >> On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: >> >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >> >> >> - then there are timing attacks, and someone having access to a PMU >> >> >> context and who can trigger this SHA1 computation arbitrarily in task >> >> >> local context can run very accurate and low noise timing attacks... >> >> >> >> >> >> I don't think the kernel's sha_transform() is hardened against timing >> >> >> attacks, it's performance optimized so it has variable execution time >> >> >> highly dependent on plaintext input - which leaks information about the >> >> >> plaintext. >> >> > >> >> > Typical user doesn't have enough priv to profile kernel space; once you >> >> > do you also have enough priv to see kernel addresses outright (ie. >> >> > kallsyms etc..). >> >> > >> >> I was going to say just that. But that's not the default, paranoid level >> >> is at 1 by default and not 2. So I supposedly can still do: >> >> >> >> $ perf record -e cycles ...... >> >> >> >> In per-thread mode and collect kernel level addresses. >> > >> > Oh right you are.. so yes that's a very viable avenue. >> >> You mean simply encoding the vma->vm_mm as the ino number, for instance. > > Nah.. I think Kees would very much shoot us on the spot for doing that. > But with the paranoid level defaulting to 1 the PMU attack on the kernel > SHA implenentation is feasible. We already have other kernel address leaks (e.g. heap addresses via INET_DIAG), and I'd like to avoid adding more. It'd be nice if there was a common way to uniquely mask these values that are really just "handles". We could use it both here and in the network code. Would it be possible to just have a "regular" incrementing handle, like fd, or are we talking about doing that map for all VMAs, which would make that mapping unfeasible due to storage needs? On the other hand, trying to hide kernel addresses from root is no easy task, especially given that while kptr_restrict has a "2" level, dmesg_restrict does not. Are these kernel-address-leaking perf modes already limited to a specific POSIX capability? -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 17:14 ` Kees Cook @ 2013-10-02 17:20 ` Stephane Eranian 2013-10-02 17:29 ` Kees Cook 0 siblings, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-02 17:20 UTC (permalink / raw) To: Kees Cook Cc: Peter Zijlstra, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 7:14 PM, Kees Cook <keescook@chromium.org> wrote: > On Wed, Oct 2, 2013 at 6:37 AM, Peter Zijlstra <peterz@infradead.org> wrote: >> On Wed, Oct 02, 2013 at 03:13:16PM +0200, Stephane Eranian wrote: >>> On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>> > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: >>> >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>> >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >>> >> >> - then there are timing attacks, and someone having access to a PMU >>> >> >> context and who can trigger this SHA1 computation arbitrarily in task >>> >> >> local context can run very accurate and low noise timing attacks... >>> >> >> >>> >> >> I don't think the kernel's sha_transform() is hardened against timing >>> >> >> attacks, it's performance optimized so it has variable execution time >>> >> >> highly dependent on plaintext input - which leaks information about the >>> >> >> plaintext. >>> >> > >>> >> > Typical user doesn't have enough priv to profile kernel space; once you >>> >> > do you also have enough priv to see kernel addresses outright (ie. >>> >> > kallsyms etc..). >>> >> > >>> >> I was going to say just that. But that's not the default, paranoid level >>> >> is at 1 by default and not 2. So I supposedly can still do: >>> >> >>> >> $ perf record -e cycles ...... >>> >> >>> >> In per-thread mode and collect kernel level addresses. >>> > >>> > Oh right you are.. so yes that's a very viable avenue. >>> >>> You mean simply encoding the vma->vm_mm as the ino number, for instance. >> >> Nah.. I think Kees would very much shoot us on the spot for doing that. >> But with the paranoid level defaulting to 1 the PMU attack on the kernel >> SHA implenentation is feasible. > > We already have other kernel address leaks (e.g. heap addresses via > INET_DIAG), and I'd like to avoid adding more. It'd be nice if there > was a common way to uniquely mask these values that are really just > "handles". We could use it both here and in the network code. > > Would it be possible to just have a "regular" incrementing handle, > like fd, or are we talking about doing that map for all VMAs, which > would make that mapping unfeasible due to storage needs? > All we need is a way to report that two vmas point to the same vma->vm_mm, i..e, same physical data. If I understand what you are suggesting, you'd add some sort of generation number to the vm_mm. Each new vm_mm gets a new number. That would work, I think. No kernel addresses reported directly nor hashed. > On the other hand, trying to hide kernel addresses from root is no > easy task, especially given that while kptr_restrict has a "2" level, > dmesg_restrict does not. Are these kernel-address-leaking perf modes > already limited to a specific POSIX capability? > > -Kees > > -- > Kees Cook > Chrome OS Security ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 17:20 ` Stephane Eranian @ 2013-10-02 17:29 ` Kees Cook 2013-10-02 17:49 ` Stephane Eranian 0 siblings, 1 reply; 42+ messages in thread From: Kees Cook @ 2013-10-02 17:29 UTC (permalink / raw) To: Stephane Eranian Cc: Peter Zijlstra, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 10:20 AM, Stephane Eranian <eranian@google.com> wrote: > On Wed, Oct 2, 2013 at 7:14 PM, Kees Cook <keescook@chromium.org> wrote: >> On Wed, Oct 2, 2013 at 6:37 AM, Peter Zijlstra <peterz@infradead.org> wrote: >>> On Wed, Oct 02, 2013 at 03:13:16PM +0200, Stephane Eranian wrote: >>>> On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>>> > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: >>>> >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>>> >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >>>> >> >> - then there are timing attacks, and someone having access to a PMU >>>> >> >> context and who can trigger this SHA1 computation arbitrarily in task >>>> >> >> local context can run very accurate and low noise timing attacks... >>>> >> >> >>>> >> >> I don't think the kernel's sha_transform() is hardened against timing >>>> >> >> attacks, it's performance optimized so it has variable execution time >>>> >> >> highly dependent on plaintext input - which leaks information about the >>>> >> >> plaintext. >>>> >> > >>>> >> > Typical user doesn't have enough priv to profile kernel space; once you >>>> >> > do you also have enough priv to see kernel addresses outright (ie. >>>> >> > kallsyms etc..). >>>> >> > >>>> >> I was going to say just that. But that's not the default, paranoid level >>>> >> is at 1 by default and not 2. So I supposedly can still do: >>>> >> >>>> >> $ perf record -e cycles ...... >>>> >> >>>> >> In per-thread mode and collect kernel level addresses. >>>> > >>>> > Oh right you are.. so yes that's a very viable avenue. >>>> >>>> You mean simply encoding the vma->vm_mm as the ino number, for instance. >>> >>> Nah.. I think Kees would very much shoot us on the spot for doing that. >>> But with the paranoid level defaulting to 1 the PMU attack on the kernel >>> SHA implenentation is feasible. >> >> We already have other kernel address leaks (e.g. heap addresses via >> INET_DIAG), and I'd like to avoid adding more. It'd be nice if there >> was a common way to uniquely mask these values that are really just >> "handles". We could use it both here and in the network code. >> >> Would it be possible to just have a "regular" incrementing handle, >> like fd, or are we talking about doing that map for all VMAs, which >> would make that mapping unfeasible due to storage needs? >> > All we need is a way to report that two vmas point to the same > vma->vm_mm, i..e, same physical data. If I understand what > you are suggesting, you'd add some sort of generation number > to the vm_mm. Each new vm_mm gets a new number. That > would work, I think. No kernel addresses reported directly nor > hashed. Right. Is that workable? It sounds like this handle is only needed at inspection time, though. Is this uniqueness test limited to a single process, or is this uniqueness test across processes? -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 17:29 ` Kees Cook @ 2013-10-02 17:49 ` Stephane Eranian 2013-10-02 18:10 ` Kees Cook 0 siblings, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-02 17:49 UTC (permalink / raw) To: Kees Cook Cc: Peter Zijlstra, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 7:29 PM, Kees Cook <keescook@chromium.org> wrote: > On Wed, Oct 2, 2013 at 10:20 AM, Stephane Eranian <eranian@google.com> wrote: >> On Wed, Oct 2, 2013 at 7:14 PM, Kees Cook <keescook@chromium.org> wrote: >>> On Wed, Oct 2, 2013 at 6:37 AM, Peter Zijlstra <peterz@infradead.org> wrote: >>>> On Wed, Oct 02, 2013 at 03:13:16PM +0200, Stephane Eranian wrote: >>>>> On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>>>> > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: >>>>> >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>>>> >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >>>>> >> >> - then there are timing attacks, and someone having access to a PMU >>>>> >> >> context and who can trigger this SHA1 computation arbitrarily in task >>>>> >> >> local context can run very accurate and low noise timing attacks... >>>>> >> >> >>>>> >> >> I don't think the kernel's sha_transform() is hardened against timing >>>>> >> >> attacks, it's performance optimized so it has variable execution time >>>>> >> >> highly dependent on plaintext input - which leaks information about the >>>>> >> >> plaintext. >>>>> >> > >>>>> >> > Typical user doesn't have enough priv to profile kernel space; once you >>>>> >> > do you also have enough priv to see kernel addresses outright (ie. >>>>> >> > kallsyms etc..). >>>>> >> > >>>>> >> I was going to say just that. But that's not the default, paranoid level >>>>> >> is at 1 by default and not 2. So I supposedly can still do: >>>>> >> >>>>> >> $ perf record -e cycles ...... >>>>> >> >>>>> >> In per-thread mode and collect kernel level addresses. >>>>> > >>>>> > Oh right you are.. so yes that's a very viable avenue. >>>>> >>>>> You mean simply encoding the vma->vm_mm as the ino number, for instance. >>>> >>>> Nah.. I think Kees would very much shoot us on the spot for doing that. >>>> But with the paranoid level defaulting to 1 the PMU attack on the kernel >>>> SHA implenentation is feasible. >>> >>> We already have other kernel address leaks (e.g. heap addresses via >>> INET_DIAG), and I'd like to avoid adding more. It'd be nice if there >>> was a common way to uniquely mask these values that are really just >>> "handles". We could use it both here and in the network code. >>> >>> Would it be possible to just have a "regular" incrementing handle, >>> like fd, or are we talking about doing that map for all VMAs, which >>> would make that mapping unfeasible due to storage needs? >>> >> All we need is a way to report that two vmas point to the same >> vma->vm_mm, i..e, same physical data. If I understand what >> you are suggesting, you'd add some sort of generation number >> to the vm_mm. Each new vm_mm gets a new number. That >> would work, I think. No kernel addresses reported directly nor >> hashed. > > Right. Is that workable? It sounds like this handle is only needed at > inspection time, though. Is this uniqueness test limited to a single > process, or is this uniqueness test across processes? > Each time that vm_mm is allocated we would allocate a new generation number. Uniqueness is across processes. But that's by construction of the address space. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 17:49 ` Stephane Eranian @ 2013-10-02 18:10 ` Kees Cook 2013-10-02 19:00 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Kees Cook @ 2013-10-02 18:10 UTC (permalink / raw) To: Stephane Eranian Cc: Peter Zijlstra, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 10:49 AM, Stephane Eranian <eranian@google.com> wrote: > On Wed, Oct 2, 2013 at 7:29 PM, Kees Cook <keescook@chromium.org> wrote: >> On Wed, Oct 2, 2013 at 10:20 AM, Stephane Eranian <eranian@google.com> wrote: >>> On Wed, Oct 2, 2013 at 7:14 PM, Kees Cook <keescook@chromium.org> wrote: >>>> On Wed, Oct 2, 2013 at 6:37 AM, Peter Zijlstra <peterz@infradead.org> wrote: >>>>> On Wed, Oct 02, 2013 at 03:13:16PM +0200, Stephane Eranian wrote: >>>>>> On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>>>>> > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: >>>>>> >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>>>>> >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >>>>>> >> >> - then there are timing attacks, and someone having access to a PMU >>>>>> >> >> context and who can trigger this SHA1 computation arbitrarily in task >>>>>> >> >> local context can run very accurate and low noise timing attacks... >>>>>> >> >> >>>>>> >> >> I don't think the kernel's sha_transform() is hardened against timing >>>>>> >> >> attacks, it's performance optimized so it has variable execution time >>>>>> >> >> highly dependent on plaintext input - which leaks information about the >>>>>> >> >> plaintext. >>>>>> >> > >>>>>> >> > Typical user doesn't have enough priv to profile kernel space; once you >>>>>> >> > do you also have enough priv to see kernel addresses outright (ie. >>>>>> >> > kallsyms etc..). >>>>>> >> > >>>>>> >> I was going to say just that. But that's not the default, paranoid level >>>>>> >> is at 1 by default and not 2. So I supposedly can still do: >>>>>> >> >>>>>> >> $ perf record -e cycles ...... >>>>>> >> >>>>>> >> In per-thread mode and collect kernel level addresses. >>>>>> > >>>>>> > Oh right you are.. so yes that's a very viable avenue. >>>>>> >>>>>> You mean simply encoding the vma->vm_mm as the ino number, for instance. >>>>> >>>>> Nah.. I think Kees would very much shoot us on the spot for doing that. >>>>> But with the paranoid level defaulting to 1 the PMU attack on the kernel >>>>> SHA implenentation is feasible. >>>> >>>> We already have other kernel address leaks (e.g. heap addresses via >>>> INET_DIAG), and I'd like to avoid adding more. It'd be nice if there >>>> was a common way to uniquely mask these values that are really just >>>> "handles". We could use it both here and in the network code. >>>> >>>> Would it be possible to just have a "regular" incrementing handle, >>>> like fd, or are we talking about doing that map for all VMAs, which >>>> would make that mapping unfeasible due to storage needs? >>>> >>> All we need is a way to report that two vmas point to the same >>> vma->vm_mm, i..e, same physical data. If I understand what >>> you are suggesting, you'd add some sort of generation number >>> to the vm_mm. Each new vm_mm gets a new number. That >>> would work, I think. No kernel addresses reported directly nor >>> hashed. >> >> Right. Is that workable? It sounds like this handle is only needed at >> inspection time, though. Is this uniqueness test limited to a single >> process, or is this uniqueness test across processes? >> > Each time that vm_mm is allocated we would allocate a new generation > number. Seems like a simple enough solution. Surely there must be a catch. :) > > Uniqueness is across processes. But that's by construction of the > address space. Okay, that's what I figured. -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 18:10 ` Kees Cook @ 2013-10-02 19:00 ` Peter Zijlstra 2013-10-02 19:38 ` Kees Cook 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-10-02 19:00 UTC (permalink / raw) To: Kees Cook Cc: Stephane Eranian, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 02, 2013 at 11:10:15AM -0700, Kees Cook wrote: > Seems like a simple enough solution. Surely there must be a catch. :) I didn't want to add this to the core mm just for perf.. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 19:00 ` Peter Zijlstra @ 2013-10-02 19:38 ` Kees Cook 2013-10-02 20:31 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Kees Cook @ 2013-10-02 19:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 12:00 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 11:10:15AM -0700, Kees Cook wrote: >> Seems like a simple enough solution. Surely there must be a catch. :) > > I didn't want to add this to the core mm just for perf.. It seems like it would be pretty inexpensive. It might also be valuable in other situations. Not that I can think of any at the moment. Additionally, it could likely be hidden by a CONFIG, so that if perf isn't built in, there's no change? -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 19:38 ` Kees Cook @ 2013-10-02 20:31 ` Peter Zijlstra 2013-10-03 8:55 ` Stephane Eranian 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-10-02 20:31 UTC (permalink / raw) To: Kees Cook Cc: Stephane Eranian, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 02, 2013 at 12:38:54PM -0700, Kees Cook wrote: > On Wed, Oct 2, 2013 at 12:00 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Oct 02, 2013 at 11:10:15AM -0700, Kees Cook wrote: > >> Seems like a simple enough solution. Surely there must be a catch. :) > > > > I didn't want to add this to the core mm just for perf.. > > It seems like it would be pretty inexpensive. It might also be > valuable in other situations. Not that I can think of any at the > moment. Additionally, it could likely be hidden by a CONFIG, so that > if perf isn't built in, there's no change? You optimist, you think you can build a kernel without perf? ;-) Its just that I would hate to add more completely global state to the fork() path. The tasklist_lock might be hard to crack, but at least the pid-hash could use per bucket locks (it doesn't apparently). I suppose people don't really care that much about fork() performance; which is sad. KSM and THP also add their own global locks :-( ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 20:31 ` Peter Zijlstra @ 2013-10-03 8:55 ` Stephane Eranian 2013-10-03 9:03 ` Peter Zijlstra 2013-10-03 18:32 ` Andi Kleen 0 siblings, 2 replies; 42+ messages in thread From: Stephane Eranian @ 2013-10-03 8:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 10:31 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 12:38:54PM -0700, Kees Cook wrote: >> On Wed, Oct 2, 2013 at 12:00 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Wed, Oct 02, 2013 at 11:10:15AM -0700, Kees Cook wrote: >> >> Seems like a simple enough solution. Surely there must be a catch. :) >> > >> > I didn't want to add this to the core mm just for perf.. >> >> It seems like it would be pretty inexpensive. It might also be >> valuable in other situations. Not that I can think of any at the >> moment. Additionally, it could likely be hidden by a CONFIG, so that >> if perf isn't built in, there's no change? > > You optimist, you think you can build a kernel without perf? ;-) > > Its just that I would hate to add more completely global state to the > fork() path. The tasklist_lock might be hard to crack, but at least the > pid-hash could use per bucket locks (it doesn't apparently). > > I suppose people don't really care that much about fork() performance; > which is sad. KSM and THP also add their own global locks :-( I don't know the MM code but I assume that that vm_mm struct is allocated dynamically and maybe you already grabbing a lock while doing this. Could we leverage that lock to increment a global generation number? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-03 8:55 ` Stephane Eranian @ 2013-10-03 9:03 ` Peter Zijlstra 2013-10-03 9:13 ` Stephane Eranian 2013-10-07 21:04 ` Stephane Eranian 2013-10-03 18:32 ` Andi Kleen 1 sibling, 2 replies; 42+ messages in thread From: Peter Zijlstra @ 2013-10-03 9:03 UTC (permalink / raw) To: Stephane Eranian Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Thu, Oct 03, 2013 at 10:55:28AM +0200, Stephane Eranian wrote: > I don't know the MM code but I assume that that vm_mm struct is > allocated dynamically > and maybe you already grabbing a lock while doing this. Could we > leverage that lock > to increment a global generation number? Sure; something like so.. I just don't like global state nor adding to mm_struct for just this. --- include/linux/mm_types.h | 1 + kernel/fork.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index d9851eeb6e1d..3877b1e72a5b 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -436,6 +436,7 @@ struct mm_struct { int first_nid; #endif struct uprobes_state uprobes_state; + u64 mm_id; }; /* first nid will either be a valid NID or one of these values */ diff --git a/kernel/fork.c b/kernel/fork.c index 086fe73ad6bd..b315f6227629 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -523,6 +523,8 @@ static void mm_init_aio(struct mm_struct *mm) #endif } +static u64 global_mm_id; + static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) { atomic_set(&mm->mm_users, 1); @@ -537,6 +539,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) spin_lock_init(&mm->page_table_lock); mm_init_aio(mm); mm_init_owner(mm, p); + mm->mm_id = 0; if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; @@ -1422,6 +1425,9 @@ static struct task_struct *copy_process(unsigned long clone_flags, */ write_lock_irq(&tasklist_lock); + if (p->mm && !p->mm->mm_id) + p->mm->mm_id = ++global_mm_id; + /* CLONE_PARENT re-uses the old parent */ if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) { p->real_parent = current->real_parent; ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-03 9:03 ` Peter Zijlstra @ 2013-10-03 9:13 ` Stephane Eranian 2013-10-03 15:34 ` Kees Cook 2013-10-07 21:04 ` Stephane Eranian 1 sibling, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-03 9:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Thu, Oct 3, 2013 at 11:03 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, Oct 03, 2013 at 10:55:28AM +0200, Stephane Eranian wrote: >> I don't know the MM code but I assume that that vm_mm struct is >> allocated dynamically >> and maybe you already grabbing a lock while doing this. Could we >> leverage that lock >> to increment a global generation number? > > Sure; something like so.. I just don't like global state nor adding to > mm_struct for just this. > I understand, I don't like global state either. I see you already have uprobes state in there. Thus to me, the perf situation is not much worse in terms of usefulness (frequency of use). Thanks. > --- > include/linux/mm_types.h | 1 + > kernel/fork.c | 6 ++++++ > 2 files changed, 7 insertions(+) > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index d9851eeb6e1d..3877b1e72a5b 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -436,6 +436,7 @@ struct mm_struct { > int first_nid; > #endif > struct uprobes_state uprobes_state; > + u64 mm_id; > }; > > /* first nid will either be a valid NID or one of these values */ > diff --git a/kernel/fork.c b/kernel/fork.c > index 086fe73ad6bd..b315f6227629 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -523,6 +523,8 @@ static void mm_init_aio(struct mm_struct *mm) > #endif > } > > +static u64 global_mm_id; > + > static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) > { > atomic_set(&mm->mm_users, 1); > @@ -537,6 +539,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) > spin_lock_init(&mm->page_table_lock); > mm_init_aio(mm); > mm_init_owner(mm, p); > + mm->mm_id = 0; > > if (likely(!mm_alloc_pgd(mm))) { > mm->def_flags = 0; > @@ -1422,6 +1425,9 @@ static struct task_struct *copy_process(unsigned long clone_flags, > */ > write_lock_irq(&tasklist_lock); > > + if (p->mm && !p->mm->mm_id) > + p->mm->mm_id = ++global_mm_id; > + > /* CLONE_PARENT re-uses the old parent */ > if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) { > p->real_parent = current->real_parent; ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-03 9:13 ` Stephane Eranian @ 2013-10-03 15:34 ` Kees Cook 0 siblings, 0 replies; 42+ messages in thread From: Kees Cook @ 2013-10-03 15:34 UTC (permalink / raw) To: Stephane Eranian Cc: Peter Zijlstra, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Thu, Oct 3, 2013 at 2:13 AM, Stephane Eranian <eranian@google.com> wrote: > On Thu, Oct 3, 2013 at 11:03 AM, Peter Zijlstra <peterz@infradead.org> wrote: >> On Thu, Oct 03, 2013 at 10:55:28AM +0200, Stephane Eranian wrote: >>> I don't know the MM code but I assume that that vm_mm struct is >>> allocated dynamically >>> and maybe you already grabbing a lock while doing this. Could we >>> leverage that lock >>> to increment a global generation number? >> >> Sure; something like so.. I just don't like global state nor adding to >> mm_struct for just this. >> > I understand, I don't like global state either. I see you already > have uprobes state in there. Thus to me, the perf situation is > not much worse in terms of usefulness (frequency of use). > > Thanks. > >> --- >> include/linux/mm_types.h | 1 + >> kernel/fork.c | 6 ++++++ >> 2 files changed, 7 insertions(+) >> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h >> index d9851eeb6e1d..3877b1e72a5b 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -436,6 +436,7 @@ struct mm_struct { >> int first_nid; >> #endif >> struct uprobes_state uprobes_state; >> + u64 mm_id; >> }; >> >> /* first nid will either be a valid NID or one of these values */ >> diff --git a/kernel/fork.c b/kernel/fork.c >> index 086fe73ad6bd..b315f6227629 100644 >> --- a/kernel/fork.c >> +++ b/kernel/fork.c >> @@ -523,6 +523,8 @@ static void mm_init_aio(struct mm_struct *mm) >> #endif >> } >> >> +static u64 global_mm_id; >> + >> static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) >> { >> atomic_set(&mm->mm_users, 1); >> @@ -537,6 +539,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) >> spin_lock_init(&mm->page_table_lock); >> mm_init_aio(mm); >> mm_init_owner(mm, p); >> + mm->mm_id = 0; >> >> if (likely(!mm_alloc_pgd(mm))) { >> mm->def_flags = 0; >> @@ -1422,6 +1425,9 @@ static struct task_struct *copy_process(unsigned long clone_flags, >> */ >> write_lock_irq(&tasklist_lock); >> >> + if (p->mm && !p->mm->mm_id) >> + p->mm->mm_id = ++global_mm_id; >> + >> /* CLONE_PARENT re-uses the old parent */ >> if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) { >> p->real_parent = current->real_parent; FWIW, I like this approach. -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-03 9:03 ` Peter Zijlstra 2013-10-03 9:13 ` Stephane Eranian @ 2013-10-07 21:04 ` Stephane Eranian 2013-10-08 6:54 ` Peter Zijlstra 1 sibling, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-07 21:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton Peter, On Thu, Oct 3, 2013 at 11:03 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, Oct 03, 2013 at 10:55:28AM +0200, Stephane Eranian wrote: >> I don't know the MM code but I assume that that vm_mm struct is >> allocated dynamically >> and maybe you already grabbing a lock while doing this. Could we >> leverage that lock >> to increment a global generation number? > > Sure; something like so.. I just don't like global state nor adding to > mm_struct for just this. > > --- > include/linux/mm_types.h | 1 + > kernel/fork.c | 6 ++++++ > 2 files changed, 7 insertions(+) > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index d9851eeb6e1d..3877b1e72a5b 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -436,6 +436,7 @@ struct mm_struct { > int first_nid; > #endif > struct uprobes_state uprobes_state; > + u64 mm_id; > }; > > /* first nid will either be a valid NID or one of these values */ > diff --git a/kernel/fork.c b/kernel/fork.c > index 086fe73ad6bd..b315f6227629 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -523,6 +523,8 @@ static void mm_init_aio(struct mm_struct *mm) > #endif > } > > +static u64 global_mm_id; > + > static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) > { > atomic_set(&mm->mm_users, 1); > @@ -537,6 +539,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) > spin_lock_init(&mm->page_table_lock); > mm_init_aio(mm); > mm_init_owner(mm, p); > + mm->mm_id = 0; > > if (likely(!mm_alloc_pgd(mm))) { > mm->def_flags = 0; > @@ -1422,6 +1425,9 @@ static struct task_struct *copy_process(unsigned long clone_flags, > */ > write_lock_irq(&tasklist_lock); > > + if (p->mm && !p->mm->mm_id) > + p->mm->mm_id = ++global_mm_id; > + > /* CLONE_PARENT re-uses the old parent */ > if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) { > p->real_parent = current->real_parent; Ok, so I tried this. It does not cover shmat() cases unfortunately. We need that same ++global_mm_id somewhere on that code path. Haven't looked at it in details just yet. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-07 21:04 ` Stephane Eranian @ 2013-10-08 6:54 ` Peter Zijlstra 2013-10-08 7:15 ` Stephane Eranian 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-10-08 6:54 UTC (permalink / raw) To: Stephane Eranian Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Mon, Oct 07, 2013 at 11:04:38PM +0200, Stephane Eranian wrote: > Peter, > > Ok, so I tried this. It does not cover shmat() cases unfortunately. > We need that same ++global_mm_id somewhere on that code path. > Haven't looked at it in details just yet. shmat() is a regular shared memory thingy right? So that should be covered by the regular inode,dev bits, right? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 6:54 ` Peter Zijlstra @ 2013-10-08 7:15 ` Stephane Eranian 2013-10-08 9:36 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-08 7:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Tue, Oct 8, 2013 at 8:54 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Oct 07, 2013 at 11:04:38PM +0200, Stephane Eranian wrote: >> Peter, >> >> Ok, so I tried this. It does not cover shmat() cases unfortunately. >> We need that same ++global_mm_id somewhere on that code path. >> Haven't looked at it in details just yet. > > shmat() is a regular shared memory thingy right? So that should be > covered by the regular inode,dev bits, right? Yes, it is but I am trying to see whether or not we could unify that and use a single u64 number to uniquely identify each mapping. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 7:15 ` Stephane Eranian @ 2013-10-08 9:36 ` Peter Zijlstra 2013-10-08 9:42 ` Stephane Eranian 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2013-10-08 9:36 UTC (permalink / raw) To: Stephane Eranian Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Tue, Oct 08, 2013 at 09:15:30AM +0200, Stephane Eranian wrote: > Yes, it is but I am trying to see whether or not we could unify that and > use a single u64 number to uniquely identify each mapping. No you cannot; two unrelated executables which have distinct mm_ids can easily mmap() the same shared file. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 9:36 ` Peter Zijlstra @ 2013-10-08 9:42 ` Stephane Eranian 2013-10-08 9:54 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-08 9:42 UTC (permalink / raw) To: Peter Zijlstra Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Tue, Oct 8, 2013 at 11:36 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Oct 08, 2013 at 09:15:30AM +0200, Stephane Eranian wrote: >> Yes, it is but I am trying to see whether or not we could unify that and >> use a single u64 number to uniquely identify each mapping. > > No you cannot; two unrelated executables which have distinct mm_ids can > easily mmap() the same shared file. That seems to indicate the mm_ids is not attached to the right level of VM data structure. But I am okay with keeping it that way and stashing the mm_id as a pseudo inode number for the case of non file-backed mappings. If we say maj=min=ino=gen=0 means no "info", then any other combinations can be used to identify identical mappings. We use actual min,maj, ino, gen for file backed, and maj=min=gen=0 + ino = mm_ids for the other cases. That should work, shouldn't it? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 9:42 ` Stephane Eranian @ 2013-10-08 9:54 ` Peter Zijlstra 0 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2013-10-08 9:54 UTC (permalink / raw) To: Stephane Eranian Cc: Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton On Tue, Oct 08, 2013 at 11:42:06AM +0200, Stephane Eranian wrote: > On Tue, Oct 8, 2013 at 11:36 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Oct 08, 2013 at 09:15:30AM +0200, Stephane Eranian wrote: > >> Yes, it is but I am trying to see whether or not we could unify that and > >> use a single u64 number to uniquely identify each mapping. > > > > No you cannot; two unrelated executables which have distinct mm_ids can > > easily mmap() the same shared file. > > That seems to indicate the mm_ids is not attached to the right level of VM > data structure. No that's not it.. shared mappings can create arbitrary couplings between mm's, there's really nothing you can do about that. This is really only about anonymous memory shared with CLONE_VM. > But I am okay with keeping it that way and stashing the mm_id > as a pseudo inode number for the case of non file-backed mappings. If we > say maj=min=ino=gen=0 means no "info", then any other combinations can > be used to identify identical mappings. We use actual min,maj, ino, gen > for file backed, and maj=min=gen=0 + ino = mm_ids for the other cases. > That should work, shouldn't it? I'm not sure if min=maj=0 is a valid device, nor do I know if ino=0 is a valid ino. But I suppose the name "//anon" should be a big enough clue if both of those fail. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-03 8:55 ` Stephane Eranian 2013-10-03 9:03 ` Peter Zijlstra @ 2013-10-03 18:32 ` Andi Kleen 1 sibling, 0 replies; 42+ messages in thread From: Andi Kleen @ 2013-10-03 18:32 UTC (permalink / raw) To: Stephane Eranian Cc: Peter Zijlstra, Kees Cook, Ingo Molnar, LKML, mingo@elte.hu, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Linus Torvalds, Andrew Morton, xemul On Thu, Oct 03, 2013 at 10:55:28AM +0200, Stephane Eranian wrote: > On Wed, Oct 2, 2013 at 10:31 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Oct 02, 2013 at 12:38:54PM -0700, Kees Cook wrote: > >> On Wed, Oct 2, 2013 at 12:00 PM, Peter Zijlstra <peterz@infradead.org> wrote: > >> > On Wed, Oct 02, 2013 at 11:10:15AM -0700, Kees Cook wrote: > >> >> Seems like a simple enough solution. Surely there must be a catch. :) > >> > > >> > I didn't want to add this to the core mm just for perf.. > >> > >> It seems like it would be pretty inexpensive. It might also be > >> valuable in other situations. Not that I can think of any at the > >> moment. Additionally, it could likely be hidden by a CONFIG, so that > >> if perf isn't built in, there's no change? > > > > You optimist, you think you can build a kernel without perf? ;-) > > > > Its just that I would hate to add more completely global state to the > > fork() path. The tasklist_lock might be hard to crack, but at least the > > pid-hash could use per bucket locks (it doesn't apparently). > > > > I suppose people don't really care that much about fork() performance; > > which is sad. KSM and THP also add their own global locks :-( > > I don't know the MM code but I assume that that vm_mm struct is > allocated dynamically > and maybe you already grabbing a lock while doing this. Could we > leverage that lock > to increment a global generation number? I thought Pavel added a mm identifier for criu, but can't find it right now. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 13:01 ` Peter Zijlstra 2013-10-02 13:13 ` Stephane Eranian @ 2013-10-07 11:16 ` Stephane Eranian 2013-10-07 11:32 ` Peter Zijlstra 1 sibling, 1 reply; 42+ messages in thread From: Stephane Eranian @ 2013-10-07 11:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton On Wed, Oct 2, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 02:59:32PM +0200, Stephane Eranian wrote: >> On Wed, Oct 2, 2013 at 2:46 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: >> >> - then there are timing attacks, and someone having access to a PMU >> >> context and who can trigger this SHA1 computation arbitrarily in task >> >> local context can run very accurate and low noise timing attacks... >> >> >> >> I don't think the kernel's sha_transform() is hardened against timing >> >> attacks, it's performance optimized so it has variable execution time >> >> highly dependent on plaintext input - which leaks information about the >> >> plaintext. >> > >> > Typical user doesn't have enough priv to profile kernel space; once you >> > do you also have enough priv to see kernel addresses outright (ie. >> > kallsyms etc..). >> > >> I was going to say just that. But that's not the default, paranoid level >> is at 1 by default and not 2. So I supposedly can still do: > > Oh right you are.. so yes that's a very viable avenue. I am going to try this out today. I think if it works well, we could also simplify the MMAP2 record and just pass this unique id for all cases.MMAP2 is only in rcX release so far. Is that still possible? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-07 11:16 ` Stephane Eranian @ 2013-10-07 11:32 ` Peter Zijlstra 0 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2013-10-07 11:32 UTC (permalink / raw) To: Stephane Eranian Cc: Ingo Molnar, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton On Mon, Oct 07, 2013 at 01:16:35PM +0200, Stephane Eranian wrote: > I am going to try this out today. I think if it works well, we could > also simplify the > MMAP2 record and just pass this unique id for all cases.MMAP2 is only in rcX > release so far. Is that still possible? Yeah I think so. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-02 12:46 ` Peter Zijlstra 2013-10-02 12:59 ` Stephane Eranian @ 2013-10-02 15:22 ` Ingo Molnar 1 sibling, 0 replies; 42+ messages in thread From: Ingo Molnar @ 2013-10-02 15:22 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, David Ahern, Jiri Olsa, Hugh Dickins, Kees Cook, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 02, 2013 at 02:39:53PM +0200, Ingo Molnar wrote: > > - then there are timing attacks, and someone having access to a PMU > > context and who can trigger this SHA1 computation arbitrarily in task > > local context can run very accurate and low noise timing attacks... > > > > I don't think the kernel's sha_transform() is hardened against timing > > attacks, it's performance optimized so it has variable execution time > > highly dependent on plaintext input - which leaks information about the > > plaintext. > > Typical user doesn't have enough priv to profile kernel space; once you > do you also have enough priv to see kernel addresses outright (ie. > kallsyms etc..). I didn't mean profiling - that's not a 'timing attack'. A simple RDTSC done around repeated calls to sha_transform() using kernel functionality is. Thanks, Ingo ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-09-30 15:44 [RFC] perf: mmap2 not covering VM_CLONE regions Stephane Eranian 2013-09-30 16:15 ` Peter Zijlstra @ 2013-10-08 14:23 ` David Ahern 2013-10-08 19:41 ` Ingo Molnar 1 sibling, 1 reply; 42+ messages in thread From: David Ahern @ 2013-10-08 14:23 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, Peter Zijlstra, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, Jiri Olsa, Hugh Dickins Stephane: On 9/30/13 9:44 AM, Stephane Eranian wrote: > I was alerted by people trying to use the PERF_RECORD_MMAP2 > record to disambiguate virtual address mappings that there is a case > where the record does not contain enough information. > > As you know, the MMAP2 record adds the major, minor, ino number, > inode generation numbers to a mapping. But it does that only for > file or pseudo -file backed mappings. That covers file mmaps and also > SYSV shared memory segments. > > However there is a another kind of situation that arises in some > multi-process benchmarks where a region of memory is cloned > using VM_CLONE. As such, the virtual addresses match between > the processes but the major, minor, inode, inode generation fields > are all zeroes because there is no inode associated with the mapping. > Yet, it is important for the tool to know the mappings between the > processes are pointing to the same physical data. > > We need to cover this case and I am seeking for advice on how to > best address this need given that we discarded using the plain physical > address for disambiguation. If the current MMAP2 is not a complete solution for what you (Google) need, should support be reverted before 3.12 is released? No sense in making this part of the forever API if more work is needed on it. David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 14:23 ` David Ahern @ 2013-10-08 19:41 ` Ingo Molnar 2013-10-08 19:54 ` Stephane Eranian 0 siblings, 1 reply; 42+ messages in thread From: Ingo Molnar @ 2013-10-08 19:41 UTC (permalink / raw) To: David Ahern Cc: Stephane Eranian, LKML, Peter Zijlstra, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, Jiri Olsa, Hugh Dickins * David Ahern <dsahern@gmail.com> wrote: > Stephane: > > On 9/30/13 9:44 AM, Stephane Eranian wrote: > >I was alerted by people trying to use the PERF_RECORD_MMAP2 > >record to disambiguate virtual address mappings that there is a case > >where the record does not contain enough information. > > > >As you know, the MMAP2 record adds the major, minor, ino number, > >inode generation numbers to a mapping. But it does that only for > >file or pseudo -file backed mappings. That covers file mmaps and also > >SYSV shared memory segments. > > > >However there is a another kind of situation that arises in some > >multi-process benchmarks where a region of memory is cloned > >using VM_CLONE. As such, the virtual addresses match between > >the processes but the major, minor, inode, inode generation fields > >are all zeroes because there is no inode associated with the mapping. > >Yet, it is important for the tool to know the mappings between the > >processes are pointing to the same physical data. > > > >We need to cover this case and I am seeking for advice on how to > >best address this need given that we discarded using the plain physical > >address for disambiguation. > > > If the current MMAP2 is not a complete solution for what you (Google) > need, should support be reverted before 3.12 is released? No sense in > making this part of the forever API if more work is needed on it. Instead of a full revert we could just turn off the ABI portion minimally and not recognize it for now. Assuming a more complete solution is in the works for v3.13. Thanks, Ingo ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 19:41 ` Ingo Molnar @ 2013-10-08 19:54 ` Stephane Eranian 2013-10-08 19:57 ` David Ahern 2013-10-09 9:59 ` Ingo Molnar 0 siblings, 2 replies; 42+ messages in thread From: Stephane Eranian @ 2013-10-08 19:54 UTC (permalink / raw) To: Ingo Molnar Cc: David Ahern, LKML, Peter Zijlstra, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, Jiri Olsa, Hugh Dickins On Tue, Oct 8, 2013 at 9:41 PM, Ingo Molnar <mingo@kernel.org> wrote: > > * David Ahern <dsahern@gmail.com> wrote: > >> Stephane: >> >> On 9/30/13 9:44 AM, Stephane Eranian wrote: >> >I was alerted by people trying to use the PERF_RECORD_MMAP2 >> >record to disambiguate virtual address mappings that there is a case >> >where the record does not contain enough information. >> > >> >As you know, the MMAP2 record adds the major, minor, ino number, >> >inode generation numbers to a mapping. But it does that only for >> >file or pseudo -file backed mappings. That covers file mmaps and also >> >SYSV shared memory segments. >> > >> >However there is a another kind of situation that arises in some >> >multi-process benchmarks where a region of memory is cloned >> >using VM_CLONE. As such, the virtual addresses match between >> >the processes but the major, minor, inode, inode generation fields >> >are all zeroes because there is no inode associated with the mapping. >> >Yet, it is important for the tool to know the mappings between the >> >processes are pointing to the same physical data. >> > >> >We need to cover this case and I am seeking for advice on how to >> >best address this need given that we discarded using the plain physical >> >address for disambiguation. >> >> >> If the current MMAP2 is not a complete solution for what you (Google) >> need, should support be reverted before 3.12 is released? No sense in >> making this part of the forever API if more work is needed on it. > > Instead of a full revert we could just turn off the ABI portion minimally > and not recognize it for now. Assuming a more complete solution is in the > works for v3.13. > That's a possibility. They are also pieces in the perf tool itself. We could certainly make the attr->mmap2 bit disappear. I think it boils down to how can we uniquely identify virtual mapping to the same physical data either via shmat(), files, VM_CLONE. We had all covered but the last case with the ino approach. We don't have a solution for VM_CLONE yet. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 19:54 ` Stephane Eranian @ 2013-10-08 19:57 ` David Ahern 2013-10-09 9:54 ` Ingo Molnar 2013-10-09 9:59 ` Ingo Molnar 1 sibling, 1 reply; 42+ messages in thread From: David Ahern @ 2013-10-08 19:57 UTC (permalink / raw) To: Stephane Eranian Cc: Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, Jiri Olsa, Hugh Dickins On 10/8/13 1:54 PM, Stephane Eranian wrote: >>> If the current MMAP2 is not a complete solution for what you (Google) >>> need, should support be reverted before 3.12 is released? No sense in >>> making this part of the forever API if more work is needed on it. >> >> Instead of a full revert we could just turn off the ABI portion minimally >> and not recognize it for now. Assuming a more complete solution is in the >> works for v3.13. >> > That's a possibility. They are also pieces in the perf tool itself. > We could certainly make the attr->mmap2 bit disappear. > > I think it boils down to how can we uniquely identify virtual > mapping to the same physical data either via shmat(), files, VM_CLONE. > We had all covered but the last case with the ino approach. We don't have > a solution for VM_CLONE yet. > I was mainly thinking the 2 parts that generate MMAP2 events: kernel side it's the mmap2 attribute and perf_event_mmap_output; userspace side it's perf_event__synthesize_mmap_event. Certainly the rest of the plumbing can be left in place until the solution is refined as needed. David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 19:57 ` David Ahern @ 2013-10-09 9:54 ` Ingo Molnar 0 siblings, 0 replies; 42+ messages in thread From: Ingo Molnar @ 2013-10-09 9:54 UTC (permalink / raw) To: David Ahern Cc: Stephane Eranian, LKML, Peter Zijlstra, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, Jiri Olsa, Hugh Dickins * David Ahern <dsahern@gmail.com> wrote: > On 10/8/13 1:54 PM, Stephane Eranian wrote: > >>>If the current MMAP2 is not a complete solution for what you (Google) > >>>need, should support be reverted before 3.12 is released? No sense in > >>>making this part of the forever API if more work is needed on it. > >> > >>Instead of a full revert we could just turn off the ABI portion minimally > >>and not recognize it for now. Assuming a more complete solution is in the > >>works for v3.13. > >> > >That's a possibility. They are also pieces in the perf tool itself. > >We could certainly make the attr->mmap2 bit disappear. > > > >I think it boils down to how can we uniquely identify virtual > >mapping to the same physical data either via shmat(), files, VM_CLONE. > >We had all covered but the last case with the ino approach. We don't have > >a solution for VM_CLONE yet. > > > > I was mainly thinking the 2 parts that generate MMAP2 events: kernel > side it's the mmap2 attribute and perf_event_mmap_output; userspace side > it's perf_event__synthesize_mmap_event. Certainly the rest of the > plumbing can be left in place until the solution is refined as needed. Could some of you please send a patch ASAP? Thanks, Ingo ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-08 19:54 ` Stephane Eranian 2013-10-08 19:57 ` David Ahern @ 2013-10-09 9:59 ` Ingo Molnar 2013-10-09 10:39 ` Peter Zijlstra 1 sibling, 1 reply; 42+ messages in thread From: Ingo Molnar @ 2013-10-09 9:59 UTC (permalink / raw) To: Stephane Eranian Cc: David Ahern, LKML, Peter Zijlstra, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, Jiri Olsa, Hugh Dickins * Stephane Eranian <eranian@google.com> wrote: > > Instead of a full revert we could just turn off the ABI portion > > minimally and not recognize it for now. Assuming a more complete > > solution is in the works for v3.13. > > That's a possibility. They are also pieces in the perf tool itself. We > could certainly make the attr->mmap2 bit disappear. > > I think it boils down to how can we uniquely identify virtual mapping to > the same physical data either via shmat(), files, VM_CLONE. We had all > covered but the last case with the ino approach. We don't have a > solution for VM_CLONE yet. PeterZ didn't like exposing the physical RAM address, right? Peter: could we expose the page frame number instead? (Maybe mix it with a random seed through the hash-mixer and expose that.) User-space will thus be able to discover whether two pages are shared or not, but not extract other information. The global MM ID thing looked rather ugly as well. That way we could drop the inode info as well? That feels a bit hacky too. Thanks, Ingo ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] perf: mmap2 not covering VM_CLONE regions 2013-10-09 9:59 ` Ingo Molnar @ 2013-10-09 10:39 ` Peter Zijlstra 0 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2013-10-09 10:39 UTC (permalink / raw) To: Ingo Molnar Cc: Stephane Eranian, David Ahern, LKML, mingo@elte.hu, ak@linux.intel.com, Arnaldo Carvalho de Melo, Jiri Olsa, Hugh Dickins On Wed, Oct 09, 2013 at 11:59:06AM +0200, Ingo Molnar wrote: > PeterZ didn't like exposing the physical RAM address, right? More than not like; its useless and broken, see below. > Peter: could we expose the page frame number instead? (Maybe mix it with a > random seed through the hash-mixer and expose that.) User-space will thus > be able to discover whether two pages are shared or not, but not extract > other information. No, all physical stuff is useless, a page can get moved about through swap/ksm/thp/compaction/etc. > The global MM ID thing looked rather ugly as well. I'm still having the argument with Stephane -- although it fell off-list -- on why this wouldn't work. AFAICT the mm_id should work for CLONE_VM, as CLONE_VM inherits the entire address space so all should be identical when mm_id match. > That way we could drop the inode info as well? That feels a bit hacky too. No, the inode stuff is required for shared pages, its the only way to track them; and barring a filename (which we've so far used for this) the dev:ino:gen thing is the only way to uniquely identify files -- it even works where filenames don't, eg. filesystem namespaces and weird things like shm that don't have stable filenames. ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2013-10-09 10:39 UTC | newest] Thread overview: 42+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-30 15:44 [RFC] perf: mmap2 not covering VM_CLONE regions Stephane Eranian 2013-09-30 16:15 ` Peter Zijlstra 2013-09-30 16:48 ` Stephane Eranian 2013-09-30 16:54 ` Peter Zijlstra 2013-10-01 11:22 ` Stephane Eranian 2013-10-02 11:23 ` Peter Zijlstra 2013-10-02 11:58 ` Peter Zijlstra 2013-10-02 12:39 ` Ingo Molnar 2013-10-02 12:46 ` Peter Zijlstra 2013-10-02 12:59 ` Stephane Eranian 2013-10-02 13:01 ` Peter Zijlstra 2013-10-02 13:13 ` Stephane Eranian 2013-10-02 13:37 ` Peter Zijlstra 2013-10-02 17:14 ` Kees Cook 2013-10-02 17:20 ` Stephane Eranian 2013-10-02 17:29 ` Kees Cook 2013-10-02 17:49 ` Stephane Eranian 2013-10-02 18:10 ` Kees Cook 2013-10-02 19:00 ` Peter Zijlstra 2013-10-02 19:38 ` Kees Cook 2013-10-02 20:31 ` Peter Zijlstra 2013-10-03 8:55 ` Stephane Eranian 2013-10-03 9:03 ` Peter Zijlstra 2013-10-03 9:13 ` Stephane Eranian 2013-10-03 15:34 ` Kees Cook 2013-10-07 21:04 ` Stephane Eranian 2013-10-08 6:54 ` Peter Zijlstra 2013-10-08 7:15 ` Stephane Eranian 2013-10-08 9:36 ` Peter Zijlstra 2013-10-08 9:42 ` Stephane Eranian 2013-10-08 9:54 ` Peter Zijlstra 2013-10-03 18:32 ` Andi Kleen 2013-10-07 11:16 ` Stephane Eranian 2013-10-07 11:32 ` Peter Zijlstra 2013-10-02 15:22 ` Ingo Molnar 2013-10-08 14:23 ` David Ahern 2013-10-08 19:41 ` Ingo Molnar 2013-10-08 19:54 ` Stephane Eranian 2013-10-08 19:57 ` David Ahern 2013-10-09 9:54 ` Ingo Molnar 2013-10-09 9:59 ` Ingo Molnar 2013-10-09 10:39 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).