public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] perf/core: Drop kernel samples even though :u is specified
@ 2017-05-23 10:16 Jin Yao
  2017-05-23 10:58 ` Mark Rutland
  0 siblings, 1 reply; 3+ messages in thread
From: Jin Yao @ 2017-05-23 10:16 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, will.deacon, mark.rutland,
	Jin Yao

When doing sampling, for example,

perf record -e cycles:u ...

On workloads that do a lot of kernel entry/exits we see kernel
samples, even though :u is specified. This is due to skid existing.

This is a security issue because it can leak kernel addresses even
though kernel sampling support is disabled.

The patch drops the kernel samples if exclude_kernel is specified.

For example, test on Haswell desktop.

perf record -e cycles:u <mgen>
perf report --stdio

Before patch applied:

    99.77%  mgen     mgen              [.] buf_read
     0.20%  mgen     mgen              [.] rand_buf_init
     0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
     0.00%  mgen     mgen              [.] last_free_elem
     0.00%  mgen     libc-2.23.so      [.] __random_r
     0.00%  mgen     libc-2.23.so      [.] _int_malloc
     0.00%  mgen     mgen              [.] rand_array_init
     0.00%  mgen     [kernel.vmlinux]  [k] page_fault
     0.00%  mgen     libc-2.23.so      [.] __random
     0.00%  mgen     libc-2.23.so      [.] __strcasestr
     0.00%  mgen     ld-2.23.so        [.] strcmp
     0.00%  mgen     ld-2.23.so        [.] _dl_start
     0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so        [.] _start

We can see kernel symbols apic_timer_interrupt and page_fault.

After patch applied:

    99.79%  mgen     mgen           [.] buf_read
     0.19%  mgen     mgen           [.] rand_buf_init
     0.00%  mgen     libc-2.23.so   [.] __random_r
     0.00%  mgen     mgen           [.] rand_array_init
     0.00%  mgen     mgen           [.] last_free_elem
     0.00%  mgen     libc-2.23.so   [.] vfprintf
     0.00%  mgen     libc-2.23.so   [.] rand
     0.00%  mgen     libc-2.23.so   [.] __random
     0.00%  mgen     libc-2.23.so   [.] _int_malloc
     0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
     0.00%  mgen     ld-2.23.so     [.] do_lookup_x
     0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
     0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
     0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so     [.] _start

There are only userspace symbols.

Change-log:
-----------

v2: Since other arch has similar skid issue in sampling, so move the
    skid checking from x86 code to generic code. Use a new flag
    PERF_PMU_CAP_NO_SKID to let arch decide if keep original behavior
    or drop the skid kernel samples.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24a6358..8d48668 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -254,6 +254,7 @@ struct perf_event;
 #define PERF_PMU_CAP_EXCLUSIVE			0x10
 #define PERF_PMU_CAP_ITRACE			0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS		0x40
+#define PERF_PMU_CAP_NO_SKID			0x80
 
 /**
  * struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6e75a5c..8a49251 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7316,6 +7316,26 @@ int perf_event_account_interrupt(struct perf_event *event)
 	return __perf_event_account_interrupt(event, 1);
 }
 
+static bool skid_kernel_samples(struct perf_event *event, struct pt_regs *regs)
+{
+	/*
+	 * We may get kernel samples even though exclude_kernel
+	 * is specified due to potential skid in sampling.
+	 * The skid kernel samples could be dropped or just do
+	 * nothing by testing the flag PERF_PMU_CAP_NO_SKID.
+	 */
+	if (event->pmu->capabilities & PERF_PMU_CAP_NO_SKID)
+		return false;
+
+	if (event->attr.exclude_kernel &&
+	    !user_mode(regs) &&
+	    (event->attr.sample_type & PERF_SAMPLE_IP)) {
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Generic event overflow handling, sampling.
  */
@@ -7337,6 +7357,12 @@ static int __perf_event_overflow(struct perf_event *event,
 	ret = __perf_event_account_interrupt(event, throttle);
 
 	/*
+	 * For security, drop the skid kernel samples if necessary.
+	 */
+	if (skid_kernel_samples(event, regs))
+		return ret;
+
+	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events
 	 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] perf/core: Drop kernel samples even though :u is specified
  2017-05-23 10:16 [PATCH v2] perf/core: Drop kernel samples even though :u is specified Jin Yao
@ 2017-05-23 10:58 ` Mark Rutland
  2017-05-23 12:32   ` Jin, Yao
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Rutland @ 2017-05-23 10:58 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, will.deacon

Hi,

On Tue, May 23, 2017 at 06:16:10PM +0800, Jin Yao wrote:
> When doing sampling, for example,
> 
> perf record -e cycles:u ...
> 
> On workloads that do a lot of kernel entry/exits we see kernel
> samples, even though :u is specified. This is due to skid existing.
> 
> This is a security issue because it can leak kernel addresses even
> though kernel sampling support is disabled.
> 
> The patch drops the kernel samples if exclude_kernel is specified.

[...]

> +static bool skid_kernel_samples(struct perf_event *event, struct pt_regs *regs)

The name is a bit opaque, especially where it is used in
__perf_event_overflow(). 

How about we invert the polarity and call this sample_is_allowed() ?

> +{
> +	/*
> +	 * We may get kernel samples even though exclude_kernel
> +	 * is specified due to potential skid in sampling.
> +	 * The skid kernel samples could be dropped or just do
> +	 * nothing by testing the flag PERF_PMU_CAP_NO_SKID.
> +	 */
> +	if (event->pmu->capabilities & PERF_PMU_CAP_NO_SKID)
> +		return false;

Do we need this new cap?

I'd expect user_mode(regs) to be about as cheap as testing the cap, and
the common case is going to be that we we have test both.

For those PMUs without skid, when not sampling the kernel,
user_mode(regs) should always be true.

IMO, it would make more sense to just check user_mode(regs), which also
avoids any surprises with unexpected skid...

> +
> +	if (event->attr.exclude_kernel &&
> +	    !user_mode(regs) &&
> +	    (event->attr.sample_type & PERF_SAMPLE_IP)) {
> +		return true;
> +	}
> +
> +	return false;
> +}

How about:

static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
{
	/*
	 * Due to interrupt latency (AKA "skid"), we may enter the
	 * kernel before taking an overflow, even if the PMU is only
	 * counting user events.
	 *
	 * To avoid leaking information to userspace, we must always
	 * reject kernel samples when exclude_kernel is set.
	 */
	if (!user_mode(regs) && event->attr.exclude_kernel &&
	    (event->attr.sample_type & PERF_SAMPLE_IP))
		return false;

	return true;
}

... do we need to reject any other sample types, or do we definitely
avoid leaks by other means?

> +
>  /*
>   * Generic event overflow handling, sampling.
>   */
> @@ -7337,6 +7357,12 @@ static int __perf_event_overflow(struct perf_event *event,
>  	ret = __perf_event_account_interrupt(event, throttle);
>  
>  	/*
> +	 * For security, drop the skid kernel samples if necessary.
> +	 */
> +	if (skid_kernel_samples(event, regs))
> +		return ret;
> +

.. with the above changes, this can be:

	if (!sample_is_allowed(event, regs))
		return ret;

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] perf/core: Drop kernel samples even though :u is specified
  2017-05-23 10:58 ` Mark Rutland
@ 2017-05-23 12:32   ` Jin, Yao
  0 siblings, 0 replies; 3+ messages in thread
From: Jin, Yao @ 2017-05-23 12:32 UTC (permalink / raw)
  To: Mark Rutland
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, will.deacon

SNIP
>> +static bool skid_kernel_samples(struct perf_event *event, struct pt_regs *regs)
> The name is a bit opaque, especially where it is used in
> __perf_event_overflow().
>
> How about we invert the polarity and call this sample_is_allowed() ?

That's OK, thanks.

>> +{
>> +	/*
>> +	 * We may get kernel samples even though exclude_kernel
>> +	 * is specified due to potential skid in sampling.
>> +	 * The skid kernel samples could be dropped or just do
>> +	 * nothing by testing the flag PERF_PMU_CAP_NO_SKID.
>> +	 */
>> +	if (event->pmu->capabilities & PERF_PMU_CAP_NO_SKID)
>> +		return false;
> Do we need this new cap?
>
> I'd expect user_mode(regs) to be about as cheap as testing the cap, and
> the common case is going to be that we we have test both.
>
> For those PMUs without skid, when not sampling the kernel,
> user_mode(regs) should always be true.
>
> IMO, it would make more sense to just check user_mode(regs), which also
> avoids any surprises with unexpected skid...
I guess the reason which Peter recommends to use a new cap is to have a 
way to keep original behavior.

If we don't need to keep original behavior, I think the new cap is not 
necessary. Could Peter provide comment? Thanks!

>> +
>> +	if (event->attr.exclude_kernel &&
>> +	    !user_mode(regs) &&
>> +	    (event->attr.sample_type & PERF_SAMPLE_IP)) {
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
> How about:
>
> static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
> {
> 	/*
> 	 * Due to interrupt latency (AKA "skid"), we may enter the
> 	 * kernel before taking an overflow, even if the PMU is only
> 	 * counting user events.
> 	 *
> 	 * To avoid leaking information to userspace, we must always
> 	 * reject kernel samples when exclude_kernel is set.
> 	 */
> 	if (!user_mode(regs) && event->attr.exclude_kernel &&
> 	    (event->attr.sample_type & PERF_SAMPLE_IP))
> 		return false;
>
> 	return true;
> }
>
> ... do we need to reject any other sample types, or do we definitely
> avoid leaks by other means?
I just think only when the PERF_SAMPLE_IP is applied, we can get correct 
ip. So I check the PERF_SAMPLE_IP here.

>> +
>>   /*
>>    * Generic event overflow handling, sampling.
>>    */
>> @@ -7337,6 +7357,12 @@ static int __perf_event_overflow(struct perf_event *event,
>>   	ret = __perf_event_account_interrupt(event, throttle);
>>   
>>   	/*
>> +	 * For security, drop the skid kernel samples if necessary.
>> +	 */
>> +	if (skid_kernel_samples(event, regs))
>> +		return ret;
>> +
> .. with the above changes, this can be:
>
> 	if (!sample_is_allowed(event, regs))
> 		return ret;
>
> Thanks,
> Mark.
OK, thanks! I will change the patch according to your comments.

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-05-23 12:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-23 10:16 [PATCH v2] perf/core: Drop kernel samples even though :u is specified Jin Yao
2017-05-23 10:58 ` Mark Rutland
2017-05-23 12:32   ` Jin, Yao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox