public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] perf/x86/intel: Work around BTS leaking kernel addresses
@ 2015-08-27 15:13 Alexander Shishkin
  2015-08-27 15:13 ` [PATCH 1/2] perf/x86/intel/ds: " Alexander Shishkin
  2015-08-27 15:13 ` [PATCH 2/2] perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems Alexander Shishkin
  0 siblings, 2 replies; 7+ messages in thread
From: Alexander Shishkin @ 2015-08-27 15:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin

Hi Peter,

Due to imprecise IP sampling, BTS may (and often does) leak kernel
addresses when kernel tracing is not even enabled, which results in
mostly syscall entry points and page_fault handler addresses being
exposed. This might be a security concern for the address
randomization, and it also makes life harder for gdb that makes use of
bts samples.

This patchset works around the old (DS) driver and disables the new
(intel_bts) for the unprivileged users on systems where perf paranoia
level prohibits kernel tracing.

Not sure if these should be treated as bugfixes.

Alexander Shishkin (2):
  perf/x86/intel/ds: Work around BTS leaking kernel addresses
  perf/x86/intel/bts: Disallow use by unprivileged users on paranoid
    systems

 arch/x86/kernel/cpu/perf_event_intel_bts.c | 10 ++++++++
 arch/x86/kernel/cpu/perf_event_intel_ds.c  | 40 +++++++++++++++++++++++++-----
 2 files changed, 44 insertions(+), 6 deletions(-)

-- 
2.5.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] perf/x86/intel/ds: Work around BTS leaking kernel addresses
  2015-08-27 15:13 [PATCH 0/2] perf/x86/intel: Work around BTS leaking kernel addresses Alexander Shishkin
@ 2015-08-27 15:13 ` Alexander Shishkin
  2015-08-28  5:30   ` Ingo Molnar
  2015-08-27 15:13 ` [PATCH 2/2] perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems Alexander Shishkin
  1 sibling, 1 reply; 7+ messages in thread
From: Alexander Shishkin @ 2015-08-27 15:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin

BTS leaks kernel addresses even in userspace-only mode due to imprecise IP
sampling, so sometimes syscall entry points or page fault handler addresses
end up in a userspace trace.

Since this driver uses a relatively small buffer for BTS records and it has
to iterate through them anyway, it can also take on the additional job of
filtering out the records that contain kernel addresses when kernel space
tracing is not enabled.

This patch changes the bts code to skip the offending records from perf
output. In order to request the exact amount of space on the ring buffer,
we need to do an extra pass through the records to know how many there are
of the valid ones, but considering the small size of the buffer, this extra
pass adds very little overhead to the nmi handler. This way we won't end
up with awkward IP samples with zero IPs in the perf stream.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c | 40 ++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 84f236ab96..79175a2e5a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -510,10 +510,11 @@ int intel_pmu_drain_bts_buffer(void)
 		u64	flags;
 	};
 	struct perf_event *event = cpuc->events[INTEL_PMC_IDX_FIXED_BTS];
-	struct bts_record *at, *top;
+	struct bts_record *at, *base, *top;
 	struct perf_output_handle handle;
 	struct perf_event_header header;
 	struct perf_sample_data data;
+	unsigned long skip = 0;
 	struct pt_regs regs;
 
 	if (!event)
@@ -522,10 +523,10 @@ int intel_pmu_drain_bts_buffer(void)
 	if (!x86_pmu.bts_active)
 		return 0;
 
-	at  = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
-	top = (struct bts_record *)(unsigned long)ds->bts_index;
+	base = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
+	top  = (struct bts_record *)(unsigned long)ds->bts_index;
 
-	if (top <= at)
+	if (top <= base)
 		return 0;
 
 	memset(&regs, 0, sizeof(regs));
@@ -535,16 +536,43 @@ int intel_pmu_drain_bts_buffer(void)
 	perf_sample_data_init(&data, 0, event->hw.last_period);
 
 	/*
+	 * BTS leaks kernel addresses in branches across the cpl boundary,
+	 * such as traps or system calls, so unless the user is asking for
+	 * kernel tracing (and right now it's not possible), we'd need to
+	 * filter them out. But first we need to count how many of those we
+	 * have in the current batch. This is an extra O(n) pass, however,
+	 * it's much faster than the other one especially considering that
+	 * n <= 2560 (BTS_BUFFER_SIZE / BTS_RECORD_SIZE * 15/16; see the
+	 * alloc_bts_buffer()).
+	 */
+	for (at = base; at < top; at++) {
+		/*
+		 * Note that right now *this* BTS code only works if
+		 * attr::exclude_kernel is set, but let's keep this extra
+		 * check here in case that changes.
+		 */
+		if (event->attr.exclude_kernel &&
+		    (at->from >= PAGE_OFFSET || at->to >= PAGE_OFFSET))
+			skip++;
+	}
+
+	/*
 	 * Prepare a generic sample, i.e. fill in the invariant fields.
 	 * We will overwrite the from and to address before we output
 	 * the sample.
 	 */
 	perf_prepare_sample(&header, &data, event, &regs);
 
-	if (perf_output_begin(&handle, event, header.size * (top - at)))
+	if (perf_output_begin(&handle, event, header.size *
+			      (top - base - skip)))
 		return 1;
 
-	for (; at < top; at++) {
+	for (at = base; at < top; at++) {
+		/* Filter out any records that contain kernel addresses. */
+		if (event->attr.exclude_kernel &&
+		    (at->from >= PAGE_OFFSET || at->to >= PAGE_OFFSET))
+			continue;
+
 		data.ip		= at->from;
 		data.addr	= at->to;
 
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems
  2015-08-27 15:13 [PATCH 0/2] perf/x86/intel: Work around BTS leaking kernel addresses Alexander Shishkin
  2015-08-27 15:13 ` [PATCH 1/2] perf/x86/intel/ds: " Alexander Shishkin
@ 2015-08-27 15:13 ` Alexander Shishkin
  2015-08-28  5:31   ` Ingo Molnar
  1 sibling, 1 reply; 7+ messages in thread
From: Alexander Shishkin @ 2015-08-27 15:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin

BTS leaks kernel addresses even in userspace-only mode due to imprecise IP
sampling, so sometimes syscall entry points or page fault handler addresses
end up in a userspace trace.

Now, intel_bts driver exports trace data zero-copy, it does not scan through
it to filter out the kernel addresses and it's would be a O(n) job.

To work around this situation, this patch forbids the use of intel_bts
driver by unprivileged users with paranoid setting higher than 1, which
forbids kernel tracing.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_bts.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_bts.c b/arch/x86/kernel/cpu/perf_event_intel_bts.c
index 80df16e020..4f6daff92d 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
@@ -495,6 +495,16 @@ static int bts_event_init(struct perf_event *event)
 	if (x86_add_exclusive(x86_lbr_exclusive_bts))
 		return -EBUSY;
 
+	/*
+	 * BTS leaks kernel addresses even when CPL0 tracing is
+	 * disabled, so disallow intel_bts driver for unprivileged
+	 * users on paranoid systems since it provides trace data
+	 * to the user in a zero-copy fashion.
+	 */
+	if (event->attr.exclude_kernel && perf_paranoid_kernel() &&
+	    !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
 	ret = x86_reserve_hardware();
 	if (ret) {
 		x86_del_exclusive(x86_lbr_exclusive_bts);
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] perf/x86/intel/ds: Work around BTS leaking kernel addresses
  2015-08-27 15:13 ` [PATCH 1/2] perf/x86/intel/ds: " Alexander Shishkin
@ 2015-08-28  5:30   ` Ingo Molnar
  2015-08-28  6:06     ` Alexander Shishkin
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2015-08-28  5:30 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo


* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> +	for (at = base; at < top; at++) {
> +		/*
> +		 * Note that right now *this* BTS code only works if
> +		 * attr::exclude_kernel is set, but let's keep this extra
> +		 * check here in case that changes.
> +		 */
> +		if (event->attr.exclude_kernel &&
> +		    (at->from >= PAGE_OFFSET || at->to >= PAGE_OFFSET))
> +			skip++;

Yeah, so that only works on 32-bit kernels, on 64-bit kernels the check for kernel 
addresses is to see whether it's a negative address. PAGE_OFFSET points to above 
any hypervisor's address, so even with your fix we could still leak hypervisor 
addresses.

I.e. use the kernel_ip() primitive instead.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems
  2015-08-27 15:13 ` [PATCH 2/2] perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems Alexander Shishkin
@ 2015-08-28  5:31   ` Ingo Molnar
  2015-08-28  9:40     ` Alexander Shishkin
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2015-08-28  5:31 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo


* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> BTS leaks kernel addresses even in userspace-only mode due to imprecise IP
> sampling, so sometimes syscall entry points or page fault handler addresses
> end up in a userspace trace.
> 
> Now, intel_bts driver exports trace data zero-copy, it does not scan through
> it to filter out the kernel addresses and it's would be a O(n) job.
> 
> To work around this situation, this patch forbids the use of intel_bts
> driver by unprivileged users with paranoid setting higher than 1, which
> forbids kernel tracing.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/perf_event_intel_bts.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_bts.c b/arch/x86/kernel/cpu/perf_event_intel_bts.c
> index 80df16e020..4f6daff92d 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
> @@ -495,6 +495,16 @@ static int bts_event_init(struct perf_event *event)
>  	if (x86_add_exclusive(x86_lbr_exclusive_bts))
>  		return -EBUSY;
>  
> +	/*
> +	 * BTS leaks kernel addresses even when CPL0 tracing is
> +	 * disabled, so disallow intel_bts driver for unprivileged
> +	 * users on paranoid systems since it provides trace data
> +	 * to the user in a zero-copy fashion.
> +	 */
> +	if (event->attr.exclude_kernel && perf_paranoid_kernel() &&
> +	    !capable(CAP_SYS_ADMIN))
> +		return -EACCES;

I.e. it's disabled by default as well, with default paranoia settings?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] perf/x86/intel/ds: Work around BTS leaking kernel addresses
  2015-08-28  5:30   ` Ingo Molnar
@ 2015-08-28  6:06     ` Alexander Shishkin
  0 siblings, 0 replies; 7+ messages in thread
From: Alexander Shishkin @ 2015-08-28  6:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo

Ingo Molnar <mingo@kernel.org> writes:

> * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
>
>> +	for (at = base; at < top; at++) {
>> +		/*
>> +		 * Note that right now *this* BTS code only works if
>> +		 * attr::exclude_kernel is set, but let's keep this extra
>> +		 * check here in case that changes.
>> +		 */
>> +		if (event->attr.exclude_kernel &&
>> +		    (at->from >= PAGE_OFFSET || at->to >= PAGE_OFFSET))
>> +			skip++;
>
> Yeah, so that only works on 32-bit kernels, on 64-bit kernels the check for kernel 
> addresses is to see whether it's a negative address. PAGE_OFFSET points to above 
> any hypervisor's address, so even with your fix we could still leak hypervisor 
> addresses.
>
> I.e. use the kernel_ip() primitive instead.

That's what I've been looking for, thanks!

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems
  2015-08-28  5:31   ` Ingo Molnar
@ 2015-08-28  9:40     ` Alexander Shishkin
  0 siblings, 0 replies; 7+ messages in thread
From: Alexander Shishkin @ 2015-08-28  9:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo

Ingo Molnar <mingo@kernel.org> writes:

> * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
>
>> BTS leaks kernel addresses even in userspace-only mode due to imprecise IP
>> sampling, so sometimes syscall entry points or page fault handler addresses
>> end up in a userspace trace.
>> 
>> Now, intel_bts driver exports trace data zero-copy, it does not scan through
>> it to filter out the kernel addresses and it's would be a O(n) job.
>> 
>> To work around this situation, this patch forbids the use of intel_bts
>> driver by unprivileged users with paranoid setting higher than 1, which
>> forbids kernel tracing.
>> 
>> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> ---
>>  arch/x86/kernel/cpu/perf_event_intel_bts.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>> 
>> diff --git a/arch/x86/kernel/cpu/perf_event_intel_bts.c b/arch/x86/kernel/cpu/perf_event_intel_bts.c
>> index 80df16e020..4f6daff92d 100644
>> --- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
>> +++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
>> @@ -495,6 +495,16 @@ static int bts_event_init(struct perf_event *event)
>>  	if (x86_add_exclusive(x86_lbr_exclusive_bts))
>>  		return -EBUSY;
>>  
>> +	/*
>> +	 * BTS leaks kernel addresses even when CPL0 tracing is
>> +	 * disabled, so disallow intel_bts driver for unprivileged
>> +	 * users on paranoid systems since it provides trace data
>> +	 * to the user in a zero-copy fashion.
>> +	 */
>> +	if (event->attr.exclude_kernel && perf_paranoid_kernel() &&
>> +	    !capable(CAP_SYS_ADMIN))
>> +		return -EACCES;
>
> I.e. it's disabled by default as well, with default paranoia settings?

Actually no, the kernel's default is 1, which allows kernel profiling
for unprivileged users. Distros might be more strict though.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-08-28  9:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-27 15:13 [PATCH 0/2] perf/x86/intel: Work around BTS leaking kernel addresses Alexander Shishkin
2015-08-27 15:13 ` [PATCH 1/2] perf/x86/intel/ds: " Alexander Shishkin
2015-08-28  5:30   ` Ingo Molnar
2015-08-28  6:06     ` Alexander Shishkin
2015-08-27 15:13 ` [PATCH 2/2] perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems Alexander Shishkin
2015-08-28  5:31   ` Ingo Molnar
2015-08-28  9:40     ` Alexander Shishkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox