linux-arm-kernel.lists.infradead.org archive mirror
* [PATCHv3] Better kernel instruction abort handling
@ 2016-07-05 22:22 Laura Abbott
  2016-07-05 22:22 ` [PATCHv3] arm64: Handle el1 synchronous instruction aborts cleanly Laura Abbott
From: Laura Abbott @ 2016-07-05 22:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

This is v3 of the patch to make instruction aborts print a nicer, more standard
error message (i.e. no more "Bad mode" message).

Mark Rutland pointed out in v2 that we need to audit the do_mem_abort paths. Of
the functions that do_mem_abort can call, do_bad, do_translation_fault, and
do_alignment_fault all mostly reduce to calling do_bad_area, which should call
__do_kernel_fault directly. This leaves do_page_fault and __do_kernel_fault as
the only cases to review.

Mark raised the problem of taking an instruction abort with a fixup handler.
Any fixup handler being run would not itself exist in the exception table, so
there should be no risk of looping: another instruction abort would just reduce
to the case of an instruction abort without a fixup handler. The fixup handlers
expect data aborts, not instruction aborts, though, so while they could run
successfully, it wouldn't be for quite the right reason. Practically speaking,
I don't think it matters, but to be on the safe side, the fixup handlers are
not run in __do_kernel_fault if the abort is an instruction abort. This should
cover __do_kernel_fault.

do_page_fault gets a little more complicated. A fault on a kernel address
should just end up in __do_kernel_fault. Extending is_permission_fault to
cover instruction aborts should be sufficient, mostly because addr == regs->pc
and there should never be a userspace address in the exception table.

So I think this should cover all cases. The sample LKDTM test cases all work
now.

Thanks,
Laura

Laura Abbott (1):
  arm64: Handle el1 synchronous instruction aborts cleanly

 arch/arm64/kernel/entry.S | 18 ++++++++++++++++++
 arch/arm64/mm/fault.c     | 11 +++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

-- 
2.7.4

* [PATCHv3] arm64: Handle el1 synchronous instruction aborts cleanly
  2016-07-05 22:22 [PATCHv3] Better kernel instruction abort handling Laura Abbott
@ 2016-07-05 22:22 ` Laura Abbott
  2016-07-12 13:37   ` Mark Rutland
From: Laura Abbott @ 2016-07-05 22:22 UTC (permalink / raw)
  To: linux-arm-kernel

Executing from a non-executable area gives an ugly message:

lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at ffff0000084c0e08
lkdtm: attempting bad execution at ffff000008880700
Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL)
CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13
Hardware name: linux,dummy-virt (DT)
task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000
PC is at lkdtm_rodata_do_nothing+0x0/0x8
LR is at execute_location+0x74/0x88

The 'IABT (current EL)' indicates the error but it's a bit cryptic
without knowledge of the ARM ARM. There is also no indication of the
specific address that triggered the fault. The tightening of kernel
page permissions also makes hitting this case more likely. Handling
the case in the vectors gives a much more familiar-looking error
message:

lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at ffff0000084c0840
lkdtm: attempting bad execution at ffff000008880680
Unable to handle kernel paging request at virtual address ffff000008880680
pgd = ffff8000089b2000
[ffff000008880680] *pgd=00000000489b4003, *pud=0000000048904003, *pmd=0000000000000000
Internal error: Oops: 8400000e [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ #24
Hardware name: linux,dummy-virt (DT)
task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000
PC is at lkdtm_rodata_do_nothing+0x0/0x8
LR is at execute_location+0x74/0x88

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v3: Fix up the permission fault check in do_page_fault to detect kernel
instruction aborts; don't run fixup handlers on kernel instruction aborts.

Dropped the Acked-by since the addition of checks is pretty significant.
---
 arch/arm64/kernel/entry.S | 18 ++++++++++++++++++
 arch/arm64/mm/fault.c     | 11 +++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 12e8d2b..54e93d12 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -336,6 +336,8 @@ el1_sync:
 	lsr	x24, x1, #ESR_ELx_EC_SHIFT	// exception class
 	cmp	x24, #ESR_ELx_EC_DABT_CUR	// data abort in EL1
 	b.eq	el1_da
+	cmp	x24, #ESR_ELx_EC_IABT_CUR	// instruction abort in EL1
+	b.eq	el1_ia
 	cmp	x24, #ESR_ELx_EC_SYS64		// configurable trap
 	b.eq	el1_undef
 	cmp	x24, #ESR_ELx_EC_SP_ALIGN	// stack alignment exception
@@ -347,6 +349,22 @@ el1_sync:
 	cmp	x24, #ESR_ELx_EC_BREAKPT_CUR	// debug exception in EL1
 	b.ge	el1_dbg
 	b	el1_inv
+el1_ia:
+	/*
+	 * Instruction abort handling
+	 */
+	mrs	x0, far_el1
+	enable_dbg
+	// re-enable interrupts if they were enabled in the aborted context
+	tbnz	x23, #7, 1f			// PSR_I_BIT
+	enable_irq
+1:
+	mov	x2, sp				// struct pt_regs
+	bl	do_mem_abort
+
+	// disable interrupts before pulling preserved data off the stack
+	disable_irq
+	kernel_exit 1
 el1_da:
 	/*
 	 * Data abort handling
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 013e2cb..e25b0891 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -131,6 +131,11 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 }
 #endif
 
+static bool is_el1_instruction_abort(unsigned int esr)
+{
+	return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_CUR;
+}
+
 /*
  * The kernel tried to access some page that wasn't present.
  */
@@ -139,8 +144,9 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
 {
 	/*
 	 * Are we prepared to handle this kernel fault?
+	 * We are almost certainly not prepared to handle instruction faults.
 	 */
-	if (fixup_exception(regs))
+	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
 		return;
 
 	/*
@@ -247,7 +253,8 @@ static inline int permission_fault(unsigned int esr)
 	unsigned int ec       = (esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT;
 	unsigned int fsc_type = esr & ESR_ELx_FSC_TYPE;
 
-	return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM);
+	return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM) ||
+	       (ec == ESR_ELx_EC_IABT_CUR && fsc_type == ESR_ELx_FSC_PERM);
 }
 
 static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
-- 
2.7.4

* [PATCHv3] arm64: Handle el1 synchronous instruction aborts cleanly
  2016-07-05 22:22 ` [PATCHv3] arm64: Handle el1 synchronous instruction aborts cleanly Laura Abbott
@ 2016-07-12 13:37   ` Mark Rutland
From: Mark Rutland @ 2016-07-12 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Laura,

On Tue, Jul 05, 2016 at 03:22:53PM -0700, Laura Abbott wrote:
> Executing from a non-executable area gives an ugly message:
> 
> lkdtm: Performing direct entry EXEC_RODATA
> lkdtm: attempting ok execution at ffff0000084c0e08
> lkdtm: attempting bad execution at ffff000008880700
> Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL)
> CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13
> Hardware name: linux,dummy-virt (DT)
> task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000
> PC is at lkdtm_rodata_do_nothing+0x0/0x8
> LR is at execute_location+0x74/0x88
> 
> The 'IABT (current EL)' indicates the error but it's a bit cryptic
> without knowledge of the ARM ARM. There is also no indication of the
> specific address which triggered the fault. The increase in kernel
> page permissions makes hitting this case more likely as well.
> Handling the case in the vectors gives a much more familiar looking
> error message:
> 
> lkdtm: Performing direct entry EXEC_RODATA
> lkdtm: attempting ok execution at ffff0000084c0840
> lkdtm: attempting bad execution at ffff000008880680
> Unable to handle kernel paging request at virtual address ffff000008880680
> pgd = ffff8000089b2000
> [ffff000008880680] *pgd=00000000489b4003, *pud=0000000048904003, *pmd=0000000000000000
> Internal error: Oops: 8400000e [#1] PREEMPT SMP
> Modules linked in:
> CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ #24
> Hardware name: linux,dummy-virt (DT)
> task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000
> PC is at lkdtm_rodata_do_nothing+0x0/0x8
> LR is at execute_location+0x74/0x88
> 
> Signed-off-by: Laura Abbott <labbott@redhat.com>

It's unfortunate that those of us used to looking for 'IABT' lose the
ability to immediately distinguish instruction and data aborts, but that
can be reverse-engineered from the later register dump, or the ESR
hidden in the Oops message. I guess we'll need to do some more cleanup
work in this area to make reporting more consistently useful.

Regardless, this looks good, and worked for me in local testing. The
page table dump in the report looks especially useful.

So, with the below comments addressed:

Acked-by: Mark Rutland <mark.rutland@arm.com>

> ---
> v3: Fixup permission in do_page_fault to detect the kernel iabort, don't run
> fixup handlers on kernel instruction aborts.
> 
> Dropped the Acked-by since the addition of checks is pretty significant.
> ---
>  arch/arm64/kernel/entry.S | 18 ++++++++++++++++++
>  arch/arm64/mm/fault.c     | 11 +++++++++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 12e8d2b..54e93d12 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -336,6 +336,8 @@ el1_sync:
>  	lsr	x24, x1, #ESR_ELx_EC_SHIFT	// exception class
>  	cmp	x24, #ESR_ELx_EC_DABT_CUR	// data abort in EL1
>  	b.eq	el1_da
> +	cmp	x24, #ESR_ELx_EC_IABT_CUR	// instruction abort in EL1
> +	b.eq	el1_ia
>  	cmp	x24, #ESR_ELx_EC_SYS64		// configurable trap
>  	b.eq	el1_undef
>  	cmp	x24, #ESR_ELx_EC_SP_ALIGN	// stack alignment exception
> @@ -347,6 +349,22 @@ el1_sync:
>  	cmp	x24, #ESR_ELx_EC_BREAKPT_CUR	// debug exception in EL1
>  	b.ge	el1_dbg
>  	b	el1_inv
> +el1_ia:
> +	/*
> +	 * Instruction abort handling
> +	 */
> +	mrs	x0, far_el1
> +	enable_dbg
> +	// re-enable interrupts if they were enabled in the aborted context
> +	tbnz	x23, #7, 1f			// PSR_I_BIT
> +	enable_irq
> +1:
> +	mov	x2, sp				// struct pt_regs
> +	bl	do_mem_abort
> +
> +	// disable interrupts before pulling preserved data off the stack
> +	disable_irq
> +	kernel_exit 1
>  el1_da:
>  	/*
>  	 * Data abort handling
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 013e2cb..e25b0891 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -131,6 +131,11 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
>  }
>  #endif
>  
> +static bool is_el1_instruction_abort(unsigned int esr)
> +{
> +	return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_CUR;
> +}

Could we check this in do_page_fault for the
!search_exception_tables(regs->pc) case?

For the EXEC_USERSPACE case, we will log "Accessing user space memory
outside uaccess.h routines", which seems a little off. It would be nice
if we could use this to determine the message, and log something like
"Attempting to execute userspace memory" in that case.

> +
>  /*
>   * The kernel tried to access some page that wasn't present.
>   */
> @@ -139,8 +144,9 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
>  {
>  	/*
>  	 * Are we prepared to handle this kernel fault?
> +	 * We are almost certainly not prepared to handle instruction faults.
>  	 */
> -	if (fixup_exception(regs))
> +	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
>  		return;
>  
>  	/*

Your cover letter convinced me that if this occurs we're likely hosed
anyway, so I guess my prior comment about this being a gnarly case
doesn't really hold.

Given that, I'm happy with or without the is_el1_instruction_abort
check here.

> @@ -247,7 +253,8 @@ static inline int permission_fault(unsigned int esr)
>  	unsigned int ec       = (esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT;
>  	unsigned int fsc_type = esr & ESR_ELx_FSC_TYPE;
>  
> -	return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM);
> +	return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM) ||
> +	       (ec == ESR_ELx_EC_IABT_CUR && fsc_type == ESR_ELx_FSC_PERM);
>  }

The name of this function changed with the version of my
kill-esr-lnx-exec series queued in the arm64 for-next/core branch.
Luckily git am -3 is clever enough to figure that out itself, but you
might want to rebase.

Thanks,
Mark.
