All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] c/r: tolerate X86_EFLAGS_RF on restart
@ 2009-10-23 18:37 Oren Laadan
       [not found] ` <1256323024-20268-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Oren Laadan @ 2009-10-23 18:37 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Avi Kivity, Alexey Dobriyan

On restart, the c/r code that sanitizes a task's EFLAGS failed with an
error if the X86_EFLAGS_RF was set. However, there are cases in which
a task legitimately has this flag recorded at checkpoint, such as:

1) We are running inside a KVM guest, and the guest is being debugged
(see arch/x86/kvm/{svm,vmx}.c, via callbacks from qemu.

2) The kernel itself is being debugged using kgbd.

3) Exceptions that occur under certain condition will set this flag
on the currently running task. For details see page 679 in Intel's
manual (http://www.intel.com/Assets/PDF/manual/253668.pdf), like:

  When calling an event handler, Intel 64 and IA-32 processors
  establish the value of the RF flag in the EFLAGS image pushed on the
  stack:
    • For any fault-class exception except a debug exception
      generated in response to an instruction breakpoint, the value
      pushed for RF is 1.
    • For any interrupt arriving after any iteration of a repeated
      string instruction but the last iteration, the value pushed for
      RF is 1.
    • For any trap-class exception generated by any iteration of a
      repeated string instruction but the last iteration, the value
      pushed for RF is 1.
    • For other cases, the value pushed for RF is the value that was
      in EFLAG.RF at the time the event handler was called.
  [...]

Case #3 is consistent with the fact that in all cases where we
observed the issue there was a sendmail process doing IO.

Clearly, the X86_EFLAGS_RF is valid to be set.

This patches fixes the restart to:

1) Silently ignore the X86_EFLAGS_RF if was set at checkpoint: this
allows restart to tolerate the flag and safely complete the restart.

2) Preserve the existing X86_EFLAGS_RF of a restarting task (instead
of overwriting it): this ensures that restarting does not interfere
with kvm, kgdb, etc.

Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
---
 arch/x86/mm/checkpoint.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
index 9dd8e12..a93bb6f 100644
--- a/arch/x86/mm/checkpoint.c
+++ b/arch/x86/mm/checkpoint.c
@@ -27,13 +27,49 @@ static int check_eflags(__u32 eflags)
 #define X86_EFLAGS_CKPT_MASK  \
 	(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | X86_EFLAGS_ZF | \
 	 X86_EFLAGS_SF | X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_OF | \
-	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID)
+	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID | X86_EFLAGS_RF)
 
 	if ((eflags & ~X86_EFLAGS_CKPT_MASK) != (X86_EFLAGS_IF | 0x2))
 		return 0;
 	return 1;
 }
 
+static void restore_eflags(struct pt_regs *regs, __u32 eflags)
+{
+	/*
+	 * A task may have had X86_EFLAGS_RF set at checkpoint, .e.g:
+	 * 1) It ran in a KVM guest, and the guest was being debugged,
+	 * 2) The kernel was debugged using kgbd,
+	 * 3) From Intel's manual: "When calling an event handler, 
+	 *    Intel 64 and IA-32 processors establish the value of the
+	 *    RF flag in the EFLAGS image pushed on the stack:
+	 *  - For any fault-class exception except a debug exception
+	 *    generated in response to an instruction breakpoint, the
+	 *    value pushed for RF is 1.
+	 *  - For any interrupt arriving after any iteration of a
+	 *    repeated string instruction but the last iteration, the
+	 *    value pushed for RF is 1.
+	 *  - For any trap-class exception generated by any iteration
+	 *    of a repeated string instruction but the last iteration,
+	 *    the value pushed for RF is 1.
+	 *  - For other cases, the value pushed for RF is the value
+	 *    that was in EFLAG.RF at the time the event handler was
+	 *    called.
+	 *  [from: http://www.intel.com/Assets/PDF/manual/253668.pdf]
+	 *
+	 * So: (1) silently ignore X86_EFLAGS_RF from checkpoint time,
+	 * as it's irrelevant for the current process, (2) preserve
+	 * the existing X86_EFLAGS_RF of this process as it it may be
+	 * used by kvm/kgdb. Disable preemption to protect test and
+	 * change of our eflags.
+	 */
+	preempt_disable();
+	eflags &= ~X86_EFLAGS_RF;
+	eflags |= (regs->flags & X86_EFLAGS_RF);
+	regs->flags = eflags;
+	preempt_enable();
+}
+
 static int check_tls(struct desc_struct *desc)
 {
 	if (!desc->a && !desc->b)
@@ -433,7 +469,7 @@ static int load_cpu_regs(struct ckpt_hdr_cpu *h, struct task_struct *t)
 	regs->orig_ax = h->orig_ax;
 	regs->ip = h->ip;
 
-	regs->flags = h->flags;
+	restore_eflags(regs, h->flags);
 	regs->sp = h->sp;
 
 	regs->ds = decode_segment(h->ds);
-- 
1.6.0.4

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] c/r: tolerate X86_EFLAGS_RF on restart
       [not found] ` <1256323024-20268-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-10-26 14:57   ` Serge E. Hallyn
       [not found]     ` <20091026145746.GA23564-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Serge E. Hallyn @ 2009-10-26 14:57 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Avi Kivity, Alexey Dobriyan

Quoting Oren Laadan (orenl@librato.com):
> On restart, the c/r code that sanitizes a task's EFLAGS failed with an
> error if the X86_EFLAGS_RF was set. However, there are cases in which
> a task legitimately has this flag recorded at checkpoint, such as:
> 
> 1) We are running inside a KVM guest, and the guest is being debugged
> (see arch/x86/kvm/{svm,vmx}.c, via callbacks from qemu.
> 
> 2) The kernel itself is being debugged using kgbd.
> 
> 3) Exceptions that occur under certain condition will set this flag
> on the currently running task. For details see page 679 in Intel's
> manual (http://www.intel.com/Assets/PDF/manual/253668.pdf), like:
> 
>   When calling an event handler, Intel 64 and IA-32 processors
>   establish the value of the RF flag in the EFLAGS image pushed on the
>   stack:
>     • For any fault-class exception except a debug exception
>       generated in response to an instruction breakpoint, the value
>       pushed for RF is 1.
>     • For any interrupt arriving after any iteration of a repeated
>       string instruction but the last iteration, the value pushed for
>       RF is 1.
>     • For any trap-class exception generated by any iteration of a
>       repeated string instruction but the last iteration, the value
>       pushed for RF is 1.

I wasn't quite sure what a 'trap-class exception' is,
http://www.yashks.com/exception-basics.html says:

Example: INT 3(or CC or CD 21) is a trap instruction. If a program has this
instruction and if the debugger is running, it will return the next instruction
address to this debugger. Where debugger can continue it debugging.

Does that not mean that if we checkpoint t1 and t2 where t1 is ptracing t2, t2
might have RF set to 1?  So this isn't generally safe, right?

IIUC it also could be set for things like a "General Protection Exception(Type:
Fault)" at checkpoint in which case we could really mess up the sw at
restart.

Or maybe I'm totally misunderstanding...?

>     • For other cases, the value pushed for RF is the value that was
>       in EFLAG.RF at the time the event handler was called.
>   [...]
> 
> Case #3 is consistent with the fact that in all cases where we
> observed the issue there was a sendmail process doing IO.
> 
> Clearly, the X86_EFLAGS_RF is valid to be set.
> 
> This patches fixes the restart to:
> 
> 1) Silently ignore the X86_EFLAGS_RF if was set at checkpoint: this
> allows restart to tolerate the flag and safely complete the restart.
> 
> 2) Preserve the existing X86_EFLAGS_RF of a restarting task (instead
> of overwriting it): this ensures that restarting does not interfere
> with kvm, kgdb, etc.
> 
> Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Matt Helsley <matthltc@us.ibm.com>
> Cc: Avi Kivity <avi@redhat.com>
> ---
>  arch/x86/mm/checkpoint.c |   40 ++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
> index 9dd8e12..a93bb6f 100644
> --- a/arch/x86/mm/checkpoint.c
> +++ b/arch/x86/mm/checkpoint.c
> @@ -27,13 +27,49 @@ static int check_eflags(__u32 eflags)
>  #define X86_EFLAGS_CKPT_MASK  \
>  	(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | X86_EFLAGS_ZF | \
>  	 X86_EFLAGS_SF | X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_OF | \
> -	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID)
> +	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID | X86_EFLAGS_RF)
>  
>  	if ((eflags & ~X86_EFLAGS_CKPT_MASK) != (X86_EFLAGS_IF | 0x2))
>  		return 0;
>  	return 1;
>  }
>  
> +static void restore_eflags(struct pt_regs *regs, __u32 eflags)
> +{
> +	/*
> +	 * A task may have had X86_EFLAGS_RF set at checkpoint, .e.g:
> +	 * 1) It ran in a KVM guest, and the guest was being debugged,
> +	 * 2) The kernel was debugged using kgbd,
> +	 * 3) From Intel's manual: "When calling an event handler, 
> +	 *    Intel 64 and IA-32 processors establish the value of the
> +	 *    RF flag in the EFLAGS image pushed on the stack:
> +	 *  - For any fault-class exception except a debug exception
> +	 *    generated in response to an instruction breakpoint, the
> +	 *    value pushed for RF is 1.
> +	 *  - For any interrupt arriving after any iteration of a
> +	 *    repeated string instruction but the last iteration, the
> +	 *    value pushed for RF is 1.
> +	 *  - For any trap-class exception generated by any iteration
> +	 *    of a repeated string instruction but the last iteration,
> +	 *    the value pushed for RF is 1.
> +	 *  - For other cases, the value pushed for RF is the value
> +	 *    that was in EFLAG.RF at the time the event handler was
> +	 *    called.
> +	 *  [from: http://www.intel.com/Assets/PDF/manual/253668.pdf]
> +	 *
> +	 * So: (1) silently ignore X86_EFLAGS_RF from checkpoint time,
> +	 * as it's irrelevant for the current process, (2) preserve
> +	 * the existing X86_EFLAGS_RF of this process as it it may be
> +	 * used by kvm/kgdb. Disable preemption to protect test and
> +	 * change of our eflags.
> +	 */
> +	preempt_disable();
> +	eflags &= ~X86_EFLAGS_RF;
> +	eflags |= (regs->flags & X86_EFLAGS_RF);
> +	regs->flags = eflags;
> +	preempt_enable();
> +}
> +
>  static int check_tls(struct desc_struct *desc)
>  {
>  	if (!desc->a && !desc->b)
> @@ -433,7 +469,7 @@ static int load_cpu_regs(struct ckpt_hdr_cpu *h, struct task_struct *t)
>  	regs->orig_ax = h->orig_ax;
>  	regs->ip = h->ip;
>  
> -	regs->flags = h->flags;
> +	restore_eflags(regs, h->flags);
>  	regs->sp = h->sp;
>  
>  	regs->ds = decode_segment(h->ds);
> -- 
> 1.6.0.4
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] c/r: tolerate X86_EFLAGS_RF on restart
       [not found]     ` <20091026145746.GA23564-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-10-29  1:20       ` Oren Laadan
       [not found]         ` <4AE8EDC4.5010202-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Oren Laadan @ 2009-10-29  1:20 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Avi Kivity, Alexey Dobriyan


After reading the code a bit more, and seeing that:

1) ptrace uses the debugreg of a process so may be interested in
that particular flag, and

2) even without ptrace userspace can set that flag (by suitably
setting the restore context from a signal handler)

I now think that we should instead:

1) Keep the X86_EFLAGS_RF from the time of checkpoint *as is*

2) If the restarting task already has this flag set prior to
restoring eflags from the saved value, then preserve the existing
flag even if at the time of checkpoint it wasn't set.

Unless someones yells, I'll commit this soon.

Oren

Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl@librato.com):
>> On restart, the c/r code that sanitizes a task's EFLAGS failed with an
>> error if the X86_EFLAGS_RF was set. However, there are cases in which
>> a task legitimately has this flag recorded at checkpoint, such as:
>>
>> 1) We are running inside a KVM guest, and the guest is being debugged
>> (see arch/x86/kvm/{svm,vmx}.c, via callbacks from qemu.
>>
>> 2) The kernel itself is being debugged using kgbd.
>>
>> 3) Exceptions that occur under certain condition will set this flag
>> on the currently running task. For details see page 679 in Intel's
>> manual (http://www.intel.com/Assets/PDF/manual/253668.pdf), like:
>>
>>   When calling an event handler, Intel 64 and IA-32 processors
>>   establish the value of the RF flag in the EFLAGS image pushed on the
>>   stack:
>>     • For any fault-class exception except a debug exception
>>       generated in response to an instruction breakpoint, the value
>>       pushed for RF is 1.
>>     • For any interrupt arriving after any iteration of a repeated
>>       string instruction but the last iteration, the value pushed for
>>       RF is 1.
>>     • For any trap-class exception generated by any iteration of a
>>       repeated string instruction but the last iteration, the value
>>       pushed for RF is 1.
> 
> I wasn't quite sure what a 'trap-class exception' is,
> http://www.yashks.com/exception-basics.html says:
> 
> Example: INT 3(or CC or CD 21) is a trap instruction. If a program has this
> instruction and if the debugger is running, it will return the next instruction
> address to this debugger. Where debugger can continue it debugging.
> 
> Does that not mean that if we checkpoint t1 and t2 where t1 is ptracing t2, t2
> might have RF set to 1?  So this isn't generally safe, right?
> 
> IIUC it also could be set for things like a "General Protection Exception(Type:
> Fault)" at checkpoint in which case we could really mess up the sw at
> restart.
> 
> Or maybe I'm totally misunderstanding...?
> 
>>     • For other cases, the value pushed for RF is the value that was
>>       in EFLAG.RF at the time the event handler was called.
>>   [...]
>>
>> Case #3 is consistent with the fact that in all cases where we
>> observed the issue there was a sendmail process doing IO.
>>
>> Clearly, the X86_EFLAGS_RF is valid to be set.
>>
>> This patches fixes the restart to:
>>
>> 1) Silently ignore the X86_EFLAGS_RF if was set at checkpoint: this
>> allows restart to tolerate the flag and safely complete the restart.
>>
>> 2) Preserve the existing X86_EFLAGS_RF of a restarting task (instead
>> of overwriting it): this ensures that restarting does not interfere
>> with kvm, kgdb, etc.
>>
>> Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>> Cc: Matt Helsley <matthltc@us.ibm.com>
>> Cc: Avi Kivity <avi@redhat.com>
>> ---
>>  arch/x86/mm/checkpoint.c |   40 ++++++++++++++++++++++++++++++++++++++--
>>  1 files changed, 38 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
>> index 9dd8e12..a93bb6f 100644
>> --- a/arch/x86/mm/checkpoint.c
>> +++ b/arch/x86/mm/checkpoint.c
>> @@ -27,13 +27,49 @@ static int check_eflags(__u32 eflags)
>>  #define X86_EFLAGS_CKPT_MASK  \
>>  	(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | X86_EFLAGS_ZF | \
>>  	 X86_EFLAGS_SF | X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_OF | \
>> -	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID)
>> +	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID | X86_EFLAGS_RF)
>>  
>>  	if ((eflags & ~X86_EFLAGS_CKPT_MASK) != (X86_EFLAGS_IF | 0x2))
>>  		return 0;
>>  	return 1;
>>  }
>>  
>> +static void restore_eflags(struct pt_regs *regs, __u32 eflags)
>> +{
>> +	/*
>> +	 * A task may have had X86_EFLAGS_RF set at checkpoint, .e.g:
>> +	 * 1) It ran in a KVM guest, and the guest was being debugged,
>> +	 * 2) The kernel was debugged using kgbd,
>> +	 * 3) From Intel's manual: "When calling an event handler, 
>> +	 *    Intel 64 and IA-32 processors establish the value of the
>> +	 *    RF flag in the EFLAGS image pushed on the stack:
>> +	 *  - For any fault-class exception except a debug exception
>> +	 *    generated in response to an instruction breakpoint, the
>> +	 *    value pushed for RF is 1.
>> +	 *  - For any interrupt arriving after any iteration of a
>> +	 *    repeated string instruction but the last iteration, the
>> +	 *    value pushed for RF is 1.
>> +	 *  - For any trap-class exception generated by any iteration
>> +	 *    of a repeated string instruction but the last iteration,
>> +	 *    the value pushed for RF is 1.
>> +	 *  - For other cases, the value pushed for RF is the value
>> +	 *    that was in EFLAG.RF at the time the event handler was
>> +	 *    called.
>> +	 *  [from: http://www.intel.com/Assets/PDF/manual/253668.pdf]
>> +	 *
>> +	 * So: (1) silently ignore X86_EFLAGS_RF from checkpoint time,
>> +	 * as it's irrelevant for the current process, (2) preserve
>> +	 * the existing X86_EFLAGS_RF of this process as it it may be
>> +	 * used by kvm/kgdb. Disable preemption to protect test and
>> +	 * change of our eflags.
>> +	 */
>> +	preempt_disable();
>> +	eflags &= ~X86_EFLAGS_RF;
>> +	eflags |= (regs->flags & X86_EFLAGS_RF);
>> +	regs->flags = eflags;
>> +	preempt_enable();
>> +}
>> +
>>  static int check_tls(struct desc_struct *desc)
>>  {
>>  	if (!desc->a && !desc->b)
>> @@ -433,7 +469,7 @@ static int load_cpu_regs(struct ckpt_hdr_cpu *h, struct task_struct *t)
>>  	regs->orig_ax = h->orig_ax;
>>  	regs->ip = h->ip;
>>  
>> -	regs->flags = h->flags;
>> +	restore_eflags(regs, h->flags);
>>  	regs->sp = h->sp;
>>  
>>  	regs->ds = decode_segment(h->ds);
>> -- 
>> 1.6.0.4
>>
>> _______________________________________________
>> Containers mailing list
>> Containers@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] c/r: tolerate X86_EFLAGS_RF on restart
       [not found]         ` <4AE8EDC4.5010202-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-10-29 14:31           ` Serge E. Hallyn
  0 siblings, 0 replies; 4+ messages in thread
From: Serge E. Hallyn @ 2009-10-29 14:31 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Avi Kivity, Alexey Dobriyan

Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> 
> After reading the code a bit more, and seeing that:
> 
> 1) ptrace uses the debugreg of a process so may be interested in
> that particular flag, and
> 
> 2) even without ptrace userspace can set that flag (by suitably
> setting the restore context from a signal handler)
> 
> I now think that we should instead:
> 
> 1) Keep the X86_EFLAGS_RF from the time of checkpoint *as is*
> 
> 2) If the restarting task already has this flag set prior to
> restoring eflags from the saved value, then preserve the existing
> flag even if at the time of checkpoint it wasn't set.
> 
> Unless someones yells, I'll commit this soon.

As it stands (looking at arch/x86/kernel/ptrace.c) it looks like
userspace with no privilege can set the bit right?  I think your
proposal sounds best.

-serge

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-10-29 14:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-23 18:37 [PATCH] c/r: tolerate X86_EFLAGS_RF on restart Oren Laadan
     [not found] ` <1256323024-20268-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-10-26 14:57   ` Serge E. Hallyn
     [not found]     ` <20091026145746.GA23564-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-29  1:20       ` Oren Laadan
     [not found]         ` <4AE8EDC4.5010202-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-10-29 14:31           ` Serge E. Hallyn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.