From: sherry hurwitz <sherry.hurwitz@amd.com>
To: Jiri Olsa <jolsa@redhat.com>, Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
X86 ML <x86@kernel.org>, Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@redhat.com>, Robert Richter <rric@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
"Namhyung Kim" <namhyung@kernel.org>,
Jan Stancek <jstancek@redhat.com>,
"Suravee Suthikulpanit" <suravee.suthikulpanit@amd.com>
Subject: Re: [BUG/RFC] perf test fails on AMD CPUs
Date: Mon, 24 Aug 2015 17:37:17 -0500
Message-ID: <55DB9C9D.7020003@amd.com>
In-Reply-To: <20150818101025.GA15340@krava.brq.redhat.com>
On 08/18/2015 05:10 AM, Jiri Olsa wrote:
> On Mon, Aug 17, 2015 at 09:06:59AM -0700, Andy Lutomirski wrote:
>> On Sun, Aug 16, 2015 at 9:36 PM, Borislav Petkov <bp@suse.de> wrote:
>>> On Mon, Aug 17, 2015 at 12:29:56AM +0200, Jiri Olsa wrote:
>>>> hi,
>>>> 'perf test 18' is failing on systems with AMD processor.
>>> Hmm, still using that b0rked test box? :-)
>>>
>>> Also, which kernel?
>>>
>>> There have been substantial changes to the entry code recently. Although
>>> I don't see anything being done differently on AMD there except
>>> X86_BUG_SYSRET_SS_ATTRS but that should be unrelated.
>>>
>>>> The only reason I could find is that AMD does not set 'resume flag'
>>>> in RFLAGS register the way the Intel CPU does.
>>>>
>>>> (simplified) test scenario:
>>>>
>>>> - create breakpoint (on test_function) perf event with SIGIO signal
>>>> to be delivered any time the breakpoint is hit
>>>> - run test_function
>>>>
>>>>
>>>> expected course of actions is:
>>>> 1) CPU hits 'test_function'
>>>> 2) DB exception is triggered, with RFLAGS.RF=0
>>>> 3) DB exception handler sets regs->RFLAGS.RF=1 and perf handler
>>>> triggers irq_work pending work
>>>> 4) DB exception executes iretd
>>>> 5) irq_work interrupt is triggered, with RFLAGS.RF=1
>>>> 6) irq_work interrupt calls kill_fasync with SIGIO signal
>>>> 7) irq_work interrupt on return to userspace calls prepare_exit_to_usermode
>>>> which actually delivers the SIGIO signal
>>>> 8) sigreturn syscall prepares registers to return to the
>>>> instruction from step 1) and restores RFLAGS.RF to its
>>>> value from step 5) (RFLAGS.RF=1)
>>>> 9) CPU hits 'test_function' and DB exception is NOT triggered
>>>> due to RFLAGS.RF=1
>>>>
>>>> this is how I see it works on Intel
>>>>
>>>> But AMD gives me RFLAGS.RF=0 in step 5, which makes step 9
>>>> trigger the DB exception once again and makes the test fail.
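The nine steps above can be modelled as a tiny state machine. This is only a sketch under assumptions, not kernel code: count_db_exceptions() and its irq_frame_keeps_rf parameter are hypothetical names, and the model assumes the irq_work interrupt always arrives before the breakpointed instruction has executed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of steps 1) to 9); nothing here is kernel API.
 *
 * irq_frame_keeps_rf: true if the interrupt frame pushed in step 5)
 *                     carries the live RF value (the behaviour reported
 *                     for Intel in this thread), false if it carries
 *                     RF=0 (the behaviour reported for the AMD box).
 * max_hits:           safety bound so the failing case terminates.
 *
 * Returns how many #DB exceptions one call of test_function causes.
 */
int count_db_exceptions(bool irq_frame_keeps_rf, int max_hits)
{
    bool rf = false;    /* live RFLAGS.RF */
    int db_hits = 0;

    assert(max_hits > 0);

    while (db_hits < max_hits) {
        /* steps 1) and 9): CPU reaches the breakpointed instruction */
        if (rf)
            break;      /* RF=1 suppresses the breakpoint, code runs on */

        /* step 2): #DB is taken with RF=0 */
        db_hits++;

        /* step 3): handler sets RF=1 in the saved registers */
        bool db_frame_rf = true;

        /* step 4): iret restores RF from the #DB frame */
        rf = db_frame_rf;

        /* step 5): irq_work interrupt pushes its own frame */
        bool irq_frame_rf = irq_frame_keeps_rf ? rf : false;

        /* steps 6) to 8): SIGIO is delivered, then sigreturn
         * restores RF from the value saved in step 5) */
        rf = irq_frame_rf;
    }

    return db_hits;
}
```

With irq_frame_keeps_rf set, the model takes a single #DB and then runs test_function; with it cleared, every return lands on the breakpoint again until the max_hits bound stops the loop, which matches the repeated exception described for step 9.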
>>> Adding Andy, he might have an idea. Leaving in the rest for reference.
>> Gee thanks :-p
>>
>> Jiri, did you instrument the code and observe that do_IRQ sees RF clear in
>> its pt_regs? Also, it might be worth checking that regs->ip in the
>> irq_work matches regs->ip in the debug exception.
> yep, that's what I saw.. once the irq_work interrupt was triggered
> the regs->ip was the same as for the previous debug exception
> but the RFLAGS.RF was 0
>
>> It's *possible* that I messed up and broke RF restore with
>> opportunistic sysret, but the code looks correct:
>>
>> testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
>> jnz opportunistic_sysret_failed
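For readers less used to the assembly, the two instructions above amount to a bit test on the RFLAGS image that SYSRET would restore from %r11. A rough C equivalent, as a sketch only: must_skip_sysret() is a hypothetical name, and the two masks are defined locally so the snippet stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bit positions, defined locally for this sketch. */
#define X86_EFLAGS_TF (UINT64_C(1) << 8)    /* trap flag */
#define X86_EFLAGS_RF (UINT64_C(1) << 16)   /* resume flag */

static_assert(X86_EFLAGS_RF == 0x10000, "RF is bit 16");

/*
 * Hypothetical helper mirroring "testq ...; jnz opportunistic_sysret_failed":
 * returns true when the saved RFLAGS image has RF or TF set, i.e. when
 * the quoted code would branch away from the sysret path.
 */
static inline bool must_skip_sysret(uint64_t saved_rflags)
{
    return (saved_rflags & (X86_EFLAGS_RF | X86_EFLAGS_TF)) != 0;
}
```

So a frame with RF=1 never reaches sysret in the quoted code, which is why that path looks correct with respect to RF.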
> AFAICS the problematic paths did not hit syscalls
>
> buuuuuut anyway, it looks like an issue in the latest AMD firmware:
>
> [root@amd-pike-07 ~]# cat /sys/devices/system/cpu/cpu0/microcode/version
> 0x6000822
> [root@amd-pike-07 perf]# ./perf test 18
> 18: Test breakpoint overflow signal handler : Ok
>
> [root@amd-pike-07 perf]# cat /sys/devices/system/cpu/cpu0/microcode/version
> 0x6000832
> [root@amd-pike-07 perf]# ./perf test 18
> 18: Test breakpoint overflow signal handler : FAILED!
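The comparison above can be scripted. The following is a minimal sketch, assuming the sysfs file holds one hex value as shown in the transcript; the function and enum names are hypothetical, and the only revisions it knows about are the two reported in this thread for this one machine.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Classification based solely on the two results quoted above. */
enum rev_status { REV_REPORTED_OK, REV_REPORTED_FAILING, REV_UNKNOWN };

/* Parse text such as "0x6000832\n"; returns 0 on success, -1 on error. */
int parse_microcode_rev(const char *text, unsigned long *rev)
{
    char *end;
    unsigned long value;

    assert(text != NULL && rev != NULL);

    errno = 0;
    value = strtoul(text, &end, 16);    /* base 16 accepts a "0x" prefix */
    if (errno != 0 || end == text || (*end != '\0' && *end != '\n'))
        return -1;

    *rev = value;
    return 0;
}

/* Map a revision to what this thread reported for 'perf test 18'. */
enum rev_status classify_rev(unsigned long rev)
{
    if (rev == 0x6000822UL)
        return REV_REPORTED_OK;
    if (rev == 0x6000832UL)
        return REV_REPORTED_FAILING;
    return REV_UNKNOWN;     /* nothing in the thread covers other revisions */
}

/* Read the first line of a sysfs-style file and parse it as a revision. */
int read_microcode_rev(const char *path, unsigned long *rev)
{
    char buf[32];
    char *line;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    line = fgets(buf, sizeof(buf), f);
    fclose(f);

    return line != NULL ? parse_microcode_rev(buf, rev) : -1;
}
```

A caller would pass /sys/devices/system/cpu/cpu0/microcode/version to read_microcode_rev() and treat REV_UNKNOWN as "not covered by this report" instead of guessing.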
>
>
> [root@amd-pike-07 ~]# cat /proc/cpuinfo
> processor : 7
> vendor_id : AuthenticAMD
> cpu family : 21
> model : 2
> model name : AMD Opteron(tm) Processor 3380
> stepping : 0
> microcode : 0x6000832
>
> SNIP
>
>
>>>> AMD description of RF flag (SDM 3.1.6):
>>>> =======================================
>>>> Resume Flag (RF) Bit. Bit 16. The RF bit allows an instruction to be restarted following an
>>>> instruction breakpoint resulting in a debug exception (#DB). This bit prevents multiple debug
>>>> exceptions from occurring on the same instruction.
>>>> The processor clears the RF bit after every instruction is successfully executed, except when the
>>>> instruction is:
>>>> • An IRET that sets the RF bit.
>>>> • JMP, CALL, or INTn through a task gate.
>>>> In both of the above cases, RF is not cleared to 0 until the next instruction successfully executes.
>>>> When an exception occurs (or when a string instruction is interrupted), the processor normally sets
>>>> RF=1 in the RFLAGS image saved on the interrupt stack. However, when a #DB exception occurs as a
>>>> result of an instruction breakpoint, the processor clears the RF bit to 0 in the interrupt-stack RFLAGS
>>>> image.
>> That's a little weird, I think. Shouldn't RF be zero on #DB due to a
>> *watchpoint* so that a watchpoint followed immediately by a breakpoint
>> works?
> the AMD description looked more vague (compared to Intel's)
>
>>>> • For other cases, the value pushed for RF is the value that was in EFLAGS.RF at the time the event handler was
>>>> called. This includes:
>>>> — Debug exceptions generated in response to instruction breakpoints
>>>> — Hardware-generated interrupts arriving between instructions (including those arriving after the last
>>>> iteration of a repeated string instruction)
>> This appears to be why it works on Intel. Does AMD not do that? We
>> could probably work around this in software (by not using irq work for
>> this), but yuck.
> yep, but hopefully it's a microcode issue ;-) Cc-ing guys from linux-firmware git
>
> Sherry, Suravee, any idea?
>
> thanks,
> jirka
Jiri,
I have reproduced your problem and asked the HW architect who wrote 832
to review the diff between the 822 and 832 microcode patches.
Thanks,
Sherry
2015-08-16 22:29 [BUG/RFC] perf test fails on AMD CPUs Jiri Olsa
2015-08-17 4:36 ` Borislav Petkov
2015-08-17 7:33 ` Jiri Olsa
2015-08-17 16:06 ` Andy Lutomirski
2015-08-18 8:52 ` Borislav Petkov
2015-08-18 10:10 ` Jiri Olsa
2015-08-19 3:55 ` Borislav Petkov
2015-08-19 8:55 ` Jiri Olsa
2015-08-19 15:47 ` Borislav Petkov
2015-08-19 15:58 ` Jiri Olsa
2015-08-19 16:12 ` Borislav Petkov
2015-08-21 7:45 ` Jiri Olsa
2015-08-24 22:37 ` sherry hurwitz [this message]
2015-12-10 19:26 ` Borislav Petkov