* [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction @ 2019-11-12 23:22 Luis Machado 2019-11-18 13:15 ` Will Deacon 0 siblings, 1 reply; 17+ messages in thread From: Luis Machado @ 2019-11-12 23:22 UTC (permalink / raw) To: linux-arm-kernel, will Hi, I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP request by GDB won't execute the underlying instruction. As a consequence, the PC doesn't move, but we return a SIGTRAP just like we would for a regular successful PTRACE_SINGLESTEP request. Since there are no software breakpoints inserted at PC (we are actually stepping over a breakpoint, so GDB removes the breakpoint at PC before issuing a PTRACE_SINGLESTEP request), this is an odd behavior. Though not too harmful, i see this manifesting in the GDB testsuite (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB think it is further in the instruction stream than it really is. In fact, we get lucky here and no FAIL's show up, only many more spurious PASSes. Since the reproduction steps involve GDB and the testcase, I'll report my findings here for convenience. But it can be reproduced with a top-of-tree kernel (what i used) or an Ubuntu one (4.12.13), it doesn't make a difference. I've also reproduced this in real hardware and under QEMU. I did some rudimentary debugging to confirm GDB wasn't doing anything wrong, and placed some debugging output on the arm64 ptrace-related functions in the kernel. I also added some debugging output to the function that handles software breakpoint traps, to make sure no breakpoints were being inadvertently left behind. At the point where GDB issues PTRACE_SINGLESTEP, we see this: <case 1> <before execution> [ 524.329276] >>>> Start user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:450 <<<< [ 524.329314] >>>> PC is 400574 <<<< [ 524.329329] >>>> End user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:459 <<<< <after execution> [ 524.329679] >>>> Start single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:249 <<<< [ 524.329707] >>>> PC is 400574 <<<< [ 524.329725] >>>> Start send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:228 <<<< [ 524.329733] >>>> PC is 400574 <<<< [ 524.329783] >>>> End send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:241 <<<< [ 524.329794] >>>> End single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:280 <<<< A regular successful PTRACE_SINGLESTEP should look like this instead: <case 2> <before execution> [ 981.042942] >>>> Start user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:450 <<<< [ 981.042982] >>>> PC is 400574 <<<< [ 981.042997] >>>> End user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:459 <<<< <after execution> [ 981.043411] >>>> Start single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:249 <<<< [ 981.043453] >>>> PC is 400578 <<<< [ 981.043472] >>>> Start send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:228 <<<< [ 981.043481] >>>> PC is 400578 <<<< [ 981.043540] >>>> End send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:241 <<<< [ 981.043553] >>>> End single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:280 <<<< As a guess, i decided to revert commit 3a402a709500c5a3faca2111668c33d96555e35a (arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP) to see its effect on this particular case. Then the output looks like <case 2> above, which is correct. So this is at least partially caused by commit 3a402a709500c5a3faca2111668c33d96555e35a, but i don't understand the full picture (involving the kernel) here. I know said commit is needed for other problematic cases in GDB (fork/vfork for example), but it might be having undesirable side effects here. Here's how to reproduce. Make sure you have a reasonably new GDB (I reproduced it with Ubuntu's GDB 7.11.1-0ubuntu1~16.5). You can also build GDB from the git tree if you want. A standard aarch64-linux-gnu GDB will do. Grab both of these source files for the testcase: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=gdb/testsuite/gdb.reverse/insn-reverse.c;hb=HEAD https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=gdb/testsuite/gdb.reverse/insn-reverse-aarch64.c;hb=HEAD Build the testcase with: gcc -O0 -g3 -lm insn-reverse.c -o insn-reverse Execute gdb like so: gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex "record" -ex "si" -ex "rsi" -ex "record stop" insn-reverse What the above does is put a breakpoint in "load", run to it, enable reversible debugging, step one instruction forward, step back one instruction (essentially coming back to the same PC) and then shutting down reversible debugging. Now, giving gdb the "si" command will cause it to execute the PTRACE_SINGLESTEP i pointed out above, in my explanation of the bug. display/x $pc stepi You'll see, if it reproduces, the PC has not changed and the instruction has not executed. GDB will indicate a breakpoint hit, but this is bogus. It is due to the fact the PC didn't move, and GDB still has a breakpoint listed in this PC. Please let me know if i can help with any other information in case any of the steps is not clear. Thanks, Luis _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2019-11-12 23:22 [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction Luis Machado @ 2019-11-18 13:15 ` Will Deacon 2019-11-18 14:54 ` Luis Machado 0 siblings, 1 reply; 17+ messages in thread From: Will Deacon @ 2019-11-18 13:15 UTC (permalink / raw) To: Luis Machado; +Cc: mark.rutland, linux-arm-kernel Hi Luis, [+Mark for the valid_user_regs() part] On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote: > I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP > request by GDB won't execute the underlying instruction. As a consequence, > the PC doesn't move, but we return a SIGTRAP just like we would for a > regular successful PTRACE_SINGLESTEP request. > > Since there are no software breakpoints inserted at PC (we are actually > stepping over a breakpoint, so GDB removes the breakpoint at PC before > issuing a PTRACE_SINGLESTEP request), this is an odd behavior. > > Though not too harmful, i see this manifesting in the GDB testsuite > (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB > think it is further in the instruction stream than it really is. In fact, we > get lucky here and no FAIL's show up, only many more spurious PASSes. I managed to reproduce this locally and I think I've figured out what's going on, although I'm not sure that the kernel is the best place to fix it. Looking at the specific reproducer: > Execute gdb like so: > > gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex "record" -ex > "si" -ex "rsi" -ex "record stop" insn-reverse So we've got a couple of instructions as follows (it doesn't actually matter what they are, so I've changed the LD1 in your binary for a NOP in order to avoid confusion with the "load" label not actually pointing at a load): 0x7b8: mov // "load" 0x7bc: nop "b load" places a breakpoint at 0x7b8: ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 We run to a software breakpoint on "load" (the mov instruction). We take the trap and try to execute the "si", which means we need to remove the breakpoint while we step over it: ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 [...] ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0) = 0 This causes the kernel to arm the single-step state machine so that MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). Running an instruction in userspace will transition to MDSCR_EL1.SS ==1 and SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the trap to trigger, at which point gdb puts the breakpoint instruction back since the step is complete: ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 This is where things start to go wrong. The "rsi" command attempts to perform a reverse step, which means restoring the old state when we were previously executing at 0x7b8. It starts by removing the breakpoint again, since we've already hit that: ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 and then resets the CPU registers to their old values: (I don't know why it does this three times) ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 The problem with this is that we have moved the PC back to 0x7b8 but we have also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen stepping get disabled (this usually happens by PTRACE_CONT calling user_disable_single_step()) which means that MDSCR_EL1.SS remains set to 1 and we're in the active-pending state! Consequently, we immediately take a step exception if a step operation is attempted. Now, we *could* consider hacking the TIF_SINGLESTEP check in valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is active but this is a user-visible change and may break things like stepping out of signal handlers. I would prefer that GDB manages the SS bit explicitly in this scenario, by setting it to 1 when restoring the old state in the reverse step, a bit like when it disables the old breakpoint. You can emulate this by doing: (gdb) set $cpsr |= (1<<21) Thoughts? Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2019-11-18 13:15 ` Will Deacon @ 2019-11-18 14:54 ` Luis Machado 2019-11-26 16:35 ` Luis Machado 0 siblings, 1 reply; 17+ messages in thread From: Luis Machado @ 2019-11-18 14:54 UTC (permalink / raw) To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel Hi Will, Thanks for the thorough explanation. On 11/18/19 10:15 AM, Will Deacon wrote: > Hi Luis, > > [+Mark for the valid_user_regs() part] > > On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote: >> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP >> request by GDB won't execute the underlying instruction. As a consequence, >> the PC doesn't move, but we return a SIGTRAP just like we would for a >> regular successful PTRACE_SINGLESTEP request. >> >> Since there are no software breakpoints inserted at PC (we are actually >> stepping over a breakpoint, so GDB removes the breakpoint at PC before >> issuing a PTRACE_SINGLESTEP request), this is an odd behavior. >> >> Though not too harmful, i see this manifesting in the GDB testsuite >> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB >> think it is further in the instruction stream than it really is. In fact, we >> get lucky here and no FAIL's show up, only many more spurious PASSes. > > I managed to reproduce this locally and I think I've figured out what's > going on, although I'm not sure that the kernel is the best place to fix > it. > > Looking at the specific reproducer: > >> Execute gdb like so: >> >> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex "record" -ex >> "si" -ex "rsi" -ex "record stop" insn-reverse > > So we've got a couple of instructions as follows (it doesn't actually matter > what they are, so I've changed the LD1 in your binary for a NOP in order to > avoid confusion with the "load" label not actually pointing at a load): > > 0x7b8: mov // "load" > 0x7bc: nop > > "b load" places a breakpoint at 0x7b8: > > ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 > > We run to a software breakpoint on "load" (the mov instruction). We take > the trap and try to execute the "si", which means we need to remove the > breakpoint while we step over it: > > ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 > [...] > ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0) = 0 > > This causes the kernel to arm the single-step state machine so that > MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). Running > an instruction in userspace will transition to MDSCR_EL1.SS ==1 and > SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the trap to > trigger, at which point gdb puts the breakpoint instruction back since the > step is complete: So, just to confirm my understanding, we have a couple bits controlling single-stepping in the kernel, one in MDSCR_EL1 and another in SPSR_EL1. GDB doesn't have direct access to any of those, correct? Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR. The transition from active-not-pending to active-pending takes place via a single PTRACE_SINGLESTEP request? Is that correct? > > ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 > > This is where things start to go wrong. The "rsi" command attempts to > perform a reverse step, which means restoring the old state when we were > previously executing at 0x7b8. It starts by removing the breakpoint again, > since we've already hit that: > > ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 > > and then resets the CPU registers to their old values: > > (I don't know why it does this three times) > ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 > ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 > ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 > > The problem with this is that we have moved the PC back to 0x7b8 but we have > also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen stepping > get disabled (this usually happens by PTRACE_CONT calling > user_disable_single_step()) which means that MDSCR_EL1.SS remains set to 1 > and we're in the active-pending state! Consequently, we immediately take a > step exception if a step operation is attempted > While trying to reproduce this, i was paying attention to the SS bit coming and going. But in the particular sequence of si/rsi, within the record boundaries, i see GDB just restored the original CPSR value to what it was before we processed the si command. From GDB's POV all state was restore to the way it was before and we're good to go. Is this not enough to restore state kernel-wise? > Now, we *could* consider hacking the TIF_SINGLESTEP check in > valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is active > but this is a user-visible change and may break things like stepping out of > signal handlers. I would prefer that GDB manages the SS bit explicitly in > this scenario, by setting it to 1 when restoring the old state in the > reverse step, a bit like when it disables the old breakpoint. You can > emulate this by doing: I think we could let GDB control this when required, but I'm trying to understand the ramifications of letting GDB do so. For example, what if the user decides to alter the PC here and there, for debugging purposes. That is a use case that happens often, in order to go back or skip some parts of the code. Would we need to pay attention to the SS bit in those cases as well? > > (gdb) set $cpsr |= (1<<21) In particular, what does the switching of this bit accomplishes in the kernel? Would we be better off forcing the SS bit every time we do a single-step operation, for example? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2019-11-18 14:54 ` Luis Machado @ 2019-11-26 16:35 ` Luis Machado 2019-12-10 20:00 ` Luis Machado 2020-01-13 18:13 ` Luis Machado 0 siblings, 2 replies; 17+ messages in thread From: Luis Machado @ 2019-11-26 16:35 UTC (permalink / raw) To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel ping? On 11/18/19 11:54 AM, Luis Machado wrote: > Hi Will, > > Thanks for the thorough explanation. > > On 11/18/19 10:15 AM, Will Deacon wrote: >> Hi Luis, >> >> [+Mark for the valid_user_regs() part] >> >> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote: >>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP >>> request by GDB won't execute the underlying instruction. As a >>> consequence, >>> the PC doesn't move, but we return a SIGTRAP just like we would for a >>> regular successful PTRACE_SINGLESTEP request. >>> >>> Since there are no software breakpoints inserted at PC (we are actually >>> stepping over a breakpoint, so GDB removes the breakpoint at PC before >>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior. >>> >>> Though not too harmful, i see this manifesting in the GDB testsuite >>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB >>> think it is further in the instruction stream than it really is. In >>> fact, we >>> get lucky here and no FAIL's show up, only many more spurious PASSes. >> >> I managed to reproduce this locally and I think I've figured out what's >> going on, although I'm not sure that the kernel is the best place to fix >> it. >> >> Looking at the specific reproducer: >> >>> Execute gdb like so: >>> >>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex >>> "record" -ex >>> "si" -ex "rsi" -ex "record stop" insn-reverse >> >> So we've got a couple of instructions as follows (it doesn't actually >> matter >> what they are, so I've changed the LD1 in your binary for a NOP in >> order to >> avoid confusion with the "load" label not actually pointing at a load): >> >> 0x7b8: mov // "load" >> 0x7bc: nop >> >> "b load" places a breakpoint at 0x7b8: >> >> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 >> >> We run to a software breakpoint on "load" (the mov instruction). We take >> the trap and try to execute the "si", which means we need to remove the >> breakpoint while we step over it: >> >> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 >> [...] >> ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0) = 0 >> >> This causes the kernel to arm the single-step state machine so that >> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). Running >> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and >> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the >> trap to >> trigger, at which point gdb puts the breakpoint instruction back since >> the >> step is complete: > > So, just to confirm my understanding, we have a couple bits controlling > single-stepping in the kernel, one in MDSCR_EL1 and another in SPSR_EL1. > GDB doesn't have direct access to any of those, correct? > > Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR. > > The transition from active-not-pending to active-pending takes place via > a single PTRACE_SINGLESTEP request? Is that correct? > >> >> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 >> >> This is where things start to go wrong. The "rsi" command attempts to >> perform a reverse step, which means restoring the old state when we were >> previously executing at 0x7b8. It starts by removing the breakpoint >> again, >> since we've already hit that: >> >> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 >> >> and then resets the CPU registers to their old values: >> >> (I don't know why it does this three times) >> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >> >> The problem with this is that we have moved the PC back to 0x7b8 but >> we have >> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen >> stepping >> get disabled (this usually happens by PTRACE_CONT calling >> user_disable_single_step()) which means that MDSCR_EL1.SS remains set >> to 1 >> and we're in the active-pending state! Consequently, we immediately >> take a >> step exception if a step operation is attempted > > > While trying to reproduce this, i was paying attention to the SS bit > coming and going. But in the particular sequence of si/rsi, within the > record boundaries, i see GDB just restored the original CPSR value to > what it was before we processed the si command. > > From GDB's POV all state was restore to the way it was before and we're > good to go. > > Is this not enough to restore state kernel-wise? > >> Now, we *could* consider hacking the TIF_SINGLESTEP check in >> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is >> active >> but this is a user-visible change and may break things like stepping >> out of >> signal handlers. I would prefer that GDB manages the SS bit explicitly in >> this scenario, by setting it to 1 when restoring the old state in the >> reverse step, a bit like when it disables the old breakpoint. You can >> emulate this by doing: > > I think we could let GDB control this when required, but I'm trying to > understand the ramifications of letting GDB do so. > > For example, what if the user decides to alter the PC here and there, > for debugging purposes. That is a use case that happens often, in order > to go back or skip some parts of the code. > > Would we need to pay attention to the SS bit in those cases as well? > >> >> (gdb) set $cpsr |= (1<<21) > > In particular, what does the switching of this bit accomplishes in the > kernel? Would we be better off forcing the SS bit every time we do a > single-step operation, for example? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2019-11-26 16:35 ` Luis Machado @ 2019-12-10 20:00 ` Luis Machado 2020-02-13 12:01 ` Will Deacon 2020-01-13 18:13 ` Luis Machado 1 sibling, 1 reply; 17+ messages in thread From: Luis Machado @ 2019-12-10 20:00 UTC (permalink / raw) To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel Will, Mark, Do you have any input regarding this particular situation? It would be nice to get this fixed before the release of another GDB version, if the fix is to live in GDB itself. On 11/26/19 1:35 PM, Luis Machado wrote: > ping? > > On 11/18/19 11:54 AM, Luis Machado wrote: >> Hi Will, >> >> Thanks for the thorough explanation. >> >> On 11/18/19 10:15 AM, Will Deacon wrote: >>> Hi Luis, >>> >>> [+Mark for the valid_user_regs() part] >>> >>> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote: >>>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP >>>> request by GDB won't execute the underlying instruction. As a >>>> consequence, >>>> the PC doesn't move, but we return a SIGTRAP just like we would for a >>>> regular successful PTRACE_SINGLESTEP request. >>>> >>>> Since there are no software breakpoints inserted at PC (we are actually >>>> stepping over a breakpoint, so GDB removes the breakpoint at PC before >>>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior. >>>> >>>> Though not too harmful, i see this manifesting in the GDB testsuite >>>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB >>>> think it is further in the instruction stream than it really is. In >>>> fact, we >>>> get lucky here and no FAIL's show up, only many more spurious PASSes. >>> >>> I managed to reproduce this locally and I think I've figured out what's >>> going on, although I'm not sure that the kernel is the best place to fix >>> it. >>> >>> Looking at the specific reproducer: >>> >>>> Execute gdb like so: >>>> >>>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex >>>> "record" -ex >>>> "si" -ex "rsi" -ex "record stop" insn-reverse >>> >>> So we've got a couple of instructions as follows (it doesn't actually >>> matter >>> what they are, so I've changed the LD1 in your binary for a NOP in >>> order to >>> avoid confusion with the "load" label not actually pointing at a load): >>> >>> 0x7b8: mov // "load" >>> 0x7bc: nop >>> >>> "b load" places a breakpoint at 0x7b8: >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 >>> >>> We run to a software breakpoint on "load" (the mov instruction). We take >>> the trap and try to execute the "si", which means we need to remove the >>> breakpoint while we step over it: >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 >>> [...] >>> ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0) = 0 >>> >>> This causes the kernel to arm the single-step state machine so that >>> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). >>> Running >>> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and >>> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the >>> trap to >>> trigger, at which point gdb puts the breakpoint instruction back >>> since the >>> step is complete: >> >> So, just to confirm my understanding, we have a couple bits >> controlling single-stepping in the kernel, one in MDSCR_EL1 and >> another in SPSR_EL1. GDB doesn't have direct access to any of those, >> correct? >> >> Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR. >> >> The transition from active-not-pending to active-pending takes place >> via a single PTRACE_SINGLESTEP request? Is that correct? >> >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 >>> >>> This is where things start to go wrong. The "rsi" command attempts to >>> perform a reverse step, which means restoring the old state when we were >>> previously executing at 0x7b8. It starts by removing the breakpoint >>> again, >>> since we've already hit that: >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 >>> >>> and then resets the CPU registers to their old values: >>> >>> (I don't know why it does this three times) >>> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >>> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >>> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >>> >>> The problem with this is that we have moved the PC back to 0x7b8 but >>> we have >>> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen >>> stepping >>> get disabled (this usually happens by PTRACE_CONT calling >>> user_disable_single_step()) which means that MDSCR_EL1.SS remains set >>> to 1 >>> and we're in the active-pending state! Consequently, we immediately >>> take a >>> step exception if a step operation is attempted > >> >> While trying to reproduce this, i was paying attention to the SS bit >> coming and going. But in the particular sequence of si/rsi, within the >> record boundaries, i see GDB just restored the original CPSR value to >> what it was before we processed the si command. >> >> From GDB's POV all state was restore to the way it was before and >> we're good to go. >> >> Is this not enough to restore state kernel-wise? >> >>> Now, we *could* consider hacking the TIF_SINGLESTEP check in >>> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is >>> active >>> but this is a user-visible change and may break things like stepping >>> out of >>> signal handlers. I would prefer that GDB manages the SS bit >>> explicitly in >>> this scenario, by setting it to 1 when restoring the old state in the >>> reverse step, a bit like when it disables the old breakpoint. You can >>> emulate this by doing: >> >> I think we could let GDB control this when required, but I'm trying to >> understand the ramifications of letting GDB do so. >> >> For example, what if the user decides to alter the PC here and there, >> for debugging purposes. That is a use case that happens often, in >> order to go back or skip some parts of the code. >> >> Would we need to pay attention to the SS bit in those cases as well? >> >>> >>> (gdb) set $cpsr |= (1<<21) >> >> In particular, what does the switching of this bit accomplishes in the >> kernel? Would we be better off forcing the SS bit every time we do a >> single-step operation, for example? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2019-12-10 20:00 ` Luis Machado @ 2020-02-13 12:01 ` Will Deacon 2020-02-13 17:07 ` Luis Machado 2020-02-20 13:02 ` Mark Rutland 0 siblings, 2 replies; 17+ messages in thread From: Will Deacon @ 2020-02-13 12:01 UTC (permalink / raw) To: Luis Machado; +Cc: mark.rutland, linux-arm-kernel Hi Luis, Sorry for the very slow reply. I talked to Mark about this a bit but it seems that we never followed up here. On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote: > Do you have any input regarding this particular situation? > > It would be nice to get this fixed before the release of another GDB > version, if the fix is to live in GDB itself. Basically, I'm very nervous about fixing this in the kernel because whatever we do will be visible to userspace. On the other hand, this part of the ptrace interface is only seriously used by GDB and we should make sure that it works well. Does the diff below solve the problem? If so, can you confirm that it doesn't appear to regress anything else for GDB? Cheers, Will --->8 diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h index 7619f473155f..d825e3585e28 100644 --- a/arch/arm64/include/asm/debug-monitors.h +++ b/arch/arm64/include/asm/debug-monitors.h @@ -109,6 +109,8 @@ void disable_debug_monitors(enum dbg_active_el el); void user_rewind_single_step(struct task_struct *task); void user_fastforward_single_step(struct task_struct *task); +void user_regs_reset_single_step(struct user_pt_regs *regs, + struct task_struct *task); void kernel_enable_single_step(struct pt_regs *regs); void kernel_disable_single_step(void); diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c index 48222a4760c2..7569deb1eac1 100644 --- a/arch/arm64/kernel/debug-monitors.c +++ b/arch/arm64/kernel/debug-monitors.c @@ -141,17 +141,20 @@ postcore_initcall(debug_monitors_init); /* * Single step API and exception handling. */ -static void set_regs_spsr_ss(struct pt_regs *regs) +static void set_user_regs_spsr_ss(struct user_pt_regs *regs) { regs->pstate |= DBG_SPSR_SS; } -NOKPROBE_SYMBOL(set_regs_spsr_ss); +NOKPROBE_SYMBOL(set_user_regs_spsr_ss); -static void clear_regs_spsr_ss(struct pt_regs *regs) +static void clear_user_regs_spsr_ss(struct user_pt_regs *regs) { regs->pstate &= ~DBG_SPSR_SS; } -NOKPROBE_SYMBOL(clear_regs_spsr_ss); +NOKPROBE_SYMBOL(clear_user_regs_spsr_ss); + +#define set_regs_spsr_ss(r) set_user_regs_spsr_ss(&(r)->user_regs) +#define clear_regs_spsr_ss(r) clear_user_regs_spsr_ss(&(r)->user_regs) static DEFINE_SPINLOCK(debug_hook_lock); static LIST_HEAD(user_step_hook); @@ -404,6 +407,15 @@ void user_fastforward_single_step(struct task_struct *task) clear_regs_spsr_ss(task_pt_regs(task)); } +void user_regs_reset_single_step(struct user_pt_regs *regs, + struct task_struct *task) +{ + if (test_tsk_thread_flag(task, TIF_SINGLESTEP)) + set_user_regs_spsr_ss(regs); + else + clear_user_regs_spsr_ss(regs); +} + /* Kernel API */ void kernel_enable_single_step(struct pt_regs *regs) { diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index cd6e5fa48b9c..d479fbcbd0d2 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs) */ int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task) { - if (!test_tsk_thread_flag(task, TIF_SINGLESTEP)) - regs->pstate &= ~DBG_SPSR_SS; + /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */ + user_regs_reset_single_step(regs, task); if (is_compat_thread(task_thread_info(task))) return valid_compat_regs(regs); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 339882db5a91..bc54bdbfd760 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs, forget_syscall(regs); err |= !valid_user_regs(®s->user_regs, current); - if (err == 0) + + if (err == 0) { + /* Make it look like we stepped the sigreturn system call */ + user_fastforward_single_step(current); err = parse_user_sigframe(&user, sf); + } if (err == 0 && system_supports_fpsimd()) { if (!user.fpsimd) _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-13 12:01 ` Will Deacon @ 2020-02-13 17:07 ` Luis Machado 2020-02-14 15:45 ` Luis Machado 2020-02-20 13:02 ` Mark Rutland 1 sibling, 1 reply; 17+ messages in thread From: Luis Machado @ 2020-02-13 17:07 UTC (permalink / raw) To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel Hi Will, On 2/13/20 9:01 AM, Will Deacon wrote: > Hi Luis, > > Sorry for the very slow reply. I talked to Mark about this a bit but it > seems that we never followed up here. No worries. > > On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote: >> Do you have any input regarding this particular situation? >> >> It would be nice to get this fixed before the release of another GDB >> version, if the fix is to live in GDB itself. > > Basically, I'm very nervous about fixing this in the kernel because > whatever we do will be visible to userspace. On the other hand, this > part of the ptrace interface is only seriously used by GDB and we should > make sure that it works well. > > Does the diff below solve the problem? If so, can you confirm that it > doesn't appear to regress anything else for GDB? Thanks for the patch. I'll exercise this in various ways to see if anything breaks. > > Cheers, > > Will > > --->8 > > diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h > index 7619f473155f..d825e3585e28 100644 > --- a/arch/arm64/include/asm/debug-monitors.h > +++ b/arch/arm64/include/asm/debug-monitors.h > @@ -109,6 +109,8 @@ void disable_debug_monitors(enum dbg_active_el el); > > void user_rewind_single_step(struct task_struct *task); > void user_fastforward_single_step(struct task_struct *task); > +void user_regs_reset_single_step(struct user_pt_regs *regs, > + struct task_struct *task); > > void kernel_enable_single_step(struct pt_regs *regs); > void kernel_disable_single_step(void); > diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c > index 48222a4760c2..7569deb1eac1 100644 > --- a/arch/arm64/kernel/debug-monitors.c > +++ b/arch/arm64/kernel/debug-monitors.c > @@ -141,17 +141,20 @@ postcore_initcall(debug_monitors_init); > /* > * Single step API and exception handling. > */ > -static void set_regs_spsr_ss(struct pt_regs *regs) > +static void set_user_regs_spsr_ss(struct user_pt_regs *regs) > { > regs->pstate |= DBG_SPSR_SS; > } > -NOKPROBE_SYMBOL(set_regs_spsr_ss); > +NOKPROBE_SYMBOL(set_user_regs_spsr_ss); > > -static void clear_regs_spsr_ss(struct pt_regs *regs) > +static void clear_user_regs_spsr_ss(struct user_pt_regs *regs) > { > regs->pstate &= ~DBG_SPSR_SS; > } > -NOKPROBE_SYMBOL(clear_regs_spsr_ss); > +NOKPROBE_SYMBOL(clear_user_regs_spsr_ss); > + > +#define set_regs_spsr_ss(r) set_user_regs_spsr_ss(&(r)->user_regs) > +#define clear_regs_spsr_ss(r) clear_user_regs_spsr_ss(&(r)->user_regs) > > static DEFINE_SPINLOCK(debug_hook_lock); > static LIST_HEAD(user_step_hook); > @@ -404,6 +407,15 @@ void user_fastforward_single_step(struct task_struct *task) > clear_regs_spsr_ss(task_pt_regs(task)); > } > > +void user_regs_reset_single_step(struct user_pt_regs *regs, > + struct task_struct *task) > +{ > + if (test_tsk_thread_flag(task, TIF_SINGLESTEP)) > + set_user_regs_spsr_ss(regs); > + else > + clear_user_regs_spsr_ss(regs); > +} > + > /* Kernel API */ > void kernel_enable_single_step(struct pt_regs *regs) > { > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > index cd6e5fa48b9c..d479fbcbd0d2 100644 > --- a/arch/arm64/kernel/ptrace.c > +++ b/arch/arm64/kernel/ptrace.c > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs) > */ > int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task) > { > - if (!test_tsk_thread_flag(task, TIF_SINGLESTEP)) > - regs->pstate &= ~DBG_SPSR_SS; > + /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */ > + user_regs_reset_single_step(regs, task); > > if (is_compat_thread(task_thread_info(task))) > return valid_compat_regs(regs); > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c > index 339882db5a91..bc54bdbfd760 100644 > --- a/arch/arm64/kernel/signal.c > +++ b/arch/arm64/kernel/signal.c > @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs, > forget_syscall(regs); > > err |= !valid_user_regs(®s->user_regs, current); > - if (err == 0) > + > + if (err == 0) { > + /* Make it look like we stepped the sigreturn system call */ > + user_fastforward_single_step(current); > err = parse_user_sigframe(&user, sf); > + } > > if (err == 0 && system_supports_fpsimd()) { > if (!user.fpsimd) > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-13 17:07 ` Luis Machado @ 2020-02-14 15:45 ` Luis Machado 2020-02-18 8:44 ` Will Deacon 0 siblings, 1 reply; 17+ messages in thread From: Luis Machado @ 2020-02-14 15:45 UTC (permalink / raw) To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel Will, On 2/13/20 2:07 PM, Luis Machado wrote: > Hi Will, > > On 2/13/20 9:01 AM, Will Deacon wrote: >> Hi Luis, >> >> Sorry for the very slow reply. I talked to Mark about this a bit but it >> seems that we never followed up here. > > No worries. > >> >> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote: >>> Do you have any input regarding this particular situation? >>> >>> It would be nice to get this fixed before the release of another GDB >>> version, if the fix is to live in GDB itself. >> >> Basically, I'm very nervous about fixing this in the kernel because >> whatever we do will be visible to userspace. On the other hand, this >> part of the ptrace interface is only seriously used by GDB and we should >> make sure that it works well. >> >> Does the diff below solve the problem? If so, can you confirm that it >> doesn't appear to regress anything else for GDB? > > Thanks for the patch. I'll exercise this in various ways to see if > anything breaks. > I gave this a try with the particular test in GDB's testsuite that exposed the problem. It is working as expected now, so we're single-stepping past the instruction correctly instead of getting a spurious SIGTRAP. I managed to run a few other tests related to syscalls and signals and they also executed as expected. But this was inside QEMU. Do you see any potential scenarios where this change may break things? Other things i should try to exercise? Given we need to be careful with a kernel patch at this stage, i just want to make sure I covered all/most the possible cases. Otherwise, I'm happy with this change. Thanks for putting it together! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-14 15:45 ` Luis Machado @ 2020-02-18 8:44 ` Will Deacon 2020-02-18 10:33 ` Luis Machado 0 siblings, 1 reply; 17+ messages in thread From: Will Deacon @ 2020-02-18 8:44 UTC (permalink / raw) To: Luis Machado; +Cc: mark.rutland, linux-arm-kernel On Fri, Feb 14, 2020 at 12:45:31PM -0300, Luis Machado wrote: > On 2/13/20 2:07 PM, Luis Machado wrote: > > On 2/13/20 9:01 AM, Will Deacon wrote: > > > Sorry for the very slow reply. I talked to Mark about this a bit but it > > > seems that we never followed up here. > > > > No worries. > > > > > > > > On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote: > > > > Do you have any input regarding this particular situation? > > > > > > > > It would be nice to get this fixed before the release of another GDB > > > > version, if the fix is to live in GDB itself. > > > > > > Basically, I'm very nervous about fixing this in the kernel because > > > whatever we do will be visible to userspace. On the other hand, this > > > part of the ptrace interface is only seriously used by GDB and we should > > > make sure that it works well. > > > > > > Does the diff below solve the problem? If so, can you confirm that it > > > doesn't appear to regress anything else for GDB? > > > > Thanks for the patch. I'll exercise this in various ways to see if > > anything breaks. > > > > I gave this a try with the particular test in GDB's testsuite that exposed > the problem. It is working as expected now, so we're single-stepping past > the instruction correctly instead of getting a spurious SIGTRAP. > > I managed to run a few other tests related to syscalls and signals and they > also executed as expected. But this was inside QEMU. > > Do you see any potential scenarios where this change may break things? Other > things i should try to exercise? Could you run the entire testsuite please and check there aren't any regressions? Hardware would be best, but QEMU is still useful. > Given we need to be careful with a kernel patch at this stage, i just want > to make sure I covered all/most the possible cases. > > Otherwise, I'm happy with this change. Thanks for putting it together! I'll add your Tested-by, but I'd still like review from Mark. Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-18 8:44 ` Will Deacon @ 2020-02-18 10:33 ` Luis Machado 2020-02-26 13:01 ` Luis Machado 0 siblings, 1 reply; 17+ messages in thread From: Luis Machado @ 2020-02-18 10:33 UTC (permalink / raw) To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel On 2/18/20 5:44 AM, Will Deacon wrote: > On Fri, Feb 14, 2020 at 12:45:31PM -0300, Luis Machado wrote: >> On 2/13/20 2:07 PM, Luis Machado wrote: >>> On 2/13/20 9:01 AM, Will Deacon wrote: >>>> Sorry for the very slow reply. I talked to Mark about this a bit but it >>>> seems that we never followed up here. >>> >>> No worries. >>> >>>> >>>> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote: >>>>> Do you have any input regarding this particular situation? >>>>> >>>>> It would be nice to get this fixed before the release of another GDB >>>>> version, if the fix is to live in GDB itself. >>>> >>>> Basically, I'm very nervous about fixing this in the kernel because >>>> whatever we do will be visible to userspace. On the other hand, this >>>> part of the ptrace interface is only seriously used by GDB and we should >>>> make sure that it works well. >>>> >>>> Does the diff below solve the problem? If so, can you confirm that it >>>> doesn't appear to regress anything else for GDB? >>> >>> Thanks for the patch. I'll exercise this in various ways to see if >>> anything breaks. >>> >> >> I gave this a try with the particular test in GDB's testsuite that exposed >> the problem. It is working as expected now, so we're single-stepping past >> the instruction correctly instead of getting a spurious SIGTRAP. >> >> I managed to run a few other tests related to syscalls and signals and they >> also executed as expected. But this was inside QEMU. >> >> Do you see any potential scenarios where this change may break things? Other >> things i should try to exercise? > > Could you run the entire testsuite please and check there aren't any > regressions? Hardware would be best, but QEMU is still useful. > I'll try to get a hold of hardware to do this. QEMU will be too slow and we'll likely see some failures due to running things in QEMU as well. I'll let you know. >> Given we need to be careful with a kernel patch at this stage, i just want >> to make sure I covered all/most the possible cases. >> >> Otherwise, I'm happy with this change. Thanks for putting it together! > > I'll add your Tested-by, but I'd still like review from Mark. > > Will > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-18 10:33 ` Luis Machado @ 2020-02-26 13:01 ` Luis Machado 0 siblings, 0 replies; 17+ messages in thread From: Luis Machado @ 2020-02-26 13:01 UTC (permalink / raw) To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel Hi, On 2/18/20 7:33 AM, Luis Machado wrote: > On 2/18/20 5:44 AM, Will Deacon wrote: >> On Fri, Feb 14, 2020 at 12:45:31PM -0300, Luis Machado wrote: >>> On 2/13/20 2:07 PM, Luis Machado wrote: >>>> On 2/13/20 9:01 AM, Will Deacon wrote: >>>>> Sorry for the very slow reply. I talked to Mark about this a bit >>>>> but it >>>>> seems that we never followed up here. >>>> >>>> No worries. >>>> >>>>> >>>>> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote: >>>>>> Do you have any input regarding this particular situation? >>>>>> >>>>>> It would be nice to get this fixed before the release of another GDB >>>>>> version, if the fix is to live in GDB itself. >>>>> >>>>> Basically, I'm very nervous about fixing this in the kernel because >>>>> whatever we do will be visible to userspace. On the other hand, this >>>>> part of the ptrace interface is only seriously used by GDB and we >>>>> should >>>>> make sure that it works well. >>>>> >>>>> Does the diff below solve the problem? If so, can you confirm that it >>>>> doesn't appear to regress anything else for GDB? >>>> >>>> Thanks for the patch. I'll exercise this in various ways to see if >>>> anything breaks. >>>> >>> >>> I gave this a try with the particular test in GDB's testsuite that >>> exposed >>> the problem. It is working as expected now, so we're single-stepping >>> past >>> the instruction correctly instead of getting a spurious SIGTRAP. >>> >>> I managed to run a few other tests related to syscalls and signals >>> and they >>> also executed as expected. But this was inside QEMU. >>> >>> Do you see any potential scenarios where this change may break >>> things? Other >>> things i should try to exercise? >> >> Could you run the entire testsuite please and check there aren't any >> regressions? Hardware would be best, but QEMU is still useful. >> > > I'll try to get a hold of hardware to do this. QEMU will be too slow and > we'll likely see some failures due to running things in QEMU as well. > > I'll let you know. So i managed to do a complete GDB testsuite run inside a system mode QEMU, with both the patched and unpatched kernel. I did not see any regressions. I also noticed the particular testcase where we were having the single-stepping hiccup is running as it should now. So, from GDB's perspective, this patch looks good. Let me know if there are any corner cases i should exercise (maybe by hand). Thanks, Luis _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-13 12:01 ` Will Deacon 2020-02-13 17:07 ` Luis Machado @ 2020-02-20 13:02 ` Mark Rutland 2020-02-20 13:29 ` Will Deacon 1 sibling, 1 reply; 17+ messages in thread From: Mark Rutland @ 2020-02-20 13:02 UTC (permalink / raw) To: Will Deacon; +Cc: Luis Machado, linux-arm-kernel Hi Will, Luis, On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote: > Sorry for the very slow reply. I talked to Mark about this a bit but it > seems that we never followed up here. > > On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote: > > Do you have any input regarding this particular situation? > > > > It would be nice to get this fixed before the release of another GDB > > version, if the fix is to live in GDB itself. > > Basically, I'm very nervous about fixing this in the kernel because > whatever we do will be visible to userspace. On the other hand, this > part of the ptrace interface is only seriously used by GDB and we should > make sure that it works well. > > Does the diff below solve the problem? If so, can you confirm that it > doesn't appear to regress anything else for GDB? > > Cheers, > > Will > > --->8 > > diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h > index 7619f473155f..d825e3585e28 100644 > --- a/arch/arm64/include/asm/debug-monitors.h > +++ b/arch/arm64/include/asm/debug-monitors.h > @@ -109,6 +109,8 @@ void disable_debug_monitors(enum dbg_active_el el); > > void user_rewind_single_step(struct task_struct *task); > void user_fastforward_single_step(struct task_struct *task); > +void user_regs_reset_single_step(struct user_pt_regs *regs, > + struct task_struct *task); > > void kernel_enable_single_step(struct pt_regs *regs); > void kernel_disable_single_step(void); > diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c > index 48222a4760c2..7569deb1eac1 100644 > --- a/arch/arm64/kernel/debug-monitors.c > +++ b/arch/arm64/kernel/debug-monitors.c > @@ -141,17 +141,20 @@ postcore_initcall(debug_monitors_init); > /* > * Single step API and exception handling. > */ > -static void set_regs_spsr_ss(struct pt_regs *regs) > +static void set_user_regs_spsr_ss(struct user_pt_regs *regs) > { > regs->pstate |= DBG_SPSR_SS; > } > -NOKPROBE_SYMBOL(set_regs_spsr_ss); > +NOKPROBE_SYMBOL(set_user_regs_spsr_ss); > > -static void clear_regs_spsr_ss(struct pt_regs *regs) > +static void clear_user_regs_spsr_ss(struct user_pt_regs *regs) > { > regs->pstate &= ~DBG_SPSR_SS; > } > -NOKPROBE_SYMBOL(clear_regs_spsr_ss); > +NOKPROBE_SYMBOL(clear_user_regs_spsr_ss); > + > +#define set_regs_spsr_ss(r) set_user_regs_spsr_ss(&(r)->user_regs) > +#define clear_regs_spsr_ss(r) clear_user_regs_spsr_ss(&(r)->user_regs) > > static DEFINE_SPINLOCK(debug_hook_lock); > static LIST_HEAD(user_step_hook); > @@ -404,6 +407,15 @@ void user_fastforward_single_step(struct task_struct *task) > clear_regs_spsr_ss(task_pt_regs(task)); > } > > +void user_regs_reset_single_step(struct user_pt_regs *regs, > + struct task_struct *task) > +{ > + if (test_tsk_thread_flag(task, TIF_SINGLESTEP)) > + set_user_regs_spsr_ss(regs); > + else > + clear_user_regs_spsr_ss(regs); > +} > + > /* Kernel API */ > void kernel_enable_single_step(struct pt_regs *regs) > { > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > index cd6e5fa48b9c..d479fbcbd0d2 100644 > --- a/arch/arm64/kernel/ptrace.c > +++ b/arch/arm64/kernel/ptrace.c > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs) > */ > int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task) > { > - if (!test_tsk_thread_flag(task, TIF_SINGLESTEP)) > - regs->pstate &= ~DBG_SPSR_SS; > + /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */ > + user_regs_reset_single_step(regs, task); I think this change means we do the right thing for signal entry/return and ptrace messing with the regs. Instruction emulation seems to do the right thing via skip_faulting_instruction(). I think there are a few more single-step edge cases lying around (e.g. uprobes, rseq), but it looks like those have to be fixed separately. I fear fixing uprobes might require a largler structural change to single step, but ignoring uprobes the changes above seem to be sound. If userspace doesn't consume the SS value today, I wonder if we should hide it when dumping the SPSR to userspace, so that userspace has a consistent view regardless of whether it's being stepped. I'll try to dig into the uprobes stuff this afternoon, just in case that needs us to do something substantially different. The existing logic in valid_user_regs() doesn't make sense to me, given SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that was overzealous or I've forgotten an edge case that we cared about in the past. > > if (is_compat_thread(task_thread_info(task))) > return valid_compat_regs(regs); > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c > index 339882db5a91..bc54bdbfd760 100644 > --- a/arch/arm64/kernel/signal.c > +++ b/arch/arm64/kernel/signal.c > @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs, > forget_syscall(regs); > > err |= !valid_user_regs(®s->user_regs, current); > - if (err == 0) > + > + if (err == 0) { > + /* Make it look like we stepped the sigreturn system call */ > + user_fastforward_single_step(current); > err = parse_user_sigframe(&user, sf); > + } I don't understand this. AFAICT we don't likewise for other SVCs, so either I'm missing that, or there's something else I'm missing. Why do we need to step sigreturn but not SVC generally? Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-20 13:02 ` Mark Rutland @ 2020-02-20 13:29 ` Will Deacon 2020-02-21 11:16 ` Mark Rutland 0 siblings, 1 reply; 17+ messages in thread From: Will Deacon @ 2020-02-20 13:29 UTC (permalink / raw) To: Mark Rutland; +Cc: Luis Machado, linux-arm-kernel Hi Mark, Thanks for having a look. On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote: > On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote: > > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > > index cd6e5fa48b9c..d479fbcbd0d2 100644 > > --- a/arch/arm64/kernel/ptrace.c > > +++ b/arch/arm64/kernel/ptrace.c > > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs) > > */ > > int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task) > > { > > - if (!test_tsk_thread_flag(task, TIF_SINGLESTEP)) > > - regs->pstate &= ~DBG_SPSR_SS; > > + /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */ > > + user_regs_reset_single_step(regs, task); > > I think this change means we do the right thing for signal entry/return > and ptrace messing with the regs. Instruction emulation seems to do the > right thing via skip_faulting_instruction(). > > I think there are a few more single-step edge cases lying around (e.g. > uprobes, rseq), but it looks like those have to be fixed separately. I > fear fixing uprobes might require a largler structural change to single > step, but ignoring uprobes the changes above seem to be sound. Rseq should just abort when delivering the step signal and I'm not sure I see the issue with uprobes. Can you elaborate on your concerns a bit, please? > If userspace doesn't consume the SS value today, I wonder if we should > hide it when dumping the SPSR to userspace, so that userspace has a > consistent view regardless of whether it's being stepped. You can't really hide it though, because '0' has a meaning so I don't think it gains us a lot other than increasing the scope of the change. > I'll try to dig into the uprobes stuff this afternoon, just in case that > needs us to do something substantially different. Thanks. > The existing logic in valid_user_regs() doesn't make sense to me, given > SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that > was overzealous or I've forgotten an edge case that we cared about in > the past. I think it was just part of sanitising the registers to a consistent value, but I agree that it wouldn't have a functional impact. > > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c > > index 339882db5a91..bc54bdbfd760 100644 > > --- a/arch/arm64/kernel/signal.c > > +++ b/arch/arm64/kernel/signal.c > > @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs, > > forget_syscall(regs); > > > > err |= !valid_user_regs(®s->user_regs, current); > > - if (err == 0) > > + > > + if (err == 0) { > > + /* Make it look like we stepped the sigreturn system call */ > > + user_fastforward_single_step(current); > > err = parse_user_sigframe(&user, sf); > > + } > > I don't understand this. AFAICT we don't likewise for other SVCs, so > either I'm missing that, or there's something else I'm missing. > > Why do we need to step sigreturn but not SVC generally? Because we restore the SPSR from the sigframe during sigreturn, so we will end up with PSTATE.SS set when it should be cleared. Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-20 13:29 ` Will Deacon @ 2020-02-21 11:16 ` Mark Rutland 2020-05-27 14:39 ` Luis Machado 2020-05-31 9:52 ` Will Deacon 0 siblings, 2 replies; 17+ messages in thread From: Mark Rutland @ 2020-02-21 11:16 UTC (permalink / raw) To: Will Deacon; +Cc: Luis Machado, linux-arm-kernel On Thu, Feb 20, 2020 at 01:29:42PM +0000, Will Deacon wrote: > Hi Mark, > > Thanks for having a look. > > On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote: > > On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote: > > > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > > > index cd6e5fa48b9c..d479fbcbd0d2 100644 > > > --- a/arch/arm64/kernel/ptrace.c > > > +++ b/arch/arm64/kernel/ptrace.c > > > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs) > > > */ > > > int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task) > > > { > > > - if (!test_tsk_thread_flag(task, TIF_SINGLESTEP)) > > > - regs->pstate &= ~DBG_SPSR_SS; > > > + /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */ > > > + user_regs_reset_single_step(regs, task); > > > > I think this change means we do the right thing for signal entry/return > > and ptrace messing with the regs. Instruction emulation seems to do the > > right thing via skip_faulting_instruction(). > > > > I think there are a few more single-step edge cases lying around (e.g. > > uprobes, rseq), but it looks like those have to be fixed separately. I > > fear fixing uprobes might require a largler structural change to single > > step, but ignoring uprobes the changes above seem to be sound. > > Rseq should just abort when delivering the step signal and I'm not sure I > see the issue with uprobes. Can you elaborate on your concerns a bit, > please? For rseq I wasn't sure what state PSTATE.SS should be when we head to the abort handler -- I think the sensible thing would be that it immediately triggers a single-step exception, but I don't see where we'd clear PSTATE.SS to ensure that. For uprobes I fear that the uprobes xol single-stepping might end up conflicting with the regular ptrace single-stepping, and that the uprobes emulation might not always advance the state machine correctly. > > If userspace doesn't consume the SS value today, I wonder if we should > > hide it when dumping the SPSR to userspace, so that userspace has a > > consistent view regardless of whether it's being stepped. > > You can't really hide it though, because '0' has a meaning so I don't think > it gains us a lot other than increasing the scope of the change. I think that it reduces the likelihood that single-stepping a program changes its behaviour unexpectedly. This patch makes the kernel disregard the PSTATE.SS value provided by userspace, so what is gained by exposing PSTATE.SS to userspace at all? I do agree that there are potentially subtle landmines here; I just can't see a legitimate reason for userspace to need the value. > > I'll try to dig into the uprobes stuff this afternoon, just in case > > that > > needs us to do something substantially different. > > Thanks. I didn't get the chance to do this yesterday, but I did think of another potential problem. I *think* that when attempting to single-step a syscall, if prior to return from the syscall the tracer messed with the tracee's regs (e.g. to mess with arguments or the retun value) then valid_user_regs() will set the SS bit, and upon return from the syscall the next instruction would be executed rather than first raising a single-step exception. This patch relies on valid_user_regs() being a signal that PSTATE.SS is stale, but that's not always the case. To handle that generally I suspect we need two bits of state rather than just TIF_SINGLESTEP. > > The existing logic in valid_user_regs() doesn't make sense to me, given > > SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that > > was overzealous or I've forgotten an edge case that we cared about in > > the past. > > I think it was just part of sanitising the registers to a consistent value, > but I agree that it wouldn't have a functional impact. Thanks for confirming my understanding. I guess this may have minimized the cases where userspace saw PSTATE.SS set. > > > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c > > > index 339882db5a91..bc54bdbfd760 100644 > > > --- a/arch/arm64/kernel/signal.c > > > +++ b/arch/arm64/kernel/signal.c > > > @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs, > > > forget_syscall(regs); > > > > > > err |= !valid_user_regs(®s->user_regs, current); > > > - if (err == 0) > > > + > > > + if (err == 0) { > > > + /* Make it look like we stepped the sigreturn system call */ > > > + user_fastforward_single_step(current); > > > err = parse_user_sigframe(&user, sf); > > > + } > > > > I don't understand this. AFAICT we don't likewise for other SVCs, so > > either I'm missing that, or there's something else I'm missing. > > > > Why do we need to step sigreturn but not SVC generally? > > Because we restore the SPSR from the sigframe during sigreturn, so we will > end up with PSTATE.SS set when it should be cleared. Ah, I see. As above, I think we can hit a similar case when single-stepping an SVC for a regular syscall. Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-21 11:16 ` Mark Rutland @ 2020-05-27 14:39 ` Luis Machado 2020-05-31 9:52 ` Will Deacon 1 sibling, 0 replies; 17+ messages in thread From: Luis Machado @ 2020-05-27 14:39 UTC (permalink / raw) To: Mark Rutland, Will Deacon; +Cc: linux-arm-kernel Hi, On 2/21/20 8:16 AM, Mark Rutland wrote: > On Thu, Feb 20, 2020 at 01:29:42PM +0000, Will Deacon wrote: >> Hi Mark, >> >> Thanks for having a look. >> >> On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote: >>> On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote: >>>> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c >>>> index cd6e5fa48b9c..d479fbcbd0d2 100644 >>>> --- a/arch/arm64/kernel/ptrace.c >>>> +++ b/arch/arm64/kernel/ptrace.c >>>> @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs) >>>> */ >>>> int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task) >>>> { >>>> - if (!test_tsk_thread_flag(task, TIF_SINGLESTEP)) >>>> - regs->pstate &= ~DBG_SPSR_SS; >>>> + /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */ >>>> + user_regs_reset_single_step(regs, task); >>> >>> I think this change means we do the right thing for signal entry/return >>> and ptrace messing with the regs. Instruction emulation seems to do the >>> right thing via skip_faulting_instruction(). >>> >>> I think there are a few more single-step edge cases lying around (e.g. >>> uprobes, rseq), but it looks like those have to be fixed separately. I >>> fear fixing uprobes might require a largler structural change to single >>> step, but ignoring uprobes the changes above seem to be sound. >> >> Rseq should just abort when delivering the step signal and I'm not sure I >> see the issue with uprobes. Can you elaborate on your concerns a bit, >> please? > > For rseq I wasn't sure what state PSTATE.SS should be when we head to > the abort handler -- I think the sensible thing would be that it > immediately triggers a single-step exception, but I don't see where we'd > clear PSTATE.SS to ensure that. > > For uprobes I fear that the uprobes xol single-stepping might end up > conflicting with the regular ptrace single-stepping, and that the > uprobes emulation might not always advance the state machine correctly. > >>> If userspace doesn't consume the SS value today, I wonder if we should >>> hide it when dumping the SPSR to userspace, so that userspace has a >>> consistent view regardless of whether it's being stepped. >> >> You can't really hide it though, because '0' has a meaning so I don't think >> it gains us a lot other than increasing the scope of the change. > > I think that it reduces the likelihood that single-stepping a program > changes its behaviour unexpectedly. This patch makes the kernel > disregard the PSTATE.SS value provided by userspace, so what is gained > by exposing PSTATE.SS to userspace at all? > > I do agree that there are potentially subtle landmines here; I just > can't see a legitimate reason for userspace to need the value. > >>> I'll try to dig into the uprobes stuff this afternoon, just in case >>> that >>> needs us to do something substantially different. >> >> Thanks. > > I didn't get the chance to do this yesterday, but I did think of another > potential problem. > > I *think* that when attempting to single-step a syscall, if prior to > return from the syscall the tracer messed with the tracee's regs (e.g. > to mess with arguments or the retun value) then valid_user_regs() will > set the SS bit, and upon return from the syscall the next instruction > would be executed rather than first raising a single-step exception. > > This patch relies on valid_user_regs() being a signal that PSTATE.SS is > stale, but that's not always the case. To handle that generally I > suspect we need two bits of state rather than just TIF_SINGLESTEP. > >>> The existing logic in valid_user_regs() doesn't make sense to me, given >>> SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that >>> was overzealous or I've forgotten an edge case that we cared about in >>> the past. >> >> I think it was just part of sanitising the registers to a consistent value, >> but I agree that it wouldn't have a functional impact. > > Thanks for confirming my understanding. I guess this may have minimized > the cases where userspace saw PSTATE.SS set. > >>>> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c >>>> index 339882db5a91..bc54bdbfd760 100644 >>>> --- a/arch/arm64/kernel/signal.c >>>> +++ b/arch/arm64/kernel/signal.c >>>> @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs, >>>> forget_syscall(regs); >>>> >>>> err |= !valid_user_regs(®s->user_regs, current); >>>> - if (err == 0) >>>> + >>>> + if (err == 0) { >>>> + /* Make it look like we stepped the sigreturn system call */ >>>> + user_fastforward_single_step(current); >>>> err = parse_user_sigframe(&user, sf); >>>> + } >>> >>> I don't understand this. AFAICT we don't likewise for other SVCs, so >>> either I'm missing that, or there's something else I'm missing. >>> >>> Why do we need to step sigreturn but not SVC generally? >> >> Because we restore the SPSR from the sigframe during sigreturn, so we will >> end up with PSTATE.SS set when it should be cleared. > > Ah, I see. As above, I think we can hit a similar case when > single-stepping an SVC for a regular syscall. > > Thanks, > Mark. > Did we have any further developments on this front? Has a patch made its way upstream for review? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2020-02-21 11:16 ` Mark Rutland 2020-05-27 14:39 ` Luis Machado @ 2020-05-31 9:52 ` Will Deacon 1 sibling, 0 replies; 17+ messages in thread From: Will Deacon @ 2020-05-31 9:52 UTC (permalink / raw) To: Mark Rutland; +Cc: Luis Machado, linux-arm-kernel Hi folks, Sorry, I wrote a reply to this on a plane (so you can tell how long ago that was!) and then forgot about it. On Fri, Feb 21, 2020 at 11:16:53AM +0000, Mark Rutland wrote: > On Thu, Feb 20, 2020 at 01:29:42PM +0000, Will Deacon wrote: > > On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote: > > > On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote: > > > > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > > > > index cd6e5fa48b9c..d479fbcbd0d2 100644 > > > > --- a/arch/arm64/kernel/ptrace.c > > > > +++ b/arch/arm64/kernel/ptrace.c > > > > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs) > > > > */ > > > > int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task) > > > > { > > > > - if (!test_tsk_thread_flag(task, TIF_SINGLESTEP)) > > > > - regs->pstate &= ~DBG_SPSR_SS; > > > > + /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */ > > > > + user_regs_reset_single_step(regs, task); > > > > > > I think this change means we do the right thing for signal entry/return > > > and ptrace messing with the regs. Instruction emulation seems to do the > > > right thing via skip_faulting_instruction(). > > > > > > I think there are a few more single-step edge cases lying around (e.g. > > > uprobes, rseq), but it looks like those have to be fixed separately. I > > > fear fixing uprobes might require a largler structural change to single > > > step, but ignoring uprobes the changes above seem to be sound. > > > > Rseq should just abort when delivering the step signal and I'm not sure I > > see the issue with uprobes. Can you elaborate on your concerns a bit, > > please? > > For rseq I wasn't sure what state PSTATE.SS should be when we head to > the abort handler -- I think the sensible thing would be that it > immediately triggers a single-step exception, but I don't see where we'd > clear PSTATE.SS to ensure that. > > For uprobes I fear that the uprobes xol single-stepping might end up > conflicting with the regular ptrace single-stepping, and that the > uprobes emulation might not always advance the state machine correctly. > > > > If userspace doesn't consume the SS value today, I wonder if we should > > > hide it when dumping the SPSR to userspace, so that userspace has a > > > consistent view regardless of whether it's being stepped. > > > > You can't really hide it though, because '0' has a meaning so I don't think > > it gains us a lot other than increasing the scope of the change. > > I think that it reduces the likelihood that single-stepping a program > changes its behaviour unexpectedly. This patch makes the kernel > disregard the PSTATE.SS value provided by userspace, so what is gained > by exposing PSTATE.SS to userspace at all? > > I do agree that there are potentially subtle landmines here; I just > can't see a legitimate reason for userspace to need the value. > > > > I'll try to dig into the uprobes stuff this afternoon, just in case > > > that > > > needs us to do something substantially different. > > > > Thanks. > > I didn't get the chance to do this yesterday, but I did think of another > potential problem. > > I *think* that when attempting to single-step a syscall, if prior to > return from the syscall the tracer messed with the tracee's regs (e.g. > to mess with arguments or the retun value) then valid_user_regs() will > set the SS bit, and upon return from the syscall the next instruction > would be executed rather than first raising a single-step exception. I don't actually think that's a problem: if the tracer has taken control by e.g. PTRACE_SYSCALL and modified the registers on the syscall return path, then it has to resume execution of the tracee somehow. There's nothing like a "PTRACE_RESUME_SINGLESTEP" request, so it would need to issue something like PTRACE_CONT (which disables stepping altogether) or PTRACE_SINGLESTEP, which would step over the first instruction after the SVC. That's the same as the behaviour today. > This patch relies on valid_user_regs() being a signal that PSTATE.SS is > stale, but that's not always the case. To handle that generally I > suspect we need two bits of state rather than just TIF_SINGLESTEP. Added another bit of state feels like we'll open up another can of worms. Given that I don't think we need it for ptrace, I'll write this up as a proper patch. Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction 2019-11-26 16:35 ` Luis Machado 2019-12-10 20:00 ` Luis Machado @ 2020-01-13 18:13 ` Luis Machado 1 sibling, 0 replies; 17+ messages in thread From: Luis Machado @ 2020-01-13 18:13 UTC (permalink / raw) Cc: mark.rutland, linux-arm-kernel Ping? On 11/26/19 1:35 PM, Luis Machado wrote: > ping? > > On 11/18/19 11:54 AM, Luis Machado wrote: >> Hi Will, >> >> Thanks for the thorough explanation. >> >> On 11/18/19 10:15 AM, Will Deacon wrote: >>> Hi Luis, >>> >>> [+Mark for the valid_user_regs() part] >>> >>> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote: >>>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP >>>> request by GDB won't execute the underlying instruction. As a >>>> consequence, >>>> the PC doesn't move, but we return a SIGTRAP just like we would for a >>>> regular successful PTRACE_SINGLESTEP request. >>>> >>>> Since there are no software breakpoints inserted at PC (we are actually >>>> stepping over a breakpoint, so GDB removes the breakpoint at PC before >>>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior. >>>> >>>> Though not too harmful, i see this manifesting in the GDB testsuite >>>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB >>>> think it is further in the instruction stream than it really is. In >>>> fact, we >>>> get lucky here and no FAIL's show up, only many more spurious PASSes. >>> >>> I managed to reproduce this locally and I think I've figured out what's >>> going on, although I'm not sure that the kernel is the best place to fix >>> it. >>> >>> Looking at the specific reproducer: >>> >>>> Execute gdb like so: >>>> >>>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex >>>> "record" -ex >>>> "si" -ex "rsi" -ex "record stop" insn-reverse >>> >>> So we've got a couple of instructions as follows (it doesn't actually >>> matter >>> what they are, so I've changed the LD1 in your binary for a NOP in >>> order to >>> avoid confusion with the "load" label not actually pointing at a load): >>> >>> 0x7b8: mov // "load" >>> 0x7bc: nop >>> >>> "b load" places a breakpoint at 0x7b8: >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 >>> >>> We run to a software breakpoint on "load" (the mov instruction). We take >>> the trap and try to execute the "si", which means we need to remove the >>> breakpoint while we step over it: >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 >>> [...] >>> ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0) = 0 >>> >>> This causes the kernel to arm the single-step state machine so that >>> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). >>> Running >>> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and >>> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the >>> trap to >>> trigger, at which point gdb puts the breakpoint instruction back >>> since the >>> step is complete: >> >> So, just to confirm my understanding, we have a couple bits >> controlling single-stepping in the kernel, one in MDSCR_EL1 and >> another in SPSR_EL1. GDB doesn't have direct access to any of those, >> correct? >> >> Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR. >> >> The transition from active-not-pending to active-pending takes place >> via a single PTRACE_SINGLESTEP request? Is that correct? >> >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0 >>> >>> This is where things start to go wrong. The "rsi" command attempts to >>> perform a reverse step, which means restoring the old state when we were >>> previously executing at 0x7b8. It starts by removing the breakpoint >>> again, >>> since we've already hit that: >>> >>> ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0 >>> >>> and then resets the CPU registers to their old values: >>> >>> (I don't know why it does this three times) >>> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >>> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >>> ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, >>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0 >>> >>> The problem with this is that we have moved the PC back to 0x7b8 but >>> we have >>> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen >>> stepping >>> get disabled (this usually happens by PTRACE_CONT calling >>> user_disable_single_step()) which means that MDSCR_EL1.SS remains set >>> to 1 >>> and we're in the active-pending state! Consequently, we immediately >>> take a >>> step exception if a step operation is attempted > >> >> While trying to reproduce this, i was paying attention to the SS bit >> coming and going. But in the particular sequence of si/rsi, within the >> record boundaries, i see GDB just restored the original CPSR value to >> what it was before we processed the si command. >> >> From GDB's POV all state was restore to the way it was before and >> we're good to go. >> >> Is this not enough to restore state kernel-wise? >> >>> Now, we *could* consider hacking the TIF_SINGLESTEP check in >>> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is >>> active >>> but this is a user-visible change and may break things like stepping >>> out of >>> signal handlers. I would prefer that GDB manages the SS bit >>> explicitly in >>> this scenario, by setting it to 1 when restoring the old state in the >>> reverse step, a bit like when it disables the old breakpoint. You can >>> emulate this by doing: >> >> I think we could let GDB control this when required, but I'm trying to >> understand the ramifications of letting GDB do so. >> >> For example, what if the user decides to alter the PC here and there, >> for debugging purposes. That is a use case that happens often, in >> order to go back or skip some parts of the code. >> >> Would we need to pay attention to the SS bit in those cases as well? >> >>> >>> (gdb) set $cpsr |= (1<<21) >> >> In particular, what does the switching of this bit accomplishes in the >> kernel? Would we be better off forcing the SS bit every time we do a >> single-step operation, for example? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-05-31 9:52 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-11-12 23:22 [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction Luis Machado 2019-11-18 13:15 ` Will Deacon 2019-11-18 14:54 ` Luis Machado 2019-11-26 16:35 ` Luis Machado 2019-12-10 20:00 ` Luis Machado 2020-02-13 12:01 ` Will Deacon 2020-02-13 17:07 ` Luis Machado 2020-02-14 15:45 ` Luis Machado 2020-02-18 8:44 ` Will Deacon 2020-02-18 10:33 ` Luis Machado 2020-02-26 13:01 ` Luis Machado 2020-02-20 13:02 ` Mark Rutland 2020-02-20 13:29 ` Will Deacon 2020-02-21 11:16 ` Mark Rutland 2020-05-27 14:39 ` Luis Machado 2020-05-31 9:52 ` Will Deacon 2020-01-13 18:13 ` Luis Machado
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox