* ptrace vs FSGSBASE
@ 2016-04-29 18:22 Andy Lutomirski
  2016-05-02 12:40 ` Oleg Nesterov
  2016-05-02 14:27 ` Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Andy Lutomirski @ 2016-04-29 18:22 UTC (permalink / raw)
  To: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen, Borislav Petkov, Oleg Nesterov, Brian Gerst

Suppose I'm a ptracer. Wtf is supposed to happen when I write to
fs_base or gs_base?

Here are some scenarios:

1. I read fs_base using ptrace. I think I should get the actual
fs_base without any nonsense.

2. I read all the regs (PEEKUSER or whatever) and then write them all
back verbatim. At the very least, I think that if I do this
atomically using PTRACE_SETREGSET, the task's state needs to remain
unchanged. Since ptrace doesn't seem to have any real concept of
atomic register state changes right now (although we could add such a
thing for GETREGSET), it would be convenient if writing the unchanged
state back in numerical order using POKEUSER also left it unchanged.

3. I write fs_base on a non-FSGSBASE system. I think it should have
the obvious effect of setting FSBASE to the value I wrote.

4. I write fs_base on an FSGSBASE system. I think it should have the
obvious effect of setting FSBASE to the value I wrote. It would be
nice if it behaved identically to #3 as well.

#3 means that writing fs_base using ptrace, if it's set to a value
that doesn't match the base associated with the current selector,
needs to zero out fs. #4 means that it should do the same on an
FSGSBASE system.

Due to the strange design of user_regs, fs_base and gs_base are
numerically below fs and gs, so #2 means that writing fs on an
FSGSBASE system shouldn't override a just-written fsbase value. But
writing fs on a non-FSGSBASE system needs to override the
just-written base.

But wouldn't users expect that writing a value into fs would actually
change the base even on FSGSBASE systems?

The best thing I've come up with so far is to have POKEUSER to fs and
gs have different behavior on FSGSBASE systems, which IMO sucks.

On the other hand, it wouldn't surprise me all that much if no one
ever does this in the first place.

See this, too:

https://lkml.org/lkml/2007/11/21/82

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply [flat|nested] 6+ messages in thread
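Editorial sketch, not part of the original message: scenario 2 above depends on fs_base and gs_base sitting at lower offsets than fs and gs in struct user_regs_struct. The C below is a minimal illustration under stated assumptions: an x86-64 Linux system with glibc's <sys/user.h>, and a tracee that is already attached and stopped. The function names (bases_precede_selectors, roundtrip_regs, roundtrip_words) are hypothetical helpers invented here. Whether the round trip really leaves the tracee unchanged is the open question of this thread, so the code only shows how a debugger would perform the writes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Returns 1 if fs_base/gs_base come before fs/gs in user_regs_struct,
 * i.e. a word-by-word POKEUSER pass writes the bases first. */
int bases_precede_selectors(void)
{
    return offsetof(struct user_regs_struct, fs_base) <
               offsetof(struct user_regs_struct, fs) &&
           offsetof(struct user_regs_struct, gs_base) <
               offsetof(struct user_regs_struct, gs);
}

/* Scenario 2, whole-struct form: read every register of a stopped
 * tracee and write the block back unchanged.
 * Returns 0 on success, -1 on failure (errno set by ptrace). */
int roundtrip_regs(pid_t tracee)
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, tracee, NULL, &regs) == -1)
        return -1;
    if (ptrace(PTRACE_SETREGS, tracee, NULL, &regs) == -1)
        return -1;
    return 0;
}

/* Scenario 2, word-at-a-time form: copy each word back in ascending
 * offset order, so fs_base is written before fs.
 * Returns 0 on success, -1 on failure. */
int roundtrip_words(pid_t tracee)
{
    size_t off;

    for (off = 0; off < sizeof(struct user_regs_struct); off += sizeof(long)) {
        long word;

        errno = 0;  /* PEEKUSER returns data, so -1 alone is not an error */
        word = ptrace(PTRACE_PEEKUSER, tracee, (void *)off, NULL);
        if (word == -1 && errno != 0)
            return -1;
        if (ptrace(PTRACE_POKEUSER, tracee, (void *)off, (void *)word) == -1)
            return -1;
    }
    return 0;
}
```

The word-at-a-time form is the one that exposes the ordering problem: the kernel sees the fs_base write and the fs write as two separate requests and cannot tell that they belong together.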
* Re: ptrace vs FSGSBASE
  2016-04-29 18:22 ptrace vs FSGSBASE Andy Lutomirski
@ 2016-05-02 12:40 ` Oleg Nesterov
  2016-05-02 14:27 ` Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-05-02 12:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen, Borislav Petkov, Brian Gerst

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ptrace vs FSGSBASE
  2016-04-29 18:22 ptrace vs FSGSBASE Andy Lutomirski
  2016-05-02 12:40 ` Oleg Nesterov
@ 2016-05-02 14:27 ` Oleg Nesterov
  2016-05-02 15:38   ` Andy Lutomirski
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-05-02 14:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen, Borislav Petkov, Brian Gerst

Hi Andy,

let me first say that I never knew how this code (and the hardware)
actually works, I am not sure I even understand what ARCH_SET_.S
exactly does ;)

What is even worse, I do not understand your question. So it is not
that I am trying to help, I am asking you to help me understand the
problem.

On 04/29, Andy Lutomirski wrote:
>
> 1. I read fs_base using ptrace. I think I should get the actual
> fs_base without any nonsense.

Which fs_base? The member of user_regs_struct? But this structure/layout
is just the ABI, so to me it seems correct that getreg() tries to look
at ->fs and/or ->fsindex.

IOW. getreg(fs) should return the same value as prctl(ARCH_GET_FS)
returns if called by the tracee, no?

> 2. I read all the regs (PEEKUSER or whatever) and then write then all
> back verbatim. At the very least, I think that if I do this
> atomically using PTRACE_SETREGSET, the task's state needs to remain
> unchanged.

Agreed... do you mean this doesn't work?

> Since ptrace doesn't seem to have any real concept of
> atomic register state changes right now

Could you spell please?

I can't understand what does "atomically" mean in this context.

Oleg.

^ permalink raw reply [flat|nested] 6+ messages in thread
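Editorial sketch, not part of the original message: the comparison Oleg suggests needs the tracee's own view of its FS base. A minimal way to get it, assuming x86-64 Linux with the <asm/prctl.h> uapi header available, is the arch_prctl(ARCH_GET_FS) system call issued through syscall(2). The helper name current_fs_base is hypothetical. Later in the thread Andy questions whether ARCH_GET_FS was reliable on kernels of that era, so treat the returned value as what the kernel reports, not as ground truth.

```c
#define _GNU_SOURCE
#include <asm/prctl.h>    /* ARCH_GET_FS */
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */

/* Returns the calling thread's FS base as reported by
 * arch_prctl(ARCH_GET_FS), or 0 if the system call fails. */
unsigned long current_fs_base(void)
{
    unsigned long base = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return base;
}
```

A debugger could compare this value, reported by the tracee itself, with the word it reads at offsetof(struct user_regs_struct, fs_base) via PTRACE_PEEKUSER.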
* Re: ptrace vs FSGSBASE
  2016-05-02 14:27 ` Oleg Nesterov
@ 2016-05-02 15:38   ` Andy Lutomirski
  2016-05-02 15:35     ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2016-05-02 15:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen, Borislav Petkov, Brian Gerst

On Mon, May 2, 2016 at 7:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Hi Andy,
>
> let me first say that I never knew how this code (and the hardware)
> actually works, I am not sure I even understand what ARCH_SET_.S
> exactly does ;)
>
> What is even worse, I do not understand your question. So it is not
> that I am trying to help, I am asking you to help me understand the
> problem.
>
> On 04/29, Andy Lutomirski wrote:
>>
>> 1. I read fs_base using ptrace. I think I should get the actual
>> fs_base without any nonsense.
>
> Which fs_base? The member of user_regs_struct? But this structure/layout
> is just the ABI, so to me it seems correct that getreg() tries to look
> at ->fs and/or ->fsindex.

Yeah, the member of user_regs_struct.

>
> IOW. getreg(fs) should return the same value as prctl(ARCH_GET_FS)
> returns if called by the tracee, no?

Hah, nice can of worms there. You're assuming that ARCH_GET_FS
actually worked...

>
>> 2. I read all the regs (PEEKUSER or whatever) and then write then all
>> back verbatim. At the very least, I think that if I do this
>> atomically using PTRACE_SETREGSET, the task's state needs to remain
>> unchanged.
>
> Agreed... do you mean this doesn't work?

I'm not 100% sure. It probably does right now. See below.

>
>> Since ptrace doesn't seem to have any real concept of
>> atomic register state changes right now
>
> Could you spell please?
>
> I can't understand what does "atomically" mean in this context.

I mean "change fs and fs_base to these two values in a single syscall
so that the kernel can do something intelligent."

Let me give some background:

On 32-bit systems, there are the FS and GS registers. For any value
of FS, there is an implied base address of the FS segment. A debugger
could, if it cared, try to figure out that implied base, except that
no one ever added the API for that. If a debugger read FS and wrote
the same value back to FS, then the process would probably end up in
the same state it started in (modulo several bugs, all but one of
which are now fixed in -tip AFAIK.) All was well.

On current 64-bit Linux systems, there is a degree of independent
control of FS and FSBASE. A process can call ARCH_SET_FS and pass an
offset >4G, which will result in FS == 0 and FSBASE == whatever the
process passed.

This is already a bit screwy. Suppose a debugger writes zero to FS.
If this were an actual MOV instruction on an Intel chip, FSBASE would
be reset to zero (and then the context switch code would corrupt it).
But writing zero to FS through ptrace should have no effect and
currently has no effect.

If FS != 0, then FSBASE has some implied value. On old kernels,
reading user_regs_struct::fs_base would give either zero or garbage,
depending on which set of bugs you managed to hit. If you write, say,
0x2b to fs and 12345 to fs_base using the ptrace API, you'd end up
with FS == 0x2b and FSBASE == 0, because the fs_base write went to an
ignored field.

On Ivy Bridge and up, there's a new CPU feature that lets user code
override FSBASE on its own, making pretty much any combination of FS
and FSBASE possible. But how should this interact with ptrace? If a
debugger sets fs_base = 12345 and *then* sets fs to 0x2b, does the
debugger expect the write to fs to override FSBASE (which it would if
done using MOV) causing FSBASE to reset to zero? Or should FSBASE
actually end up containing 12345?

The issue comes up because, on these newer systems, 0x2b/12345 is
actually a reasonable combination of values, whereas, on older
systems, it was not.

--Andy

^ permalink raw reply [flat|nested] 6+ messages in thread
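Editorial sketch, not part of the original message: the two outcomes Andy describes for "set fs_base = 12345, then set fs = 0x2b" can be written as a toy model. This is not kernel code and reflects no actual implementation. Every name in it (toy_seg, toy_policy, toy_implied_base, toy_poke_base_then_fs) is hypothetical, and it assumes, as in the 0x2b example above, that the selector's descriptor implies a base of zero.

```c
#include <stdint.h>

/* Toy model of the state a debugger observes; NOT kernel code. */
struct toy_seg {
    uint16_t fs;       /* selector, as reported in user_regs_struct::fs */
    uint64_t fs_base;  /* base, as reported in user_regs_struct::fs_base */
};

enum toy_policy {
    TOY_MOV_LIKE,   /* selector write reloads the base, like a MOV to FS */
    TOY_KEEP_BASE   /* selector write leaves a just-written base alone */
};

/* Assumption for illustration: the selector maps to a flat segment,
 * so the base implied by its descriptor is zero. */
static uint64_t toy_implied_base(uint16_t sel)
{
    (void)sel;
    return 0;
}

/* Models POKEUSER to fs_base followed by POKEUSER to fs, which is the
 * order a write-back in ascending offset order would use. */
struct toy_seg toy_poke_base_then_fs(uint64_t base, uint16_t sel,
                                     enum toy_policy policy)
{
    struct toy_seg s = { 0, 0 };

    s.fs_base = base;  /* first write: fs_base */
    s.fs = sel;        /* second write: fs */
    if (policy == TOY_MOV_LIKE)
        s.fs_base = toy_implied_base(sel);  /* selector load clobbers base */
    return s;
}
```

Under TOY_MOV_LIKE the sequence ends with a base of 0, and under TOY_KEEP_BASE it ends with 12345. The thread is asking which of these a debugger should be able to rely on.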
* Re: ptrace vs FSGSBASE
  2016-05-02 15:38   ` Andy Lutomirski
@ 2016-05-02 15:35     ` Oleg Nesterov
  2016-05-02 17:26       ` Andy Lutomirski
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-05-02 15:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen, Borislav Petkov, Brian Gerst

On 05/02, Andy Lutomirski wrote:
>
> On Mon, May 2, 2016 at 7:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >>
> >> 1. I read fs_base using ptrace. I think I should get the actual
> >> fs_base without any nonsense.
> >
> > Which fs_base? The member of user_regs_struct? But this structure/layout
> > is just the ABI, so to me it seems correct that getreg() tries to look
> > at ->fs and/or ->fsindex.
>
> Yeah, the member of user_regs_struct.

Still can't understand this... user_regs_struct is just the set of offsets
we use to "name" the registers for getreg/putreg. We simply do not have
"the actual fs_base" we could use in getreg(), we need to calculate it.

> > I can't understand what does "atomically" mean in this context.
>
> I mean "change fs and fs_base to these two values in a single syscall
> so that the kernel can do something intelligent."
>
> Let me give some background:
>
> [... snip ...]

Thanks Andy. I need to re-read your explanation, but it seems I am starting
to understand. And yes, I didn't bother to look at putreg() when I wrote
my reply.

> If you write, say, 0x2b to
> fs and 12345 to fs_base using the ptrace API, you'd end up with FS ==
> 0x2b and FSBASE == 0,

Hmm. I can be easily wrong again but afaics in this case do_arch_prctl()
will change fs/fs_base first and set

	fsindex = FS_TLS_SEL
	fs = 0

and then... and then I simply can't understand what set_segment_reg(fs)
will/should do in this case. Nor I can understand the "thread.fs != value"
check before do_arch_prctl(ARCH_SET_FS). Confused.

Oleg.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ptrace vs FSGSBASE
  2016-05-02 15:35     ` Oleg Nesterov
@ 2016-05-02 17:26       ` Andy Lutomirski
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Lutomirski @ 2016-05-02 17:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen, Borislav Petkov, Brian Gerst

On Mon, May 2, 2016 at 8:35 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 05/02, Andy Lutomirski wrote:
>>
>> On Mon, May 2, 2016 at 7:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >>
>> >> 1. I read fs_base using ptrace. I think I should get the actual
>> >> fs_base without any nonsense.
>> >
>> > Which fs_base? The member of user_regs_struct? But this structure/layout
>> > is just the ABI, so to me it seems correct that getreg() tries to look
>> > at ->fs and/or ->fsindex.
>>
>> Yeah, the member of user_regs_struct.
>
> Still can't understand this... user_regs_struct is just the set of offsets
> we use to "name" the registers for getreg/putreg. We simply do not have
> "the actual fs_base" we could use in getreg(), we need to calculate it.

Right. When I said writing to fs_base, I meant using POKEUSER or
similar to write to the thing referred to as fs_base via the helpers
in arch/x86/kernel/ptrace.c.

>
>> > I can't understand what does "atomically" mean in this context.
>>
>> I mean "change fs and fs_base to these two values in a single syscall
>> so that the kernel can do something intelligent."
>>
>> Let me give some background:
>>
>> [... snip ...]
>
> Thanks Andy. I need to re-read your explanation, but it seems I am starting
> to understand. And yes, I didn't bother to look at putreg() when I wrote
> my reply.
>
>> If you write, say, 0x2b to
>> fs and 12345 to fs_base using the ptrace API, you'd end up with FS ==
>> 0x2b and FSBASE == 0,
>
> Hmm. I can be easily wrong again but afaics in this case do_arch_prctl()
> will change fs/fs_base first and set
>
>	fsindex = FS_TLS_SEL
>	fs = 0
>
> and then... and then I simply can't understand what set_segment_reg(fs)
> will/should do in this case.

Exactly. Hence my uncertainty as to what to do.

> Nor I can understand the "thread.fs != value"
> check before do_arch_prctl(ARCH_SET_FS). Confused.

I think that code was an optimization that doesn't make much sense.

It wouldn't surprise me if almost no one uses any of this
functionality right now.

--Andy

^ permalink raw reply [flat|nested] 6+ messages in thread