netdev.vger.kernel.org archive mirror
From: Masami Hiramatsu <mhiramat@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>, Nadav Amit <namit@vmware.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Changbin Du <changbin.du@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Network Development <netdev@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Igor Stoppa <igor.stoppa@gmail.com>
Subject: Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
Date: Mon, 25 Feb 2019 22:36:31 +0900
Message-ID: <20190225223631.8ad8e949aa17a6c1eaae74ee@kernel.org>
In-Reply-To: <CALCETrU2V5KBA97g3O-yHiUu1acmM_K9b2a5ATmKSqGwdd-+Dw@mail.gmail.com>

On Fri, 22 Feb 2019 15:59:30 -0800
Andy Lutomirski <luto@kernel.org> wrote:

> On Fri, Feb 22, 2019 at 3:02 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote:
> > > > On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote:
> > > >
> > > >> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote:
> > > >>
> > > >> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote:
> > > >>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote:
> > > >>>>
> > > >>>> (adding some people from the text_poke series to the thread, removing stable@)
> > > >>>>
> > > >>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote:
> > > >>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > >>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote:
> > > >>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800
> > > >>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not
> > > >>>>>>>>> allow user accesses. The easiest way to do that is actually likely to
> > > >>>>>>>>> use the "unsafe_get_user()" functions *without* doing a
> > > >>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault
> > > >>>>>>>>> on a kernel access to user space.
> > > >>>>>>>>
> > > >>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read()
> > > >>>>>>>> and users pass both user and kernel addresses into it and expect
> > > >>>>>>>> that the helper will actually try to read from that address.
> > > >>>>>>>>
> > > >>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses
> > > >>>>>>>> it will break the expectations.
> > > >>>>>>>> How do we solve it in bpf_probe_read?
> > > >>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte
> > > >>>>>>>> in the loop?
> > > >>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow
> > > >>>>>>>> and shows up in their perf report.
> > > >>>>>>>
> > > >>>>>>> We're changing kprobes to add a specific flag to say that we want to
> > > >>>>>>> differentiate between kernel or user reads. Can this be done with
> > > >>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single
> > > >>>>>>
> > > >>>>>> so you're saying you will break existing kprobe scripts?
> > > >>>>>> I don't think it's a good idea.
> > > >>>>>> It's not acceptable to break bpf_probe_read uapi.
> > > >>>>>
> > > >>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd.  I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x.
> > > >>>>>
> > > >>>>> What to do about existing scripts is a different question.
> > > >>>>
> > > >>>> This lack of logical separation between user and kernel addresses
> > > >>>> might interact interestingly with the text_poke series, specifically
> > > >>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for
> > > >>>> patching" (https://lore.kernel.org/lkml/20190221234451.17632-6-rick.p.edgecombe@intel.com/)
> > > >>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text
> > > >>>> poking" (https://lore.kernel.org/lkml/20190221234451.17632-7-rick.p.edgecombe@intel.com/),
> > > >>>> right? If someone manages to get a tracing BPF program to trigger in a
> > > >>>> task that has switched to the patching mm, could they use
> > > >>>> bpf_probe_write_user() - which uses probe_kernel_write() after
> > > >>>> checking that KERNEL_DS isn't active and that access_ok() passes - to
> > > >>>> overwrite kernel text that is mapped writable in the patching mm?
> > > >>>
> > > >>> Yes, this is a good point. I guess text_poke() should be defined with
> > > >>> “__kprobes” and open-code memcpy.
> > > >>>
> > > >>> Does it sound reasonable?
> > > >>
> > > >> Doesn't __text_poke() as implemented in the proposed patch use a
> > > >> couple other kernel functions, too? Like switch_mm_irqs_off() and
> > > >> pte_clear() (which can be a call into a separate function on paravirt
> > > >> kernels)?
> > > >
> > > > I will move the pte_clear() to be done after the poking mm was unloaded.
> > > > Give me a few minutes to send a sketch of what I think should be done.
> > >
> > > Err.. You are right, I don’t see an easy way of preventing a kprobe from
> > > being set on switch_mm_irqs_off(), and open-coding this monster is too ugly.
> > >
> > > The reasonable solution seems to me to be taking all the relevant pieces of
> > > code (and data) that might be used during text-poking and encapsulating them, so they
> > > will be placed in a memory area which cannot be kprobe'd. This can also be
> > > useful to write-protect data structures of code that calls text_poke(),
> > > e.g., static-keys. It can also protect data on the stack that is used
> > > during text_poke() from being overwritten from another core.
> > >
> > > This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves”
> > > when doing write-rarely operations.
> > >
> > > Right now, I think that text_poke() will keep being susceptible to such
> > > an attack, unless you have a better suggestion.
> >
> > A relatively simple approach might be to teach BPF not to run kprobe
> > programs and such in contexts where current->mm isn't the active mm?
> > Maybe using nmi_uaccess_okay(), or something like that? It looks like
> > perf_callchain_user() also already uses that. Except that a lot of
> > this code is x86-specific...
> 
> This sounds like exactly the right solution.  If you're running from
> some unknown context (like NMI or tracing), then you should check
> nmi_uaccess_okay().  I think we should just promote that to be a
> non-arch-specific function (that returns true by default) and check it
> in the relevant bpf_probe_xyz() functions.

This treatment may also be needed for my work: a helper like probe_user_read()
should fail if nmi_uaccess_okay() returns false.
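
As a rough illustration of that guard, here is a minimal userspace model in C.
It is not kernel code. The flag active_mm_is_current_mm and the helper
model_probe_user_read() are hypothetical names made up for this sketch, and
memcpy() merely stands in for a fault-tolerant user copy. The only idea taken
from the thread is that the read helper refuses to touch user memory when
nmi_uaccess_okay() reports that the loaded mm is not current->mm.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for "is current->mm the mm that is loaded right
 * now?". It would be false while a task runs on a temporary patching mm.
 */
static bool active_mm_is_current_mm = true;

/*
 * Model of the check discussed above. The proposal is a generic version
 * that returns true by default, with architectures overriding it. This
 * model consults the flag so that both outcomes can be exercised.
 */
static bool nmi_uaccess_okay(void)
{
	return active_mm_is_current_mm;
}

/*
 * Hypothetical model of a probe_user_read()-style helper: bail out with
 * -EFAULT before touching the source when user access is not safe.
 */
static long model_probe_user_read(void *dst, const void *src, size_t size)
{
	if (!nmi_uaccess_okay())
		return -EFAULT;

	/* Stand-in for the real fault-tolerant copy from user memory. */
	memcpy(dst, src, size);
	return 0;
}
```

In this model the destination buffer is left untouched on failure, so a caller
can tell "refused" apart from "copied" by the return value alone.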

Thank you,

> 
> Alexei, does that seem reasonable?


-- 
Masami Hiramatsu <mhiramat@kernel.org>
