From: Jessica Yu <jeyu@kernel.org>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"keescook@chromium.org" <keescook@chromium.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-mips@linux-mips.org" <linux-mips@linux-mips.org>,
	"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	"jannh@google.com" <jannh@google.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"kristen@linux.intel.com" <kristen@linux.intel.com>,
	"Dock, Deneen T" <deneen.t.dock@intel.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"will.deacon@arm.com" <will.deacon@arm.com>,
	"kernel-hardening@lists.openwall.com"
	<kernel-hardening@lists.openwall.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>
Subject: Re: [PATCH RFC v3 0/3] Rlimit for module space
Date: Wed, 24 Oct 2018 17:07:06 +0200	[thread overview]
Message-ID: <20181024150706.jewcclhhh756tupn@linux-8ccs> (raw)
In-Reply-To: <CAKv+Gu-Rk-SQVOQ63L3DkF3=EVik3pHXzpNp5r5TrgDajTM_iQ@mail.gmail.com>

+++ Ard Biesheuvel [23/10/18 08:54 -0300]:
>On 22 October 2018 at 20:06, Edgecombe, Rick P
><rick.p.edgecombe@intel.com> wrote:
>> On Sat, 2018-10-20 at 19:20 +0200, Ard Biesheuvel wrote:
>>> Hi Rick,
>>>
>>> On 19 October 2018 at 22:47, Rick Edgecombe <rick.p.edgecombe@intel.com>
>>> wrote:
>>> > If BPF JIT is on, there is no effective limit to prevent filling the entire
>>> > module space with JITed e/BPF filters.
>>>
>>> Why do BPF filters use the module space, and does this reason apply to
>>> all architectures?
>>>
>>> On arm64, we already support loading plain modules far away from the
>>> core kernel (i.e. out of range for ordinary relative jump/call
>>> instructions), and so I'd expect BPF to be able to deal with this
>>> already as well. So for arm64, I wonder why an ordinary vmalloc_exec()
>>> wouldn't be more appropriate.
>> AFAIK, it's like you said about relative instruction limits, but also because
>> some predictors don't predict past a certain distance. So performance as well.
>> I'm not sure of the reasons for each arch, or whether they all apply to the BPF
>> JIT. By my count there seem to be 8 architectures that have a dedicated module
>> space for some reason.
>>
>>> So before refactoring the module alloc/free routines to accommodate
>>> BPF, I'd like to take one step back and assess whether it wouldn't be
>>> more appropriate to have a separate bpf_alloc/free API, which could be
>>> totally separate from module alloc/free if the arch permits it.
>>>
>> I am not a BPF JIT expert unfortunately, hopefully someone more authoritative
>> will chime in. I only ran into this because I was trying to increase
>> randomization for the module space and wanted to find out how many allocations
>> needed to be supported.
>>
>> I'd guess though, that BPF JIT is just assuming that there will be some arch
>> specific constraints about where text can be placed optimally and they would
>> already be taken into account in the module space allocator.
>>
>> If there are no constraints for some arch, I'd wonder why not just update its
>> module_alloc to use the whole space available. What exactly are the constraints
>> for arm64 for normal modules?
>>
>
>Relative branches and the interactions with KAsan.
>
>We just fixed something similar in the kprobes code: it was using
>RWX-mapped module memory to store kprobed instructions, and we
>replaced that with a simple vmalloc_exec() [and code to remap it
>read-only], which was possible given that relative branches are always
>emulated by arm64 kprobes.
>
>So it depends on whether BPF code needs to be in relative branching
>range from the calling code, and whether the BPF code itself is
>emitted using relative branches into other parts of the code.
>
>> It seems fine to me for architectures to have the option of solving this a
>> different way. If potentially the rlimit ends up being the best solution for
>> some architectures though, do you think this refactor (pretty close to just a
>> name change) is that intrusive?
>>
>> I guess it could be just a BPF JIT per user limit and not touch module space,
>> but I thought the module space limit was a more general solution as that is the
>> actual limited resource.
>>
>
>I think it is wrong to conflate the two things. Limiting the number of
>BPF allocations and limiting the number of module allocations are two
>separate things, and the fact that BPF reuses module_alloc() out of
>convenience does not mean a single rlimit for both is appropriate.

Hm, I think Ard has a good point. AFAIK, and correct me if I'm wrong,
the users of module_alloc() (i.e. kprobes, ftrace, bpf) use it because
it is an easy way to obtain executable kernel memory at runtime (and,
depending on the architecture's needs, memory that is additionally
reachable via relative branches). The side effect is that all these
users share the "module" memory space, even though that region is not
used exclusively by modules. (Personally, I think it should be, since
calling module_alloc() outside of the module loader is arguably a
misuse of the module API, and it is confusing for people who don't
know why it is being used there.)
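
For reference, this is roughly the shape of an arch-specific
module_alloc(); the constants and page protections below are only
illustrative, not copied from any particular tree:

/*
 * Rough sketch of a typical arch module_alloc() (needs
 * <linux/vmalloc.h> and <linux/moduleloader.h>). MODULES_VADDR,
 * MODULES_END and MODULES_LEN stand in for whatever region the arch
 * defines; exact flags and protections vary per architecture.
 */
void *module_alloc(unsigned long size)
{
	if (PAGE_ALIGN(size) > MODULES_LEN)	/* size of the module area */
		return NULL;

	/*
	 * Restrict the allocation to the arch's dedicated module region
	 * so the result stays within relative branch range of the kernel
	 * text (and of other modules).
	 */
	return __vmalloc_node_range(size, MODULE_ALIGN,
				    MODULES_VADDR, MODULES_END,
				    GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
				    NUMA_NO_NODE,
				    __builtin_return_address(0));
}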

Right now I'm not sure it makes sense to impose a blanket limit on all
module_alloc() allocations when the real motivation behind the rlimit
is BPF, i.e., to stop unprivileged users from filling the module space
with JITed BPF filters. So the rlimit has more to do with limiting the
memory usage of BPF filters than with modules themselves.
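
If the accounting were done on the BPF side instead, it could be as
simple as charging a page counter when a JIT image is allocated and
uncharging it on free, e.g. something along these lines (the names and
the limit knob are made up here, just to show the shape of it):

/*
 * Purely illustrative accounting sketch; not an existing interface.
 * Charge JITed image pages against a BPF-specific limit instead of
 * limiting module_alloc() as a whole.
 */
static atomic_long_t bpf_jit_pages_used;

static int bpf_jit_charge(unsigned long npages, unsigned long limit)
{
	if (atomic_long_add_return(npages, &bpf_jit_pages_used) > limit) {
		atomic_long_sub(npages, &bpf_jit_pages_used);
		return -ENOMEM;
	}
	return 0;
}

static void bpf_jit_uncharge(unsigned long npages)
{
	atomic_long_sub(npages, &bpf_jit_pages_used);
}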

I think Ard's suggestion of having a separate bpf_alloc/free API makes
a lot of sense if we want to keep track of bpf-related allocations
(and then the rlimit would be enforced on those). Maybe part of the
module mapping space could be carved out for bpf filters (e.g. have
BPF_VADDR, BPF_VSIZE, etc. like we have for modules), or the region
could continue to be shared with an explicitly separate bpf_alloc API
on top, depending on an architecture's needs. What do people think?
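
To make the carve-out variant a bit more concrete, an arch could
provide something like the following (BPF_VADDR/BPF_VSIZE are
hypothetical constants and the function names are made up; the rlimit
accounting would then hook into these rather than into module_alloc):

/*
 * Sketch only: BPF_VADDR/BPF_VSIZE would be per-arch constants carving
 * out a dedicated range for JITed BPF images. An arch without such
 * placement constraints could instead return plain vmalloc'ed
 * executable memory here.
 */
void *bpf_alloc(unsigned long size)
{
	return __vmalloc_node_range(size, PAGE_SIZE,
				    BPF_VADDR, BPF_VADDR + BPF_VSIZE,
				    GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
				    NUMA_NO_NODE,
				    __builtin_return_address(0));
}

void bpf_free(void *addr)
{
	vfree(addr);
}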

Thanks,

Jessica

Thread overview: 11+ messages
2018-10-19 20:47 [PATCH RFC v3 0/3] Rlimit for module space Rick Edgecombe
2018-10-19 20:47 ` [PATCH v3 1/3] modules: Create arch versions of module alloc/free Rick Edgecombe
2018-10-19 20:47 ` [PATCH v3 2/3] modules: Create rlimit for module space Rick Edgecombe
2018-10-19 20:47 ` [PATCH v3 3/3] bpf: Add system wide BPF JIT limit Rick Edgecombe
2018-10-20 17:20 ` [PATCH RFC v3 0/3] Rlimit for module space Ard Biesheuvel
2018-10-22 23:06   ` Edgecombe, Rick P
2018-10-23 11:54     ` Ard Biesheuvel
2018-10-24 15:07       ` Jessica Yu [this message]
2018-10-24 16:04         ` Daniel Borkmann
2018-10-25  1:01           ` Edgecombe, Rick P
2018-10-25 14:18             ` Michal Hocko
