* Re: [RFC PATCH v2] security, capability: pass object information to security_capable
From: Paul Moore @ 2019-08-14 21:27 UTC (permalink / raw)
To: Stephen Smalley
Cc: Richard Guy Briggs, Aaron Goidel, mortonm, john.johansen, selinux,
James Morris, luto, linux-security-module, linux-audit,
Nicholas Franck, Serge Hallyn
In-Reply-To: <b47e07bc-1b01-c5f0-305d-e6fe014b00d8@tycho.nsa.gov>
On Wed, Aug 14, 2019 at 5:08 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On 8/14/19 3:59 PM, Paul Moore wrote:
> > On Tue, Aug 13, 2019 at 5:27 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> >> On 2019-08-13 11:01, Aaron Goidel wrote:
> >>> On 8/8/19 12:30 PM, Paul Moore wrote:
> >>>> On Thu, Aug 1, 2019 at 10:43 AM Aaron Goidel <acgoide@tycho.nsa.gov> wrote:
> >>>>> From: Nicholas Franck <nhfran2@tycho.nsa.gov>
> >>>>>
> >>>>> At present security_capable does not pass any object information
> >>>>> and therefore can neither audit the particular object nor take it
> >>>>> into account. Augment the security_capable interface to support
> >>>>> passing supplementary data. Use this facility initially to convey
> >>>>> the inode for capability checks relevant to inodes. This only
> >>>>> addresses capable_wrt_inode_uidgid calls; other capability checks
> >>>>> relevant to inodes will be addressed in subsequent changes. In the
> >>>>> future, this will be further extended to pass object information for
> >>>>> other capability checks such as the target task for CAP_KILL.
> >>>>>
> >>>>> In SELinux this new information is leveraged here to include the inode
> >>>>> in the audit message. In the future, it could also be used to perform
> >>>>> a per inode capability checks.
> >>>>>
> >>>>> It would be possible to fold the existing opts argument into this new
> >>>>> supplementary data structure. This was omitted from this change to
> >>>>> minimize changes.
> >>>>>
> >>>>> Signed-off-by: Nicholas Franck <nhfran2@tycho.nsa.gov>
> >>>>> Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov>
> >>>>> ---
> >>>>> v2:
> >>>>> - Changed order of audit prints so optional information comes second
> >>>>> ---
> >>>>> include/linux/capability.h | 7 ++++++
> >>>>> include/linux/lsm_audit.h | 5 +++-
> >>>>> include/linux/lsm_hooks.h | 3 ++-
> >>>>> include/linux/security.h | 23 +++++++++++++-----
> >>>>> kernel/capability.c | 33 ++++++++++++++++++--------
> >>>>> kernel/seccomp.c | 2 +-
> >>>>> security/apparmor/capability.c | 8 ++++---
> >>>>> security/apparmor/include/capability.h | 4 +++-
> >>>>> security/apparmor/ipc.c | 2 +-
> >>>>> security/apparmor/lsm.c | 5 ++--
> >>>>> security/apparmor/resource.c | 2 +-
> >>>>> security/commoncap.c | 11 +++++----
> >>>>> security/lsm_audit.c | 21 ++++++++++++++--
> >>>>> security/safesetid/lsm.c | 3 ++-
> >>>>> security/security.c | 5 ++--
> >>>>> security/selinux/hooks.c | 20 +++++++++-------
> >>>>> security/smack/smack_access.c | 2 +-
> >>>>> 17 files changed, 110 insertions(+), 46 deletions(-)
> >>>>
> >>>> You should CC the linux-audit list, I've added them on this mail.
> >>>>
> >>>> I had hoped to see some thought put into the idea of dynamically
> >>>> emitting the proper audit records as I mentioned in the previous patch
> >>>> set, but regardless there are some comments on this code as written
> >>>> ...
> >>>>
> >>>>> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> >>>>> index 33028c098ef3..18cc7c956b69 100644
> >>>>> --- a/security/lsm_audit.c
> >>>>> +++ b/security/lsm_audit.c
> >>>>> @@ -229,9 +229,26 @@ static void dump_common_audit_data(struct audit_buffer *ab,
> >>>>> case LSM_AUDIT_DATA_IPC:
> >>>>> audit_log_format(ab, " key=%d ", a->u.ipc_id);
> >>>>> break;
> >>>>> - case LSM_AUDIT_DATA_CAP:
> >>>>> - audit_log_format(ab, " capability=%d ", a->u.cap);
> >>>>> + case LSM_AUDIT_DATA_CAP: {
> >>>>> + const struct inode *inode;
> >>>>> +
> >>>>> + audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
> >>>>> + if (a->u.cap_struct.cad) {
> >>>>> + switch (a->u.cap_struct.cad->type) {
> >>>>> + case CAP_AUX_DATA_INODE: {
> >>>>> + inode = a->u.cap_struct.cad->u.inode;
> >>>>> +
> >>>>> + audit_log_format(ab, " dev=");
> >>>>> + audit_log_untrustedstring(ab,
> >>>>> + inode->i_sb->s_id);
> >>>>> + audit_log_format(ab, " ino=%lu",
> >>>>> + inode->i_ino);
> >>>>> + break;
> >>>>> + }
> >>>>
> >>>> Since you are declaring "inode" further up, there doesn't appear to be
> >>>> any need for the CAP_AUX_DATA_INODE braces, please remove them.
> >>>>
> >>>> The general recommended practice when it comes to "sometimes" fields
> >>>> in an audit record, is to always record them in the record, but use a
> >>>> value of "?" when there is nothing relevant to record. For example,
> >>>> when *not* recording inode information you would do something like the
> >>>> following:
> >>>>
> >>>> audit_log_format(ab, " dev=? ino=?");
> >>>>
> >>> The issue this brings up is what happens when this is expanded to more
> >>> cases? Assuming there will be a case here for logging audit data for task
> >>> based capabilities (CAP_AUX_DATA_TASK), what do we want to have happen if we
> >>> are recording *neither* inode information nor task information (say a PID)?
> >>> If we log something in the inode case, we presumably don't want to call
> >>> audit_log_format(ab, " dev=?, pid=?") as well. (And vice versa for when we
> >>> log a pid and no inode).
> >>
> >> Yup. This record is already a mess due to that.
> >>
> >> The general solution is to either use a new record type for each
> >> variant, or to use an auxiliary record for each additional piece of
> >> information. The long term solution that has been suggested it to move
> >> away from a text message format.
> >
> > This is why I was suggesting that Aaron look into allowing the LSMs to
> > request generation of audit records in the earlier thread (and
> > mentioned it again at the top of my comments in this thread).
>
> How would that work? The behavior we want is to capture some information
> about the inode whenever there is a capability denial upon an attempted
> access to that inode. Allowing the LSM to enable audit collection on a
> per-process basis doesn't appear to help with that goal, because this is
> something we want for all processes. Further, we don't really want the
> rest of the audit collection machinery engaged here ...
I read this as "we want to audit this information, but we don't to
turn on the very thing which is designed to do this". At some point,
if you want to audit things, you actually need to turn on auditing.
> ... we just want the
> inode information for this specific inode, and we have the inode readily
> accessible in the caller of the LSM hook already, so we don't need to do
> it earlier.
Aaron appeared to be complaining that if we stuck to the current best
practices regarding record formatting for the LSM generated audit
record, the record was going to get complicated when people started
adding additional information. FWIW, I don't disagree. The only real
alternative I've seen thus far is to look into having the LSM enable
certain records, if anyone has another idea I'm all ears/eyes.
> Further, in the future we want to be able to take the inode security
> label into consideration as part of the capability checking, which is
> independent of audit entirely. So the rest of the patch will still be
> required even if the audit solution ends up being different.
That's a different topic, I don't think there are any remaining
objections to passing the inode information here.
--
paul moore
www.paul-moore.com
^ permalink raw reply
* Re: [RFC PATCH v2] security, capability: pass object information to security_capable
From: Stephen Smalley @ 2019-08-14 21:08 UTC (permalink / raw)
To: Paul Moore, Richard Guy Briggs, Aaron Goidel
Cc: mortonm, john.johansen, selinux, James Morris, luto,
linux-security-module, linux-audit, Nicholas Franck, Serge Hallyn
In-Reply-To: <CAHC9VhTWY4vtsmCn8X3TjR1HdsB1-wqBLs03vZVv0SmWQ-Ab2Q@mail.gmail.com>
On 8/14/19 3:59 PM, Paul Moore wrote:
> On Tue, Aug 13, 2019 at 5:27 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>> On 2019-08-13 11:01, Aaron Goidel wrote:
>>> On 8/8/19 12:30 PM, Paul Moore wrote:
>>>> On Thu, Aug 1, 2019 at 10:43 AM Aaron Goidel <acgoide@tycho.nsa.gov> wrote:
>>>>> From: Nicholas Franck <nhfran2@tycho.nsa.gov>
>>>>>
>>>>> At present security_capable does not pass any object information
>>>>> and therefore can neither audit the particular object nor take it
>>>>> into account. Augment the security_capable interface to support
>>>>> passing supplementary data. Use this facility initially to convey
>>>>> the inode for capability checks relevant to inodes. This only
>>>>> addresses capable_wrt_inode_uidgid calls; other capability checks
>>>>> relevant to inodes will be addressed in subsequent changes. In the
>>>>> future, this will be further extended to pass object information for
>>>>> other capability checks such as the target task for CAP_KILL.
>>>>>
>>>>> In SELinux this new information is leveraged here to include the inode
>>>>> in the audit message. In the future, it could also be used to perform
>>>>> a per inode capability checks.
>>>>>
>>>>> It would be possible to fold the existing opts argument into this new
>>>>> supplementary data structure. This was omitted from this change to
>>>>> minimize changes.
>>>>>
>>>>> Signed-off-by: Nicholas Franck <nhfran2@tycho.nsa.gov>
>>>>> Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov>
>>>>> ---
>>>>> v2:
>>>>> - Changed order of audit prints so optional information comes second
>>>>> ---
>>>>> include/linux/capability.h | 7 ++++++
>>>>> include/linux/lsm_audit.h | 5 +++-
>>>>> include/linux/lsm_hooks.h | 3 ++-
>>>>> include/linux/security.h | 23 +++++++++++++-----
>>>>> kernel/capability.c | 33 ++++++++++++++++++--------
>>>>> kernel/seccomp.c | 2 +-
>>>>> security/apparmor/capability.c | 8 ++++---
>>>>> security/apparmor/include/capability.h | 4 +++-
>>>>> security/apparmor/ipc.c | 2 +-
>>>>> security/apparmor/lsm.c | 5 ++--
>>>>> security/apparmor/resource.c | 2 +-
>>>>> security/commoncap.c | 11 +++++----
>>>>> security/lsm_audit.c | 21 ++++++++++++++--
>>>>> security/safesetid/lsm.c | 3 ++-
>>>>> security/security.c | 5 ++--
>>>>> security/selinux/hooks.c | 20 +++++++++-------
>>>>> security/smack/smack_access.c | 2 +-
>>>>> 17 files changed, 110 insertions(+), 46 deletions(-)
>>>>
>>>> You should CC the linux-audit list, I've added them on this mail.
>>>>
>>>> I had hoped to see some thought put into the idea of dynamically
>>>> emitting the proper audit records as I mentioned in the previous patch
>>>> set, but regardless there are some comments on this code as written
>>>> ...
>>>>
>>>>> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
>>>>> index 33028c098ef3..18cc7c956b69 100644
>>>>> --- a/security/lsm_audit.c
>>>>> +++ b/security/lsm_audit.c
>>>>> @@ -229,9 +229,26 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>>>>> case LSM_AUDIT_DATA_IPC:
>>>>> audit_log_format(ab, " key=%d ", a->u.ipc_id);
>>>>> break;
>>>>> - case LSM_AUDIT_DATA_CAP:
>>>>> - audit_log_format(ab, " capability=%d ", a->u.cap);
>>>>> + case LSM_AUDIT_DATA_CAP: {
>>>>> + const struct inode *inode;
>>>>> +
>>>>> + audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
>>>>> + if (a->u.cap_struct.cad) {
>>>>> + switch (a->u.cap_struct.cad->type) {
>>>>> + case CAP_AUX_DATA_INODE: {
>>>>> + inode = a->u.cap_struct.cad->u.inode;
>>>>> +
>>>>> + audit_log_format(ab, " dev=");
>>>>> + audit_log_untrustedstring(ab,
>>>>> + inode->i_sb->s_id);
>>>>> + audit_log_format(ab, " ino=%lu",
>>>>> + inode->i_ino);
>>>>> + break;
>>>>> + }
>>>>
>>>> Since you are declaring "inode" further up, there doesn't appear to be
>>>> any need for the CAP_AUX_DATA_INODE braces, please remove them.
>>>>
>>>> The general recommended practice when it comes to "sometimes" fields
>>>> in an audit record, is to always record them in the record, but use a
>>>> value of "?" when there is nothing relevant to record. For example,
>>>> when *not* recording inode information you would do something like the
>>>> following:
>>>>
>>>> audit_log_format(ab, " dev=? ino=?");
>>>>
>>> The issue this brings up is what happens when this is expanded to more
>>> cases? Assuming there will be a case here for logging audit data for task
>>> based capabilities (CAP_AUX_DATA_TASK), what do we want to have happen if we
>>> are recording *neither* inode information nor task information (say a PID)?
>>> If we log something in the inode case, we presumably don't want to call
>>> audit_log_format(ab, " dev=?, pid=?") as well. (And vice versa for when we
>>> log a pid and no inode).
>>
>> Yup. This record is already a mess due to that.
>>
>> The general solution is to either use a new record type for each
>> variant, or to use an auxiliary record for each additional piece of
>> information. The long term solution that has been suggested it to move
>> away from a text message format.
>
> This is why I was suggesting that Aaron look into allowing the LSMs to
> request generation of audit records in the earlier thread (and
> mentioned it again at the top of my comments in this thread).
How would that work? The behavior we want is to capture some information
about the inode whenever there is a capability denial upon an attempted
access to that inode. Allowing the LSM to enable audit collection on a
per-process basis doesn't appear to help with that goal, because this is
something we want for all processes. Further, we don't really want the
rest of the audit collection machinery engaged here; we just want the
inode information for this specific inode, and we have the inode readily
accessible in the caller of the LSM hook already, so we don't need to do
it earlier.
Further, in the future we want to be able to take the inode security
label into consideration as part of the capability checking, which is
independent of audit entirely. So the rest of the patch will still be
required even if the audit solution ends up being different.
^ permalink raw reply
* Re: [Non-DoD Source] Re: [RFC PATCH v2] security, capability: pass object information to security_capable
From: Paul Moore @ 2019-08-14 19:59 UTC (permalink / raw)
To: Richard Guy Briggs, Aaron Goidel
Cc: mortonm, john.johansen, selinux, James Morris, luto,
linux-security-module, linux-audit, Nicholas Franck,
Stephen Smalley, Serge Hallyn
In-Reply-To: <20190813212710.wimxgfunrijqwuqt@madcap2.tricolour.ca>
On Tue, Aug 13, 2019 at 5:27 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2019-08-13 11:01, Aaron Goidel wrote:
> > On 8/8/19 12:30 PM, Paul Moore wrote:
> > > On Thu, Aug 1, 2019 at 10:43 AM Aaron Goidel <acgoide@tycho.nsa.gov> wrote:
> > > > From: Nicholas Franck <nhfran2@tycho.nsa.gov>
> > > >
> > > > At present security_capable does not pass any object information
> > > > and therefore can neither audit the particular object nor take it
> > > > into account. Augment the security_capable interface to support
> > > > passing supplementary data. Use this facility initially to convey
> > > > the inode for capability checks relevant to inodes. This only
> > > > addresses capable_wrt_inode_uidgid calls; other capability checks
> > > > relevant to inodes will be addressed in subsequent changes. In the
> > > > future, this will be further extended to pass object information for
> > > > other capability checks such as the target task for CAP_KILL.
> > > >
> > > > In SELinux this new information is leveraged here to include the inode
> > > > in the audit message. In the future, it could also be used to perform
> > > > a per inode capability checks.
> > > >
> > > > It would be possible to fold the existing opts argument into this new
> > > > supplementary data structure. This was omitted from this change to
> > > > minimize changes.
> > > >
> > > > Signed-off-by: Nicholas Franck <nhfran2@tycho.nsa.gov>
> > > > Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov>
> > > > ---
> > > > v2:
> > > > - Changed order of audit prints so optional information comes second
> > > > ---
> > > > include/linux/capability.h | 7 ++++++
> > > > include/linux/lsm_audit.h | 5 +++-
> > > > include/linux/lsm_hooks.h | 3 ++-
> > > > include/linux/security.h | 23 +++++++++++++-----
> > > > kernel/capability.c | 33 ++++++++++++++++++--------
> > > > kernel/seccomp.c | 2 +-
> > > > security/apparmor/capability.c | 8 ++++---
> > > > security/apparmor/include/capability.h | 4 +++-
> > > > security/apparmor/ipc.c | 2 +-
> > > > security/apparmor/lsm.c | 5 ++--
> > > > security/apparmor/resource.c | 2 +-
> > > > security/commoncap.c | 11 +++++----
> > > > security/lsm_audit.c | 21 ++++++++++++++--
> > > > security/safesetid/lsm.c | 3 ++-
> > > > security/security.c | 5 ++--
> > > > security/selinux/hooks.c | 20 +++++++++-------
> > > > security/smack/smack_access.c | 2 +-
> > > > 17 files changed, 110 insertions(+), 46 deletions(-)
> > >
> > > You should CC the linux-audit list, I've added them on this mail.
> > >
> > > I had hoped to see some thought put into the idea of dynamically
> > > emitting the proper audit records as I mentioned in the previous patch
> > > set, but regardless there are some comments on this code as written
> > > ...
> > >
> > > > diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> > > > index 33028c098ef3..18cc7c956b69 100644
> > > > --- a/security/lsm_audit.c
> > > > +++ b/security/lsm_audit.c
> > > > @@ -229,9 +229,26 @@ static void dump_common_audit_data(struct audit_buffer *ab,
> > > > case LSM_AUDIT_DATA_IPC:
> > > > audit_log_format(ab, " key=%d ", a->u.ipc_id);
> > > > break;
> > > > - case LSM_AUDIT_DATA_CAP:
> > > > - audit_log_format(ab, " capability=%d ", a->u.cap);
> > > > + case LSM_AUDIT_DATA_CAP: {
> > > > + const struct inode *inode;
> > > > +
> > > > + audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
> > > > + if (a->u.cap_struct.cad) {
> > > > + switch (a->u.cap_struct.cad->type) {
> > > > + case CAP_AUX_DATA_INODE: {
> > > > + inode = a->u.cap_struct.cad->u.inode;
> > > > +
> > > > + audit_log_format(ab, " dev=");
> > > > + audit_log_untrustedstring(ab,
> > > > + inode->i_sb->s_id);
> > > > + audit_log_format(ab, " ino=%lu",
> > > > + inode->i_ino);
> > > > + break;
> > > > + }
> > >
> > > Since you are declaring "inode" further up, there doesn't appear to be
> > > any need for the CAP_AUX_DATA_INODE braces, please remove them.
> > >
> > > The general recommended practice when it comes to "sometimes" fields
> > > in an audit record, is to always record them in the record, but use a
> > > value of "?" when there is nothing relevant to record. For example,
> > > when *not* recording inode information you would do something like the
> > > following:
> > >
> > > audit_log_format(ab, " dev=? ino=?");
> > >
> > The issue this brings up is what happens when this is expanded to more
> > cases? Assuming there will be a case here for logging audit data for task
> > based capabilities (CAP_AUX_DATA_TASK), what do we want to have happen if we
> > are recording *neither* inode information nor task information (say a PID)?
> > If we log something in the inode case, we presumably don't want to call
> > audit_log_format(ab, " dev=?, pid=?") as well. (And vice versa for when we
> > log a pid and no inode).
>
> Yup. This record is already a mess due to that.
>
> The general solution is to either use a new record type for each
> variant, or to use an auxiliary record for each additional piece of
> information. The long term solution that has been suggested it to move
> away from a text message format.
This is why I was suggesting that Aaron look into allowing the LSMs to
request generation of audit records in the earlier thread (and
mentioned it again at the top of my comments in this thread).
--
paul moore
www.paul-moore.com
^ permalink raw reply
* Re: [PATCH 0/6] lockdown fixups
From: Matthew Garrett @ 2019-08-14 18:26 UTC (permalink / raw)
To: James Morris; +Cc: LSM List
In-Reply-To: <alpine.LRH.2.21.1908150423100.31503@namei.org>
On Wed, Aug 14, 2019 at 11:24 AM James Morris <jmorris@namei.org> wrote:
>
> On Wed, 14 Aug 2019, Matthew Garrett wrote:
>
> > On Tue, Aug 13, 2019 at 10:06 PM James Morris <jmorris@namei.org> wrote:
> > > Which kernel version are these against?
> >
> > Crap. Sorry, these ended up derived from HEAD. Let me fix that up and
> > resend. Sorry about that!
>
> Do you have an upstream dependency on Mimi's code in -rc?
No, there's no dependency, I just ended up with an additional hunk
when rebasing.
^ permalink raw reply
* Re: [PATCH 0/6] lockdown fixups
From: James Morris @ 2019-08-14 18:24 UTC (permalink / raw)
To: Matthew Garrett; +Cc: LSM List
In-Reply-To: <CACdnJuvXpJoai+H7WeM_TNk2cjqr8evAm082aJgBRGvsUyaAVw@mail.gmail.com>
On Wed, 14 Aug 2019, Matthew Garrett wrote:
> On Tue, Aug 13, 2019 at 10:06 PM James Morris <jmorris@namei.org> wrote:
> > Which kernel version are these against?
>
> Crap. Sorry, these ended up derived from HEAD. Let me fix that up and
> resend. Sorry about that!
Do you have an upstream dependency on Mimi's code in -rc?
If so, just tell me which commit it is and I'll find the next good merge
point.
--
James Morris
<jmorris@namei.org>
^ permalink raw reply
* Re: [PATCH V38 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
From: Matthew Garrett @ 2019-08-14 18:02 UTC (permalink / raw)
To: Borislav Petkov
Cc: James Morris, LSM List, Linux Kernel Mailing List, Linux API,
Josh Boyer, David Howells, Kees Cook, Dave Young, linux-acpi
In-Reply-To: <20190814174732.GD1841@zn.tnic>
On Wed, Aug 14, 2019 at 10:46 AM Borislav Petkov <bp@alien8.de> wrote:
> Yeah, ok, I see what you're doing there. AFAICT, you do that in
>
> setup_arch->acpi_boot_table_init-> ... -> acpi_os_get_root_pointer()
Right.
> I hope nothing needs it earlier because then we'll have to restructure
> again...
Passing the RSDP via command line is a pretty grotesque hack - we
should just set up boot params in kexec_file, which would leave this
as a problem for legacy kexec and nothing else.
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-14 17:51 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Colascione
Cc: Andy Lutomirski, Song Liu, Kees Cook, Networking, bpf,
Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190814005737.4qg6wh4a53vmso2v@ast-mbp>
On Tue, Aug 13, 2019 at 5:57 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> hmm. No. Kernel developers should not make any assumptions.
> They should guide their design by real use cases instead. That includes studing
> what people do now and hacks they use to workaround lack of interfaces.
> Effecitvely bpf is root only. There are no unpriv users.
> This root applications go out of their way to reduce privileges
> while they still want to use bpf. That is the need that /dev/bpf is solving.
>
> >
> > > Containers are not providing the level of security that is enough
> > > to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> > > Containers are used to make production systems safer.
> > > Some people call it more 'secure', but it's clearly not secure for
> > > arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> > > When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> > > It's been a constant source of pain. The constant blinding, randomization,
> > > verifier speculative analysis, all spectre v1, v2, v4 mitigations
> > > are simply not worth it. It's a lot of complex kernel code without users.
> >
> > Seccomp really will want eBPF some day, and it should work without
> > privilege. Maybe it should be a restricted subset of eBPF, and
> > Spectre will always be an issue until dramatically better hardware
> > shows up, but I think people will want the ability for regular
> > programs to load eBPF seccomp programs.
>
> I'm absolutely against using eBPF in seccomp.
> Precisely due to discussions like the current one.
I still think I don't really agree with your overall premise.
If eBPF is genuinely not usable by programs that are not fully trusted
by the admin, then no kernel changes at all are needed. Programs that
want to reduce their own privileges can easily fork() a privileged
subprocess or run a little helper to which they delegate BPF
operations. This is far more flexible than anything that will ever be
in the kernel because it allows the helper to verify that the rest of
the program is doing exactly what it's supposed to and restrict eBPF
operations to exactly the subset that is needed. So a container
manager or network manager that drops some provilege could have a
little bpf-helper that manages its BPF XDP, firewalling, etc
configuration. The two processes would talk over a socketpair.
The interesting cases you're talking about really *do* involved
unprivileged or less privileged eBPF, though. Let's see:
systemd --user: systemd --user *is not privileged at all*. There's no
issue of reducing privilege, since systemd --user doesn't have any
privilege to begin with. But systemd supports some eBPF features, and
presumably it would like to support them in the systemd --user case.
This is unprivileged eBPF.
Seccomp. Seccomp already uses cBPF, which is a form of BPF although
it doesn't involve the bpf() syscall. There are some seccomp
proposals in the works that will want some stuff from eBPF. In
particular, the ability to call seccomp-specific bpf functions from a
seccomp program could be very nice. Similarly, the ability to use the
enhanced instruction set and maybe even *read* maps would be nice. I
do think that seccomp will continue to want its programs to be
stateless.
So it's a bit of a chicken-and-egg situation. There aren't major
unprivileged eBPF users because the kernel support isn't there.
>
> >
> > > Hence I prefer this /dev/bpf mechanism to be as simple a possible.
> > > The applications that will use it are going to be just as trusted as systemd.
> >
> > I still don't understand your systemd example. systemd --users is not
> > trusted systemwide in any respect. The main PID 1 systemd is root.
> > No matter how you dice it, granting a user systemd instance extra bpf
> > access is tantamount to granting the user extra bpf access in general.
>
> People use systemd --user while their kernel have 'undef CONFIG_USER_NS'.
I don't know what you're getting at. I'm typing this email in a
browser running under a systemd --user instance, and there are no user
namespaces involved.
$ ps -u luto |grep systemd
1944 ? 00:00:02 systemd
$ stat /proc/1944
...
Access: (0555/dr-xr-xr-x) Uid: ( 1000/ luto) Gid: ( 1000/ luto)
Context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
$ gdb -p 1944
[snipped tons of output, but gdb works fine like this]
systemd --user is not privileged. Giving it /dev/bpf as imagined by
the current set of patches would be a gaping security hole.
>
> I think there should be no unprivileged bpf at all,
> because over all these years we've seen zero use cases.
> Hence all new features are root only.
You're the maintainer. If you feel this way, then I think you should
just drop the /dev/bpf idea entirely and have userspace manage all of
this by itself. It will remain extremely awkward for containers and
especially nested containers to use eBPF.
--Andy
^ permalink raw reply
* Re: [PATCH V38 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
From: Borislav Petkov @ 2019-08-14 17:47 UTC (permalink / raw)
To: Matthew Garrett
Cc: James Morris, LSM List, Linux Kernel Mailing List, Linux API,
Josh Boyer, David Howells, Kees Cook, Dave Young, linux-acpi
In-Reply-To: <CACdnJuuOhuw71GDwQOrPQxUexpvpZNJocr6m0dYzJw+MMaVKWQ@mail.gmail.com>
On Wed, Aug 14, 2019 at 10:14:54AM -0700, Matthew Garrett wrote:
> We explicitly don't want to pay attention to the acpi_rsdp kernel
> parameter in early boot except for the case of finding the SRAT table,
> and we only need that if CONFIG_RANDOMIZE_BASE and
> CONFIG_MEMORY_HOTREMOVE are set. However, we *do* want to tell the
> actual kernel where the RSDP is if we found it via some other means,
> so we can't just clear the boot parameters value.
Ok.
> The kernel proper will parse the command line again and will then (if
> lockdown isn't enabled) override the actual value we passed up in boot
> params.
Yeah, ok, I see what you're doing there. AFAICT, you do that in
setup_arch->acpi_boot_table_init-> ... -> acpi_os_get_root_pointer()
I hope nothing needs it earlier because then we'll have to restructure
again...
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
^ permalink raw reply
* Re: [PATCH 0/6] lockdown fixups
From: Matthew Garrett @ 2019-08-14 17:20 UTC (permalink / raw)
To: James Morris; +Cc: LSM List
In-Reply-To: <alpine.LRH.2.21.1908141503240.27654@namei.org>
On Tue, Aug 13, 2019 at 10:06 PM James Morris <jmorris@namei.org> wrote:
> Which kernel version are these against?
Crap. Sorry, these ended up derived from HEAD. Let me fix that up and
resend. Sorry about that!
^ permalink raw reply
* Re: [PATCH 3/6] Avoid build warning when !CONFIG_KEXEC_SIG
From: Matthew Garrett @ 2019-08-14 17:18 UTC (permalink / raw)
To: Dave Young; +Cc: James Morris, LSM List, Jiri Bohac, kexec devel list
In-Reply-To: <20190814052359.GA10664@dhcp-128-65.nay.redhat.com>
On Tue, Aug 13, 2019 at 10:24 PM Dave Young <dyoung@redhat.com> wrote:
> I can not get the whole thread, also not sure which tree it should
> apply.
This should be against -next.
> Also I think it should be good to split the preparation piece of kexec
> and send them separately. Since it is improve the signature
> verification logic, they do not necessarily depend on the lockdown
> series.
Sorry, which preparation piece? There shouldn't be any logic change in
this patch.
^ permalink raw reply
* Re: [PATCH V38 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
From: Matthew Garrett @ 2019-08-14 17:14 UTC (permalink / raw)
To: Borislav Petkov
Cc: James Morris, LSM List, Linux Kernel Mailing List, Linux API,
Josh Boyer, David Howells, Kees Cook, Dave Young, linux-acpi
In-Reply-To: <20190814072602.GA27836@zn.tnic>
On Wed, Aug 14, 2019 at 12:25 AM Borislav Petkov <bp@alien8.de> wrote:
> #if defined(CONFIG_RANDOMIZE_BASE) && defined(CONFIG_MEMORY_HOTREMOVE)
>
> false and thus not available to early code anymore.
We explicitly don't want to pay attention to the acpi_rsdp kernel
parameter in early boot except for the case of finding the SRAT table,
and we only need that if CONFIG_RANDOMIZE_BASE and
CONFIG_MEMORY_HOTREMOVE are set. However, we *do* want to tell the
actual kernel where the RSDP is if we found it via some other means,
so we can't just clear the boot parameters value. The kernel proper
will parse the command line again and will then (if lockdown isn't
enabled) override the actual value we passed up in boot params. So I
think this is ok?
(Sorry for not Cc:ing x86, clear oversight on my part)
^ permalink raw reply
* Re: [RFC/RFT v4 0/5] Add generic trusted keys framework/subsystem
From: Mimi Zohar @ 2019-08-14 13:24 UTC (permalink / raw)
To: Sumit Garg, keyrings, linux-integrity, linux-crypto,
linux-security-module
Cc: dhowells, herbert, davem, peterhuewe, jgg, jejb, jarkko.sakkinen,
arnd, gregkh, jmorris, serge, casey, ard.biesheuvel,
daniel.thompson, linux-kernel, tee-dev
In-Reply-To: <1565682784-10234-1-git-send-email-sumit.garg@linaro.org>
Hi Sumit,
On Tue, 2019-08-13 at 13:22 +0530, Sumit Garg wrote:
> This patch-set is an outcome of discussion here [1]. It has evolved very
> much since v1 to create, consolidate and generalize trusted keys
> subsystem.
>
> This framework has been tested with trusted keys support provided via TEE
> but I wasn't able to test it with a TPM device as I don't possess one. It
> would be really helpful if others could test this patch-set using a TPM
> device.
With the "CONFIG_HEADER_TEST" and "CONFIG_KERNEL_HEADER_TEST" config
options enabled, which is required for linux-next, it fails to build.
Mimi
^ permalink raw reply
* Re: [PATCH V38 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
From: Borislav Petkov @ 2019-08-14 7:28 UTC (permalink / raw)
To: Matthew Garrett
Cc: jmorris, linux-security-module, linux-kernel, linux-api,
Josh Boyer, David Howells, Matthew Garrett, Kees Cook, Dave Young,
linux-acpi, x86-ml
In-Reply-To: <20190808000721.124691-16-matthewgarrett@google.com>
On Wed, Aug 07, 2019 at 05:07:07PM -0700, Matthew Garrett wrote:
> From: Josh Boyer <jwboyer@redhat.com>
>
> This option allows userspace to pass the RSDP address to the kernel, which
> makes it possible for a user to modify the workings of hardware. Reject
> the option when the kernel is locked down. This requires some reworking
> of the existing RSDP command line logic, since the early boot code also
> makes use of a command-line passed RSDP when locating the SRAT table
> before the lockdown code has been initialised. This is achieved by
> separating the command line RSDP path in the early boot code from the
> generic RSDP path, and then copying the command line RSDP into boot
> params in the kernel proper if lockdown is not enabled. If lockdown is
> enabled and an RSDP is provided on the command line, this will only be
> used when parsing SRAT (which shouldn't permit kernel code execution)
> and will be ignored in the rest of the kernel.
>
> (Modified by Matthew Garrett in order to handle the early boot RSDP
> environment)
>
> Signed-off-by: Josh Boyer <jwboyer@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> cc: Dave Young <dyoung@redhat.com>
> cc: linux-acpi@vger.kernel.org
> ---
> arch/x86/boot/compressed/acpi.c | 19 +++++++++++++------
> arch/x86/include/asm/acpi.h | 9 +++++++++
> arch/x86/include/asm/x86_init.h | 2 ++
> arch/x86/kernel/acpi/boot.c | 5 +++++
> arch/x86/kernel/x86_init.c | 1 +
> drivers/acpi/osl.c | 14 +++++++++++++-
> include/linux/acpi.h | 6 ++++++
> 7 files changed, 49 insertions(+), 7 deletions(-)
Also, why isn't an x86 person CCed here? This clearly needs an x86
maintainer's ACK before it goes in?
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
^ permalink raw reply
* Re: [PATCH V38 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
From: Borislav Petkov @ 2019-08-14 7:26 UTC (permalink / raw)
To: Matthew Garrett
Cc: jmorris, linux-security-module, linux-kernel, linux-api,
Josh Boyer, David Howells, Matthew Garrett, Kees Cook, Dave Young,
linux-acpi
In-Reply-To: <20190808000721.124691-16-matthewgarrett@google.com>
On Wed, Aug 07, 2019 at 05:07:07PM -0700, Matthew Garrett wrote:
> From: Josh Boyer <jwboyer@redhat.com>
>
> This option allows userspace to pass the RSDP address to the kernel, which
> makes it possible for a user to modify the workings of hardware. Reject
> the option when the kernel is locked down. This requires some reworking
> of the existing RSDP command line logic, since the early boot code also
> makes use of a command-line passed RSDP when locating the SRAT table
> before the lockdown code has been initialised. This is achieved by
> separating the command line RSDP path in the early boot code from the
> generic RSDP path, and then copying the command line RSDP into boot
> params in the kernel proper if lockdown is not enabled. If lockdown is
> enabled and an RSDP is provided on the command line, this will only be
> used when parsing SRAT (which shouldn't permit kernel code execution)
> and will be ignored in the rest of the kernel.
>
> (Modified by Matthew Garrett in order to handle the early boot RSDP
> environment)
>
> Signed-off-by: Josh Boyer <jwboyer@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> cc: Dave Young <dyoung@redhat.com>
> cc: linux-acpi@vger.kernel.org
> ---
> arch/x86/boot/compressed/acpi.c | 19 +++++++++++++------
> arch/x86/include/asm/acpi.h | 9 +++++++++
> arch/x86/include/asm/x86_init.h | 2 ++
> arch/x86/kernel/acpi/boot.c | 5 +++++
> arch/x86/kernel/x86_init.c | 1 +
> drivers/acpi/osl.c | 14 +++++++++++++-
> include/linux/acpi.h | 6 ++++++
> 7 files changed, 49 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index 15255f388a85..149795c369f2 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -26,7 +26,7 @@ struct mem_vector immovable_mem[MAX_NUMNODES*2];
> */
> #define MAX_ADDR_LEN 19
>
> -static acpi_physical_address get_acpi_rsdp(void)
> +static acpi_physical_address get_cmdline_acpi_rsdp(void)
> {
> acpi_physical_address addr = 0;
>
> @@ -278,10 +278,7 @@ acpi_physical_address get_rsdp_addr(void)
> {
> acpi_physical_address pa;
>
> - pa = get_acpi_rsdp();
> -
> - if (!pa)
> - pa = boot_params->acpi_rsdp_addr;
AFAICT, this looks like it would break current usage: get_rsdp_addr()
needs to call get_acpi_rsdp() which you've now called
get_cmdline_acpi_rsdp() to parse "acpi_rsdp=" but it ain't happening
anymore.
Where the parsing is happening now is in get_acpi_srat_table() which is
not present in configs with
#if defined(CONFIG_RANDOMIZE_BASE) && defined(CONFIG_MEMORY_HOTREMOVE)
false and thus not available to early code anymore.
I think the cleaner/easier approach would be to clear
boot_params->acpi_rsdp_addr after SRAT has been parsed in lockdown
configurations so that nothing else uses the supplied RDSP address
anymore. But I could very well be missing something...
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
^ permalink raw reply
* [PATCH] tracefs: Fix NULL pointer dereference when no lockdown is used
From: Marek Szyprowski @ 2019-08-14 6:12 UTC (permalink / raw)
To: linux-kernel
Cc: Marek Szyprowski, Matthew Garrett, Steven Rostedt, James Morris,
Krzysztof Kozlowski, Bartlomiej Zolnierkiewicz,
linux-security-module, linux-api, Matthew Garrett
In-Reply-To: <3028ed35-3b6d-459f-f3c8-103c5636fe95@samsung.com>
Commit 757ff7244358 ("tracefs: Restrict tracefs when the kernel is locked
down") added infrastructure for restricting tracefs access when lockdown
is enabled. It however broke tracefs operation when no lockdown is used.
Fix this issue by adding missing check for a NULL ->open() callback.
Fixes: 757ff7244358 ("tracefs: Restrict tracefs when the kernel is locked down")
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
fs/tracefs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 12a325fb4cbd..8efff7603032 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -43,7 +43,7 @@ static int default_open_file(struct inode *inode, struct file *filp)
return ret;
real_fops = dentry->d_fsdata;
- return real_fops->open(inode, filp);
+ return real_fops->open ? real_fops->open(inode, filp) : 0;
}
static ssize_t default_read_file(struct file *file, char __user *buf,
--
2.17.1
^ permalink raw reply related
* Re: [PATCH 3/6] Avoid build warning when !CONFIG_KEXEC_SIG
From: Dave Young @ 2019-08-14 5:23 UTC (permalink / raw)
To: Matthew Garrett
Cc: jmorris, linux-security-module, Matthew Garrett, Jiri Bohac,
kexec
In-Reply-To: <20190813192126.122370-4-matthewgarrett@google.com>
On 08/13/19 at 12:21pm, Matthew Garrett wrote:
> Refactor the signature validation and lockdown integration a little in
> order to avoid an unused variable.
>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> Cc: Jiri Bohac <jbohac@suse.cz>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: kexec@lists.infradead.org
> ---
> kernel/kexec_file.c | 72 ++++++++++++++++++++++++++++-----------------
> 1 file changed, 45 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index be0c13076056..e878587715b9 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -177,36 +177,13 @@ void kimage_file_post_load_cleanup(struct kimage *image)
> image->image_loader_data = NULL;
> }
>
> -/*
> - * In file mode list of segments is prepared by kernel. Copy relevant
> - * data from user space, do error checking, prepare segment list
> - */
> +#ifdef CONFIG_KEXEC_SIG
> static int
> -kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> - const char __user *cmdline_ptr,
> - unsigned long cmdline_len, unsigned flags)
> +kimage_validate_signature(struct kimage *image)
> {
> const char *reason;
> int ret;
> - void *ldata;
> - loff_t size;
> -
> - ret = kernel_read_file_from_fd(kernel_fd, &image->kernel_buf,
> - &size, INT_MAX, READING_KEXEC_IMAGE);
> - if (ret)
> - return ret;
> - image->kernel_buf_len = size;
> -
> - /* IMA needs to pass the measurement list to the next kernel. */
> - ima_add_kexec_buffer(image);
>
> - /* Call arch image probe handlers */
> - ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
> - image->kernel_buf_len);
> - if (ret)
> - goto out;
> -
> -#ifdef CONFIG_KEXEC_SIG
> ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
> image->kernel_buf_len);
> switch (ret) {
> @@ -228,7 +205,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> decide:
> if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
> pr_notice("%s rejected\n", reason);
> - goto out;
> + break;
> }
>
> ret = 0;
> @@ -251,9 +228,44 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> */
> default:
> pr_notice("kernel signature verification failed (%d).\n", ret);
> - goto out;
> + break;
> }
> +
> + return ret;
> +}
> +#endif
> +
> +/*
> + * In file mode list of segments is prepared by kernel. Copy relevant
> + * data from user space, do error checking, prepare segment list
> + */
> +static int
> +kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> + const char __user *cmdline_ptr,
> + unsigned long cmdline_len, unsigned flags)
> +{
> + int ret;
> + void *ldata;
> + loff_t size;
> +
> + ret = kernel_read_file_from_fd(kernel_fd, &image->kernel_buf,
> + &size, INT_MAX, READING_KEXEC_IMAGE);
> + if (ret)
> + return ret;
> + image->kernel_buf_len = size;
> +
> + /* Call arch image probe handlers */
> + ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
> + image->kernel_buf_len);
> + if (ret)
> + goto out;
> +
> +#ifdef CONFIG_KEXEC_SIG
> + ret = kimage_validate_signature(image);
> + if (ret)
> + goto out;
> #endif
> +
> /* It is possible that there no initramfs is being loaded */
> if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
> ret = kernel_read_file_from_fd(initrd_fd, &image->initrd_buf,
> @@ -279,8 +291,14 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> ret = -EINVAL;
> goto out;
> }
> +
> + ima_kexec_cmdline(image->cmdline_buf,
> + image->cmdline_buf_len - 1);
> }
>
> + /* IMA needs to pass the measurement list to the next kernel. */
> + ima_add_kexec_buffer(image);
> +
> /* Call arch image load handlers */
> ldata = arch_kexec_kernel_image_load(image);
>
I can not get the whole thread, also not sure which tree it should
apply.
I assume it is based on your series for lockdown. But it is really hard
to review because of no context in this patch.
Also I think it should be good to split the preparation piece of kexec
and send them separately. Since it is improve the signature
verification logic, they do not necessarily depend on the lockdown
series.
Thanks
Dave
^ permalink raw reply
* Re: [PATCH 0/6] lockdown fixups
From: James Morris @ 2019-08-14 5:06 UTC (permalink / raw)
To: Matthew Garrett; +Cc: linux-security-module
In-Reply-To: <20190813192126.122370-1-matthewgarrett@google.com>
[-- Attachment #1: Type: text/plain, Size: 851 bytes --]
On Tue, 13 Aug 2019, Matthew Garrett wrote:
> Fixups for the lockdown patchset. Potentially makes more sense to merge
> these into the relevant original patches rather than keeping them
> separate, but sending them for review.
Which kernel version are these against?
kernel/kexec_file.c: In function ‘kimage_validate_signature’:
kernel/kexec_file.c:220:4: error: label ‘out’ used but not defined
220 | goto out;
| ^~~~
kernel/kexec_file.c: In function ‘kimage_file_prepare_segments’:
kernel/kexec_file.c:295:3: error: implicit declaration of function
‘ima_kexec_cmdline’ [-Werror=implicit-function-declaration]
295 | ima_kexec_cmdline(image->cmdline_buf,
| ^~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
My next-lockdown branch is based on v5.2.
--
James Morris
<jmorris@namei.org>
^ permalink raw reply
* Re: [PATCH V38 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
From: Dave Young @ 2019-08-14 2:51 UTC (permalink / raw)
To: Matthew Garrett
Cc: jmorris, linux-security-module, linux-kernel, linux-api,
Josh Boyer, bhe, kasong, Borislav Petkov, David Howells,
Matthew Garrett, Kees Cook, linux-acpi
In-Reply-To: <20190808000721.124691-16-matthewgarrett@google.com>
On 08/07/19 at 05:07pm, Matthew Garrett wrote:
> From: Josh Boyer <jwboyer@redhat.com>
>
> This option allows userspace to pass the RSDP address to the kernel, which
> makes it possible for a user to modify the workings of hardware. Reject
> the option when the kernel is locked down. This requires some reworking
> of the existing RSDP command line logic, since the early boot code also
> makes use of a command-line passed RSDP when locating the SRAT table
> before the lockdown code has been initialised. This is achieved by
> separating the command line RSDP path in the early boot code from the
> generic RSDP path, and then copying the command line RSDP into boot
> params in the kernel proper if lockdown is not enabled. If lockdown is
> enabled and an RSDP is provided on the command line, this will only be
> used when parsing SRAT (which shouldn't permit kernel code execution)
> and will be ignored in the rest of the kernel.
>
> (Modified by Matthew Garrett in order to handle the early boot RSDP
> environment)
>
> Signed-off-by: Josh Boyer <jwboyer@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> cc: Dave Young <dyoung@redhat.com>
> cc: linux-acpi@vger.kernel.org
> ---
> arch/x86/boot/compressed/acpi.c | 19 +++++++++++++------
> arch/x86/include/asm/acpi.h | 9 +++++++++
> arch/x86/include/asm/x86_init.h | 2 ++
> arch/x86/kernel/acpi/boot.c | 5 +++++
> arch/x86/kernel/x86_init.c | 1 +
> drivers/acpi/osl.c | 14 +++++++++++++-
> include/linux/acpi.h | 6 ++++++
> 7 files changed, 49 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index 15255f388a85..149795c369f2 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -26,7 +26,7 @@ struct mem_vector immovable_mem[MAX_NUMNODES*2];
> */
> #define MAX_ADDR_LEN 19
>
> -static acpi_physical_address get_acpi_rsdp(void)
> +static acpi_physical_address get_cmdline_acpi_rsdp(void)
> {
> acpi_physical_address addr = 0;
>
> @@ -278,10 +278,7 @@ acpi_physical_address get_rsdp_addr(void)
> {
> acpi_physical_address pa;
>
> - pa = get_acpi_rsdp();
> -
> - if (!pa)
> - pa = boot_params->acpi_rsdp_addr;
> + pa = boot_params->acpi_rsdp_addr;
>
> /*
> * Try to get EFI data from setup_data. This can happen when we're a
> @@ -311,7 +308,17 @@ static unsigned long get_acpi_srat_table(void)
> char arg[10];
> u8 *entry;
>
> - rsdp = (struct acpi_table_rsdp *)(long)boot_params->acpi_rsdp_addr;
> + /*
> + * Check whether we were given an RSDP on the command line. We don't
> + * stash this in boot params because the kernel itself may have
> + * different ideas about whether to trust a command-line parameter.
> + */
> + rsdp = (struct acpi_table_rsdp *)get_cmdline_acpi_rsdp();
> +
> + if (!rsdp)
> + rsdp = (struct acpi_table_rsdp *)(long)
> + boot_params->acpi_rsdp_addr;
> +
> if (!rsdp)
> return 0;
>
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
> index aac686e1e005..bc9693c9107e 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -117,6 +117,12 @@ static inline bool acpi_has_cpu_in_madt(void)
> return !!acpi_lapic;
> }
>
> +#define ACPI_HAVE_ARCH_SET_ROOT_POINTER
> +static inline void acpi_arch_set_root_pointer(u64 addr)
> +{
> + x86_init.acpi.set_root_pointer(addr);
> +}
> +
> #define ACPI_HAVE_ARCH_GET_ROOT_POINTER
> static inline u64 acpi_arch_get_root_pointer(void)
> {
> @@ -125,6 +131,7 @@ static inline u64 acpi_arch_get_root_pointer(void)
>
> void acpi_generic_reduced_hw_init(void);
>
> +void x86_default_set_root_pointer(u64 addr);
> u64 x86_default_get_root_pointer(void);
>
> #else /* !CONFIG_ACPI */
> @@ -138,6 +145,8 @@ static inline void disable_acpi(void) { }
>
> static inline void acpi_generic_reduced_hw_init(void) { }
>
> +static inline void x86_default_set_root_pointer(u64 addr) { }
> +
> static inline u64 x86_default_get_root_pointer(void)
> {
> return 0;
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index ac0934189017..19435858df5f 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -134,10 +134,12 @@ struct x86_hyper_init {
>
> /**
> * struct x86_init_acpi - x86 ACPI init functions
> + * @set_root_poitner: set RSDP address
> * @get_root_pointer: get RSDP address
> * @reduced_hw_early_init: hardware reduced platform early init
> */
> struct x86_init_acpi {
> + void (*set_root_pointer)(u64 addr);
> u64 (*get_root_pointer)(void);
> void (*reduced_hw_early_init)(void);
> };
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 17b33ef604f3..04205ce127a1 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1760,6 +1760,11 @@ void __init arch_reserve_mem_area(acpi_physical_address addr, size_t size)
> e820__update_table_print();
> }
>
> +void x86_default_set_root_pointer(u64 addr)
> +{
> + boot_params.acpi_rsdp_addr = addr;
> +}
> +
> u64 x86_default_get_root_pointer(void)
> {
> return boot_params.acpi_rsdp_addr;
> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 1bef687faf22..18a799c8fa28 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -95,6 +95,7 @@ struct x86_init_ops x86_init __initdata = {
> },
>
> .acpi = {
> + .set_root_pointer = x86_default_set_root_pointer,
> .get_root_pointer = x86_default_get_root_pointer,
> .reduced_hw_early_init = acpi_generic_reduced_hw_init,
> },
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 9c0edf2fc0dd..d43df3a3fa8d 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -26,6 +26,7 @@
> #include <linux/list.h>
> #include <linux/jiffies.h>
> #include <linux/semaphore.h>
> +#include <linux/security.h>
>
> #include <asm/io.h>
> #include <linux/uaccess.h>
> @@ -180,8 +181,19 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
> acpi_physical_address pa;
>
> #ifdef CONFIG_KEXEC
> - if (acpi_rsdp)
> + /*
> + * We may have been provided with an RSDP on the command line,
> + * but if a malicious user has done so they may be pointing us
> + * at modified ACPI tables that could alter kernel behaviour -
> + * so, we check the lockdown status before making use of
> + * it. If we trust it then also stash it in an architecture
> + * specific location (if appropriate) so it can be carried
> + * over further kexec()s.
> + */
> + if (acpi_rsdp && !security_locked_down(LOCKDOWN_ACPI_TABLES)) {
> + acpi_arch_set_root_pointer(acpi_rsdp);
> return acpi_rsdp;
> + }
> #endif
> pa = acpi_arch_get_root_pointer();
> if (pa)
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index e40e1e27ed8e..6b35f2f4cab3 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -643,6 +643,12 @@ bool acpi_gtdt_c3stop(int type);
> int acpi_arch_timer_mem_init(struct arch_timer_mem *timer_mem, int *timer_count);
> #endif
>
> +#ifndef ACPI_HAVE_ARCH_SET_ROOT_POINTER
> +static inline void acpi_arch_set_root_pointer(u64 addr)
> +{
> +}
> +#endif
> +
> #ifndef ACPI_HAVE_ARCH_GET_ROOT_POINTER
> static inline u64 acpi_arch_get_root_pointer(void)
> {
> --
> 2.22.0.770.g0f2c4a37fd-goog
>
I'm not sure why this series get posted quickly again. As for this
patch it was restructured according the early rsdp parsing code, so
personally I think keep the original Reviewed-by is questionable and it
needs another review. But I did not find time to read it carefully.
Add several cc to void it slipped.
Thanks
Dave
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-14 0:57 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Song Liu, Kees Cook, Networking, bpf, Alexei Starovoitov,
Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH,
Linux API, LSM List
In-Reply-To: <CALCETrVT-dDXQGukGs5S1DkzvQv9_e=axzr_GyEd2c4T4z8Qng@mail.gmail.com>
On Tue, Aug 13, 2019 at 04:06:00PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > > >
> > > > Inside containers and inside nested containers we need to start processes
> > > > that will use bpf. All of the processes are trusted.
> > >
> > > Trusted by whom? In a non-nested container, the container manager
> > > *might* be trusted by the outside world. In a *nested* container,
> > > unless the inner container management is controlled from outside the
> > > outer container, it's not trusted. I don't know much about how
> > > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > > very strongly toward user namespaces and maximally-untrusted
> > > containers, and I think bpf() should work in that context.
> >
> > agree that containers (namespaces) reduce amount of trust necessary
> > for apps to run, but the end goal is not security though.
> > Linux has become a single user system.
> > If user can ssh into the host they can become root.
> > If arbitrary code can run on the host it will be break out of any sandbox.
>
> I would argue that this is a reasonable assumption to make if you're
> designing a system using Linux, but it's not a valid assumption to
> make as kernel developers. Otherwise we should just give everyone
> CAP_SYS_ADMIN and call it a day. There really is a difference between
> root and non-root.
hmm. No. Kernel developers should not make any assumptions.
They should guide their design by real use cases instead. That includes studing
what people do now and hacks they use to workaround lack of interfaces.
Effecitvely bpf is root only. There are no unpriv users.
This root applications go out of their way to reduce privileges
while they still want to use bpf. That is the need that /dev/bpf is solving.
>
> > Containers are not providing the level of security that is enough
> > to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> > Containers are used to make production systems safer.
> > Some people call it more 'secure', but it's clearly not secure for
> > arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> > When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> > It's been a constant source of pain. The constant blinding, randomization,
> > verifier speculative analysis, all spectre v1, v2, v4 mitigations
> > are simply not worth it. It's a lot of complex kernel code without users.
>
> Seccomp really will want eBPF some day, and it should work without
> privilege. Maybe it should be a restricted subset of eBPF, and
> Spectre will always be an issue until dramatically better hardware
> shows up, but I think people will want the ability for regular
> programs to load eBPF seccomp programs.
I'm absolutely against using eBPF in seccomp.
Precisely due to discussions like the current one.
>
> > Hence I prefer this /dev/bpf mechanism to be as simple a possible.
> > The applications that will use it are going to be just as trusted as systemd.
>
> I still don't understand your systemd example. systemd --users is not
> trusted systemwide in any respect. The main PID 1 systemd is root.
> No matter how you dice it, granting a user systemd instance extra bpf
> access is tantamount to granting the user extra bpf access in general.
People use systemd --user while their kernel have 'undef CONFIG_USER_NS'.
> It sounds to me like you're thinking of eBPF as a feature a bit like
> unprivileged user namespaces: *in principle*, it's supposed to be safe
> to give any unprivileged process the ability to use it, and you
> consider security flaws in it to be bugs worth fixing. But you think
> it's a large attack surface and that most unprivileged programs
> shouldn't be allowed to use it. Is that reasonable?
I think there should be no unprivileged bpf at all,
because over all these years we've seen zero use cases.
Hence all new features are root only.
LPM map is a prime example. There was not a single security bug in there.
There were few functional bugs, but not security issues.
These bugs didn't crash the kernel and didn't expose any data.
Yet we still keep LPM as root only.
Can we flip the switch and make it non-root? It's trivial single line patch ?
and security risk is very low?
Nope, since it will not address the underlying issue.
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-13 23:24 UTC (permalink / raw)
To: Daniel Colascione
Cc: Alexei Starovoitov, Andy Lutomirski, Song Liu, Kees Cook,
Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <CAKOZuev8XY5+shG8SiWcx4z12QnkgzhcUqCHs9t+eV2z-6nzPA@mail.gmail.com>
On Tue, Aug 13, 2019 at 3:27 PM Daniel Colascione <dancol@google.com> wrote:
>
> On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > > >
> > > > Inside containers and inside nested containers we need to start processes
> > > > that will use bpf. All of the processes are trusted.
> > >
> > > Trusted by whom? In a non-nested container, the container manager
> > > *might* be trusted by the outside world. In a *nested* container,
> > > unless the inner container management is controlled from outside the
> > > outer container, it's not trusted. I don't know much about how
> > > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > > very strongly toward user namespaces and maximally-untrusted
> > > containers, and I think bpf() should work in that context.
> >
> > agree that containers (namespaces) reduce amount of trust necessary
> > for apps to run, but the end goal is not security though.
> > Linux has become a single user system.
> > If user can ssh into the host they can become root.
> > If arbitrary code can run on the host it will be break out of any sandbox.
> > Containers are not providing the level of security that is enough
> > to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> > Containers are used to make production systems safer.
> > Some people call it more 'secure', but it's clearly not secure for
> > arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> > When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> > It's been a constant source of pain. The constant blinding, randomization,
> > verifier speculative analysis, all spectre v1, v2, v4 mitigations
> > are simply not worth it. It's a lot of complex kernel code without users.
> > There is not a single use case to allow arbitrary malicious bpf
> > program to be loaded and executed.
> > As soon as we have /dev/bpf to allow all of bpf to be used without root
> > we will set sysctl kernel.unprivileged_bpf_disabled=1
> > Hence I prefer this /dev/bpf mechanism to be as simple a possible.
> > The applications that will use it are going to be just as trusted as systemd.
> >
> > > > To solve your concern of bypassing all capable checks...
> > > > How about we do /dev/bpf/full_verifier first?
> > > > It will replace capable() checks in the verifier only.
> > >
> > > I'm not convinced that "in the verifier" is the right distinction.
> > > Telling administrators that some setting lets certain users bypass
> > > bpf() verifier checks doesn't have a clear enough meaning.
> >
> > linux is a single user system. there are no administrators any more.
> > No doubt, folks will disagree, but that game is over.
> > At least on bpf side it's done.
> >
> > > I propose,
> > > instead, that the current capable() checks be divided into three
> > > categories:
> >
> > I don't see a use case for these categories.
> > All bpf programs extend the kernel in some way.
> > The kernel vs user is one category.
> > Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
> > When application has CAP_NET_ADMIN it covers all of networking knobs.
> > There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
> > CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
> > Similarly CAP_BPF as the only knob is enough.
> > The only disadvantage of CAP_BPF is that it's not possible to
> > pass it from one systemd-like daemon to another systemd-like daemon.
> > Hence /dev/bpf idea and passing file descriptor.
> >
> > > This type of thing actually fits quite nicely into an idea I've been
> > > thinking about for a while called "implicit rights". In very brief
> > > summary, there would be objects called /dev/rights/xyz, where xyz is
> > > the same of a "right". If there is a readable object of the right
> > > type at the literal path "/dev/rights/xyz", then you have right xyz.
> > > There's a bit more flexibility on top of this. BPF could use
> > > /dev/rights/bpf/maptypes/lpm and
> > > /dev/rights/bpf/verifier/bounded_loops, for example. Other non-BPF
> > > use cases include a biggie:
> > > /dev/rights/namespace/create_unprivileged_userns.
> > > /dev/rights/bind_port/80 would be nice, too.
> >
> > The concept of "implicit rights" is very nice and I'm sure it will
> > be a good fit somewhere, but I don't see why use it in bpf space.
> > There is no use case for fine grain partition of bpf features.
>
> Isn't this "implicit rights" model just another kind of ambient
> authority --- one that constrains the otherwise-free filesystem
> namespace to boot?
Yes.
> IMHO, the kernel should be moving toward explicit
> authorization tokens modeled by file descriptors and away from
> contextual authorization decisions.
And yes, I agree there too. Here's how I think about it: there are
really two layers here:
Rights: these are objects like /dev/rights/bpf/some_bpf_privilege or
/dev/rights/namespace/unpriv_userns, and you would, ideally, use them
like genuine capabilities. You'd pass an fd with appropriate access
(FMODE_READ, presumably, since exec is awkward to work with for fds)
into bpf() or similar, and the kernel would say "yep, caller has the
capability" and do something. There's nothing really restricting them
to /dev/rights, but they more or less have to live on a memory-backed
file system (a real backing store has all kinds of issues), and
putting them in /dev gets a lot of nifty properties for free. For
example, existing container systems that don't know about them will
automatically deny them to containers, since nothing with an ounce of
sense passes all of /dev through to a container. But container
systems that are aware of them can bind-mount them into the container.
And /dev is already known to be magical due to things like
/dev/urandom.
The implicit part on top is less than ideal, but it solves two problems:
1. It keeps compatibility with existing code. There are programs that
expect unshare(CLONE_NEWUSER) to work -- with *implicit* rights, it
will work exactly when it's supposed to. Also, for cases like
CLONE_NEWUSER, it does have more or less the right semantics -- if
they were explicit, most programs would just try to open
/dev/rights/namespace/unpriv_userns and pass the fd to unshare2, so
we're not losing much.
2. For things like eBPF where the set of rights could be a moving
target, implicit rights lets the model evolve without breaking
userspace. So if LPM maps eventually become bulletproof and a right
is no longer needed, it still works. Or if some feature in the
verifier that is currently unrestricted were subsequently deemed to
need restrictions, they could be added without retrofitting all the
users.
There are cases where implicit rights would be totally inappropriate.
For example, a CAP_DAC_READ_SEARCH right could not be safely made
implicit. In general, I think the implicit model works for system
calls where it's unambiguous what the caller wants to have happen and
there, depending on privilege level, it either works or it doesn't.
So, for accessing a filesystem, it's not at all obvious whether a
program is accessing it on its own behalf or on a client's behalf, and
privilege usage should be explicit. For something like "don't
Spectre-mitigate this eBPF program", the semantics change and the
request should IMO be explicit. For for "create an LPM map", I don't
see how a confused deputy is likely, and an implicit right seems
reasonable. Similarly, for creating a namespace or binding a network
port, confused deputies seem unlikely. (For connecting to a network
address, if such a thing were ever restricted, confused deputies are
definitely possible and happen all the time, e.g. under a DNS
rebinding attack.)
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-13 23:06 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andy Lutomirski, Song Liu, Kees Cook, Networking, bpf,
Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190813215823.3sfbakzzjjykyng2@ast-mbp>
On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > >
> > > Inside containers and inside nested containers we need to start processes
> > > that will use bpf. All of the processes are trusted.
> >
> > Trusted by whom? In a non-nested container, the container manager
> > *might* be trusted by the outside world. In a *nested* container,
> > unless the inner container management is controlled from outside the
> > outer container, it's not trusted. I don't know much about how
> > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > very strongly toward user namespaces and maximally-untrusted
> > containers, and I think bpf() should work in that context.
>
> agree that containers (namespaces) reduce amount of trust necessary
> for apps to run, but the end goal is not security though.
> Linux has become a single user system.
> If user can ssh into the host they can become root.
> If arbitrary code can run on the host it will be break out of any sandbox.
I would argue that this is a reasonable assumption to make if you're
designing a system using Linux, but it's not a valid assumption to
make as kernel developers. Otherwise we should just give everyone
CAP_SYS_ADMIN and call it a day. There really is a difference between
root and non-root.
> Containers are not providing the level of security that is enough
> to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> Containers are used to make production systems safer.
> Some people call it more 'secure', but it's clearly not secure for
> arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> It's been a constant source of pain. The constant blinding, randomization,
> verifier speculative analysis, all spectre v1, v2, v4 mitigations
> are simply not worth it. It's a lot of complex kernel code without users.
Seccomp really will want eBPF some day, and it should work without
privilege. Maybe it should be a restricted subset of eBPF, and
Spectre will always be an issue until dramatically better hardware
shows up, but I think people will want the ability for regular
programs to load eBPF seccomp programs.
> Hence I prefer this /dev/bpf mechanism to be as simple a possible.
> The applications that will use it are going to be just as trusted as systemd.
I still don't understand your systemd example. systemd --users is not
trusted systemwide in any respect. The main PID 1 systemd is root.
No matter how you dice it, granting a user systemd instance extra bpf
access is tantamount to granting the user extra bpf access in general.
It sounds to me like you're thinking of eBPF as a feature a bit like
unprivileged user namespaces: *in principle*, it's supposed to be safe
to give any unprivileged process the ability to use it, and you
consider security flaws in it to be bugs worth fixing. But you think
it's a large attack surface and that most unprivileged programs
shouldn't be allowed to use it. Is that reasonable?
>
> > > To solve your concern of bypassing all capable checks...
> > > How about we do /dev/bpf/full_verifier first?
> > > It will replace capable() checks in the verifier only.
> >
> > I'm not convinced that "in the verifier" is the right distinction.
> > Telling administrators that some setting lets certain users bypass
> > bpf() verifier checks doesn't have a clear enough meaning.
>
> linux is a single user system. there are no administrators any more.
> No doubt, folks will disagree, but that game is over.
> At least on bpf side it's done.
>
> > I propose,
> > instead, that the current capable() checks be divided into three
> > categories:
>
> I don't see a use case for these categories.
> All bpf programs extend the kernel in some way.
> The kernel vs user is one category.
> Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
> When application has CAP_NET_ADMIN it covers all of networking knobs.
> There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
> CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
> Similarly CAP_BPF as the only knob is enough.
> The only disadvantage of CAP_BPF is that it's not possible to
> pass it from one systemd-like daemon to another systemd-like daemon.
> Hence /dev/bpf idea and passing file descriptor.
>
> > This type of thing actually fits quite nicely into an idea I've been
> > thinking about for a while called "implicit rights". In very brief
> > summary, there would be objects called /dev/rights/xyz, where xyz is
> > the same of a "right". If there is a readable object of the right
> > type at the literal path "/dev/rights/xyz", then you have right xyz.
> > There's a bit more flexibility on top of this. BPF could use
> > /dev/rights/bpf/maptypes/lpm and
> > /dev/rights/bpf/verifier/bounded_loops, for example. Other non-BPF
> > use cases include a biggie:
> > /dev/rights/namespace/create_unprivileged_userns.
> > /dev/rights/bind_port/80 would be nice, too.
>
> The concept of "implicit rights" is very nice and I'm sure it will
> be a good fit somewhere, but I don't see why use it in bpf space.
> There is no use case for fine grain partition of bpf features.
>
^ permalink raw reply
* Re: [PATCH 0/6] lockdown fixups
From: James Morris @ 2019-08-13 22:59 UTC (permalink / raw)
To: Matthew Garrett; +Cc: linux-security-module
In-Reply-To: <20190813192126.122370-1-matthewgarrett@google.com>
On Tue, 13 Aug 2019, Matthew Garrett wrote:
> Fixups for the lockdown patchset. Potentially makes more sense to merge
> these into the relevant original patches rather than keeping them
> separate, but sending them for review.
Yep, it's difficult to guess which Linus would prefer, as this is
also public development history now.
--
James Morris
<jmorris@namei.org>
^ permalink raw reply
* Re: [RFC PATCH v5 1/1] Add dm verity root hash pkcs7 sig validation.
From: Jaskaran Singh Khurana @ 2019-08-13 18:49 UTC (permalink / raw)
To: Mike Snitzer
Cc: gmazyland, linux-security-module, linux-kernel, linux-integrity,
linux-fsdevel, scottsh, ebiggers, jmorris, dm-devel, mpatocka,
agk
In-Reply-To: <20190625182004.GA32075@redhat.com>
Hello Mike,
On Tue, 25 Jun 2019, Mike Snitzer wrote:
> On Wed, Jun 19 2019 at 3:10pm -0400,
> Jaskaran Khurana <jaskarankhurana@linux.microsoft.com> wrote:
>
>> The verification is to support cases where the roothash is not secured by
>> Trusted Boot, UEFI Secureboot or similar technologies.
>> One of the use cases for this is for dm-verity volumes mounted after boot,
>> the root hash provided during the creation of the dm-verity volume has to
>> be secure and thus in-kernel validation implemented here will be used
>> before we trust the root hash and allow the block device to be created.
>>
>> The signature being provided for verification must verify the root hash and
>> must be trusted by the builtin keyring for verification to succeed.
>>
>> The hash is added as a key of type "user" and the description is passed to
>> the kernel so it can look it up and use it for verification.
>>
>> Kernel commandline parameter will indicate whether to check (only if
>> specified) or force (for all dm verity volumes) roothash signature
>> verification.
>>
>> Kernel commandline: dm_verity.verify_sig=1 or 2 for check/force root hash
>> signature validation respectively.
>>
>> Signed-off-by: Jaskaran Khurana <jaskarankhurana@linux.microsoft.com>
>
> Milan and/or others: could you please provide review and if you're OK
> with this patch respond accordingly?
>
The v7 of this patch was Reviewed and Tested by Milan Broz. Could you tell
me when this will be merged/next steps, if required I can post the patches
again.
> Thanks,
> Mike
>
Regards,
Jaskaran
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Daniel Colascione @ 2019-08-13 22:26 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andy Lutomirski, Song Liu, Kees Cook, Networking, bpf,
Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190813215823.3sfbakzzjjykyng2@ast-mbp>
On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > >
> > > Inside containers and inside nested containers we need to start processes
> > > that will use bpf. All of the processes are trusted.
> >
> > Trusted by whom? In a non-nested container, the container manager
> > *might* be trusted by the outside world. In a *nested* container,
> > unless the inner container management is controlled from outside the
> > outer container, it's not trusted. I don't know much about how
> > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > very strongly toward user namespaces and maximally-untrusted
> > containers, and I think bpf() should work in that context.
>
> agree that containers (namespaces) reduce amount of trust necessary
> for apps to run, but the end goal is not security though.
> Linux has become a single user system.
> If user can ssh into the host they can become root.
> If arbitrary code can run on the host it will be break out of any sandbox.
> Containers are not providing the level of security that is enough
> to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> Containers are used to make production systems safer.
> Some people call it more 'secure', but it's clearly not secure for
> arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> It's been a constant source of pain. The constant blinding, randomization,
> verifier speculative analysis, all spectre v1, v2, v4 mitigations
> are simply not worth it. It's a lot of complex kernel code without users.
> There is not a single use case to allow arbitrary malicious bpf
> program to be loaded and executed.
> As soon as we have /dev/bpf to allow all of bpf to be used without root
> we will set sysctl kernel.unprivileged_bpf_disabled=1
> Hence I prefer this /dev/bpf mechanism to be as simple a possible.
> The applications that will use it are going to be just as trusted as systemd.
>
> > > To solve your concern of bypassing all capable checks...
> > > How about we do /dev/bpf/full_verifier first?
> > > It will replace capable() checks in the verifier only.
> >
> > I'm not convinced that "in the verifier" is the right distinction.
> > Telling administrators that some setting lets certain users bypass
> > bpf() verifier checks doesn't have a clear enough meaning.
>
> linux is a single user system. there are no administrators any more.
> No doubt, folks will disagree, but that game is over.
> At least on bpf side it's done.
>
> > I propose,
> > instead, that the current capable() checks be divided into three
> > categories:
>
> I don't see a use case for these categories.
> All bpf programs extend the kernel in some way.
> The kernel vs user is one category.
> Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
> When application has CAP_NET_ADMIN it covers all of networking knobs.
> There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
> CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
> Similarly CAP_BPF as the only knob is enough.
> The only disadvantage of CAP_BPF is that it's not possible to
> pass it from one systemd-like daemon to another systemd-like daemon.
> Hence /dev/bpf idea and passing file descriptor.
>
> > This type of thing actually fits quite nicely into an idea I've been
> > thinking about for a while called "implicit rights". In very brief
> > summary, there would be objects called /dev/rights/xyz, where xyz is
> > the same of a "right". If there is a readable object of the right
> > type at the literal path "/dev/rights/xyz", then you have right xyz.
> > There's a bit more flexibility on top of this. BPF could use
> > /dev/rights/bpf/maptypes/lpm and
> > /dev/rights/bpf/verifier/bounded_loops, for example. Other non-BPF
> > use cases include a biggie:
> > /dev/rights/namespace/create_unprivileged_userns.
> > /dev/rights/bind_port/80 would be nice, too.
>
> The concept of "implicit rights" is very nice and I'm sure it will
> be a good fit somewhere, but I don't see why use it in bpf space.
> There is no use case for fine grain partition of bpf features.
Isn't this "implicit rights" model just another kind of ambient
authority --- one that constrains the otherwise-free filesystem
namespace to boot? IMHO, the kernel should be moving toward explicit
authorization tokens modeled by file descriptors and away from
contextual authorization decisions.
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-13 21:58 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Song Liu, Kees Cook, Networking, bpf, Alexei Starovoitov,
Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH,
Linux API, LSM List
In-Reply-To: <CALCETrXEHL3+NAY6P6vUj7Pvd9ZpZsYC6VCLXOaNxb90a_POGw@mail.gmail.com>
On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> >
> > Inside containers and inside nested containers we need to start processes
> > that will use bpf. All of the processes are trusted.
>
> Trusted by whom? In a non-nested container, the container manager
> *might* be trusted by the outside world. In a *nested* container,
> unless the inner container management is controlled from outside the
> outer container, it's not trusted. I don't know much about how
> Facebook's containers work, but the LXC/LXD/Podman world is moving
> very strongly toward user namespaces and maximally-untrusted
> containers, and I think bpf() should work in that context.
agree that containers (namespaces) reduce amount of trust necessary
for apps to run, but the end goal is not security though.
Linux has become a single user system.
If user can ssh into the host they can become root.
If arbitrary code can run on the host it will be break out of any sandbox.
Containers are not providing the level of security that is enough
to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
Containers are used to make production systems safer.
Some people call it more 'secure', but it's clearly not secure for
arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
It's been a constant source of pain. The constant blinding, randomization,
verifier speculative analysis, all spectre v1, v2, v4 mitigations
are simply not worth it. It's a lot of complex kernel code without users.
There is not a single use case to allow arbitrary malicious bpf
program to be loaded and executed.
As soon as we have /dev/bpf to allow all of bpf to be used without root
we will set sysctl kernel.unprivileged_bpf_disabled=1
Hence I prefer this /dev/bpf mechanism to be as simple a possible.
The applications that will use it are going to be just as trusted as systemd.
> > To solve your concern of bypassing all capable checks...
> > How about we do /dev/bpf/full_verifier first?
> > It will replace capable() checks in the verifier only.
>
> I'm not convinced that "in the verifier" is the right distinction.
> Telling administrators that some setting lets certain users bypass
> bpf() verifier checks doesn't have a clear enough meaning.
linux is a single user system. there are no administrators any more.
No doubt, folks will disagree, but that game is over.
At least on bpf side it's done.
> I propose,
> instead, that the current capable() checks be divided into three
> categories:
I don't see a use case for these categories.
All bpf programs extend the kernel in some way.
The kernel vs user is one category.
Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
When application has CAP_NET_ADMIN it covers all of networking knobs.
There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
Similarly CAP_BPF as the only knob is enough.
The only disadvantage of CAP_BPF is that it's not possible to
pass it from one systemd-like daemon to another systemd-like daemon.
Hence /dev/bpf idea and passing file descriptor.
> This type of thing actually fits quite nicely into an idea I've been
> thinking about for a while called "implicit rights". In very brief
> summary, there would be objects called /dev/rights/xyz, where xyz is
> the same of a "right". If there is a readable object of the right
> type at the literal path "/dev/rights/xyz", then you have right xyz.
> There's a bit more flexibility on top of this. BPF could use
> /dev/rights/bpf/maptypes/lpm and
> /dev/rights/bpf/verifier/bounded_loops, for example. Other non-BPF
> use cases include a biggie:
> /dev/rights/namespace/create_unprivileged_userns.
> /dev/rights/bind_port/80 would be nice, too.
The concept of "implicit rights" is very nice and I'm sure it will
be a good fit somewhere, but I don't see why use it in bpf space.
There is no use case for fine grain partition of bpf features.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox