Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [PATCH v18 05/23] net: Prepare UDS for security module stacking
From: Casey Schaufler @ 2020-07-09 19:24 UTC (permalink / raw)
  To: John Johansen, Stephen Smalley
  Cc: Casey Schaufler, James Morris, LSM List, SElinux list, Kees Cook,
	Tetsuo Handa, Paul Moore, Stephen Smalley, netdev,
	Casey Schaufler
In-Reply-To: <8a5a243f-e991-ad55-0503-654cc2587133@canonical.com>

On 7/9/2020 9:28 AM, John Johansen wrote:
> On 7/9/20 9:11 AM, Stephen Smalley wrote:
>> On Wed, Jul 8, 2020 at 8:23 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> Change the data used in UDS SO_PEERSEC processing from a
>>> secid to a more general struct lsmblob. Update the
>>> security_socket_getpeersec_dgram() interface to use the
>>> lsmblob. There is a small amount of scaffolding code
>>> that will come out when the security_secid_to_secctx()
>>> code is brought in line with the lsmblob.
>>>
>>> The secid field of the unix_skb_parms structure has been
>>> replaced with a pointer to an lsmblob structure, and the
>>> lsmblob is allocated as needed. This is similar to how the
>>> list of passed files is managed. While an lsmblob structure
>>> will fit in the available space today, there is no guarantee
>>> that the addition of other data to the unix_skb_parms or
>>> support for additional security modules wouldn't exceed what
>>> is available.
>>>
>>> Reviewed-by: Kees Cook <keescook@chromium.org>
>>> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
>>> Cc: netdev@vger.kernel.org
>>> ---
>>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>>> index 3385a7a0b231..d246aefcf4da 100644
>>> --- a/net/unix/af_unix.c
>>> +++ b/net/unix/af_unix.c
>>> @@ -138,17 +138,23 @@ static struct hlist_head *unix_sockets_unbound(void *addr)
>>>  #ifdef CONFIG_SECURITY_NETWORK
>>>  static void unix_get_secdata(struct scm_cookie *scm, struct sk_buff *skb)
>>>  {
>>> -       UNIXCB(skb).secid = scm->secid;
>>> +       UNIXCB(skb).lsmdata = kmemdup(&scm->lsmblob, sizeof(scm->lsmblob),
>>> +                                     GFP_KERNEL);
>>>  }
>>>
>>>  static inline void unix_set_secdata(struct scm_cookie *scm, struct sk_buff *skb)
>>>  {
>>> -       scm->secid = UNIXCB(skb).secid;
>>> +       if (likely(UNIXCB(skb).lsmdata))
>>> +               scm->lsmblob = *(UNIXCB(skb).lsmdata);
>>> +       else
>>> +               lsmblob_init(&scm->lsmblob, 0);
>>>  }
>>>
>>>  static inline bool unix_secdata_eq(struct scm_cookie *scm, struct sk_buff *skb)
>>>  {
>>> -       return (scm->secid == UNIXCB(skb).secid);
>>> +       if (likely(UNIXCB(skb).lsmdata))
>>> +               return lsmblob_equal(&scm->lsmblob, UNIXCB(skb).lsmdata);
>>> +       return false;
>>>  }
>> I don't think that this provides sensible behavior to userspace.  On a
>> transient memory allocation failure, instead of returning an error to
>> the sender and letting them handle it, this will just proceed with
>> sending the message without its associated security information, and
>> potentially split messages on arbitrary boundaries because it cannot
>> tell whether the sender had the same security information.  I think
>> you instead need to change unix_get_secdata() to return an error on
>> allocation failure and propagate that up to the sender.

Can't say that I think that would go over especially well.
You're right about that being a better, or at least more correct,
change.

>>   Not a fan of
>> this change in general both due to extra overhead on this code path
>> and potential for breakage on allocation failures.  I know it was
>> motivated by paul's observation that we won't be able to fit many more
>> secids into the cb but not sure we have to go there prematurely,

Paul wasn't completely against the original approach. His objection
was that using a struct lsmblob, which was already close to the maximum
size it can be and that can grow over time, might be a hard sell.

>> especially absent its usage by upstream AA (no unix_stream_connect
>> hook implementation upstream).  Also not sure how the whole bpf local
> I'm not sure how premature it is, I am running late for 5.9 but would
> like to land apparmor unix mediation in 5.10

Which means that scaffolding around the UNIXCB.secid wouldn't
suffice for very long.

>
>> storage approach to supporting security modules (or at least bpf lsm)
>> might reduce need for expanding these structures?

I think the allocation failure case would still be an issue,
and it could be much more complicated to deal with using the
local storage model. The fundamental problem comes back to fitting
more that 32 bits of information into 32 bits without having to
perform an operation that might fail.

At this point I'm inclined to revert to the original implementation
and see if it doesn't turn out to be acceptable after all. I remain
open to better ideas.



^ permalink raw reply

* Re: [PATCH v18 05/23] net: Prepare UDS for security module stacking
From: Stephen Smalley @ 2020-07-09 19:54 UTC (permalink / raw)
  To: John Johansen
  Cc: Casey Schaufler, Casey Schaufler, James Morris, LSM List,
	SElinux list, Kees Cook, Tetsuo Handa, Paul Moore,
	Stephen Smalley, netdev
In-Reply-To: <8a5a243f-e991-ad55-0503-654cc2587133@canonical.com>

On Thu, Jul 9, 2020 at 12:28 PM John Johansen
<john.johansen@canonical.com> wrote:
>
> On 7/9/20 9:11 AM, Stephen Smalley wrote:
> > On Wed, Jul 8, 2020 at 8:23 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>
> >> Change the data used in UDS SO_PEERSEC processing from a
> >> secid to a more general struct lsmblob. Update the
> >> security_socket_getpeersec_dgram() interface to use the
> >> lsmblob. There is a small amount of scaffolding code
> >> that will come out when the security_secid_to_secctx()
> >> code is brought in line with the lsmblob.
> >>
> >> The secid field of the unix_skb_parms structure has been
> >> replaced with a pointer to an lsmblob structure, and the
> >> lsmblob is allocated as needed. This is similar to how the
> >> list of passed files is managed. While an lsmblob structure
> >> will fit in the available space today, there is no guarantee
> >> that the addition of other data to the unix_skb_parms or
> >> support for additional security modules wouldn't exceed what
> >> is available.
> >>
> >> Reviewed-by: Kees Cook <keescook@chromium.org>
> >> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
> >> Cc: netdev@vger.kernel.org
> >> ---
> >
> >> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> >> index 3385a7a0b231..d246aefcf4da 100644
> >> --- a/net/unix/af_unix.c
> >> +++ b/net/unix/af_unix.c
> >> @@ -138,17 +138,23 @@ static struct hlist_head *unix_sockets_unbound(void *addr)
> >>  #ifdef CONFIG_SECURITY_NETWORK
> >>  static void unix_get_secdata(struct scm_cookie *scm, struct sk_buff *skb)
> >>  {
> >> -       UNIXCB(skb).secid = scm->secid;
> >> +       UNIXCB(skb).lsmdata = kmemdup(&scm->lsmblob, sizeof(scm->lsmblob),
> >> +                                     GFP_KERNEL);
> >>  }
> >>
> >>  static inline void unix_set_secdata(struct scm_cookie *scm, struct sk_buff *skb)
> >>  {
> >> -       scm->secid = UNIXCB(skb).secid;
> >> +       if (likely(UNIXCB(skb).lsmdata))
> >> +               scm->lsmblob = *(UNIXCB(skb).lsmdata);
> >> +       else
> >> +               lsmblob_init(&scm->lsmblob, 0);
> >>  }
> >>
> >>  static inline bool unix_secdata_eq(struct scm_cookie *scm, struct sk_buff *skb)
> >>  {
> >> -       return (scm->secid == UNIXCB(skb).secid);
> >> +       if (likely(UNIXCB(skb).lsmdata))
> >> +               return lsmblob_equal(&scm->lsmblob, UNIXCB(skb).lsmdata);
> >> +       return false;
> >>  }
> >
> > I don't think that this provides sensible behavior to userspace.  On a
> > transient memory allocation failure, instead of returning an error to
> > the sender and letting them handle it, this will just proceed with
> > sending the message without its associated security information, and
> > potentially split messages on arbitrary boundaries because it cannot
> > tell whether the sender had the same security information.  I think
> > you instead need to change unix_get_secdata() to return an error on
> > allocation failure and propagate that up to the sender.  Not a fan of
> > this change in general both due to extra overhead on this code path
> > and potential for breakage on allocation failures.  I know it was
> > motivated by paul's observation that we won't be able to fit many more
> > secids into the cb but not sure we have to go there prematurely,
> > especially absent its usage by upstream AA (no unix_stream_connect
> > hook implementation upstream).  Also not sure how the whole bpf local
>
> I'm not sure how premature it is, I am running late for 5.9 but would
> like to land apparmor unix mediation in 5.10

Sorry I think I mischaracterized the condition under which this
support needs to be stacked. It seems to only be needed if using
SO_PASSSEC and SCM_SECURITY (i.e. datagram labeling), not just for
unix mediation or SO_PEERSEC IIUC.  So not sure if that applies even
for downstream.

^ permalink raw reply

* [merged][PATCH v3 00/16] Make the user mode driver code a better citizen
From: Eric W. Biederman @ 2020-07-09 22:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds,
	Christian Brauner
In-Reply-To: <87y2o1swee.fsf_-_@x220.int.ebiederm.org>


I have merged all of this into my exec-next tree.

The code is also available on the frozen branch:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git usermode-driver-cleanup

The range-diff from the last posted version is below.

I was asked "Is there a simpler version of this code that could
be used for backports?".  The honest answer is not really.

Fundamentally do_execve_file as it existed prior to this set of changes
breaks a lot of invariants in exec.  The choices are either track down
all of the invariants it violates and fix it, or reorganize the code so
that do_execve_file is unnecessary.  Reorganizing the code was the path
I found simplest and most reliable.  I don't think anyone has tracked
down all of the constraints the code violated.

There is an issue clearly pointed out by Tetsuo Handa that in theory if
there is too long of a delay between closing the file after writing it
and flush_delayed_fput might not synchronize the file synchronously.  I
can not trigger it, and this is the same code path the initramfs relies
upon.  So I think calling flush_delayed_fput is good enough for this set
of changes.

If and when a generally accepted way to remove the theoreticaly race
it will be trivial to fix flush_delayed_fput or replace it and none
of the other logic changes.

Declaring this set of changes done now, allows the work that depends
upon this change to proceed.


Eric

--- 

1:  8fee10be3e7e !  1:  5fec25f2cb95 umh: Capture the pid in umh_pipe_setup
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87h7uygf9i.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/875zb97iix.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-1-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/umh.h ##
 2:  2d97bc5269dd !  2:  b044fa2ae50d umh: Move setting PF_UMH into umh_pipe_setup
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87bll6gf8t.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87zh8l63xs.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-2-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## kernel/umh.c ##
 3:  974e2b827aca !  3:  3a171042aeab umh: Rename the user mode driver helpers for clarity
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/875zbegf82.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87tuyt63x3.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-3-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## kernel/umh.c ##
 4:  6c8f72f8eb49 !  4:  21d598280675 umh: Remove call_usermodehelper_setup_file.
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87zh8qf0mp.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87o8p163u1.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-4-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/umh.h ##
 5:  cbf6c2b5a04a !  5:  884c5e683b67 umh: Separate the user mode driver and the user mode helper support
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87tuyyf0ln.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87imf963s6.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-5-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/bpfilter.h ##
 6:  b68617fd4ee3 !  6:  74be2d3b80af umd: For clarity rename umh_info umd_info
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87o8p6f0kw.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/878sg563po.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-6-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/bpfilter.h ##
 7:  6881acff5f6a !  7:  1199c6c3da51 umd: Rename umd_info.cmdline umd_info.driver_name
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87imfef0k3.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87366d63os.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-7-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/usermode_driver.h ##
 8:  cd210622ff6f !  8:  e2dc9bf3f527 umd: Transform fork_usermode_blob into fork_usermode_driver
    @@ Commit message
         [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/
         v1: https://lkml.kernel.org/r/87d05mf0j9.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87wo3p4p35.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-8-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/usermode_driver.h ##
 9:  74d65aaf2cab !  9:  55e6074e3fa6 umh: Stop calling do_execve_file
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/877dvuf0i7.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87r1tx4p2a.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-9-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/umh.h ##
10:  58a9854274a1 ! 10:  25cf336de51b exec: Remove do_execve_file
    @@ Commit message
         [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/
         v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-10-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## fs/exec.c ##
11:  c45ae16a18c9 ! 11:  0fe3c63148ef bpfilter: Move bpfilter_umh back into init data
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87sgeidlvq.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87ftad4ozc.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-11-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## net/bpfilter/bpfilter_umh_blob.S ##
12:  43b41b9d52a0 ! 12:  1c340ead18ee umd: Track user space drivers with struct pid
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87mu4qdlv2.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/a70l4oy8.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-12-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/usermode_driver.h ##
13:  653476c24a30 ! 13:  38fd525a4c61 exit: Factor thread_group_exited out of pidfd_poll
    @@ Metadata
      ## Commit message ##
         exit: Factor thread_group_exited out of pidfd_poll
     
    -    Create an independent helper thread_group_exited report return true
    +    Create an independent helper thread_group_exited which returns true
         when all threads have passed exit_notify in do_exit.  AKA all of the
         threads are at least zombies and might be dead or completely gone.
     
    -    Create this helper by taking the logic out of pidfd_poll where
    -    it is already tested, and adding a missing READ_ONCE on
    -    the read of task->exit_state.
    +    Create this helper by taking the logic out of pidfd_poll where it is
    +    already tested, and adding a READ_ONCE on the read of
    +    task->exit_state.
     
         I will be changing the user mode driver code to use this same logic
         to know when a user mode driver needs to be restarted.
    @@ Commit message
         Place the new helper thread_group_exited in kernel/exit.c and
         EXPORT it so it can be used by modules.
     
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-13-ebiederm@xmission.com
    +    Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/sched/signal.h ##
    @@ kernel/exit.c: COMPAT_SYSCALL_DEFINE5(waitid,
     + * thread_group_exited - check that a thread group has exited
     + * @pid: tgid of thread group to be checked.
     + *
    -+ * Test if thread group is has exited (all threads are zombies, dead
    -+ * or completely gone).
    ++ * Test if the thread group represented by tgid has exited (all
    ++ * threads are zombies, dead or completely gone).
     + *
     + * Return: true if the thread group has exited. false otherwise.
     + */
14:  7ad037d12723 ! 14:  e80eb1dc868b bpfilter: Take advantage of the facilities of struct pid
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87h7uydlu9.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/874kqt4owu.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-14-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/bpfilter.h ##
15:  e50cf5e57a62 ! 15:  8c2f52663973 umd: Remove exit_umh
    @@ Commit message
     
         v1: https://lkml.kernel.org/r/87bll6dlte.fsf_-_@x220.int.ebiederm.org
         v2: https://lkml.kernel.org/r/87y2o53abg.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-15-ebiederm@xmission.com
         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## include/linux/sched.h ##
16:  32e057d8aa4a ! 16:  33c326014fe6 umd: Stop using split_argv
    @@ Commit message
         call_usermodehelper_setup.
     
         v1: https://lkml.kernel.org/r/87sged3a9n.fsf_-_@x220.int.ebiederm.org
    +    Link: https://lkml.kernel.org/r/20200702164140.4468-16-ebiederm@xmission.com
    +    Acked-by: Alexei Starovoitov <ast@kernel.org>
    +    Tested-by: Alexei Starovoitov <ast@kernel.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## kernel/usermode_driver.c ##



^ permalink raw reply

* Re: [PATCH bpf-next v4 1/4] bpf: Generalize bpf_sk_storage
From: Martin KaFai Lau @ 2020-07-10  6:59 UTC (permalink / raw)
  To: KP Singh
  Cc: linux-kernel, bpf, linux-security-module, Alexei Starovoitov,
	Daniel Borkmann, Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200709101239.3829793-2-kpsingh@chromium.org>

On Thu, Jul 09, 2020 at 12:12:36PM +0200, KP Singh wrote:
> From: KP Singh <kpsingh@google.com>
> 
> Refactor the functionality in bpf_sk_storage.c so that concept of
> storage linked to kernel objects can be extended to other objects like
> inode, task_struct etc.
> 
> bpf_sk_storage is updated to be bpf_local_storage with a union that
> contains a pointer to the owner object.

> The type of the
> bpf_local_storage can be determined using the newly added
> bpf_local_storage_type enum.
This is out dated.

> 
> Each new local storage will still be a separate map and provide its own
> set of helpers. This allows for future object specific extensions and
> still share a lot of the underlying implementation.
Thanks for v4.

I do find it quite hard to follow by directly moving to
bpf_local_storage.c without doing all the renaming locally
at bpf_sk_storage.c first.  I will try my best to follow.

There are some unnecessary name/convention change and function
folding that do not help on this side either.  Please keep them
unchanged for now and they can use another patch in the future if needed.
It will be easier to have a mostly one to one naming change
and please mention them in the commit message.

[ ... ]

> diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
> new file mode 100644
> index 000000000000..605b81f2f806
> --- /dev/null
> +++ b/include/linux/bpf_local_storage.h
> @@ -0,0 +1,175 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2019 Facebook
> + * Copyright 2020 Google LLC.
> + */
> +
> +#ifndef _BPF_LOCAL_STORAGE_H
> +#define _BPF_LOCAL_STORAGE_H
> +
> +#include <linux/bpf.h>
> +#include <linux/rculist.h>
> +#include <linux/list.h>
> +#include <linux/hash.h>
> +#include <linux/types.h>
> +#include <uapi/linux/btf.h>
> +
> +#define LOCAL_STORAGE_CREATE_FLAG_MASK					\
> +	(BPF_F_NO_PREALLOC | BPF_F_CLONE)
> +
> +struct bucket {
Since it is in a .h, it can use a more specific name.
May be bpf_local_storage_map_bucket.

> +	struct hlist_head list;
> +	raw_spinlock_t lock;
> +};
> +

[ ... ]

> +struct bpf_local_storage {
> +	struct bpf_local_storage_data __rcu *cache[BPF_STORAGE_CACHE_SIZE];
> +	struct hlist_head list;		/* List of bpf_local_storage_elem */
> +	/* The object that owns the the above "list" of
> +	 * bpf_local_storage_elem
> +	 */
> +	union {
> +		struct sock *sk;
Instead of having a specific pointer type and then union them here,
would one "void *owner;" work as good?

> +	};
> +	struct rcu_head rcu;
> +	raw_spinlock_t lock;	/* Protect adding/removing from the "list" */
> +};
> +
> +/* Helper functions for bpf_local_storage */

[ ... ]

> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> new file mode 100644
> index 000000000000..c818eb6f8261
> --- /dev/null
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -0,0 +1,517 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2019 Facebook
> + * Copyright 2020 Google LLC.
> + */
> +
> +#include <linux/rculist.h>
> +#include <linux/list.h>
> +#include <linux/hash.h>
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +#include <linux/bpf.h>
> +#include <linux/bpf_local_storage.h>
> +#include <net/sock.h>
> +#include <uapi/linux/sock_diag.h>
> +#include <uapi/linux/btf.h>
> +
> +#define SELEM(_SDATA)                                                          \
> +	container_of((_SDATA), struct bpf_local_storage_elem, sdata)
> +#define SDATA(_SELEM) (&(_SELEM)->sdata)
> +
> +static struct bucket *select_bucket(struct bpf_local_storage_map *smap,
> +				    struct bpf_local_storage_elem *selem)
> +{
> +	return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
> +}
> +
> +static bool selem_linked_to_node(const struct bpf_local_storage_elem *selem)
The suffix was selem_linked_to"_sk" and it is changed to "_node" here.
However, the latter bpf_selem_unlink has removed the _sk suffix instead.

Instead of _to_node, it is linked to storage.  How about
selem_linked_to_storage()?

> +{
> +	return !hlist_unhashed(&selem->snode);
> +}
> +
> +static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
> +{
> +	return !hlist_unhashed(&selem->map_node);
> +}
> +
> +struct bpf_local_storage_elem *
> +bpf_selem_alloc(struct bpf_local_storage_map *smap, void *value)
> +{
> +	struct bpf_local_storage_elem *selem;
> +
> +	selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
> +	if (selem) {
> +		if (value)
> +			memcpy(SDATA(selem)->data, value, smap->map.value_size);
> +		return selem;
> +	}
> +
> +	return NULL;
> +}
> +
> +/* local_storage->lock must be held and selem->local_storage == local_storage.
> + * The caller must ensure selem->smap is still valid to be
> + * dereferenced for its smap->elem_size and smap->cache_idx.
> + *
> + * uncharge_omem is only relevant for BPF_MAP_TYPE_SK_STORAGE.
> + */
> +bool bpf_selem_unlink(struct bpf_local_storage *local_storage,
> +		      struct bpf_local_storage_elem *selem, bool uncharge_omem)
It is originated from __selem_unlink_sk() which does not take the
local_storage->lock.

How about keeping the _sk suffix here somehow to distinguish it from
unlink_map?
was __selem_unlink_sk => bpf_selem_unlink_storage()?

> +{
> +	struct bpf_local_storage_map *smap;
> +	bool free_local_storage;
> +
> +	smap = rcu_dereference(SDATA(selem)->smap);
> +	free_local_storage = hlist_is_singular_node(&selem->snode,
> +						    &local_storage->list);
> +
> +	/* local_storage is not freed now.  local_storage->lock is
> +	 * still held and raw_spin_unlock_bh(&local_storage->lock)
> +	 * will be done by the caller.
> +	 * Although the unlock will be done under
> +	 * rcu_read_lock(),  it is more intutivie to
> +	 * read if kfree_rcu(local_storage, rcu) is done
> +	 * after the raw_spin_unlock_bh(&local_storage->lock).
> +	 *
> +	 * Hence, a "bool free_local_storage" is returned
> +	 * to the caller which then calls the kfree_rcu()
> +	 * after unlock.
> +	 */
> +	if (free_local_storage)
> +		smap->map.ops->map_local_storage_unlink(local_storage,
> +							uncharge_omem);
> +
> +	hlist_del_init_rcu(&selem->snode);
> +	if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
> +	    SDATA(selem))
> +		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
> +
> +	kfree_rcu(selem, rcu);
> +
> +	return free_local_storage;
> +}
> +
> +void bpf_selem_link(struct bpf_local_storage *local_storage,
> +		    struct bpf_local_storage_elem *selem)
was __selem_link_sk() => bpf_selem_link_storage()

> +{
> +	RCU_INIT_POINTER(selem->local_storage, local_storage);
> +	hlist_add_head(&selem->snode, &local_storage->list);
> +}
> +
> +void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
> +{
> +	struct bpf_local_storage_map *smap;
> +	struct bucket *b;
> +
> +	if (unlikely(!selem_linked_to_map(selem)))
> +		/* selem has already be unlinked from smap */
> +		return;
> +
> +	smap = rcu_dereference(SDATA(selem)->smap);
> +	b = select_bucket(smap, selem);
> +	raw_spin_lock_bh(&b->lock);
> +	if (likely(selem_linked_to_map(selem)))
> +		hlist_del_init_rcu(&selem->map_node);
> +	raw_spin_unlock_bh(&b->lock);
> +}
> +
> +void bpf_selem_link_map(struct bpf_local_storage_map *smap,
> +			struct bpf_local_storage_elem *selem)
> +{
> +	struct bucket *b = select_bucket(smap, selem);
> +
> +	raw_spin_lock_bh(&b->lock);
> +	RCU_INIT_POINTER(SDATA(selem)->smap, smap);
> +	hlist_add_head_rcu(&selem->map_node, &b->list);
> +	raw_spin_unlock_bh(&b->lock);
> +}
> +
> +void bpf_selem_unlink_map_elem(struct bpf_local_storage_elem *selem)
How about keep the original no-suffix to mean unlink from both map and storage.
was selem_unlink() => bpf_selem_unlink()

> +{
> +	struct bpf_local_storage *local_storage;
> +	bool free_local_storage = false;
> +
> +	/* Always unlink from map before unlinking from local_storage
> +	 * because selem will be freed after successfully unlinked from
> +	 * the local_storage.
> +	 */
> +	bpf_selem_unlink_map(selem);
> +
> +	if (unlikely(!selem_linked_to_node(selem)))
> +		/* selem has already been unlinked from its owner */
> +		return;
> +
> +	local_storage = rcu_dereference(selem->local_storage);
> +	raw_spin_lock_bh(&local_storage->lock);
> +	if (likely(selem_linked_to_node(selem)))
> +		free_local_storage =
> +			bpf_selem_unlink(local_storage, selem, true);
> +	raw_spin_unlock_bh(&local_storage->lock);
> +
> +	if (free_local_storage)
> +		kfree_rcu(local_storage, rcu);
Part of these is folding the selem_unlink_sk() into here.
Please don't do it for now.
Keep them in __bpf_selem_unlink_storage().  Hence, we only
need to remember the original "__" meaning is flipped
from unlock to lock.

> +}
> +

[ ... ]

> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 6f921c4ddc2c..a2b00a09d843 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c

[ ... ]

> +static void unlink_sk_storage(struct bpf_local_storage *local_storage,
>  			      bool uncharge_omem)
>  {
> -	struct bpf_sk_storage_map *smap;
> -	bool free_sk_storage;
> -	struct sock *sk;
> -
> -	smap = rcu_dereference(SDATA(selem)->smap);
> -	sk = sk_storage->sk;
> +	struct sock *sk = local_storage->sk;
>  
> -	/* All uncharging on sk->sk_omem_alloc must be done first.
> -	 * sk may be freed once the last selem is unlinked from sk_storage.
> -	 */
>  	if (uncharge_omem)
> -		atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
Where is smap->elem_size uncharged?

> -
> -	free_sk_storage = hlist_is_singular_node(&selem->snode,
> -						 &sk_storage->list);
> -	if (free_sk_storage) {
> -		atomic_sub(sizeof(struct bpf_sk_storage), &sk->sk_omem_alloc);
> -		sk_storage->sk = NULL;
> -		/* After this RCU_INIT, sk may be freed and cannot be used */
> -		RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
> -
> -		/* sk_storage is not freed now.  sk_storage->lock is
> -		 * still held and raw_spin_unlock_bh(&sk_storage->lock)
> -		 * will be done by the caller.
> -		 *
> -		 * Although the unlock will be done under
> -		 * rcu_read_lock(),  it is more intutivie to
> -		 * read if kfree_rcu(sk_storage, rcu) is done
> -		 * after the raw_spin_unlock_bh(&sk_storage->lock).
> -		 *
> -		 * Hence, a "bool free_sk_storage" is returned
> -		 * to the caller which then calls the kfree_rcu()
> -		 * after unlock.
> -		 */
> -	}
> -	hlist_del_init_rcu(&selem->snode);
> -	if (rcu_access_pointer(sk_storage->cache[smap->cache_idx]) ==
> -	    SDATA(selem))
> -		RCU_INIT_POINTER(sk_storage->cache[smap->cache_idx], NULL);
> -
> -	kfree_rcu(selem, rcu);
> -
> -	return free_sk_storage;
> -}

[ ... ]

> +static struct bpf_local_storage_data *
> +sk_storage_update(void *owner, struct bpf_map *map, void *value, u64 map_flags)
>  {
> -	struct bpf_sk_storage_data *old_sdata = NULL;
> -	struct bpf_sk_storage_elem *selem;
> -	struct bpf_sk_storage *sk_storage;
> -	struct bpf_sk_storage_map *smap;
> +	struct bpf_local_storage_data *old_sdata = NULL;
> +	struct bpf_local_storage_elem *selem;
> +	struct bpf_local_storage *local_storage;
> +	struct bpf_local_storage_map *smap;
> +	struct sock *sk;
>  	int err;
>  
> -	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
> -	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
> -	    /* BPF_F_LOCK can only be used in a value with spin_lock */
> -	    unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
> -		return ERR_PTR(-EINVAL);
> +	err = bpf_local_storage_check_update_flags(map, map_flags);
> +	if (err)
> +		return ERR_PTR(err);
>  
> -	smap = (struct bpf_sk_storage_map *)map;
> -	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> -	if (!sk_storage || hlist_empty(&sk_storage->list)) {
> -		/* Very first elem for this sk */
> -		err = check_flags(NULL, map_flags);
> -		if (err)
> -			return ERR_PTR(err);
> +	sk = owner;
> +	local_storage = rcu_dereference(sk->sk_bpf_storage);
> +	smap = (struct bpf_local_storage_map *)map;
>  
> -		selem = selem_alloc(smap, sk, value, true);
> +	if (!local_storage || hlist_empty(&local_storage->list)) {
> +		/* Very first elem */
> +		selem = map->ops->map_selem_alloc(smap, owner, value, !old_sdata);
hmmm... If this map_selem_alloc is directly called here in sk_storage instead
of the common local_storage, does it have to be in map_ops?

>  		if (!selem)
>  			return ERR_PTR(-ENOMEM);
>  
> -		err = sk_storage_alloc(sk, smap, selem);
> +		err = map->ops->map_local_storage_alloc(owner, smap, selem);
>  		if (err) {
>  			kfree(selem);
>  			atomic_sub(smap->elem_size, &sk->sk_omem_alloc);

[ ... ]

> -static void bpf_sk_storage_map_free(struct bpf_map *map)
> +static void *bpf_sk_storage_lookup_elem(struct bpf_map *map, void *key)
Hmmm... this change here... keep scrolling down and down .... :)

>  {
> -	struct bpf_sk_storage_elem *selem;
> -	struct bpf_sk_storage_map *smap;
> -	struct bucket *b;
> -	unsigned int i;
> -
> -	smap = (struct bpf_sk_storage_map *)map;
> -
> -	cache_idx_free(smap->cache_idx);
> -
> -	/* Note that this map might be concurrently cloned from
> -	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
> -	 * RCU read section to finish before proceeding. New RCU
> -	 * read sections should be prevented via bpf_map_inc_not_zero.
> -	 */
> -	synchronize_rcu();
> -
> -	/* bpf prog and the userspace can no longer access this map
> -	 * now.  No new selem (of this map) can be added
> -	 * to the sk->sk_bpf_storage or to the map bucket's list.
> -	 *
> -	 * The elem of this map can be cleaned up here
> -	 * or
> -	 * by bpf_sk_storage_free() during __sk_destruct().
> -	 */
> -	for (i = 0; i < (1U << smap->bucket_log); i++) {
> -		b = &smap->buckets[i];
> -
> -		rcu_read_lock();
> -		/* No one is adding to b->list now */
> -		while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
> -						 struct bpf_sk_storage_elem,
> -						 map_node))) {
> -			selem_unlink(selem);
> -			cond_resched_rcu();
> -		}
> -		rcu_read_unlock();
> -	}
> -
> -	/* bpf_sk_storage_free() may still need to access the map.
> -	 * e.g. bpf_sk_storage_free() has unlinked selem from the map
> -	 * which then made the above while((selem = ...)) loop
> -	 * exited immediately.
> -	 *
> -	 * However, the bpf_sk_storage_free() still needs to access
> -	 * the smap->elem_size to do the uncharging in
> -	 * __selem_unlink_sk().
> -	 *
> -	 * Hence, wait another rcu grace period for the
> -	 * bpf_sk_storage_free() to finish.
> -	 */
> -	synchronize_rcu();
> -
> -	kvfree(smap->buckets);
> -	kfree(map);
> -}
> -
> -/* U16_MAX is much more than enough for sk local storage
> - * considering a tcp_sock is ~2k.
> - */
> -#define MAX_VALUE_SIZE							\
> -	min_t(u32,							\
> -	      (KMALLOC_MAX_SIZE - MAX_BPF_STACK - sizeof(struct bpf_sk_storage_elem)), \
> -	      (U16_MAX - sizeof(struct bpf_sk_storage_elem)))
> -
> -static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
> -{
> -	if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
> -	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
> -	    attr->max_entries ||
> -	    attr->key_size != sizeof(int) || !attr->value_size ||
> -	    /* Enforce BTF for userspace sk dumping */
> -	    !attr->btf_key_type_id || !attr->btf_value_type_id)
> -		return -EINVAL;
> -
> -	if (!bpf_capable())
> -		return -EPERM;
> -
> -	if (attr->value_size > MAX_VALUE_SIZE)
> -		return -E2BIG;
> -
> -	return 0;
> -}
> -
> -static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
> -{
> -	struct bpf_sk_storage_map *smap;
> -	unsigned int i;
> -	u32 nbuckets;
> -	u64 cost;
> -	int ret;
> -
> -	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
> -	if (!smap)
> -		return ERR_PTR(-ENOMEM);
> -	bpf_map_init_from_attr(&smap->map, attr);
> -
> -	nbuckets = roundup_pow_of_two(num_possible_cpus());
> -	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
> -	nbuckets = max_t(u32, 2, nbuckets);
> -	smap->bucket_log = ilog2(nbuckets);
> -	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
> -
> -	ret = bpf_map_charge_init(&smap->map.memory, cost);
> -	if (ret < 0) {
> -		kfree(smap);
> -		return ERR_PTR(ret);
> -	}
> -
> -	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
> -				 GFP_USER | __GFP_NOWARN);
> -	if (!smap->buckets) {
> -		bpf_map_charge_finish(&smap->map.memory);
> -		kfree(smap);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -
> -	for (i = 0; i < nbuckets; i++) {
> -		INIT_HLIST_HEAD(&smap->buckets[i].list);
> -		raw_spin_lock_init(&smap->buckets[i].lock);
> -	}
> -
> -	smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
> -	smap->cache_idx = cache_idx_get();
> -
> -	return &smap->map;
> -}
> -
> -static int notsupp_get_next_key(struct bpf_map *map, void *key,
> -				void *next_key)
> -{
> -	return -ENOTSUPP;
> -}
> -
> -static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
> -					const struct btf *btf,
> -					const struct btf_type *key_type,
> -					const struct btf_type *value_type)
> -{
> -	u32 int_data;
> -
> -	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> -		return -EINVAL;
> -
> -	int_data = *(u32 *)(key_type + 1);
> -	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
> -		return -EINVAL;
> -
> -	return 0;
> -}
> -
> -static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
.... finally got it :p

> -{
> -	struct bpf_sk_storage_data *sdata;
> +	struct bpf_local_storage_data *sdata;
>  	struct socket *sock;
> -	int fd, err;
> +	int fd, err = -EINVAL;
This is a bug fix or to suppress compiler warning?

>  
>  	fd = *(int *)key;
>  	sock = sockfd_lookup(fd, &err);
> @@ -752,17 +223,18 @@ static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
>  	return ERR_PTR(err);
>  }
>  

[ ... ]

>  static int sk_storage_map_btf_id;
>  const struct bpf_map_ops sk_storage_map_ops = {
> -	.map_alloc_check = bpf_sk_storage_map_alloc_check,
> -	.map_alloc = bpf_sk_storage_map_alloc,
> -	.map_free = bpf_sk_storage_map_free,
> +	.map_alloc_check = bpf_local_storage_map_alloc_check,
> +	.map_alloc = sk_storage_map_alloc,
> +	.map_free = sk_storage_map_free,
>  	.map_get_next_key = notsupp_get_next_key,
> -	.map_lookup_elem = bpf_fd_sk_storage_lookup_elem,
> -	.map_update_elem = bpf_fd_sk_storage_update_elem,
> -	.map_delete_elem = bpf_fd_sk_storage_delete_elem,
Why this "_fd_" name change?

> -	.map_check_btf = bpf_sk_storage_map_check_btf,
> -	.map_btf_name = "bpf_sk_storage_map",
> +	.map_lookup_elem = bpf_sk_storage_lookup_elem,
> +	.map_update_elem = bpf_sk_storage_update_elem,
> +	.map_delete_elem = bpf_sk_storage_delete_elem,
> +	.map_check_btf = bpf_local_storage_map_check_btf,
> +	.map_btf_name = "bpf_local_storage_map",
>  	.map_btf_id = &sk_storage_map_btf_id,
> +	.map_local_storage_alloc = sk_storage_alloc,
> +	.map_selem_alloc = sk_selem_alloc,
> +	.map_local_storage_update = sk_storage_update,
> +	.map_local_storage_unlink = unlink_sk_storage,
>  };

^ permalink raw reply

* Re: [PATCH v19 08/12] landlock: Add syscall implementation
From: Mickaël Salaün @ 2020-07-10 12:57 UTC (permalink / raw)
  To: Christian Brauner, Arnd Bergmann
  Cc: linux-kernel@vger.kernel.org, Al Viro, Andy Lutomirski,
	Anton Ivanov, Casey Schaufler, James Morris, Jann Horn, Jeff Dike,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, Kernel Hardening, Linux API,
	linux-arch, open list:DOCUMENTATION, Linux FS-devel Mailing List,
	open list:KERNEL SELFTEST FRAMEWORK, LSM List,
	the arch/x86 maintainers
In-Reply-To: <20200709174723.3m7iuma4re2v3xod@wittgenstein>


On 09/07/2020 19:47, Christian Brauner wrote:
> On Thu, Jul 09, 2020 at 07:26:18PM +0200, Arnd Bergmann wrote:
>> On Wed, Jul 8, 2020 at 7:50 PM Mickaël Salaün <mic@digikod.net> wrote:
>>> On 08/07/2020 15:49, Arnd Bergmann wrote:
>>>> On Wed, Jul 8, 2020 at 3:04 PM Mickaël Salaün <mic@digikod.net> wrote:
>>>>> On 08/07/2020 10:57, Arnd Bergmann wrote:
>>>>>> On Tue, Jul 7, 2020 at 8:10 PM Mickaël Salaün <mic@digikod.net> wrote:
>>>>>>
>>>>>> It looks like all you need here today is a single argument bit, plus
>>>>>> possibly some room for extensibility. I would suggest removing all
>>>>>> the extra bits and using a syscall like
>>>>>>
>>>>>> SYSCALL_DEFINE1(landlock_create_ruleset, u32, flags);
>>>>>>
>>>>>> I don't really see how this needs any variable-length arguments,
>>>>>> it really doesn't do much.
>>>>>
>>>>> We need the attr_ptr/attr_size pattern because the number of ruleset
>>>>> properties will increase (e.g. network access mask).
>>>>
>>>> But how many bits do you think you will *actually* need in total that
>>>> this needs to be a two-dimensional set of flags? At the moment you
>>>> only have a single bit that you interpret.
>>>
>>> I think there is a misunderstanding. For this syscall I wasn't talking
>>> about the "options" field but about the "handled_access_fs" field which
>>> has 14 bits dedicated to control access to the file system:
>>> https://landlock.io/linux-doc/landlock-v19/security/landlock/user.html#filesystem-flags
>>
>> Ok, got it. I didn't read far enough there.
>>
>>> The idea is to add other handled_access_* fields for other kernel object
>>> types (e.g. network, process, etc.).
>>>
>>> The "options" field is fine as a raw __u32 syscall argument.
>>
>> I'd still like to avoid having it variable-length and structured though.
>> How about having a __u32 "options" flag, plus an indirect argument
>> with 32 fixed-length (all 32 bit or all 64 bit) flag words, each of which
>> corresponds to one of the option bits?
>>
>> It's still fairly complex that way, but not as much as the version
>> you have right now that can be extended in multiple dimensions.

It seems simpler at first glance, but I prefer to follow the same
implementation as clone3(2) and openat2(2) because it is the same
(generic) struct argument parsing, which may then be handled by seccomp
and others, while adding a variable struct size depending on some flags
seems more complex to handle correctly and error-prone.

>>
>> This could possibly also help avoid the need for the get_features
> 
> What is this fresh hell again, please?

As explained in the commit message, this command/syscall enables to get
the Landlock supported features (i.e. option flags, commands, etc.),
which is required for backward and forward compatibility, and
best-effort security, while having strict kernel errors when
encountering unknown features. Cf.
https://landlock.io/linux-doc/landlock-v19/security/landlock/user.html
See the following arguments.

> 
>> syscall: If user space just passes the bitmap of all the access flags
>> it wants to use in a fixed-size structure, the kernel can update the
>> bits to mask out the ones it does not understand and write back
>> that bitmap as the result of create_ruleset().
>>
>>>>>> To be on the safe side, you might split up the flags into either the
>>>>>> upper/lower 16 bits or two u32 arguments, to allow both compatible
>>>>>> (ignored by older kernels if flag is set) and incompatible (return error
>>>>>> when an unknown flag is set) bits.
>>>>>
>>>>> This may be a good idea in general, but in the case of Landlock, because
>>>>> this kind of (discretionary) sandboxing should be a best-effort security
>>>>> feature, we should avoid incompatible behavior. In practice, every
>>>>> unknown bit returns an error because userland can probe for available
>>>>> bits thanks to the get_features command. This kind of (in)compatibility
>>>>> can then be handled by userland.
>>>>
>>>> If there are not going to be incompatible extensions, then just ignore
>>>> all unknown bits and never return an error but get rid of the user
>>>> space probing that just complicates the interface.
>>>
>>> There was multiple discussions about ABI compatibility, especially
>>> inspired by open(2) vs. openat2(2), and ignoring flags seems to be a bad
>>> idea. In the "sandboxer" example, we first probe the supported features
>>> and then mask unknown bits (i.e. access rights) at run time in userland.
>>> This strategy is quite straightforward, backward compatible and
>>> future-proof.
>>
>> For behavior changing flags, I agree they should be seen as
>> incompatible flags (i.e. return an error if an unknown bit is set).
>>
>> However, for the flags you pass in in an allowlist, treating them
>> as compatible (i.e. ignore any unknown flags, allowing everything
>> you are not forbidding already) seems completely reasonable
>> to me. Do you foresee user space doing anything other than masking
>> out the bits that the kernel doesn't know about? If not, then doing
>> it in the  kernel should always be simpler.

There is some use cases where userspace would benefit from this:

Because of backward and forward compatibility, one important case is for
common usage libraries, to help and encourage developers to not ignore
all security features if some are missing. Indeed, if from the start we
have get_features(), userland logic will be able to leverage it to
seamlessly enforce access-controls as much as possible whatever is the
running kernel.

For userland, being able to not trigger kernel errors (because userland
knows what is supported) enables to log them as real anomalies, which
can then help for debugging and understanding issues.

Landlock may also be combined with seccomp, but with different filters
according to the number of features the running kernel supports (e.g. an
app could filter some syscalls if they can't be restricted by the
running kernel through Landlock). Another point is that while seccomp
could be used as an alternative for some missing access-control from
Landlock, using seccomp for this could have a different impact on
performances (i.e. running BPF filters for each syscalls vs. controlling
access to kernel objects).

Another use case is that a system integrator should be aware that some
security features required by applications are not available. This may
be implemented as logging some warnings, or displaying the supported
security features as Chromium does (i.e. chrome://sandbox/).

This is also useful for (self)tests to ensure that a specific security
feature works as intended.

To sum up, get_features() and the strict kernel error codes enable
application developers to have a contract with the kernel. Multiple
use-cases can then be managed in userspace.


>>
>>>>> I suggest this syscall signature:
>>>>> SYSCALL_DEFINE3(landlock_create_ruleset, __u32, options, const struct
>>>>> landlock_attr_ruleset __user *, ruleset_ptr, size_t, ruleset_size);
>>>>
>>>> The other problem here is that indirect variable-size structured arguments
>>>> are a pain to instrument with things like strace or seccomp, so you
>>>> should first try to use a fixed argument list, and fall back to a fixed
>>>> structure if that fails.
>>>
>>> I agree that it is not perfect with the current tools but this kind of
>>> extensible structs are becoming common and well defined (e.g. openat2).
>>> Moreover there is some work going on for seccomp to support "extensible
>>> argument" syscalls: https://lwn.net/Articles/822256/
>>
>> openat2() is already more complex than we'd ideally want, I think we
>> should try hard to make new syscalls simpler than that, following the
>> rule that any interface should be as simple as possible, but no simpler.
> 
> Extensible structs are targeted at system calls that are either known to
> grow a lot of features or we already have prior versions that have
> accumulated quite a lot of features or that by their nature need to be
> more complex.
> openat2() is not really complex per se (At least not yet. It will likely
> grow quite a bit in the future...). The kernel now has infrastructure
> since clone3() and later generalized with openat2() and is well-equipped
> with a consistent api to deal with such syscalls so I don't see how this
> is really an issue in the first place. Yes, syscalls should be kept
> as simple as possible but we don't need to lock us into a "structs as
> arguments" are inherently bad mindset. That will also cause us to end up
> with crappy syscalls that are awkward to use for userspace.
> (Second-level pointers is a whole different issue of course.)

I agree.

> 
> (Arnd, you should also note that we're giving a talk at kernel summit
> about new syscall conventions and I'm syncing with Florian who'll be
> talking about the userspace side and requirements of this.)

Good! What do you think about these 4 new Landlock syscalls?

> 
> Christian
> 
>>
>>>>>>> +static int syscall_add_rule_path_beneath(const void __user *const attr_ptr,
>>>>>>> +               const size_t attr_size)
>>>>>>> +{
>>>>>>> +       struct landlock_attr_path_beneath attr_path_beneath;
>>>>>>> +       struct path path;
>>>>>>> +       struct landlock_ruleset *ruleset;
>>>>>>> +       int err;
>>>>>>
>>>>>> Similarly, it looks like this wants to be
>>>>>>
>>>>>> SYSCALL_DEFINE3(landlock_add_rule_path_beneath, int, ruleset, int,
>>>>>> path, __u32, flags)
>>>>>>
>>>>>> I don't see any need to extend this in a way that wouldn't already
>>>>>> be served better by adding another system call. You might argue
>>>>>> that 'flags' and 'allowed_access' could be separate, with the latter
>>>>>> being an indirect in/out argument here, like
>>>>>>
>>>>>> SYSCALL_DEFINE4(landlock_add_rule_path_beneath, int, ruleset, int, path,
>>>>>>                            __u64 *, allowed_acces, __u32, flags)
>>>>>
>>>>> To avoid adding a new syscall for each new rule type (e.g. path_beneath,
>>>>> path_range, net_ipv4_range, etc.), I think it would be better to keep
>>>>> the attr_ptr/attr_size pattern and to explicitely set a dedicated option
>>>>> flag to specify the attr type.
>>>>>
>>>>> This would look like this:
>>>>> SYSCALL_DEFINE4(landlock_add_rule, __u32, options, int, ruleset, const
>>>>> void __user *, rule_ptr, size_t, rule_size);
>>>>>
>>>>> The rule_ptr could then point to multiple types like struct
>>>>> landlock_attr_path_beneath (without the current ruleset_fd field).
>>>>
>>>> This again introduces variable-sized structured data. How many different
>>>> kinds of rule types do you think there will be (most likely, and maybe an
>>>> upper bound)?
>>>
>>> I don't know how many rule types will come, but right now I think it may
>>> be less than 10.
>>
>> Ok,
>>
>>>> Could (some of) these be generalized to use the same data structure?
>>>
>>> I don't think so, file path and network addresses are an example of very
>>> different types.
>>
>> Clearly the target object is something different, but maybe there is
>> enough commonality to still make them fit into a more regular form.

Because they don't exist yet, it seems risky to assume that they could
fit into a common form.

>>
>> For the file system case, you have an identify for an object
>> (the file descriptor) and the  '__u64 allowed_access'. I would
>> expect that the 'allowed_access' concept is generic enough that
>> you can make it a direct argument (32 bit register arg, or pointer
>> to a __u64). Do you expect others to need something besides
>> an object identifier and a permission bitmask?

Yes, a file descriptor (i.e. parent_fd) may not make sense to identify
every rule types. For example, it does not help to identify a network
subnet.

Even if it may be "shared", the allowed_access semantic depends on the
object type. It should also remain a 64-bits value to hold enough access
rights. I think it is more consistent, safer, and simpler to keep it in
the same pointed struct.

>> Maybe it could
>> be something like
>>
>>  SYSCALL_DEFINE4(landlock_add_rule, int, ruleset, __u32, options,
>>                        const void __user *, object, const __u64 __user
>> *, allowed_access,
>>                        __u32, flags);
>>
>> with a fixed-length 'object' identifier type (file descriptor,
>> sockaddr_storage, ...) for each option.

This would freeze the pointed struct to a fixed size, which may lead to
redundant struct parsing in the kernel. For example, the current
extended struct enables to extend struct landlock_attr_path_beneath with
a "path" field, which could have the same usage as in openat(2) to avoid
multiple calls to open(2) and close(2) to add new paths. If we have a
fixed-size struct, adding such field would require to create a new type
and I think it would make the kernel code more complex.

The "options" and "flags" seems confusing. For optional future flags, I
think that the name "options" is more explicit than "flags". What about
an "options" (set to zero for now) and a "rule_type" field to clearly
differentiate the rule type identifier (and use an enum)?
This would look like this:
SYSCALL_DEFINE5(landlock_add_rule, int, ruleset, __u32, options, __u32,
rule_type, const void __user *, rule_ptr, size_t, rule_size);

^ permalink raw reply

* Re: [PATCH v5 2/8] lib/mpi: Extend the MPI library
From: Marcelo Henrique Cerri @ 2020-07-10 13:12 UTC (permalink / raw)
  To: Tianjia Zhang
  Cc: herbert, davem, dhowells, mcoquelin.stm32, alexandre.torgue,
	jmorris, serge, nramas, tusharsu, zohar, vt, gilad, pvanleeuwen,
	zhang.jia, linux-crypto, linux-kernel, keyrings, linux-stm32,
	linux-arm-kernel, linux-security-module, linux-integrity
In-Reply-To: <20200709084015.21886-3-tianjia.zhang@linux.alibaba.com>

[-- Attachment #1: Type: text/plain, Size: 67423 bytes --]

Hi, Tianjia.

On Thu, Jul 09, 2020 at 04:40:09PM +0800, Tianjia Zhang wrote:
> Expand the mpi library based on libgcrypt, and the ECC algorithm of
> mpi based on libgcrypt requires these functions.
> Some other algorithms will be developed based on mpi ecc, such as SM2.
> 
> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
> ---
>  include/linux/mpi.h    |  88 +++++++++++
>  lib/mpi/Makefile       |   5 +
>  lib/mpi/mpi-add.c      | 207 +++++++++++++++++++++++++
>  lib/mpi/mpi-bit.c      | 251 ++++++++++++++++++++++++++++++
>  lib/mpi/mpi-cmp.c      |  46 ++++--
>  lib/mpi/mpi-div.c      | 238 +++++++++++++++++++++++++++++
>  lib/mpi/mpi-internal.h |  53 +++++++
>  lib/mpi/mpi-inv.c      | 143 ++++++++++++++++++
>  lib/mpi/mpi-mod.c      | 155 +++++++++++++++++++
>  lib/mpi/mpi-mul.c      |  94 ++++++++++++
>  lib/mpi/mpicoder.c     | 336 +++++++++++++++++++++++++++++++++++++++++
>  lib/mpi/mpih-div.c     | 294 ++++++++++++++++++++++++++++++++++++
>  lib/mpi/mpih-mul.c     |  25 +++
>  lib/mpi/mpiutil.c      | 204 +++++++++++++++++++++++++
>  14 files changed, 2129 insertions(+), 10 deletions(-)
>  create mode 100644 lib/mpi/mpi-add.c
>  create mode 100644 lib/mpi/mpi-div.c
>  create mode 100644 lib/mpi/mpi-inv.c
>  create mode 100644 lib/mpi/mpi-mod.c
>  create mode 100644 lib/mpi/mpi-mul.c
> 
> diff --git a/include/linux/mpi.h b/include/linux/mpi.h
> index 7bd6d8af0004..2dddf4c6e011 100644
> --- a/include/linux/mpi.h
> +++ b/include/linux/mpi.h
> @@ -40,21 +40,79 @@ struct gcry_mpi {
>  typedef struct gcry_mpi *MPI;
>  
>  #define mpi_get_nlimbs(a)     ((a)->nlimbs)
> +#define mpi_has_sign(a)       ((a)->sign)
>  
>  /*-- mpiutil.c --*/
>  MPI mpi_alloc(unsigned nlimbs);
> +void mpi_clear(MPI a);
>  void mpi_free(MPI a);
>  int mpi_resize(MPI a, unsigned nlimbs);
>  
> +static inline MPI mpi_new(unsigned int nbits)
> +{
> +	return mpi_alloc((nbits + BITS_PER_MPI_LIMB - 1) / BITS_PER_MPI_LIMB);
> +}
> +
> +MPI mpi_copy(MPI a);
> +MPI mpi_alloc_like(MPI a);
> +void mpi_snatch(MPI w, MPI u);
> +MPI mpi_set(MPI w, MPI u);
> +MPI mpi_set_ui(MPI w, unsigned long u);
> +MPI mpi_alloc_set_ui(unsigned long u);
> +void mpi_swap_cond(MPI a, MPI b, unsigned long swap);
> +
> +/* Constants used to return constant MPIs.  See mpi_init if you
> + * want to add more constants.
> + */
> +#define MPI_NUMBER_OF_CONSTANTS 6
> +enum gcry_mpi_constants {
> +	MPI_C_ZERO,
> +	MPI_C_ONE,
> +	MPI_C_TWO,
> +	MPI_C_THREE,
> +	MPI_C_FOUR,
> +	MPI_C_EIGHT
> +};
> +
> +MPI mpi_const(enum gcry_mpi_constants no);
> +
>  /*-- mpicoder.c --*/
> +
> +/* Different formats of external big integer representation. */
> +enum gcry_mpi_format {
> +	GCRYMPI_FMT_NONE = 0,
> +	GCRYMPI_FMT_STD = 1,    /* Twos complement stored without length. */
> +	GCRYMPI_FMT_PGP = 2,    /* As used by OpenPGP (unsigned only). */
> +	GCRYMPI_FMT_SSH = 3,    /* As used by SSH (like STD but with length). */
> +	GCRYMPI_FMT_HEX = 4,    /* Hex format. */
> +	GCRYMPI_FMT_USG = 5,    /* Like STD but unsigned. */
> +	GCRYMPI_FMT_OPAQUE = 8  /* Opaque format (some functions only). */
> +};
> +
>  MPI mpi_read_raw_data(const void *xbuffer, size_t nbytes);
>  MPI mpi_read_from_buffer(const void *buffer, unsigned *ret_nread);
> +int mpi_fromstr(MPI val, const char *str);
> +MPI mpi_scanval(const char *string);
>  MPI mpi_read_raw_from_sgl(struct scatterlist *sgl, unsigned int len);
>  void *mpi_get_buffer(MPI a, unsigned *nbytes, int *sign);
>  int mpi_read_buffer(MPI a, uint8_t *buf, unsigned buf_len, unsigned *nbytes,
>  		    int *sign);
>  int mpi_write_to_sgl(MPI a, struct scatterlist *sg, unsigned nbytes,
>  		     int *sign);
> +int mpi_print(enum gcry_mpi_format format, unsigned char *buffer,
> +			size_t buflen, size_t *nwritten, MPI a);
> +
> +/*-- mpi-mod.c --*/
> +void mpi_mod(MPI rem, MPI dividend, MPI divisor);
> +
> +/* Context used with Barrett reduction.  */
> +struct barrett_ctx_s;
> +typedef struct barrett_ctx_s *mpi_barrett_t;
> +
> +mpi_barrett_t mpi_barrett_init(MPI m, int copy);
> +void mpi_barrett_free(mpi_barrett_t ctx);
> +void mpi_mod_barrett(MPI r, MPI x, mpi_barrett_t ctx);
> +void mpi_mul_barrett(MPI w, MPI u, MPI v, mpi_barrett_t ctx);
>  
>  /*-- mpi-pow.c --*/
>  int mpi_powm(MPI res, MPI base, MPI exp, MPI mod);
> @@ -62,10 +120,40 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod);
>  /*-- mpi-cmp.c --*/
>  int mpi_cmp_ui(MPI u, ulong v);
>  int mpi_cmp(MPI u, MPI v);
> +int mpi_cmpabs(MPI u, MPI v);
>  
>  /*-- mpi-bit.c --*/
>  void mpi_normalize(MPI a);
>  unsigned mpi_get_nbits(MPI a);
> +int mpi_test_bit(MPI a, unsigned int n);
> +void mpi_set_bit(MPI a, unsigned int n);
> +void mpi_set_highbit(MPI a, unsigned int n);
> +void mpi_clear_highbit(MPI a, unsigned int n);
> +void mpi_clear_bit(MPI a, unsigned int n);
> +void mpi_rshift_limbs(MPI a, unsigned int count);
> +void mpi_rshift(MPI x, MPI a, unsigned int n);
> +void mpi_lshift_limbs(MPI a, unsigned int count);
> +void mpi_lshift(MPI x, MPI a, unsigned int n);
> +
> +/*-- mpi-add.c --*/
> +void mpi_add_ui(MPI w, MPI u, unsigned long v);
> +void mpi_add(MPI w, MPI u, MPI v);
> +void mpi_sub_ui(MPI w, MPI u, unsigned long v);
> +void mpi_sub(MPI w, MPI u, MPI v);
> +void mpi_addm(MPI w, MPI u, MPI v, MPI m);
> +void mpi_subm(MPI w, MPI u, MPI v, MPI m);
> +
> +/*-- mpi-mul.c --*/
> +void mpi_mul(MPI w, MPI u, MPI v);
> +void mpi_mulm(MPI w, MPI u, MPI v, MPI m);
> +
> +/*-- mpi-div.c --*/
> +void mpi_tdiv_r(MPI rem, MPI num, MPI den);
> +void mpi_fdiv_r(MPI rem, MPI dividend, MPI divisor);
> +void mpi_fdiv_q(MPI quot, MPI dividend, MPI divisor);
> +
> +/*-- mpi-inv.c --*/
> +int mpi_invm(MPI x, MPI a, MPI n);
>  
>  /* inline functions */
>  
> diff --git a/lib/mpi/Makefile b/lib/mpi/Makefile
> index d5874a7f5ff9..5f40f93ff3d9 100644
> --- a/lib/mpi/Makefile
> +++ b/lib/mpi/Makefile
> @@ -14,8 +14,13 @@ mpi-y = \
>  	generic_mpih-sub1.o		\
>  	generic_mpih-add1.o		\
>  	mpicoder.o			\
> +	mpi-add.o			\
>  	mpi-bit.o			\
>  	mpi-cmp.o			\
> +	mpi-div.o			\
> +	mpi-inv.o			\
> +	mpi-mod.o			\
> +	mpi-mul.o			\
>  	mpih-cmp.o			\
>  	mpih-div.o			\
>  	mpih-mul.o			\
> diff --git a/lib/mpi/mpi-add.c b/lib/mpi/mpi-add.c
> new file mode 100644
> index 000000000000..9afad7832737
> --- /dev/null
> +++ b/lib/mpi/mpi-add.c
> @@ -0,0 +1,207 @@
> +/* mpi-add.c  -  MPI functions
> + * Copyright (C) 1994, 1996, 1998, 2001, 2002,
> + *               2003 Free Software Foundation, Inc.
> + *
> + * This file is part of Libgcrypt.
> + *
> + * Note: This code is heavily based on the GNU MP Library.
> + *	 Actually it's the same code with only minor changes in the
> + *	 way the data is stored; this is to support the abstraction
> + *	 of an optional secure memory allocation which may be used
> + *	 to avoid revealing of sensitive data due to paging etc.
> + */
> +
> +#include "mpi-internal.h"
> +
> +/****************
> + * Add the unsigned integer V to the mpi-integer U and store the
> + * result in W. U and V may be the same.
> + */
> +void mpi_add_ui(MPI w, MPI u, unsigned long v)
> +{
> +	mpi_ptr_t wp, up;
> +	mpi_size_t usize, wsize;
> +	int usign, wsign;
> +
> +	usize = u->nlimbs;
> +	usign = u->sign;
> +	wsign = 0;
> +
> +	/* If not space for W (and possible carry), increase space.  */
> +	wsize = usize + 1;
> +	if (w->alloced < wsize)
> +		mpi_resize(w, wsize);

You are ignoring the mpi_resize() return. I believe these new functions
need to return an int to indicate errors as mpi_powm() does.


> +
> +	/* These must be after realloc (U may be the same as W).  */
> +	up = u->d;
> +	wp = w->d;
> +
> +	if (!usize) {  /* simple */
> +		wp[0] = v;
> +		wsize = v ? 1:0;
> +	} else if (!usign) {  /* mpi is not negative */
> +		mpi_limb_t cy;
> +		cy = mpihelp_add_1(wp, up, usize, v);
> +		wp[usize] = cy;
> +		wsize = usize + cy;
> +	} else {
> +		/* The signs are different.  Need exact comparison to determine
> +		 * which operand to subtract from which.
> +		 */
> +		if (usize == 1 && up[0] < v) {
> +			wp[0] = v - up[0];
> +			wsize = 1;
> +		} else {
> +			mpihelp_sub_1(wp, up, usize, v);
> +			/* Size can decrease with at most one limb. */
> +			wsize = usize - (wp[usize-1] == 0);
> +			wsign = 1;
> +		}
> +	}
> +
> +	w->nlimbs = wsize;
> +	w->sign   = wsign;
> +}
> +
> +
> +void mpi_add(MPI w, MPI u, MPI v)
> +{
> +	mpi_ptr_t wp, up, vp;
> +	mpi_size_t usize, vsize, wsize;
> +	int usign, vsign, wsign;
> +
> +	if (u->nlimbs < v->nlimbs) { /* Swap U and V. */
> +		usize = v->nlimbs;
> +		usign = v->sign;
> +		vsize = u->nlimbs;
> +		vsign = u->sign;
> +		wsize = usize + 1;
> +		RESIZE_IF_NEEDED(w, wsize);
> +		/* These must be after realloc (u or v may be the same as w).  */
> +		up = v->d;
> +		vp = u->d;
> +	} else {
> +		usize = u->nlimbs;
> +		usign = u->sign;
> +		vsize = v->nlimbs;
> +		vsign = v->sign;
> +		wsize = usize + 1;
> +		RESIZE_IF_NEEDED(w, wsize);
> +		/* These must be after realloc (u or v may be the same as w).  */
> +		up = u->d;
> +		vp = v->d;
> +	}
> +	wp = w->d;
> +	wsign = 0;
> +
> +	if (!vsize) {  /* simple */
> +		MPN_COPY(wp, up, usize);
> +		wsize = usize;
> +		wsign = usign;
> +	} else if (usign != vsign) { /* different sign */
> +		/* This test is right since USIZE >= VSIZE */
> +		if (usize != vsize) {
> +			mpihelp_sub(wp, up, usize, vp, vsize);
> +			wsize = usize;
> +			MPN_NORMALIZE(wp, wsize);
> +			wsign = usign;
> +		} else if (mpihelp_cmp(up, vp, usize) < 0) {
> +			mpihelp_sub_n(wp, vp, up, usize);
> +			wsize = usize;
> +			MPN_NORMALIZE(wp, wsize);
> +			if (!usign)
> +				wsign = 1;
> +		} else {
> +			mpihelp_sub_n(wp, up, vp, usize);
> +			wsize = usize;
> +			MPN_NORMALIZE(wp, wsize);
> +			if (usign)
> +				wsign = 1;
> +		}
> +	} else { /* U and V have same sign. Add them. */
> +		mpi_limb_t cy = mpihelp_add(wp, up, usize, vp, vsize);
> +		wp[usize] = cy;
> +		wsize = usize + cy;
> +		if (usign)
> +			wsign = 1;
> +	}
> +
> +	w->nlimbs = wsize;
> +	w->sign = wsign;
> +}
> +EXPORT_SYMBOL_GPL(mpi_add);
> +
> +
> +/****************
> + * Subtract the unsigned integer V from the mpi-integer U and store the
> + * result in W.
> + */
> +void mpi_sub_ui(MPI w, MPI u, unsigned long v)
> +{
> +	mpi_ptr_t wp, up;
> +	mpi_size_t usize, wsize;
> +	int usign, wsign;
> +
> +	usize = u->nlimbs;
> +	usign = u->sign;
> +	wsign = 0;
> +
> +	/* If not space for W (and possible carry), increase space.  */
> +	wsize = usize + 1;
> +	if (w->alloced < wsize)
> +		mpi_resize(w, wsize);
> +
> +	/* These must be after realloc (U may be the same as W).  */
> +	up = u->d;
> +	wp = w->d;
> +
> +	if (!usize) {  /* simple */
> +		wp[0] = v;
> +		wsize = v ? 1:0;
> +		wsign = 1;
> +	} else if (usign) {	/* mpi and v are negative */
> +		mpi_limb_t cy;
> +		cy = mpihelp_add_1(wp, up, usize, v);
> +		wp[usize] = cy;
> +		wsize = usize + cy;
> +	} else {
> +		/* The signs are different.  Need exact comparison to determine
> +		 * which operand to subtract from which.
> +		 */
> +		if (usize == 1 && up[0] < v) {
> +			wp[0] = v - up[0];
> +			wsize = 1;
> +			wsign = 1;
> +		} else {
> +			mpihelp_sub_1(wp, up, usize, v);
> +			/* Size can decrease with at most one limb. */
> +			wsize = usize - (wp[usize-1] == 0);
> +		}
> +	}
> +
> +	w->nlimbs = wsize;
> +	w->sign   = wsign;
> +}
> +
> +void mpi_sub(MPI w, MPI u, MPI v)
> +{
> +	MPI vv = mpi_copy(v);
> +	vv->sign = !vv->sign;
> +	mpi_add(w, u, vv);
> +	mpi_free(vv);
> +}
> +
> +
> +void mpi_addm(MPI w, MPI u, MPI v, MPI m)
> +{
> +	mpi_add(w, u, v);
> +	mpi_mod(w, w, m);
> +}
> +EXPORT_SYMBOL_GPL(mpi_addm);
> +
> +void mpi_subm(MPI w, MPI u, MPI v, MPI m)
> +{
> +	mpi_sub(w, u, v);
> +	mpi_mod(w, w, m);
> +}
> +EXPORT_SYMBOL_GPL(mpi_subm);
> diff --git a/lib/mpi/mpi-bit.c b/lib/mpi/mpi-bit.c
> index 503537e08436..a5119a2bcdd4 100644
> --- a/lib/mpi/mpi-bit.c
> +++ b/lib/mpi/mpi-bit.c
> @@ -32,6 +32,7 @@ void mpi_normalize(MPI a)
>  	for (; a->nlimbs && !a->d[a->nlimbs - 1]; a->nlimbs--)
>  		;
>  }
> +EXPORT_SYMBOL_GPL(mpi_normalize);
>  
>  /****************
>   * Return the number of bits in A.
> @@ -54,3 +55,253 @@ unsigned mpi_get_nbits(MPI a)
>  	return n;
>  }
>  EXPORT_SYMBOL_GPL(mpi_get_nbits);
> +
> +/****************
> + * Test whether bit N is set.
> + */
> +int mpi_test_bit(MPI a, unsigned int n)
> +{
> +	unsigned int limbno, bitno;
> +	mpi_limb_t limb;
> +
> +	limbno = n / BITS_PER_MPI_LIMB;
> +	bitno  = n % BITS_PER_MPI_LIMB;
> +
> +	if (limbno >= a->nlimbs)
> +		return 0; /* too far left: this is a 0 */
> +	limb = a->d[limbno];
> +	return (limb & (A_LIMB_1 << bitno)) ? 1 : 0;
> +}
> +EXPORT_SYMBOL_GPL(mpi_test_bit);
> +
> +/****************
> + * Set bit N of A.
> + */
> +void mpi_set_bit(MPI a, unsigned int n)
> +{
> +	unsigned int i, limbno, bitno;
> +
> +	limbno = n / BITS_PER_MPI_LIMB;
> +	bitno  = n % BITS_PER_MPI_LIMB;
> +
> +	if (limbno >= a->nlimbs) {
> +		for (i = a->nlimbs; i < a->alloced; i++)
> +			a->d[i] = 0;
> +		mpi_resize(a, limbno+1);
> +		a->nlimbs = limbno+1;
> +	}
> +	a->d[limbno] |= (A_LIMB_1<<bitno);
> +}
> +
> +/****************
> + * Set bit N of A. and clear all bits above
> + */
> +void mpi_set_highbit(MPI a, unsigned int n)
> +{
> +	unsigned int i, limbno, bitno;
> +
> +	limbno = n / BITS_PER_MPI_LIMB;
> +	bitno  = n % BITS_PER_MPI_LIMB;
> +
> +	if (limbno >= a->nlimbs) {
> +		for (i = a->nlimbs; i < a->alloced; i++)
> +			a->d[i] = 0;
> +		mpi_resize(a, limbno+1);
> +		a->nlimbs = limbno+1;
> +	}
> +	a->d[limbno] |= (A_LIMB_1<<bitno);
> +	for (bitno++; bitno < BITS_PER_MPI_LIMB; bitno++)
> +		a->d[limbno] &= ~(A_LIMB_1 << bitno);
> +	a->nlimbs = limbno+1;
> +}
> +EXPORT_SYMBOL_GPL(mpi_set_highbit);
> +
> +/****************
> + * clear bit N of A and all bits above
> + */
> +void mpi_clear_highbit(MPI a, unsigned int n)
> +{
> +	unsigned int limbno, bitno;
> +
> +	limbno = n / BITS_PER_MPI_LIMB;
> +	bitno  = n % BITS_PER_MPI_LIMB;
> +
> +	if (limbno >= a->nlimbs)
> +		return; /* not allocated, therefore no need to clear bits :-) */
> +
> +	for ( ; bitno < BITS_PER_MPI_LIMB; bitno++)
> +		a->d[limbno] &= ~(A_LIMB_1 << bitno);
> +	a->nlimbs = limbno+1;
> +}
> +
> +/****************
> + * Clear bit N of A.
> + */
> +void mpi_clear_bit(MPI a, unsigned int n)
> +{
> +	unsigned int limbno, bitno;
> +
> +	limbno = n / BITS_PER_MPI_LIMB;
> +	bitno  = n % BITS_PER_MPI_LIMB;
> +
> +	if (limbno >= a->nlimbs)
> +		return; /* Don't need to clear this bit, it's far too left.  */
> +	a->d[limbno] &= ~(A_LIMB_1 << bitno);
> +}
> +EXPORT_SYMBOL_GPL(mpi_clear_bit);
> +
> +
> +/****************
> + * Shift A by COUNT limbs to the right
> + * This is used only within the MPI library
> + */
> +void mpi_rshift_limbs(MPI a, unsigned int count)
> +{
> +	mpi_ptr_t ap = a->d;
> +	mpi_size_t n = a->nlimbs;
> +	unsigned int i;
> +
> +	if (count >= n) {
> +		a->nlimbs = 0;
> +		return;
> +	}
> +
> +	for (i = 0; i < n - count; i++)
> +		ap[i] = ap[i+count];
> +	ap[i] = 0;
> +	a->nlimbs -= count;
> +}
> +
> +/*
> + * Shift A by N bits to the right.
> + */
> +void mpi_rshift(MPI x, MPI a, unsigned int n)
> +{
> +	mpi_size_t xsize;
> +	unsigned int i;
> +	unsigned int nlimbs = (n/BITS_PER_MPI_LIMB);
> +	unsigned int nbits = (n%BITS_PER_MPI_LIMB);
> +
> +	if (x == a) {
> +		/* In-place operation.  */
> +		if (nlimbs >= x->nlimbs) {
> +			x->nlimbs = 0;
> +			return;
> +		}
> +
> +		if (nlimbs) {
> +			for (i = 0; i < x->nlimbs - nlimbs; i++)
> +				x->d[i] = x->d[i+nlimbs];
> +			x->d[i] = 0;
> +			x->nlimbs -= nlimbs;
> +		}
> +		if (x->nlimbs && nbits)
> +			mpihelp_rshift(x->d, x->d, x->nlimbs, nbits);
> +	} else if (nlimbs) {
> +		/* Copy and shift by more or equal bits than in a limb. */
> +		xsize = a->nlimbs;
> +		x->sign = a->sign;
> +		RESIZE_IF_NEEDED(x, xsize);
> +		x->nlimbs = xsize;
> +		for (i = 0; i < a->nlimbs; i++)
> +			x->d[i] = a->d[i];
> +		x->nlimbs = i;
> +
> +		if (nlimbs >= x->nlimbs) {
> +			x->nlimbs = 0;
> +			return;
> +		}
> +
> +		if (nlimbs) {
> +			for (i = 0; i < x->nlimbs - nlimbs; i++)
> +				x->d[i] = x->d[i+nlimbs];
> +			x->d[i] = 0;
> +			x->nlimbs -= nlimbs;
> +		}
> +
> +		if (x->nlimbs && nbits)
> +			mpihelp_rshift(x->d, x->d, x->nlimbs, nbits);
> +	} else {
> +		/* Copy and shift by less than bits in a limb.  */
> +		xsize = a->nlimbs;
> +		x->sign = a->sign;
> +		RESIZE_IF_NEEDED(x, xsize);
> +		x->nlimbs = xsize;
> +
> +		if (xsize) {
> +			if (nbits)
> +				mpihelp_rshift(x->d, a->d, x->nlimbs, nbits);
> +			else {
> +				/* The rshift helper function is not specified for
> +				 * NBITS==0, thus we do a plain copy here.
> +				 */
> +				for (i = 0; i < x->nlimbs; i++)
> +					x->d[i] = a->d[i];
> +			}
> +		}
> +	}
> +	MPN_NORMALIZE(x->d, x->nlimbs);
> +}
> +
> +/****************
> + * Shift A by COUNT limbs to the left
> + * This is used only within the MPI library
> + */
> +void mpi_lshift_limbs(MPI a, unsigned int count)
> +{
> +	mpi_ptr_t ap;
> +	int n = a->nlimbs;
> +	int i;
> +
> +	if (!count || !n)
> +		return;
> +
> +	RESIZE_IF_NEEDED(a, n+count);
> +
> +	ap = a->d;
> +	for (i = n-1; i >= 0; i--)
> +		ap[i+count] = ap[i];
> +	for (i = 0; i < count; i++)
> +		ap[i] = 0;
> +	a->nlimbs += count;
> +}
> +
> +/*
> + * Shift A by N bits to the left.
> + */
> +void mpi_lshift(MPI x, MPI a, unsigned int n)
> +{
> +	unsigned int nlimbs = (n/BITS_PER_MPI_LIMB);
> +	unsigned int nbits = (n%BITS_PER_MPI_LIMB);
> +
> +	if (x == a && !n)
> +		return;  /* In-place shift with an amount of zero.  */
> +
> +	if (x != a) {
> +		/* Copy A to X.  */
> +		unsigned int alimbs = a->nlimbs;
> +		int asign = a->sign;
> +		mpi_ptr_t xp, ap;
> +
> +		RESIZE_IF_NEEDED(x, alimbs+nlimbs+1);
> +		xp = x->d;
> +		ap = a->d;
> +		MPN_COPY(xp, ap, alimbs);
> +		x->nlimbs = alimbs;
> +		x->flags = a->flags;
> +		x->sign = asign;
> +	}
> +
> +	if (nlimbs && !nbits) {
> +		/* Shift a full number of limbs.  */
> +		mpi_lshift_limbs(x, nlimbs);
> +	} else if (n) {
> +		/* We use a very dump approach: Shift left by the number of
> +		 * limbs plus one and than fix it up by an rshift.
> +		 */
> +		mpi_lshift_limbs(x, nlimbs+1);
> +		mpi_rshift(x, x, BITS_PER_MPI_LIMB - nbits);
> +	}
> +
> +	MPN_NORMALIZE(x->d, x->nlimbs);
> +}
> diff --git a/lib/mpi/mpi-cmp.c b/lib/mpi/mpi-cmp.c
> index d25e9e96c310..c4cfa3ff0581 100644
> --- a/lib/mpi/mpi-cmp.c
> +++ b/lib/mpi/mpi-cmp.c
> @@ -41,28 +41,54 @@ int mpi_cmp_ui(MPI u, unsigned long v)
>  }
>  EXPORT_SYMBOL_GPL(mpi_cmp_ui);
>  
> -int mpi_cmp(MPI u, MPI v)
> +static int do_mpi_cmp(MPI u, MPI v, int absmode)
>  {
> -	mpi_size_t usize, vsize;
> +	mpi_size_t usize;
> +	mpi_size_t vsize;
> +	int usign;
> +	int vsign;
>  	int cmp;
>  
>  	mpi_normalize(u);
>  	mpi_normalize(v);
> +
>  	usize = u->nlimbs;
>  	vsize = v->nlimbs;
> -	if (!u->sign && v->sign)
> +	usign = absmode ? 0 : u->sign;
> +	vsign = absmode ? 0 : v->sign;
> +
> +	/* Compare sign bits.  */
> +
> +	if (!usign && vsign)
>  		return 1;
> -	if (u->sign && !v->sign)
> +	if (usign && !vsign)
>  		return -1;
> -	if (usize != vsize && !u->sign && !v->sign)
> +
> +	/* U and V are either both positive or both negative.  */
> +
> +	if (usize != vsize && !usign && !vsign)
>  		return usize - vsize;
> -	if (usize != vsize && u->sign && v->sign)
> -		return vsize - usize;
> +	if (usize != vsize && usign && vsign)
> +		return vsize + usize;
>  	if (!usize)
>  		return 0;
>  	cmp = mpihelp_cmp(u->d, v->d, usize);
> -	if (u->sign)
> -		return -cmp;
> -	return cmp;
> +	if (!cmp)
> +		return 0;
> +	if ((cmp < 0?1:0) == (usign?1:0))
> +		return 1;
> +
> +	return -1;
> +}
> +
> +int mpi_cmp(MPI u, MPI v)
> +{
> +	return do_mpi_cmp(u, v, 0);
>  }
>  EXPORT_SYMBOL_GPL(mpi_cmp);
> +
> +int mpi_cmpabs(MPI u, MPI v)
> +{
> +	return do_mpi_cmp(u, v, 1);
> +}
> +EXPORT_SYMBOL_GPL(mpi_cmpabs);
> diff --git a/lib/mpi/mpi-div.c b/lib/mpi/mpi-div.c
> new file mode 100644
> index 000000000000..21332dab97d4
> --- /dev/null
> +++ b/lib/mpi/mpi-div.c
> @@ -0,0 +1,238 @@
> +/* mpi-div.c  -  MPI functions
> + * Copyright (C) 1994, 1996, 1998, 2001, 2002,
> + *               2003 Free Software Foundation, Inc.
> + *
> + * This file is part of Libgcrypt.
> + *
> + * Note: This code is heavily based on the GNU MP Library.
> + *	 Actually it's the same code with only minor changes in the
> + *	 way the data is stored; this is to support the abstraction
> + *	 of an optional secure memory allocation which may be used
> + *	 to avoid revealing of sensitive data due to paging etc.
> + */
> +
> +#include "mpi-internal.h"
> +#include "longlong.h"
> +
> +void mpi_tdiv_qr(MPI quot, MPI rem, MPI num, MPI den);
> +void mpi_fdiv_qr(MPI quot, MPI rem, MPI dividend, MPI divisor);
> +
> +void mpi_fdiv_r(MPI rem, MPI dividend, MPI divisor)
> +{
> +	int divisor_sign = divisor->sign;
> +	MPI temp_divisor = NULL;
> +
> +	/* We need the original value of the divisor after the remainder has been
> +	 * preliminary calculated.	We have to copy it to temporary space if it's
> +	 * the same variable as REM.
> +	 */
> +	if (rem == divisor) {
> +		temp_divisor = mpi_copy(divisor);
> +		divisor = temp_divisor;
> +	}
> +
> +	mpi_tdiv_r(rem, dividend, divisor);
> +
> +	if (((divisor_sign?1:0) ^ (dividend->sign?1:0)) && rem->nlimbs)
> +		mpi_add(rem, rem, divisor);
> +
> +	if (temp_divisor)
> +		mpi_free(temp_divisor);
> +}
> +
> +void mpi_fdiv_q(MPI quot, MPI dividend, MPI divisor)
> +{
> +	MPI tmp = mpi_alloc(mpi_get_nlimbs(quot));
> +	mpi_fdiv_qr(quot, tmp, dividend, divisor);
> +	mpi_free(tmp);
> +}
> +
> +void mpi_fdiv_qr(MPI quot, MPI rem, MPI dividend, MPI divisor)
> +{
> +	int divisor_sign = divisor->sign;
> +	MPI temp_divisor = NULL;
> +
> +	if (quot == divisor || rem == divisor) {
> +		temp_divisor = mpi_copy(divisor);
> +		divisor = temp_divisor;
> +	}
> +
> +	mpi_tdiv_qr(quot, rem, dividend, divisor);
> +
> +	if ((divisor_sign ^ dividend->sign) && rem->nlimbs) {
> +		mpi_sub_ui(quot, quot, 1);
> +		mpi_add(rem, rem, divisor);
> +	}
> +
> +	if (temp_divisor)
> +		mpi_free(temp_divisor);
> +}
> +
> +/* If den == quot, den needs temporary storage.
> + * If den == rem, den needs temporary storage.
> + * If num == quot, num needs temporary storage.
> + * If den has temporary storage, it can be normalized while being copied,
> + *   i.e no extra storage should be allocated.
> + */
> +
> +void mpi_tdiv_r(MPI rem, MPI num, MPI den)
> +{
> +	mpi_tdiv_qr(NULL, rem, num, den);
> +}
> +
> +void mpi_tdiv_qr(MPI quot, MPI rem, MPI num, MPI den)
> +{
> +	mpi_ptr_t np, dp;
> +	mpi_ptr_t qp, rp;
> +	mpi_size_t nsize = num->nlimbs;
> +	mpi_size_t dsize = den->nlimbs;
> +	mpi_size_t qsize, rsize;
> +	mpi_size_t sign_remainder = num->sign;
> +	mpi_size_t sign_quotient = num->sign ^ den->sign;
> +	unsigned int normalization_steps;
> +	mpi_limb_t q_limb;
> +	mpi_ptr_t marker[5];
> +	unsigned int marker_nlimbs[5];
> +	int markidx = 0;
> +
> +	/* Ensure space is enough for quotient and remainder.
> +	 * We need space for an extra limb in the remainder, because it's
> +	 * up-shifted (normalized) below.
> +	 */
> +	rsize = nsize + 1;
> +	mpi_resize(rem, rsize);
> +
> +	qsize = rsize - dsize;	  /* qsize cannot be bigger than this.	*/
> +	if (qsize <= 0) {
> +		if (num != rem) {
> +			rem->nlimbs = num->nlimbs;
> +			rem->sign = num->sign;
> +			MPN_COPY(rem->d, num->d, nsize);
> +		}
> +		if (quot) {
> +			/* This needs to follow the assignment to rem, in case the
> +			 * numerator and quotient are the same.
> +			 */
> +			quot->nlimbs = 0;
> +			quot->sign = 0;
> +		}
> +		return;
> +	}
> +
> +	if (quot)
> +		mpi_resize(quot, qsize);
> +
> +	/* Read pointers here, when reallocation is finished.  */
> +	np = num->d;
> +	dp = den->d;
> +	rp = rem->d;
> +
> +	/* Optimize division by a single-limb divisor.  */
> +	if (dsize == 1) {
> +		mpi_limb_t rlimb;
> +		if (quot) {
> +			qp = quot->d;
> +			rlimb = mpihelp_divmod_1(qp, np, nsize, dp[0]);
> +			qsize -= qp[qsize - 1] == 0;
> +			quot->nlimbs = qsize;
> +			quot->sign = sign_quotient;
> +		} else
> +			rlimb = mpihelp_mod_1(np, nsize, dp[0]);
> +		rp[0] = rlimb;
> +		rsize = rlimb != 0?1:0;
> +		rem->nlimbs = rsize;
> +		rem->sign = sign_remainder;
> +		return;
> +	}
> +
> +
> +	if (quot) {
> +		qp = quot->d;
> +		/* Make sure QP and NP point to different objects.  Otherwise the
> +		 * numerator would be gradually overwritten by the quotient limbs.
> +		 */
> +		if (qp == np) { /* Copy NP object to temporary space.  */
> +			marker_nlimbs[markidx] = nsize;
> +			np = marker[markidx++] = mpi_alloc_limb_space(nsize);
> +			MPN_COPY(np, qp, nsize);
> +		}
> +	} else /* Put quotient at top of remainder. */
> +		qp = rp + dsize;
> +
> +	normalization_steps = count_leading_zeros(dp[dsize - 1]);
> +
> +	/* Normalize the denominator, i.e. make its most significant bit set by
> +	 * shifting it NORMALIZATION_STEPS bits to the left.  Also shift the
> +	 * numerator the same number of steps (to keep the quotient the same!).
> +	 */
> +	if (normalization_steps) {
> +		mpi_ptr_t tp;
> +		mpi_limb_t nlimb;
> +
> +		/* Shift up the denominator setting the most significant bit of
> +		 * the most significant word.  Use temporary storage not to clobber
> +		 * the original contents of the denominator.
> +		 */
> +		marker_nlimbs[markidx] = dsize;
> +		tp = marker[markidx++] = mpi_alloc_limb_space(dsize);
> +		mpihelp_lshift(tp, dp, dsize, normalization_steps);
> +		dp = tp;
> +
> +		/* Shift up the numerator, possibly introducing a new most
> +		 * significant word.  Move the shifted numerator in the remainder
> +		 * meanwhile.
> +		 */
> +		nlimb = mpihelp_lshift(rp, np, nsize, normalization_steps);
> +		if (nlimb) {
> +			rp[nsize] = nlimb;
> +			rsize = nsize + 1;
> +		} else
> +			rsize = nsize;
> +	} else {
> +		/* The denominator is already normalized, as required.	Copy it to
> +		 * temporary space if it overlaps with the quotient or remainder.
> +		 */
> +		if (dp == rp || (quot && (dp == qp))) {
> +			mpi_ptr_t tp;
> +
> +			marker_nlimbs[markidx] = dsize;
> +			tp = marker[markidx++] = mpi_alloc_limb_space(dsize);
> +			MPN_COPY(tp, dp, dsize);
> +			dp = tp;
> +		}
> +
> +		/* Move the numerator to the remainder.  */
> +		if (rp != np)
> +			MPN_COPY(rp, np, nsize);
> +
> +		rsize = nsize;
> +	}
> +
> +	q_limb = mpihelp_divrem(qp, 0, rp, rsize, dp, dsize);
> +
> +	if (quot) {
> +		qsize = rsize - dsize;
> +		if (q_limb) {
> +			qp[qsize] = q_limb;
> +			qsize += 1;
> +		}
> +
> +		quot->nlimbs = qsize;
> +		quot->sign = sign_quotient;
> +	}
> +
> +	rsize = dsize;
> +	MPN_NORMALIZE(rp, rsize);
> +
> +	if (normalization_steps && rsize) {
> +		mpihelp_rshift(rp, rp, rsize, normalization_steps);
> +		rsize -= rp[rsize - 1] == 0?1:0;
> +	}
> +
> +	rem->nlimbs = rsize;
> +	rem->sign	= sign_remainder;
> +	while (markidx) {
> +		markidx--;
> +		mpi_free_limb_space(marker[markidx]);
> +	}
> +}
> diff --git a/lib/mpi/mpi-internal.h b/lib/mpi/mpi-internal.h
> index 91df5f0b70f2..d29c4537c3a3 100644
> --- a/lib/mpi/mpi-internal.h
> +++ b/lib/mpi/mpi-internal.h
> @@ -52,6 +52,12 @@
>  typedef mpi_limb_t *mpi_ptr_t;	/* pointer to a limb */
>  typedef int mpi_size_t;		/* (must be a signed type) */
>  
> +#define RESIZE_IF_NEEDED(a, b)			\
> +	do {					\
> +		if ((a)->alloced < (b))		\
> +			mpi_resize((a), (b));	\
> +	} while (0)
> +
>  /* Copy N limbs from S to D.  */
>  #define MPN_COPY(d, s, n) \
>  	do {					\
> @@ -60,6 +66,14 @@ typedef int mpi_size_t;		/* (must be a signed type) */
>  			(d)[_i] = (s)[_i];	\
>  	} while (0)
>  
> +#define MPN_COPY_INCR(d, s, n)		\
> +	do {					\
> +		mpi_size_t _i;			\
> +		for (_i = 0; _i < (n); _i++)	\
> +			(d)[_i] = (s)[_i];	\
> +	} while (0)
> +
> +
>  #define MPN_COPY_DECR(d, s, n) \
>  	do {					\
>  		mpi_size_t _i;			\
> @@ -92,6 +106,38 @@ typedef int mpi_size_t;		/* (must be a signed type) */
>  			mul_n(prodp, up, vp, size, tspace);	\
>  	} while (0);
>  
> +/* Divide the two-limb number in (NH,,NL) by D, with DI being the largest
> + * limb not larger than (2**(2*BITS_PER_MP_LIMB))/D - (2**BITS_PER_MP_LIMB).
> + * If this would yield overflow, DI should be the largest possible number
> + * (i.e., only ones).  For correct operation, the most significant bit of D
> + * has to be set.  Put the quotient in Q and the remainder in R.
> + */
> +#define UDIV_QRNND_PREINV(q, r, nh, nl, d, di)				\
> +	do {								\
> +		mpi_limb_t _ql;						\
> +		mpi_limb_t _q, _r;					\
> +		mpi_limb_t _xh, _xl;					\
> +		umul_ppmm(_q, _ql, (nh), (di));				\
> +		_q += (nh);	/* DI is 2**BITS_PER_MPI_LIMB too small */ \
> +		umul_ppmm(_xh, _xl, _q, (d));				\
> +		sub_ddmmss(_xh, _r, (nh), (nl), _xh, _xl);		\
> +		if (_xh) {						\
> +			sub_ddmmss(_xh, _r, _xh, _r, 0, (d));		\
> +			_q++;						\
> +			if (_xh) {					\
> +				sub_ddmmss(_xh, _r, _xh, _r, 0, (d));	\
> +				_q++;					\
> +			}						\
> +		}							\
> +		if (_r >= (d)) {					\
> +			_r -= (d);					\
> +			_q++;						\
> +		}							\
> +		(r) = _r;						\
> +		(q) = _q;						\
> +	} while (0)
> +
> +
>  /*-- mpiutil.c --*/
>  mpi_ptr_t mpi_alloc_limb_space(unsigned nlimbs);
>  void mpi_free_limb_space(mpi_ptr_t a);
> @@ -135,6 +181,8 @@ int mpihelp_mul(mpi_ptr_t prodp, mpi_ptr_t up, mpi_size_t usize,
>  void mpih_sqr_n_basecase(mpi_ptr_t prodp, mpi_ptr_t up, mpi_size_t size);
>  void mpih_sqr_n(mpi_ptr_t prodp, mpi_ptr_t up, mpi_size_t size,
>  		mpi_ptr_t tspace);
> +void mpihelp_mul_n(mpi_ptr_t prodp,
> +		mpi_ptr_t up, mpi_ptr_t vp, mpi_size_t size);
>  
>  int mpihelp_mul_karatsuba_case(mpi_ptr_t prodp,
>  			       mpi_ptr_t up, mpi_size_t usize,
> @@ -146,9 +194,14 @@ mpi_limb_t mpihelp_mul_1(mpi_ptr_t res_ptr, mpi_ptr_t s1_ptr,
>  			 mpi_size_t s1_size, mpi_limb_t s2_limb);
>  
>  /*-- mpih-div.c --*/
> +mpi_limb_t mpihelp_mod_1(mpi_ptr_t dividend_ptr, mpi_size_t dividend_size,
> +			 mpi_limb_t divisor_limb);
>  mpi_limb_t mpihelp_divrem(mpi_ptr_t qp, mpi_size_t qextra_limbs,
>  			  mpi_ptr_t np, mpi_size_t nsize,
>  			  mpi_ptr_t dp, mpi_size_t dsize);
> +mpi_limb_t mpihelp_divmod_1(mpi_ptr_t quot_ptr,
> +			    mpi_ptr_t dividend_ptr, mpi_size_t dividend_size,
> +			    mpi_limb_t divisor_limb);
>  
>  /*-- generic_mpih-[lr]shift.c --*/
>  mpi_limb_t mpihelp_lshift(mpi_ptr_t wp, mpi_ptr_t up, mpi_size_t usize,
> diff --git a/lib/mpi/mpi-inv.c b/lib/mpi/mpi-inv.c
> new file mode 100644
> index 000000000000..61e37d18f793
> --- /dev/null
> +++ b/lib/mpi/mpi-inv.c
> @@ -0,0 +1,143 @@
> +/* mpi-inv.c  -  MPI functions
> + *	Copyright (C) 1998, 2001, 2002, 2003 Free Software Foundation, Inc.
> + *
> + * This file is part of Libgcrypt.
> + *
> + * Libgcrypt is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as
> + * published by the Free Software Foundation; either version 2.1 of
> + * the License, or (at your option) any later version.
> + *
> + * Libgcrypt is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "mpi-internal.h"
> +
> +/****************
> + * Calculate the multiplicative inverse X of A mod N
> + * That is: Find the solution x for
> + *		1 = (a*x) mod n
> + */
> +int mpi_invm(MPI x, MPI a, MPI n)
> +{
> +	/* Extended Euclid's algorithm (See TAOCP Vol II, 4.5.2, Alg X)
> +	 * modified according to Michael Penk's solution for Exercise 35
> +	 * with further enhancement
> +	 */
> +	MPI u, v, u1, u2 = NULL, u3, v1, v2 = NULL, v3, t1, t2 = NULL, t3;
> +	unsigned int k;
> +	int sign;
> +	int odd;
> +
> +	if (!mpi_cmp_ui(a, 0))
> +		return 0; /* Inverse does not exists.  */
> +	if (!mpi_cmp_ui(n, 1))
> +		return 0; /* Inverse does not exists.  */
> +
> +	u = mpi_copy(a);
> +	v = mpi_copy(n);
> +
> +	for (k = 0; !mpi_test_bit(u, 0) && !mpi_test_bit(v, 0); k++) {
> +		mpi_rshift(u, u, 1);
> +		mpi_rshift(v, v, 1);
> +	}
> +	odd = mpi_test_bit(v, 0);
> +
> +	u1 = mpi_alloc_set_ui(1);
> +	if (!odd)
> +		u2 = mpi_alloc_set_ui(0);
> +	u3 = mpi_copy(u);
> +	v1 = mpi_copy(v);
> +	if (!odd) {
> +		v2 = mpi_alloc(mpi_get_nlimbs(u));
> +		mpi_sub(v2, u1, u); /* U is used as const 1 */
> +	}
> +	v3 = mpi_copy(v);
> +	if (mpi_test_bit(u, 0)) { /* u is odd */
> +		t1 = mpi_alloc_set_ui(0);
> +		if (!odd) {
> +			t2 = mpi_alloc_set_ui(1);
> +			t2->sign = 1;
> +		}
> +		t3 = mpi_copy(v);
> +		t3->sign = !t3->sign;
> +		goto Y4;
> +	} else {
> +		t1 = mpi_alloc_set_ui(1);
> +		if (!odd)
> +			t2 = mpi_alloc_set_ui(0);
> +		t3 = mpi_copy(u);
> +	}
> +
> +	do {
> +		do {
> +			if (!odd) {
> +				if (mpi_test_bit(t1, 0) || mpi_test_bit(t2, 0)) {
> +					/* one is odd */
> +					mpi_add(t1, t1, v);
> +					mpi_sub(t2, t2, u);
> +				}
> +				mpi_rshift(t1, t1, 1);
> +				mpi_rshift(t2, t2, 1);
> +				mpi_rshift(t3, t3, 1);
> +			} else {
> +				if (mpi_test_bit(t1, 0))
> +					mpi_add(t1, t1, v);
> +				mpi_rshift(t1, t1, 1);
> +				mpi_rshift(t3, t3, 1);
> +			}
> +Y4:
> +			;
> +		} while (!mpi_test_bit(t3, 0)); /* while t3 is even */
> +
> +		if (!t3->sign) {
> +			mpi_set(u1, t1);
> +			if (!odd)
> +				mpi_set(u2, t2);
> +			mpi_set(u3, t3);
> +		} else {
> +			mpi_sub(v1, v, t1);
> +			sign = u->sign; u->sign = !u->sign;
> +			if (!odd)
> +				mpi_sub(v2, u, t2);
> +			u->sign = sign;
> +			sign = t3->sign; t3->sign = !t3->sign;
> +			mpi_set(v3, t3);
> +			t3->sign = sign;
> +		}
> +		mpi_sub(t1, u1, v1);
> +		if (!odd)
> +			mpi_sub(t2, u2, v2);
> +		mpi_sub(t3, u3, v3);
> +		if (t1->sign) {
> +			mpi_add(t1, t1, v);
> +			if (!odd)
> +				mpi_sub(t2, t2, u);
> +		}
> +	} while (mpi_cmp_ui(t3, 0)); /* while t3 != 0 */
> +	/* mpi_lshift( u3, k ); */
> +	mpi_set(x, u1);
> +
> +	mpi_free(u1);
> +	mpi_free(v1);
> +	mpi_free(t1);
> +	if (!odd) {
> +		mpi_free(u2);
> +		mpi_free(v2);
> +		mpi_free(t2);
> +	}
> +	mpi_free(u3);
> +	mpi_free(v3);
> +	mpi_free(t3);
> +
> +	mpi_free(u);
> +	mpi_free(v);
> +	return 1;
> +}
> +EXPORT_SYMBOL_GPL(mpi_invm);
> diff --git a/lib/mpi/mpi-mod.c b/lib/mpi/mpi-mod.c
> new file mode 100644
> index 000000000000..47bc59edd4ff
> --- /dev/null
> +++ b/lib/mpi/mpi-mod.c
> @@ -0,0 +1,155 @@
> +/* mpi-mod.c -  Modular reduction
> + * Copyright (C) 1998, 1999, 2001, 2002, 2003,
> + *               2007  Free Software Foundation, Inc.
> + *
> + * This file is part of Libgcrypt.
> + */
> +
> +
> +#include "mpi-internal.h"
> +#include "longlong.h"
> +
> +/* Context used with Barrett reduction.  */
> +struct barrett_ctx_s {
> +	MPI m;   /* The modulus - may not be modified. */
> +	int m_copied;   /* If true, M needs to be released.  */
> +	int k;
> +	MPI y;
> +	MPI r1;  /* Helper MPI. */
> +	MPI r2;  /* Helper MPI. */
> +	MPI r3;  /* Helper MPI allocated on demand. */
> +};
> +
> +
> +
> +void mpi_mod(MPI rem, MPI dividend, MPI divisor)
> +{
> +	mpi_fdiv_r(rem, dividend, divisor);
> +}
> +
> +/* This function returns a new context for Barrett based operations on
> + * the modulus M.  This context needs to be released using
> + * _gcry_mpi_barrett_free.  If COPY is true M will be transferred to
> + * the context and the user may change M.  If COPY is false, M may not
> + * be changed until gcry_mpi_barrett_free has been called.
> + */
> +mpi_barrett_t mpi_barrett_init(MPI m, int copy)
> +{
> +	mpi_barrett_t ctx;
> +	MPI tmp;
> +
> +	mpi_normalize(m);
> +	ctx = kcalloc(1, sizeof(*ctx), GFP_KERNEL);
> +
> +	if (copy) {
> +		ctx->m = mpi_copy(m);
> +		ctx->m_copied = 1;
> +	} else
> +		ctx->m = m;
> +
> +	ctx->k = mpi_get_nlimbs(m);
> +	tmp = mpi_alloc(ctx->k + 1);
> +
> +	/* Barrett precalculation: y = floor(b^(2k) / m). */
> +	mpi_set_ui(tmp, 1);
> +	mpi_lshift_limbs(tmp, 2 * ctx->k);
> +	mpi_fdiv_q(tmp, tmp, m);
> +
> +	ctx->y  = tmp;
> +	ctx->r1 = mpi_alloc(2 * ctx->k + 1);
> +	ctx->r2 = mpi_alloc(2 * ctx->k + 1);
> +
> +	return ctx;
> +}
> +
> +void mpi_barrett_free(mpi_barrett_t ctx)
> +{
> +	if (ctx) {
> +		mpi_free(ctx->y);
> +		mpi_free(ctx->r1);
> +		mpi_free(ctx->r2);
> +		if (ctx->r3)
> +			mpi_free(ctx->r3);
> +		if (ctx->m_copied)
> +			mpi_free(ctx->m);
> +		kfree(ctx);
> +	}
> +}
> +
> +
> +/* R = X mod M
> + *
> + * Using Barrett reduction.  Before using this function
> + * _gcry_mpi_barrett_init must have been called to do the
> + * precalculations.  CTX is the context created by this precalculation
> + * and also conveys M.  If the Barret reduction could no be done a
> + * straightforward reduction method is used.
> + *
> + * We assume that these conditions are met:
> + * Input:  x =(x_2k-1 ...x_0)_b
> + *     m =(m_k-1 ....m_0)_b	  with m_k-1 != 0
> + * Output: r = x mod m
> + */
> +void mpi_mod_barrett(MPI r, MPI x, mpi_barrett_t ctx)
> +{
> +	MPI m = ctx->m;
> +	int k = ctx->k;
> +	MPI y = ctx->y;
> +	MPI r1 = ctx->r1;
> +	MPI r2 = ctx->r2;
> +	int sign;
> +
> +	mpi_normalize(x);
> +	if (mpi_get_nlimbs(x) > 2*k) {
> +		mpi_mod(r, x, m);
> +		return;
> +	}
> +
> +	sign = x->sign;
> +	x->sign = 0;
> +
> +	/* 1. q1 = floor( x / b^k-1)
> +	 *    q2 = q1 * y
> +	 *    q3 = floor( q2 / b^k+1 )
> +	 * Actually, we don't need qx, we can work direct on r2
> +	 */
> +	mpi_set(r2, x);
> +	mpi_rshift_limbs(r2, k-1);
> +	mpi_mul(r2, r2, y);
> +	mpi_rshift_limbs(r2, k+1);
> +
> +	/* 2. r1 = x mod b^k+1
> +	 *	r2 = q3 * m mod b^k+1
> +	 *	r  = r1 - r2
> +	 * 3. if r < 0 then  r = r + b^k+1
> +	 */
> +	mpi_set(r1, x);
> +	if (r1->nlimbs > k+1) /* Quick modulo operation.  */
> +		r1->nlimbs = k+1;
> +	mpi_mul(r2, r2, m);
> +	if (r2->nlimbs > k+1) /* Quick modulo operation. */
> +		r2->nlimbs = k+1;
> +	mpi_sub(r, r1, r2);
> +
> +	if (mpi_has_sign(r)) {
> +		if (!ctx->r3) {
> +			ctx->r3 = mpi_alloc(k + 2);
> +			mpi_set_ui(ctx->r3, 1);
> +			mpi_lshift_limbs(ctx->r3, k + 1);
> +		}
> +		mpi_add(r, r, ctx->r3);
> +	}
> +
> +	/* 4. while r >= m do r = r - m */
> +	while (mpi_cmp(r, m) >= 0)
> +		mpi_sub(r, r, m);
> +
> +	x->sign = sign;
> +}
> +
> +
> +void mpi_mul_barrett(MPI w, MPI u, MPI v, mpi_barrett_t ctx)
> +{
> +	mpi_mul(w, u, v);
> +	mpi_mod_barrett(w, w, ctx);
> +}
> diff --git a/lib/mpi/mpi-mul.c b/lib/mpi/mpi-mul.c
> new file mode 100644
> index 000000000000..587e6335cc12
> --- /dev/null
> +++ b/lib/mpi/mpi-mul.c
> @@ -0,0 +1,94 @@
> +/* mpi-mul.c  -  MPI functions
> + * Copyright (C) 1994, 1996, 1998, 2001, 2002,
> + *               2003 Free Software Foundation, Inc.
> + *
> + * This file is part of Libgcrypt.
> + *
> + * Note: This code is heavily based on the GNU MP Library.
> + *	 Actually it's the same code with only minor changes in the
> + *	 way the data is stored; this is to support the abstraction
> + *	 of an optional secure memory allocation which may be used
> + *	 to avoid revealing of sensitive data due to paging etc.
> + */
> +
> +#include "mpi-internal.h"
> +
> +void mpi_mul(MPI w, MPI u, MPI v)
> +{
> +	mpi_size_t usize, vsize, wsize;
> +	mpi_ptr_t up, vp, wp;
> +	mpi_limb_t cy;
> +	int usign, vsign, sign_product;
> +	int assign_wp = 0;
> +	mpi_ptr_t tmp_limb = NULL;
> +	unsigned int tmp_limb_nlimbs = 0;
> +
> +	if (u->nlimbs < v->nlimbs) {
> +		/* Swap U and V. */
> +		usize = v->nlimbs;
> +		usign = v->sign;
> +		up    = v->d;
> +		vsize = u->nlimbs;
> +		vsign = u->sign;
> +		vp    = u->d;
> +	} else {
> +		usize = u->nlimbs;
> +		usign = u->sign;
> +		up    = u->d;
> +		vsize = v->nlimbs;
> +		vsign = v->sign;
> +		vp    = v->d;
> +	}
> +	sign_product = usign ^ vsign;
> +	wp = w->d;
> +
> +	/* Ensure W has space enough to store the result.  */
> +	wsize = usize + vsize;
> +	if (w->alloced < wsize) {
> +		if (wp == up || wp == vp) {
> +			wp = mpi_alloc_limb_space(wsize);
> +			assign_wp = 1;
> +		} else {
> +			mpi_resize(w, wsize);
> +			wp = w->d;
> +		}
> +	} else { /* Make U and V not overlap with W.	*/
> +		if (wp == up) {
> +			/* W and U are identical.  Allocate temporary space for U. */
> +			tmp_limb_nlimbs = usize;
> +			up = tmp_limb = mpi_alloc_limb_space(usize);
> +			/* Is V identical too?  Keep it identical with U.  */
> +			if (wp == vp)
> +				vp = up;
> +			/* Copy to the temporary space.  */
> +			MPN_COPY(up, wp, usize);
> +		} else if (wp == vp) {
> +			/* W and V are identical.  Allocate temporary space for V. */
> +			tmp_limb_nlimbs = vsize;
> +			vp = tmp_limb = mpi_alloc_limb_space(vsize);
> +			/* Copy to the temporary space.  */
> +			MPN_COPY(vp, wp, vsize);
> +		}
> +	}
> +
> +	if (!vsize)
> +		wsize = 0;
> +	else {
> +		mpihelp_mul(wp, up, usize, vp, vsize, &cy);
> +		wsize -= cy ? 0:1;
> +	}
> +
> +	if (assign_wp)
> +		mpi_assign_limb_space(w, wp, wsize);
> +	w->nlimbs = wsize;
> +	w->sign = sign_product;
> +	if (tmp_limb)
> +		mpi_free_limb_space(tmp_limb);
> +}
> +
> +void mpi_mulm(MPI w, MPI u, MPI v, MPI m)
> +{
> +	mpi_mul(w, u, v);
> +	mpi_tdiv_r(w, w, m);
> +}
> +EXPORT_SYMBOL_GPL(mpi_mulm);
> diff --git a/lib/mpi/mpicoder.c b/lib/mpi/mpicoder.c
> index eead4b339466..7ea225b2204f 100644
> --- a/lib/mpi/mpicoder.c
> +++ b/lib/mpi/mpicoder.c
> @@ -25,6 +25,7 @@
>  #include <linux/string.h>
>  #include "mpi-internal.h"
>  
> +#define MAX_EXTERN_SCAN_BYTES (16*1024*1024)
>  #define MAX_EXTERN_MPI_BITS 16384
>  
>  /**
> @@ -109,6 +110,112 @@ MPI mpi_read_from_buffer(const void *xbuffer, unsigned *ret_nread)
>  }
>  EXPORT_SYMBOL_GPL(mpi_read_from_buffer);
>  
> +/****************
> + * Fill the mpi VAL from the hex string in STR.
> + */
> +int mpi_fromstr(MPI val, const char *str)
> +{
> +	int sign = 0;
> +	int prepend_zero = 0;
> +	int i, j, c, c1, c2;
> +	unsigned int nbits, nbytes, nlimbs;
> +	mpi_limb_t a;
> +
> +	if (*str == '-') {
> +		sign = 1;
> +		str++;
> +	}
> +
> +	/* Skip optional hex prefix.  */
> +	if (*str == '0' && str[1] == 'x')
> +		str += 2;
> +
> +	nbits = strlen(str);
> +	if (nbits > MAX_EXTERN_SCAN_BYTES) {
> +		mpi_clear(val);
> +		return -EINVAL;
> +	}
> +	nbits *= 4;
> +	if ((nbits % 8))
> +		prepend_zero = 1;
> +
> +	nbytes = (nbits+7) / 8;
> +	nlimbs = (nbytes+BYTES_PER_MPI_LIMB-1) / BYTES_PER_MPI_LIMB;
> +
> +	if (val->alloced < nlimbs)
> +		mpi_resize(val, nlimbs);
> +
> +	i = BYTES_PER_MPI_LIMB - (nbytes % BYTES_PER_MPI_LIMB);
> +	i %= BYTES_PER_MPI_LIMB;
> +	j = val->nlimbs = nlimbs;
> +	val->sign = sign;
> +	for (; j > 0; j--) {
> +		a = 0;
> +		for (; i < BYTES_PER_MPI_LIMB; i++) {
> +			if (prepend_zero) {
> +				c1 = '0';
> +				prepend_zero = 0;
> +			} else
> +				c1 = *str++;
> +
> +			if (!c1) {
> +				mpi_clear(val);
> +				return -EINVAL;
> +			}
> +			c2 = *str++;
> +			if (!c2) {
> +				mpi_clear(val);
> +				return -EINVAL;
> +			}
> +			if (c1 >= '0' && c1 <= '9')
> +				c = c1 - '0';
> +			else if (c1 >= 'a' && c1 <= 'f')
> +				c = c1 - 'a' + 10;
> +			else if (c1 >= 'A' && c1 <= 'F')
> +				c = c1 - 'A' + 10;
> +			else {
> +				mpi_clear(val);
> +				return -EINVAL;
> +			}
> +			c <<= 4;
> +			if (c2 >= '0' && c2 <= '9')
> +				c |= c2 - '0';
> +			else if (c2 >= 'a' && c2 <= 'f')
> +				c |= c2 - 'a' + 10;
> +			else if (c2 >= 'A' && c2 <= 'F')
> +				c |= c2 - 'A' + 10;
> +			else {
> +				mpi_clear(val);
> +				return -EINVAL;
> +			}
> +			a <<= 8;
> +			a |= c;
> +		}
> +		i = 0;
> +		val->d[j-1] = a;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(mpi_fromstr);
> +
> +MPI mpi_scanval(const char *string)
> +{
> +	MPI a;
> +
> +	a = mpi_alloc(0);
> +	if (!a)
> +		return NULL;
> +
> +	if (mpi_fromstr(a, string)) {
> +		mpi_free(a);
> +		return NULL;
> +	}
> +	mpi_normalize(a);
> +	return a;
> +}
> +EXPORT_SYMBOL_GPL(mpi_scanval);
> +
>  static int count_lzeros(MPI a)
>  {
>  	mpi_limb_t alimb;
> @@ -413,3 +520,232 @@ MPI mpi_read_raw_from_sgl(struct scatterlist *sgl, unsigned int nbytes)
>  	return val;
>  }
>  EXPORT_SYMBOL_GPL(mpi_read_raw_from_sgl);
> +
> +/* Perform a two's complement operation on buffer P of size N bytes.  */
> +static void twocompl(unsigned char *p, unsigned int n)
> +{
> +	int i;
> +
> +	for (i = n-1; i >= 0 && !p[i]; i--)
> +		;
> +	if (i >= 0) {
> +		if ((p[i] & 0x01))
> +			p[i] = (((p[i] ^ 0xfe) | 0x01) & 0xff);
> +		else if ((p[i] & 0x02))
> +			p[i] = (((p[i] ^ 0xfc) | 0x02) & 0xfe);
> +		else if ((p[i] & 0x04))
> +			p[i] = (((p[i] ^ 0xf8) | 0x04) & 0xfc);
> +		else if ((p[i] & 0x08))
> +			p[i] = (((p[i] ^ 0xf0) | 0x08) & 0xf8);
> +		else if ((p[i] & 0x10))
> +			p[i] = (((p[i] ^ 0xe0) | 0x10) & 0xf0);
> +		else if ((p[i] & 0x20))
> +			p[i] = (((p[i] ^ 0xc0) | 0x20) & 0xe0);
> +		else if ((p[i] & 0x40))
> +			p[i] = (((p[i] ^ 0x80) | 0x40) & 0xc0);
> +		else
> +			p[i] = 0x80;
> +
> +		for (i--; i >= 0; i--)
> +			p[i] ^= 0xff;
> +	}
> +}
> +
> +int mpi_print(enum gcry_mpi_format format, unsigned char *buffer,
> +			size_t buflen, size_t *nwritten, MPI a)
> +{
> +	unsigned int nbits = mpi_get_nbits(a);
> +	size_t len;
> +	size_t dummy_nwritten;
> +	int negative;
> +
> +	if (!nwritten)
> +		nwritten = &dummy_nwritten;
> +
> +	/* Libgcrypt does no always care to set clear the sign if the value
> +	 * is 0.  For printing this is a bit of a surprise, in particular
> +	 * because if some of the formats don't support negative numbers but
> +	 * should be able to print a zero.  Thus we need this extra test
> +	 * for a negative number.
> +	 */
> +	if (a->sign && mpi_cmp_ui(a, 0))
> +		negative = 1;
> +	else
> +		negative = 0;
> +
> +	len = buflen;
> +	*nwritten = 0;
> +	if (format == GCRYMPI_FMT_STD) {
> +		unsigned char *tmp;
> +		int extra = 0;
> +		unsigned int n;
> +
> +		tmp = mpi_get_buffer(a, &n, NULL);
> +		if (!tmp)
> +			return -EINVAL;
> +
> +		if (negative) {
> +			twocompl(tmp, n);
> +			if (!(*tmp & 0x80)) {
> +				/* Need to extend the sign.  */
> +				n++;
> +				extra = 2;
> +			}
> +		} else if (n && (*tmp & 0x80)) {
> +			/* Positive but the high bit of the returned buffer is set.
> +			 * Thus we need to print an extra leading 0x00 so that the
> +			 * output is interpreted as a positive number.
> +			 */
> +			n++;
> +			extra = 1;
> +		}
> +
> +		if (buffer && n > len) {
> +			/* The provided buffer is too short. */
> +			kfree(tmp);
> +			return -E2BIG;
> +		}
> +		if (buffer) {
> +			unsigned char *s = buffer;
> +
> +			if (extra == 1)
> +				*s++ = 0;
> +			else if (extra)
> +				*s++ = 0xff;
> +			memcpy(s, tmp, n-!!extra);
> +		}
> +		kfree(tmp);
> +		*nwritten = n;
> +		return 0;
> +	} else if (format == GCRYMPI_FMT_USG) {
> +		unsigned int n = (nbits + 7)/8;
> +
> +		/* Note:  We ignore the sign for this format.  */
> +		/* FIXME: for performance reasons we should put this into
> +		 * mpi_aprint because we can then use the buffer directly.
> +		 */
> +
> +		if (buffer && n > len)
> +			return -E2BIG;
> +		if (buffer) {
> +			unsigned char *tmp;
> +
> +			tmp = mpi_get_buffer(a, &n, NULL);
> +			if (!tmp)
> +				return -EINVAL;
> +			memcpy(buffer, tmp, n);
> +			kfree(tmp);
> +		}
> +		*nwritten = n;
> +		return 0;
> +	} else if (format == GCRYMPI_FMT_PGP) {
> +		unsigned int n = (nbits + 7)/8;
> +
> +		/* The PGP format can only handle unsigned integers.  */
> +		if (negative)
> +			return -EINVAL;
> +
> +		if (buffer && n+2 > len)
> +			return -E2BIG;
> +
> +		if (buffer) {
> +			unsigned char *tmp;
> +			unsigned char *s = buffer;
> +
> +			s[0] = nbits >> 8;
> +			s[1] = nbits;
> +
> +			tmp = mpi_get_buffer(a, &n, NULL);
> +			if (!tmp)
> +				return -EINVAL;
> +			memcpy(s+2, tmp, n);
> +			kfree(tmp);
> +		}
> +		*nwritten = n+2;
> +		return 0;
> +	} else if (format == GCRYMPI_FMT_SSH) {
> +		unsigned char *tmp;
> +		int extra = 0;
> +		unsigned int n;
> +
> +		tmp = mpi_get_buffer(a, &n, NULL);
> +		if (!tmp)
> +			return -EINVAL;
> +
> +		if (negative) {
> +			twocompl(tmp, n);
> +			if (!(*tmp & 0x80)) {
> +				/* Need to extend the sign.  */
> +				n++;
> +				extra = 2;
> +			}
> +		} else if (n && (*tmp & 0x80)) {
> +			n++;
> +			extra = 1;
> +		}
> +
> +		if (buffer && n+4 > len) {
> +			kfree(tmp);
> +			return -E2BIG;
> +		}
> +
> +		if (buffer) {
> +			unsigned char *s = buffer;
> +
> +			*s++ = n >> 24;
> +			*s++ = n >> 16;
> +			*s++ = n >> 8;
> +			*s++ = n;
> +			if (extra == 1)
> +				*s++ = 0;
> +			else if (extra)
> +				*s++ = 0xff;
> +			memcpy(s, tmp, n-!!extra);
> +		}
> +		kfree(tmp);
> +		*nwritten = 4+n;
> +		return 0;
> +	} else if (format == GCRYMPI_FMT_HEX) {
> +		unsigned char *tmp;
> +		int i;
> +		int extra = 0;
> +		unsigned int n = 0;
> +
> +		tmp = mpi_get_buffer(a, &n, NULL);
> +		if (!tmp)
> +			return -EINVAL;
> +		if (!n || (*tmp & 0x80))
> +			extra = 2;
> +
> +		if (buffer && 2*n + extra + negative + 1 > len) {
> +			kfree(tmp);
> +			return -E2BIG;
> +		}
> +		if (buffer) {
> +			unsigned char *s = buffer;
> +
> +			if (negative)
> +				*s++ = '-';
> +			if (extra) {
> +				*s++ = '0';
> +				*s++ = '0';
> +			}
> +
> +			for (i = 0; i < n; i++) {
> +				unsigned int c = tmp[i];
> +
> +				*s++ = (c >> 4) < 10 ? '0'+(c>>4) : 'A'+(c>>4)-10;
> +				c &= 15;
> +				*s++ = c < 10 ? '0'+c : 'A'+c-10;
> +			}
> +			*s++ = 0;
> +			*nwritten = s - buffer;
> +		} else {
> +			*nwritten = 2*n + extra + negative + 1;
> +		}
> +		kfree(tmp);
> +		return 0;
> +	} else
> +		return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(mpi_print);
> diff --git a/lib/mpi/mpih-div.c b/lib/mpi/mpih-div.c
> index 913a519eb005..182a656a1ba0 100644
> --- a/lib/mpi/mpih-div.c
> +++ b/lib/mpi/mpih-div.c
> @@ -24,6 +24,150 @@
>  #define UDIV_TIME UMUL_TIME
>  #endif
>  
> +
> +mpi_limb_t
> +mpihelp_mod_1(mpi_ptr_t dividend_ptr, mpi_size_t dividend_size,
> +			mpi_limb_t divisor_limb)
> +{
> +	mpi_size_t i;
> +	mpi_limb_t n1, n0, r;
> +	mpi_limb_t dummy;
> +
> +	/* Botch: Should this be handled at all?  Rely on callers?	*/
> +	if (!dividend_size)
> +		return 0;
> +
> +	/* If multiplication is much faster than division, and the
> +	 * dividend is large, pre-invert the divisor, and use
> +	 * only multiplications in the inner loop.
> +	 *
> +	 * This test should be read:
> +	 *	 Does it ever help to use udiv_qrnnd_preinv?
> +	 *	   && Does what we save compensate for the inversion overhead?
> +	 */
> +	if (UDIV_TIME > (2 * UMUL_TIME + 6)
> +			&& (UDIV_TIME - (2 * UMUL_TIME + 6)) * dividend_size > UDIV_TIME) {
> +		int normalization_steps;
> +
> +		normalization_steps = count_leading_zeros(divisor_limb);
> +		if (normalization_steps) {
> +			mpi_limb_t divisor_limb_inverted;
> +
> +			divisor_limb <<= normalization_steps;
> +
> +			/* Compute (2**2N - 2**N * DIVISOR_LIMB) / DIVISOR_LIMB.  The
> +			 * result is a (N+1)-bit approximation to 1/DIVISOR_LIMB, with the
> +			 * most significant bit (with weight 2**N) implicit.
> +			 *
> +			 * Special case for DIVISOR_LIMB == 100...000.
> +			 */
> +			if (!(divisor_limb << 1))
> +				divisor_limb_inverted = ~(mpi_limb_t)0;
> +			else
> +				udiv_qrnnd(divisor_limb_inverted, dummy,
> +						-divisor_limb, 0, divisor_limb);
> +
> +			n1 = dividend_ptr[dividend_size - 1];
> +			r = n1 >> (BITS_PER_MPI_LIMB - normalization_steps);
> +
> +			/* Possible optimization:
> +			 * if (r == 0
> +			 * && divisor_limb > ((n1 << normalization_steps)
> +			 *		       | (dividend_ptr[dividend_size - 2] >> ...)))
> +			 * ...one division less...
> +			 */
> +			for (i = dividend_size - 2; i >= 0; i--) {
> +				n0 = dividend_ptr[i];
> +				UDIV_QRNND_PREINV(dummy, r, r,
> +						((n1 << normalization_steps)
> +						 | (n0 >> (BITS_PER_MPI_LIMB - normalization_steps))),
> +						divisor_limb, divisor_limb_inverted);
> +				n1 = n0;
> +			}
> +			UDIV_QRNND_PREINV(dummy, r, r,
> +					n1 << normalization_steps,
> +					divisor_limb, divisor_limb_inverted);
> +			return r >> normalization_steps;
> +		} else {
> +			mpi_limb_t divisor_limb_inverted;
> +
> +			/* Compute (2**2N - 2**N * DIVISOR_LIMB) / DIVISOR_LIMB.  The
> +			 * result is a (N+1)-bit approximation to 1/DIVISOR_LIMB, with the
> +			 * most significant bit (with weight 2**N) implicit.
> +			 *
> +			 * Special case for DIVISOR_LIMB == 100...000.
> +			 */
> +			if (!(divisor_limb << 1))
> +				divisor_limb_inverted = ~(mpi_limb_t)0;
> +			else
> +				udiv_qrnnd(divisor_limb_inverted, dummy,
> +						-divisor_limb, 0, divisor_limb);
> +
> +			i = dividend_size - 1;
> +			r = dividend_ptr[i];
> +
> +			if (r >= divisor_limb)
> +				r = 0;
> +			else
> +				i--;
> +
> +			for ( ; i >= 0; i--) {
> +				n0 = dividend_ptr[i];
> +				UDIV_QRNND_PREINV(dummy, r, r,
> +						n0, divisor_limb, divisor_limb_inverted);
> +			}
> +			return r;
> +		}
> +	} else {
> +		if (UDIV_NEEDS_NORMALIZATION) {
> +			int normalization_steps;
> +
> +			normalization_steps = count_leading_zeros(divisor_limb);
> +			if (normalization_steps) {
> +				divisor_limb <<= normalization_steps;
> +
> +				n1 = dividend_ptr[dividend_size - 1];
> +				r = n1 >> (BITS_PER_MPI_LIMB - normalization_steps);
> +
> +				/* Possible optimization:
> +				 * if (r == 0
> +				 * && divisor_limb > ((n1 << normalization_steps)
> +				 *		   | (dividend_ptr[dividend_size - 2] >> ...)))
> +				 * ...one division less...
> +				 */
> +				for (i = dividend_size - 2; i >= 0; i--) {
> +					n0 = dividend_ptr[i];
> +					udiv_qrnnd(dummy, r, r,
> +						((n1 << normalization_steps)
> +						 | (n0 >> (BITS_PER_MPI_LIMB - normalization_steps))),
> +						divisor_limb);
> +					n1 = n0;
> +				}
> +				udiv_qrnnd(dummy, r, r,
> +						n1 << normalization_steps,
> +						divisor_limb);
> +				return r >> normalization_steps;
> +			}
> +		}
> +		/* No normalization needed, either because udiv_qrnnd doesn't require
> +		 * it, or because DIVISOR_LIMB is already normalized.
> +		 */
> +		i = dividend_size - 1;
> +		r = dividend_ptr[i];
> +
> +		if (r >= divisor_limb)
> +			r = 0;
> +		else
> +			i--;
> +
> +		for (; i >= 0; i--) {
> +			n0 = dividend_ptr[i];
> +			udiv_qrnnd(dummy, r, r, n0, divisor_limb);
> +		}
> +		return r;
> +	}
> +}
> +
>  /* Divide num (NP/NSIZE) by den (DP/DSIZE) and write
>   * the NSIZE-DSIZE least significant quotient limbs at QP
>   * and the DSIZE long remainder at NP.	If QEXTRA_LIMBS is
> @@ -221,3 +365,153 @@ mpihelp_divrem(mpi_ptr_t qp, mpi_size_t qextra_limbs,
>  
>  	return most_significant_q_limb;
>  }
> +
> +/****************
> + * Divide (DIVIDEND_PTR,,DIVIDEND_SIZE) by DIVISOR_LIMB.
> + * Write DIVIDEND_SIZE limbs of quotient at QUOT_PTR.
> + * Return the single-limb remainder.
> + * There are no constraints on the value of the divisor.
> + *
> + * QUOT_PTR and DIVIDEND_PTR might point to the same limb.
> + */
> +
> +mpi_limb_t
> +mpihelp_divmod_1(mpi_ptr_t quot_ptr,
> +		mpi_ptr_t dividend_ptr, mpi_size_t dividend_size,
> +		mpi_limb_t divisor_limb)
> +{
> +	mpi_size_t i;
> +	mpi_limb_t n1, n0, r;
> +	mpi_limb_t dummy;
> +
> +	if (!dividend_size)
> +		return 0;
> +
> +	/* If multiplication is much faster than division, and the
> +	 * dividend is large, pre-invert the divisor, and use
> +	 * only multiplications in the inner loop.
> +	 *
> +	 * This test should be read:
> +	 * Does it ever help to use udiv_qrnnd_preinv?
> +	 * && Does what we save compensate for the inversion overhead?
> +	 */
> +	if (UDIV_TIME > (2 * UMUL_TIME + 6)
> +			&& (UDIV_TIME - (2 * UMUL_TIME + 6)) * dividend_size > UDIV_TIME) {
> +		int normalization_steps;
> +
> +		normalization_steps = count_leading_zeros(divisor_limb);
> +		if (normalization_steps) {
> +			mpi_limb_t divisor_limb_inverted;
> +
> +			divisor_limb <<= normalization_steps;
> +
> +			/* Compute (2**2N - 2**N * DIVISOR_LIMB) / DIVISOR_LIMB.  The
> +			 * result is a (N+1)-bit approximation to 1/DIVISOR_LIMB, with the
> +			 * most significant bit (with weight 2**N) implicit.
> +			 */
> +			/* Special case for DIVISOR_LIMB == 100...000.  */
> +			if (!(divisor_limb << 1))
> +				divisor_limb_inverted = ~(mpi_limb_t)0;
> +			else
> +				udiv_qrnnd(divisor_limb_inverted, dummy,
> +						-divisor_limb, 0, divisor_limb);
> +
> +			n1 = dividend_ptr[dividend_size - 1];
> +			r = n1 >> (BITS_PER_MPI_LIMB - normalization_steps);
> +
> +			/* Possible optimization:
> +			 * if (r == 0
> +			 * && divisor_limb > ((n1 << normalization_steps)
> +			 *		       | (dividend_ptr[dividend_size - 2] >> ...)))
> +			 * ...one division less...
> +			 */
> +			for (i = dividend_size - 2; i >= 0; i--) {
> +				n0 = dividend_ptr[i];
> +				UDIV_QRNND_PREINV(quot_ptr[i + 1], r, r,
> +						((n1 << normalization_steps)
> +						 | (n0 >> (BITS_PER_MPI_LIMB - normalization_steps))),
> +						divisor_limb, divisor_limb_inverted);
> +				n1 = n0;
> +			}
> +			UDIV_QRNND_PREINV(quot_ptr[0], r, r,
> +					n1 << normalization_steps,
> +					divisor_limb, divisor_limb_inverted);
> +			return r >> normalization_steps;
> +		} else {
> +			mpi_limb_t divisor_limb_inverted;
> +
> +			/* Compute (2**2N - 2**N * DIVISOR_LIMB) / DIVISOR_LIMB.  The
> +			 * result is a (N+1)-bit approximation to 1/DIVISOR_LIMB, with the
> +			 * most significant bit (with weight 2**N) implicit.
> +			 */
> +			/* Special case for DIVISOR_LIMB == 100...000.  */
> +			if (!(divisor_limb << 1))
> +				divisor_limb_inverted = ~(mpi_limb_t) 0;
> +			else
> +				udiv_qrnnd(divisor_limb_inverted, dummy,
> +						-divisor_limb, 0, divisor_limb);
> +
> +			i = dividend_size - 1;
> +			r = dividend_ptr[i];
> +
> +			if (r >= divisor_limb)
> +				r = 0;
> +			else
> +				quot_ptr[i--] = 0;
> +
> +			for ( ; i >= 0; i--) {
> +				n0 = dividend_ptr[i];
> +				UDIV_QRNND_PREINV(quot_ptr[i], r, r,
> +						n0, divisor_limb, divisor_limb_inverted);
> +			}
> +			return r;
> +		}
> +	} else {
> +		if (UDIV_NEEDS_NORMALIZATION) {
> +			int normalization_steps;
> +
> +			normalization_steps = count_leading_zeros(divisor_limb);
> +			if (normalization_steps) {
> +				divisor_limb <<= normalization_steps;
> +
> +				n1 = dividend_ptr[dividend_size - 1];
> +				r = n1 >> (BITS_PER_MPI_LIMB - normalization_steps);
> +
> +				/* Possible optimization:
> +				 * if (r == 0
> +				 * && divisor_limb > ((n1 << normalization_steps)
> +				 *		   | (dividend_ptr[dividend_size - 2] >> ...)))
> +				 * ...one division less...
> +				 */
> +				for (i = dividend_size - 2; i >= 0; i--) {
> +					n0 = dividend_ptr[i];
> +					udiv_qrnnd(quot_ptr[i + 1], r, r,
> +						((n1 << normalization_steps)
> +						 | (n0 >> (BITS_PER_MPI_LIMB - normalization_steps))),
> +						divisor_limb);
> +					n1 = n0;
> +				}
> +				udiv_qrnnd(quot_ptr[0], r, r,
> +						n1 << normalization_steps,
> +						divisor_limb);
> +				return r >> normalization_steps;
> +			}
> +		}
> +		/* No normalization needed, either because udiv_qrnnd doesn't require
> +		 * it, or because DIVISOR_LIMB is already normalized.
> +		 */
> +		i = dividend_size - 1;
> +		r = dividend_ptr[i];
> +
> +		if (r >= divisor_limb)
> +			r = 0;
> +		else
> +			quot_ptr[i--] = 0;
> +
> +		for (; i >= 0; i--) {
> +			n0 = dividend_ptr[i];
> +			udiv_qrnnd(quot_ptr[i], r, r, n0, divisor_limb);
> +		}
> +		return r;
> +	}
> +}
> diff --git a/lib/mpi/mpih-mul.c b/lib/mpi/mpih-mul.c
> index a93647564054..e5f1c84e3c48 100644
> --- a/lib/mpi/mpih-mul.c
> +++ b/lib/mpi/mpih-mul.c
> @@ -317,6 +317,31 @@ mpih_sqr_n(mpi_ptr_t prodp, mpi_ptr_t up, mpi_size_t size, mpi_ptr_t tspace)
>  	}
>  }
>  
> +
> +void mpihelp_mul_n(mpi_ptr_t prodp,
> +		mpi_ptr_t up, mpi_ptr_t vp, mpi_size_t size)
> +{
> +	if (up == vp) {
> +		if (size < KARATSUBA_THRESHOLD)
> +			mpih_sqr_n_basecase(prodp, up, size);
> +		else {
> +			mpi_ptr_t tspace;
> +			tspace = mpi_alloc_limb_space(2 * size);
> +			mpih_sqr_n(prodp, up, size, tspace);
> +			mpi_free_limb_space(tspace);
> +		}
> +	} else {
> +		if (size < KARATSUBA_THRESHOLD)
> +			mul_n_basecase(prodp, up, vp, size);
> +		else {
> +			mpi_ptr_t tspace;
> +			tspace = mpi_alloc_limb_space(2 * size);
> +			mul_n(prodp, up, vp, size, tspace);
> +			mpi_free_limb_space(tspace);
> +		}
> +	}
> +}
> +
>  int
>  mpihelp_mul_karatsuba_case(mpi_ptr_t prodp,
>  			   mpi_ptr_t up, mpi_size_t usize,
> diff --git a/lib/mpi/mpiutil.c b/lib/mpi/mpiutil.c
> index 20ed0f766787..e4046f74f336 100644
> --- a/lib/mpi/mpiutil.c
> +++ b/lib/mpi/mpiutil.c
> @@ -20,6 +20,63 @@
>  
>  #include "mpi-internal.h"
>  
> +/* Constants allocated right away at startup.  */
> +static MPI constants[MPI_NUMBER_OF_CONSTANTS];
> +
> +/* Initialize the MPI subsystem.  This is called early and allows to
> + * do some initialization without taking care of threading issues.
> + */
> +static int __init mpi_init(void)
> +{
> +	int idx;
> +	unsigned long value;
> +
> +	for (idx = 0; idx < MPI_NUMBER_OF_CONSTANTS; idx++) {
> +		switch (idx) {
> +		case MPI_C_ZERO:
> +			value = 0;
> +			break;
> +		case MPI_C_ONE:
> +			value = 1;
> +			break;
> +		case MPI_C_TWO:
> +			value = 2;
> +			break;
> +		case MPI_C_THREE:
> +			value = 3;
> +			break;
> +		case MPI_C_FOUR:
> +			value = 4;
> +			break;
> +		case MPI_C_EIGHT:
> +			value = 8;
> +			break;
> +		default:
> +			pr_err("MPI: invalid mpi_const selector %d\n", idx);
> +			return -EFAULT;
> +		}
> +		constants[idx] = mpi_alloc_set_ui(value);
> +		constants[idx]->flags = (16|32);
> +	}
> +
> +	return 0;
> +}
> +postcore_initcall(mpi_init);
> +
> +/* Return a constant MPI descripbed by NO which is one of the
> + * MPI_C_xxx macros.  There is no need to copy this returned value; it
> + * may be used directly.
> + */
> +MPI mpi_const(enum gcry_mpi_constants no)
> +{
> +	if ((int)no < 0 || no > MPI_NUMBER_OF_CONSTANTS)
> +		pr_err("MPI: invalid mpi_const selector %d\n", no);
> +	if (!constants[no])
> +		pr_err("MPI: MPI subsystem not initialized\n");
> +	return constants[no];
> +}
> +EXPORT_SYMBOL_GPL(mpi_const);
> +
>  /****************
>   * Note:  It was a bad idea to use the number of limbs to allocate
>   *	  because on a alpha the limbs are large but we normally need
> @@ -106,6 +163,15 @@ int mpi_resize(MPI a, unsigned nlimbs)
>  	return 0;
>  }
>  
> +void mpi_clear(MPI a)
> +{
> +	if (!a)
> +		return;
> +	a->nlimbs = 0;
> +	a->flags = 0;
> +}
> +EXPORT_SYMBOL_GPL(mpi_clear);
> +
>  void mpi_free(MPI a)
>  {
>  	if (!a)
> @@ -122,5 +188,143 @@ void mpi_free(MPI a)
>  }
>  EXPORT_SYMBOL_GPL(mpi_free);
>  
> +/****************
> + * Note: This copy function should not interpret the MPI
> + *	 but copy it transparently.
> + */
> +MPI mpi_copy(MPI a)
> +{
> +	int i;
> +	MPI b;
> +
> +	if (a) {
> +		b = mpi_alloc(a->nlimbs);
> +		b->nlimbs = a->nlimbs;
> +		b->sign = a->sign;
> +		b->flags = a->flags;
> +		b->flags &= ~(16|32); /* Reset the immutable and constant flags. */
> +		for (i = 0; i < b->nlimbs; i++)
> +			b->d[i] = a->d[i];
> +	} else
> +		b = NULL;
> +	return b;
> +}
> +
> +/****************
> + * This function allocates an MPI which is optimized to hold
> + * a value as large as the one given in the argument and allocates it
> + * with the same flags as A.
> + */
> +MPI mpi_alloc_like(MPI a)
> +{
> +	MPI b;
> +
> +	if (a) {
> +		b = mpi_alloc(a->nlimbs);
> +		b->nlimbs = 0;
> +		b->sign = 0;
> +		b->flags = a->flags;
> +	} else
> +		b = NULL;
> +
> +	return b;
> +}
> +
> +
> +/* Set U into W and release U.  If W is NULL only U will be released. */
> +void mpi_snatch(MPI w, MPI u)
> +{
> +	if (w) {
> +		mpi_assign_limb_space(w, u->d, u->alloced);
> +		w->nlimbs = u->nlimbs;
> +		w->sign   = u->sign;
> +		w->flags  = u->flags;
> +		u->alloced = 0;
> +		u->nlimbs = 0;
> +		u->d = NULL;
> +	}
> +	mpi_free(u);
> +}
> +
> +
> +MPI mpi_set(MPI w, MPI u)
> +{
> +	mpi_ptr_t wp, up;
> +	mpi_size_t usize = u->nlimbs;
> +	int usign = u->sign;
> +
> +	if (!w)
> +		w = mpi_alloc(mpi_get_nlimbs(u));
> +	RESIZE_IF_NEEDED(w, usize);
> +	wp = w->d;
> +	up = u->d;
> +	MPN_COPY(wp, up, usize);
> +	w->nlimbs = usize;
> +	w->flags = u->flags;
> +	w->flags &= ~(16|32); /* Reset the immutable and constant flags.  */
> +	w->sign = usign;
> +	return w;
> +}
> +EXPORT_SYMBOL_GPL(mpi_set);
> +
> +MPI mpi_set_ui(MPI w, unsigned long u)
> +{
> +	if (!w)
> +		w = mpi_alloc(1);
> +	/* FIXME: If U is 0 we have no need to resize and thus possible
> +	 * allocating the the limbs.
> +	 */
> +	RESIZE_IF_NEEDED(w, 1);
> +	w->d[0] = u;
> +	w->nlimbs = u ? 1 : 0;
> +	w->sign = 0;
> +	w->flags = 0;
> +	return w;
> +}
> +EXPORT_SYMBOL_GPL(mpi_set_ui);
> +
> +MPI mpi_alloc_set_ui(unsigned long u)
> +{
> +	MPI w = mpi_alloc(1);
> +	w->d[0] = u;
> +	w->nlimbs = u ? 1 : 0;
> +	w->sign = 0;
> +	return w;
> +}
> +
> +/****************
> + * Swap the value of A and B, when SWAP is 1.
> + * Leave the value when SWAP is 0.
> + * This implementation should be constant-time regardless of SWAP.
> + */
> +void mpi_swap_cond(MPI a, MPI b, unsigned long swap)
> +{
> +	mpi_size_t i;
> +	mpi_size_t nlimbs;
> +	mpi_limb_t mask = ((mpi_limb_t)0) - swap;
> +	mpi_limb_t x;
> +
> +	if (a->alloced > b->alloced)
> +		nlimbs = b->alloced;
> +	else
> +		nlimbs = a->alloced;
> +	if (a->nlimbs > nlimbs || b->nlimbs > nlimbs)
> +		return;
> +
> +	for (i = 0; i < nlimbs; i++) {
> +		x = mask & (a->d[i] ^ b->d[i]);
> +		a->d[i] = a->d[i] ^ x;
> +		b->d[i] = b->d[i] ^ x;
> +	}
> +
> +	x = mask & (a->nlimbs ^ b->nlimbs);
> +	a->nlimbs = a->nlimbs ^ x;
> +	b->nlimbs = b->nlimbs ^ x;
> +
> +	x = mask & (a->sign ^ b->sign);
> +	a->sign = a->sign ^ x;
> +	b->sign = b->sign ^ x;
> +}
> +
>  MODULE_DESCRIPTION("Multiprecision maths library");
>  MODULE_LICENSE("GPL");
> -- 
> 2.17.1
> 

-- 
Regards,
Marcelo


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Ravi Bangoria @ 2020-07-10 13:31 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Ingo Molnar, James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa,
	Song Liu, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Thomas Gleixner, linux-kernel,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, linux-doc@vger.kernel.org,
	linux-man, Ravi Bangoria
In-Reply-To: <f96f8f8a-e65c-3f36-dc85-fc3f5191e8c5@linux.intel.com>

Hi Alexey,

> Currently access to perf_events, i915_perf and other performance
> monitoring and observability subsystems of the kernel is open only for
> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
> process effective set [2].
> 
> This patch set introduces CAP_PERFMON capability designed to secure
> system performance monitoring and observability operations so that
> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
> for performance monitoring and observability subsystems of the kernel.

I'm seeing an issue with CAP_PERFMON when I try to record data for a
specific target. I don't know whether this is sort of a regression or
an expected behavior.

Without setting CAP_PERFMON:

   $ getcap ./perf
   $ ./perf stat -a ls
     Error:
     Access to performance monitoring and observability operations is limited.
   $ ./perf stat ls
     Performance counter stats for 'ls':
    
                  2.06 msec task-clock:u              #    0.418 CPUs utilized
                     0      context-switches:u        #    0.000 K/sec
                     0      cpu-migrations:u          #    0.000 K/sec

With CAP_PERFMON:

   $ getcap ./perf
     ./perf = cap_perfmon+ep
   $ ./perf stat -a ls
     Performance counter stats for 'system wide':
    
                142.42 msec cpu-clock                 #   25.062 CPUs utilized
                   182      context-switches          #    0.001 M/sec
                    48      cpu-migrations            #    0.337 K/sec
   $ ./perf stat ls
     Error:
     Access to performance monitoring and observability operations is limited.

Am I missing something silly?

Analysis:
---------
A bit more analysis lead me to below kernel code fs/exec.c:

   begin_new_exec()
   {
         ...
         if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
             !(uid_eq(current_euid(), current_uid()) &&
               gid_eq(current_egid(), current_gid())))
                 set_dumpable(current->mm, suid_dumpable);
         else
                 set_dumpable(current->mm, SUID_DUMP_USER);

         ...
         commit_creds(bprm->cred);
   }

When I execute './perf stat ls', it's going into else condition and thus sets
dumpable flag as SUID_DUMP_USER. Then in commit_creds():

   int commit_creds(struct cred *new)
   {
         ...
         /* dumpability changes */
         if (...
             !cred_cap_issubset(old, new)) {
                 if (task->mm)
                         set_dumpable(task->mm, suid_dumpable);
   }

!cred_cap_issubset(old, new) fails for perf without any capability and thus
it doesn't execute set_dumpable(). Whereas that condition passes for perf
with CAP_PERFMON and thus it overwrites old value (SUID_DUMP_USER) with
suid_dumpable in mm_flags. On an Ubuntu, suid_dumpable default value is
SUID_DUMP_ROOT. On Fedora, it's SUID_DUMP_DISABLE. (/proc/sys/fs/suid_dumpable).

Now while opening an event:

   perf_event_open()
     ptrace_may_access()
       __ptrace_may_access() {
                 ...
                 if (mm &&
                     ((get_dumpable(mm) != SUID_DUMP_USER) &&
                      !ptrace_has_cap(cred, mm->user_ns, mode)))
                     return -EPERM;
       }

This if condition passes for perf with CAP_PERFMON and thus it returns -EPERM.
But it fails for perf without CAP_PERFMON and thus it goes ahead and returns
success. So opening an event fails when perf has CAP_PREFMON and tries to open
process specific event as normal user.

Workarounds:
------------
Based on above analysis, I found couple of workarounds (examples are on
Ubuntu 18.04.4 powerpc):

Workaround1:
Setting SUID_DUMP_USER as default (in /proc/sys/fs/suid_dumpable) solves the
issue.

   # echo 1 > /proc/sys/fs/suid_dumpable
   $ getcap ./perf
     ./perf = cap_perfmon+ep
   $ ./perf stat ls
     Performance counter stats for 'ls':
    
                  1.47 msec task-clock                #    0.806 CPUs utilized
                     0      context-switches          #    0.000 K/sec
                     0      cpu-migrations            #    0.000 K/sec

Workaround2:
Using CAP_SYS_PTRACE along with CAP_PERFMON solves the issue.

   $ cat /proc/sys/fs/suid_dumpable
     2
   # setcap "cap_perfmon,cap_sys_ptrace=ep" ./perf
   $ ./perf stat ls
     Performance counter stats for 'ls':
    
                  1.41 msec task-clock                #    0.826 CPUs utilized
                     0      context-switches          #    0.000 K/sec
                     0      cpu-migrations            #    0.000 K/sec

Workaround3:
Adding CAP_PERFMON to parent of perf (/bin/bash) also solves the issue.

   $ cat /proc/sys/fs/suid_dumpable
     2
   # setcap "cap_perfmon=ep" /bin/bash
   # setcap "cap_perfmon=ep" ./perf
   $ bash
   $ ./perf stat ls
     Performance counter stats for 'ls':
    
                  1.47 msec task-clock                #    0.806 CPUs utilized
                     0      context-switches          #    0.000 K/sec
                     0      cpu-migrations            #    0.000 K/sec

- Ravi

^ permalink raw reply

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Alexey Budankov @ 2020-07-10 14:30 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Ingo Molnar, James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa,
	Song Liu, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Thomas Gleixner, linux-kernel,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, linux-doc@vger.kernel.org,
	linux-man
In-Reply-To: <76718dc6-5483-5e2e-85b8-64e70306ee1f@linux.ibm.com>


Hi Ravi,

On 10.07.2020 16:31, Ravi Bangoria wrote:
> Hi Alexey,
> 
>> Currently access to perf_events, i915_perf and other performance
>> monitoring and observability subsystems of the kernel is open only for
>> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
>> process effective set [2].
>>
>> This patch set introduces CAP_PERFMON capability designed to secure
>> system performance monitoring and observability operations so that
>> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
>> for performance monitoring and observability subsystems of the kernel.
> 
> I'm seeing an issue with CAP_PERFMON when I try to record data for a
> specific target. I don't know whether this is sort of a regression or
> an expected behavior.

Thanks for reporting and root causing this case. The behavior looks like
kind of expected since currently CAP_PERFMON takes over the related part
of CAP_SYS_ADMIN credentials only. Actually Perf security docs [1] say
that access control is also subject to CAP_SYS_PTRACE credentials.

CAP_PERFMON could be used to extend and substitute ptrace_may_access()
check in perf_events subsystem to simplify user experience at least in
this specific case.

Alexei

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

> 
> Without setting CAP_PERFMON:
> 
>   $ getcap ./perf
>   $ ./perf stat -a ls
>     Error:
>     Access to performance monitoring and observability operations is limited.
>   $ ./perf stat ls
>     Performance counter stats for 'ls':
>                     2.06 msec task-clock:u              #    0.418 CPUs utilized
>                     0      context-switches:u        #    0.000 K/sec
>                     0      cpu-migrations:u          #    0.000 K/sec
> 
> With CAP_PERFMON:
> 
>   $ getcap ./perf
>     ./perf = cap_perfmon+ep
>   $ ./perf stat -a ls
>     Performance counter stats for 'system wide':
>                   142.42 msec cpu-clock                 #   25.062 CPUs utilized
>                   182      context-switches          #    0.001 M/sec
>                    48      cpu-migrations            #    0.337 K/sec
>   $ ./perf stat ls
>     Error:
>     Access to performance monitoring and observability operations is limited.
> 
> Am I missing something silly?
> 
> Analysis:
> ---------
> A bit more analysis lead me to below kernel code fs/exec.c:
> 
>   begin_new_exec()
>   {
>         ...
>         if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
>             !(uid_eq(current_euid(), current_uid()) &&
>               gid_eq(current_egid(), current_gid())))
>                 set_dumpable(current->mm, suid_dumpable);
>         else
>                 set_dumpable(current->mm, SUID_DUMP_USER);
> 
>         ...
>         commit_creds(bprm->cred);
>   }
> 
> When I execute './perf stat ls', it's going into else condition and thus sets
> dumpable flag as SUID_DUMP_USER. Then in commit_creds():
> 
>   int commit_creds(struct cred *new)
>   {
>         ...
>         /* dumpability changes */
>         if (...
>             !cred_cap_issubset(old, new)) {
>                 if (task->mm)
>                         set_dumpable(task->mm, suid_dumpable);
>   }
> 
> !cred_cap_issubset(old, new) fails for perf without any capability and thus
> it doesn't execute set_dumpable(). Whereas that condition passes for perf
> with CAP_PERFMON and thus it overwrites old value (SUID_DUMP_USER) with
> suid_dumpable in mm_flags. On an Ubuntu, suid_dumpable default value is
> SUID_DUMP_ROOT. On Fedora, it's SUID_DUMP_DISABLE. (/proc/sys/fs/suid_dumpable).
> 
> Now while opening an event:
> 
>   perf_event_open()
>     ptrace_may_access()
>       __ptrace_may_access() {
>                 ...
>                 if (mm &&
>                     ((get_dumpable(mm) != SUID_DUMP_USER) &&
>                      !ptrace_has_cap(cred, mm->user_ns, mode)))
>                     return -EPERM;
>       }
> 
> This if condition passes for perf with CAP_PERFMON and thus it returns -EPERM.
> But it fails for perf without CAP_PERFMON and thus it goes ahead and returns
> success. So opening an event fails when perf has CAP_PREFMON and tries to open
> process specific event as normal user.
> 
> Workarounds:
> ------------
> Based on above analysis, I found couple of workarounds (examples are on
> Ubuntu 18.04.4 powerpc):
> 
> Workaround1:
> Setting SUID_DUMP_USER as default (in /proc/sys/fs/suid_dumpable) solves the
> issue.
> 
>   # echo 1 > /proc/sys/fs/suid_dumpable
>   $ getcap ./perf
>     ./perf = cap_perfmon+ep
>   $ ./perf stat ls
>     Performance counter stats for 'ls':
>                     1.47 msec task-clock                #    0.806 CPUs utilized
>                     0      context-switches          #    0.000 K/sec
>                     0      cpu-migrations            #    0.000 K/sec
> 
> Workaround2:
> Using CAP_SYS_PTRACE along with CAP_PERFMON solves the issue.
> 
>   $ cat /proc/sys/fs/suid_dumpable
>     2
>   # setcap "cap_perfmon,cap_sys_ptrace=ep" ./perf
>   $ ./perf stat ls
>     Performance counter stats for 'ls':
>                     1.41 msec task-clock                #    0.826 CPUs utilized
>                     0      context-switches          #    0.000 K/sec
>                     0      cpu-migrations            #    0.000 K/sec
> 
> Workaround3:
> Adding CAP_PERFMON to parent of perf (/bin/bash) also solves the issue.
> 
>   $ cat /proc/sys/fs/suid_dumpable
>     2
>   # setcap "cap_perfmon=ep" /bin/bash
>   # setcap "cap_perfmon=ep" ./perf
>   $ bash
>   $ ./perf stat ls
>     Performance counter stats for 'ls':
>                     1.47 msec task-clock                #    0.806 CPUs utilized
>                     0      context-switches          #    0.000 K/sec
>                     0      cpu-migrations            #    0.000 K/sec
> 
> - Ravi

^ permalink raw reply

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Arnaldo Carvalho de Melo @ 2020-07-10 17:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ravi Bangoria, Peter Zijlstra, Alexei Starovoitov, Ingo Molnar,
	James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Thomas Gleixner,
	linux-kernel, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	linux-doc@vger.kernel.org, linux-man
In-Reply-To: <7776fa40-6c65-2aa6-1322-eb3a01201000@linux.intel.com>

Em Fri, Jul 10, 2020 at 05:30:50PM +0300, Alexey Budankov escreveu:
> On 10.07.2020 16:31, Ravi Bangoria wrote:
> >> Currently access to perf_events, i915_perf and other performance
> >> monitoring and observability subsystems of the kernel is open only for
> >> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
> >> process effective set [2].

> >> This patch set introduces CAP_PERFMON capability designed to secure
> >> system performance monitoring and observability operations so that
> >> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
> >> for performance monitoring and observability subsystems of the kernel.
 
> > I'm seeing an issue with CAP_PERFMON when I try to record data for a
> > specific target. I don't know whether this is sort of a regression or
> > an expected behavior.
 
> Thanks for reporting and root causing this case. The behavior looks like
> kind of expected since currently CAP_PERFMON takes over the related part
> of CAP_SYS_ADMIN credentials only. Actually Perf security docs [1] say
> that access control is also subject to CAP_SYS_PTRACE credentials.

I think that stating that in the error message would be helpful, after
all, who reads docs? 8-)

I.e., this:

$ ./perf stat ls
  Error:
  Access to performance monitoring and observability operations is limited.
$

Could become:

$ ./perf stat ls
  Error:
  Access to performance monitoring and observability operations is limited.
  Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
$

- Arnaldo
 
> CAP_PERFMON could be used to extend and substitute ptrace_may_access()
> check in perf_events subsystem to simplify user experience at least in
> this specific case.
> 
> Alexei
> 
> [1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> 
> > 
> > Without setting CAP_PERFMON:
> > 
> >   $ getcap ./perf
> >   $ ./perf stat -a ls
> >     Error:
> >     Access to performance monitoring and observability operations is limited.
> >   $ ./perf stat ls
> >     Performance counter stats for 'ls':
> >                     2.06 msec task-clock:u              #    0.418 CPUs utilized
> >                     0      context-switches:u        #    0.000 K/sec
> >                     0      cpu-migrations:u          #    0.000 K/sec
> > 
> > With CAP_PERFMON:
> > 
> >   $ getcap ./perf
> >     ./perf = cap_perfmon+ep
> >   $ ./perf stat -a ls
> >     Performance counter stats for 'system wide':
> >                   142.42 msec cpu-clock                 #   25.062 CPUs utilized
> >                   182      context-switches          #    0.001 M/sec
> >                    48      cpu-migrations            #    0.337 K/sec
> >   $ ./perf stat ls
> >     Error:
> >     Access to performance monitoring and observability operations is limited.
> > 
> > Am I missing something silly?
> > 
> > Analysis:
> > ---------
> > A bit more analysis lead me to below kernel code fs/exec.c:
> > 
> >   begin_new_exec()
> >   {
> >         ...
> >         if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
> >             !(uid_eq(current_euid(), current_uid()) &&
> >               gid_eq(current_egid(), current_gid())))
> >                 set_dumpable(current->mm, suid_dumpable);
> >         else
> >                 set_dumpable(current->mm, SUID_DUMP_USER);
> > 
> >         ...
> >         commit_creds(bprm->cred);
> >   }
> > 
> > When I execute './perf stat ls', it's going into else condition and thus sets
> > dumpable flag as SUID_DUMP_USER. Then in commit_creds():
> > 
> >   int commit_creds(struct cred *new)
> >   {
> >         ...
> >         /* dumpability changes */
> >         if (...
> >             !cred_cap_issubset(old, new)) {
> >                 if (task->mm)
> >                         set_dumpable(task->mm, suid_dumpable);
> >   }
> > 
> > !cred_cap_issubset(old, new) fails for perf without any capability and thus
> > it doesn't execute set_dumpable(). Whereas that condition passes for perf
> > with CAP_PERFMON and thus it overwrites old value (SUID_DUMP_USER) with
> > suid_dumpable in mm_flags. On an Ubuntu, suid_dumpable default value is
> > SUID_DUMP_ROOT. On Fedora, it's SUID_DUMP_DISABLE. (/proc/sys/fs/suid_dumpable).
> > 
> > Now while opening an event:
> > 
> >   perf_event_open()
> >     ptrace_may_access()
> >       __ptrace_may_access() {
> >                 ...
> >                 if (mm &&
> >                     ((get_dumpable(mm) != SUID_DUMP_USER) &&
> >                      !ptrace_has_cap(cred, mm->user_ns, mode)))
> >                     return -EPERM;
> >       }
> > 
> > This if condition passes for perf with CAP_PERFMON and thus it returns -EPERM.
> > But it fails for perf without CAP_PERFMON and thus it goes ahead and returns
> > success. So opening an event fails when perf has CAP_PREFMON and tries to open
> > process specific event as normal user.
> > 
> > Workarounds:
> > ------------
> > Based on above analysis, I found couple of workarounds (examples are on
> > Ubuntu 18.04.4 powerpc):
> > 
> > Workaround1:
> > Setting SUID_DUMP_USER as default (in /proc/sys/fs/suid_dumpable) solves the
> > issue.
> > 
> >   # echo 1 > /proc/sys/fs/suid_dumpable
> >   $ getcap ./perf
> >     ./perf = cap_perfmon+ep
> >   $ ./perf stat ls
> >     Performance counter stats for 'ls':
> >                     1.47 msec task-clock                #    0.806 CPUs utilized
> >                     0      context-switches          #    0.000 K/sec
> >                     0      cpu-migrations            #    0.000 K/sec
> > 
> > Workaround2:
> > Using CAP_SYS_PTRACE along with CAP_PERFMON solves the issue.
> > 
> >   $ cat /proc/sys/fs/suid_dumpable
> >     2
> >   # setcap "cap_perfmon,cap_sys_ptrace=ep" ./perf
> >   $ ./perf stat ls
> >     Performance counter stats for 'ls':
> >                     1.41 msec task-clock                #    0.826 CPUs utilized
> >                     0      context-switches          #    0.000 K/sec
> >                     0      cpu-migrations            #    0.000 K/sec
> > 
> > Workaround3:
> > Adding CAP_PERFMON to parent of perf (/bin/bash) also solves the issue.
> > 
> >   $ cat /proc/sys/fs/suid_dumpable
> >     2
> >   # setcap "cap_perfmon=ep" /bin/bash
> >   # setcap "cap_perfmon=ep" ./perf
> >   $ bash
> >   $ ./perf stat ls
> >     Performance counter stats for 'ls':
> >                     1.47 msec task-clock                #    0.806 CPUs utilized
> >                     0      context-switches          #    0.000 K/sec
> >                     0      cpu-migrations            #    0.000 K/sec
> > 
> > - Ravi

-- 

- Arnaldo

^ permalink raw reply

* Re: [PATCH] ima: Rename internal audit rule functions
From: Tyler Hicks @ 2020-07-10 19:42 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Dmitry Kasatkin, James Morris, Serge E . Hallyn, linux-kernel,
	linux-integrity, linux-security-module, Casey Schaufler,
	linux-audit
In-Reply-To: <1593466203.5085.62.camel@linux.ibm.com>

On 2020-06-29 17:30:03, Mimi Zohar wrote:
> [Cc'ing the audit mailing list]
> 
> On Mon, 2020-06-29 at 10:30 -0500, Tyler Hicks wrote:
> > 
> > diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> > index ff2bf57ff0c7..5d62ee8319f4 100644
> > --- a/security/integrity/ima/ima.h
> > +++ b/security/integrity/ima/ima.h
> > @@ -419,24 +419,24 @@ static inline void ima_free_modsig(struct modsig *modsig)
> >  /* LSM based policy rules require audit */
> >  #ifdef CONFIG_IMA_LSM_RULES
> >  
> > -#define security_filter_rule_init security_audit_rule_init
> > -#define security_filter_rule_free security_audit_rule_free
> > -#define security_filter_rule_match security_audit_rule_match
> > +#define ima_audit_rule_init security_audit_rule_init
> > +#define ima_audit_rule_free security_audit_rule_free
> > +#define ima_audit_rule_match security_audit_rule_match
> 
> Instead of defining an entirely new method of identifying files, IMA
> piggybacks on top of the existing audit rule syntax.  IMA policy rules
> "filter" based on this information.
> 
> IMA already audits security/integrity related events.  Using the word
> "audit" here will make things even more confusing than they currently
> are.  Renaming these functions as ima_audit_rule_XXX provides no
> benefit.  At that point, IMA might as well call the
> security_audit_rule prefixed function names directly.  As a quick fix,
> rename them as "ima_filter_rule".
> 
> The correct solution would probably be to rename these prefixed
> "security_audit_rule" functions as "security_filter_rule", so that
> both the audit subsystem and IMA could use them.

There doesn't seem to be any interest, from the audit side, in re-using
these. I don't quite understand why they would want to use them since
they're just simple wrappers around the security_audit_rule_*()
functions.

I'll go the "quick fix" route of renaming them as ima_filter_rule_*().

Tyler

> 
> Mimi

^ permalink raw reply

* [PATCH v2] ima: Rename internal audit rule functions
From: Tyler Hicks @ 2020-07-10 20:37 UTC (permalink / raw)
  To: Mimi Zohar, Dmitry Kasatkin
  Cc: James Morris, Serge E . Hallyn, linux-kernel, linux-integrity,
	linux-security-module, Casey Schaufler

Rename IMA's internal audit rule functions from security_filter_rule_*()
to ima_filter_rule_*(). This avoids polluting the security_* namespace,
which is typically reserved for general security subsystem
infrastructure.

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Suggested-by: Casey Schaufler <casey@schaufler-ca.com>
---

* v2
  - Rebased onto v3 of prereq series
  - Renamed the functions to ima_filter_rule_*(), instead of
    ima_audit_rule_*(), at Mimi's request
  - Didn't pick up Casey's Reviewed-by on v1 since nearly every line of
    the patch changed. Although, I suspect he'll be equally as happy with
    the new names in v2.

Developed on top of next-integrity-testing, commit cd1d8603df60 ("IMA:
Add audit log for failure conditions"), plus this patch series:

 [PATCH v3 00/12] ima: Fix rule parsing bugs and extend KEXEC_CMDLINE rule support
 https://lore.kernel.org/linux-integrity/20200709061911.954326-1-tyhicks@linux.microsoft.com/T/#t

This patch has dependencies on the above patch series.

Tested with and without CONFIG_IMA_LSM_RULES enabled by attempting to
load IMA policy with rules containing the subj_role=foo conditional.
Build logs are clean in both configurations. The IMA policy was first
loaded without and then with a valid AppArmor profile named "foo". The
behavior is the same before and after this patch is applied:

                  | CONFIG_IMA_LSM_RULES=n   | CONFIG_IMA_LSM_RULES=y
-----------------------------------------------------------------------
 Without Profile  |  IMA policy load fails   | IMA policy load fails
 With Profile     |  IMA policy load fails   | IMA policy load succeeds

 security/integrity/ima/ima.h        | 16 +++++++--------
 security/integrity/ima/ima_policy.c | 30 +++++++++++++----------------
 2 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 576ae2c6d418..38043074ce5e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -413,24 +413,24 @@ static inline void ima_free_modsig(struct modsig *modsig)
 /* LSM based policy rules require audit */
 #ifdef CONFIG_IMA_LSM_RULES
 
-#define security_filter_rule_init security_audit_rule_init
-#define security_filter_rule_free security_audit_rule_free
-#define security_filter_rule_match security_audit_rule_match
+#define ima_filter_rule_init security_audit_rule_init
+#define ima_filter_rule_free security_audit_rule_free
+#define ima_filter_rule_match security_audit_rule_match
 
 #else
 
-static inline int security_filter_rule_init(u32 field, u32 op, char *rulestr,
-					    void **lsmrule)
+static inline int ima_filter_rule_init(u32 field, u32 op, char *rulestr,
+				       void **lsmrule)
 {
 	return -EINVAL;
 }
 
-static inline void security_filter_rule_free(void *lsmrule)
+static inline void ima_filter_rule_free(void *lsmrule)
 {
 }
 
-static inline int security_filter_rule_match(u32 secid, u32 field, u32 op,
-					     void *lsmrule)
+static inline int ima_filter_rule_match(u32 secid, u32 field, u32 op,
+					void *lsmrule)
 {
 	return -EINVAL;
 }
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 2e87154c9296..c5eda02e5f51 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -258,7 +258,7 @@ static void ima_lsm_free_rule(struct ima_rule_entry *entry)
 	int i;
 
 	for (i = 0; i < MAX_LSM_RULES; i++) {
-		security_filter_rule_free(entry->lsm[i].rule);
+		ima_filter_rule_free(entry->lsm[i].rule);
 		kfree(entry->lsm[i].args_p);
 	}
 }
@@ -308,10 +308,9 @@ static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry)
 		 */
 		entry->lsm[i].args_p = NULL;
 
-		security_filter_rule_init(nentry->lsm[i].type,
-					  Audit_equal,
-					  nentry->lsm[i].args_p,
-					  &nentry->lsm[i].rule);
+		ima_filter_rule_init(nentry->lsm[i].type, Audit_equal,
+				     nentry->lsm[i].args_p,
+				     &nentry->lsm[i].rule);
 		if (!nentry->lsm[i].rule)
 			pr_warn("rule for LSM \'%s\' is undefined\n",
 				entry->lsm[i].args_p);
@@ -495,18 +494,16 @@ static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode,
 		case LSM_OBJ_ROLE:
 		case LSM_OBJ_TYPE:
 			security_inode_getsecid(inode, &osid);
-			rc = security_filter_rule_match(osid,
-							rule->lsm[i].type,
-							Audit_equal,
-							rule->lsm[i].rule);
+			rc = ima_filter_rule_match(osid, rule->lsm[i].type,
+						   Audit_equal,
+						   rule->lsm[i].rule);
 			break;
 		case LSM_SUBJ_USER:
 		case LSM_SUBJ_ROLE:
 		case LSM_SUBJ_TYPE:
-			rc = security_filter_rule_match(secid,
-							rule->lsm[i].type,
-							Audit_equal,
-							rule->lsm[i].rule);
+			rc = ima_filter_rule_match(secid, rule->lsm[i].type,
+						   Audit_equal,
+						   rule->lsm[i].rule);
 		default:
 			break;
 		}
@@ -901,10 +898,9 @@ static int ima_lsm_rule_init(struct ima_rule_entry *entry,
 		return -ENOMEM;
 
 	entry->lsm[lsm_rule].type = audit_type;
-	result = security_filter_rule_init(entry->lsm[lsm_rule].type,
-					   Audit_equal,
-					   entry->lsm[lsm_rule].args_p,
-					   &entry->lsm[lsm_rule].rule);
+	result = ima_filter_rule_init(entry->lsm[lsm_rule].type, Audit_equal,
+				      entry->lsm[lsm_rule].args_p,
+				      &entry->lsm[lsm_rule].rule);
 	if (!entry->lsm[lsm_rule].rule) {
 		pr_warn("rule for LSM \'%s\' is undefined\n",
 			entry->lsm[lsm_rule].args_p);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH 2/4] fs: Remove FIRMWARE_PREALLOC_BUFFER from kernel_read_file() enums
From: Scott Branden @ 2020-07-10 21:00 UTC (permalink / raw)
  To: Kees Cook, James Morris
  Cc: Luis Chamberlain, Mimi Zohar, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alexander Viro, Jessica Yu, Dmitry Kasatkin,
	Serge E. Hallyn, Casey Schaufler, Eric W. Biederman,
	Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <20200707081926.3688096-3-keescook@chromium.org>

Hi Kees,

This patch fails during booting of my system - see below.

On 2020-07-07 1:19 a.m., Kees Cook wrote:
> FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs
> that are interested in filtering between types of things. The "how"
> should be an internal detail made uninteresting to the LSMs.
>
> Fixes: a098ecd2fa7d ("firmware: support loading into a pre-allocated buffer")
> Fixes: fd90bc559bfb ("ima: based on policy verify firmware signatures (pre-allocated buffer)")
> Fixes: 4f0496d8ffa3 ("ima: based on policy warn about loading firmware (pre-allocated buffer)")
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>   drivers/base/firmware_loader/main.c | 5 ++---
>   fs/exec.c                           | 7 ++++---
>   include/linux/fs.h                  | 2 +-
>   security/integrity/ima/ima_main.c   | 6 ++----
>   4 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
> index ca871b13524e..c2f57cedcd6f 100644
> --- a/drivers/base/firmware_loader/main.c
> +++ b/drivers/base/firmware_loader/main.c
> @@ -465,14 +465,12 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
>   	int i, len;
>   	int rc = -ENOENT;
>   	char *path;
> -	enum kernel_read_file_id id = READING_FIRMWARE;
>   	size_t msize = INT_MAX;
>   	void *buffer = NULL;
>   
>   	/* Already populated data member means we're loading into a buffer */
>   	if (!decompress && fw_priv->data) {
>   		buffer = fw_priv->data;
> -		id = READING_FIRMWARE_PREALLOC_BUFFER;
>   		msize = fw_priv->allocated_size;
>   	}
>   
> @@ -496,7 +494,8 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
>   
>   		/* load firmware files from the mount namespace of init */
>   		rc = kernel_read_file_from_path_initns(path, &buffer,
> -						       &size, msize, id);
> +						       &size, msize,
> +						       READING_FIRMWARE);
>   		if (rc) {
>   			if (rc != -ENOENT)
>   				dev_warn(device, "loading %s failed with error %d\n",
> diff --git a/fs/exec.c b/fs/exec.c
> index e6e8a9a70327..2bf549757ce7 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -927,6 +927,7 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>   {
>   	loff_t i_size, pos;
>   	ssize_t bytes = 0;
> +	void *allocated = NULL;
>   	int ret;
>   
>   	if (!S_ISREG(file_inode(file)->i_mode) || max_size < 0)
> @@ -950,8 +951,8 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>   		goto out;
>   	}
>   
> -	if (id != READING_FIRMWARE_PREALLOC_BUFFER)
> -		*buf = vmalloc(i_size);
> +	if (!*buf)
The assumption that *buf is always NULL when id != 
READING_FIRMWARE_PREALLOC_BUFFER doesn't appear to be correct.
I get unhandled page faults due to this change on boot.
> +		*buf = allocated = vmalloc(i_size);
>   	if (!*buf) {
>   		ret = -ENOMEM;
>   		goto out;
> @@ -980,7 +981,7 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>   
>   out_free:
>   	if (ret < 0) {
> -		if (id != READING_FIRMWARE_PREALLOC_BUFFER) {
> +		if (allocated) {
>   			vfree(*buf);
>   			*buf = NULL;
>   		}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3f881a892ea7..95fc775ed937 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2993,10 +2993,10 @@ static inline void i_readcount_inc(struct inode *inode)
>   #endif
>   extern int do_pipe_flags(int *, int);
>   
> +/* This is a list of *what* is being read, not *how*. */
>   #define __kernel_read_file_id(id) \
>   	id(UNKNOWN, unknown)		\
>   	id(FIRMWARE, firmware)		\
> -	id(FIRMWARE_PREALLOC_BUFFER, firmware)	\
>   	id(FIRMWARE_EFI_EMBEDDED, firmware)	\
>   	id(MODULE, kernel-module)		\
>   	id(KEXEC_IMAGE, kexec-image)		\
> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
> index c1583d98c5e5..f80ee4ce4669 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -611,19 +611,17 @@ void ima_post_path_mknod(struct dentry *dentry)
>   int ima_read_file(struct file *file, enum kernel_read_file_id read_id)
>   {
>   	/*
> -	 * READING_FIRMWARE_PREALLOC_BUFFER
> -	 *
>   	 * Do devices using pre-allocated memory run the risk of the
>   	 * firmware being accessible to the device prior to the completion
>   	 * of IMA's signature verification any more than when using two
> -	 * buffers?
> +	 * buffers? It may be desirable to include the buffer address
> +	 * in this API and walk all the dma_map_single() mappings to check.
>   	 */
>   	return 0;
>   }
>   
>   const int read_idmap[READING_MAX_ID] = {
>   	[READING_FIRMWARE] = FIRMWARE_CHECK,
> -	[READING_FIRMWARE_PREALLOC_BUFFER] = FIRMWARE_CHECK,
>   	[READING_MODULE] = MODULE_CHECK,
>   	[READING_KEXEC_IMAGE] = KEXEC_KERNEL_CHECK,
>   	[READING_KEXEC_INITRAMFS] = KEXEC_INITRAMFS_CHECK,


^ permalink raw reply

* Re: [PATCH 2/4] fs: Remove FIRMWARE_PREALLOC_BUFFER from kernel_read_file() enums
From: Matthew Wilcox @ 2020-07-10 22:04 UTC (permalink / raw)
  To: Scott Branden
  Cc: Kees Cook, James Morris, Luis Chamberlain, Mimi Zohar,
	Greg Kroah-Hartman, Rafael J. Wysocki, Alexander Viro, Jessica Yu,
	Dmitry Kasatkin, Serge E. Hallyn, Casey Schaufler,
	Eric W. Biederman, Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <3fdb3c53-7471-14d8-ce6a-251d8b660b8a@broadcom.com>

On Fri, Jul 10, 2020 at 02:00:32PM -0700, Scott Branden wrote:
> > @@ -950,8 +951,8 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
> >   		goto out;
> >   	}
> > -	if (id != READING_FIRMWARE_PREALLOC_BUFFER)
> > -		*buf = vmalloc(i_size);
> > +	if (!*buf)
> The assumption that *buf is always NULL when id !=
> READING_FIRMWARE_PREALLOC_BUFFER doesn't appear to be correct.
> I get unhandled page faults due to this change on boot.

Did it give you a stack backtrace?


^ permalink raw reply

* Re: [PATCH 2/4] fs: Remove FIRMWARE_PREALLOC_BUFFER from kernel_read_file() enums
From: Scott Branden @ 2020-07-10 22:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, James Morris, Luis Chamberlain, Mimi Zohar,
	Greg Kroah-Hartman, Rafael J. Wysocki, Alexander Viro, Jessica Yu,
	Dmitry Kasatkin, Serge E. Hallyn, Casey Schaufler,
	Eric W. Biederman, Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <20200710220411.GR12769@casper.infradead.org>



On 2020-07-10 3:04 p.m., Matthew Wilcox wrote:
> On Fri, Jul 10, 2020 at 02:00:32PM -0700, Scott Branden wrote:
>>> @@ -950,8 +951,8 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>>>    		goto out;
>>>    	}
>>> -	if (id != READING_FIRMWARE_PREALLOC_BUFFER)
>>> -		*buf = vmalloc(i_size);
>>> +	if (!*buf)
>> The assumption that *buf is always NULL when id !=
>> READING_FIRMWARE_PREALLOC_BUFFER doesn't appear to be correct.
>> I get unhandled page faults due to this change on boot.
> Did it give you a stack backtrace?
Yes, but there's no requirement that *buf need to be NULL when calling 
this function.
To fix my particular crash I added the following locally:

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3989,7 +3989,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char 
__user *, uargs, int, flags)
  {
      struct load_info info = { };
      loff_t size;
-    void *hdr;
+    void *hdr = NULL;
      int err;

      err = may_init_module();
>


^ permalink raw reply

* Re: [PATCH 2/4] fs: Remove FIRMWARE_PREALLOC_BUFFER from kernel_read_file() enums
From: Kees Cook @ 2020-07-10 22:44 UTC (permalink / raw)
  To: Scott Branden
  Cc: Matthew Wilcox, James Morris, Luis Chamberlain, Mimi Zohar,
	Greg Kroah-Hartman, Rafael J. Wysocki, Alexander Viro, Jessica Yu,
	Dmitry Kasatkin, Serge E. Hallyn, Casey Schaufler,
	Eric W. Biederman, Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <128120ca-7465-e041-7481-4c5d53f639dd@broadcom.com>

On Fri, Jul 10, 2020 at 03:10:25PM -0700, Scott Branden wrote:
> 
> 
> On 2020-07-10 3:04 p.m., Matthew Wilcox wrote:
> > On Fri, Jul 10, 2020 at 02:00:32PM -0700, Scott Branden wrote:
> > > > @@ -950,8 +951,8 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > > >    		goto out;
> > > >    	}
> > > > -	if (id != READING_FIRMWARE_PREALLOC_BUFFER)
> > > > -		*buf = vmalloc(i_size);
> > > > +	if (!*buf)
> > > The assumption that *buf is always NULL when id !=
> > > READING_FIRMWARE_PREALLOC_BUFFER doesn't appear to be correct.
> > > I get unhandled page faults due to this change on boot.
> > Did it give you a stack backtrace?
> Yes, but there's no requirement that *buf need to be NULL when calling this
> function.
> To fix my particular crash I added the following locally:
> 
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3989,7 +3989,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char
> __user *, uargs, int, flags)
>  {
>      struct load_info info = { };
>      loff_t size;
> -    void *hdr;
> +    void *hdr = NULL;
>      int err;
> 
>      err = may_init_module();
> > 
> 

Thanks for the diagnosis and fix! I haven't had time to cycle back
around to this series yet. Hopefully soon. :)

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 2/4] fs: Remove FIRMWARE_PREALLOC_BUFFER from kernel_read_file() enums
From: Scott Branden @ 2020-07-10 22:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox, James Morris, Luis Chamberlain, Mimi Zohar,
	Greg Kroah-Hartman, Rafael J. Wysocki, Alexander Viro, Jessica Yu,
	Dmitry Kasatkin, Serge E. Hallyn, Casey Schaufler,
	Eric W. Biederman, Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <202007101543.912633AA73@keescook>

Hi Kees,

On 2020-07-10 3:44 p.m., Kees Cook wrote:
> On Fri, Jul 10, 2020 at 03:10:25PM -0700, Scott Branden wrote:
>>
>> On 2020-07-10 3:04 p.m., Matthew Wilcox wrote:
>>> On Fri, Jul 10, 2020 at 02:00:32PM -0700, Scott Branden wrote:
>>>>> @@ -950,8 +951,8 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>>>>>     		goto out;
>>>>>     	}
>>>>> -	if (id != READING_FIRMWARE_PREALLOC_BUFFER)
>>>>> -		*buf = vmalloc(i_size);
>>>>> +	if (!*buf)
>>>> The assumption that *buf is always NULL when id !=
>>>> READING_FIRMWARE_PREALLOC_BUFFER doesn't appear to be correct.
>>>> I get unhandled page faults due to this change on boot.
>>> Did it give you a stack backtrace?
>> Yes, but there's no requirement that *buf need to be NULL when calling this
>> function.
>> To fix my particular crash I added the following locally:
>>
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3989,7 +3989,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char
>> __user *, uargs, int, flags)
>>   {
>>       struct load_info info = { };
>>       loff_t size;
>> -    void *hdr;
>> +    void *hdr = NULL;
>>       int err;
>>
>>       err = may_init_module();
> Thanks for the diagnosis and fix! I haven't had time to cycle back
> around to this series yet. Hopefully soon. :)
I don't consider this a complete fix as there may be other callers which 
do not initialize
the *buf param to NULL before calling kernel_read_file.

But, it does boot my system.  Also, I was able to make modifications for my
pread changes that pass (and the IMA works with IMA patch in my series 
is dropped completely with your changes in place).

So your changes work for me other than the hack needed above.
>


^ permalink raw reply

* Re: [PATCH v3 10/16] exec: Remove do_execve_file
From: Pavel Machek @ 2020-07-12 21:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds,
	Christian Brauner, Greg Kroah-Hartman
In-Reply-To: <20200702164140.4468-10-ebiederm@xmission.com>

On Thu 2020-07-02 11:41:34, Eric W. Biederman wrote:
> Now that the last callser has been removed remove this code from exec.

Typo "caller".

> For anyone thinking of resurrecing do_execve_file please note that

resurrecting?

> the code was buggy in several fundamental ways.
> 
> - It did not ensure the file it was passed was read-only and that
>   deny_write_access had been called on it.  Which subtlely breaks
>   invaniants in exec.

subtly, invariants?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH v5 2/8] lib/mpi: Extend the MPI library
From: Tianjia Zhang @ 2020-07-13  2:17 UTC (permalink / raw)
  To: Marcelo Henrique Cerri
  Cc: herbert, davem, dhowells, mcoquelin.stm32, alexandre.torgue,
	jmorris, serge, nramas, tusharsu, zohar, vt, gilad, pvanleeuwen,
	zhang.jia, linux-crypto, linux-kernel, keyrings, linux-stm32,
	linux-arm-kernel, linux-security-module, linux-integrity
In-Reply-To: <20200710131203.wyj33bq2hgkz6pv4@valinor>



On 2020/7/10 21:12, Marcelo Henrique Cerri wrote:
> Hi, Tianjia.
> 
> On Thu, Jul 09, 2020 at 04:40:09PM +0800, Tianjia Zhang wrote:
>> Expand the mpi library based on libgcrypt, and the ECC algorithm of
>> mpi based on libgcrypt requires these functions.
>> Some other algorithms will be developed based on mpi ecc, such as SM2.
>>
>> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
>> ---
>>   include/linux/mpi.h    |  88 +++++++++++
>>   lib/mpi/Makefile       |   5 +
>>   lib/mpi/mpi-add.c      | 207 +++++++++++++++++++++++++
>>   lib/mpi/mpi-bit.c      | 251 ++++++++++++++++++++++++++++++
>>   lib/mpi/mpi-cmp.c      |  46 ++++--
>>   lib/mpi/mpi-div.c      | 238 +++++++++++++++++++++++++++++
>>   lib/mpi/mpi-internal.h |  53 +++++++
>>   lib/mpi/mpi-inv.c      | 143 ++++++++++++++++++
>>   lib/mpi/mpi-mod.c      | 155 +++++++++++++++++++
>>   lib/mpi/mpi-mul.c      |  94 ++++++++++++
>>   lib/mpi/mpicoder.c     | 336 +++++++++++++++++++++++++++++++++++++++++
>>   lib/mpi/mpih-div.c     | 294 ++++++++++++++++++++++++++++++++++++
>>   lib/mpi/mpih-mul.c     |  25 +++
>>   lib/mpi/mpiutil.c      | 204 +++++++++++++++++++++++++
>>   14 files changed, 2129 insertions(+), 10 deletions(-)
>>   create mode 100644 lib/mpi/mpi-add.c
>>   create mode 100644 lib/mpi/mpi-div.c
>>   create mode 100644 lib/mpi/mpi-inv.c
>>   create mode 100644 lib/mpi/mpi-mod.c
>>   create mode 100644 lib/mpi/mpi-mul.c
>>
>> diff --git a/lib/mpi/mpi-add.c b/lib/mpi/mpi-add.c
>> new file mode 100644
>> index 000000000000..9afad7832737
>> --- /dev/null
>> +++ b/lib/mpi/mpi-add.c
>> @@ -0,0 +1,207 @@
>> +/* mpi-add.c  -  MPI functions
>> + * Copyright (C) 1994, 1996, 1998, 2001, 2002,
>> + *               2003 Free Software Foundation, Inc.
>> + *
>> + * This file is part of Libgcrypt.
>> + *
>> + * Note: This code is heavily based on the GNU MP Library.
>> + *	 Actually it's the same code with only minor changes in the
>> + *	 way the data is stored; this is to support the abstraction
>> + *	 of an optional secure memory allocation which may be used
>> + *	 to avoid revealing of sensitive data due to paging etc.
>> + */
>> +
>> +#include "mpi-internal.h"
>> +
>> +/****************
>> + * Add the unsigned integer V to the mpi-integer U and store the
>> + * result in W. U and V may be the same.
>> + */
>> +void mpi_add_ui(MPI w, MPI u, unsigned long v)
>> +{
>> +	mpi_ptr_t wp, up;
>> +	mpi_size_t usize, wsize;
>> +	int usign, wsign;
>> +
>> +	usize = u->nlimbs;
>> +	usign = u->sign;
>> +	wsign = 0;
>> +
>> +	/* If not space for W (and possible carry), increase space.  */
>> +	wsize = usize + 1;
>> +	if (w->alloced < wsize)
>> +		mpi_resize(w, wsize);
> 
> You are ignoring the mpi_resize() return. I believe these new functions
> need to return an int to indicate errors as mpi_powm() does.
> 

Yes, of course.  Thanks for pointing it out.

Thanks,
Tianjia

> 
>> +
>> +	/* These must be after realloc (U may be the same as W).  */
>> +	up = u->d;
>> +	wp = w->d;
>> +
>> +	if (!usize) {  /* simple */
>> +		wp[0] = v;
>> +		wsize = v ? 1:0;
>> +	} else if (!usign) {  /* mpi is not negative */
>> +		mpi_limb_t cy;
>> +		cy = mpihelp_add_1(wp, up, usize, v);
>> +		wp[usize] = cy;
>> +		wsize = usize + cy;
>> +	} else {
>> +		/* The signs are different.  Need exact comparison to determine
>> +		 * which operand to subtract from which.
>> +		 */
>> +		if (usize == 1 && up[0] < v) {
>> +			wp[0] = v - up[0];
>> +			wsize = 1;
>> +		} else {
>> +			mpihelp_sub_1(wp, up, usize, v);
>> +			/* Size can decrease with at most one limb. */
>> +			wsize = usize - (wp[usize-1] == 0);
>> +			wsign = 1;
>> +		}
>> +	}
>> +
>> +	w->nlimbs = wsize;
>> +	w->sign   = wsign;
>> +}
>> +


^ permalink raw reply

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Alexey Budankov @ 2020-07-13  9:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ravi Bangoria, Peter Zijlstra, Alexei Starovoitov, Ingo Molnar,
	James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Thomas Gleixner,
	linux-kernel, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	linux-doc@vger.kernel.org, linux-man
In-Reply-To: <20200710170911.GD7487@kernel.org>


On 10.07.2020 20:09, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jul 10, 2020 at 05:30:50PM +0300, Alexey Budankov escreveu:
>> On 10.07.2020 16:31, Ravi Bangoria wrote:
>>>> Currently access to perf_events, i915_perf and other performance
>>>> monitoring and observability subsystems of the kernel is open only for
>>>> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
>>>> process effective set [2].
> 
>>>> This patch set introduces CAP_PERFMON capability designed to secure
>>>> system performance monitoring and observability operations so that
>>>> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
>>>> for performance monitoring and observability subsystems of the kernel.
>  
>>> I'm seeing an issue with CAP_PERFMON when I try to record data for a
>>> specific target. I don't know whether this is sort of a regression or
>>> an expected behavior.
>  
>> Thanks for reporting and root causing this case. The behavior looks like
>> kind of expected since currently CAP_PERFMON takes over the related part
>> of CAP_SYS_ADMIN credentials only. Actually Perf security docs [1] say
>> that access control is also subject to CAP_SYS_PTRACE credentials.
> 
> I think that stating that in the error message would be helpful, after
> all, who reads docs? 8-)

At least those who write it :D ...

> 
> I.e., this:
> 
> $ ./perf stat ls
>   Error:
>   Access to performance monitoring and observability operations is limited.
> $
> 
> Could become:
> 
> $ ./perf stat ls
>   Error:
>   Access to performance monitoring and observability operations is limited.
>   Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
> $

It would better provide reference to perf security docs in the tool output.
Looks like extending ptrace_may_access() check for perf_events with CAP_PERFMON
makes monitoring simpler and even more secure to use since Perf tool need
not to start/stop/single-step and read/write registers and memory and so on
like a debugger or strace-like tool. What do you think?

Alexei

> 
> - Arnaldo
>  
>> CAP_PERFMON could be used to extend and substitute ptrace_may_access()
>> check in perf_events subsystem to simplify user experience at least in
>> this specific case.
>>
>> Alexei
>>
>> [1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
>>
>>>
>>> Without setting CAP_PERFMON:
>>>
>>>   $ getcap ./perf
>>>   $ ./perf stat -a ls
>>>     Error:
>>>     Access to performance monitoring and observability operations is limited.
>>>   $ ./perf stat ls
>>>     Performance counter stats for 'ls':
>>>                     2.06 msec task-clock:u              #    0.418 CPUs utilized
>>>                     0      context-switches:u        #    0.000 K/sec
>>>                     0      cpu-migrations:u          #    0.000 K/sec
>>>
>>> With CAP_PERFMON:
>>>
>>>   $ getcap ./perf
>>>     ./perf = cap_perfmon+ep
>>>   $ ./perf stat -a ls
>>>     Performance counter stats for 'system wide':
>>>                   142.42 msec cpu-clock                 #   25.062 CPUs utilized
>>>                   182      context-switches          #    0.001 M/sec
>>>                    48      cpu-migrations            #    0.337 K/sec
>>>   $ ./perf stat ls
>>>     Error:
>>>     Access to performance monitoring and observability operations is limited.
>>>
>>> Am I missing something silly?
>>>
>>> Analysis:
>>> ---------
>>> A bit more analysis lead me to below kernel code fs/exec.c:
>>>
>>>   begin_new_exec()
>>>   {
>>>         ...
>>>         if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
>>>             !(uid_eq(current_euid(), current_uid()) &&
>>>               gid_eq(current_egid(), current_gid())))
>>>                 set_dumpable(current->mm, suid_dumpable);
>>>         else
>>>                 set_dumpable(current->mm, SUID_DUMP_USER);
>>>
>>>         ...
>>>         commit_creds(bprm->cred);
>>>   }
>>>
>>> When I execute './perf stat ls', it's going into else condition and thus sets
>>> dumpable flag as SUID_DUMP_USER. Then in commit_creds():
>>>
>>>   int commit_creds(struct cred *new)
>>>   {
>>>         ...
>>>         /* dumpability changes */
>>>         if (...
>>>             !cred_cap_issubset(old, new)) {
>>>                 if (task->mm)
>>>                         set_dumpable(task->mm, suid_dumpable);
>>>   }
>>>
>>> !cred_cap_issubset(old, new) fails for perf without any capability and thus
>>> it doesn't execute set_dumpable(). Whereas that condition passes for perf
>>> with CAP_PERFMON and thus it overwrites old value (SUID_DUMP_USER) with
>>> suid_dumpable in mm_flags. On an Ubuntu, suid_dumpable default value is
>>> SUID_DUMP_ROOT. On Fedora, it's SUID_DUMP_DISABLE. (/proc/sys/fs/suid_dumpable).
>>>
>>> Now while opening an event:
>>>
>>>   perf_event_open()
>>>     ptrace_may_access()
>>>       __ptrace_may_access() {
>>>                 ...
>>>                 if (mm &&
>>>                     ((get_dumpable(mm) != SUID_DUMP_USER) &&
>>>                      !ptrace_has_cap(cred, mm->user_ns, mode)))
>>>                     return -EPERM;
>>>       }
>>>
>>> This if condition passes for perf with CAP_PERFMON and thus it returns -EPERM.
>>> But it fails for perf without CAP_PERFMON and thus it goes ahead and returns
>>> success. So opening an event fails when perf has CAP_PREFMON and tries to open
>>> process specific event as normal user.
>>>
>>> Workarounds:
>>> ------------
>>> Based on above analysis, I found couple of workarounds (examples are on
>>> Ubuntu 18.04.4 powerpc):
>>>
>>> Workaround1:
>>> Setting SUID_DUMP_USER as default (in /proc/sys/fs/suid_dumpable) solves the
>>> issue.
>>>
>>>   # echo 1 > /proc/sys/fs/suid_dumpable
>>>   $ getcap ./perf
>>>     ./perf = cap_perfmon+ep
>>>   $ ./perf stat ls
>>>     Performance counter stats for 'ls':
>>>                     1.47 msec task-clock                #    0.806 CPUs utilized
>>>                     0      context-switches          #    0.000 K/sec
>>>                     0      cpu-migrations            #    0.000 K/sec
>>>
>>> Workaround2:
>>> Using CAP_SYS_PTRACE along with CAP_PERFMON solves the issue.
>>>
>>>   $ cat /proc/sys/fs/suid_dumpable
>>>     2
>>>   # setcap "cap_perfmon,cap_sys_ptrace=ep" ./perf
>>>   $ ./perf stat ls
>>>     Performance counter stats for 'ls':
>>>                     1.41 msec task-clock                #    0.826 CPUs utilized
>>>                     0      context-switches          #    0.000 K/sec
>>>                     0      cpu-migrations            #    0.000 K/sec
>>>
>>> Workaround3:
>>> Adding CAP_PERFMON to parent of perf (/bin/bash) also solves the issue.
>>>
>>>   $ cat /proc/sys/fs/suid_dumpable
>>>     2
>>>   # setcap "cap_perfmon=ep" /bin/bash
>>>   # setcap "cap_perfmon=ep" ./perf
>>>   $ bash
>>>   $ ./perf stat ls
>>>     Performance counter stats for 'ls':
>>>                     1.47 msec task-clock                #    0.806 CPUs utilized
>>>                     0      context-switches          #    0.000 K/sec
>>>                     0      cpu-migrations            #    0.000 K/sec
>>>
>>> - Ravi
> 

^ permalink raw reply

* [PATCH] capabilities: Replace HTTP links with HTTPS ones
From: Alexander A. Klimov @ 2020-07-13 10:34 UTC (permalink / raw)
  To: serge, linux-security-module, linux-kernel; +Cc: Alexander A. Klimov

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
---
 Continuing my work started at 93431e0607e5.
 See also: git log --oneline '--author=Alexander A. Klimov <grandmaster@al2klimov.de>' v5.7..master
 (Actually letting a shell for loop submit all this stuff for me.)

 If there are any URLs to be removed completely or at least not just HTTPSified:
 Just clearly say so and I'll *undo my change*.
 See also: https://lkml.org/lkml/2020/6/27/64

 If there are any valid, but yet not changed URLs:
 See: https://lkml.org/lkml/2020/6/26/837

 If you apply the patch, please let me know.

 Sorry again to all maintainers who complained about subject lines.
 Now I realized that you want an actually perfect prefixes,
 not just subsystem ones.
 I tried my best...
 And yes, *I could* (at least half-)automate it.
 Impossible is nothing! :)

 kernel/capability.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/capability.c b/kernel/capability.c
index 1444f3954d75..a8a20ebc43ee 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -40,7 +40,7 @@ __setup("no_file_caps", file_caps_disable);
 /*
  * More recent versions of libcap are available from:
  *
- *   http://www.kernel.org/pub/linux/libs/security/linux-privs/
+ *   https://www.kernel.org/pub/linux/libs/security/linux-privs/
  */

 static void warn_legacy_capability_use(void)
-- 
2.27.0

^ permalink raw reply related

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Arnaldo Carvalho de Melo @ 2020-07-13 12:17 UTC (permalink / raw)
  To: Alexey Budankov, Peter Zijlstra
  Cc: Ravi Bangoria, Alexei Starovoitov, Ingo Molnar, James Morris,
	Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu, Andi Kleen,
	Stephane Eranian, Igor Lubashev, Thomas Gleixner, linux-kernel,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, linux-doc@vger.kernel.org,
	linux-man
In-Reply-To: <0d2e2306-22b2-a730-dc3f-edb3538b6561@linux.intel.com>

Em Mon, Jul 13, 2020 at 12:48:25PM +0300, Alexey Budankov escreveu:
> 
> On 10.07.2020 20:09, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Jul 10, 2020 at 05:30:50PM +0300, Alexey Budankov escreveu:
> >> On 10.07.2020 16:31, Ravi Bangoria wrote:
> >>>> Currently access to perf_events, i915_perf and other performance
> >>>> monitoring and observability subsystems of the kernel is open only for
> >>>> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
> >>>> process effective set [2].

> >>>> This patch set introduces CAP_PERFMON capability designed to secure
> >>>> system performance monitoring and observability operations so that
> >>>> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
> >>>> for performance monitoring and observability subsystems of the kernel.

> >>> I'm seeing an issue with CAP_PERFMON when I try to record data for a
> >>> specific target. I don't know whether this is sort of a regression or
> >>> an expected behavior.

> >> Thanks for reporting and root causing this case. The behavior looks like
> >> kind of expected since currently CAP_PERFMON takes over the related part
> >> of CAP_SYS_ADMIN credentials only. Actually Perf security docs [1] say
> >> that access control is also subject to CAP_SYS_PTRACE credentials.

> > I think that stating that in the error message would be helpful, after
> > all, who reads docs? 8-)

> At least those who write it :D ...

Everybody should read it, sure :-)
 
> > I.e., this:
> > 
> > $ ./perf stat ls
> >   Error:
> >   Access to performance monitoring and observability operations is limited.
> > $
> > 
> > Could become:
> > 
> > $ ./perf stat ls
> >   Error:
> >   Access to performance monitoring and observability operations is limited.
> >   Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
> > $
> 
> It would better provide reference to perf security docs in the tool output.

So add a 3rd line:

$ ./perf stat ls
  Error:
  Access to performance monitoring and observability operations is limited.
  Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
  Please read the 'Perf events and tool security' document:
  https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

> Looks like extending ptrace_may_access() check for perf_events with CAP_PERFMON

You mean the following?

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 856d98c36f56..a2397f724c10 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11595,7 +11595,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		 * perf_event_exit_task() that could imply).
 		 */
 		err = -EACCES;
-		if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
+		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
 			goto err_cred;
 	}

> makes monitoring simpler and even more secure to use since Perf tool need
> not to start/stop/single-step and read/write registers and memory and so on
> like a debugger or strace-like tool. What do you think?

I tend to agree, Peter?
 
> Alexei
> 
> > 
> > - Arnaldo
> >  
> >> CAP_PERFMON could be used to extend and substitute ptrace_may_access()
> >> check in perf_events subsystem to simplify user experience at least in
> >> this specific case.
> >>
> >> Alexei
> >>
> >> [1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> >>
> >>>
> >>> Without setting CAP_PERFMON:
> >>>
> >>>   $ getcap ./perf
> >>>   $ ./perf stat -a ls
> >>>     Error:
> >>>     Access to performance monitoring and observability operations is limited.
> >>>   $ ./perf stat ls
> >>>     Performance counter stats for 'ls':
> >>>                     2.06 msec task-clock:u              #    0.418 CPUs utilized
> >>>                     0      context-switches:u        #    0.000 K/sec
> >>>                     0      cpu-migrations:u          #    0.000 K/sec
> >>>
> >>> With CAP_PERFMON:
> >>>
> >>>   $ getcap ./perf
> >>>     ./perf = cap_perfmon+ep
> >>>   $ ./perf stat -a ls
> >>>     Performance counter stats for 'system wide':
> >>>                   142.42 msec cpu-clock                 #   25.062 CPUs utilized
> >>>                   182      context-switches          #    0.001 M/sec
> >>>                    48      cpu-migrations            #    0.337 K/sec
> >>>   $ ./perf stat ls
> >>>     Error:
> >>>     Access to performance monitoring and observability operations is limited.
> >>>
> >>> Am I missing something silly?
> >>>
> >>> Analysis:
> >>> ---------
> >>> A bit more analysis lead me to below kernel code fs/exec.c:
> >>>
> >>>   begin_new_exec()
> >>>   {
> >>>         ...
> >>>         if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
> >>>             !(uid_eq(current_euid(), current_uid()) &&
> >>>               gid_eq(current_egid(), current_gid())))
> >>>                 set_dumpable(current->mm, suid_dumpable);
> >>>         else
> >>>                 set_dumpable(current->mm, SUID_DUMP_USER);
> >>>
> >>>         ...
> >>>         commit_creds(bprm->cred);
> >>>   }
> >>>
> >>> When I execute './perf stat ls', it's going into else condition and thus sets
> >>> dumpable flag as SUID_DUMP_USER. Then in commit_creds():
> >>>
> >>>   int commit_creds(struct cred *new)
> >>>   {
> >>>         ...
> >>>         /* dumpability changes */
> >>>         if (...
> >>>             !cred_cap_issubset(old, new)) {
> >>>                 if (task->mm)
> >>>                         set_dumpable(task->mm, suid_dumpable);
> >>>   }
> >>>
> >>> !cred_cap_issubset(old, new) fails for perf without any capability and thus
> >>> it doesn't execute set_dumpable(). Whereas that condition passes for perf
> >>> with CAP_PERFMON and thus it overwrites old value (SUID_DUMP_USER) with
> >>> suid_dumpable in mm_flags. On an Ubuntu, suid_dumpable default value is
> >>> SUID_DUMP_ROOT. On Fedora, it's SUID_DUMP_DISABLE. (/proc/sys/fs/suid_dumpable).
> >>>
> >>> Now while opening an event:
> >>>
> >>>   perf_event_open()
> >>>     ptrace_may_access()
> >>>       __ptrace_may_access() {
> >>>                 ...
> >>>                 if (mm &&
> >>>                     ((get_dumpable(mm) != SUID_DUMP_USER) &&
> >>>                      !ptrace_has_cap(cred, mm->user_ns, mode)))
> >>>                     return -EPERM;
> >>>       }
> >>>
> >>> This if condition passes for perf with CAP_PERFMON and thus it returns -EPERM.
> >>> But it fails for perf without CAP_PERFMON and thus it goes ahead and returns
> >>> success. So opening an event fails when perf has CAP_PREFMON and tries to open
> >>> process specific event as normal user.
> >>>
> >>> Workarounds:
> >>> ------------
> >>> Based on above analysis, I found couple of workarounds (examples are on
> >>> Ubuntu 18.04.4 powerpc):
> >>>
> >>> Workaround1:
> >>> Setting SUID_DUMP_USER as default (in /proc/sys/fs/suid_dumpable) solves the
> >>> issue.
> >>>
> >>>   # echo 1 > /proc/sys/fs/suid_dumpable
> >>>   $ getcap ./perf
> >>>     ./perf = cap_perfmon+ep
> >>>   $ ./perf stat ls
> >>>     Performance counter stats for 'ls':
> >>>                     1.47 msec task-clock                #    0.806 CPUs utilized
> >>>                     0      context-switches          #    0.000 K/sec
> >>>                     0      cpu-migrations            #    0.000 K/sec
> >>>
> >>> Workaround2:
> >>> Using CAP_SYS_PTRACE along with CAP_PERFMON solves the issue.
> >>>
> >>>   $ cat /proc/sys/fs/suid_dumpable
> >>>     2
> >>>   # setcap "cap_perfmon,cap_sys_ptrace=ep" ./perf
> >>>   $ ./perf stat ls
> >>>     Performance counter stats for 'ls':
> >>>                     1.41 msec task-clock                #    0.826 CPUs utilized
> >>>                     0      context-switches          #    0.000 K/sec
> >>>                     0      cpu-migrations            #    0.000 K/sec
> >>>
> >>> Workaround3:
> >>> Adding CAP_PERFMON to parent of perf (/bin/bash) also solves the issue.
> >>>
> >>>   $ cat /proc/sys/fs/suid_dumpable
> >>>     2
> >>>   # setcap "cap_perfmon=ep" /bin/bash
> >>>   # setcap "cap_perfmon=ep" ./perf
> >>>   $ bash
> >>>   $ ./perf stat ls
> >>>     Performance counter stats for 'ls':
> >>>                     1.47 msec task-clock                #    0.806 CPUs utilized
> >>>                     0      context-switches          #    0.000 K/sec
> >>>                     0      cpu-migrations            #    0.000 K/sec
> >>>
> >>> - Ravi
> > 

-- 

- Arnaldo

^ permalink raw reply related

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Alexey Budankov @ 2020-07-13 12:37 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: Ravi Bangoria, Alexei Starovoitov, Ingo Molnar, James Morris,
	Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu, Andi Kleen,
	Stephane Eranian, Igor Lubashev, Thomas Gleixner, linux-kernel,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, linux-doc@vger.kernel.org,
	linux-man
In-Reply-To: <20200713121746.GA7029@kernel.org>


On 13.07.2020 15:17, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jul 13, 2020 at 12:48:25PM +0300, Alexey Budankov escreveu:
>>
>> On 10.07.2020 20:09, Arnaldo Carvalho de Melo wrote:
>>> Em Fri, Jul 10, 2020 at 05:30:50PM +0300, Alexey Budankov escreveu:
>>>> On 10.07.2020 16:31, Ravi Bangoria wrote:
>>>>>> Currently access to perf_events, i915_perf and other performance
>>>>>> monitoring and observability subsystems of the kernel is open only for
>>>>>> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
>>>>>> process effective set [2].
> 
>>>>>> This patch set introduces CAP_PERFMON capability designed to secure
>>>>>> system performance monitoring and observability operations so that
>>>>>> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
>>>>>> for performance monitoring and observability subsystems of the kernel.
> 
>>>>> I'm seeing an issue with CAP_PERFMON when I try to record data for a
>>>>> specific target. I don't know whether this is sort of a regression or
>>>>> an expected behavior.
> 
>>>> Thanks for reporting and root causing this case. The behavior looks like
>>>> kind of expected since currently CAP_PERFMON takes over the related part
>>>> of CAP_SYS_ADMIN credentials only. Actually Perf security docs [1] say
>>>> that access control is also subject to CAP_SYS_PTRACE credentials.
> 
>>> I think that stating that in the error message would be helpful, after
>>> all, who reads docs? 8-)
> 
>> At least those who write it :D ...
> 
> Everybody should read it, sure :-)
>  
>>> I.e., this:
>>>
>>> $ ./perf stat ls
>>>   Error:
>>>   Access to performance monitoring and observability operations is limited.
>>> $
>>>
>>> Could become:
>>>
>>> $ ./perf stat ls
>>>   Error:
>>>   Access to performance monitoring and observability operations is limited.
>>>   Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
>>> $
>>
>> It would better provide reference to perf security docs in the tool output.
> 
> So add a 3rd line:
> 
> $ ./perf stat ls
>   Error:
>   Access to performance monitoring and observability operations is limited.
>   Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
>   Please read the 'Perf events and tool security' document:
>   https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
If it had that patch below then message change would not be required.
However this two sentences in the end of whole message would still add up:
"Please read the 'Perf events and tool security' document:
 https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html"

> 
>> Looks like extending ptrace_may_access() check for perf_events with CAP_PERFMON
> 
> You mean the following?

Exactly that.

> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 856d98c36f56..a2397f724c10 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11595,7 +11595,7 @@ SYSCALL_DEFINE5(perf_event_open,
>  		 * perf_event_exit_task() that could imply).
>  		 */
>  		err = -EACCES;
> -		if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> +		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
>  			goto err_cred;
>  	}
> 
>> makes monitoring simpler and even more secure to use since Perf tool need
>> not to start/stop/single-step and read/write registers and memory and so on
>> like a debugger or strace-like tool. What do you think?
> 
> I tend to agree, Peter?
>  
>> Alexei
>>
>>>
>>> - Arnaldo

Alexei

^ permalink raw reply

* [PATCH -next] IMA: Make ima_mok_init() static
From: Wei Yongjun @ 2020-07-13 14:26 UTC (permalink / raw)
  To: Hulk Robot, Mimi Zohar, Dmitry Kasatkin, James Morris,
	Serge E. Hallyn
  Cc: Wei Yongjun, linux-integrity, linux-security-module

The sparse tool complains as follows:

security/integrity/ima/ima_mok.c:24:12: warning:
 symbol 'ima_mok_init' was not declared. Should it be static?

ima_mok_init() is not used outside of ima_mok.c, so marks
it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 security/integrity/ima/ima_mok.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima_mok.c b/security/integrity/ima/ima_mok.c
index 36cadadbfba4..a93a2ca257d9 100644
--- a/security/integrity/ima/ima_mok.c
+++ b/security/integrity/ima/ima_mok.c
@@ -21,7 +21,7 @@ struct key *ima_blacklist_keyring;
 /*
  * Allocate the IMA blacklist keyring
  */
-__init int ima_mok_init(void)
+static __init int ima_mok_init(void)
 {
 	struct key_restriction *restriction;
 


^ permalink raw reply related

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Arnaldo Carvalho de Melo @ 2020-07-13 18:51 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ravi Bangoria, Alexei Starovoitov, Ingo Molnar,
	James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Thomas Gleixner,
	linux-kernel, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	linux-doc@vger.kernel.org, linux-man
In-Reply-To: <0fadcf78-8b0e-ed03-a554-cc172b7d249c@linux.intel.com>

Em Mon, Jul 13, 2020 at 03:37:51PM +0300, Alexey Budankov escreveu:
> 
> On 13.07.2020 15:17, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Jul 13, 2020 at 12:48:25PM +0300, Alexey Budankov escreveu:
> >>
> >> On 10.07.2020 20:09, Arnaldo Carvalho de Melo wrote:
> >>> Em Fri, Jul 10, 2020 at 05:30:50PM +0300, Alexey Budankov escreveu:
> >>>> On 10.07.2020 16:31, Ravi Bangoria wrote:
> >>>>>> Currently access to perf_events, i915_perf and other performance
> >>>>>> monitoring and observability subsystems of the kernel is open only for
> >>>>>> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
> >>>>>> process effective set [2].
> > 
> >>>>>> This patch set introduces CAP_PERFMON capability designed to secure
> >>>>>> system performance monitoring and observability operations so that
> >>>>>> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
> >>>>>> for performance monitoring and observability subsystems of the kernel.
> > 
> >>>>> I'm seeing an issue with CAP_PERFMON when I try to record data for a
> >>>>> specific target. I don't know whether this is sort of a regression or
> >>>>> an expected behavior.
> > 
> >>>> Thanks for reporting and root causing this case. The behavior looks like
> >>>> kind of expected since currently CAP_PERFMON takes over the related part
> >>>> of CAP_SYS_ADMIN credentials only. Actually Perf security docs [1] say
> >>>> that access control is also subject to CAP_SYS_PTRACE credentials.
> > 
> >>> I think that stating that in the error message would be helpful, after
> >>> all, who reads docs? 8-)
> > 
> >> At least those who write it :D ...
> > 
> > Everybody should read it, sure :-)
> >  
> >>> I.e., this:
> >>>
> >>> $ ./perf stat ls
> >>>   Error:
> >>>   Access to performance monitoring and observability operations is limited.
> >>> $
> >>>
> >>> Could become:
> >>>
> >>> $ ./perf stat ls
> >>>   Error:
> >>>   Access to performance monitoring and observability operations is limited.
> >>>   Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
> >>> $
> >>
> >> It would better provide reference to perf security docs in the tool output.
> > 
> > So add a 3rd line:
> > 
> > $ ./perf stat ls
> >   Error:
> >   Access to performance monitoring and observability operations is limited.
> >   Right now only CAP_PERFMON is granted, you may need CAP_SYS_PTRACE.
> >   Please read the 'Perf events and tool security' document:
> >   https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

> If it had that patch below then message change would not be required.

Sure, but the tool should continue to work and provide useful messages
when running on kernels without that change. Pointing to the document is
valid and should be done, that is an agreed point. But the tool can do
some checks, narrow down the possible causes for the error message and
provide something that in most cases will make the user make progress.

> However this two sentences in the end of whole message would still add up:
> "Please read the 'Perf events and tool security' document:
>  https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html"

We're in violent agreement here. :-)
 
> > 
> >> Looks like extending ptrace_may_access() check for perf_events with CAP_PERFMON
> > 
> > You mean the following?
> 
> Exactly that.

Sure, lets then wait for others to chime in and then you can go ahead
and submit that patch.

Peter?

- Arnaldo
 
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 856d98c36f56..a2397f724c10 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -11595,7 +11595,7 @@ SYSCALL_DEFINE5(perf_event_open,
> >  		 * perf_event_exit_task() that could imply).
> >  		 */
> >  		err = -EACCES;
> > -		if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> > +		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> >  			goto err_cred;
> >  	}
> > 
> >> makes monitoring simpler and even more secure to use since Perf tool need
> >> not to start/stop/single-step and read/write registers and memory and so on
> >> like a debugger or strace-like tool. What do you think?
> > 
> > I tend to agree, Peter?
> >  
> >> Alexei
> >>
> >>>
> >>> - Arnaldo
> 
> Alexei

-- 

- Arnaldo

^ permalink raw reply

* [PATCH ghak84 v4] audit: purge audit_log_string from the intra-kernel audit API
From: Richard Guy Briggs @ 2020-07-13 19:51 UTC (permalink / raw)
  To: Linux-Audit Mailing List, LKML, Linux Security Module list
  Cc: Paul Moore, eparis, john.johansen, Richard Guy Briggs

audit_log_string() was inteded to be an internal audit function and
since there are only two internal uses, remove them.  Purge all external
uses of it by restructuring code to use an existing audit_log_format()
or using audit_log_format().

Please see the upstream issue
https://github.com/linux-audit/audit-kernel/issues/84

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
Passes audit-testsuite.

Changelog:
v4
- use double quotes in all replaced audit_log_string() calls

v3
- fix two warning: non-void function does not return a value in all control paths
	Reported-by: kernel test robot <lkp@intel.com>

v2
- restructure to piggyback on existing audit_log_format() calls, checking quoting needs for each.

v1 Vlad Dronov
- https://github.com/nefigtut/audit-kernel/commit/dbbcba46335a002f44b05874153a85b9cc18aebf

 include/linux/audit.h     |  5 -----
 kernel/audit.c            |  4 ++--
 security/apparmor/audit.c | 10 ++++------
 security/apparmor/file.c  | 25 +++++++------------------
 security/apparmor/ipc.c   | 46 +++++++++++++++++++++++-----------------------
 security/apparmor/net.c   | 14 ++++++++------
 security/lsm_audit.c      |  4 ++--
 7 files changed, 46 insertions(+), 62 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 523f77494847..b3d859831a31 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -694,9 +694,4 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
 	return uid_valid(audit_get_loginuid(tsk));
 }
 
-static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
-{
-	audit_log_n_string(ab, buf, strlen(buf));
-}
-
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 8c201f414226..a2f3e34aa724 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2080,13 +2080,13 @@ void audit_log_d_path(struct audit_buffer *ab, const char *prefix,
 	/* We will allow 11 spaces for ' (deleted)' to be appended */
 	pathname = kmalloc(PATH_MAX+11, ab->gfp_mask);
 	if (!pathname) {
-		audit_log_string(ab, "<no_memory>");
+		audit_log_format(ab, "\"<no_memory>\"");
 		return;
 	}
 	p = d_path(path, pathname, PATH_MAX+11);
 	if (IS_ERR(p)) { /* Should never happen since we send PATH_MAX */
 		/* FIXME: can we save some information here? */
-		audit_log_string(ab, "<too_long>");
+		audit_log_format(ab, "\"<too_long>\"");
 	} else
 		audit_log_untrustedstring(ab, p);
 	kfree(pathname);
diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index 597732503815..f7e97c7e80f3 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -57,18 +57,16 @@ static void audit_pre(struct audit_buffer *ab, void *ca)
 	struct common_audit_data *sa = ca;
 
 	if (aa_g_audit_header) {
-		audit_log_format(ab, "apparmor=");
-		audit_log_string(ab, aa_audit_type[aad(sa)->type]);
+		audit_log_format(ab, "apparmor=\"%s\"",
+				 aa_audit_type[aad(sa)->type]);
 	}
 
 	if (aad(sa)->op) {
-		audit_log_format(ab, " operation=");
-		audit_log_string(ab, aad(sa)->op);
+		audit_log_format(ab, " operation=\"%s\"", aad(sa)->op);
 	}
 
 	if (aad(sa)->info) {
-		audit_log_format(ab, " info=");
-		audit_log_string(ab, aad(sa)->info);
+		audit_log_format(ab, " info=\"%s\"", aad(sa)->info);
 		if (aad(sa)->error)
 			audit_log_format(ab, " error=%d", aad(sa)->error);
 	}
diff --git a/security/apparmor/file.c b/security/apparmor/file.c
index 9a2d14b7c9f8..92acf9a49405 100644
--- a/security/apparmor/file.c
+++ b/security/apparmor/file.c
@@ -35,20 +35,6 @@ static u32 map_mask_to_chr_mask(u32 mask)
 }
 
 /**
- * audit_file_mask - convert mask to permission string
- * @buffer: buffer to write string to (NOT NULL)
- * @mask: permission mask to convert
- */
-static void audit_file_mask(struct audit_buffer *ab, u32 mask)
-{
-	char str[10];
-
-	aa_perm_mask_to_str(str, sizeof(str), aa_file_perm_chrs,
-			    map_mask_to_chr_mask(mask));
-	audit_log_string(ab, str);
-}
-
-/**
  * file_audit_cb - call back for file specific audit fields
  * @ab: audit_buffer  (NOT NULL)
  * @va: audit struct to audit values of  (NOT NULL)
@@ -57,14 +43,17 @@ static void file_audit_cb(struct audit_buffer *ab, void *va)
 {
 	struct common_audit_data *sa = va;
 	kuid_t fsuid = current_fsuid();
+	char str[10];
 
 	if (aad(sa)->request & AA_AUDIT_FILE_MASK) {
-		audit_log_format(ab, " requested_mask=");
-		audit_file_mask(ab, aad(sa)->request);
+		aa_perm_mask_to_str(str, sizeof(str), aa_file_perm_chrs,
+				    map_mask_to_chr_mask(aad(sa)->request));
+		audit_log_format(ab, " requested_mask=\"%s\"", str);
 	}
 	if (aad(sa)->denied & AA_AUDIT_FILE_MASK) {
-		audit_log_format(ab, " denied_mask=");
-		audit_file_mask(ab, aad(sa)->denied);
+		aa_perm_mask_to_str(str, sizeof(str), aa_file_perm_chrs,
+				    map_mask_to_chr_mask(aad(sa)->denied));
+		audit_log_format(ab, " denied_mask=\"%s\"", str);
 	}
 	if (aad(sa)->request & AA_AUDIT_FILE_MASK) {
 		audit_log_format(ab, " fsuid=%d",
diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
index 4ecedffbdd33..fe36d112aad9 100644
--- a/security/apparmor/ipc.c
+++ b/security/apparmor/ipc.c
@@ -20,25 +20,23 @@
 
 /**
  * audit_ptrace_mask - convert mask to permission string
- * @buffer: buffer to write string to (NOT NULL)
  * @mask: permission mask to convert
+ *
+ * Returns: pointer to static string
  */
-static void audit_ptrace_mask(struct audit_buffer *ab, u32 mask)
+static const char *audit_ptrace_mask(u32 mask)
 {
 	switch (mask) {
 	case MAY_READ:
-		audit_log_string(ab, "read");
-		break;
+		return "read";
 	case MAY_WRITE:
-		audit_log_string(ab, "trace");
-		break;
+		return "trace";
 	case AA_MAY_BE_READ:
-		audit_log_string(ab, "readby");
-		break;
+		return "readby";
 	case AA_MAY_BE_TRACED:
-		audit_log_string(ab, "tracedby");
-		break;
+		return "tracedby";
 	}
+	return "";
 }
 
 /* call back to audit ptrace fields */
@@ -47,12 +45,12 @@ static void audit_ptrace_cb(struct audit_buffer *ab, void *va)
 	struct common_audit_data *sa = va;
 
 	if (aad(sa)->request & AA_PTRACE_PERM_MASK) {
-		audit_log_format(ab, " requested_mask=");
-		audit_ptrace_mask(ab, aad(sa)->request);
+		audit_log_format(ab, " requested_mask=\"%s\"",
+				 audit_ptrace_mask(aad(sa)->request));
 
 		if (aad(sa)->denied & AA_PTRACE_PERM_MASK) {
-			audit_log_format(ab, " denied_mask=");
-			audit_ptrace_mask(ab, aad(sa)->denied);
+			audit_log_format(ab, " denied_mask=\"%s\"",
+					 audit_ptrace_mask(aad(sa)->denied));
 		}
 	}
 	audit_log_format(ab, " peer=");
@@ -142,16 +140,18 @@ static inline int map_signal_num(int sig)
 }
 
 /**
- * audit_file_mask - convert mask to permission string
- * @buffer: buffer to write string to (NOT NULL)
+ * audit_signal_mask - convert mask to permission string
  * @mask: permission mask to convert
+ *
+ * Returns: pointer to static string
  */
-static void audit_signal_mask(struct audit_buffer *ab, u32 mask)
+static const char *audit_signal_mask(u32 mask)
 {
 	if (mask & MAY_READ)
-		audit_log_string(ab, "receive");
+		return "receive";
 	if (mask & MAY_WRITE)
-		audit_log_string(ab, "send");
+		return "send";
+	return "";
 }
 
 /**
@@ -164,11 +164,11 @@ static void audit_signal_cb(struct audit_buffer *ab, void *va)
 	struct common_audit_data *sa = va;
 
 	if (aad(sa)->request & AA_SIGNAL_PERM_MASK) {
-		audit_log_format(ab, " requested_mask=");
-		audit_signal_mask(ab, aad(sa)->request);
+		audit_log_format(ab, " requested_mask=\"%s\"",
+				 audit_signal_mask(aad(sa)->request));
 		if (aad(sa)->denied & AA_SIGNAL_PERM_MASK) {
-			audit_log_format(ab, " denied_mask=");
-			audit_signal_mask(ab, aad(sa)->denied);
+			audit_log_format(ab, " denied_mask=\"%s\"",
+					 audit_signal_mask(aad(sa)->denied));
 		}
 	}
 	if (aad(sa)->signal == SIGUNKNOWN)
diff --git a/security/apparmor/net.c b/security/apparmor/net.c
index d8afc39f663a..fa0e85568450 100644
--- a/security/apparmor/net.c
+++ b/security/apparmor/net.c
@@ -72,16 +72,18 @@ void audit_net_cb(struct audit_buffer *ab, void *va)
 {
 	struct common_audit_data *sa = va;
 
-	audit_log_format(ab, " family=");
 	if (address_family_names[sa->u.net->family])
-		audit_log_string(ab, address_family_names[sa->u.net->family]);
+		audit_log_format(ab, " family=\"%s\"",
+				 address_family_names[sa->u.net->family]);
 	else
-		audit_log_format(ab, "\"unknown(%d)\"", sa->u.net->family);
-	audit_log_format(ab, " sock_type=");
+		audit_log_format(ab, " family=\"unknown(%d)\"",
+				 sa->u.net->family);
 	if (sock_type_names[aad(sa)->net.type])
-		audit_log_string(ab, sock_type_names[aad(sa)->net.type]);
+		audit_log_format(ab, " sock_type=\"%s\"",
+				 sock_type_names[aad(sa)->net.type]);
 	else
-		audit_log_format(ab, "\"unknown(%d)\"", aad(sa)->net.type);
+		audit_log_format(ab, " sock_type=\"unknown(%d)\"",
+				 aad(sa)->net.type);
 	audit_log_format(ab, " protocol=%d", aad(sa)->net.protocol);
 
 	if (aad(sa)->request & NET_PERMS_MASK) {
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 7c555621c2bd..53d0d183db8f 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -432,8 +432,8 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 				 a->u.ibendport->port);
 		break;
 	case LSM_AUDIT_DATA_LOCKDOWN:
-		audit_log_format(ab, " lockdown_reason=");
-		audit_log_string(ab, lockdown_reasons[a->u.reason]);
+		audit_log_format(ab, " lockdown_reason=\"%s\"",
+				 lockdown_reasons[a->u.reason]);
 		break;
 	} /* switch (a->type) */
 }
-- 
1.8.3.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox