Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [PATCH v11 02/25] LSM: Create and manage the lsmblob data structure.
From: Tetsuo Handa @ 2019-11-23  5:21 UTC (permalink / raw)
  To: Casey Schaufler; +Cc: linux-security-module
In-Reply-To: <20191123011454.3292-1-casey@schaufler-ca.com>

On 2019/11/23 10:14, Casey Schaufler wrote:
> When more than one security module is exporting data to
> audit and networking sub-systems a single 32 bit integer
> is no longer sufficient to represent the data. Add a
> structure to be used instead.
> 
> The lsmblob structure is currently an array of
> u32 "secids". There is an entry for each of the
> security modules built into the system that would
> use secids if active. The system assigns the module
> a "slot" when it registers hooks. If modules are
> compiled in but not registered there will be unused
> slots.
> 
> A new lsm_id structure, which contains the name
> of the LSM and its slot number, is created. There
> is an instance for each LSM, which assigns the name
> and passes it to the infrastructure to set the slot.
> 
> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
> ---
>  include/linux/lsm_hooks.h  | 12 ++++++--
>  include/linux/security.h   | 58 ++++++++++++++++++++++++++++++++++++++
>  security/apparmor/lsm.c    |  7 ++++-
>  security/commoncap.c       |  7 ++++-
>  security/loadpin/loadpin.c |  8 +++++-
>  security/safesetid/lsm.c   |  8 +++++-
>  security/security.c        | 28 ++++++++++++++----
>  security/selinux/hooks.c   |  8 +++++-
>  security/smack/smack_lsm.c |  7 ++++-
>  security/tomoyo/tomoyo.c   |  8 +++++-
>  security/yama/yama_lsm.c   |  7 ++++-
>  11 files changed, 142 insertions(+), 16 deletions(-)
>

For TOMOYO part,
 
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

^ permalink raw reply

* Re: general protection fault in __schedule (2)
From: Sean Christopherson @ 2019-11-25 17:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Casey Schaufler, Frederic Weisbecker, Greg Kroah-Hartman,
	H. Peter Anvin, Jim Mattson, James Morris, Raslan, KarimAllah,
	Kate Stewart, KVM list, LKML, linux-security-module, Ingo Molnar,
	Ingo Molnar, Pavel Tatashin, Paolo Bonzini, Philippe Ombredanne,
	Radim Krčmář, Serge E. Hallyn, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers
In-Reply-To: <CACT4Y+b9FD8GTHc0baY-kUkuNFo-gdXCJ-uk5JtJSyjsyt8jTg@mail.gmail.com>

On Sat, Nov 23, 2019 at 06:15:15AM +0100, Dmitry Vyukov wrote:
> On Fri, Nov 22, 2019 at 9:54 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 11:19:00PM -0800, syzbot wrote:
> > > syzbot has bisected this bug to:
> > >
> > > commit 8fcc4b5923af5de58b80b53a069453b135693304
> > > Author: Jim Mattson <jmattson@google.com>
> > > Date:   Tue Jul 10 09:27:20 2018 +0000
> > >
> > >     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
> > >
> > > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
> > > start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
> > > git tree:       upstream
> > > final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000
> > >
> > > Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
> > > Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> > >
> > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
> > Is there a way to have syzbot stop processing/bisecting these things
> > after a reasonable amount of time?  The original crash is from August of
> > last year...
> >
> > Note, the original crash is actually due to KVM's put_kvm() fd race, but
> > whatever we want to blame, it's a duplicate.
> >
> > #syz dup: general protection fault in kvm_lapic_hv_timer_in_use
> 
> Hi Sean,
> 
> syzbot only sends bisection results to open bugs with no known fixes.
> So what you did (marking the bug as invalid/dup, or attaching a fix)
> would stop it from doing/sending bisection.
> 
> "Original crash happened a long time ago" is not necessary a good
> signal. On the syzbot dashboard
> (https://syzkaller.appspot.com/upstream), you can see bugs with the
> original crash 2+ years ago, but they are still pretty much relevant.
> The default kernel development process strategy for invalidating bug
> reports by burying them in oblivion has advantages, but also
> downsides. FWIW syzbot prefers explicit status tracking.

I have no objection to explicit status tracking or getting pinged on old
open bugs.  I suppose I don't even mind the belated bisection, I'd probably
whine if syzbot didn't do the bisection :-).

What's annoying is the report doesn't provide any information about when it
originally occured or on what kernel it originally failed.  It didn't occur
to me that the original bug might be a year old and I only realized it was
from an old kernel when I saw "4.19.0-rc4+" in the dashboard's sample crash
log.  Knowing that the original crash was a year old would have saved me
5-10 minutes of getting myself oriented.

Could syzbot provide the date and reported kernel version (assuming the
kernel version won't be misleading) of the original failure in its reports?

^ permalink raw reply

* [GIT PULL] pipe: Notification queue preparation
From: David Howells @ 2019-11-25 22:39 UTC (permalink / raw)
  To: torvalds
  Cc: dhowells, Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra,
	raven, Christian Brauner, keyrings, linux-usb, linux-block,
	linux-security-module, linux-fsdevel, linux-api, linux-kernel

Hi Linus,

Can you pull this please?  This is my set of preparatory patches for
building a general notification queue on top of pipes.  It makes a number
of significant changes:

 (1) It removes the nr_exclusive argument from __wake_up_sync_key() as this
     is always 1.  This prepares for step 2.

 (2) Adds wake_up_interruptible_sync_poll_locked() so that poll can be
     woken up from a function that's holding the poll waitqueue spinlock.

     [btw, I realise that I haven't un-sync'd the
      wake_up_interruptible_sync_poll() calls as you tentatively suggested.
      I can send a follow up patch to fix that if you still want it]

 (3) Change the pipe buffer ring to be managed in terms of unbounded head
     and tail indices rather than bounded index and length.  This means
     that reading the pipe only needs to modify one index, not two.

 (4) A selection of helper functions are provided to query the state of the
     pipe buffer, plus a couple to apply updates to the pipe indices.

 (5) The pipe ring is allowed to have kernel-reserved slots.  This allows
     many notification messages to be spliced in by the kernel without
     allowing userspace to pin too many pages if it writes to the same
     pipe.

 (6) Advance the head and tail indices inside the pipe waitqueue lock and
     use step 2 to poke poll without having to take the lock twice.

 (7) Rearrange pipe_write() to preallocate the buffer it is going to write
     into and then drop the spinlock.  This allows kernel notifications to
     then be added the ring whilst it is filling the buffer it allocated.
     The read side is stalled because the pipe mutex is still held.

 (8) Don't wake up readers on a pipe if there was already data in it when
     we added more.

 (9) Don't wake up writers on a pipe if the ring wasn't full before we
     removed a buffer.

PATCHES	BENCHMARK	BEST		TOTAL BYTES	AVG BYTES	STDDEV
=======	===============	===============	===============	===============	===============
-	pipe		      307457969	    36348556755	      302904639	       10622403
-	splice		      287117614	    26933658717	      224447155	      160777958
-	vmsplice	      435180375	    51302964090	      427524700	       19083037

rm-nrx	pipe		      311091179	    37093181356	      309109844	        7221622
rm-nrx	splice		      285628049	    27916298942	      232635824	      158296431
rm-nrx	vmsplice	      417703153	    47570362546	      396419687	       33960822

wakesl	pipe		      310698731	    36772541631	      306437846	        8249347
wakesl	splice		      286193726	    28600435451	      238336962	      141169318
wakesl	vmsplice	      436175803	    50723895824	      422699131	       40724240

ht	pipe		      305534565	    36426079543	      303550662	        5673885
ht	splice		      243632025	    23319439010	      194328658	      150479853
ht	vmsplice	      432825176	    49101781001	      409181508	       44102509

k-rsv	pipe		      308691523	    36652267561	      305435563	       12972559
k-rsv	splice		      244793528	    23625172865	      196876440	      125319143
k-rsv	vmsplice	      436119082	    49460808579	      412173404	       55547525

r-adv-t	pipe		      310094218	    36860182219	      307168185	        8081101
r-adv-t	splice		      285527382	    27085052687	      225708772	      206918887
r-adv-t	vmsplice	      336885948	    40128756927	      334406307	        5895935

r-cond	pipe		      308727804	    36635828180	      305298568	        9976806
r-cond	splice		      284467568	    28445793054	      237048275	      200284329
r-cond	vmsplice	      449679489	    51134833848	      426123615	       66790875

w-preal	pipe		      307416578	    36662086426	      305517386	        6216663
w-preal	splice		      282655051	    28455249109	      237127075	      194154549
w-preal	vmsplice	      437002601	    47832160621	      398601338	       96513019

w-redun	pipe		      307279630	    36329750422	      302747920	        8913567
w-redun	splice		      284324488	    27327152734	      227726272	      219735663
w-redun	vmsplice	      451141971	    51485257719	      429043814	       51388217

w-ckful	pipe		      305055247	    36374947350	      303124561	        5400728
w-ckful	splice		      281575308	    26841554544	      223679621	      215942886
w-ckful	vmsplice	      436653588	    47564907110	      396374225	       82255342

The patches column indicates the point in the patchset at which the benchmarks
were taken:

	0	No patches
	rm-nrx	"Remove the nr_exclusive argument from __wake_up_sync_key()"
	wakesl	"Add wake_up_interruptible_sync_poll_locked()"
	ht	"pipe: Use head and tail pointers for the ring, not cursor and length"
	k-rsv	"pipe: Allow pipes to have kernel-reserved slots"
	r-adv-t	"pipe: Advance tail pointer inside of wait spinlock in pipe_read()"
	r-cond	"pipe: Conditionalise wakeup in pipe_read()"
	w-preal	"pipe: Rearrange sequence in pipe_write() to preallocate slot"
	w-redun	"pipe: Remove redundant wakeup from pipe_write()"
	w-ckful	"pipe: Check for ring full inside of the spinlock in pipe_write()"

Changes:

 (*) Fix some bugs spotted by kbuild.

 ver #3:

 (*) Get rid of pipe_commit_{read,write}.

 (*) Port the virtio_console driver.

 (*) Fix pipe_zero().

 (*) Amend some comments.

 (*) Added an additional patch that changes the threshold at which readers
     wake writers for Konstantin Khlebnikov.

 ver #2:

 (*) Split the notification patches out into a separate branch.

 (*) Removed the nr_exclusive parameter from __wake_up_sync_key().

 (*) Renamed the locked wakeup function.

 (*) Add helpers for empty, full, occupancy.

 (*) Split the addition of ->max_usage out into its own patch.

 (*) Fixed some bits pointed out by Rasmus Villemoes.

 ver #1:

 (*) Build on top of standard pipes instead of having a driver.

David
---
The following changes since commit da0c9ea146cbe92b832f1b0f694840ea8eb33cce:

  Linux 5.4-rc2 (2019-10-06 14:27:30 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-pipe-prep-20191115

for you to fetch changes up to 3c0edea9b29f9be6c093f236f762202b30ac9431:

  pipe: Remove sync on wake_ups (2019-11-15 16:22:54 +0000)

----------------------------------------------------------------
Pipework for general notification queue

----------------------------------------------------------------
David Howells (12):
      pipe: Reduce #inclusion of pipe_fs_i.h
      Remove the nr_exclusive argument from __wake_up_sync_key()
      Add wake_up_interruptible_sync_poll_locked()
      pipe: Use head and tail pointers for the ring, not cursor and length
      pipe: Allow pipes to have kernel-reserved slots
      pipe: Advance tail pointer inside of wait spinlock in pipe_read()
      pipe: Conditionalise wakeup in pipe_read()
      pipe: Rearrange sequence in pipe_write() to preallocate slot
      pipe: Remove redundant wakeup from pipe_write()
      pipe: Check for ring full inside of the spinlock in pipe_write()
      pipe: Increase the writer-wakeup threshold to reduce context-switch count
      pipe: Remove sync on wake_ups

 drivers/char/virtio_console.c |  16 ++-
 fs/exec.c                     |   1 -
 fs/fuse/dev.c                 |  31 +++--
 fs/ocfs2/aops.c               |   1 -
 fs/pipe.c                     | 232 +++++++++++++++++++++---------------
 fs/splice.c                   | 190 +++++++++++++++++------------
 include/linux/pipe_fs_i.h     |  64 +++++++++-
 include/linux/uio.h           |   4 +-
 include/linux/wait.h          |  11 +-
 kernel/exit.c                 |   2 +-
 kernel/sched/wait.c           |  37 ++++--
 lib/iov_iter.c                | 269 ++++++++++++++++++++++++------------------
 security/smack/smack_lsm.c    |   1 -
 13 files changed, 529 insertions(+), 330 deletions(-)


^ permalink raw reply

* [PATCH 1/2] tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
From: Stefan Berger @ 2019-11-26 13:17 UTC (permalink / raw)
  To: linux-integrity, jarkko.sakkinen
  Cc: linux-kernel, stable, linux-security-module, Stefan Berger,
	Jerry Snitselaar
In-Reply-To: <20191126131753.3424363-1-stefanb@linux.vnet.ibm.com>

From: Stefan Berger <stefanb@linux.ibm.com>

Revert the patch that was setting the TPM_CHIP_FLAG_IRQ before probing for
interrupts.

Fixes: 1ea32c83c699 ("tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts")
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/char/tpm/tpm_tis_core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 8af2cee1a762..5dc52c4e2292 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -1060,7 +1060,6 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 		}
 
 		tpm_chip_start(chip);
-		chip->flags |= TPM_CHIP_FLAG_IRQ;
 		if (irq) {
 			tpm_tis_probe_irq_single(chip, intmask, IRQF_SHARED,
 						 irq);
-- 
2.14.5


^ permalink raw reply related

* [PATCH 0/2] Revert patches fixing probing of interrupts
From: Stefan Berger @ 2019-11-26 13:17 UTC (permalink / raw)
  To: linux-integrity, jarkko.sakkinen
  Cc: linux-kernel, stable, linux-security-module, Stefan Berger

From: Stefan Berger <stefanb@linux.ibm.com>

Revert the patches that were fixing the probing of interrupts due
to reports of interrupt stroms on some systems

The following Linux kernel versions are affected:

- 5.4
- 5.3.4 and later
- 5.2.19 and later

Stefan Berger (2):
  tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for
    interrupts"
  tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"

 drivers/char/tpm/tpm_tis_core.c | 3 ---
 1 file changed, 3 deletions(-)

-- 
2.14.5


^ permalink raw reply

* [PATCH 2/2] tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"
From: Stefan Berger @ 2019-11-26 13:17 UTC (permalink / raw)
  To: linux-integrity, jarkko.sakkinen
  Cc: linux-kernel, stable, linux-security-module, Stefan Berger,
	Jerry Snitselaar
In-Reply-To: <20191126131753.3424363-1-stefanb@linux.vnet.ibm.com>

From: Stefan Berger <stefanb@linux.ibm.com>

Revert the patch that was turning the TPM on before probing for IRQs.

Fixes: 5b359c7c4372 ("tpm_tis_core: Turn on the TPM before probing IRQ's")
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/char/tpm/tpm_tis_core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 5dc52c4e2292..27c6ca031e23 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -1059,7 +1059,6 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 			goto out_err;
 		}
 
-		tpm_chip_start(chip);
 		if (irq) {
 			tpm_tis_probe_irq_single(chip, intmask, IRQF_SHARED,
 						 irq);
@@ -1069,7 +1068,6 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 		} else {
 			tpm_tis_probe_irq(chip, intmask);
 		}
-		tpm_chip_stop(chip);
 	}
 
 	rc = tpm_chip_register(chip);
-- 
2.14.5


^ permalink raw reply related

* [GIT PULL] SELinux patches for v5.5
From: Paul Moore @ 2019-11-26 21:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: selinux, linux-security-module, linux-kernel

Hi Linus,

Only three SELinux patches for v5.5, all passing the test suite and
listed below, please merge them for v5.5.

- Remove the size limit on SELinux policies, the limitation was a
lingering vestige and no longer necessary.

- Allow file labeling before the policy is loaded.  This should ease
some of the burden when the policy is initially loaded (no need to
relabel files), but it should also help enable some new system
concepts which dynamically create the root filesystem in the initrd.

- Add support for the "greatest lower bound" policy construct which is
defined as the intersection of the MLS range of two SELinux labels.

Thanks,
-Paul
--
The following changes since commit 54ecb8f7028c5eb3d740bb82b0f1d90f2df63c5c:

 Linux 5.4-rc1 (2019-09-30 10:35:40 -0700)

are available in the Git repository at:

 git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git
   tags/selinux-pr-20191126

for you to fetch changes up to 42345b68c2e3e2b6549fc34b937ff44240dfc3b6:

 selinux: default_range glblub implementation (2019-10-07 19:01:35 -0400)

----------------------------------------------------------------
selinux/stable-5.5 PR 20191126

----------------------------------------------------------------
Jonathan Lebon (1):
     selinux: allow labeling before policy is loaded

Joshua Brindle (1):
     selinux: default_range glblub implementation

zhanglin (1):
     selinux: remove load size limit

security/selinux/hooks.c            | 12 ++++++++++++
security/selinux/include/security.h |  3 ++-
security/selinux/selinuxfs.c        |  4 ----
security/selinux/ss/context.h       | 32 ++++++++++++++++++++++++++++++++
security/selinux/ss/ebitmap.c       | 18 ++++++++++++++++++
security/selinux/ss/ebitmap.h       |  1 +
security/selinux/ss/mls.c           |  3 +++
security/selinux/ss/policydb.c      |  5 +++++
security/selinux/ss/policydb.h      |  1 +
9 files changed, 74 insertions(+), 5 deletions(-)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [GIT PULL] pipe: General notification queue
From: David Howells @ 2019-11-26 23:18 UTC (permalink / raw)
  To: torvalds
  Cc: dhowells, Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra,
	raven, Christian Brauner, keyrings, linux-usb, linux-block,
	linux-security-module, linux-fsdevel, linux-api, linux-kernel

Hi Linus,

Can you consider pulling my general notification queue patchset after
you've pulled the preparatory pipework patchset?  Or should it be deferred
to the next window?

The general notification queue is built on top a pipe, splicing kernel
events into the pipe from a variety of event sources.  A 'notification'
mode pipe must be linked to the desired sources before it will start
getting any events delivered through it.  Event sources include:

 (1) Keys/keyrings, such as linking and unlinking keys and changing their
     attributes.

 (2) General device events (single common queue) including:

     - Block layer events, such as device errors

     - USB subsystem events, such as device attach/remove, device reset,
       device errors.

 (3) I have patches for adding superblock and mount topology watches also,
     but in a separate branch as there are other dependencies.

LSM hooks are included:

 (1) A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set.  Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable.  The LSM
     should use current's credentials.  [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue.  This is given
     the credentials from the event generator (which may be the system) and
     the watch setter.  [Wanted by Smack]

I've provided SELinux and Smack with implementations of some of these hooks.


USE CASES
=========

 (1) Key/keyring notifications.

     If you have your kerberos tickets in a file/directory, your gnome desktop
     will monitor that using something like fanotify and tell you if your
     credentials cache changes.

     We also have the ability to cache your kerberos tickets in the session,
     user or persistent keyring so that it isn't left around on disk across a
     reboot or logout.  Keyrings, however, cannot currently be monitored
     asynchronously, so the desktop has to poll for it - not so good on a
     laptop.

     This source will allow the desktop to avoid the need to poll.

 (2) USB notifications.

     GregKH was looking for a way to do USB notifications as I was looking to
     find additional sources to implement.  I'm not sure how he wants to use
     them, but I'll let him speak to that himself.

 (3) Block notifications.

     This one I was thinking that I could make something like ddrescue better
     by letting it get notifications this way.  This was a target of
     convenience since I had a dodgy disk I was trying to rescue.

     It could also potentially be used help systemd, say, detect broken
     devices and avoid trying to unmount them when trying to reboot the machine.

     I can drop this for now if you prefer.

 (4) Mount notifications.

     This one is wanted to avoid repeated trawling of /proc/mounts or similar
     to work out changes to the mount object attributes and mount topology.
     I'm told that the proc file holding the namespace_sem is a point of
     contention, especially as the process of generating the text descriptions
     of the mounts/superblocks can be quite involved.

     The notifications directly indicate the mounts involved in any particular
     event and what the change was.  You can poll /proc/mounts, but all you
     know is that something changed; you don't know what and you don't know
     how and reading that file may race with multiple changed being effected.

     I pair this with a new fsinfo() system call that allows, amongst other
     things, the ability to retrieve in one go an { id, change counter } tuple
     from all the children of a specified mount, allowing buffer overruns to
     be cleaned up quickly.

     It's not just Red Hat that's potentially interested in this:

	https://lore.kernel.org/linux-fsdevel/293c9bd3-f530-d75e-c353-ddeabac27cf6@6wind.com/

 (5) Superblock notifications.

     This one is provided to allow systemd or the desktop to more easily
     detect events such as I/O errors and EDQUOT/ENOSPC.


DESIGN DECISIONS
================

 (1) The notification queue is built on top of a standard pipe.  Messages
     are effectively spliced in.  The pipe is opened with a special flag:

	pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem like
     it will ever be applicable in this context)[?].  It is given up front
     to make it a lot easier to prohibit splice and co. from accessing the
     pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
     O_* flag if I can avoid it - should I add a pipe3() system call
     instead?

     The pipe is then configured::

	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

     [?] Should this use fcntl() rather than ioctl()?

 (2) It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the kernel
     has to be able to insert messages into the pipe *without* holding
     pipe->mutex and the code to make this work needs careful auditing.

 (3) sendfile(), splice() and vmsplice() are disabled on notification pipes
     because of the pipe->mutex issue and also because they sometimes want
     to revert what they just did - but one or more notification messages
     might've been interleaved in the ring.

 (4) The kernel inserts messages with the wait queue spinlock held.  This
     means that pipe_read() and pipe_write() have to take the spinlock to
     update the queue pointers.

 (5) Records in the buffer are binary, typed and have a length so that they
     can be of varying size.

     This allows multiple heterogeneous sources to share a common buffer;
     there are 16 million types available, of which I've used just a few,
     so there is scope for others to be used.  Tags may be specified when a
     watchpoint is created to help distinguish the sources.

 (6) Records are filterable as types have up to 256 subtypes that can be
     individually filtered.  Other filtration is also available.

 (7) Notification pipes don't interfere with each other; each may be bound
     to a different set of event sources.  Any particular notification will
     be copied to all the queues that are currently watching for it - and
     only those that are watching for it.

 (8) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's insufficient
     space.  read() will fabricate a loss notification message at an
     appropriate point later.

 (9) The notification pipe is created and then watchpoints are attached to
     it.  Examples of this include:

	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
	watch_devices(fds[1], 0x02, 0);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(10) Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed.  In the latter case, a message will
     be generated indicating the enforced watch removal.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky.  Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


TESTING AND MANPAGES
====================

 (*) The keyutils tree has a pipe-watch branch that has keyctl commands for
     making use of notifications.  Proposed manual pages can also be found
     on this branch, though a couple of them really need to go to the main
     manpages repository instead.

     If the kernel supports the watching of keys, then running "make test"
     on that branch will cause the testing infrastructure to spawn a
     monitoring process on the side that monitors a notifications pipe for
     all the key/keyring changes induced by the tests and they'll all be
     checked off to make sure they happened.

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

Changes:

 (*) Fix some bits found by kbuild.

 ver #2:

 (*) Declare O_NOTIFICATION_PIPE to use and switch it to be the same value
     as O_EXCL rather then O_TMPFILE (the latter is a bit nasty in its
     implementation).

 ver #1:

 (*) Build on top of standard pipes instead of having a driver.

David
---
The following changes since commit 3c0edea9b29f9be6c093f236f762202b30ac9431:

  pipe: Remove sync on wake_ups (2019-11-15 16:22:54 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-pipe-core-20191115

for you to fetch changes up to 72fb99936e50f6c17490aeb445c87406a1265d52:

  smack: Implement the watch_key and post_notification hooks (2019-11-15 16:23:58 +0000)

----------------------------------------------------------------
General notification queue built on pipes

----------------------------------------------------------------
David Howells (14):
      uapi: General notification queue definitions
      security: Add hooks to rule on setting a watch
      security: Add a hook for the point of notification insertion
      pipe: Add O_NOTIFICATION_PIPE
      pipe: Add general notification queue support
      keys: Add a notification facility
      Add sample notification program
      pipe: Allow buffers to be marked read-whole-or-error for notifications
      pipe: Add notification lossage handling
      Add a general, global device notification watch list
      block: Add block layer notifications
      usb: Add USB subsystem notifications
      selinux: Implement the watch_key security hook
      smack: Implement the watch_key and post_notification hooks

 Documentation/ioctl/ioctl-number.rst        |   1 +
 Documentation/security/keys/core.rst        |  58 +++
 Documentation/watch_queue.rst               | 385 ++++++++++++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |   1 +
 arch/arm/tools/syscall.tbl                  |   1 +
 arch/arm64/include/asm/unistd.h             |   2 +-
 arch/arm64/include/asm/unistd32.h           |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1 +
 arch/s390/kernel/syscalls/syscall.tbl       |   1 +
 arch/sh/kernel/syscalls/syscall.tbl         |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1 +
 block/Kconfig                               |   9 +
 block/blk-core.c                            |  29 ++
 drivers/base/Kconfig                        |   9 +
 drivers/base/Makefile                       |   1 +
 drivers/base/watch.c                        |  90 ++++
 drivers/usb/core/Kconfig                    |   9 +
 drivers/usb/core/devio.c                    |  47 ++
 drivers/usb/core/hub.c                      |   4 +
 fs/pipe.c                                   | 242 +++++++---
 fs/splice.c                                 |  12 +-
 include/linux/blkdev.h                      |  15 +
 include/linux/device.h                      |   7 +
 include/linux/key.h                         |   3 +
 include/linux/lsm_audit.h                   |   1 +
 include/linux/lsm_hooks.h                   |  38 ++
 include/linux/pipe_fs_i.h                   |  27 +-
 include/linux/security.h                    |  32 +-
 include/linux/syscalls.h                    |   1 +
 include/linux/usb.h                         |  18 +
 include/linux/watch_queue.h                 | 127 ++++++
 include/uapi/asm-generic/unistd.h           |   4 +-
 include/uapi/linux/keyctl.h                 |   2 +
 include/uapi/linux/watch_queue.h            | 158 +++++++
 init/Kconfig                                |  12 +
 kernel/Makefile                             |   1 +
 kernel/sys_ni.c                             |   1 +
 kernel/watch_queue.c                        | 658 ++++++++++++++++++++++++++++
 samples/Kconfig                             |   7 +
 samples/Makefile                            |   1 +
 samples/watch_queue/Makefile                |   7 +
 samples/watch_queue/watch_test.c            | 251 +++++++++++
 security/keys/Kconfig                       |   9 +
 security/keys/compat.c                      |   3 +
 security/keys/gc.c                          |   5 +
 security/keys/internal.h                    |  30 +-
 security/keys/key.c                         |  38 +-
 security/keys/keyctl.c                      |  99 ++++-
 security/keys/keyring.c                     |  20 +-
 security/keys/request_key.c                 |   4 +-
 security/security.c                         |  23 +
 security/selinux/hooks.c                    |  14 +
 security/smack/smack_lsm.c                  |  82 +++-
 63 files changed, 2506 insertions(+), 108 deletions(-)
 create mode 100644 Documentation/watch_queue.rst
 create mode 100644 drivers/base/watch.c
 create mode 100644 include/linux/watch_queue.h
 create mode 100644 include/uapi/linux/watch_queue.h
 create mode 100644 kernel/watch_queue.c
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c


^ permalink raw reply

* [RFC PATCH v2] security,lockdown,selinux: implement SELinux lockdown
From: Stephen Smalley @ 2019-11-27 17:04 UTC (permalink / raw)
  To: selinux
  Cc: linux-security-module, linux-audit, paul, jmorris, matthewgarrett,
	Stephen Smalley

Implement a SELinux hook for lockdown.  If the lockdown module is also
enabled, then a denial by the lockdown module will take precedence over
SELinux, so SELinux can only further restrict lockdown decisions.
The SELinux hook only distinguishes at the granularity of integrity
versus confidentiality similar to the lockdown module, but includes the
full lockdown reason as part of the audit record as a hint in diagnosing
what triggered the denial.  To support this auditing, move the
lockdown_reasons[] string array from being private to the lockdown
module to the security framework so that it can be used by the lsm audit
code and so that it is always available even when the lockdown module
is disabled.

Note that the SELinux implementation allows the integrity and
confidentiality reasons to be controlled independently from one another.
Thus, in an SELinux policy, one could allow operations that specify
an integrity reason while blocking operations that specify a
confidentiality reason. The SELinux hook implementation is
stricter than the lockdown module in validating the provided reason value.

Sample AVC audit output from denials:
avc:  denied  { integrity } for pid=3402 comm="fwupd"
 lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0
 tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0

avc:  denied  { confidentiality } for pid=4628 comm="cp"
 lockdown_reason="/proc/kcore access"
 scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
 tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
 tclass=lockdown permissive=0

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
---
 include/linux/lsm_audit.h           |  2 ++
 include/linux/security.h            |  2 ++
 security/lockdown/lockdown.c        | 24 -----------------------
 security/lsm_audit.c                |  5 +++++
 security/security.c                 | 30 +++++++++++++++++++++++++++++
 security/selinux/hooks.c            | 30 +++++++++++++++++++++++++++++
 security/selinux/include/classmap.h |  2 ++
 7 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 915330abf6e5..99d629fd9944 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -74,6 +74,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_FILE	12
 #define LSM_AUDIT_DATA_IBPKEY	13
 #define LSM_AUDIT_DATA_IBENDPORT 14
+#define LSM_AUDIT_DATA_LOCKDOWN 15
 	union 	{
 		struct path path;
 		struct dentry *dentry;
@@ -93,6 +94,7 @@ struct common_audit_data {
 		struct file *file;
 		struct lsm_ibpkey_audit *ibpkey;
 		struct lsm_ibendport_audit *ibendport;
+		int reason;
 	} u;
 	/* this union contains LSM specific data */
 	union {
diff --git a/include/linux/security.h b/include/linux/security.h
index a8d59d612d27..df7a4d293fe8 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -125,6 +125,8 @@ enum lockdown_reason {
 	LOCKDOWN_CONFIDENTIALITY_MAX,
 };
 
+extern const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1];
+
 /* These functions are in security/commoncap.c */
 extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
 		       int cap, unsigned int opts);
diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
index 8a10b43daf74..5a952617a0eb 100644
--- a/security/lockdown/lockdown.c
+++ b/security/lockdown/lockdown.c
@@ -16,30 +16,6 @@
 
 static enum lockdown_reason kernel_locked_down;
 
-static const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
-	[LOCKDOWN_NONE] = "none",
-	[LOCKDOWN_MODULE_SIGNATURE] = "unsigned module loading",
-	[LOCKDOWN_DEV_MEM] = "/dev/mem,kmem,port",
-	[LOCKDOWN_KEXEC] = "kexec of unsigned images",
-	[LOCKDOWN_HIBERNATION] = "hibernation",
-	[LOCKDOWN_PCI_ACCESS] = "direct PCI access",
-	[LOCKDOWN_IOPORT] = "raw io port access",
-	[LOCKDOWN_MSR] = "raw MSR access",
-	[LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
-	[LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
-	[LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
-	[LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
-	[LOCKDOWN_MMIOTRACE] = "unsafe mmio",
-	[LOCKDOWN_DEBUGFS] = "debugfs access",
-	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
-	[LOCKDOWN_KCORE] = "/proc/kcore access",
-	[LOCKDOWN_KPROBES] = "use of kprobes",
-	[LOCKDOWN_BPF_READ] = "use of bpf to read kernel RAM",
-	[LOCKDOWN_PERF] = "unsafe use of perf",
-	[LOCKDOWN_TRACEFS] = "use of tracefs",
-	[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
-};
-
 static const enum lockdown_reason lockdown_levels[] = {LOCKDOWN_NONE,
 						 LOCKDOWN_INTEGRITY_MAX,
 						 LOCKDOWN_CONFIDENTIALITY_MAX};
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index e40874373f2b..2d2bf49016f4 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -27,6 +27,7 @@
 #include <linux/dccp.h>
 #include <linux/sctp.h>
 #include <linux/lsm_audit.h>
+#include <linux/security.h>
 
 /**
  * ipv4_skb_to_auditdata : fill auditdata from skb
@@ -425,6 +426,10 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 				 a->u.ibendport->dev_name,
 				 a->u.ibendport->port);
 		break;
+	case LSM_AUDIT_DATA_LOCKDOWN:
+		audit_log_format(ab, " lockdown_reason=");
+		audit_log_string(ab, lockdown_reasons[a->u.reason]);
+		break;
 	} /* switch (a->type) */
 }
 
diff --git a/security/security.c b/security/security.c
index 1bc000f834e2..f439c1102b1a 100644
--- a/security/security.c
+++ b/security/security.c
@@ -35,6 +35,36 @@
 #define LSM_COUNT (__end_lsm_info - __start_lsm_info)
 #define EARLY_LSM_COUNT (__end_early_lsm_info - __start_early_lsm_info)
 
+/*
+ * These are descriptions of the reasons that can be passed to the
+ * security_locked_down() LSM hook. Placing this array here allows
+ * all security modules to use the same descriptions for auditing
+ * purposes.
+ */
+const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
+	[LOCKDOWN_NONE] = "none",
+	[LOCKDOWN_MODULE_SIGNATURE] = "unsigned module loading",
+	[LOCKDOWN_DEV_MEM] = "/dev/mem,kmem,port",
+	[LOCKDOWN_KEXEC] = "kexec of unsigned images",
+	[LOCKDOWN_HIBERNATION] = "hibernation",
+	[LOCKDOWN_PCI_ACCESS] = "direct PCI access",
+	[LOCKDOWN_IOPORT] = "raw io port access",
+	[LOCKDOWN_MSR] = "raw MSR access",
+	[LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
+	[LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
+	[LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
+	[LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
+	[LOCKDOWN_MMIOTRACE] = "unsafe mmio",
+	[LOCKDOWN_DEBUGFS] = "debugfs access",
+	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
+	[LOCKDOWN_KCORE] = "/proc/kcore access",
+	[LOCKDOWN_KPROBES] = "use of kprobes",
+	[LOCKDOWN_BPF_READ] = "use of bpf to read kernel RAM",
+	[LOCKDOWN_PERF] = "unsafe use of perf",
+	[LOCKDOWN_TRACEFS] = "use of tracefs",
+	[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
+};
+
 struct security_hook_heads security_hook_heads __lsm_ro_after_init;
 static BLOCKING_NOTIFIER_HEAD(blocking_lsm_notifier_chain);
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 36e531b91df2..ca8a9d1b3ffd 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6799,6 +6799,34 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
 }
 #endif
 
+static int selinux_lockdown(enum lockdown_reason what)
+{
+	struct common_audit_data ad;
+	u32 sid = current_sid();
+	int invalid_reason = (what <= LOCKDOWN_NONE) ||
+			     (what == LOCKDOWN_INTEGRITY_MAX) ||
+			     (what >= LOCKDOWN_CONFIDENTIALITY_MAX);
+
+	if (WARN(invalid_reason, "Invalid lockdown reason")) {
+		audit_log(audit_context(),
+			  GFP_ATOMIC, AUDIT_SELINUX_ERR,
+			  "lockdown_reason=invalid");
+		return -EINVAL;
+	}
+
+	ad.type = LSM_AUDIT_DATA_LOCKDOWN;
+	ad.u.reason = what;
+
+	if (what <= LOCKDOWN_INTEGRITY_MAX)
+		return avc_has_perm(&selinux_state,
+				    sid, sid, SECCLASS_LOCKDOWN,
+				    LOCKDOWN__INTEGRITY, &ad);
+	else
+		return avc_has_perm(&selinux_state,
+				    sid, sid, SECCLASS_LOCKDOWN,
+				    LOCKDOWN__CONFIDENTIALITY, &ad);
+}
+
 struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = {
 	.lbs_cred = sizeof(struct task_security_struct),
 	.lbs_file = sizeof(struct file_security_struct),
@@ -7042,6 +7070,8 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
 	LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
 #endif
+
+	LSM_HOOK_INIT(locked_down, selinux_lockdown),
 };
 
 static __init int selinux_init(void)
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 32e9b03be3dd..594c32febcd8 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -244,6 +244,8 @@ struct security_class_mapping secclass_map[] = {
 	  {"map_create", "map_read", "map_write", "prog_load", "prog_run"} },
 	{ "xdp_socket",
 	  { COMMON_SOCK_PERMS, NULL } },
+	{ "lockdown",
+	  { "integrity", "confidentiality", NULL } },
 	{ NULL }
   };
 
-- 
2.23.0


^ permalink raw reply related

* Re: [RFC PATCH v2] security,lockdown,selinux: implement SELinux lockdown
From: James Morris @ 2019-11-27 17:22 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: selinux, linux-security-module, Audit-ML, Paul Moore,
	matthewgarrett
In-Reply-To: <20191127170436.4237-1-sds@tycho.nsa.gov>

On Wed, 27 Nov 2019, Stephen Smalley wrote:

> avc:  denied  { confidentiality } for pid=4628 comm="cp"
>  lockdown_reason="/proc/kcore access"
>  scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
>  tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
>  tclass=lockdown permissive=0
> 
> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
> ---
>  include/linux/lsm_audit.h           |  2 ++
>  include/linux/security.h            |  2 ++
>  security/lockdown/lockdown.c        | 24 -----------------------
>  security/lsm_audit.c                |  5 +++++
>  security/security.c                 | 30 +++++++++++++++++++++++++++++
>  security/selinux/hooks.c            | 30 +++++++++++++++++++++++++++++
>  security/selinux/include/classmap.h |  2 ++
>  7 files changed, 71 insertions(+), 24 deletions(-)

LGTM.

Reviewed-by: James Morris <jamorris@linux.microsoft.com>


-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: [PATCH] tpm_tis: Move setting of TPM_CHIP_FLAG_IRQ into tpm_tis_probe_irq_single
From: Jarkko Sakkinen @ 2019-11-27 21:11 UTC (permalink / raw)
  To: Stefan Berger, Stefan Berger, linux-integrity, linux-kernel,
	linux-security-module, dan.j.williams
In-Reply-To: <20191121184949.yvw2gwzlkhjzko64@cantor>

On Thu, Nov 21, 2019 at 11:49:49AM -0700, Jerry Snitselaar wrote:
> On Sat Nov 16 19, Stefan Berger wrote:
> > On 11/14/19 11:44 AM, Jarkko Sakkinen wrote:
> > > On Thu, Nov 14, 2019 at 06:41:51PM +0200, Jarkko Sakkinen wrote:
> > > > On Tue, Nov 12, 2019 at 03:27:25PM -0500, Stefan Berger wrote:
> > > > > From: Stefan Berger <stefanb@linux.ibm.com>
> > > > > 
> > > > > Move the setting of the TPM_CHIP_FLAG_IRQ for irq probing into
> > > > > tpm_tis_probe_irq_single before calling tpm_tis_gen_interrupt.
> > > > > This move handles error conditions better that may arise if anything
> > > > > before fails in tpm_tis_probe_irq_single.
> > > > > 
> > > > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > > > > Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com>
> > > > What about just changing the condition?
> > > Also cannot take this since it is not a bug (no fixes tag).
> > 
> > I'll repost but will wait until Jerry has tested it on that machine.
> > 
> >    Stefan
> > 
> > 
> > > 
> > > /Jarkko
> > 
> > 
> 
> It appears they still have the problem. I'm still waiting on logistics
> to send me a system to debug.

Which hardware is guaranteed to ignite this? I can try to get test hw
for this from somewhere. Kind of looking into this blinded ATM. Dan?

/Jarkko

^ permalink raw reply

* Re: [PATCH] tpm_tis: Move setting of TPM_CHIP_FLAG_IRQ into tpm_tis_probe_irq_single
From: Dan Williams @ 2019-11-27 21:26 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Stefan Berger, Stefan Berger, linux-integrity,
	Linux Kernel Mailing List, linux-security-module, jsnitsel
In-Reply-To: <20191127211109.GF14290@linux.intel.com>

[ add Jerry ]

On Wed, Nov 27, 2019 at 1:11 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Thu, Nov 21, 2019 at 11:49:49AM -0700, Jerry Snitselaar wrote:
> > On Sat Nov 16 19, Stefan Berger wrote:
> > > On 11/14/19 11:44 AM, Jarkko Sakkinen wrote:
> > > > On Thu, Nov 14, 2019 at 06:41:51PM +0200, Jarkko Sakkinen wrote:
> > > > > On Tue, Nov 12, 2019 at 03:27:25PM -0500, Stefan Berger wrote:
> > > > > > From: Stefan Berger <stefanb@linux.ibm.com>
> > > > > >
> > > > > > Move the setting of the TPM_CHIP_FLAG_IRQ for irq probing into
> > > > > > tpm_tis_probe_irq_single before calling tpm_tis_gen_interrupt.
> > > > > > This move handles error conditions better that may arise if anything
> > > > > > before fails in tpm_tis_probe_irq_single.
> > > > > >
> > > > > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > > > > > Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com>
> > > > > What about just changing the condition?
> > > > Also cannot take this since it is not a bug (no fixes tag).
> > >
> > > I'll repost but will wait until Jerry has tested it on that machine.
> > >
> > >    Stefan
> > >
> > >
> > > >
> > > > /Jarkko
> > >
> > >
> >
> > It appears they still have the problem. I'm still waiting on logistics
> > to send me a system to debug.
>
> Which hardware is guaranteed to ignite this? I can try to get test hw
> for this from somewhere. Kind of looking into this blinded ATM. Dan?

Jerry had mentioned that this was also occurring on T490s. Otherwise
I'll ping you offline about the system I saw this on internally.

^ permalink raw reply

* Re: [PATCH] tpm_tis: Move setting of TPM_CHIP_FLAG_IRQ into tpm_tis_probe_irq_single
From: Jerry Snitselaar @ 2019-11-28  0:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jarkko Sakkinen, Stefan Berger, Stefan Berger, linux-integrity,
	Linux Kernel Mailing List, linux-security-module
In-Reply-To: <CAPcyv4gO2T4xcZjYSYJ8-0kDPRnVYWhX_df5E94Cjyksx6WFbg@mail.gmail.com>

On Wed Nov 27 19, Dan Williams wrote:
>[ add Jerry ]
>
>On Wed, Nov 27, 2019 at 1:11 PM Jarkko Sakkinen
><jarkko.sakkinen@linux.intel.com> wrote:
>>
>> On Thu, Nov 21, 2019 at 11:49:49AM -0700, Jerry Snitselaar wrote:
>> > On Sat Nov 16 19, Stefan Berger wrote:
>> > > On 11/14/19 11:44 AM, Jarkko Sakkinen wrote:
>> > > > On Thu, Nov 14, 2019 at 06:41:51PM +0200, Jarkko Sakkinen wrote:
>> > > > > On Tue, Nov 12, 2019 at 03:27:25PM -0500, Stefan Berger wrote:
>> > > > > > From: Stefan Berger <stefanb@linux.ibm.com>
>> > > > > >
>> > > > > > Move the setting of the TPM_CHIP_FLAG_IRQ for irq probing into
>> > > > > > tpm_tis_probe_irq_single before calling tpm_tis_gen_interrupt.
>> > > > > > This move handles error conditions better that may arise if anything
>> > > > > > before fails in tpm_tis_probe_irq_single.
>> > > > > >
>> > > > > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>> > > > > > Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com>
>> > > > > What about just changing the condition?
>> > > > Also cannot take this since it is not a bug (no fixes tag).
>> > >
>> > > I'll repost but will wait until Jerry has tested it on that machine.
>> > >
>> > >    Stefan
>> > >
>> > >
>> > > >
>> > > > /Jarkko
>> > >
>> > >
>> >
>> > It appears they still have the problem. I'm still waiting on logistics
>> > to send me a system to debug.
>>
>> Which hardware is guaranteed to ignite this? I can try to get test hw
>> for this from somewhere. Kind of looking into this blinded ATM. Dan?
>
>Jerry had mentioned that this was also occurring on T490s. Otherwise
>I'll ping you offline about the system I saw this on internally.
>

I've been trying for about 3 weeks now to get one of the laptops from
logistics here, but unfortunately they have been silent. Pinged one
of their email addresses today to see if they could respond to the
ticket, so hopefully next week I will have something.


^ permalink raw reply

* Re: general protection fault in __schedule (2)
From: Dmitry Vyukov @ 2019-11-28  9:53 UTC (permalink / raw)
  To: Sean Christopherson, syzkaller
  Cc: syzbot, Casey Schaufler, Frederic Weisbecker, Greg Kroah-Hartman,
	H. Peter Anvin, Jim Mattson, James Morris, Raslan, KarimAllah,
	Kate Stewart, KVM list, LKML, linux-security-module, Ingo Molnar,
	Ingo Molnar, Pavel Tatashin, Paolo Bonzini, Philippe Ombredanne,
	Radim Krčmář, Serge E. Hallyn, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers
In-Reply-To: <20191125175417.GD12178@linux.intel.com>

On Mon, Nov 25, 2019 at 6:54 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Sat, Nov 23, 2019 at 06:15:15AM +0100, Dmitry Vyukov wrote:
> > On Fri, Nov 22, 2019 at 9:54 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Thu, Nov 21, 2019 at 11:19:00PM -0800, syzbot wrote:
> > > > syzbot has bisected this bug to:
> > > >
> > > > commit 8fcc4b5923af5de58b80b53a069453b135693304
> > > > Author: Jim Mattson <jmattson@google.com>
> > > > Date:   Tue Jul 10 09:27:20 2018 +0000
> > > >
> > > >     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
> > > >
> > > > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
> > > > start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
> > > > git tree:       upstream
> > > > final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000
> > > >
> > > > Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
> > > > Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> > > >
> > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> > >
> > > Is there a way to have syzbot stop processing/bisecting these things
> > > after a reasonable amount of time?  The original crash is from August of
> > > last year...
> > >
> > > Note, the original crash is actually due to KVM's put_kvm() fd race, but
> > > whatever we want to blame, it's a duplicate.
> > >
> > > #syz dup: general protection fault in kvm_lapic_hv_timer_in_use
> >
> > Hi Sean,
> >
> > syzbot only sends bisection results to open bugs with no known fixes.
> > So what you did (marking the bug as invalid/dup, or attaching a fix)
> > would stop it from doing/sending bisection.
> >
> > "Original crash happened a long time ago" is not necessary a good
> > signal. On the syzbot dashboard
> > (https://syzkaller.appspot.com/upstream), you can see bugs with the
> > original crash 2+ years ago, but they are still pretty much relevant.
> > The default kernel development process strategy for invalidating bug
> > reports by burying them in oblivion has advantages, but also
> > downsides. FWIW syzbot prefers explicit status tracking.
>
> I have no objection to explicit status tracking or getting pinged on old
> open bugs.  I suppose I don't even mind the belated bisection, I'd probably
> whine if syzbot didn't do the bisection :-).
>
> What's annoying is the report doesn't provide any information about when it
> originally occured or on what kernel it originally failed.  It didn't occur
> to me that the original bug might be a year old and I only realized it was
> from an old kernel when I saw "4.19.0-rc4+" in the dashboard's sample crash
> log.  Knowing that the original crash was a year old would have saved me
> 5-10 minutes of getting myself oriented.
>
> Could syzbot provide the date and reported kernel version (assuming the
> kernel version won't be misleading) of the original failure in its reports?

+syzkaller mailing list for syzbot discussion

We tried to provide some aggregate info in email reports long time ago
(like trees where it occurred, number of crashes). The problem was
that any such info captured in emails become stale very quickly. E.g.
later somebody looks at the report and thinking "oh, linux-next only"
or "it happened only once", but maybe it's not for a long time. E.g.
if we say "it last happened 3 months" ago, maybe it's just happened
again once we send it... While this "emails always provide latest
updates" works for kernel in other context b/c updates provided by
humans and there is no other source of truth; it does not play well
with automated systems, or syzbot will need to send several emails per
second, because it's really the rate at which things change.

If we add some info, which one should it be? The original crash, the
one used for bisection, or the latest one? All these are different...
syzbot does not know "4.19.0-rc4+" strings for commits, it generally
identifies commits by hashes. There are dates, but then again which
one? Author or commit? Author is what generally shown, but I remember
a number of patches where Author date is 1.5 years old for just merged
commits :)

There is another problem: if we stuff too many info into emails,
people still stop reading them. This is very serious and real concern.
If you have 1000-page manual, it's well documented, but it's
equivalent to no docs at all, nobody is reading 1000 pages to find 1
bit of info. Especially if you don't know that there is an important
bit that you need to find in the first place...

What would be undoubtedly positive is presenting information on the
dashboard better (If we find a way).
Currently the page says near the top:

First crash: 478d, last: 430d

The idea was that "last: 430d" is supposed to communicate the bit of
info that confused you. Is it what you were looking for? Is there a
better way to present it?

Unfortunately most of such problems are much harder if extended beyond
1 concrete case...

^ permalink raw reply

* Re: tracefs splats in lockdown=confidentiality mode
From: Jordan Glover @ 2019-11-28 15:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: dann frazier, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, Seth Forshee,
	Matthew Garrett, James Morris, Linux API, Ben Hutchings, Al Viro,
	Linus Torvalds
In-Reply-To: <20191101181501.4beff81b@grimm.local.home>

On Friday, November 1, 2019 10:15 PM, Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 1 Nov 2019 15:08:03 -0600
> dann frazier dann.frazier@canonical.com wrote:
>
> > hey,
> > fyi, I'm seeing a bunch of errors from tracefs when booting 5.4-rc5 in
> > lockdown=confidentiality mode:
> > [ 1.763630] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > [ 1.772332] Could not create tracefs 'available_events' entry
> > [ 1.778633] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > [ 1.787095] Could not create tracefs 'set_event' entry
> > [ 1.792412] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > (...)
> > [ 2.899481] Could not create tracefs 'set_graph_notrace' entry
> > [ 2.905671] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > [ 2.913934] ------------[ cut here ]------------
> > [ 2.918435] Could not register function stat for cpu 0
> > [ 2.923717] WARNING: CPU: 1 PID: 1 at kernel/trace/ftrace.c:987 ftrace_init_tracefs_toplevel+0x168/0x1bc
> > [ 2.933939] Modules linked in:
> > [ 2.937290] CPU: 1 PID: 1 Comm:
>
> Looks to me that it's working as designed ;-)
>
> I'm guessing we could quiet these warnings for boot up though. :-/
>
> But there should be at least one message that states that the tracefs
> files are not being created due to lockdown.
>
> -- Steve

Could you clarify what functionality is lost here and if it affects
system stability?

I agree that triggering WARNING on every boot with supported kernel
configuration isn't optimal experience for users.

Jordan

^ permalink raw reply

* Re: tracefs splats in lockdown=confidentiality mode
From: dann frazier @ 2019-11-28 17:35 UTC (permalink / raw)
  To: Jordan Glover
  Cc: Steven Rostedt, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, Seth Forshee,
	Matthew Garrett, James Morris, Linux API, Ben Hutchings, Al Viro,
	Linus Torvalds
In-Reply-To: <2vtDIdkutRsBBbaiswjFZlGeQPSlDHF3et5ZxQ4YJ4zArOKo7-53A6d8SwpUtt7NCYdQEmmkeTADvrS7NCzw0Stw33n44vJC_qspqXgRPZQ=@protonmail.ch>

On Thu, Nov 28, 2019 at 03:31:31PM +0000, Jordan Glover wrote:
> On Friday, November 1, 2019 10:15 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Fri, 1 Nov 2019 15:08:03 -0600
> > dann frazier dann.frazier@canonical.com wrote:
> >
> > > hey,
> > > fyi, I'm seeing a bunch of errors from tracefs when booting 5.4-rc5 in
> > > lockdown=confidentiality mode:
> > > [ 1.763630] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > > [ 1.772332] Could not create tracefs 'available_events' entry
> > > [ 1.778633] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > > [ 1.787095] Could not create tracefs 'set_event' entry
> > > [ 1.792412] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > > (...)
> > > [ 2.899481] Could not create tracefs 'set_graph_notrace' entry
> > > [ 2.905671] Lockdown: swapper/0: use of tracefs is restricted; see man kernel_lockdown.7
> > > [ 2.913934] ------------[ cut here ]------------
> > > [ 2.918435] Could not register function stat for cpu 0
> > > [ 2.923717] WARNING: CPU: 1 PID: 1 at kernel/trace/ftrace.c:987 ftrace_init_tracefs_toplevel+0x168/0x1bc
> > > [ 2.933939] Modules linked in:
> > > [ 2.937290] CPU: 1 PID: 1 Comm:
> >
> > Looks to me that it's working as designed ;-)
> >
> > I'm guessing we could quiet these warnings for boot up though. :-/
> >
> > But there should be at least one message that states that the tracefs
> > files are not being created due to lockdown.
> >
> > -- Steve
> 
> Could you clarify what functionality is lost here and if it affects
> system stability?

None that I'm aware of.

> I agree that triggering WARNING on every boot with supported kernel
> configuration isn't optimal experience for users.

Yes, that's my concern.

 -dann

^ permalink raw reply

* Re: [PATCH v23 12/24] x86/sgx: Linux Enclave Driver
From: Greg KH @ 2019-11-28 18:24 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-kernel, x86, linux-sgx, akpm, dave.hansen,
	sean.j.christopherson, nhorman, npmccallum, serge.ayoun,
	shay.katz-zamir, haitao.huang, andriy.shevchenko, tglx, kai.svahn,
	bp, josh, luto, kai.huang, rientjes, cedric.xing, puiterwijk,
	linux-security-module, Suresh Siddha
In-Reply-To: <20191028210324.12475-13-jarkko.sakkinen@linux.intel.com>

On Mon, Oct 28, 2019 at 11:03:12PM +0200, Jarkko Sakkinen wrote:
> +static struct device sgx_encl_dev;

Ugh, really?  After 23 versions of this patchset no one saw this?

> +static struct cdev sgx_encl_cdev;
> +static dev_t sgx_devt;
> +
> +static void sgx_dev_release(struct device *dev)
> +{
> +}

The old kernel documentation used to say I was allowed to make fun of
people who did this, but that was removed as it really wasn't that nice.

But I'm seriously reconsidering that at the moment.

No, this is NOT OK!

Think about what you are doing here, and why you feel that it is ok to
work around a kernel message that was added there explicitly to help you
do things the right way.  I didn't add it just because I felt like it, I
was trying to give you a chance to not get the use of this api
incorrect.

That failed :(

Ugh, not ok.  Seriously, not ok...

> +static __init int sgx_dev_init(const char *name, struct device *dev,
> +			       struct cdev *cdev,
> +			       const struct file_operations *fops, int minor)
> +{
> +	int ret;
> +
> +	device_initialize(dev);

Why do you even need a struct device in the first place?

> +
> +	dev->bus = &sgx_bus_type;
> +	dev->devt = MKDEV(MAJOR(sgx_devt), minor);
> +	dev->release = sgx_dev_release;
> +
> +	ret = dev_set_name(dev, name);
> +	if (ret) {
> +		put_device(dev);
> +		return ret;
> +	}
> +
> +	cdev_init(cdev, fops);

Why a whole cdev?

Why not use a misc device?  YOu only have 2 devices right?  Why not 2
misc devices then?  That saves the use of a whole major number and makes
your code a _LOT_ simpler.

> +	ret = bus_register(&sgx_bus_type);

I'm afraid to look at this bus code.

Instead I'm going to ask, why do you need a bus at all?  What drivers do
you have for this bus?

ugh I don't know why I looked at this code, but it's not ok as-is and
anyone who reviewed the driver model interaction needs to rethink
things...

greg k-h

^ permalink raw reply

* general protection fault in smack_socket_sendmsg (2)
From: syzbot @ 2019-11-29  0:05 UTC (permalink / raw)
  To: a, andrew, b.a.t.m.a.n, casey, davem, f.fainelli, jmorris,
	linux-kernel, linux-security-module, mareklindner, netdev, serge,
	sw, syzkaller-bugs, vivien.didelot

Hello,

syzbot found the following crash on:

HEAD commit:    0be0ee71 vfs: properly and reliably lock f_pos in fdget_po..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12c49ef2e00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=330a1f54d1edb817
dashboard link: https://syzkaller.appspot.com/bug?extid=131d2229316b7012ac06
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13bb67cee00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12460136e00000

The bug was bisected to:

commit 8ae5bcdc5d98a99e59f194101e7acd2e9d055758
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Fri May 19 21:00:54 2017 +0000

     net: dsa: add MDB notifier

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17ec2f5ae00000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=141c2f5ae00000
console output: https://syzkaller.appspot.com/x/log.txt?x=101c2f5ae00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+131d2229316b7012ac06@syzkaller.appspotmail.com
Fixes: 8ae5bcdc5d98 ("net: dsa: add MDB notifier")

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 7989 Comm: kworker/1:4 Not tainted 5.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
RIP: 0010:smack_socket_sendmsg+0x5b/0x480 security/smack/smack_lsm.c:3675
Code: e8 fa 03 6b fe 4c 89 e8 48 c1 e8 03 42 80 3c 38 00 74 08 4c 89 ef e8  
74 46 a4 fe 4d 8b 65 00 48 83 c3 18 48 89 d8 48 c1 e8 03 <42> 80 3c 38 00  
74 08 48 89 df e8 56 46 a4 fe 4c 8b 33 49 8d 9e 08
RSP: 0018:ffff88808a58f9c8 EFLAGS: 00010206
RAX: 0000000000000003 RBX: 0000000000000018 RCX: ffff8880a1270280
RDX: 0000000000000000 RSI: ffff88808a58fb18 RDI: 0000000000000000
RBP: ffff88808a58fa80 R08: ffffffff83442500 R09: ffff88808a58fb86
R10: ffffed10114b1f72 R11: 0000000000000000 R12: ffff8880a124c114
R13: ffff88808a58fb18 R14: dffffc0000000000 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe2d48c9e78 CR3: 0000000098a23000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  security_socket_sendmsg+0x6c/0xd0 security/security.c:2013
  sock_sendmsg net/socket.c:655 [inline]
  kernel_sendmsg+0x77/0x140 net/socket.c:678
  rxrpc_send_keepalive+0x254/0x3c0 net/rxrpc/output.c:655
  rxrpc_peer_keepalive_dispatch net/rxrpc/peer_event.c:376 [inline]
  rxrpc_peer_keepalive_worker+0x76e/0xb40 net/rxrpc/peer_event.c:437
  process_one_work+0x7ef/0x10e0 kernel/workqueue.c:2269
  worker_thread+0xc01/0x1630 kernel/workqueue.c:2415
  kthread+0x332/0x350 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace 8b748724da7e3b28 ]---
RIP: 0010:smack_socket_sendmsg+0x5b/0x480 security/smack/smack_lsm.c:3675
Code: e8 fa 03 6b fe 4c 89 e8 48 c1 e8 03 42 80 3c 38 00 74 08 4c 89 ef e8  
74 46 a4 fe 4d 8b 65 00 48 83 c3 18 48 89 d8 48 c1 e8 03 <42> 80 3c 38 00  
74 08 48 89 df e8 56 46 a4 fe 4c 8b 33 49 8d 9e 08
RSP: 0018:ffff88808a58f9c8 EFLAGS: 00010206
RAX: 0000000000000003 RBX: 0000000000000018 RCX: ffff8880a1270280
RDX: 0000000000000000 RSI: ffff88808a58fb18 RDI: 0000000000000000
RBP: ffff88808a58fa80 R08: ffffffff83442500 R09: ffff88808a58fb86
R10: ffffed10114b1f72 R11: 0000000000000000 R12: ffff8880a124c114
R13: ffff88808a58fb18 R14: dffffc0000000000 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe2d48c9e78 CR3: 0000000098a23000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: general protection fault in smack_socket_sendmsg (2)
From: Tetsuo Handa @ 2019-11-29  9:39 UTC (permalink / raw)
  To: syzbot, syzkaller-bugs
  Cc: b.a.t.m.a.n, jmorris, linux-kernel, linux-security-module, netdev,
	serge, Casey Schaufler
In-Reply-To: <000000000000723a32059870fbd4@google.com>

On 2019/11/29 9:05, syzbot wrote:
>  rxrpc_send_keepalive+0x254/0x3c0 net/rxrpc/output.c:655

Again net/rxrpc/output.c:655

#syz dup: KMSAN: use-after-free in rxrpc_send_keepalive

^ permalink raw reply

* Re: [PATCH -next] x86/efi: remove unused variables
From: Michael Ellerman @ 2019-11-29 12:05 UTC (permalink / raw)
  To: YueHaibing, jmorris, serge, nayna, zohar, dhowells, jwboyer,
	yuehaibing
  Cc: linux-security-module, linux-kernel
In-Reply-To: <20191115130830.13320-1-yuehaibing@huawei.com>

YueHaibing <yuehaibing@huawei.com> writes:
> commit ad723674d675 ("x86/efi: move common keyring
> handler functions to new file") leave this unused.
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  security/integrity/platform_certs/load_uefi.c | 5 -----
>  1 file changed, 5 deletions(-)

Thanks for spotting this, my x86 test builds did trigger it, but I
didn't notice the new warnings.

I've picked this up into the powerpc tree, as that's where the offending
commit is.

cheers

> diff --git a/security/integrity/platform_certs/load_uefi.c b/security/integrity/platform_certs/load_uefi.c
> index 4369204..111898a 100644
> --- a/security/integrity/platform_certs/load_uefi.c
> +++ b/security/integrity/platform_certs/load_uefi.c
> @@ -11,11 +11,6 @@
>  #include "../integrity.h"
>  #include "keyring_handler.h"
>  
> -static efi_guid_t efi_cert_x509_guid __initdata = EFI_CERT_X509_GUID;
> -static efi_guid_t efi_cert_x509_sha256_guid __initdata =
> -	EFI_CERT_X509_SHA256_GUID;
> -static efi_guid_t efi_cert_sha256_guid __initdata = EFI_CERT_SHA256_GUID;
> -
>  /*
>   * Look to see if a UEFI variable called MokIgnoreDB exists and return true if
>   * it does.
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH 0/2] Revert patches fixing probing of interrupts
From: Jarkko Sakkinen @ 2019-11-29 22:37 UTC (permalink / raw)
  To: Stefan Berger
  Cc: linux-integrity, linux-kernel, stable, linux-security-module,
	Stefan Berger
In-Reply-To: <20191126131753.3424363-1-stefanb@linux.vnet.ibm.com>

On Tue, Nov 26, 2019 at 08:17:51AM -0500, Stefan Berger wrote:
> From: Stefan Berger <stefanb@linux.ibm.com>
> 
> Revert the patches that were fixing the probing of interrupts due
> to reports of interrupt stroms on some systems

Can you explain how reverting is going to fix the issue?

This is wrong way to move forward. The root cause must be identified
first and then decide actions like always in any situation.

/Jarkko

^ permalink raw reply

* [PATCH v24 12/24] x86/sgx: Linux Enclave Driver
From: Jarkko Sakkinen @ 2019-11-29 23:13 UTC (permalink / raw)
  To: linux-kernel, x86, linux-sgx
  Cc: akpm, dave.hansen, sean.j.christopherson, nhorman, npmccallum,
	serge.ayoun, shay.katz-zamir, haitao.huang, andriy.shevchenko,
	tglx, kai.svahn, bp, josh, luto, kai.huang, rientjes, cedric.xing,
	puiterwijk, Jarkko Sakkinen, linux-security-module, Suresh Siddha
In-Reply-To: <20191129231326.18076-1-jarkko.sakkinen@linux.intel.com>

Intel Software Guard eXtensions (SGX) is a set of CPU instructions that
can be used by applications to set aside private regions of code and
data. The code outside the SGX hosted software entity is disallowed to
access the memory inside the enclave enforced by the CPU. We call these
entities as enclaves.

This commit implements a driver that provides an ioctl API to construct
and run enclaves. Enclaves are constructed from pages residing in
reserved physical memory areas. The contents of these pages can only be
accessed when they are mapped as part of an enclave, by a hardware
thread running inside the enclave.

The starting state of an enclave consists of a fixed measured set of
pages that are copied to the EPC during the construction process by
using ENCLS leaf functions and Software Enclave Control Structure (SECS)
that defines the enclave properties.

Enclave are constructed by using ENCLS leaf functions ECREATE, EADD and
EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
the EPC and EINIT check a given signed measurement and moves the enclave
into a state ready for execution.

An initialized enclave can only be accessed through special Thread Control
Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER.  This leaf
function converts a thread into enclave mode and continues the execution in
the offset defined by the TCS provided to EENTER. An enclave is exited
through syscall, exception, interrupts or by explicitly calling another
ENCLU leaf EEXIT.

The permissions, which enclave page is added will set the limit for maximum
permissions that can be set for mmap() and mprotect(). This will
effectively allow to build different security schemes between producers and
consumers of enclaves. Later on we can increase granularity with LSM hooks
for page addition (i.e. for producers) and mapping of the enclave (i.e. for
consumers)

Cc: linux-security-module@vger.kernel.org
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 Documentation/ioctl/ioctl-number.rst |   1 +
 arch/x86/include/uapi/asm/sgx.h      |  66 +++
 arch/x86/kernel/cpu/sgx/Makefile     |   3 +
 arch/x86/kernel/cpu/sgx/driver.c     | 252 ++++++++++
 arch/x86/kernel/cpu/sgx/driver.h     |  32 ++
 arch/x86/kernel/cpu/sgx/encl.c       | 329 +++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h       |  87 ++++
 arch/x86/kernel/cpu/sgx/ioctl.c      | 663 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/main.c       |  10 +
 arch/x86/kernel/cpu/sgx/reclaim.c    |   1 +
 10 files changed, 1444 insertions(+)
 create mode 100644 arch/x86/include/uapi/asm/sgx.h
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.h
 create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c

diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst
index bef79cd4c6b4..f9f3ea9606fc 100644
--- a/Documentation/ioctl/ioctl-number.rst
+++ b/Documentation/ioctl/ioctl-number.rst
@@ -321,6 +321,7 @@ Code  Seq#    Include File                                           Comments
                                                                      <mailto:tlewis@mindspring.com>
 0xA3  90-9F  linux/dtlk.h
 0xA4  00-1F  uapi/linux/tee.h                                        Generic TEE subsystem
+0xA4  00-1F  uapi/asm/sgx.h                                          Intel SGX subsystem (a legit conflict as TEE and SGX do not co-exist)
 0xAA  00-3F  linux/uapi/linux/userfaultfd.h
 0xAB  00-1F  linux/nbd.h
 0xAC  00-1F  linux/raw.h
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
new file mode 100644
index 000000000000..5edb08ab8fd0
--- /dev/null
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) WITH Linux-syscall-note */
+/*
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+#ifndef _UAPI_ASM_X86_SGX_H
+#define _UAPI_ASM_X86_SGX_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/**
+ * enum sgx_epage_flags - page control flags
+ * %SGX_PAGE_MEASURE:	Measure the page contents with a sequence of
+ *			ENCLS[EEXTEND] operations.
+ */
+enum sgx_page_flags {
+	SGX_PAGE_MEASURE	= 0x01,
+};
+
+#define SGX_MAGIC 0xA4
+
+#define SGX_IOC_ENCLAVE_CREATE \
+	_IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
+#define SGX_IOC_ENCLAVE_ADD_PAGES \
+	_IOWR(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages)
+#define SGX_IOC_ENCLAVE_INIT \
+	_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
+
+/**
+ * struct sgx_enclave_create - parameter structure for the
+ *                             %SGX_IOC_ENCLAVE_CREATE ioctl
+ * @src:	address for the SECS page data
+ */
+struct sgx_enclave_create  {
+	__u64	src;
+};
+
+/**
+ * struct sgx_enclave_add_pages - parameter structure for the
+ *                                %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
+ * @src:	start address for the page data
+ * @offset:	starting page offset
+ * @length:	length of the data (multiple of the page size)
+ * @secinfo:	address for the SECINFO data
+ * @flags:	page control flags
+ * @count:	number of bytes added (multiple of the page size)
+ */
+struct sgx_enclave_add_pages {
+	__u64	src;
+	__u64	offset;
+	__u64	length;
+	__u64	secinfo;
+	__u64	flags;
+	__u64	count;
+};
+
+/**
+ * struct sgx_enclave_init - parameter structure for the
+ *                           %SGX_IOC_ENCLAVE_INIT ioctl
+ * @sigstruct:	address for the SIGSTRUCT data
+ */
+struct sgx_enclave_init {
+	__u64 sigstruct;
+};
+
+#endif /* _UAPI_ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 874492d9e3bd..3ddcdabab081 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,4 +1,7 @@
 obj-y += \
+	driver.o \
+	encl.o \
 	encls.o \
+	ioctl.o \
 	main.o \
 	reclaim.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
new file mode 100644
index 000000000000..c724dcccf2e2
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/acpi.h>
+#include <linux/cdev.h>
+#include <linux/mman.h>
+#include <linux/platform_device.h>
+#include <linux/security.h>
+#include <linux/suspend.h>
+#include <asm/traps.h>
+#include "driver.h"
+#include "encl.h"
+
+MODULE_DESCRIPTION("Intel SGX Enclave Driver");
+MODULE_AUTHOR("Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>");
+MODULE_LICENSE("Dual BSD/GPL");
+
+struct workqueue_struct *sgx_encl_wq;
+u64 sgx_encl_size_max_32;
+u64 sgx_encl_size_max_64;
+u32 sgx_misc_reserved_mask;
+u64 sgx_attributes_reserved_mask;
+u64 sgx_xfrm_reserved_mask = ~0x3;
+u32 sgx_xsave_size_tbl[64];
+
+static int sgx_open(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl;
+	int ret;
+
+	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
+	if (!encl)
+		return -ENOMEM;
+
+	atomic_set(&encl->flags, 0);
+	kref_init(&encl->refcount);
+	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
+	mutex_init(&encl->lock);
+	INIT_LIST_HEAD(&encl->mm_list);
+	spin_lock_init(&encl->mm_lock);
+
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}
+
+	file->private_data = encl;
+
+	return 0;
+}
+
+static int sgx_release(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl = file->private_data;
+	struct sgx_encl_mm *encl_mm;
+
+	for ( ; ; )  {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
+		}
+
+		spin_unlock(&encl->mm_lock);
+
+		/* The list is empty, ready to go. */
+		if (!encl_mm)
+			break;
+
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
+	};
+
+	mutex_lock(&encl->lock);
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+	mutex_unlock(&encl->lock);
+
+	kref_put(&encl->refcount, sgx_encl_release);
+	return 0;
+}
+
+#ifdef CONFIG_COMPAT
+static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
+			      unsigned long arg)
+{
+	return sgx_ioctl(filep, cmd, arg);
+}
+#endif
+
+static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = file->private_data;
+	int ret;
+
+	ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end,
+			       vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
+	if (ret)
+		return ret;
+
+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;
+
+	vma->vm_ops = &sgx_vm_ops;
+	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+	vma->vm_private_data = encl;
+
+	return 0;
+}
+
+static unsigned long sgx_get_unmapped_area(struct file *file,
+					   unsigned long addr,
+					   unsigned long len,
+					   unsigned long pgoff,
+					   unsigned long flags)
+{
+	if (flags & MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
+static const struct file_operations sgx_encl_fops = {
+	.owner			= THIS_MODULE,
+	.open			= sgx_open,
+	.release		= sgx_release,
+	.unlocked_ioctl		= sgx_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl		= sgx_compat_ioctl,
+#endif
+	.mmap			= sgx_mmap,
+	.get_unmapped_area	= sgx_get_unmapped_area,
+};
+
+static struct bus_type sgx_bus_type = {
+	.name	= "sgx",
+};
+
+static struct device sgx_encl_dev;
+static struct cdev sgx_encl_cdev;
+static dev_t sgx_devt;
+
+static void sgx_dev_release(struct device *dev)
+{
+}
+
+static __init int sgx_dev_init(const char *name, struct device *dev,
+			       struct cdev *cdev,
+			       const struct file_operations *fops, int minor)
+{
+	int ret;
+
+	device_initialize(dev);
+
+	dev->bus = &sgx_bus_type;
+	dev->devt = MKDEV(MAJOR(sgx_devt), minor);
+	dev->release = sgx_dev_release;
+
+	ret = dev_set_name(dev, name);
+	if (ret) {
+		put_device(dev);
+		return ret;
+	}
+
+	cdev_init(cdev, fops);
+	cdev->owner = THIS_MODULE;
+	return 0;
+}
+
+int __init sgx_drv_init(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	u64 attr_mask, xfrm_mask;
+	int ret;
+	int i;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		pr_info("The public key MSRs are not writable\n");
+		return -ENODEV;
+	}
+
+	ret = bus_register(&sgx_bus_type);
+	if (ret)
+		return ret;
+
+	ret = alloc_chrdev_region(&sgx_devt, 0, SGX_DRV_NR_DEVICES, "sgx");
+	if (ret < 0)
+		goto err_bus;
+
+	cpuid_count(SGX_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	sgx_misc_reserved_mask = ~ebx | SGX_MISC_RESERVED_MASK;
+	sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
+	sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
+
+	cpuid_count(SGX_CPUID, 1, &eax, &ebx, &ecx, &edx);
+
+	attr_mask = (((u64)ebx) << 32) + (u64)eax;
+	sgx_attributes_reserved_mask = ~attr_mask | SGX_ATTR_RESERVED_MASK;
+
+	if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+		xfrm_mask = (((u64)edx) << 32) + (u64)ecx;
+
+		for (i = 2; i < 64; i++) {
+			cpuid_count(0x0D, i, &eax, &ebx, &ecx, &edx);
+			if ((1 << i) & xfrm_mask)
+				sgx_xsave_size_tbl[i] = eax + ebx;
+		}
+
+		sgx_xfrm_reserved_mask = ~xfrm_mask;
+	}
+
+	ret = sgx_dev_init("sgx/enclave", &sgx_encl_dev, &sgx_encl_cdev,
+			   &sgx_encl_fops, 0);
+	if (ret)
+		goto err_chrdev_region;
+
+	sgx_encl_wq = alloc_workqueue("sgx-encl-wq",
+				      WQ_UNBOUND | WQ_FREEZABLE, 1);
+	if (!sgx_encl_wq) {
+		ret = -ENOMEM;
+		goto err_encl_dev;
+	}
+
+	ret = cdev_device_add(&sgx_encl_cdev, &sgx_encl_dev);
+	if (ret)
+		goto err_encl_wq;
+
+	return 0;
+
+err_encl_wq:
+	destroy_workqueue(sgx_encl_wq);
+
+err_encl_dev:
+	put_device(&sgx_encl_dev);
+
+err_chrdev_region:
+	unregister_chrdev_region(sgx_devt, SGX_DRV_NR_DEVICES);
+
+err_bus:
+	bus_unregister(&sgx_bus_type);
+
+	return ret;
+}
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
new file mode 100644
index 000000000000..e95c6e86c0c6
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef __ARCH_SGX_DRIVER_H__
+#define __ARCH_SGX_DRIVER_H__
+
+#include <crypto/hash.h>
+#include <linux/kref.h>
+#include <linux/mmu_notifier.h>
+#include <linux/radix-tree.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include <uapi/asm/sgx.h>
+#include "sgx.h"
+
+#define SGX_DRV_NR_DEVICES	2
+#define SGX_EINIT_SPIN_COUNT	20
+#define SGX_EINIT_SLEEP_COUNT	50
+#define SGX_EINIT_SLEEP_TIME	20
+
+extern struct workqueue_struct *sgx_encl_wq;
+extern u64 sgx_encl_size_max_32;
+extern u64 sgx_encl_size_max_64;
+extern u32 sgx_misc_reserved_mask;
+extern u64 sgx_attributes_reserved_mask;
+extern u64 sgx_xfrm_reserved_mask;
+extern u32 sgx_xsave_size_tbl[64];
+
+long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
+
+int sgx_drv_init(void);
+
+#endif /* __ARCH_X86_SGX_DRIVER_H__ */
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
new file mode 100644
index 000000000000..cd2b8dbb0eca
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -0,0 +1,329 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/lockdep.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/shmem_fs.h>
+#include <linux/suspend.h>
+#include <linux/sched/mm.h>
+#include "arch.h"
+#include "encl.h"
+#include "sgx.h"
+
+static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+						unsigned long addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned int flags;
+
+	/* If process was forked, VMA is still there but vm_private_data is set
+	 * to NULL.
+	 */
+	if (!encl)
+		return ERR_PTR(-EFAULT);
+
+	flags = atomic_read(&encl->flags);
+
+	if ((flags & SGX_ENCL_DEAD) || !(flags & SGX_ENCL_INITIALIZED))
+		return ERR_PTR(-EFAULT);
+
+	entry = radix_tree_lookup(&encl->page_tree, addr >> PAGE_SHIFT);
+	if (!entry)
+		return ERR_PTR(-EFAULT);
+
+	/* Page is already resident in the EPC. */
+	if (entry->epc_page)
+		return entry;
+
+	return ERR_PTR(-EFAULT);
+}
+
+static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(mn, struct sgx_encl_mm, mmu_notifier);
+	struct sgx_encl_mm *tmp = NULL;
+
+	/*
+	 * The enclave itself can remove encl_mm.  Note, objects can't be moved
+	 * off an RCU protected list, but deletion is ok.
+	 */
+	spin_lock(&encl_mm->encl->mm_lock);
+	list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
+		if (tmp == encl_mm) {
+			list_del_rcu(&encl_mm->list);
+			break;
+		}
+	}
+	spin_unlock(&encl_mm->encl->mm_lock);
+
+	if (tmp == encl_mm) {
+		synchronize_srcu(&encl_mm->encl->srcu);
+		mmu_notifier_put(mn);
+	}
+}
+
+static void sgx_mmu_notifier_free(struct mmu_notifier *mn)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(mn, struct sgx_encl_mm, mmu_notifier);
+
+	kfree(encl_mm);
+}
+
+static const struct mmu_notifier_ops sgx_mmu_notifier_ops = {
+	.release		= sgx_mmu_notifier_release,
+	.free_notifier		= sgx_mmu_notifier_free,
+};
+
+static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
+					    struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = NULL;
+	struct sgx_encl_mm *tmp;
+	int idx;
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
+		if (tmp->mm == mm) {
+			encl_mm = tmp;
+			break;
+		}
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return encl_mm;
+}
+
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm;
+	int ret;
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
+		return -EINVAL;
+
+	/*
+	 * mm_structs are kept on mm_list until the mm or the enclave dies,
+	 * i.e. once an mm is off the list, it's gone for good, therefore it's
+	 * impossible to get a false positive on @mm due to a stale mm_list.
+	 */
+	if (sgx_encl_find_mm(encl, mm))
+		return 0;
+
+	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
+	if (!encl_mm)
+		return -ENOMEM;
+
+	encl_mm->encl = encl;
+	encl_mm->mm = mm;
+	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
+
+	ret = __mmu_notifier_register(&encl_mm->mmu_notifier, mm);
+	if (ret) {
+		kfree(encl_mm);
+		return ret;
+	}
+
+	spin_lock(&encl->mm_lock);
+	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	spin_unlock(&encl->mm_lock);
+
+	synchronize_srcu(&encl->srcu);
+
+	return 0;
+}
+
+static void sgx_vma_open(struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = vma->vm_private_data;
+
+	if (!encl)
+		return;
+
+	if (sgx_encl_mm_add(encl, vma->vm_mm))
+		vma->vm_private_data = NULL;
+}
+
+static unsigned int sgx_vma_fault(struct vm_fault *vmf)
+{
+	unsigned long addr = (unsigned long)vmf->address;
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_encl *encl = vma->vm_private_data;
+	struct sgx_encl_page *entry;
+	int ret = VM_FAULT_NOPAGE;
+	unsigned long pfn;
+
+	if (!encl)
+		return VM_FAULT_SIGBUS;
+
+	mutex_lock(&encl->lock);
+
+	entry = sgx_encl_load_page(encl, addr);
+	if (IS_ERR(entry)) {
+		if (unlikely(PTR_ERR(entry) != -EBUSY))
+			ret = VM_FAULT_SIGBUS;
+
+		goto out;
+	}
+
+	if (!follow_pfn(vma, addr, &pfn))
+		goto out;
+
+	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(entry->epc_page->desc));
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
+/**
+ * sgx_encl_may_map() - Check if a requested VMA mapping is allowed
+ * @encl:		an enclave
+ * @start:		lower bound of the address range, inclusive
+ * @end:		upper bound of the address range, exclusive
+ * @vm_prot_bits:	requested protections of the address range
+ *
+ * Iterate through the enclave pages contained within [@start, @end) to verify
+ * the permissions requested by @vm_prot_bits do not exceed that of any enclave
+ * page to be mapped.  Page addresses that do not have an associated enclave
+ * page are interpreted to zero permissions.
+ *
+ * Return:
+ *   0 on success,
+ *   -EACCES if VMA permissions exceed enclave page permissions
+ */
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_prot_bits)
+{
+	unsigned long idx, idx_start, idx_end;
+	struct sgx_encl_page *page;
+
+	/* PROT_NONE always succeeds. */
+	if (!vm_prot_bits)
+		return 0;
+
+	idx_start = PFN_DOWN(start);
+	idx_end = PFN_DOWN(end - 1);
+
+	for (idx = idx_start; idx <= idx_end; ++idx) {
+		mutex_lock(&encl->lock);
+		page = radix_tree_lookup(&encl->page_tree, idx);
+		mutex_unlock(&encl->lock);
+
+		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
+			return -EACCES;
+	}
+
+	return 0;
+}
+
+static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
+			    unsigned long end, unsigned long prot)
+{
+	return sgx_encl_may_map(vma->vm_private_data, start, end,
+				calc_vm_prot_bits(prot, 0));
+}
+
+const struct vm_operations_struct sgx_vm_ops = {
+	.open = sgx_vma_open,
+	.fault = sgx_vma_fault,
+	.may_mprotect = sgx_vma_mprotect,
+};
+
+/**
+ * sgx_encl_find - find an enclave
+ * @mm:		mm struct of the current process
+ * @addr:	address in the ELRANGE
+ * @vma:	the resulting VMA
+ *
+ * Find an enclave identified by the given address. Give back a VMA that is
+ * part of the enclave and located in that address. The VMA is given back if it
+ * is a proper enclave VMA even if an &sgx_encl instance does not exist yet
+ * (enclave creation has not been performed).
+ *
+ * Return:
+ *   0 on success,
+ *   -EINVAL if an enclave was not found,
+ *   -ENOENT if the enclave has not been created yet
+ */
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma)
+{
+	struct vm_area_struct *result;
+	struct sgx_encl *encl;
+
+	result = find_vma(mm, addr);
+	if (!result || result->vm_ops != &sgx_vm_ops || addr < result->vm_start)
+		return -EINVAL;
+
+	encl = result->vm_private_data;
+	*vma = result;
+
+	return encl ? 0 : -ENOENT;
+}
+
+/**
+ * sgx_encl_destroy() - destroy enclave resources
+ * @encl:	an &sgx_encl instance
+ */
+void sgx_encl_destroy(struct sgx_encl *encl)
+{
+	struct sgx_encl_page *entry;
+	struct radix_tree_iter iter;
+	void **slot;
+
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+
+	radix_tree_for_each_slot(slot, &encl->page_tree, &iter, 0) {
+		entry = *slot;
+
+		if (entry->epc_page) {
+			sgx_free_page(entry->epc_page);
+			encl->secs_child_cnt--;
+			entry->epc_page = NULL;
+		}
+
+		radix_tree_delete(&entry->encl->page_tree,
+				  PFN_DOWN(entry->desc));
+		kfree(entry);
+	}
+
+	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_free_page(encl->secs.epc_page);
+		encl->secs.epc_page = NULL;
+	}
+}
+
+/**
+ * sgx_encl_release - Destroy an enclave instance
+ * @kref:	address of a kref inside &sgx_encl
+ *
+ * Used together with kref_put(). Frees all the resources associated with the
+ * enclave and the instance itself.
+ */
+void sgx_encl_release(struct kref *ref)
+{
+	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
+
+	sgx_encl_destroy(encl);
+
+	if (encl->backing)
+		fput(encl->backing);
+
+	WARN_ON_ONCE(!list_empty(&encl->mm_list));
+
+	/* Detect EPC page leak's. */
+	WARN_ON_ONCE(encl->secs_child_cnt);
+	WARN_ON_ONCE(encl->secs.epc_page);
+
+	kfree(encl);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
new file mode 100644
index 000000000000..1d1bc5d590ee
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+#ifndef _X86_ENCL_H
+#define _X86_ENCL_H
+
+#include <linux/cpumask.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mm_types.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/radix-tree.h>
+#include <linux/srcu.h>
+#include <linux/workqueue.h>
+#include "sgx.h"
+
+/**
+ * enum sgx_encl_page_desc - defines bits for an enclave page's descriptor
+ * %SGX_ENCL_PAGE_ADDR_MASK:		Holds the virtual address of the page.
+ *
+ * The page address for SECS is zero and is used by the subsystem to recognize
+ * the SECS page.
+ */
+enum sgx_encl_page_desc {
+	/* Bits 11:3 are available when the page is not swapped. */
+	SGX_ENCL_PAGE_ADDR_MASK		= PAGE_MASK,
+};
+
+#define SGX_ENCL_PAGE_ADDR(page) \
+	((page)->desc & SGX_ENCL_PAGE_ADDR_MASK)
+
+struct sgx_encl_page {
+	unsigned long desc;
+	unsigned long vm_max_prot_bits;
+	struct sgx_epc_page *epc_page;
+	struct sgx_encl *encl;
+};
+
+enum sgx_encl_flags {
+	SGX_ENCL_CREATED	= BIT(0),
+	SGX_ENCL_INITIALIZED	= BIT(1),
+	SGX_ENCL_DEBUG		= BIT(2),
+	SGX_ENCL_DEAD		= BIT(3),
+	SGX_ENCL_IOCTL		= BIT(4),
+};
+
+struct sgx_encl_mm {
+	struct sgx_encl *encl;
+	struct mm_struct *mm;
+	struct list_head list;
+	struct mmu_notifier mmu_notifier;
+};
+
+struct sgx_encl {
+	atomic_t flags;
+	u64 secs_attributes;
+	u64 allowed_attributes;
+	unsigned int page_cnt;
+	unsigned int secs_child_cnt;
+	struct mutex lock;
+	struct list_head mm_list;
+	spinlock_t mm_lock;
+	struct file *backing;
+	struct kref refcount;
+	struct srcu_struct srcu;
+	unsigned long base;
+	unsigned long size;
+	unsigned long ssaframesize;
+	struct radix_tree_root page_tree;
+	struct sgx_encl_page secs;
+	cpumask_t cpumask;
+};
+
+extern const struct vm_operations_struct sgx_vm_ops;
+
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma);
+void sgx_encl_destroy(struct sgx_encl *encl);
+void sgx_encl_release(struct kref *ref);
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_prot_bits);
+
+#endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
new file mode 100644
index 000000000000..cd2146a15a22
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -0,0 +1,663 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-19 Intel Corporation.
+
+#include <asm/mman.h>
+#include <linux/mman.h>
+#include <linux/delay.h>
+#include <linux/file.h>
+#include <linux/hashtable.h>
+#include <linux/highmem.h>
+#include <linux/ratelimit.h>
+#include <linux/sched/signal.h>
+#include <linux/shmem_fs.h>
+#include <linux/slab.h>
+#include <linux/suspend.h>
+#include "driver.h"
+#include "encl.h"
+#include "encls.h"
+
+static u32 sgx_calc_ssaframesize(u32 miscselect, u64 xfrm)
+{
+	u32 size_max = PAGE_SIZE;
+	u32 size;
+	int i;
+
+	for (i = 2; i < 64; i++) {
+		if (!((1 << i) & xfrm))
+			continue;
+
+		size = SGX_SSA_GPRS_SIZE + sgx_xsave_size_tbl[i];
+		if (miscselect & SGX_MISC_EXINFO)
+			size += SGX_SSA_MISC_EXINFO_SIZE;
+
+		if (size > size_max)
+			size_max = size;
+	}
+
+	return PFN_UP(size_max);
+}
+
+static int sgx_validate_secs(const struct sgx_secs *secs,
+			     unsigned long ssaframesize)
+{
+	if (secs->size < (2 * PAGE_SIZE) || !is_power_of_2(secs->size))
+		return -EINVAL;
+
+	if (secs->base & (secs->size - 1))
+		return -EINVAL;
+
+	if (secs->miscselect & sgx_misc_reserved_mask ||
+	    secs->attributes & sgx_attributes_reserved_mask ||
+	    secs->xfrm & sgx_xfrm_reserved_mask)
+		return -EINVAL;
+
+	if (secs->attributes & SGX_ATTR_MODE64BIT) {
+		if (secs->size > sgx_encl_size_max_64)
+			return -EINVAL;
+	} else if (secs->size > sgx_encl_size_max_32)
+		return -EINVAL;
+
+	if (!(secs->xfrm & XFEATURE_MASK_FP) ||
+	    !(secs->xfrm & XFEATURE_MASK_SSE) ||
+	    (((secs->xfrm >> XFEATURE_BNDREGS) & 1) !=
+	     ((secs->xfrm >> XFEATURE_BNDCSR) & 1)))
+		return -EINVAL;
+
+	if (!secs->ssa_frame_size || ssaframesize > secs->ssa_frame_size)
+		return -EINVAL;
+
+	if (memchr_inv(secs->reserved1, 0, sizeof(secs->reserved1)) ||
+	    memchr_inv(secs->reserved2, 0, sizeof(secs->reserved2)) ||
+	    memchr_inv(secs->reserved3, 0, sizeof(secs->reserved3)) ||
+	    memchr_inv(secs->reserved4, 0, sizeof(secs->reserved4)))
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+						 unsigned long offset,
+						 u64 secinfo_flags)
+{
+	struct sgx_encl_page *encl_page;
+	unsigned long prot;
+
+	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
+	if (!encl_page)
+		return ERR_PTR(-ENOMEM);
+
+	encl_page->desc = encl->base + offset;
+	encl_page->encl = encl;
+
+	prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
+
+	/*
+	 * TCS pages must always RW set for CPU access while the SECINFO
+	 * permissions are *always* zero - the CPU ignores the user provided
+	 * values and silently overwrites them with zero permissions.
+	 */
+	if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
+		prot |= PROT_READ | PROT_WRITE;
+
+	/* Calculate maximum of the VM flags for the page. */
+	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
+
+	return encl_page;
+}
+
+static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
+{
+	unsigned long encl_size = secs->size + PAGE_SIZE;
+	struct sgx_epc_page *secs_epc;
+	unsigned long ssaframesize;
+	struct sgx_pageinfo pginfo;
+	struct sgx_secinfo secinfo;
+	struct file *backing;
+	long ret;
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_CREATED)
+		return -EINVAL;
+
+	ssaframesize = sgx_calc_ssaframesize(secs->miscselect, secs->xfrm);
+	if (sgx_validate_secs(secs, ssaframesize)) {
+		pr_debug("invalid SECS\n");
+		return -EINVAL;
+	}
+
+	backing = shmem_file_setup("SGX backing", encl_size + (encl_size >> 5),
+				   VM_NORESERVE);
+	if (IS_ERR(backing))
+		return PTR_ERR(backing);
+
+	encl->backing = backing;
+
+	secs_epc = sgx_try_alloc_page();
+	if (IS_ERR(secs_epc)) {
+		ret = PTR_ERR(secs_epc);
+		goto err_out_backing;
+	}
+
+	encl->secs.epc_page = secs_epc;
+
+	pginfo.addr = 0;
+	pginfo.contents = (unsigned long)secs;
+	pginfo.metadata = (unsigned long)&secinfo;
+	pginfo.secs = 0;
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	ret = __ecreate((void *)&pginfo, sgx_epc_addr(secs_epc));
+	if (ret) {
+		pr_debug("ECREATE returned %ld\n", ret);
+		goto err_out;
+	}
+
+	if (secs->attributes & SGX_ATTR_DEBUG)
+		atomic_or(SGX_ENCL_DEBUG, &encl->flags);
+
+	encl->secs.encl = encl;
+	encl->secs_attributes = secs->attributes;
+	encl->allowed_attributes |= SGX_ATTR_ALLOWED_MASK;
+	encl->base = secs->base;
+	encl->size = secs->size;
+	encl->ssaframesize = secs->ssa_frame_size;
+
+	/*
+	 * Set SGX_ENCL_CREATED only after the enclave is fully prepped.  This
+	 * allows setting and checking enclave creation without having to take
+	 * encl->lock.
+	 */
+	atomic_or(SGX_ENCL_CREATED, &encl->flags);
+
+	return 0;
+
+err_out:
+	sgx_free_page(encl->secs.epc_page);
+	encl->secs.epc_page = NULL;
+
+err_out_backing:
+	fput(encl->backing);
+	encl->backing = NULL;
+
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_create - handler for %SGX_IOC_ENCLAVE_CREATE
+ * @filep:	open file to /dev/sgx
+ * @arg:	userspace pointer to a struct sgx_enclave_create instance
+ *
+ * Allocate kernel data structures for a new enclave and execute ECREATE after
+ * verifying the correctness of the provided SECS.
+ *
+ * Note, enforcement of restricted and disallowed attributes is deferred until
+ * sgx_ioc_enclave_init(), only the architectural correctness of the SECS is
+ * checked by sgx_ioc_enclave_create().
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise
+ */
+static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
+{
+	struct sgx_enclave_create ecreate;
+	struct page *secs_page;
+	struct sgx_secs *secs;
+	int ret;
+
+	if (copy_from_user(&ecreate, arg, sizeof(ecreate)))
+		return -EFAULT;
+
+	secs_page = alloc_page(GFP_KERNEL);
+	if (!secs_page)
+		return -ENOMEM;
+
+	secs = kmap(secs_page);
+	if (copy_from_user(secs, (void __user *)ecreate.src, sizeof(*secs))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = sgx_encl_create(encl, secs);
+
+out:
+	kunmap(secs_page);
+	__free_page(secs_page);
+	return ret;
+}
+
+static int sgx_validate_secinfo(struct sgx_secinfo *secinfo)
+{
+	u64 perm = secinfo->flags & SGX_SECINFO_PERMISSION_MASK;
+	u64 pt = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK;
+
+	if (pt != SGX_SECINFO_REG && pt != SGX_SECINFO_TCS)
+		return -EINVAL;
+
+	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
+		return -EINVAL;
+
+	/*
+	 * CPU will silently overwrite the permissions as zero, which means
+	 * that we need to validate it ourselves.
+	 */
+	if (pt == SGX_SECINFO_TCS && perm)
+		return -EINVAL;
+
+	if (secinfo->flags & SGX_SECINFO_RESERVED_MASK)
+		return -EINVAL;
+
+	if (memchr_inv(secinfo->reserved, 0, sizeof(secinfo->reserved)))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int __sgx_encl_add_page(struct sgx_encl *encl,
+			       struct sgx_encl_page *encl_page,
+			       struct sgx_epc_page *epc_page,
+			       struct sgx_secinfo *secinfo, unsigned long src)
+{
+	struct sgx_pageinfo pginfo;
+	struct vm_area_struct *vma;
+	struct page *src_page;
+	int ret;
+
+	/* Query vma's VM_MAYEXEC as an indirect path_noexec() check. */
+	if (encl_page->vm_max_prot_bits & VM_EXEC) {
+		vma = find_vma(current->mm, src);
+		if (!vma)
+			return -EFAULT;
+
+		if (!(vma->vm_flags & VM_MAYEXEC))
+			return -EACCES;
+	}
+
+	ret = get_user_pages(src, 1, 0, &src_page, NULL);
+	if (ret < 1)
+		return ret;
+
+	pginfo.secs = (unsigned long)sgx_epc_addr(encl->secs.epc_page);
+	pginfo.addr = SGX_ENCL_PAGE_ADDR(encl_page);
+	pginfo.metadata = (unsigned long)secinfo;
+	pginfo.contents = (unsigned long)kmap_atomic(src_page);
+
+	ret = __eadd(&pginfo, sgx_epc_addr(epc_page));
+
+	kunmap_atomic((void *)pginfo.contents);
+	put_page(src_page);
+
+	return ret ? -EIO : 0;
+}
+
+static int __sgx_encl_extend(struct sgx_encl *encl,
+			     struct sgx_epc_page *epc_page)
+{
+	int ret;
+	int i;
+
+	for (i = 0; i < 16; i++) {
+		ret = __eextend(sgx_epc_addr(encl->secs.epc_page),
+				sgx_epc_addr(epc_page) + (i * 0x100));
+		if (ret) {
+			if (encls_failed(ret))
+				ENCLS_WARN(ret, "EEXTEND");
+			return -EIO;
+		}
+	}
+
+	return 0;
+}
+
+static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
+			     unsigned long offset, unsigned long length,
+			     struct sgx_secinfo *secinfo, unsigned long flags)
+{
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+	int ret;
+
+	encl_page = sgx_encl_page_alloc(encl, offset, secinfo->flags);
+	if (IS_ERR(encl_page))
+		return PTR_ERR(encl_page);
+
+	epc_page = sgx_try_alloc_page();
+	if (IS_ERR(epc_page)) {
+		kfree(encl_page);
+		return PTR_ERR(epc_page);
+	}
+
+	if (atomic_read(&encl->flags) &
+	    (SGX_ENCL_INITIALIZED | SGX_ENCL_DEAD)) {
+		ret = -EFAULT;
+		goto err_out_free;
+	}
+
+	down_read(&current->mm->mmap_sem);
+	mutex_lock(&encl->lock);
+
+	/*
+	 * Insert prior to EADD in case of OOM.  EADD modifies MRENCLAVE, i.e.
+	 * can't be gracefully unwound, while failure on EADD/EXTEND is limited
+	 * to userspace errors (or kernel/hardware bugs).
+	 */
+	ret = radix_tree_insert(&encl->page_tree, PFN_DOWN(encl_page->desc),
+				encl_page);
+	if (ret)
+		goto err_out_unlock;
+
+	ret = __sgx_encl_add_page(encl, encl_page, epc_page, secinfo,
+				  src);
+	if (ret) {
+		/* ENCLS failure. */
+		if (ret == -EIO)
+			sgx_encl_destroy(encl);
+
+		goto err_out;
+	}
+
+	/*
+	 * Complete the "add" before doing the "extend" so that the "add"
+	 * isn't in a half-baked state in the extremely unlikely scenario the
+	 * the enclave will be destroyed in response to EEXTEND failure.
+	 */
+	encl_page->encl = encl;
+	encl_page->epc_page = epc_page;
+	encl->secs_child_cnt++;
+
+	if (flags & SGX_PAGE_MEASURE) {
+		ret = __sgx_encl_extend(encl, epc_page);
+
+		/* ENCLS failure. */
+		if (ret) {
+			sgx_encl_destroy(encl);
+			goto out_unlock;
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&encl->lock);
+	up_read(&current->mm->mmap_sem);
+	return ret;
+
+err_out:
+	radix_tree_delete(&encl_page->encl->page_tree,
+			  PFN_DOWN(encl_page->desc));
+
+err_out_unlock:
+	mutex_unlock(&encl->lock);
+	up_read(&current->mm->mmap_sem);
+
+err_out_free:
+	sgx_free_page(epc_page);
+	kfree(encl_page);
+
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_add_pages() - The handler for %SGX_IOC_ENCLAVE_ADD_PAGES
+ * @encl:       pointer to an enclave instance (via ioctl() file pointer)
+ * @arg:	a user pointer to a struct sgx_enclave_add_pages instance
+ *
+ * Add one or more pages to an uninitialized enclave, and optionally extend the
+ * measurement with the contents of the page. The address range of pages must
+ * be contiguous. The SECINFO and measurement mask are applied to all pages.
+ *
+ * A SECINFO for a TCS is required to always contain zero permissions because
+ * CPU silently zeros them. Allowing anything else would cause a mismatch in
+ * the measurement.
+ *
+ * mmap()'s protection bits are capped by the page permissions. For each page
+ * address, the maximum protection bits are computed with the following
+ * heuristics:
+ *
+ * 1. A regular page: PROT_R, PROT_W and PROT_X match the SECINFO permissions.
+ * 2. A TCS page: PROT_R | PROT_W.
+ * 3. No page: PROT_NONE.
+ *
+ * mmap() is not allowed to surpass the minimum of the maximum protection bits
+ * within the given address range.
+ *
+ * As stated above, a non-existent page is interpreted as a page with no
+ * permissions. In effect, this allows mmap() with PROT_NONE to be used to seek
+ * an address range for the enclave that can be then populated into SECS.
+ *
+ * If ENCLS opcode fails, that effectively means that EPC has been invalidated.
+ * When this happens the enclave is destroyed and -EIO is returned to the
+ * caller.
+ *
+ * Return:
+ *   0 on success,
+ *   -EACCES if an executable source page is located in a noexec partition,
+ *   -EIO if either ENCLS[EADD] or ENCLS[EEXTEND] fails
+ *   -errno otherwise
+ */
+static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
+{
+	struct sgx_enclave_add_pages addp;
+	struct sgx_secinfo secinfo;
+	unsigned long c;
+	int ret;
+
+	if (!(atomic_read(&encl->flags) & SGX_ENCL_CREATED))
+		return -EINVAL;
+
+	if (copy_from_user(&addp, arg, sizeof(addp)))
+		return -EFAULT;
+
+	if (!IS_ALIGNED(addp.offset, PAGE_SIZE) ||
+	    !IS_ALIGNED(addp.src, PAGE_SIZE))
+		return -EINVAL;
+
+	if (!(access_ok(addp.src, PAGE_SIZE)))
+		return -EFAULT;
+
+	if (addp.length & (PAGE_SIZE - 1))
+		return -EINVAL;
+
+	if (addp.offset + addp.length - PAGE_SIZE >= encl->size)
+		return -EINVAL;
+
+	if (copy_from_user(&secinfo, (void __user *)addp.secinfo,
+			   sizeof(secinfo)))
+		return -EFAULT;
+
+	if (sgx_validate_secinfo(&secinfo))
+		return -EINVAL;
+
+	for (c = 0 ; c < addp.length; c += PAGE_SIZE) {
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		if (need_resched())
+			cond_resched();
+
+		ret = sgx_encl_add_page(encl, addp.src + c, addp.offset + c,
+					addp.length - c, &secinfo, addp.flags);
+		if (ret)
+			break;
+	}
+
+	addp.count = c;
+
+	if (copy_to_user(arg, &addp, sizeof(addp)))
+		return -EFAULT;
+
+	return ret;
+}
+
+static int __sgx_get_key_hash(struct crypto_shash *tfm, const void *modulus,
+			      void *hash)
+{
+	SHASH_DESC_ON_STACK(shash, tfm);
+
+	shash->tfm = tfm;
+
+	return crypto_shash_digest(shash, modulus, SGX_MODULUS_SIZE, hash);
+}
+
+static int sgx_get_key_hash(const void *modulus, void *hash)
+{
+	struct crypto_shash *tfm;
+	int ret;
+
+	tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC);
+	if (IS_ERR(tfm))
+		return PTR_ERR(tfm);
+
+	ret = __sgx_get_key_hash(tfm, modulus, hash);
+
+	crypto_free_shash(tfm);
+	return ret;
+}
+
+static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
+			 struct sgx_einittoken *token)
+{
+	u64 mrsigner[4];
+	int ret;
+	int i;
+	int j;
+
+	/* Check that the required attributes have been authorized. */
+	if (encl->secs_attributes & ~encl->allowed_attributes)
+		return -EINVAL;
+
+	ret = sgx_get_key_hash(sigstruct->modulus, mrsigner);
+	if (ret)
+		return ret;
+
+	mutex_lock(&encl->lock);
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_INITIALIZED) {
+		ret = -EFAULT;
+		goto err_out;
+	}
+
+	for (i = 0; i < SGX_EINIT_SLEEP_COUNT; i++) {
+		for (j = 0; j < SGX_EINIT_SPIN_COUNT; j++) {
+			ret = sgx_einit(sigstruct, token, encl->secs.epc_page,
+					mrsigner);
+			if (ret == SGX_UNMASKED_EVENT)
+				continue;
+			else
+				break;
+		}
+
+		if (ret != SGX_UNMASKED_EVENT)
+			break;
+
+		msleep_interruptible(SGX_EINIT_SLEEP_TIME);
+
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			goto err_out;
+		}
+	}
+
+	if (ret & ENCLS_FAULT_FLAG) {
+		if (encls_failed(ret))
+			ENCLS_WARN(ret, "EINIT");
+
+		sgx_encl_destroy(encl);
+		ret = -EFAULT;
+	} else if (ret) {
+		pr_debug("EINIT returned %d\n", ret);
+		ret = -EPERM;
+	} else {
+		atomic_or(SGX_ENCL_INITIALIZED, &encl->flags);
+	}
+
+err_out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_init - handler for %SGX_IOC_ENCLAVE_INIT
+ *
+ * @filep:	open file to /dev/sgx
+ * @arg:	userspace pointer to a struct sgx_enclave_init instance
+ *
+ * Flush any outstanding enqueued EADD operations and perform EINIT.  The
+ * Launch Enclave Public Key Hash MSRs are rewritten as necessary to match
+ * the enclave's MRSIGNER, which is caculated from the provided sigstruct.
+ *
+ * Return:
+ *   0 on success,
+ *   SGX error code on EINIT failure,
+ *   -errno otherwise
+ */
+static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
+{
+	struct sgx_einittoken *einittoken;
+	struct sgx_sigstruct *sigstruct;
+	struct sgx_enclave_init einit;
+	struct page *initp_page;
+	int ret;
+
+	if (!(atomic_read(&encl->flags) & SGX_ENCL_CREATED))
+		return -EINVAL;
+
+	if (copy_from_user(&einit, arg, sizeof(einit)))
+		return -EFAULT;
+
+	initp_page = alloc_page(GFP_KERNEL);
+	if (!initp_page)
+		return -ENOMEM;
+
+	sigstruct = kmap(initp_page);
+	einittoken = (struct sgx_einittoken *)
+		((unsigned long)sigstruct + PAGE_SIZE / 2);
+	memset(einittoken, 0, sizeof(*einittoken));
+
+	if (copy_from_user(sigstruct, (void __user *)einit.sigstruct,
+			   sizeof(*sigstruct))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = sgx_encl_init(encl, sigstruct, einittoken);
+
+out:
+	kunmap(initp_page);
+	__free_page(initp_page);
+	return ret;
+}
+
+
+long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
+{
+	struct sgx_encl *encl = filep->private_data;
+	int ret, encl_flags;
+
+	encl_flags = atomic_fetch_or(SGX_ENCL_IOCTL, &encl->flags);
+	if (encl_flags & SGX_ENCL_IOCTL)
+		return -EBUSY;
+
+	if (encl_flags & SGX_ENCL_DEAD)
+		return -EFAULT;
+
+	switch (cmd) {
+	case SGX_IOC_ENCLAVE_CREATE:
+		ret = sgx_ioc_enclave_create(encl, (void __user *)arg);
+		break;
+	case SGX_IOC_ENCLAVE_ADD_PAGES:
+		ret = sgx_ioc_enclave_add_pages(encl, (void __user *)arg);
+		break;
+	case SGX_IOC_ENCLAVE_INIT:
+		ret = sgx_ioc_enclave_init(encl, (void __user *)arg);
+		break;
+	default:
+		ret = -ENOIOCTLCMD;
+		break;
+	}
+
+	atomic_andnot(SGX_ENCL_IOCTL, &encl->flags);
+
+	return ret;
+}
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 6a37df61ae32..36a295a0272b 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -8,6 +8,7 @@
 #include <linux/ratelimit.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include "driver.h"
 #include "encls.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
@@ -196,6 +197,8 @@ static bool __init sgx_page_cache_init(void)
 
 static void __init sgx_init(void)
 {
+	int ret;
+
 	if (!boot_cpu_has(X86_FEATURE_SGX))
 		return;
 
@@ -205,8 +208,15 @@ static void __init sgx_init(void)
 	if (!sgx_page_reclaimer_init())
 		goto err_page_cache;
 
+	ret = sgx_drv_init();
+	if (ret)
+		goto err_kthread;
+
 	return;
 
+err_kthread:
+	kthread_stop(ksgxswapd_tsk);
+
 err_page_cache:
 	sgx_page_cache_teardown();
 }
diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
index f071158d34f6..bdb42f4326aa 100644
--- a/arch/x86/kernel/cpu/sgx/reclaim.c
+++ b/arch/x86/kernel/cpu/sgx/reclaim.c
@@ -10,6 +10,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include "encls.h"
+#include "driver.h"
 
 struct task_struct *ksgxswapd_tsk;
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH v24 15/24] x86/sgx: Add provisioning
From: Jarkko Sakkinen @ 2019-11-29 23:13 UTC (permalink / raw)
  To: linux-kernel, x86, linux-sgx
  Cc: akpm, dave.hansen, sean.j.christopherson, nhorman, npmccallum,
	serge.ayoun, shay.katz-zamir, haitao.huang, andriy.shevchenko,
	tglx, kai.svahn, bp, josh, luto, kai.huang, rientjes, cedric.xing,
	puiterwijk, Jarkko Sakkinen, linux-security-module
In-Reply-To: <20191129231326.18076-1-jarkko.sakkinen@linux.intel.com>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=a, Size: 6385 bytes --]

In order to provide a mechanism for devilering provisoning rights:

1. Add a new device file /dev/sgx/provision that works as a token for
   allowing an enclave to have the provisioning privileges.
2. Add a new ioctl called SGX_IOC_ENCLAVE_SET_ATTRIBUTE that accepts the
   following data structure:

   struct sgx_enclave_set_attribute {
           __u64 addr;
           __u64 attribute_fd;
   };

A daemon could sit on top of /dev/sgx/provision and send a file
descriptor of this file to a process that needs to be able to provision
enclaves.

The way this API is used is straight-forward. Lets assume that dev_fd is
a handle to /dev/sgx/enclave and prov_fd is a handle to
/dev/sgx/provision.  You would allow SGX_IOC_ENCLAVE_CREATE to
initialize an enclave with the PROVISIONKEY attribute by

params.addr = <enclave address>;
params.token_fd = prov_fd;

ioctl(dev_fd, SGX_IOC_ENCLAVE_SET_ATTRIBUTE, &params);

Cc: linux-security-module@vger.kernel.org
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/include/uapi/asm/sgx.h  | 11 ++++++++
 arch/x86/kernel/cpu/sgx/driver.c | 23 +++++++++++++++-
 arch/x86/kernel/cpu/sgx/driver.h |  2 ++
 arch/x86/kernel/cpu/sgx/ioctl.c  | 47 ++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 5edb08ab8fd0..57d0d30c79b3 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -25,6 +25,8 @@ enum sgx_page_flags {
 	_IOWR(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages)
 #define SGX_IOC_ENCLAVE_INIT \
 	_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
+#define SGX_IOC_ENCLAVE_SET_ATTRIBUTE \
+	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_set_attribute)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -63,4 +65,13 @@ struct sgx_enclave_init {
 	__u64 sigstruct;
 };
 
+/**
+ * struct sgx_enclave_set_attribute - parameter structure for the
+ *				      %SGX_IOC_ENCLAVE_SET_ATTRIBUTE ioctl
+ * @attribute_fd:	file handle of the attribute file in the securityfs
+ */
+struct sgx_enclave_set_attribute {
+	__u64 attribute_fd;
+};
+
 #endif /* _UAPI_ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index c724dcccf2e2..4d996463b213 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -141,12 +141,18 @@ static const struct file_operations sgx_encl_fops = {
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
+const struct file_operations sgx_provision_fops = {
+	.owner			= THIS_MODULE,
+};
+
 static struct bus_type sgx_bus_type = {
 	.name	= "sgx",
 };
 
 static struct device sgx_encl_dev;
 static struct cdev sgx_encl_cdev;
+static struct device sgx_provision_dev;
+static struct cdev sgx_provision_cdev;
 static dev_t sgx_devt;
 
 static void sgx_dev_release(struct device *dev)
@@ -223,22 +229,37 @@ int __init sgx_drv_init(void)
 	if (ret)
 		goto err_chrdev_region;
 
+	ret = sgx_dev_init("sgx/provision", &sgx_provision_dev,
+			   &sgx_provision_cdev, &sgx_provision_fops, 1);
+	if (ret)
+		goto err_encl_dev;
+
 	sgx_encl_wq = alloc_workqueue("sgx-encl-wq",
 				      WQ_UNBOUND | WQ_FREEZABLE, 1);
 	if (!sgx_encl_wq) {
 		ret = -ENOMEM;
-		goto err_encl_dev;
+		goto err_provision_dev;
 	}
 
 	ret = cdev_device_add(&sgx_encl_cdev, &sgx_encl_dev);
 	if (ret)
 		goto err_encl_wq;
 
+	ret = cdev_device_add(&sgx_provision_cdev, &sgx_provision_dev);
+	if (ret)
+		goto err_encl_cdev;
+
 	return 0;
 
+err_encl_cdev:
+	cdev_device_del(&sgx_encl_cdev, &sgx_encl_dev);
+
 err_encl_wq:
 	destroy_workqueue(sgx_encl_wq);
 
+err_provision_dev:
+	put_device(&sgx_provision_dev);
+
 err_encl_dev:
 	put_device(&sgx_encl_dev);
 
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
index e95c6e86c0c6..2f13886522a8 100644
--- a/arch/x86/kernel/cpu/sgx/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -25,6 +25,8 @@ extern u64 sgx_attributes_reserved_mask;
 extern u64 sgx_xfrm_reserved_mask;
 extern u32 sgx_xsave_size_tbl[64];
 
+extern const struct file_operations sgx_provision_fops;
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
 
 int sgx_drv_init(void);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index cd2146a15a22..275388ba9992 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -629,6 +629,50 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
+/**
+ * sgx_ioc_enclave_set_attribute - handler for %SGX_IOC_ENCLAVE_SET_ATTRIBUTE
+ * @filep:	open file to /dev/sgx
+ * @arg:	userspace pointer to a struct sgx_enclave_set_attribute instance
+ *
+ * Mark the enclave as being allowed to access a restricted attribute bit.
+ * The requested attribute is specified via the attribute_fd field in the
+ * provided struct sgx_enclave_set_attribute.  The attribute_fd must be a
+ * handle to an SGX attribute file, e.g. “/dev/sgx/provision".
+ *
+ * Failure to explicitly request access to a restricted attribute will cause
+ * sgx_ioc_enclave_init() to fail.  Currently, the only restricted attribute
+ * is access to the PROVISION_KEY.
+ *
+ * Note, access to the EINITTOKEN_KEY is disallowed entirely.
+ *
+ * Return: 0 on success, -errno otherwise
+ */
+static long sgx_ioc_enclave_set_attribute(struct sgx_encl *encl,
+					  void __user *arg)
+{
+	struct sgx_enclave_set_attribute params;
+	struct file *attribute_file;
+	int ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	attribute_file = fget(params.attribute_fd);
+	if (!attribute_file)
+		return -EINVAL;
+
+	if (attribute_file->f_op != &sgx_provision_fops) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	encl->allowed_attributes |= SGX_ATTR_PROVISIONKEY;
+	ret = 0;
+
+out:
+	fput(attribute_file);
+	return ret;
+}
 
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
@@ -652,6 +696,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_INIT:
 		ret = sgx_ioc_enclave_init(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_SET_ATTRIBUTE:
+		ret = sgx_ioc_enclave_set_attribute(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] tpm_tis: Move setting of TPM_CHIP_FLAG_IRQ into tpm_tis_probe_irq_single
From: Jarkko Sakkinen @ 2019-11-29 23:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Stefan Berger, Stefan Berger, linux-integrity,
	Linux Kernel Mailing List, linux-security-module, jsnitsel
In-Reply-To: <CAPcyv4gO2T4xcZjYSYJ8-0kDPRnVYWhX_df5E94Cjyksx6WFbg@mail.gmail.com>

On Wed, Nov 27, 2019 at 01:26:07PM -0800, Dan Williams wrote:
> [ add Jerry ]
> 
> On Wed, Nov 27, 2019 at 1:11 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 11:49:49AM -0700, Jerry Snitselaar wrote:
> > > On Sat Nov 16 19, Stefan Berger wrote:
> > > > On 11/14/19 11:44 AM, Jarkko Sakkinen wrote:
> > > > > On Thu, Nov 14, 2019 at 06:41:51PM +0200, Jarkko Sakkinen wrote:
> > > > > > On Tue, Nov 12, 2019 at 03:27:25PM -0500, Stefan Berger wrote:
> > > > > > > From: Stefan Berger <stefanb@linux.ibm.com>
> > > > > > >
> > > > > > > Move the setting of the TPM_CHIP_FLAG_IRQ for irq probing into
> > > > > > > tpm_tis_probe_irq_single before calling tpm_tis_gen_interrupt.
> > > > > > > This move handles error conditions better that may arise if anything
> > > > > > > before fails in tpm_tis_probe_irq_single.
> > > > > > >
> > > > > > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > > > > > > Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com>
> > > > > > What about just changing the condition?
> > > > > Also cannot take this since it is not a bug (no fixes tag).
> > > >
> > > > I'll repost but will wait until Jerry has tested it on that machine.
> > > >
> > > >    Stefan
> > > >
> > > >
> > > > >
> > > > > /Jarkko
> > > >
> > > >
> > >
> > > It appears they still have the problem. I'm still waiting on logistics
> > > to send me a system to debug.
> >
> > Which hardware is guaranteed to ignite this? I can try to get test hw
> > for this from somewhere. Kind of looking into this blinded ATM. Dan?
> 
> Jerry had mentioned that this was also occurring on T490s. Otherwise
> I'll ping you offline about the system I saw this on internally.

I'll see if I can get my hands on T490 or T490s or something with
equivalent hardware.

/Jarkko

^ permalink raw reply

* [PATCH] Kernel Lockdown: Add an option to allow raw MSR access even, in confidentiality mode.
From: Matt Parnell @ 2019-11-30  6:49 UTC (permalink / raw)
  To: linux-security-module; +Cc: dhowells, matthew.garrett, keescook


[-- Attachment #1.1: Type: text/plain, Size: 2587 bytes --]

From 452b8460e464422d268659a8abb93353a182f8c8 Mon Sep 17 00:00:00 2001
From: Matt Parnell <mparnell@gmail.com>
Date: Sat, 30 Nov 2019 00:44:09 -0600
Subject: [PATCH] Kernel Lockdown: Add an option to allow raw MSR access even
 in confidentiality mode.

For Intel CPUs, some of the MDS mitigations utilize the new "flush" MSR, and
while this isn't something normally used in userspace, it does cause false
positives for the "Forshadow" vulnerability.

Additionally, Intel CPUs use MSRs for voltage and frequency controls,
which in
many cases is useful for undervolting to avoid excess heat.

Signed-off-by: Matt Parnell <mparnell@gmail.com>
---
 arch/x86/kernel/msr.c     |  5 ++++-
 security/lockdown/Kconfig | 12 ++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 1547be359d7f..4adce59455c3 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -80,10 +80,11 @@ static ssize_t msr_write(struct file *file, const
char __user *buf,
     int err = 0;
     ssize_t bytes = 0;
 
+#if defined(LOCK_DOWN_DENY_RAW_MSR)
     err = security_locked_down(LOCKDOWN_MSR);
     if (err)
         return err;
-
+#endif
     if (count % 8)
         return -EINVAL;    /* Invalid chunk size */
 
@@ -135,9 +136,11 @@ static long msr_ioctl(struct file *file, unsigned
int ioc, unsigned long arg)
             err = -EFAULT;
             break;
         }
+#if defined(LOCK_DOWN_DENY_RAW_MSR)
         err = security_locked_down(LOCKDOWN_MSR);
         if (err)
             break;
+#endif
         err = wrmsr_safe_regs_on_cpu(cpu, regs);
         if (err)
             break;
diff --git a/security/lockdown/Kconfig b/security/lockdown/Kconfig
index e84ddf484010..f4fe72c4bf8f 100644
--- a/security/lockdown/Kconfig
+++ b/security/lockdown/Kconfig
@@ -44,4 +44,16 @@ config LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY
      code to read confidential material held inside the kernel are
      disabled.
 
+config LOCK_DOWN_DENY_RAW_MSR
+    bool "Lock down and deny raw MSR access"
+    depends on LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY
+    default y
+    help
+      Some Intel based systems require raw MSR access to use the flush
+      MSR for MDS mitigation confirmation. Raw access can also be used
+      to undervolt many Intel CPUs.
+
+      Say Y to prevent access or N to allow raw MSR access for such
+      cases.
+
 endchoice
-- 
2.24.0



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox