Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [GIT PULL] SELinux fixes for v5.14 (#1)
From: pr-tracker-bot @ 2021-08-05 19:26 UTC (permalink / raw)
  To: Paul Moore; +Cc: Linus Torvalds, selinux, linux-security-module, linux-kernel
In-Reply-To: <CAHC9VhSW0zVR7wB9dxR-AkQAMK_H_fKQ75tTbMLomkBQzfzciw@mail.gmail.com>

The pull request you sent on Thu, 5 Aug 2021 14:29:27 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git tags/selinux

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/0b53abfc5f66449d42fb1738c1c191e29e3be2e4

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply

* [GIT PULL] SELinux fixes for v5.14 (#1)
From: Paul Moore @ 2021-08-05 18:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: selinux, linux-security-module, linux-kernel

Hi Linus,

One small SELinux fix for v5.14-rcX to fix a problem where an error
code was not being propagated back up to userspace when a bogus
SELinux policy is loaded into the kernel.  The patch passes the
selinux-testsuite and merges cleanly with your tree as of a few
minutes ago; please merge this for the next release.

Thanks,
-Paul

--
The following changes since commit d99cf13f14200cdb5cbb704345774c9c0698612d:

 selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit() (2021-0
6-11 13:11:45 -0400)

are available in the Git repository at:

 git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git tags/selinux
-pr-20210805

for you to fetch changes up to 4c156084daa8ee70978e4b150b5eb5fc7b1f15be:

 selinux: correct the return value when loads initial sids (2021-08-02 09:59:50
-0400)

----------------------------------------------------------------
selinux/stable-5.14 PR 20210805

----------------------------------------------------------------
Xiu Jianfeng (1):
     selinux: correct the return value when loads initial sids

security/selinux/ss/policydb.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* RE: [RFC][PATCH v2 06/12] diglim: Interfaces - digest_list_add, digest_list_del
From: Roberto Sassu @ 2021-08-05 17:04 UTC (permalink / raw)
  To: Mimi Zohar, gregkh@linuxfoundation.org, mchehab+huawei@kernel.org
  Cc: linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eric Snowberg
In-Reply-To: <e886224b50195a2c1324c91b39514395e9780b06.camel@linux.ibm.com>

> From: Mimi Zohar [mailto:zohar@linux.ibm.com]
> Sent: Thursday, August 5, 2021 5:38 PM
> [Cc'ing Eric Snowberg]
> 
> Hi Roberto,
> 
> On Mon, 2021-08-02 at 16:54 +0000, Roberto Sassu wrote:
> 
> > > > Properly identifying (all) user space parser(s) would be critical.  It
> > > > would be simpler and  safer to require the converted data be signed.
> >
> > When a process directly uploads a buffer to the kernel, the actions are
> > added to a digest list depending on the result of ima_measure_critical_data()
> > and from the actions attached to the process credentials and set by the
> > new LSM.
> >
> > If a process fails the identification, the actions in the process credentials
> > remain zero and the digest lists the process uploads will be ignored by IMA.
> >
> > The actions in the process credentials are set with the actions performed
> > on the executable by IMA, only if the digest of the executable is found in
> > a digest list and the digest list type is COMPACT_PARSER. The parser is
> > statically linked.
> >
> > The digest list for the parser can be generated at the end of the
> > building process and signed similarly to kernel modules (for SUSE,
> > with pesign-obs-integration). This is the only exception to handle,
> > other packages are not affected.
> 
> Ok, so to boot strap the set of permitted digest list parsers, the
> digest list signature is an appended signature, generated by the build
> process.  The key needed for verifying the signature would already be
> loaded on the builtin keyring.

Hi Mimi

yes. RPM headers will have an appended signature too, so that
appraisal will work.

> > After the parser has been identified, each file operation is monitored.
> 
> Does the new LSM need to monitor all file opens?

I would say yes. In the threat model, root is untrusted and
can potentially interfere with the conversion of the original
digest lists. Other than monitoring file operations, I'm also
denying ptraces on the parser. Hopefully this would be
sufficient, but any suggestion is more than welcome.

The good thing is that the policy of the new LSM is applied
to the processes that are successfully identified as parser.
Other processes are mostly unaffected.

The only limitation the new LSM would introduce is that
the files being used by the parser are write-locked until
the parser releases them (if files are already opened for
writing by other processes, the LSM would mark the parser
as untrusted and will not apply any IMA actions to the digest
lists the parser uploads).

It is probably a good idea to send the patch, after I finish
testing it. I will send also another patch for loading digest
lists during kernel initialization (with the two new patches
the non-IMA part would be complete).

> > The LSM has to explicitly perform a second open to ensure that
> > the file is measured/appraised before the integrity_iint_cache structure
> > is retrieved (because IMA is called after all LSMs).
> >
> > If an action is missing from the integrity_iint_cache structure, it
> > will be cleared by the LSM in the actions attached to the process
> > credentials, and will not be added to the digest list being uploaded.
> >
> > > I agree, it would be much easier. However, it would require changes
> > > to the building infrastructure of Linux distribution vendors, which
> > > might limit the applicability of DIGLIM.
> > >
> 
> I understand, but instead of the distros signing the compact digest
> lists, with Eric's  "Enroll kernel keys thru MOK" patch set, the
> customer/end user could have more control over which file digests are
> permitted on a per system basis.

Yes, generating custom digest lists is supported and needed if
users want to run their own applications, when appraisal is
enforced. But I like the idea that, if users simply want to just run
anything the distribution provides, they have everything they
need. Theoretically, users will be able to run appraisal in enforcing
mode at the first boot after installation.

Thanks

Roberto

HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Li Jian, Shi Yanli

> > > With the user space parser taking care of the conversion, distributions
> > > can do appraisal of executables and shared libraries with an update of:
> > > - the kernel: to add DIGLIM
> > > - dracut: to add required digest lists in the initial ram disk
> > > - rpm (plugin): to extract the RPM header and its signature and write
> > >   them to a file that is uploaded to the kernel by the user space parser
> > >
> > > I'm planning to append the signature at the end of the RPM header
> > > (and use appraise_type=modsig) to avoid the dependency on the
> > > 'initramfs: add support for xattrs in the initial ram disk' patch set
> > > (which I might try to resume in the future).
> 
> Based on your explanation above, I surmised as much.
> 
> thanks,
> 
> Mimi


^ permalink raw reply

* Re: [RFC][PATCH v2 06/12] diglim: Interfaces - digest_list_add, digest_list_del
From: Mimi Zohar @ 2021-08-05 15:38 UTC (permalink / raw)
  To: Roberto Sassu, gregkh@linuxfoundation.org,
	mchehab+huawei@kernel.org
  Cc: linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eric Snowberg
In-Reply-To: <f7adeb81bab24b689c3e1aa61d83c6f5@huawei.com>

[Cc'ing Eric Snowberg]

Hi Roberto,

On Mon, 2021-08-02 at 16:54 +0000, Roberto Sassu wrote:

> > > Properly identifying (all) user space parser(s) would be critical.  It
> > > would be simpler and  safer to require the converted data be signed.
> 
> When a process directly uploads a buffer to the kernel, the actions are
> added to a digest list depending on the result of ima_measure_critical_data()
> and from the actions attached to the process credentials and set by the
> new LSM.
> 
> If a process fails the identification, the actions in the process credentials
> remain zero and the digest lists the process uploads will be ignored by IMA.
> 
> The actions in the process credentials are set with the actions performed
> on the executable by IMA, only if the digest of the executable is found in
> a digest list and the digest list type is COMPACT_PARSER. The parser is
> statically linked.
> 
> The digest list for the parser can be generated at the end of the
> building process and signed similarly to kernel modules (for SUSE,
> with pesign-obs-integration). This is the only exception to handle,
> other packages are not affected.

Ok, so to boot strap the set of permitted digest list parsers, the
digest list signature is an appended signature, generated by the build
process.  The key needed for verifying the signature would already be
loaded on the builtin keyring.

> 
> After the parser has been identified, each file operation is monitored.

Does the new LSM need to monitor all file opens?

> The LSM has to explicitly perform a second open to ensure that
> the file is measured/appraised before the integrity_iint_cache structure
> is retrieved (because IMA is called after all LSMs).
> 
> If an action is missing from the integrity_iint_cache structure, it
> will be cleared by the LSM in the actions attached to the process
> credentials, and will not be added to the digest list being uploaded.
> 
> > I agree, it would be much easier. However, it would require changes
> > to the building infrastructure of Linux distribution vendors, which
> > might limit the applicability of DIGLIM.
> > 

I understand, but instead of the distros signing the compact digest
lists, with Eric's  "Enroll kernel keys thru MOK" patch set, the
customer/end user could have more control over which file digests are
permitted on a per system basis.

> > With the user space parser taking care of the conversion, distributions
> > can do appraisal of executables and shared libraries with an update of:
> > - the kernel: to add DIGLIM
> > - dracut: to add required digest lists in the initial ram disk
> > - rpm (plugin): to extract the RPM header and its signature and write
> >   them to a file that is uploaded to the kernel by the user space parser
> > 
> > I'm planning to append the signature at the end of the RPM header
> > (and use appraise_type=modsig) to avoid the dependency on the
> > 'initramfs: add support for xattrs in the initial ram disk' patch set
> > (which I might try to resume in the future).

Based on your explanation above, I surmised as much.

thanks,

Mimi


^ permalink raw reply

* Re: [PATCH RFC v2 02/12] KEYS: CA link restriction
From: Mimi Zohar @ 2021-08-05 14:00 UTC (permalink / raw)
  To: Eric Snowberg, keyrings, linux-integrity, dhowells, dwmw2,
	herbert, davem, jarkko, jmorris, serge
  Cc: keescook, gregkh, torvalds, scott.branden, weiyongjun1, nayna,
	ebiggers, ardb, nramas, lszubowi, linux-kernel, linux-crypto,
	linux-security-module, James.Bottomley, pjones, glin, konrad.wilk
In-Reply-To: <20210726171319.3133879-3-eric.snowberg@oracle.com>

On Mon, 2021-07-26 at 13:13 -0400, Eric Snowberg wrote:
> Add a new link restriction.  Restrict the addition of keys in a keyring
> based on the key to be added being a CA (self-signed) or by being
> vouched for by a key in either the built-in or the secondary trusted
> keyrings.
> 
> Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>

As discussed, please remove "or by being vouched for by a key in
either the built-in or the secondary trusted keyrings."

If these keys were to be loaded onto a keyring other than the platform
keyring, they should be loaded onto the secondary keyring.  The
secondary restriction currently allows certificates signed by keys on
either the builtin or the secondary keyring, to be loaded onto the
secondary keyring.   A new restriction would also allow certificates
signed by keys on the ".mok" keyring.

> +/**
> + * restrict_link_by_ca - Restrict additions to a ring of public keys
> + * based on it being a CA
> + * @dest_keyring: Keyring being linked to.
> + * @type: The type of key being added.
> + * @payload: The payload of the new key.
> + * @trusted: A key or ring of keys that can be used to vouch for the new cert.
> + *
> + * Check if the new certificate is a CA or if they key can be vouched for
> + * by keys already linked in the destination keyring or the trusted
> + * keyring.  If one of those is the signing key or it is self signed, then
> + * mark the new certificate as being ok to link.
> + *
> + * Returns 0 if the new certificate was accepted, -ENOKEY if we could not find
> + * a matching parent certificate in the trusted list.  -ENOPKG if the signature
> + * uses unsupported crypto, or some other error if there is a matching
> + * certificate  but the signature check cannot be performed.
> + */

Please update the function description as discussed, removing "or if they
key can be vouched for by keys already linked in the destination
keyring or the trusted keyring."

The kernel doc "Brief description of function" should not wrap.  The
variable definition shouldn't be suffixed with a period.  Please refer
to Documentation/doc-guide/kernel-doc.rst.

> +int restrict_link_by_ca(struct key *dest_keyring,

The UEFI db CA keys should only be loaded on boot.  Should this be
annotated as __init?

> +			const struct key_type *type,
> +			const union key_payload *payload,
> +			struct key *trust_keyring)
> +{
> +	const struct public_key_signature *sig;
> +	const struct public_key *pkey;
> +	struct key *key;
> +	int ret;
> +
> +	if (type != &key_type_asymmetric)
> +		return -EOPNOTSUPP;
> +
> +	sig = payload->data[asym_auth];
> +	if (!sig)
> +		return -ENOPKG;
> +
> +	if (!sig->auth_ids[0] && !sig->auth_ids[1])
> +		return -ENOKEY;
> +
> +	pkey = payload->data[asym_crypto];
> +	if (!pkey)
> +		return -ENOPKG;
> +
> +	ret = public_key_verify_signature(pkey, sig);
> +	if (!ret)
> +		return 0;
> +
> +	if (!trust_keyring)
> +		return -ENOKEY;
> +
> +	key = find_asymmetric_key(trust_keyring,
> +				  sig->auth_ids[0], sig->auth_ids[1],
> +				  false);
> +	if (IS_ERR(key))
> +		return -ENOKEY;
> +
> +	ret = verify_signature(key, sig);
> +	key_put(key);
> +	return ret;
> +}
> +
>  static bool match_either_id(const struct asymmetric_key_ids *pair,
>  			    const struct asymmetric_key_id *single)
>  {


> +extern int restrict_link_by_system_trusted_or_ca(
> +	struct key *dest_keyring,
> +	const struct key_type *type,
> +	const union key_payload *payload,
> +	struct key *restrict_key);

After the discussed change, this shouldn't be needed.

thanks,

Mimi

> +
>  #ifdef CONFIG_SECONDARY_TRUSTED_KEYRING
>  extern int restrict_link_by_builtin_and_secondary_trusted(
>  	struct key *keyring,



^ permalink raw reply

* Re: [PATCH RFC v2 10/12] KEYS: link system_trusted_keys to mok_trusted_keys
From: Mimi Zohar @ 2021-08-05 13:58 UTC (permalink / raw)
  To: Eric Snowberg, keyrings, linux-integrity, dhowells, dwmw2,
	herbert, davem, jarkko, jmorris, serge
  Cc: keescook, gregkh, torvalds, scott.branden, weiyongjun1, nayna,
	ebiggers, ardb, nramas, lszubowi, linux-kernel, linux-crypto,
	linux-security-module, James.Bottomley, pjones, glin, konrad.wilk
In-Reply-To: <20210726171319.3133879-11-eric.snowberg@oracle.com>

On Mon, 2021-07-26 at 13:13 -0400, Eric Snowberg wrote:

> diff --git a/certs/system_keyring.c b/certs/system_keyring.c
> index dcaf74102ab2..b27ae30eaadc 100644
> --- a/certs/system_keyring.c
> +++ b/certs/system_keyring.c
> @@ -45,6 +45,15 @@ int restrict_link_by_builtin_trusted(struct key *dest_keyring,
>  				     const union key_payload *payload,
>  				     struct key *restriction_key)
>  {
> +	/* If the secondary trusted keyring is not enabled, we may link
> +	 * through to the mok keyring and the search may follow that link.
> +	 */

Refer to section "8) Commenting" of Documentation/process/coding-
style.rst for the format of multi line comments.

> +	if (mok_trusted_keys && type == &key_type_keyring &&
> +	    dest_keyring == builtin_trusted_keys &&
> +	    payload == &mok_trusted_keys->payload)
> +		/* Allow the mok keyring to be added to the builtin */
> +		return 0;
> +

Unless you're changing the meaning of the restriction, then a new
restriction needs to be defined.  In this case, please don't change the
meaning of restrict_link_by_builtin_trusted().  Instead define a new
restriction named restrict_link_by_builtin_and_ca_trusted().

>  	return restrict_link_by_signature(dest_keyring, type, payload,
>  					  builtin_trusted_keys);
>  }
> @@ -91,6 +100,15 @@ int restrict_link_by_builtin_and_secondary_trusted(
>  		/* Allow the builtin keyring to be added to the secondary */
>  		return 0;
>  
> +	/* If we have a secondary trusted keyring, it may contain a link
> +	 * through to the mok keyring and the search may follow that link.
> +	 */
> +	if (mok_trusted_keys && type == &key_type_keyring &&
> +	    dest_keyring == secondary_trusted_keys &&
> +	    payload == &mok_trusted_keys->payload)
> +		/* Allow the mok keyring to be added to the secondary */
> +		return 0;
> +

Similarly here, please define a new restriction maybe named
restrict_link_by_builtin_secondary_and_ca_trusted().   To avoid code
duplication, the new restriction could be a wrapper around the existing
function.

>  	return restrict_link_by_signature(dest_keyring, type, payload,
>  					  secondary_trusted_keys);
>  }
> @@ -321,5 +339,8 @@ void __init set_platform_trusted_keys(struct key *keyring)
>  void __init set_mok_trusted_keys(struct key *keyring)
>  {
>  	mok_trusted_keys = keyring;
> +
> +	if (key_link(system_trusted_keys, mok_trusted_keys) < 0)
> +		panic("Can't link (mok) trusted keyrings\n");
>  }

From the thread discussion on 00/12:

Only the builtin keys should ever be on the builtin keyring.  The
builtin keyring would need to be linked to the mok keyring.  But in the
secondary keyring case, the mok keyring would be linked to the
secondary keyring, similar to how the builtin keyring is linked to the
secondary keyring.

        if (key_link(secondary_trusted_keys, builtin_trusted_keys) < 0)
                panic("Can't link trusted keyrings\n");


thanks,

Mimi

>  #endif



^ permalink raw reply

* Re: [PATCH RFC v2 00/12] Enroll kernel keys thru MOK
From: Mimi Zohar @ 2021-08-05 13:58 UTC (permalink / raw)
  To: Eric Snowberg
  Cc: keyrings@vger.kernel.org, linux-integrity@vger.kernel.org,
	David Howells, dwmw2@infradead.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, jarkko@kernel.org, jmorris@namei.org,
	serge@hallyn.com, keescook@chromium.org,
	gregkh@linuxfoundation.org, torvalds@linux-foundation.org,
	scott.branden@broadcom.com, weiyongjun1@huawei.com,
	nayna@linux.ibm.com, ebiggers@google.com, ardb@kernel.org,
	Lakshmi Ramasubramanian, lszubowi@redhat.com,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	James.Bottomley@hansenpartnership.com, pjones@redhat.com,
	glin@suse.com, Konrad Wilk
In-Reply-To: <3D8EF15C-E3D6-4C9A-9BA2-9F4201AC3ED3@oracle.com>

On Wed, 2021-08-04 at 02:56 +0000, Eric Snowberg wrote:

> Ok, I’ll update the current code to just load CA keys into the mok in v3.  This would
> simplify the new restrict_link_by_ca function. 

Thank you!
> 
> With that change, do you see any issues with how I’m doing the linking?  With the 
> mok keyring linked to the secondary keyring, the platform keyring may only contain 
> a subset of the keys it originally contained.  But, as I described above, I don’t believe
> it will lead to a regression since all keys get referenced. Thanks.

I think there is a problem.  Only the builtin keys should ever be on
the builtin keyring.  The builtin keyring would need to be linked to
the mok keyring.  But in the secondary keyring case, the linking
should be the reverse, where the mok keyring would be linked to the
secondary keyring, similar to how the builtin keyring is linked to the
secondary keyring.

        if (key_link(secondary_trusted_keys, builtin_trusted_keys) < 0)
                panic("Can't link trusted keyrings\n");


thanks,

Mimi


^ permalink raw reply

* Re: [REGRESSION?] Simultaneous writes to a reader-less, non-full pipe can hang
From: Linus Torvalds @ 2021-08-04 20:04 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: acrichton, Christian Brauner, David Howells, Greg Kroah-Hartman,
	keyrings, Linux API, linux-block, linux-fsdevel,
	Linux Kernel Mailing List, Rasmus Villemoes, LSM List, linux-usb,
	Nicolas Dichtel, Peter Zijlstra, Ian Kent
In-Reply-To: <1628105897.vb3ko0vb06.none@localhost>

On Wed, Aug 4, 2021 at 12:48 PM Alex Xu (Hello71) <alex_y_xu@yahoo.ca> wrote:
>
> I agree that if this only affects programs which intentionally adjust
> the pipe buffer size, then it is not a huge issue. The problem,
> admittedly buried very close to the bottom of my email, is that the
> kernel will silently provide one-page pipe buffers if the user has
> exceeded 16384 (by default) pipe buffer pages allocated.

That's a good point.

That "fall back to a single buffer" case is meant to make things
hobble along if the user has exhausted the pipe buffers, but you're
right that we might want to make that minimum be two buffers.

I didn't test this, but the obvious fix seems to be just increasing
the '1' to '2'.

  @@ -781,8 +784,8 @@ struct pipe_inode_info *alloc_pipe_info(void)
          user_bufs = account_pipe_buffers(user, 0, pipe_bufs);

          if (too_many_pipe_buffers_soft(user_bufs) &&
pipe_is_unprivileged_user()) {
  -               user_bufs = account_pipe_buffers(user, pipe_bufs, 1);
  -               pipe_bufs = 1;
  +               user_bufs = account_pipe_buffers(user, pipe_bufs, 2);
  +               pipe_bufs = 2;
          }

          if (too_many_pipe_buffers_hard(user_bufs) &&
pipe_is_unprivileged_user())

although a real patch would need a comment about how a single buffer
is problematic, and probably make the '2' be a #define to not just
repeat the same magic constant silently.

IOW, something like

  /*
   * The general pipe use case needs at least two buffers: one
   * for data yet to be read, and one for new data
   */
  #define DEF_MIN_PIPE_BUFFERS 2

to go with the existing PIPE_DEF_BUFFERS (although the
DEF_MIN_PIPE_BUFFERS use would only be in fs/pipe.c, so I guess it
doesn't actually need to be exposed to anybody else in the pipe.h
header file).

I'd take that patch with some proper testing and a commit message.

Hint hint ;)

                Linus

^ permalink raw reply

* Re: [REGRESSION?] Simultaneous writes to a reader-less, non-full pipe can hang
From: Alex Xu (Hello71) @ 2021-08-04 19:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: acrichton, Christian Brauner, David Howells, Greg Kroah-Hartman,
	keyrings, Linux API, linux-block, linux-fsdevel,
	Linux Kernel Mailing List, Rasmus Villemoes, LSM List, linux-usb,
	Nicolas Dichtel, Peter Zijlstra, Ian Kent
In-Reply-To: <CAHk-=wiLr55zHUWNzmp3DeoO0DUaYp7vAzQB5KUCni5FpwC7Uw@mail.gmail.com>

Excerpts from Linus Torvalds's message of August 4, 2021 12:31 pm:
> Your program is buggy.
> 
> On Wed, Aug 4, 2021 at 8:37 AM Alex Xu (Hello71) <alex_y_xu@yahoo.ca> wrote:
>>
>>     pipe(pipefd);
>>     printf("init buffer: %d\n", fcntl(pipefd[1], F_GETPIPE_SZ));
>>     printf("new buffer:  %d\n", fcntl(pipefd[1], F_SETPIPE_SZ, 0));
> 
> Yeah, what did you expect this to do? You said you want a minimal
> buffer, you get a really small buffer.
> 
> Then you try to write multiple messages to the pipe that you just said
> should have a minimum size.
> 
> Don't do that then.
> 
>> /proc/x/stack shows that the remaining thread is hanging at pipe.c:560.
>> It looks like not only there needs to be space in the pipe, but also
>> slots.
> 
> Correct. The fullness of a pipe is not about whether it has the
> possibility of merging more bytes into an existing not-full slot, but
> about whether it has empty slots left.
> 
> Part of that is simply the POSIX pipe guarantees - a write of size
> PIPE_BUF or less is guaranteed to be atomic, so it mustn't be split
> among buffers.
> 
> So a pipe must not be "writable" unless it has space for at least that
> much (think select/poll, which don't know the size of the write).
> 
> The fact that we might be able to reuse a partially filled buffer for
> smaller writes is simply not relevant to that issue.
> 
> And yes, we could have different measures of "could write" for
> different writes, but we just don't have or want that complexity.
> 
> Please don't mess with F_SETPIPE_SZ unless you have a really good
> reason to do so, and actually understand what you are doing.
> 
> Doing a F_SETPIPE_SZ, 0 basically means "I want the mimimum pipe size
> possible". And that one accepts exactly one write at a time.
> 
> Of course, the exact semantics are much more complicated than that
> "exactly one write". The pipe write code will optimistically merge
> writes into a previous buffer, which means that depending on the
> pattern of your writes, the exact number of bytes you can write will
> be very different.
> 
> But that "merge writes into a previous buffer" only appends to the
> buffer - not _reuse_ it - so when each buffer is one page in size,
> what happens is that you can merge up to 4096 bytes worth of writes,
> but then after that the pipe write will want a new buffer - even if
> the old buffer is now empty because of old reads.
> 
> That's why your test program won't block immediately: both writers
> will actually start out happily doing writes into that one buffer that
> is allocated, but at some point that buffer ends, and it wants to
> allocate a new buffer.
> 
> But you told it not to allocate more buffers, and the old buffer is
> never completely empty because your readers never read _everythign_,
> so it will hang, waiting for you to empty the one minimal buffer it
> allocated. And that will never happen.
> 
> There's a very real reason why we do *not* by default say "pipes can
> only ever use only one buffer".
> 
> I don't think this is a regression, but if you have an actual
> application - not a test program - that does crazy things like this
> and used to work (I'm not sure it has ever worked, though), we can
> look into making it work again.
> 
> That said, I suspect the way to make it work is to just say "the
> minimum pipe size is two slots" rather than change the "we want at
> least one empty slot". Exactly because of that whole "look, we must
> not consider a pipe that doesn't have a slot writable".
> 
> Because clearly people don't understand how subtle F_SETPIPE_SZ is.
> It's not really a "byte count", even though that is how it's
> expressed.
> 
>                    Linus
> 

I agree that if this only affects programs which intentionally adjust 
the pipe buffer size, then it is not a huge issue. The problem, 
admittedly buried very close to the bottom of my email, is that the 
kernel will silently provide one-page pipe buffers if the user has 
exceeded 16384 (by default) pipe buffer pages allocated. Try this 
slightly more complicated program:

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static int pipefd[2];

void *thread_start(void *arg) {
    char buf[1];
    int i;
    for (i = 0; i < 1000000; i++) {
        read(pipefd[0], buf, sizeof(buf));
        if (write(pipefd[1], buf, sizeof(buf)) == -1)
            break;
    }
    printf("done %d\n", i);
    return NULL;
}

int main() {
    signal(SIGPIPE, SIG_IGN);
    // pretend there are a very large number of make processes running
    for (int i = 0; i < 1025; i++)
    {
        pipe(pipefd);
        write(pipefd[1], "aa", 2);
    }
    printf("%d %d\n", pipefd[0], pipefd[1]);
    printf("init buffer: %d\n", fcntl(pipefd[1], F_GETPIPE_SZ));
    //printf("new buffer:  %d\n", fcntl(pipefd[1], F_SETPIPE_SZ, 0));
    pthread_t thread1, thread2;
    pthread_create(&thread1, NULL, thread_start, NULL);
    pthread_create(&thread2, NULL, thread_start, NULL);
    sleep(3);
    close(pipefd[0]);
    close(pipefd[1]);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
}

You may need to raise your RLIMIT_NOFILE before running the program.

With default settings (fs.pipe-user-pages-soft = 16384) the init buffer 
will be 4096, no mangling required. I think this could be probably be 
solved if the kernel instead reduced over-quota pipes to two pages 
instead of one page. If someone wants to set a one-page pipe buffer, 
then they can suffer the consequences, but the kernel shouldn't silently 
hand people that footgun.

Regards,
Alex.

^ permalink raw reply

* Re: [syzbot] BUG: unable to handle kernel paging request in cap_capable
From: Nikolay Aleksandrov @ 2021-08-04 18:16 UTC (permalink / raw)
  To: syzbot, jmorris, linux-kernel, linux-security-module, netdev,
	serge, syzkaller-bugs, Arnd Bergmann
In-Reply-To: <00000000000009422905c8bf2102@google.com>

On 04/08/2021 20:28, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    b820c114eba7 net: fec: fix MAC internal delay doesn't work
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=11fbdd7a300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e6797705e664e3b
> dashboard link: https://syzkaller.appspot.com/bug?extid=79f4a8692e267bdb7227
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=127e4952300000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16fef2aa300000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+79f4a8692e267bdb7227@syzkaller.appspotmail.com
> 
> netdevsim netdevsim0 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0
> netdevsim netdevsim0 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0
> netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
> BUG: unable to handle page fault for address: fffff3008f71a93b
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0 
> Oops: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 8445 Comm: syz-executor610 Not tainted 5.14.0-rc3-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:cap_capable+0xa0/0x280 security/commoncap.c:83
[snip]
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches
> 

+CC Arnd

I think this bug is also from the recent ndo_do_ioctl conversion, it seems
nothing checks if the device is an actual bridge so one could call br_ioctl_call()
for any device. I'll add this to my list, it's a simple fix.

Sorry for the delay, I will send both bridge ioctl fixes tomorrow.

Thanks,
 Nik

^ permalink raw reply

* [syzbot] BUG: unable to handle kernel paging request in cap_capable
From: syzbot @ 2021-08-04 17:28 UTC (permalink / raw)
  To: jmorris, linux-kernel, linux-security-module, netdev, serge,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    b820c114eba7 net: fec: fix MAC internal delay doesn't work
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11fbdd7a300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5e6797705e664e3b
dashboard link: https://syzkaller.appspot.com/bug?extid=79f4a8692e267bdb7227
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=127e4952300000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16fef2aa300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+79f4a8692e267bdb7227@syzkaller.appspotmail.com

netdevsim netdevsim0 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
BUG: unable to handle page fault for address: fffff3008f71a93b
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 8445 Comm: syz-executor610 Not tainted 5.14.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:cap_capable+0xa0/0x280 security/commoncap.c:83
Code: 0f 8e d3 01 00 00 41 8b ad e0 00 00 00 49 bc 00 00 00 00 00 fc ff df e8 ce ef da fd 48 8d bb e0 00 00 00 48 89 f8 48 c1 e8 03 <42> 0f b6 04 20 84 c0 74 08 3c 03 0f 8e 64 01 00 00 44 8b bb e0 00
RSP: 0018:ffffc90001c3f9d8 EFLAGS: 00010a02
RAX: 1ffff7008f71a93b RBX: ffffb8047b8d48f8 RCX: 0000000000000000
RDX: ffff88802dad54c0 RSI: ffffffff839aad82 RDI: ffffb8047b8d49d8
RBP: 0000000000000000 R08: 0000000000000028 R09: ffffffff87f70fe6
R10: ffffffff814731be R11: 00000000000089a2 R12: dffffc0000000000
R13: ffffffff8b83b660 R14: ffff8880155fa500 R15: ffff8880155fa500
FS:  000000000105d300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffff3008f71a93b CR3: 000000002789f000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 security_capable+0x5c/0xc0 security/security.c:805
 ns_capable_common kernel/capability.c:375 [inline]
 ns_capable+0x6f/0x100 kernel/capability.c:396
 add_del_if+0x9b/0x140 net/bridge/br_ioctl.c:89
 br_ioctl_stub+0x1c6/0x7f0 net/bridge/br_ioctl.c:397
 br_ioctl_call+0x5e/0xa0 net/socket.c:1092
 dev_ifsioc+0xc1f/0xf60 net/core/dev_ioctl.c:382
 dev_ioctl+0x1b9/0xee0 net/core/dev_ioctl.c:580
 sock_do_ioctl+0x18b/0x210 net/socket.c:1129
 sock_ioctl+0x2f1/0x640 net/socket.c:1232
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:1069 [inline]
 __se_sys_ioctl fs/ioctl.c:1055 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:1055
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4430a9
Code: 28 c3 e8 4a 15 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe266e8a48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe266e8a58 RCX: 00000000004430a9
RDX: 0000000020000080 RSI: 00000000000089a2 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe266e8a60
R13: 00007ffe266e8a80 R14: 00000000004b8018 R15: 00000000004004b8
Modules linked in:
CR2: fffff3008f71a93b
---[ end trace 5ec5ff62110878d2 ]---
RIP: 0010:cap_capable+0xa0/0x280 security/commoncap.c:83
Code: 0f 8e d3 01 00 00 41 8b ad e0 00 00 00 49 bc 00 00 00 00 00 fc ff df e8 ce ef da fd 48 8d bb e0 00 00 00 48 89 f8 48 c1 e8 03 <42> 0f b6 04 20 84 c0 74 08 3c 03 0f 8e 64 01 00 00 44 8b bb e0 00
RSP: 0018:ffffc90001c3f9d8 EFLAGS: 00010a02
RAX: 1ffff7008f71a93b RBX: ffffb8047b8d48f8 RCX: 0000000000000000
RDX: ffff88802dad54c0 RSI: ffffffff839aad82 RDI: ffffb8047b8d49d8
RBP: 0000000000000000 R08: 0000000000000028 R09: ffffffff87f70fe6
R10: ffffffff814731be R11: 00000000000089a2 R12: dffffc0000000000
R13: ffffffff8b83b660 R14: ffff8880155fa500 R15: ffff8880155fa500
FS:  000000000105d300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffff3008f71a93b CR3: 000000002789f000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [REGRESSION?] Simultaneous writes to a reader-less, non-full pipe can hang
From: Linus Torvalds @ 2021-08-04 16:31 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: Linux Kernel Mailing List, David Howells, acrichton,
	Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra,
	Nicolas Dichtel, Ian Kent, Christian Brauner, keyrings, linux-usb,
	linux-block, LSM List, linux-fsdevel, Linux API
In-Reply-To: <1628086770.5rn8p04n6j.none@localhost>

Your program is buggy.

On Wed, Aug 4, 2021 at 8:37 AM Alex Xu (Hello71) <alex_y_xu@yahoo.ca> wrote:
>
>     pipe(pipefd);
>     printf("init buffer: %d\n", fcntl(pipefd[1], F_GETPIPE_SZ));
>     printf("new buffer:  %d\n", fcntl(pipefd[1], F_SETPIPE_SZ, 0));

Yeah, what did you expect this to do? You said you want a minimal
buffer, you get a really small buffer.

Then you try to write multiple messages to the pipe that you just said
should have a minimum size.

Don't do that then.

> /proc/x/stack shows that the remaining thread is hanging at pipe.c:560.
> It looks like not only there needs to be space in the pipe, but also
> slots.

Correct. The fullness of a pipe is not about whether it has the
possibility of merging more bytes into an existing not-full slot, but
about whether it has empty slots left.

Part of that is simply the POSIX pipe guarantees - a write of size
PIPE_BUF or less is guaranteed to be atomic, so it mustn't be split
among buffers.

So a pipe must not be "writable" unless it has space for at least that
much (think select/poll, which don't know the size of the write).

The fact that we might be able to reuse a partially filled buffer for
smaller writes is simply not relevant to that issue.

And yes, we could have different measures of "could write" for
different writes, but we just don't have or want that complexity.

Please don't mess with F_SETPIPE_SZ unless you have a really good
reason to do so, and actually understand what you are doing.

Doing a F_SETPIPE_SZ, 0 basically means "I want the mimimum pipe size
possible". And that one accepts exactly one write at a time.

Of course, the exact semantics are much more complicated than that
"exactly one write". The pipe write code will optimistically merge
writes into a previous buffer, which means that depending on the
pattern of your writes, the exact number of bytes you can write will
be very different.

But that "merge writes into a previous buffer" only appends to the
buffer - not _reuse_ it - so when each buffer is one page in size,
what happens is that you can merge up to 4096 bytes worth of writes,
but then after that the pipe write will want a new buffer - even if
the old buffer is now empty because of old reads.

That's why your test program won't block immediately: both writers
will actually start out happily doing writes into that one buffer that
is allocated, but at some point that buffer ends, and it wants to
allocate a new buffer.

But you told it not to allocate more buffers, and the old buffer is
never completely empty because your readers never read _everythign_,
so it will hang, waiting for you to empty the one minimal buffer it
allocated. And that will never happen.

There's a very real reason why we do *not* by default say "pipes can
only ever use only one buffer".

I don't think this is a regression, but if you have an actual
application - not a test program - that does crazy things like this
and used to work (I'm not sure it has ever worked, though), we can
look into making it work again.

That said, I suspect the way to make it work is to just say "the
minimum pipe size is two slots" rather than change the "we want at
least one empty slot". Exactly because of that whole "look, we must
not consider a pipe that doesn't have a slot writable".

Because clearly people don't understand how subtle F_SETPIPE_SZ is.
It's not really a "byte count", even though that is how it's
expressed.

                   Linus

^ permalink raw reply

* [REGRESSION?] Simultaneous writes to a reader-less, non-full pipe can hang
From: Alex Xu (Hello71) @ 2021-08-04 15:37 UTC (permalink / raw)
  To: linux-kernel, dhowells, acrichton
  Cc: torvalds, Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra,
	nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb,
	linux-block, linux-security-module, linux-fsdevel, linux-api
In-Reply-To: <1628086770.5rn8p04n6j.none.ref@localhost>

Hi,

An issue "Jobserver hangs due to full pipe" was recently reported 
against Cargo, the Rust package manager. This was diagnosed as an issue 
with pipe writes hanging in certain circumstances.

Specifically, if two or more threads simultaneously write to a pipe, it 
is possible for all the writers to hang despite there being significant 
space available in the pipe.

I have translated the Rust example to C with some small adjustments:

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int pipefd[2];

void *thread_start(void *arg) {
    char buf[1];
    for (int i = 0; i < 1000000; i++) {
        read(pipefd[0], buf, sizeof(buf));
        write(pipefd[1], buf, sizeof(buf));
    }
    puts("done");
    return NULL;
}

int main() {
    pipe(pipefd);
    printf("init buffer: %d\n", fcntl(pipefd[1], F_GETPIPE_SZ));
    printf("new buffer:  %d\n", fcntl(pipefd[1], F_SETPIPE_SZ, 0));
    write(pipefd[1], "aa", 2);
    pthread_t thread1, thread2;
    pthread_create(&thread1, NULL, thread_start, NULL);
    pthread_create(&thread2, NULL, thread_start, NULL);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
}

The expected behavior of this program is to print:

init buffer: 65536
new buffer:  4096
done
done

and then exit.

On Linux 5.14-rc4, compiling this program and running it will print the 
following about half the time:

init buffer: 65536
new buffer:  4096
done

and then hang. This is unexpected behavior, since the pipe is at most 
two bytes full at any given time.

/proc/x/stack shows that the remaining thread is hanging at pipe.c:560. 
It looks like not only there needs to be space in the pipe, but also 
slots. At pipe.c:1306, a one-page pipe has only one slot. this led me to 
test nthreads=2, which also hangs. Checking blame of the pipe_write 
comment, it was added in a194dfe, which says, among other things:

> We just abandon the preallocated slot if we get a copy error.  Future
> writes may continue it and a future read will eventually recycle it.

This matches the observed behavior: in this case, there are no readers 
on the pipe, so the abandoned slot is lost.

In my opinion (as expressed on the issue), the pipe is being misused 
here. As explained in the pipe(7) manual page:

> Applications should not rely on a particular capacity: an application 
> should be designed so that a reading process consumes data as soon as 
> it is available, so that a writing process does not remain blocked.

Despite the misuse, I am reporting this for the following reasons:

1. I am reasonably confident that this is a regression in the kernel, 
   which has a standard of making reasonable efforts to maintain 
   backwards compatibility even with broken programs.

2. Even if this is not a regression, it seems like this situation could 
   be handled somewhat more gracefully. In this case, we are not writing 
   4095 bytes and then expecting a one-byte write to succeed; the pipe 
   is actually almost entirely empty.

3. Pipe sizes dynamically shrink in Linux, so despite the fact that this 
   case is unlikely to occur with two or more slots available, even a 
   program which does not explicitly allocate a one-page pipe buffer may 
   wind up with one if the user has 1024 or more pipes already open. 
   This significantly exacerbates the next point:

4. GNU make's jobserver uses pipes in a similar manner. By my reading of 
   the paper, it is theoretically possible for an N simultaneous writes 
   to occur without any readers, where N is the maximum concurrent jobs 
   permitted.

   Consider the following example with make -j2: two compile jobs are to 
   be performed: one at the top level, and one in a sub-directory. The 
   top-level make invokes one make and one cc, costing two tokens. The 
   sub-make invokes one cc with its free token. The pipe is now empty. 
   Now, suppose the two compilers return at exactly the same time. Both 
   copies of make will attempt to simultaneously write a token to the 
   pipe. This does not yet trigger deadlock: at least one write will 
   always succeed on an empty pipe. Suppose the sub-make's write goes 
   through. It then exits. The top-level make, however, is still blocked 
   on its original write, since it was not successfully merged with the 
   other write. The build is now deadlocked.

   I think this does not happen only by a coincidental design decision: 
   when the sub-make exits, the top-level make receives a SIGCHLD. GNU 
   make registers a SA_RESTART handler for SIGCHLD, so the write will be 
   interrupted and restarted. This is only a coincidence, however: the 
   program does not actually expect writing to the control pipe to ever 
   block; it could just as well de-register the signal handler while 
   performing the write and still be fully correct.

Regards,
Alex.

^ permalink raw reply

* Re: [External] Re: LSM policy options for new GPIO kernel driver interface
From: Stephen Smalley @ 2021-08-04 14:02 UTC (permalink / raw)
  To: Weber, Matthew L Collins
  Cc: Dominick Grift, selinux@vger.kernel.org,
	linux-security-module@vger.kernel.org, Graziano, David D Collins,
	Paul Moore, Ondrej Mosnacek
In-Reply-To: <CY1P110MB010233780A4A54DE7580F9FBF2F09@CY1P110MB0102.NAMP110.PROD.OUTLOOK.COM>

On Tue, Aug 3, 2021 at 3:17 PM Weber, Matthew L Collins
<Matthew.Weber@collins.com> wrote:
>
> Dominick,
>
> > From: Dominick Grift <dominick.grift@defensec.nl>
> > Sent: Tuesday, August 3, 2021 12:22 PM
> > To: Weber, Matthew L Collins <Matthew.Weber@collins.com>
> > Cc: selinux@vger.kernel.org <selinux@vger.kernel.org>; linux-security-module@vger.kernel.org <linux-security-module@vger.kernel.org>; Graziano, > David D Collins <david.graziano@collins.com>
> > Subject: [External] Re: LSM policy options for new GPIO kernel driver interface
> >
>
> [snip]
>
> >
> > SELinux supports IOCTL allow-listing and so access to the various GPIO
> > IOCTL can probably already be controlled.
> >
>
> We had been looking at this option but noticed the GPIO are broken out in groups by the "chip" providing them so a granular single IO "write" action can't be controlled through an allow-listing.  One idea we were going to look into was to break out all the IO in a chip as minor dev nodes which then could have specific IOCTL controls applied.  The default policy could restrict the "chip" node and then have broken out rules for each minor IO.
>
> > Other than that you could consider adding LSM hooks for GPIO object
> > related syscalls and adding SELinux check for GPIO syscall operations
> > but not sure if that adds any value to the above.
>
> Assuming you're referring to something like SECCOMP filtering the IOCTL, that would shift the responsibility to userspace to properly use the SECCOMP filter...  Or are you referring to new hooks on the kernel side of the syscall handling that would partially decode the payload of the call?
>
> Thanks for the response on this.  I wanted to have some debate before reaching out to the GPIO maintainers to look at options from their perspective.

Circa Linux 4.3, the SELinux kernel code was augmented to support
"extended permissions" with ioctl commands as the initial use case for
Android device driver whitelisting. This is supported in kernel policy
version 30 and later. A simple example can be found in the
selinux-testsuite under the ioctl test. See:
https://blog.siphos.be/2017/11/selinux-and-extended-permissions/
https://selinuxproject.org/page/XpermRules

This may still not provide you with the desired granularity. Depending
on the driver implementation, it may be possible to automatically
transition different "objects" managed by the driver into different
contexts through the recent SELinux anonymous inode labeling support
merged in Linux 5.12. There is an example of this for userfaultfd
inodes in the selinux-testsuite. We had previously looked at using
this support for /dev/kvm. Correct labeling of the inodes may require
modification to the driver depending on the approach desired.

^ permalink raw reply

* Re: [PATCH v2] tpm: ibmvtpm: Avoid error message when process gets signal while waiting
From: Nageswara Sastry @ 2021-08-04  5:30 UTC (permalink / raw)
  To: Stefan Berger, jarkko
  Cc: peterhuewe, jgg, linux-integrity, linux-security-module,
	linux-kernel, Stefan Berger, Nayna Jain, George Wilson
In-Reply-To: <20210803202622.1537040-1-stefanb@linux.vnet.ibm.com>



On 04/08/21 1:56 am, Stefan Berger wrote:
> From: Stefan Berger <stefanb@linux.ibm.com>
> 
> When rngd is run as root then lots of these types of message will appear
> in the kernel log if the TPM has been configured to provide random bytes:
> 
> [ 7406.275163] tpm tpm0: tpm_transmit: tpm_recv: error -4
> 
> The issue is caused by the following call that is interrupted while
> waiting for the TPM's response.
> 
> sig = wait_event_interruptible(ibmvtpm->wq,
>                                 !ibmvtpm->tpm_processing_cmd);
> 
> Rather than waiting for the response in the low level driver, have it use
> the polling loop in tpm_try_transmit() that uses a command's duration to
> poll until a result has been returned by the TPM, thus ending when the
> timeout has occurred but not responding to signals and ctrl-c anymore. To
> stay in this polling loop extend tpm_ibmvtpm_status() to return
> TPM_STATUS_BUSY for as long as the vTPM is busy. Since the loop requires
> the TPM's timeouts, get them now using tpm_get_timeouts() after setting
> the TPM2 version flag on the chip.
> 
> Rename the tpm_processing_cmd to tpm_status in ibmvtpm_dev and set the
> TPM_STATUS_BUSY flag while the vTPM is busy processing a command.
> 
> To recreat the resolved issue start rngd like this:
> 
> sudo rngd -r /dev/hwrng -t
> sudo rngd -r /dev/tpm0 -t
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1981473
> Fixes: 6674ff145eef ("tpm_ibmvtpm: properly handle interrupted packet receptions")
> Cc: Nayna Jain <nayna@linux.ibm.com>
> Cc: George Wilson <gcwilson@linux.ibm.com>
> Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> 

Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>

Tested the patch applying on kernel version 5.14.0-rc3

1. Ran with recreate commands i.e.
rngd -r /dev/hwrng -t
rngd -r /dev/tpm0 -t

2. Ran the above rngd commands along with "stress-ng - dispatching hogs: 
32 af-alg, 32 bsearch, 32 context, 32 cpu, 32 crypt, 32 hsearch, 32 
longjmp, 32 lsearch, 32 matrix, 32 qsort, 32 str, 32 stream, 32 tsearch, 
32 vecmath, 32 wcs" for 2 hours

Thanks!!
> ---
> 
> v2:
>   - reworded commit text
> ---
>   drivers/char/tpm/tpm_ibmvtpm.c | 31 ++++++++++++++++++-------------
>   drivers/char/tpm/tpm_ibmvtpm.h |  3 ++-
>   2 files changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 903604769de9..5d795866b483 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -106,17 +106,12 @@ static int tpm_ibmvtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count)
>   {
>   	struct ibmvtpm_dev *ibmvtpm = dev_get_drvdata(&chip->dev);
>   	u16 len;
> -	int sig;
>   
>   	if (!ibmvtpm->rtce_buf) {
>   		dev_err(ibmvtpm->dev, "ibmvtpm device is not ready\n");
>   		return 0;
>   	}
>   
> -	sig = wait_event_interruptible(ibmvtpm->wq, !ibmvtpm->tpm_processing_cmd);
> -	if (sig)
> -		return -EINTR;
> -
>   	len = ibmvtpm->res_len;
>   
>   	if (count < len) {
> @@ -220,11 +215,12 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
>   		return -EIO;
>   	}
>   
> -	if (ibmvtpm->tpm_processing_cmd) {
> +	if ((ibmvtpm->tpm_status & TPM_STATUS_BUSY)) {
>   		dev_info(ibmvtpm->dev,
>   		         "Need to wait for TPM to finish\n");
>   		/* wait for previous command to finish */
> -		sig = wait_event_interruptible(ibmvtpm->wq, !ibmvtpm->tpm_processing_cmd);
> +		sig = wait_event_interruptible(ibmvtpm->wq,
> +				(ibmvtpm->tpm_status & TPM_STATUS_BUSY) == 0);
>   		if (sig)
>   			return -EINTR;
>   	}
> @@ -237,7 +233,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
>   	 * set the processing flag before the Hcall, since we may get the
>   	 * result (interrupt) before even being able to check rc.
>   	 */
> -	ibmvtpm->tpm_processing_cmd = true;
> +	ibmvtpm->tpm_status |= TPM_STATUS_BUSY;
>   
>   again:
>   	rc = ibmvtpm_send_crq(ibmvtpm->vdev,
> @@ -255,7 +251,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
>   			goto again;
>   		}
>   		dev_err(ibmvtpm->dev, "tpm_ibmvtpm_send failed rc=%d\n", rc);
> -		ibmvtpm->tpm_processing_cmd = false;
> +		ibmvtpm->tpm_status &= ~TPM_STATUS_BUSY;
>   	}
>   
>   	spin_unlock(&ibmvtpm->rtce_lock);
> @@ -269,7 +265,9 @@ static void tpm_ibmvtpm_cancel(struct tpm_chip *chip)
>   
>   static u8 tpm_ibmvtpm_status(struct tpm_chip *chip)
>   {
> -	return 0;
> +	struct ibmvtpm_dev *ibmvtpm = dev_get_drvdata(&chip->dev);
> +
> +	return ibmvtpm->tpm_status;
>   }
>   
>   /**
> @@ -457,7 +455,7 @@ static const struct tpm_class_ops tpm_ibmvtpm = {
>   	.send = tpm_ibmvtpm_send,
>   	.cancel = tpm_ibmvtpm_cancel,
>   	.status = tpm_ibmvtpm_status,
> -	.req_complete_mask = 0,
> +	.req_complete_mask = TPM_STATUS_BUSY,
>   	.req_complete_val = 0,
>   	.req_canceled = tpm_ibmvtpm_req_canceled,
>   };
> @@ -550,7 +548,7 @@ static void ibmvtpm_crq_process(struct ibmvtpm_crq *crq,
>   		case VTPM_TPM_COMMAND_RES:
>   			/* len of the data in rtce buffer */
>   			ibmvtpm->res_len = be16_to_cpu(crq->len);
> -			ibmvtpm->tpm_processing_cmd = false;
> +			ibmvtpm->tpm_status &= ~TPM_STATUS_BUSY;
>   			wake_up_interruptible(&ibmvtpm->wq);
>   			return;
>   		default:
> @@ -688,8 +686,15 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
>   		goto init_irq_cleanup;
>   	}
>   
> -	if (!strcmp(id->compat, "IBM,vtpm20")) {
> +
> +	if (!strcmp(id->compat, "IBM,vtpm20"))
>   		chip->flags |= TPM_CHIP_FLAG_TPM2;
> +
> +	rc = tpm_get_timeouts(chip);
> +	if (rc)
> +		goto init_irq_cleanup;
> +
> +	if (chip->flags & TPM_CHIP_FLAG_TPM2) {
>   		rc = tpm2_get_cc_attrs_tbl(chip);
>   		if (rc)
>   			goto init_irq_cleanup;
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.h b/drivers/char/tpm/tpm_ibmvtpm.h
> index b92aa7d3e93e..252f1cccdfc5 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.h
> +++ b/drivers/char/tpm/tpm_ibmvtpm.h
> @@ -41,7 +41,8 @@ struct ibmvtpm_dev {
>   	wait_queue_head_t wq;
>   	u16 res_len;
>   	u32 vtpm_version;
> -	bool tpm_processing_cmd;
> +	u8 tpm_status;
> +#define TPM_STATUS_BUSY		(1 << 0) /* vtpm is processing a command */
>   };
>   
>   #define CRQ_RES_BUF_SIZE	PAGE_SIZE
> 

-- 
Thanks and Regards
R.Nageswara Sastry

^ permalink raw reply

* Re: [PATCH RFC v2 00/12] Enroll kernel keys thru MOK
From: Eric Snowberg @ 2021-08-04  2:56 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: keyrings@vger.kernel.org, linux-integrity@vger.kernel.org,
	David Howells, dwmw2@infradead.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, jarkko@kernel.org, jmorris@namei.org,
	serge@hallyn.com, keescook@chromium.org,
	gregkh@linuxfoundation.org, torvalds@linux-foundation.org,
	scott.branden@broadcom.com, weiyongjun1@huawei.com,
	nayna@linux.ibm.com, ebiggers@google.com, ardb@kernel.org,
	Lakshmi Ramasubramanian, lszubowi@redhat.com,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	James.Bottomley@hansenpartnership.com, pjones@redhat.com,
	glin@suse.com, Konrad Wilk
In-Reply-To: <5ac7f5fc866dd271ecfc9be17fef7fa47babbc6e.camel@linux.ibm.com>



> On Aug 3, 2021, at 7:14 PM, Mimi Zohar <zohar@linux.ibm.com> wrote:
> 
> On Tue, 2021-08-03 at 13:52 -0600, Eric Snowberg wrote:
>>> On Aug 3, 2021, at 11:01 AM, Mimi Zohar <zohar@linux.ibm.com> wrote:
>>> 
>>> On Mon, 2021-07-26 at 13:13 -0400, Eric Snowberg wrote:
>>> 
>>>> When the kernel boots, if MokListTrustedRT is set and
>>>> EFI_VARIABLE_NON_VOLATILE is not set, the MokListRT is loaded into the
>>>> mok keyring instead of the platform keyring. Mimi has suggested that
>>>> only CA keys or keys that can be vouched for by other kernel keys be
>>>> loaded into this keyring. All other certs will load into the platform
>>>> keyring instead.
>>> 
>>> I suggested only loading the CA keys stored in the MOK db onto the MOK
>>> keyring.  Like the builtin trusted keyring, the MOK keyring would also
>>> be linked to the secondary keyring.   Assuming the secondary keyring is
>>> defined, all other properly signed MOK db keys  - signed by keys on the
>>> builtin, secondary or MOK keyring - would be loaded onto the secondary
>>> keyring.
>>> 
>>> As previously discussed, this might require reading the MOK db twice -
>>> once to load the CA keys on the MOK keyring, a second time to load the
>>> remaining properly signed keys onto the secondary keyring.
>> 
>> I’m only loading CA keys or keys that can be vouched for by other kernel 
>> keys into the new mok keyring.
> 
> The cover letter implies that this suggestion is coming from me, which
> it definitely is not.  My preference, as I made clear from the very
> beginning, is to load ONLY the MOK DB CA keys onto the mok
> keyring.   (And even go one step farther, requiring the MOK DB CA
> key(s) to be identified on the boot command line.)

Ok, got it.  I guess I misunderstood and was thinking built-in should be 
referenced too for things going into the new mok keyring.

>> Currently, I’m not doing another pass.  I 
>> could add another pass, but it would not solve the issue with someone trying 
>> to load an intermediate CA along with a leaf cert.  This would require yet 
>> a third pass.  I wasn’t sure if this added complexity was necessary.  
>> 
>> Currently, any CA contained within the MOK db would now be trusted by the 
>> kernel.  Someone using a kernel with the secondary keyring enabled could 
>> load the intermediate and leaf certs themselves following boot.
> 
> Correct, as previously discussed, the other signed MOK DB keys may be
> loaded by userspace.   The only reason we're interested in any of the
> other MOK DB keys is prevent a regression.  As you previously pointed
> out all of the MOK DB keys are currently being loaded onto the platform
> keyring.  So leave the existing code, which loads the MOK DB keys onto
> the platform keyring, alone to prevent that regression.  It's already
> being controlled by a UEFI variable.

With this series, I do not believe a regression exists.  With a single pass, 
keys are either loaded into the mok or the platform keyring.  Since the mok
is linked to the secondary (or the built-in),  during kexec signature validation,
all keys are referenced. 

>> Taking 
>> this into account, if you’d like to see two passes, let me know and I’ll add 
>> that in v3.  If a second pass is done, do you really want these additional 
>> keys added to the secondary keyring or should they go into the mok keyring
>> instead?  I was under the impression the secondary should be empty until a
>> user adds their own keys into it. Thanks.
> 
> Again, my preference would be to load ONLY the MOK DB CA keys onto the
> mok keyring.

Ok, I’ll update the current code to just load CA keys into the mok in v3.  This would
simplify the new restrict_link_by_ca function. 

With that change, do you see any issues with how I’m doing the linking?  With the 
mok keyring linked to the secondary keyring, the platform keyring may only contain 
a subset of the keys it originally contained.  But, as I described above, I don’t believe
it will lead to a regression since all keys get referenced. Thanks.


^ permalink raw reply

* Re: [PATCH RFC v2 00/12] Enroll kernel keys thru MOK
From: Mimi Zohar @ 2021-08-04  1:14 UTC (permalink / raw)
  To: Eric Snowberg
  Cc: keyrings, linux-integrity, dhowells, dwmw2, herbert, davem,
	jarkko, jmorris, serge, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, glin, konrad.wilk@oracle.com
In-Reply-To: <2BBC3A71-6E0D-47A2-842A-11C279A5DC56@oracle.com>

Hi Eric,

On Tue, 2021-08-03 at 13:52 -0600, Eric Snowberg wrote:
> > On Aug 3, 2021, at 11:01 AM, Mimi Zohar <zohar@linux.ibm.com> wrote:
> > 
> > On Mon, 2021-07-26 at 13:13 -0400, Eric Snowberg wrote:
> > 
> >> When the kernel boots, if MokListTrustedRT is set and
> >> EFI_VARIABLE_NON_VOLATILE is not set, the MokListRT is loaded into the
> >> mok keyring instead of the platform keyring. Mimi has suggested that
> >> only CA keys or keys that can be vouched for by other kernel keys be
> >> loaded into this keyring. All other certs will load into the platform
> >> keyring instead.
> > 
> > I suggested only loading the CA keys stored in the MOK db onto the MOK
> > keyring.  Like the builtin trusted keyring, the MOK keyring would also
> > be linked to the secondary keyring.   Assuming the secondary keyring is
> > defined, all other properly signed MOK db keys  - signed by keys on the
> > builtin, secondary or MOK keyring - would be loaded onto the secondary
> > keyring.
> > 
> > As previously discussed, this might require reading the MOK db twice -
> > once to load the CA keys on the MOK keyring, a second time to load the
> > remaining properly signed keys onto the secondary keyring.
> 
> I’m only loading CA keys or keys that can be vouched for by other kernel 
> keys into the new mok keyring.

The cover letter implies that this suggestion is coming from me, which
it definitely is not.  My preference, as I made clear from the very
beginning, is to load ONLY the MOK DB CA keys onto the mok
keyring.   (And even go one step farther, requiring the MOK DB CA
key(s) to be identified on the boot command line.)

> Currently, I’m not doing another pass.  I 
> could add another pass, but it would not solve the issue with someone trying 
> to load an intermediate CA along with a leaf cert.  This would require yet 
> a third pass.  I wasn’t sure if this added complexity was necessary.  
> 
> Currently, any CA contained within the MOK db would now be trusted by the 
> kernel.  Someone using a kernel with the secondary keyring enabled could 
> load the intermediate and leaf certs themselves following boot.

Correct, as previously discussed, the other signed MOK DB keys may be
loaded by userspace.   The only reason we're interested in any of the
other MOK DB keys is prevent a regression.  As you previously pointed
out all of the MOK DB keys are currently being loaded onto the platform
keyring.  So leave the existing code, which loads the MOK DB keys onto
the platform keyring, alone to prevent that regression.  It's already
being controlled by a UEFI variable.

> Taking 
> this into account, if you’d like to see two passes, let me know and I’ll add 
> that in v3.  If a second pass is done, do you really want these additional 
> keys added to the secondary keyring or should they go into the mok keyring
> instead?  I was under the impression the secondary should be empty until a
> user adds their own keys into it. Thanks.

Again, my preference would be to load ONLY the MOK DB CA keys onto the
mok keyring.

If YOU decide you want to load the signed keys stored in MOK DB, be my
guest.  However, they should be loaded onto the secondary keyring and a
new restriction defined, similar to
"restrict_link_by_builtin_and_secondary_trusted", which includes mok as
well.

thanks,

Mimi


^ permalink raw reply

* Re: Linux Kernel vulnerability scripting
From: Denis Efremov @ 2021-08-03 20:50 UTC (permalink / raw)
  To: Weber, Matthew L Collins, Masahiro Yamada,
	Arnaldo Carvalho de Melo, LSM List, keescook
In-Reply-To: <CY1P110MB0102E9CF9461FDAA8C3B8C22F2F09@CY1P110MB0102.NAMP110.PROD.OUTLOOK.COM>

Hi,

On 8/3/21 11:07 PM, Weber, Matthew L Collins wrote:
> Hello,
> 
> (I didn't want to spam the whole LKML, so I've included the LSM list and top hits with get_maintainer.pl on the scripts and tools folders.)
> 
> I'm organizing a project to take some prototype scripting and publicly publish/rewrite.  The script I'd like to add to the kernel code base breaks down a kernel build and identifies the active code (using enabled Kconfig and obj file list). 

If I understand you correctly this is what I started to do since the beginning of the year with CVEHound project.
https://github.com/evdenis/cvehound/

Kconfig analysis available with --config option.

> It then uses the kernel version and queries a public vulnerability database(NIST NVD) to identify possible patches against known vulnerabilities.

I take info about fixed CVEs from MITRE, NIST NVD and other sources (RedHat, Ubuntu, linuxkernelcves, ...).
Vulnerable version info in the databases is not reliable. Most of the time I need to figure out the bug commit
and event double check the "Fixes:" tag.

> The script then attempts to patch the source code to determine which vulnerabilities are still present in the codebase.

Just application of a patch with -R option will not work in case the code is modified since the fix and in case of backports.
I describe CVEs with coccinelle patterns. https://coccinelle.gitlabpages.inria.fr/website/
Coccinelle is already broadly used in kernel (see scripts/coccinelle).

> The end goal is to help the user understand the state of the active codebase, whereas most tools stop at the kernel version, and then the activity is all manual.

Exactly. CodeBase+KernelConfig.
Many vendors (e.g. samsung) don't update kernel version info, they just backport fixes.

> For an example of what the scripting impact could improve, a recent Kernel 4.14.x dump of vulnerabilities had hundreds that needed to be paired down.

> Our estimate before tooling put the effort at about 10-15mins a CVE (determine active code, review code paths in suggested patches).

As for now, I described > 200 kernel CVEs with coccinelle patterns.
Usually it takes me exactly 10-15 minutes to write a pattern (without testing it).
I test a pattern to detect a missing fix since the commit the bug was introduced.
https://github.com/evdenis/cvehound/blob/master/tests/test_05_between_fixes_fix.py

I don't rely on kernel version at all. Usually a pattern contains information about
bug conditions or/and about missing fix.

I already found some trophies with it like missing backports:
  https://lkml.org/lkml/2021/1/21/1278
or partial backports:
  https://github.com/oracle/linux-uek/commit/ee7ab9e8f9cb844c4fac8ca9bcc1a0f3f8fdc9bb
  vs
  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/block/rbd.c?h=linux-4.4.y&id=e349a5786f4c23eb11d1e7385703ddbf94f3f061

Regards,
Denis

^ permalink raw reply

* [PATCH v2] tpm: ibmvtpm: Avoid error message when process gets signal while waiting
From: Stefan Berger @ 2021-08-03 20:26 UTC (permalink / raw)
  To: jarkko
  Cc: peterhuewe, jgg, linux-integrity, linux-security-module,
	linux-kernel, Stefan Berger, Nayna Jain, George Wilson,
	Nageswara R Sastry

From: Stefan Berger <stefanb@linux.ibm.com>

When rngd is run as root then lots of these types of message will appear
in the kernel log if the TPM has been configured to provide random bytes:

[ 7406.275163] tpm tpm0: tpm_transmit: tpm_recv: error -4

The issue is caused by the following call that is interrupted while
waiting for the TPM's response.

sig = wait_event_interruptible(ibmvtpm->wq,
                               !ibmvtpm->tpm_processing_cmd);

Rather than waiting for the response in the low level driver, have it use
the polling loop in tpm_try_transmit() that uses a command's duration to
poll until a result has been returned by the TPM, thus ending when the
timeout has occurred but not responding to signals and ctrl-c anymore. To
stay in this polling loop extend tpm_ibmvtpm_status() to return
TPM_STATUS_BUSY for as long as the vTPM is busy. Since the loop requires
the TPM's timeouts, get them now using tpm_get_timeouts() after setting
the TPM2 version flag on the chip.

Rename the tpm_processing_cmd to tpm_status in ibmvtpm_dev and set the
TPM_STATUS_BUSY flag while the vTPM is busy processing a command.

To recreat the resolved issue start rngd like this:

sudo rngd -r /dev/hwrng -t
sudo rngd -r /dev/tpm0 -t

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1981473
Fixes: 6674ff145eef ("tpm_ibmvtpm: properly handle interrupted packet receptions")
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: George Wilson <gcwilson@linux.ibm.com>
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>

---

v2:
 - reworded commit text
---
 drivers/char/tpm/tpm_ibmvtpm.c | 31 ++++++++++++++++++-------------
 drivers/char/tpm/tpm_ibmvtpm.h |  3 ++-
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 903604769de9..5d795866b483 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -106,17 +106,12 @@ static int tpm_ibmvtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count)
 {
 	struct ibmvtpm_dev *ibmvtpm = dev_get_drvdata(&chip->dev);
 	u16 len;
-	int sig;
 
 	if (!ibmvtpm->rtce_buf) {
 		dev_err(ibmvtpm->dev, "ibmvtpm device is not ready\n");
 		return 0;
 	}
 
-	sig = wait_event_interruptible(ibmvtpm->wq, !ibmvtpm->tpm_processing_cmd);
-	if (sig)
-		return -EINTR;
-
 	len = ibmvtpm->res_len;
 
 	if (count < len) {
@@ -220,11 +215,12 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
 		return -EIO;
 	}
 
-	if (ibmvtpm->tpm_processing_cmd) {
+	if ((ibmvtpm->tpm_status & TPM_STATUS_BUSY)) {
 		dev_info(ibmvtpm->dev,
 		         "Need to wait for TPM to finish\n");
 		/* wait for previous command to finish */
-		sig = wait_event_interruptible(ibmvtpm->wq, !ibmvtpm->tpm_processing_cmd);
+		sig = wait_event_interruptible(ibmvtpm->wq,
+				(ibmvtpm->tpm_status & TPM_STATUS_BUSY) == 0);
 		if (sig)
 			return -EINTR;
 	}
@@ -237,7 +233,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
 	 * set the processing flag before the Hcall, since we may get the
 	 * result (interrupt) before even being able to check rc.
 	 */
-	ibmvtpm->tpm_processing_cmd = true;
+	ibmvtpm->tpm_status |= TPM_STATUS_BUSY;
 
 again:
 	rc = ibmvtpm_send_crq(ibmvtpm->vdev,
@@ -255,7 +251,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
 			goto again;
 		}
 		dev_err(ibmvtpm->dev, "tpm_ibmvtpm_send failed rc=%d\n", rc);
-		ibmvtpm->tpm_processing_cmd = false;
+		ibmvtpm->tpm_status &= ~TPM_STATUS_BUSY;
 	}
 
 	spin_unlock(&ibmvtpm->rtce_lock);
@@ -269,7 +265,9 @@ static void tpm_ibmvtpm_cancel(struct tpm_chip *chip)
 
 static u8 tpm_ibmvtpm_status(struct tpm_chip *chip)
 {
-	return 0;
+	struct ibmvtpm_dev *ibmvtpm = dev_get_drvdata(&chip->dev);
+
+	return ibmvtpm->tpm_status;
 }
 
 /**
@@ -457,7 +455,7 @@ static const struct tpm_class_ops tpm_ibmvtpm = {
 	.send = tpm_ibmvtpm_send,
 	.cancel = tpm_ibmvtpm_cancel,
 	.status = tpm_ibmvtpm_status,
-	.req_complete_mask = 0,
+	.req_complete_mask = TPM_STATUS_BUSY,
 	.req_complete_val = 0,
 	.req_canceled = tpm_ibmvtpm_req_canceled,
 };
@@ -550,7 +548,7 @@ static void ibmvtpm_crq_process(struct ibmvtpm_crq *crq,
 		case VTPM_TPM_COMMAND_RES:
 			/* len of the data in rtce buffer */
 			ibmvtpm->res_len = be16_to_cpu(crq->len);
-			ibmvtpm->tpm_processing_cmd = false;
+			ibmvtpm->tpm_status &= ~TPM_STATUS_BUSY;
 			wake_up_interruptible(&ibmvtpm->wq);
 			return;
 		default:
@@ -688,8 +686,15 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 		goto init_irq_cleanup;
 	}
 
-	if (!strcmp(id->compat, "IBM,vtpm20")) {
+
+	if (!strcmp(id->compat, "IBM,vtpm20"))
 		chip->flags |= TPM_CHIP_FLAG_TPM2;
+
+	rc = tpm_get_timeouts(chip);
+	if (rc)
+		goto init_irq_cleanup;
+
+	if (chip->flags & TPM_CHIP_FLAG_TPM2) {
 		rc = tpm2_get_cc_attrs_tbl(chip);
 		if (rc)
 			goto init_irq_cleanup;
diff --git a/drivers/char/tpm/tpm_ibmvtpm.h b/drivers/char/tpm/tpm_ibmvtpm.h
index b92aa7d3e93e..252f1cccdfc5 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.h
+++ b/drivers/char/tpm/tpm_ibmvtpm.h
@@ -41,7 +41,8 @@ struct ibmvtpm_dev {
 	wait_queue_head_t wq;
 	u16 res_len;
 	u32 vtpm_version;
-	bool tpm_processing_cmd;
+	u8 tpm_status;
+#define TPM_STATUS_BUSY		(1 << 0) /* vtpm is processing a command */
 };
 
 #define CRQ_RES_BUF_SIZE	PAGE_SIZE
-- 
2.31.1


^ permalink raw reply related

* Linux Kernel vulnerability scripting
From: Weber, Matthew L Collins @ 2021-08-03 20:07 UTC (permalink / raw)
  To: Masahiro Yamada, Arnaldo Carvalho de Melo, LSM List

Hello,

(I didn't want to spam the whole LKML, so I've included the LSM list and top hits with get_maintainer.pl on the scripts and tools folders.)

I'm organizing a project to take some prototype scripting and publicly publish/rewrite.  The script I'd like to add to the kernel code base breaks down a kernel build and identifies the active code (using enabled Kconfig and obj file list).  It then uses the kernel version and queries a public vulnerability database(NIST NVD) to identify possible patches against known vulnerabilities.  The script then attempts to patch the source code to determine which vulnerabilities are still present in the codebase.  The end goal is to help the user understand the state of the active codebase, whereas most tools stop at the kernel version, and then the activity is all manual.  For an example of what the scripting impact could improve, a recent Kernel 4.14.x dump of vulnerabilities had hundreds that needed to be paired down.  Our estimate before tooling put the effort at about 10-15mins a CVE (determine active code, review code paths in suggested patches).  

Is this something that fits within your understanding of the "scripts or tools" included in the kernel codebase?  If so, do scripting-related patches primarily hit the LKML or via subsystem lists related to the topic (I.e., then staging branches via the subsystem for merge).

Thank you for any suggestions of which mailing lists, subsystems, or maintainers I should include on the topic. 

--
Matthew Weber | Associate Director Software Engineer | Commercial Avionics   (Focused in Open Source products and Cybersecurity)
COLLINS AEROSPACE
400 Collins Road NE, Cedar Rapids, Iowa 52498, USA
Tel: +1 319 295 7349 | FAX: +1 319 263 6099
matthew.weber@collins.com 

^ permalink raw reply

* Re: [PATCH RFC v2 00/12] Enroll kernel keys thru MOK
From: Eric Snowberg @ 2021-08-03 19:52 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: keyrings, linux-integrity, dhowells, dwmw2, herbert, davem,
	jarkko, jmorris, serge, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, glin, konrad.wilk@oracle.com
In-Reply-To: <820cd72cd77c4716bff2bf344c64d7bcb59fc4d3.camel@linux.ibm.com>

> On Aug 3, 2021, at 11:01 AM, Mimi Zohar <zohar@linux.ibm.com> wrote:
> 
> On Mon, 2021-07-26 at 13:13 -0400, Eric Snowberg wrote:
> 
>> When the kernel boots, if MokListTrustedRT is set and
>> EFI_VARIABLE_NON_VOLATILE is not set, the MokListRT is loaded into the
>> mok keyring instead of the platform keyring. Mimi has suggested that
>> only CA keys or keys that can be vouched for by other kernel keys be
>> loaded into this keyring. All other certs will load into the platform
>> keyring instead.
> 
> I suggested only loading the CA keys stored in the MOK db onto the MOK
> keyring.  Like the builtin trusted keyring, the MOK keyring would also
> be linked to the secondary keyring.   Assuming the secondary keyring is
> defined, all other properly signed MOK db keys  - signed by keys on the
> builtin, secondary or MOK keyring - would be loaded onto the secondary
> keyring.
> 
> As previously discussed, this might require reading the MOK db twice -
> once to load the CA keys on the MOK keyring, a second time to load the
> remaining properly signed keys onto the secondary keyring.

I’m only loading CA keys or keys that can be vouched for by other kernel 
keys into the new mok keyring.  Currently, I’m not doing another pass.  I 
could add another pass, but it would not solve the issue with someone trying 
to load an intermediate CA along with a leaf cert.  This would require yet 
a third pass.  I wasn’t sure if this added complexity was necessary.  

Currently, any CA contained within the MOK db would now be trusted by the 
kernel.  Someone using a kernel with the secondary keyring enabled could 
load the intermediate and leaf certs themselves following boot.  Taking 
this into account, if you’d like to see two passes, let me know and I’ll add 
that in v3.  If a second pass is done, do you really want these additional 
keys added to the secondary keyring or should they go into the mok keyring
instead?  I was under the impression the secondary should be empty until a
user adds their own keys into it. Thanks.

^ permalink raw reply

* Re: [External] Re: LSM policy options for new GPIO kernel driver interface
From: Weber, Matthew L Collins @ 2021-08-03 18:11 UTC (permalink / raw)
  To: Dominick Grift
  Cc: selinux@vger.kernel.org, linux-security-module@vger.kernel.org,
	Graziano, David D Collins
In-Reply-To: <87im0m789b.fsf@defensec.nl>

Dominick,

> From: Dominick Grift <dominick.grift@defensec.nl>
> Sent: Tuesday, August 3, 2021 12:22 PM
> To: Weber, Matthew L Collins <Matthew.Weber@collins.com>
> Cc: selinux@vger.kernel.org <selinux@vger.kernel.org>; linux-security-module@vger.kernel.org <linux-security-module@vger.kernel.org>; Graziano, > David D Collins <david.graziano@collins.com>
> Subject: [External] Re: LSM policy options for new GPIO kernel driver interface 
>  

[snip]

> 
> SELinux supports IOCTL allow-listing and so access to the various GPIO
> IOCTL can probably already be controlled.
> 

We had been looking at this option but noticed the GPIO are broken out in groups by the "chip" providing them so a granular single IO "write" action can't be controlled through an allow-listing.  One idea we were going to look into was to break out all the IO in a chip as minor dev nodes which then could have specific IOCTL controls applied.  The default policy could restrict the "chip" node and then have broken out rules for each minor IO.

> Other than that you could consider adding LSM hooks for GPIO object
> related syscalls and adding SELinux check for GPIO syscall operations
> but not sure if that adds any value to the above.

Assuming you're referring to something like SECCOMP filtering the IOCTL, that would shift the responsibility to userspace to properly use the SECCOMP filter...  Or are you referring to new hooks on the kernel side of the syscall handling that would partially decode the payload of the call?

Thanks for the response on this.  I wanted to have some debate before reaching out to the GPIO maintainers to look at options from their perspective.

Best Regards,
Matt

^ permalink raw reply

* Re: LSM policy options for new GPIO kernel driver interface
From: Dominick Grift @ 2021-08-03 17:22 UTC (permalink / raw)
  To: Weber, Matthew L Collins
  Cc: selinux@vger.kernel.org, linux-security-module@vger.kernel.org,
	Graziano, David D Collins
In-Reply-To: <CY1P110MB0102ED0206E9498C742F6DC0F2EF9@CY1P110MB0102.NAMP110.PROD.OUTLOOK.COM>

"Weber, Matthew L Collins" <Matthew.Weber@collins.com> writes:

> All,
>
> Since the 5.10 kernel, the GPIO subsystem has migrated from a sysfs
> based GPIO export method [1] (everything is a file) to a character
> device[2] + library[3].  The new framework[2] provides users with
> signal debouncing and other features that benefit embedded products.
> The legacy method[1] allowed fine policy control of who can export /
> set / get the GPIO state.  We have not found a similar security policy
> path with the new approach.  Has anyone brainstormed strategies for
> the new character device-based interface without adding a userspace
> broker to enforce another level of rules?  The ideal case would be to
> keep all the controls within the SELinux refpolicy such that testing
> can be all-inclusive.
>
> I'd be interested in what people think, such that I can prepare a
> university research project submission for Fall 2021 to build a
> prototype.

Disclaimer: I am not qualified to give advice

SELinux supports IOCTL allow-listing and so access to the various GPIO
IOCTL can probably already be controlled.

So for example you can probably already control access to things like:

GPIO_GET_CHIPINFO_IOCTL
GPIO_GET_LINEINFO_UNWATCH_IOCTL
GPIO_V2_GET_LINEINFO_IOCTL
GPIO_V2_GET_LINEINFO_WATCH_IOCTL
GPIO_V2_GET_LINE_IOCTL
GPIO_V2_LINE_SET_CONFIG_IOCTL
GPIO_V2_LINE_GET_VALUES_IOCTL
GPIO_V2_LINE_SET_VALUES_IOCTL

Other than that you could consider adding LSM hooks for GPIO object
related syscalls and adding SELinux check for GPIO syscall operations
but not sure if that adds any value to the above.

>
> Best Regards,
> --
> Matt Weber
>
> [1] https://www.kernel.org/doc/html/latest/driver-api/gpio/legacy.html#sysfs-interface-for-userspace-optional
> [2] https://www.kernel.org/doc/html/latest/driver-api/gpio/using-gpio.html
> [3] https://git.kernel.org/pub/scm/libs/libgpiod/libgpiod.git/about/

-- 
gpg --locate-keys dominick.grift@defensec.nl
Key fingerprint = FCD2 3660 5D6B 9D27 7FC6  E0FF DA7E 521F 10F6 4098
https://sks-keyservers.net/pks/lookup?op=get&search=0xDA7E521F10F64098
Dominick Grift

^ permalink raw reply

* Re: [PATCH RFC v2 00/12] Enroll kernel keys thru MOK
From: Mimi Zohar @ 2021-08-03 17:01 UTC (permalink / raw)
  To: Eric Snowberg, keyrings, linux-integrity, dhowells, dwmw2,
	herbert, davem, jarkko, jmorris, serge
  Cc: keescook, gregkh, torvalds, scott.branden, weiyongjun1, nayna,
	ebiggers, ardb, nramas, lszubowi, linux-kernel, linux-crypto,
	linux-security-module, James.Bottomley, pjones, glin, konrad.wilk
In-Reply-To: <20210726171319.3133879-1-eric.snowberg@oracle.com>

Hi Eric,

On Mon, 2021-07-26 at 13:13 -0400, Eric Snowberg wrote:

> When the kernel boots, if MokListTrustedRT is set and
> EFI_VARIABLE_NON_VOLATILE is not set, the MokListRT is loaded into the
> mok keyring instead of the platform keyring. Mimi has suggested that
> only CA keys or keys that can be vouched for by other kernel keys be
> loaded into this keyring. All other certs will load into the platform
> keyring instead.

I suggested only loading the CA keys stored in the MOK db onto the MOK
keyring.  Like the builtin trusted keyring, the MOK keyring would also
be linked to the secondary keyring.   Assuming the secondary keyring is
defined, all other properly signed MOK db keys  - signed by keys on the
builtin, secondary or MOK keyring - would be loaded onto the secondary
keyring.

As previously discussed, this might require reading the MOK db twice -
once to load the CA keys on the MOK keyring, a second time to load the
remaining properly signed keys onto the secondary keyring.

thanks,

Mimi

^ permalink raw reply

* LSM policy options for new GPIO kernel driver interface
From: Weber, Matthew L Collins @ 2021-08-02 17:08 UTC (permalink / raw)
  To: selinux@vger.kernel.org, linux-security-module@vger.kernel.org
  Cc: Graziano, David D Collins

All,

Since the 5.10 kernel, the GPIO subsystem has migrated from a sysfs based GPIO export method [1] (everything is a file) to a character device[2] + library[3].  The new framework[2] provides users with signal debouncing and other features that benefit embedded products.  The legacy method[1] allowed fine policy control of who can export / set / get the GPIO state.  We have not found a similar security policy path with the new approach.  Has anyone brainstormed strategies for the new character device-based interface without adding a userspace broker to enforce another level of rules?  The ideal case would be to keep all the controls within the SELinux refpolicy such that testing can be all-inclusive.

I'd be interested in what people think, such that I can prepare a university research project submission for Fall 2021 to build a prototype. 

Best Regards,
--
Matt Weber

[1] https://www.kernel.org/doc/html/latest/driver-api/gpio/legacy.html#sysfs-interface-for-userspace-optional
[2] https://www.kernel.org/doc/html/latest/driver-api/gpio/using-gpio.html
[3] https://git.kernel.org/pub/scm/libs/libgpiod/libgpiod.git/about/

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox