Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH 0/3] Clean up crypto documentation
From: Joe Perches @ 2019-06-24 22:37 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Gary R Hook, Hook, Gary, herbert@gondor.apana.org.au,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190624143748.7fcfe623@lwn.net>

On Mon, 2019-06-24 at 14:37 -0600, Jonathan Corbet wrote:
> On Mon, 24 Jun 2019 13:29:42 -0700
> Joe Perches <joe@perches.com> wrote:
> 
> > > Finally, would you prefer a v2 of the patch set? Happy to do
> > > whatever is preferred, of course.  
> > 
> > Whatever Jonathan decides is fine with me.
> > Mine was just a plea to avoid unnecessarily
> > making the source text harder to read as
> > that's what I mostly use.
> 
> Usually Herbert seems to take crypto docs, so it's not necessarily up to
> me :)
> 
> I don't see much that's objectionable here.  But...
> 
> > I don't know if this extension is valid yet, but
> > I believe just using <function_name>() is more
> > readable as text than ``<function_name>`` or
> > :c:func:`<function_name>`
> 
> It's been "valid" since I wrote it...it's just not upstream yet :)  I
> expect it to be in 5.3, though.  So the best way to refer to a kernel
> function, going forward, is just function() with no markup needed.

When that's actually "valid" I suggest doing:

$ git ls-files -- 'Documentation/*.rst' | \
 xargs perl -pi -e 's/:c:func:`(\w+)(?:\(\))?`/\1()/g; s/``(\w+)\(\)``/\1()/g'

so function designations in Documentation are simpler to read.

Right now that's:

$ git diff --shortstat Documentation/
 125 files changed, 1680 insertions(+), 1680 deletions(-)



^ permalink raw reply

* Re: [PATCH 2/3] crypto: doc - Describe the crypto engine
From: Eric Biggers @ 2019-06-24 22:03 UTC (permalink / raw)
  To: Hook, Gary
  Cc: herbert@gondor.apana.org.au, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net
In-Reply-To: <156140326736.29777.7751606850237303573.stgit@taos>

On Mon, Jun 24, 2019 at 07:07:49PM +0000, Hook, Gary wrote:
> Add a reference to the crypto engine documentation to
> the index.
> 
> Signed-off-by: Gary R Hook <gary.hook@amd.com>
> ---
>  Documentation/crypto/index.rst |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst
> index c4ff5d791233..37cd7fb0ea82 100644
> --- a/Documentation/crypto/index.rst
> +++ b/Documentation/crypto/index.rst
> @@ -19,6 +19,7 @@ for cryptographic use cases, as well as programming examples.
>     intro
>     architecture
>     devel-algos
> +   crypto_engine
>     userspace-if
>     crypto_engine
>     api
> 

It's already in the list.

- Eric

^ permalink raw reply

* Re: [PATCH 0/3] Clean up crypto documentation
From: Jonathan Corbet @ 2019-06-24 20:37 UTC (permalink / raw)
  To: Joe Perches
  Cc: Gary R Hook, Hook, Gary, herbert@gondor.apana.org.au,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net
In-Reply-To: <977bc7c484ef55ff78de51d7555afcc3c3350b1e.camel@perches.com>

On Mon, 24 Jun 2019 13:29:42 -0700
Joe Perches <joe@perches.com> wrote:

> > Finally, would you prefer a v2 of the patch set? Happy to do
> > whatever is preferred, of course.  
> 
> Whatever Jonathan decides is fine with me.
> Mine was just a plea to avoid unnecessarily
> making the source text harder to read as
> that's what I mostly use.

Usually Herbert seems to take crypto docs, so it's not necessarily up to
me :)

I don't see much that's objectionable here.  But...

> I don't know if this extension is valid yet, but
> I believe just using <function_name>() is more
> readable as text than ``<function_name>`` or
> :c:func:`<function_name>`

It's been "valid" since I wrote it...it's just not upstream yet :)  I
expect it to be in 5.3, though.  So the best way to refer to a kernel
function, going forward, is just function() with no markup needed.

Thanks,

jon

^ permalink raw reply

* Re: [PATCH 0/3] Clean up crypto documentation
From: Joe Perches @ 2019-06-24 20:29 UTC (permalink / raw)
  To: Gary R Hook, Hook, Gary, herbert@gondor.apana.org.au,
	corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <d8b359ff-5891-7bb8-d292-9f10cca04f17@amd.com>

On Mon, 2019-06-24 at 20:06 +0000, Gary R Hook wrote:

Hi Gary.

> On 6/24/19 2:30 PM, Joe Perches wrote:
> > On Mon, 2019-06-24 at 19:07 +0000, Hook, Gary wrote:
> > > Tidy up the crypto documentation by filling in some variable
> > > descriptions, make some grammatical corrections, and enhance
> > > formatting.
> > 
> > While this seems generally OK, please try not to make the
> > readability of the source _text_ less intelligible just
> > to enhance the output formatting of the html.
> > 
> > e.g.:
> > 
> > Unnecessary blank lines separating function descriptions
> > Removing space alignment from bullet point descriptions
> 
> Apologies. I generally consider white space a Good Thing,
> but will take your note and not do that. The blank lines I
> added do not affect the output, so I should not have done
> that.
> 
> Also, I turned sentences into bulleted lists here, so I'm not
> clear on whether that was a Bad Thing or not.

To me, using bulleted lists are not a bad thing at all
but are quite the opposite for humans to read.

> Seems more legible
> to me all the way around, but I clearly could be incorrect.

Not at all.

> I agree that mucking with alignment is a bad thing, and would not
> intentionally do so. That said, if you would please elaborate on
> any mistakes I've made?
> 
> Finally, would you prefer a v2 of the patch set? Happy to do
> whatever is preferred, of course.

Whatever Jonathan decides is fine with me.
Mine was just a plea to avoid unnecessarily
making the source text harder to read as
that's what I mostly use.

I don't know if this extension is valid yet, but
I believe just using <function_name>() is more
readable as text than ``<function_name>`` or
:c:func:`<function_name>`

https://lore.kernel.org/lkml/20190425200125.12302-1-corbet@lwn.net/T/

I prefer the automatic approach over the manual
marking of functions as ideally sphinx formatting
should not overly impact the source text.



^ permalink raw reply

* Re: [PATCH 0/3] Clean up crypto documentation
From: Gary R Hook @ 2019-06-24 20:06 UTC (permalink / raw)
  To: Joe Perches, Hook, Gary, herbert@gondor.apana.org.au,
	corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <23a5979082c89d7028409ad9ae082840411e1ca6.camel@perches.com>

On 6/24/19 2:30 PM, Joe Perches wrote:
> On Mon, 2019-06-24 at 19:07 +0000, Hook, Gary wrote:
>> Tidy up the crypto documentation by filling in some variable
>> descriptions, make some grammatical corrections, and enhance
>> formatting.
> 
> While this seems generally OK, please try not to make the
> readability of the source _text_ less intelligible just
> to enhance the output formatting of the html.
> 
> e.g.:
> 
> Unnecessary blank lines separating function descriptions
> Removing space alignment from bullet point descriptions

Apologies. I generally consider white space a Good Thing,
but will take your note and not do that. The blank lines I
added do not affect the output, so I should not have done
that.

Also, I turned sentences into bulleted lists here, so I'm not
clear on whether that was a Bad Thing or not. Seems more legible
to me all the way around, but I clearly could be incorrect. I
agree that mucking with alignment is a bad thing, and would not
intentionally do so. That said, if you would please elaborate on
any mistakes I've made?

Finally, would you prefer a v2 of the patch set? Happy to do
whatever is preferred, of course.

grh

^ permalink raw reply

* Re: [PATCH v11 02/13] PKCS#7: Refactor verify_pkcs7_signature()
From: Thiago Jung Bauermann @ 2019-06-24 19:56 UTC (permalink / raw)
  To: David Howells
  Cc: linux-security-module, keyrings, linux-crypto, linuxppc-dev,
	linux-doc, linux-kernel, Mimi Zohar, Dmitry Kasatkin,
	James Morris, Serge E. Hallyn, David Woodhouse, Jessica Yu,
	Herbert Xu, David S. Miller, Jonathan Corbet, AKASHI, Takahiro,
	linux-integrity
In-Reply-To: <20190611062817.18412-3-bauerman@linux.ibm.com>


Hello David,

AFAIK Mimi is happy with this patch set, but I still need acks from
maintainers of other subsystems that my changes touch before she can
accept it.

Are this patch and the next one ("PKCS#7: Introduce pkcs7_get_digest()")
OK from your PoV?

--
Thiago Jung Bauermann
IBM Linux Technology Center


Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:

> IMA will need to verify a PKCS#7 signature which has already been parsed.
> For this reason, factor out the code which does that from
> verify_pkcs7_signature() into a new function which takes a struct
> pkcs7_message instead of a data buffer.
>
> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> ---
>  certs/system_keyring.c       | 61 ++++++++++++++++++++++++++----------
>  include/linux/verification.h | 10 ++++++
>  2 files changed, 55 insertions(+), 16 deletions(-)
>
> diff --git a/certs/system_keyring.c b/certs/system_keyring.c
> index c05c29ae4d5d..4ba82e52e4b4 100644
> --- a/certs/system_keyring.c
> +++ b/certs/system_keyring.c
> @@ -194,33 +194,27 @@ late_initcall(load_system_certificate_list);
>  #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
>
>  /**
> - * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
> + * verify_pkcs7_message_sig - Verify a PKCS#7-based signature on system data.
>   * @data: The data to be verified (NULL if expecting internal data).
>   * @len: Size of @data.
> - * @raw_pkcs7: The PKCS#7 message that is the signature.
> - * @pkcs7_len: The size of @raw_pkcs7.
> + * @pkcs7: The PKCS#7 message that is the signature.
>   * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
>   *					(void *)1UL for all trusted keys).
>   * @usage: The use to which the key is being put.
>   * @view_content: Callback to gain access to content.
>   * @ctx: Context for callback.
>   */
> -int verify_pkcs7_signature(const void *data, size_t len,
> -			   const void *raw_pkcs7, size_t pkcs7_len,
> -			   struct key *trusted_keys,
> -			   enum key_being_used_for usage,
> -			   int (*view_content)(void *ctx,
> -					       const void *data, size_t len,
> -					       size_t asn1hdrlen),
> -			   void *ctx)
> +int verify_pkcs7_message_sig(const void *data, size_t len,
> +			     struct pkcs7_message *pkcs7,
> +			     struct key *trusted_keys,
> +			     enum key_being_used_for usage,
> +			     int (*view_content)(void *ctx,
> +						 const void *data, size_t len,
> +						 size_t asn1hdrlen),
> +			     void *ctx)
>  {
> -	struct pkcs7_message *pkcs7;
>  	int ret;
>
> -	pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
> -	if (IS_ERR(pkcs7))
> -		return PTR_ERR(pkcs7);
> -
>  	/* The data should be detached - so we need to supply it. */
>  	if (data && pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
>  		pr_err("PKCS#7 signature with non-detached data\n");
> @@ -273,6 +267,41 @@ int verify_pkcs7_signature(const void *data, size_t len,
>  	}
>
>  error:
> +	pr_devel("<==%s() = %d\n", __func__, ret);
> +	return ret;
> +}
> +
> +/**
> + * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
> + * @data: The data to be verified (NULL if expecting internal data).
> + * @len: Size of @data.
> + * @raw_pkcs7: The PKCS#7 message that is the signature.
> + * @pkcs7_len: The size of @raw_pkcs7.
> + * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
> + *					(void *)1UL for all trusted keys).
> + * @usage: The use to which the key is being put.
> + * @view_content: Callback to gain access to content.
> + * @ctx: Context for callback.
> + */
> +int verify_pkcs7_signature(const void *data, size_t len,
> +			   const void *raw_pkcs7, size_t pkcs7_len,
> +			   struct key *trusted_keys,
> +			   enum key_being_used_for usage,
> +			   int (*view_content)(void *ctx,
> +					       const void *data, size_t len,
> +					       size_t asn1hdrlen),
> +			   void *ctx)
> +{
> +	struct pkcs7_message *pkcs7;
> +	int ret;
> +
> +	pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
> +	if (IS_ERR(pkcs7))
> +		return PTR_ERR(pkcs7);
> +
> +	ret = verify_pkcs7_message_sig(data, len, pkcs7, trusted_keys, usage,
> +				       view_content, ctx);
> +
>  	pkcs7_free_message(pkcs7);
>  	pr_devel("<==%s() = %d\n", __func__, ret);
>  	return ret;
> diff --git a/include/linux/verification.h b/include/linux/verification.h
> index 018fb5f13d44..5e1d41f2b336 100644
> --- a/include/linux/verification.h
> +++ b/include/linux/verification.h
> @@ -36,6 +36,7 @@ extern const char *const key_being_used_for[NR__KEY_BEING_USED_FOR];
>  #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
>
>  struct key;
> +struct pkcs7_message;
>
>  extern int verify_pkcs7_signature(const void *data, size_t len,
>  				  const void *raw_pkcs7, size_t pkcs7_len,
> @@ -45,6 +46,15 @@ extern int verify_pkcs7_signature(const void *data, size_t len,
>  						      const void *data, size_t len,
>  						      size_t asn1hdrlen),
>  				  void *ctx);
> +extern int verify_pkcs7_message_sig(const void *data, size_t len,
> +				    struct pkcs7_message *pkcs7,
> +				    struct key *trusted_keys,
> +				    enum key_being_used_for usage,
> +				    int (*view_content)(void *ctx,
> +							const void *data,
> +							size_t len,
> +							size_t asn1hdrlen),
> +				    void *ctx);
>
>  #ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION
>  extern int verify_pefile_signature(const void *pebuf, unsigned pelen,

^ permalink raw reply

* Re: [PATCH v11 01/13] MODSIGN: Export module signature definitions
From: Thiago Jung Bauermann @ 2019-06-24 19:52 UTC (permalink / raw)
  To: Jessica Yu
  Cc: linux-security-module, keyrings, linux-crypto, linuxppc-dev,
	linux-doc, linux-kernel, Mimi Zohar, Dmitry Kasatkin,
	James Morris, Serge E. Hallyn, David Howells, David Woodhouse,
	Herbert Xu, David S. Miller, Jonathan Corbet, AKASHI, Takahiro,
	linux-integrity
In-Reply-To: <20190611062817.18412-2-bauerman@linux.ibm.com>


Hello Jessica,

AFAIK Mimi is happy with this patch set, but I still need acks from
maintainers of other subsystems that my changes touch before she can
accept it.

Is this patch OK from your PoV?

--
Thiago Jung Bauermann
IBM Linux Technology Center


Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:

> IMA will use the module_signature format for append signatures, so export
> the relevant definitions and factor out the code which verifies that the
> appended signature trailer is valid.
>
> Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
> and be able to use mod_check_sig() without having to depend on either
> CONFIG_MODULE_SIG or CONFIG_MODULES.
>
> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
> Cc: Jessica Yu <jeyu@kernel.org>
> ---
>  include/linux/module.h           |  3 --
>  include/linux/module_signature.h | 44 +++++++++++++++++++++++++
>  init/Kconfig                     |  6 +++-
>  kernel/Makefile                  |  1 +
>  kernel/module.c                  |  1 +
>  kernel/module_signature.c        | 46 ++++++++++++++++++++++++++
>  kernel/module_signing.c          | 56 +++++---------------------------
>  scripts/Makefile                 |  2 +-
>  8 files changed, 106 insertions(+), 53 deletions(-)
>
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 188998d3dca9..aa56f531cf1e 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -25,9 +25,6 @@
>  #include <linux/percpu.h>
>  #include <asm/module.h>
>
> -/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
> -#define MODULE_SIG_STRING "~Module signature appended~\n"
> -
>  /* Not Yet Implemented */
>  #define MODULE_SUPPORTED_DEVICE(name)
>
> diff --git a/include/linux/module_signature.h b/include/linux/module_signature.h
> new file mode 100644
> index 000000000000..523617fc5b6a
> --- /dev/null
> +++ b/include/linux/module_signature.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Module signature handling.
> + *
> + * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#ifndef _LINUX_MODULE_SIGNATURE_H
> +#define _LINUX_MODULE_SIGNATURE_H
> +
> +/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
> +#define MODULE_SIG_STRING "~Module signature appended~\n"
> +
> +enum pkey_id_type {
> +	PKEY_ID_PGP,		/* OpenPGP generated key ID */
> +	PKEY_ID_X509,		/* X.509 arbitrary subjectKeyIdentifier */
> +	PKEY_ID_PKCS7,		/* Signature in PKCS#7 message */
> +};
> +
> +/*
> + * Module signature information block.
> + *
> + * The constituents of the signature section are, in order:
> + *
> + *	- Signer's name
> + *	- Key identifier
> + *	- Signature data
> + *	- Information block
> + */
> +struct module_signature {
> +	u8	algo;		/* Public-key crypto algorithm [0] */
> +	u8	hash;		/* Digest algorithm [0] */
> +	u8	id_type;	/* Key identifier type [PKEY_ID_PKCS7] */
> +	u8	signer_len;	/* Length of signer's name [0] */
> +	u8	key_id_len;	/* Length of key identifier [0] */
> +	u8	__pad[3];
> +	__be32	sig_len;	/* Length of signature data */
> +};
> +
> +int mod_check_sig(const struct module_signature *ms, size_t file_len,
> +		  const char *name);
> +
> +#endif /* _LINUX_MODULE_SIGNATURE_H */
> diff --git a/init/Kconfig b/init/Kconfig
> index 8b9ffe236e4f..c2286a3c74c5 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1852,6 +1852,10 @@ config BASE_SMALL
>  	default 0 if BASE_FULL
>  	default 1 if !BASE_FULL
>
> +config MODULE_SIG_FORMAT
> +	def_bool n
> +	select SYSTEM_DATA_VERIFICATION
> +
>  menuconfig MODULES
>  	bool "Enable loadable module support"
>  	option modules
> @@ -1929,7 +1933,7 @@ config MODULE_SRCVERSION_ALL
>  config MODULE_SIG
>  	bool "Module signature verification"
>  	depends on MODULES
> -	select SYSTEM_DATA_VERIFICATION
> +	select MODULE_SIG_FORMAT
>  	help
>  	  Check modules for valid signatures upon load: the signature
>  	  is simply appended to the module. For more information see
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 33824f0385b3..f29ae2997a43 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -58,6 +58,7 @@ endif
>  obj-$(CONFIG_UID16) += uid16.o
>  obj-$(CONFIG_MODULES) += module.o
>  obj-$(CONFIG_MODULE_SIG) += module_signing.o
> +obj-$(CONFIG_MODULE_SIG_FORMAT) += module_signature.o
>  obj-$(CONFIG_KALLSYMS) += kallsyms.o
>  obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
>  obj-$(CONFIG_CRASH_CORE) += crash_core.o
> diff --git a/kernel/module.c b/kernel/module.c
> index 6e6712b3aaf5..2712f4d217f5 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -19,6 +19,7 @@
>  #include <linux/export.h>
>  #include <linux/extable.h>
>  #include <linux/moduleloader.h>
> +#include <linux/module_signature.h>
>  #include <linux/trace_events.h>
>  #include <linux/init.h>
>  #include <linux/kallsyms.h>
> diff --git a/kernel/module_signature.c b/kernel/module_signature.c
> new file mode 100644
> index 000000000000..4224a1086b7d
> --- /dev/null
> +++ b/kernel/module_signature.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Module signature checker
> + *
> + * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/printk.h>
> +#include <linux/module_signature.h>
> +#include <asm/byteorder.h>
> +
> +/**
> + * mod_check_sig - check that the given signature is sane
> + *
> + * @ms:		Signature to check.
> + * @file_len:	Size of the file to which @ms is appended.
> + * @name:	What is being checked. Used for error messages.
> + */
> +int mod_check_sig(const struct module_signature *ms, size_t file_len,
> +		  const char *name)
> +{
> +	if (be32_to_cpu(ms->sig_len) >= file_len - sizeof(*ms))
> +		return -EBADMSG;
> +
> +	if (ms->id_type != PKEY_ID_PKCS7) {
> +		pr_err("%s: Module is not signed with expected PKCS#7 message\n",
> +		       name);
> +		return -ENOPKG;
> +	}
> +
> +	if (ms->algo != 0 ||
> +	    ms->hash != 0 ||
> +	    ms->signer_len != 0 ||
> +	    ms->key_id_len != 0 ||
> +	    ms->__pad[0] != 0 ||
> +	    ms->__pad[1] != 0 ||
> +	    ms->__pad[2] != 0) {
> +		pr_err("%s: PKCS#7 signature info has unexpected non-zero params\n",
> +		       name);
> +		return -EBADMSG;
> +	}
> +
> +	return 0;
> +}
> diff --git a/kernel/module_signing.c b/kernel/module_signing.c
> index 6b9a926fd86b..cdd04a6b8074 100644
> --- a/kernel/module_signing.c
> +++ b/kernel/module_signing.c
> @@ -11,37 +11,13 @@
>
>  #include <linux/kernel.h>
>  #include <linux/errno.h>
> +#include <linux/module.h>
> +#include <linux/module_signature.h>
>  #include <linux/string.h>
>  #include <linux/verification.h>
>  #include <crypto/public_key.h>
>  #include "module-internal.h"
>
> -enum pkey_id_type {
> -	PKEY_ID_PGP,		/* OpenPGP generated key ID */
> -	PKEY_ID_X509,		/* X.509 arbitrary subjectKeyIdentifier */
> -	PKEY_ID_PKCS7,		/* Signature in PKCS#7 message */
> -};
> -
> -/*
> - * Module signature information block.
> - *
> - * The constituents of the signature section are, in order:
> - *
> - *	- Signer's name
> - *	- Key identifier
> - *	- Signature data
> - *	- Information block
> - */
> -struct module_signature {
> -	u8	algo;		/* Public-key crypto algorithm [0] */
> -	u8	hash;		/* Digest algorithm [0] */
> -	u8	id_type;	/* Key identifier type [PKEY_ID_PKCS7] */
> -	u8	signer_len;	/* Length of signer's name [0] */
> -	u8	key_id_len;	/* Length of key identifier [0] */
> -	u8	__pad[3];
> -	__be32	sig_len;	/* Length of signature data */
> -};
> -
>  /*
>   * Verify the signature on a module.
>   */
> @@ -49,6 +25,7 @@ int mod_verify_sig(const void *mod, struct load_info *info)
>  {
>  	struct module_signature ms;
>  	size_t sig_len, modlen = info->len;
> +	int ret;
>
>  	pr_devel("==>%s(,%zu)\n", __func__, modlen);
>
> @@ -56,32 +33,15 @@ int mod_verify_sig(const void *mod, struct load_info *info)
>  		return -EBADMSG;
>
>  	memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms));
> -	modlen -= sizeof(ms);
> +
> +	ret = mod_check_sig(&ms, modlen, info->name);
> +	if (ret)
> +		return ret;
>
>  	sig_len = be32_to_cpu(ms.sig_len);
> -	if (sig_len >= modlen)
> -		return -EBADMSG;
> -	modlen -= sig_len;
> +	modlen -= sig_len + sizeof(ms);
>  	info->len = modlen;
>
> -	if (ms.id_type != PKEY_ID_PKCS7) {
> -		pr_err("%s: Module is not signed with expected PKCS#7 message\n",
> -		       info->name);
> -		return -ENOPKG;
> -	}
> -
> -	if (ms.algo != 0 ||
> -	    ms.hash != 0 ||
> -	    ms.signer_len != 0 ||
> -	    ms.key_id_len != 0 ||
> -	    ms.__pad[0] != 0 ||
> -	    ms.__pad[1] != 0 ||
> -	    ms.__pad[2] != 0) {
> -		pr_err("%s: PKCS#7 signature info has unexpected non-zero params\n",
> -		       info->name);
> -		return -EBADMSG;
> -	}
> -
>  	return verify_pkcs7_signature(mod, modlen, mod + modlen, sig_len,
>  				      VERIFY_USE_SECONDARY_KEYRING,
>  				      VERIFYING_MODULE_SIGNATURE,
> diff --git a/scripts/Makefile b/scripts/Makefile
> index 9d442ee050bd..52098b080ab7 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -17,7 +17,7 @@ hostprogs-$(CONFIG_VT)           += conmakehash
>  hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
>  hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
>  hostprogs-$(CONFIG_ASN1)	 += asn1_compiler
> -hostprogs-$(CONFIG_MODULE_SIG)	 += sign-file
> +hostprogs-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
>  hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
>  hostprogs-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
>


^ permalink raw reply

* Re: [PATCH 0/3] Clean up crypto documentation
From: Joe Perches @ 2019-06-24 19:30 UTC (permalink / raw)
  To: Hook, Gary, herbert@gondor.apana.org.au, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net
In-Reply-To: <156140322426.29777.8610751479936722967.stgit@taos>

On Mon, 2019-06-24 at 19:07 +0000, Hook, Gary wrote:
> Tidy up the crypto documentation by filling in some variable
> descriptions, make some grammatical corrections, and enhance
> formatting.

While this seems generally OK, please try not to make the
readability of the source _text_ less intelligible just
to enhance the output formatting of the html.

e.g.:

Unnecessary blank lines separating function descriptions
Removing space alignment from bullet point descriptions

etc..,




^ permalink raw reply

* [PATCH 3/3] crypto: doc - Fix formatting of new crypto engine content
From: Hook, Gary @ 2019-06-24 19:07 UTC (permalink / raw)
  To: herbert@gondor.apana.org.au, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net
In-Reply-To: <156140322426.29777.8610751479936722967.stgit@taos>

Tidy up the formatting/grammar in crypto_engine.rst. Use lists where
appropriate.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
---
 Documentation/crypto/crypto_engine.rst |  120 ++++++++++++++++++++++----------
 1 file changed, 82 insertions(+), 38 deletions(-)

diff --git a/Documentation/crypto/crypto_engine.rst b/Documentation/crypto/crypto_engine.rst
index 1d56221dfe35..18e3a9afa444 100644
--- a/Documentation/crypto/crypto_engine.rst
+++ b/Documentation/crypto/crypto_engine.rst
@@ -1,50 +1,94 @@
-=============
-CRYPTO ENGINE
+.. SPDX-License-Identifier: GPL-2.0
+
+Crypto Engine
 =============
 
 Overview
 --------
-The crypto engine API (CE), is a crypto queue manager.
+
+The crypto engine (CE) API is a crypto queue manager.
 
 Requirement
 -----------
-You have to put at start of your tfm_ctx the struct crypto_engine_ctx::
 
-  struct your_tfm_ctx {
-        struct crypto_engine_ctx enginectx;
-        ...
-  };
+You must put, at the start of your transform context your_tfm_ctx, the structure
+crypto_engine:
+
+::
 
-Why: Since CE manage only crypto_async_request, it cannot know the underlying
-request_type and so have access only on the TFM.
-So using container_of for accessing __ctx is impossible.
-Furthermore, the crypto engine cannot know the "struct your_tfm_ctx",
-so it must assume that crypto_engine_ctx is at start of it.
+
+	struct your_tfm_ctx {
+		struct crypto_engine engine;
+		...
+	};
+
+
+The crypto engine only manages asynchronous requests in the form of
+crypto_async_request. It cannot know the underlying request type and thus only
+has access to the transform structure. It is not possible to access the context
+using container_of. In addition, the engine knows nothing about your
+structure "``struct your_tfm_ctx``". The engine assumes (requires) the placement
+of the known member ``struct crypto_engine`` at the beginning.
 
 Order of operations
 -------------------
-You have to obtain a struct crypto_engine via crypto_engine_alloc_init().
-And start it via crypto_engine_start().
-
-Before transferring any request, you have to fill the enginectx.
-- prepare_request: (taking a function pointer) If you need to do some processing before doing the request
-- unprepare_request: (taking a function pointer) Undoing what's done in prepare_request
-- do_one_request: (taking a function pointer) Do encryption for current request
-
-Note: that those three functions get the crypto_async_request associated with the received request.
-So your need to get the original request via container_of(areq, struct yourrequesttype_request, base);
-
-When your driver receive a crypto_request, you have to transfer it to
-the cryptoengine via one of:
-- crypto_transfer_ablkcipher_request_to_engine()
-- crypto_transfer_aead_request_to_engine()
-- crypto_transfer_akcipher_request_to_engine()
-- crypto_transfer_hash_request_to_engine()
-- crypto_transfer_skcipher_request_to_engine()
-
-At the end of the request process, a call to one of the following function is needed:
-- crypto_finalize_ablkcipher_request
-- crypto_finalize_aead_request
-- crypto_finalize_akcipher_request
-- crypto_finalize_hash_request
-- crypto_finalize_skcipher_request
+
+You are required to obtain a struct crypto_engine via ``crypto_engine_alloc_init()``.
+Start it via ``crypto_engine_start()``. When finished with your work, shut down the
+engine using ``crypto_engine_stop()`` and destroy the engine with
+``crypto_engine_exit()``.
+
+Before transferring any request, you have to fill the context enginectx by
+providing functions for the following:
+
+* ``prepare_crypt_hardware``: Called once before any prepare functions are
+  called.
+
+* ``unprepare_crypt_hardware``: Called once after all unprepare functions have
+  been called.
+
+* ``prepare_cipher_request``/``prepare_hash_request``: Called before each
+  corresponding request is performed. If some processing or other preparatory
+  work is required, do it here.
+
+* ``unprepare_cipher_request``/``unprepare_hash_request``: Called after each
+  request is handled. Clean up / undo what was done in the prepare function.
+
+* ``cipher_one_request``/``hash_one_request``: Handle the current request by
+  performing the operation.
+
+Note that these functions access the crypto_async_request structure
+associated with the received request. You are able to retrieve the original
+request by using:
+
+::
+
+
+	container_of(areq, struct yourrequesttype_request, base);
+
+
+
+When your driver receives a crypto_request, you must to transfer it to
+the crypto engine via one of:
+
+* ``crypto_transfer_ablkcipher_request_to_engine()``
+
+* ``crypto_transfer_aead_request_to_engine()``
+
+* ``crypto_transfer_akcipher_request_to_engine()``
+
+* ``crypto_transfer_hash_request_to_engine()``
+
+* ``crypto_transfer_skcipher_request_to_engine()``
+
+At the end of the request process, a call to one of the following functions is needed:
+
+* ``crypto_finalize_ablkcipher_request()``
+
+* ``crypto_finalize_aead_request()``
+
+* ``crypto_finalize_akcipher_request()``
+
+* ``crypto_finalize_hash_request()``
+
+* ``crypto_finalize_skcipher_request()``


^ permalink raw reply related

* [PATCH 2/3] crypto: doc - Describe the crypto engine
From: Hook, Gary @ 2019-06-24 19:07 UTC (permalink / raw)
  To: herbert@gondor.apana.org.au, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net
In-Reply-To: <156140322426.29777.8610751479936722967.stgit@taos>

Add a reference to the crypto engine documentation to
the index.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
---
 Documentation/crypto/index.rst |    1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst
index c4ff5d791233..37cd7fb0ea82 100644
--- a/Documentation/crypto/index.rst
+++ b/Documentation/crypto/index.rst
@@ -19,6 +19,7 @@ for cryptographic use cases, as well as programming examples.
    intro
    architecture
    devel-algos
+   crypto_engine
    userspace-if
    crypto_engine
    api


^ permalink raw reply related

* [PATCH 1/3] crypto: doc - Add parameter documentation
From: Hook, Gary @ 2019-06-24 19:07 UTC (permalink / raw)
  To: herbert@gondor.apana.org.au, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net
In-Reply-To: <156140322426.29777.8610751479936722967.stgit@taos>

Fill in missing parameter descriptions for the ompression algorithm,
then pick them up to document for the compression_alg structure.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
---
 Documentation/crypto/api-skcipher.rst |    2 +-
 include/linux/crypto.h                |   11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/crypto/api-skcipher.rst b/Documentation/crypto/api-skcipher.rst
index 4eec4a93f7e3..20ba08dddf2e 100644
--- a/Documentation/crypto/api-skcipher.rst
+++ b/Documentation/crypto/api-skcipher.rst
@@ -5,7 +5,7 @@ Block Cipher Algorithm Definitions
    :doc: Block Cipher Algorithm Definitions
 
 .. kernel-doc:: include/linux/crypto.h
-   :functions: crypto_alg ablkcipher_alg blkcipher_alg cipher_alg
+   :functions: crypto_alg ablkcipher_alg blkcipher_alg cipher_alg compress_alg
 
 Symmetric Key Cipher API
 ------------------------
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 311237b1dab0..4b4e2ffbee74 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -327,6 +327,17 @@ struct cipher_alg {
 	void (*cia_decrypt)(struct crypto_tfm *tfm, u8 *dst, const u8 *src);
 };
 
+/**
+ * struct compress_alg - compression/decompression algorithm
+ * @coa_compress: Compress a buffer of specified length, storing the resulting
+ *		  data in the specified buffer. Return the length of the
+ *		  compressed data in dlen.
+ * @coa_decompress: Decompress the source buffer, storing the uncompressed
+ *		    data in the specified buffer. The length of the data is
+ *		    returned in dlen.
+ *
+ * All fields are mandatory.
+ */
 struct compress_alg {
 	int (*coa_compress)(struct crypto_tfm *tfm, const u8 *src,
 			    unsigned int slen, u8 *dst, unsigned int *dlen);


^ permalink raw reply related

* [PATCH 0/3] Clean up crypto documentation
From: Hook, Gary @ 2019-06-24 19:07 UTC (permalink / raw)
  To: herbert@gondor.apana.org.au, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net

Tidy up the crypto documentation by filling in some variable
descriptions, make some grammatical corrections, and enhance
formatting.

---

Gary R Hook (3):
      crypto: doc - Add parameter documentation
      crypto: doc - Describe the crypto engine
      crypto: doc - Fix formatting of new crypto engine content


 Documentation/crypto/api-skcipher.rst  |    2 -
 Documentation/crypto/crypto_engine.rst |  120 ++++++++++++++++++++++----------
 Documentation/crypto/index.rst         |    1 
 include/linux/crypto.h                 |   11 +++
 4 files changed, 95 insertions(+), 39 deletions(-)

--

^ permalink raw reply

* [PATCH] Documentation: dmaengine: clean up description of dmatest usage
From: Hook, Gary @ 2019-06-24 18:35 UTC (permalink / raw)
  To: dmaengine@vger.kernel.org, vkoul@kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org

Fix the formatting of the multi-channel test usage example. Call out
the note about parameter ordering and add detail on the settings of
parameters for the new version of dmatest.

Fixes: f80f9988a26d7 ("dmaengine: Documentation: Add documentation for multi chan testing")

Signed-off-by: Gary R Hook <gary.hook@amd.com>
---
 Documentation/driver-api/dmaengine/dmatest.rst |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/Documentation/driver-api/dmaengine/dmatest.rst b/Documentation/driver-api/dmaengine/dmatest.rst
index e78d070bb468..ee268d445d38 100644
--- a/Documentation/driver-api/dmaengine/dmatest.rst
+++ b/Documentation/driver-api/dmaengine/dmatest.rst
@@ -44,7 +44,8 @@ Example of usage::
 
     dmatest.timeout=2000 dmatest.iterations=1 dmatest.channel=dma0chan0 dmatest.run=1
 
-Example of multi-channel test usage:
+Example of multi-channel test usage (new in the 5.0 kernel)::
+
     % modprobe dmatest
     % echo 2000 > /sys/module/dmatest/parameters/timeout
     % echo 1 > /sys/module/dmatest/parameters/iterations
@@ -53,15 +54,18 @@ Example of multi-channel test usage:
     % echo dma0chan2 > /sys/module/dmatest/parameters/channel
     % echo 1 > /sys/module/dmatest/parameters/run
 
-Note: the channel parameter should always be the last parameter set prior to
-running the test (setting run=1), this is because upon setting the channel
-parameter, that specific channel is requested using the dmaengine and a thread
-is created with the existing parameters. This thread is set as pending
-and will be executed once run is set to 1. Any parameters set after the thread
-is created are not applied.
+.. note::
+  For all tests, starting in the 5.0 kernel, either single- or multi-channel,
+  the channel parameter(s) must be set after all other parameters. It is at
+  that time that the existing parameter values are acquired for use by the
+  thread(s). All other parameters are shared. Therefore, if changes are made
+  to any of the other parameters, and an additional channel specified, the
+  (shared) parameters used for all threads will use the new values.
+  After the channels are specified, each thread is set as pending. All threads
+  begin execution when the run parameter is set to 1.
 
 .. hint::
-  available channel list could be extracted by running the following command::
+  A list of available channels can be found by running the following command::
 
     % ls -1 /sys/class/dma/
 
@@ -204,6 +208,7 @@ Releasing Channels
 Channels can be freed by setting run to 0.
 
 Example::
+
     % echo dma0chan1 > /sys/module/dmatest/parameters/channel
     dmatest: Added 1 threads using dma0chan1
     % cat /sys/class/dma/dma0chan1/in_use


^ permalink raw reply related

* Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6
From: Eric Wheeler @ 2019-06-24 18:14 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-block, Jonathan Corbet, Kent Overstreet,
	open list:DOCUMENTATION, open list,
	open list:BCACHE (BLOCK LAYER CACHE), Martin K. Petersen
In-Reply-To: <200638b0-7cba-38b4-20c4-b325f3cfe862@suse.de>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 8629 bytes --]

On Mon, 24 Jun 2019, Coly Li wrote:

> On 2019/6/23 7:16 上午, Eric Wheeler wrote:
> > From: Eric Wheeler <git@linux.ewheeler.net>
> > 
> > While some drivers set queue_limits.io_opt (e.g., md raid5), there are
> > currently no SCSI/RAID controller drivers that do.  Previously stripe_size
> > and partial_stripes_expensive were read-only values and could not be
> > tuned by users (eg, for hardware RAID5/6).
> > 
> > This patch enables users to save the optimal IO size via sysfs through
> > the backing device attributes stripe_size and partial_stripes_expensive
> > into the bcache superblock.
> > 
> > Superblock changes are backwards-compatable:
> > 
> > *  partial_stripes_expensive: One bit was used in the superblock flags field
> > 
> > *  stripe_size: There are eight 64-bit "pad" fields for future use in
> >    the superblock which default to 0; from those, 32-bits are now used
> >    to save the stripe_size and load at device registration time.
> > 
> > Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
> 
> Hi Eric,
> 
> In general I am OK with this patch. Since Peter comments lots of SCSI
> RAID devices reports a stripe width, could you please list the hardware
> raid devices which don't list stripe size ? Then we can make decision
> whether it is necessary to have such option enabled.

Perhaps they do not set stripe_width using io_opt? I did a grep to see if 
any of them did, but I didn't see them. How is stripe_width indicated by 
RAID controllers? 

If they do set io_opt, then at least my Areca 1883 does not set io_opt as 
of 4.19.x. I also have a LSI MegaRAID 3108 which does not report io_opt as 
of 4.1.x, but that is an older kernel so maybe support has been added 
since then.

Martin,

Where would stripe_width be configured in the SCSI drivers? Is it visible 
through sysfs or debugfs so I can check my hardware support without 
hacking debugging the kernel?

> 
> Another point is, this patch changes struct cache_sb, it is no problem
> to change on-disk format. I plan to update the super block version soon,
> to store more configuration persistently into super block. stripe_size
> can be added to cache_sb with other on-disk changes.

Maybe bumping version makes sense, but even if you do not, this is safe to 
use on systems without bumping the version because the values are unused 
and default to 0.

--
Eric Wheeler

> 
> Thanks.
> 
> Coly Li
> 
> 
> > ---
> >  Documentation/admin-guide/bcache.rst | 21 +++++++++++++++++++++
> >  drivers/md/bcache/super.c            | 15 ++++++++++++++-
> >  drivers/md/bcache/sysfs.c            | 33 +++++++++++++++++++++++++++++++--
> >  include/uapi/linux/bcache.h          |  6 ++++--
> >  4 files changed, 70 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/bcache.rst b/Documentation/admin-guide/bcache.rst
> > index c0ce64d..ef82022 100644
> > --- a/Documentation/admin-guide/bcache.rst
> > +++ b/Documentation/admin-guide/bcache.rst
> > @@ -420,6 +420,12 @@ dirty_data
> >  label
> >    Name of underlying device.
> >  
> > +partial_stripes_expensive
> > +  Flag to bcache that partial or unaligned stripe_size'd
> > +  writes to the backing device are expensive (e.g., RAID5/6 incur
> > +  read-copy-write). Writing this sysfs attribute updates the superblock
> > +  and also takes effect immediately.  See also stripe_size, below.
> > +
> >  readahead
> >    Size of readahead that should be performed.  Defaults to 0.  If set to e.g.
> >    1M, it will round cache miss reads up to that size, but without overlapping
> > @@ -458,6 +464,21 @@ stop
> >    Write to this file to shut down the bcache device and close the backing
> >    device.
> >  
> > +stripe_size
> > +  The stripe size in bytes of the backing device for optimial
> > +  write performance (also known as the "stride width"). This is set
> > +  automatically when using a device driver sets blk_limits_io_opt
> > +  (e.g., md, rbd, skd, zram, virtio_blk).  No hardware RAID controller
> > +  sets blk_limits_io_opt as of 2019-06-15, so configure this to suit
> > +  your needs.  Note that you must unregister and re-register the backing
> > +  device after making a change to stripe_size.
> > +
> > +  Where N is the number of data disks,
> > +    RAID5: stripe_size = (N-1)*RAID_CHUNK_SIZE.
> > +    RAID6: stripe_size = (N-2)*RAID_CHUNK_SIZE.
> > +
> > +  See also partial_stripes_expensive, above.
> > +
> >  writeback_delay
> >    When dirty data is written to the cache and it previously did not contain
> >    any, waits some number of seconds before initiating writeback. Defaults to
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index 1b63ac8..d0b9501 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -80,6 +80,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
> >  
> >  	sb->flags		= le64_to_cpu(s->flags);
> >  	sb->seq			= le64_to_cpu(s->seq);
> > +	sb->stripe_size		= le32_to_cpu(s->stripe_size);
> >  	sb->last_mount		= le32_to_cpu(s->last_mount);
> >  	sb->first_bucket	= le16_to_cpu(s->first_bucket);
> >  	sb->keys		= le16_to_cpu(s->keys);
> > @@ -221,6 +222,7 @@ static void __write_super(struct cache_sb *sb, struct bio *bio)
> >  
> >  	out->flags		= cpu_to_le64(sb->flags);
> >  	out->seq		= cpu_to_le64(sb->seq);
> > +	out->stripe_size	= cpu_to_le32(sb->stripe_size);
> >  
> >  	out->last_mount		= cpu_to_le32(sb->last_mount);
> >  	out->first_bucket	= cpu_to_le16(sb->first_bucket);
> > @@ -1258,7 +1260,18 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
> >  
> >  	dc->disk.stripe_size = q->limits.io_opt >> 9;
> >  
> > -	if (dc->disk.stripe_size)
> > +	if (dc->sb.stripe_size) {
> > +		if (dc->disk.stripe_size &&
> > +		    dc->disk.stripe_size != dc->sb.stripe_size) {
> > +			pr_warn("superblock stripe_size (%d) overrides bdev stripe_size (%d)\n",
> > +				(int)dc->sb.stripe_size,
> > +				(int)dc->disk.stripe_size);
> > +		}
> > +
> > +		dc->disk.stripe_size = dc->sb.stripe_size;
> > +		dc->partial_stripes_expensive =
> > +			(unsigned int)BDEV_PARTIAL_STRIPES_EXPENSIVE(&dc->sb);
> > +	} else if (dc->disk.stripe_size)
> >  		dc->partial_stripes_expensive =
> >  			q->limits.raid_partial_stripes_expensive;
> >  
> > diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
> > index bfb437f..4ebca52 100644
> > --- a/drivers/md/bcache/sysfs.c
> > +++ b/drivers/md/bcache/sysfs.c
> > @@ -111,8 +111,8 @@
> >  rw_attribute(writeback_rate_minimum);
> >  read_attribute(writeback_rate_debug);
> >  
> > -read_attribute(stripe_size);
> > -read_attribute(partial_stripes_expensive);
> > +rw_attribute(stripe_size);
> > +rw_attribute(partial_stripes_expensive);
> >  
> >  rw_attribute(synchronous);
> >  rw_attribute(journal_delay_ms);
> > @@ -343,6 +343,35 @@ static ssize_t bch_snprint_string_list(char *buf,
> >  		}
> >  	}
> >  
> > +	if (attr == &sysfs_stripe_size) {
> > +		int v = strtoul_or_return(buf);
> > +
> > +		if (v & 0x1FF) {
> > +			pr_err("stripe_size must be a muliple of 512-byte sectors");
> > +			return -EINVAL;
> > +		}
> > +
> > +		v >>= 9;
> > +
> > +		if (v != dc->sb.stripe_size) {
> > +			dc->sb.stripe_size = v;
> > +			pr_info("stripe_size=%d, re-register to take effect.",
> > +				v<<9);
> > +			bch_write_bdev_super(dc, NULL);
> > +		} else
> > +			pr_info("stripe_size is already set to %d.", v<<9);
> > +	}
> > +
> > +	if (attr == &sysfs_partial_stripes_expensive) {
> > +		int v = strtoul_or_return(buf);
> > +
> > +		if (v != BDEV_PARTIAL_STRIPES_EXPENSIVE(&dc->sb)) {
> > +			SET_BDEV_PARTIAL_STRIPES_EXPENSIVE(&dc->sb, v);
> > +			dc->partial_stripes_expensive = v;
> > +			bch_write_bdev_super(dc, NULL);
> > +		}
> > +	}
> > +
> >  	if (attr == &sysfs_stop_when_cache_set_failed) {
> >  		v = __sysfs_match_string(bch_stop_on_failure_modes, -1, buf);
> >  		if (v < 0)
> > diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h
> > index 5d4f58e..ee60914 100644
> > --- a/include/uapi/linux/bcache.h
> > +++ b/include/uapi/linux/bcache.h
> > @@ -172,7 +172,9 @@ struct cache_sb {
> >  
> >  	__u64			flags;
> >  	__u64			seq;
> > -	__u64			pad[8];
> > +	__u32			stripe_size;
> > +	__u32			pad_u32;
> > +	__u64			pad_u64[7];
> >  
> >  	union {
> >  	struct {
> > @@ -230,7 +232,7 @@ static inline _Bool SB_IS_BDEV(const struct cache_sb *sb)
> >  #define BDEV_STATE_CLEAN		1U
> >  #define BDEV_STATE_DIRTY		2U
> >  #define BDEV_STATE_STALE		3U
> > -
> > +BITMASK(BDEV_PARTIAL_STRIPES_EXPENSIVE,	struct cache_sb, flags, 60, 1);
> >  /*
> >   * Magic numbers
> >   *
> > 
> 
> 
> -- 
> 
> Coly Li
> 

^ permalink raw reply

* [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
From: Waiman Long @ 2019-06-24 17:42 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Alexander Viro, Jonathan Corbet, Luis Chamberlain,
	Kees Cook, Johannes Weiner, Michal Hocko, Vladimir Davydov
  Cc: linux-mm, linux-doc, linux-fsdevel, cgroups, linux-kernel,
	Roman Gushchin, Shakeel Butt, Andrea Arcangeli, Waiman Long
In-Reply-To: <20190624174219.25513-1-longman@redhat.com>

With the slub memory allocator, the numbers of active slab objects
reported in /proc/slabinfo are not real because they include objects
that are held by the per-cpu slab structures whether they are actually
used or not.  The problem gets worse the more CPUs a system have. For
instance, looking at the reported number of active task_struct objects,
one will wonder where all the missing tasks gone.

I know it is hard and costly to get a real count of active objects. So
I am not advocating for that. Instead, this patch extends the
/proc/sys/vm/drop_caches sysctl parameter by using a new bit (bit 3)
to shrink all the kmem slabs which will flush out all the slabs in the
per-cpu structures and give a more accurate view of how much memory are
really used up by the active slab objects. This is a costly operation,
of course, but it gives a way to have a clearer picture of the actual
number of slab objects used, if the need arises.

The upper range of the drop_caches sysctl parameter is increased to 15
to allow all possible combinations of the lowest 4 bits.

On a 2-socket 64-core 256-thread ARM64 system with 64k page size after
a parallel kernel build, the amount of memory occupied by slabs before
and after echoing to drop_caches were:

 # grep task_struct /proc/slabinfo
 task_struct        48376  48434   4288   61    4 : tunables    0    0
 0 : slabdata    794    794      0
 # grep "^S[lRU]" /proc/meminfo
 Slab:            3419072 kB
 SReclaimable:     354688 kB
 SUnreclaim:      3064384 kB
 # echo 3 > /proc/sys/vm/drop_caches
 # grep "^S[lRU]" /proc/meminfo
 Slab:            3351680 kB
 SReclaimable:     316096 kB
 SUnreclaim:      3035584 kB
 # echo 8 > /proc/sys/vm/drop_caches
 # grep "^S[lRU]" /proc/meminfo
 Slab:            1008192 kB
 SReclaimable:     126912 kB
 SUnreclaim:       881280 kB
 # grep task_struct /proc/slabinfo
 task_struct         2601   6588   4288   61    4 : tunables    0    0
 0 : slabdata    108    108      0

Shrinking the slabs saves more than 2GB of memory in this case. This
new feature certainly fulfills the promise of dropping caches.

Unlike counting objects in the per-node caches done by /proc/slabinfo
which is rather light weight, iterating all the per-cpu caches and
shrinking them is much more heavy weight.

For this particular instance, the time taken to shrinks all the root
caches was about 30.2ms. There were 73 memory cgroup and the longest
time taken for shrinking the largest one was about 16.4ms. The total
shrinking time was about 101ms.

Because of the potential long time to shrinks all the caches, the
slab_mutex was taken multiple times - once for all the root caches
and once for each memory cgroup. This is to reduce the slab_mutex hold
time to minimize impact to other running applications that may need to
acquire the mutex.

The slab shrinking feature is only available when CONFIG_MEMCG_KMEM is
defined as the code need to access slab_root_caches to iterate all the
root caches.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/sysctl/vm.txt | 11 ++++++++--
 fs/drop_caches.c            |  4 ++++
 include/linux/slab.h        |  1 +
 kernel/sysctl.c             |  4 ++--
 mm/slab_common.c            | 44 +++++++++++++++++++++++++++++++++++++
 5 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 749322060f10..b643ac8968d2 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -207,8 +207,8 @@ Setting this to zero disables periodic writeback altogether.
 drop_caches
 
 Writing to this will cause the kernel to drop clean caches, as well as
-reclaimable slab objects like dentries and inodes.  Once dropped, their
-memory becomes free.
+reclaimable slab objects like dentries and inodes.  It can also be used
+to shrink the slabs.  Once dropped, their memory becomes free.
 
 To free pagecache:
 	echo 1 > /proc/sys/vm/drop_caches
@@ -216,6 +216,8 @@ To free reclaimable slab objects (includes dentries and inodes):
 	echo 2 > /proc/sys/vm/drop_caches
 To free slab objects and pagecache:
 	echo 3 > /proc/sys/vm/drop_caches
+To shrink the slabs:
+	echo 8 > /proc/sys/vm/drop_caches
 
 This is a non-destructive operation and will not free any dirty objects.
 To increase the number of objects freed by this operation, the user may run
@@ -223,6 +225,11 @@ To increase the number of objects freed by this operation, the user may run
 number of dirty objects on the system and create more candidates to be
 dropped.
 
+Shrinking the slabs can reduce the memory footprint used by the slabs.
+It also makes the number of active objects reported in /proc/slabinfo
+more representative of the actual number of objects used for the slub
+memory allocator.
+
 This file is not a means to control the growth of the various kernel caches
 (inodes, dentries, pagecache, etc...)  These objects are automatically
 reclaimed by the kernel when memory is needed elsewhere on the system.
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d31b6c72b476..633b99e25dab 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -9,6 +9,7 @@
 #include <linux/writeback.h>
 #include <linux/sysctl.h>
 #include <linux/gfp.h>
+#include <linux/slab.h>
 #include "internal.h"
 
 /* A global variable is a bit ugly, but it keeps the code simple */
@@ -65,6 +66,9 @@ int drop_caches_sysctl_handler(struct ctl_table *table, int write,
 			drop_slab();
 			count_vm_event(DROP_SLAB);
 		}
+		if (sysctl_drop_caches & 8) {
+			kmem_cache_shrink_all();
+		}
 		if (!stfu) {
 			pr_info("%s (%d): drop_caches: %d\n",
 				current->comm, task_pid_nr(current),
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9449b19c5f10..f7c1626b2aa6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -149,6 +149,7 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
 			void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
+void kmem_cache_shrink_all(void);
 
 void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
 void memcg_deactivate_kmem_caches(struct mem_cgroup *);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1beca96fb625..feeb867dabd7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -129,7 +129,7 @@ static int __maybe_unused neg_one = -1;
 static int zero;
 static int __maybe_unused one = 1;
 static int __maybe_unused two = 2;
-static int __maybe_unused four = 4;
+static int __maybe_unused fifteen = 15;
 static unsigned long zero_ul;
 static unsigned long one_ul = 1;
 static unsigned long long_max = LONG_MAX;
@@ -1455,7 +1455,7 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= drop_caches_sysctl_handler,
 		.extra1		= &one,
-		.extra2		= &four,
+		.extra2		= &fifteen,
 	},
 #ifdef CONFIG_COMPACTION
 	{
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 58251ba63e4a..b3c5b64f9bfb 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -956,6 +956,50 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
 }
 EXPORT_SYMBOL(kmem_cache_shrink);
 
+#ifdef CONFIG_MEMCG_KMEM
+static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg,
+				    void __maybe_unused *arg)
+{
+	struct kmem_cache *s;
+
+	if (memcg == root_mem_cgroup)
+		return;
+	mutex_lock(&slab_mutex);
+	list_for_each_entry(s, &memcg->kmem_caches,
+			    memcg_params.kmem_caches_node) {
+		kmem_cache_shrink(s);
+	}
+	mutex_unlock(&slab_mutex);
+	cond_resched();
+}
+
+/*
+ * Shrink all the kmem caches.
+ *
+ * If there are a large number of memory cgroups outstanding, it may take
+ * a while to shrink all of them. So we may need to release the lock, call
+ * cond_resched() and reacquire the lock from time to time.
+ */
+void kmem_cache_shrink_all(void)
+{
+	struct kmem_cache *s;
+
+	/* Shrink all the root caches */
+	mutex_lock(&slab_mutex);
+	list_for_each_entry(s, &slab_root_caches, root_caches_node)
+		kmem_cache_shrink(s);
+	mutex_unlock(&slab_mutex);
+	cond_resched();
+
+	/*
+	 * Flush each of the memcg individually
+	 */
+	memcg_iterate_all(kmem_cache_shrink_memcg, NULL);
+}
+#else
+void kmem_cache_shrink_all(void) { }
+#endif
+
 bool slab_is_available(void)
 {
 	return slab_state >= UP;
-- 
2.18.1


^ permalink raw reply related

* [PATCH 1/2] mm, memcontrol: Add memcg_iterate_all()
From: Waiman Long @ 2019-06-24 17:42 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Alexander Viro, Jonathan Corbet, Luis Chamberlain,
	Kees Cook, Johannes Weiner, Michal Hocko, Vladimir Davydov
  Cc: linux-mm, linux-doc, linux-fsdevel, cgroups, linux-kernel,
	Roman Gushchin, Shakeel Butt, Andrea Arcangeli, Waiman Long
In-Reply-To: <20190624174219.25513-1-longman@redhat.com>

Add a memcg_iterate_all() function for iterating all the available
memory cgroups and call the given callback function for each of the
memory cgruops.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/memcontrol.h |  3 +++
 mm/memcontrol.c            | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1dcb763bb610..0e31418e5a47 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1268,6 +1268,9 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep);
 void memcg_kmem_put_cache(struct kmem_cache *cachep);
 
+extern void memcg_iterate_all(void (*callback)(struct mem_cgroup *memcg,
+					       void *arg), void *arg);
+
 #ifdef CONFIG_MEMCG_KMEM
 int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order);
 void __memcg_kmem_uncharge(struct page *page, int order);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba9138a4a1de..c1c4706f7696 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -443,6 +443,19 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup *memcg)
 static void memcg_free_shrinker_maps(struct mem_cgroup *memcg) { }
 #endif /* CONFIG_MEMCG_KMEM */
 
+/*
+ * Iterate all the memory cgroups and call the given callback function
+ * for each of the memory cgroups.
+ */
+void memcg_iterate_all(void (*callback)(struct mem_cgroup *memcg, void *arg),
+		       void *arg)
+{
+	struct mem_cgroup *memcg;
+
+	for_each_mem_cgroup(memcg)
+		callback(memcg, arg);
+}
+
 /**
  * mem_cgroup_css_from_page - css of the memcg associated with a page
  * @page: page of interest
-- 
2.18.1


^ permalink raw reply related

* [PATCH 0/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
From: Waiman Long @ 2019-06-24 17:42 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Alexander Viro, Jonathan Corbet, Luis Chamberlain,
	Kees Cook, Johannes Weiner, Michal Hocko, Vladimir Davydov
  Cc: linux-mm, linux-doc, linux-fsdevel, cgroups, linux-kernel,
	Roman Gushchin, Shakeel Butt, Andrea Arcangeli, Waiman Long

The purpose of this patchset is to allow system administrators to have
the ability to shrink all the kmem slabs in order to free up memory
and get a more accurate picture of how many slab objects are actually
being used.

Patch 1 adds a new memcg_iterate_all() that is used by the patch 2 to
iterate on all the memory cgroups.

Waiman Long (2):
  mm, memcontrol: Add memcg_iterate_all()
  mm, slab: Extend vm/drop_caches to shrink kmem slabs

 Documentation/sysctl/vm.txt | 11 ++++++++--
 fs/drop_caches.c            |  4 ++++
 include/linux/memcontrol.h  |  3 +++
 include/linux/slab.h        |  1 +
 kernel/sysctl.c             |  4 ++--
 mm/memcontrol.c             | 13 +++++++++++
 mm/slab_common.c            | 44 +++++++++++++++++++++++++++++++++++++
 7 files changed, 76 insertions(+), 4 deletions(-)

-- 
2.18.1


^ permalink raw reply

* Re: [PATCH 1/3] Docs: An initial automarkup extension for sphinx
From: Mauro Carvalho Chehab @ 2019-06-24 16:38 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, Matthew Wilcox, Jani Nikula, linux-kernel
In-Reply-To: <20190624082950.5e338d37@lwn.net>

Em Mon, 24 Jun 2019 08:29:50 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> On Sat, 22 Jun 2019 14:46:10 -0300
> Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:
> 
> > > > .. c:function:: int ioctl( int fd, int request, void *argp )
> > > >     :name: v4l2-ioctl      
> > > 
> > > Some digging around didn't turn up any documentation for :name:, but it
> > > seems to prevent ioctl() from going into the list of functions that can be
> > > cross-referenced.     
> > 
> > It took me a while to discover this way to be able to re-define the
> > name of a symbol at the C domain, but I'm pretty sure I read this
> > somewhere at the Sphinx docs (or perhaps on some bug track or Stack
> > Overflow).
> > 
> > I don't remember exactly where I get it, but I guess it is related to
> > this:
> > 
> > 	http://docutils.sourceforge.net/docs/howto/rst-roles.html
> >   
> > > I wonder if the same should be done for the others?    
> > 
> > Sure.  
> 
> It actually occurs to me that it might be better to keep the skip list and
> maybe expand it.  There are vast numbers of places where people write
> open() or whatever(), and there is no point in even trying to
> cross-reference them. 

Yeah, subsystems will likely have their own "man pages" for for sysctls.

Both the dvb and the v4l2 ones are part of what used to be the DocBook
manpages for those syscalls. If I'm not mistaken, we ended by expanded 
the same approach for the media controller, for CEC and for RC.

> I should do some tests, it might even make a
> measurable difference in the build time to skip them :)  And in any case,
> somebody is bound to put one of those common names into the namespace in
> the future, recreating the current problem.

There is one way of keeping it while avoiding troubles: you could
create internal names, for example using the current dir, auto-adding
the ":name:" tag when a declaration conflict rises. Not sure how
easy/hard would be to implement it.

Btw, at get_abi.pl, I had to do a trick like that, as some symbols
have a "local context". The good thing at ABI files is that the context
is clear: it is valid only between "What:" and "Description:" fields.

Also, as it is a single script that parses the entire ABI (it takes
on ~0.1 seconds to parse everything and store internally on my machine),
the logic there detects when an existing symbol is re-used on a different
context. When this happens, it adds a random char at the end of the
internal reference, while keeping the original name[1].

[1] on the the highly unlikely event that the new name still repeats,
it adds a new random char - until the name gets different.

Probably doing it at automarkup won't be that simple, but it could
work.

Thanks,
Mauro

^ permalink raw reply

* [PATCH v2 23/24] EDAC, Documentation: Describe CPER module definition and DIMM ranks
From: Robert Richter @ 2019-06-24 15:09 UTC (permalink / raw)
  To: Borislav Petkov, James Morse, Mauro Carvalho Chehab, Tony Luck,
	Jonathan Corbet
  Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	Robert Richter, linux-doc@vger.kernel.org
In-Reply-To: <20190624150758.6695-1-rrichter@marvell.com>

Update on CPER DIMM naming convention and DIMM ranks.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 Documentation/admin-guide/ras.rst | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index c7495e42e6f4..4e2a01c77a9c 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -330,9 +330,12 @@ There can be multiple csrows and multiple channels.
 
 .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
   used to refer to a memory module, although there are other memory
-  packaging alternatives, like SO-DIMM, SIMM, etc. Along this document,
-  and inside the EDAC system, the term "dimm" is used for all memory
-  modules, even when they use a different kind of packaging.
+  packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
+  specification (Version 2.7) defines a memory module in the Common
+  Platform Error Record (CPER) section to be an SMBIOS Memory Device
+  (Type 17). Along this document, and inside the EDAC system, the term
+  "dimm" is used for all memory modules, even when they use a
+  different kind of packaging.
 
 Memory controllers allow for several csrows, with 8 csrows being a
 typical value. Yet, the actual number of csrows depends on the layout of
@@ -349,12 +352,14 @@ controllers. The following example will assume 2 channels:
 	|            |  ``ch0``  |  ``ch1``  |
 	+============+===========+===========+
 	| ``csrow0`` |  DIMM_A0  |  DIMM_B0  |
-	+------------+           |           |
-	| ``csrow1`` |           |           |
+	|            |   rank0   |   rank0   |
+	+------------+     -     |     -     |
+	| ``csrow1`` |   rank1   |   rank1   |
 	+------------+-----------+-----------+
 	| ``csrow2`` |  DIMM_A1  | DIMM_B1   |
-	+------------+           |           |
-	| ``csrow3`` |           |           |
+	|            |   rank0   |   rank0   |
+	+------------+     -     |     -     |
+	| ``csrow3`` |   rank1   |   rank1   |
 	+------------+-----------+-----------+
 
 In the above example, there are 4 physical slots on the motherboard
@@ -374,11 +379,13 @@ which the memory DIMM is placed. Thus, when 1 DIMM is placed in each
 Channel, the csrows cross both DIMMs.
 
 Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
-Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
-will have just one csrow (csrow0). csrow1 will be empty. On the other
-hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0
-and csrow1 will be populated. The pattern repeats itself for csrow2 and
-csrow3.
+In the example above 2 dual ranked DIMMs are similarly placed. Thus,
+both csrow0 and csrow1 are populated. On the other hand, when 2 single
+ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0, then they will
+have just one csrow (csrow0) and csrow1 will be empty. The pattern
+repeats itself for csrow2 and csrow3. Also note that some memory
+controller doesn't have any logic to identify the memory module, see
+``rankX`` directories below.
 
 The representation of the above is reflected in the directory
 tree in EDAC's sysfs interface. Starting in directory
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 1/3] Docs: An initial automarkup extension for sphinx
From: Jonathan Corbet @ 2019-06-24 14:29 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-doc, Matthew Wilcox, Jani Nikula, linux-kernel
In-Reply-To: <20190622144610.26b7d99c@coco.lan>

On Sat, 22 Jun 2019 14:46:10 -0300
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:

> > > .. c:function:: int ioctl( int fd, int request, void *argp )
> > >     :name: v4l2-ioctl    
> > 
> > Some digging around didn't turn up any documentation for :name:, but it
> > seems to prevent ioctl() from going into the list of functions that can be
> > cross-referenced.   
> 
> It took me a while to discover this way to be able to re-define the
> name of a symbol at the C domain, but I'm pretty sure I read this
> somewhere at the Sphinx docs (or perhaps on some bug track or Stack
> Overflow).
> 
> I don't remember exactly where I get it, but I guess it is related to
> this:
> 
> 	http://docutils.sourceforge.net/docs/howto/rst-roles.html
> 
> > I wonder if the same should be done for the others?  
> 
> Sure.

It actually occurs to me that it might be better to keep the skip list and
maybe expand it.  There are vast numbers of places where people write
open() or whatever(), and there is no point in even trying to
cross-reference them.  I should do some tests, it might even make a
measurable difference in the build time to skip them :)  And in any case,
somebody is bound to put one of those common names into the namespace in
the future, recreating the current problem.

jon

^ permalink raw reply

* Re: [PATCH 1/3] Docs: An initial automarkup extension for sphinx
From: Jonathan Corbet @ 2019-06-24 14:25 UTC (permalink / raw)
  To: Jani Nikula
  Cc: linux-doc, Matthew Wilcox, Mauro Carvalho Chehab, linux-kernel
In-Reply-To: <87k1dbrziw.fsf@intel.com>

On Mon, 24 Jun 2019 14:30:47 +0300
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> > +def auto_markup(app, doctree, name):
> > +    for para in doctree.traverse(nodes.paragraph):
> > +        for node in para.traverse(nodes.Text):
> > +            if not isinstance(node.parent, nodes.literal):
> > +                node.parent.replace(node, markup_funcs(name, app, node))  
> 
> I think overall this is a better approach than preprocessing. Thanks for
> doing this!
> 
> I toyed with something like this before, and the key difference here
> seems to be ignoring literal blocks. The problem seemed to be that
> replacing blocks with syntax highlighting also removed the syntax
> highlighting, with no way that I could find to bring it back.

That test could use a comment, really.  What it is actually doing is
skipping text chunks in ``inline literal`` sections, and what that is
*actually* doing is avoiding marking up functions that have an
explicit :c:func: markup on them already.

Someday I don't doubt that this loop will be replaced by a proper tree
walk that knows where to prune things and how to replace various other
types of nodes, but this is easy and does the right thing pretty much
everywhere as far as I can tell.

Thanks,

jon

^ permalink raw reply

* Re: [PATCH next] softirq: enable MAX_SOFTIRQ_TIME tuning with sysctl max_softirq_time_usecs
From: Zhiqiang Liu @ 2019-06-24 13:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: corbet, mcgrof, Kees Cook, akpm, manfred, jwilk, dvyukov,
	feng.tang, sunilmut, quentin.perret, linux, alex.popov, linux-doc,
	linux-kernel, linux-fsdevel, wangxiaogang (F), Zhoukang (A),
	Mingfangsen, tedheadster, Eric Dumazet
In-Reply-To: <alpine.DEB.2.21.1906241141370.32342@nanos.tec.linutronix.de>

On 2019/6/24 17:45, Thomas Gleixner wrote:
> Zhiqiang,
> 
> On Mon, 24 Jun 2019, Zhiqiang Liu wrote:
>> 在 2019/6/24 0:38, Thomas Gleixner 写道:
>>
>> Thanks again for your detailed advice.
>> As your said, the max_softirq_time_usecs setting without explaining the
>> relationship with CONFIG_HZ will give a false sense of controlability. And
>> the time accuracy of jiffies will result in a certain difference between the
>> max_softirq_time_usecs set value and the actual value, which is in one jiffies
>> range.
>>
>> I will add these infomation in the sysctl documentation and changelog in v2 patch.
> 
> Please make the sysctl milliseconds based. That's the closest approximation
> of useful units for this. This still has the same issues as explained
> before but it's not off by 3 orders of magitude anymore.
> 
> Thanks,
> 
> 	tglx
> 
Thanks for your suggestion.
I will adopt max_softirq_time_ms to replace MAX_SOFTIRQ_TIME in v2.


^ permalink raw reply

* Re: [PATCH 1/3] Docs: An initial automarkup extension for sphinx
From: Jani Nikula @ 2019-06-24 11:30 UTC (permalink / raw)
  To: Jonathan Corbet, linux-doc
  Cc: Matthew Wilcox, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet
In-Reply-To: <20190621235159.6992-2-corbet@lwn.net>

On Fri, 21 Jun 2019, Jonathan Corbet <corbet@lwn.net> wrote:
> Rather than fill our text files with :c:func:`function()` syntax, just do
> the markup via a hook into the sphinx build process.
>
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/conf.py              |  3 +-
>  Documentation/sphinx/automarkup.py | 80 ++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/sphinx/automarkup.py
>
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 7ace3f8852bd..a502baecbb85 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -34,7 +34,8 @@ needs_sphinx = '1.3'
>  # Add any Sphinx extension module names here, as strings. They can be
>  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
>  # ones.
> -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
> +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
> +              'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
>  
>  # The name of the math extension changed on Sphinx 1.4
>  if (major == 1 and minor > 3) or (major > 1):
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> new file mode 100644
> index 000000000000..14b09b5d145e
> --- /dev/null
> +++ b/Documentation/sphinx/automarkup.py
> @@ -0,0 +1,80 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright 2019 Jonathan Corbet <corbet@lwn.net>
> +#
> +# Apply kernel-specific tweaks after the initial document processing
> +# has been done.
> +#
> +from docutils import nodes
> +from sphinx import addnodes
> +import re
> +
> +#
> +# Regex nastiness.  Of course.
> +# Try to identify "function()" that's not already marked up some
> +# other way.  Sphinx doesn't like a lot of stuff right after a
> +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> +# bit tries to restrict matches to things that won't create trouble.
> +#
> +RE_function = re.compile(r'([\w_][\w\d_]+\(\))')
> +
> +#
> +# The DVB docs create references for these basic system calls, leading
> +# to lots of confusing links.  So just don't link them.
> +#
> +Skipfuncs = [ 'open', 'close', 'write' ]
> +
> +#
> +# Find all occurrences of function() and try to replace them with
> +# appropriate cross references.
> +#
> +def markup_funcs(docname, app, node):
> +    cdom = app.env.domains['c']
> +    t = node.astext()
> +    done = 0
> +    repl = [ ]
> +    for m in RE_function.finditer(t):
> +        #
> +        # Include any text prior to function() as a normal text node.
> +        #
> +        if m.start() > done:
> +            repl.append(nodes.Text(t[done:m.start()]))
> +        #
> +        # Go through the dance of getting an xref out of the C domain
> +        #
> +        target = m.group(1)[:-2]
> +        target_text = nodes.Text(target + '()')
> +        xref = None
> +        if target not in Skipfuncs:
> +            lit_text = nodes.literal(classes=['xref', 'c', 'c-func'])
> +            lit_text += target_text
> +            pxref = addnodes.pending_xref('', refdomain = 'c',
> +                                          reftype = 'function',
> +                                          reftarget = target, modname = None,
> +                                          classname = None)
> +            xref = cdom.resolve_xref(app.env, docname, app.builder,
> +                                     'function', target, pxref, lit_text)
> +        #
> +        # Toss the xref into the list if we got it; otherwise just put
> +        # the function text.
> +        #
> +        if xref:
> +            repl.append(xref)
> +        else:
> +            repl.append(target_text)
> +        done = m.end()
> +    if done < len(t):
> +        repl.append(nodes.Text(t[done:]))
> +    return repl
> +
> +def auto_markup(app, doctree, name):
> +    for para in doctree.traverse(nodes.paragraph):
> +        for node in para.traverse(nodes.Text):
> +            if not isinstance(node.parent, nodes.literal):
> +                node.parent.replace(node, markup_funcs(name, app, node))

I think overall this is a better approach than preprocessing. Thanks for
doing this!

I toyed with something like this before, and the key difference here
seems to be ignoring literal blocks. The problem seemed to be that
replacing blocks with syntax highlighting also removed the syntax
highlighting, with no way that I could find to bring it back.

Obviously it would be great to be able to add the cross references in
more places than just bulk text nodes, but this is a good start.

BR,
Jani.


> +
> +def setup(app):
> +    app.connect('doctree-resolved', auto_markup)
> +    return {
> +        'parallel_read_safe': True,
> +        'parallel_write_safe': True,
> +        }

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply

* Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
From: qi.fuli @ 2019-06-24 10:34 UTC (permalink / raw)
  To: Will Deacon, indou.takao@fujitsu.com
  Cc: Jonathan Corbet, Catalin Marinas, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, qi.fuli@fujitsu.com,
	indou.takao@fujitsu.com, peterz@infradead.org
In-Reply-To: <20190617170328.GJ30800@fuggles.cambridge.arm.com>

Hi Will,

I am Takao's colleague, thank you very much for your reply.

On 6/18/19 2:03 AM, Will Deacon wrote:
> Hi Takao,
>
> [+Peter Z]
>
> On Mon, Jun 17, 2019 at 11:32:53PM +0900, Takao Indoh wrote:
>> From: Takao Indoh <indou.takao@fujitsu.com>
>>
>> I found a performance issue related on the implementation of Linux's TLB
>> flush for arm64.
>>
>> When I run a single-threaded test program on moderate environment, it
>> usually takes 39ms to finish its work. However, when I put a small
>> apprication, which just calls mprotest() continuously, on one of sibling
>> cores and run it simultaneously, the test program slows down significantly.
>> It becomes 49ms(125%) on ThunderX2. I also detected the same problem on
>> ThunderX1 and Fujitsu A64FX.
> This is a problem for any applications that share hardware resources with
> each other, so I don't think it's something we should be too concerned about
> addressing unless there is a practical DoS scenario, which there doesn't
> appear to be in this case. It may be that the real answer is "don't call
> mprotect() in a loop".
I think there has been a misunderstanding, please let me explain.
This application is just an example using for reproducing the 
performance issue we found.
Our original purpose is reducing OS jitter by this series.
The OS jitter on massively parallel processing systems have been known 
and studied for many years.
The 2.5% OS jitter can result in over a factor of 20 slowdown for the 
same application [1].
Though it may be an extreme example, reducing the OS jitter has been an 
issue in HPC environment.

[1] Ferreira, Kurt B., Patrick Bridges, and Ron Brightwell. 
"Characterizing application sensitivity to OS interference using 
kernel-level noise injection." Proceedings of the 2008 ACM/IEEE 
conference on Supercomputing. IEEE Press, 2008.

>> I suppose the root cause of this issue is the implementation of Linux's TLB
>> flush for arm64, especially use of TLBI-is instruction which is a broadcast
>> to all processor core on the system. In case of the above situation,
>> TLBI-is is called by mprotect().
> On the flip side, Linux is providing the hardware with enough information
> not to broadcast to cores for which the remote TLBs don't have entries
> allocated for the ASID being invalidated. I would say that the root cause
> of the issue is that this filtering is not taking place.

Do you mean that the filter should be implemented in hardware?

Thanks,
Qi Fuli

>> This is not a problem for small environment, but this causes a significant
>> performance noise for large-scale HPC environment, which has more than
>> thousand nodes with low latency interconnect.
> If you have a system with over a thousand nodes, without snoop filtering
> for DVM messages and you expect performance to scale in the face of tight
> mprotect() loops then I think you have a problem irrespective of this patch.
> What happens if somebody runs I-cache invalidation in a loop?
>
>> To fix this problem, this patch adds new boot parameter
>> 'disable_tlbflush_is'.  In the case of flush_tlb_mm() *without* this
>> parameter, TLB entry is invalidated by __tlbi(aside1is, asid). By this
>> instruction, all CPUs within the same inner shareable domain check if there
>> are TLB entries which have this ASID, this causes performance noise. OTOH,
>> when this new parameter is specified, TLB entry is invalidated by
>> __tlbi(aside1, asid) only on the CPUs specified by mm_cpumask(mm).
>> Therefore TLB flush is done on minimal CPUs and performance problem does
>> not occur. Actually I confirm the performance problem is fixed by this
>> patch.
> Other than my comments above, my overall concern with this patch is that
> it introduces divergent behaviour for our TLB invalidation flow, which is
> undesirable from both maintainability and usability perspectives. If you
> wish to change the code, please don't put it behind a command-line option,
> but instead improve the code that is already there. However, I suspect that
> blowing away the local TLB on every context-switch may have hidden costs
> which are only apparent with workloads different from the contrived case
> that you're seeking to improve. You also haven't taken into account the
> effects of virtualisation, where it's likely that the hypervisor will
> upgrade non-shareable operations to inner-shareable ones anyway.
>
> Thanks,
>
> Will

^ permalink raw reply

* [PATCH v2] hrtimer: Use a bullet for the returns bullet list
From: Mauro Carvalho Chehab @ 2019-06-24 10:33 UTC (permalink / raw)
  To: Linux Doc Mailing List, Thomas Gleixner
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet

That gets rid of this warning:

	./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank line; unexpected unindent.

and displays nicely both at the source code and at the produced
documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 kernel/time/hrtimer.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index edb230aba3d1..5ee77f1a8a92 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1114,9 +1114,10 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
  * @timer:	hrtimer to stop
  *
  * Returns:
- *  0 when the timer was not active
- *  1 when the timer was active
- * -1 when the timer is currently executing the callback function and
+ *
+ *  *  0 when the timer was not active
+ *  *  1 when the timer was active
+ *  * -1 when the timer is currently executing the callback function and
  *    cannot be stopped
  */
 int hrtimer_try_to_cancel(struct hrtimer *timer)
-- 
2.21.0



^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox