LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v11 05/13] integrity: Select CONFIG_KEYS instead of depending on it
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

This avoids a dependency cycle in soon-to-be-introduced
CONFIG_IMA_APPRAISE_MODSIG: it will select CONFIG_MODULE_SIG_FORMAT
which in turn selects CONFIG_KEYS. Kconfig then complains that
CONFIG_INTEGRITY_SIGNATURE depends on CONFIG_KEYS.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
---
 security/integrity/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
index 3ba1168b1756..93d73902c571 100644
--- a/security/integrity/Kconfig
+++ b/security/integrity/Kconfig
@@ -17,8 +17,8 @@ if INTEGRITY
 
 config INTEGRITY_SIGNATURE
 	bool "Digital signature verification using multiple keyrings"
-	depends on KEYS
 	default n
+	select KEYS
 	select SIGNATURE
 	help
 	  This option enables digital signature verification support

^ permalink raw reply related

* [PATCH v11 07/13] ima: Add modsig appraise_type option for module-style appended signatures
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

Introduce the modsig keyword to the IMA policy syntax to specify that
a given hook should expect the file to have the IMA signature appended
to it. Here is how it can be used in a rule:

appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig

With this rule, IMA will accept either a signature stored in the extended
attribute or an appended signature.

For now, the rule above will behave exactly the same as if
appraise_type=imasig was specified. The actual modsig implementation
will be introduced separately.

Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---
 Documentation/ABI/testing/ima_policy |  6 +++++-
 security/integrity/ima/Kconfig       | 10 +++++++++
 security/integrity/ima/Makefile      |  1 +
 security/integrity/ima/ima.h         |  9 ++++++++
 security/integrity/ima/ima_modsig.c  | 31 ++++++++++++++++++++++++++++
 security/integrity/ima/ima_policy.c  | 12 +++++++++--
 security/integrity/integrity.h       |  1 +
 7 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy
index b383c1763610..e622cdafe0af 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -36,7 +36,7 @@ Description:
 			euid:= decimal value
 			fowner:= decimal value
 		lsm:  	are LSM specific
-		option:	appraise_type:= [imasig]
+		option:	appraise_type:= [imasig] [imasig|modsig]
 			template:= name of a defined IMA template type
 			(eg, ima-ng). Only valid when action is "measure".
 			pcr:= decimal value
@@ -104,3 +104,7 @@ Description:
 
 			measure func=KEXEC_KERNEL_CHECK pcr=4
 			measure func=KEXEC_INITRAMFS_CHECK pcr=5
+
+		Example of appraise rule allowing modsig appended signatures:
+
+			appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index a18f8c6d13b5..bba19f9ea184 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -231,6 +231,16 @@ config IMA_APPRAISE_BOOTPARAM
 	  This option enables the different "ima_appraise=" modes
 	  (eg. fix, log) from the boot command line.
 
+config IMA_APPRAISE_MODSIG
+	bool "Support module-style signatures for appraisal"
+	depends on IMA_APPRAISE
+	default n
+	help
+	   Adds support for signatures appended to files. The format of the
+	   appended signature is the same used for signed kernel modules.
+	   The modsig keyword can be used in the IMA policy to allow a hook
+	   to accept such signatures.
+
 config IMA_TRUSTED_KEYRING
 	bool "Require all keys on the .ima keyring be signed (deprecated)"
 	depends on IMA_APPRAISE && SYSTEM_TRUSTED_KEYRING
diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index d921dc4f9eb0..31d57cdf2421 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_IMA) += ima.o
 ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
 	 ima_policy.o ima_template.o ima_template_lib.o
 ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
+ima-$(CONFIG_IMA_APPRAISE_MODSIG) += ima_modsig.o
 ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
 obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 18b48a6d0b80..9e2580164e97 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -298,6 +298,15 @@ static inline int ima_read_xattr(struct dentry *dentry,
 
 #endif /* CONFIG_IMA_APPRAISE */
 
+#ifdef CONFIG_IMA_APPRAISE_MODSIG
+bool ima_hook_supports_modsig(enum ima_hooks func);
+#else
+static inline bool ima_hook_supports_modsig(enum ima_hooks func)
+{
+	return false;
+}
+#endif /* CONFIG_IMA_APPRAISE_MODSIG */
+
 /* LSM based policy rules require audit */
 #ifdef CONFIG_IMA_LSM_RULES
 
diff --git a/security/integrity/ima/ima_modsig.c b/security/integrity/ima/ima_modsig.c
new file mode 100644
index 000000000000..87503bfe8c8b
--- /dev/null
+++ b/security/integrity/ima/ima_modsig.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * IMA support for appraising module-style appended signatures.
+ *
+ * Copyright (C) 2019  IBM Corporation
+ *
+ * Author:
+ * Thiago Jung Bauermann <bauerman@linux.ibm.com>
+ */
+
+#include "ima.h"
+
+/**
+ * ima_hook_supports_modsig - can the policy allow modsig for this hook?
+ *
+ * modsig is only supported by hooks using ima_post_read_file(), because only
+ * they preload the contents of the file in a buffer. FILE_CHECK does that in
+ * some cases, but not when reached from vfs_open(). POLICY_CHECK can support
+ * it, but it's not useful in practice because it's a text file so deny.
+ */
+bool ima_hook_supports_modsig(enum ima_hooks func)
+{
+	switch (func) {
+	case KEXEC_KERNEL_CHECK:
+	case KEXEC_INITRAMFS_CHECK:
+	case MODULE_CHECK:
+		return true;
+	default:
+		return false;
+	}
+}
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index fd9b01881d17..06ae4b7b3676 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -1049,6 +1049,10 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
 			ima_log_string(ab, "appraise_type", args[0].from);
 			if ((strcmp(args[0].from, "imasig")) == 0)
 				entry->flags |= IMA_DIGSIG_REQUIRED;
+			else if (ima_hook_supports_modsig(entry->func) &&
+				 strcmp(args[0].from, "imasig|modsig") == 0)
+				entry->flags |= IMA_DIGSIG_REQUIRED |
+						IMA_MODSIG_ALLOWED;
 			else
 				result = -EINVAL;
 			break;
@@ -1358,8 +1362,12 @@ int ima_policy_show(struct seq_file *m, void *v)
 	}
 	if (entry->template)
 		seq_printf(m, "template=%s ", entry->template->name);
-	if (entry->flags & IMA_DIGSIG_REQUIRED)
-		seq_puts(m, "appraise_type=imasig ");
+	if (entry->flags & IMA_DIGSIG_REQUIRED) {
+		if (entry->flags & IMA_MODSIG_ALLOWED)
+			seq_puts(m, "appraise_type=imasig|modsig ");
+		else
+			seq_puts(m, "appraise_type=imasig ");
+	}
 	if (entry->flags & IMA_PERMIT_DIRECTIO)
 		seq_puts(m, "permit_directio ");
 	rcu_read_unlock();
diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
index 88a29f72a74f..0e7330a36a9d 100644
--- a/security/integrity/integrity.h
+++ b/security/integrity/integrity.h
@@ -36,6 +36,7 @@
 #define IMA_NEW_FILE		0x04000000
 #define EVM_IMMUTABLE_DIGSIG	0x08000000
 #define IMA_FAIL_UNVERIFIABLE_SIGS	0x10000000
+#define IMA_MODSIG_ALLOWED	0x20000000
 
 #define IMA_DO_MASK		(IMA_MEASURE | IMA_APPRAISE | IMA_AUDIT | \
 				 IMA_HASH | IMA_APPRAISE_SUBMASK)


^ permalink raw reply related

* [PATCH v11 08/13] ima: Factor xattr_verify() out of ima_appraise_measurement()
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

Verify xattr signature in a separate function so that the logic in
ima_appraise_measurement() remains clear when it gains the ability to also
verify an appended module signature.

The code in the switch statement is unchanged except for having to
dereference the status and cause variables (since they're now pointers),
and fixing the style of a block comment to appease checkpatch.

Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
---
 security/integrity/ima/ima_appraise.c | 141 +++++++++++++++-----------
 1 file changed, 81 insertions(+), 60 deletions(-)

diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 18bbe753421a..5d4772f39757 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -202,6 +202,83 @@ int ima_read_xattr(struct dentry *dentry,
 	return ret;
 }
 
+/*
+ * xattr_verify - verify xattr digest or signature
+ *
+ * Verify whether the hash or signature matches the file contents.
+ *
+ * Return 0 on success, error code otherwise.
+ */
+static int xattr_verify(enum ima_hooks func, struct integrity_iint_cache *iint,
+			struct evm_ima_xattr_data *xattr_value, int xattr_len,
+			enum integrity_status *status, const char **cause)
+{
+	int rc = -EINVAL, hash_start = 0;
+
+	switch (xattr_value->type) {
+	case IMA_XATTR_DIGEST_NG:
+		/* first byte contains algorithm id */
+		hash_start = 1;
+		/* fall through */
+	case IMA_XATTR_DIGEST:
+		if (iint->flags & IMA_DIGSIG_REQUIRED) {
+			*cause = "IMA-signature-required";
+			*status = INTEGRITY_FAIL;
+			break;
+		}
+		clear_bit(IMA_DIGSIG, &iint->atomic_flags);
+		if (xattr_len - sizeof(xattr_value->type) - hash_start >=
+				iint->ima_hash->length)
+			/*
+			 * xattr length may be longer. md5 hash in previous
+			 * version occupied 20 bytes in xattr, instead of 16
+			 */
+			rc = memcmp(&xattr_value->data[hash_start],
+				    iint->ima_hash->digest,
+				    iint->ima_hash->length);
+		else
+			rc = -EINVAL;
+		if (rc) {
+			*cause = "invalid-hash";
+			*status = INTEGRITY_FAIL;
+			break;
+		}
+		*status = INTEGRITY_PASS;
+		break;
+	case EVM_IMA_XATTR_DIGSIG:
+		set_bit(IMA_DIGSIG, &iint->atomic_flags);
+		rc = integrity_digsig_verify(INTEGRITY_KEYRING_IMA,
+					     (const char *)xattr_value,
+					     xattr_len,
+					     iint->ima_hash->digest,
+					     iint->ima_hash->length);
+		if (rc == -EOPNOTSUPP) {
+			*status = INTEGRITY_UNKNOWN;
+			break;
+		}
+		if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
+		    func == KEXEC_KERNEL_CHECK)
+			rc = integrity_digsig_verify(INTEGRITY_KEYRING_PLATFORM,
+						     (const char *)xattr_value,
+						     xattr_len,
+						     iint->ima_hash->digest,
+						     iint->ima_hash->length);
+		if (rc) {
+			*cause = "invalid-signature";
+			*status = INTEGRITY_FAIL;
+		} else {
+			*status = INTEGRITY_PASS;
+		}
+		break;
+	default:
+		*status = INTEGRITY_UNKNOWN;
+		*cause = "unknown-ima-data";
+		break;
+	}
+
+	return rc;
+}
+
 /*
  * ima_appraise_measurement - appraise file measurement
  *
@@ -221,7 +298,7 @@ int ima_appraise_measurement(enum ima_hooks func,
 	struct dentry *dentry = file_dentry(file);
 	struct inode *inode = d_backing_inode(dentry);
 	enum integrity_status status = INTEGRITY_UNKNOWN;
-	int rc = xattr_len, hash_start = 0;
+	int rc = xattr_len;
 
 	if (!(inode->i_opflags & IOP_XATTR))
 		return INTEGRITY_UNKNOWN;
@@ -259,65 +336,9 @@ int ima_appraise_measurement(enum ima_hooks func,
 		WARN_ONCE(true, "Unexpected integrity status %d\n", status);
 	}
 
-	switch (xattr_value->type) {
-	case IMA_XATTR_DIGEST_NG:
-		/* first byte contains algorithm id */
-		hash_start = 1;
-		/* fall through */
-	case IMA_XATTR_DIGEST:
-		if (iint->flags & IMA_DIGSIG_REQUIRED) {
-			cause = "IMA-signature-required";
-			status = INTEGRITY_FAIL;
-			break;
-		}
-		clear_bit(IMA_DIGSIG, &iint->atomic_flags);
-		if (xattr_len - sizeof(xattr_value->type) - hash_start >=
-				iint->ima_hash->length)
-			/* xattr length may be longer. md5 hash in previous
-			   version occupied 20 bytes in xattr, instead of 16
-			 */
-			rc = memcmp(&xattr_value->data[hash_start],
-				    iint->ima_hash->digest,
-				    iint->ima_hash->length);
-		else
-			rc = -EINVAL;
-		if (rc) {
-			cause = "invalid-hash";
-			status = INTEGRITY_FAIL;
-			break;
-		}
-		status = INTEGRITY_PASS;
-		break;
-	case EVM_IMA_XATTR_DIGSIG:
-		set_bit(IMA_DIGSIG, &iint->atomic_flags);
-		rc = integrity_digsig_verify(INTEGRITY_KEYRING_IMA,
-					     (const char *)xattr_value,
-					     xattr_len,
-					     iint->ima_hash->digest,
-					     iint->ima_hash->length);
-		if (rc == -EOPNOTSUPP) {
-			status = INTEGRITY_UNKNOWN;
-			break;
-		}
-		if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
-		    func == KEXEC_KERNEL_CHECK)
-			rc = integrity_digsig_verify(INTEGRITY_KEYRING_PLATFORM,
-						     (const char *)xattr_value,
-						     xattr_len,
-						     iint->ima_hash->digest,
-						     iint->ima_hash->length);
-		if (rc) {
-			cause = "invalid-signature";
-			status = INTEGRITY_FAIL;
-		} else {
-			status = INTEGRITY_PASS;
-		}
-		break;
-	default:
-		status = INTEGRITY_UNKNOWN;
-		cause = "unknown-ima-data";
-		break;
-	}
+	if (xattr_value)
+		rc = xattr_verify(func, iint, xattr_value, xattr_len, &status,
+				  &cause);
 
 out:
 	/*


^ permalink raw reply related

* [PATCH v11 09/13] ima: Implement support for module-style appended signatures
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

Implement the appraise_type=imasig|modsig option, allowing IMA to read and
verify modsig signatures.

In case a file has both an xattr signature and an appended modsig, IMA will
only use the appended signature if the key used by the xattr signature
isn't present in the IMA or platform keyring.

Because modsig verification needs to convert from an integrity keyring id
to the keyring itself, add an integrity_keyring_from_id() function in
digsig.c so that integrity_modsig_verify() can use it.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
---
 security/integrity/digsig.c           | 43 ++++++++++++----
 security/integrity/ima/Kconfig        |  3 ++
 security/integrity/ima/ima.h          | 22 ++++++++-
 security/integrity/ima/ima_appraise.c | 51 +++++++++++++++++--
 security/integrity/ima/ima_main.c     | 11 ++++-
 security/integrity/ima/ima_modsig.c   | 71 +++++++++++++++++++++++++++
 security/integrity/ima/ima_policy.c   | 12 ++---
 security/integrity/integrity.h        | 19 +++++++
 8 files changed, 209 insertions(+), 23 deletions(-)

diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
index e19c2eb72c51..3399a7e32830 100644
--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -43,11 +43,10 @@ static const char * const keyring_name[INTEGRITY_KEYRING_MAX] = {
 #define restrict_link_to_ima restrict_link_by_builtin_trusted
 #endif
 
-int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
-			    const char *digest, int digestlen)
+static struct key *integrity_keyring_from_id(const unsigned int id)
 {
-	if (id >= INTEGRITY_KEYRING_MAX || siglen < 2)
-		return -EINVAL;
+	if (id >= INTEGRITY_KEYRING_MAX)
+		return ERR_PTR(-EINVAL);
 
 	if (!keyring[id]) {
 		keyring[id] =
@@ -56,23 +55,49 @@ int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
 			int err = PTR_ERR(keyring[id]);
 			pr_err("no %s keyring: %d\n", keyring_name[id], err);
 			keyring[id] = NULL;
-			return err;
+			return ERR_PTR(err);
 		}
 	}
 
+	return keyring[id];
+}
+
+int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
+			    const char *digest, int digestlen)
+{
+	struct key *keyring;
+
+	if (siglen < 2)
+		return -EINVAL;
+
+	keyring = integrity_keyring_from_id(id);
+	if (IS_ERR(keyring))
+		return PTR_ERR(keyring);
+
 	switch (sig[1]) {
 	case 1:
 		/* v1 API expect signature without xattr type */
-		return digsig_verify(keyring[id], sig + 1, siglen - 1,
-				     digest, digestlen);
+		return digsig_verify(keyring, sig + 1, siglen - 1, digest,
+				     digestlen);
 	case 2:
-		return asymmetric_verify(keyring[id], sig, siglen,
-					 digest, digestlen);
+		return asymmetric_verify(keyring, sig, siglen, digest,
+					 digestlen);
 	}
 
 	return -EOPNOTSUPP;
 }
 
+int integrity_modsig_verify(const unsigned int id, const struct modsig *modsig)
+{
+	struct key *keyring;
+
+	keyring = integrity_keyring_from_id(id);
+	if (IS_ERR(keyring))
+		return PTR_ERR(keyring);
+
+	return ima_modsig_verify(keyring, modsig);
+}
+
 static int __integrity_init_keyring(const unsigned int id, key_perm_t perm,
 				    struct key_restriction *restriction)
 {
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index bba19f9ea184..0fb542455698 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -234,6 +234,9 @@ config IMA_APPRAISE_BOOTPARAM
 config IMA_APPRAISE_MODSIG
 	bool "Support module-style signatures for appraisal"
 	depends on IMA_APPRAISE
+	depends on INTEGRITY_ASYMMETRIC_KEYS
+	select PKCS7_MESSAGE_PARSER
+	select MODULE_SIG_FORMAT
 	default n
 	help
 	   Adds support for signatures appended to files. The format of the
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 9e2580164e97..ebbfae10f174 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -192,6 +192,10 @@ enum ima_hooks {
 	__ima_hooks(__ima_hook_enumify)
 };
 
+extern const char *const func_tokens[];
+
+struct modsig;
+
 /* LIM API function definitions */
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
 		   int mask, enum ima_hooks func, int *pcr,
@@ -245,7 +249,7 @@ int ima_appraise_measurement(enum ima_hooks func,
 			     struct integrity_iint_cache *iint,
 			     struct file *file, const unsigned char *filename,
 			     struct evm_ima_xattr_data *xattr_value,
-			     int xattr_len);
+			     int xattr_len, const struct modsig *modsig);
 int ima_must_appraise(struct inode *inode, int mask, enum ima_hooks func);
 void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file);
 enum integrity_status ima_get_cache_status(struct integrity_iint_cache *iint,
@@ -261,7 +265,8 @@ static inline int ima_appraise_measurement(enum ima_hooks func,
 					   struct file *file,
 					   const unsigned char *filename,
 					   struct evm_ima_xattr_data *xattr_value,
-					   int xattr_len)
+					   int xattr_len,
+					   const struct modsig *modsig)
 {
 	return INTEGRITY_UNKNOWN;
 }
@@ -300,11 +305,24 @@ static inline int ima_read_xattr(struct dentry *dentry,
 
 #ifdef CONFIG_IMA_APPRAISE_MODSIG
 bool ima_hook_supports_modsig(enum ima_hooks func);
+int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
+		    struct modsig **modsig);
+void ima_free_modsig(struct modsig *modsig);
 #else
 static inline bool ima_hook_supports_modsig(enum ima_hooks func)
 {
 	return false;
 }
+
+static inline int ima_read_modsig(enum ima_hooks func, const void *buf,
+				  loff_t buf_len, struct modsig **modsig)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void ima_free_modsig(struct modsig *modsig)
+{
+}
 #endif /* CONFIG_IMA_APPRAISE_MODSIG */
 
 /* LSM based policy rules require audit */
diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 5d4772f39757..70252ac3321d 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -279,6 +279,33 @@ static int xattr_verify(enum ima_hooks func, struct integrity_iint_cache *iint,
 	return rc;
 }
 
+/*
+ * modsig_verify - verify modsig signature
+ *
+ * Verify whether the signature matches the file contents.
+ *
+ * Return 0 on success, error code otherwise.
+ */
+static int modsig_verify(enum ima_hooks func, const struct modsig *modsig,
+			 enum integrity_status *status, const char **cause)
+{
+	int rc;
+
+	rc = integrity_modsig_verify(INTEGRITY_KEYRING_IMA, modsig);
+	if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
+	    func == KEXEC_KERNEL_CHECK)
+		rc = integrity_modsig_verify(INTEGRITY_KEYRING_PLATFORM,
+					     modsig);
+	if (rc) {
+		*cause = "invalid-signature";
+		*status = INTEGRITY_FAIL;
+	} else {
+		*status = INTEGRITY_PASS;
+	}
+
+	return rc;
+}
+
 /*
  * ima_appraise_measurement - appraise file measurement
  *
@@ -291,7 +318,7 @@ int ima_appraise_measurement(enum ima_hooks func,
 			     struct integrity_iint_cache *iint,
 			     struct file *file, const unsigned char *filename,
 			     struct evm_ima_xattr_data *xattr_value,
-			     int xattr_len)
+			     int xattr_len, const struct modsig *modsig)
 {
 	static const char op[] = "appraise_data";
 	const char *cause = "unknown";
@@ -299,11 +326,14 @@ int ima_appraise_measurement(enum ima_hooks func,
 	struct inode *inode = d_backing_inode(dentry);
 	enum integrity_status status = INTEGRITY_UNKNOWN;
 	int rc = xattr_len;
+	bool try_modsig = iint->flags & IMA_MODSIG_ALLOWED && modsig;
 
-	if (!(inode->i_opflags & IOP_XATTR))
+	/* If not appraising a modsig, we need an xattr. */
+	if (!(inode->i_opflags & IOP_XATTR) && !try_modsig)
 		return INTEGRITY_UNKNOWN;
 
-	if (rc <= 0) {
+	/* If reading the xattr failed and there's no modsig, error out. */
+	if (rc <= 0 && !try_modsig) {
 		if (rc && rc != -ENODATA)
 			goto out;
 
@@ -326,6 +356,10 @@ int ima_appraise_measurement(enum ima_hooks func,
 	case INTEGRITY_UNKNOWN:
 		break;
 	case INTEGRITY_NOXATTRS:	/* No EVM protected xattrs. */
+		/* It's fine not to have xattrs when using a modsig. */
+		if (try_modsig)
+			break;
+		/* fall through */
 	case INTEGRITY_NOLABEL:		/* No security.evm xattr. */
 		cause = "missing-HMAC";
 		goto out;
@@ -340,6 +374,15 @@ int ima_appraise_measurement(enum ima_hooks func,
 		rc = xattr_verify(func, iint, xattr_value, xattr_len, &status,
 				  &cause);
 
+	/*
+	 * If we have a modsig and either no imasig or the imasig's key isn't
+	 * known, then try verifying the modsig.
+	 */
+	if (try_modsig &&
+	    (!xattr_value || xattr_value->type == IMA_XATTR_DIGEST_NG ||
+	     rc == -ENOKEY))
+		rc = modsig_verify(func, modsig, &status, &cause);
+
 out:
 	/*
 	 * File signatures on some filesystems can not be properly verified.
@@ -356,7 +399,7 @@ int ima_appraise_measurement(enum ima_hooks func,
 				    op, cause, rc, 0);
 	} else if (status != INTEGRITY_PASS) {
 		/* Fix mode, but don't replace file signatures. */
-		if ((ima_appraise & IMA_APPRAISE_FIX) &&
+		if ((ima_appraise & IMA_APPRAISE_FIX) && !try_modsig &&
 		    (!xattr_value ||
 		     xattr_value->type != EVM_IMA_XATTR_DIGSIG)) {
 			if (!ima_fix_xattr(dentry, iint))
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index af341a80118f..8ddf9faa8d02 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -202,6 +202,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
 	int rc = 0, action, must_appraise = 0;
 	int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
 	struct evm_ima_xattr_data *xattr_value = NULL;
+	struct modsig *modsig = NULL;
 	int xattr_len = 0;
 	bool violation_check;
 	enum hash_algo hash_algo;
@@ -302,10 +303,15 @@ static int process_measurement(struct file *file, const struct cred *cred,
 	}
 
 	if ((action & IMA_APPRAISE_SUBMASK) ||
-		    strcmp(template_desc->name, IMA_TEMPLATE_IMA_NAME) != 0)
+	    strcmp(template_desc->name, IMA_TEMPLATE_IMA_NAME) != 0) {
 		/* read 'security.ima' */
 		xattr_len = ima_read_xattr(file_dentry(file), &xattr_value);
 
+		/* Read the appended modsig if allowed by the policy. */
+		if (iint->flags & IMA_MODSIG_ALLOWED)
+			ima_read_modsig(func, buf, size, &modsig);
+	}
+
 	hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
 
 	rc = ima_collect_measurement(iint, file, buf, size, hash_algo);
@@ -322,7 +328,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
 	if (rc == 0 && (action & IMA_APPRAISE_SUBMASK)) {
 		inode_lock(inode);
 		rc = ima_appraise_measurement(func, iint, file, pathname,
-					      xattr_value, xattr_len);
+					      xattr_value, xattr_len, modsig);
 		inode_unlock(inode);
 		if (!rc)
 			rc = mmap_violation_check(func, file, &pathbuf,
@@ -339,6 +345,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
 		rc = -EACCES;
 	mutex_unlock(&iint->mutex);
 	kfree(xattr_value);
+	ima_free_modsig(modsig);
 out:
 	if (pathbuf)
 		__putname(pathbuf);
diff --git a/security/integrity/ima/ima_modsig.c b/security/integrity/ima/ima_modsig.c
index 87503bfe8c8b..f41ebe370fa0 100644
--- a/security/integrity/ima/ima_modsig.c
+++ b/security/integrity/ima/ima_modsig.c
@@ -8,8 +8,17 @@
  * Thiago Jung Bauermann <bauerman@linux.ibm.com>
  */
 
+#include <linux/types.h>
+#include <linux/module_signature.h>
+#include <keys/asymmetric-type.h>
+#include <crypto/pkcs7.h>
+
 #include "ima.h"
 
+struct modsig {
+	struct pkcs7_message *pkcs7_msg;
+};
+
 /**
  * ima_hook_supports_modsig - can the policy allow modsig for this hook?
  *
@@ -29,3 +38,65 @@ bool ima_hook_supports_modsig(enum ima_hooks func)
 		return false;
 	}
 }
+
+/*
+ * ima_read_modsig - Read modsig from buf.
+ *
+ * Return: 0 on success, error code otherwise.
+ */
+int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
+		    struct modsig **modsig)
+{
+	const size_t marker_len = strlen(MODULE_SIG_STRING);
+	const struct module_signature *sig;
+	struct modsig *hdr;
+	size_t sig_len;
+	const void *p;
+	int rc;
+
+	if (buf_len <= marker_len + sizeof(*sig))
+		return -ENOENT;
+
+	p = buf + buf_len - marker_len;
+	if (memcmp(p, MODULE_SIG_STRING, marker_len))
+		return -ENOENT;
+
+	buf_len -= marker_len;
+	sig = (const struct module_signature *)(p - sizeof(*sig));
+
+	rc = mod_check_sig(sig, buf_len, func_tokens[func]);
+	if (rc)
+		return rc;
+
+	sig_len = be32_to_cpu(sig->sig_len);
+	buf_len -= sig_len + sizeof(*sig);
+
+	hdr = kmalloc(sizeof(*hdr), GFP_KERNEL);
+	if (!hdr)
+		return -ENOMEM;
+
+	hdr->pkcs7_msg = pkcs7_parse_message(buf + buf_len, sig_len);
+	if (IS_ERR(hdr->pkcs7_msg)) {
+		kfree(hdr);
+		return PTR_ERR(hdr->pkcs7_msg);
+	}
+
+	*modsig = hdr;
+
+	return 0;
+}
+
+int ima_modsig_verify(struct key *keyring, const struct modsig *modsig)
+{
+	return verify_pkcs7_message_sig(NULL, 0, modsig->pkcs7_msg, keyring,
+					VERIFYING_MODULE_SIGNATURE, NULL, NULL);
+}
+
+void ima_free_modsig(struct modsig *modsig)
+{
+	if (!modsig)
+		return;
+
+	pkcs7_free_message(modsig->pkcs7_msg);
+	kfree(modsig);
+}
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 06ae4b7b3676..f64ef84516db 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -1167,6 +1167,12 @@ void ima_delete_rules(void)
 	}
 }
 
+#define __ima_hook_stringify(str)	(#str),
+
+const char *const func_tokens[] = {
+	__ima_hooks(__ima_hook_stringify)
+};
+
 #ifdef	CONFIG_IMA_READ_POLICY
 enum {
 	mask_exec = 0, mask_write, mask_read, mask_append
@@ -1179,12 +1185,6 @@ static const char *const mask_tokens[] = {
 	"^MAY_APPEND"
 };
 
-#define __ima_hook_stringify(str)	(#str),
-
-static const char *const func_tokens[] = {
-	__ima_hooks(__ima_hook_stringify)
-};
-
 void *ima_policy_start(struct seq_file *m, loff_t *pos)
 {
 	loff_t l = *pos;
diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
index 0e7330a36a9d..c6e7f41db470 100644
--- a/security/integrity/integrity.h
+++ b/security/integrity/integrity.h
@@ -153,10 +153,13 @@ int integrity_kernel_read(struct file *file, loff_t offset,
 
 extern struct dentry *integrity_dir;
 
+struct modsig;
+
 #ifdef CONFIG_INTEGRITY_SIGNATURE
 
 int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
 			    const char *digest, int digestlen);
+int integrity_modsig_verify(unsigned int id, const struct modsig *modsig);
 
 int __init integrity_init_keyring(const unsigned int id);
 int __init integrity_load_x509(const unsigned int id, const char *path);
@@ -171,6 +174,12 @@ static inline int integrity_digsig_verify(const unsigned int id,
 	return -EOPNOTSUPP;
 }
 
+static inline int integrity_modsig_verify(unsigned int id,
+					  const struct modsig *modsig)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int integrity_init_keyring(const unsigned int id)
 {
 	return 0;
@@ -196,6 +205,16 @@ static inline int asymmetric_verify(struct key *keyring, const char *sig,
 }
 #endif
 
+#ifdef CONFIG_IMA_APPRAISE_MODSIG
+int ima_modsig_verify(struct key *keyring, const struct modsig *modsig);
+#else
+static inline int ima_modsig_verify(struct key *keyring,
+				    const struct modsig *modsig)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 #ifdef CONFIG_IMA_LOAD_X509
 void __init ima_load_x509(void);
 #else


^ permalink raw reply related

* [PATCH v11 10/13] ima: Collect modsig
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

Obtain the modsig and calculate its corresponding hash in
ima_collect_measurement().

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---
 security/integrity/ima/ima.h          |  8 ++++-
 security/integrity/ima/ima_api.c      |  5 ++-
 security/integrity/ima/ima_appraise.c |  2 +-
 security/integrity/ima/ima_main.c     |  2 +-
 security/integrity/ima/ima_modsig.c   | 50 ++++++++++++++++++++++++++-
 5 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index ebbfae10f174..0acc8e56ec73 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -203,7 +203,7 @@ int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
 int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func);
 int ima_collect_measurement(struct integrity_iint_cache *iint,
 			    struct file *file, void *buf, loff_t size,
-			    enum hash_algo algo);
+			    enum hash_algo algo, struct modsig *modsig);
 void ima_store_measurement(struct integrity_iint_cache *iint, struct file *file,
 			   const unsigned char *filename,
 			   struct evm_ima_xattr_data *xattr_value,
@@ -307,6 +307,7 @@ static inline int ima_read_xattr(struct dentry *dentry,
 bool ima_hook_supports_modsig(enum ima_hooks func);
 int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
 		    struct modsig **modsig);
+void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size);
 void ima_free_modsig(struct modsig *modsig);
 #else
 static inline bool ima_hook_supports_modsig(enum ima_hooks func)
@@ -320,6 +321,11 @@ static inline int ima_read_modsig(enum ima_hooks func, const void *buf,
 	return -EOPNOTSUPP;
 }
 
+static inline void ima_collect_modsig(struct modsig *modsig, const void *buf,
+				      loff_t size)
+{
+}
+
 static inline void ima_free_modsig(struct modsig *modsig)
 {
 }
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c0cf4bcfc82f..c351b8c37278 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -208,7 +208,7 @@ int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
  */
 int ima_collect_measurement(struct integrity_iint_cache *iint,
 			    struct file *file, void *buf, loff_t size,
-			    enum hash_algo algo)
+			    enum hash_algo algo, struct modsig *modsig)
 {
 	const char *audit_cause = "failed";
 	struct inode *inode = file_inode(file);
@@ -255,6 +255,9 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
 	memcpy(iint->ima_hash, &hash, length);
 	iint->version = i_version;
 
+	if (modsig)
+		ima_collect_modsig(modsig, buf, size);
+
 	/* Possibly temporary failure due to type of read (eg. O_DIRECT) */
 	if (!result)
 		iint->flags |= IMA_COLLECTED;
diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 70252ac3321d..aa14e3fe25d5 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -438,7 +438,7 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file)
 	    !(iint->flags & IMA_HASH))
 		return;
 
-	rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo);
+	rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo, NULL);
 	if (rc < 0)
 		return;
 
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 8ddf9faa8d02..2c9d3cf85726 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -314,7 +314,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
 
 	hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
 
-	rc = ima_collect_measurement(iint, file, buf, size, hash_algo);
+	rc = ima_collect_measurement(iint, file, buf, size, hash_algo, modsig);
 	if (rc != 0 && rc != -EBADF && rc != -EINVAL)
 		goto out_locked;
 
diff --git a/security/integrity/ima/ima_modsig.c b/security/integrity/ima/ima_modsig.c
index f41ebe370fa0..d438b87dba89 100644
--- a/security/integrity/ima/ima_modsig.c
+++ b/security/integrity/ima/ima_modsig.c
@@ -17,6 +17,19 @@
 
 struct modsig {
 	struct pkcs7_message *pkcs7_msg;
+
+	enum hash_algo hash_algo;
+
+	/* This digest will go in the 'd-modsig' field of the IMA template. */
+	const u8 *digest;
+	u32 digest_size;
+
+	/*
+	 * This is what will go to the measurement list if the template requires
+	 * storing the signature.
+	 */
+	int raw_pkcs7_len;
+	u8 raw_pkcs7[];
 };
 
 /**
@@ -71,7 +84,8 @@ int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
 	sig_len = be32_to_cpu(sig->sig_len);
 	buf_len -= sig_len + sizeof(*sig);
 
-	hdr = kmalloc(sizeof(*hdr), GFP_KERNEL);
+	/* Allocate sig_len additional bytes to hold the raw PKCS#7 data. */
+	hdr = kzalloc(sizeof(*hdr) + sig_len, GFP_KERNEL);
 	if (!hdr)
 		return -ENOMEM;
 
@@ -81,11 +95,45 @@ int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
 		return PTR_ERR(hdr->pkcs7_msg);
 	}
 
+	memcpy(hdr->raw_pkcs7, buf + buf_len, sig_len);
+	hdr->raw_pkcs7_len = sig_len;
+
+	/* We don't know the hash algorithm yet. */
+	hdr->hash_algo = HASH_ALGO__LAST;
+
 	*modsig = hdr;
 
 	return 0;
 }
 
+/**
+ * ima_collect_modsig - Calculate the file hash without the appended signature.
+ *
+ * Since the modsig is part of the file contents, the hash used in its signature
+ * isn't the same one ordinarily calculated by IMA. Therefore PKCS7 code
+ * calculates a separate one for signature verification.
+ */
+void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size)
+{
+	int rc;
+
+	/*
+	 * Provide the file contents (minus the appended sig) so that the PKCS7
+	 * code can calculate the file hash.
+	 */
+	size -= modsig->raw_pkcs7_len + strlen(MODULE_SIG_STRING) +
+		sizeof(struct module_signature);
+	rc = pkcs7_supply_detached_data(modsig->pkcs7_msg, buf, size);
+	if (rc)
+		return;
+
+	/* Ask the PKCS7 code to calculate the file hash. */
+	rc = pkcs7_get_digest(modsig->pkcs7_msg, &modsig->digest,
+			      &modsig->digest_size, &modsig->hash_algo);
+	if (rc)
+		return;
+}
+
 int ima_modsig_verify(struct key *keyring, const struct modsig *modsig)
 {
 	return verify_pkcs7_message_sig(NULL, 0, modsig->pkcs7_msg, keyring,


^ permalink raw reply related

* [PATCH v11 12/13] ima: Store the measurement again when appraising a modsig
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

If the IMA template contains the "modsig" or "d-modsig" field, then the
modsig should be added to the measurement list when the file is appraised.

And that is what normally happens, but if a measurement rule caused a file
containing a modsig to be measured before a different rule causes it to be
appraised, the resulting measurement entry will not contain the modsig
because it is only fetched during appraisal. When the appraisal rule
triggers, it won't store a new measurement containing the modsig because
the file was already measured.

We need to detect that situation and store an additional measurement with
the modsig. This is done by adding an IMA_MEASURE action flag if we read a
modsig and the IMA template contains a modsig field.

Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---
 security/integrity/ima/ima.h          |  1 +
 security/integrity/ima/ima_api.c      | 19 +++++++++++++++----
 security/integrity/ima/ima_main.c     | 15 ++++++++++++---
 security/integrity/ima/ima_template.c | 19 +++++++++++++++++++
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index a2b2c13ceda8..44f5f60424c2 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -149,6 +149,7 @@ void ima_putc(struct seq_file *m, void *data, int datalen);
 void ima_print_digest(struct seq_file *m, u8 *digest, u32 size);
 struct ima_template_desc *ima_template_desc_current(void);
 struct ima_template_desc *lookup_template_desc(const char *name);
+bool ima_template_has_modsig(const struct ima_template_desc *ima_template);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_measurements_show(struct seq_file *m, void *v);
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 32297d1e6164..bb887ed3d8a7 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -222,6 +222,14 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
 		char digest[IMA_MAX_DIGEST_SIZE];
 	} hash;
 
+	/*
+	 * Always collect the modsig, because IMA might have already collected
+	 * the file digest without collecting the modsig in a previous
+	 * measurement rule.
+	 */
+	if (modsig)
+		ima_collect_modsig(modsig, buf, size);
+
 	if (iint->flags & IMA_COLLECTED)
 		goto out;
 
@@ -255,9 +263,6 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
 	memcpy(iint->ima_hash, &hash, length);
 	iint->version = i_version;
 
-	if (modsig)
-		ima_collect_modsig(modsig, buf, size);
-
 	/* Possibly temporary failure due to type of read (eg. O_DIRECT) */
 	if (!result)
 		iint->flags |= IMA_COLLECTED;
@@ -307,7 +312,13 @@ void ima_store_measurement(struct integrity_iint_cache *iint,
 					     .modsig = modsig };
 	int violation = 0;
 
-	if (iint->measured_pcrs & (0x1 << pcr))
+	/*
+	 * We still need to store the measurement in the case of MODSIG because
+	 * we only have its contents to put in the list at the time of
+	 * appraisal, but a file measurement from earlier might already exist in
+	 * the measurement list.
+	 */
+	if (iint->measured_pcrs & (0x1 << pcr) && !modsig)
 		return;
 
 	result = ima_alloc_init_template(&event_data, &entry, template_desc);
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 85afb31fafe0..e0ca39f81a59 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -307,9 +307,18 @@ static int process_measurement(struct file *file, const struct cred *cred,
 		/* read 'security.ima' */
 		xattr_len = ima_read_xattr(file_dentry(file), &xattr_value);
 
-		/* Read the appended modsig if allowed by the policy. */
-		if (iint->flags & IMA_MODSIG_ALLOWED)
-			ima_read_modsig(func, buf, size, &modsig);
+		/*
+		 * Read the appended modsig if allowed by the policy, and allow
+		 * an additional measurement list entry, if needed, based on the
+		 * template format and whether the file was already measured.
+		 */
+		if (iint->flags & IMA_MODSIG_ALLOWED) {
+			rc = ima_read_modsig(func, buf, size, &modsig);
+
+			if (!rc && ima_template_has_modsig(template_desc) &&
+			    iint->flags & IMA_MEASURED)
+				action |= IMA_MEASURE;
+		}
 	}
 
 	hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c
index e25bef419c87..00d9a6cc8a60 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -56,6 +56,25 @@ static int template_desc_init_fields(const char *template_fmt,
 				     const struct ima_template_field ***fields,
 				     int *num_fields);
 
+/**
+ * ima_template_has_modsig - Check whether template has modsig-related fields.
+ * @ima_template: IMA template to check.
+ *
+ * Tells whether the given template has fields referencing a file's appended
+ * signature.
+ */
+bool ima_template_has_modsig(const struct ima_template_desc *ima_template)
+{
+	int i;
+
+	for (i = 0; i < ima_template->num_fields; i++)
+		if (!strcmp(ima_template->fields[i]->field_id, "modsig") ||
+		    !strcmp(ima_template->fields[i]->field_id, "d-modsig"))
+			return true;
+
+	return false;
+}
+
 static int __init ima_template_setup(char *str)
 {
 	struct ima_template_desc *template_desc;


^ permalink raw reply related

* [PATCH v11 11/13] ima: Define ima-modsig template
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

Define new "d-modsig" template field which holds the digest that is
expected to match the one contained in the modsig, and also new "modsig"
template field which holds the appended file signature.

Add a new "ima-modsig" defined template descriptor with the new fields as
well as the ones from the "ima-sig" descriptor.

Change ima_store_measurement() to accept a struct modsig * argument so that
it can be passed along to the templates via struct ima_event_data.

Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
---
 Documentation/security/IMA-templates.rst  |  7 ++-
 security/integrity/ima/ima.h              | 20 +++++++-
 security/integrity/ima/ima_api.c          |  5 +-
 security/integrity/ima/ima_main.c         |  2 +-
 security/integrity/ima/ima_modsig.c       | 19 +++++++
 security/integrity/ima/ima_policy.c       | 41 ++++++++++++++++
 security/integrity/ima/ima_template.c     |  7 ++-
 security/integrity/ima/ima_template_lib.c | 60 ++++++++++++++++++++++-
 security/integrity/ima/ima_template_lib.h |  4 ++
 9 files changed, 157 insertions(+), 8 deletions(-)

diff --git a/Documentation/security/IMA-templates.rst b/Documentation/security/IMA-templates.rst
index 2cd0e273cc9a..8da20b444be0 100644
--- a/Documentation/security/IMA-templates.rst
+++ b/Documentation/security/IMA-templates.rst
@@ -68,15 +68,18 @@ descriptors by adding their identifier to the format string
  - 'd-ng': the digest of the event, calculated with an arbitrary hash
    algorithm (field format: [<hash algo>:]digest, where the digest
    prefix is shown only if the hash algorithm is not SHA1 or MD5);
+ - 'd-modsig': the digest of the event without the appended modsig;
  - 'n-ng': the name of the event, without size limitations;
- - 'sig': the file signature.
+ - 'sig': the file signature;
+ - 'modsig' the appended file signature.
 
 
 Below, there is the list of defined template descriptors:
 
  - "ima": its format is ``d|n``;
  - "ima-ng" (default): its format is ``d-ng|n-ng``;
- - "ima-sig": its format is ``d-ng|n-ng|sig``.
+ - "ima-sig": its format is ``d-ng|n-ng|sig``;
+ - "ima-modsig": its format is ``d-ng|n-ng|sig|d-modsig|modsig``.
 
 
 
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 0acc8e56ec73..a2b2c13ceda8 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -64,6 +64,7 @@ struct ima_event_data {
 	const unsigned char *filename;
 	struct evm_ima_xattr_data *xattr_value;
 	int xattr_len;
+	const struct modsig *modsig;
 	const char *violation;
 };
 
@@ -207,7 +208,7 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
 void ima_store_measurement(struct integrity_iint_cache *iint, struct file *file,
 			   const unsigned char *filename,
 			   struct evm_ima_xattr_data *xattr_value,
-			   int xattr_len, int pcr,
+			   int xattr_len, const struct modsig *modsig, int pcr,
 			   struct ima_template_desc *template_desc);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
 			   const unsigned char *filename);
@@ -308,6 +309,10 @@ bool ima_hook_supports_modsig(enum ima_hooks func);
 int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
 		    struct modsig **modsig);
 void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size);
+int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo,
+			  const u8 **digest, u32 *digest_size);
+int ima_get_raw_modsig(const struct modsig *modsig, const void **data,
+		       u32 *data_len);
 void ima_free_modsig(struct modsig *modsig);
 #else
 static inline bool ima_hook_supports_modsig(enum ima_hooks func)
@@ -326,6 +331,19 @@ static inline void ima_collect_modsig(struct modsig *modsig, const void *buf,
 {
 }
 
+static inline int ima_get_modsig_digest(const struct modsig *modsig,
+					enum hash_algo *algo, const u8 **digest,
+					u32 *digest_size)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int ima_get_raw_modsig(const struct modsig *modsig,
+				     const void **data, u32 *data_len)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline void ima_free_modsig(struct modsig *modsig)
 {
 }
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c351b8c37278..32297d1e6164 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -291,7 +291,7 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
 void ima_store_measurement(struct integrity_iint_cache *iint,
 			   struct file *file, const unsigned char *filename,
 			   struct evm_ima_xattr_data *xattr_value,
-			   int xattr_len, int pcr,
+			   int xattr_len, const struct modsig *modsig, int pcr,
 			   struct ima_template_desc *template_desc)
 {
 	static const char op[] = "add_template_measure";
@@ -303,7 +303,8 @@ void ima_store_measurement(struct integrity_iint_cache *iint,
 					     .file = file,
 					     .filename = filename,
 					     .xattr_value = xattr_value,
-					     .xattr_len = xattr_len };
+					     .xattr_len = xattr_len,
+					     .modsig = modsig };
 	int violation = 0;
 
 	if (iint->measured_pcrs & (0x1 << pcr))
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 2c9d3cf85726..85afb31fafe0 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -323,7 +323,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
 
 	if (action & IMA_MEASURE)
 		ima_store_measurement(iint, file, pathname,
-				      xattr_value, xattr_len, pcr,
+				      xattr_value, xattr_len, modsig, pcr,
 				      template_desc);
 	if (rc == 0 && (action & IMA_APPRAISE_SUBMASK)) {
 		inode_lock(inode);
diff --git a/security/integrity/ima/ima_modsig.c b/security/integrity/ima/ima_modsig.c
index d438b87dba89..b01bbfeb1d98 100644
--- a/security/integrity/ima/ima_modsig.c
+++ b/security/integrity/ima/ima_modsig.c
@@ -140,6 +140,25 @@ int ima_modsig_verify(struct key *keyring, const struct modsig *modsig)
 					VERIFYING_MODULE_SIGNATURE, NULL, NULL);
 }
 
+int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo,
+			  const u8 **digest, u32 *digest_size)
+{
+	*algo = modsig->hash_algo;
+	*digest = modsig->digest;
+	*digest_size = modsig->digest_size;
+
+	return 0;
+}
+
+int ima_get_raw_modsig(const struct modsig *modsig, const void **data,
+		       u32 *data_len)
+{
+	*data = &modsig->raw_pkcs7;
+	*data_len = modsig->raw_pkcs7_len;
+
+	return 0;
+}
+
 void ima_free_modsig(struct modsig *modsig)
 {
 	if (!modsig)
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index f64ef84516db..6463ab8921ea 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -10,6 +10,9 @@
  *	- initialize default measure policy rules
  *
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/fs.h>
@@ -766,6 +769,38 @@ static void ima_log_string(struct audit_buffer *ab, char *key, char *value)
 	ima_log_string_op(ab, key, value, NULL);
 }
 
+/*
+ * Validating the appended signature included in the measurement list requires
+ * the file hash calculated without the appended signature (i.e., the 'd-modsig'
+ * field). Therefore, notify the user if they have the 'modsig' field but not
+ * the 'd-modsig' field in the template.
+ */
+static void check_template_modsig(const struct ima_template_desc *template)
+{
+#define MSG "template with 'modsig' field also needs 'd-modsig' field\n"
+	bool has_modsig, has_dmodsig;
+	static bool checked;
+	int i;
+
+	/* We only need to notify the user once. */
+	if (checked)
+		return;
+
+	has_modsig = has_dmodsig = false;
+	for (i = 0; i < template->num_fields; i++) {
+		if (!strcmp(template->fields[i]->field_id, "modsig"))
+			has_modsig = true;
+		else if (!strcmp(template->fields[i]->field_id, "d-modsig"))
+			has_dmodsig = true;
+	}
+
+	if (has_modsig && !has_dmodsig)
+		pr_notice(MSG);
+
+	checked = true;
+#undef MSG
+}
+
 static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
 {
 	struct audit_buffer *ab;
@@ -1096,6 +1131,12 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
 	else if (entry->action == APPRAISE)
 		temp_ima_appraise |= ima_appraise_flag(entry->func);
 
+	if (!result && entry->flags & IMA_MODSIG_ALLOWED) {
+		template_desc = entry->template ? entry->template :
+						  ima_template_desc_current();
+		check_template_modsig(template_desc);
+	}
+
 	audit_log_format(ab, "res=%d", !result);
 	audit_log_end(ab);
 	return result;
diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c
index e6e892f31cbd..e25bef419c87 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -26,6 +26,7 @@ static struct ima_template_desc builtin_templates[] = {
 	{.name = IMA_TEMPLATE_IMA_NAME, .fmt = IMA_TEMPLATE_IMA_FMT},
 	{.name = "ima-ng", .fmt = "d-ng|n-ng"},
 	{.name = "ima-sig", .fmt = "d-ng|n-ng|sig"},
+	{.name = "ima-modsig", .fmt = "d-ng|n-ng|sig|d-modsig|modsig"},
 	{.name = "", .fmt = ""},	/* placeholder for a custom format */
 };
 
@@ -43,8 +44,12 @@ static const struct ima_template_field supported_fields[] = {
 	 .field_show = ima_show_template_string},
 	{.field_id = "sig", .field_init = ima_eventsig_init,
 	 .field_show = ima_show_template_sig},
+	{.field_id = "d-modsig", .field_init = ima_eventdigest_modsig_init,
+	 .field_show = ima_show_template_digest_ng},
+	{.field_id = "modsig", .field_init = ima_eventmodsig_init,
+	 .field_show = ima_show_template_sig},
 };
-#define MAX_TEMPLATE_NAME_LEN 15
+#define MAX_TEMPLATE_NAME_LEN sizeof("d|n|d-ng|n-ng|sig|d-modisg|modsig")
 
 static struct ima_template_desc *ima_template;
 static int template_desc_init_fields(const char *template_fmt,
diff --git a/security/integrity/ima/ima_template_lib.c b/security/integrity/ima/ima_template_lib.c
index 513b457ae900..dacb01fb105f 100644
--- a/security/integrity/ima/ima_template_lib.c
+++ b/security/integrity/ima/ima_template_lib.c
@@ -223,7 +223,8 @@ int ima_parse_buf(void *bufstartp, void *bufendp, void **bufcurp,
 	return 0;
 }
 
-static int ima_eventdigest_init_common(u8 *digest, u32 digestsize, u8 hash_algo,
+static int ima_eventdigest_init_common(const u8 *digest, u32 digestsize,
+				       u8 hash_algo,
 				       struct ima_field_data *field_data)
 {
 	/*
@@ -326,6 +327,41 @@ int ima_eventdigest_ng_init(struct ima_event_data *event_data,
 					   hash_algo, field_data);
 }
 
+/*
+ * This function writes the digest of the file which is expected to match the
+ * digest contained in the file's embedded signature.
+ */
+int ima_eventdigest_modsig_init(struct ima_event_data *event_data,
+				struct ima_field_data *field_data)
+{
+	enum hash_algo hash_algo;
+	const u8 *cur_digest;
+	u32 cur_digestsize;
+
+	if (!event_data->modsig)
+		return 0;
+
+	if (event_data->violation) {
+		/* Recording a violation. */
+		hash_algo = HASH_ALGO_SHA1;
+		cur_digest = NULL;
+		cur_digestsize = 0;
+	} else {
+		int rc;
+
+		rc = ima_get_modsig_digest(event_data->modsig, &hash_algo,
+					   &cur_digest, &cur_digestsize);
+		if (rc)
+			return rc;
+		else if (hash_algo == HASH_ALGO__LAST || cur_digestsize == 0)
+			/* There was some error collecting the digest. */
+			return -EINVAL;
+	}
+
+	return ima_eventdigest_init_common(cur_digest, cur_digestsize,
+					   hash_algo, field_data);
+}
+
 static int ima_eventname_init_common(struct ima_event_data *event_data,
 				     struct ima_field_data *field_data,
 				     bool size_limit)
@@ -389,3 +425,25 @@ int ima_eventsig_init(struct ima_event_data *event_data,
 	return ima_write_template_field_data(xattr_value, event_data->xattr_len,
 					     DATA_FMT_HEX, field_data);
 }
+
+int ima_eventmodsig_init(struct ima_event_data *event_data,
+			 struct ima_field_data *field_data)
+{
+	const void *data;
+	u32 data_len;
+	int rc;
+
+	if (!event_data->modsig)
+		return 0;
+
+	/*
+	 * modsig is a runtime structure containing pointers. Get its raw data
+	 * instead.
+	 */
+	rc = ima_get_raw_modsig(event_data->modsig, &data, &data_len);
+	if (rc)
+		return rc;
+
+	return ima_write_template_field_data(data, data_len, DATA_FMT_HEX,
+					     field_data);
+}
diff --git a/security/integrity/ima/ima_template_lib.h b/security/integrity/ima/ima_template_lib.h
index 6a3d8b831deb..1d7c690ebae5 100644
--- a/security/integrity/ima/ima_template_lib.h
+++ b/security/integrity/ima/ima_template_lib.h
@@ -38,8 +38,12 @@ int ima_eventname_init(struct ima_event_data *event_data,
 		       struct ima_field_data *field_data);
 int ima_eventdigest_ng_init(struct ima_event_data *event_data,
 			    struct ima_field_data *field_data);
+int ima_eventdigest_modsig_init(struct ima_event_data *event_data,
+				struct ima_field_data *field_data);
 int ima_eventname_ng_init(struct ima_event_data *event_data,
 			  struct ima_field_data *field_data);
 int ima_eventsig_init(struct ima_event_data *event_data,
 		      struct ima_field_data *field_data);
+int ima_eventmodsig_init(struct ima_event_data *event_data,
+			 struct ima_field_data *field_data);
 #endif /* __LINUX_IMA_TEMPLATE_LIB_H */

^ permalink raw reply related

* [PATCH v11 13/13] ima: Allow template= option for appraise rules as well
From: Thiago Jung Bauermann @ 2019-06-11  6:28 UTC (permalink / raw)
  To: linux-integrity
  Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
	Jonathan Corbet, linux-kernel, Mimi Zohar, James Morris,
	David Howells, AKASHI, Takahiro, linux-security-module, keyrings,
	linux-crypto, Jessica Yu, linuxppc-dev, David Woodhouse,
	Thiago Jung Bauermann, Serge E. Hallyn
In-Reply-To: <20190611062817.18412-1-bauerman@linux.ibm.com>

It's useful being able to specify a different IMA template on appraise
policy rules, so allow it.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
---
 security/integrity/ima/ima_policy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 6463ab8921ea..1ac1ef458f2e 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -1110,7 +1110,8 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
 			break;
 		case Opt_template:
 			ima_log_string(ab, "template", args[0].from);
-			if (entry->action != MEASURE) {
+			if (entry->action != MEASURE &&
+			    entry->action != APPRAISE) {
 				result = -EINVAL;
 				break;
 			}


^ permalink raw reply related

* Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
From: Michael Neuling @ 2019-06-11  6:44 UTC (permalink / raw)
  To: Cédric Le Goater, mpe; +Cc: linuxppc-dev, Cameron Kaiser
In-Reply-To: <68f4f99d-4bb7-7d25-1e68-96c65dfbfbe9@kaod.org>


> >  2:
> > -BEGIN_FTR_SECTION
> > -	/* POWER9 with disabled DAWR */
> > +	LOAD_REG_ADDR(r11, dawr_force_enable)
> > +	lbz	r11, 0(r11)
> > +	cmpdi	r11, 0
> >  	li	r3, H_HARDWARE
> > -	blr
> > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> > +	beqlr
> 
> Why is this a 'beqlr' ? Shouldn't it be a blr ? 

I believe it's right and should be a beqlr.  It's to replace the FTR section to
make it dynamic based on the dawr_force_enable bit.

Mikey


^ permalink raw reply

* Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
From: Christophe Leroy @ 2019-06-11  6:46 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20190610030818.17965-2-npiggin@gmail.com>



Le 10/06/2019 à 05:08, Nicholas Piggin a écrit :
> Radix can use ioremap_page_range for ioremap, after slab is available.
> This makes it possible to enable huge ioremap mapping support.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
>   arch/powerpc/mm/book3s64/pgtable.c         | 21 +++++++++++++++++++++
>   arch/powerpc/mm/book3s64/radix_pgtable.c   | 21 +++++++++++++++++++++
>   arch/powerpc/mm/pgtable_64.c               |  2 +-
>   4 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 574eca33f893..e04a839cb5b9 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -266,6 +266,9 @@ extern void radix__vmemmap_remove_mapping(unsigned long start,
>   extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>   				 pgprot_t flags, unsigned int psz);
>   
> +extern int radix__ioremap_range(unsigned long ea, phys_addr_t pa,
> +				unsigned long size, pgprot_t prot, int nid);
> +

'extern' is pointless here, and checkpatch will cry.

>   static inline unsigned long radix__get_tree_size(void)
>   {
>   	unsigned long rts_field;
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index ff98b663c83e..953850a602f7 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -450,3 +450,24 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
>   
>   	return true;
>   }
> +
> +int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
> +{
> +	unsigned long i;
> +
> +	if (radix_enabled())
> +		return radix__ioremap_range(ea, pa, size, prot, nid);

This function looks pretty similar to the one in the previous patch.
Since radix_enabled() is available and return false for all other 
subarches, I think the above could go in the generic ioremap_range(), 
you'll only need to move the function declaration in a common file, for 
instance asm/io.h

> +
> +	for (i = 0; i < size; i += PAGE_SIZE) {
> +		int err = map_kernel_page(ea + i, pa + i, prot);
> +		if (err) {
> +			if (slab_is_available())
> +				unmap_kernel_range(ea, size);
> +			else
> +				WARN_ON_ONCE(1); /* Should clean up */
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index c9bcf428dd2b..db993bc1aef3 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -11,6 +11,7 @@
>   
>   #define pr_fmt(fmt) "radix-mmu: " fmt
>   
> +#include <linux/io.h>
>   #include <linux/kernel.h>
>   #include <linux/sched/mm.h>
>   #include <linux/memblock.h>
> @@ -1122,3 +1123,23 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>   
>   	set_pte_at(mm, addr, ptep, pte);
>   }
> +
> +int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
> +			pgprot_t prot, int nid)
> +{
> +	if (likely(slab_is_available())) {
> +		int err = ioremap_page_range(ea, ea + size, pa, prot);
> +		if (err)
> +			unmap_kernel_range(ea, size);
> +		return err;
> +	} else {
> +		unsigned long i;
> +
> +		for (i = 0; i < size; i += PAGE_SIZE) {
> +			int err = map_kernel_page(ea + i, pa + i, prot);
> +			if (WARN_ON_ONCE(err)) /* Should clean up */
> +				return err;
> +		}

Same loop again.

What about not doing a radix specific function and just putting 
something like below in the core ioremap_range() function ?

	if (likely(slab_is_available()) && radix_enabled()) {
		int err = ioremap_page_range(ea, ea + size, pa, prot);

		if (err)
			unmap_kernel_range(ea, size);
		return err;
	}

Because I'm pretty sure will more and more use ioremap_page_range().

> +		return 0;
> +	}
> +}
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 6bd3660388aa..63cd81130643 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -108,7 +108,7 @@ unsigned long ioremap_bot;
>   unsigned long ioremap_bot = IOREMAP_BASE;
>   #endif
>   
> -static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
> +int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)

Hum. Weak functions remain in unused in vmlinux unless 
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is selected.

Also, they are some how dangerous because people might change them 
without seeing that it is overridden for some particular configuration.

Christophe

>   {
>   	unsigned long i;
>   
> 

^ permalink raw reply

* Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
From: Cédric Le Goater @ 2019-06-11  6:48 UTC (permalink / raw)
  To: Michael Neuling, mpe; +Cc: linuxppc-dev, Cameron Kaiser
In-Reply-To: <287ab7092cc6128e1c0d25f6245eb5f1706c6cb0.camel@neuling.org>

On 11/06/2019 08:44, Michael Neuling wrote:
> 
>>>  2:
>>> -BEGIN_FTR_SECTION
>>> -	/* POWER9 with disabled DAWR */
>>> +	LOAD_REG_ADDR(r11, dawr_force_enable)
>>> +	lbz	r11, 0(r11)
>>> +	cmpdi	r11, 0
>>>  	li	r3, H_HARDWARE
>>> -	blr
>>> -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
>>> +	beqlr
>>
>> Why is this a 'beqlr' ? Shouldn't it be a blr ? 
> 
> I believe it's right and should be a beqlr.  It's to replace the FTR section to
> make it dynamic based on the dawr_force_enable bit.

hmm, see the crash below on a L1 running a nested guest. r3 is set
to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
this ? 

C.


[   44.374746] BUG: Kernel NULL pointer dereference at 0x000013bf
[   44.374848] Faulting instruction address: 0xc00000000010b044
[   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
[   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi failover
[   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3
[   44.375500] NIP:  c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4
[   44.375604] REGS: c00000179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
[   44.375691] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42244842  XER: 00000000
[   44.375815] CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0 
[   44.375815] GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff 
[   44.375815] GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0 
[   44.375815] GPR08: 078000003e018dfe 0000000000000028 0000000000000001 0000000000000075 
[   44.375815] GPR12: c00000000010aff4 c000000007ff6300 0000000000000000 0000000000000000 
[   44.375815] GPR16: 0000000000000000 c0000017f11d0000 00000000ffffffff c0000017f11ca7a8 
[   44.375815] GPR20: c0000017f11c42ec ffffffffffffffff 0000000000000000 000000000000000a 
[   44.375815] GPR24: fffffffffffffffc 0000000000000000 c0000017f11c0000 c000000001a77ed8 
[   44.375815] GPR28: c00000179af70000 fffffffffffffffc c0080000089ff170 c00000179ae88540 
[   44.376673] NIP [c00000000010b044] kvmppc_h_set_dabr+0x50/0x68
[   44.376754] LR [c0080000089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv]
[   44.376849] Call Trace:
[   44.376886] [c00000179b3979a0] [c0000017f11c0000] 0xc0000017f11c0000 (unreliable)
[   44.376982] [c00000179b397a10] [c0080000089dd6bc] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
[   44.377084] [c00000179b397ae0] [c0080000093f8bcc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[   44.377185] [c00000179b397b00] [c0080000093f522c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
[   44.377286] [c00000179b397b90] [c0080000093e3618] kvm_vcpu_ioctl+0x460/0x850 [kvm]
[   44.377384] [c00000179b397d00] [c0000000004ba6c4] do_vfs_ioctl+0xe4/0xb40
[   44.377464] [c00000179b397db0] [c0000000004bb1e4] ksys_ioctl+0xc4/0x110
[   44.377547] [c00000179b397e00] [c0000000004bb258] sys_ioctl+0x28/0x80
[   44.377628] [c00000179b397e20] [c00000000000b888] system_call+0x5c/0x70
[   44.377712] Instruction dump:
[   44.377765] 4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff 
[   44.377862] 4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8 7c942ba6 7cbc2ba6 

^ permalink raw reply

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Benjamin Herrenschmidt @ 2019-06-11  6:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aaro Koskinen, linux-wireless, linux-kernel, Christian Zigotzky,
	linuxppc-dev, Larry Finger
In-Reply-To: <20190611060816.GA20158@lst.de>

On Tue, 2019-06-11 at 08:08 +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 03:56:33PM +1000, Benjamin Herrenschmidt
> wrote:
> > The reason I think it sort-of-mostly-worked is that to get more
> > than
> > 1GB of RAM, those machines use CONFIG_HIGHMEM. And *most* network
> > buffers aren't allocated in Highmem.... so you got lucky.
> > 
> > That said, there is such as thing as no-copy send on network, so I
> > wouldn't be surprised if some things would still have failed, just
> > not
> > frequent enough for you to notice.
> 
> Unless NETIF_F_HIGHDMA is set on a netdev, the core networkign code
> will bounce buffer highmem pages for the driver under all
> circumstances.

 ... which b43legacy doesn't set to the best of my knowledge ...

Which makes me wonder how come it didn't work even with your patches ?
AFAIK, we have less than 1GB of lowmem unless the config has been
tweaked....

Cheers,
Ben.



^ permalink raw reply

* Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings
From: Anshuman Khandual @ 2019-06-11  6:59 UTC (permalink / raw)
  To: Nicholas Piggin, linux-mm; +Cc: linuxppc-dev, Ard Biesheuvel, linux-arm-kernel
In-Reply-To: <1560210095.fpemv3ultp.astroid@bobo.none>

On 06/11/2019 05:46 AM, Nicholas Piggin wrote:
> Anshuman Khandual's on June 10, 2019 6:53 pm:
>> On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
>>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>>> allocate huge pages and map them.
>>
>> IIUC that extends HAVE_ARCH_HUGE_VMAP from iormap to vmalloc. 
>>
>>>
>>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>>> (performance is in the noise, under 1% difference, page tables are likely
>>> to be well cached for this workload). Similar numbers are seen on POWER9.
>>
>> Sure will try this on arm64.
>>
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>  include/asm-generic/4level-fixup.h |   1 +
>>>  include/asm-generic/5level-fixup.h |   1 +
>>>  include/linux/vmalloc.h            |   1 +
>>>  mm/vmalloc.c                       | 132 +++++++++++++++++++++++------
>>>  4 files changed, 107 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/include/asm-generic/4level-fixup.h b/include/asm-generic/4level-fixup.h
>>> index e3667c9a33a5..3cc65a4dd093 100644
>>> --- a/include/asm-generic/4level-fixup.h
>>> +++ b/include/asm-generic/4level-fixup.h
>>> @@ -20,6 +20,7 @@
>>>  #define pud_none(pud)			0
>>>  #define pud_bad(pud)			0
>>>  #define pud_present(pud)		1
>>> +#define pud_large(pud)			0
>>>  #define pud_ERROR(pud)			do { } while (0)
>>>  #define pud_clear(pud)			pgd_clear(pud)
>>>  #define pud_val(pud)			pgd_val(pud)
>>> diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
>>> index bb6cb347018c..c4377db09a4f 100644
>>> --- a/include/asm-generic/5level-fixup.h
>>> +++ b/include/asm-generic/5level-fixup.h
>>> @@ -22,6 +22,7 @@
>>>  #define p4d_none(p4d)			0
>>>  #define p4d_bad(p4d)			0
>>>  #define p4d_present(p4d)		1
>>> +#define p4d_large(p4d)			0
>>>  #define p4d_ERROR(p4d)			do { } while (0)
>>>  #define p4d_clear(p4d)			pgd_clear(p4d)
>>>  #define p4d_val(p4d)			pgd_val(p4d)
>>
>> Both of these are required from vmalloc_to_page() which as per a later comment
>> should be part of a prerequisite patch before this series.
> 
> I'm not sure what you mean. This patch is where they get used.

In case you move out vmalloc_to_page() changes to a separate patch.

> 
> Possibly I could split this and the vmalloc_to_page change out. I'll
> consider it.
> 
>>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>>> index 812bea5866d6..4c92dc608928 100644
>>> --- a/include/linux/vmalloc.h
>>> +++ b/include/linux/vmalloc.h
>>> @@ -42,6 +42,7 @@ struct vm_struct {
>>>  	unsigned long		size;
>>>  	unsigned long		flags;
>>>  	struct page		**pages;
>>> +	unsigned int		page_shift;
>>
>> So the entire vm_struct will be mapped with a single page_shift. It cannot have
>> mix and match mappings with PAGE_SIZE, PMD_SIZE, PUD_SIZE etc in case the
>> allocation fails for larger ones, falling back etc what over other reasons.
> 
> For now, yes. I have a bit of follow up work to improve that and make
> it able to fall back, but it's a bit more churn and not a significant
> benefit just yet because there are not a lot of very large vmallocs
> (except the early hashes which can be satisfied with large allocs).

Right but it will make this new feature complete like ioremap which logically
supports till P4D (though AFAICT not used). If there are no actual vmalloc
requests that large it is fine. Allocation attempts will start from the page
table level depending on the requested size. It is better to have PUD/P4D
considerations now rather than trying to after fit it later.

> 
>>
>>>  	unsigned int		nr_pages;
>>>  	phys_addr_t		phys_addr;
>>>  	const void		*caller;
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index dd27cfb29b10..0cf8e861caeb 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -36,6 +36,7 @@
>>>  #include <linux/rbtree_augmented.h>
>>>  
>>>  #include <linux/uaccess.h>
>>> +#include <asm/pgtable.h>
>>>  #include <asm/tlbflush.h>
>>>  #include <asm/shmparam.h>
>>>  
>>> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, unsigned long end,
>>>  	return ret;
>>>  }
>>>  
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>>> +static int vmap_hpages_range(unsigned long start, unsigned long end,
>>
>> A small nit (if you agree) s/hpages/huge_pages/
> 
> Hmm. It's not actually a good function name because it can do small
> pages as well. vmap_pages_size_range or something may be better.

Right.

> 
>>
>>> +				   pgprot_t prot, struct page **pages,
>>
>> Re-order (prot <---> pages) just to follow the standard like before.
> 
> Will do.
> 
>>> +				   unsigned int page_shift)
>>> +{
>>> +	unsigned long addr = start;
>>> +	unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
>>
>> s/nr/nr_huge_pages ?
> 
> Sure.
> 
>> Also should not we check for the alignment of the range [start...end] with
>> respect to (1UL << [PAGE_SHIFT + page_shift]).
> 
> The caller should if it specifies large page. Could check and -EINVAL
> for incorrect alignment.

That might be a good check here.

> 
>>> +
>>> +	for (i = 0; i < nr; i++) {
>>> +		int err;
>>> +
>>> +		err = vmap_range_noflush(addr,
>>> +					addr + (PAGE_SIZE << page_shift),
>>> +					__pa(page_address(pages[i])), prot,
>>> +					PAGE_SHIFT + page_shift);
>>> +		if (err)
>>> +			return err;
>>> +
>>> +		addr += PAGE_SIZE << page_shift;
>>> +	}
>>> +	flush_cache_vmap(start, end);
>>> +
>>> +	return nr;
>>> +}
>>> +#else
>>> +static int vmap_hpages_range(unsigned long start, unsigned long end,
>>> +			   pgprot_t prot, struct page **pages,
>>> +			   unsigned int page_shift)
>>> +{
>>> +	BUG_ON(page_shift != PAGE_SIZE);
>>> +	return vmap_pages_range(start, end, prot, pages);
>>> +}
>>> +#endif
>>> +
>>> +
>>>  int is_vmalloc_or_module_addr(const void *x)
>>>  {
>>>  	/*
>>> @@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>>  {
>>>  	unsigned long addr = (unsigned long) vmalloc_addr;
>>>  	struct page *page = NULL;
>>> -	pgd_t *pgd = pgd_offset_k(addr);
>>> +	pgd_t *pgd;
>>>  	p4d_t *p4d;
>>>  	pud_t *pud;
>>>  	pmd_t *pmd;
>>> @@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>>  	 */
>>>  	VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>>>  
>>> +	pgd = pgd_offset_k(addr);
>>>  	if (pgd_none(*pgd))
>>>  		return NULL;
>>> +
>>
>> Small nit. Stray line here.
>>
>> 'pgd' related changes here seem to be just cleanups and should not part of this patch.
> 
> Yeah I figure it doesn't matter to make small changes close by, but
> maybe that's more frowned upon now for git blame?

Right. But I guess it should be okay if you can make vmalloc_to_page()
changes as a separate patch. This patch which adds a new feature should
not have any clean ups IMHO.

> 
>>>  	p4d = p4d_offset(pgd, addr);
>>>  	if (p4d_none(*p4d))
>>>  		return NULL;
>>> -	pud = pud_offset(p4d, addr);
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>>> +	if (p4d_large(*p4d))
>>> +		return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
>>> +#endif
>>> +	if (WARN_ON_ONCE(p4d_bad(*p4d)))
>>> +		return NULL;
>>>  
>>> -	/*
>>> -	 * Don't dereference bad PUD or PMD (below) entries. This will also
>>> -	 * identify huge mappings, which we may encounter on architectures
>>> -	 * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
>>> -	 * identified as vmalloc addresses by is_vmalloc_addr(), but are
>>> -	 * not [unambiguously] associated with a struct page, so there is
>>> -	 * no correct value to return for them.
>>> -	 */
>>
>> What changed the situation so that we could return struct page for a huge
>> mapping now ?
> 
> For the PUD case? Nothing changed, per se, we I just calculate the
> correct struct page now, so I may return it.

I was just curious what prevented this earlier (before this series). The
comment here and commit message which added this change making me wonder
what was the reason for not doing this earlier.  

> 
>> AFAICT even after this patch, PUD/P4D level huge pages can only
>> be created with ioremap_page_range() not with vmalloc() which creates PMD
>> sized mappings only. Hence if it's okay to dereference struct page of a huge
>> mapping (not withstanding the comment here) it should be part of an earlier
>> patch fixing it first for existing ioremap_page_range() huge mappings.
> 
> Possibly yes, we can consider 029c54b095995 to be a band-aid for huge
> vmaps which is fixed properly by this change, in which case it could
> make sense to break this into its own patch.

On arm64 [pud|pmd]_bad() calls out huge mappings at PUD or PMD. I still wonder what
Ard (copied him now) meant by "not [unambiguously] associated with a struct page".
He also mentioned about compound pages in the commit message. Anyways these makes
sense (fetching the struct page) unless I am missing something. But should be part
of a separate patch.

pXd_page(*pXd) + ((addr & ~PXD_MASK) >> PAGE_SHIFT)

> 
>>
>>> -	WARN_ON_ONCE(pud_bad(*pud));
>>> -	if (pud_none(*pud) || pud_bad(*pud))
>>> +	pud = pud_offset(p4d, addr);
>>> +	if (pud_none(*pud))
>>> +		return NULL;
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>>> +	if (pud_large(*pud))
>>> +		return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>>> +#endif
>>> +	if (WARN_ON_ONCE(pud_bad(*pud)))
>>>  		return NULL;
>>> +
>>>  	pmd = pmd_offset(pud, addr);
>>> -	WARN_ON_ONCE(pmd_bad(*pmd));
>>> -	if (pmd_none(*pmd) || pmd_bad(*pmd))
>>> +	if (pmd_none(*pmd))
>>> +		return NULL;
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>>> +	if (pmd_large(*pmd))
>>> +		return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
>>> +#endif
>>> +	if (WARN_ON_ONCE(pmd_bad(*pmd)))
>>>  		return NULL;
>>
>> At each page table level, we are checking in this order
>>
>> pXX_none() --> pXX_large() --> pXX_bad()
>>
>> Are not these alternative orders bit better
>>
>> pXX_bad() --> pXX_none() --> pXX_large()
>>
>> Or
>>
>> pXX_none() --> pXX_bad() --> pXX_large()
>>
>> Checking for pXX_bad() at the end does not make much sense.
> 
> Yeah the order tends to go none->bad. It's not 100% clear we can
> test bad before none (at least it goes against convention). But good
> point should be changed to your last sequence I think.

Sure.

> 
>>
>>>  
>>>  	ptep = pte_offset_map(pmd, addr);
>>> @@ -502,6 +549,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>>  	if (pte_present(pte))
>>>  		page = pte_page(pte);
>>>  	pte_unmap(ptep);
>>> +
>>
>> Small nit. Stray line here.
> 
> I don't mind adding some lines here and there, like here. It is an
> unrelated (alleged-)cleanup though.
> 
>>
>>>  	return page;
>>>  }
>>>  EXPORT_SYMBOL(vmalloc_to_page);
>>> @@ -2185,8 +2233,9 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
>>>  		return NULL;
>>>  
>>>  	if (flags & VM_IOREMAP)
>>> -		align = 1ul << clamp_t(int, get_count_order_long(size),
>>> -				       PAGE_SHIFT, IOREMAP_MAX_ORDER);
>>> +		align = max(align,
>>> +				1ul << clamp_t(int, get_count_order_long(size),
>>> +				       PAGE_SHIFT, IOREMAP_MAX_ORDER));
>>>  
>>>  	area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node);
>>>  	if (unlikely(!area))
>>> @@ -2398,7 +2447,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
>>>  			struct page *page = area->pages[i];
>>>  
>>>  			BUG_ON(!page);
>>> -			__free_pages(page, 0);
>>> +			__free_pages(page, area->page_shift);
>>
>> area->page_shift' turns out to be effective page order. I think the name here is bit
>> misleading. s/page_shift/page_order or nr_pages should be better IMHO. page_shift is
>> not actual shift (1UL << area->shift to get size) nor does it sound like page 'order'.
> 
> Yeah good point.
> 
>>>  		}
>>>  
>>>  		kvfree(area->pages);
>>> @@ -2541,14 +2590,17 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>>  				 pgprot_t prot, int node)
>>>  {
>>>  	struct page **pages;
>>> +	unsigned long addr = (unsigned long)area->addr;
>>> +	unsigned long size = get_vm_area_size(area);
>>> +	unsigned int page_shift = area->page_shift;
>>> +	unsigned int shift = page_shift + PAGE_SHIFT;
>>>  	unsigned int nr_pages, array_size, i;
>>>  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>>>  	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
>>>  	const gfp_t highmem_mask = (gfp_mask & (GFP_DMA | GFP_DMA32)) ?
>>> -					0 :
>>> -					__GFP_HIGHMEM;
>>> +					0 : __GFP_HIGHMEM;
>>>  
>>> -	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>>> +	nr_pages = size >> shift;
>>>  	array_size = (nr_pages * sizeof(struct page *));
>>>  
>>>  	area->nr_pages = nr_pages;
>>> @@ -2569,10 +2621,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>>  	for (i = 0; i < area->nr_pages; i++) {
>>>  		struct page *page;
>>>  
>>> -		if (node == NUMA_NO_NODE)
>>> -			page = alloc_page(alloc_mask|highmem_mask);
>>> -		else
>>> -			page = alloc_pages_node(node, alloc_mask|highmem_mask, 0);
>>> +		page = alloc_pages_node(node,
>>> +				alloc_mask|highmem_mask, page_shift);
>>
>> alloc_mask remains the exact same like before even for these high order pages.
> 
> Is there a problem there? I don't see.

There is no problem. I justed noted it.

> 
>>>  
>>>  		if (unlikely(!page)) {
>>>  			/* Successfully allocated i pages, free them in __vunmap() */
>>> @@ -2584,8 +2634,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>>  			cond_resched();
>>>  	}
>>>  
>>> -	if (map_vm_area(area, prot, pages))
>>> +	if (vmap_hpages_range(addr, addr + size, prot, pages, page_shift) < 0)
>>>  		goto fail;
>>> +
>>>  	return area->addr;
>>>  
>>>  fail:
>>> @@ -2619,22 +2670,39 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>>>  			pgprot_t prot, unsigned long vm_flags, int node,
>>>  			const void *caller)
>>>  {
>>> -	struct vm_struct *area;
>>> +	struct vm_struct *area = NULL;
>>>  	void *addr;
>>>  	unsigned long real_size = size;
>>> +	unsigned long real_align = align;
>>> +	unsigned int shift = PAGE_SHIFT;
>>>  
>>>  	size = PAGE_ALIGN(size);
>>>  	if (!size || (size >> PAGE_SHIFT) > totalram_pages())
>>>  		goto fail;
>>>  
>>> +	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) {
>>> +		unsigned long size_per_node;
>>> +
>>> +		size_per_node = size;
>>> +		if (node == NUMA_NO_NODE)
>>> +			size_per_node /= num_online_nodes();
>>> +		if (size_per_node >= PMD_SIZE)
>>> +			shift = PMD_SHIFT;
>>
>> There are two problems here.
>>
>> 1. Should not size_per_node be aligned with PMD_SIZE to avoid wasting memory later
>>    because of alignment upwards (making it worse for NUMA_NO_NODE)
> 
> I'm not sure what you mean, it's just a heuristic to check for node
> interleaving, and use small pages if large can not interleave well.
> 
>> 2. What about PUD_SIZE which is not considered here at all
> 
> Yeah, not doing PUD pages at all. It would be pretty trivial to add 
> after PMD is working, but would it actually get used anywhere?

But it should make this feature logically complete. Allocation attempts can start
at right pgtable level depending on the requested size. I dont think it will have
any performance impact or something.

> 
>> 3. We should have similar knobs like ioremap controlling different size huge mappings
>>
>> static int __read_mostly ioremap_p4d_capable;
>> static int __read_mostly ioremap_pud_capable;
>> static int __read_mostly ioremap_pmd_capable;
>> static int __read_mostly ioremap_huge_disabled;
>>
>> while also giving arch a chance to weigh in through similar overrides like these.
>>
>> arch_ioremap_[pud|pmd]_supported() ---> probably unifying it for vmalloc() 
> 
> I'm not sure if that makes sense. IO mappings I could see maybe having
> some quirks or bugs or support issues. Cacheable mappings should be the
> "base" for supporting larger pages, if the platform has trouble with
> those then it should just avoid the feature I think.
> 
> Or is there a good reason to add the option? I don't mind, I just want
> to avoid proliferation.

We might need atleast for the potential problem on arm64 as discussed on the
other thread.

^ permalink raw reply

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Benjamin Herrenschmidt @ 2019-06-11  6:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aaro Koskinen, linux-wireless, linux-kernel, Christian Zigotzky,
	linuxppc-dev, Larry Finger
In-Reply-To: <fdfc817d1dcdc83f5bc45f0ab12cbce0c61e6702.camel@kernel.crashing.org>

On Tue, 2019-06-11 at 16:58 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-11 at 08:08 +0200, Christoph Hellwig wrote:
> > On Tue, Jun 11, 2019 at 03:56:33PM +1000, Benjamin Herrenschmidt
> > wrote:
> > > The reason I think it sort-of-mostly-worked is that to get more
> > > than
> > > 1GB of RAM, those machines use CONFIG_HIGHMEM. And *most* network
> > > buffers aren't allocated in Highmem.... so you got lucky.
> > > 
> > > That said, there is such as thing as no-copy send on network, so I
> > > wouldn't be surprised if some things would still have failed, just
> > > not
> > > frequent enough for you to notice.
> > 
> > Unless NETIF_F_HIGHDMA is set on a netdev, the core networkign code
> > will bounce buffer highmem pages for the driver under all
> > circumstances.
> 
>  ... which b43legacy doesn't set to the best of my knowledge ...
> 
> Which makes me wonder how come it didn't work even with your patches ?
> AFAIK, we have less than 1GB of lowmem unless the config has been
> tweaked....

Ah stupid me ... it's dma_set_mask that failed, since it has no idea
that the calling driver is limited to lowmem.

That's also why the "wrong" patch worked.

So yes, a ZONE_DMA at 30-bits will work, though it's somewhat overkill.

Cheers,
Ben.



^ permalink raw reply

* Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
From: Michael Neuling @ 2019-06-11  7:24 UTC (permalink / raw)
  To: Cédric Le Goater, mpe; +Cc: linuxppc-dev, Cameron Kaiser
In-Reply-To: <dc9106e8-422f-5582-e463-def38902f03a@kaod.org>

On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote:
> On 11/06/2019 08:44, Michael Neuling wrote:
> > > >  2:
> > > > -BEGIN_FTR_SECTION
> > > > -	/* POWER9 with disabled DAWR */
> > > > +	LOAD_REG_ADDR(r11, dawr_force_enable)
> > > > +	lbz	r11, 0(r11)
> > > > +	cmpdi	r11, 0
> > > >  	li	r3, H_HARDWARE
> > > > -	blr
> > > > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> > > > +	beqlr
> > > 
> > > Why is this a 'beqlr' ? Shouldn't it be a blr ? 
> > 
> > I believe it's right and should be a beqlr.  It's to replace the FTR section to
> > make it dynamic based on the dawr_force_enable bit.
> 
> hmm, see the crash below on a L1 running a nested guest. r3 is set
> to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
> this ? 
> 
> C.
> 
> 
> [   44.374746] BUG: Kernel NULL pointer dereference at 0x000013bf
> [   44.374848] Faulting instruction address: 0xc00000000010b044
> [   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
> [   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi failover
> [   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3
> [   44.375500] NIP:  c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4
> [   44.375604] REGS: c00000179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
> [   44.375691] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42244842  XER: 00000000
> [   44.375815] CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0 
> [   44.375815] GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff 
> [   44.375815] GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0 
> [   44.375815] GPR08: 078000003e018dfe 0000000000000028 0000000000000001 0000000000000075 
> [   44.375815] GPR12: c00000000010aff4 c000000007ff6300 0000000000000000 0000000000000000 
> [   44.375815] GPR16: 0000000000000000 c0000017f11d0000 00000000ffffffff c0000017f11ca7a8 
> [   44.375815] GPR20: c0000017f11c42ec ffffffffffffffff 0000000000000000 000000000000000a 
> [   44.375815] GPR24: fffffffffffffffc 0000000000000000 c0000017f11c0000 c000000001a77ed8 
> [   44.375815] GPR28: c00000179af70000 fffffffffffffffc c0080000089ff170 c00000179ae88540 
> [   44.376673] NIP [c00000000010b044] kvmppc_h_set_dabr+0x50/0x68
> [   44.376754] LR [c0080000089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv]
> [   44.376849] Call Trace:
> [   44.376886] [c00000179b3979a0] [c0000017f11c0000] 0xc0000017f11c0000 (unreliable)
> [   44.376982] [c00000179b397a10] [c0080000089dd6bc] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
> [   44.377084] [c00000179b397ae0] [c0080000093f8bcc] kvmppc_vcpu_run+0x34/0x48 [kvm]
> [   44.377185] [c00000179b397b00] [c0080000093f522c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
> [   44.377286] [c00000179b397b90] [c0080000093e3618] kvm_vcpu_ioctl+0x460/0x850 [kvm]
> [   44.377384] [c00000179b397d00] [c0000000004ba6c4] do_vfs_ioctl+0xe4/0xb40
> [   44.377464] [c00000179b397db0] [c0000000004bb1e4] ksys_ioctl+0xc4/0x110
> [   44.377547] [c00000179b397e00] [c0000000004bb258] sys_ioctl+0x28/0x80
> [   44.377628] [c00000179b397e20] [c00000000000b888] system_call+0x5c/0x70
> [   44.377712] Instruction dump:
> [   44.377765] 4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff 
> [   44.377862] 4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8 7c942ba6 7cbc2ba6 


Opps, it's because I corrupted r3 :-(

Does this fix it?


diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 139027c62d..f781ee1458 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	LOAD_REG_ADDR(r11, dawr_force_enable)
 	lbz	r11, 0(r11)
 	cmpdi	r11, 0
+	bne	3f
 	li	r3, H_HARDWARE
-	beqlr
+	blr
+3:
 	/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
 	rlwimi	r5, r4, 5, DAWRX_DR | DAWRX_DW
 	rlwimi	r5, r4, 2, DAWRX_WT




^ permalink raw reply related

* [Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument
From: bugzilla-daemon @ 2019-06-11  7:25 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-203839-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203839

--- Comment #8 from Christophe Leroy (christophe.leroy@c-s.fr) ---
Argh !

CONFIG_SMP must (again) be the reason we missed it.

Can you please try the change below ?

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 1d5f1bd0dacd..f255e22184b4 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -752,6 +752,7 @@ __secondary_start:
        stw     r0,0(r3)

        /* load up the MMU */
+       bl      load_segment_registers
        bl      load_up_mmu

        /* ptr to phys current thread */

Thanks
Christophe

On 06/11/2019 12:32 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=203839
> 
> --- Comment #6 from Erhard F. (erhard_f@mailbox.org) ---
> Created attachment 283183
>    --> https://bugzilla.kernel.org/attachment.cgi?id=283183&action=edit
> bisect.log
> 
> bisect took me a while due to quite some skips. Cherry-picking
> 397d2300b08cdee052053e362018cdb6dd65eea2 and
> 305d60012304684bd59ea1f67703e51662e4906a helped me complete it.
> 
> # git bisect good | tee -a /root/bisect02.log
> 215b823707ce4e8e52b106915f70357fa474c669 is the first bad commit
> commit 215b823707ce4e8e52b106915f70357fa474c669
> Author: Christophe Leroy <christophe.leroy@c-s.fr>
> Date:   Fri Apr 26 16:23:36 2019 +0000
> 
>      powerpc/32s: set up an early static hash table for KASAN.
> 
>      KASAN requires early activation of hash table, before memblock()
>      functions are available.
> 
>      This patch implements an early hash_table statically defined in
>      __initdata.
> 
>      During early boot, a single page table is used.
> 
>      For hash32, when doing the final init, one page table is allocated
>      for each PGD entry because of the _PAGE_HASHPTE flag which can't be
>      common to several virt pages. This is done after memblock get
>      available but before switching to the final hash table, otherwise
>      there are issues with TLB flushing due to the shared entries.
> 
>      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> 
> :040000 040000 abc24eb3c4ad3e4f2b1eb7b52c295c8b95d79a78
> c3b6114c26eb8e181abb3f1abc9b6ecc12292f4d M      arch
>

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply related

* Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
From: Christophe Leroy @ 2019-06-11  7:51 UTC (permalink / raw)
  To: Michael Neuling, Cédric Le Goater, mpe; +Cc: linuxppc-dev, Cameron Kaiser
In-Reply-To: <5d4cdec0f11e1d47b196b068fcd9fdb107f147b0.camel@neuling.org>



Le 11/06/2019 à 09:24, Michael Neuling a écrit :
> On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote:
>> On 11/06/2019 08:44, Michael Neuling wrote:
>>>>>   2:
>>>>> -BEGIN_FTR_SECTION
>>>>> -	/* POWER9 with disabled DAWR */
>>>>> +	LOAD_REG_ADDR(r11, dawr_force_enable)
>>>>> +	lbz	r11, 0(r11)
>>>>> +	cmpdi	r11, 0
>>>>>   	li	r3, H_HARDWARE
>>>>> -	blr
>>>>> -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
>>>>> +	beqlr
>>>>
>>>> Why is this a 'beqlr' ? Shouldn't it be a blr ?
>>>
>>> I believe it's right and should be a beqlr.  It's to replace the FTR section to
>>> make it dynamic based on the dawr_force_enable bit.
>>
>> hmm, see the crash below on a L1 running a nested guest. r3 is set
>> to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
>> this ?
>>
>> C.
>>
>>
>> [   44.374746] BUG: Kernel NULL pointer dereference at 0x000013bf
>> [   44.374848] Faulting instruction address: 0xc00000000010b044
>> [   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
>> [   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi failover
>> [   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3
>> [   44.375500] NIP:  c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4
>> [   44.375604] REGS: c00000179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
>> [   44.375691] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42244842  XER: 00000000
>> [   44.375815] CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0
>> [   44.375815] GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff
>> [   44.375815] GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0
>> [   44.375815] GPR08: 078000003e018dfe 0000000000000028 0000000000000001 0000000000000075
>> [   44.375815] GPR12: c00000000010aff4 c000000007ff6300 0000000000000000 0000000000000000
>> [   44.375815] GPR16: 0000000000000000 c0000017f11d0000 00000000ffffffff c0000017f11ca7a8
>> [   44.375815] GPR20: c0000017f11c42ec ffffffffffffffff 0000000000000000 000000000000000a
>> [   44.375815] GPR24: fffffffffffffffc 0000000000000000 c0000017f11c0000 c000000001a77ed8
>> [   44.375815] GPR28: c00000179af70000 fffffffffffffffc c0080000089ff170 c00000179ae88540
>> [   44.376673] NIP [c00000000010b044] kvmppc_h_set_dabr+0x50/0x68
>> [   44.376754] LR [c0080000089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv]
>> [   44.376849] Call Trace:
>> [   44.376886] [c00000179b3979a0] [c0000017f11c0000] 0xc0000017f11c0000 (unreliable)
>> [   44.376982] [c00000179b397a10] [c0080000089dd6bc] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
>> [   44.377084] [c00000179b397ae0] [c0080000093f8bcc] kvmppc_vcpu_run+0x34/0x48 [kvm]
>> [   44.377185] [c00000179b397b00] [c0080000093f522c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
>> [   44.377286] [c00000179b397b90] [c0080000093e3618] kvm_vcpu_ioctl+0x460/0x850 [kvm]
>> [   44.377384] [c00000179b397d00] [c0000000004ba6c4] do_vfs_ioctl+0xe4/0xb40
>> [   44.377464] [c00000179b397db0] [c0000000004bb1e4] ksys_ioctl+0xc4/0x110
>> [   44.377547] [c00000179b397e00] [c0000000004bb258] sys_ioctl+0x28/0x80
>> [   44.377628] [c00000179b397e20] [c00000000000b888] system_call+0x5c/0x70
>> [   44.377712] Instruction dump:
>> [   44.377765] 4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff
>> [   44.377862] 4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8 7c942ba6 7cbc2ba6
> 
> 
> Opps, it's because I corrupted r3 :-(
> 
> Does this fix it?
> 
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 139027c62d..f781ee1458 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   	LOAD_REG_ADDR(r11, dawr_force_enable)
>   	lbz	r11, 0(r11)
>   	cmpdi	r11, 0
> +	bne	3f
>   	li	r3, H_HARDWARE
> -	beqlr
> +	blr
> +3:

Or you could copy r3 into another unused volatile register and use it 
instead of r3 below.

Christophe


>   	/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>   	rlwimi	r5, r4, 5, DAWRX_DR | DAWRX_DW
>   	rlwimi	r5, r4, 2, DAWRX_WT
> 

^ permalink raw reply

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Christoph Hellwig @ 2019-06-11  7:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Aaro Koskinen, linux-wireless, linux-kernel, Christian Zigotzky,
	linuxppc-dev, Christoph Hellwig, Larry Finger
In-Reply-To: <fdfc817d1dcdc83f5bc45f0ab12cbce0c61e6702.camel@kernel.crashing.org>

On Tue, Jun 11, 2019 at 04:58:12PM +1000, Benjamin Herrenschmidt wrote:
>  ... which b43legacy doesn't set to the best of my knowledge ...
> 
> Which makes me wonder how come it didn't work even with your patches ?
> AFAIK, we have less than 1GB of lowmem unless the config has been
> tweaked....

It needs to bounce to somewhere.  And the dma-direct code is pretty
strict to require a zone it can do allocations from when setting the
dma mask.  As was the old ppc64 code, but not the ppc32 code that
allowed setting any DMA mask.  And something about the more strict
validation seem to trip up now.

^ permalink raw reply

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Christoph Hellwig @ 2019-06-11  7:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Aaro Koskinen, linux-wireless, linux-kernel, Christian Zigotzky,
	linuxppc-dev, Christoph Hellwig, Larry Finger
In-Reply-To: <b30ced162fa96d0ca63b8b9629d6fe9bc5c78746.camel@kernel.crashing.org>

On Tue, Jun 11, 2019 at 04:59:54PM +1000, Benjamin Herrenschmidt wrote:
> Ah stupid me ... it's dma_set_mask that failed, since it has no idea
> that the calling driver is limited to lowmem.
> 
> That's also why the "wrong" patch worked.
> 
> So yes, a ZONE_DMA at 30-bits will work, though it's somewhat overkill.

Well, according to Larry it doesn't actually work, which is odd.

^ permalink raw reply

* Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage
From: Wanpeng Li @ 2019-06-11  8:42 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: kvm, lidongchen@tencent.com, Punit Agrawal, Xiao Guangrong,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	yongkaiwu@tencent.com, Aneesh Kumar K.V, Paolo Bonzini,
	Andrew Morton, Anshuman Khandual, linuxppc-dev@lists.ozlabs.org,
	Mike Kravetz
In-Reply-To: <20190610235045.GB30991@hori.linux.bs1.fc.nec.co.jp>

On Tue, 11 Jun 2019 at 07:51, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
>
> On Wed, May 29, 2019 at 04:31:01PM -0700, Mike Kravetz wrote:
> > On 5/28/19 2:49 AM, Wanpeng Li wrote:
> > > Cc Paolo,
> > > Hi all,
> > > On Wed, 14 Feb 2018 at 06:34, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >>
> > >> On 02/12/2018 06:48 PM, Michael Ellerman wrote:
> > >>> Andrew Morton <akpm@linux-foundation.org> writes:
> > >>>
> > >>>> On Thu, 08 Feb 2018 12:30:45 +0000 Punit Agrawal <punit.agrawal@arm.com> wrote:
> > >>>>
> > >>>>>>
> > >>>>>> So I don't think that the above test result means that errors are properly
> > >>>>>> handled, and the proposed patch should help for arm64.
> > >>>>>
> > >>>>> Although, the deviation of pud_huge() avoids a kernel crash the code
> > >>>>> would be easier to maintain and reason about if arm64 helpers are
> > >>>>> consistent with expectations by core code.
> > >>>>>
> > >>>>> I'll look to update the arm64 helpers once this patch gets merged. But
> > >>>>> it would be helpful if there was a clear expression of semantics for
> > >>>>> pud_huge() for various cases. Is there any version that can be used as
> > >>>>> reference?
> > >>>>
> > >>>> Is that an ack or tested-by?
> > >>>>
> > >>>> Mike keeps plaintively asking the powerpc developers to take a look,
> > >>>> but they remain steadfastly in hiding.
> > >>>
> > >>> Cc'ing linuxppc-dev is always a good idea :)
> > >>>
> > >>
> > >> Thanks Michael,
> > >>
> > >> I was mostly concerned about use cases for soft/hard offline of huge pages
> > >> larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
> > >> huge pages, and soft/hard offline support was specifically added for this.
> > >> See, 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
> > >> at PGD level"
> > >>
> > >> This patch will disable that functionality.  So, at a minimum this is a
> > >> 'heads up'.  If there are actual use cases that depend on this, then more
> > >> work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
> > >> support, I can not tell if there is a real use case or this is just a
> > >> 'nice to have'.
> > >
> > > 1GB hugetlbfs pages are used by DPDK and VMs in cloud deployment, we
> > > encounter gup_pud_range() panic several times in product environment.
> > > Is there any plan to reenable and fix arch codes?
> >
> > I too am aware of slightly more interest in 1G huge pages.  Suspect that as
> > Intel MMU capacity increases to handle more TLB entries there will be more
> > and more interest.
> >
> > Personally, I am not looking at this issue.  Perhaps Naoya will comment as
> > he know most about this code.
>
> Thanks for forwarding this to me, I'm feeling that memory error handling
> on 1GB hugepage is demanded as real use case.
>
> >
> > > In addition, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/mmu.c#n3213
> > > The memory in guest can be 1GB/2MB/4K, though the host-backed memory
> > > are 1GB hugetlbfs pages, after above PUD panic is fixed,
> > > try_to_unmap() which is called in MCA recovery path will mark the PUD
> > > hwpoison entry. The guest will vmexit and retry endlessly when
> > > accessing any memory in the guest which is backed by this 1GB poisoned
> > > hugetlbfs page. We have a plan to split this 1GB hugetblfs page by 2MB
> > > hugetlbfs pages/4KB pages, maybe file remap to a virtual address range
> > > which is 2MB/4KB page granularity, also split the KVM MMU 1GB SPTE
> > > into 2MB/4KB and mark the offensive SPTE w/ a hwpoison flag, a sigbus
> > > will be delivered to VM at page fault next time for the offensive
> > > SPTE. Is this proposal acceptable?
> >
> > I am not sure of the error handling design, but this does sound reasonable.
>
> I agree that that's better.
>
> > That block of code which potentially dissolves a huge page on memory error
> > is hard to understand and I'm not sure if that is even the 'normal'
> > functionality.  Certainly, we would hate to waste/poison an entire 1G page
> > for an error on a small subsection.
>
> Yes, that's not practical, so we need at first establish the code base for
> 2GB hugetlb splitting and then extending it to 1GB next.

I'm working on this, thanks for the inputs.

Regards,
Wanpeng Li

^ permalink raw reply

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Benjamin Herrenschmidt @ 2019-06-11  9:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aaro Koskinen, linux-wireless, linux-kernel, Christian Zigotzky,
	linuxppc-dev, Larry Finger
In-Reply-To: <20190611075439.GB21815@lst.de>

On Tue, 2019-06-11 at 09:54 +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 04:59:54PM +1000, Benjamin Herrenschmidt
> wrote:
> > Ah stupid me ... it's dma_set_mask that failed, since it has no
> > idea
> > that the calling driver is limited to lowmem.
> > 
> > That's also why the "wrong" patch worked.
> > 
> > So yes, a ZONE_DMA at 30-bits will work, though it's somewhat
> > overkill.
> 
> Well, according to Larry it doesn't actually work, which is odd.

Oh I assume that's just a glitch in the patch :-)

Cheers,
Ben.



^ permalink raw reply

* Re: [PATCH v3 1/6] nvdimm: Consider probe return -EOPNOTSUPP as success
From: Aneesh Kumar K.V @ 2019-06-11  9:29 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-mm, linuxppc-dev, linux-nvdimm
In-Reply-To: <20190604091357.32213-1-aneesh.kumar@linux.ibm.com>


Hi Dan,

Any feedback on this?

A change I would like to get done on top of this series is

+	if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
+		/*
+		 * For a large part we use PAGE_SIZE. But we
+		 * do have some accounting code using SZ_4K.
+		 */
+		pfn_sb->page_struct_size = cpu_to_le16(64);
+		pfn_sb->page_size = cpu_to_le32(SZ_4K);
+	}
+

to

+	if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
+		/*
+		 * For a large part we use PAGE_SIZE. But we
+		 * do have some accounting code using SZ_4K.
+		 */
+		pfn_sb->page_struct_size = cpu_to_le16(64);
+		pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
+	}
+


That would make sure we will able to access the namespace created on
powerpc with newer kernel.

Kindly let me know if you want to see further changes to this series. Do
you think this is ready for next merge window?

-aneesh

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> With following patches we add EOPNOTSUPP as return from probe callback to
> indicate we were not able to initialize a namespace due to pfn superblock
> feature/version mismatch. We want to consider this a probe success so that
> we can create new namesapce seed and there by avoid marking the failed
> namespace as the seed namespace.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/bus.c         |  4 ++--
>  drivers/nvdimm/nd-core.h     |  3 ++-
>  drivers/nvdimm/region_devs.c | 19 +++++++++++++++----
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index 2eb6a6cfe9e4..792b3e90453b 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -100,8 +100,8 @@ static int nvdimm_bus_probe(struct device *dev)
>  
>  	nvdimm_bus_probe_start(nvdimm_bus);
>  	rc = nd_drv->probe(dev);
> -	if (rc == 0)
> -		nd_region_probe_success(nvdimm_bus, dev);
> +	if (rc == 0 || rc == -EOPNOTSUPP)
> +		nd_region_probe_success(nvdimm_bus, dev, rc);
>  	else
>  		nd_region_disable(nvdimm_bus, dev);
>  	nvdimm_bus_probe_end(nvdimm_bus);
> diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
> index e5ffd5733540..9e67a79fb6d5 100644
> --- a/drivers/nvdimm/nd-core.h
> +++ b/drivers/nvdimm/nd-core.h
> @@ -134,7 +134,8 @@ int __init nvdimm_bus_init(void);
>  void nvdimm_bus_exit(void);
>  void nvdimm_devs_exit(void);
>  void nd_region_devs_exit(void);
> -void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev);
> +void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus,
> +			     struct device *dev, int ret);
>  struct nd_region;
>  void nd_region_create_ns_seed(struct nd_region *nd_region);
>  void nd_region_create_btt_seed(struct nd_region *nd_region);
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index b4ef7d9ff22e..fcf3d8828540 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -723,7 +723,7 @@ void nd_mapping_free_labels(struct nd_mapping *nd_mapping)
>   * disable the region.
>   */
>  static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
> -		struct device *dev, bool probe)
> +					   struct device *dev, bool probe, int ret)
>  {
>  	struct nd_region *nd_region;
>  
> @@ -753,6 +753,16 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
>  			nd_region_create_ns_seed(nd_region);
>  		nvdimm_bus_unlock(dev);
>  	}
> +
> +	if (dev->parent && is_nd_region(dev->parent) &&
> +	    !probe && (ret == -EOPNOTSUPP)) {
> +		nd_region = to_nd_region(dev->parent);
> +		nvdimm_bus_lock(dev);
> +		if (nd_region->ns_seed == dev)
> +			nd_region_create_ns_seed(nd_region);
> +		nvdimm_bus_unlock(dev);
> +	}
> +
>  	if (is_nd_btt(dev) && probe) {
>  		struct nd_btt *nd_btt = to_nd_btt(dev);
>  
> @@ -788,14 +798,15 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
>  	}
>  }
>  
> -void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev)
> +void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus,
> +			     struct device *dev, int ret)
>  {
> -	nd_region_notify_driver_action(nvdimm_bus, dev, true);
> +	nd_region_notify_driver_action(nvdimm_bus, dev, true, ret);
>  }
>  
>  void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev)
>  {
> -	nd_region_notify_driver_action(nvdimm_bus, dev, false);
> +	nd_region_notify_driver_action(nvdimm_bus, dev, false, 0);
>  }
>  
>  static ssize_t mappingN(struct device *dev, char *buf, int n)
> -- 
> 2.21.0


^ permalink raw reply

* Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
From: Michael Neuling @ 2019-06-11  9:37 UTC (permalink / raw)
  To: Christophe Leroy, Cédric Le Goater, mpe; +Cc: linuxppc-dev, Cameron Kaiser
In-Reply-To: <4ebaff61-2e7e-e038-71c3-a7ae662b56f4@c-s.fr>

On Tue, 2019-06-11 at 09:51 +0200, Christophe Leroy wrote:
> 
> Le 11/06/2019 à 09:24, Michael Neuling a écrit :
> > On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote:
> > > On 11/06/2019 08:44, Michael Neuling wrote:
> > > > > >   2:
> > > > > > -BEGIN_FTR_SECTION
> > > > > > -	/* POWER9 with disabled DAWR */
> > > > > > +	LOAD_REG_ADDR(r11, dawr_force_enable)
> > > > > > +	lbz	r11, 0(r11)
> > > > > > +	cmpdi	r11, 0
> > > > > >   	li	r3, H_HARDWARE
> > > > > > -	blr
> > > > > > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> > > > > > +	beqlr
> > > > > 
> > > > > Why is this a 'beqlr' ? Shouldn't it be a blr ?
> > > > 
> > > > I believe it's right and should be a beqlr.  It's to replace the FTR
> > > > section to
> > > > make it dynamic based on the dawr_force_enable bit.
> > > 
> > > hmm, see the crash below on a L1 running a nested guest. r3 is set
> > > to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
> > > this ?
> > > 
> > > C.
> > > 
> > > 
> > > [   44.374746] BUG: Kernel NULL pointer dereference at 0x000013bf
> > > [   44.374848] Faulting instruction address: 0xc00000000010b044
> > > [   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
> > > pSeries
> > > [   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM
> > > iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack
> > > nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4
> > > xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter
> > > ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum
> > > crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4
> > > virtio_net net_failover virtio_scsi failover
> > > [   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not
> > > tainted 5.2.0-rc4+ #3
> > > [   44.375500] NIP:  c00000000010b044 LR: c0080000089dacf4 CTR:
> > > c00000000010aff4
> > > [   44.375604] REGS: c00000179b397710 TRAP: 0300   Not tainted  (5.2.0-
> > > rc4+)
> > > [   44.375691] MSR:  800000000280b033
> > > <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42244842  XER: 00000000
> > > [   44.375815] CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR:
> > > 42000000 IRQMASK: 0
> > > [   44.375815] GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300
> > > ffffffffffffffff
> > > [   44.375815] GPR04: 0000000000000000 0000000000000003 000000002444b05d
> > > c0000017f11c45d0
> > > [   44.375815] GPR08: 078000003e018dfe 0000000000000028 0000000000000001
> > > 0000000000000075
> > > [   44.375815] GPR12: c00000000010aff4 c000000007ff6300 0000000000000000
> > > 0000000000000000
> > > [   44.375815] GPR16: 0000000000000000 c0000017f11d0000 00000000ffffffff
> > > c0000017f11ca7a8
> > > [   44.375815] GPR20: c0000017f11c42ec ffffffffffffffff 0000000000000000
> > > 000000000000000a
> > > [   44.375815] GPR24: fffffffffffffffc 0000000000000000 c0000017f11c0000
> > > c000000001a77ed8
> > > [   44.375815] GPR28: c00000179af70000 fffffffffffffffc c0080000089ff170
> > > c00000179ae88540
> > > [   44.376673] NIP [c00000000010b044] kvmppc_h_set_dabr+0x50/0x68
> > > [   44.376754] LR [c0080000089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0
> > > [kvm_hv]
> > > [   44.376849] Call Trace:
> > > [   44.376886] [c00000179b3979a0] [c0000017f11c0000] 0xc0000017f11c0000
> > > (unreliable)
> > > [   44.376982] [c00000179b397a10] [c0080000089dd6bc]
> > > kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
> > > [   44.377084] [c00000179b397ae0] [c0080000093f8bcc]
> > > kvmppc_vcpu_run+0x34/0x48 [kvm]
> > > [   44.377185] [c00000179b397b00] [c0080000093f522c]
> > > kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
> > > [   44.377286] [c00000179b397b90] [c0080000093e3618]
> > > kvm_vcpu_ioctl+0x460/0x850 [kvm]
> > > [   44.377384] [c00000179b397d00] [c0000000004ba6c4]
> > > do_vfs_ioctl+0xe4/0xb40
> > > [   44.377464] [c00000179b397db0] [c0000000004bb1e4] ksys_ioctl+0xc4/0x110
> > > [   44.377547] [c00000179b397e00] [c0000000004bb258] sys_ioctl+0x28/0x80
> > > [   44.377628] [c00000179b397e20] [c00000000000b888] system_call+0x5c/0x70
> > > [   44.377712] Instruction dump:
> > > [   44.377765] 4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000
> > > 2c2b0000 3860ffff
> > > [   44.377862] 4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8
> > > 7c942ba6 7cbc2ba6
> > 
> > Opps, it's because I corrupted r3 :-(
> > 
> > Does this fix it?
> > 
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index 139027c62d..f781ee1458 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >   	LOAD_REG_ADDR(r11, dawr_force_enable)
> >   	lbz	r11, 0(r11)
> >   	cmpdi	r11, 0
> > +	bne	3f
> >   	li	r3, H_HARDWARE
> > -	beqlr
> > +	blr
> > +3:
> 
> Or you could copy r3 into another unused volatile register and use it 
> instead of r3 below.

r3 is the vcpu pointer passed in. Changing to a different register will make the
code harder to follow IMHO.

Plus this is a much clearer fix.

So I don't think I'll do that.

Mikey

> 
> Christophe
> 
> 
> >   	/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
> >   	rlwimi	r5, r4, 5, DAWRX_DR | DAWRX_DW
> >   	rlwimi	r5, r4, 2, DAWRX_WT
> > 


^ permalink raw reply

* [Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument
From: bugzilla-daemon @ 2019-06-11 10:34 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-203839-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203839

--- Comment #9 from Erhard F. (erhard_f@mailbox.org) ---
(In reply to Christophe Leroy from comment #8)
> Argh !
> 
> CONFIG_SMP must (again) be the reason we missed it.
> 
> Can you please try the change below ?
Applied your change on top of 5.2-rc4. The G4 boots fine again, thanks!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [PATCH v3 3/6] mm/nvdimm: Add page size and struct page size to pfn superblock
From: Jan Kara @ 2019-06-11 10:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-mm, dan.j.williams, linuxppc-dev, linux-nvdimm
In-Reply-To: <20190604091357.32213-3-aneesh.kumar@linux.ibm.com>

On Tue 04-06-19 14:43:54, Aneesh Kumar K.V wrote:
> This is needed so that we don't wrongly initialize a namespace
> which doesn't have enough space reserved for holding struct pages
> with the current kernel.
> 
> We also increment PFN_MIN_VERSION to make sure that older kernel
> won't initialize namespace created with newer kernel.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
...
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 00c57805cad3..e01eee9efafe 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -467,6 +467,15 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
>  	if (__le16_to_cpu(pfn_sb->version_minor) < 2)
>  		pfn_sb->align = 0;
>  
> +	if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
> +		/*
> +		 * For a large part we use PAGE_SIZE. But we
> +		 * do have some accounting code using SZ_4K.
> +		 */
> +		pfn_sb->page_struct_size = cpu_to_le16(64);
> +		pfn_sb->page_size = cpu_to_le32(SZ_4K);
> +	}
> +
>  	switch (le32_to_cpu(pfn_sb->mode)) {
>  	case PFN_MODE_RAM:
>  	case PFN_MODE_PMEM:

As we discussed with Aneesh privately, this actually means that existing
NVDIMM namespaces on PPC64 will stop working due to these defaults for old
superblocks. I don't think that's a good thing as upgrading kernels is
going to be nightmare due to this on PPC64. So I believe we should make
defaults for old superblocks such that working setups keep working without
sysadmin having to touch anything.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox