Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Paolo Bonzini @ 2014-10-07 17:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Linus Torvalds
  Cc: Andrea Arcangeli, qemu-devel, KVM list, Linux Kernel Mailing List,
	linux-mm, Linux API, Andres Lagar-Cavilla, Dave Hansen,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner, Christopher Covington,
	Johannes Weiner, Android Kernel Team, Robert Love,
	Dmitry Adamushko, Neil Brown
In-Reply-To: <20141007170731.GO2404@work-vm>

Il 07/10/2014 19:07, Dr. David Alan Gilbert ha scritto:
>> > 
>> > So I'd *much* rather have a "write()" style interface (ie _copying_
>> > bytes from user space into a newly allocated page that gets mapped)
>> > than a "remap page" style interface
> Something like that might work for the postcopy case; it doesn't work
> for some of the other uses that need to stop a page being changed by the
> guest, but then need to somehow get a copy of that page internally to QEMU,
> and perhaps provide it back later.

I cannot parse this.  Which uses do you have in mind?  Is it for
QEMU-specific or is it for other applications of userfaults?

As long as the page is atomically mapped, I'm not sure what the
difference from remap_anon_pages are (as far as the destination page is
concerned).  Are you thinking of having userfaults enabled on the source
as well?

Paolo

> remap_anon_pages worked for those cases
> as well; I can't think of another current way of doing it in userspace.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Dr. David Alan Gilbert @ 2014-10-07 17:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, qemu-devel, KVM list, Linux Kernel Mailing List,
	linux-mm, Linux API, Andres Lagar-Cavilla, Dave Hansen,
	Paolo Bonzini, Rik van Riel, Mel Gorman, Andy Lutomirski,
	Andrew Morton, Sasha Levin, Hugh Dickins, Peter Feiner,
	Christopher Covington, Johannes Weiner, Android Kernel Team,
	Robert Love, Dmitry Adamushko <dmitry.adamush>
In-Reply-To: <CA+55aFxAOYBny+QwXfkPy-P3rs-RPr5SLYLcPNBiFO3waBXtQA@mail.gmail.com>

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> On Mon, Oct 6, 2014 at 12:41 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >
> > Of course if somebody has better ideas on how to resolve an anonymous
> > userfault they're welcome.
> 
> So I'd *much* rather have a "write()" style interface (ie _copying_
> bytes from user space into a newly allocated page that gets mapped)
> than a "remap page" style interface

Something like that might work for the postcopy case; it doesn't work
for some of the other uses that need to stop a page being changed by the
guest, but then need to somehow get a copy of that page internally to QEMU,
and perhaps provide it back later.  remap_anon_pages worked for those cases
as well; I can't think of another current way of doing it in userspace.

I'm thinking here of systems for making VMs with memory larger than a single
host; that's something that's not as well thought out.  I've also seen people
writing emulation that want to trap and emulate some page accesses while
still having the original data available to the emulator itself.

So yes, OK for now, but the result is less general.

Dave


> remapping anonymous pages involves page table games that really aren't
> necessarily a good idea, and tlb invalidates for the old page etc.
> Just don't do it.
> 
>            Linus
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH v2 7/7] tpm: TPM 2.0 FIFO Interface
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Will Arthur, Jarkko Sakkinen
In-Reply-To: <1412701277-27794-1-git-send-email-jarkko.sakkinen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

From: Will Arthur <will.c.arthur-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Detect TPM 2.0 by using the extended STS (STS3) register. For TPM 2.0,
instead of calling tpm_get_timeouts(), assign duration and timeout
values defined in the TPM 2.0 PTP specification.

Signed-off-by: Will Arthur <will.c.arthur-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
---
 drivers/char/tpm/tpm_tis.c | 77 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 60 insertions(+), 17 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index df04ce6..5c5a0cc 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2005, 2006 IBM Corporation
+ * Copyright (C) 2014 Intel Corporation
  *
  * Authors:
  * Leendert van Doorn <leendert-aZOuKsOsJu3MbYB6QlFGEg@public.gmane.org>
@@ -44,6 +45,10 @@ enum tis_status {
 	TPM_STS_DATA_EXPECT = 0x08,
 };
 
+enum tis_status3 {
+	TPM_STS3_TPM2_FAM = 0x04,
+};
+
 enum tis_int_flags {
 	TPM_GLOBAL_INT_ENABLE = 0x80000000,
 	TPM_INTF_BURST_COUNT_STATIC = 0x100,
@@ -70,6 +75,7 @@ enum tis_defaults {
 #define	TPM_INT_STATUS(l)		(0x0010 | ((l) << 12))
 #define	TPM_INTF_CAPS(l)		(0x0014 | ((l) << 12))
 #define	TPM_STS(l)			(0x0018 | ((l) << 12))
+#define	TPM_STS3(l)			(0x001b | ((l) << 12))
 #define	TPM_DATA_FIFO(l)		(0x0024 | ((l) << 12))
 
 #define	TPM_DID_VID(l)			(0x0F00 | ((l) << 12))
@@ -344,6 +350,7 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
 {
 	int rc;
 	u32 ordinal;
+	unsigned long dur;
 
 	rc = tpm_tis_send_data(chip, buf, len);
 	if (rc < 0)
@@ -355,9 +362,14 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
 
 	if (chip->vendor.irq) {
 		ordinal = be32_to_cpu(*((__be32 *) (buf + 6)));
+
+		if (chip->flags & TPM_CHIP_FLAG_TPM2)
+			dur = tpm_calc_ordinal_duration(chip, ordinal);
+		else
+			dur = tpm_calc_ordinal_duration(chip, ordinal);
+
 		if (wait_for_tpm_stat
-		    (chip, TPM_STS_DATA_AVAIL | TPM_STS_VALID,
-		     tpm_calc_ordinal_duration(chip, ordinal),
+		    (chip, TPM_STS_DATA_AVAIL | TPM_STS_VALID, dur,
 		     &chip->vendor.read_queue, false) < 0) {
 			rc = -ETIME;
 			goto out_err;
@@ -531,24 +543,43 @@ static int tpm_tis_init(struct device *dev, resource_size_t start,
 	u32 vendor, intfcaps, intmask;
 	int rc, i, irq_s, irq_e, probe;
 	struct tpm_chip *chip;
+	u8 sts3;
+	u32 dummy;
 
 	chip = tpm_chip_alloc(dev, &tpm_tis);
-	if (!chip)
+	if (IS_ERR(chip))
 		return -ENODEV;
 
 	chip->vendor.iobase = devm_ioremap(dev, start, len);
 	if (!chip->vendor.iobase)
 		return -EIO;
 
+	sts3 = ioread8(chip->vendor.iobase + TPM_STS3(1));
+	if (sts3 & TPM_STS3_TPM2_FAM)
+		chip->flags = TPM_CHIP_FLAG_TPM2;
+
 	rc = tpm_chip_register(chip);
 	if (rc)
 		return -ENODEV;
 
-	/* Default timeouts */
-	chip->vendor.timeout_a = msecs_to_jiffies(TIS_SHORT_TIMEOUT);
-	chip->vendor.timeout_b = msecs_to_jiffies(TIS_LONG_TIMEOUT);
-	chip->vendor.timeout_c = msecs_to_jiffies(TIS_SHORT_TIMEOUT);
-	chip->vendor.timeout_d = msecs_to_jiffies(TIS_SHORT_TIMEOUT);
+        /* Default timeouts */
+        if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+                chip->vendor.timeout_a = usecs_to_jiffies(TPM2_TIMEOUT_A);
+                chip->vendor.timeout_b = usecs_to_jiffies(TPM2_TIMEOUT_B);
+                chip->vendor.timeout_c = usecs_to_jiffies(TPM2_TIMEOUT_C);
+                chip->vendor.timeout_d = usecs_to_jiffies(TPM2_TIMEOUT_D);
+                chip->vendor.duration[TPM_SHORT] =
+                        usecs_to_jiffies(TPM2_DURATION_SHORT);
+                chip->vendor.duration[TPM_MEDIUM] =
+                        usecs_to_jiffies(TPM2_DURATION_MEDIUM);
+                chip->vendor.duration[TPM_LONG] =
+                        usecs_to_jiffies(TPM2_DURATION_LONG);
+        } else {
+                chip->vendor.timeout_a = msecs_to_jiffies(TIS_SHORT_TIMEOUT);
+                chip->vendor.timeout_b = msecs_to_jiffies(TIS_LONG_TIMEOUT);
+                chip->vendor.timeout_c = msecs_to_jiffies(TIS_SHORT_TIMEOUT);
+                chip->vendor.timeout_d = msecs_to_jiffies(TIS_SHORT_TIMEOUT);
+        }
 
 	if (wait_startup(chip, 0) != 0) {
 		rc = -ENODEV;
@@ -563,9 +594,9 @@ static int tpm_tis_init(struct device *dev, resource_size_t start,
 	vendor = ioread32(chip->vendor.iobase + TPM_DID_VID(0));
 	chip->vendor.manufacturer_id = vendor;
 
-	dev_info(dev,
-		 "1.2 TPM (device-id 0x%X, rev-id %d)\n",
-		 vendor >> 16, ioread8(chip->vendor.iobase + TPM_RID(0)));
+        dev_info(dev, "%s TPM (device-id 0x%X, rev-id %d)\n",
+                 (chip->flags & TPM_CHIP_FLAG_TPM2) ? "2.0" : "1.2",
+                 vendor >> 16, ioread8(chip->vendor.iobase + TPM_RID(0)));
 
 	if (!itpm) {
 		probe = probe_itpm(chip);
@@ -612,7 +643,11 @@ static int tpm_tis_init(struct device *dev, resource_size_t start,
 		goto out_err;
 	}
 
-	if (tpm_do_selftest(chip)) {
+	if (chip->flags & TPM_CHIP_FLAG_TPM2)
+		rc = tpm2_do_selftest(chip);
+	else
+		rc = tpm_do_selftest(chip);
+	if (rc) {
 		dev_err(dev, "TPM self test failed\n");
 		rc = -ENODEV;
 		goto out_err;
@@ -673,7 +708,11 @@ static int tpm_tis_init(struct device *dev, resource_size_t start,
 			chip->vendor.probed_irq = 0;
 
 			/* Generate Interrupts */
-			tpm_gen_interrupt(chip);
+			if (chip->flags & TPM_CHIP_FLAG_TPM2)
+				rc = tpm2_get_tpm_pt(chip, TPM2_CAP_TPM_PROPERTIES, &dummy,
+						     "attempting to generate an interrupt");
+			else
+				tpm_gen_interrupt(chip);
 
 			chip->vendor.irq = chip->vendor.probed_irq;
 
@@ -752,14 +791,18 @@ static void tpm_tis_reenable_interrupts(struct tpm_chip *chip)
 static int tpm_tis_resume(struct device *dev)
 {
 	struct tpm_chip *chip = dev_get_drvdata(dev);
-	int ret;
+	int ret = 0;
 
 	if (chip->vendor.irq)
 		tpm_tis_reenable_interrupts(chip);
 
-	ret = tpm_pm_resume(dev);
-	if (!ret)
-		tpm_do_selftest(chip);
+	if (chip->flags & TPM_CHIP_FLAG_TPM2)
+		tpm2_do_selftest(chip);
+	else {
+		ret = tpm_pm_resume(dev);
+		if (!ret)
+			tpm_do_selftest(chip);
+	}
 
 	return ret;
 }
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 6/7] tpm: TPM 2.0 CRB Interface
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel, linux-kernel, linux-api, Jarkko Sakkinen
In-Reply-To: <1412701277-27794-1-git-send-email-jarkko.sakkinen@linux.intel.com>

tpm_crb is a driver for TPM 2.0 Command Response Buffer (CRB) Interface
as defined in PC Client Platform TPM Profile (PTP) Specification.

Only polling and single locality is supported as these are the limitations
of the available hardware, Platform Trust Techonlogy (PTT) in Haswell
CPUs.

The driver always applies CRB with ACPI start because PTT reports using
only ACPI start as start method but as a result of my testing it requires
also CRB start.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 drivers/char/tpm/Kconfig   |   9 ++
 drivers/char/tpm/Makefile  |   1 +
 drivers/char/tpm/tpm_crb.c | 329 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 339 insertions(+)
 create mode 100644 drivers/char/tpm/tpm_crb.c

diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index c54cac3..10c9419 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -122,4 +122,13 @@ config TCG_XEN
 	  To compile this driver as a module, choose M here; the module
 	  will be called xen-tpmfront.
 
+config TCG_CRB
+	tristate "TPM 2.0 CRB Interface"
+	depends on X86 && ACPI
+	---help---
+	  If you have a TPM security chip that is compliant with the
+	  TCG CRB 2.0 TPM specification say Yes and it will be accessible
+	  from within Linux.  To compile this driver as a module, choose
+	  M here; the module will be called tpm_crb.
+
 endif # TCG_TPM
diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index d3cf905..15e3b4c 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_TCG_INFINEON) += tpm_infineon.o
 obj-$(CONFIG_TCG_IBMVTPM) += tpm_ibmvtpm.o
 obj-$(CONFIG_TCG_ST33_I2C) += tpm_i2c_stm_st33.o
 obj-$(CONFIG_TCG_XEN) += xen-tpmfront.o
+obj-$(CONFIG_TCG_CRB) += tpm_crb.o
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
new file mode 100644
index 0000000..f808a40
--- /dev/null
+++ b/drivers/char/tpm/tpm_crb.c
@@ -0,0 +1,329 @@
+/*
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * Authors:
+ * Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
+ *
+ * Maintained by: <tpmdd-devel@lists.sourceforge.net>
+ *
+ * This device driver implements the TPM interface as defined in
+ * the TCG CRB 2.0 TPM specification.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/acpi.h>
+#include <linux/highmem.h>
+#include <linux/rculist.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include "tpm.h"
+
+#define ACPI_SIG_TPM2 "TPM2"
+
+static const u8 CRB_ACPI_START_UUID[] = {
+	/* 0000 */ 0xAB, 0x6C, 0xBF, 0x6B, 0x63, 0x54, 0x14, 0x47,
+	/* 0008 */ 0xB7, 0xCD, 0xF0, 0x20, 0x3C, 0x03, 0x68, 0xD4
+};
+
+enum crb_defaults {
+	CRB_ACPI_START_REVISION_ID = 1,
+	CRB_ACPI_START_INDEX = 1,
+};
+
+enum crb_start_method {
+	CRB_SM_ACPI_START = 2,
+	CRB_SM_CRB = 7,
+	CRB_SM_CRB_WITH_ACPI_START = 8,
+};
+
+struct acpi_tpm2 {
+	struct acpi_table_header hdr;
+	u16 platform_class;
+	u16 reserved;
+	u64 control_area_pa;
+	u32 start_method;
+};
+
+enum crb_ca_request {
+	CRB_CA_REQ_GO_IDLE	= BIT(0),
+	CRB_CA_REQ_CMD_READY	= BIT(1),
+};
+
+enum crb_ca_status {
+	CRB_CA_STS_ERROR	= BIT(0),
+	CRB_CA_STS_TPM_IDLE	= BIT(1),
+};
+
+struct crb_control_area {
+	u32 req;
+	u32 sts;
+	u32 cancel;
+	u32 start;
+	u32 int_enable;
+	u32 int_sts;
+	u32 cmd_size;
+	u64 cmd_pa;
+	u32 rsp_size;
+	u64 rsp_pa;
+} __packed;
+
+enum crb_status {
+	CRB_STS_COMPLETE	= BIT(0),
+};
+
+enum crb_flags {
+	CRB_FL_ACPI_START	= BIT(0),
+	CRB_FL_CRB_START	= BIT(1),
+};
+
+struct crb_priv {
+	unsigned int flags;
+	struct crb_control_area *cca;
+	unsigned long cca_pa;
+	acpi_handle dev_handle;
+};
+
+#ifdef CONFIG_PM_SLEEP
+int crb_suspend(struct device *dev)
+{
+	return 0;
+}
+
+static int crb_resume(struct device *dev)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+
+	(void) tpm2_do_selftest(chip);
+
+	return 0;
+}
+#endif
+
+static SIMPLE_DEV_PM_OPS(crb_pm, crb_suspend, crb_resume);
+
+static u8 crb_status(struct tpm_chip *chip)
+{
+	struct crb_priv *priv = chip->vendor.priv;
+	u8 sts = 0;
+
+	if ((le32_to_cpu(priv->cca->start) & 1) != 1)
+		sts |= CRB_STS_COMPLETE;
+
+	return sts;
+}
+
+static int crb_recv(struct tpm_chip *chip, u8 *buf, size_t count)
+{
+	struct crb_priv *priv = chip->vendor.priv;
+	struct crb_control_area *cca;
+	unsigned int expected;
+	unsigned long offset;
+	u8 *resp;
+
+	cca = priv->cca;
+	if (le32_to_cpu(cca->sts) & CRB_CA_STS_ERROR)
+		return -EIO;
+
+	offset = le64_to_cpu(cca->rsp_pa) - priv->cca_pa;
+	resp = (u8 *) ((unsigned long) cca + offset);
+	memcpy(buf, resp, 6);
+	expected = be32_to_cpup((__be32 *) &buf[2]);
+
+	if (expected > count)
+		return -EIO;
+
+	memcpy(&buf[6], &resp[6], expected - 6);
+
+	return expected;
+}
+
+static int crb_do_acpi_start(struct tpm_chip *chip)
+{
+	struct crb_priv *priv = chip->vendor.priv;
+	union acpi_object *obj;
+	int rc;
+
+	obj = acpi_evaluate_dsm(priv->dev_handle,
+				CRB_ACPI_START_UUID,
+				CRB_ACPI_START_REVISION_ID,
+				CRB_ACPI_START_INDEX,
+				NULL);
+	if (!obj)
+		return -ENXIO;
+	rc = obj->integer.value == 0 ? 0 : -ENXIO;
+	ACPI_FREE(obj);
+	return rc;
+}
+
+static int crb_send(struct tpm_chip *chip, u8 *buf, size_t len)
+{
+	struct crb_priv *priv = chip->vendor.priv;
+	struct crb_control_area *cca;
+	u8 *cmd;
+	int rc = 0;
+
+	cca = priv->cca;
+
+	if (len > le32_to_cpu(cca->cmd_size)) {
+		dev_err(chip->dev,
+			"invalid command count value %x %zx\n",
+			(unsigned int) len,
+			(size_t) le32_to_cpu(cca->cmd_size));
+		return -E2BIG;
+	}
+
+	cmd = (u8 *) ((unsigned long) cca + le64_to_cpu(cca->cmd_pa) - priv->cca_pa);
+	memcpy(cmd, buf, len);
+	wmb();
+
+	cca->start = cpu_to_le32(1);
+	rc = crb_do_acpi_start(chip);
+	return rc;
+}
+
+static void crb_cancel(struct tpm_chip *chip)
+{
+	struct crb_priv *priv = chip->vendor.priv;
+	struct crb_control_area *cca;
+
+	cca = priv->cca;
+	cca->cancel = cpu_to_le32(1);
+	wmb();
+
+	if (crb_do_acpi_start(chip))
+		dev_err(chip->dev, "ACPI Start failed\n");
+
+	cca->cancel = 0;
+}
+
+static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
+{
+	struct crb_priv *priv = chip->vendor.priv;
+
+	return (le32_to_cpu(priv->cca->cancel) & 1) == 1;
+}
+
+static const struct tpm_class_ops tpm_crb = {
+	.status = crb_status,
+	.recv = crb_recv,
+	.send = crb_send,
+	.cancel = crb_cancel,
+	.req_canceled = crb_req_canceled,
+	.req_complete_mask = CRB_STS_COMPLETE,
+	.req_complete_val = CRB_STS_COMPLETE,
+};
+
+static int crb_acpi_add(struct acpi_device *device)
+{
+	struct tpm_chip *chip;
+	struct acpi_tpm2 *buf;
+	struct crb_priv *priv;
+	struct device *dev = &device->dev;
+	acpi_status status;
+	u32 sm;
+	int rc;
+
+	chip = tpm_chip_alloc(dev, &tpm_crb);
+	if (IS_ERR(chip))
+		return -ENODEV;
+
+	chip->flags = TPM_CHIP_FLAG_TPM2;
+
+	rc = tpm_chip_register(chip);
+	if (rc)
+		return -ENODEV;
+
+	status = acpi_get_table(ACPI_SIG_TPM2, 1,
+				(struct acpi_table_header **) &buf);
+	if (ACPI_FAILURE(status)) {
+		dev_err(dev, "could not get TPM2 ACPI table\n");
+		rc = -ENODEV;
+		goto out_err;
+	}
+
+	priv = (struct crb_priv *) devm_kzalloc(dev, sizeof(struct crb_priv),
+						GFP_KERNEL);
+	if (!priv) {
+		rc = -ENODEV;
+		goto out_err;
+	}
+
+	sm = le32_to_cpu(buf->start_method);
+
+	if (sm == CRB_SM_CRB || sm == CRB_SM_CRB_WITH_ACPI_START)
+		priv->flags |= CRB_FL_CRB_START;
+
+	if (sm == CRB_SM_ACPI_START || sm == CRB_SM_CRB_WITH_ACPI_START)
+		priv->flags |= CRB_FL_ACPI_START;
+
+	priv->dev_handle = device->handle;
+	priv->cca_pa = le32_to_cpu(buf->control_area_pa);
+	priv->cca = (struct crb_control_area *)
+		devm_ioremap_nocache(dev, buf->control_area_pa, 0x1000);
+	if (!priv->cca) {
+		rc = -ENODEV;
+		goto out_err;
+	}
+
+	chip->vendor.priv = priv;
+
+	/* Default timeouts and durations */
+	chip->vendor.timeout_a = usecs_to_jiffies(TPM2_TIMEOUT_A);
+	chip->vendor.timeout_b = usecs_to_jiffies(TPM2_TIMEOUT_B);
+	chip->vendor.timeout_c = usecs_to_jiffies(TPM2_TIMEOUT_C);
+	chip->vendor.timeout_d = usecs_to_jiffies(TPM2_TIMEOUT_D);
+	chip->vendor.duration[TPM_SHORT] =
+		usecs_to_jiffies(TPM2_DURATION_SHORT);
+	chip->vendor.duration[TPM_MEDIUM] =
+		usecs_to_jiffies(TPM2_DURATION_MEDIUM);
+	chip->vendor.duration[TPM_LONG] =
+		usecs_to_jiffies(TPM2_DURATION_LONG);
+
+	rc = tpm2_do_selftest(chip);
+	if (rc) {
+		rc = -ENODEV;
+		goto out_err;
+	}
+
+	return 0;
+out_err:
+	tpm_chip_unregister(chip);
+	return rc;
+}
+
+int crb_acpi_remove(struct acpi_device *device)
+{
+	struct device *dev = &device->dev;
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+
+	tpm_chip_unregister(chip);
+	return 0;
+}
+
+static struct acpi_device_id crb_device_ids[] = {
+	{"MSFT0101", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, crb_device_ids);
+
+static struct acpi_driver crb_acpi_driver = {
+	.name = "tpm_crb",
+	.ids = crb_device_ids,
+	.ops = {
+		.add = crb_acpi_add,
+		.remove = crb_acpi_remove,
+	},
+	.drv = {
+		.pm = &crb_pm,
+	},
+};
+
+module_acpi_driver(crb_acpi_driver);
+MODULE_AUTHOR("Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>");
+MODULE_DESCRIPTION("TPM2 Driver");
+MODULE_VERSION("0.1");
+MODULE_LICENSE("GPL");
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 5/7] tpm: TPM 2.0 sysfs attributes
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel, linux-kernel, linux-api, Jarkko Sakkinen
In-Reply-To: <1412701277-27794-1-git-send-email-jarkko.sakkinen@linux.intel.com>

Implemented sysfs attributes for TPM2 devices.

Documentation/ABI/stable/sysfs-class/tpm2 contains descriptions
of these attributes.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 Documentation/ABI/stable/sysfs-class-tpm2 |  69 +++++++
 drivers/char/tpm/Makefile                 |   2 +-
 drivers/char/tpm/tpm-chip.c               |  10 +-
 drivers/char/tpm/tpm.h                    |  24 +++
 drivers/char/tpm/tpm2-sysfs.c             | 314 ++++++++++++++++++++++++++++++
 5 files changed, 416 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-class-tpm2
 create mode 100644 drivers/char/tpm/tpm2-sysfs.c

diff --git a/Documentation/ABI/stable/sysfs-class-tpm2 b/Documentation/ABI/stable/sysfs-class-tpm2
new file mode 100644
index 0000000..e9d4b6a
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-class-tpm2
@@ -0,0 +1,69 @@
+What:		/sys/class/misc/tpmX/device/
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The device/ directory under a specific TPM instance exposes
+		the properties of that TPM chip.
+
+What:		/sys/class/misc/tpmX/device/version
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The "version" property prints the protocol version number
+		in the major.minor format.
+
+What:		/sys/class/misc/tpmX/device/enabled_sh
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The "enabled_sh" property prints a '1' if the Storage Hierarchy
+		is enabled.
+
+What:		/sys/class/misc/tpmX/device/enabled_eh
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The "enabled_eh" property prints a '1' if the Endorsement Hierarchy
+		is enabled.
+
+What:		/sys/class/misc/tpmX/device/owned_sh
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The "owned_sh" property prints a '1' if the ownership of the 
+		Storage Hierarchy has been taken.
+
+What:		/sys/class/misc/tpmX/device/owned_eh
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The "owned_sh" property prints a '1' if the ownership of the
+		Endrosoment Hierarchy has been taken.
+
+What:		/sys/class/misc/tpmX/device/manufacturer
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The "manufacturer" property prints the vendor ID of the TPM
+		manufacturer.
+
+What:		/sys/class/misc/tpmX/device/firmware
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The property prints the vendor-specific value indicating the
+		version of the firmware.
+
+What:		/sys/class/misc/tpmX/device/pcr/sha1/X
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	These files print PCR values for the SHA-1 bank.
+
+What:		/sys/class/misc/tpmX/device/cancel
+Date:		October 2014
+KernelVersion:	3.19
+Contact:	tpmdd-devel@lists.sf.net
+Description:	The "cancel" property allows you to cancel the currently
+		pending TPM command. Writing any value to cancel will call the
+		TPM chip specific cancel operation.
diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index ae56af9..d3cf905 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the kernel tpm device drivers.
 #
 obj-$(CONFIG_TCG_TPM) += tpm.o
-tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o
+tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o tpm2-sysfs.o
 tpm-$(CONFIG_ACPI) += tpm_ppi.o
 
 ifdef CONFIG_ACPI
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index 6cc4cee..5c50fd7 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -130,7 +130,10 @@ int tpm_chip_register(struct tpm_chip *chip)
 	if (rc)
 		return rc;
 
-	rc = tpm_sysfs_add_device(chip);
+	if (chip->flags & TPM_CHIP_FLAG_TPM2)
+		rc = tpm2_sysfs_add_device(chip->dev);
+	else
+		rc = tpm_sysfs_add_device(chip);
 	if (rc)
 		goto del_misc;
 
@@ -171,7 +174,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)
 	synchronize_rcu();
 
 	tpm_dev_del_device(chip);
-	tpm_sysfs_del_device(chip);
+	if (chip->flags & TPM_CHIP_FLAG_TPM2)
+		tpm2_sysfs_del_device(chip->dev);
+	else
+		tpm_sysfs_del_device(chip);
 	tpm_remove_ppi(&chip->dev->kobj);
 	tpm_bios_log_teardown(chip->bios_dir);
 }
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index d141639..4678cdf 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -107,6 +107,24 @@ enum tpm2_capabilities {
 	TPM2_CAP_TPM_PROPERTIES = 6,
 };
 
+enum tpm2_tpm_properties {
+	TPM2_PT_MANUFACTURER		= 0x00000105,
+	TPM2_PT_FIRMWARE_VERSION_1	= 0x00000111,
+	TPM2_PT_FIRMWARE_VERSION_2	= 0x00000111,
+	TPM2_PT_PERMANENT		= 0x00000200,
+	TPM2_PT_STARTUP_CLEAR		= 0x00000201,
+};
+
+enum tpm2_pt_startup_clear {
+	TPM2_PT_SC_SH_ENABLE	= BIT(1),
+	TPM2_PT_SC_EH_ENABLE	= BIT(2),
+};
+
+enum tpm2_pt_permanent {
+	TPM2_PT_PM_OWNER_AUTH_SET	= BIT(0),
+	TPM2_PT_PM_ENDORSEMENT_AUTH_SET	= BIT(1),
+};
+
 enum tpm2_startup_types {
 	TPM2_SU_CLEAR	= 0x0000,
 	TPM2_SU_STATE	= 0x0001,
@@ -165,6 +183,9 @@ struct tpm_chip {
 
 	struct dentry **bios_dir;
 
+	struct kobject *pcrs_kobj;
+	void *sha1_bank;
+
 	struct list_head list;
 };
 
@@ -419,3 +440,6 @@ extern ssize_t tpm2_get_tpm_pt(struct tpm_chip *chip, u32 property_id,
 			       u32* value, const char *desc);
 extern unsigned long tpm2_calc_ordinal_duration(struct tpm_chip *, u32);
 extern int tpm2_do_selftest(struct tpm_chip *chip);
+
+int tpm2_sysfs_add_device(struct device *dev);
+void tpm2_sysfs_del_device(struct device *dev);
diff --git a/drivers/char/tpm/tpm2-sysfs.c b/drivers/char/tpm/tpm2-sysfs.c
new file mode 100644
index 0000000..e6e603e
--- /dev/null
+++ b/drivers/char/tpm/tpm2-sysfs.c
@@ -0,0 +1,314 @@
+/*
+ * Copyright (C) 2014 Intel Corporation
+ * Copyright (C) 2004 IBM Corporation
+ * Copyright (C) 2013 Obsidian Research Corp
+ *
+ * Authors:
+ * Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
+ *
+ * sysfs filesystem inspection interface to the TPM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+#include <linux/device.h>
+#include <linux/slab.h>
+#include "tpm.h"
+
+static ssize_t enabled_sh_show(struct device *dev, struct device_attribute *attr,
+		     char *buf)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+	u32 value;
+	ssize_t rc;
+
+	rc = tpm2_get_tpm_pt(chip, TPM2_PT_STARTUP_CLEAR, &value,
+			     "could not retrieve STARTUP_CLEAR property");
+	if (rc)
+		return 0;
+
+	rc = sprintf(buf, "%d\n", (value & TPM2_PT_SC_SH_ENABLE) > 0);
+	return rc;
+}
+static DEVICE_ATTR_RO(enabled_sh);
+
+static ssize_t enabled_eh_show(struct device *dev, struct device_attribute *attr,
+		     char *buf)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+	u32 value;
+	ssize_t rc;
+
+	rc = tpm2_get_tpm_pt(chip, TPM2_PT_STARTUP_CLEAR, &value,
+			     "could not retrieve STARTUP_CLEAR property");
+	if (rc)
+		return 0;
+
+	rc = sprintf(buf, "%d\n", (value & TPM2_PT_SC_EH_ENABLE) > 0);
+	return rc;
+}
+static DEVICE_ATTR_RO(enabled_eh);
+
+static ssize_t owned_sh_show(struct device *dev, struct device_attribute *attr,
+			  char *buf)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+	u32 value;
+	ssize_t rc;
+
+	rc = tpm2_get_tpm_pt(chip, TPM2_PT_PERMANENT, &value,
+			     "could not retrieve PERMANENT property");
+	if (rc)
+		return 0;
+
+	rc = sprintf(buf, "%d\n", (value & TPM2_PT_PM_OWNER_AUTH_SET) > 0);
+	return rc;
+}
+static DEVICE_ATTR_RO(owned_sh);
+
+static ssize_t owned_eh_show(struct device *dev, struct device_attribute *attr,
+			  char *buf)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+	u32 value;
+	ssize_t rc;
+
+	rc = tpm2_get_tpm_pt(chip, TPM2_PT_PERMANENT, &value,
+			     "could not retrieve PERMANENT property");
+	if (rc)
+		return 0;
+
+	rc = sprintf(buf, "%d\n", (value & TPM2_PT_PM_ENDORSEMENT_AUTH_SET) > 0);
+	return rc;
+}
+static DEVICE_ATTR_RO(owned_eh);
+
+static ssize_t manufacturer_show(struct device *dev,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+	u32 manufacturer;
+	ssize_t rc;
+	char *str = buf;
+
+	rc = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, (u32 *) &manufacturer,
+			     "could not retrieve MANUFACTURER property");
+	if (rc)
+		return 0;
+
+	str += sprintf(str, "0x%08x\n", be32_to_cpu(manufacturer));
+
+	return str - buf;
+}
+static DEVICE_ATTR_RO(manufacturer);
+
+static ssize_t firmware_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+	u32 firmware1;
+	u32 firmware2;
+	ssize_t rc;
+	char *str = buf;
+
+	rc = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_1, (u32 *) &firmware1,
+			     "could not retrieve FIRMWARE_VERSION_1 property");
+	if (rc)
+		return 0;
+
+	rc = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_2, (u32 *) &firmware2,
+			     "could not retrieve FIRMWARE_VERSION_2 property");
+	if (rc)
+		return 0;
+
+	str += sprintf(str, "0x%08x.0x%08x\n", firmware1, firmware2);
+
+	return str - buf;
+}
+static DEVICE_ATTR_RO(firmware);
+
+static ssize_t cancel_store(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count)
+{
+	struct tpm_chip *chip = dev_get_drvdata(dev);
+	if (chip == NULL)
+		return 0;
+
+	chip->ops->cancel(chip);
+	return count;
+}
+static DEVICE_ATTR_WO(cancel);
+
+static ssize_t version_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	char *str = buf;
+
+	str += sprintf(str, "2.0\n");
+
+	return str - buf;
+}
+static DEVICE_ATTR_RO(version);
+
+static struct attribute *tpm_dev_attrs[] = {
+	&dev_attr_enabled_sh.attr,
+	&dev_attr_enabled_eh.attr,
+	&dev_attr_owned_sh.attr,
+	&dev_attr_owned_eh.attr,
+	&dev_attr_manufacturer.attr,
+	&dev_attr_firmware.attr,
+	&dev_attr_cancel.attr,
+	&dev_attr_version.attr,
+	NULL,
+};
+
+static const struct attribute_group tpm_dev_group = {
+	.attrs	= tpm_dev_attrs,
+};
+
+struct pcr_attr {
+	struct attribute attr;
+	unsigned int index;
+	char name[3];
+};
+
+struct pcr_bank {
+	struct kobject kobj;
+	struct kobj_type ktype;
+	struct device *dev;
+	struct pcr_attr pcr_attrs[TPM2_PLATFORM_PCR];
+	struct attribute *attrs[TPM2_PLATFORM_PCR + 1];
+};
+
+static ssize_t pcr_bank_attr_show(struct kobject *kobj,
+				  struct attribute *attr,
+				  char *buf)
+{
+	u8 digest[TPM_DIGEST_SIZE];
+	ssize_t rc;
+	int i;
+	char *str = buf;
+	struct tpm_chip *chip;
+	struct pcr_attr *pcr_attr;
+	struct pcr_bank *pcr_bank;
+
+	pcr_attr = container_of(attr, struct pcr_attr, attr);
+	pcr_bank = container_of(kobj, struct pcr_bank, kobj);
+	chip = dev_get_drvdata(pcr_bank->dev);
+
+	rc = tpm2_pcr_read(chip, pcr_attr->index, digest);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < TPM_DIGEST_SIZE; i++)
+		str += sprintf(str, "%02X", digest[i]);
+
+	str += sprintf(str, "\n");
+
+	return str - buf;
+}
+
+static void pcr_bank_release(struct kobject *kobj)
+{
+	struct pcr_bank *pcr_bank;
+	pcr_bank = container_of(kobj, struct pcr_bank, kobj);
+	kfree(pcr_bank);
+}
+
+static const struct sysfs_ops pcr_bank_sysfs_ops = {
+	.show		= pcr_bank_attr_show,
+};
+
+static struct pcr_bank *pcr_bank_create(struct device *dev,
+					struct kobject *parent_kobj)
+{
+	struct pcr_bank *pcr_bank;
+	struct pcr_attr *pcr_attr;
+	struct attribute *attr;
+	int i;
+	int rc;
+
+	pcr_bank = kzalloc(sizeof(*pcr_bank), GFP_KERNEL);
+	if (!pcr_bank)
+		return NULL;
+
+	pcr_bank->dev			= dev;
+	pcr_bank->ktype.sysfs_ops	= &pcr_bank_sysfs_ops;
+	pcr_bank->ktype.default_attrs	= pcr_bank->attrs;
+	pcr_bank->ktype.release		= pcr_bank_release;
+
+	for (i = 0; i < TPM2_PLATFORM_PCR; i++) {
+		pcr_attr = &pcr_bank->pcr_attrs[i];
+		pcr_attr->index = i;
+		sprintf(pcr_attr->name, "%d", i);
+
+		attr = &pcr_attr->attr;
+		attr->name = pcr_attr->name;
+		attr->mode = S_IRUGO;
+
+		pcr_bank->attrs[i] = attr;
+	}
+
+	pcr_bank->attrs[i] = NULL;
+
+	rc = kobject_init_and_add(&pcr_bank->kobj, &pcr_bank->ktype,
+				  parent_kobj, "sha1");
+	if (rc) {
+		kfree(pcr_bank);
+		return NULL;
+	}
+
+	return pcr_bank;
+}
+
+int tpm2_sysfs_add_device(struct device *dev)
+{
+	struct tpm_chip *chip;
+	struct pcr_bank *pcr_bank;
+	struct kobject *pcrs_kobj;
+	int rc;
+
+	rc = sysfs_create_group(&dev->kobj, &tpm_dev_group);
+	if (rc) {
+		dev_err(dev, "failed to create sysfs attributes, %d\n", rc);
+		return rc;
+	}
+
+	pcrs_kobj = kobject_create_and_add("pcrs", &dev->kobj);
+	if (!pcrs_kobj) {
+		sysfs_remove_group(&dev->kobj, &tpm_dev_group);
+		dev_err(dev, "failed to create sysfs attributes, %d\n", rc);
+		return -ENOMEM;
+	}
+
+	pcr_bank = pcr_bank_create(dev, pcrs_kobj);
+	if (!pcr_bank) {
+		kobject_put(pcrs_kobj);
+		sysfs_remove_group(&dev->kobj, &tpm_dev_group);
+		dev_err(dev, "failed to create sysfs attributes, %d\n", rc);
+		return -ENOMEM;
+	}
+
+	kobject_uevent(pcrs_kobj, KOBJ_ADD);
+
+	chip = dev_get_drvdata(dev);
+	chip->pcrs_kobj = pcrs_kobj;
+	chip->sha1_bank = &pcr_bank->kobj;
+
+	return 0;
+}
+
+void tpm2_sysfs_del_device(struct device *dev)
+{
+	struct tpm_chip *chip;
+
+	chip = dev_get_drvdata(dev);
+
+	kobject_put(chip->sha1_bank);
+	kobject_put(chip->pcrs_kobj);
+	sysfs_remove_group(&dev->kobj, &tpm_dev_group);
+}
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 4/7] tpm: TPM 2.0 commands
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel, linux-kernel, linux-api, Jarkko Sakkinen,
	Will Arthur
In-Reply-To: <1412701277-27794-1-git-send-email-jarkko.sakkinen@linux.intel.com>

This patch contains the following internal helper functions for
tpm.ko:

- tpm2_get_random()
- tpm2_get_tpm_pt()
- tpm2_pcr_extend()
- tpm2_pcr_read()
- tpm2_startup()

and the following exported functions for implementing new device
drivers:

- tpm2_do_selftest()
- tpm2_calc_ordinal_durations()
- tpm2_gen_interrupt()

Functions that are defined in include/linux/tpm.h and tpm_transmit()
use TPM2 commands for TPM2 chips.

Added a new field "flags" to struct tpm_chip and enum tpm_chip_flags.
TPM2 chips can set TPM_CHIP_FLAG_TPM2 after tpm_chip_alloc() to
enable TPM2 commands.

The code for tpm2_calc_ordinal_duration() and tpm2_startup() is
derived from the code originally made by Will Arthur.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Will Arthur <will.c.arthur@intel.com>
---
 drivers/char/tpm/Makefile        |   2 +-
 drivers/char/tpm/tpm-interface.c |  24 +-
 drivers/char/tpm/tpm.h           |  66 +++++
 drivers/char/tpm/tpm2-cmd.c      | 543 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 631 insertions(+), 4 deletions(-)
 create mode 100644 drivers/char/tpm/tpm2-cmd.c

diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index 837da04..ae56af9 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the kernel tpm device drivers.
 #
 obj-$(CONFIG_TCG_TPM) += tpm.o
-tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o
+tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o
 tpm-$(CONFIG_ACPI) += tpm_ppi.o
 
 ifdef CONFIG_ACPI
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 1ce3ad3..ec28b5f 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -360,7 +360,10 @@ static ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf,
 	if (chip->vendor.irq)
 		goto out_recv;
 
-	stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
+	if (chip->flags & TPM_CHIP_FLAG_TPM2)
+		stop = jiffies + tpm2_calc_ordinal_duration(chip, ordinal);
+	else
+		stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
 	do {
 		u8 status = chip->ops->status(chip);
 		if ((status & chip->ops->req_complete_mask) ==
@@ -482,7 +485,7 @@ static const struct tpm_input_header tpm_startup_header = {
 static int tpm_startup(struct tpm_chip *chip, __be16 startup_type)
 {
 	struct tpm_cmd_t start_cmd;
-	start_cmd.header.in = tpm_startup_header;
+
 	start_cmd.params.startup_in.startup_type = startup_type;
 	return tpm_transmit_cmd(chip, &start_cmd, TPM_INTERNAL_RESULT_SIZE,
 				"attempting to start the TPM");
@@ -679,7 +682,10 @@ int tpm_pcr_read(u32 chip_num, int pcr_idx, u8 *res_buf)
 	chip = tpm_chip_find_get(chip_num);
 	if (chip == NULL)
 		return -ENODEV;
-	rc = tpm_pcr_read_dev(chip, pcr_idx, res_buf);
+	if (chip->flags & TPM_CHIP_FLAG_TPM2)
+		rc = tpm2_pcr_read(chip, pcr_idx, res_buf);
+	else
+		rc = tpm_pcr_read_dev(chip, pcr_idx, res_buf);
 	tpm_chip_put(chip);
 	return rc;
 }
@@ -713,6 +719,12 @@ int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash)
 	if (chip == NULL)
 		return -ENODEV;
 
+	if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+		rc = tpm2_pcr_extend(chip, pcr_idx, hash);
+		tpm_chip_put(chip);
+		return rc;
+	}
+
 	cmd.header.in = pcrextend_header;
 	cmd.params.pcrextend_in.pcr_idx = cpu_to_be32(pcr_idx);
 	memcpy(cmd.params.pcrextend_in.hash, hash, TPM_DIGEST_SIZE);
@@ -986,6 +998,12 @@ int tpm_get_random(u32 chip_num, u8 *out, size_t max)
 	if (chip == NULL)
 		return -ENODEV;
 
+	if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+		err = tpm2_get_random(chip, out, max);
+		tpm_chip_put(chip);
+		return err;
+	}
+
 	do {
 		tpm_cmd.header.in = tpm_getrandom_header;
 		tpm_cmd.params.getrandom_in.num_bytes = cpu_to_be32(num_bytes);
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 5eb89897..d141639 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -61,6 +61,57 @@ enum tpm_duration {
 #define TPM_ERR_INVALID_POSTINIT 38
 
 #define TPM_HEADER_SIZE		10
+
+enum tpm2_const {
+	TPM2_PLATFORM_PCR	= 24,
+	TPM2_PCR_SELECT_MIN	= ((TPM2_PLATFORM_PCR + 7) / 8),
+	TPM2_TIMEOUT_A		= 750 * 1000,
+	TPM2_TIMEOUT_B		= 2000 * 1000,
+	TPM2_TIMEOUT_C		= 200 * 1000,
+	TPM2_TIMEOUT_D		= 30 * 1000,
+	TPM2_DURATION_SHORT	= 20 * 1000,
+	TPM2_DURATION_MEDIUM	= 750 * 1000,
+	TPM2_DURATION_LONG	= 2000 * 1000,
+};
+
+enum tpm2_structures {
+	TPM2_ST_NO_SESSIONS	= 0x8001,
+	TPM2_ST_SESSIONS	= 0x8002,
+};
+
+enum tpm2_return_codes {
+	TPM2_RC_TESTING		= 0x090A,
+	TPM2_RC_DISABLED	= 0x0120,
+};
+
+enum tpm2_algorithms {
+	TPM2_ALG_SHA1		= 0x0004,
+};
+
+enum tpm2_command_codes {
+	TPM2_CC_FIRST		= 0x011F,
+	TPM2_CC_SELF_TEST	= 0x0143,
+	TPM2_CC_STARTUP		= 0x0144,
+	TPM2_CC_GET_CAPABILITY	= 0x017A,
+	TPM2_CC_GET_RANDOM	= 0x017B,
+	TPM2_CC_PCR_READ	= 0x017E,
+	TPM2_CC_PCR_EXTEND	= 0x0182,
+	TPM2_CC_LAST		= 0x018F,
+};
+
+enum tpm2_permanent_handles {
+	TPM2_RS_PW		= 0x40000009,
+};
+
+enum tpm2_capabilities {
+	TPM2_CAP_TPM_PROPERTIES = 6,
+};
+
+enum tpm2_startup_types {
+	TPM2_SU_CLEAR	= 0x0000,
+	TPM2_SU_STATE	= 0x0001,
+};
+
 struct tpm_chip;
 
 struct tpm_vendor_specific {
@@ -94,9 +145,14 @@ struct tpm_vendor_specific {
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
 
+enum tpm_chip_flags {
+	TPM_CHIP_FLAG_TPM2	= BIT(0),
+};
+
 struct tpm_chip {
 	struct device *dev;	/* Device stuff */
 	const struct tpm_class_ops *ops;
+	unsigned int flags;
 
 	int dev_num;		/* /dev/tpm# */
 	char devname[7];
@@ -353,3 +409,13 @@ static inline void tpm_remove_ppi(struct kobject *parent)
 {
 }
 #endif
+
+int tpm2_startup(struct tpm_chip *chip, __be16 startup_type);
+int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf);
+int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, const u8 *hash);
+int tpm2_get_random(struct tpm_chip *chip, u8 *out, size_t max);
+
+extern ssize_t tpm2_get_tpm_pt(struct tpm_chip *chip, u32 property_id,
+			       u32* value, const char *desc);
+extern unsigned long tpm2_calc_ordinal_duration(struct tpm_chip *, u32);
+extern int tpm2_do_selftest(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
new file mode 100644
index 0000000..e0c455d
--- /dev/null
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -0,0 +1,543 @@
+/*
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * Authors:
+ * Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
+ *
+ * Maintained by: <tpmdd-devel@lists.sourceforge.net>
+ *
+ * This file contains TPM2 protocol implementations of the commands
+ * used by the kernel internally.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include "tpm.h"
+
+struct tpm2_startup_in {
+	__be16	startup_type;
+} __packed;
+
+struct tpm2_self_test_in {
+	u8	full_test;
+} __packed;
+
+struct tpm2_pcr_read_in {
+	__be32	pcr_selects_cnt;
+	__be16	hash_alg;
+	u8	pcr_select_size;
+	u8	pcr_select[TPM2_PCR_SELECT_MIN];
+} __packed;
+
+struct tpm2_pcr_read_out {
+	__be32	update_cnt;
+	__be32	pcr_selects_cnt;
+	__be16	hash_alg;
+	u8	pcr_select_size;
+	u8	pcr_select[TPM2_PCR_SELECT_MIN];
+	__be32	digests_cnt;
+	__be16	digest_size;
+	u8	digest[TPM_DIGEST_SIZE];
+} __packed;
+
+struct tpm2_null_auth_area {
+	__be32			handle;
+	__be16			nonce_size;
+	u8			attributes;
+	__be16			auth_size;
+} __packed;
+
+struct tpm2_pcr_extend_in {
+	__be32				pcr_idx;
+	__be32				auth_area_size;
+	struct tpm2_null_auth_area	auth_area;
+	__be32				digest_cnt;
+	__be16				hash_alg;
+	u8				digest[TPM_DIGEST_SIZE];
+} __packed;
+
+struct tpm2_get_tpm_pt_in {
+	__be32	cap_id;
+	__be32	property_id;
+	__be32	property_cnt;
+} __packed;
+
+struct tpm2_get_tpm_pt_out {
+	u8	more_data;
+	__be32	subcap_id;
+	__be32	property_cnt;
+	__be32	property_id;
+	__be32	value;
+} __packed;
+
+struct tpm2_get_random_in {
+	__be16	size;
+} __packed;
+
+struct tpm2_get_random_out {
+	__be16	size;
+	u8	buffer[TPM_MAX_RNG_DATA];
+} __packed;
+
+union tpm2_cmd_params {
+	struct	tpm2_startup_in		startup_in;
+	struct	tpm2_self_test_in	selftest_in;
+	struct	tpm2_pcr_read_in	pcrread_in;
+	struct	tpm2_pcr_read_out	pcrread_out;
+	struct	tpm2_pcr_extend_in	pcrextend_in;
+	struct	tpm2_get_tpm_pt_in	get_tpm_pt_in;
+	struct	tpm2_get_tpm_pt_out	get_tpm_pt_out;
+	struct	tpm2_get_random_in	getrandom_in;
+	struct	tpm2_get_random_out	getrandom_out;
+};
+
+struct tpm2_cmd {
+	tpm_cmd_header		header;
+	union tpm2_cmd_params	params;
+} __packed;
+
+/*
+ * Array with one entry per ordinal defining the maximum amount
+ * of time the chip could take to return the result. The values
+ * of the SHORT, MEDIUM, and LONG durations are taken from the
+ * PC Client Profile (PTP) specification.
+ */
+static const u8 tpm2_ordinal_duration[TPM2_CC_LAST - TPM2_CC_FIRST + 1] = {
+	TPM_UNDEFINED,		/* 11F */
+	TPM_UNDEFINED,		/* 120 */
+	TPM_LONG,		/* 121 */
+	TPM_UNDEFINED,		/* 122 */
+	TPM_UNDEFINED,		/* 123 */
+	TPM_UNDEFINED,		/* 124 */
+	TPM_UNDEFINED,		/* 125 */
+	TPM_UNDEFINED,		/* 126 */
+	TPM_UNDEFINED,		/* 127 */
+	TPM_UNDEFINED,		/* 128 */
+	TPM_LONG,		/* 129 */
+	TPM_UNDEFINED,		/* 12a */
+	TPM_UNDEFINED,		/* 12b */
+	TPM_UNDEFINED,		/* 12c */
+	TPM_UNDEFINED,		/* 12d */
+	TPM_UNDEFINED,		/* 12e */
+	TPM_UNDEFINED,		/* 12f */
+	TPM_UNDEFINED,		/* 130 */
+	TPM_UNDEFINED,		/* 131 */
+	TPM_UNDEFINED,		/* 132 */
+	TPM_UNDEFINED,		/* 133 */
+	TPM_UNDEFINED,		/* 134 */
+	TPM_UNDEFINED,		/* 135 */
+	TPM_UNDEFINED,		/* 136 */
+	TPM_UNDEFINED,		/* 137 */
+	TPM_UNDEFINED,		/* 138 */
+	TPM_UNDEFINED,		/* 139 */
+	TPM_UNDEFINED,		/* 13a */
+	TPM_UNDEFINED,		/* 13b */
+	TPM_UNDEFINED,		/* 13c */
+	TPM_UNDEFINED,		/* 13d */
+	TPM_MEDIUM,		/* 13e */
+	TPM_UNDEFINED,		/* 13f */
+	TPM_UNDEFINED,		/* 140 */
+	TPM_UNDEFINED,		/* 141 */
+	TPM_UNDEFINED,		/* 142 */
+	TPM_LONG,		/* 143 */
+	TPM_MEDIUM,		/* 144 */
+	TPM_UNDEFINED,		/* 145 */
+	TPM_UNDEFINED,		/* 146 */
+	TPM_UNDEFINED,		/* 147 */
+	TPM_UNDEFINED,		/* 148 */
+	TPM_UNDEFINED,		/* 149 */
+	TPM_UNDEFINED,		/* 14a */
+	TPM_UNDEFINED,		/* 14b */
+	TPM_UNDEFINED,		/* 14c */
+	TPM_UNDEFINED,		/* 14d */
+	TPM_LONG,		/* 14e */
+	TPM_UNDEFINED,		/* 14f */
+	TPM_UNDEFINED,		/* 150 */
+	TPM_UNDEFINED,		/* 151 */
+	TPM_UNDEFINED,		/* 152 */
+	TPM_UNDEFINED,		/* 153 */
+	TPM_UNDEFINED,		/* 154 */
+	TPM_UNDEFINED,		/* 155 */
+	TPM_UNDEFINED,		/* 156 */
+	TPM_UNDEFINED,		/* 157 */
+	TPM_UNDEFINED,		/* 158 */
+	TPM_UNDEFINED,		/* 159 */
+	TPM_UNDEFINED,		/* 15a */
+	TPM_UNDEFINED,		/* 15b */
+	TPM_MEDIUM,		/* 15c */
+	TPM_UNDEFINED,		/* 15d */
+	TPM_UNDEFINED,		/* 15e */
+	TPM_UNDEFINED,		/* 15f */
+	TPM_UNDEFINED,		/* 160 */
+	TPM_UNDEFINED,		/* 161 */
+	TPM_UNDEFINED,		/* 162 */
+	TPM_UNDEFINED,		/* 163 */
+	TPM_UNDEFINED,		/* 164 */
+	TPM_UNDEFINED,		/* 165 */
+	TPM_UNDEFINED,		/* 166 */
+	TPM_UNDEFINED,		/* 167 */
+	TPM_UNDEFINED,		/* 168 */
+	TPM_UNDEFINED,		/* 169 */
+	TPM_UNDEFINED,		/* 16a */
+	TPM_UNDEFINED,		/* 16b */
+	TPM_UNDEFINED,		/* 16c */
+	TPM_UNDEFINED,		/* 16d */
+	TPM_UNDEFINED,		/* 16e */
+	TPM_UNDEFINED,		/* 16f */
+	TPM_UNDEFINED,		/* 170 */
+	TPM_UNDEFINED,		/* 171 */
+	TPM_UNDEFINED,		/* 172 */
+	TPM_UNDEFINED,		/* 173 */
+	TPM_UNDEFINED,		/* 174 */
+	TPM_UNDEFINED,		/* 175 */
+	TPM_UNDEFINED,		/* 176 */
+	TPM_LONG,		/* 177 */
+	TPM_UNDEFINED,		/* 178 */
+	TPM_UNDEFINED,		/* 179 */
+	TPM_MEDIUM,		/* 17a */
+	TPM_LONG,		/* 17b */
+	TPM_UNDEFINED,		/* 17c */
+	TPM_UNDEFINED,		/* 17d */
+	TPM_UNDEFINED,		/* 17e */
+	TPM_UNDEFINED,		/* 17f */
+	TPM_UNDEFINED,		/* 180 */
+	TPM_UNDEFINED,		/* 181 */
+	TPM_MEDIUM,		/* 182 */
+	TPM_UNDEFINED,		/* 183 */
+	TPM_UNDEFINED,		/* 184 */
+	TPM_MEDIUM,		/* 185 */
+	TPM_MEDIUM,		/* 186 */
+	TPM_UNDEFINED,		/* 187 */
+	TPM_UNDEFINED,		/* 188 */
+	TPM_UNDEFINED,		/* 189 */
+	TPM_UNDEFINED,		/* 18a */
+	TPM_UNDEFINED,		/* 18b */
+	TPM_UNDEFINED,		/* 18c */
+	TPM_UNDEFINED,		/* 18d */
+	TPM_UNDEFINED,		/* 18e */
+	TPM_UNDEFINED		/* 18f */
+};
+
+static const struct tpm_input_header tpm2_startup_header = {
+	.tag = cpu_to_be16(TPM2_ST_NO_SESSIONS),
+	.length = cpu_to_be32(12),
+	.ordinal = cpu_to_be32(TPM2_CC_STARTUP)
+};
+
+/**
+ * tpm2_startup() - send startup command to the TPM chip
+ * @chip:		TPM chip to use.
+ * @startup_type	startup type. The value is either
+ * 			TPM_SU_CLEAR or TPM_SU_STATE.
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+int tpm2_startup(struct tpm_chip *chip, __be16 startup_type)
+{
+	struct tpm2_cmd cmd;
+
+	cmd.header.in = tpm2_startup_header;
+
+	cmd.params.startup_in.startup_type = startup_type;
+	return tpm_transmit_cmd(chip, &cmd, sizeof(cmd),
+				"attempting to start the TPM");
+}
+
+#define TPM2_PCR_READ_IN_SIZE \
+	(sizeof(struct tpm_input_header) + \
+	 sizeof(struct tpm2_pcr_read_in))
+
+static const struct tpm_input_header tpm2_pcrread_header = {
+	.tag = cpu_to_be16(TPM2_ST_NO_SESSIONS),
+	.length = cpu_to_be32(TPM2_PCR_READ_IN_SIZE),
+	.ordinal = cpu_to_be32(TPM2_CC_PCR_READ)
+};
+
+/**
+ * tpm2_pcr_read() - read a PCR value
+ * @chip:	TPM chip to use.
+ * @pcr_idx:	index of the PCR to read.
+ * @ref_buf:	buffer to store the resulting hash,
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf)
+{
+	int rc;
+	struct tpm2_cmd cmd;
+	u8 *buf;
+	int i, j;
+
+	if (pcr_idx >= TPM2_PLATFORM_PCR)
+		return -EINVAL;
+
+	cmd.header.in = tpm2_pcrread_header;
+	cmd.params.pcrread_in.pcr_selects_cnt = cpu_to_be32(1);
+	cmd.params.pcrread_in.hash_alg = cpu_to_be16(TPM2_ALG_SHA1);
+	cmd.params.pcrread_in.pcr_select_size = TPM2_PCR_SELECT_MIN;
+
+	for (i = 0; i < TPM2_PCR_SELECT_MIN; i++) {
+		j = pcr_idx - i * 8;
+
+		cmd.params.pcrread_in.pcr_select[i] =
+			(j >= 0 && j < 8) ? 1 << j : 0;
+	}
+
+	rc = tpm_transmit_cmd(chip, &cmd, sizeof(cmd),
+			      "attempting to read a pcr value");
+
+	if (rc == 0) {
+		buf = cmd.params.pcrread_out.digest;
+		memcpy(res_buf, buf, TPM_DIGEST_SIZE);
+	}
+
+	return rc;
+}
+
+/**
+ * tpm2_pcr_extend() - extend a PCR value
+ * @chip:	TPM chip to use.
+ * @pcr_idx:	index of the PCR.
+ * @hash:	hash value to use for the extend operation.
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+static const struct tpm_input_header tpm2_pcrextend_header = {
+	.tag = cpu_to_be16(TPM2_ST_SESSIONS),
+	.length = cpu_to_be32(sizeof(struct tpm_input_header) +
+			      sizeof(struct tpm2_pcr_extend_in)),
+	.ordinal = cpu_to_be32(TPM2_CC_PCR_EXTEND)
+};
+
+int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, const u8 *hash)
+{
+	struct tpm2_cmd cmd;
+	int rc;
+
+	cmd.header.in = tpm2_pcrextend_header;
+	cmd.params.pcrextend_in.pcr_idx = cpu_to_be32(pcr_idx);
+	cmd.params.pcrextend_in.auth_area_size =
+		cpu_to_be32(sizeof(struct tpm2_null_auth_area));
+	cmd.params.pcrextend_in.auth_area.handle =
+		cpu_to_be32(TPM2_RS_PW);
+	cmd.params.pcrextend_in.auth_area.nonce_size = 0;
+	cmd.params.pcrextend_in.auth_area.attributes = 0;
+	cmd.params.pcrextend_in.auth_area.auth_size = 0;
+	cmd.params.pcrextend_in.digest_cnt = cpu_to_be32(1);
+	cmd.params.pcrextend_in.hash_alg = cpu_to_be16(TPM2_ALG_SHA1);
+	memcpy(cmd.params.pcrextend_in.digest, hash, TPM_DIGEST_SIZE);
+
+	rc = tpm_transmit_cmd(chip, &cmd, sizeof(cmd),
+			      "attempting extend a PCR value");
+
+	return rc;
+}
+
+static const struct tpm_input_header tpm2_getrandom_header = {
+	.tag = cpu_to_be16(TPM2_ST_NO_SESSIONS),
+	.length = cpu_to_be32(sizeof(struct tpm_input_header) +
+			      sizeof(struct tpm2_get_random_in)),
+	.ordinal = cpu_to_be32(TPM2_CC_GET_RANDOM)
+};
+
+/**
+ * tpm2_get_random() - get random bytes from the TPM RNG
+ * @chip: TPM chip to use
+ * @out: destination buffer for the random bytes
+ * @max: the max number of bytes to write to @out
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+int tpm2_get_random(struct tpm_chip *chip, u8 *out, size_t max)
+{
+	struct tpm2_cmd cmd;
+	u32 recd, num_bytes = min_t(u32, max, TPM_MAX_RNG_DATA);
+	int err, total = 0, retries = 5;
+	u8 *dest = out;
+
+	if (!out || !num_bytes || max > TPM_MAX_RNG_DATA)
+		return -EINVAL;
+
+	do {
+		cmd.header.in = tpm2_getrandom_header;
+		cmd.params.getrandom_in.size = cpu_to_be16(num_bytes);
+
+		err = tpm_transmit_cmd(chip, &cmd, sizeof(cmd),
+				       "attempting get random");
+		if (err)
+			break;
+
+		recd = be16_to_cpu(cmd.params.getrandom_out.size);
+		memcpy(dest, cmd.params.getrandom_out.buffer, recd);
+
+		dest += recd;
+		total += recd;
+		num_bytes -= recd;
+	} while (retries-- && total < max);
+
+	return total ? total : -EIO;
+}
+
+#define TPM2_GET_TPM_PT_IN_SIZE \
+	(sizeof(struct tpm_input_header) + \
+	 sizeof(struct tpm2_get_tpm_pt_in))
+
+static const struct tpm_input_header tpm2_get_tpm_pt_header = {
+	.tag = cpu_to_be16(TPM2_ST_NO_SESSIONS),
+	.length = cpu_to_be32(TPM2_GET_TPM_PT_IN_SIZE),
+	.ordinal = cpu_to_be32(TPM2_CC_GET_CAPABILITY)
+};
+
+/**
+ * tpm2_get_tpm_pt() - get value of a TPM_CAP_TPM_PROPERTIES type property
+ * @chip:		TPM chip to use.
+ * @property_id:	property ID.
+ * @value:		output variable.
+ * @desc:		passed to tpm_transmit_cmd()
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+ssize_t tpm2_get_tpm_pt(struct tpm_chip *chip, u32 property_id,  u32* value,
+			const char *desc)
+{
+	struct tpm2_cmd cmd;
+	int rc;
+
+	cmd.header.in = tpm2_get_tpm_pt_header;
+	cmd.params.get_tpm_pt_in.cap_id = cpu_to_be32(TPM2_CAP_TPM_PROPERTIES);
+	cmd.params.get_tpm_pt_in.property_id = property_id;
+	cmd.params.get_tpm_pt_in.property_cnt = cpu_to_be32(1);
+
+	rc = tpm_transmit_cmd(chip, &cmd, sizeof(cmd), desc);
+	if (!rc)
+		*value = cmd.params.get_tpm_pt_out.value;
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(tpm2_get_tpm_pt);
+
+/*
+ * tpm2_calc_ordinal_duration() - maximum duration for a command
+ * @chip: 	TPM chip to use.
+ * @ordinal:	command code number.
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+unsigned long tpm2_calc_ordinal_duration(struct tpm_chip *chip, u32 ordinal)
+{
+	int index = TPM_UNDEFINED;
+	int duration = 0;
+
+	if (ordinal >= TPM2_CC_FIRST && ordinal <= TPM2_CC_LAST)
+		index = tpm2_ordinal_duration[ordinal - TPM2_CC_FIRST];
+
+	if (index != TPM_UNDEFINED)
+		duration = chip->vendor.duration[index];
+	if (duration <= 0)
+		return 2 * 60 * HZ;
+	else
+		return duration;
+}
+EXPORT_SYMBOL_GPL(tpm2_calc_ordinal_duration);
+
+static const struct tpm_input_header tpm2_selftest_header = {
+	.tag = cpu_to_be16(TPM2_ST_NO_SESSIONS),
+	.length = cpu_to_be32(sizeof(struct tpm_input_header) +
+			      sizeof(struct tpm2_self_test_in)),
+	.ordinal = cpu_to_be32(TPM2_CC_SELF_TEST)
+};
+
+#define TPM2_SELF_TEST_IN_SIZE \
+	(sizeof(struct tpm_input_header) + sizeof(struct tpm2_self_test_in))
+
+/**
+ * tpm2_continue_selftest() - start a self test
+ * @chip: TPM chip to use
+ * @full: test all commands instead of testing only those that were not
+ *        previously tested.
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+static int tpm2_start_selftest(struct tpm_chip *chip, bool full)
+{
+	int rc;
+	struct tpm2_cmd cmd;
+
+	cmd.header.in = tpm2_selftest_header;
+	cmd.params.selftest_in.full_test = full;
+
+	rc = tpm_transmit_cmd(chip, &cmd, TPM2_SELF_TEST_IN_SIZE,
+			      "continue selftest");
+
+	return rc;
+}
+
+/**
+ * tpm2_do_selftest() - run a full self test
+ * @chip: TPM chip to use
+ *
+ * During the self test TPM2 commands return with the error code RC_TESTING.
+ * Waiting is done by issuing PCR read until it executes succesfully.
+ *
+ * 0 is returned when the operation is succesful. When a negative number is
+ * returned it remarks a POSIX error code. When a positive number is returned
+ * it remarks a TPM error.
+ */
+int tpm2_do_selftest(struct tpm_chip *chip)
+{
+	int rc;
+	unsigned int loops;
+	unsigned int delay_msec = 100;
+	unsigned long duration;
+	struct tpm2_cmd cmd;
+	int i;
+
+	duration = tpm2_calc_ordinal_duration(chip, TPM2_CC_SELF_TEST);
+
+	loops = jiffies_to_msecs(duration) / delay_msec;
+
+	rc = tpm2_start_selftest(chip, true);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < loops; i++) {
+		/* Attempt to read a PCR value */
+		cmd.header.in = tpm2_pcrread_header;
+		cmd.params.pcrread_in.pcr_selects_cnt = cpu_to_be32(1);
+		cmd.params.pcrread_in.hash_alg = cpu_to_be16(TPM2_ALG_SHA1);
+		cmd.params.pcrread_in.pcr_select_size = TPM2_PCR_SELECT_MIN;
+		cmd.params.pcrread_in.pcr_select[0] = 0x01;
+		cmd.params.pcrread_in.pcr_select[1] = 0x00;
+		cmd.params.pcrread_in.pcr_select[2] = 0x00;
+
+		rc = tpm_transmit_cmd(chip, (u8 *) &cmd, sizeof(cmd), NULL);
+		if (rc < 0)
+			break;
+
+		rc = be32_to_cpu(cmd.header.out.return_code);
+		if (rc != TPM2_RC_TESTING)
+			break;
+
+		msleep(delay_msec);
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(tpm2_do_selftest);
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 3/7] tpm: clean up tpm_tis driver life-cycle
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel, linux-kernel, linux-api, Jarkko Sakkinen
In-Reply-To: <1412701277-27794-1-git-send-email-jarkko.sakkinen@linux.intel.com>

Updated tpm_tis to properly use tpm-chip API instead of using ad hoc
methods. tpm_chip_unregister() is called on remove event when PNP driver
is used and on module removal when platform driver is used.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 drivers/char/tpm/tpm_tis.c | 64 ++++++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 36 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index a2780df..df04ce6 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -75,9 +75,6 @@ enum tis_defaults {
 #define	TPM_DID_VID(l)			(0x0F00 | ((l) << 12))
 #define	TPM_RID(l)			(0x0F04 | ((l) << 12))
 
-static LIST_HEAD(tis_chips);
-static DEFINE_MUTEX(tis_lock);
-
 #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
 static int is_itpm(struct pnp_dev *dev)
 {
@@ -535,14 +532,17 @@ static int tpm_tis_init(struct device *dev, resource_size_t start,
 	int rc, i, irq_s, irq_e, probe;
 	struct tpm_chip *chip;
 
-	if (!(chip = tpm_register_hardware(dev, &tpm_tis)))
+	chip = tpm_chip_alloc(dev, &tpm_tis);
+	if (!chip)
 		return -ENODEV;
 
-	chip->vendor.iobase = ioremap(start, len);
-	if (!chip->vendor.iobase) {
-		rc = -EIO;
-		goto out_err;
-	}
+	chip->vendor.iobase = devm_ioremap(dev, start, len);
+	if (!chip->vendor.iobase)
+		return -EIO;
+
+	rc = tpm_chip_register(chip);
+	if (rc)
+		return -ENODEV;
 
 	/* Default timeouts */
 	chip->vendor.timeout_a = msecs_to_jiffies(TIS_SHORT_TIMEOUT);
@@ -720,16 +720,10 @@ static int tpm_tis_init(struct device *dev, resource_size_t start,
 	}
 
 	INIT_LIST_HEAD(&chip->vendor.list);
-	mutex_lock(&tis_lock);
-	list_add(&chip->vendor.list, &tis_chips);
-	mutex_unlock(&tis_lock);
-
 
 	return 0;
 out_err:
-	if (chip->vendor.iobase)
-		iounmap(chip->vendor.iobase);
-	tpm_remove_hardware(chip->dev);
+	tpm_chip_unregister(chip);
 	return rc;
 }
 
@@ -808,13 +802,27 @@ static struct pnp_device_id tpm_pnp_tbl[] = {
 };
 MODULE_DEVICE_TABLE(pnp, tpm_pnp_tbl);
 
+static void tpm_tis_chip_remove(struct tpm_chip *chip)
+{
+	iowrite32(~TPM_GLOBAL_INT_ENABLE &
+		  ioread32(chip->vendor.iobase +
+			   TPM_INT_ENABLE(chip->vendor.
+					  locality)),
+		  chip->vendor.iobase +
+		  TPM_INT_ENABLE(chip->vendor.locality));
+	release_locality(chip, chip->vendor.locality, 1);
+	if (chip->vendor.irq)
+		free_irq(chip->vendor.irq, chip);
+
+	tpm_chip_unregister(chip);
+}
+
 static void tpm_tis_pnp_remove(struct pnp_dev *dev)
 {
 	struct tpm_chip *chip = pnp_get_drvdata(dev);
-	tpm_remove_hardware(chip->dev);
+	tpm_tis_chip_remove(chip);
 }
 
-
 static struct pnp_driver tis_pnp_driver = {
 	.name = "tpm_tis",
 	.id_table = tpm_pnp_tbl,
@@ -873,31 +881,15 @@ err_dev:
 
 static void __exit cleanup_tis(void)
 {
-	struct tpm_vendor_specific *i, *j;
 	struct tpm_chip *chip;
-	mutex_lock(&tis_lock);
-	list_for_each_entry_safe(i, j, &tis_chips, list) {
-		chip = to_tpm_chip(i);
-		tpm_remove_hardware(chip->dev);
-		iowrite32(~TPM_GLOBAL_INT_ENABLE &
-			  ioread32(chip->vendor.iobase +
-				   TPM_INT_ENABLE(chip->vendor.
-						  locality)),
-			  chip->vendor.iobase +
-			  TPM_INT_ENABLE(chip->vendor.locality));
-		release_locality(chip, chip->vendor.locality, 1);
-		if (chip->vendor.irq)
-			free_irq(chip->vendor.irq, chip);
-		iounmap(i->iobase);
-		list_del(&i->list);
-	}
-	mutex_unlock(&tis_lock);
 #ifdef CONFIG_PNP
 	if (!force) {
 		pnp_unregister_driver(&tis_pnp_driver);
 		return;
 	}
 #endif
+	chip = dev_get_drvdata(&pdev->dev);
+	tpm_tis_chip_remove(chip);
 	platform_device_unregister(pdev);
 	platform_driver_unregister(&tis_drv);
 }
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 2/7] tpm: two-phase chip management functions
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Jarkko Sakkinen
In-Reply-To: <1412701277-27794-1-git-send-email-jarkko.sakkinen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

Added tpm_chip_alloc() and tpm_chip_register() where tpm_chip_alloc()
reserves memory resources and tpm_chip_register() initializes the
device driver. This way it is possible to alter struct tpm_chip
attributes before passing it to tpm_chip_register().

The framework takes care of freeing struct tpm_chip by using devres
API. The broken release callback has been wiped.

tpm_register_hardware() and tpm_remove_hardware are still available
as a wrapper in order to make this change less intrusive. However,
device drivers should eventually move to the tpm_chip_* functions.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
---
 drivers/char/tpm/Makefile           |   2 +-
 drivers/char/tpm/tpm-chip.c         | 178 ++++++++++++++++++++++++++++++++++++
 drivers/char/tpm/tpm-interface.c    | 124 ++-----------------------
 drivers/char/tpm/tpm.h              |   8 +-
 drivers/char/tpm/tpm_i2c_atmel.c    |   5 -
 drivers/char/tpm/tpm_i2c_infineon.c |  10 --
 drivers/char/tpm/tpm_i2c_nuvoton.c  |   5 -
 drivers/char/tpm/tpm_infineon.c     |   1 -
 drivers/char/tpm/tpm_tis.c          |   5 +-
 9 files changed, 194 insertions(+), 144 deletions(-)
 create mode 100644 drivers/char/tpm/tpm-chip.c

diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index 4d85dd6..837da04 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the kernel tpm device drivers.
 #
 obj-$(CONFIG_TCG_TPM) += tpm.o
-tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o
+tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o
 tpm-$(CONFIG_ACPI) += tpm_ppi.o
 
 ifdef CONFIG_ACPI
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
new file mode 100644
index 0000000..6cc4cee
--- /dev/null
+++ b/drivers/char/tpm/tpm-chip.c
@@ -0,0 +1,178 @@
+/*
+ * Copyright (C) 2004 IBM Corporation
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * Authors:
+ * Jarkko Sakkinen <jarkko.sakkinen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
+ * Leendert van Doorn <leendert-aZOuKsOsJu3MbYB6QlFGEg@public.gmane.org>
+ * Dave Safford <safford-aZOuKsOsJu3MbYB6QlFGEg@public.gmane.org>
+ * Reiner Sailer <sailer-aZOuKsOsJu3MbYB6QlFGEg@public.gmane.org>
+ * Kylene Hall <kjhall-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ *
+ * Maintained by: <tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
+ *
+ * TPM chip management routines.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/freezer.h>
+#include "tpm.h"
+#include "tpm_eventlog.h"
+
+static DECLARE_BITMAP(dev_mask, TPM_NUM_DEVICES);
+static LIST_HEAD(tpm_chip_list);
+static DEFINE_SPINLOCK(driver_lock);
+
+/*
+ * tpm_chip_find_get - return tpm_chip for a given chip number
+ * @chip_num the device number for the chip
+ */
+struct tpm_chip *tpm_chip_find_get(int chip_num)
+{
+	struct tpm_chip *pos, *chip = NULL;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(pos, &tpm_chip_list, list) {
+		if (chip_num != TPM_ANY_NUM && chip_num != pos->dev_num)
+			continue;
+
+		if (try_module_get(pos->dev->driver->owner)) {
+			chip = pos;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return chip;
+}
+
+/**
+ * tpm_devm_chip_remove() - free chip memory and device number
+ * @data: points to struct tpm_chip instance
+ *
+ * This is used internally by tpm_chip_alloc() and called by devres
+ * when the device is released. This funcion does the opposite of
+ * tpm_chip_alloc() freeing memory and the device number.
+ */
+static void tpm_devm_chip_remove(void *data)
+{
+	struct tpm_chip *chip = (struct tpm_chip *) data;
+	dev_dbg(chip->dev, "%s\n", __func__);
+	clear_bit(chip->dev_num, dev_mask);
+	kfree(chip);
+}
+
+/**
+ * tpm_chip_alloc() - allocate a new struct tpm_chip instance
+ * @dev: device to which the chip is associated
+ * @ops: struct tpm_class_ops instance
+ *
+ * Allocates a new struct tpm_chip instance and assigns a free
+ * device number for it. Caller does not have to worry about
+ * freeing the allocated resources. When the devices is removed
+ * devres calls tpm_devm_chip_remove() to do the job.
+ */
+struct tpm_chip *tpm_chip_alloc(struct device *dev,
+				const struct tpm_class_ops *ops)
+{
+	struct tpm_chip *chip;
+
+	chip = kzalloc(sizeof(*chip), GFP_KERNEL);
+	if (chip == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&chip->tpm_mutex);
+	INIT_LIST_HEAD(&chip->list);
+
+	chip->ops = ops;
+	chip->dev_num = find_first_zero_bit(dev_mask, TPM_NUM_DEVICES);
+
+	if (chip->dev_num >= TPM_NUM_DEVICES) {
+		dev_err(dev, "No available tpm device numbers\n");
+		kfree(chip);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	set_bit(chip->dev_num, dev_mask);
+
+	scnprintf(chip->devname, sizeof(chip->devname), "%s%d", "tpm",
+		  chip->dev_num);
+
+	chip->dev = dev;
+	devm_add_action(dev, tpm_devm_chip_remove, chip);
+	dev_set_drvdata(dev, chip);
+
+	return chip;
+}
+EXPORT_SYMBOL_GPL(tpm_chip_alloc);
+
+/*
+ * tpm_chip_register() - create a misc driver for the TPM chip
+ * @chip: TPM chip to use.
+ *
+ * Creates a misc driver for the TPM chip and adds sysfs interfaces for
+ * the device, PPI and TCPA. As the last step this function adds the
+ * chip to the list of TPM chips available for use.
+ */
+int tpm_chip_register(struct tpm_chip *chip)
+{
+	int rc;
+
+	rc = tpm_dev_add_device(chip);
+	if (rc)
+		return rc;
+
+	rc = tpm_sysfs_add_device(chip);
+	if (rc)
+		goto del_misc;
+
+	rc = tpm_add_ppi(&chip->dev->kobj);
+	if (rc)
+		goto del_sysfs;
+
+	chip->bios_dir = tpm_bios_log_setup(chip->devname);
+
+	/* Make the chip available. */
+	spin_lock(&driver_lock);
+	list_add_rcu(&chip->list, &tpm_chip_list);
+	spin_unlock(&driver_lock);
+
+	return 0;
+del_sysfs:
+	tpm_sysfs_del_device(chip);
+del_misc:
+	tpm_dev_del_device(chip);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(tpm_chip_register);
+
+/*
+ * tpm_chip_unregister() - release the TPM driver
+ * @chip: TPM chip to use.
+ *
+ * Takes the chip first away from the list of available TPM chips and then
+ * cleans up all the resources reserved by tpm_chip_register().
+ */
+void tpm_chip_unregister(struct tpm_chip *chip)
+{
+	dev_dbg(chip->dev, "%s\n", __func__);
+
+	spin_lock(&driver_lock);
+	list_del_rcu(&chip->list);
+	spin_unlock(&driver_lock);
+	synchronize_rcu();
+
+	tpm_dev_del_device(chip);
+	tpm_sysfs_del_device(chip);
+	tpm_remove_ppi(&chip->dev->kobj);
+	tpm_bios_log_teardown(chip->bios_dir);
+}
+EXPORT_SYMBOL_GPL(tpm_chip_unregister);
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index fedb4d5..1ce3ad3 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2004 IBM Corporation
+ * Copyright (C) 2014 Intel Corporation
  *
  * Authors:
  * Leendert van Doorn <leendert-aZOuKsOsJu3MbYB6QlFGEg@public.gmane.org>
@@ -47,10 +48,6 @@ module_param_named(suspend_pcr, tpm_suspend_pcr, uint, 0644);
 MODULE_PARM_DESC(suspend_pcr,
 		 "PCR to use for dummy writes to faciltate flush on suspend.");
 
-static LIST_HEAD(tpm_chip_list);
-static DEFINE_SPINLOCK(driver_lock);
-static DECLARE_BITMAP(dev_mask, TPM_NUM_DEVICES);
-
 /*
  * Array with one entry per ordinal defining the maximum amount
  * of time the chip could take to return the result.  The ordinal
@@ -639,27 +636,6 @@ static int tpm_continue_selftest(struct tpm_chip *chip)
 	return rc;
 }
 
-/*
- * tpm_chip_find_get - return tpm_chip for given chip number
- */
-static struct tpm_chip *tpm_chip_find_get(int chip_num)
-{
-	struct tpm_chip *pos, *chip = NULL;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(pos, &tpm_chip_list, list) {
-		if (chip_num != TPM_ANY_NUM && chip_num != pos->dev_num)
-			continue;
-
-		if (try_module_get(pos->dev->driver->owner)) {
-			chip = pos;
-			break;
-		}
-	}
-	rcu_read_unlock();
-	return chip;
-}
-
 #define TPM_ORDINAL_PCRREAD cpu_to_be32(21)
 #define READ_PCR_RESULT_SIZE 30
 static struct tpm_input_header pcrread_header = {
@@ -896,18 +872,7 @@ void tpm_remove_hardware(struct device *dev)
 		return;
 	}
 
-	spin_lock(&driver_lock);
-	list_del_rcu(&chip->list);
-	spin_unlock(&driver_lock);
-	synchronize_rcu();
-
-	tpm_dev_del_device(chip);
-	tpm_sysfs_del_device(chip);
-	tpm_remove_ppi(&dev->kobj);
-	tpm_bios_log_teardown(chip->bios_dir);
-
-	/* write it this way to be explicit (chip->dev == dev) */
-	put_device(chip->dev);
+	tpm_chip_unregister(chip);
 }
 EXPORT_SYMBOL_GPL(tpm_remove_hardware);
 
@@ -1044,35 +1009,6 @@ int tpm_get_random(u32 chip_num, u8 *out, size_t max)
 }
 EXPORT_SYMBOL_GPL(tpm_get_random);
 
-/* In case vendor provided release function, call it too.*/
-
-void tpm_dev_vendor_release(struct tpm_chip *chip)
-{
-	if (!chip)
-		return;
-
-	clear_bit(chip->dev_num, dev_mask);
-}
-EXPORT_SYMBOL_GPL(tpm_dev_vendor_release);
-
-
-/*
- * Once all references to platform device are down to 0,
- * release all allocated structures.
- */
-static void tpm_dev_release(struct device *dev)
-{
-	struct tpm_chip *chip = dev_get_drvdata(dev);
-
-	if (!chip)
-		return;
-
-	tpm_dev_vendor_release(chip);
-
-	chip->release(dev);
-	kfree(chip);
-}
-
 /*
  * Called from tpm_<specific>.c probe function only for devices
  * the driver has determined it should claim.  Prior to calling
@@ -1084,61 +1020,17 @@ struct tpm_chip *tpm_register_hardware(struct device *dev,
 				       const struct tpm_class_ops *ops)
 {
 	struct tpm_chip *chip;
+	int rc;
 
-	/* Driver specific per-device data */
-	chip = kzalloc(sizeof(*chip), GFP_KERNEL);
-
-	if (chip == NULL)
+	chip = tpm_chip_alloc(dev, ops);
+	if (IS_ERR(chip))
 		return NULL;
 
-	mutex_init(&chip->tpm_mutex);
-	INIT_LIST_HEAD(&chip->list);
-
-	chip->ops = ops;
-	chip->dev_num = find_first_zero_bit(dev_mask, TPM_NUM_DEVICES);
-
-	if (chip->dev_num >= TPM_NUM_DEVICES) {
-		dev_err(dev, "No available tpm device numbers\n");
-		goto out_free;
-	}
-
-	set_bit(chip->dev_num, dev_mask);
-
-	scnprintf(chip->devname, sizeof(chip->devname), "%s%d", "tpm",
-		  chip->dev_num);
-
-	chip->dev = get_device(dev);
-	chip->release = dev->release;
-	dev->release = tpm_dev_release;
-	dev_set_drvdata(dev, chip);
-
-	if (tpm_dev_add_device(chip))
-		goto put_device;
-
-	if (tpm_sysfs_add_device(chip))
-		goto del_misc;
-
-	if (tpm_add_ppi(&dev->kobj))
-		goto del_sysfs;
-
-	chip->bios_dir = tpm_bios_log_setup(chip->devname);
-
-	/* Make chip available */
-	spin_lock(&driver_lock);
-	list_add_rcu(&chip->list, &tpm_chip_list);
-	spin_unlock(&driver_lock);
+	rc = tpm_chip_register(chip);
+	if (rc)
+		return NULL;
 
 	return chip;
-
-del_sysfs:
-	tpm_sysfs_del_device(chip);
-del_misc:
-	tpm_dev_del_device(chip);
-put_device:
-	put_device(chip->dev);
-out_free:
-	kfree(chip);
-	return NULL;
 }
 EXPORT_SYMBOL_GPL(tpm_register_hardware);
 
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 12326e1..5eb89897 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -110,7 +110,6 @@ struct tpm_chip {
 	struct dentry **bios_dir;
 
 	struct list_head list;
-	void (*release) (struct device *);
 };
 
 #define to_tpm_chip(n) container_of(n, struct tpm_chip, vendor)
@@ -322,13 +321,18 @@ extern int tpm_do_selftest(struct tpm_chip *);
 extern unsigned long tpm_calc_ordinal_duration(struct tpm_chip *, u32);
 extern struct tpm_chip* tpm_register_hardware(struct device *,
 					      const struct tpm_class_ops *ops);
-extern void tpm_dev_vendor_release(struct tpm_chip *);
 extern void tpm_remove_hardware(struct device *);
 extern int tpm_pm_suspend(struct device *);
 extern int tpm_pm_resume(struct device *);
 extern int wait_for_tpm_stat(struct tpm_chip *, u8, unsigned long,
 			     wait_queue_head_t *, bool);
 
+struct tpm_chip *tpm_chip_find_get(int chip_num);
+extern struct tpm_chip *tpm_chip_alloc(struct device *dev,
+				       const struct tpm_class_ops *ops);
+extern int tpm_chip_register(struct tpm_chip *chip);
+extern void tpm_chip_unregister(struct tpm_chip *chip);
+
 int tpm_dev_add_device(struct tpm_chip *chip);
 void tpm_dev_del_device(struct tpm_chip *chip);
 int tpm_sysfs_add_device(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c
index 7727292..1b52045 100644
--- a/drivers/char/tpm/tpm_i2c_atmel.c
+++ b/drivers/char/tpm/tpm_i2c_atmel.c
@@ -192,7 +192,6 @@ static int i2c_atmel_probe(struct i2c_client *client,
 	return 0;
 
 out_err:
-	tpm_dev_vendor_release(chip);
 	tpm_remove_hardware(chip->dev);
 	return rc;
 }
@@ -200,12 +199,8 @@ out_err:
 static int i2c_atmel_remove(struct i2c_client *client)
 {
 	struct device *dev = &(client->dev);
-	struct tpm_chip *chip = dev_get_drvdata(dev);
 
-	if (chip)
-		tpm_dev_vendor_release(chip);
 	tpm_remove_hardware(dev);
-	kfree(chip);
 	return 0;
 }
 
diff --git a/drivers/char/tpm/tpm_i2c_infineon.c b/drivers/char/tpm/tpm_i2c_infineon.c
index 472af4b..9d9834d 100644
--- a/drivers/char/tpm/tpm_i2c_infineon.c
+++ b/drivers/char/tpm/tpm_i2c_infineon.c
@@ -634,15 +634,10 @@ out_release:
 	release_locality(chip, chip->vendor.locality, 1);
 
 out_vendor:
-	/* close file handles */
-	tpm_dev_vendor_release(chip);
-
 	/* remove hardware */
 	tpm_remove_hardware(chip->dev);
 
 	/* reset these pointers, otherwise we oops */
-	chip->dev->release = NULL;
-	chip->release = NULL;
 	tpm_dev.client = NULL;
 out_err:
 	return rc;
@@ -714,15 +709,10 @@ static int tpm_tis_i2c_remove(struct i2c_client *client)
 	struct tpm_chip *chip = tpm_dev.chip;
 	release_locality(chip, chip->vendor.locality, 1);
 
-	/* close file handles */
-	tpm_dev_vendor_release(chip);
-
 	/* remove hardware */
 	tpm_remove_hardware(chip->dev);
 
 	/* reset these pointers, otherwise we oops */
-	chip->dev->release = NULL;
-	chip->release = NULL;
 	tpm_dev.client = NULL;
 
 	return 0;
diff --git a/drivers/char/tpm/tpm_i2c_nuvoton.c b/drivers/char/tpm/tpm_i2c_nuvoton.c
index 7b158ef..3a2caec 100644
--- a/drivers/char/tpm/tpm_i2c_nuvoton.c
+++ b/drivers/char/tpm/tpm_i2c_nuvoton.c
@@ -615,7 +615,6 @@ static int i2c_nuvoton_probe(struct i2c_client *client,
 	return 0;
 
 out_err:
-	tpm_dev_vendor_release(chip);
 	tpm_remove_hardware(chip->dev);
 	return rc;
 }
@@ -623,12 +622,8 @@ out_err:
 static int i2c_nuvoton_remove(struct i2c_client *client)
 {
 	struct device *dev = &(client->dev);
-	struct tpm_chip *chip = dev_get_drvdata(dev);
 
-	if (chip)
-		tpm_dev_vendor_release(chip);
 	tpm_remove_hardware(dev);
-	kfree(chip);
 	return 0;
 }
 
diff --git a/drivers/char/tpm/tpm_infineon.c b/drivers/char/tpm/tpm_infineon.c
index dc0a255..0a72840 100644
--- a/drivers/char/tpm/tpm_infineon.c
+++ b/drivers/char/tpm/tpm_infineon.c
@@ -581,7 +581,6 @@ static void tpm_inf_pnp_remove(struct pnp_dev *dev)
 			iounmap(tpm_dev.mem_base);
 			release_mem_region(tpm_dev.map_base, tpm_dev.map_size);
 		}
-		tpm_dev_vendor_release(chip);
 		tpm_remove_hardware(chip->dev);
 	}
 }
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 2c46734..a2780df 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -811,10 +811,7 @@ MODULE_DEVICE_TABLE(pnp, tpm_pnp_tbl);
 static void tpm_tis_pnp_remove(struct pnp_dev *dev)
 {
 	struct tpm_chip *chip = pnp_get_drvdata(dev);
-
-	tpm_dev_vendor_release(chip);
-
-	kfree(chip);
+	tpm_remove_hardware(chip->dev);
 }
 
 
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 1/7] tpm: merge duplicate transmit_cmd() functions
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel, linux-kernel, linux-api, Jarkko Sakkinen
In-Reply-To: <1412701277-27794-1-git-send-email-jarkko.sakkinen@linux.intel.com>

Merged transmit_cmd() functions in tpm-interface.c and tpm-sysfs.c.
Added "tpm_" prefix for consistency sake. Changed cmd parameter as
opaque. This enables to use separate command structures for TPM1
and TPM2 commands instead of putting everything to struct tpm_cmd_t.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 drivers/char/tpm/tpm-dev.c       |  4 +--
 drivers/char/tpm/tpm-interface.c | 53 +++++++++++++++++++++-------------------
 drivers/char/tpm/tpm-sysfs.c     | 23 ++---------------
 drivers/char/tpm/tpm.h           |  5 ++--
 4 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/drivers/char/tpm/tpm-dev.c b/drivers/char/tpm/tpm-dev.c
index d9b774e..bd79d33 100644
--- a/drivers/char/tpm/tpm-dev.c
+++ b/drivers/char/tpm/tpm-dev.c
@@ -140,8 +140,8 @@ static ssize_t tpm_write(struct file *file, const char __user *buf,
 	}
 
 	/* atomic tpm command send and result receive */
-	out_size = tpm_transmit(priv->chip, priv->data_buffer,
-				sizeof(priv->data_buffer));
+	out_size = tpm_transmit_cmd(priv->chip, priv->data_buffer,
+				    sizeof(priv->data_buffer), NULL);
 	if (out_size < 0) {
 		mutex_unlock(&priv->buffer_mutex);
 		return out_size;
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 6af1700..fedb4d5 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -331,8 +331,8 @@ EXPORT_SYMBOL_GPL(tpm_calc_ordinal_duration);
 /*
  * Internal kernel interface to transmit TPM commands
  */
-ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf,
-		     size_t bufsiz)
+static ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf,
+			    size_t bufsiz)
 {
 	ssize_t rc;
 	u32 count, ordinal;
@@ -398,9 +398,10 @@ out:
 #define TPM_DIGEST_SIZE 20
 #define TPM_RET_CODE_IDX 6
 
-static ssize_t transmit_cmd(struct tpm_chip *chip, struct tpm_cmd_t *cmd,
-			    int len, const char *desc)
+ssize_t tpm_transmit_cmd(struct tpm_chip *chip, void *cmd,
+			 int len, const char *desc)
 {
+	struct tpm_output_header *header;
 	int err;
 
 	len = tpm_transmit(chip, (u8 *) cmd, len);
@@ -409,7 +410,9 @@ static ssize_t transmit_cmd(struct tpm_chip *chip, struct tpm_cmd_t *cmd,
 	else if (len < TPM_HEADER_SIZE)
 		return -EFAULT;
 
-	err = be32_to_cpu(cmd->header.out.return_code);
+	header = (struct tpm_output_header *) cmd;
+
+	err = be32_to_cpu(header->return_code);
 	if (err != 0 && desc)
 		dev_err(chip->dev, "A TPM error (%d) occurred %s\n", err, desc);
 
@@ -448,7 +451,7 @@ ssize_t tpm_getcap(struct device *dev, __be32 subcap_id, cap_t *cap,
 		tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
 		tpm_cmd.params.getcap_in.subcap = subcap_id;
 	}
-	rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, desc);
+	rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, desc);
 	if (!rc)
 		*cap = tpm_cmd.params.getcap_out.cap;
 	return rc;
@@ -464,8 +467,8 @@ void tpm_gen_interrupt(struct tpm_chip *chip)
 	tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
 	tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT;
 
-	rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
-			"attempting to determine the timeouts");
+	rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
+			      "attempting to determine the timeouts");
 }
 EXPORT_SYMBOL_GPL(tpm_gen_interrupt);
 
@@ -484,8 +487,8 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type)
 	struct tpm_cmd_t start_cmd;
 	start_cmd.header.in = tpm_startup_header;
 	start_cmd.params.startup_in.startup_type = startup_type;
-	return transmit_cmd(chip, &start_cmd, TPM_INTERNAL_RESULT_SIZE,
-			    "attempting to start the TPM");
+	return tpm_transmit_cmd(chip, &start_cmd, TPM_INTERNAL_RESULT_SIZE,
+				"attempting to start the TPM");
 }
 
 int tpm_get_timeouts(struct tpm_chip *chip)
@@ -500,7 +503,7 @@ int tpm_get_timeouts(struct tpm_chip *chip)
 	tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP;
 	tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
 	tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT;
-	rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL);
+	rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL);
 
 	if (rc == TPM_ERR_INVALID_POSTINIT) {
 		/* The TPM is not started, we are the first to talk to it.
@@ -513,7 +516,7 @@ int tpm_get_timeouts(struct tpm_chip *chip)
 		tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP;
 		tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
 		tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT;
-		rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
+		rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
 				  NULL);
 	}
 	if (rc) {
@@ -575,8 +578,8 @@ duration:
 	tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
 	tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_DURATION;
 
-	rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
-			"attempting to determine the durations");
+	rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
+			      "attempting to determine the durations");
 	if (rc)
 		return rc;
 
@@ -631,8 +634,8 @@ static int tpm_continue_selftest(struct tpm_chip *chip)
 	struct tpm_cmd_t cmd;
 
 	cmd.header.in = continue_selftest_header;
-	rc = transmit_cmd(chip, &cmd, CONTINUE_SELFTEST_RESULT_SIZE,
-			  "continue selftest");
+	rc = tpm_transmit_cmd(chip, &cmd, CONTINUE_SELFTEST_RESULT_SIZE,
+			      "continue selftest");
 	return rc;
 }
 
@@ -672,8 +675,8 @@ int tpm_pcr_read_dev(struct tpm_chip *chip, int pcr_idx, u8 *res_buf)
 
 	cmd.header.in = pcrread_header;
 	cmd.params.pcrread_in.pcr_idx = cpu_to_be32(pcr_idx);
-	rc = transmit_cmd(chip, &cmd, READ_PCR_RESULT_SIZE,
-			  "attempting to read a pcr value");
+	rc = tpm_transmit_cmd(chip, &cmd, READ_PCR_RESULT_SIZE,
+			      "attempting to read a pcr value");
 
 	if (rc == 0)
 		memcpy(res_buf, cmd.params.pcrread_out.pcr_result,
@@ -737,8 +740,8 @@ int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash)
 	cmd.header.in = pcrextend_header;
 	cmd.params.pcrextend_in.pcr_idx = cpu_to_be32(pcr_idx);
 	memcpy(cmd.params.pcrextend_in.hash, hash, TPM_DIGEST_SIZE);
-	rc = transmit_cmd(chip, &cmd, EXTEND_PCR_RESULT_SIZE,
-			  "attempting extend a PCR value");
+	rc = tpm_transmit_cmd(chip, &cmd, EXTEND_PCR_RESULT_SIZE,
+			      "attempting extend a PCR value");
 
 	tpm_chip_put(chip);
 	return rc;
@@ -817,7 +820,7 @@ int tpm_send(u32 chip_num, void *cmd, size_t buflen)
 	if (chip == NULL)
 		return -ENODEV;
 
-	rc = transmit_cmd(chip, cmd, buflen, "attempting tpm_cmd");
+	rc = tpm_transmit_cmd(chip, cmd, buflen, "attempting tpm_cmd");
 
 	tpm_chip_put(chip);
 	return rc;
@@ -938,14 +941,14 @@ int tpm_pm_suspend(struct device *dev)
 		cmd.params.pcrextend_in.pcr_idx = cpu_to_be32(tpm_suspend_pcr);
 		memcpy(cmd.params.pcrextend_in.hash, dummy_hash,
 		       TPM_DIGEST_SIZE);
-		rc = transmit_cmd(chip, &cmd, EXTEND_PCR_RESULT_SIZE,
-				  "extending dummy pcr before suspend");
+		rc = tpm_transmit_cmd(chip, &cmd, EXTEND_PCR_RESULT_SIZE,
+				      "extending dummy pcr before suspend");
 	}
 
 	/* now do the actual savestate */
 	for (try = 0; try < TPM_RETRY; try++) {
 		cmd.header.in = savestate_header;
-		rc = transmit_cmd(chip, &cmd, SAVESTATE_RESULT_SIZE, NULL);
+		rc = tpm_transmit_cmd(chip, &cmd, SAVESTATE_RESULT_SIZE, NULL);
 
 		/*
 		 * If the TPM indicates that it is too busy to respond to
@@ -1022,7 +1025,7 @@ int tpm_get_random(u32 chip_num, u8 *out, size_t max)
 		tpm_cmd.header.in = tpm_getrandom_header;
 		tpm_cmd.params.getrandom_in.num_bytes = cpu_to_be32(num_bytes);
 
-		err = transmit_cmd(chip, &tpm_cmd,
+		err = tpm_transmit_cmd(chip, &tpm_cmd,
 				   TPM_GETRANDOM_RESULT_SIZE + num_bytes,
 				   "attempting get random");
 		if (err)
diff --git a/drivers/char/tpm/tpm-sysfs.c b/drivers/char/tpm/tpm-sysfs.c
index 01730a2..8ecb052 100644
--- a/drivers/char/tpm/tpm-sysfs.c
+++ b/drivers/char/tpm/tpm-sysfs.c
@@ -20,25 +20,6 @@
 #include <linux/device.h>
 #include "tpm.h"
 
-/* XXX for now this helper is duplicated in tpm-interface.c */
-static ssize_t transmit_cmd(struct tpm_chip *chip, struct tpm_cmd_t *cmd,
-			    int len, const char *desc)
-{
-	int err;
-
-	len = tpm_transmit(chip, (u8 *) cmd, len);
-	if (len <  0)
-		return len;
-	else if (len < TPM_HEADER_SIZE)
-		return -EFAULT;
-
-	err = be32_to_cpu(cmd->header.out.return_code);
-	if (err != 0 && desc)
-		dev_err(chip->dev, "A TPM error (%d) occurred %s\n", err, desc);
-
-	return err;
-}
-
 #define READ_PUBEK_RESULT_SIZE 314
 #define TPM_ORD_READPUBEK cpu_to_be32(124)
 static struct tpm_input_header tpm_readpubek_header = {
@@ -58,8 +39,8 @@ static ssize_t pubek_show(struct device *dev, struct device_attribute *attr,
 	struct tpm_chip *chip = dev_get_drvdata(dev);
 
 	tpm_cmd.header.in = tpm_readpubek_header;
-	err = transmit_cmd(chip, &tpm_cmd, READ_PUBEK_RESULT_SIZE,
-			   "attempting to read the PUBEK");
+	err = tpm_transmit_cmd(chip, &tpm_cmd, READ_PUBEK_RESULT_SIZE,
+			       "attempting to read the PUBEK");
 	if (err)
 		goto out;
 
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index e4d0888..12326e1 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -314,9 +314,8 @@ struct tpm_cmd_t {
 } __packed;
 
 ssize_t	tpm_getcap(struct device *, __be32, cap_t *, const char *);
-
-ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf,
-		     size_t bufsiz);
+ssize_t tpm_transmit_cmd(struct tpm_chip *chip, void *cmd, int len,
+			 const char *desc);
 extern int tpm_get_timeouts(struct tpm_chip *);
 extern void tpm_gen_interrupt(struct tpm_chip *);
 extern int tpm_do_selftest(struct tpm_chip *);
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 0/7] TPM 2.0 support
From: Jarkko Sakkinen @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Peter Huewe, Ashley Lai, Marcel Selhorst
  Cc: tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Jarkko Sakkinen

This patch set enables TPM2 protocol and provides drivers for FIFO and
CRB interfaces.

Major changes since v1:

- Improved struct tpm_chip life-cycle by taking advantage of devres
  API.
- Refined sysfs attributes as simple key-values thereby not repeating
  mistakes in TPM1 sysfs attributes.
- Documented functions in tpm-chip.c and tpm2-cmd.c.
- Documented sysfs attributes.

Jarkko Sakkinen (6):
  tpm: merge duplicate transmit_cmd() functions
  tpm: two-phase chip management functions
  tpm: clean up tpm_tis driver life-cycle
  tpm: TPM 2.0 commands
  tpm: TPM 2.0 sysfs attributes
  tpm: TPM 2.0 CRB Interface

Will Arthur (1):
  tpm: TPM 2.0 FIFO Interface

 Documentation/ABI/stable/sysfs-class-tpm2 |  69 ++++
 drivers/char/tpm/Kconfig                  |   9 +
 drivers/char/tpm/Makefile                 |   3 +-
 drivers/char/tpm/tpm-chip.c               | 184 ++++++++++
 drivers/char/tpm/tpm-dev.c                |   4 +-
 drivers/char/tpm/tpm-interface.c          | 201 ++++-------
 drivers/char/tpm/tpm-sysfs.c              |  23 +-
 drivers/char/tpm/tpm.h                    | 103 +++++-
 drivers/char/tpm/tpm2-cmd.c               | 543 ++++++++++++++++++++++++++++++
 drivers/char/tpm/tpm2-sysfs.c             | 314 +++++++++++++++++
 drivers/char/tpm/tpm_crb.c                | 329 ++++++++++++++++++
 drivers/char/tpm/tpm_i2c_atmel.c          |   5 -
 drivers/char/tpm/tpm_i2c_infineon.c       |  10 -
 drivers/char/tpm/tpm_i2c_nuvoton.c        |   5 -
 drivers/char/tpm/tpm_infineon.c           |   1 -
 drivers/char/tpm/tpm_tis.c                | 142 +++++---
 16 files changed, 1696 insertions(+), 249 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-class-tpm2
 create mode 100644 drivers/char/tpm/tpm-chip.c
 create mode 100644 drivers/char/tpm/tpm2-cmd.c
 create mode 100644 drivers/char/tpm/tpm2-sysfs.c
 create mode 100644 drivers/char/tpm/tpm_crb.c

-- 
2.1.0

^ permalink raw reply

* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Linus Torvalds @ 2014-10-07 16:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Dr. David Alan Gilbert, qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	KVM list, Linux Kernel Mailing List, linux-mm, Linux API,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Peter Feiner, Christopher Covington,
	Johannes Weiner, Android Kernel Team, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey, Jan Kara
In-Reply-To: <20141007141913.GC2342-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Tue, Oct 7, 2014 at 10:19 AM, Andrea Arcangeli <aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> I see what you mean. The only cons I see is that we couldn't use then
> recv(tmp_addr, PAGE_SIZE), remap_anon_pages(faultaddr, tmp_addr,
> PAGE_SIZE, ..)  and retain the zerocopy behavior. Or how could we?
> There's no recvfile(userfaultfd, socketfd, PAGE_SIZE).

You're doing completelt faulty math, and you haven't thought it through.

Your "zero-copy" case is no such thing. Who cares if some packet
receive is zero-copy, when you need to set up the anonymous page to
*receive* the zero copy into, which involves page allocation, page
zeroing, page table setup with VM and page table locking, etc etc.

The thing is, the whole concept of "zero-copy" is pure and utter
bullshit. Sun made a big deal about the whole concept back in the
nineties, and IT DOES NOT WORK. It's a scam. Don't buy into it. It's
crap. It's made-up and not real.

Then, once you've allocated and cleared the page, mapped it in, your
"zero-copy" model involves looking up the page in the page tables
again (locking etc), then doing that zero-copy to the page. Then, when
you remap it, you look it up in the page tables AGAIN, with locking,
move it around, have to clear the old page table entry (which involves
a locked cmpxchg64), a TLB flush with most likely a cross-CPU IPI -
since the people who do this are all threaded and want many CPU's, and
then you insert the page into the new place.

That's *insane*. It's crap. All just to try to avoid one page copy.

Don't do it. remapping games really are complete BS. They never beat
just copying the data. It's that simple.

> As things stands now, I'm afraid with a write() syscall we couldn't do
> it zerocopy.

Really, you need to rethink your whole "zerocopy" model. It's broken.
Nobody sane cares. You've bought into a model that Sun already showed
doesn't work.

The only time zero-copy works is in random benchmarks that are very
careful to not touch the data at all at any point, and also try to
make sure that the benchmark is very much single-threaded so that you
never have the whole cross-CPU IPI issue for the TLB invalidate. Then,
and only then, can zero-copy win. And it's just not a realistic
scenario.

> If it wasn't for the TLB flush of the old page, the remap_anon_pages
> variant would be more optimal than doing a copy through a write
> syscall. Is the copy cheaper than a TLB flush? I probably naively
> assumed the TLB flush was always cheaper.

A local TLB flush is cheap. That's not the problem. The problem is the
setup of the page, and the clearing of the page, and the cross-CPU TLB
flush. And the page table locking, etc etc.

So no, I'm not AT ALL worried about a single "invlpg" instruction.
That's nothing. Local CPU TLB flushes of single pages are basically
free. But that really isn't where the costs are.

Quite frankly, the *only* time page remapping has ever made sense is
when it is used for realloc() kind of purposes, where you need to
remap pages not because of zero-copy, but because you need to change
the virtual address space layout. And then you make sure it's not a
common operation, because you're not doing it as a performance win,
you're doing it because you're changing your virtual layout.

Really. Zerocopy is for benchmarks, and for things like "splice()"
when you can avoid the page tables *entirely*. But the notion that
page remapping of user pages is faster than a copy is pure and utter
garbage. It's simply not true.

So I really think you should aim for a "write()": kind of interface.

With write, you may not get the zero-copy, but on the other hand it
allows you to re-use the source buffer many times without having to
allocate new pages and map it in etc. So a "read()+write()" loop (or,
quite commonly a "generate data computationally from a compressed
source + write()" loop) is actually much more efficient than the
zero-copy remapping, because you don't have all the complexities and
overheads in creating the source page.

It is possible that that could involve "splice()" too, although I
don't really think the source data tends to be in page-aligned chunks.
But hey, splice() at least *can* be faster than copying (and then we
have vmsplice() not because it's magically faster, but because it can
under certain circumstances be worth it, and it kind of made sense to
allow the interface, but I really don't think it's used very much or
very useful).

               Linus

^ permalink raw reply

* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Peter Feiner @ 2014-10-07 16:13 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Dr. David Alan Gilbert,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A, KVM list,
	Linux Kernel Mailing List, linux-mm, Linux API,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Christopher Covington, Johannes Weiner,
	Android Kernel Team, Robert Love, Dmitry Adamushko, Neil Brown,
	Mike Hommey, Taras
In-Reply-To: <20141007155247.GD2342-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Tue, Oct 07, 2014 at 05:52:47PM +0200, Andrea Arcangeli wrote:
> I probably grossly overestimated the benefits of resolving the
> userfault with a zerocopy page move, sorry. [...]

For posterity, I think it's worth noting that most expensive aspect of a TLB
shootdown is the interprocessor interrupt necessary to flush other CPUs' TLBs.
On a many-core machine, copying 4K of data looks pretty cheap compared to
taking an interrupt and invalidating TLBs on many cores :-)

> [...] So if we entirely drop the
> zerocopy behavior and the TLB flush of the old page like you
> suggested, the way to keep the userfaultfd mechanism decoupled from
> the userfault resolution mechanism would be to implement an
> atomic-copy syscall. That would work for SIGBUS userfaults too without
> requiring a pseudofd then. It would be enough then to call
> mcopy_atomic(userfault_addr,tmp_addr,len) with the only constraints
> that len must be a multiple of PAGE_SIZE. Of course mcopy_atomic
> wouldn't page fault or call GUP into the destination address (it can't
> otherwise the in-flight partial copy would be visible to the process,
> breaking the atomicity of the copy), but it would fill in the
> pte/trans_huge_pmd with the same strict behavior that remap_anon_pages
> currently has (in turn it would by design bypass the VM_USERFAULT
> check and be ideal for resolving userfaults).
> 
> mcopy_atomic could then be also extended to tmpfs and it would work
> without requiring the source page to be a tmpfs page too without
> having to convert page types on the fly.
> 
> If I add mcopy_atomic, the patch in subject (10/17) can be dropped of
> course so it'd be even less intrusive than the current
> remap_anon_pages and it would require zero TLB flush during its
> runtime (it would just require an atomic copy).

I like this new approach. It will be good to have a single interface for
resolving anon and tmpfs userfaults.

> So should I try to embed a mcopy_atomic inside userfault_write or can
> I expose it to userland as a standalone new syscall? Or should I do
> something different? Comments?

One interesting (ab)use of userfault_write would be that the faulting process
and the fault-handling process could be different, which would be necessary
for post-copy live migration in CRIU (http://criu.org).

Aside from the asthetic difference, I can't think of any advantage in favor of
a syscall.

Peter

^ permalink raw reply

* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Andy Lutomirski @ 2014-10-07 15:54 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Dr. David Alan Gilbert,
	qemu-devel@nongnu.org Developers, KVM list,
	Linux Kernel Mailing List, linux-mm, Linux API,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andrew Morton, Sasha Levin, Hugh Dickins,
	Peter Feiner, Christopher Covington, Johannes Weiner,
	Android Kernel Team, Robert Love, Dmitry Adamushko, Neil Brown
In-Reply-To: <20141007155247.GD2342@redhat.com>

On Tue, Oct 7, 2014 at 8:52 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Tue, Oct 07, 2014 at 04:19:13PM +0200, Andrea Arcangeli wrote:
>> mremap like interface, or file+commands protocol interface. I tend to
>> like mremap more, that's why I opted for a remap_anon_pages syscall
>> kept orthogonal to the userfaultfd functionality (remap_anon_pages
>> could be also used standalone as an accelerated mremap in some
>> circumstances) but nothing prevents to just embed the same mechanism
>
> Sorry for the self followup, but something else comes to mind to
> elaborate this further.
>
> In term of interfaces, the most efficient I could think of to minimize
> the enter/exit kernel, would be to append the "source address" of the
> data received from the network transport, to the userfaultfd_write()
> command (by appending 8 bytes to the wakeup command). Said that,
> mixing the mechanism to be notified about userfaults with the
> mechanism to resolve an userfault to me looks a complication. I kind
> of liked to keep the userfaultfd protocol is very simple and doing
> just its thing. The userfaultfd doesn't need to know how the userfault
> was resolved, even mremap would work theoretically (until we run out
> of vmas). I thought it was simpler to keep it that way. However if we
> want to resolve the fault with a "write()" syscall this may be the
> most efficient way to do it, as we're already doing a write() into the
> pseudofd to wakeup the page fault that contains the destination
> address, I just need to append the source address to the wakeup command.
>
> I probably grossly overestimated the benefits of resolving the
> userfault with a zerocopy page move, sorry. So if we entirely drop the
> zerocopy behavior and the TLB flush of the old page like you
> suggested, the way to keep the userfaultfd mechanism decoupled from
> the userfault resolution mechanism would be to implement an
> atomic-copy syscall. That would work for SIGBUS userfaults too without
> requiring a pseudofd then. It would be enough then to call
> mcopy_atomic(userfault_addr,tmp_addr,len) with the only constraints
> that len must be a multiple of PAGE_SIZE. Of course mcopy_atomic
> wouldn't page fault or call GUP into the destination address (it can't
> otherwise the in-flight partial copy would be visible to the process,
> breaking the atomicity of the copy), but it would fill in the
> pte/trans_huge_pmd with the same strict behavior that remap_anon_pages
> currently has (in turn it would by design bypass the VM_USERFAULT
> check and be ideal for resolving userfaults).

At the risk of asking a possibly useless question, would it make sense
to splice data into a userfaultfd?

--Andy

>
> mcopy_atomic could then be also extended to tmpfs and it would work
> without requiring the source page to be a tmpfs page too without
> having to convert page types on the fly.
>
> If I add mcopy_atomic, the patch in subject (10/17) can be dropped of
> course so it'd be even less intrusive than the current
> remap_anon_pages and it would require zero TLB flush during its
> runtime (it would just require an atomic copy).
>
> So should I try to embed a mcopy_atomic inside userfault_write or can
> I expose it to userland as a standalone new syscall? Or should I do
> something different? Comments?
>
> Thanks,
> Andrea



-- 
Andy Lutomirski
AMA Capital Management, LLC

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Andrea Arcangeli @ 2014-10-07 15:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dr. David Alan Gilbert, qemu-devel, KVM list,
	Linux Kernel Mailing List, linux-mm, Linux API,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Peter Feiner, Christopher Covington,
	Johannes Weiner, Android Kernel Team, Robert Love,
	Dmitry Adamushko <dmitry>
In-Reply-To: <20141007141913.GC2342@redhat.com>

On Tue, Oct 07, 2014 at 04:19:13PM +0200, Andrea Arcangeli wrote:
> mremap like interface, or file+commands protocol interface. I tend to
> like mremap more, that's why I opted for a remap_anon_pages syscall
> kept orthogonal to the userfaultfd functionality (remap_anon_pages
> could be also used standalone as an accelerated mremap in some
> circumstances) but nothing prevents to just embed the same mechanism

Sorry for the self followup, but something else comes to mind to
elaborate this further.

In term of interfaces, the most efficient I could think of to minimize
the enter/exit kernel, would be to append the "source address" of the
data received from the network transport, to the userfaultfd_write()
command (by appending 8 bytes to the wakeup command). Said that,
mixing the mechanism to be notified about userfaults with the
mechanism to resolve an userfault to me looks a complication. I kind
of liked to keep the userfaultfd protocol is very simple and doing
just its thing. The userfaultfd doesn't need to know how the userfault
was resolved, even mremap would work theoretically (until we run out
of vmas). I thought it was simpler to keep it that way. However if we
want to resolve the fault with a "write()" syscall this may be the
most efficient way to do it, as we're already doing a write() into the
pseudofd to wakeup the page fault that contains the destination
address, I just need to append the source address to the wakeup command.

I probably grossly overestimated the benefits of resolving the
userfault with a zerocopy page move, sorry. So if we entirely drop the
zerocopy behavior and the TLB flush of the old page like you
suggested, the way to keep the userfaultfd mechanism decoupled from
the userfault resolution mechanism would be to implement an
atomic-copy syscall. That would work for SIGBUS userfaults too without
requiring a pseudofd then. It would be enough then to call
mcopy_atomic(userfault_addr,tmp_addr,len) with the only constraints
that len must be a multiple of PAGE_SIZE. Of course mcopy_atomic
wouldn't page fault or call GUP into the destination address (it can't
otherwise the in-flight partial copy would be visible to the process,
breaking the atomicity of the copy), but it would fill in the
pte/trans_huge_pmd with the same strict behavior that remap_anon_pages
currently has (in turn it would by design bypass the VM_USERFAULT
check and be ideal for resolving userfaults).

mcopy_atomic could then be also extended to tmpfs and it would work
without requiring the source page to be a tmpfs page too without
having to convert page types on the fly.

If I add mcopy_atomic, the patch in subject (10/17) can be dropped of
course so it'd be even less intrusive than the current
remap_anon_pages and it would require zero TLB flush during its
runtime (it would just require an atomic copy).

So should I try to embed a mcopy_atomic inside userfault_write or can
I expose it to userland as a standalone new syscall? Or should I do
something different? Comments?

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Kirill A. Shutemov @ 2014-10-07 15:21 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: qemu-devel, kvm, linux-kernel, linux-mm, linux-api,
	Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner,
	\"Dr. David Alan Gilbert\", Christopher Covington,
	Johannes Weiner, Android Kernel Team, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek
In-Reply-To: <20141007132458.GZ2342@redhat.com>

On Tue, Oct 07, 2014 at 03:24:58PM +0200, Andrea Arcangeli wrote:
> Hi Kirill,
> 
> On Tue, Oct 07, 2014 at 01:36:45PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > userland touches a still unmapped virtual address, a sigbus signal is
> > > sent instead of allocating a new page. The sigbus signal handler will
> > > then resolve the page fault in userland by calling the
> > > remap_anon_pages syscall.
> > 
> > Hm. I wounder if this functionality really fits madvise(2) interface: as
> > far as I understand it, it provides a way to give a *hint* to kernel which
> > may or may not trigger an action from kernel side. I don't think an
> > application will behaive reasonably if kernel ignore the *advise* and will
> > not send SIGBUS, but allocate memory.
> > 
> > I would suggest to consider to use some other interface for the
> > functionality: a new syscall or, perhaps, mprotect().
> 
> I didn't feel like adding PROT_USERFAULT to mprotect, which looks
> hardwired to just these flags:

PROT_NOALLOC may be?

> 
>        PROT_NONE  The memory cannot be accessed at all.
> 
>        PROT_READ  The memory can be read.
> 
>        PROT_WRITE The memory can be modified.
> 
>        PROT_EXEC  The memory can be executed.

To be complete: PROT_GROWSDOWN, PROT_GROWSUP and unused PROT_SEM.

> So here somebody should comment and choose between:
> 
> 1) set VM_USERFAULT with mprotect(PROT_USERFAULT) instead of
>    the current madvise(MADV_USERFAULT)
> 
> 2) drop MADV_USERFAULT and VM_USERFAULT and force the usage of the
>    userfaultfd protocol as the only way for userland to catch
>    userfaults (each userfaultfd must already register itself into its
>    own virtual memory ranges so it's a trivial change for userfaultfd
>    users that deletes just 1 or 2 lines of userland code, but it would
>    prevent to use the SIGBUS behavior with info->si_addr=faultaddr for
>    other users)
> 
> 3) keep things as they are now: use MADV_USERFAULT for SIGBUS
>    userfaults, with optional intersection between the
>    vm_flags&VM_USERFAULT ranges and the userfaultfd registered ranges
>    with vma->vm_userfaultfd_ctx!=NULL to know if to engage the
>    userfaultfd protocol instead of the plain SIGBUS

4) new syscall?
 
> I will update the code accordingly to feedback, so please comment.

I don't have strong points on this. Just *feel* it doesn't fit advice
semantics.

The only userspace interface I've designed was not proven good by time.
I would listen what senior maintainers say. :)
 
-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* RE: [PATCH 3/8] iio: Add support for DA9150 GPADC
From: Opensource [Adam Thomson] @ 2014-10-07 14:55 UTC (permalink / raw)
  To: Jonathan Cameron, Opensource [Adam Thomson], Lee Jones,
	Samuel Ortiz, linux-iio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sebastian Reichel, Dmitry Eremin-Solenikov, David Woodhouse,
	linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Grant Likely,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Andrew Morton,
	Joe Perches, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Support Opensource
In-Reply-To: <5426964A.3050407-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On September 27, 2014 11:50, Jonathan Cameron wrote:

> On 23/09/14 11:53, Adam Thomson wrote:
> > This patch adds support for DA9150 Charger & Fuel-Gauge IC GPADC.

> > +
> > +static inline int da9150_gpadc_gpio_6v_voltage_now(int raw_val)
> > +{
> > +	/* Convert to mV */
> > +	return (6 * ((raw_val * 1000) + 500)) / 1024;
> These could all be expressed as raw values with offsets
> and scales (and that would be preferred).
> E.g. This one has offset 500000 and scale 6000/1024 or even
> better use IIO_VAL_FRACTIONAL_LOG2 for scale with val1 = 6000
> and val2 = (log_2 1024) = 10.
> 

What you've suggested isn't correct. The problem here is that the offset is
added first to the raw ADC reading, without factoring the ADC value accordingly
to match the factor of the offset. If we take the original equation provided for
this channel of the ADC, the offset is actually 0.5 which should be added to the
raw ADC value. This doesn't fit into the implementation in the kernel as we
can't use floating point. If we multiply the offset but not the raw ADC value,
then add them before applying the scale factor, then the result is wrong at the
end. Basically you need a scale for the raw ADC value to match the offset scale
so you can achieve the correct results, which is what my calculation does.
But that seems impossible with the current raw|offset|scale method.

> > +	ret = iio_map_array_register(indio_dev, da9150_gpadc_default_maps);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to register IIO maps: %d\n", ret);
> > +		return ret;
> > +	}
> I'd suggest doing the devm_request_thread_irq before the iio_map_array
> stuff.  This is purely to avoid the order during remove not being
> obviously correct as it isn't the reverse of during probe.

Ok, should still work ok that way so can update.

> > +static int da9150_gpadc_remove(struct platform_device *pdev)
> > +{
> > +	struct iio_dev *indio_dev = platform_get_drvdata(pdev);
> > +
> > +	iio_map_array_unregister(indio_dev);
> Twice in one day.  I'm definitely thinking we should add a
> devm version of iio_map_array_register...

I assume you mean here that iio_device_unregister() should come first? Will
update.

^ permalink raw reply

* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Andrea Arcangeli @ 2014-10-07 14:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dr. David Alan Gilbert, qemu-devel, KVM list,
	Linux Kernel Mailing List, linux-mm, Linux API,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Peter Feiner, Christopher Covington,
	Johannes Weiner, Android Kernel Team, Robert Love,
	Dmitry Adamushko <dmitry>
In-Reply-To: <CA+55aFxAOYBny+QwXfkPy-P3rs-RPr5SLYLcPNBiFO3waBXtQA@mail.gmail.com>

Hello,

On Tue, Oct 07, 2014 at 08:47:59AM -0400, Linus Torvalds wrote:
> On Mon, Oct 6, 2014 at 12:41 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >
> > Of course if somebody has better ideas on how to resolve an anonymous
> > userfault they're welcome.
> 
> So I'd *much* rather have a "write()" style interface (ie _copying_
> bytes from user space into a newly allocated page that gets mapped)
> than a "remap page" style interface
> 
> remapping anonymous pages involves page table games that really aren't
> necessarily a good idea, and tlb invalidates for the old page etc.
> Just don't do it.

I see what you mean. The only cons I see is that we couldn't use then
recv(tmp_addr, PAGE_SIZE), remap_anon_pages(faultaddr, tmp_addr,
PAGE_SIZE, ..)  and retain the zerocopy behavior. Or how could we?
There's no recvfile(userfaultfd, socketfd, PAGE_SIZE).

Ideally if we could prevent the page data coming from the network to
ever become visible in the kernel we could avoid the TLB flush and
also be zerocopy but I can't see how we could achieve that.

The page data could come through a ssh pipe or anything (qemu supports
all kind of network transports for live migration), this is why
leaving the network protocol into userland is preferable.

As things stands now, I'm afraid with a write() syscall we couldn't do
it zerocopy. We'd still need to receive the memory in a temporary page
and then copy it to a kernel page (invisible to userland while we
write to it) to later map into the userfault address.

If it wasn't for the TLB flush of the old page, the remap_anon_pages
variant would be more optimal than doing a copy through a write
syscall. Is the copy cheaper than a TLB flush? I probably naively
assumed the TLB flush was always cheaper.

Now another idea that comes to mind to be able to add the ability to
switch between copy and TLB flush is using a RAP_FORCE_COPY flag, that
would then do a copy inside remap_anon_pages and leave the original
page mapped in place... (and such flag would also disable the -EBUSY
error if page_mapcount is > 1).

So then if the RAP_FORCE_COPY flag is set remap_anon_pages would
behave like you suggested (but with a mremap-like interface, instead
of a write syscall) and we could benchmark the difference between copy
and TLB flush too. We could even periodically benchmark it at runtime
and switch over the faster method (the more CPUs there are in the host
and the more threads the process has, the faster the copy will be
compared to the TLB flush).

Of course in terms of API I could implement the exact same mechanism
as described above for remap_anon_pages inside a write() to the
userfaultfd (it's a pseudo inode). It'd need two different commands to
prepare for the coming write (with a len multiple of PAGE_SIZE) to
know the address where the page should be mapped into and if to behave
zerocopy or if to skip the TLB flush and copy.

Because the copy vs TLB flush trade off is possible to achieve with
both interfaces, I think it really boils down to choosing between a
mremap like interface, or file+commands protocol interface. I tend to
like mremap more, that's why I opted for a remap_anon_pages syscall
kept orthogonal to the userfaultfd functionality (remap_anon_pages
could be also used standalone as an accelerated mremap in some
circumstances) but nothing prevents to just embed the same mechanism
inside userfaultfd if a file+commands API is preferable. Or we could
add a different syscall (separated from userfaultfd) that creates
another pseudofd to write a command plus the page data into it. Just I
wouldn't see the point of creating a pseudofd just to copy a page
atomically, the write() syscall would look more ideal if the
userfaultfd is already open for other reasons and the pseudofd
overhead is required anyway.

Last thing to keep in mind is that if using userfaults with SIGBUS and
without userfaultfd, remap_anon_pages would have been still useful, so
if we retain the SIGBUS behavior for volatile pages and we don't force
the usage for userfaultfd, it may be cleaner not to use userfaultfd
but a separate pseudofd to do the write() syscall though. Otherwise
the app would need to open the userfaultfd to resolve the fault even
though it's not using the userfaultfd protocol which doesn't look an
intuitive interface to me.

Comments welcome.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* RE: [PATCH 2/8] mfd: da9150: Add DT binding documentation for core
From: Opensource [Adam Thomson] @ 2014-10-07 13:47 UTC (permalink / raw)
  To: Jonathan Cameron, Opensource [Adam Thomson], Lee Jones,
	Samuel Ortiz, linux-iio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sebastian Reichel, Dmitry Eremin-Solenikov, David Woodhouse,
	linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Grant Likely,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Andrew Morton,
	Joe Perches, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Support Opensource
In-Reply-To: <5426932E.7060609-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On September 27, 2014 11:37, Jonathan Cameron wrote:

> On 23/09/14 11:53, Adam Thomson wrote:
> > Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
> Obviously this really wants a review from one of the device tree guys, but I
> have a few
> bits based on what Mark has recently said in other reviews ;)
> > ---
> >  Documentation/devicetree/bindings/mfd/da9150.txt | 41
> ++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/mfd/da9150.txt
> >
> > diff --git a/Documentation/devicetree/bindings/mfd/da9150.txt
> b/Documentation/devicetree/bindings/mfd/da9150.txt
> > new file mode 100644
> > index 0000000..d7de150
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/mfd/da9150.txt
> > @@ -0,0 +1,41 @@
> > +Dialog Semiconductor DA9150 Combined Charger/Fuel-Gauge MFD bindings
> > +
> > +DA9150 consists of a group of sub-devices (I2C Only):
> What does I2C only add to the description?

Nothing really. Will remove.

> > +
> > +Device			 Description
> > +------			 -----------
> > +da9150-gpadc		: IIO - GPADC
> Given usual aversion to anything driver specific in the device tree description
> you probably
> just want to describe what they do rather than what subsystem provides the driver.

Ok, can update accordingly.

> 
> > +da9150-charger		: Power Supply (Charger)
> > +
> > +======
> > +
> > +Required properties:
> > +- compatible : Should be "dlg,da9150"
> > +- reg: Specifies the I2C slave address
> > +- interrupt-parent: Specifies the phandle of the interrupt controller to which
> > +  the IRQs from da9150 are delivered to.
> > +- interrupts: IRQ line info for da9150 chip.
> Cross refer to the standard interrupts doc for these...
> 

Ok, can do that.

> > +- interrupt-controller: da9150 has internal IRQs (own IRQ domain).
> > +
> > +Sub-devices:
> > +- da9150-gpadc: See Documentation/devicetree/bindings/iio/adc/da9150-
> gpadc.txt
> > +- da9150-charger: See Documentation/devicetree/bindings/power/da9150-
> charger.txt
> > +
> > +
> > +Example:
> > +
> > +	charger_fg: da9150@58 {
> > +		compatible = "dlg,da9150";
> > +		reg = <0x58>;
> > +		interrupt-parent = <&gpio6>;
> > +		interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
> > +		interrupt-controller;
> > +
> > +		gpadc: da9150-gpadc {
> > +			...
> > +		};
> > +
> > +		da9150-charger {
> > +			...
> > +		};
> > +	};
> > --
> > 1.9.3
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-iio" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >

^ permalink raw reply

* Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Andrea Arcangeli @ 2014-10-07 13:37 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: qemu-devel, kvm, linux-kernel, linux-mm, linux-api, Robert Love,
	Dave Hansen, Jan Kara, Neil Brown, Stefan Hajnoczi, Andrew Jones,
	KOSAKI Motohiro, Michel Lespinasse, Taras Glek, Juan Quintela,
	Hugh Dickins, Isaku Yamahata, Mel Gorman, Sasha Levin,
	Android Kernel Team, \"Dr. David Alan Gilbert\",
	Huangpeng (Peter), Andres Lagar-Cavilla <andres>
In-Reply-To: <20141007111026.GD30762@node.dhcp.inet.fi>

Hi Kirill,

On Tue, Oct 07, 2014 at 02:10:26PM +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 03, 2014 at 07:08:00PM +0200, Andrea Arcangeli wrote:
> > There's one constraint enforced to allow this simplification: the
> > source pages passed to remap_anon_pages must be mapped only in one
> > vma, but this is not a limitation when used to handle userland page
> > faults with MADV_USERFAULT. The source addresses passed to
> > remap_anon_pages should be set as VM_DONTCOPY with MADV_DONTFORK to
> > avoid any risk of the mapcount of the pages increasing, if fork runs
> > in parallel in another thread, before or while remap_anon_pages runs.
> 
> Have you considered triggering COW instead of adding limitation on
> pages' mapcount? The limitation looks artificial from interface POV.

I haven't considered it, mostly because I see it as a feature that it
returns -EBUSY. I prefer to avoid the risk of userland getting a
successful retval but internally the kernel silently behaving
non-zerocopy by mistake because some userland bug forgot to set
MADV_DONTFORK on the src_vma.

COW would be not zerocopy so it's not ok. We get sub 1msec latency for
userfaults through 10gbit and we don't want to risk wasting CPU
caches.

I however considered allowing to extend the strict behavior (i.e. the
feature) later in a backwards compatible way. We could provide a
non-zerocopy beahvior with a RAP_ALLOW_COW flag that would then turn
the -EBUSY error into a copy.

It's also more complex to implement the cow now, so it would make the
code that really matters, harder to review. So it may be preferable to
extend this later in a backwards compatible way with a new
RAP_ALLOW_COW flag.

The current handling the flags is already written in a way that should
allow backwards compatible extension with RAP_ALLOW_*:

#define RAP_ALLOW_SRC_HOLES (1UL<<0)

SYSCALL_DEFINE4(remap_anon_pages,
		unsigned long, dst_start, unsigned long, src_start,
		unsigned long, len, unsigned long, flags)
[..]
	long err = -EINVAL;
[..]
	if (flags & ~RAP_ALLOW_SRC_HOLES)
		return err;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] [RFC] mnt: add ability to clone mntns starting with the current root
From: Al Viro @ 2014-10-07 13:33 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Andrey Vagin, Andrew Morton,
	Eric W. Biederman, Cyrill Gorcunov, Pavel Emelyanov, Serge Hallyn,
	Rob Landley
In-Reply-To: <20141007133039.GG7996-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>

On Tue, Oct 07, 2014 at 02:30:40PM +0100, Al Viro wrote:
> On Tue, Oct 07, 2014 at 04:12:57PM +0400, Andrey Vagin wrote:
> > Another problem is that rootfs can't be hidden from a container, because
> > rootfs can't be moved or umounted.
> 
> ... which is a bug in mntns_install(), AFAICS.

Ability to get to exposed rootfs, that is.

> > Here is an example how to get access to rootfs:
> > fd = open("/proc/self/ns/mnt", O_RDONLY)
> > umount2("/", MNT_DETACH);
> > setns(fd, CLONE_NEWNS)
> > 
> > rootfs may contain data, which should not be avaliable in CT-s.
> 
> Indeed.

... and it looks like the above is what your mangled reproducer in previous
patch had been made of -
	fd = open("/proc/self/ns/mnt", O_RDONLY)
	umount2("/", MNT_DETACH);
	setns(fd, CLONE_NEWNS)
	umount2("/", MNT_DETACH);

IMO what it shows is setns() bug.  This "switch root/cwd, no matter what"
is wrong.

^ permalink raw reply

* Re: [PATCH] [RFC] mnt: add ability to clone mntns starting with the current root
From: Al Viro @ 2014-10-07 13:30 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-fsdevel, linux-kernel, linux-api, Andrey Vagin,
	Andrew Morton, Eric W. Biederman, Cyrill Gorcunov,
	Pavel Emelyanov, Serge Hallyn, Rob Landley
In-Reply-To: <1412683977-29543-1-git-send-email-avagin@openvz.org>

On Tue, Oct 07, 2014 at 04:12:57PM +0400, Andrey Vagin wrote:
> Another problem is that rootfs can't be hidden from a container, because
> rootfs can't be moved or umounted.

... which is a bug in mntns_install(), AFAICS.

> Here is an example how to get access to rootfs:
> fd = open("/proc/self/ns/mnt", O_RDONLY)
> umount2("/", MNT_DETACH);
> setns(fd, CLONE_NEWNS)
> 
> rootfs may contain data, which should not be avaliable in CT-s.

Indeed.

> I suggest to add ability to create a mount namespace with specified
> mount points. A current task root can be used as a root for the new
> mount namespace.

Yecchh...  Frankly, you are opening a big can of worms - having rootfs
as absolute root of all namespaces simplifies a lot of things in there.

^ permalink raw reply

* Re: [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Andrea Arcangeli @ 2014-10-07 13:24 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: qemu-devel, kvm, linux-kernel, linux-mm, linux-api,
	Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner,
	\"Dr. David Alan Gilbert\", Christopher Covington,
	Johannes Weiner, Android Kernel Team, Robert Love,
	Dmitry Adamushko <dmitry.adamu>
In-Reply-To: <20141007103645.GB30762@node.dhcp.inet.fi>

Hi Kirill,

On Tue, Oct 07, 2014 at 01:36:45PM +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > userland touches a still unmapped virtual address, a sigbus signal is
> > sent instead of allocating a new page. The sigbus signal handler will
> > then resolve the page fault in userland by calling the
> > remap_anon_pages syscall.
> 
> Hm. I wounder if this functionality really fits madvise(2) interface: as
> far as I understand it, it provides a way to give a *hint* to kernel which
> may or may not trigger an action from kernel side. I don't think an
> application will behaive reasonably if kernel ignore the *advise* and will
> not send SIGBUS, but allocate memory.
> 
> I would suggest to consider to use some other interface for the
> functionality: a new syscall or, perhaps, mprotect().

I didn't feel like adding PROT_USERFAULT to mprotect, which looks
hardwired to just these flags:

       PROT_NONE  The memory cannot be accessed at all.

       PROT_READ  The memory can be read.

       PROT_WRITE The memory can be modified.

       PROT_EXEC  The memory can be executed.

Normally mprotect doesn't just alter the vmas but it also alters
pte/hugepmds protection bits, that's something that is never needed
with VM_USERFAULT so I didn't feel like VM_USERFAULT is a protection
change to the VMA.

mprotect is also hardwired to mangle only the VM_READ|WRITE|EXEC
flags, while madvise is ideal to set arbitrary vma flags.

>From an implementation standpoint the perfect place to set a flag in a
vma is madvise. This is what MADV_DONTFORK (it sets VM_DONTCOPY)
already does too in an identical way to MADV_USERFAULT/VM_USERFAULT.

MADV_DONTFORK is as critical as MADV_USERFAULT because people depends
on it for example to prevent the O_DIRECT vs fork race condition that
results in silent data corruption during I/O with threads that may
fork. The other reason why MADV_DONTFORK is critical is that fork()
would otherwise fail with OOM unless full overcommit is enabled
(i.e. pci hotplug crashes the guest if you forget to set
MADV_DONTFORK).

Another madvise that would generate a failure if not obeyed by the
kernel is MADV_DONTNEED that if it does nothing it could run lead to
OOM killing. We don't inflate virt balloons using munmap just to make
an example. Various other apps (maybe JVM garbage collection too)
makes extensive use of MADV_DONTNEED and depend on it.

Said that I can change it to mprotect, the only thing that I don't
like is that it'll result in a less clean patch and I can't possibly
see a practical risk in keeping it simpler with madvise, as long as we
always return -EINVAL whenever we encounter a vma type that cannot
raise userfaults yet (that is something I already enforced).

Yet another option would be to drop MADV_USERFAULT and
vm_flags&VM_USERFAULT entirely and in turn the ability to handle
userfaults with SIGBUS, and retain only the userfaultfd. The new
userfaultfd protocol requires registering each created userfaultfd
into its own private virtual memory ranges (that is to allow an
unlimited number of userfaultfd per process). Currently the
userfaultfd engages iff the fault address intersects both the
MADV_USERFAULT range and the userfaultfd registered ranges. So I could
drop MADV_USERFAULT and VM_USERFAULT and just check for
vma->vm_userfaultfd_ctx!=NULL to know if the userfaultfd protocol
needs to be engaged during the first page fault for a still unmapped
virtual address. I just thought it would be more flexibile to also
allow SIGBUS without forcing people to use userfaultfd (that's in fact
the only reason to still retain madvise(MADV_USERFAULT)!).

Volatile pages earlier patches only supported SIGBUS behavior for
example.. and I didn't intend to force them to use userfaultfd if
they're guaranteed to access the memory with the CPU and never through
a kernel syscall (that is something the app can enforce by
design). userfaultfd becomes necessary the moment you want to handle
userfaults through syscalls/gup etc... qemu obviously requires
userfaultfd and it never uses the userfaultfd-less SIGBUS behavior as
it touches the memory in all possible ways (first and foremost with
the KVM page fault that uses almost all variants of gup..).

So here somebody should comment and choose between:

1) set VM_USERFAULT with mprotect(PROT_USERFAULT) instead of
   the current madvise(MADV_USERFAULT)

2) drop MADV_USERFAULT and VM_USERFAULT and force the usage of the
   userfaultfd protocol as the only way for userland to catch
   userfaults (each userfaultfd must already register itself into its
   own virtual memory ranges so it's a trivial change for userfaultfd
   users that deletes just 1 or 2 lines of userland code, but it would
   prevent to use the SIGBUS behavior with info->si_addr=faultaddr for
   other users)

3) keep things as they are now: use MADV_USERFAULT for SIGBUS
   userfaults, with optional intersection between the
   vm_flags&VM_USERFAULT ranges and the userfaultfd registered ranges
   with vma->vm_userfaultfd_ctx!=NULL to know if to engage the
   userfaultfd protocol instead of the plain SIGBUS

I will update the code accordingly to feedback, so please comment.

I implemented 3) because I thought it provided the most flexibility
for userland to choose if to engage in the userfaultfd protocol or to
stay simple with the SIGBUS if the app doesn't require to access the
userfault virtual memory from the kernel code. It also provides the
cleanest and simplest implementation to set the VM_USERFAULT flags
with madvise.

My second choice would be 2). We could always add MADV_USERFAULT later
except then we'd be forced to set and clear VM_USERFAULT within the
userfaultfd registration to remain backwards compatible. The main cons
and the reason I didn't pick 2) is that it wouldn't be a drop in
replacement for volatile pages that would then be force to use the
userfaultfd protocol too.

I don't like 3) very much mostly because the changes to mprotect would
just make things more complex on the implementation side with purely
conceptual benefits, but then it's possible too and it's feature
equivalent to 1) as far as volatile pages are concerned, so I'm
overall fine with this change if that's the preferred way.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Linus Torvalds @ 2014-10-07 12:47 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Dr. David Alan Gilbert, qemu-devel, KVM list,
	Linux Kernel Mailing List, linux-mm, Linux API,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Peter Feiner, Christopher Covington,
	Johannes Weiner, Android Kernel Team, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey,
	Taras Glek <tgle>
In-Reply-To: <20141006164156.GA31075@redhat.com>

On Mon, Oct 6, 2014 at 12:41 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> Of course if somebody has better ideas on how to resolve an anonymous
> userfault they're welcome.

So I'd *much* rather have a "write()" style interface (ie _copying_
bytes from user space into a newly allocated page that gets mapped)
than a "remap page" style interface

remapping anonymous pages involves page table games that really aren't
necessarily a good idea, and tlb invalidates for the old page etc.
Just don't do it.

           Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH] [RFC] mnt: add ability to clone mntns starting with the current root
From: Andrey Vagin @ 2014-10-07 12:12 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Andrey Vagin, Alexander Viro,
	Andrew Morton, Eric W. Biederman, Cyrill Gorcunov,
	Pavel Emelyanov, Serge Hallyn, Rob Landley, Andrey Vagin

From: Andrey Vagin <avagin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Currently when we create a new container with a separate root,
we need to clone the current mount namespace with all mounts and then
clean up it by using pivot_root(). A big part of mountpoints are cloned
only to be umounted.

Another problem is that rootfs can't be hidden from a container, because
rootfs can't be moved or umounted.

Here is an example how to get access to rootfs:
fd = open("/proc/self/ns/mnt", O_RDONLY)
umount2("/", MNT_DETACH);
setns(fd, CLONE_NEWNS)

rootfs may contain data, which should not be avaliable in CT-s.

I suggest to add ability to create a mount namespace with specified
mount points. A current task root can be used as a root for the new
mount namespace.

With this patch you can call chroot(ct->rootfs) and
unshare(UNSHARE_NEWNS2) to get a clean mount namespace.

UNSHARE_NEWNS2 can be used only with the unshare() syscall. The clone()
syscall doesn't have unused flags.

Here is an example how it looks like:
$ cat ../../unshare.c

int main(int argc, char **argv)
{
	if (unshare(UNSHARE_NEWNS2))
		return 1;

	execl("/bin/bash", "/bin/bash", NULL);
	return 1;
}
$ mount --bind test/ubuntu/ test/ubuntu/
$ cd test/ubuntu/
$ chroot .
$ ./unshare2
$ mount -t proc proc proc
$ cat /proc/self/mountinfo
55 55 252:1 /home/avagin/test/ubuntu / rw,relatime - ext4 /dev/disk/by-uuid/d672b85f-533c-4868-9609-ca80be52d3c6 rw,errors=remount-ro,data=ordered
56 55 0:3 / /proc rw,relatime - proc proc rw

Cc: Alexander Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Cyrill Gorcunov <gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Cc: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Cc: Rob Landley <rob-VoJi6FS/r0vR7s880joybQ@public.gmane.org>
Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 fs/namespace.c             | 16 ++++++++++++++--
 include/uapi/linux/sched.h |  8 ++++++++
 kernel/fork.c              | 11 ++++++++---
 kernel/nsproxy.c           |  2 +-
 4 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 730c50e..f50a848 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2569,12 +2569,24 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 
 	BUG_ON(!ns);
 
-	if (likely(!(flags & CLONE_NEWNS))) {
+	if (likely(!(flags & (CLONE_NEWNS | UNSHARE_NEWNS2)))) {
 		get_mnt_ns(ns);
 		return ns;
 	}
 
-	old = ns->root;
+	if (flags & CLONE_NEWNS)
+		old = ns->root;
+	else { /* UNSHARE_NEWNS2 */
+		struct path root;
+
+		get_fs_root(current->fs, &root);
+		if (root.mnt->mnt_root != root.dentry) {
+			path_put(&root);
+			return ERR_PTR(-EINVAL); /* not a mountpoint */
+		}
+		old = real_mount(root.mnt);
+		path_put(&root);
+	}
 
 	new_ns = alloc_mnt_ns(user_ns);
 	if (IS_ERR(new_ns))
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 34f9d73..8092e50 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -31,6 +31,14 @@
 #define CLONE_IO		0x80000000	/* Clone io context */
 
 /*
+ * Following flags can be used only with unshare(), because
+ * they are intersected with CSIGNAL
+ */
+#define UNSHARE_NEWNS2		0x00000001	/* Clone mnt namespace starting with the current task root. */
+
+#define UNSHARE_FLAGS		(UNSHARE_NEWNS2)
+
+/*
  * Scheduling policies
  */
 #define SCHED_NORMAL		0
diff --git a/kernel/fork.c b/kernel/fork.c
index 0cf9cdb..52f1fc0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1381,7 +1381,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	retval = copy_mm(clone_flags, p);
 	if (retval)
 		goto bad_fork_cleanup_signal;
-	retval = copy_namespaces(clone_flags, p);
+
+	/*
+	 * CSIGNAL and UNSHARE_FLAGS are intersected, but
+	 * UNSHARE_FLAGS can't be used with clone().
+	 */
+	retval = copy_namespaces(clone_flags & ~UNSHARE_FLAGS, p);
 	if (retval)
 		goto bad_fork_cleanup_mm;
 	retval = copy_io(clone_flags, p);
@@ -1790,7 +1795,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
 	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
 				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
 				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
-				CLONE_NEWUSER|CLONE_NEWPID))
+				CLONE_NEWUSER|CLONE_NEWPID|UNSHARE_FLAGS))
 		return -EINVAL;
 	/*
 	 * Not implemented, but pretend it works if there is nothing to
@@ -1880,7 +1885,7 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 	/*
 	 * If unsharing namespace, must also unshare filesystem information.
 	 */
-	if (unshare_flags & CLONE_NEWNS)
+	if (unshare_flags & (CLONE_NEWNS | UNSHARE_NEWNS2))
 		unshare_flags |= CLONE_FS;
 
 	err = check_unshare_flags(unshare_flags);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index ef42d0a..a29e836 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -180,7 +180,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 	int err = 0;
 
 	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWNET | CLONE_NEWPID)))
+			       CLONE_NEWNET | CLONE_NEWPID | UNSHARE_FLAGS)))
 		return 0;
 
 	user_ns = new_cred ? new_cred->user_ns : current_user_ns();
-- 
1.9.3

^ permalink raw reply related

* Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Kirill A. Shutemov @ 2014-10-07 11:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Robert Love, Dave Hansen, Jan Kara, kvm, Neil Brown,
	Stefan Hajnoczi, qemu-devel, linux-mm, KOSAKI Motohiro,
	Michel Lespinasse, Andrea Arcangeli, Taras Glek, Juan Quintela,
	Hugh Dickins, Isaku Yamahata, Mel Gorman, Sasha Levin,
	Android Kernel Team, Andrew Jones, Huangpeng (Peter),
	Andres Lagar-Cavilla, Christopher Covington, Anthony Liguori,
	Paolo Bonzini, Keith Packard
In-Reply-To: <20141007110102.GJ2404@work-vm>

On Tue, Oct 07, 2014 at 12:01:02PM +0100, Dr. David Alan Gilbert wrote:
> * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > On Tue, Oct 07, 2014 at 11:46:04AM +0100, Dr. David Alan Gilbert wrote:
> > > * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > > > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > > > userland touches a still unmapped virtual address, a sigbus signal is
> > > > > sent instead of allocating a new page. The sigbus signal handler will
> > > > > then resolve the page fault in userland by calling the
> > > > > remap_anon_pages syscall.
> > > > 
> > > > Hm. I wounder if this functionality really fits madvise(2) interface: as
> > > > far as I understand it, it provides a way to give a *hint* to kernel which
> > > > may or may not trigger an action from kernel side. I don't think an
> > > > application will behaive reasonably if kernel ignore the *advise* and will
> > > > not send SIGBUS, but allocate memory.
> > > 
> > > Aren't DONTNEED and DONTDUMP  similar cases of madvise operations that are
> > > expected to do what they say ?
> > 
> > No. If kernel would ignore MADV_DONTNEED or MADV_DONTDUMP it will not
> > affect correctness, just behaviour will be suboptimal: more than needed
> > memory used or wasted space in coredump.
> 
> That's not how the manpage reads for DONTNEED; it calls it out as a special
> case near the top, and explicitly says what will happen if you read the
> area marked as DONTNEED.

Your are right. MADV_DONTNEED doesn't fit the interface too. That's bad
and we can't fix it. But it's not a reason to make this mistake again.

Read the next sentence: "The kernel is free to ignore the advice."

Note, POSIX_MADV_DONTNEED has totally different semantics.

> It looks like there are openssl patches that use DONTDUMP to explicitly
> make sure keys etc don't land in cores.

That's nice to have. But openssl works on systems without the interface,
meaning it's not essential for functionality.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox