LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 2/2] powerpc/e6500: TLB miss handler with hardware tablewalk support
From: Scott Wood @ 2012-07-19 20:12 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1339722302.9220.175.camel@pasglop>

On 06/14/2012 08:05 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2012-06-14 at 18:41 -0500, Scott Wood wrote:
>>  - Like on e5500, the linear mapping is bolted, so we don't need the
>>    overhead of supporting nested tlb misses.
>>
>> Note that hardware tablewalk does not work in rev1 of e6500.
>> We do not expect to support e6500 rev1 in mainline Linux.
> 
> I'll try to review that in more details next week.... 
> 
> Ben.

ping

-Scott

* [PATCH 2/4] powerpc/crypto: add compression support to arch vec
From: Seth Jennings @ 2012-07-19 14:42 UTC
  To: Benjamin Herrenschmidt
  Cc: Seth Jennings, Kent Yoder, Herbert Xu, Greg Kroah-Hartman,
	linux-kernel, Paul Mackerras, Jeff Kirsher, Andrew Morton,
	Robert Jennings, linuxppc-dev, David S. Miller, linux-crypto
In-Reply-To: <1342708961-28587-1-git-send-email-sjenning@linux.vnet.ibm.com>

This patch enables compression engine support in the
architecture vector.  This causes the Power hypervisor
to allow access to the NX compression accelerator.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/prom_init.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 0794a30..9ec5e55 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -705,6 +705,7 @@ static void __init early_cmdline_parse(void)
 #endif
 #define OV5_TYPE1_AFFINITY	0x80	/* Type 1 NUMA affinity */
 #define OV5_PFO_HW_RNG		0x80	/* PFO Random Number Generator */
+#define OV5_PFO_HW_842		0x40	/* PFO Compression Accelerator */
 #define OV5_PFO_HW_ENCR		0x20	/* PFO Encryption Accelerator */
 
 /* Option Vector 6: IBM PAPR hints */
@@ -774,8 +775,7 @@ static unsigned char ibm_architecture_vec[] = {
 	0,
 	0,
 	0,
-	OV5_PFO_HW_RNG | OV5_PFO_HW_ENCR,
-
+	OV5_PFO_HW_RNG | OV5_PFO_HW_ENCR | OV5_PFO_HW_842,
 	/* option vector 6: IBM PAPR hints */
 	4 - 2,				/* length */
 	0,
-- 
1.7.9.5
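
The change above ORs one more bit into what appears to be the PFO byte of option vector 5 (the byte just before the option vector 6 comment). A minimal userspace sketch of the resulting value follows. The flag values are copied from the patch; pfo_byte and pfo_requests_842() are hypothetical names used only for illustration, and the sketch assumes, as the defines suggest, that each PFO accelerator is a distinct bit in that byte.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the OV5 defines in prom_init.c above. */
#define OV5_PFO_HW_RNG	0x80	/* PFO Random Number Generator */
#define OV5_PFO_HW_842	0x40	/* PFO Compression Accelerator */
#define OV5_PFO_HW_ENCR	0x20	/* PFO Encryption Accelerator */

/* Each accelerator must occupy its own bit for the OR below to work. */
static_assert((OV5_PFO_HW_RNG & OV5_PFO_HW_842) == 0 &&
	      (OV5_PFO_HW_842 & OV5_PFO_HW_ENCR) == 0 &&
	      (OV5_PFO_HW_RNG & OV5_PFO_HW_ENCR) == 0,
	      "PFO flags must not overlap");

/* Hypothetical name for the PFO byte the patched vector advertises. */
static const uint8_t pfo_byte =
	OV5_PFO_HW_RNG | OV5_PFO_HW_ENCR | OV5_PFO_HW_842;

/* Hypothetical helper: does a PFO byte request the 842 accelerator? */
static int pfo_requests_842(uint8_t byte)
{
	return (byte & OV5_PFO_HW_842) != 0;
}
```

With all three bits set the byte becomes 0xE0; without OV5_PFO_HW_842 it is the previous value, 0xA0.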

* [PATCH 1/4] powerpc/crypto: rework Kconfig
From: Seth Jennings @ 2012-07-19 14:42 UTC
  To: Benjamin Herrenschmidt
  Cc: Seth Jennings, Kent Yoder, Herbert Xu, Greg Kroah-Hartman,
	linux-kernel, Paul Mackerras, Jeff Kirsher, Andrew Morton,
	Robert Jennings, linuxppc-dev, David S. Miller, linux-crypto
In-Reply-To: <1342708961-28587-1-git-send-email-sjenning@linux.vnet.ibm.com>

This patch creates a new submenu for the NX cryptographic
hardware accelerator and breaks the NX options into their own
Kconfig file under drivers/crypto/nx/Kconfig.

This will permit additional NX functionality to be easily
and more cleanly added in the future without touching
drivers/crypto/Makefile|Kconfig.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/crypto/Kconfig     |   20 +++++++-------------
 drivers/crypto/nx/Kconfig  |   17 +++++++++++++++++
 drivers/crypto/nx/Makefile |    2 +-
 3 files changed, 25 insertions(+), 14 deletions(-)
 create mode 100644 drivers/crypto/nx/Kconfig

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 7d74d09..662588a 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -298,21 +298,15 @@ config CRYPTO_DEV_TEGRA_AES
 	  will be called tegra-aes.
 
 config CRYPTO_DEV_NX
-	tristate "Support for Power7+ in-Nest cryptographic acceleration"
+	bool "Support for IBM Power7+ in-Nest cryptographic acceleration"
 	depends on PPC64 && IBMVIO
-	select CRYPTO_AES
-	select CRYPTO_CBC
-	select CRYPTO_ECB
-	select CRYPTO_CCM
-	select CRYPTO_GCM
-	select CRYPTO_AUTHENC
-	select CRYPTO_XCBC
-	select CRYPTO_SHA256
-	select CRYPTO_SHA512
+	default n
 	help
-	  Support for Power7+ in-Nest cryptographic acceleration. This
-	  module supports acceleration for AES and SHA2 algorithms. If you
-	  choose 'M' here, this module will be called nx_crypto.
+	  Support for Power7+ in-Nest cryptographic acceleration.
+
+if CRYPTO_DEV_NX
+	source "drivers/crypto/nx/Kconfig"
+endif
 
 config CRYPTO_DEV_UX500
 	tristate "Driver for ST-Ericsson UX500 crypto hardware acceleration"
diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
new file mode 100644
index 0000000..dedde53
--- /dev/null
+++ b/drivers/crypto/nx/Kconfig
@@ -0,0 +1,17 @@
+config CRYPTO_DEV_NX_ENCRYPT
+	tristate "Encryption acceleration support"
+	depends on PPC64 && IBMVIO
+	default y
+	select CRYPTO_AES
+	select CRYPTO_CBC
+	select CRYPTO_ECB
+	select CRYPTO_CCM
+	select CRYPTO_GCM
+	select CRYPTO_AUTHENC
+	select CRYPTO_XCBC
+	select CRYPTO_SHA256
+	select CRYPTO_SHA512
+	help
+	  Support for Power7+ in-Nest encryption acceleration. This
+	  module supports acceleration for AES and SHA2 algorithms. If you
+	  choose 'M' here, this module will be called nx_crypto.
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 411ce59..7f110e4 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -1,4 +1,4 @@
-obj-$(CONFIG_CRYPTO_DEV_NX) += nx-crypto.o
+obj-$(CONFIG_CRYPTO_DEV_NX_ENCRYPT) += nx-crypto.o
 nx-crypto-objs := nx.o \
 		  nx_debugfs.o \
 		  nx-aes-cbc.o \
-- 
1.7.9.5

* [PATCH 3/4] powerpc/crypto: add 842 hardware compression driver
From: Seth Jennings @ 2012-07-19 14:42 UTC
  To: Benjamin Herrenschmidt
  Cc: Seth Jennings, Kent Yoder, Herbert Xu, Greg Kroah-Hartman,
	linux-kernel, Paul Mackerras, Jeff Kirsher, Andrew Morton,
	Robert Jennings, linuxppc-dev, David S. Miller, linux-crypto
In-Reply-To: <1342708961-28587-1-git-send-email-sjenning@linux.vnet.ibm.com>

This patch adds the driver for interacting with the 842
compression accelerator on IBM Power7+ systems.

The device is a child of the Platform Facilities Option (PFO)
and shows up as a child of the IBM VIO bus.

The compression/decompression API takes the same arguments
as existing compression methods like lzo and deflate.  The 842
hardware operates on 4K hardware pages and the driver breaks up
input on 4K boundaries to submit it to the hardware accelerator.
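
The 4K splitting described above can be pictured with a small userspace sketch. It follows the same loop shape as nx842_build_scatterlist() in the patch, but split_on_4k() and struct sl_entry are hypothetical names, and it records plain virtual addresses where the driver stores virt_to_abs() results for the hypervisor.

```c
#include <assert.h>
#include <stddef.h>

#define HW_PAGE_SIZE 4096UL	/* NX unit accepts data on 4K boundaries */

/* Hypothetical stand-in for struct nx842_slentry (virtual addresses). */
struct sl_entry {
	unsigned long ptr;	/* start of this piece */
	unsigned long len;	/* bytes in this piece */
};

/*
 * Split [buf, buf + len) into pieces that never cross a 4K boundary.
 * If entries is non-NULL it must hold len / HW_PAGE_SIZE + 2 slots.
 * Returns the number of pieces.
 */
static int split_on_4k(unsigned long buf, unsigned long len,
		       struct sl_entry *entries)
{
	int n = 0;

	assert(buf + len >= buf);	/* range must not wrap around */

	while (len) {
		/* First 4K boundary strictly above buf. */
		unsigned long next = (buf + HW_PAGE_SIZE) & ~(HW_PAGE_SIZE - 1);
		unsigned long chunk = next - buf;

		if (chunk > len)
			chunk = len;	/* last (partial) piece */
		if (entries) {
			entries[n].ptr = buf;
			entries[n].len = chunk;
		}
		n++;
		buf += chunk;
		len -= chunk;
	}
	return n;
}
```

For example, 8192 bytes starting 100 bytes past a 4K boundary come out as three pieces of 3996, 4096 and 100 bytes.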

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 MAINTAINERS                |    6 +
 drivers/crypto/nx/Kconfig  |    9 +
 drivers/crypto/nx/Makefile |    3 +
 drivers/crypto/nx/nx-842.c | 1615 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nx842.h      |   11 +
 5 files changed, 1644 insertions(+)
 create mode 100644 drivers/crypto/nx/nx-842.c
 create mode 100644 include/linux/nx842.h
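
nx842_compress() in nx-842.c writes a struct nx842_header at the start of the output buffer and then pads each compressed block up to IO_BUFFER_ALIGN (128 bytes). The arithmetic for the header size and for the first block's offset (hdr->offset) can be sketched as follows. header_size(), align_up() and first_block_offset() are hypothetical helpers, and the sketch assumes a 4-byte int, as on ppc64.

```c
#include <assert.h>
#include <stddef.h>

#define IO_BUFFER_ALIGN 128UL

/* Same layout as struct nx842_header (sizes[0] written as a C99
 * flexible array member here). */
struct hdr_sketch {
	int blocks_nr;	/* number of compressed blocks */
	int offset;	/* offset of the first block from the header */
	int sizes[];	/* per-block sizes, negative = stored uncompressed */
};

/* Mirrors nx842_header_size(): fixed part plus one int per block. */
static size_t header_size(int blocks_nr)
{
	assert(blocks_nr > 0);
	return sizeof(struct hdr_sketch) + (size_t)blocks_nr * sizeof(int);
}

/* Round addr up to a multiple of IO_BUFFER_ALIGN (power of two). */
static unsigned long align_up(unsigned long addr)
{
	return (addr + IO_BUFFER_ALIGN - 1) & ~(IO_BUFFER_ALIGN - 1);
}

/* Mirrors hdr->offset = padding + hdrsize for the first block. */
static unsigned long first_block_offset(unsigned long out, int blocks_nr)
{
	unsigned long first = out + header_size(blocks_nr);

	return align_up(first) - out;
}
```

With 64K pages and a 4K max_sync_size there are 16 blocks, so the header is 72 bytes and, for a 128-byte-aligned output buffer, the first block starts at offset 128. Each block can lose up to 127 bytes to padding, which is where the driver comment's worst-case estimate of (128 * 16) / 64K, roughly 3%, comes from.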

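The ibm,max-sync-cop example in the nx842_OF_upd_maxsyncop() comment is six big-endian 32-bit cells. A sketch of decoding that example and reducing it to the single limit the driver keeps (the smaller of the two data limits, capped at 64K, and unusable below 4K) might look like this. be_cell() and max_sync_size_from_prop() are hypothetical helpers; the driver instead overlays a struct of native ints on prop->value, which works because ppc64 is big-endian, whereas the sketch assembles each cell byte by byte so it also runs on little-endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes from the od -x example in the driver comment. */
static const uint8_t example_max_sync_cop[24] = {
	0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x01, 0xfe,
	0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x01, 0xfe,
};

/* Read one big-endian 32-bit device-tree cell. */
static uint32_t be_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Cells: comp_elements, comp_data_limit, comp_sg_limit,
 *        decomp_elements, decomp_data_limit, decomp_sg_limit.
 * Returns the usable sync size, or 0 if the property is malformed
 * or the limit is below 4K.
 */
static uint32_t max_sync_size_from_prop(const uint8_t *val, size_t len)
{
	uint32_t comp, decomp, size;

	assert(val != NULL);
	if (len != 6 * 4)
		return 0;

	comp = be_cell(val + 1 * 4);	/* comp_data_limit */
	decomp = be_cell(val + 4 * 4);	/* decomp_data_limit */

	size = comp < decomp ? comp : decomp;
	if (size > 65536)
		size = 65536;		/* cap at 64K, as the driver does */
	return size < 4096 ? 0 : size;
}
```

For the example bytes both data limits are 0x1000, so the resulting size is 4096, and the scatter list limit cell decodes to 0x1fe (510).
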
diff --git a/MAINTAINERS b/MAINTAINERS
index 5dbf8a2..30821ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3387,6 +3387,12 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
 S:	Maintained
 F:	arch/ia64/
 
+IBM Power 842 compression accelerator
+M:	Robert Jennings <rcj@linux.vnet.ibm.com>
+S:	Supported
+F:	drivers/crypto/nx/nx-842.c
+F:	include/linux/nx842.h
+
 IBM Power Linux RAID adapter
 M:	Brian King <brking@us.ibm.com>
 S:	Supported
diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index dedde53..f826166 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -15,3 +15,12 @@ config CRYPTO_DEV_NX_ENCRYPT
 	  Support for Power7+ in-Nest encryption acceleration. This
 	  module supports acceleration for AES and SHA2 algorithms. If you
 	  choose 'M' here, this module will be called nx_crypto.
+
+config CRYPTO_DEV_NX_COMPRESS
+	tristate "Compression acceleration support"
+	depends on PPC64 && IBMVIO
+	default y
+	help
+	  Support for Power7+ in-Nest compression acceleration. This
+	  module supports acceleration for the 842 compression algorithm.
+	  If you choose 'M' here, this module will be called nx_compress.
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 7f110e4..bb770ea 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -9,3 +9,6 @@ nx-crypto-objs := nx.o \
 		  nx-aes-xcbc.o \
 		  nx-sha256.o \
 		  nx-sha512.o
+
+obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
+nx-compress-objs := nx-842.o
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
new file mode 100644
index 0000000..9da0fb2
--- /dev/null
+++ b/drivers/crypto/nx/nx-842.c
@@ -0,0 +1,1615 @@
+/*
+ * Driver for IBM Power 842 compression accelerator
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) IBM Corporation, 2012
+ *
+ * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ */
+
+#include <linux/module.h>
+#include <asm/vio.h>
+#include <asm/pSeries_reconfig.h>
+#include <linux/slab.h>
+#include <asm/abs_addr.h>
+#include <linux/nx842.h>
+#include <linux/kernel.h>
+
+#include "nx_csbcpb.h" /* struct nx_csbcpb */
+
+#define MODULE_NAME "nx-compress"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
+
+#define SHIFT_4K 12
+#define SHIFT_64K 16
+#define SIZE_4K (1UL << SHIFT_4K)
+#define SIZE_64K (1UL << SHIFT_64K)
+
+/* IO buffer must be 128 byte aligned */
+#define IO_BUFFER_ALIGN 128
+
+struct nx842_header {
+	int blocks_nr; /* number of compressed blocks */
+	int offset; /* offset of the first block (from beginning of header) */
+	int sizes[0]; /* size of compressed blocks */
+};
+
+static inline int nx842_header_size(const struct nx842_header *hdr)
+{
+	return sizeof(struct nx842_header) +
+			hdr->blocks_nr * sizeof(hdr->sizes[0]);
+}
+
+/* Macros for fields within nx_csbcpb */
+/* Check the valid bit within the csbcpb valid field */
+#define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
+
+/* CE macros operate on the completion_extension field bits in the csbcpb.
+ * CE0 0=full completion, 1=partial completion
+ * CE1 0=CE0 indicates completion, 1=termination (output may be modified)
+ * CE2 0=processed_bytes is source bytes, 1=processed_bytes is target bytes */
+#define NX842_CSBCPB_CE0(x)	(x & BIT_MASK(7))
+#define NX842_CSBCPB_CE1(x)	(x & BIT_MASK(6))
+#define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
+
+/* The NX unit accepts data only on 4K page boundaries */
+#define NX842_HW_PAGE_SHIFT	SHIFT_4K
+#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
+#define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
+
+enum nx842_status {
+	UNAVAILABLE,
+	AVAILABLE
+};
+
+struct ibm_nx842_counters {
+	atomic64_t comp_complete;
+	atomic64_t comp_failed;
+	atomic64_t decomp_complete;
+	atomic64_t decomp_failed;
+	atomic64_t swdecomp;
+	atomic64_t comp_times[32];
+	atomic64_t decomp_times[32];
+};
+
+static struct nx842_devdata {
+	struct vio_dev *vdev;
+	struct device *dev;
+	struct ibm_nx842_counters *counters;
+	unsigned int max_sg_len;
+	unsigned int max_sync_size;
+	unsigned int max_sync_sg;
+	enum nx842_status status;
+} __rcu *devdata;
+static DEFINE_SPINLOCK(devdata_mutex);
+
+#define NX842_COUNTER_INC(_x) \
+static inline void nx842_inc_##_x( \
+	const struct nx842_devdata *dev) { \
+	if (dev) \
+		atomic64_inc(&dev->counters->_x); \
+}
+NX842_COUNTER_INC(comp_complete);
+NX842_COUNTER_INC(comp_failed);
+NX842_COUNTER_INC(decomp_complete);
+NX842_COUNTER_INC(decomp_failed);
+NX842_COUNTER_INC(swdecomp);
+
+#define NX842_HIST_SLOTS 16
+
+static void ibm_nx842_incr_hist(atomic64_t *times, unsigned int time)
+{
+	int bucket = fls(time);
+
+	if (bucket)
+		bucket = min((NX842_HIST_SLOTS - 1), bucket - 1);
+
+	atomic64_inc(&times[bucket]);
+}
+
+/* NX unit operation flags */
+#define NX842_OP_COMPRESS	0x0
+#define NX842_OP_CRC		0x1
+#define NX842_OP_DECOMPRESS	0x2
+#define NX842_OP_COMPRESS_CRC   (NX842_OP_COMPRESS | NX842_OP_CRC)
+#define NX842_OP_DECOMPRESS_CRC (NX842_OP_DECOMPRESS | NX842_OP_CRC)
+#define NX842_OP_ASYNC		(1<<23)
+#define NX842_OP_NOTIFY		(1<<22)
+#define NX842_OP_NOTIFY_INT(x)	((x & 0xff)<<8)
+
+static unsigned long nx842_get_desired_dma(struct vio_dev *viodev)
+{
+	/* No use of DMA mappings within the driver. */
+	return 0;
+}
+
+struct nx842_slentry {
+	unsigned long ptr; /* Absolute address (use virt_to_abs()) */
+	unsigned long len;
+};
+
+/* pHyp scatterlist entry */
+struct nx842_scatterlist {
+	int entry_nr; /* number of slentries */
+	struct nx842_slentry *entries; /* ptr to array of slentries */
+};
+
+/* Does not include sizeof(entry_nr) in the size */
+static inline unsigned long nx842_get_scatterlist_size(
+				struct nx842_scatterlist *sl)
+{
+	return sl->entry_nr * sizeof(struct nx842_slentry);
+}
+
+static int nx842_build_scatterlist(unsigned long buf, int len,
+			struct nx842_scatterlist *sl)
+{
+	unsigned long nextpage;
+	struct nx842_slentry *entry;
+
+	sl->entry_nr = 0;
+
+	entry = sl->entries;
+	while (len) {
+		entry->ptr = virt_to_abs(buf);
+		nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE);
+		if (nextpage < buf + len) {
+			/* we aren't at the end yet */
+			if (IS_ALIGNED(buf, NX842_HW_PAGE_SIZE))
+				/* we are in the middle (or beginning) */
+				entry->len = NX842_HW_PAGE_SIZE;
+			else
+				/* we are at the beginning */
+				entry->len = nextpage - buf;
+		} else {
+			/* at the end */
+			entry->len = len;
+		}
+
+		len -= entry->len;
+		buf += entry->len;
+		sl->entry_nr++;
+		entry++;
+	}
+
+	return 0;
+}
+
+/*
+ * Working memory for software decompression
+ */
+struct sw842_fifo {
+	union {
+		char f8[256][8];
+		char f4[512][4];
+	};
+	char f2[256][2];
+	unsigned char f84_full;
+	unsigned char f2_full;
+	unsigned char f8_count;
+	unsigned char f2_count;
+	unsigned int f4_count;
+};
+
+/*
+ * Working memory for crypto API
+ */
+struct nx842_workmem {
+	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
+	union {
+		/* hardware working memory */
+		struct {
+			/* scatterlist */
+			char slin[SIZE_4K];
+			char slout[SIZE_4K];
+			/* coprocessor status/parameter block */
+			struct nx_csbcpb csbcpb;
+		};
+		/* software working memory */
+		struct sw842_fifo swfifo; /* software decompression fifo */
+	};
+};
+
+int nx842_get_workmem_size(void)
+{
+	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
+
+int nx842_get_workmem_size_aligned(void)
+{
+	return sizeof(struct nx842_workmem);
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
+
+static int nx842_validate_result(struct device *dev,
+	struct cop_status_block *csb)
+{
+	/* The csb must be valid after returning from vio_h_cop_sync */
+	if (!NX842_CSBCBP_VALID_CHK(csb->valid)) {
+		dev_err(dev, "%s: csbcpb not valid upon completion.\n",
+				__func__);
+		dev_dbg(dev, "valid:0x%02x cs:0x%02x cc:0x%02x ce:0x%02x\n",
+				csb->valid,
+				csb->crb_seq_number,
+				csb->completion_code,
+				csb->completion_extension);
+		dev_dbg(dev, "processed_bytes:%d address:0x%016lx\n",
+				csb->processed_byte_count,
+				(unsigned long)csb->address);
+		return -EIO;
+	}
+
+	/* Check return values from the hardware in the CSB */
+	switch (csb->completion_code) {
+	case 0:	/* Completed without error */
+		break;
+	case 64: /* Target bytes > Source bytes during compression */
+	case 13: /* Output buffer too small */
+		dev_dbg(dev, "%s: Compression output larger than input\n",
+					__func__);
+		return -ENOSPC;
+	case 66: /* Input data contains an illegal template field */
+	case 67: /* Template indicates data past the end of the input stream */
+		dev_dbg(dev, "%s: Bad data for decompression (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EINVAL;
+	default:
+		dev_dbg(dev, "%s: Unspecified error (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EIO;
+	}
+
+	/* Hardware sanity check */
+	if (!NX842_CSBCPB_CE2(csb->completion_extension)) {
+		dev_err(dev, "%s: No error returned by hardware, but "
+				"data returned is unusable, contact support.\n"
+				"(Additional info: csbcpb->processed_byte_count "
+				"does not specify processed bytes for the "
+				"target buffer.)\n", __func__);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * nx842_compress - Compress data using the 842 algorithm
+ *
+ * Compression provided by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is compressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * compressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, must be page aligned
+ * @inlen: Length of input buffer, must be PAGE_SIZE
+ * @out: Pointer to output buffer
+ * @outlen: Length of output buffer
+ * @wmem: Pointer to buffer for working memory, size determined by
+ *        nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is too small
+ *   -EMSGSIZE	XXX Difficult to describe this limitation
+ *   -EIO	Internal error
+ *   -ENODEV	Hardware unavailable
+ */
+int nx842_compress(const unsigned char *in, unsigned int inlen,
+		       unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
+	unsigned long inbuf, outbuf, padding;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/*
+	 * Make sure the input buffer is PAGE_SIZE aligned.  This is assumed
+	 * since this driver is designed for page compression only (for now).
+	 * It also lets us use direct DDE(s) for the input, since the
+	 * alignment is guaranteed.
+	 */
+	inbuf = (unsigned long)in;
+	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata || !local_devdata->dev) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+	max_sync_size = local_devdata->max_sync_size;
+	dev = local_devdata->dev;
+
+	/* Create the header */
+	hdr = (struct nx842_header *)out;
+	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
+	hdrsize = nx842_header_size(hdr);
+	outbuf = (unsigned long)out + hdrsize;
+	bytesleft = *outlen - hdrsize;
+
+	/* Init scatterlist */
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_COMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = virt_to_abs(csbcpb);
+	op.out = virt_to_abs(slout.entries);
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/*
+		 * Aligning the output blocks to 128 bytes does waste space,
+		 * but it prevents the need for bounce buffers and memory
+		 * copies.  It also simplifies the code a lot.  In the worst
+		 * case (64k page, 4k max_sync_size), you lose up to
+		 * (128*16)/64k = ~3% of the compression factor. For 64k
+		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
+		 */
+		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
+		outbuf += padding;
+		bytesleft -= padding;
+		if (i == 0)
+			/* save offset into first block in header */
+			hdr->offset = padding + hdrsize;
+
+		if (bytesleft <= 0) {
+			ret = -ENOSPC;
+			goto unlock;
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.in = virt_to_abs(inbuf);
+			op.inlen = max_sync_size;
+
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
+			op.in = virt_to_abs(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
+		 * DDE is required for the outbuf.
+		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
+		 * also be page aligned (1 in 128/4k=32 chance) in order
+		 * to use a direct DDE.
+		 * This is unlikely, just use an indirect DDE always.
+		 */
+		nx842_build_scatterlist(outbuf,
+			min(bytesleft, max_sync_size), &slout);
+		/* op.out set before loop */
+		op.outlen = -nx842_get_scatterlist_size(&slout);
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			ret = -EIO;
+			goto unlock;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret && ret != -ENOSPC)
+			goto unlock;
+
+		/* Handle incompressible data */
+		if (unlikely(ret == -ENOSPC)) {
+			if (bytesleft < max_sync_size) {
+				/*
+				 * Not enough space left in the output buffer
+				 * to store uncompressed block
+				 */
+				goto unlock;
+			} else {
+				/* Store incompressible block */
+				memcpy((void *)outbuf, (void *)inbuf,
+					max_sync_size);
+				hdr->sizes[i] = -max_sync_size;
+				outbuf += max_sync_size;
+				bytesleft -= max_sync_size;
+				/* Reset ret, incompressible data handled */
+				ret = 0;
+			}
+		} else {
+			/* Normal case, compression was successful */
+			size = csbcpb->csb.processed_byte_count;
+			dev_dbg(dev, "%s: processed_bytes=%d\n",
+				__func__, size);
+			hdr->sizes[i] = size;
+			outbuf += size;
+			bytesleft -= size;
+		}
+
+		inbuf += max_sync_size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		nx842_inc_comp_failed(local_devdata);
+	else {
+		nx842_inc_comp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_compress);
+
+static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
+			const void *);
+
+/**
+ * nx842_decompress - Decompress data using the 842 algorithm
+ *
+ * Decompression provided by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is decompressed and the result is stored in the
+ * provided output buffer.  The size allocated to the output buffer is
+ * provided by the caller of this function in @outlen.  Upon return from
+ * this function @outlen contains the length of the decompressed data.
+ * If there is an error then @outlen will be 0 and an error will be
+ * specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
+ *      aligned
+ * @inlen: Length of input buffer
+ * @out: Pointer to output buffer, must be page aligned
+ * @outlen: Length of output buffer, must be PAGE_SIZE
+ * @wmem: Pointer to buffer for working memory, size determined by
+ *        nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENODEV	Hardware decompression device is unavailable
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is too small
+ *   -EINVAL	Bad input data encountered when attempting to decompress
+ *   -EIO	Internal error
+ */
+int nx842_decompress(const unsigned char *in, unsigned int inlen,
+			 unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, i, size, max_sync_size;
+	unsigned long inbuf, outbuf;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/* Ensure page alignment and size */
+	outbuf = (unsigned long)out;
+	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		dev = local_devdata->dev;
+
+	/* Get header */
+	hdr = (struct nx842_header *)in;
+
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+
+	inbuf = (unsigned long)in + hdr->offset;
+	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
+		/* Copy block(s) into bounce buffer for alignment */
+		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
+		inbuf = (unsigned long)workmem->bounce;
+	}
+
+	/* Init scatterlist */
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_DECOMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = virt_to_abs(csbcpb);
+
+	/*
+	 * max_sync_size may have changed since compression,
+	 * so we can't read it from the device info. We need
+	 * to derive it from hdr->blocks_nr.
+	 */
+	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/* Skip padding */
+		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
+
+		if (hdr->sizes[i] < 0) {
+			/* Negative sizes indicate uncompressed data blocks */
+			size = abs(hdr->sizes[i]);
+			memcpy((void *)outbuf, (void *)inbuf, size);
+			outbuf += size;
+			inbuf += size;
+			continue;
+		}
+
+		if (!dev)
+			goto sw;
+
+		/*
+		 * The better the compression, the more likely the "likely"
+		 * case becomes.
+		 */
+		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
+			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
+			/* Create direct DDE */
+			op.in = virt_to_abs(inbuf);
+			op.inlen = hdr->sizes[i];
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
+			op.in = virt_to_abs(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.out = virt_to_abs(outbuf);
+			op.outlen = max_sync_size;
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
+			op.out = virt_to_abs(slout.entries);
+			op.outlen = -nx842_get_scatterlist_size(&slout);
+		}
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			dev = NULL;
+			goto sw;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret) {
+			dev = NULL;
+			goto sw;
+		}
+
+		/* HW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += csbcpb->csb.processed_byte_count;
+		continue;
+
+sw:
+		/* software decompression */
+		size = max_sync_size;
+		ret = sw842_decompress(
+			(unsigned char *)inbuf, hdr->sizes[i],
+			(unsigned char *)outbuf, &size, wmem);
+		if (ret)
+			pr_debug("%s: sw842_decompress failed with %d\n",
+				__func__, ret);
+
+		if (ret) {
+			if (ret != -ENOSPC && ret != -EINVAL &&
+					ret != -EMSGSIZE)
+				ret = -EIO;
+			goto unlock;
+		}
+
+		/* SW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		/* decompress fail */
+		nx842_inc_decomp_failed(local_devdata);
+	else {
+		if (!dev)
+			/* software decompress */
+			nx842_inc_swdecomp(local_devdata);
+		nx842_inc_decomp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_decompress);
+
+/**
+ * nx842_OF_set_defaults -- Set default (disabled) values for devdata
+ *
+ * @devdata - struct nx842_devdata to update
+ *
+ * Returns:
+ *  0 on success
+ *  -ENOENT if @devdata ptr is NULL
+ */
+static int nx842_OF_set_defaults(struct nx842_devdata *devdata)
+{
+	if (devdata) {
+		devdata->max_sync_size = 0;
+		devdata->max_sync_sg = 0;
+		devdata->max_sg_len = 0;
+		devdata->status = UNAVAILABLE;
+		return 0;
+	} else
+		return -ENOENT;
+}
+
+/**
+ * nx842_OF_upd_status -- Update the device info from OF status prop
+ *
+ * The status property indicates if the accelerator is enabled.  If the
+ * device is in the OF tree it indicates that the hardware is present.
+ * The status field indicates if the device is enabled when the status
+ * is 'okay'.  Otherwise the device driver will be disabled.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property pointer containing the status for the update
+ *
+ * Returns:
+ *  0 - Always; whether the device is usable is recorded in
+ *      @devdata->status (AVAILABLE or UNAVAILABLE)
+ */
+static int nx842_OF_upd_status(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const char *status = (const char *)prop->value;
+
+	if (!strncmp(status, "okay", (size_t)prop->length)) {
+		devdata->status = AVAILABLE;
+	} else {
+		dev_info(devdata->dev, "%s: status '%s' is not 'okay'\n",
+				__func__, status);
+		devdata->status = UNAVAILABLE;
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsglen -- Update the device info from OF maxsglen prop
+ *
+ * Definition of the 'ibm,max-sg-len' OF property:
+ *  This field indicates the maximum byte length of a scatter list
+ *  for the platform facility. It is a single cell encoded as with encode-int.
+ *
+ * Example:
+ *  # od -x ibm,max-sg-len
+ *  0000000 0000 0ff0
+ *
+ *  In this example, the maximum byte length of a scatter list is
+ *  0x0ff0 (4,080).
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property pointer containing the max-sg-len for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsglen(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const int *maxsglen = prop->value;
+
+	if (prop->length != sizeof(*maxsglen)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sg-len property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sg-len is %d bytes long, expected %lu bytes\n", __func__,
+				prop->length, sizeof(*maxsglen));
+		ret = -EINVAL;
+	} else {
+		devdata->max_sg_len = (unsigned int)min(*maxsglen,
+				(int)NX842_HW_PAGE_SIZE);
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsyncop -- Update the device info from OF maxsyncop prop
+ *
+ * Definition of the 'ibm,max-sync-cop' OF property:
+ *  Two series of cells.  The first series of cells represents the maximums
+ *  that can be synchronously compressed. The second series of cells
+ *  represents the maximums that can be synchronously decompressed.
+ *  1. The first cell in each series contains the count of the number of
+ *     (data length, scatter list elements) pairs that follow, each being
+ *     of the form
+ *    a. One cell data byte length
+ *    b. One cell total number of scatter list elements
+ *
+ * Example:
+ *  # od -x ibm,max-sync-cop
+ *  0000000 0000 0001 0000 1000 0000 01fe 0000 0001
+ *  0000020 0000 1000 0000 01fe
+ *
+ *  In this example, compression supports 0x1000 (4,096) data byte length
+ *  and 0x1fe (510) total scatter list elements.  Decompression supports
+ *  0x1000 (4,096) data byte length and 0x1fe (510) total scatter list
+ *  elements.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property pointer containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const struct maxsynccop_t {
+		int comp_elements;
+		int comp_data_limit;
+		int comp_sg_limit;
+		int decomp_elements;
+		int decomp_data_limit;
+		int decomp_sg_limit;
+	} *maxsynccop;
+
+	if (prop->length != sizeof(*maxsynccop)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sync-cop property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sync-cop is %d bytes long, expected %zu bytes\n", __func__, prop->length,
+				sizeof(*maxsynccop));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	maxsynccop = (const struct maxsynccop_t *)prop->value;
+
+	/* Use one limit rather than separate limits for compression and
+	 * decompression. Cap it at 64K so as not to exceed the size that
+	 * the header can support, and reject values below the 4K driver
+	 * minimum. */
+	devdata->max_sync_size =
+			(unsigned int)min(maxsynccop->comp_data_limit,
+					maxsynccop->decomp_data_limit);
+
+	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
+					SIZE_64K);
+
+	if (devdata->max_sync_size < SIZE_4K) {
+		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_size);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
+						maxsynccop->decomp_sg_limit);
+	if (devdata->max_sync_sg < 1) {
+		dev_err(devdata->dev, "%s: hardware max sg size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_sg);
+		ret = -EINVAL;
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
+/**
+ *
+ * nx842_OF_upd -- Handle OF properties updates for the device.
+ *
+ * Set all properties from the OF tree.  Optionally, a new property
+ * can be provided by the @new_prop pointer to overwrite an existing value.
+ * The device will remain disabled until all values are valid; this function
+ * will return an error for updates unless all values are valid.
+ *
+ * @new_prop: If not NULL, this property is being updated.  If NULL, update
+ *  all properties from the current values in the OF tree.
+ *
+ * Returns:
+ *  0 - Success
+ *  -ENOMEM - Could not allocate memory for new devdata structure
+ *  -EINVAL - property value not found or property value is not valid.
+ *	An unrecognized @new_prop is ignored and 0 is returned.
+ *  -ENODEV - Device is not available
+ */
+static int nx842_OF_upd(struct property *new_prop)
+{
+	struct nx842_devdata *old_devdata = NULL;
+	struct nx842_devdata *new_devdata = NULL;
+	struct device_node *of_node = NULL;
+	struct property *status = NULL;
+	struct property *maxsglen = NULL;
+	struct property *maxsyncop = NULL;
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	if (old_devdata)
+		of_node = old_devdata->dev->of_node;
+
+	if (!old_devdata || !of_node) {
+		pr_err("%s: device is not available\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		return -ENODEV;
+	}
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_ATOMIC);
+	if (!new_devdata) {
+		dev_err(old_devdata->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_out;
+	}
+
+	memcpy(new_devdata, old_devdata, sizeof(*old_devdata));
+	new_devdata->counters = old_devdata->counters;
+
+	/* Set ptrs for existing properties */
+	status = of_find_property(of_node, "status", NULL);
+	maxsglen = of_find_property(of_node, "ibm,max-sg-len", NULL);
+	maxsyncop = of_find_property(of_node, "ibm,max-sync-cop", NULL);
+	if (!status || !maxsglen || !maxsyncop) {
+		dev_err(old_devdata->dev, "%s: Could not locate device properties\n", __func__);
+		ret = -EINVAL;
+		goto error_out;
+	}
+
+	/* Set ptr to new property if provided */
+	if (new_prop) {
+		/* Single property */
+		if (!strcmp(new_prop->name, "status")) {
+			status = new_prop;
+
+		} else if (!strcmp(new_prop->name,
+					"ibm,max-sg-len")) {
+			maxsglen = new_prop;
+
+		} else if (!strcmp(new_prop->name,
+					"ibm,max-sync-cop")) {
+			maxsyncop = new_prop;
+
+		} else {
+			/*
+			 * Skip the update, the property being updated
+			 * has no impact.
+			 */
+			goto out;
+		}
+	}
+
+	/* Perform property updates */
+	ret = nx842_OF_upd_status(new_devdata, status);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsglen(new_devdata, maxsglen);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsyncop(new_devdata, maxsyncop);
+	if (ret)
+		goto error_out;
+
+out:
+	dev_info(old_devdata->dev, "%s: max_sync_size new:%u old:%u\n",
+			__func__, new_devdata->max_sync_size,
+			old_devdata->max_sync_size);
+	dev_info(old_devdata->dev, "%s: max_sync_sg new:%u old:%u\n",
+			__func__, new_devdata->max_sync_sg,
+			old_devdata->max_sync_sg);
+	dev_info(old_devdata->dev, "%s: max_sg_len new:%u old:%u\n",
+			__func__, new_devdata->max_sg_len,
+			old_devdata->max_sg_len);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(new_devdata->dev, new_devdata);
+	kfree(old_devdata);
+	return 0;
+
+error_out:
+	if (new_devdata) {
+		dev_info(old_devdata->dev, "%s: device disabled\n", __func__);
+		nx842_OF_set_defaults(new_devdata);
+		rcu_assign_pointer(devdata, new_devdata);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		synchronize_rcu();
+		dev_set_drvdata(new_devdata->dev, new_devdata);
+		kfree(old_devdata);
+	} else {
+		dev_err(old_devdata->dev, "%s: could not update driver from hardware\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+	}
+
+	if (!ret)
+		ret = -EINVAL;
+	return ret;
+}
+
+/**
+ * nx842_OF_notifier - Process updates to OF properties for the device
+ *
+ * @np: notifier block
+ * @action: notifier action
+ * @update: struct pSeries_reconfig_prop_update pointer if action is
+ *	PSERIES_UPDATE_PROPERTY
+ *
+ * Returns:
+ *	NOTIFY_OK always. Errors from nx842_OF_upd() are not
+ *	propagated to the notifier chain; nx842_OF_upd() logs them
+ *	and, when it can, marks the device disabled.
+ */
+static int nx842_OF_notifier(struct notifier_block *np,
+					unsigned long action,
+					void *update)
+{
+	struct pSeries_reconfig_prop_update *upd;
+	struct nx842_devdata *local_devdata;
+	struct device_node *node = NULL;
+
+	upd = (struct pSeries_reconfig_prop_update *)update;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		node = local_devdata->dev->of_node;
+
+	if (local_devdata && node &&
+			action == PSERIES_UPDATE_PROPERTY &&
+			!strcmp(upd->node->name, node->name)) {
+		rcu_read_unlock();
+		nx842_OF_upd(upd->property);
+	} else
+		rcu_read_unlock();
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block nx842_of_nb = {
+	.notifier_call = nx842_OF_notifier,
+};
+
+#define nx842_counter_read(_name)					\
+static ssize_t nx842_##_name##_show(struct device *dev,		\
+		struct device_attribute *attr,				\
+		char *buf) {						\
+	struct nx842_devdata *local_devdata;			\
+	int p = 0;							\
+	rcu_read_lock();						\
+	local_devdata = rcu_dereference(devdata);			\
+	if (local_devdata)						\
+		p = snprintf(buf, PAGE_SIZE, "%ld\n",			\
+		       atomic64_read(&local_devdata->counters->_name));	\
+	rcu_read_unlock();						\
+	return p;							\
+}
+
+#define NX842DEV_COUNTER_ATTR_RO(_name)					\
+	nx842_counter_read(_name);					\
+	static struct device_attribute dev_attr_##_name = __ATTR(_name,	\
+						0444,			\
+						nx842_##_name##_show,\
+						NULL);
+
+NX842DEV_COUNTER_ATTR_RO(comp_complete);
+NX842DEV_COUNTER_ATTR_RO(comp_failed);
+NX842DEV_COUNTER_ATTR_RO(decomp_complete);
+NX842DEV_COUNTER_ATTR_RO(decomp_failed);
+NX842DEV_COUNTER_ATTR_RO(swdecomp);
+
+static ssize_t nx842_timehist_show(struct device *,
+		struct device_attribute *, char *);
+
+static struct device_attribute dev_attr_comp_times = __ATTR(comp_times, 0444,
+		nx842_timehist_show, NULL);
+static struct device_attribute dev_attr_decomp_times = __ATTR(decomp_times,
+		0444, nx842_timehist_show, NULL);
+
+static ssize_t nx842_timehist_show(struct device *dev,
+		struct device_attribute *attr, char *buf) {
+	char *p = buf;
+	struct nx842_devdata *local_devdata;
+	atomic64_t *times;
+	int bytes_remain = PAGE_SIZE;
+	int bytes;
+	int i;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	if (attr == &dev_attr_comp_times)
+		times = local_devdata->counters->comp_times;
+	else if (attr == &dev_attr_decomp_times)
+		times = local_devdata->counters->decomp_times;
+	else {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
+		bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
+			       i ? (2<<(i-1)) : 0, (2<<i)-1,
+			       atomic64_read(&times[i]));
+		bytes_remain -= bytes;
+		p += bytes;
+	}
+	/* The last bucket holds everything over
+	 * 2<<(NX842_HIST_SLOTS - 2) us */
+	bytes = snprintf(p, bytes_remain, "%uus - :\t%ld\n",
+			2<<(NX842_HIST_SLOTS - 2),
+			atomic64_read(&times[(NX842_HIST_SLOTS - 1)]));
+	p += bytes;
+
+	rcu_read_unlock();
+	return p - buf;
+}
+
+static struct attribute *nx842_sysfs_entries[] = {
+	&dev_attr_comp_complete.attr,
+	&dev_attr_comp_failed.attr,
+	&dev_attr_decomp_complete.attr,
+	&dev_attr_decomp_failed.attr,
+	&dev_attr_swdecomp.attr,
+	&dev_attr_comp_times.attr,
+	&dev_attr_decomp_times.attr,
+	NULL,
+};
+
+static struct attribute_group nx842_attribute_group = {
+	.name = NULL,		/* put in device directory */
+	.attrs = nx842_sysfs_entries,
+};
+
+static int __init nx842_probe(struct vio_dev *viodev,
+				  const struct vio_device_id *id)
+{
+	struct nx842_devdata *old_devdata, *new_devdata = NULL;
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+
+	if (old_devdata && old_devdata->vdev != NULL) {
+		dev_err(&viodev->dev, "%s: Attempt to register more than one instance of the hardware\n", __func__);
+		ret = -1;
+		goto error_unlock;
+	}
+
+	dev_set_drvdata(&viodev->dev, NULL);
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_ATOMIC);
+	if (!new_devdata) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->counters = kzalloc(sizeof(*new_devdata->counters),
+			GFP_ATOMIC);
+	if (!new_devdata->counters) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for performance counters\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->vdev = viodev;
+	new_devdata->dev = &viodev->dev;
+	nx842_OF_set_defaults(new_devdata);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	kfree(old_devdata);
+
+	pSeries_reconfig_notifier_register(&nx842_of_nb);
+
+	ret = nx842_OF_upd(NULL);
+	if (ret && ret != -ENODEV) {
+		dev_err(&viodev->dev, "could not parse device tree. %d\n", ret);
+		ret = -1;
+		goto error;
+	}
+
+	rcu_read_lock();
+	if (dev_set_drvdata(&viodev->dev, rcu_dereference(devdata))) {
+		rcu_read_unlock();
+		dev_err(&viodev->dev, "failed to set driver data for device\n");
+		ret = -1;
+		goto error;
+	}
+	rcu_read_unlock();
+
+	if (sysfs_create_group(&viodev->dev.kobj, &nx842_attribute_group)) {
+		dev_err(&viodev->dev, "could not create sysfs device attributes\n");
+		ret = -1;
+		goto error;
+	}
+
+	return 0;
+
+error_unlock:
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	if (new_devdata)
+		kfree(new_devdata->counters);
+	kfree(new_devdata);
+error:
+	return ret;
+}
+
+static int __exit nx842_remove(struct vio_dev *viodev)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Removing IBM Power 842 compression device\n");
+	sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	pSeries_reconfig_notifier_unregister(&nx842_of_nb);
+	rcu_assign_pointer(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(&viodev->dev, NULL);
+	if (old_devdata)
+		kfree(old_devdata->counters);
+	kfree(old_devdata);
+	return 0;
+}
+
+static struct vio_device_id nx842_driver_ids[] = {
+	{"ibm,compression-v1", "ibm,compression"},
+	{"", ""},
+};
+
+static struct vio_driver nx842_driver = {
+	.name = MODULE_NAME,
+	.probe = nx842_probe,
+	.remove = nx842_remove,
+	.get_desired_dma = nx842_get_desired_dma,
+	.id_table = nx842_driver_ids,
+};
+
+static int __init nx842_init(void)
+{
+	struct nx842_devdata *new_devdata;
+	pr_info("Registering IBM Power 842 compression driver\n");
+
+	RCU_INIT_POINTER(devdata, NULL);
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
+	if (!new_devdata) {
+		pr_err("Could not allocate memory for device data\n");
+		return -ENOMEM;
+	}
+	new_devdata->status = UNAVAILABLE;
+	RCU_INIT_POINTER(devdata, new_devdata);
+
+	return vio_register_driver(&nx842_driver);
+}
+
+module_init(nx842_init);
+
+static void __exit nx842_exit(void)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Exiting IBM Power 842 compression driver\n");
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	rcu_assign_pointer(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	if (old_devdata)
+		dev_set_drvdata(old_devdata->dev, NULL);
+	kfree(old_devdata);
+	vio_unregister_driver(&nx842_driver);
+}
+
+module_exit(nx842_exit);
+
+/*********************************
+ * 842 software decompressor
+*********************************/
+typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+static int sw842_data8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+/* special templates */
+#define SW842_TMPL_REPEAT 0x1B
+#define SW842_TMPL_ZEROS 0x1C
+#define SW842_TMPL_EOF 0x1E
+
+static sw842_template_op sw842_tmpl_ops[26][4] = {
+	{ sw842_data8, NULL}, /* 0 (00000) */
+	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr4,  NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
+	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
+	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr4,  NULL},
+	{ sw842_ptr8,  NULL}
+};
+
+/* Software decompress helpers */
+
+static uint8_t sw842_get_byte(const char *buf, int bit)
+{
+	uint8_t tmpl;
+	uint16_t tmp;
+	tmp = htons(*(uint16_t *)(buf));
+	tmp = (uint16_t)(tmp << bit);
+	tmp = ntohs(tmp);
+	memcpy(&tmpl, &tmp, 1);
+	return tmpl;
+}
+
+static uint8_t sw842_get_template(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 3;
+	byte &= 0x1F;
+	*buf += (*bit + 5) / 8;
+	*bit = (*bit + 5) % 8;
+	return byte;
+}
+
+/* repeat_count is a 6-bit field (one bit wider than the template) */
+static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 2;
+	byte &= 0x3F;
+	*buf += (*bit + 6) / 8;
+	*bit = (*bit + 6) % 8;
+	return byte;
+}
+
+static uint8_t sw842_get_ptr2(const char **buf, int *bit)
+{
+	uint8_t ptr;
+	ptr = sw842_get_byte(*buf, *bit);
+	(*buf)++;
+	return ptr;
+}
+
+static uint16_t sw842_get_ptr4(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = htons(*(uint16_t *)(*buf));
+	ptr = (uint16_t)(ptr << *bit);
+	ptr = ptr >> 7;
+	ptr &= 0x01FF;
+	*buf += (*bit + 9) / 8;
+	*bit = (*bit + 9) % 8;
+	return ptr;
+}
+
+static uint8_t sw842_get_ptr8(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	return sw842_get_ptr2(buf, bit);
+}
+
+/* Software decompress template ops */
+
+static int sw842_data8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	return 0;
+}
+
+static int sw842_ptr8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f8_count))
+		return 1;
+	memcpy(*outbuf, fifo->f8[ptr], 8);
+	*outbuf += 8;
+	return 0;
+}
+
+static int sw842_ptr4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f4_count))
+		return 1;
+	memcpy(*outbuf, fifo->f4[ptr], 4);
+	*outbuf += 4;
+	return 0;
+}
+
+static int sw842_ptr2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr2(inbuf, inbit);
+	if (!fifo->f2_full && (ptr >= fifo->f2_count))
+		return 1;
+	memcpy(*outbuf, fifo->f2[ptr], 2);
+	*outbuf += 2;
+	return 0;
+}
+
+static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
+{
+	unsigned char initial_f2count = fifo->f2_count;
+
+	memcpy(fifo->f8[fifo->f8_count], buf, 8);
+	fifo->f4_count += 2;
+	fifo->f8_count += 1;
+
+	if (!fifo->f84_full && fifo->f4_count >= 512) {
+		fifo->f84_full = 1;
+		fifo->f4_count /= 512;
+	}
+
+	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
+	if (fifo->f2_count < initial_f2count)
+		fifo->f2_full = 1;
+}
+
+static int sw842_decompress(const unsigned char *src, int srclen,
+			unsigned char *dst, int *destlen,
+			const void *wrkmem)
+{
+	uint8_t tmpl;
+	const char *inbuf;
+	int inbit = 0;
+	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
+	const char *inbuf_end;
+	sw842_template_op op;
+	int opindex;
+	int i, repeat_count;
+	struct sw842_fifo *fifo;
+	int ret = 0;
+
+	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
+	memset(fifo, 0, sizeof(*fifo));
+
+	origbuf = NULL;
+	inbuf = src;
+	inbuf_end = src + srclen;
+	outbuf = dst;
+	outbuf_end = dst + *destlen;
+
+	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
+		if (inbuf >= inbuf_end) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		opindex = 0;
+		prevbuf = origbuf;
+		origbuf = outbuf;
+		switch (tmpl) {
+		case SW842_TMPL_REPEAT:
+			if (prevbuf == NULL) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			repeat_count = sw842_get_repeat_count(&inbuf,
+								&inbit) + 1;
+
+			/* Did the repeat count advance past the end of input */
+			if (inbuf > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			for (i = 0; i < repeat_count; i++) {
+				/* Would this overflow the output buffer */
+				if ((outbuf + 8) > outbuf_end) {
+					ret = -ENOSPC;
+					goto out;
+				}
+
+				memcpy(outbuf, prevbuf, 8);
+				sw842_copy_to_fifo(outbuf, fifo);
+				outbuf += 8;
+			}
+			break;
+
+		case SW842_TMPL_ZEROS:
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			memset(outbuf, 0, 8);
+			sw842_copy_to_fifo(outbuf, fifo);
+			outbuf += 8;
+			break;
+
+		default:
+			if (tmpl > 25) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Does this go past the end of the input buffer */
+			if ((inbuf + 2) > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			while (opindex < 4 &&
+				(op = sw842_tmpl_ops[tmpl][opindex++])
+					!= NULL) {
+				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
+				if (ret) {
+					ret = -EINVAL;
+					goto out;
+				}
+				sw842_copy_to_fifo(origbuf, fifo);
+			}
+		}
+	}
+
+out:
+	if (!ret)
+		*destlen = (unsigned int)(outbuf - dst);
+	else
+		*destlen = 0;
+
+	return ret;
+}
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
new file mode 100644
index 0000000..a4d324c
--- /dev/null
+++ b/include/linux/nx842.h
@@ -0,0 +1,11 @@
+#ifndef __NX842_H__
+#define __NX842_H__
+
+int nx842_get_workmem_size(void);
+int nx842_get_workmem_size_aligned(void);
+int nx842_compress(const unsigned char *in, unsigned int in_len,
+		unsigned char *out, unsigned int *out_len, void *wrkmem);
+int nx842_decompress(const unsigned char *in, unsigned int in_len,
+		unsigned char *out, unsigned int *out_len, void *wrkmem);
+
+#endif
-- 
1.7.9.5

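The `ibm,max-sync-cop` layout documented in nx842_OF_upd_maxsyncop() above can be exercised outside the kernel. The following is a minimal user-space sketch, not driver code. It assumes the property bytes are big-endian 32-bit cells, as in the `od -x` example. Like the patch, it validates only the total length (six cells) and ignores the two count cells. `struct sync_limits`, `be32_cell()`, `parse_max_sync_cop()` and the `SKETCH_*` constants are hypothetical names. The cells are reduced the same way the driver reduces them:

- the data limit is the smaller of the two data limits, capped at 64K and rejected below 4K;
- the scatter-gather limit is the smaller of the two scatter-gather limits, rejected below 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits mirroring max_sync_size / max_sync_sg in devdata. */
struct sync_limits {
	unsigned int max_sync_size;
	unsigned int max_sync_sg;
};

#define SKETCH_SIZE_4K  4096u   /* driver minimum for max_sync_size */
#define SKETCH_SIZE_64K 65536u  /* cap applied by the driver */

/* Decode one big-endian device-tree cell, independent of host order. */
static uint32_t be32_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Layout assumed: [comp_count][comp_len][comp_sg]
 *                 [decomp_count][decomp_len][decomp_sg]
 * Returns 0 on success, -1 if the property is malformed or unusable.
 */
static int parse_max_sync_cop(const uint8_t *prop, size_t len,
			      struct sync_limits *out)
{
	uint32_t comp_len, comp_sg, decomp_len, decomp_sg, size, sg;

	if (len != 6 * sizeof(uint32_t))
		return -1;

	comp_len   = be32_cell(prop + 4);
	comp_sg    = be32_cell(prop + 8);
	decomp_len = be32_cell(prop + 16);
	decomp_sg  = be32_cell(prop + 20);

	/* One data limit for both directions, capped at 64K. */
	size = comp_len < decomp_len ? comp_len : decomp_len;
	if (size > SKETCH_SIZE_64K)
		size = SKETCH_SIZE_64K;
	if (size < SKETCH_SIZE_4K)
		return -1;

	/* One scatter-gather limit for both directions. */
	sg = comp_sg < decomp_sg ? comp_sg : decomp_sg;
	if (sg < 1)
		return -1;

	out->max_sync_size = size;
	out->max_sync_sg = sg;
	return 0;
}
```

Given the bytes from the `od -x` example, this yields a max_sync_size of 4,096 and a max_sync_sg of 510. Decoding the cells explicitly as big-endian keeps the sketch host-independent. The driver instead overlays a struct of native ints on the property, which relies on a big-endian host.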

* [PATCH 0/4] powerpc/crypto: IBM Power7+ in-Nest compression support
From: Seth Jennings @ 2012-07-19 14:42 UTC
  To: Benjamin Herrenschmidt
  Cc: Seth Jennings, Kent Yoder, Herbert Xu, Greg Kroah-Hartman,
	linux-kernel, Paul Mackerras, Jeff Kirsher, Andrew Morton,
	Robert Jennings, linuxppc-dev, David S. Miller, linux-crypto

This is a continuation of support for the Power7+ in-Nest
hardware accelerator.

https://lkml.org/lkml/2012/4/12/223

This patchset adds the hardware driver and the cryptographic
driver for hardware accelerated compression, which uses a
hardware-optimized algorithm named 842.

The hardware driver places limits on general-purpose compression
and is geared toward compressing PAGE_SIZE units for in-kernel
memory compression.
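Patch 3's software decompressor reads a bit stream that is not byte-aligned. Templates are 5-bit codes (sw842_get_template()) and repeat counts are 6-bit fields (sw842_get_repeat_count()). Both helpers work the same way:

1. load two bytes;
2. shift the 16-bit window left by the current bit offset;
3. keep the top bits;
4. advance the byte pointer and bit offset.

The following is a minimal standalone sketch of that technique. It uses a hypothetical `read_bits()` that generalizes both helpers to any width from 1 to 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read `width` bits (1..8), most significant bit first, starting `*bit`
 * bits into `*buf`, then advance the cursor past them. Like the driver's
 * sw842_get_byte(), this always loads two bytes, so buf[1] must be
 * readable even when the field fits in buf[0].
 */
static uint8_t read_bits(const uint8_t **buf, int *bit, int width)
{
	uint16_t window = (uint16_t)(((*buf)[0] << 8) | (*buf)[1]);
	uint8_t val;

	window = (uint16_t)(window << *bit);      /* drop consumed bits */
	val = (uint8_t)(window >> (16 - width));  /* keep the top bits */

	*buf += (*bit + width) / 8;
	*bit = (*bit + width) % 8;
	return val;
}
```

Reading a 5-bit template and then a 6-bit count from the bytes 0xDE 0xAD 0xBE yields 0x1B (the value of SW842_TMPL_REPEAT in the patch) and 0x35. The cursor is left three bits into the second byte.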

Based on linux-next (20120717)

Seth Jennings (4):
  powerpc: nx: rework Kconfig
  powerpc: nx: add compression support to arch vec
  powerpc: nx: add 842 hardware compression driver
  crypto: add 842 crypto driver

 MAINTAINERS                     |    6 +
 arch/powerpc/kernel/prom_init.c |    4 +-
 crypto/842.c                    |  183 +++++
 crypto/Kconfig                  |    9 +
 crypto/Makefile                 |    1 +
 drivers/crypto/Kconfig          |   20 +-
 drivers/crypto/nx/Kconfig       |   26 +
 drivers/crypto/nx/Makefile      |    5 +-
 drivers/crypto/nx/nx-842.c      | 1615 +++++++++++++++++++++++++++++++++++++++
 include/linux/nx842.h           |   11 +
 10 files changed, 1864 insertions(+), 16 deletions(-)
 create mode 100644 crypto/842.c
 create mode 100644 drivers/crypto/nx/Kconfig
 create mode 100644 drivers/crypto/nx/nx-842.c
 create mode 100644 include/linux/nx842.h

-- 
1.7.9.5


* [PATCH 4/4] powerpc/crypto: add 842 crypto driver
From: Seth Jennings @ 2012-07-19 14:42 UTC
  To: Benjamin Herrenschmidt
  Cc: Seth Jennings, Kent Yoder, Herbert Xu, Greg Kroah-Hartman,
	linux-kernel, Paul Mackerras, Jeff Kirsher, Andrew Morton,
	Robert Jennings, linuxppc-dev, David S. Miller, linux-crypto
In-Reply-To: <1342708961-28587-1-git-send-email-sjenning@linux.vnet.ibm.com>

This patch adds the 842 cryptographic API driver that
submits compression requests to the 842 hardware compression
accelerator driver (nx-compress).

If the hardware accelerator goes offline for any reason
(dynamic disable, migration, etc.), this driver falls back to
LZO in software for compression requests, and re-enables the
hardware path after one second to check whether the accelerator
has returned. For decompression requests, the 842 hardware
driver contains a software implementation of the 842
decompressor to support the decompression of data that was
compressed before the accelerator went offline.
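The header framing can be modeled in user space. It is what lets decompression pick the right algorithm and what makes the fallback flag sticky. This sketch mirrors `struct nx842_crypto_header` and the `nx842_uselzo` flag from crypto/842.c. The codecs are hypothetical stand-ins: `stub_hw_compress()` simply copies the input and fails when the hypothetical `hw_online` flag is cleared. The one-second timer that clears the flag in the driver is left out. Unlike the patch, the sketch also checks that buffers are large enough to hold the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors struct nx842_crypto_header from crypto/842.c. */
enum sketch_type { SKETCH_TYPE_842, SKETCH_TYPE_LZO };
#define SKETCH_SENTINEL 0xdeadbeefu

struct sketch_header {
	unsigned int sentinel;
	enum sketch_type type;
};

static int hw_online = 1;  /* hypothetical: is the accelerator usable? */
static int use_fallback;   /* plays the role of nx842_uselzo */

/* Hypothetical codec stand-in: stores the input uncompressed. */
static int stub_copy(const unsigned char *src, size_t slen,
		     unsigned char *dst, size_t *dlen)
{
	if (slen > *dlen)
		return -1;
	memcpy(dst, src, slen);
	*dlen = slen;
	return 0;
}

static int stub_hw_compress(const unsigned char *src, size_t slen,
			    unsigned char *dst, size_t *dlen)
{
	return hw_online ? stub_copy(src, slen, dst, dlen) : -1;
}

static int framed_compress(const unsigned char *src, size_t slen,
			   unsigned char *dst, size_t *dlen)
{
	struct sketch_header hdr = { SKETCH_SENTINEL, SKETCH_TYPE_842 };
	size_t room, out_len;

	if (*dlen < sizeof(hdr))
		return -1;
	room = *dlen - sizeof(hdr);

	if (!use_fallback) {
		out_len = room;
		if (stub_hw_compress(src, slen, dst + sizeof(hdr), &out_len) == 0)
			goto done;
		use_fallback = 1;  /* the driver also arms a 1 s timer to clear this */
	}

	hdr.type = SKETCH_TYPE_LZO;  /* software path (LZO in the driver) */
	out_len = room;
	if (stub_copy(src, slen, dst + sizeof(hdr), &out_len) != 0)
		return -1;
done:
	memcpy(dst, &hdr, sizeof(hdr));
	*dlen = out_len + sizeof(hdr);
	return 0;
}

/* Returns the payload type, or -1 if the buffer has no valid header. */
static int framed_type(const unsigned char *src, size_t slen)
{
	struct sketch_header hdr;

	if (slen < sizeof(hdr))
		return -1;
	memcpy(&hdr, src, sizeof(hdr));
	if (hdr.sentinel != SKETCH_SENTINEL)
		return -1;
	if (hdr.type != SKETCH_TYPE_842 && hdr.type != SKETCH_TYPE_LZO)
		return -1;
	return (int)hdr.type;
}
```

With these stubs, the first request is tagged SKETCH_TYPE_842. Once the hardware stub fails, that request and every later one is tagged SKETCH_TYPE_LZO. This continues until something clears use_fallback, which is the role of failover_timer in the patch.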

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 crypto/842.c    |  183 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 crypto/Kconfig  |    9 +++
 crypto/Makefile |    1 +
 3 files changed, 193 insertions(+)
 create mode 100644 crypto/842.c

diff --git a/crypto/842.c b/crypto/842.c
new file mode 100644
index 0000000..144767d
--- /dev/null
+++ b/crypto/842.c
@@ -0,0 +1,183 @@
+/*
+ * Cryptographic API for the 842 compression algorithm.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) IBM Corporation, 2011
+ *
+ * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <linux/vmalloc.h>
+#include <linux/nx842.h>
+#include <linux/lzo.h>
+#include <linux/timer.h>
+
+static int nx842_uselzo;
+
+struct nx842_ctx {
+	void *nx842_wmem; /* working memory for 842/lzo */
+};
+
+enum nx842_crypto_type {
+	NX842_CRYPTO_TYPE_842,
+	NX842_CRYPTO_TYPE_LZO
+};
+
+#define NX842_SENTINEL 0xdeadbeef
+
+struct nx842_crypto_header {
+	unsigned int sentinel; /* debug */
+	enum nx842_crypto_type type;
+};
+
+static int nx842_init(struct crypto_tfm *tfm)
+{
+	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
+	int wmemsize;
+
+	wmemsize = max_t(int, nx842_get_workmem_size(), LZO1X_MEM_COMPRESS);
+	ctx->nx842_wmem = kmalloc(wmemsize, GFP_NOFS);
+	if (!ctx->nx842_wmem)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void nx842_exit(struct crypto_tfm *tfm)
+{
+	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	kfree(ctx->nx842_wmem);
+}
+
+static void nx842_reset_uselzo(unsigned long data)
+{
+	nx842_uselzo = 0;
+}
+
+static DEFINE_TIMER(failover_timer, nx842_reset_uselzo, 0, 0);
+
+static int nx842_crypto_compress(struct crypto_tfm *tfm, const u8 *src,
+			    unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct nx842_crypto_header *hdr;
+	unsigned int tmp_len = *dlen;
+	size_t lzodlen; /* needed for lzo */
+	int err;
+
+	*dlen = 0;
+	hdr = (struct nx842_crypto_header *)dst;
+	hdr->sentinel = NX842_SENTINEL; /* debug */
+	dst += sizeof(struct nx842_crypto_header);
+	tmp_len -= sizeof(struct nx842_crypto_header);
+	lzodlen = tmp_len;
+
+	if (likely(!nx842_uselzo)) {
+		err = nx842_compress(src, slen, dst, &tmp_len, ctx->nx842_wmem);
+
+		if (likely(!err)) {
+			hdr->type = NX842_CRYPTO_TYPE_842;
+			*dlen = tmp_len + sizeof(struct nx842_crypto_header);
+			return 0;
+		}
+
+		/* hardware failed */
+		nx842_uselzo = 1;
+
+		/* set timer to check for hardware again in 1 second */
+		mod_timer(&failover_timer, jiffies + msecs_to_jiffies(1000));
+	}
+
+	/* no hardware, use lzo */
+	err = lzo1x_1_compress(src, slen, dst, &lzodlen, ctx->nx842_wmem);
+	if (err != LZO_E_OK)
+		return -EINVAL;
+
+	hdr->type = NX842_CRYPTO_TYPE_LZO;
+	*dlen = lzodlen + sizeof(struct nx842_crypto_header);
+	return 0;
+}
+
+static int nx842_crypto_decompress(struct crypto_tfm *tfm, const u8 *src,
+			      unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct nx842_crypto_header *hdr;
+	unsigned int tmp_len = *dlen;
+	size_t lzodlen; /* needed for lzo */
+	int err;
+
+	*dlen = 0;
+	hdr = (struct nx842_crypto_header *)src;
+
+	if (unlikely(slen < sizeof(*hdr) || hdr->sentinel != NX842_SENTINEL))
+		return -EINVAL;
+
+	src += sizeof(struct nx842_crypto_header);
+	slen -= sizeof(struct nx842_crypto_header);
+
+	if (likely(hdr->type == NX842_CRYPTO_TYPE_842)) {
+		err = nx842_decompress(src, slen, dst, &tmp_len,
+			ctx->nx842_wmem);
+		if (err)
+			return -EINVAL;
+		*dlen = tmp_len;
+	} else if (hdr->type == NX842_CRYPTO_TYPE_LZO) {
+		lzodlen = tmp_len;
+		err = lzo1x_decompress_safe(src, slen, dst, &lzodlen);
+		if (err != LZO_E_OK)
+			return -EINVAL;
+		*dlen = lzodlen;
+	} else
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct crypto_alg alg = {
+	.cra_name		= "842",
+	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
+	.cra_ctxsize		= sizeof(struct nx842_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(alg.cra_list),
+	.cra_init		= nx842_init,
+	.cra_exit		= nx842_exit,
+	.cra_u			= { .compress = {
+	.coa_compress		= nx842_crypto_compress,
+	.coa_decompress		= nx842_crypto_decompress } }
+};
+
+static int __init nx842_mod_init(void)
+{
+	return crypto_register_alg(&alg);
+}
+
+static void __exit nx842_mod_exit(void)
+{
+	crypto_unregister_alg(&alg);
+	del_timer_sync(&failover_timer);
+}
+
+module_init(nx842_mod_init);
+module_exit(nx842_mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("842 Compression Algorithm");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index a323805..7876358 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1008,6 +1008,15 @@ config CRYPTO_LZO
 	help
 	  This is the LZO algorithm.
 
+config CRYPTO_842
+	tristate "842 compression algorithm"
+	depends on CRYPTO_DEV_NX_COMPRESS
+	# 842 uses lzo if the hardware becomes unavailable
+	select LZO_COMPRESS
+	select LZO_DECOMPRESS
+	help
+	  This is the 842 algorithm.
+
 comment "Random Number Generation"
 
 config CRYPTO_ANSI_CPRNG
diff --git a/crypto/Makefile b/crypto/Makefile
index 30f33d6..5d5675c 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
 obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o
 obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o authencesn.o
 obj-$(CONFIG_CRYPTO_LZO) += lzo.o
+obj-$(CONFIG_CRYPTO_842) += 842.o
 obj-$(CONFIG_CRYPTO_RNG2) += rng.o
 obj-$(CONFIG_CRYPTO_RNG2) += krng.o
 obj-$(CONFIG_CRYPTO_ANSI_CPRNG) += ansi_cprng.o
-- 
1.7.9.5
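The decompress path in this patch validates a header sentinel, skips the header, and then dispatches on a type field so that data compressed by the LZO software fallback can still be decoded. The userspace sketch below models only that dispatch. The header layout, the sentinel and type values, and the copy-only stub codec are hypothetical stand-ins for struct nx842_crypto_header, NX842_SENTINEL, NX842_CRYPTO_TYPE_* and the real codecs, none of which are fully shown here. The sketch also adds a header-length check as its own defensive assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout and constants (not the driver's real values). */
#define SKETCH_SENTINEL  0xdeadU
#define SKETCH_TYPE_842  0
#define SKETCH_TYPE_LZO  1

struct sketch_header {
	uint16_t sentinel;
	uint16_t type;
};

/* Stub codec: copies the payload instead of really decompressing it. */
static int copy_stub(const uint8_t *src, size_t slen,
		     uint8_t *dst, size_t *dlen)
{
	if (slen > *dlen)
		return -1;
	memcpy(dst, src, slen);
	*dlen = slen;
	return 0;
}

/* Check the sentinel, skip the header, then dispatch on the type field. */
static int sketch_decompress(const uint8_t *src, size_t slen,
			     uint8_t *dst, size_t *dlen)
{
	struct sketch_header hdr;
	size_t out_len;

	assert(src && dst && dlen);
	out_len = *dlen;          /* capacity of dst */
	*dlen = 0;

	if (slen < sizeof(hdr))   /* defensive check added by this sketch */
		return -EINVAL;
	memcpy(&hdr, src, sizeof(hdr));   /* avoids unaligned access */
	if (hdr.sentinel != SKETCH_SENTINEL)
		return -EINVAL;

	src += sizeof(hdr);
	slen -= sizeof(hdr);

	switch (hdr.type) {
	case SKETCH_TYPE_842:     /* the driver would call nx842_decompress() */
		if (copy_stub(src, slen, dst, &out_len))
			return -EINVAL;
		break;
	case SKETCH_TYPE_LZO:     /* the driver would call lzo1x_decompress_safe() */
		if (copy_stub(src, slen, dst, &out_len))
			return -EINVAL;
		break;
	default:                  /* unknown type: reject */
		return -EINVAL;
	}

	*dlen = out_len;
	return 0;
}
```

As in the patch, *dlen is reset to 0 up front, so every error path reports an empty output.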

^ permalink raw reply related

* Re: [PATCH] powerpc/85xx: Fix sram_offset parameter type
From: Kumar Gala @ 2012-07-19 12:07 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: linuxppc-dev
In-Reply-To: <1342693696-29734-1-git-send-email-claudiu.manoil@freescale.com>


On Jul 19, 2012, at 5:28 AM, Claudiu Manoil wrote:

> The sram_offset parameter represents a physical address
> and should be of type phys_addr_t. As part of this fix,
> the extraction of sram_params is cleaned up and fixed.
> The patch now also fixes the case where the offset value
> 0xfff00000 was rejected by the driver (returning -EINVAL),
> even though it is a valid offset value.
>
> Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
> ---
> arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h |    4 +-
> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c     |   39 ++++++++++------------------
> 2 files changed, 16 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h b/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h
> index 60c9c0b..a4ce9b8 100644
> --- a/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h
> +++ b/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h
> @@ -1,5 +1,5 @@
> /*
> - * Copyright 2009-2010 Freescale Semiconductor, Inc
> + * Copyright 2009-2010 2012 Freescale Semiconductor, Inc

we normally do 2009-2010, 2012

>  *
>  * QorIQ based Cache Controller Memory Mapped Registers
>  *
> @@ -91,7 +91,7 @@ struct mpc85xx_l2ctlr {
>
> struct sram_parameters {
> 	unsigned int sram_size;
> -	uint64_t sram_offset;
> +	phys_addr_t sram_offset;
> };
>
>

- k

^ permalink raw reply

* [PATCH] powerpc/85xx: Fix sram_offset parameter type
From: Claudiu Manoil @ 2012-07-19 10:28 UTC (permalink / raw)
  To: linuxppc-dev

The sram_offset parameter represents a physical address
and should be of type phys_addr_t. As part of this fix,
the extraction of sram_params is cleaned up and fixed.
The patch now also fixes the case where the offset value
0xfff00000 was rejected by the driver (returning -EINVAL),
even though it is a valid offset value.
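Why was 0xfff00000 rejected? On a 32-bit kernel the old get_cache_sram_offset() parsed the string into a 32-bit unsigned long but returned it as a signed long, so 0xfff00000 came back negative; stored into the uint64_t field it was sign-extended, and the `(int64_t)sram_offset <= 0` check then failed. The userspace sketch below models that with fixed-width types so it behaves the same on a 64-bit host, next to the patched approach of parsing into an unsigned long long and splitting the address into low and high words. strtoull() stands in for kstrtoull(), and the two mask values are assumed stand-ins for L2SRAM_BAR_MSK_LO18 and L2SRAM_BARE_MSK_HI4, whose definitions are not shown in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed mask values standing in for L2SRAM_BAR_MSK_LO18 / _BARE_MSK_HI4. */
#define SKETCH_BAR_MSK_LO18  0xFFFFC000u
#define SKETCH_BARE_MSK_HI4  0x0000000Fu

/* Old 32-bit path: parse into a 32-bit unsigned value, return it as a
 * 32-bit signed value. The narrowing conversion is implementation-defined
 * in C; GCC and Clang wrap modulo 2^32, so 0xfff00000 becomes negative. */
static int32_t old_get_offset(const char *s)
{
	uint32_t val = (uint32_t)strtoul(s, NULL, 0);
	return (int32_t)val;
}

/* Old validity check: the signed result is sign-extended into a uint64_t
 * and then tested as an int64_t, as the removed probe code did. */
static int old_offset_ok(const char *s)
{
	uint64_t sram_offset = old_get_offset(s);
	return (int64_t)sram_offset > 0;
}

/* Patched approach: parse into unsigned long long, reject trailing junk. */
static int new_get_offset(const char *s, uint64_t *offset)
{
	char *end;
	unsigned long long addr;

	errno = 0;
	addr = strtoull(s, &end, 0);
	if (end == s || *end != '\0' || errno == ERANGE)
		return -EINVAL;
	*offset = addr;
	return 0;
}

/* Split the offset into the srbar0 (low word) and srbarea0 (high word)
 * register values, mirroring lower_32_bits()/upper_32_bits() usage. */
static void split_offset(uint64_t offset, uint32_t *srbar0, uint32_t *srbarea0)
{
	assert(srbar0 && srbarea0);
	*srbar0 = (uint32_t)offset & SKETCH_BAR_MSK_LO18;
	*srbarea0 = (uint32_t)(offset >> 32) & SKETCH_BARE_MSK_HI4;
}
```

With a 36-bit address such as 0xcfff00000, the low word 0xfff00000 goes to srbar0 and 0xc goes to srbarea0.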

Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h |    4 +-
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c     |   39 ++++++++++------------------
 2 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h b/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h
index 60c9c0b..a4ce9b8 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_ctlr.h
@@ -1,5 +1,5 @@
 /*
- * Copyright 2009-2010 Freescale Semiconductor, Inc
+ * Copyright 2009-2010 2012 Freescale Semiconductor, Inc
  *
  * QorIQ based Cache Controller Memory Mapped Registers
  *
@@ -91,7 +91,7 @@ struct mpc85xx_l2ctlr {
 
 struct sram_parameters {
 	unsigned int sram_size;
-	uint64_t sram_offset;
+	phys_addr_t sram_offset;
 };
 
 extern int instantiate_cache_sram(struct platform_device *dev,
diff --git a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
index 611bb4b..d65e785 100644
--- a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
+++ b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
@@ -1,5 +1,5 @@
 /*
- * Copyright 2009-2010 Freescale Semiconductor, Inc.
+ * Copyright 2009-2010 2012 Freescale Semiconductor, Inc.
  *
  * QorIQ (P1/P2) L2 controller init for Cache-SRAM instantiation
  *
@@ -31,24 +31,21 @@ static char *sram_size;
 static char *sram_offset;
 struct mpc85xx_l2ctlr __iomem *l2ctlr;
 
-static long get_cache_sram_size(void)
+static int get_cache_sram_params(struct sram_parameters *sram_params)
 {
-	unsigned long val;
+	unsigned long long addr;
+	unsigned int size;
 
-	if (!sram_size || (strict_strtoul(sram_size, 0, &val) < 0))
+	if (!sram_size || (kstrtouint(sram_size, 0, &size) < 0))
 		return -EINVAL;
 
-	return val;
-}
-
-static long get_cache_sram_offset(void)
-{
-	unsigned long val;
-
-	if (!sram_offset || (strict_strtoul(sram_offset, 0, &val) < 0))
+	if (!sram_offset || (kstrtoull(sram_offset, 0, &addr) < 0))
 		return -EINVAL;
 
-	return val;
+	sram_params->sram_offset = addr;
+	sram_params->sram_size = size;
+
+	return 0;
 }
 
 static int __init get_size_from_cmdline(char *str)
@@ -93,17 +90,9 @@ static int __devinit mpc85xx_l2ctlr_of_probe(struct platform_device *dev)
 	}
 	l2cache_size = *prop;
 
-	sram_params.sram_size  = get_cache_sram_size();
-	if ((int)sram_params.sram_size <= 0) {
-		dev_err(&dev->dev,
-			"Entire L2 as cache, Aborting Cache-SRAM stuff\n");
-		return -EINVAL;
-	}
-
-	sram_params.sram_offset  = get_cache_sram_offset();
-	if ((int64_t)sram_params.sram_offset <= 0) {
+	if (get_cache_sram_params(&sram_params)) {
 		dev_err(&dev->dev,
-			"Entire L2 as cache, provide a valid sram offset\n");
+			"Entire L2 as cache, provide valid sram offset and size\n");
 		return -EINVAL;
 	}
 
@@ -125,14 +114,14 @@ static int __devinit mpc85xx_l2ctlr_of_probe(struct platform_device *dev)
 	 * Write bits[0-17] to srbar0
 	 */
 	out_be32(&l2ctlr->srbar0,
-		sram_params.sram_offset & L2SRAM_BAR_MSK_LO18);
+		lower_32_bits(sram_params.sram_offset) & L2SRAM_BAR_MSK_LO18);
 
 	/*
 	 * Write bits[18-21] to srbare0
 	 */
 #ifdef CONFIG_PHYS_64BIT
 	out_be32(&l2ctlr->srbarea0,
-		(sram_params.sram_offset >> 32) & L2SRAM_BARE_MSK_HI4);
+		upper_32_bits(sram_params.sram_offset) & L2SRAM_BARE_MSK_HI4);
 #endif
 
 	clrsetbits_be32(&l2ctlr->ctl, L2CR_L2E, L2CR_L2FI);
-- 
1.6.6

^ permalink raw reply related

* Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Wen Congyang @ 2012-07-19 10:01 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <5007D722.1030807@cn.fujitsu.com>

At 07/19/2012 05:45 PM, Wen Congyang Wrote:
> At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
>> Not all pages of the virtual mapping of removed memory can be freed, since some
>> pages used as PGD/PUD cover not only the removed memory but also other memory.
>> So the patch checks whether each page can be freed or not.
>>
>> How to check whether a page can be freed or not:
>>  1. When removing memory, the page structs of the removed memory are filled
>>     with 0xFD.
>>  2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>>     cleared. In this case, the page used as PT/PMD can be freed.
>>
>> With this patch, the two __remove_section() variants are integrated into one,
>> so the CONFIG_SPARSEMEM_VMEMMAP-specific __remove_section() is deleted.
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org> 
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> 
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> ---
>>  arch/x86/mm/init_64.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/mm.h    |    2 
>>  mm/memory_hotplug.c   |   19 -------
>>  mm/sparse.c           |    5 +-
>>  4 files changed, 128 insertions(+), 19 deletions(-)
>>
>> Index: linux-3.5-rc6/include/linux/mm.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
>> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>>  void vmemmap_populate_print_last(void);
>>  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>>  				  unsigned long size);
>> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
>> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
>>  
>>  enum mf_flags {
>>  	MF_COUNT_INCREASED = 1 << 0,
>> Index: linux-3.5-rc6/mm/sparse.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.000000000 +0900
>> +++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
>> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>>  	/* This will make the necessary allocations eventually. */
>>  	return sparse_mem_map_populate(pnum, nid);
>>  }
>> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>>  {
>> -	return; /* XXX: Not implemented yet */
>> +	vmemmap_kfree(page, nr_pages);
>>  }
>>  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>>  {
>> +	vmemmap_free_bootmem(page, nr_pages);
>>  }
>>  #else
>>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>> Index: linux-3.5-rc6/arch/x86/mm/init_64.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
>> @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
>>  	return 0;
>>  }
>>  
>> +#define PAGE_INUSE 0xFD
>> +
>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>> +			    struct page **pp, int *page_size)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>> +	void *page_addr;
>> +	unsigned long next;
>> +
>> +	*pp = NULL;
>> +
>> +	pgd = pgd_offset_k(addr);
>> +	if (pgd_none(*pgd))
>> +		return pgd_addr_end(addr, end);
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	if (pud_none(*pud))
>> +		return pud_addr_end(addr,end);
>> +
>> +	if (!cpu_has_pse) {
>> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		pte = pte_offset_kernel(pmd, addr);
>> +		if (pte_none(*pte))
>> +			return next;
>> +
>> +		*page_size = PAGE_SIZE;
>> +		*pp = pte_page(*pte);
>> +	} else {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		*page_size = PMD_SIZE;
>> +		*pp = pmd_page(*pmd);
>> +	}
>> +
>> +	/*
>> +	 * Removed page structs are filled with 0xFD.
>> +	 */
>> +	memset((void *)addr, PAGE_INUSE, next - addr);
>> +
>> +	page_addr = page_address(*pp);
>> +
>> +	/*
>> +	 * Check the page is filled with 0xFD or not.
>> +	 * memchr_inv() returns the address. In this case, we cannot
>> +	 * clear PTE/PUD entry, since the page is used by other.
>> +	 * So we cannot also free the page.
>> +	 *
>> +	 * memchr_inv() returns NULL. In this case, we can clear
>> +	 * PTE/PUD entry, since the page is not used by other.
>> +	 * So we can also free the page.
>> +	 */
>> +	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
>> +		*pp = NULL;
>> +		return next;
>> +	}
>> +
>> +	if (!cpu_has_pse)
>> +		pte_clear(&init_mm, addr, pte);
>> +	else
>> +		pmd_clear(pmd);
>> +
>> +	return next;
>> +}
>> +
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	struct page *page;
>> +	int page_size;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		page_size = 0;
>> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
>> +		if (!page)
>> +			continue;
>> +
>> +		free_pages((unsigned long)page_address(page),
>> +			    get_order(page_size));
>> +		__flush_tlb_one((unsigned long)page_address(page));
> 
> I think you want to free the memory used to store the struct pages.
> So why do you free page_address(page)?

I understand it now: page is the memory that stores the struct pages.

You clear the page table entry for addr, not for page_address(page),
and the entry for page_address(page) is still valid. So I think you
want this:
__flush_tlb_one(addr);
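The point about which address to flush can be illustrated with a toy model in which the page table and the TLB are plain arrays indexed by virtual page number. Everything below is hypothetical and is not kernel code: page 3 plays the vmemmap address addr whose entry is cleared, and page 9 plays the direct-map address page_address(page), whose mapping stays valid. The model only shows that flushing page 9 leaves the cleared translation for page 3 cached, while flushing page 3 removes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: true means "PTE present" / "translation cached". */
#define NPAGES 16

static bool page_table[NPAGES];
static bool tlb[NPAGES];

/* Install a mapping and access it, which caches the translation. */
static void map_and_touch(size_t vpn)
{
	assert(vpn < NPAGES);
	page_table[vpn] = true;
	tlb[vpn] = true;
}

static void pte_clear_toy(size_t vpn)
{
	assert(vpn < NPAGES);
	page_table[vpn] = false;
}

static void flush_tlb_one_toy(size_t vpn)
{
	assert(vpn < NPAGES);
	tlb[vpn] = false;
}

/* A stale entry: still cached in the TLB after its PTE was cleared. */
static bool stale(size_t vpn)
{
	assert(vpn < NPAGES);
	return tlb[vpn] && !page_table[vpn];
}
```

Usage: after map_and_touch(3), map_and_touch(9) and pte_clear_toy(3), calling flush_tlb_one_toy(9) still leaves stale(3) true; only flush_tlb_one_toy(3) clears it.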

Thanks
Wen Congyang

> 
> Thanks
> Wen Congyang
> 
>> +	}
>> +
>> +}
>> +
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	struct page *page;
>> +	int page_size;
>> +	unsigned long magic;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		page_size = 0;
>> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
>> +		if (!page)
>> +			continue;
>> +
>> +		magic = (unsigned long) page->lru.next;
>> +		if (magic == SECTION_INFO)
>> +			put_page_bootmem(page);
>> +		flush_tlb_kernel_range(addr, end);
>> +	}
>> +
>> +}
>> +
>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>  				  struct page *start_page, unsigned long size)
>>  {
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:25:11.036597977 +0900
>> @@ -300,7 +300,6 @@ static int __meminit __add_section(int n
>>  	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
>>  }
>>  
>> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
>>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>>  {
>>  	int ret = -EINVAL;
>> @@ -309,29 +308,15 @@ static int __remove_section(struct zone 
>>  		return ret;
>>  
>>  	ret = unregister_memory_section(ms);
>> -
>> -	return ret;
>> -}
>> -#else
>> -static int __remove_section(struct zone *zone, struct mem_section *ms)
>> -{
>> -	unsigned long flags;
>> -	struct pglist_data *pgdat = zone->zone_pgdat;
>> -	int ret = -EINVAL;
>> -
>> -	if (!valid_section(ms))
>> -		return ret;
>> -
>> -	ret = unregister_memory_section(ms);
>>  	if (ret)
>>  		return ret;
>>  
>>  	pgdat_resize_lock(pgdat, &flags);
>>  	sparse_remove_one_section(zone, ms);
>>  	pgdat_resize_unlock(pgdat, &flags);
>> -	return 0;
>> +
>> +	return ret;
>>  }
>> -#endif
>>  
>>  /*
>>   * Reasonably generic function for adding memory.  It is
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Yasuaki Ishimatsu @ 2012-07-19  9:54 UTC (permalink / raw)
  To: Wen Congyang
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <5007D722.1030807@cn.fujitsu.com>

Hi Wen,

2012/07/19 18:45, Wen Congyang wrote:
> At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
>> Not all pages of the virtual mapping of removed memory can be freed, since some
>> pages used as PGD/PUD cover not only the removed memory but also other memory.
>> So the patch checks whether each page can be freed or not.
>>
>> How to check whether a page can be freed or not:
>>   1. When removing memory, the page structs of the removed memory are filled
>>      with 0xFD.
>>   2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>>      cleared. In this case, the page used as PT/PMD can be freed.
>>
>> With this patch, the two __remove_section() variants are integrated into one,
>> so the CONFIG_SPARSEMEM_VMEMMAP-specific __remove_section() is deleted.
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> ---
>>   arch/x86/mm/init_64.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/mm.h    |    2
>>   mm/memory_hotplug.c   |   19 -------
>>   mm/sparse.c           |    5 +-
>>   4 files changed, 128 insertions(+), 19 deletions(-)
>>
>> Index: linux-3.5-rc6/include/linux/mm.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
>> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>>   void vmemmap_populate_print_last(void);
>>   void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>>   				  unsigned long size);
>> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
>> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
>>   
>>   enum mf_flags {
>>   	MF_COUNT_INCREASED = 1 << 0,
>> Index: linux-3.5-rc6/mm/sparse.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.000000000 +0900
>> +++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
>> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>>   	/* This will make the necessary allocations eventually. */
>>   	return sparse_mem_map_populate(pnum, nid);
>>   }
>> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>>   {
>> -	return; /* XXX: Not implemented yet */
>> +	vmemmap_kfree(page, nr_pages);
>>   }
>>   static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>>   {
>> +	vmemmap_free_bootmem(page, nr_pages);
>>   }
>>   #else
>>   static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>> Index: linux-3.5-rc6/arch/x86/mm/init_64.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
>> @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
>>   	return 0;
>>   }
>>   
>> +#define PAGE_INUSE 0xFD
>> +
>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>> +			    struct page **pp, int *page_size)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>> +	void *page_addr;
>> +	unsigned long next;
>> +
>> +	*pp = NULL;
>> +
>> +	pgd = pgd_offset_k(addr);
>> +	if (pgd_none(*pgd))
>> +		return pgd_addr_end(addr, end);
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	if (pud_none(*pud))
>> +		return pud_addr_end(addr,end);
>> +
>> +	if (!cpu_has_pse) {
>> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		pte = pte_offset_kernel(pmd, addr);
>> +		if (pte_none(*pte))
>> +			return next;
>> +
>> +		*page_size = PAGE_SIZE;
>> +		*pp = pte_page(*pte);
>> +	} else {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		*page_size = PMD_SIZE;
>> +		*pp = pmd_page(*pmd);
>> +	}
>> +
>> +	/*
>> +	 * Removed page structs are filled with 0xFD.
>> +	 */
>> +	memset((void *)addr, PAGE_INUSE, next - addr);
>> +
>> +	page_addr = page_address(*pp);
>> +
>> +	/*
>> +	 * Check the page is filled with 0xFD or not.
>> +	 * memchr_inv() returns the address. In this case, we cannot
>> +	 * clear PTE/PUD entry, since the page is used by other.
>> +	 * So we cannot also free the page.
>> +	 *
>> +	 * memchr_inv() returns NULL. In this case, we can clear
>> +	 * PTE/PUD entry, since the page is not used by other.
>> +	 * So we can also free the page.
>> +	 */
>> +	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
>> +		*pp = NULL;
>> +		return next;
>> +	}
>> +
>> +	if (!cpu_has_pse)
>> +		pte_clear(&init_mm, addr, pte);
>> +	else
>> +		pmd_clear(pmd);
>> +
>> +	return next;
>> +}
>> +
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	struct page *page;
>> +	int page_size;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		page_size = 0;
>> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
>> +		if (!page)
>> +			continue;
>> +
>> +		free_pages((unsigned long)page_address(page),
>> +			    get_order(page_size));
>> +		__flush_tlb_one((unsigned long)page_address(page));
> 
> I think you want to free the memory used to store the struct pages.
> So why do you free page_address(page)?

This page is a PT/PMD page, and it stores struct pages.
So I free the page.
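The fill-and-check scheme from the commit message can be modelled in userspace roughly as follows. memchr_inv_sketch() is a local stand-in for the kernel's memchr_inv(), and poison_and_check() is a hypothetical helper; the model ignores locking, TLB flushing and the page-table clearing that find_and_clear_pte_page() performs, and only shows when a backing page becomes freeable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD   /* poison byte used by the patch */

/* Stand-in for memchr_inv(): pointer to the first byte that differs
 * from c, or NULL if all n bytes equal c. */
static const void *memchr_inv_sketch(const void *start, int c, size_t n)
{
	const unsigned char *p = start;

	for (size_t i = 0; i < n; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/* Poison the struct pages of the removed range [off, off + len) inside
 * one backing page, then report whether the whole backing page now holds
 * only poisoned struct pages and may therefore be freed. */
static bool poison_and_check(unsigned char *backing, size_t backing_size,
			     size_t off, size_t len)
{
	assert(off + len <= backing_size);
	memset(backing + off, PAGE_INUSE, len);
	return memchr_inv_sketch(backing, PAGE_INUSE, backing_size) == NULL;
}
```

With a zeroed 64-byte backing page, poisoning the first half returns false (the other half is still in use); poisoning the second half afterwards returns true.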

Thanks,
Yasuaki Ishimatsu

> Thanks
> Wen Congyang
> 
>> +	}
>> +
>> +}
>> +
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	struct page *page;
>> +	int page_size;
>> +	unsigned long magic;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		page_size = 0;
>> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
>> +		if (!page)
>> +			continue;
>> +
>> +		magic = (unsigned long) page->lru.next;
>> +		if (magic == SECTION_INFO)
>> +			put_page_bootmem(page);
>> +		flush_tlb_kernel_range(addr, end);
>> +	}
>> +
>> +}
>> +
>>   void register_page_bootmem_memmap(unsigned long section_nr,
>>   				  struct page *start_page, unsigned long size)
>>   {
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:25:11.036597977 +0900
>> @@ -300,7 +300,6 @@ static int __meminit __add_section(int n
>>   	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
>>   }
>>   
>> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
>>   static int __remove_section(struct zone *zone, struct mem_section *ms)
>>   {
>>   	int ret = -EINVAL;
>> @@ -309,29 +308,15 @@ static int __remove_section(struct zone
>>   		return ret;
>>   
>>   	ret = unregister_memory_section(ms);
>> -
>> -	return ret;
>> -}
>> -#else
>> -static int __remove_section(struct zone *zone, struct mem_section *ms)
>> -{
>> -	unsigned long flags;
>> -	struct pglist_data *pgdat = zone->zone_pgdat;
>> -	int ret = -EINVAL;
>> -
>> -	if (!valid_section(ms))
>> -		return ret;
>> -
>> -	ret = unregister_memory_section(ms);
>>   	if (ret)
>>   		return ret;
>>   
>>   	pgdat_resize_lock(pgdat, &flags);
>>   	sparse_remove_one_section(zone, ms);
>>   	pgdat_resize_unlock(pgdat, &flags);
>> -	return 0;
>> +
>> +	return ret;
>>   }
>> -#endif
>>   
>>   /*
>>    * Reasonably generic function for adding memory.  It is
>>
>>
> 

^ permalink raw reply

* Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Wen Congyang @ 2012-07-19  9:45 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <50068D09.1050704@jp.fujitsu.com>

At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
> Not all pages of the virtual mapping of removed memory can be freed, since some
> pages used as PGD/PUD cover not only the removed memory but also other memory.
> So the patch checks whether each page can be freed or not.
> 
> How to check whether a page can be freed or not:
>  1. When removing memory, the page structs of the removed memory are filled
>     with 0xFD.
>  2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>     cleared. In this case, the page used as PT/PMD can be freed.
> 
> With this patch, the two __remove_section() variants are integrated into one,
> so the CONFIG_SPARSEMEM_VMEMMAP-specific __remove_section() is deleted.
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org> 
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> 
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> ---
>  arch/x86/mm/init_64.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mm.h    |    2 
>  mm/memory_hotplug.c   |   19 -------
>  mm/sparse.c           |    5 +-
>  4 files changed, 128 insertions(+), 19 deletions(-)
> 
> Index: linux-3.5-rc6/include/linux/mm.h
> ===================================================================
> --- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.000000000 +0900
> +++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>  void vmemmap_populate_print_last(void);
>  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>  				  unsigned long size);
> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
>  
>  enum mf_flags {
>  	MF_COUNT_INCREASED = 1 << 0,
> Index: linux-3.5-rc6/mm/sparse.c
> ===================================================================
> --- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.000000000 +0900
> +++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>  	/* This will make the necessary allocations eventually. */
>  	return sparse_mem_map_populate(pnum, nid);
>  }
> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>  {
> -	return; /* XXX: Not implemented yet */
> +	vmemmap_kfree(page, nr_pages);
>  }
>  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>  {
> +	vmemmap_free_bootmem(page, nr_pages);
>  }
>  #else
>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
> Index: linux-3.5-rc6/arch/x86/mm/init_64.c
> ===================================================================
> --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.000000000 +0900
> +++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
> @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
>  	return 0;
>  }
>  
> +#define PAGE_INUSE 0xFD
> +
> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
> +			    struct page **pp, int *page_size)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	void *page_addr;
> +	unsigned long next;
> +
> +	*pp = NULL;
> +
> +	pgd = pgd_offset_k(addr);
> +	if (pgd_none(*pgd))
> +		return pgd_addr_end(addr, end);
> +
> +	pud = pud_offset(pgd, addr);
> +	if (pud_none(*pud))
> +		return pud_addr_end(addr,end);
> +
> +	if (!cpu_has_pse) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		pte = pte_offset_kernel(pmd, addr);
> +		if (pte_none(*pte))
> +			return next;
> +
> +		*page_size = PAGE_SIZE;
> +		*pp = pte_page(*pte);
> +	} else {
> +		next = pmd_addr_end(addr, end);
> +
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		*page_size = PMD_SIZE;
> +		*pp = pmd_page(*pmd);
> +	}
> +
> +	/*
> +	 * Removed page structs are filled with 0xFD.
> +	 */
> +	memset((void *)addr, PAGE_INUSE, next - addr);
> +
> +	page_addr = page_address(*pp);
> +
> +	/*
> +	 * Check the page is filled with 0xFD or not.
> +	 * memchr_inv() returns the address. In this case, we cannot
> +	 * clear PTE/PUD entry, since the page is used by other.
> +	 * So we cannot also free the page.
> +	 *
> +	 * memchr_inv() returns NULL. In this case, we can clear
> +	 * PTE/PUD entry, since the page is not used by other.
> +	 * So we can also free the page.
> +	 */
> +	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
> +		*pp = NULL;
> +		return next;
> +	}
> +
> +	if (!cpu_has_pse)
> +		pte_clear(&init_mm, addr, pte);
> +	else
> +		pmd_clear(pmd);
> +
> +	return next;
> +}
> +
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	struct page *page;
> +	int page_size;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		page_size = 0;
> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
> +		if (!page)
> +			continue;
> +
> +		free_pages((unsigned long)page_address(page),
> +			    get_order(page_size));
> +		__flush_tlb_one((unsigned long)page_address(page));

I think you want to free the memory used to store the struct pages.
So why do you free page_address(page)?

Thanks
Wen Congyang

> +	}
> +
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	struct page *page;
> +	int page_size;
> +	unsigned long magic;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		page_size = 0;
> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
> +		if (!page)
> +			continue;
> +
> +		magic = (unsigned long) page->lru.next;
> +		if (magic == SECTION_INFO)
> +			put_page_bootmem(page);
> +		flush_tlb_kernel_range(addr, end);
> +	}
> +
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
>  				  struct page *start_page, unsigned long size)
>  {
> Index: linux-3.5-rc6/mm/memory_hotplug.c
> ===================================================================
> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:01:28.000000000 +0900
> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:25:11.036597977 +0900
> @@ -300,7 +300,6 @@ static int __meminit __add_section(int n
>  	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
>  }
>  
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>  {
>  	int ret = -EINVAL;
> @@ -309,29 +308,15 @@ static int __remove_section(struct zone 
>  		return ret;
>  
>  	ret = unregister_memory_section(ms);
> -
> -	return ret;
> -}
> -#else
> -static int __remove_section(struct zone *zone, struct mem_section *ms)
> -{
> -	unsigned long flags;
> -	struct pglist_data *pgdat = zone->zone_pgdat;
> -	int ret = -EINVAL;
> -
> -	if (!valid_section(ms))
> -		return ret;
> -
> -	ret = unregister_memory_section(ms);
>  	if (ret)
>  		return ret;
>  
>  	pgdat_resize_lock(pgdat, &flags);
>  	sparse_remove_one_section(zone, ms);
>  	pgdat_resize_unlock(pgdat, &flags);
> -	return 0;
> +
> +	return ret;
>  }
> -#endif
>  
>  /*
>   * Reasonably generic function for adding memory.  It is
> 
> 

^ permalink raw reply

* Re: [RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
From: Yasuaki Ishimatsu @ 2012-07-19  9:32 UTC (permalink / raw)
  To: Wen Congyang
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <5007B5E4.1010602@cn.fujitsu.com>

Hi Wen,

2012/07/19 16:23, Wen Congyang wrote:
> At 07/18/2012 06:06 PM, Yasuaki Ishimatsu Wrote:
>> acpi_memory_device_remove() has been prepared to remove physical memory,
>> but currently the function only frees acpi_memory_device.
>>
>> The patch adds the following steps to acpi_memory_device_remove():
>>    - offline memory
>>    - remove physical memory (for now, this only checks whether the memory is online)
>>    - free acpi_memory_device
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> ---
>>   drivers/acpi/acpi_memhotplug.c |   27 ++++++++++++++++++++++++++-
>>   drivers/base/memory.c          |   39 +++++++++++++++++++++++++++++++++++++++
>>   include/linux/memory.h         |    5 +++++
>>   include/linux/memory_hotplug.h |    5 +++++
>>   mm/memory_hotplug.c            |   22 ++++++++++++++++++++++
>>   5 files changed, 97 insertions(+), 1 deletion(-)
>>
>> Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c	2012-07-17 11:20:15.117796971 +0900
>> +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c	2012-07-17 13:36:30.325594022 +0900
>> @@ -29,6 +29,7 @@
>>   #include <linux/module.h>
>>   #include <linux/init.h>
>>   #include <linux/types.h>
>> +#include <linux/memory.h>
>>   #include <linux/memory_hotplug.h>
>>   #include <linux/slab.h>
>>   #include <acpi/acpi_drivers.h>
>> @@ -452,12 +453,36 @@ static int acpi_memory_device_add(struct
>>   static int acpi_memory_device_remove(struct acpi_device *device, int type)
>>   {
>>   	struct acpi_memory_device *mem_device = NULL;
>> -
>> +	struct acpi_memory_info *info, *tmp;
>> +	int result;
>> +	int node;
>>   
>>   	if (!device || !acpi_driver_data(device))
>>   		return -EINVAL;
>>   
>>   	mem_device = acpi_driver_data(device);
>> +
>> +	node = acpi_get_node(mem_device->device->handle);
>> +	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
>> +		if (!info->enabled)
>> +			continue;
>> +
>> +		if (!is_memblk_offline(info->start_addr, info->length)) {
>> +			result = offline_memory(info->start_addr, info->length);
>> +			if (result)
>> +				return result;
>> +		}
>> +		if (node < 0)
>> +			node = memory_add_physaddr_to_nid(info->start_addr);
>> +
>> +		result = remove_memory(node, info->start_addr, info->length);
>> +		if (result)
>> +			return result;
>> +
>> +		list_del(&info->list);
>> +		kfree(info);
>> +	}
>> +
>>   	kfree(mem_device);
>>   
>>   	return 0;
>> Index: linux-3.5-rc6/include/linux/memory_hotplug.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-17 11:20:15.133796772 +0900
>> +++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-17 11:29:41.490716352 +0900
>> @@ -221,6 +221,7 @@ static inline void unlock_memory_hotplug
>>   #ifdef CONFIG_MEMORY_HOTREMOVE
>>   
>>   extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
>> +extern int remove_memory(int nid, u64 start, u64 size);
>>   
>>   #else
>>   static inline int is_mem_section_removable(unsigned long pfn,
>> @@ -228,6 +229,10 @@ static inline int is_mem_section_removab
>>   {
>>   	return 0;
>>   }
>> +static inline int remove_memory(int nid, u64 start, u64 size)
>> +{
>> +	return -EBUSY;
>> +}
>>   #endif /* CONFIG_MEMORY_HOTREMOVE */
>>   
>>   extern int mem_online_node(int nid);
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-17 11:20:15.129796821 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-17 13:25:18.952986069 +0900
>> @@ -998,6 +998,28 @@ int offline_memory(u64 start, u64 size)
>>   	end_pfn = start_pfn + PFN_DOWN(size);
>>   	return offline_pages(start_pfn, end_pfn, 120 * HZ);
>>   }
>> +
>> +int remove_memory(int nid, u64 start, u64 size)
>> +{
>> +	int ret = -EBUSY;
>> +	lock_memory_hotplug();
>> +	/*
>> +	 * The memory might become online by other task, even if you offine it.
>> +	 * So we check whether the cpu has been onlined or not.
>> +	 */
>> +	if (!is_memblk_offline(start, size)) {
>> +		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
>> +			"because the memmory range is online\n",
>> +			start, start + size);
>> +		ret = -EAGAIN;
>> +	}
>> +
>> +	unlock_memory_hotplug();
>> +	return ret;
>> +
>> +}
>> +EXPORT_SYMBOL_GPL(remove_memory);
>> +
>>   #else
>>   int offline_memory(u64 start, u64 size)
>>   {
>> Index: linux-3.5-rc6/drivers/base/memory.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/drivers/base/memory.c	2012-07-17 11:20:15.120796934 +0900
>> +++ linux-3.5-rc6/drivers/base/memory.c	2012-07-17 11:20:54.626302995 +0900
>> @@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(
>>   }
>>   EXPORT_SYMBOL(unregister_memory_isolate_notifier);
>>   
>> +bool is_memblk_offline(unsigned long start, unsigned long size)
>> +{
>> +	struct memory_block *mem = NULL;
>> +	struct mem_section *section;
>> +	unsigned long start_pfn, end_pfn;
>> +	unsigned long pfn, section_nr;
>> +
>> +	start_pfn = PFN_DOWN(start);
>> +	end_pfn = start_pfn + PFN_DOWN(start);
> 
> This line is wrong. I think you want this:
> end_pfn = start_pfn + PFN_UP(size);

Yes. I'll update it.

>> +
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>> +		section_nr = pfn_to_section_nr(pfn);
>> +		if (!present_section_nr(section_nr));
> 
> The ';' should be removed. Otherwise, this function always returns true...

Thanks. I'll update it.

Thanks,
Yasuaki Ishimatsu

> Thanks
> Wen Congyang
> 
>> +			continue;
>> +
>> +		section = __nr_to_section(section_nr);
>> +		/* same memblock? */
>> +		if (mem)
>> +			if((section_nr >= mem->start_section_nr) &&
>> +			   (section_nr <= mem->end_section_nr))
>> +				continue;
>> +
>> +		mem = find_memory_block_hinted(section, mem);
>> +		if (!mem)
>> +			continue;
>> +		if (mem->state == MEM_OFFLINE)
>> +			continue;
>> +
>> +		kobject_put(&mem->dev.kobj);
>> +		return false;
>> +	}
>> +
>> +	if (mem)
>> +		kobject_put(&mem->dev.kobj);
>> +
>> +	return true;
>> +}
>> +EXPORT_SYMBOL(is_memblk_offline);
>> +
>>   /*
>>    * register_memory - Setup a sysfs device for a memory block
>>    */
>> Index: linux-3.5-rc6/include/linux/memory.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/memory.h	2012-07-17 11:18:00.693477455 +0900
>> +++ linux-3.5-rc6/include/linux/memory.h	2012-07-17 11:20:54.632302919 +0900
>> @@ -106,6 +106,10 @@ static inline int memory_isolate_notify(
>>   {
>>   	return 0;
>>   }
>> +static inline bool is_memblk_offline(unsigned long start, unsigned long size)
>> +{
>> +	return false;
>> +}
>>   #else
>>   extern int register_memory_notifier(struct notifier_block *nb);
>>   extern void unregister_memory_notifier(struct notifier_block *nb);
>> @@ -120,6 +124,7 @@ extern int memory_isolate_notify(unsigne
>>   extern struct memory_block *find_memory_block_hinted(struct mem_section *,
>>   							struct memory_block *);
>>   extern struct memory_block *find_memory_block(struct mem_section *);
>> +extern bool is_memblk_offline(unsigned long start, unsigned long size);
>>   #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
>>   enum mem_add_context { BOOT, HOTPLUG };
>>   #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
>>
>>
> 
> 

^ permalink raw reply

* Re: [RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages
From: Yasuaki Ishimatsu @ 2012-07-19  9:30 UTC (permalink / raw)
  To: Bob Liu
  Cc: len.brown, wency, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <CAA_GA1dPdjO7jwMaQsx+ywWpZe4fyGm+aTeJcjUgJPKuVZd9xA@mail.gmail.com>

Hi Bob,

2012/07/19 17:32, Bob Liu wrote:
> On Wed, Jul 18, 2012 at 6:12 PM, Yasuaki Ishimatsu
> <isimatu.yasuaki@jp.fujitsu.com> wrote:
>> The patch adds a call to __remove_pages() in remove_memory(). The range
>> described by the phys_start_pfn and nr_pages arguments of __remove_pages()
>> may then span more than one zone, so the zone argument is removed from
>> __remove_pages() and __remove_pages() calculates the zone for each section.
>>
>> When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap,
>> so __remove_section() only calls unregister_memory_section().
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> ---
>>   arch/powerpc/platforms/pseries/hotplug-memory.c |    5 +----
>>   include/linux/memory_hotplug.h                  |    3 +--
>>   mm/memory_hotplug.c                             |   19 ++++++++++++-------
>>   3 files changed, 14 insertions(+), 13 deletions(-)
>>
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c      2012-07-18 18:00:27.440145432 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:01:02.070712487 +0900
>> @@ -275,11 +275,14 @@ static int __meminit __add_section(int n
>>   #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>   static int __remove_section(struct zone *zone, struct mem_section *ms)
>>   {
>> -       /*
>> -        * XXX: Freeing memmap with vmemmap is not implement yet.
>> -        *      This should be removed later.
>> -        */
>> -       return -EBUSY;
>> +       int ret = -EINVAL;
>> +
>> +       if (!valid_section(ms))
>> +               return ret;
>> +
>> +       ret = unregister_memory_section(ms);
>> +
>
> I saw a patch from Jiang Liu, "mm/hotplug: free zone->pageset when a
> zone becomes empty", that frees zone->pageset, and I think more cleanup
> may be needed when a zone becomes empty.
>
> We already have __add_zone() in __add_section(); what about adding a
> function like __remove_zone() to do the cleanup here?

Thank you for your comment. As you say, I think a zone cleanup function
is necessary, so I'll update the patch.

Thanks,
Yasuaki Ishimatsu.

>
>> +       return ret;
>>   }
>>   #else
>>   static int __remove_section(struct zone *zone, struct mem_section *ms)
>> @@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
>>    * sure that pages are marked reserved and zones are adjust properly by
>>    * calling offline_pages().
>>    */
>> -int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>> -                unsigned long nr_pages)
>> +int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
>>   {
>>          unsigned long i, ret = 0;
>>          int sections_to_remove;
>> +       struct zone *zone;
>>
>>          /*
>>           * We can only remove entire sections
>> @@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
>>          sections_to_remove = nr_pages / PAGES_PER_SECTION;
>>          for (i = 0; i < sections_to_remove; i++) {
>>                  unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
>> +               zone = page_zone(pfn_to_page(pfn));
>>                  ret = __remove_section(zone, __pfn_to_section(pfn));
>>                  if (ret)
>>                          break;
>> @@ -1031,6 +1035,7 @@ int __ref remove_memory(int nid, u64 sta
>>          /* remove memmap entry */
>>          firmware_map_remove(start, start + size, "System RAM");
>>
>> +       __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
>>   out:
>>          unlock_memory_hotplug();
>>          return ret;
>> Index: linux-3.5-rc6/include/linux/memory_hotplug.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-18 18:00:27.445145371 +0900
>> +++ linux-3.5-rc6/include/linux/memory_hotplug.h        2012-07-18 18:00:40.461982690 +0900
>> @@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
>>   /* reasonably generic interface to expand the physical pages in a zone  */
>>   extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
>>          unsigned long nr_pages);
>> -extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
>> -       unsigned long nr_pages);
>> +extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);
>>
>>   #ifdef CONFIG_NUMA
>>   extern int memory_add_physaddr_to_nid(u64 start);
>> Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  2012-07-18 18:00:27.442145407 +0900
>> +++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c       2012-07-18 18:00:40.470982578 +0900
>> @@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo
>>   static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
>>   {
>>          unsigned long start, start_pfn;
>> -       struct zone *zone;
>>          int i, ret;
>>          int sections_to_remove;
>>
>> @@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsig
>>                  return 0;
>>          }
>>
>> -       zone = page_zone(pfn_to_page(start_pfn));
>> -
>>          /*
>>           * Remove section mappings and sysfs entries for the
>>           * section of the memory we are removing.
>> @@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsig
>>          sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
>>          for (i = 0; i < sections_to_remove; i++) {
>>                  unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
>> -               ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
>> +               ret = __remove_pages(start_pfn,  PAGES_PER_SECTION);
>>                  if (ret)
>>                          return ret;
>>          }
>>
>
>

^ permalink raw reply

* Re: [RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory
From: Yasuaki Ishimatsu @ 2012-07-19  9:26 UTC (permalink / raw)
  To: Bob Liu
  Cc: len.brown, wency, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <CAA_GA1fayhA1A3vT5BcDCoL_JVd6pZJn2_=NXK0bjJNRXo=7LA@mail.gmail.com>

Hi Bob,

2012/07/19 17:19, Bob Liu wrote:
> Hi Yasuaki,
>
> On Wed, Jul 18, 2012 at 6:05 PM, Yasuaki Ishimatsu
> <isimatu.yasuaki@jp.fujitsu.com> wrote:
>> remove_memory() does not remove memory; it only offlines it. The patch
>> changes its name to offline_memory().
>
> Since offline_memory() just aligns the start/end pfn and there is no
> matching online_memory() function, I think it's better to remove this
> function and move the alignment into offline_pages().

If we change it, the arguments of the two functions become different, as follows:

   online_pages  : page frame number and number of page frames
   offline_pages : memory address and memory length

I think that is ugly, so I don't want to change it. As you say, there is no
function that matches offline_memory(). If we create an exported function
for onlining pages, it should be named online_memory().

Thanks,
Yasuaki Ishimatsu

>
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> ---
>>   drivers/acpi/acpi_memhotplug.c |    2 +-
>>   drivers/base/memory.c          |    4 ++--
>>   include/linux/memory_hotplug.h |    2 +-
>>   mm/memory_hotplug.c            |    6 +++---
>>   4 files changed, 7 insertions(+), 7 deletions(-)
>>
>> Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c   2012-07-03 14:21:46.102416917 +0900
>> +++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c        2012-07-03 14:21:49.458374960 +0900
>> @@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
>>           */
>>          list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>>                  if (info->enabled) {
>> -                       result = remove_memory(info->start_addr, info->length);
>> +                       result = offline_memory(info->start_addr, info->length);
>>                          if (result)
>>                                  return result;
>>                  }
>> Index: linux-3.5-rc4/drivers/base/memory.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/drivers/base/memory.c    2012-07-03 14:21:46.095417003 +0900
>> +++ linux-3.5-rc4/drivers/base/memory.c 2012-07-03 14:21:49.459374948 +0900
>> @@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
>>                          break;
>>                  case MEM_OFFLINE:
>>                          start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
>> -                       ret = remove_memory(start_paddr,
>> -                                           nr_pages << PAGE_SHIFT);
>> +                       ret = offline_memory(start_paddr,
>> +                                            nr_pages << PAGE_SHIFT);
>>                          break;
>>                  default:
>>                          WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
>> Index: linux-3.5-rc4/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/mm/memory_hotplug.c      2012-07-03 14:21:46.102416917 +0900
>> +++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:21:49.466374860 +0900
>> @@ -990,7 +990,7 @@ out:
>>          return ret;
>>   }
>>
>> -int remove_memory(u64 start, u64 size)
>> +int offline_memory(u64 start, u64 size)
>>   {
>>          unsigned long start_pfn, end_pfn;
>>
>> @@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
>>          return offline_pages(start_pfn, end_pfn, 120 * HZ);
>>   }
>>   #else
>> -int remove_memory(u64 start, u64 size)
>> +int offline_memory(u64 start, u64 size)
>>   {
>>          return -EINVAL;
>>   }
>>   #endif /* CONFIG_MEMORY_HOTREMOVE */
>> -EXPORT_SYMBOL_GPL(remove_memory);
>> +EXPORT_SYMBOL_GPL(offline_memory);
>> Index: linux-3.5-rc4/include/linux/memory_hotplug.h
>> ===================================================================
>> --- linux-3.5-rc4.orig/include/linux/memory_hotplug.h   2012-07-03 14:21:46.102416917 +0900
>> +++ linux-3.5-rc4/include/linux/memory_hotplug.h        2012-07-03 14:21:49.471374796 +0900
>> @@ -233,7 +233,7 @@ static inline int is_mem_section_removab
>>   extern int mem_online_node(int nid);
>>   extern int add_memory(int nid, u64 start, u64 size);
>>   extern int arch_add_memory(int nid, u64 start, u64 size);
>> -extern int remove_memory(u64 start, u64 size);
>> +extern int offline_memory(u64 start, u64 size);
>>   extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>>                                                                  int nr_pages);
>>   extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
>>
>
>

^ permalink raw reply

* Re: [RFC] Configure GPIO settings via the device tree
From: Christian Engelmayer @ 2012-07-19  8:39 UTC (permalink / raw)
  To: grant.likely, linus.walleij, devicetree-discuss
  Cc: linuxppc-dev, christian.engelmayer
In-Reply-To: <20120713100440.7a80e05f@frequentis.com>

Hello,

As there was no response to my question, I assume I understand correctly
that at the moment the only two options for setting up GPIO configurations
are platform-specific code (e.g. platform_data) or userspace?

Would there be general use for defining GPIO configurations in the device
tree and applying those settings via a generic, platform-independent driver?

Regards,
Christian


On Fri, 13 Jul 2012 10:04:40 +0200
Christian Engelmayer <christian.engelmayer@frequentis.com> wrote:

> Hello,
> 
> I am looking for a way to configure initial GPIO settings and exports to
> userspace via sysfs in a generic way, using the device tree.
> 
> The purpose would be to have the same features as when initializing and
> exporting pins via platform code, e.g.:
> 
> 	static struct gpio leds_gpios[] = {
> 		{ 32, GPIOF_OUT_INIT_HIGH, "Power LED" }, /* default to ON */
> 		{ 33, GPIOF_OUT_INIT_LOW,  "Green LED" }, /* default to OFF */
> 		{ 34, GPIOF_OUT_INIT_LOW,  "Red LED"   }, /* default to OFF */
> 		{ 35, GPIOF_OUT_INIT_LOW,  "Blue LED"  }, /* default to OFF */
> 		{ ... },
> 	};
> 
> but without the kernel needing to know anything more about those pins, or
> their later handling by simple userspace drivers, than the setup information
> provided in the device tree.
> 
> This should also address the problem of unstable GPIO numbers in the case of
> daughtercards on different base boards, and would help provide a defined API
> to userspace based on pin labels, with the board specifics hidden in one
> place in the device tree.
> 
> Is there already a way to realize such a scenario?
> 
> Regards,
> Christian

^ permalink raw reply

* Re: [RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages
From: Bob Liu @ 2012-07-19  8:32 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, wency, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <50068BF4.1080603@jp.fujitsu.com>

On Wed, Jul 18, 2012 at 6:12 PM, Yasuaki Ishimatsu
<isimatu.yasuaki@jp.fujitsu.com> wrote:
> The patch adds a call to __remove_pages() in remove_memory(). The range
> described by the phys_start_pfn and nr_pages arguments of __remove_pages()
> may then span more than one zone, so the zone argument is removed from
> __remove_pages() and __remove_pages() calculates the zone for each section.
>
> When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap,
> so __remove_section() only calls unregister_memory_section().
>
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c |    5 +----
>  include/linux/memory_hotplug.h                  |    3 +--
>  mm/memory_hotplug.c                             |   19 ++++++++++++-------
>  3 files changed, 14 insertions(+), 13 deletions(-)
>
> Index: linux-3.5-rc6/mm/memory_hotplug.c
> ===================================================================
> --- linux-3.5-rc6.orig/mm/memory_hotplug.c      2012-07-18 18:00:27.440145432 +0900
> +++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:01:02.070712487 +0900
> @@ -275,11 +275,14 @@ static int __meminit __add_section(int n
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>  {
> -       /*
> -        * XXX: Freeing memmap with vmemmap is not implement yet.
> -        *      This should be removed later.
> -        */
> -       return -EBUSY;
> +       int ret = -EINVAL;
> +
> +       if (!valid_section(ms))
> +               return ret;
> +
> +       ret = unregister_memory_section(ms);
> +

I saw a patch from Jiang Liu, "mm/hotplug: free zone->pageset when a
zone becomes empty", that frees zone->pageset, and I think more cleanup
may be needed when a zone becomes empty.

We already have __add_zone() in __add_section(); what about adding a
function like __remove_zone() to do the cleanup here?

> +       return ret;
>  }
>  #else
>  static int __remove_section(struct zone *zone, struct mem_section *ms)
> @@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
>   * sure that pages are marked reserved and zones are adjust properly by
>   * calling offline_pages().
>   */
> -int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> -                unsigned long nr_pages)
> +int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
>  {
>         unsigned long i, ret = 0;
>         int sections_to_remove;
> +       struct zone *zone;
>
>         /*
>          * We can only remove entire sections
> @@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
>         sections_to_remove = nr_pages / PAGES_PER_SECTION;
>         for (i = 0; i < sections_to_remove; i++) {
>                 unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
> +               zone = page_zone(pfn_to_page(pfn));
>                 ret = __remove_section(zone, __pfn_to_section(pfn));
>                 if (ret)
>                         break;
> @@ -1031,6 +1035,7 @@ int __ref remove_memory(int nid, u64 sta
>         /* remove memmap entry */
>         firmware_map_remove(start, start + size, "System RAM");
>
> +       __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
>  out:
>         unlock_memory_hotplug();
>         return ret;
> Index: linux-3.5-rc6/include/linux/memory_hotplug.h
> ===================================================================
> --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-18 18:00:27.445145371 +0900
> +++ linux-3.5-rc6/include/linux/memory_hotplug.h        2012-07-18 18:00:40.461982690 +0900
> @@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
>  /* reasonably generic interface to expand the physical pages in a zone  */
>  extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
>         unsigned long nr_pages);
> -extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
> -       unsigned long nr_pages);
> +extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);
>
>  #ifdef CONFIG_NUMA
>  extern int memory_add_physaddr_to_nid(u64 start);
> Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
> ===================================================================
> --- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  2012-07-18 18:00:27.442145407 +0900
> +++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c       2012-07-18 18:00:40.470982578 +0900
> @@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo
>  static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
>  {
>         unsigned long start, start_pfn;
> -       struct zone *zone;
>         int i, ret;
>         int sections_to_remove;
>
> @@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsig
>                 return 0;
>         }
>
> -       zone = page_zone(pfn_to_page(start_pfn));
> -
>         /*
>          * Remove section mappings and sysfs entries for the
>          * section of the memory we are removing.
> @@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsig
>         sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
>         for (i = 0; i < sections_to_remove; i++) {
>                 unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
> -               ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
> +               ret = __remove_pages(start_pfn,  PAGES_PER_SECTION);
>                 if (ret)
>                         return ret;
>         }
>


-- 
Regards,
--Bob

^ permalink raw reply

* Re: [RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory
From: Bob Liu @ 2012-07-19  8:19 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, wency, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <50068A6E.5050904@jp.fujitsu.com>

Hi Yasuaki,

On Wed, Jul 18, 2012 at 6:05 PM, Yasuaki Ishimatsu
<isimatu.yasuaki@jp.fujitsu.com> wrote:
> remove_memory() does not remove memory; it only offlines it. The patch
> changes its name to offline_memory().

Since offline_memory() just aligns the start/end pfn and there is no
matching online_memory() function, I think it's better to remove this
function and move the alignment into offline_pages().

>
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> ---
>  drivers/acpi/acpi_memhotplug.c |    2 +-
>  drivers/base/memory.c          |    4 ++--
>  include/linux/memory_hotplug.h |    2 +-
>  mm/memory_hotplug.c            |    6 +++---
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
> ===================================================================
> --- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c   2012-07-03 14:21:46.102416917 +0900
> +++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c        2012-07-03 14:21:49.458374960 +0900
> @@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
>          */
>         list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>                 if (info->enabled) {
> -                       result = remove_memory(info->start_addr, info->length);
> +                       result = offline_memory(info->start_addr, info->length);
>                         if (result)
>                                 return result;
>                 }
> Index: linux-3.5-rc4/drivers/base/memory.c
> ===================================================================
> --- linux-3.5-rc4.orig/drivers/base/memory.c    2012-07-03 14:21:46.095417003 +0900
> +++ linux-3.5-rc4/drivers/base/memory.c 2012-07-03 14:21:49.459374948 +0900
> @@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
>                         break;
>                 case MEM_OFFLINE:
>                         start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
> -                       ret = remove_memory(start_paddr,
> -                                           nr_pages << PAGE_SHIFT);
> +                       ret = offline_memory(start_paddr,
> +                                            nr_pages << PAGE_SHIFT);
>                         break;
>                 default:
>                         WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
> Index: linux-3.5-rc4/mm/memory_hotplug.c
> ===================================================================
> --- linux-3.5-rc4.orig/mm/memory_hotplug.c      2012-07-03 14:21:46.102416917 +0900
> +++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:21:49.466374860 +0900
> @@ -990,7 +990,7 @@ out:
>         return ret;
>  }
>
> -int remove_memory(u64 start, u64 size)
> +int offline_memory(u64 start, u64 size)
>  {
>         unsigned long start_pfn, end_pfn;
>
> @@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
>         return offline_pages(start_pfn, end_pfn, 120 * HZ);
>  }
>  #else
> -int remove_memory(u64 start, u64 size)
> +int offline_memory(u64 start, u64 size)
>  {
>         return -EINVAL;
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> -EXPORT_SYMBOL_GPL(remove_memory);
> +EXPORT_SYMBOL_GPL(offline_memory);
> Index: linux-3.5-rc4/include/linux/memory_hotplug.h
> ===================================================================
> --- linux-3.5-rc4.orig/include/linux/memory_hotplug.h   2012-07-03 14:21:46.102416917 +0900
> +++ linux-3.5-rc4/include/linux/memory_hotplug.h        2012-07-03 14:21:49.471374796 +0900
> @@ -233,7 +233,7 @@ static inline int is_mem_section_removab
>  extern int mem_online_node(int nid);
>  extern int add_memory(int nid, u64 start, u64 size);
>  extern int arch_add_memory(int nid, u64 start, u64 size);
> -extern int remove_memory(u64 start, u64 size);
> +extern int offline_memory(u64 start, u64 size);
>  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>                                                                 int nr_pages);
>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
>


-- 
Regards,
--Bob

^ permalink raw reply

* Re: [PATCH] [powerpc] Export memory limit via device tree
From: Suzuki K. Poulose @ 2012-07-19  8:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: mahesh, linuxppc-dev, linux-kernel
In-Reply-To: <1341985013.18850.29.camel@pasglop>

On 07/11/2012 11:06 AM, Benjamin Herrenschmidt wrote:
>> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
>> index c957b12..0c9695d 100644
>> --- a/arch/powerpc/kernel/machine_kexec.c
>> +++ b/arch/powerpc/kernel/machine_kexec.c
>> @@ -207,6 +207,12 @@ static struct property crashk_size_prop = {
>>   	.value = &crashk_size,
>>   };
>>
>> +static struct property memory_limit_prop = {
>> +	.name = "linux,memory-limit",
>> +	.length = sizeof(phys_addr_t),
>> +	.value = &memory_limit,
>> +};
>> +
>
> AFAIK, phys_addr_t can change size, so instead make it point to a known
> fixed-size quantity (a u64).
Ben,

Sorry for the delay in the response.

Some of the other properties are also phys_addr_t (e.g.
linux,crashkernel-base, linux,kernel-end). Should we fix them as well?

Or should we leave this one as a phys_addr_t too and let userspace
handle it?
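Ben's suggestion amounts to giving the property a fixed 8-byte encoding. A minimal sketch, assuming the big-endian byte order used by flattened device tree property values; encode_be64() and decode_be64() are hypothetical helpers. Widening to a 64-bit value before encoding means userspace always reads exactly 8 bytes, whatever width phys_addr_t has in the running kernel.

```c
#include <stdint.h>

/*
 * Encode a value as 8 big-endian bytes. Because the input is always
 * uint64_t, the property length is fixed regardless of phys_addr_t.
 */
static void encode_be64(uint64_t val, uint8_t out[8])
{
	int i;

	for (i = 7; i >= 0; i--) {
		out[i] = (uint8_t)(val & 0xff);	/* least significant byte last */
		val >>= 8;
	}
}

/* The matching decode a userspace tool could apply to the property. */
static uint64_t decode_be64(const uint8_t in[8])
{
	uint64_t val = 0;
	int i;

	for (i = 0; i < 8; i++)
		val = (val << 8) | in[i];
	return val;
}
```

With a fixed-size encoding, userspace needs no knowledge of the kernel's phys_addr_t configuration.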

>
>> +
>> +	/* memory-limit is needed for constructing the crash regions */
>> +	prop = of_find_property(node, memory_limit_prop.name, NULL);
>> +	if (prop)
>> +		prom_remove_property(node, prop);
>> +
>> +	if (memory_limit)
>> +		prom_add_property(node, &memory_limit_prop);
>> +
>
> There's a patch floating around making prom_update_property properly
> handle both pre-existing and non-pre-existing props, you should probably
> base yourself on top of it. I'm about to stick that patch in
> powerpc-next
>
OK. I am testing the new patch on top of that commit. I will wait for
clarification on the type issue before posting it here.
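The quoted code handles the property by hand (find, remove, then add); Ben's point is that one helper can cover both the pre-existing and the missing case. A minimal sketch of that update-or-add behaviour on a toy singly linked list; struct sketch_prop and sketch_update_property() are hypothetical and do not reproduce prom_update_property()'s real data structures or what it does with the replaced property.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified property node; not the kernel's struct property. */
struct sketch_prop {
	const char *name;
	const void *value;
	int length;
	struct sketch_prop *next;
};

/*
 * Replace a property with the same name in place, or append the new one
 * at the tail if no such property exists yet.
 */
static void sketch_update_property(struct sketch_prop **head,
				   struct sketch_prop *newprop)
{
	struct sketch_prop **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, newprop->name) == 0) {
			newprop->next = (*pp)->next;	/* splice over the old one */
			*pp = newprop;
			return;
		}
	}
	newprop->next = NULL;			/* not found: append */
	*pp = newprop;
}
```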

Thanks
Suzuki

^ permalink raw reply

* Re: [PATCH 05/15] pci: resource assignment based on p2p alignment
From: Gavin Shan @ 2012-07-19  7:24 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: yinghai, linuxppc-dev, Ram Pai, Gavin Shan, linux-pci
In-Reply-To: <CAErSpo79HSj+8o9u6P689UEP9qPTUWrnuNPO3NDjErQa6EbhUg@mail.gmail.com>

On Wed, Jul 18, 2012 at 10:59:52AM -0600, Bjorn Helgaas wrote:
>On Tue, Jul 17, 2012 at 10:25 PM, Ram Pai <linuxram@us.ibm.com> wrote:
>> On Tue, Jul 17, 2012 at 11:14:51AM -0600, Bjorn Helgaas wrote:
>>> On Tue, Jul 17, 2012 at 4:38 AM, Benjamin Herrenschmidt
>>> <benh@kernel.crashing.org> wrote:
>>> > On Tue, 2012-07-17 at 18:03 +0800, Ram Pai wrote:
>>> >>         Let's say we passed that 'type' flag to size the minimum
>>> >>         alignment constraints for that b_res.  And window_alignment(bus,
>>> >>         type) of your platform  used that 'type' information to
>>> >>         determine whether to use the alignment constraints of 32-bit
>>> >>         window or 64-bit window.
>>> >>
>>> >>         However, later when the b_res is actually allocated a resource,
>>> >>         the pci_assign_resource() has no idea whether to allocate 32-bit
>>> >>         window resource or 64-bit window resource, because the 'type'
>>> >>         information is not captured anywhere in b_res.
>>> >>
>>> >>         You would basically have a disconnect between what is sized and
>>> >>         what is allocated. Unless of course you pass that 'type' to
>>> >>         the b_res->flags, which is currently not the case.
>>> >
>>> > Right, we ideally would need the core to query the alignment once per
>>> > "apertures" it tries as a candidate to allocate a given resource... but
>>> > that's for later.
>>> >
>>> > For now we can probably live with giving out the max of the minimum
>>> > alignment we support for M64 and our M32 segment size.
>>>
>>> We already know the aperture we're proposing to allocate from (the
>>> result of find_free_bus_resource()), don't we?  What if we passed it
>>> to pcibios_window_alignment() along with the struct pci_bus *, e.g.:
>>>
>>>   resource_size_t pcibios_window_alignment(struct pci_bus *bus, struct
>>> resource *window)
>>
>> Hmm..'struct resource *window' may not yet know which aperture it is
>> allocated from. Will it? We are still in the sizing process, the allocation will
>> be done much later.
>
>Of course, you're absolutely right; I had this backwards.  In
>pbus_size_io/mem(), we do "b_res = find_free_bus_resource()", so b_res
>is a bus resource that matches the desired type (IO/MEM).  This
>resource represents an aperture of the upstream bridge leading to the
>bus.  I was thinking that b_res->start would contain address
>information that the arch could use to decide alignment.
>
>But at this point, in pbus_size_io/mem(), we set "b_res->start =
>min_align", so obviously b_res contains no information about the
>window base that will eventually be assigned.  I think b_res is
>basically the *container* into which we'll eventually put the P2P
>aperture start/end, but here, we're using that container to hold the
>information about the size and alignment we need for that aperture.
>
>The fact that we deal with alignment in pbus_size_mem() and again in
>__pci_assign_resource() (via pcibios_align_resource) is confusing to
>me -- I don't have a clear idea of what sorts of alignment are done in
>each place.  Could this powerpc alignment be done in
>pcibios_align_resource()?  We do have the actual proposed address
>there, as well as the pci_dev.
>

If I understood correctly, it's a bit hard to put the PowerPC alignment in
pcibios_align_resource(). The goal of these patches is to align the I/O and
memory windows of p2p bridges according to a platform-specific requirement,
so that we can put the PCI bus behind each p2p bridge into a separate
EEH segment. Unfortunately, pcibios_align_resource() works on individual
BARs (resources), and individual BARs don't have that alignment
requirement in the current situation :-)
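The constraint applies to the whole bridge window, not to any single BAR. A minimal sketch of Ben's interim rule (use the larger of the minimum M64 alignment and the M32 segment size) and of rounding a window to it, assuming power-of-two alignments. Both function names and the example numbers are hypothetical; this is not an implementation of the proposed pcibios_window_alignment() hook.

```c
#include <stdint.h>

/* Interim rule from the thread: take the larger of the two constraints. */
static uint64_t sketch_window_alignment(uint64_t m64_min_align,
					uint64_t m32_segment_size)
{
	return m64_min_align > m32_segment_size ? m64_min_align
						: m32_segment_size;
}

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t sketch_align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}
```

With a 1 MiB M64 minimum and a 16 MiB M32 segment, a bridge whose children need 3 MiB would still get a 16 MiB window on a 16 MiB boundary, so the bus behind it occupies whole segments of its own.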

Thanks,
Gavin

>Bjorn
>_______________________________________________
>Linuxppc-dev mailing list
>Linuxppc-dev@lists.ozlabs.org
>https://lists.ozlabs.org/listinfo/linuxppc-dev
>

^ permalink raw reply

* Re: [RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
From: Wen Congyang @ 2012-07-19  7:23 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <50068AB9.20005@jp.fujitsu.com>

At 07/18/2012 06:06 PM, Yasuaki Ishimatsu Wrote:
> acpi_memory_device_remove() is meant to remove physical memory, but
> currently the function only frees acpi_memory_device.
> 
> This patch adds the following steps to acpi_memory_device_remove():
>   - offline memory
>   - remove physical memory (for now, this only checks whether the memory is online)
>   - free acpi_memory_device
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org> 
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> 
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> ---
>  drivers/acpi/acpi_memhotplug.c |   27 ++++++++++++++++++++++++++-
>  drivers/base/memory.c          |   39 +++++++++++++++++++++++++++++++++++++++
>  include/linux/memory.h         |    5 +++++
>  include/linux/memory_hotplug.h |    5 +++++
>  mm/memory_hotplug.c            |   22 ++++++++++++++++++++++
>  5 files changed, 97 insertions(+), 1 deletion(-)
> 
> Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
> ===================================================================
> --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c	2012-07-17 11:20:15.117796971 +0900
> +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c	2012-07-17 13:36:30.325594022 +0900
> @@ -29,6 +29,7 @@
>  #include <linux/module.h>
>  #include <linux/init.h>
>  #include <linux/types.h>
> +#include <linux/memory.h>
>  #include <linux/memory_hotplug.h>
>  #include <linux/slab.h>
>  #include <acpi/acpi_drivers.h>
> @@ -452,12 +453,36 @@ static int acpi_memory_device_add(struct
>  static int acpi_memory_device_remove(struct acpi_device *device, int type)
>  {
>  	struct acpi_memory_device *mem_device = NULL;
> -
> +	struct acpi_memory_info *info, *tmp;
> +	int result;
> +	int node;
>  
>  	if (!device || !acpi_driver_data(device))
>  		return -EINVAL;
>  
>  	mem_device = acpi_driver_data(device);
> +
> +	node = acpi_get_node(mem_device->device->handle);
> +	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
> +		if (!info->enabled)
> +			continue;
> +
> +		if (!is_memblk_offline(info->start_addr, info->length)) {
> +			result = offline_memory(info->start_addr, info->length);
> +			if (result)
> +				return result;
> +		}
> +		if (node < 0)
> +			node = memory_add_physaddr_to_nid(info->start_addr);
> +
> +		result = remove_memory(node, info->start_addr, info->length);
> +		if (result)
> +			return result;
> +
> +		list_del(&info->list);
> +		kfree(info);
> +	}
> +
>  	kfree(mem_device);
>  
>  	return 0;
> Index: linux-3.5-rc6/include/linux/memory_hotplug.h
> ===================================================================
> --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-17 11:20:15.133796772 +0900
> +++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-17 11:29:41.490716352 +0900
> @@ -221,6 +221,7 @@ static inline void unlock_memory_hotplug
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  
>  extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
> +extern int remove_memory(int nid, u64 start, u64 size);
>  
>  #else
>  static inline int is_mem_section_removable(unsigned long pfn,
> @@ -228,6 +229,10 @@ static inline int is_mem_section_removab
>  {
>  	return 0;
>  }
> +static inline int remove_memory(int nid, u64 start, u64 size)
> +{
> +	return -EBUSY;
> +}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  extern int mem_online_node(int nid);
> Index: linux-3.5-rc6/mm/memory_hotplug.c
> ===================================================================
> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-17 11:20:15.129796821 +0900
> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-17 13:25:18.952986069 +0900
> @@ -998,6 +998,28 @@ int offline_memory(u64 start, u64 size)
>  	end_pfn = start_pfn + PFN_DOWN(size);
>  	return offline_pages(start_pfn, end_pfn, 120 * HZ);
>  }
> +
> +int remove_memory(int nid, u64 start, u64 size)
> +{
> +	int ret = -EBUSY;
> +	lock_memory_hotplug();
> +	/*
> +	 * The memory might be onlined by another task, even if you offlined it.
> +	 * So we check whether the memory has been onlined or not.
> +	 */
> +	if (!is_memblk_offline(start, size)) {
> +		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
> +			"because the memory range is online\n",
> +			start, start + size);
> +		ret = -EAGAIN;
> +	}
> +
> +	unlock_memory_hotplug();
> +	return ret;
> +
> +}
> +EXPORT_SYMBOL_GPL(remove_memory);
> +
>  #else
>  int offline_memory(u64 start, u64 size)
>  {
> Index: linux-3.5-rc6/drivers/base/memory.c
> ===================================================================
> --- linux-3.5-rc6.orig/drivers/base/memory.c	2012-07-17 11:20:15.120796934 +0900
> +++ linux-3.5-rc6/drivers/base/memory.c	2012-07-17 11:20:54.626302995 +0900
> @@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(
>  }
>  EXPORT_SYMBOL(unregister_memory_isolate_notifier);
>  
> +bool is_memblk_offline(unsigned long start, unsigned long size)
> +{
> +	struct memory_block *mem = NULL;
> +	struct mem_section *section;
> +	unsigned long start_pfn, end_pfn;
> +	unsigned long pfn, section_nr;
> +
> +	start_pfn = PFN_DOWN(start);
> +	end_pfn = start_pfn + PFN_DOWN(start);

This line is wrong. I think you want this:
end_pfn = start_pfn + PFN_UP(size);
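A worked example of why the posted line is wrong: it never consults size, so the scanned range depends only on start. A minimal userspace sketch assuming 4 KiB pages; the SKETCH_ macros mirror the arithmetic of the kernel's PFN_DOWN()/PFN_UP(), and both helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; PAGE_SHIFT is architecture-specific. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

#define SKETCH_PFN_DOWN(x) ((uint64_t)(x) >> SKETCH_PAGE_SHIFT)
#define SKETCH_PFN_UP(x)   (((uint64_t)(x) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)

/* End pfn as computed by the posted is_memblk_offline(). */
static uint64_t end_pfn_posted(uint64_t start, uint64_t size)
{
	(void)size;	/* size is ignored: this is the bug */
	return SKETCH_PFN_DOWN(start) + SKETCH_PFN_DOWN(start);
}

/* End pfn with the suggested fix: the length comes from size. */
static uint64_t end_pfn_fixed(uint64_t start, uint64_t size)
{
	return SKETCH_PFN_DOWN(start) + SKETCH_PFN_UP(size);
}
```

For a 128 MiB block at 4 GiB, the posted code scans up to pfn 0x200000 (another 4 GiB of pfns) instead of stopping at 0x108000.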

> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +		section_nr = pfn_to_section_nr(pfn);
> +		if (!present_section_nr(section_nr));

The ';' should be removed. Otherwise, this function always returns true...
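A small demonstration of the stray ';': it gives the if-statement an empty body, so the indented continue runs on every iteration and nothing after it is ever checked. The helper names are hypothetical; the loop shape mirrors the posted one. GCC can flag this pattern with -Wempty-body.

```c
/* Posted shape: the ';' ends the if, so 'continue' is unconditional. */
static int sections_checked_buggy(const int *present, int n)
{
	int i, checked = 0;

	for (i = 0; i < n; i++) {
		if (!present[i]);	/* stray ';' ends the if here */
			continue;	/* always executed */
		checked++;		/* never reached */
	}
	return checked;
}

/* With the ';' removed, only absent sections are skipped. */
static int sections_checked_fixed(const int *present, int n)
{
	int i, checked = 0;

	for (i = 0; i < n; i++) {
		if (!present[i])
			continue;
		checked++;
	}
	return checked;
}
```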

Thanks
Wen Congyang

> +			continue;
> +
> +		section = __nr_to_section(section_nr);
> +		/* same memblock? */
> +		if (mem)
> +			if((section_nr >= mem->start_section_nr) &&
> +			   (section_nr <= mem->end_section_nr))
> +				continue;
> +
> +		mem = find_memory_block_hinted(section, mem);
> +		if (!mem)
> +			continue;
> +		if (mem->state == MEM_OFFLINE)
> +			continue;
> +
> +		kobject_put(&mem->dev.kobj);
> +		return false;
> +	}
> +
> +	if (mem)
> +		kobject_put(&mem->dev.kobj);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(is_memblk_offline);
> +
>  /*
>   * register_memory - Setup a sysfs device for a memory block
>   */
> Index: linux-3.5-rc6/include/linux/memory.h
> ===================================================================
> --- linux-3.5-rc6.orig/include/linux/memory.h	2012-07-17 11:18:00.693477455 +0900
> +++ linux-3.5-rc6/include/linux/memory.h	2012-07-17 11:20:54.632302919 +0900
> @@ -106,6 +106,10 @@ static inline int memory_isolate_notify(
>  {
>  	return 0;
>  }
> +static inline bool is_memblk_offline(unsigned long start, unsigned long size)
> +{
> +	return false;
> +}
>  #else
>  extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
> @@ -120,6 +124,7 @@ extern int memory_isolate_notify(unsigne
>  extern struct memory_block *find_memory_block_hinted(struct mem_section *,
>  							struct memory_block *);
>  extern struct memory_block *find_memory_block(struct mem_section *);
> +extern bool is_memblk_offline(unsigned long start, unsigned long size);
>  #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
>  enum mem_add_context { BOOT, HOTPLUG };
>  #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> 
> 

^ permalink raw reply

* [RESEND RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Yasuaki Ishimatsu @ 2012-07-19  6:17 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev, linux-acpi
  Cc: len.brown, wency, paulus, minchan.kim, kosaki.motohiro, rientjes,
	cl, akpm, liuj97
In-Reply-To: <50068D09.1050704@jp.fujitsu.com>

Not all pages of the virtual mapping in removed memory can be freed, since
some pages used as PGD/PUD cover not only the removed memory but also other
memory. So the patch checks whether each page can be freed.

How do we check whether a page can be freed?
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
    entry can be cleared. In this case, the page used as PT/PMD can be freed.

With this patch, the two __remove_section() variants are merged into one, so
the CONFIG_SPARSEMEM_VMEMMAP-specific __remove_section() is deleted.
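The two-step check can be sketched in userspace. sketch_memchr_inv() is a stand-in for the kernel's memchr_inv(), which returns a pointer to the first byte not equal to the given value, or NULL if there is none; the helper names are hypothetical and a plain byte array plays the role of the backing page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_INUSE 0xFD	/* same marker value as the patch */

/* Stand-in for memchr_inv(): first byte != c, or NULL if all bytes match. */
static const void *sketch_memchr_inv(const void *start, int c, size_t n)
{
	const unsigned char *p = start;
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/*
 * Step 1: mark [off, off + len) of the backing page as removed.
 * Step 2: report whether the whole page is now marked, i.e. freeable.
 */
static bool sketch_mark_and_check(unsigned char *page, size_t page_size,
				  size_t off, size_t len)
{
	memset(page + off, SKETCH_PAGE_INUSE, len);
	return sketch_memchr_inv(page, SKETCH_PAGE_INUSE, page_size) == NULL;
}
```

The scheme relies on no live struct page data ever filling an entire backing page with 0xFD bytes.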

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org> 
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> 
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

---
 arch/x86/mm/init_64.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h    |    2 
 mm/memory_hotplug.c   |   17 -------
 mm/sparse.c           |    5 +-
 4 files changed, 128 insertions(+), 17 deletions(-)

Index: linux-3.5-rc6/include/linux/mm.h
===================================================================
--- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-19 15:07:48.836986796 +0900
+++ linux-3.5-rc6/include/linux/mm.h	2012-07-19 15:07:59.101858469 +0900
@@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages);
 
 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc6/mm/sparse.c
===================================================================
--- linux-3.5-rc6.orig/mm/sparse.c	2012-07-19 11:57:09.065797011 +0900
+++ linux-3.5-rc6/mm/sparse.c	2012-07-19 15:07:59.114858306 +0900
@@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-	return; /* XXX: Not implemented yet */
+	vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+	vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Index: linux-3.5-rc6/arch/x86/mm/init_64.c
===================================================================
--- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-19 15:07:48.898986022 +0900
+++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-19 15:14:05.870273270 +0900
@@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
 	return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+			    struct page **pp, int *page_size)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	void *page_addr;
+	unsigned long next;
+
+	*pp = NULL;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd))
+		return pgd_addr_end(addr, end);
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return pud_addr_end(addr, end);
+
+	if (!cpu_has_pse) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		pte = pte_offset_kernel(pmd, addr);
+		if (pte_none(*pte))
+			return next;
+
+		*page_size = PAGE_SIZE;
+		*pp = pte_page(*pte);
+	} else {
+		next = pmd_addr_end(addr, end);
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		*page_size = PMD_SIZE;
+		*pp = pmd_page(*pmd);
+	}
+
+	/*
+	 * Removed page structs are filled with 0xFD.
+	 */
+	memset((void *)addr, PAGE_INUSE, next - addr);
+
+	page_addr = page_address(*pp);
+
+	/*
+	 * Check whether the whole page is filled with 0xFD.
+	 * If memchr_inv() returns an address, some other memory still
+	 * uses the page, so we cannot clear the PTE/PMD entry.
+	 * So we cannot free the page either.
+	 *
+	 * If memchr_inv() returns NULL, nothing else uses the page,
+	 * so we can clear the PTE/PMD entry.
+	 * So we can also free the page.
+	 */
+	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
+		*pp = NULL;
+		return next;
+	}
+
+	if (!cpu_has_pse)
+		pte_clear(&init_mm, addr, pte);
+	else
+		pmd_clear(pmd);
+
+	return next;
+}
+
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+	unsigned long addr = (unsigned long)memmap;
+	unsigned long end = (unsigned long)(memmap + nr_pages);
+	unsigned long next;
+	struct page *page;
+	int page_size;
+
+	for (; addr < end; addr = next) {
+		page = NULL;
+		page_size = 0;
+		next = find_and_clear_pte_page(addr, end, &page, &page_size);
+		if (!page)
+			continue;
+
+		free_pages((unsigned long)page_address(page),
+			    get_order(page_size));
+		__flush_tlb_one((unsigned long)page_address(page));
+	}
+
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+	unsigned long addr = (unsigned long)memmap;
+	unsigned long end = (unsigned long)(memmap + nr_pages);
+	unsigned long next;
+	struct page *page;
+	int page_size;
+	unsigned long magic;
+
+	for (; addr < end; addr = next) {
+		page = NULL;
+		page_size = 0;
+		next = find_and_clear_pte_page(addr, end, &page, &page_size);
+		if (!page)
+			continue;
+
+		magic = (unsigned long) page->lru.next;
+		if (magic == SECTION_INFO)
+			put_page_bootmem(page);
+		flush_tlb_kernel_range(addr, end);
+	}
+
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
Index: linux-3.5-rc6/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-19 15:07:48.815987060 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-19 15:12:27.536502452 +0900
@@ -300,19 +300,6 @@ static int __meminit __add_section(int n
 	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static int __remove_section(struct zone *zone, struct mem_section *ms)
-{
-	int ret = -EINVAL;
-
-	if (!valid_section(ms))
-		return ret;
-
-	ret = unregister_memory_section(ms);
-
-	return ret;
-}
-#else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
 	unsigned long flags;
@@ -329,9 +316,9 @@ static int __remove_section(struct zone 
 	pgdat_resize_lock(pgdat, &flags);
 	sparse_remove_one_section(zone, ms);
 	pgdat_resize_unlock(pgdat, &flags);
-	return 0;
+
+	return ret;
 }
-#endif
 
 /*
  * Reasonably generic function for adding memory.  It is

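In the PSE case, find_and_clear_pte_page() advances the walk with pmd_addr_end() and the memset() covers [addr, next). A minimal userspace sketch of that stepping, assuming 2 MiB PMDs as on x86-64; sketch_pmd_addr_end() mirrors the arithmetic of the generic pmd_addr_end() macro, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 2 MiB PMD mappings, as on x86-64. */
#define SKETCH_PMD_SIZE (1ULL << 21)
#define SKETCH_PMD_MASK (~(SKETCH_PMD_SIZE - 1))

/*
 * Next PMD boundary after addr, clamped to end. Comparing "- 1" values
 * keeps the result correct if the boundary wraps to 0 at the top.
 */
static uint64_t sketch_pmd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + SKETCH_PMD_SIZE) & SKETCH_PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Count the steps a walk over [addr, end) takes. */
static int sketch_count_pmd_steps(uint64_t addr, uint64_t end)
{
	int steps = 0;

	while (addr < end) {
		addr = sketch_pmd_addr_end(addr, end);
		steps++;
	}
	return steps;
}
```

A walk over [1 MiB, 5 MiB) takes three steps: to 2 MiB, to 4 MiB, then clamped to 5 MiB.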
^ permalink raw reply

* Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Yasuaki Ishimatsu @ 2012-07-19  6:11 UTC (permalink / raw)
  To: Wen Congyang
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <5007A217.2060500@cn.fujitsu.com>

Hi Wen,

2012/07/19 14:58, Wen Congyang wrote:
> At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
>> Not all pages of the virtual mapping in removed memory can be freed, since
>> some pages used as PGD/PUD cover not only the removed memory but also other
>> memory. So the patch checks whether each page can be freed.
>>
>> How do we check whether a page can be freed?
>>   1. When removing memory, the page structs of the removed memory are filled
>>      with 0xFD.
>>   2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
>>      entry can be cleared. In this case, the page used as PT/PMD can be freed.
>>
>> With this patch, the two __remove_section() variants are merged into one, so
>> the CONFIG_SPARSEMEM_VMEMMAP-specific __remove_section() is deleted.
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> ---
>>   arch/x86/mm/init_64.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/mm.h    |    2
>>   mm/memory_hotplug.c   |   19 -------
>>   mm/sparse.c           |    5 +-
>>   4 files changed, 128 insertions(+), 19 deletions(-)
>>
>> Index: linux-3.5-rc6/include/linux/mm.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
>> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>>   void vmemmap_populate_print_last(void);
>>   void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>>   				  unsigned long size);
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages);
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages);
>>   
>>   enum mf_flags {
>>   	MF_COUNT_INCREASED = 1 << 0,
>> Index: linux-3.5-rc6/mm/sparse.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.000000000 +0900
>> +++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
>> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>>   	/* This will make the necessary allocations eventually. */
>>   	return sparse_mem_map_populate(pnum, nid);
>>   }
>> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>>   {
>> -	return; /* XXX: Not implemented yet */
>> +	vmemmap_kfree(page, nr_pages);
>>   }
>>   static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>>   {
>> +	vmemmap_free_bootmem(page, nr_pages);
>>   }
>>   #else
>>   static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>> Index: linux-3.5-rc6/arch/x86/mm/init_64.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
>> @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
>>   	return 0;
>>   }
>>   
>> +#define PAGE_INUSE 0xFD
>> +
>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>> +			    struct page **pp, int *page_size)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>> +	void *page_addr;
>> +	unsigned long next;
>> +
>> +	*pp = NULL;
>> +
>> +	pgd = pgd_offset_k(addr);
>> +	if (pgd_none(*pgd))
>> +		return pgd_addr_end(addr, end);
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	if (pud_none(*pud))
>> +		return pud_addr_end(addr,end);
>> +
>> +	if (!cpu_has_pse) {
>> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		pte = pte_offset_kernel(pmd, addr);
>> +		if (pte_none(*pte))
>> +			return next;
>> +
>> +		*page_size = PAGE_SIZE;
>> +		*pp = pte_page(*pte);
>> +	} else {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		*page_size = PMD_SIZE;
>> +		*pp = pmd_page(*pmd);
>> +	}
>> +
>> +	/*
>> +	 * Removed page structs are filled with 0xFD.
>> +	 */
>> +	memset((void *)addr, PAGE_INUSE, next - addr);
>> +
>> +	page_addr = page_address(*pp);
>> +
>> +	/*
>> +	 * Check whether the whole page is filled with 0xFD.
>> +	 * If memchr_inv() returns an address, some other memory still
>> +	 * uses the page, so we cannot clear the PTE/PMD entry.
>> +	 * So we cannot free the page either.
>> +	 *
>> +	 * If memchr_inv() returns NULL, nothing else uses the page,
>> +	 * so we can clear the PTE/PMD entry.
>> +	 * So we can also free the page.
>> +	 */
>> +	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
>> +		*pp = NULL;
>> +		return next;
>> +	}
>> +
>> +	if (!cpu_has_pse)
>> +		pte_clear(&init_mm, addr, pte);
>> +	else
>> +		pmd_clear(pmd);
>> +
>> +	return next;
>> +}
>> +
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	struct page *page;
>> +	int page_size;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		page_size = 0;
>> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
>> +		if (!page)
>> +			continue;
>> +
>> +		free_pages((unsigned long)page_address(page),
>> +			    get_order(page_size));
>> +		__flush_tlb_one((unsigned long)page_address(page));
>> +	}
>> +
>> +}
>> +
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	struct page *page;
>> +	int page_size;
>> +	unsigned long magic;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		page_size = 0;
>> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
>> +		if (!page)
>> +			continue;
>> +
>> +		magic = (unsigned long) page->lru.next;
>> +		if (magic == SECTION_INFO)
>> +			put_page_bootmem(page);
>> +		flush_tlb_kernel_range(addr, end);
>> +	}
>> +
>> +}
>> +
>>   void register_page_bootmem_memmap(unsigned long section_nr,
>>   				  struct page *start_page, unsigned long size)
>>   {
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:01:28.000000000 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:25:11.036597977 +0900
>> @@ -300,7 +300,6 @@ static int __meminit __add_section(int n
>>   	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
>>   }
>>   
>> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
>>   static int __remove_section(struct zone *zone, struct mem_section *ms)
>>   {
>>   	int ret = -EINVAL;
>> @@ -309,29 +308,15 @@ static int __remove_section(struct zone
>>   		return ret;
>>   
>>   	ret = unregister_memory_section(ms);
>> -
>> -	return ret;
>> -}
>> -#else
>> -static int __remove_section(struct zone *zone, struct mem_section *ms)
>> -{
>> -	unsigned long flags;
>> -	struct pglist_data *pgdat = zone->zone_pgdat;
> 
> These two lines should not be removed; otherwise, the kernel cannot be built.

Oops. I'll resend the patch.

Thanks,
Yasuaki Ishimatsu

> 
> Thanks
> Wen Congyang
> 
>> -	int ret = -EINVAL;
>> -
>> -	if (!valid_section(ms))
>> -		return ret;
>> -
>> -	ret = unregister_memory_section(ms);
>>   	if (ret)
>>   		return ret;
>>   
>>   	pgdat_resize_lock(pgdat, &flags);
>>   	sparse_remove_one_section(zone, ms);
>>   	pgdat_resize_unlock(pgdat, &flags);
>> -	return 0;
>> +
>> +	return ret;
>>   }
>> -#endif
>>   
>>   /*
>>    * Reasonably generic function for adding memory.  It is
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 


* Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Wen Congyang @ 2012-07-19  5:58 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <50068D09.1050704@jp.fujitsu.com>

At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
> Not all pages of the virtual mapping for removed memory can be freed, since
> some pages used as PGD/PUD cover not only the removed memory but also other
> memory. So the patch checks whether each page can be freed.
> 
> How do we check whether a page can be freed?
>  1. When removing memory, the page structs of the removed memory are filled
>     with 0xFD.
>  2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>     cleared. In this case, the page used as the PT/PMD can be freed.
> 
> With this patch applied, the CONFIG_SPARSEMEM_VMEMMAP and
> !CONFIG_SPARSEMEM_VMEMMAP versions of __remove_section() are merged into one,
> so the CONFIG_SPARSEMEM_VMEMMAP-specific __remove_section() is deleted.
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org> 
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> 
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> ---
>  arch/x86/mm/init_64.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mm.h    |    2 
>  mm/memory_hotplug.c   |   19 -------
>  mm/sparse.c           |    5 +-
>  4 files changed, 128 insertions(+), 19 deletions(-)
> 
> Index: linux-3.5-rc6/include/linux/mm.h
> ===================================================================
> --- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.000000000 +0900
> +++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>  void vmemmap_populate_print_last(void);
>  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>  				  unsigned long size);
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages);
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages);
>  
>  enum mf_flags {
>  	MF_COUNT_INCREASED = 1 << 0,
> Index: linux-3.5-rc6/mm/sparse.c
> ===================================================================
> --- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.000000000 +0900
> +++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>  	/* This will make the necessary allocations eventually. */
>  	return sparse_mem_map_populate(pnum, nid);
>  }
> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>  {
> -	return; /* XXX: Not implemented yet */
> +	vmemmap_kfree(page, nr_pages);
>  }
>  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>  {
> +	vmemmap_free_bootmem(page, nr_pages);
>  }
>  #else
>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
> Index: linux-3.5-rc6/arch/x86/mm/init_64.c
> ===================================================================
> --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.000000000 +0900
> +++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
> @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
>  	return 0;
>  }
>  
> +#define PAGE_INUSE 0xFD
> +
> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
> +			    struct page **pp, int *page_size)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	void *page_addr;
> +	unsigned long next;
> +
> +	*pp = NULL;
> +
> +	pgd = pgd_offset_k(addr);
> +	if (pgd_none(*pgd))
> +		return pgd_addr_end(addr, end);
> +
> +	pud = pud_offset(pgd, addr);
> +	if (pud_none(*pud))
> +		return pud_addr_end(addr,end);
> +
> +	if (!cpu_has_pse) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		pte = pte_offset_kernel(pmd, addr);
> +		if (pte_none(*pte))
> +			return next;
> +
> +		*page_size = PAGE_SIZE;
> +		*pp = pte_page(*pte);
> +	} else {
> +		next = pmd_addr_end(addr, end);
> +
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		*page_size = PMD_SIZE;
> +		*pp = pmd_page(*pmd);
> +	}
> +
> +	/*
> +	 * Removed page structs are filled with 0xFD.
> +	 */
> +	memset((void *)addr, PAGE_INUSE, next - addr);
> +
> +	page_addr = page_address(*pp);
> +
> +	/*
> +	 * Check whether the page is filled with 0xFD.
> +	 * If memchr_inv() returns an address, we cannot clear the
> +	 * PTE/PMD entry, since the page is still used by other
> +	 * memory, so we cannot free the page either.
> +	 *
> +	 * If memchr_inv() returns NULL, we can clear the PTE/PMD
> +	 * entry, since the page is not used by other memory, so we
> +	 * can also free the page.
> +	 */
> +	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
> +		*pp = NULL;
> +		return next;
> +	}
> +
> +	if (!cpu_has_pse)
> +		pte_clear(&init_mm, addr, pte);
> +	else
> +		pmd_clear(pmd);
> +
> +	return next;
> +}
> +
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	struct page *page;
> +	int page_size;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		page_size = 0;
> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
> +		if (!page)
> +			continue;
> +
> +		free_pages((unsigned long)page_address(page),
> +			    get_order(page_size));
> +		__flush_tlb_one((unsigned long)page_address(page));
> +	}
> +
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	struct page *page;
> +	int page_size;
> +	unsigned long magic;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		page_size = 0;
> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
> +		if (!page)
> +			continue;
> +
> +		magic = (unsigned long) page->lru.next;
> +		if (magic == SECTION_INFO)
> +			put_page_bootmem(page);
> +		flush_tlb_kernel_range(addr, end);
> +	}
> +
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
>  				  struct page *start_page, unsigned long size)
>  {
> Index: linux-3.5-rc6/mm/memory_hotplug.c
> ===================================================================
> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:01:28.000000000 +0900
> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:25:11.036597977 +0900
> @@ -300,7 +300,6 @@ static int __meminit __add_section(int n
>  	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
>  }
>  
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>  {
>  	int ret = -EINVAL;
> @@ -309,29 +308,15 @@ static int __remove_section(struct zone 
>  		return ret;
>  
>  	ret = unregister_memory_section(ms);
> -
> -	return ret;
> -}
> -#else
> -static int __remove_section(struct zone *zone, struct mem_section *ms)
> -{
> -	unsigned long flags;
> -	struct pglist_data *pgdat = zone->zone_pgdat;

These two lines should not be removed; otherwise, the kernel cannot be built.

Thanks
Wen Congyang

> -	int ret = -EINVAL;
> -
> -	if (!valid_section(ms))
> -		return ret;
> -
> -	ret = unregister_memory_section(ms);
>  	if (ret)
>  		return ret;
>  
>  	pgdat_resize_lock(pgdat, &flags);
>  	sparse_remove_one_section(zone, ms);
>  	pgdat_resize_unlock(pgdat, &flags);
> -	return 0;
> +
> +	return ret;
>  }
> -#endif
>  
>  /*
>   * Reasonably generic function for adding memory.  It is
> 


* Re: [PATCH] radeonfb: Add quirk for the graphics adapter in some JSxx
From: Tony Breeds @ 2012-07-19  5:48 UTC (permalink / raw)
  To: olaf; +Cc: linux-fbdev, linuxppc-dev
In-Reply-To: <1342630144-16350-1-git-send-email-olaf@aepfle.de>


On Wed, Jul 18, 2012 at 06:49:04PM +0200, olaf@aepfle.de wrote:

Thanks for following this up.

> From: Tony Breeds <tony@bakeyournoodle.com>
> 
> These devices are set to 640x480 by firmware; switch them to 800x600@60
> so that the graphical installer can run on the remote console.
> 
> Reported by IBM during SLES10 SP2 beta testing:
> 
> https://bugzilla.novell.com/show_bug.cgi?id=461002
> LTC50817
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>

Yours Tony



* Re: [PATCH] radeonfb: Add quirk for the graphics adapter in some JSxx
From: Olaf Hering @ 2012-07-19  5:20 UTC (permalink / raw)
  To: Jingoo Han, 'Tony Breeds'; +Cc: linux-fbdev, linuxppc-dev
In-Reply-To: <001201cd656b$b2794350$176bc9f0$%han@samsung.com>

On Thu, Jul 19, Jingoo Han wrote:

> On Thursday, July 19, 2012 1:49 AM, olaf@aepfle.de wrote:
> > From: Tony Breeds <tony@bakeyournoodle.com>
> > 
> > These devices are set to 640x480 by firmware; switch them to 800x600@60
> > so that the graphical installer can run on the remote console.
> > 
> > Reported by IBM during SLES10 SP2 beta testing:
> > 
> > https://bugzilla.novell.com/show_bug.cgi?id=461002
> > LTC50817
> > 
> > Signed-off-by: Olaf Hering <olaf@aepfle.de>
> 
> If the author is Tony Breeds, please add 'Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>'.

He provided the initial version of the change, but did not add his tag
back in 2009. Tony, perhaps you can do that now?

Olaf


