LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] powerpc/mpc8572ds: Fix eTSEC is not available on core1 of AMP boot issue
From: Jia Hongtao @ 2012-07-10  6:08 UTC (permalink / raw)
  To: linuxppc-dev, galak; +Cc: b38951

The issue log on core1 is:
root@mpc8572ds:~# ifconfig eth0 10.192.208.244
net eth0: could not attach to PHY
SIOCSIFFLAGS: No such device

To attach PHY node mdio@24520 should not be disabled in dts of core1.
Because all PHYs are controlled through this node as follows:

mdio@24520 {
	phy0: ethernet-phy@0 {
		interrupts = <10 1 0 0>;
		reg = <0x0>;
	};
	phy1: ethernet-phy@1 {
		interrupts = <10 1 0 0>;
		reg = <0x1>;
	};
	phy2: ethernet-phy@2 {
		interrupts = <10 1 0 0>;
		reg = <0x2>;
	};
	phy3: ethernet-phy@3 {
		interrupts = <10 1 0 0>;
		reg = <0x3>;
	};

	tbi0: tbi-phy@11 {
		reg = <0x11>;
		device_type = "tbi-phy";
	};
};

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jia Hongtao <B38951@freescale.com>
---
 arch/powerpc/boot/dts/mpc8572ds_camp_core1.dts |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc8572ds_camp_core1.dts b/arch/powerpc/boot/dts/mpc8572ds_camp_core1.dts
index d6a8faf..1932396 100644
--- a/arch/powerpc/boot/dts/mpc8572ds_camp_core1.dts
+++ b/arch/powerpc/boot/dts/mpc8572ds_camp_core1.dts
@@ -67,9 +67,6 @@
 		ethernet@24000 {
 			status = "disabled";
 		};
-		mdio@24520 {
-			status = "disabled";
-		};
 		ptp_clock@24e00 {
 			status = "disabled";
 		};
-- 
1.7.5.1

^ permalink raw reply related

* [PATCH 0/3] Raid: enable talitos xor offload for improving performance
From: Qiang Liu @ 2012-07-10  6:00 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev

The following 4 patches enabling fsl-dma and talitos offload raid
operations for improving raid performance and balancing CPU load.

Write performance will be improved by 40% tested by iozone. CPU load
will be reduced by 8%.

Qiang Liu (4):
      Talitos: move the data structure into header file
      Talitos: Support for async_tx XOR offload
      fsl-dma: support attribute of DMA_MEMORY when async_tx enabled
      Talitos: fix the issue of dma memory leak

 drivers/crypto/Kconfig   |    9 +
 drivers/crypto/talitos.c |  520 ++++++++++++++++++++++++++++++++++++----------
 drivers/crypto/talitos.h |  161 ++++++++++++++
 drivers/dma/fsldma.c     |   78 +++-----
 4 files changed, 606 insertions(+), 162 deletions(-)

^ permalink raw reply

* [PATCH 4/4] Talitos: fix the issue of dma memory leak
From: Qiang Liu @ 2012-07-10  6:00 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev; +Cc: Qiang Liu, Herbert Xu, David S. Miller

An error will be happened when test with mass data:
"DMA-API: device driver tries to sync DMA memory it has not allocated";
"DMA-API: debugging out of memory - disabling"
dma mapping memory of request->desc is not released by right device,
it should be private->dev but not dev;

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
---
 drivers/crypto/talitos.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 81f8497..a7da48c 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -264,7 +264,7 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
 			else
 				status = error;

-		dma_unmap_single(dev, request->dma_desc,
+		 dma_unmap_single(priv->dev, request->dma_desc,
 				 sizeof(struct talitos_desc),
 				 DMA_BIDIRECTIONAL);

--
1.7.5.1

^ permalink raw reply related

* [PATCH 3/4] fsl-dma: support attribute of DMA_MEMORY when async_tx enabled
From: Qiang Liu @ 2012-07-10  5:59 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev; +Cc: Qiang Liu, Vinod Koul, Dan Williams

- delete attribute of DMA_INTERRUPT because fsl-dma doesn't support
this function, exception will be thrown if talitos is used to compute xor
at the same time;
- change the release process of dma descriptor for avoiding exception when
enable config NET_DMA, release dma descriptor from 1st to last second, the
last descriptor which is reserved in current descriptor register may not be
completed, race condition will be raised if free current descriptor;
- use spin_lock_bh to instead of spin_lock_irqsave for improving performance;

A race condition which is raised when use both of talitos and dmaengine to
offload xor is because napi scheduler will sync all pending requests in dma
channels, it will affect the process of raid operations. The descriptor is
freed which is submitted just now, but async_tx must check whether this depend
tx descriptor is acked, there are poison contents in the invalid address,
then BUG_ON() is thrown, so this descriptor will be freed in the next time.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
---
 drivers/dma/fsldma.c |   78 ++++++++++++++++----------------------------------
 1 files changed, 25 insertions(+), 53 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index b98070c..44698c9 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -404,11 +404,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
 	struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
 	struct fsl_desc_sw *child;
-	unsigned long flags;
 	dma_cookie_t cookie;

-	spin_lock_irqsave(&chan->desc_lock, flags);
-
+	spin_lock_bh(&chan->desc_lock);
 	/*
 	 * assign cookies to all of the software descriptors
 	 * that make up this transaction
@@ -427,7 +425,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	/* put this transaction onto the tail of the pending queue */
 	append_ld_queue(chan, desc);

-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_bh(&chan->desc_lock);

 	return cookie;
 }
@@ -536,48 +534,18 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
 {
 	struct fsldma_chan *chan = to_fsl_chan(dchan);
-	unsigned long flags;

 	chan_dbg(chan, "free all channel resources\n");
-	spin_lock_irqsave(&chan->desc_lock, flags);
+	spin_lock_bh(&chan->desc_lock);
 	fsldma_free_desc_list(chan, &chan->ld_pending);
 	fsldma_free_desc_list(chan, &chan->ld_running);
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_bh(&chan->desc_lock);

 	dma_pool_destroy(chan->desc_pool);
 	chan->desc_pool = NULL;
 }

 static struct dma_async_tx_descriptor *
-fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
-{
-	struct fsldma_chan *chan;
-	struct fsl_desc_sw *new;
-
-	if (!dchan)
-		return NULL;
-
-	chan = to_fsl_chan(dchan);
-
-	new = fsl_dma_alloc_descriptor(chan);
-	if (!new) {
-		chan_err(chan, "%s\n", msg_ld_oom);
-		return NULL;
-	}
-
-	new->async_tx.cookie = -EBUSY;
-	new->async_tx.flags = flags;
-
-	/* Insert the link descriptor to the LD ring */
-	list_add_tail(&new->node, &new->tx_list);
-
-	/* Set End-of-link to the last link descriptor of new list */
-	set_ld_eol(chan, new);
-
-	return &new->async_tx;
-}
-
-static struct dma_async_tx_descriptor *
 fsl_dma_prep_memcpy(struct dma_chan *dchan,
 	dma_addr_t dma_dst, dma_addr_t dma_src,
 	size_t len, unsigned long flags)
@@ -788,7 +756,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 {
 	struct dma_slave_config *config;
 	struct fsldma_chan *chan;
-	unsigned long flags;
 	int size;

 	if (!dchan)
@@ -798,7 +765,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,

 	switch (cmd) {
 	case DMA_TERMINATE_ALL:
-		spin_lock_irqsave(&chan->desc_lock, flags);
+		spin_lock_bh(&chan->desc_lock);

 		/* Halt the DMA engine */
 		dma_halt(chan);
@@ -808,7 +775,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 		fsldma_free_desc_list(chan, &chan->ld_running);
 		chan->idle = true;

-		spin_unlock_irqrestore(&chan->desc_lock, flags);
+		spin_unlock_bh(&chan->desc_lock);
 		return 0;

 	case DMA_SLAVE_CONFIG:
@@ -968,11 +935,10 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
 	struct fsldma_chan *chan = to_fsl_chan(dchan);
-	unsigned long flags;

-	spin_lock_irqsave(&chan->desc_lock, flags);
+	spin_lock_bh(&chan->desc_lock);
 	fsl_chan_xfer_ld_queue(chan);
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_bh(&chan->desc_lock);
 }

 /**
@@ -1073,13 +1039,20 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 static void dma_do_tasklet(unsigned long data)
 {
 	struct fsldma_chan *chan = (struct fsldma_chan *)data;
-	struct fsl_desc_sw *desc, *_desc;
+	struct fsl_desc_sw *desc, *_desc, *prev = NULL;
 	LIST_HEAD(ld_cleanup);
-	unsigned long flags;
+	dma_addr_t curr_phys = get_cdar(chan);

 	chan_dbg(chan, "tasklet entry\n");

-	spin_lock_irqsave(&chan->desc_lock, flags);
+	spin_lock_bh(&chan->desc_lock);
+
+	/* find the descriptor which is already completed */
+	list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
+		if (prev && desc->async_tx.phys == curr_phys)
+			break;
+		prev = desc;
+	}

 	/* update the cookie if we have some descriptors to cleanup */
 	if (!list_empty(&chan->ld_running)) {
@@ -1092,23 +1065,24 @@ static void dma_do_tasklet(unsigned long data)
 		chan_dbg(chan, "completed_cookie=%d\n", cookie);
 	}

+	/* the hardware is now idle and ready for more */
+	chan->idle = true;
+
 	/*
 	 * move the descriptors to a temporary list so we can drop the lock
 	 * during the entire cleanup operation
 	 */
-	list_splice_tail_init(&chan->ld_running, &ld_cleanup);
-
-	/* the hardware is now idle and ready for more */
-	chan->idle = true;
+	list_cut_position(&ld_cleanup, &chan->ld_running, &prev->node);

 	/*
-	 * Start any pending transactions automatically
+	 * Start any pending transactions automatically if current descriptor
+	 * list is completed
 	 *
 	 * In the ideal case, we keep the DMA controller busy while we go
 	 * ahead and free the descriptors below.
 	 */
 	fsl_chan_xfer_ld_queue(chan);
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_bh(&chan->desc_lock);

 	/* Run the callback for each descriptor, in order */
 	list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
@@ -1360,12 +1334,10 @@ static int __devinit fsldma_of_probe(struct platform_device *op)
 	fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);

 	dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
-	dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
 	dma_cap_set(DMA_SG, fdev->common.cap_mask);
 	dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
 	fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
 	fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
-	fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
 	fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
 	fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
 	fdev->common.device_tx_status = fsl_tx_status;
--
1.7.5.1

^ permalink raw reply related

* [PATCH 2/4] Talitos: Support for async_tx XOR offload
From: Qiang Liu @ 2012-07-10  5:58 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev; +Cc: Qiang Liu, Herbert Xu, David S. Miller

Expose Talitos's XOR functionality to be used for RAID parity
calculation via the Async_tx layer.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Dipen Dudhat <Dipen.Dudhat@freescale.com>
Signed-off-by: Maneesh Gupta <Maneesh.Gupta@freescale.com>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Vishnu Suresh <Vishnu@freescale.com>
Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
---
 drivers/crypto/Kconfig   |    9 +
 drivers/crypto/talitos.c |  410 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/crypto/talitos.h |   53 ++++++
 3 files changed, 472 insertions(+), 0 deletions(-)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 69fdf18..ad0cf5a 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -220,6 +220,15 @@ config CRYPTO_DEV_TALITOS
 	  To compile this driver as a module, choose M here: the module
 	  will be called talitos.

+config CRYPTO_DEV_TALITOS_RAIDXOR
+	bool "Talitos RAID5 XOR Calculation Offload"
+	default y
+	select DMA_ENGINE
+	depends on CRYPTO_DEV_TALITOS
+	help
+	  Say 'Y' here to use the Freescale Security Engine (SEC) to
+	  offload RAID XOR parity Calculation
+
 config CRYPTO_DEV_IXP4XX
 	tristate "Driver for IXP4xx crypto hardware acceleration"
 	depends on ARCH_IXP4XX
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index cb9e4cd..81f8497 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -609,6 +609,396 @@ static void talitos_unregister_rng(struct device *dev)
 	hwrng_unregister(&priv->rng);
 }

+#ifdef CONFIG_CRYPTO_DEV_TALITOS_RAIDXOR
+static void talitos_release_xor(struct device *dev, struct talitos_desc *hwdesc,
+				void *context, int error);
+
+static enum dma_status talitos_is_tx_complete(struct dma_chan *chan,
+					      dma_cookie_t cookie,
+					      struct dma_tx_state *state)
+{
+	struct talitos_xor_chan *xor_chan;
+	dma_cookie_t last_used;
+	dma_cookie_t last_complete;
+
+	xor_chan = container_of(chan, struct talitos_xor_chan, common);
+
+	last_used = chan->cookie;
+	last_complete = xor_chan->completed_cookie;
+
+	if (state->last)
+		state->last = last_complete;
+
+	if (state->used)
+		state->used = last_used;
+
+	return dma_async_is_complete(cookie, last_complete, last_used);
+}
+
+static void talitos_process_pending(struct talitos_xor_chan *xor_chan)
+{
+	struct talitos_xor_desc *desc, *_desc;
+	unsigned long flags;
+	int status;
+	struct talitos_private *priv;
+	int ch;
+
+	priv = dev_get_drvdata(xor_chan->dev);
+	ch = atomic_inc_return(&priv->last_chan) &
+				  (priv->num_channels - 1);
+	spin_lock_irqsave(&xor_chan->desc_lock, flags);
+
+	list_for_each_entry_safe(desc, _desc, &xor_chan->pending_q, node) {
+		status = talitos_submit(xor_chan->dev, ch, &desc->hwdesc,
+					talitos_release_xor, desc);
+		if (status != -EINPROGRESS)
+			break;
+
+		list_del(&desc->node);
+		list_add_tail(&desc->node, &xor_chan->in_progress_q);
+	}
+
+	spin_unlock_irqrestore(&xor_chan->desc_lock, flags);
+}
+
+static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc *desc,
+		struct talitos_xor_chan *xor_chan)
+{
+	struct device *dev = xor_chan->dev;
+	dma_addr_t dest, addr;
+	unsigned int src_cnt = desc->unmap_src_cnt;
+	unsigned int len = desc->unmap_len;
+	enum dma_ctrl_flags flags = desc->async_tx.flags;
+	struct dma_async_tx_descriptor *tx = &desc->async_tx;
+
+	/* unmap dma addresses */
+	dest = desc->hwdesc.ptr[6].ptr;
+	if (likely(!(flags & DMA_COMPL_SKIP_DEST_UNMAP)))
+		dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL);
+
+	desc->idx = 6 - src_cnt;
+	while(desc->idx < 6) {
+		addr = desc->hwdesc.ptr[desc->idx++].ptr;
+		if (likely(!(flags & DMA_COMPL_SKIP_SRC_UNMAP)))
+			dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
+	}
+
+	/* run dependent operations */
+	dma_run_dependencies(tx);
+}
+
+static void talitos_release_xor(struct device *dev, struct talitos_desc *hwdesc,
+				void *context, int error)
+{
+	struct talitos_xor_desc *desc = context;
+	struct talitos_xor_chan *xor_chan;
+	dma_async_tx_callback callback;
+	void *callback_param;
+
+	if (unlikely(error))
+		dev_err(dev, "xor operation: talitos error %d\n", error);
+
+	xor_chan = container_of(desc->async_tx.chan, struct talitos_xor_chan,
+				common);
+	spin_lock_bh(&xor_chan->desc_lock);
+	if (xor_chan->completed_cookie < desc->async_tx.cookie)
+		xor_chan->completed_cookie = desc->async_tx.cookie;
+
+	callback = desc->async_tx.callback;
+	callback_param = desc->async_tx.callback_param;
+
+	if (callback) {
+		spin_unlock_bh(&xor_chan->desc_lock);
+		callback(callback_param);
+		spin_lock_bh(&xor_chan->desc_lock);
+	}
+
+	talitos_xor_run_tx_complete_actions(desc, xor_chan);
+
+	list_del(&desc->node);
+	list_add_tail(&desc->node, &xor_chan->free_desc);
+	spin_unlock_bh(&xor_chan->desc_lock);
+	if (!list_empty(&xor_chan->pending_q))
+		talitos_process_pending(xor_chan);
+}
+
+/**
+ * talitos_issue_pending - move the descriptors in submit
+ * queue to pending queue and submit them for processing
+ * @chan: DMA channel
+ */
+static void talitos_issue_pending(struct dma_chan *chan)
+{
+	struct talitos_xor_chan *xor_chan;
+
+	xor_chan = container_of(chan, struct talitos_xor_chan, common);
+	spin_lock_bh(&xor_chan->desc_lock);
+	list_splice_tail_init(&xor_chan->submit_q,
+				 &xor_chan->pending_q);
+	spin_unlock_bh(&xor_chan->desc_lock);
+	talitos_process_pending(xor_chan);
+}
+
+static dma_cookie_t talitos_async_tx_submit(struct dma_async_tx_descriptor *tx)
+{
+	struct talitos_xor_desc *desc;
+	struct talitos_xor_chan *xor_chan;
+	dma_cookie_t cookie;
+
+	desc = container_of(tx, struct talitos_xor_desc, async_tx);
+	xor_chan = container_of(tx->chan, struct talitos_xor_chan, common);
+
+	spin_lock_bh(&xor_chan->desc_lock);
+
+	cookie = xor_chan->common.cookie + 1;
+	if (cookie < 0)
+		cookie = 1;
+
+	desc->async_tx.cookie = cookie;
+	xor_chan->common.cookie = desc->async_tx.cookie;
+
+	list_splice_tail_init(&desc->tx_list,
+				 &xor_chan->submit_q);
+
+	spin_unlock_bh(&xor_chan->desc_lock);
+
+	return cookie;
+}
+
+static struct talitos_xor_desc *talitos_xor_alloc_descriptor(
+				struct talitos_xor_chan *xor_chan, gfp_t flags)
+{
+	struct talitos_xor_desc *desc;
+
+	desc = kmalloc(sizeof(*desc), flags);
+	if (desc) {
+		xor_chan->total_desc++;
+		desc->async_tx.tx_submit = talitos_async_tx_submit;
+	}
+
+	return desc;
+}
+
+static void talitos_free_chan_resources(struct dma_chan *chan)
+{
+	struct talitos_xor_chan *xor_chan;
+	struct talitos_xor_desc *desc, *_desc;
+
+	xor_chan = container_of(chan, struct talitos_xor_chan, common);
+
+	spin_lock_bh(&xor_chan->desc_lock);
+
+	list_for_each_entry_safe(desc, _desc, &xor_chan->submit_q, node) {
+		list_del(&desc->node);
+		xor_chan->total_desc--;
+		kfree(desc);
+	}
+	list_for_each_entry_safe(desc, _desc, &xor_chan->pending_q, node) {
+		list_del(&desc->node);
+		xor_chan->total_desc--;
+		kfree(desc);
+	}
+	list_for_each_entry_safe(desc, _desc, &xor_chan->in_progress_q, node) {
+		list_del(&desc->node);
+		xor_chan->total_desc--;
+		kfree(desc);
+	}
+	list_for_each_entry_safe(desc, _desc, &xor_chan->free_desc, node) {
+		list_del(&desc->node);
+		xor_chan->total_desc--;
+		kfree(desc);
+	}
+
+	/* Some descriptor not freed? */
+	if (unlikely(xor_chan->total_desc))
+		dev_warn(chan->device->dev, "Failed to free xor channel resource\n");
+
+	spin_unlock_bh(&xor_chan->desc_lock);
+}
+
+static int talitos_alloc_chan_resources(struct dma_chan *chan)
+{
+	struct talitos_xor_chan *xor_chan;
+	struct talitos_xor_desc *desc;
+	LIST_HEAD(tmp_list);
+	int i;
+
+	xor_chan = container_of(chan, struct talitos_xor_chan, common);
+
+	if (!list_empty(&xor_chan->free_desc))
+		return xor_chan->total_desc;
+
+	for (i = 0; i < TALITOS_MAX_DESCRIPTOR_NR; i++) {
+		desc = talitos_xor_alloc_descriptor(xor_chan,
+				GFP_KERNEL | GFP_DMA);
+		if (!desc) {
+			dev_err(xor_chan->common.device->dev,
+				"Only %d initial descriptors\n", i);
+			break;
+		}
+		list_add_tail(&desc->node, &tmp_list);
+	}
+
+	if (!i)
+		return -ENOMEM;
+
+	/* At least one desc is allocated */
+	spin_lock_bh(&xor_chan->desc_lock);
+	list_splice_init(&tmp_list, &xor_chan->free_desc);
+	spin_unlock_bh(&xor_chan->desc_lock);
+
+	return xor_chan->total_desc;
+}
+
+static struct dma_async_tx_descriptor *talitos_prep_dma_xor(
+			struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
+			unsigned int src_cnt, size_t len, unsigned long flags)
+{
+	struct talitos_xor_chan *xor_chan;
+	struct talitos_xor_desc *new;
+	struct talitos_desc *desc;
+	int i, j;
+
+	BUG_ON(len > TALITOS_MAX_DATA_LEN);
+
+	xor_chan = container_of(chan, struct talitos_xor_chan, common);
+
+	spin_lock_bh(&xor_chan->desc_lock);
+	if (!list_empty(&xor_chan->free_desc)) {
+		new = container_of(xor_chan->free_desc.next,
+				   struct talitos_xor_desc, node);
+		list_del(&new->node);
+	} else {
+		 new = talitos_xor_alloc_descriptor(xor_chan, GFP_KERNEL | GFP_DMA);
+	}
+	spin_unlock_bh(&xor_chan->desc_lock);
+
+	if (!new) {
+		dev_err(xor_chan->common.device->dev,
+			"No free memory for XOR DMA descriptor\n");
+		return NULL;
+	}
+	dma_async_tx_descriptor_init(&new->async_tx, &xor_chan->common);
+
+	INIT_LIST_HEAD(&new->node);
+	INIT_LIST_HEAD(&new->tx_list);
+
+	desc = &new->hwdesc;
+	/* Set destination: Last pointer pair */
+	to_talitos_ptr(&desc->ptr[6], dest);
+	desc->ptr[6].len = cpu_to_be16(len);
+	desc->ptr[6].j_extent = 0;
+	new->unmap_src_cnt = src_cnt;
+	new->unmap_len = len;
+
+	/* Set Sources: End loading from second-last pointer pair */
+	for (i = 5, j = 0; j < src_cnt && i >= 0; i--, j++) {
+		to_talitos_ptr(&desc->ptr[i], src[j]);
+		desc->ptr[i].len = cpu_to_be16(len);
+		desc->ptr[i].j_extent = 0;
+	}
+
+	/*
+	 * documentation states first 0 ptr/len combo marks end of sources
+	 * yet device produces scatter boundary error unless all subsequent
+	 * sources are zeroed out
+	 */
+	for (; i >= 0; i--) {
+		to_talitos_ptr(&desc->ptr[i], 0);
+		desc->ptr[i].len = 0;
+		desc->ptr[i].j_extent = 0;
+	}
+
+	desc->hdr = DESC_HDR_SEL0_AESU | DESC_HDR_MODE0_AESU_XOR |
+		DESC_HDR_TYPE_RAID_XOR;
+
+	new->async_tx.parent = NULL;
+	new->async_tx.next = NULL;
+	new->async_tx.cookie = 0;
+	async_tx_ack(&new->async_tx);
+
+	list_add_tail(&new->node, &new->tx_list);
+
+	new->async_tx.flags = flags;
+	new->async_tx.cookie = -EBUSY;
+
+	return &new->async_tx;
+}
+
+static void talitos_unregister_async_xor(struct device *dev)
+{
+	struct talitos_private *priv = dev_get_drvdata(dev);
+	struct talitos_xor_chan *xor_chan;
+	struct dma_chan *chan, *_chan;
+
+	if (priv->dma_dev_common.chancnt)
+		dma_async_device_unregister(&priv->dma_dev_common);
+
+	list_for_each_entry_safe(chan, _chan, &priv->dma_dev_common.channels,
+				device_node) {
+		xor_chan = container_of(chan, struct talitos_xor_chan,
+					common);
+		list_del(&chan->device_node);
+		priv->dma_dev_common.chancnt--;
+		kfree(xor_chan);
+	}
+}
+
+/**
+ * talitos_register_dma_async - Initialize the Freescale XOR ADMA device
+ * It is registered as a DMA device with the capability to perform
+ * XOR operation with the Async_tx layer.
+ * The various queues and channel resources are also allocated.
+ */
+static int talitos_register_async_tx(struct device *dev, int max_xor_srcs)
+{
+	struct talitos_private *priv = dev_get_drvdata(dev);
+	struct dma_device *dma_dev = &priv->dma_dev_common;
+	struct talitos_xor_chan *xor_chan;
+	int err;
+
+	xor_chan = kzalloc(sizeof(struct talitos_xor_chan), GFP_KERNEL);
+	if (!xor_chan) {
+		dev_err(dev, "unable to allocate xor channel\n");
+		return -ENOMEM;
+	}
+
+	dma_dev->dev = dev;
+	dma_dev->device_alloc_chan_resources = talitos_alloc_chan_resources;
+	dma_dev->device_free_chan_resources = talitos_free_chan_resources;
+	dma_dev->device_prep_dma_xor = talitos_prep_dma_xor;
+	dma_dev->max_xor = max_xor_srcs;
+	dma_dev->device_tx_status = talitos_is_tx_complete;
+	dma_dev->device_issue_pending = talitos_issue_pending;
+	INIT_LIST_HEAD(&dma_dev->channels);
+	dma_cap_set(DMA_XOR, dma_dev->cap_mask);
+
+	xor_chan->dev = dev;
+	xor_chan->common.device = dma_dev;
+	xor_chan->total_desc = 0;
+	INIT_LIST_HEAD(&xor_chan->submit_q);
+	INIT_LIST_HEAD(&xor_chan->pending_q);
+	INIT_LIST_HEAD(&xor_chan->in_progress_q);
+	INIT_LIST_HEAD(&xor_chan->free_desc);
+	spin_lock_init(&xor_chan->desc_lock);
+
+	list_add_tail(&xor_chan->common.device_node, &dma_dev->channels);
+	dma_dev->chancnt++;
+
+	err = dma_async_device_register(dma_dev);
+	if (err) {
+		dev_err(dev, "Unable to register XOR with Async_tx\n");
+		goto err_out;
+	}
+
+	return err;
+
+err_out:
+	talitos_unregister_async_xor(dev);
+	return err;
+}
+#endif
+
 /*
  * crypto alg
  */
@@ -2720,6 +3110,26 @@ static int talitos_probe(struct platform_device *ofdev)
 			dev_info(dev, "hwrng\n");
 	}

+#ifdef CONFIG_CRYPTO_DEV_TALITOS_RAIDXOR
+	/*
+	 * register with async_tx xor, if capable
+	 * SEC 2.x support up to 3 RAID sources,
+	 * SEC 3.x support up to 6
+	 */
+	if (hw_supports(dev, DESC_HDR_SEL0_AESU | DESC_HDR_TYPE_RAID_XOR)) {
+		int max_xor_srcs = 3;
+		if (of_device_is_compatible(np, "fsl,sec3.0"))
+			max_xor_srcs = 6;
+		err = talitos_register_async_tx(dev, max_xor_srcs);
+		if (err) {
+			dev_err(dev, "failed to register async_tx xor: %d\n",
+					err);
+			goto err_out;
+		}
+		dev_info(dev, "max_xor_srcs %d\n", max_xor_srcs);
+	}
+#endif
+
 	/* register crypto algorithms the device supports */
 	for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
 		if (hw_supports(dev, driver_algs[i].desc_hdr_template)) {
diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
index 21934be..ef5b7dc 100644
--- a/drivers/crypto/talitos.h
+++ b/drivers/crypto/talitos.h
@@ -30,6 +30,7 @@

 #define TALITOS_TIMEOUT 100000
 #define TALITOS_MAX_DATA_LEN 65535
+#define TALITOS_MAX_DESCRIPTOR_NR 256

 #define DESC_TYPE(desc_hdr) ((be32_to_cpu(desc_hdr) >> 3) & 0x1f)
 #define PRIMARY_EU(desc_hdr) ((be32_to_cpu(desc_hdr) >> 28) & 0xf)
@@ -128,7 +129,57 @@ struct talitos_private {

 	/* hwrng device */
 	struct hwrng rng;
+
+#ifdef CONFIG_CRYPTO_DEV_TALITOS_RAIDXOR
+	/* XOR Device */
+	struct dma_device dma_dev_common;
+#endif
+};
+
+#ifdef CONFIG_CRYPTO_DEV_TALITOS_RAIDXOR
+/**
+ * talitos_xor_chan - context management for the async_tx channel
+ * @completed_cookie: the last completed cookie
+ * @desc_lock: lock for tx queue
+ * @total_desc: number of descriptors allocated
+ * @submit_q: queue of submitted descriptors
+ * @pending_q: queue of pending descriptors
+ * @in_progress_q: queue of descriptors in progress
+ * @free_desc: queue of unused descriptors
+ * @dev: talitos device implementing this channel
+ * @common: the corresponding xor channel in async_tx
+ */
+struct talitos_xor_chan {
+	dma_cookie_t completed_cookie;
+	spinlock_t desc_lock;
+	unsigned int total_desc;
+	struct list_head submit_q;
+	struct list_head pending_q;
+	struct list_head in_progress_q;
+	struct list_head free_desc;
+	struct device *dev;
+	struct dma_chan common;
+};
+
+/**
+ * talitos_xor_desc - software xor descriptor
+ * @async_tx: the referring async_tx descriptor
+ * @node:
+ * @hwdesc: h/w descriptor
+ * @unmap_src_cnt: number of xor sources
+ * @unmap_len: transaction byte count
+ * @idx: index of xor sources
+ */
+struct talitos_xor_desc {
+	struct dma_async_tx_descriptor async_tx;
+	struct list_head tx_list;
+	struct list_head node;
+	struct talitos_desc hwdesc;
+	unsigned int unmap_src_cnt;
+	unsigned int unmap_len;
+	unsigned int idx;
 };
+#endif

 /* .features flag */
 #define TALITOS_FTR_SRC_LINK_TBL_LEN_INCLUDES_EXTENT 0x00000001
@@ -275,6 +326,7 @@ struct talitos_private {
 /* primary execution unit mode (MODE0) and derivatives */
 #define	DESC_HDR_MODE0_ENCRYPT		cpu_to_be32(0x00100000)
 #define	DESC_HDR_MODE0_AESU_CBC		cpu_to_be32(0x00200000)
+#define	DESC_HDR_MODE0_AESU_XOR         cpu_to_be32(0x0c600000)
 #define	DESC_HDR_MODE0_DEU_CBC		cpu_to_be32(0x00400000)
 #define	DESC_HDR_MODE0_DEU_3DES		cpu_to_be32(0x00200000)
 #define	DESC_HDR_MODE0_MDEU_CONT	cpu_to_be32(0x08000000)
@@ -329,6 +381,7 @@ struct talitos_private {
 #define DESC_HDR_TYPE_IPSEC_ESP			cpu_to_be32(1 << 3)
 #define DESC_HDR_TYPE_COMMON_NONSNOOP_NO_AFEU	cpu_to_be32(2 << 3)
 #define DESC_HDR_TYPE_HMAC_SNOOP_NO_AFEU	cpu_to_be32(4 << 3)
+#define DESC_HDR_TYPE_RAID_XOR                  cpu_to_be32(21 << 3)

 /* link table extent field bits */
 #define DESC_PTR_LNKTBL_JUMP			0x80
--
1.7.5.1

^ permalink raw reply related

* [PATCH 1/4] Talitos: move the data structure into header file
From: Qiang Liu @ 2012-07-10  5:56 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev; +Cc: Qiang Liu, Herbert Xu, David S. Miller
In-Reply-To: <1341899809-20630-1-git-send-email-qiang.liu@freescale.com>

Move the declaration of talitos data structure into talitos.h.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
---
 drivers/crypto/talitos.c |  108 ----------------------------------------------
 drivers/crypto/talitos.h |  108 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+), 108 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index dc641c7..cb9e4cd 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -53,114 +53,6 @@

 #include "talitos.h"

-#define TALITOS_TIMEOUT 100000
-#define TALITOS_MAX_DATA_LEN 65535
-
-#define DESC_TYPE(desc_hdr) ((be32_to_cpu(desc_hdr) >> 3) & 0x1f)
-#define PRIMARY_EU(desc_hdr) ((be32_to_cpu(desc_hdr) >> 28) & 0xf)
-#define SECONDARY_EU(desc_hdr) ((be32_to_cpu(desc_hdr) >> 16) & 0xf)
-
-/* descriptor pointer entry */
-struct talitos_ptr {
-	__be16 len;	/* length */
-	u8 j_extent;	/* jump to sg link table and/or extent */
-	u8 eptr;	/* extended address */
-	__be32 ptr;	/* address */
-};
-
-static const struct talitos_ptr zero_entry = {
-	.len = 0,
-	.j_extent = 0,
-	.eptr = 0,
-	.ptr = 0
-};
-
-/* descriptor */
-struct talitos_desc {
-	__be32 hdr;			/* header high bits */
-	__be32 hdr_lo;			/* header low bits */
-	struct talitos_ptr ptr[7];	/* ptr/len pair array */
-};
-
-/**
- * talitos_request - descriptor submission request
- * @desc: descriptor pointer (kernel virtual)
- * @dma_desc: descriptor's physical bus address
- * @callback: whom to call when descriptor processing is done
- * @context: caller context (optional)
- */
-struct talitos_request {
-	struct talitos_desc *desc;
-	dma_addr_t dma_desc;
-	void (*callback) (struct device *dev, struct talitos_desc *desc,
-	                  void *context, int error);
-	void *context;
-};
-
-/* per-channel fifo management */
-struct talitos_channel {
-	void __iomem *reg;
-
-	/* request fifo */
-	struct talitos_request *fifo;
-
-	/* number of requests pending in channel h/w fifo */
-	atomic_t submit_count ____cacheline_aligned;
-
-	/* request submission (head) lock */
-	spinlock_t head_lock ____cacheline_aligned;
-	/* index to next free descriptor request */
-	int head;
-
-	/* request release (tail) lock */
-	spinlock_t tail_lock ____cacheline_aligned;
-	/* index to next in-progress/done descriptor request */
-	int tail;
-};
-
-struct talitos_private {
-	struct device *dev;
-	struct platform_device *ofdev;
-	void __iomem *reg;
-	int irq[2];
-
-	/* SEC version geometry (from device tree node) */
-	unsigned int num_channels;
-	unsigned int chfifo_len;
-	unsigned int exec_units;
-	unsigned int desc_types;
-
-	/* SEC Compatibility info */
-	unsigned long features;
-
-	/*
-	 * length of the request fifo
-	 * fifo_len is chfifo_len rounded up to next power of 2
-	 * so we can use bitwise ops to wrap
-	 */
-	unsigned int fifo_len;
-
-	struct talitos_channel *chan;
-
-	/* next channel to be assigned next incoming descriptor */
-	atomic_t last_chan ____cacheline_aligned;
-
-	/* request callback tasklet */
-	struct tasklet_struct done_task[2];
-
-	/* list of registered algorithms */
-	struct list_head alg_list;
-
-	/* hwrng device */
-	struct hwrng rng;
-};
-
-/* .features flag */
-#define TALITOS_FTR_SRC_LINK_TBL_LEN_INCLUDES_EXTENT 0x00000001
-#define TALITOS_FTR_HW_AUTH_CHECK 0x00000002
-#define TALITOS_FTR_SHA224_HWINIT 0x00000004
-#define TALITOS_FTR_HMAC_OK 0x00000008
-
 static void to_talitos_ptr(struct talitos_ptr *talitos_ptr, dma_addr_t dma_addr)
 {
 	talitos_ptr->ptr = cpu_to_be32(lower_32_bits(dma_addr));
diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
index 3c17395..21934be 100644
--- a/drivers/crypto/talitos.h
+++ b/drivers/crypto/talitos.h
@@ -28,6 +28,114 @@
  *
  */

+#define TALITOS_TIMEOUT 100000
+#define TALITOS_MAX_DATA_LEN 65535
+
+#define DESC_TYPE(desc_hdr) ((be32_to_cpu(desc_hdr) >> 3) & 0x1f)
+#define PRIMARY_EU(desc_hdr) ((be32_to_cpu(desc_hdr) >> 28) & 0xf)
+#define SECONDARY_EU(desc_hdr) ((be32_to_cpu(desc_hdr) >> 16) & 0xf)
+
+/* descriptor pointer entry */
+struct talitos_ptr {
+	__be16 len;	/* length */
+	u8 j_extent;	/* jump to sg link table and/or extent */
+	u8 eptr;	/* extended address */
+	__be32 ptr;	/* address */
+};
+
+static const struct talitos_ptr zero_entry = {
+	.len = 0,
+	.j_extent = 0,
+	.eptr = 0,
+	.ptr = 0
+};
+
+/* descriptor */
+struct talitos_desc {
+	__be32 hdr;			/* header high bits */
+	__be32 hdr_lo;			/* header low bits */
+	struct talitos_ptr ptr[7];	/* ptr/len pair array */
+};
+
+/**
+ * talitos_request - descriptor submission request
+ * @desc: descriptor pointer (kernel virtual)
+ * @dma_desc: descriptor's physical bus address
+ * @callback: whom to call when descriptor processing is done
+ * @context: caller context (optional)
+ */
+struct talitos_request {
+	struct talitos_desc *desc;
+	dma_addr_t dma_desc;
+	void (*callback) (struct device *dev, struct talitos_desc *desc,
+	                  void *context, int error);
+	void *context;
+};
+
+/* per-channel fifo management */
+struct talitos_channel {
+	void __iomem *reg;
+
+	/* request fifo */
+	struct talitos_request *fifo;
+
+	/* number of requests pending in channel h/w fifo */
+	atomic_t submit_count ____cacheline_aligned;
+
+	/* request submission (head) lock */
+	spinlock_t head_lock ____cacheline_aligned;
+	/* index to next free descriptor request */
+	int head;
+
+	/* request release (tail) lock */
+	spinlock_t tail_lock ____cacheline_aligned;
+	/* index to next in-progress/done descriptor request */
+	int tail;
+};
+
+struct talitos_private {
+	struct device *dev;
+	struct platform_device *ofdev;
+	void __iomem *reg;
+	int irq[2];
+
+	/* SEC version geometry (from device tree node) */
+	unsigned int num_channels;
+	unsigned int chfifo_len;
+	unsigned int exec_units;
+	unsigned int desc_types;
+
+	/* SEC Compatibility info */
+	unsigned long features;
+
+	/*
+	 * length of the request fifo
+	 * fifo_len is chfifo_len rounded up to next power of 2
+	 * so we can use bitwise ops to wrap
+	 */
+	unsigned int fifo_len;
+
+	struct talitos_channel *chan;
+
+	/* next channel to be assigned next incoming descriptor */
+	atomic_t last_chan ____cacheline_aligned;
+
+	/* request callback tasklet */
+	struct tasklet_struct done_task[2];
+
+	/* list of registered algorithms */
+	struct list_head alg_list;
+
+	/* hwrng device */
+	struct hwrng rng;
+};
+
+/* .features flag */
+#define TALITOS_FTR_SRC_LINK_TBL_LEN_INCLUDES_EXTENT 0x00000001
+#define TALITOS_FTR_HW_AUTH_CHECK 0x00000002
+#define TALITOS_FTR_SHA224_HWINIT 0x00000004
+#define TALITOS_FTR_HMAC_OK 0x00000008
+
 /*
  * TALITOS_xxx_LO addresses point to the low data bits (32-63) of the register
  */
--
1.7.5.1

^ permalink raw reply related

* [PATCH 0/3] Raid: enable talitos xor offload for improving performance
From: Qiang Liu @ 2012-07-10  5:56 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev

The following 4 patches enabling fsl-dma and talitos offload raid
operations for improving raid performance and balancing CPU load.

Write performance will be improved by 40% tested by iozone. CPU load
will be reduced by 8%.

Qiang Liu (4):
      Talitos: move the data structure into header file
      Talitos: Support for async_tx XOR offload
      fsl-dma: support attribute of DMA_MEMORY when async_tx enabled
      Talitos: fix the issue of dma memory leak

 drivers/crypto/Kconfig   |    9 +
 drivers/crypto/talitos.c |  520 ++++++++++++++++++++++++++++++++++++----------
 drivers/crypto/talitos.h |  161 ++++++++++++++
 drivers/dma/fsldma.c     |   78 +++-----
 4 files changed, 606 insertions(+), 162 deletions(-)

^ permalink raw reply

* [PATCH 0/3] Raid: enable talitos xor offload for improving performance
From: Qiang Liu @ 2012-07-10  5:54 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev

The following 4 patches enabling fsl-dma and talitos offload raid
operations for improving raid performance and balancing CPU load.

Write performance will be improved by 40% tested by iozone. CPU load
will be reduced by 8%.

Qiang Liu (4):
      Talitos: move the data structure into header file
      Talitos: Support for async_tx XOR offload
      fsl-dma: support attribute of DMA_MEMORY when async_tx enabled
      Talitos: fix the issue of dma memory leak

 drivers/crypto/Kconfig   |    9 +
 drivers/crypto/talitos.c |  520 ++++++++++++++++++++++++++++++++++++----------
 drivers/crypto/talitos.h |  161 ++++++++++++++
 drivers/dma/fsldma.c     |   78 +++-----
 4 files changed, 606 insertions(+), 162 deletions(-)

^ permalink raw reply

* Re: [PATCH -V3 03/11] arch/powerpc: Convert virtual address to vpn
From: Aneesh Kumar K.V @ 2012-07-10  6:15 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, paulus
In-Reply-To: <20120710090605.03b5887c0dabdf61eb70b68a@canb.auug.org.au>

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi Aneesh,
>
> On Mon,  9 Jul 2012 18:43:33 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>>
>> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>> index 660b8bb..308e29d 100644
>> --- a/arch/powerpc/mm/hash_native_64.c
>> +++ b/arch/powerpc/mm/hash_native_64.c
>> @@ -39,22 +39,33 @@
>>  
>>  DEFINE_RAW_SPINLOCK(native_tlbie_lock);
>>  
>> -static inline void __tlbie(unsigned long va, int psize, int ssize)
>> +static inline void __tlbie(unsigned long vpn, int psize, int ssize)
>>  {
>> +	unsigned long va;
>>  	unsigned int penc;
>>  
>> +	/*
>> +	 * We need 14 to 65 bits of va for a tlibe of 4K page
>> +	 * With vpn we ignore the lower VPN_SHIFT bits already.
>> +	 * And top two bits are already ignored because we can
>> +	 * only accomadate 76 bits in a 64 bit vpn with a VPN_SHIFT
>> +	 * of 12.
>> +	 */
>> +	BUG_ON((77 - 65) > VPN_SHIFT);
>
> BUILD_BUG_ON() ?

Yes. I also found a bug there. It should be updated as below

	BUILD_BUG_ON(VPN_SHIFT > (77 - 65));

We want to make sure we are not ignoring bits above 65th bit. 


>
>> -static inline void __tlbiel(unsigned long va, int psize, int ssize)
>> +static inline void __tlbiel(unsigned long vpn, int psize, int ssize)
>>  {
>> +	unsigned long va;
>>  	unsigned int penc;
>>  
>> +	BUG_ON((77 - 65) > VPN_SHIFT);
>
> BUILD_BUG_ON() ?

-aneesh

^ permalink raw reply

* RE: [PATCH] PCI: Add pcie_irq=other to enable non MSI/INTx interrupt for port service driver
From: Liu Shengzhou-B36685 @ 2012-07-10  6:13 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: bhelgaas@google.com, linux-pci@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <4FFB0927.7080406@freescale.com>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogV29vZCBTY290dC1CMDc0
MjENCj4gU2VudDogVHVlc2RheSwgSnVseSAxMCwgMjAxMiAxMjozOSBBTQ0KPiBUbzogTGl1IFNo
ZW5nemhvdS1CMzY2ODUNCj4gQ2M6IGJoZWxnYWFzQGdvb2dsZS5jb207IGxpbnV4LXBjaUB2Z2Vy
Lmtlcm5lbC5vcmc7IGxpbnV4cHBjLQ0KPiBkZXZAbGlzdHMub3psYWJzLm9yZw0KPiBTdWJqZWN0
OiBSZTogW1BBVENIXSBQQ0k6IEFkZCBwY2llX2lycT1vdGhlciB0byBlbmFibGUgbm9uIE1TSS9J
TlR4IGludGVycnVwdA0KPiBmb3IgcG9ydCBzZXJ2aWNlIGRyaXZlcg0KPiANCj4gT24gMDcvMDkv
MjAxMiAwNTo0OSBBTSwgU2hlbmd6aG91IExpdSB3cm90ZToNCj4gPiBPbiBzb21lIHBsYXRmb3Jt
cywgaW4gUkMgbW9kZSwgcm9vdCBwb3J0IGhhcyBuZWl0aGVyIE1TSS9NU0ktWCBub3INCj4gPiBJ
TlR4IGludGVycnVwdCBnZW5lcmF0ZWQsIHdoaWNoIGFyZSBhdmFpbGFibGUgb25seSBpbiBFUCBt
b2RlIG9uIHRob3NlDQo+IHBsYXRmb3JtLg0KPiA+IEluIHRoaXMgY2FzZSwgd2UgdHJ5IHRvIHVz
ZSBvdGhlciBpbnRlcnJ1cHQgaWYgc3VwcG9ydGVkIChpLmUuIHRoZXJlDQo+ID4gaXMgdGhlIHNo
YXJlZCBlcnJvciBpbnRlcnJ1cHQgb24gcGxhdGZvcm0gUDEwMTAsIFAzMDQxLCBQNDA4MCwgZXRj
KSB0bw0KPiA+IGhhdmUgQUVSLCBIb3QtcGx1ZywgZXRjLCBzZXJ2aWNlcyB0byB3b3JrLg0KPiA+
DQo+ID4gU2lnbmVkLW9mZi1ieTogU2hlbmd6aG91IExpdSA8U2hlbmd6aG91LkxpdUBmcmVlc2Nh
bGUuY29tPg0KPiA+IC0tLQ0KPiA+ICBEb2N1bWVudGF0aW9uL2tlcm5lbC1wYXJhbWV0ZXJzLnR4
dCB8ICAgIDQgKysrKw0KPiA+ICBkcml2ZXJzL3BjaS9wY2llL3BvcnRkcnZfY29yZS5jICAgICB8
ICAgMTkgKysrKysrKysrKysrKysrKysrKw0KPiA+ICAyIGZpbGVzIGNoYW5nZWQsIDIzIGluc2Vy
dGlvbnMoKyksIDAgZGVsZXRpb25zKC0pDQo+ID4NCj4gPiBkaWZmIC0tZ2l0IGEvRG9jdW1lbnRh
dGlvbi9rZXJuZWwtcGFyYW1ldGVycy50eHQNCj4gPiBiL0RvY3VtZW50YXRpb24va2VybmVsLXBh
cmFtZXRlcnMudHh0DQo+ID4gaW5kZXggYTkyYzVlYi4uYWY5N2M4MSAxMDA2NDQNCj4gPiAtLS0g
YS9Eb2N1bWVudGF0aW9uL2tlcm5lbC1wYXJhbWV0ZXJzLnR4dA0KPiA+ICsrKyBiL0RvY3VtZW50
YXRpb24va2VybmVsLXBhcmFtZXRlcnMudHh0DQo+ID4gQEAgLTIyMTgsNiArMjIxOCwxMCBAQCBi
eXRlcyByZXNwZWN0aXZlbHkuIFN1Y2ggbGV0dGVyIHN1ZmZpeGVzIGNhbiBhbHNvIGJlDQo+IGVu
dGlyZWx5IG9taXR0ZWQuDQo+ID4gIAkJbm9tc2kJRG8gbm90IHVzZSBNU0kgZm9yIG5hdGl2ZSBQ
Q0llIFBNRSBzaWduYWxpbmcgKHRoaXMgbWFrZXMNCj4gPiAgCQkJYWxsIFBDSWUgcm9vdCBwb3J0
cyB1c2UgSU5UeCBmb3IgYWxsIHNlcnZpY2VzKS4NCj4gPg0KPiA+ICsJcGNpZV9pcnE9CVtQQ0lF
XSBOYXRpdmUgUENJZSByb290IHBvcnQgaW50ZXJydXB0IG9wdGlvbnM6DQo+ID4gKwkJb3RoZXIJ
VHJ5IHRvIHVzZSBvdGhlciBpbnRlcnJ1cHQgd2hlbiByb290IHBvcnQgaGFzDQo+ID4gKwkJCW5l
aXRoZXIgTVNJL01TSS1YIG5vciBJTlR4IHN1cHBvcnQuDQo+IA0KPiBXaHkgZG9lcyB0aGUgdXNl
ciBuZWVkIHRvIHNwZWNpZnkgdGhpcz8gIFNob3VsZG4ndCB0aGlzIGJlIGEgbWF0dGVyIG9mDQo+
IGNvbW11bmljYXRpb24gYmV0d2VlbiBrZXJuZWwgaW50ZXJuYWxzPw0KPiANCg0KVGhlICJvdGhl
ciBpbnRlcnJ1cHQiIGFwcGVhcnMgYSBub24tc3RhbmRhcmQgaW50ZXJydXB0IHdheSBjb21wYXJl
ZCB0byBNU0kvTVNJLVggYW5kIElOVHggaW4gcG9pbnQgb2YgUENJZSBzcGVjLg0KVGhlIGludGVu
dCBvZiBzcGVjaWZ5aW5nIHRoaXMgaXMgdG8gaGF2ZSBhbiBpbnRlcnZlbnRpb24gYW5kIGNvbmZp
cm1hdGlvbiBtYW51YWxseSB0byBhdm9pZCBjYXVzaW5nIHVuZXhwZWN0ZWQgaXNzdWUgb24gc29t
ZSB1bmtub3duIHBsYXRmb3Jtcy4NCkknbSBnbGFkIHRvIHJlbW92ZSB0aGUgc3BlY2lmaWVkIGtl
cm5lbCBwYXJhbWV0ZXIgaWYgaXQgd291bGQgYmUgYWNjZXB0ZWQgdXBzdHJlYW0uDQoNCj4gPiBA
QCAtMjE2LDYgKzIyNywxNCBAQCBzdGF0aWMgaW50IGluaXRfc2VydmljZV9pcnFzKHN0cnVjdCBw
Y2lfZGV2ICpkZXYsIGludA0KPiAqaXJxcywgaW50IG1hc2spDQo+ID4gIAlpZiAoIXBjaV9lbmFi
bGVfbXNpKGRldikgfHwgZGV2LT5waW4pDQo+ID4gIAkJaXJxID0gZGV2LT5pcnE7DQo+ID4NCj4g
PiArCS8qDQo+ID4gKwkgKiBPbiBzb21lIHBsYXRmb3Jtcywgcm9vdCBwb3J0IGhhcyBuZWl0aGVy
IE1TSS9NU0ktWCBub3IgSU5UeA0KPiA+ICsJICogaW50ZXJydXB0IHN1cHBvcnQgaW4gUkMgbW9k
ZSwgc28gdHJ5IHRvIHVzZSBvdGhlciBpbnRlcnJ1cHQoaS5lLg0KPiA+ICsJICogc2hhcmVkIGlu
dGVycnVwdCBpZiBzdXBwb3J0ZWQpLg0KPiA+ICsJICovDQo+ID4gKwllbHNlIGlmIChwb3J0X290
aGVyX2ludGVycnVwdF9lbmFibGVkICYmIGRldi0+aXJxKQ0KPiA+ICsJCWlycSA9IGRldi0+aXJx
Ow0KPiANCj4gSXMgdGhlcmUgYW55IHJlYXNvbiB0byBub3QgdXNlIGRldi0+aXJxIGlmIGl0IGlz
IG5vbi16ZXJvPw0KPiANCj4gLVNjb3R0DQoNCkkgZ3Vlc3MgYSB0aGlua2luZyBvZiB0aGUgb3Jp
Z2luYWwgYXV0aG9yIGRpZCBzbyBpcyBiYXNlZCBvbiB0aGUgcG9pbnQgb2YgUENJZSBzcGVjLCB3
aGV0aGVyIElOVHggaXMgc3VwcG9ydGVkIG9yIG5vdCBpcyBkZWNpZGVkIGJ5IHRoZSAiUENJIEV4
cHJlc3MgSW50ZXJydXB0IFBpbiBSZWdpc3RlcihSZWFkIG9ubHkpIi4NCg0KSXMgdGhlcmUgdGhl
IGNhc2Ugb24gc29tZSBvdGhlciB1bmtub3duIHBsYXRmb3JtcyB0aGF0IGRldi0+aXJxIGlzIG5v
bi16ZXJvIGR1ZSB0byBpbmNvcnJlY3QgaW5pdGlhbGl6YXRpb24gb2YgaXJxIGFsbG9jYXRpb24s
IGJ1dCBhY3R1YWxseSBpdCBkb2Vzbid0IHN1cHBvcnQgTVNJL01TSVgvSU5UeCBvciBhbnkgb3Ro
ZXIgaW50ZXJydXB0PyAgDQpJZiB3ZSBkb24ndCBuZWVkIHRvIGNvbnNpZGVyIHRoaXMgY2FzZSwg
d2UgY2FuIHVzZSBkZXYtPmlycSBkaXJlY3RseSBpbnN0ZWFkIG9mIGRldi0+cGluIHRvIHN1cHBv
cnQgIm90aGVyIGludGVycnVwdCIuDQoNClRoYW5rcywNClNoZW5nemhvdQ0K

^ permalink raw reply

* Re: [PATCH -V3 03/11] arch/powerpc: Convert virtual address to vpn
From: Aneesh Kumar K.V @ 2012-07-10  6:12 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, paulus
In-Reply-To: <20120710084128.6fe4a4347719f4445dd12b4f@canb.auug.org.au>

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi Aneesh,
>
> On Mon,  9 Jul 2012 18:43:33 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>>
>> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
>> index 1c65a59..1c984a6 100644
>> --- a/arch/powerpc/include/asm/mmu-hash64.h
>> +++ b/arch/powerpc/include/asm/mmu-hash64.h
>> @@ -154,9 +155,25 @@ struct mmu_psize_def
>>  #define MMU_SEGSIZE_256M	0
>>  #define MMU_SEGSIZE_1T		1
>>  
>> +/*
>> + * encode page number shift.
>> + * Inorder to fit the 78 bit va in a 64 bit variable we shift the va by
>> + * 12 bits. This enable us to address upto 76 bit va.
>> + * For hpt hash from a va we can ignore the page size bits of va and for
>> + * hpte encoding we ignore upto 23 bits of va. So ignoring lower 12 bits ensure
>> + * we work in all cases including 4k page size.
>> + */
>> +#define VPN_SHIFT	12
>>  
>> +static inline unsigned long hpte_encode_avpn(unsigned long vpn, int psize,
>> +					     int ssize)
>> +{
>> +	unsigned long v;
>> +	/*
>> +	 * The AVA field omits the low-order 23 bits of the 78 bits VA.
>> +	 * These bits are not needed in the PTE, because the
>> +	 * low-order b of these bits are part of the byte offset
>> +	 * into the virtual page and, if b < 23, the high-order
>> +	 * 23-b of these bits are always used in selecting the
>> +	 * PTEGs to be searched
>> +	 */
>> +	BUG_ON(VPN_SHIFT > 23);
>
> Could/should this be BUILD_BUG_ON()?
>

Yes. Will update.

-aneesh

^ permalink raw reply

* Re: [PATCH] More fixes for lazy IRQ vs. idle
From: Michael Neuling @ 2012-07-10  6:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1341891538.2561.10.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> Looks like we still have issues with pSeries and Cell idle code
> vs. the lazy irq state. In fact, the reset fixes that went upstream
> are exposing the problem more by causing BUG_ON() to trigger (which
> this patch turns into a WARN_ON instead).

Do we need to cc stable for 3.4 or is this new stuff only effecting us
since the 3.5 merge window?

Mikey

> We need to be careful when using a variant of low power state that
> has the side effect of turning interrupts back on, to properly set
> all the SW & lazy state to look as if everything is enabled before
> we enter the low power state with MSR:EE off as we will return with
> MSR:EE on. If not, we have a discrepancy of state which can cause
> things to go very wrong later on.
> 
> This patch moves the logic into a helper and uses it from the
> pseries and cell idle code. The power4/970 idle code already got
> things right (in assembly even !) so I'm not touching it. The power7
> "bare metal" idle code is subtly different and correct. Remains PA6T
> and some hypervisor based Cell platforms which have questionable
> code in there, but they are mostly dead platforms so I'll fix them
> when I manage to get final answers from the respective maintainers
> about how the low power state actually works on them.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 6eb75b8..92224b7 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -125,6 +125,8 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
>  	return !regs->softe;
>  }
>  
> +extern bool prep_irq_for_idle(void);
> +
>  #else /* CONFIG_PPC64 */
>  
>  #define SET_MSR_EE(x)	mtmsr(x)
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 1b41502..5dc4a80 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -286,6 +286,52 @@ void notrace restore_interrupts(void)
>  		__hard_irq_enable();
>  }
>  
> +/*
> + * This is a helper to use when about to go into idle low-power
> + * when the latter has the side effect of re-enabling interrupts
> + * (such as calling H_CEDE under pHyp).
> + *
> + * You call this function with interrupts soft-disabled (this is
> + * already the case when ppc_md.power_save is called). The function
> + * will return whether to enter power save or just return.
> + *
> + * In the former case, it will have notified lockdep of interrupts
> + * being re-enabled and generally sanitized the lazy irq state,
> + * and in the latter case it will leave with interrupts hard
> + * disabled and marked as such, so the local_irq_restore() call
> + * in cpu_idle() will properly re-enable everything.
> + */
> +bool prep_irq_for_idle(void)
> +{
> +	/*
> +	 * First we need to hard disable to ensure no interrupt
> +	 * occurs before we effectively enter the low power state
> +	 */
> +	hard_irq_disable();
> +
> +	/*
> +	 * If anything happened while we were soft-disabled,
> +	 * we return now and do not enter the low power state.
> +	 */
> +	if (lazy_irq_pending())
> +		return false;
> +
> +	/* Tell lockdep we are about to re-enable */
> +	trace_hardirqs_on();
> +
> +	/*
> +	 * Mark interrupts as soft-enabled and clear the
> +	 * PACA_IRQ_HARD_DIS from the pending mask since we
> +	 * are about to hard enable as well as a side effect
> +	 * of entering the low power state.
> +	 */
> +	local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
> +	local_paca->soft_enabled = 1;
> +
> +	/* Tell the caller to enter the low power state */
> +	return true;
> +}
> +
>  #endif /* CONFIG_PPC64 */
>  
>  int arch_show_interrupts(struct seq_file *p, int prec)
> diff --git a/arch/powerpc/platforms/cell/pervasive.c b/arch/powerpc/platforms/cell/pervasive.c
> index efdacc8..d17e98b 100644
> --- a/arch/powerpc/platforms/cell/pervasive.c
> +++ b/arch/powerpc/platforms/cell/pervasive.c
> @@ -42,11 +42,9 @@ static void cbe_power_save(void)
>  {
>  	unsigned long ctrl, thread_switch_control;
>  
> -	/*
> -	 * We need to hard disable interrupts, the local_irq_enable() done by
> -	 * our caller upon return will hard re-enable.
> -	 */
> -	hard_irq_disable();
> +	/* Ensure our interrupt state is properly tracked */
> +	if (!prep_irq_for_idle())
> +		return;
>  
>  	ctrl = mfspr(SPRN_CTRLF);
>  
> @@ -81,6 +79,9 @@ static void cbe_power_save(void)
>  	 */
>  	ctrl &= ~(CTRL_RUNLATCH | CTRL_TE);
>  	mtspr(SPRN_CTRLT, ctrl);
> +
> +	/* Re-enable interrupts in MSR */
> +	__hard_irq_enable();
>  }
>  
>  static int cbe_system_reset_exception(struct pt_regs *regs)
> diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
> index a97ef66..9b51ccf 100644
> --- a/arch/powerpc/platforms/pseries/processor_idle.c
> +++ b/arch/powerpc/platforms/pseries/processor_idle.c
> @@ -100,15 +100,17 @@ out:
>  static void check_and_cede_processor(void)
>  {
>  	/*
> -	 * Interrupts are soft-disabled at this point,
> -	 * but not hard disabled. So an interrupt might have
> -	 * occurred before entering NAP, and would be potentially
> -	 * lost (edge events, decrementer events, etc...) unless
> -	 * we first hard disable then check.
> +	 * Ensure our interrupt state is properly tracked,
> +	 * also checks if no interrupt has occurred while we
> +	 * were soft-disabled
>  	 */
> -	hard_irq_disable();
> -	if (!lazy_irq_pending())
> +	if (prep_irq_for_idle()) {
>  		cede_processor();
> +#ifdef CONFIG_TRACE_IRQFLAG
> +		/* Ensure that H_CEDE returns with IRQs on */
> +		WARN_ON(!(mfmsr() & MSR_EE));
> +#endif
> +	}
>  }
>  
>  static int dedicated_cede_loop(struct cpuidle_device *dev,
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* [PATCH] More fixes for lazy IRQ vs. idle
From: Benjamin Herrenschmidt @ 2012-07-10  3:38 UTC (permalink / raw)
  To: linuxppc-dev

Looks like we still have issues with pSeries and Cell idle code
vs. the lazy irq state. In fact, the reset fixes that went upstream
are exposing the problem more by causing BUG_ON() to trigger (which
this patch turns into a WARN_ON instead).

We need to be careful when using a variant of low power state that
has the side effect of turning interrupts back on, to properly set
all the SW & lazy state to look as if everything is enabled before
we enter the low power state with MSR:EE off as we will return with
MSR:EE on. If not, we have a discrepancy of state which can cause
things to go very wrong later on.

This patch moves the logic into a helper and uses it from the
pseries and cell idle code. The power4/970 idle code already got
things right (in assembly even !) so I'm not touching it. The power7
"bare metal" idle code is subtly different and correct. Remains PA6T
and some hypervisor based Cell platforms which have questionable
code in there, but they are mostly dead platforms so I'll fix them
when I manage to get final answers from the respective maintainers
about how the low power state actually works on them.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 6eb75b8..92224b7 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -125,6 +125,8 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 	return !regs->softe;
 }
 
+extern bool prep_irq_for_idle(void);
+
 #else /* CONFIG_PPC64 */
 
 #define SET_MSR_EE(x)	mtmsr(x)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 1b41502..5dc4a80 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -286,6 +286,52 @@ void notrace restore_interrupts(void)
 		__hard_irq_enable();
 }
 
+/*
+ * This is a helper to use when about to go into idle low-power
+ * when the latter has the side effect of re-enabling interrupts
+ * (such as calling H_CEDE under pHyp).
+ *
+ * You call this function with interrupts soft-disabled (this is
+ * already the case when ppc_md.power_save is called). The function
+ * will return whether to enter power save or just return.
+ *
+ * In the former case, it will have notified lockdep of interrupts
+ * being re-enabled and generally sanitized the lazy irq state,
+ * and in the latter case it will leave with interrupts hard
+ * disabled and marked as such, so the local_irq_restore() call
+ * in cpu_idle() will properly re-enable everything.
+ */
+bool prep_irq_for_idle(void)
+{
+	/*
+	 * First we need to hard disable to ensure no interrupt
+	 * occurs before we effectively enter the low power state
+	 */
+	hard_irq_disable();
+
+	/*
+	 * If anything happened while we were soft-disabled,
+	 * we return now and do not enter the low power state.
+	 */
+	if (lazy_irq_pending())
+		return false;
+
+	/* Tell lockdep we are about to re-enable */
+	trace_hardirqs_on();
+
+	/*
+	 * Mark interrupts as soft-enabled and clear the
+	 * PACA_IRQ_HARD_DIS from the pending mask since we
+	 * are about to hard enable as well as a side effect
+	 * of entering the low power state.
+	 */
+	local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+	local_paca->soft_enabled = 1;
+
+	/* Tell the caller to enter the low power state */
+	return true;
+}
+
 #endif /* CONFIG_PPC64 */
 
 int arch_show_interrupts(struct seq_file *p, int prec)
diff --git a/arch/powerpc/platforms/cell/pervasive.c b/arch/powerpc/platforms/cell/pervasive.c
index efdacc8..d17e98b 100644
--- a/arch/powerpc/platforms/cell/pervasive.c
+++ b/arch/powerpc/platforms/cell/pervasive.c
@@ -42,11 +42,9 @@ static void cbe_power_save(void)
 {
 	unsigned long ctrl, thread_switch_control;
 
-	/*
-	 * We need to hard disable interrupts, the local_irq_enable() done by
-	 * our caller upon return will hard re-enable.
-	 */
-	hard_irq_disable();
+	/* Ensure our interrupt state is properly tracked */
+	if (!prep_irq_for_idle())
+		return;
 
 	ctrl = mfspr(SPRN_CTRLF);
 
@@ -81,6 +79,9 @@ static void cbe_power_save(void)
 	 */
 	ctrl &= ~(CTRL_RUNLATCH | CTRL_TE);
 	mtspr(SPRN_CTRLT, ctrl);
+
+	/* Re-enable interrupts in MSR */
+	__hard_irq_enable();
 }
 
 static int cbe_system_reset_exception(struct pt_regs *regs)
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index a97ef66..9b51ccf 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -100,15 +100,17 @@ out:
 static void check_and_cede_processor(void)
 {
 	/*
-	 * Interrupts are soft-disabled at this point,
-	 * but not hard disabled. So an interrupt might have
-	 * occurred before entering NAP, and would be potentially
-	 * lost (edge events, decrementer events, etc...) unless
-	 * we first hard disable then check.
+	 * Ensure our interrupt state is properly tracked,
+	 * also checks if no interrupt has occurred while we
+	 * were soft-disabled
 	 */
-	hard_irq_disable();
-	if (!lazy_irq_pending())
+	if (prep_irq_for_idle()) {
 		cede_processor();
+#ifdef CONFIG_TRACE_IRQFLAG
+		/* Ensure that H_CEDE returns with IRQs on */
+		WARN_ON(!(mfmsr() & MSR_EE));
+#endif
+	}
 }
 
 static int dedicated_cede_loop(struct cpuidle_device *dev,

^ permalink raw reply related

* Semantics of lv1_pause()
From: Benjamin Herrenschmidt @ 2012-07-10  3:20 UTC (permalink / raw)
  To: Geoff Levand; +Cc: linuxppc-dev

Hi Geoff !

Do you have access to any kind of documentation regarding the precise
semantics of lv1_pause() ?

I'm fixing various issues with our idle loops, among other things,
because we are soft-disabled when we hit ppc_md.powersave() but not
hard-disabled, there could be an interrupt marked as pending and not
taken yet.

Will that work properly ? I have this understanding that it might
not ... but heh.

Also if I call lv1_pause() with MSR_EE off, will it work ? Will it
return with MSR_EE on like H_CEDE does on pseries ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 3/3 v2] powerpc/mpic: FSL MPIC error interrupt support.
From: Kumar Gala @ 2012-07-10  1:47 UTC (permalink / raw)
  To: Scott Wood; +Cc: Bogdan Hamciuc, Varun Sethi, linuxppc-dev
In-Reply-To: <4FFB3D88.6030106@freescale.com>


On Jul 9, 2012, at 3:22 PM, Scott Wood wrote:

> On 07/09/2012 02:03 PM, Kumar Gala wrote:
>>=20
>> On Jul 9, 2012, at 3:47 AM, Varun Sethi wrote:
>>=20
>>> +int mpic_err_int_init(struct mpic *mpic, irq_hw_number_t irqnum)
>>> +{
>>=20
>> Why can't we do this during mpic_init() time?
>=20
> Are you willing to hardcode that IRQ 16 is the error interrupt, =
without
> waiting to see an intspec?

I'm torn, but the bit of code in mpic_host_xlate that calls =
mpic_err_int_init() is just ugly.

We could consider it similar to how we assume IPIs.

>>> +	ret =3D request_irq(virq, fsl_error_int_handler, IRQF_NO_THREAD,
>>> +		    "mpic-error-int", mpic);
>>=20
>> Hmm, should we be using irq_set_chained_handler() instead of =
request_irq
>=20
> As I said last time, "that's how Varun initially did it and I asked =
him
> to change it, because it gets a lot trickier to get things right, and =
I
> didn't see what it was buying us."  That original patch had locking
> problems as a result.
>=20
> Using the chained handler mechanism puts the responsibility on us to =
do
> a lot of the generic stuff that's already perfectly well implemented =
in
> generic code.  We're implementing a cascade, not a new flow.

Makes sense.

- k=

^ permalink raw reply

* Re: [PATCH SLAB 1/2 v3] duplicate the cache name in SLUB's saved_alias list, SLAB, and SLOB
From: Li Zhong @ 2012-07-10  1:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: LKML, Glauber Costa, Pekka Enberg, linux-mm, Paul Mackerras,
	Matt Mackall, PowerPC email list, Wanlong Gao
In-Reply-To: <alpine.DEB.2.00.1207090859420.27737@router.home>

On Mon, 2012-07-09 at 09:01 -0500, Christoph Lameter wrote:
> > I was pointed by Glauber to the slab common code patches. I need some
> > more time to read the patches. Now I think the slab/slot changes in this
> > v3 are not needed, and can be ignored.
> 
> That may take some kernel cycles. You have a current issue here that needs
> to be fixed.

I'm a little confused ... and what need I do for the next step? 

> 
> > >  	down_write(&slub_lock);
> > > -	s = find_mergeable(size, align, flags, name, ctor);
> > > +	s = find_mergeable(size, align, flags, n, ctor);
> > >  	if (s) {
> > >  		s->refcount++;
> > >  		/*
> >
> > 		......
> > 		up_write(&slub_lock);
> > 		return s;
> > 	}
> >
> > Here, the function returns without name string n be kfreed.
> 
> That is intentional since the string n is still referenced by the entry
> that sysfs_slab_alias has created.

I'm not sure whether the "referenced by ..." you mentioned is what I
understood. From my understanding:

if slab_state == SYS_FS, after 
	return sysfs_create_link(&slab_kset->kobj, &s->kobj, name); 
is called, the name string passed in sysfs_slab_alias is no longer
referenced (sysfs_new_dirent duplicates the string for sysfs to use).

else, the name sting is referenced by 
	al->name = name;
temporarily. After slab_sysfs_init is finished, the name is not
referenced any more.

So in my patch (slub part), the string is duplicated here, and kfreed in
slab_sysfs_init.

> > But we couldn't kfree n here, because in sysfs_slab_alias(), if
> > (slab_state < SYS_FS), the name need to be kept valid until
> > slab_sysfs_init() is finished adding the entry into sysfs.
> 
> Right that is why it is not freed and that is what fixes the issue you
> see.
> 

^ permalink raw reply

* Re: [PATCH v3] printk: Have printk() never buffer its data
From: Joe Perches @ 2012-07-09 23:41 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Michael Neuling, Greg Kroah-Hartman, LKML, Steven Rostedt,
	Paul E. McKenney, linuxppc-dev, Andrew Morton, Wu Fengguang,
	Linus Torvalds, Ingo Molnar
In-Reply-To: <1341876736.6118.70.camel@joe2Laptop>

On Mon, 2012-07-09 at 16:32 -0700, Joe Perches wrote:
> Then you've changed semantics and I think you need to
> fix it.
> 
> A dev_<level> call is not guaranteed to be a complete
> message.
> 
> There are dev_<level> and netdev_<level> calls
> followed by pr_cont.
> 
> Maybe these could be fixed up and then they could be
> always integral.  There don't look to be too many.
> 
> This may be most (all?) of them:

Nah, there's a bunch more:

$ git grep -E -A10 "\b(netdev|dev)_(info|warn|notice|err|alert|emerg|crit)" drivers | \
  grep -B10 -E "\bprintk\s*\(\s*(KERN_CONT|)\s*\""

All of them could be fixed up though.

^ permalink raw reply

* Re: [PATCH v3] printk: Have printk() never buffer its data
From: Joe Perches @ 2012-07-09 23:32 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Michael Neuling, Greg Kroah-Hartman, LKML, Steven Rostedt,
	Paul E. McKenney, linuxppc-dev, Andrew Morton, Wu Fengguang,
	Linus Torvalds, Ingo Molnar
In-Reply-To: <CAPXgP12wrORpTdzMFW0QKr0j1VqZy_PRBFRM2P=qRqPFMQ=zUA@mail.gmail.com>

On Tue, 2012-07-10 at 00:40 +0200, Kay Sievers wrote:
> On Tue, Jul 10, 2012 at 12:29 AM, Joe Perches <joe@perches.com> wrote:
> > On Tue, 2012-07-10 at 00:10 +0200, Kay Sievers wrote:
> >> On Mon, Jul 9, 2012 at 11:42 PM, Joe Perches <joe@perches.com> wrote:
> >> > On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:
> >> >
> >> >> At the same time the CPU#2 prints the same warning with a continuation
> >> >> line, but the buffer from CPU#1 can not be flushed to the console, nor
> >> >> can the continuation line printk()s from CPU#2 be merged at this point.
> >> >> The consoles are still locked and busy with replaying the old log
> >> >> messages, so the new continuation data is just stored away in the record
> >> >> buffer as it is coming in.
> >> >> If the console would be registered a bit earlier, or the warning would
> >> >> happen a bit later, we would probably not see any of this.
> >> >>
> >> >> I can fake something like this just by holding the console semaphore
> >> >> over a longer time and printing continuation lines with different CPUs
> >> >> in a row.
> >> >>
> >> >> The patch below seems to work for me. It is also here:
> >> >>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
> >> >>
> >> >> It only applies cleanly on top of this patch:
> >> >>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
> >> >>
> >> >
> >> > Hi Kay.
> >> >
> >> > I just ran a test with what's in Greg's driver-core -for-linus branch.
> >> >
> >> > One of the differences in dmesg is timestamping of consecutive
> >> >         pr_<level>("foo...)
> >> > followed directly by
> >> >         pr_cont("bar...")
> >> >
> >> > For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)
> >> >
> >> > # grep MAP /var/log/dm* -A1
> >> > dmesg:[    0.781687] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
> >> > dmesg-[    0.781707] ata2: port disabled--ignoring
> >> > --
> >> > dmesg.0:[    0.948881] ata_piix 0000:00:1f.2: MAP [
> >> > dmesg.0-[    0.948883]  P0 P2 P1 P3 ]
> >> >
> >> > These messages originate starting at
> >> > drivers/ata/ata_piix.c:1354
> >> >
> >> > All the continuations are emitted with pr_cont.
> >> >
> >> > I think this output should still be coalesced without
> >> > timestamp deltas.  Perhaps the timestamping code can
> >> > still be reworked to avoid too small a delta producing
> >> > a new timestamp and another dmesg line.
> >>
> >> Hmm, I don't see that.
> >>
> >> If I do:
> >>   pr_info("[");
> >>   for (i = 0; i < 4; i++)
> >>        pr_cont("%i ", i);
> >>   pr_cont("]\n");
> >>
> >> I get:
> >>   6,173,0;[0 1 2 3 ]
> >>
> >> And if I fill the cont buffer and forcefully hold the console sem
> >> during all that, and we can't merge anymore, I get:
> >>   6,167,0;[
> >>   4,168,0;0
> >>   4,169,0;1
> >>   4,170,0;2
> >>   4,171,0;3
> >>   4,172,0;]
> >>
> >> But the output is still all fine for both lines:
> >>   [    0.000000] [0 1 2 3 ]
> >>   [    0.000000] [0 1 2 3 ]
> >>
> >> What do I miss?
> >
> > In this case the initial line is dev_info not pr_info
> > so there are the additional dict descriptors output to
> > /dev/kmsg as well.
> >
> > Maybe that interferes with continuations.  Dunno.
> 
> Yes, it does. Annotated records dev_printk() must be reliable in the
> data storage and for the consumers. We can not expose them to the racy
> continuation printk()s. We need to be able to trust the data they
> print and not possibly merge unrelated things into it.
> 
> If it's needed, we can try to set the flags accordingly, that they
> *look* like a line in the classic byte-stream output, but the
> interface in /dev/kmsg must not export them that way, because
> continuation lines can never be trusted to be correct.

Then you've changed semantics and I think you need to
fix it.

A dev_<level> call is not guaranteed to be a complete
message.

There are dev_<level> and netdev_<level> calls
followed by pr_cont.

Maybe these could be fixed up and then they could be
always integral.  There don't look to be too many.

This may be most (all?) of them:

 drivers/ata/ata_piix.c               |   16 +++++++++-----
 drivers/media/rc/redrat3.c           |   36 ++++++++++++++++-----------------
 drivers/net/ethernet/broadcom/bnx2.c |   26 ++++++++++++++----------
 3 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 3c809bf..f51962f 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -1346,38 +1346,42 @@ static const int *__devinit piix_init_sata_map(struct pci_dev *pdev,
 	const int *map;
 	int i, invalid_map = 0;
 	u8 map_value;
+	char maps[50] = {0};
+	int len = 0;
 
 	pci_read_config_byte(pdev, ICH5_PMR, &map_value);
 
 	map = map_db->map[map_value & map_db->mask];
 
-	dev_info(&pdev->dev, "MAP [");
 	for (i = 0; i < 4; i++) {
 		switch (map[i]) {
 		case RV:
 			invalid_map = 1;
-			pr_cont(" XX");
+			len += snprintf(maps + len, sizeof(maps) - len, " XX");
 			break;
 
 		case NA:
-			pr_cont(" --");
+			len += snprintf(maps + len, sizeof(maps) - len, " --");
 			break;
 
 		case IDE:
 			WARN_ON((i & 1) || map[i + 1] != IDE);
 			pinfo[i / 2] = piix_port_info[ich_pata_100];
 			i++;
-			pr_cont(" IDE IDE");
+			len += snprintf(maps + len, sizeof(maps) - len,
+					" IDE IDE");
 			break;
 
 		default:
-			pr_cont(" P%d", map[i]);
+			len += snprintf(maps + len, sizeof(maps) - len,
+					" P%d", map[i]);
 			if (i & 1)
 				pinfo[i / 2].flags |= ATA_FLAG_SLAVE_POSS;
 			break;
 		}
 	}
-	pr_cont(" ]\n");
+
+	dev_info(&pdev->dev, "MAP [%s ]\n", maps);
 
 	if (invalid_map)
 		dev_err(&pdev->dev, "invalid MAP value %u\n", map_value);
diff --git a/drivers/media/rc/redrat3.c b/drivers/media/rc/redrat3.c
index 2878b0e..636a245 100644
--- a/drivers/media/rc/redrat3.c
+++ b/drivers/media/rc/redrat3.c
@@ -296,41 +296,36 @@ static void redrat3_issue_async(struct redrat3_dev *rr3)
 
 static void redrat3_dump_fw_error(struct redrat3_dev *rr3, int code)
 {
-	if (!rr3->transmitting && (code != 0x40))
-		dev_info(rr3->dev, "fw error code 0x%02x: ", code);
+	const char *msg = "";
 
 	switch (code) {
 	case 0x00:
-		pr_cont("No Error\n");
+		msg = "No Error";
 		break;
 
 	/* Codes 0x20 through 0x2f are IR Firmware Errors */
 	case 0x20:
-		pr_cont("Initial signal pulse not long enough "
-			"to measure carrier frequency\n");
+		msg = "Initial signal pulse not long enough to measure carrier frequency";
 		break;
 	case 0x21:
-		pr_cont("Not enough length values allocated for signal\n");
+		msg = "Not enough length values allocated for signal";
 		break;
 	case 0x22:
-		pr_cont("Not enough memory allocated for signal data\n");
+		msg = "Not enough memory allocated for signal data";
 		break;
 	case 0x23:
-		pr_cont("Too many signal repeats\n");
+		msg = "Too many signal repeats";
 		break;
 	case 0x28:
-		pr_cont("Insufficient memory available for IR signal "
-			"data memory allocation\n");
+		msg = "Insufficient memory available for IR signal data memory allocation";
 		break;
 	case 0x29:
-		pr_cont("Insufficient memory available "
-			"for IrDa signal data memory allocation\n");
+		msg = "Insufficient memory available for IrDa signal data memory allocation";
 		break;
 
 	/* Codes 0x30 through 0x3f are USB Firmware Errors */
 	case 0x30:
-		pr_cont("Insufficient memory available for bulk "
-			"transfer structure\n");
+		msg = "Insufficient memory available for bulk transfer structure";
 		break;
 
 	/*
@@ -339,20 +334,23 @@ static void redrat3_dump_fw_error(struct redrat3_dev *rr3, int code)
 	 */
 	case 0x40:
 		if (!rr3->transmitting)
-			pr_cont("Signal capture has been terminated\n");
+			msg = "Signal capture has been terminated";
 		break;
 	case 0x41:
-		pr_cont("Attempt to set/get and unknown signal I/O "
-			"algorithm parameter\n");
+		msg = "Attempt to set/get and unknown signal I/O algorithm parameter";
 		break;
 	case 0x42:
-		pr_cont("Signal capture already started\n");
+		msg = "Signal capture already started";
 		break;
 
 	default:
-		pr_cont("Unknown Error\n");
+		msg = "Unknown Error";
 		break;
 	}
+
+	if (!rr3->transmitting && (code != 0x40))
+		dev_info(rr3->dev, "fw error code 0x%02x: %s\n", code, msg);
+
 }
 
 static u32 redrat3_val_to_mod_freq(struct redrat3_signal_header *ph)
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 1901da1..f5dd19f 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -987,24 +987,28 @@ static void
 bnx2_report_link(struct bnx2 *bp)
 {
 	if (bp->link_up) {
+		const char *fc_type = "";
+		const char *fc_on = "";
 		netif_carrier_on(bp->dev);
-		netdev_info(bp->dev, "NIC %s Link is Up, %d Mbps %s duplex",
-			    bnx2_xceiver_str(bp),
-			    bp->line_speed,
-			    bp->duplex == DUPLEX_FULL ? "full" : "half");
 
 		if (bp->flow_ctrl) {
 			if (bp->flow_ctrl & FLOW_CTRL_RX) {
-				pr_cont(", receive ");
 				if (bp->flow_ctrl & FLOW_CTRL_TX)
-					pr_cont("& transmit ");
-			}
-			else {
-				pr_cont(", transmit ");
+					fc_type = ", receive & transmit";
+				else
+					fc_type = ", receive";
+
+			} else {
+				fc_type = ", transmit";
 			}
-			pr_cont("flow control ON");
+			fc_on = " flow control ON";
 		}
-		pr_cont("\n");
+		netdev_info(bp->dev, "NIC %s Link is Up, %d Mbps %s duplex%s%s\n",
+			    bnx2_xceiver_str(bp),
+			    bp->line_speed,
+			    bp->duplex == DUPLEX_FULL ? "full" : "half",
+			    fc_type, fc_on);
+
 	} else {
 		netif_carrier_off(bp->dev);
 		netdev_err(bp->dev, "NIC %s Link is Down\n",

^ permalink raw reply related

* Re: [PATCH -V3 03/11] arch/powerpc: Convert virtual address to vpn
From: Stephen Rothwell @ 2012-07-09 23:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev
In-Reply-To: <1341839621-28332-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 1295 bytes --]

Hi Aneesh,

On Mon,  9 Jul 2012 18:43:33 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> index 660b8bb..308e29d 100644
> --- a/arch/powerpc/mm/hash_native_64.c
> +++ b/arch/powerpc/mm/hash_native_64.c
> @@ -39,22 +39,33 @@
>  
>  DEFINE_RAW_SPINLOCK(native_tlbie_lock);
>  
> -static inline void __tlbie(unsigned long va, int psize, int ssize)
> +static inline void __tlbie(unsigned long vpn, int psize, int ssize)
>  {
> +	unsigned long va;
>  	unsigned int penc;
>  
> +	/*
> +	 * We need 14 to 65 bits of va for a tlibe of 4K page
> +	 * With vpn we ignore the lower VPN_SHIFT bits already.
> +	 * And top two bits are already ignored because we can
> +	 * only accomadate 76 bits in a 64 bit vpn with a VPN_SHIFT
> +	 * of 12.
> +	 */
> +	BUG_ON((77 - 65) > VPN_SHIFT);

BUILD_BUG_ON() ?

> -static inline void __tlbiel(unsigned long va, int psize, int ssize)
> +static inline void __tlbiel(unsigned long vpn, int psize, int ssize)
>  {
> +	unsigned long va;
>  	unsigned int penc;
>  
> +	BUG_ON((77 - 65) > VPN_SHIFT);

BUILD_BUG_ON() ?

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH -V3 03/11] arch/powerpc: Convert virtual address to vpn
From: Stephen Rothwell @ 2012-07-09 22:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev
In-Reply-To: <1341839621-28332-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 1501 bytes --]

Hi Aneesh,

On Mon,  9 Jul 2012 18:43:33 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index 1c65a59..1c984a6 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -154,9 +155,25 @@ struct mmu_psize_def
>  #define MMU_SEGSIZE_256M	0
>  #define MMU_SEGSIZE_1T		1
>  
> +/*
> + * encode page number shift.
> + * Inorder to fit the 78 bit va in a 64 bit variable we shift the va by
> + * 12 bits. This enable us to address upto 76 bit va.
> + * For hpt hash from a va we can ignore the page size bits of va and for
> + * hpte encoding we ignore upto 23 bits of va. So ignoring lower 12 bits ensure
> + * we work in all cases including 4k page size.
> + */
> +#define VPN_SHIFT	12
>  
> +static inline unsigned long hpte_encode_avpn(unsigned long vpn, int psize,
> +					     int ssize)
> +{
> +	unsigned long v;
> +	/*
> +	 * The AVA field omits the low-order 23 bits of the 78 bits VA.
> +	 * These bits are not needed in the PTE, because the
> +	 * low-order b of these bits are part of the byte offset
> +	 * into the virtual page and, if b < 23, the high-order
> +	 * 23-b of these bits are always used in selecting the
> +	 * PTEGs to be searched
> +	 */
> +	BUG_ON(VPN_SHIFT > 23);

Could/should this be BUILD_BUG_ON()?

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH v3] printk: Have printk() never buffer its data
From: Kay Sievers @ 2012-07-09 22:40 UTC (permalink / raw)
  To: Joe Perches
  Cc: Michael Neuling, Greg Kroah-Hartman, LKML, Steven Rostedt,
	Paul E. McKenney, linuxppc-dev, Andrew Morton, Wu Fengguang,
	Linus Torvalds, Ingo Molnar
In-Reply-To: <1341872984.6118.54.camel@joe2Laptop>

On Tue, Jul 10, 2012 at 12:29 AM, Joe Perches <joe@perches.com> wrote:
> On Tue, 2012-07-10 at 00:10 +0200, Kay Sievers wrote:
>> On Mon, Jul 9, 2012 at 11:42 PM, Joe Perches <joe@perches.com> wrote:
>> > On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:
>> >
>> >> At the same time the CPU#2 prints the same warning with a continuation
>> >> line, but the buffer from CPU#1 can not be flushed to the console, nor
>> >> can the continuation line printk()s from CPU#2 be merged at this point.
>> >> The consoles are still locked and busy with replaying the old log
>> >> messages, so the new continuation data is just stored away in the record
>> >> buffer as it is coming in.
>> >> If the console would be registered a bit earlier, or the warning would
>> >> happen a bit later, we would probably not see any of this.
>> >>
>> >> I can fake something like this just by holding the console semaphore
>> >> over a longer time and printing continuation lines with different CPUs
>> >> in a row.
>> >>
>> >> The patch below seems to work for me. It is also here:
>> >>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
>> >>
>> >> It only applies cleanly on top of this patch:
>> >>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
>> >>
>> >
>> > Hi Kay.
>> >
>> > I just ran a test with what's in Greg's driver-core -for-linus branch.
>> >
>> > One of the differences in dmesg is timestamping of consecutive
>> >         pr_<level>("foo...)
>> > followed directly by
>> >         pr_cont("bar...")
>> >
>> > For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)
>> >
>> > # grep MAP /var/log/dm* -A1
>> > dmesg:[    0.781687] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
>> > dmesg-[    0.781707] ata2: port disabled--ignoring
>> > --
>> > dmesg.0:[    0.948881] ata_piix 0000:00:1f.2: MAP [
>> > dmesg.0-[    0.948883]  P0 P2 P1 P3 ]
>> >
>> > These messages originate starting at
>> > drivers/ata/ata_piix.c:1354
>> >
>> > All the continuations are emitted with pr_cont.
>> >
>> > I think this output should still be coalesced without
>> > timestamp deltas.  Perhaps the timestamping code can
>> > still be reworked to avoid too small a delta producing
>> > a new timestamp and another dmesg line.
>>
>> Hmm, I don't see that.
>>
>> If I do:
>>   pr_info("[");
>>   for (i = 0; i < 4; i++)
>>        pr_cont("%i ", i);
>>   pr_cont("]\n");
>>
>> I get:
>>   6,173,0;[0 1 2 3 ]
>>
>> And if I fill the cont buffer and forcefully hold the console sem
>> during all that, and we can't merge anymore, I get:
>>   6,167,0;[
>>   4,168,0;0
>>   4,169,0;1
>>   4,170,0;2
>>   4,171,0;3
>>   4,172,0;]
>>
>> But the output is still all fine for both lines:
>>   [    0.000000] [0 1 2 3 ]
>>   [    0.000000] [0 1 2 3 ]
>>
>> What do I miss?
>
> In this case the initial line is dev_info not pr_info
> so there are the additional dict descriptors output to
> /dev/kmsg as well.
>
> Maybe that interferes with continuations.  Dunno.

Yes, it does. Annotated records dev_printk() must be reliable in the
data storage and for the consumers. We can not expose them to the racy
continuation printk()s. We need to be able to trust the data they
print and not possibly merge unrelated things into it.

If it's needed, we can try to set the flags accordingly, that they
*look* like a line in the classic byte-stream output, but the
interface in /dev/kmsg must not export them that way, because
continuation lines can never be trusted to be correct.

Kay

^ permalink raw reply

* Re: [PATCH v3] printk: Have printk() never buffer its data
From: Michael Neuling @ 2012-07-09 22:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, LKML, Steven Rostedt, Paul E. McKenney, linuxppc-dev,
	Joe Perches, Andrew Morton, Wu Fengguang, Linus Torvalds,
	Ingo Molnar
In-Reply-To: <20120709170956.GB24728@kroah.com>

Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Sun, Jul 08, 2012 at 07:55:55PM +0200, Kay Sievers wrote:
> > On Sat, 2012-07-07 at 07:04 +1000, Michael Neuling wrote:
> > > Whole kmsg below.
> > 
> > I guess I have an idea now what's going on.
> > 
> > > 4,47,0;WARNING: at /scratch/mikey/src/linux-ozlabs/arch/powerpc/sysdev/xics/xics-common.c:105
> > > 4,51,0;MSR: 9000000000021032 <SF,HV,ME,IR,DR,RI>  CR: 24000042  XER: 22000000
> > > 4,54,0;TASK = c000000000b2dd80[0] 'swapper/0' THREAD: c000000000c24000 CPU: 0
> > 
> > This is the warning on CPU#1, all fine, all in one line.
> > 
> > > 6,74,0;console [tty0] enabled
> > > 6,75,0;console [hvc0] enabled
> > 
> > Now the boot consoles are registered, which replays the whole buffer
> > that was collected up to this point. During the entire time the console
> > semaphore needs to be held, and this can be quite a while.
> > 
> > > 4,87,24545;WARNING: at /scratch/mikey/src/linux-ozlabs/arch/powerpc/sysdev/xics/xics-common.c:105
> > > \4,91,24586;MSR: 9000000000021032 
> > > 4,92,24590;<
> > > 4,93,24594;SF
> > > 4,94,24599;,HV
> > > 4,95,24604;,ME
> > > 4,96,24609;,IR
> > > 4,97,24614;,DR
> > > 4,98,24619;,RI
> > > 4,99,24623;>
> > > 4,104,24661; CPU: 1
> > 
> > At the same time the CPU#2 prints the same warning with a continuation
> > line, but the buffer from CPU#1 can not be flushed to the console, nor
> > can the continuation line printk()s from CPU#2 be merged at this point.
> > The consoles are still locked and busy with replaying the old log
> > messages, so the new continuation data is just stored away in the record
> > buffer as it is coming in.
> > If the console would be registered a bit earlier, or the warning would
> > happen a bit later, we would probably not see any of this.
> > 
> > I can fake something like this just by holding the console semaphore
> > over a longer time and printing continuation lines with different CPUs
> > in a row.
> > 
> > The patch below seems to work for me. It is also here:
> >   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
> > 
> > It only applies cleanly on top of this patch:
> >   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
> > 
> > Thanks,
> > Kay
> > 
> > 
> > Subject: kmsg: merge continuation records while printing
> > 
> > In (the unlikely) case our continuation merge buffer is busy, we unfortunately
> > can not merge further continuation printk()s into a single record and have to
> > store them separately, which leads to split-up output of these lines when they
> > are printed.
> > 
> > Add some flags about newlines and prefix existence to these records and try to
> > reconstruct the full line again, when the separated records are printed.
> > ---
> >  kernel/printk.c |  119 ++++++++++++++++++++++++++++++++++++--------------------
> >  1 file changed, 77 insertions(+), 42 deletions(-)
> 
> Michael, did this solve the issue for you?

It didn't but I've been working with Kay offline and what he pushed to
you and is now in your driver-core-linus branch is good.  ie.

  5becfb1 kmsg: merge continuation records while printing

... works for me.

Thanks,
Mikey

^ permalink raw reply

* Re: [PATCH v3] printk: Have printk() never buffer its data
From: Joe Perches @ 2012-07-09 22:29 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Michael Neuling, Greg Kroah-Hartman, LKML, Steven Rostedt,
	Paul E. McKenney, linuxppc-dev, Andrew Morton, Wu Fengguang,
	Linus Torvalds, Ingo Molnar
In-Reply-To: <CAPXgP11Qsj9yGM5nSbRJ_YPFZcr2Km6w4R8wPGU1AytyQChYXQ@mail.gmail.com>

On Tue, 2012-07-10 at 00:10 +0200, Kay Sievers wrote:
> On Mon, Jul 9, 2012 at 11:42 PM, Joe Perches <joe@perches.com> wrote:
> > On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:
> >
> >> At the same time the CPU#2 prints the same warning with a continuation
> >> line, but the buffer from CPU#1 can not be flushed to the console, nor
> >> can the continuation line printk()s from CPU#2 be merged at this point.
> >> The consoles are still locked and busy with replaying the old log
> >> messages, so the new continuation data is just stored away in the record
> >> buffer as it is coming in.
> >> If the console would be registered a bit earlier, or the warning would
> >> happen a bit later, we would probably not see any of this.
> >>
> >> I can fake something like this just by holding the console semaphore
> >> over a longer time and printing continuation lines with different CPUs
> >> in a row.
> >>
> >> The patch below seems to work for me. It is also here:
> >>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
> >>
> >> It only applies cleanly on top of this patch:
> >>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
> >>
> >
> > Hi Kay.
> >
> > I just ran a test with what's in Greg's driver-core -for-linus branch.
> >
> > One of the differences in dmesg is timestamping of consecutive
> >         pr_<level>("foo...)
> > followed directly by
> >         pr_cont("bar...")
> >
> > For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)
> >
> > # grep MAP /var/log/dm* -A1
> > dmesg:[    0.781687] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
> > dmesg-[    0.781707] ata2: port disabled--ignoring
> > --
> > dmesg.0:[    0.948881] ata_piix 0000:00:1f.2: MAP [
> > dmesg.0-[    0.948883]  P0 P2 P1 P3 ]
> >
> > These messages originate starting at
> > drivers/ata/ata_piix.c:1354
> >
> > All the continuations are emitted with pr_cont.
> >
> > I think this output should still be coalesced without
> > timestamp deltas.  Perhaps the timestamping code can
> > still be reworked to avoid too small a delta producing
> > a new timestamp and another dmesg line.
> 
> Hmm, I don't see that.
> 
> If I do:
>   pr_info("[");
>   for (i = 0; i < 4; i++)
>        pr_cont("%i ", i);
>   pr_cont("]\n");
> 
> I get:
>   6,173,0;[0 1 2 3 ]
> 
> And if I fill the cont buffer and forcefully hold the console sem
> during all that, and we can't merge anymore, I get:
>   6,167,0;[
>   4,168,0;0
>   4,169,0;1
>   4,170,0;2
>   4,171,0;3
>   4,172,0;]
> 
> But the output is still all fine for both lines:
>   [    0.000000] [0 1 2 3 ]
>   [    0.000000] [0 1 2 3 ]
> 
> What do I miss?

In this case the initial line is dev_info not pr_info
so there are the additional dict descriptors output to
/dev/kmsg as well.

Maybe that interferes with continuations.  Dunno.

^ permalink raw reply

* Re: [PATCH v3] printk: Have printk() never buffer its data
From: Kay Sievers @ 2012-07-09 22:10 UTC (permalink / raw)
  To: Joe Perches
  Cc: Michael Neuling, Greg Kroah-Hartman, LKML, Steven Rostedt,
	Paul E. McKenney, linuxppc-dev, Andrew Morton, Wu Fengguang,
	Linus Torvalds, Ingo Molnar
In-Reply-To: <1341870165.6118.42.camel@joe2Laptop>

On Mon, Jul 9, 2012 at 11:42 PM, Joe Perches <joe@perches.com> wrote:
> On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:
>
>> At the same time the CPU#2 prints the same warning with a continuation
>> line, but the buffer from CPU#1 can not be flushed to the console, nor
>> can the continuation line printk()s from CPU#2 be merged at this point.
>> The consoles are still locked and busy with replaying the old log
>> messages, so the new continuation data is just stored away in the record
>> buffer as it is coming in.
>> If the console would be registered a bit earlier, or the warning would
>> happen a bit later, we would probably not see any of this.
>>
>> I can fake something like this just by holding the console semaphore
>> over a longer time and printing continuation lines with different CPUs
>> in a row.
>>
>> The patch below seems to work for me. It is also here:
>>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
>>
>> It only applies cleanly on top of this patch:
>>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
>>
>
> Hi Kay.
>
> I just ran a test with what's in Greg's driver-core -for-linus branch.
>
> One of the differences in dmesg is timestamping of consecutive
>         pr_<level>("foo...)
> followed directly by
>         pr_cont("bar...")
>
> For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)
>
> # grep MAP /var/log/dm* -A1
> dmesg:[    0.781687] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
> dmesg-[    0.781707] ata2: port disabled--ignoring
> --
> dmesg.0:[    0.948881] ata_piix 0000:00:1f.2: MAP [
> dmesg.0-[    0.948883]  P0 P2 P1 P3 ]
>
> These messages originate starting at
> drivers/ata/ata_piix.c:1354
>
> All the continuations are emitted with pr_cont.
>
> I think this output should still be coalesced without
> timestamp deltas.  Perhaps the timestamping code can
> still be reworked to avoid too small a delta producing
> a new timestamp and another dmesg line.

Hmm, I don't see that.

If I do:
  pr_info("[");
  for (i = 0; i < 4; i++)
       pr_cont("%i ", i);
  pr_cont("]\n");

I get:
  6,173,0;[0 1 2 3 ]

And if I fill the cont buffer and forcefully hold the console sem
during all that, and we can't merge anymore, I get:
  6,167,0;[
  4,168,0;0
  4,169,0;1
  4,170,0;2
  4,171,0;3
  4,172,0;]

But the output is still all fine for both lines:
  [    0.000000] [0 1 2 3 ]
  [    0.000000] [0 1 2 3 ]

What do I miss?

Kay

^ permalink raw reply

* Re: [PATCH v3] printk: Have printk() never buffer its data
From: Joe Perches @ 2012-07-09 21:42 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Michael Neuling, Greg Kroah-Hartman, LKML, Steven Rostedt,
	Paul E. McKenney, linuxppc-dev, Andrew Morton, Wu Fengguang,
	Linus Torvalds, Ingo Molnar
In-Reply-To: <1341770155.1011.27.camel@mop>

On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:

> At the same time the CPU#2 prints the same warning with a continuation
> line, but the buffer from CPU#1 can not be flushed to the console, nor
> can the continuation line printk()s from CPU#2 be merged at this point.
> The consoles are still locked and busy with replaying the old log
> messages, so the new continuation data is just stored away in the record
> buffer as it is coming in.
> If the console would be registered a bit earlier, or the warning would
> happen a bit later, we would probably not see any of this.
> 
> I can fake something like this just by holding the console semaphore
> over a longer time and printing continuation lines with different CPUs
> in a row.
> 
> The patch below seems to work for me. It is also here:
>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
> 
> It only applies cleanly on top of this patch:
>   http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
> 

Hi Kay.

I just ran a test with what's in Greg's driver-core -for-linus branch.

One of the differences in dmesg is timestamping of consecutive
	pr_<level>("foo...)
followed directly by
	pr_cont("bar...")

For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)

# grep MAP /var/log/dm* -A1
dmesg:[    0.781687] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
dmesg-[    0.781707] ata2: port disabled--ignoring
--
dmesg.0:[    0.948881] ata_piix 0000:00:1f.2: MAP [
dmesg.0-[    0.948883]  P0 P2 P1 P3 ]

These messages originate starting at
drivers/ata/ata_piix.c:1354

All the continuations are emitted with pr_cont.

I think this output should still be coalesced without
timestamp deltas.  Perhaps the timestamping code can
still be reworked to avoid too small a delta producing
a new timestamp and another dmesg line.

cheers, Joe

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox