[PATCH] mmc_spi: do propper retry managment in the block layer

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten
@ 2009-05-18 13:21 Wolfgang Mües
  2009-05-18 15:20 ` Matt Fleming
  0 siblings, 1 reply; 7+ messages in thread
From: Wolfgang Mües @ 2009-05-18 13:21 UTC (permalink / raw)
  To: Pierre Ossman
  Cc: Andrew Morton, Matt Fleming, David Brownell, Mike Frysinger,
	linux-kernel

From: Wolfgang Muees <wolfgang.mues@auerswald.de>

o This patch adds a propper retry managment for reading
  and writing data blocks for mmc and mmc_spi. Blocks are
  retransmitted if the cause of failure might be a transmission
  fault. After a permanent failure, we try to read all other
  pending sectors, but terminate on write.
  This patch was tested with induced transmission errors
  by ESD pulses (and survived).

Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de>

---
diff -uprN 2_6_29_rc7_patch_EILSEQ/drivers/mmc/card/block.c 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c
--- 2_6_29_rc7_patch_EILSEQ/drivers/mmc/card/block.c	2009-03-09 17:10:55.000000000 +0100
+++ 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c	2009-05-18 15:00:41.000000000 +0200
@@ -230,12 +230,14 @@ static int mmc_blk_issue_rq(struct mmc_q
 	struct mmc_card *card = md->queue.card;
 	struct mmc_blk_request brq;
 	int ret = 1, disable_multi = 0;
+	int retries = 3;

 	mmc_claim_host(card->host);

 	do {
 		struct mmc_command cmd;
 		u32 readcmd, writecmd, status = 0;
+		int error = 0;

 		memset(&brq, 0, sizeof(struct mmc_blk_request));
 		brq.mrq.cmd = &brq.cmd;
@@ -251,10 +253,8 @@ static int mmc_blk_issue_rq(struct mmc_q
 		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
 		brq.data.blocks = req->nr_sectors;

-		/*
-		 * After a read error, we redo the request one sector at a time
-		 * in order to accurately determine which sectors can be read
-		 * successfully.
+		/* After an transmission error, we switch to single block
+		 * transfer until the problem is solved or we fail.
 		 */
 		if (disable_multi && brq.data.blocks > 1)
 			brq.data.blocks = 1;
@@ -262,10 +262,12 @@ static int mmc_blk_issue_rq(struct mmc_q
 		if (brq.data.blocks > 1) {
 			/* SPI multiblock writes terminate using a special
 			 * token, not a STOP_TRANSMISSION request.
+			 * Here, this request is set for ALL types of
+			 * hosts, so that we can use it in the lower
+			 * layers if the data transfer stage has failed
+			 * and the card is not able to accept the token.
 			 */
-			if (!mmc_host_is_spi(card->host)
-					|| rq_data_dir(req) == READ)
-				brq.mrq.stop = &brq.stop;
+			brq.mrq.stop = &brq.stop;
 			readcmd = MMC_READ_MULTIPLE_BLOCK;
 			writecmd = MMC_WRITE_MULTIPLE_BLOCK;
 		} else {
@@ -311,52 +313,71 @@ static int mmc_blk_issue_rq(struct mmc_q
 		mmc_wait_for_req(card->host, &brq.mrq);

 		mmc_queue_bounce_post(mq);
-
-		/*
-		 * Check for errors here, but don't jump to cmd_err
-		 * until later as we need to wait for the card to leave
-		 * programming mode even when things go wrong.
-		 */
-		if (brq.cmd.error || brq.data.error || brq.stop.error) {
-			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
-				/* Redo read one sector at a time */
-				printk(KERN_WARNING "%s: retrying using single "
-				       "block read\n", req->rq_disk->disk_name);
-				disable_multi = 1;
-				continue;
-			}
-			status = get_card_status(card, req);
-		}
-
+#if 0
+		printk(KERN_INFO "%s transfer sector %d count %d\n",
+			(rq_data_dir(req) == READ) ? "Read" : "Write",
+			(int)req->sector, (int)brq.data.blocks);
+#endif
+		/* Error reporting */
 		if (brq.cmd.error) {
+			error = brq.cmd.error;
+			status = get_card_status(card, req);
 			printk(KERN_ERR "%s: error %d sending read/write "
 			       "command, response %#x, card status %#x\n",
 			       req->rq_disk->disk_name, brq.cmd.error,
 			       brq.cmd.resp[0], status);
-		}
-
-		if (brq.data.error) {
-			if (brq.data.error == -ETIMEDOUT && brq.mrq.stop)
+		} else if (brq.data.error) {
+			error = brq.data.error;
+			if ((brq.data.error == -ETIMEDOUT)
+			    && brq.mrq.stop && !brq.stop.error)
 				/* 'Stop' response contains card status */
 				status = brq.mrq.stop->resp[0];
+			else
+				status = get_card_status(card, req);
 			printk(KERN_ERR "%s: error %d transferring data,"
 			       " sector %u, nr %u, card status %#x\n",
 			       req->rq_disk->disk_name, brq.data.error,
 			       (unsigned)req->sector,
 			       (unsigned)req->nr_sectors, status);
-		}
-
-		if (brq.stop.error) {
+		} else if (brq.stop.error) {
+			error = brq.stop.error;
+			status = get_card_status(card, req);
 			printk(KERN_ERR "%s: error %d sending stop command, "
 			       "response %#x, card status %#x\n",
 			       req->rq_disk->disk_name, brq.stop.error,
 			       brq.stop.resp[0], status);
 		}

+		/*
+		 * If this is an SD card and we're writing, we can first
+		 * mark the known good sectors as ok.
+		 *
+		 * If the card is not SD, we can still ok written sectors
+		 * as reported by the controller (which might be less than
+		 * the real number of written sectors, but never more).
+		 *
+		 * For read transfers, we report the number of tranfered bytes.
+		 */
+		if (error && mmc_card_sd(card) && (rq_data_dir(req) != READ)) {
+			u32 blocks;
+
+			blocks = mmc_sd_num_wr_blocks(card);
+			if (blocks != (u32)-1)
+				brq.data.bytes_xfered = blocks << 9;
+		}
+		if (brq.data.bytes_xfered) {
+			spin_lock_irq(&md->lock);
+			ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+			spin_unlock_irq(&md->lock);
+		}
+
+		/* Wait until card is ready for new data. This information is
+		 * only available in non-SPI mode. In SPI mode, the busy
+		 * handling is done in the SPI driver.
+		 */
 		if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
 			do {
 				int err;
-
 				cmd.opcode = MMC_SEND_STATUS;
 				cmd.arg = card->rca << 16;
 				cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
@@ -374,66 +395,63 @@ static int mmc_blk_issue_rq(struct mmc_q
 			} while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
 				(R1_CURRENT_STATE(cmd.resp[0]) == 7));

-#if 0
-			if (cmd.resp[0] & ~0x00000900)
-				printk(KERN_ERR "%s: status = %08x\n",
-				       req->rq_disk->disk_name, cmd.resp[0]);
-			if (mmc_decode_status(cmd.resp))
-				goto cmd_err;
-#endif
 		}

-		if (brq.cmd.error || brq.stop.error || brq.data.error) {
-			if (rq_data_dir(req) == READ) {
-				/*
-				 * After an error, we redo I/O one sector at a
-				 * time, so we only reach here after trying to
-				 * read a single sector.
-				 */
-				spin_lock_irq(&md->lock);
-				ret = __blk_end_request(req, -EIO, brq.data.blksz);
-				spin_unlock_irq(&md->lock);
-				continue;
-			}
-			goto cmd_err;
-		}
+		/* Do retries for all sort of transmission errors */
+		switch (error) {

-		/*
-		 * A block was successfully transferred.
+		case 0:	/* no error: continue, reset error variables */
+			disable_multi = 0;
+			retries = 3;
+			break;
+
+		/* Card has not understand command. As we do only send
+		 * valid commands, this must be a transmission error. */
+		case -EPROTO:	/* fall through */
+
+		/* Transmission error */
+		case -EILSEQ:	/* fall through */
+
+		/* Timeout, no answer. This might be a transmission error
+		 * host -> card or a real timeout. If we repeat the command,
+		 * we do an overall slowdown and have good chances to
+		 * complete the transfer even in case of a real timeout.
 		 */
-		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
-		spin_unlock_irq(&md->lock);
+		case -ETIMEDOUT:
+			disable_multi = 1;
+			if (--retries > 0)
+				break;
+			/* Fall through */
+
+		/* non-recoverable error or retries exhausted */
+		default:
+			/* Terminate, if transfer direction is write.
+			 * We are not allowed to continue after a non-
+			 * recoverable write!
+			 */
+			if (rq_data_dir(req) != READ)
+				goto cmd_err;
+			/* If transfer direction is read, report error
+			 * for this sector and continue to read
+			 * the rest. Get all available information!
+			 */
+			spin_lock_irq(&md->lock);
+			ret = __blk_end_request(req, -EIO, brq.data.blksz);
+			spin_unlock_irq(&md->lock);
+			/* reset error variables */
+			disable_multi = 0;
+			retries = 3;
+			break;
+		}
+
+	/* repeat until no more sectors left */
 	} while (ret);

 	mmc_release_host(card->host);

 	return 1;

- cmd_err:
- 	/*
- 	 * If this is an SD card and we're writing, we can first
- 	 * mark the known good sectors as ok.
- 	 *
-	 * If the card is not SD, we can still ok written sectors
-	 * as reported by the controller (which might be less than
-	 * the real number of written sectors, but never more).
-	 */
-	if (mmc_card_sd(card)) {
-		u32 blocks;
-
-		blocks = mmc_sd_num_wr_blocks(card);
-		if (blocks != (u32)-1) {
-			spin_lock_irq(&md->lock);
-			ret = __blk_end_request(req, 0, blocks << 9);
-			spin_unlock_irq(&md->lock);
-		}
-	} else {
-		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
-		spin_unlock_irq(&md->lock);
-	}
-
+cmd_err:
 	mmc_release_host(card->host);

 	spin_lock_irq(&md->lock);
diff -uprN 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c
--- 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c	2009-05-14 12:49:42.000000000 +0200
+++ 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c	2009-05-14 15:01:15.000000000 +0200
@@ -743,12 +743,11 @@ mmc_spi_writeblock(struct mmc_spi_host *
 	if (status != 0) {
 		dev_dbg(&spi->dev, "write error %02x (%d)\n",
 			scratch->status[0], status);
-		return status;
-	}
-
+	} else {
 	t->tx_buf += t->len;
 	if (host->dma_dev)
 		t->tx_dma += t->len;
+	}

 	/* Return when not busy.  If we didn't collect that status yet,
 	 * we'll need some more I/O.
@@ -756,9 +755,12 @@ mmc_spi_writeblock(struct mmc_spi_host *
 	for (i = 4; i < sizeof(scratch->status); i++) {
 		/* card is non-busy if the most recent bit is 1 */
 		if (scratch->status[i] & 0x01)
-			return 0;
+		return status;
 	}
-	return mmc_spi_wait_unbusy(host, timeout);
+	i = mmc_spi_wait_unbusy(host, timeout);
+	if (!status)
+		status = i;
+	return status;
 }

 /*
@@ -1087,7 +1089,15 @@ static void mmc_spi_request(struct mmc_h
 	status = mmc_spi_command_send(host, mrq, mrq->cmd, mrq->data != NULL);
 	if (status == 0 && mrq->data) {
 		mmc_spi_data_do(host, mrq->cmd, mrq->data, mrq->data->blksz);
-		if (mrq->stop)
+		/* filter-out the stop command for multiblock writes,
+		* only if the data stage has no transmission error.
+		* If the data stage has a transmission error, send the
+		* STOP command because there is a great chance that the
+		* SPI stop token was not accepted by the card.
+		*/
+		if (mrq->stop &&
+		  ((mrq->data->flags & MMC_DATA_READ)
+		 || mrq->data->error))
 			status = mmc_spi_command_send(host, mrq, mrq->stop, 0);
 		else
 			mmc_cs_off(host);
---
regards

i. A. Wolfgang Mües
-- 
Auerswald GmbH & Co. KG
Hardware Development
Telefon: +49 (0)5306 9219 0
Telefax: +49 (0)5306 9219 94 
E-Mail: Wolfgang.Mues@Auerswald.de
Web: http://www.auerswald.de
 
--------------------------------------------------------------
Auerswald GmbH & Co. KG, Vor den Grashöfen 1, 38162 Cremlingen
Registriert beim AG Braunschweig HRA 13289
p.h.G Auerswald Geschäftsführungsges. mbH
Registriert beim AG Braunschweig HRB 7463
Geschäftsführer: Dipl-Ing. Gerhard Auerswald

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten
  2009-05-18 13:21 [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten Wolfgang Mües
@ 2009-05-18 15:20 ` Matt Fleming
  2009-05-18 16:00   ` Wolfgang Mües
  0 siblings, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2009-05-18 15:20 UTC (permalink / raw)
  To: Wolfgang Mües
  Cc: Pierre Ossman, Andrew Morton, David Brownell, Mike Frysinger,
	linux-kernel

On Mon, May 18, 2009 at 02:21:38PM +0100, Wolfgang Mües wrote:
> From: Wolfgang Muees <wolfgang.mues@auerswald.de>
> 
> o This patch adds a propper retry managment for reading
>   and writing data blocks for mmc and mmc_spi. Blocks are
>   retransmitted if the cause of failure might be a transmission
>   fault. After a permanent failure, we try to read all other
>   pending sectors, but terminate on write.
>   This patch was tested with induced transmission errors
>   by ESD pulses (and survived).
> 
> Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de>
> 

When you say "proper retry management", what problem are you having
with the current code? Your changelog tells me what you changed but
not why you changed it.

> ---
> diff -uprN 2_6_29_rc7_patch_EILSEQ/drivers/mmc/card/block.c 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c
> --- 2_6_29_rc7_patch_EILSEQ/drivers/mmc/card/block.c	2009-03-09 17:10:55.000000000 +0100
> +++ 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c	2009-05-18 15:00:41.000000000 +0200

It's always a good idea to make your changes against the latest kernel
version to make sure that your patches apply cleanly. They applied OK
this time, but they might not in future.

> @@ -230,12 +230,14 @@ static int mmc_blk_issue_rq(struct mmc_q
>  	struct mmc_card *card = md->queue.card;
>  	struct mmc_blk_request brq;
>  	int ret = 1, disable_multi = 0;
> +	int retries = 3;
> 
>  	mmc_claim_host(card->host);
> 
>  	do {
>  		struct mmc_command cmd;
>  		u32 readcmd, writecmd, status = 0;
> +		int error = 0;
> 
>  		memset(&brq, 0, sizeof(struct mmc_blk_request));
>  		brq.mrq.cmd = &brq.cmd;
> @@ -251,10 +253,8 @@ static int mmc_blk_issue_rq(struct mmc_q
>  		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
>  		brq.data.blocks = req->nr_sectors;
> 
> -		/*
> -		 * After a read error, we redo the request one sector at a time
> -		 * in order to accurately determine which sectors can be read
> -		 * successfully.
> +		/* After an transmission error, we switch to single block
> +		 * transfer until the problem is solved or we fail.
>  		 */

Why did you change this comment? They seem to say roughly the same
things to me, only the first one is more descriptive. I think you
really need to combine the two,

		/*
		 * After a transmission error, we redo the request one sector
		 * at a time in order to accurately determine which sectors
		 * can be transmitted successfully. We keep retrying until
		 * the transmission succeeds or we exceed our retry count.
		 */

>  		if (disable_multi && brq.data.blocks > 1)
>  			brq.data.blocks = 1;
> @@ -262,10 +262,12 @@ static int mmc_blk_issue_rq(struct mmc_q
>  		if (brq.data.blocks > 1) {
>  			/* SPI multiblock writes terminate using a special
>  			 * token, not a STOP_TRANSMISSION request.
> +			 * Here, this request is set for ALL types of
> +			 * hosts, so that we can use it in the lower
> +			 * layers if the data transfer stage has failed
> +			 * and the card is not able to accept the token.
>  			 */
> -			if (!mmc_host_is_spi(card->host)
> -					|| rq_data_dir(req) == READ)
> -				brq.mrq.stop = &brq.stop;
> +			brq.mrq.stop = &brq.stop;
>  			readcmd = MMC_READ_MULTIPLE_BLOCK;
>  			writecmd = MMC_WRITE_MULTIPLE_BLOCK;
>  		} else {

Hmm.. is it OK to abuse the STOP_TRANSMISSION request like this? Does
SPI multiblock write still terminate with this patch?

> @@ -311,52 +313,71 @@ static int mmc_blk_issue_rq(struct mmc_q
>  		mmc_wait_for_req(card->host, &brq.mrq);
> 
>  		mmc_queue_bounce_post(mq);
> -
> -		/*
> -		 * Check for errors here, but don't jump to cmd_err
> -		 * until later as we need to wait for the card to leave
> -		 * programming mode even when things go wrong.
> -		 */
> -		if (brq.cmd.error || brq.data.error || brq.stop.error) {
> -			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
> -				/* Redo read one sector at a time */
> -				printk(KERN_WARNING "%s: retrying using single "
> -				       "block read\n", req->rq_disk->disk_name);
> -				disable_multi = 1;
> -				continue;
> -			}
> -			status = get_card_status(card, req);
> -		}
> -
> +#if 0
> +		printk(KERN_INFO "%s transfer sector %d count %d\n",
> +			(rq_data_dir(req) == READ) ? "Read" : "Write",
> +			(int)req->sector, (int)brq.data.blocks);
> +#endif

This should probably be using pr_debug(). That way you can remove the
#if 0. Or delete it altogether.

[...]

> diff -uprN 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c
> --- 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c	2009-05-14 12:49:42.000000000 +0200
> +++ 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c	2009-05-14 15:01:15.000000000 +0200
> @@ -1087,7 +1089,15 @@ static void mmc_spi_request(struct mmc_h
>  	status = mmc_spi_command_send(host, mrq, mrq->cmd, mrq->data != NULL);
>  	if (status == 0 && mrq->data) {
>  		mmc_spi_data_do(host, mrq->cmd, mrq->data, mrq->data->blksz);
> -		if (mrq->stop)
> +		/* filter-out the stop command for multiblock writes,
> +		* only if the data stage has no transmission error.
> +		* If the data stage has a transmission error, send the
> +		* STOP command because there is a great chance that the
> +		* SPI stop token was not accepted by the card.
> +		*/
> +		if (mrq->stop &&
> +		  ((mrq->data->flags & MMC_DATA_READ)
> +		 || mrq->data->error))
>  			status = mmc_spi_command_send(host, mrq, mrq->stop, 0);
>  		else
>  			mmc_cs_off(host);

Again, does this still work for SPI? Have you tested it?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten
  2009-05-18 15:20 ` Matt Fleming
@ 2009-05-18 16:00   ` Wolfgang Mües
  2009-05-18 20:33     ` Matt Fleming
  0 siblings, 1 reply; 7+ messages in thread
From: Wolfgang Mües @ 2009-05-18 16:00 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Pierre Ossman, Andrew Morton, David Brownell, Mike Frysinger,
	linux-kernel

Matt,

Am Montag, 18. Mai 2009 schrieb Matt Fleming:
> On Mon, May 18, 2009 at 02:21:38PM +0100, Wolfgang Mües wrote:
> > From: Wolfgang Muees <wolfgang.mues@auerswald.de>
> >
> > o This patch adds a propper retry managment for reading
> >   and writing data blocks for mmc and mmc_spi. Blocks are
> >   retransmitted if the cause of failure might be a transmission
> >   fault. After a permanent failure, we try to read all other
> >   pending sectors, but terminate on write.
> >   This patch was tested with induced transmission errors
> >   by ESD pulses (and survived).
> >
> > Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de>
>
> When you say "proper retry management", what problem are you having
> with the current code?

a) Writes were not retried at all, so every transmission error created a data 
loss.
b) There was no clear distinction between errors which comes from transmission 
faults and errors which are by nature non-recoverable.
c) After one error, the remaining sectors were read one by one. This is 
unneccessary time-consuming.

Cause a) was my initial problem with this code. This was not production 
quality.  

> > -		/*
> > -		 * After a read error, we redo the request one sector at a time
> > -		 * in order to accurately determine which sectors can be read
> > -		 * successfully.
> > +		/* After an transmission error, we switch to single block
> > +		 * transfer until the problem is solved or we fail.
> >  		 */
>
> Why did you change this comment? They seem to say roughly the same
> things to me, only the first one is more descriptive. I think you
> really need to combine the two,

Because it would not be true.

In case of an error (both read or write), there is only a temporary switch to
one-sector-at-a-time. This is unlike the old code.

> > @@ -262,10 +262,12 @@ static int mmc_blk_issue_rq(struct mmc_q
> >  		if (brq.data.blocks > 1) {
> >  			/* SPI multiblock writes terminate using a special
> >  			 * token, not a STOP_TRANSMISSION request.
> > +			 * Here, this request is set for ALL types of
> > +			 * hosts, so that we can use it in the lower
> > +			 * layers if the data transfer stage has failed
> > +			 * and the card is not able to accept the token.
> >  			 */
> > -			if (!mmc_host_is_spi(card->host)
> > -					|| rq_data_dir(req) == READ)
> > -				brq.mrq.stop = &brq.stop;
> > +			brq.mrq.stop = &brq.stop;
> >  			readcmd = MMC_READ_MULTIPLE_BLOCK;
> >  			writecmd = MMC_WRITE_MULTIPLE_BLOCK;
> >  		} else {
>
> Hmm.. is it OK to abuse the STOP_TRANSMISSION request like this? Does
> SPI multiblock write still terminate with this patch?

IMHO, this is the right thing to do. If you have a transmission error 
while SPI multiblock write, the card will miss the stop token. So the
card is in multiblock write mode and has to be stopped. You cannot
retry the stop token. Every card I have tested accepted the STOP_TRANSMISSION 
request. So it is better to try a STOP_TRANSMISSION, because this will 
recover communication with the card. If a particular card will not respond to 
the STOP_TRANSMISSION, the situation is not worse as before...

The STOP_TRANSMISSION request is only send to the card if the card
has not responded to the stop token (see the change in mmc_spi.c).
The default non-error behaviour has not changed.

> > +#if 0
> > +		printk(KERN_INFO "%s transfer sector %d count %d\n",
> > +			(rq_data_dir(req) == READ) ? "Read" : "Write",
> > +			(int)req->sector, (int)brq.data.blocks);
> > +#endif
>
> This should probably be using pr_debug(). That way you can remove the
> #if 0.

I do not know pr_debug(). I will have to learn.

> Again, does this still work for SPI? Have you tested it?

Yes and Yes. SPI is my primary target. This is Patch #11 I have submitted to 
the spi/mmc framework.

best regards
 
i. A. Wolfgang Mües
-- 
Auerswald GmbH & Co. KG
Hardware Development
Telefon: +49 (0)5306 9219 0
Telefax: +49 (0)5306 9219 94 
E-Mail: Wolfgang.Mues@Auerswald.de
Web: http://www.auerswald.de
 
--------------------------------------------------------------
Auerswald GmbH & Co. KG, Vor den Grashöfen 1, 38162 Cremlingen
Registriert beim AG Braunschweig HRA 13289
p.h.G Auerswald Geschäftsführungsges. mbH
Registriert beim AG Braunschweig HRB 7463
Geschäftsführer: Dipl-Ing. Gerhard Auerswald

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten
  2009-05-18 16:00   ` Wolfgang Mües
@ 2009-05-18 20:33     ` Matt Fleming
  2009-05-19  7:16       ` Wolfgang Mües
  0 siblings, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2009-05-18 20:33 UTC (permalink / raw)
  To: Wolfgang Mües
  Cc: Pierre Ossman, Andrew Morton, David Brownell, Mike Frysinger,
	linux-kernel

On Mon, May 18, 2009 at 06:00:40PM +0200, Wolfgang Mües wrote:
> Matt,
> 
> Am Montag, 18. Mai 2009 schrieb Matt Fleming:
> >
> > When you say "proper retry management", what problem are you having
> > with the current code?
> 
> a) Writes were not retried at all, so every transmission error created a data 
> loss.
> b) There was no clear distinction between errors which comes from transmission 
> faults and errors which are by nature non-recoverable.
> c) After one error, the remaining sectors were read one by one. This is 
> unneccessary time-consuming.
> 
> Cause a) was my initial problem with this code. This was not production 
> quality.  
> 

Ah, now I understand. I've also been contacted privately by someone
who says that they've been experiencing similar issues and that your
patch has fixed them. Clearly there are people out there with flaky
hardware ;-)

> > > @@ -262,10 +262,12 @@ static int mmc_blk_issue_rq(struct mmc_q
> > >  		if (brq.data.blocks > 1) {
> > >  			/* SPI multiblock writes terminate using a special
> > >  			 * token, not a STOP_TRANSMISSION request.
> > > +			 * Here, this request is set for ALL types of
> > > +			 * hosts, so that we can use it in the lower
> > > +			 * layers if the data transfer stage has failed
> > > +			 * and the card is not able to accept the token.
> > >  			 */
> > > -			if (!mmc_host_is_spi(card->host)
> > > -					|| rq_data_dir(req) == READ)
> > > -				brq.mrq.stop = &brq.stop;
> > > +			brq.mrq.stop = &brq.stop;
> > >  			readcmd = MMC_READ_MULTIPLE_BLOCK;
> > >  			writecmd = MMC_WRITE_MULTIPLE_BLOCK;
> > >  		} else {
> >
> > Hmm.. is it OK to abuse the STOP_TRANSMISSION request like this? Does
> > SPI multiblock write still terminate with this patch?
> 
> IMHO, this is the right thing to do. If you have a transmission error 
> while SPI multiblock write, the card will miss the stop token. So the
> card is in multiblock write mode and has to be stopped. You cannot
> retry the stop token. Every card I have tested accepted the STOP_TRANSMISSION 
> request. So it is better to try a STOP_TRANSMISSION, because this will 
> recover communication with the card. If a particular card will not respond to 
> the STOP_TRANSMISSION, the situation is not worse as before...
> 
> The STOP_TRANSMISSION request is only send to the card if the card
> has not responded to the stop token (see the change in mmc_spi.c).
> The default non-error behaviour has not changed.
> 

I'll defer to Pierre on that one.

I feel that retrying data accesses should be done at a higher layer
than the MMC/SD subsystem. But maybe I'm just being impractical. This
patch solves a real world problem that is biting some people, and so
based on that, the approach seems OK to me FWIW.


diff -uprN 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c
--- 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c	2009-05-14 12:49:42.000000000 +0200
+++ 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c	2009-05-14 15:01:15.000000000 +0200
@@ -743,12 +743,11 @@ mmc_spi_writeblock(struct mmc_spi_host *
 	if (status != 0) {
 		dev_dbg(&spi->dev, "write error %02x (%d)\n",
 			scratch->status[0], status);
-		return status;
-	}
-
+	} else {
 	t->tx_buf += t->len;
 	if (host->dma_dev)
 		t->tx_dma += t->len;
+	}

By the way, the indentation needs fixing here.

@@ -1087,7 +1089,15 @@ static void mmc_spi_request(struct mmc_h
 	status = mmc_spi_command_send(host, mrq, mrq->cmd, mrq->data != NULL);
 	if (status == 0 && mrq->data) {
 		mmc_spi_data_do(host, mrq->cmd, mrq->data, mrq->data->blksz);
-		if (mrq->stop)
+		/* filter-out the stop command for multiblock writes,
+		* only if the data stage has no transmission error.
+		* If the data stage has a transmission error, send the
+		* STOP command because there is a great chance that the
+		* SPI stop token was not accepted by the card.
+		*/
+		if (mrq->stop &&
+		  ((mrq->data->flags & MMC_DATA_READ)
+		 || mrq->data->error))
			status = mmc_spi_command_send(host, mrq, mrq->stop, 0);
 		else
 			mmc_cs_off(host);

And here..

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten
  2009-05-18 20:33     ` Matt Fleming
@ 2009-05-19  7:16       ` Wolfgang Mües
  2009-05-19  8:41         ` Matt Fleming
  0 siblings, 1 reply; 7+ messages in thread
From: Wolfgang Mües @ 2009-05-19  7:16 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Pierre Ossman, Andrew Morton, David Brownell, Mike Frysinger,
	linux-kernel

Matt,

Am Montag, 18. Mai 2009 schrieb Matt Fleming:
> Ah, now I understand. I've also been contacted privately by someone
> who says that they've been experiencing similar issues and that your
> patch has fixed them. Clearly there are people out there with flaky
> hardware ;-)

Hmmm... Every hardware is flaky. Put a piece of hardware near your 
refrigerator or DECT base, and you will have EMP or HF problems.
Or use your finger and generate ESD pulses.

I am using a piezo where I have the GND pin connected to a wire to generate 
ESD pulses which have the same results as a 4 KV ESD generator:

http://www.vega-direct.at/buffet/chafing_dish_mit_zubehoer/gasanzuender/index.hsp

A software which do not behave gracefully in the presence of errors is not 
production quality for me.

> I feel that retrying data accesses should be done at a higher layer
> than the MMC/SD subsystem.

There is some sense in that. But then you must give some more information to 
the higher layers:

- the information that it makes sense to retry.
- If a retry is commanded by the upper layer, mark it as retry, so that the 
lower layer is able to react different if needed.

Retry is not only bare retry, it is also about recovering from a previous 
failure, and try to continue the (maybe lost) communication. Retry may be a 
multi-stage approach (first try a simple retry, second try to recover 
communication, third to reset the hardware and start from the beginning.)

So I fear that a retry logic in the upper layers must have so much information 
about what is going on there and how to fix the problems, that it might be a 
real hard work to implement as a general solution.

Now, the upper layers (I mainly use vfat file system) are doing no retry at 
all, and this is not acceptable IMO.

> This 
> patch solves a real world problem that is biting some people, and so
> based on that, the approach seems OK to me FWIW.

Fine. Note that the old code has already done retries for read operations.

> By the way, the indentation needs fixing here.

OK.

If the discussion is over, I will post an updated patch. I will wait for 
Pierre to comment.

best regards

i. A. Wolfgang Mües
-- 
Auerswald GmbH & Co. KG
Hardware Development
Telefon: +49 (0)5306 9219 0
Telefax: +49 (0)5306 9219 94 
E-Mail: Wolfgang.Mues@Auerswald.de
Web: http://www.auerswald.de

--------------------------------------------------------------
Auerswald GmbH & Co. KG, Vor den Grashöfen 1, 38162 Cremlingen
Registriert beim AG Braunschweig HRA 13289
p.h.G Auerswald Geschäftsführungsges. mbH
Registriert beim AG Braunschweig HRB 7463
Geschäftsführer: Dipl-Ing. Gerhard Auerswald

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten
  2009-05-19  7:16       ` Wolfgang Mües
@ 2009-05-19  8:41         ` Matt Fleming
  2009-05-19  9:16           ` Wolfgang Mües
  0 siblings, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2009-05-19  8:41 UTC (permalink / raw)
  To: Wolfgang Mües
  Cc: Pierre Ossman, Andrew Morton, David Brownell, Mike Frysinger,
	linux-kernel, Nicolas Ferre

[CC'ing Nicolas as he's the at91 mci driver maintainer]

On Tue, May 19, 2009 at 09:16:40AM +0200, Wolfgang Mües wrote:
> Matt,
> 
> Am Montag, 18. Mai 2009 schrieb Matt Fleming:
> > Ah, now I understand. I've also been contacted privately by someone
> > who says that they've been experiencing similar issues and that your
> > patch has fixed them. Clearly there are people out there with flaky
> > hardware ;-)
> 
> Hmmm... Every hardware is flaky. Put a piece of hardware near your 
> refrigerator or DECT base, and you will have EMP or HF problems.
> Or use your finger and generate ESD pulses.
> 
> I am using a piezo where I have the GND pin connected to a wire to generate 
> ESD pulses which have the same results as a 4 KV ESD generator:
> 
> http://www.vega-direct.at/buffet/chafing_dish_mit_zubehoer/gasanzuender/index.hsp
> 
> A software which do not behave gracefully in the presence of errors is not 
> production quality for me.
> 

Oh, wow. I didn't pick up on the fact that you were artificially
generating errors on the bus. I thought you'd seen this sort of
problem in production. I don't understand what sort of fault-tolerance
you expect the MMC/SD subsystem to provide.

I've heard from one person that an earlier version of this patch fixed
issues with their at91 controller. If othe people are experiencing
similar issues and this patch fixes it for them, then it definitely
lends some weight to it.

Nicolas, do you have an opinion on the patch?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten
  2009-05-19  8:41         ` Matt Fleming
@ 2009-05-19  9:16           ` Wolfgang Mües
  0 siblings, 0 replies; 7+ messages in thread
From: Wolfgang Mües @ 2009-05-19  9:16 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Pierre Ossman, Andrew Morton, David Brownell, Mike Frysinger,
	linux-kernel, Nicolas Ferre

Matt,

Am Dienstag, 19. Mai 2009 schrieb Matt Fleming:
> Oh, wow. I didn't pick up on the fact that you were artificially
> generating errors on the bus.

I have started with observing random errors in a batch of 20 pre-production 
devices. There were about 1 error in 10 seconds of transfer due to an non-
optimal GND area design. Remember that for 25 MHz SPI clock, you only have
20ns from the start of the clock pulse into the SD card, and out of the
data line of the SD card and into the controller. This is fast - and induced 
EMP pulses of say 5 ns will be able to produce random transmission errors. 
And you can not apply any filtering to the data lines at 25 MHz - so you need 
some sort of fault tolerance in the software.

After that observation, I started to test these devices with an ESD pulse 
tester to test the software retry logic.

best regards

i. A. Wolfgang Mües
-- 
Auerswald GmbH & Co. KG
Hardware Development
Telefon: +49 (0)5306 9219 0
Telefax: +49 (0)5306 9219 94 
E-Mail: Wolfgang.Mues@Auerswald.de
Web: http://www.auerswald.de

--------------------------------------------------------------
Auerswald GmbH & Co. KG, Vor den Grashöfen 1, 38162 Cremlingen
Registriert beim AG Braunschweig HRA 13289
p.h.G Auerswald Geschäftsführungsges. mbH
Registriert beim AG Braunschweig HRB 7463
Geschäftsführer: Dipl-Ing. Gerhard Auerswald

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-05-19  9:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-18 13:21 [PATCH] mmc_spi: do propper retry managment in the block layer - rewritten Wolfgang Mües
2009-05-18 15:20 ` Matt Fleming
2009-05-18 16:00   ` Wolfgang Mües
2009-05-18 20:33     ` Matt Fleming
2009-05-19  7:16       ` Wolfgang Mües
2009-05-19  8:41         ` Matt Fleming
2009-05-19  9:16           ` Wolfgang Mües

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox