* [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read @ 2008-10-16 13:26 Adrian Hunter 2008-10-26 11:32 ` Pierre Ossman 0 siblings, 1 reply; 10+ messages in thread From: Adrian Hunter @ 2008-10-16 13:26 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML If a card encounters an ECC error while reading a sector it will timeout. Instead of reporting the entire I/O request as having an error, redo the I/O one sector at a time so that all readable sectors are provided to the upper layers. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> --- drivers/mmc/card/block.c | 32 ++++++++++++++++++++++++++------ 1 files changed, 26 insertions(+), 6 deletions(-) diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index d121462..0566aae 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -256,13 +256,14 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card; struct mmc_blk_request brq; - int ret = 1; + int ret = 1, disable_multi = 0; mmc_claim_host(card->host); do { struct mmc_command cmd; u32 readcmd, writecmd; + int multi, err; memset(&brq, 0, sizeof(struct mmc_blk_request)); brq.mrq.cmd = &brq.cmd; @@ -278,6 +279,9 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; brq.data.blocks = req->nr_sectors; + if (disable_multi && brq.data.blocks > 1) + brq.data.blocks = 1; + if (brq.data.blocks > 1) { /* SPI multiblock writes terminate using a special * token, not a STOP_TRANSMISSION request. @@ -287,10 +291,12 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.mrq.stop = &brq.stop; readcmd = MMC_READ_MULTIPLE_BLOCK; writecmd = MMC_WRITE_MULTIPLE_BLOCK; + multi = 1; } else { brq.mrq.stop = NULL; readcmd = MMC_READ_SINGLE_BLOCK; writecmd = MMC_WRITE_BLOCK; + multi = 0; } if (rq_data_dir(req) == READ) { @@ -312,6 +318,13 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) mmc_queue_bounce_post(mq); + if (multi && rq_data_dir(req) == READ && + brq.data.error == -ETIMEDOUT) { + /* Redo read one sector at a time */ + disable_multi = 1; + continue; + } + /* * Check for errors here, but don't jump to cmd_err * until later as we need to wait for the card to leave @@ -360,14 +373,21 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) #endif } - if (brq.cmd.error || brq.data.error || brq.stop.error) + if (brq.cmd.error || brq.stop.error) goto cmd_err; - /* - * A block was successfully transferred. - */ + if (brq.data.error) { + if (brq.data.error == -ETIMEDOUT && + rq_data_dir(req) == READ) { + err = -EIO; + brq.data.bytes_xfered = brq.data.blksz; + } else + goto cmd_err; + } else + err = 0; + spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); + ret = __blk_end_request(req, err, brq.data.bytes_xfered); spin_unlock_irq(&md->lock); } while (ret); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-10-16 13:26 [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read Adrian Hunter @ 2008-10-26 11:32 ` Pierre Ossman 2008-10-29 14:26 ` Adrian Hunter 0 siblings, 1 reply; 10+ messages in thread From: Pierre Ossman @ 2008-10-26 11:32 UTC (permalink / raw) To: Adrian Hunter; +Cc: LKML [-- Attachment #1: Type: text/plain, Size: 3141 bytes --] On Thu, 16 Oct 2008 16:26:57 +0300 Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: > If a card encounters an ECC error while reading a sector it will > timeout. Instead of reporting the entire I/O request as having > an error, redo the I/O one sector at a time so that all readable > sectors are provided to the upper layers. > > Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> > --- We actually had something like this on the table some time ago. It got scrapped because of data integrity problems. This is just for reads though, so I guess it should be safe. > @@ -278,6 +279,9 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) > brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; > brq.data.blocks = req->nr_sectors; > > + if (disable_multi && brq.data.blocks > 1) > + brq.data.blocks = 1; > + A comment here would be nice. You also need to adjust the sg list when you change the block count. There was code there that did that previously, but it got removed in 2.6.27-rc1. > @@ -312,6 +318,13 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) > > mmc_queue_bounce_post(mq); > > + if (multi && rq_data_dir(req) == READ && > + brq.data.error == -ETIMEDOUT) { > + /* Redo read one sector at a time */ > + disable_multi = 1; > + continue; > + } > + Some concerns here: 1. "brq.data.blocks > 1" doesn't need to be optimised into its own variable. It just obscures things. 2. A comment here as well. Explain what this does and why it is safe (so people don't try to extend it to writes) 3. You should check all errors, not just data.error and ETIMEDOUT. 4. You should first report the successfully transferred blocks as ok. > @@ -360,14 +373,21 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) > #endif > } > > - if (brq.cmd.error || brq.data.error || brq.stop.error) > + if (brq.cmd.error || brq.stop.error) > goto cmd_err; Move your code to inside this if clause and you'll solve 3. and 4. in a neat manner. You might also want to print something so that it is visible that the driver retried the transfer. > > - /* > - * A block was successfully transferred. > - */ > + if (brq.data.error) { > + if (brq.data.error == -ETIMEDOUT && > + rq_data_dir(req) == READ) { > + err = -EIO; > + brq.data.bytes_xfered = brq.data.blksz; > + } else > + goto cmd_err; > + } else > + err = 0; > + > spin_lock_irq(&md->lock); > - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); > + ret = __blk_end_request(req, err, brq.data.bytes_xfered); > spin_unlock_irq(&md->lock); > } while (ret); > Instead of this big song and dance routine, just have a dedicated piece of code for calling __blk_end_request() for the single sector failure. Rgds -- -- Pierre Ossman WARNING: This correspondence is being monitored by the Swedish government. Make sure your server uses encryption for SMTP traffic and consider using PGP for end-to-end encryption. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-10-26 11:32 ` Pierre Ossman @ 2008-10-29 14:26 ` Adrian Hunter 2008-11-10 8:21 ` Adrian Hunter 2008-11-30 19:05 ` Pierre Ossman 0 siblings, 2 replies; 10+ messages in thread From: Adrian Hunter @ 2008-10-29 14:26 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML Pierre Ossman wrote: > On Thu, 16 Oct 2008 16:26:57 +0300 > Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: > >> If a card encounters an ECC error while reading a sector it will >> timeout. Instead of reporting the entire I/O request as having >> an error, redo the I/O one sector at a time so that all readable >> sectors are provided to the upper layers. >> >> Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> >> --- > > We actually had something like this on the table some time ago. It got > scrapped because of data integrity problems. This is just for reads > though, so I guess it should be safe. > >> @@ -278,6 +279,9 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) >> brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; >> brq.data.blocks = req->nr_sectors; >> >> + if (disable_multi && brq.data.blocks > 1) >> + brq.data.blocks = 1; >> + > > A comment here would be nice. Ok > You also need to adjust the sg list when you change the block count. > There was code there that did that previously, but it got removed in > 2.6.27-rc1. That is not necessary. It is an optimisation. In general, optimising an error path serves no purpose. >> @@ -312,6 +318,13 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) >> >> mmc_queue_bounce_post(mq); >> >> + if (multi && rq_data_dir(req) == READ && >> + brq.data.error == -ETIMEDOUT) { >> + /* Redo read one sector at a time */ >> + disable_multi = 1; >> + continue; >> + } >> + > > Some concerns here: > > 1. "brq.data.blocks > 1" doesn't need to be optimised into its own > variable. It just obscures things. But you have to assume that no driver changes the 'blocks' variable e.g. counts it down. It is not an optimisation, it is just to improve reliability and readability. What does it obscure? > 2. A comment here as well. Explain what this does and why it is safe > (so people don't try to extend it to writes) ok > 3. You should check all errors, not just data.error and ETIMEDOUT. No. Data timeout is a special case. The other errors are system errors. If there is a command error or stop error (which is also a command error) it means either there is a bug in the kernel or the controller or card has failed to follow the specification. Under those circumstances Data timeout on the other hand just means the data could not be retrieved - in the case we have seen because of ECC error. > 4. You should first report the successfully transferred blocks as ok. That is another optimisation of the error path i.e. not necessary. It is simpler to just start processing the request again - which the patch does. >> @@ -360,14 +373,21 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) >> #endif >> } >> >> - if (brq.cmd.error || brq.data.error || brq.stop.error) >> + if (brq.cmd.error || brq.stop.error) >> goto cmd_err; > > Move your code to inside this if clause and you'll solve 3. and 4. in a > neat manner. Well, I do not agree with 3 and 4. > You might also want to print something so that it is > visible that the driver retried the transfer. There are already two error messages per sector (one from this function and one from '__blk_end_request()', so another message is too much. >> >> - /* >> - * A block was successfully transferred. >> - */ >> + if (brq.data.error) { >> + if (brq.data.error == -ETIMEDOUT && >> + rq_data_dir(req) == READ) { >> + err = -EIO; >> + brq.data.bytes_xfered = brq.data.blksz; >> + } else >> + goto cmd_err; >> + } else >> + err = 0; >> + >> spin_lock_irq(&md->lock); >> - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); >> + ret = __blk_end_request(req, err, brq.data.bytes_xfered); >> spin_unlock_irq(&md->lock); >> } while (ret); >> > > Instead of this big song and dance routine, just have a dedicated piece > of code for calling __blk_end_request() for the single sector failure. Ok Amended patch follows: >From 318326b563f7c792fac92e7c93b0e02b353d0a0d Mon Sep 17 00:00:00 2001 From: Adrian Hunter <ext-adrian.hunter@nokia.com> Date: Thu, 16 Oct 2008 13:13:08 +0300 Subject: [PATCH] mmc_block: ensure all sectors that do not have errors are read If a card encounters an ECC error while reading a sector it will timeout. Instead of reporting the entire I/O request as having an error, redo the I/O one sector at a time so that all readable sectors are provided to the upper layers. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> --- drivers/mmc/card/block.c | 35 ++++++++++++++++++++++++++++++----- 1 files changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 9998718..d3777cc 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -235,13 +235,14 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card; struct mmc_blk_request brq; - int ret = 1; + int ret = 1, disable_multi = 0; mmc_claim_host(card->host); do { struct mmc_command cmd; u32 readcmd, writecmd, status = 0; + int multi; memset(&brq, 0, sizeof(struct mmc_blk_request)); brq.mrq.cmd = &brq.cmd; @@ -257,6 +258,14 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; brq.data.blocks = req->nr_sectors; + /* + * After a read error, we redo the request one sector at a time + * in order to accurately determine which sectors can be read + * successfully. + */ + if (disable_multi && brq.data.blocks > 1) + brq.data.blocks = 1; + if (brq.data.blocks > 1) { /* SPI multiblock writes terminate using a special * token, not a STOP_TRANSMISSION request. @@ -266,10 +275,12 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.mrq.stop = &brq.stop; readcmd = MMC_READ_MULTIPLE_BLOCK; writecmd = MMC_WRITE_MULTIPLE_BLOCK; + multi = 1; } else { brq.mrq.stop = NULL; readcmd = MMC_READ_SINGLE_BLOCK; writecmd = MMC_WRITE_BLOCK; + multi = 0; } if (rq_data_dir(req) == READ) { @@ -291,6 +302,13 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) mmc_queue_bounce_post(mq); + if (multi && rq_data_dir(req) == READ && + brq.data.error == -ETIMEDOUT) { + /* Redo read one sector at a time */ + disable_multi = 1; + continue; + } + /* * Check for errors here, but don't jump to cmd_err * until later as we need to wait for the card to leave @@ -362,12 +380,19 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) #endif } - if (brq.cmd.error || brq.data.error || brq.stop.error) + if (brq.cmd.error || brq.stop.error) + goto cmd_err; + + if (brq.data.error == -ETIMEDOUT && rq_data_dir(req) == READ) { + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, -EIO, brq.data.blksz); + spin_unlock_irq(&md->lock); + continue; + } + + if (brq.cmd.error) goto cmd_err; - /* - * A block was successfully transferred. - */ spin_lock_irq(&md->lock); ret = __blk_end_request(req, 0, brq.data.bytes_xfered); spin_unlock_irq(&md->lock); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-10-29 14:26 ` Adrian Hunter @ 2008-11-10 8:21 ` Adrian Hunter 2008-11-30 19:05 ` Pierre Ossman 1 sibling, 0 replies; 10+ messages in thread From: Adrian Hunter @ 2008-11-10 8:21 UTC (permalink / raw) To: Pierre Ossman Cc: LKML, Lavinen Jarkko (Nokia-M/Helsinki), Bityutskiy Artem (Nokia-M/Helsinki) Adrian Hunter wrote: > Pierre Ossman wrote: >> On Thu, 16 Oct 2008 16:26:57 +0300 >> Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: >> >>> If a card encounters an ECC error while reading a sector it will >>> timeout. Instead of reporting the entire I/O request as having >>> an error, redo the I/O one sector at a time so that all readable >>> sectors are provided to the upper layers. >>> >>> Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> >>> --- >> >> We actually had something like this on the table some time ago. It got >> scrapped because of data integrity problems. This is just for reads >> though, so I guess it should be safe. >> >>> @@ -278,6 +279,9 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, >>> struct request *req) >>> brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; >>> brq.data.blocks = req->nr_sectors; >>> >>> + if (disable_multi && brq.data.blocks > 1) >>> + brq.data.blocks = 1; >>> + >> >> A comment here would be nice. > > Ok > >> You also need to adjust the sg list when you change the block count. >> There was code there that did that previously, but it got removed in >> 2.6.27-rc1. > > That is not necessary. It is an optimisation. In general, optimising an > error path serves no purpose. > >>> @@ -312,6 +318,13 @@ static int mmc_blk_issue_rq(struct mmc_queue >>> *mq, struct request *req) >>> >>> mmc_queue_bounce_post(mq); >>> >>> + if (multi && rq_data_dir(req) == READ && >>> + brq.data.error == -ETIMEDOUT) { >>> + /* Redo read one sector at a time */ >>> + disable_multi = 1; >>> + continue; >>> + } >>> + >> >> Some concerns here: >> >> 1. "brq.data.blocks > 1" doesn't need to be optimised into its own >> variable. It just obscures things. > > But you have to assume that no driver changes the 'blocks' variable e.g. > counts it down. It is not an optimisation, it is just to improve > reliability and readability. What does it obscure? > >> 2. A comment here as well. Explain what this does and why it is safe >> (so people don't try to extend it to writes) > > ok > >> 3. You should check all errors, not just data.error and ETIMEDOUT. > > No. Data timeout is a special case. The other errors are system errors. > If there is a command error or stop error (which is also a command error) > it means either there is a bug in the kernel or the controller or card > has failed to follow the specification. Under those circumstances > > Data timeout on the other hand just means the data could not be retrieved > - in the case we have seen because of ECC error. > >> 4. You should first report the successfully transferred blocks as ok. > > That is another optimisation of the error path i.e. not necessary. It > is simpler to just start processing the request again - which the patch > does. > >>> @@ -360,14 +373,21 @@ static int mmc_blk_issue_rq(struct mmc_queue >>> *mq, struct request *req) >>> #endif >>> } >>> >>> - if (brq.cmd.error || brq.data.error || brq.stop.error) >>> + if (brq.cmd.error || brq.stop.error) >>> goto cmd_err; >> >> Move your code to inside this if clause and you'll solve 3. and 4. in a >> neat manner. > > Well, I do not agree with 3 and 4. > >> You might also want to print something so that it is >> visible that the driver retried the transfer. > > There are already two error messages per sector (one from this function > and one from '__blk_end_request()', so another message is too much. > >>> >>> - /* >>> - * A block was successfully transferred. >>> - */ >>> + if (brq.data.error) { >>> + if (brq.data.error == -ETIMEDOUT && >>> + rq_data_dir(req) == READ) { >>> + err = -EIO; >>> + brq.data.bytes_xfered = brq.data.blksz; >>> + } else >>> + goto cmd_err; >>> + } else >>> + err = 0; >>> + >>> spin_lock_irq(&md->lock); >>> - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); >>> + ret = __blk_end_request(req, err, brq.data.bytes_xfered); >>> spin_unlock_irq(&md->lock); >>> } while (ret); >>> >> >> Instead of this big song and dance routine, just have a dedicated piece >> of code for calling __blk_end_request() for the single sector failure. > > Ok > > Amended patch follows: What is the status of this patch? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-10-29 14:26 ` Adrian Hunter 2008-11-10 8:21 ` Adrian Hunter @ 2008-11-30 19:05 ` Pierre Ossman 2008-12-05 11:09 ` Adrian Hunter 1 sibling, 1 reply; 10+ messages in thread From: Pierre Ossman @ 2008-11-30 19:05 UTC (permalink / raw) To: Adrian Hunter; +Cc: LKML [-- Attachment #1: Type: text/plain, Size: 4028 bytes --] On Wed, 29 Oct 2008 16:26:07 +0200 Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: > Pierre Ossman wrote: > > You also need to adjust the sg list when you change the block count. > > There was code there that did that previously, but it got removed in > > 2.6.27-rc1. > > That is not necessary. It is an optimisation. In general, optimising an > error path serves no purpose. > Actually, it is a requirement that some drivers rely on. There is also a WARN_ON(), or possibly a BUG_ON() in the request handling path. I think it only triggers with MMC_DEBUG though... > > > > Some concerns here: > > > > 1. "brq.data.blocks > 1" doesn't need to be optimised into its own > > variable. It just obscures things. > > But you have to assume that no driver changes the 'blocks' variable e.g. > counts it down. That would be a no-no. I guess this is not properly documented, but it is strongly implied that the request structure should be read-only except for fields that contain result data. ;) > It is not an optimisation, it is just to improve > reliability and readability. What does it obscure? if-clauses look like they're different, but they're not. > > 3. You should check all errors, not just data.error and ETIMEDOUT. > > No. Data timeout is a special case. The other errors are system errors. > If there is a command error or stop error (which is also a command error) > it means either there is a bug in the kernel or the controller or card > has failed to follow the specification. Under those circumstances Controllers do not follow the specs. Welcome to reality. :) (sad fact: I don't think there is a single controller out there that follows the specs to 100%). Anyway, for this specific case, a controller might not be able to differentiate between command and data timeouts. Or it might report "data error" for anything unexpected (I seem to recall one of the currently supported controllers behaves like this). Generally it's about being as flexible as we can be. This is a world of crappy hardware, so strict spec adherence just leads to a lot of hurt. And I have the scars to prove it. :) > > 4. You should first report the successfully transferred blocks as ok. > > That is another optimisation of the error path i.e. not necessary. It > is simpler to just start processing the request again - which the patch > does. > I suppose. > > You might also want to print something so that it is > > visible that the driver retried the transfer. > > There are already two error messages per sector (one from this function > and one from '__blk_end_request()', so another message is too much. > I didn't mean an error for the bad sector, but something like "encountered an error during multi-block transfer. retrying...". This could save a lot of time and headache during debugging in the future. > @@ -362,12 +380,19 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) > #endif > } > > - if (brq.cmd.error || brq.data.error || brq.stop.error) > + if (brq.cmd.error || brq.stop.error) > + goto cmd_err; > + > + if (brq.data.error == -ETIMEDOUT && rq_data_dir(req) == READ) { > + spin_lock_irq(&md->lock); > + ret = __blk_end_request(req, -EIO, brq.data.blksz); > + spin_unlock_irq(&md->lock); > + continue; > + } > + This could use a comment explaining that if we've reached this point, we know that it was a single sector that failed. > + if (brq.cmd.error) > goto cmd_err; > > - /* > - * A block was successfully transferred. > - */ > spin_lock_irq(&md->lock); > ret = __blk_end_request(req, 0, brq.data.bytes_xfered); > spin_unlock_irq(&md->lock); Shouldn't this comment stay now? :) Rgds -- -- Pierre Ossman WARNING: This correspondence is being monitored by the Swedish government. Make sure your server uses encryption for SMTP traffic and consider using PGP for end-to-end encryption. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-11-30 19:05 ` Pierre Ossman @ 2008-12-05 11:09 ` Adrian Hunter 2008-12-21 14:29 ` Pierre Ossman 0 siblings, 1 reply; 10+ messages in thread From: Adrian Hunter @ 2008-12-05 11:09 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML Pierre Ossman wrote: > On Wed, 29 Oct 2008 16:26:07 +0200 > Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: > >> Pierre Ossman wrote: >>> You also need to adjust the sg list when you change the block count. >>> There was code there that did that previously, but it got removed in >>> 2.6.27-rc1. >> That is not necessary. It is an optimisation. In general, optimising an >> error path serves no purpose. >> > > Actually, it is a requirement that some drivers rely on. There is also > a WARN_ON(), or possibly a BUG_ON() in the request handling path. I > think it only triggers with MMC_DEBUG though... OK >>> Some concerns here: >>> >>> 1. "brq.data.blocks > 1" doesn't need to be optimised into its own >>> variable. It just obscures things. >> But you have to assume that no driver changes the 'blocks' variable e.g. >> counts it down. > > That would be a no-no. I guess this is not properly documented, but it > is strongly implied that the request structure should be read-only > except for fields that contain result data. ;) > >> It is not an optimisation, it is just to improve >> reliability and readability. What does it obscure? > > if-clauses look like they're different, but they're not. Well, the if clauses are different anyway, but no matter. >>> 3. You should check all errors, not just data.error and ETIMEDOUT. >> No. Data timeout is a special case. The other errors are system errors. >> If there is a command error or stop error (which is also a command error) >> it means either there is a bug in the kernel or the controller or card >> has failed to follow the specification. Under those circumstances > > Controllers do not follow the specs. Welcome to reality. :) > > (sad fact: I don't think there is a single controller out there that > follows the specs to 100%). > > Anyway, for this specific case, a controller might not be able to > differentiate between command and data timeouts. Or it might report > "data error" for anything unexpected (I seem to recall one of the > currently supported controllers behaves like this). > > Generally it's about being as flexible as we can be. This is a world of > crappy hardware, so strict spec adherence just leads to a lot of hurt. > And I have the scars to prove it. :) I agree with your philosophy, although I was merely letting it go back to the way it was - which I thought was safer then introducing new behaviour into an undefined situation. >>> 4. You should first report the successfully transferred blocks as ok. >> That is another optimisation of the error path i.e. not necessary. It >> is simpler to just start processing the request again - which the patch >> does. >> > > I suppose. > >>> You might also want to print something so that it is >>> visible that the driver retried the transfer. >> There are already two error messages per sector (one from this function >> and one from '__blk_end_request()', so another message is too much. >> > > I didn't mean an error for the bad sector, but something like > "encountered an error during multi-block transfer. retrying...". This > could save a lot of time and headache during debugging in the future. > >> @@ -362,12 +380,19 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) >> #endif >> } >> >> - if (brq.cmd.error || brq.data.error || brq.stop.error) >> + if (brq.cmd.error || brq.stop.error) >> + goto cmd_err; >> + >> + if (brq.data.error == -ETIMEDOUT && rq_data_dir(req) == READ) { >> + spin_lock_irq(&md->lock); >> + ret = __blk_end_request(req, -EIO, brq.data.blksz); >> + spin_unlock_irq(&md->lock); >> + continue; >> + } >> + > > This could use a comment explaining that if we've reached this point, > we know that it was a single sector that failed. > >> + if (brq.cmd.error) >> goto cmd_err; >> >> - /* >> - * A block was successfully transferred. >> - */ >> spin_lock_irq(&md->lock); >> ret = __blk_end_request(req, 0, brq.data.bytes_xfered); >> spin_unlock_irq(&md->lock); > > Shouldn't this comment stay now? :) OK >From ceb711896b3c0cbc1c01b8f0c3733cedf3d88843 Mon Sep 17 00:00:00 2001 From: Adrian Hunter <ext-adrian.hunter@nokia.com> Date: Thu, 16 Oct 2008 13:13:08 +0300 Subject: [PATCH] mmc_block: ensure all sectors that do not have errors are read If a card encounters an ECC error while reading a sector it will timeout. Instead of reporting the entire I/O request as having an error, redo the I/O one sector at a time so that all readable sectors are provided to the upper layers. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> --- drivers/mmc/card/block.c | 68 ++++++++++++++++++++++++++++++++++----------- 1 files changed, 51 insertions(+), 17 deletions(-) diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 5b46ec9..040b57e 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -231,7 +231,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card; struct mmc_blk_request brq; - int ret = 1; + int ret = 1, disable_multi = 0; mmc_claim_host(card->host); @@ -253,6 +253,14 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; brq.data.blocks = req->nr_sectors; + /* + * After a read error, we redo the request one sector at a time + * in order to accurately determine which sectors can be read + * successfully. + */ + if (disable_multi && brq.data.blocks > 1) + brq.data.blocks = 1; + if (brq.data.blocks > 1) { /* SPI multiblock writes terminate using a special * token, not a STOP_TRANSMISSION request. @@ -281,6 +289,16 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.data.sg = mq->sg; brq.data.sg_len = mmc_queue_map_sg(mq); + /* + * Some drivers expect the sg list to be the same size as the + * request, which it won't be if we have fallen back to do + * one sector at a time. + */ + if (disable_multi) { + brq.data.sg->length = 512; + brq.data.sg_len = 1; + } + mmc_queue_bounce_pre(mq); mmc_wait_for_req(card->host, &brq.mrq); @@ -292,8 +310,17 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) * until later as we need to wait for the card to leave * programming mode even when things go wrong. */ - if (brq.cmd.error || brq.data.error || brq.stop.error) + if (brq.cmd.error || brq.data.error || brq.stop.error) { + if (brq.data.blocks > 1 && rq_data_dir(req) == READ) { + /* Redo read one sector at a time */ + printk(KERN_WARNING "%s: error, retrying using " + "single block read\n", + req->rq_disk->disk_name); + disable_multi = 1; + continue; + } status = get_card_status(card, req); + } if (brq.cmd.error) { printk(KERN_ERR "%s: error %d sending read/write " @@ -350,8 +377,20 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) #endif } - if (brq.cmd.error || brq.data.error || brq.stop.error) + if (brq.cmd.error || brq.stop.error || brq.data.error) { + if (rq_data_dir(req) == READ) { + /* + * After an error, we redo I/O one sector at a + * time, so we only reach here after trying to + * read a single sector. + */ + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, -EIO, brq.data.blksz); + spin_unlock_irq(&md->lock); + continue; + } goto cmd_err; + } /* * A block was successfully transferred. @@ -373,25 +412,20 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) * If the card is not SD, we can still ok written sectors * as reported by the controller (which might be less than * the real number of written sectors, but never more). - * - * For reads we just fail the entire chunk as that should - * be safe in all cases. */ - if (rq_data_dir(req) != READ) { - if (mmc_card_sd(card)) { - u32 blocks; + if (mmc_card_sd(card)) { + u32 blocks; - blocks = mmc_sd_num_wr_blocks(card); - if (blocks != (u32)-1) { - spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, blocks << 9); - spin_unlock_irq(&md->lock); - } - } else { + blocks = mmc_sd_num_wr_blocks(card); + if (blocks != (u32)-1) { spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); + ret = __blk_end_request(req, 0, blocks << 9); spin_unlock_irq(&md->lock); } + } else { + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, 0, brq.data.bytes_xfered); + spin_unlock_irq(&md->lock); } mmc_release_host(card->host); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-12-05 11:09 ` Adrian Hunter @ 2008-12-21 14:29 ` Pierre Ossman 2008-12-22 11:29 ` Adrian Hunter 0 siblings, 1 reply; 10+ messages in thread From: Pierre Ossman @ 2008-12-21 14:29 UTC (permalink / raw) To: Adrian Hunter; +Cc: LKML [-- Attachment #1: Type: text/plain, Size: 1066 bytes --] On Fri, 05 Dec 2008 13:09:11 +0200 Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: > @@ -281,6 +289,16 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) > brq.data.sg = mq->sg; > brq.data.sg_len = mmc_queue_map_sg(mq); > > + /* > + * Some drivers expect the sg list to be the same size as the > + * request, which it won't be if we have fallen back to do > + * one sector at a time. > + */ > + if (disable_multi) { > + brq.data.sg->length = 512; > + brq.data.sg_len = 1; > + } > + > mmc_queue_bounce_pre(mq); > > mmc_wait_for_req(card->host, &brq.mrq); Unfortunately, there is no guarantee that the sg list will be sector aligned. Look at the code removed in f3eb0aaa02 for how to handle this properly. Other than that, the patch looks ready to go. Rgds -- -- Pierre Ossman WARNING: This correspondence is being monitored by the Swedish government. Make sure your server uses encryption for SMTP traffic and consider using PGP for end-to-end encryption. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-12-21 14:29 ` Pierre Ossman @ 2008-12-22 11:29 ` Adrian Hunter 2008-12-22 13:12 ` Bernd Eckenfels 2008-12-31 17:21 ` Pierre Ossman 0 siblings, 2 replies; 10+ messages in thread From: Adrian Hunter @ 2008-12-22 11:29 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML Pierre Ossman wrote: > On Fri, 05 Dec 2008 13:09:11 +0200 > Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: > >> @@ -281,6 +289,16 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) >> brq.data.sg = mq->sg; >> brq.data.sg_len = mmc_queue_map_sg(mq); >> >> + /* >> + * Some drivers expect the sg list to be the same size as the >> + * request, which it won't be if we have fallen back to do >> + * one sector at a time. >> + */ >> + if (disable_multi) { >> + brq.data.sg->length = 512; >> + brq.data.sg_len = 1; >> + } >> + >> mmc_queue_bounce_pre(mq); >> >> mmc_wait_for_req(card->host, &brq.mrq); > > Unfortunately, there is no guarantee that the sg list will be sector > aligned. OK. I am curious though - do you know anywhere in the kernel that actually does submit I/O in buffers that are not aligned to 512? > Look at the code removed in f3eb0aaa02 for how to handle this > properly. > > Other than that, the patch looks ready to go. > > Rgds From: Adrian Hunter <ext-adrian.hunter@nokia.com> Subject: [PATCH] mmc_block: ensure all sectors that do not have errors are read If a card encounters an ECC error while reading a sector it will timeout. Instead of reporting the entire I/O request as having an error, redo the I/O one sector at a time so that all readable sectors are provided to the upper layers. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> --- drivers/mmc/card/block.c | 76 +++++++++++++++++++++++++++++++++++---------- 1 files changed, 59 insertions(+), 17 deletions(-) diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 2998112..679b489 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -231,7 +231,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card; struct mmc_blk_request brq; - int ret = 1; + int ret = 1, disable_multi = 0; mmc_claim_host(card->host); @@ -253,6 +253,14 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; brq.data.blocks = req->nr_sectors; + /* + * After a read error, we redo the request one sector at a time + * in order to accurately determine which sectors can be read + * successfully. + */ + if (disable_multi && brq.data.blocks > 1) + brq.data.blocks = 1; + if (brq.data.blocks > 1) { /* SPI multiblock writes terminate using a special * token, not a STOP_TRANSMISSION request. @@ -281,6 +289,25 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) brq.data.sg = mq->sg; brq.data.sg_len = mmc_queue_map_sg(mq); + /* + * Adjust the sg list so it is the same size as the + * request. + */ + if (brq.data.blocks != req->nr_sectors) { + int i, data_size = brq.data.blocks << 9; + struct scatterlist *sg; + + for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) { + data_size -= sg->length; + if (data_size <= 0) { + sg->length += data_size; + i++; + break; + } + } + brq.data.sg_len = i; + } + mmc_queue_bounce_pre(mq); mmc_wait_for_req(card->host, &brq.mrq); @@ -292,8 +319,16 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) * until later as we need to wait for the card to leave * programming mode even when things go wrong. */ - if (brq.cmd.error || brq.data.error || brq.stop.error) + if (brq.cmd.error || brq.data.error || brq.stop.error) { + if (brq.data.blocks > 1 && rq_data_dir(req) == READ) { + /* Redo read one sector at a time */ + printk(KERN_WARNING "%s: retrying using single " + "block read\n", req->rq_disk->disk_name); + disable_multi = 1; + continue; + } status = get_card_status(card, req); + } if (brq.cmd.error) { printk(KERN_ERR "%s: error %d sending read/write " @@ -350,8 +385,20 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) #endif } - if (brq.cmd.error || brq.data.error || brq.stop.error) + if (brq.cmd.error || brq.stop.error || brq.data.error) { + if (rq_data_dir(req) == READ) { + /* + * After an error, we redo I/O one sector at a + * time, so we only reach here after trying to + * read a single sector. + */ + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, -EIO, brq.data.blksz); + spin_unlock_irq(&md->lock); + continue; + } goto cmd_err; + } /* * A block was successfully transferred. @@ -373,25 +420,20 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) * If the card is not SD, we can still ok written sectors * as reported by the controller (which might be less than * the real number of written sectors, but never more). - * - * For reads we just fail the entire chunk as that should - * be safe in all cases. */ - if (rq_data_dir(req) != READ) { - if (mmc_card_sd(card)) { - u32 blocks; + if (mmc_card_sd(card)) { + u32 blocks; - blocks = mmc_sd_num_wr_blocks(card); - if (blocks != (u32)-1) { - spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, blocks << 9); - spin_unlock_irq(&md->lock); - } - } else { + blocks = mmc_sd_num_wr_blocks(card); + if (blocks != (u32)-1) { spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); + ret = __blk_end_request(req, 0, blocks << 9); spin_unlock_irq(&md->lock); } + } else { + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, 0, brq.data.bytes_xfered); + spin_unlock_irq(&md->lock); } mmc_release_host(card->host); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-12-22 11:29 ` Adrian Hunter @ 2008-12-22 13:12 ` Bernd Eckenfels 2008-12-31 17:21 ` Pierre Ossman 1 sibling, 0 replies; 10+ messages in thread From: Bernd Eckenfels @ 2008-12-22 13:12 UTC (permalink / raw) To: linux-kernel In article <494F7A02.2030302@nokia.com> you wrote: > + /* > + * After a read error, we redo the request one sector at a time > + * in order to accurately determine which sectors can be read > + * successfully. > + */ > + if (disable_multi && brq.data.blocks > 1) > + brq.data.blocks = 1; > + > if (brq.data.blocks > 1) { > /* SPI multiblock writes terminate using a special > * token, not a STOP_TRANSMISSION request. We would save an comparision here if we use it like this: ... if (brq.data.blocks > 1) { /* * After a read error, we redo the request one sector at a time * in order to accurately determine which sectors can be read * successfully. */ if (disable_multi) brq.data.blocks = 1; /* SPI multiblock writes terminate using a special ... > + if (brq.cmd.error || brq.data.error || brq.stop.error) { > + if (brq.data.blocks > 1 && rq_data_dir(req) == READ) { > + /* Redo read one sector at a time */ > + printk(KERN_WARNING "%s: retrying using single " > + "block read\n", req->rq_disk->disk_name); > + disable_multi = 1; > + continue; > + } Will this flood the logs? Gruss Bernd ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read 2008-12-22 11:29 ` Adrian Hunter 2008-12-22 13:12 ` Bernd Eckenfels @ 2008-12-31 17:21 ` Pierre Ossman 1 sibling, 0 replies; 10+ messages in thread From: Pierre Ossman @ 2008-12-31 17:21 UTC (permalink / raw) To: Adrian Hunter; +Cc: LKML [-- Attachment #1: Type: text/plain, Size: 1179 bytes --] On Mon, 22 Dec 2008 13:29:06 +0200 Adrian Hunter <ext-adrian.hunter@nokia.com> wrote: > Pierre Ossman wrote: > > > > Unfortunately, there is no guarantee that the sg list will be sector > > aligned. > > OK. I am curious though - do you know anywhere in the kernel that actually > does submit I/O in buffers that are not aligned to 512? > Not really, no. Not that I've gone digging. > > From: Adrian Hunter <ext-adrian.hunter@nokia.com> > Subject: [PATCH] mmc_block: ensure all sectors that do not have errors are read > > If a card encounters an ECC error while reading a sector it will > timeout. Instead of reporting the entire I/O request as having > an error, redo the I/O one sector at a time so that all readable > sectors are provided to the upper layers. > > Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> > --- Queued up. Thanks for sticking through this somewhat long review process. :) Rgds -- -- Pierre Ossman WARNING: This correspondence is being monitored by the Swedish government. Make sure your server uses encryption for SMTP traffic and consider using PGP for end-to-end encryption. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-12-31 17:21 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-10-16 13:26 [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read Adrian Hunter 2008-10-26 11:32 ` Pierre Ossman 2008-10-29 14:26 ` Adrian Hunter 2008-11-10 8:21 ` Adrian Hunter 2008-11-30 19:05 ` Pierre Ossman 2008-12-05 11:09 ` Adrian Hunter 2008-12-21 14:29 ` Pierre Ossman 2008-12-22 11:29 ` Adrian Hunter 2008-12-22 13:12 ` Bernd Eckenfels 2008-12-31 17:21 ` Pierre Ossman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox