All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trey Ramsay <tramsay@linux.vnet.ibm.com>
To: Chris Ball <cjb@laptop.org>
Cc: linux-kernel@vger.kernel.org, linux-mmc@vger.kernel.org,
	Rich Rattanni <rattanni@gmail.com>,
	Radovan Lekanovic <lekanovic@gmail.com>
Subject: Re: [PATCH 1/1] mmc: Bad device can cause mmc driver to hang
Date: Fri, 16 Nov 2012 18:40:22 -0600	[thread overview]
Message-ID: <50A6DCF6.2030707@linux.vnet.ibm.com> (raw)
In-Reply-To: <876255bluf.fsf@octavius.laptop.org>

On 11/16/2012 09:37 AM, Chris Ball wrote:
> Hi Trey,
> 
> On Fri, Nov 16 2012, Trey Ramsay wrote:
>> There are infinite loops in the mmc code that can be caused by bad hardware.
>> The code will loop forever if the device never comes back from program mode,
>> R1_STATE_PRG, and it is not ready for data, R1_READY_FOR_DATA.
>>
>> A long timeout will be added to prevent the code from looping forever.
>> The timeout will occur if the device never comes back from program
>> state or the device never becomes ready for data.
>>
>> Signed-off-by: Trey Ramsay <tramsay@linux.vnet.ibm.com>
> 
> Thanks, looks good!
> 
> Have you thought about what's going to happen after this path is hit?
> Are we just going to start a new request that begins a new ten-minute
> hang, or do we notice the bad card state somewhere and refuse to start
> new I/O?
> 
> - Chris.
> 

You're welcome, and thanks for the help!

Good question.  In regards to the original problem were it was hung in
mmc_blk_err_check, the new code path will timeout after 10 minutes, log
an error, issue a hardware reset and abort the request. Is the hardware
reset enough or will that even work when the device isn't coming out of
program state? Should we try to refuse all new I/O?


Below are just some notes I took about the code path for
mmc_blk_err_check and mmc_do_erase.
=============================
mmc_blk_err_check
                        int err = get_card_status(card, &status, 5);
                        if (err) {
                                pr_err("%s: error %d requesting status\n",
                                       req->rq_disk->disk_name, err);
                                return MMC_BLK_CMD_ERR;
                        }
+
+                       /* Timeout if the device never becomes ready for
data
+                        * and never leaves the program state.
+                        */
+                       if (time_after(jiffies, timeout)) {
+                               pr_err("%s: Card stuck in programming
state!"\
+                                       " %s %s\n",
mmc_hostname(card->host),
+                                       req->rq_disk->disk_name, __func__);
+
+                               return MMC_BLK_CMD_ERR;
+                       }

Stack
-----------------
mmc_blk_err_check
mmc_start_req
mmc_blk_issue_rw_rq
mmc_blk_issue_rq
mmc_queue_thread

We return MMC_BLK_CMD_ERR the same way as we return MMC_BLK_CMD_ERR if
the get_card_status failed. The code which returns to mmc_start_req
which sets the error status and returns to mmc_blk_issue_rw_rq with a
status of MMC_BLK_CMD_ERR where it calls mmc_blk_reset which calls
mmc_hw_reset. The code then return so mmc_queue_thread which does not
check the return code.

=============================
mmc_do_erase
+
+               /* Timeout if the device never becomes ready for data and
+                * never leaves the program state.
+                */
+               if (time_after(jiffies, timeout)) {
+                       pr_err("%s: Card stuck in programming state! %s\n",
+                               mmc_hostname(card->host), __func__);
+                       err =  -EIO;
+                       goto out;
+               }
+

        err = mmc_erase(card, from, nr, arg);
out:
        if (err == -EIO && !mmc_blk_reset(md, card->host, type))
                goto retry;


The mmc_do_erase function is called by mmc_erase which is called from 3
locations in the block.c code.

    1) mmc_blk_issue_discard_rq--->mmc_erase--->mmc_do_erase    line 848
    2) mmc_blk_issue_secdiscard_rq--->mmc_erase--->mmc_do_erase line 884
    3) mmc_blk_issue_secdiscard_rq--->mmc_erase--->mmc_do_erase line 904

    This applies to 1, 2 and 3 above.
    Since we return -EIO, mmc_blk_issue_discard_rq or
    mmc_blk_issue_secdiscard will do a mmc_blk_reset which will call
    mmc_hw_reset. If hardware reset is successfull, it will retry the
    command.  If the hardware reset is unsuccessfull, it will call
    blk_end_request and will return 0 if there is
    an err and will return to the mmc_queue_thread function which does
    not check the return code.





  parent reply	other threads:[~2012-11-17  0:40 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-31 19:36 drivers/mmc/card/block.c infinite loop in mmc_blk_err_check waiting on R1_READY_FOR_DATA Trey Ramsay
2012-10-31 20:47 ` Chris Ball
     [not found]   ` <50A2A08D.5000601@linux.vnet.ibm.com>
2012-11-13 20:48     ` Chris Ball
2012-11-16 15:31       ` [PATCH 1/1] mmc: Bad device can cause mmc driver to hang Trey Ramsay
2012-11-16 15:31         ` Trey Ramsay
2012-11-16 15:37         ` Chris Ball
2012-11-16 15:37           ` Chris Ball
2012-11-16 23:52           ` Trey Ramsay
2012-11-17  0:37             ` Chris Ball
2012-11-17  5:16               ` Trey Ramsay
2012-11-17  0:40           ` Trey Ramsay [this message]
2012-11-17 14:34         ` Chris Ball
2012-11-17 14:34           ` Chris Ball

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50A6DCF6.2030707@linux.vnet.ibm.com \
    --to=tramsay@linux.vnet.ibm.com \
    --cc=cjb@laptop.org \
    --cc=lekanovic@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mmc@vger.kernel.org \
    --cc=rattanni@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.