From: Jaehoon Chung <jh80.chung@samsung.com>
To: Grant Grundler <grundler@chromium.org>
Cc: Chris Ball <cjb@laptop.org>,
Jaehoon Chung <jh80.chung@samsung.com>,
Doug Anderson <dianders@chromium.org>,
"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>
Subject: Re: MMC locking: mmc_request accesses from dw_mmc driver ok?
Date: Fri, 23 Aug 2013 23:06:22 +0900 [thread overview]
Message-ID: <52176C5E.2060101@samsung.com> (raw)
In-Reply-To: <CANEJEGtiwVo1K3Ht7iwMecfjA76M2JPFz+U_Hs8iRzNud=Y1-A@mail.gmail.com>
Hi Grant,
I will check this.
Best Regards,
Jaehoon Chung
On 08/22/2013 05:18 AM, Grant Grundler wrote:
> On Mon, Aug 12, 2013 at 4:45 PM, Grant Grundler <grundler@chromium.org> wrote:
>> I've been working on an "task mmcqd/0:84 blocked for more than 120
>> seconds" panic for the past month or so in the chromeos-3.4 kernel
>> tree. Stack trace below. Feel free to tell me "fixed in v3.x". :)
>
> I've added this change:
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 1cf4900..a127ce1 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -135,6 +135,9 @@ void mmc_request_done(struct mmc_host *host,
> struct mmc_request *mrq)
> struct mmc_command *cmd = mrq->cmd;
> int err = cmd->error;
>
> +WARN_ON(!host->claimed);
> +WARN_ON(host->claimer != current);
> +
> if (err && cmd->retries && mmc_host_is_spi(host)) {
> if (cmd->resp[0] & R1_SPI_ILLEGAL_COMMAND)
> cmd->retries = 0;
>
> and the "WARN_ON(host->claimer != current)" is triggering with this
> stack trace (as I suspected):
>
> WARNING: at /mnt/host/source/src/third_party/kernel/files/drivers/mmc/core/core.c:139
> mmc_request_done+0x6c/0xf0()
> Modules linked in: i2c_dev uinput asix usbnet nf_conntrack_ipv6
> nf_defrag_ipv6 uvcvideo videobuf2_vmalloc sbs_battery mwifiex_sdio
> mwifiex cfg80211 btmrvl_sdio btmrvl rtc_s3c bluetooth zram(C)
> zsmalloc(C) fuse ip6table_filter xt_mark ip6_tables joydev
> [<800150a4>] (unwind_backtrace+0x0/0x114) from [<804ee9a0>]
> (dump_stack+0x20/0x24)
> [<804ee9a0>] (dump_stack+0x20/0x24) from [<8002bff8>]
> (warn_slowpath_null+0x44/0x5c)
> [<8002bff8>] (warn_slowpath_null+0x44/0x5c) from [<803a3e74>]
> (mmc_request_done+0x6c/0xf0)
> [<803a3e74>] (mmc_request_done+0x6c/0xf0) from [<803b9c4c>]
> (dw_mci_request_end+0xc4/0xec)
> [<803b9c4c>] (dw_mci_request_end+0xc4/0xec) from [<803ba274>]
> (dw_mci_tasklet_func+0x354/0x3a8)
> [<803ba274>] (dw_mci_tasklet_func+0x354/0x3a8) from [<80034044>]
> (tasklet_action+0xac/0x12c)
> [<80034044>] (tasklet_action+0xac/0x12c) from [<800334f8>]
> (__do_softirq+0xc4/0x208)
> [<800334f8>] (__do_softirq+0xc4/0x208) from [<80033a54>] (irq_exit+0x54/0x94)
> [<80033a54>] (irq_exit+0x54/0x94) from [<8000ef68>] (handle_IRQ+0x8c/0xc8)
> [<8000ef68>] (handle_IRQ+0x8c/0xc8) from [<800085ec>] (gic_handle_irq+0x4c/0x70)
> [<800085ec>] (gic_handle_irq+0x4c/0x70) from [<8000e200>] (__irq_svc+0x40/0x60)
>
> Is this expected behavior?
>
> It feels wrong to me since it means the dw_mmc tasklet and whatever
> thread starts the IO can access the mmc_request data structure. Do I
> have this right?
>
> If I have this right, any "obvious" fixes? e.g. add locking to
> mmc_start_request() and mmc_request_done()?
>
> thanks,
> grant
>
>> After staring at the 14 MMC and DW driver data structures, I now
>> think dw_mmc driver is accessing MMC generic data structures
>> (mmc_request and mmc_queue_req) without grabbing either
>> mmc_blk_data->lock or mmc_queue->thread_sem and it needs to. I don't
>> have a specific stack trace yet where dw_mmc driver is accessing MMC
>> generic data without protection. This is where I need some guidance.
>>
>> I am confident dw_mmc driver is always acquiring dw_mci->lock when
>> accessing data in dw_mci structure(s). I don't see any locking around
>> access into the struct mmc_request by dw_mci_slot[]->mrq though - not
>> sure where that belongs.
>>
>> Two questions:
>> 1) is there interest in adding "assert_spin_locked()" calls to
>> document locking dependencies?
>> 2) Does anyone understand this code well enough to confirm I'm on the
>> right track and which code path I should be looking at?
>>
>>
>> Back to the bug: mmc_start_req() is sleeping, waiting for the
>> "previous" (in flight) "async" IO to complete. (1) This IO never
>> completes (unlikely) OR (2) already did (e.g. mmc_host->areq is stale)
>> OR (3) mmc_host->areq is non-zero garbage. I'll add some code to
>> confirm (3) not the last case.
>>
>> I have confirmed with the stress test I'm running (many async IO in
>> flight with two antagonist processes that burns CPU cycles) gets about
>> 4 completions per second that are "done" before we call
>> mmc_start_req(). So I know the race in (2) could happen.
>>
>>
>> thanks,
>> grant
>>
>>
>> INFO: task mmcqd/0:84 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> mmcqd/0 D 804f4890 0 84 2 0x00000000
>> [<804f4890>] (__schedule+0x614/0x780) from [<804f4dc0>] (schedule+0x94/0x98)
>> [<804f4dc0>] (schedule+0x94/0x98) from [<804f2ae0>]
>> (schedule_timeout+0x38/0x2d0)
>> [<804f2ae0>] (schedule_timeout+0x38/0x2d0) from [<804f4c14>]
>> (wait_for_common+0x164/0x1a0)
>> [<804f4c14>] (wait_for_common+0x164/0x1a0) from [<804f4d28>]
>> (wait_for_completion+0x20/0x24)
>> [<804f4d28>] (wait_for_completion+0x20/0x24) from [<803a39f8>]
>> (mmc_wait_for_req_done+0x2c/0x84)
>> [<803a39f8>] (mmc_wait_for_req_done+0x2c/0x84) from [<803a4b50>]
>> (mmc_start_req+0x60/0x120)
>> [<803a4b50>] (mmc_start_req+0x60/0x120) from [<803b09bc>]
>> (mmc_blk_issue_rw_rq+0xa0/0x3a8)
>> [<803b09bc>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) from [<803b10e8>]
>> (mmc_blk_issue_rq+0x424/0x478)
>> [<803b10e8>] (mmc_blk_issue_rq+0x424/0x478) from [<803b220c>]
>> (mmc_queue_thread+0xb0/0x118)
>> [<803b220c>] (mmc_queue_thread+0xb0/0x118) from [<8004d620>] (kthread+0xa8/0xbc)
>> [<8004d620>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
>> Kernel panic - not syncing: hung_task: blocked tasks
>> [<800150a4>] (unwind_backtrace+0x0/0x114) from [<804ee160>]
>> (dump_stack+0x20/0x24)
>> [<804ee160>] (dump_stack+0x20/0x24) from [<804ee2d0>] (panic+0xa8/0x1f4)
>> [<804ee2d0>] (panic+0xa8/0x1f4) from [<80086d44>] (watchdog+0x1f4/0x25c)
>> [<80086d44>] (watchdog+0x1f4/0x25c) from [<8004d620>] (kthread+0xa8/0xbc)
>> [<8004d620>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2013-08-23 14:06 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-12 23:45 MMC locking: mmc_request accesses from dw_mmc driver ok? Grant Grundler
2013-08-21 20:18 ` Grant Grundler
2013-08-23 14:06 ` Jaehoon Chung [this message]
2013-08-28 23:20 ` Grant Grundler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52176C5E.2060101@samsung.com \
--to=jh80.chung@samsung.com \
--cc=cjb@laptop.org \
--cc=dianders@chromium.org \
--cc=grundler@chromium.org \
--cc=linux-mmc@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.