From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1752907AbcERJP4 (ORCPT ); Wed, 18 May 2016 05:15:56 -0400
Received: from lucky1.263xmail.com ([211.157.147.135]:43158 "EHLO
    lucky1.263xmail.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1752439AbcERJPx (ORCPT );
    Wed, 18 May 2016 05:15:53 -0400
X-263anti-spam: KSV:0;
X-MAIL-GRAY: 1
X-MAIL-DELIVERY: 0
X-KSVirus-check: 0
X-ABS-CHECKED: 4
X-ADDR-CHECKED: 0
X-RL-SENDER: shawn.lin@rock-chips.com
X-FST-TO: linux-kernel@vger.kernel.org
X-SENDER-IP: 58.22.7.114
X-LOGIN-NAME: shawn.lin@rock-chips.com
X-UNIQUE-TAG:
X-ATTACHMENT-NUM: 0
X-DNS-TYPE: 0
Subject: Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors
To: Doug Anderson
References: <1426002490-2014-1-git-send-email-dianders@chromium.org>
    <5502CA4E.9060401@samsung.com> <5506707D.40708@samsung.com>
    <55189F04.8000404@samsung.com> <573BCC8D.5090606@kernel-upstream.org>
Cc: shawn.lin@rock-chips.com, Jaehoon Chung, Seungwon Jeon, Ulf Hansson,
    Alim Akhtar, Sonny Rao, Heiko Stuebner, Alexandru Stan,
    Javier Martinez Canillas, "open list:ARM/Rockchip SoC...",
    "linux-arm-kernel@lists.infradead.org", "linux-mmc@vger.kernel.org",
    "linux-kernel@vger.kernel.org"
From: Shawn Lin
Message-ID: <573C3283.1040606@rock-chips.com>
Date: Wed, 18 May 2016 17:14:43 +0800
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi

On 2016-5-18 12:12, Doug Anderson wrote:
> Hi,
>
> On Tue, May 17, 2016 at 6:59 PM, Shawn Lin wrote:
>> Could you try this patch to see if you can still find HLE?
>>
>> @@ -2356,12 +2356,22 @@ static void dw_mci_cmd_interrupt(struct dw_mci *host, u32 status)
>>  static void dw_mci_handle_cd(struct dw_mci *host)
>>  {
>>  	int i;
>> +	int present;
>>
>>  	for (i = 0; i < host->num_slots; i++) {
>>  		struct dw_mci_slot *slot = host->slot[i];
>>
>>  		if (!slot)
>>  			continue;
>>
>> +		present = !(mci_readl(slot->host, CDETECT) & (1 << slot->id));
>> +		if (present)
>> +			set_bit(DW_MMC_CARD_PRESENT, &slot->flags);
>> +		else
>> +			clear_bit(DW_MMC_CARD_PRESENT, &slot->flags);
>
> No, because we don't use the builtin card detect on veyron. ;)
>
> We use GPIO card detect because we didn't like the way JTAG and SD
> interacted. Also on rk3288 the builtin card detect line had the wrong
> voltage domain (you couldn't detect a card when the IO lines were
> powered off). The builtin card detect line is always driven low on
> veyron.

Okay, I see.

>
>
> I'm nearly certain that the root cause of my HLE errors is actually
> related to the same problem addressed by the commit 7c5209c315ea
> ("mmc: core: Increase delay for voltage to stabilize from 3.3V to
> 1.8V"). I think that on minnie we're still on the hairy edge and
> sometimes the line doesn't transition fast enough.

Things are not so simple, judging from your details. I had not enabled
SD3.0 support, and I still saw HLE sometimes. So it seems commit
7c5209c315ea does not contribute to this phenomenon.

The scenario looks like:
remove sd-card -> mmc_sd_detect -> send status (CMD13) -> power_off ->
set_ios -> setup_bus -> disable clk, and then the HLE irq storm arrives.

From the code of dw_mci_prepare_command: SDMMC_CMD_PRV_DAT_WAIT will
not be used for CMD13, so we don't wait_busy here. The cmd is then
loaded into the queue of dw_mmc but still fails to be sent out because
it is busy?
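
The point about dw_mci_prepare_command can be sketched as a small userspace model. This is only an illustration, not the driver code: the model_* names are hypothetical, the bit position of the wait flag is an assumed value, and the "command carries data" condition is an assumption about when the flag would otherwise be set. It shows only that CMD13 never gets the wait-for-previous-data flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; names and bit position are assumptions. */
#define MODEL_SEND_STATUS       13u                   /* CMD13 */
#define MODEL_CMD_PRV_DAT_WAIT  (UINT32_C(1) << 13)   /* illustrative bit */

/*
 * Build a command word. The wait-for-previous-data flag is set only for
 * commands that carry data and are not CMD13, so a status query is
 * queued without waiting for the bus to go idle.
 */
static uint32_t model_prepare_command(uint32_t opcode, bool has_data)
{
	uint32_t cmdr = opcode;

	if (opcode != MODEL_SEND_STATUS && has_data)
		cmdr |= MODEL_CMD_PRV_DAT_WAIT;

	return cmdr;
}
```

In this model, model_prepare_command(13, false) returns a word without the wait flag, which matches the behaviour questioned above.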
With my patch, things go well:
remove sd-card -> clear bit of DW_MMC_CARD_PRESENT -> send status
(CMD13) returns directly -> power_off -> set_ios -> setup_bus ->
disable clk

So why should we allow a query of the card status if we are sure the
card is removed? I mean no further cmds should be delivered.

And another question: should we wait for busy for CMD13?

>
> It appears that increasing this to 30ms avoids the HLE errors.
>
> I _think_ I can actually fully fix this properly by temporarily
> engaging the internal pull-ups while the voltage switch is happening.
> This will bleed away the voltage just a little bit faster (since lines
> are driven low here). I'll try to confirm that.
>
>
> In any case, it seems like we should take this patch since (without
> this patch) the failure case when you get HLE errors is that the
> interrupt controller fires over and over again (with no printouts) and
> your system stalls with no error messages.

Sure, at least we need to address this irq storm...

>
> -Doug
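
The "with my patch" flow above can be modeled in userspace as follows. This is a minimal sketch under stated assumptions, not the driver: struct model_slot and the model_* functions are hypothetical, and the error value merely mirrors -ENOMEDIUM. It models the active-low CDETECT read from the quoted patch and a request path that returns directly once the present bit is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for driver state; not the kernel's types. */
#define MODEL_CARD_PRESENT  0u      /* bit index, mirrors DW_MMC_CARD_PRESENT */
#define MODEL_ENOMEDIUM     (-123)  /* assumed error code for "no card" */

struct model_slot {
	unsigned int id;      /* slot index, selects the CDETECT bit */
	unsigned long flags;  /* holds the card-present bit */
};

/* CDETECT is active low: a cleared bit means a card is inserted. */
static bool model_card_present(uint32_t cdetect, unsigned int id)
{
	assert(id < 32);
	return !(cdetect & (UINT32_C(1) << id));
}

/* Mirror of the patched card-detect handler for a single slot. */
static void model_handle_cd(struct model_slot *slot, uint32_t cdetect)
{
	if (model_card_present(cdetect, slot->id))
		slot->flags |= 1UL << MODEL_CARD_PRESENT;
	else
		slot->flags &= ~(1UL << MODEL_CARD_PRESENT);
}

/* Returns 0 if a command would be issued, an error if short-circuited. */
static int model_request(const struct model_slot *slot)
{
	if (!(slot->flags & (1UL << MODEL_CARD_PRESENT)))
		return MODEL_ENOMEDIUM;
	return 0;
}
```

Once model_handle_cd() has seen the CDETECT bit go high, model_request() fails immediately, so no CMD13 would reach the controller after removal. As noted in the thread, this only helps on boards that use the builtin card detect line.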