From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bartlomiej Zolnierkiewicz Subject: Re: [PATCH 11/16] mmc: block: shuffle retry and error handling Date: Wed, 01 Mar 2017 12:45:57 +0100 Message-ID: <1718299.ToPxjyb5YA@amdc3058> References: <20170209153403.9730-1-linus.walleij@linaro.org> <20170209153403.9730-12-linus.walleij@linaro.org> <22219053.LlSbdLjSYi@amdc3058> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7Bit Return-path: Received: from mailout2.samsung.com ([203.254.224.25]:37337 "EHLO mailout2.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750743AbdCAL6g (ORCPT ); Wed, 1 Mar 2017 06:58:36 -0500 In-reply-to: <22219053.LlSbdLjSYi@amdc3058> Sender: linux-mmc-owner@vger.kernel.org List-Id: linux-mmc@vger.kernel.org To: Linus Walleij Cc: linux-mmc@vger.kernel.org, Ulf Hansson , Adrian Hunter , Paolo Valente , Chunyan Zhang , Baolin Wang , linux-block@vger.kernel.org, Jens Axboe , Christoph Hellwig , Arnd Bergmann Hi, On Tuesday, February 28, 2017 06:45:20 PM Bartlomiej Zolnierkiewicz wrote: > On Thursday, February 09, 2017 04:33:58 PM Linus Walleij wrote: > > Instead of doing retries at the same time as trying to submit new > > requests, do the retries when the request is reported as completed > > by the driver, in the finalization worker. > > > > This is achieved by letting the core worker call back into the block > > layer using mmc_blk_rw_done(), that will read the status and repeatedly > > try to hammer the request using single request etc by calling back to > > the core layer using mmc_restart_areq() > > > > The beauty of it is that the completion will not complete until the > > block layer has had the opportunity to hammer a bit at the card using > > a bunch of different approaches in the while() loop in > > mmc_blk_rw_done() > > > > The algorithm for recapture, retry and handle errors is essentially > > identical to the one we used to have in mmc_blk_issue_rw_rq(), > > only augmented to get called in another path. > > > > We have to add and initialize a pointer back to the struct mmc_queue > > from the struct mmc_queue_req to find the queue from the asynchronous > > request. > > > > Signed-off-by: Linus Walleij > > It seems that after this change we can end up queuing more > work for kthread from the kthread worker itself and wait > inside it for this nested work to complete. I hope that On the second look it seems that there is no waiting for the retried areq to complete so I cannot see what protects us from racing and trying to run two areq-s in parallel: 1st areq being retried (in the completion kthread): mmc_blk_rw_done()->mmc_restart_areq()->__mmc_start_data_req() 2nd areq coming from the second request in the queue (in the queuing kthread): mmc_blk_issue_rw_rq()->mmc_start_areq()->__mmc_start_data_req() (after mmc_blk_rw_done() is done in mmc_finalize_areq() 1st areq is marked as completed by the completion kthread and the waiting on host->areq in mmc_start_areq() of the queuing kthread is done and 2nd areq is started while the 1st one is still being retried) ? Also retrying of areqs for MMC_BLK_RETRY status case got broken (before change do {} while() loop increased retry variable, now the loop is gone and retry variable will not be increased correctly and we can loop forever). Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics