From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6BDECC433E3 for ; Mon, 17 Aug 2020 05:39:45 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3B60D206FA for ; Mon, 17 Aug 2020 05:39:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="KzTtsm2Q"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=skidata.com header.i=@skidata.com header.b="eqHTZgOi" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3B60D206FA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=skidata.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=y7iugNzCfOvDXkoL1qq5kOnF/9rfxgtJ/qlyFzLjH5Y=; b=KzTtsm2QpRZhvp9nwG4yx5zVP i/sy2nOoDwKLipR0g2qj8oF62GjK2/nkc4grRnisW/I7hoLkMayleYL9apjV7y9VlKcaU55lA3R1O WB6FMXZXcz6JiP2lyx6K6Ne/mx6c0jJfDaTjIlB6IZdyg9oCGbYdo8az3lG3JAlkF/2XaYKqzIOov Flev97eTBkIbrgYlLRPKp45QU/zXKrU/4gZgd4WhRPL+m4IoblD0tw9U1A/JO0153HFCt/dE6zkaN Pq6Vd/VLpUZLUqvaaxRyd7tBy8VzXIMgiz6oF+FAqwWWarAz6ACacXQ7O+qRBgu3jQ1Abvc/cXeHZ pdgjWCXwA==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1k7Xqy-0005vK-EV; Mon, 17 Aug 2020 05:38:40 +0000 Received: from mail1.skidata.com ([91.230.2.99]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1k7Xqw-0005ud-C8 for linux-arm-kernel@lists.infradead.org; Mon, 17 Aug 2020 05:38:39 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=skidata.com; i=@skidata.com; q=dns/txt; s=selector1; t=1597642718; x=1629178718; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=nC3QtiPkc2ash+7NhCkNY1Q6+j1FJN1xDaKIPykDIQE=; b=eqHTZgOiB9kHvKsJ1G/EijirkI5KhB6u4inAcYKzTZJMmjEg4rCn0fkB y5pr1yq3ZDJgvG4IakFO59xSZUr4KyB0CvQ3JVrAPvekQyZ3afReCnucK zdPSTta6oiE7wNSyd0v0FSv+839XqEY+ARNLt/gkJNveSBe/x/uG1pbUE pt3nt62YMLimaeNgZnN0PNDeEp3Mt0BA0mrs1OsUgwJ+o5NyB+RW8szpW 1K6JEZ/xXJuI36liIA7yfrrtxmhuLIvUhDbsgF0D+SJDKtfInW3kakEaV xsC/IqMHKyntyOqA+nk4t/CXrru1wmXsqQp+QLBj8haqt1zMaQBBBQoVO A==; IronPort-SDR: Cyb4DgRj+rCeSt/lc6ca/UWAxILZY3BwglmUQTpWJKQHBZEMAjDzPsI/XWGEqNlWzWLVt2G5t0 ftCvLYazD3V9kXiumE7Dl9j53lr16eEt+XMB1S73ckdJQOEgch5Laa9QH2iMy1OOOEVG94MmRg eI1QfaSCRzXL/yYa0rMlnC+FXRsAypoOJKqbBx9YX3zNK5O6JBvu19RU57ttVdjfR64ruRgtyz DC2rVUWr89IXhyUIsDZjAO5qfLdU7CDpruLk5D66jwnhCxXbO6t/cAHKtunyJaZMyvRnOESEV4 fkA= X-IronPort-AV: E=Sophos;i="5.76,322,1592863200"; d="scan'208";a="26161395" Date: Mon, 17 Aug 2020 07:38:36 +0200 From: Richard Leitner To: Robin Gong Subject: Re: pcm|dmaengine|imx-sdma race condition on i.MX6 Message-ID: <20200817053836.GC551027@pcleri> References: <20200813112258.GA327172@pcleri> <7f98cd6d30404e4d9d621f57f45ae441@skidata.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <7f98cd6d30404e4d9d621f57f45ae441@skidata.com> X-Originating-IP: [192.168.111.252] X-ClientProxiedBy: sdex6srv.skidata.net (192.168.111.84) To sdex5srv.skidata.net (192.168.111.83) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200817_013838_696051_A618EA9D X-CRM114-Status: GOOD ( 34.34 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: alsa-devel@alsa-project.org, timur@kernel.org, linux-kernel@vger.kernel.org, nicoleotsuka@gmail.com, vkoul@kernel.org, dl-linux-imx , kernel@pengutronix.de, dmaengine@vger.kernel.org, dan.j.williams@intel.com, shawnguo@kernel.org, linux-arm-kernel@lists.infradead.org, Benjamin Bara - SKIDATA Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Fri, Aug 14, 2020 at 11:18:10AM +0200, Robin Gong wrote: > > On 2020/08/13 19:23: Richard Leitner wrote: > > Hi, > > we've found a race condition with the PCM on the i.MX6 which results > > in an -EIO for the SNDRV_PCM_IOCTL_READI_FRAMES ioctl after an -EPIPE (XRUN). > > > > A possible reproduction may look like the following reduced call graph > > during a PCM capture: > > > > us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) > > - wait_for_avail() > > - schedule_timeout() > > -> snd_pcm_update_hw_ptr0() > > - snd_pcm_update_state: EPIPE (XRUN) > > - sdma_disable_channel_async() # get's scheduled away due to sleep us > > <- ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns -EPIPE > > us -> ioctl(SNDRV_PCM_IOCTL_PREPARE) # as reaction to the EPIPE (XRUN) > > us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) # next try to capture frames > > - sdma_prep_dma_cyclic() > > - sdma_load_context() # not loaded as context_loaded is 1 > > - wait_for_avail() > > - schedule_timeout() > > # now the sdma_channel_terminate_work() comes back and sets # > > context_loaded = false and frees in vchan_dma_desc_free_list(). > > us <- ioctl returns -EIO (capture write error (DMA or IRQ trouble?)) > > Seems the write error caused by context_loaded not set to false before > next transfer start? If yes, please have a try with the 03/04 of the below > patch set, anyway, could you post your failure log? > https://lkml.org/lkml/2020/8/11/111 Thanks for the pointer to those patches. I've tested them on top of the v5.8 tag during the weekend. It seems those patches are mitigiating the described EIO issue. Nonetheless IMHO this patches are not fixing the root cause of the problem (unsynchronized sdma_disable_channel_async / sdma_prep_dma_cyclic). Do you (or somebody else) have any suggestions/comments/objections on that? regards;rl > > > > > > > What we have found out, based on our understanding: > > The dmaengine docu states that a dmaengine_terminate_async() must be > > followed by a dmaengine_synchronize(). > > However, in the pcm_dmaengine.c, only dmaengine_terminate_async() is > > called (for performance reasons and because it might be called from an > > interrupt handler). > > > > In our tests, we saw that the user-space immediately calls > > ioctl(SNDRV_PCM_IOCTL_PREPARE) as a handler for the happened xrun > > (previous ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns with -EPIPE). In > > our case (imx-sdma.c), the terminate really happens asynchronously > > with a worker thread which is not awaited/synchronized by the > > ioctl(SNDRV_PCM_IOCTL_PREPARE) call. > > > > Since the syscall immediately enters an atomic context > > (snd_pcm_stream_lock_irq()), we are not able to flush the work of the > > termination worker from within the DMA context. This leads to an > > unterminated DMA getting re-initialized and then terminated. > > > > On the i.MX6 platform the problem is (if I got it correctly) that the > > sdma_channel_terminate_work() called after the -EPIPE gets scheduled > > away (for the 1-2ms sleep [1]). During that time the userspace already > > sends in the > > ioctl(SNDRV_PCM_IOCTL_PREPARE) and > > ioctl(SNDRV_PCM_IOCTL_READI_FRAMES). > > As none of them are anyhow synchronized to the terminate_worker the > > vchan_dma_desc_free_list() [2] and "sdmac->context_loaded = false;" > > [3] are executed during the wait_for_avail() [4] of the > > ioctl(SNDRV_PCM_IOCTL_READI_FRAMES). > > > > To make sure we identified the problem correctly we've tested to add a > > "dmaengine_synchronize()" before the snd_pcm_prepare() in [5]. This > > fixed the race condition in all our tests. (Before we were able to > > reproduce it in 100% of the test runs). > > > > Based on our understanding, there are two different points to ensure > > the > > termination: > > Either ensure that the termination is finished within the previous > > SNDRV_PCM_IOCTL_READI_FRAMES call (inside the DMA context) or > > finishing it in the SNDRV_PCM_IOCTL_PREPARE call (and all other > > applicable ioclts) before entering the atomic context (from the PCM context). > > > > We initially thought about implementing the first approach, basically > > splitting up the dma_device terminate_all operation into a sync > > (busy-wait) and a async one. This would align the operations with the > > DMAengine interface and would enable a sync termination variant from > > atomic contexts. > > However, we saw that the dma_free_attrs() function has a WARN_ON on > > irqs disabled, which would be the case for the sync variant. > > Side note: We found this issue on the current v5.4.y LTS branch, but > > it also affects v5.8.y. > > > > Any feedback or pointers how we may fix the problem are warmly welcome! > > If anything is unclear please just ask :-) > > > > regards; > > Richard Leitner > > Benjamin Bara _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel