Date: Fri, 15 Feb 2019 11:10:07 +0000
From: Russell King - ARM Linux admin
To: Thomas Petazzoni , netdev@vger.kernel.org
Subject: Re: [REGRESSION 4.20] mvneta - DMA-API: device driver tries to sync DMA memory it has not allocated
Message-ID: <20190215111007.myjudx35mmwyp7km@shell.armlinux.org.uk>
References: <20190212135919.xbka3lamjn4ifcki@shell.armlinux.org.uk>
In-Reply-To: <20190212135919.xbka3lamjn4ifcki@shell.armlinux.org.uk>

On Tue, Feb 12, 2019 at 01:59:19PM +0000, Russell King - ARM Linux admin wrote:
> Hi,
>
> Booting 4.20 on SolidRun Clearfog reliably provokes the following
> warning - this is with mvneta built in, but DSA as modules:
>
> WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
> mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
> Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
> CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0x9c/0xd4)
> [] (dump_stack) from [] (__warn+0xf8/0x124)
> [] (__warn) from [] (warn_slowpath_fmt+0x38/0x48)
> [] (warn_slowpath_fmt) from [] (check_sync+0x514/0x5bc)
> [] (check_sync) from [] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
> [] (debug_dma_sync_single_range_for_cpu) from [] (mvneta_poll+0x298/0xf58)
> [] (mvneta_poll) from [] (net_rx_action+0x128/0x424)
> [] (net_rx_action) from [] (__do_softirq+0xf0/0x540)
> [] (__do_softirq) from [] (irq_exit+0x124/0x144)
> [] (irq_exit) from [] (__handle_domain_irq+0x58/0xb0)
> [] (__handle_domain_irq) from [] (gic_handle_irq+0x48/0x98)
> [] (gic_handle_irq) from [] (__irq_svc+0x70/0x98)
> ...
>
> This appears to be from:
>
>                 if (rx_bytes <= rx_copybreak) {
>                         /* better copy a small frame and not unmap the DMA region */
>                         skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
>                         if (unlikely(!skb))
>                                 goto err_drop_frame_ret_pool;
>
>                         dma_sync_single_range_for_cpu(dev->dev.parent,
>                                                       rx_desc->buf_phys_addr,
>                                                       MVNETA_MH_SIZE + NET_SKB_PAD,
>                                                       rx_bytes,
>                                                       DMA_FROM_DEVICE);
>
> which suggests that rx_desc->buf_phys_addr is not something that should
> be passed to dma_sync_single_range_for_cpu().  I've not been able to
> track down why that is, nor which interface is provoking that.
>
> As I don't have the details of how the buffer management hardware works
> on Armada 388, I'm unable to debug this myself.

Doing what debugging I _can_ do, it seems that this has been a long-term
error in mvneta, but one that was merely uncovered by:

commit 562e2f467e71f45f0400ebee5077eaa426d3e426
Author: Yelena Krivosheev
Date:   Wed Jul 18 18:10:57 2018 +0200

The buffer that is being complained about is sync'd using a device of
dev->dev.parent 'f1070000.ethernet', but is allocated by
mvneta_bm_construct() against a different device:

mvneta_bm_construct: 0x2dd85c00 +0x140 for ee113294 (f10c8000.bm)

namely 'f10c8000.bm'.
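
To illustrate why a device mismatch alone is enough to provoke that
warning, here is a toy user-space sketch of a mapping tracker that keys
each entry on the device as well as the address. It is not the kernel's
dma-debug code: toy_map(), toy_sync_ok() and the fixed-size table are
hypothetical names invented for this example, and reading "+0x140" in
the debug line above as the mapping size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_MAPPINGS 16

/* One tracked mapping: which device mapped it, where, and how large. */
struct toy_mapping {
    const char *dev;   /* device name used at map time (hypothetical key) */
    uint64_t addr;     /* DMA address of the start of the buffer */
    size_t size;       /* length of the mapping in bytes */
};

static struct toy_mapping toy_table[TOY_MAX_MAPPINGS];
static size_t toy_count;

/* Record a mapping made on behalf of 'dev'. Returns false if the table is full. */
static bool toy_map(const char *dev, uint64_t addr, size_t size)
{
    assert(dev != NULL);
    if (toy_count == TOY_MAX_MAPPINGS)
        return false;
    toy_table[toy_count++] = (struct toy_mapping){ dev, addr, size };
    return true;
}

/*
 * A sync is accepted only if the same device has a mapping that fully
 * contains [addr, addr + size). A different device never matches, even
 * when the address range is identical.
 */
static bool toy_sync_ok(const char *dev, uint64_t addr, size_t size)
{
    assert(dev != NULL);
    for (size_t i = 0; i < toy_count; i++) {
        const struct toy_mapping *m = &toy_table[i];

        if (strcmp(m->dev, dev) != 0 || addr < m->addr)
            continue;

        uint64_t off = addr - m->addr;
        if (off <= m->size && size <= m->size - off)
            return true;
    }
    return false;
}
```

With a buffer recorded via toy_map("f10c8000.bm", 0x2dd85c00, 0x140), a
240-byte toy_sync_ok() against "f1070000.ethernet" at the same address is
rejected, while the same call against "f10c8000.bm" is accepted.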

It's long-term, because it will only trigger in older kernels if we hit
the copy-break stuff, which used to do:

                if (rx_bytes <= rx_copybreak) {
                        skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
                        if (unlikely(!skb)) {
                                ...
                        }

                        dma_sync_single_range_for_cpu(dev->dev.parent,
                                                      phys_addr,
                                                      MVNETA_MH_SIZE + NET_SKB_PAD,
                                                      rx_bytes,
                                                      DMA_FROM_DEVICE);

where rx_copybreak is 256 bytes.  Quite why that hasn't been seen
already, I do not know.

Looking at the code after the commit, if mvneta is used on a
non-coherent platform, then we have problems:

                copy_size = min(skb_size, rx_bytes);
...
                memcpy(rxq->skb->data, data + MVNETA_MH_SIZE,
                       copy_size);
...
                if (rxq->left_size == 0) {
                        int size = copy_size + MVNETA_MH_SIZE;

                        dma_sync_single_range_for_cpu(dev->dev.parent,
                                                      phys_addr, 0,
                                                      size,
                                                      DMA_FROM_DEVICE);

since the sync is done _after_ we've copied data from a non-coherent
buffer.

If this code has been written to assume that we're always coherent,
then is there any point in having the incorrect dma_sync_*() calls at
all?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up