Subject: Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB
To: Russell Currey, Oliver O'Halloran
References: <20200323075354.93825-1-aik@ozlabs.ru>
From: Alexey Kardashevskiy
Message-ID: <9893c4db-057d-8e42-52fe-8241d6d90b5f@ozlabs.ru>
Date: Fri, 17 Apr 2020 15:47:27 +1000
List-Id: Linux on PowerPC Developers Mail List
Cc: KVM list, Fabiano Rosas, Alistair Popple, kvm-ppc@vger.kernel.org, linuxppc-dev, David Gibson

On 17/04/2020 11:26, Russell Currey wrote:
> On Thu, 2020-04-16 at 12:53 +1000, Oliver O'Halloran wrote:
>> On Thu, Apr 16, 2020 at 12:34 PM Oliver O'Halloran wrote:
>>> On Thu, Apr 16, 2020 at 11:27 AM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>> Anyone? Is it totally useless or wrong approach? Thanks,
>>>
>>> I wouldn't say it's either, but I still hate it.
>>>
>>> The 4GB mode being per-PHB makes it difficult to use unless we force
>>> that mode on 100% of the time, which I'd prefer not to do. Ideally
>>> devices that actually support 64bit addressing (which is most of them)
>>> should be able to use no-translate mode when possible since a) It's
>>> faster, and b) It frees up room in the TCE cache for devices that
>>> actually need it. I know you've done some testing with 100G NICs and
>>> found the overhead was fine, but IMO that's a bad test since it's
>>> pretty much the best-case scenario: all the devices on the PHB are in
>>> the same PE. The PHB's TCE cache only hits when the TCE matches the
>>> DMA bus address and the PE number for the device, so in a multi-PE
>>> environment there's a lot of potential for TCE cache thrashing.
>>> If there was one or two PEs under that PHB it's probably not going to
>>> matter, but if you have an NVMe rack with 20 drives it starts to look
>>> a bit ugly.
>>>
>>> That all said, it might be worth doing this anyway since we probably
>>> want the software infrastructure in place to take advantage of it.
>>> Maybe expand the command line parameters to allow it to be enabled on
>>> a per-PHB basis rather than globally.
>>
>> Since we're on the topic
>>
>> I've been thinking the real issue we have is that we're trying to pick
>> an "optimal" IOMMU config at a point where we don't have enough
>> information to work out what's actually optimal. The IOMMU config is
>> done on a per-PE basis, but since PEs may contain devices with
>> different DMA masks (looking at you, weird AMD audio function) we're
>> always going to have to pick something conservative as the default
>> config for TVE#0 (64k, no bypass mapping), since the driver will tell
>> us what the device actually supports long after the IOMMU
>> configuration is done. What we really want is to be able to have
>> separate IOMMU contexts for each device, or at the very least a
>> separate context for the crippled devices.
>>
>> We could allow a per-device IOMMU context by extending the Master /
>> Slave PE thing to cover DMA in addition to MMIO. Right now we only use
>> slave PEs when a device's MMIO BARs extend over multiple m64 segments.
>> When that happens an MMIO error causes the PHB to freeze the PE
>> corresponding to one of those segments, but not any of the others. To
>> present a single "PE" to the EEH core we check the freeze status of
>> each of the slave PEs when the EEH core does a PE status check and if
>> any of them are frozen, we freeze the rest of them too. When a driver
>> sets a limited DMA mask we could move that device to a separate slave
>> PE so that it has its own IOMMU context tailored to its DMA
>> addressing limits.
>>
>> Thoughts?
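
The status check described above (report one "PE" to the EEH core, and
freeze every member as soon as any one of them is frozen) could be
sketched roughly as below. This is only an illustration: struct
compound_pe, MAX_SLAVES and compound_pe_is_frozen() are hypothetical
names, not kernel APIs, and the real freeze state lives in the PHB
rather than in an array of flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: one master PE plus its slave
 * PEs, each with its own freeze flag. */
#define MAX_SLAVES 4

struct compound_pe {
	bool frozen[1 + MAX_SLAVES];	/* index 0 is the master PE */
	size_t count;			/* PEs in use, master included */
};

/* PE status check: if any member is frozen, freeze the rest as well
 * and report the whole compound PE as frozen. */
bool compound_pe_is_frozen(struct compound_pe *pe)
{
	bool any = false;
	size_t i;

	assert(pe->count <= 1 + MAX_SLAVES);

	for (i = 0; i < pe->count; i++)
		any = any || pe->frozen[i];

	if (any)
		for (i = 0; i < pe->count; i++)
			pe->frozen[i] = true;

	return any;
}
```

With one slave frozen, a single call reports the compound PE as frozen
and leaves every member, master included, marked frozen. Moving a
device with a limited DMA mask into its own slave PE would reuse the
same check; only the reason for having slaves changes.
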
>
> For what it's worth this sounds like a good idea to me, it just sounds
> tricky to implement. You're adding another layer of complexity on top
> of EEH (well, making things look simple to the EEH core and doing your
> own freezing on top of it) in addition to the DMA handling.
>
> If it works then great, just has a high potential to become a new bug
> haven.

imho putting every PCI function in a separate PE is the right thing to
do here but I've been told it is not that simple, and I believe that.
Reusing slave PEs seems unreliable - the configuration will depend on
whether a PE occupied enough segments to give a unique PE to a PCI
function and my little brain explodes.

So this is not happening soon.

For the time being, this patchset is good for:
1. weird hardware which has a limited DMA mask (this is why the patchset
was written in the first place)
2. debugging DMA by routing it via the IOMMU (even when the 4GB hack is
not enabled).

--
Alexey
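
To make use case 1 concrete, here is a rough sketch of how a device's
DMA mask could decide which path it gets. Everything in it is an
assumption for illustration: pick_dma_path(), the enum values and the
window size parameter are made up, and treating only a full 64-bit mask
as bypass-capable is a simplification, not what the powernv code checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical DMA paths, named after the options discussed in the thread. */
enum dma_path {
	DMA_PATH_BYPASS,		/* no-translate mode */
	DMA_PATH_WINDOW_AT_4GB,		/* huge IOMMU window starting at 4GB */
	DMA_PATH_DEFAULT_WINDOW,	/* conservative default window */
};

#define WINDOW_BASE	(UINT64_C(4) << 30)	/* 4GB bus address */

/* Assumption: the huge window is usable only if the device can address
 * its last byte, i.e. the DMA mask covers WINDOW_BASE + size - 1. */
enum dma_path pick_dma_path(uint64_t dma_mask, uint64_t window_size)
{
	if (dma_mask == UINT64_MAX)
		return DMA_PATH_BYPASS;

	/* Reject an empty window or one whose end would overflow 64 bits. */
	if (window_size == 0 || window_size > UINT64_MAX - WINDOW_BASE)
		return DMA_PATH_DEFAULT_WINDOW;

	if (dma_mask >= WINDOW_BASE + window_size - 1)
		return DMA_PATH_WINDOW_AT_4GB;

	return DMA_PATH_DEFAULT_WINDOW;
}
```

Under these assumptions a device with a 40-bit mask and a 256GB window
lands on the window at 4GB, while a 32-bit-only device stays on the
default window.
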