From: Toke Høiland-Jørgensen <toke@redhat.com>
To: Yunsheng Lin, Jesper Dangaard Brouer, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: zhangkun09@huawei.com, fanghaiqing@huawei.com, liuyonglong@huawei.com, Robin Murphy, Alexander Duyck, IOMMU, Andrew Morton, Eric Dumazet, Ilias Apalodimas, linux-mm@kvack.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kernel-team
Subject: Re: [PATCH net-next v3 3/3] page_pool: fix IOMMU crash when driver has already unbound
References: <20241022032214.3915232-1-linyunsheng@huawei.com> <20241022032214.3915232-4-linyunsheng@huawei.com> <113c9835-f170-46cf-92ba-df4ca5dfab3d@huawei.com> <878qudftsn.fsf@toke.dk>
Date: Fri, 25 Oct 2024 13:16:15 +0200
Message-ID: <87r084e8lc.fsf@toke.dk>
Yunsheng Lin writes:

> On 2024/10/24 22:40, Toke Høiland-Jørgensen wrote:
>
> ...
>
>>>>
>>>> I really really dislike this approach!
>>>>
>>>> Nacked-by: Jesper Dangaard Brouer
>>>>
>>>> Having to keep an array to record all the pages, including the ones
>>>> which are handed over to the network stack, goes against the very
>>>> principle behind page_pool. We added members to struct page such
>>>> that pages could be "outstanding".
>>>
>>> Before and after this patch both support "outstanding"; the
>>> difference is how many "outstanding" pages they support.
>>>
>>> The question seems to be: do we really need unlimited inflight pages
>>> for page_pool to work, as mentioned in [1]?
>>>
>>> 1. https://lore.kernel.org/all/5d9ea7bd-67bb-4a9d-a120-c8f290c31a47@huawei.com/
>>
>> Well, yes? Imposing an arbitrary limit on the number of in-flight
>> packets (especially such a low one as in this series) is a complete
>> non-starter. Servers have hundreds of gigs of memory these days, and
>> if someone wants to use that for storing in-flight packets, the
>> kernel definitely shouldn't impose some (hard-coded!) limit on that.
>
> You and Jesper seem to be mentioning a possible fact that there might
> be 'hundreds of gigs of memory' needed for inflight pages; it would be
> nice to provide more info or reasoning about why 'hundreds of gigs of
> memory' is needed here, so that we don't over-design to support
> recording unlimited in-flight pages if stalling the driver unbind
> turns out to be impossible and the inflight pages do need to be
> recorded.

I don't have a concrete example of a use case that will blow the limit
you are setting (but maybe Jesper does); I am simply objecting to the
imposition of any arbitrary limit at all. It smells a lot like "640k
ought to be enough for anyone".

> I guess it is common sense to start with the easy one until someone
> complains with some test case and detailed reasoning that we need to
> go the hard way, as you and Jesper also prefer waiting over having to
> record the inflight pages.

AFAIU Jakub's comment on his RFC patch for waiting, he was suggesting
exactly this: add the wait, and see if the cases where it can stall
turn out to be problems in practice.

> More details below about why we might need to go the hard way of
> having to record the inflight pages.
>
>>
>>>>
>>>> The page_pool already has a system for waiting for these
>>>> outstanding / inflight packets to get returned. As I suggested
>>>> before, the page_pool should simply take over the responsibility
>>>> (from net_device) to free the struct device (after inflight reaches
>>>> zero), which AFAIK the DMA device is connected via.
>>>
>>> It seems you mentioned some similar suggestion in a previous
>>> version; it would be good to show some code for the idea you have in
>>> mind. I am sure that Yonglong Liu (Cc'ed) will be happy to test it
>>> if some code like a POC/RFC is provided.
>>
>> I believe Jesper is basically referring to Jakub's RFC that you
>> mentioned below.
>>
>>> I should mention that it seems the DMA device is no longer valid
>>> when the remove() function of the device driver returns, as
>>> mentioned in [2], which means the DMA API is not allowed to be
>>> called after the remove() function of the device driver returns.
>>>
>>> 2. https://www.spinics.net/lists/netdev/msg1030641.html
>>>
>>>>
>>>> The alternative is what Kuba suggested (and proposed an RFC for):
>>>> that the net_device teardown waits for the page_pool inflight
>>>> packets.
>>>
>>> As above, the question is: how long does the waiting take here?
>>> Yonglong tested Kuba's RFC, see [3]; the waiting took forever due to
>>> the reason mentioned in the commit log:
>>> "skb_defer_free_flush(): this may cause infinite delay if there is
>>> no triggering for net_rx_action()."
>>
>> Honestly, this just seems like a bug (the "no triggering of
>> net_rx_action()") that should be root caused and fixed; not a reason
>> that waiting can't work.
>
> I would prefer the waiting too, if simple waiting fixed the test cases
> that Yonglong and Haiqing were reporting, but I did not look into the
> rabbit hole of possible caching in networking.
>
> As mentioned in the commit log and [1]:
> 1. ipv4 packet defragmentation timeout: this seems to cause delays of
> up to 30 secs, which was reported by Haiqing.
> 2. skb_defer_free_flush(): this may cause infinite delay if there is
> no triggering for net_rx_action(), which was reported by Yonglong.
>
> For case 1, is it really ok to stall the driver unbind for up to 30
> secs with the default setting of the defragmentation timeout?
>
> For case 2, it is possible to add a timeout for that kind of caching,
> like the defragmentation timeout, but as mentioned in [2], it seems to
> be a normal thing for this kind of caching in networking:

Both 1 and 2 seem to be cases where the netdev teardown code can just
make sure to kick the respective queues and make sure there's nothing
outstanding: for (1), walk the defrag cache and clear out anything
related to the netdev going away; for (2), make sure to kick
net_rx_action() as part of the teardown.

> "Eric pointed out/predicted there's no guarantee that applications
> will read / close their sockets so a page pool page may be stuck in a
> socket (but not leaked) forever."

As for this one, I would put that in the "well, let's see if this
becomes a problem in practice" bucket.

-Toke