Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and
 unmap them when destroying the pool
From: Pavel Begunkov
Date: Tue, 11 Mar 2025 13:25:11 +0000
To: Toke Høiland-Jørgensen, Mina Almasry, David Wei
Cc: Andrew Morton, Jesper Dangaard Brouer, Ilias Apalodimas,
 "David S. Miller", Yunsheng Lin, Yonglong Liu, Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Simon Horman, linux-mm@kvack.org,
 netdev@vger.kernel.org
In-Reply-To: <87cyeqml3d.fsf@toke.dk>
References: <20250308145500.14046-1-toke@redhat.com> <87cyeqml3d.fsf@toke.dk>
X-Mailing-List: netdev@vger.kernel.org

On 3/9/25 12:42, Toke Høiland-Jørgensen wrote:
> Mina Almasry writes:
>
>> On Sat, Mar 8, 2025 at 6:55 AM Toke Høiland-Jørgensen wrote:
>>>
>>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>>> they are released from the pool, to avoid the overhead of re-mapping the
>>> pages every time they are used. This causes problems when a device is
>>> torn down, because the page pool can't unmap the pages until they are
>>> returned to the pool. This causes resource leaks and/or crashes when
>>> there are pages still outstanding while the device is torn down, because
>>> page_pool will attempt an unmap of a non-existent DMA device on the
>>> subsequent page return.
>>>
>>> To fix this, implement a simple tracking of outstanding dma-mapped pages
>>> in page pool using an xarray. This was first suggested by Mina[0], and
>>> turns out to be fairly straightforward: We simply store pointers to
>>> pages directly in the xarray with xa_alloc() when they are first DMA
>>> mapped, and remove them from the array on unmap. Then, when a page pool
>>> is torn down, it can simply walk the xarray and unmap all pages still
>>> present there before returning, which also allows us to get rid of the
>>> get/put_device() calls in page_pool.
>>
>> THANK YOU!! I had been looking at the other proposals to fix this here
>> and there, and I had similar feelings to you. They add lots of code
>> changes, and the code changes themselves were hard for me to
>> understand. I hope we can make this simpler approach work.
>
> You're welcome :)
> And yeah, me too!
>
>>> Using xa_cmpxchg(), no additional
>>> synchronisation is needed, as a page will only ever be unmapped once.
>>>
>>
>> Very clever. I had been wondering how to handle the concurrency. I
>> also think this works.
>
> Thanks!
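
[For context, the scheme described above amounts to roughly the
following. This is an illustrative sketch, not the actual patch:
pool->dma_mapped, page->pp_dma_index, and __page_pool_dma_unmap() are
placeholder names.]

/* Pool init: an allocating xarray that hands out dense u32 IDs. */
xa_init_flags(&pool->dma_mapped, XA_FLAGS_ALLOC);

/* First DMA map of a page: record it and stash the ID in the page. */
static int page_pool_track_dma(struct page_pool *pool, struct page *page)
{
	u32 id;
	int err;

	err = xa_alloc(&pool->dma_mapped, &id, page, xa_limit_32b,
		       GFP_ATOMIC);
	if (err)
		return err;
	page->pp_dma_index = id;	/* wherever the ID ends up living */
	return 0;
}

/* Unmap: exactly one caller sees the page still in the array, so the
 * DMA unmap runs once, with no extra locking. Storing NULL never
 * allocates, hence the 0 gfp argument. */
static void page_pool_untrack_dma(struct page_pool *pool, struct page *page)
{
	if (xa_cmpxchg(&pool->dma_mapped, page->pp_dma_index,
		       page, NULL, 0) == page)
		__page_pool_dma_unmap(pool, page);
}

/* Pool teardown: unmap whatever is still outstanding, then free the
 * array, removing the need to keep the DMA device pinned. */
static void page_pool_unmap_outstanding(struct page_pool *pool)
{
	struct page *page;
	unsigned long id;

	xa_for_each(&pool->dma_mapped, id, page)
		page_pool_untrack_dma(pool, page);
	xa_destroy(&pool->dma_mapped);
}

[The dense IDs from xa_alloc() keep the array compact; that compactness
is why the patch wants to stash a small ID in struct page rather than
search the whole array on unmap.]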
>
>>> To avoid having to walk the entire xarray on unmap to find the page
>>> reference, we stash the ID assigned by xa_alloc() into the page
>>> structure itself, in the field previously called '_pp_mapping_pad' in
>>> the page_pool struct inside struct page. This field overlaps with the
>>> page->mapping pointer, which may turn out to be problematic, so an
>>> alternative is probably needed. Sticking the ID into some of the upper
>>> bits of page->pp_magic may work as an alternative, but that requires
>>> further investigation. Using the 'mapping' field works well enough as
>>> a demonstration for this RFC, though.
>>
>> I'm unsure about this. I think page->mapping may be used when we map
>> the page to userspace in TCP zerocopy, but I'm really not sure.
>> Yes, finding somewhere else to put the id would be ideal. Do we really
>> need a full unsigned long for the pp_magic?
>
> No, pp_magic was also my backup plan (see the other thread). Tried
> actually doing that now, and while there's a bit of complication due to
> the varying definitions of POISON_POINTER_DELTA across architectures,
> it seems that this can be defined at compile time. I'll send a v2
> RFC with this change.

FWIW, personally I like this one much more than an extra indirection
to pp. If we're out of space in the page, why can't we use the
struct page * itself as the index into the xarray? Along the lines of:

struct page *p = ...;
xa_store(xarray, /* index */ (unsigned long)p, p, GFP_KERNEL);

Indices wouldn't be nicely packed, but it's still a map. Is there a
problem with that I didn't consider?

[A fuller sketch of this pointer-keyed variant follows below.]

-- 
Pavel Begunkov
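
[Sketch of the pointer-keyed variant suggested above, again with
placeholder names. Here pool->dma_mapped would be a plain xarray,
initialised with xa_init(), since no ID allocation is involved.]

/* The page pointer itself is the key, so nothing needs to be stored
 * in struct page at all. */
static int page_pool_track_dma(struct page_pool *pool, struct page *page)
{
	return xa_err(xa_store(&pool->dma_mapped, (unsigned long)page,
			       page, GFP_ATOMIC));
}

static void page_pool_untrack_dma(struct page_pool *pool, struct page *page)
{
	/* xa_erase() returns the old entry, so only the caller that
	 * actually removed the page performs the unmap; the same
	 * "unmapped exactly once" property as the xa_cmpxchg() scheme. */
	if (xa_erase(&pool->dma_mapped, (unsigned long)page) == page)
		__page_pool_dma_unmap(pool, page);
}

[The trade-off Pavel alludes to: pointer values make for sparse,
scattered indices, so the tree sits deeper and packs entries less
densely than with the small sequential IDs xa_alloc() hands out,
costing more memory per tracked page.]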