From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 22 Jan 2026 11:43:28 +0000
From: Pavel Begunkov
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting
To: Jens Axboe, Yuhao Jiang
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
X-Mailing-List: stable@vger.kernel.org
References: <20260119071039.2113739-1-danisjiang@gmail.com>
 <2919f3c5-2510-4e97-ab7f-c9eef1c76a69@kernel.dk>
 <8c6a9114-82e9-416e-804b-ffaa7a679ab7@kernel.dk>
 <2be71481-ac35-4ff2-b6a9-a7568f81f728@gmail.com>
 <2fcf583a-f521-4e8d-9a89-0985681ca85b@kernel.dk>
In-Reply-To: <2fcf583a-f521-4e8d-9a89-0985681ca85b@kernel.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 1/21/26 14:58, Jens Axboe wrote:
> On 1/20/26 2:45 PM, Pavel Begunkov wrote:
>> On 1/20/26 17:03, Jens Axboe wrote:
>>> On 1/20/26 5:05 AM, Pavel Begunkov wrote:
>>>> On 1/20/26 07:05, Yuhao Jiang wrote:
>> ...
>>>>>
>>>>> I've been implementing the xarray-based ref tracking approach for v3.
>>>>> While working on it, I discovered an issue with buffer cloning.
>>>>>
>>>>> If ctx1 has two buffers sharing a huge page, ctx1->hpage_acct[page] = 2.
>>>>> Clone to ctx2, now both have a refcount of 2. On cleanup both hit zero
>>>>> and unaccount, so we double-unaccount and user->locked_vm goes negative.
>>>>>
>>>>> The per-context xarray can't coordinate across clones - each context
>>>>> tracks its own refcount independently. I think we either need a global
>>>>> xarray (shared across all contexts), or just go back to v2. What do
>>>>> you think?
>>>>
>>>> Jens' diff is functionally equivalent to your v1 and has
>>>> exactly the same problems. Global tracking won't work well.
>>>
>>> Why not? My thinking was that we just use xa_lock() for this, with
>>> a global xarray. It's not like register+unregister is a high frequency
>>> thing. And if they are, then we've got much bigger problems than the
>>> single lock as the runtime complexity isn't ideal.
>>
>> 1. There could be quite a lot of entries even for a single ring
>> with a realistic amount of memory. If lots of threads start up
>> at the same time taking it in a loop, it might become a choke
>> point for large systems. Should be even more spectacular for
>> some NUMA setups.
>
> I already briefly touched on that earlier, for sure not going to be of
> any practical concern.

A modest 16 GB can give 1M entries. Assuming 50ns-100ns per entry for
the xarray business, that's 50-100ms. It's all serialised, so multiply
by the number of CPUs/threads, e.g. 10-100, and that's 0.5-10s. Account
for sky-high spinlock contention and it jumps again, and there can be
more memory / CPUs / NUMA nodes. I'm not saying it's worse than the
current O(n^2); I have a test program that borderline hangs the system.

Look, I don't care what it'd be, whether it stutters or blows up the
kernel. I only took a quick look since you pinged me and were asking
"why not". If you don't want to consider my reasoning, as the maintainer
you can merge whatever you like, and it'll be easier for me as I won't
be wasting more time.

-- 
Pavel Begunkov
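
The estimate in the reply above can be reproduced with a small
back-of-envelope model. This is only a sketch of the arithmetic: the
function names are hypothetical, the inputs (1M entries, 50-100ns per
entry, 10-100 threads) are the illustrative figures from the email
rather than measurements, and the model assumes perfect serialisation
on one global lock with no extra contention cost.

```python
# Back-of-envelope model of the serialised registration cost.
# All inputs are the illustrative figures from the email, not measurements.

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def single_pass_ns(entries: int, ns_per_entry: int) -> int:
    """Time for one thread to touch every xarray entry once, in nanoseconds."""
    return entries * ns_per_entry


def serialised_ns(entries: int, ns_per_entry: int, threads: int) -> int:
    """Total time if every thread does a full pass under one global lock.

    Assumes the passes run strictly one after another and ignores the
    cost of lock contention itself, so under these assumptions the
    result is a lower bound.
    """
    return single_pass_ns(entries, ns_per_entry) * threads


if __name__ == "__main__":
    entries = 1_000_000  # "16 GB can give 1M entries" (figure from the email)
    # Best case and worst case from the ranges quoted in the email.
    for ns_per_entry, threads in ((50, 10), (100, 100)):
        one = single_pass_ns(entries, ns_per_entry)
        total = serialised_ns(entries, ns_per_entry, threads)
        print(f"{ns_per_entry} ns/entry: {one / NS_PER_MS:.0f} ms per thread, "
              f"{total / NS_PER_S:.1f} s across {threads} threads")
```

Integer nanoseconds are used throughout so the two ends of the range
come out exactly as 0.5 s and 10 s, with no floating-point rounding.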