Date: Thu, 22 Jan 2026 11:43:28 +0000
From: Pavel Begunkov
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting
To: Jens Axboe, Yuhao Jiang
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20260119071039.2113739-1-danisjiang@gmail.com> <2919f3c5-2510-4e97-ab7f-c9eef1c76a69@kernel.dk> <8c6a9114-82e9-416e-804b-ffaa7a679ab7@kernel.dk> <2be71481-ac35-4ff2-b6a9-a7568f81f728@gmail.com> <2fcf583a-f521-4e8d-9a89-0985681ca85b@kernel.dk>
In-Reply-To: <2fcf583a-f521-4e8d-9a89-0985681ca85b@kernel.dk>

On 1/21/26 14:58, Jens Axboe wrote:
> On 1/20/26 2:45 PM, Pavel Begunkov wrote:
>> On 1/20/26 17:03, Jens Axboe wrote:
>>> On 1/20/26 5:05 AM, Pavel Begunkov wrote:
>>>> On 1/20/26 07:05, Yuhao Jiang wrote:
>> ...
>>>>>
>>>>> I've been implementing the xarray-based ref tracking approach for v3.
>>>>> While working on it, I discovered an issue with buffer cloning.
>>>>>
>>>>> If ctx1 has two buffers sharing a huge page, ctx1->hpage_acct[page] = 2.
>>>>> Clone to ctx2, now both have a refcount of 2. On cleanup both hit zero
>>>>> and unaccount, so we double-unaccount and user->locked_vm goes negative.
>>>>>
>>>>> The per-context xarray can't coordinate across clones - each context
>>>>> tracks its own refcount independently. I think we either need a global
>>>>> xarray (shared across all contexts), or just go back to v2. What do
>>>>> you think?
>>>>
>>>> The Jens' diff is functionally equivalent to your v1 and has
>>>> exactly same problems. Global tracking won't work well.
>>>
>>> Why not? My thinking was that we just use xa_lock() for this, with
>>> a global xarray. It's not like register+unregister is a high frequency
>>> thing. And if they are, then we've got much bigger problems than the
>>> single lock as the runtime complexity isn't ideal.
>>
>> 1. There could be quite a lot of entries even for a single ring
>>    with realistic amount of memory. If lots of threads start up
>>    at the same time taking it in a loop, it might become a choking
>>    point for large systems. Should be even more spectacular for
>>    some numa setups.
>
> I already briefly touched on that earlier, for sure not going to be of
> any practical concern.

A modest 16 GB can give 1M entries. Assuming 50-100 ns per entry for
the xarray business, that's 50-100 ms. It's all serialised, so multiply
by the number of CPUs/threads, e.g. 10-100, and that's 0.5-10 s. Account
for sky-high spinlock contention and it jumps again, and there can be
more memory / CPUs / NUMA nodes. I'm not saying that it's worse than the
current O(n^2); I have a test program that borderline hangs the system.

Look, I don't care what it'd be, whether it stutters or blows up the
kernel. I only took a quick look since you pinged me and were asking
"why not". If you don't want to consider my reasoning, as the
maintainer you can merge whatever you like, and it'll be easier for me
as I won't be wasting more time.

-- 
Pavel Begunkov
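
For reference, the back-of-envelope estimate in the message above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as stated (1M entries, 50-100 ns per entry, 10-100 threads fully serialised on one lock). The function name is hypothetical, and nothing here measures the kernel or models the extra spinlock contention mentioned in the text.

```python
def serialized_cost_seconds(entries, ns_per_entry, threads):
    """Wall time if every thread's per-entry work is serialised on one lock.

    Assumption: pure multiplication of the figures quoted in the message;
    no contention or NUMA effects are modelled.
    """
    return entries * ns_per_entry * threads / 1e9


entries = 1_000_000  # "16 GB can give 1M entries" (figure taken from the message)

# Lower bound: 50 ns per entry, 10 threads. Upper bound: 100 ns, 100 threads.
low = serialized_cost_seconds(entries, 50, 10)
high = serialized_cost_seconds(entries, 100, 100)

print(f"{low:.1f} s .. {high:.1f} s")
```

With those inputs the range comes out at 0.5 s to 10 s, matching the figures in the message. Contention and larger machines would only push it higher.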