Message-ID: <9317bad6-aa89-4e93-b7d2-9e28f5d17cc8@gmail.com>
Date: Sat, 24 Jan 2026 11:04:31 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting
To: Jens Axboe, Yuhao Jiang
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20260119071039.2113739-1-danisjiang@gmail.com>
 <2919f3c5-2510-4e97-ab7f-c9eef1c76a69@kernel.dk>
 <8c6a9114-82e9-416e-804b-ffaa7a679ab7@kernel.dk>
 <2be71481-ac35-4ff2-b6a9-a7568f81f728@gmail.com>
 <2fcf583a-f521-4e8d-9a89-0985681ca85b@kernel.dk>
 <3b7e6088-7d92-4d5c-96c7-f8c0e2cc7745@kernel.dk>
 <596bc7ac-3d24-43a7-9e7e-e59189525ebc@gmail.com>
 <654fe339-5a2b-4c38-9d2d-28cfc306b307@kernel.dk>
From: Pavel Begunkov

On 1/23/26 16:52, Jens Axboe wrote:
> On 1/23/26 8:04 AM, Jens Axboe wrote:
>> On 1/23/26 7:50 AM, Jens Axboe wrote:
>>> On 1/23/26 7:26 AM, Pavel Begunkov wrote:
>>>> On 1/22/26 21:51, Pavel Begunkov wrote:
>>>> ...
>>>>>>>> I already briefly touched on that earlier, for sure not going to be of
>>>>>>>> any practical concern.
>>>>>>>
>>>>>>> Modest 16 GB can give 1M entries. Assuming 50ns-100ns per entry for the
>>>>>>> xarray business, that's 50-100ms. It's all serialised, so multiply by
>>>>>>> the number of CPUs/threads, e.g. 10-100, that's 0.5-10s. Account sky
>>>>>>> high spinlock contention, and it jumps again, and there can be more
>>>>>>> memory / CPUs / numa nodes. Not saying that it's worse than the
>>>>>>> current O(n^2), I have a test program that borderline hangs the
>>>>>>> system.

...

>> Should've tried 32x32 as well, that ends up going deep into "this sucks"
>> territory:
>>
>> git
>>
>> good luck

FWIW, current scales perfectly with CPUs, so just 1 thread
should be enough for testing.

>> git + user_struct
>>
>> axboe@r7625 ~> time ./ppage 32 32
>> register 32 GB, num threads 32
>>
>> ________________________________________________________
>> Executed in   16.34 secs    fish           external

That's as close to the calculations above as it could be. It was
100x16GB there, but that should only differ by a factor of ~1.5.
Without anchoring to this particular number, the problem is that
the wall clock runtime for the accounting will depend linearly on
the number of threads, so this 16 sec is what seemed concerning.

>>    usr time    0.54 secs  497.00 micros    0.54 secs
>>    sys time  451.94 secs   55.00 micros  451.94 secs
> ...
> and the crazier cases:

I don't think it's even crazy, thinking of databases with lots
of cache that they want to read into / write from. 100GB+
shouldn't be surprising.

> axboe@r7625 ~> time ./ppage 32 32
> register 32 GB, num threads 32
>
> ________________________________________________________
> Executed in    2.81 secs    fish           external
>    usr time    0.71 secs  497.00 micros    0.71 secs
>    sys time   19.57 secs  183.00 micros   19.57 secs
>
> which isn't insane. Obviously also needs conditional rescheduling in the
> page loops, as those can take a loooong time for large amounts of
> memory.

2.8 sec sounds like a lot as well, makes me wonder which part of
that is mm, but mm should scale fine-ish. Surely there will be
contention on page refcounts, but at least the table walk is
lockless in the best case scenario and otherwise seems to be
read protected by an rw lock.

-- 
Pavel Begunkov
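
For reference, the back-of-envelope estimate quoted in this thread can be
reproduced with a minimal sketch. It only restates the arithmetic from the
discussion: the 1M entries figure, the 50-100 ns per-entry cost and the
10-100 thread range are taken from the quoted text, while the function name
accounting_estimate and the assumption of perfectly serialised accounting
with no additional contention are illustrative, not measurements.

```python
def accounting_estimate(entries, ns_per_entry, threads):
    """Wall-clock seconds if per-entry accounting is fully serialised.

    Hypothetical helper: every thread's entries are processed one after
    another, so the total time grows linearly with the thread count.
    """
    per_thread_seconds = entries * ns_per_entry * 1e-9
    return per_thread_seconds * threads


# "Modest 16 GB can give 1M entries" (figure from the thread, not derived here).
ENTRIES = 1_000_000

# Best case: 50 ns per entry, 10 threads. Worst case: 100 ns, 100 threads.
low = accounting_estimate(ENTRIES, ns_per_entry=50, threads=10)
high = accounting_estimate(ENTRIES, ns_per_entry=100, threads=100)
print(f"serialised accounting: {low:.1f}s to {high:.1f}s")

# Work ratio between the estimated case (100 threads x 16 GB) and the
# measured one (32 threads x 32 GB), i.e. the "factor of ~1.5".
ratio = (100 * 16) / (32 * 32)
print(f"work ratio: {ratio:.2f}")
```

The sketch ignores spinlock contention and NUMA effects, which the thread
expects to push the real numbers higher.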