From: SeongJae Park
To: Lorenzo Stoakes
Cc: SeongJae Park, "Liam R. Howlett", Davidlohr Bueso, Andrew Morton,
    David Hildenbrand, Shakeel Butt, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v2 4/4] mm/madvise: remove redundant mmap_lock
    operations from process_madvise()
Date: Mon, 19 May 2025 11:25:44 -0700
Message-Id: <20250519182544.45603-1-sj@kernel.org>
In-Reply-To: <371ec2c6-01d9-4deb-a234-aacad94680c5@lucifer.local>

On Sat, 17 May 2025 20:28:49 +0100 Lorenzo Stoakes wrote:

> On Fri, Jan 31, 2025 at 05:51:45PM +0000, Lorenzo Stoakes wrote:
> > On Fri, Jan 31, 2025 at 12:47:24PM -0500, Liam R. Howlett wrote:
> > > * Davidlohr Bueso [250131 12:31]:
> > > > On Fri, 31 Jan 2025, Lorenzo Stoakes wrote:
> > > >
> > > > > On Thu, Jan 16, 2025 at 05:30:58PM -0800, SeongJae Park wrote:
> > > > > > Optimize redundant mmap lock operations from process_madvise()
> > > > > > by directly doing the mmap locking first, and then doing the
> > > > > > remaining work for all ranges in the loop.
> > > > > >
> > > > > > Signed-off-by: SeongJae Park
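To unpack what the patch does: a lock/unlock pair per madvise range
becomes a single lock/unlock pair around the whole batch.  Below is a
toy userspace analogy of just that difference - it is not the kernel
code; fake_mmap_lock, touch_range() and NR_RANGES are made-up stand-ins
for mmap_lock, the per-range madvise work, and the batch size.  Build
with gcc -pthread.

#include <pthread.h>
#include <stddef.h>

#define NR_RANGES 8	/* a UIO_FASTIOV-sized batch */

static pthread_rwlock_t fake_mmap_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for the per-range madvise work. */
static void touch_range(size_t i)
{
	(void)i;
}

/* Old behavior: lock and unlock once per range. */
static void per_range_locking(void)
{
	for (size_t i = 0; i < NR_RANGES; i++) {
		pthread_rwlock_rdlock(&fake_mmap_lock);
		touch_range(i);
		pthread_rwlock_unlock(&fake_mmap_lock);
	}
}

/* New behavior: one lock/unlock pair around the whole batch. */
static void per_batch_locking(void)
{
	pthread_rwlock_rdlock(&fake_mmap_lock);
	for (size_t i = 0; i < NR_RANGES; i++)
		touch_range(i);
	pthread_rwlock_unlock(&fake_mmap_lock);
}

int main(void)
{
	per_range_locking();
	per_batch_locking();
	return 0;
}

The batched form saves the repeated lock/unlock operations, but it also
stretches a single hold of the lock across the whole batch - which is
exactly the contention question raised below.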
> > > > > I wonder if this might increase lock contention because now all
> > > > > of the vector operations will hold the relevant mm lock without
> > > > > releasing after each operation?
> > > >
> > > > That was exactly my concern. While afaict the numbers presented in
> > > > v1 are quite nice, this is ultimately a micro-benchmark, where no
> > > > other unrelated threads are impacted by these new hold times.
> > >
> > > Indeed, I was also concerned about this scenario.
> > >
> > > But this method does have the added advantage of keeping the vma
> > > space in the same state as it was expected during the initial call -
> > > although the race does still exist on looking vs acting on the data.
> > > This would just remove the intermediate changes.
> > >
> > > > > Probably it's ok given limited size of iov, but maybe in future
> > > > > we'd want to set a limit on the ranges before we drop/reacquire
> > > > > lock?
> > > >
> > > > imo, this should best be done in the same patch/series. Maybe
> > > > extend the benchmark to use IOV_MAX and find a sweet spot?
> > >
> > > Are you worried this is over-engineering for a problem that may
> > > never be an issue, or is there a particular usecase you have in
> > > mind?
> > >
> > > It is probably worth investigating, and maybe a potential usecase
> > > would help with the targeted sweet spot?
> >
> > Keep in mind process_madvise() is not limited by IOV_MAX, which can be
> > rather high, but rather UIO_FASTIOV, which is limited to 8 entries.
> >
> > (Some have been surprised by this limitation...!)
>
> Surprised, perhaps because I was wrong about this :) Apologies for
> that.
>
> SJ raised this in [0] and the non-RFC version of this series is over
> at [1].
>
> [0]: https://lore.kernel.org/all/20250517162048.36347-1-sj@kernel.org/
> [1]: https://lore.kernel.org/all/20250206061517.2958-1-sj@kernel.org/

I actually mentioned[1] that I think the real limit is UIO_MAXIOV, but
even that wouldn't be a real problem, since users can tune the batching
size. Indeed, jemalloc has made a change[2] to use process_madvise()
with a batch size of up to 128. I impatiently sent[3] the next revision
without giving you enough time to reply, though.

> We should revisit this and determine whether the drop/reacquire lock
> is required, perhaps doing some experiments around heavy operations
> using UIO_MAXIOV entries?
>
> SJ - could you take a look at this please?

We had a chance to test this against a production workload and found no
visible regression, though that workload does not call process_madvise()
intensively. Our internal testing of kernels carrying this change also
hasn't found any problem so far, though to the best of my knowledge
those process_madvise() calls are not intensive either.

So my thinking about UIO_MAXIOV is unchanged: I anticipate no issue
(until someone yells ;) ) and have found no evidence of a problem. But,
as in the previous discussion[1], I agree that more testing would be
good, while I have no good list of benchmarks for this. It would be
nice if someone could suggest suitable benchmarks.
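For concreteness, the caller-side pattern we are talking about is
roughly the sketch below.  The target pid and the address ranges are
made up (so it does nothing useful as-is), error handling is trimmed,
and it needs Linux 5.10+ for process_madvise():

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open		434
#endif
#ifndef __NR_process_madvise
#define __NR_process_madvise	440
#endif
#ifndef MADV_COLD
#define MADV_COLD		20
#endif

#define NR_RANGES 8	/* one UIO_FASTIOV-sized batch */

int main(void)
{
	pid_t target = 1234;	/* made-up target process */
	struct iovec iov[NR_RANGES];
	ssize_t ret;
	int pidfd;

	pidfd = syscall(__NR_pidfd_open, target, 0);
	if (pidfd < 0) {
		perror("pidfd_open");
		return 1;
	}

	/* Made-up ranges; a real caller would use the target's mappings. */
	for (int i = 0; i < NR_RANGES; i++) {
		iov[i].iov_base = (void *)(0x700000000000UL + i * 0x200000UL);
		iov[i].iov_len = 0x200000UL;
	}

	/*
	 * One syscall covers all NR_RANGES ranges - and with this patch,
	 * one mmap_lock acquisition instead of NR_RANGES of them.
	 */
	ret = syscall(__NR_process_madvise, pidfd, iov, NR_RANGES,
		      MADV_COLD, 0);
	if (ret < 0)
		perror("process_madvise");

	close(pidfd);
	return ret < 0;
}

A stress test for the lock hold time question would presumably issue
such calls with vlen raised toward UIO_MAXIOV (1024 entries, the real
cap mentioned above) against a target whose other threads are
concurrently faulting.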
> > So I think at this point scaling isn't a huge issue, I raise it
> > because in future we may want to increase this limit, at which point
> > we should think about it, which is why I sort of hand-waved it away
> > a bit.
>
> Again as I said here, I suspect _probably_ this won't be too much of
> an issue - but it is absolutely one we need to address.

Yes, I agree :)

[1] https://lore.kernel.org/20250204195343.16500-1-sj@kernel.org
[2] https://github.com/jemalloc/jemalloc/pull/2794/commits/c3604456d4c1f570348a
[3] https://lore.kernel.org/20250206062801.3060-1-sj@kernel.org

Thanks,
SJ

[...]