From: Barry Song <21cnbao@gmail.com>
Date: Wed, 18 Jun 2025 18:30:26 +0800
Subject: Re: [PATCH v4] mm: use per_vma lock for MADV_DONTNEED
To: David Hildenbrand
Cc: Lance Yang, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Barry Song, "Liam R. Howlett",
 Vlastimil Babka, Jann Horn, Suren Baghdasaryan, Lokesh Gidra,
 Tangquan Zheng, Qi Zheng, Lance Yang, Lorenzo Stoakes, Zi Li
References: <20250607220150.2980-1-21cnbao@gmail.com>
 <309d22ca-6cd9-4601-8402-d441a07d9443@lucifer.local>

On
Wed, Jun 18, 2025 at 6:18 PM David Hildenbrand wrote:
>
> On 18.06.25 11:52, Barry Song wrote:
> > On Wed, Jun 18, 2025 at 10:25 AM Lance Yang wrote:
> >>
> >> Hi all,
> >>
> >> Crazy, the per-VMA lock for madvise is an absolute game-changer ;)
> >>
> >> On 2025/6/17 21:38, Lorenzo Stoakes wrote:
> >> [...]
> >>>
> >>> On Sun, Jun 08, 2025 at 10:01:50AM +1200, Barry Song wrote:
> >>>> From: Barry Song
> >>>>
> >>>> Certain madvise operations, especially MADV_DONTNEED, occur far more
> >>>> frequently than other madvise options, particularly in native and Java
> >>>> heaps for dynamic memory management.
> >>>>
> >>>> Currently, the mmap_lock is always held during these operations, even when
> >>>> unnecessary. This causes lock contention and can lead to severe priority
> >>>> inversion, where low-priority threads, such as Android's HeapTaskDaemon,
> >>>> hold the lock and block higher-priority threads.
> >>>>
> >>>> This patch enables the use of per-VMA locks when the advised range lies
> >>>> entirely within a single VMA, avoiding the need for full VMA traversal. In
> >>>> practice, userspace heaps rarely issue MADV_DONTNEED across multiple VMAs.
> >>>>
> >>>> Tangquan's testing shows that over 99.5% of memory reclaimed by Android
> >>>> benefits from this per-VMA lock optimization. After extended runtime,
> >>>> 217,735 madvise calls from HeapTaskDaemon used the per-VMA path, while
> >>>> only 1,231 fell back to mmap_lock.
> >>>>
> >>>> To simplify handling, the implementation falls back to the standard
> >>>> mmap_lock if userfaultfd is enabled on the VMA, avoiding the complexity of
> >>>> userfaultfd_remove().
> >>>>
> >>>> Many thanks to Lorenzo's work[1] on:
> >>>> "Refactor the madvise() code to retain state about the locking mode
> >>>> utilised for traversing VMAs.
> >>>>
> >>>> Then use this mechanism to permit VMA locking to be done later in the
> >>>> madvise() logic and also to allow altering of the locking mode to permit
> >>>> falling back to an mmap read lock if required."
> >>>>
> >>>> One important point, as pointed out by Jann[2], is that
> >>>> untagged_addr_remote() requires holding mmap_lock. This is because
> >>>> address tagging on x86 and RISC-V is quite complex.
> >>>>
> >>>> Until untagged_addr_remote() becomes atomic (which seems unlikely in
> >>>> the near future), we cannot support per-VMA locks for remote processes.
> >>>> So for now, only local processes are supported.
> >>
> >> Just to put some numbers on it, I ran a micro-benchmark with 100
> >> parallel threads, where each thread calls madvise() on its own 1GiB
> >> chunk of 64KiB mTHP-backed memory. The performance gain is huge:
> >>
> >> 1) MADV_DONTNEED saw its average time drop from 0.0508s to 0.0270s (~47%
> >> faster)
> >> 2) MADV_FREE saw its average time drop from 0.3078s to 0.1095s (~64%
> >> faster)
> >
> > Thanks for the report, Lance. I assume your micro-benchmark includes some
> > explicit or implicit operations that may require mmap_write_lock(),
> > since mmap_read_lock() only waits for writers and does not block other
> > mmap_read_lock() calls.
>
> The numbers rather indicate that one test was run with (m)THPs enabled
> and the other not? Just a thought. The locking overhead from my
> experience is not that significant.

Right. I don't expect pure madvise_dontneed/free, without any additional
behavior requiring mmap_write_lock, to improve performance significantly.
The main benefit would be avoiding contention on the write lock.
Consider this scenario:

timestamp1: Thread A acquires the read lock
timestamp2: Thread B attempts to acquire the write lock
timestamp3: Threads C, D, and E attempt to acquire the read lock

In this case, thread B must wait for A, and threads C, D, and E will wait
for both A and B. Any write lock request effectively blocks all subsequent
read acquisitions.

In the worst case, thread A might be a GC thread with a high nice value
(that is, a low scheduling priority). If it's preempted by other threads,
the delay can reach several milliseconds, as we've observed in some cases.

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
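
To make the A/B/C/D/E scenario above concrete, here is a minimal sketch of a
strictly FIFO reader/writer lock queue. It is a toy model for illustration
only, not the kernel's rwsem implementation; the function name simulate(),
the hold times, and the time units are all assumptions made up for this
example.

```python
import heapq
from collections import deque


def simulate(arrivals, hold):
    """Toy model of a strictly FIFO reader/writer lock (not the kernel rwsem).

    arrivals: list of (time, name, mode), mode is "r" (read) or "w" (write).
    hold:     dict name -> how long that thread keeps the lock once granted.
    Returns a dict name -> time at which the lock was granted.
    """
    granted = {}
    active = []                    # min-heap of (release_time, name, mode)
    waiting = deque()              # FIFO queue of (name, mode)
    pending = deque(sorted(arrivals))

    def writer_active():
        return any(mode == "w" for _, _, mode in active)

    def grant(name, mode, now):
        granted[name] = now
        heapq.heappush(active, (now + hold[name], name, mode))

    def drain(now):
        # Hand the lock to waiters strictly in queue order.
        while waiting:
            name, mode = waiting[0]
            if mode == "w" and active:           # writer needs exclusivity
                break
            if mode == "r" and writer_active():  # readers wait for the writer
                break
            waiting.popleft()
            grant(name, mode, now)

    while pending or active:
        next_arrival = pending[0][0] if pending else float("inf")
        next_release = active[0][0] if active else float("inf")
        if next_release <= next_arrival:
            now = next_release
            heapq.heappop(active)                # earliest holder releases
        else:
            now, name, mode = pending.popleft()
            waiting.append((name, mode))         # new request joins the queue
        drain(now)
    return granted


# Hypothetical numbers: A is a preempted reader that holds the lock for 5 units.
arrivals = [(1, "A", "r"), (2, "B", "w"), (3, "C", "r"), (3, "D", "r"), (3, "E", "r")]
hold = {"A": 5, "B": 1, "C": 1, "D": 1, "E": 1}

with_writer = simulate(arrivals, hold)
without_writer = simulate([a for a in arrivals if a[1] != "B"], hold)
print(with_writer)
print(without_writer)
```

In this model the readers C, D, and E get the lock at time 3 when no writer
is queued, but only at time 7 once B is waiting behind A. A madvise caller
that takes the per-VMA lock instead of mmap_lock is not in this queue at all,
which is the contention the patch aims to avoid.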