From: Yury Norov
Date: Mon, 8 Jun 2026 21:06:26 -0400
To: Yury Norov
Cc: Yi Sun, mnazarewicz@gmail.com, akpm@linux-foundation.org, mina86@mina86.com, akinobu.mita@gmail.com, linux-kernel@vger.kernel.org, John Stultz, John Stultz
Subject: Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
References: <20260601094234.103863-1-yi.sun@unisoc.com>

On Mon, Jun 08, 2026 at 05:54:20PM -0400, Yury Norov wrote:
> On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote:
> > Test code has been added to PATCH v2.
> > No new APIs were introduced.
> >
> > Testing with the test code showed a performance improvement
> > of approximately 70%.
>
> No, it's not. Your numbers show approximately 50% improvement for
> the dense case, and approximately 2% slowdown for the sparse case.
>
> > Test result (random):
> >        orig_ns  orig_cnt  orig_average  new_ns   new_cnt  new_average  ratio
> > test1  1388885  1154      1203          462923   1308     353          70.7%
> > test2  1393616  1324      1052          736193   1212     607          42.3%
> > test3  1391693  1216      1144          735808   1260     583          49%
> > test4  1393231  1275      1092          742731   1402     529          51.6%
> > test5  1390731  1260      1103          737231   1274     578          47.6%
> >
> > Test result (sparse):
> >        orig_ns  orig_cnt  orig_average  new_ns   new_cnt  new_average  ratio
> > test1  4496077  322477    13            2419462  322480   7            46.2%
> > test2  7514731  322482    23            5785808  322476   17           26.1%
> > test3  7490692  322493    23            7654423  322483   23           0%
> > test4  7474500  322469    23            7628230  322483   23           0%
> > test5  7452692  322481    23            7663116  322478   23           0%
>
> The numbers look quite inconsistent. The first measurements are
> significantly faster for almost all experiments. In the 'new sparse'
> case the first run is 4 times faster than the others. And the ratio
> 0% is simply wrong.
>
> Please run the test on real hardware, not virtualized. Please build
> the test in, so it's executed at boot time, or make sure you're not
> running anything in parallel, like a GUI or networking.
>
> I gave your code a brief test on my QEMU setup, and I got a 43%
> improvement in the dense case, with p-value 0.001, and -8% for the
> sparse bitmap, with p-value 0.044, which is still significant.
>
> Overall not bad. But if some critical user actually has a sparse
> bitmap, they'll be disappointed. There aren't that many actual users
> of the function. For v5, can you CC at least the non-driver users?
>
> (The ARM GIC counts as non-driver, I believe.)
OK, I traced cma_alloc(), which calls the bitmap function through cma_range_alloc(), and the numbers look really strong:

Metric                 Before      After       Change
Trace span             194.0 ms    87.1 ms     -55.1%
Total CMA alloc time   48.46 ms    16.11 ms    -66.8%
Avg alloc latency      184.94 us   61.49 us    -66.8%
Median alloc latency   73.72 us    20.59 us    -72.1%
p90 alloc latency      329.76 us   55.63 us    -83.1%
p99 alloc latency      1866.76 us  859.83 us   -53.9%
Max alloc latency      4821.91 us  2324.41 us  -51.8%

By request size:

Request     Before avg  After avg  Change
1 page      79.68 us    34.47 us   -56.7%
256 pages   285.50 us   87.30 us   -69.4%

I ran it on QEMU, but the improvement is large enough that I expect it to reproduce on bare metal. The tracing command is:

  sudo trace-cmd record \
      -o cma-dmabuf.dat \
      -b 65536 \
      -e cma:cma_alloc_start \
      -e cma:cma_alloc_finish \
      -e cma:cma_alloc_busy_retry \
      -e cma:cma_release \
      -- kselftest/dmabuf-heaps/dmabuf-heap

Can you run it on your side before sending v5, and share your results?

Adding John Stultz, the test author.

Hi John,

This series significantly improves the underlying bitmap_find_next_zero_area_off() for an average bitmap, but shows a ~8% slowdown for sparse bitmaps. With your CMA allocator test, the results are even stronger than with the synthetic benchmark, and there seem to be no drawbacks.

Can you comment on the results and maybe reproduce them on your side? Are you, or is anyone, aware of any other useful tests for the CMA allocator? How important is the sparse bitmap case overall?

Thanks,
Yury