From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 17 Mar 2026 09:27:50 +0100
X-Mailing-List: linux-xfs@vger.kernel.org
From: "Vlastimil Babka (SUSE)"
Subject: Re: [PATCH v3] iomap: add allocation cache for iomap_dio
To: changfengnan
Cc: Dave Chinner, Harry Yoo, Hao Li, guzebing, brauner@kernel.org,
 djwong@kernel.org, hch@infradead.org, linux-xfs@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 guzebing@bytedance.com, syzbot@syzkaller.appspotmail.com, linux-mm@kvack.org
References: <20260115021108.1913695-1-guzebing1612@gmail.com>

On 3/17/26 08:28, changfengnan wrote:
>
>> That suggests in that test you used larger capacity than the automatically
>> calculated.
> The 10% improvement is due to every cache having sheaves.
> When I tested 256-byte objects, the default sheaf_capacity is 26; allocating
> and freeing 32 objects did not show a noticeable difference, but allocating
> and freeing 128 objects resulted in a significant improvement, about 3-4x in
> a multithreaded environment, and about 12% in a single thread.

Great!

>>
>> > I'm thinking that maybe these improvements may not be significant enough
>> > to see the effect in the io flow.
>> > Using a simple list seems to be the most efficient approach.
>>
>> I think the question is, what improvement do you now see with your added
>> pcpu cache vs kmalloc() when 7.0-rc4 is used as the baseline?
>
> On 7.0-rc4, the pcpu cache gets 1.20M IOPS, kmalloc() gets 1.19M IOPS, and
> the new cache with sheaf_capacity set to 256 gets 1.19M IOPS.
> On 6.19, the pcpu cache gets 1.20M IOPS, kmalloc() gets 1.17M IOPS, and the
> new cache with sheaf_capacity set to 256 gets 1.19M IOPS.

Thanks a lot for that data.
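For reference, and assuming the recent slab API where kmem_cache_create()
takes a struct kmem_cache_args (the form the sheaves series builds on), the
"new cache with sheaf_capacity set to 256" from the numbers above would be
set up roughly like this; the init function name is made up for illustration:

	/* Sketch only: assumes the kmem_cache_args-based kmem_cache_create()
	 * and the sheaf_capacity field from the sheaves series.
	 */
	static struct kmem_cache *iomap_dio_cachep;

	static int __init iomap_dio_cache_init(void)
	{
		struct kmem_cache_args args = {
			.sheaf_capacity = 256,	/* capacity used in the test */
		};

		iomap_dio_cachep = kmem_cache_create("iomap_dio",
						     sizeof(struct iomap_dio),
						     &args, 0);
		return iomap_dio_cachep ? 0 : -ENOMEM;
	}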
My conclusion is that kmalloc() before sheaves did indeed perform worse, and
the custom pcpu cache improved it relatively more. kmalloc() with sheaves does
better, so the improvement from the custom pcpu cache is smaller. Also, the
default sheaf capacity seems to be enough for this workload.

IO is not my area, but going from 1.19M to 1.20M IOPS doesn't look like it's
worth the custom code? (Possibly going from 1.17M to 1.20M wasn't either.)
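For readers following the thread: the "custom pcpu cache using a simple list"
being weighed against plain kmalloc() here is essentially a per-CPU LIFO
freelist. A minimal userspace model of the idea (single-threaded, so none of
the preemption/IRQ protection the kernel version needs; all names are made up
for illustration):

```c
#include <stdlib.h>
#include <stddef.h>

/* Model of a per-CPU allocation cache built on a simple singly-linked
 * freelist: freed objects are pushed onto the list, allocations pop
 * from it, and the real allocator is only hit on a miss. One instance
 * stands in for one CPU's cache.
 */
struct obj_cache {
	void *head;		/* top of the freelist */
	size_t obj_size;	/* must hold at least one pointer */
	unsigned int nr_cached;	/* objects currently on the list */
	unsigned int capacity;	/* cap on cached objects */
};

static void *cache_alloc(struct obj_cache *c)
{
	if (c->head) {			/* fast path: pop the list head */
		void *obj = c->head;

		c->head = *(void **)obj;
		c->nr_cached--;
		return obj;
	}
	return malloc(c->obj_size);	/* slow path: real allocator */
}

static void cache_free(struct obj_cache *c, void *obj)
{
	if (c->nr_cached < c->capacity) {	/* fast path: push */
		*(void **)obj = c->head;
		c->head = obj;
		c->nr_cached++;
		return;
	}
	free(obj);			/* cache full: return to allocator */
}
```

Both fast paths are a couple of pointer operations, which is why such a cache
can beat a general-purpose allocator; sheaves give kmem caches a comparable
per-CPU array-based fast path without per-subsystem code like this.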