Date: Sun, 31 May 2026 00:35:27 +0000
From: Jaegeuk Kim
To: Matthew Wilcox
Cc: Theodore Tso, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-f2fs-devel@lists.sourceforge.net, Christoph Hellwig,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Akilesh Kailash,
    Christian Brauner
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number
References: <20260521155748.GA79343@macsyma-wired.lan> <20260522141115.GA8258@macsyma-wired.lan> <20260522224108.GA18663@macsyma-wired.lan>
X-Mailing-List: linux-api@vger.kernel.org

On 05/28, Matthew Wilcox wrote:
> On Tue, May 26, 2026 at 01:10:55AM +0000, Jaegeuk Kim wrote:
> > Background
> > ----------
> > The primary use case is accelerating AI model loading, which demands
> > exceptionally high sequential read speeds. In our benchmarks on embedded
> > systems:
> > - Using high-order page allocations allows the system to saturate the
> >   Universal Flash Storage (UFS) bandwidth, reaching 4 GB/s even at
> >   medium-to-low CPU frequencies.
> > - In contrast, standard small folios cap performance at 2 GB/s.
> >
> > The performance doubling stems directly from reducing CPU cycle overhead during
> > memory allocation.
>
> When you say "AI model loading", are you mmap()ing the file of weights,
> or are you calling read() to load the file into anonymous memory?
>
> This matters because for the first operation, you need to allocate folios
> of PMD size in order to make best use of TLB entries.
> For the second operation, it's more important to iterate through the file
> quickly, freeing folios behind you after you access them so they're
> available for the next batch.

We deal with multiple options, though. What I'm mostly looking at is
preloading models with mmap(MAP_POPULATE), which takes the readahead path
that bumps up the order by 2. Previously I also looked at fadvise(WILLNEED),
but gave up due to the broken interface. OTOH, we use RWF_DONTCACHE for the
read() case, but I don't think it's ideal for the best loading performance.
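
For reference, the mmap(MAP_POPULATE) preloading case mentioned above could
look roughly like the sketch below. This is only an illustration under stated
assumptions: the helper name map_model_populated() is hypothetical and not
taken from any real loader, the whole file is mapped read-only in one call,
and whether large folios are actually used depends on the filesystem and
kernel, not on anything in this code.

```c
#define _GNU_SOURCE          /* exposes MAP_POPULATE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (name is an assumption for illustration): map a whole
 * file read-only and ask the kernel to prefault its pages during mmap().
 * Returns the mapping and stores its length in *len_out, or returns NULL on
 * failure (including an empty file, which cannot be mapped).
 */
static void *map_model_populated(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* For a file mapping, MAP_POPULATE triggers read-ahead on the file. */
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ,
                      MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        fprintf(stderr, "mmap(%s): %s\n", path, strerror(errno));
        return NULL;
    }

    *len_out = (size_t)st.st_size;
    return addr;
}
```

A caller would pass the path of the weights file, use the returned pointer as
a read-only view of the model, and munmap() it with the stored length when
done. Note that mmap() does not report an error if the populate step itself
falls short, so the first access may still fault.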