From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57427C47DDB for ; Mon, 29 Jan 2024 00:21:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 645F26B0071; Sun, 28 Jan 2024 19:21:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5CF2A6B0072; Sun, 28 Jan 2024 19:21:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 46EF86B0074; Sun, 28 Jan 2024 19:21:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 310C66B0071 for ; Sun, 28 Jan 2024 19:21:23 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id A11E5A196E for ; Mon, 29 Jan 2024 00:21:22 +0000 (UTC) X-FDA: 81730444404.01.0C8394D Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf15.hostedemail.com (Postfix) with ESMTP id C55EEA0007 for ; Mon, 29 Jan 2024 00:21:20 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b="YrQM/5A1"; spf=none (imf15.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706487681; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TtcvkfpjeeZrMMvO5FMe9b4Sy38+nE76hEUK8iZvFVU=; b=uThXBeHbFwGorPniOGf5SuRBtu816JPNDlerBYgWqh4wua3A666SWzO6XEPql642X7N2n2 YRk9M1rIsEq5LwV5UnRZptUuyon9d30XqQg0r1004DQ3V9CrVF3CSWjQk41PkYBNi1AnON BQT7bmrSFUFoYqIKF6FIJut5VkLFUGI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706487681; a=rsa-sha256; cv=none; b=dxbtFjn07vPLCvHczbIeldSqwgxZFc6VOTWn5MpWPrNbLbVs1vS6f3+CepJM0UxWI/E50B 9brD+bX8adb4sJjZWGo1pCdQjUdxOO5jwZmAM+0rrybNRel3XWqhg/1VlZLUcJNH1vYtWa rGLjn9ZcTbX8DORXCvqlViPGsCT1UPQ= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b="YrQM/5A1"; spf=none (imf15.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=TtcvkfpjeeZrMMvO5FMe9b4Sy38+nE76hEUK8iZvFVU=; b=YrQM/5A1V/xN0zBHd0Tl1PDh5h fpS54DN4lYo/murXbdbzG5CGsZgqP4yal7mkBQxPIknYMVeUyrtxTrznzLT/PZ3SVlLnQC6UX9lQW 7KiIIj+yhOrmb5ws5z+GPF0x3cG5hQoI2nJhRwl6e4K02a1shj9GKSqGE//taor2XTj2rDiMNuBS/ IOtwCbsx2ofToRNyQZpz2/uMf3+rApIosBa8VWvFn17naRWtZz22PihuWw5NVf99FRUalUFwznIz8 fb7uWzn/y8fp3f1/mXWfVTV9rJHZQRTY0CAk25yp1fHXFQPxK9JoLJI+psU6AhNq616ODzUDdHl68 Zn5hctlA==; Received: from willy by casper.infradead.org with local (Exim 4.97.1 #2 (Red Hat Linux)) id 1rUFOm-00000004ynq-1UyX; Mon, 29 Jan 2024 00:21:16 +0000 Date: Mon, 29 Jan 2024 00:21:16 +0000 From: Matthew Wilcox To: Mike Snitzer Cc: Ming Lei , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Don Dutile , Raghavendra K T , Alexander Viro , Christian Brauner Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range Message-ID: References: <20240128142522.1524741-1-ming.lei@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: C55EEA0007 X-Rspam-User: X-Stat-Signature: ddd5ru1igwyok4jwecbhz8aarymwwwd5 X-Rspamd-Server: rspam03 X-HE-Tag: 1706487680-680596 X-HE-Meta: U2FsdGVkX183IQh/hr452V6R1RgRLs32k8i/TenTucfaxTZbvdRPk5zK8M8llAUVm8z59sPFy0tdPJU1VQkF5b+pe3nAR+0d5R68hlYzD61zpA4ViJ1d6V3+VcIhj8Fr7rapDAQXiPzvRAgNS+YoI1j8TNUy03pk8Kv/T11LpwrSs1DfEUGK18Jrkfi09zlP3xnjVyytB/VHM3OE+N72Az0XDEVNPyF9CZupHZvGxU6o0WCW0iqD5unzn4ekSzeyHOSCWMvmFxTYSUrwuEwdEWoQaab95yuYVfXMnLQeLm72tTyP2LBAn9R/pyyWW2lw2eyJdEuJ1CiqbTEIg82bMenyEzKLtX6U4bN5GRsPK8x6rKaACUoBbSqs8/7xiQMfrhJj5eF1gzc95bWZ8oDOdjT2veorgj6kPXgyaqqXlAGx0+OKgjLuwS2+Nuzz181Si62jLff++RfLBhz5n2i8GeZGkhSozX9AnRvffBzynNnI14DdVgyOW5sMg6znEfFfdWXkUXOo9t6Cb0xF2NSkfTC4DEMVDgk8R6zJyhe69k77Zun4tvqDIUOgJiqJf6Z2O15wZLJ0x8qYtG66UzefZbT3ENPL3tIDiznXrt07K3o0OiTMyuu8tTsagKIdEDqlQ+DMytu/k2RPA3DpCBBD7a/FJW1IwIwgdwcuCkoZD3OnjcTtz2ITbxenaONVCKnPtPCkOXg3zbvFLHukWeu5+fDW9c9N6G1liRvBMdWnoyAHTkERWZ2mO2lZ7I5SNQrMBvDnOlIh6+vGhkvL6yNCcvMnuJmourR93NwPySx5lOupbKJtiWZUkilzOFKPs6LEYmq5hPPso3aoxrcksKEiqeF2xgEHBp4VFXYAqsKlTIi8uXT8ZFou8pZyh028Jhn9iISyEMax+s0Jgm/UtvJ2oXkIi+jl7N2JhJsfUys55yP9nusUo7X83NjbxR8Oua+dmbv7iO5PqzaQwGXxOAl RI9Dsxyf ZeUu0eAiG2udiQwrF15Uy+lgyO7gzWLeWL0q7THdQ8nXNqv/3ljiniaOPGR+5VJwjcJrAhkLN/oMVWjerT6MuLFX91VkZks4jbDtMDWKEwN5yG6nI2qpA+/BATDtKZMxHEkj5YKEIRwjsPNc0sWUDFstaRHMSPbCo8KCVDJe8H9vSEpwKsQqN1WczZitbx7S4jhed X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote: > On Sun, Jan 28 2024 at 5:02P -0500, > Matthew Wilcox wrote: > > > On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote: > > > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for > > > memoryless NUMA nodes and limit readahead max_pages"), ADV_WILLNEED > > > only tries to readahead 512 pages, and the remained part in the advised > > > range fallback on normal readahead. > > > > Does the MAINTAINERS file mean nothing any more? > > "Ming, please use scripts/get_maintainer.pl when submitting patches." That's an appropriate response to a new contributor, sure. Ming has been submitting patches since, what, 2008? Surely they know how to submit patches by now. > I agree this patch's header could've worked harder to establish the > problem that it fixes. But I'll now take a crack at backfilling the > regression report that motivated this patch be developed: Thank you. > Linux 3.14 was the last kernel to allow madvise (MADV_WILLNEED) > allowed mmap'ing a file more optimally if read_ahead_kb < max_sectors_kb. > > Ths regressed with commit 6d2be915e589 (so Linux 3.15) such that > mounting XFS on a device with read_ahead_kb=64 and max_sectors_kb=1024 > and running this reproducer against a 2G file will take ~5x longer > (depending on the system's capabilities), mmap_load_test.java follows: > > import java.nio.ByteBuffer; > import java.nio.ByteOrder; > import java.io.RandomAccessFile; > import java.nio.MappedByteBuffer; > import java.nio.channels.FileChannel; > import java.io.File; > import java.io.FileNotFoundException; > import java.io.IOException; > > public class mmap_load_test { > > public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException { > if (args.length == 0) { > System.out.println("Please provide a file"); > System.exit(0); > } > FileChannel fc = new RandomAccessFile(new File(args[0]), "rw").getChannel(); > MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size()); > > System.out.println("Loading the file"); > > long startTime = System.currentTimeMillis(); > mem.load(); > long endTime = System.currentTimeMillis(); > System.out.println("Done! Loading took " + (endTime-startTime) + " ms"); > > } > } It's good to have the original reproducer. The unfortunate part is that being at such a high level, it doesn't really show what syscalls the library makes on behalf of the application. I'll take your word for it that it calls madvise(MADV_WILLNEED). An strace might not go amiss. > reproduce with: > > javac mmap_load_test.java > echo 64 > /sys/block/sda/queue/read_ahead_kb > echo 1024 > /sys/block/sda/queue/max_sectors_kb > mkfs.xfs /dev/sda > mount /dev/sda /mnt/test > dd if=/dev/zero of=/mnt/test/2G_file bs=1024k count=2000 > > echo 3 > /proc/sys/vm/drop_caches (I prefer to unmount/mount /mnt/test; it drops the cache for /mnt/test/2G_file without affecting the rest of the system) > java mmap_load_test /mnt/test/2G_file > > Without a fix, like the patch Ming provided, iostat will show rareq-sz > is 64 rather than ~1024. Understood. But ... the application is asking for as much readahead as possible, and the sysadmin has said "Don't readahead more than 64kB at a time". So why will we not get a bug report in 1-15 years time saying "I put a limit on readahead and the kernel is ignoring it"? I think typically we allow the sysadmin to override application requests, don't we? > > > @@ -972,6 +974,7 @@ struct file_ra_state { > > > unsigned int ra_pages; > > > unsigned int mmap_miss; > > > loff_t prev_pos; > > > + struct maple_tree *need_mt; > > > > No. Embed the struct maple tree. Don't allocate it. > > Constructive feedback, thanks. > > > What made you think this was the right approach? > > But then you closed with an attack, rather than inform Ming and/or > others why you feel so strongly, e.g.: Best to keep memory used for > file_ra_state contiguous. That's not an attack, it's a genuine question. Is there somewhere else doing it wrong that Ming copied from? Does the documentation need to be clearer? I can't fix what I don't know.