Date: Mon, 29 Jan 2024 11:57:45 +0800
From: Ming Lei
To: Dave Chinner
Cc: Mike Snitzer, Matthew Wilcox, Andrew Morton,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Don Dutile, Raghavendra K T,
 Alexander Viro, Christian Brauner, linux-block@vger.kernel.org,
 ming.lei@redhat.com
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
References: <20240128142522.1524741-1-ming.lei@redhat.com>

On Mon, Jan 29, 2024 at 12:47:41PM +1100, Dave Chinner wrote:
> On Sun, Jan 28, 2024 at 07:39:49PM -0500, Mike Snitzer wrote:
> > On Sun, Jan 28, 2024 at 7:22 PM Matthew Wilcox wrote:
> > >
> > > On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > > > On Sun, Jan 28 2024 at 5:02P -0500,
> > > > Matthew Wilcox wrote:
> > > Understood.  But ... the application is asking for as much readahead as
> > > possible, and the sysadmin has said "Don't readahead more than 64kB at
> > > a time".
> > > So why will we not get a bug report in 1-15 years time saying
> > > "I put a limit on readahead and the kernel is ignoring it"?  I think
> > > typically we allow the sysadmin to override application requests,
> > > don't we?
> >
> > The application isn't knowingly asking for readahead.  It is asking to
> > mmap the file (and reporter wants it done as quickly as possible..
> > like occurred before).
>
> ... which we do within the constraints of the given configuration.
>
> > This fix is comparable to Jens' commit 9491ae4aade6 ("mm: don't cap
> > request size based on read-ahead setting") -- same logic, just applied
> > to callchain that ends up using madvise(MADV_WILLNEED).
>
> Not really. There is a difference between performing a synchronous
> read IO here that we must complete, compared to optimistic
> asynchronous read-ahead which we can fail or toss away without the
> user ever seeing the data the IO returned.

Yeah, the big readahead in this patch happens when the user starts to
read over the mmaped buffer, not in madvise() itself.

>
> We want required IO to be done in as few, larger IOs as possible,
> and not be limited by constraints placed on background optimistic
> IOs.
>
> madvise(WILLNEED) is optimistic IO - there is no requirement that it
> complete the data reads successfully. If the data is actually
> required, we'll guarantee completion when the user accesses it, not
> when madvise() is called. IOWs, madvise is async readahead, and so
> really should be constrained by readahead bounds and not user IO
> bounds.
>
> We could change this behaviour for madvise of large ranges that we
> force into the page cache by ignoring device readahead bounds, but
> I'm not sure we want to do this in general.
>
> Perhaps fadvise/madvise(willneed) can fiddle the file f_ra.ra_pages
> value in this situation to override the device limit for large
> ranges (for some definition of large - say 10x bdi->ra_pages) and
> restore it once the readahead operation is done.
> This would make it behave less like readahead and more like a user
> read from an IO perspective...

->ra_pages is just one hint, which is 128KB by default, and either the
device or userspace can override it.

fadvise/madvise(willneed) already reads ahead in chunks of
bdi->io_pages, which is derived from the device's max sectors per
request (often 10X of ->ra_pages); please see force_page_cache_ra().

The current report goes as follows:

1) userspace calls madvise(willneed, 1G)

2) only the 1st part (its size comes from bdi->io_pages, suppose it is
2MB) is read ahead in madvise(willneed, 1G) since commit 6d2be915e589

3) the other parts (2M ~ 1G) are read ahead in units of bdi->ra_pages,
which is set to 64KB by userspace, when userspace reads the mmaped
buffer; the whole application then becomes slower.

This patch changes 3) to use bdi->io_pages as the readahead unit.

Thanks,
Ming