Date: Mon, 29 Jan 2024 11:00:36 +0800
From: Ming Lei
To: Matthew Wilcox
Cc: Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Mike Snitzer, Don Dutile,
    Raghavendra K T, ming.lei@redhat.com
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
References: <20240128142522.1524741-1-ming.lei@redhat.com>

On Sun, Jan 28, 2024 at 10:02:49PM +0000, Matthew Wilcox wrote:
> On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote:
> > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
> > memoryless NUMA nodes and limit readahead max_pages"),
> > ADV_WILLNEED only tries to readahead 512 pages, and the remained part in
> > the advised range fallback on normal readahead.
>
> Does the MAINTAINERS file mean nothing any more?

Sorry, I simply missed Cc'ing you.

> > If bdi->ra_pages is set as small, readahead will perform not efficient
> > enough. Increasing read ahead may not be an option since workload may
> > have mixed random and sequential I/O.
>
> I thik there needs to be a lot more explanation than this about what's
> going on before we jump to "And therefore this patch is the right
> answer".

Both commit 6d2be915e589 and the commit log provide background on this
issue. Let me explain it in more detail:

1) Before commit 6d2be915e589, the madvise/fadvise(WILLNEED)/readahead
syscalls try to read ahead in the specified range if memory allows, and
for each readahead in this range the ra size is set to the max sectors
of the block device; see force_page_cache_ra().

2) Since commit 6d2be915e589, only 2MB is loaded by these syscalls, and
the remaining bytes fall back to normal readahead later, when the data is
read from the page cache or an mmap buffer.

3) This patch wires the advise(WILLNEED) range info into normal readahead
for both the mmap fault and buffered read code paths, so each readahead
can use the max sectors of the block device as the ra size. It basically
takes an approach similar to the one used before commit 6d2be915e589.

> > @@ -972,6 +974,7 @@ struct file_ra_state {
> >   unsigned int ra_pages;
> >   unsigned int mmap_miss;
> >   loff_t prev_pos;
> > + struct maple_tree *need_mt;
>
> No. Embed the struct maple tree. Don't allocate it. What made you
> think this was the right approach?

Can you explain why it has to be embedded? core-api/maple_tree.rst
mentions that it is fine to call "mt_init() for dynamically allocated
ones".

The maple tree provides an easy way to record the advised willneed range,
so the readahead code path can apply this info to speed up readahead.

Thanks,
Ming
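
To put rough numbers on points 1) to 3) above, here is a small userspace
sketch of the arithmetic. It is only a model, not kernel code: the names
are made up for illustration, a 4 KiB page size is assumed (so that 512
pages equal 2MB), and later readahead is simplified to a fixed window per
request with no ramp-up.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, which makes 512 pages equal to 2 MiB. */
#define SKETCH_PAGE_SIZE      4096UL
/* Cap applied to the WILLNEED call itself, as described in the thread. */
#define SKETCH_WILLNEED_CAP   512UL

/* Pages the WILLNEED call reads ahead on its own, after the cap. */
static unsigned long willneed_pages(unsigned long advised_pages)
{
	return advised_pages < SKETCH_WILLNEED_CAP ? advised_pages
						   : SKETCH_WILLNEED_CAP;
}

/* Pages of the advised range left for later, normal readahead. */
static unsigned long leftover_pages(unsigned long advised_pages)
{
	return advised_pages - willneed_pages(advised_pages);
}

/*
 * Requests needed to cover `pages` when every request covers `window`
 * pages. Simplification: a fixed window, no ramp-up (hypothetical model).
 */
static unsigned long nr_requests(unsigned long pages, unsigned long window)
{
	assert(window > 0);
	return (pages + window - 1) / window;
}
```

For a 1 GiB advised range (262144 pages), the call itself covers 512 pages
and 261632 pages are left over. With a hypothetical 32-page window the
leftover needs nr_requests(261632, 32) = 8176 requests; with a
hypothetical 320-page window it needs nr_requests(261632, 320) = 818.
Both window sizes are illustrative values, not taken from the patch.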