Date: Mon, 22 Jan 2024 05:34:12 -0800
From: Andi Kleen
To: Dave Chinner
Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Using Folios for XFS metadata
References: <20240118222216.4131379-1-david@fromorbit.com> <87zfwxk75o.fsf@linux.intel.com>

[fixed the subject, not sure what happened there]

FWIW I'm not sure fail-fast is always the right strategy here; in many
cases, even with some reclaim, compaction may win. Just not if you're on
a tight latency budget.

> I stress test and measure XFS metadata performance under sustained
> memory pressure all the time. This change has not caused any
> obvious regressions in the short time I've been testing it.

Did you test for tail latencies?

There are some relatively simple ways to trigger memory fragmentation;
the standard way is to allocate a very large THP-backed file and then
punch a lot of holes in it.

> I still need to do perf testing on large directory block sizes. That
> is where high-order allocations will get stressed - that's where
> xlog_kvmalloc() starts dominating the profiles as it trips over
> vmalloc scalability issues...

Yes, that's true. vmalloc has many issues, although with the recent
patches to split the rbtrees with separate locks it may now look quite
different than before.

> > I would in any case add a tunable for it in case people run into this.
>
> No tunables. It either works or it doesn't. If we can't make
> it work reliably by default, we throw it in the dumpster, light it
> on fire and walk away.

I'm not sure there is a single definition of "reliably" here -- for
many workloads tail latencies don't matter, so it's always reliable
as long as you have good aggregate throughput. Others have very high
expectations for them.

Forcing the high expectations on everyone is probably not a good
general strategy though, as there are general trade-offs.

I could see that having lots of small tunables for every use might not
be a good idea. Perhaps there would be a case for a single general
tunable that controls higher-order folios for everyone.

> > Tail latencies are a common concern on many IO workloads.
>
> Yes, for user data operations it's a common concern. For metadata,
> not so much - there's so many far worse long tail latencies in
> metadata operations (like waiting for journal space) that memory
> allocation latencies in the metadata IO path are largely noise....

I've seen pretty long stalls in the past.

The difference from the journal is also that the journal is local to
the file system, while memory is normally shared with everyone on the
node or system. So the scope of noisy-neighbour impact can be quite
different, especially on a large machine.

-Andi
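
For reference, the hole-punching recipe mentioned above could be sketched
roughly as follows. This is only one possible reading of it, not a tested
reproducer: it assumes a 64-bit Linux system, a file on a tmpfs mount
configured to use huge pages (for example mounted with huge=always), a
2 MiB huge page size and a 4 KiB base page size. The helper names
hole_offsets() and fragment() are made up for illustration.

```python
import ctypes
import ctypes.util
import mmap
import os

# fallocate(2) mode flags, values as defined in <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

HUGE_PAGE = 2 * 1024 * 1024  # assumption: 2 MiB THP size
BASE_PAGE = 4096             # assumption: 4 KiB base pages


def hole_offsets(file_size, stride=HUGE_PAGE, hole=BASE_PAGE):
    """Return the offset of one small hole per huge-page-sized stride."""
    return list(range(0, file_size - hole + 1, stride))


def fragment(path, file_size):
    """Populate the file at `path`, then punch one base-page hole per stride.

    `path` is assumed to live on a THP-enabled tmpfs mount.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # int fallocate(int fd, int mode, off_t offset, off_t len); 64-bit off_t assumed.
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)
        with mmap.mmap(fd, file_size) as m:
            # Touch every base page so the backing memory is really allocated.
            for off in range(0, file_size, BASE_PAGE):
                m[off] = 1
        mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
        for off in hole_offsets(file_size):
            if libc.fallocate(fd, mode, off, BASE_PAGE) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

Calling fragment() with a size that is a large fraction of RAM, and then
keeping the file around while the workload runs, is the general idea. How
much fragmentation actually results depends on the kernel version and on
the THP settings.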