Date: Mon, 30 Sep 2024 21:56:56 +0100
From: Matthew Wilcox
To: Linus Torvalds
Cc: Christian Theune, Dave Chinner, Chris Mason, Jens Axboe,
	linux-mm@kvack.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
References: <02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io> <295BE120-8BF4-41AE-A506-3D6B10965F2B@flyingcircus.io>

On Mon, Sep 30, 2024 at 01:12:37PM -0700, Linus Torvalds wrote:
> It's basically been that way forever.
> The code has changed many times,
> but we've basically always had that "wait on bit will wait not until
> the next wakeup, but until it actually sees the bit being clear".
>
> And by "always" I mean "going back at least to before the git tree". I
> didn't search further. It's not new.
>
> The only reason I pointed at that (relatively recent) commit from 2021
> is that when we rewrote the page bit waiting logic (for some unrelated
> horrendous scalability issues with tens of thousands of pages on wait
> queues), the rewritten code _tried_ to not do it, and instead go "we
> were woken up by a bit clear op, so now we've waited enough".
>
> And that then caused problems as explained in that commit c2407cf7d22d
> ("mm: make wait_on_page_writeback() wait for multiple pending
> writebacks") because the wakeups aren't atomic wrt the actual bit
> setting/clearing/testing.

Could we break out if folio->mapping has changed?  Clearly if it has,
we're no longer waiting for the folio we thought we were waiting for,
but for a folio which now belongs to a different file.  maybe this:

+void __folio_wait_writeback(struct address_space *mapping, struct folio *folio)
+{
+	while (folio_test_writeback(folio) && folio->mapping == mapping) {
+		trace_folio_wait_writeback(folio, mapping);
+		folio_wait_bit(folio, PG_writeback);
+	}
+}

[...]

 void folio_wait_writeback(struct folio *folio)
 {
-	while (folio_test_writeback(folio)) {
-		trace_folio_wait_writeback(folio, folio_mapping(folio));
-		folio_wait_bit(folio, PG_writeback);
-	}
+	__folio_wait_writeback(folio->mapping, folio);
 }
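
For illustration, the "re-check the bit after every wakeup" discipline Linus
describes is the same shape as a classic condition-variable wait loop: a
single wakeup proves nothing, because the condition may have been re-asserted
(or never observed clear) by the time the waiter runs.  A minimal user-space
analogy of that pattern follows; it is only a sketch, not kernel code, and the
names (writeback_flag, wait_for_writeback_clear, end_writeback) are invented
here:

/* Build with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool writeback_flag = true;	/* stands in for PG_writeback */

/*
 * Waiter side: loop until the flag is actually observed clear.  Returning
 * after one wakeup would be wrong, since the flag can be set again (or the
 * wakeup can race with the clearing) before this thread gets to run.
 */
static void wait_for_writeback_clear(void)
{
	pthread_mutex_lock(&lock);
	while (writeback_flag)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

/* Completion side: clear the flag, then wake all waiters. */
static void *end_writeback(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	writeback_flag = false;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, end_writeback, NULL);
	wait_for_writeback_clear();
	pthread_join(t, NULL);
	printf("writeback observed clear\n");
	return 0;
}

The proposed __folio_wait_writeback() above keeps that loop, but adds a second
exit condition: once folio->mapping no longer matches the mapping the caller
was interested in, the folio has moved on to a different file and there is no
point waiting for its writeback any further.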