From: Christoph Hellwig
To: Johannes Thumshirn
Cc: Matthew Wilcox, Christoph Hellwig, Naohiro Aota, "linux-btrfs@vger.kernel.org"
Date: Tue, 19 Jul 2022 17:13:45 +0200
Subject: Re: error writing primary super block on zoned btrfs

On Tue, Jul 19, 2022 at 07:53:45AM +0000, Johannes Thumshirn wrote:
> Ha, but zoned btrfs uses two zones as a ring buffer for its superblock. Could
> it be that we're accumulating too many page references somewhere? And then it
> behaves like having millions of filesystems mounted?

The fact that the superblock moves on zoned devices probably has something to do with it. But the whole code leaves me really puzzled.
Why does wait_dev_supers even do a find_get_page instead of just stashing three page pointers away in the btrfs_device structure? Why does it abuse wait_on_page_locked instead of using a completion? Why does the code count errors when only an error on the primary superblock has any consequences? What is the point of the secondary superblocks if they aren't written on fsync? And how does setting the whole page uptodate work on file systems with a block size smaller than the page size, where we don't know what is in the rest of the page?
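A rough userspace sketch of the alternative these questions point toward, assuming POSIX threads in place of kernel primitives: the device keeps one slot per superblock copy for its whole lifetime (instead of looking the page up again before waiting), each slot carries its own completion (instead of overloading the page lock), and only a failure on the primary copy is returned to the caller. `struct uc_completion` and its helpers are a simplified stand-in modeled loosely on the kernel's `init_completion()`/`complete()`/`wait_for_completion()`; `struct fake_device`, `struct sb_slot`, `fake_end_io()` and `write_supers_and_wait()` are hypothetical names, and the threads merely simulate I/O completion. This is not how btrfs is structured today.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified userspace stand-in for a kernel completion (hypothetical). */
struct uc_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void uc_init_completion(struct uc_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void uc_complete(struct uc_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond); /* wake every waiter */
	pthread_mutex_unlock(&c->lock);
}

static void uc_wait_for_completion(struct uc_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done) /* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static void uc_destroy_completion(struct uc_completion *c)
{
	pthread_cond_destroy(&c->cond);
	pthread_mutex_destroy(&c->lock);
}

#define SB_MIRRORS 3 /* primary plus two secondary copies */

/* Hypothetical per-copy state, stored directly in the device. */
struct sb_slot {
	struct uc_completion done;
	int error;        /* result recorded by the simulated end_io */
	int inject_error; /* test hook: error the fake I/O reports */
};

struct fake_device {
	struct sb_slot sb[SB_MIRRORS];
};

struct io_arg {
	struct sb_slot *slot;
	int mirror;
};

/* Simulated write completion: record the result, then wake the waiter. */
static void *fake_end_io(void *p)
{
	struct io_arg *arg = p;

	assert(arg->mirror >= 0 && arg->mirror < SB_MIRRORS);
	arg->slot->error = arg->slot->inject_error;
	uc_complete(&arg->slot->done);
	return NULL;
}

/* Submit all copies, wait for each one, and fail only on the primary. */
static int write_supers_and_wait(struct fake_device *dev)
{
	pthread_t tid[SB_MIRRORS];
	struct io_arg args[SB_MIRRORS];
	bool started[SB_MIRRORS];
	int i;

	for (i = 0; i < SB_MIRRORS; i++) {
		uc_init_completion(&dev->sb[i].done);
		dev->sb[i].error = 0;
		args[i].slot = &dev->sb[i];
		args[i].mirror = i;
		started[i] = pthread_create(&tid[i], NULL, fake_end_io,
					    &args[i]) == 0;
		if (!started[i])
			fake_end_io(&args[i]); /* fall back to completing inline */
	}

	for (i = 0; i < SB_MIRRORS; i++) {
		uc_wait_for_completion(&dev->sb[i].done);
		if (started[i])
			pthread_join(tid[i], NULL);
		uc_destroy_completion(&dev->sb[i].done);
		if (i > 0 && dev->sb[i].error)
			fprintf(stderr, "secondary super %d: error %d\n",
				i, dev->sb[i].error);
	}

	return dev->sb[0].error; /* only the primary copy is fatal */
}
```

Secondary failures are only reported here, which matches the point above that only a primary error has consequences; whether such errors should be tracked at all is left open.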