From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 78E0641C7A for ; Fri, 1 Mar 2024 15:02:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709305324; cv=none; b=ZSRzHmHz6FNuKt7DkaDzR9NqSnC9r8u01Uz7/xp2Q6H8H2uplqDftKjQPykilganO2Efho/jAPhBHv01nYGNCJM6DreGEDcFI1QiBv8Bslo5SqG/P9+G5E5b2gEhh1GKcgehjQ+MVBWX8Htvk50e+F223ScCV3LfiuXixyZNLZY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709305324; c=relaxed/simple; bh=WcFb6UJyrAj6vMVXS9jpaavIcTkGQ6WkUCPf39EjXH4=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=HeIzusEBnG5FXQpo4e4Qnuo/uK9okR+fiI/4aVwKg8Fe+0v1MGP2STFGVEb1UrU9E5uxNUmCufXk90w4zhMMNwP+Y0FkHEZ/jLyqNxp+EXS7AiapL5nKVJxAs6i2IYYTGOon2DyfBRjA8jDZxzmZHnt2kvAnJn1Cu9VbwmW6RrU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=UhvmraFn; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="UhvmraFn" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1709305321; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=LxpqDIonN0/AQ1Q5Zx4aN7Ac1V+lU2NoYWrXBhd0Etg=; b=UhvmraFnkosJyze9KoMXCP7G3ZiZT/QIm8C2ll5EH24/Umk6pDFzIOvu33ccDqL4EeKyDR 8TTaKldJGHKnCOhKm603mStRsAMgToIk+UKXE3JkwEU1LBWh/fux7fsbv6CUP9qVo7CV35 YBVsPrkeaTRNVYVNaPGOqaQhzMJPImY= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-189-gdLcepOFNM2CPVAuVBP41Q-1; Fri, 01 Mar 2024 10:01:55 -0500 X-MC-Unique: gdLcepOFNM2CPVAuVBP41Q-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5CCE42815E29; Fri, 1 Mar 2024 15:01:53 +0000 (UTC) Received: from bfoster (unknown [10.22.32.137]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 25F21111D3E6; Fri, 1 Mar 2024 15:01:53 +0000 (UTC) Date: Fri, 1 Mar 2024 10:03:35 -0500 From: Brian Foster To: Kent Overstreet Cc: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org, djwong@kernel.org Subject: Re: [PATCH 04/21] bcachefs: Disk space accounting rewrite Message-ID: References: <20240225023826.2413565-1-kent.overstreet@linux.dev> <20240225023826.2413565-5-kent.overstreet@linux.dev> <3dug3vvjmlkhu42glwhkm5ozgkp4tzqvekxdabbgv4dv4yhxig@r5bw746uaxym> Precedence: bulk X-Mailing-List: linux-bcachefs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.3 On Thu, Feb 29, 2024 at 04:16:00PM -0500, Kent Overstreet wrote: > On Thu, Feb 29, 2024 at 01:44:27PM -0500, Brian Foster wrote: > > On Wed, Feb 28, 2024 at 11:10:12PM -0500, Kent Overstreet wrote: > > > I think it ended up not needing to be moved, and I just forgot to drop > > > it - originally I disallowed accounting entries that referenced > > > nonexistent devices, but that wasn't workable so now it's only nonzero > > > accounting keys that aren't allowed to reference nonexistent devices. > > > > > > I'll see if I can delete it. > > > > > > > Do you mean to delete the change that moves the call, or the flush call > > entirely? > > Delte the change, I think there's further cleanup (& probably bugs to > fix) possible with that flush call but I'm not going to get into it > right now. > Ok, just trying to determine whether I need to look back and make sure this doesn't regress the problem this originally fixed. > > > +/* > > > + * Notes on disk accounting: > > > + * > > > + * We have two parallel sets of counters to be concerned with, and both must be > > > + * kept in sync. > > > + * > > > + * - Persistent/on disk accounting, stored in the accounting btree and updated > > > + * via btree write buffer updates that treat new accounting keys as deltas to > > > + * apply to existing values. But reading from a write buffer btree is > > > + * expensive, so we also have > > > + * > > > > I find the wording a little odd here, and I also think it would be > > helpful to explain how/from where the deltas originate. For example, > > something along the lines of: > > > > "Persistent/on disk accounting, stored in the accounting btree and > > updated via btree write buffer updates. Accounting updates are > > represented as deltas that originate from . > > Accounting keys represent these deltas through commit into the write > > buffer. The accounting/delta keys in the write buffer are then > > accumulated into the appropriate accounting btree key at write buffer > > flush time." > > yeah, that's worth including. > > There's an interesting point that you're touching on; btree write buffer > are always dependent state changes from some other (non write buffer) > btree; we never look at a write buffer btree and generate an update > there - we can't, reading from a write buffer btree doesn't get you > anything consistent or up to date. > > So in normal operation it really only makes sense to do write buffer > updates from a transactional trigger - that's the only way to use them > and have them be consistent with the resst of the filesystem. > > And since triggers work by comparing old and new, they naturally > generate updates that are deltas. > Hm that is interesting, I hadn't made that connection. Thanks. Brian > > > + * - In memory accounting, where accounting is stored as an array of percpu > > > + * counters, indexed by an eytzinger array of disk acounting keys/bpos (which > > > + * are the same thing, excepting byte swabbing on big endian). > > > + * > > > > Not really sure about the keys vs. bpos thing, kind of related to my > > comments on the earlier patch. It might be more clear to just elide the > > implementation details here, i.e.: > > > > "In memory accounting, where accounting is stored as an array of percpu > > counters that are cheap to read, but not persistent. Updates to in > > memory accounting are propagated from the transaction commit path." > > > > ... but NBD, and feel free to reword, drop and/or correct any of that > > text. > > It's there because bch2_accounting_mem_read() takes a bpos when it > should be a disk_accounting_key. I'll fix that if I can... > > > > + * Cheap to read, but non persistent. > > > + * > > > + * To do a disk accounting update: > > > + * - initialize a disk_accounting_key, to specify which counter is being update > > > + * - initialize counter deltas, as an array of 1-3 s64s > > > + * - call bch2_disk_accounting_mod() > > > + * > > > + * This queues up the accounting update to be done at transaction commit time. > > > + * Underneath, it's a normal btree write buffer update. > > > + * > > > + * The transaction commit path is responsible for propagating updates to the in > > > + * memory counters, with bch2_accounting_mem_mod(). > > > + * > > > + * The commit path also assigns every disk accounting update a unique version > > > + * number, based on the journal sequence number and offset within that journal > > > + * buffer; this is used by journal replay to determine which updates have been > > > + * done. > > > + * > > > + * The transaction commit path also ensures that replicas entry accounting > > > + * updates are properly marked in the superblock (so that we know whether we can > > > + * mount without data being unavailable); it will update the superblock if > > > + * bch2_accounting_mem_mod() tells it to. > > > > I'm not really sure what this last paragraph is telling me, but granted > > I've not got that far into the code yet either. > > yeah that's for a whole different subsystem that happens to be slaved to > the accounting - replicas.c, which also used to help out quite a bit > with the accounting but now it's pretty much just for managing the > superblock replicas section. > > The superblock replicas section is just a list of entries, where each > entry is a list of devices - "there is replicated data present on this > set of devices". We also have full counters of how much data is present > replicated across each set of devices, so the superblock section is just > a truncated version of the accounting - "data exists on these devices", > instead of saying how much. >