Date: Mon, 19 Dec 2022 10:42:09 -0500
From: Brian Foster
To: Kent Overstreet
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: [PATCH RFC] bcachefs: use inode as write point index instead of task
References: <20221212190602.1388127-1-bfoster@redhat.com>
 <20221213183743.3m6ntfnu7n3yebng@moria.home.lan>

On Fri, Dec 16, 2022 at 02:04:45AM -0500, Kent Overstreet wrote:
> On Thu, Dec 15, 2022 at 07:09:58PM -0500, Kent Overstreet wrote:
> > I do think we could probably be doing something more than using the pid for the
> > writepoint, I've just been waiting until we see specific workloads where the
> > current behaviour falls over or have a specific complaint before designing
> > something new.
>
> Random late night thoughts:
>
> Say we introduce a new object, 'writepoint_handle' or somesuch.
>
> Allocate them when opening a file for write, close them when close the file.
>
> Then we'd be explicitly picking which writepoint to use when allocating the
> writepoint_handle; it would be easy to add logic for "if there's a writepoint
> which was last used by this process and doesn't currently have any handles
> pointing to it, use that".
>

Ok, but if we alloc the handle at open (or first write or whatever), we'd
still need to potentially keep it around after ->release() (i.e. userspace
close()) while the mapping is dirty and thus still needs to be written back,
right? If so, perhaps this would need some additional state to track an
"active" writepoint, explicitly defined as a "writepoint with currently open
files" as opposed to simply a handle pointer?
IOW, if the task is no longer writing to the previous file, it's probably Ok
to reuse that writepoint even though the handle might still hold a
reference..?

But generally I think I get the idea: preserve the current ability for a
single sequential writer to use the same writepoint across N files, but fall
back to a separate writepoint where we otherwise detect multi-file activity.
I think that makes sense, though I'd probably have to think a bit more about
an explicit open() -> close() handle lifecycle and whether that's robust
enough for fileserver-like use cases. I.e., I'd be a little concerned about
whether that workload might make interspersed sub-file writes look a bit too
much like the single-user open -> write -> close -> repeat use case..

> One of the things that needs to be considered is - what do we do when there's
> more writepoint_handles than writepoints?
>

Does bcachefs have to deal with something like that today? For example, if
there is some max number of writepoints, what happens when a greater number
of tasks are doing allocations at the same time?

Brian

> bcache has some logic for this by tracking when a writepoint was used, and if we
> don't find a writepoint that matches up with the IO being issued - pick the
> oldest one off an LRU queue. Was dropped in bcachefs because the straight hash
> table seemed to work just as well and was faster - or maybe I'm thinking of the
> sequential bypass data structure?
>
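
For illustration only, here is a rough userspace sketch of the selection
policy discussed in this thread. It is not bcachefs code. Every name in it
(WritePoint, WritePointPool, open_handle, close_handle, writeback_done) is
hypothetical, and it assumes a small fixed pool of writepoints. It separates
"has open files" (active) from "still referenced for writeback", reuses an
idle writepoint last used by the same task, and falls back to the least
recently used writepoint when there are more handles than writepoints.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class WritePoint:
    """Hypothetical stand-in for a writepoint; not the bcachefs struct."""
    index: int
    last_task: Optional[int] = None  # task that most recently opened a handle
    open_files: int = 0              # handles whose file is still open ("active")
    refs: int = 0                    # all handles, incl. closed-but-dirty ones


class WritePointPool:
    def __init__(self, nr: int):
        # OrderedDict doubles as the LRU list: oldest entry first.
        self.lru = OrderedDict((i, WritePoint(i)) for i in range(nr))

    def open_handle(self, task: int) -> WritePoint:
        idle = [w for w in self.lru.values() if w.open_files == 0]
        # 1. Sequential writer (open -> write -> close -> repeat): reuse the
        #    idle writepoint this task used last, even if refs is nonzero.
        wp = next((w for w in idle if w.last_task == task), None)
        # 2. Otherwise take the least recently used idle writepoint.
        if wp is None and idle:
            wp = idle[0]
        # 3. More handles than writepoints: share the overall LRU entry.
        if wp is None:
            wp = next(iter(self.lru.values()))
        wp.last_task = task
        wp.open_files += 1
        wp.refs += 1
        self.lru.move_to_end(wp.index)
        return wp

    def close_handle(self, wp: WritePoint) -> None:
        # ->release(): no longer active, but writeback may still hold a ref.
        wp.open_files -= 1

    def writeback_done(self, wp: WritePoint) -> None:
        # The mapping is clean; drop the reference taken at open.
        wp.refs -= 1


pool = WritePointPool(2)
first = pool.open_handle(task=1)
pool.close_handle(first)              # file closed, data still dirty
again = pool.open_handle(task=1)      # same writepoint is reused
other = pool.open_handle(task=1)      # second file open at once: new writepoint
shared = pool.open_handle(task=2)     # pool exhausted: falls back to LRU
```

Note that the fileserver-like concern raised above is visible here: a task
that closes each file before opening the next always hits case 1, no matter
how many distinct files it is interleaving writes to.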