Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Parallelizing filesystem writeback

From: Anuj Gupta <anuj20.g@samsung.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Kundan Kumar <kundan.kumar@samsung.com>,
	Christoph Hellwig <hch@lst.de>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	mcgrof@kernel.org, joshi.k@samsung.com, axboe@kernel.dk,
	clm@meta.com, willy@infradead.org, gost.dev@samsung.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Parallelizing filesystem writeback
Date: Wed, 19 Mar 2025 13:37:00 +0530
Message-ID: <20250319080700.GA16509@green245>
In-Reply-To: <Z9jrmu9dXMUaNYba@dread.disaster.area>

On Tue, Mar 18, 2025 at 02:42:18PM +1100, Dave Chinner wrote:
> On Thu, Mar 13, 2025 at 09:22:00PM +0100, Jan Kara wrote:
> > On Thu 20-02-25 19:49:22, Kundan Kumar wrote:
> > > > Well, that's currently selected by __inode_attach_wb() based on
> > > > whether there is a memcg attached to the folio/task being dirtied or
> > > > not. If there isn't a cgroup based writeback task, then it uses the
> > > > bdi->wb as the wb context.
> > > 
> > > We have created a proof of concept for per-AG context-based writeback, as
> > > described in [1]. The AG is mapped to a writeback context (wb_ctx). Using
> > > > the filesystem handler, __mark_inode_dirty() selects the writeback
> > > > context corresponding to the inode.
> > > 
> > > We attempted to handle memcg and bdi based writeback in a similar manner.
> > > This approach aims to maintain the original writeback semantics while
> > > providing parallelism. This helps in pushing more data early to the
> > > device, trying to ease the write pressure faster.
> > > [1] https://lore.kernel.org/all/20250212103634.448437-1-kundan.kumar@samsung.com/
> > 
> > Yeah, I've seen the patches. Sorry for not getting back to you earlier.
> >  
> > > > Then selecting inodes for writeback becomes a list_lru_walk()
> > > > variant depending on what needs to be written back (e.g. physical
> > > > node, memcg, both, everything that is dirty everywhere, etc).
> > > 
> > > We considered using list_lru to track inodes within a writeback context.
> > > This can be implemented as:
> > > struct bdi_writeback {
> > > 	struct list_lru b_dirty_inodes_lru; /* instead of a single b_dirty list */
> > > 	struct list_lru b_io_dirty_inodes_lru;
> > > 	...
> > > };
> > > By doing this, we would obtain a sharded list of inodes per NUMA node.
> > 
> > I think you've misunderstood Dave's suggestion here. list_lru was given as
> > an example of a structure for inspiration. We cannot take it directly as is
> > for writeback purposes because we don't want to be sharding based on NUMA
> > nodes but rather based on some other (likely FS driven) criteria.
> 
> Well, you might say that, but.....
> 
> ... I was actually thinking of taking the list_lru and abstracting
> its N-way parallelism away from the NUMA infrastructure.
> 
> The NUMA awareness of the list_lru is largely in external APIs. The
> only implicit NUMA awareness is in the list_lru_add() function
> where it converts the object being added to the list to a node ID
> based on where it is physically located in memory.
> 
> The only other thing that is NUMA specific is that the list is set
> up with N-way concurrency where N = the number of NUMA nodes in the
> machine.
> 
> So, really, it is just a thin veneer of NUMA wrapped around the
> inherent concurrency built into the structure.
> 
> IOWs, when we create a list_lru for a NUMA-aware shrinker, we simply
> use the number of nodes as the N-way parallelism for the list,
> and the existing per-node infrastructure simply feeds the right
> NUMA node ID as the "list index" for it to function as is.
> 
> In the case of writeback parallelism, we could create a list_lru
> with the number of AGs as the N-way parallelism for the list, and then
> have the concurrent BDI writeback context (1 per AG) simply provide
> the AG number as the "list index" for writeback....
> 

If we go ahead with Jan's suggestion, each AG will have a separate
bdi_writeback_context. Each bdi_writeback_context has its own b_dirty
inode list, which naturally partitions inodes per AG. Given that, do we
still need an AG-based sharded list_lru? Am I missing something here?
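
To make the question concrete, here is a minimal userspace sketch in C. It
is not kernel code: NR_AGS is a made-up fixed AG count, and demo_inode,
sharded_dirty_list and demo_wb_ctx are hypothetical stand-in types, not
existing kernel structures. Option A models one dirty list sharded by AG
number, in the spirit of the list_lru idea quoted above. Option B models
one writeback context per AG, each carrying its own b_dirty list.

```c
#include <assert.h>
#include <stddef.h>

#define NR_AGS 4	/* hypothetical AG count, fixed for the sketch */

/* Stand-in for a dirty inode; not the kernel's struct inode. */
struct demo_inode {
	unsigned long ino;
	unsigned int agno;		/* AG this inode belongs to */
	struct demo_inode *next;	/* link in a singly linked dirty list */
};

/* Option A: a single list object, sharded N ways by AG number. */
struct sharded_dirty_list {
	struct demo_inode *shard[NR_AGS];
	size_t nr_items[NR_AGS];
};

/* Option B: one writeback context per AG, each with its own b_dirty. */
struct demo_wb_ctx {
	struct demo_inode *b_dirty;
	size_t nr_dirty;
};

/* Option A: the AG number is used as the "list index". */
static void sharded_add(struct sharded_dirty_list *l, struct demo_inode *inode)
{
	assert(inode->agno < NR_AGS);
	inode->next = l->shard[inode->agno];
	l->shard[inode->agno] = inode;
	l->nr_items[inode->agno]++;
}

/* Option B: the AG number selects the context, which owns the list. */
static void ctx_add(struct demo_wb_ctx ctxs[NR_AGS], struct demo_inode *inode)
{
	struct demo_wb_ctx *ctx;

	assert(inode->agno < NR_AGS);
	ctx = &ctxs[inode->agno];
	inode->next = ctx->b_dirty;
	ctx->b_dirty = inode;
	ctx->nr_dirty++;
}
```

In this toy model both layouts put an inode in the bucket selected by its
AG number, so the per-AG partitioning comes out the same. The sketch says
nothing about locking, memcg awareness or list walking helpers, which is
where the two approaches may still differ in practice.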
