From: Brian Foster <bfoster@redhat.com>
To: linux-bcachefs@vger.kernel.org
Subject: [PATCH RFC] bcachefs: use inode as write point index instead of task
Date: Mon, 12 Dec 2022 14:06:02 -0500 [thread overview]
Message-ID: <20221212190602.1388127-1-bfoster@redhat.com> (raw)
Use the pointer to the current in-core inode object as the hash of
the write point associated with block allocation for buffered
writes.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
Hi bcachefs folks,
I'm posting this RFC more as a discussion point than a direct proposal.
I've been poking around a bit at bcachefs just to grok some of the
basics and whatnot and ended up digging down into the block allocation
code a bit. From there I was trying to make sense of the bucket and
write point abstractions, and then noticed that the write point
structure used to track allocation/write contexts is hashed by task.
IIUC, that means a single task that might be writing to multiple files
will do so via a single write point, or conversely multiple tasks that
might be writing to a single file would do so with multiple write points
(one associated with each task). In turn, that means the allocation
layout of a particular file might be a characteristic of the set of
tasks that initially wrote the file as opposed to something more
traditional based on inode locality.
In thinking about this a bit more, it seems logical that this derives
from bcache because it seems like a perfectly good cache allocation
policy. I suppose it might make sense in certain filesystem workloads
where per-file contiguity might no longer necessarily be the most
important thing in the world, but I'm curious if that is intentional for
bcachefs (the approach is documented, after all) and if there are any
particular workloads or use cases in mind for that sort of approach?
Even if it makes sense in some cases, I wonder whether it's the ideal
approach across the board. For example, consider how the task is tracked
through the inode from buffered write to writeback contexts so the
latter can locate the write point for the last task that wrote to the
file. Even if multiple tasks perform buffered writes to a file, we only
really use the last task that wrote to the file before the current
writeback cycle. Absent frequent writebacks, wouldn't it make some sense
to just associate the write point with the current inode pointer and use
that to minimize per-inode fragmentation?
FWIW, I was able to run some oddball workloads that produce excessive
fragmentation by forcing frequent writeback activity, but my tests so
far are ad hoc and moreso just playing around to help understand
behavior. To be fair, more traditional filesystems don't necessarily
handle the same sort of tests terribly well either. What I found more
interesting is that with the small tweak from this RFC, the allocation
layout from bcachefs went from notably worse to notably better in that
kind of pathological workload. Thoughts?
Brian
fs/bcachefs/fs-io.c | 5 +----
fs/bcachefs/fs.h | 1 -
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/bcachefs/fs-io.c b/fs/bcachefs/fs-io.c
index e7ebb01b4d09..f0cb32f2c437 100644
--- a/fs/bcachefs/fs-io.c
+++ b/fs/bcachefs/fs-io.c
@@ -1389,7 +1389,7 @@ static void bch2_writepage_io_alloc(struct bch_fs *c,
op->target = w->opts.foreground_target;
op->nr_replicas = nr_replicas;
op->res.nr_replicas = nr_replicas;
- op->write_point = writepoint_hashed(inode->ei_last_dirtied);
+ op->write_point = writepoint_hashed((unsigned long) inode);
op->subvol = inode->ei_subvol;
op->pos = POS(inode->v.i_ino, sector);
op->end_io = bch2_writepage_io_done;
@@ -1669,8 +1669,6 @@ int bch2_write_end(struct file *file, struct address_space *mapping,
SetPageUptodate(page);
bch2_set_page_dirty(c, inode, page, res, offset, copied);
-
- inode->ei_last_dirtied = (unsigned long) current;
}
unlock_page(page);
@@ -1823,7 +1821,6 @@ static int __bch2_buffered_write(struct bch_inode_info *inode,
}
nr_pages_copied = DIV_ROUND_UP(offset + copied, PAGE_SIZE);
- inode->ei_last_dirtied = (unsigned long) current;
out:
for (i = nr_pages_copied; i < nr_pages; i++) {
unlock_page(pages[i]);
diff --git a/fs/bcachefs/fs.h b/fs/bcachefs/fs.h
index 6b91bbe91116..c963994a104a 100644
--- a/fs/bcachefs/fs.h
+++ b/fs/bcachefs/fs.h
@@ -17,7 +17,6 @@ struct bch_inode_info {
struct mutex ei_update_lock;
u64 ei_quota_reserved;
- unsigned long ei_last_dirtied;
two_state_lock_t ei_pagecache_lock;
--
2.37.3
next reply other threads:[~2022-12-12 19:07 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-12 19:06 Brian Foster [this message]
2022-12-13 18:38 ` [PATCH RFC] bcachefs: use inode as write point index instead of task Kent Overstreet
2022-12-14 17:44 ` Brian Foster
2022-12-16 0:09 ` Kent Overstreet
2022-12-16 7:04 ` Kent Overstreet
2022-12-19 15:42 ` Brian Foster
2022-12-20 1:56 ` Kent Overstreet
2022-12-28 22:24 ` Eric Wheeler
2022-12-29 20:59 ` Kent Overstreet
2022-12-29 22:26 ` Eric Wheeler
2022-12-30 3:18 ` Kent Overstreet
2022-12-19 15:27 ` Brian Foster
2022-12-20 1:02 ` Kent Overstreet
2022-12-22 14:03 ` Brian Foster
2022-12-23 4:36 ` Kent Overstreet
2022-12-23 11:49 ` Brian Foster
2022-12-23 18:02 ` Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221212190602.1388127-1-bfoster@redhat.com \
--to=bfoster@redhat.com \
--cc=linux-bcachefs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox