* [PATCH RFC 1/3] iomap: factor out a bio submission helper
From: Brian Foster @ 2024-04-10 14:09 UTC (permalink / raw)
To: linux-bcachefs, linux-xfs; +Cc: Kent Overstreet
This is a small cleanup to facilitate a nosubmit iomap flag. No
functional changes intended.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
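For readers skimming the diff, the helper's three-way dispatch can be sketched as a small userspace model. This is only an illustration, not kernel code: `struct model_bio`, its counters, and `model_submit_bio()` are hypothetical stand-ins for `struct bio`, `bio_endio()`, `submit_bio_wait()` and `submit_bio()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio; only what the dispatch needs. */
struct model_bio {
	int status;	/* nonzero: caller wants error completion */
	int submitted;	/* times the bio was handed to the "device" */
	int completed;	/* times completion ran in submit context */
};

/* Mirrors the branch order of the helper added by this patch. */
static int model_submit_bio(struct model_bio *bio, bool wait)
{
	int ret = 0;

	assert(bio);
	if (bio->status) {
		/* models bio_endio(): complete immediately, never submit */
		bio->completed++;
	} else if (wait) {
		/* models submit_bio_wait(): submit, then wait for completion */
		bio->submitted++;
		bio->completed++;
		ret = bio->status;
	} else {
		/* models submit_bio(): completion happens later, elsewhere */
		bio->submitted++;
	}
	return ret;
}
```

Note that, as in the diff, the error-completion branch leaves the return value at 0; the caller in iomap_submit_ioend() keeps returning its own `error`.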
fs/iomap/buffered-io.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4e8e41c8b3c0..b6d176027887 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -43,6 +43,23 @@ struct iomap_folio_state {
static struct bio_set iomap_ioend_bioset;
+/*
+ * Simple submit_bio() wrapper. Set ->bi_status to trigger error completion.
+ */
+static inline int iomap_submit_bio(struct bio *bio, bool wait)
+{
+	int ret = 0;
+
+	if (bio->bi_status)
+		bio_endio(bio);
+	else if (wait)
+		ret = submit_bio_wait(bio);
+	else
+		submit_bio(bio);
+
+	return ret;
+}
+
static inline bool ifs_is_fully_uptodate(struct folio *folio,
struct iomap_folio_state *ifs)
{
@@ -411,7 +428,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
if (ctx->bio)
- submit_bio(ctx->bio);
+ iomap_submit_bio(ctx->bio, false);
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
@@ -464,7 +481,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
folio_set_error(folio);
if (ctx.bio) {
- submit_bio(ctx.bio);
+ iomap_submit_bio(ctx.bio, false);
WARN_ON_ONCE(!ctx.cur_folio_in_bio);
} else {
WARN_ON_ONCE(ctx.cur_folio_in_bio);
@@ -537,7 +554,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
iter.processed = iomap_readahead_iter(&iter, &ctx);
if (ctx.bio)
- submit_bio(ctx.bio);
+ iomap_submit_bio(ctx.bio, false);
if (ctx.cur_folio) {
if (!ctx.cur_folio_in_bio)
folio_unlock(ctx.cur_folio);
@@ -665,7 +682,7 @@ static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
bio_add_folio_nofail(&bio, folio, plen, poff);
- return submit_bio_wait(&bio);
+ return iomap_submit_bio(&bio, true);
}
static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
@@ -1667,12 +1684,9 @@ static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
if (wpc->ops->prepare_ioend)
error = wpc->ops->prepare_ioend(wpc->ioend, error);
- if (error) {
+ if (error)
wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
- bio_endio(&wpc->ioend->io_bio);
- } else {
- submit_bio(&wpc->ioend->io_bio);
- }
+ iomap_submit_bio(&wpc->ioend->io_bio, false);
wpc->ioend = NULL;
return error;
--
2.44.0
* [PATCH RFC 2/3] iomap: add nosubmit flag to skip data I/O on iomap mapping
From: Brian Foster @ 2024-04-10 14:09 UTC (permalink / raw)
To: linux-bcachefs, linux-xfs; +Cc: Kent Overstreet
Define a nosubmit flag to skip data I/O submission on a specified
mapping. The iomap layer still performs every step up through
constructing the bio as if it will be submitted, but instead invokes
completion on the bio directly from submit context. The purpose of
this is to facilitate filesystem metadata performance testing
without the overhead of actual data I/O.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
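The short-circuit described in the commit message can be sketched in userspace as follows. This is a rough model under stated assumptions, not the kernel implementation: `struct model_mapping`, `struct model_bio`, and `model_submit_bio()` are hypothetical names, `memset()` stands in for `zero_fill_bio_iter()`, and only the bit value of the flag is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_F_NOSUBMIT (1U << 6)	/* same bit as IOMAP_F_NOSUBMIT in the patch */

/* Hypothetical stand-ins for struct iomap and struct bio. */
struct model_mapping {
	unsigned int flags;
};

struct model_bio {
	int status;		/* nonzero: error completion requested */
	unsigned char *data;	/* payload the bio would transfer */
	size_t len;
	int submitted;		/* times handed to the "device" */
	int completed;		/* times completed from submit context */
};

static int model_submit_bio(const struct model_mapping *map,
			    struct model_bio *bio, bool wait)
{
	int ret = 0;
	bool nosubmit = map->flags & MODEL_F_NOSUBMIT;

	assert(map && bio);
	/* models zero_fill_bio_iter(): the payload is zeroed, not transferred */
	if (nosubmit)
		memset(bio->data, 0, bio->len);

	if (bio->status || nosubmit) {
		bio->completed++;	/* models bio_endio() */
	} else if (wait) {
		bio->submitted++;	/* models submit_bio_wait() */
		bio->completed++;
		ret = bio->status;
	} else {
		bio->submitted++;	/* models submit_bio() */
	}
	return ret;
}
```

As in the diff, the zero fill runs whenever the flag is set, before the completion branch and without checking the direction of the I/O.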
fs/iomap/buffered-io.c | 21 +++++++++++++--------
include/linux/iomap.h | 1 +
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b6d176027887..5d1c443a6fb4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -46,11 +46,16 @@ static struct bio_set iomap_ioend_bioset;
/*
* Simple submit_bio() wrapper. Set ->bi_status to trigger error completion.
*/
-static inline int iomap_submit_bio(struct bio *bio, bool wait)
+static inline int iomap_submit_bio(const struct iomap *iomap, struct bio *bio,
+ bool wait)
{
- int ret = 0;
+ int ret = 0;
+ bool nosubmit = iomap->flags & IOMAP_F_NOSUBMIT;
+
+ if (nosubmit)
+ zero_fill_bio_iter(bio, bio->bi_iter);
- if (bio->bi_status)
+ if (bio->bi_status || nosubmit)
bio_endio(bio);
else if (wait)
ret = submit_bio_wait(bio);
@@ -428,7 +433,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
if (ctx->bio)
- iomap_submit_bio(ctx->bio, false);
+ iomap_submit_bio(iomap, ctx->bio, false);
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
@@ -481,7 +486,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
folio_set_error(folio);
if (ctx.bio) {
- iomap_submit_bio(ctx.bio, false);
+ iomap_submit_bio(&iter.iomap, ctx.bio, false);
WARN_ON_ONCE(!ctx.cur_folio_in_bio);
} else {
WARN_ON_ONCE(ctx.cur_folio_in_bio);
@@ -554,7 +559,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
iter.processed = iomap_readahead_iter(&iter, &ctx);
if (ctx.bio)
- iomap_submit_bio(ctx.bio, false);
+ iomap_submit_bio(&iter.iomap, ctx.bio, false);
if (ctx.cur_folio) {
if (!ctx.cur_folio_in_bio)
folio_unlock(ctx.cur_folio);
@@ -682,7 +687,7 @@ static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
bio_add_folio_nofail(&bio, folio, plen, poff);
- return iomap_submit_bio(&bio, true);
+ return iomap_submit_bio(iomap, &bio, true);
}
static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
@@ -1686,7 +1691,7 @@ static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
if (error)
wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
- iomap_submit_bio(&wpc->ioend->io_bio, false);
+ iomap_submit_bio(&wpc->iomap, &wpc->ioend->io_bio, false);
wpc->ioend = NULL;
return error;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..8d34ec240e12 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -64,6 +64,7 @@ struct vm_fault;
#define IOMAP_F_BUFFER_HEAD 0
#endif /* CONFIG_BUFFER_HEAD */
#define IOMAP_F_XATTR (1U << 5)
+#define IOMAP_F_NOSUBMIT (1U << 6)
/*
* Flags set by the core iomap code during operations:
--
2.44.0
* [PATCH RFC 3/3] xfs: add nodataio mount option to skip all data I/O
From: Brian Foster @ 2024-04-10 14:09 UTC (permalink / raw)
To: linux-bcachefs, linux-xfs; +Cc: Kent Overstreet
When mounted with nodataio, add the NOSUBMIT iomap flag to all data
mappings passed into the iomap layer. This causes iomap to skip all
data I/O submission and thus facilitates metadata-only performance
testing.
For experimental use only. Only tested insofar as fsstress runs for
a few minutes without blowing up.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
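The plumbing in this patch (mount option, then feature bit, then iomap flag on every data mapping) can be sketched as a userspace model. All names here (`struct model_mount`, `model_parse_param()`, `model_mapping_flags()`) are hypothetical; only the option string and the two bit positions are taken from the diffs in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_FEAT_NODATAIO	(1ULL << 47)	/* same bit as XFS_FEAT_NODATAIO */
#define MODEL_F_NOSUBMIT	(1U << 6)	/* same bit as IOMAP_F_NOSUBMIT */

/* Hypothetical stand-in for the m_features field of struct xfs_mount. */
struct model_mount {
	uint64_t features;
};

/* Models the new case in xfs_fs_parse_param(); -1 stands in for -EINVAL. */
static int model_parse_param(struct model_mount *mp, const char *key)
{
	assert(mp && key);
	if (strcmp(key, "nodataio") == 0) {
		mp->features |= MODEL_FEAT_NODATAIO;
		return 0;
	}
	return -1;
}

/* Models the check added to xfs_bmbt_to_iomap(): tag the mapping flags. */
static unsigned int model_mapping_flags(const struct model_mount *mp,
					unsigned int flags)
{
	assert(mp);
	if (mp->features & MODEL_FEAT_NODATAIO)
		flags |= MODEL_F_NOSUBMIT;
	return flags;
}
```

Because the check sits in the common mapping conversion path, every data mapping handed to iomap carries the flag once the option is set; there is no per-file or per-operation control in this sketch or in the patch.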
fs/xfs/xfs_iomap.c | 3 +++
fs/xfs/xfs_mount.h | 2 ++
fs/xfs/xfs_super.c | 6 +++++-
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4087af7f3c9f..9b71a649e106 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -101,6 +101,9 @@ xfs_bmbt_to_iomap(
struct xfs_mount *mp = ip->i_mount;
struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+ if (xfs_has_nodataio(mp))
+ iomap_flags |= IOMAP_F_NOSUBMIT;
+
if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock))) {
xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
return xfs_alert_fsblock_zero(ip, imap);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e880aa48de68..fd8a5b46d449 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -294,6 +294,7 @@ typedef struct xfs_mount {
#define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */
/* Mount features */
+#define XFS_FEAT_NODATAIO (1ULL << 47) /* skip all data I/O */
#define XFS_FEAT_NOATTR2 (1ULL << 48) /* disable attr2 creation */
#define XFS_FEAT_NOALIGN (1ULL << 49) /* ignore alignment */
#define XFS_FEAT_ALLOCSIZE (1ULL << 50) /* user specified allocation size */
@@ -363,6 +364,7 @@ __XFS_HAS_FEAT(large_extent_counts, NREXT64)
* bit inodes and read-only state, are kept as operational state rather than
* features.
*/
+__XFS_HAS_FEAT(nodataio, NODATAIO)
__XFS_HAS_FEAT(noattr2, NOATTR2)
__XFS_HAS_FEAT(noalign, NOALIGN)
__XFS_HAS_FEAT(allocsize, ALLOCSIZE)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index bce020374c5e..1fb24b5ba684 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -103,7 +103,7 @@ enum {
Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
- Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
+ Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum, Opt_nodataio,
};
static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -148,6 +148,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
fsparam_flag("nodiscard", Opt_nodiscard),
fsparam_flag("dax", Opt_dax),
fsparam_enum("dax", Opt_dax_enum, dax_param_enums),
+ fsparam_flag("nodataio", Opt_nodataio),
{}
};
@@ -1385,6 +1386,9 @@ xfs_fs_parse_param(
xfs_fs_warn_deprecated(fc, param, XFS_FEAT_NOATTR2, true);
parsing_mp->m_features |= XFS_FEAT_NOATTR2;
return 0;
+ case Opt_nodataio:
+ parsing_mp->m_features |= XFS_FEAT_NODATAIO;
+ return 0;
default:
xfs_warn(parsing_mp, "unknown mount option [%s].", param->key);
return -EINVAL;
--
2.44.0
* Re: [PATCH RFC 0/3] xfs: nodataio mount option to skip data I/O
From: Kent Overstreet @ 2024-04-10 16:17 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs, linux-xfs
On Wed, Apr 10, 2024 at 10:09:53AM -0400, Brian Foster wrote:
> Hi all,
>
> bcachefs has a nodataio mount option that is used for isolated metadata
> performance testing purposes. When enabled, it performs all metadata I/O
> as normal and shortcuts data I/O by directly invoking bio completion.
> Kent had asked for something similar for fs comparison purposes some
> time ago and I put together a quick hack based around an iomap flag and
> mount option for XFS.
>
> I don't recall if I ever posted the initial version and Kent recently
> asked about whether we'd want to consider merging something like this. I
> think there are at least a couple things that probably need addressing
> before that is a viable option.
>
> One is that the mount option is kind of hacky in and of itself. Beyond
> that, this mechanism provides a means for stale data exposure because
> writes with nodataio mode enabled will operate as if writes were
> completed normally (including unwritten extent conversion). Therefore, a
> remount to !nodataio mode means we read off whatever was last written to
> storage.
>
> Kent mentioned that Eric (or somebody?) had floated the idea of a mkfs
> time feature flag or some such to control nodataio mode. That would
> avoid mount api changes in general and also disallow use of such
> filesystems in a non-nodataio mode, so to me seems like the direction
> bcachefs should go with its variant of this regardless.
>
> Personally, I don't have much of an opinion on whether something like
> this lands upstream or just remains as a local test hack for isolated
> performance testing. The code is simple enough as it is and not really
> worth the additional polishing for the latter, but I offered to at least
> rebase and post for discussion. Thoughts, reviews, flames appreciated.
>
> Brian
>
> Brian Foster (3):
> iomap: factor out a bio submission helper
> iomap: add nosubmit flag to skip data I/O on iomap mapping
> xfs: add nodataio mount option to skip all data I/O
>
> fs/iomap/buffered-io.c | 37 ++++++++++++++++++++++++++++---------
> fs/xfs/xfs_iomap.c | 3 +++
> fs/xfs/xfs_mount.h | 2 ++
> fs/xfs/xfs_super.c | 6 +++++-
> include/linux/iomap.h | 1 +
> 5 files changed, 39 insertions(+), 10 deletions(-)
>
> --
> 2.44.0
I'm contemplating adding the superblock option to bcachefs as well; that
would fit well with using this for working with metadata dumps too
("Yes, we know all data checksums are wrong, it's fine").
Another thing that makes this exceedingly useful: SSDs these days are
_garbage_ in terms of getting consistent results. Without this,
run-to-run variance is ridiculous unless you do a bunch of prep between
each test (which takes longer than the test itself).