* [PATCHSET 0/2] Add RWF_DONTCACHE support
From: Jens Axboe @ 2025-01-06 17:48 UTC
To: zlang; +Cc: djwong, fstests

Hi,

This adds support for RWF_DONTCACHE for fsx and fsstress, which was
used for testing the kernel patchset referenced in the two patches.

-- 
Jens Axboe
* [PATCH 1/2] fsstress: add support for RWF_DONTCACHE
From: Jens Axboe @ 2025-01-06 17:48 UTC
To: zlang; +Cc: djwong, fstests, Jens Axboe

Using RWF_DONTCACHE tells the kernel that any page cache instantiated
by this operation should get pruned once the operation completes. If
data is in cache prior to the operation it will remain there.

Add ops for testing both the read and write side of this. If the kernel
being tested doesn't support RWF_DONTCACHE, then operations are performed
with regular read/write buffered IO.

See the kernel posting adding support:

https://lore.kernel.org/linux-fsdevel/20241220154831.1086649-1-axboe@kernel.dk/

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 ltp/fsstress.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)

diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index 3d248ee25791..df9f6ffb86fc 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -82,6 +82,12 @@ static int renameat2(int dfd1, const char *path1,
 #define RENAME_WHITEOUT (1 << 2) /* Whiteout source */
 #endif
 
+#ifndef RWF_DONTCACHE
+#define RWF_DONTCACHE 0x80
+#endif
+
+static int have_rwf_dontcache = 1;
+
 #define FILELEN_MAX (32*4096)
 
 typedef enum {
@@ -117,6 +123,7 @@ typedef enum {
 	OP_COLLAPSE,
 	OP_INSERT,
 	OP_READ,
+	OP_READ_DONTCACHE,
 	OP_READLINK,
 	OP_READV,
 	OP_REMOVEFATTR,
@@ -143,6 +150,7 @@ typedef enum {
 	OP_URING_READ,
 	OP_URING_WRITE,
 	OP_WRITE,
+	OP_WRITE_DONTCACHE,
 	OP_WRITEV,
 	OP_EXCHANGE_RANGE,
 	OP_LAST
@@ -248,6 +256,7 @@ void zero_f(opnum_t, long);
 void collapse_f(opnum_t, long);
 void insert_f(opnum_t, long);
 void unshare_f(opnum_t, long);
+void read_dontcache_f(opnum_t, long);
 void read_f(opnum_t, long);
 void readlink_f(opnum_t, long);
 void readv_f(opnum_t, long);
@@ -273,6 +282,7 @@ void unlink_f(opnum_t, long);
 void unresvsp_f(opnum_t, long);
 void uring_read_f(opnum_t, long);
 void uring_write_f(opnum_t, long);
+void write_dontcache_f(opnum_t, long);
 void write_f(opnum_t, long);
 void writev_f(opnum_t, long);
 void exchangerange_f(opnum_t, long);
@@ -315,6 +325,7 @@ struct opdesc ops[OP_LAST] = {
 	[OP_COLLAPSE] = {"collapse", collapse_f, 1, 1 },
 	[OP_INSERT] = {"insert", insert_f, 1, 1 },
 	[OP_READ] = {"read", read_f, 1, 0 },
+	[OP_READ_DONTCACHE] = {"read_dontcache", read_dontcache_f, 1, 0 },
 	[OP_READLINK] = {"readlink", readlink_f, 1, 0 },
 	[OP_READV] = {"readv", readv_f, 1, 0 },
 	/* remove (delete) extended attribute */
@@ -346,6 +357,7 @@ struct opdesc ops[OP_LAST] = {
 	[OP_URING_WRITE] = {"uring_write", uring_write_f, 1, 1 },
 	[OP_WRITE] = {"write", write_f, 4, 1 },
 	[OP_WRITEV] = {"writev", writev_f, 4, 1 },
+	[OP_WRITE_DONTCACHE]= {"write_dontcache", write_dontcache_f,4, 1 },
 	[OP_EXCHANGE_RANGE]= {"exchangerange", exchangerange_f, 2, 1 },
 }, *ops_end;
 
@@ -4635,6 +4647,71 @@ readv_f(opnum_t opno, long r)
 	close(fd);
 }
 
+void
+read_dontcache_f(opnum_t opno, long r)
+{
+	int e;
+	pathname_t f;
+	int fd;
+	int64_t lr;
+	off64_t off;
+	struct stat64 stb;
+	int v;
+	char st[1024];
+	struct iovec iov;
+	int flags;
+
+	init_pathname(&f);
+	if (!get_fname(FT_REGFILE, r, &f, NULL, NULL, &v)) {
+		if (v)
+			printf("%d/%lld: read - no filename\n", procid, opno);
+		free_pathname(&f);
+		return;
+	}
+	fd = open_path(&f, O_RDONLY);
+	e = fd < 0 ? errno : 0;
+	check_cwd();
+	if (fd < 0) {
+		if (v)
+			printf("%d/%lld: read - open %s failed %d\n",
+				procid, opno, f.path, e);
+		free_pathname(&f);
+		return;
+	}
+	if (fstat64(fd, &stb) < 0) {
+		if (v)
+			printf("%d/%lld: read - fstat64 %s failed %d\n",
+				procid, opno, f.path, errno);
+		free_pathname(&f);
+		close(fd);
+		return;
+	}
+	inode_info(st, sizeof(st), &stb, v);
+	if (stb.st_size == 0) {
+		if (v)
+			printf("%d/%lld: read - %s%s zero size\n", procid, opno,
+				f.path, st);
+		free_pathname(&f);
+		close(fd);
+		return;
+	}
+	lr = ((int64_t)random() << 32) + random();
+	off = (off64_t)(lr % stb.st_size);
+	iov.iov_len = (random() % FILELEN_MAX) + 1;
+	iov.iov_base = malloc(iov.iov_len);
+	flags = have_rwf_dontcache ? RWF_DONTCACHE : 0;
+	e = preadv2(fd, &iov, 1, off, flags) < 0 ? errno : 0;
+	if (have_rwf_dontcache && e == EOPNOTSUPP)
+		e = preadv2(fd, &iov, 1, off, 0) < 0 ? errno : 0;
+	free(iov.iov_base);
+	if (v)
+		printf("%d/%lld: read dontcache %s%s [%lld,%d] %d\n",
+			procid, opno, f.path, st, (long long)off,
+			(int)iov.iov_len, e);
+	free_pathname(&f);
+	close(fd);
+}
+
 void
 removefattr_f(opnum_t opno, long r)
 {
@@ -5509,6 +5586,65 @@ writev_f(opnum_t opno, long r)
 	close(fd);
 }
 
+void
+write_dontcache_f(opnum_t opno, long r)
+{
+	int e;
+	pathname_t f;
+	int fd;
+	int64_t lr;
+	off64_t off;
+	struct stat64 stb;
+	int v;
+	char st[1024];
+	struct iovec iov;
+	int flags;
+
+	init_pathname(&f);
+	if (!get_fname(FT_REGm, r, &f, NULL, NULL, &v)) {
+		if (v)
+			printf("%d/%lld: write - no filename\n", procid, opno);
+		free_pathname(&f);
+		return;
+	}
+	fd = open_path(&f, O_WRONLY);
+	e = fd < 0 ? errno : 0;
+	check_cwd();
+	if (fd < 0) {
+		if (v)
+			printf("%d/%lld: write - open %s failed %d\n",
+				procid, opno, f.path, e);
+		free_pathname(&f);
+		return;
+	}
+	if (fstat64(fd, &stb) < 0) {
+		if (v)
+			printf("%d/%lld: write - fstat64 %s failed %d\n",
+				procid, opno, f.path, errno);
+		free_pathname(&f);
+		close(fd);
+		return;
+	}
+	inode_info(st, sizeof(st), &stb, v);
+	lr = ((int64_t)random() << 32) + random();
+	off = (off64_t)(lr % MIN(stb.st_size + (1024 * 1024), MAXFSIZE));
+	off %= maxfsize;
+	iov.iov_len = (random() % FILELEN_MAX) + 1;
+	iov.iov_base = malloc(iov.iov_len);
+	memset(iov.iov_base, nameseq & 0xff, iov.iov_len);
+	flags = have_rwf_dontcache ? RWF_DONTCACHE : 0;
+	e = pwritev2(fd, &iov, 1, off, flags) < 0 ? errno : 0;
+	if (have_rwf_dontcache && e == EOPNOTSUPP)
+		e = pwritev2(fd, &iov, 1, off, 0) < 0 ? errno : 0;
+	free(iov.iov_base);
+	if (v)
+		printf("%d/%lld: write dontcache %s%s [%lld,%d] %d\n",
+			procid, opno, f.path, st, (long long)off,
+			(int)iov.iov_len, e);
+	free_pathname(&f);
+	close(fd);
+}
+
 char *
 xattr_flag_to_string(int flag)
 {
-- 
2.47.1
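The commit message says page cache instantiated by an RWF_DONTCACHE operation is pruned once the operation completes, while pages that were already cached stay put. A minimal sketch for observing that from userspace, assuming Linux `mmap()` and `mincore()` (neither is used by the patch): `resident_pages()` is a hypothetical helper that counts how many pages of a file are currently in the page cache. Mapping the file does not fault its pages in, so the count reflects what earlier I/O left behind. As an assumption about timing, for writes the pages are likely dropped only after writeback finishes, so an `fsync()` before checking is advisable; exact behavior depends on the kernel and filesystem.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return how many pages of the file behind fd are
 * resident in the page cache right now, or -1 on error. fd must be open
 * for reading, since the file is mapped PROT_READ.
 */
static long resident_pages(int fd)
{
	long page = sysconf(_SC_PAGESIZE);
	struct stat st;
	unsigned char *vec;
	size_t npages, i;
	long count = 0;
	void *map;

	if (fstat(fd, &st) < 0)
		return -1;
	if (st.st_size == 0)
		return 0;		/* nothing to map, nothing cached */

	npages = (st.st_size + page - 1) / page;
	map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	vec = malloc(npages);
	if (!vec) {
		munmap(map, st.st_size);
		return -1;
	}

	/* Bit 0 of each vec[] entry is set if that page is resident. */
	if (mincore(map, st.st_size, vec) == 0) {
		for (i = 0; i < npages; i++)
			count += vec[i] & 1;
	} else {
		count = -1;
	}

	free(vec);
	munmap(map, st.st_size);
	return count;
}
```

For example, write a few pages with `pwritev2(..., RWF_DONTCACHE)`, call `fsync()`, and compare `resident_pages()` against the same sequence issued with flags set to 0.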
* Re: [PATCH 1/2] fsstress: add support for RWF_DONTCACHE
From: Darrick J. Wong @ 2025-01-07 2:11 UTC
To: Jens Axboe; +Cc: zlang, fstests

On Mon, Jan 06, 2025 at 10:48:46AM -0700, Jens Axboe wrote:
[...]
> @@ -4635,6 +4647,71 @@ readv_f(opnum_t opno, long r)
[...]
> +void
> +read_dontcache_f(opnum_t opno, long r)
> +{
[...]
> +	lr = ((int64_t)random() << 32) + random();
> +	off = (off64_t)(lr % stb.st_size);
> +	iov.iov_len = (random() % FILELEN_MAX) + 1;
> +	iov.iov_base = malloc(iov.iov_len);

Should there be a check for null iov_base after the allocation?

> +	flags = have_rwf_dontcache ? RWF_DONTCACHE : 0;
> +	e = preadv2(fd, &iov, 1, off, flags) < 0 ? errno : 0;
> +	if (have_rwf_dontcache && e == EOPNOTSUPP)

...and should this set have_rwf_dontcache = 0?

(Other than that, thanks for wiring this up...)

--D

> +		e = preadv2(fd, &iov, 1, off, 0) < 0 ? errno : 0;
[...]
* Re: [PATCH 1/2] fsstress: add support for RWF_DONTCACHE
From: Jens Axboe @ 2025-01-07 2:16 UTC
To: Darrick J. Wong; +Cc: zlang, fstests

On 1/6/25 7:11 PM, Darrick J. Wong wrote:
>> +void
>> +read_dontcache_f(opnum_t opno, long r)
>> +{
[...]
>> +	iov.iov_len = (random() % FILELEN_MAX) + 1;
>> +	iov.iov_base = malloc(iov.iov_len);
> 
> Should there be a check for null iov_base after the allocation?

Nothing else in fsstress seems to bother with malloc() failures, which
at least on Linux, is probably fair game.

>> +	flags = have_rwf_dontcache ? RWF_DONTCACHE : 0;
>> +	e = preadv2(fd, &iov, 1, off, flags) < 0 ? errno : 0;
>> +	if (have_rwf_dontcache && e == EOPNOTSUPP)
> 
> ...and should this set have_rwf_dontcache = 0?

I don't think so? If we get EOPNOTSUPP and we don't have dontcache, then
it's a fatal condition. fsstress defaults to it being available, so we
may very well run into EOPNOTSUPP and then just do a regular read or
write for that case. We could probably do:

	if (have_rwf_dontcache && e == EOPNOTSUPP) {
		have_rwf_dontcache = 0;
		e = preadv2(fd, &iov, 1, off, 0) < 0 ? errno : 0;
	}

here and on the write side, at least then we won't repeatedly try
RWF_DONTCACHE if we hit EOPNOTSUPP. But in terms of logic, it should be
correct as-is.

-- 
Jens Axboe
* Re: [PATCH 1/2] fsstress: add support for RWF_DONTCACHE
From: Darrick J. Wong @ 2025-01-07 17:30 UTC
To: Jens Axboe; +Cc: zlang, fstests

On Mon, Jan 06, 2025 at 07:16:17PM -0700, Jens Axboe wrote:
> On 1/6/25 7:11 PM, Darrick J. Wong wrote:
> >> +void
> >> +read_dontcache_f(opnum_t opno, long r)
> >> +{
[...]
> >> +	iov.iov_len = (random() % FILELEN_MAX) + 1;
> >> +	iov.iov_base = malloc(iov.iov_len);
> > 
> > Should there be a check for null iov_base after the allocation?
> 
> Nothing else in fsstress seems to bother with malloc() failures, which
> at least on Linux, is probably fair game.

lol ok.

> >> +	flags = have_rwf_dontcache ? RWF_DONTCACHE : 0;
> >> +	e = preadv2(fd, &iov, 1, off, flags) < 0 ? errno : 0;
> >> +	if (have_rwf_dontcache && e == EOPNOTSUPP)
> > 
> > ...and should this set have_rwf_dontcache = 0?
> 
> I don't think so? If we get EOPNOTSUPP and we don't have dontcache, then
> it's a fatal condition. fsstress defaults to it being available, so we
> may very well run into EOPNOTSUPP and then just do a regular read or
> write for that case. We could probably do:
> 
> 	if (have_rwf_dontcache && e == EOPNOTSUPP) {
> 		have_rwf_dontcache = 0;
> 		e = preadv2(fd, &iov, 1, off, 0) < 0 ? errno : 0;
> 	}

Yep, that's exactly what I was thinking. :)

--D

> here and on the write side, at least then we won't repeatedly try
> RWF_DONTCACHE if we hit EOPNOTSUPP. But in terms of logic, it should be
> correct as-is.
> 
> -- 
> Jens Axboe
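The variant Jens proposes and Darrick agrees with makes the fallback sticky: the first EOPNOTSUPP clears have_rwf_dontcache, so later operations stop trying the flag. A minimal standalone sketch of that logic, folded into one hypothetical helper, `rw_maybe_dontcache()`, shared by the read and write sides. The patch itself open-codes this in read_dontcache_f() and write_dontcache_f() and records errno instead of returning a byte count; RWF_DONTCACHE falls back to 0x80 exactly as the patch defines it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE	0x80	/* same fallback value the patch uses */
#endif

/* Optimistically assume support, as fsstress does. */
static int have_rwf_dontcache = 1;

/*
 * Hypothetical helper: issue one positional read (is_write == 0) or
 * write (is_write != 0). RWF_DONTCACHE is requested while
 * have_rwf_dontcache is set; the first EOPNOTSUPP clears it so later
 * calls go straight to plain buffered I/O, and the failed call is
 * retried once without the flag. Returns bytes transferred, or -1 with
 * errno set.
 */
static ssize_t rw_maybe_dontcache(int fd, struct iovec *iov, off_t off,
				  int is_write)
{
	int flags = have_rwf_dontcache ? RWF_DONTCACHE : 0;
	ssize_t ret;

	ret = is_write ? pwritev2(fd, iov, 1, off, flags)
		       : preadv2(fd, iov, 1, off, flags);
	if (ret < 0 && have_rwf_dontcache && errno == EOPNOTSUPP) {
		have_rwf_dontcache = 0;	/* never retry the flag again */
		ret = is_write ? pwritev2(fd, iov, 1, off, 0)
			       : preadv2(fd, iov, 1, off, 0);
	}
	return ret;
}
```

Whether the kernel accepts the flag or not, the data path is the same; only the caching behavior differs, which is why the fallback is safe for a stress tool.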
* [PATCH 2/2] fsx: add support for RWF_DONTCACHE
From: Jens Axboe @ 2025-01-06 17:48 UTC
To: zlang; +Cc: djwong, fstests, Jens Axboe

Using RWF_DONTCACHE tells the kernel that any page cache instantiated
by this operation should get pruned once the operation completes. If
data is in cache prior to the operation it will remain there.

Add ops for testing both the read and write side of this. At startup,
kernel support for this feature is probed. If support isn't available,
uncached/dontcache IO is performed as regular buffered IO. If -Z is
used to turn on O_DIRECT, then uncached/dontcache IO isn't performed.
Defaults to on if available, and adds a -T parameter to turn it off.

See the kernel posting adding support:

https://lore.kernel.org/linux-fsdevel/20241220154831.1086649-1-axboe@kernel.dk/

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 ltp/fsx.c | 115 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 77 insertions(+), 38 deletions(-)

diff --git a/ltp/fsx.c b/ltp/fsx.c
index 41933354328a..7c996026157d 100644
--- a/ltp/fsx.c
+++ b/ltp/fsx.c
@@ -43,6 +43,10 @@
 # define MAP_FILE 0
 #endif
 
+#ifndef RWF_DONTCACHE
+#define RWF_DONTCACHE 0x80
+#endif
+
 #define NUMPRINTCOLUMNS 32 /* # columns of data to print on each line */
 
 /* Operation flags (bitmask) */
@@ -101,7 +105,9 @@ int logcount = 0; /* total ops */
 enum {
 	/* common operations */
 	OP_READ = 0,
+	OP_READ_DONTCACHE,
 	OP_WRITE,
+	OP_WRITE_DONTCACHE,
 	OP_MAPREAD,
 	OP_MAPWRITE,
 	OP_MAX_LITE,
@@ -190,15 +196,16 @@ int o_direct; /* -Z */
 int aio = 0;
 int uring = 0;
 int mark_nr = 0;
+int dontcache_io = 1;
 
 int page_size;
 int page_mask;
 int mmap_mask;
-int fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset);
+int fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags);
 #define READ 0
 #define WRITE 1
-#define fsxread(a,b,c,d) fsx_rw(READ, a,b,c,d)
-#define fsxwrite(a,b,c,d) fsx_rw(WRITE, a,b,c,d)
+#define fsxread(a,b,c,d,f) fsx_rw(READ, a,b,c,d,f)
+#define fsxwrite(a,b,c,d,f) fsx_rw(WRITE, a,b,c,d,f)
 
 struct timespec deadline;
 
@@ -266,7 +273,9 @@ prterr(const char *prefix)
 
 static const char *op_names[] = {
 	[OP_READ] = "read",
+	[OP_READ_DONTCACHE] = "read_dontcache",
 	[OP_WRITE] = "write",
+	[OP_WRITE_DONTCACHE] = "write_dontcache",
 	[OP_MAPREAD] = "mapread",
 	[OP_MAPWRITE] = "mapwrite",
 	[OP_TRUNCATE] = "truncate",
@@ -393,12 +402,14 @@ logdump(void)
 				prt("\t******WWWW");
 			break;
 		case OP_READ:
+		case OP_READ_DONTCACHE:
 			prt("READ 0x%x thru 0x%x\t(0x%x bytes)",
 			    lp->args[0], lp->args[0] + lp->args[1] - 1,
 			    lp->args[1]);
 			if (overlap)
 				prt("\t***RRRR***");
 			break;
+		case OP_WRITE_DONTCACHE:
 		case OP_WRITE:
 			prt("WRITE 0x%x thru 0x%x\t(0x%x bytes)",
 			    lp->args[0], lp->args[0] + lp->args[1] - 1,
@@ -784,9 +795,8 @@ doflush(unsigned offset, unsigned size)
 }
 
 void
-doread(unsigned offset, unsigned size)
+doread(unsigned offset, unsigned size, int flags)
 {
-	off_t ret;
 	unsigned iret;
 
 	offset -= offset % readbdy;
@@ -818,12 +828,7 @@ doread(unsigned offset, unsigned size)
 		   (monitorend == -1 || offset <= monitorend))))))
 		prt("%lld read\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls,
 		    offset, offset + size - 1, size);
-	ret = lseek(fd, (off_t)offset, SEEK_SET);
-	if (ret == (off_t)-1) {
-		prterr("doread: lseek");
-		report_failure(140);
-	}
-	iret = fsxread(fd, temp_buf, size, offset);
+	iret = fsxread(fd, temp_buf, size, offset, flags);
 	if (iret != size) {
 		if (iret == -1)
 			prterr("doread: read");
@@ -870,7 +875,6 @@ check_contents(void)
 	unsigned map_offset;
 	unsigned map_size;
 	char *p;
-	off_t ret;
 	unsigned iret;
 
 	if (!check_buf) {
@@ -885,13 +889,7 @@ check_contents(void)
 	if (size == 0)
 		return;
 
-	ret = lseek(fd, (off_t)offset, SEEK_SET);
-	if (ret == (off_t)-1) {
-		prterr("doread: lseek");
-		report_failure(140);
-	}
-
-	iret = fsxread(fd, check_buf, size, offset);
+	iret = fsxread(fd, check_buf, size, offset, 0);
 	if (iret != size) {
 		if (iret == -1)
 			prterr("check_contents: read");
@@ -1064,9 +1062,8 @@ update_file_size(unsigned offset, unsigned size)
 }
 
 void
-dowrite(unsigned offset, unsigned size)
+dowrite(unsigned offset, unsigned size, int flags)
 {
-	off_t ret;
 	unsigned iret;
 
 	offset -= offset % writebdy;
@@ -1099,14 +1096,9 @@ dowrite(unsigned offset, unsigned size)
 		  (monitorstart == -1 ||
 		   (offset + size > monitorstart &&
 		    (monitorend == -1 || offset <= monitorend))))))
-		prt("%lld write\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls,
-		    offset, offset + size - 1, size);
-	ret = lseek(fd, (off_t)offset, SEEK_SET);
-	if (ret == (off_t)-1) {
-		prterr("dowrite: lseek");
-		report_failure(150);
-	}
-	iret = fsxwrite(fd, good_buf + offset, size, offset);
+		prt("%lld write\t0x%x thru\t0x%x\t(0x%x bytes)\tdontcache=%d\n", testcalls,
+		    offset, offset + size - 1, size, (flags & RWF_DONTCACHE) != 0);
+	iret = fsxwrite(fd, good_buf + offset, size, offset, flags);
 	if (iret != size) {
 		if (iret == -1)
 			prterr("dowrite: write");
@@ -1954,6 +1946,26 @@ do_preallocate(unsigned offset, unsigned length, int keep_size, int unshare)
 }
 #endif
 
+int
+test_dontcache_io(void)
+{
+	char buf[4096];
+	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	int ret, e;
+
+	ret = preadv2(fd, &iov, 1, 0, RWF_DONTCACHE);
+	e = ret < 0 ? errno : 0;
+	if (e == EOPNOTSUPP) {
+		if (!quiet)
+			fprintf(stderr,
+				"main: filesystem does not support "
+				"dontcache IO, disabling!\n");
+		return 0;
+	}
+
+	return 1;
+}
+
 void
 writefileimage()
 {
@@ -2337,12 +2349,28 @@ have_op:
 	switch (op) {
 	case OP_READ:
 		TRIM_OFF_LEN(offset, size, file_size);
-		doread(offset, size);
+		doread(offset, size, 0);
+		break;
+
+	case OP_READ_DONTCACHE:
+		TRIM_OFF_LEN(offset, size, file_size);
+		if (dontcache_io)
+			doread(offset, size, RWF_DONTCACHE);
+		else
+			doread(offset, size, 0);
 		break;
 
 	case OP_WRITE:
 		TRIM_OFF_LEN(offset, size, maxfilelen);
-		dowrite(offset, size);
+		dowrite(offset, size, 0);
+		break;
+
+	case OP_WRITE_DONTCACHE:
+		TRIM_OFF_LEN(offset, size, maxfilelen);
+		if (dontcache_io)
+			dowrite(offset, size, RWF_DONTCACHE);
+		else
+			dowrite(offset, size, 0);
 		break;
 
 	case OP_MAPREAD:
@@ -2538,6 +2566,7 @@ usage(void)
 "	-0: Do not use exchange range calls\n"
 #endif
 "	-K: Do not use keep size\n\
+	-T: Do not use dontcache IO\n\
 	-L: fsxLite - no file creations & no file size changes\n\
 	-N numops: total # operations to do (default infinity)\n\
 	-O: use oplen (see -o flag) for every op (default random)\n\
@@ -2546,7 +2575,7 @@ usage(void)
 	-S seed: for random # generator (default 1) 0 gets timestamp\n\
 	-W: mapped write operations DISabled\n\
 	-X: Read file and compare to good buffer after every operation\n\
-	-Z: O_DIRECT (use -R, -W, -r and -w too)\n\
+	-Z: O_DIRECT (use -R, -W, -r and -w too, excludes dontcache IO)\n\
 	--replay-ops=opsfile: replay ops from recorded .fsxops file\n\
 	--record-ops[=opsfile]: dump ops file also on success. optionally specify ops file name\n\
 	--duration=seconds: ignore any -N setting and run for this many seconds\n\
@@ -2702,7 +2731,7 @@ uring_setup()
 }
 
 int
-uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset)
+uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags)
 {
 	struct io_uring_sqe *sqe;
 	struct io_uring_cqe *cqe;
@@ -2733,6 +2762,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset)
 		} else {
 			io_uring_prep_writev(sqe, fd, &iovec, 1, o);
 		}
+		sqe->rw_flags = flags;
 
 		ret = io_uring_submit_and_wait(&ring, 1);
 		if (ret != 1) {
@@ -2781,7 +2811,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset)
 }
 #else
 int
-uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset)
+uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags)
 {
 	fprintf(stderr, "io_rw: need IO_URING support!\n");
 	exit(111);
@@ -2789,19 +2819,21 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset)
 #endif
 
 int
-fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset)
+fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags)
 {
 	int ret;
 
 	if (aio) {
 		ret = aio_rw(rw, fd, buf, len, offset);
 	} else if (uring) {
-		ret = uring_rw(rw, fd, buf, len, offset);
+		ret = uring_rw(rw, fd, buf, len, offset, flags);
 	} else {
+		struct iovec iov = { .iov_base = buf, .iov_len = len };
+
 		if (rw == READ)
-			ret = read(fd, buf, len);
+			ret = preadv2(fd, &iov, 1, offset, flags);
 		else
-			ret = write(fd, buf, len);
+			ret = pwritev2(fd, &iov, 1, offset, flags);
 	}
 	return ret;
 }
@@ -3065,6 +3097,9 @@ main(int argc, char **argv)
 			if (seed < 0)
 				usage();
 			break;
+		case 'T':
+			dontcache_io = 0;
+			break;
 		case 'W':
 			mapped_writes = 0;
 			if (!quiet)
@@ -3076,6 +3111,7 @@ main(int argc, char **argv)
 		case 'Z':
 			o_direct = O_DIRECT;
 			o_flags |= O_DIRECT;
+			dontcache_io = 0;
 			break;
 		case 254:  /* --duration */
 			if (!optarg) {
@@ -3293,6 +3329,9 @@ main(int argc, char **argv)
 		copy_range_calls = test_copy_range();
 	if (exchange_range_calls)
 		exchange_range_calls = test_exchange_range();
+	if (dontcache_io)
+		dontcache_io = test_dontcache_io();
+	printf("Dontcache_io=%d\n", dontcache_io);
 
 	while (keep_running())
 		if (!test())
-- 
2.47.1
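Patch 2/2 also drops the lseek() calls in doread(), dowrite() and check_contents(): preadv2()/pwritev2() take the offset as an argument and, for any offset other than -1, neither use nor update the file descriptor's offset. A small sketch that demonstrates this; `pwrite_then_tell()` is a hypothetical helper (not part of fsx) that writes at an explicit offset and then reports the descriptor's offset via `lseek(fd, 0, SEEK_CUR)`.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at an explicit offset with
 * pwritev2() (flags passed through, e.g. 0 or RWF_DONTCACHE), store the
 * result in *written, and return the descriptor's file offset afterwards.
 * For off != -1 the file offset is neither used nor moved, which is why
 * no lseek() is needed before positional I/O.
 */
static off_t pwrite_then_tell(int fd, const void *buf, size_t len,
			      off_t off, int flags, ssize_t *written)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	*written = pwritev2(fd, &iov, 1, off, flags);
	return lseek(fd, 0, SEEK_CUR);
}
```

Writing at offset 4096 on a freshly opened file leaves the descriptor's offset at 0. Passing -1 as the offset instead makes pwritev2() use and advance the current file offset, like writev().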
* Re: [PATCH 2/2] fsx: add support for RWF_DONTCACHE
From: Darrick J. Wong @ 2025-01-07 2:09 UTC
To: Jens Axboe; +Cc: zlang, fstests

On Mon, Jan 06, 2025 at 10:48:47AM -0700, Jens Axboe wrote:
[...]
> @@ -2538,6 +2566,7 @@ usage(void)
>  "	-0: Do not use exchange range calls\n"
>  #endif
>  "	-K: Do not use keep size\n\
> +	-T: Do not use dontcache IO\n\
>  	-L: fsxLite - no file creations & no file size changes\n\
>  	-N numops: total # operations to do (default infinity)\n\
>  	-O: use oplen (see -o flag) for every op (default random)\n\
> @@ -2546,7 +2575,7 @@ usage(void)
>  	-S seed: for random # generator (default 1) 0 gets timestamp\n\
>  	-W: mapped write operations DISabled\n\
>  	-X: Read file and compare to good buffer after every operation\n\
> -	-Z: O_DIRECT (use -R, -W, -r and -w too)\n\
> +	-Z: O_DIRECT (use -R, -W, -r and -w too, excludes dontcache IO)\n\
>  	--replay-ops=opsfile: replay ops from recorded .fsxops file\n\
>  	--record-ops[=opsfile]: dump ops file also on success.
optionally specify ops file name\n\ > --duration=seconds: ignore any -N setting and run for this many seconds\n\ > @@ -2702,7 +2731,7 @@ uring_setup() > } > > int > -uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > +uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) > { > struct io_uring_sqe *sqe; > struct io_uring_cqe *cqe; > @@ -2733,6 +2762,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > } else { > io_uring_prep_writev(sqe, fd, &iovec, 1, o); > } > + sqe->rw_flags = flags; > > ret = io_uring_submit_and_wait(&ring, 1); > if (ret != 1) { > @@ -2781,7 +2811,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > } > #else > int > -uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > +uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) > { > fprintf(stderr, "io_rw: need IO_URING support!\n"); > exit(111); > @@ -2789,19 +2819,21 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > #endif > > int > -fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > +fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) > { > int ret; > > if (aio) { > ret = aio_rw(rw, fd, buf, len, offset); > } else if (uring) { > - ret = uring_rw(rw, fd, buf, len, offset); > + ret = uring_rw(rw, fd, buf, len, offset, flags); > } else { > + struct iovec iov = { .iov_base = buf, .iov_len = len }; > + > if (rw == READ) > - ret = read(fd, buf, len); > + ret = preadv2(fd, &iov, 1, offset, flags); > else > - ret = write(fd, buf, len); > + ret = pwritev2(fd, &iov, 1, offset, flags); > } > return ret; > } > @@ -3065,6 +3097,9 @@ main(int argc, char **argv) > if (seed < 0) > usage(); > break; > + case 'T': > + dontcache_io = 0; > + break; > case 'W': > mapped_writes = 0; > if (!quiet) > @@ -3076,6 +3111,7 @@ main(int argc, char **argv) > case 'Z': > o_direct = O_DIRECT; > o_flags |= O_DIRECT; > + dontcache_io = 0; > break; 
> case 254: /* --duration */ > if (!optarg) { > @@ -3293,6 +3329,9 @@ main(int argc, char **argv) > copy_range_calls = test_copy_range(); > if (exchange_range_calls) > exchange_range_calls = test_exchange_range(); > + if (dontcache_io) > + dontcache_io = test_dontcache_io(); > + printf("Dontcache_io=%d\n", dontcache_io); Is this a debug printf that got left in by mistake? (Everything else in here looks fine to me...) --D > > while (keep_running()) > if (!test()) > -- > 2.47.1 > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fsx: add support for RWF_DONTCACHE
From: Jens Axboe @ 2025-01-07 2:12 UTC
To: Darrick J. Wong; +Cc: zlang, fstests

On 1/6/25 7:09 PM, Darrick J. Wong wrote:
>> @@ -3293,6 +3329,9 @@ main(int argc, char **argv)
>>  		copy_range_calls = test_copy_range();
>>  	if (exchange_range_calls)
>>  		exchange_range_calls = test_exchange_range();
>> +	if (dontcache_io)
>> +		dontcache_io = test_dontcache_io();
>> +	printf("Dontcache_io=%d\n", dontcache_io);
>
> Is this a debug printf that got left in by mistake?
>
> (Everything else in here looks fine to me...)

Oops indeed yes, good catch!

--
Jens Axboe
* [PATCHSET v2 0/2] Add RWF_DONTCACHE support
From: Jens Axboe @ 2025-01-07 16:05 UTC
To: zlang; +Cc: djwong, fstests

Hi,

This adds RWF_DONTCACHE support to fsx and fsstress. That support was
used to test the kernel patchset referenced in the two patches.

Since v1:
- Remove debug printf in fsx (Darrick)
- Disable DONTCACHE in fsstress if it fails with -EOPNOTSUPP

--
Jens Axboe
* [PATCH 2/2] fsx: add support for RWF_DONTCACHE 2025-01-07 16:05 [PATCHSET v2 0/2] Add RWF_DONTCACHE support Jens Axboe @ 2025-01-07 16:05 ` Jens Axboe 2025-01-07 18:19 ` Darrick J. Wong 0 siblings, 1 reply; 13+ messages in thread From: Jens Axboe @ 2025-01-07 16:05 UTC (permalink / raw) To: zlang; +Cc: djwong, fstests, Jens Axboe Using RWF_DONTCACHE tells the kernel that any page cache instantiated by this operation should get pruned once the operation completes. If data is in cache prior to the operation it will remain there. Add ops for testing both the read and write side of this. At startup, kernel support for this feature is probed. If support isn't available, uncached/dontcache IO is performed as regular buffered IO. If -Z is used to turn on O_DIRECT, then uncached/dontcache IO isn't performed. Defaults to on if available, and adds a -T parameter to turn it off. See the kernel posting adding support: https://lore.kernel.org/linux-fsdevel/20241220154831.1086649-1-axboe@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk> --- ltp/fsx.c | 114 ++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 76 insertions(+), 38 deletions(-) diff --git a/ltp/fsx.c b/ltp/fsx.c index 41933354328a..9efd2f5c86d1 100644 --- a/ltp/fsx.c +++ b/ltp/fsx.c @@ -43,6 +43,10 @@ # define MAP_FILE 0 #endif +#ifndef RWF_DONTCACHE +#define RWF_DONTCACHE 0x80 +#endif + #define NUMPRINTCOLUMNS 32 /* # columns of data to print on each line */ /* Operation flags (bitmask) */ @@ -101,7 +105,9 @@ int logcount = 0; /* total ops */ enum { /* common operations */ OP_READ = 0, + OP_READ_DONTCACHE, OP_WRITE, + OP_WRITE_DONTCACHE, OP_MAPREAD, OP_MAPWRITE, OP_MAX_LITE, @@ -190,15 +196,16 @@ int o_direct; /* -Z */ int aio = 0; int uring = 0; int mark_nr = 0; +int dontcache_io = 1; int page_size; int page_mask; int mmap_mask; -int fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset); +int fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags); #define 
READ 0 #define WRITE 1 -#define fsxread(a,b,c,d) fsx_rw(READ, a,b,c,d) -#define fsxwrite(a,b,c,d) fsx_rw(WRITE, a,b,c,d) +#define fsxread(a,b,c,d,f) fsx_rw(READ, a,b,c,d,f) +#define fsxwrite(a,b,c,d,f) fsx_rw(WRITE, a,b,c,d,f) struct timespec deadline; @@ -266,7 +273,9 @@ prterr(const char *prefix) static const char *op_names[] = { [OP_READ] = "read", + [OP_READ_DONTCACHE] = "read_dontcache", [OP_WRITE] = "write", + [OP_WRITE_DONTCACHE] = "write_dontcache", [OP_MAPREAD] = "mapread", [OP_MAPWRITE] = "mapwrite", [OP_TRUNCATE] = "truncate", @@ -393,12 +402,14 @@ logdump(void) prt("\t******WWWW"); break; case OP_READ: + case OP_READ_DONTCACHE: prt("READ 0x%x thru 0x%x\t(0x%x bytes)", lp->args[0], lp->args[0] + lp->args[1] - 1, lp->args[1]); if (overlap) prt("\t***RRRR***"); break; + case OP_WRITE_DONTCACHE: case OP_WRITE: prt("WRITE 0x%x thru 0x%x\t(0x%x bytes)", lp->args[0], lp->args[0] + lp->args[1] - 1, @@ -784,9 +795,8 @@ doflush(unsigned offset, unsigned size) } void -doread(unsigned offset, unsigned size) +doread(unsigned offset, unsigned size, int flags) { - off_t ret; unsigned iret; offset -= offset % readbdy; @@ -818,12 +828,7 @@ doread(unsigned offset, unsigned size) (monitorend == -1 || offset <= monitorend)))))) prt("%lld read\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, offset, offset + size - 1, size); - ret = lseek(fd, (off_t)offset, SEEK_SET); - if (ret == (off_t)-1) { - prterr("doread: lseek"); - report_failure(140); - } - iret = fsxread(fd, temp_buf, size, offset); + iret = fsxread(fd, temp_buf, size, offset, flags); if (iret != size) { if (iret == -1) prterr("doread: read"); @@ -870,7 +875,6 @@ check_contents(void) unsigned map_offset; unsigned map_size; char *p; - off_t ret; unsigned iret; if (!check_buf) { @@ -885,13 +889,7 @@ check_contents(void) if (size == 0) return; - ret = lseek(fd, (off_t)offset, SEEK_SET); - if (ret == (off_t)-1) { - prterr("doread: lseek"); - report_failure(140); - } - - iret = fsxread(fd, check_buf, size, offset); + iret 
= fsxread(fd, check_buf, size, offset, 0); if (iret != size) { if (iret == -1) prterr("check_contents: read"); @@ -1064,9 +1062,8 @@ update_file_size(unsigned offset, unsigned size) } void -dowrite(unsigned offset, unsigned size) +dowrite(unsigned offset, unsigned size, int flags) { - off_t ret; unsigned iret; offset -= offset % writebdy; @@ -1099,14 +1096,9 @@ dowrite(unsigned offset, unsigned size) (monitorstart == -1 || (offset + size > monitorstart && (monitorend == -1 || offset <= monitorend)))))) - prt("%lld write\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, - offset, offset + size - 1, size); - ret = lseek(fd, (off_t)offset, SEEK_SET); - if (ret == (off_t)-1) { - prterr("dowrite: lseek"); - report_failure(150); - } - iret = fsxwrite(fd, good_buf + offset, size, offset); + prt("%lld write\t0x%x thru\t0x%x\t(0x%x bytes)\tdontcache=%d\n", testcalls, + offset, offset + size - 1, size, (flags & RWF_DONTCACHE) != 0); + iret = fsxwrite(fd, good_buf + offset, size, offset, flags); if (iret != size) { if (iret == -1) prterr("dowrite: write"); @@ -1954,6 +1946,26 @@ do_preallocate(unsigned offset, unsigned length, int keep_size, int unshare) } #endif +int +test_dontcache_io(void) +{ + char buf[4096]; + struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) }; + int ret, e; + + ret = preadv2(fd, &iov, 1, 0, RWF_DONTCACHE); + e = ret < 0 ? 
errno : 0; + if (e == EOPNOTSUPP) { + if (!quiet) + fprintf(stderr, + "main: filesystem does not support " + "dontcache IO, disabling!\n"); + return 0; + } + + return 1; +} + void writefileimage() { @@ -2337,12 +2349,28 @@ have_op: switch (op) { case OP_READ: TRIM_OFF_LEN(offset, size, file_size); - doread(offset, size); + doread(offset, size, 0); + break; + + case OP_READ_DONTCACHE: + TRIM_OFF_LEN(offset, size, file_size); + if (dontcache_io) + doread(offset, size, RWF_DONTCACHE); + else + doread(offset, size, 0); break; case OP_WRITE: TRIM_OFF_LEN(offset, size, maxfilelen); - dowrite(offset, size); + dowrite(offset, size, 0); + break; + + case OP_WRITE_DONTCACHE: + TRIM_OFF_LEN(offset, size, maxfilelen); + if (dontcache_io) + dowrite(offset, size, RWF_DONTCACHE); + else + dowrite(offset, size, 0); break; case OP_MAPREAD: @@ -2538,6 +2566,7 @@ usage(void) " -0: Do not use exchange range calls\n" #endif " -K: Do not use keep size\n\ + -T: Do not use dontcache IO\n\ -L: fsxLite - no file creations & no file size changes\n\ -N numops: total # operations to do (default infinity)\n\ -O: use oplen (see -o flag) for every op (default random)\n\ @@ -2546,7 +2575,7 @@ usage(void) -S seed: for random # generator (default 1) 0 gets timestamp\n\ -W: mapped write operations DISabled\n\ -X: Read file and compare to good buffer after every operation\n\ - -Z: O_DIRECT (use -R, -W, -r and -w too)\n\ + -Z: O_DIRECT (use -R, -W, -r and -w too, excludes dontcache IO)\n\ --replay-ops=opsfile: replay ops from recorded .fsxops file\n\ --record-ops[=opsfile]: dump ops file also on success. 
optionally specify ops file name\n\ --duration=seconds: ignore any -N setting and run for this many seconds\n\ @@ -2702,7 +2731,7 @@ uring_setup() } int -uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) +uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) { struct io_uring_sqe *sqe; struct io_uring_cqe *cqe; @@ -2733,6 +2762,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) } else { io_uring_prep_writev(sqe, fd, &iovec, 1, o); } + sqe->rw_flags = flags; ret = io_uring_submit_and_wait(&ring, 1); if (ret != 1) { @@ -2781,7 +2811,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) } #else int -uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) +uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) { fprintf(stderr, "io_rw: need IO_URING support!\n"); exit(111); @@ -2789,19 +2819,21 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) #endif int -fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) +fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) { int ret; if (aio) { ret = aio_rw(rw, fd, buf, len, offset); } else if (uring) { - ret = uring_rw(rw, fd, buf, len, offset); + ret = uring_rw(rw, fd, buf, len, offset, flags); } else { + struct iovec iov = { .iov_base = buf, .iov_len = len }; + if (rw == READ) - ret = read(fd, buf, len); + ret = preadv2(fd, &iov, 1, offset, flags); else - ret = write(fd, buf, len); + ret = pwritev2(fd, &iov, 1, offset, flags); } return ret; } @@ -3065,6 +3097,9 @@ main(int argc, char **argv) if (seed < 0) usage(); break; + case 'T': + dontcache_io = 0; + break; case 'W': mapped_writes = 0; if (!quiet) @@ -3076,6 +3111,7 @@ main(int argc, char **argv) case 'Z': o_direct = O_DIRECT; o_flags |= O_DIRECT; + dontcache_io = 0; break; case 254: /* --duration */ if (!optarg) { @@ -3293,6 +3329,8 @@ main(int argc, char **argv) copy_range_calls = test_copy_range(); if 
(exchange_range_calls) exchange_range_calls = test_exchange_range(); + if (dontcache_io) + dontcache_io = test_dontcache_io(); while (keep_running()) if (!test()) -- 2.47.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fsx: add support for RWF_DONTCACHE
From: Darrick J. Wong @ 2025-01-07 18:19 UTC
To: Jens Axboe; +Cc: zlang, fstests

On Tue, Jan 07, 2025 at 09:05:15AM -0700, Jens Axboe wrote:
> Using RWF_DONTCACHE tells the kernel that any page cache instantiated
> by this operation should get pruned once the operation completes. If
> data is in cache prior to the operation it will remain there.
>
> Add ops for testing both the read and write side of this. At startup,
> kernel support for this feature is probed. If support isn't available,
> uncached/dontcache IO is performed as regular buffered IO. If -Z is
> used to turn on O_DIRECT, then uncached/dontcache IO isn't performed.

Huh. Does the kernel reject RWF_DONTCACHE for directio? And, if a
directio implementation falls back to the pagecache (e.g. xfs when doing
a sub-fsblock cow write), do we:

(a) want RWF_DONTCACHE to propagate through to the buffered io
implementation (which I think xfs does) and

(b) should filesystems *turn it on* any time they fall back, even if the
original IO request didn't set DONTCACHE?

(Aside from those questions, the code changes look good.)

--D

> Defaults to on if available, and adds a -T parameter to turn it off.
* Re: [PATCH 2/2] fsx: add support for RWF_DONTCACHE
From: Jens Axboe @ 2025-01-07 18:24 UTC
To: Darrick J. Wong; +Cc: zlang, fstests

On 1/7/25 11:19 AM, Darrick J. Wong wrote:
> On Tue, Jan 07, 2025 at 09:05:15AM -0700, Jens Axboe wrote:
>> Using RWF_DONTCACHE tells the kernel that any page cache instantiated
>> by this operation should get pruned once the operation completes. If
>> data is in cache prior to the operation it will remain there.
>>
>> Add ops for testing both the read and write side of this. At startup,
>> kernel support for this feature is probed. If support isn't available,
>> uncached/dontcache IO is performed as regular buffered IO. If -Z is
>> used to turn on O_DIRECT, then uncached/dontcache IO isn't performed.
>
> Huh. Does the kernel reject RWF_DONTCACHE for directio? And, if a

It doesn't, it simply ignores it. Not sure why you ask? It's buffered IO
after all, falling back to just clearing the flag seems like the most
sensible solution here.

> directio implementation falls back to the pagecache (e.g. xfs when doing
> a sub-fsblock cow write), do we:
>
> (a) want RWF_DONTCACHE to propagate through to the buffered io
> implementation (which I think xfs does) and

Maybe? The current implementation keeps things simple and doesn't touch
any of that stuff, but conceptually it'd make sense to mark those
buffered ranges as uncached, if instantiated as buffered IO on behalf of
direct IO.

> (b) should filesystems *turn it on* any time they fall back, even if the
> original IO request didn't set DONTCACHE?

Same answer :-)

--
Jens Axboe
* Re: [PATCH 2/2] fsx: add support for RWF_DONTCACHE
From: Darrick J. Wong @ 2025-01-07 23:22 UTC
To: Jens Axboe; +Cc: zlang, fstests

On Tue, Jan 07, 2025 at 11:24:13AM -0700, Jens Axboe wrote:
> On 1/7/25 11:19 AM, Darrick J. Wong wrote:
> > On Tue, Jan 07, 2025 at 09:05:15AM -0700, Jens Axboe wrote:
> >> Using RWF_DONTCACHE tells the kernel that any page cache instantiated
> >> by this operation should get pruned once the operation completes. If
> >> data is in cache prior to the operation it will remain there.
> >>
> >> Add ops for testing both the read and write side of this. At startup,
> >> kernel support for this feature is probed. If support isn't available,
> >> uncached/dontcache IO is performed as regular buffered IO. If -Z is
> >> used to turn on O_DIRECT, then uncached/dontcache IO isn't performed.
> >
> > Huh. Does the kernel reject RWF_DONTCACHE for directio? And, if a
>
> It doesn't, it simply ignores it. Not sure why you ask? It's buffered IO
> after all, falling back to just clearing the flag seems like the most
> sensible solution here.

I was curious, because your code sets dontcache_io=0 when -Z is used to
select directio mode. So I wondered if that was because the kernel
would return EOPNOTSUPP for directio + RWF_DONTCACHE? :)

Then I wondered if there was actually a good usecase either for letting
userspace specify it, or for filesystems to add it for buffered write
fallback. At this point I would wager there's a stronger case for
adding drop-behind automatically because userspace shouldn't have to
communicate "write this without accessing the page cache, and don't
leave file contents in the page cache that I already told you not to
do."

Anyway the fstests change satisfies me now so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> > directio implementation falls back to the pagecache (e.g. xfs when doing
> > a sub-fsblock cow write), do we:
> >
> > (a) want RWF_DONTCACHE to propagate through to the buffered io
> > implementation (which I think xfs does) and
>
> Maybe? The current implementation keeps things simple and doesn't touch
> any of that stuff, but conceptually it'd make sense to mark those
> buffered ranges as uncached, if instantiated as buffered IO on behalf of
> direct IO.
>
> > (b) should filesystems *turn it on* any time they fall back, even if the
> > original IO request didn't set DONTCACHE?
>
> Same answer :-)
>
> --
> Jens Axboe
>
* Re: [PATCH 2/2] fsx: add support for RWF_DONTCACHE
From: Jens Axboe @ 2025-01-08 0:00 UTC
To: Darrick J. Wong; +Cc: zlang, fstests

On 1/7/25 4:22 PM, Darrick J. Wong wrote:
> On Tue, Jan 07, 2025 at 11:24:13AM -0700, Jens Axboe wrote:
>> On 1/7/25 11:19 AM, Darrick J. Wong wrote:
>>> On Tue, Jan 07, 2025 at 09:05:15AM -0700, Jens Axboe wrote:
>>>> Using RWF_DONTCACHE tells the kernel that any page cache instantiated
>>>> by this operation should get pruned once the operation completes. If
>>>> data is in cache prior to the operation it will remain there.
>>>>
>>>> Add ops for testing both the read and write side of this. At startup,
>>>> kernel support for this feature is probed. If support isn't available,
>>>> uncached/dontcache IO is performed as regular buffered IO. If -Z is
>>>> used to turn on O_DIRECT, then uncached/dontcache IO isn't performed.
>>>
>>> Huh. Does the kernel reject RWF_DONTCACHE for directio? And, if a
>>
>> It doesn't, it simply ignores it. Not sure why you ask? It's buffered IO
>> after all, falling back to just clearing the flag seems like the most
>> sensible solution here.
>
> I was curious, because your code sets dontcache_io=0 when -Z is used to
> select directio mode. So I wondered if that was because the kernel
> would return EOPNOTSUPP for directio + RWF_DONTCACHE? :)

Ah, gotcha. No, that's not the case; it just doesn't make any sense to
open a file O_DIRECT and then use RWF_DONTCACHE, when the data already
shouldn't be cached. The exception is the case you brought up, where
we'd want to drop page cache that O_DIRECT ends up allocating, but that
should be done regardless, imho.

> Then I wondered if there was actually a good usecase either for letting
> userspace specify it, or for filesystems to add it for buffered write
> fallback. At this point I would wager there's a stronger case for
> adding drop-behind automatically because userspace shouldn't have to
> communicate "write this without accessing the page cache, and don't
> leave file contents in the page cache that I already told you not to
> do."

Yeah, agreed. I think we should just use the same mechanism for page
cache instantiated by O_DIRECT, without needing the app to set
RWF_DONTCACHE. O_DIRECT is also slowed down by the existence of page
cache, which can be annoying, particularly when you fill a file with
O_DIRECT and are then left with a bunch of page cache for it.

> Anyway the fstests change satisfies me now so
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

Thanks!

--
Jens Axboe
end of thread, other threads:[~2025-01-08 0:00 UTC | newest]

Thread overview: 13+ messages
2025-01-06 17:48 [PATCHSET 0/2] Add RWF_DONTCACHE support Jens Axboe
2025-01-06 17:48 ` [PATCH 1/2] fsstress: add support for RWF_DONTCACHE Jens Axboe
2025-01-07 2:11 ` Darrick J. Wong
2025-01-07 2:16 ` Jens Axboe
2025-01-07 17:30 ` Darrick J. Wong
2025-01-06 17:48 ` [PATCH 2/2] fsx: " Jens Axboe
2025-01-07 2:09 ` Darrick J. Wong
2025-01-07 2:12 ` Jens Axboe
-- strict thread matches above, loose matches on Subject: below --
2025-01-07 16:05 [PATCHSET v2 0/2] Add RWF_DONTCACHE support Jens Axboe
2025-01-07 16:05 ` [PATCH 2/2] fsx: add support for RWF_DONTCACHE Jens Axboe
2025-01-07 18:19 ` Darrick J. Wong
2025-01-07 18:24 ` Jens Axboe
2025-01-07 23:22 ` Darrick J. Wong
2025-01-08 0:00 ` Jens Axboe