* [PATCH] ceph: do not fill fscache for RWF_DONTCACHE writeback @ 2026-04-01 20:56 Max Kellermann 2026-04-02 19:44 ` Viacheslav Dubeyko 0 siblings, 1 reply; 5+ messages in thread From: Max Kellermann @ 2026-04-01 20:56 UTC (permalink / raw) To: idryomov, amarkuze, ceph-devel, dhowells, pc, netfs, linux-fsdevel, linux-kernel Cc: Max Kellermann Avoid populating the local fscache with writeback from dropbehind folios. At the moment, buffered RWF_DONTCACHE writes still go through the usual Ceph writeback path, which mirrors the written data into fscache. The data is dropped from the page cache, but we still spend local I/O and local cache space to retain a copy in fscache. The DONTCACHE documentation is only about the page cache and the intent is to avoid caching data that will not be needed again soon. I believe skipping fscache writes during Ceph writeback on such pages would follow the same spirit: commit the write to permanent storage, but otherwise get it out of the way quickly. Use folio_test_dropbehind() to treat such folios as non-cacheable for the purposes of Ceph's write-side fscache population. This skips both ceph_set_page_fscache() and the corresponding write-to-cache operation for dropbehind folios. The writepages path can batch together folios with different cacheability, so track cacheable subranges separately and only submit fscache writes for contiguous non-dropbehind spans. This keeps normal buffered writeback unchanged, while making RWF_DONTCACHE a better match for its intended "don't retain this locally" behavior and avoiding unnecessary local cache traffic. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> --- Note: this is an additional feature on top of my Ceph-DONTCACHE patch, see https://lore.kernel.org/ceph-devel/20260401053109.1861724-1-max.kellermann@ionos.com/ --- fs/ceph/addr.c | 34 ++++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 2090fc78529c..9612a1d8ccb2 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -576,6 +576,21 @@ static inline void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 } #endif /* CONFIG_CEPH_FSCACHE */ +static inline bool ceph_folio_is_cacheable(const struct folio *folio, bool caching) +{ + /* Dropbehind writeback should not populate the local fscache. */ + return caching && !folio_test_dropbehind(folio); +} + +static inline void ceph_flush_fscache_write(struct inode *inode, u64 off, u64 *len) +{ + if (!*len) + return; + + ceph_fscache_write_to_cache(inode, off, *len, true); + *len = 0; +} + struct ceph_writeback_ctl { loff_t i_size; @@ -730,7 +745,7 @@ static int write_folio_nounlock(struct folio *folio, struct ceph_writeback_ctl ceph_wbc; struct ceph_osd_client *osdc = &fsc->client->osdc; struct ceph_osd_request *req; - bool caching = ceph_is_cache_enabled(inode); + bool caching = ceph_folio_is_cacheable(folio, ceph_is_cache_enabled(inode)); struct page *bounce_page = NULL; doutc(cl, "%llx.%llx folio %p idx %lu\n", ceph_vinop(inode), folio, @@ -1412,11 +1427,14 @@ int ceph_submit_write(struct address_space *mapping, bool caching = ceph_is_cache_enabled(inode); u64 offset; u64 len; + u64 cache_offset, cache_len; unsigned i; new_request: offset = ceph_fscrypt_page_offset(ceph_wbc->pages[0]); len = ceph_wbc->wsize; + cache_offset = 0; + cache_len = 0; req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, @@ -1477,9 +1495,11 @@ int ceph_submit_write(struct address_space *mapping, ceph_wbc->op_idx = 0; for (i = 0; i < ceph_wbc->locked_pages; i++) { u64 cur_offset; + bool cache_page; page = ceph_fscrypt_pagecache_page(ceph_wbc->pages[i]); cur_offset = page_offset(page); + cache_page = ceph_folio_is_cacheable(page_folio(page), caching); /* * Discontinuity in page range? Ceph can handle that by just passing @@ -1491,7 +1511,7 @@ int ceph_submit_write(struct address_space *mapping, break; /* Kick off an fscache write with what we have so far. */ - ceph_fscache_write_to_cache(inode, offset, len, caching); + ceph_flush_fscache_write(inode, cache_offset, &cache_len); /* Start a new extent */ osd_req_op_extent_dup_last(req, ceph_wbc->op_idx, @@ -1514,13 +1534,19 @@ int ceph_submit_write(struct address_space *mapping, set_page_writeback(page); - if (caching) + if (cache_page) { + if (!cache_len) + cache_offset = cur_offset; ceph_set_page_fscache(page); + cache_len += thp_size(page); + } else { + ceph_flush_fscache_write(inode, cache_offset, &cache_len); + } len += thp_size(page); } - ceph_fscache_write_to_cache(inode, offset, len, caching); + ceph_flush_fscache_write(inode, cache_offset, &cache_len); if (ceph_wbc->size_stable) { len = min(len, ceph_wbc->i_size - offset); -- 2.47.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] ceph: do not fill fscache for RWF_DONTCACHE writeback 2026-04-01 20:56 [PATCH] ceph: do not fill fscache for RWF_DONTCACHE writeback Max Kellermann @ 2026-04-02 19:44 ` Viacheslav Dubeyko 2026-04-03 6:52 ` Max Kellermann 0 siblings, 1 reply; 5+ messages in thread From: Viacheslav Dubeyko @ 2026-04-02 19:44 UTC (permalink / raw) To: Max Kellermann, idryomov, amarkuze, ceph-devel, dhowells, pc, netfs, linux-fsdevel, linux-kernel On Wed, 2026-04-01 at 22:56 +0200, Max Kellermann wrote: > Avoid populating the local fscache with writeback from dropbehind > folios. > The idea sounds reasonable enough. However, this patch cannot be standalone because it depends on another one. I assume that a filesystem must declare DONTCACHE feature support by setting FOP_DONTCACHE in its file_operations.fop_flags. Am I right here? And what's about the IOCB_DONTCACHE. As far as I can see, write_begin_get_folio() translates IOCB_DONTCACHE into FGP_DONTCACHE: static inline struct folio *write_begin_get_folio(const struct kiocb *iocb, struct address_space *mapping, pgoff_t index, size_t len) { fgf_t fgp_flags = FGP_WRITEBEGIN; fgp_flags |= fgf_set_order(len); if (iocb && iocb->ki_flags & IOCB_DONTCACHE) fgp_flags |= FGP_DONTCACHE; return __filemap_get_folio(mapping, index, fgp_flags, mapping_gfp_mask(mapping)); } The Ceph write_begin path calls netfs_write_begin() but does not pass IOCB_DONTCACHE through to trigger __folio_set_dropbehind. So, folio_test_dropbehind() would never be true on the Ceph write path right now. Does it make sense? > At the moment, buffered RWF_DONTCACHE writes still go through the > usual Ceph writeback path, which mirrors the written data into > fscache. The data is dropped from the page cache, but we still spend > local I/O and local cache space to retain a copy in fscache. > > The DONTCACHE documentation is only about the page cache and the > intent is to avoid caching data that will not be needed again soon. > I believe skipping fscache writes during Ceph writeback on such pages > would follow the same spirit: commit the write to permanent storage, > but otherwise get it out of the way quickly. > > Use folio_test_dropbehind() to treat such folios as non-cacheable for > the purposes of Ceph's write-side fscache population. This skips both > ceph_set_page_fscache() and the corresponding write-to-cache operation > for dropbehind folios. > > The writepages path can batch together folios with different cacheability, > so track cacheable subranges separately and only submit fscache writes > for contiguous non-dropbehind spans. > > This keeps normal buffered writeback unchanged, while making > RWF_DONTCACHE a better match for its intended "don't retain this > locally" behavior and avoiding unnecessary local cache traffic. > > Signed-off-by: Max Kellermann <max.kellermann@ionos.com> > --- > Note: this is an additional feature on top of my Ceph-DONTCACHE patch, > see https://lore.kernel.org/ceph-devel/20260401053109.1861724-1-max.kellermann@ionos.com/ > --- > fs/ceph/addr.c | 34 ++++++++++++++++++++++++++++++---- > 1 file changed, 30 insertions(+), 4 deletions(-) > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > index 2090fc78529c..9612a1d8ccb2 100644 > --- a/fs/ceph/addr.c > +++ b/fs/ceph/addr.c > @@ -576,6 +576,21 @@ static inline void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 > } > #endif /* CONFIG_CEPH_FSCACHE */ > > +static inline bool ceph_folio_is_cacheable(const struct folio *folio, bool caching) > +{ > + /* Dropbehind writeback should not populate the local fscache. */ > + return caching && !folio_test_dropbehind(folio); > +} > + > +static inline void ceph_flush_fscache_write(struct inode *inode, u64 off, u64 *len) > +{ > + if (!*len) > + return; > + > + ceph_fscache_write_to_cache(inode, off, *len, true); Are you sure that caching should be always true? All other calls checks that ceph_is_cache_enabled(): bool caching = ceph_is_cache_enabled(inode); > + *len = 0; > +} The ceph_folio_is_cacheable() and ceph_flush_fscache_write() are out of CONFIG_CEPH_FSCACHE. It doesn't look right. > + > struct ceph_writeback_ctl > { > loff_t i_size; > @@ -730,7 +745,7 @@ static int write_folio_nounlock(struct folio *folio, > struct ceph_writeback_ctl ceph_wbc; > struct ceph_osd_client *osdc = &fsc->client->osdc; > struct ceph_osd_request *req; > - bool caching = ceph_is_cache_enabled(inode); > + bool caching = ceph_folio_is_cacheable(folio, ceph_is_cache_enabled(inode)); > struct page *bounce_page = NULL; > > doutc(cl, "%llx.%llx folio %p idx %lu\n", ceph_vinop(inode), folio, > @@ -1412,11 +1427,14 @@ int ceph_submit_write(struct address_space *mapping, > bool caching = ceph_is_cache_enabled(inode); > u64 offset; > u64 len; > + u64 cache_offset, cache_len; Why do you need to introduce the cache_offset and cache_len? We already have offset and len. > unsigned i; > > new_request: > offset = ceph_fscrypt_page_offset(ceph_wbc->pages[0]); > len = ceph_wbc->wsize; > + cache_offset = 0; Is it correct initialization? Frankly speaking, I don't quite follow why we need such initialization. Thanks, Slava. > + cache_len = 0; > > req = ceph_osdc_new_request(&fsc->client->osdc, > &ci->i_layout, vino, > @@ -1477,9 +1495,11 @@ int ceph_submit_write(struct address_space *mapping, > ceph_wbc->op_idx = 0; > for (i = 0; i < ceph_wbc->locked_pages; i++) { > u64 cur_offset; > + bool cache_page; > > page = ceph_fscrypt_pagecache_page(ceph_wbc->pages[i]); > cur_offset = page_offset(page); > + cache_page = ceph_folio_is_cacheable(page_folio(page), caching); > > /* > * Discontinuity in page range? Ceph can handle that by just passing > @@ -1491,7 +1511,7 @@ int ceph_submit_write(struct address_space *mapping, > break; > > /* Kick off an fscache write with what we have so far. */ > - ceph_fscache_write_to_cache(inode, offset, len, caching); > + ceph_flush_fscache_write(inode, cache_offset, &cache_len); > > /* Start a new extent */ > osd_req_op_extent_dup_last(req, ceph_wbc->op_idx, > @@ -1514,13 +1534,19 @@ int ceph_submit_write(struct address_space *mapping, > > set_page_writeback(page); > > - if (caching) > + if (cache_page) { > + if (!cache_len) > + cache_offset = cur_offset; > ceph_set_page_fscache(page); > + cache_len += thp_size(page); > + } else { > + ceph_flush_fscache_write(inode, cache_offset, &cache_len); > + } > > len += thp_size(page); > } > > - ceph_fscache_write_to_cache(inode, offset, len, caching); > + ceph_flush_fscache_write(inode, cache_offset, &cache_len); > > if (ceph_wbc->size_stable) { > len = min(len, ceph_wbc->i_size - offset); ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ceph: do not fill fscache for RWF_DONTCACHE writeback 2026-04-02 19:44 ` Viacheslav Dubeyko @ 2026-04-03 6:52 ` Max Kellermann 2026-04-03 17:18 ` Viacheslav Dubeyko 0 siblings, 1 reply; 5+ messages in thread From: Max Kellermann @ 2026-04-03 6:52 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: idryomov, amarkuze, ceph-devel, dhowells, pc, netfs, linux-fsdevel, linux-kernel On Thu, Apr 2, 2026 at 9:44 PM Viacheslav Dubeyko <vdubeyko@redhat.com> wrote: > The Ceph write_begin path calls netfs_write_begin() but does not pass > IOCB_DONTCACHE through to trigger __folio_set_dropbehind. So, > folio_test_dropbehind() would never be true on the Ceph write path right now. > Does it make sense? Yes, see: > > --- > > Note: this is an additional feature on top of my Ceph-DONTCACHE patch, > > see https://lore.kernel.org/ceph-devel/20260401053109.1861724-1-max.kellermann@ionos.com/ The code in this patch is not reachable until my RWF_DONTCACHE patch is merged as well. > Are you sure that caching should be always true? All other calls checks that > ceph_is_cache_enabled(): > > bool caching = ceph_is_cache_enabled(inode); This function is only called if caching is enabled. > > > + *len = 0; > > +} > > > The ceph_folio_is_cacheable() and ceph_flush_fscache_write() are out of > CONFIG_CEPH_FSCACHE. It doesn't look right. All of the old code is out of CONFIG_CEPH_FSCACHE, too. Does the old code look right? > > @@ -1412,11 +1427,14 @@ int ceph_submit_write(struct address_space *mapping, > > bool caching = ceph_is_cache_enabled(inode); > > u64 offset; > > u64 len; > > + u64 cache_offset, cache_len; > > Why do you need to introduce the cache_offset and cache_len? We already have > offset and len. These keep track of the region that should be submitted to fscache. Folios without "dropbehind" need to be excluded from that. > > new_request: > > offset = ceph_fscrypt_page_offset(ceph_wbc->pages[0]); > > len = ceph_wbc->wsize; > > + cache_offset = 0; > > Is it correct initialization? Frankly speaking, I don't quite follow why we need > such initialization. Technically, cache_offset does not need to be initialized as long as cache_len is zero because then its value is never used. Would you feel more comfortable if I remove the unnecessary initializer? I wasn't sure which approach would raise fewer eyebrows. -- Max Kellermann Principal Architect Hosting Technology cm4all | Im Mediapark 6a | 50670 Köln | Germany General information about the company can be found here: https://www.cm4all.com/impressum A member of the IONOS Group ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ceph: do not fill fscache for RWF_DONTCACHE writeback 2026-04-03 6:52 ` Max Kellermann @ 2026-04-03 17:18 ` Viacheslav Dubeyko 2026-04-03 18:13 ` Max Kellermann 0 siblings, 1 reply; 5+ messages in thread From: Viacheslav Dubeyko @ 2026-04-03 17:18 UTC (permalink / raw) To: Max Kellermann Cc: idryomov, amarkuze, ceph-devel, dhowells, pc, netfs, linux-fsdevel, linux-kernel On Fri, 2026-04-03 at 08:52 +0200, Max Kellermann wrote: > On Thu, Apr 2, 2026 at 9:44 PM Viacheslav Dubeyko <vdubeyko@redhat.com> wrote: > > The Ceph write_begin path calls netfs_write_begin() but does not pass > > IOCB_DONTCACHE through to trigger __folio_set_dropbehind. So, > > folio_test_dropbehind() would never be true on the Ceph write path right now. > > Does it make sense? > > Yes, see: > > > > --- > > > Note: this is an additional feature on top of my Ceph-DONTCACHE patch, > > > see https://lore.kernel.org/ceph-devel/20260401053109.1861724-1-max.kellermann@ionos.com/ > > The code in this patch is not reachable until my RWF_DONTCACHE patch > is merged as well. > > > Are you sure that caching should be always true? All other calls checks that > > ceph_is_cache_enabled(): > > > > bool caching = ceph_is_cache_enabled(inode); > > This function is only called if caching is enabled. I think that such interface will be more clean and safe: static inline void ceph_flush_fscache_write(struct inode *inode, u64 off, u64 *len, bool caching) > > > > > > + *len = 0; > > > +} > > > > > > The ceph_folio_is_cacheable() and ceph_flush_fscache_write() are out of > > CONFIG_CEPH_FSCACHE. It doesn't look right. > > All of the old code is out of CONFIG_CEPH_FSCACHE, too. Does the old > code look right? As far as I can see, all fscache code is under CONFIG_CEPH_FSCACHE compilation option. If we have some issues with old code, then it makes sense to fix it. But this code is fscache related and it should be under CONFIG_CEPH_FSCACHE protection, from my point of view. Moreover, other fscache related code is under CONFIG_CEPH_FSCACHE protection pretty above the code of these functions. > > > > @@ -1412,11 +1427,14 @@ int ceph_submit_write(struct address_space *mapping, > > > bool caching = ceph_is_cache_enabled(inode); > > > u64 offset; > > > u64 len; > > > + u64 cache_offset, cache_len; > > > > Why do you need to introduce the cache_offset and cache_len? We already have > > offset and len. > > These keep track of the region that should be submitted to fscache. > Folios without "dropbehind" need to be excluded from that. > > > > new_request: > > > offset = ceph_fscrypt_page_offset(ceph_wbc->pages[0]); > > > len = ceph_wbc->wsize; > > > + cache_offset = 0; > > > > Is it correct initialization? Frankly speaking, I don't quite follow why we need > > such initialization. > > Technically, cache_offset does not need to be initialized as long as > cache_len is zero because then its value is never used. Would you feel > more comfortable if I remove the unnecessary initializer? I wasn't > sure which approach would raise fewer eyebrows. I am simply trying to follow why we need in cache_offset. We are using the offset currently: /* Kick off an fscache write with what we have so far. */ ceph_fscache_write_to_cache(inode, offset, len, caching); Why the offset is not good enough? Thanks, Slava. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ceph: do not fill fscache for RWF_DONTCACHE writeback 2026-04-03 17:18 ` Viacheslav Dubeyko @ 2026-04-03 18:13 ` Max Kellermann 0 siblings, 0 replies; 5+ messages in thread From: Max Kellermann @ 2026-04-03 18:13 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: idryomov, amarkuze, ceph-devel, dhowells, pc, netfs, linux-fsdevel, linux-kernel On Fri, Apr 3, 2026 at 7:18 PM Viacheslav Dubeyko <vdubeyko@redhat.com> wrote: > Why the offset is not good enough? Because the "offset" variable tracks the whole write, including the folios that are supposed to be omitted from the fscache. -- Max Kellermann Principal Architect Hosting Technology cm4all | Im Mediapark 6a | 50670 Köln | Germany General information about the company can be found here: https://www.cm4all.com/impressum A member of the IONOS Group ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-03 18:13 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-01 20:56 [PATCH] ceph: do not fill fscache for RWF_DONTCACHE writeback Max Kellermann 2026-04-02 19:44 ` Viacheslav Dubeyko 2026-04-03 6:52 ` Max Kellermann 2026-04-03 17:18 ` Viacheslav Dubeyko 2026-04-03 18:13 ` Max Kellermann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox