* /proc/PID/io/read_bytes accounting regression? @ 2023-02-17 11:54 Daire Byrne 2023-02-17 14:08 ` David Wysochanski 0 siblings, 1 reply; 7+ messages in thread From: Daire Byrne @ 2023-02-17 11:54 UTC (permalink / raw) To: linux-nfs Hi, Maybe someone here can quickly point me in the right direction for this oddity that we noticed. On newer kernels, it looks like the task io accounting is not incrementing the read_bytes when reading from a NFS mount? This was definitely working on v5.16 downwards, but has not been working since v5.18 up to v6.2 (I haven't tested v5.17 yet). If I read from a local filesystem, then the read_bytes for that PID is incremented as expected. If I read over NFS using directIO, then the read_bytes is also correctly incremented for that PID. It's just when reading normally without directIO that it is not. The write_bytes and rchar are also still both correct in all situations. I have checked the kernel config and I'm fairly sure I have all the right things enabled: CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y CONFIG_TASKSTATS=y Unless there was some extra config introduced specific to the nfs client in later kernels that I missed? Cheers, Daire ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: /proc/PID/io/read_bytes accounting regression? 2023-02-17 11:54 /proc/PID/io/read_bytes accounting regression? Daire Byrne @ 2023-02-17 14:08 ` David Wysochanski 2023-02-17 14:36 ` Matthew Wilcox 0 siblings, 1 reply; 7+ messages in thread From: David Wysochanski @ 2023-02-17 14:08 UTC (permalink / raw) To: Daire Byrne, Matthew Wilcox; +Cc: linux-nfs On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne <daire@dneg.com> wrote: > > Hi, > > Maybe someone here can quickly point me in the right direction for > this oddity that we noticed. > > On newer kernels, it looks like the task io accounting is not > incrementing the read_bytes when reading from a NFS mount? This was > definitely working on v5.16 downwards, but has not been working since > v5.18 up to v6.2 (I haven't tested v5.17 yet). > > If I read from a local filesystem, then the read_bytes for that PID is > incremented as expected. > > If I read over NFS using directIO, then the read_bytes is also > correctly incremented for that PID. It's just when reading normally > without directIO that it is not. > > The write_bytes and rchar are also still both correct in all situations. > > I have checked the kernel config and I'm fairly sure I have all the > right things enabled: > > CONFIG_TASK_XACCT=y > CONFIG_TASK_IO_ACCOUNTING=y > CONFIG_TASKSTATS=y > > Unless there was some extra config introduced specific to the nfs > client in later kernels that I missed? > > Cheers, > > Daire > Daire, Thanks for the report. Willy, Question for you at the bottom of this. First, here's what looks to be the candidate changes between these versions: $ git log --oneline v5.16..v5.18 fs/nfs/read.c 89c2be8a9516 NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS fc1c5abfca7e NFS: Rename fscache read and write pages functions 8786fde8421c Convert NFS from readpages to readahead 16f2f4e679cf nfs: Implement cache I/O by accessing the cache directly I would be this is due to this patch, which went into 5.18: 8786fde8421c Convert NFS from readpages to readahead And the hunks that now call readahead_page vs read_cache_pages: @@ -397,14 +396,16 @@ int nfs_readpage(struct file *file, struct page *page) return ret; } -int nfs_readpages(struct file *file, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages) +void nfs_readahead(struct readahead_control *ractl) { + unsigned int nr_pages = readahead_count(ractl); + struct file *file = ractl->file; struct nfs_readdesc desc; - struct inode *inode = mapping->host; + struct inode *inode = ractl->mapping->host; + struct page *page; int ret; - trace_nfs_aop_readahead(inode, lru_to_page(pages), nr_pages); + trace_nfs_aop_readahead(inode, readahead_pos(ractl), nr_pages); nfs_inc_stats(inode, NFSIOS_VFSREADPAGES); ret = -ESTALE; @@ -422,14 +423,18 @@ int nfs_readpages(struct file *file, struct address_space *mapping, nfs_pageio_init_read(&desc.pgio, inode, false, &nfs_async_read_completion_ops); - ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc); + while ((page = readahead_page(ractl)) != NULL) { + ret = readpage_async_filler(&desc, page); + put_page(page); + if (ret) + break; + } nfs_pageio_complete_read(&desc.pgio); put_nfs_open_context(desc.ctx); out: trace_nfs_aop_readahead_done(inode, nr_pages, ret); - return ret; } and this hunk: @@ -397,14 +396,16 @@ int nfs_readpage(struct file *file, struct page *page) return ret; } -int nfs_readpages(struct file *file, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages) +void nfs_readahead(struct readahead_control *ractl) { + unsigned int nr_pages = readahead_count(ractl); + struct file *file = ractl->file; struct nfs_readdesc desc; - struct inode *inode = mapping->host; + struct inode *inode = ractl->mapping->host; + struct page *page; int ret; - trace_nfs_aop_readahead(inode, lru_to_page(pages), nr_pages); + trace_nfs_aop_readahead(inode, readahead_pos(ractl), nr_pages); nfs_inc_stats(inode, NFSIOS_VFSREADPAGES); ret = -ESTALE; @@ -422,14 +423,18 @@ int nfs_readpages(struct file *file, struct address_space *mapping, nfs_pageio_init_read(&desc.pgio, inode, false, &nfs_async_read_completion_ops); - ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc); + while ((page = readahead_page(ractl)) != NULL) { + ret = readpage_async_filler(&desc, page); + put_page(page); + if (ret) + break; + } nfs_pageio_complete_read(&desc.pgio); put_nfs_open_context(desc.ctx); out: trace_nfs_aop_readahead_done(inode, nr_pages, ret); - return ret; } int __init nfs_init_readpagecache(void) In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109 of read_cache_pages(); 76 /** 77 * read_cache_pages - populate an address space with some pages & start reads against them 78 * @mapping: the address_space 79 * @pages: The address of a list_head which contains the target pages. These 80 * pages have their ->index populated and are otherwise uninitialised. 81 * @filler: callback routine for filling a single page. 82 * @data: private data for the callback routine. 83 * 84 * Hides the details of the LRU cache etc from the filesystems. 85 * 86 * Returns: %0 on success, error return by @filler otherwise 87 */ 88 int read_cache_pages(struct address_space *mapping, struct list_head *pages, 89 int (*filler)(void *, struct page *), void *data) 90 { 91 struct page *page; 92 int ret = 0; 93 94 while (!list_empty(pages)) { 95 page = lru_to_page(pages); 96 list_del(&page->lru); 97 if (add_to_page_cache_lru(page, mapping, page->index, 98 readahead_gfp_mask(mapping))) { 99 read_cache_pages_invalidate_page(mapping, page); 100 continue; 101 } 102 put_page(page); 103 104 ret = filler(data, page); 105 if (unlikely(ret)) { 106 read_cache_pages_invalidate_pages(mapping, pages); 107 break; 108 } 109 task_io_account_read(PAGE_SIZE); But there's no call to task_io_account_read() anymore in the new readahead code paths that I could tell, but maybe I'm missing something. Willy, Does each caller of readahead_page() now need to call task_io_account_read() or should we add that into readahead_page() or maybe inside read_pages()? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: /proc/PID/io/read_bytes accounting regression? 2023-02-17 14:08 ` David Wysochanski @ 2023-02-17 14:36 ` Matthew Wilcox 2023-02-17 15:47 ` David Wysochanski 0 siblings, 1 reply; 7+ messages in thread From: Matthew Wilcox @ 2023-02-17 14:36 UTC (permalink / raw) To: David Wysochanski; +Cc: Daire Byrne, linux-nfs On Fri, Feb 17, 2023 at 09:08:27AM -0500, David Wysochanski wrote: > On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne <daire@dneg.com> wrote: > > On newer kernels, it looks like the task io accounting is not > > incrementing the read_bytes when reading from a NFS mount? This was > > definitely working on v5.16 downwards, but has not been working since > > v5.18 up to v6.2 (I haven't tested v5.17 yet). > > In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109 > of read_cache_pages(); > > But there's no call to task_io_account_read() anymore in the new > readahead code paths that I could tell, > but maybe I'm missing something. > > Willy, > Does each caller of readahead_page() now need to call > task_io_account_read() or should we add that into > readahead_page() or maybe inside read_pages()? I think the best way is to mimic what the block layer does as closely as possible. Unless we can pull it out of the block layer & all filesystems and put it in the VFS (which we can't; the VFS doesn't know which blocks are recorded by the filesystem as holes and will not result in I/O). The block layer does it as part of the BIO submission path (and also counts PGPGIN and PGPGOUT, which no network filesystems seem to do?) You're more familiar with the NFS code than I am, so you probably have a better idea than __nfs_pageio_add_request(). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: /proc/PID/io/read_bytes accounting regression? 2023-02-17 14:36 ` Matthew Wilcox @ 2023-02-17 15:47 ` David Wysochanski 2023-02-17 15:54 ` David Wysochanski 2023-02-17 15:59 ` Matthew Wilcox 0 siblings, 2 replies; 7+ messages in thread From: David Wysochanski @ 2023-02-17 15:47 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Daire Byrne, linux-nfs On Fri, Feb 17, 2023 at 9:36 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Feb 17, 2023 at 09:08:27AM -0500, David Wysochanski wrote: > > On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne <daire@dneg.com> wrote: > > > On newer kernels, it looks like the task io accounting is not > > > incrementing the read_bytes when reading from a NFS mount? This was > > > definitely working on v5.16 downwards, but has not been working since > > > v5.18 up to v6.2 (I haven't tested v5.17 yet). > > > > In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109 > > of read_cache_pages(); > > > > But there's no call to task_io_account_read() anymore in the new > > readahead code paths that I could tell, > > but maybe I'm missing something. > > > > Willy, > > Does each caller of readahead_page() now need to call > > task_io_account_read() or should we add that into > > readahead_page() or maybe inside read_pages()? > > I think the best way is to mimic what the block layer does as closely as > possible. Unless we can pull it out of the block layer & all > filesystems and put it in the VFS (which we can't; the VFS doesn't > know which blocks are recorded by the filesystem as holes and will not > result in I/O). > Caller, ok. I see, that makes sense. Looks like cifs has a call to task_io_account_read() after data has been received. Also netfs has a call as well at the bottom of netfs_rreq_unlock_folios(). Both seems to be _after_ data has been received, but I'm not sure that's correct. > The block layer does it as part of the BIO submission path (and also > counts PGPGIN and PGPGOUT, which no network filesystems seem to do?) > You're more familiar with the NFS code than I am, so you probably > have a better idea than __nfs_pageio_add_request(). > Hmm, yes does the block layer's accounting take into account actual bytes read or just submitted? Maybe I need to look at this closer but at first glance it looks like these numbers may sometimes be incremented when actual data is received and others are incremented when the submission happens. As to the right location in NFS, the function you mention isn't a bad idea, but maybe not the right location. Looking in nfs_file_direct_read() we have the accounting at IO submission time, appears to be the same as the block layer. Also since my NFS netfs conversion patches are still outstanding, I'll have to somehow take the netfs call into account if fscache is enabled. So the right place is looking like somewhere in nfs_read_folio() and nfs_readahead(). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: /proc/PID/io/read_bytes accounting regression? 2023-02-17 15:47 ` David Wysochanski @ 2023-02-17 15:54 ` David Wysochanski 2023-02-17 15:59 ` Matthew Wilcox 1 sibling, 0 replies; 7+ messages in thread From: David Wysochanski @ 2023-02-17 15:54 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Daire Byrne, linux-nfs On Fri, Feb 17, 2023 at 10:47 AM David Wysochanski <dwysocha@redhat.com> wrote: > > On Fri, Feb 17, 2023 at 9:36 AM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Fri, Feb 17, 2023 at 09:08:27AM -0500, David Wysochanski wrote: > > > On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne <daire@dneg.com> wrote: > > > > On newer kernels, it looks like the task io accounting is not > > > > incrementing the read_bytes when reading from a NFS mount? This was > > > > definitely working on v5.16 downwards, but has not been working since > > > > v5.18 up to v6.2 (I haven't tested v5.17 yet). > > > > > > In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109 > > > of read_cache_pages(); > > > > > > But there's no call to task_io_account_read() anymore in the new > > > readahead code paths that I could tell, > > > but maybe I'm missing something. > > > > > > Willy, > > > Does each caller of readahead_page() now need to call > > > task_io_account_read() or should we add that into > > > readahead_page() or maybe inside read_pages()? > > > > I think the best way is to mimic what the block layer does as closely as > > possible. Unless we can pull it out of the block layer & all > > filesystems and put it in the VFS (which we can't; the VFS doesn't > > know which blocks are recorded by the filesystem as holes and will not > > result in I/O). > > > Caller, ok. I see, that makes sense. > Looks like cifs has a call to task_io_account_read() after data has been > received. Also netfs has a call as well at the bottom of > netfs_rreq_unlock_folios(). > Both seems to be _after_ data has been received, but I'm not sure that's > correct. > > > The block layer does it as part of the BIO submission path (and also > > counts PGPGIN and PGPGOUT, which no network filesystems seem to do?) > > You're more familiar with the NFS code than I am, so you probably > > have a better idea than __nfs_pageio_add_request(). > > > Hmm, yes does the block layer's accounting take into account actual > bytes read or just submitted? Maybe I need to look at this closer > but at first glance it looks like these numbers may sometimes be > incremented when actual data is received and others are incremented > when the submission happens. > > As to the right location in NFS, the function you mention isn't a bad > idea, but maybe not the right location. Looking in nfs_file_direct_read() > we have the accounting at IO submission time, appears to be the > same as the block layer. Also since my NFS netfs conversion patches > are still outstanding, I'll have to somehow take the netfs call into account > if fscache is enabled. So the right place is looking like somewhere > in nfs_read_folio() and nfs_readahead(). I should have read the kernel docs. From Documentation/filesystems/proc.rst 1744 read_bytes 1745 ^^^^^^^^^^ 1746 1747 I/O counter: bytes read 1748 Attempt to count the number of bytes which this process really did cause to 1749 be fetched from the storage layer. Done at the submit_bio() level, so it is 1750 accurate for block-backed filesystems. <please add status regarding NFS and 1751 CIFS at a later time> So it looks like NFS directIO (and non-direct, prior to v5,18) did the same thing as the block layer and is consistent with the definition. Fix would be just add a call to task_io_account_read() inside nfs_read_folio() and nfs_readahead(). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: /proc/PID/io/read_bytes accounting regression? 2023-02-17 15:47 ` David Wysochanski 2023-02-17 15:54 ` David Wysochanski @ 2023-02-17 15:59 ` Matthew Wilcox 2023-03-23 17:56 ` David Wysochanski 1 sibling, 1 reply; 7+ messages in thread From: Matthew Wilcox @ 2023-02-17 15:59 UTC (permalink / raw) To: David Wysochanski; +Cc: Daire Byrne, linux-nfs, David Howells On Fri, Feb 17, 2023 at 10:47:01AM -0500, David Wysochanski wrote: > On Fri, Feb 17, 2023 at 9:36 AM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Fri, Feb 17, 2023 at 09:08:27AM -0500, David Wysochanski wrote: > > > On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne <daire@dneg.com> wrote: > > > > On newer kernels, it looks like the task io accounting is not > > > > incrementing the read_bytes when reading from a NFS mount? This was > > > > definitely working on v5.16 downwards, but has not been working since > > > > v5.18 up to v6.2 (I haven't tested v5.17 yet). > > > > > > In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109 > > > of read_cache_pages(); > > > > > > But there's no call to task_io_account_read() anymore in the new > > > readahead code paths that I could tell, > > > but maybe I'm missing something. > > > > > > Willy, > > > Does each caller of readahead_page() now need to call > > > task_io_account_read() or should we add that into > > > readahead_page() or maybe inside read_pages()? > > > > I think the best way is to mimic what the block layer does as closely as > > possible. Unless we can pull it out of the block layer & all > > filesystems and put it in the VFS (which we can't; the VFS doesn't > > know which blocks are recorded by the filesystem as holes and will not > > result in I/O). > > > Caller, ok. I see, that makes sense. > Looks like cifs has a call to task_io_account_read() after data has been > received. Also netfs has a call as well at the bottom of > netfs_rreq_unlock_folios(). > Both seems to be _after_ data has been received, but I'm not sure that's > correct. It's probably correct, just different from the block layer. I don't have any special insight here, just an inclination to be as similar as possible. > > The block layer does it as part of the BIO submission path (and also > > counts PGPGIN and PGPGOUT, which no network filesystems seem to do?) > > You're more familiar with the NFS code than I am, so you probably > > have a better idea than __nfs_pageio_add_request(). > > > Hmm, yes does the block layer's accounting take into account actual > bytes read or just submitted? Maybe I need to look at this closer > but at first glance it looks like these numbers may sometimes be > incremented when actual data is received and others are incremented > when the submission happens. > > As to the right location in NFS, the function you mention isn't a bad > idea, but maybe not the right location. Looking in nfs_file_direct_read() > we have the accounting at IO submission time, appears to be the > same as the block layer. Also since my NFS netfs conversion patches > are still outstanding, I'll have to somehow take the netfs call into account > if fscache is enabled. So the right place is looking like somewhere > in nfs_read_folio() and nfs_readahead(). Yes, we don't want to double-count either fscache or direct I/O. I'm Maybe Dave as opinions about where we should be accounting it -- I'm not sure that netfs is the right place to do it. Maybe it should be in each network filesystem instead of in netfs? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: /proc/PID/io/read_bytes accounting regression? 2023-02-17 15:59 ` Matthew Wilcox @ 2023-03-23 17:56 ` David Wysochanski 0 siblings, 0 replies; 7+ messages in thread From: David Wysochanski @ 2023-03-23 17:56 UTC (permalink / raw) To: David Howells; +Cc: Daire Byrne, linux-nfs, Matthew Wilcox On Fri, Feb 17, 2023 at 10:59 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Feb 17, 2023 at 10:47:01AM -0500, David Wysochanski wrote: > > On Fri, Feb 17, 2023 at 9:36 AM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Fri, Feb 17, 2023 at 09:08:27AM -0500, David Wysochanski wrote: > > > > On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne <daire@dneg.com> wrote: > > > > > On newer kernels, it looks like the task io accounting is not > > > > > incrementing the read_bytes when reading from a NFS mount? This was > > > > > definitely working on v5.16 downwards, but has not been working since > > > > > v5.18 up to v6.2 (I haven't tested v5.17 yet). > > > > > > > > In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109 > > > > of read_cache_pages(); > > > > > > > > But there's no call to task_io_account_read() anymore in the new > > > > readahead code paths that I could tell, > > > > but maybe I'm missing something. > > > > > > > > Willy, > > > > Does each caller of readahead_page() now need to call > > > > task_io_account_read() or should we add that into > > > > readahead_page() or maybe inside read_pages()? > > > > > > I think the best way is to mimic what the block layer does as closely as > > > possible. Unless we can pull it out of the block layer & all > > > filesystems and put it in the VFS (which we can't; the VFS doesn't > > > know which blocks are recorded by the filesystem as holes and will not > > > result in I/O). > > > > > Caller, ok. I see, that makes sense. > > Looks like cifs has a call to task_io_account_read() after data has been > > received. Also netfs has a call as well at the bottom of > > netfs_rreq_unlock_folios(). > > Both seems to be _after_ data has been received, but I'm not sure that's > > correct. > > It's probably correct, just different from the block layer. I don't > have any special insight here, just an inclination to be as similar > as possible. > > > > The block layer does it as part of the BIO submission path (and also > > > counts PGPGIN and PGPGOUT, which no network filesystems seem to do?) > > > You're more familiar with the NFS code than I am, so you probably > > > have a better idea than __nfs_pageio_add_request(). > > > > > Hmm, yes does the block layer's accounting take into account actual > > bytes read or just submitted? Maybe I need to look at this closer > > but at first glance it looks like these numbers may sometimes be > > incremented when actual data is received and others are incremented > > when the submission happens. > > > > As to the right location in NFS, the function you mention isn't a bad > > idea, but maybe not the right location. Looking in nfs_file_direct_read() > > we have the accounting at IO submission time, appears to be the > > same as the block layer. Also since my NFS netfs conversion patches > > are still outstanding, I'll have to somehow take the netfs call into account > > if fscache is enabled. So the right place is looking like somewhere > > in nfs_read_folio() and nfs_readahead(). > > Yes, we don't want to double-count either fscache or direct I/O. > I'm Maybe Dave as opinions about where we should be accounting it -- > I'm not sure that netfs is the right place to do it. Maybe it should > be in each network filesystem instead of in netfs? > David - can you comment on whether you agree the call to task_io_account_read() inside netfs_rreq_unlock_folios() is wrong? I think based on the discussion here, the definition in the kernel docs, and the fact we have to record against the proper PID, the accounting should be done in the caller of the netfs interfaces not inside netfs unlock path. It looks like afs may rely on netfs IO accounting but can it be moved? ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-03-23 17:58 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-02-17 11:54 /proc/PID/io/read_bytes accounting regression? Daire Byrne 2023-02-17 14:08 ` David Wysochanski 2023-02-17 14:36 ` Matthew Wilcox 2023-02-17 15:47 ` David Wysochanski 2023-02-17 15:54 ` David Wysochanski 2023-02-17 15:59 ` Matthew Wilcox 2023-03-23 17:56 ` David Wysochanski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox