* [PATCH] fat: Avoid oops when bdi->io_pages==0
@ 2020-08-30 0:59 OGAWA Hirofumi
2020-08-30 1:21 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-30 0:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel
On one system, there was bdi->io_pages==0. This seems to be the bug of
a driver somewhere, and should fix it though. Anyway, it is better to
avoid the divide-by-zero Oops.
So this check it.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
---
fs/fat/fatent.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index f7e3304..98a1c4f 100644
--- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
+++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
@@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
if (fatent->entry >= ent_limit)
return;
- if (ra_pages > sb->s_bdi->io_pages)
+ if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);
_
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-30 0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi @ 2020-08-30 1:21 ` Matthew Wilcox 2020-08-30 1:54 ` OGAWA Hirofumi 2020-08-30 14:01 ` Sasha Levin 2020-08-31 15:22 ` Jens Axboe 2 siblings, 1 reply; 15+ messages in thread From: Matthew Wilcox @ 2020-08-30 1:21 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, linux-fsdevel On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote: > On one system, there was bdi->io_pages==0. This seems to be the bug of > a driver somewhere, and should fix it though. Anyway, it is better to > avoid the divide-by-zero Oops. > > So this check it. > > Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> > Cc: <stable@vger.kernel.org> > --- > fs/fat/fatent.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c > index f7e3304..98a1c4f 100644 > --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900 > +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900 > @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo > if (fatent->entry >= ent_limit) > return; > > - if (ra_pages > sb->s_bdi->io_pages) > + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages) > ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); Wait, rounddown? ->io_pages is supposed to be the maximum number of pages to readahead. Shouldn't this be max() instead of rounddown()? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-30 1:21 ` Matthew Wilcox @ 2020-08-30 1:54 ` OGAWA Hirofumi 2020-08-30 3:53 ` Matthew Wilcox 0 siblings, 1 reply; 15+ messages in thread From: OGAWA Hirofumi @ 2020-08-30 1:54 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Andrew Morton, linux-kernel, linux-fsdevel Matthew Wilcox <willy@infradead.org> writes: > On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote: >> On one system, there was bdi->io_pages==0. This seems to be the bug of >> a driver somewhere, and should fix it though. Anyway, it is better to >> avoid the divide-by-zero Oops. >> >> So this check it. >> >> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> >> Cc: <stable@vger.kernel.org> >> --- >> fs/fat/fatent.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c >> index f7e3304..98a1c4f 100644 >> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900 >> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900 >> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo >> if (fatent->entry >= ent_limit) >> return; >> >> - if (ra_pages > sb->s_bdi->io_pages) >> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages) >> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); > > Wait, rounddown? ->io_pages is supposed to be the maximum number of > pages to readahead. Shouldn't this be max() instead of rounddown()? Hm, io_pages is limited by driver setting too, and io_pages can be lower than ra_pages, e.g. usb storage. Assuming ra_pages is user intent of readahead window. So if io_pages is lower than ra_pages, this try ra_pages to align of io_pages chunk, but not bigger than ra_pages. Because if block layer splits I/O requests to hard limit, then I/O is not optimal. So it is intent, I can be misunderstanding though. Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-30 1:54 ` OGAWA Hirofumi @ 2020-08-30 3:53 ` Matthew Wilcox 2020-08-30 9:04 ` OGAWA Hirofumi 0 siblings, 1 reply; 15+ messages in thread From: Matthew Wilcox @ 2020-08-30 3:53 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, linux-fsdevel On Sun, Aug 30, 2020 at 10:54:35AM +0900, OGAWA Hirofumi wrote: > Matthew Wilcox <willy@infradead.org> writes: > > > On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote: > >> On one system, there was bdi->io_pages==0. This seems to be the bug of > >> a driver somewhere, and should fix it though. Anyway, it is better to > >> avoid the divide-by-zero Oops. > >> > >> So this check it. > >> > >> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> > >> Cc: <stable@vger.kernel.org> > >> --- > >> fs/fat/fatent.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c > >> index f7e3304..98a1c4f 100644 > >> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900 > >> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900 > >> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo > >> if (fatent->entry >= ent_limit) > >> return; > >> > >> - if (ra_pages > sb->s_bdi->io_pages) > >> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages) > >> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); > > > > Wait, rounddown? ->io_pages is supposed to be the maximum number of > > pages to readahead. Shouldn't this be max() instead of rounddown()? Sorry, I meant 'min', not 'max'. > Hm, io_pages is limited by driver setting too, and io_pages can be lower > than ra_pages, e.g. usb storage. > > Assuming ra_pages is user intent of readahead window. So if io_pages is > lower than ra_pages, this try ra_pages to align of io_pages chunk, but > not bigger than ra_pages. Because if block layer splits I/O requests to > hard limit, then I/O is not optimal. > > So it is intent, I can be misunderstanding though. Looking at this some more, I'm not sure it makes sense to consult ->io_pages at all. I see how it gets set to 0 -- the admin can write '1' to /sys/block/<device>/queue/max_sectors_kb and that gets turned into 0 in ->io_pages. But I'm not sure it makes any sense to respect that. Looking at mm/readahead.c, all it does is limit the size of a read request which exceeds the current readahead window. It's not used to limit the readahead window itself. For example: unsigned long max_pages = ra->ra_pages; ... if (req_size > max_pages && bdi->io_pages > max_pages) max_pages = min(req_size, bdi->io_pages); Setting io_pages below ra_pages has no effect. So maybe fat should also disregard it? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-30 3:53 ` Matthew Wilcox @ 2020-08-30 9:04 ` OGAWA Hirofumi 0 siblings, 0 replies; 15+ messages in thread From: OGAWA Hirofumi @ 2020-08-30 9:04 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Andrew Morton, linux-kernel, linux-fsdevel Matthew Wilcox <willy@infradead.org> writes: > On Sun, Aug 30, 2020 at 10:54:35AM +0900, OGAWA Hirofumi wrote: >> Matthew Wilcox <willy@infradead.org> writes: >> >> Hm, io_pages is limited by driver setting too, and io_pages can be lower >> than ra_pages, e.g. usb storage. >> >> Assuming ra_pages is user intent of readahead window. So if io_pages is >> lower than ra_pages, this try ra_pages to align of io_pages chunk, but >> not bigger than ra_pages. Because if block layer splits I/O requests to >> hard limit, then I/O is not optimal. >> >> So it is intent, I can be misunderstanding though. > > Looking at this some more, I'm not sure it makes sense to consult ->io_pages > at all. I see how it gets set to 0 -- the admin can write '1' to > /sys/block/<device>/queue/max_sectors_kb and that gets turned into 0 > in ->io_pages. if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb) return -EINVAL; It should not set to 0 via /sys/.../max_sectors_kb. However the default of bdi->io_pages is 0. So it happened if a driver didn't initialized it. > But I'm not sure it makes any sense to respect that. Looking at > mm/readahead.c, all it does is limit the size of a read request which > exceeds the current readahead window. It's not used to limit the > readahead window itself. For example: > > unsigned long max_pages = ra->ra_pages; > ... > if (req_size > max_pages && bdi->io_pages > max_pages) > max_pages = min(req_size, bdi->io_pages); > > Setting io_pages below ra_pages has no effect. So maybe fat should also > disregard it? |-----------------------| requested blocks [before] ra_pages |===========|===========|===========| io_pages |---------|---------|---------| req |---------|-|-------|---| [after] ra_pages |=========|=========|=========| io_pages |---------|---------|---------| req |---------|---------|---| This path is known the large sequential read there. Well, anyway, this intent is to use [after] as 3 req, instead of [before] as 4 req. Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-30 0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi 2020-08-30 1:21 ` Matthew Wilcox @ 2020-08-30 14:01 ` Sasha Levin 2020-08-30 14:16 ` OGAWA Hirofumi 2020-08-31 15:22 ` Jens Axboe 2 siblings, 1 reply; 15+ messages in thread From: Sasha Levin @ 2020-08-30 14:01 UTC (permalink / raw) To: Sasha Levin, OGAWA Hirofumi, Andrew Morton Cc: linux-kernel, linux-fsdevel, stable, stable Hi [This is an automated email] This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: all The bot has tested the following trees: v5.8.5, v5.4.61, v4.19.142, v4.14.195, v4.9.234, v4.4.234. v5.8.5: Build OK! v5.4.61: Failed to apply! Possible dependencies: 898310032b96 ("fat: improve the readahead for FAT entries") a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0") v4.19.142: Failed to apply! Possible dependencies: 898310032b96 ("fat: improve the readahead for FAT entries") a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0") v4.14.195: Failed to apply! Possible dependencies: 898310032b96 ("fat: improve the readahead for FAT entries") a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0") f663b5b38fff ("fat: add FITRIM ioctl for FAT file system") v4.9.234: Failed to apply! Possible dependencies: 898310032b96 ("fat: improve the readahead for FAT entries") a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0") f663b5b38fff ("fat: add FITRIM ioctl for FAT file system") v4.4.234: Failed to apply! Possible dependencies: 898310032b96 ("fat: improve the readahead for FAT entries") 8992de4cec12 ("fat: constify fatent_operations structures") a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0") f663b5b38fff ("fat: add FITRIM ioctl for FAT file system") NOTE: The patch will not be queued to stable trees until it is upstream. How should we proceed with this patch? -- Thanks Sasha ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-30 14:01 ` Sasha Levin @ 2020-08-30 14:16 ` OGAWA Hirofumi 0 siblings, 0 replies; 15+ messages in thread From: OGAWA Hirofumi @ 2020-08-30 14:16 UTC (permalink / raw) To: Sasha Levin; +Cc: Andrew Morton, linux-kernel, linux-fsdevel, stable Sasha Levin <sashal@kernel.org> writes: > Hi > > [This is an automated email] > > This commit has been processed because it contains a -stable tag. > The stable tag indicates that it's relevant for the following trees: all > > The bot has tested the following trees: v5.8.5, v5.4.61, v4.19.142, v4.14.195, v4.9.234, v4.4.234. > > v5.8.5: Build OK! [...] > > How should we proceed with this patch? Only 5.8.x has to apply this patch. Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-30 0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi 2020-08-30 1:21 ` Matthew Wilcox 2020-08-30 14:01 ` Sasha Levin @ 2020-08-31 15:22 ` Jens Axboe 2020-08-31 16:37 ` OGAWA Hirofumi 2 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2020-08-31 15:22 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, fsdevel On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote: > > On one system, there was bdi->io_pages==0. This seems to be the bug of > a driver somewhere, and should fix it though. Anyway, it is better to > avoid the divide-by-zero Oops. > > So this check it. > > Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> > Cc: <stable@vger.kernel.org> > --- > fs/fat/fatent.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c > index f7e3304..98a1c4f 100644 > --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900 > +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900 > @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo > if (fatent->entry >= ent_limit) > return; > > - if (ra_pages > sb->s_bdi->io_pages) > + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages) > ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); > reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1); I don't think we should work-around this here. What device is this on? Something like the below may help. diff --git a/block/blk-core.c b/block/blk-core.c index d9d632639bd1..10c08ac50697 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -539,6 +539,7 @@ struct request_queue *blk_alloc_queue(int node_id) goto fail_stats; q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES; + q->backing_dev_info->io_pages = VM_READAHEAD_PAGES; q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK; q->node = node_id; -- Jens Axboe ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-31 15:22 ` Jens Axboe @ 2020-08-31 16:37 ` OGAWA Hirofumi 2020-08-31 16:39 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: OGAWA Hirofumi @ 2020-08-31 16:37 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, linux-kernel, fsdevel Jens Axboe <axboe@kernel.dk> writes: > On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote: >> >> On one system, there was bdi->io_pages==0. This seems to be the bug of >> a driver somewhere, and should fix it though. Anyway, it is better to >> avoid the divide-by-zero Oops. >> >> So this check it. >> >> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> >> Cc: <stable@vger.kernel.org> >> --- >> fs/fat/fatent.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c >> index f7e3304..98a1c4f 100644 >> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900 >> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900 >> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo >> if (fatent->entry >= ent_limit) >> return; >> >> - if (ra_pages > sb->s_bdi->io_pages) >> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages) >> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); >> reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1); > > I don't think we should work-around this here. What device is this on? > Something like the below may help. The reported bug is from nvme stack, and the below patch (I submitted same patch to you) fixed the reported case though. But I didn't verify all possible path, so I'd liked to use safer side. If block layer can guarantee io_pages!=0 instead, and can apply to stable branch (5.8+). It would work too. Thanks. > diff --git a/block/blk-core.c b/block/blk-core.c > index d9d632639bd1..10c08ac50697 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -539,6 +539,7 @@ struct request_queue *blk_alloc_queue(int node_id) > goto fail_stats; > > q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES; > + q->backing_dev_info->io_pages = VM_READAHEAD_PAGES; > q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK; > q->node = node_id; -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-31 16:37 ` OGAWA Hirofumi @ 2020-08-31 16:39 ` Jens Axboe 2020-08-31 16:56 ` Matthew Wilcox 2020-08-31 17:16 ` OGAWA Hirofumi 0 siblings, 2 replies; 15+ messages in thread From: Jens Axboe @ 2020-08-31 16:39 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, fsdevel On 8/31/20 10:37 AM, OGAWA Hirofumi wrote: > Jens Axboe <axboe@kernel.dk> writes: > >> On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote: >>> >>> On one system, there was bdi->io_pages==0. This seems to be the bug of >>> a driver somewhere, and should fix it though. Anyway, it is better to >>> avoid the divide-by-zero Oops. >>> >>> So this check it. >>> >>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> >>> Cc: <stable@vger.kernel.org> >>> --- >>> fs/fat/fatent.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c >>> index f7e3304..98a1c4f 100644 >>> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900 >>> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900 >>> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo >>> if (fatent->entry >= ent_limit) >>> return; >>> >>> - if (ra_pages > sb->s_bdi->io_pages) >>> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages) >>> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); >>> reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1); >> >> I don't think we should work-around this here. What device is this on? >> Something like the below may help. > > The reported bug is from nvme stack, and the below patch (I submitted > same patch to you) fixed the reported case though. But I didn't verify > all possible path, so I'd liked to use safer side. > > If block layer can guarantee io_pages!=0 instead, and can apply to > stable branch (5.8+). It would work too. We really should ensure that ->io_pages is always set, imho, instead of having to work-around it in other spots. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-31 16:39 ` Jens Axboe @ 2020-08-31 16:56 ` Matthew Wilcox 2020-08-31 17:00 ` Jens Axboe 2020-08-31 17:16 ` OGAWA Hirofumi 1 sibling, 1 reply; 15+ messages in thread From: Matthew Wilcox @ 2020-08-31 16:56 UTC (permalink / raw) To: Jens Axboe; +Cc: OGAWA Hirofumi, Andrew Morton, linux-kernel, fsdevel On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote: > We really should ensure that ->io_pages is always set, imho, instead of > having to work-around it in other spots. Interestingly, there are only three places in the entire kernel which _use_ bdi->io_pages. FAT, Verity and the pagecache readahead code. Verity: unsigned long num_ra_pages = min_t(unsigned long, num_blocks_to_hash - i, inode->i_sb->s_bdi->io_pages); FAT: if (ra_pages > sb->s_bdi->io_pages) ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); Pagecache: max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages); and if (req_size > max_pages && bdi->io_pages > max_pages) max_pages = min(req_size, bdi->io_pages); The funny thing is that all three are using it differently. Verity is taking io_pages to be the maximum amount to readahead. FAT is using it as the unit of readahead (round down to the previous multiple) and the pagecache uses it to limit reads that exceed the current per-file readahead limit (but allows per-file readahead to exceed io_pages, in which case it has no effect). So how should it be used? My inclination is to say that the pagecache is right, by virtue of being the most-used. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-31 16:56 ` Matthew Wilcox @ 2020-08-31 17:00 ` Jens Axboe 2020-08-31 17:39 ` OGAWA Hirofumi 0 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2020-08-31 17:00 UTC (permalink / raw) To: Matthew Wilcox; +Cc: OGAWA Hirofumi, Andrew Morton, linux-kernel, fsdevel On 8/31/20 10:56 AM, Matthew Wilcox wrote: > On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote: >> We really should ensure that ->io_pages is always set, imho, instead of >> having to work-around it in other spots. > > Interestingly, there are only three places in the entire kernel which > _use_ bdi->io_pages. FAT, Verity and the pagecache readahead code. > > Verity: > unsigned long num_ra_pages = > min_t(unsigned long, num_blocks_to_hash - i, > inode->i_sb->s_bdi->io_pages); > > FAT: > if (ra_pages > sb->s_bdi->io_pages) > ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); > > Pagecache: > max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages); > and > if (req_size > max_pages && bdi->io_pages > max_pages) > max_pages = min(req_size, bdi->io_pages); > > The funny thing is that all three are using it differently. Verity is > taking io_pages to be the maximum amount to readahead. FAT is using > it as the unit of readahead (round down to the previous multiple) and > the pagecache uses it to limit reads that exceed the current per-file > readahead limit (but allows per-file readahead to exceed io_pages, > in which case it has no effect). > > So how should it be used? My inclination is to say that the pagecache > is right, by virtue of being the most-used. When I added ->io_pages, it was for the page cache use case. The others grew after that... -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-31 17:00 ` Jens Axboe @ 2020-08-31 17:39 ` OGAWA Hirofumi 0 siblings, 0 replies; 15+ messages in thread From: OGAWA Hirofumi @ 2020-08-31 17:39 UTC (permalink / raw) To: Jens Axboe; +Cc: Matthew Wilcox, Andrew Morton, linux-kernel, fsdevel Jens Axboe <axboe@kernel.dk> writes: > On 8/31/20 10:56 AM, Matthew Wilcox wrote: >> On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote: >>> We really should ensure that ->io_pages is always set, imho, instead of >>> having to work-around it in other spots. >> >> Interestingly, there are only three places in the entire kernel which >> _use_ bdi->io_pages. FAT, Verity and the pagecache readahead code. >> >> Verity: >> unsigned long num_ra_pages = >> min_t(unsigned long, num_blocks_to_hash - i, >> inode->i_sb->s_bdi->io_pages); >> >> FAT: >> if (ra_pages > sb->s_bdi->io_pages) >> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages); >> >> Pagecache: >> max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages); >> and >> if (req_size > max_pages && bdi->io_pages > max_pages) >> max_pages = min(req_size, bdi->io_pages); >> >> The funny thing is that all three are using it differently. Verity is >> taking io_pages to be the maximum amount to readahead. FAT is using >> it as the unit of readahead (round down to the previous multiple) and >> the pagecache uses it to limit reads that exceed the current per-file >> readahead limit (but allows per-file readahead to exceed io_pages, >> in which case it has no effect). >> >> So how should it be used? My inclination is to say that the pagecache >> is right, by virtue of being the most-used. > > When I added ->io_pages, it was for the page cache use case. The others > grew after that... FAT and pagecache usage would be similar or same purpose. The both is using io_pages as optimal IO size. In pagecache case, it uses io_pages if one request size is exceeding io_pages. In FAT case, there is perfect knowledge about future/total request size. So FAT divides request by io_pages, and adjust ra_pages with knowledge. I don't know about verity. Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-31 16:39 ` Jens Axboe 2020-08-31 16:56 ` Matthew Wilcox @ 2020-08-31 17:16 ` OGAWA Hirofumi 2020-08-31 17:19 ` Jens Axboe 1 sibling, 1 reply; 15+ messages in thread From: OGAWA Hirofumi @ 2020-08-31 17:16 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, linux-kernel, fsdevel Jens Axboe <axboe@kernel.dk> writes: > On 8/31/20 10:37 AM, OGAWA Hirofumi wrote: >> Jens Axboe <axboe@kernel.dk> writes: >> >>> I don't think we should work-around this here. What device is this on? >>> Something like the below may help. >> >> The reported bug is from nvme stack, and the below patch (I submitted >> same patch to you) fixed the reported case though. But I didn't verify >> all possible path, so I'd liked to use safer side. >> >> If block layer can guarantee io_pages!=0 instead, and can apply to >> stable branch (5.8+). It would work too. > > We really should ensure that ->io_pages is always set, imho, instead of > having to work-around it in other spots. I think it is good too. However, the issue would be how to do it for stable branch. If you think that block layer patch is enough and submit to stable (5.8+) branch instead, I'm fine without fatfs patch. (Or removing workaround in fatfs with block layer patch later?) Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0 2020-08-31 17:16 ` OGAWA Hirofumi @ 2020-08-31 17:19 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2020-08-31 17:19 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, fsdevel On 8/31/20 11:16 AM, OGAWA Hirofumi wrote: > Jens Axboe <axboe@kernel.dk> writes: > >> On 8/31/20 10:37 AM, OGAWA Hirofumi wrote: >>> Jens Axboe <axboe@kernel.dk> writes: >>> >>>> I don't think we should work-around this here. What device is this on? >>>> Something like the below may help. >>> >>> The reported bug is from nvme stack, and the below patch (I submitted >>> same patch to you) fixed the reported case though. But I didn't verify >>> all possible path, so I'd liked to use safer side. >>> >>> If block layer can guarantee io_pages!=0 instead, and can apply to >>> stable branch (5.8+). It would work too. >> >> We really should ensure that ->io_pages is always set, imho, instead of >> having to work-around it in other spots. > > I think it is good too. However, the issue would be how to do it for > stable branch. Agree > If you think that block layer patch is enough and submit to stable > (5.8+) branch instead, I'm fine without fatfs patch. (Or removing > workaround in fatfs with block layer patch later?) Well, it should catch any block based case as we then set it when allocating the queue. So should do the trick for all block based cases at least. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2020-08-31 17:39 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-08-30 0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi 2020-08-30 1:21 ` Matthew Wilcox 2020-08-30 1:54 ` OGAWA Hirofumi 2020-08-30 3:53 ` Matthew Wilcox 2020-08-30 9:04 ` OGAWA Hirofumi 2020-08-30 14:01 ` Sasha Levin 2020-08-30 14:16 ` OGAWA Hirofumi 2020-08-31 15:22 ` Jens Axboe 2020-08-31 16:37 ` OGAWA Hirofumi 2020-08-31 16:39 ` Jens Axboe 2020-08-31 16:56 ` Matthew Wilcox 2020-08-31 17:00 ` Jens Axboe 2020-08-31 17:39 ` OGAWA Hirofumi 2020-08-31 17:16 ` OGAWA Hirofumi 2020-08-31 17:19 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).