* blocklayoutdriver and page_cache_next_hole
From: Benjamin Coddington @ 2016-04-26 17:27 UTC (permalink / raw)
To: linux-nfs; +Cc: hch
I'm doing some benchmarking on SCSI layout, and I ran into a case where
bonnie++ seemingly stopped making forward progress.
bl_pg_init_write() wants to figure out how big the layout should be, and if
the inode size isn't equal to the size of the mapping it uses
page_cache_next_hole() pretty aggressively. The problem is that
page_cache_next_hole() is fairly stupid: it just walks through the page
cache radix tree one index at a time.
The end result is that for fairly large files (>4G) my machine spends all
its time in __radix_tree_lookup(), and I might as well just use regular NFS.
Here's some bash I use to reproduce that problem (note the truncate is 4G +
1):
[root@gfs-a24c-02 local_spc4]# cat <(dd if=/dev/zero bs=1M count=4096) - > bar &
[1] 2000
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 10.7775 s, 399 MB/s
[1]+ Stopped cat <(dd if=/dev/zero bs=1M count=4096) - > bar
[root@gfs-a24c-02 local_spc4]# sync
[root@gfs-a24c-02 local_spc4]# truncate -s 4294967297 bar
[root@gfs-a24c-02 local_spc4]# dd if=/dev/zero of=bar bs=1M count=4096 conv=notrunc
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 82.0328 s, 52.4 MB/s
This performance problem gets far worse on larger address_spaces.
A couple of ways to fix this spring to mind: make page_cache_next_hole()
less stupid, or, instead of trying to figure out what wb_size should be in
pg_init, provide a way for pg_init to look up a matching lseg beforehand.
Or maybe add an lsize parameter?
I'm going to continue to flail around trying to determine the best way to
fix this unless sound advice is offered. I don't know enough about the page
cache to make improvements to page_cache_next_hole(), nor have I any good
estimate on the acceptability of the other two approaches. Thanks for any
input.
Ben
* Re: blocklayoutdriver and page_cache_next_hole
From: Christoph Hellwig @ 2016-04-27 7:53 UTC (permalink / raw)
To: Benjamin Coddington; +Cc: linux-nfs, hch
On Tue, Apr 26, 2016 at 01:27:02PM -0400, Benjamin Coddington wrote:
> I'm doing some benchmarking on SCSI layout, and I ran into a case where
> bonnie++ seemingly stopped making forward progress.
>
> bl_pg_init_write() wants to figure out how big the layout should be, and
> if the inode size isn't equal to the size of the mapping it uses
> page_cache_next_hole() pretty aggressively. The problem is that
> page_cache_next_hole() is fairly stupid: it just walks through the page
> cache radix tree one index at a time.
>
> The end result is that for fairly large files (>4G) my machine spends all
> its time in __radix_tree_lookup(), and I might as well just use regular NFS.
Haha. One option is to simply optimize page_cache_next_hole, as that
should be easy and the readahead code would benefit as well. Or make
NFS pass through the writeback size and stop hacking around upper layers
in the layout driver. Both seem like useful options.