* blocklayoutdriver and page_cache_next_hole
From: Benjamin Coddington @ 2016-04-26 17:27 UTC (permalink / raw)
To: linux-nfs; +Cc: hch
I'm doing some benchmarking on SCSI layout, and I ran into a case where
bonnie++ seemingly stopped making forward progress.
bl_pg_init_write() wants to figure out how big the layout should be, and if
the inode size isn't equal to the size of the mapping it uses
page_cache_next_hole() pretty aggressively. The problem is that
page_cache_next_hole() is fairly stupid: it just walks through the page
cache radix tree one index at a time.
The end result is that for fairly large files (>4G) my machine spends all
its time in __radix_tree_lookup(), and I might as well just use regular NFS.
Here's some bash I use to reproduce that problem (note the truncate is 4G +
1):
[root@gfs-a24c-02 local_spc4]# cat <(dd if=/dev/zero bs=1M count=4096) - > bar &
[1] 2000
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 10.7775 s, 399 MB/s
[1]+ Stopped cat <(dd if=/dev/zero bs=1M count=4096) - > bar
[root@gfs-a24c-02 local_spc4]# sync
[root@gfs-a24c-02 local_spc4]# truncate -s 4294967297 bar
[root@gfs-a24c-02 local_spc4]# dd if=/dev/zero of=bar bs=1M count=4096 conv=notrunc
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 82.0328 s, 52.4 MB/s
This performance problem gets far worse on larger address_spaces.
A couple of ways to fix this spring to mind: make page_cache_next_hole()
less stupid, or, instead of trying to figure out what wb_size should be in
pg_init, provide a way for pg_init to look up a matching lseg beforehand.
Or maybe add an lsize parameter?
I'm going to continue to flail around trying to determine the best way to
fix this unless sound advice is offered. I don't know enough about the page
cache to make improvements to page_cache_next_hole(), nor have I any good
estimate on the acceptability of the other two approaches. Thanks for any
input.
Ben
* Re: blocklayoutdriver and page_cache_next_hole
From: Christoph Hellwig @ 2016-04-27 7:53 UTC (permalink / raw)
To: Benjamin Coddington; +Cc: linux-nfs, hch
On Tue, Apr 26, 2016 at 01:27:02PM -0400, Benjamin Coddington wrote:
> I'm doing some benchmarking on SCSI layout, and I ran into a case where
> bonnie++ seemingly stopped making forward progress.
>
> bl_pg_init_write() wants to figure out how big the layout should be, and
> if the inode size isn't equal to the size of the mapping it uses
> page_cache_next_hole() pretty aggressively. The problem is that
> page_cache_next_hole() is fairly stupid: it just walks through the page
> cache radix tree one index at a time.
>
> The end result is that for fairly large files (>4G) my machine spends all
> its time in __radix_tree_lookup(), and I might as well just use regular NFS.
Haha. One option is to simply optimize page_cache_next_hole, as that
should be easy and the readahead code would benefit as well. Or make
NFS pass through the writeback size and stop hacking around upper layers
in the layout driver. Both seem like useful options.