From: Ming Lei <ming.lei@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
Alexander Viro <viro@zeniv.linux.org.uk>,
"Darrick J . Wong" <darrick.wong@oracle.com>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH] fs: block_dev: compute nr_vecs hint for improving writeback bvecs allocation
Date: Fri, 8 Jan 2021 15:59:22 +0800 [thread overview]
Message-ID: <20210108075922.GB3982620@T590> (raw)
In-Reply-To: <20210106222111.GE331610@dread.disaster.area>
On Thu, Jan 07, 2021 at 09:21:11AM +1100, Dave Chinner wrote:
> On Wed, Jan 06, 2021 at 04:45:48PM +0800, Ming Lei wrote:
> > On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote:
> > > At least for iomap I think this is the wrong approach. Between the
> > > iomap and writeback_control we know the maximum size of the writeback
> > > request and can just use that.
> >
> > I think writeback_control tells us nothing about the max pages in a
> > single bio:
>
> By definition, the iomap tells us exactly how big the IO is going to
> be. i.e. an iomap spans a single contiguous range that we are going
> to issue IO on. Hence we can use that to size the bio exactly
> right for direct IO.
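(For reference, the exact sizing you mean is presumably something like the
sketch below; the names and placement are illustrative only, this is not the
actual iomap direct I/O code.)

	/*
	 * Sketch only: cap the bvec count by the mapped extent instead
	 * of always asking for BIO_MAX_PAGES.
	 */
	unsigned int nr_vecs = min_t(u64, BIO_MAX_PAGES,
				     DIV_ROUND_UP(iomap->length, PAGE_SIZE));
	struct bio *bio = bio_alloc(GFP_KERNEL, nr_vecs);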
When I trace wpc->iomap.length in iomap_add_to_ioend() while running the
following fio randwrite/write jobs, the length is 1GB most of the time,
probably because it is a freshly made XFS.
fio --size=1G --bsrange=4k-4k --runtime=30 --numjobs=2 --ioengine=psync --iodepth=32 \
--directory=$DIR --group_reporting=1 --unlink=0 --direct=0 --fsync=0 --name=f1 \
--stonewall --rw=$RW
sync
Another reason is that pages in the range may be physically contiguous,
so many pages may share a single bvec.
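(Roughly the check that __bio_try_merge_page() does today, sketched here only
to show why the page count can over-estimate the number of bvecs needed;
the variables below are assumed from the surrounding add-page path:)

	/*
	 * Sketch: if the new page starts where the last bvec ends
	 * physically, it is merged into that bvec and no extra
	 * bio_vec slot is consumed.
	 */
	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
	phys_addr_t vec_end = page_to_phys(bv->bv_page) +
			      bv->bv_offset + bv->bv_len;

	if (page_to_phys(page) + offset == vec_end) {
		bv->bv_len += len;	/* merged, no new bvec used */
		return true;
	}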
>
> > - wbc->nr_to_write controls how many pages to write back; these pages
> >   usually don't belong to the same bio. Also this number is often much
> >   bigger than BIO_MAX_PAGES.
> >
> > - wbc->range_start/range_end is similar too, and is often much
> >   bigger than BIO_MAX_PAGES.
> >
> > Also a page, or the blocks within a page, can be mapped to different
> > extents, which is only known after wpc->ops->map_blocks() returns,
>
> We only allocate the bio -after- calling ->map_blocks() to obtain
> the iomap for the given writeback range request. Hence we
> already know how large the BIO could be before we allocate it.
>
> > which doesn't look
> > different from mpage_writepages(), where the bio is allocated with
> > BIO_MAX_PAGES vecs too.
>
> __mpage_writepage() only maps a page at a time, so it can't tell
> ahead of time how big the bio is going to need to be as it doesn't
> return/cache a contiguous extent range. So it's actually very
> different to the iomap writeback code, and effectively does require
> a BIO_MAX_PAGES vecs allocation all the time...
>
> > Or do you mean we can use iomap->length for this purpose? But
> > iomap->length is still too big in the case of XFS.
>
> if we are doing small random writeback into large extents (i.e.
> iomap->length is large), then it is trivial to detect that we are
> doing random writes rather than sequential writes by checking if the
> current page is sequential to the last sector in the current bio.
> We already do this non-sequential IO checking to determine if a new
> bio needs to be allocated in iomap_can_add_to_ioend(), and we also
> know how large the current contiguous range mapped into the current
> bio chain is (ioend->io_size). Hence we've got everything we need to
> determine whether we should do a large or small bio vec allocation
> in the iomap writeback path...
page->index should tell us whether the workload is random or sequential;
however, it is still not easy to decide how many pages will end up in the
next bio when iomap->length is large.
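Even with the sequential-vs-random check you describe, the large-extent case
still needs some guess. Just to make the discussion concrete, something like
the sketch below (all names are made up for illustration, not real iomap
code):

	/*
	 * Sketch only: guess the bvec count for the next writeback bio.
	 * If the new page continues the current ioend's bio, assume the
	 * rest of the extent may follow and size accordingly; otherwise
	 * assume random writes and start small.
	 */
	static unsigned int guess_nr_vecs(struct iomap_ioend *ioend,
					  sector_t sector,
					  const struct iomap *iomap)
	{
		if (ioend && sector == bio_end_sector(ioend->io_bio))
			return min_t(u64, BIO_MAX_PAGES,
				     DIV_ROUND_UP(iomap->length, PAGE_SIZE));
		return 4;	/* arbitrary small guess for random writes */
	}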
Thanks,
Ming