From: Mikulas Patocka <mpatocka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Alasdair G Kergon <agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Kent Overstreet
<koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
yehuda-L5o5AL9CYN0tUFlbccrkMA@public.gmane.org,
vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
bharrosh-C4P08NqkoRlBDgjK7y7TUQ@public.gmane.org,
tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org,
drbd-dev-cunTk1MwBs8qoQakbn7OcQ@public.gmane.org,
Dave Chinner <dchinner-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
tytso-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org
Subject: Re: [PATCH v3 14/16] Gut bio_add_page()
Date: Mon, 28 May 2012 12:07:14 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.1205281129180.2227@file.rdu.redhat.com>
In-Reply-To: <20120525223937.GF5761-FDJ95KluN3Z0klwcnFlA1dvLeJWuRmrY@public.gmane.org>
Hi
The general problem with bio_add_page simplification is this:
Suppose that you have an old ATA disk that can read or write at most 256
sectors. Suppose that you are reading from the disk and readahead for 512
sectors is used:
With accurately sized bios, you send one bio for 256 sectors (it is sent
immediately to the disk) and a second bio for another 256 sectors (it is
put on the block device queue). The first bio finishes, its pages are
marked uptodate, and the second bio is sent to the disk. While the disk is
processing the second bio, the kernel already knows that the first 256
sectors are finished - so it copies the data to userspace and lets
userspace process it - while the disk is processing the second bio. So,
disk transfer and data processing are overlapped.
Now, with your patch, you send just one 512-sector bio. The bio is split
into two bios, the first one is sent to the disk and you wait. The disk
finishes the first bio, you send the second bio to the disk and wait. The
disk finishes the second bio. You complete the master bio, mark all 512
sectors as uptodate in the pagecache, and only then start copying data to
userspace and processing it. Disk transfer and data processing are not
overlapped.
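To put rough numbers on the two cases above, here is a small userspace
model (not kernel code). The struct, the function names and the time
units are made up for illustration; it assumes the disk transfers one
hardware-sized piece at a time and userspace consumes one piece at a time:

```c
#include <assert.h>

/* Hypothetical cost model; time units are arbitrary. */
struct io_model {
	unsigned chunks;       /* hardware-sized pieces, e.g. 512 / 256 = 2 */
	unsigned disk_time;    /* time for the disk to transfer one piece */
	unsigned process_time; /* time for userspace to consume one piece */
};

/*
 * Accurately sized bios: every piece completes on its own, so userspace
 * can consume piece i while the disk is transferring piece i + 1.
 */
static unsigned finish_time_sized(const struct io_model *m)
{
	unsigned disk_done = 0; /* when the current piece leaves the disk */
	unsigned proc_done = 0; /* when userspace finished the previous piece */

	for (unsigned i = 0; i < m->chunks; i++) {
		disk_done += m->disk_time;
		/* processing starts once the data is there and the CPU is free */
		unsigned start = disk_done > proc_done ? disk_done : proc_done;
		proc_done = start + m->process_time;
	}
	return proc_done;
}

/*
 * One oversized bio: nothing is marked uptodate until the last split
 * piece has finished, so all processing happens after all transfers.
 */
static unsigned finish_time_oversized(const struct io_model *m)
{
	assert(m->chunks > 0);
	return m->chunks * m->disk_time + m->chunks * m->process_time;
}
```

With two pieces, a disk time of 10 and a processing time of 4, the
accurately sized case finishes at 24 and the oversized case at 28. The
first data also becomes visible to userspace at time 10 instead of 20.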
The same problem arises with raid-0, raid-5 or raid-10: if you send
accurately-sized bios (that don't span stripe boundaries), each bio waits
for just one disk to seek to the requested position. If you send an
oversized bio that spans several stripes, that bio will wait until all the
disks have seeked to the requested position.
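As a rough illustration of the stripe case, the sketch below counts how
many stripe chunks a bio touches and how many disks it therefore has to
wait for. It is userspace code with hypothetical names, and it assumes a
plain raid-0 layout where consecutive chunks go to consecutive disks:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t; /* userspace stand-in for the kernel type */

/* Number of stripe chunks touched by the range [start, start + len). */
static unsigned chunks_spanned(sector_t start, unsigned len,
			       unsigned chunk_sectors)
{
	assert(len > 0 && chunk_sectors > 0);
	sector_t first = start / chunk_sectors;
	sector_t last = (start + len - 1) / chunk_sectors;
	return (unsigned)(last - first + 1);
}

/*
 * Number of disks that must seek before the whole bio can complete.
 * Assumes simple raid-0 striping: n consecutive chunks land on
 * min(n, ndisks) distinct disks.
 */
static unsigned disks_waited_on(sector_t start, unsigned len,
				unsigned chunk_sectors, unsigned ndisks)
{
	unsigned n = chunks_spanned(start, len, chunk_sectors);
	return n < ndisks ? n : ndisks;
}
```

For example, with 128-sector chunks on 3 disks, a 128-sector bio at
sector 0 waits for one disk, while a 512-sector bio at sector 0 waits
for all three.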
In general, you can send oversized bios if the user is waiting for all the
data requested (for example an O_DIRECT read or write). You shouldn't send
oversized bios if the user is waiting for just a small part of the data and
the kernel is doing readahead - in this case, an oversized bio will result
in additional delay.
I think bio_add_page should be simplified in such a way that in the most
common cases it doesn't create an oversized bio, but it can create
oversized bios in uncommon cases. We could retain the limit on the maximum
number of sectors (this limit is most commonly hit on disks), add a stripe
boundary to queue_limits (the stripe boundary limit is most commonly hit on
raid), ignore the rest of the limits in bio_add_page and remove merge_bvec.
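A minimal userspace sketch of the check I have in mind is below. It is
not the existing bio_add_page code; the struct and its field names are
hypothetical stand-ins for whatever would end up in queue_limits, and all
sizes are in 512-byte sectors:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced set of limits; not the kernel's queue_limits. */
struct simple_limits {
	unsigned max_sectors;   /* hardware maximum per request */
	unsigned chunk_sectors; /* stripe boundary; 0 means no boundary */
};

/*
 * How many more sectors may be added to a bio that starts at sector
 * `start` and already holds `cur` sectors.
 */
static unsigned sectors_allowed(const struct simple_limits *lim,
				uint64_t start, unsigned cur)
{
	unsigned room = lim->max_sectors > cur ? lim->max_sectors - cur : 0;

	if (lim->chunk_sectors) {
		/* end of the stripe chunk in which the bio starts */
		uint64_t chunk_end =
			(start / lim->chunk_sectors + 1) * lim->chunk_sectors;
		uint64_t pos = start + cur;
		unsigned to_boundary =
			pos < chunk_end ? (unsigned)(chunk_end - pos) : 0;

		if (to_boundary < room)
			room = to_boundary;
	}
	return room;
}

/* Nonzero if `add` more sectors fit without exceeding either limit. */
static int page_fits(const struct simple_limits *lim, uint64_t start,
		     unsigned cur, unsigned add)
{
	return add <= sectors_allowed(lim, start, cur);
}
```

With max_sectors = 256 and no stripe boundary, a bio holding 248 sectors
has room for one more 8-sector page. With 128-sector chunks, a bio that
starts at sector 64 can grow to 64 sectors and then stops at the boundary.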
Mikulas
On Fri, 25 May 2012, Alasdair G Kergon wrote:
> Where's the urge to remove merge_bvec coming from?
>
> I think it's premature to touch this, and that the other changes, if
> fixed and integrated, should be allowed to bed themselves down first.
>
>
> Ideally every bio would be the best size on submission and no bio would
> ever need to be split.
>
> But there is a cost involved in calculating the best size - we use
> merge_bvec for this, which gives a (probable) maximum size. It's
> usually very cheap to calculate - but not always. [In dm, we permit
> some situations where the answer we give will turn out to be wrong, but
> ensure dm will always fix up those particular cases itself later and
> still process the over-sized bio correctly.]
>
> Similarly there is a performance penalty incurred when the size is wrong
> - the bio has to be split, requiring memory, potential delays etc.
>
> There is a trade-off between those two, and our experience with the current
> code has that tilted strongly in favour of using merge_bvec all the time.
> The wasted overhead in cases where it is of no benefit seems to be
> outweighed by the benefit where it does avoid lots of splitting and help
> filesystems optimise their behaviour.
>
>
> If the splitting mechanism is changed as proposed, then that balance
> might shift. My gut feeling though is that any shift would strengthen
> the case for merge_bvec.
>
> Alasdair
>