From: Mark Fasheh <mark.fasheh@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Fixing cluster/heartbeat.c's use of bio_add_page()
Date: Tue Jan 9 14:28:10 2007 [thread overview]
Message-ID: <20070109222804.GY6831@ca-server1.us.oracle.com> (raw)
In-Reply-To: <200701051506.52840.philipp.reisner@linbit.com>
Hi Philipp,
On Fri, Jan 05, 2007 at 03:06:52PM +0100, Philipp Reisner wrote:
> Hi There,
>
> As was already pointed out Mathieu Avila on Thu, 07 Sep 2006 03:15:25 -0700
> that OCFS2 is expecting bio_add_page() to add pages to BIOs in an easily
> predictable manner.
>
> That is not true, especially for devices with own merge_bvec_fn().
>
> Therefore OCFS2's heartbeat code is very likely to fail on such devices.
>
> The attached patch corrects this, and reduces the file cluster/heartbeat.c
> by 90 lines.
Thanks for taking this on - fixing up that code has been on my todo list for
a while.
> The whole patch builds around the change that the bio_put() call is
> moved to the bio's bi_end_io() function.
> This makes the whole idea of trying to predict the behaviour of
> bio_add_page() unnecessary, therefore we can remove compute_max_sectors()
> and o2hb_compute_request_limits().
>
> -Phil
>
> PS1: With that applied I can happily run OCFS2 on a DRBD-8.0 device which
> sits on top of software raid (both have interesting merge_bvec_fn()
> functions.
>
> PS2: I created the ocfs2-fix-bio_add_page.diff on 2.6.17 .
> And later refitted the patch to 2.6.20-rc2.
Ok, great. I only looked at the 2.6.20-rc2 patch as that's what I'm running
here.
> static int o2hb_read_slots(struct o2hb_region *reg,
> unsigned int max_slots)
> {
> - unsigned int num_bios, slots_per_bio, start_slot, num_slots;
> - int i, status;
> + unsigned int current_slot=0;
> + int status;
> struct o2hb_bio_wait_ctxt wc;
> - struct bio **bios;
> struct bio *bio;
>
> - o2hb_compute_request_limits(reg, max_slots, &num_bios, &slots_per_bio);
> -
> - bios = kcalloc(num_bios, sizeof(struct bio *), GFP_KERNEL);
> - if (!bios) {
> - status = -ENOMEM;
> - mlog_errno(status);
> - return status;
> - }
> -
> - o2hb_bio_wait_init(&wc, num_bios);
> -
> - num_slots = slots_per_bio;
> - for(i = 0; i < num_bios; i++) {
> - start_slot = i * slots_per_bio;
> + o2hb_bio_wait_init(&wc);
>
> - /* adjust num_slots at last bio */
> - if (max_slots < (start_slot + num_slots))
> - num_slots = max_slots - start_slot;
> -
> - bio = o2hb_setup_one_bio(reg, &wc, start_slot, num_slots);
> + while(current_slot < max_slots) {
> + bio = o2hb_setup_one_bio(reg, &wc, ¤t_slot, max_slots);
> if (IS_ERR(bio)) {
> - o2hb_bio_wait_dec(&wc, num_bios - i);
> -
> status = PTR_ERR(bio);
> mlog_errno(status);
> goto bail_and_wait;
> }
> - bios[i] = bio;
>
> + atomic_inc(&wc.wc_num_reqs);
> submit_bio(READ, bio);
> }
You changed o2hb_bio_wait_init() to initialize wc_num_reqs to zero, and
instead you increment the value before each submit_bio(). This makes sense
because we now don't know how many requests will ultimately be submitted.
However, A race now exists here between the piecemeal increment of
wc_num_reqs and the completion code in o2hb_bio_wait_dec():
if (atomic_dec_and_test(&wc->wc_num_reqs)) {
BUG_ON(num > 0);
complete(&wc->wc_io_complete);
}
So, if enough bios complete before you're done submitting all your read
requests, you could trigger the completion, wc_io_complete early.
This would certainly be hard to hit, but that's never stopped a race from
occuring before :)
So, NACK until we fix that race. Otherwise the patch seems to look fine. I
can test a future version here at Oracle.
Thanks,
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
WARNING: multiple messages have this Message-ID (diff)
From: Mark Fasheh <mark.fasheh@oracle.com>
To: Philipp Reisner <philipp.reisner@linbit.com>
Cc: ocfs2-devel@oss.oracle.com, drbd-dev@lists.linbit.com
Subject: [Drbd-dev] Re: [Ocfs2-devel] Fixing cluster/heartbeat.c's use of bio_add_page()
Date: Tue, 9 Jan 2007 14:28:04 -0800 [thread overview]
Message-ID: <20070109222804.GY6831@ca-server1.us.oracle.com> (raw)
In-Reply-To: <200701051506.52840.philipp.reisner@linbit.com>
Hi Philipp,
On Fri, Jan 05, 2007 at 03:06:52PM +0100, Philipp Reisner wrote:
> Hi There,
>
> As was already pointed out Mathieu Avila on Thu, 07 Sep 2006 03:15:25 -0700
> that OCFS2 is expecting bio_add_page() to add pages to BIOs in an easily
> predictable manner.
>
> That is not true, especially for devices with own merge_bvec_fn().
>
> Therefore OCFS2's heartbeat code is very likely to fail on such devices.
>
> The attached patch corrects this, and reduces the file cluster/heartbeat.c
> by 90 lines.
Thanks for taking this on - fixing up that code has been on my todo list for
a while.
> The whole patch builds around the change that the bio_put() call is
> moved to the bio's bi_end_io() function.
> This makes the whole idea of trying to predict the behaviour of
> bio_add_page() unnecessary, therefore we can remove compute_max_sectors()
> and o2hb_compute_request_limits().
>
> -Phil
>
> PS1: With that applied I can happily run OCFS2 on a DRBD-8.0 device which
> sits on top of software raid (both have interesting merge_bvec_fn()
> functions.
>
> PS2: I created the ocfs2-fix-bio_add_page.diff on 2.6.17 .
> And later refitted the patch to 2.6.20-rc2.
Ok, great. I only looked at the 2.6.20-rc2 patch as that's what I'm running
here.
> static int o2hb_read_slots(struct o2hb_region *reg,
> unsigned int max_slots)
> {
> - unsigned int num_bios, slots_per_bio, start_slot, num_slots;
> - int i, status;
> + unsigned int current_slot=0;
> + int status;
> struct o2hb_bio_wait_ctxt wc;
> - struct bio **bios;
> struct bio *bio;
>
> - o2hb_compute_request_limits(reg, max_slots, &num_bios, &slots_per_bio);
> -
> - bios = kcalloc(num_bios, sizeof(struct bio *), GFP_KERNEL);
> - if (!bios) {
> - status = -ENOMEM;
> - mlog_errno(status);
> - return status;
> - }
> -
> - o2hb_bio_wait_init(&wc, num_bios);
> -
> - num_slots = slots_per_bio;
> - for(i = 0; i < num_bios; i++) {
> - start_slot = i * slots_per_bio;
> + o2hb_bio_wait_init(&wc);
>
> - /* adjust num_slots at last bio */
> - if (max_slots < (start_slot + num_slots))
> - num_slots = max_slots - start_slot;
> -
> - bio = o2hb_setup_one_bio(reg, &wc, start_slot, num_slots);
> + while(current_slot < max_slots) {
> + bio = o2hb_setup_one_bio(reg, &wc, ¤t_slot, max_slots);
> if (IS_ERR(bio)) {
> - o2hb_bio_wait_dec(&wc, num_bios - i);
> -
> status = PTR_ERR(bio);
> mlog_errno(status);
> goto bail_and_wait;
> }
> - bios[i] = bio;
>
> + atomic_inc(&wc.wc_num_reqs);
> submit_bio(READ, bio);
> }
You changed o2hb_bio_wait_init() to initialize wc_num_reqs to zero, and
instead you increment the value before each submit_bio(). This makes sense
because we now don't know how many requests will ultimately be submitted.
However, A race now exists here between the piecemeal increment of
wc_num_reqs and the completion code in o2hb_bio_wait_dec():
if (atomic_dec_and_test(&wc->wc_num_reqs)) {
BUG_ON(num > 0);
complete(&wc->wc_io_complete);
}
So, if enough bios complete before you're done submitting all your read
requests, you could trigger the completion, wc_io_complete early.
This would certainly be hard to hit, but that's never stopped a race from
occuring before :)
So, NACK until we fix that race. Otherwise the patch seems to look fine. I
can test a future version here at Oracle.
Thanks,
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
next prev parent reply other threads:[~2007-01-09 14:28 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-01-05 6:07 [Ocfs2-devel] Fixing cluster/heartbeat.c's use of bio_add_page() Philipp Reisner
2007-01-05 14:06 ` [Drbd-dev] " Philipp Reisner
2007-01-09 14:28 ` Mark Fasheh [this message]
2007-01-09 22:28 ` [Drbd-dev] Re: [Ocfs2-devel] " Mark Fasheh
2007-01-11 1:58 ` Philipp Reisner
2007-01-11 9:58 ` [Drbd-dev] " Philipp Reisner
2007-01-16 11:58 ` Mark Fasheh
2007-01-16 19:57 ` [Drbd-dev] " Mark Fasheh
2007-01-16 12:20 ` Philipp Reisner
2007-01-16 20:20 ` [Drbd-dev] " Philipp Reisner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070109222804.GY6831@ca-server1.us.oracle.com \
--to=mark.fasheh@oracle.com \
--cc=ocfs2-devel@oss.oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.