From: Jens Axboe <jens.axboe@oracle.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: linux-kernel@vger.kernel.org, Neil Brown <neilb@suse.de>
Subject: Re: [PATCH 1/2] Avoid bio_endio recursion
Date: Thu, 26 Jun 2008 09:07:24 +0200 [thread overview]
Message-ID: <20080626070723.GL20851@kernel.dk> (raw)
In-Reply-To: <Pine.LNX.4.64.0806252008380.11130@engineering.redhat.com>
On Wed, Jun 25 2008, Mikulas Patocka wrote:
> On Wed, 25 Jun 2008, Jens Axboe wrote:
>
> >On Tue, Jun 24 2008, Mikulas Patocka wrote:
> >>
> >>
> >>On Tue, 24 Jun 2008, Jens Axboe wrote:
> >>
> >>>On Tue, Jun 24 2008, Mikulas Patocka wrote:
> >>>>Hi
> >>>>
> >>>>bio_endio calls bi_end_io callback. In case of stacked devices (raid,
> >>>>dm),
> >>>>bio_end_io may call bio_endio again, up to an unspecified length.
> >>>>
> >>>>The crash because of stack overflow was really observed on sparc64. And
> >>>>this recursion was one of the contributing factors (using 9 stack frames
> >>>>--- that is 1728 bytes).
> >>>
> >>>Looks good, I like the concept. Can you please make it a little less
> >>>goto driven, though? The next_bio and goto next_bio could just be a
> >>>while().
> >>>
> >>>--
> >>>Jens Axboe
> >>>
> >>
> >>Hi.
> >>
> >>This is the patch, slightly de-goto-ized. (it still contains one, I think
> >>that while (1) { ... break ... } is no better readable than goto).
> >
> >Sure, that looks better.
> >
> >>I found another problem in my previous patch, I forgot about the "error"
> >>variable (it would cause misbehavior for example if disk fails, submits an
> >>error and raid driver turns this failure into success). We need to save
> >>the error variable somewhere in the bio, there is no other place where it
> >>could be placed. I temporarily saved it to bi_idx, because it's unused at
> >>this place.
> >
> >I don't think bi_idx is a fantastic idea, I could easily imagine the
> >bi_end_io function wanting to do a segment loop on the bio. Use
> >bi_phys_segments instead (or bi_hw_segemnts, no difference), they should
> >only be used when queuing and building IO, not for completion purposes.
> >And put a big fat comment there explaining the overload. Plus they are
> >just a cache, so if you use either of those and at the same time clear
> >BIO_SEG_VALID in bi_flags, then it's guarenteed to be safe.
>
> Here is new patch that uses bi_phys_segments.
>
> >Also please put the per-cpu definition outside of bio_endio(). And I
> >don't think you need to disable interrupts, a plain preempt_disable() /
> >preempt_enable() should be enough.
>
> Disabling interrupts is needed. If you disable only preempt, interrupt can
> come at any point and add anything to the queue. You'd have to use local_t
> and local_cmpxchg tricks then and the code would be much more ugly. I
> found that the code is called with interrupts disabled most time, so it
> doesn't matter if I disable them again.
Right, that wont work of course. Completions are typically done through
a softirq, so it is not currently done with hard interrupts disabled. So
it's a bit of a shame to impose this extra restriction, bio unrolling is
necessarily a really short operation. We could reenable around the
bi_end_io() call, but then we'd need to save and restore for each bio.
>
> Mikulas
>
> >--
> >Jens Axboe
>
> Avoid recursion on bio_endio. bio_endio calls bio->bi_end_io which may in
> turn
> call bio_endio again. When this recursion happens, put the new bio to the
> queue
> and process it later, from the top-level bio_endio.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
>
> ---
> fs/bio.c | 38 +++++++++++++++++++++++++++++++++++---
> include/linux/bio.h | 3 +++
> 2 files changed, 38 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.26-rc7-devel/fs/bio.c
> ===================================================================
> --- linux-2.6.26-rc7-devel.orig/fs/bio.c 2008-06-23
> 13:49:45.000000000 +0200
> +++ linux-2.6.26-rc7-devel/fs/bio.c 2008-06-25 18:09:48.000000000 +0200
> @@ -1166,15 +1166,47 @@ void bio_check_pages_dirty(struct bio *b
> * bio unless they own it and thus know that it has an end_io
> * function.
> **/
> +static DEFINE_PER_CPU(struct bio **, bio_end_queue) = { NULL };
> +
> void bio_endio(struct bio *bio, int error)
> {
> + struct bio ***bio_end_queue_ptr;
> + struct bio *bio_queue;
> +
> + unsigned long flags;
> +
> + bio->bi_phys_segments = error;
Please make that
bio->bi_flags &= ~(1 << BIO_SEG_VALID);
bio->bi_phys_segments = error;
as I mentioned, since then we have nothing to worry about. Otherwise
bio_phys_segments() could return a completely wrong count.
> if (error)
> clear_bit(BIO_UPTODATE, &bio->bi_flags);
> else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> - error = -EIO;
> + bio->bi_phys_segments = -EIO;
> +
> + local_irq_save(flags);
> + bio_end_queue_ptr = &__get_cpu_var(bio_end_queue);
> +
> + if (*bio_end_queue_ptr) {
> + **bio_end_queue_ptr = bio;
> + *bio_end_queue_ptr = &bio->bi_next;
> + bio->bi_next = NULL;
> + } else {
> + bio_queue = NULL;
> + *bio_end_queue_ptr = &bio_queue;
> +
> +next_bio:
> + if (bio->bi_end_io)
> + bio->bi_end_io(bio, (short)bio->bi_phys_segments);
> +
> + if (bio_queue) {
> + bio = bio_queue;
> + bio_queue = bio->bi_next;
> + if (!bio_queue)
> + *bio_end_queue_ptr = &bio_queue;
> + goto next_bio;
> + }
> + *bio_end_queue_ptr = NULL;
> + }
>
> - if (bio->bi_end_io)
> - bio->bi_end_io(bio, error);
> + local_irq_restore(flags);
> }
>
> void bio_pair_release(struct bio_pair *bp)
> Index: linux-2.6.26-rc7-devel/include/linux/bio.h
> ===================================================================
> --- linux-2.6.26-rc7-devel.orig/include/linux/bio.h 2008-06-25
> 18:06:59.000000000 +0200
> +++ linux-2.6.26-rc7-devel/include/linux/bio.h 2008-06-25
> 18:08:08.000000000 +0200
> @@ -86,6 +86,9 @@ struct bio {
>
> /* Number of segments in this BIO after
> * physical address coalescing is performed.
> + *
> + * When ending a bio request in bio_endio, this field is temporarily
> + * (ab)used to keep the error code.
> */
> unsigned short bi_phys_segments;
>
--
Jens Axboe
next prev parent reply other threads:[~2008-06-26 7:07 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-24 5:22 [PATCH 1/2] Avoid bio_endio recursion Mikulas Patocka
2008-06-24 6:08 ` Neil Brown
2008-06-24 14:36 ` Mikulas Patocka
2008-06-24 8:07 ` Jens Axboe
2008-06-24 14:27 ` Mikulas Patocka
2008-06-25 8:24 ` Jens Axboe
2008-06-26 0:13 ` Mikulas Patocka
2008-06-26 7:07 ` Jens Axboe [this message]
2008-07-02 4:09 ` Mikulas Patocka
2008-07-02 8:00 ` Alan Cox
2008-07-03 21:03 ` Mikulas Patocka
2008-07-02 8:25 ` Jens Axboe
2008-07-03 21:08 ` Mikulas Patocka
2008-07-03 21:04 ` Alan Cox
2008-07-03 22:54 ` Mikulas Patocka
2008-07-03 23:00 ` Alan Cox
2008-07-03 23:51 ` Mikulas Patocka
2008-07-03 23:44 ` Alan Cox
2008-07-04 3:26 ` Mikulas Patocka
2008-07-04 8:11 ` Alan Cox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080626070723.GL20851@kernel.dk \
--to=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.