From: Ming Lei <tom.leiming@gmail.com>
To: Ming Lei <tom.leiming@gmail.com>
Cc: Shaun Tancheff <shaun@tancheff.com>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
"open list:DEVICE-MAPPER (LVM)" <dm-devel@redhat.com>,
linux-block@vger.kernel.org,
Linux SCSI List <linux-scsi@vger.kernel.org>,
Jens Axboe <axboe@fb.com>,
Shaun Tancheff <shaun.tancheff@seagate.com>,
Christoph Hellwig <hch@infradead.org>,
Mikulas Patocka <mpatocka@redhat.com>
Subject: Re: [PATCH 10/12] Limit bio_endio recursion
Date: Thu, 7 Apr 2016 13:44:43 +0800 [thread overview]
Message-ID: <20160407134443.552c2556@tom-T450> (raw)
In-Reply-To: <CACVXFVNo6+F_OPyiTE882UrECz7BG_wKGj=teukibuiMdxmLHw@mail.gmail.com>
On Thu, 7 Apr 2016 11:54:49 +0800
Ming Lei <tom.leiming@gmail.com> wrote:
> On Mon, Apr 4, 2016 at 1:06 PM, Shaun Tancheff <shaun@tancheff.com> wrote:
> > Recursive endio calls can exceed 16k stack. Tested with
> > 32k stack and observed:
> >
> > Depth Size Location (293 entries)
> > ----- ---- --------
> > 0) 21520 16 __module_text_address+0x12/0x60
> > 1) 21504 8 __module_address+0x5/0x140
> > 2) 21496 24 __module_text_address+0x12/0x60
> > 3) 21472 16 is_module_text_address+0xe/0x20
> > 4) 21456 8 __kernel_text_address+0x50/0x80
> > 5) 21448 136 print_context_stack+0x5a/0xf0
> > 6) 21312 144 dump_trace+0x14c/0x300
> > 7) 21168 8 save_stack_trace+0x2f/0x50
> > 8) 21160 88 set_track+0x64/0x130
> > 9) 21072 96 free_debug_processing+0x200/0x290
> > 10) 20976 176 __slab_free+0x164/0x290
> > 11) 20800 48 kmem_cache_free+0x1b0/0x1e0
> > 12) 20752 16 mempool_free_slab+0x17/0x20
> > 13) 20736 48 mempool_free+0x2f/0x90
> > 14) 20688 16 bvec_free+0x36/0x40
> > 15) 20672 32 bio_free+0x3b/0x60
> > 16) 20640 16 bio_put+0x23/0x30
> > 17) 20624 64 end_bio_extent_writepage+0xcf/0xe0
> > 18) 20560 48 bio_endio+0x57/0x90
> > 19) 20512 48 btrfs_end_bio+0xa8/0x160
> > 20) 20464 48 bio_endio+0x57/0x90
> > 21) 20416 112 dec_pending+0x121/0x270
> > 22) 20304 64 clone_endio+0x7a/0x100
> > 23) 20240 48 bio_endio+0x57/0x90
> > ...
> > 277) 1264 64 clone_endio+0x7a/0x100
> > 278) 1200 48 bio_endio+0x57/0x90
> > 279) 1152 112 dec_pending+0x121/0x270
> > 280) 1040 64 clone_endio+0x7a/0x100
> > 281) 976 48 bio_endio+0x57/0x90
> > 282) 928 80 blk_update_request+0x8f/0x340
> > 283) 848 80 scsi_end_request+0x33/0x1c0
> > 284) 768 112 scsi_io_completion+0xb5/0x620
> > 285) 656 48 scsi_finish_command+0xcf/0x120
> > 286) 608 48 scsi_softirq_done+0x126/0x150
> > 287) 560 24 blk_done_softirq+0x78/0x90
> > 288) 536 136 __do_softirq+0xfd/0x280
> > 289) 400 16 run_ksoftirqd+0x28/0x50
> > 290) 384 64 smpboot_thread_fn+0x105/0x160
> > 291) 320 144 kthread+0xc9/0xe0
> > 292) 176 176 ret_from_fork+0x3f/0x70
> >
> > Based on earlier patch by Mikulas Patocka <mpatocka@redhat.com>.
> > https://lkml.org/lkml/2008/6/24/18
>
> Looks a empty link, and the following had the discussion too:
>
> http://linux-kernel.2935.n7.nabble.com/PATCH-1-2-Avoid-bio-endio-recursion-td306043.html
>
> >
> > Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> > ---
> > block/bio.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 54 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/bio.c b/block/bio.c
> > index d42e69c..4ac19f6 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -1733,6 +1733,59 @@ static inline bool bio_remaining_done(struct bio *bio)
> > return false;
> > }
> >
> > +static DEFINE_PER_CPU(struct bio **, bio_end_queue) = { NULL };
>
> The idea looks very nice, but this patch can't be applid to current block-next
> branch.
>
> Looks it might be simpler to implement the approach by using percpu bio_list.
Cc Mikulas & Christoph
How about the following implementation with bio_list? Looks more readable and
simpler.
Just run a recent testcase of bcache over raid1(virtio-blk, virtio-blk), looks
it does work, :-)
---
diff --git a/block/bio.c b/block/bio.c
index f124a0a..b97dfe8 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -68,6 +68,8 @@ static DEFINE_MUTEX(bio_slab_lock);
static struct bio_slab *bio_slabs;
static unsigned int bio_slab_nr, bio_slab_max;
+static DEFINE_PER_CPU(struct bio_list *, bio_end_list) = { NULL };
+
static struct kmem_cache *bio_find_or_create_slab(unsigned int extra_size)
{
unsigned int sz = sizeof(struct bio) + extra_size;
@@ -1737,6 +1739,46 @@ static inline bool bio_remaining_done(struct bio *bio)
return false;
}
+/* disable local irq when manipulating the percpu bio_list */
+static void unwind_bio_endio(struct bio *bio)
+{
+ struct bio_list bl_in_stack;
+ struct bio_list *bl;
+ unsigned long flags;
+ bool clear_list = false;
+
+ local_irq_save(flags);
+
+ bl = this_cpu_read(bio_end_list);
+ if (!bl) {
+ bl = &bl_in_stack;
+ bio_list_init(bl);
+ clear_list = true;
+ }
+
+ if (!bio_list_empty(bl)) {
+ bio_list_add(bl, bio);
+ goto out;
+ }
+
+ while (1) {
+ local_irq_restore(flags);
+
+ if (!bio)
+ break;
+
+ if (bio->bi_end_io)
+ bio->bi_end_io(bio);
+
+ local_irq_save(flags);
+ bio = bio_list_pop(bl);
+ }
+ if (clear_list)
+ this_cpu_write(bio_end_list, NULL);
+ out:
+ local_irq_restore(flags);
+}
+
/**
* bio_endio - end I/O on a bio
* @bio: bio
@@ -1765,8 +1807,7 @@ again:
goto again;
}
- if (bio->bi_end_io)
- bio->bi_end_io(bio);
+ unwind_bio_endio(bio);
}
EXPORT_SYMBOL(bio_endio);
>
> > +
> > +static struct bio *unwind_bio_endio(struct bio *bio)
> > +{
> > + struct bio ***bio_end_queue_ptr;
> > + struct bio *bio_queue;
> > + struct bio *chain_bio = NULL;
> > + int error = bio->bi_error;
> > + unsigned long flags;
> > +
> > + local_irq_save(flags);
>
> If we may resue current->bio_list for this purpose, disabling irq should have
> been avoided, but looks it is difficult to do in that way, maybe impossible.
> Also not realistic to introduce a new per-task variable.
>
> > + bio_end_queue_ptr = this_cpu_ptr(&bio_end_queue);
> > +
> > + if (*bio_end_queue_ptr) {
> > + **bio_end_queue_ptr = bio;
> > + *bio_end_queue_ptr = &bio->bi_next;
> > + bio->bi_next = NULL;
> > + } else {
> > + bio_queue = NULL;
> > + *bio_end_queue_ptr = &bio_queue;
>
> Suggest to comment that bio_queue is the bottom-most stack variable,
> otherwise looks a bit tricky to understand.
>
> > +
> > +next_bio:
> > + if (bio->bi_end_io == bio_chain_endio) {
> > + struct bio *parent = bio->bi_private;
> > +
> > + bio_put(bio);
> > + chain_bio = parent;
> > + goto out;
> > + }
> > +
> > + if (bio->bi_end_io) {
> > + if (!bio->bi_error)
> > + bio->bi_error = error;
> > + bio->bi_end_io(bio);
> > + }
> > +
> > + if (bio_queue) {
> > + bio = bio_queue;
> > + bio_queue = bio->bi_next;
> > + if (!bio_queue)
> > + *bio_end_queue_ptr = &bio_queue;
> > + goto next_bio;
> > + }
> > + *bio_end_queue_ptr = NULL;
> > + }
> > +
> > +out:
> > +
> > + local_irq_restore(flags);
> > +
> > + return chain_bio;
> > +}
> > +
> > /**
> > * bio_endio - end I/O on a bio
> > * @bio: bio
> > @@ -1762,9 +1815,7 @@ void bio_endio(struct bio *bio)
> > bio_put(bio);
> > bio = parent;
> > } else {
> > - if (bio->bi_end_io)
> > - bio->bi_end_io(bio);
> > - bio = NULL;
> > + bio = unwind_bio_endio(bio);
> > }
> > }
> > }
> > --
> > 1.9.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-block" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
next prev parent reply other threads:[~2016-04-07 5:45 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-04 5:06 [PATCH 00/12] Block support for zoned drives Shaun Tancheff
2016-04-04 5:06 ` [PATCH 01/12] Add ZBC <-> ZAC xlat support for report, open, close, reset, finish Shaun Tancheff
2016-04-04 5:06 ` [PATCH 02/12] ata-scsi: Translate ReportZones result to big endian Shaun Tancheff
2016-04-04 5:06 ` [PATCH 03/12] BUG: Losing bits on request.cmd_flags Shaun Tancheff
2016-04-04 15:56 ` Jeff Moyer
2016-04-04 16:06 ` Shaun Tancheff
2016-04-05 2:08 ` Martin K. Petersen
2016-04-05 10:39 ` Shaun Tancheff
2016-04-04 5:06 ` [PATCH 04/12] Add bio/request flags for using ZBC/ZAC commands Shaun Tancheff
2016-04-04 5:06 ` [PATCH 05/12] Add ioctl to issue ZBC/ZAC commands via block layer Shaun Tancheff
2016-04-04 5:06 ` [PATCH 06/12] Add ata pass-through path for ZAC commands Shaun Tancheff
2016-04-04 5:06 ` [PATCH 07/12] ZDM: New DM target 'zoned' Shaun Tancheff
2016-04-04 5:06 ` [PATCH 08/12] RAID 4/5/6: Indicate parity blocks as 'META' Shaun Tancheff
2016-04-04 5:06 ` [PATCH 09/12] RAID 4/5/6: Fine-grained TRIM enable for ZDM Shaun Tancheff
2016-04-04 5:06 ` [PATCH 10/12] Limit bio_endio recursion Shaun Tancheff
2016-04-07 3:54 ` Ming Lei
2016-04-07 5:44 ` Ming Lei [this message]
2016-04-07 6:03 ` Ming Lei
2016-04-04 5:06 ` [PATCH 11/12] Stream Id: Use PID to seed Stream Id construction Shaun Tancheff
2016-04-04 5:06 ` [PATCH 12/12] Block Zoned Control for Userspace Shaun Tancheff
2016-04-04 7:32 ` [PATCH 00/12] Block support for zoned drives Hannes Reinecke
2016-04-04 8:13 ` Shaun Tancheff
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160407134443.552c2556@tom-T450 \
--to=tom.leiming@gmail.com \
--cc=axboe@fb.com \
--cc=dm-devel@redhat.com \
--cc=hch@infradead.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=shaun.tancheff@seagate.com \
--cc=shaun@tancheff.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox