Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

linux-btrfs.vger.kernel.org archive mirror

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Muthu Kumar @ 2014-01-03 19:51 UTC
  To: fengguang.wu
  Cc: Kent Overstreet, Jens Axboe, linux-btrfs, linux-fsdevel, LKML,
	lkp

Looks like Kent missed the btrfs endio in the original commit. How
about this patch:

---------

In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
bi_remaining is accounted for correctly.

Reported-by: fengguang.wu@intel.com
Cc: Kent Overstreet <kmo@daterainc.com>
CC: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
--------

 fs/btrfs/volumes.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f2130de..edfed52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
                }
                kfree(bbio);

-               bio_endio(bio, err);
+                /*
+                 * Call endio_nodec on the restored bio so the bi_remaining is
+                 * accounted for correctly
+                 */
+               bio_endio_nodec(bio, err);
        } else if (!is_orig_bio) {
                bio_put(bio);
        }

On Wed, Jan 1, 2014 at 9:31 PM,  <fengguang.wu@intel.com> wrote:
> Greetings,
>
> We hit the below bug when doing write tests to btrfs.
> Other filesystems (ext4, xfs) work fine. Two full dmesgs are attached.
>
> 196d38bccfcfa32faed8c561868336fdfa0fe8e4 is the first bad commit
> commit 196d38bccfcfa32faed8c561868336fdfa0fe8e4
> Author:     Kent Overstreet <kmo@daterainc.com>
> AuthorDate: Sat Nov 23 18:34:15 2013 -0800
> Commit:     Kent Overstreet <kmo@daterainc.com>
> CommitDate: Sat Nov 23 22:33:56 2013 -0800
>
>     block: Generic bio chaining
>
>     This adds a generic mechanism for chaining bio completions. This is
>     going to be used for a bio_split() replacement, and it turns out to be
>     very useful in a fair amount of driver code - a fair number of drivers
>     were implementing this in their own roundabout ways, often painfully.
>
>     Note that this means it's no longer legal to call bio_endio() more than once
>     on the same bio! This can cause problems for drivers that save/restore
>     bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
>     - in all but the simplest cases they'd be better off just cloning the
>     bio, and immutable biovecs is making bio cloning cheaper. But for now,
>     we add a bio_endio_nodec() for these cases.
>
>     Signed-off-by: Kent Overstreet <kmo@daterainc.com>
>     Cc: Jens Axboe <axboe@kernel.dk>
>
>  drivers/md/bcache/io.c       |  2 +-
>  drivers/md/dm-cache-target.c |  6 ++++
>  drivers/md/dm-snap.c         |  1 +
>  drivers/md/dm-thin.c         |  8 +++--
>  drivers/md/dm-verity.c       |  2 +-
>  fs/bio-integrity.c           |  2 +-
>  fs/bio.c                     | 76 ++++++++++++++++++++++++++++++++++++++++----
>  include/linux/bio.h          |  2 ++
>  include/linux/blk_types.h    |  2 ++
>  9 files changed, 90 insertions(+), 11 deletions(-)
>
> [   35.466413] random: nonblocking pool is initialized
> [  196.918039] ------------[ cut here ]------------
> [  196.919770] kernel BUG at fs/bio.c:1748!
> [  196.921505] invalid opcode: 0000 [#1] SMP
> [  196.921788] Modules linked in: microcode processor
> [  196.921788] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc6-01897-g2b48961 #1
> [  196.921788] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  196.921788] task: ffff8804094acad0 ti: ffff8804094e8000 task.ti: ffff8804094e8000
> [  196.921788] RIP: 0010:[<ffffffff811ef01e>]  [<ffffffff811ef01e>] bio_endio+0x1e/0x6a
> [  196.921788] RSP: 0018:ffff88041fc83da8  EFLAGS: 00010046
> [  196.921788] RAX: 0000000000000000 RBX: 00000000fffffffb RCX: 00000001802a0002
> [  196.921788] RDX: 00000001802a0003 RSI: 0000000000000000 RDI: ffff8800299ff9e8
> [  196.921788] RBP: ffff88041fc83dc0 R08: ffffea00096cc980 R09: ffff8804097f5100
> [  196.921788] R10: ffffea000aeb8280 R11: ffffffff8143841e R12: ffff88025b326780
> [  196.921788] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000003000
> [  196.921788] FS:  0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
> [  196.921788] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  196.921788] CR2: 00007f16e7a1948f CR3: 000000007f85e000 CR4: 00000000000006e0
> [  196.921788] Stack:
> [  196.921788]  ffff8800299ff9e8 ffff8800299ff9e8 ffff88025b326780 ffff88041fc83de8
> [  196.921788]  ffffffff81438429 00000000fffffffb ffff8803d36e6c00 0000000000000000
> [  196.921788]  ffff88041fc83e10 ffffffff811ef063 ffff8802bae0a1e8 ffff8802bae0a1e8
> [  196.921788] Call Trace:
> [  196.921788]  <IRQ>
> [  196.921788]  [<ffffffff81438429>] btrfs_end_bio+0x116/0x11d
> [  196.921788]  [<ffffffff811ef063>] bio_endio+0x63/0x6a
> [  196.921788]  [<ffffffff814cb712>] blk_mq_complete_request+0x89/0xfe
> [  196.921788]  [<ffffffff814cb79d>] __blk_mq_end_io+0x16/0x18
> [  196.921788]  [<ffffffff814cb7bf>] blk_mq_end_io+0x20/0xb1
> [  196.921788]  [<ffffffff815a1ba9>] virtblk_done+0xa4/0xf6
> [  196.921788]  [<ffffffff8155c463>] vring_interrupt+0x7c/0x8a
> [  196.921788]  [<ffffffff81107427>] handle_irq_event_percpu+0x4a/0x1bc
> [  196.921788]  [<ffffffff811075de>] handle_irq_event+0x45/0x61
> [  196.921788]  [<ffffffff81109f40>] handle_edge_irq+0xd9/0xfb
> [  196.921788]  [<ffffffff81039f56>] handle_irq+0x21/0x2a
> [  196.921788]  [<ffffffff81a0c3fd>] do_IRQ+0x4d/0xb4
> [  196.921788]  [<ffffffff81a034f2>] common_interrupt+0x72/0x72
> [  196.921788]  <EOI>
> [  196.921788]  [<ffffffff81065bfa>] ? native_safe_halt+0x6/0x8
> [  196.921788]  [<ffffffff8103f5d8>] default_idle+0x38/0xc1
> [  196.921788]  [<ffffffff8103fd04>] arch_cpu_idle+0x18/0x28
> [  196.921788]  [<ffffffff81106b6b>] cpu_startup_entry+0x178/0x269
> [  196.921788]  [<ffffffff81116954>] ? clockevents_register_device+0x112/0x117
> [  196.921788]  [<ffffffff8105ba60>] start_secondary+0x277/0x279
> [  196.921788] Code: ff ff eb bb 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 53 bb fb ff ff ff 48 85 ff 74 4c 8b 47 44 85 c0 7f 02 <0f> 0b 85 f6 74 07 f0 80 67 10 fe eb 09 48 8b 47 10 a8 01 0f 44
> [  196.921788] RIP  [<ffffffff811ef01e>] bio_endio+0x1e/0x6a
> [  196.921788]  RSP <ffff88041fc83da8>
> [  196.921788] ---[ end trace 0ec0fc28f7931a30 ]---
> [  196.921788] Kernel panic - not syncing: Fatal exception in interrupt
> [  196.921788] Rebooting in 10 seconds..
>
> Thanks,
> Fengguang
>
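
The following is a minimal userspace model of the counting rule described in the commit message above, assuming C11 <stdatomic.h>. All names (struct model_bio, model_bio_endio(), model_bio_endio_nodec(), model_bio_chain()) are hypothetical stand-ins for struct bio, bio_endio(), bio_endio_nodec() and bio chaining. None of this is the kernel code. The model rests on four assumptions:

- A bio starts with bi_remaining at 1.
- bio_endio() drops one count and runs bi_end_io only when the count reaches zero.
- bio_endio() BUGs if the count is already zero. The bytes around the faulting instruction in the oops (a load, a test, a jg, then ud2, i.e. 0f 0b) look consistent with such a check.
- Chaining a child to a parent adds one count to the parent.

```c
/* Userspace model only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bio;
typedef void (*model_end_io_t)(struct model_bio *bio, int err);

struct model_bio {
	atomic_int remaining;   /* stands in for bio->bi_remaining */
	model_end_io_t end_io;  /* stands in for bio->bi_end_io */
	void *private;          /* stands in for bio->bi_private */
	int end_io_calls;       /* instrumentation for the checks */
};

static bool model_bug_hit;  /* set where the kernel would BUG() */

static void model_bio_init(struct model_bio *bio, model_end_io_t end_io)
{
	atomic_init(&bio->remaining, 1);  /* assumed initial count */
	bio->end_io = end_io;
	bio->private = NULL;
	bio->end_io_calls = 0;
}

/* Drop one count; run end_io only when the last count goes away. */
static void model_bio_endio(struct model_bio *bio, int err)
{
	if (atomic_load(&bio->remaining) <= 0) {
		model_bug_hit = true;  /* assumed check: BUG_ON(count <= 0) */
		return;
	}
	if (atomic_fetch_sub(&bio->remaining, 1) != 1)
		return;                /* chained completions still pending */
	if (bio->end_io)
		bio->end_io(bio, err);
}

/* Re-take the count that model_bio_endio() is about to drop. */
static void model_bio_endio_nodec(struct model_bio *bio, int err)
{
	atomic_fetch_add(&bio->remaining, 1);
	model_bio_endio(bio, err);
}

static void count_end_io(struct model_bio *bio, int err)
{
	(void)err;
	bio->end_io_calls++;
}

/* A chained child forwards its completion to the parent it points at. */
static void chain_end_io(struct model_bio *child, int err)
{
	model_bio_endio(child->private, err);
}

static void model_bio_chain(struct model_bio *child, struct model_bio *parent)
{
	child->private = parent;
	child->end_io = chain_end_io;
	atomic_fetch_add(&parent->remaining, 1);  /* parent waits for child */
}
```

In this model, a second plain completion of the same bio hits the check. The nodec variant re-takes a count first, which is the escape hatch the commit provides for code that saves and restores bi_end_io.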

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Muthu Kumar @ 2014-01-05 16:28 UTC
  To: Fengguang Wu
  Cc: Kent Overstreet, Jens Axboe, linux-btrfs, linux-fsdevel, LKML,
	lkp

Fengguang,
Instead of rebooting, can you trigger a crash dump when this happens
and send us the backtrace (to start with)?

Kent,
Did you do any btrfs tests with your changes?

Regards,
Muthu

On Sun, Jan 5, 2014 at 1:46 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> Hi Muthu,
>
> On Fri, Jan 03, 2014 at 11:51:31AM -0800, Muthu Kumar wrote:
>> Looks like Kent missed the btrfs endio in the original commit. How
>> about this patch:
>>
>> ---------
>>
>> In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
>> bi_remaining is accounted for correctly.
>>
>> Reported-by: fengguang.wu@intel.com
>> Cc: Kent Overstreet <kmo@daterainc.com>
>> CC: Jens Axboe <axboe@kernel.dk>
>> Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
>> --------
>>
>>  fs/btrfs/volumes.c |    6 +++++-
>>  1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index f2130de..edfed52 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
>>                 }
>>                 kfree(bbio);
>>
>> -               bio_endio(bio, err);
>> +                /*
>> +                 * Call endio_nodec on the restored bio so the bi_remaining is
>> +                 * accounted for correctly
>> +                 */
>> +               bio_endio_nodec(bio, err);
>>         } else if (!is_orig_bio) {
>>                 bio_put(bio);
>>         }
>
> Interestingly, the BUG message disappeared, but the test run now hangs.
> In the end, the test watchdog reboots the machine with SysRq:
>
>         2014-01-04 23:13:02 mount -t btrfs /dev/vda /fs/vda
>         [   20.184264] btrfs: device fsid f0e06999-0518-47e0-a622-21b8749438be devid 1 transid 4 /dev/vda
>         [   20.186552] btrfs: disk space caching is enabled
>         [  131.360457] random: nonblocking pool is initialized
> ==>     [ 1465.069342] SysRq : Emergency Sync
> ==>     [ 1475.071055] SysRq : Resetting
>
> Attached is the full dmesg for a good run (v3.13-rc7) and a bad run
> (this patch).
>
> Thanks,
> Fengguang

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Kent Overstreet @ 2014-01-06 22:10 UTC
  To: Muthu Kumar, Chris Mason
  Cc: fengguang.wu, Jens Axboe, linux-btrfs, linux-fsdevel, LKML, lkp

Chris, the patch below seems to be incorrect: with it we get hangs, so
bi_remaining (probably) isn't getting decremented when it should be. You sent
Jens fixes for btrfs which I somehow lost when I rebased; do you remember how
this is supposed to work? Looking at the code, I'm not quite sure what's going
on here.

On Fri, Jan 03, 2014 at 11:51:31AM -0800, Muthu Kumar wrote:
> Looks like Kent missed the btrfs endio in the original commit. How
> about this patch:
> 
> ---------
> 
> In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
> bi_remaining is accounted for correctly.
> 
> Reported-by: fengguang.wu@intel.com
> Cc: Kent Overstreet <kmo@daterainc.com>
> CC: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
> --------
> 
>  fs/btrfs/volumes.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f2130de..edfed52 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
>                 }
>                 kfree(bbio);
> 
> -               bio_endio(bio, err);
> +                /*
> +                 * Call endio_nodec on the restored bio so the bi_remaining is
> +                 * accounted for correctly
> +                 */
> +               bio_endio_nodec(bio, err);
>         } else if (!is_orig_bio) {
>                 bio_put(bio);
>         }
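
Here is one possible reading of the hang, offered as a userspace model rather than a diagnosis. It uses C11 atomics, every name is hypothetical, and btrfs_end_bio() is reduced to its stripe bookkeeping. The model assumes two things this hunk does not show. First, the code above it still has the else branch that does atomic_inc() for the original bio. Second, bio_endio_nodec() amounts to an increment followed by bio_endio(). If both hold, the count is bumped twice when the original bio is the last stripe to finish, so it never drains and the caller's end_io never runs.

```c
/* Userspace model only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bio;
typedef void (*model_end_io_t)(struct model_bio *bio, int err);

struct model_bio {
	atomic_int remaining;   /* stands in for bi_remaining */
	model_end_io_t end_io;  /* stands in for bi_end_io */
	void *private;          /* stands in for bi_private */
};

/* Hypothetical stand-in for struct btrfs_bio. */
struct model_bbio {
	atomic_int stripes_pending;
	struct model_bio *orig_bio;
	model_end_io_t saved_end_io;  /* caller's end_io */
};

static bool model_bug_hit;  /* set where the kernel would BUG() */
static int caller_end_io_calls;

static void model_bio_endio(struct model_bio *bio, int err)
{
	if (atomic_load(&bio->remaining) <= 0) {
		model_bug_hit = true;
		return;
	}
	if (atomic_fetch_sub(&bio->remaining, 1) == 1 && bio->end_io)
		bio->end_io(bio, err);
}

/* Assumed behaviour of bio_endio_nodec(): increment, then bio_endio(). */
static void model_bio_endio_nodec(struct model_bio *bio, int err)
{
	atomic_fetch_add(&bio->remaining, 1);
	model_bio_endio(bio, err);
}

static void caller_end_io(struct model_bio *bio, int err)
{
	(void)bio;
	(void)err;
	caller_end_io_calls++;
}

/* btrfs_end_bio() with the first proposed patch applied, simplified. */
static void model_btrfs_end_bio_patch1(struct model_bio *bio, int err)
{
	struct model_bbio *bbio = bio->private;
	bool is_orig_bio = (bio == bbio->orig_bio);

	if (atomic_fetch_sub(&bbio->stripes_pending, 1) != 1)
		return;                       /* other stripes still in flight */

	if (!is_orig_bio)
		bio = bbio->orig_bio;         /* kernel also bio_put()s the clone */
	else
		atomic_fetch_add(&bio->remaining, 1);  /* assumed else branch */

	bio->end_io = bbio->saved_end_io;
	model_bio_endio_nodec(bio, err);  /* the patched call */
}

/* Complete the original bio and one clone; return the final count. */
static int run_patch1(bool orig_completes_last)
{
	struct model_bio orig, clone;
	struct model_bbio bbio;

	model_bug_hit = false;
	caller_end_io_calls = 0;
	atomic_init(&orig.remaining, 1);
	atomic_init(&clone.remaining, 1);
	atomic_init(&bbio.stripes_pending, 2);
	bbio.orig_bio = &orig;
	bbio.saved_end_io = caller_end_io;
	orig.end_io = clone.end_io = model_btrfs_end_bio_patch1;
	orig.private = clone.private = &bbio;

	model_bio_endio(orig_completes_last ? &clone : &orig, 0);
	model_bio_endio(orig_completes_last ? &orig : &clone, 0);
	return atomic_load(&orig.remaining);
}
```

In this model, the patch does remove the oops when a clone finishes last. That fits Fengguang's report that the BUG disappeared while the test run hung.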

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Muthu Kumar @ 2014-01-07  0:47 UTC
  To: Kent Overstreet
  Cc: Chris Mason, Fengguang Wu, Jens Axboe, linux-btrfs, linux-fsdevel,
	LKML, lkp

OK, after a bit more staring I believe the correct fix is the following.

Fengguang, please try this one.

Regards,
Muthu

------------
In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
we restore the orig_bio but fail to increment bi_remaining for
orig_bio, which triggers a BUG_ON later when bio_endio is called. The fix
is to increment bi_remaining when we restore the orig bio as well.

Reported-by: fengguang.wu@intel.com
CC: Kent Overstreet <kmo@daterainc.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Chris Mason <clm@fv
Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
----------------
 fs/btrfs/volumes.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 37972d5..2011cc0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5297,9 +5297,9 @@ static void btrfs_end_bio(struct bio *bio, int err)
                if (!is_orig_bio) {
                        bio_put(bio);
                        bio = bbio->orig_bio;
-               } else {
-                       atomic_inc(&bio->bi_remaining);
                }
+               atomic_inc(&bio->bi_remaining);
+
                bio->bi_private = bbio->private;
                bio->bi_end_io = bbio->end_io;
                btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;

--------------------------
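
The reasoning in this patch can be checked with a small userspace model. It uses C11 atomics, every name is hypothetical, and btrfs_end_bio() is cut down to its stripe bookkeeping. The model assumes the original bio is submitted as one stripe and clones as the others. It also assumes each stripe's own completion has already dropped that bio's count to zero by the time btrfs_end_bio() runs. Under those assumptions, the pre-fix code re-takes a count only when the original bio happens to finish last. In the other ordering it reaches bio_endio() with a zero count. The fix re-takes the count in both cases.

```c
/* Userspace model only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bio;
typedef void (*model_end_io_t)(struct model_bio *bio, int err);

struct model_bio {
	atomic_int remaining;   /* stands in for bi_remaining */
	model_end_io_t end_io;  /* stands in for bi_end_io */
	void *private;          /* stands in for bi_private */
};

/* Hypothetical stand-in for struct btrfs_bio. */
struct model_bbio {
	atomic_int stripes_pending;
	struct model_bio *orig_bio;
	model_end_io_t saved_end_io;  /* caller's end_io, like bbio->end_io */
	void *saved_private;          /* caller's private, like bbio->private */
};

static bool model_bug_hit;  /* set where the kernel would BUG() */
static bool apply_fix;      /* false: code before the fix */
static int caller_end_io_calls;

static void model_bio_endio(struct model_bio *bio, int err)
{
	if (atomic_load(&bio->remaining) <= 0) {
		model_bug_hit = true;
		return;
	}
	if (atomic_fetch_sub(&bio->remaining, 1) == 1 && bio->end_io)
		bio->end_io(bio, err);
}

static void caller_end_io(struct model_bio *bio, int err)
{
	(void)bio;
	(void)err;
	caller_end_io_calls++;
}

/* Stripe bookkeeping of btrfs_end_bio(), heavily simplified. */
static void model_btrfs_end_bio(struct model_bio *bio, int err)
{
	struct model_bbio *bbio = bio->private;
	bool is_orig_bio = (bio == bbio->orig_bio);

	if (atomic_fetch_sub(&bbio->stripes_pending, 1) != 1)
		return;                 /* kernel: bio_put() a clone here */

	if (!is_orig_bio)
		bio = bbio->orig_bio;   /* kernel: bio_put() the clone first */

	/*
	 * Before the fix only the is_orig_bio case re-took a count. The
	 * original bio's count was already consumed when its own stripe
	 * completed, so the fix re-takes it in both cases.
	 */
	if (apply_fix || is_orig_bio)
		atomic_fetch_add(&bio->remaining, 1);

	bio->private = bbio->saved_private;
	bio->end_io = bbio->saved_end_io;
	model_bio_endio(bio, err);
}

/* Submit the original bio plus one clone and complete them in order. */
static void run_two_stripes(bool fix, bool orig_completes_last)
{
	struct model_bio orig, clone;
	struct model_bbio bbio;

	apply_fix = fix;
	model_bug_hit = false;
	caller_end_io_calls = 0;
	atomic_init(&orig.remaining, 1);
	atomic_init(&clone.remaining, 1);
	atomic_init(&bbio.stripes_pending, 2);
	bbio.orig_bio = &orig;
	bbio.saved_end_io = caller_end_io;
	bbio.saved_private = NULL;
	orig.end_io = clone.end_io = model_btrfs_end_bio;
	orig.private = clone.private = &bbio;

	model_bio_endio(orig_completes_last ? &clone : &orig, 0);
	model_bio_endio(orig_completes_last ? &orig : &clone, 0);
}
```

Exercising both completion orders matters here. In the model, the pre-fix code fails only when a clone, not the original bio, is the last stripe to finish.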



* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Kent Overstreet @ 2014-01-07  2:52 UTC
  To: Muthu Kumar
  Cc: Chris Mason, Fengguang Wu, Jens Axboe, linux-btrfs, linux-fsdevel,
	LKML, lkp

On Mon, Jan 06, 2014 at 04:47:38PM -0800, Muthu Kumar wrote:
> OK, after a bit more staring I believe the correct fix is the following.

This code still confuses me, but I think you're correct; the fix certainly
matches the evidence we have.

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Fengguang Wu @ 2014-01-07  5:53 UTC
  To: Muthu Kumar
  Cc: Kent Overstreet, Chris Mason, Jens Axboe, linux-btrfs,
	linux-fsdevel, LKML, lkp

On Mon, Jan 06, 2014 at 04:47:38PM -0800, Muthu Kumar wrote:
> OK, after a bit more staring I believe the correct fix is the following.
> 
> Fengguang, Please try this one?

Yes, it runs fine now!

Tested-by: Fengguang Wu <fengguang.wu@intel.com>

Thanks,
Fengguang

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Muthu Kumar @ 2014-01-07 20:15 UTC
  To: Fengguang Wu
  Cc: Kent Overstreet, Chris Mason, Jens Axboe, linux-btrfs,
	linux-fsdevel, LKML, lkp

Thanks, Fengguang. Here is the final patch, with an added comment. BTW, Fengguang
mentioned that git-am had trouble with the inline patch and "quilt
import" worked fine for him...

------------
In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
we restore the orig_bio but fail to increment bi_remaining for
orig_bio, which triggers a BUG_ON later when bio_endio is called. The fix
is to increment bi_remaining when we restore the orig bio as well.

Reported-and-Tested-by: Fengguang Wu <fengguang.wu@intel.com>
CC: Kent Overstreet <kmo@daterainc.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Chris Mason <clm@fb.com>
Signed-off-by: Muthukumar Ratty <muthur@gmail.com>

-----------
 fs/btrfs/volumes.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 37972d5..34aba2b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5297,9 +5297,13 @@ static void btrfs_end_bio(struct bio *bio, int err)
                if (!is_orig_bio) {
                        bio_put(bio);
                        bio = bbio->orig_bio;
-               } else {
-                       atomic_inc(&bio->bi_remaining);
                }
+                /*
+                 * We have original bio now. So increment bi_remaining to
+                 * account for it in endio
+                 */
+               atomic_inc(&bio->bi_remaining);
+
                bio->bi_private = bbio->private;
                bio->bi_end_io = bbio->end_io;
                btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;

-------------------------------------

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Chris Mason @ 2014-01-07 20:29 UTC
  To: muthu.lkml@gmail.com
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Tue, 2014-01-07 at 12:15 -0800, Muthu Kumar wrote:
> Thanks Fengguang. Final patch with added comment. BTW, fengguang
> mentioned that git-am has trouble with the inline patch and "quilt
> import" worked fine for him...
> 
> ------------
> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
> we restore the orig_bio but failed to increment bi_remaining for
> orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
> is to increment bi_remaining when we restore the orig bio as well.
> 

Hi everyone,

Which git tree is this against? Just Jens' tree, or some extra code too?

I'll run some tests here.  My original patch is below (it's slightly
different from Muthu's).

Btrfs is sometimes calling bio_endio twice on the same bio while
we chain things.  This makes sure we don't trip over new assertions in
fs/bio.c.

Signed-off-by: Chris Mason <clm@fb.com>

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7fcac70..5b30360 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2289,6 +2289,10 @@ static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status)
 		block = next_block;
 	} while (NULL != block);
 
+	/*
+	 * since we're not using bio_endio here, we don't need to worry about
+	 * the remaining count
+	 */
 	bp->bi_end_io(bp, bio_error_status);
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62176ad..786ddac 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1684,7 +1684,7 @@ static void end_workqueue_fn(struct btrfs_work *work)
 	bio->bi_private = end_io_wq->private;
 	bio->bi_end_io = end_io_wq->end_io;
 	kfree(end_io_wq);
-	bio_endio(bio, error);
+	bio_endio_nodec(bio, error);
 }
 
 static int cleaner_kthread(void *arg)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ef48947..a31448f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5284,9 +5284,17 @@ static void btrfs_end_bio(struct bio *bio, int err)
 		}
 	}
 
-	if (bio == bbio->orig_bio)
+	if (bio == bbio->orig_bio) {
 		is_orig_bio = 1;
 
+		/*
+		 * eventually we will call the bi_endio for the original bio,
+		 * make sure that we've properly bumped bi_remaining to reflect
+		 * our chain of endios here
+		 */
+		atomic_inc(&bio->bi_remaining);
+	}
+
 	if (atomic_dec_and_test(&bbio->stripes_pending)) {
 		if (!is_orig_bio) {
 			bio_put(bio);

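The disk-io.c hunk above follows the save/restore pattern that the bio chaining commit warns about. The sketch below is a userspace model: it uses C11 atomics, all names are hypothetical, and a direct call replaces the btrfs workqueue. It assumes the completion path first runs bio_endio() with a swapped-in end_io that only defers the work. A worker then restores the caller's end_io and completes the bio again. Because the first call already consumed the count, the second completion has to go through the nodec variant.

```c
/* Userspace model only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bio;
typedef void (*model_end_io_t)(struct model_bio *bio, int err);

struct model_bio {
	atomic_int remaining;   /* stands in for bi_remaining */
	model_end_io_t end_io;  /* stands in for bi_end_io */
	void *private;          /* stands in for bi_private */
};

/* Hypothetical stand-in for btrfs's end_io_wq bookkeeping. */
struct model_end_io_wq {
	model_end_io_t saved_end_io;
	void *saved_private;
	int error;
	bool queued;
};

static bool model_bug_hit;  /* set where the kernel would BUG() */
static int caller_end_io_calls;

static void model_bio_endio(struct model_bio *bio, int err)
{
	if (atomic_load(&bio->remaining) <= 0) {
		model_bug_hit = true;
		return;
	}
	if (atomic_fetch_sub(&bio->remaining, 1) == 1 && bio->end_io)
		bio->end_io(bio, err);
}

/* Assumed behaviour of bio_endio_nodec(): increment, then bio_endio(). */
static void model_bio_endio_nodec(struct model_bio *bio, int err)
{
	atomic_fetch_add(&bio->remaining, 1);
	model_bio_endio(bio, err);
}

static void caller_end_io(struct model_bio *bio, int err)
{
	(void)bio;
	(void)err;
	caller_end_io_calls++;
}

/* First completion: the swapped-in hook only records work for later. */
static void model_end_workqueue_bio(struct model_bio *bio, int err)
{
	struct model_end_io_wq *wq = bio->private;

	wq->error = err;
	wq->queued = true;  /* the kernel would queue a work item here */
}

/* Worker: restore the caller's hook, then complete the bio again. */
static void model_end_workqueue_fn(struct model_bio *bio, bool use_nodec)
{
	struct model_end_io_wq *wq = bio->private;
	int err = wq->error;

	bio->private = wq->saved_private;
	bio->end_io = wq->saved_end_io;
	if (use_nodec)
		model_bio_endio_nodec(bio, err);
	else
		model_bio_endio(bio, err);
}

static void run_deferred_completion(bool use_nodec)
{
	struct model_bio bio;
	struct model_end_io_wq wq = { caller_end_io, NULL, 0, false };

	model_bug_hit = false;
	caller_end_io_calls = 0;
	atomic_init(&bio.remaining, 1);
	bio.end_io = model_end_workqueue_bio;  /* hook swapped in at submit */
	bio.private = &wq;

	model_bio_endio(&bio, 0);              /* driver completion */
	if (wq.queued)
		model_end_workqueue_fn(&bio, use_nodec);
}
```

With the plain call, the model reaches the zero-count check on the second completion. With the nodec call, the caller's end_io runs exactly once.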

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Muthu Kumar @ 2014-01-07 21:23 UTC
  To: Chris Mason
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

Chris,
This is based on Jens' block tree, for-3.14/core branch...

Regards,
Muthu

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
From: Chris Mason @ 2014-01-08 19:41 UTC
  To: muthu.lkml@gmail.com
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Tue, 2014-01-07 at 13:23 -0800, Muthu Kumar wrote:
> Chris,
> This is based off of Jens block tree, for-3.14/core branch...
> 

Ok, Kent did pull in one of my hunks; one was a comment, and the third
was effectively the same as your patch.  I tried to test the end result
today, but I get these on boot with ext4:

[    8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
[    8.336062] bio_endio: bio for (unknown) without endio
[    8.336063] Modules linked in: megaraid_sas(+)
[    8.336065] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-rc7-mason+ #1
[    8.336066] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 05/10/2012
[    8.336069]  00000000000006f2 ffff88087fc03c28 ffffffff815cb8c6 00000000000006f2
[    8.336071]  ffff88087fc03c78 ffff88087fc03c68 ffffffff81047497 ffff88085561a8e8
[    8.336073]  ffff8808582b6d80 00000000000000fe 00000000fffffffb ffff8808582b6d80
[    8.336073] Call Trace:
[    8.336078]  <IRQ>  [<ffffffff815cb8c6>] dump_stack+0x49/0x5b
[    8.336082]  [<ffffffff81047497>] warn_slowpath_common+0x87/0xb0
[    8.336084]  [<ffffffff81047561>] warn_slowpath_fmt+0x41/0x50
[    8.336086]  [<ffffffff813aa6b8>] ? scsi_request_fn+0xc8/0x6a0
[    8.336087]  [<ffffffff8119bc8e>] bio_endio+0xbe/0x100
[    8.336091]  [<ffffffff8128c1d3>] blk_update_request+0x243/0x3a0
[    8.336092]  [<ffffffff8128c352>] blk_update_bidi_request+0x22/0xa0
[    8.336094]  [<ffffffff8128ceca>] blk_end_bidi_request+0x2a/0x80
[    8.336096]  [<ffffffff8128cf5b>] blk_end_request+0xb/0x10
[    8.336098]  [<ffffffff813ab916>] scsi_io_completion+0xa6/0x700
[    8.336100]  [<ffffffff813a2b68>] scsi_finish_command+0xc8/0x130
[    8.336101]  [<ffffffff813ac0bf>] scsi_softirq_done+0x13f/0x160
[    8.336104]  [<ffffffff812937ad>] blk_done_softirq+0x6d/0x80
[    8.336106]  [<ffffffff8104c26b>] __do_softirq+0xdb/0x290
[    8.336108]  [<ffffffff8104c51d>] irq_exit+0xbd/0xd0
[    8.336110]  [<ffffffff81003db1>] do_IRQ+0x61/0xe0
[    8.336112]  [<ffffffff815d012a>] common_interrupt+0x6a/0x6a
[    8.336117]  <EOI>  [<ffffffff814e213a>] ? cpuidle_enter_state+0x4a/0xc0
[    8.336119]  [<ffffffff814e2136>] ? cpuidle_enter_state+0x46/0xc0
[    8.336121]  [<ffffffff814e2277>] cpuidle_idle_call+0xc7/0x160
[    8.336123]  [<ffffffff8100b2c9>] arch_cpu_idle+0x9/0x20
[    8.336126]  [<ffffffff8108fd8a>] cpu_startup_entry+0x9a/0x250
[    8.336128]  [<ffffffff815c3702>] rest_init+0x72/0x80
[    8.336131]  [<ffffffff81ac2047>] start_kernel+0x3fd/0x40a
[    8.336133]  [<ffffffff81ac1a78>] ? repair_env_string+0x5b/0x5b
[    8.336134]  [<ffffffff81ac159d>] x86_64_start_reservations+0x2a/0x2c
[    8.336136]  [<ffffffff81ac16df>] x86_64_start_kernel+0x140/0x147
[    8.336137] ---[ end trace d0966e2430ea53b4 ]---
[    8.336146] ------------[ cut here ]------------
[    8.336146] kernel BUG at fs/bio.c:523!
[    8.336148] invalid opcode: 0000 [#1] SMP
[    8.336148] Modules linked in: megaraid_sas(+)
[    8.336150] CPU: 0 PID: 2911 Comm: scsi_id Tainted: G        W    3.13.0-rc7-mason+ #1
[    8.336150] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 05/10/2012
[    8.336151] task: ffff8808556b4150 ti: ffff8808556b6000 task.ti: ffff8808556b6000
[    8.336153] RIP: 0010:[<ffffffff8119bbba>]  [<ffffffff8119bbba>] bio_put+0x8a/0xa0
[    8.336153] RSP: 0018:ffff8808556b7b68  EFLAGS: 00010246
[    8.336154] RAX: 0000000000000000 RBX: ffff8808582b6d80 RCX: 0000000000000000
[    8.336155] RDX: ffff8808582b6dec RSI: 0000000000000003 RDI: ffff8808582b6d80
[    8.336155] RBP: ffff8808556b7b78 R08: 0000000000000004 R09: 0000000000000000
[    8.336156] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[    8.336156] R13: 0000000000000000 R14: ffff8808567ebe28 R15: ffff8808582b6d80
[    8.336157] FS:  00007f16056bd700(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000
[    8.336158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.336159] CR2: ffffe8f7ffc00000 CR3: 0000000856303000 CR4: 00000000000407f0
[    8.336159] Stack:
[    8.336164]  ffff8808582b6d80 0000000000000000 ffff8808556b7ba8 ffffffff81291b37
[    8.336168]  ffff8808556b7b88 ffff8808556b7cf8 ffff88085561a8e8 ffff880855685400
[    8.336172]  ffff8808556b7c78 ffffffff8129b42d ffff8808556b7be8 ffffffff8119e09b
[    8.336172] Call Trace:
[    8.336174]  [<ffffffff81291b37>] blk_rq_unmap_user+0x47/0x60
[    8.336177]  [<ffffffff8129b42d>] sg_io+0x26d/0x370
[    8.336179]  [<ffffffff8119e09b>] ? bdget+0x11b/0x130
[    8.336183]  [<ffffffff811068c9>] ? find_get_page+0x19/0xa0
[    8.336185]  [<ffffffff8129bc79>] scsi_cmd_ioctl+0x409/0x480
[    8.336186]  [<ffffffff81106af2>] ? unlock_page+0x22/0x30
[    8.336189]  [<ffffffff81130949>] ? __do_fault+0x439/0x560
[    8.336191]  [<ffffffff8129bd3c>] scsi_cmd_blk_ioctl+0x4c/0x70
[    8.336194]  [<ffffffff81437d6f>] sd_ioctl+0xcf/0x160
[    8.336196]  [<ffffffff81298003>] __blkdev_driver_ioctl+0x23/0x30
[    8.336198]  [<ffffffff81298638>] blkdev_ioctl+0x1f8/0x790
[    8.336199]  [<ffffffff8119d717>] block_ioctl+0x37/0x40
[    8.336201]  [<ffffffff811790c7>] do_vfs_ioctl+0x87/0x4f0
[    8.336204]  [<ffffffff8126374a>] ? file_has_perm+0x8a/0xa0
[    8.336205]  [<ffffffff811795c1>] SyS_ioctl+0x91/0xa0
[    8.336207]  [<ffffffff815d77e2>] system_call_fastpath+0x16/0x1b
[    8.336218] Code: 8b 74 24 10 48 29 fb 48 89 df e8 a2 d2 f6 ff 48 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 80 00 00 00 00 48 89 df e8 38 60 fb ff eb 9a <0f> 0b 0f 1f 40 00 eb fa 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
[    8.336220] RIP  [<ffffffff8119bbba>] bio_put+0x8a/0xa0
[    8.336220]  RSP <ffff8808556b7b68>
[    8.336221] ---[ end trace d0966e2430ea53b5 ]---

Trying to track it down.

-chris

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 19:41               ` Chris Mason
@ 2014-01-08 19:54                 ` Muthu Kumar
  2014-01-08 20:16                   ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Muthu Kumar @ 2014-01-08 19:54 UTC (permalink / raw)
  To: Chris Mason
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

Chris,

[    8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
[    8.336062] bio_endio: bio for (unknown) without endio

This is my recent change to avoid memory leak in bio_endio. But I
think the problem is higher up, most likely bio_endio is called twice
on the same bio (which was freed before).

Are you running the unmodified for-3.14/core or do you have local patches?


Regards,
Muthu


* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 19:54                 ` Muthu Kumar
@ 2014-01-08 20:16                   ` Chris Mason
  2014-01-08 20:40                     ` Muthu Kumar
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2014-01-08 20:16 UTC (permalink / raw)
  To: muthu.lkml@gmail.com
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
> Chris,
> 
> [    8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
> [    8.336062] bio_endio: bio for (unknown) without endio
> 
> This is my recent change to avoid memory leak in bio_endio. But I
> think the problem is higher up, most likely bio_endio is called twice
> on the same bio (which was freed before).
> 

I think these are just two separate problems.  Let's ignore the WARN_ON
for now.

> Are you running the unmodified for-3.14/core or do you have local patches?
> 

It's for-3.14/core with my btrfs branch.  Basically rc7 instead of rc6
but no changes to the block layer.  I hadn't mounted btrfs yet.

-chris


* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 20:16                   ` Chris Mason
@ 2014-01-08 20:40                     ` Muthu Kumar
  2014-01-08 20:51                       ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Muthu Kumar @ 2014-01-08 20:40 UTC (permalink / raw)
  To: Chris Mason
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <clm@fb.com> wrote:
> On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
>> Chris,
>>
>> [    8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
>> [    8.336062] bio_endio: bio for (unknown) without endio
>>
>> This is my recent change to avoid memory leak in bio_endio. But I
>> think the problem is higher up, most likely bio_endio is called twice
>> on the same bio (which was freed before).
>>
>
> I think these are just two separate problems.  Let's ignore the WARN_ON
> for now.
>

Not really... the BUG that is triggered:

kernel BUG at fs/bio.c:523!

It is in bio_put() (added to bio_endio() as part of recent change)
which gets an already freed bio.
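
The double put described here can be sketched with a hypothetical model. struct model_bio and the endio_with_put()/endio_without_put() helpers are illustrations, not block-layer code. The model assumes, as described in this thread, that the reverted change made bio_endio() call bio_put() on a bio that has no bi_end_io. A caller such as the SG_IO path still holds its own reference and drops it later via blk_rq_unmap_user(). That later put then lands on a bio whose count is already zero, which is what a BUG_ON(!atomic_read(&bio->bi_cnt)) in bio_put() is assumed to catch. The address in RDI in the bio_put() oops (ffff8808582b6d80) also appears in the WARN's stack dump, which fits this picture. The model keeps the struct alive after the "free" so that the second put can be observed. In the kernel, that access is a use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; field names echo struct bio, nothing here is
 * block-layer code. */
struct model_bio {
	int bi_cnt;                             /* reference count */
	void (*bi_end_io)(struct model_bio *);  /* NULL for the SG_IO-style bio */
	bool freed;                             /* last reference dropped */
	bool hit_bug;                           /* would have tripped BUG_ON() */
};

/* Models bio_put(): BUG_ON(!atomic_read(&bio->bi_cnt)), last put frees. */
static void model_bio_put(struct model_bio *bio)
{
	if (bio->bi_cnt == 0) {
		bio->hit_bug = true;
		return;
	}
	if (--bio->bi_cnt == 0)
		bio->freed = true;
}

/* Completion with the reverted change: a bio without bi_end_io is put
 * by the completion path, dropping a reference it never took. */
static void endio_with_put(struct model_bio *bio)
{
	if (bio->bi_end_io != NULL)
		bio->bi_end_io(bio);
	else
		model_bio_put(bio);
}

/* Completion without that change: the owner keeps its reference and
 * releases it itself. */
static void endio_without_put(struct model_bio *bio)
{
	if (bio->bi_end_io != NULL)
		bio->bi_end_io(bio);
}
```

In both cases, only the owner's bio_put() should drop the count to zero. For an on-stack bio, the extra put in the completion path would presumably also try to free memory that never came from a bio allocator.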

>> Are you running the unmodified for-3.14/core or do you have local patches?
>>
>
> It's for-3.14/core with my btrfs branch.  Basically rc7 instead of rc6
> but no changes to the block layer.  I hadn't mounted btrfs yet.
>
> -chris
>

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 20:40                     ` Muthu Kumar
@ 2014-01-08 20:51                       ` Chris Mason
  2014-01-08 21:01                         ` Muthu Kumar
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2014-01-08 20:51 UTC (permalink / raw)
  To: muthu.lkml@gmail.com
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Wed, 2014-01-08 at 12:40 -0800, Muthu Kumar wrote:
> Not really... the BUG that is triggered:
> 
> kernel BUG at fs/bio.c:523!
> 
> It is in bio_put() (added to bio_endio() as part of recent change)
> which gets an already freed bio.
> 

Oh! I see.  Let me try with that one reverted.  Thanks!

-chris


* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 20:51                       ` Chris Mason
@ 2014-01-08 21:01                         ` Muthu Kumar
  2014-01-08 21:11                           ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Muthu Kumar @ 2014-01-08 21:01 UTC (permalink / raw)
  To: Chris Mason
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Wed, Jan 8, 2014 at 12:51 PM, Chris Mason <clm@fb.com> wrote:
> Oh! I see.  Let me try with that one reverted.  Thanks!
>
> -chris
>

But, like I said, the problem is in a different place. I have been running
"dd" on an ext4 fs for a while now but haven't hit the problem. Any idea
how to reproduce it locally? I would also suggest running just
for-3.14/core to isolate the issue.

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 21:01                         ` Muthu Kumar
@ 2014-01-08 21:11                           ` Chris Mason
  2014-01-08 21:14                             ` Kent Overstreet
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2014-01-08 21:11 UTC (permalink / raw)
  To: muthu.lkml@gmail.com
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Wed, 2014-01-08 at 13:01 -0800, Muthu Kumar wrote:
> But, like I said, the problem is in a different place. I have been running
> "dd" on an ext4 fs for a while now but haven't hit the problem. Any idea
> how to reproduce it locally? I would also suggest running just
> for-3.14/core to isolate the issue.

Just reverting that change fixes it for me.  Jens mentioned it was
broken for on-stack bios.

-chris


* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-07 20:15         ` Muthu Kumar
  2014-01-07 20:29           ` Chris Mason
@ 2014-01-08 21:13           ` Chris Mason
  2014-01-08 21:21             ` Jens Axboe
  1 sibling, 1 reply; 21+ messages in thread
From: Chris Mason @ 2014-01-08 21:13 UTC (permalink / raw)
  To: muthu.lkml@gmail.com
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Tue, 2014-01-07 at 12:15 -0800, Muthu Kumar wrote:
> Thanks Fengguang. Final patch with added comment. BTW, Fengguang
> mentioned that git-am has trouble with the inline patch and "quilt
> import" worked fine for him...
> 
> ------------
> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
> we restore the orig_bio but fail to increment bi_remaining for
> orig_bio, which triggers a BUG_ON later when bio_endio is called. The
> fix is to increment bi_remaining when we restore the orig bio as well.
> 
> Reported-and-Tested-by: Fengguang Wu <fengguang.wu@intel.com>
> CC: Kent Overstreet <kmo@daterainc.com>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: Chris Mason <clm@fb.com>
> Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
> 

Reviewed-by: Chris Mason <clm@fb.com>

Jens, please pull this one in.

> -----------
>  fs/btrfs/volumes.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 37972d5..34aba2b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5297,9 +5297,13 @@ static void btrfs_end_bio(struct bio *bio, int err)
>                 if (!is_orig_bio) {
>                         bio_put(bio);
>                         bio = bbio->orig_bio;
> -               } else {
> -                       atomic_inc(&bio->bi_remaining);
>                 }
> +                /*
> +                 * We have original bio now. So increment bi_remaining to
> +                 * account for it in endio
> +                 */
> +               atomic_inc(&bio->bi_remaining);
> +
>                 bio->bi_private = bbio->private;
>                 bio->bi_end_io = bbio->end_io;
>                 btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
> 
> -------------------------------------
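
The stripe accounting that this patch fixes can be modelled in a few lines. The sketch is hypothetical: simulate() and its arrays are not btrfs code. It assumes stripe 0 carries the original bio and the other stripes are clones, each starting with bi_remaining = 1. Every stripe completion consumes that stripe's count in bio_endio() before btrfs_end_bio() runs. When the last stripe finishes, the original bio is completed once more. The old code re-added the count only when the last stripe to finish was the original bio itself. As a result, any order in which the original finishes first ends in the BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STRIPES 8

enum outcome { COMPLETED, HIT_BUG_ON };

/*
 * Hypothetical model of btrfs_end_bio() accounting, not btrfs code.
 * order[] lists stripe indices in completion order; stripe 0 is the
 * original bio, the others are clones. always_inc selects the patched
 * behaviour (bump the original bio's bi_remaining unconditionally).
 */
static enum outcome simulate(const int *order, int nr_stripes, bool always_inc)
{
	int remaining[MAX_STRIPES];         /* bi_remaining of each stripe bio */
	int stripes_pending = nr_stripes;   /* bbio->stripes_pending */

	assert(nr_stripes > 0 && nr_stripes <= MAX_STRIPES);
	for (int i = 0; i < nr_stripes; i++)
		remaining[i] = 1;           /* each bio starts with one count */

	for (int i = 0; i < nr_stripes; i++) {
		int idx = order[i];
		bool is_orig_bio = (idx == 0);

		/* Driver-side bio_endio() on the stripe bio. */
		if (remaining[idx] <= 0)
			return HIT_BUG_ON;
		remaining[idx]--;           /* hits 0, btrfs_end_bio() runs */

		if (--stripes_pending != 0)
			continue;

		/* Last stripe: the clone (if any) is dropped and the
		 * original bio is completed again via bio_endio(). */
		if (always_inc || is_orig_bio)
			remaining[0]++;     /* atomic_inc(&bio->bi_remaining) */
		if (remaining[0] <= 0)
			return HIT_BUG_ON;
		remaining[0]--;
	}
	return COMPLETED;
}
```

For example, simulate((const int[]){0, 1}, 2, false) returns HIT_BUG_ON. With always_inc set, as in the patch, every completion order reaches COMPLETED.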



* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 21:11                           ` Chris Mason
@ 2014-01-08 21:14                             ` Kent Overstreet
  2014-01-08 21:18                               ` Muthu Kumar
  0 siblings, 1 reply; 21+ messages in thread
From: Kent Overstreet @ 2014-01-08 21:14 UTC (permalink / raw)
  To: Chris Mason
  Cc: muthu.lkml@gmail.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org

On Wed, Jan 08, 2014 at 09:11:49PM +0000, Chris Mason wrote:
> Just reverting that change fixes it for me.  Jens mentioned it was
> broken for on-stack bios.

On-stack bios? I don't recall ever coming across such a thing; who, what,
where, why?

I would expect on-stack bios to work, though. I'm really curious how it
was broken.

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 21:14                             ` Kent Overstreet
@ 2014-01-08 21:18                               ` Muthu Kumar
  2014-01-08 21:24                                 ` Kent Overstreet
  0 siblings, 1 reply; 21+ messages in thread
From: Muthu Kumar @ 2014-01-08 21:18 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Chris Mason, linux-btrfs@vger.kernel.org, fengguang.wu@intel.com,
	linux-kernel@vger.kernel.org, lkp@linux.intel.com,
	axboe@kernel.dk, linux-fsdevel@vger.kernel.org

On Wed, Jan 8, 2014 at 1:14 PM, Kent Overstreet <kmo@daterainc.com> wrote:
> On-stack bios? I don't recall ever coming across such a thing; who, what,
> where, why?
>
> I would expect on-stack bios to work, though. I'm really curious how it
> was broken.

The new change added a bio_put(), which might not work if the bio is on the stack.

I don't remember seeing an on-stack bio either; any help to jog my memory?

* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 21:13           ` Chris Mason
@ 2014-01-08 21:21             ` Jens Axboe
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2014-01-08 21:21 UTC (permalink / raw)
  To: Chris Mason, muthu.lkml@gmail.com
  Cc: kmo@daterainc.com, linux-btrfs@vger.kernel.org,
	fengguang.wu@intel.com, linux-kernel@vger.kernel.org,
	lkp@linux.intel.com, linux-fsdevel@vger.kernel.org

On 01/08/2014 02:13 PM, Chris Mason wrote:
> On Tue, 2014-01-07 at 12:15 -0800, Muthu Kumar wrote:
>> Thanks Fengguang. Final patch with added comment. BTW, Fengguang
>> mentioned that git-am has trouble with the inline patch and "quilt
>> import" worked fine for him...
>>
>> ------------
>> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
>> we restore the orig_bio but fail to increment bi_remaining for
>> orig_bio, which triggers a BUG_ON later when bio_endio is called. The
>> fix is to increment bi_remaining when we restore the orig bio as well.
>>
>> Reported-and-Tested-by: Fengguang Wu <fengguang.wu@intel.com>
>> CC: Kent Overstreet <kmo@daterainc.com>
>> CC: Jens Axboe <axboe@kernel.dk>
>> CC: Chris Mason <clm@fb.com>
>> Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
>>
> 
> Reviewed-by: Chris Mason <clm@fb.com>
> 
> Jens, please pull this one in.

Done, with the added reviewed and tested-by's.

-- 
Jens Axboe


* Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
  2014-01-08 21:18                               ` Muthu Kumar
@ 2014-01-08 21:24                                 ` Kent Overstreet
  0 siblings, 0 replies; 21+ messages in thread
From: Kent Overstreet @ 2014-01-08 21:24 UTC (permalink / raw)
  To: Muthu Kumar
  Cc: Chris Mason, linux-btrfs@vger.kernel.org, fengguang.wu@intel.com,
	linux-kernel@vger.kernel.org, lkp@linux.intel.com,
	axboe@kernel.dk, linux-fsdevel@vger.kernel.org

On Wed, Jan 08, 2014 at 01:18:46PM -0800, Muthu Kumar wrote:
> The new change added a bio_put(), which might not work if the bio is on the stack.
> 
> I don't remember seeing an on-stack bio either; any help to jog my memory?

That's code that logically belongs in bio_chain_endio(); it's just a
hack to avoid blowing the stack, since the kernel is compiled with
-fno-optimize-sibling-calls when you enable frame pointers (otherwise
gcc would optimize those tail calls into jumps and we'd have no
stack-blowing issues).
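
The tail-call point can be illustrated with two versions of the same completion walk. Both functions below are hypothetical models, not the block layer's bio_endio(). A struct model_bio with a non-NULL parent stands in for a bio whose end_io is bio_chain_endio() with the parent in bi_private. endio_recursive() is written as a tail call. gcc can turn that call into a jump when sibling-call optimization is on, but with it off each chain level costs a stack frame. endio_loop() hoists the chaining step into a loop, so stack usage stays constant however long the chain is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: parent != NULL stands in for a bio whose
 * bi_end_io is bio_chain_endio() with the parent in bi_private;
 * parent == NULL means a "real" end_io, modelled by completed. */
struct model_bio {
	int bi_remaining;
	struct model_bio *parent;
	bool completed;
};

/* Chained completion as a tail call: gcc may turn the final call into
 * a jump when sibling-call optimization is enabled; with
 * -fno-optimize-sibling-calls each chain level costs a stack frame. */
static int endio_recursive(struct model_bio *bio, int finished)
{
	assert(bio->bi_remaining > 0);          /* bio_endio()'s BUG_ON */
	if (--bio->bi_remaining != 0)
		return finished;
	finished++;
	if (bio->parent == NULL) {
		bio->completed = true;
		return finished;
	}
	return endio_recursive(bio->parent, finished);
}

/* The same walk with the chaining step hoisted into a loop, so stack
 * usage does not depend on chain length. Returns how many bios
 * reached zero. */
static int endio_loop(struct model_bio *bio)
{
	int finished = 0;

	while (bio != NULL) {
		assert(bio->bi_remaining > 0);
		if (--bio->bi_remaining != 0)
			break;
		finished++;
		if (bio->parent == NULL) {
			bio->completed = true;
			break;
		}
		bio = bio->parent;      /* the child would be bio_put() here */
	}
	return finished;
}
```

Both versions return the number of bios whose count reached zero. On a chain where a middle bio still has another child outstanding, the walk stops at that bio and the top-level end_io does not run.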

Thread overview: 21+ messages
     [not found] <20140102053101.GA29352@localhost>
2014-01-03 19:51 ` [block:for-3.14/core] kernel BUG at fs/bio.c:1748 Muthu Kumar
     [not found]   ` <20140105094639.GA7423@localhost>
2014-01-05 16:28     ` Muthu Kumar
2014-01-06 22:10   ` Kent Overstreet
2014-01-07  0:47     ` Muthu Kumar
2014-01-07  2:52       ` Kent Overstreet
2014-01-07  5:53       ` Fengguang Wu
2014-01-07 20:15         ` Muthu Kumar
2014-01-07 20:29           ` Chris Mason
2014-01-07 21:23             ` Muthu Kumar
2014-01-08 19:41               ` Chris Mason
2014-01-08 19:54                 ` Muthu Kumar
2014-01-08 20:16                   ` Chris Mason
2014-01-08 20:40                     ` Muthu Kumar
2014-01-08 20:51                       ` Chris Mason
2014-01-08 21:01                         ` Muthu Kumar
2014-01-08 21:11                           ` Chris Mason
2014-01-08 21:14                             ` Kent Overstreet
2014-01-08 21:18                               ` Muthu Kumar
2014-01-08 21:24                                 ` Kent Overstreet
2014-01-08 21:13           ` Chris Mason
2014-01-08 21:21             ` Jens Axboe
