linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* DIO panic on 2.6.21.5
@ 2007-06-28  3:01 Badari Pulavarty
  2007-06-29 22:10 ` Zach Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Badari Pulavarty @ 2007-06-28  3:01 UTC (permalink / raw)
  To: linux-fsdevel, Zach Brown

Hi Zach,

One of our perf. team ran into this while doing some runs.
I didn't see anything obvious - it looks like we converted
async IO to synchronous one. I didn't spend much time digging
around.

Is this a known issue ? Any ideas ?

Thanks,
Badari

------------[ cut here ]------------
kernel BUG at fs/direct-io.c:1113!
invalid opcode: 0000 [1] SMP
CPU 11
Modules linked in: oprofile raw capability commoncap qla2xxx ipv6 button 
battery ac loop dm_mod tg3 ext3 jbd edd fan thermal processor scsi_transport_fc 
sg aic94xx libsas firmware_class scsi_transport_sas serverworks sd_mod scsePid: 
9709, comm: db2sysc Not tainted 2.6.21.5-smp #1
RIP: 0010:[<ffffffff802a7a1a>]  [<ffffffff802a7a1a>] 
__blockdev_direct_IO+0x96c/0xa4a
RSP: 0018:ffff810c3d3efc08  EFLAGS: 00010202
RAX: 0000000000000246 RBX: ffff810041768b24 RCX: ffffffff80406918
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff810041768b24
RBP: ffff810041768800 R08: 0000000000000086 R09: ffff81003f0a2370
R10: ffff810c3f3ee9d8 R11: ffffffff802e9716 R12: 0000000000000001
R13: fffffffffffffdef R14: ffff810c5beb42b0 R15: 0000000000000000
FS:  00002b3d14ffe940(0000) GS:ffff810c444c95c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002b3e84f57000 CR3: 000000004511c000 CR4: 00000000000006e0
Process db2sysc (pid: 9709, threadinfo ffff810c3d3ee000, task ffff810041fee760)
Stack:  ffff810c42f0fbc0 000000004c1f4000 ffff811836306d48 00000011803e7f95
 0000000000000009 0000000000000000 0000000000000008 0000000000000000
 ffff810041fee760 0000000900000001 0000000000000001 ffff810024ac9948
Call Trace:
 [<ffffffff802a67c7>] blkdev_direct_IO+0x45/0x4a
 [<ffffffff802a66ec>] blkdev_get_blocks+0x0/0x96
 [<ffffffff80260108>] generic_file_direct_IO+0xbd/0xfb
 [<ffffffff802601a6>] generic_file_direct_write+0x60/0xf5
 [<ffffffff80260b70>] __generic_file_aio_write_nolock+0x2e7/0x410
 [<ffffffff8029a0ce>] aio_read_evt+0x9a/0x108
 [<ffffffff80260d8e>] generic_file_aio_write_nolock+0x34/0x80
 [<ffffffff80260d5a>] generic_file_aio_write_nolock+0x0/0x80
 [<ffffffff80299d23>] aio_rw_vect_retry+0x72/0x14a
 [<ffffffff8029a63f>] aio_run_iocb+0x9a/0x134
 [<ffffffff8029b10a>] io_submit_one+0x311/0x35e
 [<ffffffff8029b678>] sys_io_submit+0x9b/0x101
 [<ffffffff80229986>] default_wake_function+0x0/0xe
 [<ffffffff80209b8e>] system_call+0x7e/0x83


Code: 0f 0b eb fe 4d 85 e4 75 1d 48 8b 74 24 08 44 89 ea 48 89 ef
RIP  [<ffffffff802a7a1a>] __blockdev_direct_IO+0x96c/0xa4a
 RSP <ffff810c3d3efc08>




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: DIO panic on 2.6.21.5
  2007-06-28  3:01 DIO panic on 2.6.21.5 Badari Pulavarty
@ 2007-06-29 22:10 ` Zach Brown
  2007-06-30  0:03   ` Badari Pulavarty
  0 siblings, 1 reply; 9+ messages in thread
From: Zach Brown @ 2007-06-29 22:10 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-fsdevel


On Jun 27, 2007, at 8:01 PM, Badari Pulavarty wrote:

> Hi Zach,
>
> One of our perf. team ran into this while doing some runs.
> I didn't see anything obvious - it looks like we converted
> async IO to synchronous one. I didn't spend much time digging
> around.

It looks pretty bad, a *shouldn't happen* kind of case.  I'm sure it  
just means I missed some case :/.

> Is this a known issue ? Any ideas ?

I haven't seen it before, no.  Do they have a reliable recipe to  
reproduce it?

In any case, I'll dig in early next week when I'm back from OLS.  I'm  
pretty confident that we'll be able to find the case.  Please let me  
know if it's urgent and I should try and find time before then.

Thanks for reporting it.

- z

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: DIO panic on 2.6.21.5
  2007-06-29 22:10 ` Zach Brown
@ 2007-06-30  0:03   ` Badari Pulavarty
  2007-07-03 22:28     ` [PATCH] dio: remove bogus refcounting BUG_ON Zach Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Badari Pulavarty @ 2007-06-30  0:03 UTC (permalink / raw)
  To: Zach Brown; +Cc: linux-fsdevel



Zach Brown wrote:

>
> On Jun 27, 2007, at 8:01 PM, Badari Pulavarty wrote:
>
>> Hi Zach,
>>
>> One of our perf. team ran into this while doing some runs.
>> I didn't see anything obvious - it looks like we converted
>> async IO to synchronous one. I didn't spend much time digging
>> around.
>
>
> It looks pretty bad, a *shouldn't happen* kind of case.  I'm sure it  
> just means I missed some case :/.
>
>> Is this a known issue ? Any ideas ?
>
>
> I haven't seen it before, no.  Do they have a reliable recipe to  
> reproduce it?
>
> In any case, I'll dig in early next week when I'm back from OLS.  I'm  
> pretty confident that we'll be able to find the case.  Please let me  
> know if it's urgent and I should try and find time before then.
>
> Thanks for reporting it.

I asked our perf. team to reproduce and print out refcount, if this 
happens again.
We can look at it when we get back.

Thanks,
Badari



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] dio: remove bogus refcounting BUG_ON
  2007-06-30  0:03   ` Badari Pulavarty
@ 2007-07-03 22:28     ` Zach Brown
  2007-07-05  2:25       ` Badari Pulavarty
  0 siblings, 1 reply; 9+ messages in thread
From: Zach Brown @ 2007-07-03 22:28 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Linus Torvalds, Andrew Morton, linux-fsdevel, linux-kernel

Linus, Andrew, please apply the bug fix patch at the end of this reply
for .22.

> >>One of our perf. team ran into this while doing some runs.
> >>I didn't see anything obvious - it looks like we converted
> >>async IO to synchronous one. I didn't spend much time digging
> >>around.

OK, I think this BUG_ON() is just broken.  I wasn't able to find any
obvious bugs from reading the code which would cause the BUG_ON() to
fire.  If it's reproducible I'd love to hear what the recipe is.

I did notice that this BUG_ON() is evaluating dio after having dropped
it's ref :/.  So it's not completely absurd to fear that it's a race
with the dio's memory being reused, but that'd be a pretty tight race.

Let's remove this stupid BUG_ON and see if that test box still has
trouble.  It might just hit the valid BUG_ON a few lines down, but this
unsafe BUG_ON needs to go.

-------

dio: remove bogus refcounting BUG_ON

Badari Pulavarty reported a case of this BUG_ON is triggering during
testing.  It's completely bogus and should be removed.

It's trying to notice if we left references to the dio hanging around in
the sync case.  They should have been dropped as IO completed while this
path was in dio_await_completion().  This condition will also be
checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
lines lower.  So to start this BUG_ON() is redundant.

More fatally, it's dereferencing dio-> after having dropped its
reference.  It's only safe to dereference the dio after releasing the
lock if the final reference was just dropped.  Another CPU might free
the dio in bio completion and reuse the memory after this path drops the
dio lock but before the BUG_ON() is evaluated.

This patch passed aio+dio regression unit tests and aio-stress on ext3.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>

diff -r 509ce354ae1b fs/direct-io.c
--- a/fs/direct-io.c	Sun Jul 01 22:00:49 2007 +0000
+++ b/fs/direct-io.c	Tue Jul 03 14:56:41 2007 -0700
@@ -1106,7 +1106,7 @@ direct_io_worker(int rw, struct kiocb *i
 	spin_lock_irqsave(&dio->bio_lock, flags);
 	ret2 = --dio->refcount;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
-	BUG_ON(!dio->is_async && ret2 != 0);
+
 	if (ret2 == 0) {
 		ret = dio_complete(dio, offset, ret);
 		kfree(dio);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dio: remove bogus refcounting BUG_ON
  2007-07-03 22:28     ` [PATCH] dio: remove bogus refcounting BUG_ON Zach Brown
@ 2007-07-05  2:25       ` Badari Pulavarty
  2007-07-05  3:53         ` Suparna Bhattacharya
  2007-07-05 17:11         ` Zach Brown
  0 siblings, 2 replies; 9+ messages in thread
From: Badari Pulavarty @ 2007-07-05  2:25 UTC (permalink / raw)
  To: Zach Brown; +Cc: Linus Torvalds, Andrew Morton, linux-fsdevel, lkml

On Tue, 2007-07-03 at 15:28 -0700, Zach Brown wrote:
> Linus, Andrew, please apply the bug fix patch at the end of this reply
> for .22.
> 
> > >>One of our perf. team ran into this while doing some runs.
> > >>I didn't see anything obvious - it looks like we converted
> > >>async IO to synchronous one. I didn't spend much time digging
> > >>around.
> 
> OK, I think this BUG_ON() is just broken.  I wasn't able to find any
> obvious bugs from reading the code which would cause the BUG_ON() to
> fire.  If it's reproducible I'd love to hear what the recipe is.
> 
> I did notice that this BUG_ON() is evaluating dio after having dropped
> it's ref :/.  So it's not completely absurd to fear that it's a race
> with the dio's memory being reused, but that'd be a pretty tight race.
> 
> Let's remove this stupid BUG_ON and see if that test box still has
> trouble.  It might just hit the valid BUG_ON a few lines down, but this
> unsafe BUG_ON needs to go.

I went through the code multiple times, I can't find how we can trigger
the BUG_ON(). But unfortunately, our perf. team is able reproduce the
problem. Debug indicated that, the ret2 == 1 :(

Not sure how that can happen. Ideas ?

Thanks,
Badari

> 
> -------
> 
> dio: remove bogus refcounting BUG_ON
> 
> Badari Pulavarty reported a case of this BUG_ON is triggering during
> testing.  It's completely bogus and should be removed.
> 
> It's trying to notice if we left references to the dio hanging around in
> the sync case.  They should have been dropped as IO completed while this
> path was in dio_await_completion().  This condition will also be
> checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
> lines lower.  So to start this BUG_ON() is redundant.
> 
> More fatally, it's dereferencing dio-> after having dropped its
> reference.  It's only safe to dereference the dio after releasing the
> lock if the final reference was just dropped.  Another CPU might free
> the dio in bio completion and reuse the memory after this path drops the
> dio lock but before the BUG_ON() is evaluated.
> 
> This patch passed aio+dio regression unit tests and aio-stress on ext3.
> 
> Signed-off-by: Zach Brown <zach.brown@oracle.com>
> Cc: Badari Pulavarty <pbadari@us.ibm.com>
> 
> diff -r 509ce354ae1b fs/direct-io.c
> --- a/fs/direct-io.c	Sun Jul 01 22:00:49 2007 +0000
> +++ b/fs/direct-io.c	Tue Jul 03 14:56:41 2007 -0700
> @@ -1106,7 +1106,7 @@ direct_io_worker(int rw, struct kiocb *i
>  	spin_lock_irqsave(&dio->bio_lock, flags);
>  	ret2 = --dio->refcount;
>  	spin_unlock_irqrestore(&dio->bio_lock, flags);
> -	BUG_ON(!dio->is_async && ret2 != 0);
> +
>  	if (ret2 == 0) {
>  		ret = dio_complete(dio, offset, ret);
>  		kfree(dio);


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dio: remove bogus refcounting BUG_ON
  2007-07-05  2:25       ` Badari Pulavarty
@ 2007-07-05  3:53         ` Suparna Bhattacharya
  2007-07-05 17:11         ` Zach Brown
  1 sibling, 0 replies; 9+ messages in thread
From: Suparna Bhattacharya @ 2007-07-05  3:53 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Zach Brown, Linus Torvalds, Andrew Morton, linux-fsdevel, lkml

On Wed, Jul 04, 2007 at 07:25:10PM -0700, Badari Pulavarty wrote:
> On Tue, 2007-07-03 at 15:28 -0700, Zach Brown wrote:
> > Linus, Andrew, please apply the bug fix patch at the end of this reply
> > for .22.
> > 
> > > >>One of our perf. team ran into this while doing some runs.
> > > >>I didn't see anything obvious - it looks like we converted
> > > >>async IO to synchronous one. I didn't spend much time digging
> > > >>around.
> > 
> > OK, I think this BUG_ON() is just broken.  I wasn't able to find any
> > obvious bugs from reading the code which would cause the BUG_ON() to
> > fire.  If it's reproducible I'd love to hear what the recipe is.
> > 
> > I did notice that this BUG_ON() is evaluating dio after having dropped
> > it's ref :/.  So it's not completely absurd to fear that it's a race
> > with the dio's memory being reused, but that'd be a pretty tight race.
> > 
> > Let's remove this stupid BUG_ON and see if that test box still has
> > trouble.  It might just hit the valid BUG_ON a few lines down, but this
> > unsafe BUG_ON needs to go.
> 
> I went through the code multiple times, I can't find how we can trigger
> the BUG_ON(). But unfortunately, our perf. team is able reproduce the
> problem. Debug indicated that, the ret2 == 1 :(
> 
> Not sure how that can happen. Ideas ?

Does it trigger even if you avoid referencing dio in the BUG_ON(), i.e.
with something like ...


--- direct-io.c	2007-07-02 01:24:24.000000000 +0530
+++ direct-io-debug.c	2007-07-05 09:18:56.000000000 +0530
@@ -1104,9 +1104,10 @@ direct_io_worker(int rw, struct kiocb *i
 	 * decide to wake the submission path atomically.
 	 */
 	spin_lock_irqsave(&dio->bio_lock, flags);
+	is_async = dio->is_async;
 	ret2 = --dio->refcount;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
-	BUG_ON(!dio->is_async && ret2 != 0);
+	BUG_ON(!is_async && ret2 != 0);
 	if (ret2 == 0) {
 		ret = dio_complete(dio, offset, ret);
 		kfree(dio);

> 
> Thanks,
> Badari
> 
> > 
> > -------
> > 
> > dio: remove bogus refcounting BUG_ON
> > 
> > Badari Pulavarty reported a case of this BUG_ON is triggering during
> > testing.  It's completely bogus and should be removed.
> > 
> > It's trying to notice if we left references to the dio hanging around in
> > the sync case.  They should have been dropped as IO completed while this
> > path was in dio_await_completion().  This condition will also be
> > checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
> > lines lower.  So to start this BUG_ON() is redundant.
> > 
> > More fatally, it's dereferencing dio-> after having dropped its
> > reference.  It's only safe to dereference the dio after releasing the
> > lock if the final reference was just dropped.  Another CPU might free
> > the dio in bio completion and reuse the memory after this path drops the
> > dio lock but before the BUG_ON() is evaluated.
> > 
> > This patch passed aio+dio regression unit tests and aio-stress on ext3.
> > 
> > Signed-off-by: Zach Brown <zach.brown@oracle.com>
> > Cc: Badari Pulavarty <pbadari@us.ibm.com>
> > 
> > diff -r 509ce354ae1b fs/direct-io.c
> > --- a/fs/direct-io.c	Sun Jul 01 22:00:49 2007 +0000
> > +++ b/fs/direct-io.c	Tue Jul 03 14:56:41 2007 -0700
> > @@ -1106,7 +1106,7 @@ direct_io_worker(int rw, struct kiocb *i
> >  	spin_lock_irqsave(&dio->bio_lock, flags);
> >  	ret2 = --dio->refcount;
> >  	spin_unlock_irqrestore(&dio->bio_lock, flags);
> > -	BUG_ON(!dio->is_async && ret2 != 0);
> > +
> >  	if (ret2 == 0) {
> >  		ret = dio_complete(dio, offset, ret);
> >  		kfree(dio);
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dio: remove bogus refcounting BUG_ON
  2007-07-05  2:25       ` Badari Pulavarty
  2007-07-05  3:53         ` Suparna Bhattacharya
@ 2007-07-05 17:11         ` Zach Brown
  2007-07-05 17:16           ` Badari Pulavarty
  1 sibling, 1 reply; 9+ messages in thread
From: Zach Brown @ 2007-07-05 17:11 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Linus Torvalds, Andrew Morton, linux-fsdevel, lkml

> the BUG_ON(). But unfortunately, our perf. team is able reproduce the
> problem.

What are they doing to reproduce it?  How much setup does it take?

> Debug indicated that, the ret2 == 1 :(

That could be consistent with the theory that we're racing with the  
dio struct being freed and reused before it's tested in the BUG_ON()  
condition.  Suparna's suggestion to sample dio->is_async before  
releasing the refcount and using that in the BUG_ON condition is a  
good one.

- z

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dio: remove bogus refcounting BUG_ON
  2007-07-05 17:11         ` Zach Brown
@ 2007-07-05 17:16           ` Badari Pulavarty
  2007-07-05 17:21             ` Zach Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Badari Pulavarty @ 2007-07-05 17:16 UTC (permalink / raw)
  To: Zach Brown; +Cc: Linus Torvalds, Andrew Morton, linux-fsdevel, lkml

On Thu, 2007-07-05 at 10:11 -0700, Zach Brown wrote:
> > the BUG_ON(). But unfortunately, our perf. team is able reproduce the
> > problem.
> 
> What are they doing to reproduce it?  How much setup does it take?

Huge OLTP run :(

> 
> > Debug indicated that, the ret2 == 1 :(
> 
> That could be consistent with the theory that we're racing with the  
> dio struct being freed and reused before it's tested in the BUG_ON()  
> condition.  Suparna's suggestion to sample dio->is_async before  
> releasing the refcount and using that in the BUG_ON condition is a  
> good one.

I will ask them to try that.

Thanks,
Badari


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dio: remove bogus refcounting BUG_ON
  2007-07-05 17:16           ` Badari Pulavarty
@ 2007-07-05 17:21             ` Zach Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Zach Brown @ 2007-07-05 17:21 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Linus Torvalds, Andrew Morton, linux-fsdevel, lkml

>> What are they doing to reproduce it?  How much setup does it take?
>
> Huge OLTP run :(

Darn :(

> I will ask them to try that.

Great, thanks.

- z

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-07-05 17:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-28  3:01 DIO panic on 2.6.21.5 Badari Pulavarty
2007-06-29 22:10 ` Zach Brown
2007-06-30  0:03   ` Badari Pulavarty
2007-07-03 22:28     ` [PATCH] dio: remove bogus refcounting BUG_ON Zach Brown
2007-07-05  2:25       ` Badari Pulavarty
2007-07-05  3:53         ` Suparna Bhattacharya
2007-07-05 17:11         ` Zach Brown
2007-07-05 17:16           ` Badari Pulavarty
2007-07-05 17:21             ` Zach Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).