* pagefault in generic_file_buffered_write() causing deadlock
@ 2006-11-15 15:57 Badari Pulavarty
2006-11-15 17:00 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Badari Pulavarty @ 2006-11-15 15:57 UTC
To: akpm, linux-mm; +Cc: ext4, lkml
Hi Andrew & MM experts,
We are looking at a customer situation (on a 2.6.16-based distro) where the
system becomes almost useless while running some Java & stress tests.
Root cause seems to be taking a pagefault in generic_file_buffered_write()
after calling prepare_write. I am wondering:
1) Why & how this can happen, since we made sure to fault in the user
buffer before prepare_write.
2) If this is already fixed in current mainline (I can't see how).
Any ideas on what I can do to fix it?
Thanks,
Badari
Here is the analysis & stacks:
===============================
Java thread doing mmap(), holding mmap_sem and waiting for the
transaction to be unlocked:
java D 000000000fed3ff4 7104 2447 2391 2448 2446 (NOTLB)
Call Trace:
[C00000002AC8F410] [C000000001315AC0] 0xc000000001315ac0 (unreliable)
[C00000002AC8F5E0] [C00000000000F0B4] .__switch_to+0x12c/0x150
[C00000002AC8F670] [C00000000039980C] .schedule+0xcec/0xe4c
[C00000002AC8F780] [C00000000017BC24] .start_this_handle+0x3b4/0x4ac
[C00000002AC8F8A0] [C00000000017BE08] .journal_start+0xec/0x140
[C00000002AC8F940] [C000000000171374] .ext3_journal_start_sb+0x58/0x78
[C00000002AC8F9C0] [C00000000016AB90] .ext3_dirty_inode+0x38/0xb0
[C00000002AC8FA50] [C0000000000F6820] .__mark_inode_dirty+0x60/0x1d4
[C00000002AC8FAF0] [C0000000000E9F60] .touch_atime+0xc8/0xe0
[C00000002AC8FB80] [C000000000093834] .generic_file_mmap+0x54/0x80
[C00000002AC8FC00] [C0000000000AC450] .do_mmap_pgoff+0x558/0x870
[C00000002AC8FD10] [C00000000000A9C0] .sys_mmap+0xdc/0x160
[C00000002AC8FDC0] [C000000000014258] .compat_sys_mmap2+0x14/0x28
[C00000002AC8FE30] [C00000000000871C] syscall_exit+0x0/0x40
kjournald has locked the transaction and is waiting for journal stop
(t_updates to go to zero):
kjournald D 0000000000000000 8704 2167 1 2203 2028 (L-TLB)
Call Trace:
[C00000003514F980] [C0000000005257D8] amd74xx_pci_tbl+0x8/0x200 (unreliable)
[C00000003514FB50] [C00000000000F0B4] .__switch_to+0x12c/0x150
[C00000003514FBE0] [C00000000039980C] .schedule+0xcec/0xe4c
[C00000003514FCF0] [C00000000017DA58] .journal_commit_transaction+0x190/0x1448
[C00000003514FE50] [C000000000182F44] .kjournald+0xf0/0x27c
[C00000003514FF90] [C000000000025630] .kernel_thread+0x4c/0x68
Another Java thread did journal_start() in prepare_write() and then
took a pagefault while copying. It is now waiting for mmap_sem
to finish the fault :(
java D 000000000ffd76f0 6384 2452 2391 2453 2451 (NOTLB)
Call Trace:
[C00000002ABBEE50] [C00000002ABBEEE0] 0xc00000002abbeee0 (unreliable)
[C00000002ABBF020] [C00000000000F0B4] .__switch_to+0x12c/0x150
[C00000002ABBF0B0] [C00000000039980C] .schedule+0xcec/0xe4c
[C00000002ABBF1C0] [C00000000039B688] .rwsem_down_read_failed+0x284/0x2d0
[C00000002ABBF290] [C00000000039D58C] .do_page_fault+0x2e4/0x75c
[C00000002ABBF460] [C000000000004860] .handle_page_fault+0x20/0x54
--- Exception: 301 at .__copy_tofrom_user+0x11c/0x580
LR = .generic_file_buffered_write+0x39c/0x7c8
[C00000002ABBF750] [C000000000095A94] .generic_file_buffered_write+0x2c0/0x7c8 (unreliable)
[C00000002ABBF8F0] [C0000000000962EC] .__generic_file_aio_write_nolock+0x350/0x3e0
[C00000002ABBFA20] [C000000000096908] .generic_file_aio_write+0x78/0x104
[C00000002ABBFAE0] [C0000000001649F0] .ext3_file_write+0x2c/0xd4
[C00000002ABBFB70] [C0000000000C5168] .do_sync_write+0xd4/0x130
[C00000002ABBFCF0] [C0000000000C5ED4] .vfs_write+0x128/0x20c
[C00000002ABBFD90] [C0000000000C664C] .sys_write+0x4c/0x8c
[C00000002ABBFE30] [C00000000000871C] syscall_exit+0x0/0x40
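Read together, the three stacks above describe a cycle of waits. Here is a minimal sketch of that cycle in Python. It is a model only, not kernel code: the task labels, the resource names and the find_cycle() helper are made up for illustration. It records who holds what and who waits on what, as stated in the analysis, then follows the chain until a task repeats:

```python
# Model of the three blocked tasks described in the analysis above.
# Task labels, resource names and find_cycle() are illustrative only;
# none of them are kernel identifiers or APIs.

# resource -> task currently holding it
holders = {
    "mmap_sem": "java (mmap)",
    "transaction lock": "kjournald",
    "open journal handle": "java (write)",
}

# task -> resource it is blocked on
waits_for = {
    "java (mmap)": "transaction lock",    # blocked in start_this_handle()
    "kjournald": "open journal handle",   # waits for t_updates to reach zero
    "java (write)": "mmap_sem",           # blocked in rwsem_down_read_failed()
}


def find_cycle(start, holders, waits_for):
    """Follow task -> wanted resource -> holder until a task repeats.

    Returns the list of tasks that form the cycle, or None if the
    chain ends at a task that is not waiting on a held resource.
    """
    seen = []
    task = start
    while task is not None and task not in seen:
        seen.append(task)
        resource = waits_for.get(task)
        task = holders.get(resource)
    if task is None:
        return None
    return seen[seen.index(task):]


cycle = find_cycle("java (mmap)", holders, waits_for)
print(" -> ".join(cycle + [cycle[0]]))
```

Removing any one edge (for example, if the writer never faulted while holding its handle) breaks the chain and find_cycle() returns None.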
* Re: pagefault in generic_file_buffered_write() causing deadlock
2006-11-15 15:57 pagefault in generic_file_buffered_write() causing deadlock Badari Pulavarty
@ 2006-11-15 17:00 ` Andrew Morton
2006-11-15 18:16 ` Badari Pulavarty
2006-11-15 18:20 ` Badari Pulavarty
0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2006-11-15 17:00 UTC
To: Badari Pulavarty; +Cc: linux-mm, ext4, lkml
On Wed, 15 Nov 2006 07:57:45 -0800
Badari Pulavarty <pbadari@us.ibm.com> wrote:
> We are looking at a customer situation (on 2.6.16-based distro) - where
> system becomes almost useless while running some java & stress tests.
>
> Root cause seems to be taking a pagefault in generic_file_buffered_write()
> after calling prepare_write. I am wondering
>
> 1) Why & How this can happen - since we made sure to fault the user
> buffer before prepare write.
When using writev() we only fault in the first segment of the iovec. If
the second or a successive segment isn't mapped into pagetables we're
vulnerable to the deadlock.
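For illustration, the writev() shape described above can be sketched from userspace in Python. This is an assumption-laden example rather than a reproducer: the helper name writev_with_cold_segment() and the file names are made up, and on current kernels the call should simply complete. The first segment is an ordinary resident buffer; the second is a freshly created file mapping that nothing has read yet, so the kernel most likely has to take a page fault while copying from it:

```python
import mmap
import os
import tempfile


def writev_with_cold_segment():
    """Write two iovec segments; the second is a mapping nobody has touched."""
    page = mmap.PAGESIZE
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "src")
        dst_path = os.path.join(tmp, "dst")
        with open(src_path, "wb") as f:
            f.write(b"B" * page)

        first = b"A" * page  # ordinary buffer, already resident in memory

        src_fd = os.open(src_path, os.O_RDONLY)
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            # Fresh file-backed mapping: until something reads it, its pages
            # are normally not present in this process's page tables.
            second = mmap.mmap(src_fd, page, access=mmap.ACCESS_READ)
            try:
                written = os.writev(dst_fd, [first, second])
            finally:
                second.close()
        finally:
            os.close(src_fd)
            os.close(dst_fd)

        with open(dst_path, "rb") as f:
            return written, f.read()


written, data = writev_with_cold_segment()
print(written, data[:1], data[-1:])
```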
> 2) If this is already fixed in current mainline (I can't see how).
It was fixed in 2.6.17.
You'll need 6527c2bdf1f833cc18e8f42bd97973d583e4aa83 and
81b0c8713385ce1b1b9058e916edcf9561ad76d6
* Re: pagefault in generic_file_buffered_write() causing deadlock
2006-11-15 17:00 ` Andrew Morton
@ 2006-11-15 18:16 ` Badari Pulavarty
2006-11-15 18:20 ` Badari Pulavarty
1 sibling, 0 replies; 6+ messages in thread
From: Badari Pulavarty @ 2006-11-15 18:16 UTC
To: Andrew Morton; +Cc: linux-mm, ext4, lkml
Andrew Morton wrote:
> On Wed, 15 Nov 2006 07:57:45 -0800
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
>
>> We are looking at a customer situation (on 2.6.16-based distro) - where
>> system becomes almost useless while running some java & stress tests.
>>
>> Root cause seems to be taking a pagefault in generic_file_buffered_write()
>> after calling prepare_write. I am wondering
>>
>> 1) Why & How this can happen - since we made sure to fault the user
>> buffer before prepare write.
>>
>
> When using writev() we only fault in the first segment of the iovec. If
> the second or a successive segment isn't mapped into pagetables we're
> vulnerable to the deadlock.
>
Yes. I remember this change. Thank you.
>
>> 2) If this is already fixed in current mainline (I can't see how).
>>
>
> It was fixed in 2.6.17.
>
> You'll need 6527c2bdf1f833cc18e8f42bd97973d583e4aa83 and
> 81b0c8713385ce1b1b9058e916edcf9561ad76d6
>
I will try to get this change to the customer :(
Thanks,
Badari
* Re: pagefault in generic_file_buffered_write() causing deadlock
2006-11-15 17:00 ` Andrew Morton
2006-11-15 18:16 ` Badari Pulavarty
@ 2006-11-15 18:20 ` Badari Pulavarty
2006-11-15 19:29 ` Andrew Morton
1 sibling, 1 reply; 6+ messages in thread
From: Badari Pulavarty @ 2006-11-15 18:20 UTC
To: Andrew Morton; +Cc: linux-mm, ext4, lkml
Andrew Morton wrote:
> On Wed, 15 Nov 2006 07:57:45 -0800
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
>
>> We are looking at a customer situation (on 2.6.16-based distro) - where
>> system becomes almost useless while running some java & stress tests.
>>
>> Root cause seems to be taking a pagefault in generic_file_buffered_write()
>> after calling prepare_write. I am wondering
>>
>> 1) Why & How this can happen - since we made sure to fault the user
>> buffer before prepare write.
>>
>
> When using writev() we only fault in the first segment of the iovec. If
> the second or a successive segment isn't mapped into pagetables we're
> vulnerable to the deadlock.
>
>
Hmm.. That's not it :(
It's coming from write(), not writev().
[C00000002ABBF290] [C00000000039D58C] .do_page_fault+0x2e4/0x75c
[C00000002ABBF460] [C000000000004860] .handle_page_fault+0x20/0x54
--- Exception: 301 at .__copy_tofrom_user+0x11c/0x580
LR = .generic_file_buffered_write+0x39c/0x7c8
[C00000002ABBF750] [C000000000095A94] .generic_file_buffered_write+0x2c0/0x7c8 (unreliable)
[C00000002ABBF8F0] [C0000000000962EC] .__generic_file_aio_write_nolock+0x350/0x3e0
[C00000002ABBFA20] [C000000000096908] .generic_file_aio_write+0x78/0x104
[C00000002ABBFAE0] [C0000000001649F0] .ext3_file_write+0x2c/0xd4
[C00000002ABBFB70] [C0000000000C5168] .do_sync_write+0xd4/0x130
[C00000002ABBFCF0] [C0000000000C5ED4] .vfs_write+0x128/0x20c
[C00000002ABBFD90] [C0000000000C664C] .sys_write+0x4c/0x8c
[C00000002ABBFE30] [C00000000000871C] syscall_exit+0x0/0x40
Thanks,
Badari
* Re: pagefault in generic_file_buffered_write() causing deadlock
2006-11-15 18:20 ` Badari Pulavarty
@ 2006-11-15 19:29 ` Andrew Morton
2006-11-15 20:39 ` Chris Mason
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2006-11-15 19:29 UTC
To: Badari Pulavarty; +Cc: linux-mm, ext4, lkml
On Wed, 15 Nov 2006 10:20:43 -0800
Badari Pulavarty <pbadari@us.ibm.com> wrote:
> Andrew Morton wrote:
> > On Wed, 15 Nov 2006 07:57:45 -0800
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> >
> >> We are looking at a customer situation (on 2.6.16-based distro) - where
> >> system becomes almost useless while running some java & stress tests.
> >>
> >> Root cause seems to be taking a pagefault in generic_file_buffered_write()
> >> after calling prepare_write. I am wondering
> >>
> >> 1) Why & How this can happen - since we made sure to fault the user
> >> buffer before prepare write.
> >>
> >
> > When using writev() we only fault in the first segment of the iovec. If
> > the second or a successive segment isn't mapped into pagetables we're
> > vulnerable to the deadlock.
> >
> >
> Hmm.. That's not it :(
> It's coming from write(), not writev().
>
> [C00000002ABBF290] [C00000000039D58C] .do_page_fault+0x2e4/0x75c
> [C00000002ABBF460] [C000000000004860] .handle_page_fault+0x20/0x54
> --- Exception: 301 at .__copy_tofrom_user+0x11c/0x580
> LR = .generic_file_buffered_write+0x39c/0x7c8
> [C00000002ABBF750] [C000000000095A94] .generic_file_buffered_write+0x2c0/0x7c8 (unreliable)
> [C00000002ABBF8F0] [C0000000000962EC] .__generic_file_aio_write_nolock+0x350/0x3e0
> [C00000002ABBFA20] [C000000000096908] .generic_file_aio_write+0x78/0x104
> [C00000002ABBFAE0] [C0000000001649F0] .ext3_file_write+0x2c/0xd4
> [C00000002ABBFB70] [C0000000000C5168] .do_sync_write+0xd4/0x130
> [C00000002ABBFCF0] [C0000000000C5ED4] .vfs_write+0x128/0x20c
> [C00000002ABBFD90] [C0000000000C664C] .sys_write+0x4c/0x8c
> [C00000002ABBFE30] [C00000000000871C] syscall_exit+0x0/0x40
>
Oh well. If it's a deadlock (this is not clear from your description) then
please gather backtraces of all affected tasks.
There is an ab/ba deadlock with journal_start() and lock_page(), iirc.
Chris and I had a look at that a while back and collapsed in exhaustion -
it isn't pretty.
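As a generic illustration of what an ab/ba ordering problem looks like, here is a small Python sketch that compares the order in which code paths take the same two locks. The thread does not say which code paths are involved, so the path names below are placeholders, and find_inversions() is a made-up helper, not a kernel facility:

```python
from itertools import combinations


def find_inversions(paths):
    """Report pairs of paths that take the same two locks in opposite order.

    paths maps a path name to the ordered list of locks it acquires.
    Each result is (path_a, path_b, x, y): path_a takes x before y,
    while path_b takes y before x.
    """
    inversions = []
    for (name_a, locks_a), (name_b, locks_b) in combinations(paths.items(), 2):
        # Locks common to both paths, kept in path_a's acquisition order.
        shared = [lock for lock in locks_a if lock in locks_b]
        for x, y in combinations(shared, 2):
            if locks_b.index(y) < locks_b.index(x):
                inversions.append((name_a, name_b, x, y))
    return inversions


# Placeholder paths: only the two lock names come from the thread.
paths = {
    "path A (placeholder)": ["lock_page", "journal_start"],
    "path B (placeholder)": ["journal_start", "lock_page"],
}
print(find_inversions(paths))
```

If both paths took the two locks in the same order, the function would return an empty list, which is the property a lock-ordering rule is meant to guarantee.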
* Re: pagefault in generic_file_buffered_write() causing deadlock
2006-11-15 19:29 ` Andrew Morton
@ 2006-11-15 20:39 ` Chris Mason
0 siblings, 0 replies; 6+ messages in thread
From: Chris Mason @ 2006-11-15 20:39 UTC
To: Andrew Morton; +Cc: Badari Pulavarty, linux-mm, ext4, lkml, npiggin
On Wed, Nov 15, 2006 at 11:29:57AM -0800, Andrew Morton wrote:
> Oh well. If it's a deadlock (this is not clear from your description) then
> please gather backtraces of all affected tasks.
>
> There is an ab/ba deadlock with journal_start() and lock_page(), iirc.
> Chris and I had a look at that a while back and collapsed in exhaustion -
> it isn't pretty.
This should be the page fault/journal lock inversion stuff Nick was
working on. His patchset had a pretty good description of the problems.
Badari can also dig through the novell/ltc bugzillas for vmmstress;
it should be LTC9358.
Hopefully Nick's patches will address all of this. sles9 had a partial
solution for the mmap deadlock, I think it was to dirty the inode at a
later time. For some reason, I thought this workload was passing in
later kernels...
-chris