* Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Dave Olson @ 2006-05-17 16:47 UTC
To: Roland Dreier; +Cc: Bryan O'Sullivan, openib-general, linux-kernel

On Mon, 15 May 2006, Roland Dreier wrote:
| This looks like a pastiche of several patches.  Why can't it be split
| up into logical pieces?
|
| > Call dma_free_coherent without ipath_mutex held.
|
| Why?  Doesn't freeing work with the mutex held?

Sure, that's the way the previous code worked.  We are seeing a bug
(with both our driver native MPI processes and mthca mvapich), where
when 8 processes "simultaneously exit", we get watchdogs and/or hangs
in the close routines.  Moving the freeing outside the mutex was an
attempt to see if we were running into some VM issues by doing lots of
page unlocking and freeing with the mutex held.  It seemed to help
somewhat, but not to solve the problem.

It also allows other processes to open and close in a somewhat more
timely fashion.

Dave Olson
olson@unixfolk.com
http://www.unixfolk.com/dave
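The change described in this message (freeing without the mutex held) follows a common pattern: detach the resources while holding the lock, then free them after dropping it. The sketch below is a user-space illustration only, using a pthread mutex and `malloc`/`free` in place of `ipath_mutex` and `dma_free_coherent`. The names `port_bufs`, `port_open`, and `port_close` are hypothetical and are not taken from the ipath driver.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-port state; not the ipath driver's real structure. */
struct port_bufs {
	void  **bufs;   /* buffers owned by this port */
	size_t  count;  /* number of entries in bufs */
};

static pthread_mutex_t port_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Allocate n buffers and publish them under the lock. Returns 0 on success. */
int port_open(struct port_bufs *port, size_t n)
{
	void **bufs = calloc(n, sizeof(*bufs));
	size_t i;

	if (!bufs)
		return -1;
	for (i = 0; i < n; i++) {
		bufs[i] = malloc(4096);
		if (!bufs[i]) {
			while (i--)
				free(bufs[i]);
			free(bufs);
			return -1;
		}
	}

	pthread_mutex_lock(&port_mutex);
	port->bufs = bufs;
	port->count = n;
	pthread_mutex_unlock(&port_mutex);
	return 0;
}

/*
 * Close path: detach the buffer list while holding the lock, then do
 * the potentially slow freeing after the lock is dropped.
 * Returns the number of buffers freed.
 */
size_t port_close(struct port_bufs *port)
{
	void **bufs;
	size_t count, i;

	pthread_mutex_lock(&port_mutex);
	bufs = port->bufs;      /* take ownership of the list */
	count = port->count;
	port->bufs = NULL;      /* the port no longer references it */
	port->count = 0;
	pthread_mutex_unlock(&port_mutex);

	/* No lock held here, so other opens and closes can proceed. */
	for (i = 0; i < count; i++)
		free(bufs[i]);
	free(bufs);

	return count;
}
```

Because the list is detached before the lock is released, a second `port_close()` on the same port finds nothing to free, and concurrent callers never see a half-freed list.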
* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Roland Dreier @ 2006-05-17 18:08 UTC
To: Dave Olson; +Cc: linux-kernel, openib-general

    Dave> We are seeing a bug (with both our driver native MPI
    Dave> processes and mthca mvapich), where when 8 processes
    Dave> "simultaneously exit", we get watchdogs and/or hangs in the
    Dave> close routines.  Moving the freeing outside the mutex was an
    Dave> attempt to see if we were running into some VM issues by
    Dave> doing lots of page unlocking and freeing with the mutex
    Dave> held.  It seemed to help somewhat, but not to solve the
    Dave> problem.

Am I understanding correctly that you see a hang or watchdog timeout
even with the mthca driver?

Is there any possibility of posting the test case to reproduce this?

It doesn't seem likely that ipath changes are going to fix a generic
bug like this...

 - R.
* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Dave Olson @ 2006-05-18 4:13 UTC
To: Roland Dreier; +Cc: linux-kernel, openib-general

On Wed, 17 May 2006, Roland Dreier wrote:

|     Dave> We are seeing a bug (with both our driver native MPI
|     Dave> processes and mthca mvapich), where when 8 processes
|     Dave> "simultaneously exit", we get watchdogs and/or hangs in the
|     Dave> close routines.  Moving the freeing outside the mutex was an
|     Dave> attempt to see if we were running into some VM issues by
|     Dave> doing lots of page unlocking and freeing with the mutex
|     Dave> held.  It seemed to help somewhat, but not to solve the
|     Dave> problem.
|
| Am I understanding correctly that you see a hang or watchdog timeout
| even with the mthca driver?

Yes.  That is, the symptoms are the same, although the cause
may be different.

| Is there any possibility of posting the test case to reproduce this?

It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
to do messaging rate), running 8 copies per dual-core 4-socket opteron,
both on InfiniPath MPI, and MVAPICH (built for gen2).

We ship the source with our upcoming release, and will probably
make it available outside our release.

We did discover one possible problem today, which is shared between
our device code and the core openib code, and that's doing some memory
freeing and accounting from a work thread (updating mm->locked_vm
and cleaning up from earlier get_user_pages); the code in our driver
was copied from the openib core code, it's not literally shared.

I have a strong suspicion that at least sometimes, it's executing
after the current->mm has gone away.  I'm looking at that more right
now.

| It doesn't seem likely that ipath changes are going to fix a generic
| bug like this...

It wasn't an attempt to fix it, so much as to work around it, while I
worked on other higher priority stuff.  As I mentioned, it also
helps a bit in allowing multiple processes to be in the open and close
code simultaneously, when you have multiple cpus, so even on that basis,
I'd probably leave it as it now is.

Dave Olson
olson@unixfolk.com
http://www.unixfolk.com/dave
* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Roland Dreier @ 2006-05-18 4:55 UTC
To: Dave Olson; +Cc: linux-kernel, openib-general

    Dave> We did discover one possible problem today, which is shared
    Dave> between our device code and the core openib code, and that's
    Dave> doing some memory freeing and accounting from a work thread
    Dave> (updating mm->locked_vm and cleaning up from earlier
    Dave> get_user_pages); the code in our driver was copied from the
    Dave> openib core code, it's not literally shared.

    Dave> I have a strong suspicion that at least sometimes, it's
    Dave> executing after the current->mm has gone away.  I'm looking
    Dave> at that more right now.

It doesn't seem likely to me.  In uverbs_mem.c,
ib_umem_release_on_close() does get_task_mm() and gives up if it can't
take a reference to the task's mm.  The mmput() doesn't happen until
ib_umem_account() runs in the work thread.

I do see obvious bugs in ipath_user_pages.c, though.  In
ipath_release_user_pages_on_close(), you have:

	mm = get_task_mm(current);
	if (!mm)
		goto bail;

	work = kmalloc(sizeof(*work), GFP_KERNEL);
	if (!work)
		goto bail_mm;

	goto bail;

	INIT_WORK(&work->work, user_pages_account, work);
	work->mm = mm;
	work->num_pages = num_pages;

bail_mm:
	mmput(mm);
bail:
	return;

So with the "goto bail" you skip the code which does something with
the work you allocate, which means that you leak not only the work
structure but also the reference to the task's mm that you took.

Even without the "goto bail" the code still wouldn't actually schedule
the work, so the work structure would be leaked, although you would do
mmput().

I'm not sure what you were trying to do here.

 - R.
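To make the control-flow problem in the quoted snippet concrete, here is a user-space model of what a leak-free version of that function could look like: on success the work item is queued and takes over the mm reference, and on allocation failure the reference is dropped immediately. This is only a sketch under stated assumptions. Every name in it (`mm_model`, `model_get_mm`, `model_put_mm`, `model_schedule`, `model_run_pending`, `release_on_close`, `fail_alloc`) is a hypothetical stand-in for the kernel objects and calls (`get_task_mm()`, `mmput()`, `kmalloc()`, `schedule_work()`); it is not the actual ipath or uverbs code.

```c
#include <stdlib.h>

/* Stand-ins for kernel objects; all names below are hypothetical. */
struct mm_model {
	int users;                /* models the mm reference count */
	unsigned long locked_vm;  /* models mm->locked_vm */
};

struct account_work {
	struct mm_model *mm;
	unsigned long num_pages;
};

/* Models the work queue with a single slot, enough for this sketch. */
static struct account_work *pending;

/* Set to nonzero to simulate an allocation failure. */
int fail_alloc;

/* Models get_task_mm(): take a reference, or return NULL if no mm. */
struct mm_model *model_get_mm(struct mm_model *mm)
{
	if (!mm)
		return NULL;
	mm->users++;
	return mm;
}

/* Models mmput(): drop one reference. */
void model_put_mm(struct mm_model *mm)
{
	mm->users--;
}

/* Models schedule_work(): hand the item to the deferred context. */
void model_schedule(struct account_work *work)
{
	pending = work;
}

/* Deferred half: fix up accounting, drop the mm reference, free the item. */
void model_run_pending(void)
{
	struct account_work *work = pending;

	if (!work)
		return;
	pending = NULL;
	work->mm->locked_vm -= work->num_pages;
	model_put_mm(work->mm);
	free(work);
}

void release_on_close(struct mm_model *task_mm, unsigned long num_pages)
{
	struct mm_model *mm;
	struct account_work *work;

	mm = model_get_mm(task_mm);
	if (!mm)
		goto bail;

	work = fail_alloc ? NULL : malloc(sizeof(*work));
	if (!work)
		goto bail_mm;

	work->mm = mm;
	work->num_pages = num_pages;
	model_schedule(work);   /* the work item now owns the mm reference */
	goto bail;

bail_mm:
	model_put_mm(mm);       /* nothing was queued, so drop it here */
bail:
	return;
}
```

The design point is ownership: exactly one of the two paths (the deferred handler or `bail_mm`) releases the reference taken at the top, and the work item is freed only by the handler that consumes it.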
* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Bryan O'Sullivan @ 2006-05-18 5:15 UTC
To: Roland Dreier; +Cc: Dave Olson, linux-kernel, openib-general

On Wed, 2006-05-17 at 21:55 -0700, Roland Dreier wrote:

> So with the "goto bail" you skip the code which does something with
> the work you allocate, which means that you leak not only the work
> structure but also the reference to the task's mm that you took.

Wow.  I have no idea where that extra "goto bail" came from.  It's not
supposed to be there.

	<b
* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Roland Dreier @ 2006-05-18 5:17 UTC
To: Bryan O'Sullivan; +Cc: Dave Olson, linux-kernel, openib-general

    Bryan> Wow.  I have no idea where that extra "goto bail" came
    Bryan> from.  It's not supposed to be there.

Even without it you still leak the work structure, because there's no
schedule_work().

Now that I look at it, in uverbs_mem.c, the mm will be leaked if the
kmalloc fails...

 - R.
* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Dave Olson @ 2006-05-18 7:04 UTC
To: Roland Dreier; +Cc: linux-kernel, openib-general

On Wed, 17 May 2006, Roland Dreier wrote:
| I do see obvious bugs in ipath_user_pages.c, though.  In
| ipath_release_user_pages_on_close(), you have:
|
| 	mm = get_task_mm(current);
| 	if (!mm)
| 		goto bail;

It turns out that since this is called from ipath_close(), mm will
always be NULL, so what we do is leak memory, and possibly leave
some locked pages.  I've been looking at this code this evening;
fixing it is clearly needed, but doesn't help the long delays,
hangs, and watchdogs, so far.

Dave Olson
olson@unixfolk.com
http://www.unixfolk.com/dave
* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
From: Dave Olson @ 2006-05-18 5:26 UTC
To: Roland Dreier; +Cc: linux-kernel, openib-general

On Wed, 17 May 2006, Dave Olson wrote:

| On Wed, 17 May 2006, Roland Dreier wrote:
|
| | Am I understanding correctly that you see a hang or watchdog timeout
| | even with the mthca driver?
|
| Yes.  That is, the symptoms are the same, although the cause
| may be different.
|
| | Is there any possibility of posting the test case to reproduce this?
|
| It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
| to do messaging rate), running 8 copies per dual-core 4-socket opteron,
| both on InfiniPath MPI, and MVAPICH (built for gen2).

Here's the typical case where the watchdog fires (with infinipath MPI),
on FC4 2.6.16 2108 (without kprobes; with kprobes things are slightly
different, but not much.  I'm running without since we were often in
the kprobes code from the exit code, but I think that's just a
red herring).  The sysrq p was some seconds prior to the watchdog.

It's almost as though something is looping far too many times during
the close cleanup.

The other 7 exiting processes are typically in
sys_exit_group -> do_exit -> __up_read -> __spin_lock_irqsave -> __up_read
(or __down_read) (from what sysrq t prints).  They are all runnable on
the other 7 processors.

The infinipath driver does mmap both memory and device pages for each of
these processes.

SysRq : Show Regs
CPU 0:
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23788, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1
RIP: 0010:[<ffffffff8013c50e>] <ffffffff8013c50e>{__do_softirq+81}
RSP: 0018:ffffffff8048d368  EFLAGS: 00000206
RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 00000000000000c0 RDI: ffff81007f1fd0c0
RBP: ffffffff80528f80 R08: 0000000000000200 R09: 0000000000000002
R10: ffffffff804a6a38 R11: 0000000000000000 R12: ffffffff80577c80
R13: 0000000000000000 R14: 000000000000000a R15: 00002aaabba6c000
FS:  00002aaaab32ffa0(0000) GS:ffffffff80511000(0000) knlGS:00000000f7fc86c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000055555565ebe8 CR3: 000000007ac6d000 CR4: 00000000000006e0

Call Trace: <IRQ>
       <ffffffff8010c076>{call_softirq+30}
       <ffffffff8010d82c>{do_softirq+44}
       <ffffffff8010b9d0>{apic_timer_interrupt+132} <EOI>
       <ffffffff80355226>{_write_unlock_irq+14}
       <ffffffff801659d9>{__set_page_dirty_nobuffers+183}
       <ffffffff8016cc80>{unmap_vmas+1042}
       <ffffffff8016fa78>{exit_mmap+124}
       <ffffffff80133f17>{mmput+37}
       <ffffffff80139783>{do_exit+584}
       <ffffffff80142aec>{__dequeue_signal+459}
       <ffffffff80139f00>{sys_exit_group+0}
       <ffffffff80143f03>{get_signal_to_deliver+1568}
       <ffffffff8010a37a>{do_signal+116}
       <ffffffff80197151>{__pollwait+0}
       <ffffffff80197e9c>{sys_select+934}
       <ffffffff8010acb7>{sysret_signal+28}
       <ffffffff8010afa3>{ptregscall_common+103}

[ perhaps 20 or 30 seconds later, NMI fires; we had already been sort of
  stuck for 60 seconds or so when I did the sysrq p above ]

NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23789, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1
RIP: 0010:[<ffffffff80214bd0>] <ffffffff80214bd0>{_raw_write_lock+161}
RSP: 0018:ffff81007c5b5c18  EFLAGS: 00000086
RAX: 000000008f02e600 RBX: ffff810037cec680 RCX: 00000000002c2671
RDX: 0000000000927190 RSI: 0000000000000001 RDI: ffff810037cec680
RBP: ffff810037cec668 R08: ffff810002d6b500 R09: 00000000fffffffa
R10: 0000000000000003 R11: ffffffff80165922 R12: ffff810037cec680
R13: 00002aaaac200000 R14: ffff810002d6b540 R15: 00002aaabba6c000
FS:  00002aaaaaae6080(0000) GS:ffff81011fc466c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000033f38bdaf0 CR3: 000000007c296000 CR4: 00000000000006e0
Process mpi_multibw (pid: 23789, threadinfo ffff81007c5b4000, task ffff8100030557a0)
Stack: ffff810002d6b540 ffffffff8016596b 0000000075ad5067 00002aaaac1b4000
       ffff81007d451da0 ffffffff8016cc80 0000000000000000 ffff81007c5b5d38
       ffffffffffffffff 0000000000000000
Call Trace:
       <ffffffff8016596b>{__set_page_dirty_nobuffers+73}
       <ffffffff8016cc80>{unmap_vmas+1042}
       <ffffffff8016fa78>{exit_mmap+124}
       <ffffffff80133f17>{mmput+37}
       <ffffffff80139783>{do_exit+584}
       <ffffffff80142aec>{__dequeue_signal+459}
       <ffffffff80139f00>{sys_exit_group+0}
       <ffffffff80143f03>{get_signal_to_deliver+1568}
       <ffffffff8010a37a>{do_signal+116}
       <ffffffff80197151>{__pollwait+0}
       <ffffffff80197e9c>{sys_select+934}
       <ffffffff8010acb7>{sysret_signal+28}
       <ffffffff8010afa3>{ptregscall_common+103}

Code: 84 c0 75 7f f0 81 03 00 00 00 01 f3 90 48 83 c1 01 48 8b 15
Kernel panic - not syncing: nmi watchdog
Thread overview: 8+ messages
2006-05-17 16:47 ` [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes Dave Olson
2006-05-17 18:08 ` [openib-general] " Roland Dreier
2006-05-18 4:13 ` Dave Olson
2006-05-18 4:55 ` Roland Dreier
2006-05-18 5:15 ` Bryan O'Sullivan
2006-05-18 5:17 ` Roland Dreier
2006-05-18 7:04 ` Dave Olson
2006-05-18 5:26 ` Dave Olson