Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
       [not found] ` <fa.yXZlqXBzNi9Gq/4Q6Wc9H6bw+lU@ifi.uio.no>
@ 2006-05-17 16:47   ` Dave Olson
  2006-05-17 18:08     ` [openib-general] " Roland Dreier
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Olson @ 2006-05-17 16:47 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Bryan O'Sullivan, openib-general, linux-kernel

On Mon, 15 May 2006, Roland Dreier wrote:

| This looks like a pastiche of several patches.  Why can't it be split
| up into logical pieces?
| 
|  > Call dma_free_coherent without ipath_mutex held.
| 
| Why?  Doesn't freeing work with the mutex held?

Sure, that's the way the previous code worked.

We are seeing a bug (with both our driver native MPI processes and mthca mvapic),
where when 8 processes using "simultaneously exit", we get watchdogs and/or hangs
in the close routines.   Moving the freeing outside the mutex was an attempt
to see if we were running into some VM issues by doing lots of page unlocking
and freeing with the mutex held.   It seemed to help somewhat, but not to solve
the problem.

It also allows other processes to open and close in a somewhat more timely
fashion.

Dave Olson
olson@unixfolk.com
http://www.unixfolk.com/dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
  2006-05-17 16:47   ` [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes Dave Olson
@ 2006-05-17 18:08     ` Roland Dreier
  2006-05-18  4:13       ` Dave Olson
  0 siblings, 1 reply; 8+ messages in thread
From: Roland Dreier @ 2006-05-17 18:08 UTC (permalink / raw)
  To: Dave Olson; +Cc: linux-kernel, openib-general

    Dave> We are seeing a bug (with both our driver native MPI
    Dave> processes and mthca mvapic), where when 8 processes using
    Dave> "simultaneously exit", we get watchdogs and/or hangs in the
    Dave> close routines.  Moving the freeing outside the mutex was an
    Dave> attempt to see if we were running into some VM issues by
    Dave> doing lots of page unlocking and freeing with the mutex
    Dave> held.  It seemed to help somewhat, but not to solve the
    Dave> problem.

Am I understanding correctly that you see a hang or watchdog timeout
even with the mthca driver?

Is there any possibility of posting the test case to reproduce this?
It doesn't seem likely that ipath changes are going to fix a generic
bug like this...

 - R.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
  2006-05-17 18:08     ` [openib-general] " Roland Dreier
@ 2006-05-18  4:13       ` Dave Olson
  2006-05-18  4:55         ` Roland Dreier
  2006-05-18  5:26         ` Dave Olson
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Olson @ 2006-05-18  4:13 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel, openib-general

On Wed, 17 May 2006, Roland Dreier wrote:

|     Dave> We are seeing a bug (with both our driver native MPI
|     Dave> processes and mthca mvapic), where when 8 processes using
|     Dave> "simultaneously exit", we get watchdogs and/or hangs in the
|     Dave> close routines.  Moving the freeing outside the mutex was an
|     Dave> attempt to see if we were running into some VM issues by
|     Dave> doing lots of page unlocking and freeing with the mutex
|     Dave> held.  It seemed to help somewhat, but not to solve the
|     Dave> problem.
| 
| Am I understanding correctly that you see a hang or watchdog timeout
| even with the mthca driver?

Yes.   That is, the symptoms are the same, although the cause
may be different.

| Is there any possibility of posting the test case to reproduce this?

It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
to do messaging rate), running 8 copies per dual-core 4-socket opteron,
both on InfiniPath MPI, and MVAPICH (built for gen2).

We ship the source with our upcoming release, and will probably make
it available outside our release.

We did discover one possible problem today, which is shared between
our device code and the core openib code, and that's doing some 
memory freeing and accounting from a work thread (updating mm->locked_vm
and cleaning up from earlier get_user_pages); the code in our driver
was copied from the openib core code, it's not literally shared.

I have a strong suspicion that at least sometimes, it's executing after
the current->mm has gone away.   I'm looking at that more right now.

| It doesn't seem likely that ipath changes are going to fix a generic
| bug like this...

It wasn't an attempt to fix it, so much as to work around it, while
I worked on other higher priority stuff.   As I mentioned, it also helps
a bit in allowing multiple processes to be in the open and close code
simultaneously, when you have multiple cpus, so even on that basis,
I'd probably leave it as it now is.

Dave Olson
olson@unixfolk.com
http://www.unixfolk.com/dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
  2006-05-18  4:13       ` Dave Olson
@ 2006-05-18  4:55         ` Roland Dreier
  2006-05-18  5:15           ` Bryan O'Sullivan
  2006-05-18  7:04           ` Dave Olson
  2006-05-18  5:26         ` Dave Olson
  1 sibling, 2 replies; 8+ messages in thread
From: Roland Dreier @ 2006-05-18  4:55 UTC (permalink / raw)
  To: Dave Olson; +Cc: linux-kernel, openib-general

    Dave> We did discover one possible problem today, which is shared
    Dave> between our device code and the core openib code, and that's
    Dave> doing some memory freeing and accounting from a work thread
    Dave> (updating mm->locked_vm and cleaning up from earlier
    Dave> get_user_pages); the code in our driver was copied from the
    Dave> openib core code, it's not literally shared.

    Dave> I have a strong suspicion that at least sometimes, it's
    Dave> executing after the current->mm has gone away.  I'm looking
    Dave> at that more right now.

It doesn't seem likely to me.  In uverbs_mem.c,
ib_umem_release_on_close() does get_task_mm() and gives up if it can't
take a reference to the task's mm.  The mmput() doesn't happen until
ib_umem_account() runs in the work thread.

I do see obvious bugs in ipath_user_pages.c, though.  In
ipath_release_user_pages_on_close(), you have:

		mm = get_task_mm(current);
		if (!mm)
			goto bail;

		work = kmalloc(sizeof(*work), GFP_KERNEL);
		if (!work)
			goto bail_mm;

		goto bail;

		INIT_WORK(&work->work, user_pages_account, work);
		work->mm = mm;
		work->num_pages = num_pages;

	bail_mm:
		mmput(mm);
	bail:
		return;

So with the "goto bail" you skip the code which does something with
the work you allocate, which means that you leak not only the work
structure but also the reference to the task's mm that you took.

Even without the "goto bail" the code still wouldn't actually schedule
the work, so the work structure would be leaked, although you would do
mmput().

I'm not sure what you were trying to do here.c

 - R.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
  2006-05-18  4:55         ` Roland Dreier
@ 2006-05-18  5:15           ` Bryan O'Sullivan
  2006-05-18  5:17             ` Roland Dreier
  2006-05-18  7:04           ` Dave Olson
  1 sibling, 1 reply; 8+ messages in thread
From: Bryan O'Sullivan @ 2006-05-18  5:15 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Dave Olson, linux-kernel, openib-general

On Wed, 2006-05-17 at 21:55 -0700, Roland Dreier wrote:

> So with the "goto bail" you skip the code which does something with
> the work you allocate, which means that you leak not only the work
> structure but also the reference to the task's mm that you took.

Wow.  I have no idea where that extra "goto bail" came from.  It's not
supposed to be there.

	<b


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
  2006-05-18  5:15           ` Bryan O'Sullivan
@ 2006-05-18  5:17             ` Roland Dreier
  0 siblings, 0 replies; 8+ messages in thread
From: Roland Dreier @ 2006-05-18  5:17 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Dave Olson, linux-kernel, openib-general

    Bryan> Wow.  I have no idea where that extra "goto bail" came
    Bryan> from.  It's not supposed to be there.

Even without it you still leak the work structure, because there's no
schedule_work().

Now that I look at it, in uverbs_mem.c, the mm will be leaked if the
kmalloc fails...

 - R.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
  2006-05-18  4:13       ` Dave Olson
  2006-05-18  4:55         ` Roland Dreier
@ 2006-05-18  5:26         ` Dave Olson
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Olson @ 2006-05-18  5:26 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel, openib-general

On Wed, 17 May 2006, Dave Olson wrote:

| On Wed, 17 May 2006, Roland Dreier wrote:
| 
| | Am I understanding correctly that you see a hang or watchdog timeout
| | even with the mthca driver?
| 
| Yes.   That is, the symptoms are the same, although the cause
| may be different.
| 
| | Is there any possibility of posting the test case to reproduce this?
| 
| It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
| to do messaging rate), running 8 copies per dual-core 4-socket opteron,
| both on InfiniPath MPI, and MVAPICH (built for gen2).

Here's the typical case where the watchdog fires (with infinipath MPI),
on FC4 2.6.16 2108 (without kprobes, with kprobes things are slightly
different, but not much; I'm running without since we were often in
the kprobes code from the exit code, but I think that's just a red-herring).

The sysrq p was some seconds prior to the watchdog.  It's almost as
though something is looping far too many times during the close cleanup.

The other 7 exitting processes are typically in
	sys_exit_group -> do_exit -> __up_red --> __spin_lock_irqsave -> __up_read (or __down_read)
(from what sysrq t prints).  They are all runnable on the other 7
processors.  

The infinipath driver does mmap both memory and device pages for each of
these processes.

SysRq : Show Regs
CPU 0:           
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23788, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1           
RIP: 0010:[<ffffffff8013c50e>] <ffffffff8013c50e>{__do_softirq+81}    
RSP: 0018:ffffffff8048d368  EFLAGS: 00000206                      
RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 00000000000000c0 RDI: ffff81007f1fd0c0
RBP: ffffffff80528f80 R08: 0000000000000200 R09: 0000000000000002
R10: ffffffff804a6a38 R11: 0000000000000000 R12: ffffffff80577c80
R13: 0000000000000000 R14: 000000000000000a R15: 00002aaabba6c000
FS:  00002aaaab32ffa0(0000) GS:ffffffff80511000(0000) knlGS:00000000f7fc86c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b                           
CR2: 000055555565ebe8 CR3: 000000007ac6d000 CR4: 00000000000006e0
                                                                 
Call Trace: <IRQ> <ffffffff8010c076>{call_softirq+30}
       <ffffffff8010d82c>{do_softirq+44} <ffffffff8010b9d0>{apic_timer_interrupt+132} <EOI>
       <ffffffff80355226>{_write_unlock_irq+14} <ffffffff801659d9>{__set_page_dirty_nobuffers+183}
       <ffffffff8016cc80>{unmap_vmas+1042} <ffffffff8016fa78>{exit_mmap+124}
       <ffffffff80133f17>{mmput+37} <ffffffff80139783>{do_exit+584}         
       <ffffffff80142aec>{__dequeue_signal+459} <ffffffff80139f00>{sys_exit_group+0}
       <ffffffff80143f03>{get_signal_to_deliver+1568} <ffffffff8010a37a>{do_signal+116}
       <ffffffff80197151>{__pollwait+0} <ffffffff80197e9c>{sys_select+934}             
       <ffffffff8010acb7>{sysret_signal+28} <ffffffff8010afa3>{ptregscall_common+103}

[ perhaps 20 or 30 seconds later, NMI fires; we had already been sort of
stuck for 60 seconds or so when I did the sysrq p above ]

NMI Watchdog detected LOCKUP on CPU 1                                                
CPU 1                                
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23789, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1           
RIP: 0010:[<ffffffff80214bd0>] <ffffffff80214bd0>{_raw_write_lock+161}
RSP: 0018:ffff81007c5b5c18  EFLAGS: 00000086                          
RAX: 000000008f02e600 RBX: ffff810037cec680 RCX: 00000000002c2671
RDX: 0000000000927190 RSI: 0000000000000001 RDI: ffff810037cec680
RBP: ffff810037cec668 R08: ffff810002d6b500 R09: 00000000fffffffa
R10: 0000000000000003 R11: ffffffff80165922 R12: ffff810037cec680
R13: 00002aaaac200000 R14: ffff810002d6b540 R15: 00002aaabba6c000
FS:  00002aaaaaae6080(0000) GS:ffff81011fc466c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b                           
CR2: 00000033f38bdaf0 CR3: 000000007c296000 CR4: 00000000000006e0
Process mpi_multibw (pid: 23789, threadinfo ffff81007c5b4000, task ffff8100030557a0)
Stack: ffff810002d6b540 ffffffff8016596b 0000000075ad5067 00002aaaac1b4000          
       ffff81007d451da0 ffffffff8016cc80 0000000000000000 ffff81007c5b5d38 
       ffffffffffffffff 0000000000000000                                   
Call Trace: <ffffffff8016596b>{__set_page_dirty_nobuffers+73}
       <ffffffff8016cc80>{unmap_vmas+1042} <ffffffff8016fa78>{exit_mmap+124}
       <ffffffff80133f17>{mmput+37} <ffffffff80139783>{do_exit+584}         
       <ffffffff80142aec>{__dequeue_signal+459} <ffffffff80139f00>{sys_exit_group+0}
       <ffffffff80143f03>{get_signal_to_deliver+1568} <ffffffff8010a37a>{do_signal+116}
       <ffffffff80197151>{__pollwait+0} <ffffffff80197e9c>{sys_select+934}             
       <ffffffff8010acb7>{sysret_signal+28} <ffffffff8010afa3>{ptregscall_common+103}
                                                                                     
Code: 84 c0 75 7f f0 81 03 00 00 00 01 f3 90 48 83 c1 01 48 8b 15 
Kernel panic - not syncing: nmi watchdog    

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
  2006-05-18  4:55         ` Roland Dreier
  2006-05-18  5:15           ` Bryan O'Sullivan
@ 2006-05-18  7:04           ` Dave Olson
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Olson @ 2006-05-18  7:04 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel, openib-general

On Wed, 17 May 2006, Roland Dreier wrote:
| I do see obvious bugs in ipath_user_pages.c, though.  In
| ipath_release_user_pages_on_close(), you have:
| 
| 		mm = get_task_mm(current);
| 		if (!mm)
| 			goto bail;

It turns out that since this is called from ipath_close(),
mm will always be NULL, so what we do is leak memory, and
possibly leave some locked pages.  I've been looking at this
code this evening; fixing it is clearly needed, but doesn't help
the long delays, hangs, and watchdogs, so far.

Dave Olson
olson@unixfolk.com
http://www.unixfolk.com/dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-05-18  7:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <fa.2ho1QSA8Kf7L8EFqp3rLsB7NE9s@ifi.uio.no>
     [not found] ` <fa.yXZlqXBzNi9Gq/4Q6Wc9H6bw+lU@ifi.uio.no>
2006-05-17 16:47   ` [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes Dave Olson
2006-05-17 18:08     ` [openib-general] " Roland Dreier
2006-05-18  4:13       ` Dave Olson
2006-05-18  4:55         ` Roland Dreier
2006-05-18  5:15           ` Bryan O'Sullivan
2006-05-18  5:17             ` Roland Dreier
2006-05-18  7:04           ` Dave Olson
2006-05-18  5:26         ` Dave Olson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.