* RE: copying large files over NFS locks up machine on-testing from Thursday
@ 2005-05-21 20:40 Ian Pratt
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Pratt @ 2005-05-21 20:40 UTC (permalink / raw)
  To: Kip Macy, Chris Wright; +Cc: xen-devel

 

> If it happens on -unstable in domUs (it probably does) I can 
> grab a coredump or attach to it while it is running and see 
> where it is wedged. I don't have a port master where I'm at 
> so I can't use pdb or cdb.

It's also worth giving it a prod with 'xm sysrq' to see what's going on.
Also, knowing whether it's busy-looping or blocked can be useful.

Ian
 
> On 5/21/05, Chris Wright <chrisw@osdl.org> wrote:
> > * Kip Macy (kip.macy@gmail.com) wrote:
> > > I've locked up my dom0 a couple of times this morning copying a 3GB
> > > file from local disk to an NFS mount (neither xend nor guests running).
> > > I don't encounter this problem on the stock CentOS 4 kernel. The
> > > machine is a PowerEdge 2850 with 2 e1000 cards - the one in use is
> > > connected to a PowerConnect 2216 10/100 switch and has negotiated
> > > 100Mbit. I'll check if the stock Cambridge isn't negotiating full
> > > duplex but that shouldn't cause lockups.
> > >
> > > My mount options are:
> > > defaults,intr,rsize=32768,wsize=32768,nfsvers=3,tcp,timeo=600
> > 
> > Hmm, I've seen this on stock 2.6.11 kernels as well (no xen).  Any 
> > chance you can get useful debugging out of it?
> > 
> > thanks,
> > -chris
> >
> 
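
For reference, the NFS mount options quoted above can be unpacked as
follows. This is only an illustrative sketch in Python:
`parse_mount_options` is a hypothetical helper, not part of any NFS
tooling. The one outside fact it relies on is that nfs(5) documents
`timeo` in tenths of a second, so `timeo=600` means a 60-second wait
before a request is retried; `rsize` and `wsize` are in bytes, so 32768
is a 32 KiB transfer size.

```python
def parse_mount_options(options):
    """Split a comma-separated mount option string into a dict.

    Bare flags (e.g. "tcp") map to True; key=value pairs keep their
    value, converted to int when it is purely numeric.
    """
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            parsed[key] = True          # bare flag such as "intr" or "tcp"
        elif value.isdigit():
            parsed[key] = int(value)    # numeric option such as rsize=32768
        else:
            parsed[key] = value
    return parsed


# The option string exactly as quoted in the thread.
opts = parse_mount_options(
    "defaults,intr,rsize=32768,wsize=32768,nfsvers=3,tcp,timeo=600"
)

# timeo is expressed in tenths of a second (per nfs(5)).
timeout_seconds = opts["timeo"] / 10
rsize_kib = opts["rsize"] // 1024
```

With these values the client is using NFSv3 over TCP with 32 KiB reads
and writes, so a hang of a few seconds in dom0 is well inside the
retry timeout and would not by itself produce NFS errors.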


* RE: copying large files over NFS locks up machine on-testing from Thursday
@ 2005-05-22  2:51 Ian Pratt
  2005-05-22  3:26 ` Kip Macy
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Pratt @ 2005-05-22  2:51 UTC (permalink / raw)
  To: Kip Macy, Chris Wright; +Cc: xen-devel

 
> I just tried copying the 3GB file over NFS from within a domU 
> on -testing in hopes of getting some debug info. dom0 became 
> unresponsive for a few seconds after close to a minute. It 
> had successfully copied 2.3GB when I hit ^C and then started 
> a copy from NFS to the domU's / which itself is a loopback 
> device mounted over NFS in dom0 - shortly thereafter the 
> machine locked up, the only output being complaints from 
> megaraid about aborted SCSI commands. It seems possible that 
> this is a dom0 issue.

It sounds like the megaraid driver is unhappy. Can you reproduce this
copying the file to /dev/null?

It's worth checking the Dell site to make sure you have the latest
megaraid firmware and driver.

Ian
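
The /dev/null test suggested above removes local disk writes from the
picture: the file is still read from its source, but nothing is written
to a local disk, so the megaraid write path is not exercised. A minimal
sketch in Python, assuming a hypothetical helper `copy_to_devnull`
(a plain `cp` or `dd` to /dev/null would serve equally well):

```python
import os
import time


def copy_to_devnull(path, chunk_size=32768):
    """Read `path` in chunks and discard the data.

    Returns (bytes_read, elapsed_seconds). The default chunk size
    mirrors the rsize used in the thread, but that is only a
    convenient assumption, not a requirement.
    """
    total = 0
    start = time.monotonic()
    with open(path, "rb") as src, open(os.devnull, "wb") as sink:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:               # end of file
                break
            sink.write(chunk)
            total += len(chunk)
    return total, time.monotonic() - start
```

If the machine still locks up while running this against a file on the
NFS mount, the local disk driver is unlikely to be the root cause.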


* Re: copying large files over NFS locks up machine on-testing from Thursday
  2005-05-22  2:51 copying large files over NFS locks up machine on-testing from Thursday Ian Pratt
@ 2005-05-22  3:26 ` Kip Macy
  2005-05-22 18:24   ` Kip Macy
  0 siblings, 1 reply; 10+ messages in thread
From: Kip Macy @ 2005-05-22  3:26 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Chris Wright, xen-devel

> It sounds like the megaraid driver is unhappy. Can you reproduce this
> copying the file to /dev/null?

I think it was unhappy because its interrupts weren't being serviced.
Copying /home/kmacy/suseroot.0 to /home/kmacy/suseroot.1 (NFS -> NFS)
locks the machine up just fine. The machine will also become
unresponsive transiently when running fsck in domU on a filesystem
that is a loopback device mounted over NFS.

> It's worth checking the Dell site to make sure you have the latest
> megaraid firmware and driver.

When running native mainline 2.6.11.10, NFS transfers don't cause any
problems. To reduce the possibility of it being a benchmark-like SUE,
I'll do a clean build of the dom0 kernel from scratch.


        -Kip


* Re: copying large files over NFS locks up machine on-testing from Thursday
  2005-05-22  3:26 ` Kip Macy
@ 2005-05-22 18:24   ` Kip Macy
  2005-05-22 18:42     ` Kip Macy
  0 siblings, 1 reply; 10+ messages in thread
From: Kip Macy @ 2005-05-22 18:24 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Chris Wright, xen-devel

I just updated to 2.0.6. Large NFS transfers work fine with the
default configuration. If xend has been started (i.e. the bridge has
been configured) the machine will become unresponsive when copying a
large file over NFS.

I'll just skip migration testing for now.

        -Kip



* RE: copying large files over NFS locks up machine on-testing from Thursday
@ 2005-05-22 18:40 Ian Pratt
  2005-05-22 19:24 ` Kip Macy
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Pratt @ 2005-05-22 18:40 UTC (permalink / raw)
  To: Kip Macy; +Cc: Chris Wright, xen-devel

 
> I just updated to 2.0.6. Large NFS transfers work fine with 
> the default configuration. If xend has been started (i.e. the 
> bridge has been configured) the machine will become 
> unresponsive when copying a large file over NFS.

It would be good to see if you can reproduce this on native with a
bridge running.

Thanks,
Ian



* Re: copying large files over NFS locks up machine on-testing from Thursday
  2005-05-22 18:24   ` Kip Macy
@ 2005-05-22 18:42     ` Kip Macy
  0 siblings, 0 replies; 10+ messages in thread
From: Kip Macy @ 2005-05-22 18:42 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Chris Wright, xen-devel

Having no bridge helps with NFS, but I just tried scp on a freshly
rebooted machine and it locked up instantly. After booting into CentOS 4
SMP, the scp works fine - if this is a SUE it is a particularly inventive one.

I guess it is time to move to the server room and cross my fingers
that this is a 100Mbit issue.

         -Kip




* Re: copying large files over NFS locks up machine on-testing from Thursday
  2005-05-22 18:40 Ian Pratt
@ 2005-05-22 19:24 ` Kip Macy
  0 siblings, 0 replies; 10+ messages in thread
From: Kip Macy @ 2005-05-22 19:24 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Chris Wright, xen-devel

It turns out it isn't bridging - the NFS lockup only happens if xend
is running. If I start then stop xend and do an NFS transfer I don't
hit any problems. Any idea what xend could be doing that would be
making the system so unhappy?



  -Kip




* RE: copying large files over NFS locks up machine on-testing from Thursday
@ 2005-05-22 19:36 Ian Pratt
  2005-05-22 19:54 ` Kip Macy
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Pratt @ 2005-05-22 19:36 UTC (permalink / raw)
  To: Kip Macy; +Cc: Chris Wright, xen-devel


> It turns out it isn't bridging - the NFS lockup only happens
> if xend is running. If I start then stop xend and do an NFS
> transfer I don't hit any problems. Any idea what xend could
> be doing that would be making the system so unhappy?

It's really unlikely to be xend that's causing this. Are you sure it's
not just xend running the network script to start the bridge?

2.0 vintage xend is the cause of many troubles, but I think it's likely
to be innocent in this case :-)

Ian 


* Re: copying large files over NFS locks up machine on-testing from Thursday
  2005-05-22 19:36 Ian Pratt
@ 2005-05-22 19:54 ` Kip Macy
  2005-05-22 20:40   ` Kip Macy
  0 siblings, 1 reply; 10+ messages in thread
From: Kip Macy @ 2005-05-22 19:54 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Chris Wright, xen-devel

> It's really unlikely to be xend that's causing this. Are you sure it's
> not just xend running the network script to start the bridge?

That was my initial assumption, but running "/etc/xen/scripts/network
start" to bring up the bridge before doing transfers didn't cause any
problem. Additionally, the bridge is still up after xend is shut down.

I've just moved it into the server room where the switch is GigE so
we'll find out shortly if it is some weird interaction with the
specific rev of the network card.
 
> 2.0 vintage xend is the cause of many troubles, but I think it's likely
> to be innocent in this case :-)

That would certainly be my thinking - but at this point it is the only
common item and I'm grasping at straws.


* Re: copying large files over NFS locks up machine on-testing from Thursday
  2005-05-22 19:54 ` Kip Macy
@ 2005-05-22 20:40   ` Kip Macy
  0 siblings, 0 replies; 10+ messages in thread
From: Kip Macy @ 2005-05-22 20:40 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Chris Wright, xen-devel

On 5/22/05, Kip Macy <kip.macy@gmail.com> wrote:
> > It's really unlikely to be xend that's causing this. Are you sure it's
> > not just xend running the network script to start the bridge?
> 
> That was my initial assumption, but running "/etc/xen/scripts/network
> start" to bring up the bridge before doing transfers didn't cause any
> problem. Additionally, the bridge is still up after xend is shut down.
> 
> I've just moved it into the server room where the switch is GigE so
> we'll find out shortly if it is some weird interaction with the
> specific rev of the network card.

Never mind. The two switches in the server room are 100 Mbit. 
 
       -Kip


