public inbox for linux-ia64@vger.kernel.org
* [Linux-ia64] 2.4.1 network problems
@ 2001-02-13 23:43 Michael Madore
  2001-02-14 18:49 ` David Mosberger
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Michael Madore @ 2001-02-13 23:43 UTC (permalink / raw)
  To: linux-ia64

Hi,

I am experiencing a lot of problems with 2.4.1 + David's patch on Lions.  If
I try to scp a file from another machine the network connection dies almost
immediately.  If I restrict the memory available to the kernel at boot time
to 1024MB, then the problem seems to go away.  I've tried this on 4 Lions,
all with the eepro100 card.

Has anyone else seen this?  I've had problems with the eepro100 before, but
things are much worse now.

-- 
Mike Madore
Software Engineer
TurboLinux, Inc.



* Re: [Linux-ia64] 2.4.1 network problems
  2001-02-13 23:43 [Linux-ia64] 2.4.1 network problems Michael Madore
@ 2001-02-14 18:49 ` David Mosberger
  2001-02-14 20:10 ` Michael Madore
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2001-02-14 18:49 UTC (permalink / raw)
  To: linux-ia64

Can't say I have the problem you're describing.  I did have one
strange NFS related hang yesterday, after 13 days of uptime on a 2-way
Big Sur.  I did an "rpm -Uvh" while the current working directory was
on an NFS mounted filesystem and the "rpm" hung (even though the RPM
file itself was on a local filesystem).  It appeared that the
__rpc_execute() was waiting for an event that never happened and there
didn't seem to be a timeout either.  Of course, when doing the same
thing again after rebooting the system, it worked just fine.  Also,
note that it was only that one "rpm" process that got stuck (so it's
not like the kernel's timer facility was hosed altogether).

So, for now, we should probably focus on trying to find a test case
that reproduces the problem reliably (or at least with decent
frequency).

Actually, IIRC, there are some eepro100 patches in the pipe for 2.4.2,
but I haven't played with them.  Also, given that Mike says the problems
happen both with e100 and eepro100, I suspect those won't help.

	--david

>>>>> On Tue, 13 Feb 2001 15:43:37 -0800, Michael Madore <mmadore@turbolinux.com> said:

  Mike> Hi, I am experiencing a lot of problems with 2.4.1 + David's
  Mike> patch on Lions.  If I try to scp a file from another machine
  Mike> the network connection dies almost immediately.  If I restrict
  Mike> the memory available to the kernel at boot time to 1024MB,
  Mike> then the problem seems to go away.  I've tried this on 4
  Mike> Lions, all with the eepro100 card.

  Mike> Has anyone else seen this?  I've had problems with the
  Mike> eepro100 before, but things are much worse now.

  Mike> -- Mike Madore Software Engineer TurboLinux, Inc.

  Mike> _______________________________________________ Linux-IA64
  Mike> mailing list Linux-IA64@linuxia64.org
  Mike> http://lists.linuxia64.org/lists/listinfo/linux-ia64



* Re: [Linux-ia64] 2.4.1 network problems
  2001-02-13 23:43 [Linux-ia64] 2.4.1 network problems Michael Madore
  2001-02-14 18:49 ` David Mosberger
@ 2001-02-14 20:10 ` Michael Madore
  2001-02-14 20:35 ` David Mosberger
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Michael Madore @ 2001-02-14 20:10 UTC (permalink / raw)
  To: linux-ia64

On Wed, Feb 14, 2001 at 10:49:12AM -0800, David Mosberger wrote:
> Can't say I have the problem you're describing.  I did have one
> strange NFS related hang yesterday, after 13 days of uptime on a 2-way
> Big Sur.  I did an "rpm -Uvh" while the current working directory was
> on an NFS mounted filesystem and the "rpm" hung (even though the RPM
> file itself was on a local filesystem).  It appeared that the
> __rpc_execute() was waiting for an event that never happened and there
> didn't seem to be a timeout either.  Of course, when doing the same
> thing again after rebooting the system, it worked just fine.  Also,
> note that it was only that one "rpm" process that got stuck (so it's
> not like the kernel's timer facility was hosed altogether).
> 
> So, for now, we should probably focus on trying to find a test case
> that reproduces the problem reliably (or at least with decent
> frequency).
> 
> Actually, IIRC, there are some eepro100 patches in the pipe for 2.4.2,
> but I haven't played with them.  Also, given that Mike says the problems
> happen both with e100 and eepro100, I suspect those won't help.

In addition to the >1024MB problem, I am also experiencing a reproducible
nfs hang on Lions.  If I copy this file

   ftp://frontier.turbolinux.com/pub/ia64/nfshang.bin 

from an nfs server to the local file system, the copy hangs after
transferring 294912 bytes.  If I copy the file from the local filesystem to the 
server, the copy completes successfully.  The hang only occurs with certain files.

The following messages are logged:

nfs: server plateau not responding, still trying
nfs: task 294 can't get a request slot
eth0: 0 multicast blocks dropped.
eepro100: wait_for_cmd_done timeout!
eepro100: wait_for_cmd_done timeout!
eepro100: wait_for_cmd_done timeout!
nfs: task 295 can't get a request slot
eepro100: wait_for_cmd_done timeout!

Only nfs seems to be affected.  I am able to ssh into and out of the box.

Also, the hang does not happen if I switch to a 3COM 3C905B network card.

I have tried the same experiment with the Intel driver module.  Although the
problem is harder to reproduce, when nfs does hang, it is at the same
offset in the same file.
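
For anyone who wants to record the stall offset themselves, here is a minimal sketch in Python.  It is not the procedure used above; the function name copy_with_progress and the 4096-byte chunk size are assumptions chosen for illustration.  It copies a file one chunk at a time and reports the running byte count, so if a read blocks forever the last number printed shows how far the copy got.

```python
import sys

CHUNK = 4096  # assumption: illustrative chunk size, not taken from the report


def copy_with_progress(src, dst, chunk=CHUNK, report=None):
    """Copy src to dst one chunk at a time and return the bytes copied.

    `report`, if given, is called with the running byte count after each
    chunk is written, so a blocked read leaves the last good offset visible.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            data = fin.read(chunk)
            if not data:
                break
            fout.write(data)
            copied += len(data)
            if report is not None:
                report(copied)
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: first argument on the NFS mount, second on local disk.
    total = copy_with_progress(
        sys.argv[1],
        sys.argv[2],
        report=lambda n: print(n, file=sys.stderr, flush=True),
    )
    print("copied %d bytes" % total)
```

Note that 294912 bytes is exactly 72 * 4096, so with this chunk size a stall at the reported offset would show up as the last printed count being 294912.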


-- 
Mike Madore
Software Engineer
TurboLinux, Inc.



* Re: [Linux-ia64] 2.4.1 network problems
  2001-02-13 23:43 [Linux-ia64] 2.4.1 network problems Michael Madore
  2001-02-14 18:49 ` David Mosberger
  2001-02-14 20:10 ` Michael Madore
@ 2001-02-14 20:35 ` David Mosberger
  2001-02-14 20:39 ` Don Dugger
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2001-02-14 20:35 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 14 Feb 2001 12:10:24 -0800, Michael Madore <mmadore@turbolinux.com> said:

  Mike> In addition to the >1024MB problem, I am also experiencing a
  Mike> reproducible nfs hang on Lions.  If I copy this file

  Mike>    ftp://frontier.turbolinux.com/pub/ia64/nfshang.bin

  Mike> from an nfs server to the local file system, the copy hangs
  Mike> after transferring 294912 bytes.  If I copy the file from the
  Mike> local filesystem to the server, the copy completes
  Mike> successfully.  The hang only occurs with certain files.

That's a rather bizarre bug.  I can confirm this is happening on one
of our Lions as well.  The Big Sur (also with eepro100) is fine
though.  The Lion has 4GB and the Big Sur only 1GB of RAM.

Thanks,

	--david



* Re: [Linux-ia64] 2.4.1 network problems
  2001-02-13 23:43 [Linux-ia64] 2.4.1 network problems Michael Madore
                   ` (2 preceding siblings ...)
  2001-02-14 20:35 ` David Mosberger
@ 2001-02-14 20:39 ` Don Dugger
  2001-02-15  9:46 ` Andreas Schwab
  2001-02-16  6:10 ` Dragan Stancevic
  5 siblings, 0 replies; 7+ messages in thread
From: Don Dugger @ 2001-02-14 20:39 UTC (permalink / raw)
  To: linux-ia64

Mike-

Just as a data point, I downloaded your file and copied it over NFS to
my BigSur multiple times, both to and from the BigSur.  No problems
with any of the copies.

On Wed, Feb 14, 2001 at 12:10:24PM -0800, Michael Madore wrote:
> 
> On Wed, Feb 14, 2001 at 10:49:12AM -0800, David Mosberger wrote:
> > Can't say I have the problem you're describing.  I did have one
> > strange NFS related hang yesterday, after 13 days of uptime on a 2-way
> > Big Sur.  I did an "rpm -Uvh" while the current working directory was
> > on an NFS mounted filesystem and the "rpm" hung (even though the RPM
> > file itself was on a local filesystem).  It appeared that the
> > __rpc_execute() was waiting for an event that never happened and there
> > didn't seem to be a timeout either.  Of course, when doing the same
> > thing again after rebooting the system, it worked just fine.  Also,
> > note that it was only that one "rpm" process that got stuck (so it's
> > not like the kernel's timer facility was hosed altogether).
> > 
> > So, for now, we should probably focus on trying to find a test case
> > that reproduces the problem reliably (or at least with decent
> > frequency).
> > 
> > Actually, IIRC, there are some eepro100 patches in the pipe for 2.4.2,
> > but I haven't played with them.  Also, given that Mike says the problems
> > happen both with e100 and eepro100, I suspect those won't help.
> 
> In addition to the >1024MB problem, I am also experiencing a reproducible
> nfs hang on Lions.  If I copy this file
> 
>    ftp://frontier.turbolinux.com/pub/ia64/nfshang.bin 
> 
> from an nfs server to the local file system, the copy hangs after
> transferring 294912 bytes.  If I copy the file from the local filesystem to the 
> server, the copy completes successfully.  The hang only occurs with certain files.
> 
> The following messages are logged:
> 
> nfs: server plateau not responding, still trying
> nfs: task 294 can't get a request slot
> eth0: 0 multicast blocks dropped.
> eepro100: wait_for_cmd_done timeout!
> eepro100: wait_for_cmd_done timeout!
> eepro100: wait_for_cmd_done timeout!
> nfs: task 295 can't get a request slot
> eepro100: wait_for_cmd_done timeout!
> 
> Only nfs seems to be affected.  I am able to ssh into and out of the box.
> 
> Also, the hang does not happen if I switch to a 3COM 3C905B network card.
> 
> I have tried the same experiment with the Intel driver module.  Although the
> problem is harder to reproduce, when nfs does hang, it is at the same
> offset in the same file.
> 
> 
> -- 
> Mike Madore
> Software Engineer
> TurboLinux, Inc.

-- 
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
n0ano@valinux.com
Ph: 303/938-9838



* Re: [Linux-ia64] 2.4.1 network problems
  2001-02-13 23:43 [Linux-ia64] 2.4.1 network problems Michael Madore
                   ` (3 preceding siblings ...)
  2001-02-14 20:39 ` Don Dugger
@ 2001-02-15  9:46 ` Andreas Schwab
  2001-02-16  6:10 ` Dragan Stancevic
  5 siblings, 0 replies; 7+ messages in thread
From: Andreas Schwab @ 2001-02-15  9:46 UTC (permalink / raw)
  To: linux-ia64

Don Dugger <n0ano@valinux.com> writes:

|> Mike-
|> 
|> Just as a data point, I downloaded your file and copied it over NFS to
|> my BigSur multiple times, both to and from the BigSur.  No problems
|> with any of the copies.

The NFS problem seems to be firmware dependent.  None of our BigSurs
suffer from it, but all our Lions do since I have upgraded them beyond
build 55.

Andreas.

-- 
Andreas Schwab                                  "And now for something
SuSE Labs                                        completely different."
Andreas.Schwab@suse.de
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg



* Re: [Linux-ia64] 2.4.1 network problems
  2001-02-13 23:43 [Linux-ia64] 2.4.1 network problems Michael Madore
                   ` (4 preceding siblings ...)
  2001-02-15  9:46 ` Andreas Schwab
@ 2001-02-16  6:10 ` Dragan Stancevic
  5 siblings, 0 replies; 7+ messages in thread
From: Dragan Stancevic @ 2001-02-16  6:10 UTC (permalink / raw)
  To: linux-ia64

On Wed, Feb 14, 2001, David Mosberger <davidm@hpl.hp.com> wrote:
; >>>>> On Wed, 14 Feb 2001 12:10:24 -0800, Michael Madore <mmadore@turbolinux.com> said:
; 
;   Mike> In addition to the >1024MB problem, I am also experiencing a
;   Mike> reproducible nfs hang on Lions.  If I copy this file
; 
;   Mike>    ftp://frontier.turbolinux.com/pub/ia64/nfshang.bin
; 
;   Mike> from an nfs server to the local file system, the copy hangs
;   Mike> after transferring 294912 bytes.  If I copy the file from the
;   Mike> local filesystem to the server, the copy completes
;   Mike> successfully.  The hang only occurs with certain files.
; 
; That's a rather bizarre bug.  I can confirm this is happening on one
; of our Lions as well.  The Big Sur (also with eepro100) is fine
; though.  The Lion has 4GB and the Big Sur only 1GB of RAM.

If I am right, it's not the offset but rather the data combined with
a certain fragmentation of the packets.  If you "cp" these two files
over nfs, you should see the hang at 0 bytes transferred.  I don't know
yet why not all systems are affected; I am still working on it.

ftp://ftp.valinux.com/pub/people/visitor/0xdeadbeef/
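
One way to check the data-versus-offset idea on other files is to cut the suspect region out into a file of its own and copy that over nfs: if the data is what matters, the small file should stall as well, even though the data now sits at offset 0.  The following is only a sketch of that approach in Python, not how the files above were produced; extract_block is a hypothetical helper, and the 8192-byte length in the usage note is an assumption.

```python
def extract_block(src, dst, offset, length):
    """Write up to `length` bytes of src, starting at `offset`, into dst.

    Returns the number of bytes actually written, which is smaller than
    `length` when the block runs past the end of src.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(offset)
        data = fin.read(length)
        fout.write(data)
    return len(data)
```

For example, extract_block("nfshang.bin", "suspect.bin", 294912, 8192) would pull out the block that starts where the earlier copy stalled.  Run it against a local copy of the file (fetched by ftp, say), since reading the source over nfs would presumably hit the same hang.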

-- 
No Kernel Hackers were harmed during writing of this email.

                                                -Dragan


