Message-ID: <033a01c1379e$e3514880$e1de11cc@csihq.com>
From: "Mike Black"
To: "Trond Myklebust"
Cc: "linux-kernel"
In-Reply-To: <024f01c13601$c763d3c0$e1de11cc@csihq.com>
Subject: Re: 2.4.8 NFS Problems
Date: Fri, 7 Sep 2001 09:13:29 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

But my timeouts were only 10 seconds -- well below the timeo and retrans
timeout periods. And my network traffic shows that this is the client
causing the problem, NOT the server. It's the read() that pauses for 10
seconds, and then the NFS write immediately returns EIO.

So... I don't think soft mounts have anything to do with it.

Also... I've now seen this error once more, even with the 4096 read/write
sizes.

________________________________________
Michael D. Black   Principal Engineer
mblack@csihq.com   321-676-2923,x203
http://www.csihq.com   Computer Science Innovations
http://www.csihq.com/~mike   My home page
FAX 321-676-2355

----- Original Message -----
From: "Trond Myklebust"
To: "Mike Black"
Cc: "linux-kernel"
Sent: Friday, September 07, 2001 7:49 AM
Subject: Re: 2.4.8 NFS Problems

>>>>> " " == Mike Black writes:

    > I've been getting random NFS EIO errors for a few months but
    > now it's repeatable. Trying to copy a large file from one
    > 2.4.8 SMP box to another is consistently failing (at different
    > offsets each time). This doesn't appear to be a network
    > problem as the last comm between the machines looks OK. By the
    > timestamps it appears that a read() is taking too long and
    > causing a timeout?

Moral: Don't use soft mounts: they are prone to these things.

If you insist on using them, then try playing around with the `timeo'
and `retrans' mount variables.

Soft mount timeouts are not only due to network problems, but can
equally well be due to internal congestion. The rate at which the
network can transmit requests is usually (unless you are using Gigabit)
way below the rate at which your machine can generate them.

Cheers,
  Trond
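
To make the `timeo' and `retrans' discussion above concrete, here is a rough sketch of how the retransmission intervals might add up on a soft mount. It assumes the behaviour described in the nfs(5) man page: timeo is given in tenths of a second, the interval doubles after each timeout up to a 60-second cap, and a major timeout follows after `retrans' minor timeouts. The function names and the example values are illustrative only, and the actual kernel behaviour (2.4.8 included) may differ in detail.

```python
# Simplified model of NFS-over-UDP retransmission backoff. This is an
# assumption based on the nfs(5) description, not taken from kernel source.

MAX_TIMEOUT_DS = 600  # 60-second cap, in deciseconds (tenths of a second)


def backoff_intervals(timeo, retrans):
    """Return the wait before each minor timeout, in deciseconds.

    timeo   -- initial timeout in deciseconds (the `timeo' mount option)
    retrans -- number of minor timeouts before a major timeout
    """
    intervals = []
    wait = min(timeo, MAX_TIMEOUT_DS)
    for _ in range(retrans):
        intervals.append(wait)
        wait = min(wait * 2, MAX_TIMEOUT_DS)  # double, up to the cap
    return intervals


def major_timeout_seconds(timeo, retrans):
    """Total time until a soft mount would give up, under this model."""
    return sum(backoff_intervals(timeo, retrans)) / 10


if __name__ == "__main__":
    # timeo=7, retrans=3 are example values, not the poster's settings.
    print(backoff_intervals(7, 3))
    print(major_timeout_seconds(7, 3))
```

Under this model, raising either value lengthens the window before the client returns EIO on a soft mount; a hard mount would keep retrying instead of giving up.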