From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934078AbXG0Ndb (ORCPT ); Fri, 27 Jul 2007 09:33:31 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1755935AbXG0NdR (ORCPT ); Fri, 27 Jul 2007 09:33:17 -0400
Received: from mx2.netapp.com ([216.240.18.37]:37122 "EHLO mx2.netapp.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1760446AbXG0NdP (ORCPT ); Fri, 27 Jul 2007 09:33:15 -0400
X-IronPort-AV: E=Sophos;i="4.16,589,1175497200";
	d="scan'208";a="86543604"
Subject: Re: NFSv4 poops itself
From: Trond Myklebust
To: Jeff Garzik
Cc: Linux Kernel Mailing List, Andrew Morton, Michal Piotrowski
In-Reply-To: <46A9EAB0.3090306@garzik.org>
References: <46A9EAB0.3090306@garzik.org>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization: Network Appliance Inc
Date: Fri, 27 Jul 2007 09:33:07 -0400
Message-Id: <1185543187.6586.10.camel@localhost>
Mime-Version: 1.0
X-Mailer: Evolution 2.10.1
X-OriginalArrivalTime: 27 Jul 2007 13:33:09.0038 (UTC) FILETIME=[ABBAA4E0:01C7D052]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2007-07-27 at 08:53 -0400, Jeff Garzik wrote:
> Background:
>
> Server: x86-64 dual core Intel, kernel 2.6.23-rc1 (my home fileserver)
> Exporting NFS/NFSv4 mounts. Client count: 1 Uptime: 4 days
>
> Client: x86-64 dual core Intel, kernel 2.6.23-rc1 (my main workstation)
> NFS mount setup:
> pretzel:/ on /g type nfs4 (rw,noatime,proto=tcp,addr=10.10.10.1)
> Uptime: 4 days
>
> Home directory mounted via NFSv4.
>
> Problem:
>
> My workstation has been happily talking to my file server for several
> days without incident. An hour ago, my numeric keypad stopping working
> (unrelated problem... USB or X bug?). The solution to the keypad
> problem is usually to log out of X and log back in, or worse case, reboot.
>
> So, I log out, and log back in. At first, a few shell windows open and
> successfully initialize themselves (read bash profile over NFS, etc.)
> Then, as more shell windows open, things start hanging. I can easily
> switch to console and ssh to the fileserver, so it is clear this is an
> NFS hang.
>
> No adverse messages at all on the client.
>
> On the server, I see NFSv4 spamming dmesg with hundreds of thousands of
> messages:
>
> Jul 27 08:20:53 pretzel kernel: NFSD: preprocess_seqid_op: old stateid!
> Jul 27 08:21:24 pretzel last message repeated 167966 times
> Jul 27 08:21:55 pretzel last message repeated 173628 times
> Jul 27 08:21:55 pretzel kernel: NFSD: preprocess_seqid_op: old stateid!
> Jul 27 08:22:26 pretzel last message repeated 171286 times
> Jul 27 08:23:27 pretzel last message repeated 344461 times
> Jul 27 08:23:30 pretzel last message repeated 18656 times
>
> I rebooted the client, the problem disappeared, and everything is happy
> again... but clearly NFSv4 shat itself. And now I am worried this will
> happen again.
>
> In all my quite-heavy use of NFSv4 I've never seen this behavior before,
> so I would call this a regression.

Yup. Bruce has reported the same bug so it is under investigation. I'll
keep you posted when I think we have a fix.

Cheers
  Trond