From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mi Jinlong
Subject: Re: [RFC] After server stop nfslock service, client still can get lock success
Date: Wed, 18 Nov 2009 17:50:30 +0800
Message-ID: <4B03C366.8050009@cn.fujitsu.com>
References: <4B027123.4060100@cn.fujitsu.com> <84C94F5A-0192-4F6D-858D-0CCA92574625@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: "Trond.Myklebust", NFSv3 list, "J. Bruce Fields"
To: Chuck Lever
Return-path:
Received: from cn.fujitsu.com ([222.73.24.84]:52870 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1756845AbZKRJtP (ORCPT ); Wed, 18 Nov 2009 04:49:15 -0500
In-Reply-To: <84C94F5A-0192-4F6D-858D-0CCA92574625@oracle.com>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

Hi Chuck Lever:
>
> On Nov 17, 2009, at 4:47 AM, Mi Jinlong wrote:
>
>> While testing NLM, I found a bug.
>> After the server stops the nfslock service, the client can still get a lock.
>>
>> Test process:
>>
>> Step1: client opens an NFS file.
>> Step2: client uses fcntl to get a lock.
>> Step3: client uses fcntl to release the lock.
>> Step4: server stops its nfslock service.
>> Step5: client uses fcntl to get the lock again.
>>
>> At step5, the client should fail to get the lock, but it succeeds.
>>
>> Reason:
>> When the server stops the nfslock service, the client's host struct is
>> not unmonitored at the server. When the client gets the lock again, the
>> client's host struct is reused but is not monitored again.
>> So at step5 the client can get the lock successfully.
>
> Effectively, the client is still monitored, since it is still in statd's
> monitored list. Shutting down statd does not remove it from the monitor
> list. If the local host reboots, sm-notify will still send the remote
> an SM_NOTIFY request, which is correct.
>
> Additionally, new clients attempting to lock files when statd is down
> will fail, which is correct if statd is not available.
>
> Conversely, if a monitored remote reboots, there is no way to notify the
> local lockd of the reboot, since statd normally relays the SM_NOTIFY to
> lockd, but isn't running. That might be a problem.

Yes, that looks like a problem. I have not confirmed it, so I wanted to get your opinion.

>
> However, shutting down statd during normal operation is not a normal or
> supported thing to do.
>
>> Question:
>> 1. Should the server unmonitor the client's host struct
>>    when the server stops the nfslock service?
>>
>> 2. Should rpc.statd tell the kernel its status (on start and stop)
>>    by sending an SM_NOTIFY?
>
> There are a number of other coordination issues around statd start-up
> and shut down. The server's grace period, for instance, is not
> synchronized with sending reboot notifications. So, we do recognize
> this is a general problem.
>
> In this case, however, I would expect indeterminate behavior if statd is
> shut down during normal operation, and that's exactly what we get. I'm
> not sure it's even reasonable to support this use case. Why would
> someone shut down statd and expect reliable NFSv2/v3 locking behavior?
> In other words, with due respect, what problem would we solve by fixing
> this, other than making your test case work?

When the server's nfslock service is stopped, the client sometimes gets the lock and sometimes does not, which is confusing.

>
> Out of curiosity, what happens if you try this on a Solaris server?

I'm new to Solaris. When Solaris's nlockmgr is stopped, the client can't get the lock immediately.

thanks,
Mi Jinlong
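
For reference, the test process quoted above (Step1 to Step5) can be sketched roughly as below. This is only a minimal illustration, not the actual test program: the function name second_lock_succeeds and the between_locks callback are made up for this sketch, and it assumes a Linux client where Python's fcntl.lockf() takes POSIX fcntl locks, which is the kind of lock that goes through NLM on an NFSv2/v3 mount.

```python
import fcntl
import os


def second_lock_succeeds(path, between_locks):
    """Lock, unlock, run between_locks(), then try to lock again.

    Returns True if the second (non-blocking) lock attempt succeeds,
    False if it fails with an OSError. The name and the callback are
    hypothetical; they are not taken from the original test.
    """
    fd = os.open(path, os.O_RDWR)                         # Step1: open the file
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)    # Step2: get lock
        fcntl.lockf(fd, fcntl.LOCK_UN)                    # Step3: release lock
        between_locks()                                   # Step4: done out of band
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # Step5: lock again
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

On the client one could call it with a file on the NFS mount and a callback that simply waits while nfslock is stopped on the server by hand, for example second_lock_succeeds("/mnt/nfs/testfile", lambda: input("stop nfslock, then press Enter")). The path here is a placeholder.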