From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mi Jinlong
Subject: Re: [RFC] After nfs restart, locks can't be recovered which record by lockd before
Date: Thu, 14 Jan 2010 18:06:42 +0800
Message-ID: <4B4EECB2.8050400@cn.fujitsu.com>
References: <4B4D979D.6090307@cn.fujitsu.com> <20100113075155.5c409567@barsoom.rdu.redhat.com> <4B4E16C3.4050206@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: Jeff Layton, "Trond.Myklebust", "J. Bruce Fields", NFSv3 list
To: Chuck Lever
Return-path:
Received: from cn.fujitsu.com ([222.73.24.84]:59331 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1750856Ab0ANKG7 convert rfc822-to-8bit (ORCPT ); Thu, 14 Jan 2010 05:06:59 -0500
In-Reply-To: <4B4E16C3.4050206@oracle.com>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

Hi Chuck,

Chuck Lever wrote:
> On 01/13/2010 07:51 AM, Jeff Layton wrote:
>> On Wed, 13 Jan 2010 17:51:25 +0800
>> Mi Jinlong wrote:
>>
>> Assuming you're using a RH-derived distro like Fedora or RHEL, then no.
>> statd is controlled by a separate init script (nfslock) and when you
>> run "service nfs restart" you're not restarting it. NSM notifications
>> are not sent and clients generally won't reclaim their locks.
>>
>> IOW, "you're doing it wrong". If you want locks to be reclaimed then
>> you probably need to restart the nfslock service too.
>
> Mi Jinlong is exercising another case we know doesn't work right, but we
> don't expect admins will ever perform this kind of "down-up" on a normal
> production server.  In other words, we expect it to work this way, and
> it's been good enough, so far.
>
> As Jeff points out, the "nfs" and the "nfslock" services are separate.
> This is because "nfslock" is required for both client and server side
> NFS, but "nfs" is required only on the server.  This split also dictates
> the way sm-notify works, since it has to behave differently on NFS
> clients and servers.
>
> Two other points:
>
>   + lockd would not restart itself in this case if there happened to be
>     NFS mounts on that system

  In my testing, I found that an nfs restart does cause lockd to restart.
  I found the code that stops lockd when nfs stops.

  At kernel 2.6.18, fs/lockd/svc.c
  ...
        if (nlmsvc_users) {
                if (--nlmsvc_users)
                        goto out;
        } else
                printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid);
  ...

        kill_proc(nlmsvc_pid, SIGKILL, 1);
  ...

  At kernel 2.6.18, fs/lockd/svc.c
  ...
        if (nlmsvc_users) {
                if (--nlmsvc_users)
                        goto out;
        } else {
                printk(KERN_ERR "lockd_down: no users! task=%p\n",
                        nlmsvc_task);
                BUG();
        }
  ....
        kthread_stop(nlmsvc_task);
        svc_exit_thread(nlmsvc_rqst);
  ...

  As shown above, when nlmsvc_users <= 1, lockd is killed.

>
>   + lockd doesn't currently poke statd when it restarts to tell it to
>     send reboot notifications, but it probably should

  Yes, I agree with you. But currently, when lockd restarts for some
  reason and statd does not, the locks that were held before the restart
  are lost.

  Maybe the kernel should fix this.

>
> We know that lockd will start up when someone mounts the first NFS
> share, or when the NFS server is started.  If lockd sent statd an
> SM_SIMU_CRASH (or something like it) every time it cold started, statd
> could send reboot notifications at the right time on both servers and
> clients without extra logic in the init scripts, and we wouldn't need
> that kludge in sm-notify to know when a machine has rebooted.

  What does "cold start" mean here? A system reboot, or a statd restart?

  I would also like to know: when the command "service nfslock restart"
  is used to restart the nfslock service (that is, statd and lockd),
  will statd call sm-notify to notify the other clients, or not?

  On RHEL5 and Fedora, sm-notify is called and sends SM_NOTIFY every
  time nfslock restarts.

thanks,
Mi Jinlong