* Problem using exportfs in an active-active nfs cluster
@ 2010-06-08 16:24 RaSca
From: RaSca @ 2010-06-08 16:24 UTC (permalink / raw)
To: linux-nfs
Hi all,
I'm trying to use a resource agent named exportfs for my active-active
nfs cluster configuration.
The resource agent works by calling the exportfs command. I have an
instance of nfs-kernel-server on each node, and exportfs dynamically
removes or adds the export on the node (obviously I also have shared
storage).
The problem comes when the export is mounted by a client and this client
is writing to it: if the node switches, the migration fails. The
sequence is this:
- The exportfs resource stops correctly (the resource agent runs
exportfs -u).
- The Filesystem resource tries to unmount the exported filesystem,
running fuser to see whether any processes are locking the fs.
- fuser doesn't return anything, but the filesystem is still locked.
This happens because the nfsd kernel process is locking the FS.
- The migration fails.
The only way to make things work again is to restart the
nfs-kernel-daemon on the node where the resource resides and then clean
up the resource.
Now, after many discussions on the Linux-ha mailing list, I'm here to
ask whether this problem is caused by the exportfs command. Why does a
filesystem remain locked by the nfsd kernel process even after I (or the
resource agent) have run "exportfs -u"?
What else can I do to free the exported filesystem? Note that I've tried
mounting from the client with the "nolock" option, and the exported
filesystem is also mounted with "noatime".
Thanks a lot,
--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org
* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-08 21:54 Neil Brown
From: Neil Brown @ 2010-06-08 21:54 UTC (permalink / raw)
To: rasca; +Cc: linux-nfs
On Tue, 08 Jun 2010 18:24:48 +0200
RaSca <rasca@miamammausalinux.org> wrote:
> Hi all,
> I'm trying to use a resource agent named exportfs for my active-active
> nfs cluster configuration.
> The resource agent works by calling the exportfs command. I have an
> instance of nfs-kernel-server on each node, and exportfs dynamically
> removes or adds the export on the node (obviously I also have shared
> storage).
> The problem comes when the export is mounted by a client and this client
> is writing to it: if the node switches, the migration fails. The
> sequence is this:
>
> - The exportfs resource stops correctly (the resource agent runs
> exportfs -u).
> - The Filesystem resource tries to unmount the exported filesystem,
> running fuser to see whether any processes are locking the fs.
> - fuser doesn't return anything, but the filesystem is still locked.
> This happens because the nfsd kernel process is locking the FS.
> - The migration fails.
>
> The only way to make things work again is to restart the
> nfs-kernel-daemon on the node where the resource resides and then clean
> up the resource.
>
> Now, after many discussions on the Linux-ha mailing list, I'm here to
> ask whether this problem is caused by the exportfs command. Why does a
> filesystem remain locked by the nfsd kernel process even after I (or the
> resource agent) have run "exportfs -u"?
>
> What else can I do to free the exported filesystem? Note that I've tried
> mounting from the client with the "nolock" option, and the exported
> filesystem is also mounted with "noatime".
>
> Thanks a lot,
>
Try
exportfs -f
NeilBrown
* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09 7:43 RaSca
From: RaSca @ 2010-06-09 7:43 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
[...]
> Try
> exportfs -f
Already tried, it didn't work. The file system is still locked by nfsd.
--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org
* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09 8:01 RaSca
From: RaSca @ 2010-06-09 8:01 UTC (permalink / raw)
To: rasca; +Cc: Neil Brown, linux-nfs
On Wed 09 Jun 2010 09:43:48 CET, RaSca wrote:
> On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
> [...]
>> Try
>> exportfs -f
> Already tried, it didn't work. The file system is still locked by nfsd.
To be more specific: in my test I first run exportfs -u on the export I
need to move, and then I run exportfs -f.
On the client side, the copy waits for a while, then gives some
"Permission denied" errors and exits.
On the server side the problem, as already said, is this one:
Jun 9 11:54:16 ubuntu-nodo1 lrmd: [692]: info: RA output: (share-a-fs:stop:stderr) umount: /share-a: device is busy.#012 (In some cases useful info about processes that use#012 the device is found by lsof(8) or fuser(1))
Jun 9 11:54:16 ubuntu-nodo1 lrmd: [692]: info: RA output: (share-a-fs:stop:stderr)
Jun 9 11:54:16 ubuntu-nodo1 Filesystem[4065]: ERROR: Couldn't unmount /share-a; trying cleanup with KILL
Jun 9 11:54:16 ubuntu-nodo1 Filesystem[4065]: INFO: No processes on /share-a were signalled
As you can see, the Filesystem resource agent tries to unmount /share-a,
but even though fuser and lsof give no output, it still remains locked
(by nfsd, of course).
Have you got any other suggestion?
Thanks a lot for your help.
--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org
* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09 8:04 Neil Brown
From: Neil Brown @ 2010-06-09 8:04 UTC (permalink / raw)
To: rasca; +Cc: linux-nfs
On Wed, 09 Jun 2010 09:43:48 +0200
RaSca <rasca@miamammausalinux.org> wrote:
> On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
> [...]
> > Try
> > exportfs -f
>
> Already tried, it didn't work. The file system is still locked by nfsd.
>
Seems unlikely ... "exportfs -f" flushes all the export caches in the
kernel, thus letting go of any filesystems.
I guess an active NFS request could still hold the fs active, but that
should complete fairly quickly.
File locking might be an issue. Might a client have a lock on some file
in the filesystem? Failover of locks is rather more complicated than
simple file-access fail-over. I don't recall what the status of this is
currently.
When the umount fails, check the content of
/proc/net/rpc/nfsd.export/content
and
/proc/locks
to check what is actually using the filesystem.
NeilBrown
* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09 10:28 RaSca
From: RaSca @ 2010-06-09 10:28 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Wed 09 Jun 2010 10:04:39 CET, Neil Brown wrote:
[...]
> Seems unlikely ... "exportfs -f" flushes all the export caches in the
> kernel, thus letting go of any filesystems.
> I guess an active NFS request could still hold the fs active, but that
> should complete fairly quickly.
> File locking might be an issue. Might a client have a lock on some file
> in the filesystem? Failover of locks is rather more complicated than
> simple file-access fail-over. I don't recall what the status of this is
> currently.
> When the umount fails, check the content of
> /proc/net/rpc/nfsd.export/content
> and
> /proc/locks
> to check what is actually using the filesystem.
Note that I'm mounting from the client with the nolock option.
Here is the output of the two cat commands:
/proc/net/rpc/nfsd.export/content:
#path domain(flags)#012# /share-a#011192.168.1.0/24(rw,no_root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=1,uuid=7c80c4af:2a244b39:afadb554:8c8e0574)
/proc/locks:
1: POSIX ADVISORY WRITE 753 00:11:3923 0 EOF#0122: FLOCK ADVISORY WRITE 739 00:11:3916 0 EOF#0123: POSIX ADVISORY WRITE 522 00:11:3049 0 EOF
What do you think about it?
Thanks a lot!
--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org
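A note on reading the output above: `#012` and `#011` are an escaped newline and tab, so the /proc/locks paste is three entries, each of the form `id: class type access pid major:minor:inode start end`. The following is a minimal sketch, not part of the original thread, that lists the lock entries belonging to one device. The function name `locks_on_dev` and its optional file argument are hypothetical, introduced only for illustration, and the device numbers of the filesystem of interest have to be determined separately.

```shell
#!/bin/sh
# locks_on_dev MAJOR:MINOR [FILE]
# Print every lock entry whose device field matches MAJOR:MINOR, written
# exactly as it appears in /proc/locks. FILE defaults to /proc/locks; the
# argument exists so the function can be tried on a saved copy.
locks_on_dev() {
    dev=$1
    file=${2:-/proc/locks}
    awk -v dev="$dev" '
        {
            # The device field is not always in the same column (blocked
            # locks carry an extra "->" column), so scan every field for
            # one that starts with "MAJOR:MINOR:".
            for (i = 1; i <= NF; i++) {
                if (index($i, dev ":") == 1) { print; next }
            }
        }' "$file"
}
```

For example, `locks_on_dev 00:11` would print the three entries shown in the message above, since all of them refer to device 00:11.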
* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-10 4:09 Neil Brown
From: Neil Brown @ 2010-06-10 4:09 UTC (permalink / raw)
To: rasca; +Cc: linux-nfs
On Wed, 09 Jun 2010 12:28:43 +0200
RaSca <rasca@miamammausalinux.org> wrote:
> On Wed 09 Jun 2010 10:04:39 CET, Neil Brown wrote:
> [...]
> > Seems unlikely ... "exportfs -f" flushes all the export caches in the
> > kernel, thus letting go of any filesystems.
> > I guess an active NFS request could still hold the fs active, but that
> > should complete fairly quickly.
> > File locking might be an issue. Might a client have a lock on some
> > file in the filesystem? Failover of locks is rather more complicated
> > than simple file-access fail-over. I don't recall what the status of
> > this is currently.
> > When the umount fails, check the content of
> > /proc/net/rpc/nfsd.export/content
> > and
> > /proc/locks
> > to check what is actually using the filesystem.
>
> Note that I'm mounting from the client with the nolock option.
>
> Here is the output of the two cat commands:
>
> /proc/net/rpc/nfsd.export/content:
>
> #path domain(flags)#012# /share-a#011192.168.1.0/24(rw,no_root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=1,uuid=7c80c4af:2a244b39:afadb554:8c8e0574)
>
> /proc/locks:
>
> 1: POSIX ADVISORY WRITE 753 00:11:3923 0 EOF#0122: FLOCK ADVISORY WRITE 739 00:11:3916 0 EOF#0123: POSIX ADVISORY WRITE 522 00:11:3049 0 EOF
>
> What do you think about it?
>
Clearly there are no locks .. though I wonder what is mounted on 00:11.
Probably not important.
The fact that the export entry is there after you did "exportfs -f"
strongly suggests that a new request came in and caused mountd to re-add
the entry.
Do you disable the network interface that the clients connect to
*before* unexporting? If you don't, you should.
Maybe run mountd with "-d all" and see what it is doing when you are
unexporting and unmounting.
NeilBrown
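The check Neil describes, whether the path is still listed in /proc/net/rpc/nfsd.export/content after "exportfs -f", can be scripted. This is a minimal sketch and not something posted in the thread: `export_cached` is a hypothetical helper name, and the optional second argument (a file to read instead of the /proc path) is only there for experimenting on a saved copy. It counts only lines whose first field is exactly the path, so lines beginning with "#" are skipped.

```shell
#!/bin/sh
# export_cached PATH [FILE]
# Exit 0 if FILE (default: the kernel export cache) has a line whose first
# whitespace-separated field is exactly PATH, and exit 1 otherwise.
export_cached() {
    path=$1
    file=${2:-/proc/net/rpc/nfsd.export/content}
    awk -v p="$path" '$1 == p { found = 1 } END { exit !found }' "$file"
}
```

Running `export_cached /share-a` a moment after `exportfs -f` and getting a zero exit status would be consistent with a client request having brought the entry back.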
* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-11 8:49 RaSca
From: RaSca @ 2010-06-11 8:49 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Thu 10 Jun 2010 06:09:08 CET, Neil Brown wrote:
[...]
> Do you disable the network interface that the clients connect to
> *before* unexporting? If you don't, you should.
> Maybe run mountd with "-d all" and see what it is doing when you are
> unexporting and unmounting.
So, after a very long time, here is the solution. You were absolutely
right: I was not disabling the IP address first.
Anyway, it remains a little strange that exportfs -u does not unlock the
exported filesystem, but removing the IP first makes things go
smoothly, so...
Many thanks again!
Have a good day.
--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org
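The ordering that resolved the problem in this thread (remove the service IP first, then unexport, then unmount) can be sketched as a small shell function. This is an illustration under stated assumptions, not the resource agents' actual code: the function names `run` and `stop_nfs_export`, the `DRY_RUN` variable, and the example address 192.168.1.100/24 on eth0 are all hypothetical. With `DRY_RUN=1`, the default here, the commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# stop_nfs_export ADDR/PREFIX IFACE CLIENTSPEC DIR
# All four arguments are illustrative placeholders for site-specific values.
stop_nfs_export() {
    vip=$1 iface=$2 client=$3 dir=$4
    # 1. Drop the service address so no new client requests can arrive.
    run ip addr del "$vip" dev "$iface" || return 1
    # 2. Remove the export, then flush the kernel export table.
    run exportfs -u "$client:$dir" || return 1
    run exportfs -f || return 1
    # 3. Only now unmount the shared filesystem.
    run umount "$dir"
}
```

Calling `stop_nfs_export 192.168.1.100/24 eth0 192.168.1.0/24 /share-a` in dry-run mode prints the four commands in that order. In a cluster manager the same ordering would normally be expressed through the order in which the resources are stopped rather than through a script.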