* Problem using exportfs in an active-active nfs cluster
From: RaSca @ 2010-06-08 16:24 UTC (permalink / raw)
To: linux-nfs
Hi all,
I'm trying to use a resource agent named exportfs for my active-active
NFS cluster configuration.
The resource agent works using the exportfs command. I have an instance
of nfs-kernel-server on each node, and exportfs dynamically removes or
adds the export on the node (obviously I also have shared storage).
The problem comes when the export is mounted by a client and this client
is writing to it: if the node switches, the migration fails. The
sequence is this:
- The exportfs resource stops correctly (the resource agent runs
exportfs -u).
- The Filesystem resource tries to unmount the exported filesystem,
running fuser to see if any processes are locking the fs.
- fuser doesn't return anything, but the filesystem is still locked.
This happens because the nfsd kernel process is locking the FS.
- The migration fails.
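Why fuser comes back empty can be illustrated with a simplified imitation of what fuser and lsof examine: the per-process links under /proc (cwd, root and open file descriptors). This is a sketch under stated assumptions: the helper `find_fd_users` is hypothetical, and the real fuser also checks memory maps and more. The point is that an in-kernel user such as the nfsd threads holds no file descriptor pointing into the export, so a scan of this kind cannot see it.

```python
import os


def find_fd_users(mount_point):
    """Return sorted PIDs whose cwd, root or open fds resolve under mount_point.

    Hypothetical helper: a simplified imitation of the /proc links that
    fuser/lsof inspect (memory maps and mount namespaces are ignored).
    References held inside the kernel, such as nfsd's, are invisible here.
    """
    mount_point = os.path.realpath(mount_point)
    prefix = mount_point.rstrip("/") + "/"
    users = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/self or /proc/net
        links = [os.path.join("/proc", entry, name) for name in ("cwd", "root")]
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process vanished, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # unreadable link (permissions, process exit, ...)
            if target == mount_point or target.startswith(prefix):
                users.add(int(entry))
                break
    return sorted(users)


if __name__ == "__main__":
    # Example: list processes using the export from this thread, if mounted.
    print(find_fd_users("/share-a"))
```

Run as root to see other users' processes; without privileges, unreadable links are silently skipped.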
The only way to make things work again is to restart
nfs-kernel-server on the node where the resource resides and then clean up
the resource.
Now, after many discussions on the Linux-HA mailing list, I'm here to
ask whether this problem lies in the exportfs command. Why does a filesystem
remain locked by the nfsd kernel process even after I (or the resource
agent) have run "exportfs -u"?
What else can I do to free the exported filesystem? Note that I've tried
mounting from the client with the "nolock" option, and the exported
filesystem is also mounted with "noatime".
Thanks a lot,
--
RaSca
Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well!
rasca@miamammausalinux.org
http://www.miamammausalinux.org
* Re: Problem using exportfs in an active-active nfs cluster
From: Neil Brown @ 2010-06-08 21:54 UTC (permalink / raw)
To: rasca@miamammausalinux.org; +Cc: linux-nfs
On Tue, 08 Jun 2010 18:24:48 +0200
RaSca <rasca@miamammausalinux.org> wrote:
> Hi all,
> I'm trying to use a resource agent named exportfs for my active-active
> NFS cluster configuration.
> The resource agent works using the exportfs command. I have an instance
> of nfs-kernel-server on each node, and exportfs dynamically removes or
> adds the export on the node (obviously I also have shared storage).
> The problem comes when the export is mounted by a client and this client
> is writing to it: if the node switches, the migration fails. The
> sequence is this:
>
> - The exportfs resource stops correctly (the resource agent runs
> exportfs -u).
> - The Filesystem resource tries to unmount the exported filesystem,
> running fuser to see if any processes are locking the fs.
> - fuser doesn't return anything, but the filesystem is still locked.
> This happens because the nfsd kernel process is locking the FS.
> - The migration fails.
>
> The only way to make things work again is to restart
> nfs-kernel-server on the node where the resource resides and then clean up
> the resource.
>
> Now, after many discussions on the Linux-HA mailing list, I'm here to
> ask whether this problem lies in the exportfs command. Why does a filesystem
> remain locked by the nfsd kernel process even after I (or the resource
> agent) have run "exportfs -u"?
>
> What else can I do to free the exported filesystem? Note that I've tried
> mounting from the client with the "nolock" option, and the exported
> filesystem is also mounted with "noatime".
>
> Thanks a lot,
>
Try
exportfs -f
NeilBrown
* Re: Problem using exportfs in an active-active nfs cluster
From: RaSca @ 2010-06-09 7:43 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
[...]
> Try
> exportfs -f
Already tried, it didn't work. The file system is still locked by nfsd.
--
RaSca
* Re: Problem using exportfs in an active-active nfs cluster
From: RaSca @ 2010-06-09 8:01 UTC (permalink / raw)
To: rasca@miamammausalinux.org; +Cc: Neil Brown, linux-nfs
On Wed 09 Jun 2010 09:43:48 CET, RaSca wrote:
> On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
> [...]
>> Try
>> exportfs -f
> Already tried, it didn't work. The file system is still locked by nfsd.
To be more specific: in my test, I first run exportfs -u on the export I
need to move, and then I run exportfs -f.
On the client side, the copy waits for a while, then reports some
"Permission denied" errors and exits.
On the server side, the problem, as already said, is this:
Jun 9 11:54:16 ubuntu-nodo1 lrmd: [692]: info: RA output: (share-a-fs:stop:stderr) umount: /share-a: device is busy.#012 (In some cases useful info about processes that use#012 the device is found by lsof(8) or fuser(1))
Jun 9 11:54:16 ubuntu-nodo1 lrmd: [692]: info: RA output: (share-a-fs:stop:stderr)
Jun 9 11:54:16 ubuntu-nodo1 Filesystem[4065]: ERROR: Couldn't unmount /share-a; trying cleanup with KILL
Jun 9 11:54:16 ubuntu-nodo1 Filesystem[4065]: INFO: No processes on /share-a were signalled
As you can see, the Filesystem resource agent tries to unmount /share-a,
but even though fuser and lsof give no output, it still remains locked
(by nfsd, of course).
Have you got any other suggestions? Thanks a lot for your help.
--
RaSca
* Re: Problem using exportfs in an active-active nfs cluster
From: Neil Brown @ 2010-06-09 8:04 UTC (permalink / raw)
To: rasca@miamammausalinux.org; +Cc: linux-nfs
On Wed, 09 Jun 2010 09:43:48 +0200
RaSca <rasca@miamammausalinux.org> wrote:
> On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
> [...]
> > Try
> > exportfs -f
>
> Already tried, it didn't work. The file system is still locked by nfsd.
>
Seems unlikely ... "exportfs -f" flushes all the export caches in the kernel,
thus letting go of any filesystems.
I guess an active NFS request could still hold the fs active, but that should
complete fairly quickly.
File locking might be an issue. Might a client have a lock on some file in
the filesystem? Failover of locks is rather more complicated than simple
file-access failover. I don't recall what the status of this is currently.
When the umount fails, check the contents of
/proc/net/rpc/nfsd.export/content
and
/proc/locks
to see what is actually using the filesystem.
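The first of those checks can be scripted. The sketch below parses the text of /proc/net/rpc/nfsd.export/content, assuming the layout seen later in this thread: comment lines starting with "#", then "path<whitespace>client(flags)" entries. The function names are hypothetical, and any octal escapes the kernel may use for unusual characters in paths are not decoded. If the unexported path still appears after "exportfs -u" and "exportfs -f", the kernel still holds an export entry for it.

```python
def exported_paths(content):
    """Map each exported path to the client specs listed for it."""
    exports = {}
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header and comment lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # unexpected layout: ignore rather than guess
        path, client = parts
        # "192.168.1.0/24(rw,...)" -> "192.168.1.0/24"
        exports.setdefault(path, []).append(client.split("(", 1)[0])
    return exports


def still_in_kernel_cache(path, proc_file="/proc/net/rpc/nfsd.export/content"):
    """Return True if path still has an entry in nfsd's export cache."""
    with open(proc_file) as f:
        return path in exported_paths(f.read())


# Adapted from the output posted later in this thread (syslog escapes expanded).
SAMPLE = (
    "#path domain(flags)\n"
    "/share-a\t192.168.1.0/24(rw,no_root_squash,sync,wdelay,crossmnt,"
    "no_subtree_check,fsid=1,uuid=7c80c4af:2a244b39:afadb554:8c8e0574)\n"
)

if __name__ == "__main__":
    print(exported_paths(SAMPLE))
```

On the server, calling `still_in_kernel_cache("/share-a")` right after the failed umount gives the same answer as reading the file by hand.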
NeilBrown
* Re: Problem using exportfs in an active-active nfs cluster
From: RaSca @ 2010-06-09 10:28 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Wed 09 Jun 2010 10:04:39 CET, Neil Brown wrote:
[...]
> Seems unlikely ... "exportfs -f" flushes all the export caches in the kernel,
> thus letting go of any filesystems.
> I guess an active NFS request could still hold the fs active, but that should
> complete fairly quickly.
> File locking might be an issue. Might a client have a lock on some file in
> the filesystem? Failover of locks is rather more complicated than simple
> file-access failover. I don't recall what the status of this is currently.
> When the umount fails, check the contents of
> /proc/net/rpc/nfsd.export/content
> and
> /proc/locks
> to see what is actually using the filesystem.
Note that I'm mounting from the client with the nolock option.
Here is the output of the two cat commands:
/proc/net/rpc/nfsd.export/content:
#path domain(flags)
#
/share-a	192.168.1.0/24(rw,no_root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=1,uuid=7c80c4af:2a244b39:afadb554:8c8e0574)
/proc/locks:
1: POSIX ADVISORY WRITE 753 00:11:3923 0 EOF
2: FLOCK ADVISORY WRITE 739 00:11:3916 0 EOF
3: POSIX ADVISORY WRITE 522 00:11:3049 0 EOF
What do you think about it?
Thanks a lot!
--
RaSca
* Re: Problem using exportfs in an active-active nfs cluster
From: Neil Brown @ 2010-06-10 4:09 UTC (permalink / raw)
To: rasca; +Cc: linux-nfs
On Wed, 09 Jun 2010 12:28:43 +0200
RaSca <rasca@miamammausalinux.org> wrote:
> On Wed 09 Jun 2010 10:04:39 CET, Neil Brown wrote:
> [...]
> > Seems unlikely ... "exportfs -f" flushes all the export caches in the kernel,
> > thus letting go of any filesystems.
> > I guess an active NFS request could still hold the fs active, but that should
> > complete fairly quickly.
> > File locking might be an issue. Might a client have a lock on some file in
> > the filesystem? Failover of locks is rather more complicated than simple
> > file-access failover. I don't recall what the status of this is currently.
> > When the umount fails, check the contents of
> > /proc/net/rpc/nfsd.export/content
> > and
> > /proc/locks
> > to see what is actually using the filesystem.
>
> Note that I'm mounting from the client with the nolock option.
>
> Here is the output of the two cat commands:
>
> /proc/net/rpc/nfsd.export/content:
>
> #path domain(flags)
> #
> /share-a	192.168.1.0/24(rw,no_root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=1,uuid=7c80c4af:2a244b39:afadb554:8c8e0574)
>
> /proc/locks:
>
> 1: POSIX ADVISORY WRITE 753 00:11:3923 0 EOF
> 2: FLOCK ADVISORY WRITE 739 00:11:3916 0 EOF
> 3: POSIX ADVISORY WRITE 522 00:11:3049 0 EOF
>
> What do you think about it?
>
Clearly there are no locks on the exported filesystem ... though I wonder
what is mounted on 00:11. Probably not important.
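The "00:11" is the device part of each /proc/locks entry. This sketch assumes, as in the kernel's fs/locks.c, that /proc/locks prints the major and minor numbers in hexadecimal, while /proc/self/mountinfo prints them in decimal; so 00:11 corresponds to 0:17 in mountinfo, and major 0 is the number used for anonymous devices such as tmpfs rather than a disk. The function names below are hypothetical.

```python
def parse_proc_locks(text):
    """Parse /proc/locks lines laid out as
    'N: CLASS MODE ACCESS PID MAJ:MIN:INODE START END' into dicts."""
    locks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[1] == "->":
            continue  # skip blank lines and blocked-waiter ("->") entries
        major, minor, inode = fields[5].split(":")
        locks.append({
            "kind": fields[1],
            "access": fields[3],
            "pid": int(fields[4]),
            "device": (int(major, 16), int(minor, 16)),  # hex in /proc/locks
            "inode": int(inode),
        })
    return locks


def mounts_by_device(mountinfo_text):
    """Map (major, minor) from /proc/self/mountinfo (decimal) to mount points."""
    mounts = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        major, minor = (int(n) for n in fields[2].split(":"))
        mounts.setdefault((major, minor), []).append(fields[4])
    return mounts


# The /proc/locks output posted in this thread.
LOCKS = """\
1: POSIX ADVISORY WRITE 753 00:11:3923 0 EOF
2: FLOCK ADVISORY WRITE 739 00:11:3916 0 EOF
3: POSIX ADVISORY WRITE 522 00:11:3049 0 EOF
"""

if __name__ == "__main__":
    # Meaningful only on the NFS server itself: device numbers are per boot.
    with open("/proc/self/mountinfo") as f:
        mounts = mounts_by_device(f.read())
    for lock in parse_proc_locks(LOCKS):
        print(lock["pid"], lock["device"], mounts.get(lock["device"], ["?"]))
```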
The fact that the export entry is still there after you ran "exportfs -f" strongly
suggests that a new request came in and caused mountd to re-add the entry.
Do you disable the network interface that the clients connect to *before*
unexporting? If you don't, you should.
Maybe run mountd with "-d all" and see what it is doing when you are
unexporting and unmounting.
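The advice above amounts to an ordering rule: the client-facing IP address goes away before the export does. In Pacemaker, members of a resource group start in the order they are listed and stop in reverse order, so listing the IP resource last makes it stop first. The sketch below checks that property and prints the equivalent manual commands; the resource names, the virtual IP 192.168.1.100/24 and the interface eth0 are hypothetical, since the thread does not show the actual configuration.

```python
# Hypothetical group, listed in start order: storage, then export, then IP.
GROUP = ["share-a-fs", "share-a-export", "share-a-ip"]

# Hypothetical manual equivalents of each resource's stop action.
STOP_COMMANDS = {
    "share-a-ip": ["ip addr del 192.168.1.100/24 dev eth0"],
    "share-a-export": ["exportfs -u 192.168.1.0/24:/share-a", "exportfs -f"],
    "share-a-fs": ["umount /share-a"],
}


def stop_order(group):
    """Pacemaker stops group members in the reverse of their start order."""
    return list(reversed(group))


def ip_stops_before_export(group, ip_resource, export_resource):
    """True if clients lose the address before the export is removed."""
    order = stop_order(group)
    return order.index(ip_resource) < order.index(export_resource)


def stop_plan(group, commands):
    """Flatten the per-resource stop commands in the order they would run."""
    return [cmd for res in stop_order(group) for cmd in commands[res]]


if __name__ == "__main__":
    assert ip_stops_before_export(GROUP, "share-a-ip", "share-a-export")
    for cmd in stop_plan(GROUP, STOP_COMMANDS):
        print(cmd)
```

Listing the IP last in the group is one way to get this ordering; explicit order constraints between the resources are another.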
NeilBrown
* Re: Problem using exportfs in an active-active nfs cluster
From: RaSca @ 2010-06-11 8:49 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Thu 10 Jun 2010 06:09:08 CET, Neil Brown wrote:
[...]
> Do you disable the network interface that the clients connect to *before*
> unexporting? If you don't, you should.
> Maybe run mountd with "-d all" and see what it is doing when you are
> unexporting and unmounting.
So, after a very long time, here is the solution. You were absolutely
right: I was not disabling the IP address first. It still seems a
little strange that exportfs -u does not unlock the exported
filesystem, but removing the IP first makes things go smoothly, so...
Many thanks again!
Have a good day.
--
RaSca