public inbox for linux-nfs@vger.kernel.org
* Problem using exportfs in an active-active nfs cluster
@ 2010-06-08 16:24 RaSca
From: RaSca @ 2010-06-08 16:24 UTC (permalink / raw)
  To: linux-nfs

Hi all,
I'm trying to use a resource agent named exportfs for my active-active
NFS cluster configuration.
The resource agent works by calling the exportfs command. Each node runs
its own instance of nfs-kernel-server, and exportfs dynamically adds or
removes the export on the node (there is, of course, also shared storage).
The problem comes when the export is mounted by a client and this client
is writing to it: if the resource switches nodes, the migration fails. The
sequence is this:

- The exportfs resource stops correctly (the resource agent runs
exportfs -u).
- The Filesystem resource tries to unmount the exported filesystem,
running fuser to see whether any processes are holding the fs.
- fuser doesn't return anything, but the filesystem is still locked.
This happens because the nfsd kernel process is holding the FS.
- The migration fails.

The only way to make things work again is to restart
nfs-kernel-server on the node where the resource resides and then clean up
the resource.

Now, after many discussions on the Linux-ha mailing list, I'm here to
ask whether this problem lies with the exportfs command. Why does a filesystem
remain locked by the nfsd kernel process even after I (or the resource
agent) have run "exportfs -u"?

What else can I do to free the exported filesystem? Note that I've tried
mounting from the client with the "nolock" option, and the exported
filesystem is mounted with "noatime".

Thanks a lot,

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org

* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-08 21:54   ` Neil Brown
From: Neil Brown @ 2010-06-08 21:54 UTC (permalink / raw)
  To: rasca; +Cc: linux-nfs

On Tue, 08 Jun 2010 18:24:48 +0200
RaSca <rasca@miamammausalinux.org> wrote:

> Hi all,
> I'm trying to use a resource agent named exportfs for my active-active
> NFS cluster configuration.
> The resource agent works by calling the exportfs command. Each node runs
> its own instance of nfs-kernel-server, and exportfs dynamically adds or
> removes the export on the node (there is, of course, also shared storage).
> The problem comes when the export is mounted by a client and this client
> is writing to it: if the resource switches nodes, the migration fails. The
> sequence is this:
> 
> - The exportfs resource stops correctly (the resource agent runs
> exportfs -u).
> - The Filesystem resource tries to unmount the exported filesystem,
> running fuser to see whether any processes are holding the fs.
> - fuser doesn't return anything, but the filesystem is still locked.
> This happens because the nfsd kernel process is holding the FS.
> - The migration fails.
> 
> The only way to make things work again is to restart
> nfs-kernel-server on the node where the resource resides and then clean up
> the resource.
> 
> Now, after many discussions on the Linux-ha mailing list, I'm here to
> ask whether this problem lies with the exportfs command. Why does a filesystem
> remain locked by the nfsd kernel process even after I (or the resource
> agent) have run "exportfs -u"?
> 
> What else can I do to free the exported filesystem? Note that I've tried
> mounting from the client with the "nolock" option, and the exported
> filesystem is mounted with "noatime".
> 
> Thanks a lot,
> 

Try 
  exportfs -f

NeilBrown

* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09  7:43       ` RaSca
From: RaSca @ 2010-06-09  7:43 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
[...]
> Try
>    exportfs -f

Already tried, it didn't work. The file system is still locked by nfsd.

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org

* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09  8:01           ` RaSca
From: RaSca @ 2010-06-09  8:01 UTC (permalink / raw)
  To: rasca; +Cc: Neil Brown, linux-nfs

On Wed 09 Jun 2010 09:43:48 CET, RaSca wrote:
> On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
> [...]
>> Try
>> exportfs -f
> Already tried, it didn't work. The file system is still locked by nfsd.

To be more specific: in my test I first run exportfs -u on the export I
need to move, and then I run exportfs -f.
On the client side, the copy waits for a while, then reports some
"Permission denied" errors and exits.
On the server side the problem, as already said, is this one:

Jun  9 11:54:16 ubuntu-nodo1 lrmd: [692]: info: RA output: (share-a-fs:stop:stderr) umount: /share-a: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
Jun  9 11:54:16 ubuntu-nodo1 lrmd: [692]: info: RA output: (share-a-fs:stop:stderr)
Jun  9 11:54:16 ubuntu-nodo1 Filesystem[4065]: ERROR: Couldn't unmount /share-a; trying cleanup with KILL
Jun  9 11:54:16 ubuntu-nodo1 Filesystem[4065]: INFO: No processes on /share-a were signalled

As you can see, the Filesystem resource agent tries to unmount /share-a,
but even though fuser and lsof give no output, it still remains locked
(by nfsd, of course).

Have you got any other suggestion? Thanks a lot for your help.

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org

* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09  8:04           ` Neil Brown
From: Neil Brown @ 2010-06-09  8:04 UTC (permalink / raw)
  To: rasca; +Cc: linux-nfs

On Wed, 09 Jun 2010 09:43:48 +0200
RaSca <rasca@miamammausalinux.org> wrote:

> On Tue 08 Jun 2010 23:54:25 CET, Neil Brown wrote:
> [...]
> > Try
> >    exportfs -f
> 
> Already tried, it didn't work. The file system is still locked by nfsd.
> 

Seems unlikely ... "exportfs -f" flushes all the export caches in the kernel
thus letting go of any filesystems.
I guess an active NFS request could still hold the fs active, but that should
complete fairly quickly.
File locking might be an issue.  Might a client have a lock on some file in
the filesystem?  Failover of locks is rather more complicated than simple
file-access fail-over.  I don't recall what the status of this is currently.
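
A minimal sketch of coping with a request that is briefly still in flight:
retry the unmount a few times after flushing. The retry helper, the attempt
count and the delay are illustrative assumptions, not something the
Filesystem resource agent is known to do.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on success, 1 if every attempt failed.
# (Hypothetical helper, for illustration only.)
retry() {
    attempts=$1 delay=$2
    shift 2
    n=1
    while ! "$@"; do
        # Give up once the allowed number of attempts has been used.
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For example, after unexporting: exportfs -f && retry 5 1 umount /share-a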

When the umount fails, check the content of
   /proc/net/rpc/nfsd.export/content
and
   /proc/locks

to check what is actually using the filesystem.
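
As a rough sketch, the export-cache check can be wrapped in a small helper
that prints any entry for a given path. The function name and the optional
second argument (a saved copy of the file) are assumptions made for
illustration; only the /proc path comes from this thread.

```shell
#!/bin/sh
# export_cache_entries PATH [CONTENT_FILE]
# Prints the export-cache lines whose path field equals PATH and returns 0
# if at least one was found, 1 otherwise. CONTENT_FILE defaults to
# /proc/net/rpc/nfsd.export/content. (Hypothetical helper.)
export_cache_entries() {
    path=$1
    file=${2:-/proc/net/rpc/nfsd.export/content}
    # An entry may carry a leading "# " marker, as in the output quoted in
    # this thread; such lines are reported unchanged rather than skipped.
    awk -v p="$path" '
        ($1 == p) || ($1 == "#" && $2 == p) { print; found = 1 }
        END { exit !found }
    ' "$file"
}
```

For example: export_cache_entries /share-a || echo "no cache entry for /share-a"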

NeilBrown

* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-09 10:28             ` RaSca
From: RaSca @ 2010-06-09 10:28 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Wed 09 Jun 2010 10:04:39 CET, Neil Brown wrote:
[...]
> Seems unlikely ... "exportfs -f" flushes all the export caches in the kernel
> thus letting go of any filesystems.
> I guess an active NFS request could still hold the fs active, but that should
> complete fairly quickly.
> file locking might be an issue.  Might a client have a lock on some file in
> the filesystem?  Failover of locks is rather more complicated than simple
> file-access fail-over.  I don't recall what the status of this is currently.
> When the umount fails, check the content of
>     /proc/net/rpc/nfsd.export/content
> and
>     /proc/locks
> to check what is actually using the filesystem.

Note that I'm mounting from the client with the nolock option.

Here is the output of the two cat commands:

/proc/net/rpc/nfsd.export/content:

#path domain(flags)
# /share-a	192.168.1.0/24(rw,no_root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=1,uuid=7c80c4af:2a244b39:afadb554:8c8e0574)

/proc/locks:

1: POSIX  ADVISORY  WRITE 753 00:11:3923 0 EOF
2: FLOCK  ADVISORY  WRITE 739 00:11:3916 0 EOF
3: POSIX  ADVISORY  WRITE 522 00:11:3049 0 EOF

What do you think about it?

Thanks a lot!

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org

* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-10  4:09               ` Neil Brown
From: Neil Brown @ 2010-06-10  4:09 UTC (permalink / raw)
  To: rasca; +Cc: linux-nfs

On Wed, 09 Jun 2010 12:28:43 +0200
RaSca <rasca@miamammausalinux.org> wrote:

> On Wed 09 Jun 2010 10:04:39 CET, Neil Brown wrote:
> [...]
> > Seems unlikely ... "exportfs -f" flushes all the export caches in the kernel
> > thus letting go of any filesystems.
> > I guess an active NFS request could still hold the fs active, but that should
> > complete fairly quickly.
> > file locking might be an issue.  Might a client have a lock on some file in
> > the filesystem?  Failover of locks is rather more complicated than simple
> > file-access fail-over.  I don't recall what the status of this is currently.
> > When the umount fails, check the content of
> >     /proc/net/rpc/nfsd.export/content
> > and
> >     /proc/locks
> > to check what is actually using the filesystem.
> 
> Note that I'm mounting from the client with nolock option.
> 
> Here is the output of the two cat commands:
> 
> /proc/net/rpc/nfsd.export/content:
> 
> #path domain(flags)
> # /share-a	192.168.1.0/24(rw,no_root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=1,uuid=7c80c4af:2a244b39:afadb554:8c8e0574)
> 
> /proc/locks:
> 
> 1: POSIX  ADVISORY  WRITE 753 00:11:3923 0 EOF
> 2: FLOCK  ADVISORY  WRITE 739 00:11:3916 0 EOF
> 3: POSIX  ADVISORY  WRITE 522 00:11:3049 0 EOF
> 
> What do you think about it?
>

Clearly there are no locks .. though I wonder what is mounted on 00:11.
Probably not important.
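
One way to answer the "what is mounted on 00:11" question, sketched under
the assumption that /proc/self/mountinfo is available: /proc/locks prints
the device as hexadecimal major:minor, while the third field of mountinfo
uses decimal, so 00:11 corresponds to 0:17. The function name and the
optional file argument are illustrative.

```shell
#!/bin/sh
# mounts_on_dev HEXDEV [MOUNTINFO_FILE]
# HEXDEV is the MAJOR:MINOR pair as shown in /proc/locks (hex, e.g. 00:11).
# Prints the mount point of every mount on that device.
# MOUNTINFO_FILE defaults to /proc/self/mountinfo. (Hypothetical helper.)
mounts_on_dev() {
    hexdev=$1
    file=${2:-/proc/self/mountinfo}
    # Convert both halves from hexadecimal to decimal.
    major=$((0x${hexdev%%:*}))
    minor=$((0x${hexdev##*:}))
    # mountinfo field 3 is major:minor in decimal, field 5 is the mount point.
    awk -v dev="$major:$minor" '$3 == dev { print $5 }' "$file"
}
```

For example: mounts_on_dev 00:11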

The fact that the export entry is there after you did "exportfs -f" strongly
suggests that a new request came in and caused mountd to re-add the entry.

Do you disable the network interface that the clients connect to *before*
unexporting?  If you don't, you should.
Maybe run mountd with "-d all" and see what it is doing when you are
unexporting and unmounting.
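
The stop order described here can be sketched as a short script: remove the
service IP first, then unexport, flush and unmount. The address
192.168.1.100/24 and the interface eth0 used in the example are placeholders,
not values from this thread, and in a cluster the same order would normally
come from resource ordering rather than a hand-written script.

```shell
#!/bin/sh
# run COMMAND [ARGS...]
# Executes the command, or only prints it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# release_share SERVICE_IP NIC CLIENTS MOUNTPOINT
# Stops serving a share in the order suggested above. All four arguments
# are supplied by the caller; nothing here is read from the cluster.
release_share() {
    service_ip=$1 nic=$2 clients=$3 mountpoint=$4
    # 1. Drop the address clients connect to, so no new requests arrive.
    run ip addr del "$service_ip" dev "$nic" &&
    # 2. Remove the export, then flush the kernel export caches.
    run exportfs -u "$clients:$mountpoint" &&
    run exportfs -f &&
    # 3. Only now unmount the filesystem.
    run umount "$mountpoint"
}
```

To preview the commands without running them:
DRY_RUN=1 release_share 192.168.1.100/24 eth0 192.168.1.0/24 /share-a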

NeilBrown

* Re: Problem using exportfs in an active-active nfs cluster
@ 2010-06-11  8:49                 ` RaSca
From: RaSca @ 2010-06-11  8:49 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Thu 10 Jun 2010 06:09:08 CET, Neil Brown wrote:
[...]
> Do you disable the network interface that the clients connect to *before*
> unexporting?  If you don't, you should.
> Maybe run mountd with "-d all" and see what it is doing when you are
> unexporting and unmounting.

So, after a very long time, here is the solution. You were absolutely
right: I was not removing the IP address first. It still seems a
little strange that exportfs -u does not release the exported
filesystem, but removing the IP first makes things go smoothly, so...
Many thanks again!

Have a good day.

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
rasca@miamammausalinux.org
http://www.miamammausalinux.org
