* [uml-devel] Reboot failing with file locked, CLONE_FILES and host kernel BUG()
@ 2004-04-13 21:23 Marcin Pawlik
  2004-04-14  1:13 ` [uml-devel] [patch] " Henrik Nordstrom
  0 siblings, 1 reply; 6+ messages in thread
From: Marcin Pawlik @ 2004-04-13 21:23 UTC (permalink / raw)
  To: user-mode-linux-devel

Hello,

My UML fails to reboot saying:

#v+
F_SETLK failed, file already locked by pid 32589
Failed to lock 'fs', err = 11
#v-

or when .cow file is used:

#v+
F_SETLK failed, file already locked by pid 7530
Failed to lock 'root_fs.cow', err = 11
unable to open root_fs.cow for validation
Initializing stdio console driver
#v-

It happens when I execute the "shutdown -r now" command on the guest system or
"cad" in uml_mconsole. I'm using tt mode and the problem is present in
every version I checked: user-mode-linux 2.4.24-1um-2 from Debian,
manually compiled 2.4.24 and 2.6.4. This behavior doesn't depend on the
host kernel version either: I tried 2.4.18, 2.4.24 and 2.6.5. The problem was
reported in Debian as bug #220679 (and forwarded here by Matt
Zimmerman) but I don't think it was solved.

On a 2.6.5 host kernel it also triggers a host kernel BUG() in
locks_remove_flock from fs/locks.c. fl->flags is FL_POSIX and the kernel
expects FL_FLOCK or FL_LEASE. On 2.4.* kernels the bug doesn't show
since the BUG() line was added in 2.5.

I think UML starts its threads with CLONE_FILES and the main process is
restarted with execvp, which also preserves the lock. In skas mode the
problem is not present because the process that survives the reboot is
also the one holding the lock. In tt mode the lock is not placed by the
tracing thread, so UML cannot place it again.
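
For reference, a minimal sketch of how a message like the one above can
arise, assuming a non-blocking fcntl(F_SETLK) attempt followed by an
F_GETLK query for the conflicting owner. The names try_posix_lock and
demo_conflict_reports_holder are hypothetical and this is not UML's code;
err = 11 corresponds to EAGAIN on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-blocking whole-file POSIX lock. Returns 0 or -errno; on a
 * conflict *holder gets the pid reported by F_GETLK (else -1). */
int try_posix_lock(int fd, int excl, pid_t *holder)
{
	struct flock lock = { .l_type = excl ? F_WRLCK : F_RDLCK,
			      .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };
	int save;

	*holder = -1;
	if (fcntl(fd, F_SETLK, &lock) == 0)
		return 0;
	save = -errno;

	/* Ask the kernel which process is in the way. */
	if (fcntl(fd, F_GETLK, &lock) == 0 && lock.l_type != F_UNLCK)
		*holder = lock.l_pid;
	return save;
}

/* Hypothetical demo: a child locks a temp file, then the parent's
 * attempt fails. Returns 1 if the failure names the child as holder. */
int demo_conflict_reports_holder(void)
{
	char path[] = "/tmp/lock-demo-XXXXXX";
	int ready[2], fd, err, rc;
	pid_t child, holder;
	char c = 0;

	fd = mkstemp(path);
	assert(fd >= 0);
	rc = pipe(ready);
	assert(rc == 0);

	child = fork();
	assert(child >= 0);
	if (child == 0) {
		/* Child: lock through its own open, report, then wait. */
		int cfd = open(path, O_RDWR);
		int ok = cfd >= 0 && try_posix_lock(cfd, 1, &holder) == 0;

		if (write(ready[1], ok ? "1" : "0", 1) != 1)
			_exit(1);
		for (;;)
			pause();
	}
	close(ready[1]);
	rc = (int)read(ready[0], &c, 1);
	assert(rc == 1 && c == '1');

	err = try_posix_lock(fd, 1, &holder);	/* expected to fail */

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	close(ready[0]);
	close(fd);
	unlink(path);
	return (err == -EAGAIN || err == -EACCES) && holder == child;
}
```

On a Linux host, demo_conflict_reports_holder() is expected to return 1:
the parent's attempt fails while the child lives, and F_GETLK names the
child as the owner.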

I tried closing all the files (from the ubd_dev array) in the kill_io_thread
function and it helps, but I have no idea what happens in the host kernel
when UML is not modified. Usually it removes the lock when the main
process dies but sometimes the lock is left. Killing all the other
processes doesn't help here either. I don't know how to reproduce this
without UML. Probably a special combination of clone flags and maybe
ptrace settings used by UML is needed.

Regards,

-- 
Marcin Pawlik


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


* [uml-devel] [patch] Re: Reboot failing with file locked, CLONE_FILES and host kernel BUG()
  2004-04-13 21:23 [uml-devel] Reboot failing with file locked, CLONE_FILES and host kernel BUG() Marcin Pawlik
@ 2004-04-14  1:13 ` Henrik Nordstrom
  2004-04-14 13:32   ` Marcin Pawlik
  0 siblings, 1 reply; 6+ messages in thread
From: Henrik Nordstrom @ 2004-04-14  1:13 UTC (permalink / raw)
  To: Marcin Pawlik; +Cc: user-mode-linux-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 774 bytes --]

On Tue, 13 Apr 2004, Marcin Pawlik wrote:

> #v+
> F_SETLK failed, file already locked by pid 32589
> Failed to lock 'fs', err = 11
> #v-

I have been plagued by this quite a lot. I tried to narrow it down the other
day, but the conclusion was that the host fcntl locking implementation is
buggy and stale locks easily get left behind even after the application has
closed the file or even terminated. This is probably related to the use of
clone(), which somewhat messes up the host's view of which process owns the
lock.

After this I gave up and rewrote this part to use flock instead of fcntl
for locking. It seems to work much better, except that the locks are only
local and do not protect against multiple stations accessing the same
NFS-mounted image.
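
To illustrate one difference between the two lock kinds, here is a small
sketch (not part of the patch; second_lock_in_same_process is a
hypothetical name). It assumes a Linux host and a local filesystem under
/tmp. fcntl locks belong to the process, so a second attempt through
another descriptor in the same process does not conflict; flock locks
belong to the open file description, so a second open() of the same file
does conflict.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical demo: open one temp file twice in the same process and
 * request an exclusive lock through each descriptor in turn.
 * use_flock != 0 selects flock(), otherwise fcntl(F_SETLK).
 * Returns 0 if the second request succeeded, else -errno from it. */
int second_lock_in_same_process(int use_flock)
{
	char path[] = "/tmp/lock-kind-XXXXXX";
	struct flock wr = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int fd1, fd2, first, second;

	fd1 = mkstemp(path);
	assert(fd1 >= 0);
	fd2 = open(path, O_RDWR);
	assert(fd2 >= 0);

	if (use_flock) {
		first = flock(fd1, LOCK_EX | LOCK_NB);
		second = flock(fd2, LOCK_EX | LOCK_NB) ? -errno : 0;
	} else {
		first = fcntl(fd1, F_SETLK, &wr);
		second = fcntl(fd2, F_SETLK, &wr) ? -errno : 0;
	}
	assert(first == 0);

	close(fd1);
	close(fd2);
	unlink(path);
	return second;
}
```

With flock the second request should be refused with EWOULDBLOCK, while
with fcntl it should succeed because the same process already owns the
lock.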

Patch attached.

Regards
Henrik

[-- Attachment #2: Type: TEXT/PLAIN, Size: 898 bytes --]

Index: arch/um/os-Linux/file.c
===================================================================
RCS file: /cvsroot/user-mode-linux/linux/arch/um/os-Linux/file.c,v
retrieving revision 1.29
diff -u -r1.29 file.c
--- arch/um/os-Linux/file.c	7 Apr 2004 20:44:49 -0000	1.29
+++ arch/um/os-Linux/file.c	14 Apr 2004 00:41:22 -0000
@@ -688,6 +688,7 @@
 
 int os_lock_file(int fd, int excl)
 {
+#if USE_FCNTL_LOCK
 	int type = excl ? F_WRLCK : F_RDLCK;
 	struct flock lock = ((struct flock) { .l_type	= type,
 					      .l_whence	= SEEK_SET,
@@ -710,6 +711,21 @@
 	err = save;
  out:
 	return(err);
+#else
+	int type = excl ? LOCK_EX : LOCK_SH;
+	int err, save;
+
+	err = flock(fd, type | LOCK_NB);
+	if(!err)
+		goto out;
+
+	save = -errno;
+
+	printk("file already locked\n");
+	err = save;
+ out:
+	return(err);
+#endif
 }
 
 int os_ftruncate(int fd, __u64 size)


* Re: [uml-devel] [patch] Re: Reboot failing with file locked, CLONE_FILES and host kernel BUG()
  2004-04-14  1:13 ` [uml-devel] [patch] " Henrik Nordstrom
@ 2004-04-14 13:32   ` Marcin Pawlik
  2004-04-14 13:46     ` Marcin Pawlik
  2004-04-14 14:44     ` Henrik Nordstrom
  0 siblings, 2 replies; 6+ messages in thread
From: Marcin Pawlik @ 2004-04-14 13:32 UTC (permalink / raw)
  To: Henrik Nordstrom; +Cc: user-mode-linux-devel

[-- Attachment #1: Type: text/plain, Size: 3037 bytes --]

On Wed, Apr 14 at 03:13, Henrik Nordstrom wrote:
> On Tue, 13 Apr 2004, Marcin Pawlik wrote:
> 
>> #v+
>> F_SETLK failed, file already locked by pid 32589
>> Failed to lock 'fs', err = 11
>> #v-
> 
> Have been plauged by this quite a lot.. tried to narrow it down the
> other day but the conclusion was that the host fcntl locking
> implementation is buggy and stale locks easily gets left behind even
> after application has closed the file 

Do you know where and which thread closes the files? I tried to add file
closing to kill_io_thread() (patch attached) and it helps but I think it
should also be performed without my code.

> or even terminated. Probably related to the use of clone() which
> somewhat messes up the hosts view of which process owning the lock..

If it works as I suspect, clone is used with CLONE_FILES. The lock is
released if any of the file-sharing threads closes the file or when all of
them have finished. The tracing thread never finishes, so if the file is not
explicitly closed the host kernel shouldn't release the lock. This is
correct (the files should simply be closed by UML before reboot).
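
A related rule can be sketched in a single process: fcntl locks belong to
the process, and closing any descriptor that refers to the file drops
them, even a descriptor that was never used for locking. The sketch below
only shows that rule; it does not reproduce clone(CLONE_FILES) threads,
ptrace or UML, and the names is_posix_locked and demo_close_drops_lock
are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Probe from a forked child, because F_GETLK never reports locks owned
 * by the caller. Returns 1 if locked, 0 if not, -1 on error. */
int is_posix_locked(const char *path)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct flock probe = { .l_type = F_WRLCK,
				       .l_whence = SEEK_SET };
		int fd = open(path, O_RDWR);

		if (fd < 0 || fcntl(fd, F_GETLK, &probe) < 0)
			_exit(2);
		_exit(probe.l_type != F_UNLCK);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status);
}

/* Hypothetical demo: lock through fd1, then close an unrelated second
 * descriptor for the same file. Returns 1 if the lock was visible
 * before that close and gone after it. */
int demo_close_drops_lock(void)
{
	char path[] = "/tmp/lock-close-XXXXXX";
	struct flock wr = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int fd1, fd2, rc, before, after;

	fd1 = mkstemp(path);
	assert(fd1 >= 0);
	fd2 = open(path, O_RDONLY);
	assert(fd2 >= 0);
	rc = fcntl(fd1, F_SETLK, &wr);
	assert(rc == 0);

	before = is_posix_locked(path);
	close(fd2);	/* not the descriptor used for locking */
	after = is_posix_locked(path);

	close(fd1);
	unlink(path);
	return before == 1 && after == 0;
}
```

On Linux, demo_close_drops_lock() is expected to return 1, which is why
an explicit close before the reboot is enough to release an fcntl lock.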

The problem with the host kernel is that it sometimes doesn't release the
lock even after all threads have finished, and on 2.6.5 it always hits a
BUG() line in locks_remove_flock. I don't see how this could be
exploited but it should be corrected anyway. On 2.6.5 it leaves the
filesystem in an inconsistent state, with the kernel unable to umount it.
I thought it would be nice to reproduce this with something simpler than
UML before reporting. Unfortunately I don't know enough about UML
internals to mimic its thread creation, ptracing, file locking and
reboot, which should lead to the same behavior.

> After this I gave up and rewrote this part to use flock instead of
> fcntl for locking. Seems to work much better except that locks are
> only local and does not protect from multiple stations accessing the
> same NFS mounted image..
> 
> Patch attached.
> 
>
> Index: arch/um/os-Linux/file.c
> ===================================================================
> RCS file: /cvsroot/user-mode-linux/linux/arch/um/os-Linux/file.c,v
> retrieving revision 1.29
> diff -u -r1.29 file.c
> --- arch/um/os-Linux/file.c	7 Apr 2004 20:44:49 -0000	1.29
> +++ arch/um/os-Linux/file.c	14 Apr 2004 00:41:22 -0000
> @@ -688,6 +688,7 @@
>  
>  int os_lock_file(int fd, int excl)
>  {
> +#if USE_FCNTL_LOCK
>  	int type = excl ? F_WRLCK : F_RDLCK;
>  	struct flock lock = ((struct flock) { .l_type	= type,
>  					      .l_whence	= SEEK_SET,
> @@ -710,6 +711,21 @@
>  	err = save;
>   out:
>  	return(err);
> +#else
> +	int type = excl ? LOCK_EX : LOCK_SH;

I don't understand this. IMO excl should be F_RDLCK or F_WRLCK. F_RDLCK
is 0, F_WRLCK is 1 and LOCK_EX is 2 so you will always use LOCK_SH.
Anyway, I tried the patch on 2.4.24 with uml-patch-2.4.24-2 and it breaks
UML: it is unable to halt or restart, and some of its processes are left
behind. I don't know why, maybe because of mixed flock/fcntl calls.

Regards,

-- 
Marcin Pawlik

[-- Attachment #2: uml-kill-io-thread.patch --]
[-- Type: text/plain, Size: 577 bytes --]

diff -urN kernel-source-2.4.24/arch/um/drivers/ubd_kern.c kernel-source-2.4.24.mp/arch/um/drivers/ubd_kern.c
--- kernel-source-2.4.24/arch/um/drivers/ubd_kern.c	2004-04-14 14:38:21.000000000 +0200
+++ kernel-source-2.4.24.mp/arch/um/drivers/ubd_kern.c	2004-04-14 14:42:55.000000000 +0200
@@ -495,6 +495,16 @@
 
 void kill_io_thread(void)
 {
+	int i;
+	struct ubd * ubd_devp = ubd_dev;
+
+	for(i = 0; i < MAX_DEV; i++, ubd_devp++) {
+		if(ubd_devp) {
+			os_close_file(ubd_devp->fd);
+			close(ubd_devp->cow.fd);	
+		}
+	}
+
 	if(io_pid != -1)
 		os_kill_process(io_pid, 1);
 }


* Re: [uml-devel] [patch] Re: Reboot failing with file locked, CLONE_FILES and host kernel BUG()
  2004-04-14 13:32   ` Marcin Pawlik
@ 2004-04-14 13:46     ` Marcin Pawlik
  2004-04-14 14:44     ` Henrik Nordstrom
  1 sibling, 0 replies; 6+ messages in thread
From: Marcin Pawlik @ 2004-04-14 13:46 UTC (permalink / raw)
  To: Henrik Nordstrom; +Cc: user-mode-linux-devel

On Wed, Apr 14 at 15:32, Marcin Pawlik wrote:
> Do you know where and which thread closes the files? I tried to add
> file closing to kill_io_thread() (patch attached) and it helps
[...]

> diff -urN kernel-source-2.4.24/arch/um/drivers/ubd_kern.c
> kernel-source-2.4.24.mp/arch/um/drivers/ubd_kern.c ---
> kernel-source-2.4.24/arch/um/drivers/ubd_kern.c	2004-04-14
> 14:38:21.000000000 +0200
> +++ kernel-source-2.4.24.mp/arch/um/drivers/ubd_kern.c	2004-04-14
> 14:42:55.000000000 +0200 @@ -495,6 +495,16 @@
>  
>  void kill_io_thread(void)
>  {
> +	int i;
> +	struct ubd * ubd_devp = ubd_dev;
> +
> +	for(i = 0; i < MAX_DEV; i++, ubd_devp++) {
> +		if(ubd_devp) {
> +			os_close_file(ubd_devp->fd);
> +			close(ubd_devp->cow.fd);

To be consistent I should of course change the line above to
"os_close_file(ubd_devp->cow.fd);", sorry.

Regards,

-- 
Marcin Pawlik



* Re: [uml-devel] [patch] Re: Reboot failing with file locked, CLONE_FILES and host kernel BUG()
  2004-04-14 13:32   ` Marcin Pawlik
  2004-04-14 13:46     ` Marcin Pawlik
@ 2004-04-14 14:44     ` Henrik Nordstrom
  2004-04-14 17:09       ` Marcin Pawlik
  1 sibling, 1 reply; 6+ messages in thread
From: Henrik Nordstrom @ 2004-04-14 14:44 UTC (permalink / raw)
  To: Marcin Pawlik; +Cc: Henrik Nordstrom, user-mode-linux-devel

On Wed, 14 Apr 2004, Marcin Pawlik wrote:

> Do you know where and which thread closes the files? I tried to add file
> closing to kill_io_thread() (patch attached) and it helps but I think it
> should also be performed without my code.

No, I do not remember, but the thread which originally opened and locked 
the file is apparently not around after the UML has booted.

> The problem with host kernel is that it sometimes doesn't release the
> lock even after all threads are finished and on 2.6.5 always hits a
> BUG() line in locks_remove_flock. I don't see how this could be
> exploited but it should be corrected anyway. On 2.6.5 it leaves
> filesystem in inconsistent state with kernel unable to umount it.
> I thought it would be nice to reproduce this with something simpler than
> UML before reporting. Unfortunately I don't have sufficient UML
> internals knowledge to mimic its threads creation, ptracing, file
> locking and reboot which should lead to the same behavior.

Indeed. I am not of much help here however..

> >  int os_lock_file(int fd, int excl)
> >  {
> > +#if USE_FCNTL_LOCK
> >  	int type = excl ? F_WRLCK : F_RDLCK;
> >  	struct flock lock = ((struct flock) { .l_type	= type,
> >  					      .l_whence	= SEEK_SET,
> > @@ -710,6 +711,21 @@
> >  	err = save;
> >   out:
> >  	return(err);
> > +#else
> > +	int type = excl ? LOCK_EX : LOCK_SH;
> 
> I don't understand this. IMO excl should be F_RDLCK or F_WRLCK. F_RDLCK
> is 0, F_WRLCK is 1 and LOCK_EX is 2 so you will always use LOCK_SH.

???

excl is a boolean: true if the lock should be exclusive (write access),
false if it is a shared lock (read-only). This is how the UML function
os_lock_file is defined. This function does not expect fcntl lock names as
its argument.

In addition, the flock API does not use the F_XXX flags. None of the code
mentioning F_XXX flags is relevant to the flock implementation, which is
below, after the #else.

What the patch does is completely replace os_lock_file with
another implementation using flock instead of fcntl, keeping the old
implementation under #if USE_FCNTL_LOCK (which is not defined).
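
As a standalone sketch of that structure (an illustration only, not the
patch itself; lock_file and demo_shared_then_exclusive are hypothetical
names): the boolean selects between the shared and exclusive constants
of whichever API is compiled in, and since an undefined macro evaluates
to 0 in #if, the flock branch is the one that gets built.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* excl is a boolean: nonzero = exclusive (write), zero = shared (read).
 * Returns 0 on success or -errno. USE_FCNTL_LOCK is left undefined
 * here, so #if sees 0 and the flock branch is compiled. */
int lock_file(int fd, int excl)
{
#if USE_FCNTL_LOCK
	struct flock lock = { .l_type = excl ? F_WRLCK : F_RDLCK,
			      .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };

	if (fcntl(fd, F_SETLK, &lock) == 0)
		return 0;
#else
	if (flock(fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == 0)
		return 0;
#endif
	return -errno;
}

/* Hypothetical demo: two shared locks on separate opens coexist, then
 * an exclusive request through a third open is made. Returns the
 * result of that last request. */
int demo_shared_then_exclusive(void)
{
	char path[] = "/tmp/lock-excl-XXXXXX";
	int fd[3], i, shared_ok, result;

	fd[0] = mkstemp(path);
	fd[1] = open(path, O_RDWR);
	fd[2] = open(path, O_RDWR);
	for (i = 0; i < 3; i++)
		assert(fd[i] >= 0);

	shared_ok = lock_file(fd[0], 0) == 0 && lock_file(fd[1], 0) == 0;
	assert(shared_ok);
	result = lock_file(fd[2], 1);

	for (i = 0; i < 3; i++)
		close(fd[i]);
	unlink(path);
	return result;
}
```

Built as shown (flock branch), the exclusive request should be refused
with EWOULDBLOCK while the two shared locks are held.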

> Anyway I tried the patch on 2.4.24 with uml-patch-2.4.24-2 and it breaks
> UML. It is unable to halt or restart with some of its processes left.
> I don't know why, maybe because of mixed flock/fcntl calls.

Seems to work here.. there are no other uses of F_SETLK in my UML tree.
Using this successfully on RedHat-8 (2.4.20-something host kernel, no SKAS)
and Fedora Core 2 test 1 + SKAS (2.6.something + SKAS host kernel)..

Regards
Henrik




* Re: [uml-devel] [patch] Re: Reboot failing with file locked, CLONE_FILES and host kernel BUG()
  2004-04-14 14:44     ` Henrik Nordstrom
@ 2004-04-14 17:09       ` Marcin Pawlik
  0 siblings, 0 replies; 6+ messages in thread
From: Marcin Pawlik @ 2004-04-14 17:09 UTC (permalink / raw)
  To: Henrik Nordstrom; +Cc: user-mode-linux-devel

On Wed, Apr 14 at 16:44, Henrik Nordstrom wrote:
> On Wed, 14 Apr 2004, Marcin Pawlik wrote:
> 
>> Do you know where and which thread closes the files? I tried to add
>> file closing to kill_io_thread() (patch attached) and it helps but I
>> think it should also be performed without my code.
> 
> No, I do not remember, but the thread which originally opened and
> locked the file is apparently not around after the UML has booted.

Yes, but this is not necessarily a problem. Any thread sharing the file
descriptor table can close (and therefore unlock) the file.

[...]
>>>  int os_lock_file(int fd, int excl)
>>>  {
>>> +#if USE_FCNTL_LOCK
>>>  	int type = excl ? F_WRLCK : F_RDLCK;
>>>  	struct flock lock = ((struct flock) { .l_type	= type,
>>>  					      .l_whence	= SEEK_SET,
>>> @@ -710,6 +711,21 @@
>>>  	err = save;
>>>   out:
>>>  	return(err);
>>> +#else
>>> +	int type = excl ? LOCK_EX : LOCK_SH;
>> 
>> I don't understand this. IMO excl should be F_RDLCK or F_WRLCK.
>> F_RDLCK is 0, F_WRLCK is 1 and LOCK_EX is 2 so you will always use
>> LOCK_SH.
> 
> ???
> 
> excl is a boolean, true if the lock should be exclusive (write

Oops. Yes, you are absolutely right. I thought... Well, I don't know.
I'm sorry. I should probably get some sleep :/

[...]
> Seems to works here.. there os no other uses of F_SETLK in my uml
> tree.  Using this successfully on RedHat-8 (2.4.20 somthing host
> kernel, no SKAS)  and Fedora Core 2 test 1 + SKAS (2.6.something +
> SKAS host kernel)..

I tried it on Debian testing/unstable with different host kernels
(2.4.25, 2.4.25 with skas, 2.6.5, 2.4.18-1-k7 from Debian) and the same
binary on RHEL 3.0 with some 2.4.21. It doesn't work for me. After "cad" or
"halt" in uml_mconsole I have sleeping and traced processes left.

I put my test UML binary and the filesystem (with an infinite loop in
/sbin/init) at http://www.pwr.wroc.pl/~marcinp/uml/uml.tar.gz (~1.2 MB).
Maybe it depends on the UML configuration or the compiler used. Could you
send me your --showconfig?

Regards,

-- 
Marcin Pawlik



end of thread, other threads:[~2004-04-14 18:49 UTC | newest]

Thread overview: 6+ messages
2004-04-13 21:23 [uml-devel] Reboot failing with file locked, CLONE_FILES and host kernel BUG() Marcin Pawlik
2004-04-14  1:13 ` [uml-devel] [patch] " Henrik Nordstrom
2004-04-14 13:32   ` Marcin Pawlik
2004-04-14 13:46     ` Marcin Pawlik
2004-04-14 14:44     ` Henrik Nordstrom
2004-04-14 17:09       ` Marcin Pawlik
