* [uml-devel] Occasional hang starting up.
@ 2006-03-11 18:01 Rob Landley
2006-03-12 20:33 ` Blaisorblade
2006-03-14 20:22 ` Jeff Dike
0 siblings, 2 replies; 19+ messages in thread
From: Rob Landley @ 2006-03-11 18:01 UTC (permalink / raw)
To: user-mode-linux-devel
On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with
rootfstype=hostfs), right after the initializing VFS line. This time when I
killed it, I got the following error (repeated twice):
remove_umid_dir - actually_do_remove failed with err = -2
I'm guessing that's where it's hanging?
Rob
--
Never bet against the cheap plastic solution.
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-11 18:01 [uml-devel] Occasional hang starting up Rob Landley
@ 2006-03-12 20:33 ` Blaisorblade
2006-03-14 20:22 ` Jeff Dike
1 sibling, 0 replies; 19+ messages in thread
From: Blaisorblade @ 2006-03-12 20:33 UTC (permalink / raw)
To: user-mode-linux-devel; +Cc: Rob Landley
On Saturday 11 March 2006 19:01, Rob Landley wrote:
> On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with
> rootfstype=hostfs), right after the initializing VFS line. This time when
> I killed it, I got the following error (repeated twice):
>
> remove_umid_dir - actually_do_remove failed with err = -2
> I'm guessing that's where it's hanging?
I guess not.
That's a shutdown function - when UML exits it should remove ~/.uml/<umid> and
contained files.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
___________________________________
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB
http://mail.yahoo.it
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-11 18:01 [uml-devel] Occasional hang starting up Rob Landley
2006-03-12 20:33 ` Blaisorblade
@ 2006-03-14 20:22 ` Jeff Dike
2006-03-14 21:31 ` Rob Landley
2006-03-16 14:22 ` Werner Almesberger
1 sibling, 2 replies; 19+ messages in thread
From: Jeff Dike @ 2006-03-14 20:22 UTC (permalink / raw)
To: Rob Landley; +Cc: user-mode-linux-devel
On Sat, Mar 11, 2006 at 01:01:41PM -0500, Rob Landley wrote:
> On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with
> rootfstype=hostfs), right after the initializing VFS line.
I'm in the process of beating on the newer externfs-based hostfs and humfs,
so I'm not too concerned with bugs in the old hostfs. Having said that,
if you can get a stack from the hang, it might turn out to be an easily
diagnosed and fixed bug.
> This time when I
> killed it, I got the following error (repeated twice):
>
> remove_umid_dir - actually_do_remove failed with err = -2
There's some confusion sometimes during shutdown when UML is forcibly killed.
I haven't really looked at this, but it's harmless.
Jeff
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-14 20:22 ` Jeff Dike
@ 2006-03-14 21:31 ` Rob Landley
2006-03-16 14:22 ` Werner Almesberger
1 sibling, 0 replies; 19+ messages in thread
From: Rob Landley @ 2006-03-14 21:31 UTC (permalink / raw)
To: user-mode-linux-devel; +Cc: Jeff Dike
On Tuesday 14 March 2006 3:22 pm, Jeff Dike wrote:
> On Sat, Mar 11, 2006 at 01:01:41PM -0500, Rob Landley wrote:
> > On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with
> > rootfstype=hostfs), right after the initializing VFS line.
>
> I'm in the process of beating on the newer externfs-based hostfs and humfs,
> so I'm not too concerned with bugs in the old hostfs. Having said that,
> if you can get a stack from the hang, it might turn out to be an easily
> diagnosed and fixed bug.
If it's in code that's going away I can live with it. Let me know when the
new hostfs is ready for wider testing and I'll give it a whirl...
Rob
--
Never bet against the cheap plastic solution.
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-14 20:22 ` Jeff Dike
2006-03-14 21:31 ` Rob Landley
@ 2006-03-16 14:22 ` Werner Almesberger
2006-03-17 19:36 ` Blaisorblade
1 sibling, 1 reply; 19+ messages in thread
From: Werner Almesberger @ 2006-03-16 14:22 UTC (permalink / raw)
To: Jeff Dike; +Cc: Rob Landley, user-mode-linux-devel
Jeff Dike wrote:
> Having said that,
> if you can get a stack from the hang, it might turn out to be an easily
> diagnosed and fixed bug.
Seems that arch/um/kernel/sigio_user.c:write_sigio_thread begins
its poll before there are any fds in current_poll, so update_thread
caused by a add_sigio_fd shortly thereafter hangs while waiting for
a response.
A sleep(1) at the beginning of write_sigio_thread "fixes" this :-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-16 14:22 ` Werner Almesberger
@ 2006-03-17 19:36 ` Blaisorblade
2006-03-17 21:27 ` Werner Almesberger
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Blaisorblade @ 2006-03-17 19:36 UTC (permalink / raw)
To: user-mode-linux-devel; +Cc: Werner Almesberger, Jeff Dike, Rob Landley
[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]
On Thursday 16 March 2006 15:22, Werner Almesberger wrote:
> Jeff Dike wrote:
> > Having said that,
> > if you can get a stack from the hang, it might turn out to be an easily
> > diagnosed and fixed bug.
>
> Seems that arch/um/kernel/sigio_user.c:write_sigio_thread begins
> its poll before there are any fds in current_poll
Ah, and you mean it hangs...
> , so update_thread
> caused by a add_sigio_fd shortly thereafter hangs while waiting for
> a response.
Clearly
> A sleep(1) at the beginning of write_sigio_thread "fixes" this :-)
Er... there's only one thing which seems unclear - poll(NULL, 0, -1) shouldn't
fail with -EFAULT instead of hanging? Yep, it should. I didn't believe what
you said, so. But then I went to check the implementation. And, indeed, it
seems that Linux isn't up to this :-).
So:
1) I'll try to fix poll(2) to return -EINVAL. Dunno whether anyone will say
"no, the app is stupid, it deserves no error", but hope not (with "try" I
refer to this). Attached patch should do this.
2) write_sigio_thread should do a "down" on a semaphore/mutex and the first
update_thread should "up" it. As usually, this semaphore would be indeed
implemented as a pipe.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
[-- Attachment #2: poll-mustn_t-hang-0-fds.patch --]
[-- Type: text/x-diff, Size: 1062 bytes --]
poll(2): don't hang caller if nfds == 0
From: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>, Werner Almesberger <wa@almesberger.net>
As Werner Almesberger observed in a real program, poll(2) will never return if
called with nfds == 0; which is stupid but not so badly deserving.
I looked at the code and verified that indeed such a case is never detected and
do_poll keeps looping, checking any fd passed, overlooking it wasn't passed any
and calling schedule_timeout() endlessly. At least it's a TASK_INTERRUPTIBLE
sleep...
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Index: linux-2.6.git/fs/select.c
===================================================================
--- linux-2.6.git.orig/fs/select.c
+++ linux-2.6.git/fs/select.c
@@ -629,6 +629,10 @@ int do_sys_poll(struct pollfd __user *uf
struct fdtable *fdt;
int max_fdset;
+ /* Otherwise below code would hang! */
+ if (unlikely(nfds < 1))
+ return -EINVAL;
+
/* Do a sanity check on nfds ... */
rcu_read_lock();
fdt = files_fdtable(current->files);
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-17 19:36 ` Blaisorblade
@ 2006-03-17 21:27 ` Werner Almesberger
2006-03-17 22:14 ` Blaisorblade
2006-03-20 10:20 ` Werner Almesberger
2006-03-21 1:00 ` Jeff Dike
2 siblings, 1 reply; 19+ messages in thread
From: Werner Almesberger @ 2006-03-17 21:27 UTC (permalink / raw)
To: Blaisorblade; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley
Blaisorblade wrote:
> 1) I'll try to fix poll(2) to return -EINVAL. Dunno whether anyone will say
> "no, the app is stupid, it deserves no error", but hope not (with "try" I
> refer to this). Attached patch should do this.
I think the behaviour of "poll" with nfds = 0 is correct as it is.
At least POSIX doesn't say anything to the extent that "nfds"
couldn't be zero:
http://www.opengroup.org/onlinepubs/009695399/functions/poll.html
Furthermore, this is one of several ways to implement a "sleep"
function with sub-second granularity, so existing applications
may already depend on "poll" accepting nfds = 0. (Of course,
they would probably be better off using "nanosleep".)
> 2) write_sigio_thread should do a "down" on a semaphore/mutex and the first
> update_thread should "up" it.
Yup :-)
Thanks,
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-17 21:27 ` Werner Almesberger
@ 2006-03-17 22:14 ` Blaisorblade
2006-03-17 23:54 ` Werner Almesberger
0 siblings, 1 reply; 19+ messages in thread
From: Blaisorblade @ 2006-03-17 22:14 UTC (permalink / raw)
To: Werner Almesberger; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley
On Friday 17 March 2006 22:27, Werner Almesberger wrote:
> Blaisorblade wrote:
> > 1) I'll try to fix poll(2) to return -EINVAL. Dunno whether anyone will
> > say "no, the app is stupid, it deserves no error", but hope not (with
> > "try" I refer to this). Attached patch should do this.
>
> I think the behaviour of "poll" with nfds = 0 is correct as it is.
> At least POSIX doesn't say anything to the extent that "nfds"
> couldn't be zero:
> http://www.opengroup.org/onlinepubs/009695399/functions/poll.html
I don't think that "hanging" is a correct behaviour anyway... unless the
application *wanted* to wait for a signal, but this is a clumsy way to do
this (I don't think there's any difference from pause()).
> Furthermore, this is one of several ways to implement a "sleep"
> function with sub-second granularity
> , so existing applications
> may already depend on "poll" accepting nfds = 0. (Of course,
> they would probably be better off using "nanosleep".)
This is a very smart note, however usleep(3) exists since 4.3 BSD so this is a
bit unlikely; however you're surely right on this.
> > 2) write_sigio_thread should do a "down" on a semaphore/mutex and the
> > first update_thread should "up" it.
>
> Yup :-)
>
> Thanks,
> - Werner
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
___________________________________
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB
http://mail.yahoo.it
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-17 22:14 ` Blaisorblade
@ 2006-03-17 23:54 ` Werner Almesberger
0 siblings, 0 replies; 19+ messages in thread
From: Werner Almesberger @ 2006-03-17 23:54 UTC (permalink / raw)
To: Blaisorblade; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley
Blaisorblade wrote:
> however usleep(3) exists since 4.3 BSD so this is a
> bit unlikely;
Oh, there must be a million of passably valid reasons why someone
may still do it this way, e.g.,
- because they don't know of the more suitable function
- because they don't trust the more suitable function for some
reason
- because they're using some restricted set of functions,
e.g., from a portable library
- because the other function isn't available on all platforms
they care about, while the "poll" functionality (perhaps
through some abstraction that may also use "select") is
- because they use a signal to get out of the initial "hang"
- because the semantics of their code actually say "I/O or
timeout"
- because that's what the code that has passed all those
reviews and certifications does, and nobody is allowed to
touch it. Ever.
etc.
Just because this valid if unexpected behaviour was a little
inconvenient for us isn't a good reason for breaking it. If
a safe, easy, and incredibly boring life was the top item on
our agenda, we should start with getting rid of concurrency ;-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-17 19:36 ` Blaisorblade
2006-03-17 21:27 ` Werner Almesberger
@ 2006-03-20 10:20 ` Werner Almesberger
2006-03-21 1:00 ` Jeff Dike
2 siblings, 0 replies; 19+ messages in thread
From: Werner Almesberger @ 2006-03-20 10:20 UTC (permalink / raw)
To: Blaisorblade; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley
Blaisorblade wrote:
> 2) write_sigio_thread should do a "down" on a semaphore/mutex and the first
> update_thread should "up" it. As usually, this semaphore would be indeed
> implemented as a pipe.
Be the way, in case a proper fix is too involved for 2.6.16.x,
here's a quick and dirty work-around that should have no ill
side-effects:
This hack works around a race condition that can make UML hang
on startup. If the race condition is encountered, this change
limits the time of the hang to one second. Besides that, it can
cause a small performance penalty by waking up an otherwise
idle "poll" every second.
This is intended purely as a readily applicable band aid and
should not be considered a long-term fix.
- Werner
---------------------------------- cut here -----------------------------------
Signed-off-by: Werner Almesberger <werner@almesberger.net>
--- linux-2.6.16.orig/arch/um/kernel/sigio_user.c 2006-03-20 02:53:29.%N -0300
+++ linux-2.6.16/arch/um/kernel/sigio_user.c 2006-03-20 06:27:11.%N -0300
@@ -184,7 +184,7 @@
signal(SIGWINCH, SIG_IGN);
fds = ¤t_poll;
while(1){
- n = poll(fds->poll, fds->used, -1);
+ n = poll(fds->poll, fds->used, 1000);
if(n < 0){
if(errno == EINTR) continue;
printk("write_sigio_thread : poll returned %d, "
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-17 19:36 ` Blaisorblade
2006-03-17 21:27 ` Werner Almesberger
2006-03-20 10:20 ` Werner Almesberger
@ 2006-03-21 1:00 ` Jeff Dike
2006-03-21 1:21 ` Blaisorblade
2 siblings, 1 reply; 19+ messages in thread
From: Jeff Dike @ 2006-03-21 1:00 UTC (permalink / raw)
To: Blaisorblade; +Cc: user-mode-linux-devel, Werner Almesberger, Rob Landley
On Fri, Mar 17, 2006 at 08:36:58PM +0100, Blaisorblade wrote:
> 2) write_sigio_thread should do a "down" on a semaphore/mutex and the first
> update_thread should "up" it. As usually, this semaphore would be indeed
> implemented as a pipe.
Is there anything wrong with this:
Index: linux-2.6.15/arch/um/os-Linux/sigio.c
===================================================================
--- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-20 19:48:51.000000000 -0500
+++ linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-20 19:49:55.000000000 -0500
@@ -271,6 +271,10 @@ void write_sigio_workaround(void)
if(write_sigio_pid != -1)
goto out_unlock;
+ current_poll = ((struct pollfds) { .poll = p,
+ .used = 1,
+ .size = 1 });
+
write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
CLONE_FILES | CLONE_VM, &stack, 0);
@@ -284,10 +288,6 @@ void write_sigio_workaround(void)
memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds));
memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private));
- current_poll = ((struct pollfds) { .poll = p,
- .used = 1,
- .size = 1 });
-
sigio_unlock();
return;
That seems to me to fix the basic problem - the thread is using data
that hasn't been set up yet, and I don't see that moving it up breaks
anything.
Jeff
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-21 1:00 ` Jeff Dike
@ 2006-03-21 1:21 ` Blaisorblade
2006-03-23 18:45 ` Jeff Dike
0 siblings, 1 reply; 19+ messages in thread
From: Blaisorblade @ 2006-03-21 1:21 UTC (permalink / raw)
To: user-mode-linux-devel; +Cc: Jeff Dike, Werner Almesberger, Rob Landley
On Tuesday 21 March 2006 02:00, Jeff Dike wrote:
> On Fri, Mar 17, 2006 at 08:36:58PM +0100, Blaisorblade wrote:
> > 2) write_sigio_thread should do a "down" on a semaphore/mutex and the
> > first update_thread should "up" it. As usually, this semaphore would be
> > indeed implemented as a pipe.
> Is there anything wrong with this:
> Index: linux-2.6.15/arch/um/os-Linux/sigio.c
> ===================================================================
> --- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-20
> 19:48:51.000000000 -0500 +++
> linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-20 19:49:55.000000000 -0500
> @@ -271,6 +271,10 @@ void write_sigio_workaround(void)
> if(write_sigio_pid != -1)
> goto out_unlock;
>
> + current_poll = ((struct pollfds) { .poll = p,
> + .used = 1,
> + .size = 1 });
> +
> write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
> CLONE_FILES | CLONE_VM, &stack, 0);
>
> @@ -284,10 +288,6 @@ void write_sigio_workaround(void)
> memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds));
> memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private));
>
> - current_poll = ((struct pollfds) { .poll = p,
> - .used = 1,
> - .size = 1 });
> -
> sigio_unlock();
> return;
>
> That seems to me to fix the basic problem - the thread is using data
> that hasn't been set up yet, and I don't see that moving it up breaks
> anything.
> Jeff
Ok, this is conceptually correct, and looking at 2.6.14 code this bug wasn't
there, so _I_ introduced it, while cleaning it up. Sorry. Beyond, my idea of
adding the semaphore was incorrect (it would work only if
write_sigio_workaround did the up()).
However, while I overlooked this, you overlooked what I did:
*) you must also move the setting of write_sigio_fds and sigio_private above,
or we'll get the same problem again in different places of
write_sigio_thread.
*) you must update the exit path. I've taken care so that only if everything
succeed the result is stored, and to hold the lock for the minimum time
needed (particularly not while doing any allocation). I'm not sure whether
the first property is needed, but it would be conservative to do so to
preserve coherence of data structures.
In particular: if we store an fd != -1, some function may later close it (say
when the console is closed); if we closed it in the exit path, then we're
closing an unrelated fd and getting a bug.
So, if the thread startup fails, you _must_ clear again current_poll,
write_sigio_fds and sigio_private.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
___________________________________
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB
http://mail.yahoo.it
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-21 1:21 ` Blaisorblade
@ 2006-03-23 18:45 ` Jeff Dike
2006-03-23 19:05 ` Werner Almesberger
2006-03-24 0:37 ` Blaisorblade
0 siblings, 2 replies; 19+ messages in thread
From: Jeff Dike @ 2006-03-23 18:45 UTC (permalink / raw)
To: Blaisorblade; +Cc: user-mode-linux-devel, Werner Almesberger, Rob Landley
On Tue, Mar 21, 2006 at 02:21:40AM +0100, Blaisorblade wrote:
> However, while I overlooked this, you overlooked what I did:
> *) you must also move the setting of write_sigio_fds and sigio_private above,
> or we'll get the same problem again in different places of
> write_sigio_thread.
Fixed.
> *) you must update the exit path. I've taken care so that only if everything
> succeed the result is stored, and to hold the lock for the minimum time
> needed (particularly not while doing any allocation). I'm not sure whether
> the first property is needed, but it would be conservative to do so to
> preserve coherence of data structures.
>
> In particular: if we store an fd != -1, some function may later close it (say
> when the console is closed); if we closed it in the exit path, then we're
> closing an unrelated fd and getting a bug.
>
> So, if the thread startup fails, you _must_ clear again current_poll,
> write_sigio_fds and sigio_private.
OK, this is all done, I think.
What do you think about this?
Index: linux-2.6.15/arch/um/os-Linux/sigio.c
===================================================================
--- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-23 12:55:27.000000000 -0500
+++ linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-23 13:41:11.000000000 -0500
@@ -269,42 +269,41 @@ void write_sigio_workaround(void)
/* Did we race? Don't try to optimize this, please, it's not so likely
* to happen, and no more than once at the boot. */
if(write_sigio_pid != -1)
- goto out_unlock;
+ goto out_free;
- write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
- CLONE_FILES | CLONE_VM, &stack, 0);
-
- if (write_sigio_pid < 0)
- goto out_clear;
+ current_poll = ((struct pollfds) { .poll = p,
+ .used = 1,
+ .size = 1 });
if (write_sigio_irq(l_write_sigio_fds[0]))
- goto out_kill;
+ goto out_free;
- /* Success, finally. */
memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds));
memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private));
- current_poll = ((struct pollfds) { .poll = p,
- .used = 1,
- .size = 1 });
+ write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
+ CLONE_FILES | CLONE_VM, &stack, 0);
- sigio_unlock();
- return;
+ if (write_sigio_pid < 0)
+ goto out_clear;
- out_kill:
- l_write_sigio_pid = write_sigio_pid;
- write_sigio_pid = -1;
sigio_unlock();
- /* Going to call waitpid, avoid holding the lock. */
- os_kill_process(l_write_sigio_pid, 1);
- goto out_free;
+ return;
out_clear:
write_sigio_pid = -1;
- out_unlock:
- sigio_unlock();
+ current_poll = ((struct pollfds) { .poll = NULL,
+ .size = 0,
+ .used = 0 });
+ write_sigio_fds[0] = -1;
+ write_sigio_fds[1] = -1;
+
+ sigio_private[0] = -1;
+ sigio_private[1] = -1;
+
out_free:
kfree(p);
+ sigio_unlock();
out_close2:
close(l_sigio_private[0]);
close(l_sigio_private[1]);
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-23 18:45 ` Jeff Dike
@ 2006-03-23 19:05 ` Werner Almesberger
2006-03-23 20:13 ` Jeff Dike
2006-03-24 0:37 ` Blaisorblade
1 sibling, 1 reply; 19+ messages in thread
From: Werner Almesberger @ 2006-03-23 19:05 UTC (permalink / raw)
To: Jeff Dike; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley
Jeff Dike wrote:
> Index: linux-2.6.15/arch/um/os-Linux/sigio.c
Hmm, is this a link ?
% find linux-2.6.15.5/arch/um -name sigio.c
%
The patch applies cleanly to arch/um/kernel/sigio_user.c of 2.6.16,
though, and indeed seems to solve the problem.
Thanks,
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-23 19:05 ` Werner Almesberger
@ 2006-03-23 20:13 ` Jeff Dike
2006-03-23 21:23 ` Werner Almesberger
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Dike @ 2006-03-23 20:13 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley
On Thu, Mar 23, 2006 at 04:05:33PM -0300, Werner Almesberger wrote:
> Hmm, is this a link ?
>
> % find linux-2.6.15.5/arch/um -name sigio.c
> %
Nope, it's just later in my patchset than one which moves the file.
> The patch applies cleanly to arch/um/kernel/sigio_user.c of 2.6.16,
> though, and indeed seems to solve the problem.
Yup, and that's the original location.
Jeff
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-23 20:13 ` Jeff Dike
@ 2006-03-23 21:23 ` Werner Almesberger
2006-03-23 21:43 ` Jeff Dike
0 siblings, 1 reply; 19+ messages in thread
From: Werner Almesberger @ 2006-03-23 21:23 UTC (permalink / raw)
To: Jeff Dike; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley
Jeff Dike wrote:
> Nope, it's just later in my patchset than one which moves the file.
Ah, mysteries of the world, easily explained :-)
Do you plan to submit this for 2.6.16.x ?
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-23 21:23 ` Werner Almesberger
@ 2006-03-23 21:43 ` Jeff Dike
0 siblings, 0 replies; 19+ messages in thread
From: Jeff Dike @ 2006-03-23 21:43 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley
On Thu, Mar 23, 2006 at 06:23:11PM -0300, Werner Almesberger wrote:
> Ah, mysteries of the world, easily explained :-)
:-)
> Do you plan to submit this for 2.6.16.x ?
Haven't thought about that yet. It would certainly be a candidate.
Jeff
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-23 18:45 ` Jeff Dike
2006-03-23 19:05 ` Werner Almesberger
@ 2006-03-24 0:37 ` Blaisorblade
2006-03-24 2:26 ` Jeff Dike
1 sibling, 1 reply; 19+ messages in thread
From: Blaisorblade @ 2006-03-24 0:37 UTC (permalink / raw)
To: user-mode-linux-devel; +Cc: Jeff Dike, Werner Almesberger, Rob Landley
[-- Attachment #1: Type: text/plain, Size: 3630 bytes --]
On Thursday 23 March 2006 19:45, Jeff Dike wrote:
> On Tue, Mar 21, 2006 at 02:21:40AM +0100, Blaisorblade wrote:
> > However, while I overlooked this, you overlooked what I did:
> > *) you must also move the setting of write_sigio_fds and sigio_private
> > above, or we'll get the same problem again in different places of
> > write_sigio_thread.
> OK, this is all done, I think.
> What do you think about this?
Yep, it seems it looks good. But 1 bug still and two suggestions.
See also attached patch (it misses one of the suggestions because it's late
and I'm going to sleep).
> Index: linux-2.6.15/arch/um/os-Linux/sigio.c
> ===================================================================
> --- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-23
> 12:55:27.000000000 -0500 +++
> linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-23 13:41:11.000000000 -0500
> @@ -269,42 +269,41 @@ void write_sigio_workaround(void)
> /* Did we race? Don't try to optimize this, please, it's not so likely
> * to happen, and no more than once at the boot. */
> if(write_sigio_pid != -1)
> - goto out_unlock;
> + goto out_free;
>
> - write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
> - CLONE_FILES | CLONE_VM, &stack, 0);
> -
> - if (write_sigio_pid < 0)
> - goto out_clear;
> + current_poll = ((struct pollfds) { .poll = p,
> + .used = 1,
> + .size = 1 });
>
> if (write_sigio_irq(l_write_sigio_fds[0]))
> - goto out_kill;
> + goto out_free;
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Here you're not clearing current_poll!
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> - /* Success, finally. */
> memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds));
> memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private));
>
> - current_poll = ((struct pollfds) { .poll = p,
> - .used = 1,
> - .size = 1 });
> + write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
> + CLONE_FILES | CLONE_VM, &stack, 0);
>
> - sigio_unlock();
> - return;
> + if (write_sigio_pid < 0)
> + goto out_clear;
>
> - out_kill:
> - l_write_sigio_pid = write_sigio_pid;
> - write_sigio_pid = -1;
> sigio_unlock();
> - /* Going to call waitpid, avoid holding the lock. */
> - os_kill_process(l_write_sigio_pid, 1);
> - goto out_free;
> + return;
>
> out_clear:
> write_sigio_pid = -1;
> - out_unlock:
> - sigio_unlock();
> + current_poll = ((struct pollfds) { .poll = NULL,
> + .size = 0,
> + .used = 0 });
> + write_sigio_fds[0] = -1;
> + write_sigio_fds[1] = -1;
> +
> + sigio_private[0] = -1;
> + sigio_private[1] = -1;
> +
If possible (there should be such array constructors) this shouldn't be
redundant / duplicated with the initial declarations.
Like write_sigio_fds = (int[]) {-1,-1} and having a common macro like
#define SIGIO_FDS_INIT {-1,-1}
static int write_sigio_fds[2] = SIGIO_FDS_INIT;
...
write_sigio_fds = (int[]) SIGIO_FDS_INIT
(we _will_ get one copy wrong, if we can :-)).
If sending to 2.6.16 this can be kept separate or they could maybe complain
"ah, no extra changes, just the fix!".
Btw, this one is not attached.
> out_free:
> kfree(p);
> + sigio_unlock();
p is a local variable, so sigio_unlock can be done before kfree(p).
> out_close2:
> close(l_sigio_private[0]);
> close(l_sigio_private[1]);
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
[-- Attachment #2: uml-fix-sigio-startup-race-hang-fix.patch --]
[-- Type: text/x-diff, Size: 1076 bytes --]
Index: linux-2.6.git/arch/um/kernel/sigio_user.c
===================================================================
--- linux-2.6.git.orig/arch/um/kernel/sigio_user.c
+++ linux-2.6.git/arch/um/kernel/sigio_user.c
@@ -398,7 +398,7 @@ void write_sigio_workaround(void)
.size = 1 });
if (write_sigio_irq(l_write_sigio_fds[0]))
- goto out_free;
+ goto out_clear_poll;
memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds));
memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private));
@@ -414,18 +414,20 @@ void write_sigio_workaround(void)
out_clear:
write_sigio_pid = -1;
- current_poll = ((struct pollfds) { .poll = NULL,
- .size = 0,
- .used = 0 });
+
write_sigio_fds[0] = -1;
write_sigio_fds[1] = -1;
sigio_private[0] = -1;
sigio_private[1] = -1;
+ out_clear_poll:
+ current_poll = ((struct pollfds) { .poll = NULL,
+ .size = 0,
+ .used = 0 });
out_free:
- kfree(p);
sigio_unlock();
+ kfree(p);
out_close2:
os_close_file(l_sigio_private[0]);
os_close_file(l_sigio_private[1]);
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up.
2006-03-24 0:37 ` Blaisorblade
@ 2006-03-24 2:26 ` Jeff Dike
0 siblings, 0 replies; 19+ messages in thread
From: Jeff Dike @ 2006-03-24 2:26 UTC (permalink / raw)
To: Blaisorblade; +Cc: user-mode-linux-devel, Werner Almesberger, Rob Landley
[-- Attachment #1: Type: text/plain, Size: 765 bytes --]
On Fri, Mar 24, 2006 at 01:37:54AM +0100, Blaisorblade wrote:
> Yep, it seems it looks good. But 1 bug still and two suggestions.
> See also attached patch (it misses one of the suggestions because it's late
> and I'm going to sleep).
I merged that patch.
> If possible (there should be such array constructors) this shouldn't be
> redundant / duplicated with the initial declarations.
>
> Like write_sigio_fds = (int[]) {-1,-1} and having a common macro like
>
> #define SIGIO_FDS_INIT {-1,-1}
>
> static int write_sigio_fds[2] = SIGIO_FDS_INIT;
> ...
> write_sigio_fds = (int[]) SIGIO_FDS_INIT
I tried this - I couldn't get anything like that to compile. I always
got an "incompatible types in assignment" error.
Below is the latest version.
Jeff
[-- Attachment #2: sigio-race --]
[-- Type: text/plain, Size: 2349 bytes --]
# This fixes a race in the starting of write_sigio_thread.
# Previously, some of the data needed by the thread was initialized
# after the clone. If the thread ean immediately, it would see the
# uninitialized data, including an empty pollfds, which would cause it
# to hang.
# We move the data initialization to before the clone, and adjust the
# error paths and cleanup accordingly.
#
# Signed-off-by: Jeff Dike <jdike@addtoit.com>
Index: linux-2.6.16/arch/um/os-Linux/sigio.c
===================================================================
--- linux-2.6.16.orig/arch/um/os-Linux/sigio.c 2006-03-23 17:30:27.000000000 -0500
+++ linux-2.6.16/arch/um/os-Linux/sigio.c 2006-03-23 17:44:11.000000000 -0500
@@ -270,42 +270,40 @@ void write_sigio_workaround(void)
/* Did we race? Don't try to optimize this, please, it's not so likely
* to happen, and no more than once at the boot. */
if(write_sigio_pid != -1)
- goto out_unlock;
+ goto out_free;
- write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
- CLONE_FILES | CLONE_VM, &stack, 0);
-
- if (write_sigio_pid < 0)
- goto out_clear;
+ current_poll = ((struct pollfds) { .poll = p,
+ .used = 1,
+ .size = 1 });
if (write_sigio_irq(l_write_sigio_fds[0]))
- goto out_kill;
+ goto out_free;
- /* Success, finally. */
memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds));
memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private));
- current_poll = ((struct pollfds) { .poll = p,
- .used = 1,
- .size = 1 });
+ write_sigio_pid = run_helper_thread(write_sigio_thread, NULL,
+ CLONE_FILES | CLONE_VM, &stack, 0);
- sigio_unlock();
- return;
+ if (write_sigio_pid < 0)
+ goto out_clear;
- out_kill:
- l_write_sigio_pid = write_sigio_pid;
- write_sigio_pid = -1;
sigio_unlock();
- /* Going to call waitpid, avoid holding the lock. */
- os_kill_process(l_write_sigio_pid, 1);
- goto out_free;
+ return;
out_clear:
write_sigio_pid = -1;
- out_unlock:
- sigio_unlock();
+ current_poll = ((struct pollfds) { .poll = NULL,
+ .size = 0,
+ .used = 0 });
+ write_sigio_fds[0] = -1;
+ write_sigio_fds[1] = -1;
+
+ sigio_private[0] = -1;
+ sigio_private[1] = -1;
out_free:
kfree(p);
+ sigio_unlock();
out_close2:
close(l_sigio_private[0]);
close(l_sigio_private[1]);
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-03-24 2:26 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-11 18:01 [uml-devel] Occasional hang starting up Rob Landley
2006-03-12 20:33 ` Blaisorblade
2006-03-14 20:22 ` Jeff Dike
2006-03-14 21:31 ` Rob Landley
2006-03-16 14:22 ` Werner Almesberger
2006-03-17 19:36 ` Blaisorblade
2006-03-17 21:27 ` Werner Almesberger
2006-03-17 22:14 ` Blaisorblade
2006-03-17 23:54 ` Werner Almesberger
2006-03-20 10:20 ` Werner Almesberger
2006-03-21 1:00 ` Jeff Dike
2006-03-21 1:21 ` Blaisorblade
2006-03-23 18:45 ` Jeff Dike
2006-03-23 19:05 ` Werner Almesberger
2006-03-23 20:13 ` Jeff Dike
2006-03-23 21:23 ` Werner Almesberger
2006-03-23 21:43 ` Jeff Dike
2006-03-24 0:37 ` Blaisorblade
2006-03-24 2:26 ` Jeff Dike
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.