* [uml-devel] Occasional hang starting up. @ 2006-03-11 18:01 Rob Landley 2006-03-12 20:33 ` Blaisorblade 2006-03-14 20:22 ` Jeff Dike 0 siblings, 2 replies; 19+ messages in thread From: Rob Landley @ 2006-03-11 18:01 UTC (permalink / raw) To: user-mode-linux-devel On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with rootfstype=hostfs), right after the initializing VFS line. This time when I killed it, I got the following error (repeated twice): remove_umid_dir - actually_do_remove failed with err = -2 I'm guessing that's where it's hanging? Rob -- Never bet against the cheap plastic solution. ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-11 18:01 [uml-devel] Occasional hang starting up Rob Landley @ 2006-03-12 20:33 ` Blaisorblade 2006-03-14 20:22 ` Jeff Dike 1 sibling, 0 replies; 19+ messages in thread From: Blaisorblade @ 2006-03-12 20:33 UTC (permalink / raw) To: user-mode-linux-devel; +Cc: Rob Landley On Saturday 11 March 2006 19:01, Rob Landley wrote: > On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with > rootfstype=hostfs), right after the initializing VFS line. This time when > I killed it, I got the following error (repeated twice): > > remove_umid_dir - actually_do_remove failed with err = -2 > I'm guessing that's where it's hanging? I guess not. That's a shutdown function - when UML exits it should remove ~/.uml/<umid> and contained files. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-11 18:01 [uml-devel] Occasional hang starting up Rob Landley 2006-03-12 20:33 ` Blaisorblade @ 2006-03-14 20:22 ` Jeff Dike 2006-03-14 21:31 ` Rob Landley 2006-03-16 14:22 ` Werner Almesberger 1 sibling, 2 replies; 19+ messages in thread From: Jeff Dike @ 2006-03-14 20:22 UTC (permalink / raw) To: Rob Landley; +Cc: user-mode-linux-devel On Sat, Mar 11, 2006 at 01:01:41PM -0500, Rob Landley wrote: > On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with > rootfstype=hostfs), right after the initializing VFS line. I'm in the process of beating on the newer externfs-based hostfs and humfs, so I'm not too concerned with bugs in the old hostfs. Having said that, if you can get a stack from the hang, it might turn out to be an easily diagnosed and fixed bug. > This time when I > killed it, I got the following error (repeated twice): > > remove_umid_dir - actually_do_remove failed with err = -2 There's some confusion sometimes during shutdown when UML is forcibly killed. I haven't really looked at this, but it's harmless. Jeff ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-14 20:22 ` Jeff Dike @ 2006-03-14 21:31 ` Rob Landley 2006-03-16 14:22 ` Werner Almesberger 1 sibling, 0 replies; 19+ messages in thread From: Rob Landley @ 2006-03-14 21:31 UTC (permalink / raw) To: user-mode-linux-devel; +Cc: Jeff Dike On Tuesday 14 March 2006 3:22 pm, Jeff Dike wrote: > On Sat, Mar 11, 2006 at 01:01:41PM -0500, Rob Landley wrote: > > On linux-2.6.16-rc5 I'm occasionally getting a hang starting up (with > > rootfstype=hostfs), right after the initializing VFS line. > > I'm in the process of beating on the newer externfs-based hostfs and humfs, > so I'm not too concerned with bugs in the old hostfs. Having said that, > if you can get a stack from the hang, it might turn out to be an easily > diagnosed and fixed bug. If it's in code that's going away I can live with it. Let me know when the new hostfs is ready for wider testing and I'll give it a whirl... Rob -- Never bet against the cheap plastic solution. ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-14 20:22 ` Jeff Dike 2006-03-14 21:31 ` Rob Landley @ 2006-03-16 14:22 ` Werner Almesberger 2006-03-17 19:36 ` Blaisorblade 1 sibling, 1 reply; 19+ messages in thread From: Werner Almesberger @ 2006-03-16 14:22 UTC (permalink / raw) To: Jeff Dike; +Cc: Rob Landley, user-mode-linux-devel Jeff Dike wrote: > Having said that, > if you can get a stack from the hang, it might turn out to be an easily > diagnosed and fixed bug. Seems that arch/um/kernel/sigio_user.c:write_sigio_thread begins its poll before there are any fds in current_poll, so update_thread caused by a add_sigio_fd shortly thereafter hangs while waiting for a response. A sleep(1) at the beginning of write_sigio_thread "fixes" this :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-16 14:22 ` Werner Almesberger @ 2006-03-17 19:36 ` Blaisorblade 2006-03-17 21:27 ` Werner Almesberger ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Blaisorblade @ 2006-03-17 19:36 UTC (permalink / raw) To: user-mode-linux-devel; +Cc: Werner Almesberger, Jeff Dike, Rob Landley [-- Attachment #1: Type: text/plain, Size: 1395 bytes --] On Thursday 16 March 2006 15:22, Werner Almesberger wrote: > Jeff Dike wrote: > > Having said that, > > if you can get a stack from the hang, it might turn out to be an easily > > diagnosed and fixed bug. > > Seems that arch/um/kernel/sigio_user.c:write_sigio_thread begins > its poll before there are any fds in current_poll Ah, and you mean it hangs... > , so update_thread > caused by a add_sigio_fd shortly thereafter hangs while waiting for > a response. Clearly > A sleep(1) at the beginning of write_sigio_thread "fixes" this :-) Er... there's only one thing which seems unclear - poll(NULL, 0, -1) shouldn't fail with -EFAULT instead of hanging? Yep, it should. I didn't believe what you said, so. But then I went to check the implementation. And, indeed, it seems that Linux isn't up to this :-). So: 1) I'll try to fix poll(2) to return -EINVAL. Dunno whether anyone will say "no, the app is stupid, it deserves no error", but hope not (with "try" I refer to this). Attached patch should do this. 2) write_sigio_thread should do a "down" on a semaphore/mutex and the first update_thread should "up" it. As usually, this semaphore would be indeed implemented as a pipe. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade [-- Attachment #2: poll-mustn_t-hang-0-fds.patch --] [-- Type: text/x-diff, Size: 1062 bytes --] poll(2): don't hang caller if nfds == 0 From: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>, Werner Almesberger <wa@almesberger.net> As Werner Almesberger observed in a real program, poll(2) will never return if called with nfds == 0; which is stupid but not so badly deserving. I looked at the code and verified that indeed such a case is never detected and do_poll keeps looping, checking any fd passed, overlooking it wasn't passed any and calling schedule_timeout() endlessly. At least it's a TASK_INTERRUPTIBLE sleep... Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Index: linux-2.6.git/fs/select.c =================================================================== --- linux-2.6.git.orig/fs/select.c +++ linux-2.6.git/fs/select.c @@ -629,6 +629,10 @@ int do_sys_poll(struct pollfd __user *uf struct fdtable *fdt; int max_fdset; + /* Otherwise below code would hang! */ + if (unlikely(nfds < 1)) + return -EINVAL; + /* Do a sanity check on nfds ... */ rcu_read_lock(); fdt = files_fdtable(current->files); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-17 19:36 ` Blaisorblade @ 2006-03-17 21:27 ` Werner Almesberger 2006-03-17 22:14 ` Blaisorblade 2006-03-20 10:20 ` Werner Almesberger 2006-03-21 1:00 ` Jeff Dike 2 siblings, 1 reply; 19+ messages in thread From: Werner Almesberger @ 2006-03-17 21:27 UTC (permalink / raw) To: Blaisorblade; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley Blaisorblade wrote: > 1) I'll try to fix poll(2) to return -EINVAL. Dunno whether anyone will say > "no, the app is stupid, it deserves no error", but hope not (with "try" I > refer to this). Attached patch should do this. I think the behaviour of "poll" with nfds = 0 is correct as it is. At least POSIX doesn't say anything to the extent that "nfds" couldn't be zero: http://www.opengroup.org/onlinepubs/009695399/functions/poll.html Furthermore, this is one of several ways to implement a "sleep" function with sub-second granularity, so existing applications may already depend on "poll" accepting nfds = 0. (Of course, they would probably be better off using "nanosleep".) > 2) write_sigio_thread should do a "down" on a semaphore/mutex and the first > update_thread should "up" it. Yup :-) Thanks, - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-17 21:27 ` Werner Almesberger @ 2006-03-17 22:14 ` Blaisorblade 2006-03-17 23:54 ` Werner Almesberger 0 siblings, 1 reply; 19+ messages in thread From: Blaisorblade @ 2006-03-17 22:14 UTC (permalink / raw) To: Werner Almesberger; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley On Friday 17 March 2006 22:27, Werner Almesberger wrote: > Blaisorblade wrote: > > 1) I'll try to fix poll(2) to return -EINVAL. Dunno whether anyone will > > say "no, the app is stupid, it deserves no error", but hope not (with > > "try" I refer to this). Attached patch should do this. > > I think the behaviour of "poll" with nfds = 0 is correct as it is. > At least POSIX doesn't say anything to the extent that "nfds" > couldn't be zero: > http://www.opengroup.org/onlinepubs/009695399/functions/poll.html I don't think that "hanging" is a correct behaviour anyway... unless the application *wanted* to wait for a signal, but this is a clumsy way to do this (I don't think there's any difference from pause()). > Furthermore, this is one of several ways to implement a "sleep" > function with sub-second granularity > , so existing applications > may already depend on "poll" accepting nfds = 0. (Of course, > they would probably be better off using "nanosleep".) This is a very smart note, however usleep(3) exists since 4.3 BSD so this is a bit unlikely; however you're surely right on this. > > 2) write_sigio_thread should do a "down" on a semaphore/mutex and the > > first update_thread should "up" it. > > Yup :-) > > Thanks, > - Werner -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-17 22:14 ` Blaisorblade @ 2006-03-17 23:54 ` Werner Almesberger 0 siblings, 0 replies; 19+ messages in thread From: Werner Almesberger @ 2006-03-17 23:54 UTC (permalink / raw) To: Blaisorblade; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley Blaisorblade wrote: > however usleep(3) exists since 4.3 BSD so this is a > bit unlikely; Oh, there must be a million of passably valid reasons why someone may still do it this way, e.g., - because they don't know of the more suitable function - because they don't trust the more suitable function for some reason - because they're using some restricted set of functions, e.g., from a portable library - because the other function isn't available on all platforms they care about, while the "poll" functionality (perhaps through some abstraction that may also use "select") is - because they use a signal to get out of the initial "hang" - because the semantics of their code actually say "I/O or timeout" - because that's what the code that has passed all those reviews and certifications does, and nobody is allowed to touch it. Ever. etc. Just because this valid if unexpected behaviour was a little inconvenient for us isn't a good reason for breaking it. If a safe, easy, and incredibly boring life was the top item on our agenda, we should start with getting rid of concurrency ;-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-17 19:36 ` Blaisorblade 2006-03-17 21:27 ` Werner Almesberger @ 2006-03-20 10:20 ` Werner Almesberger 2006-03-21 1:00 ` Jeff Dike 2 siblings, 0 replies; 19+ messages in thread From: Werner Almesberger @ 2006-03-20 10:20 UTC (permalink / raw) To: Blaisorblade; +Cc: user-mode-linux-devel, Jeff Dike, Rob Landley Blaisorblade wrote: > 2) write_sigio_thread should do a "down" on a semaphore/mutex and the first > update_thread should "up" it. As usually, this semaphore would be indeed > implemented as a pipe. Be the way, in case a proper fix is too involved for 2.6.16.x, here's a quick and dirty work-around that should have no ill side-effects: This hack works around a race condition that can make UML hang on startup. If the race condition is encountered, this change limits the time of the hang to one second. Besides that, it can cause a small performance penalty by waking up an otherwise idle "poll" every second. This is intended purely as a readily applicable band aid and should not be considered a long-term fix. - Werner ---------------------------------- cut here ----------------------------------- Signed-off-by: Werner Almesberger <werner@almesberger.net> --- linux-2.6.16.orig/arch/um/kernel/sigio_user.c 2006-03-20 02:53:29.%N -0300 +++ linux-2.6.16/arch/um/kernel/sigio_user.c 2006-03-20 06:27:11.%N -0300 @@ -184,7 +184,7 @@ signal(SIGWINCH, SIG_IGN); fds = ¤t_poll; while(1){ - n = poll(fds->poll, fds->used, -1); + n = poll(fds->poll, fds->used, 1000); if(n < 0){ if(errno == EINTR) continue; printk("write_sigio_thread : poll returned %d, " -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-17 19:36 ` Blaisorblade 2006-03-17 21:27 ` Werner Almesberger 2006-03-20 10:20 ` Werner Almesberger @ 2006-03-21 1:00 ` Jeff Dike 2006-03-21 1:21 ` Blaisorblade 2 siblings, 1 reply; 19+ messages in thread From: Jeff Dike @ 2006-03-21 1:00 UTC (permalink / raw) To: Blaisorblade; +Cc: user-mode-linux-devel, Werner Almesberger, Rob Landley On Fri, Mar 17, 2006 at 08:36:58PM +0100, Blaisorblade wrote: > 2) write_sigio_thread should do a "down" on a semaphore/mutex and the first > update_thread should "up" it. As usually, this semaphore would be indeed > implemented as a pipe. Is there anything wrong with this: Index: linux-2.6.15/arch/um/os-Linux/sigio.c =================================================================== --- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-20 19:48:51.000000000 -0500 +++ linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-20 19:49:55.000000000 -0500 @@ -271,6 +271,10 @@ void write_sigio_workaround(void) if(write_sigio_pid != -1) goto out_unlock; + current_poll = ((struct pollfds) { .poll = p, + .used = 1, + .size = 1 }); + write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, CLONE_FILES | CLONE_VM, &stack, 0); @@ -284,10 +288,6 @@ void write_sigio_workaround(void) memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds)); memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private)); - current_poll = ((struct pollfds) { .poll = p, - .used = 1, - .size = 1 }); - sigio_unlock(); return; That seems to me to fix the basic problem - the thread is using data that hasn't been set up yet, and I don't see that moving it up breaks anything. Jeff ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-21 1:00 ` Jeff Dike @ 2006-03-21 1:21 ` Blaisorblade 2006-03-23 18:45 ` Jeff Dike 0 siblings, 1 reply; 19+ messages in thread From: Blaisorblade @ 2006-03-21 1:21 UTC (permalink / raw) To: user-mode-linux-devel; +Cc: Jeff Dike, Werner Almesberger, Rob Landley On Tuesday 21 March 2006 02:00, Jeff Dike wrote: > On Fri, Mar 17, 2006 at 08:36:58PM +0100, Blaisorblade wrote: > > 2) write_sigio_thread should do a "down" on a semaphore/mutex and the > > first update_thread should "up" it. As usually, this semaphore would be > > indeed implemented as a pipe. > Is there anything wrong with this: > Index: linux-2.6.15/arch/um/os-Linux/sigio.c > =================================================================== > --- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-20 > 19:48:51.000000000 -0500 +++ > linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-20 19:49:55.000000000 -0500 > @@ -271,6 +271,10 @@ void write_sigio_workaround(void) > if(write_sigio_pid != -1) > goto out_unlock; > > + current_poll = ((struct pollfds) { .poll = p, > + .used = 1, > + .size = 1 }); > + > write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, > CLONE_FILES | CLONE_VM, &stack, 0); > > @@ -284,10 +288,6 @@ void write_sigio_workaround(void) > memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds)); > memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private)); > > - current_poll = ((struct pollfds) { .poll = p, > - .used = 1, > - .size = 1 }); > - > sigio_unlock(); > return; > > That seems to me to fix the basic problem - the thread is using data > that hasn't been set up yet, and I don't see that moving it up breaks > anything. > Jeff Ok, this is conceptually correct, and looking at 2.6.14 code this bug wasn't there, so _I_ introduced it, while cleaning it up. Sorry. Beyond, my idea of adding the semaphore was incorrect (it would work only if write_sigio_workaround did the up()). However, while I overlooked this, you overlooked what I did: *) you must also move the setting of write_sigio_fds and sigio_private above, or we'll get the same problem again in different places of write_sigio_thread. *) you must update the exit path. I've taken care so that only if everything succeed the result is stored, and to hold the lock for the minimum time needed (particularly not while doing any allocation). I'm not sure whether the first property is needed, but it would be conservative to do so to preserve coherence of data structures. In particular: if we store an fd != -1, some function may later close it (say when the console is closed); if we closed it in the exit path, then we're closing an unrelated fd and getting a bug. So, if the thread startup fails, you _must_ clear again current_poll, write_sigio_fds and sigio_private. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-21 1:21 ` Blaisorblade @ 2006-03-23 18:45 ` Jeff Dike 2006-03-23 19:05 ` Werner Almesberger 2006-03-24 0:37 ` Blaisorblade 0 siblings, 2 replies; 19+ messages in thread From: Jeff Dike @ 2006-03-23 18:45 UTC (permalink / raw) To: Blaisorblade; +Cc: user-mode-linux-devel, Werner Almesberger, Rob Landley On Tue, Mar 21, 2006 at 02:21:40AM +0100, Blaisorblade wrote: > However, while I overlooked this, you overlooked what I did: > *) you must also move the setting of write_sigio_fds and sigio_private above, > or we'll get the same problem again in different places of > write_sigio_thread. Fixed. > *) you must update the exit path. I've taken care so that only if everything > succeed the result is stored, and to hold the lock for the minimum time > needed (particularly not while doing any allocation). I'm not sure whether > the first property is needed, but it would be conservative to do so to > preserve coherence of data structures. > > In particular: if we store an fd != -1, some function may later close it (say > when the console is closed); if we closed it in the exit path, then we're > closing an unrelated fd and getting a bug. > > So, if the thread startup fails, you _must_ clear again current_poll, > write_sigio_fds and sigio_private. OK, this is all done, I think. What do you think about this? Index: linux-2.6.15/arch/um/os-Linux/sigio.c =================================================================== --- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-23 12:55:27.000000000 -0500 +++ linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-23 13:41:11.000000000 -0500 @@ -269,42 +269,41 @@ void write_sigio_workaround(void) /* Did we race? Don't try to optimize this, please, it's not so likely * to happen, and no more than once at the boot. */ if(write_sigio_pid != -1) - goto out_unlock; + goto out_free; - write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, - CLONE_FILES | CLONE_VM, &stack, 0); - - if (write_sigio_pid < 0) - goto out_clear; + current_poll = ((struct pollfds) { .poll = p, + .used = 1, + .size = 1 }); if (write_sigio_irq(l_write_sigio_fds[0])) - goto out_kill; + goto out_free; - /* Success, finally. */ memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds)); memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private)); - current_poll = ((struct pollfds) { .poll = p, - .used = 1, - .size = 1 }); + write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, + CLONE_FILES | CLONE_VM, &stack, 0); - sigio_unlock(); - return; + if (write_sigio_pid < 0) + goto out_clear; - out_kill: - l_write_sigio_pid = write_sigio_pid; - write_sigio_pid = -1; sigio_unlock(); - /* Going to call waitpid, avoid holding the lock. */ - os_kill_process(l_write_sigio_pid, 1); - goto out_free; + return; out_clear: write_sigio_pid = -1; - out_unlock: - sigio_unlock(); + current_poll = ((struct pollfds) { .poll = NULL, + .size = 0, + .used = 0 }); + write_sigio_fds[0] = -1; + write_sigio_fds[1] = -1; + + sigio_private[0] = -1; + sigio_private[1] = -1; + out_free: kfree(p); + sigio_unlock(); out_close2: close(l_sigio_private[0]); close(l_sigio_private[1]); ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-23 18:45 ` Jeff Dike @ 2006-03-23 19:05 ` Werner Almesberger 2006-03-23 20:13 ` Jeff Dike 2006-03-24 0:37 ` Blaisorblade 1 sibling, 1 reply; 19+ messages in thread From: Werner Almesberger @ 2006-03-23 19:05 UTC (permalink / raw) To: Jeff Dike; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley Jeff Dike wrote: > Index: linux-2.6.15/arch/um/os-Linux/sigio.c Hmm, is this a link ? % find linux-2.6.15.5/arch/um -name sigio.c % The patch applies cleanly to arch/um/kernel/sigio_user.c of 2.6.16, though, and indeed seems to solve the problem. Thanks, - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-23 19:05 ` Werner Almesberger @ 2006-03-23 20:13 ` Jeff Dike 2006-03-23 21:23 ` Werner Almesberger 0 siblings, 1 reply; 19+ messages in thread From: Jeff Dike @ 2006-03-23 20:13 UTC (permalink / raw) To: Werner Almesberger; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley On Thu, Mar 23, 2006 at 04:05:33PM -0300, Werner Almesberger wrote: > Hmm, is this a link ? > > % find linux-2.6.15.5/arch/um -name sigio.c > % Nope, it's just later in my patchset than one which moves the file. > The patch applies cleanly to arch/um/kernel/sigio_user.c of 2.6.16, > though, and indeed seems to solve the problem. Yup, and that's the original location. Jeff ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-23 20:13 ` Jeff Dike @ 2006-03-23 21:23 ` Werner Almesberger 2006-03-23 21:43 ` Jeff Dike 0 siblings, 1 reply; 19+ messages in thread From: Werner Almesberger @ 2006-03-23 21:23 UTC (permalink / raw) To: Jeff Dike; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley Jeff Dike wrote: > Nope, it's just later in my patchset than one which moves the file. Ah, mysteries of the world, easily explained :-) Do you plan to submit this for 2.6.16.x ? - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-23 21:23 ` Werner Almesberger @ 2006-03-23 21:43 ` Jeff Dike 0 siblings, 0 replies; 19+ messages in thread From: Jeff Dike @ 2006-03-23 21:43 UTC (permalink / raw) To: Werner Almesberger; +Cc: Blaisorblade, user-mode-linux-devel, Rob Landley On Thu, Mar 23, 2006 at 06:23:11PM -0300, Werner Almesberger wrote: > Ah, mysteries of the world, easily explained :-) :-) > Do you plan to submit this for 2.6.16.x ? Haven't thought about that yet. It would certainly be a candidate. Jeff ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-23 18:45 ` Jeff Dike 2006-03-23 19:05 ` Werner Almesberger @ 2006-03-24 0:37 ` Blaisorblade 2006-03-24 2:26 ` Jeff Dike 1 sibling, 1 reply; 19+ messages in thread From: Blaisorblade @ 2006-03-24 0:37 UTC (permalink / raw) To: user-mode-linux-devel; +Cc: Jeff Dike, Werner Almesberger, Rob Landley [-- Attachment #1: Type: text/plain, Size: 3630 bytes --] On Thursday 23 March 2006 19:45, Jeff Dike wrote: > On Tue, Mar 21, 2006 at 02:21:40AM +0100, Blaisorblade wrote: > > However, while I overlooked this, you overlooked what I did: > > *) you must also move the setting of write_sigio_fds and sigio_private > > above, or we'll get the same problem again in different places of > > write_sigio_thread. > OK, this is all done, I think. > What do you think about this? Yep, it seems it looks good. But 1 bug still and two suggestions. See also attached patch (it misses one of the suggestions because it's late and I'm going to sleep). > Index: linux-2.6.15/arch/um/os-Linux/sigio.c > =================================================================== > --- linux-2.6.15.orig/arch/um/os-Linux/sigio.c 2006-03-23 > 12:55:27.000000000 -0500 +++ > linux-2.6.15/arch/um/os-Linux/sigio.c 2006-03-23 13:41:11.000000000 -0500 > @@ -269,42 +269,41 @@ void write_sigio_workaround(void) > /* Did we race? Don't try to optimize this, please, it's not so likely > * to happen, and no more than once at the boot. */ > if(write_sigio_pid != -1) > - goto out_unlock; > + goto out_free; > > - write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, > - CLONE_FILES | CLONE_VM, &stack, 0); > - > - if (write_sigio_pid < 0) > - goto out_clear; > + current_poll = ((struct pollfds) { .poll = p, > + .used = 1, > + .size = 1 }); > > if (write_sigio_irq(l_write_sigio_fds[0])) > - goto out_kill; > + goto out_free; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Here you're not clearing current_poll! <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > - /* Success, finally. */ > memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds)); > memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private)); > > - current_poll = ((struct pollfds) { .poll = p, > - .used = 1, > - .size = 1 }); > + write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, > + CLONE_FILES | CLONE_VM, &stack, 0); > > - sigio_unlock(); > - return; > + if (write_sigio_pid < 0) > + goto out_clear; > > - out_kill: > - l_write_sigio_pid = write_sigio_pid; > - write_sigio_pid = -1; > sigio_unlock(); > - /* Going to call waitpid, avoid holding the lock. */ > - os_kill_process(l_write_sigio_pid, 1); > - goto out_free; > + return; > > out_clear: > write_sigio_pid = -1; > - out_unlock: > - sigio_unlock(); > + current_poll = ((struct pollfds) { .poll = NULL, > + .size = 0, > + .used = 0 }); > + write_sigio_fds[0] = -1; > + write_sigio_fds[1] = -1; > + > + sigio_private[0] = -1; > + sigio_private[1] = -1; > + If possible (there should be such array constructors) this shouldn't be redundant / duplicated with the initial declarations. Like write_sigio_fds = (int[]) {-1,-1} and having a common macro like #define SIGIO_FDS_INIT {-1,-1} static int write_sigio_fds[2] = SIGIO_FDS_INIT; ... write_sigio_fds = (int[]) SIGIO_FDS_INIT (we _will_ get one copy wrong, if we can :-)). If sending to 2.6.16 this can be kept separate or they could maybe complain "ah, no extra changes, just the fix!". Btw, this one is not attached. > out_free: > kfree(p); > + sigio_unlock(); p is a local variable, so sigio_unlock can be done before kfree(p). > out_close2: > close(l_sigio_private[0]); > close(l_sigio_private[1]); -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade [-- Attachment #2: uml-fix-sigio-startup-race-hang-fix.patch --] [-- Type: text/x-diff, Size: 1076 bytes --] Index: linux-2.6.git/arch/um/kernel/sigio_user.c =================================================================== --- linux-2.6.git.orig/arch/um/kernel/sigio_user.c +++ linux-2.6.git/arch/um/kernel/sigio_user.c @@ -398,7 +398,7 @@ void write_sigio_workaround(void) .size = 1 }); if (write_sigio_irq(l_write_sigio_fds[0])) - goto out_free; + goto out_clear_poll; memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds)); memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private)); @@ -414,18 +414,20 @@ void write_sigio_workaround(void) out_clear: write_sigio_pid = -1; - current_poll = ((struct pollfds) { .poll = NULL, - .size = 0, - .used = 0 }); + write_sigio_fds[0] = -1; write_sigio_fds[1] = -1; sigio_private[0] = -1; sigio_private[1] = -1; + out_clear_poll: + current_poll = ((struct pollfds) { .poll = NULL, + .size = 0, + .used = 0 }); out_free: - kfree(p); sigio_unlock(); + kfree(p); out_close2: os_close_file(l_sigio_private[0]); os_close_file(l_sigio_private[1]); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [uml-devel] Occasional hang starting up. 2006-03-24 0:37 ` Blaisorblade @ 2006-03-24 2:26 ` Jeff Dike 0 siblings, 0 replies; 19+ messages in thread From: Jeff Dike @ 2006-03-24 2:26 UTC (permalink / raw) To: Blaisorblade; +Cc: user-mode-linux-devel, Werner Almesberger, Rob Landley [-- Attachment #1: Type: text/plain, Size: 765 bytes --] On Fri, Mar 24, 2006 at 01:37:54AM +0100, Blaisorblade wrote: > Yep, it seems it looks good. But 1 bug still and two suggestions. > See also attached patch (it misses one of the suggestions because it's late > and I'm going to sleep). I merged that patch. > If possible (there should be such array constructors) this shouldn't be > redundant / duplicated with the initial declarations. > > Like write_sigio_fds = (int[]) {-1,-1} and having a common macro like > > #define SIGIO_FDS_INIT {-1,-1} > > static int write_sigio_fds[2] = SIGIO_FDS_INIT; > ... > write_sigio_fds = (int[]) SIGIO_FDS_INIT I tried this - I couldn't get anything like that to compile. I always got an "incompatible types in assignment" error. Below is the latest version. Jeff [-- Attachment #2: sigio-race --] [-- Type: text/plain, Size: 2349 bytes --] # This fixes a race in the starting of write_sigio_thread. # Previously, some of the data needed by the thread was initialized # after the clone. If the thread ean immediately, it would see the # uninitialized data, including an empty pollfds, which would cause it # to hang. # We move the data initialization to before the clone, and adjust the # error paths and cleanup accordingly. # # Signed-off-by: Jeff Dike <jdike@addtoit.com> Index: linux-2.6.16/arch/um/os-Linux/sigio.c =================================================================== --- linux-2.6.16.orig/arch/um/os-Linux/sigio.c 2006-03-23 17:30:27.000000000 -0500 +++ linux-2.6.16/arch/um/os-Linux/sigio.c 2006-03-23 17:44:11.000000000 -0500 @@ -270,42 +270,40 @@ void write_sigio_workaround(void) /* Did we race? Don't try to optimize this, please, it's not so likely * to happen, and no more than once at the boot. */ if(write_sigio_pid != -1) - goto out_unlock; + goto out_free; - write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, - CLONE_FILES | CLONE_VM, &stack, 0); - - if (write_sigio_pid < 0) - goto out_clear; + current_poll = ((struct pollfds) { .poll = p, + .used = 1, + .size = 1 }); if (write_sigio_irq(l_write_sigio_fds[0])) - goto out_kill; + goto out_free; - /* Success, finally. */ memcpy(write_sigio_fds, l_write_sigio_fds, sizeof(l_write_sigio_fds)); memcpy(sigio_private, l_sigio_private, sizeof(l_sigio_private)); - current_poll = ((struct pollfds) { .poll = p, - .used = 1, - .size = 1 }); + write_sigio_pid = run_helper_thread(write_sigio_thread, NULL, + CLONE_FILES | CLONE_VM, &stack, 0); - sigio_unlock(); - return; + if (write_sigio_pid < 0) + goto out_clear; - out_kill: - l_write_sigio_pid = write_sigio_pid; - write_sigio_pid = -1; sigio_unlock(); - /* Going to call waitpid, avoid holding the lock. */ - os_kill_process(l_write_sigio_pid, 1); - goto out_free; + return; out_clear: write_sigio_pid = -1; - out_unlock: - sigio_unlock(); + current_poll = ((struct pollfds) { .poll = NULL, + .size = 0, + .used = 0 }); + write_sigio_fds[0] = -1; + write_sigio_fds[1] = -1; + + sigio_private[0] = -1; + sigio_private[1] = -1; out_free: kfree(p); + sigio_unlock(); out_close2: close(l_sigio_private[0]); close(l_sigio_private[1]); ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-03-24 2:26 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-03-11 18:01 [uml-devel] Occasional hang starting up Rob Landley 2006-03-12 20:33 ` Blaisorblade 2006-03-14 20:22 ` Jeff Dike 2006-03-14 21:31 ` Rob Landley 2006-03-16 14:22 ` Werner Almesberger 2006-03-17 19:36 ` Blaisorblade 2006-03-17 21:27 ` Werner Almesberger 2006-03-17 22:14 ` Blaisorblade 2006-03-17 23:54 ` Werner Almesberger 2006-03-20 10:20 ` Werner Almesberger 2006-03-21 1:00 ` Jeff Dike 2006-03-21 1:21 ` Blaisorblade 2006-03-23 18:45 ` Jeff Dike 2006-03-23 19:05 ` Werner Almesberger 2006-03-23 20:13 ` Jeff Dike 2006-03-23 21:23 ` Werner Almesberger 2006-03-23 21:43 ` Jeff Dike 2006-03-24 0:37 ` Blaisorblade 2006-03-24 2:26 ` Jeff Dike
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.