From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KxxII-0003rA-34
	for qemu-devel@nongnu.org; Thu, 06 Nov 2008 00:20:34 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KxxIF-0003qx-Hg
	for qemu-devel@nongnu.org; Thu, 06 Nov 2008 00:20:32 -0500
Received: from [199.232.76.173] (port=42283 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KxxIF-0003qu-C0
	for qemu-devel@nongnu.org; Thu, 06 Nov 2008 00:20:31 -0500
Received: from bsdimp.com ([199.45.160.85]:61184 helo=harmony.bsdimp.com)
	by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from )
	id 1KxxIF-0007v5-4A
	for qemu-devel@nongnu.org; Thu, 06 Nov 2008 00:20:31 -0500
Date: Wed, 05 Nov 2008 22:19:58 -0700 (MST)
Message-Id: <20081105.221958.756907011.imp@bsdimp.com>
Subject: Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
From: "M. Warner Losh"
In-Reply-To: <20081106005312.GA26173@shareable.org>
References: <20081105150042.GJ13630@shareable.org>
	<20081105.091015.232928302.imp@bsdimp.com>
	<20081106005312.GA26173@shareable.org>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: jamie@shareable.org
Cc: jan.kiszka@web.de, qemu-devel@nongnu.org

In message: <20081106005312.GA26173@shareable.org>
            Jamie Lokier writes:
: M. Warner Losh wrote:
: > : > Which ones have a good kernel implementation of it?  FreeBSD's is
: > : > currently approximately:
: > : >
: > : >     if (!mask)
: > : >         _sigprocmask(mask, &oldmask);
: > : >     /* here */
: > : >     select();
: > : >     if (!mask)
: > : >         _sigprocmask(oldmask, NULL);
: > : >
: > : > I'm assuming that the problem is due to a signal arriving at /* here */.
: > :
: > : If that's _kernel_ code and the kernel behaves like Linux, it's not a
: > : problem because signals don't affect the control flow until returning
: > : to userspace, meaning the select() will return EINTR.
: >
: > It is currently user level code, and I'm looking at moving it into the
: > kernel, but I need to understand the race being talked about here.
:
: Ugh, I had imagined FreeBSD would have got that right, since it's
: quite good in other areas.  I've added FreeBSD to my blacklist of
: broken pselect() implementations, thanks for the info.
:
: Do you know if FreeBSD's pread() and pwrite() are also thread-unsafe
: userspace wrappers using lseek+read/write?  They are harder to avoid
: when you're looking at high performance code.

I haven't looked...

: > Why is it no good.  What is the race here?  Is it just the oldmask
: > thing and multiple callers to select, or is it something else?
:
: It's racy with a single caller.  The race is: program's signal handler
: sets a flag like "alarm_happened = 1".  The program's main loop checks
: the flag before calling select().  If the signal is delivered before
: that check, the program doesn't call select() and handles the reason
: for the flag.  If the signal is delivered during select(), that
: returns EINTR and the program handles the reason for the flag.  But if
: the signal is delivered _between_ checking the flag and calling
: select(), the program gets stuck.
:
: pselect() avoids that stuck state, by blocking the signal before the
: program checks the flag, and guaranteeing if the signal is delivered
: after that point, pselect() returns EINTR.  It's sort of analogous to
: pthread_cond_wait() needing a mutex.

OK.  that makes sense.

: > And if it is the oldmask thing, why wouldn't multiple callers of
: > pselect mess it up depending on what order they have.
:
: Signal masks are per-thread anyway, multiple callers isn't an issue.

OK.  I have never had good things happen with signals and threads...

Warner
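
For reference, the check-then-wait pattern described in the quoted text might look roughly like the sketch below. It is only an illustration under some assumptions: SIGALRM is the signal of interest, the platform's pselect() swaps the signal mask atomically, and the names on_alarm(), wait_for_alarm() and the inject_signal_in_window test hook are hypothetical (only the alarm_happened flag comes from the thread). The hook raises the signal in exactly the window between the flag check and the wait, which is where a plain select() would get stuck.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Flag set by the signal handler, as in the "alarm_happened = 1" example. */
static volatile sig_atomic_t alarm_happened = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_happened = 1;
}

/*
 * Hypothetical helper: returns 1 if the signal was seen, 0 on timeout,
 * -1 on error.  If inject_signal_in_window is nonzero, SIGALRM is raised
 * between the flag check and the wait to exercise the race window.
 */
static int wait_for_alarm(const struct timespec *timeout,
                          int inject_signal_in_window)
{
    struct sigaction sa = {0};
    sigset_t block, orig, wait_mask;
    int result;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    /* Block the signal BEFORE checking the flag. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
        return -1;

    /* Mask to use while waiting: the original one, with SIGALRM unblocked. */
    wait_mask = orig;
    sigdelset(&wait_mask, SIGALRM);

    if (alarm_happened) {
        result = 1;                 /* signal arrived before the check */
    } else {
        if (inject_signal_in_window)
            raise(SIGALRM);         /* stays pending: it is blocked here */

        /* pselect() installs wait_mask and waits as one atomic step, so a
         * pending or newly arriving SIGALRM interrupts it with EINTR. */
        int n = pselect(0, NULL, NULL, NULL, timeout, &wait_mask);
        if (n == -1 && errno == EINTR && alarm_happened)
            result = 1;
        else if (n == 0)
            result = 0;
        else
            result = -1;
    }

    alarm_happened = 0;
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return result;
}
```

With a 5 second timeout and the hook enabled, the call should come back immediately reporting the signal rather than sleeping out the timeout. A userspace emulation built from sigprocmask() followed by select() cannot give that guarantee, because the handler can run after the mask is changed and before select() is entered.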