public inbox for linux-kernel@vger.kernel.org
* Re: Help: error 514 in select()
From: Baurzhan Ismagulov @ 2006-08-19 10:05 UTC
  To: linux-kernel; +Cc: msushchi

Hello Misha,

On Fri, Aug 18, 2006 at 07:37:50PM -0700, Misha Sushchik wrote:
> I am writing to you because I found a post by you in a newsgroup that
> described improper error reporting by select(), reporting error 514.

If you don't mind, I'm taking this to linux-kernel; please reply to
the list and Cc me.


> We recently tried to upgrade our server from RedHat 7 to RHEL 4, with
> kernel version 2.6.9. Our CORBA-based communication now halts from time
> to time due to "unknown error 514" in select().

Error 514 is ERESTARTNOHAND. include/linux/errno.h says user space should
never see this error code, so this is a bug in your kernel.
core_sys_select returns this code if a signal is pending for the current
process. You have the following options:

* Test with the latest kernel. 2.6.9 is almost two years old.

* Ask Red Hat to fix the problem in 2.6.9.

* Fix the problem yourself.
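While one of those options is being pursued, the application can at least recognize the leaked value. A minimal sketch, assuming Linux and a POSIX C environment: the K_* constants are local copies of the kernel-internal values from include/linux/errno.h (they are not exported to user space headers), and select_normalized() is a hypothetical wrapper that reports a leaked 514 as EINTR so the caller's existing EINTR handling applies. It hides the symptom; it does not fix the kernel bug.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Kernel-internal restart codes, values as in include/linux/errno.h.
 * They are not exported to user space headers, hence local copies. */
#define K_ERESTARTSYS           512
#define K_ERESTARTNOINTR        513
#define K_ERESTARTNOHAND        514
#define K_ERESTART_RESTARTBLOCK 516

/* True if err is a kernel-internal restart code that leaked out. */
bool is_kernel_restart_code(int err)
{
    switch (err) {
    case K_ERESTARTSYS:
    case K_ERESTARTNOINTR:
    case K_ERESTARTNOHAND:
    case K_ERESTART_RESTARTBLOCK:
        return true;
    default:
        return false;
    }
}

/* Hypothetical wrapper: call select() and, if the kernel leaked
 * ERESTARTNOHAND, report it as EINTR so the caller's existing
 * interrupted-call handling applies. Hides the symptom only. */
int select_normalized(int nfds, fd_set *rd, fd_set *wr, fd_set *ex,
                      struct timeval *timeout)
{
    int ret = select(nfds, rd, wr, ex, timeout);
    if (ret < 0 && errno == K_ERESTARTNOHAND)
        errno = EINTR;
    return ret;
}
```

A caller that already loops on EINTR and rebuilds its fd sets before each call could switch to select_normalized() without other changes.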

You may try applying something like the following to your current kernel
in order to understand how to reproduce the problem (untested):

diff -Naurp linux-2.6.orig/fs/select.c linux-2.6/fs/select.c
--- linux-2.6.orig/fs/select.c	2006-08-19 11:57:53.000000000 +0200
+++ linux-2.6/fs/select.c	2006-08-19 11:57:43.000000000 +0200
@@ -430,6 +430,8 @@ asmlinkage long sys_select(int n, fd_set
 		}
 	}
 
+	if (ret == -ERESTARTNOHAND)
+		BUG();
 	return ret;
 }
 
IIUC, this should print a backtrace every time the problem occurs.


With kind regards,
Baurzhan.


* Re: Help: error 514 in select()
From: Misha Sushchik @ 2006-08-22  1:49 UTC
  To: ibr, linux-kernel

Baurzhan:

Thank you for this information. 
We are now trying to get someone to fix the problem for us. 
Unfortunately I know nothing of kernel-level programming and do not have the time to get up to speed in it myself.

At my level (an application developer, not a very deep one), several pieces are missing:
1) I do not know when this problem first appeared (or re-appeared, as I can see from searching the web). If we knew the latest version where this problem did not occur, we would consider just going back to that version instead of waiting for the bug to be fixed.
2) I do not have a sensible way of reproducing this error quickly. Our application may have to run for a few days before it fails this way. This is killing us (timewise) when testing possible solutions.

The latest kernel we have reproduced this with is 2.6.17.
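One way to shorten the time to failure is to make interrupted select() calls far more frequent than in the real workload, assuming the error is tied to a signal arriving during select() (core_sys_select returns this code when a signal is pending). A hypothetical stress harness: a SIGALRM interval timer fires every interval_us microseconds while the loop keeps calling select() with a longer timeout, so most calls are interrupted. hammer_select() and struct select_stats are illustrative names; whether this provokes the leak depends on the actual trigger, which the thread has not identified.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

#define K_ERESTARTNOHAND 514  /* kernel-internal value, include/linux/errno.h */

struct select_stats {
    long timeouts;  /* select() returned 0 */
    long eintr;     /* interrupted, reported correctly as EINTR */
    long leaked;    /* errno 514: the bug */
    long other;     /* anything else */
};

static void on_alarm(int sig)
{
    (void)sig;  /* the handler only exists so that select() is interrupted */
}

/* Hypothetical harness: SIGALRM fires every interval_us microseconds
 * (keep it below 250000 so the select() timeout stays a valid timeval)
 * while select() is called in a loop with a longer timeout.
 * Returns 0 on success, -1 if the signal or timer setup fails. */
int hammer_select(long iterations, long interval_us, struct select_stats *st)
{
    struct sigaction sa, old_sa;
    struct itimerval it, off;

    memset(st, 0, sizeof *st);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;  /* no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) < 0)
        return -1;

    memset(&it, 0, sizeof it);
    memset(&off, 0, sizeof off);
    it.it_value.tv_usec = interval_us;
    it.it_interval.tv_usec = interval_us;
    if (setitimer(ITIMER_REAL, &it, NULL) < 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    for (long i = 0; i < iterations; i++) {
        struct timeval tv = { 0, 4 * interval_us };
        int ret = select(0, NULL, NULL, NULL, &tv);
        if (ret == 0)
            st->timeouts++;
        else if (ret < 0 && errno == EINTR)
            st->eintr++;
        else if (ret < 0 && errno == K_ERESTARTNOHAND)
            st->leaked++;
        else
            st->other++;
    }

    setitimer(ITIMER_REAL, &off, NULL);  /* stop the timer */
    sigaction(SIGALRM, &old_sa, NULL);   /* restore the old disposition */
    return 0;
}
```

On a kernel without the bug, leaked should stay at 0 however long the harness runs; a non-zero count would give a reproducer measured in minutes rather than days.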

Thanks a lot for your help. 
Misha.



* Re: Help: error 514 in select()
From: Baurzhan Ismagulov @ 2006-08-23 23:22 UTC
  To: linux-kernel; +Cc: msushchi

Hello Misha,

On Mon, Aug 21, 2006 at 06:49:03PM -0700, Misha Sushchik wrote:
> 1) I do not know when this problem first appeared (or re-appeared, as
> I can see from searching the web). If we knew the latest version where
> this problem did not occur, we would consider just going back to that
> version instead of waiting for the bug to be fixed.

It would be fairly easy to find out once you have (2) solved.
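Once a reproducer exists, finding the first affected version is a bisection. A minimal sketch of the bookkeeping, assuming the candidate versions are ordered oldest first, the oldest is known good, the newest is known bad, and the bug stays present once introduced (the remark that it may have "re-appeared" means this assumption could fail). struct bisect and the bisect_* functions are hypothetical names; building, booting and running the reproducer at each probe stays a manual step.

```c
#include <stdbool.h>
#include <stddef.h>

/* State for bisecting an ordered list of kernel versions (oldest first).
 * Invariant: the version at index 'good' does not show the bug,
 * the one at index 'bad' does, and good < bad. */
struct bisect {
    size_t good;
    size_t bad;
};

/* Start with index 0 known good and index n-1 known bad (needs n >= 2). */
void bisect_init(struct bisect *b, size_t n)
{
    b->good = 0;
    b->bad = n - 1;
}

/* Done when the known-good and known-bad versions are adjacent:
 * 'bad' is then the first version that shows the bug. */
bool bisect_done(const struct bisect *b)
{
    return b->bad - b->good <= 1;
}

/* Index of the next version to build, boot and test. */
size_t bisect_next(const struct bisect *b)
{
    return b->good + (b->bad - b->good) / 2;
}

/* Record the outcome of testing the version at index idx. */
void bisect_report(struct bisect *b, size_t idx, bool bug_seen)
{
    if (bug_seen)
        b->bad = idx;
    else
        b->good = idx;
}
```

Each probe halves the remaining range, so the number of multi-day test runs grows only with the logarithm of the number of candidate versions.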


> 2) I do not have a sensible way of reproducing this error quickly. Our
> application may have to run for a few days before it fails this way.
> This is killing us (timewise) when testing possible solutions.

I would start with the following:

1. Do you know all signals sent to the process in question by itself or
   by other processes? Can you log them?

2. Have you tried the patch from my last mail?
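For question 1, a minimal sketch of logging delivered signals from inside the process, assuming POSIX sigaction() with SA_SIGINFO. The siglog_* names are hypothetical. Caution: installing this handler replaces the signal's current disposition (default action, ignore, or the application's own handler), so in a real program the recording would belong in the handlers the application already installs. The sketch also assumes only one thread runs the handler at a time.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define SIGLOG_SIZE 256

struct siglog_entry {
    int signo;
    int code;       /* si_code: how the signal was generated */
    pid_t sender;   /* si_pid: the sender for kill()/sigqueue()-style signals */
};

static struct siglog_entry siglog[SIGLOG_SIZE];
static volatile sig_atomic_t siglog_count;

/* Handler: only stores plain values, which is async-signal-safe.
 * Entries beyond SIGLOG_SIZE are dropped. */
static void siglog_handler(int signo, siginfo_t *info, void *uctx)
{
    int i = siglog_count;

    (void)uctx;
    if (i < SIGLOG_SIZE) {
        siglog[i].signo = signo;
        siglog[i].code = info->si_code;
        siglog[i].sender = info->si_pid;
        siglog_count = i + 1;
    }
}

/* Install the logging handler for signo. This REPLACES the current
 * disposition, so use it only where that is acceptable. */
int siglog_install(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = siglog_handler;
    sa.sa_flags = SA_SIGINFO;
    sigfillset(&sa.sa_mask);  /* block other signals while logging */
    return sigaction(signo, &sa, NULL);
}

/* Print the log. Call from normal code, never from a signal handler. */
void siglog_dump(FILE *out)
{
    int n = siglog_count;

    for (int i = 0; i < n && i < SIGLOG_SIZE; i++)
        fprintf(out, "signal %d code %d from pid %ld\n",
                siglog[i].signo, siglog[i].code, (long)siglog[i].sender);
}
```

Calling siglog_dump() right after select() fails with 514 would show which signals arrived before the failure, since stdio must not be used inside the handler itself.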

The answers may help in understanding how the problem can be
reproduced. My current assumption is that your process calls select(2),
a signal arrives, and ERESTARTNOHAND is not replaced with EINTR, so it
leaks to user space. You have to find out where the return value should
be replaced and make that happen. If there are many ERESTARTNOHANDs
during normal operation, printing the stack just before returning to
user space could help further.
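The replacement described above happens on the way back to user space when a signal is processed. A simplified model of that decision, loosely following the 2.6 i386 signal code (handle_signal() and do_signal() in arch/i386/kernel/signal.c) rather than copying it: classify_restart() and enum restart_action are hypothetical names, and the real logic differs per architecture and kernel version. If this path is skipped entirely for a select() that returned -ERESTARTNOHAND, the value passes through unchanged, which would match the 514 seen in user space.

```c
#include <errno.h>
#include <stdbool.h>

#define K_ERESTARTSYS           512
#define K_ERESTARTNOINTR        513
#define K_ERESTARTNOHAND        514
#define K_ERESTART_RESTARTBLOCK 516

/* What happens to a syscall's return value when a signal is processed. */
enum restart_action {
    PASS_THROUGH,    /* value reaches user space unchanged */
    RETURN_EINTR,    /* replaced with -EINTR */
    RESTART_SYSCALL  /* syscall re-executed (or via restart_syscall) */
};

/* Simplified model, applied only when the kernel was entered via a syscall.
 * ret:          the syscall's return value (negative errno on failure)
 * handler_runs: a user-space handler is about to be invoked
 * sa_restart:   that handler was installed with SA_RESTART */
enum restart_action classify_restart(long ret, bool handler_runs,
                                     bool sa_restart)
{
    switch (-ret) {
    case K_ERESTARTNOHAND:
    case K_ERESTART_RESTARTBLOCK:
        /* select() uses ERESTARTNOHAND: EINTR if a handler runs. */
        return handler_runs ? RETURN_EINTR : RESTART_SYSCALL;
    case K_ERESTARTSYS:
        if (handler_runs && !sa_restart)
            return RETURN_EINTR;
        return RESTART_SYSCALL;
    case K_ERESTARTNOINTR:
        return RESTART_SYSCALL;
    default:
        return PASS_THROUGH;
    }
}
```

In this model there is no input for which -ERESTARTNOHAND is passed through, so a leak points at the conversion being skipped, not at the conversion itself.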


> The latest kernel we have reproduced this with is 2.6.17.

Good to know! I assume you haven't tried 2.6.17.10, right? That said,
after quickly skimming through the changes since 2.6.17, I haven't seen
anything that looks relevant.


With kind regards,
Baurzhan.

